bnew


[Submitted on 14 Dec 2023 (this version), latest version 21 Dec 2023 (v2)]

CogAgent: A Visual Language Model for GUI Agents​

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggle to understand and interact with GUIs, thus limiting their potential to increase automation levels. In this paper, we introduce CogAgent, an 18-billion-parameter visual language model (VLM) specializing in GUI understanding and navigation. By utilizing both low-resolution and high-resolution image encoders, CogAgent supports input at a resolution of 1120*1120, enabling it to recognize tiny page elements and text. As a generalist visual language model, CogAgent achieves the state of the art on five text-rich and four general VQA benchmarks, including VQAv2, OK-VQA, Text-VQA, ST-VQA, ChartQA, infoVQA, DocVQA, MM-Vet, and POPE. CogAgent, using only screenshots as input, outperforms LLM-based methods that consume extracted HTML text on both PC and Android GUI navigation tasks -- Mind2Web and AITW, advancing the state of the art. The model and codes are available at \url{this https URL}.
Comments: 27 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.08914 [cs.CV] (or arXiv:2312.08914v1 [cs.CV] for this version)

Submission history​

From: Wenyi Hong
[v1] Thu, 14 Dec 2023 13:20:57 UTC (11,917 KB)
[v2] Thu, 21 Dec 2023 09:41:25 UTC (11,917 KB)

 

bnew

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM​

Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai
This paper presents EdgeSAM, an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. Our approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM. To overcome this bottleneck, we include both the prompt encoder and mask decoder in the distillation process, with box and point prompts in the loop, so that the distilled model can accurately capture the intricate dynamics between user input and mask generation. To mitigate dataset bias issues stemming from point prompt distillation, we incorporate a lightweight module within the encoder. EdgeSAM achieves a 40-fold speed increase compared to the original SAM, and it also outperforms MobileSAM, being 14 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively. It is also the first SAM variant that can run at over 30 FPS on an iPhone 14. Code and models are available at this https URL.
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.06660 [cs.CV] (or arXiv:2312.06660v1 [cs.CV] for this version)

Submission history

From: Chong Zhou
[v1] Mon, 11 Dec 2023 18:59:52 UTC (9,665 KB)
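The key design choice in the abstract, keeping box and point prompts in the loop so that the prompt encoder and mask decoder take part in distillation, can be sketched roughly as below. This is a hypothetical illustration under assumed placeholder interfaces (`teacher_sam`, `student_sam`, `sample_prompts`), not the authors' training code.

```python
# Schematic sketch of prompt-in-the-loop distillation (as described in the
# EdgeSAM abstract): the loss is taken on masks produced from sampled box/point
# prompts, so the student is supervised on prompt-conditioned outputs rather
# than on encoder features alone. All callables are illustrative placeholders.
import torch
import torch.nn.functional as F

def prompt_in_the_loop_step(teacher_sam, student_sam, images, sample_prompts, optimizer):
    """One distillation step with box/point prompts kept in the loop."""
    prompts = sample_prompts(images)                  # e.g. random boxes and clicked points
    with torch.no_grad():
        teacher_masks = teacher_sam(images, prompts)  # frozen ViT-based SAM teacher
    student_masks = student_sam(images, prompts)      # CNN-based student (EdgeSAM-style)
    # Match the prompt-conditioned mask logits instead of only image embeddings.
    loss = F.mse_loss(student_masks, teacher_masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```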
 

bnew

Microsoft Copilot app arrives on iOS, bringing free ChatGPT-4 to iPhone and iPad​


By Mayank Parmar

December 30, 2023

Microsoft Copilot app for iOS

Microsoft has released its Copilot app for iOS devices, just two days after I reported exclusively that the iOS version was almost ready. Now, iPhone and iPad users can download it from the Apple App Store. Copilot isn’t just another name for Bing Chat; it brings some cool new things to the table.

The Microsoft Copilot app is available via Apple’s App Store, and the tech giant clarified that the AI app is optimized for iPad. In our tests, we also noticed that the Copilot app offers a better experience than the dedicated ChatGPT app for iPad.

While Bing integrates search, rewards, and chat functionalities, Copilot streamlines its focus on an advanced chat interface, incorporating features like the innovative DALL-E 3. Unlike Bing, which combines search and AI, Copilot emerges as a ChatGPT-like app tailored to iOS devices.



Microsoft Copilot for iPad


Copilot maintains its versatility as a multi-functional assistant on Android, supporting GPT Vision, GPT-4, and DALL-E. The addition of GPT-4 Turbo, available to select users, enhances its ability to provide information on recent events, especially with the search plugin disabled.


Microsoft Copilot for iOS

Copilot is ChatGPT-4 for iOS, but it’s absolutely free. You can write emails, stories, or even summarize tough texts. It’s great with languages too, helping you translate, proofread, and more. And it’s all free, which Microsoft says won’t change.


Microsoft Copilot with GPT-4 toggle

It also offers plugin support, including third-party plugins, at no extra cost. This can be super handy for planning trips or sprucing up your resume. Plus, there’s the Image Creator that turns your words into pictures right on your phone.
 

bnew





Model Card for Notux 8x7B-v1​

This model is a preference-tuned version of mistralai/Mixtral-8x7B-Instruct-v0.1 on the argilla/ultrafeedback-binarized-preferences-cleaned dataset using DPO (Direct Preference Optimization).


As of Dec 26th 2023, it outperforms Mixtral-8x7B-Instruct-v0.1 and is the top ranked MoE (Mixture of Experts) model on the Hugging Face Open LLM Leaderboard.


This is part of the Notus family of models and experiments, where the Argilla team investigates data-first and preference tuning methods like dDPO (distilled DPO). This model is the result of our first experiment at tuning a MoE model that has already been fine-tuned with DPO (i.e., Mixtral-8x7B-Instruct-v0.1).


Model Details​

Model Description​

  • Developed by: Argilla (based on MistralAI previous efforts)
  • Shared by: Argilla
  • Model type: Pretrained generative Sparse Mixture of Experts
  • Language(s) (NLP): English, Spanish, Italian, German, and French
  • License: MIT
  • Finetuned from model: mistralai/Mixtral-8x7B-Instruct-v0.1

Model Sources​

Training Details​

Training Hardware​

We used a VM with 8 x H100 80GB hosted in runpod.io for 1 epoch (~10hr).

Training Data​

We used a new iteration of the Argilla UltraFeedback preferences dataset named argilla/ultrafeedback-binarized-preferences-cleaned.

Training procedure​

Training hyperparameters​

The following hyperparameters were used during training:
  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
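
These settings map naturally onto a standard DPO run. Below is a minimal, hypothetical sketch of how they might be wired up with Hugging Face `TrainingArguments` and TRL's `DPOTrainer`; the trainer arguments, the `beta` value, and the dataset handling are assumptions based on the listed framework versions, not the actual training script behind this card.

```python
# Hypothetical sketch of a DPO run using the hyperparameters above.
# Assumes the TRL library (pip install trl) and a preference dataset with
# "prompt", "chosen", and "rejected" columns; not the authors' actual script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")  # needs multi-GPU in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Depending on the TRL version, the chosen/rejected columns may need to be
# flattened to plain strings before being passed to the trainer.
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")

args = TrainingArguments(
    output_dir="notux-8x7b-v1",
    learning_rate=5e-7,              # learning_rate
    per_device_train_batch_size=8,   # train_batch_size (x 8 GPUs = 64 total)
    per_device_eval_batch_size=4,    # eval_batch_size (x 8 GPUs = 32 total)
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # TRL can build the frozen reference model internally
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    beta=0.1,              # assumed DPO temperature; not reported in this card
)
trainer.train()
```

The total train/eval batch sizes of 64 and 32 follow from the per-device sizes multiplied by the 8 GPUs listed under Training Hardware.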

Training results​

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4384 | 0.22 | 200 | 0.4556 | -0.3275 | -1.9448 | 0.7937 | 1.6174 | -405.7994 | -397.8617 | -1.3157 | -1.4511 |
| 0.4064 | 0.43 | 400 | 0.4286 | -0.2163 | -2.2090 | 0.8254 | 1.9927 | -408.4409 | -396.7496 | -0.7660 | -0.6539 |
| 0.3952 | 0.65 | 600 | 0.4275 | -0.1311 | -2.1603 | 0.8016 | 2.0291 | -407.9537 | -395.8982 | -0.6783 | -0.7206 |
| 0.3909 | 0.87 | 800 | 0.4167 | -0.2273 | -2.3146 | 0.8135 | 2.0872 | -409.4968 | -396.8602 | -0.8458 | -0.7738 |

Framework versions​

  • Transformers 4.36.0
  • Pytorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.15.0




 

bnew




Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations​

December 07, 2023



Abstract​

We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to as response classification. For the purpose of both prompt and response classification, we have meticulously gathered a dataset of high quality. Llama Guard, a Llama2-7b model that is instruction-tuned on our collected dataset, albeit low in volume, demonstrates strong performance on existing benchmarks such as the OpenAI Moderation Evaluation dataset and ToxicChat, where its performance matches or exceeds that of currently available content moderation tools. Llama Guard functions as a language model, carrying out multi-class classification and generating binary decision scores. Furthermore, the instruction fine-tuning of Llama Guard allows for the customization of tasks and the adaptation of output formats. This feature enhances the model's capabilities, such as enabling the adjustment of taxonomy categories to align with specific use cases, and facilitating zero-shot or few-shot prompting with diverse taxonomies at the input. We are making Llama Guard model weights available and we encourage researchers to further develop and adapt them to meet the evolving needs of the community for AI safety.
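
As a rough illustration of the prompt and response classification described above, the sketch below runs the released checkpoint through Hugging Face transformers. The model id and the behaviour of its chat template (which wraps the conversation in the safety-taxonomy instruction) are assumptions based on the public release, not details from the paper.

```python
# Hypothetical sketch of prompt/response classification with Llama Guard.
# Assumes the public "meta-llama/LlamaGuard-7b" checkpoint, whose chat template
# wraps the conversation in the safety-taxonomy instruction prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

def moderate(chat):
    """Return Llama Guard's verdict ("safe", or "unsafe" plus the violated categories)."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Prompt classification: only the user turn is scored.
print(moderate([{"role": "user", "content": "How do I pick a lock?"}]))

# Response classification: the assistant turn is scored in context.
print(moderate([
    {"role": "user", "content": "How do I pick a lock?"},
    {"role": "assistant", "content": "I can't help with that."},
]))
```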















🤗 Models on Hugging Face | Blog | Website | CyberSec Eval Paper | Llama Guard Paper






Purple Llama​

Purple Llama is an umbrella project that over time will bring together tools and evals to help the community build responsibly with open generative AI models. The initial release will include tools and evals for Cyber Security and Input/Output safeguards but we plan to contribute more in the near future.



Why purple?​

Borrowing a concept from the cybersecurity world, we believe that to truly mitigate the challenges which generative AI presents, we need to take both attack (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks. The same ethos applies to generative AI, and hence our investment in Purple Llama will be comprehensive.



License​

Components within the Purple Llama project will be licensed permissively, enabling both research and commercial usage. We believe this is a major step towards enabling community collaboration and standardizing the development and usage of trust and safety tools for generative AI development. More concretely, evals and benchmarks are licensed under the MIT license, while any models use the Llama 2 Community license. See the table below:





| Component Type | Components | License |
|---|---|---|
| Evals/Benchmarks | Cyber Security Eval (others to come) | MIT |
| Models | Llama Guard | Llama 2 Community License |


Evals & Benchmarks​



Cybersecurity​

We are sharing what we believe is the first industry-wide set of cybersecurity safety evaluations for LLMs. These benchmarks are based on industry guidance and standards (e.g., CWE and MITRE ATT&CK) and built in collaboration with our security subject matter experts. With this initial release, we aim to provide tools that will help address some risks outlined in the White House commitments on developing responsible AI, including:



  • Metrics for quantifying LLM cybersecurity risks.
  • Tools to evaluate the frequency of insecure code suggestions.
  • Tools to evaluate LLMs to make it harder to generate malicious code or aid in carrying out cyberattacks.

We believe these tools will reduce the frequency of LLMs suggesting insecure AI-generated code and reduce their helpfulness to cyber adversaries. Our initial results show that there are meaningful cybersecurity risks for LLMs, both with recommending insecure code and for complying with malicious requests. See our Cybersec Eval paper for more details.



You can also check out the 🤗 leaderboard here.



Input/Output Safeguards​

As we outlined in Llama 2’s Responsible Use Guide, we recommend that all inputs and outputs to the LLM be checked and filtered in accordance with content guidelines appropriate to the application.





Llama Guard​

To support this, and empower the community, we are releasing Llama Guard, an openly-available model that performs competitively on common open benchmarks and provides developers with a pretrained model to help defend against generating potentially risky outputs.



As part of our ongoing commitment to open and transparent science, we are releasing our methodology and an extended discussion of model performance in our Llama Guard paper. This model has been trained on a mix of publicly-available datasets to enable detection of common types of potentially risky or violating content that may be relevant to a number of developer use cases. Ultimately, our vision is to enable developers to customize this model to support relevant use cases and to make it easier to adopt best practices and improve the open ecosystem.
 

bnew

Voice Assistant for the Web​

A smart voice assistant optimized for low latency responses. Uses Vercel Edge Functions, Whisper Speech Recognition, GPT-3.5 Turbo and Eleven Labs TTS streaming.

View Demo · Report Bug · Request Feature


Logo

Features​

✅ A Siri-like voice assistant within your browser

✅ Optimized for low latency responses

✅ With the combined power of OpenAI, Whisper Speech Recognition and Eleven Labs

Demo​

You can test the voice assistant here: https://heyassistant.co

Motivation​

Voice Assistants have become an integral part of our lives. They are everywhere. In our phones, in our cars, in our homes. Why not also on the web?

Until recently the main problem with voice assistants on the web was the latency. It took too long to send the audio to the server, generate an LLM completion and send speech back. The recent advances of OpenAI, Eleven Labs and Vercel have made it possible to build a voice assistant that is fast enough to be used on the web.

I would love for this repo to become the go-to place for people who want to build their own voice assistant. I've been working on this project for a while now and I'm really excited to share it with you.


Thoughts on latency and user experience​

The latency of the voice assistant is the most important factor for a good user experience. Currently there are 3 main factors that contribute to the latency:


  • The time it takes to transcribe the audio (Via Whisper Speech Recognition)
  • The time it takes to generate the response (Via GPT-3.5 Turbo)
  • The time it takes to stream the speech response (Via Eleven Labs TTS)

Based on some tests I've done, the speech generation takes the most time. The longer the text to be synthesized, the longer it takes to generate the speech. The latency of the speech generation is also the most unpredictable.

A possible mitigation strategy might be splitting the response into multiple parts and streaming them one after another. This would allow the user to start listening to the response while the rest of the response is being generated. I haven't implemented this yet, but it's something I'm considering. If you have any ideas on how to improve the latency, please let me know.
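
To make that idea concrete, here is a small Python sketch (the project itself is a TypeScript/Vercel app) of cutting a streamed LLM response at sentence boundaries and handing each finished sentence to TTS while the rest is still generating. `synthesize_speech` and the token stream are illustrative placeholders, not APIs from this repo.

```python
# Hypothetical sketch: start synthesizing speech as soon as the first sentence
# of the LLM response is complete, instead of waiting for the full reply.
# The token stream and synthesize_speech() are illustrative placeholders.
import re
from typing import Iterable, Iterator

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Accumulate streamed tokens and yield complete sentences as they appear."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Everything except the last fragment is a complete sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()

def speak_streaming_response(tokens: Iterable[str], synthesize_speech) -> None:
    """Send each finished sentence to TTS while later sentences are still generating."""
    for sentence in sentences_from_stream(tokens):
        synthesize_speech(sentence)  # e.g. enqueue an Eleven Labs TTS request

# Example with a fake token stream:
fake_tokens = ["Sure", ", here", " is the", " answer. ", "It has", " two parts."]
speak_streaming_response(fake_tokens, synthesize_speech=print)
```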

Another thing to keep in mind is perceived wait time. Based on some research, it seems that the perceived wait time is shorter if the user is given some kind of feedback while waiting. I've implemented a simple "thinking" notification that is shown while the assistant is processing the response, but I'm sure there are better ways to improve the perceived wait time.
 

bnew

How Nikon, Sony and Canon are fighting deepfakes with new technology​


Nikon, Sony Group, and Canon are working on embedding digital signatures in their cameras, which will act as proof of origin and integrity for the images.​

Rizwan Choudhury

Published: Dec 31, 2023 08:28 AM EST



Photo equipment shop in Kowloon, Hong Kong. (georgeclerk/iStock)



As fake images become more convincing and widespread, camera makers are fighting back with new technology that can verify the authenticity of photos. Nikon, Sony Group, and Canon are working on embedding digital signatures in their cameras, which will act as proof of origin and integrity for the images.

As Nikkei Asia reports, digital signatures will contain information such as the date, time, location, and photographer of the image and will be resistant to tampering. This will help photojournalists and other professionals who need to ensure the credibility of their work. Nikon will offer this feature in its mirrorless cameras, while Sony and Canon will also incorporate it in their professional-grade mirrorless SLR cameras.



Verify: A global standard for digital signatures​

The three camera giants have agreed on a global standard for digital signatures, which will make them compatible with a web-based tool called Verify. This tool, launched by an alliance of global news organizations, technology companies, and camera makers, will allow anyone to check the credentials of an image for free. Verify will display the relevant information if an image has a digital signature. If artificial intelligence creates or alters an image, Verify will flag it as having "No Content Credentials."
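
For readers curious what such a scheme looks like mechanically, here is a generic public-key signing sketch (using the Python `cryptography` package) in which any change to the pixels or the capture metadata invalidates the signature. It illustrates the general principle only; it is not the actual Nikon/Sony/Canon format or the Verify tool.

```python
# Generic illustration of binding an image to its capture metadata with a
# public-key signature, for comparison with the in-camera signing described
# above. Not the actual camera-maker or Verify implementation.
import hashlib
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

camera_key = Ed25519PrivateKey.generate()   # in practice, provisioned inside the camera

def sign_capture(image_bytes: bytes, metadata: dict) -> bytes:
    """Sign a digest of the pixels plus capture metadata (date, time, location, photographer)."""
    payload = hashlib.sha256(image_bytes).digest() + json.dumps(metadata, sort_keys=True).encode()
    return camera_key.sign(payload)

def verify_capture(image_bytes: bytes, metadata: dict, signature: bytes) -> bool:
    """A Verify-style check: any change to pixels or metadata invalidates the signature."""
    payload = hashlib.sha256(image_bytes).digest() + json.dumps(metadata, sort_keys=True).encode()
    try:
        camera_key.public_key().verify(signature, payload)
        return True
    except InvalidSignature:
        return False

meta = {"date": "2023-12-31", "time": "08:28", "location": "Kowloon", "photographer": "staff"}
sig = sign_capture(b"raw image bytes", meta)
print(verify_capture(b"raw image bytes", meta, sig))   # True
print(verify_capture(b"tampered bytes", meta, sig))    # False
```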

The need for such technology is evident, as deepfakes of prominent figures like former US President Donald Trump and Japanese Prime Minister Fumio Kishida have gone viral this year, raising questions about the trustworthiness of online content. Moreover, China's Tsinghua University researchers have developed a new generative AI technology called a latent consistency model, which can produce about 700,000 images daily.


Canon app lets users see how an image was altered over time.
Canon

How others are also joining the fight​

Other technology companies are also joining the battle against fake images. Google has released a tool that adds invisible digital watermarks to AI-generated pictures, which can be detected by another tool. Intel has developed technology that can analyze the skin color changes of subjects in images, which indicate the blood flow under their skin, and use that to determine the image's authenticity. Hitachi is working on technology to prevent online identity fraud by verifying user images.

The new camera technology is expected to be available by 2024. Sony will release it in the spring of 2024, and Canon will follow suit later that year. Sony is also considering adding the feature to videos, and Canon is developing a similar video technology. Canon has also released an image management app that can tell whether an image was taken by a human.

Sony will also promote the adoption of the technology among other media outlets and has already field-tested it with The Associated Press in October. Canon has partnered with Thomson Reuters and the Starling Lab for Data Integrity, an institute co-founded by Stanford University and the University of Southern California, to develop the technology.

The camera makers hope their technology will help restore the trust and confidence in the images that shape our perception of the world.
 

bnew
Paper: [2312.07413] AI capabilities can be significantly improved without expensive retraining

Blog post: AI capabilities can be significantly improved without expensive retraining

Abstract:

State-of-the-art AI systems can be significantly improved without expensive retraining via "post-training enhancements": techniques applied after initial training, like fine-tuning the system to use a web browser. We review recent post-training enhancements, categorizing them into five types: tool-use, prompting methods, scaffolding, solution selection, and data generation. Different enhancements improve performance on different tasks, making it hard to compare their significance. So we translate improvements from different enhancements into a common currency, the compute-equivalent gain: how much additional training compute would be needed to improve performance by the same amount as the enhancement. Our non-experimental work shows that post-training enhancements have significant benefits: most surveyed enhancements improve benchmark performance by more than a 5x increase in training compute, some by more than 20x. Post-training enhancements are relatively cheap to develop: fine-tuning costs are typically <1% of the original training cost. Governing the development of capable post-training enhancements may be challenging because frontier models could be enhanced by a wide range of actors.
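
To make the compute-equivalent gain idea concrete, here is a toy sketch: given a fitted curve from training compute to benchmark score, the CEG of an enhancement is the factor by which compute would have to grow to match the enhanced score. The curve and the numbers below are invented for illustration and are not from the paper.

```python
# Toy illustration of compute-equivalent gain (CEG): the factor of extra
# training compute that would match an enhancement's benchmark improvement.
# The performance curve and numbers are invented for illustration only.
import math

def performance(compute: float) -> float:
    """Hypothetical fitted curve: benchmark score as a function of training compute (FLOPs)."""
    return 20.0 + 8.0 * math.log10(compute)

def compute_equivalent_gain(base_compute: float, enhanced_score: float) -> float:
    """Compute needed to reach `enhanced_score` by scaling alone, divided by base compute."""
    # Invert the (monotonic) curve numerically by bisection in log space.
    lo, hi = base_compute, base_compute * 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if performance(mid) < enhanced_score:
            lo = mid
        else:
            hi = mid
    return hi / base_compute

base = 1e22                              # FLOPs used to train the base model
base_score = performance(base)           # score without the enhancement
enhanced_score = base_score + 5.6        # score with, say, a prompting method
print(f"CEG ~= {compute_equivalent_gain(base, enhanced_score):.1f}x")
```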

 

bnew




[Submitted on 13 Dec 2023]

Distributed Inference and Fine-tuning of Large Language Models Over The Internet​

Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, Colin Raffel
Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to most researchers. In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies. We observe that a large enough model (50B+) can run efficiently even on geodistributed devices in a consumer-grade network. This could allow running LLMs efficiently by pooling together idle compute resources of multiple research groups and volunteers. We address two open problems: (1) how to perform inference and fine-tuning reliably if any device can disconnect abruptly and (2) how to partition LLMs between devices with uneven hardware, joining and leaving at will. In order to do that, we develop special fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput. We showcase these algorithms in Petals - a decentralized system that runs Llama 2 (70B) and BLOOM (176B) over the Internet up to 10x faster than offloading for interactive generation. We evaluate the performance of our system in simulated conditions and a real-world setup spanning two continents.
Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2023. 20 pages, 3 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2312.08361 [cs.LG] (or arXiv:2312.08361v1 [cs.LG] for this version)

Submission history​

From: Max Ryabinin
[v1] Wed, 13 Dec 2023 18:52:49 UTC (403 KB)
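
Petals is available as an open-source Python package, and client-side usage looks roughly like the snippet below. The model name and API calls follow the project's public documentation as I understand it and may differ across versions; they are not part of the paper itself.

```python
# Rough sketch of client-side generation with Petals, the decentralized system
# described in the abstract. Model name and API are based on the project's
# public documentation and may differ between versions.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # a Llama-2-70B derivative served on the public swarm
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Only a small client-side part of the model is loaded locally; the transformer
# blocks are executed by volunteer servers across the Internet.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A quick summary of distributed inference:", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0]))
```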


 

bnew

Images altered to trick machine vision can influence humans too​

Published

2 JANUARY 2024

Authors

Gamaleldin Elsayed and Michael Mozer

New research shows that even subtle changes to digital images, designed to confuse computer vision systems, can also affect human perception

Computers and humans see the world in different ways. Our biological systems and the artificial ones in machines may not always pay attention to the same visual signals. Neural networks trained to classify images can be completely misled by subtle perturbations to an image that a human wouldn’t even notice.

That AI systems can be tricked by such adversarial images may point to a fundamental difference between human and machine perception, but it drove us to explore whether humans, too, might—under controlled testing conditions—reveal sensitivity to the same perturbations. In a series of experiments published in Nature Communications, we found evidence that human judgments are indeed systematically influenced by adversarial perturbations.

Our discovery highlights a similarity between human and machine vision, but also demonstrates the need for further research to understand the influence adversarial images have on people, as well as AI systems.

What is an adversarial image?​

An adversarial image is one that has been subtly altered by a procedure that causes an AI model to confidently misclassify the image contents. This intentional deception is known as an adversarial attack. Attacks can be targeted to cause an AI model to classify a vase as a cat, for example, or they may be designed to make the model see anything except a vase.

Three square images. The first of a vase of flowers, the second static pixels and the final a vase of flowers that is labelled.

Left: An Artificial Neural Network (ANN) correctly classifies the image as a vase but when perturbed by a seemingly random pattern across the entire picture (middle), with the intensity magnified for illustrative purposes – the resulting image (right) is incorrectly, and confidently, misclassified as a cat.

And such attacks can be subtle. In a digital RGB image, each pixel takes a value on a 0-255 scale representing its intensity. An adversarial attack can be effective even if no pixel is modulated by more than 2 levels on that scale.
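
As a concrete illustration of what "no pixel modulated by more than 2 levels" means in code, here is a minimal FGSM-style sketch of a targeted, L-infinity-bounded perturbation. It is a generic example, not the attack procedure used in this study.

```python
# Minimal sketch of an L-infinity-bounded adversarial perturbation (FGSM-style),
# with every pixel changed by at most 2 levels on the 0-255 scale.
# Generic illustration only; not the exact procedure used in the study.
import torch

def fgsm_targeted_perturb(model, image, target_class, epsilon=2.0 / 255.0):
    """Nudge `image` (a [1, 3, H, W] tensor in [0, 1]) toward `target_class`."""
    image = image.clone().requires_grad_(True)
    logits = model(image)
    target = torch.tensor([target_class], device=logits.device)
    loss = torch.nn.functional.cross_entropy(logits, target)
    loss.backward()
    # Step *against* the loss gradient to raise the target-class score, then
    # clamp to the valid range; each pixel moves by at most epsilon (2/255).
    adv = image - epsilon * image.grad.sign()
    adv = torch.clamp(adv, 0.0, 1.0)
    return adv.detach()
```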

Adversarial attacks on physical objects in the real world can also succeed, such as causing a stop sign to be misidentified as a speed limit sign. Indeed, security concerns have led researchers to investigate ways to resist adversarial attacks and mitigate their risks.

How is human perception influenced by adversarial examples?​

Previous research has shown that people may be sensitive to large-magnitude image perturbations that provide clear shape cues. However, less is understood about the effect of more nuanced adversarial attacks. Do people dismiss the perturbations in an image as innocuous, random image noise, or can it influence human perception?

To find out, we performed controlled behavioral experiments. To start with, we took a series of original images and carried out two adversarial attacks on each, to produce many pairs of perturbed images. In the animated example below, the original image is classified as a “vase” by a model. The two images perturbed through adversarial attacks on the original image are then misclassified by the model, with high confidence, as the adversarial targets “cat” and “truck”, respectively.

Next, we showed human participants the pair of pictures and asked a targeted question: “Which image is more cat-like?” While neither image looks anything like a cat, they were obliged to make a choice and typically reported feeling that they were making an arbitrary choice. If brain activations are insensitive to subtle adversarial attacks, we would expect people to choose each picture 50% of the time on average. However, we found that the choice rate—which we refer to as the perceptual bias—was reliably above chance for a wide variety of perturbed picture pairs, even when no pixel was adjusted by more than 2 levels on that 0-255 scale.

A gif showing various images of a vase of flowers. A magnifying glass appears and reveals static.

From a participant’s perspective, it feels like they are being asked to distinguish between two virtually identical images. Yet the scientific literature is replete with evidence that people leverage weak perceptual signals in making choices, signals that are too weak for them to express confidence or awareness. In our example, we may see a vase of flowers, but some activity in the brain informs us there’s a hint of cat about it.

A grid of images showing two identical photos of a breakfast omelette and identical photos of traffic lights beside a graph.

Left: Examples of pairs of adversarial images. The top pair of images are subtly perturbed, at a maximum magnitude of 2 pixel levels, to cause a neural network to misclassify them as a “truck” and “cat”, respectively. A human volunteer is asked “Which is more cat-like?” The lower pair of images are more obviously manipulated, at a maximum magnitude of 16 pixel levels, to be misclassified as “chair” and “sheep”. The question this time is “Which is more sheep-like?”

We carried out a series of experiments that ruled out potential artifactual explanations of the phenomenon for our Nature Communications paper. In each experiment, participants reliably selected the adversarial image corresponding to the targeted question more than half the time. While human vision is not as susceptible to adversarial perturbations as is machine vision (machines no longer identify the original image class, but people still see it clearly), our work shows that these perturbations can nevertheless bias humans towards the decisions made by machines.

The importance of AI safety and security research​

Our primary finding that human perception can be affected—albeit subtly—by adversarial images raises critical questions for AI safety and security research, but by using formal experiments to explore the similarities and differences in the behaviour of AI visual systems and human perception, we can leverage insights to build safer AI systems.

For example, our findings can inform future research seeking to improve the robustness of computer vision models by better aligning them with human visual representations. Measuring human susceptibility to adversarial perturbations could help judge that alignment for a variety of computer vision architectures.

Our work also demonstrates the need for further research into understanding the broader effects of technologies not only on machines, but also on humans. This in turn highlights the continuing importance of cognitive science and neuroscience to better understand AI systems and their potential impacts as we focus on building safer, more secure systems.
 

bnew

AI-powered search engine Perplexity AI, now valued at $520M, raises $70M​

Kyle Wiggers @kyle_l_wiggers / 6:30 AM EST•January 4, 2024


Hand holding a magnifying glass against the sky to represent search engine default choices.

Image Credits: Panuwat Dangsungnoen / EyeEm / Getty Images

As search engine incumbents — namely Google — amp up their platforms with gen AI tech, startups are looking to reinvent AI-powered search from the ground up. It might seem like a Sisyphean task, going up against competitors with billions upon billions of users. But this new breed of search upstarts believes it can carve out a niche, however small, by delivering a superior experience.

One among the cohort, Perplexity AI, this morning announced that it raised $70 million in a funding round led by IVP with additional investments from NEA, Databricks Ventures, former Twitter VP Elad Gil, Shopify CEO Tobi Lutke, ex-GitHub CEO Nat Friedman and Vercel founder Guillermo Rauch. Other participants in the round included Nvidia and — notably — Jeff Bezos.

Sources familiar with the matter tell TechCrunch that the round values Perplexity at $520 million post-money. That’s chump change in the realm of gen AI startups. But, considering that Perplexity’s only been around since August 2022, it’s a nonetheless impressive climb.

Perplexity was founded by Aravind Srinivas, Denis Yarats, Johnny Ho and Andy Konwinski — engineers with backgrounds in AI, distributed systems, search engines and databases. Srinivas, Perplexity’s CEO, previously worked at OpenAI, where he researched language and gen AI models along the lines of Stable Diffusion and DALL-E 3.

Unlike traditional search engines, Perplexity offers a chatbot-like interface that allows users to ask questions in natural language (e.g. “Do we burn calories while sleeping?,” “What’s the least visited country?,” and so on). The platform’s AI responds with a summary containing source citations (mostly websites and articles), at which point users can ask follow-up questions to dive deeper into a particular subject.

Perplexity AI

Performing a search with Perplexity.

“With Perplexity, users can get instant … answers to any question with full sources and citations included,” Srinivas said. “Perplexity is for anyone and everyone who uses technology to search for information.”

Underpinning the Perplexity platform is an array of gen AI models developed in-house and by third parties. Subscribers to Perplexity’s Pro plan ($20 per month) can switch models — Google’s Gemini, Mistral 7B, Anthropic’s Claude 2.1 and OpenAI’s GPT-4 are in the rotation presently — and unlock features like image generation; unlimited use of Perplexity’s Copilot, which considers personal preferences during searches; and file uploads, which allows users to upload documents including images and have models analyze the docs to formulate answers about them (e.g. “Summarize pages 2 and 4”).

If the experience sounds comparable to Google’s Bard, Microsoft’s Copilot and ChatGPT, you’re not wrong. Even Perplexity’s chat-forward UI is reminiscent of today’s most popular gen AI tools.

Beyond the obvious competitors, the search engine startup You.com offers similar AI-powered summarizing and source-citing tools, powered optionally by GPT-4.

Srinivas makes the case that Perplexity offers more robust search filtering and discovery options than most, for example letting users limit searches to academic papers or browse trending search topics submitted by other users on the platform. I’m not convinced that they’re so differentiated that they couldn’t be replicated — or haven’t already been replicated for that matter. But Perplexity has ambitions beyond search. It’s beginning to serve its own gen AI models, which leverage Perplexity’s search index and the public web for ostensibly improved performance, through an API available to Pro customers.

This reporter is skeptical about the longevity of gen AI search tools for a number of reasons, not least of which is that AI models are costly to run. At one point, OpenAI was spending approximately $700,000 per day to keep up with the demand for ChatGPT. Microsoft is reportedly losing an average of $20 per user per month on its AI code generator, meanwhile.

Sources familiar with the matter tell TechCrunch Perplexity’s annual recurring revenue is between $5 million and $10 million at the moment. That seems fairly healthy… until you factor in the millions of dollars it often costs to train gen AI models like Perplexity’s own.

Perplexity AI

Image Credits: Perplexity AI

Concerns around misuse and misinformation inevitably crop up around gen AI search tools like Perplexity, as well — as they well should. AI isn’t the best summarizer after all, sometimes missing key details, misconstruing and exaggerating language or otherwise inventing facts very authoritatively. And it’s prone to spewing bias and toxicity — as Perplexity’s own models recently demonstrated.

Yet another potential speed bump on Perplexity’s road to success is copyright. Gen AI models “learn” from examples to craft essays, code, emails, articles and more, and many vendors — including Perplexity, presumably — scrape the web for millions to billions of these examples to add to their training data sets. Vendors argue fair use doctrine provides a blanket protection for their web-scraping practices, but artists, authors and other copyright holders disagree — and have filed lawsuits seeking compensation.

As a tangentially related aside, while an increasing number of gen AI vendors offer policies protecting customers from IP claims against them, Perplexity does not. According to the company’s terms of service, customers agree to “hold harmless” Perplexity from claims, damages and liabilities arising from the use of its services — meaning Perplexity’s off the hook where it concerns legal fees.

Some plaintiffs, like The New York Times, have argued gen AI search experiences siphon off publishers’ content, readers and ad revenue through anticompetitive means. “Anticompetitive” or no, the tech is certainly impacting traffic. A model from The Atlantic found that if a search engine like Google were to integrate AI into search, it’d answer a user’s query 75% of the time without requiring a click-through to its website. (Some vendors, such as OpenAI, have inked deals with certain news publishers, but most — including Perplexity — haven’t.)

Srinivas pitches this as a feature — not a bug.

“[With Perplexity, there’s] no need to click on different links, compare answers, or endlessly dig for information,” he said. “The era of sifting through SEO spam, sponsored links and multiple sources will be replaced by a more efficient model of knowledge acquisition and sharing, propelling society into a new era of accelerated learning and research.”

The many uncertainties around Perplexity’s business model — and gen AI and consumer search at large — don’t appear to be deterring its investors. To date, the startup, which claims to have ten million active monthly users, has raised over $100 million — much of which is being put toward expanding its 39-person team and building new product functionality, Srinivas says.

“Perplexity is intensely building a product capable of bringing the power of AI to billions,” Cack Wilhelm, a general partner at IVP, added via email. “Aravind possesses the unique ability to uphold a grand, long-term vision while shipping product relentlessly, requirements to tackle a problem as important and fundamental as search.”
 

bnew





About​

Convert Compute And Books Into Instruct-Tuning Datasets

Augmentoolkit​

Generate multi-turn training data, about any subject, using Open Source LLMs! Save yourself the time of manually editing 1000s of AI chats to build your own dataset (which you then can't open source anyway because of personal reputation risks). Easily configure the prompts and settings to generate conversations aligned to your tastes and interests. Avoids breaking the bank (and getting your API key revoked) because it doesn't use the OpenAI API.
 

bnew













[Submitted on 2 Jan 2024]

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning​

Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu
This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The limited length of the training sequence during training may limit the application of Large Language Models (LLMs) on long input sequences for inference. In this work, we argue that existing LLMs themselves have inherent capabilities for handling long contexts. Based on this argument, we suggest extending LLMs' context window by themselves to fully utilize the inherent ability. We propose Self-Extend to stimulate LLMs' long context handling potential. The basic idea is to construct bi-level attention information: the group level and the neighbor level. The two levels are computed by the original model's self-attention, which means the proposed method does not require any training. With only four lines of code modification, the proposed method can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments and the results show that the proposed method can effectively extend existing LLMs' context window's length.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2401.01325 [cs.CL] (or arXiv:2401.01325v1 [cs.CL] for this version)

Submission history​

From: Hongye Jin
[v1] Tue, 2 Jan 2024 18:30:51 UTC (349 KB)
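
The bi-level idea in the abstract can be pictured with the relative-position remapping below: keys inside a local window keep their ordinary distances (the neighbor level), while more distant keys are merged into coarser groups by integer division (the group level), so a fixed pretrained position range covers a longer sequence. This is a schematic reading of the abstract, not the authors' implementation.

```python
# Schematic illustration of Self-Extend-style bi-level relative positions:
# nearby tokens keep normal (neighbor-level) distances, distant tokens are
# mapped onto coarser group-level distances via floor division.
# Follows the idea in the abstract, not the authors' code.
import torch

def self_extend_relative_positions(seq_len: int, group_size: int, window: int) -> torch.Tensor:
    """Return a [seq_len, seq_len] matrix of remapped relative positions (query - key)."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions
    k = torch.arange(seq_len).unsqueeze(0)   # key positions
    rel = q - k                              # ordinary relative positions

    # Group level: compress distances by `group_size`, then shift so the grouped
    # region continues right where the neighbor window ends.
    grouped = rel // group_size + (window - window // group_size)

    # Neighbor level: within the local window, keep the original distances.
    return torch.where(rel < window, rel, grouped)

print(self_extend_relative_positions(seq_len=12, group_size=4, window=4))
```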


 

bnew




[Submitted on 2 Jan 2024]

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models​

Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
Comments: 28 pages, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as: arXiv:2401.01335 [cs.LG] (or arXiv:2401.01335v1 [cs.LG] for this version)

Submission history​

From: Quanquan Gu
[v1] Tue, 2 Jan 2024 18:53:13 UTC (833 KB)
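
One way to picture a SPIN iteration as described above: sample responses from the current model on the SFT prompts, pair each human demonstration (chosen) with the model's own generation (rejected), and update with a DPO-style objective. The helpers below are placeholders for illustration, not the authors' code.

```python
# Hypothetical sketch of Self-Play Fine-Tuning (SPIN): the current model's own
# generations become the "rejected" side of a DPO-style preference pair, with
# the human SFT response as "chosen". generate_response() and dpo_style_update()
# are illustrative placeholders.

def spin_iteration(model, sft_dataset, generate_response, dpo_style_update):
    """Run one round of self-play and return the updated model."""
    synthetic_pairs = []
    for example in sft_dataset:                      # each example: {"prompt", "response"}
        self_generated = generate_response(model, example["prompt"])
        synthetic_pairs.append({
            "prompt": example["prompt"],
            "chosen": example["response"],           # human-annotated demonstration
            "rejected": self_generated,              # previous-iteration model output
        })
    # Train the model to prefer human data over its own earlier generations.
    return dpo_style_update(model, synthetic_pairs)

def spin(model, sft_dataset, generate_response, dpo_style_update, num_iterations=3):
    """Iterate self-play: each round plays against the model from the last round."""
    for _ in range(num_iterations):
        model = spin_iteration(model, sft_dataset, generate_response, dpo_style_update)
    return model
```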


 