This paper demonstrates that language models are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs), which have learned massive sequential evolutionary knowledge from the universe of natural protein sequences, to acquire an immediate capability to design preferable protein sequences for given folds. We conduct a structural surgery on pLMs, implanting a lightweight structural adapter that endows them with structural awareness. During inference, iterative refinement is performed to effectively optimize the generated protein sequences. Experiments show that LM-Design improves state-of-the-art results by a large margin, yielding 4% to 12% accuracy gains in sequence recovery (e.g., 55.65%/56.63% on the CATH 4.2/4.3 single-chain benchmarks, and >60% when designing protein complexes). We provide extensive and in-depth analyses, which verify that LM-Design can (1) leverage both structural and sequential knowledge to accurately handle structurally non-deterministic regions, (2) benefit from scaling data and model size, and (3) generalize to other proteins (e.g., antibodies and de novo proteins).
Comments: 10 pages; v2 adds an image credit to RFdiffusion (Watson et al., 2022) in Fig. 1F and fixes some small presentation errors
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2302.01649 [cs.LG] (arXiv:2302.01649v2 for this version)
https://doi.org/10.48550/arXiv.2302.01649
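To make the adapter idea above concrete, here is a minimal PyTorch-style sketch of a frozen pLM with a cross-attention structural adapter and greedy iterative refinement. It is not the authors' implementation; the module names, dimensions, vocabulary size, the toy embedding stand-in for the pLM, and the plain greedy update (without re-masking low-confidence positions) are all illustrative assumptions.

```python
# Minimal sketch of "pLM + structural adapter + iterative refinement".
# NOT the LM-Design code: names, shapes, and the toy pLM are assumptions.
import torch
import torch.nn as nn


class StructuralAdapter(nn.Module):
    """Lightweight adapter: sequence states cross-attend to structure features."""

    def __init__(self, d_model: int, d_struct: int, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, kdim=d_struct, vdim=d_struct, batch_first=True
        )
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, seq_states, struct_feats):
        attn_out, _ = self.cross_attn(seq_states, struct_feats, struct_feats)
        seq_states = self.norm1(seq_states + attn_out)
        return self.norm2(seq_states + self.ffn(seq_states))


class LMDesignSketch(nn.Module):
    """Frozen pLM encoder + trainable structural adapter + output head."""

    def __init__(self, plm_encoder: nn.Module, d_model: int, d_struct: int,
                 vocab_size: int = 33):  # 33 is an assumed amino-acid alphabet size
        super().__init__()
        self.plm = plm_encoder
        for p in self.plm.parameters():      # only the adapter and head are trained
            p.requires_grad_(False)
        self.adapter = StructuralAdapter(d_model, d_struct)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, struct_feats):
        h = self.plm(tokens)                 # (B, L, d_model) sequence states
        h = self.adapter(h, struct_feats)    # inject structural awareness
        return self.head(h)                  # per-position amino-acid logits

    @torch.no_grad()
    def iterative_refine(self, tokens, struct_feats, n_iters: int = 5):
        """Repeatedly re-predict the sequence conditioned on the structure."""
        for _ in range(n_iters):
            logits = self.forward(tokens, struct_feats)
            tokens = logits.argmax(dim=-1)   # greedy update; simplification
        return tokens


# Toy usage with a stand-in embedding "pLM" (purely for shape-checking).
toy_plm = nn.Embedding(33, 128)
model = LMDesignSketch(toy_plm, d_model=128, d_struct=16)
tokens = torch.randint(0, 33, (2, 50))       # batch of 2 sequences, length 50
struct = torch.randn(2, 50, 16)              # per-residue structure features
designed = model.iterative_refine(tokens, struct)
```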
One of the main challenges of multimodal learning is the need to combine heterogeneous modalities (e.g., video, audio, text). For example, video and audio are obtained at much higher rates than text and are roughly aligned in time. They are often not synchronized with text, which comes as a global context, e.g., a title, or a description. Furthermore, video and audio inputs are of much larger volumes, and grow as the video length increases, which naturally requires more compute dedicated to these modalities and makes modeling of long-range dependencies harder.
We here decouple the multimodal modeling, dividing it into separate, focused autoregressive models that process the inputs according to the characteristics of the modalities. We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video) and an autoregressive component for the context modalities, which are not necessarily aligned in time but are still sequential. To address the long sequences of the video-audio inputs, we propose to further partition the video and audio sequences into consecutive snippets and autoregressively process their representations. To that end, we propose a Combiner mechanism, which models the audio-video information jointly within a timeframe. The Combiner learns to extract audio and video features from raw spatio-temporal signals, and then learns to fuse these features, producing compact but expressive representations per snippet.
Our approach achieves the state of the art on well-established multimodal benchmarks, outperforming much larger models. It effectively addresses the high computational demand of media inputs by learning compact representations, controlling the sequence length of the audio-video feature representations, and modeling their dependencies in time.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2311.05698 [cs.CV] (arXiv:2311.05698v2 for this version)
[2311.05698] Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
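As a rough illustration of the snippet partitioning and Combiner described above, the sketch below compresses each consecutive audio-video snippet into a few latent tokens via cross-attention. The shapes, names (Combiner, partition_and_combine), and the latent-query design are assumptions, not the Mirasol3B implementation.

```python
# Sketch of snippet partitioning + a Combiner that fuses each snippet's
# audio-video tokens into a few compact latents (illustrative assumptions only).
import torch
import torch.nn as nn


class Combiner(nn.Module):
    """Fuse a snippet's audio-video tokens into n_latents compact tokens."""

    def __init__(self, d_model: int, n_latents: int = 8, n_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, snippet_tokens):                  # (B, T_snip, d_model)
        q = self.latents.unsqueeze(0).expand(snippet_tokens.size(0), -1, -1)
        fused, _ = self.attn(q, snippet_tokens, snippet_tokens)
        return fused                                    # (B, n_latents, d_model)


def partition_and_combine(av_tokens, snippet_len, combiner):
    """Split a long audio-video token stream into consecutive snippets and
    compress each one; returns (B, n_snippets * n_latents, d_model)."""
    b, t, d = av_tokens.shape
    n_snippets = t // snippet_len                       # drop ragged tail for simplicity
    snippets = av_tokens[:, : n_snippets * snippet_len].reshape(
        b * n_snippets, snippet_len, d)
    fused = combiner(snippets)                          # (B * n_snippets, m, d)
    return fused.reshape(b, -1, d)


# Toy usage: 2 clips, 256 audio-video tokens each, snippets of 32 tokens.
combiner = Combiner(d_model=64)
av = torch.randn(2, 256, 64)
compact = partition_and_combine(av, snippet_len=32, combiner=combiner)
print(compact.shape)  # torch.Size([2, 64, 64]) -> 8 snippets x 8 latents each
```

The downstream autoregressive model then operates on these short per-snippet latent sequences instead of the full raw token stream, which is how the approach keeps the sequence length (and compute) under control as video length grows.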
Meta's new AI image generator was trained on 1.1 billion Instagram and Facebook photos
"Imagine with Meta AI" turns prompts into images, trained using public Facebook data.
BENJ EDWARDS - 12/6/2023, 4:52 PM
Three images generated by "Imagine with Meta AI" using the Emu AI model.
Meta | Benj Edwards
On Wednesday, Meta released a free standalone AI image-generator website, "Imagine with Meta AI," based on its Emu image-synthesis model. Meta used 1.1 billion publicly visible Facebook and Instagram images to train the AI model, which can render a novel image from a written prompt. Previously, Meta's version of this technology (using the same data) was only available in messaging and social networking apps such as Instagram.
If you're on Facebook or Instagram, it's quite possible a picture of you (or one you took) helped train Emu. In a way, the old saying "If you're not paying for it, you are the product" has taken on a whole new meaning. That said, as of 2016 Instagram users were already uploading over 95 million photos a day, so the dataset Meta used to train its AI model was only a small subset of its overall photo library.
Since Meta says it only uses publicly available photos for training, setting your photos private on Instagram or Facebook should prevent their inclusion in the company's future AI model training (unless it changes that policy, of course).
Imagine with Meta AI
Gallery: AI-generated images created by Meta Emu on the "Imagine with Meta AI" website (images: Meta | Benj Edwards). Prompts shown:
- "a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting"
- "a cat in a car holding a can of beer"
- "a flaming cheeseburger"
- "a photorealistic Mickey Mouse on the moon in a spacesuit"
- "a handsome man"
- "the ultimate gaming PC with 1,000 RGB lights"
- "a man holding a sign that says 'Ars Technica'"
- a complex prompt involving Christmas stockings and a cave
- "photorealistic vintage computer collector nerd in a computer lab, bright psychedelic technicolor swirls"
- "an embroidered Santa Claus"
- "A teddy bear on a skateboard"
- "a beautiful queen of the universe"
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce
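For readability, here is a small standard-library snippet that decodes the magnet URI above into its infohash, display name, and tracker URLs; the string is just the cleaned link from the line above, and the snippet is purely illustrative.

```python
# Decode the mixtral-8x7b-32kseqlen magnet URI into its parts.
from urllib.parse import urlparse, parse_qs

magnet = (
    "magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7"
    "&dn=mixtral-8x7b-32kseqlen"
    "&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce"
    "&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce"
)

params = parse_qs(urlparse(magnet).query)      # parse_qs also percent-decodes values
print("infohash:", params["xt"][0].split(":")[-1])
print("name:    ", params["dn"][0])
for tracker in params["tr"]:
    print("tracker: ", tracker)                # e.g. udp://opentracker.i2p.rocks:6969/announce
```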
Instruction-tuning is a widely adopted method of finetuning that enables large language models (LLMs) to generate output that more closely resembles human responses to natural language queries, in many cases leading to human-level performance on diverse testbeds. However, it remains unclear whether instruction-tuning truly makes LLMs more similar to how humans process language. We investigate the effect of instruction-tuning on LLM-human similarity in two ways: (1) brain alignment, the similarity of LLM internal representations to neural activity in the human language system, and (2) behavioral alignment, the similarity of LLM and human behavior on a reading task. We assess 25 vanilla and instruction-tuned LLMs across three datasets involving humans reading naturalistic stories and sentences. We discover that instruction-tuning generally enhances brain alignment by an average of 6%, but does not have a similar effect on behavioral alignment. To identify the factors underlying LLM-brain alignment, we compute correlations between the brain alignment of LLMs and various model properties, such as model size, various problem-solving abilities, and performance on tasks requiring world knowledge spanning various domains. Notably, we find a strong positive correlation between brain alignment and model size (r = 0.95), as well as performance on tasks requiring world knowledge (r = 0.81). Our results demonstrate that instruction-tuning LLMs improves both world knowledge representations and brain alignment, suggesting that mechanisms that encode world knowledge in LLMs also improve representational alignment to the human brain.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2312.00575 [cs.CL] (arXiv:2312.00575v1 for this version)
[2312.00575] Instruction-tuning Aligns LLMs to the Human Brain
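The abstract does not spell out how brain alignment is computed, so the sketch below uses a generic protocol (cross-validated ridge regression from LLM representations to neural responses, scored by Pearson r), followed by the correlation-with-model-properties step the abstract does describe. All data, model sizes, and scores in the snippet are random or placeholder values, not the paper's.

```python
# Generic brain-alignment sketch: predict neural responses from LLM
# representations with cross-validated ridge regression, score by Pearson r,
# then correlate alignment with a model property across models.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
llm_reprs = rng.standard_normal((200, 768))   # 200 stimuli x hidden dim (placeholder)
brain_resp = rng.standard_normal((200, 50))   # 200 stimuli x 50 voxels (placeholder)


def brain_alignment(X, Y, n_splits=5):
    """Mean Pearson r between held-out predicted and true neural responses."""
    scores = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        pred = Ridge(alpha=1.0).fit(X[train], Y[train]).predict(X[test])
        for v in range(Y.shape[1]):
            scores.append(np.corrcoef(pred[:, v], Y[test][:, v])[0, 1])
    return float(np.mean(scores))


print("alignment score:", brain_alignment(llm_reprs, brain_resp))

# Correlating alignment with a model property (e.g., parameter count) across
# models, as in the reported r = 0.95 with model size (numbers are placeholders):
alignment_scores = np.array([0.12, 0.15, 0.19, 0.22, 0.25])
model_sizes = np.log10([1e9, 3e9, 7e9, 13e9, 30e9])
print("r =", np.corrcoef(alignment_scores, model_sizes)[0, 1])
```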
Model | Average | ARC (25-shot) | HellaSwag (10-shot) | MMLU (5-shot) | TruthfulQA MC (0-shot) | Winogrande (5-shot) | GSM8K (5-shot) |
---|---|---|---|---|---|---|---|
mistralai/Mistral-7B-v0.1 | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
Intel/neural-chat-7b-v3-2 | 68.29 | 67.49 | 83.92 | 63.55 | 59.68 | 79.95 | 55.12 |
perlthoughts/Chupacabra-7B-v2 | 63.54 | 66.47 | 85.17 | 64.49 | 57.6 | 79.16 | 28.35 |
fblgit/una-cybertron-7b-v1-fp16 | 69.49 | 68.43 | 85.85 | 63.34 | 63.28 | 80.90 | 55.12 |
fblgit/una-cybertron-7b-v2-bf16 | 69.67 | 68.26 | 85.?4 | 63.23 | 64.63 | 81.37 | 55.04 |
Model | Average | ARC (25-shot) | HellaSwag (10-shot) | MMLU (5-shot) | TruthfulQA MC (0-shot) | Winogrande (5-shot) | GSM8K (5-shot) |
---|---|---|---|---|---|---|---|
fblgit/una-cybertron-7b-v1-fp16 | 69.49 | 68.43 | 85.85 | 63.34 | 63.28 | 80.90 | 55.12 |
fblgit/una-cybertron-7b-v2-bf16 | 69.67 | 68.26 | 85.?4 | 63.23 | 64.63 | 81.37 | 55.04 |
fblgit/una-xaberius-34b-v1beta | 74.21 | 70.39 | 86.72 | 79.13 | 61.55 | 80.26 | 67.24 |
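The Average column appears to be the unweighted mean of the six benchmark scores; a quick check on rows with complete numbers (the garbled HellaSwag entry is skipped) confirms this.

```python
# Recompute the "Average" column as the unweighted mean of the six scores.
rows = {
    "mistralai/Mistral-7B-v0.1":       [59.98, 83.31, 64.16, 42.15, 78.37, 37.83],
    "fblgit/una-cybertron-7b-v1-fp16": [68.43, 85.85, 63.34, 63.28, 80.90, 55.12],
    "fblgit/una-xaberius-34b-v1beta":  [70.39, 86.72, 79.13, 61.55, 80.26, 67.24],
}
for name, scores in rows.items():
    print(f"{name}: {sum(scores) / len(scores):.2f}")
# -> 60.97, 69.49, ~74.21, matching the Average column up to rounding in the last digit.
```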