bnew

Veteran
Joined
Nov 1, 2015
Messages
58,113
Reputation
8,602
Daps
161,775








Research

Towards a Real-Time Decoding of Images from Brain Activity

October 18, 2023

3 minute read

At every moment of every day, our brains meticulously sculpt a wealth of sensory signals into meaningful representations of the world around us. Yet how this continuous process actually works remains poorly understood.

Today, Meta is announcing an important milestone in the pursuit of that fundamental question. Using magnetoencephalography (MEG), a non-invasive neuroimaging technique in which thousands of brain activity measurements are taken per second, we showcase an AI system capable of decoding the unfolding of visual representations in the brain with an unprecedented temporal resolution.

This AI system can be deployed in real time to reconstruct, from brain activity, the images perceived and processed by the brain at each instant. This opens up an important avenue to help the scientific community understand how images are represented in the brain, and then used as foundations of human intelligence. Longer term, it may also provide a stepping stone toward non-invasive brain-computer interfaces in a clinical setting that could help people who, after suffering a brain lesion, have lost their ability to speak.

Leveraging our recent architecture trained to decode speech perception from MEG signals, we develop a three-part system consisting of an image encoder, a brain encoder, and an image decoder. The image encoder builds a rich set of representations of the image independently of the brain. The brain encoder then learns to align MEG signals to these image embeddings. Finally, the image decoder generates a plausible image given these brain representations.
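One way to picture the alignment step is nearest-neighbour retrieval: once a brain encoder maps an MEG window into the same space as the image embeddings, the closest image embedding can be looked up. The sketch below is only an illustration under that assumption. The article does not state the training objective or how retrieval is done, and the names `cosine` and `retrieve` and the toy three-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(brain_embedding, image_embeddings):
    """Return the name of the image whose embedding is closest to the brain embedding.

    image_embeddings: dict mapping image name -> embedding (assumed to come
    from a pretrained image encoder; here they are made-up toy vectors).
    """
    return max(
        image_embeddings,
        key=lambda name: cosine(brain_embedding, image_embeddings[name]),
    )

# Toy usage with invented numbers, not real MEG or image features.
images = {"cat": [1.0, 0.0, 0.0], "car": [0.0, 1.0, 0.0]}
print(retrieve([0.9, 0.2, 0.1], images))
```

In the system described above, the retrieved or predicted embedding would then condition the image decoder rather than being returned directly.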


MEG recordings are continuously aligned to the deep representation of the images, which can then condition the generation of images at each instant.


We train this architecture on a public dataset of MEG recordings acquired from healthy volunteers and released by THINGS, an international consortium of academic researchers sharing experimental data based on the same image database.

We first compare the decoding performance obtained with a variety of pretrained image modules and show that the brain signals best align with modern computer vision AI systems like DINOv2, a recent self-supervised architecture able to learn rich visual representations without any human annotations. This result confirms that self-supervised learning leads AI systems to learn brain-like representations: The artificial neurons in the algorithm tend to be activated similarly to the physical neurons of the brain in response to the same image.


The images that volunteer participants see (left) and those decoded from MEG activity at each instant of time (right). Each image is presented approximately every 1.5 seconds.


This functional alignment between such AI systems and the brain can then be used to guide the generation of an image similar to what the participants see in the scanner. While our results show that images are better decoded with functional Magnetic Resonance Imaging (fMRI), our MEG decoder can be used at every instant of time and thus produces a continuous stream of images decoded from brain activity.


The images that volunteer participants see (left) and those decoded from fMRI activity (right).


While the generated images remain imperfect, the results suggest that the reconstructed image preserves a rich set of high-level features, such as object categories. However, the AI system often generates inaccurate low-level features by misplacing or mis-orienting some objects in the generated images. In particular, using the Natural Scenes Dataset, we show that images generated from MEG decoding remain less precise than the decoding obtained with fMRI, a comparably slow-paced but spatially precise neuroimaging technique.

Overall, our results show that MEG can be used to decipher, with millisecond precision, the rise of complex representations generated in the brain. More generally, this research strengthens Meta’s long-term research initiative to understand the foundations of human intelligence, identify its similarities as well as differences compared to current machine learning algorithms, and ultimately guide the development of AI systems designed to learn and reason like humans.
 

bnew



[Submitted on 17 Oct 2023]

VeRA: Vector-based Random Matrix Adaptation

Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki Markus Asano
Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which reduces the number of trainable parameters by 10x compared to LoRA, yet maintains the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks, and show its application in instruction-following with just 1.4M parameters using the Llama2 7B model.
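To make the parameter saving concrete, here is a minimal sketch of how the two methods count trainable parameters and how a VeRA-style update could be applied. It is a reading of the abstract, not the authors' code: the abstract does not give shapes, so the sketch assumes a frozen shared pair A (rank x d_in) and B (d_out x rank), plus one trainable vector of length rank and one of length d_out per layer. All function names are hypothetical.

```python
def lora_trainable_params(d_in, d_out, rank, num_layers):
    """LoRA trains its own pair A (rank x d_in) and B (d_out x rank) per adapted layer."""
    return num_layers * rank * (d_in + d_out)

def vera_trainable_params(d_out, rank, num_layers):
    """Assumed VeRA-style count: the shared pair is frozen, so each layer
    trains only a length-rank vector and a length-d_out vector."""
    return num_layers * (rank + d_out)

def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def vera_delta(x, shared_a, shared_b, d_vec, b_vec):
    """Adapter output diag(b_vec) @ B @ diag(d_vec) @ A @ x for one layer."""
    low = matvec(shared_a, x)                     # project down to rank dims
    low = [s * v for s, v in zip(d_vec, low)]     # per-layer scaling (trainable)
    up = matvec(shared_b, low)                    # project up to d_out dims
    return [s * v for s, v in zip(b_vec, up)]     # per-layer scaling (trainable)

# Illustrative sizes only; they are not taken from the paper.
print(lora_trainable_params(d_in=4096, d_out=4096, rank=16, num_layers=32))
print(vera_trainable_params(d_out=4096, rank=256, num_layers=32))
```

Because only the two small vectors differ per layer, storing many per-user or per-task adapters costs far less than storing a full low-rank pair for each one.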
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2310.11454 [cs.CL]
(or arXiv:2310.11454v1 [cs.CL] for this version)

Submission history

From: Dawid Jan Kopiczko
[v1] Tue, 17 Oct 2023 17:59:46 UTC (139 KB)





 

bnew


Self-hosting small LLMs can be significantly cheaper than running GPT-4.

Quick breakdown:

To keep it simple, let's assume you have a full context window.

For GPT-4, that's roughly $0.30/1k tokens ($0.03/1k prompt tokens for an 8192 context window plus $0.06/1k for completion tokens).

The cost of self-hosting is mainly the cost of a GPU server. To keep things simple, let's assume you're using a @LambdaAPI H100 server at $2/hr.

A while ago, I tested the performance of vLLM with Falcon-7B and was getting roughly 44.1 tokens/sec with a full context window on a 4090. An H100 would be much faster, but we'll use that number.

That's 158,760 tokens/hour, which means your cost is ($2/hour) / (158,760 tokens/hour) = ~$0.013/1k tokens.

(This is a pretty rough calculation, so let me know if I messed up anywhere)

Yes, the model is way smaller than GPT-4, but I spent absolutely no time maximizing throughput using things like continuous batching and quantizing the model. I also was using a slower GPU with a lot less VRAM.

However, this is really contingent on keeping the GPU consistently busy. But even at 10% utilization with my setup, that works out to about $0.13/1k tokens, or roughly 40% of the GPT-4 figure above.
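The arithmetic above is easy to re-run with your own numbers. This is a small sketch using only the figures quoted in the post ($2/hour, 44.1 tokens/sec, and the $0.30/1k GPT-4 estimate taken as given); the function name and the `utilization` parameter are illustrative.

```python
def cost_per_1k_tokens(dollars_per_hour, tokens_per_second, utilization=1.0):
    """Dollars per 1,000 generated tokens for a server billed by the hour.

    utilization is the fraction of each billed hour the GPU spends generating.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return dollars_per_hour / tokens_per_hour * 1000

GPT4_PER_1K = 0.30  # the post's estimate, not an official price

full = cost_per_1k_tokens(2.0, 44.1)
tenth = cost_per_1k_tokens(2.0, 44.1, utilization=0.10)

print(f"100% utilization: ${full:.4f}/1k tokens")
print(f" 10% utilization: ${tenth:.4f}/1k tokens")
print(f"share of GPT-4 estimate at 10%: {tenth / GPT4_PER_1K:.0%}")
```

Swapping in a measured throughput for your own model and GPU is the main thing that changes the answer, since cost scales inversely with tokens per second.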

If you have a narrow task that you can fine-tune a model like Mistral-7B for, you should strongly consider this route.
Mark Tenenholtz (@marktenenholtz)

The disadvantages:

- Pay-per-use can be a lot more cost-effective when scaling
- The model I benchmarked only had a 2k context window, but it's also not the most efficient one I've tested. Something like Mistral would likely beat it on cost per token
- Some of the cost savings are offset by maintenance costs
 

bnew


You don't need a huge, labeled, custom dataset to train a great model.

The Segment Anything data generation pipeline created 1.1B masks, and only a fraction were hand-labeled. It's the best-executed data pipeline I've seen in a while.

The main idea:

1. Start with a model trained on existing, public datasets

2. Use that model to help annotators label data, and use that data to retrain the model (using only this data)

3. Run inference on unlabeled data and keep the confident labels/masks. Ask annotators to fill in the rest of the missing labels, and retrain the model.

4. Repeat step 3 a few times

5. Finally, let the model generate labels automatically on a much larger dataset. Apply some filtering and deduping, and retrain the model on this dataset.
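The core of step 3 (keep confident predictions, send the rest to annotators, retrain) can be sketched as a loop. This is a toy illustration, not the Segment Anything code: a one-dimensional threshold "model" stands in for the segmentation model, the `annotate` callback stands in for human annotators, and every name here is hypothetical.

```python
def fit_threshold(labeled):
    """Toy 'training': put the decision threshold midway between the class means.

    labeled: list of (x, label) pairs with label in {0, 1}.
    """
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    """Return (label, confidence); confidence is the distance from the threshold."""
    return (1 if x >= threshold else 0), abs(x - threshold)

def pseudo_label_round(labeled, unlabeled, annotate, min_confidence):
    """One round: keep confident model labels, ask the annotator for the rest."""
    threshold = fit_threshold(labeled)
    grown = list(labeled)
    manual = 0
    for x in unlabeled:
        label, confidence = predict(threshold, x)
        if confidence < min_confidence:
            label = annotate(x)  # human fills in what the model is unsure about
            manual += 1
        grown.append((x, label))
    return grown, manual

# Toy usage: the "annotator" is an oracle that knows the true boundary is 5.
seed = [(0.0, 0), (10.0, 1)]
grown, manual = pseudo_label_round(seed, [1.0, 4.5, 5.5, 9.0], lambda x: int(x > 5), 1.0)
print(len(grown), "labeled points,", manual, "labeled by hand")
```

Calling `pseudo_label_round` again on a fresh batch, with `grown` as the new labeled set, corresponds to repeating step 3; step 5 is the same loop with the annotator removed and a filter applied instead.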

This is quite similar to a pseudo-labeling pipeline I used a couple of years ago to medal in a Kaggle competition (except more sophisticated!)

 

bnew


Language models are bad at basic math.

GPT-4 has an accuracy of right around 0% on 5-digit multiplication.

Most open models can't even add. Why is that?

There are a few reasons why numbers are hard. The main one is tokenization. When training a tokenizer from scratch, you take a large corpus of text and find the minimal byte-pair encoding for a chosen vocabulary size.

This means, however, that numbers will almost certainly not have unique token representations. "21" could be a single token, or ["2", "1"]. "143" could be ["143"] or ["14", "3"] or any other combination.

A potential fix here would be to force single-digit tokenization. The state of the art for the last few years has been to inject a space between every digit when creating the tokenizer and when running the model. This means 143 would always be tokenized as ["1", "4", "3"].
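Both points are easy to see in a few lines. The sketch below first enumerates every way a digit string could be cut into pieces (which split a real BPE tokenizer produces depends on its learned merges), then shows the space-injection preprocessing step. The function names are illustrative and no real tokenizer library is involved.

```python
import re

def segmentations(s):
    """Every way to cut s into contiguous, non-empty pieces."""
    if not s:
        return [[]]
    results = []
    for i in range(1, len(s) + 1):
        for rest in segmentations(s[i:]):
            results.append([s[:i]] + rest)
    return results

def space_digits(text):
    """Insert a space between every pair of adjacent digits."""
    return re.sub(r"(?<=\d)(?=\d)", " ", text)

print(segmentations("143"))
print(space_digits("143 + 21"))
```

The same `space_digits` step has to be applied both when the tokenizer is trained and at inference time, otherwise the model sees number formats it was never trained on.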

This helps boost performance, but wastes tokens while not fully fixing the problem.

A cool fix might be xVal! This work by The Polymathic AI Collaboration suggests a generic [NUM] token which is then scaled by the actual value of the number!

If you look at the red lines in the image above, you can get an intuition for how that might work.

It doesn't capture a huge range or high fidelity (e.g., 7.4449 vs 7.4448) but they showcase some pretty convincing results on sequence prediction problems that are primarily numeric.
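As a rough picture of the encoding idea, the sketch below replaces each number in the text with a single [NUM] token and multiplies that token's embedding by the number's value. It is a toy illustration of the scaling step only: the regex, the whitespace tokenizer, the two-dimensional embedding table, and the function names are all assumptions, and any value normalization or number-decoding head the paper uses is left out.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def xval_style_encode(text):
    """Swap every number for a [NUM] token and return (tokens, values)."""
    values = [float(m) for m in NUMBER.findall(text)]
    tokens = NUMBER.sub("[NUM]", text).split()
    return tokens, values

def embed(tokens, values, table):
    """Look up embeddings; scale the shared [NUM] vector by each number's value."""
    remaining = iter(values)
    vectors = []
    for token in tokens:
        vector = table[token]
        if token == "[NUM]":
            scale = next(remaining)
            vector = [scale * component for component in vector]
        vectors.append(vector)
    return vectors

# Toy embedding table with made-up values.
table = {"temp": [1.0, 0.0], "at": [0.0, 1.0], "[NUM]": [0.5, 0.5]}
tokens, values = xval_style_encode("temp 7.5 at 12")
print(tokens, values)
print(embed(tokens, values, table))
```

Every number costs exactly one token here, which is where the token-efficiency claim comes from, and nearby values get nearby embeddings by construction.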

For example, they train a sequence model on GPS-conditioned temperature forecasting.

They found a ~70x improvement over vanilla baselines and a 2x improvement over really strong baselines.

One cool side effect is that deep neural networks might be really good at regression problems using this encoding scheme!


https://arxiv.org/abs/2310.02989

xVal: A Continuous Number Encoding for Large Language Models

Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose xVal, a numerical encoding scheme that represents any real number using just a single token. xVal represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference approach, this strategy renders the model end-to-end continuous when considered as a map from the numbers of the input string to those of the output string. This leads to an inductive bias that is generally more suitable for applications in scientific domains. We empirically evaluate our proposal on a number of synthetic and real-world datasets. Compared with existing number encoding schemes, we find that xVal is more token-efficient and demonstrates improved generalization.
Comments: 10 pages, 7 figures. Supplementary: 5 pages, 2 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2310.02989 [stat.ML]
(or arXiv:2310.02989v1 [stat.ML] for this version)

Submission history

From: Siavash Golkar
[v1] Wed, 4 Oct 2023 17:26:16 UTC (2,637 KB)

 