bnew

Veteran
Joined
Nov 1, 2015
Messages
58,240
Reputation
8,635
Daps
161,936

Nvidia’s Jensen Huang says AI hallucinations are solvable, artificial general intelligence is 5 years away

Haje Jan Kamps @Haje / 5:13 PM EDT•March 19, 2024

Image Credits: Haje Jan Kamps

Artificial general intelligence (AGI) — often referred to as “strong AI,” “full AI,” “human-level AI” or “general intelligent action” — represents a significant future leap in the field of artificial intelligence. Unlike narrow AI, which is tailored for specific tasks, such as detecting product flaws, summarizing the news, or building you a website, AGI will be able to perform a broad spectrum of cognitive tasks at or above human levels. Addressing the press this week at Nvidia’s annual GTC developer conference, CEO Jensen Huang appeared to be getting really bored of discussing the subject — not least because he finds himself misquoted a lot, he says.

The frequency of the question makes sense: The concept raises existential questions about humanity’s role in and control of a future where machines can outthink, outlearn and outperform humans in virtually every domain. The core of this concern lies in the unpredictability of AGI’s decision-making processes and objectives, which might not align with human values or priorities (a concept explored in-depth in science fiction since at least the 1940s). There’s concern that once AGI reaches a certain level of autonomy and capability, it might become impossible to contain or control, leading to scenarios where its actions cannot be predicted or reversed.

When sensationalist press asks for a timeframe, it is often baiting AI professionals into putting a timeline on the end of humanity — or at least the current status quo. Needless to say, AI CEOs aren’t always eager to tackle the subject.

Huang, however, spent some time telling the press what he does think about the topic. Predicting when we will see a passable AGI depends on how you define AGI, Huang argues, and he draws a couple of parallels: Even with the complications of time zones, you know when New Year happens and 2025 rolls around. If you’re driving to the San Jose Convention Center (where this year’s GTC conference is being held), you generally know you’ve arrived when you can see the enormous GTC banners. The crucial point is that we can agree on how to measure that you’ve arrived where you were hoping to go, whether temporally or geospatially.

“If we specified AGI to be something very specific, a set of tests where a software program can do very well — or maybe 8% better than most people — I believe we will get there within 5 years,” Huang explains. He suggests that the tests could be a legal bar exam, logic tests, economic tests or perhaps the ability to pass a pre-med exam. Unless the questioner is able to be very specific about what AGI means in the context of the question, he’s not willing to make a prediction. Fair enough.

AI hallucination is solvable

In Tuesday’s Q&A session, Huang was asked what to do about AI hallucinations, the tendency for some AIs to make up answers that sound plausible but aren’t based in fact. He appeared visibly frustrated by the question and suggested that hallucinations are easily solvable by making sure that answers are well-researched.

“Add a rule: For every single answer, you have to look up the answer,” Huang says, referring to this practice as “retrieval-augmented generation,” describing an approach very similar to basic media literacy: Examine the source and the context. Compare the facts contained in the source to known truths, and if the answer is factually inaccurate — even partially — discard the whole source and move on to the next one. “The AI shouldn’t just answer; it should do research first to determine which of the answers are the best.”

For mission-critical answers, such as health advice or similar, Nvidia’s CEO suggests that perhaps checking multiple resources and known sources of truth is the way forward. Of course, this means that the generator that is creating an answer needs to have the option to say, “I don’t know the answer to your question,” or “I can’t get to a consensus on what the right answer to this question is,” or even something like “Hey, the Super Bowl hasn’t happened yet, so I don’t know who won.”
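A toy sketch of the "look it up first, and be allowed to say I don't know" rule described above. This is not Nvidia's or Huang's implementation: the `SOURCES` list, the keyword-overlap retrieval, and the majority-vote consensus check are all made-up stand-ins, and a real retrieval-augmented system would use a search index and a language model instead.

```python
from collections import Counter

# Hypothetical mini knowledge base: (source name, topic keywords, stated answer).
SOURCES = [
    ("almanac", {"boiling", "water", "sea", "level"}, "100 C"),
    ("textbook", {"boiling", "point", "water"}, "100 C"),
    ("forum_post", {"boiling", "water"}, "90 C"),
]

def retrieve(question, sources):
    """Return (name, answer) for sources sharing at least two keywords with the question."""
    words = set(question.lower().replace("?", "").split())
    return [(name, ans) for name, keywords, ans in sources if len(words & keywords) >= 2]

def answer(question, sources=SOURCES, min_agreeing=2):
    """Answer only when enough retrieved sources agree; otherwise abstain."""
    hits = retrieve(question, sources)
    if not hits:
        return "I don't know the answer to your question."
    votes = Counter(ans for _, ans in hits)
    best, count = votes.most_common(1)[0]
    if count < min_agreeing:
        return "I can't get to a consensus on what the right answer to this question is."
    return best

print(answer("What is the boiling point of water?"))
print(answer("Who won the Super Bowl?"))
print(answer("What is the boiling point of water?", sources=SOURCES[1:]))
```

The third call drops one of the two agreeing sources, so the remaining sources disagree and the function abstains rather than picking one.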
 

bnew



1/4
GPT4 was trained on only about 10T tokens!

30 billion quadrillion == 3e25

Note: 3e25 BFloat16 FLOPs at 40% MFU on H100s is about 7.5e10 s, i.e. 21M H100-hours. That is about 1,300 hours on 16k H100s (less than 2 months).
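The note above can be rechecked with a few lines of arithmetic. The per-GPU peak of 989e12 FLOP/s (dense BF16 on an H100) is an assumption added here, not a number from the thread; the other inputs are the thread's own.

```python
# Back-of-the-envelope check of the training-time note.
TOTAL_FLOPS = 3e25        # total training compute quoted in the thread
H100_PEAK_BF16 = 989e12   # ASSUMED dense BF16 peak per H100, in FLOP/s
MFU = 0.40                # model FLOPs utilization assumed in the thread
NUM_GPUS = 16_000         # "16k H100s"

gpu_seconds = TOTAL_FLOPS / (H100_PEAK_BF16 * MFU)
gpu_hours = gpu_seconds / 3600
wall_clock_hours = gpu_hours / NUM_GPUS
wall_clock_days = wall_clock_hours / 24

print(f"GPU-seconds:      {gpu_seconds:.2e}")
print(f"GPU-hours:        {gpu_hours:.2e}")
print(f"Wall-clock hours: {wall_clock_hours:.0f}")
print(f"Wall-clock days:  {wall_clock_days:.0f}")
```

Under that assumed peak the result is in line with the thread's roughly 21M H100-hours and under two months on 16k GPUs.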

Token math:
Previous leaks have verified that GPT4 is an 8-expert, top-k=2 MoE
Assume attention uses 1/2 the params of each expert
params per attn block (ppab): ppab + 8 x (2 x ppab) = 1.8e12 => ppab = 105.88e9
per-token params used in fwd pass: p_fwd = ppab + 2 x (2 x ppab) = 529.4e9
FLOPs per tok = 3 * 2 * p_fwd = 3.2e12
3e25 FLOPs / 3.2e12 FLOPs/tok = 9.4e12 tok (rounds to 10T tok)
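The same token math as a short script, for anyone who wants to vary the assumptions. The constant names are invented for this sketch. The `3 * 2` factor is read here as the usual rule of thumb of about 2 FLOPs per active parameter per token for the forward pass, with training costing about three times the forward pass.

```python
# Inputs taken from the thread; variable names are illustrative only.
TOTAL_PARAMS = 1.8e12        # total parameter count quoted by Huang
TOTAL_FLOPS = 3e25           # total training compute quoted by Huang
NUM_EXPERTS = 8              # MoE experts (from the leaks the thread cites)
TOP_K = 2                    # experts active per token
EXPERT_TO_ATTN_RATIO = 2     # thread's assumption: attention = 1/2 of one expert

# total = attn + NUM_EXPERTS * (ratio * attn)  ->  solve for attn
attn_params = TOTAL_PARAMS / (1 + NUM_EXPERTS * EXPERT_TO_ATTN_RATIO)

# Parameters actually touched per token: attention plus the TOP_K routed experts.
active_params = attn_params * (1 + TOP_K * EXPERT_TO_ATTN_RATIO)

# ~2 FLOPs per active parameter forward, ~3x that for forward + backward.
flops_per_token = 3 * 2 * active_params

tokens = TOTAL_FLOPS / flops_per_token

print(f"attention params:  {attn_params:.4e}")
print(f"active params/tok: {active_params:.4e}")
print(f"FLOPs per token:   {flops_per_token:.3e}")
print(f"training tokens:   {tokens:.2e}")
```

Without intermediate rounding the estimate lands at roughly 9.4e12 tokens, consistent with the thread's "about 10T" headline.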

2/4
Jensen Huang: OpenAI's latest model has 1.8 trillion parameters and required 30 billion quadrillion FLOPS to train

3/4
@allen_ai, @soldni dolma 10T when?
 

bnew





1/4

Presenting Starling-LM-7B-beta, our cutting-edge 7B language model fine-tuned with RLHF!

Also introducing Starling-RM-34B, a Yi-34B-based reward model trained on our Nectar dataset, surpassing our previous 7B RM in all benchmarks.

We've fine-tuned the latest Openchat model with the 34B reward model, achieving an MT Bench score of 8.12 while being much better at hard prompts compared to Starling-LM-7B-alpha in internal benchmarks. Testing will soon be available on @lmsysorg. Please stay tuned!

HuggingFace links:
Starling-LM-7B-beta: Nexusflow/Starling-LM-7B-beta · Hugging Face
Starling-RM-34B: Nexusflow/Starling-RM-34B · Hugging Face

Discord link: Join the Nexusflow Discord server!


Since the release of Starling-LM-7B-alpha, we've received numerous requests to make the model commercially viable. Therefore, we're licensing all models and datasets under Apache-2.0, with the condition that they are not used to compete with OpenAI. Enjoy!

2/4
Thank you! I guess larger model as RM naturally has some advantage. But you’ll see some rigorous answer very soon on twitter ;)

3/4
Yes, sorry we delayed that a bit since we are refactoring the code. But hopefully the code and paper will be out soon!

4/4
Yes, please stay tuned!
 

bnew




1/1

Exciting Update: Moirai is now open source!

Foundation models for time series forecasting are gaining momentum! If you’ve been eagerly awaiting the release of Moirai since our paper (Unified Training of Universal Time Series Forecasting Transformers) dropped, here’s some great news!

Today, we’re excited to share that the code, data, and model weights for our paper have been officially released. Let’s fuel the advancement of foundation models for time series forecasting together!

Code: GitHub - SalesforceAIResearch/uni2ts: Unified Training of Universal Time Series Forecasting Transformers
LOTSA data: Salesforce/lotsa_data · Datasets at Hugging Face (of course via @huggingface)
Paper: [2402.02592] Unified Training of Universal Time Series Forecasting Transformers
Blog post: Moirai: A Time Series Foundation Model for Universal Forecasting
Model: Moirai-R models - a Salesforce Collection



@woo_gerald @ChenghaoLiu15 @silviocinguetta @CaimingXiong @doyensahoo
 

bnew





1/4
VILA by @NVIDIAAI & @MIT


> 13B, 7B and 2.7B model checkpoints.
> Beats the current SoTA models like QwenVL.
> Interleaved Vision + Text pre-training.
> Followed by joint SFT.
> Works with AWQ for 4-bit inference.

Models on the Hugging Face Hub: https://huggingface.co/collections/...anguage-models-65d8022a3a52cd9bcd62698e

2/4
All the model checkpoints here:

3/4
The benchmarks look quite strong:

4/4
Paper for those interested:
 

bnew











1/9
mPLUG-DocOwl 1.5

Unified Structure Learning for OCR-free Document Understanding

Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for

2/9
Visual Document Understanding are equipped with text recognition ability but lack general structure understanding abilities for text-rich document images. In this work, we emphasize the importance of structure information in Visual Document Understanding and propose the

3/9
Unified Structure Learning to boost the performance of MLLMs. Our Unified Structure Learning comprises structure-aware parsing tasks and multi-grained text localization tasks across 5 domains: document, webpage, table, chart, and natural image. To better encode

4/9
structure information, we design a simple and effective vision-to-text module H-Reducer, which can not only maintain the layout information but also reduce the length of visual features by merging horizontal adjacent patches through convolution, enabling the LLM to

5/9
understand high-resolution images more efficiently. Furthermore, by constructing structure-aware text sequences and multi-grained pairs of texts and bounding boxes for publicly available text-rich images, we build a comprehensive training set DocStruct4M to support

6/9
structure learning. Finally, we construct a small but high-quality reasoning tuning dataset DocReason25K to trigger the detailed explanation ability in the document domain. Our model DocOwl 1.5 achieves state-of-the-art performance on 10 visual document

7/9
understanding benchmarks, improving the SOTA performance of MLLMs with a 7B LLM by more than 10 points in 5/10 benchmarks.

8/9
paper page:

 

bnew








1/7
Google presents Chart-based Reasoning

Transferring Capabilities from LLMs to VLMs

Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, reasoning capabilities remain limited particularly for smaller VLMs, while those of

2/7
large-language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA, our method obtains state-of-the-art performance when applied on the PaLI3-5B VLM by Chen et al. (2023), while also

3/7
enabling much better performance on PlotQA and FigureQA. We first improve the chart representation by continuing the pre-training stage using an improved version of the chart-to-table translation task by Liu et al. (2023). We then propose constructing a 20x larger dataset

4/7
than the original training set. To improve general reasoning capabilities and improve numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, our model is fine-tuned using the multitask loss introduced by Hsieh et al. (2023).

5/7
Our variant ChartPaLI-5B outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant compared to the PaLI3-5B baseline.

6/7
When rationales are further refined with a simple program-of-thought prompt (Chen et al., 2023), our model outperforms the recently introduced Gemini Ultra and GPT-4V.

7/7
Paper page: Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
 

bnew





1/4
1X's android EVE can help you with your daily tasks.

It has made great progress learning general-purpose skills so far.

1X aims to create an abundant supply of physical labor through androids that work alongside humans.

2/4

Bro, smith looks so good!

Yesss they'll be expensive, but when they're fast and efficient they'll save a lot of time (Hopefully if they do not run out of battery frequently )

4/4
Good question ... let me think
 

bnew






1/3
Just tried this new style-transfer model that turns images into classic 3D video games. Amazing
More examples and a link to try it yourself are in the thread.

2/3
Try it here: https://replicate.com/fofr/face-to-many

3/3
Yea man, definitely try it! There are more styles available there.
 

bnew








1/8
Google Introduces VLOGGER, Image to video creation model.

VLOGGER creates a life-like avatar from just a photo and controls it with your voice.

You don't need to show up for Zoom meetings now!

2/8
Read More: https://enriccorona.github.io/vlogger/

3/8
This is so helpful for Introverts

4/8


5/8
Good question .. look at this one duss

6/8
No no no ... Vlogger is a million miles away from Sora and Haiper right now.

Haiper may compete with Sora tho.

Haiper & Sora -> Create Sceneries & Videos with/without humans.

Vlogger -> Human Picture - to - Talking Human Picture

Vlogger will compete with Alibaba's 'EMO'

7/8
Yep, right now it looks fake .. but it's under development they'll be improving the output quality more for sure.

8/8
Yeah once they improve the output video quality, this would be super useful for meetings and presentations.
 

bnew




1/4
Sam Altman says state actors are trying to hack and infiltrate OpenAI and he expects this to get worse

2/4
Source:

4/4
state actor detected
 

bnew



Open-source AI models released by Tokyo lab Sakana founded by former Google researchers

By Anna Tong

March 21, 2024 2:41 AM EDT

Words reading "Artificial intelligence AI", miniature of robot and toy hand are pictured in this illustration taken December 14, 2023. REUTERS/Dado Ruvic/Illustration/file photo

SAN JOSE, March 20 (Reuters) - Sakana AI, a Tokyo-based artificial intelligence startup founded by two prominent former Google (GOOGL.O) researchers, released AI models on Wednesday it said were built using a novel method inspired by evolution, akin to breeding and natural selection.

Sakana AI employed a technique called "model merging" which combines existing AI models to yield a new model, combining it with an approach inspired by evolution, leading to the creation of hundreds of model generations.

The most successful models from each generation were then identified, becoming the "parents" of the next generation.

The company is releasing three Japanese-language models, two of which are being open-sourced, Sakana AI founder David Ha told Reuters in online remarks from Tokyo.

The company's founders are former Google researchers Ha and Llion Jones.

Jones is an author on Google's 2017 research paper "Attention Is All You Need", which introduced the "transformer" deep learning architecture that formed the basis for viral chatbot ChatGPT, leading to the race to develop products powered by generative AI.

Ha was previously the head of research at Stability AI and a Google Brain researcher.

All the authors of the ground-breaking Google paper have since left the organisation.

Venture investors have poured millions of dollars in funding into their new ventures, such as AI chatbot startup Character.AI run by Noam Shazeer, and the large language model startup Cohere founded by Aidan Gomez.

Sakana AI seeks to put the Japanese capital on the map as an AI hub, just as OpenAI did for San Francisco and the company DeepMind did for London earlier. In January Sakana AI said it had raised $30 million in seed financing led by Lux Capital.
 

bnew




1/4
Introducing Evolutionary Model Merge: A new approach bringing us closer to automating foundation model development. We use evolution to find great ways of combining open-source models, building new powerful foundation models with user-specified abilities!



2/4
Training foundation models requires enormous resources. We can overcome this by working with the vast collective intelligence of existing models.
@HuggingFace has over 500k models in dozens of modalities that, in principle, can be combined to form new models with new capabilities!

3/4
As a 🇯🇵 AI lab, we wanted to apply our method to produce foundation models for Japan. We were able to quickly evolve 3 best-in-class models with language, vision and image generation capabilities, tailored for Japan and its culture.

Read more in our paper [2403.13187] Evolutionary Optimization of Model Merging Recipes





1/3
The Evolution of LLMs 🌱 Model merging is a recent development in the open LLM community to merge multiple LLMs into a single new LLM (bigger or same size). 🧬 Merging doesn’t require additional training, but it is not fully clear why it works. A new paper from Sakana AI, “Evolutionary Optimization of Model Merging Recipes” applies evolutionary algorithms to automate model merging.

Implementation:

1️⃣ Select a diverse set of open LLMs with distinct capabilities relevant to the desired combined functionality (e.g., language understanding and math reasoning).

2️⃣ Define Configuration Spaces - parameter space (for weight mixing) and data flow space (layer stacking & layout).

3️⃣ Apply Evolutionary Algorithms (CMA-ES) to explore both configuration spaces individually, e.g. merge weights from different models with TIES-Merging or DARE and arrange layers with NSGA-II

4️⃣ After optimizing in both spaces separately, merge models using the best strategies and evaluate them on relevant benchmarks

🔁 Repeat until you find the best combination
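To make the loop above concrete, here is a deliberately tiny sketch of evolving merge weights in parameter space. It is not Sakana AI's code and it swaps in simpler pieces throughout: "models" are short lists of floats, the merge is a plain weighted average rather than TIES-Merging or DARE, the search is a basic elitist evolution strategy rather than CMA-ES, and `fitness` is a made-up stand-in for a benchmark score.

```python
import random

def merge(models, weights):
    """Parameter-space merge: weighted average of the models' parameter vectors."""
    total = sum(weights)
    return [sum(w * m[i] for w, m in zip(weights, models)) / total
            for i in range(len(models[0]))]

def fitness(params, target):
    """Hypothetical benchmark score (higher is better): negative squared error."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(models, target, generations=50, population=20, parents=5, sigma=0.1, seed=0):
    """Keep the best merge recipes each generation and mutate them to make children."""
    rng = random.Random(seed)
    pop = [[rng.random() + 1e-6 for _ in models] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(merge(models, w), target), reverse=True)
        elite = pop[:parents]                      # the "parents" of the next generation
        children = []
        while len(elite) + len(children) < population:
            parent = rng.choice(elite)
            # Gaussian mutation, clamped so weights stay positive.
            children.append([max(1e-6, w + rng.gauss(0, sigma)) for w in parent])
        pop = elite + children
    return max(pop, key=lambda w: fitness(merge(models, w), target))

# Two toy "source models" and a target behaviour that neither matches alone.
models = [[1.0, 0.0], [0.0, 1.0]]
target = [0.25, 0.75]
best = evolve(models, target)
print("best weights:", best)
print("merged model:", merge(models, best))
print("fitness:", fitness(merge(models, best), target))
```

The data-flow-space search (layer stacking and layout) mentioned in step 2 is left out; this only illustrates the select, mutate, and re-evaluate cycle.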

2/3
Results of Evolutionary Optimization of Model Merging Recipes

🚀 Evolved LLM (7B) achieved 52.0%, outperforming individual models (9.6%-30.0%).
🌐 Possible to cross-domain merge (e.g., language and math, language and vision)
🆙 The evolved VLM outperforms source VLM by ~5%
📄 Only the evaluation code released, not how CMA-ES was used

GitHub: SakanaAI/evolutionary-model-merge (official repository of Evolutionary Optimization of Model Merging Recipes)
Paper: Evolutionary Optimization of Model Merging Recipes


3/3
If you are interested in model merging and evolution, check out @maximelabonne's blog or @arcee_ai's mergekit directly

Blog: Merge Large Language Models with mergekit

Mergekit: Merge Large Language Models with mergekit



1/1
First paper from Sakana AI I think?

I have been following model merging and it's one of the most fascinating directions. And Sakana introduces an evolutionary twist to it.
 

bnew



1/2
Mistral just announced at @SHACK15sf that they will release a new model today:

Mistral 7B v0.2 Base Model

- 32k instead of 8k context window
- RoPE theta = 1e6
- No sliding window
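For context on the RoPE theta setting, a small sketch of how the base changes the rotary frequencies, using the standard RoPE formula inv_freq[i] = theta^(-2i/d). The head dimension of 128 and the comparison base of 1e4 are assumptions made for illustration, not values from the announcement.

```python
import math

def rope_inv_freq(theta, head_dim):
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(theta, head_dim):
    """Positions needed for the slowest-rotating pair to complete one full turn."""
    return 2 * math.pi / rope_inv_freq(theta, head_dim)[-1]

HEAD_DIM = 128  # ASSUMED per-head dimension, for illustration only

for theta in (1e4, 1e6):  # 1e4 is an ASSUMED baseline for comparison
    print(f"theta={theta:.0e}: longest wavelength ~ "
          f"{longest_wavelength(theta, HEAD_DIM):,.0f} positions")
```

A larger base stretches the slowest rotations over many more positions, which is generally the motivation for raising it alongside a longer context window.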

2/2
until now they only released instruct-v0.2, not the base model