bnew

Veteran
Joined
Nov 1, 2015
Messages
55,685
Reputation
8,224
Daps
157,186










1/9
mPLUG-DocOwl 1.5

Unified Structure Learning for OCR-free Document Understanding

Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for

2/9
Visual Document Understanding are equipped with text recognition ability but lack general structure understanding abilities for text-rich document images. In this work, we emphasize the importance of structure information in Visual Document Understanding and propose the

3/9
Unified Structure Learning to boost the performance of MLLMs. Our Unified Structure Learning comprises structure-aware parsing tasks and multi-grained text localization tasks across 5 domains: document, webpage, table, chart, and natural image. To better encode

4/9
structure information, we design a simple and effective vision-to-text module H-Reducer, which can not only maintain the layout information but also reduce the length of visual features by merging horizontal adjacent patches through convolution, enabling the LLM to

5/9
understand high-resolution images more efficiently. Furthermore, by constructing structure-aware text sequences and multi-grained pairs of texts and bounding boxes for publicly available text-rich images, we build a comprehensive training set DocStruct4M to support

6/9
structure learning. Finally, we construct a small but high-quality reasoning tuning dataset DocReason25K to trigger the detailed explanation ability in the document domain. Our model DocOwl 1.5 achieves state-of-the-art performance on 10 visual document

7/9
understanding benchmarks, improving the SOTA performance of MLLMs with a 7B LLM by more than 10 points in 5/10 benchmarks.
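A rough sketch of the H-Reducer idea described above: merge horizontally adjacent patches with a convolution so the visual sequence gets shorter while each row of the layout stays intact. Everything here is an assumption for illustration: the function name `h_reduce`, the kernel width of 4, and the use of one scalar weight per position instead of a learned weight matrix. It is not the DocOwl 1.5 implementation.

```python
def h_reduce(grid, kernel):
    """Merge each run of len(kernel) horizontally adjacent patches into one.

    grid:   list of rows; each row is a list of patch feature vectors (lists of floats).
    kernel: one scalar weight per horizontal position. This stands in for a
            1 x k convolution with stride k (a real layer would learn a full
            weight matrix; the scalar kernel is a simplification).
    """
    k = len(kernel)
    reduced = []
    for row in grid:
        if len(row) % k != 0:
            raise ValueError("row width must be a multiple of the kernel width")
        new_row = []
        for start in range(0, len(row), k):
            window = row[start:start + k]
            dim = len(window[0])
            # Weighted sum over the k neighbouring patches, per feature dimension.
            merged = [sum(kernel[j] * window[j][d] for j in range(k)) for d in range(dim)]
            new_row.append(merged)
        reduced.append(new_row)
    return reduced


# Hypothetical input: 2 rows x 8 columns of 3-dimensional patch features.
grid = [[[float(r * 8 + c)] * 3 for c in range(8)] for r in range(2)]
out = h_reduce(grid, kernel=[0.25, 0.25, 0.25, 0.25])

print(len(grid) * len(grid[0]), "patches ->", len(out) * len(out[0]), "merged features")
```

The number of rows is unchanged and only the width shrinks, which is the sense in which vertical layout is preserved while the sequence fed to the LLM gets shorter.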

8/9
paper page:

 








1/7
Google presents Chart-based Reasoning

Transferring Capabilities from LLMs to VLMs

Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, reasoning capabilities remain limited particularly for smaller VLMs, while those of

2/7
large-language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA, our method obtains state-of-the-art performance when applied on the PaLI3-5B VLM by chen2023pali3, while also

3/7
enabling much better performance on PlotQA and FigureQA. We first improve the chart representation by continuing the pre-training stage using an improved version of the chart-to-table translation task by liu2023deplot. We then propose constructing a 20x larger dataset

4/7
than the original training set. To improve general reasoning capabilities and improve numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, our model is fine-tuned using the multitask loss introduced by hsieh2023distilling.

5/7
Our variant ChartPaLI-5B outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant compared to the PaLI3-5B baseline.

6/7
When rationales are further refined with a simple program-of-thought prompt chen2023program, our model outperforms the recently introduced Gemini Ultra and GPT-4V.
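To make the idea of reasoning over a chart's table representation more concrete, here is a minimal sketch of a program-of-thought style answer: instead of doing arithmetic in free text, the model emits a small program that runs over the extracted table. The table values, the question, and the function name are all hypothetical; this is not the paper's prompt or pipeline.

```python
# Hypothetical table extracted from a bar chart (values made up for illustration).
table = [
    {"year": 2019, "sales": 120.0},
    {"year": 2020, "sales": 95.0},
    {"year": 2021, "sales": 150.0},
]


def difference_between_max_and_min(rows, column):
    """Answer 'what is the gap between the highest and lowest bar?' by computation."""
    values = [row[column] for row in rows]
    return max(values) - min(values)


result = difference_between_max_and_min(table, "sales")
print(result)
```

The point of the design is that the numerical step is executed rather than predicted token by token, which is where smaller models tend to slip.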

7/7
Paper page: Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
 



1/2
At the Norm Ai Agents & Law Summit at the @NYSE, we were lucky to hear from Megan Ma (Stanford Center for Legal Informatics & MIT Computational Law Report)

She explores how we are moving from AI-driven companies to an AI-native world & whether we need Chief AI Officers
 





1/4
1X's android EVE can help you with your daily tasks.

It has made great progress learning general-purpose skills so far.

1X aims to create an abundant supply of physical labor through androids that work alongside humans.

2/4

Bro, smith looks so good!

Yesss they'll be expensive, but when they're fast and efficient they'll save a lot of time (hopefully, if they do not run out of battery frequently)

4/4
Good question ... let me think
 






1/3
Just tried this new style-transfer model that turns images into classic 3D video games. Amazing
More examples and a link to try it yourself are in the thread.

2/3
Try it here: https://replicate.com/fofr/face-to-many

3/3
Yea man, definitely try it! There are more styles available there.
 








1/8
Google introduces VLOGGER, an image-to-video creation model.

VLOGGER creates a life-like avatar from just a photo and controls it with your voice.

You don't need to show up for Zoom meetings now!

2/8
Read more: https://enriccorona.github.io/vlogger/

3/8
This is so helpful for Introverts

4/8


5/8
Good question .. look at this one duss

6/8
No no no ... Vlogger is a million miles away from Sora and Haiper right now.

Haiper may compete with Sora tho.

Haiper & Sora -> Create Sceneries & Videos with/without humans.

Vlogger -> Human Picture - to - Talking Human Picture

Vlogger will compete with Alibaba's 'EMO'

7/8
Yep, right now it looks fake .. but it's under development they'll be improving the output quality more for sure.

8/8
Yeah once they improve the output video quality, this would be super useful for meetings and presentations.
 







1/6
I asked the Devin AI agent to go on reddit and start a thread where it will take website building requests

It did that, solving numerous problems along the way. It apparently decided to charge for its work. Going to take it down before it fools anyone...

2/6
Agents are going to open a whole bunch of cans of worms.

3/6
It was actively monitoring the thread to take offers.

4/6
It wants API access, please.

5/6
Always weird to see people only read the first tweet in the thread & assume I am pushing a make-money-fast scheme, as opposed to trying to show what is coming very soon. Devin is imperfect, but the beginning. (As always, I never take any money from any of the AI labs or products)

6/6
The brains here are GPT-4, I believe, and Devin has GPT-4 style limitations on what it can accomplish. I assume that the brains will be upgraded when GPT-5 class models come out, and there will be many other agents on the market soon. A thing to watch for.



 




1/4
Sam Altman says state actors are trying to hack and infiltrate OpenAI and he expects this to get worse

2/4
Source:

4/4
state actor detected
 


Nvidia has virtually recreated the entire planet — and now it wants to use its digital twin to crack weather forecasting for good​


By Mike Moore

published 2 days ago

New Nvidia Earth-2 APIs should lead to better forecasts


Faster and more accurate weather forecasts are about to become a real possibility across the globe thanks to a new release from Nvidia.

The computing giant has announced a new digital twin cloud platform that it says will help meteorologists and weather experts create richer and more detailed simulations and more.

The new Earth-2 APIs, unveiled at Nvidia GTC 2024, can be utilized to help address the $140 billion cost in losses around the world due to extreme weather brought on by climate change, with Nvidia saying work can now begin on an "unprecedented scale".

Whatever the weather

“Climate disasters are now normal — historic droughts, catastrophic hurricanes and generational floods appear in the news with alarming frequency,” said Jensen Huang, founder and CEO of NVIDIA. “Earth-2 cloud APIs strive to help us better prepare for — and inspire us to act to moderate — extreme weather.”

Nvidia says the new models are set to be used by governments and organizations across the world, including the Taiwan Central Weather Administration, which aims to utilize better detection of typhoon landfall, with earlier predictions of such incidents meaning citizens can be evacuated quicker.

The Earth-2 cloud APIs will run on Nvidia DGX Cloud, opening them up to all kinds of users to create high-resolution simulations.

They use a new Nvidia generative AI model called CorrDiff, which can generate images at 12.5 times the resolution of current numerical models, 1,000x faster and 3,000x more energy-efficiently. It also corrects inaccuracies from previous models and brings together multiple sources of information to create more accurate and focused forecasts.
 



Open-source AI models released by Tokyo lab Sakana founded by former Google researchers

By Anna Tong

March 21, 2024 2:41 AM EDT
Updated 8 hours ago

Words reading "Artificial intelligence AI", a miniature robot and a toy hand are pictured in this illustration taken December 14, 2023. REUTERS/Dado Ruvic/Illustration/file photo

SAN JOSE, March 20 (Reuters) - Sakana AI, a Tokyo-based artificial intelligence startup founded by two prominent former Google (GOOGL.O) researchers, released AI models on Wednesday it said were built using a novel method inspired by evolution, akin to breeding and natural selection.

Sakana AI employed a technique called "model merging" which combines existing AI models to yield a new model, combining it with an approach inspired by evolution, leading to the creation of hundreds of model generations.

The most successful models from each generation were then identified, becoming the "parents" of the next generation.

The company is releasing three Japanese-language models, two of which are being open-sourced, Sakana AI founder David Ha told Reuters in online remarks from Tokyo.

The company's founders are former Google researchers Ha and Llion Jones.

Jones is an author on Google's 2017 research paper "Attention Is All You Need", which introduced the "transformer" deep learning architecture that formed the basis for viral chatbot ChatGPT, leading to the race to develop products powered by generative AI.

Ha was previously the head of research at Stability AI and a Google Brain researcher.

All the authors of the ground-breaking Google paper have since left the organisation.

Venture investors have poured millions of dollars in funding into their new ventures, such as AI chatbot startup Character.AI run by Noam Shazeer, and the large language model startup Cohere founded by Aidan Gomez.

Sakana AI seeks to put the Japanese capital on the map as an AI hub, just as OpenAI did for San Francisco and the company DeepMind did for London earlier. In January Sakana AI said it had raised $30 million in seed financing led by Lux Capital.
 




1/4
Introducing Evolutionary Model Merge: A new approach bringing us closer to automating foundation model development. We use evolution to find great ways of combining open-source models, building new powerful foundation models with user-specified abilities!



2/4
Training foundation models requires enormous resources. We can overcome this by working with the vast collective intelligence of existing models.
@HuggingFace has over 500k models in dozens of modalities that, in principle, can be combined to form new models with new capabilities!

3/4
As a 🇯🇵 AI lab, we wanted to apply our method to produce foundation models for Japan. We were able to quickly evolve 3 best-in-class models with language, vision and image generation capabilities, tailored for Japan and its culture.

Read more in our paper [2403.13187] Evolutionary Optimization of Model Merging Recipes





1/3
The Evolution of LLMs 🌱 Model merging is a recent development in the open LLM community to merge multiple LLMs into a single new LLM (bigger or same size). 🧬 Merging doesn’t require additional training, but it is not fully clear why it works. A new paper from Sakana AI, “Evolutionary Optimization of Model Merging Recipes” applies evolutionary algorithms to automate model merging.

Implementation:

1️⃣ Select a diverse set of open LLMs with distinct capabilities relevant to the desired combined functionality (e.g., language understanding and math reasoning).

2️⃣ Define Configuration Spaces - parameter space (for weight mixing) and data flow space (layer stacking & layout).

3️⃣ Apply Evolutionary Algorithms (CMA-ES) to explore both configuration spaces individually, e.g. merge weights from different models with TIES-Merging or DARE and arrange layers with NSGA-II

4️⃣ After optimizing in both spaces separately, merge models using the best strategies and evaluate them on relevant benchmarks

🔁 Repeat until you find the best combination
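The loop above (merge, evaluate, keep the best, repeat) can be sketched in a few lines. This is a toy under loud assumptions: each "model" is just a short list of floats, the "benchmark" is closeness to a made-up target, and the search is a plain elitist evolution strategy rather than CMA-ES, TIES-Merging, DARE, or NSGA-II. All names are hypothetical, and it is not Sakana AI's code.

```python
import random


def merge(model_a, model_b, alphas):
    """Interpolate two models position by position: alpha * a + (1 - alpha) * b."""
    return [a * t + b * (1.0 - t) for a, b, t in zip(model_a, model_b, alphas)]


def fitness(model, target):
    """Toy benchmark score, higher is better (negative squared error to a target)."""
    return -sum((m - t) ** 2 for m, t in zip(model, target))


def evolve(model_a, model_b, target, generations=50, population=20,
           parents=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    n = len(model_a)
    # Start from random mixing coefficients in [0, 1], one per position.
    pool = [[rng.random() for _ in range(n)] for _ in range(population)]

    def score(alphas):
        return fitness(merge(model_a, model_b, alphas), target)

    for _ in range(generations):
        # Keep the best recipes; they become the parents of the next generation.
        pool.sort(key=score, reverse=True)
        elite = pool[:parents]
        children = []
        while len(elite) + len(children) < population:
            parent = rng.choice(elite)
            # Mutate with Gaussian noise and clip back into [0, 1].
            child = [min(1.0, max(0.0, a + rng.gauss(0.0, sigma))) for a in parent]
            children.append(child)
        pool = elite + children
    return max(pool, key=score)


# Hypothetical source models and target behaviour.
model_a = [1.0, 0.0, 2.0]
model_b = [0.0, 1.0, 0.0]
target = [0.5, 0.5, 1.0]

best = evolve(model_a, model_b, target)
merged = merge(model_a, model_b, best)
print("mixing coefficients:", best)
print("merged score:", fitness(merged, target))
```

No gradient step or training data is involved; the only signal is the evaluation score, which matches the claim that merging needs no additional training.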

2/3
Results of Evolutionary Optimization of Model Merging Recipes

🚀 Evolved LLM (7B) achieved 52.0%, outperforming individual models (9.6%-30.0%).
🌐 Possible to cross-domain merge (e.g., language and math, language and vision)
🆙 The evolved VLM outperforms source VLM by ~5%
📄 Only the evaluation code released, not how CMA-ES was used

GitHub: SakanaAI/evolutionary-model-merge (official repository of Evolutionary Optimization of Model Merging Recipes)
Paper: Evolutionary Optimization of Model Merging Recipes


3/3
If you are interested in model merging and evolution, check out @maximelabonne's blog or @arcee_ai's Mergekit directly

Blog: Merge Large Language Models with mergekit

Mergekit: Merge Large Language Models with mergekit



1/1
First paper from Sakana AI I think?

I have been following model merging and it's one of the most fascinating directions. And Sakana introduces an evolutionary twist to it.
 

Any other sites like suno @bnew

I don't know of any other sites that generate music with vocals..









1/8
Google has launched an AI to generate music.

This musical AI is freely available.

Here we show you how to access and use it:

3/8
1. Join Google Labs

→ Click the link below
→ Click the "TRY IT NOW" button in the MUSICFX section

(use a VPN if it is not available in your country)

4/8
2. Generate your first song

→ In the text field, describe what you want to generate
→ The AI takes care of the rest and creates 2 different tracks
→ Download your music by clicking on the 3 dots

Info:

I tried prompts in Spanish. The result is very good but perhaps not as good as in English, so I recommend translating your prompts into English with DeepL.

5/8
Example 1:

Prompt in English: Rock melodic pop that is chill, slow tempo with a build at the end

Prompt in Spanish: Melodic rock pop that is calm, slow tempo with a crescendo at the end.

6/8
Example 2:

Prompt in English: London drill beat, strong kick


7/8
DJ MODE:

- Possibility of adding and removing shades

It's a mixing desk for your AI-generated tracks.












1/9
Hydra II: The Future of AI Music

• 50+ languages
• 800+ instruments
• Advanced editing tools
• Copyright-cleared AI music
• Trained on 1M+ Rightsify-owned songs

Generate customizable music in seconds with a simple text prompt:


@Rightsify

3/9
What sets Hydra II apart is the suite of editing tools.

You can remix the AI track by adjusting speed, adding reverb, changing keys and more.

Loop it, create intros or fadeouts, separate stems and even master the track for a pro studio sound:

4/9
Hydra II is easy enough for anyone to use:

• Select 'Music Generation' on homepage
• Choose a prompt or create your own
• Adjust track length up to 30 sec
• Click 'Generate Music'

You'll get results in about 10 seconds:

5/9
Rightsify owns all the music data, so you get worldwide, perpetual royalty-free licenses for Hydra II tracks.

Use the music commercially for videos, podcasts, and business.

Pricing is competitive, from free to $99/month.

Try Hydra II: Hydra - AI Music Generator from Rightsify

6/9
I guess it is interesting - you're not the first one who said that

It's an interesting tool, though. I love testing all of these tools and finding cool aspects of each one.

7/9
Definitely check it out because the quality is really good.

I generated the track on the second post above, and I thought it sounded great :smile:

8/9
You can generate some Marvel-style tracks

9/9
Yes, the generated tracks sound good. I generated the one in the second post. I could play it on a loop for a bit








1/6
Google announces MusicLM: a model to generate music from text. Here are some crazy things it can do:

1. Given audio of a melody, it can generate new music inspired by that melody customized by prompts! Here's someone humming bella ciao turned into a cappella chorus, EDM, etc.

3/6
2. Generate audio with stories and progression:
electronic song played in a videogame (0:00-0:15)
meditation song played next to a river (0:15-0:30)
fire (0:30-0:45)
fireworks (0:45-0:60)

4/6
3. Generate music from paintings:
The Persistence of Memory- Salvador Dalí
"His melting-clock imagery mocks the rigidity of chronometric time. The watches themselves look like soft cheese—indeed, by Dalí's own account they were inspired by hallucinations after eating..."

5/6
4. Generate music from any text description
Prompt: "The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff..."

6/6
Paper Website: MusicLM
 