bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,743

1/2
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Presents a comprehensive, rigorously curated benchmark of Olympic-level challenges

proj: OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
abs: [2406.12753] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

2/2
Very interesting work; we need more of these to actually measure the capabilities of existing models. That said, there is this weird phenomenon with LLMs where they can correctly solve a very complex math problem and fail at a trivial one; this doesn't happen to human experts


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

1/4
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

GLM-4:
- closely rivals GPT-4 on MMLU, MATH, GPQA, etc.
- gets close to GPT-4 in instruction following and long-context tasks

hf: THUDM (Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University)
repo: THUDM
abs: [2406.12793] ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

2/4
At last they release the paper

3/4
Were their newer models trained on their original infilling objective?

4/4
So are all future LLMs going to know less and less about the most popular fields of knowledge while their MMLU scores go through the roof?



1/2
VideoLLM-online: Online Video Large Language Model for Streaming Video

The first streaming video LLM, running at high speed (5~10 FPS on an RTX 3090 GPU, 10~15 FPS on an A100 GPU) on long-form videos (10 minutes), with SOTA performance in both online and offline settings

proj: VideoLLM-online: Online Video Large Language Model for Streaming Video
abs: [2406.11816] VideoLLM-online: Online Video Large Language Model for Streaming Video

2/2
Impressive!



1/8
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

Shows superior performance across a variety of tasks, including reconstruction, classification, and generation

repo: GitHub - zh460045050/VQGAN-LC
abs: [2406.11837] Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

2/8
For VideoPoet we used the MAGVIT v2 tokenizer which had an effective codebook size of 262k and 100% codebook utilization. Need to read the details but at first glance LFQ/FSQ could already offer all of these advantages?

3/8
The main objective is to develop a novel image quantization model, VQGAN-LC, that can effectively leverage an extremely large codebook (up to 100,000 entries) while maintaining a high utilization rate (over 99%). The hypothesis is that a larger codebook with improved utilization will lead to better performance across various tasks compared to existing VQGAN variants.

Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

4/8
Got nerd sniped by this a bit. It's a good paper, and the code is clean and easy to understand. They learn an nn.Embedding (dim=768) plus an nn.Linear layer (which they call a projector) to remap the codes in a learned way down to dim=8 per codebook entry. After training, I think they discard the nn.Embedding and nn.Linear parameters and just keep the dim=8 codebook derived from them.
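A minimal sketch of that projector idea as described in the post (illustrative assumption, not the authors' actual code): a large dim=768 embedding table is mapped down to a compact dim=8 codebook through a learned linear projector; at inference only the projected codebook is needed.

```python
# Sketch of the VQGAN-LC "projector" idea (assumed from the description above,
# not the paper's exact code): learn nn.Embedding(dim=768) + nn.Linear -> dim=8,
# then keep only the projected dim=8 codebook after training.
import torch
import torch.nn as nn

num_codes, base_dim, code_dim = 100_000, 768, 8

embedding = nn.Embedding(num_codes, base_dim)  # large learned codebook
projector = nn.Linear(base_dim, code_dim)      # learned remap down to dim=8

with torch.no_grad():
    # Effective codebook used at inference: project every entry once,
    # then embedding + projector can be discarded.
    codebook = projector(embedding.weight)     # shape (100_000, 8)

# Quantization is then nearest-neighbour lookup in the dim=8 space.
z = torch.randn(4, code_dim)                   # toy encoder outputs
indices = torch.cdist(z, codebook).argmin(dim=1)
quantized = codebook[indices]                  # shape (4, 8)
```

The appeal of this setup is that nearest-neighbour search happens in the cheap dim=8 space even though the codebook was learned at dim=768.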

5/8
No FSQ mention in the whole paper...

6/8
I actually want the paper more than the code in this case.

7/8
Seems like we haven't fully explored the power of VAEs

8/8




1/6
Unveiling Encoder-Free Vision-Language Models

Achieves a smaller performance-compute gap between encoder-based and decoder-only VLMs

[2406.11832] Unveiling Encoder-Free Vision-Language Models

2/6
This is super cool!

But I can’t find the model weights and inference code

3/6
Actually, Fuyu already did this; we just don't know this paradigm's scaling laws, or how much more data the encoder-free remedy needs.

4/6
Very reminiscent of the Adept Fuyu models from last year, which are also encoder-free and use image patches directly.

adept/fuyu-8b · Hugging Face

5/6
Not sure whether "encoder-free" is the most accurate description. It seems that the patch embedding layer (PEL) and the patch aligning layer (PAL) together are essentially distilling the representation from a vision encoder...
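For readers unfamiliar with the Fuyu-style setup being discussed: an encoder-free VLM feeds linearly embedded raw patches straight into the language model instead of going through a pretrained vision encoder. A toy sketch (the layer name and sizes are illustrative assumptions, not the paper's code):

```python
# Toy "encoder-free" patch embedding: raw image patches are linearly embedded
# (a strided Conv2d is the standard trick) and passed directly to the LLM as
# visual tokens -- no pretrained vision encoder in the loop.
import torch
import torch.nn as nn

patch, dim = 16, 1024
pel = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patch embedding layer

img = torch.randn(1, 3, 224, 224)
tokens = pel(img).flatten(2).transpose(1, 2)  # (1, 196, 1024): 14x14 patches
```

The debate above is whether training such a layer with alignment objectives just re-learns (distills) what a vision encoder would have provided anyway.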

6/6
Only 35M data... they really have plenty of time.



1/6
Autoregressive Image Generation without Vector Quantization

Achieves competitive performance without vector quantization by using a diffusion loss function

[2406.11838] Autoregressive Image Generation without Vector Quantization

2/6
To understand Quantization with a simplified explanation, do check this post; 👇

3/6
I imagined something like this would work, feel vindicated

4/6
FINALLY !

5/6
Diffusion loss??
Man, the papers are coming in too quick, I’m unable to read them all

6/6




1/10
AGI by 2027 is strikingly plausible.

That doesn’t require believing in sci-fi; it just requires believing in straight lines on a graph.

2/10
From my new series. I go through rapid scaleups in compute, consistent trends in algorithmic progress, and straightforward ways models can be "unhobbled" (chatbot -> agent) on the path to a "drop-in remote worker" by 2027.
I. From GPT-4 to AGI: Counting the OOMs - SITUATIONAL AWARENESS

3/10
But what does AGI mean? It has completely lost its meaning over the past two or three years

4/10
> straight lines
> exponential chart

My guy

5/10
Straight lines on a log graph
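The point being argued back and forth here is just the math of log-scale plots: exponential growth looks like a straight line once you take the log. A toy check (growth rate is made up):

```python
# Exponential growth plotted in log space has constant first differences,
# i.e. it is a straight line. Here: ~3.16x per "year" (0.5 orders of
# magnitude, an arbitrary illustrative rate).
import math

compute = [10 ** (0.5 * year) for year in range(10)]
logs = [math.log10(c) for c in compute]
diffs = [b - a for a, b in zip(logs, logs[1:])]
# All first differences equal 0.5 -> a straight line on a log plot.
```

Whether the underlying trend actually stays exponential is exactly what the thread is disputing; the plot's straightness says nothing about that.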

6/10
Most plausible:

7/10


8/10
"...it just requires believing in straight lines on a graph."

9/10
It requires believing GPT-4 is as smart as a smart high schooler, which is an absolute fantasy.

10/10
Believing in straight lines on graphs in log scale is sci-fi or for bitcoin believers



1/1
We're sharing an update on the advanced Voice Mode we demoed during our Spring Update, which we remain very excited about:

We had planned to start rolling this out in alpha to a small group of ChatGPT Plus users in late June, but need one more month to reach our bar to launch. For example, we’re improving the model’s ability to detect and refuse certain content. We’re also working on improving the user experience and preparing our infrastructure to scale to millions while maintaining real-time responses.

As part of our iterative deployment strategy, we'll start the alpha with a small group of users to gather feedback and expand based on what we learn. We are planning for all Plus users to have access in the fall. Exact timelines depend on meeting our high safety and reliability bar. We are also working on rolling out the new video and screen sharing capabilities we demoed separately, and will keep you posted on that timeline.

ChatGPT’s advanced Voice Mode can understand and respond with emotions and non-verbal cues, moving us closer to real-time, natural conversations with AI. Our mission is to bring these new experiences to you thoughtfully.



New Google AI experiment could let you chat with celebrities, YouTube influencers​

Don't like Gemini's responses? Google may let you create your own chatbot soon.

By Calvin Wankhede

June 25, 2024


TL;DR
  • According to a new report, Google is working on an AI experiment that could let you chat with famous personalities.
  • The project will also allow anyone to build their own chatbots, similar to services like Character.ai.
  • The search giant may partner with YouTube influencers to create brand-specific AI personas.
Google is reportedly working on a new AI project that will let you converse with chatbots modeled after celebrities, YouTube influencers, or even fictional characters. According to The Information, Google plans to let anyone create their own chatbot by “describing its personality and appearance” and then converse with it — purely for entertainment.

This latest AI effort is notably distinct from Gems, which are essentially “customized versions of Gemini”. Put simply, Gems are similar to custom GPTs that can be taught to handle singular tasks like acting as a running coach or coding partner. On the other hand, Google’s upcoming chatbot project will fine-tune the Gemini family of language models to mimic or emulate the response style of specific people.

The search giant’s interest in personalized chatbots might suggest that it’s looking to take on Meta’s Celebrity AI chatbots. The latter already lets you talk to AI recreations of famous personalities like Snoop Dogg. Google’s upcoming product has also drawn comparisons to Character.ai, a chatbot service that offers a diverse range of personas ranging from TV characters to real-life politicians. Character.ai allows you to create your own personas with unique response styles that can be trained via text datasets.

Google’s customizable chatbot endeavor comes courtesy of the Labs team, which pivoted to working on various AI experiments last year. It’s being developed by a team of ten employees and led by long-time Google Doodle designer Ryan Germick.

As for monetization, the report suggests that Google may eventually integrate the project into YouTube rather than launching it as a standalone product. This would let creators build their own AI personas and potentially improve engagement with their audiences. YouTube's most famous personality, MrBeast, already has an AI-powered chatbot on Meta's platforms. While this approach may still not translate into a direct revenue stream, it could convince users to return to YouTube more often and give creators better reach.

While a release date has yet to be finalized, the chatbot platform will likely make its way to the Google Labs page for testing first. The company is currently showing off over a dozen experimental tools and projects, with some like the controversial AI Overviews already integrated within mainline Google products.





1/1
Character AI revealed earlier this week in a blog post that they now serve more than 20k inference qps - that's 20% of Google Search request volume. According to The Information, Google is now developing its own celebrity and user made chatbot platform. Planned launch this year.



AI in practice

Jun 25, 2024


Anthropic finally brings some ChatGPT features to Claude​

Matthias Bastian

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.


Anthropic has introduced new collaboration features for Claude, including projects, shared conversations, and artifacts. These additions bring Claude closer to the functionality offered by ChatGPT, which has had similar features for some time.

Claude Pro and Team users can now organize their chats into "projects." Like GPTs, projects store custom data and specific prompts, making them readily available each time the project is launched.

Projects give users a context window of 200,000 tokens, the equivalent of about 500 pages of text. This helps avoid a "cold start" by providing relevant background information, and Anthropic believes it will improve Claude's performance on specific tasks.
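The 200,000-tokens-to-500-pages conversion checks out as a back-of-envelope estimate, assuming the commonly cited rough ratios of ~0.75 words per token and ~300 words per page (both are generic assumptions, not Anthropic's numbers):

```python
# Back-of-envelope: 200k tokens -> pages, using rough generic ratios
# (~0.75 words/token, ~300 words/page). Illustrative only.
tokens = 200_000
words = tokens * 0.75   # ~150,000 words
pages = words / 300     # ~500 pages
```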

Users can also give Claude project-specific instructions, such as adopting a more formal tone or answering questions from a particular industry perspective. A redesigned sidebar allows users to pin frequently used chats for easy access.


Projects also support the newly introduced "artifacts" feature, which displays generated content such as code snippets, text documents, or diagrams in a separate window next to the conversation. For developers, artifacts provide an expanded code window and live previews for frontends. The feature is currently in beta and can be enabled in your account settings.

Claude.ai Team users can now share snapshots of their best conversations with colleagues in a project's activity feed, which is designed to facilitate learning and inspiration among team members.

The company recently launched Claude 3.5 Sonnet, one of the most capable AI models on the market. Claude 3.5 Opus, the largest model in the range, is scheduled for release later this year and might take the crown. Anthropic also plans to make Claude more versatile in the coming months by natively integrating popular applications and tools.


Summary
  • Anthropic has introduced the "Projects" feature for Claude.ai Pro and Team users. This allows chats to be organized into projects. Each project has a context window of 200,000 tokens, the equivalent of about 500 book pages.
  • Projects allow users to add internal documents, codebases, and knowledge to improve Claude's performance. Custom instructions can be used to further customize Claude's responses, such as a more formal tone or responses from the perspective of a specific role or industry.
  • Claude Team users can now also share snapshots of their best conversations with team members.

Sources

Anthropic
 

AI in practice

Jun 23, 2024

Magnific AI's Relight lets you change image lighting and backgrounds on the fly​

Matthias Bastian



Spanish AI startup Magnific AI has launched a new feature called "Relight," which allows users to change the lighting and background of images using AI. The technology could make it easier to create realistic and varied scenes with a main subject.

Magnific AI, which joined Freepik in May, has developed Relight to allow users to modify image lighting and optionally change backgrounds using AI prompts.

Users can control lighting adjustments through text prompts like "change the lighting to sci-fi neon green," by providing a reference image, or by creating a custom light map. A demo of all three prompts is available here.


Image: LysonOber via X


According to Magnific AI co-founder Javi Lopez, Relight works on characters, landscapes, backgrounds, and "any type of image."


Image: Javi Lopez via X

Beta users have shared numerous examples on X, showcasing the technology's potential.


Image: Julie W. Design via X

Lopez acknowledges some current limitations. When images contain multiple people or small faces, unwanted facial changes can occur. He notes this issue is "difficult to fix," but Relight performs well for standard portraits. There are also some inaccuracies in precisely matching new lighting to a scene compared to the original lighting.

The new feature could be particularly useful in commercial photography, allowing products to be easily placed in different environments. While such image manipulation was possible before AI, Relight can significantly speed up the process and make it accessible to non-experts.

Video: Dogan Ural via X

Relight is currently in a short beta test and should be available to all Magnific AI accounts next week. The company, which initially focused on AI-based image upscaling, continues to expand its toolkit with additional AI image features.

Summary


  • The Magnific AI image tool has introduced a new feature called Relight, which can change the lighting in images and realistically place characters in new environments by changing the background at the same time.
  • The new lighting is implemented using a text prompt, a reference image, or a custom lighting map. The tool has particular potential in advertising photography, where products can be moved to different locations with little effort.
  • After a short beta test, Relight will be activated for all users next week. The feature isn't perfect yet, especially when there are multiple people or small faces in the picture.

Sources

Javi Lopez via X
 