bnew


Meta’s new AI chips run faster than before​


The next-generation MTIA chip could be expanded to train generative AI models.​

By Emilia David, a reporter who covers AI. Prior to joining The Verge, she covered the intersection between technology, finance, and the economy.

Apr 10, 2024, 11:00 AM EDT


Image of the Meta logo and wordmark on a blue background bordered by black scribbles made out of the Meta logo.

Illustration by Nick Barclay / The Verge

Meta promises the next generation of its custom AI chips will be more powerful and able to train its ranking models much faster.

The Meta Training and Inference Accelerator (MTIA) is designed to work best with Meta’s ranking and recommendation models. The chips can help make training more efficient and inference — aka the actual reasoning task — easier.

The company said in a blog post that MTIA is a big piece of its long-term plan to build infrastructure around how it uses AI in its services. It wants to design its chips to work with its current technology infrastructure and future advancements in GPUs.

“Meeting our ambitions for our custom silicon means investing not only in compute silicon but also in memory bandwidth, networking, and capacity as well as other next-generation hardware systems,” Meta said in its post.

Meta announced MTIA v1 in May 2023, focusing on providing these chips to data centers. The next-generation MTIA chip will likely also target data centers. MTIA v1 was not expected to be released until 2025, but Meta said both MTIA chips are now in production.

Right now, MTIA mainly trains ranking and recommendation algorithms, but Meta said the goal is to eventually expand the chip’s capabilities to begin training generative AI like its Llama language models.

Meta said the new MTIA chip “is fundamentally focused on providing the right balance of compute, memory bandwidth, and memory capacity.” The chip will have 256MB of on-chip memory running at 1.3GHz, compared to the v1’s 128MB at 800MHz. Early test results from Meta showed the new chip performs three times better than the first-generation version across four models the company evaluated.
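For context, a minimal sketch that just puts the published figures side by side (the 3x result is Meta's reported benchmark across four models, not something derivable from these ratios alone):

```python
# Published MTIA figures from the post above; the speedup is Meta's own
# reported benchmark result, shown here only for comparison.
v1 = {"on_chip_mb": 128, "clock_ghz": 0.8}
v2 = {"on_chip_mb": 256, "clock_ghz": 1.3}

print(f"On-chip memory: {v2['on_chip_mb'] / v1['on_chip_mb']:.2f}x")  # 2.00x
print(f"Clock speed:    {v2['clock_ghz'] / v1['clock_ghz']:.2f}x")    # 1.62x
print("Reported performance: ~3x across the four evaluated models")
```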

Meta has been working on the MTIA v2 for a while. The project was internally called Artemis and was previously reported to focus only on inference.

Other AI companies have been looking into making their own chips as the demand for compute power increases along with AI use. Google released its new TPU chips in 2017, while Microsoft announced its Maia 100 chips. Amazon also has its Trainium 2 chip, which trains foundation models four times faster than the previous version.

The competition to buy powerful chips underscores the need for custom silicon to run AI models. Demand for chips has grown so much that Nvidia, which currently dominates the AI chip market, is valued at $2 trillion.

Correction April 10th, 1:21PM ET: The story previously stated Artemis was a different chip project from Meta focused on inference. Artemis is an internal name for MTIA v2 and is not a separate chip. We regret the error.
 

bnew





1/1
Extremely thought-provoking work that essentially says the quiet part out loud: general foundation models for robotic reasoning may already exist *today*.

LLMs aren’t just about language-specific capabilities, but rather about vast and general world understanding.







1/4
Very excited to announce: Keypoint Action Tokens!

We found that LLMs can be repurposed as "imitation learning engines" for robots, by representing both observations & actions as 3D keypoints, and feeding into an LLM for in-context learning.

See: Keypoint Action Tokens


2/4
This is a very different "LLMs + Robotics" idea to usual:

Rather than using LLMs for high-level reasoning with natural language, we use LLMs for low-level reasoning with numerical keypoints.

In other words: we created a low-level "language" for LLMs to understand robotics data!

3/4
This works really well across a range of everyday tasks with complex and arbitrary trajectories, whilst also outperforming Diffusion Policies.

Also, we don't need any training time: the robot can perform tasks immediately after the demonstrations, with rapid in-context learning.

4/4
Keypoint Action Tokens was led by the excellent
@normandipalo in his latest line of work on efficient imitation learning, following on from DINOBot (http://robot-learning.uk/dinobot), which we will be presenting soon at ICRA 2024!
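A rough sketch of the idea described in this thread (an illustration under assumed formats, not the authors' implementation): demonstrations and new observations are serialized as numeric 3D keypoints so a text-only LLM can complete the next action in context.

```python
# Illustrative sketch only: serialize 3D keypoints as plain text so an LLM
# can act as an in-context "imitation learning engine". Formats and prompt
# wording are assumptions, not the paper's exact tokenization.
from typing import List, Tuple

Keypoint = Tuple[float, float, float]

def serialize(points: List[Keypoint]) -> str:
    return "; ".join(f"{x:.2f} {y:.2f} {z:.2f}" for x, y, z in points)

def build_prompt(demos: List[Tuple[List[Keypoint], List[Keypoint]]],
                 new_obs: List[Keypoint]) -> str:
    lines = ["Map observation keypoints to action keypoints."]
    for obs, act in demos:
        lines.append(f"OBS: {serialize(obs)}")
        lines.append(f"ACT: {serialize(act)}")
    lines.append(f"OBS: {serialize(new_obs)}")
    lines.append("ACT:")
    return "\n".join(lines)

# The LLM's completion would be parsed back into numeric keypoints and handed
# to a low-level controller as end-effector targets.
```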

 

Artificial Intelligence





Pricing​

Thank you for being an early supporter! Our product is free for the duration of the beta program. In this period, you can make up to 1200 songs / month.













1/11
Introducing Udio, an app for music creation and sharing that allows you to generate amazing music in your favorite styles with intuitive and powerful text-prompting.


2/11
Bring your words to life with expressive vocals in any style. From soaring gospel to gravelly blues, from dreamy pop to silky rap, Udio has it covered.


3/11
Explore an extraordinary range of genres and styles. Here's pumping EDM, swinging piano-jazz, mellow neo-soul, and extreme metal.


4/11
Create vocals in many languages. Check out some J-pop, Russian dream pop, reggaeton, or Bollywood music.


5/11
Extend your clips forward and backward to create longer tracks. Specify intro and outro sections to complete your tracks. When it's ready, hit 'Publish' to share with the Udio community.


6/11
Check out extensions in action. Wow! A true country diamond from
@bobbybornemann


7/11
Udio is a super-powered instrument that amplifies human creativity. It's designed to be accessible, but it works better the more you put in: writing lyrics, exploring sound and genre combos, and expressing your creative taste through curation.


8/11
Our goal is to make Udio a game-changing tool for both musicians and non-musicians alike, and we are excited to be backed by leading artists
@iamwill and
@common
.


9/11
Our v1 model is capable, but not perfect. We're iterating quickly, and working on longer samples, improved sound quality, supporting more languages, and next-generation controllability. Stay tuned for more features and improvements coming soon.


10/11
Udio is founded by leading AI researchers and engineers formerly at Google DeepMind. We are backed by a16z and located across London and New York.

We are united by a love of music and technology. If you are interested in working with us, please get in touch at careers@udio.com.…

11/11
Udio is a free beta product, so expect some rough edges! Everyone can generate up to 1200 songs per month, so have fun

We can't wait to hear what you create.



this is dope
 

bnew




1/3
Introducing the #MuJoCo Menagerie, a collection of robot models curated by DeepMind: GitHub - google-deepmind/mujoco_menagerie: A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.

2/3
We're also releasing #MuJoCo 2.2.2, featuring new actuator types including vacuum grippers.

See changelog for details:
Changelog - MuJoCo Documentation

3/3
Our generative technology Imagen 2 can now create short, 4-second live images from a single prompt.

It’s available to use in
@GoogleCloud ’s #VertexAI platform. → Google Cloud Gemini, Image 2, and MLOps updates | Google Cloud Blog #GoogleCloudNext





1/1
Just wait until you realize that this is full-blown physics with contacts, not just a visual rendering!

#MuJoCo 3.0 is packed with features
XLA: Accelerated physics with JAX
Non-convex collisions
Deformable bodies
Give it a try - MuJoCo 3 · google-deepmind mujoco · Discussion #1101





1/2
#MuJoCo 3.1.4 drops

Lots of new features -
- Gravity compensation
- Non-linear least squares (ala SysID)
- MJX features (yup, still no tendons)
- many more...
Changelog - MuJoCo Documentation

2/2
Fixed moment arm - not as much, but general tendons (spatial with varying moment arms and internal dynamics) are a beast
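For anyone who wants to poke at these releases, a minimal sketch of loading a Menagerie model with the MuJoCo Python bindings (the model path below is an assumption about where the repo is cloned and how its files are laid out):

```python
# Minimal smoke test: load a Menagerie model and step the simulation once.
# Assumes `pip install mujoco` and a local clone of mujoco_menagerie; the
# exact model path is an assumption about the repo layout.
import mujoco

model = mujoco.MjModel.from_xml_path("mujoco_menagerie/franka_emika_panda/panda.xml")
data = mujoco.MjData(model)
mujoco.mj_step(model, data)
print("nq (generalized coordinates):", model.nq)
```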
 

bnew






1/5
How does visual in-context learning work? We find "task vectors" -- latent activations that encode task-specific information and can guide the model to perform a desired task without providing any task examples. (1/n)

2/5
We examine the activations of the transformer and find that within the activation space of some attention heads it is possible to almost perfectly cluster the data by tasks! implying the existence of task vectors. (2/n)

3/5
To find which layers, attention heads, and token activations might hold task vectors, we define a distribution over the possible locations and search for a subset of activations that can together guide the model to perform a task using REINFORCE. (3/n)

4/5
The resulting set of task vectors performs better than using examples while consuming 20% less flops. Congrats to the authors
@AlbyHojel
@YutongBAI1002
@trevordarrell
@amirgloberson
(4/n)

5/5
Finally, our work was inspired by previous works in NLP by
@RoeeHendel ("In-context learning creates task vectors") and
@ericwtodd
("Function Vectors in Large Language Models"). The main differences are in the domain (vision) and the search algorithm (REINFORCE). (5/5)






1/2
Finding Visual Task Vectors

Find task vectors, activations that encode task-specific information, which guide the model towards performing a task better than the original model w/o the need for input-output examples


2/2
Adapting LLaMA Decoder to Vision Transformer

[2404.06773] Adapting LLaMA Decoder to Vision Transformer


 

bnew










1/10
New paper from
@Berkeley_AI on Autonomous Evaluation and Refinement of Digital Agents!

We show that VLM/LLM-based evaluators can significantly improve the performance of agents for web browsing and device control, advancing sotas by 29% to 75%.

[2404.06474] Autonomous Evaluation and Refinement of Digital Agents

2/10
We begin by developing two types of evaluators: one that directly queries GPT-4V and another that employs an open-weight solution. Our best model shows 82% / 93% agreement with oracle evaluations on web browsing and android device control settings respectively.

3/10
Next, we show how they could be used for improving agents, either through inference-time guidance or fine-tuning.
We start with WebArena, a popular web agent benchmark. We experiment integrating the sota agent with Reflexion algorithm, using our evaluators as the reward function.

4/10
We see the improvement our evaluators provide scales favorably with evaluator capability, with the best evaluator achieving a 29% improvement over the previous sota.

5/10
Lastly, we experiment with improving CogAgent on iOS, for which there is no existing benchmark environment or training data.
By using the evaluator to filter sampled trajectories for behavior cloning, we significantly improve the CogAgent's success rate by a relative 75%.

6/10
We hope our results convey to you the potential of using open-ended model-based evaluators in evaluating and improving language agents.
All code is available at: https://github.com/Berkeley-NLP/Agent-Eval-Refine
Paper: https://arxiv.org/abs/2404.06474
Work w/
@594zyc
@NickATomlin

@YifeiZhou02

@svlevine

@alsuhr

7/10
We also have some speculations on community's current mixed results in autonomous refinement, which our wonderful
@NickATomlin details in this thread!

8/10
Some additional speculation

Our preliminary results showed that inference-time improvement w/ Reflexion was very dependent on the performance of the critic model. A bad critic often tanks model performance x.com/pan_jiayipan/s…

9/10
Good question! Evaluators should work just fine, whether on live websites or other domains, for tasks having a similar complexity
In fact, each evaluator shares the same weights across all experiments, with only a change in the prompt. And WebArena isn't part of its training data

10/10
Thanks Martin! Let's go brrrrrrr
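A minimal sketch of the filtering step described in tweet 5/10 (the rollout and evaluator are passed in as callables here; in the paper the evaluator is a GPT-4V-based or open-weight judge):

```python
# Sketch of evaluator-filtered behavior cloning: sample trajectories from the
# current agent, keep the ones the learned evaluator judges successful, and
# fine-tune (behavior-clone) on the survivors. All callables are placeholders.
from typing import Callable, List, Tuple

Step = Tuple[str, str]          # (observation, action), simplified
Trajectory = List[Step]

def filter_for_cloning(
    rollout: Callable[[str], Trajectory],             # samples one trajectory per call
    evaluate: Callable[[str, Trajectory], float],     # success score in [0, 1]
    tasks: List[str],
    threshold: float = 0.5,
    samples_per_task: int = 4,
) -> List[Trajectory]:
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            traj = rollout(task)
            if evaluate(task, traj) >= threshold:
                kept.append(traj)
    return kept   # behavior-clone the agent on this filtered set
```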


 

bnew




Computer Science > Computation and Language​

[Submitted on 10 Apr 2024]

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention​

Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
Comments:9 pages, 4 figures, 4 tables
Subjects:Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:arXiv:2404.07143 [cs.CL]
(or arXiv:2404.07143v1 [cs.CL] for this version)
[2404.07143] Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Submission history

From: Tsendsuren Munkhdalai [view email]
[v1] Wed, 10 Apr 2024 16:18:42 UTC (248 KB)
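A simplified numpy sketch of the compressive-memory piece described in the abstract above (linear-attention read and write per segment; the gating against local attention and other details are omitted, so treat this as an approximation of the idea rather than the paper's exact formulation):

```python
# Simplified compressive-memory sketch: keys/values of past segments are
# folded into a fixed-size associative matrix, which later segments can query
# with linear attention.
import numpy as np

d = 64                       # head dimension
M = np.zeros((d, d))         # compressive memory
z = np.zeros(d)              # normalization state

def elu_plus_one(x):
    return np.where(x > 0, x + 1.0, np.exp(x))

def memory_read(Q):          # Q: (segment_len, d)
    s = elu_plus_one(Q)
    return (s @ M) / (s @ z + 1e-6)[:, None]

def memory_write(K, V):      # K, V: (segment_len, d)
    global M, z
    s = elu_plus_one(K)
    M += s.T @ V
    z += s.sum(axis=0)

# Per segment: read long-range context from memory, compute standard masked
# attention within the segment, mix the two (the paper uses a learned gate),
# then write the segment's keys/values into the memory.
```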

 

bnew










1/9
Apple presents Ferret-UI

Grounded Mobile UI Understanding with Multimodal LLMs

Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with

2/9
user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given that UI screens typically exhibit a more elongated aspect ratio

3/9
and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate "any resolution" on top of Ferret to magnify details and leverage enhanced visual features. Specifically, each screen is divided into 2 sub-images based on the original aspect

4/9
ratio (i.e., horizontal division for portrait screens and vertical division for landscape screens). Both sub-images are encoded separately before being sent to LLMs. We meticulously gather training samples from an extensive range of elementary UI tasks, such as icon

5/9
recognition, find text, and widget listing. These samples are formatted for instruction-following with region annotations to facilitate precise referring and grounding. To augment the model's reasoning ability, we further compile a dataset for advanced tasks, including

6/9
detailed description, perception/interaction conversations, and function inference. After training on the curated datasets, Ferret-UI exhibits outstanding comprehension of UI screens and the capability to execute open-ended instructions. For model evaluation, we establish a

7/9
comprehensive benchmark encompassing all the aforementioned tasks. Ferret-UI excels not only beyond most open-source UI MLLMs, but also surpasses GPT-4V on all the elementary UI tasks.

8/9
paper page:

9/9
daily papers:





Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 8 Apr 2024]

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs​

Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan
Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate "any resolution" on top of Ferret to magnify details and leverage enhanced visual features. Specifically, each screen is divided into 2 sub-images based on the original aspect ratio (i.e., horizontal division for portrait screens and vertical division for landscape screens). Both sub-images are encoded separately before being sent to LLMs. We meticulously gather training samples from an extensive range of elementary UI tasks, such as icon recognition, find text, and widget listing. These samples are formatted for instruction-following with region annotations to facilitate precise referring and grounding. To augment the model's reasoning ability, we further compile a dataset for advanced tasks, including detailed description, perception/interaction conversations, and function inference. After training on the curated datasets, Ferret-UI exhibits outstanding comprehension of UI screens and the capability to execute open-ended instructions. For model evaluation, we establish a comprehensive benchmark encompassing all the aforementioned tasks. Ferret-UI excels not only beyond most open-source UI MLLMs, but also surpasses GPT-4V on all the elementary UI tasks.
Subjects:Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Cite as:arXiv:2404.05719 [cs.CV]
(or arXiv:2404.05719v1 [cs.CV] for this version)
[2404.05719] Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Submission history​

From: Keen You [view email]
[v1] Mon, 8 Apr 2024 17:55:44 UTC (23,745 KB)
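A minimal sketch of the "any resolution" splitting described in the abstract, assuming a Pillow image as input (the real pipeline also encodes each sub-image with the visual backbone):

```python
# Split a UI screenshot into two sub-images along its longer axis, so small
# elements (icons, text) keep more pixels when each half is encoded
# separately. Sketch of the idea only, not the paper's preprocessing code.
from PIL import Image

def split_screen(img: Image.Image):
    w, h = img.size
    if h >= w:   # portrait screen: horizontal division (top / bottom halves)
        return [img.crop((0, 0, w, h // 2)), img.crop((0, h // 2, w, h))]
    else:        # landscape screen: vertical division (left / right halves)
        return [img.crop((0, 0, w // 2, h)), img.crop((w // 2, 0, w, h))]

# Both sub-images are then encoded separately before being passed to the LLM.
```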


 

bnew


The Aboard app is a totally different take on what an AI bot can do​


It’s like Pinterest meets Trello meets ChatGPT meets the open web. And it can turn itself into almost anything you need.​

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously, at Protocol, The Wall Street Journal, and Wired.

Apr 9, 2024, 10:00 AM EDT

A screenshot of an Aboard board for recipes.

Aboard is for tracking stuff — and for building the tools for tracking stuff with AI. Image: Aboard

Aboard is not an easy app to explain. It used to be easier: at first, it was a way to collect and organize information — Trello meets Pinterest meets that spreadsheet full of links you use to plan your vacation. The company’s founders, Paul Ford and Rich Ziade, are two longtime web developers and app creators (and, in Ford’s case, also an influential writer about the web) who previously ran a well-liked agency called Postlight. They did a bunch of interesting work on parsing websites to pull out helpful information, and they built a handy visual tool for displaying it all. “People love to save links,” Ziade says, “and we love to make those links beautiful when they come in.” Simple!

But now I’m sitting here in a co-working space in New York City, a few minutes after an earthquake hit and a few days before Aboard’s biggest launch yet, and Ziade is showing me something very different. He opens up a beta version of the app and clicks a button, and after a second, the page begins to change. A database appears out of nowhere, with a bunch of categories — Year, Title, Genre, and more — that start to populate with a number of well-known movie titles. The board, as Aboard calls it, titles itself “Movie Night.” With one click, Ziade just built — and populated — a way to track your viewing habits.

Maybe the best way to explain the new Aboard is not as a Pinterest competitor but as a radical redesign of ChatGPT. When Ziade made that board, all he was really doing was querying OpenAI’s GPT-3.5. The company’s chatbot might have returned some of the same movies, but it would have done so with a series of paragraphs and bullet points. Aboard has built a more attractive, more visual AI app — and has made it so you can turn that app into anything you want.
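The underlying pattern the article describes, stripped to its core, looks roughly like this (a hedged sketch using the OpenAI Python client; the model name, prompt, and schema are illustrative assumptions, not Aboard's actual code):

```python
# Ask a chat model for structured rows instead of prose, then render each row
# as a "card". Prompt, model name, and schema are illustrative only.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "Create a 'Movie Night' board. Respond with only a JSON array of objects "
    "with keys 'title', 'year', and 'genre'."
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

for row in json.loads(resp.choices[0].message.content):
    print(f"{row['year']}  {row['title']}  [{row['genre']}]")
```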

Ziade and Ford imagine three main things you might do with Aboard. The first, “Organize,” is closest to the original vision: ask the tool for a bunch of things to do in Montreal this summer, and it’ll populate a board with some popular attractions and restaurants. Ask Aboard to meal plan your week, and it’ll create a board segmented by day and by meal with nicely formatted recipes. The second, “Research,” is similar but a little more exploratory: ask Aboard to grab the most interesting links about African bird species, and it’ll dump them all into place for you to peruse at your leisure.

A screenshot of the Board Builder feature in Aboard.

You can build your own stuff in Aboard — but the AI is much faster. Screenshot: David Pierce / The Verge

Like any AI product right now, this is sometimes cooler in theory than in reality. When I ask Ziade to make a board with important tech moments from 2004, it pulls a bunch of them into separate cards: Google’s IPO, the launch of Gmail, the iPod Mini launch. And then the iPod Mini launch again and then another time and then three more times after that. Ziade and Ford both laugh and say this is the stuff they see all the time. A few times, a demo just fails, and each time, Ford says something to the effect of “Yeah, that just happens when you ping the models.” But he says it’s also getting better fast.

The third use case, which Aboard calls “Workflow,” is where Aboard figures its true business lies. Ziade does another demo: he enters a prompt into Aboard, asking it to set up a claims tracker for an insurance company. After a few seconds, he has a fairly straightforward but useful-looking board for processing claims, along with a bunch of sample cards to show how it works. Is this going to be perfect and powerful enough for an insurance company to start using as is? No. But it’s a start. Ford tells me that Aboard’s job is to build something good enough but also not quite good enough — if the app can work just well enough to get you to customize it the rest of the way to fit your needs, that’s the goal.

An Aboard board can be a table, a list, a gallery, and more

This is ultimately a very business-y use case and puts Aboard in loose competition with the Airtables and Salesforces of the world. Ziade and Ford are upfront about this. “We want to be in professional settings,” Ford says, “that’s a real thing we’re aiming for. Doesn’t have to be for big enterprise, but definitely small teams, nonprofits, things like that.” He figures Aboard can sell to companies by saving them a bunch of time and meetings spent figuring out how to organize data and just get them started right away. An Aboard board can be a table, a list, a gallery, and more; it’s a pretty flexible tool for managing most kinds of data.

I have no particular business use for Aboard, but I’ve been testing the app for a while, and it’s a really clever way to redesign the output of a large language model. Particularly when it’s combined with Aboard’s ability to parse URLs, it can quickly put together some really useful information. I’ve been saving links for months as I plan a vacation, and I had Aboard build me a project planner for managing a big renovation of my bathroom. (It’s all very exciting stuff.)

A screenshot of an Aboard board showing movies to watch.

I asked Aboard’s AI to build me a database of Oscar winners. It... sort of worked. Screenshot: David Pierce / The Verge

Just before Aboard’s AI launch, I tried building another board: I prompted the AI to create “a board of Oscar-winning movies, with stacks for each movie genre and tags for Rotten Tomatoes scores,” and Aboard went to work. It came back with stacks (Aboard’s parlance for sub-lists) for six different movie genres, tags for various score ranges, plus runtimes, posters, and Rotten Tomatoes links for each flick. Were all the movies it selected Best Picture winners? Nope! Did it get the ratings right, like, ever? Nope! But it still felt like a good start — and Aboard always gives you the option to delete the sample cards it generates and just start from scratch.

Aboard is just one of a new class of AI companies, the ones that won’t try to build Yet Another Large Language Model but will instead try to build new things to do with those models and new ways to interact with them. The Aboard founders say they ultimately plan to connect to lots of models as those models become, in some cases, more specialized and, in others, more commoditized. In Aboard’s case, they want to use AI not as an answer machine but as something like a software generator. “We still want you to go to the web,” Ford says. “We want to guide you a bit and maybe kickstart you, but we’re software people — and we think the ability to get going really quickly is really, really interesting.” The Aboard founders want AI to do the work about the work, so you can just get to work.
 

bnew





1/4
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding.
Source: GitHub - openai/simple-evals

2/4
For example, when writing with ChatGPT, responses will be more direct, less verbose, and use more conversational language.

3/4
We continue to invest in making our models better and look forward to seeing what you do. If you haven’t tried it yet, GPT-4 Turbo is available in ChatGPT Plus, Team, Enterprise, and the API.

4/4
UPDATE: the MMLU points weren’t clear on the previous graph. Here’s an updated one.


 

bnew




1/3
Into the long night

This was made in a few minutes using udio, absolutely mad what can be done.

listen to the 12 second mark and be amazed.

Lyrics below

2/3
Code lines

You’ve been coding through those long nights, yeah
Keys clacking, I know that tech flow
Fire up the IDE, type away, yeah
Code’s compiling

Quick note: You said Python's your favorite, right?
In the terminal, light's so bright, watching it glow
Algorithms and data in a…

3/3
there are tools for this


 

bnew








1/7
Tried to see what I could get Udio to generate beyond music.

It can do comedy, speeches, npc dialogue, sports analysis, commercials, radio broadcasts, asmr, nature sounds, etc.

It’s basically an AI audio engine.

Pretty wild.

Watch for examples of each.

2/7
Here’s the published playlist for each example so you can get a sense for prompts you can play with:

Playlist: https://udio.com/playlists/deGuVDLYd9MrXtxnxfX7z1

Worth noting that custom lyrics are a MASSIVE part of the prompting process.

It’ll be interesting to see what people can get it to do.

3/7
It was also funny to see things like this as I played around with it

4/7
My suspicion is they didn’t even know how broad its capabilities are

5/7
Quality being the big one. You can hear all sorts of errors. But for not being trained on those use cases not bad!

6/7
Haven’t been able to nail that one yet.

It takes a ton of effort to get it to NOT generate any music.

Take the NPC example. Based on the game I described it would always generate background music for the game.

People who are better at prompting than me I’m sure will crack it.

7/7
Saw people posting comedy stuff and wondered what else it could do.

Turns out a whole lot!





1/1
ai will probably most likely lead to the end of the world, but in the meantime there will be great stand-up comedy


 

bnew









1/8
With Chatbot UI you can use 100+ models in the same chat experience.

Watch me use Mistral 8x7b via Groq and then switch to the new Claude 3 Opus.

Any model. One interface.

2/8
Our hosted version recently went into beta at http://ChatbotUI.com. We’re working on significant UI improvements (mobile, fix files/tools), an enterprise version with teams & workspaces, and generative ui capabilities.

It’s $10/mo or you can self-host since it’s open source.

3/8
It will become obvious over time that as an individual or business you want to own your chat experience and that you don’t want to be tied to a specific model.

Because it’s open source, Chatbot UI gives you ownership and control of that experience.

Powerful and flexible.

4/8
We just enabled a basic free tier for the beta release for those who want to get a quick feel for the standard chat feature and evaluate.

Pro features are getting some huge improvements over the course of March to close out the beta launch.

We hope you like them!

5/8
Welcome to the new 5,233 people that joined while I was asleep!

We have a *lot* coming in terms of improvements and features as we wrap up our beta this month.

We’re working on a bug reporting system and better guides on how to use Chatbot UI.

More polish & features every day.

6/8
Not yet but if this is something people really want it wouldn’t be too difficult to add

7/8
Vercel has an AI SDK with good utils for streaming

8/8
It’s most of my general purpose chat now


 

bnew


1/1
Happy to see
@zoox now driving around on the roads in Las Vegas

Robo-taxis are def 10 years away

h/t for the video
@Airzly










1/6
Robo-taxis will never happen

2/6
100%

laid it out here

3/6
Uber is Dead, my reflections on Waymo

I’ve been in San Francisco for just over a week, during which I’ve taken 7 rides with Waymo, a similar number with Uber, and a few with FSD Teslas.

My journey to SFO via Uber was alarming—the driver veered out of the lane multiple times and…

4/6
someone ate crackers before me

actually this was my only cracker filled experience.

there will be loads of jobs to be had in the vehicle cleaning industry

5/6
yes correct

6/6
it’s coming dude


 

bnew



Language doesn’t perfectly describe consciousness. Can math?​

Even the most poetic words can’t capture the full richness of our minds. So scientists are turning to numbers.

By Oshan Jarow @OshanJarow Apr 10, 2024, 9:00am EDT

A drawing of a beam of light entering the eye of a silhouetted human head and turning into rainbow colors that flow out the back of the head like hair.
Getty Images

Oshan Jarow is a staff writer with Vox's Future Perfect, where he focuses on the frontiers of political economy and consciousness studies. He covers topics ranging from guaranteed income and shorter workweeks to meditation and psychedelics.



The idea that language is a clumsy, imperfect tool for capturing the depth and richness of our experiences is ancient. For centuries, a steady stream of poets, philosophers, and spiritual practitioners have pointed to this indescribability, the difficult fact that our experiences are ineffable, or larger than what words can communicate.

But as a frenzy of developments across AI, neuroscience, animal studies, psychedelics, and meditation breathe new life into consciousness research, scientists are devising new ways of, maybe, pushing our descriptions of experience beyond the limits of language. Part of the hope behind what mathematician and physicist Johannes Kleiner recently termed the “structural turn in consciousness science” is that where words fall short, math may prevail.

“My view is that mathematical language is a way for us to climb out of the boundaries that evolution has set for our cognitive systems,” Kleiner told Vox. “Hopefully, [mathematical] structure is like a little hack to get around some of the private nature of consciousness.”

For example, words could offer you a poem about the feeling of standing on a sidewalk when a car driving by splashes a puddle from last night’s rain directly into your face. A mathematical structure, on the other hand, could create an interactive 3D model of that experience, showing how all the different sensations — the smell of wet concrete, the maddening sound of the car fleeing the scene of its crime, the viscous drip of dirty water down your face — relate to one another.

Structural approaches could provide new and more testable predictions about consciousness. That, in turn, could make a whole new range of experimental questions about consciousness tractable, like predicting the level of consciousness in coma patients, which structural ideas like Integrated Information Theory (IIT) are already doing.

But for my money, there will always be a gap between even the best structural models of consciousness and the what-it’s-like-ness of the experiences we have. Mica Xu Ji, a former post-doctoral fellow at the Mila AI Institute and lead author of a new paper that takes a structural approach to making sense of this longstanding fact of ineffability, thinks ineffability isn’t a bug, it’s a feature that evolution baked into consciousness.

From humans to machine learning models, information loss is a common problem. But another way to look at losing information is to see it as gaining simplicity, and simplicity, she explained, helps to make consciousness generalizable and more useful for navigating situations we haven’t experienced before. So maybe ineffability isn’t just a problem that locks away the full feeling of our experiences, but is also an evolutionary feature that helped us survive through the ages.

The math of ineffability​

In theory, working out the precise ineffability of an experience is pretty straightforward.

Ji and her colleagues began with the idea that the richness of conscious experience depends on the amount of information it contains. We can already take real-world reads of richness by measuring the entropy, or unpredictability, of electrical activity in the brain.

Her paper argues that to gauge ineffability, all you’d need are two variables: the original state and the output state. The original state could be a direct measure of brain activity, including all the neural processing that goes on beneath conscious awareness. The output state could be the words you produce to describe it, or even the narrativized thoughts in your head about it (unless you have no inner monologue).

Then, comparing those numbers would give you an approximation of the ineffability. The more information lost in the conversion of experience to language, as measured by comparing the relative entropy of the original and output variables, the greater the magnitude of the ineffable, or the information that language left behind. “Ineffability is about how information gets lost as stuff goes downstream in brain processing,” said Kleiner.
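In information-theoretic terms, one way to write that comparison down (a sketch of the general idea rather than the paper's exact formalism, with X the original state and Y the verbal output):

```latex
% Information the description retains about the original state:
%   I(X;Y) = H(X) - H(X \mid Y)
% Information lost in the conversion, i.e. the "ineffable" remainder:
\mathrm{ineffability}(X \to Y) \;\approx\; H(X \mid Y) \;=\; H(X) - I(X;Y)
```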

Now, think of the long arc of human evolution. Ineffability means that consciousness can produce simpler representations of the overwhelming richness of pure experience, what the American philosopher William James famously called the “blooming, buzzing confusion” of consciousness. That means encountering a pissed-off tiger can be generalized to the broader idea that all large cats with massive teeth may pose a threat, rather than constraining that lesson to only that specific tiger in that specific context.

In machine learning models, as in humans, simplicity supports generalization, which makes the models useful beyond what they encounter in their training data sets. “Language has been optimized to be simple so that we can transmit information fast,” Ji said. “Ineffability, or information loss, is a fundamental feature of learning systems that generalize.”

Pain, health, and the mundane potential of cracking ineffability​

Ineffability is often associated with mysticism, poetry, or heady conversations about the nature of the mind. Or now, the math of information theory and the evolutionary purpose of consciousness.

But ineffability also plays a role in more immediate and mundane concerns. Take chronic pain. One of the most common approaches to understanding what someone with chronic pain is experiencing is to have them self-report the intensity of their pain on a scale from 0 to 10. Another, the Visual Analog Scale, asks them to mark their pain intensity along a 10-centimeter line, zero being no pain and 10 representing the worst possible pain.

These are ... not the most detailed of measures. They also smuggle in assumptions that can distort how we understand the pain of others. For example, a linear scale with equal spaces between each possible number suggests that alleviating someone’s reported pain from a four to a three is roughly similar to dropping someone else’s from a nine to an eight. But the experiential distance between an eight and a nine could be orders of magnitude greater than between smaller numbers on the scale, leading us to drastically underestimate how much suffering people at the high end of the spectrum are enduring.

Kleiner explained that structural approaches to representing pain can have the same effect as moving from a 2D image to 3D. “Structural research can distinguish not only location, but different qualities of the pain. Like whether it’s throbbing, for example. We could easily have 20 dimensions.” And each dimension adds more richness to our understanding of the pain. That could lend motivation — and funding — to treating some of the world’s most debilitating conditions that lack effective treatments, such as cluster headaches.

The same principle applies to mental health. Many mental health indicators rely on self-reporting the richness of our internal experience on linear scales. But if structural approaches to consciousness can render 3D representations of experience, maybe they can add some richness to how we measure, and therefore manage, mental health at large.

For example, neuroscientist Selen Atasoy has been developing the idea of “brain harmonics,” which measures electrical activity in the brain and delivers actual 3D representations of moments of experience. With those representations, it’s possible that we could learn more about their nature, like the amount of pleasure or pain, by running mathematical analyses based on the harmonic frequencies they contain rather than asking the person to report how they feel via language.

Structural approaches and math surely have their limits. Galileo kicked off the scientific method by assuming that the universe is “written in the language of mathematics,” which demotes the ineffable depths of human experience to, apparently, something irrelevant to science. Reviving that idea with more advanced math would be a mistake.

But language, maybe by design, will never capture the full richness of consciousness. That might be to our benefit, helping us generalize our experience in an ever-uncertain world. Meanwhile, more precise mathematical structures to describe conscious experience could also bring welcome benefits, from grasping just how intense pain can be to conveying the most blissful of pleasures.

A version of this story originally appeared in the Future Perfect newsletter. Sign up here!
 