The A.I Megathread (LLM , GPT , Development)

bnew · Sep 5, 2024

1/7

New: my last PhD paper

Introducing System-1.x, a controllable planning framework with LLMs. It draws inspiration from Dual-Process Theory, which argues for the co-existence of fast/intuitive System-1 and slow/deliberate System-2 planning.

System-1.x generates hybrid plans, balancing the two planning modes (efficient but less accurate System-1; inefficient but more accurate System-2) according to the difficulty of each decomposed sub-problem.

Some exciting results+features of System-1.x:

-- performance: beats System-1, System-2 & a symbolic planner (A*), both in-distribution (ID) and out-of-distribution (OOD), by up to 39% given an exploration budget.
-- training-time control/balance: user can train a System-1.25/1.5/1.75 to balance accuracy + efficiency.
-- test-time control/balance: user can bias the planner to solve more/less sub-goals using System-2.
-- flexibility to integrate symbolic solvers: allows building neuro-symbolic System-1.x with a symbolic System-2 (A*).
-- generalizability: can learn from different search algos (DFS/BFS/A*).

2/7
System-1.x consists of 3 components (trained using only search traces as supervision):

1⃣ a Controller, that decomposes a planning problem into easy+hard sub-goals
2⃣ a System-1 Planner, that solves easy sub-goals
3⃣ a System-2 Planner, that solves hard sub-goals w/ deliberate search
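A minimal sketch of how the three components could fit together at inference time, assuming the Controller has already labeled each sub-goal as easy or hard. `SubGoal`, `solve_hybrid`, `fast_guess` and `slow_search` are hypothetical names for illustration; in System-1.x the planners are trained language models, not these toy functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Cell = Tuple[int, int]  # hypothetical state type: a (row, col) maze cell
Planner = Callable[[Cell, Cell], List[Cell]]

@dataclass
class SubGoal:
    start: Cell
    goal: Cell
    hard: bool  # label the Controller would assign (hypothetical here)

def solve_hybrid(subgoals: List[SubGoal], system1: Planner, system2: Planner) -> List[Cell]:
    """Send easy sub-goals to System-1 and hard ones to System-2, then stitch the segments."""
    plan: List[Cell] = []
    for sg in subgoals:
        planner = system2 if sg.hard else system1
        segment = planner(sg.start, sg.goal)
        # Each segment starts where the previous one ended, so skip the duplicated cell.
        plan.extend(segment if not plan else segment[1:])
    return plan

# Toy stand-ins: System-1 jumps straight to the goal; the System-2 stand-in
# returns a multi-step path (a real System-2 would run a search).
def fast_guess(start: Cell, goal: Cell) -> List[Cell]:
    return [start, goal]

def slow_search(start: Cell, goal: Cell) -> List[Cell]:
    return [start, (start[0], goal[1]), goal]

subgoals = [
    SubGoal((0, 0), (0, 2), False),
    SubGoal((0, 2), (2, 3), True),
    SubGoal((2, 3), (2, 5), False),
]
print(solve_hybrid(subgoals, fast_guess, slow_search))
# [(0, 0), (0, 2), (0, 3), (2, 3), (2, 5)]
```

In this sketch, test-time control could be mimicked by flipping more or fewer sub-goals to `hard=True` before calling `solve_hybrid`; that mechanism is an assumption, not the paper's.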

3/7
To train the controller, we automatically generate sub-goal decompositions, given 2 things:

– a hybridization factor x, which sets the balance between Sys-1 and Sys-2 in the planner
– a hardness function, which estimates the difficulty of a sub-goal

Then we solve a constrained optimization problem that finds the contiguous span covering x% of the plan's steps that forms the hardest sub-goal.
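The decomposition step can be sketched as a sliding-window search, under two assumptions not stated in the thread: each plan step has a precomputed hardness score (`step_hardness`, hypothetical), and a sub-goal's hardness is the sum of its steps' scores. The paper's actual hardness function and optimization may differ.

```python
from typing import List, Tuple

def hardest_window(step_hardness: List[float], x: float) -> Tuple[int, int]:
    """Return (start, end) of the contiguous span covering a fraction x of the plan
    whose total hardness is largest; end is exclusive."""
    n = len(step_hardness)
    k = max(1, round(x * n))  # number of steps handed to System-2
    if k >= n:
        return 0, n
    window = sum(step_hardness[:k])
    best, best_start = window, 0
    for start in range(1, n - k + 1):
        # Slide the window one step right: add the new last step, drop the old first step.
        window += step_hardness[start + k - 1] - step_hardness[start - 1]
        if window > best:
            best, best_start = window, start
    return best_start, best_start + k

print(hardest_window([0, 1, 0, 5, 6, 4, 0, 1, 0, 0], 0.3))  # (3, 6)
```

With these made-up scores and x = 0.3, steps 3 to 5 would form the hard sub-goal for System-2, while steps 0-2 and 6-9 would go to System-1. The window runs in O(n) because each slide updates the sum instead of recomputing it.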

4/7
On both Maze Navigation & Blocksworld domains, we get exciting results both ID & OOD:

System-1.x matches & generally outperforms System-1, System-2 & a simpler hybrid without sub-goal decompositions at all state-exploration budgets, by up to 33%.

Neuro-symbolic System-1.x also typically outperforms symbolic search, beating A* by up to 39% at a fixed budget & matching it at max budget.
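To make "at a fixed budget" concrete, here is a sketch of a budgeted A* baseline on a grid maze. It assumes the budget counts expanded states and that the maze is a list of strings with `#` marking walls; both are illustrative choices, not the paper's setup.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]

def astar_with_budget(grid: List[str], start: Cell, goal: Cell,
                      budget: int) -> Tuple[Optional[List[Cell]], int]:
    """A* on a 4-connected grid ('#' = wall) that gives up after `budget` expansions.
    Returns (path or None, number of states expanded)."""
    def h(c: Cell) -> int:
        # Manhattan distance: admissible and consistent with unit step costs.
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    g_cost: Dict[Cell, int] = {start: 0}
    expanded = 0
    while frontier and expanded < budget:
        _, g, cell = heapq.heappop(frontier)
        if g > g_cost[cell]:
            continue  # stale entry: a cheaper route to this cell was found later
        expanded += 1
        if cell == goal:
            path: List[Cell] = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], expanded
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None, expanded  # budget exhausted or goal unreachable

maze = ["....",
        ".##.",
        "...."]
path, n_expanded = astar_with_budget(maze, (0, 0), (2, 3), budget=100)
print(len(path))  # 6
print(astar_with_budget(maze, (0, 0), (2, 3), budget=1)[0])  # None
```

Under a tight budget the symbolic search simply fails, which is the regime where the thread reports System-1.x pulling ahead; at a generous budget A* finds an optimal path.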

5/7

Train-time controllability: We train a System-1.75 Planner that, compared to a System-1.5 Planner, trades off efficiency for greater accuracy. This trend can be continued to recover the full System-2 performance.

Generalizability: System-1.x also generalizes to different search algos (BFS, DFS, A*) & exhibits exploration behavior that closely resembles the corresponding algo it is trained on.

6/7
Here's a nice illustration of System-1.x for Maze:

System-1 ignores obstacles

fail

System-2 reaches a dead end + search does not terminate

fail

System-1.x decomposes the problem into sub-goals & solves the hard middle sub-goal (the one with obstacles) using System-2

success

In Blocksworld, we also show an example of sub-goal decomposition benefiting System-1.x.

7/7
Paper link: [2407.14414] System-1.x: Learning to Balance Fast and Slow Planning with Language Models
Code link: GitHub - swarnaHub/System-1.x: PyTorch code for System-1.x: Learning to Balance Fast and Slow Planning with Language Models

work done w/ @ArchikiPrasad @cyjustinchen @peterbhase @EliasEskin @mohitban47
@uncnlp @UNC

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

bnew · Sep 5, 2024

1/4
Apple presents SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Achieves performance comparable to or better than SotA Video LLMs that are fine-tuned on video datasets, while being training-free

[2407.15841] SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

2/4
Dark mode for this paper for those who read at night

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

3/4
AI Summary: The paper introduces SlowFast-LLaVA (SF-LLaVA), a training-free video large language model that effectively captures detailed spatial semantics and long-range temporal context without exceeding t...
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
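The token-budget trade-off behind a slow/fast design can be illustrated with simple arithmetic. This is a sketch of the general two-stream idea only (a slow stream with few frames and light spatial pooling, a fast stream with many frames and heavy pooling); every frame count, grid size, and pooling factor below is hypothetical, not SF-LLaVA's configuration.

```python
from typing import List

def uniform_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly across a clip of num_frames."""
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

def tokens_per_frame(grid_h: int, grid_w: int, pool: int) -> int:
    """Visual tokens left after non-overlapping pool x pool spatial pooling."""
    return (grid_h // pool) * (grid_w // pool)

GRID = 24        # hypothetical: 24x24 visual tokens per frame from the image encoder
NUM_FRAMES = 48  # hypothetical clip length

slow_idx = uniform_frame_indices(NUM_FRAMES, 8)           # few frames, keep spatial detail
fast_idx = uniform_frame_indices(NUM_FRAMES, NUM_FRAMES)  # every frame, pooled hard
slow_tokens = len(slow_idx) * tokens_per_frame(GRID, GRID, pool=2)
fast_tokens = len(fast_idx) * tokens_per_frame(GRID, GRID, pool=8)
naive_tokens = NUM_FRAMES * tokens_per_frame(GRID, GRID, pool=1)
print(slow_tokens, fast_tokens, naive_tokens)  # 1152 432 27648
```

Under these made-up settings the two streams together use 1,584 tokens, versus 27,648 for feeding every frame at full resolution, which is the kind of saving that lets spatial detail and long-range temporal context share one LLM context window.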

4/4
that's f crazy


bnew · Sep 5, 2024

1/2
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

- Evaluates the ability of web agents to solve realistic and time-consuming tasks.
- Includes 214 tasks from 258 different websites

proj: AssistantBench
abs: [2407.15711] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

2/2
AI Summary: The paper introduces AssistantBench, a new benchmark featuring 214 realistic and time-consuming tasks that assess the capabilities of web agents built on language models. The study reveals sig...
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?


bnew · Sep 5, 2024

1/1
Google presents Still-Moving

- Tunes any video model on still images without losing the motion prior
- New SOTA in video personalization and stylization; also enables combination with ControlNet

proj: StillMoving
abs: [2407.08674] Still-Moving: Customized Video Generation without Customized Video Data


bnew · Sep 5, 2024

1/6
We've opened the waitlist for General Robot Intelligence Development (GRID) Beta! Accelerate robotics dev with our open, free & cloud-based IDE. Zero setup needed. Develop & deploy advanced skills with foundation models and rapid prototyping

Scaled Foundations

2/6
GRID supports a wide range of robot form factors and sensors/modalities (RGB, depth, lidar, GPS, IMU and more) coupled with mapping and navigation for comprehensive and physically accurate sensorimotor development.

3/6
Curate data to train Robotics Foundation Models at scale with GRID. Create thousands of scenarios, across form factors, and robustly test model performance on both sim and real data.

4/6
GRID's LLM-based orchestration leverages foundation models to integrate various modalities, enabling robust sensorimotor capabilities, such as multimodal perception, generative modeling, and navigation, with zero-shot generalization to new scenarios.

5/6
Develop in sim, export and deploy skills on real robots in hours instead of months! GRID enables safe & efficient deployment of trained models on real robots, ensuring reliable performance in real-world scenarios. Example of safe exploration on the Robomaster & Go2:

6/6
Enable your robots to be useful with seamless access to foundation models, simulations and AI tools. Sign up for GRID Beta today and experience the future of robotics development!

Docs: Welcome to the GRID platform! — GRID documentation

(6/6)


bnew · Sep 5, 2024

1/3
The data and code for our recent project “Large Language Monkeys”, which characterizes inference-time scaling laws for LLMs, are now public!

For more details, check out our
Blog post: Monkey Business: a dataset of large LLM sample collections for math and code tasks!
Dataset: ScalingIntelligence/monkey_business · Datasets at Hugging Face
Code: GitHub - ScalingIntelligence/large_language_monkeys
Paper: [2407.21787] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

2/3
We are releasing 10k correct and incorrect LLM samples per problem for a variety of tasks (MiniF2F-MATH, CodeContests, MATH and GSM8K), and model sizes/families (Llama-3, Gemma, and Pythia), as well as the code needed to generate and evaluate samples for other models.

3/3
We hope this dataset and accompanying code can be useful for future research on improved verification methods (particularly in the large sample setting!), self-improvement methods, and investigating patterns between correct and incorrect model generations.
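For readers who want to work with the released samples, one common way to turn n samples per problem (c of them correct) into a coverage curve is the unbiased pass@k estimator popularized by OpenAI's Codex evaluation. The sketch assumes you have already extracted (n, c) counts per problem from the dataset; the loading step and field names are not shown and would depend on the dataset's actual schema.

```python
from math import comb
from typing import List, Tuple

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct), given n samples with c correct."""
    if n - c < k:
        return 1.0  # every subset of k samples contains at least one correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

def coverage(counts: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems: expected fraction of problems solved with k samples each."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)

counts = [(10, 2), (10, 0), (10, 10)]  # toy (n, c) pairs, not real data
print([round(coverage(counts, k), 3) for k in (1, 5, 10)])  # [0.4, 0.593, 0.667]
```

Python's exact integer `comb` keeps this correct even for n = 10,000; the estimator averages over all size-k subsets of the n samples rather than drawing one random subset, which is why it is unbiased.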


bnew · Sep 5, 2024

1/1
Alibaba/Qwen org was deplatformed by GitHub. Details are slim at the moment. Qwen is one of the best open models on the planet right now. Did the US or China submit claims to GitHub? Only those two nation-states have the power or the desire to do this. Hopefully I am wrong.


1/1
We are fukking back!!! Go visit our github now!


bnew · Sep 5, 2024

Mistral Large 2 is now 3rd on the SEAL Coding Leaderboard

bnew · Sep 5, 2024

1/1
New w/ @erinkwoo:

OpenAI has considered high-priced subscriptions for future Strawberry and Orion products.

In early internal discussions, subscription prices ranging up to $2K/month were on the table (though we doubt the final price will be that high)


bnew · Sep 5, 2024

1/1
As we move towards more powerful AI, it becomes urgent to better understand the risks in a mathematically rigorous and quantifiable way and use that knowledge to mitigate them. More in my latest blog entry where I describe our recent paper on that topic.
Bounding the probability of harm from an AI to create a guardrail - Yoshua Bengio


bnew · Sep 5, 2024

Ex-Google CEO's BANNED Interview LEAKED: "You Have No Idea What's Coming"

bnew · Sep 5, 2024

AI Information Doc

THE AI INFORMATION DOC HAS MOVED TO THIS URL: https://ai-doc-writer.github.io/ai_guide/

Micky Mikey · Sep 5, 2024

bnew said:
1/1
New w/ @erinkwoo:

OpenAI has considered high-priced subscriptions for future Strawberry and Orion products.

In early internal discussions, subscription prices ranging up to $2K/month were on the table (though we doubt the final price will be that high)


The next model better be damn good to justify such a subscription price. It does make you wonder if they have something that is drastically more powerful/reliable than their current SOTA models.

bnew · Sep 5, 2024

Research Forum 4 | Keynote: Phi-3-Vision: A highly capable and "small" language vision model

IIVI · Sep 5, 2024
