bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139






1/7
🚨 New: my last PhD paper 🚨

Introducing System-1.x, a controllable planning framework with LLMs. It draws inspiration from Dual-Process Theory, which argues for the co-existence of fast/intuitive System-1 and slow/deliberate System-2 planning.

System-1.x generates hybrid plans & balances between the two planning modes (efficient + inaccurate System-1 & inefficient + more accurate System-2) based on the difficulty of the decomposed (sub-)problem at hand.

Some exciting results+features of System-1.x:

-- performance: beats System-1, System-2 & a symbolic planner (A*) both ID and OOD (up to 39%), given an exploration budget.
-- training-time control/balance: user can train a System-1.25/1.5/1.75 to balance accuracy + efficiency.
-- test-time control/balance: user can bias the planner to solve more/less sub-goals using System-2.
-- flexibility to integrate symbolic solvers: allows building neuro-symbolic System-1.x with a symbolic System-2 (A*).
-- generalizability: can learn from different search algos (DFS/BFS/A*).

🧵👇

2/7
System-1.x consists of 3 components (trained using only search traces as supervision):

1⃣ a Controller, that decomposes a planning problem into easy+hard sub-goals
2⃣ a System-1 Planner, that solves easy sub-goals
3⃣ a System-2 Planner, that solves hard sub-goals w/ deliberate search
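
To make the division of labor concrete, here is a minimal sketch of how the three components could fit together on a grid maze. It is not the authors' implementation: in the paper all three components are trained LLMs, whereas here the fast planner is a hand-written greedy walk, the slow planner is plain BFS, and the controller's output is passed in by hand. Every name in the sketch (`system1_plan`, `system2_plan`, `hybrid_plan`, the `(start, goal, is_hard)` triple) is hypothetical.

```python
from collections import deque

def system1_plan(grid, start, goal):
    """Stand-in fast planner: walk straight toward the goal, no search.
    Returns None as soon as it runs into an obstacle (cell value 1)."""
    path, (r, c) = [start], start
    while (r, c) != goal:
        if r != goal[0]:
            r += 1 if goal[0] > r else -1
        else:
            c += 1 if goal[1] > c else -1
        if grid[r][c] == 1:
            return None
        path.append((r, c))
    return path

def system2_plan(grid, start, goal):
    """Stand-in slow planner: breadth-first search over free cells."""
    parents, queue = {start: None}, deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

def hybrid_plan(grid, subgoals):
    """subgoals: (start, goal, is_hard) triples, as a controller might emit.
    Easy sub-goals go to the fast planner, hard ones to the slow planner."""
    plan = []
    for start, goal, is_hard in subgoals:
        planner = system2_plan if is_hard else system1_plan
        segment = planner(grid, start, goal)
        if segment is None:
            return None
        # Skip the first cell of later segments: it repeats the previous goal.
        plan.extend(segment if not plan else segment[1:])
    return plan

grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
subgoals = [((0, 0), (0, 2), False), ((0, 2), (2, 2), True), ((2, 2), (2, 4), False)]
plan = hybrid_plan(grid, subgoals)
```

Only the middle sub-goal, which has to route around the wall, pays for search; the two easy segments are solved without exploring any extra states.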

3/7
To train the controller, we automatically generate sub-goal decompositions, given 2 things:

– a hybridization factor x, that decides Sys-1 to Sys-2 balance in the planner.
– a hardness function, estimating difficulty of a sub-goal

Then we solve a constrained optimization problem that finds the contiguous x% of steps in the plan that corresponds to the hardest sub-goal.
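
One simple way to read that optimization is as a sliding-window search: fix the window length at x times the plan length and pick the window with the highest total hardness. The sketch below assumes a per-step hardness score (for example, the number of obstacles near each step) and a hybridization factor given as a fraction in [0, 1]. Both the scoring and the function name `hardest_window` are illustrative assumptions, not the paper's exact formulation.

```python
def hardest_window(step_hardness, x):
    """Return (start, end) of the contiguous window covering a fraction x of
    the plan with the largest summed hardness; end is exclusive.
    Returns None when the window would be empty."""
    n = len(step_hardness)
    k = round(x * n)  # window length in steps
    if k == 0:
        return None
    window = sum(step_hardness[:k])
    best, best_start = window, 0
    for i in range(1, n - k + 1):
        # Slide by one step: add the entering score, drop the leaving one.
        window += step_hardness[i + k - 1] - step_hardness[i - 1]
        if window > best:
            best, best_start = window, i
    return best_start, best_start + k

# Hypothetical hardness scores for an 8-step plan.
scores = [0, 0, 1, 3, 2, 0, 0, 0]
hard_quarter = hardest_window(scores, 0.25)
hard_half = hardest_window(scores, 0.5)
```

The steps inside the returned window would form the hard sub-goal handed to System-2, with the steps before and after it left to System-1. Raising x widens the window and shifts the balance toward System-2.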

4/7
On both Maze Navigation & Blocksworld domains, we get exciting results both ID & OOD:

✅ System-1.x matches & generally outperforms System-1, System-2 & also a simpler hybrid w/o sub-goal decompositions at all state-exploration budgets by up to 33%.

✅ Neuro-symbolic System-1.x also typically outperforms symbolic search, beating A* by up to 39% at a fixed budget & matching it at max budget.

5/7
✅ Train-time controllability: We train a System-1.75 Planner that, compared to a System-1.5 Planner, trades off efficiency for greater accuracy. This trend can be continued to recover the full System-2 performance.

✅ Generalizability: System-1.x also generalizes to different search algos (BFS, DFS, A*) & exhibits exploration behavior that closely resembles the corresponding algo it is trained on.

6/7
Here's a nice illustration of System-1.x for Maze:

❌ System-1 ignores obstacles ➡️ fail

❌ System-2 reaches a dead end + search does not terminate ➡️ fail

✅ System-1.x decomposes into sub-goals & solves the middle hard sub-goal w/ obstacles using System-2 ➡️ success

In Blocksworld, we also show an example of sub-goal decomposition benefitting System-1.x.

7/7
Paper link: [2407.14414] System-1.x: Learning to Balance Fast and Slow Planning with Language Models
Code link: GitHub - swarnaHub/System-1.x: PyTorch code for System-1.x: Learning to Balance Fast and Slow Planning with Language Models

work done w/ @ArchikiPrasad @cyjustinchen @peterbhase @EliasEskin @mohitban47
@uncnlp @UNC


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139

1/4
Apple presents SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Comparable or better performance compared to SotA Video LLMs that are fine-tuned on video datasets while being training-free

[2407.15841] SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

2/4
Dark mode for this paper for those who read at night 🌚 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

3/4
AI Summary: The paper introduces SlowFast-LLaVA (SF-LLaVA), a training-free video large language model that effectively captures detailed spatial semantics and long-range temporal context without exceeding t...
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

4/4
that's f crazy


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139

1/2
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

- Evaluates the ability of web agents to solve realistic and time-consuming tasks.
- Includes 214 tasks from 258 different websites

proj: AssistantBench
abs: [2407.15711] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

2/2
AI Summary: The paper introduces AssistantBench, a new benchmark featuring 214 realistic and time-consuming tasks that assess the capabilities of web agents built on language models. The study reveals sig...
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139





1/6
We've opened the waitlist for General Robot Intelligence Development (GRID) Beta! Accelerate robotics dev with our open, free & cloud-based IDE. Zero setup needed. Develop & deploy advanced skills with foundation models and rapid prototyping

🔗: Scaled Foundations

🧵(1/6)

2/6
GRID supports a wide range of robot form factors and sensors/modalities (RGB, depth, lidar, GPS, IMU and more) coupled with mapping and navigation for comprehensive and physically accurate sensorimotor development.
(2/6)

3/6
Curate data to train Robotics Foundation Models at scale with GRID. Create thousands of scenarios, across form factors, and robustly test model performance on both sim and real data.
(3/6)

4/6
GRID's LLM-based orchestration leverages foundation models to integrate various modalities, enabling robust sensorimotor capabilities, such as multimodal perception, generative modeling, and navigation, with zero-shot generalization to new scenarios.
(4/6)

5/6
Develop in sim, export and deploy skills on real robots in hours instead of months! GRID enables safe & efficient deployment of trained models on real robots, ensuring reliable performance in real-world scenarios. Example of safe exploration on the Robomaster & Go2:
(5/6)

6/6
Enable your robots to be useful with seamless access to foundation models, simulations and AI tools. Sign up for GRID Beta today and experience the future of robotics development!

Docs: Welcome to the GRID platform! — GRID documentation

(6/6)


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139


1/3
The data and code for our recent project “Large Language Monkeys”, which characterizes inference-time scaling laws for LLMs, are now public!

For more details, check out our
Blog post: Monkey Business: a dataset of large LLM sample collections for math and code tasks!
Dataset: ScalingIntelligence/monkey_business · Datasets at Hugging Face
Code: GitHub - ScalingIntelligence/large_language_monkeys
Paper: [2407.21787] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

2/3
We are releasing 10k correct and incorrect LLM samples per problem for a variety of tasks (MiniF2F-MATH, CodeContests, MATH and GSM8K), and model sizes/families (Llama-3, Gemma, and Pythia), as well as the code needed to generate and evaluate samples for other models.

3/3
We hope this dataset and accompanying code can be useful for future research on improved verification methods (particularly in the large sample setting!), self-improvement methods, and investigating patterns between correct and incorrect model generations.
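
With many correct and incorrect samples per problem, a natural quantity to compute is the chance that at least one of k drawn samples is correct. The thread does not say which metric the project uses, so the sketch below is an assumption: it applies the standard unbiased pass@k estimator from code-generation evaluation, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of correct ones. The per-problem counts are made up for illustration and do not come from the released dataset, whose schema is not assumed here.

```python
from math import comb

def pass_at_k(n, c, k):
    """Probability that at least one of k samples, drawn without replacement
    from n samples of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

def coverage(correct_counts, n, k):
    """Average pass@k over problems, given the correct-sample count of each."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical counts: correct samples out of n = 10,000 for three problems.
counts = [0, 5, 2500]
for k in (1, 100, 10000):
    print(k, round(coverage(counts, 10000, k), 4))
```

A problem with zero correct samples contributes 0 at every k, so it caps the achievable coverage. A problem with only a handful of correct samples contributes more and more as k grows.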


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139

1/1
Alibaba/Qwen org was deplatformed by GitHub. Details are slim at the moment. Qwen is one of the best open models on the planet at the moment. Did either the US or China submit claims to GitHub? Only these two nation states have the power or desire for this. Hopefully I am wrong.





1/1
We are fukking back!!! Go visit our GitHub now!


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139

1/1
As we move towards more powerful AI, it becomes urgent to better understand the risks in a mathematically rigorous and quantifiable way and use that knowledge to mitigate them. More in my latest blog entry where I describe our recent paper on that topic.
Bounding the probability of harm from an AI to create a guardrail - Yoshua Bengio


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139
Ex-Google CEO's BANNED Interview LEAKED: "You Have No Idea What's Coming"


 

Micky Mikey

Veteran
Supporter
Joined
Sep 27, 2013
Messages
15,999
Reputation
2,972
Daps
89,254

1/1
New w/ @erinkwoo:

OpenAI has considered high-priced subscriptions for future Strawberry and Orion products.

In early internal discussions, subscription prices ranging up to $2K/month were on the table (though we doubt the final price will be that high)




The next model better be damn good to justify such a subscription price. It does make you wonder if they have something that is drastically more powerful/reliable than their current SOTA models.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,430
Reputation
8,509
Daps
160,139
Research Forum 4 | Keynote: Phi-3-Vision: A highly capable and "small" language vision model



 