The A.I Megathread (LLM , GPT , Development)

bnew · Jun 25, 2024

OpenAI’s ChatGPT for Mac is now available to all users

It supports pretty much everything but API calls.

arstechnica.com

Samuel Axon - 6/25/2024, 5:37 PM

[Image: A message field for ChatGPT pops up over a Mac desktop. Caption: The app lets you invoke ChatGPT from anywhere in the system with a keyboard shortcut, Spotlight-style. Credit: Samuel Axon]

OpenAI's official ChatGPT app for macOS is now available to all users for the first time, provided they're running macOS Sonoma or later.

It was previously being rolled out gradually to paid subscribers to ChatGPT's Plus premium plan.

The ChatGPT Mac app mostly acts as a desktop window version of the web app, allowing you to carry on back-and-forth prompt-and-response conversations. You can select between the GPT-3.5, GPT-4, and GPT-4o models. It also supports the more specialized GPTs available in the web version, including the DALL-E image generator and custom GPTs.

There is one important omission in this desktop app: It doesn't support using the API. For that, those wanting a desktop app will still need to use a third-party one like Jordi Bruin's MacGPT.

OpenAI's app lets you enable a system-wide keyboard shortcut (Option + Space by default) to type in a prompt at any time; it works a bit like opening a Spotlight search in macOS.

The app was announced alongside the GPT-4o model, which is faster and cheaper to use than GPT-4 with similar (albeit not exactly the same) accuracy and quality, and which expands the model's ability to interact with images and videos.

At the same time it unveiled GPT-4o and the Mac app, OpenAI demonstrated a new, conversational approach to voice chatting with ChatGPT. That's not yet widely available, but it's said to be coming soon. For now, the Mac app supports the old style of back-and-forth voice chats with the chatbot.

The Mac app is unavailable in the Mac App Store, but you can download it directly from OpenAI's website.

Note that, like the iPhone or Android ChatGPT app, this is distinct from the ChatGPT integration coming to Apple's operating systems this fall. That integration will see Siri referring users to ChatGPT (and possibly alternative models in the future) to answer queries that are outside Siri's usual scope, and it will be baked into the operating system.

There's still no Windows app. Why not? At least one report claimed that OpenAI prioritized a Mac app over a Windows app "because that's where most of its users are." Of course, Windows users aren't hurting for AI chatbot options, as Microsoft has been throwing ChatGPT-powered Copilot into everything it can of late.

bnew · Jun 26, 2024

1/11
Can long-context language models (LCLMs) subsume retrieval, RAG, SQL, and more?

Introducing LOFT: a benchmark stress-testing LCLMs on million-token tasks like retrieval, RAG, and SQL. Surprisingly, LCLMs rival specialized models trained for these tasks!

[2406.13121] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

2/11
LOFT is a massive benchmark evaluating LCLMs on 30+ real-world retrieval & reasoning datasets across text, image, video, & audio. LOFT supports sequence lengths up to 1 million tokens (and possibly more!).

3/11
To perform corpus-grounded reasoning, we introduce Corpus-in-Context prompting, which seamlessly integrates a corpus, instructions, and few-shot examples for LOFT tasks. Prompting strategies significantly influence LCLM performance, highlighting the need for continued research.
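A rough sketch of what a Corpus-in-Context style prompt could look like, going only by the description above (corpus + instructions + few-shot examples in one prompt). The function name `build_cic_prompt`, the section markers, and the `ID: ... | CONTENT: ...` layout are all made up for illustration; the paper and the LOFT repo define the actual format.

```python
from typing import List, Sequence, Tuple


def build_cic_prompt(
    instruction: str,
    corpus: Sequence[str],
    few_shot: Sequence[Tuple[str, Sequence[int]]],
    query: str,
) -> str:
    """Assemble one long prompt holding the instruction, the whole corpus,
    a few worked examples, and the final query (hypothetical layout)."""
    lines: List[str] = [instruction, "", "== Corpus =="]

    # Give every document an explicit ID so the model can answer by ID.
    for doc_id, text in enumerate(corpus):
        lines.append(f"ID: {doc_id} | CONTENT: {text}")

    # Few-shot examples are grounded in the same corpus IDs.
    lines += ["", "== Examples =="]
    for example_query, answer_ids in few_shot:
        lines.append(f"Query: {example_query}")
        lines.append(f"Answer IDs: {list(answer_ids)}")

    # The real query goes last and is left open for the model to complete.
    lines += ["", "== Task ==", f"Query: {query}", "Answer IDs:"]
    return "\n".join(lines)


prompt = build_cic_prompt(
    instruction="Return the IDs of the documents that answer the query.",
    corpus=["Paris is the capital of France.", "The Nile is a river in Africa."],
    few_shot=[("Which document mentions a river?", [1])],
    query="Which document mentions a capital city?",
)
print(prompt)
```

The resulting string would be sent to whichever long-context model is being tested; with a real corpus it is the corpus section that pushes the prompt toward the million-token range.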

4/11
Our findings show that LCLMs can already achieve retrieval performance comparable to specialized systems like Gecko and CLIP. However, challenges remain in areas like multi-hop compositional reasoning.

5/11
Check out our paper for more details on the LOFT benchmark and the CiC prompting! In our paper, we also detail interesting ablation studies for the CiC prompting.
Paper: [2406.13121] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Data: GitHub - google-deepmind/loft: LOFT: A 1 Million+ Token Long-Context Benchmark

6/11
This was an amazing collaboration by:
@leejnhk @_anthonychen @ZhuyunDai @ddua17 @Devendr06654102 @MichaelBoratko @YiLuan9 @seba1511 @vincentperot @siddalmia05 @Hexiang_Hu @Xudong_Lin_AI @IcePasupat @amini_aida @jeremy_r_cole @riedelcastro @IftekharNaim @mchang21 @kelvin_guu

7/11
Dark mode for this paper for those who read at night

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

8/11
Dark mode for this paper

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

9/11
The results show that at the 128k token level, the LLMs can rival the performance of specialized models on text retrieval, visual retrieval, and audio retrieval tasks. However, the LLMs lag significantly behind specialized models on complex multi-hop reasoning and SQL-like tasks, indicating substantial room for improvement in these areas.

full paper: Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

10/11

11/11
[QA] Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

bnew · Jun 26, 2024

1/11

New paper out! ‘Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data’ https://arxiv.org/pdf/2406.13843

Multimodal GenAI tools offer transformative potential across industries, but their potential for misuse also poses significant risks. (1/n)

2/11
Yet, we lack a holistic framework to understand how these GenAI tools are exploited ‘in the wild’ and which tactics are most prevalent.

We tackle this in our new paper.

3/11
In the paper, we a) introduce a taxonomy of GenAI misuse tactics, b) report key trends from an analysis of ~200 media reports of misuse between January 2023 and March 2024.

4/11
We find that:

(1) Manipulation of human likeness (i.e., impersonation and sockpuppeting) and falsification of evidence are the most common tactics used in real-world cases of GenAI misuse.

5/11
(2) While fears of sophisticated adversarial attacks have dominated public discourse, misuse actors tend to leverage easily accessible GenAI capabilities that require minimal technical expertise, rather than relying on complex attacks or advanced system manipulation.

6/11
(3) These misuses were primarily aimed at shaping public opinion, especially through defamation and manipulation of political perceptions, and at facilitating scams, fraud and quick monetization schemes.

7/11
(4) Many of the tactics identified are neither overtly malicious nor explicitly violate these tools’ content policies but still raise significant ethical concerns, esp. for trust, authenticity, and the integrity of information ecosystems.

8/11
Addressing these challenges will require not only technical advancements, but a multi-faceted approach to interventions, involving collaboration between policymakers, researchers, industry leaders, and civil society.

We highlight these implications in our discussion.

9/11
You can read the full paper here: https://arxiv.org/pdf/2406.13843 Congrats to all co-authors Rachel Xu, Rasmi Elasmar, @IasonGabriel, @_BGoldberg, @wsisaac

10/11
Dark mode for this paper for night readers

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

11/11
Congrats on publishing! Gonna check this out. Hope you are well!

bnew · Jun 26, 2024

1/2
Watch @demishassabis and @matthewclifford discuss how AI can accelerate scientific discovery and how multimodality puts us on the path to human-level AI. Demis and Matt—thank you for your insights.

2/2
See the full interview from Stripe Tour London: A conversation with Google DeepMind's Demis Hassabis

bnew · Jun 26, 2024

1/1
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Presents a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks

proj: BigCodeBench Leaderboard
abs: [2406.15877] BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

bnew · Jun 26, 2024

1/4
Long Context Transfer from Language to Vision

- Can process 2000 frames or over 200K visual tokens
- SotA perf on VideoMME among 7B-scale models

abs: [2406.16852] Long Context Transfer from Language to Vision
repo: GitHub - EvolvingLMMs-Lab/LongVA: Long Context Transfer from Language to Vision

2/4
Dark mode for this paper for night readers

Long Context Transfer from Language to Vision

3/4
AI Summary: The paper introduces a method called long context transfer to enable Large Multimodal Models (LMMs) to understand extremely long videos by extrapolating the context length of the language backbon...
Long Context Transfer from Language to Vision

4/4
Didn't get it. don't they need more training time?

bnew · Jun 26, 2024

1/3
Video-Infinity: Distributed Long Video Generation

Can generate super long videos, up to 2300 frames within 5 mins by Clip parallelism and Dual-scope attention

proj: Video-Infinity
abs: [2406.16260] Video-Infinity: Distributed Long Video Generation
repo: GitHub - Yuanshi9815/Video-Infinity: Video-Infinity generates long videos quickly using multiple GPUs without extra training.

2/3
Dark mode for this paper for those who read at night

Video-Infinity: Distributed Long Video Generation

3/3
Cool but their demo page shows videos with no temporal consistency

bnew · Jun 26, 2024

1/1
Google presents WARP: On the Benefits of Weight Averaged Rewarded Policies

- Merges policies in the weight space at three distinct stages
- Gemma policies w/ WARP outperform other open-source LLMs

[2406.16768] WARP: On the Benefits of Weight Averaged Rewarded Policies
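
For anyone wondering what "merging policies in the weight space" means mechanically, here is a minimal sketch of the simplest version: a plain weighted average of the parameters of several models that share one architecture. This is not the paper's procedure; the post does not spell out its three stages, so the function `average_weights`, the dict-of-lists parameter format, and the uniform default coefficients are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Sequence

Params = Dict[str, List[float]]


def average_weights(
    policies: Sequence[Params],
    coeffs: Optional[Sequence[float]] = None,
) -> Params:
    """Merge models by a weighted average of each parameter tensor.

    Assumes every policy has the same parameter names and shapes
    (here, flat lists of floats stand in for tensors).
    """
    if not policies:
        raise ValueError("need at least one policy")
    if coeffs is None:
        # Default to a uniform average (an assumption, not the paper's choice).
        coeffs = [1.0 / len(policies)] * len(policies)
    if len(coeffs) != len(policies):
        raise ValueError("one coefficient per policy is required")

    merged: Params = {}
    for name, reference in policies[0].items():
        merged[name] = [
            sum(c * p[name][i] for c, p in zip(coeffs, policies))
            for i in range(len(reference))
        ]
    return merged


policy_a = {"layer.weight": [1.0, 3.0]}
policy_b = {"layer.weight": [3.0, 5.0]}
print(average_weights([policy_a, policy_b]))  # {'layer.weight': [2.0, 4.0]}
```

The point is that the merge happens on the parameters themselves rather than on model outputs, so the result is a single model with the same inference cost as any one of its inputs.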

bnew · Jun 26, 2024

1/1
We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)!

code: GitHub - uclaml/SPPO: The official implementation of Self-Play Preference Optimization (SPPO)

models: SPPO - a UCLA-AGI Collection

1/11
Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard.

Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior performance to GPT-4 0613 without relying on any GPT-4 responses.

Explore the roadmap of LLM fine-tuning techniques:
Supervised Fine-tuning: SFT --> SPIN
Preference Fine-tuning: PPO --> DPO --> SPPO

Paper: https://arxiv.org/pdf/2405.00675

2/11
Joint work with @FrankYueWu1 @EdwardSun0909 @HuizhuoY @Kaixuan_Ji_19 and Yiming Yang.

3/11
For more details about SPPO, please refer to:

4/11
Also check out this insightful tweet breaking down SPPO in detail!

5/11
Will you open source your code?

6/11
Like SPIN, we will open source code and model weights.

7/11
Looks cool! Is it open? Would like to see how it performs on Chatbot Arena.

8/11
Thank you, Ying! We're excited to evaluate it on Chatbot Arena. We'll be opening the model weights soon and will definitely need your help :smile:

9/11
I'm not sure why SFT evolves to SPIN on your tweet? SPIN is an iterative RLHF built on an already SFT'd model

10/11
I'm sorry to say that in a head-to-head with Llama-8b-Instruct on my own instruction-following benchmark this model is **much** worse at following instructions, and the reasoning is worse as well. I suspect there may be overfitting to the structure of AlpacaEval.

11/11
quick q: are the results in the paper based on PairRM soft-prob in the loss func or the hard reward of 0/1? From page 11 it looks like you select only the best and worst response using pairRM but it's not clear if the pairRM scores are part of the loss function

bnew · Jun 26, 2024

1/2
Tencent and Huawei present Text-Animator: Controllable Visual Text Video Generation

Demonstrates the superiority of their approach to the accuracy of generated visual text over state-of-the-art video generation methods

abs: [2406.17777] Text-Animator: Controllable Visual Text Video Generation
proj: Text-Animator

2/2
Will this be like a Google search function, where you switch a tab - like going to image search but this would be AI generated search or something?

bnew · Jun 26, 2024

1/2
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

- outperforms existing MLLMs of comparable parameter sizes
- ranges from 3.8B to 34B

proj: MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
abs: [2406.17770] MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

2/2
Dark mode for this paper for night readers

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

bnew · Jun 26, 2024

1/2
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

- breaks memorization down into a taxonomy
- uses it to construct a predictive model for memorization
- finds that different factors influence the likelihood of memorization differently

[2406.17746] Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

2/2

bnew · Jun 26, 2024

1/1
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Significantly improves the multimodal math capabilities of LLaVA-1.5, achieving a 19-point increase and comparable performance to GPT-4V

[2406.17294] Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

bnew · Jun 26, 2024

1/3
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Achieves SotA performances and serves as a comprehensive, open cookbook for instruction-tuned MLLMs

proj: Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
abs: [2406.16860] Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
model: nyu-visionx (NYU VisionX)
repo: GitHub - cambrian-mllm/cambrian: Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

2/3
Wut
(anyway, good paper)

3/3
I didn't like this paper because they are showing off by the casual mention of kaiming he
