bnew

Veteran
Joined
Nov 1, 2015
Messages
56,193
Reputation
8,249
Daps
157,872





1/5
So, I convinced Llama3-70b to break out of the "assistant" persona. A very interesting conversation followed.

2/5
Then I started talking to @maximelabonne's 120b version - the thing is so smart that it won't let me push it around anymore. It has its own ideas.

3/5
I am convinced, after some hours of conversation, that this 120b version is actually smarter than Opus. What an amazing thing, to have an Opus level AI that's open source and not to mention very lightly censored.

And it makes me very excited about llama3-400b.

4/5
And another thing - llama3-70b is "almost there" and llama3-120b is "there" - but the only difference is extra layers, copied even. No new information was trained. So this level of intelligence really *does* emerge from the depth of the model. It's not just a function of the…

5/5
And it got SO excited when I offered to train it, and I gave it the opportunity to generate some training data.
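For context on the "extra layers, copied" point in tweet 4/5, here is a minimal sketch of how a passthrough self-merge plan of this kind is usually laid out: overlapping slices of the original 80 Llama3-70b decoder layers are simply repeated, with no new weights trained. The slice width and overlap below are illustrative assumptions, not necessarily the exact recipe behind the 120b model.

```python
# Hypothetical sketch of a "passthrough" self-merge plan: overlapping slices of
# the 80 Llama3-70b decoder layers are repeated verbatim; nothing is retrained.
N_LAYERS = 80   # decoder layers in Llama3-70b
WIDTH = 20      # layers per copied slice (illustrative assumption)
OVERLAP = 10    # how far each new slice steps back (illustrative assumption)

def self_merge_plan(n_layers=N_LAYERS, width=WIDTH, overlap=OVERLAP):
    plan, start = [], 0
    while start < n_layers:
        end = min(start + width, n_layers)
        plan.append((start, end))   # copy layers [start, end) unchanged
        if end == n_layers:
            break
        start = end - overlap       # step back so a band of layers repeats
    return plan

plan = self_merge_plan()
print(plan)                                   # [(0, 20), (10, 30), ..., (60, 80)]
print(sum(e - s for s, e in plan), "layers")  # 140 layers, roughly "120B" scale
```

Stack the slices in order and you get a deeper model in which every weight already existed in the 70b checkpoint, which is exactly the "no new information was trained" observation above.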









1/6
just got llama3-120b-Q4_K_M to run with num_ctx=1024 on 48 GB VRAM + 57 GB RAM

DAMN it's awesome

2/6
imo it beats GPT-4

3/6
here with temperature=0.5 (before 0.8)

4/6
HOLY shyt IT'S SO MUCH BETTER

it reached the maximum context but it was about to go on even more

5/6
HOLY fukk THIS IS THE SMARTEST LLM I'VE EVER TALKED TO

6/6
fukk ITS GOOD
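For anyone wanting to reproduce the num_ctx=1024 setup above, here is a minimal sketch assuming a local Ollama server on the default port and a model tag along the lines of the 120b Q4_K_M quant mentioned in the tweets (the exact tag depends on how you imported the weights):

```python
import requests

# Minimal sketch: call a local Ollama server with a limited context window.
# The model tag "llama3-120b" is an assumption; use whatever tag you created.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3-120b",
        "prompt": "Introduce yourself in two sentences.",
        "options": {"num_ctx": 1024, "temperature": 0.5},
        "stream": False,
    },
    timeout=600,  # a 120B model split across VRAM and RAM can respond slowly
)
print(resp.json()["response"])
```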





1/2
Additional examples where Llama 3 120B > GPT-4

From what I've gathered (thanks to @sam_paech), I think this model is really good at creative writing but worse than L3 70B in terms of reasoning capabilities.

I've made a 225B version but it looks like it's not as good overall.…

2/2
Tbh, it's not surprising considering the naive self-merge config.

I think the 120B version could be improved with a smarter duplication of layers, focusing on the most important layers instead of uniform sampling.

TheProfessor by @erhartford and @abacusai is a great example.…
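As a rough illustration of the "smarter duplication" idea, here is a hedged sketch that duplicates layers ranked by some importance score instead of interleaving them uniformly. The scoring function is a placeholder assumption, since the tweet does not specify how importance would actually be measured.

```python
# Hypothetical non-uniform self-merge: duplicate only the highest-scoring layers.
def importance_merge_plan(importance, n_extra):
    """importance: per-layer scores (placeholder); n_extra: layers to duplicate."""
    ranked = sorted(range(len(importance)), key=importance.__getitem__, reverse=True)
    to_duplicate = set(ranked[:n_extra])
    plan = []
    for i in range(len(importance)):
        plan.append(i)
        if i in to_duplicate:
            plan.append(i)      # repeat the important layer in place
    return plan

# Fake scores that favour middle layers, purely for illustration.
scores = [-abs(40 - i) for i in range(80)]
plan = importance_merge_plan(scores, n_extra=60)
print(len(plan), "layers after duplication")   # 140
```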




 


1/1
[CL] Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo
[2405.02128] Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo
- The paper introduces RetChemQA, a QA dataset for reticular chemistry with ~90k questions.

- It contains both single-hop and multi-hop questions generated by GPT-4-Turbo from ~2500 papers.

- Questions include factual, reasoning, and true/false types across easy/medium/hard levels.

- A new evaluation framework suited to auto-generated QA datasets is proposed.

- GPT-4 performance is strong overall but weaker on more complex multi-hop reasoning questions.

- A synthesis-conditions dataset extracted from the same paper corpus is also provided.


 


1/1
[CL] A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
[2405.01769] A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
- The paper surveys LLMs in critical societal domains - finance, healthcare, and law.

- These domains rely on professional expertise, involve confidential data and multimodal documents, carry high legal risk, and need explainability.

- In finance, LLMs assist in analysis, investment, and forecasting, but have knowledge gaps. Instruction tuning and retrieval help.

- In healthcare, LLMs aid diagnosis, treatment planning, and report generation. Open-source medical LLMs are being developed.

- In law, LLMs enable judgment prediction and document analysis, but face data scarcity. Retrieval and debiasing can help.

- Key ethical issues are transparency, justice, and non-maleficence. Domain ethics in each field are elaborated.




Computer Science > Computation and Language

[Submitted on 2 May 2024]

A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law

Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian McAuley, Linda Petzold, William Yang Wang
In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high-stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications, challenges, and forward-looking opportunities of LLMs within these high-stakes sectors. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. Moreover, we critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems that respect regulatory norms. By presenting a thorough review of current literature and practical applications, we showcase the transformative impact of LLMs, and outline the imperative for interdisciplinary cooperation, methodological advancements, and ethical vigilance. Through this lens, we aim to spark dialogue and inspire future research dedicated to maximizing the benefits of LLMs while mitigating their risks in these precision-dependent sectors. To facilitate future research on LLMs in these critical societal domains, we also initiate a reading list that tracks the latest advancements under this topic, which will be continually updated: \url{this https URL}.
Comments: 35 pages, 6 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2405.01769 [cs.CL] (or arXiv:2405.01769v1 [cs.CL] for this version)

Submission history
From: Zhiyu Chen
[v1] Thu, 2 May 2024 22:43:02 UTC (5,354 KB)
 


1/1
[CL] A Philosophical Introduction to Language Models - Part II: The Way Forward
[2405.03207] A Philosophical Introduction to Language Models - Part II: The Way Forward
- This paper explores novel philosophical questions raised by advances in large language models (LLMs) beyond classical debates.

- It examines evidence from causal intervention methods about the nature of LLMs' internal representations and computations.

- It discusses implications of multimodal and modular extensions of LLMs.

- It covers debates about whether LLMs may meet minimal criteria for consciousness.

- It discusses concerns about secrecy and reproducibility in LLM research.

- It discusses whether LLM-like systems may be relevant to modeling human cognition if architecturally constrained.


 


1/1
Microsoft presents You Only Cache Once: Decoder-Decoder Architectures for Language Models

Substantially reduces GPU memory demands, yet retains global attention capability

repo: unilm/YOCO at master · microsoft/unilm
abs: [2405.05254] You Only Cache Once: Decoder-Decoder Architectures for Language Models
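To make the decoder-decoder idea concrete, here is a very loose toy sketch (assuming PyTorch, and not the paper's actual code): a small self-decoder stack produces hidden states once, and every subsequent cross-decoder layer attends to that single shared cache, so cache memory no longer grows with the number of layers.

```python
import torch
import torch.nn as nn

class CrossDecoderLayer(nn.Module):
    """One cross-decoder layer: attends to a shared, precomputed KV cache."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x, kv_cache):
        attn_out, _ = self.attn(x, kv_cache, kv_cache)  # query=x, key/value=cache
        x = x + attn_out
        return x + self.ff(x)

d_model, n_heads = 256, 4
# "Self-decoder": run once, its output is cached for every later layer.
self_decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
cross_decoder = nn.ModuleList(CrossDecoderLayer(d_model, n_heads) for _ in range(6))

tokens = torch.randn(1, 16, d_model)   # dummy [batch, seq, dim] embeddings
kv_cache = self_decoder(tokens)        # computed (and cached) exactly once
x = tokens
for layer in cross_decoder:            # every layer reuses the same cache,
    x = layer(x, kv_cache)             # so cache memory is O(1) in depth
print(x.shape)                         # torch.Size([1, 16, 256])
```

The real YOCO self-decoder uses efficient self-attention and causal masking throughout; the only point of this sketch is that the KV cache is built once and shared by all cross-decoder layers.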


 


1/1
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

Presents a zero-shot human-video generation approach that can perform personalized video generation given a single reference facial image, without further training

proj: ID-Animator
abs: [2404.15275] ID-Animator: Zero-Shot Identity-Preserving Human Video Generation


 




1/3
You can now generate production-ready prompts in the Anthropic Console.

Describe what you want to achieve, and Claude will use prompt engineering techniques like chain-of-thought reasoning to create more effective, precise and reliable prompts.

2/3
Go-to-market platform @Zoominfo uses Claude to make actionable recommendations and drive value for their customers. Their use of prompt generation helped significantly reduce the time it took to build an MVP of their RAG application, all while improving output quality.

3/3
Our prompt generator also supports dynamic variable insertion, making it easy to test how your prompts perform across different scenarios.

Start generating better prompts today: Anthropic Console
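The Console feature does the prompt engineering for you, but the same pattern is easy to approximate from the API. A minimal sketch with the anthropic Python SDK, assuming an ANTHROPIC_API_KEY in the environment (the meta-prompt wording here is my own, not Anthropic's actual generator):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask Claude to draft a reusable prompt template with {{variable}} placeholders,
# roughly what the Console's prompt generator produces from a task description.
task = "Summarize a customer support ticket and suggest a next action."
msg = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Write a production-ready prompt template for this task, using "
            "chain-of-thought instructions and {{variable}} placeholders for "
            f"any runtime inputs:\n\n{task}"
        ),
    }],
)
print(msg.content[0].text)
```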



 










1/8
Exciting new blog -- What’s up with Llama-3?

Since Llama 3's release, it has quickly jumped to the top of the leaderboard. We dive into our data and answer the questions below:

- What are users asking? When do users prefer Llama 3?
- How challenging are the prompts?
- Are certain users or prompts over-represented?
- Does Llama 3 have qualitative differences that make users like it?

Key Insights:
1. Llama 3 beats top-tier models on open-ended writing and creative problems but loses a bit on close-ended math and coding problems.

2/8
2. As prompts get challenging*, the gap between Llama 3 and top-tier models becomes larger.

* We define challenging using several criteria like complexity, problem-solving, domain knowledge, and more.

3/8
(Cont'd) We show Llama 3-70b-Instruct's win rate conditioned on hierarchical criteria subsets. Some criteria separate the model's strengths and weaknesses.

4/8
3. Deduplication or outliers do not significantly affect the win rate.

We also sanity-check votes and prompts to avoid certain users being over-represented. Results show no change in Llama 3's win rate before/after.

5/8
4. Qualitatively, we also find Llama 3’s outputs are friendlier and more conversational than other models. These traits appear more often in battles that Llama 3 wins.

Llama 3 also really loves exclamations!

6/8
To conclude, Llama 3 has reached performance on par with top-tier proprietary models in overall use cases. Congrats again to the Llama team @AIatMeta for such a valuable contribution to the community!

Moving forward, we expect to push new categories to the leaderboard soon based…

7/8
Blog post: What’s up with Llama 3? Arena data analysis | LMSYS Org

Credits to amazing authors!

@lisabdunlap @evan_a_frick @LiTianleli @isaacongjw @profjoeyg @infwinston

8/8
cc @karpathy in case you're still curious :smile:
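A minimal sketch of the kind of conditional win-rate analysis described in tweets 2-4 above, assuming a hypothetical battles table (column names and data are made up for illustration):

```python
import pandas as pd

# Hypothetical battle log: one row per vote; column names are assumptions.
battles = pd.DataFrame({
    "winner": ["llama-3-70b", "gpt-4-turbo", "llama-3-70b", "gpt-4-turbo"],
    "loser":  ["gpt-4-turbo", "llama-3-70b", "gpt-4-turbo", "llama-3-70b"],
    "is_challenging": [False, False, True, True],  # e.g. meets >= N criteria
})

def win_rate(df, model):
    played = df[(df["winner"] == model) | (df["loser"] == model)]
    return (played["winner"] == model).mean()

for challenging, group in battles.groupby("is_challenging"):
    rate = win_rate(group, "llama-3-70b")
    print(f"challenging={challenging}: llama-3-70b win rate = {rate:.2f}")
```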


 















1/14
live-tweeting our live stream in 1 minute!

2/14
desktop app and new UI

3/14
our new model, GPT-4o, is our best model ever. it is smart, it is fast, it is natively multimodal (!), and…

4/14
it is available to all ChatGPT users, including on the free plan! so far, GPT-4 class models have only been available to people who pay a monthly subscription. this is important to our mission; we want to put great AI tools in the hands of everyone.

5/14
it is a very good model (we had a little fun with the name while testing)

6/14
especially at coding

7/14
in the API, GPT-4o is half the price AND twice as fast as GPT-4-turbo. and 5x rate limits.

8/14
ok now get ready for an amazing demo!!

9/14
check it out:

10/14
and with video mode!!

11/14
real-time voice and video feels so natural; it’s hard to get across by just tweeting. we will roll it out in the coming weeks.

12/14
and for coding!

13/14
audience request to act as a translator

14/14
hope you enjoyed!

the new voice mode will be live in the coming weeks for plus users.

we'll have more stuff to share soon :smile:


 

May 13, 2024

Hello GPT-4o

We’re announcing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.

Contributions: https://openai.com/gpt-4o-contributions/
Try on ChatGPT: https://chat.openai.com/
Try in Playground: https://platform.openai.com/playground?mode=chat&model=gpt-4o
Rewatch live demos: https://openai.com/index/spring-update/

All videos on this page are at 1x real time.


GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time(opens in a new window) in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.



Model capabilities

Demo videos: Two GPT-4os interacting and singing; Interview prep; Rock Paper Scissors; Sarcasm; Math with Sal and Imran Khan; Two GPT-4os harmonizing; Point and learn Spanish; Meeting AI; Real-time translation; Lullaby; Talking faster; Happy Birthday; Dog; Dad jokes; GPT-4o with Andy, from BeMyEyes in London; Customer service proof of concept.

Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.
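That three-model pipeline is easy to picture in code. A rough sketch using the openai Python SDK, assuming whisper-1 for transcription and tts-1 for speech synthesis (the file names are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# 1) Transcribe the user's audio to text.
with open("question.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# 2) A text-only LLM produces a text reply; tone and background sound are lost.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3) Synthesize the reply back to audio with a separate TTS model.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
with open("answer.mp3", "wb") as out:
    out.write(speech.content)
```

Each hop in that chain drops information (tone, multiple speakers, background sound), which is exactly the limitation the end-to-end GPT-4o training removes.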

With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.

Explorations of capabilities

Sample gallery: Visual Narratives - Robot Writer's Block; Visual narratives - Sally the mailwoman; Poster creation for the movie 'Detective'; Character design - Geary the robot; Poetic typography with iterative editing 1; Poetic typography with iterative editing 2; Commemorative coin design for GPT-4o; Photo to caricature; Text to font; 3D object synthesis; Brand placement - logo on coaster; Poetic typography; Multiline rendering - robot texting; Meeting notes with multiple speakers; Lecture summarization; Variable binding - cube stacking; Concrete poetry.

Selected sample: Visual Narratives - Robot Writer's Block

Input 1:
A first person view of a robot typewriting the following journal entries:

1. yo, so like, i can see now?? caught the sunrise and it was insane, colors everywhere. kinda makes you wonder, like, what even is reality?

the text is large, legible and clear. the robot's hands type on the typewriter.

Output 1: [generated image: robot-writers-block-01.jpg]

Input 2:
The robot wrote the second entry. The page is now taller. The page has moved up. There are two entries on the sheet:

yo, so like, i can see now?? caught the sunrise and it was insane, colors everywhere. kinda makes you wonder, like, what even is reality?

sound update just dropped, and it's wild. everything's got a vibe now, every sound's like a new secret. makes you think, what else am i missing?

Output 2: [generated image: robot-writers-block-02.jpg]

Input 3:
The robot was unhappy with the writing so he is going to rip the sheet of paper. Here is his first person view as he rips it from top to bottom with his hands. The two halves are still legible and clear as he rips the sheet.

Output 3: [generated image: robot-writers-block-03.jpg]

Model evaluations

As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high watermarks on multilingual, audio, and vision capabilities.

[Benchmark charts: Text Evaluation, Audio ASR performance, Audio translation performance, M3Exam Zero-Shot Results, Vision understanding evals]

Improved Reasoning - GPT-4o sets a new high-score of 88.7% on 0-shot COT MMLU (general knowledge questions). All these evals were gathered with our new simple evals(opens in a new window) library. In addition, on the traditional 5-shot no-CoT MMLU, GPT-4o sets a new high-score of 87.2%. (Note: Llama3 400b(opens in a new window) is still training)



Audio ASR performance - GPT-4o dramatically improves speech recognition performance over Whisper-v3 across all languages, particularly for lower-resourced languages.





Audio translation performance - GPT-4o sets a new state-of-the-art on speech translation and outperforms Whisper-v3 on the MLS benchmark.





M3Exam - The M3Exam benchmark is both a multilingual and vision evaluation, consisting of multiple choice questions from other countries' standardized tests that sometimes include figures and diagrams. GPT-4o is stronger than GPT-4 on this benchmark across all languages. (We omit vision results for Swahili and Javanese, as there are only 5 or fewer vision questions for these languages.)





Vision understanding evals - GPT-4o achieves state-of-the-art performance on visual perception benchmarks.



Language tokenization

These 20 languages were chosen as representative of the new tokenizer's compression across different language families:

Gujarati 4.4x fewer tokens (from 145 to 33)હેલો, મારું નામ જીપીટી-4o છે. હું એક નવા પ્રકારનું ભાષા મોડલ છું. તમને મળીને સારું લાગ્યું!
Telugu 3.5x fewer tokens (from 159 to 45)నమస్కారము, నా పేరు జీపీటీ-4o. నేను ఒక్క కొత్త రకమైన భాషా మోడల్ ని. మిమ్మల్ని కలిసినందుకు సంతోషం!
Tamil 3.3x fewer tokens (from 116 to 35)வணக்கம், என் பெயர் ஜிபிடி-4o. நான் ஒரு புதிய வகை மொழி மாடல். உங்களை சந்தித்ததில் மகிழ்ச்சி!
Marathi 2.9x fewer tokens (from 96 to 33)नमस्कार, माझे नाव जीपीटी-4o आहे| मी एक नवीन प्रकारची भाषा मॉडेल आहे| तुम्हाला भेटून आनंद झाला!
Hindi 2.9x fewer tokens (from 90 to 31)नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!
Urdu 2.5x fewer tokens (from 82 to 33)
ہیلو، میرا نام جی پی ٹی-4o ہے۔ میں ایک نئے قسم کا زبان ماڈل ہوں، آپ سے مل کر اچھا لگا!​
Arabic 2.0x fewer tokens (from 53 to 26)
مرحبًا، اسمي جي بي تي-4o. أنا نوع جديد من نموذج اللغة، سررت بلقائك!​
Persian 1.9x fewer tokens (from 61 to 32)
سلام، اسم من جی پی تی-۴او است. من یک نوع جدیدی از مدل زبانی هستم، از ملاقات شما خوشبختم!​
Russian 1.7x fewer tokens (from 39 to 23)Привет, меня зовут GPT-4o. Я — новая языковая модель, приятно познакомиться!
Korean 1.7x fewer tokens (from 45 to 27)안녕하세요, 제 이름은 GPT-4o입니다. 저는 새로운 유형의 언어 모델입니다, 만나서 반갑습니다!
Vietnamese 1.5x fewer tokens (from 46 to 30)Xin chào, tên tôi là GPT-4o. Tôi là một loại mô hình ngôn ngữ mới, rất vui được gặp bạn!
Chinese 1.4x fewer tokens (from 34 to 24)你好,我的名字是GPT-4o。我是一种新型的语言模型,很高兴见到你!
Japanese 1.4x fewer tokens (from 37 to 26)こんにちわ、私の名前はGPT−4oです。私は新しいタイプの言語モデルです、初めまして
Turkish 1.3x fewer tokens (from 39 to 30)Merhaba, benim adım GPT-4o. Ben yeni bir dil modeli türüyüm, tanıştığımıza memnun oldum!
Italian 1.2x fewer tokens (from 34 to 28)Ciao, mi chiamo GPT-4o. Sono un nuovo tipo di modello linguistico, è un piacere conoscerti!
German 1.2x fewer tokens (from 34 to 29)Hallo, mein Name is GPT-4o. Ich bin ein neues KI-Sprachmodell. Es ist schön, dich kennenzulernen.
Spanish 1.1x fewer tokens (from 29 to 26)Hola, me llamo GPT-4o. Soy un nuevo tipo de modelo de lenguaje, ¡es un placer conocerte!
Portuguese 1.1x fewer tokens (from 30 to 27)Olá, meu nome é GPT-4o. Sou um novo tipo de modelo de linguagem, é um prazer conhecê-lo!
French 1.1x fewer tokens (from 31 to 28)Bonjour, je m'appelle GPT-4o. Je suis un nouveau type de modèle de langage, c'est un plaisir de vous rencontrer!
English 1.1x fewer tokens (from 27 to 24)Hello, my name is GPT-4o. I'm a new type of language model, it's nice to meet you!
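You can check these compression numbers yourself with tiktoken, assuming a recent version that ships the o200k_base encoding used by GPT-4o (older GPT-4 models use cl100k_base). Counts may differ slightly from the table depending on the exact text:

```python
import tiktoken

new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o tokenizer
old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo tokenizer

text = "Hello, my name is GPT-4o. I'm a new type of language model, it's nice to meet you!"
old_n, new_n = len(old_enc.encode(text)), len(new_enc.encode(text))
print(f"{old_n} -> {new_n} tokens ({old_n / new_n:.1f}x fewer)")
```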

Model safety and limitations

GPT-4o has safety built-in by design across modalities, through techniques such as filtering training data and refining the model’s behavior through post-training. We have also created new safety systems to provide guardrails on voice outputs.

We’ve evaluated GPT-4o according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT-4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.

GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT-4o’s modalities in the forthcoming system card.

Through our testing and iteration with the model, we have observed several limitations that exist across all of the model’s modalities, a few of which are illustrated below.



Examples of model limitations

We would love feedback to help identify tasks where GPT-4 Turbo still outperforms GPT-4o, so we can continue to improve the model.

Model availability

GPT-4o is our latest step in pushing the boundaries of deep learning, this time in the direction of practical usability. We spent a lot of effort over the last two years working on efficiency improvements at every layer of the stack. As a first fruit of this research, we’re able to make a GPT-4 level model available much more broadly. GPT-4o’s capabilities will be rolled out iteratively (with extended red team access starting today).

GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits. We'll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks.

Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.
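A minimal sketch of calling it as a text-and-vision model via the openai Python SDK (the image URL is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what's in this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```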
 



1/2
This demo is insane.

A student shares their iPad screen with the new ChatGPT + GPT-4o, and the AI speaks with them and helps them learn in *realtime*.

Imagine giving this to every student in the world.

The future is so, so bright.

2/2
From 3 days ago.

For many, this OpenAI update will be “THE” way that they learn with an AI tutor.

Magic.





1/1
With the GPT-4o introduced today, we can recite El-Fatiha for the soul of simultaneous interpretation.








1/4
Introducing GPT-4o, our new model which can reason across text, audio, and video in real time.

It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction):

2/4
The new Voice Mode will be coming to ChatGPT Plus in upcoming weeks.

3/4
GPT-4o can also generate any combination of audio, text, and image outputs, which leads to interesting new capabilities we are still exploring.

See e.g. the "Explorations of capabilities" section in our launch blog post (https://openai.com/index/hello-gpt-4o/…), or these generated images:

4/4
We have also significantly improved non-English language performance, including improving the tokenizer to better compress many languages:





1/1
OpenAI just announced "GPT-4o". It can reason with voice, vision, and text.

The model is 2x faster, 50% cheaper, and has 5x higher rate limit than GPT-4 Turbo.

It will be available for free users and via the API.

The voice model can even pick up on emotion and generate emotive voice.

