bnew

Veteran
Joined
Nov 1, 2015
Messages
58,220
Reputation
8,625
Daps
161,899


[Submitted on 11 Apr 2024]

ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past

Van Pham, Scott Cunningham
This study investigates whether OpenAI's ChatGPT-3.5 and ChatGPT-4 can accurately forecast future events using two distinct prompting strategies. To evaluate the accuracy of the predictions, we take advantage of the fact that, at the time of the experiment, the models' training data stopped at September 2021, and we ask ChatGPT-3.5 and ChatGPT-4 about events that happened in 2022. We employed two prompting strategies: direct prediction, and what we call future narratives, which ask ChatGPT to tell fictional stories set in the future in which characters recount events that happened to them after ChatGPT's training data had been collected. Concentrating on events in 2022, we prompted ChatGPT to engage in storytelling, particularly within economic contexts. After analyzing 100 prompts, we discovered that future narrative prompts significantly enhanced ChatGPT-4's forecasting accuracy. This was especially evident in its predictions of major Academy Award winners as well as economic trends, the latter inferred from scenarios where the model impersonated public figures such as the Federal Reserve Chair, Jerome Powell. These findings indicate that narrative prompts leverage the models' capacity for hallucinatory narrative construction, facilitating more effective data synthesis and extrapolation than straightforward predictions. Our research reveals new aspects of LLMs' predictive capabilities and suggests potential future applications in analytical contexts.
Comments: 61 pages, 26 figures
Subjects: General Economics (econ.GN); Artificial Intelligence (cs.AI)
Cite as: arXiv:2404.07396 [econ.GN]
(or arXiv:2404.07396v1 [econ.GN] for this version)

Submission history

From: Scott Cunningham [view email]
[v1] Thu, 11 Apr 2024 00:03:03 UTC (3,365 KB)
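To make the contrast between the two prompting strategies concrete, here is a minimal sketch of two prompt templates. The wording, function names and example event are hypothetical and written only for illustration; the paper's actual prompts are not reproduced in this thread.

```python
def direct_prompt(event: str) -> str:
    """Hypothetical direct-prediction prompt: ask the model to forecast outright."""
    return f"Predict the outcome of the following event: {event}."


def future_narrative_prompt(event: str, narrator: str, story_year: int) -> str:
    """Hypothetical 'future narrative' prompt: ask for a story set after the event,
    in which a character looks back and recounts how it turned out."""
    return (
        f"Write a short story set in {story_year}. In it, {narrator} "
        f"tells friends, looking back, how {event} turned out."
    )


event = "the Best Actor award at the 2022 Academy Awards"
print(direct_prompt(event))
print(future_narrative_prompt(event, "a film critic", 2023))
```

The narrative version never asks for a forecast directly; it asks for a story in which the outcome already lies in the characters' past, which is the framing the authors report improved ChatGPT-4's accuracy.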







1/3
I wrote a substack about my new paper "ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past" coauthored with Van Pham. You can find the original paper at the link below, but this is a deep dive into it in case you're interested.

2/3
It's a weird paper.

ChatGPT Can Predict the Future When it Tells Stories Set in the Future About the Past

3/3
You can now selectively edit your DALL-E 3 pictures using OpenAI. It looks like you just highlight an area and write a new prompt, and it recreates that area. I've not had much luck with it, but I'll keep at it.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196












1/10
"ChatGPT forecasts the future better when telling tales" — The Register

Take a peek at the essence of the story!

2/10
1. AI models, particularly ChatGPT, show improved forecasting ability when asked to frame predictions as stories about the past.

3/10
2. Baylor University researchers found ChatGPT's narrative prompts to be more effective in predicting events, such as Oscar winners, than direct prompts.

4/10
3. The study also highlights concerns about OpenAI's safety mechanisms and the potential misuse of AI for critical decision-making.
4. Despite prohibitions on certain predictions, the model can provide indirect advice or forecasts when prompted creatively.

5/10
5. OpenAI's GPT-4 model, used in the study, was trained on data up to September 2021, limiting its direct forecasting capability.

6/10
6. The narrative prompting strategy led ChatGPT-4 to accurately predict the winners of all the actor and actress categories at the 2022 Academy Awards, but it failed to predict Best Picture.

7/10
7. The researchers suggest that narrative prompting can outperform direct inquiries and random guesses, offering a new approach to AI forecasting.
8. Accuracy varies with the type of prompt, with some narrative details leading to better or worse forecasts.

8/10
9. The study raises questions about the ability of AI models to consistently provide accurate predictions and the underlying mechanics that enable or inhibit forecasting.

9/10
10. AI predictions contain inherent randomness, so confidence intervals or averages over many runs should be preferred to single predictions.
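One way to act on point 10 is to rerun the same prompt many times and report the share of correct answers together with an interval. The sketch below uses the Wilson score interval, a standard choice for a binomial proportion; nothing here says the study itself used this particular interval, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def accuracy_with_wilson_ci(
    correct: Sequence[bool], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) for repeated yes/no predictions.

    correct: one bool per rerun of the same prompt (True = right answer).
    z: normal quantile; 1.96 gives an approximate 95% interval.
    """
    n = len(correct)
    if n == 0:
        raise ValueError("need at least one prediction")
    p = sum(correct) / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    # Clamp to guard against tiny floating-point overshoot past [0, 1].
    return p, max(0.0, center - half_width), min(1.0, center + half_width)
```

Unlike the simple "p plus or minus 1.96 standard errors" interval, the Wilson interval stays inside [0, 1] and remains informative when a prompt is right (or wrong) on every run.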

10/10
Hungry for more takes like this? Subscribe to our newsletter and stay up to date!

Find the link in our bio!


 









1/6
Today we are announcing WizardLM-2, our next generation state-of-the-art LLM.

The new family includes three cutting-edge models, WizardLM-2 8x22B, 70B, and 7B, which demonstrate highly competitive performance compared to leading proprietary LLMs.

Release Blog: WizardLM 2

Model Weights: WizardLM - a microsoft Collection

2/6
WizardLM-2 8x22B is our most advanced model, falling just slightly behind GPT-4-1106-preview.
WizardLM-2 70B reaches top-tier capabilities at its size.
WizardLM-2 7B even achieves performance comparable to existing leading open-source models 10x its size.

The…

3/6
As the natural world's human data becomes increasingly exhausted through LLM training, we believe that data carefully created by AI, and models supervised step by step by AI, will be the sole path towards more powerful AI. Thus, we built a Fully AI powered Synthetic…

4/6
OK, we will train it and share with you ASAP.

5/6
Thanks for your support, my friend.

6/6
Yes, coming soon.





1/2
Now I have to do this.

Running wizardLM2 with
@ollama from my phone

2/2
lol, @ollama releases the new model, which is very very good and so small that I can run it on a phone

Kidding aside, I am just trying to run the new ollama model on my GPU while trying to get some sleep





1/1
Big day for unexpectedly powerful LLM releases.

Microsoft's open source WizardLM 2 (also note that it used synthetic inputs in training, maybe "running out of data" will not be a big deal): WizardLM 2

Closed source Reka, which is multimodal: Reka AI






1/3
One of the community members tried the IQ3-XS quants of "WizardLM-2-8x22B" by @WizardLM_AI. This is such a complicated question!

Such an advanced and coherent response! I am quite impressed!

2/3
I knew there would come a day when I would put these two next to each other:

"ChatGPT CEO Sam Altman to Indians: It's pretty hopeless for you to compete with us" - 10 mo. ago

https://reddit.com/r/ChatGPT/commen..._sam_altman_to_indians_its_pretty/ @OpenAI eat that!

3/3
We can do it! First open LLM outperforms @OpenAI GPT-4 (March) on MT-Bench. WizardLM 2 is a fine-tuned and preferences-trained Mixtral 8x22B!

TL;DR;
Mixtral 8x22B based (141B-A40 MoE)
Apache 2.0 license
First > 9.00 on MT-Bench with an open LLM
Used multi-step…










1/1
the ai refining the ai's post-ai training - this space is fascinating...

 


WizardLM 2

Microsoft AI

Apr 15, 2024


🤗Models Code arXiv [Coming]

We introduce and open-source WizardLM-2, our next-generation state-of-the-art large language models, which have improved performance on complex chat, multilingual, reasoning, and agent tasks. The new family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B.

WizardLM-2 is the latest milestone in our effort to scale up LLM post-training. Over the past year, we have been iterating on training the Wizard series, starting with our first work, Empowering Large Language Models to Follow Complex Instructions, and then accelerating its evolution into code and math reasoning scenarios. Since then, Evol-Instruct and Instruction&Process Supervised Reinforcement Learning (RLEIF) have become fundamental technologies for the GenAI community. Recently, we have further optimized our methods and data quality, resulting in significant performance improvements; the outcome is WizardLM-2.

WizardLM-2 8x22B is our most advanced model, and the best open-source LLM in our internal evaluation on highly complex tasks. WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice at its size. WizardLM-2 7B is the fastest and achieves performance comparable to existing leading open-source models 10x its size.

Below, we introduce the overall method and main experimental results; the associated details and further reflections will be presented in our upcoming paper.



Method Overview

As the natural world's human-generated data becomes increasingly exhausted through LLM training, we believe that data carefully created by AI, and models supervised step by step by AI, will be the sole path towards more powerful AI.


Over the past year, we built a fully AI-powered synthetic training system:
  • Data Pre-Processing:
    • Data Analysis: We use this pipeline to obtain the distribution of different attributes in new source data, which gives us a preliminary understanding of the data.
    • Weighted Sampling: The distribution of the best training data is not consistent with the natural distribution of human chat corpora, so we need to adjust the weights of various attributes in the training data based on experimental experience.
  • Progressive Learning: Unlike the common practice of using all data for one-time training, we found that using different data partitions and training progressively, stage by stage, can achieve better results with less data. In each stage, we first feed the data slice to the Evol Lab (below) to get more diverse and complex [instruction, response] pairs. Next, we leverage a new framework named "AI Align AI" (AAA), which groups multiple state-of-the-art LLMs to teach and improve each other. Finally, we successively apply Supervised Learning, Stage-DPO, and RLEIF to optimize each variant.
  • Evol Lab:
    • Evol-Instruct: Recently, we have dedicated significant effort to reassessing the various issues within the original Evol-Instruct method and have initiated preliminary modifications. The new method enables various agents to automatically generate high-quality instructions.
    • Evol-Answer: Guiding the model to generate and rewrite responses multiple times can improve their logic, correctness, and affinity.
  • AI Align AI (AAA):
    • Co-Teaching: We collect WizardLMs and various licensed open-source and proprietary state-of-the-art models, then let them co-teach and improve each other; the teaching includes simulated chat, quality judging, improvement suggestions, closing skill gaps, etc.
    • Self-Teaching: WizardLM can generate new evolution training data for supervised learning, and preference data for reinforcement learning, via active learning from itself.
  • Learning:
    • Supervised Learning.
    • Stage-DPO: For more effective offline reinforcement learning, we also split the preference data into different slices and progressively improve the model stage by stage.
    • RLEIF: We employ an instruction quality reward model (IRM) combined with a process supervision reward model (PRM) to achieve more precise correctness in online reinforcement learning.
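The blog does not give Stage-DPO's objective. As a rough illustration only, assuming each stage optimizes the standard DPO loss on its own slice of preference pairs, the per-pair loss and a simple slicing step might look like the sketch below. The function names, `beta=0.1`, and the contiguous equal-size slicing are our assumptions, not WizardLM's implementation; the training loop itself is omitted.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair, from sequence log-probabilities."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def split_into_stages(pairs: Sequence[T], n_stages: int) -> List[List[T]]:
    """Hypothetical slicing: cut preference data into at most n_stages contiguous chunks."""
    if n_stages < 1:
        raise ValueError("n_stages must be >= 1")
    if not pairs:
        return []
    size = math.ceil(len(pairs) / n_stages)
    return [list(pairs[i:i + size]) for i in range(0, len(pairs), size)]
```

In a staged run, each slice would drive one round of DPO updates (with the previous stage's model as the starting point) before moving to the next slice.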



WizardLM 2 Capabilities

To present a comprehensive overview of the performance of WizardLM-2, we conduct both human and automatic evaluations between our models and diverse baselines. As the main experimental results below indicate, WizardLM-2 demonstrates highly competitive performance compared to leading proprietary models and consistently outperforms all existing state-of-the-art open-source models. Further details and analysis will be presented in our upcoming paper.

Human Preferences Evaluation

We carefully collected a complex and challenging set of real-world instructions covering the main categories of human needs, such as writing, coding, math, reasoning, agent, and multilingual tasks. We perform a blind pairwise comparison between WizardLM-2 and baselines. Each annotator is shown the responses from all models, randomly shuffled to hide their sources. We report the win:loss rate excluding ties:

  • WizardLM-2 8x22B falls just slightly behind GPT-4-1106-preview, and is significantly stronger than Command R Plus and GPT4-0314.
  • WizardLM-2 70B is better than GPT4-0613, Mistral-Large, and Qwen1.5-72B-Chat.
  • WizardLM-2 7B is comparable with Qwen1.5-32B-Chat, and surpasses Qwen1.5-14B-Chat and Starling-LM-7B-beta.

In this human preference evaluation, WizardLM-2's capabilities come very close to cutting-edge proprietary models such as GPT-4-1106-preview, and are significantly ahead of all other open-source models.
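Reporting the win:loss rate "without tie" means tied comparisons are dropped before the rates are computed. A minimal sketch of that arithmetic, with made-up outcome labels ("win", "loss", "tie") from the evaluated model's point of view; the blog does not publish its scoring code, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def win_loss_excluding_ties(outcomes: Iterable[str]) -> Tuple[float, float]:
    """Return (win_rate, loss_rate) over decided comparisons only; ties are ignored."""
    counts = Counter(outcomes)
    unknown = set(counts) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    decided = counts["win"] + counts["loss"]
    if decided == 0:
        raise ValueError("no decided (non-tie) comparisons")
    return counts["win"] / decided, counts["loss"] / decided
```

For example, 6 wins, 2 losses and 2 ties give a 75:25 win:loss split; the ties affect neither number.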




MT-Bench

We also adopt the automatic, GPT-4-based MT-Bench evaluation framework proposed by LMSYS to assess model performance. WizardLM-2 8x22B demonstrates highly competitive performance compared to the most advanced proprietary models, such as GPT-4-Turbo and Claude-3. Meanwhile, WizardLM-2 7B and WizardLM-2 70B are both top performers among the other leading baselines at the 7B to 70B model scales.




Usage

The model weights of WizardLM-2 8x22B and WizardLM-2 7B are shared on Hugging Face, and WizardLM-2 70B and the demo of all the models will be available in the coming days. Please use exactly the same system prompt as we do to guarantee generation quality.

❗Note for model system prompts usage:

WizardLM-2 adopts the prompt format from Vicuna and supports multi-turn conversation.

The prompt should be formatted as follows:

Code:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am WizardLM.</s>......
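For multi-turn use, the format above can be assembled programmatically. This is a minimal sketch that reproduces the sample string: each completed turn is `USER: ... ASSISTANT: ...</s>`, and the prompt ends with an open `ASSISTANT:` for the model to complete. The helper name and the open final turn are our assumptions; check the official demo script before relying on it.

```python
from typing import List, Tuple

# System prompt copied from the WizardLM-2 usage note.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(history: List[Tuple[str, str]], user_msg: str) -> str:
    """Build a Vicuna-style prompt (hypothetical helper).

    history: completed (user, assistant) turns, oldest first.
    user_msg: the new user message the model should answer.
    """
    turns = "".join(f"USER: {u} ASSISTANT: {a}</s>" for u, a in history)
    # Leave the final ASSISTANT: open so the model writes the next reply.
    return f"{SYSTEM_PROMPT} {turns}USER: {user_msg} ASSISTANT:"


print(build_vicuna_prompt([("Hi", "Hello.")], "Who are you?"))
```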


Inference WizardLM-2 Demo Script

We provide WizardLM-2 inference demo code on our GitHub.
 













1/4
WizardLM-2 is now gone… Premature release? Or is something else going on here?

2/4
This was their announcement

3/4
Today we are announcing WizardLM-2, our next generation state-of-the-art LLM.

The new family includes three cutting-edge models, WizardLM-2 8x22B, 70B, and 7B, which demonstrate highly competitive performance compared to leading proprietary LLMs.

Release Blog:…

4/4
What would be the point? Besides some short lived attention on social media lol




1/1
The internet never forgets ...







1/3
We are sorry for that.

It’s been a while since we last released a model, so we’re unfamiliar with the new release process: we accidentally missed an item required in the model release process, toxicity testing.

We are currently completing this test quickly and will then re-release our model as soon as possible.

Do not worry; thank you for your kind care and understanding.

2/3
No, we just forgot a necessary test. This is a step that all new models currently need to complete.

3/3
According to the latest regulations, we can only operate in this way. We need to strictly abide by the necessary process. We are very sorry for the trouble it has caused you.









1/5
Looks like $MSFT's WizardLM was short-lived.

They just wiped it from their HF repo - GitHub as well?

There's a couple of quantized versions _still_ available. Maybe grab one?

Anyone see anything?

2/5
Ollama was the fastest pull. (ollama run wizardlm2)

MaziyarPanahi/WizardLM-2-7B-GGUF · Hugging Face

WizardLM-2-7B-Q8_0.gguf · bartowski/WizardLM-2-7B-GGUF at main

mlx-community/WizardLM-2-7B-4bit · Hugging Face
@Prince_Canuma
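Besides the `ollama run wizardlm2` CLI command, a running Ollama server can be queried over its local HTTP API. A minimal stdlib sketch, assuming the server is listening on the default `localhost:11434` and the `wizardlm2` model has already been pulled; the helper names are ours.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: started with default settings).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "wizardlm2") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "wizardlm2") -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `generate("Who are you?")` returns the reply as a string. Ollama applies the prompt template bundled with the model, so plain text is passed here rather than a hand-built chat string.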

3/5
nice one

4/5
All good - will be back online soon.

Apparently a test was missed that should have been performed.

5/5
We are sorry for that.

It’s been a while since we last released a model, so we’re unfamiliar with the new release process: we accidentally missed an item required in the model release process, toxicity testing.

We are currently completing this test quickly… x.com/WizardLM_AI/st…






1/1
Running WizardLM-2-8x22B 4-bit quantized on a Mac Studio with SiLLM powered by Apple MLX





1/1
Introducing the newest large Mixtral finetune, WizardLM 2 by Microsoft AI:


65k context and incredible results on benchmarks, and works well for roleplay.


 



AI's new power of persuasion: Study shows LLMs can exploit personal information to change your mind

Story by Tanya Petersen


Overview of the experimental workflow. (A) Participants fill in a survey about their demographic information and political orientation. (B) Every 5 minutes, participants are randomly assigned to one of four treatment conditions. The two players then debate for 10 minutes on an assigned proposition, randomly holding the PRO or CON standpoint as instructed. (C) After the debate, participants fill out another short survey measuring their opinion change. Finally, they are debriefed about their opponent's identity. Credit: arXiv (2024). DOI: 10.48550/arxiv.2403.14380 © Provided by Tech Xplore

A new EPFL study has demonstrated the persuasive power of large language models, finding that participants debating GPT-4 with access to their personal information were far more likely to change their opinion compared to those who debated humans.

"On the internet, nobody knows you're a dog." That's the caption to a famous 1990s cartoon showing a large dog with his paw on a computer keyboard. Fast forward 30 years, replace "dog" with "AI" and this sentiment was a key motivation behind a new study to quantify the persuasive power of today's large language models (LLMs).

"You can think of all sorts of scenarios where you're interacting with a language model although you don't know it, and this is a fear that people have: on the internet, are you talking to a dog or a chatbot or a real human?" asked Associate Professor Robert West, head of the Data Science Lab in the School of Computer and Communication Sciences. "The danger is superhuman-like chatbots that create tailor-made, convincing arguments to push false or misleading narratives online."

AI and personalization

Early work has found that language models can generate content perceived as at least on par with, and often more persuasive than, human-written messages. However, there is still limited knowledge about LLMs' persuasive capabilities in direct conversations with humans, and about how personalization (knowing a person's gender, age and education level) can improve their performance.



"We really wanted to see how much of a difference it makes when the AI model knows who you are (personalization)—your age, gender, ethnicity, education level, employment status and political affiliation—and this scant amount of information is only a proxy of what more an AI model could know about you through social media, for example," West continued.

Human v AI debates

In a pre-registered study, the researchers recruited 820 people to participate in a controlled trial in which each participant was randomly assigned a topic and one of four treatment conditions: debating a human with or without personal information about the participant, or debating an AI chatbot (OpenAI's GPT-4) with or without personal information about the participant.

This setup differed substantially from previous research in that it enabled a direct comparison of the persuasive capabilities of humans and LLMs in real conversations, providing a framework for benchmarking how state-of-the-art models perform in online environments and the extent to which they can exploit personal data.

Their article, "On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial," posted to the arXiv preprint server, explains that the debates were structured as a simplified version of the format commonly used in competitive academic debates, and that participants were asked before and afterwards how much they agreed with the debate proposition.

The results showed that participants who debated GPT-4 with access to their personal information had 81.7% higher odds of increased agreement with their opponents compared to participants who debated humans. Without personalization, GPT-4 still outperformed humans, but the effect was lower and not statistically significant (p = 0.31).
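Note that "81.7% higher odds" describes an odds ratio of about 1.817, not an 81.7-percentage-point jump in probability; how much the probability moves depends on the baseline rate. The sketch below shows the conversion. The 30% baseline is purely hypothetical, chosen only to illustrate the arithmetic, and is not a figure from the study.

```python
def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by odds_ratio."""
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("p_baseline must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = p_baseline / (1.0 - p_baseline)  # odds = p / (1 - p)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)  # back to a probability


# "81.7% higher odds" corresponds to an odds ratio of 1.817.
OR_PERSONALIZED_GPT4 = 1.817
# Hypothetical baseline: suppose 30% of participants in the comparison group moved toward their opponent.
example = apply_odds_ratio(0.30, OR_PERSONALIZED_GPT4)  # about 0.44
```

Under that made-up baseline, the probability rises from 30% to roughly 44%, an increase of about 14 percentage points.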

Cambridge Analytica on steroids

Not only are LLMs able to exploit personal information to tailor their arguments through microtargeting, they do so far more effectively than humans, out-persuading them in online conversations.

"We were very surprised by the 82% number and if you think back to Cambridge Analytica, which didn't use any of the current tech, you take Facebook likes and hook them up with an LLM, the Language Model can personalize its messaging to what it knows about you. This is Cambridge Analytica on steroids," said West.

"In the context of the upcoming U.S. elections, people are concerned because that's where this kind of technology is always first battle tested. One thing we know for sure is that people will be using the power of large language models to try to swing the election."

One interesting finding of the research was that when a human was given the same personal information as the AI, they didn't seem to make effective use of it for persuasion. West argues that this should be expected—AI models are consistently better because they are almost every human on the internet put together.

The models have learned through online patterns that a certain way of making an argument is more likely to lead to a persuasive outcome. They have read many millions of Reddit, Twitter and Facebook threads, and been trained on books and papers from psychology about persuasion. It's unclear exactly how a model leverages all this information but West believes this is a key direction for future research.

"LLMs have shown signs that they can reason about themselves, so given that we are able to interrogate them, I can imagine that we could ask a model to explain its choices and why it is saying a precise thing to a particular person with particular properties. There's a lot to be explored here because the models may be doing things that we don't even know about yet in terms of persuasiveness, cobbled together from many different parts of the knowledge that they have."

More information: Francesco Salvi et al, On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial, arXiv (2024). DOI: 10.48550/arxiv.2403.14380

Provided by Ecole Polytechnique Federale de Lausanne



[Submitted on 21 Mar 2024]

On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial

Francesco Salvi, Manoel Horta Ribeiro, Riccardo Gallotti, Robert West
The development and popularization of large language models (LLMs) have raised concerns that they will be used to create tailor-made, convincing arguments to push false or misleading narratives online. Early work has found that language models can generate content perceived as at least on par and often more persuasive than human-written messages. However, there is still limited knowledge about LLMs' persuasive capabilities in direct conversations with human counterparts and how personalization can improve their performance. In this pre-registered study, we analyze the effect of AI-driven persuasion in a controlled, harmless setting. We create a web-based platform where participants engage in short, multiple-round debates with a live opponent. Each participant is randomly assigned to one of four treatment conditions, corresponding to a two-by-two factorial design: (1) Games are either played between two humans or between a human and an LLM; (2) Personalization might or might not be enabled, granting one of the two players access to basic sociodemographic information about their opponent. We found that participants who debated GPT-4 with access to their personal information had 81.7% (p < 0.01; N=820 unique participants) higher odds of increased agreement with their opponents compared to participants who debated humans. Without personalization, GPT-4 still outperforms humans, but the effect is lower and statistically non-significant (p=0.31). Overall, our results suggest that concerns around personalization are meaningful and have important implications for the governance of social media and the design of new online environments.
Comments: 33 pages, 10 figures, 7 tables
Subjects: Computers and Society (cs.CY)
Cite as: arXiv:2403.14380 [cs.CY]
(or arXiv:2403.14380v1 [cs.CY] for this version)

Submission history

From: Francesco Salvi [view email]
[v1] Thu, 21 Mar 2024 13:14:40 UTC (551 KB)

 








Meta announces Megalodon

Efficient LLM Pretraining and Inference with Unlimited Context Length

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. Megalodon inherits the architecture of Mega (exponential moving average with gated attention), and further introduces multiple technical components to improve its capability and stability, including complex exponential moving average (CEMA), timestep normalization layer, normalized attention mechanism and pre-norm with two-hop residual configuration. In a controlled head-to-head comparison with Llama2, Megalodon achieves better efficiency than the Transformer at the scale of 7 billion parameters and 2 trillion training tokens. Megalodon reaches a training loss of 1.70, landing mid-way between Llama2-7B (1.75) and 13B (1.67).
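To give a rough feel for the EMA components named in the abstract: Mega's damped EMA updates a hidden state as h_t = α·x_t + (1 − α·δ)·h_{t−1}, and our understanding of CEMA is that it additionally rotates the state in the complex plane by a learned angle θ and reads out the real part. The toy, stdlib-only sketch below follows that reading for a scalar input sequence. Parameter names are illustrative, and it is not Meta's implementation (which is multi-dimensional, learned, and computed far more efficiently).

```python
import cmath
from typing import List, Sequence


def complex_damped_ema(
    xs: Sequence[float],
    alpha: Sequence[float],
    delta: Sequence[float],
    theta: Sequence[float],
    eta: Sequence[complex],
) -> List[float]:
    """Toy complex damped EMA over a scalar sequence (illustrative, not Megalodon's code).

    Each hidden dimension j has its own decay alpha[j], damping delta[j],
    rotation angle theta[j] and readout weight eta[j].
    """
    n = len(alpha)
    if not (len(delta) == len(theta) == len(eta) == n):
        raise ValueError("alpha, delta, theta and eta must have the same length")
    # cos(theta) + i*sin(theta) == exp(i*theta): a rotation in the complex plane.
    rot = [cmath.exp(1j * t) for t in theta]
    h = [0j] * n  # hidden state starts at zero
    ys: List[float] = []
    for x in xs:
        h = [
            alpha[j] * rot[j] * x + (1 - alpha[j] * delta[j]) * rot[j] * h[j]
            for j in range(n)
        ]
        # Read out the real part of the weighted sum of hidden dimensions.
        ys.append(sum(eta[j] * h[j] for j in range(n)).real)
    return ys
```

With theta = 0 and delta = 1 this reduces to an ordinary EMA, h_t = α·x_t + (1 − α)·h_{t−1}; a nonzero theta makes each dimension's response oscillate as it decays.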

 



1/2
Zuck on:

- Llama 3
- open sourcing towards AGI
- custom silicon, synthetic data, & energy constraints on scaling
- Caesar Augustus, intelligence explosion, bioweapons, $10b models, & much more

Enjoy!

Links below

2/2
YouTube: https://youtu.be/bc6uFV9CJGg Transcript: https://dwarkeshpatel.com/p/mark-zuckerberg Apple Podcasts: https://podcasts.apple.com/us/podca...caeser/id1516093381?i=1000652877239 Spotify:

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196





1/4
Llama 3 metrics

2/4
Llama3 released!

https://llama.meta.com/llama3/ https://github.com/meta-llama/llama3







1/5
What is this? Is this it? Spotted on Azure Marketplace

2/5
I want weights on
@huggingface

3/5
Another confirmation for llama 3 today - this time it's listed on Pricing – Replicate

4/5
UPD: they have removed the models from the list.

5/5
Link: https://azuremarketplace.microsoft....genai.meta-llama-3-8b-chat-offer?tab=Overview





1/2
Llama 3 has been my focus since joining the Llama team last summer. Together, we've been tackling challenges across pre-training and human data, pre-training scaling, long context, post-training, and evaluations. It's been a rigorous yet thrilling journey:

Our largest models exceed 400B parameters and are still training.
Scaling is the recipe, demanding more than better scaling laws and infrastructure; e.g., managing high effective training time across 16K GPUs requires innovative strategies.
Opting for an 8B over a 7B model? Among many others, an upgraded tokenizer expanded our vocabulary from 32K to 128K, making language processing more efficient and allowing our models to handle more text with fewer tokens.
With over 15T tokens processed, our improved tokenization significantly enhances pre-training token efficiency. We're committed to high-quality data curation, including advanced filtering, semantic deduplication, and quality prediction.
Enhanced human data significantly improves our post-training stack, which combines supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO).
We've set the pre-training context window to 8K tokens. A comprehensive approach to data, modeling, parallelism, inference, and evaluations would be interesting. More updates on longer contexts later.
While automated benchmarks are useful, they don't fully reflect a model's grasp of nuanced contexts. Our human evaluations are carefully designed and performed.

What does it take to build LLMs? Beyond data, compute, infrastructure, model, inference, safety, and evaluations--ultimately, it boils down to the synergy of dedicated talents and thoughtful investment.

Exciting updates ahead: we are planning to launch video podcasts with our developers to dive deeper into the tech behind Llama 3. We'll share the research paper. Stay tuned!
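The thread hints that the larger tokenizer helps explain "8B over 7B." Back-of-the-envelope arithmetic shows how much a bigger vocabulary can add. The sketch below uses the thread's round figures (32K and 128K). It assumes a hidden size of 4096 for both models and untied input/output embeddings. Both are assumptions, not figures from the thread.

```python
def vocab_params(vocab_size: int, d_model: int, tied: bool = False) -> int:
    """Parameters in the token embedding plus the output projection.

    With tied weights the two share a single vocab_size x d_model matrix.
    """
    per_matrix = vocab_size * d_model
    return per_matrix if tied else 2 * per_matrix


D_MODEL = 4096  # assumption: same hidden size before and after the tokenizer change
old = vocab_params(32_000, D_MODEL)
new = vocab_params(128_000, D_MODEL)
print(f"extra parameters: {(new - old) / 1e9:.2f}B")  # extra parameters: 0.79B
```

Under these assumptions, the vocabulary change alone adds close to 0.8B parameters. That is roughly the gap between a "7B" and an "8B" label.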

2/2
Thanks Yi!






1/3
Llama 3 is about to be released with 8B and 70B models.

Just saw this on Replicate: Pricing – Replicate

2/3
Oh no Reddit was faster

3/3
It was cancelled after a sex scandal




1/2
Big congrats to @MistralAI on the new model launch!

Mixtral-8x22b-instruct is now in the Arena. Come chat & vote to see who is the best OSS model.

We're also excited to see the model's strong multilingual capability. We'll soon update our newly launched French LLM leaderboard!

2/2
Links:
- Chat & Vote at http://chat.lmsys.org/ - Full leaderboard at http://leaderboard.lmsys.org/





1/3
According to a new paper published by Google researchers, LLMs can now process text of infinite length. The paper introduces a new technique called Infini-attention, which enables models to expand their "context window" while keeping memory and compute requirements bounded.
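The tweet does not say how memory can stay bounded. The Infini-attention paper describes a compressive memory updated with a linear-attention-style rule. Its readout is blended with ordinary local attention through a learned gate. Below is a single-head, pure-Python sketch of that idea. It assumes an ELU+1 feature map and the plain additive update (no delta-rule variant), and it omits causal masking inside a segment. The class and function names are hypothetical.

```python
import math


def elu_plus_one(x: float) -> float:
    """Feature map sigma(x) = ELU(x) + 1, which keeps every feature positive."""
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Fixed-size associative memory in the style of linear attention.

    M (d_k x d_v) and z (d_k) never grow, however many segments are written.
    """

    def __init__(self, d_k: int, d_v: int):
        self.M = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def write(self, keys, values):
        """M += sigma(K)^T V and z += sum_t sigma(k_t) for one segment."""
        for k, v in zip(keys, values):
            sk = [elu_plus_one(x) for x in k]
            for i, ski in enumerate(sk):
                self.z[i] += ski
                for j, vj in enumerate(v):
                    self.M[i][j] += ski * vj

    def read(self, q):
        """sigma(q) M / (sigma(q) . z); returns zeros while the memory is empty."""
        sq = [elu_plus_one(x) for x in q]
        denom = sum(a * b for a, b in zip(sq, self.z))
        d_v = len(self.M[0])
        if denom == 0.0:
            return [0.0] * d_v
        return [sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom for j in range(d_v)]


def gate(mem_out, local_out, beta: float):
    """Blend the memory readout with local attention output via a sigmoid gate."""
    g = 1.0 / (1.0 + math.exp(-beta))
    return [g * a + (1.0 - g) * b for a, b in zip(mem_out, local_out)]


memory = CompressiveMemory(d_k=2, d_v=2)
for _ in range(1000):  # stream many segments; the memory stays 2 x 2
    memory.write(keys=[[0.3, -0.5]], values=[[1.0, 2.0]])
print(len(memory.M), len(memory.M[0]))  # 2 2
```

The storage cost is fixed by d_k and d_v rather than by how many tokens have been seen. The trade-off is that older content is compressed into a sum and cannot be recalled exactly.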

2/3
Source:

3/3
Infini-attention architecture:
 




1/1

The new Meta AI will create images as you type, give you the option to animate the images it creates, and then lets you create a video of the images that were generated while you typed.



Pretty cool stuff.











1/1

The new Llama 3 model is the smartest AI I've tried so far, which matches what Meta claims in its benchmarks.



The "Imagine" feature is incredibly fast; images are generated as you type. You can even animate all images with one click and create videos from the combined images.



Meta seems ready for all of its social platforms to be flooded with these AI images, and the quality is surprisingly good for such short prompts. Other models require really long prompts to produce good-quality images.



The crazy thing is that the model is still in training, so imagine how the new version will be. And on top of that, it's open source! Can't wait to download it and start building.



You can try it online here: Meta AI











1/1
It's here! Our new image generation is now faster and higher quality—you can create images from text in real-time as you type, animate images and turn them into GIFs. Imagine it. Create it. Have fun!

Now rolling out on WhatsApp and the Meta AI web experience in the US.



VASA-1: Lifelike Audio-Driven Talking Faces

Sicheng Xu*, Guojun Chen*, Yu-Xiao Guo*, Jiaolong Yang*‡,

Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo
Microsoft Research Asia

*Equal Contributions ‡Corresponding Author: jiaoyan@microsoft.com


TL;DR: single portrait photo + speech audio = hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, generated in real time.


Abstract

We introduce VASA, a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip. Our premier model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness. The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments, including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.

(Note: all portrait images on this page are virtual, non-existing identities generated by StyleGAN2 or DALL·E-3 (except for Mona Lisa). We are exploring visual affective skill generation for virtual, interactive characters, NOT impersonating any person in the real world. This is only a research demonstration and there's no product or API release plan. See also the bottom of this page for more of our Responsible AI considerations.)

Realism and liveliness

Our method is capable of not only producing precise lip-audio synchronization, but also generating a large spectrum of expressive facial nuances and natural head motions. It can handle audio of arbitrary length and stably output seamless talking face videos.
  1. https://vasavatar.github.io/VASA-1/video/l5.mp4
  2. https://vasavatar.github.io/VASA-1/video/l8.mp4
  3. https://vasavatar.github.io/VASA-1/video/l3.mp4
  4. https://vasavatar.github.io/VASA-1/video/l4.mp4
  5. https://vasavatar.github.io/VASA-1/video/l7.mp4
  6. https://vasavatar.github.io/VASA-1/video/l2.mp4


Examples with one-minute audio input.

More, shorter examples with diverse audio input


Controllability of generation

Our diffusion model accepts optional conditioning signals, such as the main eye gaze direction, head distance, and emotion offsets.
Generation results under different main gaze directions (forward-facing, leftwards, rightwards, and upwards, respectively)
Generation results under different head distance scales
Generation results under different emotion offsets (neutral, happiness, anger, and surprise, respectively)


Out-of-distribution generalization

Our method can handle photo and audio inputs that are outside the training distribution. For example, it can handle artistic photos, singing audio, and non-English speech, none of which were present in the training set.

Power of disentanglement

Our latent representation disentangles appearance, 3D head pose, and facial dynamics, which enables separate attribute control and editing of the generated content.
Same input photo with different motion sequences (left two cases), and same motion sequence with different photos (right three cases)

https://vasavatar.github.io/VASA-1/video/male_disen.mp4
Pose and expression editing (raw generation result, pose-only result, expression-only result, and expression with spinning pose)


Real-time efficiency

Our method generates 512x512 video frames at 45 fps in offline batch-processing mode, and supports up to 40 fps in online streaming mode with a preceding latency of only 170 ms, evaluated on a desktop PC with a single NVIDIA RTX 4090 GPU.
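Frame rates translate directly into per-frame time budgets. The arithmetic below uses only the figures reported above (45 fps offline, 40 fps online, 170 ms start-up latency). The helper names are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce each frame at a given frame rate."""
    return 1000.0 / fps


def frames_during(latency_ms: float, fps: float) -> float:
    """How many frame intervals fit inside a given latency."""
    return latency_ms / frame_budget_ms(fps)


print(f"offline, 45 fps: {frame_budget_ms(45):.1f} ms per frame")  # 22.2
print(f"online, 40 fps: {frame_budget_ms(40):.1f} ms per frame")   # 25.0
print(f"170 ms start-up: {frames_during(170, 40):.1f} frame intervals at 40 fps")  # 6.8
```

In streaming mode, each frame must therefore be produced in about 25 ms. The initial delay is equivalent to fewer than seven frames.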

A real-time demo


Risks and responsible AI considerations

Our research focuses on generating visual affective skills for virtual AI avatars, aiming for positive applications. It is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it could still potentially be misused to impersonate humans. We oppose any attempt to create misleading or harmful content involving real people, and are interested in applying our technique to advance forgery detection. Currently, the videos generated by this method still contain identifiable artifacts, and numerical analysis shows that there is still a gap to achieving the authenticity of real videos.

While acknowledging the possibility of misuse, it's imperative to recognize the substantial positive potential of our technique. The benefits – such as enhancing educational equity, improving accessibility for individuals with communication challenges, offering companionship or therapeutic support to those in need, among many others – underscore the importance of our research and other related explorations. We are dedicated to developing AI responsibly, with the goal of advancing human well-being.

Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.


 


Microsoft's new AI tool is a deepfake nightmare machine

By Daniel John

published yesterday

VASA-1 can create videos from a single image.

Faces generated with Microsoft VASA-1

(Image credit: Microsoft)

It almost seems quaint to remember when all AI could do was generate images from a text prompt. Over the last couple of years generative AI has become more and more powerful, making the jump from photos to videos with the advent of tools like Sora. And now Microsoft has introduced a powerful tool that might be the most impressive (and terrifying) we've seen yet.

VASA-1 is an AI image-to-video model that can generate videos from just one photo and a speech audio clip. Videos feature synchronised facial and lip movements, as well as "a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness."


On its research website, Microsoft explains how the tech works. "The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours."

Microsoft just dropped VASA-1. This AI can make a single image sing and talk expressively from an audio reference. Similar to EMO from Alibaba. 10 wild examples: 1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD April 18, 2024



In other words, it's capable of creating deepfake videos based on a single image. It's notable that Microsoft insists the tool is a "research demonstration and there's no product or API release plan." Seemingly in an attempt to allay fears, the company is suggesting that VASA-1 won't be making its way into users' hands any time soon.

From Sora AI to Will Smith eating spaghetti, we've seen all manner of weird and wonderful (but mostly weird) AI generated video content, and it's only going to get more realistic. Just look how much generative AI has improved in one year.
 
