The A.I Megathread (LLM , GPT , Development)

Black Magisterialness · Nov 23, 2023

bnew said:
I think they did have power right up until a chorus of the right voices largely opposed the board's choice of action. I think it took everyone by surprise and the power shifted. now that it's out in the open that microsoft will create a division for that entire company if they choose to leave, and even salesforce is willing to hire those employees, the board's unforced error left it in a weaker position to direct the company.

Which means their power wasn't as strong as they thought. Ilya thought that the company was guided through altruism, and the Microsoft deal and their engineers proved otherwise. These people want to get RICH and put their stamp on the world saying THEY created AGI. The consequences be damned.

We just made a movie about the nikka who spearheaded the Manhattan Project. Nothing about the people it was tested on, almost no films or media made about the Japanese victims of said bombings. No...just the architects of destruction.

Even if it's just at a subconscious level, this is what these people are after. Their egos outweigh their empathy or even common sense.

bnew · Nov 23, 2023

Yi-34B-Chat

Chat model - https://huggingface.co/01-ai/Yi-34B-Chat
Demo to try it out - https://huggingface.co/spaces/01-ai/Yi-34B-Chat

https://huggingface.co/TheBloke/Yi-34B-Chat-GGUF

newarkhiphop · Nov 23, 2023

Reddit - Dive into anything

www.reddit.com

bnew · Nov 23, 2023

bnew · Nov 23, 2023

LLMs cannot find reasoning errors, but can correct them given the error location

arxiv.org

Computer Science > Artificial Intelligence

[Submitted on 14 Nov 2023]

LLMs cannot find reasoning errors, but can correct them!

Gladys Tyen, Hassan Mansoor, Peter Chen, Tony Mak, Victor Cărbune

While self-correction has shown promise in improving LLM outputs in terms of style and quality (e.g. Chen et al., 2023; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performances overall (Huang et al., 2023). In this paper, we break down the self-correction process into two core components: mistake finding and output correction. For mistake finding, we release BIG-Bench Mistake, a dataset of logical mistakes in Chain-of-Thought reasoning traces. We provide benchmark numbers for several state-of-the-art LLMs, and demonstrate that LLMs generally struggle with finding logical mistakes. For output correction, we propose a backtracking method which provides large improvements when given information on mistake location. We construe backtracking as a lightweight alternative to reinforcement learning methods, and show that it remains effective with a reward model at 60-70% accuracy.

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2311.08516 [cs.AI]
	(or arXiv:2311.08516v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2311.08516

Submission history

From: Gladys Tyen
[v1] Tue, 14 Nov 2023 20:12:38 UTC (7,191 KB)

GitHub - WHGTyen/BIG-Bench-Mistake: A dataset of LLM-generated chain-of-thought steps annotated with mistake location.

github.com

BIG-Bench-Hard/cot-prompts at main · suzgunmirac/BIG-Bench-Hard

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them - suzgunmirac/BIG-Bench-Hard

github.com

https://arxiv.org/pdf/2311.08516.pdf

partial AI summary:

ANSWER | PHIND V8 MODEL

The paper you're referring to is about Large Language Models (LLMs) and their ability to self-correct their outputs. The authors, Gladys Tyen, Hassan Mansoor, Peter Chen, Tony Mak, and Victor Cărbune, have broken down the self-correction process into two components: mistake finding and output correction.
Here are the key highlights:

Mistake Finding: The authors have released a dataset called BIG-Bench Mistake, which contains logical mistakes in Chain-of-Thought reasoning traces. They found that LLMs generally struggle with finding these logical mistakes.

Output Correction: The authors propose a backtracking method for output correction. This method provides large improvements when given information on the mistake location. They view backtracking as a lightweight alternative to reinforcement learning methods and show that it remains effective with a reward model at 60-70% accuracy.

Benchmark Results: The authors tested the reasoning capabilities of state-of-the-art LLMs using their dataset. They found that LLMs struggle with mistake finding, even for objective, unambiguous cases. This suggests that LLMs' inability to find mistakes is a main contributing factor to their inability to self-correct reasoning errors.

Backtracking as Verbal Reinforcement Learning: The authors view backtracking as a form of "verbal reinforcement learning". They propose that backtracking can be used with a trained classifier as a reward model, and demonstrate the effectiveness of backtracking at various reward model accuracies.

BIG-Bench Mistake Dataset: The authors created the BIG-Bench Mistake dataset, which is the first of its kind that goes beyond problems in mathematics. The dataset consists of 2186 sets of CoT-style traces, each annotated with the location of the first logical error. The authors used this dataset to test the reasoning capabilities of state-of-the-art LLMs.

Annotation: The authors used both human and automatic annotation for the BIG-Bench Mistake dataset. They found that the inter-rater reliability for the human-annotated tasks was high, suggesting the reliability of the dataset.

Prompting Methods: The authors compared three different prompting methods: Direct trace-level prompting, Direct step-level prompting, and CoT step-level prompting. They found that the accuracy on traces with no mistakes goes down considerably from direct, trace-level prompting to CoT step-level prompting.

Few-shot Prompting for Mistake Location: The authors investigated whether their prompting methods can reliably determine the correctness of a trace rather than the mistake location. They found that using mistake finding as a strategy for determining whether an answer is correct or not can be effective.

In summary, the authors have found that LLMs can correct their outputs using a backtracking method, but they struggle with finding mistakes. They propose backtracking as a lightweight alternative to reinforcement learning methods and show that it remains effective with a reward model at 60-70% accuracy. They also provide a new dataset, BIG-Bench Mistake, for testing the reasoning capabilities of LLMs.
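To make the two-stage idea concrete, here is a minimal sketch of "find the first mistake, then backtrack" on a toy running-sum task. It is not the paper's code: the `StepChecker`, `StepSampler`, and `TraceDone` interfaces, the `total=<n>` step format, and the toy functions are all assumptions made for illustration. In a real setup the checker would be an LLM prompt or a trained reward model, and the sampler would be an LLM call.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not from the paper):
#   StepChecker(prefix, step)  -> True if `step` looks logically correct
#   StepSampler(prefix, banned) -> a new step; should differ from `banned` if given
#   TraceDone(trace)           -> True once the trace has reached a final answer
StepChecker = Callable[[List[str], str], bool]
StepSampler = Callable[[List[str], Optional[str]], str]
TraceDone = Callable[[List[str]], bool]


def find_first_mistake(trace: List[str], is_correct: StepChecker) -> Optional[int]:
    """Return the index of the first step judged incorrect, or None if all pass."""
    for i, step in enumerate(trace):
        if not is_correct(trace[:i], step):
            return i
    return None


def backtrack(trace: List[str], mistake_idx: int, sample_step: StepSampler,
              is_done: TraceDone, max_steps: int = 20) -> List[str]:
    """Keep the steps before the mistake, resample that step, regenerate the rest."""
    new_trace = list(trace[:mistake_idx])
    # Ask for a replacement that differs from the step flagged as wrong.
    new_trace.append(sample_step(new_trace, trace[mistake_idx]))
    while not is_done(new_trace) and len(new_trace) < max_steps:
        new_trace.append(sample_step(new_trace, None))
    return new_trace


# --- Toy stand-ins for the model: add up NUMBERS one at a time. ---
NUMBERS = [2, 3, 4]


def toy_is_correct(prefix: List[str], step: str) -> bool:
    # Step k is correct if it states the sum of the first k+1 numbers.
    return step == f"total={sum(NUMBERS[: len(prefix) + 1])}"


def toy_sample_step(prefix: List[str], banned: Optional[str]) -> str:
    # A perfect "model": always emits the correct next running total.
    return f"total={sum(NUMBERS[: len(prefix) + 1])}"


def toy_is_done(trace: List[str]) -> bool:
    return len(trace) == len(NUMBERS)


bad_trace = ["total=2", "total=6", "total=10"]  # second step is wrong
mistake = find_first_mistake(bad_trace, toy_is_correct)
fixed = backtrack(bad_trace, mistake, toy_sample_step, toy_is_done)
print(mistake, fixed)
```

The point of the split is visible here: `backtrack` only helps if something supplies a good `mistake_idx`, which is exactly the part the paper says LLMs struggle with.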

bnew · Nov 23, 2023

bnew · Nov 23, 2023

newarkhiphop said:
Reddit - Dive into anything

www.reddit.com

bnew · Nov 23, 2023

bnew · Nov 23, 2023

BlackFriday GPTs Prompts And Jailbreaks

https://github.com/friuns2/BlackFriday-GPTs-Prompts

bnew · Nov 23, 2023

DeepMind Says New Multi-Game AI Is a Step Toward More General Intelligence

A new Google DeepMind algorithm that can tackle a much wider variety of games could be a step towards more general AI, its creators say.

singularityhub.com

By Edd Gent

November 20, 2023

AI has mastered some of the most complex games known to man, but models are generally tailored to solve specific kinds of challenges. A new DeepMind algorithm that can tackle a much wider variety of games could be a step towards more general AI, its creators say.

Using games as a benchmark for AI has a long pedigree. When IBM’s Deep Blue algorithm beat chess world champion Garry Kasparov in 1997, it was hailed as a milestone for the field. Similarly, when DeepMind’s AlphaGo defeated one of the world’s top Go players, Lee Sedol, in 2016, it led to a flurry of excitement about AI’s potential.

DeepMind built on this success with AlphaZero, a model that mastered a wide variety of games, including chess and shogi. But as impressive as this was, AlphaZero only worked with perfect information games where every detail of the game, other than the opponent’s intentions, is visible to both players. This includes games like Go and chess where both players can always see all the pieces on the board.

In contrast, imperfect information games involve some details being hidden from the other player. Poker is a classic example because players can’t see what hands their opponents are holding. There are now models that can beat professionals at these kinds of games too, but they use an entirely different approach than algorithms like AlphaZero.

Now, researchers at DeepMind have combined elements of both approaches to create a model that can beat humans at chess, Go, and poker. The team claims the breakthrough could accelerate efforts to create more general AI algorithms that can learn to solve a wide variety of tasks.

Researchers building AI to play perfect information games have generally relied on an approach known as tree search. This explores a multitude of ways the game could progress from its current state, with different branches mapping out potential sequences of moves. AlphaGo combined tree search with a machine learning technique in which the model refines its skills by playing itself repeatedly and learning from its mistakes.
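For a feel of what tree search means here, below is a minimal sketch on a toy game. It is plain exhaustive negamax with memoization, not the Monte Carlo tree search plus neural networks used in AlphaGo, and the game itself (a Nim variant where you remove 1 to 3 stones and taking the last stone wins) is an assumption chosen only because its tree is tiny.

```python
from functools import lru_cache
from typing import Optional, Tuple

MAX_TAKE = 3  # assumed toy rule: remove 1-3 stones per turn; taking the last stone wins


@lru_cache(maxsize=None)
def search(stones: int) -> Tuple[int, Optional[int]]:
    """Return (value, best_move) for the player to move.

    value is +1 if the mover can force a win, -1 if the mover loses
    against perfect play.
    """
    if stones == 0:
        # The previous player took the last stone, so the mover has already lost.
        return -1, None
    best_value, best_move = -2, None
    # Each legal move is one branch of the tree.
    for take in range(1, min(MAX_TAKE, stones) + 1):
        child_value, _ = search(stones - take)
        value = -child_value  # what is good for the opponent is bad for us
        if value > best_value:
            best_value, best_move = value, take
    return best_value, best_move


print(search(10))  # explores every line of play from a 10-stone position
```

Games like chess and Go are far too large to search exhaustively like this, which is why systems in that family guide and truncate the search with learned evaluations.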

When it comes to imperfect information games, researchers tend to instead rely on game theory, using mathematical models to map out the most rational solutions to strategic problems. Game theory is used extensively in economics to understand how people make choices in different situations, many of which involve imperfect information.
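As a small illustration of the game-theory view, the sketch below computes the best reply to a mixed strategy in rock-paper-scissors. This is a simultaneous-move toy, not poker and not DeepStack's algorithm; the payoff table and function names are assumptions for illustration. The idea it shows is that a "rational solution" is a strategy the opponent cannot profit from exploiting.

```python
from typing import List, Tuple

ACTIONS = ["rock", "paper", "scissors"]

# PAYOFF[mine][theirs]: +1 win, 0 draw, -1 loss for the row player.
PAYOFF = [
    [0, -1, 1],   # rock loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper
]


def best_response(opponent_strategy: List[float]) -> Tuple[str, float]:
    """Return the best pure reply to a mixed strategy and its expected payoff."""
    values = [
        sum(prob * PAYOFF[mine][theirs]
            for theirs, prob in enumerate(opponent_strategy))
        for mine in range(len(ACTIONS))
    ]
    best = max(range(len(ACTIONS)), key=lambda a: values[a])
    return ACTIONS[best], values[best]


# A rock-heavy opponent can be exploited by playing paper.
print(best_response([0.5, 0.25, 0.25]))

# Against the uniform strategy no reply earns more than 0, so it is unexploitable.
print(best_response([1 / 3, 1 / 3, 1 / 3]))
```

In poker the same question is asked over hidden cards and betting sequences, which is what makes the equilibrium so much harder to compute.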

In 2016, an AI called DeepStack beat human professionals at no-limit poker, but the model was highly specialized for that particular game. Much of the DeepStack team now works at DeepMind, however, and they’ve combined the techniques they used to build DeepStack with those used in AlphaZero.

The new algorithm, called Student of Games, uses a combination of tree search, self-play, and game-theory to tackle both perfect and imperfect information games. In a paper in Science, the researchers report that the algorithm beat the best openly available poker playing AI, Slumbot, and could also play Go and chess at the level of a human professional, though it couldn’t match specialized algorithms like AlphaZero.

But being a jack-of-all-trades rather than a master of one is arguably a bigger prize in AI research. While deep learning can often achieve superhuman performance on specific tasks, developing more general forms of AI that can be applied to a wide range of problems is trickier. The researchers say a model that can tackle both perfect and imperfect information games is “an important step toward truly general algorithms for arbitrary environments.”

It’s important not to extrapolate too much from the results, Michael Rovatsos from the University of Edinburgh, UK, told New Scientist. The AI was still operating within the simple and controlled environment of a game, where the number of possible actions is limited and the rules are clearly defined. That’s a far cry from the messy realities of the real world.

But even if this is a baby step, being able to combine the leading approaches to two very different kinds of game in a single model is a significant achievement. And one that could certainly be a blueprint for more capable and general models in the future.

bnew · Nov 23, 2023

64% of workers have passed off generative AI work as their own

Generative AI technologies are being used in the workplace without training, guidance, or approval by businesses.

www.zdnet.com

Written by Vala Afshar, Contributing Writer · Nov. 22, 2023 at 9:51 a.m. PT

Many users of generative AI in the workplace are leveraging the technology without training, guidance, or approval from their employers, according to new research from Salesforce. The company surveyed more than 14,000 global workers across 14 countries for the latest iteration of its Generative AI Snapshot Research Series.

Research shows that over a quarter (28%) of workers globally are currently using generative AI at work, and over half are doing so without the formal approval of their employers. With an additional 32% expecting to use generative AI at work soon, it's clear that penetration of the technology will continue -- with or without oversight.

The survey identified the top 3 safe use cases of generative AI:

Only use company-approved GenAI tools/programs.
Never use confidential company data in prompts for generative AI.
Never use personally identifiable customer data in prompts for generative AI.

The top 3 ethical uses of GenAI at the workplace include:

Fact-checking of generative AI outputs before using them.
Only using generative AI tools that have been validated for accuracy.
Only using company-approved generative AI tools and programs.

The survey found other interesting safety and ethical uses of generative AI in the workplace, including sourcing prompt outputs with accuracy. The survey revealed that 64% of workers have passed off generative AI work as their own. And 41% of workers would consider overstating their generative AI skills to secure a work opportunity.

The most alarming reveal of the survey may be that 7 in 10 global workers have never completed or received training on how to use generative AI safely and ethically at work.

Generative AI usage policies do vary by industry. Only 15% of all industries have loosely defined policies for using generative AI for work -- 17% in the United States. Nearly 1 in 4 have no policies on using generative AI at work (1 in 3 in the US). In one example, 87% of global workers in the healthcare industry claim their company lacks clear policies. Nearly 4 in 10 (39%) global workers say their employer doesn't hold a strong opinion about generative AI use in the workplace.

The overall benefits of using GenAI in the workplace are clear. The survey found that 71% of the workforce believe that generative AI makes them more productive at work. And nearly 6 out of 10 employees say GenAI makes them more engaged at work. As far as career benefits, 47% of global workers believe mastering generative AI would make them more sought after in the workplace, over half (51%) believe it would result in increased job satisfaction, and 44% say it would mean they would be paid more than those who don't master the technology.

newarkhiphop · Nov 23, 2023

bnew said:

I read the whole thread and I still don't fully understand why it doing basic math is a big deal

bnew · Nov 23, 2023

newarkhiphop said:
I read the whole thread and I still don't fully understand why it doing basic math is a big deal

https://archive.is/wip/5s6Bl

Q*

> can solve grade school math problems
> so what, what’s the big deal?
> why did the researchers send the letter ?
> why did Ilya freak out

Because OpenAI already has a stack where they can predict intelligence based on compute and data. They disclosed this with the GPT4 release.

Q* is a relatively small model, solving grade school math problems reliably and consistently using reasoning. It’s an algorithmic breakthrough.

And they have scaling graphs to predict the intelligence increase from data and compute, so they know that they can probably solve elite human and beyond human problems.

They don’t have AGI yet. But they have a near term roadmap. Now the question is whether to go down the path or not.

Sam’s reaction to this was to raise 10s of billions of dollars to:
> get a Jony Ive device into people’s hands
> and build data centers everywhere to deploy this tech as soon as possible

assuming it can solve basic math problems with 100% accuracy it has the foundation to start doing more complex math accurately, better decision making. it can be put to task solving mathematical equations with high precision in physics and engineering. it can be used to improve the logistics of everything. climate modeling or even automatically adjusting energy usage based on a myriad of factors in real-time.
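On the "scaling graphs" point in the quoted thread: the usual idea is to fit a power law to results from small training runs and extrapolate to bigger ones. Below is a minimal sketch of that kind of fit, assuming loss follows `a * compute**b`. The numbers are synthetic, generated from a made-up curve purely to show the mechanics, and nothing here reflects OpenAI's actual data or method.

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], loss: List[float]) -> Tuple[float, float]:
    """Least-squares fit of loss = a * compute**b in log-log space; returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Ordinary least-squares slope and intercept on the log-log points.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict(a: float, b: float, compute: float) -> float:
    """Extrapolate the fitted curve to a new compute budget."""
    return a * compute ** b


# Synthetic "small runs" drawn from an assumed curve: loss = 10 * compute**-0.05.
small_runs = [1e18, 1e19, 1e20]
observed = [10 * c ** -0.05 for c in small_runs]

a, b = fit_power_law(small_runs, observed)
print(a, b, predict(a, b, 1e24))  # predicted loss for a much larger run
```

Real measurements are noisy and a fitted curve says nothing about where a specific capability appears, so an extrapolation like this is a forecast, not a guarantee.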

YvrzTrvly · Nov 23, 2023

Losers begging for their own demise. Classic human hubris.

Who gives a fukk about this shyt. Won't be utilized equitably

bnew · Nov 24, 2023

Llama 2 7B Chat - a Hugging Face Space by huggingface-projects

Discover amazing ML apps made by the community

hf.co

Llama 2 13b Chat - a Hugging Face Space by huggingface-projects

Discover amazing ML apps made by the community

hf.co
