Artificial Intelligence

Not Allen Iverson

Japanese researchers say they used AI to try and translate the noises of clucking chickens and learn whether they're excited, hungry, or scared​

Beatrice Nolan

Sep 21, 2023, 7:41 AM EDT



  • Researchers think they've found a way to use AI to translate the clucks of chickens.
  • The Japanese researchers said their AI system could help understand chickens' emotional state.
  • The study has not been peer-reviewed and the researchers acknowledged limitations to their methods.
Researchers in Japan said they'd developed an AI system that could understand the emotional state of chickens.

The study, which was led by University of Tokyo professor Adrian David Cheok, has yet to be peer-reviewed.

The AI system is based on a technique the researchers called "Deep Emotional Analysis Learning," which can adapt to changing vocal patterns.

The study found that the system was capable of translating "various emotional states in chickens, including hunger, fear, anger, contentment, excitement, and distress."

The study said: "Our methodology employs a cutting-edge AI technique we call Deep Emotional Analysis Learning (DEAL), a highly mathematical and innovative approach that allows for the nuanced understanding of emotional states through auditory data."

"If we know what animals are feeling, we can design a much better world for them," Cheok told the New York Post. Cheok did not immediately respond to Insider's request for comment, made outside normal working hours.

The researchers tested the system on 80 chickens for the study and collaborated with a team of animal psychologists and veterinarians.

The system was able to achieve surprisingly high accuracy in identifying the birds' emotional states, the study found. "The high average probabilities of detection for each emotion suggest that our model has learned to capture meaningful patterns and features from the chicken sounds," it said.

The researchers acknowledged potential limitations, including variations in breeds and the complexity of some communications, such as body language.

Scientists and researchers are also using AI tools for conservation efforts. In one case, AI tools have been used to help identify animal tracks and better understand animal populations.

In 2022, researchers led by the University of Copenhagen, the ETH Zurich, and France's National Research Institute for Agriculture, Food and Environment said they'd created an algorithm to help understand the emotions of pigs.
Simpsons did it.
 

bnew

Veteran





1/3
I *WAS* WRONG - $10K CLAIMED!

## The Claim

Two days ago, I confidently claimed that "GPTs will NEVER solve the A::B problem". I believed that: 1. GPTs can't truly learn new problems outside of their training set; 2. GPTs can't perform long-term reasoning, no matter how simple it is. I argued both of these are necessary to invent new science; after all, some math problems take years to solve. If you can't beat a 15yo in any given intellectual task, you're not going to prove the Riemann Hypothesis. To isolate these issues and make my point, I designed the A::B problem, and posted it here - full definition in the quoted tweet.

## Reception, Clarification and Challenge

Shortly after posting it, some users provided a solution to a specific 7-token example I listed. I quickly pointed out that this wasn't what I meant; the example was merely illustrative, and answering one instance isn't the same as solving the problem (and can be easily cheated by prompt manipulation).

So, to make my statement clear, and to put my money where my mouth is, I offered a $10k prize to whoever could design a prompt that solved the A::B problem for *random* 12-token instances, with a 90%+ success rate. That's still an easy task that takes an average of 6 swaps to solve; literally simpler than 3rd grade arithmetic. Yet, I firmly believed no GPT would be able to learn and solve it on-prompt, even for these small instances.
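
For reference, a minimal sketch of the A::B rewrite system and a random-instance generator, assuming the rules as publicly stated in the challenge (the full definition lives in the quoted tweet; the function names here are my own, not the official evaluator's):

```python
# Sketch of the A::B system, assuming the publicly posted rules:
#   A# #A -> (nothing)     B# #B -> (nothing)
#   A# #B -> #B A#         B# #A -> #A B#
# i.e. whenever two neighbour tokens have their '#' facing each other, rewrite them.
import random

RULES = {
    ("A#", "#A"): [],
    ("B#", "#B"): [],
    ("A#", "#B"): ["#B", "A#"],
    ("B#", "#A"): ["#A", "B#"],
}

def reduce_program(tokens):
    """Apply the rewrite rules until no adjacent pair matches."""
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in RULES:
                tokens[i:i + 2] = RULES[pair]
                changed = True
                break
    return tokens

def random_instance(n=12):
    return [random.choice(["A#", "#A", "B#", "#B"]) for _ in range(n)]

if __name__ == "__main__":
    prog = random_instance()
    print("input:      ", " ".join(prog))
    print("normal form:", " ".join(reduce_program(prog)) or "(empty)")
```

Scoring a prompt then comes down to comparing the model's final answer against `reduce_program` over a batch of random 12-token instances.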

## Solutions and Winner

Hours later, many solutions were submitted. Initially, all failed, barely reaching 10% success rates. I was getting fairly confident, until, later that day,
@ptrschmdtnlsn and
@SardonicSydney
submitted a solution that humbled me. Under their prompt, Claude-3 Opus was able to generalize from a few examples to arbitrary random instances, AND stick to the rules, carrying long computations with almost zero errors. On my run, it achieved a 56% success rate.

Through the day, users
@dontoverfit
(Opus),
@hubertyuan_
(GPT-4),
@JeremyKritz
(Opus) and
@parth007_96
(Opus),
@ptrschmdtnlsn
(Opus) reached similar success rates, and
@reissbaker
made a pretty successful GPT-3.5 fine-tune. But it was only late that night that
@futuristfrog
posted a tweet claiming to have achieved near 100% success rate, by prompting alone. And he was right. On my first run, it scored 47/50, granting him the prize, and completing the challenge.

## How it works!?

The secret to his prompt is... going to remain a secret! That's because he kindly agreed to give 25% of the prize to the most efficient solution. This prompt costs $1+ per inference, so, if you think you can improve on that, you have until next Wednesday to submit your solution in the link below, and compete for the remaining $2.5k! Thanks, Bob.

## How do I stand?

Corrected! My initial claim was absolutely WRONG - for which I apologize. I doubted the GPT architecture would be able to solve certain problems which it, with no margin for doubt, solved. Does that prove GPTs will cure Cancer? No. But it does prove me wrong!

Note there is still a small problem with this: it isn't clear whether Opus is based on the original GPT architecture or not. All GPT-4 versions failed. If Opus turns out to be a new architecture... well, this whole thing would have, ironically, just proven my whole point. But, for the sake of the competition, and in all fairness, Opus WAS listed as an option, so, the prize is warranted.

## Who I am and what I'm trying to sell?

Wrong! I won't turn this into an ad. But, yes, if you're new here, I AM building some stuff, and, yes, just like today, I constantly validate my claims to make sure I can deliver on my promises. But that's all I'm gonna say, so, if you're curious, you'll have to find out for yourself (:

####

That's all. Thanks for all who participated, and, again - sorry for being a wrong guy on the internet today! See you.

Gist:

2/3
(The winning prompt will be published Wednesday, as well as the source code for the evaluator itself. Its hash is on the Gist.)

3/3
half of them will be praising Opus (or whatever the current model is) and the other half complaining of CUDA, and 1% boasting about HVM milestones... not sure if that's your type of content, but you're welcome!

 

bnew

Veteran



Mathematics > Optimization and Control​

[Submitted on 4 Apr 2024]

Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra​

Darioush Kevian, Usman Syed, Xingang Guo, Aaron Havens, Geir Dullerud, Peter Seiler, Lianhui Qin, Bin Hu
In this paper, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control problems. Controls provides an interesting case study for LLM reasoning due to its combination of mathematical theory and engineering design. We introduce ControlBench, a benchmark dataset tailored to reflect the breadth, depth, and complexity of classical control design. We use this dataset to study and evaluate the problem-solving abilities of these LLMs in the context of control engineering. We present evaluations conducted by a panel of human experts, providing insights into the accuracy, reasoning, and explanatory prowess of LLMs in control engineering. Our analysis reveals the strengths and limitations of each LLM in the context of classical control, and our results imply that Claude 3 Opus has become the state-of-the-art LLM for solving undergraduate control problems. Our study serves as an initial step towards the broader goal of employing artificial general intelligence in control engineering.
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2404.03647 [math.OC]
(or arXiv:2404.03647v1 [math.OC] for this version)

Submission history

From: Bin Hu [view email]
[v1] Thu, 4 Apr 2024 17:58:38 UTC (505 KB)


 

bnew

Veteran

🏃Alibaba Chief Says China AI 2 Years Behind US, How Humor Forum Unexpectedly Makes AI Smarter, and China Approves 117 Gen-AI Models​

Weekly China AI News from March 25, 2024 to April 3, 2024​

TONY PENG

APR 05, 2024

Hello readers, as Chinese families are honoring their ancestors in the Qingming tomb-sweeping festival, I’m delivering this week's issue early. In this edition, I highlighted Alibaba Chair Joe Tsai’s perspective on China’s AI and the US chip restrictions. Surprisingly, a humor sub-forum on Baidu Tieba, Ruozhiba (弱智吧), has emerged as a goldmine for training Chinese LLMs. China’s Internet regulator has released a full list of 117 generative AI models now approved for public services.




Alibaba Chair Joe Tsai on China AI, Chip Restrictions, and Homegrown GPUs​

What’s New: Alibaba’s co-founder and new chief Joe Tsai said in a recent public interview that China is two years behind the top LLMs from the U.S., and he believes the country can eventually produce its own high-end GPUs. Below are quick highlights.

  • US vs China on AI: “I think China is today behind. It's clear that American companies like OpenAI have really leaped ahead of everybody else, but China is trying to play catchup. I think China could have a lag that will last for a long time because everybody else is running very fast as well. I think today we’re probably two years behind the top models.”
  • Chip Restrictions: “Last October the U.S. put in very stringent restrictions on the ability of companies like Nvidia to export high-end chips to every company in China, so they’ve sort of abandoned the entity list approach and they put the entire China on their list. I think we’re definitely affected by that. In fact, we’ve actually publicly communicated it did affect our cloud business and our ability to offer high-end Computing Services to our customers. So it is an issue in the short run and probably the medium run, but in the long run, China will develop its own ability to make these high-end GPUs.”
  • Short-term impact: “I think in the next year or 18 months the training of large language models (LLMs) can still go ahead given the inventory that people have. I think there’s more high computing that’s required for training as opposed to the applications, what people call inference. So on the inference side, there are multiple options. You don’t need to have as high power and high-end chips such as the Nvidia you know the latest model.”
  • Alibaba’s AI strategy: “We’re one of the largest cloud players in China so AI is essential. Having a good large language model that is proprietarily developed in-house is very important because it helps our cloud business if we have a great LLM and other people, other developers are developing on top of it they’re using our computing services. So we see AI as very much the left hand and right hand for our cloud business. And the other aspect is the e-commerce business is one of the places where you can have the richest use cases for AI. So you can develop a lot of really cool products on top of our own models or even someone else’s open-source model…You can try something on using virtual dressing rooms. Our merchants doing business on our marketplace will be able to use AI to self-generate photos product descriptions and things like that.”


Chinese Reddit-like Humor Forum Ruozhiba (弱智吧) Unexpectedly Makes AI Smarter​



https://www.youtube.com/watch?v=mT6mRJehJdw

What’s New: Ruozhiba, which literally translates to “Idiot Sub-forum”, is a bizarre corner of the Chinese internet. This sub-forum on Reddit-like Baidu Tieba is filled with ridiculous, pun-filled, logically challenging threads that will twist your brain into a pretzel. Here are some examples:

  • Is it a violation to drink all the water during a swimming race and then run?
  • Since prisons are full of criminals, why don’t the cops just go arrest people there?
  • Fresh sashimi is a dead fish slice (生鱼片是死鱼片).
But who knew this forum would unexpectedly become a treasure trove for training Chinese-language AI models?

How it Works: A recent paper titled COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning introduced a high-quality Chinese dataset aimed at fine-tuning Chinese LLMs to better understand and respond to instructions like a native Chinese speaker.

  • The dataset contains over 48,000 instruction-response pairs collected from diverse sources on the Chinese internet like Q&A forums, Wiki articles, exams, and existing NLP datasets.
The authors then analyzed the effects of different data sources, including Ruozhiba.

  • The Ruozhiba dataset only contains 240 instruction-response pairs. The authors first collected the 500 most highly upvoted threads from Ruozhiba. They used the titles of these threads as instructions. For the responses, some were generated by humans and some by GPT-4.
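
For concreteness, a toy sketch of that pairing step (the `write_pairs` helper and the field names are my own illustration, not the COIG-CQIA authors' code):

```python
# Toy sketch: turn the top-voted thread titles into instruction-response pairs.
# Responses would come from human annotators or GPT-4, as described above.
import json

def write_pairs(threads, responses, path="ruozhiba_pairs.jsonl"):
    """threads: list of (title, upvotes); responses: dict mapping title -> answer."""
    top = sorted(threads, key=lambda t: t[1], reverse=True)[:500]
    with open(path, "w", encoding="utf-8") as f:
        for title, _ in top:
            if title not in responses:   # only keep titles that received an answer
                continue
            pair = {"instruction": title, "response": responses[title]}
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```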

Surprisingly, the authors found that the Yi-34B model fine-tuned on the Ruozhiba data performed the best overall across different tasks on the BELLE-EVAL benchmark, outperforming other data sources like Zhihu, Xiaohongshu, and Wiki.




Additionally, the smaller Yi-6B model fine-tuned on the Ruozhiba subset also ranked third overall, behind only the carefully curated CQIA-Subset and the Exam data.

On the SafetyBench which evaluates ethical and safe behavior, the Yi-6B model trained on Ruozhiba data also secured the second-highest score.
  • The authors conjectured that Ruozhiba “may enhance the model’s logical reasoning ability, thereby benefiting most of the instruct-following tasks.”

Why it Matters: It’s just a fun story that I really enjoyed writing about. You never would have guessed that a dataset filled with pure nonsense could actually help enhance AI!



China Approves 117 Generative AI Models for Public Use​

What’s New: China has approved 117 generative AI models for public use as of March 28, 2024, the Cyberspace Administration of China (CAC) disclosed for the first time.

Background: Under China’s generative AI regulation, platforms especially chatbots like Baidu’s ERNIE Bot and Alibaba’s Tongyi Qianwen had to seek approval from local CAC offices before launch. Since August last year, any generative AI services “capable of shaping public opinion or mobilizing society” must undergo a safety evaluation and registration process.

Local CAC offices will then publicly disclose information of registered generative AI services.

Key Takeaways
  • While I haven’t studied all 117 models, presumably a majority of them are language models (LLMs).
  • No models from non-Chinese companies have made the cut yet.
  • Beijing and Shanghai stand at the forefront of China’s AI innovation, with 51 models from Beijing and 24 from Shanghai receiving approval.
 

bnew

Veteran





1/5
A simple puzzle GPTs will NEVER solve:

As a good programmer, I like isolating issues in the simplest form. So, whenever you find yourself trying to explain why GPTs will never reach AGI - just show them this prompt. It is a braindead question that most children should be able to read, learn and solve in a minute; yet, all existing AIs fail miserably. Try it!

It is also a great proof that GPTs have 0 reasoning capabilities outside of their training set, and that they will never develop new science. After all, if the average 15yo destroys you in any given intellectual task, I won't put much faith in you solving cancer.

Before burning $7 trillion to train a GPT, remember: it will still not be able to solve this task. Maybe it is time to look for new algorithms.

2/5
Mandatory clarifications and thoughts:

1. This isn't a tokenizer issue. If you use 1 token per symbol, GPT-4 / Opus / etc. will still fail. Byte-based GPTs fail at this task too. Stop blaming the tokenizer for everything.

2. This tweet is meant to be an answer to the following argument. You: "GPTs can't solve new problems". Them: "The average human can't either!". You: <show this prompt>. In other words, this is a simple "new statement" that an average human can solve easily, but current-gen AIs can't.

3. The reason GPTs will never be able to solve this is that they can't perform sustained logical reasoning. It is that simple. Any "new" problem outside of the training set, that requires even a little logical reasoning, will not be solved by GPTs. That's what this aims to show.

4. A powerful GPT (like GPT-4 or Opus) is basically one that has "evolved a circuit designer within its weights". But the rigidness of attention, as a model of computation, doesn't allow such an evolved circuit to be flexible enough. It is kinda like AGI is trying to grow inside it, but can't due to imposed computation and communication constraints. Remember, human brains undergo synaptic plasticity all the time. There exists a more flexible architecture that, trained at a much smaller scale, would likely result in AGI; but we don't know it yet.

5. The cold truth nobody tells you is that most of the current AI hype is due to humans being bad at understanding scale. Turns out that, once you memorize the entire internet, you look really smart. Everyone on AI is aware of that, it is just not something they say out loud. Most just ride the waves and enjoy the show.

6. GPTs are still extremely powerful. They solve many real-world problems, they turn 10x devs into 1000x devs, and they're accelerating the pace of human progress in such a way that I believe AGI is around the corner. But it will not be a GPT. Nor anything with gradient descent.

7. I may be completely wrong. I'm just a person on the internet. Who is often completely wrong. Read my take and make your own conclusion. You have a brain too!

Prompt:

3/5
Solved the problem... by modifying it? I didn't ask for code.

Byte-based GPTs can't solve it either, so the tokenizer isn't the issue.

If a mathematician couldn't solve such a simple task on their own, would you bet on them solving Riemann's Hypothesis?

4/5
Step 5 is wrong. Got it right by accident. Make it larger.

5/5
I'm baffled by how people are interpreting the challenge as solving that random 7-token instance, rather than the general problem. I should've written <program_here> instead. It was my fault though, so, I apologize, I guess.








1/7
Solved by getting the LLM to simulate the execution of a program that carefully prints out every state mutation and logic operator

Mapping onto A,B,C,D made it work better due to tokenization (OP said this was fine)

Claude beats GPT for this one
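
A minimal sketch of that idea, assuming the usual A::B rules and an arbitrary letter remapping of my own (A# → A, #A → B, B# → C, #B → D); the point is that every rewrite is printed, so the model only has to follow an explicit trace rather than reason silently:

```python
# Remap the four A::B tokens onto single letters and narrate every rewrite.
# Under the mapping A#->A, #A->B, B#->C, #B->D the rules become:
#   AB -> (nothing)   CD -> (nothing)   AD -> DA   CB -> BC
RULES = {("A", "B"): "", ("C", "D"): "", ("A", "D"): "DA", ("C", "B"): "BC"}

def trace_reduce(s):
    step = 0
    while True:
        for i in range(len(s) - 1):
            pair = (s[i], s[i + 1])
            if pair in RULES:
                step += 1
                s = s[:i] + RULES[pair] + s[i + 2:]
                print(f"step {step}: {''.join(pair)} -> '{RULES[pair]}', sequence is now {s or '(empty)'}")
                break
        else:
            return s

trace_reduce("ADCBBA")   # i.e. A# #B B# #A #A A# under this mapping
```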



2/7
Was doing this technique way back in 2022

3/7
GPT can execute fairly complicated programs as long as you make it print out all state updates. Here is a linear feedback shift register (i.e. pseudorandom number generator). Real Python REPL for reference.
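
The screenshots aren't reproduced here, but the kind of program being traced might look like this: a 16-bit Fibonacci LFSR that narrates every state update (the taps and seed are my own illustrative choices, not necessarily the ones in the original example):

```python
# 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) that prints its full state
# after every shift -- the "narrate all state updates" style described above.
def lfsr_trace(seed=0xACE1, steps=8):
    state = seed
    for step in range(1, steps + 1):
        bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        print(f"step {step}: feedback bit = {bit}, state = 0x{state:04X} ({state:016b})")
    return state

if __name__ == "__main__":
    lfsr_trace()
```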

4/7
Another

5/7
Getting ChatGPT to run Dijkstra's Algorithm with narrated state updates
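
In the same spirit, a Dijkstra run that narrates every queue pop and edge relaxation might look like this (the example graph is my own, not the one from the original screenshots):

```python
# Dijkstra's algorithm with a printed narration of every pop and relaxation.
import heapq

def dijkstra_trace(graph, source):
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        print(f"pop {u} at distance {d}")
        for v, w in graph[u]:
            if d + w < dist[v]:
                print(f"  relax {u}->{v}: {dist[v]} -> {d + w}")
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
print(dijkstra_trace(graph, "A"))   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```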

6/7
h/t
@goodside who I learned a lot of this stuff from way back in the prompting dark ages

7/7
I'll have to try a 10+ token one later — what makes you think this highly programmatic approach would fail, though?




1/3
A simple puzzle GPTs will NEVER solve:

As a good programmer, I like isolating issues in the simplest form. So, whenever you find yourself trying to explain why GPTs will never reach AGI - just show them this prompt. It is a braindead question that most children should be able to…

2/3
What do you mean? "B a B" is an intermediate result.

But you're missing the point. It's sufficient for the LLM to do just one step. Chaining can be done outside. One step, it can do, even with zero examples. Checkmate, atheists.

One step it can do. With zero examples.

3/3
And note that I did not provide an example. It understands the problem fine, just gets upset by the notation.
 

bnew

Veteran

1/1
Majorly improved GPT-4 Turbo model available now in the API and rolling out in ChatGPT.




 

bnew

Veteran



1/3
New LLMs & cost transparency, accomplished!

2/3
What is your experience, Hieu? I'm no expert on this one, need your advice

3/3
The AI world is truly going insane!

Just the other day, we were discussing Princeton University's open-source AI software engineer SWE-agent outperforming Devin, and now a new contender, AutoCodeRover from Singapore, has dethroned SWE-agent in just a matter of days.

This…
 

bnew

Veteran



1/4
The AI world is truly going insane!

Just the other day, we were discussing Princeton University's open-source AI software engineer SWE-agent outperforming Devin, and now a new contender, AutoCodeRover from Singapore, has dethroned SWE-agent in just a matter of days.

This powerhouse can tackle 67 GitHub issues (bug fixes or feature additions) in under ten minutes per issue, while regular developers take an average of over 2.77 days, all at a minuscule LLM cost of ~$0.5! Truly frightening!

2/4
Check it out!

3/4
New LLMs & cost transparency, accomplished!

4/4
@goon_nguyen has just added 4 more AI models to my settings!

All models come with input & output price (per million tokens).






1/4
it's over, and this is current generation LLMs (gpt-4)

> our approach resolved 67 GitHub issues in less than ten minutes each, whereas developers spent more than 2.77 days on average

2/4
ok not completely over for soft eng, but yea that efficiency gain is insane

3/4
these were the results posted for "Devin" (they used a 25% random subset)

4/4
code is here





1/4
Introducing AutoCodeRover
Presenting our autonomous software engineer from Singapore! Takes in a GitHub issue (bug fixing or feature addition), resolves it in a few minutes, with minimal LLM cost of ~$0.5! Please RT

GitHub - nus-apr/auto-code-rover: Autonomous program improvement

auto-code-rover/preprint.pdf at main · nus-apr/auto-code-rover

[ 1 / 4]

2/4
Absolutely free for everyone to try out ! And to improve it further!!

3/4
We prefer to run it multiple times - to cater for variations …

4/4
Try it from the following site




1/2
AutoCodeRover: Autonomous Software Engineer

Resolves 22% of GitHub issues in SWE-bench lite in <10 mins at minimal LLM cost ~$0.5
Works on a program representation (the abstract syntax tree) and exploits program structure in the form of classes/methods/APIs

2/2
Introducing AutoCodeRover
Presenting our autonomous software engineer from Singapore! Takes in a GitHub issue (bug fixing or feature addition), resolves it in a few minutes, with minimal LLM cost of ~$0.5! Please RT

https://github.com/nus-apr/auto-code-rover https://github.com/nus-apr/auto-code-rover/blob/main/preprint.pdf [ 1 / 4]




About​

Autonomous program improvement

AutoCodeRover: Autonomous Program Improvement​

overall-workflow


👋 Overview​

AutoCodeRover is a fully automated approach for resolving GitHub issues (bug fixing and feature addition) where LLMs are combined with analysis and debugging capabilities to prioritize patch locations ultimately leading to a patch.

On SWE-bench lite, which consists of 300 real-world GitHub issues, AutoCodeRover resolves ~22% of issues, improving over the current state-of-the-art efficacy of AI software engineers.


AutoCodeRover works in two stages:

  • 🔎 Context retrieval: The LLM is provided with code search APIs to navigate the codebase and collect relevant context.
  • 💊 Patch generation: The LLM tries to write a patch, based on retrieved context.

✨ Highlights​

AutoCodeRover has two unique features:

  • Code search APIs are Program Structure Aware. Instead of searching over files by plain string matching, AutoCodeRover searches for relevant code context (methods/classes) in the abstract syntax tree (a toy sketch of this idea follows this list).
  • When a test suite is available, AutoCodeRover can take advantage of test cases to achieve an even higher repair rate, by performing statistical fault localization.
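
To make the first highlight concrete, here is a toy illustration of structure-aware search using Python's `ast` module; this is a sketch of the idea only, not AutoCodeRover's actual code or API:

```python
# Toy "program structure aware" search: instead of grepping for a string,
# walk the AST and return the enclosing class plus the exact method body.
import ast

def search_method(source, method_name):
    """Return (class name, method name, method source) for every match."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == method_name:
                    hits.append((node.name, item.name, ast.get_source_segment(source, item)))
    return hits

code = "class Greeter:\n    def greet(self, name):\n        return 'hi ' + name\n"
print(search_method(code, "greet"))
```

A plain string search for `greet` would also match comments and unrelated text; walking the AST returns the enclosing class and the exact method body, which is the kind of context the patch-generation stage needs.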

🗎 arXiv Paper​

AutoCodeRover: Autonomous Program Improvement​

First page of arXiv paper

For referring to our work, please cite and mention:

@misc{zhang2024autocoderover,
  title={AutoCodeRover: Autonomous Program Improvement},
  author={Yuntong Zhang and Haifeng Ruan and Zhiyu Fan and Abhik Roychoudhury},
  year={2024},
  eprint={2404.05427},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}

✔️ Example: Django Issue #32347​

As an example, AutoCodeRover successfully fixed issue #32347 of Django. See the demo video in the project README for the full process.

Enhancement: leveraging test cases​

AutoCodeRover can resolve even more issues, if test cases are available. See an example in the video:
 

bnew

Veteran


1/2
Emad Mostaque: last summer it took 20 seconds to generate an image, now we can do 300 images per second and over 1000 with the new Nvidia chips

2/2
Source:
 

bnew

Veteran








1/8
Introducing morph: a fully open-source AI-powered answer engine with a generative UI. Built with @vercel AI SDK, it delivers awesome streaming results.

More details

2/8
Stack
- App framework: @nextjs
- Text streaming / Generative UI: @vercel AI SDK
- Generative Model: @OpenAI
- Search API: @tavilyai
- Component library: @shadcn/ui
- Headless component primitives: @radix_ui
- Styling: @tailwindcss

3/8
Check out the source code on GitHub:

4/8
Try morph now at Morphic and let me know what you think!

5/8
Learn more about Generative UI:

6/8
Sounds good! I’ll add it to the feature requests.

7/8
Thank you for the error report.
It has been addressed, so please try again.

8/8
Thank you!!
 

bnew

Veteran


1/2
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

2/2
RELEASE 0535902c85ddbb04d4bebbf4371c6341 lol






Models Table
2024 optimal LLM highlights



 

bnew

Veteran



1/3
Introducing our new flagship LLM, Nous-Hermes 2 on Mixtral 8x7B.

Our first model that was trained with RLHF, and the first model to beat Mixtral Instruct in the bulk of popular benchmarks!

We are releasing the SFT only and SFT+DPO model, as well as a qlora adapter for the DPO today.

Mixtral Nous-Hermes 2 DPO: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO · Hugging Face

Mixtral Nous-Hermes 2 SFT: NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT · Hugging Face

Mixtral Nous-Hermes 2 DPO Adapter: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO-adapter · Hugging Face

(1/2)

2/3
In addition,
@togethercompute has already implemented the model on their API, so you can access it immediately from their services.

We have also compiled GGUF versions of the model in all quantization sizes, available from the following links:

SFT+DPO: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO-GGUF · Hugging Face

SFT Only:

3/3
You can try the new models on Together's playground website here:

SFT+DPO Version: https://api.together.xyz/playground/chat/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO

SFT Only Version: https://api.together.xyz/playground/chat/NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT




1/1
Introducing our new flagship LLM, Nous-Hermes 2 on Mixtral 8x7B.

Our first model that was trained with RLHF, and the first model to beat Mixtral Instruct in the bulk of popular benchmarks!

We are releasing the SFT only and SFT+DPO model, as well as a qlora adapter for the DPO…
 

bnew

Veteran







1/7
Some observations/questions:

- Did you know that Gemini traffic is already ~25% of ChatGPT? And Google isn't pushing it through their massive distribution channels yet (Android, Google, GSuite, etc).

- Big on X, but Claude usage is still very low. Should Anthropic advertise?

- ChatGPT is still the big brand, but usage relatively flat over the last year. Why isn't it growing? Is OpenAI compute limited or demand-limited?

2/7
Of course, usage != revenue. ChatGPT has been ~flat for a year but revenue has increased as it's diffused into its most productive applications.

And I assume the paid:unpaid ratio varies widely across Claude, ChatGPT, and Gemini.

3/7
That's a good theory

4/7
Is that ‎Gemini - chat to supercharge your ideas or the GSuite plugins for gmail/docs/etc?

5/7
It's their premier brand, strategic user-facing surface, and primary revenue source?

6/7
That's definitely a contributor

7/7
I don't believe this narrative of chatbot being a limited UI. Lots of people get paid a big salary to essentially be an intelligent chatbot for their employer.









1/7
New @Google
developer launch today:

- Gemini 1.5 Pro is now available in 180+ countries via the Gemini API in public preview
- Supports audio (speech) understanding capability, and a new File API to make it easy to handle files
- New embedding model!

2/7
Please send over feedback as you use the API, AI Studio, and our new models : )

3/7
For those asking about EU access, work is underway, expect more updates in the coming weeks!

4/7
Agreed, we are working on it : )

5/7
Google AI Studio and the Gemini API are accessible through https://aistudio.google.com

6/7
Today : )

7/7
I'll ping Alex and see what is needed on that front
 

bnew

Veteran




1/4
Introducing `gemini-youtube-researcher`

An open-source Gemini 1.5 Pro agent that LISTENS to videos and delivers topical reports.

Just provide a topic, and a chain of AIs with access to YouTube will analyze relevant videos and generate a comprehensive report for you.

2/4
This uses the new Gemini 1.5 Pro API that was released today.

It currently only supports listening to the audio content of videos. If anyone wants, please feel free to add support for video frames as well.

3/4
How it works, in a nutshell:
- User provides a topic
- SERPAPI gathers relevant YouTube links
- A separate Gemini 1.5 instance listens to + summarizes each video
- A final Gemini instance takes in all of the summaries, and generates a final, comprehensive report
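
A rough orchestration sketch of that pipeline, for illustration only; the helpers below are placeholders standing in for the real SerpAPI and Gemini 1.5 Pro calls, and none of the names come from the repo itself:

```python
# Fan-out / fan-in structure of the pipeline described above (placeholders only).
def search_youtube(topic, k=3):
    # Placeholder: the real project uses SERPAPI to gather relevant YouTube links.
    return [f"https://youtube.com/watch?v=example{i}" for i in range(k)]

def summarize_video_audio(url):
    # Placeholder: the real project sends the video's audio to a Gemini 1.5 Pro instance.
    return f"(summary of {url})"

def write_report(topic, summaries):
    # Placeholder: a final Gemini instance merges all summaries into one report.
    return f"Report on {topic}:\n" + "\n".join(f"- {s}" for s in summaries)

def research(topic):
    links = search_youtube(topic)
    summaries = [summarize_video_audio(u) for u in links]
    return write_report(topic, summaries)

print(research("solid-state batteries"))
```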

4/4
If you'd like to try it or contribute, check out the Github repo.
 

bnew

Veteran






1/6
Devin is a knowledge moat, and OSS will crush them by end of year

Once again — the next tier of models will obliterate most software companies

2/6
You can build a cult. Of human people

3/6
You can’t beat this — this is the secret ha

4/6
I expect this is what they’ll end up doing

Everyone with a tonne of VC dollars will eventually realize this

See Together AI for eg

5/6
There are a few wealthy investors who’ve been pitched Devin at $2b valuation — I’ve seen the memos lmfao

Utter junk. It’s insane how this is happening just after the 2021 bubble

6/6
lol
 