bnew


https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit#gid=2011456595

LLM Logic Tests

A spreadsheet of logic tests for evaluating LLMs.
 

bnew






WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Python 3.9+

To develop our WizardCoder model, we begin by adapting the Evol-Instruct method specifically for coding tasks. This involves tailoring the prompt to the domain of code-related instructions. Subsequently, we fine-tune the Code LLM, StarCoder, utilizing the newly created instruction-following training set.
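
The README doesn't reproduce the evolution prompts here; as a rough illustration, the sketch below paraphrases the kind of code-specific evolution heuristics the WizardCoder paper describes (added constraints, extra reasoning steps, erroneous reference code, complexity requirements). The `llm_complete` helper and the exact wordings are assumptions, not the project's actual code.

```python
# Minimal sketch of Evol-Instruct adapted to code. The heuristic
# wordings are paraphrases, not the paper's exact prompts, and
# llm_complete is an assumed helper that calls your evolution LLM.
import random

EVOLUTION_HEURISTICS = [
    "Add new constraints and requirements to the problem.",
    "Replace a common requirement with a less common, more specific one.",
    "If the problem can be solved in only a few steps, require more reasoning steps.",
    "Provide a piece of erroneous code as a reference to increase difficulty.",
    "Propose a higher time or space complexity requirement.",
]

def evolve_instruction(instruction: str, llm_complete) -> str:
    """Rewrite one coding instruction into a harder variant."""
    heuristic = random.choice(EVOLUTION_HEURISTICS)
    prompt = (
        "Please increase the difficulty of the given programming task "
        f"using the following method:\n{heuristic}\n\n"
        f"Task:\n{instruction}\n\nRewritten task:"
    )
    return llm_complete(prompt)
```

Repeated rounds of this rewriting, with the evolved instructions and their responses collected as training pairs, yield the instruction-following set used for fine-tuning.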

News

Comparing WizardCoder with the Closed-Source Models.

🔥 The following figure shows that our WizardCoder attains third position on this benchmark, surpassing Claude-Plus (59.8 vs. 53.0) and Bard (59.8 vs. 44.5). Notably, our model is substantially smaller than these models.

[Figure: HumanEval pass@1 comparison of WizardCoder with closed-source models]

❗Note: In this study, we copy the scores for HumanEval and HumanEval+ from the LLM-Humaneval-Benchmarks. Notably, all the mentioned models generate code solutions for each problem in a single attempt, and the resulting pass-rate percentage is reported. Our WizardCoder generates answers with greedy decoding and is evaluated with the same code.

Comparing WizardCoder with the Open-Source Models.

The following table demonstrates that our WizardCoder has a substantial performance advantage over all the open-source models. ❗If you are confused by our model's two different scores (57.3 and 59.8), please check the Notes.

Model                | HumanEval Pass@1 | MBPP Pass@1
CodeGen-16B-Multi    | 18.3             | 20.9
CodeGeeX             | 22.9             | 24.4
LLaMA-33B            | 21.7             | 30.2
LLaMA-65B            | 23.7             | 37.7
PaLM-540B            | 26.2             | 36.8
PaLM-Coder-540B      | 36.0             | 47.0
PaLM 2-S             | 37.6             | 50.0
CodeGen-16B-Mono     | 29.3             | 35.3
Code-Cushman-001     | 33.5             | 45.9
StarCoder-15B        | 33.6             | 43.6*
InstructCodeT5+      | 35.0             | --
WizardLM-30B 1.0     | 37.8             | --
WizardCoder-15B 1.0  | 57.3             | 51.8
❗Note: * denotes our reproduced result of StarCoder on MBPP.

❗Note: The above table presents a comprehensive comparison of our WizardCoder with other models on the HumanEval and MBPP benchmarks. We adhere to the approach outlined in previous studies by generating 20 samples for each problem to estimate the pass@1 score, and we evaluate with the same code. The scores of GPT-4 and GPT-3.5 reported by OpenAI are 67.0 and 48.1 (these may be from early versions of GPT-4 and GPT-3.5).
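
The note above doesn't spell out the estimator; the standard choice for computing pass@1 from n samples per problem is the unbiased pass@k formula from the Codex paper (Chen et al., 2021). A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples generated, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k draw contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 20 samples for a problem, 8 passing -> pass@1 estimate of 0.4
print(pass_at_k(n=20, c=8, k=1))
```

The per-problem estimates are then averaged over the benchmark to give the reported percentage.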

Call for Feedback

We welcome everyone to evaluate WizardCoder with your professional and difficult instructions, and to show us examples of poor performance and your suggestions in the issue discussion area. We are now focused on improving Evol-Instruct and hope to address existing weaknesses and issues in the next version of WizardCoder. After that, we will open-source the code and pipeline of the up-to-date Evol-Instruct algorithm and work with you to improve it.

Contents

  1. Online Demo
  2. Fine-tuning
  3. Inference
  4. Evaluation
  5. Citation
  6. Disclaimer

Online Demo

We will provide our latest models for you to try for as long as possible. If you find a link is not working, please try another one. At the same time, please try as many of the real-world and challenging code-related problems that you encounter in your work and life as possible. We will continue to evolve our models with your feedback.

Demo Link (we currently use greedy decoding).
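
For reference, here is a minimal sketch of greedy decoding with Hugging Face transformers; the checkpoint name is the public WizardCoder release and may need adjusting:

```python
# Greedy decoding (do_sample=False), matching how the demo and the
# benchmark numbers above are generated.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "WizardLM/WizardCoder-15B-V1.0"  # assumed public checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```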

Fine-tuning

We fine-tune WizardCoder using the modified train.py from Llama-X. We fine-tune StarCoder-15B with the following hyperparameters:

Hyperparameter | StarCoder-15B
Batch size     | 512
Learning rate  | 2e-5
Epochs         | 3
Max length     | 2048
Warmup steps   | 30
LR scheduler   | cosine
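
A minimal sketch of how this table maps onto Hugging Face TrainingArguments, assuming the Llama-X train.py is a standard Trainer-based script; the per-device/accumulation split and precision flag are assumptions:

```python
from transformers import TrainingArguments

# Effective batch size 512 = 8 GPUs x 8 per device x 8 accumulation steps
# (assumed split). Max length 2048 is applied at tokenization time
# (tokenizer.model_max_length), not via TrainingArguments.
args = TrainingArguments(
    output_dir="wizardcoder-15b",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=3,
    warmup_steps=30,
    lr_scheduler_type="cosine",
    bf16=True,  # assumption: mixed-precision training
)
```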
 

bnew


AI algorithms find drugs that could combat ageing

Three drugs that could help stave off the effects of ageing have been discovered using artificial intelligence (AI), a study suggests.

A trio of chemicals that target faulty cells linked to a range of age-related conditions were found using the pioneering method, which is hundreds of times cheaper than standard screening methods, researchers say.

Findings suggest the drugs can safely remove defective cells – known as senescent cells – linked to conditions including cancer, Alzheimer’s disease and declines in eyesight and mobility.

Harmful side-effects

While previous studies have shown early promise, until now few chemicals that can safely eliminate senescent cells have been identified.
These senolytic drugs are often highly toxic against normal, healthy cells in the body, researchers say.

Machine learning

Now, a team led by Edinburgh researchers has devised a way of discovering senolytic drugs using AI.

They developed a machine learning model by training it to recognise the key features of chemicals with senolytic activity, using data from more than 2,500 chemical structures mined from previous studies.

The team then used the model to screen more than 4,000 chemicals, identifying 21 potential drug candidates for experimental testing.
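
The press release doesn't detail the pipeline; a rough sketch of the general approach it describes (fingerprint labelled molecules, train a classifier, rank an unscreened library) might look like the following. All data, names, and model choices here are illustrative assumptions, not the paper's actual code.

```python
# Illustrative virtual-screening sketch, not the study's real pipeline.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def featurize(smiles: str) -> np.ndarray:
    """Morgan (circular) fingerprint as a fixed-length bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048))

# Placeholder data; the study used ~2,500 labelled structures (1 = senolytic).
train_smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
labels = [0, 1, 0, 1]
library_smiles = ["c1ccc2ccccc2c1", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"]

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(np.stack([featurize(s) for s in train_smiles]), labels)

# Rank the screening library and keep the top hits for lab testing.
scores = model.predict_proba(np.stack([featurize(s) for s in library_smiles]))[:, 1]
top_candidates = np.argsort(scores)[::-1][:21]
print(top_candidates, scores[top_candidates])
```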


Natural products

Lab tests in human cells revealed that three of the chemicals – called ginkgetin, periplocin and oleandrin – were able to remove senescent cells without damaging healthy cells.

All three are natural products found in traditional herbal medicines, the team says. Oleandrin was found to be more effective than the best-performing known senolytic drug of its kind.


The study, published in the journal Nature Communications, was supported by the Medical Research Council, Cancer Research UK, United Kingdom Research and Innovation (UKRI) and the Spanish National Research Council.

It also involved researchers from the University of Cantabria, Spain, and the Alan Turing Institute.

Research milestone

The study is the latest development in computer science and AI since the University established its first research hubs in the disciplines 60 years ago.

A year-long programme of events will mark achievements over the past six decades and look to the future of computer science and AI at Edinburgh.

More information about the 60 year celebration:
edin.ac/60-years-computer-science-ai

Related links

School of Informatics
Institute of Genetics and Cancer
Journal paper

Image credit: Antonio_Diaz via Getty Images
This article was published on 14 Jun, 2023
 

bnew




Abstract

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, and DS-1000, we unveil the exceptional capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed LLMs, Anthropic’s Claude and Google’s Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at GitHub: nlpxucan/WizardLM (family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder).

 

bnew


Artificial intelligence is not yet as smart as a dog, Meta A.I. chief says

PUBLISHED THU, JUN 15 2023 8:59 AM EDT
UPDATED THU, JUN 15 2023 10:48 AM EDT

Arjun Kharpal

  • Current artificial intelligence systems like ChatGPT do not have human-level intelligence and are not even as smart as a dog, Meta's AI chief Yann LeCun said.
  • LeCun talked about the limitations of generative AI, such as ChatGPT, and said they are not very intelligent because they are solely trained on language.
  • Meta's LeCun said that, in the future, there will be machines that are more intelligent than humans, which should not be seen as a threat.

Chief AI Scientist at Meta Yann LeCun spoke at the Viva Tech conference in Paris and said that artificial intelligence does not currently have human-level intelligence but could reach it one day.
Chesnot | Getty Images News | Getty Images

Current artificial intelligence systems like ChatGPT do not have human-level intelligence and are barely smarter than a dog, Meta's AI chief said, as the debate over the dangers of the fast-growing technology rages on.

ChatGPT, developed by OpenAI, is based on a so-called large language model. This means that the AI system was trained on huge amounts of language data that allows a user to prompt it with questions and requests, while the chatbot replies in language we understand.

The fast-paced development of AI has sparked concern from major technologists that, if unchecked, the technology could pose dangers to society. Tesla CEO Elon Musk said this year that AI is "one of the biggest risks to the future of civilization."

At the Viva Tech conference on Wednesday, Jacques Attali, a French economic and social theorist who writes about technology, said whether AI is good or bad will depend on its use.

"If you use AI to develop more fossil fuels, it will be terrible. If you use AI [to] develop more terrible weapons, it will be terrible," Attali said. "On the contrary, AI can be amazing for health, amazing for education, amazing for culture."

At the same panel, Yann LeCun, chief AI scientist at Facebook parent Meta, was asked about the current limitations of AI. He focused on generative AI trained on large language models, saying they are not very intelligent, because they are solely coached on language.

"Those systems are still very limited, they don't have any understanding of the underlying reality of the real world, because they are purely trained on text, massive amount of text," LeCun said.

"Most of human knowledge has nothing to do with language … so that part of the human experience is not captured by AI."

LeCun added that an AI system could now pass the bar exam in the U.S., an examination required to become an attorney. However, he said AI can't load a dishwasher, which a 10-year-old could "learn in 10 minutes."

"What it tells you we are missing something really big … to reach not just human level intelligence, but even dog intelligence," LeCun concluded.

Meta's AI chief said the company is working on training AI on video, rather than just on language, which is a tougher task.

In another example of current AI limitations, he said a five-month-old baby would look at a floating object and not think too much of it. However, a nine-month-old baby would look at the item and be surprised, as it realizes that an object shouldn't float.

LeCun said we have "no idea how to reproduce this capacity with machines today. Until we can do this, we are not going to have human-level intelligence, we are not going to have dog level or cat level [intelligence]."

Will robots take over?

Striking a pessimistic tone about the future, Attali said, "It is well known mankind is facing many dangers in the next three or four decades."

He noted climate disasters and war among his top concerns, also noting he is worried that robots "will turn against us."

During the conversation, Meta's LeCun said that, in the future, there will be machines that are more intelligent than humans, which should not be seen as posing a danger.

"We should not see this as a threat, we should see this as something very beneficial. Every one of us will have an AI assistant … it will be like a staff to assist you in your daily life that is smarter than yourself," LeCun said.

The scientist added that these AI systems need to be created as "controllable and basically subservient to humans." He also dismissed the notion that robots would take over the world.

"A fear that has been popularized by science fictions [is] that if robots are smarter than us, they are going to want to take over the world … there is no correlation between being smart and wanting to take over," LeCun said.

Ethics and regulation of A.I.

While looking at the dangers and opportunities of AI, Attali concluded that there need to be guardrails in place for the development of the technology. But he was unsure who would do that.

"Who is going to put the borders?" he asked.

AI regulation has been a hot topic at Viva Tech. The European Union is pushing forward with its own AI legislation, while France's top government ministers told CNBC this week that the country wants to see global regulation of the technology.
 

bnew



Some doctors are using AI chatbots like ChatGPT to help them deliver bad news to patients in a compassionate way, report says


Beatrice Nolan Jun 15, 2023, 7:22 AM EDT


ChatGPT has proved to have impressive medical knowledge. The Washington Post/Getty Images

  • Some doctors are turning to chatbots to help them communicate with patients, per The New York Times.
  • Doctors were quick to find uses for OpenAI's viral product after it was launched in November.
  • Some practitioners began using ChatGPT 72 hours after it was publicly released, the Times reported.

Some doctors are using AI-powered chatbots to help them find compassionate ways to break bad news to patients, The New York Times reported.

Doctors were quick to find uses for OpenAI's viral product after it was launched in November.

The Times reported that Peter Lee, the corporate vice president for research and incubations at OpenAI investor Microsoft, found the chatbot had been regularly helping doctors communicate with patients more compassionately.

Some practitioners started using ChatGPT 72 hours after it was released to the public, the report added.

ChatGPT has proved to have impressive medical knowledge and there's evidence that the bots may even help to improve a doctor's bedside manner.

One study found that medical experts rated ChatGPT's responses to patient questions as higher quality and more empathetic than those of human doctors. On average, the chatbot's answers were rated seven times as empathetic as the doctors', researchers from the University of California, San Diego found.

In the study, medical experts preferred the AI chatbot's response to the physician's in 78.6% of the 585 scenarios.

ChatGPT has already passed the US Medical Licensing Exam, Insider previously reported. OpenAI's new iteration, GPT-4, has been found to have even better clinical judgment, per the report, which cited a doctor and Harvard computer scientist.

AI-powered products like ChatGPT, however, can still make mistakes or misdiagnose, sparking concern about how some patients may use them.

Representatives for OpenAI did not immediately respond to Insider's request for comment, made outside normal working hours.
 

bnew


GOOGLE’S NEW AI DESCRIBES X-RAYS AND ANSWERS PATIENT QUESTIONS


Google has unveiled Med-PaLM 2, an AI model for analyzing medical data. It aims to assist doctors with routine tasks and provide more reliable answers to patient questions than “Dr. Google.”



While PaLM 2 cannot replace doctors, it is going to be hard to work without it – AI in healthcare will improve work efficiency, the quality of diagnoses and treatment outcomes, and automate specific care processes.

At Alphabet’s annual Google I/O conference, the big tech company unveiled PaLM 2, a new large language model (LLM) to compete with ChatGPT. It supports more than 100 languages and scores highly on language and medical tests. It will soon be integrated into platforms like Gmail to write emails autonomously.

What does this mean for healthcare?

PaLM 2, the successor to Google’s 540-billion-parameter Pathways Language Model (PaLM), is more important than Bard for medicine. It draws knowledge from scientific papers and websites, can reason logically, and can perform complex mathematical calculations.

Google plans to release 25 new products and features based on PaLM 2. One of them is for doctors – during the event, Google’s CEO, Sundar Pichai, demonstrated how AI can describe X-rays. You just need to ask the AI system the right question, such as “What is in this picture?” or “Can you write me a report analyzing this chest X-ray?”

Doctors to learn prompts

Later this summer, Med-PaLM 2 will be made available to a select group of Google Cloud customers. They will have the opportunity to test the model, and their feedback will then be used to improve it further. The company aims to synthesize data from images and electronic medical records to improve patient outcomes in the future.

This development highlights the importance of physicians mastering the ability to formulate commands (prompts) to communicate effectively with artificial intelligence. Prompts are the natural-language instructions that ask an AI system to perform specific tasks, such as describing a mammogram or generating a creative image in a chosen style.
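
As a hedged illustration of what such a prompt might look like in practice (the wording, report sections, and helper below are invented for this example, not Google's or any hospital's actual prompts):

```python
# Illustrative prompt template for a clinical reporting task.
REPORT_PROMPT = (
    "You are assisting a radiologist.\n"
    "Write a structured report for the attached chest X-ray.\n"
    "Sections: Findings, Impression, Recommended follow-up.\n"
    "Flag any finding you are uncertain about for human review."
)

def build_prompt(task: str, patient_context: str) -> str:
    """Combine a reusable task template with case-specific context."""
    return f"{task}\n\nClinical context: {patient_context}"

print(build_prompt(REPORT_PROMPT, "58-year-old former smoker, persistent cough"))
```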

The first hospitals are hiring AI engineers to test ChatGPT and LLMs

Boston Children’s Hospital posted a job listing for an “AI Prompt Engineer” – a specialist in AI query handling. Candidates must have experience in AI and machine learning. Their task will be to test the use of ChatGPT in healthcare and hospital activities.

Using large language models (LLMs) requires specific skills, particularly in formulating queries (i.e., prompts); the accuracy of the algorithms' responses depends on this. Will prompt writing soon be included in medical school and continuing-education curricula? It is quite likely.

The job advertisement published by Boston Children's Hospital offers insights into the necessary knowledge and experience. The AI prompt engineer will be responsible for the following:

  • Designing and developing AI queries using large language models (e.g., ChatGPT) and other solutions emerging from healthcare research and clinical practice;
  • Collaborating with researchers and clinicians to understand their needs and to design AI queries for data collection;
  • Implementing machine learning models for data analysis;
  • Refining language models;
  • Testing and evaluating AI query performance;
  • Optimizing existing query libraries and machine learning best practices;
  • Helping other team members build AI queries;
  • Tracking the progress of AI in healthcare.
 