AI worse than humans in every way at summarising information, government trial finds

A test of AI for Australia's corporate regulator found that the technology might actually make more work for people, not less.

Sep 03, 2024


UPDATED: 9.18AM, Sep 04

Artificial intelligence is worse than humans in every way at summarising documents and might actually create additional work for people, a government trial of the technology has found.

Amazon conducted the test earlier this year for Australia’s corporate regulator, the Australian Securities and Investments Commission (ASIC), using submissions made to an inquiry. The outcome of the trial was revealed in an answer to a question on notice at the Senate select committee on adopting artificial intelligence.

The trial involved evaluating several generative AI models before selecting one to ingest five submissions from a parliamentary inquiry into audit and consultancy firms. The most promising model, Meta’s open source model Llama2-70B, was prompted to summarise the submissions with a focus on ASIC mentions, recommendations and references to more regulation, and to include the page references and context.

Ten ASIC staff, of varying levels of seniority, were also given the same task with similar prompts. Then, a group of reviewers blindly assessed the summaries produced by both humans and AI for coherency, length, ASIC references, regulation references and for identifying recommendations. They were unaware that this exercise involved AI at all.

These reviewers overwhelmingly found that the human summaries beat out their AI competitors on every criterion and on every submission, scoring 81% on an internal rubric compared with the machine’s 47%.

Human summaries ran up the score by significantly outperforming on identifying references to ASIC documents in the long document, which the report notes is a “notoriously hard task” for this type of AI. But humans still beat the technology across the board.

Reviewers told the report’s authors that AI summaries often missed emphasis, nuance and context; included incorrect information or missed relevant information; and sometimes focused on auxiliary points or introduced irrelevant information. Three of the five reviewers said they guessed that they were reviewing AI content.

The reviewers’ overall feedback was that they felt AI summaries may be counterproductive and create further work because of the need to fact-check them and refer back to the original submissions, which communicated the message better and more concisely.

The report mentions some limitations and context to this study: the model used has already been superseded by one with further capabilities, which may improve its ability to summarise information, and Amazon increased the model’s performance by refining its prompts and inputs, suggesting that further improvements are possible. It includes optimism that this task may one day be competently undertaken by machines.

But until then, the trial showed that a human’s ability to parse and critically analyse information is unparalleled by AI, the report said.

“This finding also supports the view that GenAI should be positioned as a tool to augment and not replace human tasks,” the report concluded.

Greens Senator David Shoebridge, whose question to ASIC prompted the publishing of the report, said that it was “hardly surprising” that humans were better than AI at this task. He also said it raised questions about how the public might feel about using AI to read their inquiry submissions.

“This of course doesn’t mean there is never a role for AI in assessing submissions, but if it has a role it must be transparent and supportive of human assessments and not stand-alone,” he said.

“It’s good to see government departments undertaking considered exercises like this for AI use, but it would be better if it was then proactively and routinely disclosed rather than needing to be requested in Senate committee hearings.”


Breh, did a human write this headline?

The report mentions some limitations and context to this study: the model used has already been superseded by one with further capabilities, which may improve its ability to summarise information, and Amazon increased the model's performance by refining its prompts and inputs, suggesting that further improvements are possible. It includes optimism that this task may one day be competently undertaken by machines.


This is the final warning for those considering careers as physicians: AI is becoming so advanced that the demand for human doctors will significantly decrease, especially in roles involving standard diagnostics and routine treatments, which will be increasingly replaced by AI.

This is underscored by the massive performance leap of OpenAI’s o1 model, also known as the “Strawberry” model, which was released as a preview yesterday. The model performs exceptionally well on a specialized medical dataset (AgentClinic-MedQA), greatly outperforming GPT-4o. The rapid advancements in AI’s ability to process complex medical information, deliver accurate diagnoses, provide medical advice, and recommend treatments will only accelerate.

Medical tasks like diagnosing illnesses, interpreting medical imaging, and formulating treatment plans will soon be handled by AI systems with greater speed and consistency than human practitioners. As the healthcare landscape evolves in the coming years, the number of doctors needed will drastically shrink, with more reliance on AI-assisted healthcare systems.

While human empathy, critical thinking, and decision-making will still play an important role in certain areas of medicine, even these may eventually be supplanted by future iterations of models like o1.

Consequently, medicine is becoming a less appealing career path for the next generation of doctors—unless they specialize in intervention-focused areas (such as surgery, emergency medicine, and other interventional specialties), though these, too, may eventually be overtaken by robotic systems…maybe within a decade or so.

This is my final warning for those considering a career as a doctor: AI is advancing so quickly that demand for human doctors will fall significantly, and AI will take their place, especially in standard diagnostics and routine treatments.

This became even clearer with the big performance leap of OpenAI's o1 model, also known as the "Strawberry" model, which was released as a preview yesterday. The model delivered extremely strong results on AgentClinic-MedQA, a specialized medical dataset, greatly outperforming GPT-4o. This rapid progress in AI's ability to process complex medical information, make accurate diagnoses, give medical advice, and recommend treatments will continue.

Medical tasks such as diagnosing illnesses, interpreting medical imaging, and creating treatment plans will soon be carried out by AI systems faster and more consistently than by human doctors. As the healthcare system transforms in the coming years, the need for doctors will shrink substantially and reliance on AI-assisted healthcare systems will grow.

Although human empathy, critical thinking, and decision-making still play an important role in some areas of medicine, even these will be taken over by later versions of o1-like models. For this reason, medicine will become a much less attractive career for future generations; only intervention-focused fields (such as surgery, emergency medicine, and other interventional specialties) may remain valuable for a while longer. But these fields, too, may be taken over by robot doctors within the next 10-15 years.

I remember twenty years ago when it was a popular prediction to say radiologists would be completely automated out of existence by the 2020s.

Now radiology salaries have hit new highs in most markets, including the U.S.

If you’re interested in medicine, get your medical or nursing degree. AI will never fully automate the long human tradition of medicine.

Doctors will stay the safest job on the planet.

I strongly disagree. Nursing jobs will be safer than doctor jobs in a decade.

It seems some people didn’t quite understand my post here. Let me clarify: I didn’t say the medical profession will completely disappear. However, I pointed out that there will be a need for far fewer doctors a decade from now. Therefore, only the top 10-20% of physicians who are truly dedicated and outstanding will continue to have fulfilling jobs. If you are passionate about medicine, you should still pursue it as a career, but bear in mind that it will no longer be a high-paying and secure job in the future.

In a separate post, I will explain why, in the near future, patients will welcome doctors working with AI or may even begin to prefer AI doctors. In fact it will become unethical and even malpractice not to use AI in diagnostics and treatment.

I realize this is disconcerting for many, and it may be difficult to imagine or accept. I empathize, having spent years working hard to train as a physician, although I followed my passion for science instead of pursuing a more secure, high-income job as a doctor.

In the end, what matters is the value you provide for the greater good of humanity. I strongly believe that AI will bring unimaginable benefits, saving lives and helping people live long, healthy lives.

As the original poster of @SRSchmidgall's figure, I respectfully disagree that this is a final warning that clinicians will become obsolete.

I believe clinicians will be empowered by being augmented with up-to-date knowledge, guidelines and DDx ideas at their fingertips, with increasingly better UIs.

I agree though that certain abusive healthcare economies (with bad incentives) may find ways to provide cheaper & human-free care delivery that is not necessarily better for patients. It's up to us not to let this happen.

But I didn’t say clinicians will become completely obsolete. Please see the follow-up for further clarification on what I meant. It’s not too different from your point. Also, please consider the advances we’ll see in the next 5-10 years: this technology will progress exponentially.

How does this performance compare to MDs?

Let me just give one statistic:

“An estimated 795 000 Americans become permanently disabled or die annually across care settings because dangerous diseases are misdiagnosed.”

Extrapolating worldwide that’s millions of people and most are not even complicated cases.
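
The extrapolation above can be sanity-checked with a rough per-capita scaling. This is only a sketch: the population figures are approximate assumptions that do not come from the post, and it assumes the US rate applies everywhere, which is unlikely to hold exactly.

```python
# Rough per-capita extrapolation of the quoted US figure to the world.
# The population numbers are approximate assumptions, not from the post.
US_HARMED_PER_YEAR = 795_000       # figure quoted above
US_POPULATION = 335_000_000        # assumed, approximate
WORLD_POPULATION = 8_000_000_000   # assumed, approximate


def extrapolate_worldwide(us_cases: float, us_pop: float, world_pop: float) -> float:
    """Scale a US annual count to the world at the same per-capita rate."""
    return us_cases / us_pop * world_pop


estimate = extrapolate_worldwide(US_HARMED_PER_YEAR, US_POPULATION, WORLD_POPULATION)
print(f"{estimate / 1e6:.1f} million people per year")
```

Under these assumptions the estimate comes out at roughly 19 million people a year, which is consistent with the "millions" claim.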

We have a shortage of doctors because they have to do so much paperwork and handle a lot of routine work. If anything I expect AI to make the healthcare sector better

Will definitely make it better.

If OpenAI's o1 can pass OpenAI's research engineer hiring interview for coding -- 90% to 100% rate...

......then why would they continue to hire actual human engineers for this position?

Every company is about to ask this question.

Yes. I've been saying the age of AI coding is here for the last year...... this just takes it to another level



I very much agree with this, with the caveat that understanding how programming **works** is still valuable.

This, however, is different from "learning programming" the way it's been learned the last four decades.

Ah, and if there's one thing AI agents are bad at, it's understanding data systems. ....wait.....

Google just dropped NotebookLM.

It generates podcasts with two speakers discussing content from research papers, articles, and more.

Here are 12 mind-blowing examples: 🤯


Google NotebookLM's new podcast feature is wild

This is made from a 90-minute lecture I gave on Monday

It condensed it into a 16 minute talkshow

Some hallucinations here and there, but overall this is a new paradigm for learning.

Link to try it below, no waitlist


Tried out the new NotebookLM from @labsdotgoogle to create a podcast based on a reddit thread on @kentcdodds' course. Pretty impressive results.


So cool. Turned a blogpost about "Ducking" (a technique used in audio engineering) into a conversation with Google NotebookLM and used Tuneform to generate a video of it.

Here's the original blog: noiseengineering.us/blogs/lo…



Just had my 3rd wow moment in AI... this time through AI Overview by NotebookLM 🤯


[Quoted tweet]
This AI service is so impressive! Google's NotebookLM is now capable of generating an audio overview based on documents uploaded and links to online resources.

I uploaded my bachelor's thesis, my resume, and a link to my online course website, and it created this really cool podcast-like format.

It didn't get everything right, but it's so funny because NotebookLM actually drew great conclusions that I didn’t think about while writing this thesis myself.

Which AI tool could create a video for this audio file?

@labsdotgoogle #RenewableEnergy #offgridpower #batterystorage #SolarEnergy #AI


I've been trying out NotebookLM from @Google and I was surprised.

I turned one of my Substack articles into a podcast, and it even has AI hosts conversing about the topic.

Now I can listen to my content instead of reading it, and I love it. Super smooth:


[Quoted tweet]
A podcast by Google Notebook LM from YouTube videos uploaded on YouTube from Sept 9-13th. #ai #highered #notebooklm #google

How was this produced?

1. Searched YouTube for “Artificial Intelligence in Higher Education”.
2. Used filters to limit results to videos uploaded this week that are 20 mins or longer.
3. Shared each video with “Summarify”, an iPhone app that summarizes a YouTube video given its URL, and downloaded each summary as a PDF on the iPhone.
4. Uploaded the PDFs (20 files) to Notebook LM.
5. Generated the podcast audio in Notebook LM, then downloaded the .wav file.
6. Generated an image using ideogram.ai (the prompt was “YouTube videos of artificial intelligence in higher education”) and downloaded the image.
7. Uploaded the .wav file to an iPhone app (Headliner) to convert the .wav to a waveform, using the image from step 6 as the background for the waveform.

And you have below.


Gave Google NotebookLM the transcript for my Fluxgym video and it created this podcast type discussion of it. Video is audio only. This is wild. 😂


[Quoted tweet]
Do you know what’s even more interesting than OpenAI’s o1 🍓?

A podcast generated directly from the information provided by @openai by NotebookLM from @GoogleAI.

So cool! @OfficialLoganK


It's never been easier to create a faceless channel.

You could use Google's new NotebookLM to create an engaging short-form content channel with minimal effort.

Here is an example where I fed it ONE URL - /r/StableDiffusion


🪄Want to see some AI magic? You can now “record” an engaging, studio quality, 12 min podcast on any topic in under 5 min. Yup, you read that correctly.

Here’s how 👇

1) I used NotebookLM by Google to synthesize a few content sources on scaling a product post MVP.
2) NotebookLM now offers a “Generate Audio” option, which creates an incredibly engaging script and audio that sounds indistinguishable from actual podcast hosts.
3) Upload to Spotify
4) Profit?


Longtime followers may remember that a couple months ago, I was trying to auto-generate a podcast every day based on HN articles.

I got OK results, but you could still tell it was fake. I gave up.

ANYWAY here's what you can do with Google's new NotebookLM. It's so good!

