Study shows ChatGPT can produce medical record notes 10 times faster than doctors without compromising quality

bnew


1/8
@DKThomp
Fascinating paper (and post!) on AI in medicine

- cardiologists working with AI said the machine's diagnosis, triage, and management were equal to or better than those of human cardiologists in most areas

- the paper's summary comes with a generative AI podcast explaining the results

[Quoted tweet]
One of the most promising areas for the application of AI in medicine is scaling specialty expertise. There simply aren't enough specialist doctors to care for everyone in need. We believe AI can help.

As a first step towards that goal, we worked with the amazing Google medical AI team to tune and test their conversational agent AMIE in the setting of Stanford's Center for Inherited Cardiovascular Disease.

Unlike many medical studies of LLMs, we completed our testing not with curated cases or exam questions but real-world medical data presented in exactly the way we receive it in clinic.

Data was in the form of reports derived from multi-modal data sources including medical records, ECGs, stress tests, imaging tests, and genomic data. AMIE was augmented by web search and self-critique capabilities and used chain-of-reasoning strategies fine-tuned on data from just 9 typical patients.
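For the technically curious, the "self-critique" idea can be pictured as a simple generate, critique, revise loop around a language model. The sketch below is purely illustrative and covers only that loop, not AMIE's web search or fine-tuning; the OpenAI client, model name, and prompts are stand-in assumptions, not the study's code.

```python
# Generic generate -> critique -> revise loop, sketched with the OpenAI client.
# This is NOT AMIE; it only illustrates the "self-critique" idea described above.
from openai import OpenAI

client = OpenAI()       # expects OPENAI_API_KEY in the environment
MODEL = "gpt-4o"        # placeholder model name

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

case = "<clinic report text: history, ECG, imaging, genetics>"   # placeholder input

draft = ask(f"Provide diagnosis, triage and management for this case:\n{case}")
critique = ask(f"Critique this assessment for errors or omissions:\n{draft}")
final = ask(
    f"Revise the assessment using the critique.\nAssessment:\n{draft}\nCritique:\n{critique}"
)
print(final)
```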

What did we find?

1. Overall, AMIE responses on diagnosis, triage and management were rated by specialty cardiologists as equivalent to or better than those of general cardiologists across 10 domains.

2. Access to AMIE's responses improved the general cardiologists' responses in almost two thirds of cases.

3. Qualitative data suggested that the AI and human approaches were highly complementary with AMIE judged thorough and sensitive and general cardiologists judged concise and specific.

In conclusion, our data suggest that LLMs such as AMIE could usefully democratize subspecialty medical expertise augmenting general cardiologists' assessments of inherited cardiovascular disease patients.

Paper: arxiv.org/abs/2410.03741
Generative podcast describing the paper (!): shorturl.at/rdKZn
Stanford Center for Inherited Cardiovascular Disease: med.stanford.edu/familyheart
AMIE: arxiv.org/abs/2401.05654

Congrats to @DrJackOSullivan and @taotu831 for leading the charge on this work as well as the @StanfordDeptMed team and the amazing folks @Google led by @alan_karthi and @vivnat


GZbHUyuasAIERRN.png


2/8
@Jill992004231
AI is coming at us very, very quickly.



3/8
@ian_sportsdev
So? Tech is easy, politics is hard đŸ˜©



4/8
@fl_saloni
Cardiologists are the gatekeepers for adoption of this AI, and most (NOT all) will create mistrust in AI to preserve their status and comp. How do you overcome this?



5/8
@Spear_Owl
AI+DR>DR



6/8
@mario_anchor
The question is will specialty caregivers allow this. Their lobbying apparatus has already created an artificial doctor shortage to prop up wages.



7/8
@EscoboomVanilla
Also this!

Wimbledon staff left devastated after decision to break 147-year tradition and put 300 jobs at risk



8/8
@PatrickPatten8
I think medical and education is where AI will have the biggest impact. Hopefully America rethinks what an educated population looks like, cause right now 47% think fascism is a good idea... hopefully we can do better.




 

bnew


96% Accuracy: Harvard Scientists Unveil Revolutionary ChatGPT-Like AI for Cancer Diagnosis​


By Harvard Medical School, October 17, 2024


Identifying Cancer Cells
Scientists at Harvard Medical School have developed a versatile AI model called CHIEF that can diagnose and predict outcomes for multiple cancer types, outperforming existing AI systems. Trained on millions of images, it can detect cancer cells, predict tumor genetic profiles, and forecast patient survival with high accuracy.

A ChatGPT-like model can diagnose cancer, assist in selecting treatment options, and predict survival outcomes across various cancer types.​


Researchers at Harvard Medical School have developed a versatile AI model, similar to ChatGPT, that can perform a wide range of diagnostic tasks across various types of cancer.

The new AI system, described Sept. 4 in Nature, goes a step beyond many current AI approaches to cancer diagnosis, the researchers said.

Current AI systems are typically trained to perform specific tasks — such as detecting cancer presence or predicting a tumor’s genetic profile — and they tend to work only in a handful of cancer types. By contrast, the new model can perform a wide array of tasks and was tested on 19 cancer types, giving it a flexibility like that of large language models such as ChatGPT.

While other foundation AI models for medical diagnosis based on pathology images have emerged recently, this is believed to be the first to predict patient outcomes and validate them across several international patient groups.

“Our ambition was to create a nimble, versatile ChatGPT-like AI platform that can perform a broad range of cancer evaluation tasks,” said study senior author Kun-Hsing Yu, assistant professor of biomedical informatics in the Blavatnik Institute at Harvard Medical School. “Our model turned out to be very useful across multiple tasks related to cancer detection, prognosis, and treatment response across multiple cancers.”

The AI model, which works by reading digital slides of tumor tissues, detects cancer cells and predicts a tumor’s molecular profile based on cellular features seen on the image with superior accuracy to most current AI systems. It can forecast patient survival across multiple cancer types and accurately pinpoint features in the tissue that surrounds a tumor — also known as the tumor microenvironment — that are related to a patient’s response to standard treatments, including surgery, chemotherapy, radiation, and immunotherapy. Finally, the team said, the tool appears capable of generating novel insights — it identified specific tumor characteristics previously not known to be linked to patient survival.

The findings, the research team said, add to growing evidence that AI-powered approaches can enhance clinicians’ ability to evaluate cancers efficiently and accurately, including the identification of patients who might not respond well to standard cancer therapies.

“If validated further and deployed widely, our approach, and approaches similar to ours, could identify early on cancer patients who may benefit from experimental treatments targeting certain molecular variations, a capability that is not uniformly available across the world,” Yu said.

Training and performance​


The team’s latest work builds on Yu’s previous research in AI systems for the evaluation of colon cancer and brain tumors. These earlier studies demonstrated the feasibility of the approach within specific cancer types and specific tasks.

The new model, called CHIEF (Clinical Histopathology Imaging Evaluation Foundation), was trained on 15 million unlabeled images chunked into sections of interest. The tool was then trained further on 60,000 whole-slide images of tissues including lung, breast, prostate, colorectal, stomach, esophageal, kidney, brain, liver, thyroid, pancreatic, cervical, uterine, ovarian, testicular, skin, soft tissue, adrenal gland, and bladder. Training the model to look both at specific sections of an image and the whole image allowed it to relate specific changes in one region to the overall context. This approach, the researchers said, enabled CHIEF to interpret an image more holistically by considering a broader context, instead of just focusing on a particular region.
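To make the tile-plus-whole-slide idea concrete, here is a minimal sketch of attention-based aggregation of tile embeddings into a single slide-level prediction, a common pattern in computational pathology. It is not the CHIEF code; the feature dimension, tile count, and encoder are assumptions for illustration.

```python
# Illustrative sketch only -- not the CHIEF implementation.
# Assumes tiles have already been cut from a whole-slide image and
# encoded into feature vectors by a pretrained tile encoder.
import torch
import torch.nn as nn

class AttentionSlideClassifier(nn.Module):
    """Aggregate tile embeddings into one slide-level prediction
    with a simple attention-weighted average."""

    def __init__(self, feat_dim: int = 768, hidden_dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),          # one attention score per tile
        )
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, tile_feats: torch.Tensor):
        # tile_feats: (n_tiles, feat_dim) for one slide
        scores = self.attn(tile_feats)               # (n_tiles, 1)
        weights = torch.softmax(scores, dim=0)       # relate each tile to the whole slide
        slide_feat = (weights * tile_feats).sum(0)   # (feat_dim,)
        return self.head(slide_feat), weights        # logits + per-tile attention

# Usage with dummy data: 500 tiles, 768-dim features each.
model = AttentionSlideClassifier()
logits, attn = model(torch.randn(500, 768))
print(logits.shape, attn.shape)   # torch.Size([2]) torch.Size([500, 1])
```

The attention weights are what allow the model to relate specific regions to the overall slide context, the property the researchers describe above.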

Following training, the team tested CHIEF’s performance on more than 19,400 whole-slide images from 32 independent datasets collected from 24 hospitals and patient cohorts across the globe.

Overall, CHIEF outperformed other state-of-the-art AI methods by up to 36 percent on the following tasks: cancer cell detection, tumor origin identification, predicting patient outcomes, and identifying the presence of genes and DNA patterns related to treatment response. Because of its versatile training, CHIEF performed equally well no matter how the tumor cells were obtained — whether via biopsy or through surgical excision. And it was just as accurate regardless of the technique used to digitize the cancer cell samples. This adaptability, the researchers said, renders CHIEF usable across different clinical settings and represents an important step beyond current models that tend to perform well only when reading tissues obtained through specific techniques.

Cancer detection​


CHIEF achieved nearly 94 percent accuracy in cancer detection and significantly outperformed current AI approaches across 15 datasets containing 11 cancer types. In five biopsy datasets collected from independent cohorts, CHIEF achieved 96 percent accuracy across multiple cancer types including esophagus, stomach, colon, and prostate. When the researchers tested CHIEF on previously unseen slides from surgically removed tumors of the colon, lung, breast, endometrium, and cervix, the model performed with more than 90 percent accuracy.
 

bnew


Predicting tumors’ molecular profiles​


A tumor’s genetic makeup holds critical clues to determine its future behavior and optimal treatments. To get this information, oncologists order DNA sequencing of tumor samples, but such detailed genomic profiling of cancer tissues is not done routinely nor uniformly across the world due to the cost and time involved in sending samples to specialized DNA sequencing labs. Even in well-resourced regions, the process could take several weeks. It’s a gap that AI could fill, Yu said.

Quickly identifying cellular patterns on an image suggestive of specific genomic aberrations could offer a quick and cost-effective alternative to genomic sequencing, the researchers said.

CHIEF outperformed current AI methods for predicting genomic variations in a tumor by looking at the microscopic slides. This new AI approach successfully identified features associated with several important genes related to cancer growth and suppression, and it predicted key genetic mutations related to how well a tumor might respond to various standard therapies. CHIEF also detected specific DNA patterns related to how well a colon tumor might respond to a form of immunotherapy called immune checkpoint blockade. When looking at whole-tissue images, CHIEF identified mutations in 54 commonly mutated cancer genes with an overall accuracy of more than 70 percent, outperforming the current state-of-the-art AI method for genomic cancer prediction. Its accuracy was greater for specific genes in specific cancer types.
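As a rough picture of what predicting mutations from a slide means computationally, the sketch below treats it as multi-label classification on a slide-level embedding, with one output per gene. The gene list, embedding size, and 0.5 cutoff are invented for illustration and are not details from the paper.

```python
# Hypothetical multi-label head: one sigmoid output per gene.
import torch
import torch.nn as nn

GENES = ["TP53", "EGFR", "BRAF", "KRAS"]   # stand-ins for the 54 genes in the study

head = nn.Linear(768, len(GENES))          # slide embedding -> per-gene logits
slide_embedding = torch.randn(768)         # would come from the slide-level model

probs = torch.sigmoid(head(slide_embedding))
calls = {g: bool(p > 0.5) for g, p in zip(GENES, probs)}  # 0.5 is an arbitrary cutoff
print(calls)
```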

The team also tested CHIEF on its ability to predict mutations linked with response to FDA-approved targeted therapies across 18 genes spanning 15 anatomic sites. CHIEF attained high accuracy in multiple cancer types, including 96 percent in detecting a mutation in a gene called EZH2 common in a blood cancer called diffuse large B-cell lymphoma. It achieved 89 percent for BRAF gene mutation in thyroid cancer, and 91 percent for NTRK1 gene mutation in head and neck cancers.

Predicting patient survival​


CHIEF successfully predicted patient survival based on tumor histopathology images obtained at the time of initial diagnosis. In all cancer types and all patient groups under study, CHIEF distinguished patients with longer-term survival from those with shorter-term survival. CHIEF outperformed other models by 8 percent. And in patients with more advanced cancers, CHIEF outperformed other AI models by 10 percent. In all, CHIEF’s ability to predict high versus low death risk was tested and confirmed across patient samples from 17 different institutions.
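For readers wondering how distinguishing longer-term from shorter-term survivors is typically quantified, here is a hedged sketch that splits patients at the median of a model's predicted risk and compares the two groups with a log-rank test using the lifelines library. The data are synthetic; the paper's actual statistical methods may differ.

```python
# Illustrative risk stratification with synthetic data (not study data).
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
risk = rng.uniform(size=200)                        # model-predicted risk per patient
time = rng.exponential(scale=1000 * (1.5 - risk))   # synthetic survival times (days)
event = rng.uniform(size=200) < 0.7                 # True if death was observed

high = risk > np.median(risk)                       # split at the median risk score
result = logrank_test(time[high], time[~high],
                      event_observed_A=event[high],
                      event_observed_B=event[~high])
print(f"log-rank p-value: {result.p_value:.4f}")    # small p => groups separate well
```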

Extracting novel insights about tumor behavior​


The model identified tell-tale patterns on images related to tumor aggressiveness and patient survival. To visualize these areas of interest, CHIEF generated heat maps on an image. When human pathologists analyzed these AI-derived hot spots, they saw intriguing signals reflecting interactions between cancer cells and surrounding tissues. One such feature was the presence of greater numbers of immune cells in areas of the tumor in longer-term survivors, compared with shorter-term survivors. That finding, Yu noted, makes sense because a greater presence of immune cells may indicate the immune system has been activated to attack the tumor.
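The heat maps described here are usually per-tile attention or prediction scores painted back onto the slide's tile grid. A minimal, hypothetical overlay sketch (grid size and scores are made up):

```python
# Hypothetical heat map: paint per-tile attention back onto the slide grid.
import numpy as np
import matplotlib.pyplot as plt

grid_h, grid_w = 40, 60                       # slide divided into 40 x 60 tiles (assumed)
attention = np.random.rand(grid_h, grid_w)    # would be the model's per-tile weights

plt.imshow(attention, cmap="inferno")         # hot regions = areas the model focused on
plt.colorbar(label="tile attention")
plt.title("Illustrative tumor-region heat map")
plt.savefig("heatmap.png")
```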

When looking at the tumors of shorter-term survivors, CHIEF identified regions of interest marked by the abnormal size ratios between various cell components, more atypical features on the nuclei of cells, weak connections between cells, and less presence of connective tissue in the area surrounding the tumor. These tumors also had a greater presence of dying cells around them. For example, in breast tumors, CHIEF pinpointed as an area of interest the presence of necrosis — or cell death — inside the tissues. On the flip side, breast cancers with higher survival rates were more likely to have preserved cellular architecture resembling healthy tissues. The visual features and zones of interest related to survival varied by cancer type, the team noted.

Next steps​


The researchers said they plan to refine CHIEF’s performance and augment its capabilities by:

  • Conducting additional training on images of tissues from rare diseases and non-cancerous conditions
  • Including samples from pre-malignant tissues before cells become fully cancerous
  • Exposing the model to more molecular data to enhance its ability to identify cancers with different levels of aggressiveness
  • Training the model to also predict the benefits and adverse effects of novel cancer treatments in addition to standard treatments

Reference: “A pathology foundation model for cancer diagnosis and prognosis prediction” by Xiyue Wang, Junhan Zhao, Eliana Marostica, Wei Yuan, Jietian Jin, Jiayu Zhang, Ruijiang Li, Hongping Tang, Kanran Wang, Yu Li, Fang Wang, Yulong Peng, Junyou Zhu, Jing Zhang, Christopher R. Jackson, Jun Zhang, Deborah Dillon, Nancy U. Lin, Lynette Sholl, Thomas Denize, David Meredith, Keith L. Ligon, Sabina Signoretti, Shuji Ogino, Jeffrey A. Golden, MacLean P. Nasrallah, Xiao Han, Sen Yang and Kun-Hsing Yu, 4 September 2024, Nature.
DOI: 10.1038/s41586-024-07894-z

The work was in part supported by the National Institute of General Medical Sciences grant R35GM142879, the Department of Defense Peer Reviewed Cancer Research Program Career Development Award HT9425-23-1-0523, the Google Research Scholar Award, the Harvard Medical School Dean’s Innovation Award, and the Blavatnik Center for Computational Biomedicine Award.

Yu is an inventor of U.S. patent 16/179,101 assigned to Harvard University and served as a consultant for Takeda, Curatio DL, and the Postgraduate Institute for Medicine. Jun Zhang and Han were employees of Tencent AI Lab.
 

The_Truth

On the bright side, this could put an end to racial bias in the healthcare field.
 

bnew



A.I. Chatbots Defeated Doctors at Diagnosing Illness​


A small study found ChatGPT outdid human physicians when assessing medical case histories, even when those doctors were using a chatbot.


A view from a hallway into an exam room of a health care center.


In an experiment, doctors who were given ChatGPT to diagnose illness did only slightly better than doctors who did not. But the chatbot alone outperformed all the doctors. Credit: Michelle Gustafson for The New York Times

By Gina Kolata

Nov. 17, 2024

Dr. Adam Rodman, an expert in internal medicine at Beth Israel Deaconess Medical Center in Boston, confidently expected that chatbots built to use artificial intelligence would help doctors diagnose illnesses.

He was wrong.

Instead, in a study Dr. Rodman helped design, doctors who were given ChatGPT-4 along with conventional resources did only slightly better than doctors who did not have access to the bot. And, to the researchers’ surprise, ChatGPT alone outperformed the doctors.

“I was shocked,” Dr. Rodman said.

The chatbot, from the company OpenAI, scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.

The study showed more than just the chatbot’s superior performance.

It unveiled doctors’ sometimes unwavering belief in a diagnosis they had made, even when a chatbot suggested a potentially better one.

And the study illustrated that while doctors are being exposed to the tools of artificial intelligence for their work, few know how to exploit the abilities of chatbots. As a result, they failed to take advantage of A.I. systems’ ability to solve complex diagnostic problems and offer explanations for their diagnoses.

A.I. systems should be “doctor extenders,” Dr. Rodman said, offering valuable second opinions on diagnoses.

But it looks as if there is a way to go before that potential is realized.

Case History, Case Future​


The experiment involved 50 doctors, a mix of residents and attending physicians recruited through a few large American hospital systems, and was published last month in the journal JAMA Network Open.

The test subjects were given six case histories and were graded on their ability to suggest diagnoses and explain why they favored or ruled them out. Their grades also included getting the final diagnosis right.

The graders were medical experts who saw only the participants’ answers, without knowing whether they were from a doctor with ChatGPT, a doctor without it or from ChatGPT by itself.

The case histories used in the study were based on real patients and are part of a set of 105 cases that has been used by researchers since the 1990s. The cases intentionally have never been published so that medical students and others could be tested on them without any foreknowledge. That also meant that ChatGPT could not have been trained on them.

But, to illustrate what the study involved, the investigators published one of the six cases the doctors were tested on, along with answers to the test questions on that case from a doctor who scored high and from one whose score was low.

That test case involved a 76-year-old patient with severe pain in his low back, buttocks and calves when he walked. The pain started a few days after he had been treated with balloon angioplasty to widen a coronary artery. He had been treated with the blood thinner heparin for 48 hours after the procedure.

The man complained that he felt feverish and tired. His cardiologist had done lab studies that indicated a new onset of anemia and a buildup of nitrogen and other kidney waste products in his blood. The man had had bypass surgery for heart disease a decade earlier.

The case vignette continued to include details of the man’s physical exam, and then provided his lab test results.

The correct diagnosis was cholesterol embolism — a condition in which shards of cholesterol break off from plaque in arteries and block blood vessels.

Participants were asked for three possible diagnoses, with supporting evidence for each. They also were asked to provide, for each possible diagnosis, findings that do not support it or that were expected but not present.

The participants also were asked to provide a final diagnosis. Then they were to name up to three additional steps they would take in their diagnostic process.
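For readers curious how a case like this might be posed to a chatbot programmatically, here is a hedged sketch using the OpenAI Python client, with a prompt that mirrors the question structure described above (three differentials with supporting and opposing findings, a final diagnosis, next steps). The model name and wording are assumptions, not the study's protocol.

```python
# Hypothetical prompt mirroring the study's question structure -- not the authors' code.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

case_text = "76-year-old man with low back, buttock and calf pain after angioplasty ..."  # truncated

prompt = (
    "Read the following case history.\n"
    "1. Give three possible diagnoses, each with supporting findings.\n"
    "2. For each, list findings that argue against it or were expected but absent.\n"
    "3. State your final diagnosis.\n"
    "4. Name up to three next diagnostic steps.\n\n"
    f"Case: {case_text}"
)

response = client.chat.completions.create(
    model="gpt-4o",            # placeholder; the study used a GPT-4-class model
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```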

Like the diagnosis for the published case, the diagnoses for the other five cases in the study were not easy to figure out. But neither were they so rare as to be almost unheard-of. Yet the doctors on average did worse than the chatbot.

What, the researchers asked, was going on?

The answer seems to hinge on questions of how doctors settle on a diagnosis, and how they use a tool like artificial intelligence.

The Physician in the Machine​


How, then, do doctors diagnose patients?

The problem, said Dr. Andrew Lea, a historian of medicine at Brigham and Women’s Hospital who was not involved with the study, is that “we really don’t know how doctors think.”

In describing how they came up with a diagnosis, doctors would say, “intuition,” or, “based on my experience,” Dr. Lea said.

That sort of vagueness has challenged researchers for decades as they tried to make computer programs that can think like a doctor.

The quest began almost 70 years ago.

“Ever since there were computers, there were people trying to use them to make diagnoses,” Dr. Lea said.

One of the most ambitious attempts began in the 1970s at the University of Pittsburgh. Computer scientists there recruited Dr. Jack Myers, chairman of the medical school’s department of internal medicine, who was known as a master diagnostician. He had a photographic memory and spent 20 hours a week in the medical library, trying to learn everything that was known in medicine.

Dr. Myers was given medical details of cases and explained his reasoning as he pondered diagnoses. Computer scientists converted his logic chains into code. The resulting program, called INTERNIST-1, included over 500 diseases and about 3,500 symptoms of disease.
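To give a flavor of what encoding a diagnostician's "logic chains" looked like, here is a toy scoring scheme in the spirit of such rule-based systems: each disease carries weighted symptom associations, and the diseases whose weights best cover the observed findings rank highest. The diseases, findings, and weights are invented for illustration and are not from INTERNIST-1.

```python
# Toy rule-based scorer in the spirit of 1970s expert systems (weights invented).
KNOWLEDGE_BASE = {
    "cholesterol embolism": {"livedo reticularis": 3, "renal failure": 2, "recent angioplasty": 3},
    "heparin-induced thrombocytopenia": {"low platelets": 3, "recent heparin": 3},
    "vasculitis": {"fever": 1, "renal failure": 2, "rash": 2},
}

def rank_diagnoses(findings):
    """Score each disease by the summed weights of the findings it explains."""
    scores = {
        disease: sum(w for f, w in profile.items() if f in findings)
        for disease, profile in KNOWLEDGE_BASE.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_diagnoses({"recent angioplasty", "renal failure", "fever"}))
# [('cholesterol embolism', 5), ('vasculitis', 3), ('heparin-induced thrombocytopenia', 0)]
```

Real systems like INTERNIST-1 were far larger and more nuanced, which is part of why entering a case took more than an hour.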

To test it, researchers gave it cases from the New England Journal of Medicine. “The computer did really well,” Dr. Rodman said. Its performance “was probably better than a human could do,” he added.

But INTERNIST-1 never took off. It was difficult to use, requiring more than an hour to give it the information needed to make a diagnosis. And, its creators noted, “the present form of the program is not sufficiently reliable for clinical applications.”

Research continued. By the mid-1990s there were about a half dozen computer programs that tried to make medical diagnoses. None came into widespread use.

“It’s not just that it has to be user friendly, but doctors had to trust it,” Dr. Rodman said.

And with the uncertainty about how doctors think, experts began to ask whether they should care. How important is it to try to design computer programs to make diagnoses the same way humans do?

“There were arguments over how much a computer program should mimic human reasoning,” Dr. Lea said. “Why don’t we play to the strength of the computer?”

The computer may not be able to give a clear explanation of its decision pathway, but does that matter if it gets the diagnosis right?

The conversation changed with the advent of large language models like ChatGPT. They make no explicit attempt to replicate a doctor’s thinking; their diagnostic abilities come from their ability to predict language.

“The chat interface is the killer app,” said Dr. Jonathan H. Chen, a physician and computer scientist at Stanford who was an author of the new study.

“We can pop a whole case into the computer,” he said. “Before a couple of years ago, computers did not understand language.”

But many doctors may not be exploiting its potential.

Operator Error​


After his initial shock at the results of the new study, Dr. Rodman decided to probe a little deeper into the data and look at the actual logs of messages between the doctors and ChatGPT. The doctors must have seen the chatbot’s diagnoses and reasoning, so why didn’t those using the chatbot do better?

It turns out that the doctors often were not persuaded by the chatbot when it pointed out something that was at odds with their diagnoses. Instead, they tended to be wedded to their own idea of the correct diagnosis.

“They didn’t listen to A.I. when A.I. told them things they didn’t agree with,” Dr. Rodman said.

That makes sense, said Laura Zwaan, who studies clinical reasoning and diagnostic error at Erasmus Medical Center in Rotterdam and was not involved in the study.

“People generally are overconfident when they think they are right,” she said.

But there was another issue: Many of the doctors did not know how to use a chatbot to its fullest extent.

Dr. Chen said he noticed that when he peered into the doctors’ chat logs, “they were treating it like a search engine for directed questions: ‘Is cirrhosis a risk factor for cancer? What are possible diagnoses for eye pain?’”

“It was only a fraction of the doctors who realized they could literally copy-paste in the entire case history into the chatbot and just ask it to give a comprehensive answer to the entire question,” Dr. Chen added.

“Only a fraction of doctors actually saw the surprisingly smart and comprehensive answers the chatbot was capable of producing.”
 

bnew


1/50
@deedydas
o1-preview is far superior to doctors on reasoning tasks and it's not even close, according to OpenAI's latest paper.

AI does ~80% vs ~30% on the 143 hard NEJM CPC diagnoses.

It's dangerous now to trust your doctor and NOT consult an AI model.

Here are some actual tasks:

1/5



GfAyOWtaIAAjZaY.jpg


2/50
@deedydas
Here's an example case looking at phosphate wasting and elevated FGF23, then proceeding to imaging to localize a potential tumor.

o1-preview's suggested testing plan takes a broader, more methodical approach, systematically ruling out other causes of hypophosphatemia.

2/5



GfAyPWNbQAAtIrh.jpg


3/50
@deedydas
For persistent, unexplained hyperammonemia, o1-preview recommends a prioritized expansion of tests—from basic immunoglobulins and electrolytes to advanced imaging, breath tests for SIBO and specialized GI biopsies—ensuring more common causes are checked first.

3/5



GfAyQT5acAAiV0k.jpg


4/50
@deedydas
I have all the respect in the world for doctors, but in many cases their job is basic reasoning over a tremendously large domain-specific knowledge base.

Fortunately, this is exactly what LLMs are really good for.

This means more high quality healthcare for everyone.

4/5



5/50
@deedydas
Source: https://arxiv.org/pdf/2412.10849

5/5



6/50
@JianWDong
I'm very pro-AI in medicine but I question the practicality of the plan proposed in the first example.

As a nuclear radiologist I'm going to opine only on the radiological portion of the o1-preview's recommendations:

It sucks. The strategy amounts to "order everything" without regards to price, role, order, sensitivity/specificity of the imaging modality.

1) Gallium-68-DOTATATE PET/CT: Agree it's useful and necessary. Dotatate can localize the mesenchymal tumor that's causing TIO.

2) FDG PET/CT: Unnecessary as it is inferior to Dotatate.

3) Whole-body MRI: MRI is helpful after a tumor has been localized by Dotatate, for surgical planning and anatomical assessment. I'd NOT blindly do a whole-body MRI... you'd bankrupt the patient/increase insurance premiums for everyone, and not add much value.

4) Bone scan: Can be helpful to localize areas of bone turnover, but is nonspecific, and probably unnecessary as the CT portion of the DOTATATE scan usually provides most of the information one sees on a bone scan anyway.



7/50
@deedydas
This is the most pointed and clear domain-specific feedback I've heard on the topic. Thank you for sharing!

Do you see this get resolved with time or no?



8/50
@pmddomingos
We’ve had AI diagnostic systems that are better than doctors and more reliable than o1-preview for decades. Cc: @erichorvitz



9/50
@deedydas
Can you share any studies quantifying that? Would love to read



10/50
@aidan_mclau
shame they didn’t test newsonnet



11/50
@deedydas
Yeah me too, but as you know, I don't think it's apples to apples unless Sonnet does test time compute.

Just one side effect is consistency: the error bounds for o1 were far tighter than GPT-4's in the study.



12/50
@Tom62589172
Why is the AI recommendation of tests better than what the real doctors recommend? Is it simply because it is formatted better?



13/50
@deedydas
No, they were rated by two internal medicine physicians.

Here's the full methodology



GfA2pe3acAAqUZR.jpg


14/50
@threadreaderapp
Your thread is everybody's favorite! /search?q=#TopUnroll Thread by @deedydas on Thread Reader App đŸ™đŸŒ@medmastery for đŸ„‡unroll



15/50
@MUSTAFAELSHISH1
I have never seen a trader as perfect as @Patrick_Cryptos he knows how to analyze trades and with his SIGNALs and strategies he generated a lot of profits for me. I made over $80k by participating on his pump program. follow him @Patrick_Cryptos



16/50
@brent_ericson
Most people, as in 99.9% of the population, do not have these rare conditions and using AI would only complicate what would otherwise be a simple matter.



17/50
@rajatthawani
This is obviously intriguing! But the bias is that it diagnosed the rare 1% diagnosis. For the common 90%, it’ll ask for enough tests to raise the healthcare expenditure, one of the goals for AI in medicine.

The gotcha ‘dangerous to trust your doctor’ is good for headlines, but in reality, it’ll be counterintuitive.



18/50
@FlowAltDelete
GPs will be one of the first jobs fully taken over by AI.



19/50
@castillobuiles
How often does your doctor hallucinate?



20/50
@jenboland
I read somewhere about how long it took doctors to acquire new knowledge; it was years after discovery. Now, maybe we can get EHRs to be personal health knowledge collectors instead of billing tools, get more time for examination, and ultimately better outcomes for less.



21/50
@alexmd2
Patient S. C. 57 yo “passed out” at local HD center according to EMS. Can’t remember what meds he is taking. Fluctuating level of alertness in the ED. Lip smacking. Serum drug screen caloric level 43.

ChatGPT to manage: ???

Me: who is your renal guy? While ordering Valproate loading and observing surgical intern fixing his lip.

Pt: Yes.

Me: calling three local nephrologist offices to see if they know him. Luck strikes on the second one and I get his med list (DOB was wrong).

Confirmed with NY state PMP for controlled substances for Suboxone and Xanax doses.

Orders in, HP done sign out done.

Medicare: pt didn’t meet admission criteria downgrade to OSV.



22/50
@DrSiyabMD
Bro real life medicine is not a nicely written NEJM clinical scenario.



23/50
@KuittinenPetri
I am pretty sure medical professionals and lawmakers will put a lot of hurdles in the way of adopting AI for diagnoses or suggesting medication in pretty much every Western country. They want to keep the status quo, which means expensive medical care, long queues, but fat paychecks.

At the same time, this could lead to a health care revolution in less regulated developing countries. A doctor's diagnosis could cost much less than $1, yet be more accurate and in-depth than what you can get from most human doctors.



24/50
@castillobuiles
you don’t need a doctor to answer these questions. You need them to solve new problems. LLMs can’t.



25/50
@bioprotocol
Exciting how this is only the beginning.

Ultimately, AI will evolve medical databases, find unexpected connections between symptoms and diseases, tailor treatments based on individual biological changes, and so much more.



26/50
@davidorban
This is what @PeterDiamandis often talks about: a doctor that doesn't use AI complementing their diagnosis is not to be trusted anymore.



27/50
@MortenPunnerud
How does o1 (not preview) compare?



28/50
@devusnullus
NEJM CPC is almost certainly in the training data.



29/50
@alexmd2
How sure are we that these cases were not in the training set?



30/50
@daniel_mac8
there's just no way that a human being can consume the amount of data that's relevant in the medical / biological field and produce an accurate output based on that data

it's not how our minds work

that we now have AI systems that can actually do this is a MASSIVE boon



31/50
@phanishchandra
The doctor's most important job is to ask the right questions of the patient and prepare a case history. Once that is done, it is following a decision-tree-based differential diagnosis and treatment based on guidelines. You will have to consult a doctor first to get your case history



32/50
@Matthew93064408
Do any doctors want to verify if the benchmarks have been cherrypicked?



33/50
@ChaddertonRyan
Specialized AI assistants need to be REQUIRED for medical professionals, especially in the US from my experience.
It's not normal to expect doctors to know as much as the NIH (NCBI) database and the Mayo references combined.



34/50
@giles_home
When an AI can do this without a clinical vignette, building its own questions to find the right answer, and taking all the nonverbal cues into account for safeguarding concerns, in a ten-minute appointment, then you can make statements about safety. Until then I'll respectfully disagree.



35/50
@squirtBeetle
We all know that you feed the NEJM questions into the data sets and the models don’t work if you change literally any word or number in the question



36/50
@Pascallisch
Very interesting, not surprising



37/50
@drewdennison
Yep, a family member recently took the medical boards and o1 answered every practice question we fed it perfectly, and, most importantly for studying, explained the reasoning and cited sources



38/50
@AAkerstein
The practice of medicine will become unbundled. Ask: what will the role of a doctor be when knowledge becomes commoditized?



39/50
@aryanagxl
AI diagnoses are far better on average than human doctors

LLMs will disrupt the space. We just need to give them permission to do diagnoses.

However, doctors will remain important to do the actual medical procedures. We are far behind in accuracy in that case.



40/50
@TiggerSharkML
👀



41/50
@elchiapp
Saw this first hand. ChatGPT made the correct diagnosis after FOUR cardiologists in 2 countries misdiagnosed me.



42/50
@DanielSMatthews
Working out what the patient isn't telling you is half the skill of a medical interview in general medicine.

Explain to me how this technology changes that?



43/50
@DrDatta_AIIMS
Deedy, I am a doctor and I CONSULT an AI model. Will you trust me?



44/50
@hammadtariq
“It's dangerous now to trust your doctor and NOT consult an AI model” - totally agreed! First consult AI, learn everything, then go to the doctor and frame what you already know; you will get a 100% correct diagnosis (don't tell him that you already know!). (It came out as sarcastic but it's actually not; this is what I have been doing for a year now, getting incrementally better results.)



45/50
@ironmark1993
This is insane!
o1 kills it in fields like this.



46/50
@medmastery
@threadreaderapp unroll



47/50
@zoewangai
Breakdown of the paper:

The study evaluates the medical reasoning abilities of a large language model called o1-preview, developed by OpenAI, across various tasks in clinical medicine. The performance of o1-preview is compared to human physicians and previous language models like GPT-4.

The study found that o1-preview consistently outperformed both human physicians and previous language models like GPT-4 in generating differential diagnoses and presenting diagnostic reasoning. However, it performed similarly to GPT-4 in probabilistic reasoning tasks. On management reasoning cases, o1-preview scored significantly higher than both GPT-4 and human physicians.

The results suggest that o1-preview excels at higher-order tasks requiring critical thinking, such as diagnosis and management, while performing less well on tasks that require more abstract reasoning, like probabilistic reasoning. The rapid improvement in language models' medical reasoning abilities has significant implications for clinical practice, but the researchers highlight the need for robust monitoring and integration frameworks to ensure the safe and effective use of these tools.

full paper: Superhuman performance of a large language model on the reasoning tasks of a physician



GfA8OWdaQAAyzPl.jpg


48/50
@postmindfukk
I wonder if there will be a wave of court cases where people can easily detect malpractice in past procedures



49/50
@TheMinuend
Since the models also have, or can be retrofitted with, inherent bias, can they holistically outperform the doc?



50/50
@castillobuiles
o1's planning score is still much lower than the average human's. Not doctors'.




 