• medgremlin@midwest.social · 3 days ago

    A bunch of the “citations” ChatGPT uses are outright hallucinations. Unless you independently verify every word of the output, it cannot be trusted for anything even remotely important. I’m a medical student and some of my classmates use ChatGPT to summarize things and it spits out confabulations that are objectively and provably wrong.

    • ByteJunk@lemmy.world · 3 days ago

      True.

      But doctors also screw up diagnoses, medications, procedures. I mean, being human and all that.

      I think it’s a given that AI outperforms on medical exams, be they multiple-choice or open-ended/reasoning questions.

      There’s also a growing body of literature with scenarios where AI produces more accurate diagnoses than physicians, especially in scenarios involving image/pattern recognition, but even plain GPT was doing a good job with clinical histories, getting the correct diagnosis as its #1 DDx, and doing even better when given lab panels.

      Another trial found that patients who received email replies to their follow-up queries from either AI or physicians rated the AI as much more empathetic; it wasn’t even close.

      Sure, the AI has flaws. But the writing is on the wall…

        • medgremlin@midwest.social · 3 days ago

        The AI passed the multiple-choice board exam, but the specialty board exam you’re required to pass to practice independently also includes oral boards. When given the prep materials for the pediatric boards, the AI got 80% wrong, and 60% of its diagnoses weren’t even in the correct organ system.

        The AI doing pattern recognition works on things like reading mammograms to detect breast cancer, but AI doesn’t know how to interview a patient to get the history in the first place. AI (or, more accurately, LLMs) can’t do the critical thinking it takes to know which questions to ask, or to determine which labs and imaging studies to order so it would have data it could make sense of. Unless you want a world where every patient gets the literal million-dollar workup for every complaint, entrusting diagnosis to these idiot machines is worse than useless.

          • ByteJunk@lemmy.world · 2 days ago

          Could you provide references? I’m genuinely interested, and what I found seems to say differently:

          Overall, GPT-4 passed the board residency examination in four of five specialties, revealing a median score higher than the official passing score of 65%.

          AI NEJM

          Also, I believe you’re seriously underestimating the abilities of present-day LLMs. They are able to ask relevant follow-up questions, interpret that information to request additional studies, and achieve an accurate diagnosis.

          See this study specifically on conversational diagnostic AIs. It has some important limitations, chiefly from having to work around the text interface, which is not ideal, but it otherwise achieved really interesting results.

          Call them “idiot machines” all you want, but I feel this is going down the same path as full self-driving cars: eventually they’ll be making fewer errors than humans, and that will save lives.

            • medgremlin@midwest.social · 2 days ago (edited)

            My mistake, I recalled incorrectly. It got 83% wrong. https://arstechnica.com/science/2024/01/dont-use-chatgpt-to-diagnose-your-kids-illness-study-finds-83-error-rate/

            The chat interface is stupid in so many ways, and I would hate using text to talk to a patient myself. There are so many non-verbal aspects of communication that are hard to teach to humans and would be impossible to teach to an AI. If you know how to work with people, you can pick up on intonation and body language that indicate the patient didn’t actually understand the question and you need to rephrase it, or that there’s something they’re uncomfortable saying or asking. Or indications that they might be lying about things like sexual activity or substance use.

            And that’s not even getting into the fact that AIs can’t do a physical exam, which may reveal things the interview did not. It also ignores patients who can’t tell you what’s wrong because they’re babies, have an altered mental status, or are unconscious. There are so many situations where an LLM is just completely fucking useless in the diagnostic process, and even more once you start talking about treatments that aren’t pills.

            Also, the exams are only one part of your evaluation in medical training. As a medical student and as a resident, your performance and interactions are constantly evaluated to ensure you are actually competent as a physician before you’re allowed to see patients without a supervising attending physician. For example, there was a student at my school who had almost perfect grades and passed the first board exam easily, but once he was in the room with real patients and interacting with the other medical staff, it became blatantly apparent that he had no business being in the medical field at all. He said and did things that were wildly inappropriate and was summarily expelled. If becoming a doctor were just a matter of passing the boards, he would have gotten through and likely would have been an actual danger to patients. Medicine is as much an art as a science, and the only way to test the art portion is through supervised practice until the trainee can operate independently.

              • ByteJunk@lemmy.world · 2 days ago

              From the article referenced in your news source:

              JAMA Pediatrics and the NEJM were accessed for pediatric case challenges (N = 100). The text from each case was pasted into ChatGPT version 3.5 with the prompt “List a differential diagnosis and a final diagnosis.”

              A couple of key points:

              • These are case challenges, which are usually meant to be hard. I could find no comparison to actual physician results in the article, which would have been nice.
              • More importantly, however: it was conducted in June 2023 and used GPT-3.5. GPT-4 improved substantially upon it, especially for complex scientific problems, and this shows in the newer studies that have used it.

              I don’t think anyone’s advocating that an AI will replace doctors, much as it won’t replace white-collar jobs either.

              But if it helps achieve better outcomes for the patients, like the current research seems to indicate, aren’t you sworn to consider it in your practice?

                • medgremlin@midwest.social · 2 days ago

                Part of the reason for my deep suspicion of AI is that most of my medical experience is in Emergency Medicine, which is also my intended specialty upon graduation. The only thing AI might be useful for there is functioning as a scribe. The AI is not going to tell me that the patient who denies any alcohol consumption smells like a liquor store, or that the completely unconscious patient has asterixis and flapping tremors. AI cannot tell me anything useful for my most critical patients, and for the less critical ones, I am perfectly capable of pulling up UpToDate or DynaMed and finding what I’m looking for myself. Maybe it can be useful for suggesting next steps, but for the initial evaluation? Nah. I don’t trust a glorified text predictor to catch the things that will kill my patients in the next five minutes.