AI Chatbot Developed by Google Achieves Success on Medical Exam
According to a peer-reviewed study released on Wednesday, Google’s medical chatbot, which utilizes artificial intelligence, has successfully passed a challenging US medical licensing exam. However, the study also revealed that the chatbot’s responses still do not match the quality of those provided by human doctors.
Released last year, ChatGPT – whose developer OpenAI is backed by Google’s rival Microsoft – kicked off the competition between the tech giants in the growing artificial intelligence field.
While there has been much talk about the future possibilities – and dangers – of artificial intelligence, health is one area where the technology has already shown tangible progress with algorithms that could read certain medical scans as well as people.
Google first introduced Med-PaLM, an AI tool for answering medical questions, in a preprint study in December. Unlike ChatGPT, it has not been released to the public.
The US tech giant says Med-PaLM is the first large language model, a type of artificial intelligence trained on vast volumes of human-generated text, to pass the US Medical Licensing Examination (USMLE).
The pass mark for medical students and doctors in training in the United States is around 60 percent.
A study published in February concluded that ChatGPT had achieved results at or near the passing threshold.
In a peer-reviewed study published Wednesday in the journal Nature, Google researchers said Med-PaLM had scored 67.6 percent on USMLE-style multiple-choice questions.
“Med-PaLM performs encouragingly, but is still inferior to clinicians,” the study concluded.
To identify and reduce "hallucinations", instances where an AI model presents false information as fact, Google said it had developed a new evaluation benchmark.
Karan Singhal, a Google researcher and lead author of the new study, told AFP that the team had used the benchmark to test a newer version of their model, with "very exciting" results.
Med-PaLM 2 scored 86.5 percent on USMLE-style questions, beating the previous version by nearly 19 percentage points, according to a non-peer-reviewed preprint study published in May.
– “The Elephant in the Room” –
James Davenport, a computer scientist at the University of Bath in Britain who was not involved in the research, said there is "an elephant in the room" for these AI-powered medical chatbots.
"There is a big difference between answering 'medical questions' and real medicine, which involves diagnosing and treating genuine health problems," he said.
Anthony Cohn, an artificial intelligence expert at the University of Leeds in the UK, said hallucinations would likely always be a problem in such large language models because of their statistical nature.
Therefore, these models “should always be viewed as facilitators rather than final decision-makers,” Cohn said.
In the future, Singhal said, Med-PaLM could be used to support doctors by offering options they might not otherwise have considered.
The Wall Street Journal reported earlier this week that Med-PaLM 2 has been in testing at the prestigious US Mayo Clinic research hospital since April.
Singhal said he could not talk about specific partnerships.
But he stressed that any testing would not be “clinical or patient-directed and could not cause harm to patients”.
It would instead be “more administrative tasks that can be relatively easily automated with small inputs,” he added.