Artificial intelligence chatbots such as Google’s newest entrant, Med-PaLM 2, promise to revolutionize medical diagnosis. According to Google, Med-PaLM 2 is the first language model to achieve expert-level performance on U.S. medical licensing exam-style questions. But several challenges still need to be addressed before AI-based medical assistants can be introduced for widespread use, including accuracy, accountability, and data quality and bias.
Doctors have long been accustomed to seeing patients who have first investigated their symptoms and possible treatment options on the internet—a practice they have also long tried (and failed) to discourage. For many years, the so-called “Dr. Google” has had a reputation for a lack of context and serious errors of judgment.
But thanks to advances in AI, this may all be about to change. In recent months, more and more doctors have started to see patients who are using a new, far more powerful tool for self-diagnosis: artificial intelligence chatbots such as OpenAI’s ChatGPT, the latest version of Microsoft’s search engine Bing (which is based on OpenAI’s software) and Google’s Med-PaLM.
Trained on text across the web, these large language models (LLMs) predict the next word in a sequence to answer questions in a style approaching that of a human. As such training continues, these systems are becoming increasingly sophisticated and reliable.
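The next-word idea can be sketched with a toy bigram model—a vast simplification of how Med-PaLM and other LLMs actually work (they use neural networks over token probabilities, not raw counts), but it illustrates the core mechanic of predicting a continuation from patterns in training text. The corpus and word choices below are invented for illustration:

```python
from collections import Counter, defaultdict

# Tiny invented "training corpus" for illustration only.
corpus = (
    "the patient has a fever . "
    "the patient has a cough . "
    "the doctor sees the patient ."
).split()

# Count how often each word is followed by each other word.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict_next(word):
    """Return the continuation seen most often after `word` in training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("patient"))  # "has" — it follows "patient" twice, "." only once
```

A real LLM replaces these counts with a learned probability distribution over a huge vocabulary, conditioned on the entire preceding sequence rather than a single word—which is what lets it produce fluent, human-like answers.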
Faced with a shortage of healthcare personnel, researchers and medical professionals are hoping that AI systems can make a critical difference by helping to answer health-related questions from the general population. Initial tests suggest these bots are far more accurate than a Google search.
In fact, many experts predict that within the year, a major medical center will announce a collaboration using LLM chatbots to interact with patients and diagnose diseases.
Thorbjorg Petursdottir, a consultant software engineer with UK-based medical device design and development consultancy Team Consulting, explained:
“The main benefit of AI-based systems is their ability to access a huge amount of data in a short space of time. These systems could reduce the burden on healthcare services by providing people with round-the-clock medical information and improving accessibility to medical advice for people living in remote areas.
They could also help healthcare professionals with clinical diagnosis, patient monitoring and care management, and assist with mundane tasks such as summarizing patient-doctor interactions and patient data retrieval.”
Med-PaLM 2, the Newest Entrant
The newest AI-based medical chatbot to be unveiled is Google’s Med-PaLM 2, an iteration of Med-PaLM that Google Cloud began rolling out to customers for a “limited test” in April. The objective, the company says, is to explore safe, responsible and meaningful use scenarios.
According to Google, Med-PaLM 2 could “facilitate rich, informative discussions, answer complex medical questions, and find insights in complicated and unstructured medical texts.” It can also generate short and long answers to medical questions and create summaries from internal documentation and data sets, as well as from scientific sources.
Google reports that Med-PaLM 2 is the first language model to achieve expert-level performance on U.S. Medical Licensing Examination (USMLE)-style questions, with more than 85 percent accuracy. On the MedMCQA dataset, which includes questions from India’s AIIMS and NEET medical exams, it achieved a “pass rate” of 72.3 percent.
Barriers to Overcome
With development in generative AI moving incredibly fast, a growing critical consensus warns of the dangers of such technologies. Many experts argue that development should be slowed down to better understand its impact and manage the risks.
There are several challenges that need to be addressed before AI-based medical assistants can be introduced for widespread use. Perhaps the most important is accuracy, because LLMs still produce incorrect answers, or “hallucinate.” All doctors make mistakes, but would it be acceptable for a chatbot to give out incorrect information to the general public? Probably not.
Thorbjorg Petursdottir said:
“Another challenge is around accountability—at the moment it’s unclear who would be responsible if something went wrong with one of these systems. And explainability is a concern too. It can be difficult for doctors or patients to trust the decisions of LLMs without understanding how they function first.”
Last but not least, data quality and bias also pose a threat, as the performance of LLMs depends heavily on the quality and fairness of their training data. If the data is biased, the model will reflect those biases, potentially resulting in unfair or inaccurate predictions.
“I think Google’s approach of not using Med-PaLM 2 in a patient-facing setting yet, but rolling it out to selected customers to explore its use cases and receive feedback, is the most sensible approach for now.”