ChatGPT fails top US medical exam: what it could do and what it couldn't
ChatGPT has failed yet another top US exam. OpenAI's much-acclaimed chatbot failed a urology exam in the US, according to a study reported in the journal Urology Practice. The study showed that ChatGPT answered fewer than 30 per cent of questions correctly on the American Urological Association's widely used Self-Assessment Study Program for Urology (SASP).
“ChatGPT not only has a low rate of correct answers regarding clinical questions in urologic practice, but also makes certain types of errors that pose a risk of spreading medical misinformation,” said Christopher M. Deibert of the University of Nebraska Medical Center in the report.
What is the Self-Assessment Study Program for Urology?
The AUA's Self-Assessment Study Program (SASP) is a 150-question practice examination covering the core curriculum of medical knowledge in urology. The study excluded 15 questions that contained visual information such as pictures or graphs, leaving 135 text-only questions for the chatbot.
How ChatGPT performed in the test
Overall, ChatGPT reportedly answered fewer than 30 per cent of the SASP questions correctly: 28.2 per cent of multiple-choice questions and 26.7 per cent of open-ended questions. The chatbot is said to have provided “indeterminate” responses to several questions, and on these, accuracy decreased further when the model was asked to regenerate its answers.
The report said that for most open-ended questions, ChatGPT provided an explanation only for the selected answer. Its answers were longer than those provided by SASP, but “frequently redundant and cyclical in nature”, according to the authors.
“Overall, ChatGPT often gave vague justifications with broad statements and rarely commented on specifics,” Dr Deibert said. Even when given feedback, “ChatGPT continuously reiterated the original explanation despite it being inaccurate,” said the report.
What does not work for ChatGPT
The researchers suggest that while ChatGPT may do well on tests requiring recall of facts, it seems to fall short on questions pertaining to clinical medicine, which require “simultaneous weighing of multiple overlapping facts, situations and outcomes”.
“Given that LLMs are limited by their human training, further research is needed to understand their limitations and capabilities across multiple disciplines before it is made available for general use,” Dr Deibert said.