In the last issue of Journal Club, we saw that ChatGPT was as good as humans at writing the introduction sections of scientific articles. But how does ChatGPT perform when asked to write abstracts?
Citation
Hwang T, et al. Can ChatGPT assist authors with abstract writing in medical journals? Evaluating the quality of scientific abstracts generated by ChatGPT and original abstracts. PLoS One. 2024;19(2):e0297701.
Overview
Recent studies have hinted at the potential of ChatGPT to assist in scientific writing.
However, there has been a lack of objective research on ChatGPT's accuracy and reliability when generating abstracts for medical papers.
To address this, Taesoon Hwang and coworkers from the University of Warwick, UK, asked ChatGPT to generate abstracts for RCTs following the guidelines of the respective medical journals. These were then rated for quality using the CONSORT-A checklist and compared against the original authors' abstracts. To ensure that prior knowledge of a study did not influence abstract generation by ChatGPT, any RCTs published before September 2021 (ChatGPT's training cut-off) were excluded.
Summary
The original (human-written) abstracts outperformed the ChatGPT-generated abstracts in overall quality
Original abstracts achieved a mean score of 11.89, whereas GPT-3.5-generated abstracts scored significantly lower at a mean of 7.89, and GPT-4-generated abstracts lower still at 5.18
However, in blind assessments, ChatGPT-generated abstracts were ranked as more readable than the original abstracts
Key quote
"AI could serve as an invaluable tool in translating complex scientific texts into more accessible versions, hence promoting higher level of comprehension and engagement from the general public."
Opinion
This was an exploratory study that examined ChatGPT's ability to summarise large bodies of text into abstracts, not its ability to generate original content
The authors note that ChatGPT's underperformance may be due to the choice of comparator: original abstracts from studies published in high-impact journals known for rigorous peer review
Interestingly, both GPT-3.5 and GPT-4 were used in the study, with GPT-3.5 significantly outperforming GPT-4 in both readability (62% vs 7%) and hallucination rate (0.03 vs 1.13 items/abstract)