Nick Lamb, PhD

Journal Club #4: Can ChatGPT Assist Authors with Abstract Writing in Medical Journals?

In the last issue of Journal Club, we saw that ChatGPT was as good as humans at writing the introduction sections of scientific articles. But how does ChatGPT perform when asked to write abstracts?

ChatGPT in medicine


Hwang T, et al. Can ChatGPT assist authors with abstract writing in medical journals? Evaluating the quality of scientific abstracts generated by ChatGPT and original abstracts. PLoS One. 2024;19(2):e0297701.


Recent studies have hinted at the potential of ChatGPT to assist in scientific writing.

However, there has been a lack of objective research on ChatGPT's accuracy and reliability when generating abstracts for medical papers.

To address this, Taesoon Hwang and coworkers from the University of Warwick, UK, asked ChatGPT to generate abstracts for RCTs following the guidelines of the respective medical journals. These were rated for quality using the CONSORT-A checklist and compared against the original authors' abstracts. To ensure that prior knowledge of a study did not influence abstract generation by ChatGPT, any RCTs published prior to September 2021 were excluded.


  • The overall quality of the original (human-written) abstracts outperformed those generated by ChatGPT

  • Original abstracts achieved a mean score of 11.89, whereas ChatGPT-generated abstracts scored significantly lower, with means of 7.89 for GPT 3.5 and 5.18 for GPT 4

  • However, in blind assessments, ChatGPT-generated abstracts were ranked as more readable than the original abstracts

Key quote

"AI could serve as an invaluable tool in translating complex scientific texts into more accessible versions, hence promoting higher level of comprehension and engagement from the general public."


  • This was an exploratory study that examined ChatGPT's ability to summarise long texts into abstracts, not its ability to generate original content

  • The authors note that ChatGPT's underperformance may be due to the comparator: the original abstracts came from studies published in high-impact journals known for rigorous peer review

  • Interestingly, both ChatGPT 3.5 and 4 were used in the study, with version 3.5 significantly outperforming version 4 in both readability (62% vs 7%) and hallucination rate (0.03 vs 1.13 items/abstract)

