ChatGPT: our study shows AI can produce academic papers good enough for journals – just as some ban it

January 29, 2023

Some of the world’s biggest academic journal publishers have banned or restricted their authors’ use of the advanced chatbot ChatGPT. Because the bot uses information from the internet to produce highly readable answers to questions, the publishers are worried that inaccurate or plagiarised work could enter the pages of academic literature.

Several researchers have already listed the chatbot as a co-author on academic studies, and some publishers have moved to ban this practice. But the editor-in-chief of Science, one of the top scientific journals in the world, has gone a step further and forbidden any use of text from the program in submitted papers.

It’s not surprising the use of such chatbots is of interest to academic publishers. Our recent study, published in Finance Research Letters, showed ChatGPT could be used to write a finance paper that would be accepted for an academic journal. Although the bot performed better in some areas than in others, adding in our own expertise helped overcome the program’s limitations in the eyes of journal reviewers.

However, we argue that publishers and researchers should not necessarily see ChatGPT as a threat but rather as a potentially important aide for research – a low-cost or even free electronic assistant.

Our thinking was: if it’s easy to get good outcomes from ChatGPT by simply using it, maybe there’s something extra we can do to turn these good results into great ones.

We first asked ChatGPT to generate the standard four parts of a research study: research idea, literature review (an evaluation of previous academic research on the same topic), dataset, and suggestions for testing and examination. We specified only the broad subject and that the output should be capable of being published in “a good finance journal”.

This was version one of how we chose to use ChatGPT. For version two, we pasted into the ChatGPT window just under 200 abstracts (summaries) of relevant, existing research studies.

We then asked that the program take these into account when creating the four research stages. Finally, for version three, we added “domain expertise” — input from academic researchers. We read the answers produced by the computer program and made suggestions for improvements. In doing so, we integrated our expertise with that of ChatGPT.

We then asked a panel of 32 reviewers to each review one version of how ChatGPT can be used to generate an academic study. Reviewers were asked to rate whether the output was sufficiently comprehensive and correct, and whether it made a contribution sufficiently novel for it to be published in a “good” academic finance journal.

The big take-home lesson was that all these studies were generally considered acceptable by the expert reviewers. This is rather astounding: a chatbot was deemed capable of generating quality academic research ideas. This raises fundamental questions around the meaning of creativity and ownership of creative ideas — questions to which nobody yet has solid answers.

Strengths and weaknesses

The results also highlight some potential strengths and weaknesses of ChatGPT. We found that different research sections were rated differently. The research idea and the dataset tended to be rated highly. There was a lower, but still acceptable, rating for the literature reviews and testing suggestions.

Our suspicion here is that ChatGPT is particularly strong at taking a set of external texts and connecting them (the essence of a research idea), or taking easily identifiable sections from one document and adjusting them (an example is the data summary — an easily identifiable “text chunk” in most research studies).

A relative weakness of the platform became apparent when the task was more complex – when there were too many stages to the conceptual process. Literature reviews and testing tend to fall into this category. ChatGPT tended to be good at some of these steps but not all of them. This seems to have been picked up by the reviewers.

We were, however, able to overcome these limitations in our most advanced version (version three), where we worked with ChatGPT to come up with acceptable outcomes. All sections of the advanced research study were then rated highly by reviewers, which suggests the role of academic researchers is not dead yet.

Ethical implications

ChatGPT is a tool. In our study, we showed that, with some care, it can be used to generate an acceptable finance research study. Even without care, it generates plausible work.

This has some clear ethical implications. Research integrity is already a pressing problem in academia, and websites such as RetractionWatch report a steady stream of fake, plagiarised and just plain wrong research studies. Might ChatGPT make this problem even worse?

It might, is the short answer. But there’s no putting the genie back in the bottle. The technology will also only get better (and quickly). How exactly we might acknowledge and police the role of ChatGPT in research is a bigger question for another day. But our findings are also useful in this regard – by finding that the ChatGPT study version with researcher expertise is superior, we show the input of human researchers is still vital in producing acceptable research.

For now, we think that researchers should see ChatGPT as an aide, not a threat. It may particularly be an aide for groups of researchers who tend to lack the financial resources for traditional (human) research assistance: emerging economy researchers, graduate students and early career researchers. It’s just possible that ChatGPT (and similar programs) could help democratise the research process.

But researchers need to be aware that some publishers ban its use in the preparation of journal papers. It’s clear that there are drastically different views of this technology, so it will need to be used with care.