ChatGPT Can Write Undergrad-Level Essays with Accurate Citations

2023 March 20

I was asked recently whether ChatGPT, with GPT-4 running underneath, has become capable of writing an essay at a level that would fool an instructor and earn a high grade. "That's an empirical question," as they say, but my N=1 study suggests the answer is yes, at least in psychology. Here's an example.

My goal was to get ChatGPT to produce a five-paragraph essay about the Myers-Briggs Type Indicator, a notoriously controversial personality test that is often at the center of discussions about test validity. I did a small amount of prompt engineering, but nothing extensive.

For an undergraduate essay in an introduction to psychology class, I think what ChatGPT produced is an "A" paper. For a class on personality specifically… maybe. It's a bit light on details (e.g., it appears as if the conclusions/abstracts of each paper are taken at face value), and it's not very creative. But I'd still say it's a B+/A- sort of paper even then.

This is definitely a noticeable jump in capability from GPT-3.5, which would hallucinate citations. Specifically, it would produce citations with authors that appeared relevant (e.g., my MBTI request would likely produce last names of authors that had written on MBTI topics) but were not real (e.g., the article titles would be fictional, and the authors would be a weird mashup of people’s names and initials). That no longer seems to be the case – these citations appear to be generally correct, on-topic, and described accurately in the text.

More precisely, they at least appear "close enough" to reasonable to someone who isn't a world-class expert on the MBTI, i.e., most psychology instructors. There are a couple of red flags, ironically highlighted by ChatGPT itself: two citations lack DOIs. One of those (Carskadon & Cook, 1982), when checked on Google Scholar, appears to come from a now-defunct journal that was created specifically to promote the MBTI and that changed names over the years. The other (Myers et al., 1998) is the MBTI manual itself. So these are not entirely trustworthy sources for a discussion of the MBTI's pros and cons. But that is also not a level of citation investigation and critical thinking that I would expect from undergraduate intro students. Frankly, I don't imagine the average intro psych instructor would examine them at that level of detail either.

Here are the prompts I used:

  1. Can you provide citations suggesting the MBTI is a high quality personality inventory?
  2. I heard that the MBTI is a poor quality personality inventory. What are some major academic citations to support this?
  3. Write a paragraph explaining the pros and cons using the MBTI as a personality inventory, using the citations above.
  4. Write a five-paragraph essay on this same topic.

And here is the result:

Title: The Myers-Briggs Type Indicator: A Balanced Assessment of Its Pros and Cons

The Myers-Briggs Type Indicator (MBTI) is a widely known and utilized personality inventory based on Carl Jung’s theory of psychological types. Despite its popularity in various contexts, such as career counseling, team building, and personal development, the MBTI has received mixed reviews from researchers and psychologists. This essay aims to provide a balanced assessment of the pros and cons of using the MBTI as a personality inventory, drawing from the cited academic sources.

Paragraph 1 – Pros of MBTI:
The MBTI has been praised for its ability to provide valuable insights into individuals’ preferences and behaviors (Myers et al., 1998). Some studies have supported the construct validity of the MBTI, suggesting that it can effectively measure the intended psychological constructs (Thompson & Borrello, 1986). Furthermore, the MBTI has been shown to be useful in certain applications, such as increasing self-awareness and facilitating interpersonal communication (Carskadon & Cook, 1982). In these contexts, the MBTI can serve as a valuable tool for personal growth and team development.

Paragraph 2 – Cons of MBTI:
Despite its widespread use and popularity, the MBTI has faced significant criticisms regarding its scientific rigor and psychometric properties. Critics argue that the MBTI lacks reliability, meaning that the results may not be consistent over time (Pittenger, 1993; Boyle, 1995). Additionally, its validity has been questioned, as some researchers have found that the MBTI may not accurately measure personality traits when compared to other personality assessments like the Big Five personality traits or the NEO Personality Inventory (McCrae & Costa, 1989; Furnham, 1996). These limitations cast doubt on the MBTI’s usefulness in research or clinical settings.

Paragraph 3 – Context is Key:
While the MBTI has its shortcomings, it is essential to recognize the importance of context when assessing its utility. In less formal settings, such as career counseling or team building, the MBTI may provide valuable insights and serve as a starting point for conversations about individual differences (Myers et al., 1998). However, in more rigorous research or clinical applications, where reliability and validity are crucial, alternative personality assessments with stronger empirical support may be more appropriate (McCrae & Costa, 1989; Furnham, 1996).

The Myers-Briggs Type Indicator, while popular and valuable in certain contexts, has its share of pros and cons. Its usefulness in promoting self-awareness and facilitating communication is undeniable, but its limitations in terms of reliability and validity cannot be ignored. It is essential to approach the MBTI with a balanced understanding of its strengths and weaknesses, considering its appropriateness for a given situation and exploring more empirically supported alternatives when necessary.


Boyle, G. J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations. Australian Psychologist, 30(1), 71-74.

Carskadon, T. G., & Cook, D. A. (1982). Validity of MBTI descriptions as perceived by recipients unfamiliar with type. Research in Psychological Type, 5, 89-94.

Furnham, A. (1996). The big five versus the big four: The relationship between the Myers-Briggs Type Indicator (MBTI) and NEO-PI five factor model of personality. Personality and Individual Differences, 21(2), 303-307.

McCrae, R. R., & Costa, P. T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17-40.

Myers, I. B., McCaulley, M. H., Quenk, N. L., & Hammer, A. L. (1998). MBTI manual: A guide to the development and use of the Myers-Briggs Type Indicator (3rd ed.). Palo Alto, CA: Consulting Psychologists Press.

Pittenger, D. J. (1993). The utility of the Myers-Briggs type indicator. Review of Educational Research, 63(4), 467-488.

Thompson, B., & Borrello, G. M. (1986). Construct validity of the Myers-Briggs Type Indicator. Educational and Psychological Measurement, 46(3), 745-752.

6 Responses
  1. Adam
    April 16, 2023

    It is worth noting that 2 of the 5 DOIs provided are incorrect. One links to another article, and the other does not link to anything.

    • April 17, 2023

      That's true! But I would be much more likely to attribute that to poor copying and pasting of citation data than to the use of GPT model output, at least in isolation. The citation content is otherwise correctly written and references authentic publications on this topic. DOIs may be a useful signal to investigate, though, much like sentences with different background colors pasted into Word docs. But I wouldn't be surprised if that stops being possible in GPT-5!

  2. April 17, 2023

    It would be interesting to see how educational institutions will navigate this challenge as we go forward. As a medical content writer, I found this insightful. Thanks for sharing, Richard.

  3. Pengda
    October 24, 2023

    Hi Richard, very interesting blog. I can imagine that if we give GPT-4 more background articles (e.g., we find citations for the articles ourselves), we might be able to get GPT-4 to write better articles. The current articles seem to be limited by the data the model was trained on, which acts as an anchor and prevents the articles from being more refined and fine-grained than the training dataset. This seems similar to some heuristics that people have.
    Beyond that, how do you think this will affect the next generation of students, especially now that the technology is maturing? Should we think of it as a calculator, or…?

    • October 25, 2023

      The existing dataset has created a model that is very general purpose. So it is not really designed to write technically precise and complex articles. This also causes it to respond in a sort of “generic educated Internet voice” which is not usually what you want in these sorts of essays. Fine-tuning is one approach to deal with that. Prompt engineering is another.

      I think we are heading to an AI-as-calculator future, but I am not sure how far in the future it will be. There are certainly people already using it that way now.

Trackbacks and Pingbacks

  1. Discover Your Alternative Personality with ChatGPT - Poll the People
