ChatGPT: How smart is the hyped chatbot?


In my last post, I tentatively asked whether we should ascribe intelligence or even consciousness to ChatGPT. Now, I will discuss several of my interactions with the chatbot in which I tried to figure this out. While that may sound profound, I must stress that these were not rigorous scientific experiments but playful exchanges driven by curiosity; I simply asked a couple of spontaneous questions to see what would happen. But before we get to those, let me briefly remind you what I am talking about.

A few words on language models

ChatGPT is a chatbot built on top of a very large language model called GPT-3. Technically, a language model is but a (complicated!) mathematical description of what natural language sentences typically look like. Note that I said “typically look like” to emphasize that language models consider statistics rather than linguistics. They do not work with explicit grammatical rules about subjects, objects, nouns, verbs, adjectives, and the like. Instead, using machine learning algorithms, language models can be trained to estimate how likely it is that a certain word occurs in a certain context.

For instance, we all know that the missing word in the sentence “When I came home, my ___ was waiting at the door and wanted to be petted” will very likely be “dog” or maybe “cat” but certainly not “goldfish” or “motorcycle”. How do we know this? Because we have seen or heard sentences like this many a time. Put differently, during our lives, we all somehow learned probabilities for the co-occurrence of words.
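To make this concrete, here is a minimal sketch in Python of the simplest kind of language model, a bigram model that estimates such probabilities simply by counting which words follow which in a toy corpus. The corpus and the resulting numbers are purely illustrative; real language models are trained on vastly larger text collections and consider much longer contexts.

from collections import Counter, defaultdict

# Toy corpus standing in for the huge text collections real models are trained on.
corpus = [
    "when i came home my dog was waiting at the door",
    "when i came home my cat was waiting at the door",
    "my dog wanted to be petted",
    "my cat wanted to be petted",
    "my motorcycle was parked in the garage",
]

# Count how often each word follows each other word (bigram counts).
successors = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        successors[current_word][next_word] += 1

# Estimate P(next word | "my") from relative frequencies.
counts = successors["my"]
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"P({word} | my) = {count / total:.2f}")

Of course, such a bigram model only ever looks at a single preceding word; models like GPT-3 condition on hundreds or even thousands of preceding words, which is a large part of what makes them so much more capable.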

Note that I said “somehow” because we still don’t really know how human brains do this. Nevertheless, we can implement language models on our computers. This typically involves abstract mathematical representations of texts and artificial neural networks to process them. When trained on billions of text snippets, these networks learn about word co-occurrence probabilities and can then analyze and synthesize texts. In other words, neural network-based language models can be used to automatically read and write natural language texts.
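For readers who want to see this in action, here is a minimal sketch that asks a pretrained neural language model to fill in the blank from the example above. It assumes that the Hugging Face transformers package is installed and that the publicly available bert-base-uncased checkpoint can be downloaded; the exact ranking it prints depends on the model and is meant purely as an illustration.

from transformers import pipeline

# Masked word prediction with a pretrained neural language model.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = ("When I came home, my [MASK] was waiting at the door "
            "and wanted to be petted.")

# Print the five most likely words for the blank together with their scores.
for candidate in fill_mask(sentence, top_k=5):
    print(f"{candidate['token_str']:>12}  score = {candidate['score']:.3f}")

If the model has picked up the regularity discussed above, words like “dog” and “cat” should appear near the top of this list, while “goldfish” and “motorcycle” should not.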

While ChatGPT has been fine-tuned to have conversations, in the end it only uses a language model to produce its answers. So how intelligent can it possibly be?
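Before turning to that question, it helps to see what “producing answers with a language model” boils down to in practice: generating text by repeatedly predicting likely next words. The following sketch again assumes the transformers package, this time with the small public gpt2 checkpoint, a much weaker relative of the model behind ChatGPT; the continuation it prints will vary from run to run.

from transformers import pipeline

# Text generation: the model extends the prompt one predicted word (token) at a time.
generate = pipeline("text-generation", model="gpt2")

prompt = "When I came home, my dog was waiting at the door and"
result = generate(prompt, max_new_tokens=25, do_sample=True, top_k=50)
print(result[0]["generated_text"])

ChatGPT works on the same principle, only with a far larger model and additional fine-tuning so that its continuations take the form of helpful answers.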

Testing ChatGPT’s intelligence

Intelligence is a multifaceted phenomenon, and attempts at measuring it are debatable. Nevertheless, there are standardized tests which try to quantify it. One of them is the Scholastic Assessment Test (SAT), which is used by American universities for college admission. In the paper in which GPT-3 was introduced to the world, the scientists who developed it described a few experiments on SAT analogy tests, which the model passed with flying colors. I therefore thought I should try my own analogy tests with ChatGPT. Here is a commented example of a corresponding dialog:

[Chat 1: screenshot of the dialog]

Here we go again: just as in the examples in my last post, ChatGPT once more comes across like an overeager student and produces a rather elaborate answer. In this case, however, that actually had the advantage of making me realize that instead of “word association game” I should have said “word analogy game”. But, as we will see next, my mistake did not have detrimental consequences for the rest of the dialog.

[Chat 2: screenshot of the dialog]

Well, that is interesting. ChatGPT understands my rather abstract question and produces a reasonable answer. However, its justification for this answer is curious: it explains that a mouse is small compared to an elephant, that tiny is the opposite of big, and that small is not the opposite of big. While the answer “tiny” is acceptable, the reasoning behind it is self-contradictory. I also note that “small” would be a better answer, since “tiny” means “very small” and the opposite of that would be “very big”, which is commonly called “huge”. So, we can once again see that ChatGPT’s tendency to produce elaborate answers is more a weakness than a strength.

More importantly, this example shows that language models can produce perfectly readable natural language texts but do not necessarily understand their meaning. To explore this further, I performed an even meaner test. Some time ago, I attended a lecture by Geoffrey Hinton, who received the 2018 Turing Award for his fundamental contributions to neural networks and deep learning. Hinton talked about ambiguities in (word) sequence prediction and gave an example which, funnily enough, had to do with awards and trophies. I do not remember all the details but tried my best to replicate it.

[Chat 3: screenshot of the dialog]

So far, so good: a surprisingly short and concise answer to my question. Next comes the same problem, but now I express it in a rather awkward or twisted manner.

[Chat 4: screenshot of the dialog]

This is not even close to correct: the problem is still that the bag is too small to hold the trophy. But admittedly, the way I posed my question is unconventional and would not occur like this in an everyday conversation. So, I prompted ChatGPT again.

[Chat 5: screenshot of the dialog]

This is lovely but going nowhere. It is fun to see that ChatGPT creates whole stories to keep the conversation going. I count this as a strong point, but I still stopped asking further questions because I did not expect to get better answers. (Remember: my interactions with the chatbot were spontaneous and had not been planned meticulously.) However, I kept thinking about this dialog and concluded that I had to repeat it using slightly less convoluted phrasing. A few days later, I had this conversation.

[Chat 6: screenshot of the dialog]

So far, so good; we have been here before: for the first question I got the right answer, but the second question seems difficult to deal with for a language model that has learned from typical text snippets. But what if I provided a hint?

[Chat 7: screenshot of the dialog]

OK! Now we are talking. Using my hint, ChatGPT can identify the problem and continue the conversation without having to invent stories. But does it know that it has just learned something? Let’s see.

[Chat 8: screenshot of the dialog]

No! So close, but the last part of this lengthy answer again ruins everything. The trophy is too big, and the bag is too small. They are not both too small.

Conclusion

All in all, it seems fair to conclude that ChatGPT is an amazing AI because people can talk to it as they would to other people. But it is not yet very intelligent.

On the one hand, we see that modern language models can go far and pass the Turing test. The latter is named after Alan Turing who, in the 1950s, proposed to call a machine intelligent if it can have (text-only) conversations that are indistinguishable from (text-only) conversations with human beings. ChatGPT can have such conversations!

On the other hand, ChatGPT is not yet the smartest conversation partner I would hope for. That is, we also see that just because an AI can have a conversation, this does not mean that it knows how the world works.

But does that also mean that such an AI cannot be cognizant? After all, ChatGPT remembered that it had produced a pointless answer and realized that there was a more reasonable one. Does that mean that it is self-aware? Stay tuned; next time, I will investigate this further.


Prof. Dr. Christian Bauckhage

Christian Bauckhage has 20+ years of research experience in industry and academia. He is the co-inventor of 4 patents and has (co-)authored 200+ publications on pattern recognition, data mining, and intelligent systems, several of which received best paper awards. He studied computer science and physics in Bielefeld, was a research intern at INRIA Grenoble, and received his PhD in computer science from Bielefeld University in 2002. He then joined the […]
