By now, I have reiterated several times that ChatGPT is an amazingly capable chatbot but, in the end, just a very large language model. This is important to keep in mind, because over the past couple of weeks the media have been full of complaints that ChatGPT’s answers to questions of varying difficulty are often wrong. Such reports then usually conclude that the apparent success of modern artificial intelligence is a fluke and that claims about its disruptive potential are exaggerated.
Given this kind of coverage, I think that pointing out ChatGPT’s shortcomings is valid but that downplaying its disruptive potential is shortsighted. In this post, I will try to explain why. To this end, it is insightful to look at an early interaction I had with the chatbot (in December 2022) and see what we can learn from it. However, I need to begin with another disclaimer: the following short conversation is about quantum computing, one of the topics we research at the Lamarr Institute; since quantum computing involves advanced mathematics, my conversation with ChatGPT may seem cryptic. I will, however, not discuss the underlying technical details but merely comment on general aspects.
Here we go again. Even in one of the very first interactions I had with the chatbot, it appeared rather chatty and overly eager to impress. But how detailed is its knowledge of quantum computing?
This is funny! I asked for a diagram, and what I got reads like the caption of a scientific illustration. This once again suggests that ChatGPT was trained on a lot of educational and scientific texts and thus produces answers like those one would find in such sources. Note, however, that I was not disappointed. First, I knew that ChatGPT cannot draw pictures. I do not hold that against it, because drawing pictures is not its purpose. In fact, I believe that, if ChatGPT were coupled with image-generating AIs such as DALL-E 2, it would have no problem producing pictures. Second, at first glance, the answer I got contains all the buzzwords one would be looking for, and it therefore seems intuitively correct. But can ChatGPT do better? Can it generate an answer that goes beyond an apparently reasonable caption?
Now I was impressed! What ChatGPT did here was to produce Python code using methods from IBM’s Qiskit library, a popular tool for implementing and testing quantum algorithms. Moreover, the code it produced really looks as if a human expert had programmed the ideas from the chatbot’s previous answer.
What we see here is therefore an example of ChatGPT’s ability to program! This is one of the features that has caused quite some excitement on the web and has led people to fear that AI will make programming jobs obsolete.
However, there is a huge caveat! I was so impressed by this turn of the conversation that I did not realize that the answers I had received were utter nonsense. This realization came two weeks later, when I was preparing a scientific presentation on current developments in AI. Only when I copied the above code onto my slides and carefully reread it did I realize that it does not compute the logical AND of two quantum bits at all. At first sight, it looks very reasonable, but on closer inspection it really is not. Even worse: some of the comments in the code are misleading and do not properly reflect what is being done.
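For contrast, computing the logical AND of two qubits is actually straightforward: a Toffoli (CCX) gate writes the AND of its two control qubits into a target qubit. The following is a minimal sketch of how this could look in Qiskit; note that the simulator import assumes the qiskit-aer package is installed, and such details vary between Qiskit versions.

```python
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator  # assumes the qiskit-aer package

# Three qubits (two inputs, one output) and one classical bit for the result.
qc = QuantumCircuit(3, 1)

# Prepare the test inputs |1> and |1>, for which AND should yield 1.
qc.x(0)
qc.x(1)

# The Toffoli (CCX) gate flips qubit 2 iff qubits 0 AND 1 are both |1>,
# i.e., it writes the logical AND of the inputs into the output qubit.
qc.ccx(0, 1, 2)

# Measure the output qubit.
qc.measure(2, 0)

# Simulate the circuit; every shot should return '1'.
counts = AerSimulator().run(qc, shots=1024).result().get_counts()
print(counts)  # expected: {'1': 1024}
```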
So, what can we learn from this? Even though I work on AI myself and regularly write Qiskit code, I was misled by ChatGPT’s confidence and apparent expertise. While this is embarrassing, I eventually realized I had been fooled. But would students who use ChatGPT for their homework notice? Would hobby investors who use it for financial advice? I don’t know, but I rather doubt it. For instance, in the short time since ChatGPT was released, I have already seen many YouTube videos in which people said something like “the AI told me that …” and did not seem to question the validity of what they had been told.
This highlights the importance of the kind of research we conduct at the Lamarr Institute. First of all, the trustworthiness of AI is an obvious and growing concern, and our scientists work in interdisciplinary teams that scrutinize how data are gathered, stored, accessed, sampled, preprocessed, modeled, and applied, in order to develop systems whose outputs are reliable.
Second of all, I once again want to emphasize that ChatGPT is just a large language model that was never specifically designed to reliably answer whatever questions people might have. In fact, shortly after ChatGPT had been released, Sam Altman, the CEO of OpenAI, pointed this out in a tweet.
Does this mean that AI systems such as ChatGPT will never be reliable? No, very likely it does not! On the contrary, to experts it is rather obvious how to overcome current limitations in this regard. For instance, in January 2023, Stephen Wolfram published a much-noticed blog post in which he sketched ideas for how to integrate data-driven language models with knowledge-driven inference engines.
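To make the idea concrete, here is a deliberately simplified sketch of such an integration. All function names are hypothetical placeholders, not an actual API, and real systems (such as Wolfram’s proposed coupling of ChatGPT and Wolfram|Alpha) are of course far more sophisticated.

```python
def needs_computation(query: str) -> bool:
    # Toy heuristic; a real system might use a trained classifier here.
    return any(ch.isdigit() for ch in query) or "compute" in query.lower()

def knowledge_engine(query: str) -> str:
    # Placeholder for a knowledge-driven inference engine (e.g. a symbolic
    # solver) that returns verifiable results instead of plausible text.
    return "42"

def language_model(prompt: str) -> str:
    # Placeholder for a data-driven large language model.
    return f"[fluent text generated for: {prompt}]"

def answer(query: str) -> str:
    """Route computational queries to the knowledge engine and let the
    language model merely verbalize the verified result."""
    if needs_computation(query):
        result = knowledge_engine(query)
        return language_model(f"explain this verified result: {result}")
    return language_model(query)

print(answer("What is 6 times 7?"))
```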
This, too, is well in line with our research goals at the Lamarr Institute. In our work on Hybrid Machine Learning, we develop methods that integrate data, knowledge, and (application) contexts into the learning process. On the one hand, this promises solutions that are more efficient and robust than purely data-driven techniques. On the other hand, we expect our work to lead to more explainable and trustworthy systems which are less biased and require less training data than current large language models. The latter also means that Hybrid ML solutions can be more democratic, because advanced AIs could then be built not only by IT industry giants with their virtually unlimited resources but also by smaller companies and public institutions.
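As a toy illustration of what integrating knowledge into learning can mean (a minimal sketch of the general idea, not of the Lamarr Institute’s specific methods): domain knowledge can be encoded as a penalty term in the training loss, so that the model is discouraged from fitting the data in ways that contradict what we already know.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.1 * rng.standard_normal(20)  # noisy synthetic data

def loss(w, b, lam=10.0):
    data_term = np.mean((w * x + b - y) ** 2)  # purely data-driven fit
    knowledge_term = max(0.0, -w) ** 2         # prior knowledge: slope >= 0
    return data_term + lam * knowledge_term

# Crude grid search, just to keep the sketch self-contained and runnable.
candidates = ((w, b) for w in np.linspace(-3, 3, 61)
                     for b in np.linspace(-1, 1, 41))
w_best, b_best = min(candidates, key=lambda p: loss(*p))
print(f"fitted slope {w_best:.2f}, intercept {b_best:.2f}")
```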
Can we trust ChatGPT?
While it is tempting to take ChatGPT’s answers at face value because it always appears confident and knowledgeable, we must still be very careful about what it tells us, even or especially when its answers look plausible and convincing at first sight. This caveat strikingly underlines the importance of our research on Trustworthy AI at Lamarr. Similarly, these kinds of shortcomings of current AI systems motivate our research on Hybrid Machine Learning methods. Those who want to learn more about these topics should again stay tuned; in upcoming posts, I will discuss them in more detail.