This is an “unfair” test, and a good example of a “bad” use of LLMs. They are not databases. They do not produce precise, factual answers to questions, and they are probabilistic systems, not deterministic ones. No LLM today can be guaranteed to give a completely accurate answer to this question. The answer might be right, but we can’t guarantee it.
People tend to infer (often drawing parallels to cryptocurrencies and NFTs) that this means these things are useless. That’s a misconception. Rather, a useful way to think about generative AI models is that they are very good at telling you what a good answer to a question like that would probably look like. There are some use cases where “looks like a good answer” is exactly what you want, and there are other use cases where “approximately right” is “exactly wrong”.
In fact, we could push this a bit further and suggest that the exact same prompt and the exact same output could be a good or bad outcome depending on why you want it.
Either way, in this case I needed an accurate answer, which ChatGPT could not, in principle, be expected to give, and it duly returned an incorrect one. The test was unfair, in that I asked ChatGPT to do something it cannot do, but it’s a relevant test nonetheless: the answer is still wrong.
There are two ways to respond to this. One is to treat it as a science problem: it’s early days, and the models will improve. You can say “RAG” and “multi-agent” over and over. The models will certainly improve, but by how much? You could spend weeks watching YouTube videos of machine-learning scientists debating this and conclude only that they don’t really know either. In fact, this is a variant of the “will LLMs produce AGI?” debate, since a model that could answer “any” question with complete accuracy sounds like at least one definition of AGI (but then again, no one knows).
But the other way is to treat this as a product problem: how do you build a useful mass-market product around a model that is expected to get things “wrong”?
The usual response from AI people to examples like mine is “you’re looking at it the wrong way”: I asked the wrong kind of question, and I asked it the wrong way; I should have done more prompt engineering. But the lesson of the last 50 years of consumer computing is that forcing users to learn the command line doesn’t drive adoption. You have to go to the user.
