Words, data, and algorithms combine
To create articles about LLMs, so divine,
A glimpse into the world of language,
Where language machines are deployed.
It felt only natural to ask ChatGPT, a large language model (LLM), to write a poem about large language models, and then use that poem as the introduction to this article.
So how exactly did this poem come together in such a neat package with rhyming words and a few witty phrases?
We went straight to the source: MIT Assistant Professor and CSAIL Principal Investigator Jacob Andreas, whose research advances the field of natural language processing, both in developing cutting-edge machine learning models and in exploring the potential of language as a means of enhancing other forms of artificial intelligence. His work includes pioneering research in areas such as using natural language to teach robots and leveraging language to help computers articulate the rationale behind their decision-making. We sat down with Andreas to learn more about how this technology works, its impact, and what the future holds.
question: Language is a rich ecosystem, full of nuances that humans use to communicate with one another, including sarcasm, irony, and other forms of figurative language; there are many ways to convey meaning beyond the literal. Is it possible for a large language model to understand this contextual complexity? What does it mean for a model to achieve “in-context learning”? And how do these models handle dialects?
answer: When it comes to linguistic context, these models are able to reason over much longer documents and chunks of text than anything we have known how to build before. But that’s only one kind of context. In humans, language production and comprehension take place in a grounded context. For example, I know I’m sitting at this table; there are objects in the room that we can refer to. But the language models we have today generally can’t perceive or refer to any of that when interacting with human users.
There is also a broader social context that informs a lot of our language use, and these models are not, at least not immediately, sensitive to or aware of it. It’s not clear how to give them information about the social context in which their language generation and language modeling take place. Another important kind is temporal context. We are filming this video at a particular moment in time, when particular facts are true. The models we have today were trained on snapshots of the internet that stopped at particular points in time, and for most models that is probably a couple of years ago, so they know nothing about anything that has happened since then. They don’t even know at what moment in time they are generating text. Figuring out how to provide all these different kinds of context is also an interesting question.
Perhaps the most surprising element here is the phenomenon called in-context learning. If I take a small ML [machine learning] dataset, say movie reviews along with the star ratings critics assigned to those movies, and feed the model just a few examples of these, the language model can not only generate plausible new movie reviews but also predict their star ratings. More generally, whenever you have a machine learning problem, you have inputs and outputs: if you show the model a few input-output pairs, then give it a new input and ask it to predict the output, it often does this very well.
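The movie-review example can be made concrete with a short sketch. The snippet below is purely illustrative and not code from Andreas’ group: it builds a few-shot prompt from a handful of invented (review, rating) pairs and passes it to a placeholder completion function, here called `generate_text`, which stands in for whatever LLM API is available. The point is that the model picks up the task from the demonstrations in the prompt alone, with no parameter updates.

```python
# Minimal sketch of in-context (few-shot) learning, assuming access to some
# text-completion LLM behind a hypothetical `generate_text` function.
# The reviews and ratings below are made-up demonstrations.

few_shot_examples = [
    ("A gorgeous, heartfelt film with stellar performances.", 5),
    ("Flat characters and a predictable plot. A slog to sit through.", 1),
    ("Uneven pacing, but the final act mostly redeems it.", 3),
]

new_review = "Sharp dialogue and a clever premise, though it runs a bit long."

# Build the prompt: demonstrations followed by the query, nothing else.
prompt_lines = ["Predict the star rating (1-5) for each movie review.", ""]
for review, stars in few_shot_examples:
    prompt_lines.append(f"Review: {review}")
    prompt_lines.append(f"Rating: {stars}")
    prompt_lines.append("")
prompt_lines.append(f"Review: {new_review}")
prompt_lines.append("Rating:")
prompt = "\n".join(prompt_lines)

# The model is expected to continue the pattern with a number such as "4".
predicted_rating = generate_text(prompt, max_new_tokens=2)
print(predicted_rating)
```

No classifier is trained here; the general-purpose model is simply asked to continue a pattern, which is what makes in-context learning such a different way of doing machine learning.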
This is a very interesting, fundamentally different way of doing machine learning: I have this one big general-purpose model into which I can feed lots of small machine learning datasets, and yet I never have to train a new classifier, generator, or any other model specialized for my particular task. This is something we’ve been thinking about a lot in my group, and in collaborations with colleagues at Google, trying to understand exactly how this in-context learning phenomenon actually comes about.
question: We like to think that humans pursue (at least to some extent) what they know to be objectively and morally true. Large language models, perhaps because they have an ill-defined or as-yet poorly understood “moral compass,” are not bound to the truth. Why do large language models tend to hallucinate facts or confidently assert inaccuracies? This limits their usefulness in applications where factual accuracy is critical. Is there a leading theory on how this will be solved?
answer: It’s well documented that these models hallucinate facts and are not always reliable. Recently, I asked ChatGPT to describe some of our group’s research. It named five papers, four of which don’t actually exist, and the fifth of which is a genuine paper written by a colleague of mine who lives in the United Kingdom, with whom I have never co-authored anything. Factuality is still a big problem. Beyond that, anything involving reasoning in a really general sense, anything involving complicated computation or complicated inference, still seems really difficult for these models. There may even be fundamental limitations of the transformer architecture here, and I believe more modeling work is needed to improve things.
Why this happens is still partly an open question, but part of the reason may be that, structurally, these models have trouble building coherent models of the world. They can do it a little: if you ask them factual or trivia questions, they will probably answer correctly more often than your average human user off the street. But unlike the average human user, it’s really unclear whether anything inside these language models corresponds to a belief about the state of the world. I think this is both for structural reasons, in that transformers have no obvious place to put such a set of beliefs, and because of the training data: these models are trained on the internet, which was written by many different people at different times, people who believe different things about the state of the world. So it’s hard to expect the models to represent those things coherently.
That said, I don’t think this is a fundamental limitation of neural language models, or of language models more generally; it’s just true of the language models we have today. We’re already seeing that models are beginning to be able to build representations of facts and of the state of the world, and I think there’s room to improve further.
question: The pace of progress from GPT-2 to GPT-3 to GPT-4 has been dizzying. What does the trajectory look like from here? Will it be exponential, or will it follow an S-curve and taper off in the near term? Are there limiting factors in terms of scale, compute, data, or architecture?
answer: Certainly in the short term, the thing I’m most worried about relates to the issues of truthfulness and coherence mentioned earlier: even the best models we have today generate incorrect facts. They generate code with bugs, and because of the way these models work, producing output with all the right surface statistics, those bugs can be especially hard for humans to spot. When it comes to code, it’s still an open question whether it’s actually less work for somebody to write a function by hand than to ask a language model to generate that function and then have that person go through and verify that the implementation is really correct.
There’s a bit of danger in rushing to deploy these tools right away: we could end up in a world where everything is slightly worse, but where it’s actually very difficult for people to reliably check the outputs of these models. That said, these are problems that can be overcome. Especially at the pace things are moving now, there is a lot of room in the long term to address these issues of factuality, coherence, and correctness of generated code. These really are tools, tools we can use to free society as a whole from a lot of unpleasant tasks, chores, and drudge work that have been hard to automate, and that’s something to be excited about.
