Large Language Models feel the direction of time

Researchers have found that AI large language models, like GPT-4, are better at predicting what comes next than what came before in a sentence. This “Arrow of Time” effect could reshape our understanding of the structure of natural language, and the way these models understand it.

18 Sep, 2024

This article was first published on actu.epfl.ch

Large language models (LLMs) such as GPT-4 have become indispensable for tasks like text generation, coding, operating chatbots, and translation. At their heart, LLMs work by predicting the next word in a sentence based on the previous words – a simple but powerful idea that drives much of their functionality. But what happens when we ask these models to predict backward, to go “backwards in time” and determine the previous word from the subsequent ones?
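To make the two tasks concrete, here is a minimal Python sketch of how forward and backward prediction examples might be built from the same sentence. It is an illustration only, not the researchers' code: the function name prediction_pairs is hypothetical, and splitting on whitespace is a simplifying assumption (real LLMs work on subword tokens). One common way to frame backward prediction, assumed here, is to reverse the token order and then predict the "next" token as usual.

```python
def prediction_pairs(tokens, direction="forward"):
    """Return (context, target) pairs for next- or previous-token prediction.

    Hypothetical helper for illustration; not taken from the study.
    """
    if direction == "backward":
        # Reversing the sequence turns "predict the previous token"
        # into ordinary next-token prediction on the reversed text.
        tokens = tokens[::-1]
    elif direction != "forward":
        raise ValueError("direction must be 'forward' or 'backward'")
    # Each target is predicted from all the tokens that precede it.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Assumption: whitespace splitting stands in for a real tokenizer.
sentence = "they lived happily ever after".split()
forward_pairs = prediction_pairs(sentence, "forward")
backward_pairs = prediction_pairs(sentence, "backward")

for context, target in backward_pairs:
    print(context, "->", target)
```

In the forward direction the model sees "they" and must predict "lived"; in the backward direction it sees "after" and must predict "ever". Both directions cover exactly the same text, which is why one might expect them to be equally hard.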

The question led Professor Clément Hongler at EPFL and Jérémie Wenger of Goldsmiths (London) to explore whether LLMs could construct a story backward, starting from the end. Working with Vassilis Papadopoulos, a machine learning researcher at EPFL, they discovered something surprising: LLMs are consistently less accurate when predicting backward than forward.

A fundamental asymmetry

The researchers tested LLMs of different architectures and sizes, including Generative Pre-trained Transformers (GPT), Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM) neural networks. Every one of them showed the “Arrow of Time” bias, revealing a fundamental asymmetry in how LLMs process text.

Hongler explains: “The discovery shows that while LLMs are quite good both at predicting the next word and the previous word in a text, they are always slightly worse backwards rather than forward: their performance at predicting the previous word is always a few percent worse than at predicting the next word. This phenomenon is universal across languages, and can be observed with any large language model.”

The finding also connects to the work of Claude Shannon, the father of information theory. In his seminal 1951 paper, Shannon explored whether predicting the next letter in a sequence was as easy as predicting the previous one. He found that although both tasks should theoretically be equally difficult, humans found backward prediction more challenging – though the performance difference was minimal.
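The reason both tasks "should theoretically be equally difficult" is that the probability of a whole sequence can be broken down one symbol at a time in either direction, and both breakdowns give the same total. The Python sketch below checks this on a tiny made-up corpus; the corpus and all function names are hypothetical, and the exact counting used here stands in for a perfect predictor, which a trained model is not.

```python
import math
from collections import Counter

# Hypothetical toy corpus of three-symbol sequences (illustration only).
corpus = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "c"), ("b", "b", "c")]
joint = Counter(corpus)
total = sum(joint.values())


def marginal_prob(pattern):
    """Probability that a sequence matches pattern; None is a wildcard."""
    hits = sum(
        count
        for seq, count in joint.items()
        if all(p is None or p == s for p, s in zip(pattern, seq))
    )
    return hits / total


def forward_log_prob(seq):
    """Sum of log P(x_i | x_1..x_{i-1}), reading left to right."""
    n = len(seq)
    lp = 0.0
    for i in range(n):
        seen = seq[:i] + (None,) * (n - i)
        seen_plus_next = seq[: i + 1] + (None,) * (n - i - 1)
        lp += math.log(marginal_prob(seen_plus_next) / marginal_prob(seen))
    return lp


def backward_log_prob(seq):
    """Sum of log P(x_i | x_{i+1}..x_n), reading right to left."""
    n = len(seq)
    lp = 0.0
    for i in range(n - 1, -1, -1):
        seen = (None,) * (i + 1) + seq[i + 1 :]
        seen_plus_prev = (None,) * i + seq[i:]
        lp += math.log(marginal_prob(seen_plus_prev) / marginal_prob(seen))
    return lp


for seq in corpus:
    print(seq, forward_log_prob(seq), backward_log_prob(seq))
```

With exact probabilities the two totals agree up to floating-point rounding for every sequence. The asymmetry the researchers report concerns models that have to learn these probabilities from data, where the two directions turn out not to be equally easy to learn.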

Intelligent agents

“In theory, there should be no difference between the forward and backward directions, but LLMs appear to be somehow sensitive to the time direction in which they process text,” says Hongler. “Interestingly, this is related to a deep property of the structure of language that could only be discovered with the emergence of Large Language Models in the last five years.”

The researchers link this property to the presence of intelligent agents processing information, meaning that it could be used as a tool to detect intelligence or life, and could help design more powerful LLMs. Finally, it could point to new directions in the long-standing quest to understand the passage of time as an emergent phenomenon in physics.

The work was presented at the prestigious International Conference on Machine Learning (2024) and is currently available on arXiv.

From theater to math

The study itself has a fascinating backstory, which Hongler relates: “In 2020, with Jérémie [Wenger], we were collaborating with The Manufacture theater school to make a chatbot that would play alongside actors to do improv; in improv, you often want to continue the story, while knowing what the end should look like.

“In order to make stories that would finish in a specific manner, we got the idea to train the chatbot to speak ‘backwards’, allowing it to generate a story given its end – e.g., if the end is ‘they lived happily ever after’, the model could tell you how it happened. So, we trained models to do that, and noticed they were a little worse backwards than forwards.

“With Vassilis [Papadopoulos], we later realized that this was a profound feature of language, and that it was a completely general new phenomenon, which has deep links with the passage of time, intelligence, and the notion of causality. Quite cool for some theater project!”

Hongler’s excitement about this work stems in good part from the surprises that came along the way: “Only time could tell that something that started as a theater project would end up giving us new tools to understand so many things about the world.”


References

Vassilis Papadopoulos, Jérémie Wenger, Clément Hongler. Arrows of Time for Large Language Models. arXiv: 2401.17505v4