Representational Capacity of Neural Language Models

ACL 2024 Tutorial

Alexandra Butoi, Robin Chan, Ryan Cotterell, William Merrill, Franz Nowak, Clemente Pasti, Lena Strobl, Anej Svete

Why?

Modern (large) language models can do a lot. We use them to generate text, translate languages, write code, and even hold conversations. This raises a plethora of questions: What can these models really do? What can they not do? How do they work? When will they definitely fail? More generally, can we come up with something like a "science of LLMs" that helps us answer these questions? The material on this website, and the tutorial at ACL 2024, aims to give a general, motivational overview of how these questions can be tackled with formal language theory. As we will see, formal language theory provides a powerful framework for characterizing the representational capacity of neural language models, shedding light on their reasoning and generalization abilities as well as their connection to human language.

Overview of Topics Discussed Today

A Motivational Example: Recurrent Neural Networks and Finite-State Automata

The connection between neural language models and formal languages can be quite intuitive! This motivational example shows how the popular recurrent neural network (RNN) can be seen as an analogue of the simple finite-state automaton (FSA): both process (or generate) strings (sentences) one symbol at a time and transition between (hidden) states while doing so. This connection is not just a curiosity, but a key insight into the representational capacity of these neural language models. We explore it in much more detail in Section 2.

Figure: a finite-state automaton (left) and an RNN (right) processing the same string. A recurrent neural network processes the string very similarly to the finite-state automaton!
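To make the analogy concrete, below is a minimal Python sketch (our own illustrative code, not part of the tutorial materials; the automaton, alphabet, and weight matrices are made up) in which a two-state FSA and a simple Elman-style RNN both consume a string one symbol at a time, each updating its state after every symbol.

```python
import numpy as np

# --- A toy finite-state automaton over the alphabet {a, b} ---
# (illustrative: its states track whether the number of b's read so far is even or odd)
fsa_transitions = {
    ("even", "a"): "even", ("even", "b"): "odd",
    ("odd",  "a"): "odd",  ("odd",  "b"): "even",
}

def fsa_run(string):
    state = "even"                                   # initial state
    for symbol in string:
        state = fsa_transitions[(state, symbol)]     # one transition per symbol
    return state

# --- A simple Elman-style RNN consuming the same string symbol by symbol ---
alphabet = {"a": 0, "b": 1}
rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 4))                        # hidden-to-hidden weights (untrained)
W_x = rng.normal(size=(4, len(alphabet)))            # input-to-hidden weights (untrained)

def rnn_run(string):
    h = np.zeros(4)                                  # initial hidden state
    for symbol in string:
        x = np.eye(len(alphabet))[alphabet[symbol]]  # one-hot symbol embedding
        h = np.tanh(W_h @ h + W_x @ x)               # one hidden-state update per symbol
    return h

print(fsa_run("abba"))   # final FSA state after reading the string
print(rnn_run("abba"))   # final RNN hidden state after reading the string
```

In both cases, everything the model remembers about the prefix read so far is condensed into a single state (finite in the FSA, real-valued in the RNN), which is exactly the structural similarity explored in the tutorial.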

RNNs

Recurrent neural language models represented the state of the art until only a few years ago. Interestingly, their sequential nature (which is where the name recurrent comes from) makes them very similar to sequential models of computation such as finite-state automata. In this section, we explore the connection between RNNs and formal languages in more detail: we will see how close the correspondence between RNNs and finite-state automata really is, and how it can help us understand the limitations of RNNs.
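As a rough sketch of how this sequential computation defines a language model (the parameter names and sizes below are illustrative and untrained, not taken from the tutorial), an Elman RNN updates its hidden state once per symbol and, at each step, induces a distribution over the next symbol; the probability of a whole string is the product of these next-symbol probabilities.

```python
import numpy as np

# A minimal Elman RNN language model sketch (illustrative, untrained parameters).
alphabet = ["a", "b", "<eos>"]
d = 4
rng = np.random.default_rng(1)
W_h = rng.normal(size=(d, d)) * 0.5                  # hidden-to-hidden weights
W_x = rng.normal(size=(d, len(alphabet))) * 0.5      # input-to-hidden weights
E = np.eye(len(alphabet))                            # one-hot symbol embeddings
U = rng.normal(size=(len(alphabet), d))              # output (un-embedding) matrix

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def string_probability(string):
    """p(string) = prod_t p(symbol_t | symbols before t), ending with <eos>."""
    h = np.zeros(d)
    p = 1.0
    for symbol in list(string) + ["<eos>"]:
        next_dist = softmax(U @ h)                   # distribution over the next symbol
        p *= next_dist[alphabet.index(symbol)]       # probability of the observed symbol
        h = np.tanh(W_h @ h + W_x @ E[alphabet.index(symbol)])  # recurrent update
    return p

print(string_probability("ab"))   # probability this toy model assigns to the string "ab"
```

A probabilistic finite-state automaton assigns string probabilities in the same left-to-right fashion, with its finite state playing the role of the hidden vector h.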



Transformers

Transformers are the undisputed state-of-the-art architecture for neural language models today. They power models like GPT-3, BERT, and many others. Understandably, they have also attracted a lot of attention from the formal language theory community. However, the connection between transformers and formal languages is not as straightforward as it is for RNNs. We will describe transformers both as classifiers, which we connect to circuits and formal logic, and as generative models, which we connect to finite-state automata and Turing machines.
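To give a flavor of the classifier view (the following is our own illustrative sketch with made-up, untrained weights, not a construction from the literature), a single attention layer reads the whole string in parallel, unlike the step-by-step RNN, and a pooled read-out turns the result into an accept/reject decision; it is this kind of fixed-depth, parallel computation that invites the comparison with circuits and formal logic.

```python
import numpy as np

# Minimal single-layer, single-head transformer "classifier" sketch
# (illustrative sizes, untrained random weights; positional encodings omitted for brevity).
alphabet = {"a": 0, "b": 1}
d = 4
rng = np.random.default_rng(2)
W_emb = rng.normal(size=(d, len(alphabet)))          # symbol embeddings
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
w_out = rng.normal(size=d)                           # linear read-out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def transformer_classify(string):
    # Unlike the RNN, the whole string is embedded and attended to at once.
    X = np.stack([W_emb[:, alphabet[s]] for s in string])   # (n, d) symbol embeddings
    Q, K, V = X @ W_q.T, X @ W_k.T, X @ W_v.T
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)              # (n, n) attention weights
    H = A @ V                                                # contextualized representations
    score = w_out @ H.mean(axis=0)                           # pool and read out a score
    return bool(score > 0)                                   # accept / reject the string

print(transformer_classify("abba"))
```

The generative view instead considers transformers that produce strings one symbol at a time, which is the setting in which we make the connections to finite-state automata and Turing machines.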