# Reimagining Language Learning with NLP and Reinforcement Learning

The way we learn natural languages hasn’t really changed for decades. We now have beautiful apps like Duolingo and Spaced Repetition software like Anki, but I’m talking about our fundamental approach. We still follow pre-defined curricula, and do essentially random exercises. Learning isn’t personalized, and learning isn’t driven by data. And I think there’s a big opportunity to change that. With the unlimited supply of natural language data online, and with the advances in Natural Language Processing (NLP) techniques, shouldn’t we be able to do something smarter? Here’s what I’m thinking.

### The foundation: Modeling Knowledge

At the heart of making learning more efficient is the ability to model a learner’s knowledge. Once you understand what a learner knows you can present her with material that’s most beneficial. Modeling knowledge in general is a difficult problem.  How would you quantify your knowledge about ancient Rome, English literature or mechanics? Knowledge in most disciplines is based on connecting disparate facts and then and reasoning about them in one way or another. Language learning is different, and it’s unique in that it’s quite simple. Comprehending a sentence doesn’t require higher level reasoning, and we can actually measure a learner’s knowledge by presenting her with the right challenges, such as sentence comprehension or completion.

We also need to model language itself. In order to present a learner with a sentence she can comprehend we must know which knowledge (vocabulary, grammar, etc) that sentence depends on. In a way that’s what courses do “manually”.  They present you with a predefined sequence of material that builds on top of each other. I believe we can do this automatically. NLP techniques are sufficiently sophisticated that we should be able to figure out the knowledge dependencies of a text. And that would open up a whole new world of possibilities.

### A mathematical formulation

To make things concrete, let’s actually try to define the above mathematically. What follows is invariably an oversimplification of language learning, but I think it’s a useful enough model to do something interesting with. Let’s assume a learner’s language knowledge can be quantified by how well she knows vocabulary and grammar items. I’m not saying this is the right, or the only definition, but it’s something has worked quite well in practice. It’s what most courses and textbooks do.

A learner’s knowledge is defined by a state $s$, which captures our belief about what the learner knows. For example, $s$ could be a sparse vector of real numbers where each element (e.g. 0.73) is a score quantifying how well a learner knows a word or grammar rule. The score could be calculated based on the learner’s performance on reading/listening comprehension and writing/speaking production tasks. Note that $s$ models our belief about the learner’s knowledge, not necessary the actual state of the world. Thus it would probably be a good idea to also include uncertainty (in the form of confidence bounds or distributions) in the representation above. But it’s easier to think about $s$ as just a vector of scores.

We can perform actions $a \in A$ to modify a learner’s knowledge. Actions could include vocabulary reviews, sentence comprehension tasks, or grammar exercises. Just think about what textbooks do. All of these actions have an effect on $s$.  They could increase or decrease the scores based on how well the learner did (or could change the uncertainty about our beliefs). In other words, if a learner is in state $s_t$ at time $t$, then an action $a \in A$ will transition her to a new state $s_{t+1}$. The number of possible states is obviously huge, or infinite.

This now starts to look a bit like a Markov Decision Process (MDP), except that we don’t have uncertainty in our state transition, and that we haven’t defined a reward function.

### Learning towards a specific goal

Most approaches ignore the fact that students have different motivations for learning a language. That’s clearly a mistake. The knowledge required to understand your favorite TV drama is different from the knowledge required to comprehend scientific journals. Obviously there is a lot of overlap, but taking a class focused on daily conversation probably isn’t the fastest way towards reading academic literature. With the ability to model knowledge on a fine-grained level we can have truly personalized learning.

Let’s assume the learner’s goal is to understand a certain text, an online article or Youtube video for example. Because we know the knowledge dependencies of that text we know which target states $s^t$ would allow the learner to comprehend it (with high probability at least). Our goal is to find a policy $\pi(s_t)$ that tells us which actions to take at any given point in time in order to reach some target state as quickly as possible. The policy tells us the stochastically optimal path towards a learner’s goal. In an MDP, the policy is defined as maximizing the sum of rewards from some reward function $R_a(s, s')$, and by defining that function in the right way we can solve the problem of finding an optimal policy using Reinforcement Learning techniques.

This task is challenging due to several reasons. The state space is infinite and actions have stochastic results. We can never explicitly model the whole space. We may also need to trade immediate rewards for long term rewards. For example, instead of learning a complicated term that frequently appears in the target text it may be better to learn a common word that has a low frequency in the target text but makes more actions available to the learner in the future. Luckily, all these are well-known problems that have been solved in one way or another.

### Picking the right actions

A key challenge in language learning is to present the learner with material that is neither too difficult nor too easy, or the learner will become frustrated or bored, respectively. In text comprehension there is research that shows that one unknown word for about 50 known words is a good ratio to encourage learning. Learning vocabulary from context is generally more effective than rode memorization because it forces the brain to make connections to things you already know. If we can accurately model the knowledge of a learner and the knowledge dependencies of text, then this task becomes trivial. We could find articles, social media posts or other content that are just right for the learner’s current level and create actions based on them. And of course, by presenting such material to the learner we would refine our model of what the learner actually knows. In order words, the set of actions available at a state $s_t$ should be limited  to those actions that are appropriate for a learner at that stage.

### Data Network Effects

The more actions a learner performs the more accurately we will be able model his knowledge, and the more confident we can be in presenting him with the right actions. But that’s not all. As more learners are performing actions we can become more certain about how actions affect a learner’s state, essentially answering the question: Which material is most effective for a learner with a certain background knowledge and goal? This not only allows for making optimal recommendations about what a learner should do next, but may even provide insights about how people learn in general.

These are just some examples of things we can do with a more analytical approach to language learning, but it’s already pretty exciting.