Week 2 talk summary

Duolingo: Improving Language Learning and Assessment with AI, By Burr Settles

Last updated on Sep 17, 2019

AI will not replace great teachers, but it can give more people access to them. Duolingo sees its mission as scaling quality education to everyone who needs it, using AI and machine learning. To do so, they use student modeling, which is personalized for each student and adapts over time. All of their models build on two ideas: spaced repetition and the forgetting curve.

In the beginning, they used the Leitner system, which keeps five boxes of cards for each student, reviewed at intervals of 1, 2, 4, 8, and 16 days. Correctly answered cards advance to the next, less frequently reviewed box, while incorrectly answered cards return to the first box for more aggressive review and repetition.
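The box scheme described above can be sketched in a few lines of Python. This is only an illustration, not Duolingo's implementation: the `Card` class, the `review` function, and the choice to keep a card in the last box once it gets there are all assumptions made for the example.

```python
from dataclasses import dataclass

# Review intervals in days for the five boxes, as listed in the talk summary.
INTERVALS_DAYS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    word: str
    box: int = 0  # index into INTERVALS_DAYS; 0 is the most frequently reviewed box


def review(card: Card, correct: bool) -> int:
    """Move the card to its new box and return the days until its next review."""
    if correct:
        # Assumption: a card already in the last box simply stays there.
        card.box = min(card.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        # A wrong answer sends the card back to the first box.
        card.box = 0
    return INTERVALS_DAYS[card.box]
```

Each correct answer doubles the wait before the next review, while a single mistake resets it to one day, which is what makes the review of forgotten words "aggressive".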

When Dr. Settles joined Duolingo, the team began using machine learning to model these assumptions and introduced the half-life regression model. As presented, this is essentially a regression model in the style of logistic regression: it takes the number of times a word was answered correctly, the number of times it was answered incorrectly, and the word itself as input features, treats the half-life of forgetting as a latent variable, and predicts the probability of recall at a given time after the last practice. The engagement-based user experiment shows that the word feature helps a little, while the half-life term helps a lot.
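To make the role of the half-life concrete, here is a minimal sketch of the prediction step. It assumes the commonly cited form of the model, where the recall probability is p = 2^(-Δ/h) and the half-life is h = 2^(θ·x); the talk summary itself does not spell out the formula. The function names, feature names, and weights below are hypothetical, and fitting the weights is left out.

```python
def half_life(weights: dict, features: dict) -> float:
    """Estimated half-life in days, assuming h = 2 ** (weights . features)."""
    score = sum(weights[name] * value for name, value in features.items())
    return 2.0 ** score


def recall_probability(days_since_practice: float, h: float) -> float:
    """Forgetting curve, assuming p = 2 ** (-delta / h)."""
    return 2.0 ** (-days_since_practice / h)


# Hypothetical weights and practice history, for illustration only.
weights = {"bias": 0.0, "n_correct": 1.0, "n_incorrect": -1.0}
features = {"bias": 1.0, "n_correct": 2.0, "n_incorrect": 1.0}

h = half_life(weights, features)
p = recall_probability(days_since_practice=h, h=h)
```

By construction, the predicted recall is exactly one half when the time since the last practice equals the half-life, and it approaches 1 right after practice. More correct answers raise the half-life and more incorrect answers lower it, given weights with the signs used here.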

All of the models above used the strength metric. To better model the student, they switched to the crown metric, which focuses on leveling up difficulty instead of repeatedly reviewing earlier words. To model students before they start, they introduced the Placement Test, which estimates a student's fluency level based on the Common European Framework of Reference (CEFR). They also built a model to measure the difficulty of each word, which is publicly available at https://cefr.duolingo.com/.

This was a very interesting talk for me because Dr. Settles showed how to adapt machine learning and NLP techniques to a specific problem and achieve real improvements. The word difficulty model is related to my labmate's research topic. It is great that they integrated active learning into it, which I think is a more practical way to use machine learning in industry. However, one problem I see is that they always use engagement as the metric in A/B tests. I know that engagement is the key metric for a company that makes money from ads, but a test without any actual language learning metric cannot reflect how much the product really helps its users.

AI Language Learning

Zhimeng Luo

PhD Student

My research interests include Natural Language Processing, Machine Learning, and Knowledge Representation.