In 2023, AI researchers at Meta interviewed 34 native Spanish and Mandarin speakers who lived in the US but didn’t speak English. The goal was to find out what people who constantly rely on translation in their day-to-day activities expect from an AI translation tool. What those participants wanted was basically a Star Trek universal translator or the Babel Fish from the Hitchhiker’s Guide to the Galaxy: an AI that could not only translate speech to speech in real time across multiple languages, but also preserve their voice, tone, mannerisms, and emotions. So, Meta assembled a team of over 50 people and got busy building it.
What this team came up with was a next-gen translation system called Seamless. The first building block of this system is described in Wednesday’s issue of Nature; it can translate speech among 36 different languages.
Language data problems
AI translation systems today are mostly focused on text, because huge amounts of text are available in a wide range of languages thanks to digitization and the Internet. Institutions like the United Nations or European Parliament routinely translate all their proceedings into the languages of all their member states, which means there are enormous databases comprising aligned documents prepared by professional human translators. You just needed to feed those huge, aligned text corpora into neural nets (or hidden Markov models before neural nets became all the rage) and you ended up with a reasonably good machine translation system. But there were two problems with that.