The goal: a single machine learning model that can parse and understand input from many languages. The use case: people interacting with Alexa in their native tongue (among other commercial applications).
On April 20, 2022, Amazon announced three developments toward reaching that goal under the banner MMNLU-22, short for Massively Multilingual Natural Language Understanding (massively multilingual NLU).
The three developments are the release of a dataset with one million labeled utterances in 51 languages, along with open-source code; a competition using that dataset (deadline: June 1, 2022); and a workshop co-located with EMNLP 2022, one of the world’s biggest natural language processing conferences (Abu Dhabi, December 7–11, 2022).
Amazon called the dataset MASSIVE, for Multilingual Amazon SLU Resource Package (SLURP) for Slot-filling, Intent classification, and Virtual-assistant Evaluation. The dataset comes with examples of how to perform MMNLU modeling so others can reproduce the baseline results for two critical NLU tasks, intent classification and slot filling, as described in the SLURP paper.
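To make the two tasks concrete, here is a minimal sketch of what a labeled virtual-assistant utterance looks like and how intent and slot labels might be read off it. The record fields and the inline `[slot : value]` annotation format below are illustrative assumptions in the spirit of SLURP-style data, not the exact MASSIVE schema:

```python
import re

# Hypothetical record in the style of a SLURP/MASSIVE labeled utterance.
# Field names and the [slot_type : value] markup are assumptions for
# illustration, not the dataset's exact schema.
record = {
    "locale": "en-US",
    "intent": "alarm_set",  # intent classification label for the whole utterance
    "annot_utt": "wake me up at [time : nine am] on [date : friday]",
}

def parse_slots(annot_utt):
    """Extract (slot_type, value) pairs and recover the plain utterance text."""
    # Slot filling: pull out each [slot_type : value] span.
    slots = re.findall(r"\[\s*([^:\]]+?)\s*:\s*([^\]]+?)\s*\]", annot_utt)
    # Strip the markup, keeping only the slot values, to get the raw utterance.
    plain = re.sub(r"\[\s*[^:\]]+?\s*:\s*([^\]]+?)\s*\]", r"\1", annot_utt)
    return slots, plain

slots, text = parse_slots(record["annot_utt"])
print(record["intent"])  # alarm_set
print(slots)             # [('time', 'nine am'), ('date', 'friday')]
print(text)              # wake me up at nine am on friday
```

A model trained on such data learns both jobs at once: classify the utterance-level intent (`alarm_set`) and tag the token spans that fill each slot (`time`, `date`).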
NLU is a sub-discipline of natural language processing (NLP). Amazon said it is focusing on NLU as a component of spoken-language understanding (SLU), in which audio is converted into text before NLU is performed. Alexa is one example of an SLU-based virtual assistant.
The MASSIVE dataset comprises “one million realistic, parallel, labeled virtual-assistant text utterances spanning 51 languages, 18 domains, 60 intents, and 55 slots.”
Amazon created the dataset “by tasking professional translators to localize or translate the English-only SLURP dataset into 50 typologically diverse languages from 29 genera, including low-resource languages.”
Amazon is, in essence, trying to overcome a major obstacle for SLU-based virtual assistants such as Alexa: academic and industrial NLU R&D is still limited to a handful of languages.
“One difficulty in creating massively multilingual NLU models is the lack of labeled data for training and evaluation — particularly data that is realistic for a given task and natural for a given language. High naturalness typically requires human vetting, which is often costly.”
Hence, R&D is “limited to a small subset of the world’s 7,000+ languages,” Amazon pointed out. “By learning a shared data representation that spans languages, the model can transfer knowledge from languages with abundant training data to those in which training data is scarce.”
Hinting at where it hopes to apply these latest developments commercially, Amazon noted that of the more than 100 million smart speakers sold worldwide (e.g., Echo), most use a voice interface exclusively and rely on NLU to function. The company estimated that the number of virtual assistants will reach eight billion by 2023, and most will be on smartphones.