On June 29, 2021, video-conferencing juggernaut Zoom announced on its blog that it was acquiring German simultaneous speech translation provider Karlsruhe Information Technology Solutions (kites GmbH or, as Zoom now spells it, “Kites”). Karlsruhe is a city in southwestern Germany.
Zoom’s short M&A track record is, naturally, not attributable to a lack of resources. In addition to its USD 100bn-plus market cap, Zoom sits on a USD 4.2bn cash pile, which, as CFO Kelly Steckelberg said in March 2021, the company intends to use for acquisitions: “There are a lot of innovative companies out there that might be the right match for us. We haven’t found the right one.” Now, it seems, they have, in machine translation (MT).
Founded in 2015 by Alex Waibel and Sebastian Stüker, members of the faculty at Karlsruhe Institute of Technology (KIT), Kites has “56 years of collective AI & Speech Processing R&D experience” under its belt, according to its website. Its unique value proposition (UVP) is “real-time” speech translation.
It is this focus that apparently motivated the purchase. Zoom said that Kites’ team of 12 research scientists will work with the company’s engineers to “advance the field of MT” and provide Zoom users with “multi-language translation.” In short, Zoom users will be able to enjoy multilingual in-app speech translation in the future.
The same blog post said that Stüker and team will continue to work out of Karlsruhe “where Zoom looks forward to investing in growing the team.” The company added that it is “exploring opening an R&D center in Germany in the future.” Waibel, meanwhile, will be “a Zoom Research Fellow, a role in which he will advise on Zoom’s MT research and development.”
Speech-to-speech translation (S2ST) has undoubtedly been a hot field of research in recent years. Most S2ST systems work as a cascade: they convert spoken words into text, use MT to translate that text, and then convert the MT output back into speech. Google, for one, has attempted to bypass this intermediate text-translation step (SlatorPro).
Current state-of-the-art S2ST is still laden with issues, from latency to limited domain coverage (i.e., it is only usable for basic topics), but this may change if more resource-rich giants like Zoom get behind it.
As for Kites, it has flown under the radar as far as language industry startups go, which is not unusual in the German startup scene. But its UVP squares neatly with Zoom’s, and the acquisition signals a significant push into true speech-to-speech translation and, more broadly, automated interpreting.
While this may pose no immediate threat to well-funded remote simultaneous interpreting (RSI) startups such as KUDO, Interprefy, or Boostlingo (the latter is even part of the Zoom App Marketplace), the Zoom-Kites deal is still a major step in automation’s longstanding challenge to human interpreting.