Job-seekers, take note: Tech giant Apple is looking for “creative producers for voice.” The broad job title reflects the range of responsibilities each person will handle within an interesting department, Apple’s Text-to-Speech Design Studio.
Voice producers will join research engineers, annotation analysts, and language engineers to create and oversee the synthesized voices used in a growing number of customer service areas. While new hires will work on expanding language options offered by Apple’s virtual assistant, Siri, the introduction of this role in a text-to-speech context indicates the company’s drive to productize synthetic voice research beyond Siri.
One potential application: dubbing. Since March 2020, Covid has disrupted traditionally conservative industry supply chains, compelling clients and providers to accelerate innovation in areas such as cloud dubbing. While cloud dubbing is an important step toward a more digitized dubbing supply chain, synthesizing voices (a.k.a. synthetic dubbing) may be the next frontier in terms of automation.
Apple’s work on synthesizing voices could eventually be incorporated into an in-house dubbing process around original content for Apple’s entertainment offering, Apple TV+. Apple has reportedly budgeted billions of dollars for the streaming service.
Another investment that might pave the way for other use cases is Apple’s 2020 acquisition of Voysis, an AI startup whose platform enables retailers to add voice to their websites and mobile apps.
Apple’s recent research suggests other possible directions for voice producers’ work, namely speech-to-speech translation and bilingual text-to-speech, in which a monolingual voice is “taught” to speak a second language.
Naturally, competitors are also hard at work on voice synthesis research, right up to aspirational improvements to lip movement sync — that is, matching a speaker’s lip movements to translated audio (e.g., Synthesia).
In 2019, Google debuted Translatotron, a proof-of-concept, speech-to-speech translation system that skips the traditional text translation step. A November 2020 paper on Google’s work with AI lab DeepMind introduced a system for “large-scale multilingual audiovisual dubbing.” Amazon, meanwhile, explored automated English to Italian dubbing in a January 2020 study.
Since slots for voice producers are open in several languages, it stands to reason that Apple may have set its sights on adding new language combinations or voice “styles” to Siri’s repertoire. Past voice producer postings sought speakers of Arabic, French, Russian, Spanish, and Turkish. (Siri is currently available in 21 languages, although its neural text-to-speech feature is an option in fewer locales.)
In some ways, the ideal creative producer for voice is a jack of all trades. They will hire and manage personnel, such as talent coaches and script supervisors, and coordinate with a variety of colleagues, including writers, translators, engineers, and marketing experts. Voice producers will also be expected to master a number of tools for specific technical work: tracking production progress, validating and improving dialogue translations, and verifying and correcting pronunciation.
Who might fit the bill for such a multifaceted role? Apple is especially interested in hearing from professionals with experience directing audio and video productions; previous relationships with production and post-production houses; knowledge of phonetics and linguistics; and, depending on the specific job post, native fluency in Cantonese, Italian, Japanese, or Mandarin.