
Synthesia CEO Victor Riparbelli on Personalized Video in 40-Plus Languages

Synthesia CEO and Co-founder Victor Riparbelli joins SlatorPod to discuss the company’s approach to operating and developing the world’s first and largest platform for video AI generation.

Victor talks about Synthesia’s journey and the rapid progression of media technology, where video content demand now outpaces the speed of production. He unpacks the role of academia in the company, where PhDs and professors make up nearly 50% of staff.

The CEO goes over the evolution of text-to-speech in the last decade, from the appearance of deep learning and voice-cloning to multi-dimensional speech with emotions, pitch, and style. He also discusses the difficulty of extending voice into multiple languages when there is no data to support the neural network.


Victor reviews the success of video content over text and how this ties into working with global companies, which not only want to train and communicate with their employees but also improve the customer journey. He shares how he sees the localization and translation industries as partners and an integral part of creating multilingual content.

Victor talks about Synthesia’s funding rounds and shares the story behind connecting with their first investor, Mark Cuban, after struggling to find funding. He gives advice on finding the right type of investor and what to expect from venture capitalists unfamiliar with the tech space.

The podcast wraps up with Victor’s view on deepfakes and how they affect the company’s approach to harmful content, as well as the company’s vision of creating more storytelling content rather than informative content.

First up, Florian and Esther go through the poll results from May 21, where respondents weighed in on Translation as a Subscription, with only 12% thinking of it as “the future.” The duo discuss the pros and cons of the subscription model and reference the Pricing and Procurement report, which highlights its simplicity and predictability.

Florian talks about KUDO’s latest public relations win as billionaire investor Bill Ackman tweeted about using the multilingual conferencing platform for an investor presentation. Ackman used KUDO to run the presentation in 11 languages, where he announced buying 10% of Universal Music Group from Vivendi SE.

For the third week in a row, RWS pops up in language industry news as it partners with speech recognition system CEDAT85 to launch a live subtitling and captioning solution for online meetings and events. Esther touches on possible competitors in the space, such as Ai-Media, Redbee, and Verbit.

Florian discusses SwissText’s 2021 conference competition, which Microsoft won with its approach to recognizing Swiss German speech and translating it into Standard German text.

Subscribe to SlatorPod on YouTube, Apple Podcasts, Spotify, and Google Podcasts.

Stream Slator webinars, workshops, and conferences on the Slator Video-on-Demand channel.





AI Video Startup Synthesia Raises USD 12.5m for Multilingual Avatars

On April 20, 2021, AI video generation provider Synthesia announced they had raised USD 12.5m in Series A funding. The funds will be used to focus on enterprise user growth and product development, Synthesia Co-founder and CEO, Victor Riparbelli, told Slator. 

Riparbelli declined to share the company’s valuation, but said their SaaS product, Synthesia STUDIO, has received an enthusiastic reception since it was launched six months ago.

According to Riparbelli, the UK-based startup currently has “thousands of customers in 40 countries, both S&P 500 and individual creators” and has generated more than a million videos for clients since the business started in 2018.


The Series A round was led by New York-based FirstMark Capital, an early-stage VC firm with investments in companies such as Riot Games, Airbnb, and Shopify. Synthesia said their USD 12.5m round, which included all existing investors and two new angel investors, is the largest investment in the AI video space to date.

Synthesia also raised USD 3.1m in seed money from a round led by LDV Capital and entrepreneur Mark Cuban in 2019.

Avatars Saying Anything in Any Language

FirstMark Managing Director, Matt Turck, blogged about the investment on his website and detailed Synthesia’s approach to video generation, explaining that Synthesia greatly simplifies creating a business video and offers “a compelling text to video experience.”

Synthesia uses AI to create and customize avatars from a library of (real, human) actors as well as synthetic characters. The avatars are lines of code — they can be told to “say anything, in any language, opening the door to mass customization of video at scale,” Turck wrote in his blog post. The actors also receive payment when their likeness is used by a customer.

Asked about the main use cases for Synthesia’s offering, Riparbelli said that corporate communications, digital video marketing, and advertising localization are the main areas of focus, adding that they “also see great opportunities to partner with S&P 500 companies on their training [e.g., e-learning] needs.”

Another emerging trend is that of personalization, he said, pointing to an online video campaign Synthesia worked on for Lay’s crisps, entitled Messi Messages. The campaign, which features an avatar of footballer Lionel Messi, lets users select from different message options to have Messi’s avatar deliver a personalized invitation to watch a game.

“Since they are just code, the avatars can say anything, in any language, opening the door to mass customization of video at scale” — Matt Turck, Managing Director, FirstMark Capital

Riparbelli said that for the Messi project, “all we needed was five minutes of training footage of him speaking to the camera.” Synthesia’s algorithms learn from existing footage of the actors. So this same technique can be applied to a company exec for a corporate communications video, for example. 

And where does the localization and multilingual element come in? According to Riparbelli, “it is absolutely key.” Synthesia’s clients use their multilingual capabilities every day, he said, and the “feedback from clients is that being able to communicate in video and in 40 languages has been a game changer.”

From a language technology perspective, Synthesia does not appear to have developed any specific capabilities internally (e.g., machine translation, speech recognition, or synthetic voices).

“We are focusing on improving the experience of synthetic video for now, with a specific focus on how to create personalized videos at scale,” Riparbelli told Slator.

The Real Competition is “Boring PDFs”

Asked about the company’s relationship with dubbing studios and media localization providers, Riparbelli said Synthesia sees them as partners rather than competitors, and they “have many as customers.”

He added, “They are building services using Synthesia. And we also use translation partners to create scripts in 40 languages for our corporate clients.” As for Synthesia’s true rivals, “our real competition is boring PDFs that nobody reads,” Riparbelli quipped.

The CEO also provided an update on Synthesia’s 2019 goal of dubbing their first feature film within a couple of years, saying, “We still absolutely believe this to be true. In 10 years, anybody will be able to create a Hollywood-grade movie from their laptop. Cameras will be replaced by code.”

“Our real competition is boring PDFs that nobody reads” — Victor Riparbelli, Co-founder and CEO, Synthesia

He further stated: “Our mission is to reduce the entire video production process of film crews, studios, actors and cameras to a single API call. As the platform advances, our long-term vision is to make it possible for anyone to create a completely synthetic Hollywood film from their bedroom, without the need for anything else than a laptop.”

In the meantime, Synthesia is focusing on what they identify as an explosion in video adoption, which accelerated during Covid-19. According to Riparbelli, “current methods of production don’t scale.” Therefore, rather than attempting to disrupt the existing premium niche of media and entertainment dubbing (at least in the immediate future), Synthesia sees its sweet spot as catering at scale to the expanding video production market.

This is a thesis echoed by speech translation startup Papercup, which has raised USD 14m to date. Joining as a guest on SlatorPod, Papercup CEO Jesse Shemen told Slator: “We are not in this game to try and replace the dubbing industry. I am fine with it existing, by all means, but there are literally billions of hours of content that are untouched because they cannot necessarily afford the traditional method of localizing.”

For more on multilingual video production and synthetic voices, check out Papercup’s Jesse Shemen and Simon King discussing their AI dubbing and synthetic voices venture on SlatorPod.

