What is text-to-speech?
Digital audio is now everywhere, including in text content and newspapers, where it turns readers into listeners. This creates a new opportunity to monetize content in an environment where listeners are receptive and engaged.
According to a MarketsandMarkets report published in January 2021, the text-to-speech (speech synthesis) market was valued at US$2 billion in 2020, and is expected to reach US$5 billion by 2026.
Its main growth drivers are the growing demand for mobile devices, increased public spending on education for the disabled and the elderly, and new ways of reading and learning.
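As a rough check on the figures quoted above, the annual growth rate they imply can be worked out with the standard compound annual growth rate formula. This is a minimal sketch: the function name `cagr` is only illustrative, and the six-year span (2020 to 2026) is an assumption about how the forecast period is counted, which the report itself may define differently.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: US$2 billion in 2020, US$5 billion expected by 2026.
# Assumption: the growth is spread evenly over the six years in between.
rate = cagr(2.0, 5.0, 2026 - 2020)
print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions, the market would need to grow by roughly 16.5% a year to go from US$2 billion to US$5 billion.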
In the United States, nearly 200 million people have been converted to digital listening, and text-to-audio conversion technology has played a major role in this shift. Many players have established themselves in this market.
From reading to listening to the press
For many years now, newspapers and magazines have seen their advertising revenues shift to digital, while their legacy print business model has continued to suffer from the transition as mobile devices have emerged. Text-to-speech brings a new dimension to text content and adds value for readers who are increasingly on the move.
For the press, audio represents a new growth driver: on average, listeners stay engaged three times longer than readers. A British study conducted by the Publishers Association indicates that, for the majority of respondents, listening to a book or article is more immersive and intimate than reading it. Digital audio advertising fits naturally into this context, since it is delivered in an environment of proximity and greater receptivity.
Digital audio advertising is growing rapidly, confirms Audiate.Me, a company specialising in converting text into audio content using a simple widget. Last year, the company saw a 60% jump in programmatic sales in digital audio.
One of the pioneers in the field of speech synthesis is Trinity Audio, a Tel Aviv-based company founded in 2017. Its mission, according to its CEO Ron Jaworski, is “to audify the Internet” (literally, to give the Internet a voice). Using text-to-speech technology, it serves the three pillars of the sector: publishers, readers and advertisers.
From text to podcast
Another innovative player in the field, Remixd, based in Washington DC, turns text content into podcasts. Its technology enables brands either to create a podcast presence on the main platforms or to enrich an existing one, without any additional development work. The company specialises in premium content and is used by prestigious brands such as Sports Illustrated, The Verge, Pop Sugar, The New Yorker, People and Thrillist.
2021 – the audio boom continues
In the field of text-to-speech, 2020 will be seen as an extraordinary catalyst: successive lockdowns have only increased the demand for audio. The publishing world very quickly understood the changes taking place in media consumption, adding audio versions of press articles en masse.
For publishers, text-to-speech is an artificial intelligence technology that is simple to implement, inexpensive and scalable, and that allows audio advertisements to be inserted into content. Listeners are more receptive in this setting: they can keep abreast of the latest news whatever else they are doing at the same time.
Trinity Audio has, for the first time, published an interesting report on listener engagement with digital audio and native audio advertising, from which some lessons can already be learned:
- The total listen-through rate (LTR) for audio content is 59%.
- Audio commercials are well accepted, with an LTR of 91%.
- Listeners prefer longer content: an LTR of 70% for content lasting more than 5 minutes, compared with less than 60% for shorter content.