The Golden Age of Audio 👑

Friday 28th May, 2021

Thought for the week 💭

The Golden Age of Audio

[3-minute read]

This is the golden age of audio. Since 2017, there has been an explosion in the adoption of ‘connected audio’ hardware. This has made it possible (and convenient) for 300m+ people to listen to an endless library of audio content anytime and anywhere. Simultaneously, there has been a wave of new audio content and new audio-first platforms.

Yet, despite the growth in audio consumption over the past 5 years, I believe that there are still major untapped opportunities for innovation in audio content and audio platforms. Relative to video technology, audio has not seen anywhere near the same level of innovation and investment. This is a prime opportunity for startups to address.

Why audio is awesome

1// It’s effective — it is all-consuming. Putting your headphones in to listen to a podcast or live stream is a highly intentional and intimate act. Information penetrates. There are fewer distractions relative to video (which consists of visual elements that require a user’s attention). A recent report by IAB indicated that 67% of listeners could recall products/brands featured in adverts and 61% bought the item advertised. This leads to high advertising ROI. See for yourself… how many of these audio logos can you identify?

2// It’s versatile — Audio can be consumed passively as background music. It can also be consumed actively with intense focus requirements. It can be incredibly information-dense, making audio a highly efficient way to learn about a topic. It can also be extremely light - making it perfect for laid-back listening. Audio is the ultimate content chameleon.

3// Lower barriers to content creation — High-quality audio content is significantly lower in cost to produce than comparable video content. It is also lower friction. Many people may not feel comfortable appearing on camera to record a video. However, more people are generally comfortable making a voice-only appearance.

4// Audio is better than text at capturing meaning - A widely cited (though contested) estimate is that between 70% and 90% of communication is non-verbal (i.e. the meaning is not captured by word choice alone). Tone of voice is incredibly important in expressing the substance of a conversation. Have you ever felt that a WhatsApp message from a friend or a Slack message from a colleague was unnecessarily hostile? That’s because text sucks at conveying emotions. Alternatively, it might just be that everyone hates you…

Why is now the time to build an audio startup?

35% of US and UK households now own a smart speaker. It is estimated that >250 million wireless earbuds have been sold since they came onto the market in ~2017, and this is expected to reach 600m units by the end of 2022. Meanwhile, the global auto industry is projected to ship 76 million connected cars by 2023.

The implication of these big numbers is that, for the first time in history, people are able to comfortably and seamlessly access a vast library of audio-on-demand from anywhere and at any time: while they commute, exercise, cook, clean and fall asleep.

The unique benefits of audio as a content type, combined with the emerging ability to consume audio anywhere without friction, create the ideal demand-side environment for audio consumption.

What innovations I’m looking for

1// Augmented audio

By augmented audio, I mean audio that is enhanced by adding an additional layer of information or media to it. For example, when I listen to a podcast or a live stream, I should be able to pick up my phone and see a visual stream of the contextual information that is being discussed on the audio. If I hear Joe Rogan discussing a medical marijuana startup - I should be able to pick up my phone and see information about the company pulled from Crunchbase. If I’m listening to a New Scientist podcast about quantum computers, I should have the ability to access an explanation of the topics that are being discussed - tailored to my knowledge level. The possibilities are endless. Audio should not exist in a vacuum.

Interesting companies operating in this space → Entale, Giide (SG Portfolio).

2// Better search

Text is easy to search. Audio is hard. If someone says something interesting about Bengal cats in a podcast, currently, it is very difficult to unearth this information (assuming it is not in the title or description of the podcast). In order to achieve better search, we need more widespread transcription of audio content. Additionally, we need to leverage semantic search - a new and constantly improving area of NLP that enables searching bodies of text using natural language meaning rather than relying solely on keywords.
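To make the idea concrete, here is a minimal sketch of searching inside transcribed audio, assuming each episode has already been transcribed into timestamped segments. The segment data, the `search` function and its parameters are all hypothetical. The scoring uses plain word-count cosine similarity as a stand-in: a genuinely semantic system would replace the word counts with sentence embeddings from an NLP model, but the shape of the pipeline (transcribe, index segments, rank against a natural-language query, jump to the timestamp) would likely look similar.

```python
import math
import re
from collections import Counter

# Hypothetical transcript segments: (episode, start time in seconds, text).
SEGMENTS = [
    ("Pet Talk #12", 95, "Bengal cats are energetic and love to climb"),
    ("Pet Talk #12", 340, "We then discuss grooming for long haired dogs"),
    ("Tech Weekly #3", 60, "Quantum computers use qubits instead of bits"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, segments, top_k=1):
    """Return up to top_k (score, episode, start_seconds) matches, best first."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), episode, start)
        for episode, start, text in segments
    ]
    # Drop segments that share no words with the query, then rank by score.
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


for score, episode, start in search("Bengal cats", SEGMENTS):
    print(f"{episode} at {start}s (score {score:.2f})")
```

The useful part is the timestamp: a hit takes the listener straight to the moment in the episode where the topic comes up, even though the title and description never mention it.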

3// Insight extraction

A huge amount of information can be captured in a 30-second audio clip. Machine learning is rapidly approaching a level at which key insights (e.g. suggested actions, people mentioned) can be extracted from a text transcription of an audio clip. In my view, adding this layer of deep learning to audio would make audio the most efficient way for companies to communicate internally (replacing emails).

4// Adaptive

Why do podcasts have to exist as separate units? Why can’t I search terms like “micropayments, cryptocurrency” and get automatically generated audio that combines the best snippets on these topics? I should be able to choose my allocated time horizon (e.g. 15 minutes, 1 hour) and the audio content should adapt to that length.
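As a rough illustration of what "adaptive" could mean in practice, the sketch below assembles a mix from a library of pre-tagged snippets so that it fits a chosen time budget. Everything here is an assumption for illustration: the `Snippet` fields, the `score` quality rating and the `build_mix` function are hypothetical, and a real product would need to generate the topic tags and quality scores itself. The selection is a simple greedy pass (best-rated first, skip anything that no longer fits), not an optimal packing.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    title: str
    topics: set
    seconds: int
    score: float  # hypothetical quality rating between 0 and 1


# Hypothetical snippet library, already tagged by topic.
LIBRARY = [
    Snippet("Lightning payments explained", {"micropayments", "cryptocurrency"}, 420, 0.9),
    Snippet("Paywalls vs tips", {"micropayments"}, 300, 0.7),
    Snippet("Sourdough basics", {"baking"}, 600, 0.95),
    Snippet("Stablecoin deep dive", {"cryptocurrency"}, 600, 0.8),
]


def build_mix(snippets, terms, budget_seconds):
    """Greedily pick the best-rated on-topic snippets that fit the time budget."""
    wanted = {t.lower() for t in terms}
    # Keep only snippets tagged with at least one of the search terms.
    relevant = [s for s in snippets if wanted & s.topics]
    relevant.sort(key=lambda s: s.score, reverse=True)
    mix, used = [], 0
    for s in relevant:
        if used + s.seconds <= budget_seconds:
            mix.append(s)
            used += s.seconds
    return mix


mix = build_mix(LIBRARY, ["micropayments", "cryptocurrency"], 15 * 60)
print([s.title for s in mix], sum(s.seconds for s in mix), "seconds")
```

Changing the budget from 15 minutes to an hour changes which snippets make the cut, which is the behaviour described above: same search terms, different length of listen.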

5// Synthetic creation

When I record audio, I should be able to leverage machine learning to synthetically alter my recording - adding and removing words to the audio simply by editing the text transcription. This would save a huge amount of time and effort in recording and re-recording audio clips. This is already feasible technology. As the synthetic audio space develops, audio will become even more frictionless to create.
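The "edit the text, and the audio follows" idea can be sketched for the simpler half of the problem, removing words. The example assumes a speech-to-text step has already produced word-level timestamps. The `WORDS` data and the `keep_ranges` function are hypothetical, and the alignment uses Python's standard `difflib.SequenceMatcher` rather than anything a specific product is known to use. Adding words that were never spoken needs voice synthesis, which this sketch does not attempt.

```python
from difflib import SequenceMatcher

# Hypothetical word-level timestamps (word, start_sec, end_sec),
# of the kind a speech-to-text system might produce.
WORDS = [
    ("audio", 0.0, 0.4),
    ("is", 0.4, 0.5),
    ("um", 0.5, 0.9),
    ("really", 0.9, 1.3),
    ("great", 1.3, 1.8),
]


def keep_ranges(words, edited_text):
    """Return the (start, end) time ranges of the recording that survive the edit."""
    original = [w.lower() for w, _, _ in words]
    edited = edited_text.lower().split()
    matcher = SequenceMatcher(a=original, b=edited, autojunk=False)
    ranges = []
    for block in matcher.get_matching_blocks():
        if block.size == 0:  # the final block is always an empty sentinel
            continue
        start = words[block.a][1]
        end = words[block.a + block.size - 1][2]
        ranges.append((start, end))
    return ranges


# Deleting "um" from the transcript yields the audio spans to stitch together.
print(keep_ranges(WORDS, "audio is really great"))
```

An audio editor would then cut the recording at those ranges and join the pieces, so fixing a stumble becomes a matter of deleting a word from the transcript instead of re-recording.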

Concluding thoughts

Audio has had an amazing few years. However, I believe we are only at the beginning of the journey. I’m excited to see the innovation that comes next. In particular, I’m excited by the ways that NLP and synthetic audio will give audio new superpowers — producing arguably the most efficient and versatile content type.