Thought for the week
The Golden Age of Audio
[3-minute read]
This is the golden age of audio. Since 2017, there has been an explosion in the adoption of "connected audio" hardware. This has made it possible (and convenient) for 300m+ people to listen to an endless library of audio content anytime and anywhere. Simultaneously, there has been a wave of new audio content and new audio-first platforms.
Yet, despite the growth in audio consumption over the past 5 years, I believe that there are still major untapped opportunities for innovation in audio content and audio platforms. Relative to video technology, audio has not seen anywhere near the same level of innovation and investment. This is a prime opportunity for startups to address.
Why audio is awesome
1// It's effective - it is all-consuming. Putting your headphones in to listen to a podcast or live stream is a highly intentional and intimate act. Information penetrates. There are fewer distractions relative to video (which consists of visual elements that require a user's attention). A recent report by IAB indicated that 67% of listeners could recall products/brands featured in adverts and 61% bought the item advertised. This leads to high advertising ROI. See for yourself... how many of these audio logos can you identify?
2// It's versatile - Audio can be consumed passively as background music. It can also be consumed actively with intense focus requirements. It can be incredibly information-dense, making audio a highly efficient way to learn about a topic. It can also be extremely light - making it perfect for laid-back listening. Audio is the ultimate content chameleon.
3// Lower barriers to content creation - High-quality audio content is significantly lower in cost to produce than comparable video content. It is also lower friction. Many people may not feel comfortable appearing on camera to record a video. However, more people are generally comfortable making a voice-only appearance.
4// Audio is better than text at capturing meaning - Most experts now agree that anywhere between 70-90% of communication is non-verbal (i.e. the meaning is not captured by word choice alone). Instead, tone of voice is incredibly important in expressing the substance of a conversation. Have you ever felt that a WhatsApp message from a friend or a Slack message from a colleague was unnecessarily hostile? That's because text sucks at conveying emotions. Alternatively, it might just be that everyone hates you...
Why is now the time to build an audio startup?
35% of US and UK households now own a smart speaker. It is estimated that since wireless earbuds came onto the market in ~2017, >250 million units have been sold. It is expected that this will reach 600m units sold by the end of 2022. Meanwhile, the global auto industry is projected to ship 76 million connected cars by 2023.
The implication of these big numbers is that for the first time in history, people are able to comfortably and seamlessly access a vast library of audio-on-demand from anywhere and at any time. While they commute, while they exercise, while they cook, clean and while they fall asleep.
The unique benefits of audio as a content type, combined with the emergent ability to consume audio anywhere without friction, create the ideal demand-side environment for audio consumption.
What innovations I'm looking for
1// Augmented audio
By augmented audio, I mean audio that is enhanced by adding an additional layer of information or media to it. For example, when I listen to a podcast or a live stream, I should be able to pick up my phone and see a visual stream of the contextual information that is being discussed on the audio. If I hear Joe Rogan discussing a medical marijuana startup - I should be able to pick up my phone and see information about the company pulled from Crunchbase. If I'm listening to a New Scientist podcast about quantum computers, I should have the ability to access an explanation of the topics that are being discussed - tailored to my knowledge level. The possibilities are endless. Audio should not exist in a vacuum.
Interesting companies operating in this space - Entale, Giide (SG Portfolio).
2// Better search
Text is easy to search. Audio is hard. If someone says something interesting about Bengal cats in a podcast, currently, it is very difficult to unearth this information (assuming it is not in the title or description of the podcast). In order to achieve better search, we need more widespread transcription of audio content. Additionally, we need to leverage semantic search - a new and constantly improving area of NLP that enables searching bodies of text using natural language meaning rather than relying solely on keywords.
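To make the idea a little more concrete, here is a minimal sketch of meaning-based search over transcript segments. It assumes each segment has already been transcribed and turned into a vector; the three-dimensional vectors below are hand-written stand-ins for what an embedding model would produce, and the segment texts and the names `SEGMENTS`, `cosine` and `search` are hypothetical.

```python
import math

# Hypothetical transcript segments paired with hand-written "embeddings".
# In a real system the vectors would come from a sentence-embedding model;
# here the three dimensions loosely stand for (pets, finance, physics).
SEGMENTS = [
    ("Our Bengal kept climbing the curtains all night.", [0.9, 0.0, 0.1]),
    ("Micropayments could change how creators get paid.", [0.0, 0.9, 0.1]),
    ("Qubits can hold a superposition of states.", [0.1, 0.1, 0.9]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, segments=SEGMENTS, top_k=1):
    """Return the top_k segment texts closest in meaning to the query vector."""
    ranked = sorted(segments, key=lambda s: cosine(query_vector, s[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Assumed embedding for the query "funny cat stories": it shares no keyword
# with the first segment, yet points in the same "pets" direction.
QUERY_VECTOR = [0.8, 0.1, 0.0]
print(search(QUERY_VECTOR))
```

The point of the toy is that the query and the matching segment share no words at all; the match comes purely from the vectors pointing in a similar direction, which is what keyword search cannot do.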
3// Insight extraction
A huge amount of information can be captured in a 30-second audio clip. Machine learning is rapidly approaching a level at which key insights (e.g. suggested actions, people mentioned) can be extracted from a text transcription of an audio clip. In my view, adding this layer of deep learning to audio would make audio the most efficient way for companies to communicate internally (replacing emails).
4// Adaptive
Why do podcasts have to exist as separate units? Why can't I search terms like "micropayments, cryptocurrency" and get an automatically generated audio track that combines the best snippets on these topics? I should be able to choose my allocated time horizon (e.g. 15 minutes, 1 hour) and the audio content should adapt to that level.
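One simple way to picture the time-horizon part is as a selection problem: given snippets that have already been scored for relevance to the search terms, pick the best ones that fit the listener's time budget. The sketch below is a greedy approximation under that assumption, not an optimal packing; the `Snippet` fields, the relevance scores and the `min_relevance` cut-off are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    title: str
    seconds: int
    relevance: float  # assumed score in [0, 1] from some upstream ranking step


def build_playlist(snippets, budget_seconds, min_relevance=0.5):
    """Greedily pick the most relevant snippets that still fit the time budget."""
    chosen = []
    remaining = budget_seconds
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if snippet.relevance < min_relevance:
            break  # everything after this point is even less relevant
        if snippet.seconds <= remaining:
            chosen.append(snippet)
            remaining -= snippet.seconds
    return chosen


CANDIDATES = [
    Snippet("Micropayments explained", 420, 0.95),
    Snippet("History of digital cash", 600, 0.80),
    Snippet("Crypto wallets in practice", 300, 0.70),
    Snippet("Host banter", 180, 0.10),
]

playlist = build_playlist(CANDIDATES, budget_seconds=15 * 60)
print([s.title for s in playlist], sum(s.seconds for s in playlist))
```

With a 15-minute budget the 10-minute history segment no longer fits once the top snippet is chosen, so the shorter wallet segment is used instead; with a one-hour budget all three relevant snippets are included.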
5// Synthetic creation
When I record audio, I should be able to leverage machine learning to synthetically alter my recording - adding and removing words to the audio simply by editing the text transcription. This would save a huge amount of time and effort in recording and re-recording audio clips. This is already feasible technology. As the synthetic audio space develops, audio will become even more frictionless to create.
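The removal half of this can be sketched without any speech synthesis. Assuming the recording comes with word-level timestamps (the `ALIGNED` data below is made up), diffing the original transcript against the edited one yields the time ranges to cut from the audio. Adding words is deliberately left out, since that would need a voice model; `cuts_for_edit` is a hypothetical helper, not an existing tool's API.

```python
import difflib

# Hypothetical word-level alignment: (word, start_seconds, end_seconds).
ALIGNED = [
    ("audio", 0.0, 0.4),
    ("is", 0.4, 0.5),
    ("um", 0.5, 0.9),
    ("really", 0.9, 1.3),
    ("great", 1.3, 1.8),
]


def cuts_for_edit(aligned, edited_text):
    """Return (start, end) time ranges to remove so the audio matches the edit.

    Only deletions are handled in this sketch; inserted or replaced words
    would require synthesising new speech, which is out of scope here.
    """
    original_words = [word for word, _, _ in aligned]
    edited_words = edited_text.split()
    matcher = difflib.SequenceMatcher(a=original_words, b=edited_words, autojunk=False)
    cuts = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            # Cut from the start of the first removed word to the end of the last.
            cuts.append((aligned[i1][1], aligned[i2 - 1][2]))
        elif tag in ("insert", "replace"):
            raise NotImplementedError("new words would need speech synthesis")
    return cuts


print(cuts_for_edit(ALIGNED, "audio is great"))
```

Deleting "um really" from the transcript produces a single cut covering both words, which an audio editor could then apply to the waveform.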
Concluding thoughts
Audio has had an amazing few years. However, I believe we are only at the beginning of the journey. I'm excited to see the innovation that comes next. In particular, I'm excited by the ways that NLP and synthetic audio will give audio new superpowers, producing arguably the most efficient and versatile content type.