One thought for the week 💭
Autonomous virtual beings will create 1-to-1 video media
[700 words - 3 minute read]
Last week, Epic / Unreal Engine unveiled a new platform for digital human creation called MetaHuman. This is a browser-based GUI for creating high-fidelity, real-time, fully-rigged, portable, 3D human characters.
The technical complexity underpinning this is remarkable. It is the result of significant investments in Epic’s cloud streaming and virtualisation technology (“Pixel Streaming”) and a string of strategic acquisitions (3Lateral, Cubic Motion, and Quixel).
The outcome is that Epic has drastically reduced both the barriers to 3D character creation and the time it requires. This is an important step in the development of autonomous virtual beings (AVBs).
What are autonomous virtual beings?
AVBs are artificial humans (with personalities) that look and/or sound exactly like humans. They enable the recreation of human interactions at an infinite scale.
AVBs are the result of combining NLP and voice AI, generative adversarial networks (GANs), photorealistic graphics, and the creative development of fictional characters.
Why are they interesting?
There are many use cases for AVBs. Some of the most commonly cited are:
Sales and customer support (e.g. humanoid chatbots)
The value propositions of AVBs are:
Low-cost: Unlike human actors, AVBs have a negligible marginal cost.
Scalable: Since AVBs are virtual, there are no requirements for production equipment and/or film sets.
Customisable: Unlike human actors, AVBs can be ‘fine tuned’ for their exact use case.
Anonymous: AVBs enable anonymous expression - removing a considerable barrier to content creation for many people.
These are all benefits for the content creator. Novelty factor aside, many commentators brush off virtual beings as faddish and inferior to their human equivalents, at least from the viewer’s perspective.
However, this neglects the most crucial feature of AVBs. This is their ability to create 1-to-1 or hyper-personalised media.
Traditional video media (1-to-many)
A content creator creates a single unit of video media → that single unit is consumed by many viewers.
This media is:
Independent of the viewer
Future video media (1-to-1)
A content creator creates media that consists of AVBs → Each viewer sees and interacts with the media in a unique way.
This media is:
Dependent on the viewer
The value of 1-to-1
1-to-1 media enables viewers to have a unique and interactive experience with content. Imagine having a conversation with your favourite artist or YouTuber, or influencing the decisions of the protagonist in your favourite TV drama.
This has many advantages over traditional 1-to-many media:
It enables fans to form deeper relationships due to the ability to have direct interactions with the characters.
It enables persistent content that can be continuously consumed.
It enables hyper-personalised content.
How far away are we?
MetaHuman has taken the development time of virtual beings from months to hours. The technical challenge now is complementing this with conversational AI.
There are two separate components to this - text generation (NLP) and speech synthesis (voice AI). Both have made and continue to make considerable advances. It is reasonable to expect that consumer-grade, conversational AI will be a reality for several use-cases in ~2-3 years.
Text: We are close. While still unsuitable for many use cases, OpenAI’s GPT-3 can replicate human-like conversations to a very high standard. What’s more, large language models like GPT exhibit a positive relationship between size and performance, albeit likely with diminishing returns. This implies that we can expect future, larger models (e.g. GPT-4 or Google’s trillion-parameter model) to move closer to human parity. Beyond size, there are also significant improvements to be made from 1) fine-tuning GPT-3 for specific use cases and 2) training it on non-text content (e.g. video and audio).
Speech synthesis: Major improvements have been made in speech synthesis since DeepMind demonstrated WaveNet in 2016 - the first deep neural network that could convincingly model the human voice. Since then we’ve seen newer deep learning techniques that use LSTMs and GANs. Voice AI is currently being driven by better data analysis and by newer approaches to modelling prosody and other vocal attributes, all of which shape how we perceive and evaluate synthetic voice quality. This is a tough problem to solve, but the progress is encouraging (see here and here for two startups addressing this problem in innovative ways).
We likely won’t see widespread examples of hyper-personalised, 1-to-1 video media until 2023+. However, given their potential breadth and ubiquity, AVBs and their enabling infrastructure are certainly worth taking seriously from an investment perspective.
I’m always keen to chat to people knowledgeable about this topic (including virtual people). Get in touch if that’s you!
News from this week 🗞
Artie has raised $10 million to make instant mobile games that can be played directly inside social media, video, and messaging platforms in a way that bypasses the app stores that take their 30% cut of game revenues. Artie started its life as a platform to make intelligent augmented reality avatars. But AR hasn’t blossomed as quickly as expected, and so the company has pivoted from making digital AR avatars to making instant games with its cartoon-style avatars. Link
Epic filed an EU antitrust complaint against Apple. The Fortnite maker alleges that the iOS maker's restrictions have "completely eliminated competition in app distribution and payment processes". Link
Mobile developer Colossi Games has raised $2.5 million in a seed funding round led by EQT Ventures. This will go towards the development and marketing of Colossi Games' portfolio, including its first title, a social survival game that's not been named yet. The title will be set in Ancient Rome, with players able to explore, partner up and fight each other as gladiators. Link
LA-based music licensing platform Pex has secured a $57m investment round, including participation from Tencent. Originally founded in 2014, Pex’s technology monitors social networks worldwide – as well as platforms which rely on UGC content – to weed out music and film content that belongs to rightsholders. Pex boasts that it can find “snippets as short as 1 second across dozens of platforms worldwide”. Link
Krisp, the AI-powered noise removal app that works with any conferencing, recording and podcasting service, announced a $9 million Series A. Krisp delivers an AI-powered noise removal app that enables users to remove background voices, barking dogs, room echoes and anything else that can disturb virtual meetings. It works with Zoom, Meet and a number of other apps out of the box. It is used by individuals as well as by more than 1,000 enterprises across the world. Link
On Wednesday, iHeartMedia announced a deal with Scripps to acquire Triton Digital, an audio and podcast advertising-tech company, for $230 million. For iHeartMedia, Triton marks the fifth audio-tech acquisition in the past three years. The company previously bought Voxnest, a podcast marketplace, ad-serving and analytics provider; Jelli, which runs an ad-buying platform for broadcast radio that includes digital programmatic buying; Radiojar, a cloud-based audio playout platform; and Unified, a social ad data intelligence platform and solutions provider. With Triton, iHeartMedia said, it will now be able to provide a full ad-service package that spans on-demand audio, broadcast, internet radio and podcasting. Link
Interplay Learning, a provider of training for essential skilled trade workers, has announced that it has completed a USD $18 million Series B round. Founded in 2016, Austin, Texas-based Interplay Learning is a provider of online and virtual reality (VR) training for the essential skilled trades. The firm develops and delivers immersive digital learning simulations for the HVAC, plumbing, electrical, solar, multi-family maintenance and facilities maintenance workforce, allowing users to practice hands-on learning and train to be job-ready in weeks, not years. Link
Edgybees, a provider of georegistration and augmented reality tools for drone operators, today announced that it raised $9.5 million, bringing its total raised to $15 million. Edgybees claims its platform addresses the challenge of piloting drones with decreased air visibility from flames, smoke, and chemical spills. The company augments live video feeds from drones with geoinformation layers like maps, building layouts, points of interest, user-generated markers, and more captured from cameras or other data sources, leveraging a combination of computer vision, data analytics, and video synthesis. Link
YBVR, a ‘next-generation XR video distribution platform’, announced a $1.5 million Pre-Series A funding round. With 360-degree streaming, YBVR’s video platform significantly enhances the virtual viewing experience. Link
Striker VR raised $4 million in funding. The funding will go towards the development of its haptic VR gun peripheral, the Arena Infinity, which targets the location-based virtual reality market, Road to VR reported. Link
Twitch and Facebook Gaming set record highs in January. A StreamElements report indicates that both streaming platforms more than doubled their viewership year-over-year. Link
Matterport, a nearly 10-year-old company synonymous with 3D photos and home video tours, is going public with a SPAC. The deal, with a blank-check firm backed by billionaire Alec Gores, values the company at $2.9 billion. Matterport will receive $640 million in cash proceeds. According to an investor presentation cited by the Real Deal, Matterport’s estimated revenue for 2020 jumped 87 percent to $85.9 million, up from $46 million a year prior. Link
Automattic, the parent company of WPVIP, has announced that it will acquire Parse.ly to make it easier for users to add powerful analytics to their WordPress sites. Since its launch in 2009, Parse.ly has been working to showcase the power of digital content to influence and change the web through its widely-used content analytics system, which is used by sites and apps to increase growth, engagement and loyalty. Link
In response to a proposed law forcing internet platforms to pay news publishers directly, Facebook announced today that Australian users will not be able to share or view news links. In its post, Facebook drew a direct contrast with the other big company targeted by the law, with managing director for Facebook Australia and New Zealand William Easton writing, “Google Search is inextricably intertwined with news and publishers do not voluntarily provide their content. On the other hand, publishers willingly choose to post news on Facebook, as it allows them to sell more subscriptions, grow their audiences and increase advertising revenue.” Link
Dapper Labs, responsible for the high-flying digital collectibles platform NBA Top Shot, is raising funds that should net the firm more than $250 million at a valuation of about $2 billion. Dapper Labs' NBA Top Shot project is now the most popular nonfungible token (NFT) series by volume after being launched in October 2020. The firm has generated almost $100 million in NFT sales, the report said. Link
Dispo, a two-year-old, L.A.-based photo-sharing app and social network co-founded by YouTube star David Dobrik, has reportedly had calls with Sequoia Capital, Andreessen Horowitz and Benchmark, among others, in recent days about leading a Series A round of funding. The Information, which reported the news earlier, said that it's "unclear who initiated the conversations" but that, according to a person close to Dispo, it has spoken with different investors who offered to value the young company at $100 million or more. Forbes also reported today on the recent buzz surrounding Dispo, which raised $4 million in seed funding in October led by Alexis Ohanian’s venture capital fund, Seven Seven Six. Link
Jigsaw, an “anti-superficial” dating app, has scored £2.7 million ($3.7 million) in seed funding to put toward U.S. expansion. Link
Talkshoplive, a startup that hosts shopping-focused live videos, has raised $3m in seed funding. The startup does not require consumers to download any additional apps in order to watch its videos. Instead, it’s created a video player that works on the Talkshoplive website, on the websites of its partners and anywhere else that videos can be embedded. And wherever those videos are played, they also include a one-click buy button. Link
Interesting data from this week 📈
Source: Balaji S. Srinivasan
Thank you for reading ✌️