The Opportunity in Social Audio

Content Publishers vs. User-Generated Content Platforms

This is a weekly newsletter about tech, media, and culture. To receive this newsletter in your inbox each week, subscribe here:


In Spotify and the Future of Audio, I shared this chart:

The takeaways here were three-fold:

  1. Spotify doesn’t get to keep much of its revenue (only about 33%),

  2. Artists don’t get to keep much either (only about 8%), and

  3. The music industry is really, really complicated.

This chart also gets to the heart of the challenges of being a content publisher: publishing content is expensive, requiring a significant investment in people (journalists, videographers, artists, etc.) or a significant payout to content creators. This leads to lower margins for content publishers relative to content platforms. The New York Times Company, for example, is a content publisher with ~60% gross margins; Facebook is a content platform with ~85% gross margins.
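
To make the margin gap concrete, here is a back-of-the-envelope sketch in Python. It uses only the approximate gross margins quoted above; the $100 revenue base and the function name are illustrative assumptions, not figures from either company's filings.

```python
def gross_profit(revenue: float, gross_margin: float) -> float:
    """Revenue left over after the direct cost of producing or licensing content."""
    return revenue * gross_margin


# Approximate gross margins quoted in the text above.
GROSS_MARGINS = {
    "Publisher (The New York Times Company)": 0.60,
    "Platform (Facebook)": 0.85,
}

REVENUE = 100.0  # hypothetical revenue base, chosen only to make the math easy to read

for name, margin in GROSS_MARGINS.items():
    kept = gross_profit(REVENUE, margin)
    spent = REVENUE - kept
    print(f"{name}: keeps ${kept:.0f} and spends ${spent:.0f} per ${REVENUE:.0f} of revenue")
```

On these rough numbers, the publisher spends about $40 of every $100 on cost of revenue while the platform spends about $15, which is the gap that user-generated content opens up.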

Rather than employing or paying professional creators (the Times employs 1,600 journalists in 150 countries), platforms equip users with the tools to be content creators themselves. Across content forms, user-generated platforms are typically more scalable and capital efficient.

Within audio, there’s an opportunity for a new content platform to emerge. This platform will be highly social and will rely on users to create and distribute content.


AirPods as a Platform

AirPods sales are surging—on pace for 100 million pairs next year—with Apple capturing ~70% of the wireless headphone market. As a standalone company, AirPods would be larger than companies like Shopify, Square, and Snapchat.

Audio tech is also invading our homes. Smart speaker sales will hit 100 million units this year.

The rise of AirPods and smart speakers is driving a podcasting boom: 144 million Americans (age 12+) listen to podcasts, consuming on average 6+ hours each week (a16z report).

For a deeper dive on the audio space broadly, including its history and economics, I recommend Matthew Ball’s recent piece. (My favorite part is his reminder that media is technology.)

I also find Sarah Tavel’s Rocks, Sand, and Water framework helpful in appreciating audio’s opportunity. Essentially, Sarah argues that the ways we spend our time can be roughly divided into three buckets. “Rocks” require our full, undivided attention—watching a movie or reading a book, for instance. “Sand” requires less attention—30 seconds of TikTok here and there, scrolling Twitter in the grocery line. “Water”, meanwhile, is fluid and permeates rocks and sand. Audio is water: we can listen to a podcast while cooking, listen to music while working, listen to an audiobook while driving.

Hardware enablers—AirPods and smart speakers—are colliding with changes in consumer behavior. On top of that, screen fatigue is drumming up demand for screen-less content. This confluence of factors is creating huge demand for “water”—for audio content and audio-first social networks.


Audio Content Publishers

Spotify is the biggest publisher of audio content today. To be fair to Spotify, the company also has elements of a content platform. Playlists are a key feature of Spotify, and user-generated playlists make up 36% of listening time.

Spotify also allows DIY artists—those not signed to a label—to host millions of songs on Spotify. And Anchor, a company acquired by Spotify, lets anyone make a podcast.

But Spotify is first and foremost a publisher; it lacks the social elements and creator tools that will define an audio content platform.

YouTube and Netflix, for their part, are pushing deeper into audio. A third of all Internet users listen to music on YouTube; now, YouTube is introducing audio-only ads. Netflix is testing an audio-only mode to compete with podcasts and audiobooks.

Within various niches, vertical audio publishers have emerged, including Dipsea, Calm, and Knowable.

But all of these companies are content publishers. On Dipsea, professional voice artists release erotic audio stories. A true user-generated content platform would look more like an OnlyFans for audio, with a long tail of creators able to share and monetize erotic audio content. Similarly, a UGC platform for stress relief would allow every Calm user to launch a meditation session and a UGC platform for education would allow every Knowable user to share audio learning content.

Content publishers can make good businesses, but they lack the foundational elements of a UGC platform.


User-Generated Audio Platforms

There’s a new class of startups hoping to capture the UGC audio opportunity.

I tend to think of UGC audio platforms in two buckets:

Evergreen Audio Platforms

Evergreen platforms allow users to create and share audio content that remains on the platform.

I think of evergreen platforms as the rebirth of radio. Picture slipping into a live audio room with Howard Stern and a guest, except that you can participate (though perhaps you’ll have to pay extra to ask a question or make a comment).

Evergreen platforms are the natural next step for podcasts: more social and more interactive. While commuting or running errands, you can enter voice rooms with your favorite leaders, thinkers, and speakers.

This is what Clubhouse is building, though its audio content isn’t currently available to listen to later. I expect this will change. On these platforms, more people tend to listen than participate. I see this format working best for educational content and career-focused content, similar to podcasting. People will want to consume old content, just as they want to watch an old TED Talk or listen to an old podcast interview. Accessing archived content could be a premium feature in a “freemium” business model, on top of an ad-supported free tier.

High Fidelity is a company combatting Zoom fatigue with audio events and conferences. Similarly, I expect its content will skew professional and will be accessible later. Locker Room is slightly different, building voice-chat rooms for sports enthusiasts. Fans can chat live before or after a game—like a more participatory version of sports radio. It’s easy to imagine a fan waking up and wanting to listen to the Locker Room post-game chat from the night before.

There are various ways for evergreen content platforms to monetize. They can combine elements of ads, freemium, and subscription. They can paywall “rooms”—on Bonfire, for instance, creators can invite fans to hang out in an audio room and charge a small entry fee. Platforms can offer microtransactions, similar to those in Fortnite and Roblox. In China, the platform QQ charges users to be part of a creator’s “fan club”, giving listeners special privileges. A paying user might get an announcement when she enters the creator’s audio room, get to choose her own icon or avatar, or have access to unique voice effects.

Ephemeral Audio Platforms

If evergreen platforms are the reincarnation of radio or the natural next step for podcasting, ephemeral platforms are the return of the multi-way phone call.

Three-way and four-way phone calls were a staple for kids of a prior generation. In a way, ephemeral platforms are “Snapchat for audio”, allowing authentic, intimate, personal conversations that disappear.

I expect ephemeral platforms to be more social and more informal than evergreen platforms. Geneva, Rodeo, and Chalk are all building voice rooms for friends to casually hang out. Rodeo’s founder, Midas Kwant, has said: “I’m trying to make the next Snapchat or Instagram for audio. It might sound crazy but that’s the goal.”

Another interesting company is Blip, which is essentially “TikTok for audio”. Blip lets users share short audio recordings (under 3 minutes), featuring everything from poems to singing to storytelling.

Ephemeral platforms will work best for smaller groups in which participants already know each other. Microtransactions may work best for monetization, with listeners getting special avatars and voice effects.


Subscribe to Digital Native to get each week’s piece in your inbox:


Upstarts vs. Incumbents

There’s a chance that none of the startups above succeeds; instead, the winner of user-generated audio might be a consumer internet incumbent.

iMessage and WhatsApp could be formidable for small-group audio channels. It’s easy to imagine Instagram or YouTube enabling creators to launch paywalled audio rooms. But the leading contender, in my mind, is Discord.

Discord has 120 million monthly active users and gets 800,000 downloads a day. While Discord started with gaming, it’s since broadened to everyone: in March, Discord changed its tagline from “Chat for Gamers” to “Chat for Communities and Friends”. In July, it became “Your Place to Talk”.

When the game Among Us exploded in popularity earlier this fall, people turned to Discord for voice chat during the game, and Discord’s growth surged in tandem.

Millions of people are already using Discord servers (what Discord calls its various communities) for audio chat, and it will be hard for a new company to usurp Discord.

A dark horse could be Twitter, which has a poor track record of launching new products but which is clearly trying to win audio. Twitter is testing a voice-only chat space; according to The Verge:

In one of these conversation spaces, you’ll be able to see who is a part of the room and who is talking at any given time. The person who makes the space will have moderation controls and can determine who can actually participate, too. Twitter says it will experiment with how these spaces are discovered on the platform, including ways to invite participants via direct messages or right from a public tweet.

Smartly, Twitter is approaching this with an emphasis on content moderation and supporting marginalized groups, hoping to avoid the criticism Clubhouse has faced for hosting sexist and anti-Semitic content.


Final Thought: Generational Differences

The distinction between content publishers and content platforms mirrors the distinction between older generations and younger generations.

For Baby Boomers, there’s always been a clear line between the entertainer and the person being entertained. You’re either on TV, or you’re not. You’re a film star, or you’re in the audience. You’re on the radio, or you’re sitting in your car listening. Modern content publishers fit this world: on Netflix, you’re still either the star or the passive observer; on Spotify, you’re the singer or the listener.

But for younger generations—especially those who grew up with mobile—there’s a fuzzier line between content creator and content consumer. On YouTube, about 1 in 1,000 users produced content, a huge improvement from TV; on TikTok, it’s accelerated to more like 1 in 4. For younger people, user-generated content platforms are more natural: the user can glide seamlessly between creation and consumption.
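
As a rough illustration of how large that shift is, the two ratios above can be compared directly. This is a minimal sketch that takes the "1 in 1,000" and "1 in 4" figures at face value; the function name and the one-million-user base are assumptions made for readability.

```python
from fractions import Fraction

# Rough creator shares quoted in the text above (approximations, not measured data).
CREATOR_SHARE = {
    "YouTube": Fraction(1, 1000),
    "TikTok": Fraction(1, 4),
}


def creators_per_million(share: Fraction) -> int:
    """Expected number of creators in a hypothetical base of one million users."""
    return int(share * 1_000_000)


for platform, share in CREATOR_SHARE.items():
    print(f"{platform}: {creators_per_million(share):,} creators per million users")

# How many times larger TikTok's creator share is than YouTube's.
multiple = CREATOR_SHARE["TikTok"] / CREATOR_SHARE["YouTube"]
print(f"TikTok's creator share is {multiple}x YouTube's")
```

Taken literally, the same million users would yield 1,000 creators at YouTube's ratio and 250,000 at TikTok's, a 250-fold difference.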

This is the reason that there’s an opportunity for a user-generated audio platform to be bigger than today’s audio publishers. Everyone has a microphone on their smartphone, and today’s kids are used to being in the driver’s seat for content.

Changes in audio are simultaneously groundbreaking and a return to our old ways. Evergreen audio platforms are, in many ways, the second coming of radio. Ephemeral audio platforms are the return of three-way or four-way phone calls among friends.

This captures what’s so fascinating about consumer technology: the mediums may change over the years, but the underlying human behaviors rarely do.

