I became convinced back in 1995 that the Internet would be transformative: providing access to the world’s information from anywhere, free of charge. That was a very compelling value proposition. People rushed to develop content for the exploding user base. Our company, DoubleClick, helped to monetize those sites through advertising, now a $700 billion industry. Over the following years, the Internet expanded from a relatively small number of sites to millions of sites and billions of pages. It took Google to bring order to our world, and that order made them the gatekeepers.
Google and other search engines formed a symbiotic relationship with content producers. Sites produce high-quality content, and Google directs users to those sites, where the traffic generates revenue. If you rank first for a search query, you get most of the search traffic for that query. Google quickly realized that it could sell the #1 spot for a fee. Those fees now generate approximately $200 billion in revenue annually, making search advertising one of the most profitable products in history. Google made money, and content providers made money, keeping the Internet free.
The first Large Language Model (LLM) was released in 2018, and it was interesting, but not particularly good. The goal of these LLMs was to ingest as much information as they could, and the best source of this information was the Internet. Like Google, they “read” billions of web pages, and they also ingested other sources, such as databases and books. GPT-3.5 was the real “wow” moment, as it could answer almost any question because it had read all the answers. This was the beginning of the end of the Internet as we know it.
Companies were all “wowed” until they realized the LLMs were being built with their content without permission or compensation. Most publishers have seen a significant drop in visitors as people increasingly turn to LLMs to answer questions. Google’s AI product, Gemini, is typically the first search result, compounding the problem. For consumers, an AI-generated answer is a significantly better experience than using search engines or visiting various sites that may or may not answer their questions.
Here’s the Catch-22: Producing content is expensive, but content producers aren’t being compensated when LLMs use their work. LLMs generate a substantial amount of revenue (OpenAI is reportedly at a $10 billion run rate), but they don’t pay for most of their content. In addition, people can produce content for nearly nothing using LLMs. Articles, reviews, books, and even research reports can be mass-produced. Like email spam, AI spam could become a dominant force. AI detectors, such as those developed by Pangram Labs (ScOp is an investor), will be necessary to detect and reject AI-generated content when it is detrimental. Ironically, for LLMs to improve, they require high-quality and timely content. They can’t use LLM-produced content to improve their models. The symbiotic model has now turned parasitic.
Some content producers have sued the AI companies for copyright infringement. However, a recent federal court ruling stated that ingesting content was “fair use.” Like a human, AI “reads” a lot of different content and produces unique content. In other words, it does not simply replicate the content. That seems reasonable. However, the judge also stated that this use did not affect the authors’ livelihood, which seems less plausible. Content producers, as mentioned earlier, are experiencing significant declines in traffic and ad revenue as people increasingly turn to LLMs. You could imagine LLMs offering a news service that monitors news sites and, in a fraction of a second, produces a “transformative” news article.
So, if the AI companies are killing their golden geese, how does the market evolve? I have some speculations.
Sites will charge AI companies to crawl their pages or block access to them. The minuscule “citation links” that LLMs add won’t drive enough traffic to cover the sites’ production costs. But if you block LLMs, you may end up with zero traffic in the end. LLMs need to help drive traffic to the sites from which they ingest content. Cloudflare just announced a scheme where AI crawlers will need to pay the site for the right to crawl.
More sites will turn to subscriptions, closing off free access to the masses. If you thought keeping track of your streaming services was a pain, just wait. The upside is that we will have less content, but of a higher quality.
Valuable, proprietary data will no longer be published on the Internet but instead licensed to the LLMs.
The enormous costs of licensing great content and data will favor the largest LLMs and reduce competition.
Data attribution with revenue sharing. This is a more complicated issue, but attribution could be used to reward content contributors whenever their data is used to generate “tokens.” Generative AI is typically priced based on the number of tokens used.
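To make the revenue-sharing idea a little more concrete, here is a minimal sketch in Python of how a pro-rata split might work. Everything in it is an assumption for illustration: the `share_revenue` function, the 50% publisher share, and the source names are hypothetical, and it takes for granted that the hard part (deciding how many tokens to attribute to each source) has already been solved somehow.

```python
def share_revenue(attributed_tokens, revenue_cents, publisher_share=0.5):
    """Split part of an LLM's token revenue among content sources.

    attributed_tokens: dict mapping a source name to the number of
        generated tokens attributed to it (hypothetical attribution data).
    revenue_cents: revenue earned on those tokens, in whole cents.
    publisher_share: assumed fraction of revenue set aside for sources.
    """
    pool = int(revenue_cents * publisher_share)  # cents available to sources
    total = sum(attributed_tokens.values())
    if total == 0:
        return {}  # nothing was attributed, so nothing is paid out
    # Pro-rata split; integer division rounds each payout down to a cent.
    return {src: pool * n // total for src, n in attributed_tokens.items()}


# Hypothetical example: $10.00 of token revenue, half shared with sources.
payouts = share_revenue(
    {"news-site": 600, "review-blog": 300, "data-vendor": 100},
    revenue_cents=1000,
)
print(payouts)
```

In this made-up example, the source credited with 60% of the attributed tokens receives 60% of the shared pool. The arithmetic is the easy part; reliable attribution is where the real complexity lies.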
I much prefer private industry solutions; government regulations rarely work and tend to make matters worse. However, Congress may have to address the “fair use” doctrine.
The LLMs are quickly becoming portals to all content, displacing some of the largest consumer web companies. Soon, you’ll see these AI companies booking travel (Kayak), finding homes to buy or rent (Zillow, Homes.com), recommending products to purchase (Amazon), music (Spotify), and video (Netflix), to name a few.
Every technology megatrend causes massive disruptions to the established markets. What was once an asset becomes a liability for the establishment. Rarely will companies kill their golden goose, but outsiders will gladly do it for them. Google is trying, but replacing $200 billion in highly profitable revenue will be a painful process. The commercial Internet, which is around 30 years old, has had a profoundly negative impact on traditional publishing, retail, travel, and entertainment, among other industries. Now, the Internet establishment is being disrupted. Will the Internet establishment adapt and lead the innovation?
Our company, ScOp Venture Capital, is betting that this AI revolution will continue to be led by the next batch of young entrepreneurs who have nothing to lose but everything to gain.