How Search Engines, LLMs, and Third-Party Scrapers Affect Bot Management

Written by

Christine Ferrusi Ross

July 21, 2025

Christine Ferrusi Ross is a Product Marketing Director at Akamai, where she leads go-to-market messaging for the Application Security portfolio. Prior to Akamai, she worked with blockchain and security startups on product/market fit and positioning. She also spent many years as an industry analyst helping organizations buy and manage emerging technologies and services.

 

AI bots bring many changes — and all the implications of those changes for organizations aren’t known yet. Luckily, AI bots are bots.

Maybe you’ve been having conversations with your business and security teams regarding artificial intelligence (AI) scraper bots. After a period when blocking AI scrapers seemed futile — in fact, even harmful — there has recently been a renewed interest in the question of whether to block AI bots.

As the debate over whether blocking AI bots is effective and worthwhile gains attention, some of the key issues that surface include:

  • The value of being included in GenAI answers
  • Stopping known bots that don’t want to be blocked
  • Search engines and large language models (LLMs) that share information
  • Charging AI bots to scrape

The value of being included in GenAI answers

The key reason that many organizations, such as those in retail, hospitality, high tech, or business services, simply let AI bots scrape their content is that they want to be included in the answers an LLM gives to any relevant question a user might ask.

Others, like news organizations and publishers that derive revenue from their content, are more concerned about this new way of consuming data. There is already evidence that some people who use Google never visit a site that comes up in search results; they simply rely on the AI summary at the top of the page.

Other organizations fear that if they don't provide their own information, the LLMs will instead learn about them from content written by a competitor, which lets that competitor train the model and influence its answers.

Stopping known bots that don’t want to be blocked

If your organization wants to block AI bots, the process seems simple enough: Most of the AI bots used to train well-known models self-identify, so you just decide which AI bots you want to allow and then block all the others.
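For the self-identifying crawlers, that decision can be expressed in robots.txt. The sketch below allows two major search crawlers while disallowing several well-known AI training bots; the user-agent tokens shown are ones those vendors have published, but the list is illustrative, so verify each token against the vendor's current documentation before relying on it.

```
# Allow traditional search indexing
User-agent: Googlebot
User-agent: Bingbot
Disallow:

# Disallow well-known AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: CCBot
Disallow: /
```

Note that robots.txt is purely advisory: it expresses your policy, but nothing forces a crawler to honor it.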

However, there are reports that some of these bots, like the one used by Perplexity, don’t follow the robots.txt disallow instructions and scrape anyway. Akamai has also seen a few cases where a customer blocks a self-identified AI bot, then almost immediately sees a rush of requests from a previously unseen, unknown scraper bot. There’s no way to prove that the known AI bot pivoted to using a different bot, but the timing is highly suspicious.
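A server-side check goes one step further than robots.txt by refusing to serve blocked bots at all. The minimal triage sketch below, with bot names and policy buckets chosen purely for illustration (this is not a description of any product's detection logic), sorts requests by their self-declared User-Agent. Its weakness is the point of this section: a scraper that lies about its User-Agent sails straight past it.

```python
# Minimal sketch: triage requests by self-declared User-Agent string.
# The bot tokens and the allow/block policy below are illustrative assumptions.
# A scraper that spoofs its User-Agent bypasses this check entirely.

ALLOWED_BOTS = {"Bingbot", "Googlebot"}  # search crawlers we want indexing us
BLOCKED_BOTS = {"GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"}  # AI scrapers we refuse

def triage(user_agent: str) -> str:
    """Return 'allow', 'block', or 'inspect' for a request's User-Agent."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in ALLOWED_BOTS):
        return "allow"
    if any(token.lower() in ua for token in BLOCKED_BOTS):
        return "block"
    # Unknown clients, including evasive scrapers posing as browsers,
    # need behavioral detection rather than a name check.
    return "inspect"

print(triage("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))  # allow
print(triage("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"))          # block
print(triage("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))                                 # inspect
```

Everything that lands in the "inspect" bucket is where a dedicated bot management solution earns its keep.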

The persistence of AI bots and the need for a good bot management solution

In both cases — ignored robots.txt instructions and the use of an unknown bot to get around being blocked — a good bot management solution can prevent the unwanted scraping of your site. Scraper bots, particularly ones that haven’t self-identified, invest significantly in making sure they can achieve their goals of mimicking real browser traffic and evading bot detections.

This is why ensuring that your bot management solution continuously invests in creating new detections and enhancing existing ones is critical. You need a bot management solution that works effectively over time, not just when you buy it.

You might think that a paywall, like the ones many news sites use, solves the problem. However, a determined LLM vendor can buy a license to the paywalled content and then scrape it. It can even hire a third-party scraping vendor and buy the desired content from that scraper. A good bot management solution can stop third-party scrapers, too, but the situation demonstrates the persistence of AI bots.

Search engines and LLMs that share information

There is one group of scraper bots that is more nuanced: search engine bots. These search bots also scrape content that they then use for indexing, ranking, and so forth in search results. It therefore makes sense for the search engines to optimize that scraping not only for its original purpose but also to feed their LLMs, instead of scraping twice.

It’s widely believed that Microsoft uses its Bingbot to send content to both its Bing search engine and to its Copilot LLM. Akamai’s library of known bots includes three Microsoft bots (Bingbot, BingLocalSearchBot, BingPreview) in the Search Engine category and none in the AI category. Thus, companies that block Bingbot would not only stop Copilot from getting the content to train, but also would stop Bing from indexing the content for search results.

Google uses different bots, although they are built very similarly. And if you blocked the Gemini LLM bot, there is no guarantee that Google (or any other search engine) won’t just share the data from the search engine bot with its LLM.

Assume the content is shared

Blocking AI bots from search engines will definitely affect your visibility in AI-generated search summaries, and it may also affect your ranking in traditional search results. However, as we noted above, although you can block a search engine vendor's AI bot, there is no guarantee that the vendor won't share the content scraped by its search engine bot with its LLM.

We haven’t found any documentation from a search engine vendor that indicates whether they do or do not share information with their associated LLMs. But it is safe to assume that the content is shared.

Charging AI bots to scrape

A more recent development in the debate is the concept of charging AI bots for the content they scrape. This is a new idea in this context, and there is still a lot to be explored before implementing this model. For example, how much is a piece of content worth? And over what length of time?

Consider the publishing industry. To the content owner, news articles are worth the most when they are first published and lose value over time. To an AI bot, news articles might not be worth much because they have a short shelf life and the AI bot could likely get the same information elsewhere. In fact, if the publisher shares its content with any partners, then the AI bot could scrape the same news article from the other site for free.
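If a pay-to-scrape model does take hold, one plausible mechanism is the long-dormant HTTP 402 Payment Required status code: the server quotes a price instead of serving content, and a bot that presents proof of payment gets the page. The sketch below is an assumed design for illustration only (the header names and price are invented), not a standard protocol or a description of any vendor's implementation.

```python
# Sketch of a pay-per-crawl gate built on HTTP 402 (assumed design, not a standard).
# PRICE_USD and both header names are invented for illustration.

PRICE_USD = 0.002  # hypothetical price per scraped page

def handle_crawl(headers: dict) -> tuple:
    """Return (status_code, response_headers) for a bot request."""
    if headers.get("Crawler-Payment-Token"):  # invented header: proof of payment
        return 200, {"Content-Type": "text/html"}
    # No payment presented: quote the price instead of serving the content.
    return 402, {"Crawler-Price": f"{PRICE_USD:.3f} USD/request"}

status, _ = handle_crawl({"User-Agent": "GPTBot/1.0"})
print(status)  # 402
status, _ = handle_crawl({"Crawler-Payment-Token": "abc123"})
print(status)  # 200
```

Any real deployment would also need the metering, pricing, and settlement machinery around this gate, which is where questions like content value and shelf life come back in.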

To block or to charge?

You may decide that the effort involved in trying to block the bots isn't worth it, and you may appreciate being able to manage AI bots without worrying about the impact on your search engine optimization or generative AI optimization. Charging AI bots to scrape could therefore solve multiple issues: You won't have to block them, you won't have to worry about them trying to evade your bot management, and you can gain some revenue from the scraping.

To make this work, however, you will need a solution to meter the traffic and handle the payments. Akamai has options for this available for our customers.

Ultimately, AI bots don’t change the fundamentals of bot management

AI bots bring many changes — and all the implications of those changes for organizations aren’t known yet. Luckily, AI bots are bots. 

Therefore, they can be managed effectively with a bot management solution, particularly one like Akamai Content Protector, that detects even the most evasive scrapers.

Learn more

To learn more about the nuances of AI bot management, talk to an Akamai expert.


