Security

AI Crawlers: Who's Visiting Your Site?

A simple guide to the bots that visit your site every day: which to allow, and which to block.

9 min read

One day I opened my site's Cloudflare dashboard and found a long list of bots. These are crawlers that visit sites like yours again and again. Some are useful: they help you show up in search engines and bring you visitors. Others just use up your server resources and copy your content to train AI models, and you get nothing back.

In this guide I share what I learned: what these crawlers are, who sends them, and how to decide whether to allow or block each one.

How do you find these settings?

The video below shows the full steps, from opening Cloudflare to changing any crawler's setting:

If you use Cloudflare for your site, finding the settings takes less than a minute:

  • Step one: go to dash.cloudflare.com and pick your site
  • Step two: from the side menu, choose AI Crawl Control
  • Step three: from the submenu, choose Security
  • Step four: you'll see a table of every crawler, with a Block Crawler column on the right where you control each one

How to read the toggle

The last column is called Block Crawler. Because the column is about blocking, a switched-on toggle means the crawler is blocked, which can feel backwards at first:

  • Blue = blocked 🚫: the crawler can't reach your site
  • Gray = allowed ✅: the crawler can visit your site freely

The Cloudflare interface showing the crawler list and each toggle's state

✅ Allow them: these help you

Googlebot (Google)

The most important one. Without it, your site won't show up in Google. Blocking it means disappearing from the biggest search engine in the world. Never block it.

BingBot (Microsoft)

Feeds Bing search and Microsoft Copilot. Good for showing up across Microsoft's products.

OAI-SearchBot (OpenAI)

Not the same as GPTBot. This one is for ChatGPT Search and brings you real visitors when people search inside ChatGPT.

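Claude-SearchBot + ClaudeBot + Claude-User (Anthropic)

Used when Claude searches the web or opens a page for a user. Allowing them means your site can show up as a source in Claude's answers. One caveat: ClaudeBot is Anthropic's general crawler, and Anthropic says the content it collects may be used for model training, much like GPTBot.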

PerplexityBot + Perplexity-User (Perplexity AI)

Perplexity is an AI search engine used by millions every day. Allowing these crawlers lets your site appear as a source in its answers.

DuckAssistBot (DuckDuckGo)

DuckDuckGo is popular with people who care about privacy, so showing up in its AI answers can bring you privacy-minded visitors.

Applebot (Apple)

Feeds Spotlight Search, Siri, and Apple Intelligence. Useful if your audience uses Apple devices.

Cloudflare Crawler (Cloudflare)

Cloudflare's own crawler. Never block it: it's part of how the service works.

🚫 Block them: no benefit to you

GPTBot (OpenAI)

Collects your content to train OpenAI's models, including the ones behind ChatGPT. It brings you no visitors. Don't mix it up with OAI-SearchBot: that one is useful, this one isn't.

CCBot (Common Crawl)

Builds Common Crawl's huge open web dataset, which many large AI models are trained on. No real benefit to your site.

FacebookBot + Meta-ExternalAgent + Meta-ExternalFetcher (Meta)

All three collect data for Meta's AI. They don't feed a search engine and bring you no visitors.

Bytespider + TikTok Spider (ByteDance)

ByteDance, TikTok's parent company, collects data to train its AI models. No benefit for indie sites and blogs.

PetalBot (Huawei)

Huawei's search engine isn't common outside China. Not worth your server resources.

Various crawlers (multiple parties)

Novellum AI Crawl, ProRataInc, Terracotta Bot, Timpibot, Manus Bot, Anchor Browser, Arquivo, Amazonbot. Most come from smaller companies or projects, and they use up your bandwidth without sending you visitors.
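If you aren't on Cloudflare, or want a second layer, the same block list can be written as a robots.txt file. One big difference: a Cloudflare block stops the crawler from reaching your site, while robots.txt is only a request that well-behaved crawlers choose to honor. Here's a minimal Python sketch that builds such a file from the block list above. The user-agent tokens are my assumption of the names each operator uses, so double-check them in each crawler's own documentation; crawlers whose tokens I wasn't sure of (like Novellum AI Crawl or Manus Bot) are left out.

```python
# A minimal sketch: build a robots.txt that asks the "block" crawlers
# from this article to stay away, while leaving every other crawler allowed.
# ASSUMPTION: these user-agent tokens match what each operator documents;
# verify each one before relying on it.

BLOCKED = [
    "GPTBot",              # OpenAI, training crawler
    "CCBot",               # Common Crawl
    "FacebookBot",         # Meta
    "meta-externalagent",  # Meta
    "Bytespider",          # ByteDance
    "PetalBot",            # Huawei
    "Amazonbot",           # Amazon
]


def build_robots_txt(blocked: list[str]) -> str:
    """Return robots.txt text with one 'Disallow: /' group per blocked agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    # Catch-all group: any crawler not named above may crawl the whole site.
    groups.append("User-agent: *\nAllow: /")
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(BLOCKED), end="")
```

Save the output as robots.txt at the root of your domain. Crawlers that aren't named, such as Googlebot or OAI-SearchBot, fall under the `User-agent: *` group and stay allowed.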

🤔 Optional: depends on your goal

ChatGPT-User (OpenAI)

Lets ChatGPT open your pages when a user asks it to. Useful if you want your site to show up as a source inside ChatGPT chats.

MistralAI-User (Mistral)

Mistral is a growing European AI company. Allowing its crawler gives you more reach in the European AI space.

Google-CloudVertexBot (Google)

A Google crawler that's separate from Googlebot. It serves Google's AI services on Google Cloud (Vertex AI) rather than Google Search.

archive.org_bot (Internet Archive)

Lets the Wayback Machine save copies of your site over time. Nice for archiving, but not essential.
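To see which of these groups your own traffic falls into, you can sort the User-Agent strings in your server logs. Here's a rough Python sketch that maps a User-Agent to this article's three groups by looking for each crawler's name as a case-insensitive substring. The matching rule and the sample strings are my assumptions, not official formats, and since any client can put any name in its User-Agent, treat the result as a first pass rather than proof.

```python
# A rough sketch: sort User-Agent strings into this article's groups.
# ASSUMPTION: each crawler's name appears somewhere in its User-Agent;
# real headers vary, and any client can claim to be any crawler.

POLICY = {
    "allow": [
        "Googlebot", "Bingbot", "OAI-SearchBot", "Claude-SearchBot",
        "ClaudeBot", "Claude-User", "PerplexityBot", "Perplexity-User",
        "DuckAssistBot", "Applebot",
    ],
    "block": [
        "GPTBot", "CCBot", "FacebookBot", "meta-externalagent",
        "meta-externalfetcher", "Bytespider", "PetalBot", "Amazonbot",
        "Timpibot",
    ],
    "optional": [
        "ChatGPT-User", "MistralAI-User", "Google-CloudVertexBot",
        "archive.org_bot",
    ],
}


def classify(user_agent: str) -> str:
    """Return 'allow', 'block', 'optional', or 'unknown' for a User-Agent."""
    ua = user_agent.lower()
    # Groups are checked in dict order; the first name found wins.
    for group, names in POLICY.items():
        if any(name.lower() in ua for name in names):
            return group
    return "unknown"


if __name__ == "__main__":
    # Hypothetical, simplified sample strings for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; GPTBot/1.1)",
        "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
        "curl/8.5.0",
    ]
    for s in samples:
        print(classify(s), "<-", s)
```

Anything that comes back as "unknown" is worth a closer look in your Cloudflare dashboard before you decide what to do with it.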

In short

Your site is worth seeing. Just make sure the right doors are open: let in the crawlers that send you visitors, and keep out the ones that only take.
