One day I opened my site's Cloudflare dashboard and found a long list of bots. These are crawlers: automated programs that keep requesting pages from your site. Some of them are useful. They index your pages for search engines and send visitors your way. Others just use up your server time and copy your content to train AI models, and you get nothing back.
In this guide I share what I learned: what these crawlers are, who sends them, and how to decide for each one whether to allow it or block it.
How do you find these settings?
The video below shows the full steps, from opening Cloudflare to changing any crawler's setting:
If you use Cloudflare for your site, finding the settings takes less than a minute:
- Step one: go to dash.cloudflare.com and pick your site
- Step two: from the side menu, choose AI Crawl Control
- Step three: from the submenu, choose Security
- Step four: you'll see a table of every crawler with a Block Crawler column on the right. This is where you control each one
How to read the toggle
The last column is called Block Crawler. The toggle switches blocking on, not access, so its colors mean the opposite of what you might expect:
- Blue = blocked 🚫: the crawler can't reach your site
- Gray = allowed ✅: the crawler can visit your site freely

The Cloudflare interface showing the crawler list and each toggle's state
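If the inverted colors keep tripping you up, the rule can be written down as a tiny Python sketch. The function name `is_blocked` is my own invention for illustration. It only encodes the color rule described above and doesn't talk to Cloudflare.

```python
def is_blocked(toggle_color: str) -> bool:
    """Return True when the Block Crawler toggle color means the crawler is blocked."""
    color = toggle_color.strip().lower()
    if color == "blue":
        # Toggle is on: the block is active, so the crawler can't reach the site.
        return True
    if color == "gray":
        # Toggle is off: nothing is blocked, so the crawler is allowed.
        return False
    raise ValueError(f"unexpected toggle color: {toggle_color!r}")
```

The thing to remember is that the toggle answers "is the block on?", not "is the crawler allowed?".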
✅ Allow them, these help you
Googlebot is the most important one. Without it your site won't show up in Google. Blocking it means disappearing from the biggest search engine in the world. Never block it.
Feeds Bing and also Copilot from Microsoft. Good for showing up across Microsoft's products.
OAI-SearchBot is not the same as GPTBot. This one is for ChatGPT Search and brings you real visitors when people search inside ChatGPT.
Used when Claude searches the web. Allowing it means your site can show up as a source in Claude's answers.
An AI search engine used by millions every day. Allowing it makes your site a source in its answers.
Liked by many for privacy reasons. Showing up here brings you privacy-minded users.
Feeds Spotlight Search, Siri, and Apple Intelligence. Useful if your audience uses Apple devices.
Cloudflare's own crawler. Never block it: it's part of how the service works.
🚫 Block them, no benefit
GPTBot takes your content to train ChatGPT and brings you no visitors. Don't mix it up with OAI-SearchBot: that one is useful, this one isn't.
Takes huge amounts of data to train most big AI models. No real benefit to your site.
All of Meta's crawlers take data for Meta AI. They don't feed any search engine and bring you no visitors.
TikTok's parent company takes data to train its models. No benefit for indie sites and blogs.
Huawei's search engine isn't common outside China. Not worth your server resources.
Novellum AI Crawl, ProRataInc, Terracotta Bot, Timpibot, Manus Bot, Anchor Browser, Arquivo, Amazonbot. Most of these come from smaller operators (Amazonbot, which belongs to Amazon, is the obvious exception), and all of them use up your bandwidth with no benefit.
🤔 Optional, depends on your goal
Lets ChatGPT read your site when needed. Useful if you want your site to show up as a source inside ChatGPT chats.
A growing European AI model. Allowing it gives you more reach in the European AI space.
A Google AI crawler that's separate from Googlebot. Feeds Gemini and Google AI services.
The Wayback Machine saves copies of your site over time. Nice for archiving, but not needed.
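One way to keep track of these decisions outside the dashboard is a small lookup table. The Python sketch below is only an illustration, not anything Cloudflare provides: the `Decision` enum, the `DECISIONS` table, and the `decide` helper are names I made up. It lists only the crawlers this guide mentions by name, and it assumes you look them up by the name shown in the dashboard.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"    # leave the Block Crawler toggle off (gray)
    BLOCK = "block"    # turn the Block Crawler toggle on (blue)
    REVIEW = "review"  # not in the table yet, decide by hand


# Hypothetical table: only crawler names spelled out in this guide.
# Keys are lowercase so lookups can ignore case.
DECISIONS = {
    "googlebot": Decision.ALLOW,
    "oai-searchbot": Decision.ALLOW,
    "gptbot": Decision.BLOCK,
    "novellum ai crawl": Decision.BLOCK,
    "proratainc": Decision.BLOCK,
    "terracotta bot": Decision.BLOCK,
    "timpibot": Decision.BLOCK,
    "manus bot": Decision.BLOCK,
    "anchor browser": Decision.BLOCK,
    "arquivo": Decision.BLOCK,
    "amazonbot": Decision.BLOCK,
}


def decide(crawler_name: str) -> Decision:
    """Look up a crawler by name, ignoring case and surrounding spaces."""
    return DECISIONS.get(crawler_name.strip().lower(), Decision.REVIEW)


if __name__ == "__main__":
    for name in ("Googlebot", "GPTBot", "SomeNewBot"):
        print(name, "->", decide(name).value)
```

Anything the table doesn't know falls back to `REVIEW`, which fits the optional group: those are the ones where the answer depends on your goal.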
In short
Your site is worth seeing. Just make sure the right doors are open.
