AI Crawler Access Checker

How to use robots.txt to manage AI bots

Check your website’s robots.txt for AI crawler access

Enter your domain below (no need to include http:// or https://).

    What is robots.txt?

    robots.txt is a simple text file placed at the root of your website that tells crawlers which parts of your site they can or can’t access. Originally designed for search engines, it’s now also used to manage AI bots like GPTBot or ClaudeBot. By setting rules, you can allow or block specific AI crawlers from reading your content, giving more control over how your site is used in AI training or AI-powered tools.

    How do you block AI bots using robots.txt?

    You can block AI crawlers by adding specific rules to your robots.txt file. For example:

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    This tells those bots not to crawl any part of your site. If you want to allow them back in, simply remove these lines from the file.
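    You can also scope rules per bot instead of blocking everything. For example, this hypothetical robots.txt blocks GPTBot entirely, keeps ClaudeBot out of one illustrative directory (/private/ is a placeholder path), and leaves the rest of the site open to all other crawlers:

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /private/

    User-agent: *
    Allow: /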

    Should you block AI crawlers?

    That depends. Restricting access can limit how your content is used in training models or other tools, while allowing them may increase your visibility in AI-driven platforms.

    Unfortunately, there is no universal right or wrong answer – it comes down to how you want your content to be discovered and used.

    Read more on our blog:

    To Block or Bot to Block… Should You Block AI Bots from Crawling Your Website?

    As a wise playwright once said, to block or bot to block, that is the question. If Shakespeare were alive today, would he have been proud that Google’s AI chatbot began life named after a term he is commonly known by? Sadly, we shall never know. However, what we do know is that AI…

    Similar MRS Digital resources:


    Meta Length Checker

    Our famous Meta Length Checker tool helps you quickly evaluate the length of your meta descriptions and visualise how your page might appear in search results.


    LLMs.txt Validator

    Our LLMs.txt Validator tool helps you check that your file is correctly structured, so AI crawlers can read your website as intended, giving your brand the attention it deserves.


    CPA Calculator

    Our CPA Calculator is designed to show how each metric affects your CPA, giving you the insights needed to optimise performance and hit your goals.

    What are common AI User-Agents?

    When you use our AI Crawler Access Checker, your results will list the crawlers that are allowed to access your site, identified by their User-Agent strings. The lists below show which bots are AI-powered and which are not:

    AI-related User Agents

    GPTBot – OpenAI web crawler

    ChatGPT-User – OpenAI browsing activity

    ClaudeBot – Anthropic AI crawler

    Claude-Web – Anthropic web crawler

    anthropic-ai – Anthropic agent

    cohere-ai – Cohere model training bot

    Bytespider – ByteDance AI crawler

    Google-Extended – Google AI data expansion

    Google-CloudVertexBot – Google Vertex AI

    PerplexityBot – Perplexity.ai crawler

    Perplexity-User – Perplexity.ai browsing user

    OAI-SearchBot – OpenAI search crawling bot

    meta-externalagent – Meta data collection for AI

    OpenAI – OpenAI agent

    Non-AI bots

    Amazonbot – Amazon indexing

    Applebot-Extended – Apple Siri/Spotlight crawler

    FacebookExternalHit – Facebook link preview

    CCBot – Common Crawl, supports AI training

    Scrapy – open-source scraping framework

    TurnitinBot – plagiarism detection

    magpie-crawler – Brandwatch social media monitoring

    omgili / omgilibot – forums/thread indexing

    Twitterbot – Twitter link previews

    PetalBot – web indexing

    YandexAdditional / YandexAdditionalBot – Yandex indexing
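    As a rough sketch of what a checker like this does under the hood, Python's standard urllib.robotparser module can evaluate a robots.txt file against the AI user agents listed above. The robots.txt content below is hypothetical, used only for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

# A few of the AI user agents from the list above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return {agent: allowed?} for each user agent against the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

results = check_access(ROBOTS_TXT, AI_AGENTS, "https://example.com/")
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

    Note that robotparser matches user agents case-insensitively, so agents without their own rule group (PerplexityBot and Google-Extended here) fall through to the `User-agent: *` rules.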

    AI crawler FAQs

    What is an AI crawler?

    An AI crawler is an automated bot used by AI tools to access, read and analyse website content. Unlike traditional search engine crawlers, which gather pages to serve search results, AI crawlers typically collect content for model training, summarisation or AI-powered answers.

    How do AI crawlers work?

    AI crawlers are automated programs that visit websites to access and process content for AI systems. They follow standard web crawling rules (such as robots.txt and meta tags) to decide which pages to crawl. Once collected, they analyse text, images and structured data to help AI models understand and generate responses from that content.
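    For example, besides robots.txt, a page can signal rules to compliant crawlers through a robots meta tag in its HTML head (support for such directives varies between AI crawlers):

    <meta name="robots" content="noindex, nofollow">

    This asks compliant bots not to index the page or follow its links.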

    What is the difference between a bot and a crawler?

    A bot is any automated software that performs online tasks, such as chatbots, monitoring tools or spam detection. A crawler (or spider) is a type of bot specifically designed to navigate websites, read content and then index it. Essentially, all crawlers are bots, but not all bots are crawlers.

    How can I check if AI crawlers can access my site?

    You can use our AI Crawler Access Checker tool to instantly see which AI bots are allowed or blocked by your site’s robots.txt file.

    Are AI web crawlers legal?

    Yes, AI web crawlers are generally legal, provided they comply with web standards, terms of service and privacy regulations. Reputable AI crawlers respect robots.txt rules and site permissions.

    How often should I check AI crawler access?

    We recommend regularly monitoring AI crawler access, especially if you update your site, publish new content or make changes to your robots.txt file. Frequent checks help ensure your access preferences remain correct.