Test your robots.txt rules instantly. Paste your file, pick a crawler, and see exactly which URLs are allowed or blocked.
Go beyond testing. OneStepToRank continuously monitors how search engines crawl and index your site, alerting you to ranking changes across your entire service area.
A robots.txt file is a simple text document placed at the root of your website that communicates crawling instructions to search engine bots. When a crawler like Googlebot visits your site, the first thing it checks is https://yoursite.com/robots.txt. The file tells the crawler which pages or directories it may access and which it should skip. This mechanism is known as the Robots Exclusion Protocol, a standard that has been in use since 1994.
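For illustration, a minimal robots.txt (the paths and sitemap URL here are hypothetical) might look like this:

```
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
```

Each User-agent line starts a group of rules, and the rules in that group apply to any crawler whose name matches it.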
While robots.txt does not enforce access control (a misbehaving bot could ignore it), all major search engines and reputable AI crawlers honor it. Getting your robots.txt right is essential for controlling what gets indexed, protecting sensitive directories, managing crawl budget, and preventing AI models from training on your content.
This tool parses your robots.txt according to the same rules that Googlebot follows, including these key behaviors:
- Sections are matched by user-agent name; a crawler that matches no named section falls back to the User-agent: * wildcard section.
- The asterisk (*) matches any sequence of characters, and the dollar sign ($) anchors a pattern to the end of the URL. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf.
- Directive names (User-agent, Disallow) are case-insensitive, but URL paths are matched case-sensitively.

With the rise of large language models, many site owners want to prevent their content from being used as training data. The major AI companies have introduced specific user-agent strings that you can block.
You can block all AI crawlers while still allowing search engine crawlers to index your site. Use this tester to verify your rules work as intended, and our Robots.txt Generator to build a properly formatted file from scratch.
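As a sketch, rules like the following block several widely known AI training crawlers while leaving ordinary search bots untouched (user-agent names change over time, so verify them against each vendor's current documentation):

```
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Because crawlers use the most specific matching section, GPTBot reads only its own group and never falls through to the permissive wildcard section at the bottom.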
Even experienced webmasters make these mistakes with robots.txt:
- Omitting the trailing slash: Disallow: /admin blocks both /admin and /admin/page, but also /administrator. Use /admin/ to be more precise.

Pair this tester with our Schema Generator and SERP Previewer to ensure search engines can both access and attractively display your content.
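The trailing-slash pitfall is easy to reproduce with Python's standard-library urllib.robotparser, which uses the same simple prefix matching (note that this module does not support * or $ wildcards):

```python
import urllib.robotparser

# Without a trailing slash, the rule is a plain prefix match.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin",
])
print(rp.can_fetch("*", "https://example.com/administrator"))  # False: blocked!

# With a trailing slash, only the directory and its contents are blocked.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])
print(rp.can_fetch("*", "https://example.com/administrator"))  # True: allowed
print(rp.can_fetch("*", "https://example.com/admin/page"))     # False: blocked
```

The example.com URLs are placeholders; the parse() call accepts the file's lines directly, so no network fetch is needed to experiment.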
A robots.txt file is a plain text file placed at the root of your website (e.g., example.com/robots.txt) that tells search engine crawlers which pages they can and cannot access. It follows the Robots Exclusion Protocol and is the first file crawlers check before scanning your site.
Robots.txt supports two wildcard characters: the asterisk (*) matches any sequence of characters, and the dollar sign ($) anchors the match to the end of the URL. For example, "Disallow: /*.pdf$" blocks all URLs ending in .pdf, while "Disallow: /private*" blocks any URL path starting with /private.
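Googlebot-style wildcard matching can be sketched in a few lines by translating a robots.txt pattern into a regular expression (an illustrative matcher, not Google's actual implementation):

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt pattern matches a URL path."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"   # * matches any sequence of characters
        elif ch == "$":
            regex += "$"    # $ anchors the pattern to the end of the URL
        else:
            regex += re.escape(ch)
    # Rules match from the start of the path (prefix semantics).
    return re.match(regex, path) is not None

print(pattern_matches("/*.pdf$", "/downloads/report.pdf"))   # True
print(pattern_matches("/*.pdf$", "/downloads/report.pdfx"))  # False
print(pattern_matches("/private*", "/private-notes/a"))      # True
```

Without a $ anchor, every rule behaves as a prefix match, which is why /private* and /private block exactly the same set of URLs.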
That depends on your content strategy. Blocking AI crawlers prevents your content from being used to train language models. Many publishers block these crawlers to protect original content, while others allow them for broader visibility. You can selectively block AI crawlers while still allowing traditional search engine crawlers.
Not entirely. Robots.txt prevents crawlers from reading your page, but Google can still index the URL if other sites link to it. The result will appear with a note that the description is unavailable. To reliably prevent indexing, allow the page to be crawled and use a "noindex" meta tag or X-Robots-Tag header instead; if robots.txt blocks the page, crawlers can never see the noindex directive.
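As a sketch, the noindex directive is set in the page markup like this:

```html
<!-- In the page's <head> -->
<meta name="robots" content="noindex">
```

The equivalent HTTP response header is X-Robots-Tag: noindex, which also works for non-HTML resources such as PDFs.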