About This Robots.txt Generator
What This Tool Generates
This robots.txt generator creates complete, valid robots.txt files with all standard directives. Everything runs in your browser — no file is uploaded to any server.
- 6 presets — Allow All Bots, Block All Bots, Standard Website, WordPress, E-Commerce, and Block AI Crawlers, each pre-configured with the right rules
- Multiple User-agent blocks — configure different rules for Googlebot, Bingbot, GPTBot, CCBot, Google-Extended, ClaudeBot, Bytespider, and 11 other named bots, or enter any custom agent name
- Allow and Disallow rules — per block, with path input and a quick-add panel for 15 common paths
- Crawl-delay — set per user-agent block, with a warning when it is applied to bots that ignore the directive (such as Googlebot)
- Multiple sitemaps — a primary sitemap field plus unlimited additional Sitemap: entries
- Import — paste or upload an existing robots.txt file and the tool parses it into editable blocks
- Validation — checks for wildcard coverage, sitemap URL format, path format errors, file size, and crawl-delay warnings
- Generation log — a terminal panel shows timestamped events for every action
How to Use This Tool
- Click a Quick Preset to instantly populate the editor with rules for your site type (WordPress, e-commerce, block AI, etc.)
- Enter your Website URL and Sitemap URL in Site Settings — these are auto-filled into the output
- Use Add User Agent Block to create rules for specific bots (e.g. block GPTBot while allowing Googlebot)
- Click Quick Add path buttons to instantly add common Disallow rules like /admin/, /wp-admin/, or /cart/
- Toggle Add Comments to include a header comment block with your website URL and generation date
- Click Validate to run all checks — fix any red errors before deploying
- Click Generate robots.txt then Download robots.txt to save the file
- Upload the downloaded file to your web server root so it is accessible at yourdomain.com/robots.txt
- Use Import to load and edit an existing robots.txt file without starting from scratch
- Press Ctrl+Enter to generate and copy in one keystroke
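For reference, a downloaded file with Add Comments enabled looks roughly like this. The URL, date, and rules are placeholders, and the exact header wording the tool emits may differ:

```
# robots.txt for https://example.com
# Generated: 2024-01-01

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```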
Robots.txt Complete Reference Guide
Core Directives
User-agent starts a new block and specifies which bot the following rules apply to. Use * for all bots. Disallow blocks a path — an empty Disallow value means allow everything. Allow overrides a Disallow for a more specific path (e.g. disallow /wp-admin/ but allow /wp-admin/admin-ajax.php which some themes require). Crawl-delay requests a pause between requests — respected by Bing and Yandex, ignored by Google. The Sitemap directive appears at the end and gives crawlers the absolute URL of your XML sitemap.
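Putting the five directives together, a minimal file might look like this (the paths and sitemap URL are placeholder examples):

```
# Rules for all bots
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Bing and Yandex honor Crawl-delay; Google ignores it
User-agent: Bingbot
Crawl-delay: 5
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
```

Note that rules apply to the most specific matching User-agent block: Bingbot follows only its own block here, not the wildcard rules.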
SEO Best Practices
A well-written robots.txt protects your crawl budget — the number of pages Google will crawl per day. For WordPress sites, always disallow /wp-admin/, /wp-includes/, /wp-content/plugins/, and /wp-content/cache/, while allowing /wp-admin/admin-ajax.php for AJAX-dependent themes. For e-commerce sites, disallow /cart/, /checkout/, /account/, and faceted navigation parameters like /*?*sort= and /*?*filter= to prevent duplicate content from inflating your crawl budget.
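As a sketch, the WordPress and e-commerce rules above combined into a single wildcard block (adjust the paths to match your site's structure):

```
User-agent: *
# WordPress internals
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Allow: /wp-admin/admin-ajax.php
# E-commerce pages and faceted navigation
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /*?*sort=
Disallow: /*?*filter=
```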
Blocking AI Crawlers
Since 2023, AI companies have released their own crawlers to collect training data. The major ones are GPTBot (OpenAI/ChatGPT), CCBot (Common Crawl, used by many AI datasets), Google-Extended (Google Gemini and Vertex AI), anthropic-ai and ClaudeBot (Anthropic), and Bytespider (ByteDance/TikTok). To block all of them, use this generator's "Block AI Crawlers" preset, which creates a separate User-agent block with Disallow: / for each one. These bots are generally well-behaved and respect the robots.txt standard.
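Based on the preset description above, the generated file contains one block per bot, along these lines:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /
```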
Common Mistakes
- Using Disallow: / on the wildcard block — blocks your entire site from all crawlers
- Blocking CSS and JavaScript files — prevents Google from rendering and understanding your pages
- Using robots.txt to hide private content — it is advisory only and provides no security
- Blocking pages with noindex meta tags — crawlers cannot read the noindex if they cannot access the page
- Using relative URLs in the Sitemap directive — the URL must be absolute, starting with https://
- Missing wildcard block — bots not listed get no rules at all
- Setting Crawl-delay for Googlebot — Google ignores it; use Search Console instead
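For the first mistake above, the broken and corrected forms side by side (the /admin/ path is a placeholder):

```
# WRONG: a bare slash blocks the entire site for every crawler
User-agent: *
Disallow: /

# RIGHT: block only specific paths; everything else stays crawlable
User-agent: *
Disallow: /admin/
```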
Robots.txt Generator FAQ
What is a robots.txt file?
A robots.txt file is a plain-text file served at the root of a domain (e.g. https://example.com/robots.txt) that tells search engine crawlers which pages or sections they are allowed or not allowed to crawl. It follows the Robots Exclusion Protocol. Each subdomain requires its own robots.txt — the file at www.example.com does not apply to blog.example.com. The file must return a 200 HTTP status code. A 404 means no restrictions apply; a 5xx server error may cause Google to temporarily pause crawling your entire site.

What is the difference between Disallow and noindex?
Disallow in robots.txt prevents a crawler from visiting a URL, but does not prevent that URL from appearing in search results — Google may still index and rank it based on links from other pages. The noindex meta tag tells Google to crawl the page but exclude it from search results. Never use Disallow on pages that have a noindex tag — the crawler cannot read the noindex instruction if it cannot access the page at all. Use noindex to remove pages from search results, and Disallow to reduce server load from unnecessary crawling.

How do I block AI crawlers?
Add a User-agent block with Disallow: / for each AI bot you want to block. Major AI crawlers include: GPTBot (OpenAI/ChatGPT), CCBot (Common Crawl — used by many AI training datasets), Google-Extended (Google Gemini and Vertex AI), anthropic-ai and ClaudeBot (Anthropic Claude), Bytespider (ByteDance/TikTok), and PerplexityBot (Perplexity AI). Use the "Block AI Crawlers" preset in this generator to add all major AI bots at once. These bots are generally well-behaved and will respect your robots.txt.

What are the most common robots.txt mistakes?
The most common mistakes are using Disallow: / on the wildcard block (blocks your entire site), blocking pages that need their noindex tag to be read, and using robots.txt to try to hide private content (it provides zero security).

How do wildcards work in robots.txt?
The asterisk (*) matches any sequence of characters — Disallow: /search* blocks /search, /search?q=, /search/results/, and any URL starting with /search. The dollar sign ($) matches the end of a URL — Disallow: /*.pdf$ blocks all URLs ending in .pdf while still allowing access to directories containing PDFs. These can be combined: Disallow: /*?*sort= blocks any URL containing the query parameter sort=. Google and all major crawlers support these wildcards.
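The wildcard patterns from the last answer, collected into one example block (the paths are illustrative):

```
User-agent: *
# * matches any sequence of characters
Disallow: /search*
# $ anchors the match to the end of the URL
Disallow: /*.pdf$
# Combined: any URL whose query string contains sort=
Disallow: /*?*sort=
```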