Quick Presets
Allow All Bots
Allow all crawlers to access entire site
Block All Bots
Block all crawlers from the entire site
Standard Website
Block admin, API, private; allow rest
WordPress
Optimized rules for WordPress sites
E-Commerce
Block cart, checkout, account pages
Block AI Crawlers
Block GPTBot, CCBot, Google-Extended, etc.
; TechOreo — Robots.txt Generator
; Browser-based · No upload · 100% private
; ──────────────────────────────────────
; Select a preset or build your own rules

About This Robots.txt Generator

What This Tool Generates

This robots.txt generator creates complete, valid robots.txt files with all standard directives. Everything runs in your browser — no file is uploaded to any server.

  • 6 presets — Allow All Bots, Block All Bots, Standard Website, WordPress, E-Commerce, and Block AI Crawlers, each pre-configured with the right rules
  • Multiple User-agent blocks — configure different rules for Googlebot, Bingbot, GPTBot, CCBot, Google-Extended, ClaudeBot, Bytespider, and 11 other named bots, or enter any custom agent name
  • Allow and Disallow rules — per block, with path input and a quick-add panel for 15 common paths
  • Crawl-delay — per user-agent block, with a warning when it is set for bots that ignore it (notably Googlebot)
  • Multiple sitemaps — a primary sitemap field plus unlimited additional Sitemap: entries
  • Import — paste or upload an existing robots.txt file and the tool parses it into editable blocks
  • Validation — checks for wildcard coverage, sitemap URL format, path format errors, file size, and crawl-delay warnings
  • Generation log — a terminal panel shows timestamped events for every action

How to Use This Tool

  • Click a Quick Preset to instantly populate the editor with rules for your site type (WordPress, e-commerce, block AI, etc.)
  • Enter your Website URL and Sitemap URL in Site Settings — these are auto-filled into the output
  • Use Add User Agent Block to create rules for specific bots (e.g. block GPTBot while allowing Googlebot)
  • Click Quick Add path buttons to instantly add common Disallow rules like /admin/, /wp-admin/, or /cart/
  • Toggle Add Comments to include a header comment block with your website URL and generation date
  • Click Validate to run all checks — fix any red errors before deploying
  • Click Generate robots.txt then Download robots.txt to save the file
  • Upload the downloaded file to your web server root so it is accessible at yourdomain.com/robots.txt
  • Use Import to load and edit an existing robots.txt file without starting from scratch
  • Press Ctrl+Enter to generate and copy in one keystroke

Robots.txt Complete Reference Guide

Core Directives

User-agent starts a new block and names the bot that the following rules apply to; use * to match all bots. Disallow blocks a path; an empty Disallow value means allow everything. Allow overrides a Disallow for a more specific path (e.g. disallow /wp-admin/ but allow /wp-admin/admin-ajax.php, which some themes require). Crawl-delay requests a pause between successive requests; Bing and Yandex respect it, Google ignores it. The Sitemap directive gives crawlers the absolute URL of your XML sitemap; it can appear anywhere in the file, though by convention it is placed at the end.
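Combined, the directives above produce a file like this (a minimal sketch; the domain and paths are placeholders):

```
# Rules for every crawler
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 10

# Absolute URL; independent of any User-agent block
Sitemap: https://example.com/sitemap.xml
```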

SEO Best Practices

A well-written robots.txt protects your crawl budget (roughly, the number of pages Google will crawl on your site per day). For WordPress sites, always disallow /wp-admin/, /wp-includes/, /wp-content/plugins/, and /wp-content/cache/, while allowing /wp-admin/admin-ajax.php for AJAX-dependent themes. For e-commerce sites, disallow /cart/, /checkout/, /account/, and faceted-navigation parameters such as /*?*sort= and /*?*filter=, so that near-duplicate filtered pages do not waste crawl budget.
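Put together, the recommendations above might look like this (paths are the common defaults; adjust to your site):

```
User-agent: *
# WordPress internals
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Allow: /wp-admin/admin-ajax.php
# E-commerce pages with no search value
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
# Faceted navigation parameters
Disallow: /*?*sort=
Disallow: /*?*filter=
```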

Blocking AI Crawlers

Since 2023, AI companies have released their own crawlers to collect training data. The major ones are GPTBot (OpenAI/ChatGPT), CCBot (Common Crawl, used by many AI datasets), Google-Extended (Google Gemini and Vertex AI), anthropic-ai and ClaudeBot (Anthropic), and Bytespider (ByteDance/TikTok). To block all of them, use this generator's "Block AI Crawlers" preset, which creates a separate User-agent block with Disallow: / for each one. Most of these bots respect the robots.txt standard, though compliance varies by operator.
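The preset produces one block per bot, along these lines:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /
```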

Common Mistakes

  • Using Disallow: / on the wildcard block — blocks your entire site from all crawlers
  • Blocking CSS and JavaScript files — prevents Google from rendering and understanding your pages
  • Using robots.txt to hide private content — it is advisory only and provides no security
  • Blocking pages with noindex meta tags — crawlers cannot read the noindex if they cannot access the page
  • Using relative URLs in the Sitemap directive — the URL must be absolute, e.g. https://example.com/sitemap.xml
  • Missing wildcard (User-agent: *) block — crawlers not named in any block are left completely unrestricted
  • Setting Crawl-delay for Googlebot — Google ignores it; use Search Console instead

Robots.txt Generator FAQ

What is a robots.txt file?

A robots.txt file is a plain text file placed at the root of your website (https://example.com/robots.txt) that tells search engine crawlers which pages or sections they are allowed to crawl. It follows the Robots Exclusion Protocol. Each subdomain requires its own robots.txt: the file at www.example.com does not apply to blog.example.com. Serve the file with a 200 HTTP status code. A 404 means no restrictions apply; a 5xx server error may cause Google to temporarily pause crawling your entire site.

What is the difference between Disallow and noindex?

Disallow in robots.txt prevents a crawler from visiting a URL, but does not prevent that URL from appearing in search results; Google may still index and rank it based on links from other pages. The noindex meta tag tells Google to crawl the page but exclude it from search results. Never use Disallow on pages that carry a noindex tag: the crawler cannot read the noindex instruction if it cannot access the page at all. Use noindex to remove pages from search results, and Disallow to reduce server load from unnecessary crawling.
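To sanity-check Disallow and Allow rules before deploying, you can test them with Python's standard-library urllib.robotparser (the rules and paths below are hypothetical; note that this parser applies rules in file order rather than by longest match, so list Allow exceptions before the broader Disallow they carve out):

```python
from urllib import robotparser

# Hypothetical rules: block /private/ but carve out one public page
rules = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/private/secret.html"))       # False: matches Disallow
print(rp.can_fetch("*", "/private/public-page.html"))  # True: Allow exception
print(rp.can_fetch("*", "/blog/post"))                 # True: no rule matches
```

In production you would call rp.set_url("https://example.com/robots.txt") and rp.read() instead of parsing a string.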

How do I block AI crawlers like GPTBot?

Add a separate User-agent block with Disallow: / for each AI bot you want to block. Major AI crawlers include GPTBot (OpenAI/ChatGPT), CCBot (Common Crawl, used by many AI training datasets), Google-Extended (Google Gemini and Vertex AI), anthropic-ai and ClaudeBot (Anthropic Claude), Bytespider (ByteDance/TikTok), and PerplexityBot (Perplexity AI). Use the "Block AI Crawlers" preset in this generator to add all major AI bots at once. Most of these bots respect robots.txt, though compliance varies by operator.

Does robots.txt affect SEO?

Yes. Blocking important pages prevents Google from crawling and ranking them, while allowing access to low-value pages (admin panels, duplicate content, faceted navigation) wastes crawl budget. The most damaging common mistake is accidentally blocking CSS and JavaScript files, which prevents Google from rendering your pages correctly and can cause significant ranking drops. Other harmful mistakes include using Disallow: / in the wildcard block (which blocks your entire site), disallowing pages whose noindex tag still needs to be read, and using robots.txt to try to hide private content (it provides zero security).

What does Crawl-delay do?

Crawl-delay asks crawlers to wait a specified number of seconds between consecutive requests, which can help if crawler traffic is overwhelming your server. Key caveats: Google officially ignores Crawl-delay (Googlebot manages its own crawl rate, and Google crawl issues are handled through Search Console), while Bing and Yandex do respect it. A high value sharply limits discovery of new pages: at Crawl-delay: 60, a compliant crawler can fetch at most 1,440 pages per day (86,400 seconds ÷ 60). For most sites with adequate server resources, Crawl-delay is unnecessary and not recommended.
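A typical targeted use, assuming (hypothetically) that Bingbot is the bot causing load:

```
# Ask Bing to wait 10 seconds between requests;
# the empty Disallow leaves all paths crawlable
User-agent: Bingbot
Disallow:
Crawl-delay: 10
```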

What wildcards does robots.txt support?

Two wildcard characters are supported. The asterisk (*) matches any sequence of characters: Disallow: /search* blocks /search, /search?q=, /search/results/, and any other URL starting with /search. The dollar sign ($) matches the end of a URL: Disallow: /*.pdf$ blocks all URLs ending in .pdf while still allowing access to directories containing PDFs. Wildcards can be combined: Disallow: /*?*sort= blocks any URL containing the query parameter sort=. Google and all other major crawlers support these wildcards.
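The three patterns from the paragraph above, in file form:

```
User-agent: *
# Any URL starting with /search
Disallow: /search*
# Any URL ending in .pdf
Disallow: /*.pdf$
# Any URL whose query string contains sort=
Disallow: /*?*sort=
```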

Does robots.txt protect private content?

No. Robots.txt is advisory, not a security mechanism. It tells well-behaved bots what to avoid, but does not prevent access; anyone can read your robots.txt and use it as a map of your private sections. A URL blocked in robots.txt can still appear in search results if other sites link to it. To protect private content, use server-side authentication (login walls, passwords), proper server access controls (.htaccess rules, firewalls), or move sensitive data off publicly accessible servers entirely. Server authentication is the real protection; a robots.txt Disallow can be layered on top to discourage crawling, but remember that a Disallow prevents crawlers from ever seeing a noindex tag.
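As a sketch of real access control (Apache shown; all paths are hypothetical), HTTP Basic authentication denies crawlers and unauthenticated visitors alike, regardless of robots.txt:

```
# .htaccess in the directory to protect
AuthType Basic
AuthName "Private area"
# Password file created with: htpasswd -c /var/www/.htpasswd username
AuthUserFile /var/www/.htpasswd
Require valid-user
```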