Robots.txt is a small text file that sits at the root of your website, for example https://yourdomain.com/robots.txt. Its job is simple: it tells crawlers which URL paths to crawl and which to skip.

Robots.txt does one thing well: it controls crawling. It does not do the one thing people often expect: it does not reliably stop indexing. Google states that a URL blocked in robots.txt might still appear in Search results if Google finds it through links, even though Google will not crawl the content.

If your goal is rankings for real pages, robots.txt is a safety-critical file. One wrong line can block discovery, rendering, and refresh crawling for key URLs.

Note: robots.txt is not a security measure; bad bots ignore it. It is also not an indexing control; a blocked URL might still get indexed as a URL-only result if links to it exist. If you need a page removed from Google results, use noindex, password protection, or the Removals tool in Search Console instead.

Important sequencing point: if you block crawling of a URL in robots.txt, Googlebot will not fetch the page, so it never sees a noindex tag on it. Combining robots.txt blocking with noindex on the same URL is a common failure pattern.

A robots.txt file is plain text. It uses groups: each group begins with a User-agent line, followed by rules such as Disallow and Allow. The core fields are User-agent, Disallow, Allow, and Sitemap.

A safe starting robots.txt for most sites uses a single group for all crawlers. That pattern is common on WordPress because wp-admin is not useful for search, while admin-ajax.php supports site functionality.

Example 1 blocks internal search result pages. Example 2 blocks parameter crawl traps. Example 3 allows a single file inside a blocked folder. Example 4 blocks image types in a folder. Wildcard-style matching is commonly supported by major crawlers, though syntax support varies by crawler.

Robots.txt for Shopify, WordPress, and custom sites: on WordPress, robots.txt editing depends on your setup. Many sites manage it through SEO plugins or by placing a physical robots.txt file at the site root.
Shopify: many themes support a robots.txt template approach, so edits follow Shopify's rules and deployment flow. Treat every change as a production change and test every edit.

For custom and headless builds, put robots.txt at the root of the primary crawl host. If you run multiple subdomains, each host needs its own robots.txt file. Only one location matters for discovery: the crawler requests https://host/robots.txt before crawling URLs on that host.

Mistakes to avoid:

Blocking the entire site. This wipes crawling, then rankings drop.
Blocking CSS and JS needed for rendering.
Putting Allow or Disallow before User-agent.
Blocking important pages by pattern.
Trying to deindex with robots.txt.

A quick audit process:

Step 1: fetch the live file and save a copy.
Step 2: list your money URLs and confirm none of their paths are blocked by a broad rule.
Step 3: list low value URL patterns worth blocking.
Step 4: confirm the Sitemap line exists and is correct.
Step 5: check crawl behaviour in Search Console.
Step 6: protect against deployment mistakes.

Google provides robots.txt testing inside Search Console, under legacy tools in some interfaces. Treat testing as a key step before publishing changes.

Testing workflow used by technical SEO teams

What robots.txt does for SEO
What robots.txt does not do for SEO
Robots.txt syntax
Core fields
User-agent: targets a crawler name. User-agent: * targets all crawlers.
Disallow: tells the specified user-agent to skip crawling a path.
Allow: overrides a broader Disallow rule for a more specific path.
Sitemap: points crawlers to your XML sitemap location. It is not tied to a user-agent group, and multiple Sitemap lines are supported.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml

Robots.txt examples for real SEO situations
Internal search pages often create thin, duplicate, or infinite URL sets.
User-agent: *
Disallow: /search
Disallow: /?s=
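Prefix rules like these can be sanity-checked with Python's standard-library urllib.robotparser (a quick sketch; note this parser does not understand wildcard patterns, so use it only for plain path prefixes):

```python
from urllib.robotparser import RobotFileParser

# The example rules above, pasted as a string.
rules = """\
User-agent: *
Disallow: /search
Disallow: /?s=
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Internal search URLs are blocked for all crawlers.
print(parser.can_fetch("*", "https://example.com/search?q=shoes"))     # False
# Normal content URLs remain crawlable.
print(parser.can_fetch("*", "https://example.com/blog/robots-guide"))  # True
```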
If your site generates many URL parameters, block the worst offenders first. Keep rules narrow. Overblocking causes ranking drops.
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*&sort=
Disallow: /*&filter=
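Google-style wildcard matching, where * matches any run of characters and a trailing $ anchors the end of the URL, can be illustrated with a small regex translation (a sketch of the matching idea only, not any crawler's actual implementation):

```python
import re

def robots_pattern_matches(pattern: str, url_path: str) -> bool:
    """Illustrative matcher: '*' matches any characters, a trailing
    '$' anchors the match to the end of the URL; otherwise rules
    match as prefixes, like plain robots.txt paths."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    # re.match anchors at the start, giving prefix semantics by default.
    return re.match(regex + ("$" if anchored else ""), url_path) is not None

print(robots_pattern_matches("/*?sort=", "/shoes?sort=price"))  # True
print(robots_pattern_matches("/*?sort=", "/shoes/red"))         # False
```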
User-agent: *
Disallow: /private/
Allow: /private/press-release.pdf
User-agent: *
Disallow: /assets/private-images/*.jpg$
Disallow: /assets/private-images/*.png$

Common robots.txt mistakes that hurt rankings
Blocking the entire site:

User-agent: *
Disallow: /
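The effect of this two-line file is easy to demonstrate with urllib.robotparser (a quick sketch using a hypothetical host):

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# Every URL on the host is now off limits to compliant crawlers.
print(parser.can_fetch("Googlebot", "https://example.com/"))             # False
print(parser.can_fetch("Googlebot", "https://example.com/top-product"))  # False
```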
Blocking CSS and JS needed for rendering: if you block resources required for page rendering, Google struggles to understand the layout and content. Keep critical assets crawlable unless there is a hard technical reason.
Putting Allow or Disallow before User-agent: rules placed before the first User-agent line are ignored by crawlers.
Blocking important pages by pattern: overbroad wildcard rules often catch URLs you wanted indexed. Start narrow, monitor impact, then expand.
Trying to deindex with robots.txt: robots.txt blocks crawling, not indexing. Use noindex or removal tooling when you need a page deindexed.

How to audit your robots.txt for SEO
Go to https://yourdomain.com/robots.txt and save a copy of the live file.
Money URLs to confirm are crawlable:
Home page
Category and collection pages
Top service pages
Top blog posts
Top product pages
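This check can be scripted with urllib.robotparser against a pasted copy of the live rules (a sketch; the URLs and rules below are hypothetical, and wildcard rules need a different tester):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical copy of the live robots.txt.
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /search
"""

# Hypothetical money URLs for the audit.
money_urls = [
    "https://yourdomain.com/",
    "https://yourdomain.com/collections/shoes",
    "https://yourdomain.com/blog/top-post",
]

parser = RobotFileParser()
parser.parse(rules.splitlines())

blocked = [u for u in money_urls if not parser.can_fetch("Googlebot", u)]
print(blocked)  # an empty list means no money URL is blocked
```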
Low value URL patterns worth blocking:
Internal search
Filter and sort parameters
Cart, checkout, account areas
Admin areas
Duplicate tag archives
Staging folders
The Sitemap field takes an absolute URL. Multiple Sitemap lines are valid.
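Checking the Sitemap field can be scripted by scanning the file for Sitemap lines and confirming each value is an absolute URL (a sketch; the example rules are hypothetical):

```python
from urllib.parse import urlparse

def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect the values of all Sitemap fields, case-insensitively."""
    urls = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

example = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/blog-sitemap.xml
"""

found = sitemap_urls(example)
print(found)
# Every entry should be absolute, with a scheme and a host.
print(all(urlparse(u).scheme and urlparse(u).netloc for u in found))  # True
```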
Look for spikes in crawled but not indexed URLs.
Look for heavy crawling on parameter URLs.
Adjust robots.txt with narrow rules first.
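Server logs give the same crawl signal without waiting on Search Console reports. A rough sketch, assuming you can extract (user agent, URL) pairs from your access log (the sample data is hypothetical):

```python
from urllib.parse import urlparse

# Hypothetical (user_agent, url) pairs pulled from an access log.
hits = [
    ("Googlebot", "https://shop.example/shoes?sort=price"),
    ("Googlebot", "https://shop.example/shoes?sort=newest"),
    ("Googlebot", "https://shop.example/shoes"),
    ("Googlebot", "https://shop.example/?filter=red"),
]

# Share of Googlebot requests spent on parameter URLs.
param_hits = [u for ua, u in hits if "Googlebot" in ua and urlparse(u).query]
print(f"{len(param_hits)} of {len(hits)} Googlebot hits were parameter URLs")
```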
Store robots.txt in version control.
Add a release checklist item for it.
Add monitoring that alerts if the file changes unexpectedly.

How to test robots.txt before it breaks SEO
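One lightweight guard, covering both the monitoring step above and pre-publish testing, is a content fingerprint check: compare the deployed file against the version-controlled copy on every release (a sketch; the file contents here are hypothetical):

```python
import hashlib

def fingerprint(robots_txt: str) -> str:
    """Stable content hash of robots.txt for change detection."""
    return hashlib.sha256(robots_txt.encode("utf-8")).hexdigest()

expected = fingerprint("User-agent: *\nDisallow: /wp-admin/\n")
deployed = fingerprint("User-agent: *\nDisallow: /\n")  # a bad deploy

if deployed != expected:
    print("robots.txt drifted from the committed version - investigate")
```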
Robots.txt for SEO, quick decision guide
FAQ, robots.txt for SEO
Does robots.txt improve rankings? Indirectly. Better crawl focus leads to faster discovery and refresh of important URLs on large sites. On small sites, the impact is often minimal unless crawl traps exist.
If I block a URL in robots.txt, will it disappear from Google? Not reliably. Google notes blocked URLs might still appear if other pages link to them.
Should I list my sitemap in robots.txt? Yes. Google's robots.txt spec supports Sitemap lines, and multiple Sitemap lines are valid.
Should I block my staging site? Yes, plus add authentication. Robots.txt alone is not enough.
What is the safest robots.txt change process? Make one small change, test, deploy, then monitor crawl and indexing. Large rewrites raise risk.

