
Robots.txt for SEO, the practical guide (rules, examples, testing)

Xugar Blog
Sagar Sethi
26/02/2026

Robots.txt is a small text file that sits at the root of your website, for example https://yourdomain.com/robots.txt

Its job is simple. It tells crawlers which URL paths to crawl and which to skip. Robots.txt does one thing well. It controls crawling.

However, robots.txt does not do the one thing people expect: it does not reliably stop indexing. Google states that a URL blocked in robots.txt might still appear in Search results if Google finds it through links, even though Google will not crawl the content.

If your goal is rankings for real pages, robots.txt is a safety-critical file. Even one wrong line blocks discovery, rendering, and refresh crawling for key URLs.

What robots.txt does for SEO

  1. It protects crawl budget on large sites - Bots spend limited time crawling URLs. If they waste that time on low-value URLs, important pages refresh more slowly. Robots.txt helps focus crawling on the URLs that matter.
  2. It prevents crawl traps - Some site features generate endless URL variations. Typical sources are:
    • Internal search URLs
    • Faceted navigation with lots of parameter combos
    • Calendar pages
    • Session IDs
  3. It keeps crawlers out of areas that are irrelevant to search. Examples:
    • Admin paths
    • Staging paths
    • Temporary dev folders

Note: robots.txt is not a security measure. Bad bots ignore it.

What robots.txt does not do for SEO

Robots.txt is not an indexing control. A blocked URL might still get indexed as a URL-only result if links exist. If you need a page removed from Google results, use:

  • Meta robots noindex on the page
  • X-Robots-Tag header

Important sequencing point: If you block crawling of a URL in robots.txt, Googlebot will not fetch the page to see a noindex tag on it. So robots.txt plus noindex on the same URL is a common failure pattern.
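The sequence that actually removes a page: keep the URL crawlable, serve a noindex signal, and only add a robots.txt block (if at all) once the page has dropped out of the index. The noindex signal takes either of these two standard forms:

```
<!-- Option 1: meta tag in the page's <head> -->
<meta name="robots" content="noindex">

# Option 2: HTTP response header (works for PDFs and other non-HTML files)
X-Robots-Tag: noindex
```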

Robots.txt syntax

A robots.txt file is plain text. It uses groups. Each group begins with User-agent, then rules like Disallow and Allow.

Core fields

User-agent
Targets a crawler name. User-agent: * targets all crawlers.

Disallow
Tells the specified user-agent to skip crawling a path.

Allow
Overrides a broader Disallow rule for a more specific path.

Sitemap
Points crawlers to your XML sitemap location. It is not tied to a user-agent group, and multiple sitemap lines are supported.

A safe starting robots.txt for most sites

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml

This pattern is common on WordPress because wp-admin is not useful for search, while admin-ajax.php supports site functionality.
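You can sanity-check a file like this with Python's standard-library parser before deploying. A minimal sketch, with one caveat: urllib.robotparser does simple prefix matching only, with no wildcard support, and it may resolve overlapping Allow/Disallow rules differently from Google's longest-match logic (which is why the Allow line is left out here). Treat it as a first pass and use Google's own tooling as the authority. The example.com URLs are illustrative.

```python
from urllib import robotparser

# Parse the draft text directly instead of fetching a live URL.
rp = robotparser.RobotFileParser()
rp.parse("""User-agent: *
Disallow: /wp-admin/
""".splitlines())

print(rp.can_fetch("*", "https://www.example.com/wp-admin/options.php"))  # → False
print(rp.can_fetch("*", "https://www.example.com/blog/robots-guide"))     # → True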

Robots.txt examples for real SEO situations

Example 1: block internal search result pages
Internal search pages often create thin, duplicate, or infinite URL sets.

User-agent: *
Disallow: /search
Disallow: /?s=

Example 2: block parameter crawl traps
If your site generates many URL parameters, block the worst offenders first. Keep rules narrow. Overblocking causes ranking drops.

User-agent: *
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?filter=
Disallow: /*&filter=

Example 3: Allow a single file inside a blocked folder
User-agent: *
Disallow: /private/
Allow: /private/press-release.pdf

Example 4: block image types in a folder
User-agent: *
Disallow: /assets/private-images/*.jpg$
Disallow: /assets/private-images/*.png$
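To see how Google-style precedence plays out, here is a small illustrative Python model of the documented rules: the longest matching pattern wins, an allow rule wins a tie, and no match means the URL is crawlable. The function names are my own and this is a simplified sketch for experimentation, not a replacement for Google's parser.

```python
import re

def compile_rule(pattern: str) -> re.Pattern:
    # Escape regex metacharacters, then restore the two robots.txt wildcards:
    # '*' matches any sequence of characters, a trailing '$' anchors the URL end.
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile(regex)

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    # rules: ("allow" | "disallow", pattern) pairs for one user-agent group.
    # Longest matching pattern wins; on a tie, allow wins; no match = allowed.
    best_len, verdict = -1, True
    for kind, pattern in rules:
        if compile_rule(pattern).match(path):
            allowed = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allowed):
                best_len, verdict = len(pattern), allowed
    return verdict

rules = [("disallow", "/private/"), ("allow", "/private/press-release.pdf")]
print(is_allowed("/private/press-release.pdf", rules))  # → True
print(is_allowed("/private/annual-report.pdf", rules))  # → False
```

Note how the longer allow rule rescues the single PDF from the broader disallow, mirroring Example 3 above.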

Wildcard-style matching (* for any sequence of characters, $ for end of URL) is supported by major crawlers such as Googlebot and Bingbot, though support varies among smaller crawlers.

Robots.txt for Shopify, WordPress, and custom sites

WordPress: robots.txt editing depends on your setup. Many sites manage it through SEO plugins or by placing a physical robots.txt file at the site root.

Shopify: Many themes support a robots.txt template approach, so edits follow Shopify’s rules and deployment flow. Treat changes as production changes. Test every edit.

For custom and headless builds, put robots.txt at the root of the primary crawl host. If you run multiple subdomains, each host needs its own robots.txt file.

Rule to follow - only one thing matters for discovery: the crawler requests https://host/robots.txt before crawling URLs on that host.
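A tiny Python helper makes the per-host rule concrete: whatever page URL you start from, the file a crawler requests sits at the root of that exact scheme and host. The function name is my own.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    # Robots.txt applies per scheme + host: drop path, query and fragment,
    # then append /robots.txt at the root of that exact host.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://shop.example.com/products/42?ref=nav"))  # → https://shop.example.com/robots.txt
print(robots_url("https://blog.example.com/post/robots-guide"))    # → https://blog.example.com/robots.txt
```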

Common robots.txt mistakes that hurt rankings

Blocking the entire site
User-agent: *
Disallow: /

This wipes crawling, then rankings drop.

Blocking CSS and JS needed for rendering
If you block resources required for page rendering, Google struggles to understand the layout and content. Keep critical assets crawlable unless there is a hard technical reason.

Putting Allow or Disallow before User-agent
Rules placed before the first User-agent section get ignored by crawlers.

Blocking important pages by pattern
Overbroad wildcard rules often catch URLs you wanted indexed. Start narrow, monitor impact, then expand.

Trying to deindex with robots.txt
Robots.txt blocks crawling, not indexing. Use noindex or removal tooling for deindex needs.

How to audit your robots.txt for SEO

Step 1: fetch the live file
Go to:
https://yourdomain.com/robots.txt

Save a copy.

Step 2: list your money URLs
Home page
Category and collection pages
Top service pages
Top blog posts
Top product pages

Confirm none of these paths are blocked by a broad rule.
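Steps 1 and 2 can be automated with a short script. A sketch using Python's stdlib parser (prefix matching only, no wildcard support, so treat results as a first pass; the URLs are placeholders for your own money pages):

```python
from urllib import robotparser

# Placeholder money URLs - swap in your own home, category, service,
# blog and product pages.
MONEY_URLS = [
    "https://www.example.com/",
    "https://www.example.com/collections/shoes",
    "https://www.example.com/services/seo",
]

def blocked_money_urls(robots_text: str, urls: list[str]) -> list[str]:
    # Return every money URL a '*' crawler may not fetch under the given rules.
    rp = robotparser.RobotFileParser()
    rp.parse(robots_text.splitlines())
    return [u for u in urls if not rp.can_fetch("*", u)]

rules = "User-agent: *\nDisallow: /search\nDisallow: /wp-admin/\n"
print(blocked_money_urls(rules, MONEY_URLS))  # → [] when nothing important is blocked
```

Run this against every robots.txt draft; any URL in the output is a ranking page your rules would cut off from crawling.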

Step 3: list low value URL patterns worth blocking
Internal search
Filter and sort parameters
Cart, checkout, account areas
Admin areas
Duplicate tag archives
Staging folders

Step 4: confirm sitemap line exists and is correct
Sitemap field supports an absolute URL. Multiple sitemap lines are valid.

Step 5: Check Search Console crawl behaviour
Look for spikes in crawled but not indexed URLs.
Look for heavy crawling on parameter URLs.
Adjust robots.txt with narrow rules first.

Step 6: Protect against deployment mistakes
Store robots.txt in version control.
Add a release checklist item for it.
Add monitoring that alerts if the file changes unexpectedly.
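One simple way to implement the monitoring step: hash the approved file at release time and alert when the live copy's hash drifts. A minimal sketch; how you store the baseline and deliver the alert is up to you.

```python
import hashlib

def robots_changed(current_text: str, baseline_sha256: str) -> bool:
    # True if the live robots.txt no longer matches the approved baseline hash.
    digest = hashlib.sha256(current_text.encode("utf-8")).hexdigest()
    return digest != baseline_sha256

approved = "User-agent: *\nDisallow: /wp-admin/\n"
baseline = hashlib.sha256(approved.encode("utf-8")).hexdigest()

print(robots_changed(approved, baseline))                    # → False, no drift
print(robots_changed(approved + "Disallow: /\n", baseline))  # → True, alert
```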

How to test robots.txt before it breaks SEO

Google provides robots.txt checking inside Search Console. The legacy standalone robots.txt Tester has been retired; current interfaces expose a robots.txt report instead. Treat a check against Google's tooling as a required step before publishing changes.

Testing workflow used by technical SEO teams

  1. Draft the new robots.txt text.
  2. Run it through a validator and syntax check.
  3. Test key URLs against the rules.
  4. Publish.
  5. Re-test the live file.
  6. Monitor crawl stats and index coverage for the next 7 to 14 days.

Robots.txt for SEO, quick decision guide

  1. If your goal is to reduce wasted crawling, use robots.txt.
  2. If your goal is to remove a page from Google results, use noindex or X-Robots-Tag, not robots.txt.
  3. If your goal is to hide sensitive content, use authentication and proper access controls. Robots.txt is not a security tool.

FAQ, robots.txt for SEO

Does robots.txt improve rankings?
Indirectly. Better crawl focus leads to faster discovery and refresh of important URLs on large sites. On small sites, impact is often minimal unless crawl traps exist.

If I block a URL in robots.txt, will it disappear from Google?
Not reliably. Google notes blocked URLs might still appear if other pages link to them.

Should I list my sitemap in robots.txt?
Yes. Google’s robots.txt spec supports sitemap lines, and multiple sitemap lines are valid.

Should I block my staging site?
Yes, plus add authentication. Robots.txt alone is not enough.

What is the safest robots.txt change process?
Make one small change, test, deploy, then monitor crawl and indexing. Large rewrites raise risk.
