1.1 Robots & directives (HTTP + HTML) | Critical | Verified

robots.txt returns 5xx

This is one of the few crawl issues I treat as an emergency. Google has said that if robots.txt returns a server error for long enough, it can stop crawling the whole site, so a flaky 500 on this single file can quietly suppress everything.

What it is

The robots.txt URL returns a server error (any status in the 5xx range, such as 500 or 503) rather than 200 or 404.

Why it matters

Google may treat a persistently unreachable (5xx) robots.txt as a signal to stop crawling the site entirely, because it cannot confirm what it is allowed to fetch.

How to fix it

Fix the server error so robots.txt returns 200 (with rules) or 404 (treated as allow-all). Monitor uptime of the file specifically.
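One way to monitor the file specifically is a small check that a cron job or uptime tool can run. The following is a minimal sketch, not a definitive implementation: the function names, the verdict labels, and the 10-second timeout are my own assumptions, and the URL is the placeholder used elsewhere in this entry. It sends a GET request, because that is what a crawler would send.

```shell
#!/bin/sh
# Map an HTTP status code for /robots.txt to a coarse verdict.
# The verdict labels are this sketch's own, not Google's terminology.
classify_robots_status() {
  case "$1" in
    200) echo "ok" ;;                     # file served, rules apply
    404) echo "allow-all" ;;              # per the text: treated as allow-all
    5[0-9][0-9]) echo "server-error" ;;   # the emergency case
    000) echo "unreachable" ;;            # curl prints 000 when no HTTP response arrived
    *) echo "review" ;;                   # anything else: inspect by hand
  esac
}

# Fetch only the status code; the response body is discarded.
fetch_robots_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Print one timestamped line and return non-zero on server-error or
# unreachable, so a scheduler or monitor can alert on the exit code.
check_robots() {
  status=$(fetch_robots_status "$1") || status=000
  verdict=$(classify_robots_status "$status")
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $1 $status $verdict"
  case "$verdict" in
    server-error|unreachable) return 1 ;;
    *) return 0 ;;
  esac
}

# Example (not run here):
#   check_robots https://yourdomain.com/robots.txt || echo "robots.txt check failed"
```

Keeping the classification separate from the fetch makes the alerting rule easy to adjust without touching the network call.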

How to find it on your site

Run curl -I https://yourdomain.com/robots.txt and read the status code.
Repeat at different times, because intermittent 5xx errors are the dangerous ones.
Check the Search Console robots.txt report for fetch failures and the date they began.
Check server and CDN logs for errors on the /robots.txt route specifically.
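The repeated check and the log review above can be sketched as two small shell functions. This is an illustration under stated assumptions: the function names, default attempt count, and pause length are hypothetical, and the log counter assumes an access log in the common or combined log format, where the request path is the 7th whitespace-separated field and the status code is the 9th. CDN logs often use a different layout, so the field numbers would need adjusting.

```shell
#!/bin/sh
# Sample the robots.txt status several times, pausing between attempts,
# and print one timestamped line per attempt.
# Arguments: URL, number of attempts (default 5), pause in seconds (default 60).
sample_robots() {
  url=$1
  attempts=${2:-5}
  pause=${3:-60}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # GET request; keep only the status code. 000 means no HTTP response.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code=000
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $code"
    if [ "$i" -lt "$attempts" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
}

# Count 5xx responses for the exact path /robots.txt in an access log.
# Assumes common/combined log format: $7 is the path, $9 is the status.
# Requests with a query string (e.g. /robots.txt?x=1) are not counted.
count_robots_5xx() {
  awk '$7 == "/robots.txt" && $9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }' "$1"
}

# Examples (not run here; the log path is a placeholder):
#   sample_robots https://yourdomain.com/robots.txt 10 300
#   count_robots_5xx /path/to/access.log
```

Spreading the samples over hours rather than seconds gives a better chance of catching the intermittent failures that a single check misses.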

Cross-reference to ranking and citation factors

A persistent 5xx here can suppress crawling site-wide, which sits upstream of every ranking and citation signal. I fix this before anything else.

Impact

High and blocking when it occurs: it can suppress crawling site-wide until resolved. The effect is direct, per Google’s documented handling.

Evidence

Google’s documented handling: a server error for robots.txt can cause Google to pause crawling.

Sources

Google Search Central, Intro to robots.txt
Google Search Central, How HTTP status codes affect Google Search
