1.1 Robots & directives (HTTP + HTML) · Medium · Verified
robots.txt over 500 KiB
Google only reads the first 500 KiB (512,000 bytes) of robots.txt. Anything past that is ignored, so a bloated file can silently drop the rules I care about most if they sit near the end.
What it is
The file exceeds Google’s size limit; content past the limit is ignored.
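To make "ignored" concrete, here is a minimal sketch that splits a local copy of the file at the byte offset and counts the rules that land past it. It assumes you have saved the file locally (for example with curl -s https://yourdomain.com/robots.txt -o robots.txt). ROBOTS_LIMIT, robots_ignored_tail and robots_ignored_rule_count are hypothetical helper names, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: inspect the part of a local robots.txt copy that falls past
# Google's 500 KiB limit (500 * 1024 = 512000 bytes).
ROBOTS_LIMIT=512000

# Print every byte after the first ROBOTS_LIMIT bytes (nothing if the file fits).
robots_ignored_tail() {
  tail -c +"$((ROBOTS_LIMIT + 1))" "$1"
}

# Count tail lines that start with Allow: or Disallow:. The cut is a byte
# offset, so the first tail line may be a fragment of a rule split at the boundary.
robots_ignored_rule_count() {
  robots_ignored_tail "$1" | grep -ciE '^[[:space:]]*(dis)?allow:' || true
}
```

Running robots_ignored_rule_count robots.txt prints 0 for a file that fits, and otherwise the number of Allow/Disallow lines Google never sees.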
Why it matters
Rules beyond the limit are not applied, so intended blocks/allows may silently fail.
How to fix it
Trim and simplify rules so the file stays under 500 KiB (512,000 bytes).
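Once the file is trimmed, a pre-deploy check can stop it from growing back over the limit. This is a minimal sketch, assuming the robots.txt you publish exists as a local file in your build output; robots_size_ok is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy guard: fail when a robots.txt file is larger than
# Google's 500 KiB limit (500 * 1024 = 512000 bytes).
ROBOTS_LIMIT=512000

robots_size_ok() {
  local file="$1" size
  size=$(wc -c < "$file")
  size=$((size))  # normalise: some wc implementations pad the number with spaces
  if (( size > ROBOTS_LIMIT )); then
    echo "$file: $size bytes, $((size - ROBOTS_LIMIT)) bytes past the limit would be ignored" >&2
    return 1
  fi
  echo "$file: $size bytes, within the limit"
}
```

In a CI step this might read robots_size_ok public/robots.txt || exit 1, where public/robots.txt is an illustrative path. A file of exactly 512,000 bytes passes, because all of it is read.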
How to find it on your site
- Check the size with curl -s https://yourdomain.com/robots.txt | wc -c
- If it is over 512000 bytes (500 KiB), Google ignores everything past that point; treat a file close to that size as a warning sign.
- Audit for thousands of auto-generated Disallow lines, which is the usual cause.
- Consolidate rules with wildcards and move essential directives to the top.
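To find the auto-generated Disallow lines worth consolidating, one heuristic is to group rules by the first segment of their path. This is a sketch only: disallow_prefix_counts is a hypothetical name, and it lumps rules from all user-agent groups together.

```shell
#!/usr/bin/env bash
# Heuristic sketch: count Disallow rules by the first segment of their path,
# most frequent first. A segment with thousands of rules (often one per
# generated URL) is a candidate for a single prefix or wildcard rule.
disallow_prefix_counts() {
  tr -d '\r' < "$1" | awk '
    tolower($0) ~ /^[ \t]*disallow:/ {
      path = $0
      sub(/^[^:]*:[ \t]*/, "", path)   # drop the "Disallow:" field name
      sub(/[ \t]*#.*$/, "", path)      # drop a trailing comment
      n = split(path, parts, "[/?*]")  # split on path and pattern delimiters
      if (n > 1 && parts[2] != "") print "/" parts[2]
    }' | sort | uniq -c | sort -rn
}
```

If one prefix dominates, for example thousands of hypothetical Disallow: /search?q=... lines, a single broader rule such as Disallow: /search? may replace them. Check first that the broader rule does not also block URLs you want crawled.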
Cross-reference to ranking and citation factors
Ignored rules can let unwanted URLs get crawled or leave wanted ones blocked. Either way, the main effect is on crawl efficiency rather than directly on ranking.
Impact
Medium. Mis-applied rules can cause unintended crawl behaviour. Direct (documented limit).
Evidence
Google enforces a 500 KiB robots.txt size limit; content past it is ignored. Google Search Central, Intro to robots.txt