1.1 Robots & directives (HTTP + HTML) | Medium | Verified

robots.txt over 500 KiB

Google reads only the first 500 KiB (512,000 bytes) of robots.txt. Anything past that point is ignored, so a bloated file can silently drop the rules that matter most if they sit near the end.

What it is

The file exceeds Google’s size limit; content past the limit is ignored.

Why it matters

Rules beyond the limit are not applied, so intended blocks/allows may silently fail.

How to fix it

Trim and consolidate rules so the file stays under 500 KiB (512,000 bytes).
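To decide where to trim, it can help to see which path prefixes account for most of the Disallow lines. The following is a minimal Python sketch, not a definitive tool: the function name `disallow_prefix_counts` and the `depth` parameter are illustrative, and it ignores user-agent grouping, so treat the counts as a rough guide only.

```python
from collections import Counter


def disallow_prefix_counts(robots_text: str, depth: int = 1) -> Counter:
    """Count Disallow rules by their first `depth` path segments.

    Hypothetical helper for spotting groups of rules that might be
    replaced by one broader rule. It does not track User-agent groups.
    """
    counts: Counter = Counter()
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "disallow":
            continue  # not a Disallow directive
        path = value.strip()
        if not path.startswith("/"):
            continue  # skip empty Disallow values and non-path patterns
        segments = [s for s in path.split("/") if s][:depth]
        prefix = "/" + "/".join(segments)
        counts[prefix] += 1
    return counts
```

Calling `disallow_prefix_counts(text).most_common(10)` lists the prefixes with the most rules. A prefix with thousands of entries may be a candidate for a single broader rule, but only if every URL under that prefix is meant to be blocked, so check before consolidating.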

How to find it on your site

  1. Check the size with curl -s https://yourdomain.com/robots.txt | wc -c
  2. If the count is over 512,000 bytes (500 KiB), Google ignores the tail; a file close to that figure is at risk as rules are added.
  3. Audit for thousands of auto-generated Disallow lines, which is the usual cause.
  4. Consolidate rules with wildcards and move essential directives to the top.
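The size check in the steps above can also be scripted. This is a hedged Python sketch: `audit_robots_size` and `fetch_robots` are hypothetical names, the limit is taken as 500 × 1024 bytes, and the count of directive lines past the limit is approximate because the cut can fall in the middle of a line.

```python
import urllib.request

# Assumption: Google's documented limit of 500 KiB, expressed in bytes.
GOOGLE_LIMIT_BYTES = 500 * 1024


def audit_robots_size(body: bytes, limit: int = GOOGLE_LIMIT_BYTES) -> dict:
    """Summarise how a robots.txt body sits against a byte limit."""
    ignored = body[limit:]  # everything past the limit
    # Non-blank, non-comment lines in the ignored part (approximate,
    # since the first line may be cut partway through).
    ignored_lines = [
        line
        for line in ignored.decode("utf-8", errors="replace").splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return {
        "size_bytes": len(body),
        "over_limit": len(body) > limit,
        "bytes_ignored": len(ignored),
        "directive_lines_past_limit": len(ignored_lines),
    }


def fetch_robots(url: str) -> bytes:
    """Download the raw robots.txt bytes (follows redirects)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```

For example, `audit_robots_size(fetch_robots("https://yourdomain.com/robots.txt"))` returns the byte size and whether it exceeds the limit. Working on raw bytes rather than decoded text keeps the figure comparable with `wc -c`.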

Cross-reference to ranking and citation factors

Ignored rules can leave unwanted URLs crawlable or wanted URLs blocked. Both affect crawl efficiency rather than ranking directly.

Impact

Medium. Rules that fall past the limit are not applied, which can cause unintended crawl behaviour. Direct (documented limit).

Evidence

Google enforces a 500 KiB robots.txt size limit; content past the limit is ignored. Google Search Central, Intro to robots.txt