1.1 Robots & directives (HTTP + HTML) | High | Verified
robots.txt blocks a directory with indexable content
An over-broad Disallow can sweep up content I actually want indexed. The rule looks reasonable until I notice it also covers a folder of live pages.
What it is
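A Disallow rule matches URLs that should be indexed. This often happens because rules match by path prefix: Disallow: /blog also blocks /blog-archive/ and everything beneath it.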
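Disallow matches by path prefix, which is how a rule ends up covering more than intended. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the /products paths are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the intent is to block only the /products listing page,
# but Disallow matches every path that starts with "/products".
ROBOTS_TXT = """\
User-agent: *
Disallow: /products
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: only the first was meant to be blocked.
urls = [
    "https://example.com/products",
    "https://example.com/products-guide/choosing-a-size",
    "https://example.com/productsale/summer",
    "https://example.com/about",
]

for url in urls:
    status = "allowed" if parser.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status:<8} {url}")
```

Only /about is reported as allowed. The guide and sale pages are blocked by a rule that was written for a single listing page.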
Why it matters
Disallowed URLs are not crawled, so their content cannot be indexed or used by AI features.
How to fix it
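Narrow the Disallow so it matches only paths that should stay uncrawled. If the broad rule has to stay, carve out the indexable folder with a more specific Allow rule.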
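When narrowing a rule, it helps to know how conflicts resolve. RFC 9309, which Google follows, applies the most specific rule, meaning the one with the longest matching path. When an Allow and a Disallow are equally long, it prefers the Allow. Python's `urllib.robotparser` uses the first matching rule instead, so the sketch below implements longest-match by hand.

This is a simplified, hypothetical helper (`is_allowed`). It ignores the `*` and `$` wildcards and user-agent grouping, and the paths are illustrative.

```python
from typing import List, Tuple

# Each rule is ("allow" | "disallow", path_prefix). Simplified: a single
# user-agent group, no * or $ wildcards, no percent-encoding normalization.
Rule = Tuple[str, str]


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Return True if `path` may be crawled under longest-match precedence."""
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty rules and non-matching prefixes are ignored
        allow = kind == "allow"
        # A longer match wins; on a tie, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            best_allow = allow
    return best_allow


# Hypothetical: meant to block retired listings under /products/archive/,
# but the short prefix also catches the live /products-guide/ folder.
too_broad = [("disallow", "/products")]

# Option 1: narrow the Disallow to the path that should stay uncrawled.
narrowed = [("disallow", "/products/archive/")]

# Option 2: keep the Disallow and carve out the live folder with a longer Allow.
carved_out = [("disallow", "/products"), ("allow", "/products-guide/")]

guide = "/products-guide/choosing-a-size"
for name, rules in [("too_broad", too_broad), ("narrowed", narrowed), ("carved_out", carved_out)]:
    print(name, is_allowed(guide, rules))
```

Narrowing is usually the cleaner fix because the file then states exactly what is blocked. The Allow carve-out is useful when the broad rule protects many paths you cannot easily enumerate.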
How to find it on your site
- List every Disallow rule and write down the paths each one covers.
- Cross-check those paths against your sitemap and key landing pages.
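- Test a sample of important URLs with the URL Inspection tool in Search Console, which reports whether crawling is blocked by robots.txt. The older robots.txt Tester has been retired; the robots.txt report shows the file Google last fetched.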
- Narrow any rule that is broader than it needs to be.
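The first two checks can be scripted. Below is a minimal sketch that lists sitemap URLs blocked by the robots.txt rules. It assumes a single `urlset` sitemap (not a sitemap index) and a robots.txt that uses plain path prefixes. The inputs and the helper names (`sitemap_urls`, `blocked_sitemap_urls`) are hypothetical.

Python's `urllib.robotparser` applies the first matching rule and does not understand `*` or `$` wildcards, unlike Google's longest-match parser. Treat any hit as a lead to confirm in Search Console rather than a verdict.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical inputs; in practice you would fetch /robots.txt and your sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /resources
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/resources/robots-guide</loc></url>
  <url><loc>https://example.com/search-tips</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract the <loc> values from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def blocked_sitemap_urls(robots_txt: str, xml_text: str, agent: str = "Googlebot") -> list[str]:
    """Return sitemap URLs that the robots.txt rules block for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls(xml_text) if not parser.can_fetch(agent, url)]


for url in blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML):
    print("Listed in sitemap but blocked:", url)
```

Here /search-tips is flagged even though the rule was probably written for internal search results. This is the prefix overreach the checklist is looking for.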
Cross-reference to ranking and citation factors
Indexable content behind a Disallow cannot accrue ranking or citation signals, because it is never crawled. This is a lost opportunity rather than a penalty.
Impact
High for the affected section, which becomes invisible to search and AI features. The impact is direct.
Evidence
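Disallowed pages cannot be crawled, so their content is not indexed and any directives on them, such as noindex, are never read. A blocked URL can still be indexed without its content if other pages link to it. (Google Search Central, Intro to robots.txt)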