1.1 Robots & directives (HTTP + HTML) · High · Verified

robots.txt blocks a directory with indexable content

An over-broad Disallow can sweep up content I actually want indexed. The rule looks reasonable until I notice it also covers a folder of live pages.

What it is

A Disallow rule in robots.txt matches URLs that should be indexed.
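Here is a short illustration of a rule covering more than intended. robots.txt rules match by path prefix, so a rule written for one folder also catches any sibling path that starts with the same characters. This minimal sketch uses Python's standard-library urllib.robotparser. The example.com host and the /blog paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the intent was to hide /blog-drafts/,
# but "Disallow: /blog" is a prefix match and also catches /blog/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines, no network fetch

for url in (
    "https://example.com/blog-drafts/unfinished",  # meant to be blocked
    "https://example.com/blog/launch-post",        # live page, blocked by accident
    "https://example.com/about",                   # unaffected
):
    verdict = "allowed" if parser.can_fetch("*", url) else "BLOCKED"
    print(f"{verdict:8} {url}")
```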

Why it matters

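Crawlers that honor robots.txt do not fetch disallowed URLs, so their content cannot be indexed or used by AI features. The bare URL can still appear in search results, without a description, if other pages link to it.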

How to fix it

Narrow the Disallow so it matches only paths that should stay out of the crawl, or add a more specific Allow rule for the folder of live pages.
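The rule precedence that matters when narrowing can be sketched in Python. This is a minimal matcher based on the RFC 9309 precedence that Google describes for its crawlers: the longest matching path wins, and Allow wins a tie. It compares the over-broad rule, a narrowed rule, and the broad rule plus a more specific Allow. The paths are hypothetical, and the matcher handles plain prefixes only, ignoring the * and $ wildcards.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix); directive is "allow" or "disallow"


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest matching prefix wins; on a tie, allow wins.

    Plain prefix matching only: this sketch ignores the * and $ wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule path matches nothing
        length = len(prefix)
        if length > best_len or (length == best_len and directive == "allow"):
            best_len = length
            allowed = directive == "allow"
    return allowed


# Hypothetical rule sets for the same site.
OVER_BROAD = [("disallow", "/blog")]
NARROWED = [("disallow", "/blog-drafts/")]
WITH_ALLOW = [("disallow", "/blog"), ("allow", "/blog/")]

PATHS = ["/blog-drafts/unfinished", "/blog/launch-post"]

for name, rules in [("over-broad", OVER_BROAD), ("narrowed", NARROWED), ("with allow", WITH_ALLOW)]:
    print(name, {path: is_allowed(rules, path) for path in PATHS})
```

Narrowing is the simpler change. The Allow variant relies on longest-match precedence. Python's own urllib.robotparser, for example, applies the first matching rule in file order, so it would still block /blog/launch-post under WITH_ALLOW.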

How to find it on your site

List every Disallow rule and write down the paths each one covers.
Cross-check those paths against your sitemap and key landing pages.
Test a sample of important URLs with the URL Inspection tool in Search Console, which reports when a URL is blocked by robots.txt.
Narrow any rule that is broader than it needs to be.
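The first two checks above can be partly automated. This minimal sketch assumes you have already downloaded robots.txt and a sitemap. The inline strings, the example.com URLs, and the helper names disallow_prefixes and blocked_sitemap_urls are all hypothetical. The script lists every Disallow path, then reports which sitemap URLs the rules block. Python's urllib.robotparser applies the first matching rule in file order and does not expand * or $ wildcards, so treat its verdicts as a first pass and confirm important URLs in Search Console.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical inputs; in practice, fetch /robots.txt and /sitemap.xml from your site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog
Disallow: /cart/
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/launch-post</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def disallow_prefixes(robots_txt: str) -> List[str]:
    """List every non-empty Disallow path (all user-agent groups, comments stripped)."""
    prefixes = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def blocked_sitemap_urls(robots_txt: str, sitemap_xml: str, agent: str = "*") -> Dict[str, List[str]]:
    """Map each blocked sitemap URL to the Disallow prefixes its path starts with."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    prefixes = disallow_prefixes(robots_txt)
    report = {}
    for loc in ET.fromstring(sitemap_xml).findall("sm:url/sm:loc", SITEMAP_NS):
        url = loc.text.strip()
        if not parser.can_fetch(agent, url):
            path = urlparse(url).path or "/"
            report[url] = [p for p in prefixes if path.startswith(p)]
    return report


for url, rules in blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML).items():
    print(f"{url} is in the sitemap but blocked by: {', '.join(rules)}")
```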

Cross-reference to ranking and citation factors

Indexable content behind a Disallow cannot accrue ranking or citation signals, because it is never crawled. This is a lost opportunity rather than a penalty.

Impact

High: the affected section becomes invisible to search and AI features. Direct.

Evidence

Disallowed pages cannot be crawled, so their content is not indexed and on-page directives such as noindex are never read. (Google Search Central, Intro to robots.txt)

Sources
