Robots.txt

Robots Exclusion Protocol

A plain-text file at the site root (/robots.txt) that tells search engine crawlers which pages or directories not to crawl.

Technical Details

Crawler control uses two mechanisms: robots.txt (a site-wide file; it blocks crawling but not indexing) and meta robots tags (page-level; they control indexing and link following). Common meta robots directives are 'noindex' (exclude the page from search results), 'nofollow' (don't follow the page's links or pass link equity through them), and 'noarchive' (don't show a cached copy). The X-Robots-Tag HTTP response header provides the same controls for non-HTML resources such as PDFs and images. A URL blocked by robots.txt can still appear in search results if other pages link to it: the crawler never fetches the page, so it never sees any 'noindex'. To reliably exclude a page, leave it crawlable and serve 'noindex' in a meta robots tag or an X-Robots-Tag header.
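
The interaction between crawl blocking and 'noindex' can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine is implemented: it assumes only `<meta name="robots">` tags and plain comma-separated X-Robots-Tag values (no crawler-specific names such as `googlebot`, no `unavailable_after` dates), and the names `split_directives`, `effective_directives`, `search_outcome` and the outcome strings are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


def split_directives(value: str) -> set[str]:
    """Normalize a comma-separated directive list, e.g. 'noindex, NoFollow'."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= split_directives(attr.get("content", ""))


def effective_directives(html: str, x_robots_tag: str = "") -> set[str]:
    """Union of meta robots directives and a plain X-Robots-Tag header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives | split_directives(x_robots_tag)


def search_outcome(crawl_allowed: bool, html: str = "", x_robots_tag: str = "") -> str:
    """Hypothetical summary of what a compliant search engine can do with a URL."""
    if not crawl_allowed:
        # Blocked by robots.txt: the page is never fetched, so a noindex is
        # never seen. The bare URL can still be indexed from external links.
        return "not crawled; URL may still be indexed via links"
    if "noindex" in effective_directives(html, x_robots_tag):
        return "excluded from search results"
    return "may be indexed"


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, nofollow"></head>'
    print(search_outcome(crawl_allowed=False, html=page))
    print(search_outcome(crawl_allowed=True, html=page))
    # Non-HTML resource (e.g. a PDF): only the response header carries directives.
    print(search_outcome(crawl_allowed=True, x_robots_tag="noindex"))
```

The first two calls use the same 'noindex' page: the directive only takes effect once robots.txt lets the crawler fetch it.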

Example

```
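# robots.txt
# This group applies to all crawlers.
User-agent: *
# Crawling is allowed everywhere by default...
Allow: /
# ...except under these path prefixes. Under RFC 9309 the longest
# matching rule wins, so these override the shorter "Allow: /".
Disallow: /admin/
Disallow: /api/internal/

# Absolute URL of the XML sitemap (independent of any User-agent group).
Sitemap: https://peasytools.com/sitemap.xml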
```
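
Rule precedence is easy to get wrong here: `Allow: /` appears before the `Disallow` lines, so a naive parser that stops at the first matching rule would treat `/admin/` as crawlable. RFC 9309 instead uses the most specific (longest) matching path, with `Allow` winning a tie. Below is a minimal Python sketch of that evaluation against the example file, assuming plain path prefixes only (the `*` and `$` wildcards, percent-encoding normalization, and partial user-agent token matching are not handled); `Group`, `parse_robots`, and `is_allowed` are hypothetical names, not a library API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ROBOTS_TXT = """\
# robots.txt
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/internal/

Sitemap: https://peasytools.com/sitemap.xml
"""


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[bool, str]] = field(default_factory=list)  # (is_allow, path prefix)


def parse_robots(text: str) -> tuple[list[Group], list[str]]:
    """Split a robots.txt body into user-agent groups plus Sitemap URLs."""
    groups: list[Group] = []
    sitemaps: list[str] = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            # Consecutive User-agent lines share one group; a new one starts
            # once the current group already has rules.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            if value:  # an empty Disallow places no restriction
                current.rules.append((key == "allow", value))
        elif key == "sitemap":
            sitemaps.append(value)
    return groups, sitemaps


def is_allowed(groups: list[Group], user_agent: str, path: str) -> bool:
    """Longest-match evaluation of simple prefix rules (no wildcards)."""
    agent = user_agent.lower()
    matched = [g for g in groups if agent in g.agents]
    if not matched:  # no group names this crawler: fall back to "*"
        matched = [g for g in groups if "*" in g.agents]
    best_len, allowed = -1, True  # no matching rule means crawlable
    for is_allow, prefix in (rule for g in matched for rule in g.rules):
        if path.startswith(prefix):
            # A longer prefix wins; on equal length, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


if __name__ == "__main__":
    groups, sitemaps = parse_robots(ROBOTS_TXT)
    for path in ["/", "/tools/seo", "/admin/", "/admin", "/api/internal/keys"]:
        print(f"{path:<20} allowed={is_allowed(groups, 'Googlebot', path)}")
    print("sitemaps:", sitemaps)
```

Note that `/admin` (no trailing slash) stays crawlable under this file, because `Disallow: /admin/` only matches paths that begin with that exact prefix.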

Related Formats

.pages .txt

Related Tools

SERP Preview, OG Tag Debugger, Heading Analyzer, Keyword Density Analyzer, Readability Score, XML Sitemap Generator, Schema.org Generator, Link Extractor, Canonical Tag Checker, Robots.txt Analyzer, Structured Data Validator, Word Count & SEO Grade, Meta Length Checker, URL Slug Generator

Related Terms

Canonical URL, Alt Text, Backlink, Anchor Text, CLS, 301 Redirect, Breadcrumb, Core Update