
LLMS-Full.txt Generator: Advanced Configuration

2024-02-10 · 8 min read


LLMS-full.txt extends the basic LLMS.txt format with advanced rules, detailed path specifications, and crawler-specific settings. Learn how to create comprehensive AI crawler configurations.

What is LLMS-Full.txt?

LLMS-full.txt is an extended version of LLMS.txt that provides:

- **Granular control** over crawler behavior

- **Path-specific rules** with wildcard support

- **Crawler-specific settings** and priorities

- **Advanced directives** for complex sites

- **Enhanced metadata** and documentation

LLMS-Full.txt vs LLMS.txt

| Feature | LLMS.txt | LLMS-Full.txt |
|---------|----------|---------------|
| Basic rules | ✅ | ✅ |
| Crawler selection | ✅ | ✅ |
| Path wildcards | Limited | ✅ Advanced |
| Priority rules | ❌ | ✅ |
| Metadata | Basic | ✅ Extended |
| Documentation | ❌ | ✅ |
| File size | Small | Larger |

Advanced Syntax Elements

Wildcard Patterns

```
# Block all admin paths
Disallow: /admin/*

# Allow all blog posts
Allow: /blog/*/

# Block specific file types
Disallow: /*.pdf
Disallow: /*.zip

# Allow specific patterns
Allow: /api/v*/public
```
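
To see how these patterns behave in practice, here is a minimal Python sketch of wildcard matching. It assumes each `*` expands to any sequence of characters and that Allow rules take precedence over Disallow rules; the helper is purely illustrative, not part of any published LLMS-full.txt specification.

```python
# Illustrative only: how a crawler might interpret the wildcard rules above,
# assuming '*' matches any sequence of characters and Allow wins over Disallow.
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a path rule such as '/blog/*/' or '/*.pdf' into a regex."""
    return re.compile("^" + re.escape(rule).replace(r"\*", ".*"))

allow    = [rule_to_regex(r) for r in ["/blog/*/", "/api/v*/public"]]
disallow = [rule_to_regex(r) for r in ["/admin/*", "/*.pdf", "/*.zip"]]

def is_blocked(path: str) -> bool:
    if any(p.match(path) for p in allow):
        return False
    return any(p.match(path) for p in disallow)

print(is_blocked("/admin/users"))       # True
print(is_blocked("/blog/2024/post/"))   # False
print(is_blocked("/files/report.pdf"))  # True
```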

Priority Directives

```
# High priority paths (crawl first)
Priority: high
Allow: /blog
Allow: /docs

# Low priority paths (crawl last)
Priority: low
Allow: /archive
```
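
Priority is a hint rather than a hard rule; a cooperating crawler might use it to order its fetch queue. The sketch below shows one assumed interpretation of the high/medium/low values.

```python
# Hypothetical sketch: order a crawl queue using the Priority directive.
# The mapping from high/medium/low to crawl order is an assumed interpretation.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

queue = [
    ("/archive/2019", "low"),
    ("/blog/launch", "high"),
    ("/docs/setup", "high"),
]
queue.sort(key=lambda item: PRIORITY_ORDER[item[1]])
print(queue)  # high-priority paths first, /archive/2019 last
```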

Rate Limiting

```
# Requests per minute
Rate-limit: 10/minute

# Daily request cap
Daily-limit: 1000

# Concurrent connections
Max-connections: 2
```
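
A well-behaved client could honour these limits with simple pacing. The sketch below spaces out requests for a `Rate-limit: 10/minute` and stops at a `Daily-limit`; the class and its behaviour are assumptions for illustration, not a published client API.

```python
# Hypothetical client-side pacing for "Rate-limit: 10/minute" and
# "Daily-limit: 1000"; the class is an illustration, not a published API.
import time

class RateLimiter:
    def __init__(self, requests_per_minute: int, daily_limit: int):
        self.interval = 60.0 / requests_per_minute   # seconds between requests
        self.daily_limit = daily_limit
        self.sent_today = 0
        self.last_request = 0.0

    def wait_for_slot(self) -> bool:
        if self.sent_today >= self.daily_limit:
            return False                              # Daily-limit reached; stop
        delay = self.interval - (time.monotonic() - self.last_request)
        if delay > 0:
            time.sleep(delay)
        self.last_request = time.monotonic()
        self.sent_today += 1
        return True

limiter = RateLimiter(requests_per_minute=10, daily_limit=1000)
if limiter.wait_for_slot():
    pass  # issue the next HTTP request here
```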

Complete LLMS-Full.txt Example

```
# LLMS-Full.txt Configuration
# Last updated: 2024-02-10
# Contact: webmaster@example.com

# Global settings
Crawl-delay: 5
Rate-limit: 10/minute
Daily-limit: 5000
Sitemap: https://example.com/sitemap.xml

# OpenAI GPTBot Configuration
User-agent: GPTBot
Allow: /
Disallow: /admin/*
Disallow: /api/*
Disallow: /*.pdf
Priority: high
Crawl-delay: 10
Max-connections: 2

# Anthropic Claude Configuration
User-agent: Claude-Web
Allow: /
Disallow: /admin/*
Disallow: /private/*
Priority: high
Crawl-delay: 10

# Perplexity Bot Configuration
User-agent: PerplexityBot
Allow: /blog
Allow: /docs
Allow: /products
Disallow: /checkout/*
Disallow: /account/*
Priority: medium
Crawl-delay: 8

# Firecrawl Configuration
User-agent: Firecrawl
Allow: /blog/*
Allow: /docs/*
Disallow: /admin/*
Disallow: /api/*
Priority: medium
Crawl-delay: 5
Rate-limit: 15/minute

# Default rules for other crawlers
User-agent: *
Disallow: /admin
Disallow: /api
Disallow: /private
Crawl-delay: 15
```
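
Because the format is line-oriented `Key: value` pairs grouped under `User-agent` lines, it is straightforward to parse. The following sketch groups directives per crawler the way a robots.txt parser groups records; it is an illustration of the structure above, not a reference implementation.

```python
# Illustrative parser: group "Key: value" directives under each User-agent.
# Directive names are taken from the example above; this is not a reference
# implementation of any published specification.
from collections import defaultdict

def parse_llms_full(text: str) -> dict:
    groups = defaultdict(list)          # user-agent -> [(directive, value), ...]
    current = "(global)"                # directives before any User-agent line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key.lower() == "user-agent":
            current = value
        else:
            groups[current].append((key, value))
    return dict(groups)

sample = """
Crawl-delay: 5
User-agent: GPTBot
Allow: /
Disallow: /admin/*
"""
print(parse_llms_full(sample))
# {'(global)': [('Crawl-delay', '5')], 'GPTBot': [('Allow', '/'), ('Disallow', '/admin/*')]}
```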

Use Case Examples

Large E-commerce Platform

```
User-agent: GPTBot

# Public product pages
Allow: /products/*
Priority: high

# Category pages
Allow: /categories/*
Priority: high

# Search results (lower priority)
Allow: /search*
Priority: low

# Block sensitive areas
Disallow: /checkout/*
Disallow: /cart/*
Disallow: /account/*
Disallow: /admin/*
Disallow: /api/*

Crawl-delay: 8
Rate-limit: 20/minute
Daily-limit: 10000
```

Content Publishing Site

```
User-agent: Claude-Web

# Articles (highest priority)
Allow: /articles/*
Priority: high
Crawl-delay: 3

# News section
Allow: /news/*
Priority: high

# Archive (lower priority)
Allow: /archive/*
Priority: low
Crawl-delay: 15

# Block subscriber content
Disallow: /premium/*
Disallow: /subscribers/*

Rate-limit: 30/minute
Daily-limit: 20000
```

SaaS Documentation

```
User-agent: *

# Documentation (highest priority)
Allow: /docs/*
Priority: high
Crawl-delay: 3

# API reference
Allow: /api-reference/*
Priority: high

# Blog and tutorials
Allow: /blog/*
Allow: /tutorials/*
Priority: medium

# Block application
Disallow: /app/*
Disallow: /dashboard/*
Disallow: /admin/*

Rate-limit: 15/minute
Max-connections: 3
```

Advanced Features

Conditional Rules

```
# Different rules for different crawlers
User-agent: GPTBot
Allow: /
Crawl-delay: 10

User-agent: Claude-Web
Allow: /public/*
Crawl-delay: 5

# Stricter for unknown bots
User-agent: *
Allow: /public/*
Disallow: /
Crawl-delay: 20
```
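
When several groups are defined, a crawler needs a rule for deciding which one applies to it. A common convention, borrowed from robots.txt, is to use the group matching the crawler's own name and fall back to `*` otherwise; the sketch below assumes that convention.

```python
# Assumed convention, modelled on robots.txt: use the group whose User-agent
# matches the crawler's own name, otherwise fall back to the '*' group.
def select_group(groups: dict, crawler_name: str) -> list:
    for agent, rules in groups.items():
        if agent.lower() == crawler_name.lower():
            return rules
    return groups.get("*", [])

groups = {
    "GPTBot":     [("Allow", "/"), ("Crawl-delay", "10")],
    "Claude-Web": [("Allow", "/public/*"), ("Crawl-delay", "5")],
    "*":          [("Allow", "/public/*"), ("Disallow", "/"), ("Crawl-delay", "20")],
}
print(select_group(groups, "Claude-Web"))    # crawler-specific rules
print(select_group(groups, "SomeOtherBot"))  # falls back to the '*' group
```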

Time-Based Access

```
# Allow crawling during off-peak hours
Time-restriction: 00:00-06:00 UTC
Rate-limit: 30/minute

# Reduced rate during peak hours
Time-restriction: 09:00-17:00 UTC
Rate-limit: 5/minute
```
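
Time restrictions only help if the client checks them before fetching. A minimal sketch, assuming the `HH:MM-HH:MM UTC` format shown above:

```python
# A minimal sketch, assuming the "HH:MM-HH:MM UTC" format used above.
from datetime import datetime, timezone

def within_window(window: str) -> bool:
    """Return True if the current UTC time falls inside the restriction window."""
    start, end = window.replace(" UTC", "").split("-")
    now = datetime.now(timezone.utc).strftime("%H:%M")
    return start <= now <= end

# Crawl at the higher rate only during the off-peak window.
rate = "30/minute" if within_window("00:00-06:00 UTC") else "5/minute"
print(rate)
```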

Geographic Restrictions

```
# Allow specific regions
Allow-region: US, EU, CA

# Block specific regions
Block-region: CN, RU
```

Performance Optimization

Balance Crawl Load

```
# Distribute crawler access
User-agent: GPTBot
Crawl-delay: 10
Max-connections: 2
Rate-limit: 10/minute

User-agent: Claude-Web
Crawl-delay: 12
Max-connections: 2
Rate-limit: 8/minute
```

Prioritize Important Content

```
# Critical content
Priority: high
Allow: /products/*
Allow: /services/*

# Nice-to-have content
Priority: low
Allow: /blog/archive/*
Allow: /old-news/*
```

Testing Your Configuration

1. **Validate syntax** using our generator

2. **Test file accessibility** at the root domain (see the sketch after this list)

3. **Monitor crawler behavior** in logs

4. **Adjust rules** based on actual usage

5. **Document changes** for team reference
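
For steps 1 and 2, a short script can confirm the file is reachable at the site root and contains directive lines. The `/llms-full.txt` path and the checks below are assumptions; adjust them to wherever you publish the file.

```python
# Assumed location and checks: fetch the file from the site root and count
# directive-style lines. Adjust the path to match your deployment.
import urllib.request

def check_llms_full(base_url: str) -> None:
    url = base_url.rstrip("/") + "/llms-full.txt"
    with urllib.request.urlopen(url, timeout=10) as resp:
        status = resp.status
        body = resp.read().decode("utf-8", errors="replace")
    directives = [line for line in body.splitlines()
                  if ":" in line and not line.lstrip().startswith("#")]
    print(f"{url} -> HTTP {status}, {len(directives)} directive lines")

check_llms_full("https://example.com")
```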

Migration from LLMS.txt

To upgrade from basic LLMS.txt:

1. Copy existing rules to new file

2. Add advanced directives as needed

3. Test both configurations in parallel

4. Monitor for issues

5. Fully migrate when confident

Generate LLMS-Full.txt

Create your advanced configuration using our [free generator](/):

1. Select "Advanced Mode"

2. Configure all crawlers

3. Set priorities and rate limits

4. Add custom directives

5. Download LLMS-full.txt

[Create Advanced Configuration Now](/)

Best Practices

1. **Start simple**, add complexity as needed

2. **Document your rules** with comments

3. **Test thoroughly** before deployment

4. **Monitor regularly** for issues

5. **Update proactively** when site changes

6. **Keep backup** of working configurations

7. **Version control** your LLMS files

Conclusion

LLMS-full.txt provides enterprise-grade control over AI crawler access. Use our generator to create sophisticated configurations without manual coding.

Ready to create your LLMS.txt file?

Use our free generator to create a custom LLMS.txt file in minutes

Generate LLMS.txt Now