LLMs.txt is a newly proposed standard for guiding AI models through website content, offering content owners greater control over what those models see.
Large language models (LLMs) are increasingly vital in today’s digital world. They rely on website information to learn, generate content, and answer user queries. However, LLMs struggle to use web content efficiently: most websites are too large to fit into a model’s context window, and converting HTML pages full of navigation, ads, and JavaScript into clean text is difficult and imprecise. This is precisely where a new standard called LLMs.txt, proposed by Australian technology expert Jeremy Howard, comes into play.
Think of it as robots.txt’s more cooperative cousin. While robots.txt tells search engines what they can’t access, LLMs.txt is designed to help AI models better understand what they should pay attention to.
How Does LLMs.txt Work?
LLMs.txt is a simple Markdown file served from a website’s root directory (at /llms.txt). Site owners can list the URLs of key sections with short summaries, or even supply the site’s entire raw text content in one or more files. They can also create Markdown (.md) versions of the web pages they want LLMs to pay particular attention to.
This gives LLMs detailed context on how the content can be accessed and used. According to Howard, Markdown is easy for both humans and LLMs to read, yet the file follows a precise structure that classical programming techniques such as parsers and regular expressions can process.
This simple approach reduces an entire website to its essential text, making it easier for AI platforms to parse the content for purposes such as content development, site structure analysis, and entity research.
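To illustrate the “precise format” Howard describes, here is a minimal Python sketch that writes and reads an LLMs.txt-style file. It assumes the layout described in the proposal at llmstxt.org: an H1 title, an optional blockquote summary, optional free-form notes, and H2 sections containing Markdown link lists, each link optionally followed by `: notes` (the proposal gives a section named “Optional” special meaning: its links can be skipped when a shorter context is needed). The names `LlmsTxt`, `Link`, `render`, and `parse`, and the example.com URLs, are hypothetical; this is not Howard’s reference implementation, and it ignores many Markdown features a real parser would have to handle.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    notes: str = ""


@dataclass
class LlmsTxt:
    title: str
    summary: str = ""
    info: list[str] = field(default_factory=list)                  # free-form notes before the first H2
    sections: dict[str, list[Link]] = field(default_factory=dict)  # H2 heading -> list of links


def render(doc: LlmsTxt) -> str:
    """Serialize as Markdown: H1 title, blockquote summary, notes, then H2 link lists."""
    lines = [f"# {doc.title}", ""]
    if doc.summary:
        lines += [f"> {doc.summary}", ""]
    for note in doc.info:
        lines += [note, ""]
    for heading, links in doc.sections.items():
        lines += [f"## {heading}", ""]
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            lines.append(f"{entry}: {link.notes}" if link.notes else entry)
        lines.append("")
    return "\n".join(lines)


# One list entry: "- [title](url)", optionally followed by ": notes".
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$")


def parse(text: str) -> LlmsTxt:
    """Parse text in the layout produced by render() back into an LlmsTxt object."""
    doc = LlmsTxt(title="")
    current = None  # H2 section being filled; None until the first "## " line
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is None and line.startswith(">"):
            doc.summary = (doc.summary + " " + line[1:].strip()).strip()
        elif current is None:
            doc.info.append(line)
        elif m := LINK_RE.match(line):
            doc.sections[current].append(Link(m["title"], m["url"], m["notes"] or ""))
    return doc


if __name__ == "__main__":
    site = LlmsTxt(
        title="Example Site",
        summary="Hypothetical widgets and how to use them.",
        sections={"Docs": [Link("Quick start", "https://example.com/docs/quickstart.md", "Install and first steps")]},
    )
    text = render(site)
    print(text)
    print(parse(text) == site)  # the round trip recovers the same structure
```

Because each line type (title, summary, section heading, link) is recognizable from its first characters, a short regular expression is enough to recover the structure, which is the property the proposal relies on.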
Several free tools are already available:
- Markdowner: A free, open-source tool that converts website content into well-structured Markdown files
- Apify: An LLMs.txt generator developed by Jacob Kopecky
- Website LLMs: A WordPress plugin that creates your LLMs.txt file for you
- Firecrawl: One of the first tools to emerge for creating LLMs.txt files
- Manual Creation: Capture and flatten your website’s content yourself
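For manual creation, the core step is flattening each page’s HTML into plain, Markdown-like text. Below is a minimal sketch using only Python’s standard-library `html.parser`: it skips script, style, navigation, and footer elements, turns h1 to h3 into Markdown headings, and prefixes list items with `- `. The tag lists and the `flatten_html` / `TextFlattener` names are illustrative assumptions; dedicated converters such as the tools listed above handle links, tables, and malformed markup far more robustly.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "nav", "footer", "noscript"}  # boilerplate whose text is dropped
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = {"p", "li", "div", "section", "article", "br"} | set(HEADING_PREFIX)


class TextFlattener(HTMLParser):
    """Collects visible text, emitting one Markdown-ish block per block-level element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.skip_depth = 0              # > 0 while inside a SKIP_TAGS element
        self.blocks: list[str] = []      # finished blocks of text
        self.fragments: list[str] = []   # text pieces of the block being built
        self.prefix = ""                 # Markdown prefix for the current block ("# ", "- ", ...)

    def flush(self) -> None:
        # Collapse runs of whitespace; drop blocks with no visible text.
        text = " ".join("".join(self.fragments).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.fragments, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.flush()  # a new block starts, so close the previous one
            if tag in HEADING_PREFIX:
                self.prefix = HEADING_PREFIX[tag]
            elif tag == "li":
                self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.fragments.append(data)


def flatten_html(html: str) -> str:
    """Return the page's visible text as blank-line-separated Markdown-ish blocks."""
    parser = TextFlattener()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text not closed by a block tag
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    page = "<nav>Home</nav><h1>Widgets</h1><p>Our <b>widgets</b> are small.</p>"
    print(flatten_html(page))
```

Running this over every page you want to expose, and saving each result as a .md file, gives you the raw material for a hand-written LLMs.txt.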
Why Should You Care About LLMs.txt?
The potential benefits of LLMs.txt are extensive, offering advantages for both LLMs and website owners:
- Improves Resource Usage for LLMs: In theory, LLMs.txt lets LLMs use their computing resources more efficiently. Instead of extracting meaning from complex HTML pages, they can read curated information from a single, easily accessible location.
- Provides Control to Content Owners: Content owners are increasingly concerned about how AI models use web content. LLMs.txt gives them more control over which content those models are directed to and how it is presented.
- Offers Potential SEO and GEO Benefits: LLMs.txt may benefit both search engine optimization (SEO) and generative engine optimization (GEO):
  - Protects Proprietary Content: In theory, LLMs.txt could help signal how original content may be used by AI, although it cannot technically prevent unauthorized use.
  - Brand Reputation Management: In theory, businesses gain some control over how their information appears in AI-generated responses.
  - Linguistic and Content Analysis: A fully flattened, AI-consumable version of your website lets you run various analyses, such as term-frequency counts or site-structure audits.
  - Enhanced AI Interaction: LLMs.txt helps LLMs interact more effectively with your website, making it easier for them to retrieve accurate and relevant information.
  - Improved Content Visibility: By directing AI systems to specific content, LLMs.txt can, in theory, “optimize” your website for AI indexing.
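As a small illustration of the “Linguistic and Content Analysis” point, the sketch below runs two simple analyses on flattened site text: a term-frequency count and a heading outline. The stop-word list, the regular expressions, and the function names `top_terms` and `outline` are illustrative assumptions; serious analysis would use a proper tokenizer and a much fuller stop-word list.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "are", "our", "with"}


def top_terms(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent terms, ignoring stop words and words shorter than 3 characters."""
    words = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)


def outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) for each ATX heading, e.g. '## Pricing' -> (2, 'Pricing')."""
    headings = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if m:
            headings.append((len(m.group(1)), m.group(2)))
    return headings


if __name__ == "__main__":
    site_text = "# Widgets\n\nOur widgets are small. Widgets ship fast.\n\n## Pricing\n\nWidgets are cheap."
    print(top_terms(site_text, 1))  # [('widgets', 4)]
    print(outline(site_text))       # [(1, 'Widgets'), (2, 'Pricing')]
```

The same two functions can be pointed at the combined text of an entire site to see which topics dominate and how its headings are organized.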