Introduction: What is Robots.txt?
Robots.txt is a simple text file placed in the root directory of your website (e.g., https://example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they may or may not crawl. It acts as a set of instructions for search engines.
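For example, the two lines below tell every crawler (the wildcard User-agent: *) to stay out of a hypothetical /private/ folder while leaving the rest of the site crawlable:

User-agent: *
# /private/ is a placeholder path
Disallow: /private/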
Why is Robots.txt Important?
- Helps control which pages search engine crawlers visit (blocking a page from crawling does not, by itself, guarantee it stays out of the index).
- Saves your crawl budget by blocking unnecessary pages.
- Prevents duplicate content issues, such as internal search result pages (see the snippet after this list).
- Keeps crawlers out of sections like admin, cart, or private folders (note that robots.txt is publicly readable, so it is a crawling directive, not a security measure).
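For example, if your site's internal search results live under /search (a common but site-specific path, so treat it as a placeholder), you could block them like this:

User-agent: *
# /search is a placeholder; use your site's actual search path
Disallow: /search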
How to Create a Robots.txt File
There are two common methods:
- Manual Creation: Open any text editor (e.g., Notepad), write your rules, and save the file as robots.txt. Upload it to your website's root directory (see the starter file after this list).
- Robots.txt Generator: Use an online tool to generate the rules automatically. Just select which pages to block and download the file.
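As a starting point, a simple hand-written file might look like the sketch below (the /drafts/ folder is a placeholder for whatever you want to keep crawlers out of):

# Block a hypothetical /drafts/ folder; allow everything else
User-agent: *
Disallow: /drafts/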
Best Practices for Robots.txt
- Always place the file in the root directory (https://example.com/robots.txt).
- Add your sitemap URL at the bottom for better crawling (see the example after this list).
- Do not block important pages like your homepage or product pages.
- Use Disallow only when necessary.
- Always test in Google Search Console before going live.
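To illustrate the sitemap practice, here is a sketch (the /internal/ path and the sitemap URL are placeholders):

User-agent: *
Disallow: /internal/
# Sitemap reference goes at the bottom of the file
Sitemap: https://example.com/sitemap.xml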
Robots.txt Examples
1. Standard Robots.txt (Basic Website)
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
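An empty Disallow: blocks nothing, so this file lets every crawler visit every page while still pointing it to the sitemap.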
2. WordPress Robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml
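Blocking /wp-admin/ keeps crawlers out of the WordPress dashboard, while the Allow line keeps admin-ajax.php reachable because themes and plugins call it from the public front end.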
3. Blogger (Blogspot) Robots.txt
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://example.blogspot.com/sitemap.xml
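Blogger serves its internal search and label pages under /search; these are typically thin or duplicate content, which is why blocking that path is the standard recommendation.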
4. E-commerce Robots.txt
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Sitemap: https://example.com/sitemap.xml
5. News Website Robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /category/
Sitemap: https://example.com/news-sitemap.xml
6. Strict Privacy Robots.txt
User-agent: *
Disallow: /
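This asks all compliant crawlers to stay off the entire site, so use it only for staging, development, or genuinely private sites.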
7. Custom Rules for Different Crawlers
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /temp/

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
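A crawler follows the most specific User-agent group that matches it, so Googlebot here obeys only the /private/ rule and ignores the generic group at the end.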
Conclusion
A well-structured robots.txt file is essential for SEO and site performance. It guides search engines to focus on your valuable content and ignore irrelevant or sensitive sections. Always test your robots.txt in Google Search Console before publishing it live. By following best practices and using the right rules, you can protect your site and improve its visibility in search results.