February 12, 2015

Understanding Robots.txt, Optimizing Robots File on Blogger and WordPress

There are a lot of things you need to know and learn in blogging. You can never be perfect in any field, as there are always bigger and better things to learn along the way. Even very small things and files on your website matter a lot for Google rankings and for SEO as a whole. One such thing is the “robots.txt” file. Initially, when I started blogging, I did not actually know what this file was or why it is important. So, I did a lot of research from various sources and understood its exact use and how important it is in SEO. Many newbie bloggers don’t know what robots.txt is or what it does, so I thought of writing a descriptive article on it.


What is Robots.txt file?

Robots.txt is a very small text file present at the root of your site. As most of you know, web crawlers and spiders are responsible for discovering and indexing the entire web. Left to themselves, these crawlers can crawl any page or URL present on the web, even the ones which are private and should not be accessed.

Note that robots.txt does not restrict people from accessing your content; it only speaks to crawlers.

To control which files you want the crawlers to access and which to restrict, you can direct them using the robots.txt file. Robots.txt is not an HTML file, but well-behaved spiders obey what it states. This file does not directly protect your site from external threats; it simply requests that crawler bots not enter a particular area of your site.

Where do you find the robots.txt file?

The location of this file is very important for crawlers to identify it, so it must sit in the main (root) directory of your website, i.e. at yourdomain.com/robots.txt.

This is where the bots, and even you, can find the file of any website. If crawlers don’t find the file in the main directory, they simply assume that there is no robots file for the website and thereby index all the pages of the site.

Basic Structure of Robots.txt file

The structure of the file is very simple and anyone can understand it easily. It mainly consists of two components: User-agent and Disallow.
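As a rough sketch, the file is just one or more groups of these two directives (the bracketed values are placeholders, not literal syntax):

User-agent: [crawler name, or * for all crawlers]

Disallow: [path to be excluded]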




Complete Understanding of Exclusion with Examples

Firstly, you should know what the components mean and what their function is. “User-agent” is the term used to identify the search engine crawler the rules apply to, whether it is Google’s, Yahoo’s, or any other search engine’s. “Disallow” is the term used to list the files or directories to be excluded from crawling.

Directory or Folder Exclusion:

The basic exclusion which is used by most of the sites is,

User-agent: *

Disallow: /test/

Here, * indicates all search engine crawlers. Disallowing /test/ indicates that the folder named ‘test’ is to be excluded from being crawled.

File Exclusion:

User-agent: *

Disallow: /test.html

This indicates that all the search engine crawlers should not crawl the file named ‘test.html’.

Exclusion of entire site:

User-agent: *

Disallow: /

Inclusion of entire site:

User-agent: *

Disallow:

Leaving Disallow empty excludes nothing, which is equivalent to:

User-agent: *

Allow: /

Exclusion of a single crawler:

User-agent: googlebot

Disallow: /test/

Add a Sitemap:

User-agent: *

Disallow: /test/

Sitemap: http://www.yourdomain.com/sitemap.xml
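If you want to check how rules like these are interpreted, Python’s standard urllib.robotparser module offers a quick way to test them locally. This is just an illustrative sketch; the domain and paths below are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The same kind of rules shown above, fed to the parser as a list of lines
rules = """\
User-agent: *
Disallow: /test/
Sitemap: http://www.yourdomain.com/sitemap.xml
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Anything under /test/ is disallowed for every crawler...
print(rp.can_fetch("*", "http://www.yourdomain.com/test/page.html"))  # False
# ...while the rest of the site remains crawlable
print(rp.can_fetch("*", "http://www.yourdomain.com/index.html"))      # True
```

Note that urllib.robotparser implements the original exclusion standard; extensions such as wildcards inside paths are not supported, so it is best suited for simple prefix rules like these.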


How to Create a robots.txt file?

Creating a robots.txt file is very simple, as there is no special language or technical complication involved. You can do it in two ways: create the file manually, or generate it using tools.

Manual creation of the file was covered above, so let us move on to using tools, which is even simpler. You can use robots.txt generator tools from SEOBook, Mcanerin, etc.

Testing the robots.txt file

The file you created may or may not work properly. To check it, you can use Google’s robots.txt Tester tool: simply submit a URL, and the tool operates as Googlebot would, checking your robots.txt file and verifying whether the URL is blocked properly.

Google has also published a few steps for webmasters to help you test the robots.txt file you created.


Limitations of robots.txt file:

Though robots.txt is a trustworthy component when it comes to directing crawlers, it still has a few limitations or disadvantages in practice.

1. Crawlers cannot be forced, only directed: When we use the robots.txt file to disallow a particular path or URL, we are merely requesting web crawlers not to crawl that URL or directory, not forcing the bots away, and not all web crawlers obey the instructions given in this file. So, to truly block a particular URL, other methods such as password protection or noindex meta tags should be used, which are more effective and reliable.

2. Syntax interpretation might differ for each crawler: The syntax shown above holds for the vast majority of web crawlers. But a few crawlers might either not understand the syntax or interpret it in a completely different way, which can get you into trouble.

3. robots.txt cannot prevent references to your URLs from other sites: This is practically one of the main disadvantages of the robots.txt file. The file will stop Google’s crawlers from accessing a particular URL when they arrive at your site directly. But when the URL you want to block is linked from some other website, the crawlers can still discover that link, and the blocked URL may end up listed anyway.

So, to prevent this from happening, you should combine the robots.txt file with other protective methods, like password-protecting files on the server or using meta tags (noindex, follow).

Check out Matt Cutts’ take on optimizing robots.txt.

Adding Custom Robots.txt to Blogger

I have already written about the custom robots.txt file in my article on advanced search engine preferences, the Advanced SEO Guide for Blogger. Generally, for Blogger, the robots.txt file looks something like this:

User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://www.alltechbuzz.net/feeds/posts/default?orderby=UPDATED
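You can verify what this Blogger file does with the same urllib.robotparser approach (a sketch with a placeholder blogspot domain): search and label pages are blocked for general crawlers, while regular posts, and everything for Mediapartners-Google (the AdSense crawler), stay open:

```python
from urllib.robotparser import RobotFileParser

# The Blogger default rules shown above (sitemap line omitted for brevity)
blogger_rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(blogger_rules)

# Label/search pages are blocked for ordinary crawlers
print(rp.can_fetch("Googlebot", "http://example.blogspot.com/search/label/SEO"))
# Regular posts remain crawlable
print(rp.can_fetch("Googlebot", "http://example.blogspot.com/2015/02/post.html"))
# The AdSense crawler is allowed everywhere (its Disallow line is empty)
print(rp.can_fetch("Mediapartners-Google", "http://example.blogspot.com/search/label/SEO"))
```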

Steps to Follow:

  1. Open your blogger dashboard.
  2. Navigate to Settings > Search Preferences > Crawlers and indexing > Custom robots.txt > Edit > Yes.
  3. Paste your robots.txt code in it.
  4. Click Save Changes button.

How to Optimize Robots.txt for WordPress:

For WordPress, we have many plugins that do the same. I would recommend going with the Yoast plugin to manage search preferences. Do check out our article on Yoast SEO Settings for the complete configuration.

Below is an example of a robots.txt file that you can use for any domain hosted on WordPress:

Sitemap: http://www.yourdomain.com/sitemap.xml

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /archives/
Disallow: /*?*
Disallow: *?replytocom
Disallow: /wp-*
Disallow: /comments/feed/
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Once you have optimized your robots.txt file, I would highly recommend that you test it first using the robots.txt Tester in Google Webmaster Tools.


So, I hope that helped. Let me know in the comments if you have any doubts regarding robots.txt optimization.

About the author 

Imran Uddin

Imran Uddin is a Professional blogger from India and on All Tech Buzz, he writes about Blogging, How to tips, Making money online, etc.

  1. I already uploaded this file.But i had a doubt about disallow…i thought this disallow thing will affect my website SEO ..and i will loos traffic. i thought to remove/delete this robots.txt..now my doubt is cleared.Thanks for sharing.

  2. Hello Anurag,

    Robot.txt files changed my blogging journey completely. Initially i didn’t know anything about Robot.txt file. I faced a problem of crawl delay. Webmaster blocks all my posts to crawl. But later i am able to change this file and now i am getting a good no of visitors from the search engine. I would like to mention KB Robot.txt. I am using this plugin because initially my robot.txt file was fixed as crawl delay. I was not able to change my file anyway.

    After changing i am getting almost 50+ visitors daily without Social media promotion. I would like to suggest every new blogger that they should concentrate about their Robot.txt file. Isn’t it Anurag?

    This is a very informative post i must say.
    Thanks for sharing this knowledge.
    _Happy Blogging.

  3. I Used your blogger guide to setup my blog bro , it is awesome. In that you gave just code to setup that custom robots.txt settings.I just want to know how it functions and Every thing .I have Learnt from this article .I gonna update blog robots.txt file.

    After Long Time you have written on blogging and i just liked the innovative article you made in this article and it’s very fine .
    need more innovative article’s bro 🙂
    Thank you

  4. Thanks For This Robots.txt full explanation i just upadtes my robots.txt file. i always confuse about it. feels good to landed on this article. it is really helpful to me.

  5. Hi,
    Nice presentation. However, the robots.txt file you’re claiming as optimized is not is that you say.

    User-agent: *
    # disallow all files in these directories
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/ – Should not be blocked as it contains some javascript files for your site front end and
    Disallow: /wp-content/ – You’re blocking the images as well as the theme resources from Google. Google won’t decide the mobile-friendliness of your site
    Disallow: /archives/ – Hiding something from Google. Either disable it completely or use noindex meta tag
    disallow: /*?* – What is this? you’r blocking Google from following the search performance in your site
    Disallow: *?replytocom
    Disallow: /wp-*
    Disallow: /comments/feed/ – You’re blocking robots to get updates of your site.

    After all, you’re hiding a large amount of content at your site from Google and it sometime encourages the robots to think your site having suspicious contents.

    Instead of blocking a high amount of content from the robots, you could make your content “Noindex”. Block only the part which you can’t noindex if you want Google to ignore.

    The optimized robots.txt would be like this:

    User-agent: *
    Disallow: /wp-admin/

    Sitemap: http://www.yoursite.com/sitemap_index.xml

    Try some plugin like Yoast SEO to noindex unnecessary pages.

  6. I Really don’t know about robots.txt file but now i understand what exactly this file do. I always copy other robots.txt and paste in my website but now i can understand so i can create myself.

    Thanks Imran Uddin And Anurag

  7. Hello Imran,
    Thanks for posting this article as it will help a lot of Bloggers, whom you Inspire, to make their Blogs more Professional and reach on the top page of Google Search Results.
    I have a doubt, the Code that you have given at the end which we have to paste in Robots.txt file is for WordPress.
    Can you please update it for Blogger too?
    Again thanks for posting it.

  8. you explained robots.txt in a way that it has to be and even i applied it on my blog.


  9. Thank you Imran for publishing a good article on robots.txt.Actually i have lot of doubts on robots.txt but after reading this article i hav got cleared my doubts and even i learn new things.

  10. I am reading your article. what i have found unique about your site is you cover all the aspect of the topic you mentioned. we dont need to go anywhere else for finding other aspect of that topic. really impressed. another article about robots.txt file. thanks again.
    alltechbuzz.net you rock 🙂

Comments are closed.
