
Happy Birthday Robots.txt


Today marks the 20th birthday of robots.txt, the file that lets webmasters ask search engines not to crawl parts of their sites. It was created by Martijn Koster in 1994.

What is robots.txt?

Robots.txt is a plain text (not HTML) file placed on a site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but well-behaved search engines generally obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (it is not a firewall or a kind of password protection). Putting up a robots.txt file is like putting a “Please, do not enter” note on an unlocked door: you cannot stop thieves from coming in, but the good guys will not open the door and enter. That is why, if you have really sensitive data, it is naive to rely on robots.txt to protect it from being indexed and displayed in search results.

The location of robots.txt is very important. It must be in the root directory of the site, because otherwise user agents (search engines) will not be able to find it: they do not search the whole site for a file named robots.txt. Instead, they look only in the root directory, and if they don’t find it there, they simply assume that the site does not have a robots.txt file and therefore index everything they find along the way.
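That “root directory only” rule means a crawler derives the robots.txt location from nothing but the scheme and host of the page it wants to fetch. A minimal sketch of that derivation in Python (the function name `robots_url` is my own, for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the only location a crawler checks for robots.txt:
    the root of the scheme + host the page lives on."""
    parts = urlsplit(page_url)
    # Path, query and fragment are discarded; only scheme and host survive.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post.html"))
# https://example.com/robots.txt
```

However deep the original URL is, the crawler asks for the same single file at the site root.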

Structure of a Robots.txt File
The structure of a robots.txt file is pretty simple: it is a list of user agents and the files and directories disallowed to each. Basically, the syntax is as follows:

User-agent: *
Disallow:

An empty Disallow line does not exclude any file from crawling,

or

User-agent: *
Disallow: /

which disallows the whole website from being crawled.
“User-agent:” names the search engine crawler a rule applies to, and “Disallow:” lists the files and directories to be excluded from indexing. In addition to “User-agent:” and “Disallow:” entries, you can include comment lines – just put the # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
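You can check how a crawler would interpret rules like these with Python’s standard-library `urllib.robotparser`; the sketch below feeds it the example above (the bot name and URLs are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# The example rules from the article, as a list of lines.
rules = """\
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# /temp/ is blocked for every crawler; everything else is allowed.
print(rp.can_fetch("MyBot", "https://example.com/temp/cache.html"))  # False
print(rp.can_fetch("MyBot", "https://example.com/index.html"))       # True
```

This is the same logic polite crawlers run before every fetch: match the most specific `User-agent` group, then test the path against its `Disallow` lines.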


For the last 20 years, this simple text file has been the signpost that tells a search engine whether or not to crawl a file or a website.

