How to set up a robots file for your site

What is robots.txt?

Robots.txt is a plain-text (not HTML) file placed in the root of your site that tells search robots (spiders) which pages they should and should not crawl. Search engines are not obliged to follow the instructions in robots.txt, but the major ones generally honor what they are asked not to do.

It is important to note that robots.txt does not actually prevent anyone from crawling your site (i.e. it is not a firewall). Having a robots.txt file on your site is something like putting a note saying "Please, do not enter" on your unlocked front door. Put simply, it will not keep thieves out, but the good guys will not open the door and enter.

It follows that if you have sensitive data, you cannot rely on robots.txt to keep it from being indexed and displayed in search results. A crawler may ignore the file, and a blocked URL can still show up in results if other pages link to it.

The location of robots.txt is very important. It must be in the main (root) directory, because otherwise user agents (search engine crawlers) will not be able to find it. They do not search the whole site for a file named robots.txt. Instead, they look only in the main directory (i.e. http://www.sitename.com/robots.txt), and if they don't find it there they simply assume the site has no robots.txt file and crawl everything they find along the way. So, if you don't put robots.txt in the right place, don't be surprised when search engines index your whole site.
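
Since crawlers only look in one place, the expected robots.txt address can be worked out from any page URL on the site. The following is a minimal Python sketch of that idea; the function name robots_url and the sample page path are made up for illustration and are not part of any standard tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the single address where a crawler would look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical deep link on the example site used in this post.
print(robots_url("http://www.sitename.com/products/list.aspx?page=2"))
# prints: http://www.sitename.com/robots.txt
```

A robots.txt saved anywhere else, for example in a subfolder, would never be requested by this logic.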

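Purpose of robots.txt

The main reason to use robots.txt is to ask search engines to stay away from parts of your site that you do not want showing up in search results. As noted above, it is only a request, so truly sensitive information also needs real protection, such as a login.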

How to set up the robots file for your site



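Launch Notepad (or any other plain-text editor).
Put the following in the file. These two lines ask every crawler (User-agent: *) to stay out of the entire site (Disallow: /). To block only part of the site, replace / with the path prefix you want to keep crawlers out of: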

User-agent: *
Disallow: /

Save the file as: robots.txt
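
To see what these rules mean to a well-behaved crawler, they can be fed to Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the crawler name "ExampleBot" and the /private/ path in the second rule set are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post: every crawler, whole site blocked.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical narrower variant: only one folder is blocked.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules_text: str, user_agent: str, url: str) -> bool:
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(BLOCK_ALL, "ExampleBot", "http://www.sitename.com/any/page.html"))
# prints: False
print(is_allowed(BLOCK_PRIVATE, "ExampleBot", "http://www.sitename.com/private/report.html"))
# prints: False
print(is_allowed(BLOCK_PRIVATE, "ExampleBot", "http://www.sitename.com/index.html"))
# prints: True
```

The parser only reports what a compliant crawler would do; nothing stops a crawler that chooses to ignore the file.
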
Adding a robots.txt file to the root of your public anonymous site.
You can add it to the root directory of your Visual Studio Website Project.
You can also place it directly in the virtual directory, at the root level of your website folder.


Adding a robots.txt file to the root of your public anonymous SharePoint site.
Open your root site in SharePoint Designer.
Double-click the All Files folder.
Drag and drop the newly created robots.txt into the All Files folder.
Exit SharePoint Designer.
Alternatively, you can create the robots.txt file from within SharePoint Designer itself.

To confirm that the file is accessible to search engines, open your site URL in a browser with "/robots.txt" appended. Example: http://www.sitename.com/robots.txt
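
The same check can be scripted. The sketch below is a rough heuristic, not an official validation: the function name looks_like_robots_txt and its three criteria (status 200, a text/plain content type, at least one User-agent line) are assumptions about what a usable response looks like.

```python
def looks_like_robots_txt(status: int, content_type: str, body: str) -> bool:
    """Heuristic check that an HTTP response is a usable robots.txt."""
    if status != 200:
        return False
    # An HTML error page served at /robots.txt should not count.
    if not content_type.lower().startswith("text/plain"):
        return False
    for line in body.splitlines():
        # Ignore comments, then look for a line that names a user agent.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("user-agent:"):
            return True
    return False


# Possible use against a live site (needs network access; urlopen raises
# urllib.error.HTTPError if the server answers with an error such as 404):
# from urllib.request import urlopen
# with urlopen("http://www.sitename.com/robots.txt") as resp:
#     ok = looks_like_robots_txt(
#         resp.status,
#         resp.headers.get_content_type(),
#         resp.read().decode("utf-8", "replace"),
#     )
```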

For additional reading, see:
http://www.robotstxt.org/robotstxt.html

Note:
What if you deployed your pages without a robots file and search engines have already crawled the sensitive content?
In that case, you have to use the webmaster tools of each search engine and explicitly request that each page be removed from its search results.
