Blocking search engine bots from caching specific webpages

by rajesh 2007-09-05 10:09:32

Before accessing or caching your pages, every search engine bot looks for a file named robots.txt in the root directory of your site. Example: http://hiox.org/robots.txt

If the file is not present, the bot will crawl and cache every page it can reach.

By adding a robots.txt file, you can set specific rules, such as allowing only certain bots or disallowing certain files and folders.

Some examples follow:

First, create a file named robots.txt in the root directory of your site, then proceed.

To allow all robots complete access, add the following lines to the robots.txt file:
User-agent: *
Disallow:


To exclude all robots from the entire server (note the forward slash):
User-agent: *
Disallow: /
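If you want to verify how a crawler interprets these rules, Python's standard urllib.robotparser module can evaluate them. A minimal sketch (the bot name SomeBot and the example.com URL are placeholders, not anything from a real site):

```python
from urllib.robotparser import RobotFileParser

# Parse the "exclude all robots" rules directly from a list of lines.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# With "Disallow: /", no URL on the site is fetchable for any bot.
print(rp.can_fetch("SomeBot", "http://example.com/index.html"))  # False
```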


To disallow all robots from accessing/caching specific folders on the server:
User-agent: *
Disallow: /foldername/
Disallow: /example/


When you use the above code, no robot will access/cache any file under the http://sitename/foldername/ and http://sitename/example/ folders.

To disallow all robots from accessing/caching a specific file on the server:
User-agent: *
Disallow: /foldername/filename.html


When you use the above code, no robot will access/cache the file http://sitename/foldername/filename.html.
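A quick way to check that a single-file rule blocks only that file is again urllib.robotparser (a sketch with placeholder names; note that sibling pages in the same folder remain allowed):

```python
from urllib.robotparser import RobotFileParser

# Parse a rule that disallows one specific file.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /foldername/filename.html",
])

# The listed file is blocked; a sibling page in the same folder is not.
print(rp.can_fetch("SomeBot", "http://example.com/foldername/filename.html"))  # False
print(rp.can_fetch("SomeBot", "http://example.com/foldername/other.html"))     # True
```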


Blocking a specific search engine or bot
User-agent: Googlebot
Disallow: /


The above code blocks Googlebot. Replace Googlebot with the user-agent name of whichever bot you want to block.

Example: If you want to block access to all bots other than Googlebot, you can use the following syntax:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
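The "everyone blocked except Googlebot" rules above can also be verified with urllib.robotparser. In this sketch, OtherBot stands in for any non-Google crawler:

```python
from urllib.robotparser import RobotFileParser

# Parse rules that block all bots but grant Googlebot full access.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "User-agent: Googlebot",
    "Disallow:",
])

# Googlebot matches its own record (empty Disallow = allow everything);
# any other bot falls back to the "*" record and is blocked.
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))  # True
print(rp.can_fetch("OtherBot", "http://example.com/page.html"))   # False
```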
