ResellersPanel's Blog

Control the Search engine bots using the robots.txt file

October 24, 2009 By admin The Free Reseller Program

Search engine traffic is something every site owner wants. To learn what your website is about, a search engine uses a “spider”, a program that crawls the web, reads your pages and looks for updates.

However, you may not want the spider to read parts of your site. If you are working on a project online that you don’t want anyone to know about, if you have duplicate content, or if you simply want to ban the “spider” from your site, there is one tool you can use: the robots.txt file. Keep in mind that robots.txt is a request rather than access control: well-behaved spiders honor it, but the file itself is publicly readable.

The robots.txt file is a plain text file that can be created with any text editor: a file written in Notepad works exactly like one written in Dreamweaver. The file must be placed in the root folder of your site. When a spider crawls a site, the first thing it looks for is the robots.txt file. So, even if the spider wants your-domain.com/shop/order.php, it will first check whether your-domain.com/robots.txt exists.
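
As a rough illustration of that lookup, the sketch below derives the robots.txt address from any page URL using Python's standard library. The function name robots_url is made up for this example, and your-domain.com is the same placeholder used in the text.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper for illustration; not part of any crawler.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://your-domain.com/shop/order.php"))
# http://your-domain.com/robots.txt
```

Whatever page the spider is after, the file is always requested from the root of the host, which is why a robots.txt placed in a subfolder has no effect.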

In the actual file, you enter the rules for the search engine spiders that are trying to visit your site. The first thing you have to specify is which spiders the rules apply to. You do this with the “User-agent:” record. For example:

User-agent: * – this will affect all spiders that obey the robots.txt file
User-agent: Googlebot – this will apply only to the Google spider

Once you have specified the spiders you want to target, it’s time to set the rules.

Disallow: / – this rule will stop the specified spiders from looking at your site.
Disallow: /tmp/ – this rule will stop robots from looking in the /tmp/ folder of your site, while the rest of the site will be crawled.
Disallow: /tmp/test.html – this will stop spiders from looking into a specific file.
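
To see how these records combine, here is a minimal sketch that feeds a sample robots.txt to Python's standard urllib.robotparser module and asks which paths each spider may fetch. Googlebot is the name Google's crawler announces; ExampleBot, OtherBot and the file contents are assumptions made up for this illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Sample file: one group per User-agent record, separated by blank lines.
# "ExampleBot" is a hypothetical spider name used only for this sketch.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: ExampleBot
Disallow: /tmp/test.html

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on the placeholder site."""
    return parser.can_fetch(agent, "http://your-domain.com" + path)


print(allowed("Googlebot", "/index.html"))      # False: whole site is blocked
print(allowed("ExampleBot", "/tmp/test.html"))  # False: that one file is blocked
print(allowed("ExampleBot", "/tmp/other.html")) # True: its own group has no /tmp/ rule
print(allowed("OtherBot", "/tmp/test.html"))    # False: falls under the * group
print(allowed("OtherBot", "/shop/order.php"))   # True: rest of the site is open
```

Note that a spider with its own group follows only that group, so ExampleBot is not bound by the /tmp/ rule listed under the * record.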

Additionally, you can specify a robots <META> tag. This meta tag is set on an individual page and has the following format:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

The NOFOLLOW value will stop the spider from following the links on your page, and the NOINDEX value will stop the search engine from including the page in its index.
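
Unlike robots.txt, this tag is only seen once a spider has downloaded and parsed the page. The sketch below shows one way such a check could look, using Python's standard html.parser module on a small sample page; the class and function names are invented for this example, and real spiders have their own parsing rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the values of every robots META tag found in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives declared in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = (
    '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
    "<body>Draft page</body></html>"
)
found = robots_directives(page)
print("noindex" in found, "nofollow" in found)  # True True
```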

Originally published Saturday, October 24th, 2009 at 12:48 pm, updated July 3, 2024 and is filed under The Free Reseller Program.

Tags: website, SEO tips, search engine, robots.txt, spider


