New Googlebot-News agent
At the beginning of December 2009, Google released a new user agent, Googlebot-News. Just like the regular Googlebot, this bot crawls your website for news, which is later indexed in Google News. This can give an extra edge to people who run their own news-oriented sites and want better Google rankings.
With this addition, you can use the robots.txt file to choose which sections of your site are crawled by the news bot and which by the regular Googlebot. Here is how you can manage both the regular Googlebot and the Google news bot:
User-agent: Googlebot
Disallow:
This will allow all pages to be crawled by both the news bot and the general Googlebot. Since the file has no separate Googlebot-News section, the news bot follows the Googlebot rules.
If you wish to keep the Google news bot off your site while still allowing visits from the general bot, you can use the following lines:
User-agent: Googlebot
Disallow:
User-agent: Googlebot-News
Disallow: /
If you swap Googlebot-News and Googlebot in the last setup, Googlebot-News will be allowed to crawl your website and Googlebot will be forbidden from doing so.
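To see which section of a robots.txt file each bot ends up following, the selection can be sketched in a few lines of Python. This is only an illustration, not Google's actual code: the function name select_group is made up for this example, and the matching rule (the longest section name that the bot's name starts with wins) is a simplifying assumption that reproduces the behavior of the setups above.

```python
def select_group(crawler, group_names):
    """Pick the robots.txt section assumed to apply to `crawler`.

    Assumption (not Google's real implementation): a section matches when
    its name is "*" or a case-insensitive prefix of the crawler name, and
    the longest matching name wins. Returns None if nothing matches.
    """
    crawler = crawler.lower()
    best_name, best_len = None, -1
    for name in group_names:
        token = name.lower()
        if token == "*":
            score = 0                      # wildcard is the weakest match
        elif crawler.startswith(token):
            score = len(token)             # longer name = more specific
        else:
            continue
        if score > best_len:
            best_name, best_len = name, score
    return best_name


# Only a Googlebot section exists: the news bot falls back to it.
print(select_group("Googlebot-News", ["Googlebot"]))                    # Googlebot
# Both sections exist: each bot follows its own section.
print(select_group("Googlebot-News", ["Googlebot", "Googlebot-News"]))  # Googlebot-News
print(select_group("Googlebot", ["Googlebot", "Googlebot-News"]))       # Googlebot
```

Under this assumption, the order of the sections in the file does not matter. What matters is which name is the closest match for the bot.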
You can also disallow different folders of your website for each bot:
User-agent: Googlebot
Disallow: /latest_news
User-agent: Googlebot-News
Disallow: /archives
This way, the news bot will not visit the /archives folder, and the Googlebot will not visit the /latest_news folder.
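If you want to double-check a setup like this before uploading it, a small Python script can help. The one below is a rough sketch under stated assumptions, not a full robots.txt parser: it only understands User-agent and Disallow lines, treats each Disallow value as a plain path prefix, and assumes a bot follows the section with the longest name that its own name starts with. The helper names parse_groups and is_allowed and the sample paths are hypothetical.

```python
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /latest_news
User-agent: Googlebot-News
Disallow: /archives
"""


def parse_groups(text):
    """Map each lowercased user-agent name to its list of Disallow prefixes."""
    groups = {}
    current = []       # names of the section currently being read
    in_rules = False   # True once the current section has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a rule was seen: new section starts
                current, in_rules = [], False
            name = value.lower()
            current.append(name)
            groups.setdefault(name, [])
        elif field == "disallow":
            in_rules = True
            if value:                         # an empty Disallow blocks nothing
                for name in current:
                    groups[name].append(value)
    return groups


def is_allowed(crawler, path, groups):
    """Check `path` against the section assumed to apply to `crawler`."""
    crawler = crawler.lower()
    names = [n for n in groups if n != "*" and crawler.startswith(n)]
    if names:
        rules = groups[max(names, key=len)]   # most specific section wins
    else:
        rules = groups.get("*", [])
    return not any(path.startswith(prefix) for prefix in rules)


groups = parse_groups(ROBOTS_TXT)
print(is_allowed("Googlebot-News", "/archives/2009/", groups))         # False
print(is_allowed("Googlebot-News", "/latest_news/today.html", groups)) # True
print(is_allowed("Googlebot", "/latest_news/today.html", groups))      # False
print(is_allowed("Googlebot", "/archives/2009/", groups))              # True
```

The script deliberately ignores Allow lines and wildcard patterns, so treat its answers as a sanity check only.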
Originally published Tuesday, January 12th, 2010 at 9:22 am, updated July 4, 2024, and filed under Latest News. Tags: Google, robots.txt, googlebot, googlebot-news