“Robots.txt” is a regular text file. By defining a few rules in this file, you can instruct robots not to crawl certain files or directories within your site, or not to crawl the site at all. The file resides in the root directory of a website. It stands alone: nothing on a standard webpage of the site shows its contents, so you have to request the file itself to read it. Reading a site's robots.txt shows you the site from a search engine robot's point of view.

Creating your “robots.txt” file: make sure the file is named exactly “robots.txt” and is uploaded to the publicly accessible root directory of your site.

1) Here’s a basic “robots.txt”:

    User-agent: *
    Disallow: /

A User-agent line identifies the crawler in question and is followed by one or more Disallow lines that keep it from crawling certain parts of your site. Here the * matches every crawler and the / covers the whole site, so this file asks all robots to stay out entirely.

2)

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /privatedir/
    Disallow: /tutorials/blank.htm

This disallows all search engines and robots from crawling the selected directories and page.

3)

    User-agent: Googlebot-Image
    Disallow: /

If you do not want Google’s image bot to crawl your site’s images and make them searchable online, for example to save bandwidth, the declaration above does that.

4)

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Disallow: /cgi-bin/
    Disallow: /privatedir/

This one is interesting. Here we declare that crawlers in general should not crawl any part of our site, EXCEPT for Google, which is allowed to crawl the entire site apart from /cgi-bin/ and /privatedir/. A crawler follows the group that names it most specifically and ignores the general group, so the rules of specificity apply, not inheritance.

5)

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Allow: /

This is the preferred way to disallow all crawlers from your site EXCEPT Google.

An alternative to “robots.txt” is the robots meta tag. Use it when your web host prohibits you from uploading “robots.txt” to the root directory, or when you simply wish to restrict crawlers from a few select pages on your site.

Continue on Blog: http://www.thewebmarketingblog.com/
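The specificity behaviour described in example 4 can be checked locally before uploading a file. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `is_allowed`, the host `example.com`, and the agent name `SomeOtherBot` are placeholders invented for illustration, not part of the article.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from example 4 above.
EXAMPLE_4 = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules.

    Hypothetical helper: parses the rules from a string instead of
    downloading them from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com and "SomeOtherBot" are placeholders for illustration.
    checks = [
        ("Googlebot", "http://example.com/index.html"),
        ("Googlebot", "http://example.com/cgi-bin/form.cgi"),
        ("SomeOtherBot", "http://example.com/index.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(EXAMPLE_4, agent, url))
```

Under these rules the sketch should report that Googlebot may fetch the home page but not anything under /cgi-bin/, while any other crawler is refused everywhere. Note that this only shows how Python's parser reads the file; real crawlers apply their own matching logic, so treat the result as a sanity check rather than a guarantee.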
Article Source: Link Building