I remember when I first learned about robots.txt; I was just an SEO novice and couldn’t wrap my head around how a plain text file on a web server could possibly affect how a site’s content was crawled and indexed by the search engines.
Years – and hundreds of robots.txt files – later, I have learned how important this file is, and I now consider myself a pretty solid robots.txt analyst and optimizer. I would like to share some of the things I’ve learned with you.
It goes without saying that if you use the robots.txt file incorrectly, you are going to face serious crawling and indexing issues. The most common misuses of the robots.txt file include:
1. Restricting access to pages that you want to rank in the search engine results
2. Allowing access to sensitive information or pages that you do not want included in the search engine results (keep in mind, though, that robots.txt is not a security mechanism – a blocked URL can still appear in the index if other sites link to it)
3. Copying a competitor’s robots.txt – this is borderline suicidal, as every site’s information architecture is unique
4. Not using a robots.txt file when you need one. For example, if an IDX plugin auto-populates duplicate content into your site and you do not block the search engine spiders from crawling it, it will eventually hurt you, because in Google’s eyes you are trying to take credit for content that is not yours. (For more information about this, check out the IDX plugin warning post.)
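To make point 4 concrete, here is a minimal sketch of what such a rule might look like. The `/idx/` directory is a hypothetical example – your IDX plugin may publish listings under a different path, so check your own site’s structure before copying anything:

```
# Hypothetical example: keep crawlers out of duplicate IDX listing pages
User-agent: *
Disallow: /idx/
```

Because robots.txt rules are matched against your own URL paths, this is exactly why borrowing a competitor’s file (point 3) makes no sense – their disallowed directories almost certainly don’t match yours.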
If you are unsure whether your robots.txt file is correct, I encourage you to consult with an expert today. Issues that involve a robots.txt file can often take 2-5 weeks to fully remedy, since the search engines need time to recrawl the affected URLs, so be careful when playing with your robots.txt file.
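Before calling in an expert, you can run a quick sanity check yourself: Python’s standard library can parse robots.txt rules and tell you whether a given URL is blocked. A minimal sketch, using a hypothetical rule set that blocks an `/idx/` directory (as in the IDX example above) on a made-up `example.com` domain:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block a duplicate-content IDX directory.
# In practice you would use set_url("https://yoursite.com/robots.txt")
# followed by read() to fetch your live file instead.
rules = """User-agent: *
Disallow: /idx/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A blocked listing page vs. a normal page you want ranked.
print(parser.can_fetch("*", "https://example.com/idx/listing-123"))  # False
print(parser.can_fetch("*", "https://example.com/about/"))           # True
```

Running this against the URLs you actually care about ranking is a fast way to catch misuse #1 – accidentally blocking pages you want in the results.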