In my ongoing attempt to understand WordPress in its entirety, I have to post this, because I’ve been constantly dealing with Google Webmaster Tools errors.

This post is mostly a reminder to myself, as a knowledge seeker / new kid on the block, that WordPress at times is just not self-explanatory and thus creates havoc. Though as an engineer, I love havoc: it means there’s a problem, and thus a solution must be sought.

The problem started when I downloaded SEO-Ultimate (you can acquire it here). The plugin itself is a hefty toolbox of SEO tools that *guide* you. One of the most important things an SEO guru or expert, or whatever you want to call yourself, will tell you is to watch out for 404s, understand 301s and 302s, and understand how the Google / Bing / Yahoo bots work (Bing and Yahoo are one and the same). The important action item here is that my SEO-Ultimate plugin alerted me to improper indexing of certain posts / caches that were being exposed by my robots.txt file. Robots.txt tells user agents (i.e. the Google / Bing / Yahoo bots) which parts of your site they may and may not crawl. So the problem at its core is that robots.txt was letting those bots index content that they shouldn’t, i.e. my caches, minified files, and so on…
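
To see how a bot reads those rules, here is a minimal sketch using Python’s standard urllib.robotparser module. The rules, the is_allowed helper and the example.com URLs are all made up for illustration, and since this parser only does plain prefix matching, the sketch sticks to rules without wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: block the admin area and the cache folder.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/cache/
"""


def is_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules in RULES let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())  # parse from memory, no network request
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/hello-world/",
        "http://www.example.com/wp-admin/",
        "http://www.example.com/wp-content/cache/page.html",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Anything under a disallowed prefix comes back as blocked, and everything else is allowed by default, which is exactly why a missing Disallow line leaves a cache folder open to the bots.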

The solution was simple to find. Through some Google searches and a basic understanding of the folder structure, the solution presented itself: rearrange the robots.txt to fit the ideal situation, which is:


Sitemap: http://www.example.com/sitemap.xml

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google
Disallow:

# digg mirror
User-agent: duggmirror
Disallow: /

# global
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/*/*
Disallow: */trackback/
Disallow: */feed/
Disallow: */comments/
Disallow: /*?
Allow: /wp-content/uploads/

Well, that’s all good in theory, but I needed extra functionality. I don’t want my theme folder, or the structure that it implies, to be exposed for indexing…

Here’s my robots.txt for WordPress…

User-agent: *
Disallow: /wp-admin/
Disallow: /aws_portfolio_category/
Disallow: /aws_career/
Disallow: /wp-includes/
Disallow: /cgi-bin/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /category/*/*
Disallow: */trackback/
Disallow: */feed/
Disallow: */comments/
Disallow: /*?
Disallow: /wp-content/uploads/
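
The wildcard rules (/category/*/*, */feed/, /*?) are the easiest ones to get wrong, so here is a rough Python sketch for checking what they would catch. It assumes the wildcard semantics that the big search engines document: * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a prefix match. The rule_matches helper is my own hypothetical name, not part of any library, and it only tests a single pattern; it does not decide which rule wins when an Allow and a Disallow both match.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics: '*' matches any sequence of characters, a trailing
    '$' anchors the end of the path, anything else matches as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving prefix behaviour.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    checks = [
        ("/category/*/*", "/category/news/2012/"),
        ("/category/*/*", "/category/news"),
        ("*/feed/", "/hello-world/feed/"),
        ("/*?", "/?s=test"),
        ("/*?", "/hello-world/"),
    ]
    for pattern, path in checks:
        print(pattern, path, rule_matches(pattern, path))
```

Running a handful of real URLs from your own site through something like this before uploading a new robots.txt is a cheap way to catch a rule that blocks more (or less) than you meant.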