The web is crawling with robots: applications that automatically scan through websites in search of data. Some of these robots perform important maintenance operations like updating search engines. Google employs several web robots that scan the web, indexing new content or identifying the theme of a site in order to offer up appropriate advertising. Other robots act as harvesters for email spammers and form hackers and are the scourge of the web.
Robots.txt
There is little a webmaster can do to block access to malicious robots, but we do have an opportunity to guide benevolent robots using a special text file called robots.txt. When a robot like the Googlebot (Google’s indexing agent) visits your site, the first thing it will look for is this small document. The robots.txt allows you to provide recommendations regarding which pages should or should not be visited or indexed.
Robots.txt is an integral part of optimizing WordPress blogs for strong performance in the Google SERPs and those of other search engines. With a well-designed robots.txt, we can keep Google away from pages that would otherwise be crawled and could trigger duplicate content penalties. WordPress is particularly prone to duplicate content, because the same post can turn up under category, tag, and feed URLs, which makes a strong robots.txt necessary.
Using robots.txt
To employ a robots.txt, you simply create the document in a plain text editor like Notepad, save it as robots.txt, and upload it to the root directory of your blog.
Robots.txt uses a simple format.
At the top of the file, you may include a directive pointing to the location of your sitemap. If you are using a sitemap, it should be located in the root directory of your site. The format for this directive is:
sitemap: http://www.yourblog.com/sitemap.xml
The rest of the robots.txt uses an allow/disallow format. This is a simple example:
User-agent: *
Disallow: /
The User-agent line specifies the robots you are targeting. In this case, by using an asterisk we are directing our restrictions or allowances towards all visiting robots.
With the Disallow line, we are instructing the visiting robots not to visit any page on our site.
For details on how to block access to specific directories or files, visit robotstxt.org.
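As a rough illustration of the allow/disallow format, Python’s standard urllib.robotparser module can parse robots.txt rules and report whether a URL may be fetched. This is only a sketch: the /private/ directory and /secret.html file are made-up examples, and this parser does simple prefix matching only (it does not understand wildcards).

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The example from above: every robot is asked to stay away from every page.
block_all = parse_rules("""
User-agent: *
Disallow: /
""")

# Hypothetical rules blocking one directory and one file.
block_some = parse_rules("""
User-agent: *
Disallow: /private/
Disallow: /secret.html
""")

base = "http://www.yourblog.com"
print(block_all.can_fetch("AnyBot", base + "/"))                 # False
print(block_some.can_fetch("AnyBot", base + "/private/a.html"))  # False
print(block_some.can_fetch("AnyBot", base + "/secret.html"))     # False
print(block_some.can_fetch("AnyBot", base + "/my-first-post/"))  # True
```

Anything not matched by a Disallow line stays open to the robot, which is why the last check comes back True.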
Robots.txt and WordPress
Because we are working on the WordPress platform, there are a variety of directories that may be crawled and that should not be indexed. To counteract this, I have developed a basic robots.txt that will block many of the problematic directories and files. If you decide to use this robots.txt, you will need to alter the sitemap URL and add any custom directories you have created.
You will notice that I repeated the body of this document. One copy is directed at all user-agents, and the other is directed at the Googlebot specifically. I made the decision to do this after I found Google was not following the recommendations in the all-inclusive user-agent section.
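One possible explanation, offered as an assumption rather than something verified against Google: under the Robots Exclusion Protocol a robot obeys only the group that names it most specifically, so as soon as any Googlebot group exists, the rules under the asterisk no longer apply to it. The sketch below shows that behavior with Python’s standard urllib.robotparser; the two short groups and the bot name SomeOtherBot are invented for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Two groups: the Googlebot group deliberately omits the /feed rule.
rules = """
User-agent: *
Disallow: /tag/
Disallow: /feed

User-agent: Googlebot
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

feed_url = "http://www.yourblog.com/feed"
tag_url = "http://www.yourblog.com/tag/wordpress/"

print(parser.can_fetch("SomeOtherBot", feed_url))  # False: falls back to the * group
print(parser.can_fetch("Googlebot", feed_url))     # True: only the Googlebot group applies
print(parser.can_fetch("Googlebot", tag_url))      # False: blocked by its own group
```

This is why every rule meant for the Googlebot has to be repeated inside its own group rather than inherited from the general one.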
This robots.txt has been developed over a long period of time from many online resources and much experimentation. There may be unnecessary restrictions, or some that are missing and that should be included. If you feel there are changes that should be made, let me know because I always want to improve on what I have.
You will notice that I have disallowed categories, tags, feeds, the wp- directories (wp-admin, wp-includes, and wp-content), and a number of file extensions, such that only posts and the main page should be indexed. I chose to proceed this way in order to minimize the number of duplicate content issues.
I strongly recommend testing this robots.txt in the Google Webmaster Tools robots.txt analyzer before using it. Make sure to test it against a wide variety of indexed page URLs from your site. To compile a list of pages from your blog, search Google using the format site:www.yoursite.com. This will query Google’s index for all pages from your site’s domain.
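Before turning to the analyzer, a quick local check can catch obvious mistakes. The sketch below is not Google’s matcher. It assumes the wildcard conventions Google documents (an asterisk matches any run of characters, a trailing dollar sign anchors the end of the URL, and the longest matching rule wins, with Allow winning a tie), it covers only a handful of the rules from the example file, and the sample paths are invented.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex: '*' matches any run
    of characters, and a trailing '$' anchors the match at the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """rules is a list of ("allow" | "disallow", pattern) pairs for one group.
    The longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# A subset of the rules from the example robots.txt in this article.
rules = [
    ("disallow", "/*.css$"),
    ("disallow", "/wp-content/"),
    ("disallow", "/category/*/*"),
    ("disallow", "/*?"),
    ("disallow", "/tag/"),
    ("disallow", "*/feed"),
    ("allow", "/wp-content/uploads"),
]

# Hypothetical paths; substitute URLs gathered from your own site: search.
for path in [
    "/my-first-post/",
    "/tag/wordpress/",
    "/wp-content/uploads/logo.png",
    "/wp-content/themes/style.css",
    "/?s=robots",
    "/my-first-post/feed",
]:
    print(path, "allowed" if is_allowed(path, rules) else "blocked")
```

Only the post and the uploaded image should come back as allowed. Treat a local check like this as a first pass and let the analyzer have the final word.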
Example robots.txt for WordPress
Here it is. Enjoy!
sitemap: http://www.yourblog.com/sitemap.xml

User-agent: *
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?
Disallow: /*/feed/
Disallow: /*/trackback/
Disallow: /tag/
Disallow: /page/
Allow: /wp-content/uploads
Allow: /*?$

User-agent: duggmirror
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: Googlebot
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?
Disallow: /*/feed/
Disallow: /*/trackback/
Disallow: /tag/
Disallow: /page/
Allow: /wp-content/uploads
Allow: /*?$

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /