Google are constantly improving their services and during April they updated Google Webmaster Tools; this release relates to removing content that has already been indexed by Google.
Google have supported removing content from their index for a long time; however, the process was often slow to take effect. With the recent addition of URL removal into Google Webmaster Tools, it's now possible to process the removal of a page quite quickly.
As with everything associated with Google Webmaster Tools, the web site to act on first needs to be verified. Once verified, there is now a URL Removals link under the Diagnostics tab. The removal service supports removing URLs in the following ways:
- individual web pages, images or other files
- a complete directory
- a complete web site
- cached copies of web pages
To remove an individual web page, image or file, the URL must meet one of the following conditions (see the robots.txt sketch after this list):
- return a standard HTTP 404 (missing) or 410 (gone) response code
- be blocked by the robots.txt file
- be blocked by a robots noindex <meta> tag
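For the robots.txt option, blocking a single file is just a matter of adding a Disallow rule for that exact path. A minimal sketch (the file name below is only a placeholder):

    # robots.txt at the root of the site - blocks only this one page from crawling
    # (alternatively, the page itself could return a 404/410 or carry a robots noindex <meta> tag)
    User-agent: *
    Disallow: /old-press-release.html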
Removing a directory has fewer options available: it must be blocked using the robots.txt file. Submitting http://mydomain.com/folder/ would remove all objects which reside under that folder, including all web pages, images, documents and files.
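Assuming the /folder/ path from the example above, the corresponding robots.txt entry would look something like this:

    # robots.txt - blocks everything beneath /folder/
    User-agent: *
    Disallow: /folder/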
To remove an entire domain from the Google index, you need to block it using a robots.txt file and submit the expedited removal request. Google have once more reinforced the point that this option should not be used to remove the wrong ‘version’ of your site from the index, such as a www versus non-www version. To handle this, nominate the preferred domain within Google Webmaster Tools and optionally redirect the wrong version to the correct version using a standard HTTP 301 redirect.
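For completeness, blocking an entire site via robots.txt and redirecting the non-preferred host name could look something like the sketch below. The redirect rules assume Apache with mod_rewrite enabled, and mydomain.com is just a placeholder:

    # robots.txt - blocks the entire site from crawling
    User-agent: *
    Disallow: /

    # .htaccess - 301 redirect the non-www host to the www host
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]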
Cached copies of web pages can be removed by setting the robots <meta> tag to noindex on the given page(s) and submitting the removal request. By using this mechanism, Google will never re-include that URL so long as the robots noindex <meta> tag is present. By removing the robots noindex <meta> tag, you are instructing Google to re-include that URL, so long as it isn’t being blocked by alternate means such as a robots.txt file. If the intention is simply to refresh a given set of web pages, you can also change the content on those pages and submit the URL removal request. Google will fetch a fresh copy of the URLs, compare them against their cached copies and, if they are different, immediately remove the cached copy.
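The <meta> tag itself is a single line in the <head> of the page in question; a minimal sketch (the title and body are obviously just placeholders):

    <html>
    <head>
      <title>Page to drop from the Google cache</title>
      <!-- instructs robots not to index this page; remove the tag later to allow re-inclusion -->
      <meta name="robots" content="noindex">
    </head>
    <body>...</body>
    </html>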
After submitting requests, it’s possible to view their status. They will be listed as pending until they have been processed, marked as denied if the page does not meet the removal criteria, and once processed they will be moved into the ‘Removed Content’ tab. Of course, you can re-include a removed page at any time as well. It should be noted that if you remove a page and don’t manually re-include it after exclusion, the removed page(s) will remain excluded for approximately six months, after which they will be automatically re-included.
Being able to remove content from the Google index so quickly is going to come in handy when certain types of content are indexed by accident and need to be removed with priority.
I just wish there was a quick way to flush links out of the index simply by applying the robots.txt rules.