Recently I broke down a fairly content-heavy section of a site into smaller, more succinct pages. As a by-product, each page now has focused content, instead of a single large page with comparatively jumbled content. Of course, this allows you to target the information on each individual page for search engine optimisation, and thus for search engine ranking/placement.
When breaking it into smaller pages, I thought I would use a common naming convention for all of the ASP filenames, for instance faq_<some>_<meaningful>_<name>.asp. I chose an underscore (_) because I’m a fan of C-style programming languages. The underscore character is pretty common (used for private variable declarations in objects, compiler stuff, …) and I also prefer the ‘look’ of them in filenames as they seem to be ‘out of the way’.
The split itself went well, but the new pages just weren’t being picked up by Google, even though I could confirm that the site and parent page were being indexed regularly. I let it slide for a while, in case the links hadn’t been followed, for whatever reason, on the previous visits. On Wednesday just gone, I decided something had to be going wrong for the pages not to be showing up in the various indexes properly.
After investigating all the other pages on the site, the one thing that became apparent was that none of them had underscores in the filenames. Of the pages that had filenames which might have warranted one, they were either words concatenated with no separating character or a dash was used to separate each word. This led me to check how the pages that used dashes (-) were going in the search engines. It appeared that they had no problems at all and that Google was actually utilising the filename as part of the ‘this page is relevant’ algorithm.
Cruising through some useful searches has confirmed that Google considers a dash to be a separating character in a filename. For instance, if you had a filename of faq-some-meaningful-name.asp, Google would see that as “faq some meaningful name” and utilise that when indexing the site. Conversely, an underscore is considered a plain character, which means that unless the person was searching for “faq_some_meaningful_name”, that page might not show up on the strength of its filename.
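To make the difference concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustration, not Google's actual algorithm: the function name filename_terms is made up for this example, and it assumes that a dash is the only character treated as a word separator.

```python
def filename_terms(filename: str) -> list[str]:
    """Split a filename into the words a search engine might see.

    Assumption for illustration only: a dash separates words,
    while an underscore is treated as a plain character.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the extension, e.g. ".asp"
    return [term for term in stem.split("-") if term]


print(filename_terms("faq-some-meaningful-name.asp"))
print(filename_terms("faq_some_meaningful_name.asp"))
```

The dashed filename yields four separate words, while the underscored one comes back as a single long term that nobody is likely to search for.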
The moral of the story: for the moment a dash in a filename trumps an underscore; so if you are using underscores in filenames, you might be missing out on valuable search engine ranking.
Is this still relevant information in 2009?
The information above is still relevant today.
Choosing a dash or hyphen over an underscore has been discussed at length in the search engine optimisation space for a long time, and the opinion from Google has changed over time.
The general guidance has remained quite consistent over time though: if your site is well established and using underscores, don’t change existing content. If you’re adding new content, by all means create it using hyphens as a word separator.
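For new content, one way to stick to the hyphen convention is to generate filenames from page titles. The following is a rough Python sketch under my own assumptions: the helper name hyphenated_filename is hypothetical, the ".asp" default simply matches the pages discussed here, and only ASCII letters and digits are kept.

```python
import re


def hyphenated_filename(title: str, extension: str = ".asp") -> str:
    """Build a lowercase, hyphen-separated filename from a page title.

    Anything that is not an ASCII letter or digit (spaces, underscores,
    punctuation) is treated as a word boundary.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words) + extension


print(hyphenated_filename("FAQ: Some Meaningful Name"))
print(hyphenated_filename("faq_some_meaningful_name"))
```

Both calls produce faq-some-meaningful-name.asp, so the same helper could also suggest hyphenated names for pages that currently use underscores.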
My personal opinion on the above only differs if your web site is within a highly competitive sector. If your site is about gambling, sex/porn or pharmaceuticals, then it could warrant a site-wide change to give you any additional boost that you’re not already receiving. If on the other hand you’re not in a competitive space, it wouldn’t be worth the effort.