Category Archives: Blogging

Google & Pingomatic Sitting In A Tree

Following on from my previous post about Google Blog Search accepting pings, it appears that Pingomatic is already sending ping messages into the Google Blog Search service.

By default, WordPress only sends pings to Pingomatic, which then distributes them to a lot of other services. To verify that Pingomatic has already been updated, I amended the title and URL of the previous post. A few minutes later, a search on Google Blog Search showed the newly amended title and URL in the results.

I love a good service.

Google Blog Search Accepting Pings

Google have recently announced that their Blog Search service is now capable of accepting ‘pings’.

For those that aren’t aware, the ping I’m referring to isn’t the first half of the game or the network utility; it’s a notification message. Traditionally, indexing services such as Google or Technorati periodically scour the internet looking for changed content. Due to the ever-increasing size of the internet, it is taking longer and longer for these services to complete each run. The knock-on effect for publishers is that it’s taking longer and longer for newly published content to show up in the search indexes.

To try and reduce that problem, the search engines have been introducing more features and services to the community. As an example, some time ago Google released a simple service known as Google Sitemaps. The idea behind the Google Sitemap is that the website owner provides Google with an XML file of their site’s content and what they consider to be important within it. This then allows the Googlebot to pick up the Google Sitemap on its scheduled run and only retrieve content which is new or changed since its last visit, drastically reducing the number of pages that the Googlebot needs to reindex per website.
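As a rough illustration of the idea, here is a minimal Python sketch that builds a sitemap-style XML file and then picks out only the pages changed since a crawler's last visit. The namespace used is the later sitemaps.org 0.9 one, which is an assumption on my part rather than whatever Google Sitemaps used at the time, and the function names and example URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace, not necessarily the one in use originally.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (loc, lastmod, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod    # ISO date, e.g. 2006-10-01
        ET.SubElement(url, "priority").text = priority  # owner's hint, 0.0 to 1.0
    return ET.tostring(urlset, encoding="unicode")


def changed_since(sitemap_xml, last_visit):
    """Return the URLs whose lastmod is later than the crawler's last visit."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        lastmod = url.find(f"{{{SITEMAP_NS}}}lastmod").text
        if lastmod > last_visit:  # ISO dates compare correctly as strings
            changed.append(url.find(f"{{{SITEMAP_NS}}}loc").text)
    return changed


sitemap = build_sitemap([
    ("http://example.com/", "2006-10-20", "1.0"),
    ("http://example.com/about/", "2006-08-01", "0.5"),
])
to_reindex = changed_since(sitemap, "2006-10-01")
```

Only the home page would be revisited here, since the about page has not changed since the last crawl.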

Following on from that, Google are now allowing website owners to proactively tell Google when they have added or changed content. Google accepting a ping when publishers add or modify content means that the Googlebot can come back to your site more frequently, knowing that there will be new or updated content to index. Having the Googlebot revisit your site should also mean that your newly added or modified content shows up within the search index much sooner than the scheduled indexing cycle would allow.
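For the curious, blog pings are conventionally sent as an XML-RPC call named weblogUpdates.ping, carrying the blog's name and URL. The Python sketch below only builds the message body and reads it back; it doesn't send anything, and I'm assuming rather than confirming that Google's service expects exactly this shape. The blog name and URL are placeholders.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url),
        methodname="weblogUpdates.ping",
    )


payload = build_ping("Example Blog", "http://example.com/")

# Decode the payload again to confirm what a receiving service would see.
params, method = xmlrpc.client.loads(payload)
```

Posting that payload to a ping endpoint is all that is involved, which is why a relay like Pingomatic can fan a single ping out to many services.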

I wonder how long it will be before Google allows a ping for normal content and not just blog style content.

Akismet Spam Filtering, The Bringer Of Light

Whether you read email or have your own web site, everyone hates spam. Toward the middle of the year, I was being overwhelmed by comment spam on my site. Initially it was just one or two, then five or ten – soon they were coming in thick and fast and I couldn’t keep up with them manually. When that happened, it was time to find an automated solution; enter Akismet.

For those that aren’t aware, Akismet is a centralised hosted spam filtering service which has been developed by the same people that brought you WordPress. The whole system is very simple:

  • Sign up for an account at http://wordpress.com
  • Install or integrate an Akismet plugin or wrapper into your preferred application, blogging or otherwise
  • Revel in the glory of spam freedom
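To give a feel for what a plugin or wrapper does in the integration step above, here is a small Python sketch that prepares a comment-check request. The URL pattern and field names follow my understanding of Akismet's REST interface and should be treated as assumptions to check against the official documentation; the API key and comment details are placeholders, and nothing is actually sent.

```python
from urllib.parse import urlencode


def build_comment_check(api_key, blog_url, user_ip, user_agent, comment):
    """Return (url, body) for a comment-check POST. Endpoint and fields are assumed."""
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    body = urlencode({
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_content": comment,
    }).encode("utf-8")
    return url, body


def is_spam(response_text):
    """Assumption: the service answers with the literal text 'true' for spam."""
    return response_text.strip() == "true"


url, body = build_comment_check(
    "yourapikey",            # placeholder key
    "http://example.com/",
    "203.0.113.7",
    "Mozilla/5.0",
    "Buy cheap pills",
)
```

The wrapper would POST the body to the URL and hand the response text to is_spam() to decide whether the comment goes to the moderation queue.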

Akismet spam filtering utilises an undocumented system (probably Bayesian based), along with a whole bunch of secret squirrel stuff to knock spam on the head. When you combine a solid foundation, plenty of innovation and an enormous online community to power it, you end up with a very sound spam filtering platform.

Since implementing Akismet spam filtering at the start of August 2006, it has filtered and saved me dealing with a whopping 21,324 dirty dirty spam messages. It hasn’t returned a false positive for me in so long now that I place pretty much 100% confidence in it, only giving the spam a cursory scan to make sure there isn’t anything legitimate in it.

Die filthy spammers, die.

WordPress Performance

In the last month or two, I’ve begun writing some simple plugins for WordPress and I’ve been a little frustrated by one of the application/database design decisions that has been made.

Most web sites consist predominantly of read activity and infrequent write activity. As such, it is in the interest of performance that, where applicable, you store an aggregate value instead of calculating it. This design decision was made correctly by storing the number of comments per post in the wp_posts table. Unfortunately, this technique has not been used for storing the URL for a post. For whatever reason, to get the URL for a particular post you need to call get_permalink() – which through the use of a few other queries derives the URL for the page.

I can understand to some extent why this was done: it makes sure that the URL presented for a post is always the current one. I think the other reason might involve creating a convenient templating language for the public to use with WordPress. As a simple example, consider the lists of posts you see on this site. Instead of requiring a single query to generate these lists, it requires n+1 queries, where n is the number of posts you want displayed.
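The n+1 pattern is easy to see in a toy version. The Python sketch below uses an in-memory SQLite table as a stand-in for wp_posts; the table layout, the CountingDB helper and the single lookup per post are simplifications I've made up for the example, not how get_permalink() is really implemented.

```python
import sqlite3


class CountingDB:
    """Tiny wrapper that counts how many queries are issued."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT, url TEXT)"
        )
        self.conn.executemany(
            "INSERT INTO posts (slug, url) VALUES (?, ?)",
            [(s, f"/blog/{s}/") for s in ("albino", "akismet", "pings")],
        )
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def list_urls_derived(db):
    """One query for the list, then one more per post to derive its URL."""
    ids = [row[0] for row in db.query("SELECT id FROM posts ORDER BY id")]
    urls = []
    for post_id in ids:
        rows = db.query("SELECT slug FROM posts WHERE id = ?", (post_id,))
        urls.append(f"/blog/{rows[0][0]}/")
    return urls


def list_urls_stored(db):
    """A single query when the URL is stored alongside the post."""
    return [row[0] for row in db.query("SELECT url FROM posts ORDER BY id")]


derived_db, stored_db = CountingDB(), CountingDB()
derived = list_urls_derived(derived_db)   # 1 + 3 queries for three posts
stored = list_urls_stored(stored_db)      # 1 query
```

Both functions return the same list of URLs; only the number of round trips to the database differs.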

What I don’t understand is why the guid field in the wp_posts table isn’t updated and kept in sync with the post and a user’s desired permalink structure. Employing a simple mechanism like this would mean generating a list of URLs would only issue a single query. If this were the case, you’d end up with a scenario where:

  • drafting a post would create a permalink
  • publishing a post would update the permalink
  • changing the publishing date of a post would regenerate the permalink
  • changing the permalink structure would regenerate all permalinks
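The scenario in the list above could be sketched along these lines in Python. The Post and Blog classes are hypothetical and exist only for the example; the %year%, %monthnum%, %day% and %postname% tags mirror WordPress permalink structure tags, but the rendering here is a simplification, not WordPress code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    slug: str
    published: date
    permalink: str = ""  # stored value, kept in sync by Blog


def render_permalink(structure, post):
    """Expand a small subset of structure tags into a URL path."""
    return (
        structure
        .replace("%year%", f"{post.published.year:04d}")
        .replace("%monthnum%", f"{post.published.month:02d}")
        .replace("%day%", f"{post.published.day:02d}")
        .replace("%postname%", post.slug)
    )


class Blog:
    def __init__(self, structure):
        self.structure = structure
        self.posts = []

    def save_post(self, post):
        """Drafting or publishing a post (re)writes its stored permalink."""
        post.permalink = render_permalink(self.structure, post)
        if not any(p is post for p in self.posts):
            self.posts.append(post)

    def change_date(self, post, new_date):
        """Changing the publishing date regenerates that post's permalink."""
        post.published = new_date
        post.permalink = render_permalink(self.structure, post)

    def change_structure(self, structure):
        """Changing the structure regenerates every stored permalink."""
        self.structure = structure
        for post in self.posts:
            post.permalink = render_permalink(structure, post)


blog = Blog("/%year%/%monthnum%/%postname%/")
post = Post("albino", date(2006, 10, 1))
blog.save_post(post)
first_link = post.permalink
blog.change_date(post, date(2006, 11, 5))
second_link = post.permalink
blog.change_structure("/%postname%/")
third_link = post.permalink
```

The cost of regeneration moves to the rare write events, so reading a list of posts never has to derive a URL.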

One other thing which is a little frustrating is that after asking in #wordpress on irc.freenode.net, no one at the time could clarify what the guid field was used for and why it isn’t kept in sync as pointed out above (maybe it’s a bug?). The other thing which I couldn’t find on the codex was a good definition of all fields in all tables and what they logically represent. If I happen to run into Matt or Ryan, I’ll be sure to ask them to confirm one way or the other.

Albino

n. albino
pl. albinos

  1. A person or animal lacking normal pigmentation, with the result being that the skin and hair are abnormally white or milky and the eyes have a pink or blue iris and a deep-red pupil.
  2. A plant that lacks chlorophyll.

No, your browser isn’t broken; it’s a really simple template for the site I’ve been working on. It was time for a change and this time I thought I would make it as simple as possible. The goal of albino is to be as minimalistic as possible while offering just enough style to make it clear, tidy and functional. The CSS for the site isn’t complete yet; in fact, it has taken me all of about 30 minutes amidst watching TV to get to where I am. There will be little changes and touch ups over the coming days.

Hope you enjoy the clarity.