Making use of the new canonical HTML <link> element to combat duplicate content



Thu 19/02/09 at 23:08

cjh

Regular

"It goes so quickly"

Posts: 4,083

[B][U]Making use of the new canonical HTML <link> element to combat duplicate content[/U][/B]

If you’ve got a web site hosted with Freeola, or indeed any other hosting provider, then each page of your web site will have its own URL, or web address. These URLs typically look like http://www.example.com, http://www.example.com/content/, or http://example.com/contact.htm.

As a web site owner, you will also likely find that your web pages have been "indexed" by search engines, most notably Google, Live and Yahoo. Search engines use their "web crawlers" to scan web sites and categorise them so that people can search the Internet. How web crawlers work is a vastly complex issue, and each company uses its web crawler in different ways to reach the end result of an indexed Internet.

However, one common issue search engines may come across with web sites is a concept known as "duplicate content", which can be a bit of a problem when it comes to your search engine ranking.

Duplicate content?

In the complicated world of the search engine, "duplicate content" refers mainly to a single web page that is accessible via more than one web address. For example, if you type any of the four web addresses below into your web browser, you would, in most cases, be taken to the same web page, which in this case would be the site’s home page:

http://example.com
http://www.example.com
http://example.com/index.htm
http://www.example.com/index.htm

A search engine company doesn’t really want the same web page to appear multiple times in its results pages, because people who have already clicked a result, pressed the "back" button of their browser and moved on to a different result won’t want to be taken to the same web page again. In the same sense, people don’t want the same web page shown to them more than once on a results page, as it is of no benefit to them.

To work around this, each search engine has built-in checks, among other methods, to figure out what is "duplicate content", but they are not always able to detect it so easily on larger web sites. This isn’t necessarily down to stupidity or laziness: just as a human would, a search engine has to compare the content of two pages to detect that they are the same. As the web is so large, and search engines are keen to be seen as up-to-date, not every web page can be compared with every other web page, so "duplicate content" may slip by.
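
As a rough illustration of "comparing the content", here is a minimal Python sketch that groups web addresses whose pages are exactly identical. It is not how any particular search engine works; the group_by_content function and the pages dictionary (standing in for pages that have already been downloaded) are made up for this example.

```python
import hashlib
from collections import defaultdict


def group_by_content(pages):
    """Return lists of URLs whose page bodies are byte-for-byte identical.

    `pages` is a hypothetical mapping of URL -> HTML text already fetched.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        # Identical bodies produce identical fingerprints
        fingerprint = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[fingerprint].append(url)
    # Keep only fingerprints shared by more than one URL
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Made-up sample data: two addresses serve the same home page
pages = {
    "http://example.com": "<h1>Home</h1>",
    "http://www.example.com/index.htm": "<h1>Home</h1>",
    "http://example.com/contact.htm": "<h1>Contact</h1>",
}
duplicates = group_by_content(pages)
print(duplicates)
```

Even this simple check needs every page's content in hand before anything can be compared, which hints at why the job gets harder as the number of pages grows.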

The issue of multiple web addresses can become especially complex when web site addresses are made up from "query strings", whereby the content of a page is selected based on some additional information in each URL.

You can tell when a "query string" is in use by looking at a web address and seeing a question mark (?) after the page name, followed by the additional information, which may look something like this:

http://example.com/products.php?[B]item[/B]=[I]car[/I]&[B]colour[/B]=[I]red[/I]

This type of web address has been used extensively for web shops (like Amazon, Play, Game and Special Reserve), but the problem here is that the extra information after the question mark can often be written in many different orders, yet still produce the same outcome. For example, the above could be written like so:

http://example.com/products.php?[B]colour[/B]=[I]red[/I]&[B]item[/B]=[I]car[/I]

The same applies to content that is practically the same, but ordered differently based on the URL query string. A common use for this is product listings, which can be ordered in different ways (price, highest or lowest, size, colour, etc), yet the overall content is in fact the same.
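
To show that two differently ordered query strings really do describe the same request, here is a small Python sketch using the standard urllib.parse module. The normalise_query_order function is a made-up helper for this example, and it assumes the order of the parameters never matters, which will not be true for every web site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise_query_order(url):
    """Rewrite a URL so its query string parameters are in sorted order."""
    parts = urlsplit(url)
    # Split "item=car&colour=red" into pairs, then sort them by name
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


a = normalise_query_order("http://example.com/products.php?item=car&colour=red")
b = normalise_query_order("http://example.com/products.php?colour=red&item=car")
print(a)       # http://example.com/products.php?colour=red&item=car
print(a == b)  # True
```

Once both addresses are written in the same order they match exactly, so a single preferred address can be chosen for the page.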

Preferred web address!

To work around this, many web sites have started using HTTP redirects to point multiple web addresses to a single web address, which is known as the "preferred web address". Freeola do this, and you can see this in action by trying out the links below. You’ll notice that the first one is the "preferred web address", and that the remaining two redirect to it if you try to use them.

http://freeola.com (Freeola’s preferred address)
http://www.freeola.com
http://freeola.co.uk

However, using this type of HTTP redirect can be confusing and difficult for people who don’t understand how it works, and can result in multiple redirects, incorrect redirect codes, and generally more trouble than people are willing to accept.
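
For the curious, the decision a web server makes when redirecting to a preferred address can be sketched in a few lines of Python. This is only an illustration and not Freeola's actual set-up: the redirect_for function and the ALIASES set are invented for this example, using the addresses listed above, and 301 is the HTTP status code for a permanent redirect.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed for illustration: one preferred host and the hosts that point to it
PREFERRED_HOST = "freeola.com"
ALIASES = {"www.freeola.com", "freeola.co.uk"}


def redirect_for(url):
    """Return (status_code, location) if the URL should redirect, else None."""
    parts = urlsplit(url)
    if parts.netloc.lower() in ALIASES:
        # Keep the path and query string, swap only the host name
        location = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path, parts.query, "")
        )
        return 301, location  # 301 = "Moved Permanently"
    return None


print(redirect_for("http://www.freeola.com/"))  # (301, 'http://freeola.com/')
print(redirect_for("http://freeola.com"))       # None
```

A real web server would send the status code and the new address back to the browser, which then loads the preferred address for you.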

To give people the option to combat their "duplicate content" problems or worries, without having to fiddle with or understand HTTP redirects, the big three search engines came up with the canonical HTML element.

What is the canonical link element for?

The word "canonical", in search engine speak, basically refers to the preferred URL of a web page that a site owner would like people and search engines to use to get to the page.

The canonical HTML element is a way for web site owners to embed this preferred web address directly into the web page, so that when search engines come along and crawl (or scan) the web page, it isn’t that much of a worry if they came via an old, out-dated or slightly incorrect web address. The web crawler will be able to see the preferred URL and say "ah, this site owner wishes to use this URL to get to this page, I will have to make a note of that".

So, how do I use it?

The HTML code is pretty simple. It is made up of the <link> element, with two attributes. The first attribute, rel, must contain the value canonical, while the second attribute, href, must contain the web address (URL) that you consider to be that page’s preferred, or official, address. It would look like this:

<link rel="[B]canonical[/B]" href="[B]http://example.com[/B]">

The above code needs to be placed within the <head> section of the web page. For example, if Freeola didn’t have the HTTP redirects in use, the Freeola home page may include the code:

<html>
	<head>
		<title>Freeola Home Page</title>
		[B]<link rel="[I]canonical[/I]" href="[I]http://freeola.com[/I]">[/B]
	</head>
	<body>
		<h1>Freeola Home Page</h1>
	</body>
</html>

Quite simple really!
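
If you would like to check that a page carries the element correctly, the following Python sketch reads a page's HTML and reports the canonical address found inside the <head> section, much as a crawler might. It uses the standard html.parser module; the CanonicalFinder class and find_canonical function are names made up for this example, and real crawlers will be far more thorough.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attributes = dict(attrs)
            # rel may hold several space-separated values
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html_text):
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


page = """<html>
<head>
<title>Freeola Home Page</title>
<link rel="canonical" href="http://freeola.com">
</head>
<body><h1>Freeola Home Page</h1></body>
</html>"""
print(find_canonical(page))  # http://freeola.com
```

If the element is missing, or has been placed in the <body> by mistake, this sketch returns None.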

Why might I want to use this?

You may want to use this if you find that search engines are already listing a single web page of yours multiple times, using a number of different URLs. By allowing multiple URLs to remain in search engine listings, you risk having people bookmark, or link from their own web pages to, a URL that you later forget about or remove, leaving "dead links" scattered around the Internet.

You may also find that if single pages are listed multiple times, they aren’t ranking as high as they could be, because search engines are splitting your single page ranking between multiple web addresses. So while a web page may be worth a Google ranking of 7 on its own, if 1000 people are linking to one URL, another 1000 to a second, and another 1000 to a third, Google may think that is 3 separate pages with 1000 links each, rather than seeing that it is really the same page with 3000 links to it. That is quite a difference in a ranking score.

Even if your site is new, doesn’t currently suffer from any "duplicate content" issues, or correctly uses HTTP redirects, you may still wish to implement the canonical HTML element to help prevent any problems in the future.
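
The arithmetic of split links can be shown with a short Python sketch. The figures are the made-up ones from the paragraph above, the canonical_of mapping is an assumption standing in for what the canonical element tells a search engine, and real ranking scores are of course worked out in far more complicated ways.

```python
from collections import Counter

# Made-up figures: 1000 inbound links to each of three addresses of one page
inbound_links = {
    "http://example.com": 1000,
    "http://www.example.com": 1000,
    "http://example.com/index.htm": 1000,
}

# Assumed mapping of each address to its preferred (canonical) address
canonical_of = {url: "http://example.com" for url in inbound_links}

# Add up the links under the preferred address
merged = Counter()
for url, count in inbound_links.items():
    merged[canonical_of[url]] += count

print(dict(merged))  # {'http://example.com': 3000}
```

Without the mapping, the three addresses are counted as three pages with 1000 links each; with it, one page is credited with all 3000.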

What if I use other software to make my web site?

You may be able to download an update that adds this new feature to your chosen web site generation software. For example, if you use WordPress to run your blog, you can make use of the canonical plugin, which will automatically generate the desired canonical HTML code for you.

Check out the web site of the software you’re using and see if they have made any announcements.

And that is that!

And there we have it, a rather simple concept of an HTML element that tells search engines which web address (URL) you’d prefer a web page to be accessed from. Simply decide which web address you prefer, add the code, and plan your Friday night.

For additional information, see the blog postings from the search engines themselves, as well as a blog and video from a Google employee:

Google: Specify your canonical
Live: Partnering to help solve duplicate content issues
Yahoo: Fighting Duplication: Adding more arrows to your quiver
Matt Cutts Blog: Learn about the Canonical Link Element in 5 minutes
Matt Cutts interview: Matt Cutts Explains “Canonical Tag” from Google, Yahoo, Microsoft

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

As always, any comments, questions, and especially corrections are welcome.
