Canonicalization for Best Google Rankings



Canonicalization SEO Tip

Canonicalization is almost harder to say than to understand: it is simply a way of organizing your URLs to a) get maximum ranking benefits and b) avoid ‘duplicate content’ penalties.

 

How does ‘Canonicalization’ work?

Canonicalization is simply the process of picking the best url when there are several choices. Most people think that similar URLs are the same and that those URLs won’t cause any problems.

With their default settings, the two most common web servers each serve the same home page at several URLs, which leads to canonicalization errors:

Apache web server:

  • http://www.yourdomain.com/
  • http://www.yourdomain.com/index.html
  • http://yourdomain.com/
  • http://yourdomain.com/index.html

Microsoft Internet Information Services:

  • http://www.yourdomain.com/
  • http://www.yourdomain.com/default.asp (or .aspx depending on the version)
  • http://yourdomain.com/
  • http://yourdomain.com/default.asp (or .aspx)
  • or any combination with different capitalization.

Producing these different versions of URLs is a problem because the ranking benefit for the site gets divided and spread over the various versions of your website’s URLs. What you want instead is the maximum ranking benefit combined on one page, not divided over different versions of the same page.

From a technical point of view, these URLs are different for the search engines, and a web server could return different content for each of them.
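To make the idea concrete, here is a minimal Python sketch that maps the home page variants listed above onto one preferred URL. The preferred hostname, the list of default document names, and the function name canonical_url are assumptions chosen for illustration; this is not how any search engine canonicalizes URLs.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: pick your own preferred host and
# the default-document names your web server actually uses.
PREFERRED_HOST = "www.yourdomain.com"
BARE_HOST = "yourdomain.com"
DEFAULT_DOCUMENTS = {"index.html", "default.asp", "default.aspx"}


def canonical_url(url: str) -> str:
    """Map the home-page variants of one site onto a single preferred URL."""
    parts = urlsplit(url)

    # Hostnames are case-insensitive; fold the bare host into the www host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST

    # Drop a trailing default document such as /index.html or /Default.ASPX.
    path = parts.path or "/"
    directory, _, filename = path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"

    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://www.yourdomain.com/",
    "http://www.yourdomain.com/index.html",
    "http://yourdomain.com/",
    "http://yourdomain.com/index.html",
    "http://YourDomain.com/Default.ASPX",
]
unique_before = set(variants)
unique_after = {canonical_url(v) for v in variants}
```

Under these assumptions the five distinct strings in unique_before collapse to a single entry in unique_after, which is the outcome the rest of this article is trying to achieve on the server side.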

“When Google ‘canonicalizes’ a url, we try to pick the url that seems like the best representative from that set. One thing that helps is to pick the url that you want and use that url consistently across your entire site. For example, don’t make half of your links go to http://example.com/ and the other half go to http://www.example.com/ . Instead, pick the url you prefer and always use that format for your internal links.” says Google’s Matt Cutts (Matt Cutts SEO Advice On URL Canonicalization Issues).

 

How do I make sure Google (Bing etc.) finds the version of my website that I prefer?

If you want your default URL to be http://www.example.com/, you can force resolution to that one URL only. To do this, you can use:

  • Google Webmaster Tools, which provides an area where you can specify which version of the URL Google uses
  • a 301 redirect. You can structure your website so that if someone requests http://example.com/, it does a 301 (permanent) redirect to http://www.example.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.). 
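As a rough sketch of the 301 redirect option, the following Python WSGI application redirects any request that does not arrive on the preferred hostname. The names PREFERRED_HOST, redirect_location and app are hypothetical, and on a real site you would more likely configure this in the web server itself; the sketch only shows the logic.

```python
from typing import Optional

# Assumption: the hostname you have chosen as canonical.
PREFERRED_HOST = "www.example.com"


def redirect_location(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the URL to 301-redirect to, or None if the host is already canonical."""
    hostname = host.lower().split(":")[0]  # ignore any port number
    if hostname == PREFERRED_HOST:
        return None
    location = "http://" + PREFERRED_HOST + (path or "/")
    if query:
        location += "?" + query
    return location


def app(environ, start_response):
    """Minimal WSGI app: redirect non-canonical hosts, otherwise serve a page."""
    location = redirect_location(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if location is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page\n"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app) can serve it. Note that the path and query string are preserved, so a deep link on the non-preferred hostname lands on the matching page rather than on the home page.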

Warning: DON’T just remove one of the www vs. non-www hostnames with the URL removal tool, since you might be removing your whole domain for six months! If you have used the URL removal tool to remove your domain when you actually only wanted to remove the www or non-www version of your domain, simply do a reinclusion request and mention that you removed your entire domain by accident using the URL removal tool and that you’d like it reincluded.

 

Top (Google) Rankings and Canonicalization

If you want top rankings, you need to sort out ‘canonicalization’ matters on your website (blog). “If the search engine sees a page as being published at many separate URLs, the search engine may rank your pages lower than they would otherwise, or not rank them at all.

Canonicalization issues can split link juice between pages if people link to variants of the URL. Not only does this affect rank (less PageRank = lower rank), but it can also affect crawl depth (if PageRank is spent on duplicate content it is not being spent getting other unique content indexed).” PeterD, SEOBook

 

8 Canonicalization Best Practices

  1. Use 301 redirection to ensure that your home page is only found at one URL. If you don’t know how, read Stephan Spencer’s column about rewrites and redirects.
  2. Link consistently to your home page from within your own site. Use a single URL for your home page. Don’t mix in instances of ‘www.iansnerdvana.com/index.html’ with ‘www.iansnerdvana.com’. If you aren’t doing this properly right now, a quick change may have a big impact on SEO.
  3. Don’t use tracking IDs in internal site navigation. A lot of sites add stuff like ‘?source=blog’ in their navigation. That lets them use their analytics reports to track user movement within, to and from their site. Instead, learn to use your web analytics referrer and navigation path reports. If you must use tracking IDs, change your software to use a hash mark (a ‘#’ sign) instead of a question mark. Search engines ignore everything after the hash, so you’ll avoid confusion.
  4. Don’t use tracking IDs in organic links from other sites. If you get a link on another site, and want it to help with your SEO, don’t put a tracking ID in that, either.
  5. Be careful with pagination. Many sites have pagination, where visitors can click a 1, 2, 3 etc. to jump to later pages in search results, product lists or articles. That’s fine, but make sure that each page has a single URL. For example, if page 1 of the article is ‘www.iansnerdvana.com/article.html’ when I click the article link from the home page, make sure that the number ‘1’ in the pagination takes me there, too, instead of to ‘www.iansnerdvana.com/article.html?page=1’.
  6. Set up preventative redirects. Make sure that ‘iansnerdvana.com’ 301 redirects to ‘www.iansnerdvana.com’.
  7. Exclude ‘e-mail a friend’ pages. Most content management systems that have ‘e-mail a friend’ options direct the user to a unique page that has the same form and content. But every instance of that page has a unique URL like ‘ID=123’, to tell the server which product or article to forward. It’s canonical higgeldy-piggeldy. Use robots.txt and the meta robots tag to exclude these from search engine crawls.
  8. Use common sense when building your site. Think, man/woman! If you need to change the header, footer or other page element based on where on your site the visitor came from, do it with cookies, or by sniffing out the referring URL. Design to do this ahead of time.” Ian Lurie, Search Engine Land
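Points 3 to 5 of the list above can be sketched as a small Python helper that cleans an internal link before it is written into a page. The parameter name ‘source’ comes from the example in point 3; the function name internal_link and the http:// scheme added to the example URLs are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: replace with whatever tracking parameters your own
# navigation or analytics code appends to internal links.
TRACKING_PARAMS = {"source"}


def internal_link(url: str) -> str:
    """Return the single URL that internal navigation should use for a page."""
    parts = urlsplit(url)
    kept = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in TRACKING_PARAMS:
            continue  # point 3: no tracking IDs in internal navigation
        if name == "page" and value == "1":
            continue  # point 5: page 1 is the plain article URL
        kept.append((name, value))
    # The fragment (everything after '#') is dropped as well, since it
    # does not identify a different page.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


first_page = internal_link("http://www.iansnerdvana.com/article.html?page=1")
second_page = internal_link("http://www.iansnerdvana.com/article.html?page=2&source=blog")
```

With these inputs, first_page is the bare article URL and second_page keeps only the page=2 parameter, so every page ends up with exactly one address in the site's own navigation.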

Note: ‘rel=canonical’ is NOT a best practice since Yahoo! and Bing haven’t confirmed support for that tag yet, plus search engines might change their mind about the use of that tag later on. This has happened before!

Note: use absolute URLs, not relative links, for your internal linking structure. This avoids canonicalization mistakes and gives you some rewards even when your content gets scraped.
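If your templates currently contain relative links, a short Python sketch like the one below shows how they could be resolved to absolute URLs at publishing time. The page address and the helper name absolute_link are made up for the example; urljoin is part of Python's standard library.

```python
from urllib.parse import urljoin

# Assumption: the canonical address of the page that contains the links.
PAGE_URL = "http://www.example.com/blog/post.html"


def absolute_link(href: str, page_url: str = PAGE_URL) -> str:
    """Resolve a link found on page_url into a full, absolute URL."""
    return urljoin(page_url, href)


links = {
    "../about.html": absolute_link("../about.html"),
    "/contact.html": absolute_link("/contact.html"),
    "images/chart.png": absolute_link("images/chart.png"),
}
```

Because each result carries the preferred hostname, a copy of the page on another hostname (or on a scraper's site) still points back to your canonical URLs.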

 

Canonicalization Problems

If you haven’t canonicalized your website pages yet your site might face some duplicate content and other problems.

As duplicate pages often get thrown into the supplemental index, appearing in the supplemental index can be an indicator that you have canonicalization issues. If some of your pages (or your entire site) are not indexed frequently, and you think they should be, chances are the pages are residing in Google’s supplemental index. (See Michael Gray’s technique for monitoring the Google cache.)

That could mean your site has a duplicate content issue, which could mean it has a canonicalization issue, which in turn could mean your website is not ranked as high as it could be.

Plus “you lose link authority. If blogger 1 comes to ‘www.yourdomain.com’ and links to that page, blogger 2 lands on ‘yourdomain.com’ and links to that URL, and blogger 3 lands on ‘www.yourdomain.com/index.html’ and links to that page, the Googlebot sees three links to three different pages, and applies 1 ‘vote’ only to each one. These three links could have sent three authoritative signals to Googlebot for your site’s home page! Instead, they’re split into three weaker individual votes for three different pages.

And search engines won’t crawl your site as deeply as they might. Search engines allocate resources for each crawl. No one knows exactly how, but it’s safe to say Googlebot won’t just wander around your site until it’s found every page. At some point, it gives up and leaves. If multiple pages on my site have multiple URLs, then visiting search bots waste time tracking down all of those different versions. That’s time they could spend crawling other unique pages, instead. So fewer unique pages of my site end up in the search index, and I have fewer chances to rank.” Ian Lurie, Search Engine Land
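The three-blogger example can be written out as simple arithmetic. The Python sketch below is a toy model only: it counts one ‘vote’ per link, as in the quote, and makes no claim about how Google actually weighs links. The helper name home_page is hypothetical.

```python
from collections import Counter

# The three inbound links from the example above.
inbound_links = [
    "www.yourdomain.com",             # blogger 1
    "yourdomain.com",                 # blogger 2
    "www.yourdomain.com/index.html",  # blogger 3
]


def home_page(url: str) -> str:
    """Collapse the three home-page variants from the example into one."""
    url = url.lower()
    if not url.startswith("www."):
        url = "www." + url
    if url.endswith("/index.html"):
        url = url[: -len("index.html")]
    return url.rstrip("/")


# Without canonicalization: three pages with one vote each.
votes_split = Counter(inbound_links)

# With canonicalization (for example via 301 redirects): one page, three votes.
votes_combined = Counter(home_page(u) for u in inbound_links)
```

In this model votes_split holds three entries worth one vote each, while votes_combined holds a single entry worth three.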

HTTP and HTTPS: Sometimes Google indexes both the http:// and the https:// versions of a website. One solution is to tell the search engine bots not to index the https:// version. Tony Spencer outlines two ways to do this in .htaccess, 301 Redirects & SEO

Different versions of a webpage: some software, such as blog and forum software, aggregates posts into archives. When you link to your articles in your archive, link to the original version of the post, as opposed to the archive one. For example, www.yourdomain.com/todays-post.htm, not www.yourdomain.com/archive/december/todays-post.htm.

If your software program links to a duplicate version of the content (like an individual post from a forum thread) consider adding rel=nofollow to those links.

“One common mistake when implementing canonicalization fixes is to accidentally create an infinite loop between http://www.example.com and http://www.example.com/index.html. The solution to this common glitch is discussed in this post about redirecting an index file to your domain without looping.” SEOmoz
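One way to catch such a loop before it goes live is to write your redirect rules down as a table and follow them. The Python sketch below does that; the two rule tables and the function name follow_redirects are hypothetical, and this is a checking aid under those assumptions, not the fix described in the SEOmoz post.

```python
def follow_redirects(start, redirects):
    """Follow a redirect table from start.

    Returns (final_url, number_of_hops), or raises ValueError if the
    rules lead back to a URL that was already visited.
    """
    seen = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            raise ValueError("redirect loop: " + " -> ".join(seen + [current]))
        seen.append(current)
    return current, len(seen) - 1


# Hypothetical broken setup: one rule sends index.html to the root,
# while another sends the root back to index.html.
LOOPING_RULES = {
    "http://www.example.com/index.html": "http://www.example.com/",
    "http://www.example.com/": "http://www.example.com/index.html",
}

# Hypothetical fixed setup: only the index file redirects.
FIXED_RULES = {
    "http://www.example.com/index.html": "http://www.example.com/",
}
```

Calling follow_redirects("http://www.example.com/index.html", LOOPING_RULES) raises ValueError and names the URLs involved, while the same call with FIXED_RULES ends at the root URL after one hop.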

 

Further Canonicalization Reading

If you want the entire story of where the term ‘canonicalization’ comes from, read this: Canonicalization – Wikipedia, the free encyclopedia.

Google Lets You Tell Them Which URL Parameters To Ignore, Search Engine Land, Stephan Spencer

URL Rewrites & Redirects: The Gory Details (Part 1 of 2), Search Engine Land, Stephan Spencer

 

 

 

 
