Showing posts with label duplicate content.

Sunday, December 20, 2009

Google Now Supporting rel="canonical" Across Domains

Google announced that it is now offering cross-domain support for the rel="canonical" link element. If you are unfamiliar with this link element, Google's Matt Cutts discussed it with us here. Basically, it's a way to avoid duplicate content issues, but until now, you couldn't use it across domains.

"For some sites, there are legitimate reasons to [have] duplicate content across different websites — for instance, to migrate to a new domain name using a web server that cannot create server-side redirects," says John Mueller, Webmaster Trends Analyst with Google Zürich.

Do you have legitimate reasons for having duplicate content? Tell us about them.

"There are situations where it's not easily possible to set up redirects," he says. "This could be the case when you need to move your website from a server that does not feature server-side redirects. In a situation like this, you can use the rel='canonical' link element across domains to specify the exact URL of whichever domain is preferred for indexing. While the rel='canonical' link element is seen as a hint and not an absolute directive, we do try to follow it where possible."
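In practice, the cross-domain hint is just a link element in the page's head that names the preferred URL on the other domain. As a rough sketch of how a crawler discovers it, the snippet below extracts the canonical href from a page (the page markup and domain names are invented for illustration):

```python
from html.parser import HTMLParser

# Hypothetical page on an old domain, declaring its preferred
# cross-domain canonical URL in the <head>.
PAGE = """
<html><head>
<link rel="canonical" href="https://www.example.com/widgets/blue">
</head><body>...</body></html>
"""

class CanonicalFinder(HTMLParser):
    """Collects the href of any <link rel="canonical"> element."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")

finder = CanonicalFinder()
finder.feed(PAGE)
print(finder.canonical)  # the URL this page asks search engines to prefer
```

Remember that, as Mueller notes, this is a hint rather than a directive; Google tries to follow it where possible.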

Cross Domain Duplicate Content

Mueller gives the following ways of handling cross-domain content duplication:

- Choose your preferred domain
- Reduce in-site duplication
- Enable crawling and use 301 (permanent) redirects where possible
- Use the cross-domain rel="canonical" link element

Barry Schwartz at Search Engine Roundtable gives three reasons why the addition of cross-domain support for the rel="canonical" link element is really important:

1. Some hosts don't allow webmasters to deploy 301 redirects
2. Some site owners aren't technical enough to implement a 301 redirect
3. In some cases, webmasters do not want to redirect users but rather only search engines (e.g. pagination, weird filtering, tracking parameters added to URLs, etc.).

To use the link element, pages don't have to be identical, but they should be similar; according to Google, slight differences are fine. You should not point rel="canonical" to the home page of the preferred site. Google says this can result in problems, and that a mapping from each old URL to its corresponding new URL is the best way to go.
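Google's page-by-page mapping advice can be sketched as a simple lookup: every old URL gets its own specific target, and nothing falls back to the home page (all domain names here are hypothetical):

```python
# Hypothetical one-to-one mapping from old-domain URLs to their
# counterparts on the preferred domain; every page has its own target.
URL_MAP = {
    "http://old-example.net/widgets/blue": "https://www.example.com/widgets/blue",
    "http://old-example.net/widgets/red":  "https://www.example.com/widgets/red",
    "http://old-example.net/about":        "https://www.example.com/about",
}

def canonical_for(old_url):
    """Return the page-specific canonical URL for an old URL."""
    # Falling back to the home page here is exactly what Google warns
    # against, so unmapped URLs simply get no canonical hint.
    return URL_MAP.get(old_url)

print(canonical_for("http://old-example.net/widgets/red"))
```

Each entry in the map would then be emitted as a rel="canonical" link element on the corresponding old page.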

You should not use a noindex robots meta tag on pages with a rel="canonical" link element, because those pages would not be equivalent with regard to indexing, Google says: one would be allowed while the other would be blocked. Google also says it's important that these pages aren't disallowed from crawling through a robots.txt file, because then search engine crawlers won't be able to discover the rel="canonical" link element.

Tuesday, October 6, 2009

How to Handle Duplicate Content Within Your Own Website

Handling duplicate content within your own website can be a big challenge. Websites grow; features get added, changed and removed; content comes—content goes. Over time, many websites collect systematic cruft in the form of multiple URLs that return the same contents. Having duplicate content on your website is generally not problematic, though it can make it harder for search engines to crawl and index the content. Also, PageRank and similar information found via incoming links can get diffused across pages we aren't currently recognizing as duplicates, potentially making your preferred version of the page rank lower in Google.

Steps for dealing with duplicate content within your website
  1. Recognize duplicate content on your website.
    The first and most important step is to recognize duplicate content on your website. A simple way to do this is to take a unique text snippet from a page and to search for it, limiting the results to pages from your own website by using a site: query in Google. Multiple results for the same content show duplication you can investigate.

  2. Determine your preferred URLs.
    Before fixing duplicate content issues, you'll have to determine your preferred URL structure. Which URL would you prefer to use for that piece of content?

  3. Be consistent within your website.
    Once you've chosen your preferred URLs, make sure to use them in all possible locations within your website (including in your Sitemap file).

  4. Apply 301 permanent redirects where necessary and possible.
    If you can, redirect duplicate URLs to your preferred URLs using a 301 response code. This helps users and search engines find your preferred URLs should they visit the duplicate URLs. If your site is available on several domain names, pick one and use the 301 redirect appropriately from the others, making sure to forward to the right specific page, not just the root of the domain. If you support both www and non-www host names, pick one, use the preferred domain setting in Webmaster Tools, and redirect appropriately.

  5. Implement the rel="canonical" link element on your pages where you can.
    Where 301 redirects are not possible, the rel="canonical" link element can give us a better understanding of your site and of your preferred URLs. The use of this link element is also supported by major search engines such as Ask.com, Bing and Yahoo!.

  6. Use the URL parameter handling tool in Google Webmaster Tools where possible.
    If some or all of your website's duplicate content comes from URLs with query parameters, this tool can help you to notify us of important and irrelevant parameters within your URLs. More information about this tool can be found in our announcement blog post.
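Step 1 above can also be approximated programmatically: fingerprint each page's extracted text and group URLs that share a fingerprint. The URLs and text below are invented; in practice they would come from a crawl of your own site:

```python
import hashlib
from collections import defaultdict

# Invented URL -> extracted-text pairs standing in for crawled pages.
pages = {
    "http://example.com/widgets/blue":             "Blue widgets are great.",
    "http://example.com/widgets/blue?sessionid=7": "Blue widgets are great.",
    "http://example.com/widgets/red":              "Red widgets are great.",
}

groups = defaultdict(list)
for url, text in pages.items():
    # Normalize whitespace and case so trivial differences don't hide duplicates.
    fingerprint = hashlib.sha1(" ".join(text.lower().split()).encode()).hexdigest()
    groups[fingerprint].append(url)

duplicates = [urls for urls in groups.values() if len(urls) > 1]
print(duplicates)  # groups of URLs serving the same content
```

Any group with more than one URL is a candidate for steps 2 through 5.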
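The key detail in step 4 is that a redirect should preserve the requested path and query rather than dumping every visitor on the home page. A minimal sketch of that logic, with hypothetical host names (real sites would normally do this in the web server configuration):

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"  # assumed preferred host
DUPLICATE_HOSTS = {"example.com", "example.net", "www.example.net"}

def redirect_for(url):
    """Return (301, target) for a duplicate host, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.hostname in DUPLICATE_HOSTS:
        # Forward to the same path and query on the preferred host,
        # not just the root of the domain.
        target = urlunsplit(("http", PREFERRED_HOST, parts.path, parts.query, ""))
        return (301, target)
    return None

print(redirect_for("http://example.net/widgets/blue?color=1"))
```

Requests that already use the preferred host fall through with no redirect, which keeps the rule safe to apply site-wide.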
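The idea behind step 6, declaring certain query parameters irrelevant, can be mimicked locally by normalizing URLs before comparing them. The parameter names below are examples only, not a list Google itself uses:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

IGNORED_PARAMS = {"sessionid", "utm_source", "sort"}  # assumed irrelevant parameters

def normalize(url):
    """Drop ignored query parameters so duplicate URLs collapse to one form."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(normalize("http://example.com/widgets?color=blue&sessionid=42"))
```

Two URLs that normalize to the same string are the kind of pair you would flag to Google via the parameter handling tool.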

What about the robots.txt file?

One item which is missing from this list is disallowing crawling of duplicate content with your robots.txt file. We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods. Instead, use the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. If access to duplicate content is entirely blocked, search engines effectively have to treat those URLs as separate, unique pages since they cannot know that they're actually just different URLs for the same content. A better solution is to allow them to be crawled, but clearly mark them as duplicate using one of our recommended methods. If you allow us to crawl these URLs, Googlebot will learn rules to identify duplicates just by looking at the URL and should largely avoid unnecessary recrawls in any case. In cases where duplicate content still leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.

Wednesday, September 16, 2009

Duplicate Content Across Multiple Domains, URLs

Last month, I gave a talk at the Search Engine Strategies San Jose conference on Duplicate Content and Multiple Site Issues. For those who couldn't make it to the conference or would like a recap, we've reproduced the talk on the Google Webmaster Central YouTube Channel. Below you can see the short video reproduced from the content at SES:



You can view the slides here:



Posted by Bavajan, Search Engine Marketing News Editor and Internet Marketing Specialist at Vivahabandhan, a matrimonial site