<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>slimster.net &#187; Google crawls Javascript</title>
	<atom:link href="http://www.slimster.net/tag/google-crawls-javascript/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.slimster.net</link>
	<description>People, Technology, Gardens, Yoga and Corporate America</description>
	<lastBuildDate>Tue, 22 Nov 2011 02:36:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Google Creates Duplicate Content for Me</title>
		<link>http://www.slimster.net/google-creates-duplicate-content-for-me/</link>
		<comments>http://www.slimster.net/google-creates-duplicate-content-for-me/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 17:43:41 +0000</pubDate>
		<dc:creator>Slim</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[duplicate content]]></category>
		<category><![CDATA[Google crawls Javascript]]></category>
		<category><![CDATA[Google Duplicate Content]]></category>

		<guid isPermaLink="false">http://www.slimster.net/?p=189</guid>
		<description><![CDATA[Google is creating duplicate content on websites. I started noticing this back in 2008, when Webmaster Tools identified some pages that were &#8220;broken&#8221; on the website. The curious thing about these pages was that they were not actually accessible via the website. But they were in the source code as a part of a Javascript [...]]]></description>
			<content:encoded><![CDATA[<p>Google is creating duplicate content on websites.  I started noticing this back in 2008, when <a href="http://www.google.com/webmasters/tools/">Webmaster Tools</a> flagged some pages on the website as &#8220;broken&#8221;.  The curious thing about these pages was that they were not actually reachable through the website.  Their names did appear in the source code, though, as part of a JavaScript function, and Google appeared to have kludged together some URLs from the root domain name plus the file being called in that function.</p>
<p>For example, in the ASP.NET environment, pages and their code-behind are often called via JavaScript.  Those calls carry variables to the server, where the URLs are often rewritten to be &#8220;search friendly&#8221; before anything is returned to the browser.  In other words, &#8220;products.aspx?id=12345&#8221; gets rewritten on the server to be &#8220;super-dooper-blue-products&#8221;.  The actual &#8220;page&#8221; being called in JavaScript never makes it to the browser.</p>
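<p>To make the rewrite idea concrete, here is a minimal sketch in Python rather than ASP.NET.  The lookup table and both function names are hypothetical; only the slug and the dynamic URL come from the example above, and a real site would do this in its server-side rewrite rules.</p>

```python
# Hypothetical rewrite table: public "search friendly" path -> internal dynamic page.
# Only the two strings come from the example above; the table itself is an assumption.
REWRITES = {
    "/super-dooper-blue-products": "/products.aspx?id=12345",
}


def resolve_friendly(path):
    """Return the internal URL the server would execute, or None if unknown."""
    return REWRITES.get(path)


def friendly_for(internal):
    """Reverse lookup: the public URL a visitor sees for an internal page."""
    for friendly, target in REWRITES.items():
        if target == internal:
            return friendly
    return None


print(resolve_friendly("/super-dooper-blue-products"))
print(friendly_for("/products.aspx?id=12345"))
```

<p>The point is that the visitor only ever sees the left-hand side of the table, while the JavaScript in the page source still mentions the right-hand side.</p>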
<p>Looking at Google Webmaster Tools (GWT) back in early 2008, I discovered hundreds of URLs causing &#8220;404&#8221; errors.  All of these URLs followed the pattern http://root-domain/products.aspx?id=xyxyz.  And when I clicked them, yep, they were broken.  They were broken because of the way Google guessed at the URL construction from what it found in the JavaScript function: it took the page name and parameter from the function and appended them to the root domain where the function was discovered.</p>
<p>However, the &#8220;actual&#8221; page being executed was not http://rootdomain.com/products.aspx?  The real URL structure was more like http://rootdomain.com/directory1/directory2/products.aspx?  When Google requested the erroneous URL it had built from its assumptions, the page was broken, because Google&#8217;s logic did not account for the fact that products.aspx was referenced by a path relative to a subdirectory rather than by an absolute path from the root of the domain.</p>
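<p>The difference between the two constructions can be sketched with Python&#8217;s standard urljoin.  The page_url below is a hypothetical location for the page that contains the JavaScript function, and treating Google&#8217;s guess as a root-relative join is my reading of the behaviour described above, not something Google has documented.</p>

```python
from urllib.parse import urljoin

# Hypothetical page on which the JavaScript function was found (an assumption).
page_url = "http://rootdomain.com/directory1/directory2/index.aspx"

# The page name and parameter as they appear inside the JavaScript function.
js_reference = "products.aspx?id=12345"

# Standard relative resolution: relative to the directory of the containing page.
intended = urljoin(page_url, js_reference)

# What the crawler appeared to do: append the reference to the root domain.
guessed = urljoin(page_url, "/" + js_reference)

print(intended)  # http://rootdomain.com/directory1/directory2/products.aspx?id=12345
print(guessed)   # http://rootdomain.com/products.aspx?id=12345
```

<p>If nothing lives at the second URL, the result is a 404 like the ones that showed up in Webmaster Tools.</p>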
<p>Functional duplicate URLs occur when pages such as products.aspx are called in JavaScript and also reside directly under the root domain, which is exactly the path Google assumes.  In cases like this, Google pulls page names and parameters from JavaScript functions and appends them to the root URL, creating a functional (although unintentional) page.  Now Google&#8217;s assumed URL construction <strong><em>does</em></strong> render an actual page that functions correctly.  But that functional page is a duplicate, because a rewritten URL that renders the same content exists at the same time.</p>
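<p>One rough way to spot such pairs in a list of crawled paths is to group them by the internal page that renders them.  This is only a sketch: the rewrite table, the find_duplicates helper and the sample crawl list are all hypothetical, reusing the example URL from earlier in the post.</p>

```python
from collections import defaultdict

# Hypothetical rewrite table: public path -> internal dynamic page (an assumption).
REWRITES = {
    "/super-dooper-blue-products": "/products.aspx?id=12345",
}


def find_duplicates(crawled_paths):
    """Group crawled paths by the internal page that renders them.

    Returns only the groups with more than one path, i.e. duplicate content.
    """
    groups = defaultdict(list)
    for path in crawled_paths:
        # A rewritten path maps to its internal page; a raw path maps to itself.
        internal = REWRITES.get(path, path)
        groups[internal].append(path)
    return {page: paths for page, paths in groups.items() if len(paths) > 1}


# Hypothetical crawl list: the rewritten URL, the raw URL Google guessed, and an unrelated page.
crawled = ["/super-dooper-blue-products", "/products.aspx?id=12345", "/about"]
duplicates = find_duplicates(crawled)
print(duplicates)
```

<p>Here both the rewritten path and the raw dynamic path land in the same group, which is the duplicate described above.</p>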
<p>A few months after I made this unfortunate discovery, Google told webmasters to <a href="http://searchengineland.com/google-says-dont-rewrite-dynamic-urls-to-static-urls-14795">stop the practice of rewriting URLs</a>.  What?  Because Google is now collecting the &#8220;raw&#8221; pages before a rewrite, and because a rewrite can cause duplicate content, Google says that it prefers the raw, dynamic versions of the URLs to rewritten ones.</p>
<p>That would be fine and nice if so many websites weren&#8217;t already using URL rewrites.  And even though Google now prefers the dynamic URLs, Bing certainly does not.  It would make better sense if webmasters could instead include a tag in their pages, such as &#8220;meta name=&#8217;discovery&#8217; rel=&#8217;noJS&#8217;&#8221;, telling Google not to execute JavaScript to &#8220;discover&#8221; pages, since that discovery is what results in the creation of duplicate content.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.slimster.net/google-creates-duplicate-content-for-me/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

