Archive for the ‘SEO’ Category

Google Creates Duplicate Content for Me

Friday, February 19th, 2010

Google is creating duplicate content on websites. I started noticing this back in 2008, when Webmaster Tools flagged some pages on the website as “broken”. The curious thing about these pages was that they were not actually reachable through the website. They did appear in the source code as part of a JavaScript function, though, and Google appeared to have kludged together some URLs from the root domain name plus the file being called in that function.

For example, in the ASP.NET environment, pages and code-behind are often called via JavaScript. Those pages carry variables to the server, where the URLs are often rewritten to be “search friendly” before being returned to the browser. In other words, “products.aspx?id=12345” gets rewritten on the server to be “super-dooper-blue-products”. The actual “page” being called in JavaScript never makes it to the browser.

Looking at Webmaster Tools (GWT) back in early 2008, I discovered hundreds of URLs causing “404” errors. All of these URLs followed the pattern http://root-domain/products.aspx?id=xyxyz. And when I clicked them, yep, they were broken. They were broken because of the way Google created, or rather guessed at, the URL construction from the information it discovered in the JavaScript function. Google took the page name and parameter from the function and appended them to the root domain where the function was discovered.

However, the “actual” page being executed was not http://rootdomain.com/products.aspx? plus its parameter. The real URL structure was more like http://rootdomain.com/directory1/directory2/products.aspx? plus its parameter. When Google requested the erroneous URL it had created based on its assumptions, the page was broken because Google’s logic did not account for products.aspx being referenced by a relative path, inside a subdirectory, rather than by an absolute path from the root.
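To make the difference concrete, here is a minimal Python sketch of the two ways such a reference can be resolved. The referencing page URL and the id value are made up for illustration, and the “append to the root” function is only an approximation of the behavior described above, not Google’s actual crawler logic.

```python
from urllib.parse import urljoin

# Hypothetical page on which the JavaScript function was found.
PAGE_URL = "http://rootdomain.com/directory1/directory2/some-page"

# Relative reference as it might appear inside the JavaScript function.
JS_REFERENCE = "products.aspx?id=12345"


def resolve_against_root(page_url: str, reference: str) -> str:
    """Approximate the guess described in the post: page name plus root domain."""
    return urljoin(page_url, "/" + reference.lstrip("/"))


def resolve_against_page(page_url: str, reference: str) -> str:
    """Standard relative resolution against the referencing page's directory."""
    return urljoin(page_url, reference)


guessed = resolve_against_root(PAGE_URL, JS_REFERENCE)
actual = resolve_against_page(PAGE_URL, JS_REFERENCE)

print(guessed)  # root-level URL, which does not exist in this scenario
print(actual)   # URL inside directory1/directory2, where the page really lives
```

The first URL is the kind that showed up as a 404; the second is where the page actually sits.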

Functional duplicate URLs occur when pages such as products.aspx are called in JavaScript and also reside at the root of the site. In cases like this, Google pulls page names and parameters from JavaScript functions and appends them to the root URL, which creates a functional (although unintentional) page. This time Google’s assumed URL renders an actual page that functions correctly. But that functional page is a duplicate, because a rewritten URL that renders the same content exists at the same time.

A few months after I made this unfortunate discovery, Google told us to stop rewriting URLs. What? Because Google is now collecting the “raw” pages before a rewrite, and because a rewrite can cause duplicate content, Google says it prefers the raw, dynamic version of the URLs rather than the rewritten ones.

That would be fine and nice if so many websites weren’t already using URL rewrites. And even though Google prefers the dynamic URLs now, Bing certainly does not. It would make better sense if webmasters could instead include a tag in their pages, such as “meta name='discovery' rel='noJS'”, telling Google not to execute JavaScript to “discover” pages when doing so would create duplicate content.

Google Adds Site Search to Search Result Listing

Wednesday, April 29th, 2009

I just did a search for the March of Dimes this morning and saw something in the Google SERPs that I had never seen before. In addition to the normal blue title tag link and a series of site links, there was an additional search box that allows for searching for a term within the actual site listed in the search results. That’s hard to explain so here’s a picture for clarity.

[Image: March of Dimes search result with site search]

At first glance, I thought the search box was the March of Dimes’ own website search box, since they have a site search box at the top left of their home page. It looked like Google had just pulled down that piece of functionality. But that’s not the case. It turns out that the search box in the Google results listing performs the following query: [search term site:sitename.com], which uses a Google search to look for a specific term within a given domain.
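For anyone who wants to build that kind of query by hand, here is a small Python sketch. The search term and the example.org domain are placeholders, and the function names are my own; the only thing taken from the search results is the [search term site:sitename.com] pattern.

```python
from urllib.parse import urlencode


def site_search_query(term: str, domain: str) -> str:
    """Build the query text in the [search term site:sitename.com] pattern."""
    return f"{term} site:{domain}"


def google_search_url(term: str, domain: str) -> str:
    """Wrap the query text in a Google search URL with a URL-encoded q parameter."""
    query = urlencode({"q": site_search_query(term, domain)})
    return "https://www.google.com/search?" + query


# Placeholder term and domain, for illustration only.
print(site_search_query("premature birth", "example.org"))
print(google_search_url("premature birth", "example.org"))
```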

I don’t think this is going to send shock waves around the world, but it is an interesting way for Google to expose more advanced search techniques to users who would not otherwise have the savvy to execute such searches. Apparently Google is expanding their sitelinks to include more ways to get to specific parts of a website. Here is a link to the blog post about their expanded sitelinks program.

Google Declares Querystring Problems a Myth

Monday, September 29th, 2008

I thought this was one of the biggest cases of amnesia in Google’s short history. Just six years ago, I was at an SES conference and had a discussion with Matt Cutts about long querystrings and Google’s ability to crawl dynamic content. Essentially, he recommended that I reduce the number of query string variables to two or fewer. And he was correct.
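If you want to check your own URLs against that old “two or fewer” rule of thumb, a rough Python sketch follows. The example URLs are invented, and the limit of two simply mirrors the advice above; it is not a threshold published by any search engine.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_PARAMS = 2  # the "two or fewer" advice from the conversation above


def count_query_params(url: str) -> int:
    """Count the query string variables in a URL, including blank ones."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def exceeds_limit(url: str, limit: int = MAX_PARAMS) -> bool:
    """Return True when the URL carries more variables than the limit."""
    return count_query_params(url) > limit


# Invented URLs, for illustration only.
long_url = "http://example.com/list.aspx?state=ga&city=atlanta&beds=2&page=3"
short_url = "http://example.com/products.aspx?id=12345"

for url in (long_url, short_url):
    print(url, count_query_params(url), exceeds_limit(url))
```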

When I took drastic measures to narrow down the query string variables on Apartmentguide.com (by using some crazy XML import based on local page variables), our indexed page count skyrocketed. Within two months our rankings climbed to the top of the SERPs and our traffic went from 60k/month to 500k/month. Within a year, our traffic rose to nearly 1 million/month.

Just last week, however, Google’s webmaster blog made the following statements:

Myth: “Dynamic URLs cannot be crawled.”
Fact: We can crawl dynamic URLs and interpret the different parameters.

As well as:

Myth: “Dynamic URLs are okay if you use fewer than three parameters.”
Fact: There is no limit on the number of parameters, but a good rule of thumb would be to keep your URLs short (this applies to all URLs, whether static or dynamic).

It may be true today that dynamic URLs can be crawled when they are chock-full of variables, but that has not always been the case. Calling it a “myth” is a little strange. mod_rewrite and ISAPI_Rewrite weren’t invented for nothing. And there definitely was a time when pages with long query strings were simply ignored by all search engines.

I think it is great that Google has overcome these obstacles to reading content. But the fact that they made these gains does not make history a myth. Before you know it, they might claim that it is a myth that Google can’t read Flash content.

Link Juice

Thursday, September 11th, 2008

There, I said it. You can say it too. Have yourself a giggle.

Now that the Internets have evolved to a state where my mother sends me YouTube videos and the average webmaster understands that links are good, I have actually heard other people talk about link juice. And most of the time it is like listening to someone talk about pigs, pork spending and lipstick.

Now that non-SEO people are at least talking about linking and link juice, the unfortunate focus has been on acquisition. The tendency is to hoard, possess and keep it all for themselves without giving back, because “giving back” is often mistaken for “losing link juice”. As an example, I recently heard someone suggest linking to a related group of websites while secretly putting “nofollow” attributes on the links to keep their own site from losing link juice. That’s kinda like saying, “hey, thanks for the good time, I’ll call ya”.

Oh, man.

I once wrote a blog post about whether Google would monitor prostitution in regards to buying and selling links, so I might as well be consistent. Giving and getting links naturally is a lot like free love. Actually, it’s more like having multiple partners, but with some discretion. It’s a two-way street, an openness to sharing. You link to sites that you like. And sites that like you link to you.

Essentially, if you have a good-looking site with a great personality, you tend to get lots of links. If your site is on the ugly side and not very compelling, well, you will likely not get much link action and might be tempted to buy some. Of course, in Googleland, buying links is against their terms and regarded much like buying love in the real world. And if you link to (pass link juice to) every single site you come across, you just get a bad reputation and Google won’t love you.

Often, the road to success starts with giving. Pay someone a compliment, say something nice, link to someone without expectation. Reciprocal linking is really not the best route anyway. But if you participate in your greater community, you will make friends who will link to you. And if you come across a site you admire, share with others by linking to them. Don’t hoard.

...Oddly enough, I just saw a post from Aaron Wall where he calls this hoarding of link juice an “SEO black hole”.

Google Stops Rewarding Class Clown

Monday, June 9th, 2008

I made a really long post about this last week and my Internets went down and my post was lost. Recreating that post is already making me ache so this will likely be considerably shorter.

In a previous post, I brought up the fact that Google was rewarding off-topic link bait as if it were legitimate content, which has caused poor, low-quality websites to rank higher. My argument was that being silly or provocative ALONE should not improve your site’s ranking in search engines. Apparently Google is going to take a deeper look at how they reward, or punish, ‘deceptive’ link bait.

Just to clarify, I really love the tools that Marketleap offers and it was easy for me to link to them because they add value. Rex Swain has an awesome HTTP header reader that I use, so he deserves quality links. However, people who make fun of FT 2.0 clowns or make ingratiating, off-topic videos that have nothing to do with their website should not get a Google bump as a result.

Think about it: everyone has been to a garage and seen the calendars with the women in bikinis. Well, if you grew up in Tennessee you have. At any rate, a picture of Tricia Helfer sitting on a Chevy Nova will not make it a better car, although it will look much nicer. If Consumer Reports were to give the Chevy Nova a higher performance rating based on the attractiveness of Ms. Helfer, then one would certainly question the quality of the Consumer Reports rating system. In the same respect, Google should not reward websites for off-topic gimmicks that merely make someone look.

Maybe they’re catching on.
