Archive for the ‘SEO’ Category

No Follow and Internal Links

Wednesday, June 30th, 2010

When the whole “PageRank sculpting” thing was a hot topic and all the new, up-and-coming SEOs were no-following all of their clients’ internal links, I was suspicious. I wasn’t suspicious just because I’m paranoid. I was suspicious because of WHY the no follow attribute was created in the first place.

The no follow attribute was created to limit how much comment spam and a whole host of other link-dropping activities could affect search rankings. People were building scripts to comment on blog posts, and those comments were full of links that passed PageRank. Google hates that kind of stuff, so they pushed the no follow attribute and the world signed on.

So, the no follow attribute was created to combat spam and indicate that links to a website are not necessarily trusted.

Google likes trust.

A lot of people got wise to the fact that putting no follow attributes on internal links concentrated the flow of PageRank on the pages that were critical to their rankings. This was a flaw in the Google algorithm, and it was eventually addressed. In the meantime, some SEOs were telling their clients to no-follow the links to their “about us” and “privacy” pages. For a time it worked. But to me, adding a no follow attribute to a link to an “about us” page told search engines that our about page could not be trusted.

Google likes trust.

So yesterday Matt Cutts posted a video explaining that adding no follow to internal hyperlinks was really just a bad idea. Thanks, Matt. I have argued that point many times. Here’s the video:


Do Links in Javascript Pass PageRank

Saturday, April 10th, 2010

Now that’s a great title. But the gist of the question revolves around links that I have seen on high PR websites that seem to be effectively passing PageRank. These links have a “nofollow” in the rel attribute of the hyperlink but also call an onclick function that potentially creates a separate URL, which Google could crawl without the “nofollow” directive.

As I stated a few posts back in “Google Creates Duplicate Content for Me,” Google is looking inside JavaScript functions to determine if there are additional URLs and content that they could crawl and add to their index. Their goal, after all, is to organize the world’s information and make it universally accessible and useful. But I wonder if their quest to crawl content previously obscured by JavaScript has inadvertently provided a loophole for people who buy and sell links.

I recently encountered a hyperlink that was composed like this: <a href="http://www.mywebsite.com" rel="nofollow" onclick="window.open(this.href); return false;">my keyword</a>. As you can see, the hyperlink has a nofollow attribute, so Google would cut off PageRank flow to the destination page. However, since Google crawls simple JavaScript functions such as window.open, will the "nofollow" also be applied to the URL that is derived from the JavaScript?
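
To make the link pattern above easier to spot on a page, here is a minimal Python sketch that lists each anchor and notes whether it carries rel="nofollow" and an onclick handler. The class and function names (LinkAuditor, audit_links) are made up for illustration, and the sample markup is my reconstruction of the link described above. It only inspects the HTML; it says nothing about what Google actually does with such a link.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect anchors and note rel="nofollow" and onclick usage (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="external nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append({
            "href": attr_map.get("href"),
            "nofollow": "nofollow" in rel_tokens,
            "has_onclick": "onclick" in attr_map,
        })


def audit_links(html):
    """Return one dict per <a> tag found in the given HTML string."""
    auditor = LinkAuditor()
    auditor.feed(html)
    return auditor.links


# Reconstructed sample of the link pattern discussed in the post (an assumption).
sample = (
    '<a href="http://www.mywebsite.com" rel="nofollow" '
    'onclick="window.open(this.href);return false;">my keyword</a>'
)

for link in audit_links(sample):
    print(link)
```

Any link reported with both flags set is one where the nofollow and a script-generated URL could, at least in theory, disagree.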

I think not. But I shall test. I recently did a little work to make my vet's site accessible and their pet grooming page has yet to be crawled by Google. (See that, I just added this nice little Javascript function to the “pet grooming” link). So this is my little test. I will be looking to see if Google picks up the new grooming page and whether the link to that page from this blog shows up in GWT. Here goes….

(update April 21) – Google immediately crawled this blog post and ranked the post in SERPs. However, it DID NOT follow the JS link to the Tucker Vet’s grooming page. It looks like links in JS that are tagged with rel=nofollow are correctly read and observed by Google!


Google Creates Duplicate Content for Me

Friday, February 19th, 2010

Google is creating duplicate content on websites. I started noticing this back in 2008, when Webmaster Tools identified some pages that were “broken” on the website. The curious thing about these pages was that they were not actually accessible via the website. But they were in the source code as a part of a Javascript function and Google appeared to have kludged together some URLs based on the root domain name plus the file being called in the Javascript function.

For example, in the ASP.NET environment, pages and code-behind are often called via JavaScript. Those pages carry variables to the server, where the URLs are often rewritten to be “search friendly” before being returned to the browser. In other words, “products.aspx?id=12345” gets rewritten on the server to be “super-dooper-blue-products”. The actual “page” being called in JavaScript never makes it to the browser.

In looking at GWT back in early 2008, I discovered there were hundreds of URLs causing “404” errors. All of these URLs followed the pattern of http://root-domain/products.aspx?id=xyxyz. And when I clicked them, yep, they were broken. These URLs were broken because of the way Google constructed them, or rather guessed at them, from the information it discovered in the JavaScript function. Google took the page name and parameter from the function and appended them to the root domain where the function was discovered.

However, the “actual” page that was being executed was not http://rootdomain.com/products.aspx?. The real URL structure was more like http://rootdomain.com/directory1/directory2/products.aspx?. When Google requested the erroneous URL that it had created based on its assumptions, the pages were broken because Google’s logic did not understand that the products.aspx reference was relative to the directory it lived in rather than to the site root.
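
The difference between the two constructions can be sketched in a few lines of Python with the standard urllib.parse module. The calling page name (listing.aspx) and the function names are hypothetical; the directories and products.aspx come from the example above. This is only an illustration of the behavior I observed in GWT, not a description of how Google’s crawler is actually written.

```python
from urllib.parse import urljoin

# Hypothetical page that contains the script call; listing.aspx is a placeholder.
calling_page = "http://rootdomain.com/directory1/directory2/listing.aspx"
# The file and parameter found inside the script function.
script_reference = "products.aspx?id=12345"


def naive_root_append(page_url, reference):
    """Append the reference to the site root (the guess described in the post)."""
    root = urljoin(page_url, "/")
    return urljoin(root, reference)


def resolve_relative(page_url, reference):
    """Resolve the reference against the directory of the calling page."""
    return urljoin(page_url, reference)


print(naive_root_append(calling_page, script_reference))
print(resolve_relative(calling_page, script_reference))
```

The first call yields the root-level URL that showed up as a 404; the second yields the URL inside directory1/directory2 where the page really lives.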

Functional duplicate URLs occur when pages such as products.aspx are called in JavaScript and also reside at the site root. In cases like this, Google pulls page names and parameters from JavaScript functions and appends them to the root URL, and the result is a functional (although unintentional) page. Google’s assumed URL now renders an actual page that works correctly. But that page is a duplicate because, at the same time, a rewritten URL exists that renders the same content.

A few months after I made this unfortunate discovery, Google told us to stop the practice of rewriting URLs. What? Because Google is now collecting the “raw” page URLs before a rewrite, and because a rewrite can cause duplicate content, Google says that it prefers the raw, dynamic versions of URLs to rewritten ones.

That would be fine and nice if so many websites weren’t already using URL rewrites. Secondly, even though Google prefers the dynamic URLs now, Bing certainly does not. It would make better sense if Webmasters could instead include a tag in their pages such as “meta name =’discovery’ rel=’noJS’” whereby Google would not try to execute Javascript to “discover” pages that would result in the creation of duplicate content.


Google Adds Site Search to Search Result Listing

Wednesday, April 29th, 2009

I just did a search for the March of Dimes this morning and saw something in the Google SERPs that I had never seen before. In addition to the normal blue title tag link and a series of site links, there was an additional search box that allows for searching for a term within the actual site listed in the search results. That’s hard to explain so here’s a picture for clarity.

[Image: March of Dimes search result with site search]

At first glance, I thought the search box was using the March of Dimes’ own website search, since they have a site search box at the top left of their home page. It looked like Google had just pulled down that piece of functionality. But that’s not the case. It turns out that the search box in the Google search results listing performs the following query: [search term site:sitename.com], which uses a Google search to look for a specific search term within a given domain.
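
For anyone who wants to build the same kind of query by hand, here is a small Python sketch. The function names and the example search term are placeholders of mine; the only parts taken from the post are the [search term site:sitename.com] pattern and the fact that it is an ordinary Google search. I am assuming the usual https://www.google.com/search?q=... URL form.

```python
from urllib.parse import urlencode


def site_search_query(term, domain):
    """Build the query text in the [search term site:sitename.com] pattern."""
    return f"{term} site:{domain}"


def site_search_url(term, domain):
    """Wrap the query in a Google search URL (assumes the usual ?q= parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": site_search_query(term, domain)})


# "premature birth" is just a placeholder search term.
print(site_search_query("premature birth", "sitename.com"))
print(site_search_url("premature birth", "sitename.com"))
```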

I don’t think this is going to send shock waves around the world, but it is an interesting way for Google to expose more advanced search techniques to users who would not otherwise have the savvy to execute such searches. Apparently Google is expanding their sitelinks to include more ways to get to specific parts of a website. Here is a link to the blog post about their expanded sitelinks program.


Google Declares Querystring Problems a Myth

Monday, September 29th, 2008

I thought this was one of the biggest cases of amnesia in Google’s short history. Just six years ago, I was at an SES conference and had a discussion with Matt Cutts involving long querystrings and Google’s ability to crawl dynamic content. Essentially, he recommended that I reduce the number of query string variables to two or fewer. And he was correct.

When I took drastic measures to narrow down the query string variables on Apartmentguide.com (by using some crazy XML import based on local page variables), our indexed page count skyrocketed. Within two months our rankings climbed to the top of the SERPs and our traffic went from 60k/month to 500k/month. Within a year, our traffic rose to nearly 1 million/month.

Just last week, however, Google’s webmaster blog made the following statements:

Myth: “Dynamic URLs cannot be crawled.”
Fact: We can crawl dynamic URLs and interpret the different parameters.

As well as:

Myth: “Dynamic URLs are okay if you use fewer than three parameters.”
Fact: There is no limit on the number of parameters, but a good rule of thumb would be to keep your URLs short (this applies to all URLs, whether static or dynamic).
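
If you want to check your own URLs against the old “two or fewer” advice, or against the “fewer than three” rule quoted above, counting parameters takes a couple of lines of Python. The function name and the example URLs are hypothetical; this only counts parameters and makes no claim about what any search engine will or will not crawl.

```python
from urllib.parse import parse_qsl, urlsplit


def count_query_parameters(url):
    """Count the query string parameters in a URL, including blank ones."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


# Placeholder URLs for illustration only.
examples = [
    "http://example.com/products.aspx?id=12345",
    "http://example.com/search.aspx?city=atlanta&beds=2&baths=1&page=3",
]

for url in examples:
    print(count_query_parameters(url), url)
```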

It may be true today that dynamic URLs can be crawled when they are chock-full of variables, but that has not always been the case. Calling it a “myth” is a little strange. mod_rewrite and ISAPI Rewrite weren’t invented for nothing. And there definitely was a time when pages with long query string parameters were simply ignored by all search engines.

I think it is great that Google has overcome these obstacles to reading content. But the fact that they made these gains does not turn history into a myth. Before you know it, they might claim that it is a myth that Google can’t read Flash content.
