October 30, 2007

ART POWER - How Google counts links


Just came from holidays. Well, obviously not. Sorry, I just needed to invent some excuse for not writing for such a long time. My holidays ended weeks ago, but as always happens, you come back to work only to discover such an enormous backlog that it takes twice as long as the holidays lasted to sort it out. So, going through the pile of things amassed over time, I came across this draft that I started sometime in midsummer. I double-checked the factual side of things and it all still seems perfectly relevant. So, after a little polishing, I decided to publish this post in pretty much the same shape it would have appeared in August (better late than never).

Doing SEO myself for quite some time, and having a good number of friends and acquaintances in the same trade, I naturally had plenty of opportunities to observe and analyse different SEO practices, and eventually I came to a rather strange conclusion: there are probably only two types of SEOs in the whole world. All of us have to deal with the same set of circumstances, namely we have to feed something to the Search Engines in order to get a desired output from them, and we have not the slightest idea of what happens in the process. Search engines are perfect black boxes for us: we know what goes in and we can see what comes out, but we are not allowed to see inside the box. There are two ways to approach this rather tricky situation. One type of SEO practises an approach very much akin to ancient magic. They take their client, perform some strange divination rituals on them and utter prophecies that are just bound to be fulfilled. It isn’t a big deal if they aren’t: there are so many taboos in their recommendations that it is simply impossible not to break at least one of them. And of course the sin of breaking a taboo is the only reason the magic didn’t work, that is, if it didn’t; because if it did, it is only because it is what it is: magic. Of course I know some from this bunch whose magic really works no matter what the circumstances are, but these are rare exceptions. Remarkably, SEO magicians are almost exclusively found among the troops of so-called white-hat SEO. The other party, quite distinct from the former, consists of those who are mostly fascinated by the workings of the engine itself. The greatest joy for them is to catch a glimpse of those gear wheels, triggers, levers and latches all in motion. The pleasure, I confess, is very much akin to that of Peeping Tom watching the naked Lady Godiva, who incidentally happened to be a patron of all engineers.
The grand idea in this case is ultimately to perform some sort of reverse engineering on the Search Engine mechanism, thus achieving a thrilling sensation of superiority. But this could never be perfected, perhaps not only because these black boxes are so carefully guarded by their owners, but also because even for the developers of Search Engines they are now exactly this: perfect black boxes. Otherwise, why would Matt Cutts use the word heuristics to describe the workings of the Google engine so many times?

So, the article here is probably no good for divination purposes, but it might be of some interest to those among us who, like me, belong to the voyeur type.

This whole thing was triggered by a strange example discovered quite by chance: once I was checking Google output for a rather unusual search, art power. Don’t ask me why; checking out weird searches is just a little hobby of mine. This search phrase is a strange fruit indeed, as it belongs to the class of generic searches, those that are unlikely to produce a large volume of searches but naturally have extremely high competition. Quite as expected, this one returns 321,000,000 results (at the time of writing). And now allow me a small side note on the way Google deals with generic searches: the results tend to have a much larger than average proportion of pages coming from large media resources, often newspapers like the NYT and media portals like the BBC (you get the idea). Hence, as I observed, if you are able to obtain a link from a resource of this kind, it will directly boost the responsiveness of your site to generic searches. But now let’s come back to the subject. Somewhere within results 351 - 360 for this search you will find this strange page. Mind you, it isn’t such a bad position considering the competition: we can insist that at least 320,999,640 pages (321,000,000 - 360) are less relevant in Google’s opinion than the page in question. By now I expect you couldn’t resist the temptation to click on the link above to examine the actual page. Now, what do you see? Correct, a perfectly blank white page, no content whatsoever. Shocking enough as such, isn’t it? And now note an even more distressing detail: Google thinks the word art is on this empty page! What the hell is going on, you ask. You will not get a definitive answer even if you open the cached text version, although it will get you much closer. I hope you realised in no time that what we are dealing with here is a frameset.
Remarkable on its own, this even allows us to formulate our first conclusion: it seems likely that Google assigned the content of the frame to the frameset URL. But we’ll pass over this remarkable fact without much discussion here, as it only indirectly relates to our question. Suffice to say that the preference for the frameset over the nested pages is most likely defined by the overall site structure rather than by the construction of the framed page. Just to remind ourselves, our main question is: how come such a page ranks so high in Google? The only way to learn this is to go to the page itself and examine what it is made of. First of all, of course, comes our indispensable virtual currency, Google PR, and the page itself has a moderate PR 3. Remarkably, the word art is present only three times, in one tight paragraph somewhere towards the very end of the page, plus these three letters appear as parts of longer words about a dozen times. Surely, that is not enough to achieve such a high position. And what about the second word, power, which is nowhere to be seen on the page itself? As always, View Page Source is at our disposal to help in our predicament. Now drum roll, please! What do you see? Shock! Horror! We see the old number sign link trick, inadvertently used on this page about 30 times. For those who are not aware of this trick, here is a brief story. The number sign as a partial value of the href attribute of the a tag was initially introduced to be used together with the name attribute to provide easy on-page navigation. Basically, it creates a relative link to another part of the same document. Subsequently, with the introduction of JavaScript, it became a tool for attaching a JavaScript onclick event to a link without actually sending the browser away from the page, and this is exactly how it is used in our example.
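For readers who have never looked at one, here is a minimal sketch of what the number sign link trick looks like in markup, together with a small counter built on Python’s standard html.parser module. The sample markup, its alert texts and the class name FragmentLinkCounter are hypothetical, made up for illustration; this is not the source of the page discussed above, only something in the same style.

```python
from html.parser import HTMLParser


class FragmentLinkCounter(HTMLParser):
    """Counts <a> tags whose href is a same-page fragment ("#" or "#name")."""

    def __init__(self):
        super().__init__()
        self.fragment_links = 0  # links pointing back into the same document
        self.with_onclick = 0    # of those, how many carry an onclick handler

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # An attribute without a value (e.g. <a href>) comes through as None.
        href = attributes.get("href") or ""
        if href.startswith("#"):
            self.fragment_links += 1
            if "onclick" in attributes:
                self.with_onclick += 1


# Hypothetical markup: one classic on-page anchor, two onclick "#" links.
sample = """
<p>Some text about art.</p>
<a name="top"></a>
<a href="#top">Back to top</a>
<a href="#" onclick="alert('POWER TO THE POWER'); return false;">item</a>
<a href="#" onclick="alert('MORE POWER, MORE POWER'); return false;">item</a>
<a href="other.html">another page</a>
"""

counter = FragmentLinkCounter()
counter.feed(sample)
print(counter.fragment_links, counter.with_onclick)  # 3 2
```

The first "#top" link is the original, intended use of the number sign; the two bare "#" links exist only to carry an onclick alert, which is the pattern found on the page in question.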
The SEO part of it mainly consisted of a superstition, circulating since time immemorial, stating that placing a number sign link would somehow magically increase the potency and power of a page. I must admit I used this trick indiscriminately so many times in my youth without even bothering to think what it actually does, just relying on its magic powers. When I started paying more attention to what I was actually doing, it turned out not to be so difficult to figure out what happens with a number sign link. Basically, for some obscure reason, a Search Engine takes these links seriously and treats them as perhaps as valid as any other internal link. The only difference is that the page links to itself, which supposedly creates some kind of loop back in PR or other link ranking calculations, and as a result it should double or at least amplify the importance of the page. Presumably the effect of an outside link pointing to a page with this kind of loop back should also be amplified compared to an ordinary page, thus allowing one to create powerful pages on purpose. And all this is just because the number sign link somehow temporarily short-circuits Search Engine algorithms. Not bad, is it? But it is surely black hat, isn’t it? Well, as usual, it depends on the definition. I assume there is no algorithmically feasible way to distinguish between a normal use of a number sign in a link and its SEO-related application. Although in theory it is a punishable offence, in practice prosecution cannot be enforced as it remains untraceable. So, in effect, Search Engine algorithms could not penalise its usage, only compensate for its effect. Being such an old trick, I thought it had been compensated for years ago by all the main Search Engines. In any case, in recent memory only MSN had been observed to still be prone to it, but who takes MSN for anything other than a joke in terms of their algorithms? And now I am proven wrong again.
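To make the loop back superstition concrete, here is a toy PageRank in Python using the textbook power-iteration formula with a damping factor of 0.85. This is only an assumption-laden sketch: nobody outside Google knows how its real link calculations treat number sign links, and modelling each "#" link as a link from the page to itself is exactly the superstition’s premise, not a known fact. The three-page site and the function name pagerank are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank (a toy, not Google's algorithm).

    `links` maps each page to the list of pages it links to. A target may
    repeat, and a page may list itself; every occurrence counts as one
    outgoing link. Every page must have at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            share = rank[page] / len(targets)  # rank split evenly per link
            for target in targets:
                new_rank[target] += damping * share
        rank = new_rank
    return rank


# Hypothetical three-page site: A -> B, B -> A and C, C -> A.
plain = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
# Same site, but A also carries three "#" links, modelled as self-links.
with_loops = {"A": ["B", "A", "A", "A"], "B": ["A", "C"], "C": ["A"]}

r_plain = pagerank(plain)
r_loops = pagerank(with_loops)
print(round(r_plain["A"], 3), round(r_loops["A"], 3))
```

On this toy graph page A’s score rises from roughly 0.40 to roughly 0.67 once the three self-links are added, because A keeps most of its own vote instead of passing it on. An engine that simply discards self-links (the compensation mentioned above) would turn the second graph back into the first.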
As you can see, the mighty Google itself is swallowing the same old bait. What in the world is going on here? We’ve observed how this page might have received preferential treatment, but we still have no answer to our main question: why does it rank for art power without any mention of power on the page itself? Now let’s take a closer look at those loop back links we discovered. Mon Dieu! (Pardon my French.) Yet another shocking revelation awaits us here. What we see is an onclick JavaScript alert that simply throws up a popup message with the word POWER twice in each text element. If we count all instances, this word is repeated altogether 62 times on this page! If this was part of the page text I would not have been surprised at all. It reminds me so much of classic keyword stuffing for a two-word search, where the first word has a very low density whilst the second one is overstuffed. This combination has been known to work since time immemorial and still works even now, inasmuch as keyword density still matters. The preference for this unbalanced pair type of page for two-word search phrases perhaps comes from the fact that pages with an even density of both search words, and even more so with both search words coming together in sequence, get penalised for fear of keyword stuffing, so the pages that look least similar to suspicious ones get favoured. All in all, it would have worked out perfectly if the text of this JavaScript alert was counted as part of the page text, but we all remember the textbook SEO axiom: Search Engine Robots don’t read JavaScript. Or do they? OK, what we really know with a high degree of probability is that they don’t execute JavaScript. And even this is only based on our assumption that it costs far too much in terms of computing resources to attempt to run each and every piece of JavaScript code on the net.
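The unbalanced pair idea is easy to put into numbers. The Python sketch below computes naive keyword density (occurrences of a word as a percentage of all words) for a hypothetical page, once for the visible text alone and once with the alert strings added, as if a robot read them. The page text and the alert string are invented; only the 62 occurrences of POWER mirror the count above, and real engines certainly tokenize and weight text in far more elaborate ways.

```python
import re


def tokens(text):
    """Lower-cased word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_density(word, words):
    """Occurrences of `word` as a percentage of all tokens in `words`."""
    return 100.0 * words.count(word) / len(words) if words else 0.0


# Hypothetical visible text: three mentions of "art", none of "power".
visible = tokens("A short paragraph about art, and art again, then art once more.")
# Hypothetical alert strings: 31 alerts with POWER twice each, 62 in all.
alerts = tokens(" ".join(["POWER TO THE POWER"] * 31))

for label, words in (("visible text only", visible), ("text plus alerts", visible + alerts)):
    print(f"{label}: art {keyword_density('art', words):.1f}%, "
          f"power {keyword_density('power', words):.1f}%")
```

With the visible text alone, art sits at 25.0% and power at 0.0%; add the alerts and art drops to about 2.2% while power jumps to about 45.6%, which is exactly the lopsided pair described above.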
But again, it is only our assumption, albeit confirmed by similar declarations coming from Search Engine spokespeople, and do we always have to trust their statements? Naturally, we cannot conclude on this basis that Search Engines don’t read JavaScript code and don’t use it as part of their relevancy calculation algorithms. The idea looks particularly attractive when applied to our case, where the JavaScript alert message can, under certain circumstances, be seen as part of the overall page text, since a human visitor can read it after all. Admittedly, there are some arguments to counter this hypothesis. Formally, the JavaScript alert is not counted as page content proper, which can easily be confirmed by the absence of our page in this search. Surprisingly enough, we don’t find our page even if we search in anchor text like this.
Equally, however, we know that Google never tells us the whole truth. Google insists that the word power is present in links leading to our page, but at the same time displays no results for a link: operator search for this URL. Clearly something is missing here. After all, Google allows us to see only what it wants us to see. Hence I’ll stick to my guns and offer the only explanation I think is plausible for this example, however strange and unorthodox it might look: Google actually reads JavaScript, and hence

  1. Google counts the word POWER in JavaScript alert towards the total in keyword density thus taking the page for the desired unbalanced pair type
  2. Google counts the word POWER in JavaScript as anchor text too (actually I have reasons to believe that virtually any text contained within a tag will be counted as anchor text)
  3. Google is still prone to the number sign link loop back

So far this is the only possible explanation I can find for the phenomenon we just observed, but I am far from completely satisfied with it myself. The flaw in my analysis is apparent, and should I mention that I made no effort to hide it? Having no better explanation myself, I am ready to accept the challenge, and if you can come up with an alternative I will be happy to engage in a fruitful discussion. To aid those prepared to offer an alternative analysis of this example, I may throw in another curious fact: the root of the irational.org domain itself ranks rather high for our art power search too.

If the line of reasoning drawn above holds true, the consequences are quite intriguing. Not only may one extract some practical tricks from these features, but it also suggests that some statements coming from Google itself were at least premature; in particular, effective Google Bowling still seems to be alive and kicking.

For those who expected to discover some sort of magic potion in this article I may only say: I don’t sell snake oil here.


Posted by LZZR under SEO Tricks, Google | Comments (3)

May 23, 2007

Is Google AdSense actually a Pay Per Impression Network?


Google AdSense continues to surprise, and since my last post on this subject I happened to bump into yet another AdSense thing that bothers me so much that I just can’t help sharing it with you.
If, like me, you do most of your Pay Per Click monetization via Google AdSense, you have to watch what it does very closely and attentively, as a good part of your income depends on it. And if you do, you might have noticed, as I did some time ago, that from time to time AdSense statistics spit out some funny figures. Almost every other day I see irregularities such as a channel that had no clicks but some impressions suddenly showing a greater than zero eCPM. I am sure I am not the only one who has noticed this. For a long time I tended to write it off, thinking that big-numbers maths is something way beyond my understanding, and all this would have been forgotten if there had not been another, more intriguing figure that popped up less often, but still often enough for me to take note. Occasionally, in the column directly adjacent to the eCPM one, inconspicuously called Earnings, I could see something like US$0.01. Nothing unusual, you say. Well, it would have been so if the corresponding value in the column titled Clicks had not been displaying a big round zero. Needless to say, the Page CTR cell also quite logically had zero percent in it.
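For reference, here is how the columns in question relate to one another, as a short Python sketch. It uses the usual definitions: eCPM as earnings per thousand impressions, and CTR as clicks as a percentage of impressions. The channel figures are hypothetical. The point it shows is that eCPM is computed from earnings, not from clicks, so a channel with zero clicks but a cent of earnings necessarily shows a positive eCPM while its CTR stays at zero.

```python
def ecpm(earnings, impressions):
    """Effective earnings per 1,000 impressions."""
    return earnings / impressions * 1000 if impressions else 0.0


def ctr(clicks, impressions):
    """Click-through rate, as a percentage of impressions."""
    return clicks / impressions * 100 if impressions else 0.0


# Hypothetical channel: 250 page impressions, no clicks, one cent earned.
page_impressions, clicks, earnings = 250, 0, 0.01

print(f"Clicks {clicks}, Page CTR {ctr(clicks, page_impressions):.2f}%, "
      f"eCPM ${ecpm(earnings, page_impressions):.2f}")
# Clicks 0, Page CTR 0.00%, eCPM $0.04
```

Whatever happens inside AdSense, a positive eCPM on a zero-click channel implies that some non-zero earnings were booked against it.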

Now, here is the question: how come I earned a cent from a Pay Per Click Network without having any clicks? Must be a miracle - IN GOD WE TRUST - indeed…One US Cent

Can’t grasp the logic of all this. Setting aside the possibility of divine intervention as highly unlikely in my case, let’s look for possible materialistic explanations. No big number theory can explain this phenomenon, as from elementary maths I remember that whatever you multiply by zero you get zero, and zero divided by anything is still zero. So how was I given this one cent out of nothing, when The Universal Laws of Science, and The Law of Conservation of Matter and Energy in particular, tell us that one cent coins don’t come from nothing under normal circumstances (and even less so do dollar bills)? My most immediate thought was that Google rewards not only clicks but impressions as well (hence the title of this post). However strange it might seem, if you think about it, it is only logical to reward sites that perform well in terms of impressions but have users dumb enough not to click like crazy on those highly attractive and clickable AdSense blocks. Now that the temptation to look for AdSense alternatives seems to be on the increase, it does make sense to add a bit of per impression to the traditional per click concept to stimulate publishers, thus saving your ad network from collapse. The only problem is how to do it. The easiest way seems to be to introduce some per view bonus coefficient. But here another problem arises: where do you get the funds to pay per view premiums if you are a per click network? Well, the answer is not so difficult to guess: from the money advertisers pay into your ad network as per click rates, of course (it would have been plain stupid to deduct it from your own profits, wouldn’t it?). But now you face another difficult task: how to calculate this whole thing, since simple arithmetic, i.e. splitting 100% of the per click fee between a publisher and the network at a known proportion, will be of no help. For this, the notion of heuristics, widely popularized by GoogleGuy AKA Matt Cutts, comes to the rescue.
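Here is one way such a per view bonus could be funded purely from per click money, written as a Python sketch. Every part of it is an assumption made up for illustration: the function name split_revenue, the 60% publisher share, the 5% bonus rate and the channel figures are all hypothetical and have nothing to do with AdSense’s actual (undisclosed) revenue sharing. Amounts use Decimal and are rounded down to whole cents.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def split_revenue(click_fees, impressions, publisher_share, bonus_rate):
    """Hypothetical payout model: per click fees plus an impression bonus.

    click_fees:      advertiser money earned by each channel through clicks
    impressions:     impressions served on each channel
    publisher_share: fraction of click fees passed on to publishers
    bonus_rate:      fraction of the publisher pool moved into a bonus pot
                     that is shared out by impressions rather than clicks
    """
    publisher_pool = sum(click_fees.values()) * publisher_share
    bonus_pool = publisher_pool * bonus_rate
    total_impressions = sum(impressions.values())

    payouts = {}
    for channel, fees in click_fees.items():
        per_click = fees * publisher_share * (1 - bonus_rate)
        per_view = bonus_pool * impressions[channel] / total_impressions
        # Round down to whole cents; the remainder stays with the network.
        payouts[channel] = (per_click + per_view).quantize(CENT, rounding=ROUND_DOWN)
    return payouts


payouts = split_revenue(
    click_fees={"busy": Decimal("2.00"), "no_clicks": Decimal("0.00")},
    impressions={"busy": 1000, "no_clicks": 250},
    publisher_share=Decimal("0.60"),
    bonus_rate=Decimal("0.05"),
)
print(payouts)
```

With these made-up numbers the channel with no clicks at all ends up with one cent (US$0.01), while the busy channel receives US$1.18 instead of the US$1.20 it would get under a pure per click split. Nothing in the Earnings column would reveal which of the two mechanisms produced the money.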
Here is how Wikipedia defines Heuristics in Computer Science (I really like this one :) ):

In computer science, a heuristic is a technique designed to solve a problem that ignores whether the solution can be proven to be correct, but which usually produces a good solution or solves a simpler problem that contains or intersects with the solution of the more complex problem.
Heuristics are intended to gain computational performance or conceptual simplicity, potentially at the cost of accuracy or precision.

Not only is this method something of a black box (we know what goes in and we see what comes out, but how it does it we don’t really know and don’t care; see the Black Box schematics diagram showing how a BlackBox works for details), it also outputs results that are not quite accurate and not exactly precise. A method pretty much the same as the Throw Shit At a Wall technique in SEO by Dax. Now, I think we’ve got an explanation!
I cannot be sure that Google AdSense uses heuristics for the purpose of rewarding per view sites (or should I say only for this purpose?), but I am pretty sure it has placed a big black box full of heuristics between advertisers and us AdSense publishers, and the guys at Googleplex do not always know what kind of shit comes out of it; I assume they don’t care much as long as it suits their profits.


Posted by LZZR under Advertizing, Google | Comments (1)

February 18, 2007

Google cheats on Webmasters


As you might have noticed from my post the other day, I was a bit ironic about thingy. It’s been noticed by many that even highly publicized innovations like the backlink checker are not working, as even a simple search like this one gives you many more links than are shown in Google Webmaster Tools. Now hard proof has emerged that Google is not only lame, but also absolutely inadequate, providing plainly wrong information to webmasters. Just look at this screenshot:
[Screenshot: LZZR.com is the number one for Yahoo search according to Google Webmaster Tools]
I was nearly blown off my chair when I first realized that, according to this, lzzr.com must be ranking first for the term Yahoo on Google. I must admit that my first thought was: thank God, my suffering is over, now I am the King of web 2.0! And immediately after, I was about to rush in panic to order a dedicated server and upgrade to a wider connection, fearing lzzr.com in its current state would not withstand the traffic explosion. Alas, there are no miracles in this world, and it did not take me long to come back to reality and see that this simply cannot be true :-(
Now the questions are: if this isn’t in fact true, then

  • Why on earth is Google lying to me?
  • What is Google actually showing me?

The former is easier to answer, as it seems Google has become paranoid about disclosing even the smallest bit of info that might provide food for thought and help in understanding their algos. I’ve mentioned already that Yahoo has a completely opposite policy: while Google becomes more and more secretive, they open up as much as they can, which is very helpful in daily SEO work.
The latter is a tough nut to crack, as I simply cannot believe what I see, and the only explanation I can find is that Google displays some kind of “microwave-ready” results for ranking, and the real ones go through an extensive cooking process before being served to the customer. I can only guess that at a certain stage in their SERP calculation lzzr.com did indeed happen to rank first for the term (could it be due to my ingenious eSEO? LOL). I even suspect this might be because I used a good old Yahoo regional domain, but otherwise I have no clue, as you don’t really think I am mad enough to optimize for Yahoo, do you?
Ultimately, it all boils down to the fact that something is rotten in the kingdom of Google as it displays such crap and wants me to believe in it.
PS Nevertheless, how I wish it was true LOL!!!


Posted by LZZR under Google, Blog | Comments (5)

February 6, 2007

Google Opens up Backlinks but not to Everyone


During the recent not only did we suffer the usual (i.e. disappearance of entire sites together with jumping PR), but it also stopped displaying backlinks in any comprehensible manner. I switched to to see my backlinks!
But now, hurray! The announcement comes from the babe-swamped heart of Googleplex - you can actually see your backlinks and we always knew about them, just did not want to show them to you. Here we go.
Very nice, you think. At last you’ll be able, like in the good old times, to see who links to you and who links to your competition, and refine your linking strategy.
No, no, no. Not so fast, buddy! It’s OK for you to see your backlinks, but don’t even think of trying to get more intelligence on your rivals. We’ll show the world the usual crap, and only you will be able to see the true picture, and only about your own site, says Google.
By effectively hiding backlinks from the public, de facto disabling the link: operator in their search whilst letting only registered webmasters see a bit of their own backlinks via the interface they provide, Google gives all the big cats who can afford professional SEO monitoring a huge and unfair advantage over the small guys who struggle to lift their Web 2.0 estate off the ground.
In contrast, Yahoo opens their Site Explorer API to everyone, thus allowing the commoners at least to see who links to whom :-)


Posted by LZZR under Google | Comments (1)

January 20, 2007

SEO and SEDD - the possibility of malicious deranking


When you go through SEO FAQ pages, or ask an average SEO guru or a Search Engine spokesperson whether the Search Engine Ranking of your site can be deliberately harmed by your competitor, the answer is always short and definitive: NO, no site can be harmed this way. Work to improve your site and you will score, don’t think about your competition this way!
Some recent events demonstrate that the issue is not necessarily so black-and-white.
In a remarkable post last month titled How Google handles hacked sites, Matt Cutts, in a rather amateur PR (I mean Public Relations in the very old-fashioned sense) exercise, describes a story in which a popular website gets hacked and how Google handles the case. A predictable happy end crowns the story, and we can’t help applauding the gracious and merciful power of the almighty Google.
It’s not that I take pleasure in watching the chief Google PR officer having to apologize in the face of another popular uprising. Nor am I worried too much about petty little sites that usually deserve much less attention and much less forgiveness from the merciful one. Incidentally, this case provoked a slightly different line of thought.
The site in question is a well-known, established resource never accused of any kind of spamming activity before. The punishment administered to the site seems far too severe: not just a penalty imposed on the ranking, but complete exclusion. I quote:

the site was classified as hacked and spammy. We stopped showing it for user queries.

Evidently Google knew that the site had been hacked, and most likely the webmasters had nothing to do with it apart from being slightly careless about security. So the site itself was only indirectly responsible for the event, via a certain degree of negligence; the punishment, however, was administered to the fullest extent, whilst the guilty party, the culprit, went unpunished. The hackers in this case had no intention of harming the site’s SERPs; they were just hunting for incoming links to their doorways. The site itself fell victim to crossfire whilst being just an innocent bystander. It is not so difficult, though, to imagine that other hackers may have a completely opposite objective and use the same trick to exclude the victim site from search engines, albeit temporarily. In most cases one does not even need to hack a victim website to inflict damage on its SERPs. Merely placing the victim into a Bad Neighbourhood might suffice.

I have no intention of publicising or even disclosing all possible tactics that might lead to eventual SED or even exclusion of a victim site: I really fear it might let the genie out of the bottle. Suffice to say that any literate SEO practitioner with a bit of experience can quickly design a range of tactics deliberately aimed at harming their competitors’ SERPs. I propose calling such tactics Search Engine Deliberate Deranking, or SEDD (as opposed to SEO proper).
The Ostrich approach to the problem, prevalent at the time of writing, will no longer help… Unless the major search engines issue clear guidelines regarding possible SEDD, such practice may start snowballing, and this spells disaster for all of us. So there are some questions to be asked, and perhaps answered, by those known to speak on behalf of our Big Brothers (Google, Yahoo, MSN). Namely, how will Search Engines behave in hypothetical cases of SEDD, and is there any guarantee that SEDD is not being practised already?

The terms SED (Search Engine Deranking) and SEDD (Search Engine Deliberate Deranking) are copyright © LZZR.com :-)
