The sources of https://h3rald.com

contents/articles/pagerank.html

-----
title: "The Green Bar"
content-type: article
timestamp: 1134133434
tags: "google|internet"
-----
Since 1998, SEO experts, webmasters, and even casual users have spent ages trying to figure out the magic within that small green bar... but what's really behind Google's most famous invention? If you have never looked at such a <em>green bar</em> before, then maybe you don't know what I'm referring to; I suggest downloading and installing the Google Toolbar[1]. This IE add-on (now also available for the Firefox browser) was developed by Google years ago and remains the most common way to view a website's <strong>PageRank</strong>, shown as a simple bar of variable length on a 10-point scale.<br /><br />I quietly mentioned the infamous word <em>PageRank</em> earlier, but what is it?<br />Some people think the name is a pun on one of Google's co-founders (Larry <em>Page</em>), while others simply think it was the most obvious choice for a system that was supposed to <em>rank</em> pages according to importance and popularity. Anyhow, the only certain thing is that in 1998 two (insert appropriate adjective here) students of Stanford University[2] wrote a paper called "The Anatomy of a Large-Scale Hypertextual Web Search Engine"[3], in which they discussed some interesting ideas for developing a large-scale search engine using a particular algorithm they invented, which was supposed to help deliver the most relevant results for any search query submitted by a user of the service.<br /><br />It is also certain that these two guys, Larry Page and Sergey Brin, made an awful lot of money in the following years, developing and expanding an initially simple-looking website/web application with a funny name[4] and turning it into one of the biggest and most profitable businesses in the history of Computer Science. But let's now examine how PageRank works. 
<br /><br /><br /><strong>Deus ex machina</strong><br />  Google's co-founders kindly provided a short text summing up their innovative (and perhaps secret) technology[5]. In particular, one paragraph seems to offer a brief and simple explanation of how PageRank works:<br /><fieldset><blockquote><br /><em>PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important", weigh more heavily and help to make other pages "important."<br /></em></blockquote></fieldset><br /><br />  The first time I read this paragraph, I really experienced a feeling of admiration and ecstasy for these two enlightened minds who decided to bestow their priceless gift on the World Wide Web: a system which gives every page its due importance by democratic means. Isn't it wonderful?<br /><br />  Of course there's (much) more to it than a short paragraph, and obviously this <em>explanation</em> wasn't enough for those people (webmasters, SEO experts, kids creating their online family albums, etc.) who gradually became more and more interested in knowing further details about the system, hoping that this knowledge would improve their placement in Google's search results.  <br /><br />  Indeed, PageRank helped label some sites as <em>important</em>, and gradually the number of "PageRank 10" websites[6] began to rise, although that rank generally remained the prerogative of the big names of the IT industry (Microsoft, Apple and obviously Google itself, for example). But how did such sites achieve that? 
How did the green bar grow so much for them and not as much for your grandma's personal webpage?<br /><br />  Soon enough, theories and speculations produced an approximation of the algorithm[7], which is generally thought to be an acceptable model to understand how the system works.<br /><br />Take the following equation:<br /><br /><em>PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))</em><br /><br />Where:<br /><br /><em>PR(A)</em> - The PageRank value of page A<br /><em>PR(T1) ... PR(Tn)</em> - The PageRank values of the pages T1 ... Tn linking to A<br /><em>C(T1) ... C(Tn)</em> - The number of outbound links present on each of those pages<br /><em>d</em> - The "damping factor", thought to be 0.85 <br /><br /> It now appears clear that the PageRank of page A depends on the number of pages linking to it. Furthermore, important factors taken into consideration are the <em>quality</em> of such pages (i.e. whether they have a high PageRank themselves or not) and the number of links present on each of them, which causes each page's vote to be <em>divided</em> equally among all the pages it links to. <br /><br />  This is, in a nutshell, how PageRank is supposed to work. This is obviously a simple model, and there's actually a more mathematical/probabilistic approach[8] which goes beyond the scope of this article and requires some notions of probability theory.<br /><br /><br /><strong>Considerations and opinions</strong><br />With this model in mind, it's now possible to understand how (in a very simplified way) Google works: each month Google's spiders crawl the web, following links from one page to another and keeping track of the "votes".  PageRank is then calculated for every page and updated. 
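<br /><br />As a rough illustration, the approximate equation above can be evaluated by repeated substitution: start every page at 1 and recompute all values until they settle. The sketch below is a minimal Python version under that assumption; the function name <em>pagerank</em>, the <em>links</em> dictionary and the three-page example web are made up for this illustration, and it is certainly not how Google actually implements the calculation.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively evaluate PR(A) = (1-d) + d * sum(PR(T)/C(T)).

    links maps each page to the list of pages it links to
    (assumed to contain no duplicate targets).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Hypothetical starting point: every page begins with a value of 1.
    pr = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T linking here contributes PR(T) / C(T),
            # where C(T) is the number of links present on T.
            votes = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * votes
        pr = new_pr
    return pr


# A made-up web of three pages: A links to B and C, B to C, C to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

In this toy web, page C collects a full vote from B and half a vote from A, so it ends up with the highest value, while B, which only receives half of A's vote, ends up with the lowest.<br /><br />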
This process normally takes a lot of time and, as a matter of fact, the PageRank shown on the toolbar seems to be updated only about every 4 months nowadays: these periodic updates normally cause a page to move up by one level on the bar (or more if you're lucky) or, in some cases, to drop in the same way.<br /><br />  By taking a closer look at the formula proposed above, you'll notice that the maximum value of PR(A) is by no means equal to 10, as it depends on how many pages link to A and how many outbound links there are on such pages. As a matter of fact, people started speculating on the nature of the scale used for PageRank: on the toolbar it ranges from 0 to 10, while the real value behind a PageRank 10 (take Microsoft.com for example) should be in the order of <em>some millions</em>. <br /><br />  The most accredited theory is that the PageRank displayed on the green bar is the result of a mapping from the real values onto the 0 to 10 scale. Also, people suggested that this scale is in fact a base 5 (or 6) logarithmic scale. 
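<br /><br />To get a feel for that hypothesis, here is a minimal Python sketch of a base-5 mapping. Everything in it is an assumption made for illustration: the base, the very idea of a single raw score, and the helper names <em>toolbar_level</em> and <em>score_gap</em>; the real mapping has never been published.

```python
def toolbar_level(raw_score, base=5, max_level=10):
    """Map a hypothetical raw score onto a 0-10 bar, one level per power of base."""
    level = 0
    threshold = base  # raw score needed to reach level 1
    while level < max_level and raw_score >= threshold:
        level += 1
        threshold *= base  # each further level needs base times more
    return level


def score_gap(level, base=5):
    """Raw score a page must gain to climb from `level` to `level + 1`."""
    return base ** (level + 1) - base ** level


for level in (2, 6):
    print(f"from level {level} to {level + 1}: gain {score_gap(level)} raw points")
```

With these assumptions, the raw score a page must gain to climb one level grows fivefold at every step.<br /><br />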
This would explain for example why it takes much longer to go from PageRank 6 to PageRank 7 than from PageRank 2 to PageRank 3.<br />For the non-mathematical minds, a <em>logarithmic scale</em> is one whose steps do NOT grow by "1" or any other fixed quantity; instead, each step multiplies the underlying value by a fixed factor (the base): taking a base-10 logarithmic scale, values of 1, 2, 3 would correspond respectively to 10^1, 10^2 and 10^3 (10, 100, 1000).<br /><br />  For a long time Google seemed to use PageRank as an important factor for getting first places in search results, and it's still partly true: if you search for the keyword "Italy" you're likely to find some high PR sites among the first results.<br /><br />  This resulted in all possible forms of speculation: webmasters started asking for money to publish links on high PR pages, and similarly SEO experts started adopting various infamous tactics to obtain a high PageRank for their customers, including, for example, <em>link farms</em>[9].<br /><br />It's now clear that what was believed to be a solution relying on the <em>uniquely democratic nature of the web</em> turned out to be a complete failure in that sense, because the very basis of the concept is wrong. Sad but true, the WWW is by no means democratic. <br /><br />  Another complaint against PageRank was that new sites took ages to acquire a <em>respectable</em> PageRank and therefore to appear at the top of search results, no matter how wonderfully they were written. 
This is still partly true, as anyone can notice by searching Google, but the algorithm itself is continuously being tweaked both to stop spammers and link farms and to favour those sites which provide relevant and appropriate content and are not up to some dodgy trick; I must admit that the situation is gradually getting better.<br /><br /><br /><strong>Case Study: ItalySimply.com and h3raLd.com</strong><br />I'm now going to discuss my own personal experience with PageRank applied to my two websites, ItalySimply[10] and h3raLd Labs[11]. While the second one is not currently advertised or promoted, because at the moment I don't have enough time for other web development projects, with the first one I followed an <em>SEO Strategy</em> aimed at acquiring PageRank and a good placement in search engines.<br />You can see the result yourself: ItalySimply acquired PageRank 5 and h3raLd PageRank 4: not bad at all considering they are both relatively new websites, ItalySimply having been officially born in August 2004 and h3raLd Labs having had some serious content only since April 2005. <br /><br />  For ItalySimply, I even experienced a period of <em>PageRank 0</em> which lasted about 2 months: although according to Google all websites should have at least PR1, PR0 is used to penalize some <em>unusual</em> behaviour, which in my case was a <em>302 - Moved Temporarily</em> redirect that was necessary to send users to a subfolder of the server. Later on I learned how this can be interpreted as a dodgy redirection by search engines[12], and why Google penalized me for it with a PR0. 
After noticing the mistake, I immediately started a strategic link campaign, obtaining links from some good sites (also with high PR) related to mine, and the PageRank of ItalySimply began to grow, from 0 to 3, then 4, and just recently 5.<br /><br />  At the same time, I re-designed h3raLd.com and noticed that it had only acquired PR1: it was already listed in Google but hadn't received any <em>vote</em> from other sites. I then decided to put a link to h3raLd Labs on <em>every</em> page of ItalySimply, whose pages now range from PR2 to PR5. <br /><br />  The result was an immediate growth of h3raLd.com in terms of PR, which reached an acceptable 4 without <em>any</em> link swapped, banner displayed on behalf of other sites, or anything of the sort. <br /><br />  The difference between the two sites, though, is much bigger than 1 point of PR in terms of placement in search results: ItalySimply has some relatively interesting content and various pages, and it ranks well enough on MSN and Yahoo, and even Google, to an extent; h3raLd.com has just 4 pages and doesn't seem to appear at all in search engines, unless you search for something like "h3raLd". Again, this is proof that nowadays PR doesn't mean immediate placement at the top of search results.<br /><br /><br /><strong>Final Considerations</strong><br />  Although PR is by no means the only factor determining search engine placement, it's still certainly important as a <em>co-factor</em>. As I said, it's still extremely difficult for a new page with low PageRank to place before a high-ranked one. Surely, if I decided to put something more interesting on h3raLd.com I would get better results than by buying a new domain and creating a new site: old sites with high PR are still <em>naturally</em> inclined to rank better than new ones. Got that? 
Now, all you need to do is buy a really stupid domain name and create some pages for it, then think about it like a bottle of whisky; let it age for a while so that it gets some respectable rank: when you have a clever idea you'll have a ready-made place to promote it!<br /><br /><em>In Google we trust!</em><br /><br /><br /><br /><strong>Sources and related links:</strong><br /><br />[1] Google Toolbar, <a href="http://toolbar.google.com/">http://toolbar.google.com/</a><br />[2] Stanford University, <a href="http://www.stanford.edu/">http://www.stanford.edu/</a><br />[3] Lawrence Page and Sergey Brin, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Computer Science Department, Stanford University, <a href="http://www-db.stanford.edu/~backrub/google.html">http://www-db.stanford.edu/~backrub/google.html</a><br />[4] Google, <a href="http://www.google.com/">http://www.google.com/</a><br />[5] Google Technology, <a href="http://www.google.com/technology/">http://www.google.com/technology/</a><br />[6] List of PageRank 10 sites, <a href="http://www.searchenginegenie.com/pagerank-10-sites.htm">http://www.searchenginegenie.com/pagerank-10-sites.htm</a><br />[7] Ian Rogers, "The Google Pagerank Algorithm and How It Works", IPR Computing Ltd., 
<a href="http://www.iprcom.com/papers/pagerank/index.html">http://www.iprcom.com/papers/pagerank/index.html</a><br />[8] PageRank, Wikipedia page, <a href="http://en.wikipedia.org/wiki/Pagerank">http://en.wikipedia.org/wiki/Pagerank</a><br />[9] Link Farm, Wikipedia page, <a href="http://en.wikipedia.org/wiki/Link_farm">http://en.wikipedia.org/wiki/Link_farm</a><br />[10] ItalySimply - Italy Real Estate Services and Relocation Help, <a href="http://www.italysimply.com/">http://www.italysimply.com/</a><br />[11] h3raLd Labs - Freelance Web Development, <a href="http://www.h3rald.com/">http://www.h3rald.com/</a><br />[12] "The Rundown on 301 and 302 redirects", September 10th, 2004, <a href="http://www.rankforsales.com/seo-articles/301-and-302-domain-name-redirects.html">http://www.rankforsales.com/seo-articles/301-and-302-domain-name-redirects.html</a><br />