-----
title: "The Green Bar"
content-type: article
timestamp: 1134133434
tags: "google|internet"
-----
Since 1998, SEO experts, webmasters, and even casual users have spent ages trying to figure out the magic within that small
green bar... but what's really behind Google's most famous invention? If you have never experienced the sensation of looking
at such a <em>green bar</em>, you may not know what I'm referring to; I suggest downloading and
installing the Google Toolbar[1]. This IE add-on (now also available for Firefox) was developed by Google years
ago and still remains the most common way to view a website's <strong>PageRank</strong>, shown as a bar of
variable length on a 0 to 10 scale.<br /><br />I quietly mentioned the infamous word <em>PageRank</em>
earlier, but what is it?<br />Some people think the name might come from a pun involving one of Google's
co-founders (Larry <em>Page</em>), while others simply think it was the most obvious choice for a system meant to
<em>rank</em> pages by importance and popularity. Either way, the only certain thing is that two
(insert appropriate adjective here) students at Stanford University[2] wrote a paper in 1998 called "The Anatomy of a
Large-Scale Hypertextual Web Search Engine"[3], in which they discussed some interesting ideas for developing a
large-scale search engine using an algorithm they had invented, meant to help deliver the most relevant
results for any query submitted by a user of the service.<br /><br />It is also certain that these two guys, Larry
Page and Sergey Brin, eventually made an awful lot of money in the following years, developing and expanding an
initially simple-looking website/web application with a funny name[4] and turning it into one of the biggest and most
profitable businesses in the history of Computer Science. But let's now examine how PageRank works.
<br /><br /><br /><strong>Deus ex machina</strong><br /> Google's co-founders kindly provided a short text summing up
their innovative (and perhaps secret) technology[5]. In particular, one paragraph seems to offer a brief and simple
explanation of how PageRank works:<br />
<fieldset>
    <blockquote><br /><em>PageRank relies on the uniquely democratic nature of the web by using its vast link structure
            as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as
            a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page
            receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves
            "important", weigh more heavily and help to make other pages "important."<br /></em></blockquote>
</fieldset><br /><br /> The first time I read this paragraph, I really experienced a feeling of admiration and ecstasy
for these two enlightened minds who decided to bestow their priceless gift on the World Wide Web: a system which gives
every page its due importance through democratic means. Isn't it wonderful?<br /><br /> Of course there's (much) more
to it than a short paragraph, and obviously this <em>explanation</em> wasn't enough for those people (webmasters, SEO
experts, kids creating their online family albums, etc.), who became more and more interested in the
details of the system, hoping that this knowledge would improve their placement in Google's search results.
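<br /><br />For readers in the same position, here is a minimal sketch of the approximation that is usually quoted (it is discussed further on in this article): <em>PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))</em>, recomputed for every page until the values settle. It is only an illustration under my own assumptions. The Python function name <code>pagerank</code>, the hypothetical three-page link graph and the fixed number of iterations are mine, and this is certainly not Google's actual implementation.

```python
def pagerank(links, d=0.85, iterations=100):
    """Approximate PageRank with the commonly quoted formula
    PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    `links` maps each page to the list of distinct pages it links to.
    Illustrative sketch only (assumed inputs), not Google's real algorithm.
    """
    # Collect every page, including those that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Arbitrary starting value: every page begins with PR = 1.
    pr = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page T linking here "votes" with PR(T), split equally
            # among its C(T) outbound links.
            votes = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * votes
        pr = new_pr  # update all pages at once, using last round's values
    return pr


if __name__ == "__main__":
    # Hypothetical three-page site: A links to B and C, B to C, C back to A.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(f"PR({page}) = {rank:.3f}")
```

With this toy graph, C ends up slightly ahead of A. B receives only half of A's vote and trails both. Because no page is a dead end, the values always add up to the number of pages (3 here), which is one reason the raw numbers are not bounded by 10.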
<br /><br /> Indeed, PageRank helped label some sites as <em>important</em>, and gradually the number of
"PageRank 10" websites[6] began to rise, although that rating generally remained the prerogative of big names in the IT
industry (Microsoft, Apple and obviously Google itself, for example). But how did such sites achieve that? How did the
green bar grow so much for them and not as much for your grandma's personal webpage?<br /><br /> Soon enough, theories
and speculation produced an approximation of the algorithm[7], which is generally thought to be an acceptable model for
understanding how the system works.<br /><br />Take the following equation:<br /><br /><em>PR(A) = (1-d) + d (PR(T1)/C(T1)
    + ... + PR(Tn)/C(Tn))</em><br /><br />Where:<br /><br /><em>PR(A)</em> - The PageRank value of page A<br /><em>PR(Ti)</em> - The PageRank value of page Ti, where T1 ... Tn are the pages linking to A<br /><em>C(Ti)</em> - The number of outbound links on page
Ti<br /><em>d</em> - The "damping factor", thought to be 0.85<br /><br /> It now appears clear that the
PageRank of page A depends on the number of pages linking to it. Furthermore, important factors taken into consideration
are the <em>quality</em> of such pages (i.e. whether they have a high PageRank themselves or not) and the number of
links present on each page, which causes the vote to be <em>divided</em> equally among them. <br /><br /> This is, in a
nutshell, how PageRank is supposed to work. This is obviously a simple model, and there's actually a more
mathematical/probabilistic approach[8] which goes beyond the scope of this article and requires some notions of
probability theory.<br /><br /><br /><strong>Considerations and opinions</strong><br />With this model in mind, it's now
possible to understand how (in a very simplified way) Google works: each month, Google's spiders crawl the web,
following links from one page to another and keeping track of the "votes". PageRank is then calculated and updated for
every page. This process normally takes a lot of time and, as a matter of fact, PageRank seems to be updated only every 4
months nowadays: these updates normally raise a page by one level on the bar (or more, if you're lucky) or, in some
cases, lower it in the same way.<br /><br /> By taking a closer look at the formula proposed above,
you'll notice that the maximum value of PR(A) is by no means capped at 10, as it depends on how many pages link to A and
how many outbound links there are on those pages. As a matter of fact, people started speculating about the nature of the
scale used for PageRank: on the toolbar it ranges from 0 to 10, while a PageRank 10 site (take Microsoft.com, for
example) should in practice correspond to a raw value of <em>some millions</em>. <br /><br /> The most widely accepted theory is that the
PageRank displayed on the green bar is obtained through some sort of mapping between the real values and this 0 to 10
scale. People have also suggested that this scale is in fact logarithmic, with base 5 (or 6). This would explain, for
example, why it takes much longer to go from PageRank 6 to PageRank 7 than from PageRank 2 to PageRank 3.
<br />For the non-mathematical minds, a <em>logarithmic scale</em> is a succession of values that do NOT grow by "1"
or any other fixed quantity, but are multiplied by the same factor (the base) at each step: on a base-10 logarithmic
scale, values of 1, 2 and 3 correspond respectively to 10^1, 10^2 and 10^3 (10, 100, 1000).<br /><br /> For a long time Google seemed to use
PageRank as an important factor in ranking search results, and this is still partly true: if you search
for the keyword "Italy", you're likely to find some high-PR sites among the first results.<br /><br /> This gave rise to all
sorts of speculation: webmasters started charging money to publish links on high-PR pages, and SEO experts
adopted various infamous tactics to obtain a high PageRank for their customers, such as
<em>link farms</em>[9].<br /><br />It's now clear that what was believed to be a solution relying on the
<em>uniquely democratic nature of the web</em> turned out to be a complete failure in that sense, because the very basis
of the concept is wrong. Sad but true: the WWW is by no means democratic. <br /><br /> Another complaint against
PageRank was that new sites took ages to acquire a <em>respectable</em> PageRank and therefore to appear at the top of search
results, no matter how wonderfully they were written. This is still partly true, as anyone can notice by searching
Google, but the algorithm itself is continuously being tweaked, both to stop spammers and link farms and to
favour sites which provide relevant, appropriate content without resorting to dodgy tricks; I must admit that
the situation is gradually getting better.<br /><br /><br /><strong>Case Study: ItalySimply.com and
    h3raLd.com</strong><br />I'm now going to discuss my personal experience with PageRank on my two
websites, ItalySimply[10] and h3raLd Labs[11]. The second one is not currently advertised or promoted, because at
the moment I don't have enough time for other web development projects; with the first one, however, I followed an <em>SEO
    strategy</em> aimed at acquiring PageRank and good placement in search engines.<br />You can see the results yourself:
ItalySimply acquired PageRank 5 and h3raLd PageRank 4, not bad at all considering that both are relatively new
websites: ItalySimply was officially launched in August 2004, and h3raLd Labs has only had some serious content since April
2005. <br /><br /> For ItalySimply, I even experienced a period of <em>PageRank 0</em> which lasted about 2 months:
although, according to Google, all websites should have at least PR1, PR0 is used to penalize some <em>unusual</em>
behaviour, which in my case was a <em>302 - Moved Temporarily</em> redirect that I needed in order to send users to a
subfolder on the server. Later on I learned how search engines can interpret this as a dodgy redirection[12],
and why Google penalized me for it with PR0. After noticing the mistake, I immediately started a strategic
link campaign, obtaining links from some good sites related to mine (some of them with high PR), and ItalySimply's PageRank
began to grow: from 0 to 3, then 4, and just recently 5.<br /><br /> At the same time, I redesigned h3raLd.com and
noticed that it had PR1, because it was already listed in Google but didn't get any <em>vote</em> from other sites.
I then decided to put a link to h3raLd Labs on <em>every</em> page of ItalySimply; those pages now range from PR5 to
PR2. <br /><br /> The result was an immediate rise in h3raLd.com's PR, which reached a respectable 4 without
<em>any</em> link swapping, banners displayed on behalf of other sites, or anything of the sort. <br /><br /> The difference
between the two sites, though, is much bigger than one PR point in terms of placement in search results: ItalySimply has
some relatively interesting content and several pages, and it ranks well enough on MSN and Yahoo, and to an extent even
on Google; h3raLd.com has just 4 pages and doesn't seem to appear in search engines at all unless you search for something
like "h3raLd". Again, this proves that nowadays a high PR doesn't guarantee immediate placement at the top of search
results.<br /><br /><br /><strong>Final Considerations</strong><br /> Although PR is by no means the only factor that
determines search engine placement, it is certainly still important as a <em>co-factor</em>. As I said, it's still
extremely difficult for a new page with low PageRank to rank above a high-ranked one. Surely, if I decided to put
something more interesting on h3raLd.com, I would get better results than by buying a new domain and creating a new site:
old sites with high PR are still <em>naturally</em> inclined to rank better than new ones. Got that? Now, all you need
to do is buy a really stupid domain name and create some pages for it, then treat it like a bottle of whisky: let
it age for a while until it earns a respectable rank, and when you have a clever idea you'll have a ready-made place to
promote it!<br /><br /><em>In Google we trust!</em><br /><br /><br /><br /><strong>Sources and related
    links:</strong><br /><br />[1] Google Toolbar, <a
    href="http://toolbar.google.com/">http://toolbar.google.com/</a><br />[2] Stanford University, <a
    href="http://www.stanford.edu/">http://www.stanford.edu/</a><br />[3] Lawrence Page and Sergey Brin, "The Anatomy of
a Large-Scale Hypertextual Web Search Engine", Computer Science Department, Stanford University, <a
    href="http://www-db.stanford.edu/~backrub/google.html">http://www-db.stanford.edu/~backrub/google.html</a><br />[4]
Google, <a href="http://www.google.com/">http://www.google.com/</a><br />[5] Google Technology, <a
    href="http://www.google.com/technology/">http://www.google.com/technology/</a><br />[6] List of PageRank 10 sites,
<a
    href="http://www.searchenginegenie.com/pagerank-10-sites.htm">http://www.searchenginegenie.com/pagerank-10-sites.htm</a><br />[7]
Ian Rogers, "The Google Pagerank Algorithm and How It Works", IPR Computing Ltd., <a
    href="http://www.iprcom.com/papers/pagerank/index.html">http://www.iprcom.com/papers/pagerank/index.html</a><br />[8]
PageRank, Wikipedia page, <a href="http://en.wikipedia.org/wiki/Pagerank">http://en.wikipedia.org/wiki/Pagerank</a>
<br />[9] Link Farm, Wikipedia page, <a
    href="http://en.wikipedia.org/wiki/Link_farm">http://en.wikipedia.org/wiki/Link_farm</a><br />[10] ItalySimply -
Italy Real Estate Services and Relocation Help, <a
    href="http://www.italysimply.com/">http://www.italysimply.com/</a><br />[11] h3raLd Labs - Freelance Web
Development, <a href="/">/</a><br />[12] "The Rundown on 301 and 302 redirects", September 10th, 2004, <a
    href="http://www.rankforsales.com/seo-articles/301-and-302-domain-name-redirects.html">http://www.rankforsales.com/seo-articles/301-and-302-domain-name-redirects.html</a><br />