PageRank, TrustFlow and the Search Universe

By Neep Hazarika July 7, 2014

Who’s Most Famous?

A recent study by Young-Ho Eom of the University of Toulouse set out to determine the most influential person on Wikipedia. The result was rather surprising, and at odds with conventional expectations: it concluded that the Swedish botanist Carl Linnaeus was more influential than either Jesus or Hitler. This atypical assessment can be attributed to the approach employed in the study, namely the adaptation of Stanford University’s PageRank (PR) algorithm to calculate the number and value of incoming links to any given article.

What is PageRank?

PageRank is one of the methodologies that Google uses to determine the relevance or importance of a site. The PageRank metric was developed by Google’s co-founders Larry Page and Sergey Brin during their time at Stanford University. This ranking procedure has drawn a great deal of attention from researchers in various fields due to its importance in the evaluation of webpage performance. Most earlier analyses attempted to evaluate importance with either a subjective approach, based on expert survey metrics, or an objective approach, based on citation-based metrics. Both methodologies have their own advantages and disadvantages, and they are usually complementary.

PageRank does indeed provide a good approximation to the importance of a webpage. However, PageRank may not provide an accurate evaluation of new websites, many of which may contain relevant information, because of a lack of backlinks pointing to the site. Further, since PageRank does not analyse web page content, the inbound links to a particular page may come from pages on topics that are not pertinent to a given query. In simple terms, PageRank is a numeric value assigned to a website depending upon the importance the algorithm places on unique content, such as backlinks, site structure, anchor text, etc. Thus, there is no guarantee that a site with a high PR will automatically acquire a high position in terms of the relevance to a particular topic or query.
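To make the link-counting idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production implementation: the damping factor of 0.85, the iteration count, the function name `pagerank`, and the four-page `toy_web` graph are all illustrative assumptions. The page named `new_page` has no backlinks, so it ends up with the minimum possible score however useful its content might be.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "new_page" links out but nobody links to it.
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "new_page": ["hub"],
}

scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page}: {scores[page]:.4f}")
```

In this toy graph `hub` collects the most rank because every other page links to it, while `new_page` receives only the baseline share that every page gets regardless of links.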

Why PageRank went wrong in this study

Something unusual must have happened in the calculation of PageRank in this study, because the result showing a botanist as bigger than Jesus does not seem to hold merit. In particular, the integrity of a link-based algorithm depends, to a large extent, upon no one person or effect being able to unduly unbalance the data. In this case, it appears that the calculations were carried out entirely on pages within Wikipedia, and ignored external links pointing into the site. Whilst this is the only realistic way to do the work without using a global index like Google or Majestic, it demonstrates the need for global data sources when considering a subset or segment of the web universe.
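The effect of ignoring external links can be sketched with a toy graph. Everything in the example below is hypothetical: the page names, the link counts, and the helper functions `pagerank`, `restrict_to_site` and `top_site_page` are illustrative assumptions, not the method used in the study or by any of the tools mentioned here. One page (`wiki/taxon_hub`) is heavily linked from inside the site, while another (`wiki/famous`) is heavily linked from outside it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def restrict_to_site(links, site_pages):
    """Keep only links whose source and target are both inside the site."""
    return {
        page: [t for t in links.get(page, []) if t in site_pages]
        for page in site_pages
    }


def top_site_page(scores, site_pages):
    """Return the highest-scoring page among the site's own pages."""
    return max(site_pages, key=lambda page: scores[page])


# Hypothetical site: three species pages link to the taxonomy hub.
site_pages = {"wiki/famous", "wiki/taxon_hub", "wiki/oak", "wiki/rose", "wiki/wolf"}
full_web = {
    "wiki/oak": ["wiki/taxon_hub"],
    "wiki/rose": ["wiki/taxon_hub"],
    "wiki/wolf": ["wiki/taxon_hub"],
    "wiki/taxon_hub": [],
    "wiki/famous": [],
}
# Six hypothetical external pages all link to the "famous" entry.
for i in range(1, 7):
    full_web[f"ext{i}"] = ["wiki/famous"]

global_scores = pagerank(full_web)
internal_scores = pagerank(restrict_to_site(full_web, site_pages))

top_global = top_site_page(global_scores, site_pages)
top_internal = top_site_page(internal_scores, site_pages)
print("Top page using the whole web:    ", top_global)
print("Top page using site links only:  ", top_internal)
```

With the whole graph, the externally linked page comes out on top; once the external links are dropped, the internally linked hub takes its place. The same pages and the same algorithm give a different winner purely because of which links were counted.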

Different algorithms that count both incoming and outgoing links can give rise to different effects. Further, the results can be influenced by the cultural and linguistic contexts within which these studies are undertaken. In addition, the constant evolution of Wikipedia’s content can also have a discernible effect upon the outcomes, and therefore upon the conclusion reached. Re-indexing by Google can also have an influence on the current PageRank of a particular site. In particular, PageRank does not provide any indication regarding the content or size of a page, the language it’s written in, or the text used in the anchor of a link.

Comparative Studies

In this article, we revisit the study of the most influential person on Wikipedia. We use two other comparative metrics, namely, our very own MajesticSEO Topical Trust Flow and MOZ’s opensiteexplorer.com. MajesticSEO’s data is developed from the ground up by crawling the entire web (not just Wikipedia) and applying its own proprietary metric instead of the PageRank algorithm. OpenSiteExplorer also uses its own metric, whose full derivation is not public, but which is believed to be influenced in part by the Search Engine prominence of a URL and is therefore likely to correlate well with PageRank as calculated by Google on a worldwide index.

Both comparative methodologies return Jesus as the most influential person.

Figure 1 shows the results of the page-specific metrics as computed by MajesticSEO’s Site Comparator Tool for 24 June 2014. The influence list for Wikipedia is, in order, Jesus, Hitler and Linnaeus, with Trust Flows of 56, 56 and 50 respectively. Indeed, in terms of the number of metrics, the values for Jesus in the Wikipedia entries greatly outnumber those for the other two.

Majestic

Left: Jesus; Middle: Hitler; Right: Linnaeus

Figure 1: MajesticSEO Site Comparator Tool Statistics

The MOZ metrics also corroborate the MajesticSEO statistics for the same Wikipedia entries, as displayed by the Page Authority scores in Figure 2.

MOZ

Left: Jesus; Middle: Hitler; Right: Linnaeus

Figure 2: MOZ Page Specific Metrics

Next, we consider individual Trust Flows (TF) and Citation Flows (CF) using MajesticSEO’s Site Explorer Tool to determine the Trust and Citation Flows for each of the aforementioned Wikipedia entries. Figures 3, 4 and 5 provide details of the inbound link and site summary data for Wikipedia entries referring to Jesus, Hitler and Linnaeus respectively. Again, the statistics support our ranking order as Jesus, Hitler and Linnaeus. Note the concentration of topics for each of these entries. The general topic “Society” seems to dominate the composition of the Topical Trust Flows for Jesus and Hitler, while “Science” leads that of Carl Linnaeus, which is not surprising, given that he was a botanist, physician, and zoologist.

Jesus

Figure 3: Site Summary Data for Wikipedia Entry “Jesus”

Hitler

Figure 4: Site Summary Data for Wikipedia Entry “Hitler”

Linnaeus

Figure 5: Site Summary Data for Wikipedia Entry “Linnaeus”

Finally, we compare a composite list of the Topical Trust Flows for these Wikipedia entries, as displayed in Figure 6. Again, the MajesticSEO data gives the ranking as:

  1. Jesus has a TF of 56 and a CF of 55;
  2. Hitler has a TF of 56 and a CF of 54;
  3. Linnaeus has a TF of 50 and a CF of 50.

1: Jesus; 2: Hitler; 3: Linnaeus

Figure 6: MajesticSEO Bulk Backlink Checker Results
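The ordering above can be reproduced from the reported numbers with a simple sort. The sketch below assumes that Citation Flow is used as the tie-breaker when two entries share the same Trust Flow; the article does not state how the tie between Jesus and Hitler is resolved, so that rule, along with the `FlowMetrics` type, is an assumption made for illustration.

```python
from typing import NamedTuple


class FlowMetrics(NamedTuple):
    """Trust Flow and Citation Flow for one Wikipedia entry (hypothetical type)."""
    entry: str
    trust_flow: int
    citation_flow: int


# Values as reported in the article, deliberately listed out of order.
entries = [
    FlowMetrics("Linnaeus", 50, 50),
    FlowMetrics("Jesus", 56, 55),
    FlowMetrics("Hitler", 56, 54),
]

# Assumption: sort by Trust Flow first, then by Citation Flow to break ties.
ranked = sorted(
    entries,
    key=lambda m: (m.trust_flow, m.citation_flow),
    reverse=True,
)

for position, m in enumerate(ranked, start=1):
    print(f"{position}. {m.entry}: TF {m.trust_flow}, CF {m.citation_flow}")
```

Under that assumption the sort yields Jesus, Hitler, Linnaeus, matching the order given in the list.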

Conclusions

This study provides evidence that MajesticSEO’s view of “importance”, which is based on the whole universe of URLs rather than on a single site or subset such as Wikipedia, is a stronger methodology for determining the ranking of Wikipedia’s influence list.

**Sign up to Majestic Insights for more**

If you enjoyed this research, you are welcome to join Majestic Research – a free service that will tell you when we produce more in-depth data, such as industry reports. Users signing up get our Twitter top 50,000 list as well. Registering is easy over here.


11 Responses to “PageRank, TrustFlow and the Search Universe”

  1. Colin said:

    July 07, 2014 at 2:21 pm

    I don’t think I’ve ever seen Jesus and Hitler used to show how well a product performs before now! When measuring inbound links, how is relevancy measured? I understand how trust, citation, authority etc. can be measured, but not relevancy. And Sir Google informs us that this is just as important.


  2. Colin said:

    July 07, 2014 at 6:37 pm

    Thanks Dixon. The accuracy isn’t perfect but it does give a really good indication. Another metric to use!


  3. jocuri cu cai said:

    July 07, 2014 at 9:00 pm

    Is it true that your trust flow metric is better than the page rank? so i’ve heard from a co worker. thx!


    • Dixon Jones said:

      July 07, 2014 at 10:18 pm

      That’s not what this study shows. This study suggests that using connections within an ecosystem is not as accurate as using the entire universe.

      However, there are several advantages of Flow Metrics over PageRank:
      * an extra degree of granularity
      * more regularly updated (compared to Google’s calculation)
      * available on demand
      * available by topic
      * accessible in bulk and by API
      * available by keyword as well as by URL
      * available at the domain level, not just at the page level.


  4. Sam Mudra said:

    July 10, 2014 at 3:41 pm

    I used MajesticSEO in my last organization and it truly is a really good and reliable tool for search engine ranking/analysis. I am not sure whether TF or PR is better, because we always see many websites with low PR or TF overtake high PR/TF sites. Do all these really matter today?


  5. M Aamir said:

    July 14, 2014 at 12:50 am

    I just tried Majestic and it’s a great tool. Your points above about PageRank and the rest of the SEO discussion are informative. Keep it up. Thanks


  6. Susanta said:

    July 30, 2014 at 1:36 pm

    Hello Dixon,

    Today, using your tool, I came to know about this. What should I do if my links are coming from poor sites? I am noticing a sudden, huge fall in my incoming traffic, which may be due to this. Kindly guide.


    • Dixon Jones said:

      July 30, 2014 at 1:49 pm

      We do not offer one-to-one consultancy. Here are a few tips that may apply to you:
      * Don’t use a free web hosting service if you want to be taken seriously
      * Don’t use a free email address if you want to be taken seriously
      * Don’t try to put your website in comments on blog posts about link building. I took the liberty of taking your blogspot link off your comment.


  7. Deepak Sharma said:

    August 20, 2014 at 9:10 pm

    Makes lots of sense. More evidence that link building is not dead. In fact, it’s even more important today than anything else. However, crappy links are not getting much value. Do not buy links or post links in link directories. Focus on good quality articles that add value.


    • Dixon Jones said:

      August 20, 2014 at 9:23 pm

      I think that’s right. There is so much more “intel” measurable in a link than most SEOs realize.

