PageRank, TrustFlow and the Search Universe
A recent research study by Young-Ho Eom of the University of Toulouse set out to determine the most influential person on Wikipedia. The result was rather surprising, and at odds with common expectations, in that it concluded that the Swedish botanist Carl Linnaeus was more influential than either Jesus or Hitler. The explanation for this atypical assessment lies in the approach employed in the study, namely an adaptation of the PageRank (PR) algorithm, developed at Stanford University, to calculate the number and value of incoming links to any given article.
What is PageRank?
PageRank is one of the methodologies that Google uses to determine the relevance or importance of a site. The PageRank metric was developed by Google’s co-founders, Larry Page and Sergey Brin, during their time at Stanford University. This ranking procedure has drawn a great deal of attention from researchers in various fields due to its importance in the evaluation of webpage performance. Most preceding analyses attempted to resolve the problem with either a subjective approach, based on expert survey metrics, or an objective approach, based on citation-based metrics. Both methodologies have their own advantages and disadvantages, and they are usually complementary.
PageRank does indeed provide a good approximation to the importance of a webpage. However, PageRank may not provide an accurate evaluation of new websites, many of which may contain relevant information, because they lack backlinks pointing to them. Further, since PageRank does not analyse web page content, the inbound links to a particular page may come from pages on topics that are not pertinent to a given query. In simple terms, PageRank is a numeric value assigned to a page according to the number and value of the links pointing to it, not according to what the page itself says. Thus, there is no guarantee that a site with a high PR will automatically acquire a high position in terms of relevance to a particular topic or query.
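The link-only nature of the score can be made concrete with a small sketch. The Python below is a simplified power-iteration version of PageRank, not Google's production implementation; the damping factor of 0.85, the iteration count and the four-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank. `links` maps each page to the pages it links to.

    damping=0.85 and iterations=100 are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # A page with no outgoing links shares its rank with everyone.
                targets = pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: D is a "new" page that links out but has no backlinks.
toy_web = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A", "B"],
    "D": ["C"],
}

scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page C collects the most links and ranks first, while page D receives only the baseline share regardless of what it contains, which is the weakness for new sites described above.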
Why PageRank went wrong in this study
Something unusual must have happened in the calculation of PageRank in this study, because a result showing a botanist as bigger than Jesus does not seem to hold merit. In particular, the integrity of a link-based algorithm depends, to a large extent, upon no one person or effect being able to unduly unbalance the data. In this case, it appears that the calculations were carried out entirely on pages within Wikipedia, and ignored external links pointing into the site. Whilst this is the only realistic way to do the work without using a global index like Google or Majestic, it demonstrates the need for global data sources when considering a subset or segment of the web universe.
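The effect of ignoring external links can be illustrated with a toy experiment. The sketch below uses a simplified PageRank and an entirely invented graph (the page names, the link counts and the 0.85 damping factor are assumptions, not data from the study): page X is popular inside the site, while page Y is mostly linked to from outside it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank; damping and iterations are illustrative assumptions."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # Pages without outgoing links share their rank with every page.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def internal_only(links, prefix):
    """Keep only pages and links whose URLs start with `prefix`."""
    return {
        page: [t for t in targets if t.startswith(prefix)]
        for page, targets in links.items()
        if page.startswith(prefix)
    }


# Hypothetical graph: three internal pages link to X, six external pages link to Y.
web = {
    "wiki/X": ["wiki/Y"],
    "wiki/Y": ["wiki/X"],
    "wiki/p1": ["wiki/X"],
    "wiki/p2": ["wiki/X"],
    "wiki/p3": ["wiki/X"],
}
for i in range(1, 7):
    web[f"ext/site{i}"] = ["wiki/Y"]

subset_rank = pagerank(internal_only(web, "wiki/"))
global_rank = pagerank(web)

print("subset:", "X" if subset_rank["wiki/X"] > subset_rank["wiki/Y"] else "Y", "ranks higher")
print("global:", "X" if global_rank["wiki/X"] > global_rank["wiki/Y"] else "Y", "ranks higher")
```

With these made-up numbers the order of X and Y flips once the external pages are included, which is the kind of distortion a site-only calculation can introduce.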
Different algorithms that take account of both incoming and outgoing links can give rise to different effects. Further, the results can be influenced by the cultural and linguistic contexts within which these studies are undertaken. In addition, the constant evolution of Wikipedia content can also have a discernible effect upon the outcomes, and therefore upon the conclusion reached. Re-indexing by Google can also influence the current PageRank of a particular site. Moreover, PageRank does not provide any indication regarding the content or size of a page, the language it’s written in, or the text used in the anchor of a link.
In this article, we revisit the study of the most influential person on Wikipedia. We use two other comparative metrics, namely, our very own MajesticSEO Topical Trust Flow and MOZ’s opensiteexplorer.com. MajesticSEO’s data is developed from the ground up by crawling the entire web (not just Wikipedia) and applying its own proprietary metric instead of the PageRank algorithm. OpenSiteExplorer also uses its own metric, whose full derivation is not public, but which is believed to be influenced in part by the search engine prominence of a URL and is therefore likely to correlate well with PageRank as calculated by Google on a worldwide index.
Both comparative methodologies return Jesus as the most influential person.
Figure 1 shows the results of the page-specific metrics as computed by MajesticSEO’s Site Comparator Tool for 24 June 2014. The influence list for Wikipedia is, in order, Jesus, Hitler and Linnaeus, with Trust Flows of 56, 56 and 50 respectively. Indeed, on the other metrics shown, the values for the Jesus entry greatly exceed those for the other two.
Figure 1: MajesticSEO Site Comparator Tool Statistics
The MOZ metrics also corroborate the MajesticSEO statistics for the same Wikipedia entries, as displayed by the Page Authority scores in Figure 2.
Figure 2: MOZ Page Specific Metrics
Next, we use MajesticSEO’s Site Explorer Tool to determine the individual Trust Flow (TF) and Citation Flow (CF) for each of the aforementioned Wikipedia entries. Figures 3, 4 and 5 provide details of the inbound link and site summary data for the Wikipedia entries referring to Jesus, Hitler and Linnaeus respectively. Again, the statistics support our ranking order of Jesus, Hitler and Linnaeus. Note the concentration of topics for each of these entries. The general topic “Society” seems to dominate the composition of the Topical Trust Flows for Jesus and Hitler, while “Science” leads that of Carl Linnaeus, which is not surprising, given that he was a botanist, physician, and zoologist.
Figure 3: Site Summary Data for Wikipedia Entry “Jesus”
Figure 4: Site Summary Data for Wikipedia Entry “Hitler”
Figure 5: Site Summary Data for Wikipedia Entry “Linnaeus”
Finally, we compare a composite list of the Topical Trust Flows for these Wikipedia entries, as displayed in Figure 6. Again, the MajesticSEO data gives the ranking as follows:
- Jesus has a TF of 56 and a CF of 55;
- Hitler has a TF of 56 and a CF of 54;
- Linnaeus has a TF of 50 and a CF of 50.
Figure 6: MajesticSEO Bulk Backlink Checker Results
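Since Jesus and Hitler share the same Trust Flow, the order between them rests on a secondary figure. The short sketch below sorts the three entries by TF and then by CF; using CF as the tie-breaker is an assumption made here for illustration, not a documented MajesticSEO rule, although it does reproduce the order given above.

```python
# (name, Trust Flow, Citation Flow) as quoted in the list above.
entries = [
    ("Linnaeus", 50, 50),
    ("Hitler", 56, 54),
    ("Jesus", 56, 55),
]

# Sort by TF first, then by CF as an assumed tie-breaker, highest first.
ranked = sorted(entries, key=lambda entry: (entry[1], entry[2]), reverse=True)
names = [name for name, _tf, _cf in ranked]

for position, (name, tf, cf) in enumerate(ranked, start=1):
    print(position, name, f"TF={tf}", f"CF={cf}")
```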
This study provides evidence that MajesticSEO’s view of “importance”, which is based on understanding the whole universe of URLs rather than analysing just a single site or subset such as Wikipedia, is a stronger methodology for determining the ranking of Wikipedia’s influence list.