The Importance of Trusted Data

By Dixon Jones December 10, 2012

Irony? The writing was on the wall.

Recently, Google blocked a number of SEO tools from using their AdWords API subject to a “review” which – for many tools – will never be passed, crippling them moving forward. Whilst the AdWords API is designed for PPC tools, SEOs use the data to help plan their organic strategy.

You can trust Majestic SEO to continue to provide a secure level of service. Majestic does not face these threats from third-party data suppliers because we do not scrape Google, and we collect our data from first principles, obeying the laws of the land. Where we do use third-party data, we take steps to protect ourselves and our customers from possible outages or legal issues. We DO crawl some content [EditAdd 4 Mar 2012: on some Google owned sites], in the same way Google crawls ours – but only when allowed to do so by Google via Robots.txt.

Because of Google’s decision to withhold its API, some major tools like SEOmoz and our great partners, Raven Internet Marketing Tools, have been affected. Raven Tools has taken what we consider to be a very honest and brave decision to stop rank checking so that it can continue to use the AdWords API from 1st January. They did so after this clear message from the Search Giant:

Message From Google to RavenTools

Majestic SEO feeds link data to most of the major SEO Tools in the world today and many of these good partners will continue to do rank checking for as long as they are able. The decision by RavenTools, however, marks a clear business decision by Patrick Keeble and his team to build a business which – when it comes to due diligence – will be a business of genuine net worth rather than one potentially built on a flaw which might one day get exploited by lawyers.

One major competitor in legitimate link data collection is SEOmoz itself with their Open Site Explorer product. SEOmoz have in the past looked to avoid conflict with Google – in particular dropping the large scale collection and display of PageRank pulled from or calculated in parallel to Google’s servers. SEOmoz instead developed MozRank and other metrics to analyze URLs from first principles. A similar strategy was also adopted by Majestic (who never pulled PageRank) in the spring of 2012 with the development of Flow Metrics™. However, the SEOmoz customer base may be more interested in ranking data, and http://www.seomoz.org/q/loss-of-google-adwords-api suggests that at this time they feel that keeping rank checking tools is more important than keeping the Google AdWords API. Any decision, at this point, is a brave one for SEO technologies who wish to track rankings.

The Elephant in the Room

Open Source elephant-by-linuxien from Open Clip Art

As Majestic SEO looks at its competitors in the Link Intelligence space, we do not believe some competitors took their contractual responsibilities seriously out of the gate. We have rarely mentioned AHrefs, for example, who (by our estimation) must be scraping Google on a monumental scale to try to create ranking data on millions of keywords. Presumably, if Raven Tools have taken the step to withdraw SEMRush data, then AHrefs must be seen by Google to be in the same camp – breaking Google’s Terms of Service to a level which must be costing the search giant serious money to maintain connectivity and bandwidth for real users while it manages the scraping issues. [Editor note] Since this post was published, AHrefs have announced that they will be withdrawing ranking data as well.[/Editor note]

Aren’t there other Link Data suppliers?

Well, here’s the thing. Other than scraping a search engine like Google, or buying it from someone else, the honest way to collect data is to crawl the web from first principles. In order to do that ethically, the accepted principle is for crawlers to obey the robots.txt standard in the absence of a more explicit agreement with the website owner. To do this, a crawler needs to positively identify itself. We spend time looking at crawlers and their behavior, and very few crawlers are large enough to develop a meaningful link graph. So anyone claiming to have data may be getting the data from sources which – in the final analysis – may be breaking any number of protocols, if not laws. Of course, there are crawlers of this scale which do identify themselves. Yandex, Microsoft and Yahoo all crawl the web independently and can see similar link graphs to Majestic. Our crawl of the web provides independent verification that can be relied upon.
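To make the robots.txt point concrete, here is a minimal sketch of how a crawler that identifies itself might check permission before fetching a page. It uses Python’s standard urllib.robotparser module. The crawler name ExampleLinkBot, the sample robots.txt and the example.com URLs are assumptions made purely for illustration; this is not a description of how Majestic’s own crawler is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name, used only for illustration.
USER_AGENT = "ExampleLinkBot"

# Hypothetical robots.txt: the named bot may crawl everything except
# /private/, while every other crawler is disallowed entirely.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleLinkBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True only if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(may_fetch(parser, "https://example.com/index.html"))           # True
print(may_fetch(parser, "https://example.com/private/report.html"))  # False
print(may_fetch(parser, "https://example.com/index.html", "OtherBot"))  # False
```

A real crawler would download each site’s live robots.txt and send the same name in its User-Agent header, so that site owners can see who is crawling and write rules for that crawler specifically.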

Has this happened before?

Oh yes – and it will happen again. It is not just Google either. In November 2011 Yahoo finally stopped its Site Explorer product. Until that point, the vast majority of low cost SEO Tools used this data to analyze links. The day Yahoo Site Explorer went offline, huge numbers of tools broke. Only those using data generated from first principles could continue. I can give you countless other situations. But it all boils down to this…

Who Can You Trust?

Some link intelligence and SEO tools are gaining popularity through the black hat community on forums. We have – to this point – chosen not to engage in these communities. But do not assume that this means there are better tools than Majestic SEO for collecting link data. Our reason for not engaging in these communities has been to protect our fledgling brand from being at the forefront of backlashes from corporations with deeper pockets and more lawyers than Majestic SEO. Using black hat tools is your choice – but we will continue to provide you with data you can trust as legitimately crawled and analyzed from first principles. We will continue to try to be the Biggest, Fastest and Freshest source of link data and to provide it at a competitive price point.

Posted In: General

20 Responses to “The Importance of Trusted Data”

  1. Eliran said:

    December 13, 2012 at 5:34 pm

    Really nice article! Thanks for the great information about trusted data and tips.

  2. Jesse Kohl said:

    December 13, 2012 at 6:09 pm

    I use some of these tools every day but learned even more about them in this post. Thanks!

  3. Jeppe said:

    December 20, 2012 at 2:35 pm

    I am really going to miss raven tool, but I need a tool that can do the rank checking for me, it is simply way too time-consuming to do it by hand, and it is too important to my business to ignore.
    I will still use majesticseo to explore the sites and the link profiles of mine and of competitors sites though :-)

  4. Annett said:

    December 28, 2012 at 6:34 pm

    It is very interesting to see that Google even blocked SEOMOZ, as they are widely regarded as THE authority in SEO. But congrats to you guys for providing a product that still passes the Google scrutiny!

  5. Vic DiNovici said:

    December 28, 2012 at 10:41 pm

    Google limiting access to information kinda sucks. But it is happening with every company getting bigger – they don’t care anymore about other players.

    • Dixon Jones said:

      December 29, 2012 at 12:37 pm

      Depends on where you sit. I think it’s unfair to say they don’t care about other players. Another way to look at it is to say that we (as users) didn’t care that we were getting all this data free. If “free” is because it is taken without permission, then I think it was the taking without permission, rather than the limitation, that kinda sucks more.

      • Master Hughes said:

        December 29, 2012 at 2:23 pm

        hey fella, google steals data from every individual using the web every day to try and elicit information about them or build profiles of individual buying habits, and uses that inside info against them, costing individuals lots of money. I won’t explain how they do it, but corporate america puts the screws to us, don’t stand up for them.

    • Dixon Jones said:

      December 29, 2012 at 6:40 pm

      The irony isn’t lost on me.

  6. John said:

    December 28, 2012 at 11:43 pm

    Thank you for continuing to provide a great (and reliable) product that allows people with a genuine interest in SEO to offer services to our clients that are also based on solid and reliable information.

  7. Floyd Florence said:

    December 29, 2012 at 1:07 am

    I don’t sell the kind of information that’s being discussed. However, it might be worth noting that what searches I do, I do manually, and they get scrutinized by G. (via random captcha), and sometimes even just searching for any term like anybody would do gets scrutinized as well… I don’t even use a search engine every day, and when I do, for my personal seo information, it can be weeks apart, but since my IP has been tagged I do get scrutinized… (admittedly, when I do this, I can search 100 words or more during a session, but to my knowledge, since I’m doing this manually, this shouldn’t be a problem with their terms…)

    So even on a minute scale Google is limiting what folks can do with their stuff…

    Had I been a business that did rely on this kind of information as a main staple of service provisions, I’d like to think I would’ve taken measures similar to Majestic (as I don’t knowingly use Blackhat et al, neither do I judge those that do, but I have read Google terms of service and it clearly warns about scraping and such so I have been very leery and limited, even on this minute scale, on what automation tools I will use).

    • Floyd Florence said:

      December 29, 2012 at 1:14 am

      I should also mention, for clarification, that part of the scrutinizing I’ve alluded to includes slow to very slow returned search results (which simulates a browser freezing), making an already long and tedious job even longer and much more tedious.

  8. Rippcord said:

    December 29, 2012 at 4:06 am

    Wow, this is a real eye-opener. Makes me really think differently about who to trust, and how to do my due diligence.

    Thanks very much for this post!

  9. Uri Binsted said:

    December 29, 2012 at 5:59 pm

    Thanks for this very interesting article. Google is fighting its AdWords competitors – SEO agencies – and this is another attempt to make it harder for us.

  10. Dexter said:

    December 30, 2012 at 11:41 pm

    Great Info! Especially with all the Google updates, it’s good to know we have a data source we can count on going forward. Keep up the good work!

    We really love your tool!

  11. mike said:

    January 04, 2013 at 12:20 am

    In my opinion the article confuses two types of data, search and backlink data, which basically have nothing in common. Search data can basically only be retrieved from Google, Yahoo and Bing, while backlink data can be collected by anyone who is ready to build an infrastructure big enough to collect and store the data. So in fact Majestic just offers totally different data than those offering keyword search volume related data.
    Also, I personally disagree that it was a brave decision to stop collecting the data. Besides SEO, search related data is vital for many other parts of the economy, science and much more, and since Google is monopolizing it, it would have been brave to speak out and fight for it.
    Imagine the government doing the same with economic data, or stock exchanges keeping stock prices secret, or others with telephone numbers, addresses, images from space, images of streets…

  12. seoagent said:

    January 04, 2013 at 12:21 am

    Hi,

    Really nice article. Thanks for the great information about trusted data and tips – Adam!