Majestic Gives Away A Million – A Majestic Million!

By Steve Pitchford December 25, 2011

As it is Christmas Day, Majestic SEO is releasing data on the top one million websites under a Creative Commons ShareAlike license as a downloadable CSV file, allowing web users to create derived works and research (subject to attribution). The files are available at the end of this post.

Majestic SEO launched the Majestic Million on May 19th. It has caused ripples of interest from time to time and has found a nice niche powering Buzz League Tables.

We have altered the algorithms behind the Majestic Million, generating the list from the referring C-subnet count rather than the domain count. This has resulted in a shift of the top ten, with an increase in the number of well-known domains in the Majestic Million.
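To make the idea of a referring C-subnet count concrete, here is a minimal sketch in Python. It assumes that a "C-subnet" means the first three octets of an IPv4 address, which is the usual meaning of the term in link analysis. The function names and sample data are hypothetical and are not Majestic's actual implementation. The point is that many links from the same subnet count only once.

```python
from collections import defaultdict


def c_subnet(ip: str) -> str:
    """Return the first three octets of a dotted IPv4 address.

    Assumption: "C-subnet" means the /24 block the address sits in.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted IPv4 address: {ip!r}")
    return ".".join(octets[:3])


def rank_by_referring_subnets(links):
    """Rank target domains by the number of distinct referring C-subnets.

    `links` is an iterable of (target_domain, referring_ip) pairs.
    Ties are broken alphabetically by domain name.
    """
    subnets = defaultdict(set)
    for domain, ip in links:
        subnets[domain].add(c_subnet(ip))
    counts = {domain: len(found) for domain, found in subnets.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data using documentation-only address ranges.
links = [
    ("a.example", "192.0.2.10"),
    ("a.example", "192.0.2.77"),    # same C-subnet as the line above
    ("a.example", "198.51.100.5"),
    ("b.example", "203.0.113.1"),
    ("b.example", "203.0.113.2"),
    ("b.example", "203.0.113.3"),
]
ranking = rank_by_referring_subnets(links)
print(ranking)
```

In this sample, b.example has more referring IP addresses than a.example, but they all sit in one subnet, so a.example ranks higher on subnet count.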

Today though, we thought we would do something different. Majestic has a long history of making its data publicly accessible, and we would like to think that it has bought us a certain amount of goodwill in the wider internet community. So we have a surprise gift for the internet analytics community (and who knows – perhaps some statisticians also) and are making a snapshot of the entire Majestic Million list available to download.

As a sanity check, we ran a couple of plots using the Statistical Computing package “R”:

A graph of referring C-subnet count against Majestic Million rank:

Again, but just for the top 250:

And a graph of the referring IP address count against the C-subnet count:

We would love to hear about any conclusions you come to using the data – so what are you waiting for? It is downloadable in Excel or TXT format below:

[download Excel file here. NB: 1,000,000 records in an Excel file is 60 MB. You need a modern version of Excel. Save to disk first.]

[download full file here. This is the 25 MB zipped .TXT file, tab-delimited and much smaller, but it is still a million lines of data!]
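For readers who would rather not open a million rows in Excel, the tab-delimited file can be streamed one row at a time. The sketch below is a hedged example in Python. It assumes the text file is UTF-16 encoded, as a reader reports in the comments, and that its first line is a header row. The function name, column names, and sample file are hypothetical stand-ins, since the real column layout is not listed in this post.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-16", limit=None):
    """Yield rows of a tab-delimited text file as dicts keyed by the header row.

    Assumption: the file is UTF-16 encoded and starts with a header line.
    `limit` stops after that many data rows, which is handy for a quick look.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            yield row


# Build a tiny stand-in file; the column names here are made up.
sample = "Rank\tDomain\n1\ta.example\n2\tb.example\n3\tc.example\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", newline="", encoding="utf-16") as handle:
        handle.write(sample)
    first_two = list(read_rows(path, limit=2))

print(first_two)
```

Because the rows are produced lazily, the same function should work on the full unzipped file without loading it all into memory. Pass a different `encoding` if your copy has been converted.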

Creative Commons License
This Majestic Million Data is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

 

If you would like to use this data or re-release it, you should reference Majestic SEO as follows, providing a link to this blog post should the medium support it:

Backlink Data sourced from the MajesticSEO.com public release of Majestic Millions Dataset – generated on 22nd December 2011

Posted In: Research

65 Responses to “Majestic Gives Away A Million – A Majestic Million!”

  1. Marco said:

    December 25, 2011 at 9:03 am

    Hm i can’t seem to find the download link?!?

  2. Ana said:

    December 26, 2011 at 12:01 pm

    Thanks for sharing!

    The download links are missing. I’m looking forward to starting to play with the data :)

  3. 512banque said:

    December 26, 2011 at 1:17 pm

    The download links are not working, aarrgh

  4. Greg said:

    December 26, 2011 at 4:30 pm

    It would be a great gift if the download links actually worked. :)

  5. Dixon said:

    December 26, 2011 at 6:37 pm

    We will get that fixed!

  6. Dixon said:

    December 26, 2011 at 6:45 pm

    OK – All fixed now (we hope). I do apologise for not having these links working on Christmas morning! But it’s all there now :)

    Dixon.

  7. Jan-Willem Bobbink said:

    December 27, 2011 at 12:08 pm

    Thanks for sharing the list!

  8. aem said:

    December 30, 2011 at 8:51 pm

    I like it … thanks for sharing!

    It will be useful!

    Sebastien

  9. James Todd said:

    January 05, 2012 at 1:46 pm

    This is a really useful list – Loving the work from the Majestic SEO team!

  10. Evert said:

    January 10, 2012 at 1:46 pm

    Hmm, where it says ‘This is the 9.7 MB .TXT file’ one actually gets a 23.3MB .ZIP file, containing a 276 MB textfile!

    • Dixon said:

      January 10, 2012 at 2:05 pm

      You are right. I have amended. Thanks.

      Obviously – giving away 1 million lines of data is not a small download. I managed to open the Excel version of it in the latest version of Excel on my PC though. Earlier versions of Excel will not cope with a million records.

  11. Raymond Theakston said:

    January 10, 2012 at 2:48 pm

    Whoop! My own website is listed at 709159! :)

    • Dixon said:

      January 10, 2012 at 2:54 pm

      Heh. OK – we won’t take away your comment link, just in case.

  12. Dirk - seorie.net said:

    January 10, 2012 at 3:16 pm

    That is quality! Data to play with in 2012 – a good start in my mind. Thanks.

  13. dmozFR said:

    January 10, 2012 at 3:57 pm

    dmoz.fr is going up! We are in the middle of the list across all TLDs and also comfortably in the middle of the .fr list.
    I will put the Majestic Badge on our homepage.

    • Dixon said:

      January 10, 2012 at 4:17 pm

      That’s great! We’d love to see our Majestic Million badge up on a few more sites. I really should have mentioned it in the post! So if anyone else wants to advertise that they are on the list (and where) they can grab a Majestic Million badge here.

  14. Dwight Zahringer said:

    January 10, 2012 at 4:03 pm

    Thanks Dixon and the team @ Majestic – great data to help with many, many things.

  15. Rich Rankin said:

    January 10, 2012 at 4:21 pm

    Thanks for giving us access to all this data – going to make a cup of tea whilst Excel decides if it’s going to let me look at it!

  16. BobbyJ said:

    January 10, 2012 at 8:24 pm

    Thanks for this data. I really need to start using ‘R’ as well. SPSS is just too expensive. Cheers!

    • Dixon said:

      January 10, 2012 at 8:32 pm

      Yes – I know we use “R” here… But as I also have a machine that can load a million records into Excel, I’ll stay there for a bit longer yet!

  17. Petz said:

    January 10, 2012 at 9:25 pm

    Great gift – thanks for access!
    ^_^

  18. Matt said:

    January 10, 2012 at 9:42 pm

    Thanks for the share!

    I thought example.com would be closer to the top.

  19. Kane said:

    January 10, 2012 at 9:48 pm

    thanks for the data, I’m going to use this to start building page rank of my sites

  20. Paul said:

    January 10, 2012 at 10:32 pm

    Thanks for the data!
    Looks like it’s UTF-16 encoded. I used
    $ iconv -f UTF-16 -t UTF-8 majesticmillion-20111222.txt > utf8.txt
    to work with it on a Mac Terminal.

    Cheers,
    Paul

  21. livingseolife said:

    January 10, 2012 at 10:38 pm

    Great gift! Thanks… Awesome playing with this data!

    Thanks MajesticSEO.

  22. Tesch Online Marketing said:

    January 10, 2012 at 10:53 pm

    Thanks a lot for sharing such valuable data.
    I was a little surprised how well some of the sites are positioned using your metric.
    A great piece of data collection and analysis.

  23. Simon said:

    January 10, 2012 at 11:03 pm

    Very interesting stuff… although I see two sites I have worked on in the past, one in the mid-200s and one in the mid-300s. The one in the mid-200s actually performed quite poorly in terms of non-brand natural search traffic compared to the mid-300s site. Emphasises for me that it’s quality over quantity. Thanks for sharing!

  24. gaonianyu said:

    January 11, 2012 at 12:33 am

    it’s gift.thanks

  25. AC said:

    January 11, 2012 at 2:09 am

    You could significantly reduce the file size (141MB) by converting to UTF8 or ANSI. And further by removing the redundant dates and standardized links (41MB). Just publish a second file with the data or put it on the site, demonstrating the linking format. The file name is sufficient for determining the date.

    Nice to see something coming back to the community for all the pounding your bot has been giving everyone.

    • Dixon said:

      January 11, 2012 at 7:39 am

      Thanks. It’s a big world out there, so ANSI won’t cut it I’m afraid. We did think about also putting up a shorter list, but at the end of the day, we felt that the whole list was the best. I am sure others can slice and dice the data. You are welcome to do so and pass it on in other formats.

  26. Glenn Grifffin said:

    January 11, 2012 at 4:30 am

    Thanks guys.
    It is all starting to come together.

  27. Drachsi said:

    January 11, 2012 at 5:16 am

    Shame, my site is not on the list, I really want to put a badge on my site. At least everybody downloading must have the latest Excel.

    Drachsi

  28. Manjul Singh said:

    January 11, 2012 at 5:26 am

    Great list, great work.

  29. Voiliers said:

    January 11, 2012 at 8:13 am

    Thanks guys this is a goldmine – really good to be able to see how my (small) sites shape up to the big boys

  30. Renny said:

    January 11, 2012 at 9:05 am

    Good going, that is a very big list and I find some opportunities over there.

    • Dixon said:

      January 11, 2012 at 9:16 am

      Thanks. If you think the top Million as a giveaway is big, you should see the computers that handle the rest of it! This is part of it.

  31. charly @ md marketing digital said:

    January 11, 2012 at 9:06 am

    Hey thanks!!!
    I’ll rush to have a look and drop comments if I manage to have any :)

  32. Andy @cruisesgalveston said:

    January 11, 2012 at 10:54 am

    This is a massive list. I’m a Market Samurai pro user and regularly check the competition using your sources. This list gives me big ammunition, or I could say a weapon :)

    • Dixon said:

      January 11, 2012 at 10:58 am

      I like the guys at Market Samurai so much, I linked your comment :)

  33. riple said:

    January 11, 2012 at 11:48 am

    I hope this one will be as capable as the former Yahoo Site Explorer.

    • Dixon said:

      January 11, 2012 at 1:35 pm

      Oh – Majestic left Yahoo Site Explorer for dust years ago – but you should use our web interface with a silver subscription to get the full “Site Explorer” experience. This giveaway is “just” a list of the top 1 million sites. Majestic’s Site Explorer is just scratching the surface of the full data you can get from Majestic.

  34. CCTV.co.uk said:

    January 11, 2012 at 12:13 pm

    Thanks guys, I already use MajesticSEO and find it an awesome tool.

  35. neil said:

    January 11, 2012 at 1:33 pm

    One of my sites is in the list,
    just made the Million Badge
    and posted it :)
    You just made my day Dixon.

    Thank you for the data, much appreciated.

  36. Micca said:

    January 11, 2012 at 2:21 pm

    Thank you!! Micca is very grateful for your services as Costa del Sol’s number 1 Solution provider.

  37. bellimbusto said:

    January 11, 2012 at 2:56 pm

    Great gift guys! Thanks for sharing this list!

  38. Cozy Web said:

    January 11, 2012 at 3:12 pm

    I have used Majestic for about a year now including through Market Samurai.

    It looks like it’s time to upgrade to a paid subscription.

    Mark

  39. Igor Mateski said:

    January 11, 2012 at 5:47 pm

    And there, just for a brief moment, I was so close to deleting the email where you linked to this post. That would have been a big mistake, one that I, luckily, did not make.
    Thanks for sharing the info. It will definitely come in handy. With so much data, I guess one can push and support any theory. I, being a Content Marketing evangelist, most definitely see some links to sites that are content-heavy and will definitely use them in my posts.
    (I see that whoever’s in charge of this blog takes good care of the discussion, so I won’t post any links)

  40. Top Search said:

    January 11, 2012 at 7:42 pm

    Christmas again? Super – nice to see a few of our sites in the too – very flattering, a million is a lot – but not on the web.

    Nice work!

  41. Patrick Page said:

    January 11, 2012 at 9:03 pm

    Thank you I hope to put this to good use.

  42. Blue Jet said:

    January 11, 2012 at 11:25 pm

    Thanks guys already use Majestic through Market Samurai. Great tool

  43. John Mauldin said:

    January 12, 2012 at 12:06 am

    Thanks so much for the information. It is very usable.

  44. BESegal said:

    January 12, 2012 at 12:27 am

    Hey guys. Thanks for making the data set available. I started working it up using a data package that can produce results similar to what you do with R.

    And if I find anything interesting am happy to share it either here, or perhaps in the link provided above where I post web analytic studies I do.

    Anyway, I’m thinking we might uncover some insights if we can add some dummy variables to the data set, such as content vertical. It’d be interesting to see if the rank curves look the same across search sites, social media sites, digital news sites, etc. I believe that theoretically a subset of a power curve like you show above should generate a power curve too. Wonder if it’d work out that way by content vertical? Also curious to see if the tail of the curve has any common characteristics.

    Any thoughts on where I might be able to find a data set like that so I can join it to yours? Rather than manually adding the field to 1 million records or some smaller part. ;-)

    • Dixon said:

      January 12, 2012 at 8:05 am

      To find that, you’ll need a massive rank checking system. AuthorityLabs comes to mind. But you also need to select a keyword for each vertical, which would break up any structure to the logic wouldn’t it?

  45. Warner Robins Homes said:

    January 12, 2012 at 10:34 am

    Lots of options with this data, to include a great backlink (starting point) source. Thanks.

  46. Helmuts said:

    January 12, 2012 at 11:39 pm

    Great resource. I just wonder how many websites of here listed ones can get such a ranking.

    Helmuts

  47. David Victor said:

    January 13, 2012 at 12:01 am

    Great post, useful information is here, thanks Majestic SEO!

  48. krishnan said:

    January 13, 2012 at 7:32 am

    Thanks for the data. Just wanted to let you know that the link in the mail does not open with some email providers. For example, I was not able to open the link sent to Hotmail and had to forward it to Gmail to open it. I suggest having the full URL in the mail rather than just the anchor text link.

    • Dixon said:

      January 13, 2012 at 9:25 am

      OK thanks. That is useful to know. I guess you also can’t click on the “click here to read the web version” link in the email either. I will put a full link to the web version in future newsletters.

  49. matka said:

    January 13, 2012 at 8:47 am

    Wow, this is great! Just downloaded it and have to say thanks!

  50. honorabili said:

    January 13, 2012 at 4:23 pm

    Interesting stats.

  51. Jonathan - FeelGoodTime said:

    January 13, 2012 at 4:25 pm

    Hi guys,

    This is amazing resource and great stats, thanks a lot. I will make sure I put a Majestic Badge on my site.
    Thanks a lot.

  52. Pete said:

    January 19, 2012 at 2:58 pm

    Hope I find my site on it!

  53. masarf said:

    January 21, 2012 at 1:08 am

    Thank you I hope to put this to good use.

  54. sms reseller said:

    January 21, 2012 at 4:34 pm

    Does anyone know if we can have these filtered by country? Because a lot of the companies that I do SEO for are local (state of Missouri).

    • Dixon said:

      January 22, 2012 at 4:37 pm

      They are recorded and ranked by TLDs. But you would need to do an IP lookup for that.

      The problem is that some of the bigger web sites have multiple IP addresses, across several countries, potentially, to spread the load.