
@chilts
Created October 30, 2013 09:27
Getting the Alexa top 1 million sites directly from the server, unzipping it, parsing the csv and getting each line as an array.
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');

// Stream the zip from S3, unzip it on the fly, and parse each CSV row.
request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
    .pipe(unzip.Parse())
    .on('entry', function (entry) {
        // Each row arrives as an array, e.g. [rank, domain].
        entry.pipe(csv2()).on('data', console.log);
    })
;
@chilts
Author

chilts commented Oct 21, 2019

Ah thanks @garrett-leyenaar, that's good to know it's still being updated.

@chilts
Author

chilts commented Oct 21, 2019

Interesting: today (2019-10-22) I re-ran my steps from https://gist.github.com/chilts/7229605#gistcomment-2880207 and noticed that only entries 1 to 647605 get printed out. So I downloaded the .zip file itself, and sure enough it doesn't have any entries after that. Whether it's a one-off problem today, I dunno. :)

$ unzip top-1m.csv.zip 
Archive:  top-1m.csv.zip
  inflating: top-1m.csv              

$ tail -n 10 top-1m.csv
647596,nic.xn--bck1b9a5dre4c
647597,not3.io
647598,omyfashiona.com
647599,otcbtc.com
647600,quranapk.com
647601,stylewithlife.com
647602,thecityvacation.com
647603,transferdmc.com
647604,uspersonality.com
647605,villasbeachfront.com.mx
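
For anyone who wants to run the same check from Node rather than with `tail`, here is a minimal sketch. It assumes the CSV has already been unzipped and read into a string, and that every non-empty line has the `rank,domain` shape shown above. The `summarize` name and the fields it returns are made up for illustration.

```javascript
// Hypothetical helper: summarise an already-unzipped "rank,domain" CSV given as text.
// Assumes every non-empty line contains a comma, as in the tail output above.
function summarize(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  const last = lines.length > 0 ? lines[lines.length - 1] : null;
  const lastRank = last ? Number(last.slice(0, last.indexOf(','))) : 0;
  return {
    rows: lines.length,               // how many entries the file really holds
    lastRank: lastRank,               // rank on the final line
    complete: lines.length >= 1000000 // true only if the full million is present
  };
}
```

Calling it with something like `summarize(require('fs').readFileSync('top-1m.csv', 'utf8'))` should show at a glance whether the day's file is short.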

@yosunga

yosunga commented Oct 30, 2019

www.ktservis.com.tr used to mirror this file, but I think they removed it because of copyright issues. Any other mirrors?

@rustyspoonz

www.ktservis.com.tr used to mirror this file, but I think they removed it because of copyright issues. Any other mirrors?

No need for a mirror, the file is still available using the URL from the script: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

@yosunga

yosunga commented Dec 3, 2019 via email

@hamlatzis

The alexa zip file contains only 839000 entries

1m != 839000

@mikej165

As of today, the Alexa "one million" contains 547855 entries. Very strange.

@meeeller

Today it is 763k. Last summer it started falling short of "one million". I am here again trying to figure out why.

We used Alexa in the past, and I still can't find anything on why it is so short of 1 million. Good paper on T1M rankings pdf

@vladimarius

vladimarius commented Sep 16, 2020

Alexa no longer provides that list for free.
You can download the list using their API.

The price is:
Alexa Top Sites API Requests (1 unit = 10 URLs returned) | $0.025 / unit

That works out to $0.0025 per URL, so for 1 million domains you'd pay 0.0025 * 1000000 = $2500 😃😃😃
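
To redo that arithmetic for other list sizes, a small sketch follows. It uses only the numbers quoted above ($0.025 per unit, 10 URLs per unit). The constant and function names are made up, and rounding partial units up is an assumption, not something the pricing line states.

```javascript
// Figures quoted in this comment; names are illustrative only.
const PRICE_PER_UNIT_USD = 0.025;
const URLS_PER_UNIT = 10;

// Estimated cost in USD for fetching `urlCount` URLs.
// Assumption: a partly used unit is billed as a whole unit.
function estimateCostUsd(urlCount) {
  const units = Math.ceil(urlCount / URLS_PER_UNIT);
  return units * PRICE_PER_UNIT_USD;
}
```

For example, `estimateCostUsd(1000000)` covers 100,000 units, which matches the roughly $2500 figure above.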

@chilts
Author

chilts commented Oct 1, 2020

💵 💲 Thanks for that info @vladimarius ... good to know it is still available. I imagine people will find other sources though with that price! Thanks again.

@seupedro

The Alexa list still works in 2021.

@cameck

cameck commented Apr 8, 2021

Just spoke with Amazon about this. There's no guarantee that the free list contains all 1 million, but it is still updated daily.

@chilts
Author

chilts commented May 5, 2021

Thanks @seupedro and @cameck, always good to know that it's still working and the CSV is available. I wonder if the script still works. I'll try it again sometime soon and paste back here what worked or didn't and an update if needed.

@leilii

leilii commented May 12, 2021

Hi,
How can I get, for example, only the top 10 entries into a text file instead of the whole list?
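
One possible way to do this in Node, sketched under a few assumptions: the zip has already been extracted to a local CSV file, each line looks like `rank,domain`, and the file is small enough to read into memory in one go. `topRows` and `writeTop` are hypothetical helper names, not part of the gist's script.

```javascript
const fs = require('fs');

// Hypothetical helper: parse "rank,domain" text and keep only the first `limit` rows.
// Assumes every non-empty line contains a comma.
function topRows(csvText, limit) {
  return csvText
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .slice(0, Math.max(0, limit))
    .map((line) => {
      const comma = line.indexOf(',');
      return [line.slice(0, comma), line.slice(comma + 1)];
    });
}

// Hypothetical helper: write the first `limit` domains of an unzipped CSV, one per line.
// Returns the number of domains actually written.
function writeTop(inputPath, outputPath, limit) {
  const rows = topRows(fs.readFileSync(inputPath, 'utf8'), limit);
  fs.writeFileSync(outputPath, rows.map((row) => row[1]).join('\n') + '\n');
  return rows.length;
}
```

With the file unzipped next to the script, `writeTop('top-1m.csv', 'top10.txt', 10)` would leave the ten highest-ranked domains in `top10.txt`. Reading the whole file first keeps the sketch short; a streaming version would stop after the first ten lines instead.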

@d668

d668 commented Jun 21, 2021

the file now ends at 427k

@Waseemghafoor474

CSV file is working again! Nice!
The data is not exactly up to date. I would say it is about 2 months behind. I have a site that currently ranks around 67,000, but in the list it is in the 78,000s.
Also, how can I get, for example, only the top 10 into a text file instead of the whole list?

https://stainely.com/

@xysecurity

425k for 2021.10.11

@tomwojcik

tomwojcik commented Dec 8, 2021

@snowman

snowman commented Dec 9, 2021

We will be retiring Alexa.com on May 1, 2022

https://support.alexa.com/hc/en-us/articles/4410503838999

Note: this is your last chance to back things up.

@ao

ao commented Dec 14, 2021

With the Alexa top 1 million CSV/ZIP going away shortly, you can use https://statvoo.com/dl/top-1million-sites.csv.zip instead. It is linked from https://statvoo.com/top/ranked and provides a list of the top 1 million websites (updated daily).

@chilts
Author

chilts commented Dec 15, 2021

Thanks @ao, that's good to know! :)

@huadaonan

great

@jorgeluislazo

Can confirm http://s3.amazonaws.com/alexa-static/top-1m.csv.zip still works for me, with 1M sites (as of May 11th 2022). I think the actual resources will be gone by December 2022, though.

@ciscospirit

Hello,
Does anyone know how to get the top 1000 for a specific country too?
I am looking for the Austrian and German top 1000 lists. Can anybody help me out with a download link?

@chilts
Author

chilts commented May 17, 2022

@ciscospirit I don't know any off the top of my head, but perhaps do a search and see what you can find.

@chilts
Author

chilts commented May 17, 2022

Hi everyone, I just noticed this site on a fork of this gist, and it also seems to be kept up to date:

I don't know if it's useful to anyone, but there we go. :)

@ramazansancar

@kostasmaneadis

kostasmaneadis commented May 17, 2023

Hey everyone, when I download http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip, the CSV has ".deprecated" as its file extension. Is this it? Is it done?

@skacurt

skacurt commented May 17, 2023

@kostasmaneadis Yes, it's no more.

-----------------------------------------------------------------
Notice: This file is deprecated and is not being updated anymore.
        This file was last updated on February 1, 2023.
        This file will not be available from
        http://s3.amazonaws.com/alexa-static/top-1m.csv.zip after
        July 31, 2023.
-----------------------------------------------------------------

@ggmartins

https://radar.cloudflare.com/domains
top 1000000 unordered 🤢

@securitybd

I am in trouble with my new domain securelines.net to install WordPress,
