@chilts
Created October 30, 2013 09:27
Getting the Alexa top 1 million sites directly from the server, unzipping the archive, parsing the CSV, and getting each line as an array.
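// Dependencies (npm): request, unzip, csv2
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');

// Download the zip, stream it through the unzipper, and parse each entry as CSV.
request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
  .pipe(unzip.Parse())
  .on('entry', function (entry) {
    // Each parsed row is logged as an array, e.g. [rank, domain].
    entry.pipe(csv2()).on('data', console.log);
  });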
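
For readers who want to see what "each line as an array" amounts to without the streaming dependencies, here is a minimal sketch that works on CSV text that has already been downloaded and unzipped. It assumes the two-column `rank,domain` layout of top-1m.csv; the helper names `parseRankLine` and `parseRankCsv` are hypothetical and not part of the gist. Unlike the snippet above, it converts the rank to a number and silently drops lines that do not match the expected layout.

```javascript
// Split one line of the unzipped CSV into [rank, domain].
// Assumption: lines look like "1,google.com". Anything else yields null.
function parseRankLine(line) {
  const match = /^(\d+),(\S+)$/.exec(line.trim());
  return match ? [Number(match[1]), match[2]] : null;
}

// Parse a whole CSV string, dropping blank or malformed lines.
function parseRankCsv(text) {
  return text
    .split(/\r?\n/)
    .map(parseRankLine)
    .filter((row) => row !== null);
}

// Example with inline sample data instead of the real download.
console.log(parseRankCsv('1,google.com\n2,youtube.com\n'));
```

Dropping non-matching lines is a design choice for this sketch: it keeps any banner or notice text in the file from ending up in the results.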
@chilts
Author

chilts commented May 5, 2021

Thanks @seupedro and @cameck, always good to know that it's still working and the CSV is available. I wonder if the script still works. I'll try it again sometime soon and paste back here what worked or didn't and an update if needed.

@leilii

leilii commented May 12, 2021

Hi,
How can I get, for example, only the top 10 entries into a text file instead of the whole list?
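
One possible way to do this, as a rough sketch only: take the first N rows of the CSV text and write just the domains to a text file. It assumes the file is already sorted by rank and uses the `rank,domain` layout; the function names `topDomains` and `writeTopDomains` and the output file name `top-sites.txt` are made up for illustration, and the sample string stands in for the real unzipped file contents.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Return the domains from the first `n` "rank,domain" rows of a CSV string.
// Assumption: rows are already sorted by rank, as in top-1m.csv.
function topDomains(csvText, n) {
  return csvText
    .split(/\r?\n/)
    .filter((line) => line.includes(','))
    .slice(0, n)
    .map((line) => line.slice(line.indexOf(',') + 1).trim());
}

// Write one domain per line to `outFile`.
function writeTopDomains(csvText, n, outFile) {
  fs.writeFileSync(outFile, topDomains(csvText, n).join('\n') + '\n');
}

// Example with inline sample data; in practice csvText would be the
// contents of the unzipped CSV file.
const sample = '1,google.com\n2,youtube.com\n3,facebook.com\n';
const outFile = path.join(os.tmpdir(), 'top-sites.txt');
writeTopDomains(sample, 2, outFile);
```

For the top 10, pass `10` as `n`. This reads the whole text into memory, which is fine for a file of this size, though a streaming version would stop reading sooner.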

@d668

d668 commented Jun 21, 2021

the file now ends at 427k

@Waseemghafoor474

CSV file is working again! Nice!
The data is not exactly up to date; I would say it lags by about 2 months. I have a site that currently ranks around 67,000, but the list still shows it in the 78,000s.
Also, how can I get, for example, only the top 10 entries into a text file instead of the whole list?

https://stainely.com/

@xysecurity

425k for 2021.10.11

@tomwojcik

tomwojcik commented Dec 8, 2021

@snowman

snowman commented Dec 9, 2021

We will be retiring Alexa.com on May 1, 2022

https://support.alexa.com/hc/en-us/articles/4410503838999

Note: this is your last chance to back things up.

@ao

ao commented Dec 14, 2021

With the Alexa top 1 million CSV/ZIP going away shortly, you can use https://statvoo.com/dl/top-1million-sites.csv.zip instead. It is linked from https://statvoo.com/top/ranked and provides a list of the top 1 million websites, updated daily.

@chilts
Author

chilts commented Dec 15, 2021

Thanks @ao, that's good to know! :)

@huadaonan

great

@jorgeluislazo

Can confirm http://s3.amazonaws.com/alexa-static/top-1m.csv.zip still works for me, with 1M sites (as of May 11th, 2022). I think the actual resources will be gone by December 2022, though.

@ciscospirit

Hello,
Does anyone know how to get the top 1000 for a specific country too?
I am looking for the top 1000 lists for Austria and Germany. Can anybody help me out with a download link?

@chilts
Author

chilts commented May 17, 2022

@ciscospirit I don't know any off the top of my head, but perhaps do a search and see what you can find.

@chilts
Author

chilts commented May 17, 2022

Hi everyone, I just noticed this site on a fork of this gist, and it also seems to be kept up to date:

I don't know if it's useful to anyone, but there we go. :)

@ramazansancar

@kostasmaneadis

kostasmaneadis commented May 17, 2023

Hey everyone, when I download http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip, the CSV has ".deprecated" as its file extension. Is this it? Is it done?

@skacurt

skacurt commented May 17, 2023

@kostasmaneadis Yes, it's no more.

-----------------------------------------------------------------
Notice: This file is deprecated and is not being updated anymore.
        This file was last updated on February 1, 2023.
        This file will not be available from
        http://s3.amazonaws.com/alexa-static/top-1m.csv.zip after
        July 31, 2023.
-----------------------------------------------------------------

@ggmartins

https://radar.cloudflare.com/domains
top 1000000 unordered 🤢

@securitybd

I am in trouble with my new domain securelines.net to install WordPress,

@d668

d668 commented May 31, 2024

I get access denied when accessing http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

@kostasmaneadis

kostasmaneadis commented May 31, 2024
