@chilts
Created October 30, 2013 09:27
Get the Alexa top 1 million sites list directly from the server, unzip it, parse the CSV, and get each line as an array.
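// Note: per the comments below, this file stopped being updated in Feb 2023
// and was scheduled to go offline after July 31, 2023.
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');

request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
  .pipe(unzip.Parse())            // stream the zip; emits one 'entry' per file inside it
  .on('entry', function (entry) {
    // parse the entry as CSV; each row is emitted as an array of fields
    entry.pipe(csv2()).on('data', console.log);
  });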
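Each line of top-1m.csv appears to have the form `rank,domain` (e.g. `1,google.com`), which csv2 hands back as a two-element array of strings. As a minimal sketch of that parsing step without csv2, assuming that layout and an already-unzipped input stream, the helpers below use Node's built-in `readline`. The names `parseRankLine`, `parseRankCsv` and `readRanks` are hypothetical, not part of the gist, and unlike csv2 they convert the rank to a number.

```javascript
const readline = require('readline');

// Hypothetical helper: turns "1,google.com" into [1, "google.com"],
// assuming the rank,domain layout. Returns null for blank/malformed lines.
function parseRankLine(line) {
  const comma = line.indexOf(',');
  if (comma <= 0) return null; // no comma, or nothing before it
  const rank = Number(line.slice(0, comma));
  const domain = line.slice(comma + 1).trim();
  if (!Number.isInteger(rank) || domain === '') return null;
  return [rank, domain];
}

// Parses a whole CSV text held in memory, skipping blank or malformed lines.
function parseRankCsv(text) {
  return text
    .split(/\r?\n/)
    .map(parseRankLine)
    .filter((row) => row !== null);
}

// Streaming variant: reads one line at a time so the full list never has to
// sit in memory. `input` is any readable stream (e.g. an unzipped entry).
async function* readRanks(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of rl) {
    const row = parseRankLine(line);
    if (row !== null) yield row;
  }
}

module.exports = { parseRankLine, parseRankCsv, readRanks };
```

Inside an async function, `for await (const [rank, domain] of readRanks(entry)) { ... }` could stand in for the `csv2()` pipe in the `'entry'` handler. Splitting on the first comma only works because domain names contain no commas; a general CSV file with quoted fields still needs a real parser such as csv2.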
@ramazansancar

@kostasmaneadis

kostasmaneadis commented May 17, 2023

Hey everyone, when I download http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip, the CSV has ".deprecated" as its file extension. Is this it? Is it done?

@skacurt

skacurt commented May 17, 2023

@kostasmaneadis Yes, it's no more.

-----------------------------------------------------------------
Notice: This file is deprecated and is not being updated anymore.
        This file was last updated on February 1, 2023.
        This file will not be available from
        http://s3.amazonaws.com/alexa-static/top-1m.csv.zip after
        July 31, 2023.
-----------------------------------------------------------------

@ggmartins

https://radar.cloudflare.com/domains has a top 1,000,000 list, but it's unordered 🤢

@securitybd

I'm having trouble installing WordPress on my new domain, securelines.net.
