Get the Alexa top 1 million sites directly from the server, unzip the archive, parse the CSV, and get each line as an array.
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');
request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
  .pipe(unzip.Parse())
  .on('entry', function (entry) {
    entry.pipe(csv2()).on('data', console.log);
  })
;
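For anyone who only needs the first few rows, here is a minimal dependency-free sketch of the "each line as an array" step. It assumes every data row has the form `rank,domain`; the helper names `parseRankLine` and `topSites` and the sample domains are made up for illustration and are not part of the gist or of any library.

```javascript
// Hypothetical helper: split one "rank,domain" line into [rank, domain].
// Returns null for anything that does not look like a data row.
function parseRankLine(line) {
  const trimmed = line.trim();
  const comma = trimmed.indexOf(',');
  if (comma <= 0) return null; // no comma, or nothing before it
  const rank = Number(trimmed.slice(0, comma));
  const domain = trimmed.slice(comma + 1);
  if (!Number.isInteger(rank) || domain === '') return null;
  return [rank, domain];
}

// Hypothetical helper: collect the first `limit` valid rows from CSV text.
function topSites(csvText, limit) {
  const rows = [];
  for (const line of csvText.split(/\r?\n/)) {
    if (rows.length >= limit) break;
    const row = parseRankLine(line);
    if (row) rows.push(row);
  }
  return rows;
}

// Made-up sample data in the assumed rank,domain layout.
const sample = '1,example.com\n2,example.org\n3,example.net\n';
console.log(topSites(sample, 2));
```

Unlike the streaming version above, this reads the whole text into memory and converts the rank to a number, so it is only a sketch for small inputs or quick checks.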
@ciscospirit I don't know any off the top of my head, but perhaps do a search and see what you can find.
Hi everyone, I just noticed this site on a fork of this gist, and it also seems to be kept up to date:
- Fork : https://gist.github.com/aowongster/a69c84b66c74ca037e7094bed61e48b0
- Majestic Million : https://majestic.com/reports/majestic-million
- Download : https://downloads.majesticseo.com/majestic_million.csv
I don't know if it's useful to anyone, but there we go. :)
Hey everyone, when I download http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip, the CSV has ".deprecated" as its file extension. Is this it? Is it done?
@kostasmaneadis Yes, it's no more.
-----------------------------------------------------------------
Notice: This file is deprecated and is not being updated anymore.
This file was last updated on February 1, 2023.
This file will not be available from
http://s3.amazonaws.com/alexa-static/top-1m.csv.zip after
July 31, 2023.
-----------------------------------------------------------------
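If a script still downloads the archive, it may be worth checking for this state before parsing. The sketch below is one possible approach, not something from the gist: the helper names are hypothetical, and the entry name `top-1m.csv.deprecated` used in the example is an assumption based on the ".deprecated" extension reported above.

```javascript
// Hypothetical helper: true when a zip entry name ends in ".deprecated".
function isDeprecatedEntry(entryPath) {
  return entryPath.toLowerCase().endsWith('.deprecated');
}

// Hypothetical helper: true when the text contains the notice line quoted above.
function hasDeprecationNotice(text) {
  return text.includes('Notice: This file is deprecated');
}

// Assumed entry name, for illustration only.
console.log(isDeprecatedEntry('top-1m.csv.deprecated'));
console.log(hasDeprecationNotice('Notice: This file is deprecated and is not being updated anymore.'));
```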
Hello,
does anyone know how to get the top 1000 for a specific country too?
I am looking for the Austrian and German top 1000 lists. Can anybody help me out with a download link?