@mahemoff
Created July 27, 2022 03:54
Unique domains list from ICANN zone files
#!/usr/bin/env ruby
# Based on https://sive.rs/com. Extended to read from a gzipped file, support any domain extension, and write the result to a separate file.
require 'hashie'
require 'zlib'

class Parser < Hashie::Dash
  property :extension, required: true # e.g. "com" for the dotcom database

  # Reads "<extension>.txt.gz" and writes each domain name once to
  # "<extension>.uniq.txt". Only consecutive duplicates are collapsed, so
  # all records for a domain must sit on adjacent lines in the input.
  def uniqify
    suffix = ".#{extension}"
    domain = ''
    File.open("#{extension}.uniq.txt", 'w') do |outstream|
      Zlib::GzipReader.open("#{extension}.txt.gz") do |instream|
        instream.each_line do |line|
          cut = line.index(suffix)
          next if cut.nil? # line does not contain the extension
          temp = line[0...cut]
          next if temp == domain
          domain = temp
          outstream.puts domain
        end
      end
    end
  end
end

Parser.new(extension: 'com').uniqify
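
The script needs a real `com.txt.gz` in the working directory, which makes it awkward to try out. As a rough sketch of the same consecutive-duplicate logic, the snippet below runs against a small sample gzipped in memory. The `unique_domains` helper and the sample records are hypothetical and not part of the original script; the record layout is only an assumption about what a zone file line looks like.

```ruby
require 'stringio'
require 'zlib'

# Hypothetical helper (not in the original script): returns each domain name
# once, collapsing consecutive records that share the same name.
def unique_domains(io, extension)
  suffix = ".#{extension}"
  previous = ''
  names = []
  io.each_line do |line|
    cut = line.index(suffix)
    next if cut.nil?          # skip lines that do not contain the extension
    name = line[0...cut]
    next if name == previous  # same domain as the preceding record
    previous = name
    names << name
  end
  names
end

# Made-up sample records; the tab-separated layout is an assumption.
sample = <<~ZONE
  example.com.\t172800\tin\tns\tns1.example.net.
  example.com.\t172800\tin\tns\tns2.example.net.
  sample.com.\t172800\tin\tns\tns1.sample.net.
ZONE

# Gzip the sample in memory so the helper reads from a GzipReader,
# as the script does with the real file.
reader = Zlib::GzipReader.new(StringIO.new(Zlib.gzip(sample)))
puts unique_domains(reader, 'com')
reader.close
# prints:
# example
# sample
```

Because only adjacent duplicates are dropped, this approach uses constant memory but relies on the input already being grouped by domain; an unsorted file would need a `Set` or an external sort first.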