@marcotc
Created October 27, 2023 21:13
Download all RubyGems.org search results
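require 'faraday'
require 'json'

# Downloads every page of RubyGems.org search results for `search_query`
# and caches the combined list as JSON in /tmp/<search_query>.
def fetch_data(search_query)
  list = []
  page = 1

  loop do
    sleep 0.11 # Stay just under the remote rate limit of 10 requests/sec
    puts "Fetching page #{page}"
    # Passing params as a hash lets Faraday URL-encode the query string.
    response = Faraday.get('https://rubygems.org/api/v1/search.json', { query: search_query, page: page })
    raise "Request failed with status #{response.status}" unless response.success?

    results = JSON.parse(response.body)
    break if results.empty? # An empty page means we are past the last result

    list.concat(results)
    page += 1
  end

  File.write("/tmp/#{search_query}", list.to_json)
end
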
# Prints the names of the cached gems, comma-separated, filtered by popularity.
def report_data(search_query)
  data = JSON.parse(File.read("/tmp/#{search_query}"))

  # Selecting all gems gets us a lot of small personal projects.
  # We trim it down by download number.
  puts data.select { |x| x['downloads'] > 500_000 }.map { |x| x['name'] }.join(',')
end

fetch_data('kafka')
report_data('kafka')
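
The download-count filter in report_data can be tried without hitting the network. Below is a minimal sketch, assuming each record has the 'name' and 'downloads' keys used above. The helper name popular_gem_names, its min_downloads keyword, and the sample records are hypothetical and are not part of the original script; the sorting step is also an addition for readability.

```ruby
# Hypothetical helper (not in the original script): returns the names of
# gems whose download count exceeds min_downloads, most downloaded first.
def popular_gem_names(gems, min_downloads: 500_000)
  gems
    .select { |gem| gem['downloads'].to_i > min_downloads }
    .sort_by { |gem| -gem['downloads'].to_i }
    .map { |gem| gem['name'] }
end

# Made-up records for illustration only; not real RubyGems.org data.
sample = [
  { 'name' => 'example-small',  'downloads' => 1_200 },
  { 'name' => 'example-large',  'downloads' => 9_000_000 },
  { 'name' => 'example-medium', 'downloads' => 750_000 }
]

puts popular_gem_names(sample).join(',')
# prints "example-large,example-medium"
```

Making the threshold a keyword argument lets you tighten or loosen the cut-off per search term without editing the filter itself.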