@jystewart
Created March 27, 2019 19:27
require 'fileutils'
url_file = "urlfile.txt"
def sanitize_filename(filename)
  name = filename.strip
  # NOTE: File.basename doesn't work right with Windows paths on Unix,
  # so drop everything up to the last slash or backslash by hand
  name.gsub!(/^.*(\\|\/)/, '')
  # Replace anything other than ASCII letters, digits, dots and hyphens with "_"
  name.gsub!(/[^0-9A-Za-z.\-]/, '_')
  name
end
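File.readlines(url_file, chomp: true).each do |line|
  next if line.strip.empty?
  # Split "/path/to/file.json?query" into its path and query string
  path, query = line.split("?", 2)
  filename = File.basename(path, ".json")
  filename += "_" + sanitize_filename(query) if query
  # path starts with "/", so the site layout is mirrored under ./s3
  dir = "s3#{File.dirname(path)}"
  FileUtils.mkdir_p(dir)
  # Pass arguments as a list so the shell never interprets the "&" in the query
  system("curl", "-o", "#{dir}/#{filename}.json", "https://www.ncsc.gov.uk#{line}")
end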
/api/1/services/v1/collection-content.json?url=/collection/board-toolkit&pageContentUrl=/collection/board-toolkit
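The line above appears to be a sample entry from urlfile.txt. As a rough sketch of how the script turns such an entry into a local file name and a download URL, the mapping can be pulled out into a pure function that does no network or disk access. The name plan_download and its base_url parameter are hypothetical and not part of the original gist; sanitize_filename is repeated here only so the snippet runs on its own.

```ruby
# Copy of the gist's helper, repeated so this sketch is self-contained.
def sanitize_filename(filename)
  name = filename.strip
  name.gsub!(/^.*(\\|\/)/, '')        # keep only what follows the last slash
  name.gsub!(/[^0-9A-Za-z.\-]/, '_')  # anything unusual becomes "_"
  name
end

# Hypothetical helper (an assumption, not in the original script):
# returns [local_target_path, url_to_fetch] for one line of urlfile.txt.
def plan_download(line, base_url = "https://www.ncsc.gov.uk")
  line = line.strip
  path, query = line.split("?", 2)
  filename = File.basename(path, ".json")
  filename += "_" + sanitize_filename(query) if query
  target = "s3#{File.dirname(path)}/#{filename}.json"
  [target, "#{base_url}#{line}"]
end

line = "/api/1/services/v1/collection-content.json?url=/collection/board-toolkit&pageContentUrl=/collection/board-toolkit"
target, url = plan_download(line)
puts target  # => s3/api/1/services/v1/collection-content_board-toolkit.json
puts url     # => https://www.ncsc.gov.uk/api/1/services/v1/collection-content.json?url=/collection/board-toolkit&pageContentUrl=/collection/board-toolkit
```

Because the first substitution in sanitize_filename is greedy, only the text after the last slash of the query string survives, so two entries whose queries end in the same segment would map to the same file.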