@sippey
Last active March 18, 2023 07:26
chat gpt response
require 'csv'
require 'date' # lets YAML return Date objects for frontmatter dates
require 'yaml'
require 'redcarpet'

markdown_folder_path = '/path/to/markdown/folder'
csv_file_path = '/path/to/csv/file.csv'

# Initialize CSV file with headers
CSV.open(csv_file_path, 'wb') do |csv|
  csv << ['Date', 'Title', 'URL']
end

# Initialize markdown parser
markdown = Redcarpet::Markdown.new(Redcarpet::Render::HTML)

# Loop through each markdown file in the folder
Dir.glob("#{markdown_folder_path}/*.md") do |markdown_file_path|
  # Read markdown file and extract YAML frontmatter
  file_content = File.read(markdown_file_path)
  match = file_content.match(/\A---\n(.+?)\n---/m)
  next unless match # skip files that have no frontmatter block

  # safe_load with permitted classes so unquoted dates also parse on Psych 4+
  frontmatter = YAML.safe_load(match[1], permitted_classes: [Date, Time])

  # Extract date and title from frontmatter
  date = frontmatter['date']
  title = frontmatter['title']

  # Parse markdown and extract the href value of each anchor tag
  html = markdown.render(file_content)
  # scan returns [quote, url] pairs because the pattern has two groups; keep only the url
  urls = html.scan(/<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/m).map(&:last)

  # Append one row per URL, opening the CSV once per markdown file
  CSV.open(csv_file_path, 'ab') do |csv|
    urls.each { |url| csv << [date, title, url] }
  end
end

Here's a brief overview of how the script works:

  • The script requires the csv, yaml, and redcarpet libraries, which will be used to read and write CSV files, extract YAML frontmatter, and parse markdown, respectively.

  • The script initializes the paths to the folder containing the markdown files and the CSV file that will be created.

  • The script opens the CSV file and adds headers for the three columns.

  • The script initializes a Redcarpet markdown parser.

  • The script uses the Dir.glob method to loop through each markdown file in the folder.

  • For each markdown file, the script reads the file contents and extracts the YAML frontmatter using a regular expression.

  • The script extracts the date and title from the frontmatter.

  • The script parses the markdown file into HTML using the Redcarpet parser.

  • The script uses another regular expression to extract the href attribute value of every anchor tag in the HTML.

  • The script loops through each URL and adds a new row to the CSV file with the date, title, and URL.
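The link-extraction step can be tried on its own, without Redcarpet. The following is a minimal sketch that applies the same regular expression to a hand-written HTML string; the helper name `extract_hrefs` and the sample HTML are assumptions made for illustration and are not part of the original script. It also shows why the quote-capturing group matters: `String#scan` returns one `[quote, url]` pair per match when the pattern has two groups, so only the last element of each pair is a URL.

```ruby
# Same pattern as the script: group 1 captures the quote character,
# group 2 captures the href value.
HREF_PATTERN = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/m

# Hypothetical helper (not in the original script): returns only the URLs.
def extract_hrefs(html)
  html.scan(HREF_PATTERN).map(&:last)
end

# Illustrative input, assumed for this example.
sample_html = %(<p><a href="https://example.com/a">A</a> and <a title="t" href='/b'>B</a></p>)

p sample_html.scan(HREF_PATTERN) # [["\"", "https://example.com/a"], ["'", "/b"]]
p extract_hrefs(sample_html)     # ["https://example.com/a", "/b"]
```

Calling `flatten` on the raw `scan` result would mix the quote characters in with the URLs, which is why the helper maps each pair to its last element instead.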

Note that this script assumes that all markdown files in the folder have YAML frontmatter with date and title fields. If your markdown files have different frontmatter or none at all, you'll need to modify the script accordingly.
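One possible way to relax that assumption is to parse the frontmatter in a small helper that returns an empty hash when no frontmatter block is found. This is only a sketch: the helper name `parse_frontmatter` and the sample strings are hypothetical, and it assumes Unix line endings and a Ruby version whose `YAML.safe_load` accepts the `permitted_classes` keyword (Psych 3.1 or later).

```ruby
require 'date'
require 'yaml'

# Same frontmatter pattern as the script.
FRONTMATTER_PATTERN = /\A---\n(.+?)\n---/m

# Hypothetical helper (not in the original script): returns the frontmatter
# as a Hash, or an empty Hash when the file has no usable frontmatter block.
def parse_frontmatter(text)
  match = text.match(FRONTMATTER_PATTERN)
  return {} unless match

  # Date and Time are permitted so unquoted dates such as 2023-03-18 load.
  data = YAML.safe_load(match[1], permitted_classes: [Date, Time])
  data.is_a?(Hash) ? data : {}
end

# Illustrative inputs, assumed for this example.
with_frontmatter = "---\ntitle: Hello\ndate: 2023-03-18\n---\nBody text\n"
without_frontmatter = "Just a body, no frontmatter.\n"

p parse_frontmatter(with_frontmatter)['title'] # "Hello"
p parse_frontmatter(without_frontmatter)       # {}
```

With a helper like this, missing `date` or `title` fields simply come back as `nil`, which the CSV library writes as empty cells.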
