@infominer33
Last active June 30, 2022 08:15
Extract metadata (title, description, image) from a list of links using Python.
# https://github.com/lethain/extraction
import csv

import extraction
import requests

#### Open CSV + Write Column Names
fname = 'links.csv'
# newline='' stops csv from writing blank rows between records on Windows
with open(fname, 'w', newline='', encoding='utf-8') as csvFile, \
        open('links.md', 'r', encoding='utf-8') as file1:
    csvWriter = csv.writer(csvFile)
    csvWriter.writerow(["Link", "UrlTitle", "UrlDesc", "UrlImg"])
    # links.md holds one URL per line
    for line in file1:
        lin = line.strip()  # strip the newline character
        if not lin:
            continue  # skip blank lines instead of requesting an empty URL
        print(lin)
        html = requests.get(lin, timeout=30).text
        extracted = extraction.Extractor().extract(html, source_url=lin)
        row = [lin, extracted.title, extracted.description, extracted.image]
        print(row)
        #### Write row to CSV
        csvWriter.writerow(row)
print("Complete")
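If installing the `extraction` package is not an option, a rough stand-in can be built on the standard library's `html.parser`. This is a simpler, swapped-in technique, not how `extraction` works internally: it only reads Open Graph `<meta>` tags (`og:title`, `og:description`, `og:image`) and falls back to `<title>` and `<meta name="description">`. The names `extract_metadata` and `_MetaParser` are hypothetical, and resolving relative image paths with `urljoin` is an assumption about what you want in the `UrlImg` column.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaParser(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # property/name (lowercased) -> content
        self.title_parts = []   # text chunks seen inside <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content is not None:
                # Keep the first occurrence of each key
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_metadata(html, source_url=""):
    """Return (title, description, image), preferring Open Graph tags."""
    parser = _MetaParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    title = meta.get("og:title") or "".join(parser.title_parts).strip() or None
    description = meta.get("og:description") or meta.get("description")
    image = meta.get("og:image")
    if image and source_url:
        # Resolve relative image paths against the page URL
        image = urljoin(source_url, image)
    return title, description, image


page = """<html><head>
<title>Fallback Title</title>
<meta property="og:title" content="OG Title">
<meta name="description" content="A short summary.">
<meta property="og:image" content="/img/cover.png">
</head><body></body></html>"""
print(extract_metadata(page, source_url="https://example.com/post"))
# → ('OG Title', 'A short summary.', 'https://example.com/img/cover.png')
```

Inside the loop above, `extraction.Extractor().extract(...)` could then be replaced with `title, desc, img = extract_metadata(html, source_url=lin)`. Missing fields come back as `None`, which `csv.writer` writes as an empty cell. Pages that expose no Open Graph or description tags will yield sparser rows than a dedicated extraction library might produce.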