@shantanuo
Created September 6, 2019 09:07
link extractor project script
# download links from DynamoDB
# text output puts one item per line, so sort -u can de-duplicate them
!aws dynamodb scan --table-name Movies --query "Items[*].[id.S,title.S]" --output text | sort -u > /tmp/download.txt
# copy github links and extract repo URLs
import pandas as pd
mylist = """
"https://github.com/apoorvnandan/speech-recognition-primer"
"https://github.com/asmitakulkarni/QuoteGenerator"
"https://github.com/cjhutto/vaderSentiment"
"https://github.com/docker/docker-bench-security"
"""
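finallist = []
for i in mylist.split():
    # strip the surrounding quotes, then keep the owner and repo path segments
    finallist.append(i.strip('"').split("/")[3:5])
# one row per link; drop repeated owner/repo pairs
df = pd.DataFrame(finallist)
df = df.drop_duplicates()
for i in df.values:
    print("http://github.com/{0}/{1}".format(i[0], i[1]))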
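The same extraction can be sketched without pandas, assuming the links are already available as a list of strings. The helper name `repo_urls` and the choice to skip non-GitHub hosts are illustrative assumptions, not part of the original script; the sketch also emits `https` URLs rather than `http`.

```python
from urllib.parse import urlparse


def repo_urls(links):
    """Return unique https://github.com/<owner>/<repo> URLs in first-seen order.

    Hypothetical helper: the name and the host check are assumptions.
    """
    seen = {}  # dict keys keep insertion order, so this also de-duplicates
    for link in links:
        parsed = urlparse(link.strip().strip('"'))
        if parsed.netloc.lower() not in ("github.com", "www.github.com"):
            continue  # not a GitHub link
        parts = parsed.path.strip("/").split("/")
        if len(parts) < 2 or not all(parts[:2]):
            continue  # no owner/repo pair in the path
        seen["https://github.com/{0}/{1}".format(parts[0], parts[1])] = None
    return list(seen)


if __name__ == "__main__":
    sample = [
        '"https://github.com/cjhutto/vaderSentiment/blob/master/README.rst"',
        '"https://github.com/cjhutto/vaderSentiment"',
        '"https://github.com/docker/docker-bench-security"',
    ]
    for url in repo_urls(sample):
        print(url)
```

Parsing with `urlparse` instead of splitting on `/` means links that point deeper into a repository (a file or a branch) still reduce to the repository URL, and links without an owner/repo pair are skipped rather than printed as partial URLs.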