@hamletbatista
Created July 25, 2019 03:33
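import pandas as pd

# Left-join df with df_logs on "path", keeping every row of df
merged = pd.merge(df, df_logs, on="path", how="left")
# Pages not crawled: no matching row in df_logs, so "date" is missing
notcrawled = merged.loc[merged["date"].isna(), ["path", "lastmod", "date"]]
notcrawled.to_csv("notcrawled.csv")
# Pages crawled: rows where both "lastmod" and "date" are present
crawled = merged[["lastmod", "date", "path"]].dropna()
crawled.to_csv("crawled.csv")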
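
The snippet assumes `df` and `df_logs` already exist. As a minimal sketch of the same left-join split, the example below builds two small, made-up frames (the paths and dates are hypothetical, not taken from the gist) and uses the `indicator=True` option of `pd.merge` to separate matched from unmatched rows. This is one possible variant, not necessarily how the author's data is shaped.

```python
import pandas as pd

# Hypothetical stand-in for df: one row per URL path with its lastmod value.
df = pd.DataFrame({
    "path": ["/a", "/b", "/c"],
    "lastmod": ["2019-07-01", "2019-07-02", "2019-07-03"],
})

# Hypothetical stand-in for df_logs: one row per crawled path with the crawl date.
df_logs = pd.DataFrame({
    "path": ["/a", "/c"],
    "date": ["2019-07-10", "2019-07-12"],
})

# indicator=True adds a "_merge" column marking each row as
# "left_only" (no match in df_logs) or "both" (match found).
merged = pd.merge(df, df_logs, on="path", how="left", indicator=True)

# Rows without a log entry: pages not crawled.
notcrawled = merged.loc[merged["_merge"] == "left_only", ["path", "lastmod", "date"]]
# Rows with a log entry: pages crawled.
crawled = merged.loc[merged["_merge"] == "both", ["lastmod", "date", "path"]]

print(notcrawled["path"].tolist())  # ['/b']
print(crawled["path"].tolist())     # ['/a', '/c']
```

Filtering on `_merge` selects rows by whether the join matched, so a page with a log entry but a missing `lastmod` would still count as crawled, whereas `dropna()` would drop it.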