@SuvroBaner
Created November 13, 2018 09:06
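import pandas as pd

# Input CSV is expected to have a 'URL' column.
url_df = pd.read_csv('Book2.csv')

def fn_split_url(x):
    # Split the URL on '/' and skip the scheme, host, locale, and empty tokens.
    list_of_tokens = x.split('/')
    bad_words = ['https:', 'http:', 'www.juniper.net', 'en_US', '']
    s = ''
    for token in list_of_tokens:
        if token not in bad_words:
            s = s + " page_url LIKE '%/" + token + "%' and "
    if not s:
        # No usable tokens: return an empty clause instead of a stray quote.
        return s
    # Drop the trailing "%' and " (7 characters) and close the quote,
    # so the last pattern has no trailing wildcard.
    s = s[0:len(s) - 7]
    s = s + "'"
    return s

url_df['url_cat'] = url_df['URL'].apply(fn_split_url)
url_df.to_csv('URL_new.csv')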
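
A minimal sketch of what the clause-building step produces, runnable without Book2.csv. The helper name `build_like_clause` and the sample URLs are hypothetical and not part of the gist. It mirrors the logic above using `join`, so unlike the original it emits no leading or doubled spaces. Tokens are not escaped for SQL here, so this is for illustration only.

```python
# Hypothetical standalone variant of the gist's URL-to-LIKE-clause logic.
# SKIP_TOKENS copies the gist's bad_words list; the sample URL is made up.
SKIP_TOKENS = {'https:', 'http:', 'www.juniper.net', 'en_US', ''}

def build_like_clause(url):
    # Keep only the path tokens that carry meaning.
    tokens = [t for t in url.split('/') if t not in SKIP_TOKENS]
    if not tokens:
        return ''
    # Every token except the last gets a trailing wildcard.
    patterns = ["page_url LIKE '%/" + t + "%'" for t in tokens[:-1]]
    # The last pattern has no trailing wildcard, as in the gist's code.
    patterns.append("page_url LIKE '%/" + tokens[-1] + "'")
    return ' and '.join(patterns)

print(build_like_clause('https://www.juniper.net/en_US/products/routers'))
# prints: page_url LIKE '%/products%' and page_url LIKE '%/routers'
```

A URL made up only of skipped tokens, such as `https://www.juniper.net/`, yields an empty string.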