@jamesgeorge007
Last active February 22, 2018 12:56
This snippet uses the Beautiful Soup library to extract all the links from a website. Links whose href is '#' are skipped, and relative links that start with './' are turned into absolute URLs by appending the rest of the path to https://learncodeonline.in.
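import requests
import bs4

BASE_URL = 'https://learncodeonline.in'

res = requests.get(BASE_URL)
res.raise_for_status()  # stop early if the request failed
soup = bs4.BeautifulSoup(res.text, 'lxml')

# Only <a> tags that actually carry an href attribute
links = soup.find_all('a', href=True)
for i in links:
    href = i['href']
    if href == '#':
        # Skip placeholder links that point nowhere
        continue
    elif href.startswith('./'):
        # Turn './page' into 'https://learncodeonline.in/page'
        href = BASE_URL + '/' + href[2:]
    print(href)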
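For comparison, here is a minimal sketch of the same idea that relies only on the standard library, assuming Beautiful Soup and lxml are not available. It swaps bs4 for html.parser.HTMLParser and replaces the manual './' concatenation with urllib.parse.urljoin, which also resolves '../', root-relative ('/blog') and absolute links. The names LinkCollector and extract_links are hypothetical, and the parser works on an HTML string, so the page still has to be fetched separately (for example with requests.get as in the script).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Base used to resolve relative links (trailing slash so './x' lands under the root)
BASE_URL = "https://learncodeonline.in/"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Like find_all('a', href=True): ignore <a> tags without an href value
        if href is None:
            return
        # Skip placeholder links, as the original script does with '#'
        if href == "#":
            return
        # urljoin handles './page', '../page', '/page' and absolute URLs alike
        self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str = BASE_URL) -> list[str]:
    """Return the resolved hrefs of all <a> tags in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = """
    <a href="#">Top</a>
    <a href="./courses">Courses</a>
    <a href="/blog">Blog</a>
    <a href="https://github.com">GitHub</a>
    <a>No href</a>
    """
    for link in extract_links(sample):
        print(link)
```

Run as a script, this prints https://learncodeonline.in/courses, https://learncodeonline.in/blog and https://github.com; the '#' link and the tag without an href are dropped. Delegating to urljoin avoids hand-built string slicing, which is easy to get wrong when a path starts with '../' or when the base URL lacks a trailing slash.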