Last active: February 22, 2018 12:56
Gist: jamesgeorge007/fed5013ff4169ce30249c97d5922400b
This snippet uses the Beautiful Soup library to extract all the links from a website. Links that are just '#' are skipped. Relative links starting with './' are turned into absolute URLs by joining https://learncodeonline.in with the rest of the path.
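The link-rewriting rule described above can be sketched as a small pure function, so it can be checked without fetching the page. This sketch swaps the manual string slicing for the standard library's `urllib.parse.urljoin`, which also handles '/path', bare relative paths and already-absolute URLs. `normalize_href` and `BASE_URL` are hypothetical names chosen for illustration, and the base URL is assumed to be the site root with a trailing slash.

```python
from typing import Optional
from urllib.parse import urljoin

# Assumed base: the site root, with a trailing slash so urljoin treats it as a directory.
BASE_URL = "https://learncodeonline.in/"


def normalize_href(href: str, base: str = BASE_URL) -> Optional[str]:
    """Return an absolute URL for href, or None for a bare '#' anchor (hypothetical helper)."""
    # Skip placeholder anchors, as the gist does.
    if href == "#":
        return None
    # Resolves './about' and '/about' against base; absolute URLs come back unchanged.
    return urljoin(base, href)


print(normalize_href("./about"))
```

With the default base, `normalize_href("./about")` resolves to `https://learncodeonline.in/about`, the same result the corrected slicing in the gist produces for './' links.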
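```python
import requests
import bs4

BASE_URL = 'https://learncodeonline.in'

res = requests.get(BASE_URL)
res.raise_for_status()  # stop early on an HTTP error
# 'lxml' needs the lxml package installed; 'html.parser' works without it.
soup = bs4.BeautifulSoup(res.text, 'lxml')

# Only <a> tags that actually carry an href attribute
links = soup.find_all('a', href=True)
for link in links:
    href = link['href']
    if href == '#':
        # Placeholder anchor, nothing to follow
        continue
    elif href.startswith('./'):
        # Drop the leading '.' so './about' becomes BASE_URL + '/about'
        href = BASE_URL + href[1:]
    print(href)
```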
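To test the extraction step offline (no request to the live site, and no bs4 or lxml install), the link-gathering part can be approximated with the standard library's `html.parser.HTMLParser`. This is a swapped-in technique, not what the gist uses, and it only roughly mirrors `soup.find_all('a', href=True)`. `LinkCollector` and `SAMPLE_HTML` are hypothetical names, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, where value is None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)
                    break  # keep only the first href on a tag


# Made-up markup standing in for the fetched page.
SAMPLE_HTML = """
<nav>
  <a href="#">Menu</a>
  <a href="./about">About</a>
  <a name="top">No href</a>
  <a href="https://example.com/">External</a>
</nav>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.hrefs)  # ['#', './about', 'https://example.com/']
```

The `<a name="top">` tag has no href and is skipped, just as `href=True` filters it out in the Beautiful Soup version.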