
@s4yed
Last active June 16, 2020 18:18
Springer Machine Learning and Data Science Books.
'''
*-------------------------------------------------------------
* Copyright (c) Ahmed Sayed. All rights reserved.
* Licensed under the MIT License.
* Title: SpringerBooks.py
* Date: 2020-05-01 11:33:21
*-------------------------------------------------------------
Execute the following commands to install the dependencies before running the script:
pip install beautifulsoup4
pip install requests
'''
from bs4 import BeautifulSoup
import requests
import os

GREEN = '\33[32m'
YELLOW = '\33[33m'

url = 'https://techgrabyte.com/springer-released-65-machine-learning-data-science-books-free/'
springer_url = 'https://link.springer.com/'
dir_name = 'Machine Learning and Data Science Books'


def find_links(url, text='', title=False):
    """Collect hrefs on the page at `url` that contain `text`,
    optionally paired with the page title."""
    req = requests.get(url)
    soup = BeautifulSoup(req.text, "html.parser")
    links = set()
    for link in soup.find_all('a'):
        href = link.get('href')
        if href and text in href:  # guard: some <a> tags have no href
            if title:
                links.add((href, soup.title.string.replace(' | SpringerLink', '').strip()))
            else:
                links.add(href)
    return links


springer_links = find_links(url, 'link.springer')
books = []
print(YELLOW + '[!] Getting all links ...')
for link in springer_links:
    isbn = link.split('isbn=')[-1]
    book = find_links(link, isbn + '.pdf', True)
    if book:
        books.append(book)
print(GREEN + '[+] Done')

os.makedirs(dir_name, exist_ok=True)  # don't crash if the folder already exists
for book in books:
    book_link, book_name = next(iter(book))
    print(YELLOW + f'[!] Fetching {book_name} book ...')
    req = requests.get(springer_url + book_link, stream=True)
    book_name = book_name.replace(' ', '-')
    with open(f'{dir_name}/{book_name}.pdf', 'wb') as f:
        for chunk in req.iter_content(chunk_size=8192):  # stream to disk in chunks
            f.write(chunk)
print(GREEN + '[+] Done')

s4yed commented May 3, 2020

@h45h-1nclud3
Thanks, bro, but I think the script doesn't need it: the loading time depends on your internet connection and on how BeautifulSoup parses the webpage, not on the loops; it's just 65 iterations. 😂
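(The suggestion being discussed is not shown in the thread; if it concerns parallelizing the downloads, a minimal thread-pool sketch would look like the following. The helper name `run_concurrently` and the toy worker are illustrative, not from the gist.)

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_concurrently(tasks, worker, max_workers=8):
    """Apply worker(task) to each task on a thread pool.

    Network-bound jobs like the book downloads overlap their wait
    time this way; results come back in completion order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, task) for task in tasks]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

# Toy worker for demonstration; in the script, worker would be the body
# of the download loop (fetch one PDF and write it to disk).
doubled = run_concurrently(range(5), lambda x: x * 2)
```

Whether this is faster than the sequential loop depends on the server and the connection, as noted above.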

@h45h-1nclud3

Run both and observe the difference; it's your call after all :D


s4yed commented May 6, 2020

I ran it already, but somehow the script prints some of the links first with no downloads, and only starts downloading after those links.

@AbanoubSamuel

Hello there, the script is cool. Could you advise me on how to write a similar script, or adapt this one, to download PDF and DOCX files from a different website? I am new to Python and wanted to automate such a thing as you did. Thanks in advance.
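(The gist's pattern generalizes to filtering links by file extension. A minimal sketch, assuming only that the target page links to the files directly; `collect_file_links` and the sample HTML below are illustrative, not part of the gist.)

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def collect_file_links(html, base_url, extensions=('.pdf', '.docx')):
    """Return the absolute URL of every <a href> ending in one of `extensions`."""
    soup = BeautifulSoup(html, 'html.parser')
    found = set()
    for a in soup.find_all('a'):
        href = a.get('href')
        if href and href.lower().endswith(extensions):
            # urljoin resolves relative hrefs against the page's URL
            found.add(urljoin(base_url, href))
    return found

# In practice, fetch the page with requests.get(page_url).text, pass it in,
# then download each link with requests.get(link, stream=True) as in the gist.
page = '<a href="/files/a.pdf">A</a><a href="notes.docx">B</a><a href="/x.html">C</a>'
links = collect_file_links(page, 'https://example.com/docs/')
```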
