blueset/README.md

Last active Apr 16, 2021
Download/Print StuDocu.com documents without watermark

Usage

Add StuDocu Extracter to your browser favorites, and click it after you open the StuDocu document.

How it works

javascript:(function(){var a = "", x = document.getElementsByTagName("svg"); for(var i = 0; i < x.length; i++){a += x[i].outerHTML;} document.getElementsByTagName("body")[0].innerHTML = a;var a = document.getElementsByTagName("svg");for (var i = 0; i < a.length; i++){a[i].style.width="99.8%";a[i].style.height="auto";a[i].style.position="inherit";a[i].style.display="block";a[i].style.boxShadow="0 3px 3px rgba(0,0,0,0.3)";a[i].style.padding="0";}})()

Extracts all SVG tags and arranges them for printing.
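For readability, here is the same bookmarklet expanded into multi-line form (identical behavior):

(function () {
  // Collect the markup of every SVG page on the document page.
  var html = "";
  var svgs = document.getElementsByTagName("svg");
  for (var i = 0; i < svgs.length; i++) {
    html += svgs[i].outerHTML;
  }
  // Replace the page body with just the SVG pages.
  document.getElementsByTagName("body")[0].innerHTML = html;
  // Restyle each page so it flows as a plain printable column.
  var pages = document.getElementsByTagName("svg");
  for (var i = 0; i < pages.length; i++) {
    pages[i].style.width = "99.8%";
    pages[i].style.height = "auto";
    pages[i].style.position = "inherit";
    pages[i].style.display = "block";
    pages[i].style.boxShadow = "0 3px 3px rgba(0,0,0,0.3)";
    pages[i].style.padding = "0";
  }
})();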

Known issues:

  • Only works with documents that have a uniform page size.
  • Only works with fully accessible documents.
  • The no-border (zero-margin) option needs to be turned on when printing; see the sketch below.
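If your browser's print dialog does not expose a no-border option, a possible alternative is to inject a zero-margin print stylesheet before printing. This is a minimal sketch, not part of the original bookmarklet; the @page rule is standard CSS, but the exact print output still depends on the browser:

(function () {
  // Force zero page margins when printing, as a stand-in for the
  // browser's "no border" print option.
  var style = document.createElement("style");
  style.textContent = "@page { margin: 0; } body { margin: 0; }";
  document.head.appendChild(style);
  window.print();
})();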
@atamsingh commented Apr 19, 2017

Hey, this is no longer correct. StuDocu now uses iframes to load the pages, and the script fails because of that. I am attaching my solution below to download the file properly. Its only limitation is that StuDocu currently loads an iframe only when its page is in view. You can work around this by zooming out far enough that all pages are in view. The function that works is below:

javascript:(function () {
  // Collect the SVG from inside each lazily loaded page iframe.
  var html = "";
  var pages = document.getElementsByClassName("crocodoc-page-svg");
  console.log("getting iframes...");
  console.log(pages);
  for (var i = 0; i < pages.length; i++) {
    var iframedoc = pages[i].getElementsByTagName("iframe")[0].contentWindow.document;
    html += iframedoc.getElementsByTagName("svg")[0].outerHTML;
  }
  // Replace the page body with just the extracted SVGs.
  document.getElementsByTagName("body")[0].innerHTML = html;
  // Restyle each SVG page for printing.
  var svgs = document.getElementsByTagName("svg");
  for (var i = 0; i < svgs.length; i++) {
    svgs[i].style.width = "99.8%";
    svgs[i].style.height = "auto";
    svgs[i].style.position = "inherit";
    svgs[i].style.display = "block";
    svgs[i].style.boxShadow = "0 3px 3px rgba(0,0,0,0.3)";
    svgs[i].style.padding = "0";
  }
})()
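Instead of zooming out manually, one possible alternative is to scroll each page container into view so the lazy-loaded iframes get created. This is a minimal sketch assuming the same crocodoc-page-svg containers; the 500 ms delay is an arbitrary guess, and slow connections may need longer before running the extraction script above:

(function () {
  // Scroll each page into view so its iframe gets lazy-loaded.
  var pages = document.getElementsByClassName("crocodoc-page-svg");
  var i = 0;
  function next() {
    if (i >= pages.length) {
      console.log("all pages scrolled; run the extraction script now");
      return;
    }
    pages[i].scrollIntoView();
    i++;
    setTimeout(next, 500); // crude delay to let each iframe load
  }
  next();
})();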
@mkalwtb commented Mar 29, 2018

Not working anymore.

@Ibuprofen1000mg commented Feb 22, 2021

It is possible to fully access premium documents by downloading the whole website with Python (this worked for me).

This script downloads the website as a whole, without any scripts, just the .html part and the styling; you may need to customize the output to your needs. When premium documents were opened with this script, all pages were downloaded, even those that are usually not available to standard members.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import os

# Custom header to bypass standard bot/scraping filters (should work for most websites).
# Replace the placeholder URL with the address of the document.
req = Request('https://website.com', headers={'User-Agent': 'Mozilla/5.0'})

# Decodes the whole website into a string for writing to .html later
wholewebpage = urlopen(req).read().decode('utf-8')

# Saves the downloaded website in the folder where the .py is located
def file_path():
    my_folder = os.path.dirname(os.path.abspath(__file__))
    my_file = os.path.join(my_folder, 'custom_name_of_downloaded_website.html')
    return my_file

# Writes and beautifies the website
with open(file_path(), "w", encoding="utf-8") as downloaded_website:
    soup = BeautifulSoup(wholewebpage, 'html.parser')
    downloaded_website.writelines(soup.prettify())
@cdecompilador commented Feb 26, 2021

I tried it (on StuDocu) and it just downloads the preview. I have been looking for the source where it gets the file, but it requires some special cookies (in the places where I think it retrieves them). Is there a way to make it download the full document, or even retrieve the PDF?
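One hedged workaround for the missing-cookie problem: instead of fetching the page anonymously from Python, save the already rendered page from the browser console of a logged-in session, where the session cookies are applied automatically. A minimal sketch (the studocu_page.html filename is arbitrary); this captures whatever the page has currently rendered, so it is subject to the same lazy-loading caveat as above:

(function () {
  // Serialize the currently rendered page and download it as a local HTML file.
  var blob = new Blob([document.documentElement.outerHTML], { type: "text/html" });
  var link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = "studocu_page.html"; // arbitrary filename
  document.body.appendChild(link);
  link.click();
  link.remove();
})();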

@kaparomutugi commented Apr 14, 2021

Nothing works anymore.
