
@blueset
Last active October 21, 2022 02:56
Download/Print StuDocu.com documents without watermark

Usage

Add StuDocu Extracter to your browser favorites, and click on it after you open the StuDocu document.

How it works

javascript:(function(){var a = "", x = document.getElementsByTagName("svg"); for(var i = 0; i < x.length; i++){a += x[i].outerHTML;} document.getElementsByTagName("body")[0].innerHTML = a;var a = document.getElementsByTagName("svg");for (var i = 0; i < a.length; i++){a[i].style.width="99.8%";a[i].style.height="auto";a[i].style.position="inherit";a[i].style.display="block";a[i].style.boxShadow="0 3px 3px rgba(0,0,0,0.3)";a[i].style.padding="0";}})()

Extracts all SVG tags and arranges them for printing.

Known issues:

  • Only works with documents that have a uniform page size.
  • Only works with fully accessible documents.
  • The "No border" width option needs to be turned on when printing.
@atamsingh

Hey, this is not correct anymore. StuDocu now uses iframes to load the pages, so the script fails. I am attaching my solution below to download the file properly. The only limitation is that StuDocu currently loads an iframe only when its page is in view. You can work around this by zooming out far enough that all pages are in view. The function that works is as below:

javascript:(function(){
  var a = "";
  // Each page lives in a .crocodoc-page-svg container holding an iframe.
  var iframes = document.getElementsByClassName("crocodoc-page-svg");
  console.log('getting iframes...');
  console.log(iframes);
  for (var i = 0; i < iframes.length; i++) {
    // Reach into the iframe's document and grab its SVG page.
    var iframedoc = iframes[i].getElementsByTagName("iframe")[0].contentWindow.document;
    var x = iframedoc.getElementsByTagName("svg")[0];
    a += x.outerHTML;
  }
  // Replace the page body with the collected SVGs, then style them for printing.
  document.getElementsByTagName("body")[0].innerHTML = a;
  var svgs = document.getElementsByTagName("svg");
  for (var i = 0; i < svgs.length; i++) {
    svgs[i].style.width = "99.8%";
    svgs[i].style.height = "auto";
    svgs[i].style.position = "inherit";
    svgs[i].style.display = "block";
    svgs[i].style.boxShadow = "0 3px 3px rgba(0,0,0,0.3)";
    svgs[i].style.padding = "0";
  }
})()

@mkalwtb

mkalwtb commented Mar 29, 2018

Not working anymore

@Ibuprofen1000mg

Ibuprofen1000mg commented Feb 22, 2021

It is possible to fully access premium documents by downloading the whole page with Python (worked for me).

This script downloads the page as a whole, without any scripts, just the HTML part and styling; you may need to customize the output to your needs. When premium documents were opened with this script, all pages (even those which usually are not available to standard members) were downloaded.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import os

# Custom header to bypass standard bot/scraping filters (should work for most websites).
# Replace the placeholder URL with the document you want to download.
req = Request('https://website.com', headers={'User-Agent': 'Mozilla/5.0'})

# Reads the whole page and decodes it to a string for writing to .html later
wholewebpage = urlopen(req).read().decode('utf-8')

# Saves the downloaded website in the folder where the .py is located
def file_path():
    my_folder = os.path.dirname(os.path.abspath(__file__))
    my_file = os.path.join(my_folder, 'custom_name_of_downloaded_website.html')
    return my_file

# Writes and beautifies the website
with open(file_path(), "w", encoding="utf-8") as downloaded_website:
    soup = BeautifulSoup(wholewebpage, 'html.parser')
    downloaded_website.writelines(soup.prettify())
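
As a follow-up, if the fetched HTML happens to contain the SVG pages inline (the crocodoc-page-svg containers mentioned above), a minimal sketch for keeping only those for printing could look like this. This is an assumption about the page structure, and it will not work when the SVGs are loaded through iframes:

# Hedged sketch: assumes the SVG pages are inlined in the fetched HTML,
# which may not hold when StuDocu loads them via iframes.
svgs = soup.find_all('svg')
if svgs:
    pages_html = '\n'.join(str(svg) for svg in svgs)
    with open('pages_only.html', 'w', encoding='utf-8') as f:
        f.write(pages_html)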

@cdecompilador

I tried it (on StuDocu) and it just downloads the preview. I've been looking for the source from which it gets the file, but it requires some special cookies (in the places where I think it retrieves it). Is there a way to make it download the full document, or even retrieve the PDF?
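
One thing that might work is copying the Cookie header of a logged-in session from the browser's developer tools and sending it along with the request. A minimal sketch extending the urllib snippet above; the URL and cookie value are placeholders, not StuDocu's actual names:

from urllib.request import Request, urlopen

# Placeholder values: copy the real Cookie header from the Network tab
# of your browser's developer tools while logged in.
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'PASTE_YOUR_COOKIE_HEADER_HERE',
}
req = Request('https://www.studocu.com/path/to/document', headers=headers)
html = urlopen(req).read().decode('utf-8')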

@kaparomutugi

kaparomutugi commented Apr 14, 2021

Nothing works anymore.

@kinslayeruy

kinslayeruy commented Mar 9, 2022

Managed to find a workaround for Firefox:

function getCSSSelector(el) {
    // Build a CSS selector from the element's tag, id, classes, and name attribute.
    let selector = el.tagName.toLowerCase();
    const attrs = el.attributes;
    for (var i = 0; i < attrs.length; i++) {
        let attr = attrs.item(i);
        if (attr.name === 'id') selector += `#${attr.value}`;
        if (attr.name === 'class') selector += attr.value.split(' ').map((c) => `.${c}`).join('');
        if (attr.name === 'name') selector += `[${attr.name}=${attr.value}]`;
    }
    return selector;
}

function next(container) {
    // Scroll the next page into view and build the :screenshot command for it.
    let childs = container.childNodes;
    let child = childs[index];
    child.scrollIntoView();
    let selector = getCSSSelector(child);
    console.log(selector);
    console.log(`remaining ${childs.length - index}`);
    index++;
    return `:screenshot --file true --selector ${selector}`;
}

let index = 0;
let pageContainer = document.getElementById("page-container");
// Usage in Firefox:
// copy(next(pageContainer))
// then paste the copied command into the console and hit Enter; repeat per page

Just paste the code in the developer console.

I recommend using the console to remove all the elements at the top and the side, so the view area is bigger. You can also zoom in (Ctrl + scroll wheel) and then use the fit-height button in the StuDocu navigator, so the screenshots are taken at better quality.

This only works on public documents; it won't access any page you can't see yourself.

It works by using the copy() helper to put the command on the clipboard and the :screenshot helper to take a screenshot of a node. It will spam your Downloads folder with screenshots, which you can print to a single PDF later.
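
To combine the screenshots into one PDF afterwards, here is a minimal Pillow sketch in Python; it assumes the screenshot filenames sort in page order and that they landed in your Downloads folder, so adjust the path and pattern as needed:

from pathlib import Path
from PIL import Image

# Collect the screenshots Firefox saved; filenames are assumed to sort
# in page order (rename them first if they don't).
shots = sorted(Path.home().joinpath("Downloads").glob("*.png"))
pages = [Image.open(p).convert("RGB") for p in shots]

# Pillow writes a multi-page PDF via save_all + append_images.
pages[0].save("document.pdf", save_all=True, append_images=pages[1:])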
