Skip to content

Instantly share code, notes, and snippets.

@howardjones
Created September 18, 2020 07:43
Show Gist options
  • Save howardjones/d3ba9327017d013c1e31ddf1fcfa3ef6 to your computer and use it in GitHub Desktop.
Save howardjones/d3ba9327017d013c1e31ddf1fcfa3ef6 to your computer and use it in GitHub Desktop.
So you have a crawler, but you don't want it to cache images, just the page.
from cachecontrol import CacheControl
from cachecontrol.caches import FileCache
from cachecontrol.heuristics import BaseHeuristic
class NoImagesHeuristic(BaseHeuristic):
def update_headers(self, response):
if response.headers['Content-Type'].startswith("image/"):
return {
'cache-control': 'no-cache',
}
return {}
sess = requests.session()
cached_sess = CacheControl(requests.Session(),
heuristic=NoImagesHeuristic(),
cache=FileCache('.webcache')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment