
@andykais
Created December 5, 2018 03:54
Separates the scraper-step definitions from the download structure for readability; this still compiles down to the inline structure: https://gist.github.com/andykais/04a02b61bb3b6d92aa3388c45ea816bd
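As a rough illustration of what "compiling down to the inline structure" could mean, here is a minimal Python sketch. It assumes that each `def:` reference in `structure` is replaced by a copy of the `defs` entry with the same `name`, and that `scrapeEach`/`scrapeNext` children are inlined recursively. The names `inline_structure`, `_inline` and `Step` are hypothetical and not part of any scraper library. The sample data is a two-step subset of the config, written as plain Python dicts.

```python
from typing import Any

# Hypothetical type alias: one scraper step, as a plain mapping.
Step = dict[str, Any]


def inline_structure(defs: list[Step], structure: Step) -> Step:
    """Return a nested step tree where every `def: <name>` reference is
    replaced by a copy of the definition with that name."""
    by_name = {d["name"]: d for d in defs}
    return _inline(by_name, structure)


def _inline(by_name: dict[str, Step], node: Step) -> Step:
    name = node["def"]
    if name not in by_name:
        raise KeyError(f"structure references unknown def {name!r}")
    # Shallow copy: nested `parse` mappings are shared with `defs`.
    result = dict(by_name[name])
    # Recurse into whichever child steps this structure node declares.
    for key in ("scrapeEach", "scrapeNext"):
        if key in node:
            result[key] = _inline(by_name, node[key])
    return result


# Two-step subset of the config, written as plain Python data.
defs = [
    {
        "name": "gallery",
        "download": "https://ifunny.co/user/{{ username }}/timeline/{{ value }}",
    },
    {
        "name": "next-batch-id",
        "parse": {"selector": ".stream__item:first-child", "attribute": "data-next"},
    },
]
structure = {"def": "gallery", "scrapeNext": {"def": "next-batch-id"}}

inlined = inline_structure(defs, structure)
print(inlined)
```

Here `inlined` is a single nested mapping: the `gallery` step with its `scrapeNext` child replaced by the full `next-batch-id` definition. Keeping definitions flat and resolving references in one pass means each step is written once, while whatever consumes the config can still work with one nested tree.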
input: 'username'
defs:
  - name: 'home'
    download: 'https://ifunny.co/user/{{ username }}'
    parse:
      name: 'batch-id'
      selector: '.stream__item:first-child'
      attribute: 'data-next'
  - name: 'gallery'
    download: 'https://ifunny.co/user/{{ username }}/timeline/{{ value }}?batch=2&mode=grid'
  - name: 'next-batch-id'
    parse:
      selector: '.stream__item:first-child'
      attribute: 'data-next'
  - name: 'batch-page'
    parse:
      selector: '.post a'
      attribute: 'href'
  - name: 'image-page'
    download: 'https://ifunny.co{{ value }}'
    parse:
      selector: '.post .media__image'
      attribute: 'src'
  - name: 'image'
    download: '{{ value }}'
structure:
  def: 'home'
  scrapeEach:
    def: 'batch-id'
    scrapeEach:
      def: 'gallery'
      scrapeNext:
        def: 'next-batch-id'
      scrapeEach:
        def: 'image-page'
        scrapeEach:
          def: 'image'
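The `download` URLs are templates: `{{ username }}` comes from the config's `input`, and `{{ value }}` is presumably the value extracted by the parent step's `parse` (for example an `href` or `src` attribute). The template engine is not named here, so the sketch below assumes simple Handlebars-style substitution done with a regular expression. `render_url` and the inputs `someuser` and `/picture/abc` are hypothetical.

```python
import re

# Matches Handlebars-style placeholders such as `{{ username }}` or `{{value}}`.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_url(template: str, params: dict[str, str]) -> str:
    """Fill every `{{ name }}` placeholder in `template` from `params`."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Fail loudly rather than emitting a URL with an empty segment.
        if key not in params:
            raise KeyError(f"no value supplied for placeholder {key!r}")
        return params[key]

    return PLACEHOLDER.sub(substitute, template)


# Made-up inputs for illustration only.
home_url = render_url("https://ifunny.co/user/{{ username }}", {"username": "someuser"})
image_page_url = render_url("https://ifunny.co{{ value }}", {"value": "/picture/abc"})
print(home_url)
print(image_page_url)
```

Because the `image-page` template has no slash between the host and `{{ value }}`, it only yields a valid URL if the parsed `href` is root-relative (starts with `/`), which is what the sample input assumes.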