@fabianvf
Created April 13, 2015 15:28
import os
import json

from scrapi.processing.base import BaseProcessor


class StorageProcessor(BaseProcessor):
    NAME = 'storage'

    def process_raw(self, raw):
        # Write the raw document to archive/<source>/<docID>/raw.<filetype>
        filename = 'archive/{}/{}/raw.{}'.format(raw['source'], raw['docID'], raw['filetype'])
        if not os.path.exists(os.path.dirname(filename)):
            os.makedirs(os.path.dirname(filename))
        with open(filename, 'w') as f:
            f.write(json.dumps(raw.attributes, indent=4))

    def process_normalized(self, raw, normalized):
        # Write the normalized document to archive/<source>/<docID>/normalized.json
        filename = 'archive/{}/{}/normalized.json'.format(raw['source'], raw['docID'])
        if not os.path.exists(os.path.dirname(filename)):
            os.makedirs(os.path.dirname(filename))
        with open(filename, 'w') as f:
            f.write(json.dumps(normalized.attributes, indent=4))
@brianjgeiger

You just have to add this to scrapi/processing/storage.py, and update your local.py to include 'storage' in the RAW_PROCESSING and NORMALIZED_PROCESSING lists:

NORMALIZED_PROCESSING = ['storage'] 
RAW_PROCESSING = ['storage']
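
With those settings, each harvested document should be written under archive/<source>/<docID>/. A minimal sketch of reading one back, assuming a hypothetical docID (12345) for the biomed source mentioned below:

import json

# Hypothetical path: the actual source/docID directories depend on what was harvested
with open('archive/biomed/12345/normalized.json') as f:
    normalized_doc = json.load(f)
print(normalized_doc)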

@walkerh

walkerh commented Apr 13, 2015

invoke harvester biomed
