@fabianvf fabianvf/storage.py

Created Apr 13, 2015
import os
import json

from scrapi.processing.base import BaseProcessor


class StorageProcessor(BaseProcessor):
    NAME = 'storage'

    def process_raw(self, raw):
        # Write the raw document to archive/<source>/<docID>/raw.<filetype>
        filename = 'archive/{}/{}/raw.{}'.format(raw['source'], raw['docID'], raw['filetype'])
        if not os.path.exists(os.path.dirname(filename)):
            os.makedirs(os.path.dirname(filename))
        with open(filename, 'w') as f:
            f.write(json.dumps(raw.attributes, indent=4))

    def process_normalized(self, raw, normalized):
        # Write the normalized document next to the raw one, as normalized.json
        filename = 'archive/{}/{}/normalized.json'.format(raw['source'], raw['docID'])
        if not os.path.exists(os.path.dirname(filename)):
            os.makedirs(os.path.dirname(filename))
        with open(filename, 'w') as f:
            f.write(json.dumps(normalized.attributes, indent=4))
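
The write pattern above can be tried without a scrapi checkout. The following is a minimal sketch under stated assumptions: `FakeDocument`, `archive_path`, and `write_json` are hypothetical names that are not part of scrapi, the stand-in class only mimics the item access and `attributes` dict that the processor relies on, and the sample values are illustrative. It writes under a temporary directory instead of the working directory.

```python
import json
import os
import tempfile


class FakeDocument(object):
    """Hypothetical stand-in for a scrapi document: item access plus .attributes."""

    def __init__(self, **attributes):
        self.attributes = attributes

    def __getitem__(self, key):
        return self.attributes[key]


def archive_path(root, doc, name):
    """Build <root>/archive/<source>/<docID>/<name>, mirroring the layout above."""
    return os.path.join(root, 'archive', doc['source'], doc['docID'], name)


def write_json(path, payload):
    """Create the parent directory if needed, then dump payload as indented JSON."""
    directory = os.path.dirname(path)
    if not os.path.exists(directory):
        os.makedirs(directory)
    with open(path, 'w') as f:
        f.write(json.dumps(payload, indent=4))


# Illustrative values only; real documents come from a harvester run.
root = tempfile.mkdtemp()
raw = FakeDocument(source='biomed', docID='example-1', filetype='xml', doc='<record/>')
raw_path = archive_path(root, raw, 'raw.{}'.format(raw['filetype']))
write_json(raw_path, raw.attributes)
```

Note that a `docID` containing path separators would create nested directories with this layout, which the sketch does not guard against.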

brianjgeiger commented Apr 13, 2015

You just have to add this to scrapi/processing/storage.py, and update your local.py to include it in the lists
RAW_PROCESSING and NORMALIZED_PROCESSING:

NORMALIZED_PROCESSING = ['storage'] 
RAW_PROCESSING = ['storage']
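
To illustrate why the string 'storage' in those lists is enough, here is a rough sketch of name-based dispatch. It is an assumption about the general pattern, not scrapi's actual lookup code: `RecordingProcessor`, `build_registry`, and `run_raw` are hypothetical, and the only detail taken from the gist is that a processor class carries a `NAME` that matches the entry in the settings list.

```python
class RecordingProcessor(object):
    """Hypothetical processor that records calls instead of writing files."""

    NAME = 'storage'  # matches the name listed in the settings

    def __init__(self):
        self.seen = []

    def process_raw(self, raw):
        self.seen.append(('raw', raw))

    def process_normalized(self, raw, normalized):
        self.seen.append(('normalized', normalized))


def build_registry(*processor_classes):
    """Map each processor's NAME to an instance (assumed lookup scheme)."""
    return {cls.NAME: cls() for cls in processor_classes}


def run_raw(registry, names, raw):
    """Call process_raw on every processor named in the settings list."""
    for name in names:
        registry[name].process_raw(raw)


RAW_PROCESSING = ['storage']
NORMALIZED_PROCESSING = ['storage']

registry = build_registry(RecordingProcessor)
run_raw(registry, RAW_PROCESSING, {'docID': '1'})
```

With this scheme, a name in the list that has no matching processor raises a `KeyError`, which is one reason both the file and the settings entry have to be in place.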

walkerh commented Apr 13, 2015

invoke harvester biomed