John A. Bachman (johnbachman)

## curating_entities.rst

      
Curating entities for INDRA/Bioentities. Last active September 2, 2016 16:40.
### How to curate entities

The starting point for curation is the output of reading (e.g., from REACH), in the form of a pickled dictionary mapping each paper to a list of INDRA Statements.
The first step is to generate a list of agent texts with grounding (abbreviated "twg" in filenames). This list shows the entity texts in order of their frequency of occurrence, along with all of the identifiers each text is grounded to across the corpus; the same string is often grounded to different IDs depending on the context of the paper. You will also want the comparable list after filtering out agent texts that are already in the default grounding map.
To dump both of these files as CSV, run the `grounding_mapper` module as a top-level script on the pickled reading output. For example, for the REACH output from the batch 4 evaluation:
```bash
python -m indra.preassembler.grounding_mapper <filename>
```
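As a rough illustration of what the twg list contains, the sketch below tallies agent texts and their groundings, filters out texts already in a grounding map, and writes the result as CSV. It is not the `grounding_mapper` script's own code. It assumes INDRA-style objects (each Statement exposes `agent_list()`, and each Agent a `db_refs` dict with the raw text under `'TEXT'`) and assumes the grounding map is a dict keyed by agent text; the function names, the `SimpleNamespace` stand-ins, and the CSV layout are all hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from types import SimpleNamespace


def agent_texts_with_grounding(stmts_by_paper):
    """Tally agent texts and their groundings across a reading corpus.

    stmts_by_paper: dict of paper ID -> list of statements. Each statement is
    assumed to expose agent_list(), and each agent a db_refs dict holding the
    raw text under 'TEXT'. Returns (text, count, {grounding: count}) tuples,
    most frequent text first.
    """
    text_counts = Counter()
    groundings = defaultdict(Counter)
    for stmts in stmts_by_paper.values():
        for stmt in stmts:
            for agent in stmt.agent_list():
                if agent is None:  # optional agent slots can be empty
                    continue
                text = agent.db_refs.get('TEXT')
                if text is None:
                    continue
                text_counts[text] += 1
                # Every non-TEXT entry is part of the grounding, e.g. ('FPLX', 'ERK')
                refs = tuple(sorted((ns, id_) for ns, id_ in agent.db_refs.items()
                                    if ns != 'TEXT'))
                groundings[text][refs] += 1
    return [(text, count, dict(groundings[text]))
            for text, count in text_counts.most_common()]


def filter_known(rows, grounding_map):
    """Drop rows whose text is already a key of grounding_map (assumed dict)."""
    return [row for row in rows if row[0] not in grounding_map]


def write_csv(rows, fh):
    """Write one CSV line per text: text, count, then 'NS:ID=count' cells."""
    writer = csv.writer(fh)
    for text, count, groundings in rows:
        cells = ['%s=%d' % ('|'.join('%s:%s' % ref for ref in refs) or 'ungrounded', n)
                 for refs, n in groundings.items()]
        writer.writerow([text, count] + cells)


# Hypothetical stand-ins for INDRA Agents and Statements, for illustration only
def _agent(**db_refs):
    return SimpleNamespace(db_refs=db_refs)


def _stmt(*agents):
    return SimpleNamespace(agent_list=lambda: list(agents))


corpus = {
    'PMC1': [_stmt(_agent(TEXT='ERK', FPLX='ERK'), _agent(TEXT='MEK', FPLX='MEK'))],
    'PMC2': [_stmt(_agent(TEXT='ERK', UP='P27361'), None)],
}
rows = agent_texts_with_grounding(corpus)
out = io.StringIO()
write_csv(filter_known(rows, {'ERK': {}}), out)
print(out.getvalue())  # prints: MEK,1,FPLX:MEK=1
```

Keeping a per-text counter of groundings is what surfaces the context-dependent grounding described above: in this toy corpus, "ERK" maps to two different identifiers.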


## pmc_to_s3.rst

      
Last active August 5, 2016 18:43.
### Procedure for uploading PMC content to S3

Download the PMC content directly from PMC to an Amazon EC2 instance with sufficient storage (at least 250 GB).

Run the `ftp` command-line program:

```bash
ftp
```


Connect to the PMC FTP server and set passive mode on:
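The manual FTP session can also be scripted. The sketch below uses Python's standard `ftplib`; it assumes the PMC content sits under `pub/pmc` on NCBI's server `ftp.ncbi.nlm.nih.gov` and that the files wanted are `.tar.gz` archives. The helper names and the directory are assumptions, so check them against the current PMC FTP layout before relying on this.

```python
import os
from ftplib import FTP

# Assumptions: NCBI's public FTP host and the PMC directory on it.
PMC_HOST = 'ftp.ncbi.nlm.nih.gov'
PMC_DIR = 'pub/pmc'


def select_archives(names, suffix='.tar.gz'):
    """Return the archive names from an FTP listing, sorted for a stable order."""
    return sorted(name for name in names if name.endswith(suffix))


def download_archives(dest_dir, host=PMC_HOST, remote_dir=PMC_DIR):
    """Fetch every archive in remote_dir into dest_dir on local disk."""
    os.makedirs(dest_dir, exist_ok=True)
    with FTP(host) as ftp:  # the context manager sends QUIT on exit
        ftp.login()         # anonymous login
        # Passive mode (already ftplib's default, set here to mirror the manual
        # step): active mode needs inbound connections that EC2 security
        # groups often block.
        ftp.set_pasv(True)
        ftp.cwd(remote_dir)
        for name in select_archives(ftp.nlst()):
            local_path = os.path.join(dest_dir, name)
            with open(local_path, 'wb') as fh:
                ftp.retrbinary('RETR ' + name, fh.write)
```

A call such as `download_archives('/mnt/pmc')` would then fill a directory on the instance's large volume (the path here is hypothetical). Keeping the listing filter in a separate pure function makes it easy to narrow the download, for example to a single archive, without touching the network code.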


## s3cache.py
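```python
import boto3
from botocore.exceptions import ClientError

import hashlib
import os
import errno

def mkdir_p(path):
    """Create path and any missing parent directories, like `mkdir -p`."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # An already-existing directory is not an error; re-raise anything else
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise
```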