@bud42
Last active December 15, 2015 03:19
XNAT Spiders API Notes
{
"metadata": {
"name": "SpiderStuff"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Spider APIs"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "\n**1 assessor = 1 job = 1 PBS script = 1 PBS output = 1 exp or 1 scan** "
},
{
"cell_type": "code",
"collapsed": false,
"input": "# The Nightly loop\n\nproject_list = {'Proj1','Proj2'}\nprocessor_list = {FreesurferProcessor,DtiQaProcessor}\n\nfor project in project_list:\n for subject in subject_list:\n for session in session_list:\n for processor in processor_list:\n task = processor.getTask(session)\n updateTask(task)\n \ndef updateTask(task):\n if not task.exists():\n task.create() # create the assessor in XNAT\n if task.hasInputs():\n # Create the pbs and submit it\n pbs_filename = '/path/to/pbs/filename'\n task.writePbs(pbs_filename)\n jobID = cluster.submit(pbs_filename)\n task.setStatus('JOB_RUNNING') # set procstatus in XNAT\n jobID = task.getJobIID()\n task.setJobID(jobID) # set jobid in XNAT\n else:\n task.setStatus('MISSING_INPUTS') # set procstatus in XNAT\n else: \n xnat_status = task.getStatus() # get from procstatus in XNAT\n if xnat_status = 'Complete':\n print 'complete, nothing to do'\n elif xnat_status = 'JOB_RUNNING':\n # check the job\n jobID = task.getJobID()\n job = cluster.getJob(jobID)\n if job.isRunning():\n print 'job still running, nothing to do'\n else:\n # what do we do here?\n",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**Python Data Types (classes):** \nProject (pyxnat) \nExperiment (pyxnat) \nScan (pyxnat) \nAssessor (pyxnat) \nGenericJob \nScanJob \nExperimentJob \nAssessorJob \nProjectJob \n\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**XNAT Processing Data Type (genProcData) Fields:** \nprocType \nprocStatus \nwalltimeused \nmemused \nscans \n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "**Build Status** (stored in genProcData/procstatus???) \n BUILD_COMPLETE ==> complete assessor already exists in XNAT \n PRE_RUNNING ==> prequisites not complete in XNAT \n PRE_COMPLETE ==> prequisites complete, no assessor yet in XNAT \n JOB_COMPLETE ==> partial assessor exists in XNAT & job is complete (job.completed exists & output exists) \n XVFB_ERROR ==> partial assessor exists in XNAT, job normally terminated w/o result (job.completed exists & output folder has 0 files) \n JOB_ERROR ==> partial assessor exists in XNAT, job terminated with job.error \n JOB_RUNNING ==> partial assessor exists in XNAT, job still running \n\n\n"
},
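{
"cell_type": "markdown",
"metadata": {},
"source": "A minimal sketch of the statuses above as a transition map. Which transitions are legal is an assumption, not stated in these notes; MISSING_INPUTS comes from the nightly loop rather than the status list."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Hypothetical sketch: the build statuses above as a transition map\n# (which transitions are legal is an assumption, not stated in the notes)\nTRANSITIONS = {\n    'PRE_RUNNING': ['PRE_COMPLETE'],\n    'PRE_COMPLETE': ['JOB_RUNNING', 'MISSING_INPUTS'],\n    'JOB_RUNNING': ['JOB_COMPLETE', 'JOB_ERROR', 'XVFB_ERROR'],\n    'JOB_COMPLETE': ['BUILD_COMPLETE'],\n}\n\ndef can_advance(current, target):\n    # True if target is a legal next status from current\n    return target in TRANSITIONS.get(current, [])\n\nprint can_advance('JOB_RUNNING', 'JOB_ERROR')  # True\n",
"language": "python",
"metadata": {},
"outputs": []
},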
{
"cell_type": "markdown",
"metadata": {},
"source": "**API should provide:** \n\n-Run this task on everything in this project... \n-does this task run per scan, exp, subj? \n-Can this process run on this scan/exp/subj, ie it eligible to run (i.e. has required inputs)? \n-what are inputs to this process? \n-Has this process already been run on this scan/exp/subj? or rather status of the job? \n\n\nAPI should allow user to build a loop that can either \nA. Run once for all projects/exp/scan and check each processing type at each exp/scan \nB. Run a single processing type on a single project \nC. Run all processing types on a single project \n"
},
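{
"cell_type": "markdown",
"metadata": {},
"source": "Loop modes B and C above could look like the following sketch. get_sessions is a hypothetical helper that is not in these notes; the processor names and updateTask come from the nightly loop."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# B. Run a single processing type on a single project (hypothetical sketch)\nfor session in get_sessions('Proj1'):  # get_sessions is an assumed helper\n    updateTask(DtiQaProcessor.getTask(session))\n\n# C. Run all processing types on a single project\nfor session in get_sessions('Proj1'):\n    for processor in [FreesurferProcessor, DtiQaProcessor]:\n        updateTask(processor.getTask(session))\n",
"language": "python",
"metadata": {},
"outputs": []
},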
{
"cell_type": "markdown",
"metadata": {},
"source": "**Other Notes:** \n-PBS not created or run unless we've determined inputs exist, currently we don't know input existence until we try to download them within PBS (e.g., no REST or no T1 or no NIFTI format)\n"
},
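{
"cell_type": "markdown",
"metadata": {},
"source": "One way to address the note above is to check input existence with pyxnat before writing the PBS. A hedged sketch only: the server URL, credentials, and project/subject/experiment/scan IDs are placeholder assumptions."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Hypothetical pre-check: verify inputs exist in XNAT before writing a PBS\n# (server URL, credentials, and IDs below are placeholders)\nfrom pyxnat import Interface\n\nxnat = Interface(server='https://xnat.example.org', user='user', password='pass')\nnifti = xnat.select('/project/Proj1/subject/Subj1/experiment/Exp1/scan/1/resource/NIFTI')\nif nifti.exists() and len(nifti.files().get()) > 0:\n    print 'inputs present, safe to write the PBS'\nelse:\n    print 'MISSING_INPUTS'\n",
"language": "python",
"metadata": {},
"outputs": []
},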
{
"cell_type": "code",
"collapsed": false,
"input": "# abstract classes\nclass GenericSpiderJob: \n\n #methods:\n jobType() # does this job run on a proj/subj/exp/scan?\n\n canRun() # does this proj/subj/exp/scan meet the requirements\n\n hasRun() # has this job already been run on this proj/subj/exp/scan\n\n getStatus() # what is the status of this job for this proj/subj/exp/scan\n\n getWalltime() # what amount of time is required to run this job\n\n setWalltime(days,hrs,mins) # set time required to run this job\n\n getMemReq() # what amount of memory is required to run this job \n\n setMemReq(GBs,MBs) # set memory required to run this job\n\n run # run this job \n\n writePBS(filename) # write the PBS file for this job\n\n submitPBS(filename) # qsub this PBS file\n\nclass SessionSpiderJob # runs on a single session\nclass ScanSpiderJob # runs on a single scan\nclass SubjectSpiderJob # runs on a single subject\nclass ProjectSpiderJob # runs on a single project\n\n# instantiable classes\nclass FreesurferSpiderJob\nclass DtiQaSpiderJob\nclass FrmiQaSpiderJob\nclass CcmRsFmriPreprocSpiderJob\nclass CcmTbFmriPreprocSpiderJob\nclass SpiderProcessHandler\n",
"language": "python",
"metadata": {},
"outputs": []
},
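{
"cell_type": "markdown",
"metadata": {},
"source": "A hypothetical usage of the job classes sketched above, using the method signatures as listed. The PBS path and the walltime/memory values are placeholders."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Hypothetical usage of the job classes sketched above\n# (PBS path and resource values are placeholders)\njob = FreesurferSpiderJob()\njob.setWalltime(1, 12, 0)  # 1 day, 12 hrs, 0 mins\njob.setMemReq(4, 0)        # 4 GB, 0 MB\nif job.canRun() and not job.hasRun():\n    job.writePBS('/path/to/freesurfer.pbs')\n    job.submitPBS('/path/to/freesurfer.pbs')\n",
"language": "python",
"metadata": {},
"outputs": []
},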
{
"cell_type": "markdown",
"metadata": {},
"source": "**Random Thoughts:** \nCould have REST url in XNAT where a POST fires a process to update, i.e. new data is ready for a subject or exp, so we have a process that checks the status of each type of data being produced and advances to the next step as needed, this process will be manually run each time we walk the tree AND can be triggered by a post, this way the data flow doesn't get interrupted by having to wait for the next time the tree is walked"
}
],
"metadata": {}
}
]
}