- The config.json File
- The Query Files
- The Template Files
- Set Up the Oracle Client and Environment Variables
Copy config.json.sample
to config.json
and set the variables as described below. (or skip to a sample config file)
-
publish_to
can beFILE
orCOMPORT
. If it'sFILE
, extracted data will be written to a local CSV file, perfect for testing the output of queries. Once queries are finalized, setting it toCOMPORT
will send data to the Comport web application. -
extract_from_date
is a date in the formatmonth/year
, like4/2014
. Ifpublish_to
is set toFILE
, this date will be used as the start date for extraction. ifpublish_to
is set toCOMPORT
, this date will be ignored, and the extractor will get its start date from the Comport web application. -
comport_baseurl
is the URL of the Comport app, probablyhttps://www.projectcomport.org/
-
comport_username
andcomport_password
are the credentials the extractor will use to authenticate with the Comport app, you should receive them from the administrator. -
comport_key
is a string that'll be used to hash values that should be anonymized. -
oracle_host
is the address of the Oracle server. -
oracle_port
is the port to connect to on the server, probably1521
-
oracle_database
is the name of the database on the server that contains the data to be extracted. -
oracle_username
andoracle_password
are the credentials the extractor will use to authenticate with the SQL database. -
queries_dir
is the directory wheresql
files containing the database queries are kept. -
templates_dir
is the directory wherejson
files containing the data templates are kept. These templates describe how the results of the queries are processed and mapped. -
proxy_http
andproxy_https
are the URLs of proxy servers for http and https traffic. Thecomport_username
andcomport_password
will be automatically added to the URLs if necessary. -
datasets
is a list of objects that each describe a dataset to be extracted, transformed, and loaded. Each object has:-
name
a short, unique name that describes the data set, like "complaints" -
query_filename
the name of thesql
file containing a database query, kept in thequeries_dir
. See The Query Files for an example of what the query files should look like. -
template_filename
the name of thejson
file containing a dataset template, kept in thetemplates_dir
. See The Template Files for an example of what the template files should look like. -
endpoint
the endpoint on the Comport site that the data will be sent to, like "data/complaints"
-
When your config file is filled out, it should look something like this.
{
"publish_to": "FILE",
"extract_from_date": "1/2005",
"comport_baseurl": "https://www.projectcomport.org/",
"comport_username": "Sample_Police_Department-extractor",
"comport_password": "f5e7db81-678d-451b-8bc0-029c2687206a",
"comport_key": "grgF5u9ZTF2yALj",
"oracle_host": "hostname",
"oracle_port": 1521,
"oracle_database": "DBNAME",
"oracle_username": "username",
"oracle_password": "password",
"queries_dir": "queries",
"templates_dir": "templates",
"proxy_http": "http://proxy.example.com/",
"proxy_https": "http://proxy.example.com/",
"datasets": [
{
"name": "complaints",
"query_filename": "query-complaints.sql",
"template_filename": "template-complaints.json",
"endpoint": "data/complaints"
},
{
"name": "ois",
"query_filename": "query-ois.sql",
"template_filename": "template-ois.json",
"endpoint": "data/OIS"
},
{
"name": "uof",
"query_filename": "query-uof.sql",
"template_filename": "template-uof.json",
"endpoint": "data/UOF"
}
]
}
Each dataset requires a query file and a template file. The locations and names of these files are defined in config.json. The query file contains a SQL query that is used to extract data from the database. Here's a very simple example that pulls data from the IADATA.INCIDENTS
table:
SELECT
IADATA.INCIDENTS.incnum,
IADATA.INCIDENTS.service_type,
IADATA.INCIDENTS.occurred_dt,
IADATA.INCIDENTS.occurred_tm,
IADATA.INCIDENTS.udtext24a,
IADATA.INCIDENTS.udtext24b,
IADATA.INCIDENTS.udtext24c,
IADATA.INCIDENTS.udtext24d,
IADATA.INCIDENTS.disposition,
IADATA.INCIDENTS.citnum
FROM IADATA.INCIDENTS
WHERE IADATA.INCIDENTS.incident_type = 'Citizen complaint'
AND IADATA.INCIDENTS.occurred_dt >= to_date(:dt, 'YYYY-MM-DD')
Note the :dt
on the last line of the query. When the query is run, that :dt
will be replaced with the extraction start date.
Each dataset requires a query file and a template file. The locations and names of these files are defined in config.json. The template file contains a JSON object that is used to transform data extracted from the database by the corresponding query. Here's a very simple example that transforms data extracted by the query above:
[
{"name": "opaqueId", "row_index": 0, "process_as": "anonymous"},
{"name": "serviceType", "row_index": 1, "process_as": "string"},
{"name": "occuredDate", "row_index": 2, "process_as": "date"},
{"name": "occuredTime", "row_index": 3, "process_as": "date"},
{"name": "division", "row_index": 4, "process_as": "string"},
{"name": "precinct", "row_index": 5, "process_as": "string"},
{"name": "shift", "row_index": 6, "process_as": "string"},
{"name": "beat", "row_index": 7, "process_as": "string"},
{"name": "disposition", "row_index": 8, "process_as": "string"},
{"name": "residentId", "row_index": 9, "process_as": "anonymous", "process_with": "opaqueId"}
]
Each row in the JSON file corresponds to a selection from the query. So the first item in the query selection (IA_ADM.INCIDENTS.incnum
) is at index 0
; this template file maps that value using the row_index
variable, names it opaqueID
, and instructs that it should be processed as an anonymous
variable. The fifth item in the query selection (IA_ADM.INCIDENTS.udtext24a
) is at index 4
; it's named division
here and processed as a string
variable.
The accepted values for process_as
are:
date
- this value will be treated as a dateboolean
- this value will be treated as a boolean valueanonymous
- this value will be anonymized (used for id numbers)string
- this value will be treated as a string value. Numerical values should also be processed as strings. If a value isn't adate
,boolean
, oranonymous
value, it should be astring
.
If you have a value that you want to anonymize within the context of a single incident, you can also use the process_with
parameter as shown above. Using the unique incident ID (opaqueId
) to anonymize the unique resident ID (residentId
) will keep a single resident from tracked across incidents.
The template defines how an incident will look when it's written to disk, or posted to a web service. A CSV file generated using the above query and template would look like this:
opaqueId,serviceType,occuredDate,occuredTime,division,precinct,shift,beat,disposition,residentId
11db42a5d058a4033316e5872875e2d0,Arresting,2007-05-08 00:00:00,2007-05-08 12:37:00,Operational Bureau,Fourth Precinct,C.O.P. Program,Days,Within policy,2d2ff48f831256de10690246c2b7b0de
A JSON file posted to a web service using the above query and template would look like this:
{
"data": [
{
"opaqueId": "11db42a5d058a4033316e5872875e2d0",
"serviceType": "Arresting",
"occuredDate": "2007-05-08 00:00:00",
"occuredTime": "2007-05-08 12:37:00",
"division": "Operational Bureau",
"precinct": "Fourth Precinct",
"shift": "C.O.P. Program",
"beat": "Days",
"disposition": "Within policy",
"residentId": "2d2ff48f831256de10690246c2b7b0de"
}
]
}
If you don't already have an Oracle client installed on the machine:
- download and install the Oracle Instant Client. Get the Instant Client Package - Basic Lite and the Instant Client Package - SDK packages appropriate for your system and database version.
- Unzip the Instant Client into a directory, like
C:\Oracle\instantclient_11_1
- Unzip the SDK and copy the
sdk
folder into that directory; so it will be atC:\Oracle\instantclient_11_1\sdk
Create an ORACLE_HOME
environment variable that contains the Instant Client path:
Control Panel
->System and Security
->System
- Click
Advanced System Settings
in the left panel - Click the
Environment Variables...
button - Click the
New...
button in either User or System variables. - Create a new variable named
ORACLE_HOME
and set its value to the client path, likeC:\Oracle\instantclient_11_1\
And add that same path to the end of the PATH
environment variable:
- In the Environment Variables window (after steps 1-3 above), select the
Path
system variable in the bottom scroll window and clickEdit...
- To the end of
Variable
value, add the client path, like;C:\Oracle\instantclient_11_1\
If you already have an Oracle client installed on the machine, set ORACLE_HOME
and append to PATH
the appropriate path, like C:\oracle\product\11.1.0\client_1\