@dbreunig
Last active January 22, 2021 16:07
A description of the data written to the Reporter App Dropbox save folder.

#Reporter Save File Schema

##The Reporter Export File

Reporter saves to your Dropbox account as plaintext JSON files, one for each day. When a report is entered in the app, a file is created for that day if it does not exist. Otherwise, the report is appended to the existing file. The save folder is located in 'Dropbox/Apps/Reporter-App/'.

Reporter save files are named according to the following convention:

YYYY-MM-DD-reporter-export.json

So, a Reporter file for February 28th, 2014 could be found at the following path:

~/Dropbox/Apps/Reporter-App/2014-02-28-reporter-export.json

Provided, of course, your Dropbox folder is contained within your home directory.
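As a minimal sketch of the naming convention, the path for a given day could be built like this. The function name `export_path` and the constant `DEFAULT_SAVE_DIR` are hypothetical, and the default assumes your Dropbox folder sits in your home directory.

```python
from datetime import date
from pathlib import Path

# Assumption: Dropbox lives directly inside the home directory,
# as in the example path above.
DEFAULT_SAVE_DIR = Path.home() / "Dropbox" / "Apps" / "Reporter-App"


def export_path(day: date, save_dir: Path = DEFAULT_SAVE_DIR) -> Path:
    """Return the expected export file path for the given day."""
    # date.isoformat() yields YYYY-MM-DD, matching the file name convention.
    return save_dir / f"{day.isoformat()}-reporter-export.json"


print(export_path(date(2014, 2, 28)).name)
```

Pass a different `save_dir` if your Dropbox folder lives elsewhere.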

###The Snapshots Array

The root object of the document is a dictionary containing a single array, "snapshots". The "snapshots" array is a collection of all the reports from that day, in order of their entry.
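Reading the array back is a one-liner with any JSON parser. The sketch below uses a made-up two-snapshot sample rather than real export data, and `load_snapshots` is a hypothetical helper name.

```python
import json


def load_snapshots(text: str) -> list:
    """Parse one export file's contents and return its snapshots in entry order."""
    document = json.loads(text)
    # Defensive fallback: the schema says "snapshots" is always present,
    # but an empty list is easier to handle than a KeyError.
    return document.get("snapshots", [])


# Made-up sample for illustration only.
sample = '{"snapshots": [{"battery": 0.5}, {"battery": 0.75}]}'
for snapshot in load_snapshots(sample):
    print(snapshot["battery"])
```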

Why "snapshots" and not "reports"? In the original version of Reporter that Felton and I built, we differentiated between the survey "report" and the data captured passively, like battery, location, and weather. The "snapshot" object was created to unite these two objects. The name stuck.

####A Snapshot

Each snapshot contains a set of passive metrics gathered by the device without the user's input, any survey responses entered, and a bit of metadata.

When we save a snapshot to Dropbox we write out a JSON version of the entire object. As a result there are a handful of unused properties written. We'll cover these in the Metadata section.

###Passive Metrics

#####Battery

The battery key refers to a double numerical value, between 0 and 1, reflecting the power stored in the iPhone's battery at the time of report.

#####Location

The location dictionary is essentially a CoreLocation CLLocation object, with a CLPlacemark embedded. Refer to the linked documentation for each class for details on their properties.

The placemark object is the result of reverse geocoding the latitude and longitude derived from iOS's location services. It will often get addresses wrong, but will usually be accurate with ZIP, county, neighborhood, city, and state attributes.

For example, see this output:

{
  "location" : {
    "verticalAccuracy" : 10,
    "timestamp" : 413471577.349213,
    "longitude" : -73.94920271473413,
    "latitude" : 40.71048222608454,
    "course" : 0,
    "placemark" : {
      "thoroughfare" : "Lorimer St",
      "postalCode" : "11206",
      "subAdministrativeArea" : "Kings",
      "subLocality" : "Williamsburg",
      "subThoroughfare" : "451",
      "region" : "<+40.71058351,-73.94912062> radius 23.22",
      "locality" : "New York",
      "name" : "451 Lorimer St",
      "country" : "United States",
      "administrativeArea" : "NY"
    },
    "horizontalAccuracy" : 65,
    "speed" : -1
  }
}

At the time, I was in Williamsburg and very close to 451 Lorimer St, but not actually within said building.
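Since the coarser placemark fields tend to be more reliable than the street address, a consumer might keep only those. This is a rough sketch using the keys from the example above; `summarize_location` and its output field names are hypothetical.

```python
def summarize_location(location: dict) -> dict:
    """Keep the coordinates and the coarse placemark fields of a location."""
    # The placemark may be absent, so fall back to an empty dict.
    placemark = location.get("placemark", {})
    return {
        "latitude": location["latitude"],
        "longitude": location["longitude"],
        # CLLocation reports accuracy in meters.
        "accuracy_m": location.get("horizontalAccuracy"),
        "neighborhood": placemark.get("subLocality"),
        "city": placemark.get("locality"),
        "county": placemark.get("subAdministrativeArea"),
        "state": placemark.get("administrativeArea"),
        "zip": placemark.get("postalCode"),
    }


example = {
    "longitude": -73.94920271473413,
    "latitude": 40.71048222608454,
    "horizontalAccuracy": 65,
    "placemark": {
        "postalCode": "11206",
        "subAdministrativeArea": "Kings",
        "subLocality": "Williamsburg",
        "locality": "New York",
        "administrativeArea": "NY",
    },
}
print(summarize_location(example))
```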

#####Steps

The steps property provides a single numerical value reflecting the number of steps taken between the last report filed and the current report. It is only captured if the user is using an iPhone 5S, which features the M7 motion coprocessor. We at Reporter Inc have decided not to capture steps on other devices because implementing background step-counting without the M7 is non-trivial and tends to burn battery if you're not careful (read: invest many hours in fine tuning your code for a variety of situations).

#####Audio

Audio is measured in decibels, which is "a logarithmic unit used to express the ratio between two values of a physical quantity, often power or intensity." Because it is easier to define a reference sound at the upper limit (where the microphone is overloaded and "clips"), decibels are often expressed as negative values. This is true for the iPhone, so the values that are delivered in this property are the raw output from the iOS CoreAudio API, reflecting the average and peak volume recorded over a single second.

The lower the number, the quieter the noise. The closer the number is to zero (where the audio would clip), the louder the ambient noise.
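If you want a positive, display-style number, the discussion at the end of this page quotes a very rough approximation used for the app's display: `(x + 65) * 2`, where `x` is the raw value (-160 dB to 0 dB). A sketch of that formula follows; the function name is hypothetical, and the result is not a calibrated sound level.

```python
def display_decibels(raw_db: float) -> float:
    """Apply the rough display approximation (x + 65) * 2 to a raw value.

    raw_db is the raw reading from the JSON, between -160 and 0.
    The result is only a plausible-looking display value, not a
    calibrated measurement.
    """
    return (raw_db + 65) * 2


print(display_decibels(-40.0))
```

Note that raw values below -65 come out negative under this formula; apply your own correction or calibration if that matters for your use.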

#####PhotoSet

If the user has taken photos between reports, there will be a photoSet dictionary with a single array of photos written to the snapshot. Each photo object contains the EXIF metadata of the photo. Additionally, the photo object contains a link to the photo asset within iOS. Currently, this information is unused within the Reporter application and is not of much use outside the iOS system.

#####Connection

The connection attribute indicates the current network connection of the device. Its value corresponds to the following states:

  • 0: Device is connected via cellular network
  • 1: Device is connected via WiFi
  • 2: Device is not connected
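The three states above map naturally onto an enumeration. The class and member names here are hypothetical; only the integer values come from the list.

```python
from enum import IntEnum


class Connection(IntEnum):
    """Network state codes as listed above (member names are illustrative)."""
    CELLULAR = 0
    WIFI = 1
    NOT_CONNECTED = 2


# Made-up snapshot fragment for illustration.
snapshot = {"connection": 1}
print(Connection(snapshot["connection"]).name)
```

Constructing `Connection` from a value outside 0 to 2 raises `ValueError`, which makes unexpected codes easy to spot.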

#####Weather

The weather dictionary is perhaps the most self-explanatory of the data captured. Dictionary keys are descriptive, detailing the metric and the units used.

For example:

"weather" : {
  "windMPH" : 4,
  "windDirection" : "NW",
  "tempF" : 24.8,
  "precipTodayIn" : 0,
  "windGustKPH" : 20.9,
  "feelslikeC" : -7,
  "visibilityMi" : 10,
  "feelslikeF" : 20,
  "stationID" : "KNYBROOK49",
  "latitude" : 40.69474,
  "windGustMPH" : 13,
  "pressureIn" : 30.31,
  "pressureMb" : 1026,
  "relativeHumidity" : "65%",
  "longitude" : -73.928444,
  "precipTodayMetric" : 0,
  "windKPH" : 6.4,
  "windDegrees" : 318,
  "tempC" : -4,
  "weather" : "Clear",
  "uv" : 0,
  "dewpointC" : -9,
  "visibilityKM" : 16.1
}

Currently, weather data is being captured via the Weather Underground API, whose reference can be found here.

#####Date

In the latest version of Reporter, the timestamp for the report is written out in the following format:

"date" : "2014-03-02T10:18:38-0500"

In previous versions of Reporter, the timestamp was written out as such:

"date" : 412523198.401702

The ethos of Reporter App is to provide you with as much data as possible, in its raw state, if you desire it. However, this is an instance where we took this a bit too far.

The old value of date is the number of seconds that have elapsed since January 1st, 2001 GMT (Obvious, no?). This is the reference date chosen by Apple in their implementation of NSDate.
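A parser that has to read files from both eras can branch on the value's type. The following is a minimal sketch, assuming the newer format always carries a numeric UTC offset as in the example; `parse_report_date` and `REFERENCE_DATE` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# NSDate's reference date: January 1st, 2001, 00:00:00 GMT.
REFERENCE_DATE = datetime(2001, 1, 1, tzinfo=timezone.utc)


def parse_report_date(value) -> datetime:
    """Parse either date format into a timezone-aware datetime."""
    if isinstance(value, str):
        # Newer format, e.g. "2014-03-02T10:18:38-0500".
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    # Older format: seconds elapsed since the reference date.
    return REFERENCE_DATE + timedelta(seconds=float(value))


print(parse_report_date("2014-03-02T10:18:38-0500"))
print(parse_report_date(412523198.401702))
```

The older values carry no time zone, so they come back in UTC, while the newer ones keep the offset that was written to the file.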

#####ReportImpetus

The attribute reportImpetus indicates how the report was triggered. The value for this attribute corresponds to the following events:

  • 0: Report button tapped
  • 1: Report button tapped while Reporter is asleep
  • 2: Report triggered by notification
  • 3: Report triggered by setting app to sleep
  • 4: Report triggered by waking up app
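As an illustration, the codes above can be turned into labels to tally how a day's reports were triggered. The label strings and the helper name `count_impetus` are hypothetical; the sample snapshots are made up.

```python
from collections import Counter

# Labels paraphrase the list above; the integer codes come from the schema.
REPORT_IMPETUS = {
    0: "button",
    1: "button while asleep",
    2: "notification",
    3: "set to sleep",
    4: "woke up",
}


def count_impetus(snapshots: list) -> Counter:
    """Count snapshots by how they were triggered."""
    return Counter(
        # Missing or unrecognized codes are grouped under "unknown".
        REPORT_IMPETUS.get(snapshot.get("reportImpetus"), "unknown")
        for snapshot in snapshots
    )


# Made-up sample for illustration only.
day = [{"reportImpetus": 2}, {"reportImpetus": 0}, {"reportImpetus": 2}, {}]
print(count_impetus(day))
```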

###Responses

Any information entered by the user in Reporter survey questions will be contained within the responses array. Each question answered is captured as a single dictionary within the array, containing the questionPrompt and the user input or selected responses.

The locationResponse type is the sole exception to this pattern, as it includes the current location data from the iOS location services API and a foursquareVenueId, which is provided by the Foursquare Venues Platform API.

If a question is not answered, it will not be written to the array.

Below are examples of the data stored within the responses array for each question type.

######Token Response

{
  "questionPrompt" : "What are you doing?",
  "tokens" : [
    "Talking",
    "Drinking"
  ]
}

######Multiple Choice Response

{
  "questionPrompt" : "What is your energy?",
  "answeredOptions" : [
    "Neutral"
  ]
}

######Yes/No Response

{
  "questionPrompt" : "Are you working?",
  "answeredOptions" : [
    "No"
  ]
}

######Location Response

{
	"questionPrompt" : "Where are you?",
	"locationResponse" : {
	  "text" : "Welcome to the Johnsons",
	  "location" : {
	    "verticalAccuracy" : -1,
	    "timestamp" : 413520066.367849,
	    "longitude" : -73.98724124700989,
	    "latitude" : 40.71978727764392,
	    "course" : 0,
	    "horizontalAccuracy" : 0,
	    "speed" : -1
	  },
	  "foursquareVenueId" : "3fd66200f964a520dfe31ee3"
	}
}

######People Response

{
  "questionPrompt" : "Who are you with?",
  "tokens" : [
    "Megan Schwartz",
    "Chris Reagan",
    "Josh Behr"
  ]
}

######Number Response

{
  "numericResponse" : "0",
  "questionPrompt" : "How many coffees have you had?"
}

######Note Response

{
  "questionPrompt" : "What are you thinking about?",
  "textResponse" : "I'm likely too old for this bar."
}
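Because each question type stores its answer under a different key, a consumer may want to normalize them. The sketch below flattens any of the response shapes shown above into a list of strings. `response_values` is a hypothetical helper, and its handling of token objects with a "text" key is based on an unconfirmed report in the comments on this page.

```python
def response_values(response: dict) -> list:
    """Return a response's answers as a flat list of strings."""
    if "tokens" in response:
        # Tokens are plain strings in the examples above. A comment on this
        # page reports objects with a "text" key instead (unconfirmed), so
        # both shapes are accepted here.
        return [t["text"] if isinstance(t, dict) else t for t in response["tokens"]]
    if "answeredOptions" in response:
        return list(response["answeredOptions"])
    if "locationResponse" in response:
        text = response["locationResponse"].get("text")
        return [text] if text is not None else []
    for key in ("numericResponse", "textResponse"):
        if key in response:
            return [response[key]]
    # Unknown shape: nothing to report.
    return []


print(response_values({"questionPrompt": "What are you doing?",
                       "tokens": ["Talking", "Drinking"]}))
```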

###Metadata

The entirety of the data contained within a given report is written to the snapshots array. We have done this for completeness and in case new features are added. Rather than having a complex set of heuristics for writing out data, all attributes come along for the ride. This means some attributes are meaningless when written out, specifically:

  • Sync: This is a state variable to ensure each report is saved to Dropbox. It will always be 0 because once it is 1 (or true) the app will not attempt to write it to Dropbox.
  • Background: A state variable indicating the report was captured in the background. We are not capturing reports in the background. Therefore, this attribute is not in use.
  • DwellStatus: Debug variable. Not in use.
  • Draft: A state variable indicating the report is being edited. If it is, it won't be saved. Therefore, this will always be 0.
  • SectionIdentifier: A convenience variable used by the application when displaying reports in a UITableView.
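If these attributes get in the way, they can be dropped after loading. This sketch assumes the keys appear in lowerCamelCase in the JSON (for example `sectionIdentifier`), which is not confirmed by the list above; adjust `UNUSED_KEYS` to match your files.

```python
# Assumption: the attributes listed above are written in lowerCamelCase.
UNUSED_KEYS = {"sync", "background", "dwellStatus", "draft", "sectionIdentifier"}


def strip_unused(snapshot: dict) -> dict:
    """Return a copy of the snapshot without the unused metadata keys."""
    return {key: value for key, value in snapshot.items() if key not in UNUSED_KEYS}


# Made-up sample for illustration only.
print(strip_unused({"battery": 0.5, "sync": 0, "draft": 0}))
```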
@ejain

ejain commented Mar 9, 2014

Which fields can I rely on always being there?

@jakedahn

Would it be possible to add versioning to this JSON schema? There have already been changes in some of the fields (for example dates are now ISO8601 vs. unix timestamp).

If I write a tool to consume this JSON, and the app changes its output format in two weeks my tool will be broken and I will need to write a lot of conditional statements about accepting different types of values from different types of properties.

Versioning would help me pin a parser to all of the different changes. For example one year from now after I have collected all of my data and I try to import all of it into my code, it would be easier to say that version 1 does things one way, and version 2 formats things a different way.

@aliou

aliou commented May 29, 2014

@ethnt

ethnt commented Sep 6, 2014

+1 on versioning as well. Every parser I've tried to use has failed to parse the data because of version differences.

@raichur

raichur commented Sep 22, 2014

How do I convert the audio levels supplied in the JSON file to a regular decibel level?

@treasuretron

+1 schema versioning

@tamvodopad

@raichur Decibels are typically listed on a negative scale. This is how Apple handles the data, and Reporter saves it to the JSON file as is.

I asked N. Feltron: "How do you convert this data to positive decibels in the app GUI?"

Feltron answered:

Here's the conversion we are using:
The raw value from Apple (-160 dB to 0 dB) is in the JSON output. Simply adding to shift the scale makes for nonsense “dBA” values (e.g. 105 dB in a quiet room). We very roughly approximated our display value so it seemed reasonable in this way:
(x + 65) * 2
where x is the raw value Apple gives us, again, -160 dB to 0 dB.
You can still use the raw values from Apple (in JSON) and apply any correction or calibration as they see to be appropriate.

@ejain

ejain commented Dec 8, 2014

Looks like tokens are now represented as objects like { "uniqueIdentifier" : "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "text" : "yyy" }. Can someone confirm this?

@benmathes

Has anyone built any tools to visualize your own data, not individually but perhaps chart relationships and regress for correlates? I might enjoy this as a side project of my own, but don't want to duplicate work.

For example, graph correlating answers. Or: Detect that you're only sad when you haven't slept or had coffee.
