@chloemar10
Last active February 18, 2018 20:09

CHLOE MARTEN
QUANT HUMANISTS
SPRING 2018
05 02 2018

Assignment 2: Document Your Methodology, link to assignment

Introduction

This past week, I further developed the questions I seek to answer as part of my final self-tracking project. Overall, I aim to interpret what my financial data says about me by exploring the following questions:

  • What do I spend my money on?
    • Categorical Spends
  • Where do I spend money?
    • Geographic Location
    • Places/Vendors
  • How has my spending shifted over time?
    • Increased/Decreased Spending in Certain Categories
  • How have my financial habits changed over time?
    • Ratio of Money Spent
    • Ratio of Money Saved/Invested

My methodology for answering these questions is outlined below.

Phase 1 - Data Gathering

To answer the above questions, I will need to track my financial transactions. I have decided to take an automated approach to gathering my data since I have limited time on a daily basis. To source my financial transaction data I will be using Plaid (https://plaid.com/products/transactions/), from which I can export JSON files. I am currently waiting on Plaid to grant me development access so that I can link my banking and credit card accounts. In the meantime, I was able to use their sandbox data to get a feel for how it works and set up my workflow. One concern I have about using Plaid is that it may not be able to source all my historical transactions. Once I am able to link my accounts, I need to figure out how far back I can go.
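Once real data is linked, one way to see how far back the history goes is to look at the earliest and latest dates in an export. This is a minimal sketch, assuming the export has the same shape as the sandbox file (a list of transaction records, each with a "date" in YYYY-MM-DD form). The function name `date_coverage` and the sample records are made up for illustration and are not part of Plaid.

```python
import datetime


def date_coverage(transactions):
    """Return (earliest, latest, span_in_days) for a list of transaction dicts.

    Assumes each transaction has a "date" string formatted as YYYY-MM-DD.
    Returns None when the list is empty.
    """
    dates = [
        datetime.datetime.strptime(t["date"], "%Y-%m-%d").date()
        for t in transactions
    ]
    if not dates:
        return None
    earliest, latest = min(dates), max(dates)
    return earliest, latest, (latest - earliest).days


# Made-up records for illustration only
sample = [
    {"date": "2018-01-24", "amount": -500},
    {"date": "2017-11-02", "amount": 12.5},
]
coverage = date_coverage(sample)
print(coverage)
```

If the span turns out shorter than expected, that would suggest the service only returns a limited window of history for that account.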

Phase 2 - Data Manipulation

Using the sandbox data Plaid provides, I went through the process of cleaning up the raw JSON data to isolate needed data points. In order to answer my questions, I will need to track the following transaction data:

  • Date
  • Amount/Balance
  • Name
  • Location
  • Category
 {
      "amount": -500,
      "category": [
        "Travel",
        "Airlines and Aviation Services"
      ],
      "date": "2018-01-24",
      "location": {
        "address": null,
        "city": null,
        "lat": null,
        "lon": null,
        "state": null,
        "store_number": null,
        "zip": null
      },
      "name": "United Airlines"
 }
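Isolating those data points could look roughly like the sketch below, which flattens one raw record into a single row. It assumes the field names shown in the sandbox record above. The helper name `flatten_transaction` and the choice to join category labels with " > " are illustrative assumptions, not anything Plaid prescribes.

```python
def flatten_transaction(t):
    """Pick out the fields of interest from one raw transaction dict.

    Missing keys and null values come back as None rather than raising.
    """
    location = t.get("location") or {}
    category = t.get("category") or []
    return {
        "date": t.get("date"),
        "amount": t.get("amount"),
        "name": t.get("name"),
        "city": location.get("city"),
        "state": location.get("state"),
        "lat": location.get("lat"),
        "lon": location.get("lon"),
        # Broadest label first, most specific last
        "category": " > ".join(category) if category else None,
    }


# Same values as the sandbox record above
raw = {
    "amount": -500,
    "category": ["Travel", "Airlines and Aviation Services"],
    "date": "2018-01-24",
    "location": {"address": None, "city": None, "lat": None, "lon": None,
                 "state": None, "store_number": None, "zip": None},
    "name": "United Airlines",
}
row = flatten_transaction(raw)
print(row)
```

Because the sandbox location fields are all null, the location columns stay empty here. Real account data may or may not fill them in, which matters for the geographic questions.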

This week, I worked on importing a JSON file from Plaid and then using Jupyter Notebook (http://jupyter.org/) to clean it up and identify the needed data points. I got as far as pulling out the “amount” and “date” fields with the intention of plotting them in a line graph.

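import datetime
import json

# Load the JSON file exported from Plaid
with open('transactions.json') as f:
    data = json.load(f)

# Pull the amount and the parsed date out of each transaction
transactions = data["transactions"]
amounts = [t["amount"] for t in transactions]
mydates = [datetime.datetime.strptime(t["date"], "%Y-%m-%d") for t in transactions]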
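A line graph of every individual transaction may be noisy, so one option is to total the amounts per month before plotting. The following is a sketch under the same assumptions about the record shape. The function name `monthly_totals` and the sample records are hypothetical.

```python
import datetime
from collections import defaultdict


def monthly_totals(transactions):
    """Sum transaction amounts per calendar month, returned in date order."""
    totals = defaultdict(float)
    for t in transactions:
        day = datetime.datetime.strptime(t["date"], "%Y-%m-%d")
        totals[(day.year, day.month)] += t["amount"]
    # Sorting the (year, month) keys gives a chronological x-axis
    return sorted(totals.items())


# Made-up records for illustration only
sample = [
    {"date": "2018-01-24", "amount": -500},
    {"date": "2018-01-03", "amount": 20.0},
    {"date": "2017-12-15", "amount": 7.5},
]
series = monthly_totals(sample)
print(series)
```

Whether a positive amount means money spent or money received should be checked against Plaid's documentation before interpreting the totals.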

Phase 3 - Data Visualization

I also plan to use Jupyter to organize and plot my data, but I did not get to it this week. Before plotting, I first need to design the graphs and charts I intend to use to analyze my financial data.
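For the categorical spending question, a bar chart would need one total per category. A possible way to prepare that input is sketched below, assuming "category" is a list ordered from broadest to most specific, as in the sandbox record. The function name `totals_by_category`, the "Uncategorized" label, and the sample records are assumptions made for illustration.

```python
from collections import Counter


def totals_by_category(transactions, level=0):
    """Sum amounts per category label.

    level=0 uses the broadest label in the "category" list (e.g. "Travel").
    Transactions without a label at that level go under "Uncategorized".
    """
    totals = Counter()
    for t in transactions:
        labels = t.get("category") or []
        key = labels[level] if len(labels) > level else "Uncategorized"
        totals[key] += t["amount"]
    return dict(totals)


# Made-up records for illustration only
sample = [
    {"amount": -500, "category": ["Travel", "Airlines and Aviation Services"]},
    {"amount": 12, "category": ["Food and Drink", "Restaurants"]},
    {"amount": 8, "category": ["Food and Drink", "Coffee Shop"]},
    {"amount": 5, "category": None},
]
by_category = totals_by_category(sample)
print(by_category)
```

Passing `level=1` would group by the more specific label instead, which could help when deciding how fine-grained the chart should be.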

Phase 4 - Data Analysis

The fourth phase is the actual analysis of my data for reflective insights. I imagine there will be a bit of trial and error in getting the right charts for visualization, which will probably lead me to tweak phases 2 and 3.

@auremoser

Very cool, +1 to everything Joey said here. Plaid seems like a cool app. Others like Mint also let you import from your bank and various financial services and export your data. There are some limited visualizations, but it's mostly useful for the data export functionality and for the fact that it aggregates from banks, credit cards, and other financial services online.
