emjun/tea_example_0.1.ipynb

## tea_example_0.1.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Example of using Tea",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/emjun/cca25cdbeee39afe90771dde9cb80457/tea_example_0.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "JyG45Qk3qQLS"
      },
      "source": [
        "# Tea\n",
        "Tea requires users to describe their data, variables, study design, assumptions about the data, and hypotheses at a high level. Tea combines computed properties of the data (e.g., normal distribution) with users' assumptions and hypotheses to infer a set of valid statistical analyses that test users' hypotheses. Unlike other statistical analysis tools, Tea focuses on capturing users' *explicit* hypotheses and assumptions about the data and does *not* require users to choose specific statistical tests.\n",
        "\n",
        "Tea is designed for people who are not statistics experts!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ua--kwl8Ge7x",
        "colab_type": "text"
      },
      "source": [
        "## Example\n",
        "\n",
        "Let's walk through an example! Make sure to [install Tea](http://tea-lang.org/install) first. :)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VLt-_g2lGtHi",
        "colab_type": "text"
      },
      "source": [
        "## Import Tea"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-gsAYPFgGxhL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import tea"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "KR921S_OQSHG"
      },
      "source": [
        "## Data\n",
        "\n",
        "**Load data.**\n",
        "\n",
        "This example is taken from Ehrlich[1] and Vandaele[2].  The data set comes as part of the MASS package in R.\n",
        "\n",
        "Let's say you're a historical criminologist who wants to know \"Is there a significant difference in imprisonment probabilities between Southern and non-Southern states?\"\n",
        "\n",
        "\n",
        "\n",
        "[1] Isaac Ehrlich. 1973. Participation in illegitimate activities: A theoretical and empirical investigation. Journal of Political Economy 81, 3 (1973), 521–565.\n",
        "\n",
        "[2] Walter Vandaele. 1987. Participation in illegitimate activities: Ehrlich revisited, 1960. Vol. 8677. Inter-university Consortium for Political and Social Research."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "WUtu4316QSHL",
        "colab": {}
      },
      "source": [
        "\n",
        "tea.data(\"https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv\")\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Id6tDF1HQSHD"
      },
      "source": [
        "## Variables\n",
        "\n",
        "**Declare and annotate the variables of interest.**\n",
        "\n",
        "There are two variables: `So` and `Prob`. \n",
        "\n",
        "`So` is a binary nominal variable where `So=1` means the state is *Southern* and `So=0` means the state is *non-Southern*.\n",
        "\n",
        "`Prob` is a ratio variable for the probability of imprisonment in each state."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9_3Uxu7cGS0D",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "variables = [\n",
        "    {\n",
        "        'name' : 'So',\n",
        "        'data type' : 'nominal',   # Options: 'nominal', 'ordinal', 'interval', 'ratio'\n",
        "        'categories' : ['0', '1']\n",
        "    },\n",
        "    {\n",
        "        'name' : 'Prob',\n",
        "        'data type' : 'ratio',   # Options: 'nominal', 'ordinal', 'interval', 'ratio'\n",
        "        'range' : [0,1]   # optional\n",
        "    }\n",
        "]\n",
        "\n",
        "tea.define_variables(variables)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qwy7tpFCHlkw",
        "colab_type": "text"
      },
      "source": [
        "## Assumptions\n",
        "\n",
        "**OPTIONAL: Declare any assumptions you may have about the data based on prior visualization or domain knowledge.**\n",
        "\n",
        "Based on prior knowledge in historical criminology, you might assume that the probability of imprisonment is normally distributed within each of the two groups of interest: Southern and non-Southern states.\n",
        "\n",
        "If you do not specify a Type I error rate (the \"significance threshold\"), Tea uses 0.05."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JIrQI3r4NZRI",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "assumptions = {\n",
        "    'groups normally distributed': [['So', 'Prob']],\n",
        "    'Type I (False Positive) Error Rate': 0.05,\n",
        "}\n",
        "\n",
        "tea.assume(assumptions)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1fCGhA2DNbHh",
        "colab_type": "text"
      },
      "source": [
        "## Study Design\n",
        "\n",
        "**Express how the data were collected.**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DiZp2K-rN24X",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "experimental_design = {\n",
        "    'study type': 'observational study',   # 'study type' could also be 'experiment'\n",
        "    'contributor variables': 'So',   # experiments use 'independent variables' instead\n",
        "    'outcome variables': 'Prob',   # experiments use 'dependent variables' instead\n",
        "}\n",
        "tea.define_study_design(experimental_design)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lFc-d10JN3_i",
        "colab_type": "text"
      },
      "source": [
        "## Hypothesis\n",
        "**Explicitly state a hypothesis about the relationship between the variables in the data.**\n",
        "\n",
        "Based on your domain knowledge, you might hypothesize that there is a relationship between a state being Southern/non-Southern (`So`) and the probability of imprisonment (`Prob`).\n",
        "In particular, you might hypothesize that Southern states (`So = 1`) have higher (`>`) imprisonment probabilities than non-Southern states (`So = 0`)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "5mNM6Q8uOE3h",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "tea.hypothesize(['So', 'Prob'], ['So:1 > 0'])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NdBJgR3hYc3Q",
        "colab_type": "text"
      },
      "source": [
        "# Hope that was as fun and easy as a cup of tea! ;)\n",
        "If you have any **comments or feedback**, please get in touch: emjun [at] cs dot washington dot edu, or join us in the [Tea Room](https://gitter.im/tea-room)."
      ]
    }
  ]
}
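
The notebook's hypothesis `So:1 > 0` is a directional comparison of `Prob` between two groups. Tea chooses the statistical tests itself, so the following is not Tea's implementation. It is a minimal plain-Python sketch of one test that fits this kind of comparison, Welch's unequal-variance t-test; whether Tea would select it depends on the properties it checks. The function name `welch_t` and the sample values are assumptions made up for illustration and are not taken from `UScrime.csv`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's t-test of mean(group_a) - mean(group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance, n - 1 denominator).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up illustrative values, NOT taken from UScrime.csv.
southern = [0.08, 0.06, 0.07, 0.09, 0.05]      # hypothetical Prob where So == 1
non_southern = [0.04, 0.03, 0.05, 0.04, 0.02]  # hypothetical Prob where So == 0

t_stat, dof = welch_t(southern, non_southern)
# A positive t points in the direction of the hypothesis 'So:1 > 0'.
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Turning `t` and `df` into a one-sided p-value needs the t distribution's CDF, which the standard library does not provide. With Tea you would not write any of this by hand.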