@emjun
Created October 19, 2019 00:35
Example of using Tea
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Example of using Tea",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/emjun/cca25cdbeee39afe90771dde9cb80457/tea_example_0.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "JyG45Qk3qQLS"
},
"source": [
"# Tea\n",
"Tea requires users to describe their data, variables, study design, assumptions about the data, and hypotheses at a high-level. Tea combines computed properties about the data (e.g., normal distribution) with users' assumptions and hypotheses to infer a set of valid statisitcal analyses that test users' hypotheses. Unlike other statistical analysis tools, Tea focuses on capturing users' *explicit* hypotheses and assumptions about the data and does *not* require users to specify the specific statistical tests. \n",
"\n",
"Tea is designed for non-statistical experts!!"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ua--kwl8Ge7x",
"colab_type": "text"
},
"source": [
"## Example\n",
"\n",
"Let's walk through an example! Make sure to [install Tea before](http://tea-lang.org/install). :)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VLt-_g2lGtHi",
"colab_type": "text"
},
"source": [
"## 1. Import tea"
]
},
{
"cell_type": "code",
"metadata": {
"id": "-gsAYPFgGxhL",
"colab_type": "code",
"colab": {}
},
"source": [
"import tea"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "KR921S_OQSHG"
},
"source": [
"## Data\n",
"\n",
"**Load data.**\n",
"\n",
"This example is taken from Ehrlich[1] and Vandaele[2]. The data set comes as part of the MASS package in R.\n",
"\n",
"Let's say you're a historical criminologist who wants to know \"Is there a significant difference in imprisonment probabilities between Southern and non-Southern states?\"\n",
"\n",
"\n",
"\n",
"[1] Isaac Ehrlich. 1973. Participation in illegitimate activities: A theoretical and empirical investigation. Journal of political Economy. 81, 3 (1973), 521–565.\n",
"\n",
"[2] Walter Vandaele. 1987. Participation in illegitimate activities: Ehrlich revisited, 1960. Vol. 8677. Inter-university Consortium for Political and Social Research."
]
},
{
"cell_type": "code",
"metadata": {
"cellView": "both",
"colab_type": "code",
"id": "WUtu4316QSHL",
"colab": {}
},
"source": [
"\n",
"tea.data(\"https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv\")\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Id6tDF1HQSHD"
},
"source": [
"## Variables\n",
"\n",
"**Declare and annotate the variables of interest.**\n",
"\n",
"There are two variables: `So` and `Prob`. \n",
"\n",
"`So` is a binary nominal variable where `So=1` means the state is *Southern* and `So=0` means the state is *non-Southern*.\n",
"\n",
"`Prob` is a ratio variable for the probability of imprisonment in each state."
]
},
{
"cell_type": "code",
"metadata": {
"id": "9_3Uxu7cGS0D",
"colab_type": "code",
"colab": {}
},
"source": [
"variables = [\n",
" {\n",
" 'name' : 'So',\n",
" 'data type' : 'nominal', # Options: 'nominal', 'ordinal', 'interval', 'ratio'\n",
" 'categories' : ['0', '1']\n",
" },\n",
" {\n",
" 'name' : 'Prob',\n",
" 'data type' : 'ratio', # Options: 'nominal', 'ordinal', 'interval', 'ratio'\n",
" 'range' : [0,1] # optional\n",
" }\n",
"]\n",
"\n",
"tea.define_variables(variables)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "qwy7tpFCHlkw",
"colab_type": "text"
},
"source": [
"## Assumptions\n",
"\n",
"**OPTIONAL: Declare any assumptions you may have about the data based on prior visualization or domain knowledge.**\n",
"\n",
"Based on prior knowledge in historical criminology, you might assume that probability of imprisonment is normally distributed in Southern and non-Southern states (the two groups of interest).\n",
"\n",
"If no Type I Error Rate (or \"significance threshold\") is specified, Tea will use .05."
]
},
{
"cell_type": "code",
"metadata": {
"id": "JIrQI3r4NZRI",
"colab_type": "code",
"colab": {}
},
"source": [
"assumptions = {\n",
"    'groups normally distributed': [['So', 'Prob']],\n",
"    'Type I (False Positive) Error Rate': 0.05,\n",
"}\n",
"\n",
"tea.assume(assumptions)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "1fCGhA2DNbHh",
"colab_type": "text"
},
"source": [
"## Study Design\n",
"\n",
"**Express how the data were collected.**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "DiZp2K-rN24X",
"colab_type": "code",
"colab": {}
},
"source": [
"experimental_design = {\n",
"    'study type': 'observational study',  # 'study type' could also be 'experiment'\n",
"    'contributor variables': 'So',  # experiments have 'independent variables' instead\n",
"    'outcome variables': 'Prob',  # experiments have 'dependent variables' instead\n",
"}\n",
"tea.define_study_design(experimental_design)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "lFc-d10JN3_i",
"colab_type": "text"
},
"source": [
"## Hypothesis\n",
"**Explicitly state a hypothesis about the relationship between the variables in the data.**\n",
"\n",
"Based on your domain knowledge, you might hypothesize that there is a relationship between a state being Southern/non-Southern (`So`) and the probability of imprisonment (`Prob`).\n",
"In particular, you might hypothesize that Southern states (`So = 1`) have higher (`>`) imprisonment probabilities than non-Southern states (`So = 0`)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "5mNM6Q8uOE3h",
"colab_type": "code",
"colab": {}
},
"source": [
"tea.hypothesize(['So', 'Prob'], ['So:1 > 0'])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "NdBJgR3hYc3Q",
"colab_type": "text"
},
"source": [
"# Hope that was as fun and easy as a cup of tea! ;)\n",
"If you have any **COMMENTS OR FEEDBACK** please do not hesitate to get in touch: emjun [at] cs dot washington dot edu or join us in the [Tea Room](https://gitter.im/tea-room)."
]
}
]
}
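
The variable declarations in the Variables section are plain Python dictionaries, so they can be sanity-checked before being handed to `tea.define_variables`. The sketch below is only an illustration and is not part of Tea: `check_variables` and `VALID_DATA_TYPES` are hypothetical names, and the rule that nominal and ordinal variables must list `'categories'` is an assumption based on the example in this notebook, not a documented Tea requirement.

```python
# Hypothetical helper, NOT part of Tea: checks the shape of a variable
# declaration list like the one passed to tea.define_variables.
VALID_DATA_TYPES = {"nominal", "ordinal", "interval", "ratio"}


def check_variables(variables):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for var in variables:
        name = var.get("name", "<unnamed>")
        if "name" not in var:
            problems.append("a variable is missing 'name'")
        data_type = var.get("data type")
        if data_type not in VALID_DATA_TYPES:
            problems.append(f"{name}: unknown data type {data_type!r}")
        # Assumption: categorical variables should list their categories.
        if data_type in {"nominal", "ordinal"} and not var.get("categories"):
            problems.append(f"{name}: missing 'categories'")
        # 'range' is optional; if given, expect [low, high] with low <= high.
        if "range" in var:
            rng = var["range"]
            if len(rng) != 2 or rng[0] > rng[1]:
                problems.append(f"{name}: 'range' should be [low, high]")
    return problems


# Same declarations as in the Variables section of the notebook.
variables = [
    {"name": "So", "data type": "nominal", "categories": ["0", "1"]},
    {"name": "Prob", "data type": "ratio", "range": [0, 1]},
]

print(check_variables(variables))
```

A check like this catches misspelled data types (for example `'ratios'`) early, before any analysis is attempted.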