emjun/tea_example_0.ipynb

## tea_example_0.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Example of using Tea",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/emjun/f0ebafc97c208b9f329b1e8c14f90a7f/tea_example_0.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "JyG45Qk3qQLS"
      },
      "source": [
        "# Tea\n",
        "Tea requires users to describe their data, variables, study design, assumptions about the data, and hypotheses at a high level. Tea combines computed properties of the data (e.g., whether a variable is normally distributed) with users' assumptions and hypotheses to infer a set of valid statistical analyses that test those hypotheses. Unlike other statistical analysis tools, Tea focuses on capturing users' *explicit* hypotheses and assumptions about the data.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ua--kwl8Ge7x",
        "colab_type": "text"
      },
      "source": [
        "## Example\n",
        "\n",
        "Let's walk through an example! Make sure Tea is installed first. :)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VLt-_g2lGtHi",
        "colab_type": "text"
      },
      "source": [
        "## Import Tea"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-gsAYPFgGxhL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import tea"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "KR921S_OQSHG"
      },
      "source": [
        "## Data\n",
        "\n",
        "Load the data. This example uses the UScrime dataset, read from a CSV file."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "both",
        "colab_type": "code",
        "id": "WUtu4316QSHL",
        "colab": {}
      },
      "source": [
        "tea.data(\"https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
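    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Before declaring anything, it can help to look at the raw file yourself. The cell below is a minimal sketch that is not part of Tea. It assumes that `pandas` is available (it is preinstalled on Colab), that the URL above is still reachable, and that the CSV has columns named `So` and `Prob`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Sketch (not part of Tea): inspect the raw CSV with pandas.\n",
        "import pandas as pd\n",
        "\n",
        "URL = 'https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv'\n",
        "crime = pd.read_csv(URL)\n",
        "\n",
        "print(crime.shape)  # (number of rows, number of columns)\n",
        "print(crime[['So', 'Prob']].describe())  # summary statistics for the two variables used below"
      ],
      "execution_count": 0,
      "outputs": []
    },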
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Id6tDF1HQSHD"
      },
      "source": [
        "## Variables\n",
        "\n",
        "Declare and annotate the variables of interest."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9_3Uxu7cGS0D",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "variables = [\n",
        "    {\n",
        "        'name' : 'So',\n",
        "        'data type' : 'nominal',\n",
        "        'categories' : ['0', '1']\n",
        "    },\n",
        "    {\n",
        "        'name' : 'Prob',\n",
        "        'data type' : 'ratio',\n",
        "        'range' : [0,1]\n",
        "    }\n",
        "]\n",
        "\n",
        "tea.define_variables(variables)"
      ],
      "execution_count": 0,
      "outputs": []
    },
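    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The declarations above state that `So` takes only the categories `'0'` and `'1'` and that `Prob` lies in [0, 1]. The sketch below, which is not part of Tea, checks that the data agree with these declarations. It assumes that `pandas` is available and that the CSV stores `So` as the numbers 0 and 1. The `assert` checks are purely illustrative."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Sketch (not part of Tea): check the data against the declared variables.\n",
        "import pandas as pd\n",
        "\n",
        "URL = 'https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv'\n",
        "crime = pd.read_csv(URL, usecols=['So', 'Prob'])\n",
        "\n",
        "# Nominal variable: assumes So is stored as the numbers 0 and 1.\n",
        "so_levels = set(crime['So'].unique().tolist())\n",
        "print('So levels:', sorted(so_levels))\n",
        "assert so_levels <= {0, 1}, 'So has values outside the declared categories'\n",
        "\n",
        "# Ratio variable: every Prob value should fall inside the declared range [0, 1].\n",
        "print('Prob min/max:', crime['Prob'].min(), crime['Prob'].max())\n",
        "assert crime['Prob'].between(0, 1).all(), 'Prob falls outside the declared range [0, 1]'"
      ],
      "execution_count": 0,
      "outputs": []
    },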
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qwy7tpFCHlkw",
        "colab_type": "text"
      },
      "source": [
        "## Assumptions\n",
        "\n",
        "Declare any assumptions you may have about the data based on prior visualization or domain knowledge."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JIrQI3r4NZRI",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "assumptions = {\n",
        "    'groups normally distributed': [['So', 'Prob']],\n",
        "    'Type I (False Positive) Error Rate': 0.05,\n",
        "}\n",
        "\n",
        "tea.assume(assumptions)"
      ],
      "execution_count": 0,
      "outputs": []
    },
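    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Tea also computes properties such as normality from the data. The sketch below shows what the `'groups normally distributed'` assumption means in practice. It runs SciPy's Shapiro-Wilk test on `Prob` within each level of `So`. This is an illustrative check of our own, and Tea may use different tests or thresholds internally. A p-value below the declared Type I error rate of 0.05 is evidence that a group departs from normality. A larger p-value only means there is no such evidence."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Sketch (not part of Tea): Shapiro-Wilk normality test of Prob within each So group.\n",
        "import pandas as pd\n",
        "from scipy import stats\n",
        "\n",
        "URL = 'https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv'\n",
        "crime = pd.read_csv(URL, usecols=['So', 'Prob'])\n",
        "alpha = 0.05  # matches the Type I error rate declared above\n",
        "\n",
        "for level, group in crime.groupby('So'):\n",
        "    w, p = stats.shapiro(group['Prob'])\n",
        "    verdict = 'no evidence against normality' if p >= alpha else 'evidence of non-normality'\n",
        "    print(f'So={level}: n={len(group)}, W={w:.3f}, p={p:.3f} -> {verdict}')"
      ],
      "execution_count": 0,
      "outputs": []
    },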
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1fCGhA2DNbHh",
        "colab_type": "text"
      },
      "source": [
        "## Study Design\n",
        "\n",
        "Express how the data were collected."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DiZp2K-rN24X",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "experimental_design = {\n",
        "    'study type': 'observational study',\n",
        "    'contributor variables': 'So',\n",
        "    'outcome variables': 'Prob',\n",
        "}\n",
        "\n",
        "tea.define_study_design(experimental_design)"
      ],
      "execution_count": 0,
      "outputs": []
    },
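    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "In terms of the data, this design groups the outcome `Prob` by the contributor `So`. The per-group summary below is a sketch using pandas and is not part of Tea. It shows the group sizes, means, and standard deviations on which a comparison of the two groups rests."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Sketch (not part of Tea): summarize the outcome within each level of the contributor.\n",
        "import pandas as pd\n",
        "\n",
        "URL = 'https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv'\n",
        "crime = pd.read_csv(URL, usecols=['So', 'Prob'])\n",
        "\n",
        "# So (contributor) defines the groups; Prob (outcome) is summarized within each group.\n",
        "summary = crime.groupby('So')['Prob'].agg(['count', 'mean', 'std'])\n",
        "print(summary)"
      ],
      "execution_count": 0,
      "outputs": []
    },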
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lFc-d10JN3_i",
        "colab_type": "text"
      },
      "source": [
        "## Hypothesis\n",
        "Explicitly state a hypothesis about the relationship between the variables in the data."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "5mNM6Q8uOE3h",
        "colab_type": "code"
      },
      "source": [
        "tea.hypothesize(['So', 'Prob'], ['So:1 > 0'])"
      ],
      "execution_count": 0,
      "outputs": []
    }
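    ,
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "We read `'So:1 > 0'` as a one-sided hypothesis: `Prob` is higher when `So` is 1 than when it is 0. The MASS documentation for `UScrime` describes `So` as an indicator for a Southern state and `Prob` as the probability of imprisonment. As a rough cross-check of the analysis Tea infers, the sketch below runs a one-sided Welch's t-test with SciPy. Welch's test is our own choice and is not necessarily the test Tea selects. The `alternative` argument requires SciPy 1.6 or later."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Sketch (not part of Tea): one-sided Welch's t-test for the hypothesis So:1 > 0.\n",
        "import pandas as pd\n",
        "from scipy import stats\n",
        "\n",
        "URL = 'https://homes.cs.washington.edu/~emjun/tea-lang/datasets/UScrime.csv'\n",
        "crime = pd.read_csv(URL, usecols=['So', 'Prob'])\n",
        "alpha = 0.05  # Type I error rate declared in the assumptions\n",
        "\n",
        "south = crime.loc[crime['So'] == 1, 'Prob']\n",
        "non_south = crime.loc[crime['So'] == 0, 'Prob']\n",
        "\n",
        "# H1: mean Prob for So == 1 is greater than mean Prob for So == 0.\n",
        "result = stats.ttest_ind(south, non_south, equal_var=False, alternative='greater')\n",
        "print(f't = {result.statistic:.3f}, one-sided p = {result.pvalue:.4f}')\n",
        "print(f'Reject H0 at alpha = {alpha}:', result.pvalue < alpha)"
      ],
      "execution_count": 0,
      "outputs": []
    }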
  ]
}