@simecek
Last active March 18, 2020 21:41
Sampling_Saves_Time.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Sampling_Saves_Time.ipynb",
"provenance": [],
"collapsed_sections": [],
"machine_shape": "hm",
"authorship_tag": "ABX9TyOgylkZ+18ESmy8XzM+Gl38",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/simecek/d56e5d50ebe8ba9ca4b31b4e0c775738/sampling_saves_time.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5mTmJ0WUc_LU",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"x = np.random.random(333_333_333)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "II5IuC3bi0xM",
"colab_type": "text"
},
"source": [
"Calculation on the original data:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "TrYMCMMpeuy1",
"colab_type": "code",
"outputId": "ff751372-72a7-440c-b2f1-d2dbba64aee9",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 68
}
},
"source": [
"%%time\n",
"x.mean(), x.std(), np.quantile(x, 0.75)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"CPU times: user 6.88 s, sys: 1.73 ms, total: 6.88 s\n",
"Wall time: 6.89 s\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(0.4999888096247516, 0.28867514977516945, 0.7499992101695595)"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wWEG8NHti9df",
"colab_type": "text"
},
"source": [
"Calculation on a random sample:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "wBJufpD1fKXL",
"colab_type": "code",
"outputId": "3af8bd3b-adf4-4232-b381-68dd1ff2acf8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 68
}
},
"source": [
"%%time\n",
"# Draw 33,333 values from x; np.random.choice samples with replacement by default\n",
"y = np.random.choice(x, 33_333)\n",
"print(y.mean(), y.std(), np.quantile(y, 0.75))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"0.5004472036643502 0.2890277502989701 0.751661040508891\n",
"CPU times: user 12.7 ms, sys: 0 ns, total: 12.7 ms\n",
"Wall time: 12.6 ms\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z2lHjHKhjBa8",
"colab_type": "text"
},
"source": [
"**Conclusion:** While the results are almost the same (all three statistics agree to within 0.002), the calculation on a sample is more than 500 times faster (12.6 ms versus 6.89 s).\n",
"\n",
"**Credit:** Highly inspired by [a tweet](https://twitter.com/raymondh/status/1202772908696854528) from Raymond Hettinger (@raymondh).\n"
]
}
]
}
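A short sketch of why the sample statistics land so close to the full-data ones: the standard error of a sample mean is roughly sigma / sqrt(n), which for uniform data and n = 33,333 is about 0.0016. The code below is an illustration under stated assumptions, not part of the original notebook: it uses a much smaller array than the notebook's 333,333,333 values so it runs quickly, and it uses NumPy's `default_rng` with an arbitrary seed for reproducibility.

```python
import numpy as np

# Assumption: a fixed seed, chosen only so the sketch is reproducible.
rng = np.random.default_rng(0)

# Assumption: 3 million values instead of the notebook's 333_333_333,
# so the example runs quickly on any machine.
x = rng.random(3_000_000)

n = 33_333            # same sample size as the notebook
y = rng.choice(x, n)  # samples with replacement, like np.random.choice

# Standard error of the sample mean: sigma / sqrt(n).
std_error = x.std() / np.sqrt(n)

# How far the sample mean actually lands from the full-data mean.
mean_gap = abs(y.mean() - x.mean())

print(f"standard error of the mean: {std_error:.5f}")
print(f"observed gap in the means:  {mean_gap:.5f}")
```

Because the error shrinks with the square root of the sample size, it depends on how many values are sampled rather than on how large the original array is. That is why a fixed-size sample stays accurate while the full calculation keeps getting slower as the data grows.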