
@dmtucker
Last active April 25, 2020 04:07
What's up with Python 1.17 in the PyPI stats database?
@dmtucker
Author

dmtucker commented Mar 4, 2018

1.17 Discussions

March 4, #python on Freenode

15:03    dstufft| ofek: what's up?
15:05       ofek| dstufft we were wondering where 1.17 and None come from https://github.com/ofek/pypinfo#downloads-for-a-project-by-python-version
15:06    dstufft| None just means no data was reported, could be old versions of pip, could be mirroring infrastructure, whatever
15:07    dstufft| 1.17 I have no idea tbh. My guess is it's some alternate implementation of Python that had its own versioning, and instead of doing urllib-python/{python_version} it did 
                  urllib-python/{implementation_version} for the urllib host header
15:08       ofek| dtux runciter nedbat ^
15:08       ofek| dstufft thank you!
15:08     nedbat| ¯\_(ツ)_/¯
15:09    dstufft| ofek: basically all that data comes from the user-agents, which comes from introspecting the runtime, so our data is only as good as what the runtime provides (and in the case of older pips 
                  and non-pip agents, sometimes with a healthy dose of assumptions at play too)
15:12       ofek| dstufft, ah okay, makes sense. I'm saving this convo for when this gets asked again :)

March 5, #python on Freenode

[14:53:16] <disi> dstufft: is there any way to get more info on the user agents passing python 1.17? I've been doing some digging without much luck yet. doesn't look like anything recognizable is being provided, but maybe the raw user-agents are available?
[14:54:22] <dstufft> disi: I'd have to dig it out of the archival logs
[14:54:46] <dstufft> we purposely throw that data away from BigQuery to avoid leaking private data people might have in their user agent that we didn't expect
[14:57:22] <dstufft> disi: I did this one before, I think it was literally just the urllib2 UA with "1.17" where "2.7" or "3.6" normally would go
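To make dstufft's explanation concrete, here is a minimal sketch of how a naive log parser could turn a bare urllib user agent into a "Python version". The regex and the `python_version_from_ua` helper are hypothetical, modeled on his description rather than copied from linehaul; the only assumption is that the parser accepts any `<digit>.<digits>` after `Python-urllib/`.

```python
import re
from typing import Optional

# Hypothetical pattern modeled on dstufft's description; linehaul's real
# parser may differ. It accepts any "<digit>.<digits>" after the slash.
URLLIB_UA = re.compile(r"^Python-urllib/(?P<python>\d\.\d+)$")

def python_version_from_ua(user_agent: str) -> Optional[str]:
    """Return the 'Python version' a naive parser would record, or None."""
    match = URLLIB_UA.match(user_agent)
    return match.group("python") if match else None

for ua in ("Python-urllib/2.7", "Python-urllib/3.6", "Python-urllib/1.17"):
    print(f"{ua!r:24} -> {python_version_from_ua(ua)}")
```

Nothing in the string says whether the number is an interpreter version or a module version, so a parser like this has no reason to reject 1.17.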

@dmtucker
Author

dmtucker commented Mar 4, 2018

What is 1.17 used for?

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 365 --limit 50 --where 'details.python = "1.17"' '' project
Served from cache: False
Data processed: 138.32 GiB
Data billed: 138.32 GiB
Estimated cost: $0.68

| project              | download_count |
| -------------------- | -------------- |
| kudu-python          |         28,210 |
| kazoo                |         26,628 |
| pytest               |         24,272 |
| pytest-xdist         |         24,132 |
| py                   |         23,960 |
| cryptography         |         23,954 |
| cffi                 |         23,803 |
| impyla               |         23,698 |
| setuptools-scm       |         23,033 |
| pycparser            |         19,441 |
| scandir              |         19,173 |
| azure-datalake-store |         19,136 |
| virtualenv           |         16,072 |
| requests             |         11,126 |
| jinja2               |         10,509 |
| botocore             |          9,708 |
| paramiko             |          9,315 |
| fabric               |          8,901 |
| sh                   |          8,796 |
| ordereddict          |          8,696 |
| ecdsa                |          8,686 |
| pycrypto             |          8,658 |
| flask                |          8,639 |
| cython               |          8,620 |
| six                  |          8,543 |
| simplejson           |          8,498 |
| docutils             |          8,421 |
| boto3                |          8,398 |
| jmespath             |          8,266 |
| futures              |          8,260 |
| numpy                |          8,133 |
| pyparsing            |          8,112 |
| python-dateutil      |          8,045 |
| psutil               |          8,005 |
| markupsafe           |          7,943 |
| werkzeug             |          7,925 |
| itsdangerous         |          7,903 |
| pbr                  |          7,882 |
| ipython              |          7,881 |
| pytest-random        |          7,869 |
| pexpect              |          7,836 |
| argparse             |          7,742 |
| thrift               |          7,741 |
| sqlparse             |          7,734 |
| readline             |          7,719 |
| pg8000               |          7,713 |
| docopt               |          7,704 |
| hdfs                 |          7,701 |
| allpairs             |          7,698 |
| cm-api               |          7,697 |

@dmtucker
Author

dmtucker commented Mar 5, 2018

How long has 1.17 been showing up?

dtucker@dtucker-wkstn:~ $ pypinfo --days 1095 --limit 36 --order download_month --where 'details.python = "1.17"' '' month
Served from cache: False
Data processed: 186.07 GiB
Data billed: 186.07 GiB
Estimated cost: $0.91

| download_month | download_count |
| -------------- | -------------- |
| 2018-03        |         16,157 |
| 2018-02        |         67,154 |
| 2018-01        |      1,898,377 |
| 2017-12        |         70,865 |
| 2017-11        |         64,577 |
| 2017-10        |         66,968 |
| 2017-09        |         63,809 |
| 2017-08        |         93,022 |
| 2017-07        |        628,639 |
| 2017-06        |        200,335 |
| 2017-05        |        148,618 |
| 2017-04        |         50,842 |
| 2017-03        |         53,755 |
| 2017-02        |         47,414 |
| 2017-01        |         50,651 |
| 2016-12        |         53,138 |
| 2016-11        |        129,537 |
| 2016-10        |        477,819 |
| 2016-09        |      1,013,797 |
| 2016-08        |        116,291 |
| 2016-07        |         81,521 |
| 2016-06        |         59,695 |
| 2016-05        |        151,872 |
| 2016-03        |          4,618 |
| 2016-02        |        226,267 |
| 2016-01        |        184,133 |
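The 1.17 series is very bursty. One quick way to see that is to flag months well above the median month. The `spike_months` helper and its 3x-median threshold are arbitrary choices for illustration; the counts are copied from the table above.

```python
import statistics

# Monthly downloads reporting Python "1.17", copied from the table above
# (the query output has no 2016-04 row).
MONTHLY_117 = {
    "2018-03": 16157, "2018-02": 67154, "2018-01": 1898377,
    "2017-12": 70865, "2017-11": 64577, "2017-10": 66968,
    "2017-09": 63809, "2017-08": 93022, "2017-07": 628639,
    "2017-06": 200335, "2017-05": 148618, "2017-04": 50842,
    "2017-03": 53755, "2017-02": 47414, "2017-01": 50651,
    "2016-12": 53138, "2016-11": 129537, "2016-10": 477819,
    "2016-09": 1013797, "2016-08": 116291, "2016-07": 81521,
    "2016-06": 59695, "2016-05": 151872, "2016-03": 4618,
    "2016-02": 226267, "2016-01": 184133,
}

def spike_months(series: dict[str, int], factor: float = 3.0) -> list[str]:
    """Return months whose count exceeds `factor` times the median month."""
    threshold = factor * statistics.median(series.values())
    return sorted(month for month, count in series.items() if count > threshold)

spikes = spike_months(MONTHLY_117)
share = sum(MONTHLY_117[m] for m in spikes) / sum(MONTHLY_117.values())
print(spikes)          # ['2016-09', '2016-10', '2017-07', '2018-01']
print(f"{share:.0%}")  # 67%
```

With that threshold, four months account for about two-thirds of the 1.17 downloads in the table.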

@dmtucker
Author

dmtucker commented Mar 5, 2018

Where does 1.17 come from?

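The most likely source seems to be the standard library's urllib module (which is also present in at least one fork): in Python 2.7, urllib.__version__ is 1.17, whereas urllib2.__version__ matches the interpreter version: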

$ python -c 'import sys, urllib, urllib2;  print(sys.version[:3], urllib.__version__, urllib2.__version__)'; python3 -c 'import urllib.request; print(urllib.request.__version__)'
('2.7', '1.17', '2.7')
3.5
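Both Python 2 modules build their default User-Agent as `Python-urllib/<module __version__>` (urllib via `URLopener.version`, urllib2 via `OpenerDirector`), so a Python 2.7 client using urllib would identify itself as `Python-urllib/1.17`, while one using urllib2 would send `Python-urllib/2.7`. The sketch below rebuilds those strings from the versions printed above (hardcoded, since it runs on Python 3; `default_user_agent` is a hypothetical helper) and checks the header Python 3's `urllib.request` actually sets.

```python
import sys
import urllib.request

# Module versions printed by the Python 2.7 one-liner above.
PY27_MODULE_VERSIONS = {"urllib": "1.17", "urllib2": "2.7"}

def default_user_agent(module_version: str) -> str:
    # Python 2's urllib.URLopener and urllib2.OpenerDirector, and Python 3's
    # urllib.request.OpenerDirector, all use "Python-urllib/<module version>".
    return f"Python-urllib/{module_version}"

for module, version in PY27_MODULE_VERSIONS.items():
    print(f"Python 2.7 {module:8} -> {default_user_agent(version)}")

# On Python 3, the header a default opener sends can be inspected directly.
opener = urllib.request.build_opener()
ua = dict(opener.addheaders)["User-agent"]
interpreter = "%d.%d" % sys.version_info[:2]
print("Python 3 urllib.request ->", ua)
print("carries interpreter version:", ua == default_user_agent(interpreter))  # True
```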

What agents may be using urllib?

@dmtucker
Author

dmtucker commented Mar 6, 2018

Where is 1.17 being used from?

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 365 --where 'details.python = "1.17"' '' country
Served from cache: False
Data processed: 81.80 GiB
Data billed: 81.80 GiB
Estimated cost: $0.40

| country | download_count |
| ------- | -------------- |
| US      |      3,216,292 |
| JP      |         55,160 |
| CN      |         34,118 |
| GB      |         24,925 |
| IN      |          8,064 |
| None    |          7,190 |
| FR      |          6,131 |
| DE      |          5,703 |
| IL      |          5,109 |
| ES      |          4,710 |
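Nearly all of this traffic comes from a single country. Dividing the US row by the 365-day total of 3,405,944 reported by the other queries on this page (assuming both cover the same window):

```python
# Figures from this page: the US row of the country table, and the 365-day
# total reported by the system/distro, installer, and other queries.
US_DOWNLOADS = 3_216_292
TOTAL_117_DOWNLOADS = 3_405_944

us_share = US_DOWNLOADS / TOTAL_117_DOWNLOADS
print(f"US share of 1.17 downloads: {us_share:.1%}")  # 94.4%
```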

@dmtucker
Author

dmtucker commented Mar 6, 2018

Notes

Switching PyPI to HTTPS-only should have made occurrences drop in Oct/Nov 2017, but they didn't.

urllib.urlretrieve uses FancyURLopener, which inherits from URLopener, which sets the User-Agent.

easy_install's user-agent logic:
https://github.com/pypa/setuptools/blob/97ff22f31ace57f4eabb6f1e77c9c553de0d1c24/setuptools/package_index.py#L50

Log parser (linehaul) user-agent logic:
https://github.com/pypa/linehaul/blob/master/linehaul/user_agents.py


(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 365 --where 'details.python = "1.17"' '' system distro
Served from cache: False
Data processed: 146.60 GiB
Data billed: 146.60 GiB
Estimated cost: $0.72

| system_name | distro_name | download_count |
| ----------- | ----------- | -------------- |
| None        | None        |      3,405,944 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 365 --where 'details.python = "1.17"' '' installer installer-version
Served from cache: False
Data processed: 153.04 GiB
Data billed: 153.04 GiB
Estimated cost: $0.75

| installer_name | installer_version | download_count |
| -------------- | ----------------- | -------------- |
| None           | None              |      3,405,944 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 365 --where 'details.python = "1.17"' '' impl impl-version
Served from cache: False
Data processed: 155.91 GiB
Data billed: 155.91 GiB
Estimated cost: $0.77

| implementation | impl_version | download_count |
| -------------- | ------------ | -------------- |
| None           | None         |      3,405,944 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 365 --where 'details.python = "1.17"' '' setuptools-version
Served from cache: False
Data processed: 51.03 GiB
Data billed: 51.03 GiB
Estimated cost: $0.25

| setuptools_version | download_count |
| ------------------ | -------------- |
| None               |      3,405,944 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 365 --where 'details.python = "1.17"' '' openssl
Served from cache: False
Data processed: 173.99 GiB
Data billed: 173.99 GiB
Estimated cost: $0.85

| openssl_version | download_count |
| --------------- | -------------- |
| None            |      3,405,944 |
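Every detail column comes back None for the same 3,405,944 downloads, which is what you would expect if these requests carried a bare `Python-urllib/<version>` user agent. As far as I know, pip appends a JSON blob describing the installer, interpreter, and platform, while urllib sends nothing but a version number. The sketch below is a hypothetical, simplified parser (not linehaul's code) that maps both kinds of user agent onto the columns queried above; the pip-style string is illustrative and abbreviated, not taken from real logs.

```python
import json
import re
from typing import Any, Optional

# Hypothetical, simplified patterns; linehaul's real parser differs.
PIP_UA = re.compile(r"^pip/\S+ (?P<data>\{.*\})$")
BARE_URLLIB_UA = re.compile(r"^Python-urllib/(?P<python>\d\.\d+)$")

COLUMNS = ("python", "installer", "implementation", "system", "distro",
           "setuptools_version", "openssl_version")

def details(user_agent: str) -> dict[str, Optional[Any]]:
    """Map a user agent onto the columns queried above; unknown -> None."""
    row: dict[str, Optional[Any]] = dict.fromkeys(COLUMNS)
    if m := PIP_UA.match(user_agent):
        data = json.loads(m.group("data"))
        row["python"] = data.get("python")
        # Nested objects such as {"installer": {"name": "pip", ...}}.
        for key in ("installer", "implementation", "system", "distro"):
            row[key] = (data.get(key) or {}).get("name")
        row["setuptools_version"] = data.get("setuptools_version")
        row["openssl_version"] = data.get("openssl_version")
    elif m := BARE_URLLIB_UA.match(user_agent):
        # Only a version number is available, and nothing says which one.
        row["python"] = m.group("python")
    return row

# Illustrative pip-style UA (abbreviated; not taken from real logs).
pip_ua = ('pip/9.0.1 {"installer": {"name": "pip", "version": "9.0.1"}, '
          '"python": "2.7.12", "implementation": {"name": "CPython"}, '
          '"system": {"name": "Linux"}, "setuptools_version": "28.8.0"}')

print(details(pip_ua))
print(details("Python-urllib/1.17"))
```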

Python Version Trends

None of these seem to correspond with 1.17 trends:

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 1095 --limit 36 --order download_month --where 'details.python = "2.6"' '' month
Served from cache: False
Data processed: 186.30 GiB
Data billed: 186.30 GiB
Estimated cost: $0.91

| download_month | download_count |
| -------------- | -------------- |
| 2018-03        |        146,529 |
| 2018-02        |        828,609 |
| 2018-01        |      1,141,411 |
| 2017-12        |      1,614,495 |
| 2017-11        |      1,371,796 |
| 2017-10        |      2,023,681 |
| 2017-09        |      2,289,735 |
| 2017-08        |      2,704,477 |
| 2017-07        |      2,828,229 |
| 2017-06        |      2,889,239 |
| 2017-05        |      2,043,001 |
| 2017-04        |      2,457,703 |
| 2017-03        |      2,680,095 |
| 2017-02        |      2,576,469 |
| 2017-01        |      3,066,075 |
| 2016-12        |      3,053,485 |
| 2016-11        |      3,591,192 |
| 2016-10        |      3,270,814 |
| 2016-09        |      3,209,299 |
| 2016-08        |      4,005,600 |
| 2016-07        |      3,946,704 |
| 2016-06        |      4,171,922 |
| 2016-05        |      2,034,229 |
| 2016-03        |      1,815,018 |
| 2016-02        |      9,519,408 |
| 2016-01        |      2,959,021 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 1095 --limit 36 --order download_month --where 'details.python = "2.7"' '' month
Served from cache: False
Data processed: 186.30 GiB
Data billed: 186.30 GiB
Estimated cost: $0.91

| download_month | download_count |
| -------------- | -------------- |
| 2018-03        |      4,668,812 |
| 2018-02        |     22,073,881 |
| 2018-01        |     30,490,386 |
| 2017-12        |     31,818,101 |
| 2017-11        |     30,530,294 |
| 2017-10        |     42,417,489 |
| 2017-09        |     47,877,172 |
| 2017-08        |     61,447,951 |
| 2017-07        |     63,732,403 |
| 2017-06        |     66,186,931 |
| 2017-05        |     68,284,527 |
| 2017-04        |     71,634,346 |
| 2017-03        |     75,668,717 |
| 2017-02        |     72,209,408 |
| 2017-01        |     80,589,481 |
| 2016-12        |     71,611,590 |
| 2016-11        |     78,672,198 |
| 2016-10        |     68,610,169 |
| 2016-09        |     74,346,875 |
| 2016-08        |     72,597,214 |
| 2016-07        |     62,769,844 |
| 2016-06        |     50,753,857 |
| 2016-05        |     17,155,656 |
| 2016-03        |      9,538,368 |
| 2016-02        |     48,248,300 |
| 2016-01        |     15,114,515 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 1095 --limit 36 --order download_month --where 'details.python = "3.4"' '' month
Served from cache: False
Data processed: 186.30 GiB
Data billed: 186.30 GiB
Estimated cost: $0.91

| download_month | download_count |
| -------------- | -------------- |
| 2018-03        |        453,018 |
| 2018-02        |      1,966,827 |
| 2018-01        |      2,435,945 |
| 2017-12        |      2,517,948 |
| 2017-11        |      2,310,739 |
| 2017-10        |      2,491,198 |
| 2017-09        |      2,332,760 |
| 2017-08        |      2,697,636 |
| 2017-07        |      2,397,228 |
| 2017-06        |      1,834,341 |
| 2017-05        |      1,863,789 |
| 2017-04        |      1,848,761 |
| 2017-03        |      2,238,398 |
| 2017-02        |      1,794,172 |
| 2017-01        |      1,793,060 |
| 2016-12        |      1,519,826 |
| 2016-11        |      1,539,866 |
| 2016-10        |      1,660,522 |
| 2016-09        |      1,602,485 |
| 2016-08        |      1,999,833 |
| 2016-07        |      1,976,705 |
| 2016-06        |        792,186 |
| 2016-05        |        363,550 |
| 2016-03        |        188,615 |
| 2016-02        |        977,270 |
| 2016-01        |        258,988 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 1095 --limit 36 --order download_month --where 'details.python = "3.5"' '' month
Served from cache: False
Data processed: 186.30 GiB
Data billed: 186.30 GiB
Estimated cost: $0.91

| download_month | download_count |
| -------------- | -------------- |
| 2018-03        |        419,007 |
| 2018-02        |      2,124,505 |
| 2018-01        |      2,331,339 |
| 2017-12        |      2,649,399 |
| 2017-11        |      2,759,223 |
| 2017-10        |      2,312,185 |
| 2017-09        |      2,016,890 |
| 2017-08        |      2,438,073 |
| 2017-07        |      2,391,372 |
| 2017-06        |      3,384,902 |
| 2017-05        |      2,832,727 |
| 2017-04        |      3,037,491 |
| 2017-03        |      2,466,478 |
| 2017-02        |      2,367,826 |
| 2017-01        |      2,417,595 |
| 2016-12        |      2,464,993 |
| 2016-11        |      1,820,581 |
| 2016-10        |      2,006,853 |
| 2016-09        |      1,761,309 |
| 2016-08        |      1,242,226 |
| 2016-07        |      1,004,178 |
| 2016-06        |        603,266 |
| 2016-05        |        333,189 |
| 2016-03        |         89,432 |
| 2016-02        |        489,781 |
| 2016-01        |        105,938 |

(david-YYe3LIIa) david@kahuna:~ $ pypinfo --days 1095 --limit 36 --order download_month --where 'details.python = "3.6"' '' month
Served from cache: False
Data processed: 186.30 GiB
Data billed: 186.30 GiB
Estimated cost: $0.91

| download_month | download_count |
| -------------- | -------------- |
| 2018-03        |        460,217 |
| 2018-02        |      2,271,910 |
| 2018-01        |      2,236,416 |
| 2017-12        |      2,251,264 |
| 2017-11        |      5,606,128 |
| 2017-10        |      2,226,206 |
| 2017-09        |      2,054,285 |
| 2017-08        |      2,339,605 |
| 2017-07        |      1,661,571 |
| 2017-06        |      1,726,659 |
| 2017-05        |      1,152,307 |
| 2017-04        |        902,785 |
| 2017-03        |        792,730 |
| 2017-02        |        580,227 |
| 2017-01        |        391,527 |
| 2016-12        |         71,144 |
| 2016-11        |         26,199 |
| 2016-10        |         23,336 |
| 2016-09        |         23,020 |
| 2016-08        |         23,118 |
| 2016-07        |         20,472 |
| 2016-06        |         10,369 |
| 2016-05        |          3,076 |
| 2016-03        |          1,319 |
| 2016-02        |          7,425 |
| 2016-01        |          1,597 |
