When released, this story will ENTER TEXT HERE.
As a ENTER ROLE I want ENTER GOAL, so that ENTER REASON(S).
def make_polynomial(dataframe, degree=MAX_DEGREE):
    """Create higher-order polynomial features from a dataframe.
    The dataframe should be laid out as [Y, X1, X2, ..., Xi].
    Returns a dataframe with polynomial features of X1 ... Xi up to the given degree."""
    df = dataframe.copy()
    cols = df.columns[1:]
    for i in range(2, degree + 1):
        for col in cols:
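The preview stops inside the inner loop, so the original loop body is not shown. As a minimal sketch of one way such a function could be completed, the version below raises each feature column to the powers 2 through `degree` and appends the result as a new column. The name `make_polynomial_sketch`, the value of `MAX_DEGREE`, and the `X1^2`-style column names are assumptions for illustration, not taken from the original.

```python
import pandas as pd

MAX_DEGREE = 3  # assumed default; the original value is not shown


def make_polynomial_sketch(dataframe, degree=MAX_DEGREE):
    """Hypothetical completion: add powers 2..degree of every feature column."""
    df = dataframe.copy()
    # The first column is treated as the target Y; the rest are features.
    cols = df.columns[1:]
    for i in range(2, degree + 1):
        for col in cols:
            # Column naming scheme is an assumption, e.g. "X1^2".
            df[f"{col}^{i}"] = df[col] ** i
    return df


example = pd.DataFrame({"Y": [1, 2], "X1": [2, 3]})
result = make_polynomial_sketch(example, degree=3)
print(result)
```

Because `cols` is read before the loop starts, the newly added columns are not themselves raised to higher powers.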
# baseURI: http://opendata.stelselcatalogus.nl/id/dataset/sc
# imports: http://purl.org/dc/elements/1.1/
# imports: http://rdfs.org/ns/void
# imports: http://www.w3.org/2004/02/skos/core
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix begrip_banken: <http://opendata.stelselcatalogus.nl/banken/id/begrip/> .
@prefix begrip_bgt: <http://opendata.stelselcatalogus.nl/bgt/id/begrip/> .
@prefix begrip_bri: <http://opendata.stelselcatalogus.nl/bri/id/begrip/> .
@prefix begrip_brk: <http://opendata.stelselcatalogus.nl/brk/id/begrip/> .
KWB = {
    2016: "83487NED",
    2017: "83765NED",
    2018: "84286NED",
    2019: "84583NED",
    2020: "84799NED",
}
Zan Armstrong's comet chart has been on my list of hobby projects for a while now. I think it is an elegant way to visualize statistical mix effects and address Simpson's paradox, and it is particularly useful when working with longitudinal data involving different sub-populations. Recently I found a good excuse to actually use it as part of an exploratory data analysis on a project.
Since I mostly work in Python and have recently fallen in love with Altair - for the same reasons as Fernando explains here - I wondered how the comet chart could be implemented using the grammar of interactive graphics. It took me a while to figure out how to actually plot the c
import altair as alt
import pandas as pd
import vega_datasets
# Use airline data to assess statistical mix effects of delays
flights = vega_datasets.data.flights_20k()
aggregation = dict(
    number_of_flights=("destination", "count"),
    mean_delay=("delay", "mean"),
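The preview cuts off before the `aggregation` dict is closed or used. Its `name=(column, function)` entries match the form pandas accepts for named aggregation, so a plausible next step is a `groupby(...).agg(**aggregation)` call. The sketch below shows that pattern on a small made-up table instead of the flights dataset; the toy rows and the choice to group by `origin` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data standing in for vega_datasets.data.flights_20k()
flights = pd.DataFrame(
    {
        "origin": ["A", "A", "B", "B", "B"],
        "destination": ["X", "Y", "X", "Y", "Y"],
        "delay": [10, 20, 0, 30, 60],
    }
)

# Named aggregation: output column name = (input column, aggregation function)
aggregation = dict(
    number_of_flights=("destination", "count"),
    mean_delay=("delay", "mean"),
)

# Grouping key is an assumption; the original grouping is not shown.
summary = flights.groupby("origin").agg(**aggregation).reset_index()
print(summary)
```

Keeping both the group size and the group mean is what makes mix effects visible: a shift in the overall mean can come from changing group means, changing group sizes, or both.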
Inevitably, when I teach introductory courses on machine learning at the Jheronimus Academy of Data Science, I get questions like:
Notwithstanding the wealth of information that is available in the creative commons, it is hard to see the forest for the trees. I found myself doing the same web searches over and over again, and I thought: 'Surely there must be a better way?'
poetry: ## generate setup.py, environment.yml and requirements.txt from poetry
	dephell deps convert
	dephell deps convert --env pip
	dephell deps convert --env conda
kernel:
	poetry run ipython kernel install --user --name=your-project-name