Skip to content

Instantly share code, notes, and snippets.

@MaxHalford
Last active March 11, 2023 06:33
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save MaxHalford/d4e78fa4c86d977874aedb1528fc92c1 to your computer and use it in GitHub Desktop.
Save MaxHalford/d4e78fa4c86d977874aedb1528fc92c1 to your computer and use it in GitHub Desktop.
Are Airbnb guests less energy efficient than their host?
date kilowatt-hour n_hosts n_guests temperature
2022-01-01 2.171 0.0 0.0 7.875000000000028
2022-01-02 10.31 0.0 0.5 8.787500000000023
2022-01-03 16.107 0.0 1.0 8.35000000000003
2022-01-04 16.563 0.0 1.0 9.250000000000014
2022-01-05 17.098 0.0 1.0 5.700000000000024
2022-01-06 18.76 0.0 1.0 2.950000000000024
2022-01-07 20.853 0.0 1.0 3.2500000000000284
2022-01-08 19.548 0.0 1.0 8.887500000000031
2022-01-09 20.81 0.0 1.0 8.72500000000003
2022-01-10 19.277 0.0 1.0 7.837500000000027
2022-01-11 18.977 0.0 1.0 5.85000000000003
2022-01-12 10.933 0.0 0.5 2.275000000000013
2022-01-13 15.214 0.0 0.5 -0.1499999999999701
2022-01-14 21.192 0.0 1.0 0.8000000000000114
2022-01-15 22.991 0.0 1.0 1.71250000000002
2022-01-16 24.476 0.0 1.0 1.1625000000000227
2022-01-17 23.372 0.0 1.0 -0.5874999999999773
2022-01-18 26.668 0.0 1.0 0.4875000000000256
2022-01-19 28.921 0.0 1.0 4.312500000000021
2022-01-20 24.774 0.0 1.0 4.9875000000000185
2022-01-21 21.979 0.0 1.0 2.337500000000034
2022-01-22 28.041 0.0 1.0 1.1875000000000213
2022-01-23 25.802 0.0 1.0 2.137500000000024
2022-01-24 21.516 0.0 1.0 6.325000000000024
2022-01-25 20.496 0.0 1.0 4.050000000000011
2022-01-26 21.634 0.0 1.0 3.000000000000014
2022-01-27 23.348 0.0 1.0 0.9500000000000242
2022-01-28 25.095 0.0 1.0 0.2500000000000213
2022-01-29 30.351 0.0 1.0 0.8375000000000128
2022-01-30 27.926 0.0 1.0 0.7500000000000142
2022-01-31 26.02 0.0 1.0 5.537500000000023
2022-02-01 24.038 0.0 1.0 6.350000000000023
2022-02-02 21.191 0.0 1.0 8.525000000000027
2022-02-03 20.848 0.0 1.0 5.800000000000033
2022-02-04 9.168 0.0 1.0 9.825000000000024
2022-02-05 7.638 0.0 1.0 8.08750000000002
2022-02-06 9.832 0.0 1.0 5.212500000000034
2022-02-07 11.215 0.0 1.0 9.325000000000031
2022-02-08 21.322 0.0 1.0 7.537500000000023
2022-02-09 18.505 0.0 1.0 8.925000000000018
2022-02-10 11.716 0.0 1.0 9.787500000000023
2022-02-11 5.696 0.0 1.0 9.375000000000021
2022-02-12 7.385 0.0 0.5 6.65000000000002
2022-02-13 6.383 0.0 0.0 11.112500000000026
2022-02-14 6.126 0.0 0.0 8.837500000000027
2022-02-15 7.283 0.0 0.0 8.550000000000033
2022-02-16 7.655 0.0 0.0 11.000000000000021
2022-02-17 11.835 1.0 0.0 13.000000000000021
2022-02-18 16.639 2.0 0.0 12.050000000000018
2022-02-19 14.473 1.0 0.5 9.200000000000024
2022-02-20 15.226 0.0 1.0 7.912500000000016
2022-02-21 19.018 0.0 1.0 11.15000000000002
2022-02-22 10.834 0.0 1.0 10.537500000000016
2022-02-23 13.09 0.0 1.0 9.887500000000017
2022-02-24 9.604 0.0 1.0 10.737500000000033
2022-02-25 10.532 0.0 0.5 8.075000000000031
2022-02-26 16.355 1.0 0.0 6.40000000000002
2022-02-27 5.813 2.0 0.0 8.237500000000026
2022-02-28 4.149 2.0 0.0 10.950000000000017
2022-03-01 5.304 1.0 0.0 8.90000000000002
2022-03-02 2.062 0.0 0.0 9.72500000000003
2022-03-03 2.2 0.0 0.0 10.950000000000024
2022-03-04 2.652 0.0 0.0 9.28750000000003
2022-03-05 2.606 0.0 0.0 8.02500000000002
2022-03-06 2.18 0.0 0.0 7.58750000000002
2022-03-07 8.854 0.0 0.5 6.025000000000027
2022-03-08 19.768 0.0 1.0 9.437500000000028
2022-03-09 16.199 0.0 1.0 9.85000000000003
2022-03-10 14.997 0.0 1.0 11.825000000000024
2022-03-11 16.97 0.0 1.0 12.237500000000018
2022-03-12 14.738 0.0 1.0 9.600000000000016
2022-03-13 16.351 0.0 1.0 9.450000000000024
2022-03-14 19.501 0.0 1.0 9.725000000000016
2022-03-15 18.596 0.0 1.0 13.412500000000023
2022-03-16 15.068 0.0 1.0 13.225000000000009
2022-03-17 9.547 0.0 0.5 12.425000000000026
2022-03-18 2.236 0.0 0.0 10.225000000000023
2022-03-19 6.737 0.5 0.0 11.912500000000023
2022-03-20 6.707 1.0 0.0 12.400000000000027
2022-03-21 5.577 1.0 0.0 12.612500000000018
2022-03-22 8.594 1.0 0.0 11.675000000000011
2022-03-23 7.468 0.5 0.0 11.53750000000003
2022-03-24 2.465 0.0 0.0 11.587500000000027
2022-03-25 3.999 0.0 0.0 12.83750000000002
2022-03-26 5.441 1.0 0.0 12.212500000000027
2022-03-27 8.289 2.0 0.0 13.850000000000023
2022-03-28 9.167 2.0 0.0 14.275000000000013
2022-03-29 7.701 2.0 0.0 13.287500000000023
2022-03-30 5.418 1.0 0.0 10.65000000000002
2022-03-31 2.536 0.0 0.0 9.912500000000023
2022-04-01 6.991 0.0 0.0 4.725000000000023
2022-04-02 6.458 0.0 0.0 4.125000000000021
2022-04-03 2.155 0.0 0.0 3.3500000000000227
2022-04-04 2.056 0.0 0.0 4.937500000000021
2022-04-05 2.461 0.0 0.0 7.625000000000028
2022-04-06 2.506 0.0 0.0 10.000000000000014
2022-04-07 3.804 0.0 0.5 13.737500000000018
2022-04-08 7.752 0.0 1.0 14.937500000000014
2022-04-09 8.455 0.0 1.0 10.762500000000024
2022-04-10 11.004 0.0 1.0 10.725000000000023
2022-04-11 7.542 0.0 1.0 13.52500000000002
2022-04-12 7.666 0.0 1.0 13.825000000000017
2022-04-13 12.87 0.0 1.0 12.662500000000023
2022-04-14 7.723 0.0 1.0 14.15000000000002
2022-04-15 4.868 0.0 1.0 15.437500000000021
2022-04-16 4.59 0.0 1.0 16.600000000000023
2022-04-17 6.654 0.0 1.0 16.512500000000024
2022-04-18 6.511 0.0 1.0 13.862500000000018
2022-04-19 4.867 0.0 1.0 11.412500000000023
2022-04-20 7.882 0.0 1.0 10.16250000000003
2022-04-21 7.28 0.0 1.0 10.812500000000021
2022-04-22 7.757 0.0 1.0 13.800000000000018
2022-04-23 9.142 0.0 1.0 10.987500000000033
2022-04-24 12.604 0.0 1.0 13.037500000000023
2022-04-25 9.306 0.0 1.0 12.837500000000027
2022-04-26 7.356 0.0 1.0 15.725000000000023
2022-04-27 5.622 0.0 1.0 16.47500000000003
2022-04-28 3.429 0.0 1.0 15.612500000000033
2022-04-29 2.528 0.0 1.0 16.950000000000024
2022-04-30 2.089 0.0 1.0 16.987500000000026
2022-05-01 2.986 0.0 1.0 15.587500000000013
2022-05-02 5.314 0.0 1.0 15.487500000000018
2022-05-03 8.116 0.0 1.0 15.40000000000002
2022-05-04 4.621 0.0 1.0 15.287500000000016
2022-05-05 6.201 0.0 1.0 16.175000000000026
2022-05-06 4.748 0.0 1.0 16.325000000000017
2022-05-07 2.05 0.0 1.0 17.187500000000036
2022-05-08 3.298 0.0 1.0 18.700000000000024
2022-05-09 3.658 0.0 1.0 19.725000000000016
2022-05-10 5.936 0.0 1.0 20.58750000000002
2022-05-11 8.453 0.0 1.0 21.225000000000023
2022-05-12 4.136 0.0 1.0 19.587500000000034
2022-05-13 4.844 0.0 1.0 20.25000000000002
2022-05-14 4.864 0.0 1.0 22.725000000000023
2022-05-15 4.423 0.0 1.0 21.675000000000026
2022-05-16 6.768 0.0 1.0 22.450000000000017
2022-05-17 6.074 0.0 1.0 24.60000000000003
2022-05-18 2.389 0.0 1.0 25.93750000000002
2022-05-19 3.076 0.0 1.0 22.862500000000026
2022-05-20 3.877 0.0 1.0 24.2714285714286
2022-05-21 4.039 0.0 1.0 28.325000000000017
2022-05-22 4.889 0.0 1.0 25.02500000000002
2022-05-23 5.125 0.0 1.0 18.975000000000023
2022-05-24 5.781 0.0 1.0 14.850000000000016
2022-05-25 6.745 0.0 1.0 15.925000000000018
2022-05-26 4.404 0.0 1.0 15.925000000000026
2022-05-27 6.775 0.0 1.0 19.58750000000002
2022-05-28 3.758 0.0 1.0 20.40000000000002
2022-05-29 4.79 0.0 1.0 19.550000000000026
2022-05-30 5.58 0.0 1.0 19.287500000000023
2022-05-31 8.348 0.0 1.0 21.112500000000026
2022-06-01 3.656 0.0 0.5 23.950000000000024
2022-06-02 2.146 0.0 0.0 21.85000000000003
2022-06-03 2.244 0.0 1.0 23.81250000000002
2022-06-04 5.758 0.0 2.0 21.50000000000003
2022-06-05 5.099 0.0 2.0 20.550000000000026
2022-06-06 3.35 0.0 2.0 22.46250000000002
2022-06-07 6.18 0.0 1.5 20.575000000000024
2022-06-08 2.676 0.0 1.0 18.00000000000003
2022-06-09 2.883 0.0 1.0 18.887500000000017
2022-06-10 5.966 0.0 1.5 21.62500000000002
2022-06-11 3.056 0.0 2.0 24.912500000000023
2022-06-12 5.119 0.0 2.0 24.737500000000033
2022-06-13 3.959 0.0 2.0 25.775000000000013
2022-06-14 3.875 0.0 2.0 28.43750000000003
2022-06-15 3.556 0.0 2.0 29.162500000000023
2022-06-16 4.135 0.0 2.0 30.150000000000027
2022-06-17 3.907 1.0 1.0 30.637500000000024
2022-06-18 2.892 2.0 0.0 29.287500000000016
2022-06-19 3.181 2.0 0.0 27.825000000000024
2022-06-20 3.829 1.0 0.0 24.18750000000003
2022-06-21 2.929 0.0 0.5 23.15000000000002
2022-06-22 3.161 0.0 1.0 24.075000000000017
2022-06-23 4.319 0.0 1.0 22.062500000000014
2022-06-24 3.9 0.0 0.5 20.42500000000002
2022-06-25 2.589 0.0 0.0 21.300000000000033
2022-06-26 2.182 0.0 1.0 16.925000000000026
2022-06-27 3.789 0.0 2.0 15.937500000000021
2022-06-28 4.381 0.0 2.0 21.00000000000003
2022-06-29 4.424 0.0 1.0 22.15000000000002
2022-06-30 2.151 0.0 0.0 16.637500000000017
2022-07-01 2.11 0.0 0.0 18.637500000000024
2022-07-02 2.627 0.0 0.0 22.662500000000023
2022-07-03 2.163 0.0 0.0 24.162500000000023
2022-07-04 2.195 0.0 0.0 21.925000000000026
2022-07-05 2.699 0.0 0.0 24.71250000000002
2022-07-06 2.25 0.5 0.0 23.975000000000023
2022-07-07 3.695 1.0 0.0 23.86250000000002
2022-07-08 4.177 1.0 0.0 24.112500000000026
2022-07-09 3.281 1.0 0.0 25.375000000000036
2022-07-10 2.575 0.5 0.0 26.68750000000002
2022-07-11 3.127 0.0 1.0 28.12500000000002
2022-07-12 4.781 0.0 2.0 28.975000000000016
2022-07-13 3.46 0.0 1.0 28.837500000000013
2022-07-14 3.217 0.0 1.0 29.700000000000024
2022-07-15 4.582 0.0 2.0 28.825000000000024
2022-07-16 4.725 0.0 2.0 29.387500000000024
2022-07-17 4.498 0.0 2.0 30.175000000000026
2022-07-18 5.64 0.0 2.0 30.96250000000002
2022-07-19 5.317 0.0 2.0 27.962500000000034
2022-07-20 3.535 0.0 2.0 25.450000000000017
2022-07-21 3.582 0.0 2.0 26.48750000000002
2022-07-22 2.657 0.0 2.0 25.850000000000023
2022-07-23 4.028 0.0 1.0 23.787500000000023
2022-07-24 3.306 0.0 1.0 27.98750000000002
2022-07-25 5.406 0.0 2.0 24.287500000000023
2022-07-26 2.348 0.0 1.0 22.825000000000024
2022-07-27 2.641 0.0 0.0 22.93750000000003
2022-07-28 2.928 0.0 0.0 23.350000000000016
2022-07-29 3.52 0.5 0.0 22.275000000000027
2022-07-30 2.625 0.0 0.0 25.075000000000017
2022-07-31 2.28 0.0 0.0 26.38750000000003
2022-08-01 3.021 1.0 0.0 27.43750000000002
2022-08-02 4.18 2.0 0.0 28.700000000000017
2022-08-03 2.774 1.0 0.0 28.98750000000002
2022-08-04 3.177 0.0 1.0 29.00000000000002
2022-08-05 3.624 0.0 2.0 26.450000000000024
2022-08-06 3.746 0.0 1.0 26.91250000000003
2022-08-07 2.288 0.0 0.0 28.25000000000003
2022-08-08 2.344 0.0 0.0 27.81250000000003
2022-08-09 2.648 0.0 1.0 27.787500000000023
2022-08-10 3.649 0.0 2.0 28.925000000000033
2022-08-11 4.475 0.0 2.0 29.575000000000024
2022-08-12 4.456 0.0 2.0 31.212500000000027
2022-08-13 3.05 0.0 2.0 27.837500000000034
2022-08-14 3.159 0.0 2.0 25.26250000000003
2022-08-15 3.415 0.0 1.0 24.43750000000003
2022-08-16 2.597 0.0 0.0 25.08750000000002
2022-08-17 2.694 0.0 1.0 21.087500000000013
2022-08-18 6.129 0.0 2.0 20.512500000000024
2022-08-19 4.034 0.0 1.0 21.57500000000003
2022-08-20 2.536 0.0 0.0 23.712500000000027
2022-08-21 2.496 0.0 0.0 24.600000000000023
2022-08-22 2.921 0.0 1.0 22.31250000000003
2022-08-23 6.441 0.0 2.0 25.00000000000003
2022-08-24 3.421 0.0 2.0 27.287500000000023
2022-08-25 2.755 0.0 1.0 25.587500000000027
2022-08-26 4.117 0.0 1.0 25.050000000000026
2022-08-27 5.777 0.0 2.0 24.86250000000002
2022-08-28 6.562 0.0 2.0 26.787500000000023
2022-08-29 4.948 0.0 2.0 25.45000000000003
2022-08-30 7.889 0.0 2.0 25.175000000000026
2022-08-31 5.301 0.0 2.0 24.162500000000016
2022-09-01 6.92 0.0 1.0 23.737500000000026
2022-09-02 2.305 0.0 0.0 24.05000000000001
2022-09-03 2.658 0.0 1.0 23.00000000000003
2022-09-04 4.204 0.0 2.0 25.48750000000002
2022-09-05 2.749 0.0 2.0 26.80000000000002
2022-09-06 4.268 0.0 2.0 26.900000000000034
2022-09-07 3.36 0.0 2.0 23.46250000000002
2022-09-08 4.25 0.0 1.5 21.512500000000024
2022-09-09 4.475 0.0 1.0 19.350000000000023
2022-09-10 4.745 0.0 1.0 20.68750000000003
2022-09-11 3.453 0.0 1.0 24.362500000000026
2022-09-12 3.749 0.0 1.0 24.975000000000023
2022-09-13 3.309 0.0 1.5 25.850000000000016
2022-09-14 4.094 0.0 2.0 23.90000000000002
2022-09-15 2.338 0.0 2.0 23.975000000000016
2022-09-16 3.969 0.0 1.0 21.58750000000002
2022-09-17 3.318 0.0 0.5 17.837500000000027
2022-09-18 2.972 0.0 1.0 16.912500000000023
2022-09-19 3.7 0.0 1.0 17.75000000000002
2022-09-20 3.516 0.0 1.0 17.737500000000026
2022-09-21 3.506 0.0 1.0 17.71250000000002
2022-09-22 2.242 0.0 1.0 18.275000000000013
2022-09-23 3.799 0.0 1.0 18.475000000000016
2022-09-24 4.448 0.0 1.0 16.825000000000017
2022-09-25 2.713 0.0 1.0 14.925000000000026
2022-09-26 2.779 0.0 1.0 14.58750000000002
2022-09-27 7.43 0.0 1.0 14.887500000000024
2022-09-28 1.929 0.0 1.0 15.987500000000033
2022-09-29 4.319 0.0 1.0 14.450000000000031
2022-09-30 3.623 0.0 1.0 13.987500000000026
2022-10-01 4.922 0.0 1.0 16.52500000000002
2022-10-02 2.58 0.0 1.0 17.36250000000002
2022-10-03 5.915 0.0 1.0 18.262500000000017
2022-10-04 4.371 0.0 1.0 17.775000000000027
2022-10-05 4.271 0.0 1.0 18.375000000000036
2022-10-06 7.801 0.0 1.0 17.800000000000026
2022-10-07 6.748 0.0 1.0 17.71250000000002
2022-10-08 4.469 0.0 1.0 18.600000000000023
2022-10-09 3.999 0.0 1.0 18.06250000000003
2022-10-10 4.343 0.0 1.0 19.675000000000026
2022-10-11 5.483 0.0 1.0 18.287500000000023
2022-10-12 2.731 0.0 1.0 17.71250000000002
2022-10-13 5.802 0.0 1.0 17.92500000000002
2022-10-14 2.626 0.0 1.0 18.775000000000034
2022-10-15 2.524 0.0 1.0 17.91250000000003
2022-10-16 7.679 0.0 1.0 21.18750000000002
2022-10-17 4.759 0.0 1.0 22.02500000000002
2022-10-18 3.582 0.0 1.0 21.93750000000003
2022-10-19 4.796 0.0 1.0 21.000000000000014
2022-10-20 2.552 0.0 1.0 20.77500000000002
2022-10-21 5.412 0.0 1.0 19.75000000000002
2022-10-22 2.608 0.0 1.0 19.80000000000002
2022-10-23 4.293 0.0 1.0 22.42500000000002
2022-10-24 2.799 0.0 1.0 19.63750000000003
2022-10-25 4.283 0.0 1.0 17.862500000000026
2022-10-26 2.215 0.0 1.0 18.037500000000023
2022-10-27 3.94 0.0 1.0 20.150000000000034
2022-10-28 5.535 0.0 1.0 20.600000000000016
2022-10-29 2.357 0.0 0.5 19.587500000000013
2022-10-30 1.975 0.0 0.0 19.62500000000002
2022-10-31 2.228 0.0 0.0 20.37500000000002
2022-11-01 2.593 0.0 0.0 16.85000000000003
2022-11-02 1.99 0.0 0.0 15.937500000000021
2022-11-03 2.248 0.0 0.0 15.412500000000023
2022-11-04 2.206 0.0 0.0 11.762500000000017
2022-11-05 2.213 0.0 0.0 10.937500000000021
2022-11-06 1.736 0.0 0.0 10.862500000000011
2022-11-07 2.589 0.0 0.0 12.725000000000016
2022-11-08 1.843 0.0 0.0 16.975000000000016
2022-11-09 2.616 0.0 0.0 13.90000000000002
2022-11-10 1.771 0.0 0.0 11.262500000000017
2022-11-11 5.038 0.0 1.0 14.52500000000002
2022-11-12 8.671 0.0 2.0 17.600000000000023
2022-11-13 10.638 0.0 2.0 14.000000000000023
2022-11-14 7.613 0.0 1.0 14.887500000000024
2022-11-15 6.577 0.0 0.0 14.587500000000027
2022-11-16 5.872 0.0 0.0 13.60000000000003
2022-11-17 7.236 0.5 0.0 13.937500000000028
2022-11-18 5.302 1.0 0.0 11.125000000000014
2022-11-19 2.223 0.5 0.0 8.40000000000002
2022-11-20 1.776 0.0 0.0 8.237500000000033
2022-11-21 8.14 0.5 0.0 9.712500000000027
2022-11-22 19.411 1.0 0.0 8.137500000000017
2022-11-23 15.046 1.0 0.0 11.612500000000033
2022-11-24 10.991 1.0 0.0 9.437500000000028
2022-11-25 14.611 0.5 0.0 9.087500000000006
2022-11-26 14.344 0.5 0.0 6.22500000000003
2022-11-27 17.762 1.0 0.0 7.5375000000000085
2022-11-28 15.675 1.0 0.0 10.08750000000002
2022-11-29 14.678 1.0 0.0 8.575000000000024
2022-11-30 13.342 1.0 0.0 4.262500000000017
2022-12-01 16.346 1.0 0.0 4.875000000000021
2022-12-02 19.825 1.5 0.0 5.812500000000028
2022-12-03 10.991 1.0 0.0 5.700000000000024
2022-12-04 6.972 1.0 0.0 3.9375000000000213
2022-12-05 22.081 2.0 0.0 4.275000000000027
2022-12-06 22.108 2.0 0.0 7.325000000000024
2022-12-07 14.034 2.0 0.0 8.775000000000034
2022-12-08 6.779 1.0 0.0 6.887500000000024
2022-12-09 2.198 0.0 0.0 8.362500000000026
2022-12-10 2.198 0.0 0.0 1.962500000000027
2022-12-11 2.244 0.0 0.0 -0.8499999999999801
2022-12-12 22.233 1.0 0.0 2.200000000000017
2022-12-13 22.456 2.0 0.0 7.300000000000026
2022-12-14 17.933 2.0 0.0 11.625000000000021
2022-12-15 18.002 2.0 0.0 10.400000000000027
2022-12-16 13.852 2.0 0.0 9.562500000000028
2022-12-17 7.804 1.0 0.0 5.325000000000024
2022-12-18 0.933 0.0 0.0 7.475000000000023
2022-12-19 1.002 0.0 0.0 12.96250000000002
2022-12-20 0.961 0.0 0.0 13.287500000000016
2022-12-21 6.027 1.0 0.0 12.850000000000016
2022-12-22 13.967 2.0 0.0 13.062500000000028
2022-12-23 9.399 1.5 0.0 13.275000000000027
2022-12-24 1.362 1.0 0.0 11.200000000000024
2022-12-25 3.269 0.5 0.0 11.237500000000026
2022-12-26 1.84 0.0 0.0 11.375000000000028
2022-12-27 2.74 0.0 0.5 10.812500000000021
2022-12-28 4.237 0.0 1.0 8.787500000000023
2022-12-29 4.705 0.0 1.0 10.750000000000028
2022-12-30 4.66 0.0 1.0 12.162500000000023
2022-12-31 4.432 0.0 1.0 13.125000000000021
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Are Airbnb guests less energy efficient than their host?"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building a dataset"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Host"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import datetime as dt\n",
"\n",
"hosts = pd.DataFrame(\n",
" [\n",
" (dt.date(2022, 2, 17), dt.date(2022, 2, 19), 2), \n",
" (dt.date(2022, 2, 26), dt.date(2022, 3, 1), 2), \n",
" (dt.date(2022, 3, 19), dt.date(2022, 3, 23), 1), \n",
" (dt.date(2022, 3, 26), dt.date(2022, 3, 30), 2), \n",
" (dt.date(2022, 6, 17), dt.date(2022, 6, 20), 2), \n",
" (dt.date(2022, 7, 6), dt.date(2022, 7, 10), 1), \n",
" (dt.date(2022, 7, 29), dt.date(2022, 7, 29), 1), \n",
" (dt.date(2022, 8, 1), dt.date(2022, 8, 3), 2), \n",
" (dt.date(2022, 11, 17), dt.date(2022, 11, 19), 1), \n",
" (dt.date(2022, 11, 21), dt.date(2022, 11, 25), 1), \n",
" (dt.date(2022, 11, 26), dt.date(2022, 12, 2), 1), \n",
" (dt.date(2022, 12, 2), dt.date(2022, 12, 3), 2), \n",
" (dt.date(2022, 12, 4), dt.date(2022, 12, 8), 2), \n",
" (dt.date(2022, 12, 12), dt.date(2022, 12, 17), 2), \n",
" (dt.date(2022, 12, 21), dt.date(2022, 12, 23), 2), \n",
" (dt.date(2022, 12, 23), dt.date(2022, 12, 25), 1), \n",
" ],\n",
" columns=[\"since\", \"until\", \"n_hosts\"],\n",
")\n",
"hosts.to_csv(\"hosts.csv\", index=False)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"date\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">02</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">18</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2.0</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">02</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">26</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.0</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">03</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">23</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.5</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">07</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">29</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.5</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">11</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">29</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.0</span>\n",
"Name: n_hosts, dtype: float64\n",
"</pre>\n"
],
"text/plain": [
"\n",
"date\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m02\u001b[0m-\u001b[1;36m18\u001b[0m \u001b[1;36m2.0\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m02\u001b[0m-\u001b[1;36m26\u001b[0m \u001b[1;36m1.0\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m03\u001b[0m-\u001b[1;36m23\u001b[0m \u001b[1;36m0.5\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m07\u001b[0m-\u001b[1;36m29\u001b[0m \u001b[1;36m0.5\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m11\u001b[0m-\u001b[1;36m29\u001b[0m \u001b[1;36m1.0\u001b[0m\n",
"Name: n_hosts, dtype: float64\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"host_dates = pd.concat([\n",
" pd.DataFrame({\n",
" \"date\": (dates := pd.date_range(stay[\"since\"], stay[\"until\"], inclusive=\"both\", freq=\"D\")),\n",
" # We count only 50% for start and end days, because people don't spend the whole day there\n",
" \"n_hosts\": (\n",
" [stay[\"n_hosts\"] / 2] +\n",
" [stay[\"n_hosts\"]] * (len(dates) - 2) +\n",
" ([stay[\"n_hosts\"] / 2] if len(dates) > 1 else [])\n",
" )\n",
" })\n",
" for stay in hosts.to_dict(orient=\"records\")\n",
"])\n",
"host_dates = host_dates.groupby(\"date\")[\"n_hosts\"].sum()\n",
"host_dates.sample(5)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### kW/h"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>kilowatt-hour</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2022-09-05</th>\n",
" <td>2.749</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-01-18</th>\n",
" <td>26.668</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-10-10</th>\n",
" <td>4.343</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-07-14</th>\n",
" <td>3.217</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-07-09</th>\n",
" <td>3.281</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import glob\n",
"import pandas as pd\n",
"\n",
"kwh = (\n",
" pd.concat([\n",
" pd.read_csv(p, parse_dates=[\"Date\"], dayfirst=True)\n",
" for p in glob.glob(\"data/historique*.csv\")\n",
" ])\n",
" .rename(columns={\"Date\": \"date\", \"Conso en kWh\": \"kilowatt-hour\"})\n",
" .set_index(\"date\")\n",
")\n",
"kwh.sample(5)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Guests"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Confirmation code</th>\n",
" <th>Status</th>\n",
" <th># of guests</th>\n",
" <th>Start date</th>\n",
" <th>End date</th>\n",
" <th># of nights</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>HMM29JP58P</td>\n",
" <td>Past guest</td>\n",
" <td>1</td>\n",
" <td>2022-01-02</td>\n",
" <td>2022-01-12</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>HM34P8THWE</td>\n",
" <td>Past guest</td>\n",
" <td>1</td>\n",
" <td>2022-01-13</td>\n",
" <td>2022-02-12</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>HMMJH8NSHQ</td>\n",
" <td>Past guest</td>\n",
" <td>1</td>\n",
" <td>2022-02-19</td>\n",
" <td>2022-02-25</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>HM93S8EPCM</td>\n",
" <td>Past guest</td>\n",
" <td>1</td>\n",
" <td>2022-03-07</td>\n",
" <td>2022-03-17</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>HMHMMNSW25</td>\n",
" <td>Past guest</td>\n",
" <td>1</td>\n",
" <td>2022-04-07</td>\n",
" <td>2022-06-01</td>\n",
" <td>55</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"guests = pd.read_csv('data/guests.csv', parse_dates=[\"Start date\", \"End date\"])\n",
"(\n",
" pd.concat([\n",
" guests[[\"Start date\", \"End date\"]],\n",
" pd.DataFrame([\n",
" {\"Start date\": pd.Timestamp('2022-12-27 00:00:00'), \"End date\": pd.Timestamp('2022-12-31 00:00:00')},\n",
" ], index=[len(guests)])\n",
" ])\n",
" .rename(columns={\"Start date\": \"since\", \"End date\": \"until\"})\n",
" .to_csv('guests.csv')\n",
")\n",
"guests[\"# of guests\"] = guests[\"# of adults\"] + guests[\"# of children\"] + guests[\"# of infants\"]\n",
"guests[['Confirmation code', 'Status', '# of guests', 'Start date', 'End date', '# of nights']].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"date\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">06</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">08</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.0</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">08</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">11</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2.0</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">05</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">28</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.0</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">01</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">22</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.0</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">04</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">08</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1.0</span>\n",
"Name: n_guests, dtype: float64\n",
"</pre>\n"
],
"text/plain": [
"\n",
"date\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m06\u001b[0m-\u001b[1;36m08\u001b[0m \u001b[1;36m1.0\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m08\u001b[0m-\u001b[1;36m11\u001b[0m \u001b[1;36m2.0\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m05\u001b[0m-\u001b[1;36m28\u001b[0m \u001b[1;36m1.0\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m01\u001b[0m-\u001b[1;36m22\u001b[0m \u001b[1;36m1.0\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m04\u001b[0m-\u001b[1;36m08\u001b[0m \u001b[1;36m1.0\u001b[0m\n",
"Name: n_guests, dtype: float64\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"guest_dates = pd.concat([\n",
" pd.DataFrame({\n",
" \"date\": (dates := pd.date_range(stay[\"Start date\"], stay[\"End date\"], inclusive=\"both\", freq=\"D\")),\n",
" # We count only 50% for start and end days, because people don't spend the whole day there\n",
" \"n_guests\": (\n",
" [stay[\"# of guests\"] / 2] +\n",
" [stay[\"# of guests\"]] * (len(dates) - 2) +\n",
" ([stay[\"# of guests\"] / 2] if len(dates) > 1 else [])\n",
" )\n",
" })\n",
" for stay in guests.to_dict(orient=\"records\")\n",
"])\n",
"guest_dates = guest_dates.groupby(\"date\")[\"n_guests\"].sum()\n",
"\n",
"# Early guest who should have arrived in 2023\n",
"guest_dates.loc[pd.Timestamp('2022-12-27 00:00:00')] = 0.5\n",
"guest_dates.loc[pd.Timestamp('2022-12-28 00:00:00')] = 1\n",
"guest_dates.loc[pd.Timestamp('2022-12-29 00:00:00')] = 1\n",
"guest_dates.loc[pd.Timestamp('2022-12-30 00:00:00')] = 1\n",
"guest_dates.loc[pd.Timestamp('2022-12-31 00:00:00')] = 1\n",
"\n",
"guest_dates.sample(5)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weather"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
"date\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">08</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">09</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">27.7875</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">10</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">04</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">17.7750</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">11</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">11</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">14.5250</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">12</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">19</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">12.9625</span>\n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2022</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">01</span>-<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">10</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">7.8375</span>\n",
"Name: temperature, dtype: float64\n",
"</pre>\n"
],
"text/plain": [
"\n",
"date\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m08\u001b[0m-\u001b[1;36m09\u001b[0m \u001b[1;36m27.7875\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m10\u001b[0m-\u001b[1;36m04\u001b[0m \u001b[1;36m17.7750\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m11\u001b[0m-\u001b[1;36m11\u001b[0m \u001b[1;36m14.5250\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m12\u001b[0m-\u001b[1;36m19\u001b[0m \u001b[1;36m12.9625\u001b[0m\n",
"\u001b[1;36m2022\u001b[0m-\u001b[1;36m01\u001b[0m-\u001b[1;36m10\u001b[0m \u001b[1;36m7.8375\u001b[0m\n",
"Name: temperature, dtype: float64\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"weather = (\n",
" pd.read_csv(\n",
" 'data/donnees-synop-essentielles-omm.csv',\n",
" sep=';',\n",
" parse_dates=[\"Date\"],\n",
" date_parser=lambda x: pd.to_datetime(x, utc=True)\n",
" )\n",
" .rename(columns={'Date': 'datetime', \"Température (°C)\": 'temperature'})\n",
" .assign(date=lambda df: df['datetime'].dt.date)\n",
")\n",
"temperature = weather.groupby('date')['temperature'].mean()\n",
"temperature = temperature[temperature.index.map(lambda d: d.year == 2022)]\n",
"temperature.sample(5)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Join them all"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date</th>\n",
" <th>kilowatt-hour</th>\n",
" <th>n_hosts</th>\n",
" <th>n_guests</th>\n",
" <th>temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2022-01-01</td>\n",
" <td>2.171</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>7.8750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2022-01-02</td>\n",
" <td>10.310</td>\n",
" <td>0.0</td>\n",
" <td>0.5</td>\n",
" <td>8.7875</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2022-01-03</td>\n",
" <td>16.107</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>8.3500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2022-01-04</td>\n",
" <td>16.563</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>9.2500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2022-01-05</td>\n",
" <td>17.098</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>5.7000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset = (\n",
" kwh\n",
" .join(host_dates)\n",
" .join(guest_dates)\n",
" .join(temperature)\n",
")\n",
"dataset = dataset.sort_index().reset_index()\n",
"dataset.n_hosts = dataset.n_hosts.fillna(0).astype(float)\n",
"dataset.n_guests = dataset.n_guests.fillna(0).astype(float)\n",
"dataset.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">365</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;36m365\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"len(dataset)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"dataset.to_csv('dataset.csv', index=False)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"| date | kilowatt-hour | n_hosts | n_guests | temperature |\n",
"|:--------------------|----------------:|----------:|-----------:|--------------:|\n",
"| 2022-01-01 00:00:00 | 2.171 | 0 | 0 | 7.875 |\n",
"| 2022-01-02 00:00:00 | 10.31 | 0 | 0.5 | 8.7875 |\n",
"| 2022-01-03 00:00:00 | 16.107 | 0 | 1 | 8.35 |\n",
"| 2022-01-04 00:00:00 | 16.563 | 0 | 1 | 9.25 |\n",
"| 2022-01-05 00:00:00 | 17.098 | 0 | 1 | 5.7 |\n"
]
}
],
"source": [
"print(dataset.head().to_markdown(index=False))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Regression analysis"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"hourly_temperatures = (\n",
" weather\n",
" .sort_values('datetime')\n",
" .set_index('datetime')\n",
" ['temperature']\n",
" .resample('1H')\n",
" .mean()\n",
" .interpolate()\n",
")\n",
"reference_temperature = 18\n",
"heating_degree_days = (\n",
" hourly_temperatures\n",
" .apply(lambda t: max(0, reference_temperature - t))\n",
" .to_frame()\n",
" .assign(date=lambda df: df.index.date)\n",
" .groupby('date')\n",
" .sum()\n",
" .rename(columns={'temperature': 'heating_degree_days'})\n",
")\n",
"cooling_degree_days = (\n",
" hourly_temperatures\n",
" .apply(lambda t: max(0, t - reference_temperature))\n",
" .to_frame()\n",
" .assign(date=lambda df: df.index.date)\n",
" .groupby('date')\n",
" .sum()\n",
" .rename(columns={'temperature': 'cooling_degree_days'})\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>is_weekend</th>\n",
" <th>n_people</th>\n",
" <th>is_guest</th>\n",
" <th>heating_degree_days</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2022-01-01</th>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>244.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-01-02</th>\n",
" <td>1.0</td>\n",
" <td>0.5</td>\n",
" <td>1.0</td>\n",
" <td>219.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-01-03</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>231.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-01-04</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>208.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-01-05</th>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>301.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = (\n",
" dataset\n",
" .set_index('date')\n",
" .join(heating_degree_days)\n",
" .join(cooling_degree_days)\n",
" .reset_index()\n",
")\n",
"X['is_weekend'] = X.date.dt.dayofweek > 4\n",
"X['n_people'] = X.n_hosts + X.n_guests\n",
"X['is_guest'] = X.n_guests > 0\n",
"X = X.set_index('date')[[\n",
" 'is_weekend',\n",
" 'n_people',\n",
" 'is_guest',\n",
" 'heating_degree_days'\n",
"]].astype(float)\n",
"y = dataset.set_index('date')['kilowatt-hour']\n",
"X.head()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results \n",
"=======================================================================================\n",
"Dep. Variable: kilowatt-hour R-squared (uncentered): 0.844\n",
"Model: OLS Adj. R-squared (uncentered): 0.843\n",
"Method: Least Squares F-statistic: 489.2\n",
"Date: Tue, 17 Jan 2023 Prob (F-statistic): 2.61e-144\n",
"Time: 17:46:00 Log-Likelihood: -1011.5\n",
"No. Observations: 365 AIC: 2031.\n",
"Df Residuals: 361 BIC: 2047.\n",
"Df Model: 4 \n",
"Covariance Type: nonrobust \n",
"=======================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"---------------------------------------------------------------------------------------\n",
"is_weekend -1.5339 0.435 -3.527 0.000 -2.389 -0.679\n",
"n_people 2.4652 0.331 7.455 0.000 1.815 3.116\n",
"is_guest 1.7145 0.451 3.798 0.000 0.827 2.602\n",
"heating_degree_days 0.0405 0.001 27.288 0.000 0.038 0.043\n",
"==============================================================================\n",
"Omnibus: 6.854 Durbin-Watson: 0.729\n",
"Prob(Omnibus): 0.032 Jarque-Bera (JB): 9.520\n",
"Skew: 0.120 Prob(JB): 0.00856\n",
"Kurtosis: 3.754 Cond. No. 425.\n",
"==============================================================================\n",
"\n",
"Notes:\n",
"[1] R² is computed without centering (uncentered) since the model does not contain a constant.\n",
"[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
]
}
],
"source": [
"import statsmodels.api as sm\n",
"\n",
"mod = sm.OLS(y, X)\n",
"res = mod.fit()\n",
"print(res.summary())"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0.972</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;36m0.972\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"24 * 0.0405"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.0 64-bit ('3.11.0')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "55fbbcf542e06cc59ad76a1e0d5dc36ee204d6d2b704491656ee6b3487310122"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment