Charlie Sheets czsheets

@czsheets
czsheets / blog.md
Last active September 9, 2018 09:38
RShiny simulation to estimate the chances of job offer(s)

How many offers will I get: an interactive simulation using RStudio's Shiny application

Data science is a rapidly growing field, and its programs often work to give students exposure to leading companies in marketing, banking, consulting, research, technology, insurance, and many other areas with a need for analytic services. Students are encouraged to apply to a wide array of companies in order to develop relationships with people in the field, get an understanding of different areas of analytics, and increase the chances of getting one or (hopefully) more offers.

During the interview process, analytics students are always running the numbers (consciously or not) and calculating the likelihood of an offer. Where should I put my energy? How many interviews are too many? Too few? I wrote an interactive tool that lets applicants estimate their own numbers. It includes the following variables a

@czsheets
czsheets / 538_Riddler_8_24_18.md
Last active August 27, 2018 17:40
Solution to 538 Riddler 8-24-18

538 Classic Riddler for 8-24-2018

The code included is a simulated answer to a Riddler Classic challenge from 538:

Let’s call this game rock-paper-scissors-hop. Here is an idealized list of its rules:

  • Kids stand at either end of N hoops.
  • At the start of the game, one kid from each end starts hopping at a speed of one hoop per second until they run into each other, either in adjacent hoops or in the same hoop.
  • At that point, they play rock-paper-scissors at a rate of one game per second until one of the kids wins.
  • The loser goes back to their end of the hoops, a new kid immediately steps up at that end, and the winner and the new player hop until they run into each other.
  • This process continues until someone reaches the opposing end. That player’s team wins!
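One building block of these rules is easy to check on its own: how many seconds a rock-paper-scissors showdown lasts. The sketch below is not the gist's code. It assumes both kids throw uniformly at random (an assumption not stated in the rules), so each game is decisive with probability 2/3 and the expected number of games is 1.5. The function name `rps_rounds` is made up for illustration.

```r
set.seed(1)

# Play rock-paper-scissors until the two throws differ.
# Returns the number of games (seconds) the showdown took.
# Assumption: both players choose uniformly at random.
rps_rounds <- function() {
  rounds <- 0
  repeat {
    rounds <- rounds + 1
    throws <- sample(c("rock", "paper", "scissors"), 2, replace = TRUE)
    if (throws[1] != throws[2]) return(rounds)  # a decisive game ends the showdown
  }
}

rounds <- replicate(10000, rps_rounds())
mean(rounds)  # should land close to the theoretical mean 1 / (2/3) = 1.5
```

A full simulation of the game would wrap this showdown in a loop that also tracks both players' hoop positions.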
@czsheets
czsheets / 538_card_game.md
Last active October 9, 2018 01:21
Can you make it to the end of the deck?

Take a standard deck of cards, and pull out the numbered cards from one suit (the cards 2 through 10). Shuffle them, and then lay them face down in a row. Flip over the first card. Now guess whether the next card in the row is bigger or smaller. If you’re right, keep going. If you play this game optimally, what’s the probability that you can get to the end without making any mistakes? Extra credit: What if there were more cards — 2 through 20, or 2 through 100? How do your chances of getting to the end change?

The R code is here. It starts with a function that determines success (0 or 1) in a single trial with the 9 cards (2 through 10); the estimated probability of success comes out to ~0.17.

The function is then used to simulate repetitions of the 9-card game, and can be extended to more cards or repetitions. The probability of success rapidly drops to near zero with ~35 cards:
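A minimal sketch of this kind of simulation follows; it is not the gist's own code. It assumes the optimal strategy is to guess whichever direction has more unseen cards, which holds because a correct guess leads to the same next state either way. The names `play_once` and `deck_sizes` are made up for illustration.

```r
set.seed(42)

# One trial: shuffle the cards, then walk the row guessing "bigger" or
# "smaller" according to which direction has more unseen cards.
# Returns 1 if every guess is right, 0 at the first mistake.
play_once <- function(cards = 2:10) {
  deck <- sample(cards)
  n <- length(deck)
  for (i in seq_len(n - 1)) {
    current <- deck[i]
    remaining <- deck[(i + 1):n]
    guess_bigger <- sum(remaining > current) >= sum(remaining < current)
    next_is_bigger <- deck[i + 1] > current
    if (guess_bigger != next_is_bigger) return(0)
  }
  1
}

# Estimated chance of reaching the end with the cards 2 through 10
p_nine <- mean(replicate(20000, play_once()))
p_nine

# Same estimate for larger decks (number of cards in each deck)
deck_sizes <- c(9, 19, 34)
sapply(deck_sizes, function(k) mean(replicate(5000, play_once(2:(k + 1)))))
```

Raising the number of replications tightens the estimates at the cost of run time.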

@czsheets
czsheets / HR_RBI_association.md
Last active September 9, 2018 11:23
Checking assumptions about baseball's triple crown statistics

One of my favorite baseball websites, baseballmusings.com, recently had a post about the chances of a player winning the triple crown (leading the league in home runs, RBIs and batting average).

The author provided calculations for these, and estimated the probability of achieving the triple crown by multiplying the individual probabilities together. This led me to assess the association between these statistics. There seemed to be a fairly strong association between leading the league in home runs and leading it in RBIs (both are generally signs of power hitters, who probably get lineup spots with opportunities to drive in runners), so perhaps they shouldn't be treated as independent. The analysis using RStudio and its interpretation are below.
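To see why multiplying probabilities can mislead when two statistics are associated, here is a small sketch on purely synthetic data (not the Lahman data used in the analysis). It assumes a hypothetical shared "power" trait that drives both home runs and RBIs; the league size, noise level, and variable names are all made up for illustration.

```r
set.seed(2018)

n_players <- 100   # hypothetical league size
n_seasons <- 5000  # simulated seasons

# For each synthetic season, check whether one player leads both categories.
same_leader <- replicate(n_seasons, {
  power <- rnorm(n_players)                  # shared latent trait (assumption)
  hr    <- power + rnorm(n_players, sd = 0.5)
  rbi   <- power + rnorm(n_players, sd = 0.5)
  which.max(hr) == which.max(rbi)
})

mean(same_leader)  # share of seasons where the HR leader also leads in RBIs
1 / n_players      # what that share would be if the two were independent
```

If the first number sits well above the second, multiplying the individual probabilities understates the chance of leading both categories.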

Using Lahman package for baseball data, sqldf for data manipulation

library(Lahman)
library(sqldf)
@czsheets
czsheets / Riddler_Express_9_7_18.ipynb
Last active September 10, 2018 23:53
I'd like to use my lifeline
@czsheets
czsheets / colored_die.md
Last active September 16, 2018 04:13
Rolling colored dice

This week's Riddler Express:

Abby and Beatrix are playing a game with two six-sided dice. Rather than having numbers on the sides like normal dice, however, the sides of these dice are either red or blue. In the game they're playing, Abby wins if the two dice land with the same color on top. Beatrix wins if the colors are not the same. One of the dice has five blue sides and one red side. If Abby and Beatrix have equal chances of winning the game, how many red and blue sides does the other die have?
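The reasoning can be sketched exactly before simulating anything. For each possible second die, the chance that both dice match is the chance of red-red plus the chance of blue-blue. The variable names below are made up for illustration.

```r
# Number of red sides on the second die: 0 through 6 (seven candidate dice)
red_sides <- 0:6

# First die: 1 red side, 5 blue sides.
# P(same color) = P(red, red) + P(blue, blue)
p_same <- (1 / 6) * (red_sides / 6) + (5 / 6) * ((6 - red_sides) / 6)

data.frame(red = red_sides, blue = 6 - red_sides, p_abby_wins = p_same)

# The die that gives Abby and Beatrix equal chances
fair_red <- red_sides[abs(p_same - 0.5) < 1e-9]
fair_red  # 3, i.e. three red sides and three blue sides
```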

While this wouldn't be too difficult to reason out, it's also a pretty straightforward simulation, and a good excuse for a stacked barchart. We start by creating the seven dice needed to compare.

df = data.frame(id = 1:6)
for(i in 0:6){
  die <- c(rep('red',i),rep('blue',6-i))
@czsheets
czsheets / _win_ws_win_perc.md
Last active October 9, 2018 12:58
How many playoff games can you win without winning the World Series (and vice versa)?

From the Riddler Express:

The Major League Baseball playoffs are about to begin. Based on the current playoff format, what is the best possible winning percentage a team can have in the playoffs without winning the World Series? And what is the worst possible winning percentage a team can have in the playoffs and still win the World Series?

The current format is:

  • one play-in game for two non-division-winning wild card teams
  • best 3-out-of-5 division series
  • best 4-out-of-7 league championship series
  • best 4-out-of-7 World Series
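Under the format listed above, both extremes can be worked out by enumeration. The sketch below is not the gist's code. It assumes a team either starts in the wild card game or skips straight to the division series, and the helper names (`pct`, `best_loser`, `champion_floor`) are made up for illustration.

```r
# Wins needed to take each round, in playoff order
wins_needed <- c(wild_card = 1, division = 3, league = 4, world_series = 4)

pct <- function(w, l) w / (w + l)

# Best record for a team eliminated in each possible round:
# sweep every earlier round, then lose the last one by the narrowest margin.
best_loser <- function(start) {
  rounds <- wins_needed[start:4]
  sapply(seq_along(rounds), function(k) {
    w <- sum(rounds[seq_len(k - 1)]) + rounds[k] - 1
    l <- rounds[k]
    pct(w, l)
  })
}

# Worst record for a champion: take every series by the narrowest margin.
champion_floor <- function(start) {
  rounds <- wins_needed[start:4]
  pct(sum(rounds), sum(rounds - 1))
}

# start = 1 is a wild card team, start = 2 is a division winner
best_without_title <- max(best_loser(1), best_loser(2))
worst_with_title   <- min(champion_floor(1), champion_floor(2))

best_without_title  # 11 / 15: wild card team that sweeps, then loses the Series 3-4
worst_with_title    # 11 / 19: division winner that goes 3-2, 4-3, 4-3
```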
@czsheets
czsheets / gistlog.yml
Last active October 29, 2018 01:33
How do players move between the Yankees and Red Sox?
@czsheets
czsheets / _introduction.md
Created October 29, 2018 02:25
Introduction and post list