Mock interview scripts and rubrics for the Data Science program. These scripts are intended for interviewers only; please do not share this link.

Mock Interview 1: Experimentation

After completing Unit 1, students will take an interview focused on experimentation. The goal of this interview is to evaluate problem solving in the context of experimentation. Students should be able to design a proper A/B test and discuss how they would evaluate it.

We use a generic coffee shop as the setup for this problem. The general form of the interview is as follows, though be prepared to adapt to the path the student chooses to pursue.

Afterwards, write up comments and ways to improve and submit them via the Typeform linked in your dashboard.

Introduce yourself first! Ask if they have any questions before you begin, and then start the interview.

Introductory question:

You run a coffee shop and you're looking to increase business. The idea you come up with is to change the sign in front of your store. However, you're not just some typical coffee shop owner: you want to be data driven. How would you use data to evaluate the new sign's effect on your business?

Stages of the Answer:

Let students answer the question as they see fit, and do not create too much scaffolding for them right at the start. Part of the task is for them to build that outline and then fill it in with the details.

However, here is a rough outline of what is typically present in a good answer, along with some potential follow-up questions. Following the flow of how experimentation actually works is key to developing a logical plan and outline through the interview, so too much deviation from this order may cause confusion (meaning, don't start with evaluating a t-test and then outline a metric to track). Throughout the interview, ask why, and ask them to explain any tools or techniques they want to use.

  • What should we do?

    • Set up an experiment or A/B test

      • Why?
    • A period with sign A, then a period with sign B

      • Make sure that the only difference is the sign, making for a good controlled experiment
  • How long do you run the experiment?

    • Most suggest a week or a month

    • Why?

      • Control for seasonality
      • Sample size
    • If people don't include an A test period (wanting to use old data instead), would they use all the old data or just a time period similar to the test period?

      • What happens if they compare against years of data?
    • Can you be statistically rigorous here?

  • What are the metrics you’d want to look at?

    • Usually sales or number of customers

      • They have to make sure they’re not just calculating a single number
    • Would you look at any other metrics?

      • Why look at multiple metrics?
      • Other metrics students could mention:
        • Visitors
        • Sales per customer
  • Evaluate your experiment

    • How would you do it?

      • Usually t-test

        • Why?
      • Walk through interpretation

        • The p-value should get defined here; if they don't define it, ask. (A minimal t-test sketch appears after this outline.)
    • What if the results are significant? What do you do next?

    • What if they’re not?

      • The idea that it may take some time for an effect to be seen is a good thing for students to bring up, and it also makes a decent follow-up.
  • Extensions - Assuming the student gets through these problems and outlines a robust experiment, here are potential extensions or follow-up questions.

    • How would it change if it were a national chain of coffee shops?
    • How is this experiment different if the change is to price rather than the sign?
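
For the interviewer's reference, here is a minimal sketch of the evaluation a student might describe, assuming daily sales totals were collected during each sign period. All numbers and names below are hypothetical, not part of the script:

```python
from scipy import stats

# Hypothetical daily sales totals from the sign A and sign B periods.
sales_sign_a = [412.50, 398.00, 455.25, 430.10, 401.75, 420.00, 389.90]
sales_sign_b = [455.00, 478.25, 431.50, 490.00, 467.75, 445.10, 482.60]

# Welch's t-test: is the difference in mean daily sales bigger than chance alone would suggest?
t_stat, p_value = stats.ttest_ind(sales_sign_a, sales_sign_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# A small p-value (say, below 0.05) means a difference this large would be unlikely under
# the null hypothesis of equal means; it does not by itself prove the sign caused the change.
```
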
| Objective | Rate 1 | Rate 3 | Rate 5 |
| --- | --- | --- | --- |
| **Features** | | | |
| Identifying Features | The student suggests nonsense features | The student suggests one or two features, and does not focus on usage behaviors | The student suggests several features across behavioral and demographic categories, outlining strong logic for each |
| Engineering Features | The student does not want to engineer new features | The student engineers one feature with questionable efficacy | The student engineers many features that creatively attend to the concern of churn |
| **Model Selection** | | | |
| Which Model | The student proposes either only a single model or a non-modeling approach | The student presents a few models but may fail to detail their advantages | The student suggests several models with a thorough explanation of their strengths and weaknesses when pressed |
| Random Forest | The student either incorrectly or incompletely explains random forest | The student explains what random forest is without noting legitimate advantages or disadvantages of the technique | The student presents a thorough explanation of random forest, noting features like parallelizability |
| Weaker Models | The student either does not know or does not discredit weaker models like Naive Bayes or KNN in this context | The student suggests these models will not work well but does not explain why | The student acknowledges and explains these models' weakness at incorporating complex trends observed in data |
| Evaluation | The student does not correctly evaluate their model | The student indicates some form of correct validation, but it is narrow and/or incomplete (for example, suggesting one metric for evaluation but not mentioning cross validation or explaining that metric) | The student provides a thorough and varied evaluation strategy for their model and puts models in context with each other, weighing various kinds of costs and benefits |
| Small Sample Size | When offered a dataset with a small sample size, the student proposes the same approach | The student proposes some changes to modeling structure but struggles with the reasons why something like neural networks won't work with small samples | The student accurately acknowledges the advantages of certain models in small sample sizes |
| New Product | The student does not adjust their approach | The student proposes an analytic or modeling solution that does not fully utilize the information that is available | The student elegantly combines analytics and modeling to leverage the small size of the dataset but also the potential value of machine learning |
| **Other** | | | |
| K-Means | The student does not know what k-means is beyond basic facts like it "does grouping" | The student talks about k-means as unsupervised and mentions finding clusters, but does not go into how those clusters are found or defined | The student describes k-means clearly, mentioning key features like convergence, distance to centroid, and the k centroids |
| K-Means Failure | The student says k-means cannot fail | The student works through thinking about differently shaped clusters but cannot see a failure rule | The student sees linear separability as key to k-means's success |
| Supervised vs. Unsupervised | The student does not know what these things are | An "outcome vs. no outcome" kind of explanation | The student understands and expresses the predictive, observed nature of supervised learning vs. the associative nature of much of unsupervised learning |
| Bias-Variance Tradeoff | The student does not know what these things are | Some mention of what bias and variance are, but not how they relate | Explains a clear relationship, including how model complexity trades bias against variance |

Mock Interview 2: Modeling

After completing Units 2-4, students will take an interview focused on modeling. The goal of this interview is to evaluate problem solving in the context of supervised and unsupervised learning. Students should be able to propose a proper model for a given problem and discuss how they would evaluate it.

We use Twitter's user problem for this interview. The general form of the interview is as follows, though be prepared to adapt to the path the student chooses to pursue.

Afterwards, write up comments and ways to improve and submit them via the Typeform linked in your dashboard.

Introduce yourself first! Ask if they have any questions before you begin, and then start the interview.

Intro Questions:

  • What's the difference between supervised and unsupervised learning?
  • Can you explain the bias-variance tradeoff? (See the sketch below.)
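
If it helps to calibrate answers, a strong response ties bias and variance to model complexity. Here is a minimal illustration on synthetic data (everything in it is hypothetical and for the interviewer's reference only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy nonlinear data: degree 1 underfits (high bias), degree 15 overfits (high variance).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 80)).reshape(-1, 1)
y = np.sin(2 * X).ravel() + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:>2}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```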

Case Study: Twitter has a user problem. Specifically, we’re starting to lose users faster than we’re adding them. How can you help us solve this problem?

First approach: billions of rows of data on users and thousands of columns: how many people they follow, how often they visit, whether they have the mobile app, and so on. Basically everything you'd want; if we don't track it, imagine that we could.

How would you approach building a model to help with our user problem?

Key questions

  1. What should we predict?

    1. Break it into two problems first.

      1. User churn
      2. User growth
    2. User Churn is the real problem here (push against the marketing problem, which is more complex and really needs time series work…)

  2. How do we predict user churn?

    1. Feature engineering

      1. Some logic around interesting and valuable features…
        1. Last sign on
        2. Frequency of use
    2. How do you handle the volume of data?

      1. Select columns via intuition, correlation
      2. PCA for feature reduction
      3. Can try subsampling
    3. What kinds of models would you want to try?

      1. RF parallelizes well and is fairly robust to overfitting
      2. Boosted models will be slow but can perform well
      3. If you value explanatory power can try a linear model
        1. That comes with much more concern about feature selection…
        2. Note that feature selection should prioritize variables we may be able to influence in some way
    4. Explain random forest (see the pipeline sketch after this outline)

    5. How do you evaluate?

      1. Cross Validation
      2. Validation sample
      3. Test in wild
    6. What if it works?

      1. This is really about taking something from being a model and driving some kind of useful outcomes from it. It’s great that we’ve modeled churn, but it doesn’t matter if we just stop there.
      2. Things like
        1. Finding influenceable variables and affecting them
        2. Targeting different marketing
    7. What if it doesn’t?

  3. What if we wanted to do it on signup for a single market that was experiencing distress?

    1. This is a much smaller dataset

      1. Thousands of rows
      2. A dozen features
    2. So, the key question then is how do you deal with a smaller dataset?

      1. Boosting or NN is likely a challenge…
      2. No need to do dimensionality reduction
      3. Linear models, possibly still big enough for RF
      4. There is also definite value in simple analytics…
        1. How is this market different?
        2. Comparative tests, etc
  4. What if instead we’re talking about a new product and signups just aren’t where we’d expected…

    1. This is almost certainly just an analytics project.
      1. Look at history of other product launches
      2. Competitive analysis
      3. Really just look for reasonable responses here
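
As a reference point for strong answers, here is a minimal sketch of the kind of pipeline the outline above describes: some dimensionality reduction, a random forest, and cross-validated evaluation on a metric beyond raw accuracy. The data is synthetic and the specific choices are illustrative, not a prescribed solution:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for user features (last sign-on, frequency of use, follower count, ...).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),       # one way to tame very wide data
    ("rf", RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)),
])

# Cross-validated ROC AUC rather than a single accuracy number on imbalanced churn labels.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("cross-validated ROC AUC:", round(scores.mean(), 3))
```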

Wrap up questions:

  • Can you describe K-Means clustering?
  • When does it fail?
    • (When the clusters are not linearly separable; think of a bullseye. See the sketch below.)
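
For reference, a minimal sketch of the bullseye failure case on synthetic data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Two concentric rings: k-means draws a straight boundary, so it splits each ring in half.
X, true_labels = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Agreement with the true ring labels is near zero (roughly chance level).
print("adjusted Rand score:", round(adjusted_rand_score(true_labels, pred), 2))
```
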
| Objective | Rate 1 | Rate 3 | Rate 5 |
| --- | --- | --- | --- |
| **Communication** | | | |
| Clarity | The student is difficult to follow, telling stories that don't seem to make sense or otherwise struggling to communicate | The student tells stories but occasionally drops details or makes leaps without filling in the necessary details | The student tells clear stories that are engaging and relevant while being natural to follow |
| Pacing | The student speaks either far too quickly or too slowly | The student speaks at a reasonable pace, but it has some difficulties or frustrations | The student keeps an engaged pace and ensures that the listener is following by asking when appropriate |
| Focus | The student does not stay on topic or answer questions | The student occasionally gets lost in details or rabbit holes but generally stays on topic | The student uses every opportunity to further their case as a data science applicant |
| **Narrative** | | | |
| Data Science Narrative | The student says they're a data scientist because they took a bootcamp | The student acknowledges a previous interest in data science but doesn't sell it well | The student crafts a clear story of how their interest in data science has evolved but has longstanding roots |
| Discussion of Bootcamp | The student says a bootcamp made them a data scientist | The student mentions the bootcamp, but either overemphasizes it or undersells it | The student uses the bootcamp as a tool in their narrative and development but not the crux thereof |
| Discussion of Work History | The student doesn't acknowledge work before the bootcamp or speaks about it poorly | Some clear coverage of work history, but no great explanation of progress and development | The student uses past work experience to lay the groundwork for data science |
| **Case Example** | | | |
| Setup | The student does not explain the setup of their example project well | There is little explanation of why the project is important or interesting, but the setup is explained | It is clear what the project aimed to do and why it was done |
| Explanation | The student does not explain the project well | The student provides some explanation of the project, but it is unclear | The project is clearly presented and easily understood |
| Impact | The project had no impact | The project is technically useful but not impactful | The project is obviously shown to be impactful both in the student's development and as a product itself |

Mock Interview 3: Phone Screen

After completing Unit 5, students will take an interview that simulates a common phone screen. The goal of this interview is to evaluate the student’s preparedness for the job search. Students should be prepared to discuss their experience with data science and their interest in the field.

The general form of the interview is as follows, though be prepared to adapt to the path the student chooses to pursue.

Afterwards, write up comments and ways to improve and submit them via the Typeform linked in your dashboard.

Phone Screen Questions:

  • Give me a quick overview of what you’ve been working on for the past few years.

    • DON’T JUST LEAD WITH BOOTCAMP

      • This is a tool to advance you from somewhere
    • Do not be afraid to interrupt

  • How did you get into data science?

    • When did you know this was something you wanted to do?
    • How did you start?
    • What about it is interesting to you?
  • What’s your favorite project that you’ve ever made?

    • Walk me through it?
    • Inception
    • Implementation
    • Iteration
    • Impact
  • What kind of job are you looking for?

  • Where do you see yourself in 5 years?

  • What are your strengths? Weaknesses?

  • What’s the best job you ever had?

Looking for:

  • Why they are making this change

  • A narrative of their interest in data

    • Usually this is not a sudden change. The student should explain why it makes sense and why you should trust their commitment to it.
  • Clear discussion of past work

  • Focus on relevance of work and desire to expand skills

  • This should be a dialogue, not questions and monologues. Ask follow ups. Respond to their interesting comments.

| Objective | Rate 1 | Rate 3 | Rate 5 |
| --- | --- | --- | --- |
| **Product** | | | |
| Data Choice | The student chooses a dataset that is either trivially small or otherwise inappropriate for ML | The student picks a dataset that is in some way significantly flawed or incompatible with their desired model and does not appropriately navigate those challenges (e.g., data that comes from only a specific subset of the population or has some knowable bias). The data may also be too explicitly set up for only one problem. | The student picks a robust dataset, understands its provenance, and accommodates any relevant outside information or assumptions |
| Product | The product is too simple or has no use case | The product is limited in its use and not easily generalized to a non-sterilized environment | The product easily translates into a variety of situations and has obvious and clear value |
| **Code** | | | |
| Python Essentials | The code has a structure to it, but is redundant and heavily reliant on bad practices like copy/pasting, or contains code that is no longer used | The code is good, but not very efficient, or the logic is somewhat broken | The student writes good, clean, coherent code |
| PEP 8 | The student is living in the wild west of code style | Maybe there is a bad line or two, but not much; the code looks OK but not awe-inspiring | The code is fully PEP 8 compliant |
| Data Science Toolkit | The student uses some data science tools, but they frequently aren't the right ones | The student uses some data science tools, but occasionally reverts to other structures in Python unnecessarily or doesn't always use the best tool for the job | The student uses the data science toolkit and creates easy-to-understand data structures like well-labeled pandas dataframes rather than matrices |
| **Machine Learning** | | | |
| Model Training/Tuning | The model is trained on test data or untuned | The model has a simple training structure but no real tuning, and no robust pipeline is created incorporating unsupervised techniques | The model cleanly includes a solid pipeline and incorporates the necessary techniques for broad robustness |
| Model Selection | Only one model is tried | Models are compared but not robustly | The model comparison incorporates the training toolset robustly and clearly |
| Model Validation | Evaluation is a one-off, on a single metric that is just the 'score' attribute in scikit-learn | The model is validated using cross validation, but it doesn't consider whether the score metric is the right metric, or it relies on only one metric | The model is evaluated robustly and considers the right metrics that will matter for the problem as defined |
| **Presentation** | | | |
| Clarity | The presentation is muddled | The presentation has some missing information or provokes some questions it doesn't answer, but provides some structure for the student to walk through; it may be text heavy | The presentation is easy to walk through and sets up a talk that invites and then answers the audience's questions |
| Visuals | The visuals are ugly or lacking | There are some visuals, but they are not the best representations of the data or they are not well formatted | Visuals are crisp, clean, and hugely compelling; they could exist on their own and be great |

Final Interview

This script covers the final interview for the course. The interview and presentation should be evaluated together.

Because this is a project-based interview, questions should be tailored to the project itself, but a rough guideline is below.

Afterwards, write up comments and ways to improve and submit them via the Typeform linked in your dashboard.

  • What did you build?

    • Why did you choose this?
    • How did you know it would be doable?
    • What research did you do before you started trying to make it?
  • How is this going to help people?

    • Who is your user?
    • How will they use the thing?
  • How was this technically challenging?

    • What was the biggest difficulty you had? How did you overcome it?
  • Technical dive (a minimal combined-pipeline sketch follows these questions)

    • How did you use unsupervised learning?

      • Why did you use it like that?
      • Describe the technique.
    • How did you select your supervised model component?

      • What was your evaluation metric?
      • Why was that the right one? Did you consider any others?
  • How do you know this is good enough?

    • What are the weaknesses of the model as it stands?
  • What would you do if you had more time?

    • What expertise did you not have that you wish you did?
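
Because capstone projects vary, there is no single reference answer here, but one minimal sketch of a pipeline where an unsupervised step feeds a supervised classifier may help frame the technical dive. The data is synthetic and the design is illustrative only:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a capstone dataset.
X, y = make_classification(n_samples=1000, n_features=25, n_informative=6, random_state=1)

# The unsupervised step: KMeans.transform() turns each row into distances to the
# cluster centroids, which then become the features for the supervised classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clusters", KMeans(n_clusters=8, n_init=10, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

print("cross-validated F1:", round(cross_val_score(model, X, y, cv=5, scoring="f1").mean(), 3))
```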

Data Analysis Report

Prompting questions for call

The goal of this call is to assess whether the student will be successful in the full data science bootcamp.

Indicators of success

  • Statistical literacy
  • Grasp of programming fundamentals

The questions below are suggestions to help you dig into the student’s work. You may choose to skip some of these questions or ask others that are not listed here. The student’s report should guide the discussion. If you see obvious issues, question them. Keep in mind that you’re trying to gauge the student’s understanding of basic statistics and programming. Be empathetic to the fact that they’re just starting their journey in data science.

Intro

Greet the student

Have the student pull up their project and share their screen

Explain the agenda of the call: “You’re going to walk me through your report, and along the way, I’ll stop you to ask questions. To start off, explain the dataset you chose.”

Dataset

Student should explain what’s in their dataset, where it comes from, and why it’s interesting/significant. If the student skips any of this information, prompt them to explain.

Further prompting:

  • Did you have any challenges with this data?
  • Was there any missing information or anything you had to drop? (See the quick check after this list.)
  • Why did you choose this dataset?
  • How might this dataset be biased?
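
If the discussion of missing data stays vague, a quick check like this can ground it (the file name is hypothetical):

```python
import pandas as pd

df = pd.read_csv("their_dataset.csv")  # hypothetical path to the student's file
print(df.isna().sum())                 # missing values per column
print(df.duplicated().sum())           # fully duplicated rows
```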

Questions

Student should have three complex analytic questions that are broken down or presented from different perspectives.

Further prompting:

  • How did your dataset inform the questions you chose to explore?
  • Did your questions change at all while working on this project? If so, how did they change?
  • Why did you choose these questions? Looking back on your work, do you want to rephrase or reword any of your questions?
  • You’ve asked a question with a yes or no answer. How could you dig into this topic more? What other questions arise from this one?
  • Is this question answerable? Can you prove your answer?
  • Is the data you have the best data to answer this question? What’s missing?

Code

Student should write clear, coherent code and use the data science toolkit to analyze their dataset.

Further prompting:

  • What steps did you take to answer this question?
  • What issues did you run into while analyzing your data?
  • What tools did you use to help you analyze your data?
  • If you see any obvious errors in their code, point them out. Ask the student how they could fix the error.
  • Point to obvious PEP 8 nonconformance and ask them, "This code works, but do you see any issues with it?" (A small before/after example follows.)
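
If you want a concrete snippet to point at, something like this hypothetical example works; both functions run, but the first breaks PEP 8 naming and spacing conventions:

```python
# Runs fine, but violates PEP 8: CamelCase function name, one-letter argument, odd spacing.
def AvgPrice( l ):
    return sum( l )/len( l )

# The same logic, cleaned up.
def average_price(prices):
    return sum(prices) / len(prices)
```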

Analysis

Student should use summary statistics, statistical tests, and clear visualizations to present their conclusions in a way that's easy to understand. (A minimal example of this pairing follows the prompts below.)

  • Did the conclusions from your data analysis surprise you or did they confirm your expectations? Why do you think that is?
  • Imagine someone sees this chart/graph/visualization out in the wild, separated from your report. What conclusions would you expect them to draw? Is that the conclusion that you want them to draw?
  • Why are these conclusions significant?
  • What further research would you propose for this dataset? What technologies or concepts would you need to learn in order to conduct that research?
  • How could you make your conclusions more rigorous?
  • Could someone look at the work you’ve done and come to a different conclusion?
  • What does this visualization/analysis mean? How else could you show the same results?
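
For reference, the summary-plus-test-plus-visual pairing to look for can be as simple as this sketch (the file, column, and group names are hypothetical):

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("student_dataset.csv")          # hypothetical file and columns
print(df.groupby("group")["value"].describe())   # summary stats per group

# Pair the summary with a test and a matching visual rather than a bare number.
a = df.loc[df["group"] == "A", "value"]
b = df.loc[df["group"] == "B", "value"]
print(stats.ttest_ind(a, b, equal_var=False))
df.boxplot(column="value", by="group")
```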

Rubric

| Objective | Rate 1 | Rate 3 | Rate 5 |
| --- | --- | --- | --- |
| **Data/Questions** | | | |
| Dataset Choice | The student chooses a dataset that is either trivially small or otherwise inappropriate for analysis | The student picks a dataset that is in some way significantly flawed or incompatible with their desired analysis and does not appropriately navigate those challenges (e.g., data that comes from only a specific subset of the population or has some knowable bias) | The student picks a robust dataset, understands its provenance, and accommodates any relevant outside information or assumptions |
| Questions | The student asks overly simple questions that are answerable in single lines of code | The student approaches the questions with multiple steps, but presents only a single perspective or is disjointed in the approach | The student chooses complex questions and then either breaks them down into multiple subquestions or presents different ways of reaching a conclusion and evaluates their merits; the questions also build on each other, leading to robust and engaging conclusions |
| General Clarity/Structure | The student's report is unstructured or difficult to read | The student provides some structure, but the report still contains moments where it is easy to lose the narrative or flow of questions | The report is easy to read and uses appropriate markdown to give it a nice presentation and flow |
| **Code** | | | |
| Python Essentials | The student is writing unorganized, unintelligible code | The code has a structure to it, but is redundant and heavily reliant on bad practices like copy/pasting, or contains code that is no longer used | The student writes good, clean, coherent code |
| PEP 8 | The student is living in the wild west of code style | There are some critical errors in PEP 8 styling, like inappropriate spacing or bad variable names | The code is fully or almost fully PEP 8 compliant |
| Data Science Toolkit | The student does some work outside of Python or does not use data science tools where appropriate | The student uses some data science tools, but occasionally reverts to other structures in Python | The student uses the data science toolkit and creates easy-to-understand data structures like well-labeled pandas dataframes rather than matrices |
| **Analysis** | | | |
| Visualizations - Visual Elements | Plots are unlabeled and unreadable | Some plots may lack a few labels, but they are generally readable | Visuals are easily and independently readable, presenting robust conclusions that are easy to understand |
| Visualizations - Statistical Elements | The student relies on at most one type of visualization, even when others are more appropriate | The student tries to use multiple types of visuals, but occasionally picks an inappropriate visual for a given question | The student uses a wide variety of visualizations, with each visual presenting concise information in the best possible way |
| Summary and General Statistics | The student does not use summary statistics | The student computes some summary statistics to balance out visuals, but they are not always the most effective for their narrative | The student uses summary stats and statistical tests to complement their visuals in a clear and compelling way |

| Objective | Rate 1 | Rate 3 | Rate 5 |
| --- | --- | --- | --- |
| **Problem Statement** | | | |
| Data Choice | The student chooses a dataset that is either trivially small or otherwise inappropriate for ML | The student picks a dataset that is in some way significantly flawed or incompatible with their desired model and does not appropriately navigate those challenges (e.g., data that comes from only a specific subset of the population or has some knowable bias). The data may also be too explicitly set up for only one problem. | The student picks a robust dataset, understands its provenance, and accommodates any relevant outside information or assumptions |
| Questions | The student asks overly simple questions that are answerable in single lines of code | The student approaches the questions with multiple steps, but presents only a single perspective or is disjointed in the approach | The student approaches a complex ML problem, but frames it correctly within the tools they have developed in the course |
| **Code** | | | |
| Python Essentials | The code has a structure to it, but is redundant and heavily reliant on bad practices like copy/pasting, or contains code that is no longer used | The code is good, but not very efficient, or the logic is somewhat broken | The student writes good, clean, coherent code |
| PEP 8 | The student is living in the wild west of code style | Maybe there is a bad line or two, but not much; the code looks OK but not awe-inspiring | The code is fully PEP 8 compliant |
| Data Science Toolkit | The student uses some data science tools, but they frequently aren't the right ones | The student uses some data science tools, but occasionally reverts to other structures in Python unnecessarily or doesn't always use the best tool for the job | The student uses the data science toolkit and creates easy-to-understand data structures like well-labeled pandas dataframes rather than matrices |
| **Machine Learning** | | | |
| Model Training/Tuning | The model is trained on test data or untuned | The model has a simple training structure but no real tuning | The model is tuned and trained using tools like grid search and cross validation, and those tools are explained |
| Model Selection | Only one model is tried | Models are compared but not robustly | The model comparison incorporates the training toolset robustly and clearly |
| Model Validation | Evaluation is a one-off, on a single metric that is just the 'score' attribute in scikit-learn | The model is validated using cross validation, but it doesn't consider whether the score metric is the right metric, or it relies on only one metric | The model is evaluated robustly and considers the right metrics that will matter for the problem as defined |
| **Presentation** | | | |
| Clarity | The presentation is muddled | The presentation has some missing information or provokes some questions it doesn't answer, but provides some structure for the student to walk through; it may be text heavy | The presentation is easy to walk through and sets up a talk that invites and then answers the audience's questions |
| Visuals | The visuals are ugly or lacking | There are some visuals, but they are not the best representations of the data or they are not well formatted | Visuals are crisp, clean, and hugely compelling; they could exist on their own and be great |