rebeccabilbro/top-data-science-questions.md

## top-data-science-questions.md

      
    Title

The top n questions data scientists ask
Introduction

Data science doesn’t start with data; it starts with a problem…
The pipeline model is useful, but data scientists progress via a series of questions. What are those questions?
Scoping

Questions data scientists ask to determine the project objective and scope

Requirements


Who is the client?
What is the desired output?
Is there a clear vision of what I need to do and why?
How is this going to be used?
Who is going to use this?
How much time do I have?
What is the quantitative question?
What does the literature say about this?
Is there an existing model, algorithm, or baseline?

Data Availability


What data is available?
Is this the right data to answer the question?

Methods

Questions data scientists ask to decide which methods and tools to use

Workflow


Does my analysis make sense?
Can it work?
Will it scale?
Can I explain it clearly?
Is it viable?
How does it fit into the current workflow?
Will my analysis actually answer the question?
Will it do what I want it to do?

Tools


What tools are available to me?
Do I need to supplement the data?
Is there other/similar/more data available somewhere else?
What kind of experiment can I do?

Interpretation

Questions data scientists ask to evaluate their results as they iterate

Reading Data


What is the size of the data?
What is the shape of the data?
Is it normalized?
Are things correlated?
How many features are there (feature discovery)?
How do I extract meaning from the features?
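
Several of the questions above (size, shape, normalization, correlation) can be answered with a few lines of code. The following is a minimal sketch using only the Python standard library; the toy dataset, the column names, and the helper functions `shape`, `is_normalized`, `min_max_scale`, and `pearson` are all hypothetical and exist only for illustration. "Normalized" is assumed here to mean min-max scaled to the range [0, 1], which is one of several possible definitions.

```python
import math

# Hypothetical toy dataset: each row is (height_cm, weight_kg, shoe_size).
names = ["height_cm", "weight_kg", "shoe_size"]
rows = [
    (150.0, 52.0, 36.0),
    (160.0, 58.0, 38.0),
    (170.0, 69.0, 41.0),
    (180.0, 77.0, 43.0),
    (190.0, 90.0, 46.0),
]


def shape(table):
    """Return (number of rows, number of columns)."""
    n_cols = len(table[0]) if table else 0
    return (len(table), n_cols)


def column(table, index):
    """Extract one column as a list."""
    return [row[index] for row in table]


def is_normalized(values):
    """Assumption: 'normalized' means every value lies in [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in values)


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


print("shape:", shape(rows))
for i, name in enumerate(names):
    print(name, "normalized?", is_normalized(column(rows, i)))

# Pairwise correlations between every pair of columns.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson(column(rows, i), column(rows, j))
        print(f"corr({names[i]}, {names[j]}) = {r:.3f}")
```

On a real project a data-frame library would likely replace these helpers, but the questions being asked stay the same.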

Visual Analytics


What does the data look like visually?
Are there clusters?
Are there outliers, anomalies, or other oddities?
What is the distribution?
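
Before reaching for a plotting library, a rough first answer to the distribution and outlier questions above can be had from bin counts and a simple outlier rule. The sketch below assumes a hypothetical list of measurements and uses the common 1.5 × IQR rule for flagging outliers; both the data and the helper names `histogram` and `iqr_outliers` are illustrative, and other outlier definitions may suit a given problem better.

```python
import statistics

# Hypothetical measurements with one suspicious value.
values = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.0, 25.0]


def histogram(data, bins=5):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in data:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]


# A crude text rendering of the distribution: one '#' per observation.
for i, count in enumerate(histogram(values)):
    print(f"bin {i}: {'#' * count}")

print("outliers:", iqr_outliers(values))
```

A lopsided histogram like this one is itself a prompt for the next question: is the extreme value an error, or the most interesting point in the data?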

Optimization


How much error is acceptable?
How many steps are necessary to answer the question?
Can I reduce those steps?
How does changing one part change other parts?

Conclusion


What does "done" look like?

Further Reading