# isaacarnault/OUTPUT.md

Last active February 17, 2024 18:42
Data collection using Python

# Data collection and statistics using Python and R

## Scripting in Python and R

The following gist focuses on Data Collection, one of the stages* of the Data Science methodology. We will also perform basic math operations on a single dataframe to compare how the results render in Python and in R.

## Exercise

• Perform a data collection in `Python` and `R` using `Jupyter`.
⇢ Use the following dataframe from Spatialkey.com.
• How many observations and variables does the dataframe contain? Base your assessment on your scripting outputs.
• Calculate the Sum, Min, Max and Mean of the variable "raisedAmt" using Python (with pandas) and using R.
— (*) The Data Science methodology comprises ten key stages, one of which is Data collection. See architecture.md.
Vertices of Data Science methodology

`There are 10 variables and 1461 observations in the dataframe.`
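In pandas, observations and variables correspond to the rows and columns reported by `DataFrame.shape`, which is a `(rows, columns)` tuple; when a CSV is loaded with `pd.read_csv`, its header line becomes the column names rather than an observation. A minimal sketch on a hypothetical three-row dataframe (the column names and values are illustrative, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical miniature dataframe: 3 rows (observations) x 2 columns (variables).
df = pd.DataFrame({
    "company": ["A", "B", "C"],
    "raisedAmt": [6000, 50000, 300000],
})

# .shape returns (number of rows, number of columns).
n_observations, n_variables = df.shape
print(f"There are {n_variables} variables and {n_observations} observations in the dataframe.")
```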

Calculations using Python and R

```
Sum = 14791971750
Min = 6000
Max = 300000000
Mean = 10131487.5 # Using R in Jupyter; RStudio prints the same value as 10131488 because R displays 7 significant digits by default
```
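Since Mean = Sum / n, the figures above also reveal how many raisedAmt values the mean was taken over. A quick arithmetic check in Python, using only the numbers printed above:

```python
# Figures copied from the output above.
total = 14791971750
mean = 10131487.5

# Mean = Sum / n, so n = Sum / Mean.
n_values = total / mean
print(n_values)
```

Both numbers are exactly representable as floats, so the division yields exactly 1460.0.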

```python
#1 Checking Python version (shell escape, works in Jupyter only)
!python -V

#2 Import pandas to read the dataframe
import pandas as pd
from IPython.display import display

pd.set_option('display.max_columns', None)  # show every column when printing
MyData = pd.read_csv("http://samplecsvs.s3.amazonaws.com/SalesJan2009.csv")

#3 Show the first rows of the dataframe
display(MyData.head())

#4 Get the dimensions of the dataframe: (observations, variables)
MyData.shape
```
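The script above stops at the dimensions, while the exercise also asks for the Sum, Min, Max and Mean of raisedAmt. A minimal pandas sketch, assuming the loaded dataframe has a numeric raisedAmt column; the helper name `raised_amt_stats` and the sample values are hypothetical:

```python
import pandas as pd


def raised_amt_stats(df: pd.DataFrame, column: str = "raisedAmt") -> pd.Series:
    """Return Sum, Min, Max and Mean of one numeric column (NaN values are skipped)."""
    stats = df[column].agg(["sum", "min", "max", "mean"])
    stats.index = ["Sum", "Min", "Max", "Mean"]
    return stats


# Hypothetical sample values, not the real dataset.
sample = pd.DataFrame({"raisedAmt": [6000, 94000, 300000000]})
print(raised_amt_stats(sample))
```

Called as `raised_amt_stats(MyData)` on a dataframe that contains raisedAmt, it returns the four figures in a single Series; the R counterparts are `sum()`, `min()`, `max()` and `mean()` applied to `MyData$raisedAmt`.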
