
# isaacarnault/OUTPUT.md

Last active February 17, 2024 18:42
Data collection using Python

# Data collection and statistics using Python and R

## Scripting in Python and R

This gist focuses on Data Collection, one of the stages* of the Data Science methodology. We will also perform basic math operations on a single dataframe to compare how they render in Python and R.

## Versioning

I used no versioning system for this gist. The gist's repo status is flagged as concept because it is intended to be a demo or POC (proof of concept).

## Author

• Isaac Arnault - Suggests two implementations, in `Python` and `R`, based on initial work from Cognitive Class Lab (Module 2), and provides one exercise.

## Licence

All public gists https://gist.github.com/isaacarnault

## Exercise

• Perform a data collection in `Python` and `R` using `Jupyter`.
⇢ Use the following dataframe from Spatialkey.com.
• How many observations and variables does the dataframe contain? Base your assessment on your scripting outputs.
• Calculate Sum, Min, Max and Mean of variable "raisedAmt" using Python (and Pandas) and using R.
(*) The Data Science methodology comprises ten stages, one of which is Data collection. See architecture.md.
Vertices of Data Science methodology

`There are 10 variables and 1461 observations in the dataframe.`
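The count of observations and variables can be read from the dataframe's `shape` attribute. The following is a minimal sketch, assuming a small inline CSV as a stand-in for the real file; the column names and values in `SAMPLE_CSV` are hypothetical and only illustrate the technique.

```python
import io

import pandas as pd

# Hypothetical stand-in data; the exercise itself reads a CSV from a URL.
SAMPLE_CSV = """company,city,raisedAmt
Alpha,Austin,6000
Beta,Boston,250000
Gamma,Chicago,1000000
"""

# read_csv accepts any file-like object, so StringIO avoids a network call.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# shape is a (rows, columns) tuple: rows are observations, columns are variables.
n_observations, n_variables = df.shape

print(f"There are {n_variables} variables and {n_observations} observations in the dataframe.")
```

The header row is consumed as column names, so it is not counted among the observations.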

Calculations using Python and R

```
Sum = 14791971750
Min = 6000
Max = 300000000
Mean = 10131487.5 # Using R in Jupyter, otherwise Mean = 10131488 in RStudio
```
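One way to obtain these four statistics with pandas is sketched below. It assumes a column named `raisedAmt`, as in the exercise, but uses four hypothetical values rather than the real dataset, so the printed numbers will not match the results above.

```python
import pandas as pd

# Hypothetical values standing in for the "raisedAmt" column of the real file.
df = pd.DataFrame({"raisedAmt": [6000, 250000, 1000000, 744000]})

raised = df["raisedAmt"]

# Each statistic is a built-in Series reduction.
stats = {
    "Sum": int(raised.sum()),
    "Min": int(raised.min()),
    "Max": int(raised.max()),
    "Mean": float(raised.mean()),
}

for name, value in stats.items():
    print(f"{name} = {value}")
```

As a side check on the reported figures, 14791971750 / 10131487.5 = 1460, so the mean appears to be taken over 1460 values. The difference between 10131487.5 and 10131488 is probably a display effect, since R prints 7 significant digits by default.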

Complete solution using Python and Pandas

Complete solution using R

See output

# Data collection using R

See output

```python
#1 Checking Python version (Jupyter shell escape)
!python -V

#2 Import pandas to read the dataframe
import pandas as pd
pd.set_option('display.max_columns', None)
MyData = pd.read_csv("http://samplecsvs.s3.amazonaws.com/SalesJan2009.csv")

#3 Show the first rows of the dataframe
MyData.head()

#4 Get the dimensions of the dataframe
MyData.shape

# Full code
!python -V
import pandas as pd
pd.set_option('display.max_columns', None)
MyData = pd.read_csv("http://samplecsvs.s3.amazonaws.com/SalesJan2009.csv")
#3 Show the first rows of the dataframe
MyData.head()
MyData.shape
```