Peeush Agarwal peeush-the-developer

  • Pune, MH, India
@peeush-the-developer
peeush-the-developer / std_dev.py
Created July 25, 2021 02:41
Statistics: Standard Deviation
# Data
a = [12, 54, 32, 100, 20]
# Method 1
## Calculate sample mean
mean = sum(a) / len(a)
## Calculate distance from sample mean and then square it
distance_from_mean_squared = [(i-mean)**2 for i in a]
## Calculate the sample variance
var = sum(distance_from_mean_squared) / (len(a)-1)
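The preview above stops at the sample variance. A minimal sketch of the remaining step, assuming the standard deviation is wanted as the square root of that variance; the name `std` and the cross-check against `statistics.stdev` are additions for illustration, not part of the original gist:

```python
import math
import statistics

# Data (same as above)
a = [12, 54, 32, 100, 20]

# Sample mean and sample variance, as in the steps above
mean = sum(a) / len(a)
var = sum((x - mean) ** 2 for x in a) / (len(a) - 1)

# Standard deviation is the square root of the variance
# (`std` is a name chosen for this sketch)
std = math.sqrt(var)
print(std)

# Cross-check with the standard library's sample standard deviation
print(statistics.stdev(a))
```

Both printed values should agree up to floating-point rounding, since `statistics.stdev` also divides by `n - 1`.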
peeush-the-developer / variance.py
Created July 25, 2021 02:36
Statistics: Variance of the data
# Data
a = [12, 54, 32, 100, 20]
# Method 1
## Calculate sample mean
mean = sum(a) / len(a)
## Calculate distance from sample mean and then square it
distance_from_mean_squared = [(i-mean)**2 for i in a]
## Calculate the sample variance
var = sum(distance_from_mean_squared) / (len(a)-1)
peeush-the-developer / iqr.py
Created July 25, 2021 02:34
Statistics: Interquartile range
# Data
a = [12, 54, 32, 100, 20]
# Import numpy library for percentile calculation
from numpy import percentile
# Calculate percentiles for 75th(Q3) and 25th(Q1)
q3, q1 = percentile(a, [75, 25])
# Calculate IQR = Q3 - Q1
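The preview is cut off before the subtraction itself. A sketch of the full calculation, using the standard library's `statistics.quantiles` in place of `numpy.percentile` so it runs without third-party packages; `method="inclusive"` is chosen on the assumption that it should match NumPy's default linear interpolation, and the name `iqr` is introduced here:

```python
from statistics import quantiles

# Data (same as above)
a = [12, 54, 32, 100, 20]

# Quartile cut points; "inclusive" interpolates between the min and max
# of the data, like numpy.percentile's default
q1, q2, q3 = quantiles(a, n=4, method="inclusive")

# IQR = Q3 - Q1
iqr = q3 - q1
print(q1, q3, iqr)
```

With NumPy installed, the equivalent last step after the `percentile` call above would simply be `iqr = q3 - q1`.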
peeush-the-developer / range.py
Created July 25, 2021 02:32
Statistics: Range of the dataset
# Data
a = [12, 54, 32, 100, 20]
# Calculate range
range_ = max(a) - min(a)  # trailing underscore avoids shadowing the built-in range()
# Display output
print(range_)
# Output: 88
peeush-the-developer / mode.py
Created June 22, 2021 03:51
Mode in the data
data = [5, 5.5, 5.5, 5.2, 5.6]
# We have a direct formula in statistics library to calculate the mode
from statistics import mode
# Calculate the mode value
mode_ = mode(data)
# Display the mode value
print(mode_)
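`mode` returns a single value, which can hide ties. A short sketch using `statistics.multimode` (Python 3.8+) to list every most-frequent value; the `tied` list is hypothetical data added only to illustrate a tie:

```python
from statistics import mode, multimode

data = [5, 5.5, 5.5, 5.2, 5.6]

# One clear mode: both functions point at the same value
print(mode(data))       # 5.5
print(multimode(data))  # [5.5]

# Hypothetical data with a tie between 1 and 2
tied = [1, 1, 2, 2, 3]
print(mode(tied))       # first mode encountered (Python 3.8+)
print(multimode(tied))  # all modes, in order of first appearance
```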
peeush-the-developer / median_even_counts.py
Created June 22, 2021 03:41
Median when count of values is even
data = [5, 6, 3, 8, 4, 7]
# Step 1: Sort the values
data_sorted = sorted(data) # [3, 4, 5, 6, 7, 8]
# Step 2: Find the central values in the data
central_values = data_sorted[2:4] # slice takes indices 2 and 3 (the 3rd and 4th items), the two central values
median_ = sum(central_values)/2
# Display median
peeush-the-developer / median_odd_counts.py
Created June 22, 2021 03:37
Median when count of data is odd
data = [5, 6, 3, 4, 7]
# Step 1: Sort the values
data_sorted = sorted(data) # [3, 4, 5, 6, 7]
# Step 2: Find the central value in the data
median_ = data_sorted[2] # 2 gives us 3rd item which is central value
# Display median
print(median_)
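The odd and even cases both hard-code the central indices for a fixed list length. A minimal sketch of one function covering both, assuming the same definition of the median; the function name `median_of` is hypothetical:

```python
def median_of(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median_of() requires at least one value")
    data_sorted = sorted(values)
    n = len(data_sorted)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single central value
        return data_sorted[mid]
    # Even count: average of the two central values
    return (data_sorted[mid - 1] + data_sorted[mid]) / 2


print(median_of([5, 6, 3, 4, 7]))     # odd count
print(median_of([5, 6, 3, 8, 4, 7]))  # even count
```

The standard library's `statistics.median` does the same job and is usually the better choice outside of a learning exercise.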
peeush-the-developer / mean_outliers.py
Created June 22, 2021 03:15
Mean affected by outliers
salaries_in_K = [1, 10, 1000]
# mean = (sum of values)/(total number of values)
mean = sum(salaries_in_K)/len(salaries_in_K)
# Display calculated mean
print(mean)
# Output
# 337.0
data = [3, 4, 5, 6, 7]
# mean = (sum of values)/(total number of values)
mean = sum(data)/len(data)
# Display calculated mean
print(mean)
# Output
# 5.0
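To make the effect of the outlier concrete, the mean can be set beside the median for the same salaries. A small sketch using `statistics.median`; the comparison is an addition for illustration:

```python
from statistics import median

salaries_in_K = [1, 10, 1000]

# The single large salary pulls the mean far above the typical value
mean = sum(salaries_in_K) / len(salaries_in_K)

# The median only depends on the central value, so the outlier barely matters
median_ = median(salaries_in_K)

print(mean)     # 337.0
print(median_)  # 10
```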
peeush-the-developer / create_dataframe.py
Created June 5, 2021 06:00
Create dataframe and display top 5 rows
# Create a dataframe from the items in the lies list
# (assumes `items` was built earlier as a list of 4-field rows)
import pandas as pd
df = pd.DataFrame(items, columns=['Date', 'Lie', 'Truth', 'Truth_Link'])
# Display top 5 rows from the dataframe
df.head()