mepsrajput/statistics.md

## statistics.md

      
### Exploratory data analysis

• anecdotal evidence: Evidence, often personal, that is collected casually rather than by a well-designed study.
• population: A group we are interested in studying. “Population” often refers to a group of people, but the term is used for other subjects, too.
• cross-sectional study: A study that collects data about a population at a particular point in time.
• cycle: In a repeated cross-sectional study, each repetition of the study is called a cycle.
• longitudinal study: A study that follows a population over time, collecting data from the same group repeatedly.
• record: In a dataset, a collection of information about a single person or other subject.
• respondent: A person who responds to a survey.
• sample: The subset of a population used to collect data.
• representative: A sample is representative if every member of the population has the same chance of being in the sample.
• oversampling: The technique of increasing the representation of a sub-population in order to avoid errors due to small sample sizes.
• raw data: Values collected and recorded with little or no checking, calculation or interpretation.
• recode: A value that is generated by calculation and other logic applied to raw data.
• data cleaning: Processes that include validating data, identifying errors, and translating between data types and representations.
• distribution: The values that appear in a sample and the frequency of each.
• histogram: A mapping from values to frequencies, or a graph that shows this mapping.
• frequency: The number of times a value appears in a sample.
• mode: The most frequent value in a sample, or one of the most frequent values.
• normal distribution (Gaussian distribution): An idealization of a bell-shaped distribution.
• uniform distribution: A distribution in which all values have the same frequency.
• outlier: A value far from the central tendency.
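
To make the terms distribution, histogram, frequency, and mode concrete, here is a minimal Python sketch. The sample values are invented for illustration, and representing the histogram with `collections.Counter` is just one convenient choice.

```python
from collections import Counter

# Hypothetical sample, made up for illustration only.
sample = [1, 2, 2, 3, 3, 3, 4]

# Histogram: a mapping from each value to its frequency.
hist = Counter(sample)

# Frequency: the number of times a value appears in the sample.
freq_of_2 = hist[2]

# Mode(s): the most frequent value, or every value tied for most frequent.
top = max(hist.values())
modes = sorted(value for value, count in hist.items() if count == top)

print(dict(hist))
print(freq_of_2)
print(modes)
```

Returning a list of modes rather than a single value follows the glossary's wording that the mode may be "one of the most frequent values".
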
### Types Of Analysis

Quantitative Analysis: Quantitative (statistical) analysis is the science of collecting and interpreting data with numbers and graphs to identify patterns and trends.
Qualitative Analysis: Qualitative (non-statistical) analysis provides general, non-numerical information, drawing on text, sound, and other forms of media.
### Categories In Statistics

Descriptive Statistics: Descriptive statistics uses the data to describe the population, through numerical calculations, graphs, or tables.

Descriptive statistics helps organize data and focuses on its characteristics, summarizing them with parameters such as the mean and the standard deviation.

Inferential Statistics: Inferential statistics makes inferences and predictions about a population based on a sample of data taken from that population.

Inferential statistics generalizes from a sample to a larger data set and applies probability to arrive at a conclusion. It lets you infer population parameters from sample statistics and build models on them.
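
As a rough illustration of inferring a population parameter from a sample statistic, the sketch below draws a simple random sample and compares its mean with the population mean. The population (the integers 1 to 10000), the sample size, and the seed are all assumptions chosen for the example.

```python
import random
import statistics

# Hypothetical population: the integers 1..10000 (assumed for illustration).
population = list(range(1, 10001))
population_mean = statistics.mean(population)

# Draw a simple random sample; a fixed seed keeps the run repeatable.
rng = random.Random(42)
sample = rng.sample(population, 500)

# The sample statistic serves as an estimate of the population parameter.
sample_mean = statistics.mean(sample)

print(f"population mean:  {population_mean}")
print(f"sample mean:      {sample_mean:.1f}")
print(f"estimation error: {abs(sample_mean - population_mean):.1f}")
```

In practice the population mean is unknown, which is why the estimate from the sample matters. It is computed here only so the two can be compared.
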

### Mean (x̄)

The mean is the average of the values, i.e., (sum of the values) / (count of the values).
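
A minimal sketch of that definition in Python. The helper name `mean` and the input values are illustrative assumptions.

```python
def mean(values):
    """Return the arithmetic mean: sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty sequence is undefined")
    return sum(values) / len(values)


# Hypothetical values, chosen so the arithmetic is easy to check by hand.
values = [2, 4, 6, 8]
result = mean(values)  # (2 + 4 + 6 + 8) / 4
print(result)
```
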

### Variance / Mean Squared Deviation (σ²)

Describes the variability or spread of a data distribution.
It is the average of the squared differences from the mean:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$

Population variance is the average of the squared deviations from the population mean μ, dividing by N.
Sample variance uses the squared deviations from the sample mean x̄ and divides by N − 1 instead of N.
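
The difference between the two divisors can be sketched in Python as follows. The function names and the data are assumptions made for illustration.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the sample mean, dividing by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


# Hypothetical data: the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = population_variance(data)  # 32 / 8
samp_var = sample_variance(data)     # 32 / 7
print(pop_var, samp_var)
```

For the same data, the sample variance is always at least as large as the population variance, because the same sum is divided by a smaller number.
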
### Standard Deviation (σ)

The standard deviation is a measure of how spread out the values are.
It is the square root of the variance, so it is expressed in the same units as the data.

Population SD: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$

Sample SD: $s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$
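
Python's standard `statistics` module provides both versions: `pstdev` divides by N and `stdev` divides by N − 1. The short sketch below applies them to made-up data and confirms that each is the square root of the matching variance.

```python
import math
import statistics

# Hypothetical data, chosen for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_sd = statistics.pstdev(data)  # population SD, divides by N
samp_sd = statistics.stdev(data)  # sample SD, divides by N - 1

# Each SD is the square root of the corresponding variance.
pop_matches = math.isclose(pop_sd, math.sqrt(statistics.pvariance(data)))
samp_matches = math.isclose(samp_sd, math.sqrt(statistics.variance(data)))

print(pop_sd, samp_sd)
print(pop_matches, samp_matches)
```
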
### Calculation of Variance & SD

Data: 15, 16, 18, 19, 22, 24, 29, 30, 34
Mean: 207 / 9 = 23
Distances from the mean (|Data Point − Mean|): 8, 7, 5, 4, 1, 1, 6, 7, 11
Squaring & adding: 8^2 + 7^2 + ... + 11^2 = 362
Population variance = 362 / 9 = 40.22
Population SD = √40.22 = 6.34
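
The same arithmetic can be re-checked with a few lines of Python. The variable names are illustrative. The sample versions (dividing by N − 1) are not part of the calculation above and are included only for comparison.

```python
import math

data = [15, 16, 18, 19, 22, 24, 29, 30, 34]
n = len(data)

mean = sum(data) / n  # 207 / 9
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

# Population versions: divide by N.
pop_var = sum_sq / n
pop_sd = math.sqrt(pop_var)

# Sample versions: divide by N - 1 (shown for comparison).
samp_var = sum_sq / (n - 1)
samp_sd = math.sqrt(samp_var)

print(mean, sum_sq)                          # 23.0 362.0
print(round(pop_var, 2), round(pop_sd, 2))   # 40.22 6.34
print(round(samp_var, 2), round(samp_sd, 2)) # 45.25 6.73
```
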