• anecdotal evidence: Evidence, often personal, that is collected casually rather than by a well-designed study.
• population: A group we are interested in studying. “Population” often refers to a group of people, but the term is used for other subjects, too.
• cross-sectional study: A study that collects data about a population at a particular point in time.
• cycle: In a repeated cross-sectional study, each repetition of the study is called a cycle.
• longitudinal study: A study that follows a population over time, collecting data from the same group repeatedly.
• record: In a dataset, a collection of information about a single person or other subject.
• respondent: A person who responds to a survey.
• sample: The subset of a population used to collect data.
• representative: A sample is representative if every member of the population has the same chance of being in the sample.
• oversampling: The technique of increasing the representation of a sub-population in order to avoid errors due to small sample sizes.
• raw data: Values collected and recorded with little or no checking, calculation or interpretation.
• recode: A value that is generated by calculation and other logic applied to raw data.
• data cleaning: Processes that include validating data, identifying errors, translating between data types and representations, etc.
• distribution: The values that appear in a sample and the frequency of each.
• histogram: A mapping from values to frequencies, or a graph that shows this mapping.
• frequency: The number of times a value appears in a sample.
• mode: The most frequent value in a sample, or one of the most frequent values.
• normal distribution (Gaussian distribution): An idealization of a bell-shaped distribution.
• uniform distribution: A distribution in which all values have the same frequency.
• outlier: A value far from the central tendency.
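The definitions of distribution, histogram, frequency and mode above can be illustrated with a small sketch. The sample values below are invented for illustration, and the variable names (`hist`, `modes`) are assumptions rather than anything defined in these notes.

```python
from collections import Counter

# Hypothetical sample values, invented for illustration.
sample = [1, 2, 2, 3, 3, 3, 4]

# Histogram: a mapping from each value to its frequency.
hist = Counter(sample)

# Frequency: the number of times one value appears in the sample.
freq_of_3 = hist[3]

# Mode(s): every value whose frequency equals the highest frequency.
max_freq = max(hist.values())
modes = [value for value, freq in hist.items() if freq == max_freq]

# Print the histogram in value order.
for value in sorted(hist):
    print(value, hist[value])
print("modes:", modes)
```

Returning a list of modes, rather than a single value, matches the definition above, which allows for "one of the most frequent values" when there is a tie.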
Quantitative Analysis: Quantitative analysis, also called statistical analysis, is the science of collecting and interpreting data with numbers and graphs to identify patterns and trends.
Qualitative Analysis: Qualitative, or non-statistical, analysis gives general, non-numerical information and uses text, sound and other forms of media to do so.
Descriptive Statistics: Descriptive statistics uses the data to describe the population through numerical calculations, graphs or tables.
- Descriptive statistics helps organize data and focuses on the characteristics of the data, summarized as parameters.
Inferential Statistics: Inferential Statistics makes inferences and predictions about a population based on a sample of data taken from the population in question.
- Inferential statistics generalizes from a sample to a larger data set and applies probability to arrive at a conclusion. It allows you to infer parameters of the population from sample statistics and to build models on them.
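As a rough sketch of the idea, the code below draws a random sample from a population and uses the sample mean as an estimate of the population mean. The population (the integers 1 to 10000), the sample size of 500 and the fixed seed are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic population, invented for illustration: the integers 1..10000.
population = list(range(1, 10001))
population_mean = statistics.mean(population)

# Draw a simple random sample without replacement.
sample = random.sample(population, 500)

# The sample mean serves as an estimate of the population mean.
sample_mean = statistics.mean(sample)

print("population mean:", population_mean)
print("sample mean:    ", sample_mean)
```

In practice the population mean is usually unknown, which is why it has to be inferred from the sample. It is computed here only to show how close the estimate lands.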
- The mean is the average of the numbers, i.e., (sum of numbers) / (count of numbers).
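- Variance describes the variability or spread of a data distribution.
- It is the average of the squared differences from the mean. For a population of N values with mean μ: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Population Variance is the average of the squared deviations from the population mean μ, so the sum is divided by N.
Sample Variance is the sum of the squared differences from the sample mean x̄ divided by N - 1 rather than N: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$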
- The Standard Deviation is a measure of how spread out numbers are.
- It is computed as the square root of the Variance, which puts the measure back in the same units as the data.
Population SD: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
Sample SD: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$
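The population and sample formulas differ only in the divisor (N versus N - 1). A minimal sketch of both, written from scratch, might look like the following. The function names (`mean`, `variance`, `std_dev`), the `sample` flag and the input values are hypothetical and chosen only for illustration.

```python
import math

def mean(values):
    """Sum of the numbers divided by the count of the numbers."""
    return sum(values) / len(values)

def variance(values, sample=False):
    """Population variance by default; sample variance if sample=True."""
    m = mean(values)
    squared_diffs = sum((x - m) ** 2 for x in values)
    # Divide by N for a population, by N - 1 for a sample.
    divisor = len(values) - 1 if sample else len(values)
    return squared_diffs / divisor

def std_dev(values, sample=False):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, sample=sample))

# Hypothetical values, invented for illustration (mean is 5).
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(values))               # 4.0  (32 / 8)
print(std_dev(values))                # 2.0
print(variance(values, sample=True))  # 32 / 7, about 4.57
```

Because it divides by a smaller number, the sample variance is always a little larger than the population variance for the same data.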
Calculation of Variance & SD
Data: 15, 16, 18, 19, 22, 24, 29, 30, 34
Mean: (15 + 16 + ... + 34) / 9 = 207 / 9 = 23
Distances from Mean (|Mean - Data Point|): 8, 7, 5, 4, 1, 1, 6, 7, 11
Squaring & Adding: 8^2 + 7^2 + ... + 11^2 = 362
Variance (population) = 362 / 9 = 40.22
SD = sqrt(40.22) = 6.34
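The calculation above can be checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original notes. `pvariance` and `pstdev` divide by N (population), while `variance` and `stdev` divide by N - 1 (sample).

```python
import statistics

data = [15, 16, 18, 19, 22, 24, 29, 30, 34]

mean = statistics.mean(data)           # 207 / 9 = 23
pop_var = statistics.pvariance(data)   # 362 / 9, about 40.22
pop_sd = statistics.pstdev(data)       # about 6.34
samp_var = statistics.variance(data)   # 362 / 8 = 45.25
samp_sd = statistics.stdev(data)       # about 6.73

print(f"mean = {mean}")
print(f"population variance = {pop_var:.2f}, SD = {pop_sd:.2f}")
print(f"sample variance = {samp_var:.2f}, SD = {samp_sd:.2f}")
```

The population figures match the hand calculation. The sample figures show what changes if the nine values are treated as a sample rather than the whole population.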