Descriptive Statistics
Descriptive Statistics covers data measurement scales, summary measures for single variables, and bivariate analysis techniques such as scatterplots, correlation, and simple linear regression. Descriptive methods, including measures of central tendency, variation, skewness, and kurtosis, together with preliminary visualizations, provide essential tools for summarizing and exploring data. These foundations clarify underlying data patterns and establish the groundwork for more advanced inferential analyses.
Measurement Scales in Data
Learning Objectives
Distinguish nominal, ordinal, interval, and ratio scales, and explain how each scale influences statistical methodology.
Indicative Content
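Examples: Gender (Nominal), Satisfaction Level (Ordinal), Temperature in °C (Interval), Age (Ratio).
Implications for analysis: nominal scales support counts and the mode; ordinal scales add ranking and the median; interval and ratio scales support numerical calculations such as the mean and standard deviation.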
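The sketch below shows one way the scale of a variable can be made explicit in pandas so that only meaningful operations are applied to it. The DataFrame, its column names, and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical survey data, invented for illustration only.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "satisfaction": ["Low", "High", "Medium", "High"],
    "age": [23, 35, 41, 29],
})

# Nominal: unordered categories, so counts (and the mode) are meaningful.
df["gender"] = pd.Categorical(df["gender"])
gender_counts = df["gender"].value_counts()

# Ordinal: ordered categories, so ranking operations such as min/max are meaningful.
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["Low", "Medium", "High"], ordered=True
)
lowest_satisfaction = df["satisfaction"].min()  # respects the declared order

# Ratio: a true zero exists, so arithmetic such as the mean is meaningful.
mean_age = df["age"].mean()

print(gender_counts)
print(lowest_satisfaction, mean_age)
```

Declaring the ordinal column as an ordered categorical is a design choice: it lets pandas compare levels by rank rather than alphabetically.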
Central Tendency
Learning Objectives
Compute and interpret mean, median, mode, and trimmed mean, and illustrate data distributions using basic plots.
Indicative Content
.mean(), .median(), .mode(), scipy.stats.trim_mean().
Visualization: plt.hist(), plt.boxplot().
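As a minimal sketch of the functions listed above, the following example computes the four measures on a small hypothetical sample containing one extreme value, then draws a histogram and a boxplot. The data and variable names are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats

# Hypothetical sample with one extreme value (53), invented for illustration.
values = pd.Series([2, 3, 3, 4, 5, 6, 7, 8, 9, 53])

mean_value = values.mean()        # pulled upward by the extreme value
median_value = values.median()    # robust to the extreme value
mode_values = values.mode()       # a Series, since there can be several modes
# Drop 10% of observations from each tail before averaging.
trimmed = stats.trim_mean(values, proportiontocut=0.1)

print(mean_value, median_value, list(mode_values), trimmed)

# Histogram and boxplot side by side; the extreme value appears as an outlier.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(values.to_numpy(), bins=10)
ax_hist.set_title("Histogram")
ax_box.boxplot(values.to_numpy())
ax_box.set_title("Boxplot")
fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view the figure.
```

Comparing the mean with the median and trimmed mean is a quick way to see how strongly a single extreme value affects each measure.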
Variation
Learning Objectives
Calculate and interpret measures of variation (range, IQR, variance, standard deviation, coefficient of variation) to assess data dispersion.
Indicative Content
.max() - .min(), .quantile(), .var(), .std(), and manual calculation for coefficient of variation (std / mean).
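The measures of variation listed above can be sketched as follows. The sample values are hypothetical; note that pandas computes the sample variance and standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical sample, invented for illustration.
values = pd.Series([4, 8, 6, 5, 3, 7, 9, 6])

data_range = values.max() - values.min()

# Interquartile range: distance between the 75th and 25th percentiles.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1

variance = values.var()   # sample variance (ddof=1 is the pandas default)
std_dev = values.std()    # sample standard deviation (ddof=1)

# Coefficient of variation: dispersion relative to the mean (unitless).
cv = std_dev / values.mean()

print(data_range, iqr, variance, std_dev, cv)
```

The coefficient of variation is only meaningful for ratio-scale data with a nonzero mean, which is why it is computed manually rather than offered as a default summary.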
Skewness and Kurtosis in Data
Learning Objectives
Evaluate distribution shapes using skewness and kurtosis, and interpret how these measures reveal asymmetry or tail heaviness (often described as peakedness) in real-world datasets.
Indicative Content
scipy.stats.skew(), scipy.stats.kurtosis().
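A brief sketch of the two SciPy functions follows, using two small hypothetical arrays. One detail worth knowing: scipy.stats.kurtosis() returns excess kurtosis by default (fisher=True), so a normal distribution scores 0 rather than 3.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, invented for illustration.
symmetric = np.array([1, 2, 3, 4, 5])
right_skewed = np.array([1, 1, 2, 2, 3, 10])

skew_sym = stats.skew(symmetric)        # about 0 for a symmetric sample
skew_right = stats.skew(right_skewed)   # positive: long right tail

# Fisher definition (default): excess kurtosis, normal distribution = 0.
excess_kurt = stats.kurtosis(symmetric)
# Pearson definition: normal distribution = 3.
pearson_kurt = stats.kurtosis(symmetric, fisher=False)

print(skew_sym, skew_right, excess_kurt, pearson_kurt)
```

A negative excess kurtosis, as in the flat symmetric sample here, indicates lighter tails than a normal distribution.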
Interpreting Scatterplots
Learning Objectives
Construct scatterplots to explore relationships between two continuous variables and detect trends or outliers.
Indicative Content
plt.scatter().
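The following sketch builds a scatterplot from simulated data with a roughly linear trend and one deliberately planted outlier. The variables, the seed, and the trend are assumptions chosen for illustration, and the example uses the Axes.scatter method, the object-oriented equivalent of plt.scatter().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

# Simulated data (hypothetical): y rises roughly linearly with x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(0, 1.5, size=50)

# Plant one point far below the trend so an outlier is visible.
x = np.append(x, 9.0)
y = np.append(y, 0.0)

fig, ax = plt.subplots()
points = ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatterplot with one outlier")
# Call plt.show() or fig.savefig(...) to view the figure.
```

In the resulting plot the upward trend should be apparent, with the planted point at (9, 0) sitting well away from the rest.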
Pearson’s Correlation Coefficient
Learning Objectives
Compute and interpret Pearson’s correlation coefficient to assess linear associations between variables.
Indicative Content
numpy.corrcoef(), .corr().
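As a minimal sketch, the example below computes Pearson's r with both listed functions on a small hypothetical dataset. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70],
})

# numpy.corrcoef returns a 2x2 correlation matrix; r is the off-diagonal entry.
r_numpy = np.corrcoef(df["hours"], df["score"])[0, 1]

# Series.corr uses Pearson's method by default.
r_pandas = df["hours"].corr(df["score"])

# DataFrame.corr gives the full pairwise correlation matrix.
corr_matrix = df.corr()

print(r_numpy, r_pandas)
print(corr_matrix)
```

Both calls should agree. A value near +1 indicates a strong positive linear association, though it says nothing about causation or about nonlinear patterns.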
Simple Linear Regression
Learning Objectives
Develop and interpret simple linear regression models to examine relationships between dependent and independent variables.
Indicative Content
statsmodels.api.OLS(), .summary().
Two-Way Tables
Learning Objectives
Construct and interpret frequency and percentage tables for two categorical variables to analyze relationships.
Indicative Content
pd.crosstab(), normalize='index' or 'columns'.
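A short sketch of a two-way table follows, built from a hypothetical dataset with two categorical columns. It shows raw counts and then row and column proportions via the normalize argument.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "F"],
    "satisfaction": ["High", "Low", "High", "Low", "Low", "High", "Low", "High"],
})

# Frequency table: counts for each gender/satisfaction combination.
counts = pd.crosstab(df["gender"], df["satisfaction"])

# Row proportions: each row sums to 1 (distribution within each gender).
row_pct = pd.crosstab(df["gender"], df["satisfaction"], normalize="index")

# Column proportions: each column sums to 1 (distribution within each level).
col_pct = pd.crosstab(df["gender"], df["satisfaction"], normalize="columns")

print(counts)
print(row_pct)
print(col_pct)
```

Multiplying the normalized tables by 100 gives percentages. Which normalization to use depends on which variable is treated as the conditioning variable.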
Three-Way Tables
Learning Objectives
Summarize the interaction among three categorical variables using multi-dimensional tables.
Indicative Content
pd.crosstab() with multiple variables.
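One possible way to build a three-way table is to pass a list of two variables as the row index of pd.crosstab(), as sketched below. The dataset and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "region": ["North", "North", "North", "North",
               "South", "South", "South", "South"],
    "gender": ["F", "F", "M", "M", "F", "F", "M", "M"],
    "satisfaction": ["High", "High", "Low", "High",
                     "Low", "High", "Low", "Low"],
})

# Two row variables produce a hierarchical (MultiIndex) row index,
# giving one satisfaction breakdown per region/gender combination.
three_way = pd.crosstab(
    index=[df["region"], df["gender"]],
    columns=df["satisfaction"],
)

print(three_way)
```

Reading the table one region at a time makes it possible to check whether the gender and satisfaction relationship differs across regions.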
Tools and Methodologies
Python (including pandas and numpy) for core descriptive calculations and tabulations
scipy.stats for key statistical measures (e.g., skewness, kurtosis) and robust central tendency methods
matplotlib for basic data visualizations (e.g., scatterplots, histograms)
statsmodels for simple linear regression and associated diagnostics
Methodologies
Summarize data distributions with measures of center, variation, skewness, and kurtosis
Visualize numeric variables using plots and assess bivariate relationships (correlation, simple linear regression)
Construct frequency/percentage tables (two-way or multi-way) to evaluate interactions among categorical variables