PART 1: DATA ANALYTICS FOUNDATION

Descriptive Statistics

This section covers data measurement scales, univariate descriptive statistics, bivariate analysis techniques (scatterplots, correlation, and simple linear regression), and contingency tables for categorical variables. Descriptive methods, including measures of central tendency, variation, skewness, and kurtosis, together with preliminary visualizations, provide the essential tools for summarizing and exploring data. These foundations clarify underlying data patterns and lay the groundwork for more advanced inferential analyses.

Measurement Scales in Data

Learning Objectives

Distinguish nominal, ordinal, interval, and ratio scales, and explain how each scale influences statistical methodology.

Indicative Content

  • Examples: Gender (Nominal), Satisfaction Level (Ordinal), Temperature in °C (Interval), Age (Ratio).

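  • Implications for analysis: nominal data support counts and the mode; ordinal data additionally support medians and rank-based methods; interval and ratio data support arithmetic summaries such as means and standard deviations, and only ratio data (which have a true zero) support meaningful ratios.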
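The link between scale and method can be sketched in pandas as follows. This is a minimal illustration, not a prescribed workflow: the DataFrame, its column names, and its values are hypothetical, and the ordered category levels are an assumption about how satisfaction might be coded.

```python
import pandas as pd

# Hypothetical survey data; names and values are illustrative only.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M"],
    "satisfaction": ["Low", "High", "Medium", "High", "Medium"],
    "age": [23, 35, 41, 29, 35],
})

# Nominal: unordered categories, so counts and the mode are meaningful.
df["gender"] = pd.Categorical(df["gender"])
gender_counts = df["gender"].value_counts()

# Ordinal: ordered categories, so min/max and rank-based summaries are meaningful.
levels = ["Low", "Medium", "High"]  # assumed ordering
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=levels, ordered=True
)
highest_satisfaction = df["satisfaction"].max()
# Median taken on the integer codes (0 = Low, 1 = Medium, 2 = High).
median_code = int(df["satisfaction"].cat.codes.median())
median_satisfaction = levels[median_code]

# Ratio: a true zero exists, so means and ratios are meaningful.
mean_age = df["age"].mean()

print(gender_counts)
print(highest_satisfaction, median_satisfaction, mean_age)
```

Declaring the ordinal column as an ordered categorical lets pandas compare levels, while a mean of the nominal column would be meaningless and is deliberately not computed.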

Central Tendency

Learning Objectives

Compute and interpret mean, median, mode, and trimmed mean, and illustrate data distributions using basic plots.

Indicative Content

  • .mean(), .median(), .mode(), scipy.stats.trim_mean().

  • Visualization: plt.hist(), plt.boxplot().
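A short sketch of these calls on a small sample with one extreme value, showing how the mean is pulled upward while the median and trimmed mean are not. The data are made up for illustration, and the 10% trimming proportion is an arbitrary choice. The non-interactive "Agg" backend is assumed so the script runs without a display; drop that line and call plt.show() to view the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats

# Illustrative sample with one outlier (40).
values = pd.Series([2, 3, 3, 4, 5, 5, 5, 6, 7, 40])

mean_value = values.mean()
median_value = values.median()
mode_values = values.mode().tolist()  # .mode() can return several values
# Drop 10% of observations from each tail before averaging.
trimmed_mean = stats.trim_mean(values, proportiontocut=0.1)

print(mean_value, median_value, mode_values, trimmed_mean)

# Histogram and boxplot side by side.
fig = plt.figure(figsize=(8, 3))
plt.subplot(1, 2, 1)
counts, bin_edges, _ = plt.hist(values, bins=10)
plt.title("Histogram")
plt.subplot(1, 2, 2)
plt.boxplot(values.to_numpy())
plt.title("Boxplot")
plt.tight_layout()
```

With this sample the mean is 8.0, the median is 5.0, and the trimmed mean is 4.75, since trimming removes the values 2 and 40.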

Variation

Learning Objectives

Calculate and interpret measures of variation (range, IQR, variance, standard deviation, coefficient of variation) to assess data dispersion.

Indicative Content

  • .max() - .min(), .quantile(), .var(), .std(), and manual calculation of the coefficient of variation (std / mean).
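The measures above might be computed as in the following sketch. The sample values are invented. Note that pandas .var() and .std() default to the sample versions (ddof=1), and .quantile() uses linear interpolation by default, so results can differ slightly from other software.

```python
import pandas as pd

# Illustrative sample.
values = pd.Series([4, 8, 6, 5, 3, 10, 6])

value_range = values.max() - values.min()

# Interquartile range from the 25th and 75th percentiles.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

variance = values.var()  # sample variance (ddof=1)
std_dev = values.std()   # sample standard deviation (ddof=1)

# Coefficient of variation: dispersion relative to the mean (unitless).
cv = std_dev / values.mean()

print(value_range, iqr, variance, std_dev, cv)
```

The coefficient of variation is only meaningful for ratio-scale data with a positive mean.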

Skewness and Kurtosis in Data

Learning Objectives

Evaluate distribution shapes using skewness and kurtosis, and interpret how these measures reveal asymmetry and tail weight (often loosely described as peakedness) in real-world datasets.

Indicative Content

  • scipy.stats.skew(), scipy.stats.kurtosis().
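As a rough illustration, the sketch below compares a symmetric sample with a right-skewed one. The simulated data, sample size, and seed are arbitrary assumptions. By default scipy.stats.kurtosis() reports excess kurtosis (Fisher's definition), so a normal distribution scores near 0 rather than 3.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
symmetric = rng.normal(size=5000)
right_skewed = rng.exponential(size=5000)

# Skewness: near 0 for symmetric data, positive for a long right tail.
skew_sym = stats.skew(symmetric)
skew_right = stats.skew(right_skewed)

# Excess kurtosis: near 0 for normal data, positive for heavier tails.
kurt_sym = stats.kurtosis(symmetric)
kurt_right = stats.kurtosis(right_skewed)

print(f"normal:      skew={skew_sym:.2f}, excess kurtosis={kurt_sym:.2f}")
print(f"exponential: skew={skew_right:.2f}, excess kurtosis={kurt_right:.2f}")
```

Passing fisher=False to stats.kurtosis() returns Pearson's definition instead, under which a normal distribution scores about 3.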

Interpreting Scatterplots

Learning Objectives

Construct scatterplots to explore relationships between two continuous variables and detect trends or outliers.

Indicative Content

  • plt.scatter().
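A minimal scatterplot sketch, using simulated data with a linear trend and one deliberately planted outlier. The data, seed, and axis labels are hypothetical. The "Agg" backend is assumed so the code runs without a display; remove that line and call plt.show() to view the plot interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulated linear relationship with noise.
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + rng.normal(scale=2.0, size=50)

# Append one point far from the trend to act as an outlier.
x = np.append(x, 9.0)
y = np.append(y, -5.0)

plt.figure(figsize=(5, 4))
points = plt.scatter(x, y, alpha=0.7)
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.title("Scatterplot with a linear trend and one outlier")
plt.tight_layout()
```

In the resulting plot the upward trend is visible as a band of points, and the planted outlier sits well below it at the right-hand side.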

Pearson’s Correlation Coefficient

Learning Objectives

Compute and interpret Pearson’s correlation coefficient to assess linear associations between variables.

Indicative Content

  • numpy.corrcoef(), .corr().
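Both routes to Pearson's r can be sketched as below. The two columns are hypothetical. numpy.corrcoef() returns the full correlation matrix, so the coefficient of interest is the off-diagonal element.

```python
import numpy as np
import pandas as pd

# Hypothetical data: study hours and exam scores.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 70, 72],
})

# NumPy: 2x2 correlation matrix; [0, 1] is r between the two variables.
r_numpy = np.corrcoef(df["hours"], df["score"])[0, 1]

# pandas: Series.corr() uses Pearson's method by default.
r_pandas = df["hours"].corr(df["score"])

# pandas: pairwise correlation matrix for all numeric columns.
corr_matrix = df.corr()

print(round(r_numpy, 4), round(r_pandas, 4))
print(corr_matrix)
```

Pearson's r measures only linear association, so it is worth checking a scatterplot before interpreting the coefficient.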

Simple Linear Regression

Learning Objectives

Develop and interpret simple linear regression models to examine relationships between dependent and independent variables.

Indicative Content

  • statsmodels.api.OLS(), .fit(), .summary().

Two-Way Tables

Learning Objectives

Construct and interpret frequency and percentage tables for two categorical variables to analyze relationships.

Indicative Content

  • pd.crosstab() with normalize='index' or normalize='columns'.
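The frequency and percentage versions of a two-way table can be sketched as below. The two categorical columns are hypothetical. normalize='index' gives proportions within each row, and normalize='columns' gives proportions within each column; multiplying by 100 converts them to percentages.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "F"],
    "satisfaction": ["High", "Low", "High", "Low", "Low", "High", "Low", "High"],
})

# Frequency table: counts for each gender/satisfaction combination.
counts = pd.crosstab(df["gender"], df["satisfaction"])

# Row percentages: each row sums to 100.
row_pct = pd.crosstab(df["gender"], df["satisfaction"], normalize="index") * 100

# Column percentages: each column sums to 100.
col_pct = pd.crosstab(df["gender"], df["satisfaction"], normalize="columns") * 100

print(counts)
print(row_pct)
print(col_pct)
```

Row percentages answer "what share of each gender is highly satisfied", while column percentages answer "what share of the highly satisfied group is each gender".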

Three-Way Tables

Learning Objectives

Summarize the interaction among three categorical variables using multi-dimensional tables.

Indicative Content

  • pd.crosstab() with multiple variables.
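One way to build a three-way table is to pass a list of two columns as the row index, as in this sketch. The data and column names are hypothetical, and nesting gender and region on the rows is just one possible layout; the same list form also works for the columns argument.

```python
import pandas as pd

# Hypothetical data with three categorical variables.
df = pd.DataFrame({
    "gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "region": ["North", "North", "South", "South",
               "North", "North", "South", "South"],
    "satisfaction": ["High", "High", "Low", "High",
                     "Low", "Low", "High", "Low"],
})

# Rows are indexed by (gender, region); columns by satisfaction.
three_way = pd.crosstab(
    index=[df["gender"], df["region"]],
    columns=df["satisfaction"],
)

print(three_way)

# A single cell is addressed with a (gender, region) tuple.
print(three_way.loc[("F", "North"), "High"])
```

The result has a two-level row index, so satisfaction can be compared across regions within each gender.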

Tools and Methodologies

  • Python (including pandas and numpy) for core descriptive calculations and tabulations

  • scipy.stats for key statistical measures (e.g., skewness, kurtosis) and robust central tendency methods

  • matplotlib for basic data visualizations (e.g., scatterplots, histograms)

  • statsmodels for simple linear regression and associated diagnostics

  • Methodologies

    • Summarize data distributions with measures of center, variation, skewness, and kurtosis

    • Visualize numeric variables using plots and assess bivariate relationships (correlation, simple linear regression)

    • Construct frequency/percentage tables (two-way or multi-way) to evaluate interactions among categorical variables