Part 3: Data Reduction Methods

Non-Hierarchical Clustering

When data must be segmented or grouped, K-Means clustering provides a straightforward, distance-based approach. It partitions observations into K clusters by repeatedly assigning each observation to its nearest centroid and recomputing the centroids until the assignments stabilize. This section covers the fundamentals of the K-Means algorithm, along with practical considerations such as standardization and the Elbow Method for choosing K.

K-MEANS FUNDAMENTALS

Learning Objectives

  • Cluster data into K groups using distance-based assignment

  • Summarize and profile clusters for actionable insights

Indicative Content

  • Algorithm Steps

    • Initialize centroids → assign each point to its nearest centroid → recalculate centroids → repeat until convergence

  • Scaling

    • Standardize variables (e.g., zero mean, unit variance) so that large-magnitude features do not dominate the distance calculation
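
The algorithm steps and the scaling point above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function names (standardize, kmeans), the random initialization from observed points, and the toy income/age data are all assumptions made for the example.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centroids as k distinct random observations.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recalculate each centroid as the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (invented): income has a much larger magnitude than age.
X = np.array([
    [20_000, 25], [22_000, 27], [21_000, 24],
    [80_000, 50], [82_000, 52], [81_000, 49],
], dtype=float)

labels, centroids = kmeans(standardize(X), k=2)
print(labels)
```

Without the standardize step, the income column would contribute almost all of each Euclidean distance, and age would have little influence on the assignments.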

K-MEANS IMPLEMENTATION & ELBOW METHOD

Learning Objectives

  • Use the Elbow Method to find an appropriate K

Indicative Content

  • Elbow Plot

    • Within-cluster sum of squares (WCSS, also called inertia) plotted against K

  • Interpretation

    • Locate the “bend” where adding another cluster yields only a small further reduction in WCSS, and choose K at that point
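
As a hedged sketch of how the values behind an elbow plot might be produced, the example below fits sklearn.cluster.KMeans for K = 1 to 8 and records the inertia_ attribute (scikit-learn's name for WCSS). The synthetic three-group data from make_blobs, the range of K, and the helper name plot_elbow are assumptions for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data with three well-separated groups (assumed for illustration).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-8, -8], [0, 0], [8, 8]],
    cluster_std=1.0,
    random_state=42,
)
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for each candidate K and record the WCSS (inertia).
ks = list(range(1, 9))
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    wcss.append(model.inertia_)

for k, value in zip(ks, wcss):
    print(f"K={k}: WCSS={value:.2f}")

def plot_elbow(ks, wcss):
    """Draw WCSS against K; requires matplotlib (imported only when called)."""
    import matplotlib.pyplot as plt
    plt.plot(ks, wcss, marker="o")
    plt.xlabel("Number of clusters K")
    plt.ylabel("WCSS (inertia)")
    plt.title("Elbow plot")
    plt.show()
```

Calling plot_elbow(ks, wcss) draws the curve. On data like this, WCSS falls sharply up to the true number of groups and flattens afterwards; on real data the bend is often less distinct, so the choice of K remains a judgment call.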

TOOLS & METHODOLOGIES (NON-HIERARCHICAL CLUSTERING)

  • Python

    • sklearn.cluster.KMeans for centroid-based segmentation

  • Workflow

    • Scale data → set initial K → run K-Means → evaluate WCSS

  • Evaluation

    • Elbow Plot to refine K

    • Cluster profiles for meaningful segmentation
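
The workflow and evaluation steps listed above might look like the following in practice. This is a minimal sketch under stated assumptions: the customer table, its column names (annual_income, age), and the choice of K = 2 are invented for the example, and profiling is done with a simple pandas groupby.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer data; column names and values are invented.
df = pd.DataFrame({
    "annual_income": [20_000, 22_000, 21_000, 80_000, 82_000, 81_000],
    "age": [25, 27, 24, 50, 52, 49],
})

# Scale data so that income does not dominate the distance calculation.
X_scaled = StandardScaler().fit_transform(df)

# Set an initial K and run K-Means (K=2 is assumed; refine it with an elbow plot).
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
df["cluster"] = model.labels_

# Evaluate WCSS for this K.
print(f"WCSS (inertia): {model.inertia_:.3f}")

# Profile clusters in the original units: size and mean of each variable.
profile = df.groupby("cluster").agg(
    size=("age", "size"),
    mean_income=("annual_income", "mean"),
    mean_age=("age", "mean"),
)
print(profile)
```

Profiling on the unscaled columns keeps the summary in units a reader can interpret (for example, currency and years), even though the clustering itself ran on standardized values.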