Part 3: Data Reduction Methods
Non-Hierarchical Clustering
When data must be segmented or grouped, K-Means clustering provides a straightforward, distance-based approach. It partitions observations into K clusters by iteratively assigning each observation to its nearest centroid and then recomputing the centroids. This section covers the fundamentals of the K-Means algorithm, along with practical considerations such as standardization and the Elbow Method for choosing K.
K-MEANS FUNDAMENTALS
Learning Objectives
Cluster data into K groups using distance-based assignment
Summarize and profile clusters for actionable insights
Indicative Content
Algorithm Steps
Initialize centroids → assign each point to its nearest centroid → recompute centroids → repeat until convergence
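The four steps above can be sketched in plain NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, the random initialization from data points, and the synthetic two-group data are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-8, seed=0):
    """Minimal K-Means sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)

    def assign(centroids):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    # Step 1: initialize centroids as k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = assign(centroids)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Step 4: stop once the centroids have (almost) stopped moving.
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, assign(centroids)

# Synthetic data (illustrative): two well-separated 2-D groups of 50 points each.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 0.5, size=(50, 2)),
    data_rng.normal(5.0, 0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
print(centroids)
```

On data this cleanly separated, the centroids should settle near the two group centers. On real data the result depends on the initial centroids, which is why library implementations typically run several initializations.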
Scaling
Importance of standardizing variables to prevent magnitude bias
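A small sketch of why scaling matters, assuming a hypothetical customer table with age in years and income in dollars (the values and the helper `nearest_to_first` are invented for illustration). On the raw data, distances are dominated by income because its numbers are larger. After z-score standardization, both variables contribute comparably.

```python
import numpy as np

# Hypothetical customers: columns are [age in years, annual income in dollars].
X = np.array([
    [25, 50_000],
    [60, 50_500],
    [26, 52_000],
    [58, 80_000],
    [30, 78_000],
], dtype=float)

def nearest_to_first(data):
    """Row index of the observation closest to row 0 (Euclidean distance)."""
    d = np.linalg.norm(data[1:] - data[0], axis=1)
    return int(d.argmin()) + 1

# Z-score standardization: subtract each column's mean, divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

raw_neighbor = nearest_to_first(X)          # row 1: similar income, very different age
scaled_neighbor = nearest_to_first(X_std)   # row 2: similar age and similar income
print(raw_neighbor, scaled_neighbor)
```

Because K-Means assigns points by distance, the same shift in nearest neighbors carries over to cluster assignments when variables are left unscaled.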
K-MEANS IMPLEMENTATION & ELBOW METHOD
Learning Objectives
Use the Elbow Method to find an appropriate K
Indicative Content
Elbow Plot
WCSS (within-cluster sum of squares, also called inertia) plotted against K
Interpretation
Locating the “bend” where adding clusters stops reducing WCSS substantially, and choosing K at that point
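The values behind an Elbow Plot can be computed as in the sketch below. It assumes scikit-learn is available and uses synthetic data with three well-separated groups. The data, the range of K, and the `drops` helper list are illustrative choices. The features here already share one scale, so standardization is skipped.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (illustrative): three well-separated 2-D groups of 60 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.6, size=(60, 2)) for c in centers])

# Fit K-Means for K = 1..6 and record WCSS, which scikit-learn exposes as inertia_.
k_values = list(range(1, 7))
wcss = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in k_values
]

# Reduction in WCSS gained by each additional cluster.
drops = [wcss[i] - wcss[i + 1] for i in range(len(wcss) - 1)]

for k, w in zip(k_values, wcss):
    print(f"K={k}: WCSS={w:.1f}")
```

Plotting `wcss` against `k_values` gives the Elbow Plot. With this data the reductions should be large up to K = 3 and small afterwards, so the bend sits at K = 3. Real data often shows a softer bend that calls for judgment.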
TOOLS & METHODOLOGIES (NON-HIERARCHICAL CLUSTERING)
Python
sklearn.cluster.KMeans for centroid-based segmentation
Workflow
Scale data → set initial K → run K-Means → evaluate WCSS
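One way to wire this workflow together is a scikit-learn pipeline, sketched below. The synthetic age and income data, the variable names, and the starting value `initial_k = 2` are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data on very different scales (illustrative only).
rng = np.random.default_rng(1)
age = np.concatenate([rng.normal(25, 2, 50), rng.normal(55, 2, 50)])
income = np.concatenate([rng.normal(40_000, 3_000, 50), rng.normal(90_000, 3_000, 50)])
X = np.column_stack([age, income])

# Scale data -> set initial K -> run K-Means.
initial_k = 2
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=initial_k, n_init=10, random_state=0),
)
labels = model.fit_predict(X)

# Evaluate WCSS: read inertia_ from the fitted KMeans step of the pipeline.
wcss = model.named_steps["kmeans"].inertia_
print(f"K={initial_k}: WCSS={wcss:.2f}")
```

Keeping the scaler and the clusterer in one pipeline means new observations passed to `model.predict` are scaled with the same means and standard deviations used during fitting.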
Evaluation
Elbow Plot to refine K
Cluster profiles for meaningful segmentation
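Cluster profiles can be produced by grouping the original variables by cluster label, as in this sketch. It assumes pandas and scikit-learn are available. The customer table, its column names, and the choice of K = 2 are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer table; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [22, 25, 27, 24, 58, 61, 55, 63],
    "annual_spend": [300, 350, 280, 320, 1500, 1700, 1450, 1600],
})

# Cluster on standardized features, but profile in the original units.
X_scaled = StandardScaler().fit_transform(df[["age", "annual_spend"]])
df["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

# Profile: number of customers and per-variable mean in each cluster.
profile = df.groupby("cluster").agg(
    size=("age", "count"),
    mean_age=("age", "mean"),
    mean_spend=("annual_spend", "mean"),
)
print(profile)
```

Reporting the profile in original units (years, currency) rather than z-scores makes the segments easier to describe. The cluster numbers themselves are arbitrary labels and may differ between runs.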