Chapter 9: Needs-Based Segmentation
This chapter focuses on the next step after prioritizing needs: understanding that different groups of customers have distinctly different priorities. We will explore how to use the MaxDiff data from Chapter 8 to find these segments.
Before we talk about segmentation
Before we look at the methods, remember: there is no single right approach. Anyone trying to sell you some perfect, universal method is likely misleading you. Thousands of segmentation techniques exist, each with different strengths and applications. The key is choosing the method that best answers your specific business questions, works well with your available data, and helps you make decisions that work.
Some segmentation approaches focus on demographics (age, income, location). Others look at behaviors (how people shop, what they buy, when they engage). Still others examine attitudes and values. Needs-based segmentation, which we'll focus on here, groups people by what they're trying to accomplish and what matters most to them in that process.
The right segmentation method depends entirely on your situation. If you're launching a new product, you might segment by unmet needs to find the biggest opportunities. If you're optimizing marketing spend, behavioral segmentation might work better. If you're expanding internationally, geographic and cultural segments could be most relevant.
What makes segmentation valuable isn't the sophistication of the statistical technique or the number of variables you include. It's whether the segments you create help you understand your customers better and make more effective business decisions.
The best segmentation is often the simplest one that still captures the differences that matter for your specific challenges.
This chapter will show you how to use MaxDiff data to create needs-based segments. But remember that this is just one tool in a much larger toolkit. The goal is finding groups of customers whose needs are different enough that they warrant different approaches, products, or messages.
The Original ODI Segmentation Approach

Before we explore how to use MaxDiff data for segmentation, we should understand the traditional approach that established needs-based segmentation in the first place. The Outcome-Driven Innovation (ODI) methodology by Tony Ulwick and Strategyn created a process for segmenting customers based on their unmet needs.[40]
The classic ODI segmentation process follows four main steps:
1. Data Collection: Researchers gather data on both the importance and satisfaction of dozens of need statements using traditional Likert scales. Customers rate statements like "minimize the time it takes to resolve an issue" on importance (typically a 1-5 scale) and then rate how satisfied they are with current solutions on the same scale. This dual measurement allows researchers to identify gaps where something is important but current solutions fall short.
2. Factor Analysis: The first analytical step uses a statistical technique called factor analysis to manage the complexity of dozens (or more) of individual need statements. This identifies which needs tend to correlate with each other and groups them into a smaller number of underlying themes or "factors."
For example, needs like "minimize the time it takes to get support," "quickly get answers to my questions," and "resolve issues on the first contact" might all load together into a single factor that researchers would label "Responsiveness." Similarly, needs related to data security, privacy protection, and system reliability might group into a "Trust and Security" factor.
Instead of trying to segment customers based on their ratings of 50-125+ individual needs, you can work with 5-8 meaningful factors that capture the key themes. It's worth distinguishing between these two steps: factor analysis groups the needs to simplify the list of questions, while cluster analysis (next) groups the people based on those simplified factors.
3. Cluster Analysis: The next step applies cluster analysis algorithms to the factor scores from the previous step. This statistical process groups individual respondents into segments based on how they rated the importance and satisfaction of these underlying factors. The algorithm identifies natural groupings where people within each segment have similar patterns of unmet needs.
The result is typically 3-5 segments, each with distinct profiles. One segment might show high unmet needs around speed and convenience, while another prioritizes quality and reliability over everything else.
4. Segment Profiling: Once the statistical clustering is complete, researchers analyze what makes each segment unique. This involves examining not just the needs that define each group, but also the demographics, behaviors, and other characteristics that help explain the differences between segments.
This traditional approach established the foundation for needs-based segmentation and proved that customers with different priority patterns often require different solutions. However, the method inherits some of the potential biases we discussed in Chapter 7 around Likert scale data, particularly issues with response bias and the challenge of comparing importance ratings across different people.
The MaxDiff-based approach we'll explore next builds on these same principles while addressing some of the measurement challenges inherent in traditional rating scales.
Segmenting with MaxDiff Data
With MaxDiff utility scores in hand, you have several options for identifying customer segments. Each method has its strengths and appropriate use cases. Here are the three main approaches researchers use to segment MaxDiff data.
Method 1: Latent Class Analysis (LCA)
Latent Class Analysis is often considered the most robust method for segmenting choice-based data like MaxDiff. This model-based approach works differently from traditional clustering methods because it doesn't just group people after the fact. As noted by Wedel and Kamakura (2000), it assumes your sample contains hidden subgroups with different preference patterns, and it simultaneously identifies these groups while estimating what each group's preferences look like. [13]
Think of LCA as working backwards from the patterns in your data. Rather than starting with individual utility scores and then clustering them, LCA asks "what if there are actually three distinct types of customers in this data, each with their own preference pattern?" It then tests whether this assumption explains the observed MaxDiff choices better than assuming two groups, or four groups, or treating everyone as homogeneous.
The method provides clear statistical measures to help you decide on the optimal number of segments. Fit statistics like the Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) give you objective ways to compare different segment solutions. Generally, lower values indicate better model fit, helping remove some of the guesswork from deciding whether you have two segments or five.
LCA also handles uncertainty well. Instead of definitively assigning each person to a single segment, it calculates the probability that each respondent belongs to each segment. This probabilistic assignment can be valuable for understanding borderline cases and the stability of your segmentation.
The main drawback is complexity. LCA requires specialized software and can feel like a black box if you're not comfortable with statistical modeling. The results also require more interpretation than simpler clustering methods.
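If you want to experiment with the model-based mindset without specialized MaxDiff software, the sketch below uses the mclust package to fit Gaussian mixture models to the utility scores. Strictly speaking this is latent profile analysis on continuous scores rather than true choice-based LCA, but it illustrates the same logic of comparing fit statistics across candidate segment counts. The object scaled_data (standardized utility scores, introduced later in this chapter) and the package choice are my assumptions, not part of the original analysis.

library(mclust)  # Gaussian mixture models; an approximation of LCA for continuous utility scores

# Fit mixture models with 1 through 6 latent segments and compare fit
# (note: mclust reports BIC so that HIGHER values indicate better fit)
lpa_fit <- Mclust(scaled_data, G = 1:6)

summary(lpa_fit)             # chosen model, number of segments, mixing proportions
plot(lpa_fit, what = "BIC")  # fit across candidate segment counts

# Probabilistic segment membership for each respondent
head(lpa_fit$z)               # probability of belonging to each segment
head(lpa_fit$classification)  # most likely segment assignment

The $z matrix is the probabilistic assignment discussed above: a respondent with probabilities like 0.55 / 0.40 / 0.05 is a borderline case worth flagging before you build strategy around them.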
Method 2: K-Means Clustering
K-Means offers a more intuitive, algorithm-based approach that many researchers find easier to understand and implement. The method works by treating each person's MaxDiff utility scores as coordinates in multi-dimensional space. If you have ten attributes in your MaxDiff study, each respondent becomes a point in ten-dimensional space based on their utility scores for those attributes.
The K-Means algorithm then searches for the best way to place cluster "centers" in this space and assigns each person to their nearest center. The algorithm iteratively moves these centers around until it finds the configuration that minimizes the total distance between all points and their assigned centers.
This approach is simple and fast. You can visualize what's happening even if you can't easily draw ten-dimensional space. K-Means is also widely available in most statistical software packages and even Excel plugins.
However, K-Means requires you to specify the number of clusters upfront. You need to decide whether you want three segments or five segments before running the analysis. This often means running multiple analyses with different numbers of clusters and comparing the results. The method can also be sensitive to outliers, since a few people with unusual preference patterns can pull cluster centers away from more typical respondents.
Additionally, K-Means assumes clusters are roughly spherical and similarly sized, which may not match the actual structure in your data. If one segment represents 60% of your market while another represents 10%, K-Means might not identify this naturally.
Method 3: Two-Step or Hybrid Approaches
Many experienced researchers use hybrid approaches that combine the exploratory power of one method with the stability of another. The most common version starts with Hierarchical Clustering to explore the data structure, then uses those insights to inform a K-Means analysis for the final segmentation.
Hierarchical Clustering works like building a family tree in reverse. It starts by treating each person as their own cluster, then iteratively combines the two most similar clusters until everyone is grouped together. This creates a tree-like structure (called a dendrogram) that shows how clusters combine at each step.
The advantage of starting with Hierarchical Clustering is that it doesn't require you to specify the number of clusters beforehand. You can examine the dendrogram to see where natural breaks occur and identify the most meaningful number of segments. This exploratory step gives you insight into the data structure that pure K-Means clustering might miss.
Once you've identified the optimal number of clusters from the hierarchical analysis, you can use that number as input for K-Means clustering. This gives you the stability and interpretability of K-Means while removing the guesswork about how many segments to create.
Some researchers extend this approach even further, using the hierarchical results to inform starting points (called seeds) for the K-Means algorithm. This can help ensure that K-Means finds the global optimum rather than getting stuck in a local minimum.
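As a rough sketch of this seeding idea (assuming a standardized utility matrix called scaled_data, the same object used in the worked example later in this chapter), you might derive starting centers from a Ward solution and hand them to kmeans():

# Hybrid approach: hierarchical clustering to find starting points, then k-means
d <- dist(scaled_data, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

k <- 4                        # chosen after inspecting the dendrogram
hc_groups <- cutree(hc, k = k)

# Use each hierarchical cluster's mean profile as the k-means starting centers
seed_centers <- aggregate(as.data.frame(scaled_data),
                          by = list(cluster = hc_groups), FUN = mean)[, -1]

hybrid_clusters <- kmeans(scaled_data, centers = as.matrix(seed_centers), iter.max = 300)
table(hybrid_clusters$cluster)

Because the starting centers are informed by the data's hierarchy rather than chosen at random, the final k-means solution tends to be more stable across runs.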
The main drawback of hybrid approaches is complexity. You're running multiple analyses and making decisions at each step, which requires more time and expertise. The process can feel more like art than science, especially when interpreting hierarchical clustering results.
Each of these methods can produce valuable segmentations, and the best choice often depends on your specific situation, data characteristics, and comfort level with different analytical approaches. The key is choosing a method that gives you segments you can understand, act upon, and defend to stakeholders.
Segmenting our MaxDiff Data: Step-by-step approach
Download MaxDiff data from Chapter 8
Before we begin segmenting, let's review a bit about the data. The coffee preference data contains responses from 400 consumers across 15 MaxDiff attributes. Each attribute received a preference score representing its relative importance to each individual customer. These scores show substantial variation across respondents, suggesting that meaningful segments exist but aren't immediately obvious.
The challenge is to group customers with similar preference patterns while ensuring that the resulting segments are both statistically sound and practically useful to a business. As we'll see, we have to balance statistical fit with business reality.
We'll start by examining how different clustering approaches handle the same underlying customer preference data, starting with the most common method in marketing research: k-means clustering.
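One housekeeping note before the clustering code: the snippets that follow assume a standardized matrix called scaled_data built from the Chapter 8 preference scores. The file name and column handling below are my assumptions about how the example download is laid out; adjust them to match your own file.

# Load the Chapter 8 MaxDiff example and standardize the preference scores
# (file name and column selection are assumptions; keep only the numeric
#  preference columns and drop any respondent ID column before scaling)
maxdiff_chapter8_example <- read.csv("maxdiff_chapter8_example.csv")

preference_cols <- sapply(maxdiff_chapter8_example, is.numeric)
scaled_data <- scale(maxdiff_chapter8_example[, preference_cols])

Standardizing matters here: without it, attributes with larger score ranges would dominate the distance calculations that every clustering method below relies on.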
First Attempt: Understanding K-Means Clustering
K-means clustering is the most widely used segmentation method in marketing research, and for good reason. It's computationally efficient, conceptually straightforward, and often produces interpretable results. However, as we will learn, its simplicity can also be a limitation when dealing with complex customer preference data.
How K-Means Works
Before diving into the analysis, it's worth understanding what k-means does. The algorithm follows a simple process:
- Initialize: Place k cluster centers randomly in the data space
- Assign: Assign each customer to the nearest cluster center
- Update: Move each cluster center to the average position of its assigned customers
- Repeat: Continue assigning and updating until cluster centers stop moving
This process guarantees that customers within each cluster are as similar as possible to their cluster center, while being as different as possible from other cluster centers. However, as detailed in clustering reviews like Jain (2010), k-means makes several assumptions that may not hold for all datasets: [14]
- Clusters should be roughly spherical (circular in 2D, ball-shaped in higher dimensions)
- Clusters should be of similar sizes
- All variables should be equally important
- The optimal number of clusters should be specified in advance
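To make the assign/update loop above concrete, here is a stripped-down, illustrative implementation on the scaled data. This is purely for intuition; in practice you would use the built-in kmeans() function shown in the next section, and this toy version ignores edge cases such as a cluster ending up empty.

# Toy version of the k-means loop (for intuition only)
set.seed(123)
k <- 3
X <- as.matrix(scaled_data)

# Initialize: pick k random customers as the starting centers
centers <- X[sample(nrow(X), k), ]

for (iter in 1:100) {
  # Assign: squared distance from every customer to every center, take the nearest
  dists <- sapply(1:k, function(j) {
    rowSums((X - matrix(centers[j, ], nrow(X), ncol(X), byrow = TRUE))^2)
  })
  assignment <- max.col(-dists)  # column index of the smallest distance per customer

  # Update: move each center to the mean profile of its assigned customers
  new_centers <- t(sapply(1:k, function(j) colMeans(X[assignment == j, , drop = FALSE])))

  # Repeat until the centers stop moving
  if (max(abs(new_centers - centers)) < 1e-8) break
  centers <- new_centers
}

table(assignment)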
Finding the Right Number of Clusters
The biggest challenge with k-means is determining how many clusters to create. I tested three statistical methods to help guide this decision:
library(factoextra)

# Elbow method - looks for the "elbow" in within-cluster sum of squares
fviz_nbclust(scaled_data, kmeans, method = "wss", k.max = 10) +
  ggtitle("Elbow Method for Optimal k")
The elbow method plots the within-cluster sum of squares (WSS) for different numbers of clusters. WSS measures how tightly customers cluster around their assigned centers. As you add more clusters, WSS always decreases because customers get closer to their cluster centers. The "elbow" occurs where adding another cluster provides diminishing returns.
In my results, WSS dropped dramatically from k=1 to k=2 (from around 6000 to 3500), which makes sense because forcing all customers into one group creates high variation. The decline continued more gradually afterward, with potential elbows at k=3 (WSS around 2800) and k=6 (WSS around 2200). This gradual decline without a clear elbow suggested that the natural cluster structure might not be obvious.
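If you prefer to see the numbers behind the elbow plot rather than relying on fviz_nbclust(), the same curve comes straight out of kmeans() (a small sketch, assuming scaled_data as above):

# Compute total within-cluster sum of squares for k = 1 through 10
set.seed(123)
wss <- sapply(1:10, function(k) kmeans(scaled_data, centers = k, nstart = 25)$tot.withinss)

plot(1:10, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow Method (manual calculation)")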
# Silhouette method - finds k that maximizes average silhouette width
fviz_nbclust(scaled_data, kmeans, method = "silhouette", k.max = 10) +
  ggtitle("Silhouette Method for Optimal k")
The silhouette method evaluates cluster quality rather than just within-cluster tightness. For each possible number of clusters, it calculates how well customers fit in their assigned clusters compared to alternative clusters. Higher average silhouette scores indicate better separation between clusters.
My results showed a clear peak at k=3 with a silhouette score around 0.50, suggesting that three clusters mark a point of strong, natural separation in the data. Even if some other solution produced a marginally higher average score, the silhouette plot highlighted k=3 as a structurally sound and interpretable option. Scores declined for k=4 and k=5 relative to this peak, suggesting that adding more clusters beyond three created weaker or more artificial divisions.
# Gap statistic - compares clustering structure to random data
library(cluster)  # clusGap() comes from the cluster package

set.seed(123)
gap_stat <- clusGap(scaled_data, FUN = kmeans, nstart = 25, K.max = 10, B = 50)
fviz_gap_stat(gap_stat)
The gap statistic uses a more sophisticated approach. It compares the within-cluster dispersion of your actual data to what you'd expect from randomly distributed data. The optimal k occurs where the gap between your data's structure and random structure is largest.
The gap statistic showed its steepest increase from k=3 to k=4, then plateaued. This pattern suggested that k=4 captured meaningful structure that wouldn't appear in random data. However, the relatively modest gap values (around 0.15) indicated that while structure existed, it wasn't extremely strong.
Interpreting Conflicting Signals
These three methods provided conflicting recommendations, which is common in real segmentation work:
- Elbow method: Ambiguous, potential stopping points at k=3 or k=6
- Silhouette method: Strong recommendation for k=3
- Gap statistic: Suggestion for k=4
Faced with these mixed signals, I made a business judgment to start with k=4. My reasoning was that coffee consumers might naturally fall into four or more behavioral types based on different priorities: quality seekers, convenience seekers, budget-conscious consumers, and experience seekers.
Implementing K-Means with Four Clusters
# Perform k-means clustering
set.seed(123)
k4_clusters <- kmeans(scaled_data, centers = 4, nstart = 25, iter.max = 300)

Let me explain each parameter in this code:
- set.seed(123): K-means starts with random cluster centers, so setting a seed ensures reproducible results. Without this, you might get slightly different clusters each time you run the analysis.
- centers = 4: Specifies that we want four clusters.
- nstart = 25: Runs the k-means algorithm 25 times with different random starting points and keeps the best result. This matters because k-means can get stuck in local optima (good solutions that aren't the best possible solution).
- iter.max = 300: Maximum number of iterations allowed. K-means usually converges quickly, but this ensures the algorithm has enough time to find stable cluster centers.
# Add cluster assignments to original data
maxdiff_segmented <- maxdiff_chapter8_example
maxdiff_segmented$cluster <- as.factor(k4_clusters$cluster)

# Visualize clusters using PCA
fviz_cluster(k4_clusters, data = scaled_data,
             palette = c("#FF6B6B", "#4ECDC4", "#45B7D1", "#FFA07A"),
             geom = "point",
             ellipse.type = "convex",
             ggtheme = theme_minimal()) +
  ggtitle("K-means Clustering (k=4)")
The visualization revealed problems. The fviz_cluster() function uses Principal Component Analysis (PCA) to project the 15-dimensional preference data into two dimensions for plotting. While this projection inevitably loses some information, it provides a useful overview of cluster separation.
The plot showed that while Cluster 1 (red circles) and Cluster 3 (blue squares) appeared well-separated, Clusters 2 (teal triangles) and 4 (orange squares) exhibited substantial overlap in the center-right area. This visual overlap was concerning because well-separated clusters should appear as distinct groups with minimal boundary overlap.
Examining Cluster Profiles
To understand what these clusters represented, let's create a heatmap showing each cluster's preferences:
# Create heatmap of cluster profiles
library(reshape2)  # for melt()

cluster_centers <- as.data.frame(k4_clusters$centers)
cluster_centers$cluster <- paste("Cluster", 1:4)

heatmap_data <- melt(cluster_centers, id.vars = "cluster")

ggplot(heatmap_data, aes(x = variable, y = cluster, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), color = "black", size = 3) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, name = "Standardized\nValue") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Cluster Profiles Heatmap (k=4)",
       x = "Coffee Preferences",
       y = "Clusters")
The heatmap displays standardized cluster centers, where:
- Red colors indicate above-average preference for an attribute
- Blue colors indicate below-average preference
- White colors indicate average preference
- Numbers show the exact standardized values
Reading the heatmap, we can identify each cluster's characteristics:
- Cluster 1: High values for quick service (1.65) and budget constraints (1.68), negative for quality and atmosphere
- Cluster 2: Positive for convenience features like rewards (1.01) and remote ordering (1.28)
- Cluster 3: Strong preference for quality coffee (0.93) and supporting aligned businesses (0.96)
- Cluster 4: Similar pattern to Cluster 2, with positive values for convenience and negative for quality
Clusters 2 and 4 looked strikingly similar. Both showed nearly identical patterns across multiple variables, differing mainly in magnitude rather than direction of preferences. This suggested they might represent variations within a single customer type rather than truly distinct segments.
Quality Assessment: Silhouette Analysis
To quantify the clustering quality, let's also calculate silhouette scores:
# Validation metrics
sil <- silhouette(k4_clusters$cluster, dist(scaled_data))
print(paste("Average silhouette width:", round(mean(sil[, 3]), 3)))

[1] "Average silhouette width: 0.509"
The silhouette analysis provides both overall and cluster-specific quality measures. For each customer, it compares how well they fit in their assigned cluster versus the next best alternative. The calculation involves:
- a(i): Average distance from customer i to other customers in the same cluster
- b(i): Average distance from customer i to customers in the nearest different cluster
- Silhouette score = (b(i) - a(i)) / max(a(i), b(i))
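To make a(i) and b(i) concrete, here's a small sketch that computes the silhouette score for a single respondent by hand and compares it to the value returned by silhouette(). It assumes the sil object and clustering from the code above.

# Hand-compute the silhouette score for respondent 1 (illustration only)
d_mat <- as.matrix(dist(scaled_data))
clust <- k4_clusters$cluster
i <- 1

own <- clust == clust[i]
a_i <- mean(d_mat[i, own & seq_along(clust) != i])     # avg distance to own cluster (excluding self)
b_i <- min(tapply(d_mat[i, !own], clust[!own], mean))  # avg distance to nearest other cluster
s_i <- (b_i - a_i) / max(a_i, b_i)

s_i
sil[i, 3]  # should match the value computed by silhouette()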
Interpreting silhouette scores:
- 0.7 to 1.0: Excellent clustering, customers clearly belong in their assigned cluster
- 0.5 to 0.7: Good clustering, reasonable separation between clusters
- 0.3 to 0.5: Weak clustering, some customers might fit better elsewhere
- Below 0.3: Poor clustering, artificial or forced groupings likely
- Negative: Customer fits better in a different cluster than their assigned one
My results showed:
- Overall average: 0.509 (just above the 0.5 threshold for good clustering)
- Cluster 1: 0.53 (good separation)
- Cluster 2: 0.48 (borderline quality)
- Cluster 3: 0.55 (good separation)
- Cluster 4: 0.44 (weak separation, below recommended threshold)
These scores revealed the core problem with my four-cluster solution. While two clusters achieved good separation, the other two fell into the borderline to weak range. Cluster 4's score of 0.44 particularly concerned me because it suggested forced grouping of customers who might naturally belong elsewhere.
Understanding the Cluster Size Distribution
# Examine cluster sizes
table(k4_clusters$cluster)

  1   2   3   4 
121  59 100 120 

The cluster sizes also revealed potential issues:
- Cluster 1: 121 customers (30%)
- Cluster 2: 59 customers (15%)
- Cluster 3: 100 customers (25%)
- Cluster 4: 120 customers (30%)
Cluster 2 was notably smaller than the others, which can be a warning sign. Small clusters sometimes emerge when an algorithm tries to separate a handful of outliers or when it artificially splits a larger, more natural group. In this case, its small size, combined with the poor silhouette score and visual overlap, strongly suggested that this four-cluster solution was not stable.
Trying Three K-Means Clusters
The problems with my four cluster solution forced me to reconsider the silhouette method's strong recommendation for three clusters. Sometimes stepping back from initial business assumptions leads to better results, and this proved to be one of those moments.
Implementing the Three-Cluster Solution
# Perform k-means clustering with k=3
set.seed(123)
k3_clusters <- kmeans(scaled_data, centers = 3, nstart = 25, iter.max = 300)

# Add cluster assignments to original data
maxdiff_segmented$cluster <- as.factor(k3_clusters$cluster)

# Visualize clusters using PCA
fviz_cluster(k3_clusters, data = scaled_data,
             palette = c("#FF6B6B", "#4ECDC4", "#45B7D1"),
             geom = "point",
             ellipse.type = "convex",
             ggtheme = theme_minimal()) +
  ggtitle("K-means Clustering (k=3)")
The improvement was clear. The PCA visualization showed much cleaner separation between segments, with minimal overlap between the convex hull boundaries. The problematic overlapping clusters from my four-segment solution had been consolidated into more coherent groupings.
Where the four-cluster solution showed Clusters 2 and 4 bleeding into each other in the center-right area of the plot, the three-cluster solution created clear boundaries. Each cluster occupied its own distinct region of the preference space, with Cluster 1 (red circles) positioned in the lower portion, Cluster 2 (teal triangles) in the upper left, and Cluster 3 (blue squares) on the right side.
The ellipses around each cluster also appeared more natural. In clustering visualizations, these ellipses represent the approximate boundaries within which most cluster members fall. Tighter, more circular ellipses indicate cohesive segments, while elongated or overlapping ellipses suggest internal heterogeneity or unclear boundaries. The three-cluster ellipses were more compact and showed minimal overlap compared to the four-cluster attempt.
Examining the New Cluster Profiles
The real test of improvement came when I examined what these three clusters represented in terms of coffee preferences:
# Create updated heatmap for 3 clusters
cluster_centers <- as.data.frame(k3_clusters$centers)
cluster_centers$cluster <- paste("Cluster", 1:3)

heatmap_data <- melt(cluster_centers, id.vars = "cluster")

ggplot(heatmap_data, aes(x = variable, y = cluster, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), color = "black", size = 3) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, name = "Standardized\nValue") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Cluster Profiles Heatmap (k=3)",
       x = "Coffee Preferences",
       y = "Clusters")
The three-cluster heatmap revealed much more distinct and interpretable preference patterns compared to the overlap between Clusters 2 and 4 in my previous attempt. Each cluster now displayed clear, differentiated preferences that made business sense:
Cluster 1: Quick & Budget-Conscious Consumers This segment showed the highest positive values for speed-related attributes, with "obtaining coffee quickly during time constraints" scoring 1.65 and "purchasing within financial limits" at 1.68. These customers prioritized efficiency and value. Correspondingly, they showed negative values for quality-focused attributes like "consuming high-quality coffee for satisfaction" (-0.98) and comfort features like "accessing comfortable spaces away from home" (-0.45). This profile painted a clear picture of customers who view coffee primarily as a functional necessity rather than an experience.
Cluster 2: Convenience-Focused Customers This group displayed strong positive values for modern convenience features. They scored highly on "accumulating rewards through loyalty programs" (1.01), "accessing convenient locations" (1.35), and "placing remote orders ahead of arrival" (1.28). However, like Cluster 1, they showed negative values for traditional quality attributes such as "consuming high-quality coffee" (-0.72) and "supporting businesses aligned with personal values" (-0.55). This suggested customers who wanted coffee to fit seamlessly into their busy lifestyles through digital integration and location convenience, but weren't willing to pay premium prices for quality.
Cluster 3: Quality & Experience Seekers The largest cluster showed an entirely different preference pattern. They demonstrated high positive values for "consuming high-quality coffee for satisfaction" (0.93), "supporting businesses aligned with personal values" (0.96), and "experiencing hygienic conditions" (0.86). Conversely, they showed negative values for quick service needs (-0.71) and budget constraints (-0.62). This profile suggested customers who viewed coffee as an experience worth investing in, both financially and temporally.
Understanding Cluster Sizes and Market Implications
# Examine cluster sizes and proportions
table(k3_clusters$cluster)
prop.table(table(k3_clusters$cluster))

  1   2   3 
100 120 180 

   1    2    3 
0.25 0.30 0.45 

The cluster distribution revealed interesting market dynamics:
- Cluster 1 (Quick & Budget-Conscious): 100 customers (25%)
- Cluster 2 (Convenience-Focused): 120 customers (30%)
- Cluster 3 (Quality & Experience Seekers): 180 customers (45%)
The fact that Cluster 3 contained nearly half of all respondents suggested that quality and experience orientation might be the dominant preference pattern among coffee consumers in this sample. However, this large segment size also raised a flag that we'd need to investigate further. Sometimes when one cluster becomes too large, it indicates that heterogeneous customers are being forced together because the algorithm can't find enough distinct patterns to separate them properly.
Quality Assessment: Improved But Not Perfect
The quality metrics showed marked improvement over the four-cluster solution:
# Validation metrics for k=3
sil_3 <- silhouette(k3_clusters$cluster, dist(scaled_data))
print(paste("Average silhouette width:", round(mean(sil_3[, 3]), 3)))

# Detailed cluster breakdown
summary(sil_3)

> summary(sil_3)
Silhouette of 400 units in 3 clusters from silhouette.default(x = k3_clusters$cluster, dist = dist(scaled_data)) :
 Cluster sizes and average silhouette widths:
      100       120       180 
0.5537768 0.5331681 0.4379362 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2924  0.4397  0.5028  0.4955  0.5565  0.6708 

The results were encouraging:
- Overall average silhouette width: 0.495 (approaching the 0.5 threshold for good clustering)
- Cluster 1: 0.55 (good separation)
- Cluster 2: 0.53 (good separation)
- Cluster 3: 0.44 (weak separation, but the only problematic cluster)
The improvement was clear when comparing to the four-cluster attempt. Instead of having two problematic clusters with scores of 0.48 or lower, I now had only one cluster with weak separation. More significantly, the two smaller clusters achieved good separation scores of 0.53 and above, indicating that they represented genuine, well-defined customer segments.
However, the weak score for Cluster 3 was concerning, especially given that it contained 180 customers (45% of the sample). A silhouette score of 0.44 suggested that many customers in this cluster were nearly as similar to members of other clusters as they were to their cluster mates. This could indicate that Cluster 3 contained multiple sub-groups that the three-cluster solution couldn't differentiate.
Visualizing Individual Customer Fit
To better understand the quality issues, I examined the silhouette plot:
# Create silhouette plot
fviz_silhouette(sil_3) +
  ggtitle("Silhouette Analysis for K-means (k=3)")

  cluster size ave.sil.width
1       1  100          0.55
2       2  120          0.53
3       3  180          0.44
The silhouette plot displays individual customer scores sorted by cluster and score magnitude. Each bar represents one customer, with longer bars indicating better fit within their assigned cluster. Negative bars indicate customers who might fit better in a different cluster.
The plot revealed several insights:
Clusters 1 and 2: Most customers showed positive silhouette scores between 0.4 and 0.7, with few negative values. This confirmed that these segments contained customers with genuinely similar preferences who were well-separated from other groups.
Cluster 3: While most customers had positive scores, there was more variation and a longer tail of customers with scores near zero. Some customers even showed negative scores, suggesting they might be misassigned. The heterogeneity within this large cluster supported my suspicion that it might contain multiple sub-segments.
Exploring Cluster 3's Internal Structure
Given the size and quality concerns with Cluster 3, I investigated its internal composition:
# Examine Cluster 3 customers in detail
cluster3_data <- scaled_data[k3_clusters$cluster == 3, ]

# Look at variation within Cluster 3
apply(cluster3_data, 2, sd)

# Compare to variation in other clusters
cluster1_data <- scaled_data[k3_clusters$cluster == 1, ]
cluster2_data <- scaled_data[k3_clusters$cluster == 2, ]

mean(apply(cluster1_data, 2, sd))  # Average variation in Cluster 1
mean(apply(cluster2_data, 2, sd))  # Average variation in Cluster 2
mean(apply(cluster3_data, 2, sd))  # Average variation in Cluster 3

> apply(cluster3_data, 2, sd)
     Consume high-quality coffee for satisfaction     Obtain coffee quickly during time constraints 
                                        0.3097366                                          0.1021907 
      Secure comfortable space for extended stays Access internet connectivity while away from home 
                                        0.7651981                                          1.1211154 
      Accumulate rewards through repeat purchases            Place orders remotely to avoid waiting 
                                        0.6835872                                          0.9962735 
              Acquire fresh food alongside coffee        Access coffee within daily travel patterns 
                                        0.5099785                                          0.2303266 
            Purchase coffee during off-peak hours   Support businesses aligned with personal values 
                                        0.8979413                                          0.7016863 
    Receive guidance for optimal coffee selection            Purchase coffee within financial limits 
                                        0.2073337                                          0.2622862 
          Find quiet space for focused activities  Choose from options matching current preferences 
                                        0.8389023                                          0.8410385 
        Experience service in hygienic conditions 
                                        0.2732193 

> mean(apply(cluster1_data, 2, sd))  # Average variation in Cluster 1
[1] 0.407853
> mean(apply(cluster2_data, 2, sd))  # Average variation in Cluster 2
[1] 0.4076522
> mean(apply(cluster3_data, 2, sd))  # Average variation in Cluster 3
[1] 0.582721

The analysis confirmed my suspicions. Cluster 3 showed higher average variation across preference attributes compared to Clusters 1 and 2. While the other clusters had relatively tight distributions around their center points, Cluster 3 contained customers with more diverse preference patterns that happened to be grouped together because they didn't fit clearly into the other two segments.
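One quick, exploratory way to probe whether a large heterogeneous cluster hides sub-groups is to cluster it again on its own. This is a sketch rather than a formal test, reusing the cluster3_data object created above:

# Exploratory: look for sub-structure inside Cluster 3 only
set.seed(123)
sub_k2 <- kmeans(cluster3_data, centers = 2, nstart = 25)

table(sub_k2$cluster)     # how the 180 customers would split
round(sub_k2$centers, 2)  # preference profiles of the candidate sub-groups

# Silhouette check for the within-cluster split
sil_sub <- silhouette(sub_k2$cluster, dist(cluster3_data))
mean(sil_sub[, 3])

If the two candidate sub-profiles differ in meaningful ways, that's a hint worth following up, which is exactly what the hierarchical analysis in the next section does more systematically.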
The Promise and Limitations of Three Clusters
The three-cluster solution represented an improvement over my four-cluster attempt in several ways:
Clearer Differentiation: Each cluster now had a distinct and interpretable preference profile. The confusion between similar clusters was eliminated, and each segment suggested different marketing approaches.
Better Statistical Quality: Two of the three clusters achieved good separation, and the overall silhouette score improved. The visual separation was much cleaner, with minimal overlap between cluster boundaries.
Actionable Insights: The three segments suggested clear marketing strategies. Quick & Budget-Conscious customers could be targeted with efficiency and value messaging. Convenience-Focused customers would respond to digital features and location strategies. Quality & Experience Seekers would appreciate premium positioning and values-based marketing.
However, key limitations remained:
Large Heterogeneous Segment: Cluster 3's size and internal variation suggested it might benefit from further subdivision. Nearly half of all customers fell into this category, which could limit the precision of targeted marketing efforts.
Moderate Quality Scores: While improved, the silhouette scores still fell short of the 0.7 threshold that indicates strong, well-separated clusters. This suggested that natural customer groupings in coffee preferences might be more subtle than extreme.
Potential for Refinement: The internal heterogeneity in Cluster 3 raised questions about whether a different approach might reveal additional meaningful segments within this large group.
This shows that segmentation quality involves more than just overall statistical measures. Even when average metrics improve, examining individual cluster performance can reveal opportunities for further refinement. The three-cluster solution was clearly better than my four-cluster attempt, but it wasn't necessarily the final answer to understanding customer segments in this coffee preference data.
Exploring a Different Approach: Hierarchical Clustering
After mixed results with k-means, I decided to test a different approach. K-means assumes spherical clusters of similar sizes, but perhaps my coffee preference data had different underlying structure that required a more flexible method.
Hierarchical clustering works differently from k-means in several key ways. Instead of starting with a predetermined number of clusters, it builds a tree-like structure called a dendrogram that shows how customers group together at different levels of similarity. Think of it like a family tree, but instead of showing genealogical relationships, it reveals preference relationships among customers.
Understanding Linkage Methods
The first decision in hierarchical clustering involves choosing how to measure the distance between groups of customers. Different linkage methods can produce markedly different results:
library(cluster)
library(dendextend)

# Calculate distance matrix
dist_matrix <- dist(scaled_data, method = "euclidean")

The distance matrix calculates how different each customer is from every other customer based on their coffee preferences. With 400 customers, this creates a 400x400 matrix containing 79,800 unique pairwise distances. Euclidean distance treats each preference like a coordinate in 15-dimensional space and calculates the straight-line distance between customers.
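For intuition, any single entry of that matrix is just the straight-line distance between two customers' preference profiles, which you can verify by hand:

# Euclidean distance between customers 1 and 2, computed by hand
sqrt(sum((scaled_data[1, ] - scaled_data[2, ])^2))

# Same value as the corresponding entry of the distance matrix
as.matrix(dist_matrix)[1, 2]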
# Test different linkage methods
hc_complete <- hclust(dist_matrix, method = "complete")
hc_single <- hclust(dist_matrix, method = "single")
hc_average <- hclust(dist_matrix, method = "average")
hc_ward <- hclust(dist_matrix, method = "ward.D2")

# Visualize dendrograms
par(mfrow = c(2, 2))
plot(hc_complete, main = "Complete Linkage", cex = 0.6, hang = -1)
plot(hc_single, main = "Single Linkage", cex = 0.6, hang = -1)
plot(hc_average, main = "Average Linkage", cex = 0.6, hang = -1)
plot(hc_ward, main = "Ward Linkage", cex = 0.6, hang = -1)
par(mfrow = c(1, 1))
Each linkage method defines distance between clusters differently:
- Single linkage uses the shortest distance between any two points in different clusters. This often creates long, chain-like clusters that may not reflect natural groupings.
- Complete linkage uses the maximum distance between points in different clusters. This tends to create compact, spherical clusters of similar sizes.
- Average linkage uses the average distance between all pairs of points in different clusters. This provides a middle ground between single and complete linkage.
- Ward linkage minimizes the within-cluster sum of squares when merging clusters. This method tends to create clusters of roughly equal size and is often preferred for customer segmentation because it produces the most interpretable results.
The dendrograms revealed clear differences between methods. Single linkage produced a few large clusters with many small outlier groups. Complete and average linkage showed more balanced structures, but Ward linkage produced the clearest branching pattern with distinct separation points that suggested natural customer groupings.
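Beyond eyeballing the dendrograms, one common (if rough) way to compare linkage methods numerically is the agglomerative coefficient from cluster::agnes(), where values closer to 1 indicate stronger clustering structure. A small sketch, with the caveat that the coefficient tends to grow with sample size and shouldn't be over-interpreted:

# Compare agglomerative coefficients across linkage methods
library(cluster)  # already loaded above; included here for completeness

linkages <- c("complete", "single", "average", "ward")
sapply(linkages, function(m) agnes(scaled_data, method = m)$ac)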
Reading the Dendrogram
A dendrogram shows the hierarchical relationship between all customers, with height on the y-axis representing the distance at which clusters merge. Lower heights indicate more similar customers, while higher heights show where dissimilar groups come together.
The Ward dendrogram revealed several interesting patterns:
- Clear primary split: The tree showed a major division around height 15, suggesting two fundamentally different customer types exist in the data.
- Secondary branches: Each major branch showed further subdivision around heights 8-10, indicating more nuanced differences within the broader customer types.
- Stable clusters: Some groups of customers clustered together at low heights (2-4), suggesting these individuals have nearly identical preferences.
To convert the hierarchical structure into discrete segments, I needed to "cut" the tree at a specific height. Cutting lower creates more clusters with smaller differences, while cutting higher produces fewer clusters with larger differences.
# Create both k=3 and k=4 solutions by cutting at different heights
hc3_clusters <- cutree(hc_ward, k = 3)
hc4_clusters <- cutree(hc_ward, k = 4)

# Add to dataset for analysis
maxdiff_hc3 <- maxdiff_chapter8_example
maxdiff_hc3$cluster <- as.factor(hc3_clusters)

maxdiff_hc4 <- maxdiff_chapter8_example
maxdiff_hc4$cluster <- as.factor(hc4_clusters)

The cutree() function automatically finds the height that produces exactly k clusters. By specifying both k=3 and k=4, I could compare how the same underlying structure looked when divided into different numbers of segments.
Comparing Hierarchical Results
The hierarchical approach yielded interesting differences from k-means, and the results helped explain why my earlier attempts had struggled:
Three Cluster Hierarchical Results:
- Average silhouette width: 0.495
- Cluster sizes: 120, 100, 180 customers
- Same basic segment structure as k-means k=3
Four Cluster Hierarchical Results:
- Average silhouette width: 0.508
- Cluster sizes: 120, 100, 120, 60 customers
- Different cluster composition than k-means k=4
The hierarchical four-cluster solution showed one key advantage over my earlier k-means attempt. It maintained the same strong clusters (the two well-separated segments identified earlier) in both the three- and four-cluster solutions. When moving from three to four clusters, it cleanly split the large mainstream segment rather than creating artificial divisions among the well-defined groups.
This stability suggested that some customer segments were more "real" than others. The hierarchical method was detecting natural groupings that persisted regardless of how I divided the remaining customers.
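You can see this stability directly by cross-tabulating the three- and four-cluster assignments; if the method is splitting the mainstream group rather than reshuffling everyone, one row should divide across two columns while the other rows stay intact:

# How do customers move when going from 3 to 4 hierarchical clusters?
table(three_cluster = hc3_clusters, four_cluster = hc4_clusters)

Given the cluster sizes reported above (the 180-customer segment splitting into groups of 120 and 60), the table should show exactly that pattern.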
Understanding Silhouette Scores
# Hierarchical 3-cluster validation metrics
sil_hc3 <- silhouette(hc3_clusters, dist(scaled_data))
print(paste("Average silhouette width (3-cluster):", round(mean(sil_hc3[, 3]), 3)))

# Hierarchical 4-cluster validation metrics
sil_hc4 <- silhouette(hc4_clusters, dist(scaled_data))
print(paste("Average silhouette width (4-cluster):", round(mean(sil_hc4[, 3]), 3)))

# Detailed breakdown by cluster
aggregate(sil_hc4[, 3], by = list(cluster = sil_hc4[, 1]), FUN = mean)

[1] "Average silhouette width (3-cluster): 0.495"
[1] "Average silhouette width (4-cluster): 0.508"
  cluster         x
1       1 0.5331681
2       2 0.5527411
3       3 0.4837314
4       4 0.4305644

The results showed interesting patterns:
Three Cluster Solution (Average: 0.495):
- Cluster 1: 0.53 (good separation)
- Cluster 2: 0.55 (good separation)
- Cluster 3: 0.44 (weak separation)
Four Cluster Solution (Average: 0.508):
- Cluster 1: 0.53 (good separation)
- Cluster 2: 0.55 (good separation)
- Cluster 3: 0.48 (borderline separation)
- Cluster 4: 0.43 (weak separation)
The four cluster solution showed more balanced performance overall. While it still contained one weak cluster, the problematic large segment from the three cluster solution had been divided into two more manageable groups. This suggested that the hierarchical method was successfully identifying natural subdivisions within the heterogeneous mainstream segment rather than creating artificial splits among the well-defined clusters.
Visualizing Hierarchical Results
The PCA visualizations revealed notable differences from my k-means attempts:
# Visualize 3-cluster solution
fviz_cluster(list(data = scaled_data, cluster = hc3_clusters),
             geom = "point",
             palette = c("#FF6B6B", "#4ECDC4", "#45B7D1")) +
  ggtitle("Hierarchical Clustering (k=3)")

# Visualize 4-cluster solution
fviz_cluster(list(data = scaled_data, cluster = hc4_clusters),
             geom = "point",
             palette = c("#FF6B6B", "#4ECDC4", "#45B7D1", "#FFA07A")) +
  ggtitle("Hierarchical Clustering (k=4)")

The fviz_cluster() function creates a two-dimensional representation of the multidimensional customer data using Principal Component Analysis (PCA). Think of this as creating a map where similar customers appear close together and different customers appear far apart. The ellipses around each cluster show the approximate boundaries, with larger ellipses indicating more internal variation within the segment.
The hierarchical four cluster visualization showed cleaner separation than my k-means attempt. While some overlap remained between adjacent clusters, the boundaries appeared more natural rather than artificially imposed. Critically, the clearly distinct cluster remained well-separated from all others, and the division of the mainstream segment looked meaningful rather than random.
Why Hierarchical Clustering Worked Better
Several factors explained why the hierarchical approach produced superior results for this coffee preference data. Unlike k-means, which assumes all clusters should be roughly spherical and similar in size, hierarchical clustering can detect clusters of different shapes and densities. My coffee preference data apparently contained some tight, well-defined groups and some looser, more dispersed groups that didn't fit k-means' geometric assumptions.
The dendrogram revealed which customer groupings were most stable across different solutions. The same core segments appeared in both three and four cluster solutions, suggesting these represented genuine customer types rather than statistical artifacts created by the algorithm. This stability provided confidence that the hierarchical method was identifying real patterns in customer preferences rather than imposing artificial structure.
Ward linkage specifically optimizes for creating clusters with minimal internal variation, which aligns well with segmentation goals. This method tends to produce segments where customers within each group are as similar as possible, making the resulting clusters more coherent and actionable for marketing purposes. The dendrogram also provided clear visual guidance about where natural divisions exist in the data, rather than forcing me to guess the optimal number of clusters based solely on statistical measures.
The convergence between hierarchical and k-means results strengthened my confidence in the underlying patterns. Both methods produced nearly identical silhouette scores of 0.508 and 0.509 respectively for four clusters, suggesting that genuine customer structure existed rather than method-specific artifacts. When different algorithms detect similar patterns, it indicates that the clustering reflects real customer groupings rather than algorithmic bias.
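A simple way to quantify this convergence is to cross-tabulate the two four-cluster assignments and, if you have the mclust package installed, compute an adjusted Rand index (1 means identical partitions, 0 means agreement no better than chance). This is a sketch of how one might check it, not something the original analysis reported:

# Agreement between the k-means and hierarchical four-cluster solutions
table(kmeans_4 = k4_clusters$cluster, hierarchical_4 = hc4_clusters)

# Adjusted Rand index (requires the mclust package)
library(mclust)
adjustedRandIndex(k4_clusters$cluster, hc4_clusters)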
However, I wouldn't say I definitively preferred hierarchical clustering over k-means based on this analysis alone. The statistical quality was virtually identical between the two approaches, with both four-cluster solutions producing silhouette scores around 0.508. The key advantage of the hierarchical method was its ability to reveal the data's natural structure, which led to a more interpretable and stable result. It cleanly subdivided the large mainstream segment along meaningful lines rather than creating artificial breaks. This highlights a crucial lesson in segmentation supported by Dolnicar et al. (2018): success is not merely a hunt for the highest statistical score, but a search for the most stable, understandable, and ultimately actionable customer groupings. [15] In this case, the hierarchical approach delivered a solution that was not just statistically sound, but also made more business sense.
The Missing Piece: Demographic Validation
Note that in real-world segmentation projects, researchers would conduct further validation beyond the statistical and preference-based analysis we've focused on here. This step involves examining how each clustering solution performs across demographic variables, behavioral data, and other customer characteristics available in the dataset.
For example, researchers would typically analyze whether the "Quality & Experience Seekers" segment shows higher income levels, different age distributions, or distinct geographic patterns compared to the "Quick & Budget-Conscious" group. They might examine whether convenience-focused customers are more likely to be working professionals with busy schedules, or whether quality seekers tend to live in urban areas with more coffee shop options.
This demographic profiling often reveals which statistical solution translates into the most meaningful and actionable business segments. Sometimes a clustering solution that appears statistically sound based on preference data alone falls apart when you discover that the resulting segments show no meaningful differences in age, income, lifestyle, or other relevant characteristics. Conversely, a solution with moderate statistical scores might prove highly valuable if it creates segments with distinct demographic profiles that align with existing customer data or market research insights.
We simplified this analysis by focusing primarily on preference-based clustering quality to illustrate the core methodological differences between approaches. However, the interplay between preference-based clusters and real-world customer characteristics frequently drives the final segmentation decision in ways that pure statistical measures cannot capture.
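If your dataset did include demographics, the profiling step might look something like the sketch below. The columns age_group and income are hypothetical; the Chapter 8 example file does not necessarily contain them, so treat this as a pattern rather than runnable output for this dataset.

# Hypothetical demographic profiling of the final segments
# (age_group and income are assumed columns, used only for illustration)
maxdiff_segmented$final_segment <- as.factor(hc4_clusters)

# Categorical profile: segment by age group, with a chi-square test of association
age_tab <- table(maxdiff_segmented$final_segment, maxdiff_segmented$age_group)
prop.table(age_tab, margin = 1)
chisq.test(age_tab)

# Numeric profile: average income by segment
aggregate(income ~ final_segment, data = maxdiff_segmented, FUN = mean)

Segments that show no demographic or behavioral differences are hard to reach with targeted marketing, no matter how clean their silhouette scores look.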
This exploration taught me that segmentation success often depends on finding the method that best matches the underlying structure of your specific data, rather than defaulting to the most commonly used approach. In this case, the hierarchical structure of coffee preferences made hierarchical clustering slightly more suitable, but the decision was based on interpretability and business actionability rather than purely statistical superiority.
From Segmentation to Action
At this point, you've identified distinct customer segments. Groups like the "Quick & Budget-Conscious," "Convenience-Focused," and "Quality & Experience Seekers" from our coffee shop example, each with different priority patterns. But a list of segments, no matter how statistically sound, doesn't tell you what to do next.
This is where many practitioners turn to strategic frameworks.
The next chapter examines Tony Ulwick's Jobs-to-Be-Done Growth Strategy Matrix. This framework attempts to translate segmentation insights into strategic choices (differentiated, disruptive, dominant, discrete, or sustaining) based on where customer segments fall on the importance-satisfaction landscape. The segments you've identified become inputs to this thinking. Are the "Quality & Experience Seekers" underserved? The matrix would suggest differentiated positioning. Are the "Quick & Budget-Conscious" overserved? That might point toward disruption.
That said, Chapter 10 isn't a sales pitch for the framework.
The matrix has real limitations, and understanding them matters as much as understanding the framework itself. Chapter 10 will walk through the five strategies, show how they connect to the opportunity landscape, and then explain why treating this as a complete strategy would be a mistake. The goal is to give you a useful tool while being honest about what it can and can't do.
Chapter 9 Summary
- No Perfect Method for Segmentation: There is no single best way to segment customers. The most effective method depends on your business goals and data. The ultimate goal of segmentation is to find actionable groups of customers by balancing the need for practical insights with the preservation of the underlying data's integrity and statistical validity.
- Traditional vs. Modern Approaches: The chapter contrasts the classic Outcome-Driven Innovation (ODI) method, which uses factor and cluster analysis on importance/satisfaction Likert scales, with newer methods that leverage MaxDiff data to avoid scale-related biases.
- Key Segmentation Methods for MaxDiff Data: Three primary statistical methods for segmenting MaxDiff utility scores are introduced:
  - Latent Class Analysis (LCA): A sophisticated, model-based approach that simultaneously finds hidden segments and estimates their preferences.
  - K-Means Clustering: A popular and intuitive algorithm that groups customers by minimizing the distance to cluster "centers." Its main challenge is that you must specify the number of clusters in advance.
  - Hierarchical Clustering: A method that builds a tree-like structure (dendrogram) to show how customers naturally group together, providing visual guidance on the optimal number of segments.
- Determining the Number of Clusters is Challenging: The practical analysis shows that statistical tools meant to find the optimal number of clusters (like the elbow method, silhouette method, and gap statistic) often provide conflicting recommendations, requiring researcher judgment.
- Iterative Process Leads to Better Results: The initial K-Means attempt with four clusters was statistically weak, with overlapping segments and poor quality scores. Revisiting the analysis and choosing three clusters resulted in a much cleaner, more interpretable, and statistically sounder solution, identifying three core segments: Quick & Budget-Conscious, Convenience-Focused, and Quality & Experience Seekers.
- Even Good Solutions Have Limitations: The three-cluster solution, while an improvement, created one large and internally diverse (heterogeneous) segment of "Quality & Experience Seekers," suggesting it might contain multiple distinct sub-groups.
- Hierarchical Clustering Can Offer Deeper Insight: By using hierarchical clustering, the analysis revealed the underlying structure of the data more clearly. It produced a superior four-cluster solution by cleanly splitting the large, heterogeneous segment identified in the K-Means analysis, resulting in a more balanced and stable segmentation.
- Actionability Over Statistical Purity: The key takeaway is that segmentation success is measured by the clarity, stability, and business utility of the resulting segments, not just by achieving the highest possible statistical score. The best method is the one that best matches the data's natural structure and produces groups that can be targeted effectively.
Recommended Readings
If you are interested in learning more about segmenting MaxDiff data, and about other approaches to segmenting data in general, I strongly recommend reading Chris Chapman's blog post titled "Individual Scores in Choice Models, Part 1: Data & Averages" or checking out his book R for Marketing Research and Analytics (Use R!). [36, 41] For those dealing specifically with MaxDiff data, Chrzan and Orme (2019) explore the nuances of clustering utility scores versus using latent class methods directly. [16]
In their book, Chapman and Feit go into more detail on different segmentation methods for general marketing research and analytics use cases. Those methods are beyond the scope of my expertise and this online book, but the book is a great starting point.
Other resources include:
- Market Segmentation Analysis: Understanding It, Doing It, and Making It Useful Book by Bettina Grün, Friedrich Leisch, and Sara Dolnicar
- Market Segmentation: How to Do It and How to Profit from It
- Market Segmentation
References
[13] Wedel, Michel, and Wagner A. Kamakura. Market Segmentation: Conceptual and Methodological Foundations. Kluwer Academic Publishers, 2000.
[14] Jain, Anil K. "Data Clustering: 50 Years Beyond K-Means." Pattern Recognition Letters, vol. 31, no. 8, 2010, pp. 651-666.
[15] Dolnicar, Sara, Bettina Grün, and Friedrich Leisch. Market Segmentation Analysis: Understanding It, Doing It, and Making It Useful. Springer, 2018.
[16] Chrzan, Keith, and Bryan Orme. Applied MaxDiff: A Practitioner's Guide to Best-Worst Scaling. Sawtooth Software, 2019.
[36] Chapman, Chris. "Individual Scores in Choice Models, Part 1: Data & Averages." QuantUXBlog.com, 23 Oct. 2024. Available at: https://quantuxblog.com/individual-scores-in-choice-models-part-1-data-averages
[40] Ulwick, Tony. “Market Segmentation Through a Jobs-to-be-Done Lens.” Jobs-to-be-Done.com, 18 Nov. 2017. Available at: https://jobs-to-be-done.com/market-segmentation-through-a-jobs-to-be-done-lens-5ef9242de65
[41] Chapman, Chris, and Elea McDonnell Feit. R for Marketing Research and Analytics (Use R!) 2nd ed., Springer, 2019. ISBN-13: 978-3030143152. Available on Amazon: https://www.amazon.com/Marketing-Research-Analytics-Use/dp/3030143155