Introduction
Conjoint analysis is a well-established empirical technique used to quantify consumer choices. Specifically, with conjoint analysis one can quantitatively establish the relative importance of a product’s attributes and, within each attribute, the ranking of the attribute levels. It’s a workhorse in marketing. In litigation, it is used primarily in patent infringement and false advertising (mislabeling) cases.
There are many good conjoint analysis R tutorials out there - especially with a marketing focus - but none illustrate its application in support of litigation. That is the point of this tutorial: to explain how to conduct a full-fledged conjoint analysis for purposes of calculating willingness to pay. It’s more of a how-the-sausage-is-made kind of tutorial.
There are many good scholarly papers and industry white papers that explain conjoint analysis - again, few in the context of litigation. The following are three good papers discussing conjoint analysis in the context of litigation; in fact, the example used here is taken from Bedi and Reibstein.
All three papers do an excellent job of explaining conjoint analysis and the controlling case law. The takeaway is that there is no consensus around a “conventional” approach to conjoint analysis, neither in the established methodology nor in its treatment by the courts.
What do I mean when I say the focus of this tutorial is on how the sausage is made? All three papers assume the elements of a conjoint analysis in order to illustrate their argument and their analysis; they don’t show how the elements are built or where they come from. This tutorial does, and it does so with the R package conjoint.
This is probably because those analysts rely on commercial packages - with idiosyncratic coding or canned routines, to the extent there is any coding at all. There are, of course, very good commercial offerings: Qualtrics, Conjointly, SurveyMonkey and, of course, the big daddy: Sawtooth Software.
A Primer
Conjoint analysis typically relies on targeted survey results: a survey that asks respondents to choose one of several products from a slate of “hypothetically similar” products, each with a variety of prices and features. The product attributes among which respondents are choosing must be the ones that drive the purchase decision.
This choosing from a slate is appealing because it presumably resembles most every consumer’s shopping habits. The act of choosing reveals individual and group preferences - that is, utility.
In a patent infringement case, customers must be able to choose between a version of the product with the desired attribute and one without it, revealing whether they would pay more for the former. For example: a device texting feature with autocorrect instead of one without autocorrect; a camera with a special telescoping lens rather than one without the lens. The attribute is the patented feature in dispute.
Survey results are used to gauge the utility (or part-worth) respondents place on a given attribute or feature - relative to the product’s other features.
This utility constitutes a measure of consumer damages - that is to say, the difference between what the consumer actually pays for a product and what the consumer would have paid for the same product without the attribute.
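To make that damages arithmetic concrete, here is a stylized sketch of one common way to convert part-worths into dollars: use the part-worths of the price levels to establish a dollar value per utility point, then apply that rate to the part-worth gained by the disputed attribute. The numbers below are hypothetical, for illustration only; they are not results from this tutorial.
# Hypothetical part-worths, for illustration only (not estimated here):
# the $400 price level has part-worth 0.5, the $700 level has -0.7,
# and the disputed attribute adds 0.25 utility points.
dollars_per_util = (700 - 400) / (0.5 - (-0.7))   # ~ $250 per utility point
premium = 0.25 * dollars_per_util                 # implied willingness to pay, ~ $62.50
premium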
In consumer mislabeling or misrepresentation cases, conjoint analysis is used to estimate the value of the allegedly misrepresented feature, such as rice that is labeled “Organic” or “Natural” versus rice without the label. Functionally, the underlying exercise is identical in both settings.
There are three elements to the analysis with the R package conjoint: the product profiles (the combinations of attribute levels presented to respondents), the names of those attribute levels, and the consumer preferences - the choices respondents make when shown the profiles. For example, one attribute of a laptop computer could be its memory, with two associated levels: 100GB or 250GB.
Here we create the profiles: the combinations of attributes and levels that will be presented to customers for their choosing. Note the dimensions of the full factorial: 72 x 4, which follows from the number of proposed attribute levels: 4 brands x 3 prices x 2 memory sizes x 3 screen sizes.
library(conjoint)    # experimental design and model estimation
library(dplyr)       # data wrangling
library(wakefield)   # simulated survey responses

slate = expand.grid(Brand = c("Apple", "Dell", "Gateway", "Compaq"),
                    Prices = c("400", "600", "700"),
                    Memory = c("100GB", "250GB"),
                    ScreenSize = c("15", "19", "20"))
head(slate)
| | Brand &lt;fct&gt; | Prices &lt;fct&gt; | Memory &lt;fct&gt; | ScreenSize &lt;fct&gt; |
|---|---|---|---|---|
| 1 | Apple | 400 | 100GB | 15 |
| 2 | Dell | 400 | 100GB | 15 |
| 3 | Gateway | 400 | 100GB | 15 |
| 4 | Compaq | 400 | 100GB | 15 |
| 5 | Apple | 600 | 100GB | 15 |
| 6 | Dell | 600 | 100GB | 15 |
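As a quick sanity check on those dimensions (the counts follow directly from the attribute definitions above):
dim(slate)   # 72 rows, 4 columns: 4 brands x 3 prices x 2 memory sizes x 3 screen sizes = 72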
In principle, this full slate of choices is presented to prospective customers, who are then asked to rank the profiles in preferred order. In practice, this is folly: any customer would soon grow weary of the exercise and probably walk away.
The following R function reduces the 72 possible combinations to 14, a small fraction of the original number.
The reduced set of 14 combinations cannot be chosen haphazardly. If, say, every profile that is black also has a weight of 100, then it is not possible to tell whether a prospective customer chose a profile because of its color or because of its weight.
In other words, we want the attributes to vary independently. This is common sense, but it matters statistically as well, because the engine underlying the solution - the multiple linear regression model - is most efficient when there is little correlation among the attributes. At its most efficient there is no correlation at all, and the attributes are “orthogonal” to each other. Such a combination of product profiles is called an orthogonal design; we check how close we get to one below, once the design has been encoded numerically.
# reduce the full factorial to a smaller, balanced set of profiles
factdesign = caFactorialDesign(data = slate,
                               type = "fractional")
head(factdesign)
| | Brand &lt;fct&gt; | Prices &lt;fct&gt; | Memory &lt;fct&gt; | ScreenSize &lt;fct&gt; |
|---|---|---|---|---|
| 1 | Apple | 400 | 100GB | 15 |
| 8 | Compaq | 600 | 100GB | 15 |
| 15 | Gateway | 400 | 250GB | 15 |
| 22 | Dell | 700 | 250GB | 15 |
| 31 | Gateway | 600 | 100GB | 19 |
| 36 | Compaq | 700 | 100GB | 19 |
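To confirm the size of the reduced design (the text above assumes 14 profiles; the exact count can vary with the design search):
nrow(factdesign)   # expected to be 14 here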
The design is then converted to a numeric (encoded) format, which is what the multiple linear regression expects.
profiles = caEncodedDesign(factdesign)
head(profiles)
| | Brand &lt;int&gt; | Prices &lt;int&gt; | Memory &lt;int&gt; | ScreenSize &lt;int&gt; |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 8 | 4 | 2 | 1 | 1 |
| 15 | 3 | 1 | 2 | 1 |
| 22 | 2 | 3 | 2 | 1 |
| 31 | 3 | 2 | 1 | 2 |
| 36 | 4 | 3 | 1 | 2 |
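A quick way to check how close the reduced design is to orthogonality, as discussed above, is to look at the correlations among the encoded attributes; values near zero mean the attributes are (nearly) uncorrelated. The exact values depend on the design returned by caFactorialDesign().
round(cor(profiles), 3)   # near-zero off-diagonal values indicate a (nearly) orthogonal design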
The particular variations or categories of each product attribute are called levels. Here we collect their names, which Conjoint() will use to label the results.
# collect the level names for each attribute;
# note that unique() returns them in their order of appearance in the design
Pricelevels = factdesign %>% dplyr::select(Prices) %>% unique()
Brandlevels = factdesign %>% dplyr::select(Brand) %>% unique()
Memorylevels = factdesign %>% dplyr::select(Memory) %>% unique()
ScreenSizelevels = factdesign %>% dplyr::select(ScreenSize) %>% unique()

levelnames = c(
  pull(Brandlevels),
  pull(Pricelevels),
  pull(Memorylevels),
  pull(ScreenSizelevels))
cbind.data.frame(levelnames)
| levelnames &lt;fct&gt; |
|---|
| Apple |
| Compaq |
| Gateway |
| Dell |
| 400 |
| 600 |
| 700 |
| 100GB |
| 250GB |
| 15 |
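One caution about the order of these names: as far as I can tell, Conjoint() simply pairs the names in z with the encoded level numbers, attribute by attribute, so the names must appear in the same order as the underlying factor levels. unique() returns them in order of appearance in the reduced design, which is not guaranteed to match that order. Under that assumption, a safer construction is to take the names directly from the factor levels of the original slate:
# Assumes Conjoint() matches z to the encoded values in factor-level order;
# levels() preserves the order used by caEncodedDesign()
levelnames = c(levels(slate$Brand),
               levels(slate$Prices),
               levels(slate$Memory),
               levels(slate$ScreenSize))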
The third key component of the analysis is the “results” of the survey: consumers scoring or ranking the slate of choices presented to them.
Here I simulate these results using the package wakefield. Each simulated respondent’s scores are randomly drawn from an ordered set running from highest to lowest, so that the results of the multiple regression - the associated coefficients for each attribute - come out on a similar scale.
lvls = c("14","13","12", "11", "10", "9","8", "7", "6",
"5","4","3","2","1")
n = 14
s1 = r_sample_ordered(x = lvls, n)
s2 = r_sample_ordered(x = lvls, n)
s3 = r_sample_ordered(x = lvls, n)
s4 = r_sample_ordered(x = lvls, n)
s5 = r_sample_ordered(x = lvls, n)
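Because the scores are random draws, the numbers that follow will change on every run. For reproducible simulated responses, fix the random seed before sampling; the seed value here is arbitrary and is not the one behind the output shown below.
set.seed(2023)   # any fixed value; place this before the r_sample_ordered() calls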
Here the simulated responses are assembled into a preferences data frame - one row per profile, one column per respondent - and converted to integer scores for the regression.
preferences = cbind.data.frame(s1, s2, s3, s4, s5)
preferences = preferences %>% dplyr::select(s1:s5) %>%
  mutate_if(is.factor, as.integer)
score = rowMeans(as.data.frame(preferences))   # average score per profile (not used below)
head( preferences)
| | s1 &lt;int&gt; | s2 &lt;int&gt; | s3 &lt;int&gt; | s4 &lt;int&gt; | s5 &lt;int&gt; |
|---|---|---|---|---|---|
| 1 | 12 | 10 | 1 | 5 | 3 |
| 2 | 1 | 12 | 12 | 8 | 2 |
| 3 | 12 | 4 | 11 | 4 | 13 |
| 4 | 5 | 6 | 1 | 3 | 14 |
| 5 | 13 | 3 | 14 | 10 | 6 |
| 6 | 9 | 6 | 4 | 8 | 4 |
The function Conjoint() fits the linear model and returns all the results of the regression: the estimated coefficients, the part-worth utilities for the whole sample, and the average importance of each attribute.
conjoint::Conjoint(y = preferences, x = profiles, z = levelnames)
##
## Call:
## lm(formula = frml)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6,610 -2,733 -0,331 2,861 7,439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6,7824 0,4322 15,69 <0,0000000000000002 ***
## factor(x$Brand)1 1,8056 0,7749 2,33 0,023 *
## factor(x$Brand)2 -0,7500 0,7293 -1,03 0,308
## factor(x$Brand)3 -1,7278 0,7749 -2,23 0,029 *
## factor(x$Prices)1 0,5074 0,6535 0,78 0,440
## factor(x$Prices)2 0,2441 0,6063 0,40 0,689
## factor(x$Memory)1 -0,0361 0,4344 -0,08 0,934
## factor(x$ScreenSize)1 -0,9593 0,6535 -1,47 0,147
## factor(x$ScreenSize)2 0,7374 0,6063 1,22 0,229
## ---
## Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
##
## Residual standard error: 3,51 on 61 degrees of freedom
## Multiple R-squared: 0,183, Adjusted R-squared: 0,0761
## F-statistic: 1,71 on 8 and 61 DF, p-value: 0,114
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
## levnms utls
## 1 intercept 6,7824
## 2 Apple 1,8056
## 3 Compaq -0,75
## 4 Gateway -1,7278
## 5 Dell 0,6722
## 6 400 0,5074
## 7 600 0,2441
## 8 700 -0,7515
## 9 100GB -0,0361
## 10 250GB 0,0361
## 11 15 -0,9593
## 12 19 0,7374
## 13 20 0,2219
## [1] "Average importance of factors (attributes):"
## [1] 44,24 25,40 6,92 23,44
## [1] Sum of average importance: 100
## [1] "Chart of average factors importance"