library(fixest)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
productData<-read_csv("C:/Users/anubh/Dropbox/UG Emp IO/My Slides/R_markdownfiles/PSET Datasets/productData_loglinear.csv")
## Rows: 4000 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): cdid, product_in_market, firm_id, region
## dbl (7): product_id, x1, x2, cost, xi, price, share
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We construct three main sets of instruments following the BLP framework.
For each product \(j\) in market \(m\), define the leave-one-out average of rival characteristics:
\[ \bar{x}_{-j,m}^{(k)} = \frac{1}{J_m - 1} \sum_{\substack{r \in J_m \\ r \neq j}} x_{r,m}^{(k)} \]
where \(x_{r,m}^{(k)}\) denotes product characteristic \(k\) (e.g., sugar, mushy) and \(J_m\) is the number of products in market \(m\).
The BLP instrument for product \(j\) and characteristic \(k\) is:
\[ Z_{j,m}^{(k,BLP)} = |\bar{x}_{-j,m}^{(k)} - x_{j,m}^{(k)}| \]
Intuition: similarity to rival characteristics affect equilibrium prices via market competition but are uncorrelated with product-level unobserved quality.
# 2a) BLP instruments: leave-one-out sums of rivals' x1 and x2 within market (cdid)
productData <- productData %>%
group_by(cdid) %>%
mutate(
blp_x1_sum = abs((sum(x1) - x1)/(length(product_id)-1) - x1),
blp_x2_sum = abs((sum(x2) - x2)/(length(product_id)-1) - x2)
) %>%
ungroup()
For each product \(j\) sold in multiple markets \(m = 1,\dots,M_j\), define:
\[ \bar{p}_{-m,j} = \frac{1}{M_j - 1} \sum_{\substack{r \in M_j \\ r \neq m}} p_{r,j} \]
The Hausman–Nevo instrument is the average price of the same product in other markets:
\[ Z_{j,m}^{(HN)} = \bar{p}_{-m,j} \]
This captures firm-level pricing variation that is correlated with \(p_{j,m}\) but exogenous to market-specific demand shocks.
# 2b) Hausman-Nevo: mean price of same product in other markets (leave-market-out)
productData <- productData %>%
group_by(product_id) %>%
mutate(price_mean_other_markets = (sum(price) - price)/(length(cdid)-1) ) %>%
ungroup()
Let \(D_{r}\) denote a demographic
variable (e.g., mean income) in region \(r\) that contains markets \(m \in r\).
Then the leave-one-out regional demographic mean
is:
\[ \bar{D}_{-m,r} = \frac{1}{R_r - 1} \sum_{\substack{s \in r \\ s \neq m}} D_{s} \]
The Waldfogel instrument for market \(m\) is:
\[ Z_{m}^{(W)} = \bar{D}_{-m,r} \]
Demographic variation across regions affects market-level demand composition and hence equilibrium prices, but is plausibly exogenous to individual product quality.
# 2c) For walfdogel/Wolfdogel replace price with demographics.
productData <- productData %>%
group_by(product_id, region) %>%
mutate(othr_mkt_inc = (sum(xi)-xi)/(length(cdid)-1)) %>%
ungroup()
Summary Table
| Instrument Type | Formula | Source of Variation |
|---|---|---|
| BLP (within-market) | \(Z_{j,m}^{(BLP)} = |\bar{x}_{-j,m} - x_{j,m}|\) | Distance from Rivals’ characteristics |
| Hausman–Nevo | \(Z_{j,m}^{(HN)} = \bar{p}_{-m,j}\) | Same product prices across markets |
| Waldfogel | \(Z_{m}^{(W)} = \bar{D}_{-m,r}\) | Demographic variation across regions |
To linearize the discrete-choice demand system, we transform observed market shares into the Berry (1994) mean utility term.
For each product \(j\) in market \(m\):
\[ \delta_{j,m} = \ln(s_{j,m}) - \ln(s_{0,m}) \]
where: - \(s_{j,m}\) = observed market share of product \(j\) in market \(m\), - \(s_{0,m}\) = share of the outside good, computed as: \[ s_{0,m} = 1 - \sum_{j \in J_m} s_{j,m} \] with \(J_m\) being the set of products sold in market \(m\).
Interpretation
We begin with the simplest case, estimating the
logit demand equation for a single market \(m = 1\).
This corresponds to the case where asymptotics rely on the number of
products \(J_m \to \infty\),
not on the number of markets.
# ---------------------------
# Compute Berry delta (log(share) - log(outside_share))
# ---------------------------
productData <- productData %>%
group_by(cdid) %>%
mutate(sum_sh = sum(share),
outside_share = pmax(1 - sum_sh, 1e-9),
delta = log(share) - log(outside_share)) %>%
ungroup()
For consumer \(i\) choosing product \(j\) in market \(m\):
\[ U_{ij} = \beta_{\text{price}} \, p_{j} + \beta_{1} \, x_{1,j} + \beta_{2} \, x_{2,j} + \xi_{j} + \varepsilon_{ij} \]
where: - \(p_{j}\) = price of product \(j\), - \(x_{1,j}, x_{2,j}\) = observed product characteristics, - \(\xi_{j}\) = unobserved (to the econometrician) quality term, - \(\varepsilon_{ij}\) = i.i.d. Type I Extreme Value error.
m1 <- feols(delta ~ price + x1 + x2, data = productData[productData$cdid=="M001",])
For consumer \(i\) choosing product \(j\) in market \(m\):
\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{jm} + \varepsilon_{ijm} \]
where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{jm}\) = unobserved (to the econometrician) quality term, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.
m2 <- feols(delta ~ price + x1 + x2, data = productData)
For consumer \(i\) choosing product \(j\) in market \(m\):
\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{j} + \xi_{m} + \xi_{jm} + \varepsilon_{ijm} \]
where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{j}\) = product level mean of quality/unobserved characteristic - \(\xi_{m}\) = market level mean of quality/unobserved characteristic - \(\xi_{jm}\) = unobserved (to the econometrician) characteristic, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.
m3 <- feols(delta ~ price + x1 + x2 | product_id + cdid, data = productData)
For consumer \(i\) choosing product \(j\) in market \(m\):
\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{f} + \xi_{m} + \xi_{jm} + \varepsilon_{ijm} \]
where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{f}\) = firm level mean of quality/unobserved characteristic. Brand value - \(\xi_{m}\) = market level mean of quality/unobserved characteristic - \(\xi_{jm}\) = unobserved (to the econometrician) characteristic, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.
m4 <- feols(delta ~ price + x1 + x2 | firm_id + cdid, data = productData)
For consumer \(i\) choosing product \(j\) in market \(m\):
\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{j} + \xi_{m} + \xi_{jm} + \varepsilon_{ijm} \]
where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{j}\) = Product level mean of quality/unobserved characteristic. - \(\xi_{m}\) = market level mean of quality/unobserved characteristic - \(\xi_{jm}\) = unobserved (to the econometrician) characteristic, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.
m5 <- feols(delta ~ x1 + x2 | product_id + cdid| price ~ price_mean_other_markets + othr_mkt_inc + blp_x1_sum + blp_x2_sum, data = productData)
m6 <- feols(delta ~ x1 + x2 | firm_id + cdid| price ~ price_mean_other_markets + othr_mkt_inc + blp_x1_sum + blp_x2_sum, data = productData)
etable(m1, m2, m3, m4, m5, m6,
dict = c("price" = "Price", "x1" = "X1", "x2" = "X2", "income" = "MarketIncome"))
## m1 m2 m3
## Dependent Var.: delta delta delta
##
## Constant -16.23*** (2.088) -15.77*** (0.3546)
## Price 0.0625 (0.3863) 0.0132 (0.0642) 0.0198 (0.0625)
## X1 0.3762 (0.4836) 0.1362 (0.0841) 0.1338 (0.0860)
## X2 0.4932 (0.5162) 0.0655 (0.0801) 0.0783 (0.0842)
## Fixed-Effects: ----------------- ------------------ ---------------
## product_id No No Yes
## cdid No No Yes
## firm_id No No No
## _______________ _________________ __________________ _______________
## S.E. type IID IID by: product_id
## Observations 100 4,000 4,000
## R2 0.02177 0.00106 0.03310
## Within R2 -- -- 0.00116
##
## m4 m5 m6
## Dependent Var.: delta delta delta
##
## Constant
## Price 0.0262 (0.0759) 0.0198 (0.0625) -0.0437 (0.1757)
## X1 0.1140 (0.0962) 0.1338 (0.0860) 0.1492 (0.1191)
## X2 0.0441 (0.0732) 0.0783 (0.0842) 0.0595 (0.0753)
## Fixed-Effects: --------------- --------------- ----------------
## product_id No Yes No
## cdid Yes Yes Yes
## firm_id Yes No Yes
## _______________ _______________ _______________ ________________
## S.E. type by: firm_id by: product_id by: firm_id
## Observations 4,000 4,000 4,000
## R2 0.01909 0.03310 0.01879
## Within R2 0.00082 0.00116 0.00052
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The utility of consumer \(i\) if they purchase product \(j\) and reside in market \(m\) is:
\[ U_{ijm} = -\alpha_{im} \cdot p_{jm} + sugar_{jm} \cdot \beta_{im} + \xi_{jm} + \varepsilon_{ijm} \]
\[ \begin{bmatrix} \alpha_{im}\\ \beta_{im} \end{bmatrix} = \begin{bmatrix} \alpha_{0}\\ \beta_{0} \end{bmatrix}+\nu_{im} \quad \quad \text{s.t. }\quad \nu_{im} ~ N(0,\Sigma) \]
| Formula Block | Component | Specification | Interpretation |
|---|---|---|---|
| 1st block | Linear mean utility | share ~ price + productdummy |
Deterministic part of mean utility: observed characteristics (price) and product dummies. |
| 2nd block | Exogenous variables + Fixed effects | 0 + productdummy |
Includes product fixed effects; |
| 3rd block | Random coefficients | price + sugar |
Characteristics with consumer heterogeneity (taste variation). |
| 4th block | Instruments | 0 + IV1 + IV2 + ... + IV20 |
Excluded instruments used to identify endogenous regressors (e.g., price). |
Full Syntax Pattern: \[ \texttt{share ~ 1st block | 2nd block | 3rd block | 4th block} \]
library(BLPestimatoR)
library(tidyverse)
nevos_model <- as.formula("share ~ price + productdummy |
0 + productdummy |
price + sugar|
0+ IV1 + IV2 + IV3 + IV4 + IV5 + IV6 + IV7 + IV8 + IV9 + IV10 +
IV11 + IV12 + IV13 + IV14 + IV15 + IV16 + IV17 + IV18 + IV19 + IV20")
\[ \delta_{jm}^{l+1} = \delta_{jm}^{l} + \log(S_{jm}) - \log(s_{jm})\]
We need to provide \(\delta_{jm}^{1}\), the initial guess to find \(\delta_{jm}\)
productData_cereal$startingGuessesDelta <- c(log(w_guesses_cereal)) # include orig. draws in the product data
The BLP_data() function prepares all data structures
required for estimating the BLP or Nevo-style demand model.
It organizes product-level data, market identifiers, integration draws,
and estimation controls into a format used by
estimateBLP().
Function Call Structure
\[ \texttt{BLP data(model, market identifier, product identifier, productData, ...)} \]
This call initializes the dataset and simulation parameters used in the random-coefficients estimation.
| Argument | Role | Description |
|---|---|---|
model |
Structure | Utility specification, random coeffs, instruments |
market_identifier |
Market grouping | Defines each market \(m\) |
product_identifier |
Product grouping | Links same product across markets |
productData |
Input data | Product-level observations |
integration_method |
Simulation | Controls random-draw integration |
blp_inner_tol, blp_inner_maxit |
Numerical settings | Berry inversion convergence |
integration_accuracy,
integration_seed |
Monte Carlo control | Number and reproducibility of consumer draws |
cereal_data2 <- BLP_data(
model = nevos_model,
market_identifier = "cdid",
product_identifier = "product_id",
productData = productData_cereal,
integration_method = "MLHS",
blp_inner_tol = 1e-6, blp_inner_maxit = 5000,
integration_accuracy = 20, integration_seed = 213
)
## Mean utility (variable name: `delta`) is initialized with 0 because of missing or invalid par_delta argument.
blp_est1<-estimateBLP(blp_data = cereal_data2, printLevel = 1)
## blp_data were prepared with the following arguments:
## BLP_data(model = nevos_model, market_identifier = "cdid", product_identifier = "product_id",
## productData = productData_cereal, integration_accuracy = 20,
## integration_method = "MLHS", integration_seed = 213, blp_inner_tol = 1e-06,
## blp_inner_maxit = 5000)
## Starting a BLP demand estimation with 2256 observations in 94 markets...
## [integration::method integration::amountDraws 20 ]
## [blp::inner_tol 1e-06 blp::inner_maxit 5000 ]
## [solver::method BFGS solver::maxit 10000 solver::reltol 1e-06 ]
## gmm objective: 189.9435
## gmm objective: 77087.92
## gmm objective: 4111.535
## gmm objective: 374.4437
## gmm objective: 198.9622
## gmm objective: 190.1912
## gmm objective: 189.9244
## gmm objective: 7077.797
## gmm objective: 488.2191
## gmm objective: 200.5563
## gmm objective: 189.7251
## gmm objective: 189.0409
## gmm objective: 188.8877
## gmm objective: 188.8877
## gmm objective: 814.3173
## gmm objective: 220.147
## gmm objective: 190.1319
## gmm objective: 188.9366
## gmm objective: 188.8897
## gmm objective: 188.8879
## gmm objective: 188.8877
## gmm objective: 188.8877
## gmm objective: 188.8877
## gmm objective: 188.8876
## ------------------------------------------
## Solver message: Successful convergence
## ------------------------------------------
## Final GMM evaluation at optimal parameters:
## gmm objective: 188.888
## theta (RC): -0.07 1.07 0
## theta (demogr.):
## inner iterations: 42
## gradient: -4e-04 0 -0.0266
## Using the heteroskedastic asymptotic variance-covariance matrix...
summary(blp_est1)
##
## Data information:
##
## 94 market(s) with 2256 products
## 25 linear coefficient(s) (24 exogenous coefficients)
## 3 non-linear parameters related to random coefficients
## 0 demographic variable(s)
##
## Estimation results:
##
## Linear Coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.7687979 0.2614769 -6.764642 1.336388e-11
## price -30.1654692 2.2084233 -13.659279 1.777356e-42
## productdummyproduct10 1.8589449 0.1596595 11.643185 2.485575e-31
## productdummyproduct11 1.7629234 0.1393468 12.651334 1.099918e-36
## productdummyproduct12 -0.2872553 0.1599158 -1.796291 7.244819e-02
## productdummyproduct13 2.2497007 0.1497871 15.019324 5.486406e-51
## productdummyproduct14 1.7795564 0.1497489 11.883602 1.440238e-32
## productdummyproduct15 2.1594184 0.1446350 14.930118 2.098931e-50
## productdummyproduct16 3.6872370 0.1699590 21.694866 2.293909e-104
## productdummyproduct17 0.7886869 0.1573699 5.011677 5.395781e-07
## productdummyproduct18 0.8752717 0.1525111 5.739069 9.519846e-09
## productdummyproduct19 1.7679247 0.1446626 12.221021 2.400493e-34
## productdummyproduct2 2.3666780 0.1524484 15.524457 2.370036e-54
## productdummyproduct20 2.8078466 0.1501105 18.705203 4.490181e-78
## productdummyproduct21 2.6795586 0.1547226 17.318472 3.413003e-67
## productdummyproduct22 0.4724688 0.1446494 3.266303 1.089617e-03
## productdummyproduct23 1.3757516 0.1429032 9.627158 6.140419e-22
## productdummyproduct24 2.1723598 0.1596999 13.602763 3.855876e-42
## productdummyproduct3 1.8020360 0.1492630 12.072893 1.468775e-33
## productdummyproduct4 0.7641588 0.1380172 5.536695 3.082334e-08
##
## ...
##
## 5 estimates are omitted. They are available in the LinCoefficients generated by summary.
##
## Random Coefficients
## Estimate Std. Error t value Pr(>|t|)
## unobs_sd*(Intercept) -0.0709765270 1.10554060 -0.064200742 0.9488104
## unobs_sd*price 1.0705189927 12.83936039 0.083377907 0.9335511
## unobs_sd*sugar -0.0007739994 0.09003039 -0.008597091 0.9931406
##
## Wald Test
## 0.0306 on 3 DF, p-value: 0.998586809404042
##
## Computational Details:
## Solver converged with 24 iterations to a minimum at 188.888 .
## Local minima check: NA
## stopping criterion outer loop: 1e-06
## stopping criterion inner loop: 1e-06
## Market shares are integrated with MLHS and 20 draws.
## Method for standard errors: heteroskedastic
In the cereal data set, the level of sugar and also mushy does not vary within a product. Therefore, product dummy absorbs all the mean slopes associated with sugar and mushy. If one wishes to look at mean slopes then the following formula should be used:
nevos_model2 <- as.formula("share ~ price + sugar |
0 + sugar |
price + sugar|
0+ IV1 + IV2 + IV3 + IV4 + IV5 + IV6 + IV7 + IV8 + IV9 + IV10 +
IV11 + IV12 + IV13 + IV14 + IV15 + IV16 + IV17 + IV18 + IV19 + IV20")
cereal_data3 <- BLP_data(
model = nevos_model2,
market_identifier = "cdid",
product_identifier = "product_id",
productData = productData_cereal,
integration_method = "MLHS",
blp_inner_tol = 1e-6, blp_inner_maxit = 5000,
integration_accuracy = 20, integration_seed = 213
)
## Mean utility (variable name: `delta`) is initialized with 0 because of missing or invalid par_delta argument.
blp_est1<-estimateBLP(blp_data = cereal_data3, printLevel = 1)
## blp_data were prepared with the following arguments:
## BLP_data(model = nevos_model2, market_identifier = "cdid", product_identifier = "product_id",
## productData = productData_cereal, integration_accuracy = 20,
## integration_method = "MLHS", integration_seed = 213, blp_inner_tol = 1e-06,
## blp_inner_maxit = 5000)
## Starting a BLP demand estimation with 2256 observations in 94 markets...
## [integration::method integration::amountDraws 20 ]
## [blp::inner_tol 1e-06 blp::inner_maxit 5000 ]
## [solver::method BFGS solver::maxit 10000 solver::reltol 1e-06 ]
## gmm objective: 265.3415
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 67324.66
## gmm objective: 2711.537
## gmm objective: 310.3391
## gmm objective: 262.4372
## gmm objective: 369.6278
## gmm objective: 263.1742
## gmm objective: 261.749
## gmm objective: 7861.726
## gmm objective: 510.7391
## gmm objective: 265.0589
## gmm objective: 261.252
## gmm objective: 259.1477
## gmm objective: 259.1314
## gmm objective: 259.1298
## gmm objective: 259.13
## gmm objective: 259.1302
## gmm objective: 259.1302
## gmm objective: 259.1299
## gmm objective: 259.1299
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 5397.46
## gmm objective: 472.4613
## gmm objective: 265.9145
## gmm objective: 259.3404
## gmm objective: 259.1366
## gmm objective: 259.13
## ------------------------------------------
## Solver message: Successful convergence
## ------------------------------------------
## Final GMM evaluation at optimal parameters:
## gmm objective: 259.1303
## theta (RC): -0.11 2.06 -0.05
## theta (demogr.):
## inner iterations: 44
## gradient: -0.0023 -0.0083 -0.9639
## Using the heteroskedastic asymptotic variance-covariance matrix...
summary(blp_est1)
##
## Data information:
##
## 94 market(s) with 2256 products
## 3 linear coefficient(s) (1 exogenous coefficients)
## 3 non-linear parameters related to random coefficients
## 0 demographic variable(s)
##
## Estimation results:
##
## Linear Coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.61506294 0.40981595 -6.381067 1.758586e-10
## price -12.52062456 3.78100238 -3.311456 9.281170e-04
## sugar 0.03557057 0.01105716 3.216972 1.295512e-03
##
## Random Coefficients
## Estimate Std. Error t value Pr(>|t|)
## unobs_sd*(Intercept) -0.10635106 1.38068597 -0.07702769 0.93860151
## unobs_sd*price 2.06074915 12.35635362 0.16677648 0.86754593
## unobs_sd*sugar -0.05140232 0.02298225 -2.23660940 0.02531188
##
## Wald Test
## 5.6437 on 3 DF, p-value: 0.13029014550078
##
## Computational Details:
## Solver converged with 42 iterations to a minimum at 259.1303 .
## Local minima check: NA
## stopping criterion outer loop: 1e-06
## stopping criterion inner loop: 1e-06
## Market shares are integrated with MLHS and 20 draws.
## Method for standard errors: heteroskedastic
\[ \begin{bmatrix} \alpha_{im}\\ \beta_{im} \end{bmatrix} = \begin{bmatrix} \alpha_{0}\\ \beta_{0} \end{bmatrix} + \Pi_{(K+1)}^{Income} Income_{rim} + \Pi_{(K+1)}^{Incomesq} Incomesq_{rim} +\nu_{im} \]
The package provides demographicData_cereal which is a
list of matrices. Each matrix has the same dimensions, where rows map to
markets and columns map to a pseudo consumer. Matrices contain draws of
consumer demographics from the demographic distribution observed at the
market level. Below I show what these individual matrices look like for
income, income square, and age.
demographicData_cereal$income[1:4, 1:5]
## cdid draw_1 draw_2 draw_3 draw_4
## 1 market_1 0.49512349 0.3787622 0.1050146 -1.48548093
## 2 market_2 0.05389113 -1.5030833 0.6217917 0.26922901
## 3 market_3 0.62459788 -0.2242698 -0.2846169 0.67845943
## 4 market_4 0.94419851 0.4056359 0.5859923 0.01813051
demographicData_cereal$incomesq[1:4, 1:5]
## cdid draw_1 draw_2 draw_3 draw_4
## 1 market_1 8.3313042 6.121865 1.030803 -25.5836047
## 2 market_2 0.0966346 -25.849845 10.767233 4.0668173
## 3 market_3 10.8215619 -4.894545 -5.956954 11.8673879
## 4 market_4 17.1121550 6.629730 10.075529 -0.5537043
demographicData_cereal$age[1:4, 1:5]
## cdid draw_1 draw_2 draw_3 draw_4
## 1 market_1 -0.2301090 -2.5326941 -0.006965458 -0.8279460
## 2 market_2 0.1414545 -0.1813188 -1.279931134 -0.4532526
## 3 market_3 -0.1813188 -0.5867840 0.208145922 -0.4532526
## 4 market_4 0.7444506 0.1414545 0.645359728 0.8346017
For our model specification, we only need income and income square demographics. So we will create a new list of matrices that contains only these two.
demData_cereal<-list()
demData_cereal[["income"]]<-demographicData_cereal[[1]]
demData_cereal[["incomesq"]]<-demographicData_cereal[[2]]
The package provides originalDraws_cereal which is also
a list of matrices. Each matrix has the same dimensions, where rows map
to markets and columns map to a pseudo consumer. Matrices contain draws
of consumer’s unobserved taste shocks to their taste parameter/slope
coefficients associated with various product attributes. Below I show
what these individual matrices look like for price and
sugar.
originalDraws_cereal$price[1:4, 1:5]
## cdid draw_1 draw_2 draw_3 draw_4
## 1 market_1 -1.5008378 0.1331817 -0.1382405 1.2571357
## 2 market_2 0.0276477 -0.8414414 -0.9056861 -2.0306179
## 3 market_3 -0.4768085 -0.9348689 1.4300456 2.6188581
## 4 market_4 0.5396019 0.1698178 0.7128376 -0.1216309
originalDraws_cereal$sugar[1:4, 1:5]
## cdid draw_1 draw_2 draw_3 draw_4
## 1 market_1 -1.1510786 -0.5007498 0.7974412 -0.6830540
## 2 market_2 1.0451218 1.0170277 2.3012550 0.1490338
## 3 market_3 -1.0187809 0.3107185 -0.9531532 0.8057080
## 4 market_4 -0.5760582 -0.5072059 -0.7496653 1.1163544
For our taste shock coefficients we will only need unobserved taste shocks for the intercept term, price’s slope coefficient, and sugar’s slope coefficient. So we create a new list for that as well.
origD<- list()
origD[["(Intercept)"]]<-originalDraws_cereal[["constant"]]
origD[["price"]]<-originalDraws_cereal[["price"]]
origD[["sugar"]]<-originalDraws_cereal[["sugar"]]
BLP_data object with DemographicsUnder custom demographics, we cannot simply use the default options
for many things anymore. Therefore we use a more customized
BLP_data object. The arguments and the types of values that
go within it are state below:
| Argument | Specification | Description / Role |
|---|---|---|
model |
nevos_model |
Formula defining mean utility terms, fixed effects, random coefficients, and instruments. |
market_identifier |
"cdid" |
Variable identifying distinct markets (e.g., city–year or region–time). |
par_delta |
"startingGuessesDelta" |
Initial guess for mean utility \(\delta_{jm}^{(0)}\); starting values for Berry inversion. |
product_identifier |
"product_id" |
Unique identifier for products across markets; links same product across multiple markets. |
productData |
productData_cereal |
Product-level dataset containing shares, prices, characteristics, and instruments. |
demographic_draws |
demData_cereal |
List of simulated demographic variables (e.g., income, income²) for consumer heterogeneity. |
blp_inner_tol |
1e-6 |
Convergence tolerance for the Berry contraction mapping. |
blp_inner_maxit |
5000 |
Maximum iterations for the inner-loop (share inversion). |
integration_draws |
origD |
Random draws representing simulated consumer taste coefficients. |
integration_weights |
rep(1 / 20, 20) |
Equal weights assigned to 20 simulated consumers per market for numerical integration. |
cereal_data <- BLP_data(
model = nevos_model,
market_identifier = "cdid",
par_delta = "startingGuessesDelta",
product_identifier = "product_id",
productData = productData_cereal,
demographic_draws = demData_cereal,
blp_inner_tol = 1e-6, blp_inner_maxit = 5000,
integration_draws = origD,
integration_weights = rep(1 / 20, 20)
)
The matrix \(\Pi\) (often denoted as
theta2 in BLPestimatoR) governs how
demographic variables interact with consumers’
taste coefficients.
Each element represents how a demographic variable (e.g., income,
income²) shifts a consumer’s sensitivity to price or product
attributes.
We first provide an initial guess for the \(3 \times 3\) matrix
theta_guesses_cereal, where:
Mathematically, the random coefficient vector for consumer \(i\) is:
\[ \tilde{\beta}_{im} = \tilde{\beta} + \Pi D_{im} + \Sigma \tilde{\nu}_{im} \]
where:
We use exp(rnorm(...)) to ensure positive starting
values for numerical stability during estimation.
set.seed(123)
theta_guesses_cereal<-matrix(data = exp(rnorm(9, mean = 0, sd = 1)),nrow = 3,ncol = 3)
colnames(theta_guesses_cereal) <- c("unobs_sd", "income", "incomesq")
rownames(theta_guesses_cereal) <- c("(Intercept)", "price", "sugar")
# correctly named:
theta_guesses_cereal
## unobs_sd income incomesq
## (Intercept) 0.5709374 1.073054 1.5855260
## price 0.7943926 1.138018 0.2822220
## sugar 4.7526783 5.557037 0.5031571
To restrict the demographic influence of income² only to
the price coefficient, we set the corresponding
elements of \(\Pi\) to
NA:
\[ \Pi = \begin{bmatrix} \Pi_{(Intercept), \text{unobs_sd}} & \Pi_{(Intercept), \text{income}} & \text{NA} \\ \Pi_{(\text{price}), \text{unobs_sd}} & \Pi_{(\text{price}), \text{income}} & \Pi_{(\text{price}), \text{income}^2} \\ \Pi_{(\text{sugar}), \text{unobs_sd}} & \Pi_{(\text{sugar}), \text{income}} & \text{NA} \end{bmatrix} \]
This restriction means:
income².Such restrictions improve identification and help control over-parameterization, especially when demographic data are limited.
theta_guesses_cereal[c(1,3),3]<-NA
theta_guesses_cereal
## unobs_sd income incomesq
## (Intercept) 0.5709374 1.073054 NA
## price 0.7943926 1.138018 0.282222
## sugar 4.7526783 5.557037 NA
Argument details of estimateBLP:
| Argument | Specification | Description / Role |
|---|---|---|
blp_data |
cereal_data |
BLP dataset object created via BLP_data(); contains
product, market, demographic, and integration information. |
par_theta2 |
theta_guesses_cereal |
Initial parameter matrix for random-coefficient standard deviations and demographic interactions. |
solver_method |
"BFGS" |
Optimization algorithm used to minimize the GMM objective function. |
solver_maxit |
1000 |
Maximum number of iterations allowed for the outer-loop optimization. |
solver_reltol |
1e-6 |
Convergence tolerance for the solver (relative change in objective). |
standardError |
"heteroskedastic" |
Type of standard errors computed for parameter estimates. |
extremumCheck |
FALSE |
Disables additional local extremum checking (set TRUE
for robustness tests). |
printLevel |
1 |
Controls output verbosity during estimation; 1 prints
basic iteration progress. |
cereal_est <- estimateBLP(
blp_data = cereal_data,
par_theta2 = theta_guesses_cereal,
solver_method = "BFGS", solver_maxit = 1000, solver_reltol = 1e-6,
standardError = "heteroskedastic",
extremumCheck = FALSE,
printLevel = 1
)
## blp_data were prepared with the following arguments:
## BLP_data(model = nevos_model, market_identifier = "cdid", product_identifier = "product_id",
## par_delta = "startingGuessesDelta", productData = productData_cereal,
## demographic_draws = demData_cereal, integration_draws = origD,
## integration_weights = rep(1/20, 20), blp_inner_tol = 1e-06,
## blp_inner_maxit = 5000)
## Starting a BLP demand estimation with 2256 observations in 94 markets...
## [integration::method integration::amountDraws 20 ]
## [blp::inner_tol 1e-06 blp::inner_maxit 5000 ]
## [solver::method BFGS solver::maxit 1000 solver::reltol 1e-06 ]
## gmm objective: 409853.9
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 59085.55
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 133936.3
## gmm objective: 37150.72
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 34319.97
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 36310.27
## gmm objective: 33184.99
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 26425.06
## gmm objective: 38235.71
## gmm objective: 15882.64
## gmm objective: 173386.3
## gmm objective: 11165.15
## gmm objective: 37495.67
## gmm objective: 7910.851
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 11274.85
## gmm objective: 6288.903
## gmm objective: 161823.9
## gmm objective: 10116.64
## gmm objective: 5823.248
## gmm objective: 3296.159
## gmm objective: 1179.461
## gmm objective: 841.7361
## gmm objective: 442.4041
## gmm objective: 278.5394
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 19988.23
## gmm objective: 561.3904
## gmm objective: 242.6125
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 365698.8
## gmm objective: 14908.54
## gmm objective: 821.9894
## gmm objective: 239.5668
## gmm objective: Inf [delta contains NaN's]
## gmm objective: Inf [delta contains NaN's]
## gmm objective: 21905.76
## gmm objective: 231.3619
## gmm objective: 99891.74
## gmm objective: 13111.89
## gmm objective: 300.556
## gmm objective: 220.7689
## gmm objective: 379.5581
## gmm objective: 145.2948
## gmm objective: 263.8472
## gmm objective: 85.4407
## gmm objective: 112.9728
## gmm objective: 64.5782
## gmm objective: 430.8553
## gmm objective: 46.8967
## gmm objective: 41.7351
## gmm objective: 39.0365
## gmm objective: 38.9347
## gmm objective: 38.8817
## gmm objective: 38.8691
## gmm objective: 38.8596
## gmm objective: 38.8568
## gmm objective: 4729.26
## gmm objective: 271.864
## gmm objective: 49.528
## gmm objective: 39.2719
## gmm objective: 38.8717
## gmm objective: 38.8573
## gmm objective: 38.8569
## gmm objective: 38.857
## gmm objective: 38.8567
## gmm objective: 9432.379
## gmm objective: 428.6707
## gmm objective: 50.0763
## gmm objective: 39.2248
## gmm objective: 38.8699
## gmm objective: 38.8569
## gmm objective: 38.8564
## gmm objective: 42.0326
## gmm objective: 38.9724
## gmm objective: 38.8604
## gmm objective: 38.8567
## gmm objective: 38.8569
## gmm objective: 38.857
## gmm objective: 38.857
## gmm objective: 38.8569
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 4809.167
## gmm objective: 282.285
## gmm objective: 53.5139
## gmm objective: 39.5282
## gmm objective: 38.8821
## gmm objective: 38.8572
## gmm objective: 38.8567
## gmm objective: 38.8567
## gmm objective: 38.8566
## gmm objective: 38.8567
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## ------------------------------------------
## Solver message: Successful convergence
## ------------------------------------------
## Final GMM evaluation at optimal parameters:
## gmm objective: 38.8568
## theta (RC): 0.26 2.09 -0.01
## theta (demogr.): 3.5 22.24 -0.16 0 -1.14 0
## inner iterations: 62
## gradient: 0.1905 -0.0055 0.9096 -0.1059 -0.0539 -0.9763 -0.2801
## Using the heteroskedastic asymptotic variance-covariance matrix...
summary(cereal_est)
##
## Data information:
##
## 94 market(s) with 2256 products
## 25 linear coefficient(s) (24 exogenous coefficients)
## 7 non-linear parameters related to random coefficients
## 2 demographic variable(s)
##
## Estimation results:
##
## Linear Coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.70109899 0.3193686 -8.4576218 2.728854e-17
## price -31.45917696 4.0287222 -7.8087234 5.777015e-15
## productdummyproduct10 1.73559033 0.1302492 13.3251484 1.653038e-40
## productdummyproduct11 2.63298336 0.2033102 12.9505690 2.332191e-38
## productdummyproduct12 -0.04283125 0.1391659 -0.3077712 7.582565e-01
## productdummyproduct13 2.37639536 0.1332336 17.8363110 3.692855e-71
## productdummyproduct14 2.76749898 0.2225029 12.4380362 1.624489e-35
## productdummyproduct15 3.14831633 0.2191466 14.3662600 8.426017e-47
## productdummyproduct16 4.80113843 0.2450446 19.5929143 1.777201e-85
## productdummyproduct17 1.59050045 0.2001071 7.9482468 1.891696e-15
## productdummyproduct18 1.00227867 0.1382094 7.2518841 4.110134e-13
## productdummyproduct19 2.90487567 0.2508688 11.5792603 5.249709e-31
## productdummyproduct2 3.50550584 0.2446810 14.3268405 1.487348e-46
## productdummyproduct20 3.36075421 0.1707321 19.6843694 2.935675e-86
## productdummyproduct21 3.71814879 0.2286113 16.2640640 1.775598e-59
## productdummyproduct22 0.92917911 0.1506774 6.1666784 6.973939e-10
## productdummyproduct23 2.30561396 0.2134405 10.8021411 3.362692e-27
## productdummyproduct24 1.91457256 0.1473704 12.9915688 1.365979e-38
## productdummyproduct3 2.04504691 0.1288559 15.8708093 1.009310e-56
## productdummyproduct4 0.89165724 0.1170136 7.6201176 2.534446e-14
##
## ...
##
## 5 estimates are omitted. They are available in the LinCoefficients generated by summary.
##
## Random Coefficients
## Estimate Std. Error t value Pr(>|t|)
## unobs_sd*(Intercept) 0.256593310 0.10736535 2.3899079 1.685260e-02
## unobs_sd*price 2.092086121 0.78942708 2.6501322 8.046028e-03
## unobs_sd*sugar -0.007949391 0.01052495 -0.7552902 4.500749e-01
## income*(Intercept) 3.503446580 0.62327280 5.6210484 1.898021e-08
## income*price 22.242228769 79.50494383 0.2797591 7.796624e-01
## income*sugar -0.163490348 0.02739416 -5.9680736 2.400710e-09
## incomesq*price -1.144111432 4.12066745 -0.2776520 7.812795e-01
##
## Wald Test
## 180.0485 on 7 DF, p-value: 1.90277006974e-35
##
## Computational Details:
## Solver converged with 140 iterations to a minimum at 38.8568 .
## Local minima check: NA
## stopping criterion outer loop: 1e-06
## stopping criterion inner loop: 1e-06
## Market shares are integrated with provided_by_user and 20 draws.
## Method for standard errors: heteroskedastic