1 Standard Discrete Choice - using FEOLS

1.0.1 Load the Dataset

library(fixest)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
productData<-read_csv("C:/Users/anubh/Dropbox/UG Emp IO/My Slides/R_markdownfiles/PSET Datasets/productData_loglinear.csv")
## Rows: 4000 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): cdid, product_in_market, firm_id, region
## dbl (7): product_id, x1, x2, cost, xi, price, share
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

1.1 Instrument Construction

We construct three main sets of instruments following the BLP framework.


1.1.1 (a) BLP Instruments (Within-Market, Rivals’ Characteristics)

For each product \(j\) in market \(m\), define the leave-one-out average of rival characteristics:

\[ \bar{x}_{-j,m}^{(k)} = \frac{1}{J_m - 1} \sum_{\substack{r \in J_m \\ r \neq j}} x_{r,m}^{(k)} \]

where \(x_{r,m}^{(k)}\) denotes product characteristic \(k\) (e.g., sugar, mushy) and \(J_m\) is the number of products in market \(m\).

The BLP instrument for product \(j\) and characteristic \(k\) is:

\[ Z_{j,m}^{(k,BLP)} = |\bar{x}_{-j,m}^{(k)} - x_{j,m}^{(k)}| \]

Intuition: similarity to rival characteristics affect equilibrium prices via market competition but are uncorrelated with product-level unobserved quality.

# 2a) BLP instruments: leave-one-out sums of rivals' x1 and x2 within market (cdid)
productData <- productData %>%
  group_by(cdid) %>%
  mutate(
    blp_x1_sum = abs((sum(x1) - x1)/(length(product_id)-1) - x1),
    blp_x2_sum = abs((sum(x2) - x2)/(length(product_id)-1) - x2)
  ) %>%
  ungroup()

1.1.2 (b) Hausman–Nevo Instruments (Across-Market, Same Product)

For each product \(j\) sold in multiple markets \(m = 1,\dots,M_j\), define:

\[ \bar{p}_{-m,j} = \frac{1}{M_j - 1} \sum_{\substack{r \in M_j \\ r \neq m}} p_{r,j} \]

The Hausman–Nevo instrument is the average price of the same product in other markets:

\[ Z_{j,m}^{(HN)} = \bar{p}_{-m,j} \]

This captures firm-level pricing variation that is correlated with \(p_{j,m}\) but exogenous to market-specific demand shocks.

# 2b) Hausman-Nevo: mean price of same product in other markets (leave-market-out)
productData <- productData %>%
  group_by(product_id) %>%
  mutate(price_mean_other_markets = (sum(price) - price)/(length(cdid)-1) ) %>%
  ungroup()

1.1.3 (c) Waldfogel Instruments (Regional Demographics)

Let \(D_{r}\) denote a demographic variable (e.g., mean income) in region \(r\) that contains markets \(m \in r\).
Then the leave-one-out regional demographic mean is:

\[ \bar{D}_{-m,r} = \frac{1}{R_r - 1} \sum_{\substack{s \in r \\ s \neq m}} D_{s} \]

The Waldfogel instrument for market \(m\) is:

\[ Z_{m}^{(W)} = \bar{D}_{-m,r} \]

Demographic variation across regions affects market-level demand composition and hence equilibrium prices, but is plausibly exogenous to individual product quality.

# 2c) For walfdogel/Wolfdogel replace price with demographics. 
productData <- productData %>% 
  group_by(product_id, region) %>% 
  mutate(othr_mkt_inc = (sum(xi)-xi)/(length(cdid)-1)) %>%
  ungroup()

Summary Table

Instrument Type Formula Source of Variation
BLP (within-market) \(Z_{j,m}^{(BLP)} = |\bar{x}_{-j,m} - x_{j,m}|\) Distance from Rivals’ characteristics
Hausman–Nevo \(Z_{j,m}^{(HN)} = \bar{p}_{-m,j}\) Same product prices across markets
Waldfogel \(Z_{m}^{(W)} = \bar{D}_{-m,r}\) Demographic variation across regions

1.2 Estimating Discrete Choice via Standard Logit:

1.2.1 Berry Mean Utility Transformation

To linearize the discrete-choice demand system, we transform observed market shares into the Berry (1994) mean utility term.

For each product \(j\) in market \(m\):

\[ \delta_{j,m} = \ln(s_{j,m}) - \ln(s_{0,m}) \]

where: - \(s_{j,m}\) = observed market share of product \(j\) in market \(m\), - \(s_{0,m}\) = share of the outside good, computed as: \[ s_{0,m} = 1 - \sum_{j \in J_m} s_{j,m} \] with \(J_m\) being the set of products sold in market \(m\).


Interpretation

  • \(\delta_{j,m}\) represents the mean utility level of product \(j\) relative to the outside option.
  • Taking logarithmic differences yields a linear form suitable for estimation: \[ \delta_{j,m} = \mathbf{x}_{j,m}' \beta + \xi_{j,m} \] where \(\mathbf{x}_{j,m}\) are observed product characteristics, \(\beta\) are taste parameters, and \(\xi_{j,m}\) captures unobserved quality.

We begin with the simplest case, estimating the logit demand equation for a single market \(m = 1\).
This corresponds to the case where asymptotics rely on the number of products \(J_m \to \infty\), not on the number of markets.


# ---------------------------
# Compute Berry delta (log(share) - log(outside_share))
# ---------------------------
productData <- productData %>%
  group_by(cdid) %>%
  mutate(sum_sh = sum(share), 
         outside_share = pmax(1 - sum_sh, 1e-9),
         delta = log(share) - log(outside_share)) %>%
  ungroup()

1.2.2 Model 1: Product level variation only

For consumer \(i\) choosing product \(j\) in market \(m\):

\[ U_{ij} = \beta_{\text{price}} \, p_{j} + \beta_{1} \, x_{1,j} + \beta_{2} \, x_{2,j} + \xi_{j} + \varepsilon_{ij} \]

where: - \(p_{j}\) = price of product \(j\), - \(x_{1,j}, x_{2,j}\) = observed product characteristics, - \(\xi_{j}\) = unobserved (to the econometrician) quality term, - \(\varepsilon_{ij}\) = i.i.d. Type I Extreme Value error.


m1 <- feols(delta ~ price + x1 + x2, data = productData[productData$cdid=="M001",])

1.2.3 Model 2: Product and Market level variation

For consumer \(i\) choosing product \(j\) in market \(m\):

\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{jm} + \varepsilon_{ijm} \]

where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{jm}\) = unobserved (to the econometrician) quality term, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.


m2 <- feols(delta ~ price + x1 + x2, data = productData)

1.2.4 Model 3: Product and Market level variation with Product + Market FEs

For consumer \(i\) choosing product \(j\) in market \(m\):

\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{j} + \xi_{m} + \xi_{jm} + \varepsilon_{ijm} \]

where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{j}\) = product level mean of quality/unobserved characteristic - \(\xi_{m}\) = market level mean of quality/unobserved characteristic - \(\xi_{jm}\) = unobserved (to the econometrician) characteristic, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.


m3 <- feols(delta ~ price + x1 + x2 | product_id + cdid, data = productData)

1.2.5 Model 4: Product and Market level variation with Frim/Brand + Market FEs

For consumer \(i\) choosing product \(j\) in market \(m\):

\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{f} + \xi_{m} + \xi_{jm} + \varepsilon_{ijm} \]

where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{f}\) = firm level mean of quality/unobserved characteristic. Brand value - \(\xi_{m}\) = market level mean of quality/unobserved characteristic - \(\xi_{jm}\) = unobserved (to the econometrician) characteristic, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.


m4 <- feols(delta ~ price + x1 + x2 | firm_id + cdid, data = productData)

1.2.6 IV estimation

For consumer \(i\) choosing product \(j\) in market \(m\):

\[ U_{ijm} = \beta_{\text{price}} \, p_{jm} + \beta_{1} \, x_{1,jm} + \beta_{2} \, x_{2,jm} + \xi_{j} + \xi_{m} + \xi_{jm} + \varepsilon_{ijm} \]

where: - \(p_{jm}\) = price of product \(j\), - \(x_{1,jm}, x_{2,jm}\) = observed product characteristics, - \(\xi_{j}\) = Product level mean of quality/unobserved characteristic. - \(\xi_{m}\) = market level mean of quality/unobserved characteristic - \(\xi_{jm}\) = unobserved (to the econometrician) characteristic, - \(\varepsilon_{ijm}\) = i.i.d. Type I Extreme Value error.


m5 <- feols(delta ~ x1 + x2 | product_id + cdid| price ~ price_mean_other_markets + othr_mkt_inc + blp_x1_sum + blp_x2_sum, data = productData)
m6 <- feols(delta ~ x1 + x2 | firm_id + cdid| price ~ price_mean_other_markets + othr_mkt_inc + blp_x1_sum + blp_x2_sum, data = productData)

1.2.7 Regression Results

etable(m1, m2, m3, m4, m5, m6, 
       dict = c("price" = "Price", "x1" = "X1", "x2" = "X2", "income" = "MarketIncome"))
##                                m1                 m2              m3
## Dependent Var.:             delta              delta           delta
##                                                                     
## Constant        -16.23*** (2.088) -15.77*** (0.3546)                
## Price             0.0625 (0.3863)    0.0132 (0.0642) 0.0198 (0.0625)
## X1                0.3762 (0.4836)    0.1362 (0.0841) 0.1338 (0.0860)
## X2                0.4932 (0.5162)    0.0655 (0.0801) 0.0783 (0.0842)
## Fixed-Effects:  ----------------- ------------------ ---------------
## product_id                     No                 No             Yes
## cdid                           No                 No             Yes
## firm_id                        No                 No              No
## _______________ _________________ __________________ _______________
## S.E. type                     IID                IID  by: product_id
## Observations                  100              4,000           4,000
## R2                        0.02177            0.00106         0.03310
## Within R2                      --                 --         0.00116
## 
##                              m4              m5               m6
## Dependent Var.:           delta           delta            delta
##                                                                 
## Constant                                                        
## Price           0.0262 (0.0759) 0.0198 (0.0625) -0.0437 (0.1757)
## X1              0.1140 (0.0962) 0.1338 (0.0860)  0.1492 (0.1191)
## X2              0.0441 (0.0732) 0.0783 (0.0842)  0.0595 (0.0753)
## Fixed-Effects:  --------------- --------------- ----------------
## product_id                   No             Yes               No
## cdid                        Yes             Yes              Yes
## firm_id                     Yes              No              Yes
## _______________ _______________ _______________ ________________
## S.E. type           by: firm_id  by: product_id      by: firm_id
## Observations              4,000           4,000            4,000
## R2                      0.01909         0.03310          0.01879
## Within R2               0.00082         0.00116          0.00052
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2 Estimating Random Coefficient

The utility of consumer \(i\) if they purchase product \(j\) and reside in market \(m\) is:

\[ U_{ijm} = -\alpha_{im} \cdot p_{jm} + sugar_{jm} \cdot \beta_{im} + \xi_{jm} + \varepsilon_{ijm} \]

2.1 Case 1: Without Demographics

\[ \begin{bmatrix} \alpha_{im}\\ \beta_{im} \end{bmatrix} = \begin{bmatrix} \alpha_{0}\\ \beta_{0} \end{bmatrix}+\nu_{im} \quad \quad \text{s.t. }\quad \nu_{im} ~ N(0,\Sigma) \]

2.1.1 Specify the formula

Formula Block Component Specification Interpretation
1st block Linear mean utility share ~ price + productdummy Deterministic part of mean utility: observed characteristics (price) and product dummies.
2nd block Exogenous variables + Fixed effects 0 + productdummy Includes product fixed effects;
3rd block Random coefficients price + sugar Characteristics with consumer heterogeneity (taste variation).
4th block Instruments 0 + IV1 + IV2 + ... + IV20 Excluded instruments used to identify endogenous regressors (e.g., price).

Full Syntax Pattern: \[ \texttt{share ~ 1st block | 2nd block | 3rd block | 4th block} \]

library(BLPestimatoR)
library(tidyverse)

nevos_model <- as.formula("share ~  price + productdummy |
    0 + productdummy |
    price + sugar|
    0+ IV1 + IV2 + IV3 + IV4 + IV5 + IV6 + IV7 + IV8 + IV9 + IV10 + 
    IV11 + IV12 + IV13 + IV14 + IV15 + IV16 + IV17 + IV18 + IV19 + IV20")

2.1.1.1 Recall the algorithm

\[ \delta_{jm}^{l+1} = \delta_{jm}^{l} + \log(S_{jm}) - \log(s_{jm})\]

We need to provide \(\delta_{jm}^{1}\), the initial guess to find \(\delta_{jm}\)

productData_cereal$startingGuessesDelta <- c(log(w_guesses_cereal)) # include orig. draws in the product data

2.1.1.2 Create BLP Data Object

The BLP_data() function prepares all data structures required for estimating the BLP or Nevo-style demand model.
It organizes product-level data, market identifiers, integration draws, and estimation controls into a format used by estimateBLP().


Function Call Structure

\[ \texttt{BLP data(model, market identifier, product identifier, productData, ...)} \]

This call initializes the dataset and simulation parameters used in the random-coefficients estimation.

Argument Role Description
model Structure Utility specification, random coeffs, instruments
market_identifier Market grouping Defines each market \(m\)
product_identifier Product grouping Links same product across markets
productData Input data Product-level observations
integration_method Simulation Controls random-draw integration
blp_inner_tol, blp_inner_maxit Numerical settings Berry inversion convergence
integration_accuracy, integration_seed Monte Carlo control Number and reproducibility of consumer draws
cereal_data2 <- BLP_data(
  model = nevos_model,
  market_identifier = "cdid",
  product_identifier = "product_id",
  productData = productData_cereal,
  integration_method = "MLHS",
  blp_inner_tol = 1e-6, blp_inner_maxit = 5000,
  integration_accuracy = 20, integration_seed = 213
)
## Mean utility (variable name: `delta`) is initialized with 0 because of missing or invalid par_delta argument.

2.1.1.3 Estimate the model

blp_est1<-estimateBLP(blp_data = cereal_data2, printLevel = 1)
## blp_data were prepared with the following arguments:
## BLP_data(model = nevos_model, market_identifier = "cdid", product_identifier = "product_id", 
##     productData = productData_cereal, integration_accuracy = 20, 
##     integration_method = "MLHS", integration_seed = 213, blp_inner_tol = 1e-06, 
##     blp_inner_maxit = 5000)
## Starting a BLP demand estimation with  2256  observations in  94  markets...
## [integration::method  integration::amountDraws 20 ]
## [blp::inner_tol 1e-06  blp::inner_maxit 5000 ]
## [solver::method BFGS  solver::maxit 10000  solver::reltol 1e-06 ]
## gmm objective: 189.9435
## gmm objective: 77087.92
## gmm objective: 4111.535
## gmm objective: 374.4437
## gmm objective: 198.9622
## gmm objective: 190.1912
## gmm objective: 189.9244
## gmm objective: 7077.797
## gmm objective: 488.2191
## gmm objective: 200.5563
## gmm objective: 189.7251
## gmm objective: 189.0409
## gmm objective: 188.8877
## gmm objective: 188.8877
## gmm objective: 814.3173
## gmm objective: 220.147
## gmm objective: 190.1319
## gmm objective: 188.9366
## gmm objective: 188.8897
## gmm objective: 188.8879
## gmm objective: 188.8877
## gmm objective: 188.8877
## gmm objective: 188.8877
## gmm objective: 188.8876
## ------------------------------------------ 
## Solver message: Successful convergence 
## ------------------------------------------ 
## Final GMM evaluation at optimal parameters: 
## gmm objective: 188.888
##   theta (RC): -0.07 1.07 0 
##   theta (demogr.):  
##   inner iterations: 42 
##   gradient: -4e-04 0 -0.0266 
## Using the heteroskedastic asymptotic variance-covariance matrix...

2.1.1.4 Summarize

summary(blp_est1)
## 
## Data information:
## 
##   94 market(s) with 2256 products 
##   25 linear coefficient(s) (24 exogenous coefficients) 
##   3 non-linear parameters related to random coefficients 
##   0 demographic variable(s) 
## 
## Estimation results:
## 
##  Linear Coefficients
##                          Estimate Std. Error    t value      Pr(>|t|)
## (Intercept)            -1.7687979  0.2614769  -6.764642  1.336388e-11
## price                 -30.1654692  2.2084233 -13.659279  1.777356e-42
## productdummyproduct10   1.8589449  0.1596595  11.643185  2.485575e-31
## productdummyproduct11   1.7629234  0.1393468  12.651334  1.099918e-36
## productdummyproduct12  -0.2872553  0.1599158  -1.796291  7.244819e-02
## productdummyproduct13   2.2497007  0.1497871  15.019324  5.486406e-51
## productdummyproduct14   1.7795564  0.1497489  11.883602  1.440238e-32
## productdummyproduct15   2.1594184  0.1446350  14.930118  2.098931e-50
## productdummyproduct16   3.6872370  0.1699590  21.694866 2.293909e-104
## productdummyproduct17   0.7886869  0.1573699   5.011677  5.395781e-07
## productdummyproduct18   0.8752717  0.1525111   5.739069  9.519846e-09
## productdummyproduct19   1.7679247  0.1446626  12.221021  2.400493e-34
## productdummyproduct2    2.3666780  0.1524484  15.524457  2.370036e-54
## productdummyproduct20   2.8078466  0.1501105  18.705203  4.490181e-78
## productdummyproduct21   2.6795586  0.1547226  17.318472  3.413003e-67
## productdummyproduct22   0.4724688  0.1446494   3.266303  1.089617e-03
## productdummyproduct23   1.3757516  0.1429032   9.627158  6.140419e-22
## productdummyproduct24   2.1723598  0.1596999  13.602763  3.855876e-42
## productdummyproduct3    1.8020360  0.1492630  12.072893  1.468775e-33
## productdummyproduct4    0.7641588  0.1380172   5.536695  3.082334e-08
## 
## ...
## 
##  5 estimates are omitted. They are available in the LinCoefficients generated by summary.
## 
##  Random Coefficients
##                           Estimate  Std. Error      t value  Pr(>|t|)
## unobs_sd*(Intercept) -0.0709765270  1.10554060 -0.064200742 0.9488104
## unobs_sd*price        1.0705189927 12.83936039  0.083377907 0.9335511
## unobs_sd*sugar       -0.0007739994  0.09003039 -0.008597091 0.9931406
## 
##  Wald Test
## 0.0306 on  3 DF, p-value: 0.998586809404042 
## 
## Computational Details: 
##   Solver converged with 24 iterations to a minimum at 188.888 .
##   Local minima check: NA
##   stopping criterion outer loop: 1e-06
##   stopping criterion inner loop: 1e-06
##   Market shares are integrated with MLHS and 20 draws. 
##   Method for standard errors: heteroskedastic

2.2 Case 2: Recover average slopes.

In the cereal data set, the level of sugar and also mushy does not vary within a product. Therefore, product dummy absorbs all the mean slopes associated with sugar and mushy. If one wishes to look at mean slopes then the following formula should be used:

    1. Declare new formula
nevos_model2 <- as.formula("share ~  price + sugar |
    0 + sugar |
    price + sugar|
    0+ IV1 + IV2 + IV3 + IV4 + IV5 + IV6 + IV7 + IV8 + IV9 + IV10 + 
    IV11 + IV12 + IV13 + IV14 + IV15 + IV16 + IV17 + IV18 + IV19 + IV20")
    1. Declare new blp data
cereal_data3 <- BLP_data(
  model = nevos_model2,
  market_identifier = "cdid",
  product_identifier = "product_id",
  productData = productData_cereal,
  integration_method = "MLHS",
  blp_inner_tol = 1e-6, blp_inner_maxit = 5000,
  integration_accuracy = 20, integration_seed = 213
)
## Mean utility (variable name: `delta`) is initialized with 0 because of missing or invalid par_delta argument.
    1. Estimate
blp_est1<-estimateBLP(blp_data = cereal_data3, printLevel = 1)
## blp_data were prepared with the following arguments:
## BLP_data(model = nevos_model2, market_identifier = "cdid", product_identifier = "product_id", 
##     productData = productData_cereal, integration_accuracy = 20, 
##     integration_method = "MLHS", integration_seed = 213, blp_inner_tol = 1e-06, 
##     blp_inner_maxit = 5000)
## Starting a BLP demand estimation with  2256  observations in  94  markets...
## [integration::method  integration::amountDraws 20 ]
## [blp::inner_tol 1e-06  blp::inner_maxit 5000 ]
## [solver::method BFGS  solver::maxit 10000  solver::reltol 1e-06 ]
## gmm objective: 265.3415
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 67324.66
## gmm objective: 2711.537
## gmm objective: 310.3391
## gmm objective: 262.4372
## gmm objective: 369.6278
## gmm objective: 263.1742
## gmm objective: 261.749
## gmm objective: 7861.726
## gmm objective: 510.7391
## gmm objective: 265.0589
## gmm objective: 261.252
## gmm objective: 259.1477
## gmm objective: 259.1314
## gmm objective: 259.1298
## gmm objective: 259.13
## gmm objective: 259.1302
## gmm objective: 259.1302
## gmm objective: 259.1299
## gmm objective: 259.1299
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 259.13
## gmm objective: 5397.46
## gmm objective: 472.4613
## gmm objective: 265.9145
## gmm objective: 259.3404
## gmm objective: 259.1366
## gmm objective: 259.13
## ------------------------------------------ 
## Solver message: Successful convergence 
## ------------------------------------------ 
## Final GMM evaluation at optimal parameters: 
## gmm objective: 259.1303
##   theta (RC): -0.11 2.06 -0.05 
##   theta (demogr.):  
##   inner iterations: 44 
##   gradient: -0.0023 -0.0083 -0.9639 
## Using the heteroskedastic asymptotic variance-covariance matrix...
    1. Print output
summary(blp_est1)
## 
## Data information:
## 
##   94 market(s) with 2256 products 
##   3 linear coefficient(s) (1 exogenous coefficients) 
##   3 non-linear parameters related to random coefficients 
##   0 demographic variable(s) 
## 
## Estimation results:
## 
##  Linear Coefficients
##                 Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  -2.61506294 0.40981595 -6.381067 1.758586e-10
## price       -12.52062456 3.78100238 -3.311456 9.281170e-04
## sugar         0.03557057 0.01105716  3.216972 1.295512e-03
## 
##  Random Coefficients
##                         Estimate  Std. Error     t value   Pr(>|t|)
## unobs_sd*(Intercept) -0.10635106  1.38068597 -0.07702769 0.93860151
## unobs_sd*price        2.06074915 12.35635362  0.16677648 0.86754593
## unobs_sd*sugar       -0.05140232  0.02298225 -2.23660940 0.02531188
## 
##  Wald Test
## 5.6437 on  3 DF, p-value: 0.13029014550078 
## 
## Computational Details: 
##   Solver converged with 42 iterations to a minimum at 259.1303 .
##   Local minima check: NA
##   stopping criterion outer loop: 1e-06
##   stopping criterion inner loop: 1e-06
##   Market shares are integrated with MLHS and 20 draws. 
##   Method for standard errors: heteroskedastic

2.3 Case 3: With Demographics

\[ \begin{bmatrix} \alpha_{im}\\ \beta_{im} \end{bmatrix} = \begin{bmatrix} \alpha_{0}\\ \beta_{0} \end{bmatrix} + \Pi_{(K+1)}^{Income} Income_{rim} + \Pi_{(K+1)}^{Incomesq} Incomesq_{rim} +\nu_{im} \]

2.3.1 Loading Demogrpahics Data

The package provides demographicData_cereal which is a list of matrices. Each matrix has the same dimensions, where rows map to markets and columns map to a pseudo consumer. Matrices contain draws of consumer demographics from the demographic distribution observed at the market level. Below I show what these individual matrices look like for income, income square, and age.

demographicData_cereal$income[1:4, 1:5]
##       cdid     draw_1     draw_2     draw_3      draw_4
## 1 market_1 0.49512349  0.3787622  0.1050146 -1.48548093
## 2 market_2 0.05389113 -1.5030833  0.6217917  0.26922901
## 3 market_3 0.62459788 -0.2242698 -0.2846169  0.67845943
## 4 market_4 0.94419851  0.4056359  0.5859923  0.01813051
demographicData_cereal$incomesq[1:4, 1:5]
##       cdid     draw_1     draw_2    draw_3      draw_4
## 1 market_1  8.3313042   6.121865  1.030803 -25.5836047
## 2 market_2  0.0966346 -25.849845 10.767233   4.0668173
## 3 market_3 10.8215619  -4.894545 -5.956954  11.8673879
## 4 market_4 17.1121550   6.629730 10.075529  -0.5537043
demographicData_cereal$age[1:4, 1:5]
##       cdid     draw_1     draw_2       draw_3     draw_4
## 1 market_1 -0.2301090 -2.5326941 -0.006965458 -0.8279460
## 2 market_2  0.1414545 -0.1813188 -1.279931134 -0.4532526
## 3 market_3 -0.1813188 -0.5867840  0.208145922 -0.4532526
## 4 market_4  0.7444506  0.1414545  0.645359728  0.8346017

For our model specification, we only need income and income square demographics. So we will create a new list of matrices that contains only these two.

demData_cereal<-list()
demData_cereal[["income"]]<-demographicData_cereal[[1]]
demData_cereal[["incomesq"]]<-demographicData_cereal[[2]]

2.3.2 Loading Unobserved Taste shocks

The package provides originalDraws_cereal which is also a list of matrices. Each matrix has the same dimensions, where rows map to markets and columns map to a pseudo consumer. Matrices contain draws of consumer’s unobserved taste shocks to their taste parameter/slope coefficients associated with various product attributes. Below I show what these individual matrices look like for price and sugar.

originalDraws_cereal$price[1:4, 1:5]
##       cdid     draw_1     draw_2     draw_3     draw_4
## 1 market_1 -1.5008378  0.1331817 -0.1382405  1.2571357
## 2 market_2  0.0276477 -0.8414414 -0.9056861 -2.0306179
## 3 market_3 -0.4768085 -0.9348689  1.4300456  2.6188581
## 4 market_4  0.5396019  0.1698178  0.7128376 -0.1216309
originalDraws_cereal$sugar[1:4, 1:5]
##       cdid     draw_1     draw_2     draw_3     draw_4
## 1 market_1 -1.1510786 -0.5007498  0.7974412 -0.6830540
## 2 market_2  1.0451218  1.0170277  2.3012550  0.1490338
## 3 market_3 -1.0187809  0.3107185 -0.9531532  0.8057080
## 4 market_4 -0.5760582 -0.5072059 -0.7496653  1.1163544

For our taste shock coefficients we will only need unobserved taste shocks for the intercept term, price’s slope coefficient, and sugar’s slope coefficient. So we create a new list for that as well.

origD<- list()
origD[["(Intercept)"]]<-originalDraws_cereal[["constant"]]
origD[["price"]]<-originalDraws_cereal[["price"]]
origD[["sugar"]]<-originalDraws_cereal[["sugar"]]

2.3.3 Creating BLP_data object with Demographics

Under custom demographics, we cannot simply use the default options for many things anymore. Therefore we use a more customized BLP_data object. The arguments and the types of values that go within it are state below:

Argument Specification Description / Role
model nevos_model Formula defining mean utility terms, fixed effects, random coefficients, and instruments.
market_identifier "cdid" Variable identifying distinct markets (e.g., city–year or region–time).
par_delta "startingGuessesDelta" Initial guess for mean utility \(\delta_{jm}^{(0)}\); starting values for Berry inversion.
product_identifier "product_id" Unique identifier for products across markets; links same product across multiple markets.
productData productData_cereal Product-level dataset containing shares, prices, characteristics, and instruments.
demographic_draws demData_cereal List of simulated demographic variables (e.g., income, income²) for consumer heterogeneity.
blp_inner_tol 1e-6 Convergence tolerance for the Berry contraction mapping.
blp_inner_maxit 5000 Maximum iterations for the inner-loop (share inversion).
integration_draws origD Random draws representing simulated consumer taste coefficients.
integration_weights rep(1 / 20, 20) Equal weights assigned to 20 simulated consumers per market for numerical integration.
cereal_data <- BLP_data(
  model = nevos_model,
  market_identifier = "cdid",
  par_delta = "startingGuessesDelta",
  product_identifier = "product_id",
  productData = productData_cereal,
  demographic_draws = demData_cereal,
  blp_inner_tol = 1e-6, blp_inner_maxit = 5000,
  integration_draws = origD,
  integration_weights = rep(1 / 20, 20)
)

2.3.4 Setting Initial Guesses and Restrictions for the \(\Pi\) Matrix

The matrix \(\Pi\) (often denoted as theta2 in BLPestimatoR) governs how demographic variables interact with consumers’ taste coefficients.
Each element represents how a demographic variable (e.g., income, income²) shifts a consumer’s sensitivity to price or product attributes.


  1. Step 1 — Create Initial Guess for \(\Pi\)

We first provide an initial guess for the \(3 \times 3\) matrix theta_guesses_cereal, where:

  • Rows correspond to random coefficients:
    \((\text{Intercept}), \text{Price}, \text{Sugar})\)
  • Columns correspond to demographic or heterogeneity terms:
    \((\text{unobs_sd}, \text{income}, \text{income}^2)\)

Mathematically, the random coefficient vector for consumer \(i\) is:

\[ \tilde{\beta}_{im} = \tilde{\beta} + \Pi D_{im} + \Sigma \tilde{\nu}_{im} \]

where:

  • \(D_i\) are demographic variables (income, income²),
  • \(\Pi\) captures how demographics shift mean preferences,
  • \(\Sigma \nu_i\) represents purely random taste variation.
  • \(\tilde{\beta}=(\alpha,\beta)\) and \(\tilde{\beta}_{im}=(\alpha_{im},\beta_{im})\)
  • \(\nu_{im} = \Sigma \tilde{\nu}_{im}\)

We use exp(rnorm(...)) to ensure positive starting values for numerical stability during estimation.

set.seed(123)
theta_guesses_cereal<-matrix(data = exp(rnorm(9, mean = 0, sd = 1)),nrow = 3,ncol = 3)
colnames(theta_guesses_cereal) <- c("unobs_sd", "income", "incomesq")
rownames(theta_guesses_cereal) <- c("(Intercept)", "price", "sugar")

# correctly named:
theta_guesses_cereal
##              unobs_sd   income  incomesq
## (Intercept) 0.5709374 1.073054 1.5855260
## price       0.7943926 1.138018 0.2822220
## sugar       4.7526783 5.557037 0.5031571

  1. Step 2 — Impose Parameter Restrictions

To restrict the demographic influence of income² only to the price coefficient, we set the corresponding elements of \(\Pi\) to NA:

\[ \Pi = \begin{bmatrix} \Pi_{(Intercept), \text{unobs_sd}} & \Pi_{(Intercept), \text{income}} & \text{NA} \\ \Pi_{(\text{price}), \text{unobs_sd}} & \Pi_{(\text{price}), \text{income}} & \Pi_{(\text{price}), \text{income}^2} \\ \Pi_{(\text{sugar}), \text{unobs_sd}} & \Pi_{(\text{sugar}), \text{income}} & \text{NA} \end{bmatrix} \]

This restriction means:

  • The intercept and sugar coefficients do not vary with income².
  • Only the price sensitivity is affected by the squared income term, allowing richer heterogeneity in price preferences while keeping other effects simple.

Such restrictions improve identification and help control over-parameterization, especially when demographic data are limited.

theta_guesses_cereal[c(1,3),3]<-NA
theta_guesses_cereal
##              unobs_sd   income incomesq
## (Intercept) 0.5709374 1.073054       NA
## price       0.7943926 1.138018 0.282222
## sugar       4.7526783 5.557037       NA

  1. Estimate

Argument details of estimateBLP:

Argument Specification Description / Role
blp_data cereal_data BLP dataset object created via BLP_data(); contains product, market, demographic, and integration information.
par_theta2 theta_guesses_cereal Initial parameter matrix for random-coefficient standard deviations and demographic interactions.
solver_method "BFGS" Optimization algorithm used to minimize the GMM objective function.
solver_maxit 1000 Maximum number of iterations allowed for the outer-loop optimization.
solver_reltol 1e-6 Convergence tolerance for the solver (relative change in objective).
standardError "heteroskedastic" Type of standard errors computed for parameter estimates.
extremumCheck FALSE Disables additional local extremum checking (set TRUE for robustness tests).
printLevel 1 Controls output verbosity during estimation; 1 prints basic iteration progress.
cereal_est <- estimateBLP(
  blp_data = cereal_data,
  par_theta2 = theta_guesses_cereal,
  solver_method = "BFGS", solver_maxit = 1000, solver_reltol = 1e-6,
  standardError = "heteroskedastic",
  extremumCheck = FALSE,
  printLevel = 1
)
## blp_data were prepared with the following arguments:
## BLP_data(model = nevos_model, market_identifier = "cdid", product_identifier = "product_id", 
##     par_delta = "startingGuessesDelta", productData = productData_cereal, 
##     demographic_draws = demData_cereal, integration_draws = origD, 
##     integration_weights = rep(1/20, 20), blp_inner_tol = 1e-06, 
##     blp_inner_maxit = 5000)
## Starting a BLP demand estimation with  2256  observations in  94  markets...
## [integration::method  integration::amountDraws 20 ]
## [blp::inner_tol 1e-06  blp::inner_maxit 5000 ]
## [solver::method BFGS  solver::maxit 1000  solver::reltol 1e-06 ]
## gmm objective: 409853.9
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 59085.55
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 133936.3
## gmm objective: 37150.72
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 34319.97
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 36310.27
## gmm objective: 33184.99
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 26425.06
## gmm objective: 38235.71
## gmm objective: 15882.64
## gmm objective: 173386.3
## gmm objective: 11165.15
## gmm objective: 37495.67
## gmm objective: 7910.851
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 11274.85
## gmm objective: 6288.903
## gmm objective: 161823.9
## gmm objective: 10116.64
## gmm objective: 5823.248
## gmm objective: 3296.159
## gmm objective: 1179.461
## gmm objective: 841.7361
## gmm objective: 442.4041
## gmm objective: 278.5394
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 19988.23
## gmm objective: 561.3904
## gmm objective: 242.6125
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 365698.8
## gmm objective: 14908.54
## gmm objective: 821.9894
## gmm objective: 239.5668
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: Inf [delta contains NaN's] 
## gmm objective: 21905.76
## gmm objective: 231.3619
## gmm objective: 99891.74
## gmm objective: 13111.89
## gmm objective: 300.556
## gmm objective: 220.7689
## gmm objective: 379.5581
## gmm objective: 145.2948
## gmm objective: 263.8472
## gmm objective: 85.4407
## gmm objective: 112.9728
## gmm objective: 64.5782
## gmm objective: 430.8553
## gmm objective: 46.8967
## gmm objective: 41.7351
## gmm objective: 39.0365
## gmm objective: 38.9347
## gmm objective: 38.8817
## gmm objective: 38.8691
## gmm objective: 38.8596
## gmm objective: 38.8568
## gmm objective: 4729.26
## gmm objective: 271.864
## gmm objective: 49.528
## gmm objective: 39.2719
## gmm objective: 38.8717
## gmm objective: 38.8573
## gmm objective: 38.8569
## gmm objective: 38.857
## gmm objective: 38.8567
## gmm objective: 9432.379
## gmm objective: 428.6707
## gmm objective: 50.0763
## gmm objective: 39.2248
## gmm objective: 38.8699
## gmm objective: 38.8569
## gmm objective: 38.8564
## gmm objective: 42.0326
## gmm objective: 38.9724
## gmm objective: 38.8604
## gmm objective: 38.8567
## gmm objective: 38.8569
## gmm objective: 38.857
## gmm objective: 38.857
## gmm objective: 38.8569
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 4809.167
## gmm objective: 282.285
## gmm objective: 53.5139
## gmm objective: 39.5282
## gmm objective: 38.8821
## gmm objective: 38.8572
## gmm objective: 38.8567
## gmm objective: 38.8567
## gmm objective: 38.8566
## gmm objective: 38.8567
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## gmm objective: 38.8568
## ------------------------------------------ 
## Solver message: Successful convergence 
## ------------------------------------------ 
## Final GMM evaluation at optimal parameters: 
## gmm objective: 38.8568
##   theta (RC): 0.26 2.09 -0.01 
##   theta (demogr.): 3.5 22.24 -0.16 0 -1.14 0 
##   inner iterations: 62 
##   gradient: 0.1905 -0.0055 0.9096 -0.1059 -0.0539 -0.9763 -0.2801 
## Using the heteroskedastic asymptotic variance-covariance matrix...

  1. View the output
summary(cereal_est)
## 
## Data information:
## 
##   94 market(s) with 2256 products 
##   25 linear coefficient(s) (24 exogenous coefficients) 
##   7 non-linear parameters related to random coefficients 
##   2 demographic variable(s) 
## 
## Estimation results:
## 
##  Linear Coefficients
##                           Estimate Std. Error    t value     Pr(>|t|)
## (Intercept)            -2.70109899  0.3193686 -8.4576218 2.728854e-17
## price                 -31.45917696  4.0287222 -7.8087234 5.777015e-15
## productdummyproduct10   1.73559033  0.1302492 13.3251484 1.653038e-40
## productdummyproduct11   2.63298336  0.2033102 12.9505690 2.332191e-38
## productdummyproduct12  -0.04283125  0.1391659 -0.3077712 7.582565e-01
## productdummyproduct13   2.37639536  0.1332336 17.8363110 3.692855e-71
## productdummyproduct14   2.76749898  0.2225029 12.4380362 1.624489e-35
## productdummyproduct15   3.14831633  0.2191466 14.3662600 8.426017e-47
## productdummyproduct16   4.80113843  0.2450446 19.5929143 1.777201e-85
## productdummyproduct17   1.59050045  0.2001071  7.9482468 1.891696e-15
## productdummyproduct18   1.00227867  0.1382094  7.2518841 4.110134e-13
## productdummyproduct19   2.90487567  0.2508688 11.5792603 5.249709e-31
## productdummyproduct2    3.50550584  0.2446810 14.3268405 1.487348e-46
## productdummyproduct20   3.36075421  0.1707321 19.6843694 2.935675e-86
## productdummyproduct21   3.71814879  0.2286113 16.2640640 1.775598e-59
## productdummyproduct22   0.92917911  0.1506774  6.1666784 6.973939e-10
## productdummyproduct23   2.30561396  0.2134405 10.8021411 3.362692e-27
## productdummyproduct24   1.91457256  0.1473704 12.9915688 1.365979e-38
## productdummyproduct3    2.04504691  0.1288559 15.8708093 1.009310e-56
## productdummyproduct4    0.89165724  0.1170136  7.6201176 2.534446e-14
## 
## ...
## 
##  5 estimates are omitted. They are available in the LinCoefficients generated by summary.
## 
##  Random Coefficients
##                          Estimate  Std. Error    t value     Pr(>|t|)
## unobs_sd*(Intercept)  0.256593310  0.10736535  2.3899079 1.685260e-02
## unobs_sd*price        2.092086121  0.78942708  2.6501322 8.046028e-03
## unobs_sd*sugar       -0.007949391  0.01052495 -0.7552902 4.500749e-01
## income*(Intercept)    3.503446580  0.62327280  5.6210484 1.898021e-08
## income*price         22.242228769 79.50494383  0.2797591 7.796624e-01
## income*sugar         -0.163490348  0.02739416 -5.9680736 2.400710e-09
## incomesq*price       -1.144111432  4.12066745 -0.2776520 7.812795e-01
## 
##  Wald Test
## 180.0485 on  7 DF, p-value: 1.90277006974e-35 
## 
## Computational Details: 
##   Solver converged with 140 iterations to a minimum at 38.8568 .
##   Local minima check: NA
##   stopping criterion outer loop: 1e-06
##   stopping criterion inner loop: 1e-06
##   Market shares are integrated with provided_by_user and 20 draws. 
##   Method for standard errors: heteroskedastic