1 Standard Discrete Choice - using FEOLS
- 1.1 Instrument Construction
- 1.2 Estimating Discrete Choice via Standard Logit:
2 Estimating Random Coefficient

1 Standard Discrete Choice - using FEOLS

1.0.1 Load the Dataset

library(fixest)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

productData<-read_csv("C:/Users/anubh/Dropbox/UG Emp IO/My Slides/R_markdownfiles/PSET Datasets/productData_loglinear.csv")

## Rows: 4000 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): cdid, product_in_market, firm_id, region
## dbl (7): product_id, x1, x2, cost, xi, price, share
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

1.1 Instrument Construction

We construct three sets of instruments commonly used in BLP-style demand estimation.

1.1.1 (a) BLP Instruments (Within-Market, Rivals’ Characteristics)

For each product \(j\) in market \(m\), define the leave-one-out average of rival characteristics:

\[ \bar{x}_{-j,m}^{(k)} = \frac{1}{|J_m| - 1} \sum_{\substack{r \in J_m \\ r \neq j}} x_{r,m}^{(k)} \]

where \(x_{r,m}^{(k)}\) denotes characteristic \(k\) of product \(r\) (e.g., sugar or mushy in cereal data; x1 and x2 in this dataset), \(J_m\) is the set of products in market \(m\), and \(|J_m|\) is the number of products in that set.

The BLP instrument for product \(j\) and characteristic \(k\) is:

\[ Z_{j,m}^{(k,BLP)} = |\bar{x}_{-j,m}^{(k)} - x_{j,m}^{(k)}| \]

Intuition: a product's distance from its rivals in characteristic space affects equilibrium prices through competition, but is assumed to be uncorrelated with the product's unobserved quality.

# 2a) BLP instruments: absolute distance between each product's x1/x2 and the
#     leave-one-out mean of its rivals' x1/x2 within the same market (cdid).
#     Despite the "_sum" suffix, these columns hold distances from rival means.
productData <- productData %>%
  group_by(cdid) %>%
  mutate(
    blp_x1_sum = abs((sum(x1) - x1) / (n() - 1) - x1),
    blp_x2_sum = abs((sum(x2) - x2) / (n() - 1) - x2)
  ) %>%
  ungroup()
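To see what the construction produces, here is a minimal sketch on a hand-made toy table. The data, the column name `blp_x1`, and the guard for single-product markets are illustrative assumptions, not part of the original dataset or code.

```r
library(dplyr)

# Hypothetical toy data: market m1 has 3 products, m2 has 2, m3 has 1
toy <- data.frame(
  cdid       = c("m1", "m1", "m1", "m2", "m2", "m3"),
  product_id = c(1, 2, 3, 1, 2, 1),
  x1         = c(1, 2, 6, 4, 10, 5)
)

toy_blp <- toy %>%
  group_by(cdid) %>%
  mutate(
    # Leave-one-out mean of rivals' x1; undefined (NA) when a market has no rivals
    rival_mean_x1 = if (n() > 1) (sum(x1) - x1) / (n() - 1) else NA_real_,
    # Absolute distance between own characteristic and the rival mean
    blp_x1 = abs(rival_mean_x1 - x1)
  ) %>%
  ungroup()

print(toy_blp)
```

For product 1 in m1, the rivals' mean is (2 + 6) / 2 = 4, so the instrument equals |4 - 1| = 3. The single-product market m3 has no rivals; without the guard, the division by zero would yield `NaN` rather than `NA`.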

1.1.2 (b) Hausman–Nevo Instruments (Across-Market, Same Product)

For each product \(j\), let \(M_j\) be the set of markets in which it is sold, with \(|M_j| > 1\), and define the leave-one-out mean price:

\[ \bar{p}_{-m,j} = \frac{1}{|M_j| - 1} \sum_{\substack{r \in M_j \\ r \neq m}} p_{j,r} \]

The Hausman–Nevo instrument is the average price of the same product in other markets:

\[ Z_{j,m}^{(HN)} = \bar{p}_{-m,j} \]

Prices of the same product in different markets share common cost shocks, so the instrument is correlated with \(p_{j,m}\). It is valid only if demand shocks are uncorrelated across markets.

# 2b) Hausman-Nevo: mean price of same product in other markets (leave-market-out)
productData <- productData %>%
  group_by(product_id) %>%
  mutate(price_mean_other_markets = (sum(price) - price)/(length(cdid)-1) ) %>%
  ungroup()
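The relevance argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: prices are generated as a product-level cost component plus independent market-level noise, and the names `sim`, `cost_j`, and `z_hn`, together with all parameter values, are hypothetical.

```r
library(dplyr)

set.seed(1)
J <- 50   # number of products (assumed)
M <- 20   # number of markets (assumed)

# Every product is sold in every market
sim <- expand.grid(product_id = 1:J, cdid = 1:M)

# Price = common product-level cost + independent market-specific noise
cost_j <- rnorm(J)
sim$price <- cost_j[sim$product_id] + rnorm(nrow(sim), sd = 0.5)

# Hausman-Nevo instrument: mean price of the same product in the other markets
sim <- sim %>%
  group_by(product_id) %>%
  mutate(z_hn = (sum(price) - price) / (n() - 1)) %>%
  ungroup()

# Relevance: the instrument should be strongly correlated with own price
relevance <- cor(sim$price, sim$z_hn)
print(relevance)
```

Because the noise is drawn independently across markets, the instrument here is valid by construction. With real data that independence is an assumption, not something the correlation can verify.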

1.1.3 (c) Waldfogel Instruments (Regional Demographics)

Let \(D_{m}\) denote a demographic variable (e.g., mean income) in market \(m\), and let \(R_r\) be the number of markets in region \(r\).
Then the leave-one-out regional demographic mean for market \(m \in r\) is:

\[ \bar{D}_{-m,r} = \frac{1}{R_r - 1} \sum_{\substack{s \in r \\ s \neq m}} D_{s} \]

The Waldfogel instrument for market \(m\) is:

\[ Z_{m}^{(W)} = \bar{D}_{-m,r} \]

Demographic variation across regions affects market-level demand composition and hence equilibrium prices, but is plausibly exogenous to individual product quality.

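# 2c) Waldfogel: same leave-one-out construction as 2b, with price replaced by
#     a demographic variable and averaging restricted to markets in the same region.
#     NOTE: the loaded data list no demographic column; xi is used as the input here.
#     Swap in the demographic variable (e.g., income) when it is available.
productData <- productData %>%
  group_by(product_id, region) %>%
  mutate(othr_mkt_inc = (sum(xi) - xi) / (n() - 1)) %>%
  ungroup()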
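The formula for \(Z_{m}^{(W)}\) is defined at the market level, so one way to follow it literally is to build the instrument on a one-row-per-market table and then merge it onto the product rows. The sketch below assumes a hypothetical `income` column and toy tables named `markets` and `products`; neither exists in the loaded dataset.

```r
library(dplyr)

# Hypothetical market-level table: one row per market, with its region and income
markets <- data.frame(
  cdid   = c("m1", "m2", "m3", "m4", "m5"),
  region = c("A", "A", "A", "B", "B"),
  income = c(30, 40, 50, 20, 60)
)

# Leave-one-out mean income of the other markets in the same region
markets <- markets %>%
  group_by(region) %>%
  mutate(z_w = (sum(income) - income) / (n() - 1)) %>%
  ungroup()

# Hypothetical product-level rows; the instrument is constant within a market
products <- data.frame(
  cdid       = c("m1", "m1", "m2", "m4"),
  product_id = c(1, 2, 1, 1)
)

products_iv <- left_join(products, select(markets, cdid, z_w), by = "cdid")
print(products_iv)
```

For market m1 in region A, the instrument is (40 + 50) / 2 = 45, and both products sold in m1 receive that value.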

Summary Table

Instrument Type	Formula	Source of Variation
BLP (within-market)	\(Z_{j,m}^{(k,BLP)} = |\bar{x}_{-j,m}^{(k)} - x_{j,m}^{(k)}|\)	Distance from rivals’ characteristics
Hausman–Nevo	\(Z_{j,m}^{(HN)} = \bar{p}_{-m,j}\)	Same product prices across markets
Waldfogel	\(Z_{m}^{(W)} = \bar{D}_{-m,r}\)	Demographic variation across regions

1.2 Estimating Discrete Choice via Standard Logit:

1.2.1 Berry Mean Utility Transformation

To linearize the discrete-choice demand system, we transform observed market shares into the Berry (1994) mean utility term.

For each product \(j\) in market \(m\):

\[ \delta_{j,m} = \ln(s_{j,m}) - \ln(s_{0,m}) \]

where:

- \(s_{j,m}\) is the observed market share of product \(j\) in market \(m\);
- \(s_{0,m}\) is the share of the outside good, computed as

\[ s_{0,m} = 1 - \sum_{j \in J_m} s_{j,m} \]

with \(J_m\) being the set of products sold in market \(m\).
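The transformation follows from the standard logit share formulas once the mean utility of the outside good is normalized to zero. A short derivation, using only the symbols defined above:

\[ s_{j,m} = \frac{\exp(\delta_{j,m})}{1 + \sum_{k \in J_m} \exp(\delta_{k,m})}, \qquad s_{0,m} = \frac{1}{1 + \sum_{k \in J_m} \exp(\delta_{k,m})} \]

Dividing the first expression by the second cancels the common denominator:

\[ \frac{s_{j,m}}{s_{0,m}} = \exp(\delta_{j,m}) \quad \Longrightarrow \quad \ln(s_{j,m}) - \ln(s_{0,m}) = \delta_{j,m} \]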

Interpretation

\(\delta_{j,m}\) represents the mean utility level of product \(j\) relative to the outside option, whose mean utility is normalized to zero.
Taking logarithmic differences yields a linear form suitable for estimation:

\[ \delta_{j,m} = \mathbf{x}_{j,m}' \beta + \xi_{j,m} \]

where \(\mathbf{x}_{j,m}\) are observed product characteristics (including price, which is endogenous and is the reason for the instruments in Section 1.1), \(\beta\) are taste parameters, and \(\xi_{j,m}\) captures unobserved quality.

We begin with the simplest case, estimating the logit demand equation for a single market \(m = 1\).
This corresponds to the case where asymptotics rely on the number of products \(J_m \to \infty\), not on the number of markets.

# ---------------------------
# Compute Berry delta (log(share) - log(outside_share))
# ---------------------------
productData <- productData %>%
  group_by(cdid) %>%
  mutate(sum_sh = sum(share), 
         outside_share = pmax(1 - sum_sh, 1e-9),
         delta = log(share) - log(outside_share)) %>%
  ungroup()
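A quick sanity check of the transformation is a round trip: start from known mean utilities, compute the logit shares they imply, and confirm that the same pipeline recovers them. The toy values and the column `delta_true` below are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical single-market example: three products with known mean utilities
toy <- data.frame(
  cdid       = "m1",
  delta_true = c(1, 0, -1)
)

# Logit shares implied by delta_true (outside good normalized to delta = 0)
toy$share <- exp(toy$delta_true) / (1 + sum(exp(toy$delta_true)))

# Same transformation as above: delta = log(share) - log(outside share)
toy <- toy %>%
  group_by(cdid) %>%
  mutate(
    outside_share = pmax(1 - sum(share), 1e-9),
    delta = log(share) - log(outside_share)
  ) %>%
  ungroup()

# delta should reproduce delta_true up to floating-point error
print(all.equal(toy$delta, toy$delta_true))
```

The `pmax(..., 1e-9)` floor only matters when inside shares sum to one or more. In that case the recovered `delta` is an artifact of the floor, so such markets are worth inspecting before estimation.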