Generation of synthetic datasets for discrete choice analysis |
| |
Authors: | Laurie A. Garrow, Tudor D. Bodea, Misuk Lee |
| |
Institution: | (1) School of Civil and Environmental Engineering, Georgia Institute of Technology, 790 Atlantic Drive, Atlanta, GA 30332-0355, USA; (2) InterContinental Hotels Group, Atlanta, GA, USA |
| |
Abstract: | Despite the widespread use of synthetic data in discrete choice analysis, little is known about how the methodology used to
generate synthetic datasets influences the properties of parameter estimates and the validity of results based on these estimates.
There are two potential sources of bias when using synthetic discrete choice data: (1) bias due to the method used
to generate the dataset and (2) bias due to parameter estimation. The primary objective of this study is to examine bias
due to the underlying data generation method. This study compares three methods for generating synthetic datasets and uses
design of experiments and analysis of variance methods to investigate how well each method recovers the “true” logsum parameters
of nested logit models. The method that uses nested logit probabilities to generate the chosen alternative yields unbiased
parameter estimates. For the method based on Gumbel error component approximations, the error components themselves are
unbiased, but subtle empirical identification problems can arise when these error components are combined with
synthetically generated utility functions. For the method based on normal error component approximations, all
logsum coefficients are biased upward; the bias increases dramatically for nests with a low choice frequency
and is most pronounced for nests with high correlations among alternatives. Based on these results, several
recommendations for generating synthetic datasets for discrete choice analysis are provided. |
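As a rough illustration of the first generation approach (drawing each chosen alternative directly from nested logit probabilities), here is a minimal Python sketch. It assumes a two-level nested logit with systematic utilities V_j and logsum parameters mu_m in (0, 1]; the function names, nests, and utility values are hypothetical and are not taken from the paper's code or data.

```python
import math
import random
from collections import Counter

# Hypothetical helper names: a sketch of simulating choices from nested logit
# (NL) probabilities, not the paper's implementation.


def nested_logit_probs(utilities, nests, logsum):
    """Choice probabilities for a two-level nested logit model.

    utilities: alternative -> systematic utility V_j
    nests:     nest -> list of alternatives in that nest
    logsum:    nest -> logsum parameter mu_m, assumed to lie in (0, 1]
    """
    # Inclusive value of each nest: I_m = log(sum_{j in m} exp(V_j / mu_m))
    inclusive = {
        m: math.log(sum(math.exp(utilities[j] / logsum[m]) for j in alts))
        for m, alts in nests.items()
    }
    # Denominator of the marginal nest probability: sum_l exp(mu_l * I_l)
    denom = sum(math.exp(logsum[m] * inclusive[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(logsum[m] * inclusive[m]) / denom  # P(m)
        for j in alts:
            # P(j | m) = exp(V_j / mu_m - I_m); P(j) = P(m) * P(j | m)
            probs[j] = p_nest * math.exp(utilities[j] / logsum[m] - inclusive[m])
    return probs


def simulate_choice(probs, rng):
    """Draw one chosen alternative by inverting the cumulative distribution."""
    u = rng.random()
    cumulative = 0.0
    for alt, p in probs.items():
        cumulative += p
        if u < cumulative:
            return alt
    return list(probs)[-1]  # guard against floating-point round-off


# Illustrative (made-up) utilities and nesting structure.
utilities = {"car": 0.5, "bus": 0.0, "rail": 0.2, "walk": -1.0}
nests = {"auto": ["car"], "transit": ["bus", "rail"], "nonmotorized": ["walk"]}
logsum = {"auto": 1.0, "transit": 0.5, "nonmotorized": 1.0}

probs = nested_logit_probs(utilities, nests, logsum)
rng = random.Random(2024)
n_draws = 20000
counts = Counter(simulate_choice(probs, rng) for _ in range(n_draws))
shares = {j: counts[j] / n_draws for j in utilities}
print({j: round(p, 3) for j, p in probs.items()})
print(shares)
```

Setting every logsum parameter to 1 collapses these probabilities to multinomial logit, which makes a quick sanity check. The error component approaches, by contrast, would simulate random error terms for each utility and select the highest-utility alternative; this sketch does not attempt that.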
| |
Keywords: | |
This article has been indexed by SpringerLink and other databases. |