A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data |
| |
Affiliation: | 1. Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 77843, United States; 2. Perceptron Learning Solutions Pvt Ltd, Bengaluru, India; 3. Texas A&M Transportation Institute, Texas A&M University, College Station, TX 77843, United States |
| |
Abstract: | Crash data are often characterized by over-dispersion, a heavy (long) tail, and many zero-valued observations. Over the last few years, a small number of researchers have started developing and applying novel multi-parameter models to analyze such data. These multi-parameter models have been proposed to overcome the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work on multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective was accomplished using two datasets. The new model was compared to the NB model and to the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the study shows that the NB-DP model performs better than the NB model when the data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset had a heavy tail but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large number of zeros. In addition to greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion. |
| |
Keywords: | Negative binomial; Dirichlet process; Generalized linear model; Crash data |
This article is indexed in ScienceDirect and other databases. |
|
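The abstract describes the NB-DP model only in words, so a small simulation may help make the idea concrete. The sketch below is illustrative and is not the authors' implementation: it assumes a log link with a single covariate, a random intercept whose distribution is drawn from a truncated stick-breaking approximation of a Dirichlet process, and a standard normal base measure. The function names, the parameters `beta0`, `beta1`, `phi`, `alpha`, and the truncation level `n_atoms` are all hypothetical choices made for this example.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    betas[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    return weights / weights.sum()  # guard against floating-point drift


def simulate_nb_dp(x, beta0, beta1, phi, alpha, n_atoms=50, seed=0):
    """Simulate counts from an NB GLM with DP-distributed random intercepts.

    Assumptions (not taken from the paper): log link, one covariate,
    N(0, 1) base measure, truncation at n_atoms mixture components.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = rng.normal(0.0, 1.0, size=n_atoms)  # candidate random-effect values
    labels = rng.choice(n_atoms, size=x.size, p=weights)  # cluster of each site
    mu = np.exp(beta0 + beta1 * x + atoms[labels])
    # NB written as a Poisson-gamma mixture: frailty has mean 1, variance 1/phi
    frailty = rng.gamma(shape=phi, scale=1.0 / phi, size=x.size)
    y = rng.poisson(mu * frailty)
    return y, labels


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 500)
    y, labels = simulate_nb_dp(x, beta0=0.5, beta1=1.0, phi=2.0, alpha=1.0)
    print("mean:", y.mean(), "variance:", y.var())
    print("clusters used:", np.unique(labels).size)
```

Sites that share a label share the same random intercept, which is one way to picture the clustering by-product mentioned in the abstract. Fitting such a model to real crash data would require posterior inference (for example MCMC), which this sketch does not attempt.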