Models for Injury Count Data in the U.S. National Health Interview Survey

Peng, Jin and Lyu, Tianmeng and Shi, Junxin and Nagaraja, Haikady N. and Xiang, Huiyun (2014) Models for Injury Count Data in the U.S. National Health Interview Survey. Journal of Scientific Research and Reports, 3 (17). pp. 2286-2302. ISSN 2320-0227


Official URL: https://doi.org/10.9734/JSRR/2014/9490

Abstract

Aims: To identify the best-fitting count data model for injury data in the National Health Interview Survey (NHIS), and to compare that model with the traditional logistic regression model for analyzing NHIS injury data.
Data Source: Medically consulted non-occupational injury data for 2006-2010 from the National Health Interview Survey (NHIS).
Methodology: Six count data models, namely Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), zero-inflated NB (ZINB), hurdle Poisson (HP), and hurdle NB (HNB), were compared using the likelihood ratio (LR) test and the Vuong test. The injury count was the dependent variable in the count data models. Independent variables were age, gender, marital status, race, education, poverty status, disability status, and medical insurance coverage status. In the logistic regression model, the dependent variable was the dichotomized injury count (any injury versus none), and the same independent variables were included. The fit of the logistic regression model was assessed with the Hosmer-Lemeshow goodness-of-fit test.
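The Vuong comparison named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the counts are a hypothetical toy sample rather than NHIS data, both models are intercept-only (the paper's covariates are omitted), the Poisson mean is its closed-form maximum likelihood estimate, the ZIP model is fitted with a simple EM loop, and the helper names (`poisson_logpmf`, `zip_logpmf`, `fit_zip_em`, `vuong`) are made up for this example.

```python
import math
import statistics

def poisson_logpmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def zip_logpmf(y, pi, mu):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(mu) draw, which may itself be zero."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(-mu))
    return math.log(1 - pi) + poisson_logpmf(y, mu)

def fit_zip_em(ys, iters=500):
    """Intercept-only ZIP fitted by EM (no covariates: a simplifying assumption)."""
    n, total = len(ys), sum(ys)
    n_zero = sum(1 for y in ys if y == 0)
    pi, mu = 0.5, max(total / n, 1e-6)
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is structural
        z0 = pi / (pi + (1 - pi) * math.exp(-mu))
        expected_structural = z0 * n_zero
        # M-step: mixing weight, then Poisson mean over the non-structural part
        pi = expected_structural / n
        mu = total / (n - expected_structural)
    return pi, mu

def vuong(ll_a, ll_b):
    """Vuong statistic from per-observation log-likelihoods of models A and B.
    V > 0 favors A, V < 0 favors B; V is referred to N(0, 1)."""
    m = [a - b for a, b in zip(ll_a, ll_b)]
    v = math.sqrt(len(m)) * statistics.mean(m) / statistics.pstdev(m)
    p_favor_a = 0.5 * math.erfc(v / math.sqrt(2))  # one-sided P(Z > v)
    return v, p_favor_a

# Hypothetical toy counts with many zeros (NOT the NHIS data)
ys = [0] * 80 + [1] * 8 + [2] * 5 + [3] * 4 + [5] * 3

mu_pois = sum(ys) / len(ys)  # Poisson MLE is the sample mean
pi_zip, mu_zip = fit_zip_em(ys)

ll_zip = [zip_logpmf(y, pi_zip, mu_zip) for y in ys]
ll_pois = [poisson_logpmf(y, mu_pois) for y in ys]
v, p = vuong(ll_zip, ll_pois)
print(f"ZIP: pi={pi_zip:.3f}, mu={mu_zip:.3f}; Vuong V={v:.2f}, one-sided p={p:.4f}")
```

A positive V favors the first model passed in (ZIP here). Because ZIP reduces to Poisson when pi = 0, its log-likelihood at the maximum cannot be lower than the Poisson one, so V is non-negative in this pairing. The denominator uses the population standard deviation of the per-observation differences; for large samples the choice between population and sample standard deviation makes little difference.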
Results: Among 248,850 participants aged 18-64 years, 98.37% had no medically consulted non-occupational injury, 1.55% had one, and 0.07% had two or more. The zero-inflated negative binomial (ZINB) model provided the best fit. The logistic regression model also fit well but produced estimates that differed from those of the ZINB model.
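The reported shares allow a rough, covariate-free check of why a plain Poisson model struggles with these data. The sketch below (variable names are illustrative) turns the rounded percentages back into approximate head counts and compares them with a single marginal Poisson distribution. Treating every "2+" respondent as exactly two injuries makes the computed mean a lower bound, and a larger mean would only push the Poisson zero share further down. Because it ignores the covariates used in the paper, this is a heuristic hint of excess zeros and overdispersion, not a substitute for the LR and Vuong tests.

```python
import math

# Figures reported in the Results (shares rounded to two decimals)
n = 248_850
share = {"0": 0.9837, "1": 0.0155, "2+": 0.0007}

# Approximate head counts implied by the rounded shares
counts = {k: round(n * s) for k, s in share.items()}

# Lower bound on the mean injury count: treat every "2+" person as exactly 2
mean_lb = 1 * share["1"] + 2 * share["2+"]

# Covariate-free Poisson with that mean (an illustrative assumption)
p0 = math.exp(-mean_lb)          # predicted share with no injury
p2plus = 1 - p0 * (1 + mean_lb)  # predicted share with two or more

print("approximate counts:", counts)
print(f"mean >= {mean_lb:.4f}")
print(f"Poisson P(0)  = {p0:.2%} vs observed {share['0']:.2%}")
print(f"Poisson P(2+) = {p2plus:.3%} vs observed {share['2+']:.3%}")
```

At the lower-bound mean of 0.0169, the Poisson zero share (about 98.32%) is already below the observed 98.37%, and the predicted two-or-more share is roughly a fifth of the observed 0.07%.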
Conclusion: The zero-inflated negative binomial (ZINB) model showed potential as the best model for injury count data with excess zeros. Because multiple injuries were infrequent in these data, the logistic regression model is appropriate for assessing injury burden and identifying injury risk factors. For more frequently occurring injuries (e.g., sports injuries), however, logistic regression may undercount the total number of injuries and yield biased estimates. The evaluation procedure and model selection criteria presented in this paper offer a useful approach to modeling injury count data with excess zeros.
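The point about undercounting can be made concrete with a short calculation. As a hedged sketch, assume injury counts follow a Poisson distribution with mean λ injuries per person (an assumption for illustration only; the paper found ZINB fit best). A binary outcome records the share of people with at least one injury, 1 - exp(-λ), while the expected number of injuries is λ, so the fraction of injuries not counted is 1 - (1 - exp(-λ))/λ. The function name `dichotomization_undercount` is hypothetical, and 0.0169 is roughly the lower-bound mean implied by the reported shares (1 × 1.55% + 2 × 0.07%).

```python
import math

def dichotomization_undercount(rate):
    """Share of injuries not counted when injury counts are collapsed to
    'any injury vs none', assuming (hypothetically) Poisson counts with the
    given mean number of injuries per person."""
    people_injured = 1 - math.exp(-rate)  # P(Y >= 1): what a binary outcome records
    injuries = rate                       # E[Y]: what a count model targets
    return 1 - people_injured / injuries

# 0.0169 is roughly the mean implied by the reported shares; the larger
# rates are hypothetical stand-ins for more frequent injury types.
for rate in (0.0169, 0.1, 0.5, 1.0):
    print(f"mean {rate:<6}: {dichotomization_undercount(rate):.1%} of injuries uncounted")
```

Near the observed rate the loss is under 1%, but it grows to about a fifth at a mean of 0.5 injuries per person. With overdispersed (negative binomial) counts of the same mean, the zero share is higher, so injuries concentrate in fewer people and the loss would be larger still.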

Item Type:	Article
Subjects:	Middle Asian Archive > Multidisciplinary
Depositing User:	Managing Editor
Date Deposited:	13 Jul 2023 04:34
Last Modified:	14 Aug 2025 03:32
URI:	http://peerreview.go2articles.com/id/eprint/821
