Road Crash Risk Prediction Model for Continuous Streaming Data Environment

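Cite this article:	GAO Zhen, GAO Yi, YU Rong-jie, HUANG Zhi-qiang, WANG Xue-song. Road Crash Risk Prediction Model for Continuous Streaming Data Environment[J]. China Journal of Highway and Transport, 2018, 31(4): 280-287.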

Author names:	GAO Zhen, GAO Yi, YU Rong-jie, HUANG Zhi-qiang, WANG Xue-song

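Affiliations:	1. School of Software Engineering, Tongji University, Shanghai 201804, China; 2. Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai 201804, China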

Funding:	National Natural Science Foundation of China (71401127, 51522810); Science and Technology Commission of Shanghai Municipality project (15DZ1204800)

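Abstract (translated from the Chinese):	Existing studies mostly rely on case-control under-sampling, in which matched non-crash observations are drawn from continuous traffic flow data at a fixed ratio for each crash, and the resulting models show deficient prediction accuracy in a continuous data environment. To address this, the study builds a road crash risk prediction model for the technical setting in which urban traffic is continuously observed and dynamically regulated (hereafter, the continuous data environment). First, it proposes using full-sample traffic flow data together with "adjusting the crash classification threshold" to handle the imbalanced data classification problem in crash risk prediction. Second, it carries out an empirical study using loop detector traffic flow data and historical crash data from Shanghai's urban expressways for May and June 2014. With the area under the receiver operating characteristic curve as the metric, it compares the overall predictive ability of commonly used crash risk prediction models (logistic regression, random forest) built on the full sample and on sampled data. With the geometric mean of sensitivity and specificity as the metric, it compares how three ways of calculating the classification threshold (the Youden index method, the crash proportion method, and the cross-point method) affect the combined crash/non-crash prediction accuracy. The results show that, in a continuous data environment, modeling with full-sample data improves the model's overall predictive ability by 13.06%, and that calculating the classification threshold with the Youden index method gives the best combined crash/non-crash prediction accuracy.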
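Keywords:	traffic engineering; continuous data environment; crash risk prediction model; imbalanced data; binary classification threshold; urban expressway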
Received:	2017-09-28
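
For reference, the evaluation metrics named in the abstract have standard definitions, written out below. Here TP, FN, TN, and FP are counts of true positives, false negatives, true negatives, and false positives, with "crash" as the positive class, and t is the classification threshold applied to the predicted crash probability. The abstract does not define the crash proportion and cross-point methods, so they are not formalized here.

```latex
\begin{align}
\text{Sensitivity}(t) &= \frac{TP(t)}{TP(t) + FN(t)}, &
\text{Specificity}(t) &= \frac{TN(t)}{TN(t) + FP(t)}, \\
G\text{-mean}(t) &= \sqrt{\text{Sensitivity}(t)\cdot\text{Specificity}(t)}, &
J(t) &= \text{Sensitivity}(t) + \text{Specificity}(t) - 1, \\
t^{*}_{\text{Youden}} &= \arg\max_{t} J(t).
\end{align}
```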
Road Crash Risk Prediction Model for Continuous Streaming Data Environment

GAO Zhen, GAO Yi, YU Rong-jie, HUANG Zhi-qiang, WANG Xue-song. Road Crash Risk Prediction Model for Continuous Streaming Data Environment[J]. China Journal of Highway and Transport, 2018, 31(4): 280-287.

Authors:	GAO Zhen, GAO Yi, YU Rong-jie, HUANG Zhi-qiang, WANG Xue-song

Institution:	1. School of Software Engineering, Tongji University, Shanghai 201804, China; 2. Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai 201804, China

Abstract:	This paper develops a road crash risk prediction model for the continuous observation and dynamic management environment (hereafter, the continuous data environment) of an active traffic management (ATM) system. A traffic crash is a low-probability event, so crash cases are far outnumbered by non-crash cases, which makes crash risk prediction an imbalanced data classification problem. Existing research has mostly built models with a "matched case-control" under-sampling method that draws non-crash cases from continuous traffic flow data at a fixed ratio, and the prediction accuracy of such models in a continuous data environment is inadequate. This paper proposes building the model on the full set of traffic flow data and handling the class imbalance by "adjusting the classification threshold to discriminate crashes from non-crashes." Loop detector data and crash history data from the Shanghai expressway system for May and June 2014 were used for an empirical study. The area under the ROC curve (AUC) was used to compare commonly used crash risk prediction models (logistic regression and random forest) built on the full data set and on sampled data. The geometric mean of sensitivity and specificity was used to compare how three classification threshold methods (Youden's index, the crash proportion, and the cross-point method) affect the combined prediction accuracy for crash and non-crash cases. The results show that, in a continuous data environment, modeling with the full data set improves overall prediction capability by 13.06%, and that calculating the classification threshold with Youden's index yields the best combined prediction accuracy for crash and non-crash cases.

Keywords:	traffic engineering; continuous data environment; crash risk prediction model; imbalanced data; binary classification threshold; urban expressway
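
The threshold comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy scores, and the labels are all hypothetical. The abstract names the three threshold methods without defining them, so the readings used here are assumptions: the crash proportion method is taken to mean "use the share of crash cases in the sample as the threshold", and the cross-point method is taken to mean "use the threshold at which sensitivity and specificity are closest". Only Youden's index and the geometric mean follow their standard definitions.

```python
from math import sqrt

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is flagged as a crash."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives

def g_mean(scores, labels, threshold):
    """Geometric mean of sensitivity and specificity at a given threshold."""
    sens, spec = sens_spec(scores, labels, threshold)
    return sqrt(sens * spec)

def youden_threshold(scores, labels):
    """Observed score that maximizes J = sensitivity + specificity - 1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(scores, labels, t)) - 1)

def crash_proportion_threshold(labels):
    """ASSUMED reading: threshold equals the share of crash cases in the sample."""
    return sum(labels) / len(labels)

def cross_point_threshold(scores, labels):
    """ASSUMED reading: observed score where sensitivity and specificity are closest."""
    def gap(t):
        sens, spec = sens_spec(scores, labels, t)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)

# Hypothetical predicted crash probabilities: 8 non-crash cases, 2 crash cases.
scores = [0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.22, 0.30, 0.25, 0.40]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

thresholds = {
    "youden": youden_threshold(scores, labels),
    "crash_proportion": crash_proportion_threshold(labels),
    "cross_point": cross_point_threshold(scores, labels),
}
for name, t in thresholds.items():
    print(f"{name}: threshold={t:.3f}, g_mean={g_mean(scores, labels, t):.3f}")
```

With real data, `scores` would be the predicted probabilities from a model fitted on the full sample. The thresholds would normally be chosen on training data and the geometric mean evaluated on held-out data.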
This article is indexed in CNKI and other databases.