
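A Method to Identify Traffic Incidents Based on Social Network Data
Cite this article: LIU Zhao, HE Shanglu, LIU Yingshun. A Method to Identify Traffic Incidents Based on Social Network Data[J]. Journal of Transport Information and Safety, 2021, 39(2): 53-60.
Authors: LIU Zhao, HE Shanglu, LIU Yingshun
Affiliation: School of Automation, Nanjing University of Science and Technology, Nanjing 210094
Funding: Natural Science Foundation of Jiangsu Province (BK20180486); China Postdoctoral Science Foundation (2018M642257); Fundamental Research Funds for the Central Universities (30920021140)
Abstract: To mine traffic incidents from social network data, a machine-learning-based text identification method is studied. Raw texts are located by keyword and place and crawled with the web crawler "Beautiful Soup". The raw texts are preprocessed by regular-expression matching, duplication-degree calculation, and "0-1" labeling. Based on the features of the preprocessed texts, a feature-word selection method based on feature weights is studied. The feature weight combines a word's occurrence frequency with the proportion of texts that contain the word: the two quantities are normalized and merged by a weighted sum, giving a feature weight for every unique word in the incident texts of the training set. Feature words are selected according to this value and used as the input of the subsequent classifiers. Different classifiers and feature-word selection methods are tested and compared. The results show that the proposed feature-word selection method combined with the XGBoost classifier gives the best overall performance in traffic incident identification, with a precision of 0.6796, a recall of 0.6481, an F1 score of 0.6635, and an AUC of 0.7594.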

Keywords: intelligent transportation; social network data; traffic incident identification; text classification; machine learning
Received: 2020-06-18

A Method to Identify Traffic Incidents Based on Social Network Data
LIU Zhao, HE Shanglu, LIU Yingshun. A Method to Identify Traffic Incidents Based on Social Network Data[J]. Journal of Transport Information and Safety, 2021, 39(2): 53-60.
Authors:LIU Zhao  HE Shanglu  LIU Yingshun
Institution:School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract: A machine-learning-based text classification method is studied for identifying traffic incidents in social network data. The original texts are crawled with the web crawler "Beautiful Soup" based on keywords and location. These texts are preprocessed using regular-expression matching, duplicate removal, and "0-1" marking. Based on the features of the preprocessed texts, the paper proposes a method for selecting feature words by feature weight. The feature weight is calculated by normalizing the word frequency and the proportion of texts containing that word, then combining the two as a weighted sum. This gives a feature weight for each unique word in the traffic incident texts of the training set, which serves as the criterion for selecting feature words; the selected words are then used as the classifier inputs. A test compares different classifiers and feature-word selection methods. The results show that the proposed selection method combined with the XGBoost classifier achieves the best overall performance, with a precision of 0.6796, a recall of 0.6481, an F1 score of 0.6635, and an AUC of 0.7594.
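Keywords: intelligent transportation; social network data; traffic incident identification; text classification; machine learning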
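
The feature-weight calculation described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which normalization or which weighting coefficient is used, so min-max normalization and the parameter `alpha` are assumptions here, and the function names `feature_weights` and `select_feature_words` are hypothetical.

```python
from collections import Counter


def feature_weights(docs, alpha=0.5):
    """Score each unique word in a list of tokenized incident texts.

    docs  : list of token lists (incident texts from the training set)
    alpha : assumed mixing coefficient between the two normalized terms
    """
    if not docs:
        return {}

    # Total number of occurrences of each word across all texts.
    term_freq = Counter(w for doc in docs for w in doc)
    # Proportion of texts that contain each word at least once.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    doc_ratio = {w: doc_freq[w] / len(docs) for w in term_freq}

    def min_max(values):
        # Assumed normalization: rescale to [0, 1]; constant input maps to 1.0.
        lo, hi = min(values.values()), max(values.values())
        span = hi - lo
        return {w: (v - lo) / span if span else 1.0 for w, v in values.items()}

    tf_norm = min_max(term_freq)
    ratio_norm = min_max(doc_ratio)

    # Weighted combination of the two normalized quantities.
    return {w: alpha * tf_norm[w] + (1 - alpha) * ratio_norm[w] for w in term_freq}


def select_feature_words(docs, k, alpha=0.5):
    """Return the k highest-weighted words (ties broken alphabetically)."""
    weights = feature_weights(docs, alpha)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]
```

For example, `select_feature_words(docs, k=2)` on a handful of tokenized incident posts returns the two words that are both frequent and widespread across posts; the selected words would then be turned into classifier inputs, a step not shown here.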