
From Twitter to detector: Real-time traffic incident detection using social media data
Affiliation:1. Department of Civil and Environmental Engineering and Heinz College, Carnegie Mellon University, Pittsburgh, PA 15213, United States;2. Department of Computer Science, University of Albany SUNY, Albany, NY 12222, United States;1. Department of Civil, Environmental, and Construction Engineering, University of Central Florida, 12800 Pegasus Drive, Orlando, FL, United States;2. School of Science and Engineering, Tulane University, New Orleans, LA, United States;1. Here Global B.V., 425 W Randolph St, Chicago, IL 60606, United States;2. Department of Civil, Structural and Environmental Engineering, Department of Industrial and Systems Engineering, The State University of New York, Buffalo, NY 14260, United States;3. Sid and Reva Dewberry Department of Civil, Environmental, and Infrastructure Engineering, George Mason University, Fairfax, VA 22030, United States;1. School of Civil Engineering, The University of Queensland, Australia;2. Faculty of Built Environment and Engineering, Queensland University of Technology, Australia;3. School of ICT, Griffith University, Brisbane, Australia
Abstract: The effectiveness of traditional incident detection is often limited by sparse sensor coverage, and reporting incidents to emergency response systems is labor-intensive. We propose mining tweet texts to extract incident information on both highways and arterials as an efficient and cost-effective alternative to existing data sources. This paper presents a methodology to crawl, process and filter tweets that are freely accessible to the public. Tweets are acquired from Twitter using the REST API in real time. An adaptive data acquisition process establishes a dictionary of important keywords and keyword combinations that can imply traffic incidents (TI). Each tweet is then mapped to a high-dimensional binary vector in the feature space formed by the dictionary, and classified as either TI-related or not. All TI tweets are then geocoded to determine their locations, and further classified into one of five incident categories. We apply the methodology in two regions, the Pittsburgh and Philadelphia Metropolitan Areas. Overall, mining tweets holds great potential to complement existing traffic incident data at very low cost. A small sample of tweets acquired from the Twitter API covers most of the incidents reported in the existing data set, and additional incidents can be identified by analyzing tweet text. Twitter also provides ample additional information with reasonable coverage on arterials. Tweets that are both TI-related and geocodable account for approximately 5% of all acquired tweets. Of those geocodable TI tweets, 60–70% are posted by influential users (IU), namely public Twitter accounts mostly owned by public agencies and media, while the rest are contributed by individual users. Twitter provides more incident information on weekends than on weekdays. Within the same day, both individuals and IUs tend to report incidents more frequently during the daytime than at night, especially during traffic peak hours. Individual tweets are more likely to report incidents near the center of a city, and the volume of information decays significantly outward from the center.
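The mapping from a tweet to a binary vector over a keyword dictionary can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary entries, the tokenizer, and the function names `tokenize` and `to_binary_vector` are all hypothetical, and the abstract does not specify how keyword combinations are matched. Here a combination is assumed to match when all of its words appear anywhere in the tweet.

```python
import re

# Hypothetical dictionary: each entry is a single keyword or a keyword
# combination. The paper's actual dictionary is built adaptively and is
# not given in the abstract.
DICTIONARY = [
    ("accident",),
    ("crash",),
    ("lane", "closed"),
    ("traffic", "delay"),
    ("disabled", "vehicle"),
]


def tokenize(text):
    """Lowercase the tweet and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def to_binary_vector(text, dictionary=DICTIONARY):
    """Return one 0/1 feature per dictionary entry.

    Assumption: an entry fires (1) when every word in it occurs in the tweet.
    """
    tokens = tokenize(text)
    return [1 if all(word in tokens for word in entry) else 0
            for entry in dictionary]


if __name__ == "__main__":
    print(to_binary_vector("Crash on I-376 EB, left lane closed near exit 74"))
```

A vector of this form could then be passed to any binary classifier to separate TI-related tweets from the rest; the choice of classifier is left open here because the abstract does not name one.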
Keywords: Incident detection; Social media; Natural language processing; Geocoding; Data mining; Crowd-sourcing
This article is indexed in ScienceDirect and other databases.