
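A preprocessing method for imbalanced railway train timetable datasets based on attribute correlation analysis and clustering
Cite this article: KONG Deyue, ZHOU Shanqi, ZHU Jiansheng, YAN Libin, WU Ying. A preprocessing method for imbalanced railway train timetable datasets based on attribute correlation analysis and clustering[J]. Railway Computer Application (铁路计算机应用), 2021, 30(10): 1-5.
Authors: KONG Deyue (孔德越), ZHOU Shanqi (周姗琪), ZHU Jiansheng (朱建生), YAN Libin (闫力斌), WU Ying (吴颖)
Affiliation: 1. Institute of Computing Technologies, China Academy of Railway Sciences Corporation Limited, Beijing 100081
Funding: Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (2019F007)
Abstract: As railway train working diagrams are adjusted ever more frequently, train timetable datasets are large, have many attributes, differ widely in the number of timetable records per train number, and contain records for the same train number with similar attribute values. Analysis and mining of train timetable data therefore face a dataset imbalance problem. To address this, a preprocessing method for imbalanced railway train timetable datasets based on attribute correlation analysis and clustering is proposed. Guided by a correlation analysis between timetable attributes and a train operating indicator (seat occupancy rate), the method merges similar records that carry redundant information and lowers their share of the dataset. This weakens the adverse effect of the imbalanced dataset on subsequent analysis while retaining the main information in the data, reduces the adverse effect of excessive similar data on the performance of data analysis models, and improves the models' prediction accuracy.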

Keywords: train timetable; imbalanced dataset; data preprocessing; correlation analysis; clustering
Received: 2020-12-09

Imbalanced dataset preprocessing algorithm for train timetable based on correlation analysis and clustering
Institution: 1. Institute of Computing Technologies, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; 2. CHINA RAILWAY, Beijing 100844, China
Abstract: As railway train operation plans are adjusted more and more frequently, train timetable datasets are characterized by a large volume of data, many attributes, large differences in the number of timetable records between trains, and similar attribute values among the timetable records of the same train. Train timetable data analysis and mining therefore face the problem of imbalanced datasets. To address this, a preprocessing algorithm for imbalanced train timetable datasets based on correlation analysis and clustering is put forward. Based on a correlation analysis between train timetable attributes and a train operation index (the percentage of passenger seats utilized per train), similar records containing redundant information are merged to reduce the proportion of such records in the dataset, while the main information they contain is retained. This weakens the negative effects of imbalanced train timetable datasets on subsequent data analysis, reduces the adverse impact of excessive similar data on the performance of data analysis models, and helps improve the models' prediction accuracy.
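The two stages named in the abstract (attribute correlation analysis, then merging of similar records) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state which correlation measure or clustering algorithm the paper uses, so Pearson correlation and a simple greedy threshold grouping are assumptions here. The field names (`train`, `depart_hour`, `stops`, `platform`, `load_factor`), the threshold, the tolerance, and the sample records are all hypothetical and made up for illustration.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_attributes(records, attrs, target, threshold):
    """Keep attributes whose |correlation| with the target reaches the threshold."""
    ys = [rec[target] for rec in records]
    return [a for a in attrs
            if abs(pearson([rec[a] for rec in records], ys)) >= threshold]

def merge_similar(records, key, attrs, target, tol):
    """Greedy grouping (an assumed stand-in for the paper's clustering step):
    a record joins the first cluster whose first record has the same key and
    differs by at most `tol` on every selected attribute. Each cluster is then
    replaced by one averaged representative record."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            lead = cluster[0]
            if lead[key] == rec[key] and all(
                abs(lead[a] - rec[a]) <= tol for a in attrs
            ):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    merged = []
    for cluster in clusters:
        rep = {key: cluster[0][key], "count": len(cluster)}
        for a in attrs + [target]:
            rep[a] = mean(r[a] for r in cluster)
        merged.append(rep)
    return merged

# Made-up sample: train G1 dominates the dataset with near-identical records.
records = [
    {"train": "G1", "depart_hour": 9.0, "stops": 2, "platform": 3, "load_factor": 0.90},
    {"train": "G1", "depart_hour": 9.0, "stops": 2, "platform": 5, "load_factor": 0.92},
    {"train": "G1", "depart_hour": 9.1, "stops": 2, "platform": 3, "load_factor": 0.91},
    {"train": "G1", "depart_hour": 9.0, "stops": 2, "platform": 5, "load_factor": 0.89},
    {"train": "D7", "depart_hour": 14.0, "stops": 6, "platform": 3, "load_factor": 0.60},
    {"train": "K9", "depart_hour": 22.0, "stops": 10, "platform": 5, "load_factor": 0.40},
]

selected = select_attributes(
    records, ["depart_hour", "stops", "platform"], "load_factor", threshold=0.5
)
merged = merge_similar(records, "train", selected, "load_factor", tol=0.2)

for row in merged:
    print(row)
```

In this toy run the weakly correlated `platform` attribute is dropped before similarity is judged, and the four G1 records collapse into one representative that keeps a `count` field, so the share of near-duplicate records falls without discarding how many records were merged. A single tolerance applied to unscaled attributes is a simplification; attributes on different scales would normally be normalized first.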