To improve the accuracy of text categorization and reduce the dimension of the feature space, this paper proposes a two-stage feature selection method based on a novel category correlation degree (CCD) method and latent semantic indexing (LSI). In the first stage, the CCD method selects the most effective features for text classification and is more effective than traditional feature selection methods. In the second stage, LSI addresses two problems of document representation: it requires a high-dimensional feature space, and it ignores the semantic relations between features, both of which lead to poor categorization accuracy. LSI replaces individual terms with statistically derived conceptual indices, which uncovers the important correlations between features and reduces the dimension of the feature space. The algorithm first ranks each feature by its importance for classification using CCD, and then constructs a new semantic space among the features using LSI. Experimental results show that the method effectively reduces the dimension of the text vector and improves text categorization performance.

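Driving range prediction for battery electric vehicles is one of drivers' chief concerns. To address the low model accuracy and large relative error of existing prediction algorithms, this paper predicts driving range with a machine learning method that fuses segment regression and single-point classification. Taking real vehicle state parameters and environmental information as input, the method extracts an optimal feature set through clustering and filter-wrapper feature selection, and chooses the prediction method according to the sample size of the driving segments. Layered coupling of ambient temperature and battery state of health (SOH) improves the accuracy of segment regression, and fusing the single-point classification and segment regression models refines the final prediction. On the driving range test set, the root mean square relative error (RMSRE) is 0.035 and the mean relative error is 1.71%, so the method predicts driving range accurately and stably.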
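
The two error metrics reported above can be sketched as follows. The abstract does not spell out their formulas, so this assumes the common definitions: relative error is (predicted - actual) / actual, RMSRE is the root of the mean squared relative error, and the mean relative error averages the absolute relative errors. The function names and the sample ranges are hypothetical.

```python
import math

def relative_errors(actual, predicted):
    """Per-sample relative error (predicted - actual) / actual; actual must be nonzero."""
    return [(p - a) / a for a, p in zip(actual, predicted)]

def rmsre(actual, predicted):
    """Root mean square relative error (assumed definition)."""
    errs = relative_errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errs) / len(errs))

def mean_relative_error(actual, predicted):
    """Mean of the absolute relative errors (assumed definition)."""
    errs = relative_errors(actual, predicted)
    return sum(abs(e) for e in errs) / len(errs)

# Hypothetical driving ranges in km, for illustration only.
actual_km = [100.0, 200.0, 50.0]
predicted_km = [110.0, 190.0, 50.0]
print(rmsre(actual_km, predicted_km), mean_relative_error(actual_km, predicted_km))
```

Because both metrics divide by the actual value, they weight a 10 km miss on a short trip more heavily than the same miss on a long one.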

The rough sets and Boolean reasoning based discretization approach (RSBRA) is not suitable for feature selection for machine learning algorithms such as neural networks or SVM because discretization causes a large information loss. A modified RSBRA for feature selection is proposed and evaluated with SVM classifiers. In the presented algorithm, the level of consistency, a notion taken from rough set theory, replaces the loop stopping criterion of the RSBRA, which maintains the fidelity of the training set after discretization. The experimental results show that the modified algorithm has better predictive accuracy and a shorter training time than the original RSBRA.

To select effective feature subsets for pattern classification, a novel statistical rough set method based on generalized attribute reduction is presented. Unlike classical reduction approaches, the objects in the universe of discourse are signs of training sample sets, and the attribute values are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, the effective feature subsets are obtained as generalized attribute reducts. Experimental results show that the selected feature subsets improve classification performance.
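
As a reminder of the distance the abstract builds on, here is a minimal sketch of the Mahalanobis distance for two features. It only illustrates the distance itself, not the paper's reduction procedure; the function name and the restriction to a 2x2 covariance matrix are assumptions made to keep the example dependency-free.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from `mean` under a 2x2 covariance `cov`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the closed-form inverse (1/det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

# A feature with larger variance contributes less to the distance.
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), [[4.0, 0.0], [0.0, 1.0]]))
```

With an identity covariance the function reduces to the ordinary Euclidean distance.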

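To improve the classification accuracy of Bayesian network classifiers trained with a discriminative learning strategy, this paper analyzes how the discrepancy between the Bayesian network structure and the distribution of the variables in the data affects classifier performance. Taking a tree-structured approximation of the actual joint probability distribution as the baseline, the experiments delete edges that play no role in maximizing the conditional log-likelihood function, generating network structures that describe the same joint probability distribution at different levels of detail. The results show that discriminative parameter learning helps only when the network structure lacks expressive power; when the structure contains redundant edges, learning is instead easily constrained by them. This confirms that the view that redundant edges in the network have no effect on classifier performance is one-sided.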

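Most existing selective classification algorithms are designed for complete data, whereas real-world data are usually incomplete and contain many redundant or irrelevant attributes. Building on previous work, this paper uses the information gain ratio to construct GBSD, a hybrid selective Bayesian classifier for incomplete data. Experiments on 12 standard incomplete data sets show that GBSD not only greatly reduces the number of attributes but also improves classification accuracy and efficiency more effectively than previous work.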
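
The information gain ratio used to score attributes can be sketched as below. How GBSD treats missing values is not described in the abstract; this sketch simply assumes that records whose attribute value is missing (`None`) are skipped when scoring that attribute, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain ratio of one attribute; records with a None value are skipped (assumption)."""
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    groups = {}
    for v, y in pairs:
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([v for v, _ in pairs])
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it carries no information
    return (entropy([y for _, y in pairs]) - conditional) / split_info

# The last record has a missing attribute value and is ignored.
print(gain_ratio(["a", "a", "b", "b", None], [0, 0, 1, 1, 0]))
```

Attributes can then be ranked by this score, keeping only the highest-scoring ones for the classifier.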

Reliability parameter selection is very important during equipment project design and demonstration. This paper addresses the problem of selecting the reliability parameters and deciding how many to use. First, text mining is used to extract features from text data and prune the feature sets, and a frequent pattern tree (FPT) of the text data is constructed so that the frequent pattern growth (FPG) algorithm can infer frequent item-sets among the key factors. Then, on the basis of a fuzzy Bayesian network (FBN) and the sample distribution, the key attributes that form association relationships in the frequent item-sets, together with their main parameters, are fuzzified; subjective influence factors are eliminated; and the conditional mutual information and the maximum weight directed tree among all attribute variables are obtained. Furthermore, the hybrid model is established by inferring the fuzzy prior and conditional probabilities and deriving the parameter learning method. Finally, an example indicates that the model is credible and effective.

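The traditional way to identify high-emission mobile sources is to compare collected exhaust data against preset emission thresholds. However, the thresholds depend largely on human-defined standards and ignore the influence of external factors on exhaust emissions, so they cannot truly reflect the emission level of a mobile source. To address this problem, this paper combines machine learning algorithms and proposes a high-emission mobile source identification method based on deep feature clustering. First, a random forest algorithm selects the main features that influence the emission of each pollutant (CO, HC, NO). Second, cluster analysis of the multidimensional influencing features yields high-emission class labels. Finally, a mobile pollution source classification model based on deep forest is trained to identify high-emission targets automatically. Comparative experiments on a vehicle pollution remote sensing data set from Hefei verify the effectiveness of the proposed method.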

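Localization and mapping underpin autonomous driving in unknown environments. LiDAR depends on the geometric features of the scene, while visual images are easily disturbed by lighting, so localization and mapping algorithms that rely on laser point clouds or visual images alone have limitations. This paper proposes a vehicle self-localization algorithm based on LiDAR-visual fusion SLAM (Simultaneous Localization And Mapping), which improves overall localization performance by combining the complementary strengths of the two sensors. To exploit multi-source fusion, the front end uses the laser point cloud to obtain depth for visual features and feeds the LiDAR-visual features into the pose estimation module in a loosely coupled manner to improve robustness. To reduce the heavy computation of large-scale optimization of poses and feature points in the back end, the paper proposes a balanced selection strategy based on keyframes and a sliding window, together with a classified optimization strategy based on feature points and poses. Experimental results show that the algorithm has average relative localization errors of 0.11 m and 0.002 rad and average resource usage of 22.18% (CPU) and 21.50% (memory), and that it performs well in accuracy and robustness compared with the classic A-LOAM (Advanced implementation of LOAM) and ORB-SLAM2 (Oriented FAST and Rotated BRIEF SLAM2) algorithms.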

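For brain-computer interfaces, features of two classes of mental tasks are extracted and classified using the wavelet transform and the AR model combined with a linear discriminant criterion. The mean wavelet coefficients after the K-L transform are proposed as features, and the Fisher discriminant criterion is used for classification. The results show that this method can extract EEG signal features from a small amount of data and achieves fairly good classification performance.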

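To improve the classification accuracy of hyperspectral images (HSI), a new hierarchical ensemble learning framework for HSI classification is proposed. It uses two ensemble strategies: external and internal. In the external ensemble stage, multiple spectral and spatial features of the hyperspectral image are constructed so that the external ensemble is highly diverse, which helps improve classification accuracy. In the internal ensemble stage, the Adaboost algorithm improves the classification performance of each individual in the associated multi-feature set. Experiments on two hyperspectral data sets show that the method achieves better overall accuracy than the original Adaboost and a single classifier.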

Arc sensing plays a significant role in the control and monitoring of welding quality for aluminum alloy pulsed gas tungsten arc welding (GTAW). A method for online quality monitoring based on the adaptive boosting algorithm is proposed through analysis of the acquired arc voltage signal. Two feature extraction algorithms were developed, one in the time domain and one in the frequency domain, to extract six statistical characteristic parameters before removing the pulse interference using the wavelet packet transform (WPT). Based on these, an Adaboost classification model is established to classify the welding quality into two classes, with a classification accuracy as high as 98.81%. The Adaboost algorithm has thus been verified to be feasible for the online evaluation of welding quality.
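
A minimal sketch of two-class AdaBoost with threshold stumps shows how weak classifiers are reweighted and combined. It is a textbook version, not the paper's model: the stump weak learner, the function names, and the toy one-feature data set (standing in for the six arc-voltage statistics) are all assumptions.

```python
import math

def train_stump(X, y, w):
    """Return the weighted-error-minimizing stump as (error, feature, threshold, polarity)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] <= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol

def adaboost(X, y, rounds=3):
    """Train an ensemble of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clip to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data: the -1 class sits in the middle, so no single stump can separate it.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy set one stump alone misclassifies at least two points, while the weighted vote of three stumps recovers all six labels.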

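Predicting the supply-demand gap for ride-hailing in different urban areas can support vehicle dispatching strategies and thereby improve vehicle operating efficiency and passenger service levels. For short-term prediction of this gap, a deep learning model based on spatio-temporal data mining (Spatio-Temporal Deep Learning Model, S-TDL) is proposed. The model fuses three sub-models, a spatio-temporal variable model, a spatial attribute variable model, and an environmental variable model, and can capture the effects of spatio-temporal correlation, regional differences, and environmental changes on the supply-demand gap. A two-stage feature selection method, feature clustering followed by the maximal information coefficient, is also proposed to select the feature variables most strongly correlated with the gap, improving training efficiency and reducing overfitting. A case study on Didi Chuxing data shows that, after feature selection, the S-TDL model predicts significantly more accurately than the BP neural network, the long short-term memory network, and the convolutional neural network.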

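Weather recognition from traffic video surveillance images has become an important research topic in intelligent transportation systems. Although convolutional neural networks (CNN) have driven great progress in image recognition, existing models still face major challenges in feature representation when recognizing weather in complex traffic scenes. To extract rich semantic features, a deep neural network (DNN) model based on a joint voting mechanism is proposed. The model has two core modules: a second-order feature module based on channel and spatial attention, and a classification module based on joint voting over composite feature results. Together they extract discriminative information from images of different weather and improve recognition in complex traffic scenes. Validation experiments on two benchmark weather classification data sets show that, for weather recognition in complex scenes, the recognition accuracy of the proposed model exceeds that of the best existing weather recognition method by 1.97%.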

In recommendation systems, sparse data and cold-start users have always been challenging problems. In this paper, we model the recommendation problem as a multi-armed bandit (MAB) problem, using a linear upper confidence bound (UCB) bandit approach as the item selection strategy based on users' historical ratings and the user-item context. To let the engine recommend while it learns, we adopt probabilistic matrix factorization (PMF) in the strategy learning phase after observing the payoff. In particular, we propose a new approach to derive the upper bound statistics from the latent feature matrix. In the experiments, we evaluate the proposed model on two public datasets (Netflix and MovieLens). The model shows good results, especially on cold-start users.
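
To make the linear UCB selection rule concrete, here is a minimal sketch of the standard disjoint LinUCB arm, where each item keeps its own ridge-regression estimate and an exploration bonus. It does not reproduce the paper's PMF-based upper bound; the class name, `choose_item`, the `alpha` parameter, and the item names are assumptions.

```python
import math

class LinUCBArm:
    """One arm (item) of a disjoint LinUCB bandit; illustrative only."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength
        # Inverse of A = I + sum(x x^T), kept up to date incrementally.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward-weighted contexts

    def _a_inv_times(self, v):
        return [sum(m * x for m, x in zip(row, v)) for row in self.a_inv]

    def ucb(self, x):
        """Estimated payoff plus confidence width for context vector x."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = self.alpha * math.sqrt(sum(xi * ai for xi, ai in zip(x, a_inv_x)))
        return mean + width

    def update(self, x, reward):
        """Sherman-Morrison rank-one update of A^-1 after observing a payoff."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, a_inv_x))
        n = len(x)
        self.a_inv = [[self.a_inv[i][j] - a_inv_x[i] * a_inv_x[j] / denom
                       for j in range(n)] for i in range(n)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def choose_item(arms, x):
    """Pick the item whose upper confidence bound is largest for context x."""
    return max(arms, key=lambda name: arms[name].ucb(x))

arms = {"item_a": LinUCBArm(2), "item_b": LinUCBArm(2)}
arms["item_a"].update([1.0, 0.0], 1.0)  # a user with this context liked item_a
print(choose_item(arms, [1.0, 0.0]))
```

An arm that has never been shown keeps a wide confidence interval, which is what lets a bandit keep exploring items for cold-start users instead of only exploiting known ratings.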

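To eliminate the influence of complex transmission paths on the vibration signals of bearing rolling elements and to improve fault feature extraction, this work studies rolling element fault feature extraction based on variational mode decomposition (VMD), optimized maximum correlated kurtosis deconvolution (MCKD), and the 1.5-dimensional spectrum. It analyzes the characteristics of the raw rolling element vibration signal, the properties of early fault signals, and the effect of complex transmission paths on the vibration signal. VMD decomposes the raw vibration signal into a series of intrinsic mode functions (IMFs); a method for removing the rotational frequency component is proposed; and the two IMFs with the largest kurtosis are selected by the kurtosis criterion for reconstruction. A grid search based parameter optimization method for MCKD is developed to enhance the periodic fault features of the reconstructed signal and to eliminate the influence of complex transmission paths on the rolling element fault signal. The reconstructed signal is then analyzed with the 1.5-dimensional spectrum, establishing a new method for extracting rolling element fault features under complex transmission paths and achieving accurate diagnosis of rolling element faults. To demonstrate the method's effectiveness, the SKF6205 base rolling element data from the Case Western Reserve University (USA) bearing data are used for experimental validation and analysis. The results show that the grid search obtained the optimal filter length and impulse period parameters of the MCKD algorithm (365, 85); the optimized MCKD algorithm enhanced the fault features of the reconstructed signal, reduced irrelevant frequency components, and markedly lowered interference from other components; and under loads of 0, 735, and 1 470 W the proposed method extracted the fault characteristic frequency of the bearing rolling elements...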
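
The kurtosis criterion for choosing which IMFs to reconstruct can be sketched as follows. The sketch assumes the IMFs have already been produced by some VMD implementation (not shown) and uses the non-excess kurtosis m4 / m2^2; the function names and toy components are hypothetical.

```python
def kurtosis(signal):
    """Non-excess kurtosis m4 / m2**2; the signal must not be constant."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((s - mean) ** 2 for s in signal) / n
    m4 = sum((s - mean) ** 4 for s in signal) / n
    return m4 / (m2 * m2)

def select_and_reconstruct(components, keep=2):
    """Sum the `keep` components with the largest kurtosis; return their indices and the sum."""
    ranked = sorted(range(len(components)),
                    key=lambda i: kurtosis(components[i]), reverse=True)
    chosen = sorted(ranked[:keep])
    length = len(components[0])
    reconstructed = [sum(components[i][t] for i in chosen) for t in range(length)]
    return chosen, reconstructed

# Toy stand-ins for IMFs: one smooth oscillation and two impulsive components.
imfs = [
    [1, -1, 1, -1, 1, -1, 1, -1],   # periodic, low kurtosis
    [0, 0, 0, 10, 0, 0, 0, 0],      # single impact, high kurtosis
    [0, 0, 5, 0, 0, 0, 0, -5],      # two impacts
]
chosen, signal = select_and_reconstruct(imfs, keep=2)
print(chosen, signal)
```

Kurtosis grows with impulsiveness, so components dominated by fault impacts rank above smooth periodic ones such as the shaft rotation component.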

To improve the detection rate and lower the false positive rate of intrusion detection systems, dimensionality reduction is widely used. For this purpose, a data processing (DP) strategy combined with a support vector machine (SVM) was built. Traditional approaches either identify redundant data by expert knowledge before purging the audit data, or build a classifier from different subsets of the 41 available connection attributes. In contrast, the proposed strategy first removes the attributes whose correlation with another attribute exceeds a threshold, and then classifies two sequence samples as one class while removing either of the two samples whose similarity exceeds a threshold. Performance experiments showed that the DP and SVM strategy is superior to other existing data reduction strategies (e.g., audit reduction, rule extraction, and feature selection), and that the detection model based on DP and SVM outperforms those based on data mining, soft computing, and hierarchical principal component analysis neural networks.
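
The first step of the strategy, dropping attributes that are highly correlated with another attribute, can be sketched as below. The abstract does not name the correlation measure or the threshold, so Pearson correlation, the 0.9 threshold, the greedy keep-first order, and the sample columns are all assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep an attribute only if it is not too correlated with any kept one."""
    kept = []
    for name, col in columns.items():
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Illustrative connection records; the values are made up.
columns = {
    "duration": [1.0, 2.0, 3.0, 4.0],
    "src_bytes": [2.0, 4.0, 6.0, 8.1],   # almost a multiple of duration
    "count": [5.0, 1.0, 4.0, 2.0],
}
print(drop_correlated(columns))
```

Which attribute of a correlated pair survives depends on iteration order here; a real pipeline would need an explicit tie-breaking rule.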

Collaborative representation-based classification (CRC) is a distance-based method that draws on contributions from all samples to solve for the sparse representation coefficients. We find that integrating other distance-based features and/or adding signal preprocessing to the original samples helps enhance discrimination in classification. In this paper, we propose an improved version of the CRC method that uses the Gabor wavelet transformation to preprocess the samples and also adopts nearest neighbor (NN) features; hence we call it GNN-CRC. First, the Gabor wavelet transformation is applied to minimize the effects of the background in face images and to build Gabor features into the input data. Second, the distances computed by NN and CRC are fused to obtain a more discriminative classification. Extensive experiments are conducted to evaluate the proposed method for face recognition with different instantiations. The experimental results show that our method outperforms the naive CRC as well as some other state-of-the-art algorithms.

The rapid development of the Internet brings a variety of original information, including text and audio. However, the sheer volume makes it difficult to find the most useful knowledge rapidly and accurately. Automatic text classification based on machine learning can assign a large number of natural language documents to the corresponding subject categories according to their semantics, which helps readers grasp the text information directly. A traditional supervised classifier for text categorization (TC) is obtained by learning from a set of hand-labeled documents. However, labeling all data by hand is labor intensive and time consuming. To solve this problem, some scholars proposed semi-supervised learning methods to train the classifier, but these are infeasible for the great variety and number of Web data because they still need some hand-labeled data. In 2012, Li et al. invented a fully automatic categorization approach for text (FACT) based on supervised learning, which requires no manual labeling effort. But automatically labeling all data can introduce noise into the experiment, so the result may fail to meet the accuracy requirement. We put forward a new idea: a part of the data is tagged automatically with high accuracy based on the semantics of the category name, then a classifier is trained in a semi-supervised way with both labeled and unlabeled data, and ultimately a precise classification of massive text data is achieved. The empirical experiments show that the method outperforms the supervised support vector machine (SVM) in terms of both F1 performance and classification accuracy in most cases, which demonstrates the effectiveness of the semi-supervised algorithm in automatic TC.
