Similar Documents
1.
The rapid development of the Internet has produced a wide variety of raw information, including text and audio. However, its sheer volume makes it difficult to find the most useful knowledge quickly and accurately. Automatic text classification based on machine learning can assign large numbers of natural-language documents to the appropriate subject categories according to their semantics, which helps users grasp textual information directly. Learning from a set of hand-labeled documents yields a traditional supervised classifier for text categorization (TC). However, manually labeling all data is labor-intensive and time-consuming. To address this, some researchers proposed training classifiers with semi-supervised learning, but because it still requires some hand-labeled data, it is impractical for the great variety and volume of Web data. In 2012, Li et al. proposed FACT (fully automatic categorization approach for text), a supervised approach that requires no manual labeling. However, automatically labeling all of the data can introduce noise, so the results may not meet accuracy requirements. We instead propose to automatically label only a subset of the data, with high accuracy, based on the semantics of the category names, and then to train the classifier in a semi-supervised way on both labeled and unlabeled data, ultimately achieving accurate classification of massive text collections. Experiments show that the method outperforms a supervised support vector machine (SVM) in both F1 score and classification accuracy in most cases, demonstrating the effectiveness of semi-supervised learning for automatic TC.
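A minimal sketch of the idea above, assuming that "labeling by category-name semantics" can be approximated by a simple keyword rule (a document is auto-labeled only when keywords of exactly one category appear) and that the semi-supervised step is plain self-training with multinomial naive Bayes. Both choices, and every function name here, are illustrative assumptions rather than the authors' method (which is compared against an SVM).

```python
import math
from collections import Counter

def seed_labels(docs, category_keywords):
    """Hypothetical high-precision seeding rule: label a document only when
    keywords of exactly one category occur in it."""
    labeled, unlabeled = [], []
    for doc in docs:
        tokens = set(doc.split())
        hits = [c for c, kws in category_keywords.items() if tokens & kws]
        if len(hits) == 1:
            labeled.append((doc, hits[0]))
        else:
            unlabeled.append(doc)
    return labeled, unlabeled

def train_nb(labeled):
    """Multinomial naive Bayes counts; Laplace smoothing is applied in predict_nb."""
    class_docs = Counter(c for _, c in labeled)
    word_counts = {c: Counter() for c in class_docs}
    for doc, c in labeled:
        word_counts[c].update(doc.split())
    vocab = {w for wc in word_counts.values() for w in wc}
    return class_docs, word_counts, vocab

def predict_nb(model, doc):
    """Return (best class, posterior probability of that class)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for c in class_docs:
        denom = sum(word_counts[c].values()) + len(vocab)
        s = math.log(class_docs[c] / total_docs)
        for w in doc.split():
            if w in vocab:  # words never seen in training are ignored
                s += math.log((word_counts[c][w] + 1) / denom)
        scores[c] = s
    best = max(scores, key=scores.get)
    z = sum(math.exp(s - scores[best]) for s in scores.values())
    return best, 1.0 / z

def self_train(docs, category_keywords, rounds=5, per_round=1):
    """Seed with keyword labels, then repeatedly move the most confident
    predictions on unlabeled documents into the labeled pool and retrain."""
    labeled, unlabeled = seed_labels(docs, category_keywords)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train_nb(labeled)
        ranked = sorted(unlabeled, key=lambda d: predict_nb(model, d)[1], reverse=True)
        for doc in ranked[:per_round]:
            labeled.append((doc, predict_nb(model, doc)[0]))
            unlabeled.remove(doc)
    return train_nb(labeled)

docs = [
    "the team won the football match",
    "stock market shares fell today",
    "the striker scored in the match",
    "investors sold shares as the market fell",
]
keywords = {"sports": {"football"}, "finance": {"stock"}}  # hypothetical category-name keywords
model = self_train(docs, keywords)
```

Self-training is only one of several semi-supervised schemes; it needs at least one seed document per category, and its errors can compound if the seeding rule is noisy.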

2.
This paper proposes a novel feature selection method, LUIFS (latent utility of irrelevant feature selection), that not only selects relevant features but also aims to discover latently useful irrelevant attributes by measuring how much they support other attributes. The method minimizes information loss while maximizing final classification accuracy. The classification error rates of LUIFS on 16 real-world datasets from the UCI Machine Learning Repository were evaluated with the ID3, Naïve Bayes, and IB (instance-based classifier) learning algorithms, and compared with those of the same algorithms using no feature selection (NoFS), feature subset selection (FSS), and correlation-based feature selection (CFS). The empirical results show that LUIFS can improve the performance of learning algorithms by taking the latent relevance of irrelevant attributes into account, and hence including these potentially important attributes in the optimal feature subset for classification.
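Why an attribute that looks irrelevant on its own can still be useful is easiest to see with mutual information. The sketch below is not the LUIFS measure (the abstract does not define it); it only illustrates the underlying effect on an XOR toy dataset, where attribute `x1` carries zero information about the class alone but a full bit once `x2` is known.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))

x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(x1, x2)]  # XOR target

alone = mutual_information(x1, label)                       # x1 looks irrelevant
given_x2 = conditional_mutual_information(x1, label, x2)    # but supports x2
```

A filter that scores attributes one at a time would drop `x1` here, which is the kind of loss the abstract says LUIFS is designed to avoid.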

3.
Weibo, also known as micro-blog, with its extremely low barrier to publishing information and its interactive mode of communication, has become the primary source and channel of Internet hotspots. However, because Weibo posts are short texts, their sparse semantic features and colloquial, diverse expressions make clustering analysis more difficult. To address these problems, we use the Biterm Topic Model (BTM) to extract features from the corpus and the vector space model (VSM) to strengthen them, reducing vector dimensionality and highlighting the main features. We then propose an improved incremental clustering algorithm that incorporates Weibo-specific features, together with a formula for calculating Weibo buzz, so that hotspots can be discovered reasonably. Experimental results show that the proposed incremental clustering algorithm effectively improves clustering accuracy across different dimensions. Meanwhile, the buzz formula reasonably describes, from a qualitative point of view, how Weibo buzz evolves, which helps discover hotspots effectively.
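The abstract does not give the improved algorithm or the buzz formula, so the sketch below shows only the generic single-pass incremental clustering that such methods usually extend: each incoming post joins its most similar cluster when cosine similarity reaches a threshold, otherwise it starts a new cluster. It assumes posts are already sparse feature vectors (for example BTM/VSM weights, which are not computed here); the class name and threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class IncrementalClusterer:
    """Single-pass clustering with running-mean centroids (hypothetical baseline)."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold
        self.centroids = []  # one dict (feature -> mean weight) per cluster
        self.sizes = []

    def add(self, vec):
        """Assign one post and return its cluster index."""
        best, best_sim = -1, -1.0
        for i, c in enumerate(self.centroids):
            sim = cosine(vec, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            n, c = self.sizes[best], self.centroids[best]
            for k in set(c) | set(vec):  # update the centroid as a running mean
                c[k] = (c.get(k, 0.0) * n + vec.get(k, 0.0)) / (n + 1)
            self.sizes[best] = n + 1
            return best
        self.centroids.append(dict(vec))  # no close cluster: open a new one
        self.sizes.append(1)
        return len(self.centroids) - 1

clusterer = IncrementalClusterer(threshold=0.4)
posts = [{"quake": 1, "sichuan": 1}, {"quake": 1, "rescue": 1}, {"football": 1, "final": 1}]
labels = [clusterer.add(p) for p in posts]
```

Because each post is processed once, the cost grows with the number of clusters rather than with the whole history, which suits a continuously arriving Weibo stream.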

4.
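To improve the classification accuracy of hyperspectral images (HSI), a new hierarchical ensemble learning framework for HSI classification is proposed. Two ensemble strategies are used: an outer ensemble and an inner ensemble. In the outer ensemble stage, multiple spectral and spatial features of the hyperspectral image are constructed, giving the outer ensemble high diversity, which helps improve classification accuracy. In the inner ensemble stage, the AdaBoost algorithm is applied to the individual classifiers within the associated multi-feature sets to improve their classification performance. Experimental results on two hyperspectral datasets show that, compared with the original AdaBoost and single classifiers, the method achieves better overall accuracy.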
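For readers unfamiliar with the inner-ensemble component, here is a minimal sketch of standard discrete AdaBoost with axis-aligned decision stumps on a toy 1-D dataset. It shows the textbook reweighting rule only; the hierarchical framework, the spectral/spatial features, and the base learners actually used for HSI are not reproduced, and the data are made up.

```python
import math

def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] >= thr else -polarity

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against log(0) for a perfect stump
        if err >= 0.5:
            break  # weak learner no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # training set already separated
        # up-weight misclassified samples, down-weight correct ones, renormalise
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, -1, -1]  # an interval no single stump can represent
model = adaboost(X, y, rounds=3)
```

No single stump fits this interval, but three weighted stumps do, which is the "improving individual classifiers" effect the inner ensemble relies on.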

5.
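To address incomplete pavement crack detection and fragmented segmentation, a multi-scale feature-enhanced pavement crack detection network, MFENet, is proposed for end-to-end detection, classification, and segmentation of pavement crack images. A multi-scale attention feature enhancement module is designed, which establishes a mapping between the upper-layer multi-scale feature channels and the weight coefficients of the lower-layer feature channels so as to boost the output of effective channels. Based on the correlation in physical position between the coordinate information and the pixel semantic information of pavement cracks, a multi-semantic feature association module is designed to fuse and enhance features across different kinds of semantic information, and foreground feature filtering of pavement crack images is achieved through feature dimension transformation. A method for quantitatively evaluating the strength of deep features is proposed to improve the interpretability of the model's feature-extraction ability. Results on a self-collected dataset show that, for pavement crack detection, MFENet improves mean precision and mean recall by 4.3% and 5.4% over Mask R-CNN, and by 14.6% and 14.3% over the baseline model RDSNet; for pavement crack segmentation, MFENet improves mean precision and mean recall by 6.6% and 8.8% over Mask R-CNN, and by 8.1% and 9.7% over RDSNet. Compared with mainstream methods such as Mask R-CNN, MFENet achieves the highest detection and segmentation accuracy on the different types of pavement crack images. On the public datasets (CFD, C...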
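The abstract reports "mean precision" and "mean recall" without defining them. One common reading for segmentation, assumed here, is pixel-level precision and recall computed per image and then averaged over images; the sketch below makes that assumption explicit on tiny binary crack masks (1 = crack pixel). Function names are hypothetical.

```python
def mask_precision_recall(pred, truth):
    """Pixel-level precision and recall of a binary mask (list of 0/1 rows)."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # crack pixel correctly predicted
            elif p:
                fp += 1  # background predicted as crack
            elif t:
                fn += 1  # crack pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mean_precision_recall(pairs):
    """Average the per-image scores over (prediction, ground truth) pairs."""
    scores = [mask_precision_recall(p, t) for p, t in pairs]
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n

pairs = [
    ([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]),  # 2 TP, 1 FP, 1 FN
    ([[0, 0], [1, 1]], [[0, 0], [1, 1]]),              # perfect prediction
]
mp, mr = mean_precision_recall(pairs)
```

Averaging per image (rather than pooling all pixels) gives thin, small cracks the same weight as large ones; which convention the paper uses is not stated.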

6.
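To address the poor vehicle detection performance caused by insufficient feature extraction from bird's-eye-view (BEV) representations of 3D point clouds, this paper proposes a two-stage 3D point cloud vehicle detection algorithm based on pyramid feature fusion. First, the raw 3D point cloud is reduced in dimension and encoded by voxel occupancy to obtain a 2D feature map as input. Then, an upsampling network propagates high-level semantic features and a downsampling network propagates low-level positional features, forming a first-stage pyramid network that extracts vehicle features. Finally, a region proposal layer generates candidate regions at different scales, a region-of-interest (RoI) pooling layer aligns the scales of the candidate regions, and fully connected layers fuse the multi-scale features to extract vehicle features under different receptive fields. In addition, a sine-cosine angle loss is added, with a weight, to the total loss function to improve vehicle heading-angle prediction. Experiments on the public KITTI dataset show that, compared with the baseline network, the algorithm effectively supplements BEV feature extraction and improves average detection precision by 5.07% to 8.59% across detection tasks of different difficulty levels.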
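Two steps in the abstract are simple enough to sketch: encoding a point cloud as a BEV grid of voxel-occupancy channels, and a sine-cosine heading loss. The grid ranges, cell size, number of height slices, and the exact loss form are assumptions for illustration; the paper's settings and its weighting into the total loss are not given.

```python
import math

def bev_occupancy(points, x_range, y_range, z_range, cell, n_slices):
    """BEV grid[i][j][k] = 1 if any point lies in height slice k of cell (i, j).
    Assumes the x/y ranges are multiples of `cell` (hypothetical parameters)."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    dz = (z_range[1] - z_range[0]) / n_slices
    grid = [[[0] * n_slices for _ in range(ny)] for _ in range(nx)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]
                and z_range[0] <= z < z_range[1]):
            continue  # drop points outside the region of interest
        i = min(int((x - x_range[0]) / cell), nx - 1)  # clamp float round-off
        j = min(int((y - y_range[0]) / cell), ny - 1)
        k = min(int((z - z_range[0]) / dz), n_slices - 1)
        grid[i][j][k] = 1
    return grid

def sin_cos_angle_loss(pred_yaw, true_yaw):
    """Squared error between (sin, cos) encodings of two heading angles.
    Unlike (pred - true)**2, it is ~0 for angles that differ by nearly 2*pi."""
    return ((math.sin(pred_yaw) - math.sin(true_yaw)) ** 2
            + (math.cos(pred_yaw) - math.cos(true_yaw)) ** 2)

points = [(0.5, 0.5, 0.2), (0.6, 0.4, 1.5), (1.5, 0.5, 0.1), (5.0, 5.0, 5.0)]
grid = bev_occupancy(points, (0, 2), (0, 2), (0, 2), cell=1.0, n_slices=2)
wrap_loss = sin_cos_angle_loss(math.pi - 0.01, -math.pi + 0.01)
```

The height slices become image channels, so a standard 2D convolutional backbone can consume the result; the angle term avoids penalising headings near the ±π wrap-around.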

7.
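This paper proposes using the least squares support vector machine (LS-SVM) algorithm to classify instrumental music and thereby identify instruments. After an in-depth discussion of LS-SVM theory, instrumental music clips are selected as samples and features are extracted, including spectral features, short-time autocorrelation coefficients, and MFCCs; the LS-SVM algorithm is then used for classification. Simulation experiments on samples of guqin, guzheng, konghou, and pipa music measure classification accuracy and running time, with logistic regression used for comparison: the classification accuracies of LS-SVM and logistic regression are 96.5% and 92.5% respectively, and LS-SVM also runs faster than logistic regression. The results show that LS-SVM has superior classification performance and nonlinear processing capability and can be extended to other practical classification problems.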
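What distinguishes LS-SVM from a standard SVM is that training reduces to one linear system instead of a quadratic program. The sketch below uses the common Suykens formulation with an RBF kernel, solving [[0, yᵀ], [y, Ω + I/γ]]·[b; α] = [0; 1] with Ω_ij = y_i y_j K(x_i, x_j). The toy 2-D vectors stand in for the audio features (spectral, autocorrelation, MFCC), which are not computed here, and the values of `gamma` and `sigma` are arbitrary.

```python
import math

def rbf(a, b, sigma):
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting on a copy of A."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Build and solve the LS-SVM KKT system; labels must be +1 / -1."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = y[i]
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf(X[i], X[j], sigma)
        A[i + 1][i + 1] += 1.0 / gamma  # ridge term from the squared-error loss
    sol = solve(A, [0.0] + [1.0] * n)
    return {"b": sol[0], "alpha": sol[1:], "X": X, "y": y, "sigma": sigma}

def lssvm_predict(model, x):
    s = model["b"] + sum(a * yi * rbf(x, xi, model["sigma"])
                         for a, yi, xi in zip(model["alpha"], model["y"], model["X"]))
    return 1 if s >= 0 else -1

X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
y = [-1, -1, 1, 1]
model = lssvm_train(X, y)
```

For four instruments, a binary LS-SVM like this would typically be wrapped in a one-vs-rest or one-vs-one scheme; the abstract does not say which was used.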

8.
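To combat the increasingly widespread pornographic content on the Internet, a method combining semantic chain technology with the vector space model is proposed. Semantic chain technology is used to find the semantic chain of the text to be classified; the density-vector components of this chain are compared with the semantics of pornographic (sexual-culture) texts to determine their similarity to the text, which is then assigned to the corresponding class. The classification results can then be used to filter pornographic content. Experiments show that the method identifies and filters pornographic web pages well.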

9.
For many image classification tasks, the color histogram is commonly used as an important "signature" that describes the color distribution of an image and helps infer its content. However, most traditional color histograms do not achieve satisfactory results in many image classification systems. To improve accuracy and reduce the computational complexity of classification, this paper proposes an information-based color feature representation. The mutual information between a feature and the class label is used to evaluate the discriminative power of the feature. A novel quantization scheme is presented that removes redundant color components and merges adjacent components into new features so as to maximize discriminative ability. An iterative algorithm is used to derive the color space quantization and generate the color features. To illustrate the effectiveness of the proposed color representation, a specific image classification task, namely differentiating adult images from benign ones, is used. Experimental results show that the proposed color feature achieves better classification performance and efficiency than the traditional color histogram.
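The core idea (score a quantized color feature by its mutual information with the class label, and merge adjacent bins while keeping that score high) can be sketched with a greedy loop. This is a simplified stand-in, not the paper's iterative algorithm: it does not remove redundant components, works on a single 1-D bin index, and uses made-up bin and label data.

```python
import math
from collections import Counter

def mutual_info(bins, labels):
    """I(B;C) in bits from paired samples of a bin index B and class C."""
    n = len(bins)
    joint = Counter(zip(bins, labels))
    pb, pc = Counter(bins), Counter(labels)
    return sum((k / n) * math.log2(k * n / (pb[b] * pc[c]))
               for (b, c), k in joint.items())

def greedy_merge(bins, labels, n_bins, target):
    """Merge adjacent bins, each time choosing the merge that keeps I(B;C)
    highest, until `target` bins remain. Returns old bin -> new bin."""
    mapping = list(range(n_bins))
    groups = n_bins
    while groups > target:
        best = None
        for g in range(groups - 1):
            # merge group g+1 into g and shift later groups down by one
            trial = [m if m <= g else m - 1 for m in mapping]
            mi = mutual_info([trial[b] for b in bins], labels)
            if best is None or mi > best[0]:
                best = (mi, trial)
        mapping = best[1]
        groups -= 1
    return mapping

bins = [0, 0, 1, 1, 2, 2, 3, 3]    # hypothetical color-bin index per pixel sample
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical class of the image each came from
mapping = greedy_merge(bins, labels, n_bins=4, target=2)
```

Here four bins collapse to two without losing any information about the label, which is the kind of compression that reduces feature dimensionality while preserving discriminative power.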

10.
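Weather recognition from traffic surveillance video images has become an important research topic in intelligent transportation systems. Although convolutional neural networks (CNNs) have driven great progress in image recognition, existing models still face major challenges in feature representation for weather recognition in complex traffic scenes. To extract rich semantic features, a deep neural network (DNN) model based on a joint voting mechanism is proposed. The model includes two core modules: a second-order feature module based on channel and spatial attention, and a classification module based on joint voting over composite feature results. Together they extract discriminative information from different weather images and improve weather recognition in complex traffic scenes. Validation experiments on two benchmark weather classification datasets show that, for weather recognition in complex scenes, the recognition accuracy of the proposed model exceeds that of the current best weather recognition method by 1.97%.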
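The abstract does not specify how the joint vote is computed. A minimal sketch, assuming each feature branch outputs class probabilities, each branch casts one hard vote for its arg-max class, and ties are broken by summed probabilities (a hypothetical rule chosen for illustration):

```python
from collections import Counter

def joint_vote(branch_probs):
    """branch_probs: one probability list per feature branch, same class order.
    Majority vote over branch arg-maxes; ties broken by summed probability."""
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in branch_probs)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    totals = [sum(p[c] for p in branch_probs) for c in range(len(branch_probs[0]))]
    return max(tied, key=lambda c: totals[c])

classes = ["sunny", "rainy", "foggy"]  # hypothetical class order
majority = classes[joint_vote([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]])]
tie_break = classes[joint_vote([[0.6, 0.4, 0.0], [0.1, 0.8, 0.1]])]
```

Hard voting is robust to one badly calibrated branch, while the probability tie-break keeps the decision deterministic when branches disagree evenly.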

11.
Given the diversity and ambiguity of opinion expressions in Chinese online product reviews, conventional sentiment analysis techniques have proved inadequate in both classification accuracy and identification effectiveness. We propose a novel approach that identifies product features and the corresponding opinions by building a domain-specific affective ontology and then mapping comment sentences to the objects defined in that ontology. The ontology is created automatically by processing online reviews; both product features and affective words are represented as nodes connected by their semantic relationships. Furthermore, to increase accuracy, we introduce a dynamic polarity detection technique for affective words whose sentiment orientation depends on context. Classification tests on real-world online product reviews clearly demonstrate the performance improvement of our approach over other methods.

12.
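Automatic text classification is the process of automatically determining the category of a text from its content under a given classification scheme. It is an important research direction in information retrieval today. This paper introduces research methods for automatic text classification and the vector space model representation of text, and presents training and classification algorithms for documents.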
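The abstract names the vector space model plus a training and a classification algorithm without detailing them. A common concrete instance, assumed here, is TF-IDF weighting with nearest-centroid (Rocchio-style) classification by cosine similarity; the toy corpus and function names are illustrative.

```python
import math
from collections import Counter

def tfidf(doc, idf):
    """TF-IDF weighted sparse vector of a tokenised document (unknown words dropped)."""
    return {w: tf * idf[w] for w, tf in Counter(doc).items() if w in idf}

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(docs, labels):
    """Training: compute IDF on the corpus, then sum TF-IDF vectors per class."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}  # +1 keeps common words non-zero
    centroids = {}
    for d, y in zip(docs, labels):
        centroids.setdefault(y, Counter()).update(tfidf(d, idf))
    return centroids, idf

def classify(model, doc):
    """Classification: choose the class whose centroid is most cosine-similar."""
    centroids, idf = model
    v = tfidf(doc, idf)
    return max(centroids, key=lambda y: cosine(v, centroids[y]))

docs = [["goal", "match", "team"], ["match", "striker", "goal"],
        ["stock", "shares", "market"], ["market", "bank", "stock"]]
labels = ["sport", "sport", "finance", "finance"]
model = train(docs, labels)
```

Summing rather than averaging the class vectors is harmless here because cosine similarity ignores overall vector length.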

13.
Improved local tangent space alignment (ILTSA) is a recent nonlinear dimensionality reduction method that can efficiently recover the geometric structure of sparse or non-uniformly distributed data manifolds. In this paper, by combining a modified maximum margin criterion with ILTSA, a novel feature extraction method named orthogonal discriminant improved local tangent space alignment (ODILTSA) is proposed. ODILTSA simultaneously preserves local geometric structure and maximizes the margin between different classes. Based on ODILTSA, a novel face recognition method that combines augmented complex wavelet features with original image features is developed. Experimental results on the Yale, AR, and PIE face databases demonstrate the effectiveness of ODILTSA and of the feature fusion method.

14.
Objective: Because power transformer fault diagnosis is incomplete and complex, a comprehensive rough-fuzzy scheme for solving fault diagnosis problems is presented. Fuzzy set theory is used both to represent indications of incipient faults and to produce a fuzzy granulation of the feature space. Rough set theory is used to obtain dependency rules that model indicative regions in the granulated feature space. The fuzzy membership functions corresponding to the indicative regions, modelled by the rules, are stored as cases. Results: Diagnostic conclusions are drawn using a similarity measure based on these membership functions. Each case involves only a small number of relevant features, which makes the scheme suitable for fault diagnosis. Conclusion: The superiority of the method in terms of classification accuracy and case generation is demonstrated.

15.
Automatic translation of Chinese text into Chinese Braille is important for blind people in China to access information on computers or smartphones. This paper proposes a novel Chinese-Braille translation scheme. Under this scheme, a Braille word segmentation model based on statistical machine learning is trained on a Braille corpus, and Braille word segmentation is performed directly with this model, without a Chinese word segmentation stage. This avoids hand-crafting rules based on syntactic and semantic information and lets the statistical model learn such rules implicitly and automatically. To further improve performance, an algorithm that fuses the results of Chinese word segmentation and Braille word segmentation is also proposed. Results show that the proposed method achieves 92.81% accuracy for Braille word segmentation and considerably outperforms current approaches based on the segmentation-merging scheme.
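To make "segmentation learned from a corpus" concrete, here is a deliberately simple statistical segmenter: dynamic programming that picks the maximum-probability split of a syllable sequence under a unigram word model. It is a substitute technique, not the paper's model; pinyin syllables stand in for Braille cells, and the word counts and unknown-syllable score are hypothetical.

```python
import math

def segment(syllables, word_counts, max_len=4):
    """Maximum-probability segmentation under a unigram model.
    word_counts maps tuples of syllables to corpus frequencies (hypothetical)."""
    total = sum(word_counts.values())
    unseen = math.log(0.5 / total)  # hypothetical score for an unknown single syllable
    n = len(syllables)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of syllables[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = tuple(syllables[j:i])
            if word in word_counts:
                score = math.log(word_counts[word] / total)
            elif i - j == 1:
                score = unseen
            else:
                continue
            if best[j] + score > best[i]:
                best[i], back[i] = best[j] + score, j
    words, i = [], n
    while i > 0:  # follow back-pointers from the end
        words.append(tuple(syllables[back[i]:i]))
        i = back[i]
    return words[::-1]

counts = {("wo", "men"): 5, ("ai",): 3, ("zhong", "guo"): 4,
          ("zhong",): 1, ("guo",): 1, ("wo",): 1, ("men",): 1}
result = segment(["wo", "men", "ai", "zhong", "guo"], counts)
```

The appeal noted in the abstract carries over even to this toy: the segmentation rules come from corpus statistics rather than hand-written syntactic or semantic rules.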

16.
In recent years, automatic identification of butterfly species has attracted increasing attention in various fields. Because the larvae of most species are pests, this research is not only meaningful for popular science but also important for agricultural production and the environment. Texture is a notable feature widely used in digital image recognition, and the gray-level co-occurrence matrix (GLCM), an extremely effective texture descriptor, has been used in automatic identification systems. However, in most existing work the GLCM is computed over the whole image, which may miss important features in local areas. To solve this problem, this paper presents a new method based on GLCM features extracted from three image blocks, with a weighted k-nearest neighbor (KNN) search algorithm used for classifier design. Using this method, a butterfly classification system was built for ten butterfly species that are hard to identify by shape features. The final identification accuracy is 98%.
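A minimal sketch of the two ingredients, under stated assumptions: a symmetric, normalized GLCM for one pixel offset with three common statistics (contrast, energy, homogeneity), computed on three horizontal bands of an image, and a distance-weighted KNN vote. The banding scheme, the chosen statistics, and the 1/distance weights are illustrative; the abstract does not say how the three blocks or the weights are defined.

```python
import math
from collections import Counter

def glcm(img, levels, dx=1, dy=0):
    """Normalised symmetric GLCM of an integer image (values in 0..levels-1)."""
    m = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                a, b = img[r][c], img[r2][c2]
                m[a][b] += 1
                m[b][a] += 1  # count both directions (symmetric matrix)
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m

def glcm_features(p):
    """Contrast, energy and homogeneity of a normalised GLCM."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in idx)
    energy = sum(p[i][j] ** 2 for i, j in idx)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in idx)
    return [contrast, energy, homogeneity]

def block_features(img, levels):
    """Concatenate features of three horizontal bands (hypothetical split; needs >= 3 rows)."""
    h = len(img)
    bands = [img[:h // 3], img[h // 3:2 * h // 3], img[2 * h // 3:]]
    return [f for band in bands for f in glcm_features(glcm(band, levels))]

def weighted_knn(train, query, k=3):
    """train: list of (feature vector, label); neighbours vote with weight 1/distance."""
    nearest = sorted((math.dist(x, query), y) for x, y in train)[:k]
    votes = Counter()
    for d, y in nearest:
        votes[y] += 1.0 / (d + 1e-9)
    return votes.most_common(1)[0][0]

p = glcm([[0, 0, 1], [0, 0, 1]], levels=2)
contrast, energy, homogeneity = glcm_features(p)
img6 = [[(r + c) % 4 for c in range(6)] for r in range(6)]
feats = block_features(img6, levels=4)
```

In practice the feature vectors would be scaled before KNN so that no single GLCM statistic dominates the distance.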

17.
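Driving range prediction for battery electric vehicles is one of drivers' greatest concerns. To address the low accuracy and large relative error of existing prediction models, this paper predicts driving range with a machine learning method that fuses segment regression and single-point classification. Real vehicle state parameters and environmental information serve as inputs; an optimal feature set is extracted through clustering and filter-wrapper feature selection; and the prediction method is chosen according to the number of samples in each driving segment. The accuracy of segment regression is improved by hierarchically coupling ambient temperature and battery state of health (SOH), and the final prediction is optimized by fusing the single-point classification and segment regression models. On the test set, the root mean square relative error (RMSRE) is 0.035 and the mean relative error is 1.71%, showing that the method predicts driving range accurately and stably.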
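The two reported error metrics are not defined in the abstract. Assuming their usual definitions, RMSRE = sqrt(mean(((p − t) / t)²)) and mean relative error = mean(|p − t| / t), they can be computed as follows on made-up range values (in km):

```python
import math

def rmsre(pred, true):
    """Root mean square relative error (assumed definition)."""
    return math.sqrt(sum(((p - t) / t) ** 2 for p, t in zip(pred, true)) / len(true))

def mean_relative_error(pred, true):
    """Mean absolute relative error as a fraction; multiply by 100 for percent."""
    return sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)

true_km = [100.0, 200.0]  # hypothetical actual driving ranges
pred_km = [103.0, 196.0]  # hypothetical predictions
r = rmsre(pred_km, true_km)
m = mean_relative_error(pred_km, true_km)
```

Because RMSRE squares the relative errors, it weights occasional large misses more heavily than the mean relative error does, which is why both are worth reporting.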

19.
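Quickly and accurately finding, within massive multi-source, multimodal spatiotemporal landslide disaster data, the high-value information that meets the needs of disaster assessment tasks is key to integrated disaster reduction and relief. Traditional disaster data retrieval relies mainly on passive "manual experience + keyword" searches, which struggle to balance task precision and timeliness. To address this, a task-oriented, multi-level semantic retrieval method for landslide disaster data is proposed. Explicit semantic descriptions of the data-feature requirements of landslide disaster assessment tasks are established, together with high-level semantic mappings between task requirements and data features; on this basis, a multi-level semantic matching retrieval algorithm is designed to aggregate high-value data for disaster assessment tasks. Experiments on the assessment of a landslide disaster in Maoxian County, Sichuan, show that the retrieval method has a clear advantage in query efficiency: precise retrieval of disaster data within a 900 km², 90-day window completes in seconds, and the recommended high-value datasets are highly accurate, with an average closeness of the recommended results above 90% under a 60-day time-gap threshold. The results show that the method can quickly, accurately, and reliably obtain disaster data according to task requirements, significantly improving emergency response capability for disaster reduction.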

20.
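To address the high rates of false and missed diagnoses in axle temperature monitoring systems caused by missing monitoring data, a soft-sensing method for axle temperature monitoring data based on data feature analysis is proposed. Through layout and correlation analysis of the axle temperature monitoring points, the range of source data for soft sensing is determined. A self-organizing feature map (SOM) algorithm, with source data normalization, winning-neighborhood definition, and membership optimization, determines the intrinsic dimensionality of the axle temperature data and clusters it. Multidimensional scaling (MDS) is introduced to reduce dimensionality within each cluster by quantifying the similarity of inter-sample distances and eigendecomposing the distance matrix. MDS is then applied again to further reduce the dimensionality of the reduced data across clusters, yielding a stepwise dimensionality reduction method that balances maximal information retention against minimal computation. A deep stacked autoencoder is used to extract the internal features of the reduced data, from which a soft-sensing model for missing axle temperature data is built. Results show that the soft-sensing method based on the reduced data is 14.25% more time-efficient than the one based on the raw data, while the two methods have comparable accuracy. When one dimension of data is missing, the average soft-sensing accuracy reaches 99.83%; when two dimensions are missing, it reaches 99.75%; and when three or four dimensions are missing, it reaches 99.16%. Under a maximum allowable error of 2.5% and an error tolerance of 1.0%, the proposed method recovers missing data with high accuracy and efficiency for any case with no more than four missing dimensions.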
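Of the pipeline above, the MDS step ("quantifying distance similarity and eigendecomposing the distance matrix") is the most self-contained. Below is a minimal sketch of classical (Torgerson) MDS, using double centering plus power iteration with deflation to stay within the standard library; it assumes a Euclidean distance matrix, so the SOM clustering, the two-stage scheme, and the stacked autoencoder are not reproduced.

```python
import math

def classical_mds(D, k=1, iters=200):
    """k-dimensional coordinates from a symmetric Euclidean distance matrix D."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    mean = [sum(row) / n for row in D2]  # row means (= column means, D symmetric)
    grand = sum(mean) / n
    # double centering: B = -1/2 * J D^2 J is the Gram matrix of centred points
    B = [[-0.5 * (D2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(iters):  # power iteration for the leading eigenvector
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no positive spectrum left
            v = [x / norm for x in w]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, Bv)) / sum(a * a for a in v)  # Rayleigh quotient
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(max(lam, 0.0))
        # deflation: remove the found component before searching for the next
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

xs = [0.0, 1.0, 3.0, 6.0]  # hypothetical 1-D readings
D = [[abs(a - b) for b in xs] for a in xs]
emb = classical_mds(D, k=1)
```

For points that truly lie on a line, one dimension reproduces every pairwise distance (up to sign and translation); with real monitoring data, the retained dimension count trades information against computation, as the abstract describes.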
