
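Topic Link Detection Method Based on Semantic Similarity
Cite this article: ZHAI Donghai, CUI Jingjing, NIE Hongyu, DU Jia. Topic Link Detection Method Based on Semantic Similarity[J]. Journal of Southwest Jiaotong University, 2015, 28(3): 517-522.
Authors: ZHAI Donghai, CUI Jingjing, NIE Hongyu, DU Jia
Funding: 12th Five-Year Research Plan Project of the State Language Commission (YB125-49); Key Project of Science and Technology Research of the Ministry of Education (212167); Fundamental Research Funds for the Central Universities (SWJTU12CX096); National Undergraduate Innovation and Entrepreneurship Training Program (201210694017)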
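Abstract: To effectively identify the similarity between any two stories, a topic link detection algorithm based on semantic similarity is proposed. First, the algorithm computes the relative entropy between feature words and uses it as the semantic similarity between the feature words of the two stories. Second, it obtains the relevance between a feature word and a story by computing the average semantic similarity. Finally, it combines these values with the TF-IDF (term frequency-inverse document frequency) weights of the feature words in the corpus to compute the relevance between the two stories, thereby detecting links between stories. The proposed method was compared with the existing vector space model method and with a method that relies only on average pointwise mutual information, and it was evaluated on the TDT4 Chinese corpus. The results show that the semantic-similarity-based link detection method makes better use of textual context and improves the performance of existing detection systems, reducing the minimum DET (detection error tradeoff) cost by 3%.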
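The three steps in the abstract can be sketched in Python as follows. The abstract does not say how a feature word is represented or how relative entropy is turned into a similarity, so this sketch makes hypothetical choices: each word is represented by its co-occurrence distribution within a ±2-token window, relative entropy (KL divergence) is smoothed, symmetrized and mapped to a similarity with exp(-D), and the two directed story-to-story scores are averaged. All function names and the smoothed IDF variant are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def context_distribution(word, corpus, window=2):
    """Assumed word representation: counts of tokens co-occurring with `word`
    within +/- `window` positions, over every document in `corpus`."""
    counts = Counter()
    for doc in corpus:
        for i, tok in enumerate(doc):
            if tok != word:
                continue
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i:
                    counts[doc[j]] += 1
    return counts


def relative_entropy(p_counts, q_counts, vocab, eps=1e-6):
    """KL divergence D(P || Q), with additive smoothing so Q is never zero."""
    p_total = sum(p_counts.values()) + eps * len(vocab)
    q_total = sum(q_counts.values()) + eps * len(vocab)
    d = 0.0
    for v in vocab:
        p = (p_counts[v] + eps) / p_total
        q = (q_counts[v] + eps) / q_total
        d += p * math.log(p / q)
    return d


def word_similarity(w1, w2, corpus, vocab):
    """Step 1 (assumed mapping): symmetric relative entropy -> similarity in (0, 1]."""
    if w1 == w2:
        return 1.0
    p = context_distribution(w1, corpus)
    q = context_distribution(w2, corpus)
    d = 0.5 * (relative_entropy(p, q, vocab) + relative_entropy(q, p, vocab))
    return math.exp(-d)


def word_story_relevance(word, other_feats, corpus, vocab):
    """Step 2: average semantic similarity between `word` and the other story's feature words."""
    if not other_feats:
        return 0.0
    return sum(word_similarity(word, w, corpus, vocab) for w in other_feats) / len(other_feats)


def tfidf(word, story, corpus):
    """TF-IDF weight of `word` in `story`; the smoothed IDF (always >= 1) is an assumption."""
    tf = story.count(word) / len(story)
    df = sum(1 for doc in corpus if word in doc)
    return tf * (math.log((1 + len(corpus)) / (1 + df)) + 1)


def story_relevance(story_a, story_b, corpus):
    """Step 3: TF-IDF-weighted average of word-story relevance, averaged over both directions."""
    vocab = sorted({tok for doc in corpus for tok in doc})

    def directed(src, dst):
        weights = {w: tfidf(w, src, corpus) for w in set(src)}
        total = sum(weights.values())
        if total == 0:
            return 0.0
        feats_dst = sorted(set(dst))
        return sum(wt * word_story_relevance(w, feats_dst, corpus, vocab)
                   for w, wt in weights.items()) / total

    return 0.5 * (directed(story_a, story_b) + directed(story_b, story_a))


if __name__ == "__main__":
    # Toy tokenized "stories" (illustrative data, not from TDT4).
    corpus = [
        ["earthquake", "hits", "city", "rescue", "teams"],
        ["quake", "hits", "city", "rescue", "workers"],
        ["stock", "market", "falls", "investors", "worry"],
    ]
    print(story_relevance(corpus[0], corpus[1], corpus))  # related pair
    print(story_relevance(corpus[0], corpus[2], corpus))  # unrelated pair
```

In this toy run, "earthquake" and "quake" share identical contexts, so their similarity is 1 even though the strings differ. This is the kind of contextual match that a pure term-overlap (VSM) comparison would miss.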

Keywords: link detection; semantic similarity; relative entropy; relevance degree
Received: 2014-06-30

Topic Link Detection Method Based on Semantic Similarity
ZHAI Donghai, CUI Jingjing, NIE Hongyu, DU Jia. Topic Link Detection Method Based on Semantic Similarity[J]. Journal of Southwest Jiaotong University, 2015, 28(3): 517-522.
Authors: ZHAI Donghai, CUI Jingjing, NIE Hongyu, DU Jia
Abstract: To effectively judge the topical similarity between any two stories, a topic link detection method based on semantic similarity was proposed. First, the relative entropy between the feature words of the two stories was calculated and used as their semantic similarity. Next, the relevance between each feature word and the other story was obtained by calculating the average semantic similarity. Finally, the relevance degree between the two stories was calculated by combining the TF-IDF (term frequency-inverse document frequency) weights of the feature words in the corpus with the semantic similarity, completing link detection for the story pairs. The proposed algorithm was compared with the VSM (vector space model) method and with a method based on average pointwise mutual information. Experimental results on the Chinese TDT4 corpus show that the minimum DET (detection error tradeoff) cost of the proposed algorithm is reduced by about 3%, demonstrating that the algorithm exploits contextual information effectively and improves the performance of the topic link detection system.
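The minimum DET cost reported above can be computed by sweeping a decision threshold over the system's relevance scores and keeping the lowest normalized detection cost, C_det = C_miss·P_miss·P_target + C_FA·P_FA·(1 − P_target), divided by min(C_miss·P_target, C_FA·(1 − P_target)). This is a minimal sketch: the parameter values C_miss = 1, C_FA = 0.1 and P_target = 0.02 are the ones commonly used in TDT evaluations, but the abstract does not state them, so treat them as assumptions. The function names are hypothetical.

```python
def normalized_det_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized detection cost; default parameters are assumed TDT-style values."""
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return c_det / min(c_miss * p_target, c_fa * (1.0 - p_target))


def min_det_cost(scores, labels, **cost_params):
    """Minimum normalized DET cost over all decision thresholds.

    scores: system relevance score for each story pair.
    labels: True if the pair is truly linked (same topic), else False.
    A pair is declared "linked" when its score >= threshold.
    """
    n_target = sum(1 for y in labels if y)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both linked and unlinked pairs")
    best = float("inf")
    # Candidate thresholds: every distinct score, plus +inf (declare nothing linked).
    for t in sorted(set(scores)) + [float("inf")]:
        misses = sum(1 for s, y in zip(scores, labels) if y and s < t)
        false_alarms = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        cost = normalized_det_cost(misses / n_target, false_alarms / n_nontarget, **cost_params)
        best = min(best, cost)
    return best


if __name__ == "__main__":
    # Illustrative scores and gold labels for six story pairs (made-up data).
    scores = [0.91, 0.75, 0.40, 0.62, 0.10, 0.05]
    labels = [True, True, True, False, False, False]
    print(min_det_cost(scores, labels))
```

Lower is better. With these default parameters, a cost of 1.0 corresponds to the trivial system that never declares a link, so a useful detector should score below 1.0.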
Keywords: link detection; semantic similarity; relative entropy; relevance degree