

Research on Web Page Automatic Classification Based on Internet News Corpus
Authors: CAI Wei, WANG Yong-cheng, YIN Zhong-hang
Abstract: Web pages contain richer content than plain text, such as hyperlinks, HTML tags, and metadata, so Web page categorization differs from plain-text categorization. For Chinese Internet news pages, a practical algorithm is proposed for extracting subject concepts from a Web page without a thesaurus. After these category-subject concepts are incorporated into a knowledge base, Web pages are classified by a hybrid algorithm, using an experimental corpus extracted from Xinhua Net. Experimental results show that using Web page features improves categorization performance.
Keywords: automatic classification; Web pages; subject extraction
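
The abstract does not give the details of the extraction or classification algorithms, so the following is only an illustrative sketch of the general idea of using Web page features (title, metadata, hyperlink anchor text) rather than body text alone. It is not the authors' method. The names `WeightedTermCounter` and `weighted_terms`, the weight values, and the regex tokenizer are all assumptions made for illustration; real Chinese news pages would need a proper word segmenter instead of the regex.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights (assumptions, not taken from the paper): terms found in
# structurally prominent parts of the page count more than body text.
TAG_WEIGHTS = {"title": 5, "h1": 3, "a": 2}
META_WEIGHT = 4
BODY_WEIGHT = 1
SKIPPED_TAGS = {"script", "style"}


class WeightedTermCounter(HTMLParser):
    """Accumulates term counts, weighted by the enclosing HTML tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() in ("keywords", "description"):
                self._add(attr.get("content") or "", META_WEIGHT)
            return  # meta has no closing tag, so it is never pushed
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; this also discards
        # unclosed tags such as <br> that were pushed in between.
        if tag in self._open_tags:
            while self._open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self._open_tags):
            return
        weight = max(
            (TAG_WEIGHTS.get(t, BODY_WEIGHT) for t in self._open_tags),
            default=BODY_WEIGHT,
        )
        self._add(data, weight)

    def _add(self, text, weight):
        # Naive tokenizer (assumption): runs of word characters, lowercased.
        for term in re.findall(r"\w+", text.lower()):
            self.counts[term] += weight


def weighted_terms(html):
    """Return a Counter of tag-weighted term scores for one HTML page."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


SAMPLE_PAGE = (
    "<html><head><title>Economy news</title>"
    '<meta name="keywords" content="economy, trade"></head>'
    '<body><p>The economy grew. <a href="/x">trade report</a></p>'
    "<script>var economy = 1;</script></body></html>"
)

if __name__ == "__main__":
    for term, score in weighted_terms(SAMPLE_PAGE).most_common(3):
        print(term, score)
```

The highest-scoring terms from such a counter could serve as candidate subject concepts for a page, and the weighted scores could feed a downstream classifier in place of raw body-text frequencies.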
This article is indexed in Wanfang Data (万方数据) and other databases.