Research on Web Page Automatic Classification Based on Internet News Corpus

Authors: CAI Wei, WANG Yong-cheng, YIN Zhong-hang

Abstract: Web pages contain richer content than plain text, such as hyperlinks, HTML tags, and metadata, so Web page categorization differs from plain-text categorization. For Chinese Internet news pages, a practical algorithm was proposed for extracting subject concepts from Web pages without a thesaurus. After these category-subject concepts were incorporated into a knowledge base, Web pages were classified by a hybrid algorithm, using an experimental corpus extracted from Xinhua Net. Experimental results show that using Web page features improves categorization performance.
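The abstract gives no implementation details, so the following is only a rough illustration of how HTML structure (title, headings, meta keywords) might serve as a Web page feature: terms are weighted by the tag they appear in, and the weighted terms are matched against a small category-subject knowledge base. Everything here is an assumption, not the authors' method: the tag weights and the names `WeightedTermParser`, `extract_subject_terms`, `classify`, and `KNOWLEDGE_BASE` are hypothetical, the input is assumed to be already word-segmented (Chinese news text would first need a word segmenter), and neither the paper's thesaurus-free extraction algorithm nor its hybrid classifier is reproduced.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: text in the title, headings and meta keywords is
# assumed to say more about a page's subject than ordinary body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "meta": 2.5}
BODY_WEIGHT = 1.0
SKIP_TAGS = {"script", "style"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class WeightedTermParser(HTMLParser):
    """Accumulates a weight per whitespace-separated term, based on enclosing tags.

    Assumes pre-segmented text; real Chinese pages need word segmentation first.
    """

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self.stack = []  # currently open (non-void) tags

    def _add(self, text, weight):
        for term in text.split():
            self.scores[term.lower()] += weight

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            if (fields.get("name") or "").lower() in ("keywords", "description"):
                # Meta keywords are comma-separated; treat commas as separators.
                content = (fields.get("content") or "").replace(",", " ")
                self._add(content, TAG_WEIGHTS["meta"])
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIP_TAGS for t in self.stack):
            return
        # Use the highest weight among the currently open tags.
        weight = max((TAG_WEIGHTS.get(t, BODY_WEIGHT) for t in self.stack),
                     default=BODY_WEIGHT)
        self._add(data, weight)


def page_scores(html):
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return parser.scores


def extract_subject_terms(html, top_k=5):
    """Returns the top_k highest-weighted terms as candidate subject concepts."""
    return page_scores(html).most_common(top_k)


# Hypothetical category -> subject-concept knowledge base.
KNOWLEDGE_BASE = {
    "sports": {"football", "match", "league", "goal"},
    "finance": {"stock", "market", "bank", "shares"},
}


def classify(html, kb=KNOWLEDGE_BASE):
    """Picks the category whose subject concepts carry the most weight on the page."""
    scores = page_scores(html)
    totals = {cat: sum(scores[c] for c in concepts) for cat, concepts in kb.items()}
    best = max(totals, key=totals.get)
    return best if totals[best] > 0 else None
```

Weighting by tag is one simple way to let page structure influence which terms count as subject concepts; a page whose concepts match no category is left unclassified (`None`) rather than forced into one.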

CAI Wei, WANG Yong-cheng, YIN Zhong-hang. Research on Web Page Automatic Classification Based on Internet News Corpus[J]. Journal of Shanghai Jiaotong University, 2007, 12(6): 731-735.

Keywords: automatic classification; Web pages; subject extraction
This article is indexed in Wanfang Data and other databases.