
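Research and Implementation of a URL-Oriented Web Robot Software Model

Cite this article:	LI Guang-li, LIU Jue-fu. Research and Implementation of a URL-Oriented Web Robot Software Model [J]. Journal of East China Jiaotong University, 2007, 24(1): 67-70.

Authors:	LI Guang-li, LIU Jue-fu

Affiliation:	School of Information Engineering, East China Jiaotong University, Nanchang, Jiangxi 330013, China

Funding:	East China Jiaotong University research and teaching-reform project

Abstract:	The key to Web data mining is designing an intelligent, efficient web robot. This paper analyzes in detail the workflow of a URL-oriented web robot and the key technologies needed to implement it. It proposes managing the URL list with multiple queues whose elements are sorted by document correlativity, from high to low, so that pages can be downloaded in parallel at high speed. In addition, a convergent iterative threshold algorithm is designed for the document correlativity computation, which effectively removes the arbitrariness of setting the correlativity threshold.
Keywords:	web robot; URL seed; breadth-first; document correlativity; threshold
Article ID:	1005-0523(2007)01-0067-04
Received:	2006-10-11
Revised:	2006-10-11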
Research and Realization of a Spider Model Facing URL

LI Guang-li, LIU Jue-fu. Research and Realization of a Spider Model Facing URL [J]. Journal of East China Jiaotong University, 2007, 24(1): 67-70.

Authors:	LI Guang-li, LIU Jue-fu

Institution:	School of Information Engineering, East China Jiaotong University, Nanchang 330013, China

Abstract:	The key to Web data mining is designing an intelligent and efficient spider. This paper analyzes in detail the workflow of a URL-oriented spider and the key technologies for implementing it. It proposes managing the URL list with several queues whose elements are sorted by document correlativity, so that HTML files can be downloaded in parallel at high speed. In addition, a convergent iterative threshold algorithm is introduced into the document correlativity computation, which removes the arbitrariness in setting the correlativity threshold.

Keywords:	spider; URL seed; breadth-first; document correlativity; threshold
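
The abstract names two mechanisms without giving their details: several queues that manage the URL list in order of document correlativity, and an iterative threshold that converges. The Python sketch below is one possible reading of both, not the authors' implementation. The split into waiting, running, done, and failed collections, the names `Frontier` and `iterative_threshold`, and the threshold update rule (repeatedly taking the midpoint of the mean scores below and above the current threshold) are all assumptions made for illustration. Parallel downloading is left out.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Frontier:
    """URL bookkeeping (assumed layout): waiting, running, done, failed."""
    waiting: list = field(default_factory=list)  # heap of (-score, seq, url)
    running: set = field(default_factory=set)
    done: set = field(default_factory=set)
    failed: set = field(default_factory=set)
    seen: set = field(default_factory=set)       # every URL ever queued
    seq: int = 0                                 # tie-breaker for equal scores

    def add(self, url: str, score: float) -> bool:
        """Queue a URL once; return False if it was already known."""
        if url in self.seen:
            return False
        self.seen.add(url)
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(self.waiting, (-score, self.seq, url))
        self.seq += 1
        return True

    def next_url(self):
        """Move the highest-scoring waiting URL into the running set."""
        if not self.waiting:
            return None
        _, _, url = heapq.heappop(self.waiting)
        self.running.add(url)
        return url

    def finish(self, url: str, ok: bool = True) -> None:
        """Record the outcome of a download attempt."""
        self.running.discard(url)
        (self.done if ok else self.failed).add(url)


def iterative_threshold(scores, tol=1e-6, max_iter=100):
    """Assumed update rule: t <- (mean(scores <= t) + mean(scores > t)) / 2."""
    t = sum(scores) / len(scores)  # start from the overall mean
    for _ in range(max_iter):
        low = [s for s in scores if s <= t]
        high = [s for s in scores if s > t]
        if not low or not high:    # all scores on one side: nothing to split
            break
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:   # converged
            return new_t
        t = new_t
    return t


if __name__ == "__main__":
    frontier = Frontier()
    frontier.add("http://example.com/a", 0.2)  # placeholder URLs and scores
    frontier.add("http://example.com/b", 0.9)
    print(frontier.next_url())
    print(iterative_threshold([0.1, 0.2, 0.8, 0.9]))
```

In a crawler built along these lines, links whose score falls below the computed threshold would simply not be passed to `add`, so the threshold is derived from the observed scores rather than chosen by hand.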
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.