
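Knowledge Discovery Based on the Search Engine

Citation:	Ma Yuchun, Song Hantao. Knowledge Discovery Based on the Search Engine[J]. Computer Engineering and Applications, 2004, 40(30): 178-180, 220.

Authors:	Ma Yuchun, Song Hantao

Affiliation:	Department of Computer Science, Beijing Institute of Technology, Beijing 100081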

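Abstract (translated from the Chinese):	Data mining is generally applied to large, highly structured databases to discover the knowledge they contain. As the volume of online text grows, the knowledge it contains becomes ever richer, yet that knowledge remains difficult to analyze and use. Developing an effective approach to discovering the knowledge contained in text is therefore important and is an active research topic. This paper uses the search engine Google to retrieve relevant Web pages, filters and cleans them to obtain the relevant text, then performs text clustering, uses episodes for event recognition and information extraction, and carries out data integration and data mining, thereby achieving knowledge discovery. Finally, a prototype system is presented that tests the approach in practice, with good results.
Keywords (translated from the Chinese):	search engine; text clustering; episode; information extraction; knowledge discovery
Article ID:	1002-8331-(2004)30-0178-03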
Knowledge Discovery Based on the Search Engine

Ma Yuchun, Song Hantao. Knowledge Discovery Based on the Search Engine[J]. Computer Engineering and Applications, 2004, 40(30): 178-180, 220.

Authors:	Ma Yuchun, Song Hantao

Abstract:	Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. Although document collections can contain a great deal of potentially valuable knowledge, they are often difficult to analyze. It is therefore important to develop methods that efficiently discover the knowledge embedded in these document repositories, and text mining has become an important research area. This paper describes an approach for mining knowledge from web pages: it first retrieves web pages through the search engine Google, then filters out the irrelevant documents, performs text categorization, recognizes event types and extracts information using episodes, and finally integrates and mines the data in order to discover new knowledge. A prototype based on this approach is developed, and its results are described in detail.

Keywords:	search engine; text categorization; episode; information extraction; knowledge discovery
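
The stages named in the abstract (page retrieval, filtering and cleaning, grouping of texts, episode-based event recognition) can be illustrated with a minimal sketch. It is not the authors' prototype, and everything in it is an assumption: the pages are hard-coded stand-ins for search results rather than fetched from Google, the grouping step uses a simple single-pass "leader" clustering with Jaccard similarity because the abstract does not name an algorithm, and an episode is interpreted here as an ordered list of keywords that must occur within a fixed-size token window. All function names, sample pages, and thresholds are hypothetical.

```python
import html
import re

# Hypothetical stand-ins for pages returned by a search engine.
PAGES = [
    "<html><body><h1>Acme acquires Beta</h1>"
    "<p>Acme announced it will acquire Beta for cash.</p></body></html>",
    "<html><body><p>Acme to acquire Beta: Acme confirmed the deal "
    "to acquire Beta.</p></body></html>",
    "<html><body><script>var x=1;</script>"
    "<p>Recipe: how to bake bread at home.</p></body></html>",
]
QUERY = ["acme", "beta"]                # assumed query terms
EPISODE = ["acme", "acquire", "beta"]   # assumed episode for an "acquisition" event


def clean_page(raw_html):
    """Drop script/style blocks, tags and entities; return lower-case text."""
    no_script = re.sub(r"(?is)<(script|style).*?</\1>", " ", raw_html)
    no_tags = re.sub(r"<[^>]+>", " ", no_script)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip().lower()


def tokenize(text):
    """Split cleaned text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text)


def is_relevant(tokens, query_terms, min_hits=1):
    """Keep a page only if it mentions at least `min_hits` query terms."""
    return len(set(tokens) & set(query_terms)) >= min_hits


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def leader_cluster(docs, threshold):
    """Single-pass clustering: a document joins the first cluster whose
    leader (first member) is similar enough, else it starts a new cluster."""
    clusters = []
    for i, tokens in enumerate(docs):
        for cluster in clusters:
            if jaccard(tokens, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def matches_episode(tokens, episode, window=10):
    """True if the episode's terms occur in order within `window` tokens."""
    if not episode:
        return True
    for start, tok in enumerate(tokens):
        if tok != episode[0]:
            continue
        pos = 1
        for nxt in tokens[start + 1:start + window]:
            if pos < len(episode) and nxt == episode[pos]:
                pos += 1
        if pos == len(episode):
            return True
    return False


def run_pipeline(pages, query, episode, threshold=0.2):
    """Clean, filter, cluster, then flag documents that match the episode."""
    docs = [tokenize(clean_page(p)) for p in pages]
    relevant = [d for d in docs if is_relevant(d, query, min_hits=len(query))]
    clusters = leader_cluster(relevant, threshold)
    events = [matches_episode(d, episode) for d in relevant]
    return relevant, clusters, events
```

Calling `run_pipeline(PAGES, QUERY, EPISODE)` returns the token lists of the pages that survive filtering, the clusters as lists of indices into that filtered list, and one boolean per surviving page indicating whether the assumed episode was recognized. The information extraction, data integration, and data mining stages mentioned in the abstract are not covered by this sketch.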
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
