
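Knowledge Discovery Based on the Search Engine
Cite this article: Ma Yuchun, Song Hantao. Knowledge discovery based on the search engine[J]. Computer Engineering and Applications, 2004, 40(30): 178-180, 220.
Authors: Ma Yuchun, Song Hantao
Affiliation: Department of Computer Science, Beijing Institute of Technology, Beijing 100081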
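Abstract: Data mining is generally applied to large, highly structured databases to discover the knowledge they contain. As the volume of online text grows, the knowledge embedded in it becomes increasingly rich, yet this text is difficult to analyze and exploit. Developing an effective scheme for discovering the knowledge contained in text is therefore important, and it is a significant current research topic. This paper uses the search engine Google to retrieve relevant Web pages and filters and cleans them to obtain relevant text. It then performs text clustering, uses Episodes for event recognition and information extraction, and applies data integration and data mining, thereby achieving knowledge discovery. Finally, a prototype system is presented that tests the knowledge discovery approach in practice, with good results.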

Keywords: search engine; text clustering; episode; information extraction; knowledge discovery
Article ID: 1002-8331-(2004)30-0178-03
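The abstract names text clustering as the step that follows filtering and cleaning, but it does not say which clustering algorithm the prototype uses. The following is a minimal sketch, assuming a TF-IDF bag-of-words representation, cosine similarity, and single-pass (leader) clustering with a hypothetical `threshold` of 0.2. The function names and sample pages are illustrative and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters.

    Assumption: whitespace-delimited text. Chinese pages would need a
    word segmenter here, since a run of CJK characters counts as one token.
    """
    return [t for t in re.split(r"\W+", text.lower()) if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (token -> weight dict) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each token once per doc
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF keeps terms shared by every document nonzero.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(docs, threshold=0.2):
    """Single-pass (leader) clustering: put each document in the most similar
    existing cluster if cosine(doc, centroid) >= threshold, else open a new one.
    Returns clusters as lists of document indices."""
    clusters, centroids = [], []
    for i, vec in enumerate(tfidf_vectors(docs)):
        best, best_sim = None, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            # A summed vector points the same way as the mean, and cosine ignores length.
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


if __name__ == "__main__":
    pages = [
        "earthquake hits city, many buildings damaged",
        "strong earthquake damaged buildings in the city",
        "stock market rises as tech shares rally",
        "tech shares rally and the stock market rises",
    ]
    print(single_pass_cluster(pages))  # -> [[0, 1], [2, 3]]
```

Single-pass clustering needs no preset number of clusters, which suits a stream of search results. Its output does depend on the order in which documents arrive, however, so a batch method such as k-means or hierarchical clustering may be preferable when the whole collection is available up front.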

Knowledge Discovery Based on the Search Engine
Ma Yuchun, Song Hantao. Knowledge Discovery Based on the Search Engine[J]. Computer Engineering and Applications, 2004, 40(30): 178-180, 220.
Authors: Ma Yuchun, Song Hantao
Abstract: Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. Although document collections can contain a great deal of potentially valuable knowledge, they are often difficult to analyze. It is therefore important to develop methods that efficiently discover the knowledge embedded in such document repositories, and text mining has accordingly become an important research area. This paper describes an approach for mining knowledge from web pages. It first retrieves web pages with the search engine Google, then filters out irrelevant documents and performs text categorization. Next, it extracts information and recognizes event types using episodes. Finally, it integrates and mines the data to discover new knowledge. A prototype based on this approach is developed, and the results are described in detail.
Keywords: search engine; text categorization; episode; information extraction; knowledge discovery
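The abstract says episodes are used to recognize event types and extract information, but it does not define them. The following is a minimal sketch, assuming the serial-episode sense used in frequent-episode mining: an ordered sequence of items that must occur, in that order, within a bounded window. Here the episodes are applied to a sentence's tokens. The `EVENT_EPISODES` table, the 6-token `window`, and the function names are all hypothetical.

```python
import re

# Hypothetical event patterns (not from the paper): event type -> serial
# episodes, i.e. ordered token tuples that signal that event type.
EVENT_EPISODES = {
    "acquisition": [("company", "acquired"), ("agreed", "buy")],
    "earthquake": [("earthquake", "magnitude"), ("quake", "struck")],
}


def find_serial_episode(tokens, episode, window):
    """Return (start, end) of the first occurrence of `episode` whose items
    appear in order with end - start < window, or None if there is none."""
    for start, tok in enumerate(tokens):
        if tok != episode[0]:
            continue
        pos = start
        for item in episode[1:]:
            # Greedily take the earliest later match; this yields the
            # shortest occurrence that begins at `start`.
            pos = next((j for j in range(pos + 1, len(tokens)) if tokens[j] == item), None)
            if pos is None:
                break
        else:  # every item matched in order
            if pos - start < window:
                return (start, pos)
    return None


def recognize_events(text, window=6):
    """Label a sentence with every event type whose episodes occur in it."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        event
        for event, episodes in EVENT_EPISODES.items()
        if any(find_serial_episode(tokens, ep, window) is not None for ep in episodes)
    ]


if __name__ == "__main__":
    print(recognize_events("A strong earthquake of magnitude 6.1 hit the region"))  # -> ['earthquake']
```

Unlike a bag-of-words test, the match is order- and distance-sensitive. For example, the reversed episode ("magnitude", "earthquake") would not match the sample sentence. In practice the episode table would be built by hand or mined per domain.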
This article is indexed in CNKI, VIP (维普), Wanfang Data and other databases.