Text Mining and Its Key Techniques and Methods

Cite this article:	Wang Likun, Wang Hong, Lu Yuchang. Text mining and its key techniques and methods[J]. Computer Science (计算机科学), 2002, 29(12): 12-19.

Authors:	Wang Likun (王丽坤), Wang Hong (王宏), Lu Yuchang (陆玉昌)

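Affiliation:	State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084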

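Funding:	National Basic Research Program of China (973 Program) (G1998030414); basic research funding of the School of Information Science and Technology, Tsinghua University (985 Program)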

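Abstract:	Starting from ARPANET, the computer network of the U.S. Department of Defense, in 1969, the Internet now has a 32-year history and has grown into an enormous information service system that holds many kinds of information resources and has sites all over the world, supplying its users with vast amounts of highly valuable data. In digital libraries and on the Internet, the amount of information available online is growing exponentially, resulting in an information explosion. The WWW presents information to users as hypertext; a single web page contains many different data types, and the most important information source among them is text. Text expresses a large amount of rich information and also contains much latent knowledge that its owners have not yet discovered…

Keywords:	text mining; data mining; knowledge discovery; data processing; database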
The Text Mining and its Key Techniques and Methods

Abstract:	With the dramatic development of the Internet, information processing and management technology on the WWW has become a very important branch of data mining and data warehousing. In particular, text mining is now emerging rapidly and plays an important role in related fields, so it is worth surveying text mining, from its definition to the related methods and techniques. In this paper, drawing on the comparatively mature technology of data mining, we present a definition of text mining and a multi-stage text mining process model. Moreover, this paper comprehensively introduces the key areas of text mining and several powerful text analysis techniques, including automatic word segmentation, feature representation, feature extraction, text categorization, text clustering, text summarization, information extraction, and pattern quality evaluation. These techniques cover the whole process from information preprocessing to knowledge acquisition.

Keywords:	Text mining; Knowledge discovery in databases; Data mining; Automatic word segmentation; Feature representation; Feature extraction; Text categorization; Text clustering
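
The abstract names automatic word segmentation and feature representation as early stages of the text mining process but does not say which methods the paper uses. The following is a minimal sketch under stated assumptions, not the authors' method: Chinese text is segmented by forward maximum matching against a tiny hypothetical dictionary, each document is represented as a TF-IDF weighted term vector (a vector space model), and documents are compared with cosine similarity, a common basis for text categorization and clustering. The names DICTIONARY, fmm_segment, tfidf_vectors, and cosine are illustrative and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical toy dictionary for illustration only; practical segmenters
# use large lexicons plus statistical disambiguation.
DICTIONARY = {"文本", "挖掘", "文本挖掘", "数据", "数据挖掘", "知识", "发现", "知识发现"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def fmm_segment(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} dict,
    assuming weight = raw term count * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    raw = ["文本挖掘与知识发现", "数据挖掘与知识发现", "文本数据"]
    docs = [fmm_segment(t) for t in raw]
    vecs = tfidf_vectors(docs)
    print(docs)
    print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```

With the toy dictionary, fmm_segment("文本挖掘与知识发现") returns ['文本挖掘', '与', '知识发现'], while fmm_segment("文本数据") falls back to shorter entries and returns ['文本', '数据']. Greedy longest matching is simple but can mis-segment ambiguous strings, which is one reason practical segmenters add statistical disambiguation.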
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.