A Text Filtering System Based on Vector Space Model

Cite this article:	HUANG Xuan-Jing, XIA Ying-Ju, WU Li-De. A Text Filtering System Based on Vector Space Model[J]. Journal of Software, 2003, 14(3): 435-442.

Authors:	HUANG Xuan-Jing, XIA Ying-Ju, WU Li-De

Affiliation:	Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China

Funding:	Supported by the National Natural Science Foundation of China under Grant Nos. 69873011, 69935010, 60103014; the National High Technology Development 863 Program of China under Grant No. 863-306-ZD02-02-4; the National High-Tech Research and Development Plan of China under Grant No. 2001AA114120; the Science and Technology Promotion Foundation of Shanghai of China under Grant No. 995115005; the Science and Technology Foundation of Fudan University of China.

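Abstract:	Text filtering is the process of finding, in a large stream of text data, the documents that satisfy the needs of a specific user. This paper first introduces the Text REtrieval Conference (TREC), the most authoritative international evaluation conference in text retrieval, and its text filtering track, in terms of tasks, test topics, corpora, and evaluation metrics. It then describes in detail a text filtering system based on the vector space model. The system consists of two phases, training and adaptive filtering. In the training phase, feature extraction and pseudo feedback are used to build the initial filtering profiles and to set the initial thresholds; in the filtering phase, the profiles and thresholds are adjusted adaptively according to user feedback. The system took part in the evaluation of the 9th Text REtrieval Conference, held in 2000, and ranked among the top of 15 systems from several countries, with an average precision of 26.5% for adaptive filtering and 31.7% for batch filtering.
Keywords:	text retrieval; text filtering; text categorization; machine learning; vector space model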
Article ID:	1000-9825/2003/14(03)0435
Received:	2001/9/14
Revised:	2001/9/14
A Text Filtering System Based on Vector Space Model

HUANG Xuan-Jing, XIA Ying-Ju and WU Li-De. A Text Filtering System Based on Vector Space Model[J]. Journal of Software, 2003, 14(3): 435-442.

Authors:	HUANG Xuan-Jing, XIA Ying-Ju and WU Li-De

Abstract:	Text filtering is the procedure of retrieving documents relevant to the requirements of specific users from a large-scale text data stream. First, the Text REtrieval Conference (TREC), the most authoritative international evaluation conference on text retrieval, and its text filtering track are introduced in terms of tasks, topics, corpora, and evaluation metrics. Then a text filtering system based on the vector space model is presented. The system is composed of two phases, training and adaptive filtering. During the training phase, feature selection and pseudo feedback are used to build the initial filtering profiles and set the initial thresholds. During the filtering phase, user feedback is used to modify the profiles and thresholds adaptively. The system participated in the 9th Text REtrieval Conference in 2000 and ranked high among the 15 systems from many countries, with average precisions of 26.5% for adaptive filtering and 31.7% for batch filtering.

Keywords:	text retrieval; text filtering; text categorization; machine learning; vector space model
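
The two-phase design summarized in the abstract (an initial profile and threshold built from training data, then both adjusted from user feedback) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption for illustration rather than the authors' method: raw term-frequency vectors, cosine similarity, a Rocchio-style profile update, a fixed-step threshold adjustment, and the hypothetical names `AdaptiveFilter`, `alpha`, `beta`, and `step`. Pseudo feedback and feature selection are omitted.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map whitespace-separated tokens to raw term frequencies."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class AdaptiveFilter:
    """Hypothetical vector-space filter; all parameters are assumed values."""

    def __init__(self, training_docs, threshold=0.2,
                 alpha=0.5, beta=0.25, step=0.02):
        # Training phase (simplified): the initial profile is the sum of
        # the term vectors of the known relevant training documents.
        self.profile = {}
        for doc in training_docs:
            for term, weight in tf_vector(doc).items():
                self.profile[term] = self.profile.get(term, 0.0) + weight
        self.threshold = threshold
        self.alpha = alpha   # weight for relevant feedback (assumed)
        self.beta = beta     # weight for non-relevant feedback (assumed)
        self.step = step     # threshold adjustment size (assumed)

    def score(self, doc):
        return cosine(self.profile, tf_vector(doc))

    def accepts(self, doc):
        # A document is delivered when its similarity reaches the threshold.
        return self.score(doc) >= self.threshold

    def feedback(self, doc, relevant):
        """Adapt profile and threshold after the user judges a document."""
        factor = self.alpha if relevant else -self.beta
        for term, weight in tf_vector(doc).items():
            updated = self.profile.get(term, 0.0) + factor * weight
            if updated > 0.0:
                self.profile[term] = updated
            else:
                # Drop terms whose weight is no longer positive.
                self.profile.pop(term, None)
        # Relax the threshold after a hit, tighten it after a false alarm.
        if relevant:
            self.threshold = max(0.0, self.threshold - self.step)
        else:
            self.threshold = min(1.0, self.threshold + self.step)
```

A filter would be created from a handful of relevant training documents, `accepts` called on each incoming document, and `feedback` called whenever the user judges a delivered document. A real system would likely use TF-IDF weights and a threshold tuned against the evaluation metric rather than a fixed step.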