
A Text Filtering System Based on Vector Space Model
Citation: HUANG Xuan-Jing, XIA Ying-Ju, WU Li-De. A Text Filtering System Based on Vector Space Model[J]. Journal of Software, 2003, 14(3): 435-442.
Authors: HUANG Xuan-Jing, XIA Ying-Ju, WU Li-De
Affiliation: Department of Computer Science and Engineering, Fudan University, Shanghai 200433
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 69873011, 69935010, and 60103014; the National High Technology Development 863 Program of China under Grant No. 863-306-ZD02-02-4; the National High-Tech Research and Development Plan of China under Grant No. 2001AA114120; the Science and Technology Promotion Foundation of Shanghai of China under Grant No. 995115005; and the Science and Technology Foundation of Fudan University of China.
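Abstract: Text filtering is the process of finding, in a large stream of text data, the documents that satisfy a specific user's information need. This paper first introduces the Text REtrieval Conference (TREC), the most authoritative international evaluation conference in the field of text retrieval, together with its text filtering track, covering its tasks, test topics, corpora, and evaluation metrics. It then describes in detail a text filtering system based on the vector space model. The system works in two phases: training and adaptive filtering. In the training phase, an initial filtering profile is built through feature extraction and pseudo feedback, and an initial threshold is set. In the filtering phase, the profile and threshold are adjusted adaptively according to user feedback. The system took part in the evaluation at the 9th TREC, held in 2000, and performed well, ranking among the top of the 15 systems submitted from several countries, with an average precision of 26.5% for adaptive filtering and 31.7% for batch filtering.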
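The two phases summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the system's actual term weighting, profile-construction, or threshold-adjustment formulas, so everything below is an assumption: raw term frequencies stand in for a proper weighting scheme, the profile update is a Rocchio-style centroid update, and the threshold rule is a hypothetical fixed-step heuristic. The names `train_profile`, `initial_threshold`, and `AdaptiveFilter` and the parameters `top_k`, `beta`, `alpha`, `gamma`, and `step` are illustrative and do not come from the paper.

```python
# Hypothetical sketch of a vector-space adaptive filter; NOT the paper's formulas.
import math
from collections import Counter


def tf_vector(tokens):
    """Raw term-frequency vector (assumption: a real system would likely use TF-IDF)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_scaled(profile, vec, scale):
    """In place: profile += scale * vec."""
    for t, w in vec.items():
        profile[t] = profile.get(t, 0.0) + scale * w


def train_profile(topic_tokens, example_docs, corpus, top_k=2, beta=0.5):
    """Training phase (sketch): topic terms plus the centroid of the known relevant
    examples, then pseudo feedback: assume the top_k corpus documents most similar
    to that profile are relevant and add their centroid, weighted by beta."""
    profile = {t: float(w) for t, w in tf_vector(topic_tokens).items()}
    for doc in example_docs:
        add_scaled(profile, tf_vector(doc), 1.0 / len(example_docs))
    vectors = [tf_vector(doc) for doc in corpus]
    ranked = sorted(vectors, key=lambda v: cosine(profile, v), reverse=True)
    for v in ranked[:top_k]:
        add_scaled(profile, v, beta / top_k)
    return profile


def initial_threshold(profile, example_docs, margin=0.9):
    """Hypothetical heuristic: start slightly below the weakest example's score."""
    return margin * min(cosine(profile, tf_vector(d)) for d in example_docs)


class AdaptiveFilter:
    """Filtering phase (sketch): deliver a document if its cosine score reaches the
    threshold; feedback on a judged document updates both profile and threshold."""

    def __init__(self, profile, threshold, alpha=0.1, gamma=0.05, step=0.02):
        self.profile = dict(profile)
        self.threshold = threshold
        self.alpha, self.gamma, self.step = alpha, gamma, step

    def accepts(self, doc_tokens):
        return cosine(self.profile, tf_vector(doc_tokens)) >= self.threshold

    def feedback(self, doc_tokens, relevant):
        vec = tf_vector(doc_tokens)
        if relevant:
            # Rocchio-style: move toward the relevant document and loosen the threshold.
            add_scaled(self.profile, vec, self.alpha)
            self.threshold = max(0.0, self.threshold - self.step)
        else:
            # Move away from the non-relevant document and tighten the threshold.
            add_scaled(self.profile, vec, -self.gamma)
            self.threshold = min(1.0, self.threshold + self.step)
```

In TREC-style adaptive filtering, relevance judgments are generally available only for documents the system actually delivers, so `feedback` would normally be called only after `accepts` returns `True`. The fixed-step threshold rule is chosen here for readability. It stands in for whatever threshold-adaptation method the full paper describes, which this sketch does not reproduce.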

Keywords: text retrieval  text filtering  text categorization  machine learning  vector space model
Article ID: 1000-9825/2003/14(03)0435
Received: 2001-09-14
Revised: 2001-09-14
