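English Topic Tracking Research Based on Query Vector
Cite this article: Zhao Hua, Zhao Tiejun, Yu Hao, Zheng Dequan. English Topic Tracking Research Based on Query Vector[J]. Journal of Computer Research and Development, 2007, 44(8): 1412-1417.
Authors: Zhao Hua  Zhao Tiejun  Yu Hao  Zheng Dequan
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)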
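Abstract: Based on an analysis of the characteristics of English news stories, a feature extraction algorithm that combines word differentiation with a location feature is proposed. Word differentiation divides words into those whose initial letter is uppercase and those whose initial letter is not; the location feature uses the inverted-pyramid structure of news stories to determine the importance of a word. A feature weight computation method based on the fusion of multiple feature extraction algorithms is proposed, which holds that a feature selected by more extraction algorithms is more important. A double filtration algorithm based on the majority vote rule is proposed, which filters twice on whether a story is relevant to a topic and greatly reduces the system's false alarm rate. Experiments show that the three proposed algorithms not only achieve good results but also have good scalability.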

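Keywords: topic tracking  word differentiation  majority vote rule  double filtration  normalized detection cost  query vector  English topic  tracking research  Vector  Query  Based  Research  Topic  Tracking  scalability  effect  filtering algorithm  experiment  false alarm rate  system  relevance  voting strategy  computation method  feature weight  algorithm fusion  extraction algorithm
Revised: 2005-12-23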
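
The abstract states that a feature selected by more extraction algorithms is more important, but it gives no formula. The following is a minimal sketch of that idea only, not the authors' implementation. The two extractors (`capitalized_words`, `leading_words`), the `lead` cutoff, and the "fraction of extractors" weight are all assumptions introduced here for illustration.

```python
from collections import Counter


def capitalized_words(tokens):
    """Hypothetical extractor: keep words whose initial letter is uppercase."""
    return {t for t in tokens if t[:1].isupper()}


def leading_words(tokens, lead=5):
    """Hypothetical extractor: keep words near the start of the story.

    Assumes the inverted-pyramid structure puts important words first.
    """
    return set(tokens[:lead])


def vote_weights(tokens, extractors):
    """Weight each feature by the fraction of extractors that selected it."""
    votes = Counter()
    for extract in extractors:
        votes.update(extract(tokens))  # each extractor votes once per term
    return {term: count / len(extractors) for term, count in votes.items()}


tokens = "Harbin officials said the river in Harbin was safe".split()
weights = vote_weights(tokens, [capitalized_words, leading_words])
```

In this toy input, `Harbin` is chosen by both extractors and so receives the largest weight, while words chosen by neither extractor are not features at all.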

English Topic Tracking Research Based on Query Vector
Zhao Hua,Zhao Tiejun,Yu Hao,Zheng Dequan.English Topic Tracking Research Based on Query Vector[J].Journal of Computer Research and Development,2007,44(8):1412-1417.
Authors:Zhao Hua  Zhao Tiejun  Yu Hao  Zheng Dequan
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: As a new area of natural language processing, topic tracking has received considerable attention from researchers at home and abroad. Topic tracking is defined as the task of monitoring a stream of news stories to find those that discuss a topic known to the system. This paper studies three key problems in query-based topic tracking: feature extraction, feature weight computation, and similarity measurement. First, a feature extraction algorithm is proposed that combines word differentiation with a location property. Word differentiation divides words into capitalized words, whose initial letter is uppercase, and common words, whose initial letter is not. The location property determines the importance of a word from the inverted-pyramid structure of news stories. Second, a new method is proposed for computing feature weights by combining several feature extraction algorithms; it assigns a larger weight to a feature that is selected by more of the algorithms. Finally, a double filtration algorithm based on the majority vote rule is proposed, which makes two judgments about the relevance of a story to a topic and successfully reduces the system's false alarms. Experiments indicate that the three proposed methods not only perform well but also have good scalability.
Keywords:topic tracking  word differentiation  majority vote rule  double filtration  normalized detection cost
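
The abstract describes double filtration only as two relevance judgments combined with a majority vote rule. The sketch below is one possible reading under stated assumptions, not the paper's algorithm: the first judgment is a cosine-similarity threshold against a single query vector, and the second is a majority vote over several query vectors. The names `main_query` and `voter_queries` and the threshold value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def double_filter(story, main_query, voter_queries, threshold=0.3):
    """Accept a story only if it passes both relevance judgments."""
    # First judgment (assumed): similarity to the main query vector.
    if cosine(story, main_query) < threshold:
        return False
    # Second judgment (assumed): each voter query casts a yes/no vote,
    # and the story is kept only if a strict majority votes yes.
    votes = [cosine(story, q) >= threshold for q in voter_queries]
    return sum(votes) > len(votes) / 2


main_query = {"harbin": 1.0, "river": 1.0, "pollution": 1.0}
voter_queries = [{"harbin": 1.0}, {"river": 1.0}, {"football": 1.0}]

on_topic = double_filter({"harbin": 1.0, "river": 1.0}, main_query, voter_queries)
off_topic = double_filter({"football": 1.0}, main_query, voter_queries)
borderline = double_filter({"pollution": 1.0}, main_query, voter_queries)
```

The `borderline` story passes the first judgment but wins no votes in the second, which illustrates how a second pass can reject stories that a single threshold would have accepted as false alarms.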
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.