Removal of Stop Word in Users' Request for Information Retrieval

Cite this article:	XIONG Wenxin, SONG Rou. Removal of Stop Word in Users' Request for Information Retrieval[J]. Computer Engineering, 2007, 33(6): 195-197.

Authors:	XIONG Wenxin, SONG Rou

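Affiliations:	1. National Research Centre for Foreign Language Education, Beijing Foreign Studies University, Beijing 100089; Center for Language Information Processing, Beijing Language and Culture University, Beijing 100083. 2. Center for Language Information Processing, Beijing Language and Culture University, Beijing 100083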

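Funding:	National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program); Science and Technology Foundation of the Ministry of Education; Fund of the Key Research Bases for Humanities and Social Sciences of the Ministry of Education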

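Abstract:	For retrieval requests posed in natural language, the expression of the information need is distinguished from the information content. Contrasting a query corpus of nearly 200,000 sentences with the People's Daily as a background corpus, the paper proposes general-purpose Chinese stop words (absolute stop words) and query-specific relative stop words, and builds the corresponding candidate lists offline using left/right entropy, N-gram statistics and KL divergence. Using the bigram properties of the candidate words and their distribution over different positions in the sentence, it then presents a method for identifying the actual stop words in a query dynamically at run time. Experiments show that the proposed method gives better labeling results than relying only on a static stop-word list.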
Keywords:	user query; stop word; construction; identification
Article number:	1000-3428(2007)06-0195-03
Revised:	2006-07-05

This article is indexed in CNKI, VIP, Wanfang Data and other databases.