
Adaptive Method for Chinese New Word Identification Based on Multi-features (基于多特征的自适应新词识别)

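Cite this article:	LUO Zhi-yong, SONG Rou. Adaptive Method for Chinese New Word Identification Based on Multi-features[J]. Journal of Beijing Polytechnic University (北京工业大学学报), 2007, 33(7): 718-725.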

Authors:	LUO Zhi-yong (罗智勇), SONG Rou (宋柔)

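Affiliations:	College of Computer Science, Beijing University of Technology, Beijing 100022, China; Center for Language Information Processing, Beijing Language and Culture University, Beijing 100083, China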

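Funding:	National Natural Science Foundation of China (60272055, 60572159); National "863" Program of China (2001AA114111); Key Science and Technology Research Project of the Ministry of Education (00128, 107017).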

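Abstract:	To improve how well automatic word segmentation systems recognize out-of-vocabulary words, this paper proposes and implements an adaptive new word identification method based on multiple features. The method jointly considers the contextual statistics of repeated strings in the text being processed (context entropy), their internal cohesion (likelihood ratio), a contrast against a background corpus (relative frequency ratio), and boundary confirmation information supplied by an automatic segmentation system. The identification model is trained automatically and directly from the text being extracted. New word identification runs on a PAT-Array data structure built over the character string, so new words of arbitrary length can be extracted. Experimental results show that the method discovers new words quickly and saves storage space.
Keywords:	natural language processing systems; computational linguistics; word processing; new word identification; multi-features; adaptation; automatic word segmentation
Article ID:	0254-0037(2007)07-0718-08
Revision date:	2005-12-20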
Adaptive Method for Chinese New Word Identification Based on Multi-features

LUO Zhi-yong, SONG Rou. Adaptive Method for Chinese New Word Identification Based on Multi-features[J]. Journal of Beijing Polytechnic University, 2007, 33(7): 718-725.

Authors:	LUO Zhi-yong, SONG Rou

Affiliation:	1. College of Computer Science, Beijing University of Technology, Beijing 100022, China; 2. Center for Language Information Processing, Beijing Language and Culture University, Beijing 100083, China

Abstract:	To improve the performance of new word identification in Chinese word segmentation, the authors propose an adaptive, multi-feature method for Chinese new word identification in offline corpus processing. Several features, including context entropy, likelihood ratios, frequency ratio against a background corpus, and boundary verification with basic segmentation, are introduced to evaluate the candidate words. All of the features are integrated into an adaptive SVM classifier. Candidate new words...

Keywords:	natural language processing system; computational linguistics; word processing; new word identification; multi-features; adaptation; word segmentation
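
As a rough illustration of the context-entropy feature named in the abstract, the sketch below scores each repeated substring by the entropy of the characters that appear immediately to its left and right. It is not the authors' implementation: the function names, the `max_len` and `min_freq` parameters, and the `^`/`$` boundary sentinels are all assumptions made for this example, and substrings are enumerated by brute force rather than with the PAT-Array structure the paper uses.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (in bits) of the distribution given by a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def context_entropy(text, max_len=4, min_freq=2):
    """Map each repeated substring to (frequency, left entropy, right entropy).

    Hypothetical helper: brute-force enumeration of substrings of length
    2..max_len. The paper itself works on a PAT-Array instead.
    """
    freq = Counter()
    left = defaultdict(Counter)   # characters seen just before each substring
    right = defaultdict(Counter)  # characters seen just after each substring
    n = len(text)
    for i in range(n):
        for k in range(2, max_len + 1):
            if i + k > n:
                break
            s = text[i:i + k]
            freq[s] += 1
            # "^" and "$" are assumed sentinels for the start and end of text.
            left[s][text[i - 1] if i > 0 else "^"] += 1
            right[s][text[i + k] if i + k < n else "$"] += 1
    return {
        s: (f, entropy(left[s]), entropy(right[s]))
        for s, f in freq.items()
        if f >= min_freq
    }


if __name__ == "__main__":
    for cand, (f, hl, hr) in sorted(context_entropy("abXabYabZ").items()):
        print(cand, f, round(hl, 3), round(hr, 3))
```

A candidate whose neighbours vary on both sides (high entropy on the left and on the right) behaves like a free-standing unit, while a candidate that is almost always followed by the same character is more likely a fragment of a longer word. In the method described above, this score is only one of several features fed to the classifier.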
This article is indexed in databases including CNKI, VIP (维普), and Wanfang Data (万方数据).
