Research on WWW-Based Recognition of Unregistered Words

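Cite this article:	Han Jie, Zhou Yong, Liu Shaohui, Shi Zhongzhi. Research on WWW-Based Recognition of Unregistered Words [J]. Computer Science (计算机科学), 2002, 29(12): 155-156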

Authors:	Han Jie, Zhou Yong, Liu Shaohui, Shi Zhongzhi

Affiliation:	Graduate School, University of Science and Technology of China, Beijing 100039

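Abstract (from the Chinese text):	I. Introduction. With the continuing informatization of the national economy and the widespread adoption of the Internet, the world's rich information resources are now available to every one of us. How quickly and effectively the needed information can be extracted from this large volume of material greatly affects the development, application, and adoption of computer and information technology in China. According to statistics, more than 80% of the information in the information field is carried by language and text, so Chinese information processing has become an important computer application technology in China. Recognition of unregistered words (words absent from the segmentation dictionary) is one of the difficult problems in Chinese information processing. In applications such as Internet data mining, information retrieval, library document management, and speech recognition, it ...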
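Keywords (from the Chinese text):	Chinese information processing; Chinese word segmentation; WWW; unregistered word recognition; segmentation dictionary; computer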
WWW-based Recognition of Non-login Words

Abstract:	Currently, very little reference material can be found on non-login word recognition. Solutions based on rules and syntax cannot satisfactorily solve all kinds of non-login word recognition problems. This paper studies and compares several existing solutions. The proposed solution is to extract N-grams after word segmentation, from which non-login words can be extracted by means of probability statistics. Experiments have demonstrated that this method has favorable efficiency, recall ratio, and accuracy.

Keywords:	Non-login word; Recognition; N-gram; WWW
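
The approach outlined in the abstract (segment first, collect N-grams over the segmented tokens, then filter them statistically) can be illustrated with a minimal sketch. The abstract does not say which statistic the authors use, so this example assumes pointwise mutual information (PMI) over adjacent token pairs as a stand-in. The function name `candidate_new_words`, the thresholds `min_count` and `min_pmi`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def candidate_new_words(segmented_docs, dictionary, min_count=2, min_pmi=1.0):
    """Return (candidate, count, pmi) for adjacent token pairs that may form one word.

    segmented_docs: iterable of token lists, i.e. the output of some word segmenter.
    dictionary: set of known words; candidates already in it are skipped.
    min_count, min_pmi: illustrative thresholds (assumptions, not the paper's values).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in segmented_docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs = 2-grams

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    results = []
    for (left, right), count in bigrams.items():
        candidate = left + right
        if count < min_count or candidate in dictionary:
            continue
        # PMI compares how often the pair occurs with how often it would
        # occur if the two tokens were independent.
        p_pair = count / total_bigrams
        p_left = unigrams[left] / total_unigrams
        p_right = unigrams[right] / total_unigrams
        pmi = math.log2(p_pair / (p_left * p_right))
        if pmi >= min_pmi:
            results.append((candidate, count, pmi))

    # Strongest association first; ties broken by count, then by text.
    results.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return results


# Toy corpus: a segmenter without the name in its dictionary splits it in two.
docs = [
    ["史", "忠植", "教授", "研究", "数据", "挖掘"],
    ["史", "忠植", "研究", "中文", "分词"],
    ["教授", "研究", "中文", "信息", "处理"],
]
known = {"教授", "研究", "数据", "挖掘", "中文", "分词", "信息", "处理"}

for word, count, pmi in candidate_new_words(docs, known, min_pmi=3.0):
    print(word, count, round(pmi, 2))
```

On a corpus this small the scores are only indicative. A realistic setting would use many Web pages, longer N-grams, and thresholds tuned against labeled data.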
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.