Research on Chinese Word Segmentation for Patent Documents

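Cite as:	ZHANG Guiping, LIU Dongsheng, YIN Baosheng, XU Lijun, MIAO Xuelei. Research on Chinese Word Segmentation for Patent Documents [J]. Journal of Chinese Information Processing, 2010, 24(3): 112-117.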

Authors:	ZHANG Guiping, LIU Dongsheng, YIN Baosheng, XU Lijun, MIAO Xuelei

Affiliation:	Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China

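Funding:	National Natural Science Foundation of China (60842005); Science and Technology Research Project of the Education Department of Liaoning Province (2007T139)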

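Abstract:	Addressing the characteristics of patent documents, this paper proposes a multi-strategy word segmentation method that combines statistics and rules. The method exploits latent segmentation marks in the documents, performs maximum-probability segmentation using the context of the text being segmented, and applies term prefix and suffix patterns in post-processing. By making full use of both the global information obtained from a large-scale corpus and the context of the text being segmented, it effectively addresses the difficulty of recognizing out-of-vocabulary words in patent segmentation. Experimental results show that the method performs well in both closed and open tests and is also effective at recognizing out-of-vocabulary words.
Keywords:	computer application; Chinese information processing; Chinese word segmentation; patent document; context information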
Research on Chinese Word Segmentation for Patent Documents

ZHANG Guiping, LIU Dongsheng, YIN Baosheng, XU Lijun, MIAO Xuelei. Research on Chinese Word Segmentation for Patent Documents [J]. Journal of Chinese Information Processing, 2010, 24(3): 112-117.

Authors:	ZHANG Guiping, LIU Dongsheng, YIN Baosheng, XU Lijun, MIAO Xuelei

Affiliation:	Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China

Abstract:	Addressing the characteristics of patent documents, this paper presents a multi-strategy approach to word segmentation based on statistics and rules. Our method takes advantage of the latent segmentation marks in the document and employs the context information of the text in a maximum-probability segmentation model. Meanwhile, term affix rules are applied in post-processing. Making full use of the global information from a large-scale corpus and the specific context information, this meth...

Keywords:	computer application; Chinese information processing; Chinese word segmentation; patent document; context information
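
The maximum-probability segmentation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain unigram model, and the lexicon `TOY_PROBS`, the penalty `UNKNOWN_LOG_PROB`, the limit `MAX_WORD_LEN`, and the function `segment` are all hypothetical names and values chosen for illustration. The latent segmentation marks, context information, and affix-based post-processing described in the abstract are not modeled here.

```python
import math

# Toy unigram probabilities, purely illustrative (not taken from the paper).
TOY_PROBS = {
    "专利": 0.02,
    "文献": 0.02,
    "专利文献": 0.005,
    "中文": 0.02,
    "分词": 0.01,
    "技术": 0.02,
}
# Assumed log-probability penalty for a single character missing from the lexicon.
UNKNOWN_LOG_PROB = math.log(1e-8)
# Assumed maximum candidate word length, in characters.
MAX_WORD_LEN = 4


def segment(text, probs=TOY_PROBS, max_len=MAX_WORD_LEN):
    """Return the split of `text` that maximizes the product of unigram probabilities."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in probs:
                score = math.log(probs[word])
            elif end - start == 1:
                # Fall back to a single unknown character so a path always exists.
                score = UNKNOWN_LOG_PROB
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


print(segment("专利文献中文分词技术"))
```

With this toy lexicon, the compound 专利文献 is kept as one word because its probability (0.005) exceeds the product of its parts (0.02 × 0.02). A real system would estimate these probabilities from a large corpus rather than hard-code them.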
This article is indexed in CNKI, Wanfang Data, and other databases.