
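Research and Application of a Chinese Word Segmentation Model Combining Dictionary-Based and Statistical Methods (original Chinese title: 词典与统计方法结合的中文分词模型研究及应用)
Cite this article: JIANG Jian-hong, ZHAO Song-zheng, LUO Mei. Research and application of a Chinese word segmentation model combining dictionary-based and statistical methods [J]. Computer Engineering and Design, 2012, 33(1): 387-391.
Authors: JIANG Jian-hong  ZHAO Song-zheng  LUO Mei
Affiliation: School of Management, Northwestern Polytechnical University, Xi'an, Shaanxi 710129
Abstract: To address the shortcomings in efficiency and recognition ability of traditional dictionary-based and statistics-based word segmentation methods, this work analyzes the characteristics of text data in one specific domain, product name information in e-commerce, and studies the mmseg segmentation method and a processing method based on mutual information. Combining the strengths of the two classes of methods, it applies the mmseg algorithm and the mutual information algorithm in the segmentation process, and designs and implements a fast, high-accuracy segmentation model. Test results indicate that the model handles the speed and efficiency problems of segmentation fairly well.

Keywords: word segmentation  mmseg algorithm  mutual information  dictionary  statistics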

Analysis and application of Chinese word segmentation model which consist of dictionary and statistics method
JIANG Jian-hong, ZHAO Song-zheng, LUO Mei. Analysis and application of Chinese word segmentation model which consist of dictionary and statistics method [J]. Computer Engineering and Design, 2012, 33(1): 387-391.
Authors: JIANG Jian-hong  ZHAO Song-zheng  LUO Mei
Affiliation: School of Management, Northwestern Polytechnical University, Xi'an 710129, China
Abstract: To address the limited efficiency and recognition ability of dictionary-based and statistics-based word segmentation methods, the characteristics of product-name text data in e-commerce are analyzed, and the "mmseg" word segmentation method and a mutual information processing method are studied. The two types of method are combined: the "mmseg" segmentation algorithm and the mutual information algorithm are both applied during segmentation, and a fast, highly accurate word segmentation model is designed and implemented. Test results indicate that the model offers a better solution to the speed and efficiency problems of segmentation.
Keywords: word segmentation  mmseg algorithm  mutual information  dictionary  statistics
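
The abstract does not give the model's details, so the following is only a minimal sketch of the general idea of pairing a dictionary pass with a mutual-information pass, not the authors' implementation. It assumes greedy forward maximum matching as a stand-in for mmseg (the full mmseg algorithm compares three-word chunks using several disambiguation rules), and it assumes that adjacent out-of-dictionary characters are merged when their pointwise mutual information, estimated from a small corpus, reaches a threshold. The function names, the toy dictionary, the toy corpus, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match segmentation (simplified stand-in for mmseg).

    Characters not covered by any dictionary word become one-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_stats(corpus):
    """Count single characters and adjacent character pairs in the corpus."""
    chars, pairs = Counter(), Counter()
    for line in corpus:
        chars.update(line)
        pairs.update(line[k:k + 2] for k in range(len(line) - 1))
    return chars, pairs


def pmi(a, b, chars, pairs):
    """Pointwise mutual information log2(p(ab) / (p(a) * p(b)))."""
    if pairs[a + b] == 0:
        return float("-inf")  # pair never observed, so never merge on it
    p_ab = pairs[a + b] / sum(pairs.values())
    p_a = chars[a] / sum(chars.values())
    p_b = chars[b] / sum(chars.values())
    return math.log2(p_ab / (p_a * p_b))


def merge_by_pmi(tokens, dictionary, chars, pairs, threshold):
    """Merge an unknown single character into the preceding unknown token
    when the PMI across the boundary is at least `threshold` (assumed rule)."""
    merged = []
    for tok in tokens:
        prev = merged[-1] if merged else None
        if (prev is not None and len(tok) == 1
                and tok not in dictionary and prev not in dictionary
                and pmi(prev[-1], tok, chars, pairs) >= threshold):
            merged[-1] = prev + tok
        else:
            merged.append(tok)
    return merged


# Hypothetical toy data resembling e-commerce product names.
dictionary = {"苹果", "手机"}
corpus = ["苹果手机硅胶套", "硅胶手机壳", "硅胶保护套"]
chars, pairs = build_stats(corpus)

rough = forward_max_match("苹果手机硅胶套", dictionary)
final = merge_by_pmi(rough, dictionary, chars, pairs, threshold=2.0)
print(rough, final)
```

In this toy run the dictionary pass leaves the out-of-dictionary characters 硅, 胶, and 套 as single tokens, and the statistical pass joins 硅 and 胶 because they co-occur often in the corpus. The dictionary pass is cheap and covers known words, while the statistical pass is what recovers words missing from the dictionary; how the paper actually balances the two is not stated in the abstract.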
This article is indexed in CNKI, Wanfang Data, and other databases.