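Study on an automatic proofreading method for POS tagging of Chinese corpora
Cite this article: ZHANG Hu, ZHENG Jia-heng, LIU Jiang. Study on an automatic proofreading method for POS tagging of Chinese corpora [J]. Journal of Computer Applications (计算机应用), 2005, 25(1): 17-19.
Authors: ZHANG Hu, ZHENG Jia-heng, LIU Jiang
Affiliation: College of Computer and Information Technology, Shanxi University
Funding: Supported by the National 863 Program of China (2001AA4031)
Abstract: Approaching the problem from the perspective of clustering and classification, this paper analyzes the automatic proofreading of part-of-speech (POS) tagging in large-scale corpora and proposes a new method for checking the correctness of POS tags and proofreading them automatically. The method clusters example instances and derives a threshold, then uses the threshold to judge whether a POS tag is correct. A tag judged incorrect is reassigned to the POS category whose centroid is nearest, which yields a proofread tag and thereby improves the POS tagging accuracy of the Chinese corpus.

Keywords: clustering; POS tagging; automatic proofreading
Article ID: 1001-9081(2005)01-0017-03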

Study on auto-proofreading method for POS tagging of Chinese corpus
ZHANG Hu, ZHENG Jia-heng, LIU Jiang. Study on auto-proofreading method for POS tagging of Chinese corpus [J]. Journal of Computer Applications, 2005, 25(1): 17-19.
Authors: ZHANG Hu, ZHENG Jia-heng, LIU Jiang
Affiliation: College of Computer & Information Technology, Shanxi University
Abstract: The problem of automatically proofreading POS tags in a large-scale corpus is analyzed, and a new method, based on clustering and classification, is proposed for checking the correctness of POS tagging and proofreading it automatically. The method first clusters the POS sequences of the examples and derives a threshold. It then uses the threshold to judge whether each test sequence is tagged correctly and assigns a proofread POS to each wrong tag, thereby improving the accuracy of POS tagging in a large-scale corpus.
Keywords: clustering; POS tagging; auto-proofreading
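
The abstract describes three steps: cluster the examples by POS category, derive a threshold, and reassign a rejected tag to the category with the nearest centroid. The abstract does not specify the feature representation, the distance measure, or how the threshold is computed, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes each tagged occurrence has already been encoded as a numeric feature vector, uses Euclidean distance, and takes the threshold for a category to be the largest distance from any example to that category's centroid. The names `fit` and `proofread` are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit(examples):
    """Build one (centroid, threshold) pair per POS tag.

    examples: list of (feature_vector, pos_tag) pairs assumed to be
    correctly tagged. The feature encoding is an assumption of this sketch.
    """
    by_tag = defaultdict(list)
    for vec, tag in examples:
        by_tag[tag].append(vec)

    model = {}
    for tag, vecs in by_tag.items():
        c = centroid(vecs)
        # Assumed threshold rule: the largest example-to-centroid distance.
        threshold = max(math.dist(v, c) for v in vecs)
        model[tag] = (c, threshold)
    return model


def proofread(model, vec, tag):
    """Return (is_correct, suggested_tag) for one tagged occurrence."""
    if tag in model:
        c, threshold = model[tag]
        if math.dist(vec, c) <= threshold:
            return True, tag
    # Tag rejected: suggest the category whose centroid is nearest.
    # Note this can still be the original tag if it is merely the
    # least-bad category for an outlying vector.
    nearest = min(model, key=lambda t: math.dist(vec, model[t][0]))
    return False, nearest


# Toy data with made-up two-dimensional features for tags "n" and "v".
examples = [
    ([0.0, 0.0], "n"), ([0.0, 1.0], "n"),
    ([5.0, 5.0], "v"), ([5.0, 6.0], "v"),
]
model = fit(examples)
accepted = proofread(model, [0.0, 0.6], "n")
corrected = proofread(model, [5.0, 5.4], "n")
print(accepted, corrected)
```

With this toy data, the first occurrence lies within the threshold of its own category and is accepted, while the second lies far from the "n" centroid and is reassigned to "v". A real system would need a feature encoding and threshold rule taken from the full paper.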
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.