Automatic Bracketing and Tagging of Chinese Phrases |
| |
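Cite this article: | Zhou Qiang. Automatic Bracketing and Tagging of Chinese Phrases [J]. Journal of Chinese Information Processing, 1997, 11(1): 1-10. |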
| |
Author: | Zhou Qiang |
| |
Affiliation: | Institute of Computational Linguistics, Peking University |
| |
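Abstract: | Given the difficulties that traditional rule-based Chinese parsers encounter when analyzing large-scale real text, this paper explores the use of statistical methods for automatic syntactic parsing of Chinese and proposes a statistics-based algorithm for automatically bracketing and tagging Chinese phrases. The algorithm has three processing stages: predicting bracketing points, matching brackets, and generating the parse tree. Throughout these stages it uses various statistics gathered from a manually annotated treebank to perform automatic syntactic disambiguation, and it finally obtains a single best parse tree, from which the phrases of a sentence can be bracketed and tagged top-down. A closed test on more than one thousand sentences shows a phrase bracketing accuracy of about 86% and a phrase tagging accuracy of about 92%, which is a fairly satisfactory result.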
|
Keywords: | automatic phrase bracketing and tagging; corpus processing |
Automatically Bracket and Tag Chinese Phrases |
| |
Author: | Qiang Zhou |
| |
Affiliation: | Institute of Computational Linguistics, Peking University, Beijing 100871 |
| |
Abstract: | In this paper, we describe work toward the construction of a probabilistic parsing system for Chinese phrases. The system is intended to bracket and tag Chinese phrases automatically in large-scale real-text corpora. The algorithm has three processing stages: predicting the bracketing points, matching brackets, and generating the syntactic tree, using statistical information obtained from a supervised training treebank. Through syntactic disambiguation, the parser obtains the best syntactic tree. Using this tree, we can bracket and tag the phrases of a sentence top-down automatically. The closed-test results of the system are a bracketing accuracy of 86% and a tagging accuracy of 92%. |
| |
Keywords: | phrase bracketing and tagging; corpus annotation |
This article is indexed in CNKI, VIP, and other databases. |