
1.

Vari-gram language model based on word clustering

袁里驰 (Yuan Lichi). 《中南工业大学学报(英文版)》 (Journal of Central South University of Technology, English Edition), 2012, 19(4): 1057-1062

Category-based statistical language models are an important way to address data sparseness, but they face two bottlenecks: 1) word clustering, where it is hard to find a clustering method that performs well at low computational cost; and 2) loss of predictive ability, since class-based methods adapt poorly to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and a definition of word-set similarity is built on it. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
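For readers unfamiliar with class-based models and perplexity, the sketch below shows the standard class-based bigram factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i) and how perplexity is computed from it. It is an illustration only, not the paper's model: the toy corpus, the hand-written `word2class` mapping, the add-alpha smoothing, and all function names are assumptions introduced here.

```python
import math
from collections import Counter


def train(tokens, word2class):
    """Collect the counts used by a class-based bigram model."""
    classes = [word2class[w] for w in tokens]
    return {
        "bigram": Counter(zip(classes, classes[1:])),  # class-to-class transitions
        "history": Counter(classes[:-1]),              # class counts in history position
        "class": Counter(classes),                     # class unigram counts
        "word": Counter(tokens),                       # word unigram counts
        "class_size": Counter(word2class.values()),    # number of words per class
        "word2class": word2class,
    }


def prob(model, prev, cur, alpha=1.0):
    """P(cur | prev) = P(c_cur | c_prev) * P(cur | c_cur), with add-alpha smoothing."""
    w2c = model["word2class"]
    c_prev, c_cur = w2c[prev], w2c[cur]
    k = len(model["class_size"])
    p_class = (model["bigram"][(c_prev, c_cur)] + alpha) / (
        model["history"][c_prev] + alpha * k
    )
    p_word = (model["word"][cur] + alpha) / (
        model["class"][c_cur] + alpha * model["class_size"][c_cur]
    )
    return p_class * p_word


def perplexity(model, tokens, alpha=1.0):
    """exp of the negative mean log-probability over all predicted tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(prob(model, p, c, alpha)) for p, c in pairs)
    return math.exp(-log_sum / len(pairs))


# Toy data (hypothetical): nouns share one class, the other words get their own.
corpus = "the cat sat on the mat the dog sat on the rug".split()
word2class = {"the": 0, "cat": 1, "dog": 1, "mat": 1, "rug": 1, "sat": 2, "on": 3}
model = train(corpus, word2class)
print(perplexity(model, corpus))
```

Lower perplexity means the model assigns higher probability to the test text, which is the sense in which the reductions reported above are improvements.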

2.

A statistical syntactic parsing model incorporating semantic class information

袁里驰 (Yuan Lichi). 《数据采集与处理》 (Journal of Data Acquisition and Processing), 2017, 32(1): 175-181

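Sparse data severely degrade the results of sentence-structure analysis models, and syntactic structure is a combination of semantic content and syntactic form. Building on the annotation of semantic structure information, this paper proposes a word clustering model and algorithm based on semantic collocation relations, and builds a head-driven statistical model for sentence-structure analysis based on semantic classes. The model not only handles the data sparseness problem fairly successfully but also clearly improves the performance of the parsing system. Parsing experiments show that the semantic-class-based head-driven statistical model achieves a recall of 88.26% and a precision of 88.73%, and that the combined measure improves by 8.39%.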

3.

A novel part-of-speech tagging model (cited 4 times: 4 self-citations, 0 citations by others)

袁里驰 (Yuan Lichi), 钟义信 (Zhong Yixin). 《微电子学与计算机》 (Microelectronics & Computer), 2005, 22(9): 1-2, 6

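This article proposes, for the first time, a statistical model called the Markov family model. The model assumes that the probability of a word depends both on the part-of-speech tag of the current word and on the word preceding it, but that the preceding word and the word's part-of-speech tag are conditionally independent given the word. Suitably simplified, the Markov family model can be applied successfully to part-of-speech tagging. Experiments show that, under the same test conditions, the tagging accuracy of the method based on the Markov family model is much higher than that of the conventional method based on the hidden Markov model. The Markov family model also has broad application prospects in other areas of natural language processing, such as word segmentation, syntactic parsing, speech recognition, and machine translation.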
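The conditional-independence assumption stated in this abstract can be worked through as follows. This is a derivation from the stated assumption alone, using Bayes' rule; the notation (w_i for the current word, t_i for its tag, w_{i-1} for the preceding word) is introduced here, and the abstract does not confirm that the paper uses exactly this form.

```latex
% Assumption (from the abstract): t_i and w_{i-1} are conditionally
% independent given w_i.
\begin{align}
P(t_i, w_{i-1} \mid w_i) &= P(t_i \mid w_i)\, P(w_{i-1} \mid w_i) \\
% Bayes' rule applied to the left-hand side and to each factor:
P(w_i \mid t_i, w_{i-1})
  &= \frac{P(t_i \mid w_i)\, P(w_{i-1} \mid w_i)\, P(w_i)}{P(t_i, w_{i-1})} \\
  &= \frac{P(w_i \mid t_i)\, P(w_i \mid w_{i-1})}{P(w_i)}
     \cdot \frac{P(t_i)\, P(w_{i-1})}{P(t_i, w_{i-1})}
\end{align}
% The last fraction does not depend on w_i, so
% P(w_i | t_i, w_{i-1}) \propto P(w_i | t_i) P(w_i | w_{i-1}) / P(w_i).
```

Under this reading, the word probability combines the tag-conditioned term of an HMM with a word-bigram term, which is one way to see how the model uses both sources of evidence.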

4.

Improved head-driven statistical models for natural language parsing

袁里驰 (Yuan Lichi). 《中南工业大学学报(英文版)》 (Journal of Central South University of Technology, English Edition), 2013, (10): 2747-2752

Head-driven statistical models for natural language parsing are the most representative lexicalized syntactic parsing models, but they use only the semantic dependency between words and do not incorporate other semantic information such as semantic collocation and semantic category. Several improvements to this parser are presented. First, "valency" is an essential semantic feature of words: once the valency of a word is determined, its collocations are clear and the sentence structure can be derived directly. A syntactic parsing model combining valence structure with semantic dependency is therefore proposed on the basis of head-driven statistical syntactic parsing models. Second, semantic role labeling (SRL) is necessary for deep natural language processing, so an integrated approach is proposed that incorporates semantic parsing into the syntactic parsing process. Experiments with the refined statistical parser show that it achieves 87.12% precision and 85.04% recall, and that the F-measure improves by 5.68% compared with the head-driven parsing model introduced by Collins.
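Several entries in this listing report precision, recall, and an F-measure. As a reminder of how the three relate, here is a minimal sketch of the general F-beta formula. It assumes the balanced F1 (beta = 1), which the abstracts do not state explicitly; the function name is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported in the abstract above, in percent.
print(round(f_measure(87.12, 85.04), 2))  # 86.07
```

Because the harmonic mean is pulled toward the smaller of the two values, the F-measure always lies between recall and precision.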

5.

Improved hidden Markov model for speech recognition and POS tagging

袁里驰 (Yuan Lichi). 《中南工业大学学报(英文版)》 (Journal of Central South University of Technology, English Edition), 2012, 19(2): 511-516

To overcome defects of the classical hidden Markov model (HMM), a new statistical model, the Markov family model (MFM), was proposed and applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model outperforms the hidden Markov model. Precision rises from 94.642% to 96.214% in the part-of-speech tagging experiments, and the work rate is reduced by 11.9% in the speech recognition experiments relative to the HMM baseline system.

6.

Smoothing techniques in head-driven syntactic parsing

袁里驰 (Yuan Lichi). 《电子学报》 (Acta Electronica Sinica), 2013, 41(7): 1337-1342

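Handling data sparseness is an important problem in head-driven syntactic parsing, and class-based statistical language models are an important way to address data sparseness in statistical models. Building on an analysis of classical smoothing algorithms, this paper proposes a word clustering algorithm based on semantic dependency information and mutual information, and uses the absolute weighted difference method to construct a variable-length language model, in which the value of n varies according to how much each history word contributes to predicting the current word. It then proposes an improved head-driven parsing model based on semantic classes and the variable-length model, which both strengthens the disambiguation ability of the parsing model and addresses the severe data sparseness problem. The improved model performs clearly better: precision and recall are 84.53% and 82.41% respectively, and the F-measure is 2.02 percentage points higher than that of Collins's head-driven parsing model.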

7.

A similarity-based word clustering algorithm (cited 1 time: 1 self-citation, 0 citations by others)

袁里驰 (Yuan Lichi), 钟义信 (Zhong Yixin). 《微电子学与计算机》 (Microelectronics & Computer), 2005, 22(8): 93-95

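Class-based statistical language models are an important way to address data sparseness in statistical models. Conventional statistical methods follow a greedy principle and usually take the likelihood of the corpus or the perplexity as the evaluation criterion. The main drawbacks of conventional clustering methods are that clustering is slow, the initial values strongly affect the result, and the search easily gets stuck in a local optimum. This paper proposes a definition of word similarity, a definition of word-set similarity, and a bottom-up hierarchical clustering algorithm. The method not only improves clustering quality but also allows a different similarity definition to be chosen for each model, which makes the clustering more useful in practice.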
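The general shape of a bottom-up (agglomerative) clustering driven by a word similarity and a word-set similarity can be sketched as below. The abstract does not give the paper's definitions, so both are stand-ins chosen here and labeled as assumptions: word similarity is the cosine of neighbor-count vectors, and word-set similarity is the average pairwise word similarity. All names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def context_vectors(tokens):
    """Count each word's left and right neighbours (assumed feature choice)."""
    vecs = defaultdict(Counter)
    for left, right in zip(tokens, tokens[1:]):
        vecs[left][("R", right)] += 1   # `right` occurs to the right of `left`
        vecs[right][("L", left)] += 1   # `left` occurs to the left of `right`
    return vecs


def word_similarity(u, v):
    """Cosine similarity of two count vectors (stand-in for the paper's definition)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def set_similarity(a, b, vecs):
    """Average pairwise word similarity between two word sets (assumed definition)."""
    total = sum(word_similarity(vecs[x], vecs[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster(tokens, num_classes):
    """Start from singletons and repeatedly merge the two most similar sets."""
    vecs = context_vectors(tokens)
    clusters = [frozenset([w]) for w in sorted(vecs)]
    while len(clusters) > num_classes:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: set_similarity(pair[0], pair[1], vecs),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


corpus = "the cat sat on the mat the dog sat on the rug".split()
print(cluster(corpus, 6))
```

Because the similarity functions are passed around as ordinary functions, swapping in a different definition for a different model only requires replacing `word_similarity` or `set_similarity`, which mirrors the flexibility the abstract describes.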

8.

A statistical syntactic parsing model based on valence structure and semantic dependency relations

袁里驰 (Yuan Lichi). 《电子学报》 (Acta Electronica Sinica), 2013, 41(10): 2029

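Current mainstream lexicalized parsing methods consider only the semantic dependency relations between words and do not bring in semantic information such as semantic collocation and semantic class. "Valency" is a fairly essential property of a word: once a word's valence structure is determined, it becomes fairly clear which words it should combine with, and the structure of the sentence can then be derived fairly directly. Building on the head-driven parsing model, this paper proposes a parsing model based on valence structure and semantic dependency relations. The model introduces rich semantic information into rule decomposition and probability computation, including both semantic dependency information and semantic collocation information such as valence structure. Parsing experiments with the improved model show precision and recall of 88.76% and 87.43% respectively, and an F-measure 6.65 percentage points higher than that of Collins's head-driven parsing model.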

9.

Semantic role labeling using valency information

袁里驰 (Yuan Lichi). 《电子学报》 (Acta Electronica Sinica), 2017, 45(10): 2533-2539

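Semantic role labeling (SRL) is a form of shallow semantic analysis. Existing Chinese semantic analysis methods and semantic role labeling schemes do not take the characteristics of Chinese into account or capture its essential properties effectively, so Chinese SRL performance currently lags well behind that for English. In Chinese, valence structure describes the syntactic structure and semantic composition of a sentence well. We therefore examine valency grammar, modify the semantic role labeling scheme accordingly, and incorporate the valency information of the predicate itself into semantic role labeling. Experiments show that using valency information substantially improves SRL performance for verbal and nominal predicates: given correct parse trees and correct predicate identification, the SRL F1 score reaches 93.69% for verbal predicates and 79.23% for nominal predicates, both better than comparable systems in China and abroad.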

10.

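A similarity-based word clustering algorithm and a variable-length language model (cited 3 times: 0 self-citations, 3 citations by others)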

袁里驰 (Yuan Lichi). 《小型微型计算机系统》 (Journal of Chinese Computer Systems), 2009, 30(5)

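Class-based statistical language models are an important way to address data sparseness in statistical models. Conventional statistical clustering methods follow a greedy principle and usually take the likelihood of the corpus or the perplexity as the evaluation criterion. The main drawbacks of these conventional methods are that clustering is slow, the initial values strongly affect the result, and the search easily gets stuck in a local optimum. This paper uses mutual information to define a word similarity and, based on that similarity, proposes a bottom-up hierarchical clustering algorithm. Experiments show that the algorithm clearly improves on the conventional greedy statistical clustering algorithm in both computational complexity and clustering quality. To improve predictive ability, the paper also proposes a new method for generating a class-based variable-length (vari-gram) language model.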