1.
Chinese Phonetic-Character Conversion (CPCC) is an important problem in Chinese speech recognition and Chinese sentence-level keyboard input systems. Approaches based on Markov language models estimated from large corpora (such as bigram and trigram models) are increasingly popular. This paper presents an improved Chinese word bigram, the space-compressed Chinese word bigram, which stores bi-word co-articulation frequencies in the form of bi-character co-articulation frequencies: each bi-word co-articulation frequency is estimated from the bi-character co-articulation frequency library. A CPCC experiment with the improved Chinese word bigram shows that it reaches a higher correct-conversion ratio while occupying less space.
2.
Eyas El-Qawasmeh 《Information Processing Letters》2004,92(5):257-265
Word prediction methodologies depend heavily on a statistical approach that uses word unigrams, bigrams, and trigrams. However, constructing the N-gram model requires a very large amount of memory, which is beyond the capability of many existing computers. Besides this, the approximation reduces the accuracy of word prediction. In this paper, we suggest using a cluster of computers to build an Optimal Binary Search Tree (OBST) that will be used for the statistical approach in word prediction. The OBST will contain extra links so that the bigrams and trigrams of the language are represented. In addition, we suggest incorporating other enhancements to achieve optimal word prediction performance. Our experimental results show that the suggested approach improves keystroke savings.
3.
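A Phonetic-to-Character Conversion Algorithm with Automatic Word Formation Based on a Single-Character Statistical Bigram    Total citations: 3, self-citations: 0, citations by others: 3
Phonetic-to-character conversion plays an important role in both speech recognition and sentence-level keyboard input of Chinese characters. The currently popular approach is conversion based on Markov models estimated from large corpora. Among these, conversion algorithms based on single-character N-grams need little data and are simple, but their conversion accuracy is relatively low, whereas algorithms based on word N-grams have exactly the opposite properties. Building on the single-character statistical Bigram algorithm, this paper proposes a phonetic-to-character conversion method with automatic word formation, which not only keeps the small space footprint of the single-character Bigram method but can also make full use of the word-based Bi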
4.
Guannan FANG;Caixia YUAN;Xiaojie WANG;Jiang LI;Zhanjiang SONG 《电子学报:英文版》2013,22(2):331-334
Tags or keywords provide an efficient way to manage and retrieve large-scale data. This paper proposes an unsupervised method to suggest informative tags for multi-party dialogues by integrating dialogue characteristics. Our model first extracts keywords from dialogue texts under a speaker-salience-based framework. We then obtain keyword bigrams through frequent pattern matching. To generate more flexible and meaningful tags, we expand the keywords and their bigrams using tag association rules mined from the popular bookmarking website del.icio.us. Finally, we rank the three types of tag candidates under a uniform metric. Experimental results validate the effectiveness and versatility of our method when compared with several strong baseline models such as TextRank, TFIDF rank and KNN.
5.
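Bigram Text Feature Extraction for Uyghur    Total citations: 1, self-citations: 0, citations by others: 1
Text feature representation is the most important step in automatic text classification. In text representation based on the vector space model (VSM), the choice of feature-unit granularity directly affects classification performance. In Uyghur text classification, single-word features do not adequately represent the content of a text. To address this, after analyzing the role of Uyghur Bigrams in text classification, the paper constructs a new statistic, CHIMI, and on that basis proposes a Uyghur Bigram feature extraction algorithm. The extracted Bigrams are used as text features, and classification experiments on Uyghur texts are carried out with the support vector machine (SVM) algorithm. The results show that, compared with classification using words as features, Bigram features improve both precision and recall for Uyghur text classification, and the experiments confirm the effectiveness of the algorithm.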
6.
7.
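Research on Bigram-Based Feature Word Extraction and Automatic Classification    Total citations: 1, self-citations: 1, citations by others: 1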
王笑旻 《计算机工程与应用》2005,41(22):177-179,210
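Automatic text classification by means of computer information processing is a topic of shared interest in natural language understanding. This paper proposes a dictionary-free, Bigram-based method for extracting feature words from Chinese text, and applies the concept of mutual information to the extracted feature words to improve extraction accuracy. In addition, a set of texts is classified with the support vector machine algorithm, which is based on statistical learning theory and the structural risk minimization principle, verifying the effectiveness and feasibility of the feature words produced by the proposed algorithm.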
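The abstract above does not spell out its scoring formula, so the following is only a minimal sketch of one common way to combine dictionary-free character bigrams with mutual information: count adjacent character pairs and rank them by pointwise mutual information (PMI). The function name `bigram_pmi`, the `min_count` threshold, and the use of PMI specifically are assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter


def bigram_pmi(texts, min_count=2):
    """Score adjacent character pairs by pointwise mutual information.

    Hypothetical helper: `min_count` drops rare pairs, whose PMI
    estimates are unreliable.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in texts:
        unigrams.update(text)  # single-character counts
        bigrams.update(text[i:i + 2] for i in range(len(text) - 1))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / total_bi
        p_x = unigrams[pair[0]] / total_uni
        p_y = unigrams[pair[1]] / total_uni
        # PMI > 0 means the pair co-occurs more often than chance predicts.
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy input: only "ab" occurs at least twice, so it is the only scored pair.
scores = bigram_pmi(["abab", "abcd"])
```

Pairs with high PMI are candidate feature words; a classifier such as an SVM would then be trained on their counts per document.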
8.
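Applied Research on Part-of-Speech Tagging Based on the Hidden Markov Model (HMM)    Total citations: 3, self-citations: 0, citations by others: 3
A hidden Markov model (HMM) is used for part-of-speech tagging of English text. The paper first introduces an improvement to the Viterbi algorithm and the steps for training with the HMM-based method, then draws two conclusions from a series of comparative experiments: the bigram model offers a more satisfactory "performance-to-price ratio" than the trigram model, and the size of the part-of-speech tag set affects tagging accuracy. Finally, closed and open tests are carried out using these conclusions.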
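For orientation, here is a minimal sketch of standard Viterbi decoding for a bigram (first-order) HMM tagger. It is the textbook algorithm, not the improved variant the abstract mentions, whose details are not given here. The function signature, the probability floor for unseen events, and the toy tag set and probabilities are all assumptions made for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence under a bigram HMM.

    `floor` is an assumed stand-in for smoothing: unseen events get a
    tiny probability instead of zero so the logarithm stays defined.
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0))
             for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda item: item[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model (invented numbers, for illustration only).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}
tagged = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
# tagged == ["DET", "NOUN", "VERB"]
```

A trigram tagger conditions each transition on the two previous tags, which squares the number of states tracked per position; that extra cost is the trade-off the abstract weighs against the bigram model.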
9.
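Research on a Statistical Word Sense Disambiguation Method Based on Chinese Bigram Co-occurrence    Total citations: 4, self-citations: 0, citations by others: 4
Using 《汉语义词词林》 and an English-Chinese bilingual corpus, the single-token translations in an English-Chinese dictionary are expanded through "bilingual alignment"; bigram co-occurrence frequencies of bilingual phrases are then counted over a large-scale Chinese corpus, with a B+ tree algorithm as the framework.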
10.
Chuanhua ZHOU;Jiayi ZHOU;Cai YU;Wei ZHAO;Ruilin PAN 《电子学报:英文版》2020,29(5):880-886
We propose a multi-channel sliced deep Recurrent convolutional neural network (RCNN) with a residual network. We expand the RCNN into a deep neural network. Our proposed model can directly learn to extract bigram features and other features from sentences where other machine learning methods cannot. The experimental results indicate that our model outperforms the traditional methods.