
1.

A NOVEL SPACE-COMPRESSED CHINESE WORD DIGRAM BASED ON BI-CHARACTER CO-ARTICULATION FREQUENCY

Zhao Yibao, Qiao Liyan, Tan Jianxun, Sun Shenghe 《电子科学学刊(英文版)》2000,(2)

Chinese Phonetic-Character Conversion (CPCC) is an important issue in Chinese speech recognition and Chinese sentence keyboard input systems. Approaches based on Markov language models estimated from large-corpus statistics (such as bigram and trigram models) are becoming increasingly popular. This paper presents an improved Chinese word bigram, the space-compressed Chinese word bigram, which stores the bi-word co-articulation frequency in the form of the bi-character co-articulation frequency. The bi-word co-articulation frequency is estimated from the bi-character co-articulation frequency library. A CPCC experiment with the improved Chinese word bigram shows that it reaches a higher correct conversion ratio while occupying less space.

2.

Word prediction using a clustered optimal binary search tree

Eyas El-Qawasmeh 《Information Processing Letters》2004,92(5):257-265

Word prediction methodologies depend heavily on a statistical approach that uses word unigrams, bigrams, and trigrams. However, constructing the N-gram model requires a very large amount of memory, which is beyond the capability of many existing computers. Besides this, the approximation reduces the accuracy of word prediction. In this paper, we suggest using a cluster of computers to build an Optimal Binary Search Tree (OBST) that will be used for the statistical approach in word prediction. The OBST will contain extra links so that the bigrams and trigrams of the language are represented. In addition, we suggest incorporating other enhancements to achieve optimal word-prediction performance. Our experimental results show that the suggested approach improves keystroke savings.
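As a rough illustration of the bigram statistics this abstract refers to, the sketch below predicts the next word from counts of adjacent word pairs. It is a minimal sketch under stated assumptions: it uses a plain in-memory dictionary rather than the clustered OBST the paper proposes, and the names `train_bigrams` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            followers[prev][nxt] += 1
    return followers


def predict_next(followers, prev_word, k=3):
    """Return up to k followers of prev_word, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = followers.get(prev_word.lower(), {})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Toy corpus, invented for illustration only.
model = train_bigrams([
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
])
print(predict_next(model, "the", k=1))  # ['cat']
```

A dictionary of counters keeps every observed pair in memory, which is exactly the cost the abstract says motivates a more compact or distributed structure.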

3.

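A Self-Word-Forming Phonetic-to-Character Conversion Algorithm Based on a Single-Character Statistical Bigram  Total citations: 3, self-citations: 0, citations by others: 3

赵以宝, 孙圣和 《电子学报》1998,26(10):55-59

Phonetic-to-character conversion plays an important role in both speech recognition and Chinese sentence keyboard input. The currently popular approach is conversion based on Markov models estimated from large-corpus statistics. Among these, conversion algorithms based on single-character N-grams need little data and are simple, but their conversion accuracy is low, whereas algorithms based on word N-grams are just the opposite. Building on the single-character statistical Bigram algorithm, this paper proposes a self-word-forming phonetic-to-character conversion method that not only keeps the small space footprint of the single-character Bigram method but can also make full use of the word-based Bi…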

4.

Guannan FANG;Caixia YUAN;Xiaojie WANG;Jiang LI;Zhanjiang SONG 《电子学报:英文版》2013,22(2):331-334

Tags or keywords provide an efficient way to manage and retrieve large-scale data. This paper proposes an unsupervised method that suggests informative tags for multi-party dialogues by integrating dialogue characteristics. Our model first extracts keywords from dialogue texts under a framework based on speaker salience. We then obtain keyword bigrams through frequent pattern matching. To generate more flexible and meaningful tags, we expand keywords and their bigrams with tag association rules mined from the popular bookmarking website del.icio.us. Finally, we rank the three types of tag candidates under a uniform metric. Experimental results validate the effectiveness and versatility of our method when compared with several strong baseline models such as TextRank, TFIDF rank, and KNN.

5.

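Bigram Text Feature Extraction for Uyghur  Total citations: 1, self-citations: 0, citations by others: 1

阿力木江·艾沙, 库尔班·吾布力, 吐尔根·依布拉音 《计算机工程与应用》2015,(3):216-221,228

Text feature representation is the most important step in automatic text classification. In text representation based on the vector space model (VSM), the choice of feature-unit granularity directly affects classification performance. To address the problem that single-word features do not represent text content well in Uyghur text classification, the authors analyze the role of Uyghur Bigrams in text classification, construct a new statistic, CHIMI, and on that basis propose a Uyghur Bigram feature extraction algorithm. The extracted Bigrams are used as text features, and a support vector machine (SVM) algorithm is used to run classification experiments on Uyghur text. The results show that, compared with classification using words as features, Bigram features improve both the precision and the recall of Uyghur text classification, and the experiments confirm the effectiveness of the algorithm.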

6.

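Chinese Automatic Word Segmentation Without a Lexicon Based on an Unsupervised Learning Strategy  Total citations: 16, self-citations: 0, citations by others: 16

孙茂松, 肖明, 邹嘉彦 《计算机学报》2004,27(6):736-742

This paper explores Chinese automatic word segmentation based on an unsupervised learning strategy and without a lexicon, with a view to helping build robust segmentation systems for open environments. All segmentation knowledge comes from Chinese character Bigrams acquired automatically from a raw corpus. Building on inter-character mutual information and the difference of t-test, the authors propose a new statistic, md, that linearly combines the two, introduce the concepts of peaks and valleys, and design a corresponding segmentation algorithm. Large-scale open tests show that the algorithm's segmentation accuracy over inter-character positions is 85.88%, which is 2.47% and 5.66% higher than using mutual information or the difference of t-test alone, respectively.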
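To make the role of inter-character mutual information concrete, the sketch below computes pointwise mutual information (PMI) for adjacent characters from raw-corpus counts and cuts wherever it drops below a threshold. This is a simplified illustration, not the paper's md statistic: it omits the difference of t-test and the peak/valley analysis, and the fixed `threshold` and the names `char_pmi` and `segment` are assumptions made for this example.

```python
import math
from collections import Counter


def char_pmi(corpus):
    """Build a PMI function (log base 2) over adjacent character pairs."""
    unigrams = Counter()
    bigrams = Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(zip(line, line[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        # Unseen pairs get -inf so they always fall below any threshold.
        if bigrams[(a, b)] == 0:
            return float("-inf")
        p_ab = bigrams[(a, b)] / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return math.log2(p_ab / (p_a * p_b))

    return pmi


def segment(text, pmi, threshold=0.0):
    """Cut between adjacent characters whose PMI is below the threshold."""
    if not text:
        return []
    pieces = [text[0]]
    for a, b in zip(text, text[1:]):
        if pmi(a, b) < threshold:
            pieces.append(b)      # weak association: start a new piece
        else:
            pieces[-1] += b       # strong association: extend the piece
    return pieces
```

A single global threshold is the crudest possible decision rule; comparing each gap with its neighbours, as the peak and valley idea suggests, avoids having to choose one cutoff for the whole corpus.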

7.

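A Study of Bigram-Based Feature Word Extraction and Automatic Classification  Total citations: 1, self-citations: 1, citations by others: 1

王笑旻 《计算机工程与应用》2005,41(22):177-179,210

Automatic text classification by means of computer information processing is a topic of shared interest across the field of computational natural language understanding. This paper proposes a Bigram-based, dictionary-free method for extracting feature words from Chinese text, and applies the concept of mutual information to the extracted feature words, improving the accuracy of feature word extraction. In addition, a support vector machine algorithm, which rests on statistical learning theory and the structural risk minimization principle, is used to classify a number of texts, verifying the effectiveness and feasibility of the feature words obtained with the proposed algorithm.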

8.

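An Applied Study of Part-of-Speech Tagging Based on the Hidden Markov Model (HMM)  Total citations: 3, self-citations: 0, citations by others: 3

胡春静, 韩兆强 《计算机工程与应用》2002,38(6):62-64

A hidden Markov model (HMM) is used for part-of-speech tagging of English text. The paper first describes an improvement to the Viterbi algorithm and the steps for training the machine with the HMM-based method, then draws two conclusions from a series of comparative experiments: the bigram model offers a more satisfactory cost-performance ratio than the trigram model, and the size of the part-of-speech tag set affects tagging accuracy. Finally, closed and open tests are carried out on the basis of these conclusions.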
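For readers unfamiliar with bigram HMM tagging, the sketch below shows the textbook Viterbi recursion over tag bigrams. It is a minimal sketch of the standard algorithm, not the improved variant the paper describes; the function name `viterbi`, the dictionary-based probability tables, and the `floor` value used for unseen events are assumptions made for this example.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence under a bigram (first-order) HMM.

    start_p maps tag -> probability, trans_p maps (prev_tag, tag) ->
    probability, and emit_p maps (tag, word) -> probability. Missing
    entries are treated as the small constant `floor`.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-probability of the best path ending in tag t
    #               at position i, previous tag on that path)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p, (p, t)), p) for p in tags
            )
            column[t] = (score + lp(emit_p, (t, word)), prev)
        best.append(column)

    # Trace back from the best final tag to recover the full path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

Each position costs on the order of the tag-set size squared for a bigram model, while a trigram model would cost the tag-set size cubed, which is one way to read the cost-performance trade-off the abstract mentions.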

9.

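A Study of Statistical Word Sense Disambiguation Based on Chinese Bigram Co-occurrence  Total citations: 4, self-citations: 0, citations by others: 4

荀恩东, 李生 《高技术通讯》1998,8(10):21-25

Using 《汉语义词词林》 and an English-Chinese bilingual corpus, the authors expand the single-token translations of an English-Chinese dictionary through "bilingual alignment"; for a large-scale Chinese corpus, bigram co-occurrence frequencies of bilingual word groups are counted using a B+ tree algorithm as the backbone.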

10.

Chuanhua ZHOU;Jiayi ZHOU;Cai YU;Wei ZHAO;Ruilin PAN 《电子学报:英文版》2020,29(5):880-886

We propose a multi-channel sliced deep recurrent convolutional neural network (RCNN) with a residual network. We expand the RCNN into a deep neural network. Our proposed model can directly learn to extract bigram features and other features from sentences, which other machine learning methods cannot do. The experimental results indicate that our model outperforms the traditional methods.