首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
基于无向图序列标注模型的中文分词词性标注一体化系统   总被引:3,自引:0,他引:3  
在中文词法分析中,分词是词性标注必须经历的阶段。为了能在分词阶段就充分利用词性标注的信息和减少两阶段错误的累计,最好的方法是将两个阶段,整合到一个架构中。该文以无向图模型为基础,将分词和词性标注有机地统一在一个序列标注模型中。由于可以采用更深层次的依赖关系作为特征,一体化系统在1998年人民日报语料上取得了97.19%的分词精确率和95.34%的词性标注精确率,是目前同类系统,在这一语料上取得的最好结果。  相似文献   

2.
一种新颖的词性标注模型   总被引:4,自引:4,他引:0  
文章首次提出一种统计模型,即马氏族模型,该模型假定一个词出现概率既与当前词的词性标记有关,也与它前面的词有关,但其前面的词和该词词性标记关于该词条件独立.将马氏族模型适当加以简化,能成功地用于词性标记,实验结果证明:在相同的测试条件下,这种基于马氏族模型的词性标注方法标记成功率大大高于传统的基于隐马尔可夫模型的词性标注方法.马氏族模型在其它一些自然语言处理领域如分词、句法分析、语音识别、机器翻译也有广泛的应用前景.  相似文献   

3.
该文以处理大规模真实文本为目标,把句法分析分解为分词/词性标注、短语识别两个部分。首先提出了一个一体化的分词/词性标注方法,该方法在隐马尔科夫模型(HMM)的基础上引入词汇信息,既保留了HMM简单快速的特点,又有效提高了标注精度;然后应用中心驱动模型进行短语识别,这是一个词汇化的英文句法分析模型,该文将其同分词/词性标注模型结合进行汉语句法分析。在公共的测试集上对句法分析器的性能进行了评价,精确率和召回率分别为77.57%和74.96%,这一结果要明显好于目前唯一可比的工作。  相似文献   

4.
A neural-network-based approach to synthesising F0 information for Mandarin text-to-speech is discussed. The basic idea is to use neural networks to model the relationship between linguistic features. Extracted from input text and parameters representing the pitch contour of syllables. Two MLPs are used to separately synthesise the mean and shape of pitch contour, using different linguistic features. A large set of utterances is employed to train these MLPs using the well known back-propagation algorithm. Pronunciation rules for generating F0 information are automatically learned and implicitly memorised by the MLPs. In the synthesis, parameters representing the mean and shape of the pitch contour of each syllable are generated using linguistic features extracted from the given input text. Simulation results confirmed that this is a promising approach for F0 synthesis. The resulting synthesised pitch contours of syllables match well with their original counterparts. Average root mean square errors of 0.94 ms/frame and 1.00 ms/frame were achieved  相似文献   

5.
本文设计与实现了一个全自动中文新闻字幕生成系统,输入为新闻视频,输出为视频对应的字幕文本.以<新闻联播>为语料,实现了音频提取、音频分类与切分、说话人识别、大词汇量连续语音识别、视频文件的播放和文本字幕的自动生成等多项功能.新闻字幕的自动生成,避免了繁重费时的人工字幕添加过程.实验表明,该系统识别率高,能够满足听障等特...  相似文献   

6.
基于短时分形维数的汉语语音自动分段技术研究   总被引:1,自引:0,他引:1  
本文根据汉语语音的构成特点,提出了一种新的基于短时分形维数的汉语语音自动分段方法。该方法首先用等差尺度网格维数替代传统盒维数计算方法来快速计算语音信号的分形维数,然后在统计、分析汉语男女声21种声母和38种韵母语间信号的分形特性基础上,利用中心偏离限定算法来实现汉语语音信号的自动分段。仿真实验表明,该方法不但能正确实现不同语速条件下的语音自动分段,而且具有噪声鲁棒性,是一种有效的汉语语音自动分段技  相似文献   

7.
在中文分词领域,基于字标注的方法得到广泛应用,通过字标注分词问题可转换为序列标注问题,现在分词效果最好的是基于条件随机场(CRFs)的标注模型。作战命令的分词是进行作战指令自动生成的基础,在将CRFs模型应用到作战命令分词时,时间和空间复杂度非常高。为提高效率,对模型进行分析,根据特征选择算法选取特征子集,有效降低分词的时间与空间开销。利用CRFs置信度对分词结果进行后处理,进一步提高分词精确度。实验结果表明,特征选择算法及分词后处理方法可提高中文分词识别性能。  相似文献   

8.
The objective of this work is to develop a rule-based emotion conversion method for a better emotional perception. In this work, performance of emotion conversion using the linear modification model is improved by using vowel-based non-uniform prosody modification. In the present approach, attempts were made to integrate features like position and identity for addressing the non-uniformity in prosody generated due to the emotional state of the speaker. We mainly concentrate on the parameters such as strength, duration and pitch contour of vowels at different parts of the sentence. The influence of emotions on the above parameters is exploited to convert the speech from neutral emotion to the target emotion. Non-uniform prosody modification factors for emotion conversion are based on the position of vowels in the word, and the position of the word in the sentence. This study is carried out by using Indian Institute of Technology-Simulated Emotion speech corpus. Evaluation of the proposed algorithm is carried out by a subjective listening test. From the listening tests, it is observed that the performance of the proposed approach is better than the existing approaches.  相似文献   

9.
This letter presents a prediction model for sentence‐final intonations for Korean conversational‐style text‐to‐speech systems in which we introduce the linguistic feature of ‘modality’ as a new parameter. Based on their function and meaning, we classify tonal forms in speech data into tone types meaningful for speech synthesis and use the result of this classification to build our prediction model using a tree structured classification algorithm. In order to show that modality is more effective for the prediction model than features such as sentence type or speech act, an experiment is performed on a test set of 970 utterances with a training set of 3,883 utterances. The results show that modality makes a higher contribution to the determination of sentence‐final intonation than sentence type or speech act, and that prediction accuracy improves up to 25% when the feature of modality is introduced.  相似文献   

10.
《电子学报:英文版》2017,(6):1111-1117
The accurate classification of subjective and objective sentences is important in the preparation for micro-blog sentiment analysis. Since a single feature type cannot provide enough subjective information for classification, we propose a Support vector machine (SVM)-based classification model for Chinese micro-blogs using multiple features. We extracted the subjective features from the Part of speech (POS) and the dependency relationship between words, and constructed a 3-POS subjective pattern set and a dependency template set. We fused these two types of features and used an SVM-based model to classify Chinese micro-blog text. The experimental results showed that the performance of the classification model improved remarkably when using multiple features.  相似文献   

11.
吴语文语转换中的语音韵律控制   总被引:2,自引:2,他引:0  
吴语不同于汉语普通话,无论是声调类型与特征、协同发音的变调方式,还是语句的语调与幅度变化、单词重音位置与时长,都有其自身的规律。因此,在吴语文语转换中有必要建立相应的韵律控制模型来控制语音的合成输出。这里首先介绍吴语的语音韵律特征,然后介绍语音合成中具体的声调控制、变调处理以及时长与幅度调整处理方法,对语调的处理也作了简单介绍,并给出了运用韵律控制合成语音的实验评价结果。  相似文献   

12.
Feature selection is very important for feature‐based relation classification tasks. While most of the existing works on feature selection rely on linguistic information acquired using parsers, this letter proposes new features, including probabilistic and semantic relatedness features, to manifest the relatedness between patterns and certain relation types in an explicit way. The impact of each feature set is evaluated using both a chisquare estimator and a performance evaluation. The experiments show that the impact of relatedness features is superior to existing well‐known linguistic features, and the contribution of relatedness features cannot be substituted using other normally used linguistic feature sets.  相似文献   

13.
改进汉语数码语音识别中的语音特征提取性能   总被引:3,自引:0,他引:3  
汉语数据码语音识别中存在三种与语音特征提取性能有关的语音混淆。  相似文献   

14.
黄哲莹  刘作桢  徐及  赵庆卫 《信号处理》2022,38(10):2074-2081
为了解决汉英语码转换文本数据稀缺的问题,本文提出了基于编码器-解码器模型合成语码转换文本的方法,从有限的语码转换文本与大量单语种平行语料中学习语码转换语言学规则与语种内部的语言学规则,来合成语码转换文本。但是该模型合成的语码转换文本自然度低,因此本文又提出基于带复制机制的编码器-解码器模型合成语码转换文本的方法,在编码器-解码器的基础上,增加了一个门控,用来决定从编码器的预测结果还是从编码器的输入源文本中产生下一个词。最终,该方法使语言模型在SEAME测试集上的困惑度降低了绝对13.96。由此可得出结论,本文提出的方法可大规模地合成自然度高的语码转换文本,缓解语码转换文本数据的稀缺性。  相似文献   

15.
TTS语音单元边界的自动切分   总被引:2,自引:0,他引:2  
语音单元边界的准确切分对基于波形拼接的语音合成系统至关重要。文章采用了两步切分方法,第一步中先由基于HMM模型的强制对齐方法得到初始的边界.在第二步中提出用基于前后音素的边界模型来修正初始边界。为解决训练数据不足的问题,提出用分类与衰退树将前后因素发音相近的边界模型进行聚类。这样可以根据训练数据的多少,动态调节边界模型的数目,以保证模型训练的可靠性。在对中文语音库的实验中,自动切分的准确度由78.7%提高到91.5%。  相似文献   

16.
17.
动词细分类属于词性标注的一部分,是自然语言处理的重要内容之一。基于条件随机场在分词和词性标注的基础上对动词进行了更细致的分类。根据动词的语言环境构建条件随机场模型,实验结果表明该方法取得了较高的准确率,最高取得了98.11的F值。  相似文献   

18.
A neural network based approach of pause-duration synthesis for Mandarin text-to-speech is proposed. It uses an MLP to replace explicit synthesis rules for generating pause duration from input text. By properly training the MLP using a large set of utterances, phonological rules of producing pause duration are automatically learned. Experimental results confirmed that this is a promising approach.<>  相似文献   

19.
具有文本生成功能的智能语音生成系统   总被引:1,自引:0,他引:1  
陈芳  袁保宗 《电子学报》1997,25(10):5-8
智能语音生成系统不仅研究通常的文语转换过程,而且研究文转转换所需文本的生成过程,本文将介绍具有文本生成功能的智能语音生成系统,该系统通过主题选择、文本规划、文本组织、语法实现、文本形成等步骤得到正确的文本,根据怕生成的文本和文语转换实现高自然度及可懂度的语音输出。  相似文献   

20.
基于一种改进的监督流形学习算法的语音情感识别   总被引:2,自引:0,他引:2  
为了有效提高语音情感识别的性能,需要对嵌入在高维声学特征空间的非线性流形上的语音特征数据作非线性降维处理。监督局部线性嵌入(SLLE)是一种典型的用于非线性降维的监督流形学习算法。该文针对SLLE存在的缺陷,提出一种能够增强低维嵌入数据的判别力,具备最优泛化能力的改进SLLE算法。利用该算法对包含韵律和音质特征的48维语音情感特征数据进行非线性降维,提取低维嵌入判别特征用于生气、高兴、悲伤和中性4类情感的识别。在自然情感语音数据库的实验结果表明,该算法仅利用较少的9维嵌入特征就取得了90.78%的最高正确识别率,比SLLE提高了15.65%。可见,该算法用于语音情感特征数据的非线性降维,可以较好地改善语音情感识别结果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号