Similar literature
 Found 20 similar articles; search took 31 ms
1.
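于凤芹, 肖志. 《声学技术》, 2008, 27(2): 266-270
A formant extraction algorithm is proposed that exploits the adaptive band-pass filtering property of the Hilbert-Huang transform. The algorithm relies on the fact that intrinsic mode functions (IMFs) are zero-mean, narrowband AM-FM signals, which matches the AM-FM signal of the vocal tract model of speech. Because empirical mode decomposition (EMD) is adaptive, the signal itself determines the center frequency and bandwidth of each IMF, so the formant components are obtained without estimating the band-pass filter frequencies and bandwidths in advance. This avoids the effects of spurious peaks, formant merging and high-pitched speech. The algorithm can both extract formants accurately and track changes in formant frequency.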
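The Hilbert step of the Hilbert-Huang transform can be illustrated on a single narrowband component. The sketch below is not the authors' algorithm: it omits the EMD sifting that produces the IMFs and only shows how an instantaneous frequency could be read off one IMF-like signal through its analytic signal. The function names, the plain O(N²) DFT and the even-length assumption are all choices made for this illustration.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def analytic_signal(x):
    """Analytic signal via the frequency-domain Hilbert transform (even length assumed)."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC and Nyquist, double the positive frequencies, zero the negative ones.
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)

def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the phase increments of the analytic signal."""
    phase = [cmath.phase(v) for v in analytic_signal(x)]
    freqs = []
    for a, b in zip(phase, phase[1:]):
        # Wrap the phase step into [-pi, pi) before converting to Hz.
        step = (b - a + math.pi) % (2 * math.pi) - math.pi
        freqs.append(step * fs / (2 * math.pi))
    return freqs
```

Applied to a pure tone that completes a whole number of cycles in the analysis window, every returned value should sit at the tone frequency; for a real IMF the values would vary over time, which is what makes formant tracking possible.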

2.
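Voiced-speech endpoint detection plays an important role in speech processing: speech coding, speech recognition and speech enhancement all rely on it. Conventional methods that use short-time energy, zero-crossing rate and similar features as decision parameters cannot meet application requirements in systems with a low signal-to-noise ratio. This paper detects voiced endpoints on the basis of formant and pitch-period detection. The algorithm first extracts the first formant and the pitch period of the speech signal, then uses them as the criteria for deciding where a voiced segment starts and ends. Experiments show that in noisy environments this method is more accurate than traditional energy-based detection and than the endpoint detection algorithm in the AMR-WB standard.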
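For context, the conventional baseline this abstract refers to (short-time energy and zero-crossing rate) can be sketched as follows. This is a minimal illustration of the traditional features, not the formant and pitch based method of the paper; the frame length, hop size, threshold and all function names are assumptions chosen for the example.

```python
import math

def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_endpoints(signal, frame_len=80, hop=40, energy_threshold=0.1):
    """Return (first, last) indices of frames whose energy exceeds the threshold, or None."""
    active = [i for i, f in enumerate(frames(signal, frame_len, hop))
              if short_time_energy(f) > energy_threshold]
    return (active[0], active[-1]) if active else None

# Synthetic test signal (hypothetical): silence, a 200 Hz tone sampled at 8 kHz, silence.
FS = 8000
tone = [math.sin(2 * math.pi * 200 * n / FS) for n in range(800)]
demo_signal = [0.0] * 400 + tone + [0.0] * 400
```

A fixed energy threshold like this is exactly what breaks down at low signal-to-noise ratios, because noise frames can exceed the threshold; that is the weakness the formant and pitch based decision is meant to address.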

3.
Tasks that process human languages now require powerful computers. Human languages, also called natural languages, are highly versatile systems for encoding information and can capture information from various domains. To enable a computer to process information in a human language, the language needs to be appropriately ‘described’ to the computer, i.e. it needs to be ‘modelled’. In this work, we present an approach for acquiring the morphology of an inflectional language such as Hindi. It is an unsupervised learning approach, suitable for languages with a rich concatenative morphology. Broadly, our work is carried out in three steps: 1. acquire the morphology of Hindi from a raw (unannotated) text corpus from the Central Institute of Indian Languages (CIIL), Mysore; 2. prepare clusters, a stem bag and a suffix bag; 3. use the morphological knowledge to decompose a given word into stems and suffixes according to their morphological behaviour, and add new words. A prime motivation behind this work is to eventually develop an unsupervised morphological analyser that is language-independent (here applied to Hindi). A second motivation is to develop a language-independent morphological segmentation method, since the study of morphology has been shown to benefit a range of NLP tasks such as speech recognition, speech synthesis, machine translation and information retrieval. Though Hindi is an important national language of India, little computational work has been done so far in this direction. Our work is one of the first efforts in this regard and can be considered pioneering. There are many such languages for which it is very important to have a suitable but inexpensive computational acquisition process. These languages receive very little attention in computational linguistic research, both in terms of available funding and number of researchers. We do not, however, claim that our approach is a solution for all such languages. Different languages have characteristics that require individual research attention.

4.
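High-intensity sound sources and their applications   Total citations: 4, self-citations: 0, citations by others: 4
谷嘉锦. 《声学技术》, 1997, 16(1): 9-13
This paper introduces four high-intensity sound sources: 1. a single-orifice rotary valve; 2. an array of Hartmann generators with center rods; 3. a jet point sound source; 4. a jet-feedback rotary sound generator. It also gives the uses of the four sources, namely: 1. studying the propagation of finite-amplitude waves in ducts with flow; 2. measuring the acoustic impedance of sound-absorbing inlet liners; 3. studying the acoustic environment of wind tunnels; 4. using high-intensity sound energy for ash and dust removal.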

5.
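A new speech analysis, editing and synthesis system   Total citations: 1, self-citations: 1, citations by others: 0
To meet the needs of speech research, the authors developed a speech analysis, editing and synthesis system that integrates speech acquisition, analysis, synthesis, modification, comparison, playback and adjustment. Parameters can be modified through parameter correction, median smoothing, Hanning-window filtering, line drawing with the mouse, and direct numerical entry. For speech research, the system can greatly shorten analysis and synthesis time, improve efficiency, and help explore the role of individual parameters and the relative merits of different analysis and synthesis methods, so it is of practical value to speech research as a whole. This paper introduces the system in three parts: structure, functions and applications.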

6.
The design of a system for the synthesis of one frequency from another is discussed in terms of mathematical methods of approximating real numbers, the ratio of the frequencies being the number approximated. A general equation describing the frequency synthesis process is derived and it is shown, using charts, how block diagrams for a frequency synthesizer can be developed from the solutions of this equation. Examples are given for a synthesizer which compares the frequency of an ammonia N¹⁵H₃ maser with a standard frequency of 5 MHz, and for a synthesizer which offsets a standard frequency of 100 kHz by steps of 1×10⁻⁵ Hz.
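One classical way to approximate a real number by ratios of small integers is the continued-fraction expansion. The abstract does not say which approximation method the paper uses, so the sketch below is only an assumed illustration of the general idea: each convergent p/q of the frequency ratio suggests a multiply-by-p, divide-by-q stage. The function name `convergents` and its parameters are hypothetical.

```python
from fractions import Fraction

def convergents(ratio, max_terms=10):
    """Yield successive continued-fraction convergents of a positive rational ratio."""
    x = Fraction(ratio)
    h_prev2, h_prev1 = 0, 1   # numerator recurrence seeds
    k_prev2, k_prev1 = 1, 0   # denominator recurrence seeds
    for _ in range(max_terms):
        a = x.numerator // x.denominator        # next partial quotient
        h = a * h_prev1 + h_prev2
        k = a * k_prev1 + k_prev2
        yield Fraction(h, k)
        h_prev2, h_prev1 = h_prev1, h
        k_prev2, k_prev1 = k_prev1, k
        remainder = x - a
        if remainder == 0:                      # expansion terminated exactly
            return
        x = 1 / remainder
```

Each convergent is the best rational approximation among fractions with a denominator no larger than its own, which is why short chains of multipliers and dividers can get close to an awkward frequency ratio.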

7.
8.
A method is described for evaluating the clarity of speech received through the intercommunication channels of self-contained respirators. The speech is transformed into an electrical signal and then digitized, and a PC is used for formant-based evaluation of speech clarity and intelligibility.

9.
Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work, we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech, the energy is unevenly released and power-law distributed, reporting a universal robust Gutenberg–Richter-like law in speech. We further show that such ‘earthquakes in speech’ show temporal correlations, as the interevent statistics are again power-law distributed. As this feature takes place in the intraphoneme range, we conjecture that the process responsible for this complex phenomenon is not cognitive, but resides in the physiological (mechanical) mechanisms of speech production. Moreover, we show that these waiting time distributions are scale invariant under a renormalization group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results contrast with current paradigms in speech processing, which point towards low dimensional deterministic chaos as the origin of nonlinear traits in speech fluctuations. As these latter fluctuations are indeed the aspects that humanize synthetic speech, these findings may have an impact on future speech synthesis technologies. Results are robust and independent of the communication language or the number of speakers, pointing towards a universal pattern and yet another hint of complexity in human speech.

10.
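Motivated by the practical need for highly natural speech synthesis and highly accurate speech recognition of Mandarin spoken by ethnic minorities, this study is the first to compare, from the perspective of experimental phonetics, the first-order difference, duration and similarity of the tones produced by 50 Uyghur learners of Chinese at the elementary, intermediate and advanced levels with those of 10 native Mandarin speakers. Prosodic parameters such as the first-order difference pattern of the tones and tone duration are analyzed experimentally, yielding the tone errors of Uyghur students and their relationship to scores on the Chinese Proficiency Test for Ethnic Minorities (Minzu Hanyu Shuiping Kaoshi, MHK). The results show that all three groups of Uyghur learners have difficulty with Mandarin tones. The phonology, intonation and stress of the two languages affect the tonal characteristics of the second language. The basic acoustic features of Uyghur learners' tones are summarized, along with some important rules and conclusions. These provide an important reference for addressing the difficulties posed to Chinese speech processing, especially tone problems in speech synthesis and speech recognition of Mandarin spoken by ethnic minorities.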

11.
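As a frequency measurement tool, the microresonator optical frequency comb offers high accuracy and can be integrated on a chip, and it will play a major role in fields such as deep-space exploration and precision metrology. This paper systematically reviews the state of the art in the nonlinear excitation and generation of microresonator frequency combs and in device development. It describes research progress in optical clocks, ranging and imaging, spectral analysis, frequency synthesizers, low-noise microwave sources and coherent communications, and forecasts future research hotspots and application prospects, with the aim of promoting the use of microresonator frequency combs in metrology, testing and communications.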

12.
An analog implementation of the fractional-N phase-locked-loop variable-frequency synthesis technique is presented. In addition to its simplicity, this implementation allows tuning over broad frequency ranges. The synthesizer was developed in response to a need for a compact, low-power local oscillator for a swept heterodyne, low-frequency, battery-operated, portable spectrum analyzer. The resulting prototype synthesizer was constructed on a 4-in × 4-in circuit board using standard CMOS integrated circuits. The total power requirements were +7 V at 8 mA and -7 V at 1 mA when the synthesizer was operated in the 380-580-kHz frequency range. Further reductions in size may be expected from the use of surface mount devices. Spectral data are presented for the prototype circuit serving as the local oscillator in a prototype swept-frequency spectrum analyzer. That is, instead of a spectral analysis of a fixed synthesizer frequency, the synthesizer was swept through a range of frequencies about a stable reference applied to the input of the prototype analyzer. Thus the results are conservative, since they include the effects of noise coupled to the sweep voltage from other circuitry within the prototype analyzer. This method of evaluation demonstrates one of the distinct advantages of this circuit.
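The general fractional-N idea is that a loop divider alternating between N and N + 1 produces a non-integer average division ratio. The sketch below shows the textbook accumulator version of that principle in software. It is not the analog implementation described in the abstract, and the function name and parameters are hypothetical.

```python
def fractional_n_divider(n_int, frac_num, frac_den, cycles):
    """Return the division ratio used on each reference cycle.

    An accumulator adds frac_num every cycle; whenever it overflows frac_den,
    the divider uses n_int + 1 for that cycle instead of n_int.
    """
    acc = 0
    ratios = []
    for _ in range(cycles):
        acc += frac_num
        if acc >= frac_den:
            acc -= frac_den
            ratios.append(n_int + 1)
        else:
            ratios.append(n_int)
    return ratios

def average_ratio(ratios):
    """Average division ratio; the locked output is this value times the reference frequency."""
    return sum(ratios) / len(ratios)
```

Over a whole number of accumulator periods the average equals n_int + frac_num / frac_den, so the output frequency can be stepped in fractions of the reference frequency. The periodic switching also introduces spurs, which practical designs have to suppress.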

13.
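杨俊, 张玲霞, 陈明. 《计测技术》, 2003, (5): 43-44, 48
The key technologies in the design of a miniature Chinese character recognition system are analyzed, mainly Chinese character recognition, speech synthesis and system design. The system captures images of text with a front-end single-chip miniature camera, a DSP processor then performs image analysis and character recognition, and finally a speech synthesis chip reads the text aloud.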

14.
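Advances in the application of droplet ejection technology   Total citations: 7, self-citations: 0, citations by others: 7
Droplet ejection technology offers a simple structure, low cost and high positioning accuracy. Beyond inkjet printing, it has broad application prospects in moldless forming, the fabrication of micromachines and microdevices, biochips, materials synthesis and other fields. This paper reviews these advances and briefly introduces the eight-nozzle combinatorial solution-jet synthesis instrument designed and built by our group, along with its performance and applications.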

15.
This paper describes a novel technique for producing an electro-optical intensity synthesizer that can generate different periodic time-domain waveforms using only sine or cosine applied voltages. The synthesizer consists of a series of stages between two polarizers, each stage comprising an electro-optic element and a compensator. Every electro-optic element has the same applied-voltage function but a different azimuth angle and a different ratio of longitudinal to transverse length. The main principle is the synthesis of an electro-optic effect and a polarization interference effect in the time domain. The technique is based on an expanded Fourier positive-direction searching algorithm, which not only simplifies the calculation but also produces many choices of structural parameters for generating different waveforms. A three-stage synthesis of an electro-optical birefringent system for a continuous square waveform is undertaken to prove the principle.

16.
In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing.

17.
Spoken language is one of the distinctive characteristics of the human race. Spoken language processing is a branch of computer science that plays an important role in human–computer interaction (HCI) and has advanced remarkably in the last two decades. This paper reviews and summarizes the acoustic, phonetic and prosodic features that have been used for spoken language identification, specifically for Indian languages. We also review the speech databases that are already available for Indian languages and can be used for spoken language identification.

18.
N Usha Rani, P N Girija. 《Sadhana》, 2012, 37(6): 747-761
Speech is one of the most important communication channels among people. Speech recognition occupies a prominent place in communication between humans and machines. Several factors affect the accuracy of a speech recognition system. Much effort has gone into increasing that accuracy, yet current systems still generate erroneous output. Telugu is one of the most widely spoken south Indian languages. In the proposed Telugu speech recognition system, errors obtained from the decoder are analysed to improve performance. The static pronunciation dictionary plays a key role in recognition accuracy, so the dictionary used in the decoder is modified. This modification reduces the number of confusion pairs, which improves the performance of the system. Language model scores also vary with this modification. The hit rate increases considerably, and the number of false alarms changes, as the pronunciation dictionary is modified. Variations are observed in different error measures such as F-measure, error rate and word error rate (WER) when the proposed method is applied.
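Word error rate, one of the measures named in this abstract, is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below implements that standard definition. It is not taken from the paper, and the function name and whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.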

19.
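宋南, 吴沛文, 杨鸿武. 《声学技术》, 2018, 37(4): 372-379
To address the communication barrier between deaf-mute people and hearing people, a method is proposed for converting sign language, fused with facial expressions, into emotional speech in Mandarin or Tibetan. First, a deep belief network model extracts features from gesture images, and a deep neural network model extracts expression features from facial information. Next, support vector machines are trained on the gesture features and the facial expression features separately and used for classification; the recognized gesture and facial expression information yield the gesture text and the corresponding emotion label, respectively. In parallel, an emotional speech synthesis system based on hidden Markov models is built using a Mandarin emotional training corpus and speaker adaptive training. Finally, the recognized gesture text and emotion label are used to convert the gestures and facial expressions into emotional speech in Mandarin or Tibetan. Objective evaluation shows a recognition rate of 92.8% for static gestures, and facial expression recognition rates of 94.6% and 80.3% on the extended Cohn-Kanade database and the Japanese Female Facial Expression (JAFFE) database, respectively. Subjective evaluation shows that the converted emotional speech receives an average subjective emotion rating of 4.0. When the three-dimensional Pleasure-Arousal-Dominance (PAD) emotion model is used to assess the PAD values of the facial expressions and of the synthesized emotional speech, the two are highly similar, indicating that the synthesized emotional speech can convey the emotion of the facial expression.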

20.