Similar literature
1.
This paper presents a semi-automatic phonetic labeling method for the MAT (Mandarin across Taiwan) speech database. MAT speech data are collected over telephone networks. Each utterance has been transcribed into Chinese characters and Pinyin symbols. The proposed method marks the syllable and sub-syllable boundaries in an utterance and assigns phonetic symbols to each segmented syllable. Segmentation is performed using hidden Markov models (HMMs) and Viterbi decoding. The accuracy of syllable segmentation is checked by measuring the syllable length and the distance of a syllable from its state models. Experimental results show that the proposed labeling method achieves a segmentation accuracy of around 90% for an allowed tolerance of 16 ms.
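
The accuracy figure above depends on a tolerance window around each reference boundary. A minimal sketch of how such a score could be computed is given below; the function name, the one-to-one pairing of boundaries, and the sample times are assumptions for illustration and are not taken from the paper.

```python
def boundary_accuracy(auto_ms, manual_ms, tolerance_ms=16.0):
    """Fraction of automatic boundaries within tolerance_ms of the paired manual boundary.

    Assumes the two lists are already paired one-to-one (hypothetical setup).
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        return 0.0
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms)
    return hits / len(manual_ms)


# Hypothetical boundary times in milliseconds (illustrative values only).
manual = [120.0, 340.0, 575.0, 810.0]
auto = [128.0, 361.0, 570.0, 826.0]

# Deviations are 8, 21, 5 and 16 ms, so three of four fall within 16 ms.
print(boundary_accuracy(auto, manual))
```

A tighter or looser tolerance changes the score directly, which is why the tolerance is reported together with the accuracy.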

2.
Longitudinal motion during in vivo pullback acquisition of intravascular ultrasound (IVUS) sequences is a major artifact in 3-D exploration of coronary arteries. Most current techniques rely on the electrocardiogram (ECG) signal to obtain a gated pullback without longitudinal motion, using either specific hardware or the ECG signal itself. We present an image-based approach for cardiac phase retrieval from coronary IVUS sequences without an ECG signal. A signal reflecting cardiac motion is computed from the evolution of the local mean image intensity. The signal is filtered by a band-pass filter centered at the main cardiac frequency. Phase is retrieved by computing the signal extrema. The average frame processing time using our setup is 36 ms. Comparison with manually sampled sequences encourages a deeper study comparing the results to ECG signals.

3.
This paper focuses on automatically segmenting and labelling continuous speech signals into syllable-like units for Indian languages. In this approach, the continuous speech signal is first automatically segmented into syllable-like units using a group-delay-based algorithm. Similar syllable segments are then grouped together using an unsupervised and incremental training (UIT) technique. Isolated-style HMMs are generated for each of the clusters during training. During testing, the speech signal is segmented into syllable-like units, which are then tested against the HMMs obtained during training. This results in a syllable recognition performance of 42.6% and 39.94% for Tamil and Telugu, respectively. A new feature extraction technique that uses features extracted from multiple frame sizes and frame rates during both training and testing is explored for the syllable recognition task. This results in a recognition performance of 48.7% and 45.36% for Tamil and Telugu, respectively. The performance of segmentation followed by labelling is superior to that of a flat-start syllable recogniser (27.8% and 28.8% for Tamil and Telugu, respectively).

4.
Rabal H, Pomarico J, Arizaga R. 《Applied Optics》, 1994, 33(20): 4358-4360
We present a digital speckle-pattern interferometric setup that can be operated at TV frame rates (30 ms) to display the locus of the points at which the optical-path difference between the reference and object beams is within the coherence length. Experimental results are shown.

5.
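Endpoint detection is one of the key techniques in speech signal processing. To improve the accuracy and robustness of endpoint detection in low signal-to-noise ratio (SNR) environments, an endpoint detection algorithm is proposed that combines non-stationary noise suppression and modulation-domain spectral subtraction with a power-normalized cepstral distance. The algorithm first suppresses non-stationary noise and then applies modulation-domain spectral subtraction to remove residual noise, which raises the SNR and reduces speech distortion. It then extracts the power-normalized cepstral coefficients of each frame and computes the power-normalized cepstral distance between each frame and the background noise. Finally, this cepstral distance is used as the detection parameter, and a double-threshold decision method performs the endpoint detection. Experimental results show that the algorithm discriminates well between speech frames and noise frames. Moreover, in low-SNR environments, the proposed algorithm is robust to different types of noise.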
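
The final double-threshold decision step can be illustrated with a short sketch. It assumes the per-frame distance to the background noise has already been computed; the threshold values, the minimum segment length, and the sample distances are hypothetical, and the noise suppression and cepstral-distance stages of the algorithm are not reproduced here.

```python
def detect_endpoints(distances, low, high, min_frames=3):
    """Return (start, end) frame-index pairs (end inclusive) of detected speech segments.

    A segment is confirmed where the per-frame distance exceeds `high`; its
    edges are then extended outward while the distance stays above `low`.
    Segments shorter than `min_frames` are discarded as spurious.
    """
    segments = []
    n = len(distances)
    i = 0
    while i < n:
        if distances[i] > high:
            start = i
            while start > 0 and distances[start - 1] > low:
                start -= 1
            end = i
            while end + 1 < n and distances[end + 1] > low:
                end += 1
            if end - start + 1 >= min_frames:
                segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments


# Hypothetical per-frame distances to the background-noise model.
distances = [0.1, 0.2, 0.6, 1.4, 1.8, 1.2, 0.7, 0.2, 0.1, 1.1, 0.1]

# Frames 2..6 form one segment; the isolated spike at frame 9 is too short.
print(detect_endpoints(distances, low=0.5, high=1.0))
```

Using two thresholds lets the high one reject weak noise bursts while the low one recovers the softer onset and offset of a confirmed speech segment.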

6.
By recognizing sensory information through the touch, vision, or voice modalities, a robot can interact with people in a more intelligent manner. In human–robot interaction (HRI), emotion recognition has been a popular research topic in recent years. This paper proposes a method for emotion recognition that uses a speech signal to recognize several basic human emotional states, for application in an entertainment robot. The proposed method uses voice signal processing and classification. First, end-point detection and frame setting are accomplished in the pre-processing stage. Then, the statistical features of the energy contour are computed. Fisher’s linear discriminant analysis (FLDA) is used to enhance the recognition rate. In the final stage, a support vector machine (SVM) is used to complete the emotional state classification. To determine the effectiveness of emotional HRI, an embedded system was constructed and integrated with a self-built entertainment robot. The experimental results for the entertainment robot show that the robot interacts with a person in a responsive manner. The average recognition rate for five emotional states is 73.8% using the database constructed in the authors’ lab.

7.
8.
Existing speech retrieval systems are frequently confronted with expanding volumes of speech data. A dynamic updating strategy for index construction can add speech data, or remove unnecessary speech data, in a timely manner to meet users’ real-time retrieval requirements. This study proposes an efficient method for retrieving encrypted speech using unsupervised deep hashing and a B+ tree dynamic index, which avoids privacy leakage of speech data and enhances the accuracy and efficiency of retrieval. The cloud’s encrypted speech library is constructed using the multi-threaded Dijk-Gentry-Halevi-Vaikuntanathan (DGHV) fully homomorphic encryption (FHE) technique, which encrypts the original speech. In addition, a Residual Neural Network18-Gated Recurrent Unit (ResNet18-GRU) model is used to learn compact binary hash codes, which are stored in the designed B+ tree index table with a one-to-one mapping between the binary hash codes and the corresponding encrypted speech. External B+ tree index technology is applied to update the B+ tree index table dynamically, thereby satisfying users’ needs for real-time retrieval. Experimental results on THCHS-30 and TIMIT show that the proposed method achieves a retrieval accuracy above 95.84%, outperforming existing unsupervised hashing methods. Compared with methods that use hash index tables, retrieval efficiency is greatly improved, and the security of the speech data is effectively guaranteed.

9.
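This paper discusses the principle of an adaptive speech noise-cancelling filter with a reference channel; the filter uses the least mean square (LMS) algorithm. Applied to extracting speech signals from complex noise backgrounds, the filter suppresses the background noise well and yields a clear speech signal.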
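
A reference-channel LMS noise canceller can be sketched in a few lines. The version below is a textbook form and is not the paper's implementation; the tap count, step size, synthetic noise path, and the sinusoid standing in for speech are all assumptions chosen for illustration.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=4, step_size=0.01):
    """Return the error signal of an LMS filter, which estimates the clean speech.

    The filter predicts the noise in `primary` from recent `reference` samples
    and subtracts that prediction; the remainder drives the weight update.
    """
    weights = [0.0] * num_taps
    cleaned = []
    for n in range(len(primary)):
        # Most recent reference samples, newest first, zero-padded at the start.
        window = [reference[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        noise_estimate = sum(w * x for w, x in zip(weights, window))
        error = primary[n] - noise_estimate
        weights = [w + 2.0 * step_size * error * x for w, x in zip(weights, window)]
        cleaned.append(error)
    return cleaned


def mean_square_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic test signals (hypothetical): white reference noise, a sinusoid
# standing in for speech, and a two-tap noise path into the primary channel.
rng = random.Random(0)
num_samples = 4000
reference = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
speech = [0.5 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(num_samples)]
noise = [0.8 * reference[n] + (0.3 * reference[n - 1] if n > 0 else 0.0)
         for n in range(num_samples)]
primary = [s + v for s, v in zip(speech, noise)]

cleaned = lms_noise_canceller(primary, reference)

# Compare residual noise power over the last 1000 samples, after convergence.
tail = slice(3000, num_samples)
residual_before = mean_square_error(primary[tail], speech[tail])
residual_after = mean_square_error(cleaned[tail], speech[tail])
print(residual_before, residual_after)
```

The step size trades convergence speed against steady-state residual noise: a larger value adapts faster but leaves more weight jitter in the output.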

10.
The standard reference clinical score quantifying average Parkinson's disease (PD) symptom severity is the Unified Parkinson's Disease Rating Scale (UPDRS). At present, UPDRS is determined by the subjective clinical evaluation of the patient's ability to adequately cope with a range of tasks. In this study, we extend recent findings that UPDRS can be objectively assessed to clinically useful accuracy using simple, self-administered speech tests, without requiring the patient's physical presence in the clinic. We apply a wide range of known speech signal processing algorithms to a large database (approx. 6000 recordings from 42 PD patients, recruited to a six-month, multi-centre trial) and propose a number of novel, nonlinear signal processing algorithms which reveal pathological characteristics in PD more accurately than existing approaches. Robust feature selection algorithms select the optimal subset of these algorithms, which is fed into non-parametric regression and classification algorithms, mapping the signal processing algorithm outputs to UPDRS. We demonstrate rapid, accurate replication of the UPDRS assessment with clinically useful accuracy (about 2 UPDRS points difference from the clinicians' estimates, p < 0.001). This study supports the viability of frequent, remote, cost-effective, objective, accurate UPDRS telemonitoring based on self-administered speech tests. This technology could facilitate large-scale clinical trials into novel PD treatments.

11.
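To obtain a relatively accurate second-order mathematical model of a ball-screw feed system at the design stage, a simulation modeling method is proposed that builds a model reference adaptive system based on Lyapunov stability theory, with a second-order system as the reference model and the theoretical model as the controlled plant. Using a worked example, a theoretical model of the ball-screw feed system that accounts for viscous friction and transmission stiffness is established, a simulation model of the model reference adaptive system is built in MATLAB/Simulink, and the second-order mathematical model of the feed system is derived from the simulation results. The results show that the output of the second-order model tracks the output of the theoretical model well; when a 1 ms unit pulse signal is applied to both models, the relative error is within 0.314%.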

12.
赵建平, 原猛, 冯海泓. 《声学技术》 (Technical Acoustics), 2013, 32(3): 217-221
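Wide dynamic range compression (WDRC) is the core algorithm for nonlinear hearing compensation in hearing aids, and the setting of its release time constant can affect speech intelligibility. Based on the phonetic characteristics of Mandarin, the WDRC algorithm is divided by frequency range into a low-frequency band (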

13.
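A reference signal structure is proposed in which already-encoded system information symbols replace most of the reference signals, reducing the number of meaningless reference signals inserted and thereby improving the overall transmission efficiency of the system. With two-antenna transmit diversity, the structure inserts 97.5% fewer meaningless reference signals than the original LTE scheme, so that they occupy only 0.238% of the available time-frequency resources in the whole data frame; with iterative channel estimation and repeated decoding, its bit error rate performance is slightly better than that of the original scheme. With four antennas, the structure inserts 96.7% fewer meaningless reference signals than the original LTE scheme, and its performance is also slightly better than that of the original structure. In addition, the performance of the reference signal structure is compared under various conditions, including different fading, code rates, and Doppler shifts.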

14.
邓步, 李弘毅, 顾亚平. 《声学技术》 (Technical Acoustics), 2022, 41(3): 465-472
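For the problem of estimating the trajectory of low-altitude, high-speed flying targets, a trajectory estimation method based on spatial matching with multiple arrays is proposed. The method first partitions the detection range into spatial points, computes the true azimuth and elevation angles of each point relative to the array reference coordinates, and stores them in a dictionary. The signals received by the sensor arrays are split into frames, and the direction of arrival is computed for each frame. In theory, a high-speed target moves in a straight line within the detection range. Ideally, for a single array, the lines formed by each frame's direction vector and the reference coordinates all lie in the same plane. The direction of arrival estimated for each frame is matched against the dictionary to find the N spatial points with the smallest error, and a line is fitted to them. A spatial plane is then fitted to the lines obtained from two or more frames, giving an estimated plane. The line of intersection of the estimated planes from the two arrays is the estimated trajectory of the high-speed flying target. Simulation results verify the effectiveness of the trajectory estimation algorithm.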
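
The last geometric step, intersecting the two estimated planes to obtain the trajectory line, can be sketched as follows. The plane representation n · x = d, the function names, and the sample planes are assumptions for illustration; the dictionary matching and plane fitting stages are not shown.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_intersection(n1, d1, n2, d2, eps=1e-12):
    """Intersect planes n1 . x = d1 and n2 . x = d2.

    Returns (point, direction) describing the line, or None when the
    planes are parallel (or nearly so) and no unique line exists.
    """
    direction = cross(n1, n2)
    norm_sq = dot(direction, direction)
    if norm_sq < eps:
        return None
    # A point on both planes: ((d1 * n2 - d2 * n1) x direction) / |direction|^2
    combined = tuple(d1 * n2[i] - d2 * n1[i] for i in range(3))
    point = tuple(c / norm_sq for c in cross(combined, direction))
    return point, direction


# Hypothetical estimated planes: z = 1 from one array and y = 2 from the other.
point, direction = plane_intersection((0.0, 0.0, 1.0), 1.0, (0.0, 1.0, 0.0), 2.0)
print(point, direction)  # a point with y = 2, z = 1 and a direction along the x axis
```

When the two arrays see the target from nearly the same geometry the planes become close to parallel, the cross product shrinks, and the intersection is poorly conditioned, which is why the sketch returns None in that case.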

15.
陈颖, 肖仲喆. 《声学技术》 (Technical Acoustics), 2018, 37(4): 380-387
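A Chinese emotional speech database that combines discrete emotion labels with a dimensional emotion space was built. The database consists of acted emotional speech recorded by 16 native Chinese speakers. Speech samples were collected according to seven discrete emotion labels: neutral, pleased, happy, dejected, angry, sorrowful, and sad; each speaker contributed 336 samples. Three annotators then labeled each sample in the dimensional space. Finally, the annotation data were used to study the distribution of the seven emotions in the dimensional space and to analyze the emotions in terms of consistency, concentration, and distinctiveness. In addition, recognition rates for the seven emotions were computed. The results show that annotation consistency among the three annotators exceeded 80%, that the emotions are highly distinguishable, and that the recognition rates of all seven emotions are above the baseline. The database therefore has good emotional quality and can provide an important research basis for converting discrete emotion labels into the dimensional emotion space.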

16.
The field of digital audio forensics aims to detect threats and fraud in audio signals. Contemporary audio forensic techniques use digital signal processing to verify the authenticity of recorded speech, recognize speakers, and recognize recording devices. User-generated audio recordings from mobile phones are very helpful in a number of forensic applications. This article proposes a novel method for recognizing recording devices based on recorded audio signals. First, a database of the features of various recording devices was constructed using 32 recording devices (20 mobile phones of different brands and 12 kinds of recording pens) in various environments. Second, the audio features of each recording device, such as the Mel-frequency cepstral coefficients (MFCC), were extracted from the audio signals and used as model inputs. Finally, support vector machines (SVM) with a fractional Gaussian kernel were used to recognize the recording devices from their audio features. Experiments demonstrated that the proposed method had a 93.4% accuracy in recognizing recording devices.

17.
谈新权, 陈锐. 《光电工程》 (Opto-Electronic Engineering), 2002, 29(1): 23-25
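Clouds cause time spreading of PPM optical pulses, which makes signal detection difficult. A formula for the mean multipath time spread is given, and a method for overcoming the difficulty is proposed: nonlinearly stretching the electrical pulse to reduce the influence of cloud thickness on pulse width. Simulation experiments show that when the input pulse width varies from 20 ns to 5 μs, the output pulse width varies from 1 to 5 μs.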

18.
陈雪勤, 刘正, 赵鹤鸣. 《声学技术》 (Technical Acoustics), 2008, 27(5): 704-707
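A pitch detection algorithm with high accuracy and strong noise robustness is proposed. The algorithm treats the linear prediction residual as an approximation of the speech source signal, performs spectral analysis on it, and computes a coarse estimate of the pitch period from the residual magnitude spectrum. Returning to the time-domain signal, it then designs a window of adjustable length based on the coarse estimate, uses the window to take two consecutive segments of the speech signal, and computes their similarity; the accurate pitch period is calculated from the maximum similarity value. The method is accurate and also performs well in noisy environments.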

19.
Cochlear prosthesis systems for postlingually deaf individuals (those who have become deaf due to disease or injury after having developed mature speech capability) are considered. These systems require the surgical implantation of an array of electrodes within the cochlea and are driven by processed sound signals from outside the body. A system that uses an analog signal approach for transcutaneous transfer of six processed speech data channels using frequency multiplexing is described. The system utilizes a filterbank of six narrowband surface acoustic wave (SAW) filters in the range 72-78 MHz with a 1.2-MHz channel spacing to multiplex the six carrier signals, which are frequency modulated by the processed speech signals, onto a composite signal. The same SAW filters are used in the receiver filterbank for signal separation, but are housed in a miniaturized package. The system includes a portable transmitter and a receiver package which is to be implanted in the patient. The implanted circuits are supplied exclusively from power transferred from outside the body via a separate 10-MHz transcutaneous link.

20.
The application of a thermal-wave resonant cavity to thermal-diffusivity measurements of gases has been investigated. The cavity was constructed using a thin aluminum foil wall as the intensity-modulated laser-beam oscillator source opposite a pyroelectric polyvinylidene fluoride wall acting as a signal transducer. Theoretically, cavity-length and modulation-frequency scans both produce resonance-like extrema in lock-in in-phase and quadrature curves. These extrema can be used to measure the thermal diffusivity of the gas within the cavity. It was found experimentally that very accurate and reproducible measurements of the thermal diffusivity of the gas can be obtained by simple cavity-length scanning without any signal normalization procedure, rather than by traditional modulation-frequency scanning normalized by the frequency-dependent transfer function of the instrumentation. By scanning the cavity length, the thermal diffusivity of room air at 299 K was measured with three-significant-figure precision as 0.216±0.001 cm²·s⁻¹, with a standard deviation of 0.5%. Only two-significant-figure accuracy could be obtained by scanning the frequency: 0.22±0.03 cm²·s⁻¹, with a standard deviation of 14%. Cavity-length scanning consistently exhibited a much higher signal-to-noise ratio.
