
1.

Multichannel blind deconvolution for source separation in convolutive mixtures of speech

Kokkinakis K., Nandi A.K. 《IEEE transactions on audio, speech, and language processing》2006,14(1):200-212

This paper addresses the blind separation of convolutive and temporally correlated mixtures of speech, through the use of a multichannel blind deconvolution (MBD) method. In the proposed framework (LP-NGA), spatio-temporal separation is carried out by entropy maximization using the well-known natural gradient algorithm (NGA), while a temporal pre-whitening stage, based on linear prediction (LP), manages to fully preserve the original spectral characteristics of each source contribution. Confronted with synthetic convolutive mixtures, we show that the LP-NGA, an unconstrained natural extension to the multichannel BSS problem, benefits not only from fewer model constraints, but also from other factors, such as an overall increase in separation performance, spectral preservation efficiency and speed of convergence.
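
The natural gradient update mentioned in the abstract can be illustrated on a much simpler problem. The sketch below is not the paper's LP-NGA: it assumes an instantaneous (non-convolutive) two-channel mixture, omits the linear-prediction pre-whitening stage, and uses a `tanh` score function, a batch update, and a step size `mu` that are all choices made for this illustration.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, mu=0.1):
    """Estimate an unmixing matrix W for instantaneous mixtures x (channels x samples).

    Illustrative batch form of the natural gradient rule
    W <- W + mu * (I - E[phi(y) y^T]) W, with phi = tanh (an assumption
    suited to super-Gaussian sources such as speech).
    """
    n, n_samples = x.shape
    W = np.eye(n)
    for _ in range(n_iter):
        y = W @ x                      # current source estimates
        phi = np.tanh(y)               # assumed score function
        W += mu * (np.eye(n) - (phi @ y.T) / n_samples) @ W
    return W

# Synthetic test: two Laplacian sources, hypothetical mixing matrix A.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

W = natural_gradient_ica(x)
G = W @ A  # global system; near a scaled permutation matrix when separation succeeds
```

Inspecting `G` shows how much of each source leaks into the other output; a convolutive extension would replace the matrix `W` with a matrix of FIR filters.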

2.

Vowel onset point detection for noisy speech using spectral energy at formant frequencies

Anil Kumar Vuppala, K. Sreenivasa Rao 《International Journal of Speech Technology》2013,16(2):229-235

In this paper, we propose a method for robust detection of the vowel onset points (VOPs) from noisy speech. The proposed VOP detection method exploits the spectral energy at formant frequencies of the speech segments present in the glottal closure region. In this work, formants are extracted using the group delay function, and glottal closure instants are extracted using a zero-frequency-filter-based method. The performance of the proposed VOP detection method is compared with that of the existing method, which uses a combination of evidence from the excitation source, spectral peak energy and the modulation spectrum. Speech data from the TIMIT database and noise samples from the NOISEX database are used to analyze the performance of the VOP detection methods. The proposed method shows a significant improvement in VOP detection performance over the existing method.

3.

Two-microphone separation of speech mixtures. Total citations: 1; self-citations: 0; citations by others: 1

Michael Syskind Pedersen, DeLiang Wang, Jan Larsen, Ulrik Kjems 《Neural Networks, IEEE Transactions on》2008,19(3):475-492

Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within the recorded mixtures be known in advance. In many real-world applications, these limitations are too restrictive. We propose a novel method for underdetermined blind source separation using an instantaneous mixing model which assumes closely spaced microphones. Two source separation techniques have been combined: independent component analysis (ICA) and binary time-frequency (T-F) masking. By estimating binary masks from the outputs of an ICA algorithm, it is possible in an iterative way to extract basis speech signals from a convolutive mixture. The basis signals are afterwards improved by grouping similar signals. Using two microphones, we can separate, in principle, an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method is applicable to segregate speech signals under reverberant conditions, and we compare our proposed method to another state-of-the-art algorithm. The number of source signals is not assumed to be known in advance and it is possible to maintain the extracted signals as stereo signals.
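
The abstract does not state how the binary masks are computed from the ICA outputs. As a rough illustration only, the sketch below assumes the simplest rule: a time-frequency unit is assigned to whichever of two magnitude spectrograms is larger there. The function name `binary_masks` and the toy values are hypothetical.

```python
import numpy as np

def binary_masks(mag_a, mag_b):
    """Return complementary 0/1 masks from two equal-shape magnitude spectrograms.

    A T-F unit goes to output A when A's magnitude exceeds B's; this
    comparison rule is an assumption, not the paper's stated criterion.
    """
    mask_a = (mag_a > mag_b).astype(float)
    return mask_a, 1.0 - mask_a

# Toy magnitude spectrograms (frequency bins x frames), hypothetical values.
mag_a = np.array([[0.9, 0.1], [0.2, 0.8]])
mag_b = np.array([[0.3, 0.7], [0.6, 0.4]])

mask_a, mask_b = binary_masks(mag_a, mag_b)
mixture = mag_a + mag_b
estimate_a = mask_a * mixture  # keep only the units assigned to output A
```

In an iterative scheme such as the one described, each masked signal would be fed back through the separation stage until it is judged to contain a single speaker.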

4.

A joint diagonalization method with a nonsingularity constraint for blind separation of linear convolutive mixtures

李炜, 杨慧中 《控制与决策》(Control and Decision) 2014,29(3):541-545

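Joint diagonalization can successfully solve the blind separation problem, but the solution process may yield undesired singular solutions, in which case the source signals cannot be completely separated. In view of this, a joint diagonalization method for blind separation of linear convolutive mixtures is proposed: the convolutive mixing model is transformed into an instantaneous model, and joint diagonalization is applied to the transformed model to obtain the separation matrix. During the solution process, a constraint is introduced to restrict the range of solutions, which avoids singular solutions. Simulation results show that the proposed method successfully achieves blind separation of convolutive mixtures.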


5.

Robust Arabic speech recognition in noisy environments using prosodic features and formant

Anissa Imen Amrous, Mohamed Debyeche, Abderrahman Amrouche 《International Journal of Speech Technology》2011,14(4):351-359

This paper investigates the contribution of formants and prosodic features such as pitch and energy in Arabic speech recognition under real-life conditions. Our speech recognition system based on Hidden Markov Models (HMMs) is implemented using the HTK Toolkit. The front-end of the system combines features based on conventional Mel-Frequency Cepstral Coefficients (MFCC), prosodic information and formants. The experiments are performed on the ARADIGIT corpus, which is a database of Arabic spoken words. The obtained results show that the resulting multivariate feature vectors, in noisy environments, lead to a significant improvement, up to 27%, in word accuracy relative to the word accuracy obtained from the state-of-the-art MFCC-based system.

6.

Supervised and unsupervised classification by histogram overlay techniques

Y. INOMATA, S. OGATA 《International journal of remote sensing》2013,34(14):2605-2616


A new classification technique was proposed from the viewpoint of saving memory and substantially reducing processing time, to meet the strong requirement for easier operation on a personal computer. To carry out this process efficiently, some neighbouring pixels were lumped into a cell. This idea is based on the fact that changes of the CCT counts in the sea area are monotone, that is, histograms are symmetrical, and that cell-by-cell classification is, therefore, sufficient.

First, a cell distance was defined by extending the concept of the Mahalanobis distance, which is the statistical difference between a cell and a cluster. The classification results agree well with those of the conventional Maximum Likelihood Method. We define this method as CDM (Cell Distance Method).

Secondly, an alternative concept which indicates the degree of similarity between two cells was proposed. It was found that this concept, defined as HOM (Histogram Overlay Method), not only improves the speed of processing image data but also has a close relation with the cell distance. In fact, it corresponds fairly to the cell distance under a certain condition.

Thirdly, these two methods were extended to unsupervised classification and applied to the investigation of turbidity in the sea around Hiroshima and Kure, West Japan.
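
The abstract does not give the formula for the cell distance, only that it extends the Mahalanobis distance between a cell and a cluster. As one plausible reading, the sketch below applies the ordinary Mahalanobis distance to the mean of a cell's pixels; the pixel values, cluster statistics, and the use of the cell mean are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Ordinary Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean))."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Hypothetical cell of four neighbouring pixels, two spectral bands each.
cell_pixels = np.array([[10.0, 20.0], [12.0, 22.0], [11.0, 21.0], [11.0, 21.0]])
cell_mean = cell_pixels.mean(axis=0)

# Hypothetical cluster statistics (mean vector and covariance matrix).
cluster_mean = np.array([8.0, 21.0])
cluster_cov = np.array([[9.0, 0.0], [0.0, 4.0]])

dist = mahalanobis(cell_mean, cluster_mean, cluster_cov)
```

Classifying one cell instead of each of its pixels is where the memory and time savings described above would come from: a cell of four pixels needs one distance evaluation per cluster rather than four.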

7.

Blind separation of mixed speech based on an improved firefly optimization algorithm

李著成, 黄祥林 《计算机应用研究》(Application Research of Computers) 2019,36(10)

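To address the limitation that the optimization algorithm used in traditional blind source separation strongly affects separation performance, a blind separation algorithm for mixed speech based on improved firefly optimization is proposed. The flight span of the fireflies is changed from a fixed value to one adjusted adaptively by a newly constructed function, which accelerates convergence while avoiding premature convergence. Experimental results show that, compared with blind separation algorithms based on the natural gradient, the standard firefly algorithm and particle swarm optimization, the new algorithm separates mixed speech signals better, with improvements in both convergence speed and separation ability.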

8.

Speech separation using speaker-adapted eigenvoice speech models. Total citations: 2; self-citations: 1; citations by others: 1

Ron J. Weiss, Daniel P.W. Ellis 《Computer Speech and Language》2010,24(1):16-29

We present a system for model-based source separation for use on single channel speech mixtures where the precise source characteristics are not known a priori. The sources are modeled using hidden Markov models (HMM) and separated using factorial HMM methods. Without prior speaker models for the sources in the mixture it is difficult to exactly resolve the individual sources because there is no way to determine which state corresponds to which source at any point in time. This is solved to a small extent by the temporal constraints provided by the Markov models, but permutations between sources remain a significant problem. We overcome this by adapting the models to match the sources in the mixture. We do this by representing the space of speaker variation with a parametric signal model based on the eigenvoice technique for rapid speaker adaptation. We present an algorithm to infer the characteristics of the sources present in a mixture, allowing for significantly improved separation performance over that obtained using unadapted source models. The algorithm is evaluated on the task defined in the 2006 Speech Separation Challenge [Cooke, M.P., Lee, T.-W., 2008. The 2006 Speech Separation Challenge. Computer Speech and Language] and compared with separation using source-dependent models. Although performance is not as good as with speaker-dependent models, we show that the system based on model adaptation is able to generalize better to held-out speakers.

9.

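Adaptive blind separation of non-stationary convolutive mixtures of speech signals in the wavelet domain. Total citations: 1; self-citations: 1; citations by others: 1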


楼红伟, 胡光锐 《控制与决策》(Control and Decision) 2004,19(1):73-76

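Some blind separation algorithms for convolutive mixtures are iterative and therefore unsuitable for real-time applications. For this reason, a wavelet-domain algorithm is proposed for adaptive blind separation of convolutive mixtures. The wavelet-domain algorithm is simulated and compared with a frequency-domain blind signal separation algorithm; the results show that the proposed algorithm improves blind signal separation performance.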

10.

Research on an audio CAPTCHA method based on formant synthesis and prosody adjustment

汪成亮, 张玉维 《计算机应用研究》(Application Research of Computers) 2011,28(7):2458-2461

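To improve the effectiveness of audio verification technology, a method for generating random audio CAPTCHAs based on formant synthesis, duration modification and prosody adjustment is proposed. The method uses phonemes as the speech synthesis units, sets a random speaking-rate parameter by rule during synthesis, and adjusts the concatenation rules between units to randomize the prosody, so that speaking rate and prosody are uncertain and unpredictable. This effectively lowers the recognition rate of automatic speech recognition (ASR) on the audio codes and strengthens the CAPTCHA's resistance to attack. The human recognition rate of the synthesized audio CAPTCHAs reached about 90%, the ASR recognition rate was 28.8%, and the mean opinion score (MOS) was 4; the intelligibility and clarity of the audio codes were satisfactory. The experimental results verify the feasibility of the proposed method.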

11.

Enhancement of speech signals separated from their convolutive mixture by FDICA algorithm

Rajkishore Prasad, Hiroshi Saruwatari, Kyohiro Shikano 《Digital Signal Processing》2009,19(1):127-133

This paper presents a novel method for the enhancement of independent components of mixed speech signals segregated by the frequency domain independent component analysis (FDICA) algorithm. The enhancement algorithm proposed here is based on maximum a posteriori (MAP) estimation of the speech spectral components, using the generalized Gaussian distribution (GGD) function as the statistical model for the time–frequency series of speech (TFSS) signal. The proposed MAP estimator has been used and evaluated as the post-processing stage for the separation of convolutive mixtures of speech signals by the fixed-point FDICA algorithm. It has been found that combining the separation algorithm with the proposed enhancement algorithm provides better separation performance under both reverberant and non-reverberant conditions.

12.

A piecewise linear prediction algorithm for formant frequency estimation of speech signals


陈宁, 万茂文 《计算机工程与应用》(Computer Engineering and Applications) 2009,45(28):156-159

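Formant frequencies of speech are estimated with a piecewise linear prediction algorithm: a multichannel filter bank divides the speech into frequency bands, a suitable inverse filter is then chosen to approximate the short-time spectrum of each band, and finally the formant frequencies are estimated from that inverse filter. Experimental results show that, compared with traditional methods, the method improves the resolution and accuracy of formant frequency estimation and is less affected by noise.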

13.

A speech enhancement method combining speech separation and HMM


刘凤增, 李国辉, 唐敏 《计算机工程与应用》(Computer Engineering and Applications) 2013,49(16):196-200

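The two speech enhancement algorithms based on the hidden Markov model (HMM), MAP and MMSE, are computationally expensive, and the former cannot handle non-stationary noise. To address these problems, a speech enhancement algorithm combining speech separation with HMM is proposed, drawing on speech separation methods. The algorithm uses multi-state, multi-mixture HMMs, which are suited to non-stationary noise, decodes the mixed states of the noisy speech under the speech model and the noise model, and estimates the speech using the maximum model theory from speech separation methods. This avoids iterative procedures and particularly expensive formula evaluations, and so reduces computational complexity. Experiments show that the algorithm effectively removes both stationary and non-stationary noise, clearly improves the PESQ perceptual evaluation score, and keeps running time under control.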

14.

Unsupervised offensive speech detection based on multilingual BERT

师夏阳, 张风远, 袁嘉琪, 黄敏 《计算机应用》(Journal of Computer Applications) 2022,42(11):3379-3385

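Offensive speech has a serious adverse effect on social stability, but automatic detection of offensive speech currently concentrates on a few high-resource languages; for low-resource languages, the lack of sufficient annotated offensive-speech corpora makes detection difficult. To address this, a cross-lingual unsupervised offensiveness transfer detection method is proposed. First, a multilingual BERT (mBERT) model learns offensive features on a high-resource English dataset, yielding an original model. Then, by analyzing the linguistic similarity between English and Danish, Arabic, Turkish and Greek, the original model is transferred to these four low-resource languages to detect offensive speech in them automatically. Experimental results show that, compared with four methods, namely BERT, linear regression (LR), support vector machine (SVM) and multilayer perceptron (MLP), the proposed method improves both the accuracy and the F1 score of offensive speech detection by nearly 2 percentage points on Danish, Arabic, Turkish and Greek, approaching current supervised detection. This indicates that combining cross-lingual model transfer learning with transfer detection can achieve unsupervised offensive speech detection for low-resource languages.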

15.

Improvement in monaural speech separation using sparse non-negative tucker decomposition

Yash Vardhan Varshney, Prashant Upadhyaya, Zia Ahmad Abbasi, Musiur Raza Abidi, Omar Farooq 《International Journal of Speech Technology》2018,21(4):837-849

A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced in this paper. In the proposed work, the effect of the sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD. With the proposed algorithm, the vector components of both the target and the mixed signal can be exploited and used for the separation of any monaural mixture. Experiments were performed on monaural data generated by mixing the speech signals of two speakers and by mixing noise and speech signals, using the TIMIT and NOISEX-92 datasets. The separation results are compared with other existing algorithms in terms of the correlation of the separated signal with the original signal, signal-to-distortion ratio, perceptual evaluation of speech quality and short-time objective intelligibility. Further, to get more conclusive information about separation ability, speech recognition using the Kaldi toolkit was also performed. The recognition results are compared in terms of word error rate (WER) using MFCC-based features. Results show that the average WER improvement of the proposed algorithm over the nearest performing algorithm is up to 2.7% for mixed speech of two speakers and 1.52% for noisy speech input.

16.

Supervised and unsupervised relevance sampling in handcrafted and deep learning features obtained from image collections

《Applied Soft Computing》2019

Image collections are currently widely available and are being generated at a fast pace due to mobile and accessible equipment. In principle, that is a good scenario for the design of successful visual pattern recognition systems. However, in particular for classification tasks, one may need to choose which examples are more relevant in order to build a training set that represents the data well, since such tasks often require representative and sufficient observations to be accurate. In this paper we investigated three methods for selecting relevant examples from image collections based on learning models from small portions of the available data. We considered supervised methods that need labels to allow selection, and an unsupervised method that is agnostic to labels. The image datasets studied were described using both handcrafted and deep learning features. A general purpose algorithm is proposed which uses learning methods as subroutines. We show that our relevance selection algorithm outperforms random selection, in particular when using unlabelled data in an unsupervised approach, significantly reducing the size of the training set with little decrease in the test accuracy.

17.

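Construction of a song database based on frequency-domain blind source separation of convolutive signals. Total citations: 1; self-citations: 1; citations by others: 0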

李鹏, 周明全, 黎南杉, 王学松 《计算机应用研究》(Application Research of Computers) 2010,27(4):1376-1379

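A frequency-domain blind source separation algorithm for convolutive signals is used to separate the lead vocal signal from MP3 song audio, and melody features that characterize the song are then extracted from the lead vocal signal to build the song database of a query-by-humming system. Blind source separation requires that the number of observed signals be no smaller than the number of source signals, so wavelet multiresolution analysis is first used to construct an additional observation channel, and frequency-domain independent component analysis (FDICA) is then used to perform blind source separation (BSS) of the MP3 song audio. Experiments show that the melody features of the lead vocal signal separated from MP3 songs by FDICA-based BSS are highly similar to those of the hummed query signal, so MP3 songs can be used to build the song database of a query-by-humming system.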

18.

A process signal denoising algorithm based on a convolutive blind separation neural network

华容 《计算机工程与设计》(Computer Engineering and Design) 2007,28(17):4217-4219

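A reduced neural network algorithm based on blind signal separation of convolutive mixtures, abbreviated RCMNN, is studied. It can separate multi-source signals mixed in linear static or dynamic transmission channels, and can therefore be applied effectively to denoising process signals in transmission channels, offering a new denoising method for control engineering. Simulations on many different combinations of signals show that the algorithm is effective and that the network performance is stable.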

19.

Monaural speech separation and recognition challenge. Total citations: 2; self-citations: 1; citations by others: 1

Martin Cooke, John R. Hershey, Steven J. Rennie 《Computer Speech and Language》2010,24(1):1-15

Robust speech recognition in everyday conditions requires the solution to a number of challenging problems, not least the ability to handle multiple sound sources. The specific case of speech recognition in the presence of a competing talker has been studied for several decades, resulting in a number of quite distinct algorithmic solutions whose focus ranges from modeling both target and competing speech to speech separation using auditory grouping principles. The purpose of the monaural speech separation and recognition challenge was to permit a large-scale comparison of techniques for the competing talker problem. The task was to identify keywords in sentences spoken by a target talker when mixed into a single channel with a background talker speaking similar sentences. Ten independent sets of results were contributed, alongside a baseline recognition system. Performance was evaluated using common training and test data and common metrics. Listeners’ performance in the same task was also measured. This paper describes the challenge problem, compares the performance of the contributed algorithms, and discusses the factors which distinguish the systems. One highlight of the comparison was the finding that several systems achieved near-human performance in some conditions, and one out-performed listeners overall.

20.

Efficient underdetermined speech signal separation using encompassed Hammersley-Clifford algorithm and hardware implementation

《Microprocessors and Microsystems》2021

Speech separation is among the advanced techniques used in a wide range of applications across different sectors, where separating blind source signals is a difficult task. Blind source separation is a growing area of digital signal processing that aims to separate the precise signal from a dense recording. Exclusively, among the "Blind Source Separation," the "Under Determined Blind Source Separation" is considered as an Over Determined Blind Source Separation due to its wide range of usage. Nevertheless, real implementation is very rarely done in existing research, because real-time implementation of UBSS (Underdetermined Blind Source Separation) is challenging owing to its hardware shortcomings of increased latency, reduced speed and high memory consumption. Consequently, there is an increasing need to implement underdetermined source signal separation in real time with improved hardware utility. In this framework, a real-time feasible source signal separator is formulated in which the source signals are decomposed by Boosted Band-Limited VMD (Variational Mode Decomposition) of the "Multicomponent Signal". The resulting "Band-Limited" Intrinsic Mode Functions (BLIMF) are subjected to the Encompassed Hammersley–Clifford algorithm for source separation using Expectation-Maximization and Gibbs sampling, an alternative to deterministic algorithms, to determine the exact estimated parameters from the E-M method. Subsequently, the source separation algorithm infers the best separation of source signals by exact estimation and determination from the decomposed signals. The iterations in E-M estimation are reduced by the Gauss-Seidel method. Thus, our novel source signal separator internally combines a signal decomposer and a source separation algorithm with fewer iterations, which reduces memory consumption and yields better hardware realization with reduced latency and increased speed. The proposed implementation uses Matlab for initial processing, and the hardware analysis is performed on the Xilinx platform.