Similar Documents
20 similar documents found
1.
A text-dependent speaker recognition method based on MFCC and LPCC
于明  袁玉倩  董浩  王哲 《计算机应用》2006,26(4):883-885
In the modeling stage of speaker recognition, a variance component is added to the codewords of the traditional vector quantization (VQ) model, yielding a new VQ model with continuously distributed codewords. Mel-frequency cepstral coefficients (MFCC) and their deltas, combined with linear prediction cepstral coefficients (LPCC) and their deltas, serve as the features for text-dependent speaker recognition. Comparisons with dynamic time warping and conventional VQ show that the model improves the recognition rate without a noticeable increase in system response time.
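As a rough illustration of the VQ-based speaker matching this abstract builds on (without the paper's variance component or the MFCC/LPCC front end, which are its actual contributions), a speaker can be enrolled as a k-means codebook over feature frames and a test utterance scored by its average quantization distortion against each codebook. All data and names below are toy stand-ins:

```python
import numpy as np

def train_codebook(features, k=8, iters=20, seed=0):
    """Lloyd/k-means codebook: one codebook per enrolled speaker."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest codeword, then re-centre
        d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = features[labels == j].mean(axis=0)
    return codebook

def vq_distortion(features, codebook):
    """Average nearest-codeword distance: lower means a better match."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

# toy "speakers": frames drawn around different cluster centres
rng = np.random.default_rng(1)
spk_a = rng.normal(0.0, 0.3, size=(200, 12))   # stands in for MFCC frames
spk_b = rng.normal(2.0, 0.3, size=(200, 12))
cb_a, cb_b = train_codebook(spk_a), train_codebook(spk_b)

test = rng.normal(0.0, 0.3, size=(50, 12))     # utterance from speaker A
scores = {"A": vq_distortion(test, cb_a), "B": vq_distortion(test, cb_b)}
best = min(scores, key=scores.get)
```

The decision rule is simply the codebook with the lowest distortion; the paper's continuous-codeword extension would replace the hard nearest-codeword distance with a variance-weighted likelihood.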

2.
《Real》2005,11(4):270-281
Recently, Shen et al. [IEEE Transactions on Image Processing 2003;12:283–95] presented an efficient adaptive vector quantization (AVQ) algorithm whose peak signal-to-noise ratio (PSNR) improves on the previous benchmark AVQ algorithm. This paper presents an improved AVQ algorithm based on a hybrid codebook data structure consisting of three codebooks: the locality codebook, the static codebook, and the history codebook. Owing to its easy maintenance, the proposed AVQ algorithm yields a considerable computational saving while preserving PSNR performance similar to that of the earlier algorithm of Shen et al. Experimental results show that the proposed algorithm reduces encoding time by about 75% relative to the previous AVQ algorithm at comparable PSNR.

3.
This paper describes a method for designing a codebook for vector quantization (VQ), based on preprocessing the input data to make it block-stationary and on a criterion that accounts for the error visibility of the image to be coded. Test results at a bit rate of about 1.2 bits/pel indicate that the proposed VQ reconstructs images (both outside and inside the training set) with very low distortion and is highly robust, the variance of the SNR being markedly lower than for unprocessed data.

4.
In this paper, an invisible hybrid color image hiding scheme based on a spread vector quantization (VQ) neural network with penalized fuzzy c-means (PFCM) clustering (named SPFNN) is proposed. The goal is to offer safe exchange of a color stego-image over the Internet. In the proposed scheme, the secret color image is first compressed by a spread-unsupervised neural network with PFCM based on interpolative VQ (IVQ); the Data Encryption Standard (DES) block cipher and the Rivest-Shamir-Adleman (RSA) algorithm are then employed to provide a hybrid cryptosystem for secure and convenient communication. In the SPFNN, the penalized fuzzy clustering technique is embedded in a two-dimensional Hopfield neural network to generate optimal solutions for IVQ. The encrypted color IVQ indices and the sorted codebooks of the secret color image are embedded into the frequency domain of the cover color image via the Hadamard transform (HT). The proposed method has two benefits compared with other data-hiding techniques: the high security and convenience offered by the hybrid DES/RSA cryptosystem for exchanging color image data over the Internet, and the excellent results obtained with the proposed SPFNN color image compression scheme.

5.
To address low recognition rates in speaker verification, a hierarchical method based on PCS-PCA and support vector machines (SVM) is proposed. Principal component analysis first reduces the dimensionality of the speaker feature vectors while producing their principal component space; a PCS-PCA classifier built in this space screens candidate target speakers, and an SVM then performs the final verification. Simulation results show that the method attains a high recognition rate with fast training.
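The PCA dimensionality-reduction step this item relies on can be sketched via the SVD; the data, dimensions, and function names below are illustrative, not the paper's:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD: project centred data onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # rows of Vt are orthonormal principal axes, sorted by variance
    return Xc @ Vt[:n_components].T, Vt[:n_components]

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 39))        # e.g. 39-dim speaker feature vectors
reduced, axes = pca(feats, n_components=10)
```

The screening classifier and the SVM stage would then operate on `reduced` rather than the raw features.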

6.
To address the severe performance degradation of speaker verification systems when the speech type differs between training and testing (Mandarin versus Sichuan dialect), a new way of building Gaussian mixture models (GMM) is proposed: a joint Mandarin-Sichuan GMM is built from the two types of speech mixed in proportion, and a mixing ratio is found that brings the mismatch-induced performance degradation down to as little as 2.79%. Experimental results show that the method effectively strengthens robustness to language variation at test time and reduces the performance loss caused by the Mandarin/Sichuan mismatch between training and testing.
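The paper builds the joint model by mixing the training data itself in proportion; a related, simpler construction (shown here only as a sketch, with made-up one-dimensional models) joins two already-trained GMMs and scales their component weights by the mixing ratio:

```python
import numpy as np

def mix_gmms(gmm_a, gmm_b, alpha):
    """Join two GMMs into one, scaling component weights alpha : 1-alpha.
    Each GMM is a (weights, means, variances) tuple of 1-D arrays."""
    wa, ma, va = gmm_a
    wb, mb, vb = gmm_b
    w = np.concatenate([alpha * wa, (1 - alpha) * wb])  # still sums to 1
    m = np.concatenate([ma, mb])
    v = np.concatenate([va, vb])
    return w, m, v

# toy 2-component GMMs standing in for Mandarin / Sichuan models
mandarin = (np.array([0.5, 0.5]), np.array([0.0, 1.0]), np.array([1.0, 1.0]))
sichuan  = (np.array([0.4, 0.6]), np.array([2.0, 3.0]), np.array([1.0, 1.0]))
w, m, v = mix_gmms(mandarin, sichuan, alpha=0.7)
```

Sweeping `alpha` and measuring verification error would correspond to the ratio search the abstract describes.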

7.
In this paper, we propose a sub-vector based speaker characterization method for biometric speaker verification, where speakers are represented by uniform segmentation of their maximum likelihood linear regression (MLLR) super-vectors into so-called m-vectors. The MLLR transformation is estimated with respect to a universal background model (UBM) without any speech/phonetic information. We introduce two strategies for segmenting the MLLR super-vector: a disjoint technique and an overlapped window technique. During the test phase, m-vectors of the test utterance are scored against the claimant speaker; before scoring, they are post-processed to compensate for session variability. In addition, we propose a clustering algorithm for multiple class-wise MLLR transformations, where Gaussian components of the UBM are clustered into different groups using expectation maximization (EM) and maximum likelihood (ML). In this case, MLLR transformations are estimated per class using the sufficient statistics accumulated from the Gaussian components belonging to that class, and are then used for the m-vector system. The proposed method requires only a single alignment of the data with respect to the UBM for multiple MLLR transformations. We first show that the proposed multi-class m-vector system gives promising speaker verification performance compared to the conventional i-vector based system. Secondly, the proposed EM-based clustering technique is robust to random initialization, in contrast to the conventional K-means algorithm, and yields performance equal to or better than the best obtained with K-means. Finally, we show that fusing the m-vector with the i-vector further improves speaker verification performance in both the score and the feature domain. Experimental results are reported on various tasks of the NIST 2008 speaker recognition evaluation (SRE) core condition.
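The two segmentation strategies described above (disjoint windows versus windows overlapping by half) amount to slicing the super-vector at different hop sizes; a minimal sketch, with a tiny stand-in vector in place of a real MLLR super-vector:

```python
import numpy as np

def m_vectors(supervector, win, overlapped=False):
    """Split a super-vector into uniform segments ("m-vectors"):
    disjoint windows, or half-overlapping windows when overlapped=True."""
    step = win // 2 if overlapped else win
    return [supervector[i:i + win]
            for i in range(0, len(supervector) - win + 1, step)]

sv = np.arange(12.0)                              # stand-in super-vector
disjoint = m_vectors(sv, win=4)                   # 3 segments, hop 4
overlap = m_vectors(sv, win=4, overlapped=True)   # 5 segments, hop 2
```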

8.
Visual (image and video) database systems require efficient indexing to enable fast access to the images in a database. In addition, the large memory capacity and channel bandwidth requirements for the storage and transmission of visual data necessitate the use of compression techniques. We note that image/video indexing and compression are typically pursued independently. This reduces the storage efficiency and may degrade the system performance. In this paper, we present novel algorithms based on vector quantization (VQ) for indexing of compressed images and video. To start with, the images are compressed using VQ. In the first technique, for each codeword in the codebook, a histogram is generated and stored along with the codeword. We note that the superposition of the histograms of the codewords used to represent an image is a close approximation of the histogram of the image. This histogram is used as an index to store and retrieve the image. In the second technique, the histogram of the labels of an image is used as an index to access the image. We also propose an algorithm for indexing compressed video sequences. Here, each frame is encoded in the intraframe mode using VQ. The labels are used for the segmentation of a video sequence into shots, and for indexing the representative frame of each shot. The proposed techniques not only provide fast access to stored visual data, but also combine compression and indexing. The average retrieval rates are 95% and 94% at compression ratios of 16:1 and 64:1, respectively. The corresponding cut detection rates are 97% and 90%, respectively.
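The second technique above (the histogram of VQ labels as an image index) might look like the following sketch; the codebook, block data, and similarity measure are illustrative choices, not taken from the paper:

```python
import numpy as np

def encode_labels(blocks, codebook):
    """VQ-encode image blocks to nearest-codeword labels."""
    d = np.linalg.norm(blocks[:, None] - codebook[None], axis=2)
    return d.argmin(axis=1)

def label_histogram(labels, k):
    """Normalized histogram of codeword labels: the image's index."""
    h = np.bincount(labels, minlength=k).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Retrieval similarity: 1.0 means identical label distributions."""
    return np.minimum(h1, h2).sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 16))     # shared codebook, 16 codewords
img = rng.normal(size=(256, 16))         # flattened blocks of a query image
index = label_histogram(encode_labels(img, codebook), k=16)
```

Retrieval then ranks stored images by `histogram_intersection` against the query's index, with no need to decompress anything.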

9.
This paper proposes a new codebook generation algorithm for image data compression using a combined scheme of principal component analysis (PCA) and genetic algorithm (GA). The combined scheme makes full use of the near global optimal searching ability of GA and the computation complexity reduction of PCA to compute the codebook. The experimental results show that our algorithm outperforms the popular LBG algorithm in terms of computational efficiency and image compression performance.
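The LBG baseline this abstract compares against can be sketched in a few lines: grow the codebook by perturb-and-split, refining each size with Lloyd updates. This is only the baseline, on made-up block data; the paper's contribution replaces this search with a PCA-assisted genetic search:

```python
import numpy as np

def lbg_codebook(data, size=8, eps=0.01, iters=15):
    """LBG: split every codeword into a +/- eps pair, then Lloyd-refine."""
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            d = np.linalg.norm(data[:, None] - codebook[None], axis=2)
            labels = d.argmin(axis=1)
            for j in range(len(codebook)):
                if np.any(labels == j):
                    codebook[j] = data[labels == j].mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
blocks = rng.normal(size=(500, 16))     # e.g. flattened 4x4 image blocks
cb = lbg_codebook(blocks, size=8)
```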

10.
With the growing trend toward remote security verification procedures for telephone banking, biometric security measures and similar applications, automatic speaker verification (ASV) has received a lot of attention in recent years. The complexity of an ASV system and its verification time depend on the number of feature vectors, their dimensionality, the complexity of the speaker models, and the number of speakers. In this paper, we concentrate on optimizing the dimensionality of the feature space by selecting relevant features. Several feature selection methods for ASV systems already exist; to improve ASV performance, we present another, based on the ant colony optimization (ACO) algorithm. After the feature reduction phase, feature vectors are applied to a Gaussian mixture model universal background model (GMM-UBM), a text-independent speaker verification model. The performance of the proposed algorithm is compared with that of a genetic algorithm on the feature selection task over the TIMIT corpus. The experiments indicate that the optimized feature set improves ASV performance. Moreover, verification speed increases significantly, since ACO reduces the number of features by over 80%, which in turn decreases the complexity of the ASV system.
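A greatly simplified, ant-colony-style feature selection loop can illustrate the idea: each "ant" samples a feature subset with probability proportional to pheromone, and the best subset found reinforces its features. The fitness function, sizes, and parameters below are toys, not the paper's setup:

```python
import numpy as np

def aco_select(score_fn, n_feats, n_ants=10, n_iters=20, keep=5,
               evaporation=0.1, seed=0):
    """Toy ACO feature selection: pheromone-biased subset sampling,
    evaporation, and reinforcement of the best subset found so far."""
    rng = np.random.default_rng(seed)
    pher = np.ones(n_feats)
    best_subset, best_score = None, -np.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            p = pher / pher.sum()
            subset = rng.choice(n_feats, size=keep, replace=False, p=p)
            s = score_fn(subset)
            if s > best_score:
                best_subset, best_score = subset, s
        pher *= (1 - evaporation)          # evaporation
        pher[best_subset] += 1.0           # reinforce the best subset
    return np.sort(best_subset)

# toy fitness: features 0..4 are the "relevant" ones
score = lambda subset: int(np.intersect1d(subset, np.arange(5)).size)
chosen = aco_select(score, n_feats=20, keep=5)
```

In the paper's setting, `score_fn` would be the verification accuracy of the GMM-UBM system on the candidate feature subset.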

11.
In speaker recognition tasks, one reason for reduced accuracy is closely resembling speakers in the acoustic space. To increase the discriminative power of the classifier, the system must use only those features of a given speaker that are unique with respect to his or her acoustically resembling speakers. This paper proposes a technique to reduce confusion errors by finding speaker-specific phonemes and formulating a text from the subset of phonemes that are unique, for an i-vector based speaker verification task. Spectral features such as linear prediction cepstral coefficients (LPCC) and perceptual linear prediction coefficients (PLP), and a phase feature, the modified group delay, are used to analyse the importance of speaker-specific text in speaker verification. Experiments were conducted on speech data from 50 speakers collected in a laboratory environment. They show that the equal error rate (EER) decreases significantly with the i-vector approach using speaker-specific text compared to the i-vector approach with random text, across the different spectral and phase-based features.

12.
The characterization of speech signals using non-linear dynamical features has lately been the focus of intense research. This work reports results obtained with time-dependent largest Lyapunov exponents (TDLEs) in a text-dependent speaker verification task. The baseline system used Gaussian mixture models (GMMs), obtained by adaptation of a universal background model (UBM), as the speaker voice models. Sixteen cepstral and sixteen delta-cepstral features were used in the experiments, and it is shown how adding TDLEs improves the system's accuracy. Cepstral mean subtraction was applied to all features for channel equalization, and silence frames were discarded. The corpus, a subset of the Center for Spoken Language Understanding (CSLU) Speaker Recognition corpus, consisted of telephone speech from 91 different speakers.
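The cepstral mean subtraction (CMS) step mentioned above is simple enough to show directly: subtracting each coefficient's per-utterance mean removes any stationary, additive channel offset in the cepstral domain. The frame data here is synthetic:

```python
import numpy as np

def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient,
    removing stationary convolutional channel effects."""
    return frames - frames.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
utt = rng.normal(size=(120, 16))       # 120 frames x 16 cepstral coefficients
channel = rng.normal(size=(1, 16))     # fixed channel offset per coefficient
clean = cepstral_mean_subtraction(utt)
shifted = cepstral_mean_subtraction(utt + channel)
```

After CMS, the channel-shifted utterance yields exactly the same features as the clean one, which is the channel-equalization effect the abstract relies on.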

13.
The codebook of conventional VQ cannot be applied generally and needs real-time onboard updating, which is hard to implement in a spaceborne SAR system. To solve this problem, this paper first analyses the characteristics of spaceborne SAR raw data, then takes the distortion function over the multidimensional space as the design criterion, and finally proposes an adaptive codebook design algorithm based on the joint probability density function of the input data. In addition, the feasibility of cascading the new algorithm with entropy coding, and its robustness to errors occurring during transmission, are analysed within the encoding and decoding scheme. Experimental results on real data show that the codebook derived from the new algorithm is generally applicable and can be designed off-line, making VQ a practical algorithm for spaceborne SAR raw-data compression.

14.
The codebook of a conventional vector quantizer generalizes poorly and must be updated online, which is difficult to realize in a spaceborne SAR system. Targeting the statistical properties of spaceborne SAR raw data, and taking a distortion function over the multidimensional space as the cost function, a generally applicable VQ codebook is designed from the joint probability density function of the input data, and the VQ encoding and decoding schemes for the raw data are analysed. On this basis, the feasibility of cascading VQ with entropy coding, and the robustness of the algorithm when bit errors corrupt codeword indices during channel transmission, are studied in depth. Processing results on real data show that the algorithm is generally applicable; because the codebook generalizes, it can be designed offline, providing theoretical guidance for the practical onboard use of VQ.

15.
This paper presents a novel classified self-organizing map method for edge-preserving quantization of images using an adaptive subcodebook and a weighted learning rate. The subcodebook sizes of the two classes are adjusted automatically during training iterations based on modified partial distortions that can be estimated incrementally. The proposed weighted learning rate updates the neurons efficiently regardless of how large the weighting factor is. Experimental results show that the new method achieves better quality of reconstructed edge blocks and a more spread-out codebook, and incurs significantly less computational cost than competing methods.

16.
Image subband coding using fuzzy inference and adaptive quantization
Wavelet image decomposition generates a hierarchical data structure to represent an image. Recently, a new class of image compression algorithms has been developed that exploits dependencies between the hierarchical wavelet coefficients using zerotrees. This paper presents a fuzzy inference filter for image entropy coding that chooses significant coefficients and zerotree roots in the higher-frequency wavelet subbands; an adaptive quantization is also proposed to improve the coding performance. Evaluated on standard test images, the proposed approaches are comparable or superior to most state-of-the-art coders. Based on the fuzzy energy judgment, they also achieve excellent performance in combined image compression and watermarking applications.

17.
18.
An intelligent system for text-dependent speaker recognition is proposed in this paper. The system consists of a wavelet-based module as the feature extractor of speech signals and a neural-network-based module as the signal classifier. The Daubechies wavelet is employed to filter and compress the speech signals. The fuzzy ARTMAP (FAM) neural network is used to classify the processed signals. A series of experiments on text-dependent gender and speaker recognition are conducted to assess the effectiveness of the proposed system using a collection of vowel signals from 100 speakers. A variety of operating strategies for improving the FAM performance are examined and compared. The experimental results are analyzed and discussed.

19.
Self-organizing maps, vector quantization, and mixture modeling
Self-organizing maps are popular algorithms for unsupervised learning and data visualization. Exploiting the link between vector quantization and mixture modeling, we derive expectation-maximization (EM) algorithms for self-organizing maps with and without missing values. We compare self-organizing maps with the elastic-net approach and explain why the former is better suited for the visualization of high-dimensional data. Several extensions and improvements are discussed. As an illustration we apply a self-organizing map based on a multinomial distribution to market basket analysis.
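A minimal online SOM, for readers unfamiliar with the algorithm the abstract reformulates: a winner unit is found for each sample and all units are pulled toward it, weighted by a Gaussian neighborhood on the grid. This is the classical stochastic update, not the EM variant the paper derives; data and parameters are arbitrary:

```python
import numpy as np

def train_som(data, grid=(4, 4), iters=500, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal 2-D SOM: winner search plus Gaussian neighborhood update."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.normal(size=(h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        lr = lr0 * (1 - t / iters)               # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 0.1   # shrinking neighborhood
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        dist2 = ((coords - coords[winner]) ** 2).sum(axis=1)
        nbhood = np.exp(-dist2 / (2 * sigma ** 2))
        weights += lr * nbhood[:, None] * (x - weights)
    return weights.reshape(h, w, -1)

rng = np.random.default_rng(1)
data = rng.normal(size=(300, 3))
som = train_som(data)
```

With the neighborhood width driven to zero, the update degenerates to online k-means, which is the VQ connection the abstract exploits.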

20.
In this paper, we use the Gaussian mixture model (GMM) based multidimensional companding quantization framework to develop two important quantization schemes. In the first scheme, the scalar quantization in the companding framework is replaced by more efficient lattice vector quantization. Low-complexity lattice pruning and quantization schemes are provided for the E8 Gosset lattice. At moderate to high bit rates, the proposed scheme recovers much of the space-filling loss due to the product vector quantizers (PVQ) employed in earlier work, and thereby provides improved performance with a marginal increase in complexity. In the second scheme, we generalize the compression framework to accommodate recursive coding. In this approach, the joint probability density function (PDF) of the parameter vectors of successive source frames is modeled using a GMM. The conditional density of the parameter vector of the current source frame, given the quantized values of the parameter vectors of the previous source frames, is used to generate a new codebook for every current source frame. We demonstrate the efficacy of the proposed schemes in the application of speech spectrum quantization. The proposed scheme is shown to provide superior performance with a moderate increase in complexity when compared with conventional one-step linear prediction based compression schemes for both narrow-band and wide-band speech.
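The compand-quantize-expand idea underlying this framework can be illustrated with the classical scalar μ-law compander (the paper's companding is GMM-derived and multidimensional; μ-law is used here only because it is simple and standard):

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """Map x in [-1, 1] through the mu-law compressor."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Exact inverse of the mu-law compressor."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def compand_quantize(x, bits=8, mu=255.0):
    """Compand, uniformly quantize in the companded domain, expand back.
    Small amplitudes get finer effective steps than large ones."""
    levels = 2 ** bits
    y = mu_law_compress(x, mu)                 # into [-1, 1]
    q = np.round((y + 1) / 2 * (levels - 1))   # uniform quantizer
    y_hat = q / (levels - 1) * 2 - 1
    return mu_law_expand(y_hat, mu)

x = np.linspace(-1.0, 1.0, 1001)
x_hat = compand_quantize(x, bits=8)
err = np.abs(x - x_hat).max()
```

The paper's first scheme replaces the uniform scalar quantizer in the middle step with an E8 lattice quantizer, and its companding functions are derived from the fitted GMM rather than fixed a priori.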
