
1.

Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients

Marwa A. Nasr Mohammed Abd-Elnaby Adel S. El-Fishawy S. El-Rabaie Fathi E. Abd El-Samie 《International Journal of Speech Technology》2018,21(4):941-951

This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most relevant speaker identification methods adopt a cepstral strategy. Inclusion of the pitch frequency as a new feature in the speaker identification process is expected to enhance the speaker identification accuracy. In the proposed framework for speaker identification, a neural classifier with a single hidden layer is used. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a pre-processing noise reduction step is used prior to the feature extraction process to enhance the performance of the speaker identification system. Simulation results show that using the NPF as a feature enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and a wavelet denoising pre-processing step.

2.

Efficient on-line signature recognition based on multi-section vector quantization

Marcos Faundez-Zanuy Juan Manuel Pascual-Gaspar 《Pattern Analysis & Applications》2011,14(1):37-45

This paper proposes a multi-section vector quantization approach for on-line signature recognition. We have used a database of 330 users which includes 25 skilled forgeries performed by 5 different impostors. This database is larger than those typically used in the literature. Nevertheless, we also provide results from the SVC database. Our proposed system obtains results similar to those of the state-of-the-art online signature recognition algorithm, Dynamic Time Warping, with a computational requirement that is around 47 times lower. In addition, our system reduces the database storage requirements due to vector compression, and is more privacy-friendly because it is not possible to recover the original signature using the codebooks. Experimental results reveal that our proposed multi-section vector quantization achieves a 98% identification rate and a minimum Detection Cost Function value equal to 2.29% for random forgeries and 7.75% for skilled forgeries.
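The idea of scoring a signature section by section can be illustrated with a minimal sketch. It assumes that a signature is a list of feature vectors, that it is split into equal consecutive sections, and that each user stores one small codebook per section; the function names, the equal split, and the squared Euclidean distortion are illustrative assumptions, not details taken from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_sections(seq, n_sections):
    """Split a sequence into n_sections consecutive, roughly equal parts."""
    n = len(seq)
    bounds = [round(i * n / n_sections) for i in range(n_sections + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n_sections)]


def section_distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword."""
    if not vectors:
        return 0.0
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def multi_section_distortion(seq, codebooks):
    """Sum of per-section distortions; one codebook per section."""
    sections = split_sections(seq, len(codebooks))
    return sum(section_distortion(s, cb) for s, cb in zip(sections, codebooks))


def identify(seq, user_codebooks):
    """Return the user whose section codebooks give the lowest distortion."""
    return min(user_codebooks,
               key=lambda u: multi_section_distortion(seq, user_codebooks[u]))


# Toy example with two users and two sections (hypothetical data).
user_codebooks = {
    "alice": [[(0.0, 0.0)], [(1.0, 1.0)]],
    "bob": [[(5.0, 5.0)], [(6.0, 6.0)]],
}
seq = [(0.1, 0.0), (0.0, 0.1), (1.0, 1.1), (0.9, 1.0)]
print(identify(seq, user_codebooks))  # prints "alice"
```

Only the codebooks are stored per user, which is consistent with the storage and privacy points made in the abstract: the original trajectory cannot be read back from a handful of codewords.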

3.

Speaker recognition using pyramid match kernel based support vector machines

A. D. Dileep C. Chandra Sekhar 《International Journal of Speech Technology》2012,15(3):365-379

Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimation of parameters of GMMs include the expectation-maximization method which is a non-discriminative learning based method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as generalized linear discriminant sequence kernel, probabilistic sequence kernel, GMM supervector kernel, GMM-UBM mean interval kernel (GUMI) and intermediate matching kernel. Recently, the pyramid match kernel (PMK) using grids in the feature space as histogram bins and vocabulary-guided PMK (VGPMK) using clusters in the feature space as histogram bins have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel is computed between a pair of examples by comparing the pyramids using a weighted histogram intersection function at each level of pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is construction of a pyramid of histograms. We first propose to form hard clusters, using k-means clustering method, with increasing number of clusters at different levels of pyramid to design the codebook-based PMK (CBPMK). Then we propose the GMM-based PMK (GMMPMK) that uses soft clustering. We compare the performance of the GMM-based approaches, and the PMK and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used in evaluation of different approaches to speaker identification and verification. 
Results of our studies show that the dynamic kernel SVM-based approaches give significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM gives a performance that is better than that of SVMs using many other dynamic kernels and comparable to that of SVMs using the state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of the GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel.
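The weighted histogram intersection at the core of the pyramid match kernel can be sketched as follows. This is a minimal illustration of the grid-based PMK only, assuming bins whose side length doubles at each level and a weight of 1/2^l for matches first found at level l; it does not implement the codebook-based or GMM-based variants proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter


def histogram(points, bin_width):
    """Count points per grid cell of the given side length."""
    return Counter(tuple(int(x // bin_width) for x in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: number of matches between two histograms."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match(xs, ys, levels):
    """Weighted sum of new matches found at each pyramid level.

    Level 0 is the finest grid (bin width 1); the bin width doubles
    and the weight halves at every coarser level.
    """
    kernel = 0.0
    previous = 0
    for level in range(levels):
        width = 2 ** level
        matches = intersection(histogram(xs, width), histogram(ys, width))
        kernel += (matches - previous) / width  # only newly formed matches count
        previous = matches
    return kernel


# Two small sets of 1-D local feature vectors (hypothetical data).
xs = [(0.5,), (3.5,)]
ys = [(0.7,), (2.2,)]
print(pyramid_match(xs, ys, levels=3))  # prints 1.5
```

In this toy case one pair matches at the finest level (weight 1) and a second pair matches only after the bins are doubled (weight 1/2), which gives 1.5.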

4.

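Speaker recognition based on support vector machines and wavelet analysis (cited 2 times in total: 0 self-citations, 2 citations by others)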

Zhang Zhenling Xu Dongping Jia Yangli 《Computer Engineering and Design》2007,28(21):5201-5202,5224

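To address the speaker recognition problem, a recognition method based on support vector machines and wavelet analysis is proposed, together with its framework model. Wavelet analysis is applied to signal preprocessing; on this basis, its singularity detection principle is used to separate the speech signal from noise and thereby enhance the speech. Finally, training and testing are performed on samples, and an SVM is used to classify and recognize speakers.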

5.

An improved DTW speech recognition method based on vector quantization and lookup tables

Li Hongyan Sheng Liyuan Chen Ni 《Computer Engineering and Design》2007,28(19):4702-4704,4737

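To address the heavy computation and large storage requirements of the traditional DTW speech recognition method, an improved DTW method based on vector quantization and lookup tables is proposed. The method uses vector quantization to convert the continuous feature vector space into a discrete vector space, which reduces the storage needed for patterns. On this basis, a vector distortion measure table is built, and hash-based table lookup locates addresses exactly, which eliminates the large number of distance computations incurred by dynamic programming and greatly increases recognition matching speed. Theoretical analysis and experimental results demonstrate the effectiveness of the improved method. In addition, to facilitate research, a real-time DTW speech recognition system was designed and developed on the Matlab platform.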
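A minimal sketch of the idea follows: frames are first replaced by codeword indices, the distances between all pairs of codewords are computed once, and the dynamic-programming recursion then reads its local costs from that table instead of recomputing them. The plain nested list used as the table, the Euclidean distance, and all function names are assumptions made for illustration; the paper's hash-based addressing is not reproduced here.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Map each frame to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: sq_dist(f, codebook[i]))
            for f in frames]


def build_table(codebook):
    """Precompute the distance between every pair of codewords."""
    return [[math.sqrt(sq_dist(a, b)) for b in codebook] for a in codebook]


def dtw_indices(a, b, table):
    """DTW distance between two index sequences using table lookups only."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = table[a[i - 1]][b[j - 1]]  # lookup, no distance computation
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical 1-D codebook and two utterances of different length.
codebook = [(0.0,), (1.0,), (3.0,)]
table = build_table(codebook)
template = quantize([(0.0,), (1.1,), (1.0,), (2.9,)], codebook)
query = quantize([(0.1,), (0.9,), (3.2,)], codebook)
print(dtw_indices(query, template, table))  # prints 0.0
```

Storing index sequences instead of full feature vectors is what reduces the pattern storage, and the table has only as many entries as there are codeword pairs.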

6.

Research on coal-gangue interface recognition based on Mel-frequency cepstral coefficients and a genetic algorithm

He Aixiang Wang Pingjian Wei Guangfen Zhang Shouxiang 《Industry and Mine Automation》2013,39(2):66-71

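Existing coal-gangue interface recognition techniques have limitations: the γ-ray method is unsuitable for working faces whose roof contains no radioactive elements or only a low content of them, and the radar detection method has a small detection range and suffers severe signal attenuation. To address these problems, a coal-gangue interface recognition method based on Mel-frequency cepstral coefficients and a genetic algorithm is proposed. The method distinguishes coal from gangue using differences in the features of the acoustic signals produced as coal and gangue fall. Mel-frequency cepstral coefficients are used to transform the denoised acoustic signals into the frequency domain for processing, and 32-dimensional feature parameters are extracted; a genetic algorithm optimizes the 32-dimensional feature parameters to obtain the best parameter combination; and a support vector machine and a BP neural network are used to recognize the optimal parameters. Experimental results show that the method can accurately recognize the falling state of coal and gangue.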

7.

Efficient music note recognition based on a self-organizing map tree and linear vector quantization

Khalid Youssef Peng-Yung Woo 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2009,13(12):1187-1198

Using classical signal processing and filtering techniques for music note recognition faces various kinds of difficulties. This paper proposes a new scheme based on neural networks for music note recognition. The proposed scheme uses three types of neural networks: time delay neural networks, self-organizing maps, and linear vector quantization. Experimental results demonstrate that the proposed scheme achieves 100% recognition rate in moderate noise environments. The basic design of two potential applications of the proposed scheme is briefly demonstrated.

8.

Clustering of biological time series by cepstral coefficients based distances

Alexios Savvides Vasilis J. Promponas Konstantinos Fokianos 《Pattern recognition》2008,41(7):2398-2412

Clustering of stationary time series has become an important tool in many scientific applications, such as medicine and finance. Time series clustering methods are based on the calculation of suitable similarity measures which identify the distance between two or more time series. These measures are computed either in the time domain or in the spectral domain. Since the computation of time domain measures is rather cumbersome, we resort to spectral domain methods. A new measure of distance is proposed, based on the so-called cepstral coefficients, which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as an input to a clustering method to produce the disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.
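To make the notion of a cepstral distance concrete, here is a minimal sketch that treats the cepstral coefficients as cosine-series coefficients of the log spectrum and compares two series by the Euclidean distance between their first p coefficients. It swaps in a plain log-periodogram estimate in place of the semiparametric estimator described in the abstract, so it illustrates the kind of distance involved rather than the authors' method; the function names and the small constant eps are assumptions.

```python
import cmath
import math


def periodogram(x):
    """Periodogram of a mean-centred series at frequencies k = 1 .. n-1."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    values = []
    for k in range(1, n):  # k = 0 is skipped: it is zero after centring
        s = sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        values.append(abs(s) ** 2 / n)
    return values


def cepstrum(x, p, eps=1e-12):
    """First p cepstral coefficients from the log-periodogram."""
    n = len(x)
    log_i = [math.log(v + eps) for v in periodogram(x)]
    return [sum(log_i[k - 1] * math.cos(2 * math.pi * k * j / n)
                for k in range(1, n)) / n
            for j in range(1, p + 1)]


def cepstral_distance(x, y, p):
    """Euclidean distance between the first p cepstral coefficients."""
    cx, cy = cepstrum(x, p), cepstrum(y, p)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


# Two deterministic toy series (hypothetical data).
x = [math.sin(0.3 * t) for t in range(32)]
y = [math.sin(1.2 * t) + 0.5 * math.cos(0.2 * t) for t in range(32)]
print(cepstral_distance(x, x, p=5))  # prints 0.0
print(cepstral_distance(x, y, p=5))
```

A matrix of such pairwise distances could then be passed to any standard clustering routine, which corresponds to the final step described in the abstract.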

9.

Improved batch fuzzy learning vector quantization for image compression

George E. Tsekouras Mamalis Antonios Christos Anagnostopoulos Damianos Gavalas Dafne Economou 《Information Sciences》2008,178(20):3895-3907

In this paper, we develop a batch fuzzy learning vector quantization algorithm that attempts to solve certain problems related to the implementation of fuzzy clustering in image compression. The algorithm's structure encompasses two basic components. First, a modified objective function of the fuzzy c-means method is formulated and then minimized by means of an iterative gradient-descent procedure. Second, the overall training procedure is equipped with a systematic strategy for the transition from fuzzy mode, where each training vector is assigned to more than one codebook vector, to crisp mode, where each training vector is assigned to only one codebook vector. The algorithm is fast and easy to implement. Finally, the simulation results show that the method is efficient and appears to be insensitive to the selection of the fuzziness parameter.
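The difference between fuzzy and crisp assignment can be seen in a minimal sketch that uses the standard fuzzy c-means membership formula. It is not the modified objective function or the transition strategy of the paper; the function names are hypothetical, and the fuzziness parameter m is simply varied by hand to show how memberships sharpen as m approaches 1.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_memberships(x, codebook, m=2.0):
    """Standard fuzzy c-means memberships of x in each codebook vector (m > 1)."""
    d2 = [sq_dist(x, c) for c in codebook]
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:  # x coincides with a codeword: share membership among exact hits
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d2))]
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in d2]
    total = sum(inverse)
    return [v / total for v in inverse]


def crisp_assignment(x, codebook):
    """Crisp mode: index of the single nearest codebook vector."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))


codebook = [(0.0,), (3.0,)]
x = (1.0,)
print(fcm_memberships(x, codebook, m=2.0))  # memberships of about 0.8 and 0.2
print(fcm_memberships(x, codebook, m=1.1))  # almost all weight on codeword 0
print(crisp_assignment(x, codebook))        # prints 0
```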

10.

Speaker recognition system based on the fusion of VQ-MAP and LS-SVM

Zhan Ling Jing Xinxing 《Application of Electronic Technique》2010,(6)

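The traditional least squares support vector machine (LS-SVM) uses feature vectors as training samples, which is not sufficiently discriminative when applied in speaker recognition systems. To address this, a method that fuses VQ-MAP with LS-SVM is proposed: a universal background model (UBM) is passed through the VQ-MAP process to obtain a speaker-adapted parameter set, and this parameter set is used as the training samples of the LS-SVM in the speaker recognition system. Simulation experiments in Matlab show that the recognition system has a short SVM training time and a high recognition rate.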

11.

Research on image recognition based on fuzzy clustering with a multidimensional vector model

Xiao Mansheng YU Xun-quan Zhou Lijuan 《Computer Engineering and Design》2008,29(15)

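Starting from a multidimensional vector model of the pixel color space, an improved fuzzy C-means clustering algorithm is used to segment the image, yielding a set of feature region vectors in the image pixel space. A feature vector similarity measure is then used to compute image similarity, and the similarity of two images is compared to achieve image recognition. Experiments verify the effectiveness of similar-image recognition and show that the fuzzy clustering method based on the multidimensional vector model has some application value in image recognition.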

12.

Application of fuzzy theory in feature-vector-based pattern recognition

Wang Xiaojun Wei Shuhua 《Computer Engineering and Applications》2007,43(10):81-83,134

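Pattern recognition problems based on feature vectors are often encountered in computer measurement and control systems. Because of errors introduced during feature vector extraction, both the feature vectors of the standard patterns in the database and the feature vector of the object to be recognized carry some uncertainty. Fuzzy theory is applied to the recognition process, converting it into a problem of the closeness degree or distance between two fuzzy sets. Recognition algorithms based on lattice closeness degree and on distance were designed and implemented, and were successfully applied in an identity recognition system based on hand shape.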
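A minimal sketch of recognition by lattice closeness degree follows. It uses one common textbook definition, in which the closeness of two fuzzy sets is the smaller of their inner product and the complement of their outer product; whether the paper uses exactly this variant is an assumption, as are the function names and the toy membership values.

```python
def inner_product(a, b):
    """Inner product of two fuzzy sets: max over elements of min(a_i, b_i)."""
    return max(min(x, y) for x, y in zip(a, b))


def outer_product(a, b):
    """Outer product of two fuzzy sets: min over elements of max(a_i, b_i)."""
    return min(max(x, y) for x, y in zip(a, b))


def lattice_closeness(a, b):
    """Lattice closeness degree (one common definition)."""
    return min(inner_product(a, b), 1.0 - outer_product(a, b))


def recognize(sample, templates):
    """Return the name of the standard pattern closest to the sample."""
    return max(templates, key=lambda name: lattice_closeness(sample, templates[name]))


# Membership vectors of two hypothetical standard patterns and one sample.
templates = {"pattern_1": [0.3, 0.8, 0.5], "pattern_2": [0.9, 0.1, 0.2]}
sample = [0.2, 0.9, 0.4]
print(recognize(sample, templates))  # prints "pattern_1"
```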

13.

Image indexing and retrieval based on vector quantization

Shyh Wei Teng Guojun Lu 《Pattern recognition》2007,40(11):3299-3316

To effectively utilize information stored in a digital image library, effective image indexing and retrieval techniques are essential. This paper proposes an image indexing and retrieval technique based on the compressed image data using vector quantization (VQ). By harnessing the characteristics of VQ, the proposed technique is able to capture the spatial relationships of pixels when indexing the image. Experimental results illustrate the robustness of the proposed technique and also show that its retrieval performance is higher compared with existing color-based techniques.

14.

Speaker discrimination based on fuzzy fusion and feature reduction techniques

S. Khennouf H. Sayoud 《International Journal of Speech Technology》2018,21(1):51-63

In this paper, we present research on speaker discrimination using multi-classifier fusion, with a focus on the effects of feature reduction. Speaker discrimination consists in automatically distinguishing between two speakers using the vocal characteristics of their speech. A number of features are extracted using Mel Frequency Spectral Coefficients and then reduced using the Relative Speaker Characteristic (RSC) along with Principal Components Analysis (PCA). Several classification methods are implemented to perform the discrimination task. Since different classifiers are employed, two fusion algorithms at the decision level, referred to as Weighted Fusion and Fuzzy Fusion, are proposed to boost the classification performance. These algorithms are based on weighting the outputs of the different classifiers. Furthermore, the effects of speaker gender and feature reduction on the speaker discrimination task have also been examined. The evaluation of our approaches was conducted on a subset of Hub-4 Broadcast-News. The experimental results show that the speaker discrimination accuracy is improved by 5–15% using the (RSC–PCA) feature reduction. In addition, the proposed fusion methods recorded an improvement of about 10% compared to the individual scores of the classifiers. Finally, we noticed that gender has an important impact on the discrimination performance.

15.

Robust speech recognition based on independent vector analysis using harmonic frequency dependency

Soram Jun Minook Kim Myungwoo Oh Hyung-Min Park 《Neural computing & applications》2013,22(7-8):1321-1327

This paper describes an algorithm that enhances speech by independent vector analysis (IVA) using harmonic frequency dependency for robust speech recognition. While the conventional IVA exploits the full-band uniform dependencies of each source signal, a harmonic clique model is introduced to improve the enhancement performance by modeling strong dependencies among multiples of fundamental frequencies. An IVA-based learning algorithm is derived to consider the non-holonomic constraint and the minimal distortion principle to reduce the unavoidable distortion of IVA, and the minimum power distortionless response beamformer is used as a pre-processing step. In addition, the algorithm compares the log-spectral features of the enhanced speech and observed noisy speech to identify time–frequency segments corrupted by noise and restores those with the cluster-based missing feature reconstruction technique. Experimental results demonstrate that the proposed method enhances recognition performance significantly in noisy environments, especially with competing interference.

16.

Multiresolution, perceptual and vector quantization based video codec

Akbar Sheikh Akbari Pooneh Bagheri Zadeh Tom Buggy John Soraghan 《Multimedia Tools and Applications》2012,58(3):569-583

This paper presents a novel Multiresolution, Perceptual and Vector Quantization (MPVQ) based video coding scheme. In the intra-frame mode of operation, a wavelet transform is applied to the input frame and decorrelates it into its frequency subbands. The coefficients in each detail subband are pixel quantized using a uniform quantization factor divided by the perceptual weighting factor of that subband. The quantized coefficients are finally coded using a quadtree-coding algorithm. Perceptual weights are specifically calculated for the centre of each detail subband. In the inter-frame mode of operation, a Displaced Frame Difference (DFD) is first generated using an overlapped block motion estimation/compensation technique. A wavelet transform is then applied on the DFD and converts it into its frequency subbands. The detail subbands are finally vector quantized using an Adaptive Vector Quantization (AVQ) scheme. To evaluate the performance of the proposed codec, the proposed codec and the adaptive subband vector quantization coding scheme (ASVQ), which has been shown to outperform H.263 at all bitrates, were applied to six test sequences. Experimental results indicate that the proposed codec outperforms the ASVQ subjectively and objectively at all bit rates.

17.

Constrained-storage multistage vector quantization based on genetic algorithms

Shiueng-Bien Yang 《Pattern recognition》2008,41(2):689-700

Multistage vector quantization (MSVQ) and its variants have been recently proposed. Before an MSVQ is designed, the user must manually determine the number of codewords in each VQ stage. However, users usually have no idea how many codewords each VQ stage should have, and thus doubt whether the resulting MSVQ is optimal. This paper proposes the genetic design (GD) algorithm to design the MSVQ. The GD algorithm can automatically find the number of codewords to optimize each VQ stage according to the rate–distortion performance. Thus, the MSVQ based on the GD algorithm, namely MSVQ(GD), is proposed here. Furthermore, using a sharing codebook (SC) can further reduce the storage size of MSVQ. Combining numerous similar codewords in the VQ stages of MSVQ produces the codewords of the sharing codebook. This paper proposes the genetic merge (GM) algorithm to design the SC of MSVQ. Therefore, the constrained-storage MSVQ using a SC, namely CSMSVQ, is proposed and outperforms other MSVQs in the experiments presented here.

18.

Image retrieval based on quadtree classified vector quantization

Chen Hsin-Hui Ding Jian-Jiun Sheu Hsin-Teng 《Multimedia Tools and Applications》2014,72(2):1961-1984

In this paper, a color image retrieval scheme based on quadtree classified vector quantization (QCVQ) is proposed. This scheme not only captures intra-block...

19.

Image retrieval based on index compressed vector quantization

Amir Masud Eftekhari-Moghadam Jamshid Shanbehzadeh 《Pattern recognition》2003,36(11):2635-2647

The increasing amount of visual data in many applications necessitates content-based image retrieval. Since most visual data is stored in compressed form, it is crucial to develop indexing techniques for searching images based on their content in compressed form. It is therefore desirable to explore image compression techniques capable of describing image content in compressed form. Vector Quantization (VQ) is a compression scheme that exploits intra-block correlation, and image correlation reflects image content; hence VQ is a suitable compression technique for compressed-domain image retrieval. This paper introduces a novel indexing scheme for compressed-domain image databases based on indices generated from IC-VQ. The proposed scheme extracts image features based on the relationship between indices of IC-VQ compressed images. This relationship detects contiguous regions of the compressed image based on inter- and intra-block correlation. Experimental results show that the new scheme is more effective than VQ and color-based schemes.

20.

On-line recognition of hand-written characters utilizing positional and stroke vector sequences

Katsuo Ikeda Takashi Yamamura Yasumasa Mitamura Shiokazu Fujiwara Yoshiharu Tominaga Takeshi Kiyono 《Pattern recognition》1981,13(3):191-206

An on-line recognition method for hand-written characters utilizing stroke vector sequences and a positional vector sequence has been developed. The number of target characters is about 2000, and fairly good recognition scores have been attained. Our scheme uses the number of strokes as the primary parameter. We employ three types of recognition strategy depending on the number of strokes. The general stroke vector sequence method, devised to analyze the shape, can represent both skeleton and local characteristics by a small amount of information; and the restricted dynamic programming method is effective to determine the shape of a stroke. The similarity of two shapes and the complexity of a stroke have been introduced to reduce the dictionary size and the processing time, respectively.