Found 20 similar documents; search took 0 ms
1.
R. Visalakshi P. Dhanalakshmi S. Palanivel 《International Journal of Speech Technology》2016,19(3):467-483
Speaker localization is a technique for locating and tracking an active speaker among multiple acoustic sources using a microphone array. Microphone arrays are used to improve the quality of recorded speech in meeting rooms and other places. In this work, the time delay between the source and each microphone is estimated using a localization method called time difference of arrival (TDOA). TDOA localization consists of two steps, namely (a) a time delay estimator and (b) a localization estimator. For time delay estimation, the generalized cross-correlation using the phase transform, the generalized cross-correlation using maximum likelihood, the linear prediction (LP) residual and the Hilbert envelope of the LP residual are chosen for estimating the location of a person. A new speaker localization algorithm known as the group search optimization (GSO) algorithm is proposed. The performance of this algorithm is analyzed and compared with the Gauss–Newton nonlinear least squares method and a genetic algorithm. Experimental results show that the proposed GSO method outperforms the other methods in terms of mean square error, root mean square error, mean absolute error, mean absolute percentage error, Euclidean distance and mean absolute relative error.
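The time delay estimator named in the abstract above, generalized cross-correlation with the phase transform (GCC-PHAT), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dft` and `gcc_phat_delay`, the naive transform, and the synthetic 5-sample delay are all assumptions chosen to keep the example dependency-free.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (hypothetical helper, no FFT library)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat_delay(sig, ref, max_lag):
    """Return the lag in samples by which `sig` trails `ref` (negative if it leads)."""
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    a = dft(list(sig) + [0.0] * (n - len(sig)))
    b = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [x * y.conjugate() for x, y in zip(a, b)]
    # Phase transform: discard magnitude, keep only the phase of the cross-spectrum.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    cc = [v.real for v in dft(whitened, inverse=True)]
    # Negative lags live at the end of the circular correlation buffer.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])

# Synthetic two-microphone example (assumed data): the second channel is the
# first one delayed by 5 samples.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(64)]
sig = [0.0] * 5 + ref
estimated = gcc_phat_delay(sig, ref, max_lag=10)
```

On this noise-free synthetic input the estimate should recover the 5-sample delay. A localization estimator such as the ones compared in the abstract would then turn the pairwise delays into a position.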
2.
This paper proposes a multimodal approach to distinguish silence from speech situations, and to identify the location of the active speaker in the latter case. In our approach, a video camera is used to track the faces of the participants, and a microphone array is used to estimate the Sound Source Location (SSL) using the Steered Response Power with the phase transform (SRP-PHAT) method. The audiovisual cues are combined, and two competing Hidden Markov Models (HMMs) are used to detect silence or the presence of a person speaking. If speech is detected, the corresponding HMM also provides the spatio-temporally coherent location of the speaker. Experimental results show that incorporating the HMM improves the results over the unimodal SRP-PHAT, and the inclusion of video cues provides even further improvements.
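The idea of two competing HMMs can be sketched as a likelihood comparison: score the observation sequence under each model with the forward algorithm and keep the better-scoring label. Everything concrete below is an assumption for illustration only: the two-symbol observation alphabet (0 = weak cue, 1 = strong cue), the transition and emission probabilities, and the names `forward_likelihood` and `classify`. The paper's actual features are combined audiovisual cues.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of a discrete observation sequence under an HMM (forward algorithm)."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

# Hypothetical two-state models: (start probabilities, transitions, emissions).
SILENCE_HMM = ([0.5, 0.5],
               [[0.9, 0.1], [0.1, 0.9]],
               [[0.9, 0.1], [0.8, 0.2]])   # mostly emits symbol 0
SPEECH_HMM = ([0.5, 0.5],
              [[0.9, 0.1], [0.1, 0.9]],
              [[0.2, 0.8], [0.1, 0.9]])    # mostly emits symbol 1

def classify(obs):
    """Pick whichever model explains the observation sequence better."""
    p_silence = forward_likelihood(obs, *SILENCE_HMM)
    p_speech = forward_likelihood(obs, *SPEECH_HMM)
    return "speech" if p_speech > p_silence else "silence"
```

For long sequences the raw probabilities underflow, so a practical version would work with scaled or log-domain forward variables.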
3.
Multimedia Tools and Applications - Emotional speaker recognition under real-life conditions has become an urgent need for several applications. This paper proposes a novel approach using multiple...
4.
5.
6.
Yiying Zhang Zhang D. Xiaoyan Zhu 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2000,30(5):598-602
This correspondence introduces a new text-independent speaker verification method, derived from the basic idea in pattern recognition that the discriminating ability of a classifier can be improved by removing the information common to all classes. By looking for the speech characteristics shared by a group of speakers, a global speaker model can be established. Subtracting the score obtained from this model normalizes the conventional likelihood score, resulting in a more compact score distribution and lower equal error rates. Several experiments are carried out to demonstrate the effectiveness of the proposed method.
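The normalization described in this abstract, subtracting the score of a global speaker model from the claimed speaker's likelihood score, can be sketched as follows. The single 1-D Gaussian per model, the parameter values, and the names `gaussian_loglik` and `normalized_score` are assumptions for illustration; the paper's actual models are not specified in the abstract.

```python
import math

def gaussian_loglik(frames, mean, var):
    """Average per-frame log-likelihood of scalar features under a 1-D Gaussian."""
    total = sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
                for x in frames)
    return total / len(frames)

def normalized_score(frames, speaker_model, global_model):
    """Claimed-speaker score minus the global-model score (a log-likelihood ratio)."""
    return gaussian_loglik(frames, *speaker_model) - gaussian_loglik(frames, *global_model)

# Hypothetical (mean, variance) pairs for the claimed speaker and the global model.
speaker_model = (1.0, 0.25)
global_model = (0.0, 1.0)

genuine_frames = [0.9, 1.1, 1.0, 1.2, 0.8]     # close to the claimed speaker
impostor_frames = [-0.1, 0.2, 0.0, -0.3, 0.1]  # closer to the global model

genuine = normalized_score(genuine_frames, speaker_model, global_model)
impostor = normalized_score(impostor_frames, speaker_model, global_model)
```

With these assumed values the genuine trial scores above zero and the impostor trial below it, so a single threshold near zero separates them. That is the practical benefit of a more compact, normalized score distribution.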
7.
Arnab Poddar Md Sahidullah Goutam Saha 《International Journal of Speech Technology》2018,21(3):473-488
A major challenge in automatic speaker verification (ASV) is to improve performance with short speech segments for end-user convenience in real-world applications. In this paper, we present a detailed analysis of ASV systems to observe the effects of duration variability on state-of-the-art i-vector and classical Gaussian mixture model-universal background model (GMM-UBM) based ASV systems. We observe an increase in the uncertainty of model parameter estimation for i-vector based ASV with speech of shorter duration. To compensate for the effect of duration variability in short utterances, we propose an adaptation technique for the estimation of the Baum-Welch statistics used in i-vector extraction. Information from pre-estimated background model parameters is used for the adaptation. The ASV performance with the proposed approach is considerably superior to the conventional i-vector based system. Furthermore, the fusion of the proposed i-vector based system and GMM-UBM further improves the ASV performance, especially for short speech segments. Experiments conducted on two speech corpora, NIST SRE 2008 and 2010, show a relative improvement in equal error rate (EER) in the range of 12–20%.
8.
This paper presents a neural-network-based PID-like control strategy applicable to a class of nonlinear control problems commonly encountered in the process-control industry. An artificial neural network is used to provide compensation of the plant's nonlinear dynamics so that the overall closed-loop system can be described in terms of an equivalent error system. In the paper, the strategy is carefully described, and then evaluated and compared with an alternative control system design which uses conventional gain-scheduled PID controllers. The paper includes real-time experimental results in applying the proposed technique for level control of a coupled-tanks system.
9.
The double traveling salesman problem is a variation of the basic traveling salesman problem where targets can be reached by two salespersons operating in parallel. The real problem addressed by this work concerns the optimization of the harvest sequence for the two independent arms of a fruit-harvesting robot. This application poses further constraints, like a collision-avoidance function. The proposed solution is based on a self-organizing map structure, initialized with as many artificial neurons as the number of targets to be reached. One of the key components of the process is the combination of competitive relaxation with a mechanism for deleting and creating artificial neurons. Moreover, in the competitive relaxation process, information about the trajectory connecting the neurons is combined with the distance of neurons from the target. This strategy prevents tangles in the trajectory and collisions between the two tours. Results of tests indicate that the proposed approach is efficient and reliable for harvest sequence planning. Moreover, the enhancements added to the pure self-organizing map concept are of wider importance, as proved by a traveling salesman problem version of the program, simplified from the double version for comparison.
10.
In financial time series forecasting, a problem we often encounter is how to increase prediction accuracy as much as possible when the financial data are noisy. In this study, we discuss the use of supervised neural networks as a meta-learning technique to design a financial time series forecasting system that addresses this problem. In this system, data sampling techniques are first used to generate different training subsets from the original datasets. Using these training subsets, different neural networks with different initial conditions or training algorithms are then trained to form different prediction models, i.e., base models. Subsequently, to improve the efficiency of the metamodel's predictions, principal component analysis (PCA) is used as a pruning tool to generate an optimal set of base models. Finally, a neural-network-based nonlinear metamodel is produced by learning from the selected base models, so as to improve the prediction accuracy. For illustration and verification purposes, the proposed metamodel is applied to four typical financial time series. Empirical results reveal that the proposed neural-network-based nonlinear metamodeling technique is a very promising approach to financial time series forecasting.
11.
《Computers & Electrical Engineering》2014,40(8):215-226
Iris localization plays a decisive role in the overall performance of an iris biometric system, because it isolates the valid part of the iris. This study proposes a reliable iris localization technique that proceeds in three steps. First, it extracts the inner iris contour within a sliding window in an eye image using a multi-valued adaptive threshold and the two-dimensional (2D) properties of binary objects. Then, it localizes the outer iris contour using an edge-detecting operator in a sub-image centered at the pupil center. Finally, it regularizes the iris contours to compensate for their non-circular structure. The proposed technique is tested on the following public iris databases: CASIA V1.0, CASIA-Iris-Lamp, IITD V1.0, and MMU V1.0. The experimental and accuracy results of the proposed scheme, compared with other state-of-the-art techniques, confirm its satisfactory performance.
12.
A unified approach to autofocus and alignment for pattern localization using hybrid weighted Hausdorff distance (total citations: 2; self-citations: 0; citations by others: 2)
Dongjiang Xu 《Pattern recognition letters》2011,32(14):1747-1755
Pattern localization is a fundamental task in machine vision, and autofocus is a requirement for any automated inspection system because it allows greater variation in the distance from the camera to the object being imaged. In this paper, we propose a unified approach to simultaneous autofocus and alignment for pattern localization by extending the idea of the image reference approach. Under the least trimmed squares (LTS) scheme, the proposed hybrid weighted Hausdorff distance (HWHD) is a robust similarity metric that combines the Hausdorff distance (HD) with edge-amplitude normalized gradient (EANG) matching. The EANG is designed to characterize the different degrees of blur at the edge points as focus cues, and is immune to illumination variations between the reference and the target image. We experimentally illustrate its performance on simulated as well as real data.
13.
ANNSTLF-a neural-network-based electric load forecasting system (total citations: 10; self-citations: 0; citations by others: 10)
Khotanzad A. Afkhami-Rohani R. Tsun-Liang Lu Abaye A. Davis M. Maratukulam D.J. 《Neural Networks, IEEE Transactions on》1997,8(4):835-846
A key component of the daily operation and planning activities of an electric utility is short-term load forecasting, i.e., the prediction of hourly loads (demand) for the next hour to several days out. The accuracy of such forecasts has significant economic impact for the utility. This paper describes a load forecasting system known as ANNSTLF (artificial neural-network short-term load forecaster) which has received wide acceptance by the electric utility industry and presently is being used by 32 utilities across the USA and Canada. ANNSTLF can consider the effect of temperature and relative humidity on the load. Besides its load forecasting engine, ANNSTLF contains forecasters that can generate the hourly temperature and relative humidity forecasts needed by the system. ANNSTLF is based on a multiple ANN strategy that captures various trends in the data. Both the first and the second generation of the load forecasting engine are discussed and compared. The building block of the forecasters is a multilayer perceptron trained with the error backpropagation learning rule. An adaptive scheme is employed to adjust the ANN weights during online forecasting. The forecasting models are site independent, and only the number of hidden-layer nodes of the ANNs needs to be adjusted for a new database. The results of testing the system on data from ten different utilities are reported.
14.
This paper proposes an efficient technique for automatic localization of the ear from side face images. The technique is rotation, scale and shape invariant and makes use of the connected components in a graph obtained from the edge map of the side face image. It has been evaluated on the IIT Kanpur database, consisting of 2672 side faces with variable sizes, rotations and shapes, and the University of Notre Dame database, containing 2244 side faces with variable background and poor illumination. Experimental results reveal the efficiency and robustness of the technique.
15.
This paper presents a novel global localization approach for mobile robots that exploits line-segment features in any structured environment. The main contribution of this paper is an effective data association approach, the Line-segment Relation Matching (LRM) technique, which is based on the generation and exploration of an Interpretation Tree (IT). A new representation of the geometric patterns of line segments, called the Relation Table, is proposed for the first time. It contains the relative geometric position of every line segment with respect to the others (or itself) in a coordinate-frame independent sense. Based on that, a Relation-Table constraint is applied to minimize the search space of the IT, thereby greatly reducing the processing time of LRM. The least squares algorithm is further applied to estimate the robot pose using matched line-segment pairs. A global localization system can then be realized based on our LRM technique integrated with a hypothesis tracking framework that is able to handle pose ambiguity. Simulations were specially designed and carried out, indicating both the strengths and weaknesses of our system compared with former methods. We also present practical experiments illustrating that our approach is highly robust against uncertainties from sensor occlusions and extraneous observations in a highly dynamic environment. Additionally, our system was demonstrated to deal easily with initialization and to recover quickly from a localization failure.
16.
For a class of MIMO sampled-data nonlinear systems with unknown dynamic nonlinearities, a stable neural-network (NN)-based adaptive control approach is developed, which integrates an NN approach with the adaptive implementation of variable structure control with a sector. The sampled-data nonlinear system is assumed to be controllable and its state vector is available for measurement. The variable structure control with a sector serves two purposes. One is to force the system state to be within the state region in which the NNs are used when the system goes out of neural control; the other is to provide an additional control until the system tracking error metric is controlled inside the sector within the network approximation region. A proof of complete stability and tracking error convergence is given, and the setting of the sector and the NN parameters is discussed. It is demonstrated that the asymptotic error of the system can be made dependent only on inherent network approximation errors and the frequency range of unmodeled dynamics. Simulation studies of a two-link manipulator show the effectiveness of the proposed control approach.
17.
To navigate the object, pattern and architecture fields, the authors have developed a unified object topology, which uses a technology's domain dependency and implementation details to organize relationships with other technologies and to identify how the system will evolve. It also supports object repositories and identifies future research directions.
18.
In this paper a hierarchical, neural network control architecture of a walking machine is proposed. The neural network is based on the theory of the Cerebellum Model Articulation Controller (CMAC) which is a neuromuscular control system. Some preliminary studies of kinematic control and gait synthesis are presented to demonstrate the effectiveness of the CMAC neural network. After having been trained to learn the multivariable, nonlinear relationships of the leg kinematics and gaits, CMAC is utilized to perform feedforward kinematic control of a quadruped in straight-line walking and step climbing. Simulation examples are provided and discussed. This algorithm can be extended to control other highly nonlinear processes which are hierarchical in nature and cannot be modeled by mathematical equations.
19.
Sundararajan Srinivasan Tao Ma Georgios Lazarou Joseph Picone 《International Journal of Speech Technology》2014,17(1):17-25
Gaussian Mixture Models (GMM) have been the most popular approach in speaker recognition and verification for over two decades. The inefficiencies of this model for signals such as speech are well documented and include an inability to model temporal dependencies that result from nonlinearities in the speech signal. The resulting models are often complex and overdetermined, which leads to a lack of generalization. In this paper, we present a nonlinear mixture autoregressive model (MixAR) that attempts to directly model nonlinearities in the trajectories of the speech features. We apply this model to the problem of speaker verification. Experiments with synthetic data demonstrate the viability of the model. Evaluations on standard speech databases, including TIMIT, NTIMIT, and NIST-2001, demonstrate that MixAR, using only half the number of parameters and only static features, can achieve a lower equal error rate when compared to GMMs, particularly in the presence of previously unseen noise. Performance as a function of the duration of both the training and evaluation utterances is also analyzed.
20.
《Engineering Applications of Artificial Intelligence》2005,18(1):13-19
The Gaussian mixture model (GMM) has been widely used for modeling speakers. In speaker identification, one major problem is how to generate a set of GMMs for identification purposes from the training data. Due to the hill-climbing characteristic of the maximum likelihood (ML) method, an arbitrary estimate of the initial model parameters will usually lead to a sub-optimal model in practice. To resolve this problem, this paper proposes a hybrid training method based on a genetic algorithm (GA). It combines the global search capability of the GA with the effectiveness of the ML method. Experimental results on TI46 and TIMIT show that this hybrid GA obtains better-optimized GMMs and better results than the simple GA and the traditional ML method.