We present a technique implementing space-variant filtering of an image, with kernels belonging to a given family, in time independent of the size and shape of the filter kernel support. The essence of our method is efficient approximation of these kernels, belonging to an infinite family governed by a small number of parameters, as a linear combination of a small number k of “basis” kernels. The accuracy of this approximation increases with k, and requires O(k) storage space. Any kernel in the family may be applied to the image in O(k) time using precomputed results of the application of the basis kernels. Performing linear combinations of these values with appropriate coefficients yields the desired result. A trade-off between algorithm efficiency and approximation quality is obtained by adjusting k. The basis kernels are computed using singular value decomposition, distinguishing this from previous techniques designed to achieve a similar effect. We illustrate by applying our methods to the family of elliptic Gaussian kernels, a popular choice for filtering warped images.
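The idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses a 1D Gaussian family indexed by `sigma` in place of elliptic Gaussians, and the function names, the sampling grid, and the choice to run `numpy.linalg.svd` on the sampled family are all assumptions. The signal is filtered once per basis kernel, and each output sample is then a k-term linear combination of those responses.

```python
import numpy as np

def gaussian(sigma, radius):
    """Normalized 1D Gaussian kernel sampled on [-radius, radius]."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def build_basis(family, k):
    """Top-k right singular vectors of the sampled family (one kernel per row)."""
    _, _, vt = np.linalg.svd(family, full_matrices=False)
    return vt[:k]

def max_kernel_residual(family, basis):
    """Largest L2 error when each family member is projected onto the basis."""
    approx = family @ basis.T @ basis
    return float(np.linalg.norm(family - approx, axis=1).max())

def filter_space_variant(signal, sigma_index, family, basis):
    """Apply family[sigma_index[i]] at sample i using only basis responses."""
    # One convolution per basis kernel, independent of the per-sample kernel.
    responses = np.stack([np.convolve(signal, b, mode="same") for b in basis])
    # Coefficient table: row j holds the k weights for family member j.
    coeffs = family @ basis.T
    # O(k) work per sample: a k-term linear combination.
    return np.einsum("ik,ki->i", coeffs[sigma_index], responses)

def filter_direct(signal, sigma_index, family):
    """Reference result: convolve with every family member, pick one per sample."""
    full = np.stack([np.convolve(signal, g, mode="same") for g in family])
    return full[sigma_index, np.arange(len(signal))]

# Hypothetical setup: 32 kernels with sigma in [1, 4], a random test signal,
# and a random kernel choice at every sample.
radius = 12
sigmas = np.linspace(1.0, 4.0, 32)
family = np.stack([gaussian(s, radius) for s in sigmas])
rng = np.random.default_rng(0)
signal = rng.standard_normal(200)
sigma_index = rng.integers(0, len(sigmas), size=signal.size)

approx = filter_space_variant(signal, sigma_index, family, build_basis(family, 6))
exact = filter_direct(signal, sigma_index, family)
print("max abs error with k=6:", np.abs(approx - exact).max())
```

Raising `k` shrinks the projection residual of every kernel in the family at the cost of one more precomputed response, which is the efficiency versus quality trade-off the abstract describes.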

In this paper, we introduce a family of filter kernels, the gray-code kernels (GCK), and demonstrate their use in image analysis. Filtering an image with a sequence of gray-code kernels is highly efficient and requires only two operations per pixel for each filter kernel, independent of the size or dimension of the kernel. We show that the family of kernels is large and includes the Walsh-Hadamard kernels, among others. The GCK can be used to approximate any desired kernel and, as such, form a complete representation. The efficiency of computation using a sequence of GCK filters can be exploited for various real-time applications, such as pattern detection, feature extraction, texture analysis, texture synthesis, and more.
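The completeness claim above can be illustrated with the Walsh-Hadamard kernels, which the abstract names as members of the family. The sketch below rests on several assumptions: it builds the kernels with the Sylvester construction (natural order, not Gray-code order), it does not implement the two-operations-per-pixel filtering scheme, and `hadamard`, `project` and `reconstruct` are hypothetical helper names.

```python
def hadamard(order):
    """Rows are the 2**order Walsh-Hadamard kernels with entries +1 or -1."""
    rows = [[1]]
    for _ in range(order):
        # Sylvester step: H -> [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows

def project(kernel, basis):
    """Coefficient of each basis kernel; rows are orthogonal with squared norm n."""
    n = len(kernel)
    return [sum(b * v for b, v in zip(row, kernel)) / n for row in basis]

def reconstruct(coeffs, basis):
    """Linear combination of the basis kernels weighted by coeffs."""
    n = len(basis[0])
    return [sum(c * row[i] for c, row in zip(coeffs, basis)) for i in range(n)]

basis = hadamard(3)                  # eight kernels of length 8
target = [1, 2, 3, 4, 4, 3, 2, 1]    # an arbitrary kernel to represent
coeffs = project(target, basis)
rebuilt = reconstruct(coeffs, basis)
```

Because the eight kernels are mutually orthogonal, the full set reproduces `target` exactly. Keeping only the largest coefficients would give an approximation instead.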


Emotion recognition from speech signals is an interesting research area with several applications, such as smart healthcare, autonomous voice response systems, assessing situational seriousness through analysis of the caller's affective state in emergency centers, and other smart affective services. In this paper, we present a study of speech emotion recognition based on features extracted from spectrograms using a deep convolutional neural network (CNN) with rectangular kernels. Typically, CNNs have square kernels and pooling operators at various layers, which are suited to 2D image data. In spectrograms, however, the information is encoded in a slightly different manner: time is represented along the x-axis, frequency along the y-axis, and the amplitude of the speech signal by the intensity value at a particular position. To analyze speech through spectrograms, we propose rectangular kernels of varying shapes and sizes, along with max pooling in rectangular neighborhoods, to extract discriminative features. The proposed scheme effectively learns discriminative features from speech spectrograms and performs better than many state-of-the-art techniques when evaluated on Emo-DB and a Korean speech dataset.


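When a deep convolutional neural network is trained, the initial values of the convolution kernels are usually assigned at random. In addition, learning the network parameters by gradient descent often leads to vanishing gradients. To address this, a learning method for deep convolutional neural networks based on deconvolutional feature extraction is proposed. First, an unsupervised two-layer stacked deconvolutional neural network learns feature mapping matrices from the original images. These feature mapping matrices are then used as the convolution kernels of the deep convolutional neural network, and the original images are convolved and pooled layer by layer. Finally, the deep convolutional network is fine-tuned by mini-batch stochastic gradient descent with an added momentum coefficient to avoid the vanishing-gradient problem. Experimental results on the MNIST, CIFAR-10 and CIFAR-100 datasets show that the proposed method effectively improves image classification accuracy.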

We propose a novel approach for denoising Monte Carlo path traced images, which uses data from individual samples rather than relying on pixel aggregates. Samples are partitioned into layers, which are filtered separately, giving the network more freedom to handle outliers and complex visibility. Finally, the layers are composited front-to-back using alpha blending. The system is trained end-to-end, with learned layer partitioning, filter kernels, and compositing. We obtain image quality similar to that of recent state-of-the-art sample-based denoisers at a fraction of the computational cost and memory requirements.

We describe approaches for positive data modeling and classification using both finite inverted Dirichlet mixture models and support vector machines (SVMs). Inverted Dirichlet mixture models are used to tackle an outstanding challenge in SVMs, namely the generation of accurate kernels. The kernel generation approaches that we consider, grounded in ideas from information theory, allow the incorporation of data structure and its structural constraints. Inverted Dirichlet mixture models are learned within a principled Bayesian framework using both the Gibbs sampler and Metropolis-Hastings for parameter estimation and the Bayes factor for model selection (i.e., determining the number of mixture components). Our Bayesian learning approach places priors over the model parameters, which we derive by showing that the inverted Dirichlet distribution belongs to the exponential family of distributions, and then combines these priors with information from the data to build posterior distributions. We illustrate the merits and the effectiveness of the proposed method with two challenging real-world applications, namely object detection and visual scene analysis and classification.

We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach provides a practical method for learning high-order Markov random field (MRF) models with potential functions that extend over large pixel neighborhoods. These clique potentials are modeled using the Product-of-Experts framework that uses non-linear functions of many linear filter responses. In contrast to previous MRF approaches, all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field-of-Experts model with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the model is trained on a generic image database and is not tuned toward a specific application, we obtain results that compete with specialized techniques. The work for this paper was performed while S.R. was at Brown University.

The extreme learning machine (ELM) is a new method for using single hidden layer feed-forward networks with a much simpler training method. While conventional kernel-based classifiers are based on a single kernel, in reality it is often desirable to base classifiers on combinations of multiple kernels. In this paper, we address the problem of multiple-kernel learning (MKL) for ELM by formulating it as a semi-infinite linear program. We further extend this idea by integrating it with MKL techniques. The kernel function in this ELM formulation no longer needs to be fixed, but can be automatically learned as a combination of multiple kernels. Two formulations of multiple-kernel classifiers are proposed. The first is based on a convex combination of the given base kernels, while the second uses a convex combination of the so-called equivalent kernels. Empirically, the second formulation is particularly competitive. Experiments on a large number of both toy and real-world data sets (including a high-magnification sampling rate image data set) show that the resultant classifier is fast and accurate and can also be easily trained by simply changing the linear program.

We present a sparse optimization framework for extracting sparse shape priors from a collection of 3D models. Shape priors are defined as point‐set neighborhoods sampled from shape surfaces which convey important information encompassing normals and local shape characterization. A 3D shape model can be considered to be formed from a set of 3D local shape priors, most of which are likely to have similar geometry. Our key observation is that the local priors extracted from a family of 3D shapes lie in a very low‐dimensional manifold. Consequently, a compact and informative subset of priors can be learned to efficiently encode all shapes of the same family. A comprehensive library of local shape priors is first built with the given collection of 3D models of the same family. We then formulate a global, sparse optimization problem which enforces selecting representative priors while minimizing the reconstruction error. To solve the optimization problem, we design an efficient solver based on the Augmented Lagrangian Multipliers method (ALM). Extensive experiments exhibit the power of our data‐driven sparse priors in elegantly solving several high‐level shape analysis applications and geometry processing tasks, such as shape retrieval, style analysis and symmetry detection.

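Based on deep learning theory, image denoising is treated as a neural-network fitting process, a compact and efficient compound convolutional neural network is constructed, and an image denoising algorithm based on this network is proposed. Stage 1 of the algorithm consists of two 2-layer convolutional networks, which separately train some of the initial convolution kernels of the 3-layer convolutional network used in stage 2. This shortens the training time of the stage-2 network and improves the robustness of the algorithm. Finally, the stage-2 convolutional network is used to denoise new noisy images. Experiments show that the algorithm is comparable to current strong image denoising algorithms in terms of peak signal-to-noise ratio, structural similarity and root-mean-square error. In particular, it performs better as the noise level increases, and its training time is shorter.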

A prototype filter design approach to pyramid generation
This paper presents a technique for image pyramid generation in which the reduction (expansion) factor between layers is any rational number M/L. The image pyramid generation is modeled as an interpolation and filtering followed by a decimation. The model enables frequency domain analysis of the image pyramid, as well as convenient design of the generating kernels. L (M) generating kernels are necessary to produce an image pyramid with reduction (expansion) factor M/L (L/M). A polyphase filter network scheme is used in which the L (M) generating kernels can be produced by sampling one prototype low-pass filter with cutoff frequency at ω = π/max[M, L]. Using these polyphase filters, the frequency content of pyramid image decompositions can be adjusted with great flexibility. A systematic procedure is presented for specifying the relative positions of spatial samples in successive pyramid levels, a complication that arises when generalizing from integer reduction factors to rational factors. Two types of low-pass filters are employed in this work for the prototype filter design: a binomial filter and an FIR linear phase filter. Illustrative examples are presented.
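The interpolate, filter, decimate model described above can be sketched in 1D as follows. This is an illustration under stated assumptions, not the paper's polyphase network: it applies the three steps directly, truncates the filter at the borders, and uses a short binomial kernel as a stand-in prototype without checking its cutoff against π/max[M, L]. The names `binomial_kernel` and `resample` are hypothetical.

```python
def binomial_kernel(n):
    """Normalized binomial low-pass kernel of length n + 1."""
    row = [1.0]
    for _ in range(n):
        row = [a + b for a, b in zip([0.0] + row, row + [0.0])]
    total = sum(row)
    return [v / total for v in row]

def resample(signal, L, M, kernel):
    """Change the sampling rate by L/M, i.e. a reduction factor of M/L."""
    # Interpolation: insert L - 1 zeros after each sample; the gain L
    # compensates for the energy removed by the zeros.
    up = []
    for v in signal:
        up.append(v * L)
        up.extend([0.0] * (L - 1))
    # Low-pass filtering, with the kernel truncated at the borders.
    half = len(kernel) // 2
    filtered = []
    for i in range(len(up)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(up):
                acc += w * up[idx]
        filtered.append(acc)
    # Decimation: keep every M-th sample.
    return filtered[::M]

# One pyramid level with reduction factor 3/2 on a constant signal.
pyramid_level = resample([5.0] * 12, L=2, M=3, kernel=binomial_kernel(4))
```

With L = 2 the even and odd taps of the binomial kernel each sum to one half, so a constant input stays constant away from the border. A polyphase implementation would reach the same output without ever multiplying by the inserted zeros.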

Techniques for image analysis and classification generally treat the image sample labels as fixed and free of uncertainty. The rank regression problem studied in this paper is based on training samples with uncertain labels, which is often the case for manually estimated image labels. A core ranking model is first designed as the bilinear fusion of multiple candidate kernels. Then, the parameters for feature selection and kernel selection are learned simultaneously by maximum a posteriori estimation for the given samples and uncertain labels. The provably convergent Expectation-Maximization (EM) method is used to infer these parameters iteratively. The effectiveness of the proposed algorithm is validated by extensive experiments on an age ranking task and a human tracking task. The popular FG-NET and the large-scale Yamaha aging database are used for the age estimation experiments, and our algorithm significantly outperforms the state-of-the-art algorithms reported in the related literature. The experimental results on the human tracking task also validate its advantage over the conventional linear regression algorithm. A short version of this paper appeared in ICME07.

In this paper, we investigate material classification from single images obtained under unknown viewpoint and illumination. It is demonstrated that materials can be classified using the joint distribution of intensity values over extremely compact neighborhoods (starting from as small as 3×3 pixels) and that this can outperform classification using filter banks with large support. It is also shown that the performance of filter banks is inferior to that of image patches with equivalent neighborhoods. We develop novel texton-based representations which are suited to modeling this joint neighborhood distribution for Markov random fields. The representations are learned from training images and then used to classify novel images (with unknown viewpoint and lighting) into texture classes. Three such representations are proposed and their performance is assessed and compared to that of filter banks. The power of the method is demonstrated by classifying 2,806 images of all 61 materials present in the Columbia-Utrecht database. The classification performance surpasses that of recent state-of-the-art filter bank-based classifiers such as Leung and Malik (IJCV 01), Cula and Dana (IJCV 04), and Varma and Zisserman (IJCV 05). We also benchmark performance by classifying all of the textures present in the UIUC, Microsoft Textile, and San Francisco outdoor data sets. We conclude with discussions on why features based on compact neighborhoods can correctly discriminate between textures with large global structure and why the performance of filter banks is not superior to that of the source image patches from which they were derived.

Feature learning for 3D shapes is challenging due to the lack of a natural parameterization for 3D surface models. We adopt the multi‐view depth image representation and propose the Multi‐View Deep Extreme Learning Machine (MVD‐ELM) to achieve fast, high-quality projective feature learning for 3D shapes. In contrast to existing multi‐view learning approaches, our method ensures that the feature maps learned for different views are mutually dependent via shared weights and that, in each layer, their unprojections together form a valid 3D reconstruction of the input 3D shape by using normalized convolution kernels. These properties lead to more accurate 3D feature learning, as shown by the encouraging results in several applications. Moreover, the 3D reconstruction property enables clear visualization of the learned features, which further demonstrates the meaningfulness of our feature learning.

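A face recognition method using Gabor wavelets and hidden Markov models (HMMs) is investigated. First, a multi-resolution Gabor wavelet transform is applied to the face image. Next, a set of grid nodes is placed on the image, each node is described by the multi-scale Gabor magnitude features at its location, and independent component analysis is used to decorrelate the features of each node and reduce their dimensionality. Finally, feature nodes are formed, each feature node is used as an observation vector to train the hidden Markov model, and the optimized model parameters are used for face recognition. Experimental results on the ORL face database show that the method achieves a high recognition rate and is easy to apply in engineering practice.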

Shape deformation is one of the fundamental techniques in geometric processing. One principle of deformation is to preserve the geometric details while distributing the necessary distortions uniformly. To achieve this, state-of-the-art techniques deform shapes in a locally as-rigid-as-possible (ARAP) manner. Existing ARAP deformation methods optimize rigid transformations in the 1-ring neighborhoods and maintain the consistency between adjacent pairs of rigid transformations through single overlapping edges. In this paper, we go one step further and propose to use larger local neighborhoods to enhance the consistency of adjacent rigid transformations. This helps to preserve the geometric details better and to distribute the distortions more uniformly. Moreover, the size of the expanded local neighborhoods provides an intuitive parameter for adjusting physical stiffness: the larger the neighborhood, the more rigid the material. Based on these observations, we propose a novel rigidity-controllable mesh deformation method in which shape rigidity can be flexibly adjusted. The size of the local neighborhoods can be learned automatically from datasets of deforming objects or specified by the user, and may vary over the surface to simulate shapes composed of mixed materials. Various examples are provided to demonstrate the effectiveness of our method.

Consider situations where the depth at each point in the scene is multi-valued, due to the presence of a virtual image semi-reflected by a transparent surface. The semi-reflected image is linearly superimposed on the image of an object that is behind the transparent surface. A novel approach is proposed for the separation of the superimposed layers. Focusing on either of the layers yields initial separation, but crosstalk remains. The separation is enhanced by mutual blurring of the perturbing components in the images. However, this blurring requires the estimation of the defocus blur kernels. We thus propose a method for self-calibration of the blur kernels, given the raw images. The kernels are sought to minimize the mutual information of the recovered layers. Autofocusing and depth estimation in the presence of semi-reflections are also considered. Experimental results are presented.

A novel model is presented to learn bimodally informative structures from audio–visual signals. The signal is represented as a sparse sum of audio–visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio–temporal visual basis function. To represent an audio–visual signal, the kernels can be positioned independently and arbitrarily in space and time. The proposed algorithm uses unsupervised learning to form dictionaries of bimodal kernels from audio–visual material. The basis functions that emerge during learning capture salient audio–visual data structures. In addition, it is demonstrated that the learned dictionary can be used to locate sources of sound in the movie frame. Specifically, in sequences containing two speakers, the algorithm can robustly localize a speaker even in the presence of severe acoustic and visual distractors.

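Different convolution kernels extract different image features, but training the kernels is difficult. To address this problem, a sparse representation classification algorithm with principal component analysis (PCA) convolution is proposed. The training samples are first split into patches and the mean is removed from each patch. PCA is then applied directly to extract the first K eigenvectors of all patches as convolution kernels, and the original images are convolved with these kernels. An automatic weighting strategy is also proposed, which combines the K feature images obtained after convolution by weighted superposition. Finally, the feature images are sparsified through block-wise histogram statistics and classified with the sparse representation classification algorithm. In comparative experiments against commonly used classification algorithms on the public face datasets AR, CMU Multi-PIE and ORL and on the handwritten digit dataset MNIST, the sparse representation classification algorithm with PCA convolution achieves higher classification accuracy.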
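The kernel-extraction step of the abstract above (patch-wise mean removal followed by taking the leading principal directions as kernels) might look like the sketch below. The patch size, the stride of one pixel, the random stand-in images, and the use of `numpy.linalg.svd` in place of an explicit eigendecomposition are all assumptions. The weighting, histogram and sparse-representation stages are not shown.

```python
import numpy as np

def pca_kernels(images, patch=3, k=4):
    """Return k kernels of shape (patch, patch) learned from mean-removed patches."""
    rows = []
    for img in images:
        h, w = img.shape
        for r in range(h - patch + 1):
            for c in range(w - patch + 1):
                p = img[r:r + patch, c:c + patch].ravel()
                rows.append(p - p.mean())      # remove the mean of each patch
    x = np.array(rows)
    # Right singular vectors of x are the eigenvectors of x.T @ x.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].reshape(k, patch, patch)

def filter_image(img, kernel):
    """Valid-mode sliding-window correlation (the kernel is not flipped)."""
    p = kernel.shape[0]
    h, w = img.shape
    out = np.empty((h - p + 1, w - p + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + p, c:c + p] * kernel)
    return out

# Hypothetical data: five random 8x8 "images" stand in for a training set.
rng = np.random.default_rng(0)
images = [rng.random((8, 8)) for _ in range(5)]
kernels = pca_kernels(images, patch=3, k=4)
feature_maps = [filter_image(images[0], kern) for kern in kernels]
```

No gradient-based training is involved: the kernels come straight from the patch statistics, which is how the abstract sidesteps the difficulty of training them.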

This paper proposes a scale‐adaptive filtering method to improve the performance of structure‐preserving texture filtering for image smoothing. With classical texture filters, it is usually challenging to smooth texture at multiple scales while preserving salient structures in an image. We address this issue through adaptive bilateral filtering, in which the scales of the Gaussian range kernels are allowed to vary from pixel to pixel. Based on direction‐wise statistics, our method effectively distinguishes texture from structure, identifies an appropriate scope around each pixel to be smoothed, and thus infers an optimal smoothing scale for it. By filtering with varying‐scale kernels, the image is smoothed adaptively according to the distribution of texture. Our experimental results show that the proposed scheme needs fewer iterations and improves texture filtering performance in terms of preserving geometric structures at multiple scales, even after aggressive smoothing of the original image.
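A bilateral filter whose range scale varies per sample, as described above, can be sketched in 1D. This is a simplified illustration with assumed names and defaults (`adaptive_bilateral_1d`, `spatial_sigma`, `radius`). The per-sample scales are passed in directly, and the direction-wise statistics that the paper uses to infer them are not implemented.

```python
import math

def adaptive_bilateral_1d(signal, range_sigmas, spatial_sigma=2.0, radius=4):
    """Bilateral filter in which sample i uses its own range scale range_sigmas[i]."""
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: depends only on the distance between samples.
            ws = math.exp(-0.5 * ((j - i) / spatial_sigma) ** 2)
            # Range weight: depends on the intensity difference, scaled per sample.
            wr = math.exp(-0.5 * ((signal[j] - signal[i]) / range_sigmas[i]) ** 2)
            num += ws * wr * signal[j]
            den += ws * wr
        out.append(num / den)
    return out

step = [0.0] * 10 + [10.0] * 10
sharp = adaptive_bilateral_1d(step, [1.0] * 20)      # small scale: edge kept
blurred = adaptive_bilateral_1d(step, [100.0] * 20)  # large scale: edge smoothed
```

A small range scale at a sample preserves the structure there, while a large one lets the filter average across intensity differences. Choosing the scale per sample is what allows texture and structure to be treated differently in one pass.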
