Similar articles
1.
Traditionally, machine vision represents images in Cartesian coordinates with uniform sampling along the axes. Biological vision systems, by contrast, represent images in polar coordinates with non-uniform sampling. Because space-variant representations offer various advantages, many researchers are interested in space-variant computer vision. In this direction, the current work proposes a novel and simple space-variant representation of images. The proposed representation is compared with the classical log-polar mapping. The log-polar representation is motivated by biological vision, which has higher resolution at the fovea and reduced resolution at the periphery. In contrast to the log-polar mapping, the proposed representation has higher resolution at the periphery and lower resolution at the fovea. The proposal is shown to be a better representation in navigational scenarios such as driver assistance systems and robotics. The experimental results include an analysis of optical flow fields computed on both the proposed and the log-polar representations. An egomotion estimation application is also shown as an illustrative example. The experimental analysis comprises results from synthetic as well as real sequences.
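To make the log-polar baseline in this abstract concrete, the following is a minimal sketch of a classical log-polar sampling grid, in which ring radii grow geometrically so that samples are dense near the centre (fovea) and sparse at the periphery. The function name `log_polar_grid` and its parameters are hypothetical, chosen for illustration only; the abstract gives no formula for the proposed periphery-dense representation, so it is not sketched here.

```python
import math


def log_polar_grid(n_rings, n_wedges, r_min, r_max):
    """Return (x, y) sample points of a log-polar grid centred at the origin.

    Assumes n_rings >= 2 and 0 < r_min < r_max. Ring radii are spaced
    geometrically, i.e. uniformly in log(r), between r_min and r_max.
    """
    growth = (r_max / r_min) ** (1.0 / (n_rings - 1))
    points = []
    for ring in range(n_rings):
        radius = r_min * growth ** ring
        for wedge in range(n_wedges):
            angle = 2.0 * math.pi * wedge / n_wedges
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Example: 4 rings between radius 1 and 8, 8 wedges per ring.
grid = log_polar_grid(4, 8, 1.0, 8.0)
ring_radii = [math.hypot(*grid[ring * 8]) for ring in range(4)]
```

With these example values the ring radii roughly double from one ring to the next, so the gap between neighbouring samples widens with distance from the centre.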

2.
In this paper we discuss the use of some graph-based representations and techniques for image processing and analysis. Instead of making an extensive review of graph techniques in this field, we explain how we are using these techniques in an active vision system for an autonomous mobile robot developed at the Institut de Robòtica i Informàtica Industrial within the project “Active Vision System with Automatic Learning Capacity for Industrial Applications (CICYT TAP98-0473)”. Specifically, we discuss the use of graph-based representations and techniques for image segmentation, image perceptual grouping and object recognition. We first present a generalisation of a greedy graph partitioning algorithm for colour image segmentation. Next we describe a novel fusion of colour-based segmentation and depth from stereo that yields a graph representing every object in the scene. Finally we describe a new representation of a set of attributed graphs (AGs), called function-described graphs (FDGs), a distance measure for matching AGs with FDGs, and some applications for robot vision.

3.
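Simplified Wigner distribution and its application to handwriting identification   Cited by: 7 (self-citations: 0, by others: 7)
This paper proposes a simplified Wigner distribution method for texture analysis and applies it to handwriting identification. The Wigner distribution is a local spectral representation of an image, but its computational and storage costs are too high. We prove that the Wigner distribution is a redundant representation of the signal and then simplify it. The simplified Wigner distribution is information-preserving and retains good performance as a texture measure. In handwriting identification experiments, the method achieves better results than previous handwriting texture analysis methods.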

4.
Human action recognition in video is important in many computer vision applications such as automated surveillance. Human actions can be compactly encoded using a sparse set of local spatio-temporal salient features at different scales. The existing bottom-up methods construct a single dictionary of action primitives from the joint features of all scales and hence a single action representation. This representation cannot fully exploit the complementary characteristics of the motions across different scales. To address this problem, we introduce the concept of learning multiple dictionaries of action primitives at different resolutions and, consequently, multiple scale-specific representations for a given video sample. Using a decoupled fusion of multiple representations, we improved the human classification accuracy on realistic benchmark databases by about 5% compared with the state-of-the-art methods.

5.
Visually distinct patterns with matching subband statistics   Cited by: 1 (self-citations: 0, by others: 1)
A commonly used representation of a visual pattern is a statistical distribution measured from the output of a bank of filters (Gaussian, Laplacian, Gabor, etc.). Both marginal and joint distributions of filter responses have been advocated and effectively used for a variety of vision tasks, including texture classification, texture synthesis, object detection, and image retrieval. This paper examines the ability of these representations to discriminate between an arbitrary pair of visual stimuli. Examples of patterns are derived that provably possess the same marginal and joint statistical properties, yet are "visually distinct." This is accomplished by showing sufficient conditions for matching the first k moments of the marginal distributions of a pair of images. Then, given a set of filters, we show how to match the marginal statistics of the subband images formed through convolution with the filter set. Next, joint statistics are examined and images with similar joint distributions of subband responses are shown. Finally, distinct periodic patterns are derived that possess approximately the same subband statistics for any arbitrary filter set.

6.
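JSF3: a low-weight representation of integer pairs   Cited by: 2 (self-citations: 0, by others: 2)
J. A. Solinas gave an optimal signed binary representation of a pair of integers, called the joint sparse form (JSF). The length of the JSF is at most one more than the binary length of the larger integer, and its average Hamming density is 1/2. Extending the joint sparse form with a window method, this paper gives a new representation of integer pairs: the width-3 joint sparse form (JSF3). Its length is likewise at most one more than the binary length of the larger integer, and its average Hamming density is 19/52. Consequently, computing uP+vQ with JSF3 is about 9% more efficient than with JSF.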
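As an illustration of the baseline that JSF3 extends, the sketch below computes the ordinary joint sparse form of two non-negative integers, following the commonly cited textbook formulation of Solinas's algorithm. The function names are hypothetical, digits are returned least-significant first, and this is not the JSF3 construction, whose details the abstract does not give.

```python
def _jsf_digit(value, other):
    """Pick the signed digit (-1, 0 or 1) for `value`, given the paired value."""
    if value % 2 == 0:
        return 0
    digit = 1 if value % 4 == 1 else -1          # value "mods" 4
    if value % 8 in (3, 5) and other % 4 == 2:   # flip sign to create a joint zero column
        digit = -digit
    return digit


def joint_sparse_form(k1, k2):
    """Return the signed-digit lists of k1 and k2, least-significant digit first."""
    digits1, digits2 = [], []
    carry1 = carry2 = 0
    while k1 + carry1 > 0 or k2 + carry2 > 0:
        l1, l2 = k1 + carry1, k2 + carry2
        u1, u2 = _jsf_digit(l1, l2), _jsf_digit(l2, l1)
        digits1.append(u1)
        digits2.append(u2)
        if 2 * carry1 == 1 + u1:
            carry1 = 1 - carry1
        if 2 * carry2 == 1 + u2:
            carry2 = 1 - carry2
        k1 >>= 1
        k2 >>= 1
    return digits1, digits2


def from_digits(digits):
    """Rebuild the integer from signed binary digits (least-significant first)."""
    return sum(d << i for i, d in enumerate(digits))
```

In a double-and-add evaluation of uP+vQ, each column in which at least one of the two digits is nonzero costs one point addition, which is why a lower joint Hamming density translates into fewer additions.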

7.
Many modern computer vision algorithms are built atop a set of low-level feature operators (such as SIFT [23,24]; HOG [8,3]; or LBP [1,2]) that transform raw pixel values into a representation better suited to subsequent processing and classification. While the choice of feature representation is often not central to the logic of a given algorithm, the quality of the feature representation can have critically important implications for performance. Here, we demonstrate a large-scale feature search approach to generating new, more powerful feature representations in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand. In particular, we show that a brute-force search can generate representations that, in combination with standard machine learning blending techniques, achieve state-of-the-art performance on the Labeled Faces in the Wild (LFW) [19] unconstrained face recognition challenge set. These representations outperform previous state-of-the-art approaches, in spite of requiring less training data and using a conceptually simpler machine learning backend. We argue that such large-scale-search-derived feature sets can play a synergistic role with other computer vision approaches by providing a richer base of features with which to work.

8.
Grouping in vision can be seen as the process that organizes image entities into higher-level structures. Despite its importance, there is little consistency in the statement of the grouping problem in the literature. In addition, most grouping algorithms in vision are inspired by a specific technique rather than based on desired characteristics, which makes it cumbersome to compare the behavior of various methods. We discuss six precisely formulated considerations for the design of generic grouping algorithms in vision: proper definition, invariance, multiple interpretations, multiple solutions, simplicity and robustness. We observe that none of the existing algorithms for grouping in vision meets all the considerations. We present a simple algorithm as an extension of a classical algorithm, where the extension is based on taking the considerations into account. The algorithm is applied to three examples: grouping point sets, grouping poly-lines, and grouping flow-field vectors. The complexity of the greedy algorithm is O(nO_G), where O_G is the complexity of the grouping measure.

9.
Perceptual grouping is a key intermediate-level vision problem. Parallel solutions to this problem are characterized by uneven distribution of symbolic features among the processors, unbalanced workload, and irregular interprocessor data dependency caused by the input image. In this paper, we propose two load-balancing techniques for parallelizing perceptual grouping on distributed-memory machines. By using an initial workload estimate, we first partition the computations to distribute the workload across the processors. In addition, we asynchronously perform ongoing task migrations to adapt to the unbalanced workload which may evolve differently from the initial estimate. We also discuss two strategies to manage the irregular interprocessor data dependency. To illustrate our ideas, perceptual grouping steps used in an integrated vision system for building detection are used as examples. Our experimental results show that, given 8K extracted line segments from a 1K × 1K image, both the line and junction grouping steps can be completed in 0.644 s on a 32-node SP2 and in 0.585 s on a 32-node T3D. For the same grouping steps, a serial implementation requires 10.550 s and 10.023 s on a single node of SP2 and T3D, respectively. The implementations were performed using the message passing interface standard and are portable to other high performance computing platforms.
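The timings quoted in this abstract imply a parallel speedup and efficiency that can be worked out directly. The short sketch below does the arithmetic using the standard definitions (speedup = serial time / parallel time, efficiency = speedup / node count); the helper name is hypothetical and only the four timings and the node count come from the abstract.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, nodes):
    """Return (speedup, parallel efficiency) for one machine."""
    speedup = serial_seconds / parallel_seconds
    return speedup, speedup / nodes


# Timings reported in the abstract for the 32-node runs.
sp2_speedup, sp2_efficiency = speedup_and_efficiency(10.550, 0.644, 32)
t3d_speedup, t3d_efficiency = speedup_and_efficiency(10.023, 0.585, 32)

for name, s, e in [("SP2", sp2_speedup, sp2_efficiency),
                   ("T3D", t3d_speedup, t3d_efficiency)]:
    print(f"{name}: speedup {s:.1f}x, efficiency {e:.0%}")
```

On these figures the speedup is roughly 16.4 on the SP2 and 17.1 on the T3D, i.e. a little over half of the ideal 32-fold speedup.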

10.
The Wigner Distribution (WD) for discrete images is computed for different test images with gray-level and spatial-frequency information content. The most relevant characteristics of the Wigner Distribution are analyzed from 2-D displays of the original 4-D distribution of the test images. This representation through the WD is shown to be especially well suited to the processing of textured information and of spatially variant degraded images.
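For readers unfamiliar with the discrete Wigner distribution, the following is a minimal 1-D sketch of a windowed (pseudo-)Wigner distribution evaluated at one sample position. It is only a 1-D analogue of the 2-D image case discussed in the abstract (where the distribution is 4-D); the function name, the rectangular window, the zero padding at the borders and the exponent convention are assumptions made for illustration, and conventions differ between authors.

```python
import cmath
import math


def pseudo_wigner_row(signal, n, half_width):
    """Pseudo-Wigner distribution of `signal` at sample n.

    Returns 2*half_width + 1 real values, one per frequency bin k:
        W[n, k] = sum_m x[n+m] * conj(x[n-m]) * exp(-2j*pi*k*m/N),
    with m in [-half_width, half_width] and N = 2*half_width + 1.
    Samples outside the signal are treated as zero.
    """
    n_bins = 2 * half_width + 1

    def sample(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    row = []
    for k in range(n_bins):
        total = 0j
        for m in range(-half_width, half_width + 1):
            kernel = sample(n + m) * complex(sample(n - m)).conjugate()
            total += kernel * cmath.exp(-2j * math.pi * k * m / n_bins)
        # The kernel is Hermitian-symmetric in m, so the sum is real
        # up to rounding error; only the real part is kept.
        row.append(total.real)
    return row


# A constant signal has all its energy in the zero-frequency bin.
flat_row = pseudo_wigner_row([1.0] * 9, 4, 2)
```

Because the kernel multiplies the signal by a mirrored copy of itself, a complex exponential of frequency ω peaks at the bin corresponding to 2ω under this convention.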

11.
12.
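Zhang Yajuan, Zhu Yuefei, Kuang Baijie. Journal of Software (软件学报), 2006, 17(9): 2004-2012
J. A. Solinas gave an optimal signed binary representation of a pair of integers, called the joint sparse form (JSF). The length of the JSF is at most one more than the binary length of the larger integer, and its average Hamming density is 1/2. Extending the joint sparse form with a window method, this paper gives a new representation of integer pairs: the width-3 joint sparse form (JSF3). Its length is likewise at most one more than the binary length of the larger integer, and its average Hamming density is 19/52. Consequently, computing uP+vQ with JSF3 is about 9% more efficient than with JSF.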

13.
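Extracting object contours from complex natural images is a classic hard problem in computer vision, and providing a cue combination model that is consistent with human perception and with natural image statistics is key to improving contour quality. Using the continuity and similarity cues for contour grouping, this paper proposes a cue combination model that fits the statistical joint conditional probability of continuity and similarity in the Gestalt rules. The model explains how two mutually independent cue variables can be used to obtain a special form of the joint distribution of two correlated cues. It thereby addresses the problem of combining correlated cues, which discriminative models deliberately avoid, and gives a quantitative model of Gestalt cues that better matches natural image statistics and human perception. The model is applied to contour extraction from natural images, and the experimental results confirm its effectiveness.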

14.
Motion segmentation and non-rigid structure from motion are two challenging computer vision problems that have attracted considerable research interest. While previous works handle these two problems separately, we present a general motion segmentation framework for solving these two seemingly different problems in a unified manner. At the heart of our framework is a model selection mechanism based on finding the minimal basis subspace representation by seeking the joint sparse representation of the data matrix. However, such a formulation is NP-hard, and we solve the convex proxy instead. Unlike in other compressive-sensing-related works, this convex proxy solution is insufficient for our problem. The convex relaxation artefacts and noise yield multiple subspace representations, making identification of the exact number of motion subspaces challenging. We solve for the right number of subspaces by transforming this problem into a Facility Location problem with global cost and solve the factor graph formulation using max-product belief propagation message passing.

15.
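Multimedia data continue to grow explosively and are increasingly heterogeneous in source and structure, so research on cross-modal learning has gradually attracted attention from academia and industry. Cross-modal representation and cross-modal generation are the two core foundational problems of cross-modal learning. Cross-modal representation aims to exploit the complementarity among modalities and eliminate the redundancy between them, thereby obtaining more effective feature representations. Cross-modal generation relies on the semantic consistency between modalities to convert data from one modality into the form of another, which helps improve transfer between modalities. This paper systematically analyses important recent research progress, both international and domestic, in cross-modal representation and generation, covering traditional cross-modal representation learning, representation learning with large multimodal models, image-to-text cross-modal conversion, and cross-modal image generation. Traditional cross-modal representation learning is discussed in terms of unified cross-modal representations and coordinated cross-modal representations. Representation learning with large multimodal models focuses on Transformer-based models. Image-to-text conversion covers developments in semantic description of images and videos, semantic analysis of video captions, and visual question answering. Cross-modal image generation is reviewed from three angles: joint cross-modal representation of information from different modalities, cross-modal image generation techniques, and pretraining-based domain-specific image generation. The paper reviews the challenges in each of these subfields in detail, compares progress in China and abroad, and traces the development of the field and its research frontiers. Finally, based on this analysis, it discusses development trends and potential breakthroughs in cross-modal representation and generation.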

16.
Learning identity with radial basis function networks   Cited by: 11 (self-citations: 0, by others: 11)
Radial basis function (RBF) networks are compared with other neural network techniques on a face recognition task for applications involving identification of individuals using low-resolution video information. The RBF networks are shown to exhibit useful shift, scale and pose (y-axis head rotation) invariance after training when the input representation is made to mimic the receptive field functions found in early stages of the human vision system. In particular, representations based on difference of Gaussian (DoG) filtering and Gabor wavelet analysis are compared. Extensions of the techniques to the case of image sequence analysis are described and a time delay (TD) RBF network is used for recognising simple movement-based gestures. Finally, we discuss how these techniques can be used in real-life applications that require recognition of faces and gestures using low-resolution video images.

17.
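To obtain more discriminative cross-modal representations effectively, this paper proposes a cross-modal representation learning method based on a multi-negative contrastive mechanism, called supervised contrastive cross-modal representation learning (SCCMRL), and applies it to the visual and auditory modalities. SCCMRL extracts visual and auditory features with a visual encoder and an audio encoder, respectively, and uses a supervised contrastive loss to contrast each sample with multiple negatives, so that visual and auditory features of the same class are pulled closer together and those of different classes are pushed further apart. The method also introduces a center loss and a label loss to further ensure modality consistency and semantic discriminability across the cross-modal representations. To validate SCCMRL, a cross-modal retrieval system was built on the method and cross-modal retrieval experiments were carried out on the Sub_URMP and XmediaNet datasets. The results show that SCCMRL achieves higher mAP than commonly used cross-modal retrieval methods and confirm that cross-modal representation learning under a multi-negative contrastive mechanism is feasible.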
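The supervised contrastive loss mentioned in this abstract can be illustrated with a generic sketch. The code below implements a widely used form of the supervised contrastive loss over a mixed batch of embeddings (for example, visual and audio features placed in one list); it is an assumption that SCCMRL uses a loss of this general shape, and the function name, the temperature value and the toy data are hypothetical. The center loss and label loss from the abstract are not shown.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of nonzero embeddings.

    For each anchor, samples sharing its label are positives and all other
    samples act as negatives. Embeddings are L2-normalised first.
    """
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    total, anchors = 0.0, 0
    for i, anchor in enumerate(normed):
        positives = [p for p in range(len(normed))
                     if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {a: sum(x * y for x, y in zip(anchor, normed[a])) / temperature
                  for a in range(len(normed)) if a != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total -= sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two "visual" and two "audio" embeddings in a shared 2-D space.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = supervised_contrastive_loss(batch, [0, 0, 1, 1])
misaligned_loss = supervised_contrastive_loss(batch, [0, 1, 0, 1])
```

The loss is small when same-class embeddings already lie close together and larger when the labels group distant embeddings, which is the pull-together, push-apart behaviour the abstract describes.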

18.
Grammatical evolution (GE) is a form of grammar-based genetic programming. A particular feature of GE is that it adopts a distinction between the genotype and phenotype similar to that which exists in nature by using a grammar to map between the genotype and phenotype. Two variants of genotype representation are found in the literature, namely, binary and integer forms. For the first time we analyse and compare these two representations to determine if one has a performance advantage over the other. As such this study seeks to extend our understanding of GE by examining the impact of different genotypic representations in order to determine whether certain representations, and associated diversity-generation operators, improve GE’s efficiency and effectiveness. Four mutation operators using two different representations, binary and gray code representation, are investigated. The differing combinations of representation and mutation operator are tested on three benchmark problems. The results provide support for the use of an integer-based genotypic representation as the alternative representations do not exhibit better performance, and the integer representation provides a statistically significant advantage on one of the three benchmarks. In addition, a novel wrapping operator for the binary and gray code representations is examined, and it is found that across the three problems examined there is no general trend to recommend the adoption of an alternative wrapping operator. The results also back up earlier findings which support the adoption of wrapping.
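The genotype-to-phenotype mapping and the wrapping operator referred to in this abstract can be sketched as follows. This is a minimal illustration of the standard GE mapping with an integer genotype, in which each codon selects a production for the leftmost nonterminal by taking the codon modulo the number of available productions; the toy grammar, the function name and the wrap limit are hypothetical and are not taken from the paper.

```python
# Toy grammar: each nonterminal maps to a list of alternative productions.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["x"], ["1"]],
    "<op>": [["+"], ["*"]],
}


def map_genotype(genome, grammar=GRAMMAR, start="<expr>", max_wraps=2):
    """Map a list of integer codons to a sentence, or None if mapping fails.

    When the codons run out, reading restarts from the first codon
    ("wrapping"), at most max_wraps times.
    """
    if not genome:
        return None
    symbols, output = [start], []
    index, wraps = 0, 0
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in grammar:
            output.append(symbol)  # terminal: emit as-is
            continue
        if index == len(genome):
            if wraps == max_wraps:
                return None        # still incomplete after all allowed wraps
            index, wraps = 0, wraps + 1
        choices = grammar[symbol]
        production = choices[genome[index] % len(choices)]
        index += 1
        symbols = list(production) + symbols
    return " ".join(output)
```

For example, the genome `[0, 1, 1, 2]` maps to `x * 1` without wrapping, while `[0, 1]` needs one wrap and maps to `x + x`. A binary or Gray-coded genotype would first be decoded into such integer codons before this mapping is applied.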

19.
Predicting the fold, or approximate 3D structure, of a protein from its amino acid sequence is an important problem in biology. The homology modeling approach uses a protein database to identify fold-class relationships by sequence similarity. The main limitation of this method is that some proteins with similar structures appear to have very different sequences, which we call the hidden-homology problem. As in other real-world domains for machine learning, this difficulty may be caused by a low-level representation. Learning in such domains can be improved by using domain knowledge to search for representations that better match the inductive bias of a preferred algorithm. In this domain, knowledge of amino acid properties can be used to construct higher-level representations of protein sequences. In one experiment using a 179-protein data set, the accuracy of fold-class prediction was increased from 77.7% to 81.0%. The search results are analyzed to refine the grouping of small residues suggested by Dayhoff. Finally, an extension to the representation incorporates sequential context directly, which can express finer relationships among the amino acids. The methods developed in this domain are generalized into a framework that suggests several systematic roles for domain knowledge in machine learning. Knowledge may define both a space of alternative representations and a strategy for searching this space. The search results may be summarized to extract feedback for revising the domain knowledge.

20.
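The traditional short-time Fourier analysis of speech not only has to assume that speech is quasi-stationary, but also involves a trade-off between time resolution and frequency resolution. This paper adopts a generalized time-frequency distribution with a cone-shaped kernel (CK-GTFD) and focuses on its transient response and its ability to suppress cross-terms. The results show that it achieves good time resolution and frequency resolution simultaneously and also describes multi-component signals accurately. Finally, descriptions of plosive-to-vowel and vowel-to-nasal-consonant transition segments of speech demonstrate its advantages in describing speech formant spectra, determining glottal closure instants, and segmenting consonants from vowels, laying a foundation for speech feature extraction and recognition.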
