Similar literature
 10 similar documents found (search time: 62 ms)
1.
Individual cells that respond preferentially to particular objects have been found in the ventral visual pathway. How the brain is able to develop neurons that exhibit these object selective responses poses a significant challenge for computational models of object recognition. Typically, many objects make up a complex natural scene and are never presented in isolation. Nonetheless, the visual system is able to build invariant object selective responses. In this paper, we present a model of the ventral visual stream, VisNet, which can solve the problem of learning object selective representations even when multiple objects are always present during training. Past research with the VisNet model has shown that the network can operate successfully in a similar training paradigm, but only when training comprises many different object pairs. Numerous pairings are required for statistical decoupling between objects. In this research, we show for the first time that VisNet is capable of utilizing the statistics inherent in independent rotation to form object selective representations when training with just two objects, always presented together. Crucially, our results show that in a dependent rotation paradigm, the model fails to build object selective representations and responds as if the two objects are in fact one. If the objects begin to rotate independently, the network forms representations for each object separately.

2.
3.
Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision. The recent application of deep representation learning has driven this field into a new stage of development. In this paper, we summarize three aspects of the progress of research on semantic image parsing, i.e., category-level semantic segmentation, instance-level semantic segmentation, and beyond segmentation. Specifically, we first review the general frameworks for each task and introduce the relevant variants. The advantages and limitations of each method are also discussed. Moreover, we present a comprehensive comparison of different benchmark datasets and evaluation metrics. Finally, we explore the future trends and challenges of semantic image parsing.

4.
Extracting semantic video objects   Cited by: 6 (self-citations: 0, citations by others: 6)
We present an accurate and user-interactive semantic video object (SVO) extraction system. Although we also obtain an SVO with an accurate boundary by integrating temporal and spatial information, our approach is quite different from others' work. Instead of fusing spatial and temporal segmentations on the first or all the frames of a video sequence, our system adaptively performs spatial and temporal segmentation and fusion when necessary. To achieve this, our system detects the variations between successive frames. We only need to fuse the spatial and temporal segmentation when a large variation occurs. Otherwise, the system tracks the previous SVO's boundary. We find this simple method efficient in both speed and accuracy. Since the temporal segmentation, spatial segmentation, spatio-temporal fusion, and boundary tracking all employ simple algorithms, our system has a low computational complexity.

5.
In multi-task learning, there are roughly two approaches to discovering representations. The first is to discover task relevant representations, i.e., those that compactly represent solutions to particular tasks. The second is to discover domain relevant representations, i.e., those that compactly represent knowledge that remains invariant across many tasks. In this article, we propose a new approach to multi-task learning that captures domain-relevant knowledge by learning potential-based shaping functions, which augment a task’s reward function with artificial rewards. We address two key issues that arise when deriving potential functions. The first is what kind of target function the potential function should approximate; we propose three such targets and show empirically that which one is best depends critically on the domain and learning parameters. The second issue is the representation for the potential function. This article introduces the notion of $k$-relevance, the expected relevance of a representation on a sample sequence of $k$ tasks, and argues that this is a unifying definition of relevance of which both task and domain relevance are special cases. We prove formally that, under certain assumptions, $k$-relevance converges monotonically to a fixed point as $k$ increases, and use this property to derive Feature Selection Through Extrapolation of $k$-relevance (FS-TEK), a novel feature-selection algorithm. We demonstrate empirically the benefit of FS-TEK on artificial domains.
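For readers unfamiliar with the term, potential-based shaping is usually written in the following form in the general reinforcement-learning literature. The abstract itself does not spell out a formula, so the symbols here ($R$, $R'$, $F$, $\Phi$, $\gamma$) are assumptions used only for illustration, not the article's own notation.

```latex
% Shaped reward: the task reward R plus an artificial shaping reward F.
R'(s, a, s') = R(s, a, s') + F(s, a, s')

% Potential-based form: F is the discounted difference of a potential
% function \Phi over states, with discount factor \gamma.
F(s, a, s') = \gamma \, \Phi(s') - \Phi(s)
```

Under this form, the questions the abstract raises concern what target $\Phi$ should approximate and which features are used to represent it.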

6.
Learning overcomplete representations   Cited by: 38 (self-citations: 0, citations by others: 38)
In an overcomplete basis, the number of basis vectors is greater than the dimensionality of the input, and the representation of an input is not a unique combination of basis vectors. Overcomplete representations have been advocated because they have greater robustness in the presence of noise, can be sparser, and can have greater flexibility in matching structure in the data. Overcomplete codes have also been proposed as a model of some of the response properties of neurons in primary visual cortex. Previous work has focused on finding the best representation of a signal using a fixed overcomplete basis (or dictionary). We present an algorithm for learning an overcomplete basis by viewing it as a probabilistic model of the observed data. We show that overcomplete bases can yield a better approximation of the underlying statistical distribution of the data and can thus lead to greater coding efficiency. This can be viewed as a generalization of the technique of independent component analysis and provides a method for Bayesian reconstruction of signals in the presence of noise and for blind source separation when there are more sources than mixtures.

7.
Spence I. Human Factors, 2004, 46(4): 738-747
Information displays commonly use 2-D and 3-D objects even though the numbers represented are 1-D. This practice may be problematic because the psychophysical relation between perceived and physical magnitudes is generally nonlinear for areas and volumes. Nonetheless, this research shows that apparent 2-D and 3-D objects can produce linear psychophysical functions if only one dimension shows variation. Processing time increases with the number of dimensions in the objects that show variation, not with the apparent dimensionality. Indeed, when only one dimension showed variation, apparent 3-D objects were judged more quickly than were apparent 2-D or 1-D objects. These results present a challenge for computational models of size perception and have implications for the design of information displays. Actual or potential applications of this research include the design and use of statistical graphs and information displays; objects that display variation in more than one dimension should not be used to represent single (1-D) numerical variables if they are to be judged accurately and rapidly.

8.
An algorithm is described for partitioning intersecting polyhedrons into disjoint pieces and, more generally, removing intersections from sets of planar polygons embedded in three-space. Polygons, or faces, need not be convex and may contain multiple holes. Intersections are removed by considering pairs of faces and slicing the faces apart along their regions of intersection. To reduce the number of face pairs examined, bounding boxes around groups of faces are checked for overlap. The intersection algorithm also computes set-theoretic operations on polyhedrons. Information gathered during face cutting is used to determine which portions of the original boundaries may be present in the result of an intersection, a union, or a difference of solids. The method includes provisions to detect, and in some cases overcome, the effects of numerical inaccuracy on the topological decisions that the algorithm must make. The regions in which ambiguous results are possible are flagged so that the user can take appropriate action.
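The bounding-box pruning step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say what kind of boxes are used, so axis-aligned boxes are assumed here, and the names `bounding_box`, `boxes_overlap`, and the tolerance `eps` are hypothetical, not taken from the paper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner); axis-aligned by assumption


def bounding_box(points: Sequence[Point]) -> Box:
    """Return the axis-aligned bounding box of a non-empty set of 3-D points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_overlap(a: Box, b: Box, eps: float = 0.0) -> bool:
    """Return True if the two boxes overlap on every axis.

    Touching boxes count as overlapping. `eps` is a hypothetical tolerance
    that widens the test slightly to allow for numerical inaccuracy.
    """
    (a_min, a_max), (b_min, b_max) = a, b
    return all(
        a_min[i] <= b_max[i] + eps and b_min[i] <= a_max[i] + eps
        for i in range(3)
    )


# Vertices of three example faces (or groups of faces).
face_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
face_b = [(2.0, 2.0, 2.0), (3.0, 3.0, 3.0), (3.0, 2.0, 2.0)]
face_c = [(0.5, 0.5, -1.0), (0.5, 0.5, 1.0), (0.6, 0.5, 1.0)]

# Only pairs whose boxes overlap would need the expensive face-slicing test.
needs_check_ab = boxes_overlap(bounding_box(face_a), bounding_box(face_b))
needs_check_ac = boxes_overlap(bounding_box(face_a), bounding_box(face_c))
```

A pair whose boxes do not overlap cannot intersect, so it can be skipped; a pair whose boxes do overlap may or may not intersect and still needs the exact test.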

9.
10.
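Implementation of a Report Tool Based on Semantic Objects   Cited by: 3 (self-citations: 0, citations by others: 3)
Yin Cheng (尹呈), Xu Lizhen (徐立臻). Computer Engineering and Design (《计算机工程与设计》), 2006, 27(16): 3048-3050, 3054
Existing report tools have some difficulty handling Chinese-style reports. The strengths and weaknesses of report tools from China and abroad are analyzed, and a new report model is proposed. Introducing the concept of semantic objects hides the heterogeneity of complex data sources, and defining a nestable description language handles the complex and changeable formats of reports. Finally, an example shows that the model meets the requirements of Chinese-style reports while also offering extensibility, reusability, and ease of operation.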
