Similar Documents
Found 20 similar documents (search took 31 ms)
1.
Under natural viewing conditions, small movements of the eye, head and body prevent the maintenance of a steady direction of gaze. It is known that stimuli tend to fade when they are stabilized on the retina for several seconds. However, it is unclear whether the physiological motion of the retinal image serves a visual purpose during the brief periods of natural visual fixation. This study examines the impact of fixational instability on the statistics of the visual input to the retina and on the structure of neural activity in the early visual system. We show that fixational instability introduces a component in the retinal input signals that, in the presence of natural images, lacks spatial correlations. This component strongly influences neural activity in a model of the LGN. It decorrelates cell responses even if the contrast sensitivity functions of simulated cells are not perfectly tuned to counter-balance the power-law spectrum of natural images. A decorrelation of neural activity at the early stages of the visual system has been proposed to be beneficial for discarding statistical redundancies in the input signals. The results of this study suggest that fixational instability might contribute to the establishment of efficient representations of natural stimuli.
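The decorrelation argument can be illustrated numerically: for a signal with the power-law (roughly 1/f²) spectrum of natural images, the input component generated by a small fixational displacement is essentially the local spatial difference, whose spectrum is nearly flat. A minimal 1-D sketch under those assumptions (synthetic power-law signal, one-sample jitter; all constants illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D luminance profile with an approximate 1/f^2 power spectrum,
# the hallmark spatial statistic of natural images.
n = 4096
f = np.fft.rfftfreq(n, d=1.0)
amp = np.zeros_like(f)
amp[1:] = 1.0 / f[1:]                      # amplitude ~ 1/f  =>  power ~ 1/f^2
image = np.fft.irfft(amp * np.exp(1j * rng.uniform(0, 2 * np.pi, f.size)), n=n)

def lag1_corr(x):
    """Normalized autocorrelation at spatial lag 1."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# A small fixational displacement makes each receptor see the difference
# between two slightly shifted copies of the image; for a one-sample
# jitter this dynamic component is just the spatial difference signal.
jitter_component = image - np.roll(image, 1)

print(lag1_corr(image))             # near 1: the static image is strongly correlated
print(lag1_corr(jitter_component))  # much smaller: the jitter-driven part is nearly white
```

The static image shows near-unity neighbor correlation while the jitter-driven component does not, which is the whitening effect the abstract describes.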

2.
Under natural viewing conditions, the physiological instability of visual fixation keeps the projection of the stimulus on the retina in constant motion. After eye opening, chronic exposure to a constantly moving retinal image might influence the experience-dependent refinement of cell response characteristics. The results of previous modeling studies have suggested a contribution of fixational instability to the Hebbian maturation of the receptive fields of V1 simple cells (Rucci, Edelman, & Wray, 2000; Rucci & Casile, 2004). This letter examines the origins of such a contribution. Using quasilinear models of lateral geniculate nucleus units and V1 simple cells, we derive analytical expressions for the second-order statistics of thalamocortical activity before and after eye opening. We show that in the presence of natural stimulation, fixational instability introduces a spatially uncorrelated signal in the retinal input, which strongly influences the structure of correlated activity in the model. This input signal produces a regime of thalamocortical activity similar to that present before eye opening and compatible with the Hebbian maturation of cortical receptive fields.

3.
Based on human retinal sampling distributions and eye movements, a sequential resolution image preprocessor is developed. Combined with a nearest neighbor classifier, this preprocessor provides an efficient image classification method, the sequential resolution nearest neighbor (SRNN) classifier. The human eye has a typical fixation sequence that exploits the nonuniform sampling distribution of its retina. If the retinal resolution is not sufficient to identify an object, the eye moves in such a way that the projection of the object falls onto a retinal region with a higher sampling density. Similarly, the SRNN classifier uses a sequence of increasing resolutions until a final class decision is made. Experimental results on texture segmentation show that the preprocessor used in the SRNN classifier is considerably faster than traditional multiresolution algorithms, which use all available resolution levels to analyze the input data.
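The coarse-to-fine decision logic can be sketched in a few lines. The block-averaging downsampler and the margin-based ambiguity criterion below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def downsample(x, factor):
    """Block-average a 1-D feature vector to a coarser resolution."""
    return x[: len(x) // factor * factor].reshape(-1, factor).mean(axis=1)

def srnn_classify(query, train_X, train_y, factors=(8, 4, 2, 1), margin=0.2):
    """Sequential-resolution nearest neighbor (sketch).

    Classify at the coarsest resolution first and move to a finer one
    only while the decision is ambiguous, i.e. while the two nearest
    training samples disagree and are almost equally close
    (a hypothetical ambiguity criterion).
    """
    for factor in factors:
        q = downsample(query, factor)
        X = np.stack([downsample(x, factor) for x in train_X])
        d = np.linalg.norm(X - q, axis=1)
        order = np.argsort(d)
        best, second = order[0], order[1]
        # Unambiguous: nearest two agree, or the runner-up is clearly worse.
        if train_y[best] == train_y[second] or d[second] - d[best] > margin * d[best]:
            return train_y[best], factor
    return train_y[best], factors[-1]

# Two well-separated synthetic texture classes; an easy query is resolved
# without ever descending to full resolution.
rng = np.random.default_rng(1)
train_X = [rng.normal(0.0, 0.1, 16) for _ in range(5)] + \
          [rng.normal(1.0, 0.1, 16) for _ in range(5)]
train_y = [0] * 5 + [1] * 5
label, used_factor = srnn_classify(rng.normal(1.0, 0.1, 16), train_X, train_y)
print(label)   # → 1
```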

4.
Human eye-head co-ordination in natural exploration
During natural behavior humans continuously adjust their gaze by moving head and eyes, yielding rich dynamics of the retinal input. Sensory coding models, however, typically assume visual input as smooth or a sequence of static images interleaved by volitional gaze shifts. Are these assumptions valid during free exploration behavior in natural environments? We used an innovative technique to simultaneously record gaze and head movements in humans, who freely explored various environments (forest, train station, apartment). Most movements occur along the cardinal axes, and the predominance of vertical or horizontal movements depends on the environment. Eye and head movements co-occur more frequently than their individual statistics predict under an independence assumption. The majority of co-occurring movements point in opposite directions, consistent with a gaze-stabilizing role of eye movements. Nevertheless, a substantial fraction of eye movements point in the same direction as co-occurring head movements. Even under the most conservative assumptions, saccadic eye movements alone cannot account for these synergistic movements. Hence nonsaccadic eye movements that interact synergistically with head movements to adjust gaze cannot be neglected in natural visual input. Natural retinal input is continuously dynamic, and cannot be faithfully modeled as a mere sequence of static frames with interleaved large saccades.

5.
Advanced Robotics, 2013, 27(5): 527-546
Prediction of dynamic features is an important task for determining the manipulation strategies of an object. This paper presents a technique for predicting the dynamics of objects relative to the robot's motion from visual images. During the training phase, the authors use the recurrent neural network with parametric bias (RNNPB) to self-organize the dynamics of objects manipulated by the robot into the PB space. The acquired PB values, static images of objects and robot motor values are input into a hierarchical neural network to link the images to dynamic features (PB values). The neural network extracts prominent features that each induce object dynamics. For prediction of the motion sequence of an unknown object, the static image of the object and the robot motor value are input into the neural network to calculate the PB values. By inputting the PB values into the closed-loop RNNPB, the predicted movements of the object relative to the robot motion are calculated recursively. Experiments were conducted with the humanoid robot Robovie-IIs pushing objects at different heights. The results confirmed that the technique predicts the dynamics of target objects effectively.

6.
Statistically efficient processing schemes focus the resources of a signal processing system on the range of statistically probable signals. Relying on the statistical properties of retinal motion signals during ego-motion, we propose a nonlinear processing scheme for retinal flow. It maximizes the mutual information between the visual input and its neural representation, and distributes the processing load uniformly over the neural resources. We derive predictions for the receptive fields of motion sensitive neurons in velocity space. The properties of the receptive fields are tightly connected to their position in the visual field and to their preferred retinal velocity. The velocity tuning properties resemble those of neurons in the motion processing pathway of the primate brain.
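One way to read the "uniform processing load" constraint is histogram equalization: channels placed at equal quantiles of the retinal-velocity prior each handle the same share of probable inputs, so channels crowd where speeds are common. A toy sketch, assuming (purely for illustration) an exponential speed prior; the channel count and mean speed are made-up values:

```python
import numpy as np

# Histogram-equalization placement of N velocity channels: if retinal
# speeds follow a prior p(v) (an exponential prior is assumed here), an
# information-maximizing population with uniform load places the
# preferred velocities at equal quantiles of that prior.
n_channels = 8
mean_speed = 4.0                                   # deg/s, illustrative
quantiles = (np.arange(n_channels) + 0.5) / n_channels
preferred = -mean_speed * np.log1p(-quantiles)     # inverse CDF of Exp(mean_speed)

print(np.round(preferred, 2))   # spacing widens toward rare, fast velocities
```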

7.
A biologically inspired visual system capable of motion detection and pursuit motion is implemented using a Discrete Leaky Integrate-and-Fire (DLIF) neuron model. The system consists of a visual world, a virtual retina, the neural network circuitry (DLIF) to process the information, and a set of virtual eye muscles that serve to move the input area (visual field) of the retina within the visual world. Temporal aspects of the DLIF model are heavily exploited, including spike propagation latency, relative spike timing, and leaky potential integration. A novel technique for motion detection is employed, utilizing coincidence detection aspects of the DLIF and relative spike timing. The system as a whole encodes information using the relative spike timing of individual action potentials as well as rate-coded spike trains. Experimental results are presented in which the motion of objects is detected and tracked in real and animated video. Pursuit is successful along both linear and sinusoidal paths, including paths with changes in object velocity. The visual system exhibits dynamic overshoot correction, heavily exploiting neural network characteristics. System performance is within the bounds of real-time applications.
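A minimal discrete leaky integrate-and-fire update of the kind such a system builds on; the constants are illustrative, and the demo shows how input strength is converted into first-spike latency, i.e. relative spike timing:

```python
def dlif_spikes(inputs, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """Discrete leaky integrate-and-fire neuron (sketch).

    The membrane potential leaks toward rest with time constant `tau`,
    integrates the input current each step, and emits a spike (resetting
    the potential) whenever it crosses `v_thresh`.  Parameter values are
    illustrative, not taken from the paper.
    """
    v = 0.0
    spikes = []
    for t, i_in in enumerate(inputs):
        v += dt * (-v / tau + i_in)     # leaky integration
        if v >= v_thresh:
            spikes.append(t)            # spike time: relative timing carries information
            v = v_reset
    return spikes

# Stronger input -> earlier first spike: latency itself encodes intensity.
weak = dlif_spikes([0.06] * 100)
strong = dlif_spikes([0.2] * 100)
print(strong[0], weak[0])   # → 5 34
```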

8.
We report on a computational model of retinal motion sensitivity based on correlation-based motion detectors. We simulate object motion detection in the presence of retinal slip caused by the salamander's head movements during locomotion. Our study offers new insights into object motion sensitive ganglion cells in the salamander retina. A sigmoidal transformation of the spatially and temporally filtered retinal image substantially improves the sensitivity of the system in detecting a small target moving in place against a static natural background in the presence of comparatively large, fast simulated eye movements, but is detrimental to the direction-selectivity of the motion detector. The sigmoid has insignificant effects on detector performance in simulations of slow, high contrast laboratory stimuli. These results suggest that the sigmoid reduces the system's noise sensitivity.
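A correlation-based (Reichardt-type) detector of the kind the model builds on can be sketched in a few lines; the sigmoid parameters are illustrative, and the demo only checks that the direction signal survives the nonlinearity:

```python
import numpy as np

def reichardt(frames, delay=1):
    """Correlation-based (Reichardt) motion detector on a 1-D image row.

    Each detector correlates the delayed signal at one photoreceptor with
    the current signal at its neighbor, in both directions, and reports
    the difference; positive output means rightward motion.
    """
    frames = np.asarray(frames, dtype=float)
    now, past = frames[delay:], frames[:-delay]
    right = past[:, :-1] * now[:, 1:]   # left receptor (delayed) x right receptor
    left = now[:, :-1] * past[:, 1:]    # right receptor (delayed) x left receptor
    return float((right - left).sum())

def sigmoid(x, gain=8.0, offset=0.5):
    """Illustrative sigmoidal contrast transform applied before correlation."""
    return 1.0 / (1.0 + np.exp(-gain * (x - offset)))

# A bright bar stepping rightward across a dark background.
frames = np.zeros((5, 20))
for t in range(5):
    frames[t, 4 + t] = 1.0

print(reichardt(frames) > 0)             # → True: rightward motion detected
print(reichardt(sigmoid(frames)) > 0)    # → True: direction sign survives the sigmoid
print(reichardt(frames[:, ::-1]) < 0)    # → True: leftward motion flips the sign
```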

9.
With the proliferation of video data, video summarization is an ideal tool for users to browse video content rapidly. In this paper, we propose a novel foveated convolutional neural network for dynamic video summarization. We are the first to integrate gaze information into a deep learning network for video summarization. Foveated images are constructed based on subjects' eye movements to represent the spatial information of the input video. Multi-frame motion vectors are stacked across several adjacent frames to convey the motion clues. To evaluate the proposed method, experiments are conducted on two video summarization benchmark datasets. The experimental results validate the effectiveness of the gaze information for video summarization, even though the eye movements were collected from subjects other than those who generated the summaries. Empirical validations also demonstrate that our foveated convolutional neural network achieves state-of-the-art performance on these benchmark datasets.

10.
Spatiotemporal Visual Considerations for Video Coding
Human visual sensitivity varies not only with the spatial frequency of image patterns but also with their velocity. Moreover, the loss of visual sensitivity due to object motion can be partly compensated by eye movements. Removing the psychovisual redundancies in both the spatial and temporal frequency domains facilitates an efficient coder without perceptual degradation. Motivated by this, a visual measure is proposed for video compression. The novelty of this analysis lies in combining three visual factors: a motion attention model, an unconstrained eye-movement-incorporated spatiovelocity visual sensitivity model, and a visual masking model. For each motion-unattended macroblock, the retinal velocity is evaluated so that discrete cosine transform coefficients to which the human visual system has low sensitivity are identified with the aid of the eye-movement-incorporated spatiovelocity visual model. Based on the masking thresholds of those low-sensitivity coefficients, a spatiotemporal distortion masking measure is determined. Accordingly, quantization parameters at the macroblock level are adjusted on the basis of this measure. Experiments with an H.264 coder demonstrate the effectiveness of the proposed scheme in improving coding performance without degrading picture quality.

11.
In a moving agent, the different apparent motion of objects located at various distances provides an important source of depth information. While motion parallax is evident for large translations of the agent, a small parallax also occurs in most head/eye systems during rotations of the cameras. A similar parallax is also present in the human eye, so that a redirection of gaze shifts the projection of an object on the retina by an amount that depends not only on the amplitude of the rotation, but also on the distance of the object with respect to the observer. This study examines the accuracy of distance estimation on the basis of the parallax produced by camera rotations. Sequences of human eye movements were used to control the motion of a pan/tilt system specifically designed to reproduce the oculomotor parallax present in the human eye. We show that the oculomotor strategies by which humans scan visual scenes produce parallaxes that provide accurate estimation of distance. This information simplifies challenging visual tasks such as image segmentation and figure/ground segregation.
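The geometry behind oculomotor parallax can be sketched in 2-D: because the nodal point is displaced from the center of rotation, a pure gaze rotation shifts a target's retinal projection by an amount that depends on its distance, which can then be inverted. The 6 mm offset and the bisection-based inversion below are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

def retinal_eccentricity(distance, theta, offset=0.006):
    """Angular position of a point target after an eye/camera rotation by
    `theta` (radians) about a center displaced `offset` meters behind the
    nodal point (illustrative 2-D geometry).

    The target starts on the optical axis at `distance` meters.  Because
    the nodal point translates during the rotation, the resulting
    eccentricity depends on the target's distance: that residual is the
    oculomotor parallax.
    """
    nodal = np.array([offset * np.cos(theta), offset * np.sin(theta)])
    to_target = np.array([distance, 0.0]) - nodal
    bearing = np.arctan2(to_target[1], to_target[0])
    return bearing - theta          # angle relative to the new optical axis

def estimate_distance(observed, theta, lo=0.05, hi=10.0):
    """Recover distance from the measured eccentricity by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if retinal_eccentricity(mid, theta) > observed:
            hi = mid                # eccentricity grows with distance: mid is too far
        else:
            lo = mid
    return 0.5 * (lo + hi)

theta = np.deg2rad(10.0)
observed = retinal_eccentricity(0.5, theta)            # target at 0.5 m
print(round(estimate_distance(observed, theta), 3))    # → 0.5
```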

12.
Current digital image/video storage, transmission and display technologies use uniformly sampled images. The human retina, on the other hand, has a nonuniform sampling density that decreases dramatically as the solid angle from the visual fixation axis increases. There is therefore a sampling mismatch between the two. This paper introduces retinally reconstructed images (RRI), a representation of digital images that enables a resolution match with the retina. To create an RRI, the size of the input image, the viewing distance and the fixation point should be known. In the coding phase, we compute the "retinal codes", which consist of the retinal sampling locations onto which the image projects, together with the retinal outputs at these locations. In the decoding phase, we use the backprojection of the retinal codes onto the input image grid as B-spline control coefficients, in order to construct a 3D B-spline surface with nonuniform resolution properties. An RRI is then created by mapping the B-spline surface onto a uniform grid, using triangulation. Transmitting or storing the "retinal codes" instead of the full resolution images enables up to two orders of magnitude data compression, depending on the resolution of the input image, the size of the input image and the viewing distance. The data reduction capability of retinal codes and RRI is promising for digital video storage and transmission applications. However, the computational burden can be substantial in the decoding phase.
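The data reduction comes from eccentricity-dependent sample spacing. A 1-D sketch with a linear density falloff (the spacing law and its constants are illustrative stand-ins, not the paper's retinal model):

```python
import numpy as np

def retinal_sample_positions(half_width_deg=30.0, foveal_spacing_deg=0.01, e2=2.0):
    """1-D retina-like sampling grid (sketch).

    Sample spacing grows linearly with eccentricity e as
    spacing(e) = foveal_spacing * (1 + e / e2), a common approximation to
    the falloff of receptor density; `e2` and the foveal spacing are
    illustrative constants.
    """
    positions = [0.0]
    while positions[-1] < half_width_deg:
        e = positions[-1]
        positions.append(e + foveal_spacing_deg * (1.0 + e / e2))
    return np.array(positions)

samples = retinal_sample_positions()
uniform = int(30.0 / 0.01)      # a uniform grid at foveal resolution
print(len(samples), uniform)    # hundreds of nonuniform samples vs thousands of pixels
```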

13.
To understand possible strategies of temporal spike coding in the central nervous system, we study functional neuromimetic models of visual processing for static images. We will first present the retinal model which was introduced by Van Rullen and Thorpe and which represents the multiscale contrast values of the image using an orthonormal wavelet transform. These analog values activate a set of spiking neurons which each fire once to produce an asynchronous wave of spikes. According to this model, the image may be progressively reconstructed from this spike wave thanks to regularities in the statistics of the coefficients determined with natural images. Here, we study mathematically how the quality of information transmission carried by this temporal representation varies over time. In particular, we study how these regularities can be used to optimize information transmission by using a form of temporal cooperation of neurons to code analog values. The original model used wavelet transforms that are close to orthogonal. However, the selectivity of realistic neurons overlap, and we propose an extension of the previous model by adding a spatial cooperation between filters. This model extends the previous scheme for arbitrary (and possibly nonorthogonal) representations of features in the images. In particular, we compared the performance of increasingly over-complete representations in the retina. Results show that this algorithm provides an efficient spike coding strategy for low-level visual processing which may adapt to the complexity of the visual input.
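The rank-order idea — each unit fires once, earliest for the largest coefficients, and the decoder exploits the ensemble regularity of coefficient magnitude per rank — can be sketched directly on abstract coefficient vectors. A Laplacian coefficient distribution and the per-rank mean-magnitude lookup are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(coeffs):
    """Rank-order code: each unit fires once, ordered by |coefficient|."""
    order = np.argsort(-np.abs(coeffs))
    return [(int(i), float(np.sign(coeffs[i]))) for i in order]

def decode(spikes, rank_magnitudes, n):
    """Progressive reconstruction: the k-th spike is assigned the average
    magnitude observed at rank k on a training ensemble (the statistical
    regularity this scheme exploits)."""
    est = np.zeros(n)
    for k, (i, sign) in enumerate(spikes):
        est[i] = sign * rank_magnitudes[k]
    return est

n = 64
train = np.sort(np.abs(rng.laplace(size=(500, n))))[:, ::-1]
rank_magnitudes = train.mean(axis=0)        # expected |coeff| at each rank

coeffs = rng.laplace(size=n)
spikes = encode(coeffs)
errors = [np.linalg.norm(coeffs - decode(spikes[:k], rank_magnitudes, n))
          for k in (1, 8, 32, 64)]
print(errors)   # reconstruction error falls as more of the spike wave arrives
```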

14.
Saccadic eye movements remain spatially accurate even when the target becomes invisible and the initial eye position is perturbed. The brain accomplishes this in part by remapping the remembered target location in retinal coordinates. The computation that underlies this visual remapping is approximated by vector subtraction: the original saccade vector is updated by subtracting the vector corresponding to the intervening eye movement. The neural mechanism by which vector subtraction is implemented is not fully understood. Here, we investigate vector subtraction within a framework in which eye position and retinal target position signals interact multiplicatively (gain field). When the eyes move, they induce a spatial modulation of the firing rates across a retinotopic map of neurons. The updated saccade metric can be read from the shift of the peak of the population activity across the map. This model uses a quasi-linear (half-rectified) dependence on the eye position and requires the slope of the eye position input to be negatively proportional to the preferred retinal position of each neuron. We derive analytically this constraint and study its range of validity. We discuss how this mechanism relates to experimental results reported in the frontal eye fields of macaque monkeys.
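The vector-subtraction computation that the gain-field mechanism is proposed to implement reduces to one line; the double-step numbers below are illustrative:

```python
import numpy as np

def remap(target_retinal, eye_displacement):
    """Vector-subtraction update of a remembered target.

    After an intervening eye movement, the remembered target's retinal
    coordinates are updated by subtracting the eye's displacement, so the
    next saccade stays spatially accurate.
    """
    return np.asarray(target_retinal, float) - np.asarray(eye_displacement, float)

# Double-step paradigm: two targets are flashed before any eye movement.
t1 = np.array([8.0, 0.0])       # first target, retinal coordinates (deg)
t2 = np.array([8.0, 6.0])       # second target, retinal coordinates (deg)

saccade1 = t1                   # first saccade lands on t1
saccade2 = remap(t2, saccade1)  # second saccade must subtract the first movement
print(saccade2)                 # → [0. 6.]: straight up, not the original retinal vector
```

In the population model, the same subtraction appears as a shift of the activity peak across the retinotopic map, produced by the eye-position gain modulation.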

15.
M. Maltz & D. Shinar, Human Factors, 1999, 41(1): 15-25
This 2-part study focuses on eye movements to explain driving-related visual performance in younger and older persons. In the first task, participants' eye movements were monitored as they viewed a traffic scene image with a numeric overlay and visually located the numbers in their sequential order. The results showed that older participants had significantly longer search episodes than younger participants, and that the visual search of older adults was characterized by more fixations and shorter saccades, although the average fixation durations remained the same. In the second task, participants viewed pictures of traffic scenes photographed from the driver's perspective. Their task was to assume the role of the driver and regard the image accordingly. Results in the second task showed that older participants allocated a larger percentage of their visual scan time to a small subset of areas in the image, whereas younger participants scanned the images more evenly. Also, older participants revisited the same areas and younger participants did not. The results suggest how aging might affect the efficacy of visual information processing. Potential applications of this research include training older drivers for a more effective visual search, and providing older drivers with redundant information in case some information is missed.

16.
Objective: Automated analysis of retinal vessel health provides an important reference for rapid, non-invasive diagnosis of diabetes, cardio- and cerebrovascular disease, and a variety of ophthalmic disorders. The complex structure of the vascular network in retinal images and the uneven background brightness make accurate automatic extraction of vessel regions difficult. This paper achieves high-precision segmentation of retinal vessels with a U-net deep neural network that has a symmetric fully convolutional architecture. Method: Building on the hierarchical symmetric structure of U-net and the dense connectivity of Dense-net, an improved deep neural network model suited to precise retinal vessel extraction is proposed. First, a whitening preprocessing step attenuates the uneven brightness of the original color fundus images and enhances the contrast of the vessel regions; the dataset is then augmented by random rotations and Gamma transformations; finally, each image is randomly divided into small patches to reduce the number of model parameters and ease training. Result: The trained model was evaluated with multiple performance metrics, reaching a sensitivity of 0.7409, specificity of 0.9929, accuracy of 0.9707, and AUC (area under the curve) of 0.9171 on the DRIVE dataset. A comprehensive comparison with current mainstream methods shows that the proposed algorithm performs well on all metrics. Conclusion: To address the difficulty of high-precision automatic extraction of vessel regions in retinal images, an improved symmetric fully convolutional neural network with dense connections is proposed. The results show that the model achieves good performance in retinal vessel segmentation and has solid research and application value.
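The data preparation steps (rotation and Gamma augmentation, then random small-patch extraction) can be sketched with NumPy alone. The patch size, Gamma range, and restriction to 90-degree rotations are simplifying assumptions, not the paper's exact settings:

```python
import numpy as np

def augment_patches(image, patch=48, n_patches=8, gamma_range=(0.7, 1.5), rng=None):
    """Augmentation + patch extraction in the spirit of the pipeline above.

    Each output patch comes from a randomly rotated (multiples of 90
    degrees, to stay dependency-free), randomly Gamma-transformed copy of
    the fundus image; random cropping into small patches keeps the model
    and the training problem small.
    """
    rng = rng or np.random.default_rng()
    out = []
    for _ in range(n_patches):
        img = np.rot90(image, k=rng.integers(0, 4))     # random rotation
        gamma = rng.uniform(*gamma_range)
        img = np.clip(img, 0.0, 1.0) ** gamma           # random Gamma transform
        h, w = img.shape[:2]
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        out.append(img[y:y + patch, x:x + patch])       # random small patch
    return np.stack(out)

patches = augment_patches(np.random.default_rng(0).random((128, 128)), patch=48)
print(patches.shape)    # → (8, 48, 48)
```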

17.
A test of metabolically efficient coding in the retina
We tested the hypothesis that aspects of the neural code of retinal ganglion cells are optimized to transmit visual information at minimal metabolic cost. Under a broad ensemble of light patterns, ganglion cell spike trains consisted of sparse, precise bursts of spikes. These bursts were viewed as independent neural symbols. The noise in each burst was measured via repeated presentation of the visual stimulus, and the energy cost was estimated from the total charge flow during ganglion cell spiking. Given these costs and noise, the theory of efficient codes predicts an optimal distribution of symbol usage. Symbols that are either noisy or costly occur less frequently in this optimal code. We found good qualitative and quantitative agreement with the measured distribution of burst sizes for ganglion cells in the tiger salamander retina.

18.
The efficiency of a video coding process, as well as the accuracy of objective video quality evaluation, can be significantly improved by incorporating characteristics of the human visual system (HVS). In this paper we analyze one of these characteristics: the reduction of visual acuity due to foveated vision and object movements in a video sequence. We propose a new video quality metric, the Foveated Mean Squared Error (FMSE), that takes into account the variable resolution of the HVS across the visual field. Visual acuity is highest at the point of fixation, which falls on the fovea, the area of the retina with the highest density of photoreceptors, and decreases rapidly for image regions farther from the fixation point. FMSE also models the additional loss of spatial acuity due to motion in a video sequence. The quality measures calculated by FMSE showed a high correlation with results obtained by subjective video quality assessment.
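A sketch of the foveated weighting idea: identical squared errors count for more near fixation than in the periphery. The acuity falloff law and its constants below are illustrative stand-ins, not the paper's calibrated model (and the motion-dependent acuity term is omitted):

```python
import numpy as np

def fmse(ref, dist, fixation, half_res_ecc=2.3, px_per_deg=30.0):
    """Foveated mean squared error (sketch).

    Squared errors are weighted by an eccentricity-dependent acuity
    factor that is 1 at the fixation point and falls off as
    1 / (1 + e / e2); `half_res_ecc` (e2) and the pixel density are
    assumed constants.
    """
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc_deg = np.hypot(ys - fixation[0], xs - fixation[1]) / px_per_deg
    weight = 1.0 / (1.0 + ecc_deg / half_res_ecc)
    return float(np.sum(weight * (ref - dist) ** 2) / np.sum(weight))

# The same distortion is penalized more near fixation than in the periphery.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
near, far = ref.copy(), ref.copy()
near[30:34, 30:34] += 0.5        # distortion at the fixation point
far[0:4, 0:4] += 0.5             # identical distortion in the far periphery
print(fmse(ref, near, (32, 32)) > fmse(ref, far, (32, 32)))   # → True
```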

19.
A microsensor that merges sensing and scanning functions on a single chip has been designed and fabricated, resulting in the first integrated scanning retina of its kind. A microfabrication technique has been developed to combine a one-dimensional array of photodiodes and electrostatically driven scanning slits on a single chip. The scanner actuates 12-μm-wide microslits by up to 20 μm on top of 30-μm-wide photodiodes, and the motion generates an effect similar to that of the retinal scanning vergence found in the insects' compound eyes. The silicon retina is coupled with an array of 120-μm-diameter microlenses to compose a microsized scanning compound eye, and the effect of retinal scanning in edge and position detection is demonstrated with the sensor. Each individual visual unit of the compound eye detects light contrast due to the scanning motion of the visual axis. The architecture of this scanning retina increases the resolution of a visual system with a relatively small number of receptors.

20.
Powerful data reduction and selection processes, such as selective attention mechanisms and space-variant sensing in humans, can provide great advantages for developing effective real-time robot vision systems. The use of such processes should be closely coupled with motor capabilities, in order to actively interact with the environment. In this paper, an anthropomorphic vision system architecture integrating retina-like sensing, hierarchical structures and selective attention mechanisms is proposed. Direction of gaze is shifted based on both the sensory and semantic characteristics of the visual input, so that a task-dependent attentive behavior is produced. The sensory features currently included in the system are related to optical flow invariants, thus providing the system with motion detection capabilities. A neural network architecture for visual recognition is also included, which produces semantic-driven gaze shifts.
