
1.

A model of saliency-based visual attention for rapid scene analysis (total citations: 35; self-citations: 0; citations by others: 35)

《IEEE transactions on pattern analysis and machine intelligence》1998,20(11):1254-1259

A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.

2.

Leukocyte image segmentation by visual attention and extreme learning machine

Chen Pan Dong Sun Park Yong Yang Hyouck Min Yoo 《Neural computing & applications》2012,21(6):1217-1227

This paper presents a fast and simple framework for leukocyte image segmentation that learns with an extreme learning machine (ELM) and samples by simulating the visual system. In the sampling stage, visual attention and the effect of microsaccades during fixation are simulated. The high-gradient pixels in fixation regions are sampled to form the training set. We designed an automatic sampling process for leukocyte images according to the staining knowledge of blood smears. In the learning stage, an ELM classifier is trained online to simulate the visual neuron system and then extracts the object pixels from the image. With the proposed framework, the ELM-based segmentation is fully automatic: it actively finds efficient samples, trains the classification model in real time, and requires almost no parameter adjustment. Experimental results demonstrate that the new method can extract entire leukocytes from complex scenes, performs on par with the SVM-based method, and exceeds the marker-controlled watershed algorithm.

3.

Accurate sensing of scene geo-context via mobile visual localization

Heng Liu Houqiang Li Tao Mei Jiebo Luo 《Multimedia Systems》2015,21(3):255-265

Image geo-tagging has drawn a great deal of attention in recent years. The geographic information associated with images can be used to promote potential applications such as location recognition or virtual navigation. In this paper, we propose a novel approach for accurate mobile image geo-tagging in urban areas. The approach is able to provide a comprehensive set of geo-context information based on the current image, including the real location of the camera and the viewing angle, as well as the location of the captured scene. Moreover, the parsed building facades and their geometric structures can also be estimated. First, for the image to be geo-tagged, we perform partial duplicate image retrieval to filter crowd-sourced images capturing the same scene. We then employ the structure-from-motion technique to reconstruct a sparse 3D point cloud of the scene. Meanwhile, the geometric structure of the query image is analyzed to extract building facades. Finally, by combining the reconstructed 3D scene model and the extracted structure information, we can register the camera location and viewing direction to a real-world map. The captured building location and facade orientation are also aligned. The effectiveness of the proposed system is demonstrated by experimental results.

4.

Modeling the control of attention in visual workspaces

Steelman KS McCarley JS Wickens CD 《Human factors》2011,53(2):142-153

5.

Indoor scene understanding via RGB-D image segmentation employing depth-based CNN and CRFs

Li Wei Gu Junhua Dong Yongfeng Dong Yao Han Jungong 《Multimedia Tools and Applications》2020,79(47-48):35475-35489

With the availability of low-cost depth-visual sensing devices, such as Microsoft Kinect, we are experiencing a growing interest in indoor environment understanding, at the core of which is semantic segmentation of RGB-D images. The latest research shows that the convolutional neural network (CNN) still dominates the image semantic segmentation field. However, the down-sampling performed during the training of CNNs leads to unclear segmentation boundaries and poor classification accuracy. To address this problem, in this paper, we propose a novel end-to-end deep architecture, termed FuseCRFNet, which seamlessly incorporates a fully-connected Conditional Random Fields (CRFs) model into a depth-based CNN framework. The proposed segmentation method uses the properties of pixel-to-pixel relationships to increase the accuracy of image semantic segmentation. More importantly, we formulate the CRF as one of the layers in FuseCRFNet to refine the coarse segmentation in the forward propagation; meanwhile, it passes back the errors to facilitate training. The performance of FuseCRFNet is evaluated on the SUN RGB-D dataset, and the results show that the proposed algorithm is superior to existing semantic segmentation algorithms, with an improvement in accuracy of at least 2%, further verifying the effectiveness of the algorithm.

6.

Leveraging visual attention and neural activity for stereoscopic 3D visual comfort assessment

Qiuping Jiang Feng Shao Gangyi Jiang Mei Yu Zongju Peng 《Multimedia Tools and Applications》2017,76(7):9405-9425

Visual comfort assessment (VCA) for stereoscopic three-dimensional (S3D) images is a challenging problem in the community of 3D quality of experience (3D-QoE). The goal of VCA is to automatically predict the degree of perceived visual discomfort in line with subjective judgment. The challenges of VCA typically lie in the following two aspects: 1) formulating effective visual comfort-aware features, and 2) finding an appropriate way to pool them into an overall visual comfort score. In this paper, a novel two-stage framework is proposed to address these problems. In the first stage, a primary predictive feature (PPF) and an advanced predictive feature (APF) are separately extracted and then integrated to reflect the perceived visual discomfort for 3D viewing. Specifically, we compute the S3D visual attention-weighted disparity statistics and the neural activities of the middle temporal (MT) area in the human brain to construct the PPF and APF, respectively. In the second stage, the integrated visual comfort-aware features are fused into a single visual comfort score by using random forest (RF) regression, mapping from a high-dimensional feature space into a low-dimensional quality (visual comfort) space. Comparison results with five state-of-the-art relevant models on a standard benchmark database confirm the superior performance of our proposed method.

7.

Modeling layered artificial neural networks using a visual programming paradigm

Kenneth J. Mackin 《Artificial Life and Robotics》2009,14(3):422-424

The understanding of soft computing methodology often requires grasping abstract concepts or imagining complex interactions of large models over long computing cycles. However, this can be difficult for students with a weak background in mathematics, especially in the early stages of soft computing education. This article introduces the idea of applying a visual programming paradigm as a tool for an educational introduction to soft computing methods. IntelligentPad, proposed by Y. Tanaka, was used as the visual programming paradigm. IntelligentPad gives a visual appearance to objects or classes, and allows users to operate and link different objects together using a mouse. This article reports on using IntelligentPad to teach the basic mechanisms of artificial neural networks. The proposed method was applied to 3rd-year college students to verify its validity as a teaching method.

8.

Modeling spatial layout for scene image understanding via a novel multiscale sum-product network

《Expert systems with applications》2016

Semantic image segmentation is challenging due to the large intra-class variations and the complex spatial layouts inside natural scenes. This paper investigates this problem by designing a new deep architecture, called multiscale sum-product network (MSPN), which utilizes multiscale unary potentials as the inputs and models the spatial layouts of image content in a hierarchical manner. That is, the proposed MSPN models the joint distribution of multiscale unary potentials and object classes instead of the single unary potentials used in popular settings. In addition, MSPN characterizes scene spatial layouts in a fine-to-coarse manner to enforce consistency in labeling. Multiscale unary potentials at different scales can thus help overcome semantic ambiguities caused by only evaluating single local regions, while long-range spatial correlations can further refine image labeling. Higher-order priors can also impose constraints among labels. In this way, multiscale unary potentials, long-range spatial correlations, and higher-order priors are modeled under a unified framework in MSPN. We conduct experiments on two challenging benchmarks, the MSRC-21 dataset and the SIFT FLOW dataset. The results demonstrate the superior performance of our method compared with previous graphical models for understanding scene images.

9.

Automatic scene generation using sentiment analysis and bidirectional recurrent neural network with multi-head attention

Dharaniya R. Indumathi J. Uma G. V. 《Neural computing & applications》2022,34(19):16945-16958

Text generation is one of the complex tasks associated with natural language processing. For efficient text generation, syntax and semantics of the language have...

10.

Human-centered image classification via a neural network considering visual and biological features

Horii Kazaha Maeda Keisuke Ogawa Takahiro Haseyama Miki 《Multimedia Tools and Applications》2020,79(7-8):4395-4415

In this paper, we propose a human-centered image classification via a neural network considering visual and biological features. The proposed method has two...

11.

Video scene segmentation and semantic representation using a novel scheme

Songhao Zhu Yuncai Liu 《Multimedia Tools and Applications》2009,42(2):183-205

Grouping video content into semantic segments and classifying semantic scenes into different types are crucial to content-based video organization, management and retrieval. In this paper, a novel approach to automatically segmenting and semantically representing scenes is proposed. First, video shots are detected using a rough-to-fine algorithm. Second, key-frames within each shot are selected adaptively with hybrid features, and redundant key-frames are removed by template matching. Third, spatio-temporally coherent shots are clustered into the same scene based on the temporal constraint of video content and the visual similarity between shot activities. Finally, based on a full analysis of the typical characteristics of continuously recorded videos, scene content is semantically represented to satisfy human demand in video retrieval. The proposed algorithm has been tested on various genres of films and TV programs. Promising experimental results show that the proposed method supports efficient retrieval of interesting video content.

12.

Enhancing knowledge discovery via association-based evolution of neural logic networks (total citations: 1; self-citations: 0; citations by others: 1)

Chia H.W.K. Tan C.L. Sung S.Y. 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(7):889-901

The comprehensibility aspect of rule discovery is of emerging interest in the realm of knowledge discovery in databases. Of the many cognitive and psychological factors relating to the comprehensibility of knowledge, we focus on the use of human amenable concepts as a representation language in expressing classification rules. Existing work in neural logic networks (or neulonets) provides impetus for our research; its strength lies in its ability to learn and represent complex human logic in decision-making using symbolic-interpretable net rules. A novel technique is developed for neulonet learning by composing net rules using genetic programming. Coupled with a sequential covering approach for generating a list of neulonets, the straightforward extraction of human-like logic rules from each neulonet provides an alternate perspective to the greater extent of knowledge that can potentially be expressed and discovered, while the entire list of neulonets together constitutes an effective classifier. We show how the sequential covering approach is analogous to association-based classification, leading to the development of an association-based neulonet classifier. Empirical study shows that associative classification integrated with the genetic construction of neulonets performs better than general association-based classifiers in terms of higher accuracies and smaller rule sets. This is due to the richness in logic expression inherent in the neulonet learning paradigm.

13.

Combining SUN-based visual attention model and saliency contour detection algorithm for apple image segmentation (total citations: 2; self-citations: 0; citations by others: 2)

Wang Dandan He Dongjian Song Huaibo Liu Chang Xiong Hongting 《Multimedia Tools and Applications》2019,78(13):17391-17411

Accurate segmentation of apple fruit under natural illumination conditions helps growers plan relevant applications of nutrients and pesticides. It also plays an important role in monitoring the growth status of the fruit. However, the segmentation of apples throughout various growth stages has achieved only limited success so far, due to the color changes of apple fruit as it matures as well as occlusion and the non-uniform background of apple images acquired in an orchard environment. To achieve the segmentation of apples with different colors and under various illumination conditions across the whole growth stage, a segmentation method independent of color was investigated. Features, including saliency and contour of the image, were combined in this algorithm to remove background and extract apples. The saliency using natural statistics (SUN) visual attention model was used for background removal, and it was combined with a threshold segmentation algorithm to extract the salient binary region of apple images. The centroids of the obtained salient binary region were then extracted as initial seed points. Image sharpening, globalized probability of boundary-oriented watershed transform-ultrametric contour map (gPb-OWT-UCM) and Otsu algorithms were applied to detect saliency contours of images. With the built seed points and extracted saliency contours, a region growing algorithm was performed to accurately segment apples by retaining as many fruit pixels and removing as many background pixels as possible. A total of 556 apple images captured in natural conditions were used to evaluate the effectiveness of the proposed method. An average segmentation error (SE), false positive rate (FPR), false negative rate (FNR) and overlap index (OI) of 8.4%, 0.8%, 7.5% and 90.5%, respectively, were achieved, and the proposed method outperformed the six other methods in the comparison.
The method developed in this study can provide a more effective way to segment apples with green, red, and partially red colors without changing any features or parameters, and therefore it is also applicable to monitoring the growth status of apples.

14.

Hybrid convolutional neural networks and optical flow for video visual attention prediction

Meijun Sun Ziqi Zhou Dong Zhang Zheng Wang 《Multimedia Tools and Applications》2018,77(22):29231-29244

In this paper, a method based on convolutional neural networks (CNNs) and optical flow is proposed for predicting visual attention in videos. First, a deep-learning framework is employed to extract spatial features in frames, replacing the commonly used handcrafted features. The optical flow is calculated to obtain the temporal features of the moving objects in video frames, which always draw viewers' attention. By integrating these two groups of features, a hybrid spatial-temporal feature set is obtained and taken as the input of a support vector machine (SVM) to predict the degree of visual attention. Finally, two publicly available video datasets were used to test the performance of the proposed model, and the results demonstrate the efficacy of the proposed approach.

15.

Image visual attention computation and application via the learning of object attributes

Junwei Han Dongyang Wang Ling Shao Xiaoliang Qian Gong Cheng Jungong Han 《Machine Vision and Applications》2014,25(7):1671-1683

16.

Object selection in visual scene via oscillatory network with controllable coupling and self-organized performance

E. S. Grichuk M. G. Kuzmina E. A. Manykin 《Optical Memory & Neural Networks》2011,20(2):113-119

An oscillatory network model with controllable coupling and self-organized synchronization-based performance was developed for image processing. The model demonstrates the following capabilities: (a) brightness segmentation of real grey-level images; (b) colored image segmentation; (c) selective image segmentation—extraction of the subset of image fragments with brightness values contained in an arbitrary given interval. An additional capability—successive selection of spatially separated fragments of a visual scene—has been achieved via further model extension. The fragment selection (under minor natural restrictions on mutual fragment locations) is based on in-phase internal synchronization of oscillator ensembles, corresponding to all the fragments, and distinct phase shifts between different ensembles.

17.

Multi-atlas segmentation of optic disc in retinal images via convolutional neural network

Yang Xinbo Zhang Yan 《Multimedia Tools and Applications》2021,80(11):16537-16547

Multi-atlas segmentation is widely accepted as an essential image segmentation approach. By leveraging the information from the atlases instead of utilizing model-based segmentation techniques, multi-atlas segmentation can significantly enhance the accuracy of segmentation. However, label fusion, which plays an important role in multi-atlas segmentation, still remains the primary challenge. Bearing this in mind, a deep learning-based approach is presented that integrates feature extraction and label fusion. The proposed deep learning architecture consists of two independent channels composed of continuous convolutional layers. To evaluate the performance of our approach, we conducted comparison experiments between state-of-the-art techniques and the proposed approach on publicly available datasets. Experimental results demonstrate that the proposed approach outperforms state-of-the-art techniques in both efficiency and effectiveness.

18.

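Design and optimization of an FPGA-based neural oscillator

李啸隽 戴孝亮 《电子技术应用》2011,37(7):32-35

An efficient FPGA implementation scheme is proposed for a neural oscillator, and an improved distributed arithmetic (DA) algorithm is introduced to make maximal use of the look-up table (LUT) resources on the FPGA. The whole system is built in Matlab/Simulink using Altera's DSP Builder. The method saves 74% of the look-up tables, 75% of the registers, and 100% of the embedded multiplier resources. At the same time, the scheme achieved...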

19.

Fully convolutional neural network with attention gate and fuzzy active contour model for skin lesion segmentation

Tran Thi-Thao Pham Van-Truong 《Multimedia Tools and Applications》2022,81(10):13979-13999

This study proposes an approach for segmentation of skin lesions from dermoscopic images based on fully convolutional neural network and active contour model...

20.

Synchronization of single-degree-of-freedom oscillators via neural network based on fixed-time terminal sliding mode control scheme

Sun Haibin Hou Linlin Li Chaojie 《Neural computing & applications》2019,31(10):6365-6372

In this paper, the synchronization problem is investigated for two single-degree-of-freedom oscillators via neural network based on fixed-time terminal sliding...