Similar Documents
1.
Convolutional Neural Networks have dominated the field of computer vision for the last ten years, exhibiting extremely powerful feature extraction capabilities and outstanding classification performance. The main strategy for prolonging this trend in the state-of-the-art literature relies on further scaling up network size. However, costs increase rapidly while performance improvements may be marginal. Our main hypothesis is that adding additional sources of information can help to increase performance, and that this approach is more cost-effective than building bigger networks, which involve longer training times, larger parameter spaces and higher computational resource requirements. In this paper, an ensemble method for accurate image classification is proposed, fusing features automatically detected by a Convolutional Neural Network with a set of manually defined statistical indicators. By combining the predictions of a CNN and a secondary classifier trained on statistical features, better classification performance can be achieved cheaply. We test five different CNN architectures and multiple learning algorithms on a diverse collection of datasets to validate our proposal. According to the results, the inclusion of additional indicators and an ensemble classification approach improves performance on all datasets. Both code and datasets are publicly available on GitHub at: https://github.com/jahuerta92/cnn-prob-ensemble.
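As a rough illustration of this late-fusion idea (not the authors' exact code), the sketch below averages a CNN's class probabilities with those of a secondary classifier trained on simple per-channel statistics; the indicator set and fusion weight are assumptions, and both classifiers are assumed to share the same label ordering.

```python
# Late-fusion sketch: blend CNN softmax output with a random forest trained on
# hand-crafted per-channel statistical indicators (illustrative assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def statistical_indicators(images):
    """Simple per-channel statistics as illustrative hand-crafted features."""
    feats = []
    for img in images:  # img: H x W x C float array
        chans = img.reshape(-1, img.shape[-1])
        feats.append(np.concatenate([chans.mean(axis=0), chans.std(axis=0),
                                     np.percentile(chans, [25, 50, 75])]))
    return np.stack(feats)

def fused_predictions(cnn_probs, train_imgs, y_train, test_imgs, alpha=0.5):
    """Weighted average of CNN probabilities and statistical-feature probabilities."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(statistical_indicators(train_imgs), y_train)
    rf_probs = rf.predict_proba(statistical_indicators(test_imgs))
    return alpha * cnn_probs + (1 - alpha) * rf_probs  # simple weighted fusion
```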

2.
Despite the tremendous achievements of deep convolutional neural networks (CNNs) in many computer vision tasks, understanding how they actually work remains a significant challenge. In this paper, we propose a novel two-step understanding method, the Salient Relevance (SR) map, which aims to shed light on how deep CNNs recognize images and learn features from areas, referred to as attention areas, therein. Our proposed method starts with a layer-wise relevance propagation (LRP) step, which estimates a pixel-wise relevance map over the input image. Next, we construct a context-aware saliency map, the SR map, from the LRP-generated map; it highlights areas close to the foci of attention rather than the isolated pixels that LRP reveals. In the human visual system, regional information is more important than individual pixels for recognition, so our proposed approach closely simulates human recognition. Experimental results on the ILSVRC2012 validation dataset with two well-established deep CNN models, AlexNet and VGG-16, clearly demonstrate that our approach identifies not only key pixels but also the attention areas that contribute to the underlying network's comprehension of the given images. As such, the proposed SR map constitutes a convenient visual interface which unveils the visual attention of the network and reveals which types of objects the model has learned to recognize after training. The source code is available at https://github.com/Hey1Li/Salient-Relevance-Propagation.
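A minimal sketch of the two-step pipeline follows, assuming Captum's LRP rules cover the chosen backbone's layers; the Gaussian smoothing at the end is a crude stand-in for the paper's context-aware saliency construction, not the authors' exact algorithm.

```python
# Step 1: pixel-wise LRP relevance (Captum stands in for the paper's LRP).
# Step 2: a region-level smoothing pass that aggregates isolated pixel
# relevances into contiguous attention areas (illustrative stand-in).
import torch
import torchvision.models as models
from captum.attr import LRP
from scipy.ndimage import gaussian_filter

model = models.vgg16(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224)                      # stand-in input image

relevance = LRP(model).attribute(x, target=281)     # pixel-wise relevance map
rel_map = relevance.squeeze(0).sum(0).detach().numpy()

sr_map = gaussian_filter(rel_map.clip(min=0), sigma=8)
sr_map /= sr_map.max() + 1e-8                       # normalized SR-style map
```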

3.
Event classification is inherently sequential and multimodal. Therefore, deep neural models need to dynamically focus on the most relevant time window and/or modality of a video. In this study, we propose the Multimodal Attentive Fusion Network (MAFnet), an architecture that can dynamically fuse visual and audio information for event recognition. Inspired by prior studies in neuroscience, we couple both modalities at different levels of the visual and audio paths. Furthermore, the network dynamically highlights the modality that is relevant for classifying the event at a given time window. Experimental results on the AVE (Audio-Visual Event), UCF51, and Kinetics-Sounds datasets show that the approach effectively improves accuracy in audio-visual event classification. Code is available at: https://github.com/numediart/MAFnet
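The sketch below illustrates, in PyTorch, one way to weight modalities dynamically per time window in the spirit of MAFnet; the module structure and feature sizes are illustrative assumptions, not the paper's architecture.

```python
# Per-time-step modality weighting: a learned softmax gate decides how much
# vision vs. audio contributes at each time window (illustrative sketch).
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 2)  # one score per modality

    def forward(self, vis, aud):
        # vis, aud: (batch, time, dim) features from the visual/audio paths
        w = torch.softmax(self.score(torch.cat([vis, aud], dim=-1)), dim=-1)
        # w[..., 0:1] weights vision, w[..., 1:2] weights audio, per time step
        return w[..., 0:1] * vis + w[..., 1:2] * aud

fused = ModalityAttentionFusion(128)(torch.randn(4, 10, 128), torch.randn(4, 10, 128))
```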

4.
Due to the huge gap between the high dynamic range of natural scenes and the limited (low) range of consumer-grade cameras, a single shot can hardly record all the information of a scene. Multi-exposure image fusion (MEF) has been an effective way to solve this problem by integrating multiple shots with different exposures, which is by nature an enhancement problem. During fusion, two perceptual factors, informativeness and visual realism, should be considered simultaneously. To achieve this goal, this paper presents a deep perceptual enhancement network for MEF, termed DPE-MEF. Specifically, the proposed DPE-MEF contains two modules: one gathers content details from the inputs, while the other takes care of color mapping/correction for the final result. Extensive experiments and ablation studies show the efficacy of our design and demonstrate its superiority over state-of-the-art alternatives both quantitatively and qualitatively. We also verify the flexibility of the proposed strategy for improving the exposure quality of single images. Moreover, our DPE-MEF can fuse 720p images at more than 60 pairs per second on an Nvidia 2080Ti GPU, making it attractive for practical use. Our code is available at https://github.com/dongdong4fei/DPE-MEF.

5.
With recent advances in Earth observation techniques, the availability of multi-sensor data acquired over the same geographical area has increased greatly, making it possible to jointly depict the underlying land-cover phenomenon using different sensor data. In this paper, a novel multi-attentive hierarchical fusion net (MAHiDFNet) is proposed to realize feature-level fusion and classification of hyperspectral image (HSI) data with Light Detection and Ranging (LiDAR) data. More specifically, a triple-branch HSI-LiDAR Convolutional Neural Network (CNN) backbone is first developed to simultaneously extract the spatial, spectral and elevation features of the land-cover objects. On this basis, a hierarchical fusion strategy is adopted to fuse the resulting feature embeddings. In the shallow feature fusion stage, we propose a novel modality attention (MA) module to generate modality-integrated features; by fully considering the correlation and heterogeneity between different sensor data, feature interaction and integration are realized by the proposed MA module. At the same time, self-attention modules are adopted to highlight the modality-specific features. In the deep feature fusion stage, the obtained modality-specific and modality-integrated features are fused to construct the hierarchical feature fusion framework. Experiments on three real HSI-LiDAR datasets demonstrate the effectiveness of the proposed framework. The code will be made public at https://github.com/SYFYN0317/-MAHiDFNet.

6.
Visual impairment assistance systems play a vital role in improving the standard of living of visually impaired people (VIP). With the development of deep learning technologies and assistive devices, many assistive technologies for VIP have achieved remarkable success in environmental perception and navigation. In particular, convolutional neural network (CNN)-based models have surpassed human-level recognition and achieved strong generalization ability. However, the large memory and computation consumption of CNNs has been one of the main barriers to deploying them in resource-limited systems for visual impairment assistance applications. To this end, cheap convolutions (e.g., group convolution, depth-wise convolution, and shift convolution) have recently been used to reduce memory and computation, but they require specific architecture designs. Moreover, directly replacing standard convolutions with these cheap ones lowers the discriminability of the compressed networks. In this paper, we propose to use knowledge distillation to improve the performance of compact student networks with cheap convolutions. In our case, the teacher is a network with standard convolutions, while the student is a simple transformation of the teacher architecture without complicated redesigning. In particular, we introduce a novel online distillation method, which constructs the teacher network online without pre-training and conducts mutual learning between the teacher and student networks, to improve the performance of the student model. Extensive experiments demonstrate that the proposed approach outperforms previous CNN compression and acceleration methods while simultaneously reducing the memory and computation overhead of cutting-edge CNNs on different datasets, including CIFAR-10/100 and ImageNet ILSVRC 2012. The code is publicly available at https://github.com/EthanZhangYC/OD-cheap-convolution.
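The sketch below illustrates the two ingredients just described: a depthwise-separable "cheap" replacement for a standard convolution, and a symmetric distillation loss for teacher-student mutual learning. The temperature and layer sizes are illustrative assumptions, not the paper's settings.

```python
# (i) Cheap convolution: depthwise + pointwise stand-in for a k x k conv.
# (ii) Mutual-learning KD loss: symmetric KL between softened logits.
import torch.nn as nn
import torch.nn.functional as F

def cheap_conv(c_in, c_out, k=3):
    """Depthwise-separable stand-in for a standard k x k convolution."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in),  # depth-wise
        nn.Conv2d(c_in, c_out, 1),                              # point-wise
    )

def mutual_kd_loss(student_logits, teacher_logits, T=4.0):
    """Symmetric KL between softened outputs, as in online mutual learning."""
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.log_softmax(teacher_logits / T, dim=1)
    kl_st = F.kl_div(p_s, p_t.exp(), reduction="batchmean")
    kl_ts = F.kl_div(p_t, p_s.exp(), reduction="batchmean")
    return (T * T) * (kl_st + kl_ts) / 2
```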

7.
Establishing reliable correspondences with a deep neural network is an important task in computer vision, and it generally requires a permutation-equivariant architecture and rich contextual information. In this paper, we design a Permutation-Equivariant Split Attention Network (PESA-Net) to gather rich contextual information for the feature matching task. Specifically, we propose a novel "Split–Squeeze–Excitation–Union" (SSEU) module. The SSEU module not only generates multiple paths to exploit the geometrical context of putative correspondences from different aspects, but also adaptively captures channel-wise global information by explicitly modeling the interdependencies between feature channels. In addition, we construct a block by fusing the SSEU module, a Multi-Layer Perceptron and several normalizations. The proposed PESA-Net is able to effectively infer the probability of each correspondence being an inlier or outlier and simultaneously recover the relative pose via the essential matrix. Experimental results demonstrate that PESA-Net surpasses state-of-the-art approaches in relative pose estimation and outlier rejection on both outdoor and indoor scenes (i.e., YFCC100M and SUN3D). Source code: https://github.com/x-gb/PESA-Net.
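As a hedged illustration of the squeeze-excitation part of the SSEU module (the split and union paths are omitted), channel-wise gating over per-correspondence features might look like the following; all sizes are assumptions.

```python
# Squeeze-excitation over per-correspondence features: global channel
# statistics gate each channel, modeling channel interdependencies.
import torch
import torch.nn as nn

class SqueezeExcitation1d(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):            # x: (batch, channels, num_correspondences)
        gate = self.fc(x.mean(dim=2))     # squeeze: global channel statistics
        return x * gate.unsqueeze(-1)     # excite: re-weight each channel

out = SqueezeExcitation1d(128)(torch.randn(2, 128, 2000))
```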

8.
Research on universal adversarial perturbations (UAPs) is significant for trustworthy deep learning. To disentangle UAPs from dependence on training data and on the target model, exploring procedural noise functions is a feasible approach. However, the current procedural adversarial noise attack exhibits characteristics such as visually significant anisotropy and gradient artifacts that may impair the stealthiness of adversarial examples. This study proposes a novel model-free and data-free UAP method based on procedural noise functions, with two variants: a Simplex noise attack and a Worley noise attack. The attack can deceive neural networks while producing a more aesthetic rendering effect. A detailed empirical study is provided to validate the effectiveness of the proposed attack. Extensive experiments show that the UAPs generated by the proposed method achieve considerable attack performance on the ImageNet and CIFAR-10 datasets. Moreover, this study evaluates the performance and robustness of existing defense methods against the proposed UAPs. This work has the potential to advance research on the robustness of neural networks in real applications. The code is available at https://github.com/momo1986/adversarial_example_simplex_worley.
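A minimal sketch of a Worley-noise perturbation bounded to an L∞ budget follows; the feature-point count and epsilon are assumptions, and the actual attack additionally searches such parameters for maximal fooling rate.

```python
# Worley noise: per-pixel distance to the nearest random feature point,
# rescaled and clipped to an L-infinity perturbation budget.
import numpy as np

def worley_noise(h, w, n_points=24, seed=0):
    rng = np.random.default_rng(seed)
    pts = rng.uniform(0, 1, size=(n_points, 2)) * [h, w]  # random feature points
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.min(np.hypot(ys[..., None] - pts[:, 0], xs[..., None] - pts[:, 1]), axis=-1)
    return (d - d.min()) / (d.max() - d.min())            # normalize to [0, 1]

def worley_uap(h, w, epsilon=10 / 255):
    noise = worley_noise(h, w) * 2 - 1                    # rescale to [-1, 1]
    return np.clip(noise * epsilon, -epsilon, epsilon)    # L-inf bounded UAP

delta = worley_uap(224, 224)   # add to any input image, e.g. x_adv = x + delta
```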

9.
Curated collections of models are essential for the success of Machine Learning (ML) and Data Analytics in Model-Driven Engineering (MDE). However, current datasets are either too small or not properly curated. In this paper, we present ModelSet, a dataset composed of 5,466 Ecore models and 5,120 UML models which have been manually labelled to support ML tasks. We describe the structure of the dataset and explain how to use the associated library to develop ML applications in Python. Finally, we present some applications that can be addressed using ModelSet. Tool website: https://github.com/modelset

10.
Automatic affect recognition in real-world environments is an important step towards natural interaction between humans and machines. In recent years, several advances have been made in determining emotional states with Deep Neural Networks (DNNs). In this paper, we propose an emotion recognition system that utilizes raw text, audio and visual information in an end-to-end manner. To capture a person's emotional state, robust features need to be extracted from the various modalities. To this end, we utilize Convolutional Neural Networks (CNNs) and propose a novel transformer-based architecture for the text modality that can robustly capture the semantics of sentences. We develop an audio model to process the audio channel and adopt a variant of a high-resolution network (HRNet) to process the visual modality. To fuse the modality-specific features, we propose novel attention-based methods, and to capture the temporal dynamics of the signal, we utilize Long Short-Term Memory (LSTM) networks. Our model is trained on the SEWA dataset of the AVEC 2017 emotion recognition sub-challenge. It produces state-of-the-art results in the text, visual and multimodal domains, and comparable performance in the audio case, relative to the winning entries of the challenge, which use several hand-crafted and DNN features. Code is available at: https://github.com/glam-imperial/multimodal-affect-recognition.

11.
Infrared and visible image fusion aims to synthesize a single fused image containing salient targets and abundant texture details, even under extreme illumination conditions. However, existing image fusion algorithms fail to take the illumination factor into account in the modeling process. In this paper, we propose a progressive, illumination-aware image fusion network, termed PIAFusion, which adaptively maintains the intensity distribution of salient targets and preserves texture information in the background. Specifically, we design an illumination-aware sub-network to estimate the illumination distribution and calculate an illumination probability, which we then use to construct an illumination-aware loss that guides the training of the fusion network. The cross-modality differential-aware fusion module and a halfway fusion strategy fully integrate common and complementary information under the constraint of the illumination-aware loss. In addition, a new benchmark dataset for infrared and visible image fusion, Multi-Spectral Road Scenarios (available at https://github.com/Linfeng-Tang/MSRS), is released to support network training and comprehensive evaluation. Extensive experiments demonstrate the superiority of our method over state-of-the-art alternatives in terms of target maintenance and texture preservation. In particular, our progressive fusion framework can integrate meaningful information from source images round the clock, according to the illumination conditions. Furthermore, an application to semantic segmentation demonstrates the potential of PIAFusion for high-level vision tasks. Our code will be available at https://github.com/Linfeng-Tang/PIAFusion.
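A hedged sketch of what an illumination-aware intensity loss could look like: a day/night probability from the illumination sub-network decides whether the fused image should track the visible or the infrared intensities more closely. This illustrates the idea only; it is not the authors' exact loss, and the sub-network itself is omitted.

```python
# Illumination-aware intensity loss sketch: weight the per-pixel L1 terms
# toward the visible image in daytime and toward the infrared image at night.
import torch.nn.functional as F

def illumination_aware_loss(fused, vis, ir, p_day):
    """p_day: (batch, 1, 1, 1) illumination probability from the sub-network."""
    loss_vis = F.l1_loss(fused, vis, reduction="none")
    loss_ir = F.l1_loss(fused, ir, reduction="none")
    # Daytime: rely more on visible intensities; nighttime: more on infrared.
    return (p_day * loss_vis + (1 - p_day) * loss_ir).mean()
```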

12.
In this paper, an unsupervised learning-based approach is presented for fusing bracketed exposures into high-quality images, avoiding the need for interim conversion to intermediate high dynamic range (HDR) images. An objective quality measure, the colored multi-exposure fusion structural similarity index measure (MEF-SSIMc), is optimized to update the network parameters, so unsupervised learning is realized without any ground-truth (GT) images. Furthermore, a reference-free gradient fidelity term is added to the loss function to recover and supplement image information in the fused result. As shown in the experiments, the proposed algorithm performs well in terms of structure, texture, and color. In particular, it maintains the order of variations in the original image brightness, suppresses edge blurring and halo effects, and produces good visual results with strong quantitative evaluation scores. Our code will be publicly available at https://github.com/cathying-cq/UMEF.
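A hedged sketch of a reference-free gradient fidelity term: the fused image's gradients are pulled toward the strongest gradients found anywhere in the exposure stack, approximated here with Sobel filtering. The kernel and aggregation choices are assumptions, not the paper's formulation.

```python
# Gradient fidelity sketch: match the fused image's Sobel gradient magnitude
# to the element-wise maximum gradient across the bracketed exposures.
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)

def grad_mag(img):  # img: (batch, 1, H, W) luminance
    gx = F.conv2d(img, SOBEL_X, padding=1)
    gy = F.conv2d(img, SOBEL_X.transpose(2, 3), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def gradient_fidelity(fused, exposures):
    # exposures: list of (batch, 1, H, W) source images in the bracket
    target = torch.amax(torch.stack([grad_mag(e) for e in exposures]), dim=0)
    return F.l1_loss(grad_mag(fused), target)
```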

13.
This paper proposes a template-based approach to semi-automatically create contextualized learning tasks out of several sources from the Web of Data. Contextualizing learning tasks opens the possibility of bridging formal learning, which happens in the classroom, and informal learning, which happens in other physical spaces such as squares or historical buildings. The tasks created cover different cognitive levels and are contextualized by their location and the topics covered. We applied this approach to the domain of History of Art in the Spanish region of Castile and Leon: we gathered data from DBpedia, Wikidata and the Open Data published by the regional government, and applied 32 templates to obtain 16K learning tasks. An evaluation with 8 teachers shows that teachers would be willing to have their students carry out the generated tasks. Teachers also considered that 85% of the generated tasks are aligned with the content taught in the classroom and are relevant for learning in other informal spaces. The tasks created are available at https://casuallearn.gsic.uva.es/sparql.
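The published endpoint can be queried programmatically; the sketch below retrieves a handful of triples with a generic pattern, since the exact vocabulary used to describe the tasks is not specified here and should be treated as an assumption to explore.

```python
# Generic SPARQL probe of the Casual Learn endpoint using SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://casuallearn.gsic.uva.es/sparql")
sparql.setQuery("SELECT ?task ?p ?o WHERE { ?task ?p ?o } LIMIT 10")
sparql.setReturnFormat(JSON)

# Each binding maps a variable name to {"type": ..., "value": ...}.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["task"]["value"], row["p"]["value"], row["o"]["value"])
```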

14.
Accurate retinal vessel segmentation is very challenging. Recently, deep learning based methods have greatly improved performance. However, non-vascular structures usually harm performance, and some low-contrast small vessels are hard to detect after several down-sampling operations. To solve these problems, we design a deep fusion network (DF-Net) comprising multiscale fusion, feature fusion and classifier fusion for multi-source vessel image segmentation. The multiscale fusion module allows the network to detect blood vessels at different scales. The feature fusion module fuses deep features with vessel responses extracted by a Frangi filter to obtain a compact yet domain-invariant feature representation. The classifier fusion module provides the network with additional supervision. DF-Net also predicts the parameters of the Frangi filter, avoiding manual parameter tuning. The learned Frangi filter enhances the feature maps of the multiscale network and restores the edge information lost through down-sampling. The proposed end-to-end network is easy to train, and inference takes 41 ms per image on a GPU. The model outperforms state-of-the-art methods, achieving accuracies of 96.14%, 97.04%, and 98.02% on three publicly available fundus image datasets, DRIVE, STARE, and CHASEDB1, respectively. The code is available at https://github.com/y406539259/DF-Net.
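A minimal sketch of the feature-fusion idea using scikit-image's Frangi filter: the vesselness response is stacked with the raw channel as an explicit vessel prior. In DF-Net the filter parameters are predicted by the network; here they are fixed for illustration.

```python
# Fuse a fundus image channel with its Frangi vesselness response so a
# downstream network receives an explicit vessel prior as a second channel.
import numpy as np
from skimage.filters import frangi

def with_frangi_channel(green_channel):
    """green_channel: 2-D float array (vessels are most visible in green)."""
    vesselness = frangi(green_channel, sigmas=range(1, 6), black_ridges=True)
    vesselness = (vesselness - vesselness.min()) / (np.ptp(vesselness) + 1e-8)
    return np.stack([green_channel, vesselness], axis=0)  # 2-channel input

net_input = with_frangi_channel(np.random.rand(64, 64).astype(np.float32))
```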

15.
As a special group, visually impaired people (VIP) find it difficult to access and use visual information in the same way as sighted individuals. In recent years, benefiting from the development of computer hardware and deep learning techniques, significant progress has been made in assisting VIP with visual perception. However, most existing datasets are annotated for a single scenario and lack sufficient annotations of diverse obstacles to meet the realistic needs of VIP. To address this issue, we propose a new dataset called Walk On The Road (WOTR), which has nearly 190K objects, with approximately 13.6 objects per image. Specifically, WOTR contains 15 categories of common obstacles and 5 categories of road-judging objects, covering multiple scenarios of walking on sidewalks, tactile paving, crossings, and other locations. Additionally, we provide a series of baselines by training several advanced object detectors on WOTR. Furthermore, we propose a simple but effective PC-YOLO that obtains excellent detection results on the WOTR and PASCAL VOC datasets. The WOTR dataset is available at https://github.com/kxzr/WOTR

16.
17.

Many malicious applications appear every day, threatening numerous users. Therefore, a surge of studies has been conducted to protect users from newly emerging malware by using machine learning algorithms. Although existing machine learning or deep learning-based Android malware detection approaches achieve high accuracy by combining multiple features, they cannot be deployed on mobile devices due to their high resource cost. In this paper, we propose MAPAS, a malware detection system that achieves high accuracy with adaptable use of computing resources. MAPAS analyzes the behavior of malicious applications based on their API call graphs using convolutional neural networks (CNNs). However, MAPAS does not use the classifier model generated by the CNN; it only utilizes the CNN to discover common features of malware API call graphs. To detect malware efficiently, MAPAS employs a lightweight classifier that calculates the similarity between API call graphs associated with malicious activities and the API call graphs of the applications to be classified. To demonstrate the effectiveness and efficiency of MAPAS, we implement a prototype and thoroughly evaluate it, comparing it with a state-of-the-art Android malware detection approach, MaMaDroid. Our evaluation results demonstrate that MAPAS classifies applications 145.8% faster and uses around ten times less memory than MaMaDroid. MAPAS also achieves higher accuracy (91.27%) than MaMaDroid (84.99%) in detecting unknown malware, and it can generally detect any type of malware with high accuracy.
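A hedged sketch of such a lightweight graph-similarity classifier: an app's API call graph, represented as a set of caller-callee edges, is compared against graphs mined from known malware using Jaccard similarity. The threshold and toy graphs are illustrative assumptions, not MAPAS's exact method.

```python
# Lightweight similarity classifier sketch: Jaccard over API call graph edges.
def jaccard(edges_a, edges_b):
    inter = len(edges_a & edges_b)
    union = len(edges_a | edges_b)
    return inter / union if union else 0.0

def looks_malicious(app_edges, malware_graphs, threshold=0.6):
    """malware_graphs: list of edge sets mined from known malicious apps."""
    return any(jaccard(app_edges, g) >= threshold for g in malware_graphs)

app = {("onCreate", "sendTextMessage"), ("onCreate", "getDeviceId")}
known = [{("onCreate", "sendTextMessage"), ("onCreate", "getDeviceId"),
          ("run", "openConnection")}]
print(looks_malicious(app, known))  # True: 2/3 edge overlap exceeds threshold
```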


18.
Tremendous advances in different areas of knowledge are producing vast volumes of data, a quantity so large that it has made the development of new computational algorithms necessary. Among the algorithms developed, we find Machine Learning models and specific data mining techniques that might be useful for all areas of knowledge. The use of computational tools for data analysis is increasingly required, given the need to extract meaningful information from such large volumes of data. However, there are no free-access libraries, modules, or web services that combine a vast array of analytical techniques in a user-friendly environment for non-expert users; those that exist raise high usability barriers for the untrained, as they usually have specific installation requirements, demand in-depth programming knowledge, or may be expensive. As an alternative, we have developed DMAKit, a user-friendly web platform powered by DMAKit-lib, a new library implemented in Python, which facilitates the analysis of data of different kinds and origins. Our tool implements a wide array of state-of-the-art data mining and pattern recognition techniques, allowing the user to quickly build classification, prediction or clustering models, statistical evaluations, and feature analyses of different attributes in diverse datasets without requiring any programming knowledge. DMAKit is especially useful for users who have large volumes of data to analyze but lack the informatics, mathematical, or statistical background to implement models. We expect this platform to provide a way to extract information and analyze patterns through data mining techniques for anyone interested in applying them, with no specific knowledge required. In particular, we present several case studies in biology, biotechnology, and biomedicine that highlight how our tool eases the work of non-specialist users applying data analysis and pattern recognition techniques. DMAKit is available for non-commercial use as an open-access library, licensed under the GNU General Public License, version GPL 3.0. The web platform is publicly available at https://pesb2.cl/dmakitWeb. Demonstration and tutorial videos for the web platform are available at https://pesb2.cl/dmakittutorials/. Complete URLs for relevant content are listed in the Data Availability section.

19.
The International Society for the Study of Vascular Anomalies (ISSVA) provides a classification for vascular anomalies that enables specialists to unambiguously classify diagnoses. This classification is only available in PDF format, is not machine-readable, and does not provide unique identifiers that allow for structured registration. In this paper, we describe the process of transforming the ISSVA classification into an ontology. We also describe the structure of this ontology, as well as two applications of it, using examples from the domain of rare disease research. We drew on the expertise of an ontology expert and a clinician during the development process, and we semi-automatically added mappings to relevant external ontologies using automated ontology matching systems and manual assessment by experts. The ISSVA ontology should contribute to making data for vascular anomaly research more Findable, Accessible, Interoperable, and Reusable (FAIR). The ontology is available at https://bioportal.bioontology.org/ontologies/ISSVA.

20.
With more and more crowdsourced geo-tagged field photos available online, such photos are becoming a potentially valuable source of information for environmental studies. However, labelling and recognizing them is time-consuming. To utilize this information, a land-cover type recognition model for field photos is proposed based on deep learning. The model combines a pre-trained convolutional neural network (CNN) as the image feature extractor with a multinomial logistic regression model as the feature classifier; the pre-trained Inception-v3 CNN was used in this study. Labelled field photos from the Global Geo-Referenced Field Photo Library (http://eomf.ou.edu/photos) were chosen for model training and validation. The results indicate that our recognition model achieves acceptable land-cover classification accuracy (48.40% for top-1 prediction and 76.24% for top-3 prediction). With accurate self-assessment of confidence, the model can be applied to classify numerous online geo-tagged field photos for environmental information extraction.
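A minimal sketch of the described pipeline, assuming Keras for the pre-trained Inception-v3 and scikit-learn for the multinomial logistic regression; the arrays below are placeholders for real photos and land-cover labels.

```python
# Frozen Inception-v3 as a feature extractor feeding a multinomial logistic
# regression classifier, with top-3 prediction as used in the evaluation.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from sklearn.linear_model import LogisticRegression

extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def features(photos):                    # photos: (n, 299, 299, 3) in [0, 255]
    return extractor.predict(preprocess_input(photos.astype("float32")))

x_train = features(np.random.uniform(0, 255, (8, 299, 299, 3)))  # placeholder
y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1])                     # placeholder labels
# lbfgs, the default solver, fits a multinomial model for multi-class labels.
clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)
top3 = np.argsort(clf.predict_proba(x_train), axis=1)[:, -3:]    # top-3 classes
```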
