Similar Literature
20 similar documents found (search time: 15 ms)
1.
The classification of habitats is crucial for structuring knowledge and developing our understanding of the natural world. Currently, most successful methods employ human surveyors, a laborious, expensive and subjective process. In this paper, we formulate habitat classification as a fine-grained visual categorization problem. We build on previous work and propose an image annotation framework that uses a novel automatic random forest-based method and that takes visual and geographical closeness into account in the classification process. During training, low-level visual features and medium-level contextual information are extracted. For the latter, we use a human-in-the-loop methodology. Humans are asked a set of 17 questions about the appearance of the image that can easily be answered by non-ecologists, which extracts medium-level knowledge about the images. During testing, and considering that nearby areas have similar ecological properties, we weight the prediction of each tree of the forest according to its distance to the unseen test photograph. Additionally, we present an updated version of a geo-referenced habitat image database containing over 1,000 high-resolution ground photographs that have been manually annotated by habitat classification experts. This publicly available image database is specifically designed for the development of multimedia analysis techniques for ecological applications. We report experimental recall and precision results showing that our image annotation framework can annotate four of the main habitat classes with a reasonable degree of confidence: woodland and scrub, grassland and marsh, heathland, and miscellaneous.
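The distance-weighted voting step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say which location a tree is tied to, or how distance becomes a weight. Here each tree is assumed to have a hypothetical geographic anchor (for example, the centroid of its training photographs), and a Gaussian kernel with a hypothetical `bandwidth_km` is used.

```python
import math
from collections import defaultdict


def weighted_forest_vote(tree_predictions, tree_locations, test_location, bandwidth_km=10.0):
    """Combine per-tree habitat labels, down-weighting trees whose (assumed)
    geographic anchor is far from the test photograph."""
    scores = defaultdict(float)
    for label, (x, y) in zip(tree_predictions, tree_locations):
        # Euclidean distance (same units as bandwidth_km) between anchor and test photo
        d = math.hypot(x - test_location[0], y - test_location[1])
        # Gaussian kernel: nearby trees get weight near 1, distant ones near 0
        scores[label] += math.exp(-((d / bandwidth_km) ** 2))
    return max(scores, key=scores.get)
```

With two distant "grassland" trees and one nearby "woodland" tree, the nearby tree can outvote the majority. An unweighted vote could never produce that outcome.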

5.
User representation learning is a prominent and critical task in user analysis on social networks. It derives conceptual user representations to improve the inference of user intentions and behaviors. Previous efforts have shown its substantial value in diverse real-world applications, including product recommendation, textual content modeling, link prediction, and many more. However, existing studies either underutilize multi-view information or neglect the strong entanglement among the underlying factors that govern user intentions, and thus derive degraded representations. To overcome these shortcomings, this paper proposes an adversarial fusion framework, consisting of a generator and a discriminator, to fully exploit multi-view information for user representation. The generator learns representations with a variational autoencoder. The adversarial fusion framework forces it to pay specific attention to informative signals, thereby integrating multi-view information. Furthermore, the variational autoencoder used in the generator is designed in a novel way to capture and disentangle the latent factors behind user intentions. By fully utilizing multi-view information and achieving disentanglement, our model learns robust and interpretable user representations. Extensive experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model.
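As background for the variational autoencoder mentioned above, the sketch below shows the standard VAE regularizer. This is the KL divergence between a diagonal Gaussian posterior and a standard normal prior. The abstract does not describe how its VAE achieves disentanglement. The `beta` weight shown here is the well-known beta-VAE objective, swapped in purely as an illustration and not claimed to be the paper's design.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_error, mu, log_var, beta=1.0):
    """Standard VAE loss for beta=1; beta > 1 gives the beta-VAE objective,
    one common (assumed here) way to encourage disentangled latent factors."""
    return reconstruction_error + beta * gaussian_kl_to_standard_normal(mu, log_var)
```

The KL term is zero when the posterior equals the prior (`mu = 0`, `log_var = 0`) and grows as the latent code moves away from it.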

6.
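To address the difficulty of obtaining labeled samples for hyperspectral remote sensing images, this work studies a multi-view active learning algorithm that selects a small number of high-quality query samples for interactive labeling. First, a bank of 3D Gabor filters with different scales and orientations is used to extract spatial-spectral features from the hyperspectral image. Then, 3D Gabor features with strong class-discriminative power are selected to construct multiple views. Finally, a sample query strategy based on the minimum multi-view posterior probability difference (MPPD) is proposed. In the experiments, 30 labeled samples were selected initially. After 100 iterations, the 3D Gabor feature multi-view approach combined with the MPPD query strategy reached overall classification accuracies of 94.16% and 91.30% on the ROSIS Pavia University and AVIRIS Indiana Pines datasets, respectively. These results show that 3D Gabor filtering can effectively extract spatial-spectral features from hyperspectral remote sensing images and provide diverse, complementary feature views. Combined with these views, the MPPD query strategy can select the most valuable query samples.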
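The abstract does not define the "posterior probability difference" precisely, so the following is only one plausible reading, labeled as an assumption. For each unlabeled sample, take the gap between its two largest class posteriors in each view, then average over views. The samples with the smallest average gap are the most ambiguous, so they are queried. The function names and `batch_size` are hypothetical.

```python
def posterior_margin(probs):
    """Gap between the largest and second-largest class posterior."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def mppd_query(view_posteriors, batch_size=1):
    """view_posteriors[v][i] is the class-posterior vector of unlabeled sample i
    under view v. Returns the indices of the batch_size samples with the
    smallest mean margin across views (assumed interpretation of MPPD)."""
    n_views = len(view_posteriors)
    n_samples = len(view_posteriors[0])
    scored = []
    for i in range(n_samples):
        mean_margin = sum(posterior_margin(view_posteriors[v][i]) for v in range(n_views)) / n_views
        scored.append((mean_margin, i))
    scored.sort()  # smallest margin (most ambiguous) first
    return [i for _, i in scored[:batch_size]]
```

In an active learning loop, the returned indices would be sent to the annotator, added to the labeled set, and the per-view classifiers retrained.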

8.
Zhang Hong, Huang Yu, Xu Xin, Zhu Ziqi, Deng Chunhua. Multimedia Tools and Applications, 2018, 77(3): 3353-3368

Due to the rapid development of multimedia applications, cross-media semantic learning is becoming increasingly important. One of the most challenging issues in cross-media semantic understanding is how to mine the semantic correlation between different modalities. Most traditional multimedia semantic analysis approaches are based on unimodal data and neglect the semantic consistency between different modalities. In this paper, we propose a novel multimedia representation learning framework via latent semantic factorization (LSF). First, the posterior probabilities under the learned classifiers serve as the latent semantic representation for each modality. Moreover, we use latent semantic factorization to explore the semantic representation of a multimedia document consisting of an image and text. In addition, two projection matrices are learned to project images and text into a common semantic space in which they are closer to the multimedia document. Experiments on three real-world datasets for cross-media retrieval demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods.
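Retrieval in the shared semantic space can be sketched as follows. This is a sketch under assumptions, not the LSF training procedure itself. The projection matrices `P_img` and `P_txt` are taken as already learned. A softmax turns the projected scores into class-posterior-like semantic vectors, and texts are ranked for an image query by cosine similarity. All names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(matrix, vector):
    """matrix is a list of rows; returns matrix @ vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_texts(image_feature, text_features, P_img, P_txt):
    """Rank text indices for an image query in the shared semantic space.
    P_img and P_txt are assumed, already-learned projection matrices."""
    q = softmax(project(P_img, image_feature))
    return sorted(range(len(text_features)),
                  key=lambda j: cosine(q, softmax(project(P_txt, text_features[j]))),
                  reverse=True)
```

Because both modalities end up as distributions over the same semantic classes, text-to-image retrieval works symmetrically by swapping the roles of the two projections.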


9.
With the rapid development of location-based social networks (LBSNs), users continuously upload ever more media data. The asynchrony between visual and textual information makes it extremely difficult to manage multimodal information for retrieval without manual annotation and for personalized recommendation. Consequently, automated image semantic discovery of multimedia location-related user-generated content (UGC) has become essential for user experience. Most existing work leverages single-modality data or correlated multimedia data for image semantic detection. However, the intrinsically heterogeneous UGCs in LBSNs are usually independent and uncorrelated, so it is hard to build correlations between textual and visual information. In this paper, we propose a cross-domain semantic modeling method for automatic annotation of images from social network platforms. First, we extract a set of hot topics from the collected textual information to prepare the image dataset. Then, the proposed noisy-sample filtering removes low-relevance photos. Finally, we leverage cross-domain datasets to discover the common knowledge of each semantic concept from UGCs. Semantic transfer then boosts semantic annotation performance. Comparison experiments on cross-domain datasets demonstrate the superiority of the proposed method.
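The noisy-sample filtering step can be sketched as follows. The abstract does not say how relevance is measured. This sketch assumes each photo carries textual tags and scores relevance as the Jaccard overlap between those tags and the hot-topic keywords. The `threshold` value and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_noisy_photos(photos, topic_keywords, threshold=0.2):
    """photos: list of (photo_id, tags). Keeps photos whose tags overlap the
    hot-topic keywords by at least `threshold` (assumed relevance measure)."""
    return [pid for pid, tags in photos if jaccard(tags, topic_keywords) >= threshold]
```

Raising `threshold` trades recall for a cleaner training set. In the sketch, a photo tagged only "sunset" survives at 0.2 but not at 0.4.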

12.
Most approaches to human attribute and action recognition in still images are based on image representations in which multi-scale local features are pooled across scales into a single, scale-invariant encoding. In both bag-of-words and the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a bag-of-deep-features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state of the art.
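The difference between scale-invariant pooling and explicit scale coding can be illustrated with a simplified sketch. It uses plain average pooling and two scale bins split at a hypothetical `boundary`. Neither choice is taken from the paper, whose two strategies operate on encoded deep features. The baseline averages local features from all scales into one vector. The scale-coded variant pools each bin separately and concatenates the results, so scale information survives in the final representation.

```python
def average(vectors):
    if not vectors:
        raise ValueError("no local features in this scale bin")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def scale_invariant_encoding(features_by_scale):
    """Baseline: pool local features from every scale into a single vector."""
    return average([f for feats in features_by_scale.values() for f in feats])


def scale_coded_encoding(features_by_scale, boundary):
    """Pool small-scale and large-scale features separately, then concatenate,
    so the final vector keeps explicit scale information."""
    small = [f for s, feats in features_by_scale.items() if s < boundary for f in feats]
    large = [f for s, feats in features_by_scale.items() if s >= boundary for f in feats]
    return average(small) + average(large)
```

The scale-coded vector is twice as long, which is the cost of keeping the two scale ranges distinguishable.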

13.
In this paper, we propose a new tensor-based representation algorithm for image classification. The algorithm learns a parameter tensor for image tensors. One novelty is that the parameter tensor is learned according to the Tucker decomposition, as the product of a core tensor with one matrix per mode. This allows the algorithm to preserve the spatial information of the image. We further extend the proposed tensor algorithm to a semi-supervised framework in order to utilize both labeled and unlabeled images. The objective function can be solved by alternating optimization. At each iteration, we solve a standard ridge regression problem to obtain a closed-form solution for the parameter along the corresponding mode. Experimental results on grayscale and color image datasets show that our method outperforms several classification approaches. In particular, our method achieves high classification performance even when only a few labeled training samples are provided.
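The alternating closed-form ridge steps can be illustrated on the simplest case of a Tucker-structured parameter for grayscale images. With a 1x1 core, the parameter reduces to a rank-1 matrix `u v^T`, and the model is `f(X) = u^T X v`. This is a minimal supervised sketch under that assumption. Targets are real values (for example, ±1 class labels), and the semi-supervised term from the abstract is omitted. Function names, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def ridge(A, y, lam):
    """Closed-form ridge regression: (A^T A + lam I)^{-1} A^T y."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)


def fit_bilinear(images, y, lam=1e-3, n_iter=30, seed=0):
    """images: array (n, h, w); y: targets (n,). Learns u (h,) and v (w,)
    so that u^T X_i v approximates y_i, alternating one ridge solve per mode."""
    n, h, w = images.shape
    v = np.random.default_rng(seed).standard_normal(w)
    for _ in range(n_iter):
        A = images @ v                          # (n, h): row i is X_i v
        u = ridge(A, y, lam)                    # mode-1 step with v fixed
        B = np.einsum("nhw,h->nw", images, u)   # (n, w): row i is X_i^T u
        v = ridge(B, y, lam)                    # mode-2 step with u fixed
    return u, v


def predict(images, u, v):
    return np.einsum("h,nhw,w->n", u, images, v)
```

Each step is an ordinary ridge problem whose size equals one image dimension rather than `h * w`. This is why the structured parameter helps when few labeled samples are available.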

14.
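Dictionary learning, an efficient feature learning technique, is widely used in multi-view classification. Most existing multi-view dictionary learning methods exploit only part of the information in multi-view data and learn only one type of dictionary. In fact, the correlation information and the diversity information of multi-view data are equally important. Moreover, learning algorithms that consider only a synthesis dictionary or only an analysis dictionary cannot simultaneously meet the requirements of processing speed, interpretability, and range of application. To address these problems, this paper proposes a block-diagonal...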

15.
Acquiring land cover types from very high resolution (VHR) images is of great significance to many applications and has been studied intensively for many years. Image classification is difficult, and remote sensing images are acquired at high frequency, so efficient knowledge transfer approaches for understanding multi-temporal VHR images are urgently needed. This letter proposes a knowledge transfer approach that uses the label information of existing VHR images to classify multi-temporal images. The approach is implemented in three steps: object-based change detection, knowledge transfer of label information, and random walker (RW) classification. The proposed approach was tested on two datasets, each consisting of two temporal images acquired over the same geographical area. The experimental results showed that the proposed approach outperformed the support vector machine (SVM) algorithm in classifying multi-temporal images and can reduce the influence of spectral confusion on image classification.
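The first two steps, change detection and label transfer, can be sketched at the object level. The abstract does not specify the change measure. This sketch assumes each object is summarized by its mean spectral vector on each date and that change magnitude is the Euclidean distance between those means. The `threshold` and all names are hypothetical. Objects judged unchanged inherit their earlier label. Changed objects are left unlabeled (`None`) for the final classifier.

```python
import math


def transfer_labels(old_labels, means_t1, means_t2, threshold):
    """old_labels[obj]: class of the object in the earlier, labeled image.
    means_t1/means_t2[obj]: assumed mean spectral vector of the object per date.
    Unchanged objects keep their label; changed ones map to None."""
    transferred = {}
    for obj, label in old_labels.items():
        change = math.dist(means_t1[obj], means_t2[obj])  # object-based change magnitude
        transferred[obj] = label if change < threshold else None
    return transferred
```

In a pipeline like the one described, the transferred labels could then serve as seeds for a random walker classifier (for example, `skimage.segmentation.random_walker` accepts a seed-label array), which assigns classes to the remaining, changed regions.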

16.
In this paper, we deal with applications of textual image compression where the main concerns are a high compression ratio and maintaining or improving the visual quality and readability of the compressed images. In textual images, most of the information lies in the edge regions. Therefore, the compression problem can be studied in the framework of region-of-interest (ROI) coding. The Set Partitioning in Hierarchical Trees (SPIHT) coder is used in an ROI coding framework, together with some image enhancement techniques, to remove the leakage effect that occurs in wavelet-based low-bit-rate compression. We evaluated the compression performance of the proposed method with qualitative and quantitative measures. The qualitative measures include the averaged mean opinion score (MOS) curve, along with sample outputs under different conditions. The quantitative measures include two proposed modified PSNR measures and the conventional PSNR. We compared the proposed method with three conventional approaches: DjVu, JPEG2000, and SPIHT coding. The proposed method considerably outperformed the others, especially in qualitative terms. It improved the MOS by 20% and 30% on average for high- and low-contrast textual images, respectively. In terms of the modified and conventional PSNR measures, the proposed method outperformed DjVu and JPEG2000 by up to 0.4 dB for high-contrast textual images at low bit rates. In addition, for high-contrast images, using the proposed ROI technique improved the average textual PSNR measure by up to 0.5 dB at low bit rates, compared with compression without it.
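For reference, the conventional PSNR used as the baseline measure is `10 * log10(MAX^2 / MSE)`. The abstract does not define the two modified PSNR measures, so only the conventional one is sketched here. Images are given as nested lists of grayscale pixel values with an assumed 8-bit range.

```python
import math


def psnr(original, compressed, max_value=255):
    """Conventional PSNR in dB between two equally sized grayscale images."""
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(original, compressed):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

An error of one grey level at every pixel gives an MSE of 1 and a PSNR of about 48.13 dB. In this context, a gain of 0.4 dB corresponds to roughly a 9% reduction in MSE.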

18.
Browsing multimedia collections on mobile devices raises the need for new multimedia indexing solutions. In this paper, we focus on the management of personal image collections and propose a method to simplify browsing such a collection. The contributions are an incremental hierarchical algorithm, a method to provide a textual representation of the resulting groups, and an algorithm to build a geo-temporal view of the collection. The proposed incremental hierarchical algorithm builds a temporal tree from the timestamp of each image. We opt here for a combination of supervised clustering and an incremental algorithm based on a mixture model. Suitable properties of the hierarchy are determined automatically using the Integrated Completed Likelihood (ICL) criterion. A textual representation is then derived from the resulting events and used to improve our temporal classification by combining geographical and temporal information. Results are validated on several real user collections with our prototype MyOwnLife.
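To show how an ICL-type criterion can choose between candidate groupings of timestamps, the sketch below scores an already-fitted 1-D Gaussian mixture. The mixture parameters are assumed to come from an EM fit, which is not shown. This covers only model selection, not the paper's incremental or supervised clustering. The score uses the "higher is better" convention: log-likelihood minus `(p/2) log n` minus the entropy of the posterior assignments, where `p = 3k - 1` free parameters for `k` components.

```python
import math


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def icl_score(timestamps, weights, means, stds):
    """ICL of a fitted 1-D Gaussian mixture (higher is better)."""
    n, k = len(timestamps), len(weights)
    log_lik = 0.0
    entropy = 0.0
    for t in timestamps:
        dens = [w * normal_pdf(t, m, s) for w, m, s in zip(weights, means, stds)]
        total = sum(dens)
        log_lik += math.log(total)
        for d in dens:
            t_ik = d / total  # posterior probability that t belongs to component i
            if t_ik > 0:
                entropy -= t_ik * math.log(t_ik)
    p = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return log_lik - 0.5 * p * math.log(n) - entropy
```

For photos taken in two well-separated bursts, the two-component model scores higher than a single component. The entropy term additionally penalizes mixtures whose components overlap, which favors clearly separated events.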

20.
A mode-versus-clarity dilemma exists in occupancy-grid-based robotic mapping. It arises because two general approaches have emerged in the domain, with diametrically opposed operational modes and differing representational abilities: the inverse approach and the forward approach. They are classified by the sensor model each employs. The inverse approach is characterised by its ability to construct a map in real time. However, this ability comes at the cost of reduced representational clarity. The forward approach can produce more accurate maps but requires all sensor data a priori. This work investigates whether subdividing the mapping problem into its constituent elements, sensor data evaluation and representation, may facilitate improved real-time map generation. ConForM (Contextual Forward Modelling) is presented as a technique for spatial perception and map building that embodies this approach and addresses this problem. Results from in-depth empirical evaluation illustrate the improvement in map quality resulting from the technique.
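The real-time character of the inverse approach comes from its per-reading update. Each range reading is converted by an inverse sensor model into occupancy evidence and added to per-cell log-odds immediately. The sketch below shows this standard update along a single 1-D beam. It illustrates the inverse baseline only, not ConForM. The values of `p_occ`, `p_free`, and `prior` are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def update_ray(log_odds, hit_index, p_occ=0.7, p_free=0.3, prior=0.5):
    """Inverse-sensor-model update along one 1-D beam: cells before the hit
    receive 'free' evidence, the hit cell receives 'occupied' evidence.
    log_odds is modified in place and returned."""
    l0 = logit(prior)
    for i in range(min(hit_index, len(log_odds))):
        log_odds[i] += logit(p_free) - l0
    if hit_index < len(log_odds):
        log_odds[hit_index] += logit(p_occ) - l0
    return log_odds


def probability(l):
    """Convert a log-odds value back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))
```

Because each reading is folded in independently and discarded, the map is always current. The trade-off is that cells are treated as independent, which is one source of the reduced clarity the abstract attributes to the inverse approach.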
