Similar Documents
Found 20 similar documents (search time: 0 ms)
1.
This paper presents an approach to image understanding in the form of unsupervised scene segmentation. With the goal of image understanding in mind, we consider ‘unsupervised scene segmentation’ the task of dividing a given image into semantically meaningful regions without using annotations or other human-labeled information. We investigate how well an algorithm can partition an image with limited human-involved learning procedures. Specifically, we are interested in developing an unsupervised segmentation algorithm that relies only on a contextual prior learned from a set of images. Our algorithm incorporates a small set of images that are similar to the input image in their scene structures. We use the sparse coding technique to analyze the appearance of this set of images; the effectiveness of sparse coding allows us to derive the context of the scene a priori from the set. Gaussian mixture models can then be constructed for different parts of the input image based on the sparse-coding contextual prior, and combined into a Markov-random-field-based segmentation process. The experimental results show that our unsupervised segmentation algorithm is able to partition an image into semantic regions, such as buildings, roads, trees, and sky, without using human-annotated information. The semantic regions generated by our algorithm can serve as pre-processed inputs for subsequent classification-based labeling algorithms, toward automatic scene annotation and scene parsing.
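The per-pixel labeling step can be illustrated with a minimal sketch: assuming one Gaussian per scene class (with parameters standing in for those learned from the sparse-coding contextual prior), each pixel feature is assigned to the class whose Gaussian explains it best. The 1-D features and class parameters below are invented for illustration; the paper's models are multivariate and coupled through an MRF.

```python
import math

def gaussian_loglik(x, mean, var):
    # Log-likelihood of a scalar feature under a 1-D Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def label_pixel(x, class_params):
    # Pick the scene class whose Gaussian explains the feature best.
    return max(range(len(class_params)),
               key=lambda k: gaussian_loglik(x, *class_params[k]))

# Toy (mean, variance) pairs: a dark "road" class vs a bright "sky" class.
params = [(0.1, 0.01), (0.9, 0.01)]
print([label_pixel(x, params) for x in (0.1, 0.15, 0.9, 0.85)])  # → [0, 0, 1, 1]
```

In the full method these per-pixel likelihoods become the unary terms of the MRF energy, which adds spatial smoothness between neighboring labels.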

2.
The sensing context plays an important role in many pervasive and mobile computing applications. Continuing from previous work [D. Phung, B. Adams, S. Venkatesh, Computable social patterns from sparse sensor data, in: Proceedings of First International Workshop on Location Web, World Wide Web Conference (WWW), New York, NY, USA, 2008, ACM 69–72.], we present an unsupervised framework for extracting user context in indoor environments with existing wireless infrastructures. Our novel approach casts context detection into an incremental, unsupervised clustering setting. Using WiFi observations consisting of access point identification and signal strengths freely available in office or public spaces, we adapt a density-based clustering technique to recover basic forms of user context, including the user's motion state and the significant places the user visits from time to time. High-level user contexts, termed rhythms, comprising sequences of significant places, are derived from the above low-level context by employing probabilistic clustering techniques, latent Dirichlet allocation and its n-gram temporal extension. These user contexts can enable a wide range of context-aware application services. Experimental results with real data, in comparison with existing methods, are presented to validate the proposed approach. Our motion classification algorithm operates in real time and achieves a 10% improvement over an existing method; significant locations are detected with over 90% accuracy and near-perfect cluster purity. Richer indoor context and meaningful rhythms, such as typical daily routines or meeting patterns, are also inferred automatically from collected raw WiFi signals.
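The incremental clustering idea can be sketched as follows, assuming cosine similarity between WiFi fingerprint vectors (one signal-strength entry per access point) and a simple leader-style scheme: each new fingerprint joins the first cluster it is similar enough to, else starts a new one. The paper's actual method is density-based, so this is a simplified stand-in, and the 0.9 threshold is an assumption.

```python
import math

def cosine(a, b):
    # Cosine similarity between two WiFi signal-strength fingerprints.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def incremental_cluster(fingerprints, threshold=0.9):
    # Leader-style incremental clustering: no need to store or re-scan
    # the full observation history, so it runs online.
    leaders, labels = [], []
    for fp in fingerprints:
        for i, leader in enumerate(leaders):
            if cosine(fp, leader) >= threshold:
                labels.append(i)
                break
        else:
            leaders.append(fp)
            labels.append(len(leaders) - 1)
    return labels

# Two scans near the same access point, then one near a different one.
print(incremental_cluster([[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0]]))  # → [0, 0, 1]
```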

3.
A generic algorithm is presented for automatic extraction of buildings and roads from complex urban environments in high-resolution satellite images where the extraction of both object types at the same time enhances the performance. The proposed approach exploits spectral properties in conjunction with spatial properties, both of which actually provide complementary information to each other. First, a high-resolution pansharpened colour image is obtained by merging the high-resolution panchromatic (PAN) and the low-resolution multispectral images yielding a colour image at the resolution of the PAN band. Natural and man-made regions are classified and segmented by the Normalized Difference Vegetation Index (NDVI). Shadow regions are detected by the chromaticity to intensity ratio in the YIQ colour space. After the classification of the vegetation and the shadow areas, the rest of the image consists of man-made areas only. The man-made areas are partitioned by mean shift segmentation where some resulting segments are irrelevant to buildings in terms of shape. These artefacts are eliminated in two steps: First, each segment is thinned using morphological operations and its length is compared to a threshold which is determined according to the empirical length of the buildings. As a result, long segments which most probably represent roads are masked out. Second, the erroneous thin artefacts which are classified by principal component analysis (PCA) are removed. In parallel to PCA, small artefacts are wiped out based on morphological processes as well. The resultant man-made mask image is overlaid on the ground-truth image, where the buildings are previously labelled, for the accuracy assessment of the methodology. The method is applied to Quickbird images (2.4 m multispectral R, G, B, near-infrared (NIR) bands and 0.6 m PAN band) of eight different urban regions, each of which includes different properties of surface objects. 
The images range from simple to complex urban areas. The simple type covers a regular urban area with low density and a regular building pattern. The complex type includes almost all kinds of challenges, such as small and large buildings, regions of bare soil, vegetated areas, and shadows. Although the performance of the algorithm varies slightly with urban complexity, it performs well for all types of urban areas.
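The NDVI used above to separate vegetated from man-made regions is a standard per-pixel ratio of near-infrared and red reflectance; a minimal sketch (the reflectance values are illustrative only):

```python
def ndvi(nir, red, eps=1e-6):
    # Normalized Difference Vegetation Index for one pixel; eps guards
    # against division by zero over dark pixels.
    return (nir - red) / (nir + red + eps)

# Vegetation reflects strongly in NIR and absorbs red, so NDVI is high
# over plants and near zero over many man-made surfaces.
print(round(ndvi(0.5, 0.1), 3))  # → 0.667
```

Thresholding this index per pixel yields the vegetation mask; the remaining non-shadow pixels are the man-made candidates passed to mean shift segmentation.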

4.
This paper presents a structured approach for efficiently exploiting the perspective information of a scene to enhance the detection of objects in monocular systems. It defines a finite grid of 3D positions on the dominant ground plane and computes occupancy maps from which object location estimates are extracted. This method works on top of any detection method, whether pixel-wise (e.g. background subtraction) or region-wise (e.g. detection-by-classification), which can be linked to the proposed scheme with minimal fine-tuning. Its flexibility thus allows for applying this approach in a wide variety of applications and sectors, such as surveillance applications (e.g. person detection) or driver assistance systems (e.g. vehicle or pedestrian detection). Extensive results provide evidence of its excellent performance and its ease of use in combination with different image processing techniques.
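The occupancy-map idea can be sketched as follows: detections projected onto the ground plane accumulate their scores into grid cells, and object locations are read off as maxima. The cell size, grid extent, and peak-extraction rule below are simplifications of the paper's scheme.

```python
def occupancy_map(detections, grid_w, grid_h, cell):
    # Accumulate ground-plane detection scores (x, y, score) into a grid.
    grid = [[0.0] * grid_w for _ in range(grid_h)]
    for x, y, score in detections:
        gx, gy = int(x // cell), int(y // cell)
        if 0 <= gx < grid_w and 0 <= gy < grid_h:
            grid[gy][gx] += score
    return grid

def peak(grid):
    # Extract the most likely object location as the maximal cell.
    best = max((v, gx, gy) for gy, row in enumerate(grid)
                           for gx, v in enumerate(row))
    return best[1], best[2]

# Two weak detections of the same object reinforce one grid cell.
dets = [(1.2, 0.5, 0.9), (1.4, 0.7, 0.8), (3.0, 3.0, 0.3)]
print(peak(occupancy_map(dets, 4, 4, 1.0)))  # → (1, 0)
```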

5.
The main goal of existing approaches for structural texture analysis has been the identification of repeating texture primitives and their placement patterns in images containing a single type of texture. We describe a novel unsupervised method for simultaneous detection and localization of multiple structural texture areas along with estimates of their orientations and scales in real images. First, multi-scale isotropic filters are used to enhance the potential texton locations. Then, regularity of the textons is quantified in terms of the periodicity of projection profiles of filter responses within sliding windows at multiple orientations. Next, a regularity index is computed for each pixel as the maximum regularity score together with its orientation and scale. Finally, thresholding of this regularity index produces accurate localization of structural textures in images containing different kinds of textures as well as non-textured areas. Experiments using three different data sets show the effectiveness of the proposed method in complex scenes.
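The regularity quantification can be approximated with a small sketch: score the periodicity of a 1-D projection profile by its normalized autocorrelation at the best candidate period. The actual method scores sliding windows at multiple orientations and scales; this single-profile version is a simplification.

```python
def regularity_score(profile, min_period=2):
    # Periodicity of a 1-D projection profile: the best normalized
    # autocorrelation over candidate periods (higher = more regular).
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    energy = sum(v * v for v in centered) or 1.0
    best = 0.0
    for lag in range(min_period, n // 2 + 1):
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        best = max(best, acf / energy)
    return best

periodic = [0, 1, 0, 1, 0, 1, 0, 1]     # regular texton spacing
irregular = [0, 1, 0, 0, 1, 0, 1, 1]    # same values, no clean period
print(regularity_score(periodic) > regularity_score(irregular))  # → True
```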

6.
An unsupervised approach based on the Information Bottleneck (IB) principle is proposed for detecting acoustic events from audio streams. In this paper, the IB principle is first concisely presented, and then the practical issues related to applying it to acoustic event detection are described in detail, including definitions of the various variables, the criterion for determining the number of acoustic events, the trade-off between the amount of information preserved and the compression of the initial representation, and the detection steps. Further, we compare the proposed approach with both unsupervised and supervised approaches on four different types of audio files. Experimental results show that the proposed approach achieves lower detection errors and faster runtime than two state-of-the-art unsupervised approaches, and is only slightly inferior to the state-of-the-art supervised approach in terms of both detection errors and runtime. The advantage of the proposed unsupervised approach over the supervised approach is that it requires neither pre-trained classifiers nor any prior knowledge of the audio streams.

7.
In this paper, we present a feature-based approach for monocular scene reconstruction based on Extended Kalman Filters (EKF). Our method processes a sequence of images taken by a single camera mounted frontally on a mobile robot. Using a combination of various techniques, we are able to produce a precise reconstruction that is free from outliers and can therefore be used for reliable obstacle detection and 3D map building. Furthermore, we present an attention-driven method that focuses the feature selection to image areas where the obstacle situation is unclear and where a more detailed scene reconstruction is necessary. In extensive real-world field tests we show that the presented approach is able to detect obstacles that are not seen by other sensors, such as laser range finders. Furthermore, we show that visual obstacle detection combined with a laser range finder can increase the detection rate of obstacles considerably, allowing the autonomous use of mobile robots in complex public and home environments.
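The EKF-based refinement of each feature reduces, in the linear 1-D case, to the standard Kalman update that fuses a predicted estimate with a new measurement; a minimal sketch with made-up numbers (the paper's filter is the full multivariate EKF):

```python
def kalman_update(mean, var, meas, meas_var):
    # Fuse a predicted depth estimate with a new measurement (1-D step).
    k = var / (var + meas_var)          # Kalman gain
    new_mean = mean + k * (meas - mean)
    new_var = (1 - k) * var             # uncertainty shrinks after fusion
    return new_mean, new_var

m, v = 10.0, 4.0                  # prior depth estimate and its variance
m, v = kalman_update(m, v, 12.0, 4.0)
print(m, v)                       # → 11.0 2.0
```

With equal variances the update lands halfway between prediction and measurement, and repeated updates steadily reduce the variance, which is why outlier-free features converge to precise depths.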

8.
任蕾, 施朝健, 冉鑫. 《计算机工程与应用》(Computer Engineering and Applications), 2012, 48(23): 161-164, 172
A saliency detection method for maritime scenes based on singular value decomposition (SVD) is proposed. Color-channel and intensity-channel features are extracted from a maritime scene image, and SVD is applied to each feature separately; the typical components of each feature are selected according to a preset threshold. The coarse saliency map of each feature is defined as the difference between the feature and its typical components. To further suppress interference such as sea clutter, global saliency in the spatial domain is computed over the coarse saliency maps to form the per-feature saliency maps. The color-channel and intensity-channel saliency maps are then linearly combined into an overall saliency map. Experiments on maritime scene images demonstrate the effectiveness of the proposed method.
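The SVD step can be sketched as follows: reconstructing a channel from its top singular components captures the regular sea-surface structure, and the residual serves as a coarse saliency map. The rank choice and the toy image below are illustrative; the method selects typical components by a threshold on the singular values.

```python
import numpy as np

def coarse_saliency(channel, rank):
    # Reconstruct the channel from its top singular components (the
    # "typical" background structure); the residual highlights salient
    # deviations such as small targets on the sea surface.
    u, s, vt = np.linalg.svd(channel, full_matrices=False)
    background = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.abs(channel - background)

img = np.ones((8, 8))
img[4, 4] = 5.0                      # a small bright target on calm sea
sal = coarse_saliency(img, rank=1)
print(sal.argmax())                  # → 36 (row 4, col 4)
```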

9.
In animation video analysis, detecting scene-change points online and in real time is a fundamental task. Traditional pixel- and threshold-based detection methods require storing the entire animation video, their results are strongly affected by object motion and noise, and fixed thresholds are ill-suited to complex scene transitions. This paper proposes an online Bayesian-decision method for detecting animation scene changes. The new method first divides each animation frame into blocks and extracts HSV color features, then stores the similarities between consecutive frames in a fixed-length buffer queue, and finally applies a dynamic Bayesian decision to determine whether a scene change has occurred. Comparative experiments on several types of animation videos show that the new method detects animation scene changes online and more robustly.
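The buffer-queue idea can be sketched with a simplified rule: keep recent inter-frame similarities in a fixed-length queue and flag a cut when a new similarity drops far below their average. The actual decision in the paper is a dynamic Bayesian one; the fixed drop ratio here is a stand-in for that.

```python
from collections import deque

def hist_similarity(h1, h2):
    # Histogram intersection between two normalized HSV histograms.
    return sum(min(a, b) for a, b in zip(h1, h2))

def is_scene_cut(buffer, new_sim, drop=0.5):
    # Declare a cut when the new inter-frame similarity falls far below
    # the recent average kept in the fixed-length queue.
    if len(buffer) < buffer.maxlen:
        return False
    return new_sim < drop * (sum(buffer) / len(buffer))

buf = deque(maxlen=3)
cuts = []
for sim in [0.95, 0.93, 0.94, 0.2, 0.96]:   # 0.2 marks the scene change
    cuts.append(is_scene_cut(buf, sim))
    buf.append(sim)
print(cuts)  # → [False, False, False, True, False]
```

Because only the fixed-length queue is kept, the detector runs online without storing the whole video, which is the point the abstract makes against traditional methods.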

10.
Park  Seyoung  Kang  Jaewoong  Kim  Jongmo  Lee  Seongil  Sohn  Mye 《Multimedia Tools and Applications》2019,78(4):4417-4435
Multimedia Tools and Applications - In this paper, we propose an anomaly detection system for machines using a hybrid learning mechanism that combines two kinds of machine learning approaches,...

11.
This paper presents a new partitioning algorithm, designated as the Adaptive C-Populations (ACP) clustering algorithm, capable of identifying natural subgroups and influential minor prototypes in an unlabeled dataset. In contrast to traditional Fuzzy C-Means clustering algorithms, which partition the whole dataset equally, adaptive clustering algorithms, such as that presented in this study, identify the natural subgroups in unlabeled datasets. In this paper, data points within a small, dense region located at a relatively large distance from any of the major cluster centers are considered to form a minor prototype. The aim of ACP is to adaptively separate these isolated minor clusters from the major clusters in the dataset. The study commences by introducing the mathematical model of the proposed ACP algorithm and demonstrates its convergence to a stable solution. The ability of ACP to detect minor prototypes is confirmed via its application to the clustering of three different datasets with different sizes and characteristics.

12.
International Journal on Document Analysis and Recognition (IJDAR) - How to precisely detect arbitrary-shaped texts in natural images has recently become a new hot topic in areas of computer vision...  相似文献   

13.
Rapid building detection using machine learning
This work describes algorithms for performing discrete object detection, specifically in the case of buildings, where usually only low-quality, RGB-only geospatial reflective imagery is available. We utilize new candidate search and feature extraction techniques to reduce the problem to a machine learning (ML) classification task. Here we can harness the complex patterns of contrast features contained in training data to establish a model of buildings. We avoid costly sliding windows to generate candidates; instead we innovatively stitch together well-known image processing techniques to produce candidates for building detection that cover 80–85% of buildings. Reducing the number of possible candidates is important due to the scale of the problem. Each candidate is subjected to classification which, although linear, costs time and prohibits large-scale evaluation. We propose a candidate alignment algorithm to boost classification performance to 80–90% precision with a linear-time algorithm and show it has negligible cost. Also, we propose a new concept called a Permutable Haar Mesh (PHM) which we use to form and traverse a search space to recover candidate buildings which were lost in the initial preprocessing phase. All code and datasets from this paper are made available online (http://kdl.cs.umb.edu/w/datasets/ and https://github.com/caitlinkuhlman/ObjectDetectionCLUtility).

14.
15.
In order to process video data efficiently, a video segmentation technique based on scene change detection is required. This is a fundamental operation used in many digital video applications such as digital libraries, video on demand (VOD), etc. Many of these advanced video applications require manipulations of compressed video signals, so the scene change detection process is performed by analyzing the video directly in the compressed domain, thereby avoiding the overhead of decompressing the video into individual frames in the pixel domain. In this paper, we propose a fast scene change detection algorithm using direct feature extraction from MPEG compressed videos, and evaluate this technique using sample video data. First, we derive binary edge maps from the AC coefficients in discrete-cosine-transformed blocks. Second, we measure edge orientation, strength, and offset using correlation between the AC coefficients in the derived binary edge maps. Finally, we match two consecutive frames using two of these features (edge orientation and strength). This process was made possible by a new mathematical formulation for deriving the edge information directly from the discrete cosine transform (DCT) coefficients. We have shown that the proposed algorithm is faster or more accurate than the previously known scene change detection algorithms.
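The edge features derived from DCT coefficients can be approximated in a small sketch: treating the first horizontal and vertical AC coefficients of an 8×8 block as gradient components gives a per-block edge strength and orientation without decoding pixels. This is a simplification of the paper's formulation, which uses correlations between AC coefficients.

```python
import math

def dct_edge_features(ac01, ac10):
    # Approximate edge strength and orientation of an 8x8 DCT block from
    # its first horizontal (AC01) and vertical (AC10) AC coefficients,
    # read directly from the compressed stream.
    strength = math.hypot(ac01, ac10)
    orientation = math.degrees(math.atan2(ac10, ac01))
    return strength, orientation

s, o = dct_edge_features(3.0, 4.0)
print(round(s, 1), round(o, 1))  # → 5.0 53.1
```

Matching these per-block features between consecutive frames, instead of pixel values, is what lets the detector stay entirely in the compressed domain.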

16.
The combination of the Spinning Enhanced Visible and Infrared Imager (SEVIRI) and the Geostationary Earth Radiation Budget (GERB) instruments on Meteosat-8 provides a powerful new tool for detecting aerosols and estimating their radiative effect at high temporal and spatial resolution. However, at present no specific aerosol treatment is performed in the GERB processing chain, severely limiting the use of the data for aerosol studies. A particular problem relates to the misidentification of Saharan dust outbreaks as cloud, which can bias the shortwave and longwave fluxes. In this paper an algorithm is developed which employs multiple linear regression, using information from selected thermal infrared SEVIRI channels, to detect dust aerosol over ocean and provide an estimate of the optical depth at 0.55 μm (τ0.55). To test the performance of the algorithm, it has been applied to a number of dust events observed by SEVIRI during March and June 2004. The results are compared to co-located MODIS observations taken from the Terra and Aqua platforms, and ground-based observations from the Cape Verde AERONET site. In terms of detection capability, employing the algorithm results in a notable improvement in the routine GERB scene identification. Locations identified by MODIS as likely to be dust-contaminated were originally classified as cloud in over 99.5% of the cases studied. With the application of the detection algorithm, approximately 60–70% of these points are identified as dusty, depending on the dust model employed. The algorithm is also capable of detecting dust in regions and at times which would be excluded when using shortwave observations, due, for example, to the presence of sun-glint, or during the night. We further investigate whether the algorithm is capable of generating useful information concerning the aerosol loading.
Comparisons with co-located retrievals from the SEVIRI 0.6 μm solar reflectance band observations show a level of agreement consistent with that expected from the simulations, with rms differences between 0.5 and 0.8, and a mean bias ranging from −0.5 to 0.3, dependent on the dust representation employed in the algorithm. Temporally resolved comparisons with observations from the Cape Verde AERONET site through the months of March and June reinforce these findings, but also indicate that the algorithm is capable of discerning the diurnal pattern in aerosol loading. The algorithm has now been incorporated within the routine GERB processing in detection mode, and will be used to provide an experimental aerosol product for assessment by the scientific community.
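The core retrieval step, a multiple linear regression from thermal IR brightness temperatures to dust optical depth, can be sketched as follows. The channel values, optical depths, and two-channel setup are toy assumptions for illustration, not the paper's trained coefficients:

```python
import numpy as np

# Illustrative brightness temperatures (K) from two SEVIRI thermal IR
# channels, with matching dust optical depths at 0.55 um (toy numbers).
bt = np.array([[285.0, 290.0],
               [283.0, 292.0],
               [280.0, 291.0]])
tau = np.array([0.3, 0.9, 1.4])

X = np.column_stack([np.ones(3), bt])   # intercept + two channels
coef = np.linalg.solve(X, tau)          # exact fit: 3 equations, 3 unknowns
print(np.allclose(X @ coef, tau))       # → True
```

In practice the regression would be fit by least squares over many more samples than coefficients, and applied per ocean pixel to produce the τ0.55 estimate.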

17.
Raj  Chahat  Meel  Priyanka 《Applied Intelligence》2021,51(11):8132-8148

A surge of false information circulates on the internet. Social media and websites are flooded with unverified news posts. These posts comprise text, images, audio, and videos. There is a requirement for a system that detects fake content across multiple data modalities. We have seen a considerable amount of research on classification techniques for textual fake news detection, while frameworks dedicated to visual fake news detection are very few. We explored the state-of-the-art methods using deep networks such as CNNs and RNNs for multi-modal online information credibility analysis. They show rapid improvement in classification tasks without requiring pre-processing. To aid the ongoing research on fake news detection using CNN models, we build textual and visual modules to analyze their performance over multi-modal datasets. We exploit latent features present inside text and images using layers of convolutions. We examine how well these convolutional neural networks perform classification when provided with only latent features, and analyze what types of images need to be fed to the network for efficient fake news detection. We propose a multi-modal Coupled ConvNet architecture that fuses both data modules and efficiently classifies online news depending on its textual and visual content. We then offer a comparative analysis of the results of all the models utilized over three datasets. The proposed architecture outperforms various state-of-the-art methods for fake news detection with considerably high accuracies.
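The fusion idea can be illustrated, in a heavily simplified decision-level form, by combining per-modality probabilities. The proposed Coupled ConvNet fuses the modules inside the network rather than at the decision level, so this sketch (including the equal weighting and 0.5 threshold) only shows the principle:

```python
def fuse_predictions(p_text, p_image, w_text=0.5):
    # Late fusion: weighted combination of per-modality fake probabilities.
    return w_text * p_text + (1 - w_text) * p_image

def classify(p_text, p_image, threshold=0.5):
    # A post is flagged when the fused probability crosses the threshold.
    return "fake" if fuse_predictions(p_text, p_image) > threshold else "real"

print(classify(0.9, 0.7))  # → fake (both modalities agree it is suspicious)
print(classify(0.2, 0.3))  # → real
```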


18.
Assistive technologies for the elderly often use ambient sensor systems to infer activities of daily living (ADL). In general, such systems assume that only a single person (the resident) is present in the home. However, in real-world environments, visits are common, and it is crucial to know when the resident is alone or not. We deal with this challenge by presenting a novel method that models regular activity patterns and detects visits. Our method is based on the Markov modulated Poisson process (MMPP), but is extended to allow the incorporation of multiple feature streams. The results from the experiments on nine months of sensor data collected in two apartments show that our model significantly outperforms the standard MMPP. We validate the generalisation of the model using two new data sets collected from another sensor network.
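A much-simplified version of the visit-detection idea: if the sensor-event count in a time slot is implausible under the Poisson rate learned for the resident alone, flag a likely visit. The MMPP models rates and regime transitions jointly across feature streams; this per-slot tail test (with made-up rates) is only a stand-in.

```python
import math

def poisson_sf(k, lam):
    # P(X > k) for a Poisson(lam) count: how surprising a high count is.
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i)
              for i in range(k + 1))
    return 1.0 - cdf

def looks_like_visit(count, usual_rate, alpha=0.01):
    # Flag a slot whose event count is implausible (P(X >= count) < alpha)
    # under the resident-alone rate learned for that time of day.
    return poisson_sf(count - 1, usual_rate) < alpha

print(looks_like_visit(30, usual_rate=8))  # → True  (far above usual activity)
print(looks_like_visit(9, usual_rate=8))   # → False (normal fluctuation)
```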

19.
Society is shifting toward a new paradigm in which an increasing number of older adults live alone. In parallel, the incidence of conditions that affect mobility and independence is also rising as a consequence of longer life expectancy. In this paper, the specific problem of falls among older adults is addressed by devising a technological solution for monitoring these users. Video cameras, accelerometers and GPS sensors are combined in a multi-modal approach to monitor humans inside and outside the domestic environment. Machine learning techniques are used to detect falls and classify activities from accelerometer data. Video feeds and GPS are used to provide location inside and outside the domestic environment. The result is a monitoring solution that does not confine users to a closed environment.
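A minimal sketch of accelerometer-based fall detection by impact-magnitude thresholding; the paper uses machine learning classifiers on accelerometer data, so this threshold baseline and its 2.5 g cutoff are illustrative assumptions only.

```python
import math

def magnitude(ax, ay, az):
    # Total acceleration magnitude of one sample, in units of g.
    return math.sqrt(ax * ax + ay * ay + az * az)

def detect_fall(samples, impact_g=2.5):
    # A fall shows up as a brief high-magnitude impact spike in the
    # accelerometer stream; normal walking stays near 1 g.
    return any(magnitude(*s) > impact_g for s in samples)

walking = [(0.1, 0.2, 1.0), (0.0, 0.3, 1.1)]
fall = walking + [(1.5, 2.0, 2.2)]          # impact spike at the end
print(detect_fall(walking), detect_fall(fall))  # → False True
```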

20.
Objective: Scene text detection is one of the key tasks in scene understanding and text recognition. Although deep-learning-based algorithms have significantly improved detection accuracy, existing methods extract the local semantics of text and the global semantics between text instances insufficiently, so multi-level textual semantics are not modeled and detection accuracy suffers. To address this problem, a scene text detection algorithm based on hierarchical semantic fusion is proposed. Method: The method comprises a text-segment-based local semantic understanding module and a text-instance-based global semantic understanding module, which respectively guide the network to attend to multi-level semantic information within text regions and across text instances. First, the local module divides text into multiple segments according to relative position and, under the supervision of a fine-grained optimization objective, strengthens the network's perception of local semantics. Then, the global module uses the coarse segmentation of text segments to filter out background regions and extract reliable text-region features, and adaptively captures the global semantics of arbitrarily shaped text through an attention mechanism to obtain the final segmentation result. In addition, to reduce the interference of prediction noise in boundary regions on the aggregation of hierarchical semantic information, a boundary-aware loss function is proposed to reduce the ambiguity of boundary-region features. Results: The algorithm is evaluated on three commonly used scene text detection datasets and compared with other methods, achieving significant improvements: on the Total-Text dataset, the F-measure is 87.0%, a 1.0% improvement over other models; on the MSRA-TD500 (MSRA text detection 500 database) dataset, the F-measure is 88.2%, a 1.0% improvement; on the ICDAR 2015 (International Conference on Document Analysis and Recognition) dataset, the F-measure is 87.0%. Conclusion: By constructing semantic context at different levels and imposing an additional penalty on ambiguous features, the proposed model resolves the problem of insufficient hierarchical semantic extraction and achieves higher detection accuracy.
