Similar Documents
20 similar documents found.
1.
2.
Landmark annotation for training images is essential for many learning tasks in computer vision, such as object detection, tracking, and alignment. Image annotation is typically conducted manually, which is both labor-intensive and error-prone. To improve this process, this paper proposes a new approach to estimating the locations of a set of landmarks for a large image ensemble using manually annotated landmarks for only a small number of images in the ensemble. Our approach, named semi-supervised least-squares congealing, aims to minimize an objective function defined on both annotated and unannotated images. A shape model is learned online to constrain the landmark configuration. We employ an iterative coarse-to-fine patch-based scheme together with a greedy patch selection strategy for landmark location estimation. Extensive experiments on facial images show that our approach can reliably and accurately annotate landmarks for a large image ensemble starting with a small number of manually annotated images, under several challenging scenarios.
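A minimal sketch of the least-squares idea behind this kind of congealing: estimate a landmark in an unannotated image by searching a small window for the location whose patch best matches, in the least-squares sense, the patches extracted at the manually annotated landmarks. The patch size, search radius, and plain SSD criterion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def extract_patch(img, y, x, half=8):
    """Crop a (2*half+1)^2 patch centred at (y, x); assumes it fits inside the image."""
    return img[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)

def estimate_landmark(unlabeled_img, labeled_imgs, labeled_pts, init_yx, radius=6, half=8):
    """Search around an initial guess for the point whose patch minimizes the
    squared difference to the mean annotated patch (a least-squares data term)."""
    mean_patch = np.mean(
        [extract_patch(im, y, x, half) for im, (y, x) in zip(labeled_imgs, labeled_pts)],
        axis=0)
    best, best_cost = init_yx, np.inf
    y0, x0 = init_yx
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = extract_patch(unlabeled_img, y0 + dy, x0 + dx, half)
            cost = np.sum((cand - mean_patch) ** 2)   # least-squares congealing term
            if cost < best_cost:
                best, best_cost = (y0 + dy, x0 + dx), cost
    return best
```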

3.
Objective: Changes in eye state can serve as evidence of a user's true psychological state and emotional changes. Because the eye region is small and the pupil and iris are similar in color, capturing changes in pupil size and position with an ordinary camera under natural light is currently a challenging task. In addition, the lack of eye-structure datasets with fine localization and segmentation annotations that resemble real application environments has also constrained research progress in this field. To address these problems, this paper collects eye image data captured with an ordinary camera, captures pupil variation information, and builds an eye image segmentation and landmark detection dataset (ESLD). Method: We collect, annotate, and publicly release ESLD, an image dataset containing multiple eye types. Images are acquired in three ways: 1) capturing facial images of users while they use a computer; 2) collecting facial images from published datasets that were captured with an ordinary camera under natural light; 3) synthesizing eye images with the publicly available software UnityEye. The three acquisition methods yield 1,386, 804, and 1,600 eye images, respectively. After the raw images are obtained, the eye region is segmented from each image, and eye images of different sizes are normalized to 256×128 pixels. Finally, landmark points are manually annotated and the eye structures are segmented. Results: The ESLD dataset contains multiple types of eye images and can meet the different needs of researchers. Because collecting real eye images in practice or obtaining them from public datasets is difficult, UnityEye is used to generate eye images and alleviate the shortage of training data. Experimental results show that the synthesized eye images effectively compensate for the lack of data, with an F1 score of up to 0.551. Deep learning methods are used to provide baselines for the eye landmark localization and eye structure segmentation tasks. With ResNet101 as the feature extraction network, the eye landmark localization error is 5.828 and the mAP (mean average precision) of eye structure segmentation reaches 0.965. Conclusion: The ESLD dataset provides data support for researchers studying users' emotional changes and psychological states through eye images.
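An illustrative sketch of the normalization step described above: crop the eye region from a face image and resize it to 256×128 pixels. The bounding-box source and the use of OpenCV are assumptions for illustration, not details from the paper.

```python
import cv2

def normalize_eye(face_img, bbox, out_size=(256, 128)):
    """bbox = (x, y, w, h) of the eye region; returns a 256 (wide) x 128 (tall) crop."""
    x, y, w, h = bbox
    eye = face_img[y:y + h, x:x + w]
    # cv2.resize takes (width, height), so out_size=(256, 128) gives 256x128 pixels.
    return cv2.resize(eye, out_size, interpolation=cv2.INTER_LINEAR)
```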

4.
This paper proposes a new framework for visual place recognition that incrementally learns models of each place and offers adaptability to dynamic elements in the scene. Traditional Bag-Of-Words (BOW) image-retrieval approaches to place recognition typically treat images in a holistic manner and are not capable of dealing with sub-scene dynamics, such as structural changes to a building façade or seasonal effects on foliage. However, by treating local features as observations of real-world landmarks in a scene that is observed repeatedly over a period of time, such dynamics can be modelled at a local level, and the spatio-temporal properties of each landmark can be independently updated incrementally. The proposed method models each place as a set of such landmarks and their geometric relationships. A new BOW filtering stage and geometric verification scheme are introduced to compute a similarity score between a query image and each scene model. As further training images are acquired for each place, the landmark properties are updated over time and, in the long term, the model can adapt to dynamic behaviour in the scene. Results on an outdoor dataset of images captured along a 7 km path, over a period of 5 months, show an improvement in recognition performance when compared to state-of-the-art image retrieval approaches to place recognition.
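A minimal sketch of a BOW filtering stage, assuming tf-idf weighted visual-word histograms and cosine similarity; the paper's full scoring additionally involves geometric verification against each place's landmark model, which is omitted here.

```python
import numpy as np

def tfidf(histograms):
    """histograms: (n_images, vocab_size) raw visual-word counts."""
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    df = np.count_nonzero(histograms, axis=0)
    idf = np.log(len(histograms) / np.maximum(df, 1))
    return tf * idf

def bow_scores(query_hist, db_hists):
    """Cosine similarity between the query and each database image (higher = more similar)."""
    vecs = tfidf(np.vstack([query_hist, db_hists]))
    q, db = vecs[0], vecs[1:]
    return db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-12)
```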

5.
In many current medical applications of image analysis, objects are detected and delimited by boundary curves or surfaces. Yet the most effective multivariate statistics available pertain to labeled points (landmarks) only. In the finite-dimensional feature space that landmarks support, each case of a data set is equivalent to a deformation map deriving it from the average form. This paper introduces a new extension of the finite-dimensional spline-based approach for incorporating edge information. In this implementation edgels are restricted to landmark loci: they are interpreted as pairs of landmarks at infinitesimal separation in a specific direction. The effect of changing edge direction is a singular perturbation of the thin-plate spline for the landmarks alone. An appropriate normalization yields a basis for image deformations corresponding to changes of edge direction without landmark movement; this basis complements the basis of landmark deformations ignoring edge information. We derive explicit formulas for these edge warps, evaluate the quadratic form expressing bending energies of their formal combinations, and show the resulting spectrum of edge features in typical scenes. These expressions will aid all investigations into medical images that entail comparisons of anatomical scene analyses to a normative or typical form.
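A worked sketch of the standard 2D thin-plate-spline bending-energy construction that the edge warps above extend: build the TPS system matrix from the landmark configuration and read off the quadratic form whose value on a displacement vector is (proportional to) the bending energy. Variable names and the example points are illustrative, not taken from the paper.

```python
import numpy as np

def tps_bending_matrix(points):
    """points: (n, 2) landmark coordinates. Returns the n x n bending-energy
    matrix B such that h^T B h is (up to a constant) the bending energy of a
    displacement h at the landmarks."""
    n = len(points)
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    K = np.where(d2 > 0, d2 * np.log(d2 + 1e-300), 0.0)        # U(r) = r^2 log r^2
    P = np.hstack([np.ones((n, 1)), points])                    # affine part
    L = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    L_inv = np.linalg.inv(L)
    return L_inv[:n, :n]                                         # bending-energy block

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
B = tps_bending_matrix(pts)
h = np.array([0.0, 0.0, 0.0, 0.0, 0.1])                         # small displacement at one landmark
print("bending energy:", h @ B @ h)
```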

6.
Learning Active Basis Model for Object Detection and Recognition
This article proposes an active basis model, a shared sketch algorithm, and a computational architecture of sum-max maps for representing, learning, and recognizing deformable templates. In our generative model, a deformable template is in the form of an active basis, which consists of a small number of Gabor wavelet elements at selected locations and orientations. These elements are allowed to slightly perturb their locations and orientations before they are linearly combined to generate the observed image. The active basis model, in particular, the locations and the orientations of the basis elements, can be learned from training images by the shared sketch algorithm. The algorithm selects the elements of the active basis sequentially from a dictionary of Gabor wavelets. When an element is selected at each step, the element is shared by all the training images, and the element is perturbed to encode or sketch a nearby edge segment in each training image. The recognition of the deformable template from an image can be accomplished by a computational architecture that alternates the sum maps and the max maps. The computation of the max maps deforms the active basis to match the image data, and the computation of the sum maps scores the template matching by the log-likelihood of the deformed active basis.

7.
Geotag propagation in social networks based on user trust model
In the past few years, sharing photos within social networks has become very popular. To make these huge collections easier to explore, images are usually tagged with representative keywords such as persons, events, objects, and locations. To speed up the time-consuming tag annotation process, tags can be propagated based on the similarity between image content and context. In this paper, we present a system for efficient geotag propagation based on a combination of object duplicate detection and user trust modeling. The geotags are propagated by training a graph-based object model for each landmark on a small tagged image set and finding its duplicates within a large untagged image set. Based on the established correspondences between these two image sets and the reliability of the user, tags are propagated from the tagged to the untagged images. The user trust modeling reduces the risk of propagating wrong tags caused by spamming or faulty annotation. The effectiveness of the proposed method is demonstrated through a set of experiments on an image database containing various landmarks.
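A minimal sketch of trust-weighted tag propagation: a tag is copied from a tagged image to an untagged one only when an object-duplicate match exists and the combined match/user-trust score clears a threshold. The multiplicative scoring rule and the threshold are assumptions for illustration, not the paper's exact model.

```python
def propagate_geotags(matches, user_trust, threshold=0.5):
    """matches: list of (untagged_id, tagged_id, match_score, tag, user_id).
    user_trust: dict user_id -> trust in [0, 1]. Returns dict untagged_id -> tag."""
    propagated = {}
    for untagged_id, tagged_id, match_score, tag, user_id in matches:
        score = match_score * user_trust.get(user_id, 0.0)   # discount unreliable annotators
        if score > threshold and score > propagated.get(untagged_id, (None, -1.0))[1]:
            propagated[untagged_id] = (tag, score)
    return {img: tag for img, (tag, _) in propagated.items()}
```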

8.
Steganography algorithm recognition is a sub-field of steganalysis. Analysis shows that when a steganalysis detector trained on one cover source is applied to images from an unseen source, detection performance generally decreases. To tackle this problem, this paper proposes a steganalytic scheme for steganography algorithm recognition. For a given test image, a match image is first generated by applying Gaussian filtering to the test image to remove any possible stego signal; the match image is then embedded with each of the steganography algorithms to be recognized. A CNN model trained on a training set is used to extract deep features from the test image and the match images. The similarity between features is computed with an inner product or a weighted χ² measure, and the final decision is made according to the similarity between the test feature and each class of match features. The proposed scheme can also detect steganography algorithms not present in the training set. Experiments show that, compared with a directly applied CNN model, the proposed scheme achieves a considerable improvement in test accuracy when the test images come from an unseen source.
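A minimal sketch of the pipeline described above: build a "match image" by Gaussian filtering the test image, re-embed it with each candidate steganography algorithm, extract features, and pick the class whose match feature is most similar. The feature extractor and embedders are stand-in callables (assumptions), and cosine similarity stands in for the paper's inner-product/weighted-χ² measures.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def recognize_algorithm(test_img, embedders, extract_features, sigma=1.0):
    """embedders: dict algo_name -> function(img) -> stego image.
    extract_features: function(img) -> 1-D feature vector (e.g. a trained CNN)."""
    match_img = gaussian_filter(test_img.astype(np.float64), sigma=sigma)  # strip possible stego signal
    f_test = extract_features(test_img)
    best_algo, best_sim = None, -np.inf
    for name, embed in embedders.items():
        f_match = extract_features(embed(match_img))
        sim = f_test @ f_match / (np.linalg.norm(f_test) * np.linalg.norm(f_match) + 1e-12)
        if sim > best_sim:
            best_algo, best_sim = name, sim
    return best_algo, best_sim
```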

9.
An algorithm for accurate localization of facial landmarks coupled with head pose estimation from a single monocular image is proposed. The algorithm is formulated as an optimization problem in which the sum of individual landmark scoring functions is maximized with respect to the camera pose by fitting a parametric 3D shape model. The landmark scoring functions are trained by a structured output SVM classifier that takes the distance to the true landmark position into account during learning. The optimization criterion is non-convex, so we propose a robust initialization scheme that employs a global method to detect a raw but reliable initial landmark position. Self-occlusions that make landmarks invisible are handled explicitly by excluding the corresponding contributions from the data term. This allows the algorithm to operate correctly over a large range of viewing angles. Experiments on standard “in-the-wild” datasets demonstrate that the proposed algorithm outperforms several state-of-the-art landmark detectors, especially for non-frontal face images. The algorithm achieves an average relative landmark localization error below 10% of the interocular distance on 98.3% of the 300-W dataset test images.

10.
This paper describes an approach to training a database of building images under the supervision of a user, which is then applied to recognize buildings in an urban scene. Given a set of training images, we first detect the building facets and compute their properties, such as area, wall color histogram, and a list of local features. All facets of each building surface are used to construct a common model whose initial parameters are selected randomly from one of these facets. The common model is then updated step by step using the spatial relationships of the remaining facets and an SVD-based (singular value decomposition) approximation vector. To verify the correspondence of image pairs, we propose a new cross-ratio-based technique that is better suited to building surfaces than several previous approaches. Finally, the trained database is used to recognize a set of test images. The proposed method reduces the size of the database to approximately 0.148 times its original size, while automatically rejecting randomly repeated features from the scene and natural noise in the local features. Furthermore, we show that the problem of multiple buildings is solved by analyzing each surface of a building separately.
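A small illustration of the cross ratio, the projective invariant behind this kind of correspondence verification: for four collinear points A, B, C, D, (A,B;C,D) = (AC/BC)/(AD/BD) is preserved under perspective projection. The acceptance test built on top of it here is a simplified assumption, not the paper's exact scheme.

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """a..d: 2-D points assumed (approximately) collinear."""
    a, b, c, d = map(np.asarray, (a, b, c, d))
    ac, bc = np.linalg.norm(c - a), np.linalg.norm(c - b)
    ad, bd = np.linalg.norm(d - a), np.linalg.norm(d - b)
    return (ac / bc) / (ad / bd)

def consistent(pts1, pts2, tol=0.05):
    """Accept a correspondence of two ordered 4-point sets if their cross ratios agree."""
    return abs(cross_ratio(*pts1) - cross_ratio(*pts2)) < tol
```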

11.
Automatically locating facial landmarks in images is an important task in computer vision. This paper proposes a novel context modeling method for facial landmark detection, which integrates context constraints with a local texture model in the cascaded AdaBoost framework. The motivation of our method lies in the basic observation from human psychology that people use not only local texture information but also global context information to locate facial landmarks in faces. Therefore, in our solution, a novel type of feature, called the Non-Adjacent Rectangle (NAR) Haar-like feature, is proposed to characterize the co-occurrence between facial landmarks and their surroundings, i.e., the context information, in terms of low-level features. For the locating task, traditional Haar-like features (characterizing local texture information) and NAR Haar-like features (characterizing context constraints in a global sense) are combined to form more powerful representations. Through Real AdaBoost learning, the most discriminative feature set is selected automatically and used for facial landmark detection. To verify the effectiveness of the proposed method, we evaluate our facial landmark detection algorithm on the BioID and Cohn-Kanade face databases. Experimental results convincingly show that the NAR Haar-like feature is effective in modeling the context, and our proposed algorithm impressively outperforms published state-of-the-art methods. In addition, the generalization capability of the NAR Haar-like feature is further validated by extending it to the face detection task on the FDDB face database.
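A minimal sketch of a Non-Adjacent Rectangle Haar-like feature: the difference of mean intensities of two rectangles that need not touch, computed in O(1) from an integral image. The rectangle placement and the plain mean-difference form are illustrative assumptions.

```python
import numpy as np

def integral_image(img):
    return np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] read from the integral image ii."""
    total = ii[y + h - 1, x + w - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0 and x > 0:
        total += ii[y - 1, x - 1]
    return total

def nar_feature(ii, rect_a, rect_b):
    """rect_* = (y, x, h, w); the two rectangles may be far apart (non-adjacent)."""
    ya, xa, ha, wa = rect_a
    yb, xb, hb, wb = rect_b
    return rect_sum(ii, ya, xa, ha, wa) / (ha * wa) - rect_sum(ii, yb, xb, hb, wb) / (hb * wb)
```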

12.
Objective: Remote photoplethysmography (rPPG) is a video-based, non-contact heart rate measurement method that tracks facial skin regions and extracts the weak, periodically varying color signal from them to estimate heart rate. The Dlib library, trained with a cascaded-regression-tree facial landmark method, can locate facial contours quickly and accurately and is increasingly used by researchers to track the skin region of interest (ROI). Because landmarks jitter irregularly in practice and existing studies do not account for subject motion, the extracted color signal is inaccurate and heart rate estimation accuracy suffers. To overcome these shortcomings, a Dlib-based tracking method that is robust to landmark jitter and body motion is proposed. Method: The method has two main steps. First, a threshold is used to judge the difference between landmarks in consecutive frames: if they are similar, the previous landmarks are reused; otherwise the current frame's landmarks are used, which resolves the jitter problem. Second, to handle motion, the rotation angle is computed from the midpoints of the left- and right-eye landmarks and the rotated face is rectified, so that the ROI remains consistent during motion. Results: Signal-to-noise ratio (SNR), mean absolute error (MAE), and root mean squared error (RMSE) are used to evaluate the tracking method's performance in rPPG measurement. Tested on the UBFC-RPPG (Univ. Bourgogne Franche-Comté Remote PhotoPlethysmoGraphy) and PURE (Pulse Rate Detection Dataset) datasets and compared with Dlib, the proposed method improves the rPPG measurement SNR on UBFC-RPPG by about 0.425 dB, improves MAE by 0.2915 bpm, and reduces RMSE by 0.6453 bpm; on PURE, SNR decreases by 0.0411 dB, MAE decreases by 0.0652 bpm, and RMSE decreases by 0.2718 bpm. Conclusion: Compared with Dlib, the proposed method effectively improves the stability of the tracking box and can track the same ROI whether the subject is still or moving, making it suitable for rPPG applications.
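A minimal sketch of the two stabilization steps described above: reuse the previous frame's landmarks when the inter-frame change is below a threshold, and rectify in-plane rotation using the line joining the two eye centers. The threshold value, index conventions, and the use of OpenCV are assumptions for illustration.

```python
import numpy as np
import cv2

def stabilize_landmarks(prev_pts, curr_pts, thresh=2.0):
    """Keep the previous landmarks if the mean displacement is below thresh (pixels)."""
    if prev_pts is None:
        return curr_pts
    if np.mean(np.linalg.norm(curr_pts - prev_pts, axis=1)) < thresh:
        return prev_pts          # suppress landmark jitter
    return curr_pts              # real motion: accept the new landmarks

def deroll_face(frame, left_eye_pts, right_eye_pts):
    """Rotate the frame so the eye line is horizontal, keeping the ROI consistent."""
    left_c = left_eye_pts.mean(axis=0)
    right_c = right_eye_pts.mean(axis=0)
    angle = np.degrees(np.arctan2(right_c[1] - left_c[1], right_c[0] - left_c[0]))
    cx, cy = (left_c + right_c) / 2
    M = cv2.getRotationMatrix2D((float(cx), float(cy)), float(angle), 1.0)
    return cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))
```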

13.
In this paper, a landmark selection and tracking approach is presented for mobile robot navigation in natural environments, using textural-distinctiveness-based saliency detection and spatial information acquired from stereo data. The presented method focuses on achieving high tracking robustness rather than self-positioning accuracy. The landmark selection method is designed to select a small number of the most salient feature points in a wide variety of sparse, unknown environments to ensure successful matching. Landmarks are selected by an iterative algorithm from a textural-distinctiveness-based saliency map extended with spatial information, in which a repulsive potential field is created around the position of each already selected landmark so that landmarks are better distributed and robustness increases. The template matching of landmarks is aided by visual-odometry-based motion estimation. Other strategies for increasing robustness include estimating landmark positions with unscented Kalman filters as well as from surrounding landmarks. Experimental results show that the introduced method is robust and suitable for natural environments.
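A minimal sketch of iterative landmark selection with a repulsive potential field: after each pick, the saliency map is suppressed around the chosen point so that later picks are spread out. The Gaussian-shaped penalty and its radius are illustrative assumptions.

```python
import numpy as np

def select_landmarks(saliency, n_landmarks=10, sigma=15.0):
    """saliency: 2-D map; returns a list of (row, col) landmark positions."""
    sal = saliency.astype(np.float64).copy()
    H, W = sal.shape
    yy, xx = np.mgrid[0:H, 0:W]
    picks = []
    for _ in range(n_landmarks):
        r, c = np.unravel_index(np.argmax(sal), sal.shape)
        picks.append((r, c))
        # Repulsive potential: damp saliency around the selected landmark.
        sal *= 1.0 - np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2))
    return picks
```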

14.
The work presented in this paper aims to develop a system for automatic translation of static gestures of alphabets and signs in American Sign Language. To do so, we use the Hough transform and a neural network trained to recognize signs. Our system does not rely on gloves or visual markings to achieve the recognition task. Instead, it deals with images of bare hands, which allows the user to interact with the system in a natural way. An image is processed and converted to a feature vector that is compared with the feature vectors of a training set of signs. The extracted features are not affected by rotation, scaling, or translation of the gesture within the image, which makes the system more flexible. The system was implemented and tested using a data set of 300 samples of hand sign images, 15 images for each sign. Experiments revealed that our system was able to recognize selected ASL signs with an accuracy of 92.3%.

15.
This paper presents Visual ENhancement of USers (VENUS), a system that automatically enhances male and female frontal facial images using a database of celebrities as reference patterns for attractiveness. Each face is represented by a set of landmark points that can be selected manually or localized automatically using active shape models. Faces are compared by remapping the landmarks with Catmull–Rom splines, a class of interpolating splines particularly useful for extracting shape-based representations. Given an input image, its landmarks are compared against the known beauty templates and moved towards the K nearest ones by 2D image warping. The VENUS performance was evaluated by 20 volunteers on a set of images collected during the Festival of Creativity, held in Florence, Italy, in October 2007. The experiments show that 73.9% of the beautified faces are more attractive than the original pictures.
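A small illustration of Catmull–Rom interpolation, the spline family used above to turn landmark sets into comparable shape representations. The closed-contour sampling helper is an assumption for illustration, not the paper's code.

```python
import numpy as np

def catmull_rom(p0, p1, p2, p3, t):
    """Point on the Catmull–Rom segment between p1 and p2 at parameter t in [0, 1]."""
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

def sample_curve(points, samples_per_seg=10):
    """Densely sample a closed landmark contour with Catmull–Rom segments."""
    pts = np.asarray(points, dtype=np.float64)
    n = len(pts)
    out = []
    for i in range(n):
        p0, p1, p2, p3 = pts[(i - 1) % n], pts[i], pts[(i + 1) % n], pts[(i + 2) % n]
        for t in np.linspace(0, 1, samples_per_seg, endpoint=False):
            out.append(catmull_rom(p0, p1, p2, p3, t))
    return np.array(out)
```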

16.
Application of Faster R-CNN to Defect Detection in Industrial CT Images
Objective: Traditional defect image recognition algorithms require manually designing and selecting the target's main features and choosing a suitable classifier, which is quite limiting. This paper therefore studies a defect detection method based on Faster R-CNN (Faster Regions with Convolutional Neural Network features), which uses a convolutional network to extract target features automatically and avoids the dependence of defect detection on hand-crafted defect features. Method: The method is based on a convolutional neural network. First, the detection task is defined: the three main defect types in industrial CT (computed tomography) images, namely slag inclusions, bubbles, and cracks, are chosen as detection targets. Next, the defect images are manually annotated with bounding boxes (GT boxes) to generate coordinate files, and 42 anchor types are selected according to the aspect ratios of the boxes. Before training, the dataset is augmented with homomorphic filtering; the augmented images pass through convolutional and pooling layers to obtain convolutional feature maps, which are fed into a region proposal network (RPN) for an initial object-versus-background decision (without distinguishing specific classes) and coarse bounding-box regression. Finally, after an RoI (region of interest) pooling layer outputs fixed-size proposals, a classification network assigns each proposal a specific class and precisely regresses the target's bounding box. Results: The images in the test set range from 150×150 to 350×250 pixels, and each image contains several bubbles, slag inclusions, and cracks of different classes. The trained model effectively identifies defect targets of different classes; the smallest detectable defect region is 9×9 pixels. Bubbles, slag inclusions, and cracks are located quickly and accurately, with a detection accuracy of up to 96% and an average detection time of 86 ms per image. Conclusion: The proposed Faster R-CNN defect detection method for industrial CT images avoids the manual feature selection required by traditional defect detection, making defect identification and localization more automated. The method performs well, and detecting additional defect types only requires fine-tuning the network to obtain a new detection model. This paper provides a more efficient method for defect detection in industrial CT images.
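A minimal sketch of homomorphic filtering, used above for data augmentation: work on the log image in the frequency domain, attenuating low frequencies (illumination) and boosting high frequencies (detail). The filter parameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def homomorphic_filter(img, gamma_l=0.5, gamma_h=1.5, c=1.0, d0=30.0):
    """img: 2-D grayscale array. Returns the homomorphically filtered image."""
    img = img.astype(np.float64) + 1.0
    F = np.fft.fftshift(np.fft.fft2(np.log(img)))
    H, W = img.shape
    u = np.arange(H) - H / 2
    v = np.arange(W) - W / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2                     # squared distance from DC
    Hf = (gamma_h - gamma_l) * (1.0 - np.exp(-c * D2 / (d0 ** 2))) + gamma_l
    out = np.exp(np.real(np.fft.ifft2(np.fft.ifftshift(F * Hf)))) - 1.0
    return np.clip(out, 0, None)
```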

17.
This paper proposes a discriminative framework for efficiently aligning images. Although conventional Active Appearance Model (AAM)-based approaches have achieved some success, they suffer from a generalization problem: how to align any image with a generic model. We treat the iterative image alignment problem as a process of maximizing the score of a trained two-class classifier that is able to distinguish correct alignment (positive class) from incorrect alignment (negative class). During the modeling stage, given a set of images with ground-truth landmarks, we train a conventional Point Distribution Model (PDM) and a boosting-based classifier, which acts as an appearance model. When tested on an image with initial landmark locations, the proposed algorithm iteratively updates the shape parameters of the PDM via gradient ascent so that the classification score of the warped image is maximized. We use the term Boosted Appearance Models (BAMs) to refer to the learned shape and appearance models, as well as to our specific alignment method. The proposed framework is applied to the face alignment problem. Through extensive experimentation, we show that, compared to the AAM-based approach, this framework improves the robustness, accuracy, and efficiency of face alignment by a large margin, especially for unseen data.
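A minimal sketch of the alignment loop described above: update the shape parameters by gradient ascent on the classifier score of the image warped to the current shape. The finite-difference gradient and fixed step size are illustrative assumptions; the paper's method uses its own (analytic) gradient.

```python
import numpy as np

def align(score_fn, p0, step=0.1, eps=1e-3, n_iters=50):
    """score_fn(p): classification score of the image warped by shape parameters p."""
    p = np.asarray(p0, dtype=np.float64).copy()
    for _ in range(n_iters):
        grad = np.array([(score_fn(p + eps * e) - score_fn(p - eps * e)) / (2 * eps)
                         for e in np.eye(len(p))])
        p += step * grad            # move shape parameters toward a higher classifier score
    return p
```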

18.
Sharing visual features for multiclass and multiview object detection
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (runtime) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. We present a multitask learning procedure, based on boosted decision stumps, that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required and, therefore, the runtime cost of the classifier, is observed to scale approximately logarithmically with the number of classes. The features selected by joint training are generic edge-like features, whereas the features chosen by training each class separately tend to be more object-specific. The generic features generalize better and considerably reduce the computational cost of multiclass object detection.

19.
A Scene Image Classification Method Based on Content Correlation
Scene image classification is a fundamental problem in computer vision. This paper proposes a scene image classification method based on content correlation. First, visual words are extracted from each image, and the image is represented as a frequency vector of visual words. Then a generative model is used to learn the topics contained in the training set and the related topics contained in each image. Finally, a discriminative classifier is used for multi-class learning. The proposed method models the correlation between topics with a logistic normal distribution, which makes the learned per-class topic distributions more accurate, and the learning process requires no manual annotation of image content. A new local region descriptor is also proposed, combining the gradient and color information of local regions. The method is evaluated on collections of natural-scene and man-made-scene images and achieves better results than traditional methods.

20.
This paper presents a technique for selecting an optimal number of features from the original feature set. Because of the large number of features considered, it is computationally more efficient to select a subset of features that discriminates as well as the original set. The subset of features is determined using stepwise discriminant analysis. We report the results of using this scheme to classify scaled, rotated, and translated binary images, as well as images that have been perturbed with random noise. The features used in this study are Zernike moments, which are the mapping of the image onto a set of complex orthogonal polynomials. The performance of the subset is examined by comparing it to the original set. The classifiers used in this study are a neural network and a statistical nearest-neighbor classifier. The back-propagation learning algorithm is used to train the neural network. The classifiers are trained with some noiseless images and tested with the remaining data set. When an optimal subset of features is used, the classifiers perform almost as well as when trained with the original set of features.
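A minimal sketch of the evaluation idea: classify with a 1-nearest-neighbour rule using either the full feature set or a selected subset and compare accuracies. The subset is passed in by column index (e.g. the output of a stepwise discriminant analysis), which is an assumption for illustration.

```python
import numpy as np

def nn_accuracy(train_x, train_y, test_x, test_y, feature_idx=None):
    """1-NN classification accuracy using all features or only the columns in feature_idx."""
    if feature_idx is not None:
        train_x, test_x = train_x[:, feature_idx], test_x[:, feature_idx]
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=-1)  # pairwise distances
    pred = train_y[np.argmin(d, axis=1)]
    return np.mean(pred == test_y)
```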

