20 similar documents found; search time: 15 ms
1.
Alex D. Holub, Max Welling, Pietro Perona. International Journal of Computer Vision, 2008, 77(1-3): 239-258
Learning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative
approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful
features, one of which is the ability to naturally establish explicit correspondence between model components and scene features—this,
in turn, allows for the handling of missing data and unsupervised learning in clutter. We explore a hybrid generative/discriminative
approach, using ‘Fisher Kernels’ (Jaakkola, T., et al. in Advances in neural information processing systems, Vol. 11, pp. 487–493,
1999), which retains most of the desirable properties of generative methods, while increasing the classification performance through
a discriminative setting. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements
over the corresponding generative approach. In addition, we demonstrate how this hybrid learning paradigm can be extended
to address several outstanding challenges within computer vision including how to combine multiple object models and learning
with unlabeled data.
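As an illustration of the Fisher-kernel idea described above (mapping each sample to the gradient of a generative model's log-likelihood and training a discriminative classifier on those vectors), the following sketch fits a diagonal-covariance Gaussian mixture and derives Fisher score vectors with respect to the component means. It is a minimal, generic example on synthetic data (a GMM over raw feature vectors with a linear SVM on top), not the constellation-style object models used in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 8)), rng.normal(1.5, 1, (200, 8))])
y = np.array([0] * 200 + [1] * 200)

# Generative stage: fit a class-independent GMM to all samples.
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(X)

def fisher_scores(gmm, X):
    """Gradient of the per-sample log-likelihood w.r.t. the GMM means
    (one block of d values per mixture component)."""
    resp = gmm.predict_proba(X)                      # (n, K) responsibilities
    diffs = X[:, None, :] - gmm.means_[None, :, :]   # (n, K, d)
    grads = resp[:, :, None] * diffs / gmm.covariances_[None, :, :]
    return grads.reshape(len(X), -1)                 # (n, K*d) Fisher score vectors

# Discriminative stage: linear SVM on the Fisher score vectors.
F = fisher_scores(gmm, X)
print(cross_val_score(LinearSVC(max_iter=5000), F, y, cv=5).mean())
```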
2.
Object count/area graphs for the evaluation of object detection and segmentation algorithms
Christian Wolf, Jean-Michel Jolion. International Journal on Document Analysis and Recognition, 2006, 8(4): 280-296
Evaluation of object detection algorithms is a non-trivial task: a detection result is usually evaluated by comparing the bounding box of the detected object with the bounding box of the ground truth object. The commonly used precision and recall measures are computed from the overlap area of these two rectangles. However, these measures have several drawbacks: they do not give intuitive information about the proportion of correctly detected objects and the number of false alarms, and they cannot be accumulated across multiple images without creating ambiguity in their interpretation. Furthermore, quantitative and qualitative evaluation are often mixed, resulting in ambiguous measures. In this paper we propose a new approach which tackles these problems. The performance of a detection algorithm is illustrated intuitively by performance graphs which present object-level precision and recall depending on constraints on detection quality. In order to compare different detection algorithms, a representative single performance value is computed from the graphs. The influence of the test database on the detection performance is illustrated by performance/generality graphs. The evaluation method can be applied to different types of object detection algorithms. It has been tested on different text detection algorithms, among which are the participants of the ICDAR 2003 text detection competition. The work presented in this article was conceived within two industrial contracts with France Télécom, in the framework of the projects ECAV I and ECAV II (contract numbers 001B575 and 0011BA66).
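A minimal sketch of the object-level counting the abstract argues for: a detection is matched to a ground-truth box only if both the area recall (intersection over ground-truth area) and the area precision (intersection over detection area) satisfy quality thresholds, and precision and recall are then counted over objects. The thresholds t_r and t_p and the greedy one-to-one matching are simplifying assumptions; the paper's full method also handles split and merged detections.

```python
def inter_area(a, b):
    # Boxes as (x1, y1, x2, y2).
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def object_level_pr(detections, ground_truth, t_r=0.8, t_p=0.4):
    """Object-level precision/recall under quality constraints (greedy 1-to-1 matching)."""
    matched_gt, matched_det = set(), set()
    for i, d in enumerate(detections):
        for j, g in enumerate(ground_truth):
            if j in matched_gt:
                continue
            inter = inter_area(d, g)
            if inter / area(g) >= t_r and inter / area(d) >= t_p:
                matched_gt.add(j)
                matched_det.add(i)
                break
    precision = len(matched_det) / max(len(detections), 1)
    recall = len(matched_gt) / max(len(ground_truth), 1)
    return precision, recall

print(object_level_pr([(0, 0, 10, 10), (50, 50, 60, 60)], [(1, 1, 11, 11)]))
```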
3.
Recognition is the fundamental task of visual cognition, yet how to formalize the general recognition problem for computer vision remains an open issue. The problem is sometimes reduced to the simplest case of recognizing matching pairs, often structured to allow for metric constraints. However, visual recognition is broader than just pair-matching: what we learn and how we learn it has important implications for effective algorithms. In this review paper, we reconsider the assumption of recognition as a pair-matching test, and introduce a new formal definition that captures the broader context of the problem. Through a meta-analysis and an experimental assessment of the top algorithms on popular data sets, we gain a sense of how often metric properties are violated by recognition algorithms. By studying these violations, useful insights come to light: we make the case for local distances and systems that leverage outside information to solve the general recognition problem.
4.
The present study employs deep learning methods to recognize repetitive assembly actions and estimate their operating times. It is intended to monitor the assembly process of workers and prevent assembly quality problems caused by the lack of key operational steps and the irregular operation of workers. Based on the repeatability and tool dependence of assembly actions, the recognition of the assembly action is treated as tool object detection in the present study. The YOLOv3 algorithm is first applied to locate and identify the assembly tools and thereby recognize the worker's assembly action; the accuracy of this action recognition is 92.8%. Then, the deep-learning-based pose estimation algorithm CPM is used to recognize human joints. Finally, the joint coordinates are extracted to determine the operating times of the repetitive assembly actions. The accuracy of determining the operating times of repetitive assembly actions is 82.1%.
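The abstract pairs a tool detector (YOLOv3) with joint coordinates from the CPM pose estimator to determine how many times an action is performed. As a toy illustration of only the last step, turning a joint-coordinate time series into a repetition count, the sketch below counts peaks in the vertical trajectory of a single synthetic wrist joint. The prominence threshold and the use of one joint are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic vertical wrist trajectory: ~6 repetitions of an up/down motion plus noise.
t = np.linspace(0, 6 * 2 * np.pi, 600)
wrist_y = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

# Each prominent peak is treated as one completed repetition of the assembly action.
peaks, _ = find_peaks(wrist_y, prominence=0.5)
print("estimated repetitions:", len(peaks))   # expected: 6
```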
5.
Visual detection and recognition of intruding targets is a key research topic in UAV sense-and-avoid technology. The central task is for the UAV, during flight, to acquire optical images through onboard sensors, determine whether an intruding target is present, and then detect, recognize, and localize it. Visual detection and recognition of intruding targets is one of the key technologies for safely integrating UAVs into the national airspace and ensuring the flight safety of both unmanned and manned aircraft. Focusing on this aspect of UAV sense-and-avoid technology, this paper analyzes the main difficulties in detecting and recognizing intruding targets, surveys the principal current processing methods, points out unsolved problems in the field, and discusses future development trends.
6.
Mobile battery-operated devices are becoming an essential instrument for business, communication, and social interaction. In addition to the demand for an acceptable level of performance and a comprehensive set of features, users often desire extended battery lifetime. In fact, limited battery lifetime is one of the biggest obstacles facing the current utility and future growth of increasingly sophisticated “smart” mobile devices. This paper proposes a novel application-aware and user-interaction aware energy optimization middleware framework (AURA) for pervasive mobile devices. AURA optimizes CPU and screen backlight energy consumption while maintaining a minimum acceptable level of performance. The proposed framework employs a novel Bayesian application classifier and management strategies based on Markov Decision Processes and Q-Learning to achieve energy savings. Real-world user evaluation studies on Google Android-based HTC Dream and Google Nexus One smartphones running the AURA framework demonstrate promising results, with up to 29% energy savings compared to the baseline device manager, and up to 5× savings over prior work on CPU and backlight energy co-optimization.
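The abstract mentions management strategies based on Markov Decision Processes and Q-Learning. Purely as a generic illustration of that control pattern, and not AURA's actual state, action, or reward design (which the abstract does not specify), the sketch below runs tabular Q-learning in which the state is a discretized workload level, the action is a CPU power level, and the reward trades simulated energy against a performance penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3          # workload levels x CPU power levels (illustrative)
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):
    # Toy environment: higher power costs more energy; too little power for a
    # heavy workload incurs a performance penalty. Workload drifts randomly.
    energy = 1.0 + 2.0 * action
    perf_penalty = 4.0 * max(state - action, 0)
    reward = -(energy + perf_penalty)
    next_state = int(np.clip(state + rng.integers(-1, 2), 0, n_states - 1))
    return next_state, reward

state = 0
for _ in range(20000):
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("learned power level per workload level:", Q.argmax(axis=1))
```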
7.
8.
A k-nearest-neighbor multi-label text classification algorithm based on vector angles
In multi-label learning, an instance may be associated with several concept labels. The goal of the learning system is to learn from a training set of multi-label examples so as to predict, as correctly as possible, the label set of unseen instances. The k-nearest-neighbor algorithm has been applied to multi-label learning: a test instance is represented as a multi-dimensional vector, and its label vector is determined from the label vectors of its k nearest neighbors. The traditional kNN algorithm selects neighbors by the spatial distance between vectors, whereas in natural language processing the similarity between texts is commonly expressed by the angle between their vectors. This paper therefore takes the angle between text vectors as the criterion for selecting the k nearest neighbors and, combining it with the kNN algorithm, proposes a multi-label text learning algorithm. Experiments show that the algorithm achieves good accuracy in document classification.
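A minimal sketch of the angle-based neighbor selection described above: neighbors are ranked by cosine similarity between term-frequency-style vectors instead of Euclidean distance, and the test label vector is obtained by voting over the neighbors' label vectors. The 0.5 voting threshold is an assumption, not taken from the paper.

```python
import numpy as np

def knn_multilabel_cosine(X_train, Y_train, x_test, k=5, vote=0.5):
    """Predict a binary label vector for x_test from its k nearest neighbors,
    where 'nearest' means smallest angle (largest cosine similarity)."""
    Xn = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    xn = x_test / np.linalg.norm(x_test)
    sims = Xn @ xn                          # cosine similarity to every training doc
    nn = np.argsort(-sims)[:k]              # indices of the k most similar documents
    return (Y_train[nn].mean(axis=0) >= vote).astype(int)

rng = np.random.default_rng(0)
X = rng.random((100, 50))                   # toy term-frequency vectors
Y = rng.integers(0, 2, (100, 4))            # toy multi-label matrix (4 labels)
print(knn_multilabel_cosine(X, Y, rng.random(50)))
```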
9.
Boosting is a popular family of machine learning algorithms. Using AdaBoost.MH, a member of the Boosting family, as the classification algorithm, we design an automatic classifier for Chinese text and present the evaluation method and results. The evaluation shows that the classifier's accuracy is comparable to that of SVM and better than that of classifiers based on other classification algorithms.
10.
11.
This paper studies the shortcomings of existing system-call-based anomaly detection methods. Since neither call sequences nor system-call frequencies alone can fully characterize process behavior, it proposes a new approach that takes the ordering of system calls and the stability between them as key features, extracts system-call feature vectors, and applies machine-learning classification algorithms to perform anomaly detection. The proposed method has the advantages of a small model size, well-defined features, and high alarm accuracy. Tests on static data show that describing process behavior with system-call timing features is feasible; experiments in a real-time environment show that the system consumes few resources and does not affect the running efficiency of programs or of the network itself; and a user keystroke-dynamics recognition experiment further demonstrates the effectiveness of the timing features for behavior detection.
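The abstract does not spell out the exact feature construction, so the sketch below is only a generic stand-in for the pipeline it describes: encode each process's system-call trace with features that reflect call frequencies and local call ordering (here, unigram and bigram counts) and feed the vectors to an off-the-shelf classifier. The synthetic traces, the bigram choice, and the random-forest classifier are all illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy system-call traces: each trace is a space-joined sequence of call names.
normal = ["open read read write close", "open read write close", "open read read close"] * 20
attack = ["open execve socket connect write", "socket connect execve write close"] * 20
traces = normal + attack
labels = [0] * len(normal) + [1] * len(attack)

# Unigram + bigram counts preserve both call frequencies and (local) call ordering.
vec = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vec.fit_transform(traces)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())
```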
12.
This paper surveys the problem of text detection in natural scenes and the progress of methods for it. First, it discusses the characteristics of natural-scene text and the research background, current state, and main technical routes of natural-scene text detection. Second, from the perspectives of traditional text detection and deep-learning-based text detection, it reviews, analyzes, and compares the strengths and weaknesses of the various detection methods, and introduces end-to-end text recognition techniques. It then discusses the challenges facing natural-scene text detection and explores possible solutions. Finally, it lists the benchmark datasets and evaluation methods, compares the performance of the most representative natural-scene text detection methods, and looks ahead to development trends in the field.
13.
Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, the use of a text-classification system on this is a bit more problematic—in the most straightforward approach each number would be considered a distinct token and treated as a word. This paper presents an alternative approach for the use of text classification methods for supervised learning problems with numerical-valued features in which the numerical features are converted into bag-of-words features, thereby making them directly usable by text classification methods. We show that even on purely numerical-valued data the results of text classification on the derived text-like representation outperforms the more naive numbers-as-tokens representation and, more importantly, is competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data adding numerical features using our approach can improve performance over not adding those features.
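One plausible way to realize the conversion described above, turning numerical features into bag-of-words features that a text classifier can consume, is to discretize each numeric feature into quantile bins and emit a token per feature-bin pair. The binning scheme and token naming below are assumptions for illustration; the paper's actual construction may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_num = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

def numbers_to_text(X, n_bins=8):
    """Map each numeric value to a token like 'f2_b5' (feature 2, quantile bin 5)."""
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1]) for j in range(X.shape[1])]
    docs = []
    for row in X:
        docs.append(" ".join(f"f{j}_b{np.digitize(row[j], edges[j])}" for j in range(len(row))))
    return docs

docs = numbers_to_text(X_num)
X_bow = CountVectorizer(token_pattern=r"\S+").fit_transform(docs)
print(cross_val_score(MultinomialNB(), X_bow, y, cv=5).mean())
```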
14.
Handling imbalanced data with weight retouching and improved classification
王和勇 (Wang Heyong). 计算机应用与软件 (Computer Applications and Software), 2009, 26(8): 144-146, 161
An imbalanced dataset is one in which the number of samples of some class is markedly smaller than that of the other classes. Traditional classification algorithms tend to favor the majority class when handling imbalanced classification problems, which leads to low classification accuracy for the minority class. For imbalanced text data, this paper first applies a weight-retouching method for feature extraction and then classifies the texts with an under-sampling support vector machine (SVM). Experiments show that the SVM with weight retouching and under-sampling improves the classification accuracy on imbalanced data.
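The weight-retouching step is specific to the paper and not detailed in the abstract, so the sketch below illustrates only the second ingredient: randomly under-sampling the majority class so that the SVM is trained on a balanced subset. The class sizes, the subsampling ratio, and the linear SVM are illustrative choices.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 20)), rng.normal(1, 1, (50, 20))])
y = np.array([0] * 950 + [1] * 50)               # heavily imbalanced: 950 vs. 50
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Under-sample the majority class down to the size of the minority class.
maj = np.flatnonzero(y_tr == 0)
mino = np.flatnonzero(y_tr == 1)
keep = np.concatenate([rng.choice(maj, size=len(mino), replace=False), mino])

clf = LinearSVC(max_iter=5000).fit(X_tr[keep], y_tr[keep])
print(classification_report(y_te, clf.predict(X_te)))
```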
15.
Marco Pedersoli, Jordi Gonzàlez, Andrew D. Bagdanov, Xavier Roca. Pattern Recognition Letters, 2011, 32(13): 1581-1587
Human detection is fundamental in many machine vision applications, such as video surveillance, driving assistance, action recognition and scene understanding. However, in most of these applications real-time performance is necessary, and this is not yet achieved by current detection methods. This paper presents a new method for human detection based on a multiresolution cascade of Histograms of Oriented Gradients (HOG) that greatly reduces the computational cost of the detection search without affecting accuracy. The method consists of a cascade of sliding-window detectors. Each detector is a linear Support Vector Machine (SVM) composed of HOG features at different resolutions, from coarse at the first level to fine at the last one. In contrast to previous methods, our approach uses a non-uniform stride of the sliding window that is defined by the feature resolution and allows the detection to be incrementally refined when going from coarse to fine resolution. In this way, the speed-up of the cascade is due not only to the smaller number of features computed at the first levels of the cascade, but also to the reduced number of windows that need to be evaluated at the coarse resolution. Experimental results show that our method reaches a detection rate comparable with the state of the art of detectors based on HOG features, while the detection search is up to 23 times faster.
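The sketch below illustrates only the control flow described above: a cascade in which the sliding-window stride shrinks with the feature resolution, and only windows that survive the coarse level are re-scored at the finer one. The scoring functions are random-weight placeholders standing in for the per-level HOG features and linear SVMs, and the strides, window size, and thresholds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((240, 320))
WIN = 64                                           # detection window size (pixels)

# Placeholder per-level scorers: a real cascade would use HOG features at each
# resolution and a linear SVM trained per level.
levels = [
    {"stride": 32, "w": rng.normal(size=(8, 8)),   "thr": 0.0},   # coarse level
    {"stride": 8,  "w": rng.normal(size=(32, 32)), "thr": 0.5},   # fine level
]

def score(window, w):
    # Stand-in for "linear SVM on HOG features": resample the window to the
    # scorer's resolution and take a dot product with its weights.
    ys = np.linspace(0, window.shape[0] - 1, w.shape[0]).astype(int)
    xs = np.linspace(0, window.shape[1] - 1, w.shape[1]).astype(int)
    return float((window[np.ix_(ys, xs)] * w).sum())

# Coarse level: scan with a large stride (few windows).
cand = [(y, x) for y in range(0, image.shape[0] - WIN, levels[0]["stride"])
               for x in range(0, image.shape[1] - WIN, levels[0]["stride"])
        if score(image[y:y + WIN, x:x + WIN], levels[0]["w"]) > levels[0]["thr"]]

# Fine level: re-score only the survivors, refined with the smaller stride.
dets = []
for y0, x0 in cand:
    for dy in range(0, levels[0]["stride"], levels[1]["stride"]):
        for dx in range(0, levels[0]["stride"], levels[1]["stride"]):
            y, x = y0 + dy, x0 + dx
            if y + WIN <= image.shape[0] and x + WIN <= image.shape[1]:
                if score(image[y:y + WIN, x:x + WIN], levels[1]["w"]) > levels[1]["thr"]:
                    dets.append((y, x))

print(f"{len(cand)} coarse candidates -> {len(dets)} fine detections")
```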
16.
Existing classification algorithms use a set of training examples to select classification features, which are then used for all future applications of the classifier. A major problem with this approach is the selection of a training set: a small set will result in reduced performance, and a large set will require extensive training. In addition, class appearance may change over time, requiring an adaptive classification system. In this paper, we propose a solution to these basic problems by developing an on-line feature selection method, which continuously modifies and improves the features used for classification based on the examples provided so far. The method is used for learning a new class, and to continuously improve classification performance as new data becomes available. In ongoing learning, examples are continuously presented to the system, and new features arise from these examples. The method continuously measures the value of the selected features using mutual information, and uses these values to efficiently update the set of selected features when new training information becomes available. The problem is challenging because at each stage the training process uses a small subset of the training data. Surprisingly, with sufficient training data the on-line process reaches the same performance as a scheme that has complete access to the entire training data.
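As a rough illustration of mutual-information-driven feature selection updated as data arrives, the sketch below keeps the top-k features by estimated mutual information after each new batch. The abstract's efficient incremental update is not reproduced here; this naive version simply re-estimates mutual information on everything seen so far.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

def make_batch(n=200, d=50, informative=5):
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, d))
    X[:, :informative] += y[:, None] * 1.5      # only the first few features carry signal
    return X, y

seen_X, seen_y, k = [], [], 10
for batch in range(5):
    Xb, yb = make_batch()
    seen_X.append(Xb)
    seen_y.append(yb)
    # Naive "online" update: re-estimate MI on all examples seen so far.
    X_all, y_all = np.vstack(seen_X), np.concatenate(seen_y)
    mi = mutual_info_classif(X_all, y_all, random_state=0)
    selected = np.argsort(-mi)[:k]
    print(f"after batch {batch}: selected features {sorted(selected.tolist())}")
```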
17.
In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators to video frames of different modalities by classifying their background complexities. Effective operators such as repeated shifting operations are applied for noise removal in images with high edge density, while a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.
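A toy illustration of two ingredients mentioned above: adaptive thresholding for segmentation and projection profiles for text-line extraction. The synthetic frame, the mean-based adaptive threshold, and the single-pass horizontal projection are simplifications; the paper's method is coarse-to-fine and adapts to the frame's background complexity.

```python
import numpy as np
import cv2

# Synthetic grayscale frame: dark background with two brighter horizontal "text" bands.
frame = np.full((120, 320), 40, dtype=np.uint8)
frame[20:35, 30:290] = 200
frame[70:85, 30:290] = 200
frame = cv2.add(frame, np.random.default_rng(0).integers(0, 20, frame.shape, dtype=np.uint8))

# Adaptive thresholding: segment candidate text pixels from the local background.
binary = cv2.adaptiveThreshold(frame, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 31, -10)

# Horizontal projection profile: rows with many "on" pixels form text lines.
profile = (binary > 0).sum(axis=1)
rows = profile > 0.3 * binary.shape[1]
lines, start = [], None
for i, on in enumerate(list(rows) + [False]):
    if on and start is None:
        start = i
    elif not on and start is not None:
        lines.append((start, i - 1))
        start = None
print("detected text-line row ranges:", lines)
```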
18.
19.
Dimitris Fragoudis, Dimitris Meretakis, Spiridon Likothanassis. Knowledge and Information Systems, 2005, 8(1): 16-33
In this paper, we propose a new feature-selection algorithm for text classification, called Best Terms (BT). The complexity of BT is linear with respect to the number of training-set documents and is independent of both the vocabulary size and the number of categories. We evaluate BT on two benchmark document collections, Reuters-21578 and 20-Newsgroups, using two classification algorithms, naive Bayes (NB) and support vector machines (SVM). Our experimental results, comparing BT with an extensive and representative list of feature-selection algorithms, show that (1) BT is faster than the existing feature-selection algorithms; (2) BT leads to a considerable increase in the classification accuracy of NB and SVM as measured by the F1 measure; (3) BT leads to a considerable improvement in the speed of NB and SVM; in most cases, the training time of SVM drops by an order of magnitude; (4) in most cases, the combination of BT with the simple, but very fast, NB algorithm yields classification accuracy comparable with SVM, and sometimes it is even more accurate.
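The abstract does not define the BT scoring rule itself, so the sketch below illustrates only the surrounding pipeline: place a term-selection step in front of NB and SVM and compare accuracy. Chi-square scoring via scikit-learn's SelectKBest stands in for BT here and is explicitly not the BT algorithm.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Tiny toy corpus; a real evaluation would use Reuters-21578 or 20-Newsgroups.
docs = ["stock market prices fall", "shares and bonds rally", "market trading volume rises",
        "football match ends in draw", "team wins the championship game", "player scores late goal"] * 10
labels = ([0] * 3 + [1] * 3) * 10

for clf in (MultinomialNB(), LinearSVC()):
    pipe = make_pipeline(CountVectorizer(),
                         SelectKBest(chi2, k=10),   # stand-in for a BT-style term selector
                         clf)
    acc = cross_val_score(pipe, docs, labels, cv=5).mean()
    print(type(clf).__name__, round(acc, 3))
```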
20.
In recent years, computer vision has been widely used in industrial environments, allowing robots to perform important tasks such as quality control, inspection and recognition. Vision systems are typically used to determine the position and orientation of objects in the workstation, enabling them to be transported and assembled by a robotic cell (e.g. an industrial manipulator). These systems commonly resort to CCD (Charge-Coupled Device) cameras, either fixed in a particular work area or attached directly to the robotic arm (eye-in-hand vision system). Although this is a valid approach, the performance of these vision systems is directly influenced by the lighting of the industrial environment. Taking all this into consideration, a new approach is proposed for eye-on-hand systems, in which cameras are replaced by a 2D Laser Range Finder (LRF). The LRF is attached to a robotic manipulator, which executes a pre-defined path to produce grayscale images of the workstation. With this technique the interference of environment lighting is minimized, resulting in a more reliable and robust computer vision system. After the grayscale image is created, this work focuses on the recognition and classification of different objects using inherent features (based on the invariant moments of Hu) with the most well-known machine learning models: k-Nearest Neighbors (kNN), Neural Networks (NNs) and Support Vector Machines (SVMs). In order to achieve good performance for each classification model, a wrapper method is used to select a good subset of features, and a model assessment technique, K-fold cross-validation, is used to adjust the parameters of the classifiers. The performance of the models is also compared, achieving generalized accuracies of 83.5% for kNN, 95.5% for the NN and 98.9% for the SVM. These high performances are related to the feature-selection algorithm based on the simulated annealing heuristic and to the model assessment (k-fold cross-validation), which make it possible to identify the most important features in the recognition process and to adjust the best parameters for the machine learning models, increasing the classification ratio of the work objects present in the robot's environment.
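The sketch below illustrates the feature and classifier part of the pipeline described above: Hu invariant moments computed from binary object silhouettes and an SVM evaluated with k-fold cross-validation. The synthetic rectangle and triangle shapes, the log-scaling of the moments, and the default SVC parameters are illustrative assumptions; the paper additionally builds the images from LRF scans and selects features with a simulated-annealing wrapper.

```python
import numpy as np
import cv2
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def synthetic_mask(shape_id):
    """Binary silhouette of a randomly sized rectangle (0) or triangle (1)."""
    img = np.zeros((64, 64), dtype=np.uint8)
    a, b = (int(v) for v in rng.integers(10, 25, 2))
    if shape_id == 0:
        cv2.rectangle(img, (32 - a, 32 - b), (32 + a, 32 + b), 255, -1)
    else:
        pts = np.array([[32, 32 - b], [32 - a, 32 + b], [32 + a, 32 + b]], dtype=np.int32)
        cv2.fillPoly(img, [pts], 255)
    return img

def hu_features(mask):
    """Seven Hu invariant moments, log-scaled for numeric stability."""
    hu = cv2.HuMoments(cv2.moments(mask)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

X = np.array([hu_features(synthetic_mask(i % 2)) for i in range(200)])
y = np.array([i % 2 for i in range(200)])

print("SVM 5-fold accuracy:", cross_val_score(SVC(), X, y, cv=5).mean())
```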