20 similar documents found; search took 15 ms.
1.
A Region Ensemble for 3-D Face Recognition (total citations: 1; self-citations: 0; citations by others: 1)
Faltemier T.C., Bowyer K.W., Flynn P.J. IEEE Transactions on Information Forensics and Security, 2008, 3(1): 62-73
In this paper, we introduce a new system for 3D face recognition based on the fusion of results from a committee of regions that have been independently matched. Experimental results demonstrate that using 28 small regions on the face yields the highest level of 3D face recognition performance. Score-based fusion is performed on the individual region match scores, and experimental results show that the Borda count and consensus voting methods yield higher performance than the standard sum, product, and min fusion rules. In addition, we report results that demonstrate the robustness of our algorithm by simulating large holes and artifacts in images. To our knowledge, no other published work uses a large number of 3D face regions for high-performance face matching. Rank-one recognition rates of 97.2% and verification rates of 93.2% at a 0.1% false accept rate are reported and compared to other methods published on the Face Recognition Grand Challenge (FRGC) v2 data set.
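The abstract reports that Borda count outperforms the sum rule for fusing region match scores. The minimal Python sketch below contrasts the two rules; the region scores, the gallery labels "A", "B", "C", and the function names are all hypothetical, higher scores are assumed to mean better matches, and this is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical per-region similarity scores (higher = better match) for
# three gallery subjects. Region 3 simulates a hole or artifact that
# produces one wildly inflated score.
REGION_SCORES: List[Dict[str, float]] = [
    {"A": 0.9, "B": 0.5, "C": 0.4},
    {"A": 0.8, "B": 0.6, "C": 0.3},
    {"A": 0.0, "B": 0.1, "C": 3.0},
]


def sum_fuse(region_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum rule: add the raw scores of each subject across regions."""
    totals: Dict[str, float] = {}
    for scores in region_scores:
        for subject, score in scores.items():
            totals[subject] = totals.get(subject, 0.0) + score
    return totals


def borda_fuse(region_scores: List[Dict[str, float]]) -> Dict[str, int]:
    """Borda count: in each region, rank subjects by score; the best of n
    subjects earns n - 1 points, the next n - 2, ..., the worst 0."""
    totals: Dict[str, int] = {}
    for scores in region_scores:
        ranked = sorted(scores, key=scores.get, reverse=True)
        n = len(ranked)
        for rank, subject in enumerate(ranked):
            totals[subject] = totals.get(subject, 0) + (n - 1 - rank)
    return totals


def best_match(totals):
    """Identity with the highest fused value."""
    return max(totals, key=totals.get)


print(best_match(sum_fuse(REGION_SCORES)))    # -> C (the corrupted region dominates)
print(best_match(borda_fuse(REGION_SCORES)))  # -> A (ranks limit the outlier's influence)
```

Because Borda count uses only ranks, one region with an outlying score can shift a subject by at most n - 1 points, which is one plausible reason rank-based fusion tolerates simulated holes and artifacts.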
2.
Bin Ma, Haizhou Li, Rong Tong. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(7): 2053-2062
In this paper, we study a novel approach to spoken language recognition using an ensemble of binary classifiers. In this framework, we begin by representing a speech utterance with a high-dimensional feature vector, such as the phonotactic characteristics or the polynomial expansion of cepstral features. A binary classifier can be built on such feature vectors. We adopt a distributed output coding strategy in ensemble classifier design, where we decompose a multiclass language recognition problem into many binary classification tasks, each of which addresses a language recognition subtask by using a component classifier. We then combine the results of the component classifiers to form an output code as a hypothesized solution to the overall language recognition problem. In this way, we effectively project high-dimensional feature vectors into a tractable low-dimensional space while maintaining the language-discriminative characteristics of the spoken utterances. By fusing the output codes from both phonotactic and cepstral features, we achieve equal error rates of 1.38% and 3.20% for 30-s trials on the 2003 and 2005 NIST language recognition evaluation databases.
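In a distributed output code, each language is assigned a codeword over the binary tasks, and the vector of component-classifier outputs is the low-dimensional representation of an utterance. One simple way to turn that vector into a decision is nearest-codeword decoding, sketched below in Python; the code matrix, the scores, and the L1 decoding rule are assumptions for illustration, and the paper's own decision stage may differ.

```python
import numpy as np

# Hypothetical code matrix: one row (codeword) per target language, one
# column per binary component classifier. +1 / -1 says which side of that
# binary task the language falls on. Every pair of rows differs in 4
# columns, so a single wrong component decision can still be corrected.
CODE_MATRIX = np.array([
    [+1, +1, +1, -1, -1, -1],  # language 0
    [+1, -1, -1, +1, +1, -1],  # language 1
    [-1, +1, -1, +1, -1, +1],  # language 2
    [-1, -1, +1, -1, +1, +1],  # language 3
])


def decode(output_code):
    """Return the language whose codeword is nearest (L1 distance) to the
    vector of component-classifier scores, each assumed to lie in [-1, 1]."""
    output_code = np.asarray(output_code, dtype=float)
    distances = np.abs(CODE_MATRIX - output_code).sum(axis=1)
    return int(np.argmin(distances))


# Soft scores from six hypothetical binary classifiers for one utterance.
print(decode([0.9, -0.7, -0.8, 0.6, 0.4, -0.2]))  # -> 1
# Codeword of language 1 with its last bit flipped is still decoded as 1.
print(decode([1, -1, -1, 1, 1, 1]))  # -> 1
```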
3.
International Journal of Computer Vision - This paper strives for spatio-temporal localization of human actions in videos. In the literature, the consensus is to achieve localization by training on...
4.
International Journal of Computer Vision - Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features...
5.
6.
View Invariance for Human Action Recognition (total citations: 4; self-citations: 0; citations by others: 4)
This paper presents an approach for viewpoint-invariant human action recognition, an area that has so far received scant attention relative to the overall body of work in human action recognition. It has been established previously that there exist no invariants for 3D to 2D projection. However, there exists a wealth of techniques in 2D invariance that can be used to advantage in 3D to 2D projection. We exploit these techniques and model actions in terms of view-invariant canonical body poses and trajectories in 2D invariance space, leading to a simple and effective way to represent and recognize human actions from a general viewpoint. We first evaluate the approach theoretically and show why a straightforward application of the 2D invariance idea will not work. We describe strategies designed to overcome inherent problems in the straightforward approach and outline the recognition algorithm. We then present results on 2D projections of publicly available human motion capture data as well as on manually segmented real image sequences. In addition to being robust to viewpoint change, the approach is robust enough to handle different people, minor variabilities in a given action, and the speed of action (and hence, frame rate) while encoding sufficient distinction among actions.
This work was done when the author was a graduate student in the Department of Computer Science and was partially supported by NSF Grant ECS-02-5475. The author is currently with Siemens Corporate Research, Princeton, NJ.
Dr. Chellappa is with the Department of Electrical and Computer Engineering.
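The abstract's distinction rests on a classical fact: although no general invariants survive 3D to 2D projection, projective transformations within the image do preserve certain quantities. The cross-ratio of four collinear points is the textbook example, shown below in Python purely as an illustration of a 2D projective invariant; the paper's own invariants, canonical poses, and trajectories are not reproduced here, and the point values and transform are arbitrary.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross-ratio (A, B; C, D) of four collinear points given by their
    1-D coordinates along the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, m):
    """1-D projective transformation x -> (p*x + q) / (r*x + s);
    m = ((p, q), (r, s)) must be non-singular (p*s - q*r != 0)."""
    (p, q), (r, s) = m
    return (p * x + q) / (r * x + s)


# Four collinear image points (arbitrary); Fractions keep arithmetic exact.
points = [Fraction(v) for v in (0, 1, 2, 3)]
m = ((2, 1), (1, 3))  # arbitrary non-singular transform (determinant 5)
warped = [projective_map(x, m) for x in points]

print(cross_ratio(*points))  # -> 4/3
print(cross_ratio(*warped))  # -> 4/3 (unchanged by the projective map)
```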
7.
8.
Advanced Robotics, 2013, 27(6-7): 871-891
In robotics, there has been growing interest in expressing actions as a combination of meaningful subparts commonly called motion primitives. Primitives are analogous to words in a language: just as words are put together into a sentence according to the rules of a language, primitives arranged according to certain rules make up an action. In this paper, we investigate the modeling and recognition of arm manipulation actions at different levels of complexity using primitives. Primitives are detected automatically in a sequential manner. We assume no prior knowledge of the primitives, but instead look for correlating segments across various sequences. All actions are then modeled within a single hidden Markov model whose structure is learned incrementally as new data are observed. We also generate an action grammar based on these primitives and thus link signals to symbols.
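Once primitives are represented as HMM states, segmenting an observed arm motion into primitives amounts to decoding its most likely state path. The Viterbi algorithm below is a standard way to do that; it is a minimal Python sketch with a hand-set toy model (state names, probabilities, and symbol quantization are all hypothetical) and does not reproduce the paper's incremental structure learning or primitive detection.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state (primitive) path for a discrete observation
    sequence, computed in the log domain to avoid underflow.

    start_p: (S,) initial state probabilities
    trans_p: (S, S) with trans_p[i, j] = P(next state j | state i)
    emit_p:  (S, V) with emit_p[i, k] = P(symbol k | state i)
    """
    log_trans = np.log(trans_p)
    log_emit = np.log(emit_p)
    n_steps, n_states = len(obs), len(start_p)
    delta = np.empty((n_steps, n_states))  # best log-prob of a path ending in each state
    backptr = np.zeros((n_steps, n_states), dtype=int)
    delta[0] = np.log(start_p) + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # scores[i, j]: best path ending in state i at t-1, then moving to j
        scores = delta[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(n_steps - 1, 0, -1):  # trace the best path backwards
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Hypothetical, mostly left-to-right model: 0 = reach, 1 = grasp, 2 = retract.
start = np.array([0.8, 0.1, 0.1])
trans = np.array([[0.70, 0.25, 0.05],
                  [0.05, 0.70, 0.25],
                  [0.05, 0.05, 0.90]])
emit = np.array([[0.8, 0.1, 0.1],  # rows: states; columns: quantized motion symbols
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]])
print(viterbi([0, 0, 1, 1, 1, 2, 2], start, trans, emit))  # -> [0, 0, 1, 1, 1, 2, 2]
```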
9.
How much does knowledge of a certain spoken word or phrase help with localizing its source? This is a fundamental question for speech processing, and it is partially addressed in this paper. In particular, this work uses prior information about the contents of a speech signal to improve machine localization of the signal based on the time delay of arrival (TDOA) between two microphones. The prior information, which is used to develop a very simple frequency-selective phase transform (FPT), increases the effective SNR by using only a subset of the highest-SNR frequencies in the phase transform. Simulations in a reverberant environment show that the proposed approach localizes speech sources more robustly and accurately. For 20 ms signal segments, using 45 percent of the available speech frequency bins is shown to be superior to using 30, 60, or 100 percent, where using 100 percent corresponds to the standard phase transform.
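The frequency-selective idea can be sketched as GCC-PHAT restricted to a subset of frequency bins. Below, a hypothetical per-bin SNR estimate (which the paper derives from prior knowledge of the spoken content) selects the bins kept; with `keep_fraction=1.0` the function reduces to the standard phase transform. This is a NumPy sketch under those assumptions, not the authors' FPT, and it returns the delay in samples (divide by the sampling rate for seconds).

```python
import numpy as np


def fpt_tdoa(x1, x2, snr_per_bin=None, keep_fraction=1.0):
    """Estimate the TDOA in samples between two microphone signals.

    A positive result means x1 lags x2. snr_per_bin holds assumed SNR
    estimates for the rfft bins of length n_fft = len(x1) + len(x2),
    i.e. n_fft // 2 + 1 values; only the top keep_fraction of bins is used.
    """
    n_fft = len(x1) + len(x2)  # zero-pad so the correlation does not wrap around
    X1 = np.fft.rfft(x1, n=n_fft)
    X2 = np.fft.rfft(x2, n=n_fft)
    cross = X1 * np.conj(X2)
    phat = cross / np.maximum(np.abs(cross), 1e-12)  # unit magnitude per bin (PHAT)

    mask = np.ones(phat.shape)
    if snr_per_bin is not None and keep_fraction < 1.0:
        snr_per_bin = np.asarray(snr_per_bin)
        if snr_per_bin.shape != phat.shape:
            raise ValueError("snr_per_bin must have n_fft // 2 + 1 entries")
        n_keep = max(1, int(round(keep_fraction * phat.size)))
        mask[:] = 0.0
        mask[np.argsort(snr_per_bin)[-n_keep:]] = 1.0  # keep highest-SNR bins only

    corr = np.fft.irfft(phat * mask, n=n_fft)
    lag = int(np.argmax(corr))
    if lag > n_fft // 2:  # indices past the midpoint are negative lags
        lag -= n_fft
    return lag


rng = np.random.default_rng(0)
source = rng.standard_normal(4096)  # white-noise stand-in for speech
x2 = source[100:1124]
x1 = source[95:1119]                # x1[t] == x2[t - 5]: x1 lags x2 by 5 samples
snr = rng.random((len(x1) + len(x2)) // 2 + 1)  # hypothetical per-bin SNR
print(fpt_tdoa(x1, x2))                                  # standard PHAT
print(fpt_tdoa(x1, x2, snr_per_bin=snr, keep_fraction=0.45))  # 45% of bins
```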
10.
Localization and Segmentation of Live Iris Images (total citations: 2; self-citations: 0; citations by others: 2)
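An algorithm for localizing and segmenting live iris images is presented. The algorithm consists of two parts: localization of the iris annulus and removal of non-iris regions. Based on the physiological characteristics of the eye and the practical conditions of digital iris images, it combines traditional localization methods with mathematical morphology to locate the iris region quickly and accurately, and it proposes separate solutions for removing the effects of eyelids, eyelashes, and specular highlights. The algorithm also accounts for problems that may affect iris localization and segmentation in real applications. Experiments show that the algorithm achieves good segmentation results and is robust.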
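The abstract combines conventional localization with mathematical morphology. As an illustration only, the NumPy sketch below finds a rough pupil circle by thresholding dark pixels, cleaning them with a 3x3 morphological opening, and taking the centroid and an equal-area radius; the threshold, the structuring element, and the equal-area radius are assumptions, and the paper's eyelid, eyelash, and specular-highlight removal steps are not reproduced.

```python
import numpy as np


def _neighborhood(mask):
    """Stack the nine 3x3-neighborhood shifts of a boolean mask (zero padded)."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    return np.stack([padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)])


def erode(mask):
    """Binary erosion with a 3x3 square structuring element."""
    return _neighborhood(mask).all(axis=0)


def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    return _neighborhood(mask).any(axis=0)


def locate_pupil(gray, threshold):
    """Rough pupil center and radius from a grey-level eye image.

    Assumes the pupil is the darkest large region. Dark pixels are found by
    thresholding, then a morphological opening (erosion followed by dilation)
    removes dark structures thinner than 3 pixels before the centroid is taken.
    Returns (row, col, radius), or None if nothing survives.
    """
    dark = gray < threshold
    opened = dilate(erode(dark))
    rows, cols = np.nonzero(opened)
    if rows.size == 0:
        return None
    radius = np.sqrt(rows.size / np.pi)  # radius of a disc with the same area
    return float(rows.mean()), float(cols.mean()), float(radius)


# Synthetic 100x100 eye patch: bright background, dark disc (pupil) of
# radius 15 centered at (50, 40), plus two single-pixel dark specks.
yy, xx = np.mgrid[0:100, 0:100]
image = np.full((100, 100), 200, dtype=np.uint8)
image[(yy - 50) ** 2 + (xx - 40) ** 2 <= 15 ** 2] = 20
image[5, 5] = image[90, 90] = 20
print(locate_pupil(image, threshold=100))
```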
11.
Neural Processing Letters - Learning spatiotemporal information is a fundamental part of action recognition. In this work, we attempt to extract efficient spatiotemporal information for video...
12.
13.
We propose a layered-grammar model to represent actions. Using this model, an action is represented by a set of grammar rules. The bottom layer of an action instance's parse tree contains action primitives such as spatiotemporal (ST) interest points. At each layer above, we iteratively mine grammar rules and "super rules" that account for high-order compositional feature structures. The grammar rules are categorized into three classes according to three different ST-relations among their action components, namely the strong relation, the weak relation, and the stochastic relation. These ST-relations characterize different action styles (degrees of stiffness), and they are pursued in terms of grammar rules for the purpose of action recognition. By adopting the Emerging Pattern (EP) mining algorithm for relation pursuit, we ensure that the learned production rules are statistically significant and discriminative. Using the learned rules, the parse tree of an action video is constructed by combining a bottom-up rule detection step and a top-down ambiguous-rule pruning step. An action instance is recognized based on the discriminative configurations generated by the production rules of its parse tree. Experiments confirm that, by incorporating high-order feature statistics, the proposed method substantially improves recognition performance over bag-of-words models.
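An emerging pattern (EP) is an itemset whose support grows sharply from one class (the background) to another (the target); the growth rate is the ratio of the two supports. The brute-force Python sketch below only illustrates that definition on made-up "transactions" (sets of hypothetical feature labels observed together); it is not the efficient EP mining algorithm the paper adopts, and the thresholds are arbitrary.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def emerging_patterns(target, background, max_size=2,
                      min_growth=3.0, min_support=0.2) -> Dict[FrozenSet[str], float]:
    """Brute-force search for itemsets whose support in `target` is at least
    `min_growth` times their support in `background`.

    A growth rate of inf means the itemset never occurs in the background
    (a so-called jumping emerging pattern).
    """
    items = sorted(set().union(*target, *background))
    patterns = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s_target = support(itemset, target)
            if s_target < min_support:
                continue
            s_background = support(itemset, background)
            growth = float("inf") if s_background == 0 else s_target / s_background
            if growth >= min_growth:
                patterns[itemset] = growth
    return patterns


# Hypothetical transactions: feature labels co-occurring in local ST neighborhoods.
action_a = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
action_b = [{"a", "c"}, {"b", "c"}, {"c"}, {"a", "c"}]
for pattern, growth in emerging_patterns(action_a, action_b).items():
    print(sorted(pattern), growth)
```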
14.
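Neural network ensembles generalize better than single neural networks but are difficult to understand because they are black boxes; decision tree algorithms are easy to understand because their classification results are displayed as a tree structure, but their generalization ability is inferior to that of neural network ensembles. This paper combines the two and proposes a decision tree construction algorithm: a neural network ensemble is used to preprocess the training samples, and the C4.5 algorithm then processes the preprocessed samples to generate a decision tree. The paper compares the neural network ensemble method, the C4.5 decision tree algorithm, and the proposed algorithm on UCI data. Experiments show that the proposed algorithm retains the strong generalization ability of neural network ensembles, generalizing clearly better than C4.5, and because its final result is presented as a decision tree, it is readily comprehensible.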
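A minimal Python sketch of the two-stage idea follows, under the assumption that "preprocessing" means replacing each training label with the ensemble's prediction. The ensemble members are hypothetical threshold rules standing in for trained neural networks, and a one-level information-gain stump stands in for C4.5 (which uses gain ratio and grows and prunes a full tree).

```python
import math
from collections import Counter


def ensemble_predict(models, x):
    """Majority vote of the ensemble members (ties go to the label counted first)."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


def relabel_with_ensemble(models, X):
    """Preprocessing step: replace each training label with the ensemble's prediction."""
    return [(x, ensemble_predict(models, x)) for x in X]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_stump(data):
    """Best single threshold split by information gain.
    Returns (feature index, threshold, gain)."""
    labels = [label for _, label in data]
    base = entropy(labels)
    best = (None, None, 0.0)
    for f in range(len(data[0][0])):
        values = sorted({x[f] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [label for x, label in data if x[f] <= thr]
            right = [label for x, label in data if x[f] > thr]
            cond = (len(left) * entropy(left) + len(right) * entropy(right)) / len(data)
            if base - cond > best[2]:
                best = (f, thr, base - cond)
    return best


# Hypothetical ensemble: three stand-ins for trained neural networks.
models = [lambda x: int(x[0] > 0.5),
          lambda x: int(x[0] > 0.4),
          lambda x: int(x[1] > 0.9)]
X = [(0.1, 0.0), (0.3, 0.95), (0.45, 0.0), (0.6, 0.0), (0.8, 0.2), (0.9, 0.1)]
relabeled = relabel_with_ensemble(models, X)
print([label for _, label in relabeled])  # -> [0, 0, 0, 1, 1, 1]
print(best_stump(relabeled))              # splits feature 0 near 0.525
```

The tree learner then sees the ensemble's smoother decision boundary rather than the original, possibly noisy labels, while its output remains a readable tree.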
15.
Research on Human Action Recognition (total citations: 1; self-citations: 0; citations by others: 1)
Cheng Xiang (程祥). 数字社区&智能家居 (Digital Community & Smart Home), 2006, (7): 120-121, 133
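This paper analyzes and compares in depth the recognition algorithms, human detection methods, and human body representations used in current human action recognition techniques, and focuses on the main problems that must be solved in action recognition algorithms based on hidden Markov models, together with the corresponding solutions.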
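The review centers on HMM-based recognition. A common baseline, sketched here in Python under assumed toy models, trains one HMM per action and labels a new observation sequence with the action whose model gives the highest likelihood, computed with the scaled forward algorithm. The action names, probabilities, and symbol alphabet are hypothetical, and the review's specific problems and solutions are not reproduced.

```python
import numpy as np


def log_likelihood(obs, start_p, trans_p, emit_p):
    """log P(obs | HMM) via the forward algorithm, rescaling alpha at every
    step so long sequences do not underflow."""
    start_p, trans_p, emit_p = map(np.asarray, (start_p, trans_p, emit_p))
    alpha = start_p * emit_p[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
            alpha = (alpha @ trans_p) * emit_p[:, obs[t]]
        scale = alpha.sum()
        alpha = alpha / scale
        log_lik += np.log(scale)  # the product of the scales equals P(obs)
    return float(log_lik)


def classify(obs, models):
    """Return the action whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical per-action HMMs: (start, transition, emission), 2 states, 2 symbols.
models = {
    "walk": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]]),
    "wave": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.9], [0.1, 0.9]]),
}
print(classify([0, 0, 0, 0], models))  # -> walk
print(classify([1, 1, 1, 1], models))  # -> wave
```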
16.
Cheng Xiang (程祥). 数字社区&智能家居 (Digital Community & Smart Home), 2006, (20)
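This paper analyzes and compares in depth the recognition algorithms, human detection methods, and human body representations used in current human action recognition techniques, and focuses on the main problems that must be solved in action recognition algorithms based on hidden Markov models, together with the corresponding solutions.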
17.
18.
19.