Similar literature
1.
In this paper, we present a framework for parsing video events with stochastic Temporal And–Or Graph (T-AOG) and unsupervised learning of the T-AOG from video. This T-AOG represents a stochastic event grammar. The alphabet of the T-AOG consists of a set of grounded spatial relations including the poses of agents and their interactions with objects in the scene. The terminal nodes of the T-AOG are atomic actions which are specified by a number of grounded relations over image frames. An And-node represents a sequence of actions. An Or-node represents a number of alternative ways of such concatenations. The And–Or nodes in the T-AOG can generate a set of valid temporal configurations of atomic actions, which can be equivalently represented as the language of a stochastic context-free grammar (SCFG). For each And-node we model the temporal relations of its children nodes to distinguish events with similar structures but different temporal patterns and interpolate missing portions of events. This makes the T-AOG grammar context-sensitive. We propose an unsupervised learning algorithm to learn the atomic actions, the temporal relations and the And–Or nodes under the information projection principle in a coherent probabilistic framework. We also propose an event parsing algorithm based on the T-AOG which can understand events, infer the goal of agents, and predict their plausible intended actions. In comparison with existing methods, our paper makes the following contributions. (i) We represent events by a T-AOG with hierarchical compositions of events and the temporal relations between the sub-events. (ii) We learn the grammar, including atomic actions and temporal relations, automatically from the video data without manual supervision. 
(iii) Our algorithm infers the goal of agents and predicts their intents by a top-down process, handles event insertion and multi-agent events, keeps all possible interpretations of the video to preserve ambiguities, and achieves the globally optimal parsing solution in a Bayesian framework. (iv) The algorithm uses event context to improve the detection of atomic actions and to segment and recognize objects in the scene. Extensive experiments, covering indoor and outdoor scenes and single- and multi-agent events, are conducted to validate the effectiveness of the proposed approach.
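To make the And-node/Or-node description in the abstract above concrete, here is a minimal sketch of a stochastic And-Or grammar and the set of action sequences it generates. The node names, atomic actions, and branch probabilities are hypothetical and chosen only for illustration; the temporal relations and the learning procedure described in the abstract are not modeled.

```python
# Hypothetical toy grammar: names and probabilities are illustrative only.
# An "and" node lists children that occur in sequence.
# An "or" node lists (child, probability) alternatives.
GRAMMAR = {
    "Event": ("or", [("MakeTea", 0.6), ("UseComputer", 0.4)]),
    "MakeTea": ("and", ["fetch_cup", "pour_water"]),
    "UseComputer": ("and", ["sit_down", "Type"]),
    "Type": ("or", [("press_keys", 0.7), ("move_mouse", 0.3)]),
}


def expand(node):
    """Return every (action_sequence, probability) derivable from node."""
    if node not in GRAMMAR:
        # Terminal node: a single atomic action.
        return [((node,), 1.0)]
    kind, children = GRAMMAR[node]
    if kind == "or":
        # Alternatives: weight each branch by its probability.
        return [(seq, p * w) for child, w in children for seq, p in expand(child)]
    # And-node: concatenate the expansions of the children in order.
    results = [((), 1.0)]
    for child in children:
        results = [
            (s1 + s2, p1 * p2)
            for s1, p1 in results
            for s2, p2 in expand(child)
        ]
    return results


# The "language" of the toy grammar: all valid sequences with probabilities.
LANGUAGE = expand("Event")
```

Because every Or-node's branch probabilities sum to one, the probabilities in `LANGUAGE` also sum to one, which is the sense in which such a graph defines a stochastic grammar over action sequences.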

2.
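Understanding and recognizing human behavior patterns is a key component of intelligent visual surveillance systems. Most current research addresses simple behaviors in simple scenes and therefore lacks broad applicability. To address this, the paper proposes a hierarchical behavior modeling and recognition method for complex scenes. Statistical methods are used to select a number of meaningful landmark points within the surveillance view, and these landmarks are used to decompose a complex behavior into a series of simple behaviors. The trajectories of the simple behaviors are modeled with HMMs, and the Level-Building algorithm is used to recognize complex behaviors. Experimental results show that the method achieves a high recognition rate for complex behaviors and generalizes across a variety of scenes.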

3.
There is very little that is specific to time in a temporal constraint propagation algorithm. Most of these algorithms could easily handle reasoning over any domain mappable onto rational numbers, e.g., weight, frequency, luminosity, etc. Actually, some of these algorithms are capable of handling more sophisticated domains than those mapped onto rational numbers, e.g., intervals, or partially ordered objects (say, time). In this article we have generalized such an algorithm, which was originally developed for the time-interval domain, to any generic domain, where binary constraints are expressed over arcs of the constraint network. Given the composition table for the primitive relations between a pair of the domain entities (e.g., intervals) as an additional input along with a constraint graph, the algorithm would generate all consistent singleton models for a given network. The algorithm is also extended here to handle uncertainty values.

4.
The role of perceptual organization in motion analysis has heretofore been minimal. In this work we present a simple but powerful computational model and associated algorithms based on the use of perceptual organizational principles, such as temporal coherence (or common fate) and spatial proximity, for motion segmentation. The computational model does not use the traditional frame-by-frame motion analysis; rather it treats an image sequence as a single 3D spatio-temporal volume. It endeavors to find organizations in this volume of data over three levels: signal, primitive, and structural. The signal level is concerned with detecting individual image pixels that are probably part of a moving object. The primitive level groups these individual pixels into planar patches, which we call the temporal envelopes. Compositions of these temporal envelopes describe the spatio-temporal surfaces that result from object motion. At the structural level, we detect these compositions of temporal envelopes by utilizing the structure and organization among them. The algorithms employed to realize the computational model include 3D edge detection, Hough transformation, and graph-based methods to group the temporal envelopes based on Gestalt principles. The significance of the Gestalt relationships between any two temporal envelopes is expressed in probabilistic terms. One of the attractive features of the adopted algorithm is that it does not require the detection of special 2D features or the tracking of these features across frames. We demonstrate that even with simple grouping strategies, we can easily handle drastic illumination changes, occlusion events, and multiple moving objects, without the use of training and specific object or illumination models. We present results on a large variety of motion sequences to demonstrate this robustness.

5.
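Relational Graph Grammar and Its Applications   Cited by: 3 (self-citations: 1, citations by others: 3)
FANG Lin (方林), XIE Li (谢立). Journal of Software (《软件学报》), 1997, 8(2): 87-92
String grammars are not suited to describing the features of objects in two or more dimensions and cannot define complex relations between objects. This paper introduces the concept of a relational graph and studies its properties. On this basis it proposes a new grammar, the relational graph grammar. The grammar can conveniently abstract and summarize the features of complex objects in two or more dimensions and provides tools and methods for analyzing and recognizing such objects. It can be widely applied in pattern recognition, high-dimensional text analysis, and the description of the syntax of graphical languages. To make the relational graph grammar practical, the paper also proposes corresponding recognition and matching algorithms.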

6.
7.
This paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. The fundamental idea is to divide the recognition problem into two levels. The lower-level detections are performed using standard independent probabilistic event detectors to propose candidate detections of low-level features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer-range temporal constraints, disambiguate uncertain low-level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. We develop a real-time system and demonstrate the approach in several experiments on gesture recognition and in video surveillance. In the surveillance application, we show how the system correctly interprets activities of multiple interacting objects.
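The two-level idea in the abstract above (uncertain low-level detections feeding a stochastic context-free grammar parser) can be sketched with a Viterbi-style CYK parser. This is an illustrative stand-in, not the parser used in the paper: the grammar, the terminal names `left` and `right`, and the detector scores are all hypothetical, and the grammar is assumed to be in Chomsky normal form.

```python
from collections import defaultdict

# Hypothetical SCFG in Chomsky normal form (illustrative only).
# Binary rules: (lhs, (left_child, right_child)) -> rule probability.
BINARY = {
    ("S", ("A", "B")): 0.6,
    ("S", ("A", "C")): 0.4,
    ("C", ("S", "B")): 1.0,
}
# Lexical rules: (lhs, terminal) -> rule probability.
LEXICAL = {
    ("A", "left"): 1.0,
    ("B", "right"): 1.0,
}


def best_parse_prob(observations, start="S"):
    """Probability of the most likely parse of a sequence of detections.

    Each observation is a dict mapping a terminal name to the score that a
    low-level detector assigned to it at that time step.
    """
    n = len(observations)
    chart = defaultdict(float)  # (i, j, nonterminal) -> best probability

    # Spans of length 1: combine lexical rules with detector scores.
    for i, obs in enumerate(observations):
        for (lhs, terminal), rule_p in LEXICAL.items():
            p = rule_p * obs.get(terminal, 0.0)
            chart[i, i + 1, lhs] = max(chart[i, i + 1, lhs], p)

    # Longer spans: try every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, (left, right)), rule_p in BINARY.items():
                    p = rule_p * chart[i, k, left] * chart[k, j, right]
                    chart[i, j, lhs] = max(chart[i, j, lhs], p)

    return chart[0, n, start]
```

A sequence that the grammar cannot derive scores zero, so the grammar acts as the longer-range constraint: a locally plausible but structurally impossible detection sequence is rejected, while an uncertain detection inside a valid structure only lowers the score.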

8.
This paper describes Meta-Restriction Grammar for parsing coordinate conjunction in English. Meta-Restriction Grammar consists of Restriction Grammar, a logic grammar implementation of Sager's String Grammar, plus a metagrammatical component that automatically rewrites “base” grammar rules into more complex rules to handle coordinate conjunction. The approach resembles Sedogbo's approach of “empty elements” or “holes.” This avoids the combinatorial explosion due to backtracking in the treatment of Woods, Sager, and Dahl and McCord. Restriction Grammar is well suited to metagrammar extensions, because the absence of parameters in grammar rules facilitates the statement of metarules. The metagrammatical component generates grammar rules specifying allowable conjoinings at limited types of nodes, to reduce redundancy. Meta-Restriction Grammar represents both the surface structure and a regularized structure (via pointers to elided elements) for efficient computation of selectional restrictions. This approach is sufficiently powerful to handle a number of complex phenomena, such as conjunction with comma (as distinguished from the appositive construction), paired conjunctions such as both ... and, either ... or, and scoping of left noun modifiers under conjunction. One of the great attractions of the metagrammar approach is that the grammar can be translated and compiled, resulting in an efficient treatment of conjunction (parse times of 1 to 3 seconds per parse). This contrasts with the interrupt-driven approach, where an interpreter generates rules for conjoining structures on demand, making it impossible to compile the complete grammar.

9.
Motion segmentation refers to the problem of separating the objects in a video sequence according to their motion. It is a fundamental problem of computer vision, since various systems focusing on the analysis of dynamic scenes include motion segmentation algorithms. In this paper we present a novel approach, where a video shot is temporally divided into successive and overlapping windows and motion segmentation is performed on each window separately. This attribute renders the algorithm suitable even for long video sequences. In the last stage of the algorithm the segmentation results for every window are aggregated into a final segmentation. The presented algorithm can effectively handle asynchronous trajectories on each window even when they have no temporal intersection. The evaluation of the proposed algorithm on the Berkeley motion segmentation benchmark demonstrates its scalability and accuracy compared to the state of the art.

10.
This paper describes a novel approach for incremental learning of human motion pattern primitives through online observation of human motion. The observed time series data stream is first stochastically segmented into potential motion primitive segments, based on the assumption that data belonging to the same motion primitive will have the same underlying distribution. The motion segments are then abstracted into a stochastic model representation and automatically clustered and organized. As new motion patterns are observed, they are incrementally grouped together into a tree structure, based on their relative distance in the model space. The tree leaves, which represent the most specialized learned motion primitives, are then passed back to the segmentation algorithm so that as the number of known motion primitives increases, the accuracy of the segmentation can also be improved. The combined algorithm is tested on a sequence of continuous human motion data that are obtained through motion capture, and demonstrates the performance of the proposed approach.

11.
In 1983, Allen presented an ingenious method for the representation and maintenance of temporal information in the presence of imprecise, uncertain, and relative knowledge about time of occurrence. He introduced 13 relations between his primitive “temporal intervals,” providing for the expression of “any relationship which can hold between two intervals.” The model, however, did not address the problem of temporally incomparable events, such as events occurring in a distributed system without a common clock. Lamport's interprocessor communication model furnishes an axiomatic system for describing such events and their possible relationships. This article demonstrates that Allen's temporal model can be subsumed in a more general model based on Lamport's axiomatics. It is further suggested that this extended model can provide the underpinnings of a temporal knowledge base containing time-dependent information measured by unsynchronized clocks or in relativistic space-time. In this model, the number of relations between intervals increases dramatically from Allen's 13 or Lamport's 2 or 3 to over 80. Within this context, a modification of Allen's algorithm for the maintenance of a temporal reasoning system is presented, thus permitting the advantages of such a system to extend to reasoning about a wider range of phenomena.
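For readers unfamiliar with the 13 relations mentioned in the abstract above, the following sketch classifies the relation between two intervals on a single, totally ordered time line. It assumes each interval is a `(start, end)` pair with `start < end` and a common clock, which is exactly the setting the extended model in the abstract goes beyond; the relation names follow common usage, and the function name is hypothetical.

```python
def allen_relation(a, b):
    """Name the Allen relation that holds from interval a to interval b.

    Both intervals are (start, end) pairs with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # The interiors intersect, no endpoint is shared, neither contains the other.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For example, `allen_relation((1, 5), (3, 8))` yields `"overlaps"`, and swapping the arguments yields the inverse relation `"overlapped-by"`.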

12.
Structured documents are usually processed by tree-based document transformers, which transform the document tree representing the structure of the input document into another tree structure. Event-based document transformers, by contrast, recognize the input as a stream of parsing events, i.e., lexical tokens, and process the events one by one in an event-driven manner. Compared to tree-based transformers, which construct an intermediate tree representation, event-based document transformers need less memory and are more tolerant of large inputs. This paper proposes an algorithm which derives an event-based transformer from a given specification of a document transformation over a tree structure. The derivation of an event-based transformer is carried out in the framework of attribute grammars. We first obtain an attribute grammar which processes a stream of parsing events, by applying a deforestation method; we then derive an attribute evaluation scheme relevant to the event-based transformation. Using this algorithm, one can develop event-based document transformers in a more declarative style than directly programming over the stream of parsing events.
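As a point of reference for the abstract above, this is what a hand-written event-based transformer looks like, i.e. the style of program the paper aims to derive automatically rather than have the developer write. The sketch uses Python's standard `xml.sax` module; the class name, the `transform` helper, and the tag-renaming task are hypothetical, and attributes are dropped to keep the example short.

```python
import io
import xml.sax
from xml.sax.saxutils import escape


class RenameHandler(xml.sax.ContentHandler):
    """Rewrite element names while streaming; no document tree is built."""

    def __init__(self, mapping, out):
        super().__init__()
        self.mapping = mapping  # old tag name -> new tag name
        self.out = out          # any object with a write() method

    def startElement(self, name, attrs):
        # Attributes are ignored in this sketch.
        self.out.write("<%s>" % self.mapping.get(name, name))

    def endElement(self, name):
        self.out.write("</%s>" % self.mapping.get(name, name))

    def characters(self, content):
        # Re-escape text, since the parser hands over unescaped characters.
        self.out.write(escape(content))


def transform(xml_text, mapping):
    """Apply the renaming to an XML string and return the result."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), RenameHandler(mapping, out))
    return out.getvalue()
```

Each callback handles one parsing event and writes output immediately, so memory use does not grow with document size. The cost is that any state spanning several events must be tracked by hand, which is the burden a declarative, tree-level specification would remove.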

13.
This paper presents the use of place/transition Petri nets (PNs) for the recognition and evaluation of complex multi-agent activities. The PNs were built automatically from the activity templates that are routinely used by experts to encode domain-specific knowledge. The PNs were built in such a way that they encoded the complex temporal relations between the individual activity actions. We extended the original PN formalism to handle the propagation of evidence using net tokens. The evaluation of the spatial and temporal properties of the actions was carried out using trajectory-based action detectors and probabilistic models of the action durations. The presented approach was evaluated using several examples of real basketball activities. The obtained experimental results suggest that this approach can be used to determine the type of activity that a team has performed as well as the stage at which the activity ended.
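The basic place/transition mechanics that the abstract above builds on can be sketched as follows. This covers only the plain formalism (unit arc weights, distinct input places per transition), not the paper's extension for evidence propagation; the toy play, its place names, and its transition names are hypothetical.

```python
class PetriNet:
    """Minimal place/transition net with unit arc weights."""

    def __init__(self, marking):
        self.marking = dict(marking)  # place -> number of tokens
        self.transitions = {}         # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, name):
        """Fire a transition if it is enabled; report whether it fired."""
        if not self.enabled(name):
            return False
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1
        return True


def build_play_net():
    """Hypothetical play: the shot needs both the pass and the screen."""
    net = PetriNet({"play_started": 1})
    net.add_transition("pass_to_wing", ["play_started"],
                       ["ball_on_wing", "screen_needed"])
    net.add_transition("set_screen", ["screen_needed"], ["screen_set"])
    net.add_transition("shoot", ["ball_on_wing", "screen_set"], ["shot_taken"])
    return net


def replay(net, observed_actions):
    """Fire observed actions in order; stop at the first one not enabled."""
    completed = []
    for action in observed_actions:
        if not net.fire(action):
            break
        completed.append(action)
    return completed
```

Replaying a detected action sequence against the net shows how far a play progressed: the list returned by `replay` ends at the last action that was consistent with the template, which corresponds to the "stage at which the activity ended" in the abstract.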

14.
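To address the loss of image blocks in inter-frame coding mode, an effective temporal-domain error concealment algorithm based on motion vector recovery is proposed. The motion vector field is modeled as a Gauss-Markov random field, and the motion vectors of lost image blocks are recovered by maximum a posteriori (MAP) estimation, with weights selected adaptively from spatial and temporal information. Simulation results show that the algorithm yields high-quality images both objectively and subjectively.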

15.
This study presents a real-time texture transfer method for artistic style transfer of video streams. We propose a parallel framework using a T-shaped kernel to enhance the computational performance. With regard to accelerated motion estimation, which is required for maintaining temporal coherence, we present a method using a downscaled motion field to achieve real-time performance for texture transfer of video streams. In addition, to enhance the artistic quality, we calculate the level of abstraction using visual saliency and integrate it with the texture transfer algorithm. Thus, our algorithm can stylize video with perceptual enhancements.

16.
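Processing and Optimization of Join Operations in Spatial Databases   Cited by: 7 (self-citations: 0, citations by others: 7)
Performance problems seriously restrict the application and development of spatial databases. Because the spatial join is the most complex and time-consuming basic operation in a spatial database, its processing efficiency largely determines the overall performance of the database. Although many spatial join algorithms exist, cost estimation and query optimization for spatial joins still need further study. Most spatial join algorithms are implemented on top of R-tree indexes; if the relations participating in a spatial join have no index, or only some of them are indexed, special algorithms are needed. In addition, the cost models of the various algorithms need a relatively uniform method of calculation; practice shows that, given the actual conditions of spatial databases, estimating an algorithm's complexity by its I/O cost is reasonable. On this basis, because a complex spatial query may involve several relations in spatial joins, a dynamic programming algorithm is also needed to find the join order with the lowest cost, so that a general algorithmic framework is finally obtained. A complexity analysis of this framework shows that a spatial database query optimization system built on it would have high time and space efficiency and could handle very complex spatial queries.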
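The dynamic programming step mentioned in the abstract above can be illustrated with a classic subset-based search for the cheapest left-deep join order. The cost model here is a deliberate simplification and an assumption of this sketch: cost is the sum of estimated intermediate result sizes, computed from hypothetical cardinalities and pairwise selectivities, not the I/O-based model the paper describes.

```python
from itertools import combinations


def best_join_order(cards, selectivity):
    """Find the cheapest left-deep join order by dynamic programming.

    cards: {relation name: cardinality}
    selectivity: {frozenset({r, s}): join selectivity}; missing pairs
    default to 1.0 (a cross product).
    Returns (total cost, join order as a list of relation names).
    """
    rels = sorted(cards)

    def size(subset):
        # Estimated number of rows after joining every relation in subset.
        rows = 1.0
        for r in subset:
            rows *= cards[r]
        for r, s in combinations(sorted(subset), 2):
            rows *= selectivity.get(frozenset((r, s)), 1.0)
        return rows

    # best[subset] = (cost, order) of the cheapest plan producing subset.
    best = {frozenset([r]): (0.0, [r]) for r in rels}
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            candidates = []
            for last in combo:
                # Join `last` onto the best plan for the remaining relations.
                cost, order = best[subset - {last}]
                candidates.append((cost + size(subset), order + [last]))
            best[subset] = min(candidates)
    return best[frozenset(rels)]
```

With three hypothetical relations, say `roads` (1000 rows), `rivers` (100 rows) and `cities` (10 rows), where only `roads`-`rivers` and `rivers`-`cities` have selective join predicates, the search prefers to join the two small relations first because that keeps the intermediate result small.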

17.
18.
This paper proposes a vision-only online mosaicing method for underwater surveys. Our method tackles a common problem in low-cost imaging platforms, where complementary navigation sensors produce imprecise or even missing measurements. Under these circumstances, the success of the optical mapping depends on the continuity of the acquired video stream. However, this continuity cannot always be guaranteed because of motion blur or lack of texture, both common in underwater scenarios. Such temporal gaps hinder the extraction of reliable motion estimates from visual odometry, and compromise the ability to infer the presence of loops for producing an adequate optical map. Unlike traditional underwater mosaicing methods, our proposal can handle camera trajectories with gaps between time-consecutive images. This is achieved by constructing a minimum spanning tree, which verifies whether the current topology is connected or not. To do so, we embed a trajectory estimate correction step based on graph theory algorithms. The proposed method was tested with several different underwater image sequences, and results are presented to illustrate the performance.
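The connectivity check described in the abstract above can be sketched with Kruskal's algorithm and a union-find structure: images are nodes, successful pairwise registrations are weighted edges, and the topology is connected exactly when the spanning forest has one edge fewer than there are nodes. The function name and the use of a registration cost as the edge weight are assumptions of this sketch, not details taken from the paper.

```python
def spanning_forest(num_nodes, edges):
    """Build a minimum spanning forest with Kruskal's algorithm.

    edges: iterable of (weight, u, v) with node indices in range(num_nodes).
    Returns (chosen edges, whether the whole graph is connected).
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge links two separate components: keep it and merge them.
            parent[root_u] = root_v
            chosen.append((weight, u, v))

    connected = len(chosen) == num_nodes - 1
    return chosen, connected
```

If a gap in the video leaves one image without any registered neighbor, `connected` is `False`, which signals that another link (for example a loop closure) is needed before a single mosaic can be built.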

19.
20.
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words   Cited by: 16 (self-citations: 0, citations by others: 16)
We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories. This is achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (pLSA) model and Latent Dirichlet Allocation (LDA). Because it relies on probabilistic models, our approach can handle noisy feature points arising from dynamic backgrounds and moving cameras. Given a novel video sequence, the algorithm can categorize and localize the human action(s) contained in the video. We test our algorithm on three challenging datasets: the KTH human motion dataset, the Weizmann human action dataset, and a recent dataset of figure skating actions. Our results reflect the promise of such a simple approach. In addition, our algorithm can recognize and localize multiple actions in long and complex video sequences containing multiple motions.

