Similar literature
 20 similar records found.
1.
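Video salient object detection must combine spatial and temporal information to continuously localize motion-related salient objects in a video sequence; the core problem is how to efficiently characterize the spatio-temporal features of moving objects. Most existing video salient object detection algorithms extract temporal features with optical flow, ConvLSTM, or 3D convolution, and lack the ability to learn temporal information continuously. To address this, a robust spatial-temporal progressive learning network (STPLNet) is designed to efficiently localize salient objects in video sequences. In the spatial domain, a U-shaped structure encodes and decodes each video frame. In the temporal domain, the network learns features of the main body and the deformation regions of moving objects between frames and progressively encodes moving-object features, capturing the temporal correlation and motion tendency of the objects. In a series of comparative experiments against 13 mainstream video salient object detection algorithms on 4 public datasets, the proposed model achieves the best results on several metrics (max F, S-measure (S), MAE), while also offering good real-time performance in terms of running speed.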

2.
A novel classification method based on multiple-point statistics (MPS) is proposed in this article. The method is a modified version of the spatially weighted k-nearest neighbour (k-NN) classifier, which accounts for spatial correlation through weights applied to neighbouring pixels. The MPS characterizes the spatial correlation between multiple points of land-cover classes by learning local patterns in a training image. This rich spatial information is then converted to multiple-point probabilities and incorporated into the k-NN classifier. Experiments were conducted in two study areas: the proposed method was tested on a WorldView-2 sub-scene of the Sichuan mountainous area and an IKONOS image of the Beijing urban area. The multiple-point weighted k-NN method (MPk-NN) was compared with several alternatives, including the traditional k-NN and two previously published spatially weighted k-NN schemes: the inverse distance weighted k-NN and the geostatistically weighted k-NN. Classifiers using the Bayesian and Support Vector Machine (SVM) methods, and the same classifiers weighted with spatial context using the Markov random field (MRF) model, were also introduced to provide a benchmark comparison with the MPk-NN method. The proposed approach increased classification accuracy significantly relative to the alternatives, and it is thus recommended for the identification of land-cover types with complex and diverse spatial distributions.
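
As a rough illustration of one of the baselines named above, the inverse distance weighted k-NN, the sketch below lets each of the k nearest training samples vote with weight 1/distance. It is a generic textbook form, not the MPk-NN method or the exact scheme evaluated in the article; the function name `idw_knn_predict`, the `eps` smoothing term, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def idw_knn_predict(train_points, train_labels, query, k=3, eps=1e-9):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest neighbours."""
    # Distance from the query to every training sample, nearest first.
    dists = [(math.dist(p, query), label)
             for p, label in zip(train_points, train_labels)]
    dists.sort(key=lambda pair: pair[0])

    votes = defaultdict(float)
    for d, label in dists[:k]:
        # Closer neighbours contribute more; eps avoids division by zero.
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


# One near sample of class "a" outweighs two far samples of class "b",
# whereas an unweighted majority vote with k=3 would pick "b".
toy_points = [(0.0, 0.0), (3.0, 0.0), (3.1, 0.0)]
toy_labels = ["a", "b", "b"]
print(idw_knn_predict(toy_points, toy_labels, (0.5, 0.0), k=3))  # -> a
```

The toy case is chosen so that weighting changes the outcome, which is the point of replacing a plain majority vote with a distance-weighted one.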

3.
This article describes a standardised way to build context-aware global smart space applications using information that is distributed across independent (legacy, sensor-enabled, and embedded) systems, by exploiting the overlapping spatial and temporal attributes of the information maintained by these systems. The framework supports a spatial programming model based on a topographical approach to modelling space, which enables systems to independently define and use potentially overlapping spatial context in a consistent manner. This contrasts with topological approaches, in which geographical relationships between objects are described explicitly. The approach is supported by an extensible data model that implicitly captures the relationships between information provided by separate underlying systems and facilitates the incremental construction of global smart spaces, since the underlying systems to be incorporated are largely decoupled. The framework has been evaluated using a prototype that integrates legacy systems and context-aware services for multi-modal urban journey planning and for visualising traffic congestion.

4.
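Existing spatio-temporal-aware representation learning frameworks cannot offer a unified solution to the three questions of "when", "where", and "what" that arise in real-world scenarios with strong spatio-temporal semantics. Existing approaches to temporal and spatial modeling also have shortcomings and cannot achieve optimal performance in complex real-world scenarios. To address these problems, this paper proposes a unified user representation framework, GTRL (geography and time aware representation learning), which jointly models users' historical behavior trajectories along both the temporal and spatial dimensions. For temporal modeling, GTRL uses functional time encoding together with a continuous-time, context-aware graph attention network to flexibly capture high-order structured temporal information on dynamic user behavior graphs. For spatial modeling, GTRL uses hierarchical geographic encoding and a deep historical trajectory modeling module to efficiently characterize users' geographic location preferences. GTRL adopts a unified joint optimization scheme that learns the model on three tasks simultaneously: interaction prediction, interaction time prediction, and interaction location prediction. Finally, extensive experiments on public and industrial datasets verify GTRL's advantages over academic baseline models and its effectiveness in real business scenarios.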

5.
A generative model for modelling maritime vessel behaviour is proposed. The model is a novel variant of the dynamic Bayesian network (DBN). The proposed DBN takes the form of a switching linear dynamic system (SLDS) that has been extended into a larger DBN. The application considered is the fabrication of synthetic data on maritime vessel behaviour: the behaviour of various vessels in a maritime piracy situation is simulated. A means to integrate information from context-based external factors that influence behaviour is provided. Simulated observations of the vessels' kinematic states are generated. The generated data may be used for developing and evaluating counter-piracy methods and algorithms. A novel methodology for evaluating and optimising behavioural models such as the proposed model is presented. The log-likelihood, cross entropy, Bayes factor and Bhattacharyya distance measures are applied for evaluation. The results demonstrate that the generative model is able to model both spatial and temporal datasets.

6.
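付治, 王红军, 李天瑞, 滕飞, 张继. Journal of Software (《软件学报》), 2020, 31(4): 981-990
Clustering is a research hotspot in machine learning, and weakly supervised learning is an important research direction within semi-supervised learning, with a wide range of applications. Building on research into clustering and weakly supervised learning, a weakly supervised learning framework based on k labeled samples is proposed. First, the framework expands the set of labeled samples using clustering and clustering confidence. Second, the energy function of the restricted Boltzmann machine is modified, yielding a restricted Boltzmann machine learning model based on k labeled samples. Finally, inference for the model is derived and the associated algorithms are designed. To evaluate the framework and model, comparative experiments were run on public datasets; the results show that the weakly supervised learning framework based on k labeled samples performs well.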

7.
A complex network is a graph network with non-trivial topological features that often occurs in real systems, such as video monitoring networks, social networks and sensor networks. While research on complex networks is growing, the main focus has been on the analysis and modeling of large networks with static topology. Prediction and control of temporal complex networks with evolving patterns are urgently needed but have rarely been studied. In view of these research gaps, we propose a novel end-to-end deep learning based network model, called temporal graph convolution and attention (T-GAN), for prediction of temporal complex networks. To jointly extract both spatial and temporal features of complex networks, we design a new adaptive graph convolution and integrate it with Long Short-Term Memory (LSTM) cells. An encoder-decoder framework is applied to predict the properties and trends of complex networks, and we propose a dual attention block to improve the sensitivity of the model to different time slices. The proposed T-GAN architecture is general and scalable, and can be used for a wide range of real applications. We demonstrate T-GAN on three prediction tasks for evolving complex networks, namely node classification, feature forecasting and topology prediction, over 6 open datasets. Our T-GAN based approach significantly outperforms existing models, achieving improvements of more than 4.7% in recall and 25.1% in precision. Additional experiments show the generalization of the proposed model to learning the characteristics of time-series images. Extensive experiments demonstrate the effectiveness of T-GAN in learning spatial and temporal features and predicting properties of complex networks.

8.
The actual functions of a region may not reflect the intent of the planners' original zoning scheme. Numerous methods have been proposed to identify actual urban functional regions as computation has advanced. Specifically, remote sensing with image recognition, geodemographic classification, social sensing with big data, and geo-text mining techniques have been widely applied. Points-of-interest (POIs) are one of the most common open-access data types used to extract information pertaining to functional zones. However, previous works have either overlooked or not made full use of the spatial interactions that can be extracted from POIs, owing to model limitations in the context of geographical space. In this research, we introduce an approach that detects functional regions at the scale of a neighborhood area (NA) by combining POI data with a simplified Place2vec model, which is grounded in the first law of geography. First, the POI-based spatial context is constructed using the nearest neighbor approach. Then, the number of training tuples (t_center, t_context) is increased according to a weight derived from the distance between the POI t_center and the POI t_context. Next, high-dimensional characteristic vectors of the POIs are extracted using the skip-gram training framework. After summarizing the POI vectors at the NA level, we employ a K-means clustering model to cluster the functional regions. Compared with probabilistic topic models (PTMs) and Word2vec, the Place2vec-based approach obtained the highest mean reciprocal rank values (MRR-SWP=0.356, MRR-SLC=0.401, MRR-SJC=0.433, and MRR-SLin=0.421) in terms of similarity-capturing performance, and the highest functional region identification accuracy (OA=0.7424). The research has important implications for urban planning and governance.
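
The tuple-construction step described above (nearest-neighbour context, then distance-dependent duplication of training tuples) can be sketched as follows. The abstract does not give the weighting formula, so the rank-based rule used here (the nearest neighbour is repeated k times, the next k-1 times, and so on) is purely a stand-in assumption, as are the function name `build_training_tuples` and the toy POIs.

```python
import math


def build_training_tuples(pois, k=2):
    """Build (t_center, t_context) tuples from POIs given as (poi_type, x, y).

    Each POI takes its k nearest POIs as context. Nearer context POIs are
    repeated more often (assumed rule: k - rank repeats), so a skip-gram
    trainer would see them more frequently.
    """
    tuples = []
    for i, (t_center, cx, cy) in enumerate(pois):
        # Distances from the centre POI to all other POIs, nearest first.
        others = [(math.hypot(x - cx, y - cy), t)
                  for j, (t, x, y) in enumerate(pois) if j != i]
        others.sort(key=lambda pair: pair[0])
        for rank, (_, t_context) in enumerate(others[:k]):
            repeats = k - rank  # hypothetical distance-derived weight
            tuples.extend([(t_center, t_context)] * repeats)
    return tuples


toy_pois = [("cafe", 0.0, 0.0), ("bank", 1.0, 0.0), ("school", 5.0, 0.0)]
pairs = build_training_tuples(toy_pois, k=2)
print(pairs.count(("cafe", "bank")), pairs.count(("cafe", "school")))  # -> 2 1
```

The resulting list could then be fed to any skip-gram implementation as its (centre, context) training pairs.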

9.
Many problems in machine learning and computer vision consist of predicting multi-dimensional output vectors given a specific set of input features. In many of these problems, there exist inherent temporal and spatial dependencies between the output vectors, as well as repeating output patterns and input–output associations, that can provide more robust and accurate predictors when modeled properly. With this motivation, we propose a novel Output-Associative Relevance Vector Machine (OA-RVM) regression framework that augments traditional RVM regression by learning non-linear input and output dependencies. Instead of depending solely on the input patterns, OA-RVM models output covariances within a predefined temporal window, thus capturing past, current and future context. As a result, output patterns manifested in the training data are captured within a formal probabilistic framework and subsequently used during inference. As a proof of concept, we target the highly challenging problem of dimensional and continuous prediction of emotions, and evaluate the proposed framework on multiple nonverbal cues, namely facial expressions, shoulder movements and audio cues. We demonstrate the advantages of the proposed OA-RVM regression through subject-independent evaluation on the SAL database, which consists of naturalistic conversational interactions. The experimental results show that OA-RVM regression outperforms the traditional RVM and SVM regression approaches in terms of prediction accuracy (evaluated using the Root Mean Squared Error) and prediction structure (evaluated using the correlation coefficient), generating more accurate and robust prediction models.

10.
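Objective: Researchers have proposed several multi-task frameworks for visual object tracking (video object tracking, VOT) and video object segmentation (VOS), but such frameworks have poor accuracy and robustness. To address this, this paper proposes an end-to-end multi-task framework for real-time visual object tracking and video object segmentation that fuses multi-scale context information with inter-frame video information. Method: The proposed architecture uses an atrous spatial pyramid pooling module with more scales, built from atrous depthwise separable convolutions, together with an inter-frame mask propagation module that carries inter-frame information. This gives the network a stronger ability to segment objects at multiple scales and better robustness. Results: On the VOT-2016 and VOT-2018 visual object tracking datasets, the method achieves an expected average overlap (EAO) of 0.462 and 0.408, respectively, which is 0.029 and 0.028 higher than SiamMask, reaching state-of-the-art results with better robustness. It also achieves competitive results on the DAVIS (densely annotated video segmentation)-2016 and DAVIS-2017 video object segmentation datasets. On the multi-object DAVIS-2017 dataset, the method outperforms SiamMask: the mean Jaccard index of region similarity, J_M, and the mean F-measure of contour accuracy, F_M, reach 56.0 and 59.0, respectively, and the decay values for region and contour, J_D and F_D, are both lower than SiamMask's, at 17.9 and 19.8. The method runs at 45 frames/s, which is real-time. Conclusion: The proposed end-to-end multi-task framework for real-time visual object tracking and video object segmentation fully captures multi-scale context information and exploits inter-frame video information, giving the network a stronger ability to segment objects at multiple scales together with better robustness.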
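
For readers unfamiliar with the region-similarity measure reported above, the Jaccard index of a predicted mask against a ground-truth mask is the size of their intersection divided by the size of their union. The sketch below computes it for a single pair of binary masks stored as nested lists; the function name `jaccard_index`, the convention of returning 1.0 when both masks are empty, and the toy masks are assumptions for illustration, and the benchmark's own evaluation code may differ in detail.

```python
def jaccard_index(pred_mask, gt_mask):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(jaccard_index(pred, gt))  # -> 0.5
```

A dataset-level mean such as J_M would average this value over frames and objects.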

11.
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Recent advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex contexts. It is, however, common practice in the literature to treat an action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits the robust deployment of traditional approaches in real-life scenarios. In this work, we propose and show that taking into account the global configuration of local features greatly improves recognition performance. We first introduce a novel feature selection process called Sparse Hierarchical Bayes Filter, which selects only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning to human action analysis. That is, by representing a human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: a Dynamic Conditional Random Field, from the probabilistic perspective, and a Structural Support Vector Machine, from the max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which the proposed framework proves highly effective and robust compared with bag-of-feature methods.

12.
13.
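Objective: In medical research and clinical workflows such as cancer screening and drug development, detecting mitotic cells in microscopy images provides important biological evidence. However, image distributions differ markedly across culture conditions, and steadily increasing cell density makes scenes complex, so conventional preprocessing methods struggle to filter regions effectively. Cells at different stages look similar and their motion is blurred; existing methods lack explicit supervision of region feature encoding and are prone to wrong predictions because of insufficient semantic discrimination. This paper therefore proposes a detection framework based on appearance and motion pattern awareness, which achieves accurate prediction in complex scenes through two-stage preprocessing and discriminative learning of cell state patterns. Method: The method uses a three-stage detection framework. In the preprocessing stage, a region segmentation network is combined with a prior-based optimization algorithm to thoroughly prune candidate regions. In the pre-training stage, two auxiliary tasks based on image classification and reconstruction are constructed to directly supervise the appearance and motion encoding of candidate regions, giving the encoding network semantic awareness of different cell states. In the full-model training and prediction stage, the candidate region sequences from preprocessing are taken as input, the pre-trained encoding network extracts candidate region features, and a temporal network fuses sequence context to produce the cell detection results. Results: Experiments on the C2C12-16 dataset show that the method's average performance reaches a precision of 85.3%, recall of 89.3%, and F-score of 87.2% on the validation set, and a precision of 86.4% and recall of 86.1% on the test set,...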

14.
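To make better use of target feature information and improve tracking accuracy and robustness, a superpixel tracking algorithm that fuses saliency with spatio-temporal context is proposed. First, the target's context region is segmented into superpixels, and the motion correlation and feature covariance of the context are computed from motion information to obtain a correlation saliency. Then, under a Bayesian framework, a spatio-temporal context model that incorporates the saliency information is built in the frequency domain. Next, the Bhattacharyya coefficient is computed from joint color and texture histograms and used to update the spatio-temporal context model. In addition, a scale pyramid model is introduced to estimate the target scale accurately. Finally, an adaptive motion prediction module based on low-pass filtering is added: the sample set of the dynamic model is updated online, and ridge regression is used to update the low-pass filter parameters online. Experiments on public datasets show that the algorithm tracks well under illumination changes, complex backgrounds, target rotation, high maneuverability, and low resolution.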
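
The Bhattacharyya coefficient mentioned above measures the overlap between two histograms: after normalising each to sum to 1, it is the sum over bins of sqrt(p_i * q_i), giving 1 for identical distributions and 0 for disjoint ones. The sketch below shows only this standard formula; how the tracker builds its joint color and texture histograms, and how the coefficient drives the model update, are not specified in the abstract, so the function name `bhattacharyya_coefficient` and the toy histograms are assumptions.

```python
import math


def bhattacharyya_coefficient(hist_p, hist_q):
    """Overlap between two histograms with the same number of bins."""
    total_p, total_q = sum(hist_p), sum(hist_q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    # Normalise each bin to a probability, then sum sqrt(p_i * q_i).
    return sum(math.sqrt((p / total_p) * (q / total_q))
               for p, q in zip(hist_p, hist_q))


template_hist = [1, 2, 1]
candidate_hist = [1, 2, 1]
print(bhattacharyya_coefficient(template_hist, candidate_hist))
print(bhattacharyya_coefficient([1, 0], [0, 1]))
```

A tracker could, for instance, skip or damp a model update when the coefficient between the template and the current candidate falls below a threshold; that usage is a common pattern rather than something stated in the abstract.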

15.
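Wind speed prediction is an important factor affecting the efficiency and stability of wind farms. Based on the spatio-temporal characteristics of wind speed, this paper combines variational mode decomposition (VMD) with a hybrid deep learning framework for short-term wind speed prediction, named VHSTN (VMD-based hybrid spatio-temporal network). The hybrid deep learning framework consists of a convolutional neural network (CNN), a long short-term memory network (LSTM), and a self-attention mechanism (SAM). After cleaning the raw data, the algorithm uses VMD to decompose the multi-site spatio-temporal wind speed data into intrinsic mode function (IMF) components, removing the instability of the wind speed data. For each IMF component, a bottom CNN extracts spatial features and a top LSTM extracts temporal features; SAM then strengthens the extraction of hidden features through adaptive weighting and yields the prediction for each component. Finally, the component predictions are combined into the final predicted wind speed. Experiments on the WIND dataset, with comparisons against typical related algorithms, demonstrate the effectiveness and superiority of the algorithm.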

16.
Behaviour can be expressed as a set of specific movement patterns in time. An animal's movement, or trajectory, can characterise its behaviours and provide information about its internal states. Recent advances in GPS-based sensor technologies have led to a drastic increase in the volume of data collected from animal movements, which enables researchers to analyse and model behaviour using data-driven methods. However, compact, discriminative, semantic and independent numerical representations of trajectories are essential as features for employing most of the available off-the-shelf machine learning and deep learning techniques. Inspired by language processing, the approach presented in this study uses the Skip-gram model to create contextual vector embeddings, or representations, of key-points in animal trajectories to be used as input features. Here, a key-point is defined as a location that represents a trajectory segment. It is assumed that these key-points encapsulate contextual information attributable to a certain behaviour or to a specific group of animals with similar behavioural features. The vector embeddings can therefore be interpreted as contextual semantic representations of trajectory key-points, independent of their spatial coordinates. With these representations, it would be possible to predict the likelihood of preceding or subsequent key-points given a context or an internal state, or vice versa. To test this hypothesis, an experiment was conducted on trajectories logged from a seabird species, the Streaked Shearwater (Calonectris leucomelas). In this experiment, vector representations of the key-points in the birds' trajectories were constructed and optimized using candidate sampling. The experimental results showcased the utility of these vector embeddings both in exploring Streaked Shearwater trajectory data and in improving gender-based trajectory classification. In summary, the proposed method provides a novel approach to the numerical representation of animal trajectories, and it was shown to be semantically more explanatory for analysis as well as more informative as features for modelling animal movement data.

17.
18.
In this paper, we present a framework for parsing video events with a stochastic Temporal And–Or Graph (T-AOG) and for unsupervised learning of the T-AOG from video. The T-AOG represents a stochastic event grammar. Its alphabet consists of a set of grounded spatial relations, including the poses of agents and their interactions with objects in the scene. The terminal nodes of the T-AOG are atomic actions, which are specified by a number of grounded relations over image frames. An And-node represents a sequence of actions. An Or-node represents a number of alternative ways of such concatenations. The And–Or nodes in the T-AOG can generate a set of valid temporal configurations of atomic actions, which can be equivalently represented as the language of a stochastic context-free grammar (SCFG). For each And-node we model the temporal relations of its children nodes to distinguish events with similar structures but different temporal patterns and to interpolate missing portions of events. This makes the T-AOG grammar context-sensitive. We propose an unsupervised learning algorithm that learns the atomic actions, the temporal relations and the And–Or nodes under the information projection principle in a coherent probabilistic framework. We also propose an event parsing algorithm based on the T-AOG that can understand events, infer the goals of agents, and predict their plausible intended actions. In comparison with existing methods, our paper makes the following contributions. (i) We represent events by a T-AOG with hierarchical compositions of events and the temporal relations between the sub-events. (ii) We learn the grammar, including atomic actions and temporal relations, automatically from the video data without manual supervision. (iii) Our algorithm infers the goals of agents and predicts their intents by a top-down process, handles event insertion and multi-agent events, keeps all possible interpretations of the video to preserve ambiguities, and achieves the globally optimal parsing solution in a Bayesian framework. (iv) The algorithm uses event context to improve the detection of atomic actions and to segment and recognize objects in the scene. Extensive experiments, including indoor and outdoor scenes and single-agent and multi-agent events, are conducted to validate the effectiveness of the proposed approach.

19.
In this paper, we develop a theoretical understanding of multi-sensory knowledge and user context and their inter-relationships. This is used to develop a generic representation framework for multi-sensory knowledge and context. A representation framework for context can have a significant impact on media applications that dynamically adapt to user needs. There are three key contributions of this work: (a) theoretical analysis, (b) representation framework and (c) experimental validation. Knowledge is understood to be a dynamic set of multi-sensory facts with three key properties: multi-sensory, emergent and dynamic. Context is the dynamic subset of knowledge that affects the communication between entities. We develop a graph-based, multi-relational representation framework for knowledge, and model its temporal dynamics using a linear dynamical system. Our approach results in a stable and convergent system. We applied our representation framework to an image retrieval system with a large collection of photographs from everyday events. Our experimental validation, in which retrieval was evaluated against two reference algorithms, indicates that our context-based approach provides significant gains in real-world usage scenarios.

20.
This paper introduces a new approach to fitting a linear regression model to symbolic interval data. Each example of the learning set is described by a feature vector in which each feature value is an interval. The new method fits a linear regression model on the mid-points and ranges of the interval values assumed by the variables in the learning set. The lower and upper bounds of the interval value of the dependent variable are predicted from its mid-point and range, which are estimated from the fitted linear regression model applied to the mid-point and range of each interval value of the independent variables. The assessment of the proposed prediction method is based on estimating the average behaviour of both the root mean square error and the square of the correlation coefficient in the framework of a Monte Carlo experiment. Finally, the approaches presented in this paper are applied to a real data set and their performance is compared.
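
A minimal sketch of the mid-point and range idea described above, under two simplifying assumptions that the abstract does not confirm: there is a single interval-valued predictor, and the mid-points and the ranges are fitted by two separate ordinary least-squares lines. All function names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def midpoint(interval):
    return (interval[0] + interval[1]) / 2


def width(interval):
    return interval[1] - interval[0]


def fit_interval_regression(x_intervals, y_intervals):
    """Fit one line on mid-points and one on ranges (assumed to be separate fits)."""
    mid_model = fit_line([midpoint(x) for x in x_intervals],
                         [midpoint(y) for y in y_intervals])
    range_model = fit_line([width(x) for x in x_intervals],
                           [width(y) for y in y_intervals])
    return mid_model, range_model


def predict_interval(models, x_interval):
    """Rebuild the lower and upper bounds from predicted mid-point and range."""
    (a_mid, b_mid), (a_rng, b_rng) = models
    mid = a_mid + b_mid * midpoint(x_interval)
    rng = a_rng + b_rng * width(x_interval)
    return mid - rng / 2, mid + rng / 2


# Toy data built so that y_mid = 1 + 2 * x_mid and y_range = 1 + 0.5 * x_range.
x_data = [(1, 3), (2, 6), (4, 6), (5, 9)]
y_data = [(4, 6), (7.5, 10.5), (10, 12), (13.5, 16.5)]
models = fit_interval_regression(x_data, y_data)
print(predict_interval(models, (3, 5)))  # approximately (8.0, 10.0)
```

Nothing in this sketch stops the predicted range from going negative for inputs far from the training data, in which case the returned lower bound would exceed the upper bound; a practical implementation would need to guard against that.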
