Similar literature
 20 similar documents found
1.
2.
3.
4.
5.
For the first time, a genetic framework using contextual knowledge is proposed for segmentation and recognition of unconstrained handwritten numeral strings. New algorithms have been developed to locate feature points on the string image and to generate possible segmentation hypotheses. A genetic representation scheme is used to represent the space of all segmentation hypotheses (chromosomes). For the evaluation of segmentation hypotheses, a novel evaluation scheme is introduced to improve the outlier resistance of the system. Our genetic algorithm searches and evolves the population of segmentation hypotheses to find the one with the highest segmentation/recognition confidence. The NIST NSTRING SD19 and CENPARMI databases were used to evaluate the performance of our proposed method. Our experiments showed that proper use of contextual knowledge in segmentation, evaluation and search greatly improves the overall performance of the system. On average, our system obtained correct recognition rates of 95.28% and 96.42% on handwritten numeral strings using neural network and support vector classifiers, respectively. These results compare favorably with those reported in the literature.

6.
Image and video analysis requires rich features that can characterize various aspects of visual information. These features are typically extracted from the pixel values of images and videos, which requires a huge amount of computation and is seldom useful for real-time analysis. In contrast, compressed domain analysis offers relevant information about the visual content in the form of transform coefficients, motion vectors, quantization steps and coded block patterns, with minimal computational burden. Considerably less work has been done in the compressed domain than in the pixel domain. This paper surveys video analysis efforts published during the last decade across the spectrum of video compression standards. The survey includes only the analysis part and excludes the processing aspect of the compressed domain. The analysis spans various computer vision applications such as moving object segmentation, human action recognition, indexing, retrieval, face detection, video classification and object tracking in compressed videos.

7.
Conditional models for contextual human motion recognition   (Total citations: 1; self-citations: 0; citations by others: 1)
We describe algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs). Existing approaches to this problem typically use generative structures like the hidden Markov model (HMM). Therefore, they have to make simplifying, often unrealistic assumptions about the conditional independence of observations given the motion class labels, and they cannot accommodate rich overlapping features of the observation or long-term contextual dependencies among observations at multiple timesteps. This makes them prone to myopic failures in recognizing many human motions, because even the transition between simple human activities naturally has temporal segments of ambiguity and overlap. The correct interpretation of these sequences requires more holistic, contextual decisions, where the estimate of an activity at a particular timestep can be constrained by longer windows of observations, both before and after that timestep. This would not be computationally feasible with an HMM, which requires the enumeration of a number of observation sequences exponential in the size of the context window. In this work we follow a different philosophy: instead of restrictively modeling the complex image generation process (the observation), we work with models that can take it as an input without restriction, and hence condition on it. Conditional models like the proposed CRFs seamlessly represent contextual dependencies and have computationally attractive properties: they support efficient, exact recognition using dynamic programming, and their parameters can be learned using convex optimization.
We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments that show not only how these can successfully classify diverse human activities like walking, jumping, running, picking or dancing, but also how they can discriminate among subtle motion styles like normal walks and wander walks.
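The "efficient, exact recognition using dynamic programming" mentioned in this abstract can be illustrated with Viterbi decoding over a linear chain. This is only a minimal sketch, not the authors' implementation: the function name `viterbi`, the score tables `unary` and `trans`, and the two toy labels are assumptions made for illustration, and the scores are hand-picked rather than learned.

```python
def viterbi(unary, trans):
    """Return the highest-scoring label sequence for a linear chain.

    unary[t][y]  : hypothetical score of label y at timestep t
    trans[p][y]  : hypothetical score of moving from label p to label y
    """
    n_labels = len(unary[0])
    best = list(unary[0])      # best[y] = best score of any path ending in y
    back = []                  # back[t-1][y] = best predecessor of y at step t
    for t in range(1, len(unary)):
        new_best, pointers = [], []
        for y in range(n_labels):
            p = max(range(n_labels), key=lambda q: best[q] + trans[q][y])
            new_best.append(best[p] + trans[p][y] + unary[t][y])
            pointers.append(p)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    y = max(range(n_labels), key=lambda q: best[q])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


# Toy example: label 0 = "walk", label 1 = "run" (made-up scores).
unary = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0]]
trans = [[0.5, -0.5], [-0.5, 0.5]]
decoded = viterbi(unary, trans)
```

In this toy input the middle frame slightly prefers label 1 when considered alone, but the transition scores make the consistent sequence the better overall choice, which is the kind of contextual decision the abstract argues for.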

8.
9.
Multimedia Tools and Applications - According to the rapid spread of multimedia data and online observations by users, the importance of researching on machine vision also, analyzing and automatic...

10.
11.
Segmentation and recognition of continuous gestures are challenging due to spatio-temporal variations and endpoint localization issues. A novel multi-scale Gesture Model is presented here as a set of 3D spatio-temporal surfaces of a time-varying contour. Three approaches, which differ mainly in endpoint localization, are proposed: the first uses a motion detection strategy and multi-scale search to find the endpoints; the second uses Dynamic Time Warping to roughly locate the endpoints before a fine search is carried out; the last approach is based on Dynamic Programming. Experimental results on two-arm and single-hand gestures show that all three methods achieve high recognition rates, ranging from 88% to 96% for the two-arm test, with the last method performing best.
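The second approach in this abstract uses Dynamic Time Warping (DTW) to roughly locate gesture endpoints. The sketch below shows the textbook DTW recurrence on 1-D sequences, plus a naive sliding-window search for the best-matching segment. It is an assumption-laden illustration, not the paper's method: the names `dtw_distance` and `rough_endpoints`, the fixed window length, and the scalar features are all hypothetical simplifications.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match
                                 cost[i][j - 1],      # b[j-1] repeats a match
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def rough_endpoints(stream, template, window):
    """Return (start, end) of the fixed-length window closest to the template."""
    starts = range(len(stream) - window + 1)
    best = min(starts, key=lambda s: dtw_distance(stream[s:s + window], template))
    return best, best + window


# Toy stream in which the template pattern occurs at indices 2..4.
stream = [0, 0, 1, 2, 3, 0, 0]
template = [1, 2, 3]
endpoints = rough_endpoints(stream, template, window=3)
```

A real system would follow this coarse estimate with a finer search around the returned endpoints, as the abstract describes, and would compare multi-dimensional features rather than scalars.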

12.

In the last few decades, information security has gained huge importance owing to the massive growth in digital communication, driving steganography to the forefront of secure communication. Steganography is the practice of concealing information or a message for covert communication by hiding it in a multimedia file such as text, image, or video. Many contributions have been made in the domain of image steganography; however, due to the low embedding capacity and robustness of images, videos are gaining more attention from academic researchers. This paper aims to provide a qualitative as well as quantitative analysis of various video steganography techniques by highlighting their properties, challenges, pros, and cons. Moreover, different quality metrics for the evaluation of distinct steganography techniques are discussed. The paper also provides an overview of the steganalysis attacks that are commonly employed to test the security of steganography techniques. An experimental analysis of some of the prominent techniques using different quality metrics is also carried out. The paper further presents a critical analysis derived from the literature and the experimental results. The primary objective of this paper is to help beginners understand the basic concepts of this research domain so that they can initiate their research in this field. Further, the paper highlights the real-life applications of video steganography and suggests some future directions that require the attention of the research community.


13.

Human activity recognition (HAR) essentially uses (past) sensor data or complex context information to infer the activities a user performs in their daily tasks. HAR has been extensively studied using different paradigms, such as reasoning mechanisms (probabilistic, rule-based, statistical, or logical) or the machine learning (ML) paradigm, to construct inference models that recognize or predict user activities. ML for HAR allows activities to be recognized and even anticipated through the analysis of data collected from different sensors, with greater accuracy than the other paradigms. On the other hand, context-aware middlewares (CAMs) can efficiently integrate a large number of different devices and sensors. Moreover, they provide a programmable and auto-configurable infrastructure for streamlining the design and construction of software solutions in scenarios built on large numbers of sensors and large amounts of data, such as ambient intelligence, smart cities, and e-health domains. In this way, the full integration of ML capabilities as services in CAMs can advance the development of software solutions in these domains when ML is necessary, especially for HAR, which is the basis for many scenarios in these domains. In this work, we present a survey identifying the state of the art in using ML for HAR in CAMs through a systematic literature review (SLR). In our SLR, we worked to answer four research questions: (i) what are the different types of context reasoners available in CAMs; (ii) what are the ML algorithms and methods used for generating models for context reasoning; (iii) which CAMs support data processing in real time; and (iv) what are the HAR scenarios usually tackled by the research works.
In our analysis, we observed that, although ML offers viable approaches for constructing inference models for HAR, including batch learning, adaptive learning and data stream learning, there are still gaps and research challenges to be tackled, especially the use of data stream learning under concept drift, mechanisms for adapting the inference models, and the provision of all of this as services in CAMs, especially for HAR.


14.
A survey on vision-based human action recognition   (Total citations: 10; self-citations: 0; citations by others: 10)
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.

15.
Nan Jing, Ning Chuanfeng, Jian Zhonghua, Dai Wei. Control and Decision (《控制与决策》), 2023, 38(6): 1541-1550
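To address the constrained computing and storage environment of smartphones, a lightweight stochastic configuration network model for human activity recognition is proposed, based on manifold regularization and QR decomposition. First, manifold regularization is used to handle the hard-to-predict nonlinear distribution that arises after the input data are randomly mapped into the hidden-layer space of SCNs, making the model structure more lightweight. Second, QR decomposition is used to reduce the computational complexity of calculating the output weights, further lightening the model-building process. Finally, the effectiveness of the proposed model in terms of recognition accuracy and lightness is evaluated on two human activity recognition datasets. Experimental results show that, compared with SCNs, CNN and other methods, the proposed model not only improves recognition accuracy for human activity recognition but also effectively reduces computational complexity and makes the model structure more compact.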

16.
A hybrid algorithm for text recognition is described. The algorithm combines a Markov method with a dictionary method. Experiments were conducted with the hybrid algorithm using texts of different kinds, comprising hand-printed letters of varying quality, as data. The performance and computational complexity of the algorithm are discussed. The computational complexity of the hybrid algorithm is shown to be less than that of the Predictor-Corrector Algorithm,(1) for similar performance.

17.
Neural Computing and Applications - This work is motivated by the tremendous achievement of deep learning models for computer vision tasks, particularly for human activity recognition. It is...

18.
Most research on human activity recognition does not take into account the temporal localization of actions. In this paper, a new method is designed to model both actions and their temporal domains. This method is based on a new Hough method which outperforms previously published ones on the honeybee dataset thanks to a deeper optimization of the Hough variables. Experiments are performed to select skeleton features adapted to this method and relevant for capturing human actions. With these features, our pipeline improves on state-of-the-art performance on the TUM dataset and outperforms baselines on several public datasets.

19.
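To extract caption information from video images in real time, a simple and effective method is proposed. First, text event detection is performed, followed by edge detection, threshold computation and edge size restriction; non-text regions are then further filtered out according to the range of text pixel density. The proposed method of superimposing horizontal and vertical edges strengthens the edges of the detected text, and restricting edge size filters out edges that do not match the size of text. The projection method is then applied to determine the final region containing the video captions. Finally, OCR is used to recognize the extracted text regions, completing the extraction of text from the video. The combination of these methods ensures the accuracy and robustness of the proposed algorithm.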
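The projection step named in this abstract can be sketched as follows: count edge pixels along each row of a binary edge map and keep the contiguous bands whose counts reach a threshold. This is a generic projection-profile illustration under stated assumptions, not the paper's algorithm; `row_bands`, `min_count`, and the tiny hand-made edge map are hypothetical.

```python
def row_bands(edge_map, min_count):
    """Return (top, bottom) row ranges, bottom exclusive, whose per-row
    edge-pixel count is at least min_count."""
    bands, start = [], None
    for r, row in enumerate(edge_map):
        dense = sum(row) >= min_count
        if dense and start is None:
            start = r                      # a dense band begins here
        elif not dense and start is not None:
            bands.append((start, r))       # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the last row
        bands.append((start, len(edge_map)))
    return bands


# Toy binary edge map: 1 marks an edge pixel.
edge_map = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
rows = row_bands(edge_map, min_count=2)
# Transposing the map gives the vertical projection with the same function.
cols = row_bands(list(zip(*edge_map)), min_count=2)
```

Intersecting the row bands with the column bands gives candidate rectangles that a later stage, such as the density filter or OCR described above, could accept or reject.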

20.
To provide more sophisticated healthcare services, it is necessary to collect precise information on a patient. One promising area of study for obtaining meaningful information is human activity recognition, which has advanced through the use of supervised learning techniques in recent decades. Previous studies, however, have had difficulty generating training datasets and extending the number of activities to be recognized. In this paper, to find a new approach that avoids these problems, we propose unsupervised learning methods for human activity recognition, using data collected from smartphone sensors, even when the number of activities is unknown. Experimental results show that a mixture of Gaussians exactly distinguishes the activities when the number of activities k is known, while hierarchical clustering or DBSCAN achieves above 90% accuracy when k is unknown, by obtaining k from the Caliński–Harabasz index or by choosing appropriate values for ɛ and MinPts. We believe that the results of our approach provide a way of automatically selecting an appropriate value of k at which the accuracy is maximized for activity recognition, without the generation of training datasets by hand.
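The Caliński–Harabasz index used above to choose k is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom. A minimal pure-Python sketch follows, assuming the cluster labelings are already available from some clustering step; the function name, the toy 1-D points, and the two candidate labelings are made up for illustration and are not taken from the paper.

```python
def calinski_harabasz(points, labels):
    """Caliński–Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 2 <= k < n")

    def mean(ps):
        return [sum(p[d] for p in ps) / len(ps) for d in range(dim)]

    def sqdist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    overall = mean(points)
    between = within = 0.0
    for members in clusters.values():
        centre = mean(members)
        between += len(members) * sqdist(centre, overall)
        within += sum(sqdist(p, centre) for p in members)
    if within == 0.0:
        return float("inf")    # every point sits exactly on its centroid
    return (between / (k - 1)) / (within / (n - k))


# Toy data: two obvious groups on a line, scored under two candidate labelings.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
scores = {k: calinski_harabasz(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
```

Picking the k with the highest index is one common way to apply it; how the paper combines the index with hierarchical clustering is not specified in the abstract.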
