Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
While most existing sports video research focuses on detecting events in soccer, baseball, and similar sports, little work has addressed flexible content summarization of racquet sports video, e.g., tennis and table tennis. By taking advantage of the periodicity of video shot content and audio keywords in racquet sports video, we propose a novel flexible video content summarization framework. Our approach combines a structure event detection method with a highlight ranking algorithm. First, unsupervised shot clustering and supervised audio classification are performed to obtain the visual and audio mid-level patterns, respectively. Then, a temporal voting scheme for structure event detection is proposed that exploits the correspondence between audio and video content. Finally, using affective features extracted from the detected events, a linear highlight model ranks the detected events by their degree of excitement. Experimental results show that the proposed approach is effective.

2.
We describe and analyze a discriminative algorithm for learning to align an audio signal with a given sequence of events that tag the signal. We demonstrate the applicability of our method for the tasks of speech-to-phoneme alignment ("forced alignment") and music-to-score alignment. In the first alignment task, the events that tag the speech signal are phonemes, while in the music alignment task, the events are musical notes. Our goal is to learn an alignment function whose input is an audio signal along with its accompanying event sequence and whose output is a timing sequence representing the actual start time of each event in the audio signal. Generalizing the notion of separation with a margin used in support vector machines for binary classification, we cast the learning task as the problem of finding a vector in an abstract inner-product space. To do so, we devise a mapping of the input signal and the event sequence along with any possible timing sequence into an abstract vector space. Each possible timing sequence therefore corresponds to an instance vector, and the predicted timing sequence is the one whose projection onto the learned prediction vector is maximal. We set the prediction vector to be the solution of a minimization problem with a large set of constraints. Each constraint enforces a gap between the projection of the correct target timing sequence and the projection of an alternative, incorrect timing sequence onto the vector. Though the number of constraints is very large, we describe a simple iterative algorithm for efficiently learning the vector and analyze the formal properties of the resulting learning algorithm. We report experimental results comparing the proposed algorithm to previous studies on speech-to-phoneme and music-to-score alignment, which use hidden Markov models. The results obtained in our experiments using the discriminative alignment algorithm are comparable to results of state-of-the-art systems.
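The prediction rule in this abstract (choose the timing sequence whose feature vector has the largest projection onto the learned vector) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `feature_map` is a hypothetical one-dimensional mapping, the weight vector `w` is hand-set rather than learned, and brute-force enumeration stands in for whatever efficient search the authors use.

```python
from itertools import combinations


def feature_map(signal, timing):
    """Hypothetical feature map (not from the paper): total jump in the
    signal at the proposed event start frames."""
    return (sum(abs(signal[t] - signal[t - 1]) for t in timing),)


def predict_timing(signal, num_events, w):
    """Return the timing sequence whose feature vector has the largest
    inner product with the prediction vector w (brute-force search)."""
    # Candidates are strictly increasing start frames, one per event.
    candidates = combinations(range(1, len(signal)), num_events)
    return max(
        candidates,
        key=lambda y: sum(wi * fi for wi, fi in zip(w, feature_map(signal, y))),
    )


# Toy signal with changes at frames 2 and 4.
signal = [0.0, 0.0, 1.0, 1.0, 3.0, 3.0]
best = predict_timing(signal, num_events=2, w=(1.0,))
```

In this toy case the selected start frames coincide with the two points where the signal changes. Learning `w` from margin constraints, as the abstract describes, is outside the scope of the sketch.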

3.

Each year, a huge number of malicious programs are released, which makes malware detection a critical task in computer security. Antivirus programs use various methods for detecting malware, such as signature-based and heuristic-based techniques. Polymorphic and metamorphic malware employs obfuscation techniques to bypass these traditional detection methods, and the amount of such malware has recently increased dramatically. Most previously proposed detection methods are based on high-level features such as opcodes, function calls, or the program's control flow graph (CFG). Because of new obfuscation techniques, extracting high-level features is difficult, error-prone, and time-consuming; hence, approaches that use the program's bytes are quicker and more accurate. In this paper, a novel byte-level method for detecting malware using audio signal processing techniques is presented. In the proposed method, the program's bytes are converted to a meaningful audio signal, and Music Information Retrieval (MIR) techniques are then employed to construct a machine learning music classification model from the audio signals to detect new and unseen instances. Experiments evaluate the influence of different strategies for converting bytes to audio signals and the effectiveness of the method.

4.
Toward semantic indexing and retrieval using hierarchical audio models   Total citations: 1 (self-citations: 0, citations by others: 1)
Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models the statistical characteristics of audio events over a time series to accomplish semantic context detection. Two stages, audio event modeling and semantic context modeling, are devised to bridge the semantic gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events in action movies: gunshot, explosion, engine, and car-braking. At the semantic-context level, Gaussian mixture models (GMMs) and ergodic HMMs are investigated to fuse the characteristics and correlations between various audio events. They provide cues for detecting gunplay and car-chasing scenes, the two semantic contexts we focus on in this work. The promising experimental results demonstrate the effectiveness of the proposed approach and show that the proposed framework provides a foundation for semantic indexing and retrieval. Moreover, the two fusion schemes are compared, and the relations between audio events and semantic contexts are studied.

5.
This paper describes a means of unsupervised learning of recurring patterns in user activity through patterns in system-level events generated by a graphical user interface. Earlier work has shown that using this distillation of the more complex behavioural interaction between the user and the application provides a symbolic representation of knowledge and goals that could be used to imply preference. Although prior research has explored the possibilities of removing this information acquisition bottleneck in such an expert system using ambient monitoring approaches, some have experienced difficulty in dealing with training sequences of varying length and with segmentation of the continuous event stream. Unlike previous work, the approach documented here handles interactions of varying sizes and is able to recall recurrent patterns in real time irrespective of the number of interactions learned. In addition to describing the proposed approach, we also describe the shortcomings of various previously applied machine learning techniques on the same type of data. We also demonstrate a practical implementation of our approach applied to web browser usage.

6.
Microscopic analysis forms an integral part of many scientific studies. It is a task which requires great expertise and care. However, it can often be an extremely repetitive and laborious task. In some cases many hundreds of slides may need to be analysed, a process that requires each slide to be meticulously examined. Machine vision tools could be used to assist in just such repetitive and tedious tasks. However, many machine vision solutions involve a lengthy data acquisition phase and in many cases result in systems that are highly specialised and not easily adaptable. In this paper, we describe a framework that applies flexible machine vision techniques to microscope analysis and utilises active learning to help overcome the data acquisition and adaptability problems. In particular, we investigate the potential of various aspects of our proposed framework on a real-world microscopic task, the recognition of parasite eggs.

7.
Temporal segmentation of videos into meaningful image sequences containing particular activities is an interesting problem in computer vision. We present a novel algorithm to achieve this semantic video segmentation. The segmentation task is accomplished through event detection in a frame-by-frame processing setup. We propose using one-class classification (OCC) techniques to detect events that indicate a new segment, since they have proved successful in object classification and allow for unsupervised event detection in a natural way. Various OCC schemes have been tested and compared, and an approach based on temporal self-similarity maps (TSSMs) is also presented. The testing was done on a challenging publicly available thermal video dataset. The results are promising and show the suitability of our approaches for the task of temporal video segmentation.
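As a rough illustration of a temporal self-similarity map, the sketch below builds a matrix of pairwise Euclidean distances between per-frame feature vectors and flags frames that differ strongly from their predecessor. The two-dimensional toy features, the threshold, and the `boundary_candidates` rule are assumptions made for this example; the abstract does not say how the authors derive segments from the map.

```python
import math


def self_similarity_map(features):
    """S[i][j] is the Euclidean distance between the feature vectors of
    frames i and j (symmetric, zero on the diagonal)."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def boundary_candidates(features, threshold):
    """Hypothetical rule: frame t starts a new segment when its distance
    to frame t - 1 exceeds the threshold."""
    ssm = self_similarity_map(features)
    return [t for t in range(1, len(features)) if ssm[t][t - 1] > threshold]


# Toy per-frame features: three similar frames, then an abrupt change.
frames = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0)]
ssm = self_similarity_map(frames)
boundaries = boundary_candidates(frames, threshold=1.0)
```

Only the first off-diagonal of the map is used here; a fuller treatment would look at block structure across the whole matrix.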

8.
Sports video annotation is important for sports video semantic analysis tasks such as event detection and personalization. In this paper, we propose a novel approach for sports video semantic annotation and personalized retrieval. Unlike state-of-the-art sports video analysis methods, which rely heavily on audio/visual features, the proposed approach incorporates web-casting text into sports video analysis. Compared with previous approaches, the contributions of our approach include the following. 1) The event detection accuracy is significantly improved by incorporating web-casting text analysis. 2) The proposed approach is able to detect exact event boundaries and extract event semantics that are very difficult or impossible for previous approaches to handle. 3) The proposed method is able to create personalized summaries from both general and specific points of view related to a particular game, event, player, or team according to the user's preference. We present the framework of our approach and details of text analysis, video analysis, text/video alignment, and personalized retrieval. The experimental results on event boundary detection in sports video are encouraging and comparable to manually selected events. The evaluation shows that personalized retrieval is effective in helping meet users' expectations.

9.
Knowledge graph (KG) techniques have achieved successful results in many tasks, especially in the semantic web and natural language processing domains. In recent years, representation learning on KGs has been successfully applied to e-business applications, such as event-driven automatic investment strategies. However, there is still limited research on learning events’ influence on KGs for modern quantitative investment. In this paper, we propose a novel event influence learning framework to predict stock market trends, called ST-Trend, which leverages an enterprise knowledge graph to represent company correlation relationships and mines the deep background knowledge of web events with three self-supervised learning tasks. In particular, we devise two joint self-supervised tasks to identify the relations between web events and companies. The first task generates ground-truth event-company correlation labels based on the enterprise knowledge graph. The second task is used to learn how to identify the companies correlated with an event based on the generated correlation labels, using encodings of web events, company features, and technical sequential data. We then design the prediction network to infer an event’s influence on the stock price trends of the identified correlated companies based on the enterprise KG. Finally, we perform extensive experiments on a massive real-life dataset to validate the effectiveness of our proposed framework, and the experimental results demonstrate its superior performance in predicting stock market trends by considering events’ influences with the enterprise knowledge graph.

10.
In this article we set out to examine whether analysis of the audio from a multimedia surveillance application can be used to augment an event detection system based on visual processing, and possibly contribute to improvements. In processing audio information we are not concerned with identifying or classifying what type of event is detected, as our aim is to keep audio processing to a minimum in order to allow deployment on a wireless sensor network. We describe an experiment where we gathered information from a series of traditional wired microphones installed in a typical surveillance setting. We also obtained information on the activities carried out from cameras located in the same area. We present the results of analysing audio information based on the mean of the volume, the zero-crossing rate, and the frequency, and how these correlate with events detected visually. We found that detecting events based on their volume only returned satisfactory results. We show the results obtained by applying this volume-based approach to a range of physical environments.
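Two of the audio features named in this abstract, mean volume and zero-crossing rate, are cheap to compute per frame, which fits the stated goal of minimal audio processing. The sketch below is a minimal illustration under assumptions of our own: RMS amplitude stands in for "volume", frames are non-overlapping, and the frame length and threshold are arbitrary values chosen for the toy signal, not taken from the article.

```python
import math


def frame_features(samples, frame_len):
    """Split samples into non-overlapping frames and return a list of
    (rms_volume, zero_crossing_rate) tuples, one per full frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        features.append((rms, crossings / (frame_len - 1)))
    return features


def detect_events(samples, frame_len=4, volume_threshold=0.5):
    """Volume-only detection: indices of frames whose RMS exceeds the
    (assumed) threshold."""
    feats = frame_features(samples, frame_len)
    return [i for i, (rms, _) in enumerate(feats) if rms > volume_threshold]


quiet = [0.01, -0.01, 0.02, -0.02]
loud = [0.9, -0.8, 0.7, -0.9]
signal = quiet + loud + quiet
events = detect_events(signal)
```

Here only the middle (loud) frame is flagged. The zero-crossing rate is identical for the quiet and loud frames in this toy signal, which shows why volume can be the more discriminative feature in some settings.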

11.
12.
Audio streams, such as news broadcasts, meeting-room recordings, and special video, comprise sound from an extensive variety of sources. The detection of audio events such as speech, coughing, and gunshots gives rise to intelligent audio event detection (AED). With substantial attention directed to AED for applications such as security, speech recognition, speaker recognition, home care, and health monitoring, scientists are now more motivated to perform extensive research on AED. Deploying AED becomes more complicated when it goes beyond merely highlighting audio events and involves feature extraction and classification in order to select the best features with high detection accuracy. To date, a wide range of detection systems based on intelligent techniques have been used to create machine-learning-based audio event detection schemes. Nevertheless, previous studies do not include any state-of-the-art review of the proficiency and significance of such methods for resolving audio event detection problems. The major contribution of this work is to review and categorize existing AED schemes into preprocessing, feature extraction, and classification methods. The importance of the algorithms and methodologies, as well as their proficiency and limitations, is also analyzed. The study further critically compares audio detection methods and algorithms in terms of accuracy and false alarms using different types of datasets.

13.
Rare events, especially those that could negatively impact society, often require human decision-making responses. Detecting rare events can be viewed as a prediction task in the data mining and machine learning communities. As these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers published in the past decade were collected for the study. The initial statistics suggest that rare event detection and imbalanced learning are of concern across a wide range of research areas, from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. The modeling methods discussed include data preprocessing, classification algorithms, and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning, and then detail the applications in each category. Finally, suggestions from the reviewed papers are combined with our own experience and judgment to offer further research directions for the imbalanced learning and rare event detection fields.

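14.
An "event" is something that occurs at a specific time and place and has a fairly noticeable impact on human society and the natural world. Social unrest, violent terrorist incidents, and infectious disease pandemics are examples of "events" that pose serious threats to national and social security. If the occurrence of such events could be effectively predicted in advance, it would help with preparing a response and greatly reduce unnecessary losses. Event prediction technology therefore has great practical value for society, and can, in areas such as social security and risk perception...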

15.
Over the past decade, granular computing (GrC) has been an active research topic in machine learning and computer vision. However, granularity division is itself an open and complex problem. Meanwhile, deep learning, proposed by Geoffrey Hinton, simulates the hierarchical structure of the human brain, processes data from lower levels to higher levels, and gradually composes increasingly semantic concepts. Information similarity, proximity, and functionality constitute the key points of the original insight into granular computing proposed by Zadeh. Much GrC research is based on the equivalence relation or the more general tolerance relation, either of which can be described by distance functions. Information similarity and proximity, which depend on the sample distribution, can be easily described by fuzzy logic. From this point of view, GrC can be considered a set of fuzzy logical formulas, which is geometrically defined as a layered framework in a multi-scale granular system. The necessity of such a multi-scale layered granular system is supported by the columnar organization of the neocortex. The granular system proposed in this paper can therefore be viewed as a new explanation of deep learning that simulates the hierarchical structure of the human brain. In view of this, a novel learning approach that combines fuzzy logical design with machine learning is proposed to construct a GrC system and explore a novel direction for deep learning. Unlike previous work on the theoretical framework of GrC, our granular system is abstracted from brain science and information science, so it can be used to guide research in image processing and pattern recognition. Finally, we take haze removal as an example to demonstrate that our multi-scale GrC is highly capable of increasing texture information entropy and improving the haze-removal result.

16.
Automatic audio content recognition has attracted increasing attention for developing multimedia systems, for which the most popular approaches combine frame-based features with statistical models or discriminative classifiers. The existing methods are effective for clean single-source event detection but may not perform well for unstructured environmental sounds, which have a broad, noise-like flat spectrum and a diverse variety of compositions. We present an automatic acoustic scene understanding framework that detects audio events through two hierarchies, acoustic scene recognition and audio event recognition, in which the former proceeds by following dominant audio sources and in turn helps infer non-dominant audio events within the same scene by modeling their occurrence correlations. On the scene recognition hierarchy, we perform adaptive segmentation and feature extraction for every input acoustic scene stream through Eigen-audiospace and an optimized feature subspace, respectively. After filtering out background, scene streams are recognized by modeling the observation density of dominant features using a two-level hidden Markov model. On the audio event recognition hierarchy, scene knowledge is characterized by an audio context model that essentially describes the occurrence correlations of dominant and non-dominant audio events within the scene. Monte Carlo integration and gradient descent techniques are employed to maximize the likelihood and correctly tag each audio event. To the best of our knowledge, this is the first work that models event correlations as scene context for robust audio event detection in complex and noisy environments. Note that, according to a recent report, the mean accuracy of human listeners on the acoustic scene classification task is only around 71% on the data collected in office environments from the DCASE dataset. None of the existing methods performs well on all scene categories, and the average accuracy of the best performances of the 11 recent methods is 53.8%. The proposed method achieves an average accuracy of 62.3% on the same dataset. Additionally, we create a 10-CASE dataset by manually collecting 5,250 audio clips of 10 scene types and 21 event categories. Our experimental results on 10-CASE show that the proposed method achieves an improved average performance of 78.3%, and that the average accuracy of audio event recognition can be effectively improved by capturing dominant audio sources and inferring non-dominant events from the dominant ones through acoustic context modeling. In future work, exploring the interactions between acoustic scene recognition and audio event detection, and incorporating other modalities to improve accuracy, are required to further advance the proposed framework.

17.
Machine learning is traditionally formalized and investigated as the study of learning concepts and decision functions from labeled examples, requiring a representation that encodes information about the domain of the decision function to be learned. We are interested in providing a way for a human teacher to interact with an automated learner using natural instructions, thus allowing the teacher to communicate the relevant domain expertise to the learner without necessarily knowing anything about the internal representations used in the learning process. In this paper we suggest viewing the process of learning a decision function as a natural language lesson interpretation problem, as opposed to learning from labeled examples. This view of machine learning is motivated by human learning processes, in which the learner is given a lesson describing the target concept directly and a few instances exemplifying it. We introduce a learning algorithm for the lesson interpretation problem that receives feedback from its performance on the final task, while jointly learning (1) how to interpret the lesson and (2) how to use this interpretation to do well on the final task. [...] traditional machine learning by focusing on supplying the learner only with information that can be provided by a task expert. We evaluate our approach by applying it to the rules of the solitaire card game. We show that our learning approach can eventually use natural language instructions to learn the target concept and play the game legally. Furthermore, we show that the learned semantic interpreter also generalizes to previously unseen instructions.

18.
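Application of a learnable receptive field algorithm based on deep networks to image classification   Total citations: 1 (self-citations: 0, citations by others: 1)
As a fundamental task in image retrieval, image organization, and robot vision, image classification has received wide attention in computer vision and machine learning. The many deep-learning-based models for object recognition and image classification have likewise aroused great interest in the field. This paper proposes an algorithm that replaces the scale-invariant feature transform (SIFT) and histogram of oriented gradients (HOG) descriptors: it uses a deep hierarchical structure to learn effective image representations layer by layer, learning features directly from raw pixels. The method uses K-singular value decomposition (K-SVD) for dictionary training and orthogonal matching pursuit (OMP) for encoding. In addition, the paper adopts a scheme that simultaneously learns the classifier and the receptive fields used for pooling. Experimental results show that the algorithm achieves better classification performance on the object (Oxford flowers) and event (UIUC-sports) image classification test sets.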

19.
Many events in real-world applications are long-lasting events with certain durations. The temporal relationships among such durable events are often complex. Processing such complex events has become increasingly important in applications of wireless networks. An important issue in complex event processing is extracting patterns from event streams to support decision making in real time. However, network latencies and machine failures in wireless networks may cause events to arrive out of order. In this work, we analyze the preliminaries of event temporal semantics. A tree-plan model of out-of-order durable events is proposed, and a corresponding hybrid solution is introduced. Extensive experimental studies demonstrate the efficiency of our approach.

20.
Over the last decade, deep neural networks have been a hot topic in machine learning. They are a breakthrough technology for processing images, video, speech, text, and audio. Thanks to its deep architecture, a deep neural network allows us to overcome some limitations of a shallow neural network. In this paper, we investigate the nature of unsupervised learning in the restricted Boltzmann machine (RBM). We prove that maximizing the log-likelihood of the input data distribution of a restricted Boltzmann machine is equivalent to minimizing the cross-entropy and to a special case of minimizing the mean squared error. Thus, the nature of unsupervised learning is invariant to different training criteria. As a result, we propose a new technique called “REBA” for the unsupervised training of deep neural networks. In contrast to Hinton’s conventional approach to learning in the restricted Boltzmann machine, which is based on a linear training rule, the proposed technique is founded on a nonlinear training rule. We show that the classical equations for RBM learning are a special case of the proposed technique. Consequently, the proposed approach is more universal than the traditional energy-based model. We demonstrate the performance of the REBA technique on a well-known benchmark problem. The main contribution of this paper is a novel view and new understanding of unsupervised learning in deep neural networks.
