Similar Literature
1.
Coarse-resolution Earth observation satellites provide large data sets with daily observations at global scale. Because of their high acquisition rate, these data sets are a rich resource for time-series analysis methods. Researching the application of such methods to large data sets requires high-performance computing (HPC) resources and suitable software designs. This article presents an overview of the development of the HiTempo platform, which was designed to facilitate research into the time-series analysis of hyper-temporal sequences of satellite image data. The platform supports the exhaustive evaluation and comparison of algorithms while ensuring that experiments are reproducible. Early results obtained with applications built within the platform are presented. A sample model-based change detection algorithm based on the extended Kalman filter achieved a 97% detection success rate on simulated data sets constructed from MODIS time series. This algorithm has also been parallelized, showing that an entire sequence of MODIS tiles (415 tiles over 9 years) can be processed in under 19 minutes using 32 processors.
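A minimal sketch of model-based change detection with a Kalman filter, assuming a seasonal harmonic model y_t = mu + a·cos(wt) + b·sin(wt) + noise with a random-walk state [mu, a, b]. Because this observation model is linear in the state, the extended Kalman filter reduces to the ordinary Kalman filter used here; the abstract does not give the paper's actual model. A time step is flagged when its normalised innovation squared exceeds a threshold. The function name, parameters and threshold are illustrative assumptions.

```python
import numpy as np

def kalman_change_detection(y, period, q=1e-5, r=1e-3, threshold=9.0):
    """Return time steps whose normalised innovation squared exceeds `threshold`.

    Assumed (hypothetical) model: y_t = mu + a*cos(w t) + b*sin(w t) + noise,
    state [mu, a, b] follows a random walk with variance q per step,
    observation noise variance r.
    """
    omega = 2 * np.pi / period
    x = np.zeros(3)            # state estimate [mu, a, b]
    P = np.eye(3)              # state covariance (vague prior)
    Q = q * np.eye(3)
    alarms = []
    for t, obs in enumerate(y):
        P = P + Q                                          # predict: random walk
        H = np.array([1.0, np.cos(omega * t), np.sin(omega * t)])
        S = H @ P @ H + r                                  # innovation variance
        innov = obs - H @ x                                # observed minus predicted
        if innov ** 2 / S > threshold:
            alarms.append(t)                               # surprising jump: flag it
        K = P @ H / S                                      # Kalman gain
        x = x + K * innov                                  # update state
        P = P - np.outer(K, H @ P)                         # update covariance
    return alarms
```

For a synthetic series whose mean drops from 0.5 to 0.1 at t = 120, `kalman_change_detection(y, period=46)` first flags step 120.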

2.
We propose a new similar-sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data streams. Earlier methods do not adequately support variable lengths or variable tolerances for continuous query sequences when too many query sequences are registered to be handled in main memory. To support variable-length query sequences, we use a window construction mechanism that divides long sequences into smaller windows for indexing and searching. To support variable-tolerance query sequences, we present a new notion of intervaled sequences, whose individual entries are intervals of real numbers rather than real numbers. We then propose a new similar-sequence matching method based on these notions and formally prove its correctness. In addition, we show that our method has the prematching characteristic: it finds future candidates of similar sequences in advance. Experimental results show that our method outperforms the naive method by 2.6-102.1 times and the existing methods in the literature by 1.4-9.8 times over the entire range of parameters tested when query selectivity is low (<32%), the range that is most useful in practice for large database applications.
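The intervaled-sequence idea can be sketched as follows, assuming Euclidean distance and equal-length query windows. Several queries are bundled into one intervaled window by taking the entry-wise minimum and maximum; the distance from a data window to that envelope is a lower bound on its distance to every member query, so one comparison can rule out all of them. The helper names are hypothetical, and this is a simplification of the paper's index-based method, not a reproduction of it.

```python
import math

def interval_distance(window, intervals):
    """Euclidean distance from a numeric window to an intervaled sequence.

    An entry inside its interval [lo, hi] contributes 0; otherwise it
    contributes its gap to the nearest bound.
    """
    total = 0.0
    for x, (lo, hi) in zip(window, intervals):
        gap = lo - x if x < lo else (x - hi if x > hi else 0.0)
        total += gap * gap
    return math.sqrt(total)

def envelope(queries):
    """Merge equal-length query windows into one intervaled window (entry-wise min/max)."""
    return [(min(col), max(col)) for col in zip(*queries)]

def matching_queries(window, queries, tolerances):
    """Indices of queries q with dist(window, q) <= tolerance of q."""
    # The envelope distance lower-bounds the distance to every member query,
    # so if it exceeds the largest tolerance, no query can match.
    if interval_distance(window, envelope(queries)) > max(tolerances):
        return []
    return [i for i, (q, eps) in enumerate(zip(queries, tolerances))
            if math.dist(window, q) <= eps]
```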

3.
Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure, and the prediction can be improved by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenetic analysis of homologous RNA sequences, which is probably why they are not ideal in practical applications. We therefore present a new SCFG-based method that integrates phylogenetic analysis with a newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, M, to depict the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, λ and λ', to perform phylogenetic analysis of homologous RNA sequences, where λ is for the non-structural regions of the sequence and λ' is for the structural regions; 3) we merge λ and λ' into M to devise a combined model for predicting RNA secondary structure. We tested our method on data sets constructed from the Rfam database. Our method achieves higher sensitivity and specificity than the predictions made by Pfold.
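To show how an SCFG assigns a probability to an RNA sequence, here is a toy inside algorithm. It assumes a deliberately tiny hypothetical grammar, S → x S y (base pair, probability `p_pair`) | x S (unpaired base, `p_unpaired`) | ε (the remainder), with uniform emissions over canonical and G-U pairs. This is not the paper's profile SCFG M, nor the grammar used by Pfold. Summing over every derivation is the same dynamic-programming pattern that richer grammars use.

```python
from functools import lru_cache

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumed allowed base pairs

def inside_probability(seq, p_pair=0.3, p_unpaired=0.5):
    """Total probability of `seq` under the toy grammar (sum over derivations)."""
    p_end = 1.0 - p_pair - p_unpaired
    e_single = 0.25                 # uniform emission of one unpaired base
    e_pair = 1.0 / len(PAIRS)       # uniform emission of one allowed pair

    @lru_cache(maxsize=None)
    def inside(i, j):
        """Probability that S derives seq[i:j]."""
        if i == j:
            return p_end                                    # S -> epsilon
        total = p_unpaired * e_single * inside(i + 1, j)    # S -> x S
        if j - i >= 2 and seq[i] + seq[j - 1] in PAIRS:
            total += p_pair * e_pair * inside(i + 1, j - 1) # S -> x S y
        return total

    return inside(0, len(seq))
```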

4.
Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time-series data from astrophysics). Separating anomalous objects from well-known classes is an important step towards discovering new classes of astronomical objects. Most anomaly detection methods for time-series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data preclude these methods, because the periods of any given pair of light-curves may be out of sync. An existing anomaly detection method can be used only if each pair of light-curves is first aligned before their similarity is computed, a costly operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time-series data that outputs a ranked list of both global and local anomalies. It calculates the anomaly score of each light-curve relative to a set of centroids produced by a modified k-means clustering algorithm. Our method scales to large data sets through sampling. We validate it on both light-curve data and other time-series data sets, demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on the results. We compare our method with naive solutions and existing time-series anomaly detection methods for unphased data, and show that PCAD's reported anomalies are comparable to or better than those of all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might indicate novel astrophysical phenomena.
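The scoring step can be sketched as follows, assuming each light-curve has already been folded and resampled to a common length, and that phase misalignment is handled by minimising Euclidean distance over all circular shifts. The anomaly score is the distance to the nearest centroid. The centroids would come from the paper's modified k-means, which is not reproduced here, and the function names are hypothetical. The O(n²) shift search keeps the example simple; an FFT-based cross-correlation would be faster.

```python
import numpy as np

def phase_invariant_distance(a, b):
    """Smallest Euclidean distance between `a` and any circular shift of `b`."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return min(np.linalg.norm(a - np.roll(b, s)) for s in range(len(b)))

def anomaly_scores(curves, centroids):
    """Score each folded light-curve by its distance to the nearest centroid."""
    return [min(phase_invariant_distance(c, m) for m in centroids) for c in curves]
```

A phase-shifted copy of a centroid scores zero, while a differently shaped curve scores high.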

5.
This paper presents an evolutionary algorithm for modeling the arrival dates in time-stamped data sequences such as newscasts, e-mails, IRC conversations, scientific journal articles or weblog postings. These models are applied to detecting buzz (i.e., terms that occur with higher-than-normal frequency) in such sequences, a topic that has attracted considerable interest online as the number of periodic content producers grows. For this reason we test our system on this kind of online sequence, although it is also valid for other types of event sequences. The algorithm assigns frequencies (number of events per time unit) to time intervals so as to produce an optimal fit to the data. The optimization is a trade-off between fitting the data accurately and avoiding too many frequency changes, which overcomes the noise inherent in these sequences. This process has traditionally been performed with dynamic programming algorithms, which are limited by memory and efficiency requirements. This limitation can be a problem for long sequences and motivates alternative search methods that accept some degree of uncertainty in exchange for tractability, such as the evolutionary algorithm proposed here. The algorithm reaches the same solution quality as the classical dynamic programming algorithms, but in less time. We also test different cost functions and propose a new one that yields better fits on real-world data than the one originally proposed by Kleinberg. Finally, several distributions of states for the finite state automata are tested; a uniform distribution produces much better fits than the geometric distribution also proposed by Kleinberg. We also present a variant of the evolutionary algorithm that quickly fits a sequence extended with new data by reusing the fit obtained for the original subsequence.
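For reference, the classical dynamic-programming baseline can be sketched as Kleinberg's burst automaton decoded with the Viterbi algorithm. This follows the standard formulation: k states, state i emits inter-arrival gaps from an exponential with rate (n/T)·s^i, moving up one level costs γ·ln n, and moving down is free. This is the baseline the paper compares against, not its evolutionary algorithm, and the parameter defaults are illustrative.

```python
import math

def kleinberg_bursts(gaps, s=2.0, gamma=1.0, k=3):
    """Most likely state for each inter-arrival gap (0 = base rate, higher = burstier)."""
    n, T = len(gaps), sum(gaps)
    rates = [(n / T) * s ** i for i in range(k)]

    def emit(i, x):    # -log of the exponential density of gap x in state i
        return -math.log(rates[i]) + rates[i] * x

    def trans(i, j):   # moving up costs gamma*ln(n) per level; moving down is free
        return (j - i) * gamma * math.log(n) if j > i else 0.0

    cost = [trans(0, i) + emit(i, gaps[0]) for i in range(k)]  # start in state 0
    back = []
    for x in gaps[1:]:
        # Best predecessor for each current state j (Viterbi step).
        prev = [min(range(k), key=lambda i: cost[i] + trans(i, j)) for j in range(k)]
        cost = [cost[prev[j]] + trans(prev[j], j) + emit(j, x) for j in range(k)]
        back.append(prev)
    state = min(range(k), key=lambda i: cost[i])
    path = [state]
    for prev in reversed(back):   # backtrack through stored predecessors
        state = prev[state]
        path.append(state)
    return path[::-1]
```

The table of predecessors grows with the sequence length times the number of states, which illustrates the memory cost that the abstract cites as motivation for a search-based alternative.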

6.
In this paper, the concept of a long-memory system for forecasting is developed. Pattern modelling and recognition systems are introduced as local approximation tools for forecasting: they match the current state of the time series with past states to make a forecast. This system has previously been used successfully to forecast the Santa Fe competition data. In this paper, we forecast the financial indices of six different countries and compare the results with those of neural networks on five different error measures. The results show that pattern recognition-based approaches to time-series forecasting are highly accurate and can match the performance of advanced methods such as neural networks. Received: 2 April 1998; Received in revised form: 1 February 1999; Accepted: 16 February 1999
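The matching idea (compare the current state with past states and reuse what followed) can be sketched as a generic nearest-neighbour analog forecaster. This is not the authors' pattern modelling and recognition system. As assumptions, each length-k window is compared after subtracting its last value, so that only the shape matters, and the forecast adds the step that followed the best-matching past window to the latest value.

```python
import numpy as np

def analog_forecast(series, k=3):
    """One-step forecast from the past window most similar in shape to the latest one."""
    x = np.asarray(series, dtype=float)
    current = x[-k:]
    best_err, best_end = np.inf, None
    for end in range(k, len(x)):            # window x[end-k:end] is followed by x[end]
        window = x[end - k:end]
        # Compare shapes only: subtract each window's last value to remove level shifts.
        err = np.sum(((window - window[-1]) - (current - current[-1])) ** 2)
        if err < best_err:
            best_err, best_end = err, end
    # Apply the step that followed the best match to the latest observed value.
    return x[-1] + (x[best_end] - x[best_end - 1])
```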

7.
Comparison of Algorithms for Time Series Classification   Times cited: 8 (self-citations: 0; by others: 8)
杨一鸣, 潘嵘, 潘嘉林, 杨强, 李磊. Chinese Journal of Computers (计算机学报), 2007, 30(8): 1259-1266
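Time series classification is one of the important tasks in time-series data analysis. Unlike the algorithms and problems common in time-series analysis, time series classification takes an entire time series as input and aims to assign the series a discrete label. It is harder than general classification, mainly because the series to be classified have unequal lengths, so general classification algorithms cannot be applied directly. Even for equal-length series, the values at the same position in different series are generally not directly comparable, so general classification algorithms are still unsuitable for direct application. Two approaches are commonly used to address these difficulties. The first defines a suitable distance measure (most commonly the DTW distance) so that sequences close under this measure share the same class label; these methods are domain-independent. The second first models each time series (exploiting the dependencies between successive values), then represents each series by an equal-length vector of model parameters, and finally trains and classifies with a general classification algorithm; these methods are domain-dependent. Researchers have long tended to use only one of these approaches, and comparisons between the two have been lacking. This paper analyzes both approaches in depth and compares them on various synthetic and real data sets. The authors observe how the two approaches perform under different factors, providing a solid basis for developing new algorithms.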
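The first, domain-independent approach (a distance measure plus a nearest-neighbour rule) can be sketched with the standard dynamic-time-warping recurrence, using absolute difference as the local cost and no warping window. DTW handles series of unequal length directly. The function names and the 1-NN rule are illustrative choices, not the exact configuration the paper evaluated.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def nn_classify(query, train):
    """1-nearest-neighbour label; `train` is a list of (series, label) pairs."""
    return min(train, key=lambda item: dtw_distance(query, item[0]))[1]
```

Usage: `nn_classify([0, 0, 1, 1, 0], [([0, 1, 1, 0], "bump"), ([1] * 6, "flat")])` picks `"bump"`, even though the two series differ in length.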

8.
In this article, we investigate the multistability of Hopfield neural networks (HNNs) with almost periodic stimuli and continuously distributed delays. Using the theory of exponential dichotomy and Schauder's fixed point theorem, we obtain sufficient conditions for the existence of 2^N almost periodic solutions lying in invariant regions. We also derive new criteria for the networks to converge to these 2^N almost periodic solutions, and give their domains of attraction. The results are new and general, and they improve on corresponding results in the previous literature.

9.
尚敬文, 王朝坤, 辛欣, 应翔. Journal of Software (软件学报), 2017, 28(3): 648-662
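Community structure is an important feature of complex networks, and community detection is valuable for studying network structure. Classical clustering algorithms such as k-means are a basic class of methods for community detection. However, when they process the high-dimensional matrices of networks, the communities they produce are often not accurate enough. This paper proposes CoDDA, a community detection algorithm based on deep sparse autoencoders, which aims to improve the accuracy of community detection when these classical methods process high-dimensional adjacency matrices. First, a hop-based processing method is proposed to optimize the sparse adjacency matrix. The resulting similarity matrix reflects the similarity not only between connected nodes in the network topology but also between unconnected nodes. Next, using unsupervised deep learning, a deep sparse autoencoder is built to extract features from the similarity matrix, yielding a low-dimensional feature matrix. Compared with the adjacency matrix, the feature matrix represents the network topology more expressively. Finally, k-means is applied to the low-dimensional feature matrix to obtain the community structure. Experimental results show that, compared with six typical community detection algorithms, CoDDA finds more accurate community structures. Parameter experiments further show that the communities found by CoDDA are more accurate than those found by basic k-means applied directly to the high-dimensional adjacency matrix.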
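The hop-based preprocessing step can be sketched as a breadth-first search from every node that assigns a similarity to each node reachable within `max_hops` hops. The abstract does not give the actual weighting, so the geometric decay `decay ** (hops - 1)` used here is an assumption, and so are the parameter names. The autoencoder and k-means stages are not shown. The key property is that unconnected nodes, such as 0 and 2 on a path, receive non-zero similarity.

```python
from collections import deque

def hop_similarity(adj, max_hops=3, decay=0.5):
    """Dense similarity matrix from an adjacency list using hop distances.

    Assumed weighting: a node at h hops (1 <= h <= max_hops) gets
    decay ** (h - 1); the diagonal and farther nodes stay 0.
    """
    n = len(adj)
    sim = [[0.0] * n for _ in range(n)]
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # BFS truncated at max_hops
            u = queue.popleft()
            if dist[u] == max_hops:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if d > 0:
                sim[src][v] = decay ** (d - 1)
    return sim
```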

10.
Data from the Advanced Very High Resolution Radiometer (AVHRR) have been used to detect fires in various ecosystems throughout the world. In this study, the most commonly used methods were applied to a time series of 63 AVHRR daytime images covering the whole of West Africa for the 1991–1992 dry season. The West African region includes ecosystems ranging from dry Sahelian grasslands to moist tropical forests, and these ecosystems show considerable seasonal variability. Existing methods were found to be inadequate for fire detection across the whole region because of the spatial and temporal heterogeneity of its environments. A number of changes were made to the established methods, and the new fire detection procedure was applied to the time series. Field verification and interpretation of the results in the context of the region's main ecological divisions showed that the new method gives good results for all ecosystems throughout the season. Finally, interpreting the fire counts with a Geographical Information System illustrates how such data can improve our knowledge of fire activity at national and sub-continental scales.

11.
This paper describes a new approach to detecting metamorphic computer viruses through the algebraic specification of an assembly language. Metamorphic computer viruses apply a variety of syntax-mutating, behaviour-preserving metamorphoses to their code to defend themselves against detection methods based on static analysis. An overview of these metamorphoses is given. Then, to identify behaviourally equivalent instruction sequences, the syntax and semantics of a subset of the IA-32 assembly language instruction set are specified formally in OBJ, an algebraic specification formalism and theorem prover based on order-sorted equational logic. Equivalence and semi-equivalence are defined formally, and a means of proving equivalence from semi-equivalence is given. The OBJ specification is shown to be useful for proving the equivalence or semi-equivalence of IA-32 instruction sequences by applying reductions (sequences of equational rewrites in OBJ). These proof methods are then applied to fragments of two different metamorphic computer viruses, Win95/Bistro and Win9x.Zmorph.A, to prove their (semi-)equivalence. Finally, the application of these methods to detecting metamorphic computer viruses in general is discussed.
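The idea of proving two instruction sequences equivalent by rewriting them to a common normal form can be illustrated with a toy Python rewriter. It is not OBJ and not the paper's IA-32 specification. The rewrite rules are hypothetical simplifications: they delete `nop`, `mov r, r` and `add r, 0` (ignoring the effect of `add` on CPU flags), and turn `push x; pop y` into `mov y, x` (ignoring the stale stack slot).

```python
def is_noop(op):
    # nop; mov r, r; add r, 0 (the last ignores its effect on CPU flags)
    return (op[0] == "nop"
            or (op[0] == "mov" and op[1] == op[2])
            or (op[0] == "add" and op[2] == "0"))

def normalize(seq):
    """Apply the toy rewrite rules until a fixed point is reached."""
    seq = list(seq)
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(seq):
            op = seq[i]
            if is_noop(op):
                changed, i = True, i + 1
            elif op[0] == "push" and i + 1 < len(seq) and seq[i + 1][0] == "pop":
                # push x; pop y  ==>  mov y, x (ignoring the stale stack slot)
                out.append(("mov", seq[i + 1][1], op[1]))
                changed, i = True, i + 2
            else:
                out.append(op)
                i += 1
        seq = out
    return seq

def equivalent(a, b):
    """Equivalent (under the toy rules) if both reduce to the same normal form."""
    return normalize(a) == normalize(b)
```

Every rule shortens the sequence, so rewriting always terminates.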

12.
Erythropoietin (Epo) is a hormone that can be misused as a doping substance. Its detection involves analysing images containing specific objects (bands) whose position and intensity are critical for establishing doping positivity. Within a research project of the World Anti-Doping Agency (WADA), we are implementing the GASepo software, which is used for Epo testing in doping control laboratories worldwide. To identify the bands, we developed a segmentation procedure based on a sequence of filters. Although all true bands are properly segmented, the procedure generates a number of false positives (artefacts). To separate these artefacts, we proposed a post-segmentation supervised classification using real-valued geometrical measures of the objects. The method is based on a fuzzy modification of Ross Quinlan's ID3 method, included in the mlf™ software (Machine Learning Framework), which generates fuzzy decision trees as well as fuzzy sets for the input data. The training set of segmented objects used initially has been replaced by a new one, prepared with more accurate expert assessment using the latest release of the GASepo software. New fuzzy decision trees (FDTs) have been generated with five and with nine fuzzy sets. A comparison of results on a testing set of segmented objects shows that classification based on the new FDTs outperforms the other classification methods.
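To show what "fuzzy ID3" changes relative to crisp ID3, here is a sketch of fuzzy information gain. Each sample carries a membership degree in the current node instead of a 0/1 indicator, and a split distributes that degree over triangular fuzzy sets using the product t-norm. This is a generic textbook-style formulation, not the mlf™ implementation. Triangular memberships and the product t-norm are assumptions.

```python
import math

def triangular(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_entropy(memberships, labels):
    """Class entropy where each sample counts with its membership degree."""
    total = sum(memberships)
    ent = 0.0
    for cls in set(labels):
        p = sum(m for m, y in zip(memberships, labels) if y == cls) / total
        if p > 0:
            ent -= p * math.log2(p)
    return ent

def fuzzy_gain(values, labels, fuzzy_sets, node_membership=None):
    """Information gain of splitting one attribute into the given fuzzy sets."""
    if node_membership is None:
        node_membership = [1.0] * len(values)
    total = sum(node_membership)
    remainder = 0.0
    for a, b, c in fuzzy_sets:
        # Product t-norm: degree in child = degree in node * degree in fuzzy set.
        child = [m * triangular(v, a, b, c) for m, v in zip(node_membership, values)]
        weight = sum(child)
        if weight > 0:
            remainder += weight / total * fuzzy_entropy(child, labels)
    return fuzzy_entropy(node_membership, labels) - remainder
```

A geometric measure whose "low" and "high" fuzzy sets cleanly separate bands from artefacts yields the maximal gain of 1 bit for two balanced classes.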

13.
Many machine learning problems in natural language processing, transaction-log analysis, or computational biology require the analysis of variable-length sequences or, more generally, of distributions of variable-length sequences. Kernel methods introduced for fixed-size vectors have proven very successful in a variety of machine learning tasks. We recently introduced a new and general kernel framework, rational kernels, to extend these methods to the analysis of variable-length sequences or, more generally, distributions given by weighted automata. These kernels are efficient to compute and have been used successfully in applications such as spoken-dialog classification with Support Vector Machines. However, the rational kernels previously introduced in these applications do not fully encompass distributions over alternate sequences: they are based only on the counts of co-occurring subsequences averaged over the alternate paths, without taking into account information about the higher-order moments of the distributions of these counts. In this paper, we introduce a new family of rational kernels, moment kernels, that precisely exploits this additional information. These kernels are distribution kernels based on moments of counts of strings. We describe efficient algorithms to compute moment kernels and apply them to several difficult spoken-dialog classification tasks. Our experiments show that using the second moment of the counts of n-gram sequences consistently improves classification accuracy in these tasks. Editors: Dan Roth and Pascale Fung
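What a moment kernel measures can be shown by a brute-force sketch. The distribution over alternate sequences is given as an explicit list of (token sequence, probability) pairs instead of a weighted automaton. Each distribution is mapped to the vector of expected n-gram counts raised to the powers 1..order, and two distributions are compared by an inner product. This enumeration is exponential in general and is not the paper's efficient automata-based algorithm; the function names are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def moment_features(distribution, n=2, order=2):
    """Map [(tokens, prob), ...] to {(ngram, k): E[count(ngram) ** k]}."""
    feats = Counter()
    for tokens, p in distribution:
        for gram, c in ngram_counts(tokens, n).items():
            for k in range(1, order + 1):
                feats[(gram, k)] += p * c ** k
    return feats

def moment_kernel(d1, d2, n=2, order=2):
    """Inner product of the moment feature vectors of two distributions."""
    f1, f2 = moment_features(d1, n, order), moment_features(d2, n, order)
    return sum(v * f2[key] for key, v in f1.items() if key in f2)
```

With `order=1` this reduces to comparing expected counts only, which corresponds to the earlier averaged-count kernels described above. `order=2` adds the second moments that the abstract reports as helpful.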

14.
N-gram analysis for computer virus detection   Times cited: 2 (self-citations: 0; by others: 2)
Generic computer virus detection is urgently needed, because most commercial antivirus software fails to detect unknown and new viruses. Motivated by the success of data mining and machine learning techniques in intrusion detection systems, recent research on detecting malicious executables has focused on efficient non-signature-based techniques that can profile program characteristics from a set of training examples. Byte sequences and byte n-grams are considered the basis for feature extraction. Because the number of n-grams is very large, several feature selection methods have been proposed in the literature. A recent report on information-gain-based feature selection achieved the best-known result in separating malicious executables from benign ones. We observe that information gain models the presence of an n-gram in one class and its absence in the other, and we show with a simple example that this may lead to erroneous results. In this paper, we describe a new feature selection measure, the class-wise document frequency of byte n-grams, and demonstrate empirically that it is a better feature selection method. For detection, instead of relying on a single classifier, we combine several classifiers using Dempster-Shafer theory to improve classification accuracy. Our experimental results show that this scheme detects virus programs far more efficiently than previously known methods.
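Both ingredients can be sketched briefly. The first is class-wise document frequency: for each class, count in how many files of that class each byte n-gram occurs, then keep the top-k n-grams per class. The second is Dempster's rule of combination for fusing classifier outputs expressed as mass functions over {virus, benign}. The top-k-per-class selection policy and the function names are assumptions, not details taken from the paper.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Distinct byte n-grams of one file (a set, so each file counts once)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def classwise_df(samples, n=4):
    """{label: Counter(ngram -> number of files of that class containing it)}."""
    df = {}
    for data, label in samples:
        df.setdefault(label, Counter()).update(byte_ngrams(data, n))
    return df

def select_features(samples, n=4, k=10):
    """Union of the k most document-frequent n-grams of each class."""
    selected = set()
    for counts in classwise_df(samples, n).values():
        selected.update(gram for gram, _ in counts.most_common(k))
    return selected

def dempster_combine(m1, m2):
    """Dempster's rule for mass functions keyed by frozenset focal elements."""
    combined, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y          # mass assigned to contradictory pairs
    return {s: v / (1.0 - conflict) for s, v in combined.items()}
```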

15.
Mining periodic patterns in time-series databases is an interesting data mining problem that can serve as a tool for forecasting and predicting the future behavior of time-series data. Incremental mining refers to maintaining the discovered patterns over time as more items are added to the database. Because time-series data are mostly updated by appending, incremental mining can be very effective and efficient. Several algorithms for incremental mining of partial periodic patterns in time-series databases are proposed and analyzed empirically. The new algorithms allow online adaptation of the thresholds, enabling interactive mining of partial periodic patterns. The storage overhead of the incremental online mining algorithms is analyzed. Results show that the storage overhead for the intermediate data structures pays off, as incremental online mining of partial periodic patterns proves significantly more efficient than the non-incremental, non-online versions. Moreover, a new problem, termed merge mining, is introduced as a generalization of incremental mining: merging the discovered patterns of two or more databases that are mined independently of each other. An algorithm for merge mining of partial periodic patterns in time-series databases is proposed and analyzed.
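Why keeping intermediate counts enables incremental, interactive and merge mining can be sketched for the simplest case, 1-patterns (an offset within the period plus a symbol). The counter stores raw counts, so newly appended data only update counts, the confidence threshold can be changed at query time, and counters from independently mined databases can be summed. This is a simplification, assuming a fixed known period and ignoring multi-position patterns; the class is hypothetical.

```python
from collections import Counter

class PeriodicCounter:
    """Incrementally maintained (offset, symbol) counts for a fixed period."""

    def __init__(self, period):
        self.period = period
        self.counts = Counter()   # (offset within period, symbol) -> occurrences
        self.segments = 0         # number of complete period segments seen
        self._buffer = []         # trailing symbols not yet forming a full segment

    def append(self, symbols):
        """Add newly arrived symbols; only complete segments update the counts."""
        self._buffer.extend(symbols)
        while len(self._buffer) >= self.period:
            segment = self._buffer[:self.period]
            self._buffer = self._buffer[self.period:]
            self.segments += 1
            self.counts.update(enumerate(segment))

    def frequent_1_patterns(self, min_conf):
        """(offset, symbol) pairs whose confidence meets `min_conf`.

        The threshold can change between calls because raw counts are kept.
        """
        if self.segments == 0:
            return {}
        return {key: c / self.segments for key, c in self.counts.items()
                if c / self.segments >= min_conf}

    def merge(self, other):
        """Combine counts mined independently from two databases (same period)."""
        if self.period != other.period:
            raise ValueError("periods differ")
        merged = PeriodicCounter(self.period)
        merged.counts = self.counts + other.counts
        merged.segments = self.segments + other.segments
        return merged
```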

16.
M. Abundo, L. Accardi, A. Auricchio. Calcolo, 1992, 29(3-4): 213-240
A method for generating pseudo-random sequences of d-dimensional vectors is considered. It is based on the ergodic theory of periodic orbits, in the sense of [2], for unstable dynamical systems such as the hyperbolic automorphisms of the d-dimensional torus. Because these systems have strong chaotic properties, their orbits are both dense and chaotic in some sense; however, the ergodic property holds only for orbits whose initial points have irrational coordinates, the remaining orbits being periodic. Unfortunately, these periodic orbits are the only ones a computer can generate. Since a pseudo-random sequence in [0,1]^d is a long periodic orbit whose chaotic behaviour resembles, in some sense, that of aperiodic orbits, in this note we prove lower and upper bounds for the period length of orbits of the hyperbolic automorphisms of the d-dimensional torus, expressed in terms of the (rational) starting point. The proposed algorithms are free of computational error because they work in integer arithmetic. Surprisingly, eliminating round-off errors increases the period length. Statistical testing and the problem of estimating the discrepancy of the obtained sequences are also treated.
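The integer-arithmetic idea can be sketched in two dimensions. A rational point p/N on the torus is tracked through its integer numerators, so iterating x → A·x (mod 1) becomes exact arithmetic mod N and the orbit's period can be measured with no round-off. The matrix `((2, 1), (1, 1))` (Arnold's cat map, a hyperbolic toral automorphism) is used only as an example. The code checks that the determinant is ±1, which guarantees the orbit returns to its start, but it does not check hyperbolicity.

```python
def _check_unimodular(matrix):
    (a, b), (c, d) = matrix
    if abs(a * d - b * c) != 1:
        raise ValueError("matrix must have determinant +1 or -1")

def orbit_period(matrix, point, N):
    """Exact period of the orbit of point/N under x -> A x (mod 1)."""
    _check_unimodular(matrix)          # invertible mod N, so the orbit is a cycle
    (a, b), (c, d) = matrix
    start = (point[0] % N, point[1] % N)
    x, y = start
    steps = 0
    while True:
        x, y = (a * x + b * y) % N, (c * x + d * y) % N
        steps += 1
        if (x, y) == start:
            return steps

def torus_points(matrix, point, N, count):
    """Yield `count` successive orbit points as floats in [0, 1)^2."""
    _check_unimodular(matrix)
    (a, b), (c, d) = matrix
    x, y = point[0] % N, point[1] % N
    for _ in range(count):
        yield x / N, y / N             # only this division is floating point
        x, y = (a * x + b * y) % N, (c * x + d * y) % N
```

For the cat map, the orbit of (1, 0)/2 has period 3 and that of (1, 0)/3 has period 4. Useful generators need N large enough to give long periods.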

17.
To improve the efficiency and accuracy of previous Obrechkoff methods, this paper puts forward a new kind of P-stable three-step Obrechkoff method of O(h^10) for periodic initial-value problems. Using a new structure and an embedded, highly accurate first-order derivative formula, we avoid time-consuming iterative calculation of the high-order derivatives. Using a new trigonometric-fitting scheme, we make both the main structure and the first-order derivative formula P-stable. We apply the new method to three periodic problems and compare it with three previous Obrechkoff methods. Numerical results demonstrate that the new method is superior to the previous ones in accuracy, efficiency and stability.

18.
Estimating average precision when judgments are incomplete   Times cited: 2 (self-citations: 1; by others: 1)
We consider the problem of evaluating retrieval systems with incomplete relevance judgments. Recently, Buckley and Voorhees showed that standard measures of retrieval performance are not robust to incomplete judgments, and they proposed a new measure, bpref, that is much more robust to incomplete judgments. Although bpref is highly correlated with average precision when the judgments are effectively complete, its value deviates from average precision, and from its own complete-judgment value, as the judgment set degrades, especially at very low levels of assessment. In this work, we propose three new evaluation measures (induced AP, subcollection AP, and inferred AP) that are equivalent to average precision when the relevance judgments are complete and that are statistical estimates of average precision when the relevance judgments are a random subset of the complete judgments. We consider natural scenarios that yield highly incomplete judgments, such as random judgment sets or very shallow depth pools, and compare the robustness of the three proposed measures with that of bpref in both scenarios. Using TREC data, we demonstrate that these measures are more robust to incomplete relevance judgments than bpref, both in how well they estimate average precision and in how well they estimate themselves (in each case as measured with complete relevance judgments). Finally, since inferred AP is the most accurate approximation to average precision and the most robust measure in the presence of incomplete judgments, we provide a detailed analysis of this measure, covering both its theoretical behavior and its practical implementation. We gratefully acknowledge the support provided by NSF grants CCF-0418390 and IIS-0534482.
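Average precision, and an estimator in the spirit of inferred AP, can be sketched as follows. Rankings use 1 for judged relevant, 0 for judged non-relevant, and `None` for unjudged. For each judged-relevant document at rank k, the estimator takes the expected precision at k to be 1/k plus ((k-1)/k) times the smoothed relevant fraction among the judged documents above k. This is a simplified version of the idea, not the paper's exact infAP formula: as an assumption, every document above rank k is treated as eligible for judging, and `num_relevant` is taken as given. With complete judgments it reproduces AP.

```python
def average_precision(ranking, num_relevant):
    """Standard AP; unjudged (None) documents are treated as non-relevant."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranking, start=1):
        if rel == 1:
            hits += 1
            total += hits / k          # precision at each relevant rank
    return total / num_relevant

def sampled_ap_estimate(ranking, num_relevant, eps=1e-5):
    """Simplified inferred-AP-style estimate from a random subset of judgments."""
    total = 0.0
    rel_above = judged_above = 0
    for k, rel in enumerate(ranking, start=1):
        if rel == 1:
            prec = 1.0 / k             # the relevant document itself
            if k > 1:
                # Estimated relevant fraction of the k-1 documents above rank k.
                prec += (k - 1) / k * (rel_above + eps) / (judged_above + 2 * eps)
            total += prec
        if rel is not None:
            judged_above += 1
            rel_above += rel
    return total / num_relevant
```

On the complete ranking `[1, 0, 1, 0]` with two relevant documents, both functions give 5/6.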

19.
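To address anomaly detection in time-series data with imbalanced class distributions, a detection method based on deep convolutional neural networks is proposed. First, a sampling method is used to preprocess the imbalanced time-series data. Second, the processed time series are converted into segments of consistent scale and duration. Finally, the data are fed into a convolutional neural network model with four hidden layers for anomaly detection. Experimental results show that the proposed method overcomes a shortcoming of existing detection techniques, whose accuracy on the minority class is low because they ignore the skewness of the data distribution. Comparison with existing time-series classification methods verifies the efficiency of the proposed method.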
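The first two preprocessing steps can be sketched with NumPy. The abstract does not name the sampling method or the normalisation, so as assumptions we use random oversampling of minority classes and per-window z-normalisation to obtain segments of consistent scale and duration. The four-hidden-layer CNN is not shown, and the function names are hypothetical.

```python
import numpy as np

def segment(series, length, step):
    """Cut a 1-D series into fixed-length windows and z-normalise each one."""
    windows = np.lib.stride_tricks.sliding_window_view(series, length)[::step]
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return (windows - mean) / np.where(std == 0, 1.0, std)   # avoid divide-by-zero

def random_oversample(X, y, seed=0):
    """Resample every class with replacement up to the majority-class size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    per_class = [np.flatnonzero(y == c) for c in classes]
    idx = np.concatenate([rng.choice(ix, target, replace=True) if len(ix) < target else ix
                          for ix in per_class])
    return X[idx], y[idx]
```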

20.
Recordings of spontaneous activity of in vitro neuronal networks reveal various phenomena on different time scales, including synchronized firing of neurons, bursting events at both the cell and the network level, and hierarchies of bursting events. These findings suggest that the networks' natural dynamics are self-regulated to facilitate different processes on intervals spanning orders of magnitude, from fractions of a second to hours. These unique structures in the recorded time series raise questions about the diversity of the basic elements of the sequences, the information storage capacity of a network, and the means of implementing calculations. Because of the complex temporal nature of the recordings, these dynamics are best characterized and quantified on the time–frequency plane. We therefore introduce a time-series analysis of neuronal networks' synchronized bursting events that applies the wavelet packet decomposition based on the Haar mother wavelet. We use algorithms for optimal tiling of the time–frequency plane to highlight the local and global variations within the sequence. New quantifying observables of regularity and complexity are identified based on both the homogeneity and the diversity of the tiling (Hulata et al., 2004, Physical Review Letters 92: 198181–198104). These observables are demonstrated by exploring the regularity–complexity plane to fulfill the accepted criteria of Effective Complexity, which still lack an operational definition. The question of the sequences' information capacity is addressed by applying our observables to recorded sequences, scrambled sequences, artificial sequences produced with similar long-range statistical distributions, and the outputs of neuronal models devised to simulate the networks' unique dynamics.
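Optimal tiling of the time–frequency plane with Haar wavelet packets can be sketched as a best-basis search in the style of Coifman and Wickerhauser. Each packet node is split into Haar approximation and detail halves, and a split is kept only when the children's total additive cost (here the unnormalised Shannon entropy −Σ c² log c²) is lower than the parent's. The selected leaves form the tiling. Using this cost function is an assumption; the regularity and complexity observables of Hulata et al. are not reproduced.

```python
import numpy as np

def haar_step(c):
    """One orthonormal Haar split into approximation and detail coefficients."""
    c = np.asarray(c, float)
    return (c[0::2] + c[1::2]) / np.sqrt(2), (c[0::2] - c[1::2]) / np.sqrt(2)

def shannon_cost(c):
    """Additive entropy cost -sum(c^2 * log(c^2)), skipping zero coefficients."""
    e = np.asarray(c, float) ** 2
    e = e[e > 0]
    return float(-np.sum(e * np.log(e)))

def best_basis(x, max_level):
    """Return (cost, [(level, node_index, coeffs), ...]) of the min-cost packet basis."""
    def search(c, level, index):
        own = shannon_cost(c)
        if level == max_level or len(c) < 2 or len(c) % 2:
            return own, [(level, index, c)]
        a, d = haar_step(c)
        ca, la = search(a, level + 1, 2 * index)      # low-frequency child
        cd, ld = search(d, level + 1, 2 * index + 1)  # high-frequency child
        if ca + cd < own:
            return ca + cd, la + ld                   # splitting is cheaper
        return own, [(level, index, c)]
    return search(np.asarray(x, float), 0, 0)
```

For a constant signal of length 8, the search concentrates all energy in the deepest approximation node and keeps the zero detail nodes unsplit.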


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号 (Beijing ICP registration No. 09084417)