Similar literature
20 similar documents found
1.
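To address certain shortcomings of the SAX method, a SAX-based [8] VSB (vectorized symbol) method is proposed. By introducing two extreme-value components, the maximum and the minimum, the original SAX symbol is converted into a symbol vector with three components, and the VSB symbol value is ultimately determined by the weighted sum of the components. Because VSB provides more descriptive information about time series data than SAX, it can achieve more accurate results than SAX in time series analysis. Extensive experiments also confirm its strong performance.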

2.
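A Time Series Symbolization Algorithm Based on Statistical Features   Cited by: 9 (self-citations: 0, citations by others: 9)
To overcome the incomplete description of time series information by the SAX (symbolic aggregate approximation) algorithm, a time series symbolization algorithm based on statistical features is proposed. Unlike SAX, the algorithm treats each time series symbol as a vector whose components are the mean and the variance of the corresponding subsegment, describing its average value and its degree of dispersion, respectively. Because the algorithm provides more descriptive information than SAX, it can achieve more accurate results than SAX in time series data mining applications. Extensive experiments also confirm its strong performance.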
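
As a rough illustration of why a per-segment variance adds information, the sketch below computes a (mean, variance) pair for each equal-length segment. The function name `segment_features` and the equal-length segmentation are assumptions made for this example; the abstract does not say how the pairs are turned into symbols, so that step is left out.

```python
from statistics import mean, pvariance

def segment_features(series, segments):
    """Return one (mean, variance) pair per equal-length segment.

    Assumption for this sketch: len(series) is divisible by `segments`.
    """
    size = len(series) // segments
    features = []
    for k in range(segments):
        chunk = series[k * size:(k + 1) * size]
        # Population variance measures how far the segment spreads around its mean.
        features.append((mean(chunk), pvariance(chunk)))
    return features

flat = [1, 1, 5, 5]      # constant within each segment
spread = [0, 2, 4, 6]    # same segment means, but values vary inside each segment
print(segment_features(flat, 2))
print(segment_features(spread, 2))
```

Both series have segment means 1 and 5, so a mean-only representation treats them as the same, while the variance component separates them.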

3.
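The DTW (Dynamic Time Warping) algorithm is widely used for aligning sequence data to measure the distance between sequences, but its high time complexity limits its use on long sequences. A sequence similarity alignment algorithm based on an adaptive search window (ADTW) is proposed. The algorithm uses the Piecewise Aggregate Approximation (PAA) strategy to downsample the sequences into low-resolution sequences, computes the alignment path on the low-resolution sequences, predicts the path deviation from the gradient changes in the low-resolution distance matrix, and limits how far the path search window is expanded. The algorithm then gradually increases the sequence resolution, corrects the path within the search window, and computes a new search window, ultimately yielding a fast computation of the DTW distance and the alignment path. Compared with FastDTW, ADTW improves computational efficiency by about 20% at the same measurement accuracy, and its time complexity is O(n).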
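
For readers unfamiliar with windowed DTW, here is a minimal sketch of classic DTW restricted to a fixed band around the diagonal (a Sakoe-Chiba style window). This is not the ADTW algorithm: the adaptive, multi-resolution window described above is not reproduced, and the function name `dtw` and the `window` parameter are assumptions for illustration only.

```python
import math

def dtw(a, b, window=None):
    """DTW distance with absolute-difference cost and an optional fixed band."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within the band around the diagonal are filled in.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))            # unconstrained
print(dtw([1, 2, 3], [1, 2, 2, 3], window=1))  # banded
```

Narrowing the window reduces the number of cells that are filled in, which is the source of the speed-up that window-based methods aim for; the price is that a band that is too narrow can miss the optimal path.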

4.
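SAX (symbolic aggregate approximation) is a symbolic similarity measure for time series. When segmenting a time series, it uses the equal-length mean segmentation of the PAA algorithm, but the equal-division points cannot effectively describe changes in the shape of the series. As a result, when the means of corresponding segments are similar, SAX cannot effectively distinguish how similar the series are. Building on SAX, an improved SAX algorithm based on key points (KP_SAX) is proposed, whose similarity measure describes both the statistical regularity of the numerical changes in a time series and the changes in its shape. Experimental results show that, although KP_SAX somewhat increases the complexity of the algorithm, it can effectively compute the similarity distance between series in cases where SAX cannot, thereby achieving the intended improvement.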
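
The limitation described above can be seen in a minimal sketch of plain SAX (z-normalization, PAA, then discretization with equiprobable Gaussian breakpoints). The helper names `znorm`, `paa`, and `sax`, the four-letter alphabet, and the requirement that the series length be divisible by the number of segments are assumptions for this example; KP_SAX itself is not implemented here.

```python
from statistics import NormalDist, mean, pstdev

def znorm(series):
    """Scale a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    size = len(series) // segments
    return [mean(series[k * size:(k + 1) * size]) for k in range(segments)]

def sax(series, segments, alphabet="abcd"):
    """Map each PAA mean to a letter using equiprobable Gaussian breakpoints."""
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    word = ""
    for m in paa(znorm(series), segments):
        index = sum(1 for b in breakpoints if m > b)  # breakpoints below m
        word += alphabet[index]
    return word

rising = [0, 2, 4, 6]   # rises inside every segment
zigzag = [2, 0, 6, 4]   # falls inside every segment, same segment means
print(sax(rising, 2), sax(zigzag, 2))
```

The two series move in opposite directions within each segment, yet they receive the same SAX word because only segment means are kept.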

5.

In this paper a new exact string-matching algorithm with sub-linear average case complexity is presented. Unlike other sub-linear string-matching algorithms, it never performs more than n text character comparisons while working on a text of length n. It requires only O(m + σ) extra pre-processing time and space, where m is the length of the pattern and σ is the size of the alphabet.

6.
Time-series discord is widely used in data mining applications to characterize anomalous subsequences in time series. Compared to some other discord search algorithms, the direct search algorithm based on the recurrence plot shows the advantage of being fast and parameter free. The direct search algorithm, however, relies on quasi-periodicity in input time series, an assumption that limits the algorithm’s applicability. In this paper, we eliminate the periodicity assumption from the direct search algorithm by proposing a reference function for subsequences and a new sampling strategy based on the reference function. These measures result in a new algorithm with improved efficiency and robustness, as evidenced by our empirical evaluation.

7.
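Yang Hui, Meng Fanxing. Journal of Computer Applications (《计算机应用》), 2012, 32(5): 1484-1487
Quick access recorder (QAR) data are time series with a very complex structure and a large volume. When the traditional symbolic aggregate approximation (SAX) algorithm is applied directly to describe, store, and retrieve QAR data, it cannot cope with amplitude scaling and time-axis drift in the time series. An improved symbolic aggregate approximation algorithm is proposed: the QAR data are divided into three phases (takeoff, cruise, and landing), the improved algorithm is used to pad the cruise phase, and an effective similarity search is performed over fault-model sequences of different lengths. Experiments and its application in an aircraft fault diagnosis project demonstrate its feasibility and effectiveness, thereby improving the efficiency of aircraft troubleshooting.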

8.
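By measuring anomalies in multivariate spatio-temporal time series, anomalous portions can be detected in large volumes of spatio-temporal event data. In contrast to the techniques used for detecting isolated anomalous data points, an unbiased KL divergence algorithm (UKLD) is proposed. First, anomalous intervals in spatio-temporal time series are defined. After time-delay embedding, Gaussian distributions are used to estimate the distributions of the detection interval and the remaining interval, with cumulative sums used to speed up the estimation of the Gaussian parameters. Finally, the unbiased KL divergence is used to compute the level of difference between the intervals, and this level is taken as the anomaly score of the detection interval, yielding the spatio-temporal anomalous intervals. Simulation results show that, compared with the HOT SAX and RKDE algorithms, UKLD achieves better accuracy and detects anomalous intervals in spatio-temporal data more effectively.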

9.
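Feature representation of time series data is a key technique for time series data mining tasks, and symbolic aggregate approximation (SAX) is one of the more commonly used representations. To address the weakness that SAX cannot distinguish the similarity between time series when the symbols of all their segments are identical, a symbolic aggregate approximation method for time series based on the start-end distance (SAX_SM) is proposed. Because time series exhibit strong shape trends, the method uses the start point and end point of each segment to represent its shape feature, and uses the shape features together with the representation symbols of the segments to approximate the time series data, mapping it from a high-dimensional space to a low-dimensional one. A start-end distance is then constructed from the start and end points to compute the shape distance between two segments. Finally, the start-end distance and the symbol distance are combined to define a new distance measure that measures the similarity between time series more objectively. Theoretical analysis shows that this distance measure satisfies the lower-bounding theorem. Experiments on 20 UCR time series datasets show that SAX_SM achieves the highest classification accuracy on 13 datasets (ties included), whereas SAX does so on only 6 (ties included); SAX_SM therefore classifies better than SAX.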

10.
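Abstract: To address the high computational cost of finding discords in streaming time series with traditional algorithms, an algorithm for quickly discovering the Top-K discords is proposed. The algorithm retains computed results in a special data structure, avoiding a large amount of repeated computation and thus reducing time complexity. It also applies a retention policy that keeps only useful results and promptly discards useless ones, thus reducing space complexity. The algorithm was tested on random data and real data, and the results show that it significantly reduces the amount of computation and thereby discovers the Top-K discords quickly. Keywords: streaming time series; discord; real-time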

11.
Silhouette-based human action recognition using SAX-Shapes   Cited by: 1 (self-citations: 0, citations by others: 1)
Human action recognition is an important problem in Computer Vision. Although most of the existing solutions provide good accuracy results, the methods are often overly complex and computationally expensive, hindering practical applications. In this regard, we introduce the combination of time-series representation for the silhouette and Symbolic Aggregate approXimation (SAX), which we refer to as SAX-Shapes, to address the problem of human action recognition. Given an action sequence, the extracted silhouettes of an actor from every frame are transformed into time series. Each of these time series is then efficiently converted into the symbolic vector: SAX. The set of all these SAX vectors (SAX-Shape) represents the action. We propose a rotation invariant distance function to be used by a random forest algorithm to perform the human action recognition. Requiring only silhouettes of actors, the proposed method is validated on two public datasets. It has an accuracy comparable to the related works and it performs well even in varying rotation.

12.
The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature.
Dragomir Yankov

13.
An algorithm for drawing a random sample of size M from a population of size N (M < N) is proposed. The algorithm has a time complexity of O(M log2 M) and a space complexity of O(M).
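
The abstract does not describe its algorithm, so the sketch below swaps in a different, well-known technique, Robert Floyd's sampling algorithm, purely to illustrate drawing M distinct items from N with O(M) extra space. The function name and the use of a Python `set` are assumptions for this example.

```python
import random

def sample_without_replacement(n, m, rng=random):
    """Return a uniformly random set of m distinct integers from range(n).

    Floyd's algorithm: one random draw per selected item, O(m) extra space.
    """
    chosen = set()
    for j in range(n - m, n):
        t = rng.randint(0, j)  # inclusive on both ends
        # If t was already picked, j itself is guaranteed to be new,
        # because no earlier iteration could have produced a value >= j.
        chosen.add(j if t in chosen else t)
    return chosen

picked = sample_without_replacement(1000, 10)
print(sorted(picked))
```

With a hash-based set, each of the M iterations takes expected constant time, so the whole draw runs in expected O(M) time.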

14.
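Liu Fen, Guo Gongde. Journal of Computer Applications (《计算机应用》), 2013, 33(1): 192-198
The key-point-based improved symbolic aggregate approximation (SAX) algorithm (KP_SAX) builds on SAX by using key points to measure point distances between time series, which lets it compute time series similarity more effectively. However, it does not adequately reflect the pattern information of a time series and still cannot measure time series similarity reasonably. To address the shortcomings of SAX and KP_SAX, a composite SAX-based similarity measure for time series is proposed. It combines two measures, point distance and pattern distance. First, key points are used to subdivide the equal-length segments of the piecewise aggregate approximation (PAA) into subsegments. Each subsegment is then represented by a triple containing both kinds of distance information. Finally, a composite distance formula is used to compute the similarity between time series, so that the result reflects the differences between time series more effectively. Experimental results show that the time efficiency of the improved method is only 0.96% lower than that of KP_SAX, while its ability to discriminate between time series is better than that of both KP_SAX and SAX.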

15.
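Mining Frequent Sub-patterns Based on the Pattern Representation of Time Series   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper proposes an algorithm (TSFSM) for mining frequent sub-patterns in time series based on the pattern representation of time series. The pattern representation itself compresses the data, preserves the basic shape of the time series, and has some noise-removal capability. Mining frequent sub-patterns on top of the pattern representation can greatly improve the efficiency and accuracy of mining. The algorithm also uses a pruning strategy that further reduces its time complexity. In addition, the algorithm is simple to compute and easy to implement, and it supports dynamically growing time series.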

16.
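To address the problems that time series motif discovery algorithms are computationally complex and cannot find motifs with multiple instances, a time series motif discovery algorithm based on subsequence full join and maximum clique (TSSJMC) is proposed. First, a fast time series subsequence full join algorithm is used to obtain the distances between all subsequences and generate a distance matrix. Next, a similarity threshold is set, the distance matrix is converted into an adjacency matrix, and a subsequence similarity graph is constructed. Finally, a maximum clique search algorithm is used to find the maximum clique in the similarity graph; the time series corresponding to the vertices of the maximum clique form the motif with the most instances. In experiments on public time series datasets, the existing Brute Force and Random Projection algorithms, which can find multi-instance motifs, are used as baselines, and TSSJMC is analyzed in terms of accuracy, efficiency, scalability, and robustness to obtain an objective assessment. The results show that, compared with Random Projection, TSSJMC has clear advantages in efficiency, scalability, and robustness; compared with Brute Force, TSSJMC finds slightly fewer motif instances, but its efficiency and scalability are better. TSSJMC is therefore an algorithm that balances quality and efficiency.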
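
The three steps above (pairwise distances, thresholding into a similarity graph, maximum clique search) can be sketched as follows. This is an illustration under stated assumptions, not the TSSJMC implementation: a brute-force Euclidean comparison replaces the fast subsequence full join, overlapping subsequences are skipped as trivial matches (an assumption of this sketch), and a simple branch-and-bound search stands in for whatever clique algorithm the authors use. All function names are hypothetical.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_graph(series, length, threshold):
    """Connect two subsequence start indices when their distance is small."""
    subs = [series[i:i + length] for i in range(len(series) - length + 1)]
    adj = {i: set() for i in range(len(subs))}
    for i in range(len(subs)):
        # Start at i + length so overlapping subsequences are never linked.
        for j in range(i + length, len(subs)):
            if euclid(subs[i], subs[j]) <= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def max_clique(adj):
    """Exhaustive branch-and-bound search for one maximum clique."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique[:]
        for v in sorted(candidates):
            # Prune: even taking every remaining candidate cannot beat best.
            if len(clique) + len(candidates) <= len(best):
                return
            expand(clique + [v], candidates & adj[v])
            candidates = candidates - {v}

    expand([], set(adj))
    return sorted(best)

series = [0, 1, 0, 5, 5, 5, 0, 1, 0, 9, 9, 9, 0, 1, 0]
graph = similarity_graph(series, length=3, threshold=0.1)
print(max_clique(graph))  # start indices of the motif instances
```

Every pair of vertices in a clique is within the threshold of each other, which is why the largest clique corresponds to the motif with the most mutually similar instances. The exhaustive search is exponential in the worst case, so it is only suitable for small graphs.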

17.
In statistical data mining and spatial statistics, many problems (such as detection and clustering) can be formulated as optimization problems whose objective functions are functions of consecutive subsequences. Some examples are (1) searching for a high activity region in a Bernoulli sequence, (2) estimating an underlying boxcar function in a time series, and (3) locating a high concentration area in a point process. A comprehensive search algorithm always ends up with a high order of computational complexity. For example, if a length-n sequence is considered, the total number of all possible consecutive subsequences is n(n+1)/2. A comprehensive search algorithm requires at least O(n^2) numerical operations.

We present a multiscale-approximation-based approach. It is shown that most of the time, this method finds the exact same solution as a comprehensive search algorithm does. The derived multiscale approximation methods (MAMEs) have low complexity: for a length-n sequence, the computational complexity of an MAME can be as low as O(n). Numerical simulations verify these improvements.

The MAME approach is particularly suitable for problems having large size data. One known drawback is that this method does not guarantee the exact optimal solution in every single run. However, simulations show that as long as the underlying subjects possess statistical significance, an MAME finds the optimal solution with probability almost equal to one.
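
To make the cost of the comprehensive search concrete, the sketch below enumerates every non-empty consecutive subsequence of a length-n sequence, n(n+1)/2 of them, and scores each in constant time using prefix sums. The objective function is not given in the abstract, so the `score` callback and the example objective (sum minus a baseline rate of 0.5 per element) are assumptions for illustration; the multiscale approximation itself is not shown.

```python
def best_interval(xs, score):
    """Return ((best_score, start, end), intervals_examined) by brute force.

    `score(total, length)` rates the half-open interval xs[start:end].
    """
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    best = None
    examined = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n + 1):
            examined += 1
            # prefix[j] - prefix[i] is the sum of xs[i:j], obtained in O(1).
            s = score(prefix[j] - prefix[i], j - i)
            if best is None or s > best[0]:
                best = (s, i, j)
    return best, examined

bits = [0, 0, 1, 1, 1, 0, 1, 0]
result, examined = best_interval(bits, lambda total, length: total - 0.5 * length)
print(result, examined)
```

For the 8-element example the loop examines 36 intervals, matching 8 × 9 / 2, and the quadratic growth of that count is what motivates an approximate method for long sequences.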


18.
In this paper, we study the knapsack sharing problem (KSP), a variant of the well-known NP-hard single knapsack problem. We propose an exact constructive tree search that combines two complementary procedures: a reduction interval search and a branch and bound. The reduction search has three phases. The first phase applies a polynomial reduction strategy that decomposes the problem into a series of knapsack problems. The second phase is a size reduction strategy that makes the resolution more efficient. The third phase is an interval reduction search that identifies a set of optimal capacities characterizing the knapsack problems. Experimental results provide computational evidence of the better performance of the proposed exact algorithm in comparison to KSP's best exact algorithm, to Cplex and to KSP's latest heuristic approach. Furthermore, they emphasize the importance of the reduction strategies.

19.
Dynamic Time Warping (DTW) is a popular method for measuring the similarity of time series. It is widely used in various domains. A major drawback of DTW is that it has a high computational complexity. To address this problem, pruning techniques to calculate the exact DTW distance, as well as DTW approximation methods, have become important approaches. In this paper, we introduce Blocked Dynamic Time Warping (BDTW), a new similarity measure which works on run-length encoded time series representation. BDTW utilizes any repetitive values (zero and nonzero) in time series to reduce DTW computation time. BDTW closely approximates DTW distance, and it is significantly faster than traditional DTW for time series with high levels of value repetition. Moreover, BDTW can be combined with time series representation methods which provide constant segments, to serve as a close approximation method even for the time series without value repetition. Constrained BDTW, BDTW upper bound and BDTW lower bound are discussed as variations of BDTW. BDTW upper bound and BDTW lower bound are presented as a new DTW upper bound and lower bound which can be efficiently applied on time series with high levels of value repetition for pruning unhopeful alignments and matches in the exact DTW calculation. We show the effectiveness of BDTW and its variations on different applications using the following datasets: Almanac of Minutely Power, Refit Smart Homes, as well as the 85 datasets from the University of California, Riverside time series classification archive (UCR archive).
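
The run-length encoded representation that BDTW operates on can be sketched in a few lines. Only the encoding step is shown; the BDTW distance and its bounds are not reproduced, and the function names are assumptions for this example.

```python
from itertools import groupby

def run_length_encode(series):
    """Collapse each run of equal consecutive values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(series)]

def run_length_decode(blocks):
    """Expand (value, count) pairs back into the original series."""
    return [value for value, count in blocks for _ in range(count)]

power = [0, 0, 0, 5, 5, 0, 0]
blocks = run_length_encode(power)
print(blocks)
```

A series with long runs of repeated values shrinks to a handful of blocks, which is why a distance defined over blocks can need far fewer cell computations than point-by-point DTW.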

20.
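Symbolic representation is an effective dimensionality reduction technique for time series, and its similarity measure underlies many mining tasks. The SAX (symbolic aggregate approximation)-based distance MINDIST_PAA_iSAX does not satisfy symmetry, which limits its use in time series mining. A symmetric measure, Sym_PAA_SAX, is proposed, and it lower-bounds the Euclidean distance. Experiments on real and synthetic datasets show that the lower bound is reasonably tight and that the false alarm rate in similarity search is low.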
