Similar literature
 Found 10 similar documents (search time: 112 ms)
1.
A technique for creating and searching a tree of patterns using relative distances is presented. The search is conducted to find patterns which are nearest neighbors of a given test pattern. The structure of the tree is such that the search time is proportional to the distance between the test pattern and its nearest neighbor, which suggests the anomalous possibility that a larger tree, which can be expected on average to contain closer neighbors, can be searched faster than a smaller tree. The technique has been used to recognize OCR digit samples derived from NIST data at an accuracy rate of 97% using a tree of 7,000 patterns.

2.
Towards a new approach for mining frequent itemsets on data stream   Total citations: 1, self-citations: 0, citations by others: 1
Mining frequent patterns on streaming data is a challenging new problem for the data mining community, since data arrives sequentially in the form of continuous rapid streams. In this paper we propose a new approach for mining itemsets. Our approach has the following advantages: an efficient representation of items and a novel data structure to maintain frequent patterns, coupled with a fast pruning strategy. At any time, users can issue requests for frequent itemsets over an arbitrary time interval. Furthermore, our approach produces an approximate answer with an assurance that it will not bypass user-defined frequency and temporal thresholds. Finally, the proposed method is analyzed by a series of experiments on different datasets.

3.
The embedding dimension and the number of nearest neighbors are very important parameters in the prediction of chaotic time series. To reduce the prediction errors and the uncertainties in determining these parameters, a new chaos Bayesian optimal prediction method (CBOPM) is proposed, which chooses optimal parameters in the local linear prediction method (LLPM) and improves the prediction accuracy with Bayesian theory. In the new method, the embedding dimension and the number of nearest neighbors are combined as a parameter set. The optimal parameters are selected by the mean relative error (MRE) and correlation coefficient (CC) indices according to optimization criteria. Real hydrological time series are used to evaluate the new method. The prediction results indicate that CBOPM can choose the optimal parameters adaptively in the prediction process. Compared with several LLPM models, CBOPM has higher prediction accuracy in predicting hydrological time series.
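
The idea of treating the embedding dimension and the number of nearest neighbors as one parameter set, scored by MRE, can be illustrated with a minimal sketch. This is not CBOPM: it omits the Bayesian step and the CC index, and it swaps the local linear predictor for a simpler local-constant one (the average of the neighbors' successors). The function names, the delay `tau`, and the sample series are all assumptions made for illustration.

```python
import math

def delay_embed(series, m, tau=1):
    """Return (t, [x[t], x[t-tau], ..., x[t-(m-1)*tau]]) for every valid index t."""
    start = (m - 1) * tau
    return [(t, [series[t - j * tau] for j in range(m)])
            for t in range(start, len(series))]

def knn_predict(history, m, k, tau=1):
    """Local-constant prediction (a simplification, not LLPM): average the
    successors of the k delay vectors closest to the most recent one."""
    vectors = delay_embed(history, m, tau)
    _, query = vectors[-1]
    candidates = vectors[:-1]  # every candidate has a known successor
    candidates.sort(key=lambda tv: math.dist(tv[1], query))
    nearest = candidates[:k]
    return sum(history[t + 1] for t, _ in nearest) / len(nearest)

def mean_relative_error(actual, predicted):
    """MRE; assumes no actual value is zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def select_parameters(series, n_test, m_values, k_values):
    """Grid search over (m, k); returns (best_mre, m, k) on the last n_test points."""
    best = None
    for m in m_values:
        for k in k_values:
            preds = [knn_predict(series[:i], m, k)
                     for i in range(len(series) - n_test, len(series))]
            err = mean_relative_error(series[-n_test:], preds)
            if best is None or err < best[0]:
                best = (err, m, k)
    return best

# Hypothetical periodic series: the value 2 is followed by 3 or by 1,
# so a one-dimensional embedding cannot disambiguate it.
series = [1, 2, 3, 2] * 10
best = select_parameters(series, n_test=8, m_values=[1, 2], k_values=[1, 2])
```

On this toy series the search prefers a two-dimensional embedding, because a single lagged value does not determine what comes next.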

4.
Case based time series prediction (CTSP) is a machine learning technique that predicts the future behavior of the current time series by referring to similar past cases. To reduce the cost of visual prostheses research, we investigate the predictive performance of CTSP in electrical evoked potential (EEP) prediction as an alternative to numerous biological experiments. The heart of CTSP for EEP prediction is a distance metric that measures the similarity between a training case and the target electrical stimulus. Because an EEP experimental case consists of stationary electrical stimulation values and time-varying elicited EEP values, this paper proposes a new distance metric, called biased time warp distance (BTWD), that combines the efficiency of point-to-point distances on stationary data with the capability of time series distances on temporal data. In the BTWD metric, the stimulation set difference (Diff_I) and the EEP sequence difference (Diff_II) are calculated separately, and a time-dependent bias configuration is added to reflect the different influences of Diff_I and Diff_II on the numerical computation of BTWD. A similarity-related adaptation coefficient summation is employed to yield the predicted EEP values at a given time point, following the k nearest neighbors principle. The proposed predictor using BTWD was empirically tested with data collected from electrophysiological EEP eliciting experiments. We statistically validated our results by comparing them with other predictors that use classical point-to-point distances and time series distances. The empirical results indicate that the proposed method produces superior performance in EEP prediction in terms of predictive accuracy and computational complexity.
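
The abstract does not give the BTWD formula, so the following is only a rough sketch of the general idea of mixing a point-to-point distance on stationary values with a warping distance on a sequence. It assumes Euclidean distance for the stimulation part, classic dynamic time warping for the EEP part, and a constant weight `bias` in place of the paper's time-dependent bias configuration. All names and the sample values are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def combined_distance(stim_a, eep_a, stim_b, eep_b, bias=0.5):
    """Weighted sum of a stationary-part and a temporal-part difference.

    `bias` is an assumed constant weight in [0, 1]; the paper's bias is
    time-dependent and its exact form is not stated in the abstract.
    """
    diff_stim = math.dist(stim_a, stim_b)   # point-to-point on stimulation values
    diff_eep = dtw_distance(eep_a, eep_b)   # warping distance on EEP sequences
    return bias * diff_stim + (1.0 - bias) * diff_eep

# Hypothetical cases: two stimulation settings and their elicited sequences.
d = combined_distance((0.0, 0.0), [1, 2, 3], (3.0, 4.0), [1, 2, 2, 3], bias=0.5)
```

A design note: keeping the two differences separate until the final weighted sum makes it easy to replace the constant weight with a function of time.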

5.
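Predicting the execution time of Spark batch applications is a key technique for guiding resource allocation and application balancing in Spark systems. However, existing studies apply a single prediction model to applications with different runtime characteristics, and the models consider few factors, which lowers prediction accuracy. To address these problems, a Spark batch application execution time prediction model that accounts for differences in application characteristics is proposed. The model classifies Spark batch application execution times based on strongly correlated indicators and, for each class of applications, uses the PCA and GBDT algorithms to predict execution time. When an ad hoc application arrives, its application class is determined and the corresponding prediction model is used to predict its execution time. Experimental results show that, compared with a unified prediction model, the proposed method reduces the root mean square error and the mean absolute percentage error of the predictions by 32.1% and 33.9% on average, respectively.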
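
For reference, the two error measures named in this abstract can be computed as in the short sketch below. The execution times are invented values in seconds, used only to show the arithmetic; they are not data from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical measured vs. predicted execution times (seconds).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE is expressed in the unit of the data and penalizes large misses more heavily, while MAPE is scale-free, which is why the two are often reported together.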

6.
Handling collisions among a large number of bodies can be a performance bottleneck in video games and many other real‐time applications. We present a new framework for detecting and resolving collisions using the penetration volume as an interpenetration measure. Given two non‐convex polyhedral bodies, a new sampling paradigm locates their near‐contact configurations in advance, and stores associated contact information in a compact database. At runtime, we retrieve a given configuration's nearest neighbors. By taking advantage of the penetration volume's continuity, cheap geometric methods can use the neighbors to estimate contact information as well as a translational gradient. This results in an extremely fast, geometry‐independent, and trivially parallelizable computation, which constitutes the first global volume‐based collision resolution. When processing multiple collisions simultaneously on a 4‐core processor, the average running cost is as low as 5 μs. Furthermore, no additional proximity or contact‐regions queries are required. These results are orders of magnitude faster than previous penetration volume approaches.

7.
In this paper, we present a fast and versatile algorithm which can rapidly perform a variety of nearest neighbor searches. Efficiency improvement is achieved by utilizing the distance lower bound to avoid the calculation of the distance itself if the lower bound is already larger than the global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, the lower bound of its distance to each sample point can be calculated by using the internal node of the LB-tree. To reduce the amount of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and the query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k-nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to the minimum distance of the nearest neighbor. Our experiments have shown that the proposed algorithm can save substantial computation, particularly when the distance of the query point to its nearest neighbor is relatively small compared with its distance to most other samples (which is the case for many object recognition problems).
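
The core pruning rule, skipping a full distance computation whenever a cheap lower bound already exceeds the best distance found so far, can be shown with a much simpler structure than the LB-tree. The sketch below uses a single pivot point and the triangle inequality, |d(q, p) - d(x, p)| <= d(q, x), as the lower bound. It does not build a tree or use the winner-update strategy, and the function name, pivot choice, and sample points are assumptions.

```python
import math

def nearest_with_lower_bound(query, samples, pivot):
    """Return (index, distance, number_of_full_distances_computed)."""
    # Preprocessing step: in practice this is computed once, offline.
    sample_to_pivot = [math.dist(s, pivot) for s in samples]
    query_to_pivot = math.dist(query, pivot)

    best_idx, best_dist, computed = -1, math.inf, 0
    for i, sample in enumerate(samples):
        # Triangle inequality gives a lower bound on d(query, sample).
        lower_bound = abs(query_to_pivot - sample_to_pivot[i])
        if lower_bound >= best_dist:
            continue  # cannot beat the current best; skip the full distance
        d = math.dist(query, sample)
        computed += 1
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, computed

# Hypothetical 2-D sample set and query.
samples = [(0, 0), (10, 0), (0, 10), (10, 10), (1, 1)]
idx, dist, computed = nearest_with_lower_bound((0.9, 0.9), samples, pivot=(0, 0))
```

As the abstract notes for the full algorithm, the savings are largest when the query is much closer to its nearest neighbor than to most other samples, since the running minimum then drops quickly and most lower bounds exceed it.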

8.
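Predicting communication traffic time series with the nearest neighbor algorithm   Total citations: 3, self-citations: 0, citations by others: 3
To regulate a communication system effectively, traffic must be predicted, yet traffic follows different patterns on different dates. This paper uses an instance-based nearest neighbor algorithm for time series prediction, with improvements in handling dynamic-length sequences, extracting sequence features, and selecting similar examples, and achieves good results. Applied to the Guangdong Telephone Network Intelligent Management System (GTNIMS), the nearest neighbor prediction algorithm provides fast and accurate traffic predictions for route solving, creating the conditions for more precise solutions.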

9.
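Parallel data stream clustering based on wavelet synopses   Total citations: 1, self-citations: 0, citations by others: 1
Many applications continuously generate large volumes of sequence data that evolve over time, forming time series data streams, for example in sensor networks, real-time stock quotes, and network and communication monitoring. Clustering is a powerful tool for analyzing such parallel multiple data streams. However, because data streams are unbounded in length, evolve over time, and carry large volumes of data, traditional clustering methods cannot be applied directly. Exploiting the forgetting property of data streams, the discrete wavelet transform is applied to maintain a synopsis structure for each data stream hierarchically and dynamically. Based on this synopsis structure, the approximate distance between a data stream and the cluster centers is computed quickly, yielding a K-means clustering method suited to parallel multiple data streams. The experiments conducted verify the effectiveness of the clustering method.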
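
Why a wavelet synopsis allows fast approximate distances can be seen in a small static sketch. It assumes an orthonormal Haar transform on a window whose length is a power of two and keeps only the first `k` (coarsest) coefficients. Because the transform preserves Euclidean distance, the distance between two truncated synopses never exceeds the true distance. The hierarchical, dynamically updated structure and the forgetting mechanism described in the abstract are not modeled, and all names and values are hypothetical.

```python
import math

def haar_transform(values):
    """Orthonormal Haar DWT; len(values) must be a power of two.
    Output order: overall average first, then details from coarse to fine."""
    coeffs = list(values)
    details_out = []
    while len(coeffs) > 1:
        half = len(coeffs) // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details_out = details + details_out  # finer details end up later
        coeffs = averages
    return coeffs + details_out

def synopsis(values, k):
    """Keep only the k coarsest Haar coefficients."""
    return haar_transform(values)[:k]

def nearest_center(stream_synopsis, center_synopses):
    """K-means assignment step using the approximate (synopsis) distance."""
    return min(range(len(center_synopses)),
               key=lambda c: math.dist(stream_synopsis, center_synopses[c]))

# Hypothetical windows from two streams.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 2, 4, 4, 6, 6, 8, 8]
true_distance = math.dist(x, y)
approx_distance = math.dist(synopsis(x, 2), synopsis(y, 2))
```

Here each distance is computed on 2 numbers instead of 8. The approximation is a lower bound, so it can underestimate the true distance when the streams differ mainly in fine detail.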

10.
A distance-preserving method is presented to map high-dimensional data sequentially to low-dimensional space. It preserves exact distances of each data point to its nearest neighbor and to some other near neighbors. Intrinsic dimensionality of data is estimated by examining the preservation of interpoint distances. The method has no user-selectable parameter. It can successfully project data when the data points are spread among multiple clusters. Results of experiments show its usefulness in projecting high-dimensional data.
