共查询到20条相似文献,搜索用时 31 毫秒
1.
《Data & Knowledge Engineering》2008,64(3):919-946
Repositories of unstructured data types, such as free text, images, audio and video, have been recently emerging in various fields. A general searching approach for such data types is that of similarity search, where the search is for similar objects and similarity is modeled by a metric distance function. In this article we propose a new dynamic paged and balanced access method for similarity search in metric data sets, named CM-tree (Clustered Metric tree). It fully supports dynamic capabilities of insertions and deletions both of single objects and in bulk. Distinctive from other methods, it is especially designed to achieve a structure of tight and low overlapping clusters via its primary construction algorithms (instead of post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting representative objects of nodes, clustering based node split algorithm and criteria for triggering a node split, and an improved sub-tree pruning method used during search. To facilitate these methods the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs. 相似文献
2.
Repositories of unstructured data types, such as free text, images, audio and video, have been recently emerging in various fields. A general searching approach for such data types is that of similarity search, where the search is for similar objects and similarity is modeled by a metric distance function. In this article we propose a new dynamic paged and balanced access method for similarity search in metric data sets, named CM-tree (Clustered Metric tree). It fully supports dynamic capabilities of insertions and deletions both of single objects and in bulk. Distinctive from other methods, it is especially designed to achieve a structure of tight and low overlapping clusters via its primary construction algorithms (instead of post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting representative objects of nodes, clustering based node split algorithm and criteria for triggering a node split, and an improved sub-tree pruning method used during search. To facilitate these methods the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs. 相似文献
3.
A new version of the RE–EM regression tree method for longitudinal and clustered data is presented. The RE–EM tree is a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. The RE–EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. The previously-suggested methodology used the CART tree algorithm for tree building, and therefore that RE–EM regression tree method inherits the tendency of CART to split on variables with more possible split points at the expense of those with fewer split points. A revised version of the RE–EM regression tree corrects for this bias by using the conditional inference tree as the underlying tree algorithm instead of CART. Simulation studies show that the new version is indeed unbiased, and has several improvements over the original RE–EM regression tree in terms of prediction accuracy and the ability to recover the correct tree structure. 相似文献
4.
A tree-drawing algorithm that addresses the weaknesses of current approaches to constructing graphical user interfaces is presented. Present algorithms either do not let you draw tree nodes of varying shapes and sizes or they draw such trees in a way that does not produce trees as compact as they could be, which is especially important when diagramming a large system. Also, they cannot reuse layout information when the trees changes, so after every change the layout must be recomputed and the tree redrawn. The main difference between these traditional approaches and the author's approach is that his algorithm is more geometric. Unlike other algorithms, it uses an explicit representation of node and subtree contours, and it stores every contour as a polygon. It has three advantages over traditional algorithms. It allows one to draw trees with nodes of any polygonal shape compactly. The data structure supports insert and delete operations on subtrees. It is simple to implement, yet flexible 相似文献
5.
We study the problem of maintaining a dynamic ordered tree succinctly under updates of the following form: insertion or deletion of a leaf, insertion of a node on an edge (edge subdivision) or deletion of a node with only one child (the child becomes a child of its former grandparent). We allow satellite data of a fixed size to be associated to the nodes of the tree.We support update operations in constant amortized time and support access to satellite data and basic navigation operations in worst-case constant time; the basic navigation operations include parent, first/last-child, previous/next-child. These operations are moving from a node to its parent, leftmost/rightmost child, and its previous and next child respectively.We demonstrate that to efficiently support more extended operations, such as determining the i-th child of a node, rank of a child among its siblings, or size of the subtree rooted at a node, one requires a restrictive pattern for update strategy, for which we propose the finger-update model. In this model, updates are performed at the location of a finger that is only allowed to crawl on the tree between a child and a parent or between consecutive siblings. Under this model, we describe how the named extended operations are performed in worst-case constant time.Previous work on dynamic succinct trees (Munro et al., 2001 [17]; Raman and Rao, 2003 [19]) is mainly restricted to binary trees and achieves poly-logarithmic (Munro et al., 2001 [17]) or “poly-log-log” (Raman and Rao, 2003 [19]) update time under a more restricted model, where updates are performed in traversals starting at the root and ending at the root and queries can be answered when the traversal is completed. A previous result on ordinal trees achieves only sublinear amortized update time and “poly-log-log” query time (Gupta et al., 2007 [11]). More recently, the update time has been improved to O(logn/loglogn) while queries can be performed in O(logn/loglogn) time (Sadakane and Navarro, 2010 [20]). 相似文献
6.
On-line construction of suffix trees 总被引:47,自引:0,他引:47
E. Ukkonen 《Algorithmica》1995,14(3):249-260
An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffixtries. Regardless of its quadratic worst case this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).This research was supported by the Academy of Finland and by the Alexander von Humboldt Foundation (Germany). 相似文献
7.
《Journal of Systems Architecture》2000,46(10):951-954
We present an algorithm for dynamic reconfiguration from a set of processor nodes connected using a multistage interconnection network into a set of m-ary trees of height h. The algorithm allows parameterization based on the branching factor m, the height of the tree h and bias B and produces a set of isomorphic trees for each value of the bias B. The computation of the identities of the neighbors by the nodes is performed using simple binary operations in parallel. 相似文献
8.
This paper presents a new technique and two algorithms to bulk-load data into multi-way dynamic metric access methods, based on the covering radius of representative elements employed to organize data in hierarchical data structures. 相似文献
9.
Metric indices support efficient similarity searches in metric spaces. This problem is central to many applications, including multimedia databases and repositories handling complex objects. Most metric indices are designed for main memory, and also most of them are static, that is, do not support insertions and deletions of objects. In this paper we introduce new metric indices for secondary memory that support updates, that is, they are dynamic. First, we show how the dynamic and memory-based Dynamic Spatial Approximation Tree (DSAT) can be extended to operate on secondary memory. Second, we design a dynamic and secondary-memory-based version of the static List of Clusters (LC), which performs well on high-dimensional spaces. The new structure is called Dynamic LC (DLC). Finally, we combine the DLC with the in-memory version of DSAT to create a third structure, Dynamic Set of Clusters (DSC), which improves upon the other two in various cases. We compare the new structures with the state of the art, showing that they are competitive and outstand in several scenarios, especially on spaces of medium and high dimensionality. 相似文献
10.
Longitudinal data refer to the situation where repeated observations are available for each sampled object. Clustered data,
where observations are nested in a hierarchical structure within objects (without time necessarily being involved) represent
a similar type of situation. Methodologies that take this structure into account allow for the possibilities of systematic
differences between objects that are not related to attributes and autocorrelation within objects across time periods. A standard
methodology in the statistics literature for this type of data is the mixed effects model, where these differences between
objects are represented by so-called “random effects” that are estimated from the data (population-level relationships are
termed “fixed effects,” together resulting in a mixed effects model). This paper presents a methodology that combines the
structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods.
We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM
tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random
effects and regression trees without random effects. We also apply it to a smaller data set examining accident fatalities,
and show that the RE-EM tree strongly outperforms a tree without random effects while performing comparably to a linear model
with random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance
relative to regression trees without random effects and is comparable or superior to using linear models with random effects
in more general situations. 相似文献
11.
Proposes a distance measure between unrooted and unordered trees based on the strongly structure-preserving mapping (SSPM). SSPM can make correspondences between the vertices of similar substructures of given structures more strictly than previously proposed mappings. The time complexity of computing the distance between trees Ta and T b is O(mb3NaNb), where Na and Nb are the number of vertices in trees Ta and Tb, respectively; ma and m b are the maximum degrees of a vertex in Ta and T b, respectively; and ma⩽mb is assumed. The space complexity of the method is O(NaNb ) 相似文献
12.
13.
A Distance labeling scheme is a type of localized network representation in which short labels are assigned to the vertices, allowing one to infer the distance between any two vertices directly from their labels, without using any additional information sources. As most applications for network representations in general, and distance labeling schemes in particular, concern large and dynamically changing networks, it is of interest to focus on distributed dynamic labeling schemes. The paper considers dynamic weighted trees where the vertices of the trees are fixed but the (positive integral) weights of the edges may change. The two models considered are the edge-dynamic model, where from time to time some edge changes its weight by a fixed quanta, and the increasing-dynamic model in which edge weights can only grow. The paper presents distributed approximate distance labeling schemes for the two dynamic models, which are efficient in terms of the required label size and communication complexity involved in updating the labels following the weight changes. 相似文献
14.
Slorkey A.J. Williams C.K.L. 《IEEE transactions on pattern analysis and machine intelligence》2003,25(7):859-871
This paper describes the position-encoding dynamic tree (PEDT). The PEDT is a probabilistic model for images that improves on the dynamic tree by allowing the positions of objects to play a part in the model. This increases the flexibility of the model over the dynamic tree and allows the positions of objects to be located and manipulated. This paper motivates and defines this form of probabilistic model using the belief network formalism. A structured variational approach for inference and learning in the PEDT is developed, and the resulting variational updates are obtained, along with additional implementation considerations that ensure the computational cost scales linearly in the number of nodes of the belief network. The PEDT model is demonstrated and compared with the dynamic tree and fixed tree. The structured variational learning method is compared with mean field approaches. 相似文献
15.
目前,用户的好友关系及其自身呈现的动态变化趋势,使得基于静态社交关系的推荐算法难以满足现今瞬息万变的世界。为解决准确度较低等问题,提出利用用户购买物品的时序行为挖掘隐式社交关系的方法。首先将隐式社交与相似度算法相融合,其次针对近邻评分的稀疏性,提出改进的近邻评分填补方法,然后使用填补后的近邻评分对模型预测评分进行修正,最后生成预测评分。实验部分采用MovieLens数据集评估提出的方法,并与现存算法作对比分析。结果表明,该算法与传统算法及改进算法相比更稳定,也更有效地预测了目标用户的真实评分。 相似文献
16.
17.
18.
Al-Furajh I. Aluru S. Goil S. Ranka S. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(2):136-148
Multidimensional binary search tree (abbreviated k-d tree) is a popular data structure for the organization and manipulation of spatial data. The data structure is useful in several applications including graph partitioning, hierarchical applications such as molecular dynamics and n-body simulations, and databases. In this paper, we study efficient parallel construction of k-d trees on coarse-grained distributed memory parallel computers. We consider several algorithms for parallel k-d tree construction and analyze them theoretically and experimentally, with a view towards identifying the algorithms that are practically efficient. We have carried out detailed implementations of all the algorithms discussed on the CM-5 and report on experimental results 相似文献
19.
Christos A. Papachristou 《Computers & Electrical Engineering》1986,12(3-4):101-107
This paper presents a technique for implementing carry lookahead addition by iterative tree circuits of fan-in constrained logic. The approach makes use of a recursive construction, the computation tree, for synthesizing an n-bit lookahead adder. The method is particularly effective for moderate values of n and it is in accord with well-known asymptotic bounds on addition. Extensions to modular tree structures for lookahead adders are discussed. 相似文献
20.
In this paper, we propose a new metric for rooted trees with unlabeled vertices based on alignments of nested parenthesis strings. We prove that the time complexity for computing this metric is NP-hard and present a 1.5-approximation algorithm for it. 相似文献