期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

CM-tree: A dynamic clustered index for similarity search in metric databases

《Data & Knowledge Engineering》2008,64(3):919-946

Repositories of unstructured data types, such as free text, images, audio and video, have been recently emerging in various fields. A general searching approach for such data types is that of similarity search, where the search is for similar objects and similarity is modeled by a metric distance function. In this article we propose a new dynamic paged and balanced access method for similarity search in metric data sets, named CM-tree (Clustered Metric tree). It fully supports dynamic capabilities of insertions and deletions both of single objects and in bulk. Distinctive from other methods, it is especially designed to achieve a structure of tight and low overlapping clusters via its primary construction algorithms (instead of post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting representative objects of nodes, clustering based node split algorithm and criteria for triggering a node split, and an improved sub-tree pruning method used during search. To facilitate these methods the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs. 相似文献

2.

CM-tree: A dynamic clustered index for similarity search in metric databases

Lior Israel 《Data & Knowledge Engineering》2007,63(3):919-946

Repositories of unstructured data types, such as free text, images, audio and video, have been recently emerging in various fields. A general searching approach for such data types is that of similarity search, where the search is for similar objects and similarity is modeled by a metric distance function. In this article we propose a new dynamic paged and balanced access method for similarity search in metric data sets, named CM-tree (Clustered Metric tree). It fully supports dynamic capabilities of insertions and deletions both of single objects and in bulk. Distinctive from other methods, it is especially designed to achieve a structure of tight and low overlapping clusters via its primary construction algorithms (instead of post-processing), yielding significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting representative objects of nodes, clustering based node split algorithm and criteria for triggering a node split, and an improved sub-tree pruning method used during search. To facilitate these methods the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% for I/O costs and 303% for CPU costs. 相似文献

3.

Unbiased regression trees for longitudinal and clustered data

《Computational statistics & data analysis》2015

A new version of the RE–EM regression tree method for longitudinal and clustered data is presented. The RE–EM tree is a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. The RE–EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. The previously-suggested methodology used the CART tree algorithm for tree building, and therefore that RE–EM regression tree method inherits the tendency of CART to split on variables with more possible split points at the expense of those with fewer split points. A revised version of the RE–EM regression tree corrects for this bias by using the conditional inference tree as the underlying tree algorithm instead of CART. Simulation studies show that the new version is indeed unbiased, and has several improvements over the original RE–EM regression tree in terms of prediction accuracy and the ability to recover the correct tree structure. 相似文献

4.

Drawing dynamic trees

Moen S. 《Software, IEEE》1990,7(4):21-28

A tree-drawing algorithm that addresses the weaknesses of current approaches to constructing graphical user interfaces is presented. Present algorithms either do not let you draw tree nodes of varying shapes and sizes or they draw such trees in a way that does not produce trees as compact as they could be, which is especially important when diagramming a large system. Also, they cannot reuse layout information when the trees changes, so after every change the layout must be recomputed and the tree redrawn. The main difference between these traditional approaches and the author's approach is that his algorithm is more geometric. Unlike other algorithms, it uses an explicit representation of node and subtree contours, and it stores every contour as a polygon. It has three advantages over traditional algorithms. It allows one to draw trees with nodes of any polygonal shape compactly. The data structure supports insert and delete operations on subtrees. It is simple to implement, yet flexible 相似文献

5.

Succinct representation of dynamic trees

Arash Farzan J. Ian Munro 《Theoretical computer science》2011,412(24):2668-2678

We study the problem of maintaining a dynamic ordered tree succinctly under updates of the following form: insertion or deletion of a leaf, insertion of a node on an edge (edge subdivision) or deletion of a node with only one child (the child becomes a child of its former grandparent). We allow satellite data of a fixed size to be associated to the nodes of the tree.We support update operations in constant amortized time and support access to satellite data and basic navigation operations in worst-case constant time; the basic navigation operations include parent, first/last-child, previous/next-child. These operations are moving from a node to its parent, leftmost/rightmost child, and its previous and next child respectively.We demonstrate that to efficiently support more extended operations, such as determining the i-th child of a node, rank of a child among its siblings, or size of the subtree rooted at a node, one requires a restrictive pattern for update strategy, for which we propose the finger-update model. In this model, updates are performed at the location of a finger that is only allowed to crawl on the tree between a child and a parent or between consecutive siblings. Under this model, we describe how the named extended operations are performed in worst-case constant time.Previous work on dynamic succinct trees (Munro et al., 2001 [17]; Raman and Rao, 2003 [19]) is mainly restricted to binary trees and achieves poly-logarithmic (Munro et al., 2001 [17]) or “poly-log-log” (Raman and Rao, 2003 [19]) update time under a more restricted model, where updates are performed in traversals starting at the root and ending at the root and queries can be answered when the traversal is completed. A previous result on ordinal trees achieves only sublinear amortized update time and “poly-log-log” query time (Gupta et al., 2007 [11]). More recently, the update time has been improved to O(logn/loglogn) while queries can be performed in O(logn/loglogn) time (Sadakane and Navarro, 2010 [20]). 相似文献

6.

On-line construction of suffix trees 总被引：47，自引：0，他引：47

E. Ukkonen 《Algorithmica》1995,14(3):249-260

An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffixtries. Regardless of its quadratic worst case this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).This research was supported by the Academy of Finland and by the Alexander von Humboldt Foundation (Germany). 相似文献

7.

Parameterization of efficient dynamic reconfigurable trees

《Journal of Systems Architecture》2000,46(10):951-954

We present an algorithm for dynamic reconfiguration from a set of processor nodes connected using a multistage interconnection network into a set of m-ary trees of height h. The algorithm allows parameterization based on the branching factor m, the height of the tree h and bias B and produces a set of isomorphic trees for each value of the bias B. The computation of the identities of the neighbors by the nodes is performed using simple binary operations in parallel. 相似文献

8.

Efficient bulk-loading on dynamic metric access methods

Thiago G. Vespa Caetano Traina Jr Agma J. Traina 《Information Systems》2010

This paper presents a new technique and two algorithms to bulk-load data into multi-way dynamic metric access methods, based on the covering radius of representative elements employed to organize data in hierarchical data structures. 相似文献

9.

New dynamic metric indices for secondary memory

《Information Systems》2016

Metric indices support efficient similarity searches in metric spaces. This problem is central to many applications, including multimedia databases and repositories handling complex objects. Most metric indices are designed for main memory, and also most of them are static, that is, do not support insertions and deletions of objects. In this paper we introduce new metric indices for secondary memory that support updates, that is, they are dynamic. First, we show how the dynamic and memory-based Dynamic Spatial Approximation Tree (DSAT) can be extended to operate on secondary memory. Second, we design a dynamic and secondary-memory-based version of the static List of Clusters (LC), which performs well on high-dimensional spaces. The new structure is called Dynamic LC (DLC). Finally, we combine the DLC with the in-memory version of DSAT to create a third structure, Dynamic Set of Clusters (DSC), which improves upon the other two in various cases. We compare the new structures with the state of the art, showing that they are competitive and outstand in several scenarios, especially on spaces of medium and high dimensionality. 相似文献

10.

RE-EM trees: a data mining approach for longitudinal and clustered data

Rebecca J. Sela Jeffrey S. Simonoff 《Machine Learning》2012,86(2):169-207

Longitudinal data refer to the situation where repeated observations are available for each sampled object. Clustered data, where observations are nested in a hierarchical structure within objects (without time necessarily being involved) represent a similar type of situation. Methodologies that take this structure into account allow for the possibilities of systematic differences between objects that are not related to attributes and autocorrelation within objects across time periods. A standard methodology in the statistics literature for this type of data is the mixed effects model, where these differences between objects are represented by so-called “random effects” that are estimated from the data (population-level relationships are termed “fixed effects,” together resulting in a mixed effects model). This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also apply it to a smaller data set examining accident fatalities, and show that the RE-EM tree strongly outperforms a tree without random effects while performing comparably to a linear model with random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to using linear models with random effects in more general situations. 相似文献

11.

A metric between unrooted and unordered trees and its bottom-upcomputing method

Tanaka E. 《IEEE transactions on pattern analysis and machine intelligence》1994,16(12):1233-1238

Proposes a distance measure between unrooted and unordered trees based on the strongly structure-preserving mapping (SSPM). SSPM can make correspondences between the vertices of similar substructures of given structures more strictly than previously proposed mappings. The time complexity of computing the distance between trees T_a and T _b is O(m_b³N_aN_b), where N_a and N_b are the number of vertices in trees T_a and T_b, respectively; m_a and m _b are the maximum degrees of a vertex in T_a and T _b, respectively; and m_a⩽m_b is assumed. The space complexity of the method is O(N_aN_b) 相似文献

12.

A data structure for dynamic trees

Daniel D. Sleator Robert Endre Tarjan 《Journal of Computer and System Sciences》1983,26(3):362-391

相似文献

13.

Labeling schemes for weighted dynamic trees

Amos Korman David Peleg 《Information and Computation》2007,205(12):1721-1740

A Distance labeling scheme is a type of localized network representation in which short labels are assigned to the vertices, allowing one to infer the distance between any two vertices directly from their labels, without using any additional information sources. As most applications for network representations in general, and distance labeling schemes in particular, concern large and dynamically changing networks, it is of interest to focus on distributed dynamic labeling schemes. The paper considers dynamic weighted trees where the vertices of the trees are fixed but the (positive integral) weights of the edges may change. The two models considered are the edge-dynamic model, where from time to time some edge changes its weight by a fixed quanta, and the increasing-dynamic model in which edge weights can only grow. The paper presents distributed approximate distance labeling schemes for the two dynamic models, which are efficient in terms of the required label size and communication complexity involved in updating the labels following the weight changes. 相似文献

14.

Image modeling with position-encoding dynamic trees

Slorkey A.J. Williams C.K.L. 《IEEE transactions on pattern analysis and machine intelligence》2003,25(7):859-871

This paper describes the position-encoding dynamic tree (PEDT). The PEDT is a probabilistic model for images that improves on the dynamic tree by allowing the positions of objects to play a part in the model. This increases the flexibility of the model over the dynamic tree and allows the positions of objects to be located and manipulated. This paper motivates and defines this form of probabilistic model using the belief network formalism. A structured variational approach for inference and learning in the PEDT is developed, and the resulting variational updates are obtained, along with additional implementation considerations that ensure the computational cost scales linearly in the number of nodes of the belief network. The PEDT model is demonstrated and compared with the dynamic tree and fixed tree. The structured variational learning method is compared with mean field approaches. 相似文献

15.

融合动态社交关系的距离度量推荐算法

葛君伟岁飒《计算机应用研究》2020,37(12):3603-3606

目前,用户的好友关系及其自身呈现的动态变化趋势,使得基于静态社交关系的推荐算法难以满足现今瞬息万变的世界。为解决准确度较低等问题,提出利用用户购买物品的时序行为挖掘隐式社交关系的方法。首先将隐式社交与相似度算法相融合,其次针对近邻评分的稀疏性,提出改进的近邻评分填补方法,然后使用填补后的近邻评分对模型预测评分进行修正,最后生成预测评分。实验部分采用MovieLens数据集评估提出的方法,并与现存算法作对比分析。结果表明,该算法与传统算法及改进算法相比更稳定,也更有效地预测了目标用户的真实评分。相似文献

16.

基因的无树评估

下载免费PDF全文

李川唐常杰陈瑜代术成邱江涛罗谦朱军《计算机工程与应用》2008,44(14):80-84

传统基于表达式树ET的基因评估从性能角度讲主要缺点是：重复遍历表达式树和进行大量重复计算。提出基于Scale的基因评估。变量矩阵用来避免基因评估中的重复计算。实验表明,在绝大多数数据分布下和参数选择情况下：基于Scale的基因评估较基于 ET 的基因评估快 3~5 倍。相似文献

17.

无表达式树的基因表达 总被引：1，自引：0，他引：1

李川唐常杰陈瑜邱江涛罗谦吴江朱军《计算机应用》2008,28(5):1319-1321

传统基因表达式编程（GEP）进行基因表达时,动态生成释放大量表达式树（ET）,成为GEP的性能瓶颈。提出基于Scale的基因表达算法解决此问题。Scale是一个约束线性表。由于Scale支持一次创建,永久使用,而无须更多维护。所以把基因表达至Scale中避免了ET的动态生成和释放。实验结果表明,在多数情况中,基于Scale的基因表达较基于ET的基因表达快6~10 倍。相似文献

18.

Parallel construction of multidimensional binary search trees

Al-Furajh I. Aluru S. Goil S. Ranka S. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(2):136-148

Multidimensional binary search tree (abbreviated k-d tree) is a popular data structure for the organization and manipulation of spatial data. The data structure is useful in several applications including graph partitioning, hierarchical applications such as molecular dynamics and n-body simulations, and databases. In this paper, we study efficient parallel construction of k-d trees on coarse-grained distributed memory parallel computers. We consider several algorithms for parallel k-d tree construction and analyze them theoretically and experimentally, with a view towards identifying the algorithms that are practically efficient. We have carried out detailed implementations of all the algorithms discussed on the CM-5 and report on experimental results 相似文献

19.

Iterative construction of binary lookahead addition trees

Christos A. Papachristou 《Computers & Electrical Engineering》1986,12(3-4):101-107

This paper presents a technique for implementing carry lookahead addition by iterative tree circuits of fan-in constrained logic. The approach makes use of a recursive construction, the computation tree, for synthesizing an n-bit lookahead adder. The method is particularly effective for moderate values of n and it is in accord with well-known asymptotic bounds on addition. Extensions to modular tree structures for lookahead adders are discussed. 相似文献

20.

A metric for rooted trees with unlabeled vertices based on nested parentheses

Chan-Shuo Wu Guan-Shieng Huang 《Theoretical computer science》2010,411(44-46):3923-3931

In this paper, we propose a new metric for rooted trees with unlabeled vertices based on alignments of nested parenthesis strings. We prove that the time complexity for computing this metric is NP-hard and present a 1.5-approximation algorithm for it. 相似文献