Similar Documents
 20 similar documents found (search time: 31 ms)
1.
The field of computational biology encompasses a wide range of NP-hard optimization problems. Nowadays, phylogeneticians are dealing with a growing amount of biological data that must be analyzed to explain the origins of modern species. Evolutionary relationships among organisms are often described by means of tree-shaped structures known as phylogenetic trees. When inferring phylogenies, two main challenges must be addressed. First, reliable evolutionary trees must be inferred from data sets in which different optimality principles support conflicting evolutionary hypotheses. Second, enormous tree search spaces must be processed, where traditional sequential strategies cannot be applied. Phylogenetic inference can therefore benefit from combining high-performance computing and evolutionary computation to reconstruct complex evolutionary histories in reduced execution times. In this paper, we introduce multiobjective phylogenetics, a hybrid OpenMP/MPI approach to parallelizing a well-known multiobjective metaheuristic, the fast non-dominated sorting genetic algorithm (NSGA-II). This algorithm has been designed to conduct phylogenetic analyses on multi-core clusters according to two principles: maximum parsimony and maximum likelihood. The main goal is to combine the benefits of the shared-memory and distributed-memory programming paradigms to efficiently infer a set of high-quality Pareto solutions. Experiments on six real nucleotide data sets and comparisons with other hybrid parallel approaches show that multiobjective phylogenetics achieves significant performance in terms of parallel, multiobjective, and biological results. Copyright © 2014 John Wiley & Sons, Ltd.
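NSGA-II ranks candidate solutions with a fast non-dominated sort before selection. The sketch below shows that ranking step for two objectives to be minimised, assuming each candidate tree has already been scored by a parsimony score and a negative log-likelihood. The scores are invented, the function names are illustrative, and the paper's hybrid OpenMP/MPI parallelization is not reproduced.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` when every objective is minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(scores: List[Sequence[float]]) -> List[List[int]]:
    """Group solution indices into Pareto fronts, best (non-dominated) front first."""
    n = len(scores)
    dominated = [[] for _ in range(n)]  # dominated[p]: indices that p dominates
    dom_count = [0] * n                 # dom_count[p]: how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(scores[p], scores[q]):
                dominated[p].append(q)
            elif dominates(scores[q], scores[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:  # q is dominated only by earlier fronts
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


# (parsimony score, negative log-likelihood) per tree; values are invented.
scores = [(10, 100.0), (12, 90.0), (11, 95.0), (13, 101.0)]
print(fast_non_dominated_sort(scores))  # [[0, 1, 2], [3]]
```

Trees 0, 1 and 2 trade parsimony against likelihood, so none dominates another; tree 3 is worse on both objectives and falls into the second front.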

2.
Comparing tree-structured data for structural similarity is a recurring theme and one on which much effort has been spent. Most approaches so far are grounded, implicitly or explicitly, in algorithmic information theory, being approximations to an information distance derived from Kolmogorov complexity. In this paper we propose a novel complexity metric, also grounded in information theory, but calculated via Shannon's entropy equations. This is used to formulate a directly and efficiently computable metric for the structural difference between unordered trees. The paper explains the derivation of the metric in terms of information theory, and proves the essential property that it is a distance metric. The property of boundedness means that the metric can be used in contexts such as clustering, where second-order comparisons are required. The distance metric property means that the metric can be used in the context of similarity search and metric spaces in general, allowing trees to be indexed and stored within this domain. We are not aware of any other tree similarity metric with these properties.

3.
We present a directed acyclic graph (DAG) visualisation designed to support interaction with a set of multiple classification trees, specifically to find overlaps and differences between groups of trees and individual trees. The work is motivated by the need for a representation of multiple trees that has the space-saving property of a general graph representation together with the intuitive parent-child direction cues present in individual tree representations. Using example taxonomic data sets, we describe augmentations to the common barycenter DAG layout method that reveal shared sets of child nodes between common parents more clearly. Other interactions, such as displaying the multiple ancestor paths of a node that occurs in several trees and revealing intersecting sibling sets within a single DAG representation, are also discussed.
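The barycenter layout method that the paper augments orders each layer of the DAG by the mean position of a node's parents in the layer above. A minimal single-pass sketch of that step follows; the function name and the input format (a parent list per node) are assumptions for illustration, and the paper's augmentations for shared child sets are not included.

```python
from typing import Dict, List


def barycenter_order(upper: List[str], lower: List[str],
                     parents: Dict[str, List[str]]) -> List[str]:
    """Reorder `lower` by the mean index of each node's parents in `upper`."""
    pos = {node: i for i, node in enumerate(upper)}

    def barycenter(node: str) -> float:
        ps = [pos[p] for p in parents.get(node, []) if p in pos]
        # A node with no parent in the upper layer keeps its current index.
        return sum(ps) / len(ps) if ps else float(lower.index(node))

    # sorted() is stable, so ties keep their current relative order.
    return sorted(lower, key=barycenter)


upper = ["A", "B", "C"]
lower = ["x", "y", "z"]
parents = {"x": ["C"], "y": ["A"], "z": ["A", "C"]}  # z is shared by A and C
print(barycenter_order(upper, lower, parents))  # ['y', 'z', 'x']
```

In a full layout this step is repeated layer by layer, usually sweeping down and then up several times, to reduce edge crossings.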

4.
Learning from unlabeled images containing various objects that vary in pose, scale, and degree of occlusion is a challenging task in computer vision. Shared structures embody the consistency and coherence of features that repeatedly co-occur within an object class. They can be used as discriminative information to separate the various objects contained in unlabeled images. In this paper, we propose a maximum likelihood algorithm for unsupervised shared structure learning, in which shared structures are represented as strongly connected clusters of consistent pairwise relationships, and shared structures of different orders are learned by exploring and combining consistent pairwise spatial relationships. Two data-sampling routines, dense sampling and sparse sampling, are also discussed. We test our algorithm on a diverse set of data to verify its merits.

5.
Existing methods for spatial joins require pre-existing spatial indices or other precomputation, but such approaches are inefficient and limited in generality. The operand data sets of spatial joins may not all have precomputed indices, particularly when they are dynamically generated by other selection or join operations. Also, existing spatial indices are mostly designed for spatial selections and are not always efficient for joins. This paper explores the design and implementation of seeded trees, which are effective for spatial joins and efficient to construct at join time. Seeded trees are R-tree-like structures divided into seed levels and grown levels. This structure makes it possible to use information about the join to accelerate the join process, and allows efficient buffer management. In addition to the basic structure and behavior of seeded trees, we present techniques for efficient seeded tree construction, a new buffer management strategy to lower I/O costs, and a theoretical analysis for choosing algorithmic parameters. We also present methods for reducing space requirements and improving the stability of seeded tree performance with no additional I/O costs. Our performance studies show that the seeded tree method outperforms other tree-based methods by a wide margin, both in the number of disk pages accessed and in weighted I/O costs. Further, its performance gain is stable across different input data, and its CPU penalties are also lower.

6.
Text-indexing structures provide significant advantages in solving many problems related to string analysis and comparison, and are nowadays widely used in the analysis of biological sequences. In this paper, we present some applications of affix trees to problems of exact and approximate pattern matching and discovery in RNA sequences. By allowing bidirectional search for symmetric patterns, affix trees make it possible to discover and locate patterns that describe not only sequence regions but also the secondary structure a given region could form, with improvements in theoretical and practical efficiency over existing methods. The search can be either exact or approximate, where the approximation can be defined simultaneously for both the sequence and the structure of patterns. The approach presented in this paper could provide significant help in the analysis of RNA sequences, where functional motifs often involve structural as well as sequence constraints.
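The symmetric patterns in question resemble hairpins: a stem, a loop, and a closing arm that base-pairs with the stem in reverse order. The brute-force sketch below only illustrates the kind of pattern being searched for (exact Watson-Crick pairing, no wobble pairs, no mismatches); it is not the affix-tree method, and the function name and parameters are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick pairs only; G-U wobble pairs are deliberately left out.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_hairpins(seq: str, stem: int, min_loop: int, max_loop: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of exact stem-loop-stem matches."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem - min_loop + 1):
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # first index of the closing arm
            if j + stem > n:
                break
            # Position k of the opening arm must pair with position stem-1-k of the closing arm.
            if all((seq[i + k], seq[j + stem - 1 - k]) in PAIRS for k in range(stem)):
                hits.append((i, j + stem))
    return hits


print(find_hairpins("GGGAAAACCC", stem=3, min_loop=3, max_loop=5))  # [(0, 10)]
```

An approximate version would allow a bounded number of unpaired or mismatched positions in the stem, which is where the sequence and structure tolerances mentioned above come in.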

7.
Parallel Data Mining for Association Rules on Shared-Memory Systems   Times cited: 11 (self-citations: 1; citations by others: 10)
In this paper we present a new parallel algorithm for mining association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that our proposed optimizations significantly improve performance, and the parallel algorithm also achieves good speed-up. Many data-mining tasks (e.g., association rules, sequential patterns) use complex pointer-based data structures (e.g., hash trees) that typically suffer from suboptimal data locality. In the multiprocessor case, shared access to these data structures may also result in false sharing. For these tasks it is commonly observed that the recursive data structure is built once and accessed multiple times during each iteration. Furthermore, the access patterns after the build phase are highly ordered. In such cases, memory placement of these structures that is sensitive to locality and false sharing can enhance performance significantly. We evaluate a set of placement policies for parallel association discovery and show that simple placement schemes can improve execution time by more than a factor of two. More complex schemes yield additional gains. Received 24 May 1999 / Revised 20 June 2000 / Accepted in revised form 6 July 2000
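Frequency (support) computation is the core of association-rule mining. The sketch below counts how many transactions contain each candidate k-itemset, using a Python set of frozensets (a hash table) where the paper uses a hash tree. The per-partition counters merged at the end mimic, sequentially, how each processor could count its own share of the transactions before a reduction. All names and data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set


def count_support(transactions: List[Set[str]], k: int,
                  candidates: Set[FrozenSet[str]]) -> Counter:
    """Count, for each candidate k-itemset, the transactions that contain it."""
    counts: Counter = Counter()
    for t in transactions:
        # Enumerate the k-subsets of the transaction and look each one up by hash.
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in candidates:
                counts[key] += 1
    return counts


def partitioned_count(transactions: List[Set[str]], k: int,
                      candidates: Set[FrozenSet[str]], parts: int = 2) -> Counter:
    """Count each partition separately, then merge; runs sequentially here."""
    total: Counter = Counter()
    for p in range(parts):
        # In a parallel version each partition would go to its own processor with
        # a private counter, avoiding concurrent updates to one shared structure.
        total.update(count_support(transactions[p::parts], k, candidates))
    return total


transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}]
candidates = {frozenset({"bread", "milk"}), frozenset({"butter", "milk"})}
print(partitioned_count(transactions, 2, candidates)[frozenset({"bread", "milk"})])  # 2
```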

8.
Eric K. Lee, Charles U. Martel. Software, 2007, 37(15): 1559-1575
In this paper we present new empirical results for splay trees. These results provide a better understanding of how cache performance affects query execution time. Our results show that, under certain settings, splay trees can have faster lookup times than randomly built binary search trees (BSTs). In contrast, previous experiments have shown that, because of the instruction overhead involved in splaying, splay trees are less efficient than randomly built BSTs at answering queries, even when the data sets are heavily skewed (a favorable setting for splay trees). We show that at large tree sizes the difference in cache performance between the two types of trees is significant. This difference makes splay trees faster than BSTs in this setting, despite their still having a higher instruction count. Based on these results, we offer guidelines in terms of tree size, access pattern, and cache size as to when splay trees are likely to be more efficient. We also present a new splaying heuristic aimed at reducing instruction count and show that it can improve on standard splaying by 10–27%. Copyright © 2007 John Wiley & Sons, Ltd.
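For reference, the recursive splay operation that standard splaying performs on every access is sketched below, with the usual zig, zig-zig and zig-zag rotation cases. This is the textbook operation, not the paper's new heuristic, and it makes no attempt at the cache behaviour the paper measures; the helper names are illustrative.

```python
from typing import List, Optional


class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def rotate_right(x: Node) -> Node:
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root: Optional[Node], key: int) -> Optional[Node]:
    """Bring `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:  # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:  # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:  # zag-zag
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:  # zag-zig
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def insert(root: Optional[Node], key: int) -> Node:
    """Splay toward `key`, then make the new node the root."""
    if root is None:
        return Node(key)
    root = splay(root, key)
    if root.key == key:
        return root
    node = Node(key)
    if key < root.key:
        node.right, node.left, root.left = root, root.left, None
    else:
        node.left, node.right, root.right = root, root.right, None
    return node


def inorder(root: Optional[Node]) -> List[int]:
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []


root = None
for k in [5, 3, 8, 1, 4]:
    root = insert(root, k)
root = splay(root, 1)
print(root.key, inorder(root))  # 1 [1, 3, 4, 5, 8]
```

Every access restructures the tree this way, which is the instruction overhead the abstract refers to; the payoff is that recently accessed keys end up near the root.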

9.
10.
In the design of electronic embedded systems, the allocation of data structures to memory banks is a major challenge faced by designers. Indeed, solving this optimization problem well can yield large gains in efficiency. In this paper, we consider the dynamic memory allocation problem, in which data structures have to be assigned to memory banks in different time periods during the execution of the application. We propose a GRASP (greedy randomized adaptive search procedure) to obtain high-quality solutions in short computational time, as required in this type of problem. Moreover, we also explore adapting the ejection chain methodology, originally proposed in the context of tabu search, to improve the outcomes. Our experiments with real and randomly generated instances show the superiority of the proposed methods over the state-of-the-art method.
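GRASP alternates a greedy randomized construction (choosing at random from a restricted candidate list, RCL) with local search, keeping the best solution found. The sketch below applies that scheme to a simplified, static version of the problem: a single time period, banks with a capacity and a per-access cost, and the goal of minimising total access cost. This cost model, the parameter `alpha`, and all names are assumptions, and the ejection-chain refinement is omitted.

```python
import random
from typing import List, Optional, Tuple


def grasp_allocate(sizes: List[int], accesses: List[int], capacity: List[int],
                   bank_cost: List[float], alpha: float = 0.3, iterations: int = 50,
                   seed: int = 0) -> Tuple[Optional[List[int]], float]:
    """Assign each data structure to a bank; return (assignment, total access cost)."""
    rng = random.Random(seed)
    n, m = len(sizes), len(capacity)
    best: Optional[List[int]] = None
    best_cost = float("inf")

    for _ in range(iterations):
        free = list(capacity)
        assign = [-1] * n
        feasible = True
        # Greedy randomized construction: most-accessed structures first.
        for i in sorted(range(n), key=lambda i: -accesses[i]):
            options = [b for b in range(m) if free[b] >= sizes[i]]
            if not options:
                feasible = False
                break
            costs = [accesses[i] * bank_cost[b] for b in options]
            lo, hi = min(costs), max(costs)
            rcl = [b for b, c in zip(options, costs) if c <= lo + alpha * (hi - lo)]
            b = rng.choice(rcl)
            assign[i] = b
            free[b] -= sizes[i]
        if not feasible:
            continue
        # Local search: move a structure to a cheaper bank while it still fits.
        improved = True
        while improved:
            improved = False
            for i in range(n):
                for b in range(m):
                    if bank_cost[b] < bank_cost[assign[i]] and free[b] >= sizes[i]:
                        free[assign[i]] += sizes[i]
                        free[b] -= sizes[i]
                        assign[i] = b
                        improved = True
        total = sum(accesses[i] * bank_cost[assign[i]] for i in range(n))
        if total < best_cost:
            best, best_cost = assign, total
    return best, best_cost


assign, total = grasp_allocate(sizes=[4, 2, 2], accesses=[100, 10, 50],
                               capacity=[4, 8], bank_cost=[1.0, 5.0])
print(assign, total)  # [0, 1, 1] 400.0
```

A smaller `alpha` makes construction greedier; a larger one diversifies the starting points that local search refines.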

11.
Splicing on tree-like structures   Times cited: 1 (self-citations: 0; citations by others: 1)
In this paper, we provide a method to increase the power of splicing systems. We introduce splicing systems on trees, built as partially annealed single strands; this notion is closely related to, and a natural extension of, splicing systems on strings. Trees are a common and useful data structure in computer science and have biological counterparts, such as molecular sequences with secondary structure, which is typical of RNA sequences. Splicing on trees involves (1) complete subtrees as axioms, (2) restriction operating on the annealed subsequences, and (3) rules that substitute one complete subtree with another. We show that splicing systems on trees with finite sets of axioms and finite sets of rules can generate the class of context-free languages without imposing multiplicity constraints.
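The string splicing systems that the paper extends are commonly defined with rules of the form (u1, u2; u3, u4): from x = x1 u1 u2 x2 and y = y1 u3 u4 y2 one obtains x1 u1 u4 y2. The sketch below enumerates all results of applying one such rule to one pair of strings; it covers only the string case, not splicing on trees, and the names are illustrative.

```python
from typing import Set, Tuple

Rule = Tuple[str, str, str, str]  # (u1, u2, u3, u4)


def splice(x: str, y: str, rule: Rule) -> Set[str]:
    """All strings x1 u1 u4 y2 obtainable by cutting x between u1|u2 and y between u3|u4."""
    u1, u2, u3, u4 = rule
    results = set()
    for i in range(len(x) + 1):
        if x[:i].endswith(u1) and x[i:].startswith(u2):  # cut site in x
            for j in range(len(y) + 1):
                if y[:j].endswith(u3) and y[j:].startswith(u4):  # cut site in y
                    results.add(x[:i] + y[j:])  # prefix of x, suffix of y
    return results


print(splice("aacgt", "ttgca", ("c", "g", "g", "c")))  # {'aacca'}
```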

12.
Sometimes the complex structures of nature inspire human constructions. Gothic construction has shown that forces can cross space along intricate paths, which may even be arbitrary if correctly dimensioned. In some ways, ribbed structures are like trees whose branches conduct forces instead of sap; they operate as branches and trunks descending in fractal fashion. Here we discuss reciprocal tree-like fractal structures, the difficulties of designing and erecting them, solutions for construction details, possible analytical questions, and their automatic generation by means of suitable software. The results are shown in the design of the Natural Interpretation Centre in Melilla, where we proposed two connected trees, as shown in the figures included in the paper.

13.
A number of coverage criteria have been proposed for testing classes and class clusters modeled with state machines. Previous research has revealed their limitations in terms of their capability to detect faults. As these criteria can be considered to exercise the control flow structure of the state machine, we are investigating how data flow information can be used to improve them in the context of UML state machines. More specifically, we investigate how such data flow analysis can be used to further refine the selection of a cost-effective test suite among alternative, adequate test suites for a given state machine criterion. This paper presents a comprehensive methodology for performing data flow analysis of UML state machines, with a specific focus on identifying the data flow from OCL guard conditions and operation contracts, and applies it to a widely referenced coverage criterion, the round-trip path (transition tree) criterion. It reports on two case studies whose results show that data flow information can be used to select the best transition tree, in terms of cost effectiveness, when more than one satisfies the transition tree criterion. The results also suggest that different trees are complementary in terms of the data flow they exercise, thus leading to the detection of intersecting but distinct subsets of faults. Copyright © 2009 John Wiley & Sons, Ltd.
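A transition tree for the round-trip path criterion can be built by traversing the state machine breadth-first from the initial state and making a node a leaf as soon as its state already appears in the tree; each root-to-leaf path is then one test sequence. The sketch below assumes a simple dictionary encoding of transitions, ignores guards and actions, and uses invented state and event names.

```python
from collections import deque
from typing import Dict, List, Tuple


def transition_tree(initial: str,
                    transitions: Dict[str, List[Tuple[str, str]]]) -> List[List[str]]:
    """Return the event sequences along root-to-leaf paths of a breadth-first transition tree."""
    seen = {initial}
    paths: List[List[str]] = []
    queue = deque([(initial, [])])
    while queue:
        state, path = queue.popleft()
        out = transitions.get(state, [])
        if not out:  # no outgoing transitions: this node is a leaf
            paths.append(path)
            continue
        for event, target in out:
            new_path = path + [event]
            if target in seen:
                paths.append(new_path)  # state already in the tree: stop expanding
            else:
                seen.add(target)
                queue.append((target, new_path))
    return paths


machine = {
    "Idle": [("start", "Busy")],
    "Busy": [("finish", "Idle"), ("fail", "Error")],
    "Error": [],
}
print(transition_tree("Idle", machine))  # [['start', 'finish'], ['start', 'fail']]
```

A depth-first traversal, or a different ordering of outgoing transitions, generally yields a different but equally valid tree, which is exactly the choice the data flow analysis is meant to inform.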

14.
In general, the verification of parameterized networks is undecidable. In recent years there has been a lot of research to identify subclasses of parameterized systems for which certain properties are decidable. Some of the results are based on finite abstractions of the parameterized system, so that model-checking techniques can be used to establish those properties. In a previous paper we presented a method for computing abstractions of a parameterized system modeled in the decidable logic WS1S. These WS1S systems provide an intuitive way to describe parameterized systems of finite-state processes. In practice, however, the processes in the network are themselves infinite because of unbounded data structures. One source of unboundedness can be the use of a parameterized data structure. Another typical source may be the presence of structures ranging over subsets of participating processes. This is the case, e.g., for group membership or distributed shared memory consistency protocols. In this paper we use deductive methods to deal with such networks, where the data structure is parameterized by the number of processes and an extra parameter. We show how to derive an abstract WS1S system that can be subjected to algorithmic verification. To illustrate the method, we verify the correctness of a distributed shared memory consistency protocol, using PVS for the deductive verification part and the tools PAX and SMV for the algorithmic part.

15.
Context-aware ubiquitous computing systems should be able to introspect the surrounding environment and adapt their behavior to other existing systems and to context changes. Although numerous ubiquitous computing systems have been developed that are aware of different types of context, such as location, social situation, and available computational resources, few are aware of their own computational behavior. Computational behavior introspection is common in reflective systems and can be used to improve the awareness and autonomy of ubicomp systems. In this paper, we propose a decentralized approach based on the Simple Network Management Protocol (SNMP), Universal Plug and Play (UPnP), and state transition models to model and expose computational behavior. Typically, SNMP and UPnP are used to retrieve raw operational variables from managed network devices and consumer electronic devices, e.g., to check network interface bandwidth and to automate device discovery and plug-and-play operations. We extend the use of these protocols by exposing the state of different ubicomp systems and the associated state-transition statistics. This computational behavior may be collected locally or remotely from ubicomp systems that share a physical environment, and sent to a coordinator node or simply shared among ubicomp systems. We describe the implementation of this behavior awareness approach in a home health-care environment equipped with a VoIP phone and a drug dispenser, and we provide the means for exposing and using the behavior context to manage a simple home health-care setting. Our approach relies on manufacturers providing a system state specification. Where the specification is not provided, we show how it can be discovered automatically. We propose two machine learning approaches for automatic behavior discovery and evaluate them by determining the expected state graphs of our two systems (a VoIP phone and a drug dispenser). We also evaluate the effectiveness of the behavior graphs generated by the two approaches.
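One simple way to derive a state graph with transition statistics from an observed sequence of states is to count consecutive state pairs and normalise the counts per source state. The sketch below does exactly that; it is not either of the paper's two machine-learning approaches, and the state names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def learn_transition_graph(trace: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next state | current state) from one observed state trace."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    graph = {}
    for src, ctr in counts.items():
        total = sum(ctr.values())
        graph[src] = {dst: c / total for dst, c in ctr.items()}
    return graph


trace = ["idle", "dispensing", "idle", "alarm", "idle", "dispensing"]
graph = learn_transition_graph(trace)
print(graph["idle"])  # idle -> dispensing with probability 2/3, idle -> alarm with 1/3
```

The resulting graph, or the raw counts, is the kind of per-system statistic that could then be exposed to other systems or to a coordinator node.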

16.
A binary decision diagram based approach for mining frequent subsequences   Times cited: 2 (self-citations: 1; citations by others: 1)
Sequential pattern mining is an important problem in data mining. State-of-the-art techniques for mining sequential patterns, such as frequent subsequences, are often based on the pattern-growth approach, which recursively projects conditional databases. Explicitly creating database projections is thought to be a major computational bottleneck, but we show in this paper that it can be beneficial when the appropriate data structure is used. Our technique uses a canonical directed acyclic graph as the sequence database representation, which can be represented as a binary decision diagram (BDD). In this paper, we introduce a new type of BDD, the sequence BDD (SeqBDD), and show how it can be used to mine frequent subsequences efficiently. A novel feature of the SeqBDD is its ability to share results between similar intermediate computations and avoid redundant computation. We perform an experimental study comparing the SeqBDD technique with existing pattern-growth techniques based on other data structures, such as prefix trees. Our results show that a SeqBDD can be half as large as a prefix tree, especially when many similar sequences exist. In terms of mining time, it can be substantially more efficient when the support is low, the number of patterns is large, or the input sequences are long and highly similar.
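For comparison, the pattern-growth approach the paper takes as its baseline can be sketched as a small PrefixSpan-style miner: count the symbols that can extend the current prefix in the projected database, keep the frequent ones, and recurse on the database projected by each extension. This sketch assumes each sequence is a string of single symbols and uses pseudo-projection (offsets into the original sequences) rather than copying suffixes; it does not implement SeqBDDs.

```python
from collections import Counter
from typing import Dict, List, Tuple


def prefixspan(db: List[str], min_support: int) -> Dict[str, int]:
    """Frequent subsequences of single-symbol sequences, mapped to their support."""
    results: Dict[str, int] = {}

    def grow(prefix: str, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence id, start offset) of the suffix left after the prefix.
        counts: Counter = Counter()
        for sid, start in projected:
            for sym in set(db[sid][start:]):  # count each symbol once per sequence
                counts[sym] += 1
        for sym, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            pattern = prefix + sym
            results[pattern] = sup
            new_proj = []
            for sid, start in projected:
                pos = db[sid].find(sym, start)  # first occurrence after the prefix
                if pos != -1:
                    new_proj.append((sid, pos + 1))
            grow(pattern, new_proj)

    grow("", [(i, 0) for i in range(len(db))])
    return results


print(prefixspan(["abc", "acb", "ab"], min_support=2))
# {'a': 3, 'ab': 3, 'ac': 2, 'b': 3, 'c': 2}
```

Similar sequences produce near-identical projected databases here, and that repeated work is what the SeqBDD's sharing of intermediate results is designed to avoid.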

17.
Contextualizing ontologies   Times cited: 2 (self-citations: 0; citations by others: 2)
Ontologies are shared models of a domain that encode a view which is common to a set of different parties. Contexts are local models that encode a party’s subjective view of a domain. In this paper, we show how ontologies can be contextualized, thus acquiring certain useful properties that a pure shared approach cannot provide. We say that an ontology is contextualized or, also, that it is a contextual ontology, when its contents are kept local, and therefore not shared with other ontologies, and mapped with the contents of other ontologies via explicit (context) mappings. The result is Context OWL (C-OWL), a language whose syntax and semantics have been obtained by extending the OWL syntax and semantics to allow for the representation of contextual ontologies.

18.
We introduce new data structures for compressed suffix trees whose size is linear in the text size. The size is measured in bits; thus they occupy only O(n log |A|) bits for a text of length n over an alphabet A. This is a remarkable improvement over current suffix trees, which require O(n log n) bits. Although some components of suffix trees have been compressed, there is no linear-size data structure for suffix trees with full functionality, such as computing suffix links, string depths, and lowest common ancestors. The data structure proposed in this paper is the first one that has linear size and supports all operations efficiently. Any algorithm running on a suffix tree can also be executed on our compressed suffix trees with a slowdown of only a polylog(n) factor.
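To make the size gap concrete, consider a DNA text (|A| = 4) of length n = 2^30 and compare only the leading terms of the two bounds, ignoring the constant factors hidden by the O-notation:

```latex
\begin{aligned}
n \log_2 |A| &= 2^{30} \cdot 2 = 2^{31}\ \text{bits} = 256\ \text{MiB},\\
n \log_2 n   &= 2^{30} \cdot 30\ \text{bits} = 3840\ \text{MiB} = 3.75\ \text{GiB}.
\end{aligned}
```

On this example the leading term of the compressed bound is 15 times smaller, and the ratio log_2 n / log_2 |A| keeps growing with n.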

19.
Recursive data types, such as lists and trees, are data types defined in terms of themselves. There is a single access path to each component in a recursive data structure. Generalized recursive data structures may include multiple access paths to some parts of the data structure. Two-way lists, threaded trees, and circular lists are generalized recursive data types. The extra access paths in a generalized recursive data structure are uniquely determined by the type of the structure and the main paths through the structure. An extension to Pascal in which generalized recursive data structures may be defined is described.
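A two-way list is the simplest case: its `prev` links are an extra access path fully determined by the main `next` path. The sketch below, with hypothetical names, recomputes the back links from the forward links to illustrate that determinism; it says nothing about the syntax of the Pascal extension itself.

```python
from typing import Optional


class DNode:
    """Two-way list node: `next` is the main path, `prev` the derived extra path."""

    def __init__(self, value: int, nxt: "Optional[DNode]" = None) -> None:
        self.value = value
        self.next = nxt
        self.prev: "Optional[DNode]" = None


def derive_back_links(head: Optional[DNode]) -> Optional[DNode]:
    """Set every node's `prev` from the `next` chain and return the tail."""
    prev = None
    node = head
    while node is not None:
        node.prev = prev
        prev, node = node, node.next
    return prev


head = DNode(1, DNode(2, DNode(3)))
tail = derive_back_links(head)
print(tail.value, tail.prev.value, head.prev)  # 3 2 None
```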

20.
Correlated survival outcomes occur quite frequently in biomedical research. Available software is limited, particularly if we wish to obtain a smoothed estimate of the baseline hazard function in the context of a random effects model for correlated data. The main objective of this paper is to describe an R package called frailtypack that can be used to estimate the parameters of a shared gamma frailty model with possibly right-censored, left-truncated, stratified survival data using penalized likelihood estimation. Time-dependent explanatory variables and/or an extension of the Cox regression model to recurrent events are also allowed. The program can also be used simply to obtain a smooth estimate of the baseline hazard function directly. To illustrate the program, we use two data sets, one with clustered survival times and the other with recurrent events, i.e., the rehospitalizations of patients diagnosed with colorectal cancer. We show how to fit the model with recurrent events and time-dependent covariates using the Andersen-Gill approach.
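For reference, the shared gamma frailty model is usually written as below, where i indexes groups (or patients with recurrent events), j indexes subjects (or events) within a group, λ0 is the baseline hazard, and the gamma frailty w_i uses the shape-rate parametrization, so it has mean 1 and variance θ. The penalized log-likelihood adds a roughness penalty on the baseline hazard with smoothing parameter κ ≥ 0. This is the standard textbook formulation; the package documentation should be consulted for its exact parametrization.

```latex
\begin{aligned}
\lambda_{ij}(t \mid w_i) &= w_i \, \lambda_0(t) \exp\!\left(\beta^\top X_{ij}\right),
\qquad w_i \sim \operatorname{Gamma}\!\left(\tfrac{1}{\theta}, \tfrac{1}{\theta}\right),
\quad \mathbb{E}[w_i] = 1,\ \operatorname{Var}(w_i) = \theta,\\
pl(\Phi) &= l(\Phi) - \kappa \int_0^{\infty} \lambda_0''(t)^2 \, dt .
\end{aligned}
```

With θ = 0 the frailties degenerate to 1 and the model reduces to a Cox model with a smoothed baseline hazard; larger κ gives a smoother estimate of λ0.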
