Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
《Computers & chemistry》1996,20(1):41-48
Simple Sequence Repeats (SSRs) are common and frequently polymorphic in eukaryote DNA. Many are subject to high rates of length mutation in which a gain or loss of one repeat unit is most often observed. Can the observed abundances and their length distributions be explained as the result of an unbiased random walk, starting from some initial repeat length? In order to address this question, we have considered two models for an unbiased random walk on the integers n (n ≥ n0). The first is a continuous time process (Birth and Death Model or BDM) in which the rate of a transition to n + 1 or n − 1 is λk, with k = n − n0 + 1, per unit time. The second is a discrete time model (Random Walk Model or RWM), in which a transition is made at each time step, either to n − 1 or to n + 1. In each case the walks start at length n0, with new walks being generated at a steady rate, S, the source rate, determined by a base substitution rate of mutation from neighboring sequences. Each walk terminates whenever n reaches n0 − 1 or at some time, T, which reflects the contamination of pure repeat sequences by other mutations that remove them from consideration, either because they fail to satisfy the criteria for repeat selection from some database or because they can no longer undergo efficient length mutations. For infinite T, the results are particularly simple for N(k), the expected number of repeats of length n = k + n0 − 1, being, for BDM, N(k) = S/(λk), and for RWM, N(k) = 2S. In each case, there is a cut-off value of k for finite T, namely k = (ln 2)λT for BDM and k = 0.57√T for RWM; for larger values of k, N(k) becomes rapidly smaller than the infinite time limit. We argue that these results may be compared with SSR length distributions averaged over many loci, but not for a particular locus, for which founder effects are important. For the data of Beckmann & Weber [(1992), Genomics 12, 627] on GT·AC repeats in the human, each model gives a reasonable fit to the data, with the source at two repeat units (n0 = 2). Both the absolute number of loci and their length distribution are well represented.
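
As a quick plausibility check of the RWM prediction N(k) = 2S, here is a minimal Monte Carlo sketch (an illustration under stated assumptions — unit time steps, S new walks per step at k = 1, absorption at k = 0 — not code from the paper). The time-averaged occupancy should sit near 2S for k well below the cutoff 0.57√T.

```python
import random
from collections import Counter

def simulate_rwm(T=20000, S=1, seed=0):
    """Discrete-time Random Walk Model: S walks start at k = 1 (length n0)
    each time step, every walk steps to k-1 or k+1 with equal probability,
    and a walk dies on reaching k = 0 (length n0 - 1). Returns the
    time-averaged occupancy N(k) over the second half of the run."""
    rng = random.Random(seed)
    walks, acc, samples = [], Counter(), 0
    for t in range(T):
        walks = [k + rng.choice((-1, 1)) for k in walks]
        walks = [k for k in walks if k > 0]      # absorb at k = 0
        walks.extend([1] * S)                    # steady source at k = 1
        if t >= T // 2:                          # time-average after burn-in
            acc.update(walks)
            samples += 1
    return {k: v / samples for k, v in acc.items()}

occ = simulate_rwm()
print({k: round(occ.get(k, 0.0), 2) for k in range(1, 11)})  # each ~ 2*S
```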

2.
Wireless Mesh Networks (WMNs) extend Internet access in areas where wired infrastructure is not available. Problems that arise are congestion around gateways, high access latency, and low throughput. Therefore, object replication and placement is essential for multi-hop wireless networks. Many replication schemes have been proposed for the Internet, but they are designed for CDNs that have both high bandwidth and high server capacity, which makes them unsuitable for the wireless environment. Object replication has received comparatively little attention from the research community when it comes to WMNs. In this paper, we propose an object replication and placement scheme for WMNs. In our scheme, each mesh router acts as a replica server in a peer-to-peer fashion. The scheme exploits graph partitioning to build a hierarchy from fine-grained to coarse-grained partitions. The challenge is to replicate content as close as possible to the requesting clients and thus reduce the access latency per object, while minimizing the number of replicas. Using simulation tests, we demonstrate that our scheme is scalable, performing well with respect to the number of replica servers and the number of objects. The simulation results show that our proposed scheme performs better than other replication schemes.
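
The core trade-off — placing content close to requesting clients while keeping replicas few — can be illustrated with a small greedy placement sketch (a stand-in under assumed BFS hop costs and a fixed replica budget; this is not the paper's partition-hierarchy scheme):

```python
from collections import deque

def bfs_hops(adj, src):
    """Hop distance from src to every mesh router (adj: node -> neighbors)."""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def greedy_place(adj, demand, budget):
    """Pick up to `budget` replica servers greedily minimizing total
    request-weighted hop distance (demand: node -> request rate)."""
    dist = {u: bfs_hops(adj, u) for u in adj}
    placed = []
    best = {u: float('inf') for u in adj}        # current distance to a replica
    for _ in range(budget):
        def total_cost(s):
            return sum(w * min(best[u], dist[s][u]) for u, w in demand.items())
        s = min((u for u in adj if u not in placed), key=total_cost)
        placed.append(s)
        best = {u: min(best[u], dist[s][u]) for u in adj}
    return placed

# 2x3 grid mesh; request rates concentrated at routers 0 and 5
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
demand = {0: 5, 2: 1, 3: 1, 5: 4}
print(greedy_place(adj, demand, budget=2))       # -> [0, 5], one per hot spot
```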

3.
We consider incremental optimal label placement in a closed-2PM map containing points, each attached to a label. Labels are assumed to be axis-parallel squares of the maximum possible side length, pairwise disjoint, each attached to its corresponding point on one of its horizontal edges. Such a labeling is denoted an optimal labeling. Our goal is to efficiently generate a new optimal labeling for all points after each new point is inserted in the map. Inserting a point may require several labels to flip or all labels to shrink. We present an algorithm that generates each new optimal labeling in O(lg n + k) time, where k is the number of required label flips, if there is no need to shrink the label lengths, or in O(n) time when we have to shrink the labels and flip some of them. The algorithm uses O(n) space in both cases. This is a new result on this problem.

4.
Unstructured Peer-to-Peer (P2P) networks have become a very popular architecture for content distribution in large-scale and dynamic environments. Searching for content in unstructured P2P networks is a challenging task because the distribution of objects has no association with the organization of peers. Methods proposed in recent years either depend too much on the object replication rate or suffer a sharp decline in performance when the objects stored in peers change rapidly, although their performance is better than flooding or random walk algorithms to some extent. In this paper, we propose a novel query routing mechanism for improving query performance in unstructured P2P networks. We design a data structure called the traceable gain matrix (TGM) that records every query's gain at each peer along the query hit path, and allows for optimizing query routing decisions effectively. Experimental results show that our query routing mechanism achieves a relatively high query hit rate with low bandwidth consumption in different types of network topologies, under both static and dynamic network conditions.

5.
A new method for the search of local repeats in long DNA sequences, such as complete genomes, is presented. It detects a large variety of repeats varying in length from one to several hundred bases, which may contain many mutations. By mutations we mean substitutions, insertions or deletions of one or more bases. The method is based on counting occurrences of short words (3-12 bases) in sequence fragments called windows. A score is computed for each window, based on calculating exact word occurrence probabilities for all the words of a given length in the window. The probabilities are defined using a Bernoulli model (independent letters) for the sequence, using the actual letter frequencies from each window. A plot of the probabilities along the sequence for high-scoring windows facilitates the identification of the repeated patterns. We applied the method to the 1.87 Mb sequence of chromosome 4 of Arabidopsis thaliana and to the complete genome of Bacillus subtilis (4.2 Mb). The repeats that we found were classified according to their size, number of occurrences, distance between occurrences, and location with respect to genes. The method proves particularly useful in detecting long, inexact repeats that are local, but not necessarily tandem. The method is implemented as a C program called EXCEP, which is available on request from the authors.
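
A sketch of the windowed word-scoring idea (an assumption-laden stand-in: the paper computes exact word occurrence probabilities, while this version substitutes a Poisson approximation under the same Bernoulli letter model; window, word, and step sizes are illustrative):

```python
from math import exp, factorial, log10, prod

def poisson_tail(mu, c):
    """P(X >= c) for X ~ Poisson(mu), floored to avoid log of zero."""
    return max(1.0 - sum(exp(-mu) * mu**i / factorial(i) for i in range(c)), 1e-300)

def window_scores(seq, w=200, k=6, step=100):
    """Score each window by its most over-represented k-word under a
    Bernoulli (independent letters) model using the window's own letter
    frequencies. High scores flag candidate local repeats."""
    scores = []
    for start in range(0, len(seq) - w + 1, step):
        win = seq[start:start + w]
        freq = {b: win.count(b) / w for b in set(win)}
        counts = {}
        for i in range(w - k + 1):
            counts[win[i:i + k]] = counts.get(win[i:i + k], 0) + 1
        surprise = (-log10(poisson_tail((w - k + 1) * prod(freq[b] for b in word), c))
                    for word, c in counts.items())
        scores.append((start, max(surprise)))
    return scores

print(window_scores("ACGT" * 20 + "AGAGAGAGAGAGAGAG" + "TTGACC" * 14, w=60, k=4, step=30))
```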

6.
Context: A replication is the repetition of an experiment. Several efforts have been made to adopt replication as a common practice in software engineering. There are different types of replications, depending on their purpose. Similar replications keep the experimental conditions as alike as possible to the original ones. External similar replications, where the replicating experimenters are not the same people as the original experimenters, have been a stumbling block. Several attempts at combining the results of replications have resulted in failure. Software engineering does not appear to be well suited to such replications, because it works with complex, experimentally immature contexts. Software engineering settings have a large number of variables, and the role that many of them play is unknown. A successful (or useful) similar replication helps to better understand the phenomenon under study by verifying results and/or identifying contextual variables that could influence (or not) the results, through the combination of experimental results. Objective: To get successful similar replications, there needs to be interaction between the original and replicating experimenters. In this paper, we propose an interaction process for achieving successful similar replications. Method: This process consists of: an adaptation meeting, where experimenters tailor the experiment to the new setting; querying, to settle occasional inquiries while the experiment is being run; and a combination meeting, where experimenters meet to discuss the combination of replication outcomes with previous results. To check its effectiveness, the process has been tested on three different replications of the same experiment. Results: The proposed interaction process has helped to identify new contextual variables that could potentially influence (or not) the experimental results in the three replications run. Additionally, the interaction process has helped to uncover certain problems and deviations that occurred during some of the replications that we would not have been aware of otherwise. Conclusions: There are signs that suggest that it is possible to get successful similar replications in software engineering experimentation when there is appropriate interaction among experimenters.

7.
This paper studies the control policies of an M/G/1 queueing system with a startup period and an unreliable server, in which the length of the vacation period is controlled either by the number of arrivals during the idle period or by a timer. After all the customers in the queue are served exhaustively, the server immediately takes a vacation and operates under one of two policies: (i) the server reactivates as soon as the number of arrivals in the queue reaches a predetermined threshold N or the waiting time of the leading customer reaches T units; and (ii) the server reactivates as soon as the number of arrivals in the queue reaches a predetermined threshold N or T time units have elapsed since the end of the completion period. If the timer expires or the number of arrivals reaches the threshold N, the server reactivates and requires a startup time before providing service until the system is empty. Furthermore, it is assumed that the server breaks down according to a Poisson process and that the repair time has a general distribution. We analyze the system characteristics for each scheme. The total expected cost function per unit time is developed to determine the optimal thresholds N and T at minimum cost.
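
The N-or-T trigger of policy (ii) is easy to prototype. The sketch below is a simplified event-driven simulation (assumptions: exponential service as a stand-in for the general distribution, and the breakdown/repair process omitted for brevity; all parameter names are illustrative), estimating the time-average number in system that drives the holding-cost term:

```python
import random

def simulate_policy_ii(lam=0.5, mean_service=1.0, startup=0.5,
                       N=5, T=8.0, horizon=200_000.0, seed=1):
    rng = random.Random(seed)
    t, q, area = 0.0, 0, 0.0
    next_arrival = rng.expovariate(lam)

    def advance(until):
        """Move time to `until`, admitting Poisson arrivals on the way."""
        nonlocal t, q, area, next_arrival
        while next_arrival < until:
            area += q * (next_arrival - t)
            t, q = next_arrival, q + 1
            next_arrival = t + rng.expovariate(lam)
        area += q * (until - t)
        t = until

    while t < horizon:
        deadline = t + T                  # vacation: wait for N arrivals or timer T
        while q < N and next_arrival < deadline:
            area += q * (next_arrival - t)
            t, q = next_arrival, q + 1
            next_arrival = t + rng.expovariate(lam)
        if q < N:                         # timer expired first
            area += q * (deadline - t)
            t = deadline
        advance(t + startup)              # startup period before serving
        while q > 0:                      # exhaustive service
            advance(t + rng.expovariate(1.0 / mean_service))
            q -= 1
    return area / t

print(simulate_policy_ii())               # time-average number in system
```

Sweeping N and T with such a simulation mimics the cost-minimization step of the paper, once holding, setup, and operating costs are attached to the measured quantities.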

8.
Collaborative applications are characterized by high levels of data sharing. Optimistic replication has been suggested as a mechanism to enable highly concurrent access to the shared data, whilst providing full application-defined consistency guarantees. Nowadays, a growing number of emerging cooperative applications are suited to Peer-to-Peer (P2P) networks. However, to enable the deployment of such applications in P2P networks, a mechanism is required to handle their high data sharing in a dynamic, scalable, and available way. Previous work on optimistic replication has mainly concentrated on centralized systems. Centralized approaches are inappropriate for a P2P setting due to their limited availability and vulnerability to failures and network partitions. In this paper, we focus on the design of a reconciliation algorithm intended for large-scale cooperative applications, such as a P2P Wiki. The main contribution of this paper is a distributed reconciliation algorithm designed for P2P networks (P2P-reconciler). Other important contributions are: a basic cost model for computing communication costs in a DHT overlay network; a strategy for computing the cost of each reconciliation step taking the cost model into account; and an algorithm that dynamically selects the best nodes for each reconciliation step. Furthermore, since P2P networks are built independently of the underlying topology, which may cause high latencies and large overheads that degrade performance, we also propose a topology-aware variant of our P2P-reconciler algorithm and show the important gains of using it. Our P2P-reconciler solution enables high levels of concurrency thanks to semantic reconciliation, and yields high availability and excellent scalability with acceptable performance and limited overhead.

9.
Huo Hongwei, Bai Fan. Chinese Journal of Computers, 2008, 31(2): 214-219
Most current repeat identification algorithms either rely on databases of already annotated repeats or define a repeat as a pair of similar sequences of maximal length, without a rigorous definition that balances repeat length against frequency. To address these problems, this paper proposes RepeatSearcher, a fast repeat identification algorithm based on a gapped variant of the local sequence alignment algorithm BLAST. The algorithm identifies repeats by defining their exact boundaries and progressively extending consensus sequences. Using the C. briggsae genome sequence as a test set, RepeatSearcher is compared with RECON, a widely used repeat identification algorithm, and with the more recent RepeatScout. The results show that RepeatSearcher assigns exact boundaries to every repeat sequence and, compared with the other algorithms, reduces running time without loss of accuracy.

10.
Proto-organisms probably were randomly aggregated nets of chemical reactions. The hypothesis that contemporary organisms are also randomly constructed molecular automata is examined by modeling the gene as a binary (on-off) device and studying the behavior of large, randomly constructed nets of these binary “genes.” The results suggest that, if each “gene” is directly affected by two or three other “genes,” then such random nets: behave with great order and stability; undergo behavior cycles whose length predicts cell replication time as a function of the number of genes per cell; possess different modes of behavior whose number per net roughly predicts the number of cell types in an organism as a function of its number of genes; and under the stimulus of noise are capable of differentiating directly from any mode of behavior to at most a few other modes of behavior. Cellular differentiation is modeled as a Markov chain among the modes of behavior of a genetic net. The possibility of a general theory of metabolic behavior is suggested. Analytic approaches to the behavior of switching nets are discussed in Appendix 1, and some implications of the results for the origin of self-replicating macromolecular systems are discussed in Appendix 6.
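
The model is easy to reproduce in miniature: a random net of n binary "genes", each reading k = 2 others through a random Boolean function, iterated synchronously until the state revisits itself. A minimal sketch (illustrative parameters; the returned value is the length of the "behavior cycle" the text relates to cell replication time):

```python
import random

def random_boolean_net(n, k, rng):
    """Each gene gets k random input genes and a random Boolean function."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    nxt = []
    for ins, tab in zip(inputs, tables):
        idx = 0
        for j in ins:                       # pack input bits into a table index
            idx = (idx << 1) | state[j]
        nxt.append(tab[idx])
    return tuple(nxt)

def attractor_cycle_length(n=100, k=2, seed=7):
    rng = random.Random(seed)
    inputs, tables = random_boolean_net(n, k, rng)
    state = tuple(rng.randint(0, 1) for _ in range(n))
    seen, t = {}, 0
    while state not in seen:               # deterministic dynamics must cycle
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return t - seen[state]

print(attractor_cycle_length())            # typically short for k = 2, as claimed
```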

11.
《Knowledge》2002,15(3):169-175
Clustered linear regression (CLR) is a new machine learning algorithm that improves the accuracy of classical linear regression by partitioning the training space into subspaces. CLR makes some assumptions about the domain and the data set. First, the target value is assumed to be a function of the feature values. Second, there are linear approximations for this function in each subspace. Finally, there are enough training instances to determine the subspaces and their linear approximations successfully. Tests indicate that when these assumptions hold, CLR outperforms other well-known machine-learning algorithms. Partitioning may continue until a linear approximation fits all the instances in the training set, which generally occurs when the number of instances in a subspace is less than or equal to the number of features plus one. Otherwise, each new subspace will have a better-fitting linear approximation. However, continued partitioning causes overfitting and gives less accurate results for the test instances; partitioning should stop when the relative error shows no significant decrease, or increases. CLR uses a small portion of the training instances to determine the number of subspaces. The need for a large number of training instances makes this algorithm suitable for data mining applications.
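
A compact sketch of the idea (the abstract does not spell out CLR's partitioning rule, so plain k-means on the features stands in for the subspace selection; all names are illustrative):

```python
import numpy as np

def fit_clr(X, y, n_clusters=4, iters=20, seed=0):
    """Partition the training space, then fit ordinary least squares per
    subspace. Clusters with fewer than n_features + 1 instances are skipped,
    mirroring the stopping condition described in the abstract."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(iters):                                     # k-means partitioning
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    models = {}
    for c in range(n_clusters):                                # per-subspace OLS
        Xc, yc = X[labels == c], y[labels == c]
        if len(Xc) >= X.shape[1] + 1:
            A = np.c_[Xc, np.ones(len(Xc))]                    # bias column
            models[c], *_ = np.linalg.lstsq(A, yc, rcond=None)
    return centers, models

def predict_clr(x, centers, models):
    for c in np.argsort(((centers - x) ** 2).sum(-1)):         # nearest fitted subspace
        if int(c) in models:
            w = models[int(c)]
            return x @ w[:-1] + w[-1]

X = np.random.default_rng(1).uniform(-1, 1, (200, 1))
y = np.where(X[:, 0] < 0, -2 * X[:, 0], 3 * X[:, 0])           # piecewise-linear target
centers, models = fit_clr(X, y, n_clusters=2)
print(predict_clr(np.array([0.5]), centers, models))           # close to 1.5
```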

12.
《Computers & chemistry》1992,16(2):135-143
Repetitive sequences are ubiquitous in the DNA of eukaryotes, some as tandem arrays and others interspersed widely in the genome. Repetitive sequences have special roles in genome evolution, which increasingly detailed sequence information is helping to elucidate. Processes, including meiotic crossing over (equal and unequal), unequal mitotic sister chromatid exchange, gene conversion and transposition, with or without multiplication, can foster homogeneity of the members of a repeat family (concerted evolution) and turnover of the whole genome. Some examples are considered. Tandem repeats, satellite and minisatellite sequences are considered as well as telomeric repeats. For a minisatellite locus, in which the frequency of length mutations has been measured, comparisons are made with expectations due to unequal sister chromatid exchange, and qualitative agreement is found. Interspersed repeats, of which Alu sequences are discussed as an important example, can through unequal recombination lead to a loss or duplication of the DNA between the recombination sites and hence to genetic disease or to gene duplication. It is argued that the rate of unequal Alu recombination may be quite high in the human genome.
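
One of the processes discussed, unequal sister chromatid exchange acting on a tandem array, can be caricatured in a few lines (a deliberately minimal toy under assumed symmetric misalignment, not the quantitative comparison made in the text):

```python
import random

def unequal_sce(n0=20, events=100_000, seed=0):
    """Toy tandem-array model: two sister arrays of n repeats misalign by d
    units; the exchange yields daughters with n+d and n-d repeats, one of
    which is sampled. A lineage that loses the array restarts at n0."""
    rng = random.Random(seed)
    n, lengths = n0, []
    for _ in range(events):
        d = rng.randint(1, max(1, n - 1))        # misalignment offset
        n = n + d if rng.random() < 0.5 else n - d
        if n < 1:
            n = n0                               # array lost; new founder
        lengths.append(n)
    return lengths

lengths = unequal_sce()
print(min(lengths), sum(lengths) / len(lengths), max(lengths))
```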

13.
Effective data management is an important issue in large-scale distributed environments such as distributed DBMSs, Peer-to-Peer (P2P) systems, data grids, and the World Wide Web (WWW). This can be achieved by using a replication protocol, which efficiently decreases the communication cost and increases data availability. The Tree Quorum protocol is one of the representative replication protocols, allowing low read cost in the best case, but it has drawbacks: the number of replicas grows rapidly as the level increases, and the root replica is a bottleneck. The Grid protocol requires a fixed operation cost regardless of the failure condition. In this paper we propose a new replication protocol called the Dynamic Hybrid protocol, which efficiently improves on the existing protocols. The proposed protocol combines the grid and tree structures so that the overall topology can be flexibly adjusted using three configuration parameters: tree height, number of descendants, and grid depth. For high read availability, the tree height and number of descendants are decreased and the grid depth is increased. For high write availability, the tree height and grid depth are decreased, while the number of descendants is increased. We present an analytical model of read/write availability and the average number of nodes accessed for each operation. We also employ computer simulation to estimate the throughput and communication overhead. The proposed protocol always incurs much lower communication and operation costs than earlier protocols.
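
To give a flavor of the read/write availability analysis such protocols admit, here is the recursion for the classic tree quorum building block (a sketch only; the Dynamic Hybrid protocol's combined grid/tree analysis is not reproduced). p is node availability, h the tree height, d the number of descendants per node:

```python
from math import comb

def majority_ok(q, d):
    """P(at least a majority of d independent children succeed),
    each child succeeding with probability q."""
    need = d // 2 + 1
    return sum(comb(d, i) * q**i * (1 - q)**(d - i) for i in range(need, d + 1))

def read_avail(p, h, d):
    # Read quorum: the root, OR a majority of child subtrees' read quorums.
    if h == 0:
        return p
    return p + (1 - p) * majority_ok(read_avail(p, h - 1, d), d)

def write_avail(p, h, d):
    # Write quorum: the root AND a majority of child subtrees' write quorums.
    if h == 0:
        return p
    return p * majority_ok(write_avail(p, h - 1, d), d)

# node availability 0.9, height 2, three descendants per node
print(read_avail(0.9, 2, 3), write_avail(0.9, 2, 3))
```

The recursion makes the tuning knobs visible: raising h or d pushes read availability up while the root factor p multiplies through every write quorum, which is exactly the asymmetry the Dynamic Hybrid parameters are meant to balance.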

14.
We consider the problem of designing an efficient index for approximate pattern matching. Despite ongoing efforts, this topic is still a challenge in combinatorial pattern matching. We present a new data structure that reports all matches in worst-case time O(|Σ|m + occ), which is linear in the pattern length m and the number of occurrences occ for alphabets of constant size |Σ|. Our index uses O(n|Σ| log n) extra space on average and with high probability, where n is the length of the text (indexing) or the number of strings to index (dictionary lookup).
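
For the dictionary-lookup variant, a simple baseline makes the task concrete (a wildcard-neighborhood index for one mismatch; this is not the paper's data structure, just an illustration of what "report all matches" means, using O(nm) index entries for n words of length m):

```python
from collections import defaultdict

def build_index(words):
    """Index every word under each of its single-wildcard variants."""
    idx = defaultdict(list)
    for w in words:
        for i in range(len(w)):
            idx[(i, w[:i] + '*' + w[i + 1:])].append(w)
    return idx

def query(idx, p):
    """All indexed words within Hamming distance 1 of a same-length pattern."""
    hits = set()
    for i in range(len(p)):
        hits.update(idx.get((i, p[:i] + '*' + p[i + 1:]), []))
    return hits

idx = build_index(["cat", "car", "dog"])
print(query(idx, "cag"))   # {'cat', 'car'}
```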

15.
In this paper we consider three fundamental communication problems on the star interconnection network: the problem of simultaneous broadcasting of a message from every node to all other nodes, or multinode broadcasting, the problem of a single node sending distinct messages to each one of the other nodes, or single-node scattering, and finally the problem of each node sending distinct messages to every other node, or total exchange. All of these problems are studied under two different assumptions: the assumption that each node can transmit a message of fixed length to one of its neighbors and simultaneously receive a message of fixed length from one of its neighbors (not necessarily the same one) at each time step, or single-link availability, and the assumption that each node can exchange messages of fixed length with all of its neighbors at each time step, or multiple-link availability. In both cases communication is assumed to be bidirectional. The cases where the originating processor wishes to send only one or more than one message to each one of the other processors are distinguished when necessary. Lower bounds are derived for these problems under the stated assumptions, and optimal algorithms are designed in terms of both time and number of message transmissions. Although the algorithms derived for the first two problems require the minimum amount of the above resources, the algorithm designed for the total exchange problem is optimal only to within a multiplicative factor. All the communication algorithms presented in this paper are based on the construction of spanning trees with special properties on the star graph to fit different communication needs. A special framework is developed to facilitate the construction of these trees. The scheduling disciplines that lead to optimal results in each case are described.

16.
We consider a variant of the graph searching games that models the routing reconfiguration problem in WDM networks. In the digraph processing game, a team of agents aims at processing, or clearing, the vertices of a digraph D. We are interested in two different measures: (1) the total number of agents used, and (2) the total number of vertices occupied by an agent during the processing of D. These measures, respectively, correspond to the maximum number of simultaneous connections interrupted and to the total number of interruptions during a routing reconfiguration in a WDM network. Previous works have studied the problem of independently minimizing each of these parameters. In particular, the corresponding minimization problems are APX-hard, and the first one is known not to be in APX. In this paper, we give several complexity results and study tradeoffs between these conflicting objectives. In particular, we show that minimizing one of these parameters while the other is constrained is NP-complete. Then, we prove that there exist digraphs for which minimizing one of these objectives arbitrarily impairs the quality of the solution for the other one. We show that such bad tradeoffs may happen even for a basic class of digraphs. On the other hand, we exhibit classes of graphs for which good tradeoffs can be achieved. We finally detail the relationship between this game and the routing reconfiguration problem. In particular, we prove that any instance of the processing game, i.e. any digraph, corresponds to an instance of the routing reconfiguration problem.

17.
Blanchet-Sadri et al. have shown that Avoidability, the problem of deciding the avoidability of a finite set of partial words over an alphabet of size k ≥ 2, is NP-hard [F. Blanchet-Sadri, R. Jungers, J. Palumbo, Testing avoidability on sets of partial words is hard, Theoret. Comput. Sci. 410 (2009) 968-972]. Building on their work, we analyze in this paper the complexity of natural variations on the problem. While some of them are NP-hard, others are shown to be efficiently decidable. Using combinatorial properties of de Bruijn graphs, we establish a correspondence between lengths of cycles in such graphs and periods of avoiding words, resulting in a tight bound for periods of avoiding words. We also prove that Avoidability can be solved in polynomial space, and reduces in polynomial time to the problem of deciding the avoidability of a finite set of partial words of equal length over the binary alphabet. We give a polynomial bound on the period of an infinite avoiding word, in the case of sets of full words, in terms of two parameters: the length and the number of words in the set. We give a polynomial-space algorithm to decide if a finite set of partial words is avoided by a non-ultimately-periodic infinite word. The same algorithm also decides whether the number of words of length n avoiding a given finite set of partial words grows polynomially or exponentially with n.
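
For the equal-length, full-word special case the abstract reduces to, avoidability has a classical de Bruijn-graph test, which also shows where the cycle-length/period correspondence comes from. A sketch (full words only; holes in partial words are not handled here):

```python
from itertools import product

def avoidable(forbidden, alphabet="ab"):
    """An infinite word over `alphabet` avoiding every word in `forbidden`
    (all of equal length L >= 2) exists iff the de Bruijn graph of order L-1,
    keeping only edges whose length-L label is allowed, contains a cycle."""
    L = len(next(iter(forbidden)))
    assert L >= 2 and all(len(w) == L for w in forbidden)
    nodes = {''.join(p) for p in product(alphabet, repeat=L - 1)}
    succ = {u: [u[1:] + c for c in alphabet if u + c not in forbidden]
            for u in nodes}
    alive, changed = set(nodes), True
    while changed:                    # strip vertices with no live successor
        changed = False
        for u in list(alive):
            if not any(v in alive for v in succ[u]):
                alive.discard(u)
                changed = True
    return bool(alive)                # nonempty core => a cycle => avoidable

print(avoidable({"aa", "bb"}))        # True: (ab)^infinity avoids both
print(avoidable({"aa", "ab", "bb"}))  # False: every infinite binary word hits one
```

A cycle of length c in the surviving graph spells a periodic avoiding word of period c, which is the correspondence the abstract exploits to bound periods of avoiding words.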

18.
Using 249 pairs of SSR primers distributed across the 12 rice chromosomes, DNA was amplified from nine rice varieties of different types; 39 SSR primer pairs yielding three or more specific amplified bands were selected, six of which could reveal indica-japonica polymorphism. Based on the amplification results of the selected primers across the nine rice varieties, we present a preliminary analysis of the polymorphism of the SSR markers and of their potential for rice fingerprinting, indica-japonica differentiation analysis, and cluster analysis.

19.
High performance computing can be well supported by Grid or cloud computing systems. However, these systems have to cope with failure risks: data is stored on “unreliable” storage nodes that can leave the system at any moment, and the nodes’ network bandwidth is limited. In this setting, the basic way to assure data reliability is to add redundancy using either replication or erasure codes. Compared to replication, erasure codes are more space efficient. Erasure codes break data into blocks, encode these blocks, and distribute them to different storage nodes. When storage nodes permanently or temporarily abandon the system, new redundant blocks must be created to guarantee the data reliability, which is referred to as repair. Later, when churned nodes rejoin the system, the blocks stored in these nodes can be reintegrated into the data group to enhance data reliability. For “classical” erasure codes, generating a new block requires transmitting k blocks over the network, which causes heavy repair traffic, high computational complexity, and a high failure probability for the repair process. A near-optimal erasure code named Hierarchical Codes has been proposed that can significantly reduce the repair traffic by reducing the number of nodes participating in the repair process, which is referred to as the repair degree d. To overcome the complexity of reintegration and provide adaptive reliability for Hierarchical Codes, we refine two concepts called location and relocation, and then propose an integrated maintenance scheme for the repair process. Our experiments show that Hierarchical Codes are the most robust redundancy scheme for the repair process compared to other well-known coding schemes.
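
The repair-degree idea can be illustrated with a toy "local parity" layout (a sketch of the locality principle only, not the Hierarchical Codes construction): grouping blocks and adding a per-group XOR parity lets a lost block be rebuilt from d = group-size blocks instead of the k blocks a classical code would transfer.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def encode_local_groups(data_blocks, group_size):
    """Split blocks into groups of `group_size`, each with one XOR parity.
    Repairing a lost block then needs only its group's survivors (d = group_size)."""
    groups = [data_blocks[i:i + group_size]
              for i in range(0, len(data_blocks), group_size)]
    return [(g, xor_blocks(g)) for g in groups]

def repair(group, parity, lost_index):
    # XOR of the parity with the surviving group members restores the block.
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return xor_blocks(survivors + [parity])

data = [bytes([i]) * 4 for i in range(6)]        # six 4-byte blocks
(g0, p0), (g1, p1) = encode_local_groups(data, 3)
assert repair(g0, p0, lost_index=1) == data[1]   # rebuilt from 3 blocks, not 6
```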

20.
Data stream management has attracted the attention of many researchers in recent years. The reason is that numerous devices generate huge amounts of data, demanding an efficient processing scheme for delivering high quality applications. Data are reported through streams and stored in a number of partitions. Separation techniques facilitate the parallel management of data, while intelligent methods are necessary to manage these multiple instances of data. Progressive analytics over huge amounts of data can be adopted to deliver partial responses and, possibly, to save time in the execution of applications. An interesting research domain is the efficient management of queries over multiple partitions. Usually, such queries demand responses in the form of ordered sets of objects (e.g., top-k queries). These ordered sets include objects in ranked order and require novel mechanisms for deriving responses based on partial results. In this paper, we study a setting of multiple data partitions and propose an intelligent, uncertainty-driven decision making mechanism that aims to respond to streams of queries. Our mechanism delivers an ordered set of objects over a number of partial ordered subsets retrieved from each partition of data. We envision that a number of query processors are placed in front of the partitions and report progressive analytics to a Query Controller (QC). The QC receives queries, assigns the task to the underlying processors, and decides the right time to deliver the final ordered set to the application. We propose an aggregation model for deriving the final ordered set of objects and a Fuzzy Logic (FL) inference process. We present a Type-2 FL system that decides when the QC should stop aggregating partial subsets and return the final response to the application. We report on the performance of the proposed mechanism through a large set of experiments. Our results concern the throughput of the QC, the quality of the final ordered set of objects, and the time required for delivering the final response.
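
A Fagin-style threshold merge gives a concrete feel for how a controller can stop aggregating partial ordered subsets early (an illustrative stand-in: the paper's own aggregation model and Type-2 FL stopping rule are not reproduced; here missing scores are assumed 0 and partition scores are summed):

```python
def threshold_topk(lists, k):
    """Merge per-partition (object, score) lists, each sorted by descending
    score, descending one depth level at a time; stop once the k-th best
    aggregated object beats the best score any unseen object could reach."""
    lookup = [dict(l) for l in lists]          # random access per partition
    seen = {}                                  # object -> aggregated score
    depth = 0
    while True:
        frontier = []
        for l in lists:
            if depth < len(l):
                obj, score = l[depth]
                frontier.append(score)
                if obj not in seen:            # aggregate on first sight
                    seen[obj] = sum(t.get(obj, 0.0) for t in lookup)
        top = sorted(seen.values(), reverse=True)[:k]
        if not frontier or (len(top) == k and top[-1] >= sum(frontier)):
            return sorted(seen.items(), key=lambda kv: -kv[1])[:k]
        depth += 1

p1 = [("a", 0.9), ("b", 0.8), ("c", 0.1)]      # partition 1's progressive ranking
p2 = [("b", 0.7), ("a", 0.6), ("c", 0.5)]      # partition 2's progressive ranking
print(threshold_topk([p1, p2], k=2))           # [('a', 1.5), ('b', 1.5)]
```

In the example the merge stops at depth 1 without ever fully scanning either partition, which is the early-termination behavior the QC's stopping rule is designed to capture.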
