1.
In this paper, we address the biological sequence alignment problem, one of the most commonly used steps in bioinformatics applications. We employ the Divisible Load Theory (DLT) paradigm, which is suitable for handling large-scale processing on network-based systems, to achieve a high degree of parallelism. Using the DLT paradigm, we propose a strategy in which we carefully partition the computational workload among the processors in the system so as to minimize the overall time to determine the maximum similarity between DNA/protein sequences. We consider handling such a computational problem on networked computing platforms connected as a linear daisy chain. We derive the individual load quantum to be assigned to each processor according to the computation and communication link speeds along the chain. We consider two cases of sequence alignment in which post-processing, i.e., the trace-back required to determine an optimal alignment, may or may not be done at individual processors in the system. We derive critical conditions to determine whether our strategies are able to yield an optimal processing time. We apply three different heuristic strategies proposed in the literature to generate sub-optimal processing times when the above conditions cannot be satisfied. To validate the proposed schemes, we use real-life DNA samples of the house mouse mitochondrion and the human mitochondrion obtained from the public database GenBank [GenBank, http://www.ncbi.nlm.nih.gov] in our simulation experiments. Through this study, we conclusively demonstrate the applicability and potential of the DLT paradigm for such biological sequence-related computational problems.
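The load-partitioning principle behind this kind of DLT scheme can be sketched numerically. The sketch below is a minimal model, not the paper's own derivation: it assumes a store-and-forward linear chain in which communication to a successor overlaps with local computation, and solves the classic equal-finish-time recursion for the load fractions. `w`, `z`, `Tcp` and `Tcm` are assumed inverse-speed parameters.

```python
import numpy as np

def dlt_fractions(w, z, Tcp=1.0, Tcm=1.0):
    """Load fractions alpha_i for a linear daisy chain under the classic
    DLT optimality principle: all processors stop computing at the same
    instant.  w[i]: inverse computation speed of P_i; z[i]: inverse speed
    of the link P_i -> P_{i+1} (so len(z) == len(w) - 1)."""
    n = len(w)
    A = np.zeros((n, n))
    b = np.zeros(n)
    # Equal-finish-time recursions:
    #   alpha_i * w_i * Tcp = (sum_{j>i} alpha_j) * z_i * Tcm
    #                         + alpha_{i+1} * w_{i+1} * Tcp
    for i in range(n - 1):
        A[i, i] = w[i] * Tcp
        A[i, i + 1] -= w[i + 1] * Tcp
        A[i, i + 1:] -= z[i] * Tcm
    A[n - 1, :] = 1.0          # normalization: fractions sum to 1
    b[n - 1] = 1.0
    return np.linalg.solve(A, b)

# Three equal processors, moderately slow links:
alpha = dlt_fractions(w=[1.0, 1.0, 1.0], z=[0.2, 0.2])
print(alpha)
```

Slower links (larger `z`) shift load toward the head of the chain, matching the intuition that forwarding work far down a daisy chain becomes progressively less profitable.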
2.
We discuss UIO (Unique Input/Output) sequences for linear finite automata. We obtain a necessary and sufficient condition for an arbitrary state of a linear finite automaton to have a UIO sequence of a given length, and a necessary and sufficient condition for all states to have UIO sequences. We also give an upper bound on the length of the shortest UIO sequence of a state, for linear finite automata whose states have UIO sequences. Finally, two algorithms are given: one that decides whether every state of a linear finite automaton has a UIO sequence, and one that computes the UIO sequences.
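For linear automata the existence question has a clean linear-algebra reading: if x' = Ax + Bu and y = Cx + Du over GF(2), the output difference between two starting states under the same input is C A^t (x1 - x2), so distinguishability does not depend on the chosen input and reduces to an observability check. The sketch below illustrates that check on a toy GF(2) model; the matrices are hypothetical examples, and the paper's own conditions and algorithms are not reproduced.

```python
import numpy as np

def gf2_rank(M):
    """Rank of a 0/1 matrix over GF(2) by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    rows, cols = M.shape
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]       # move pivot row up
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] = (M[r] + M[rank]) % 2       # eliminate column c
        rank += 1
    return rank

def all_states_have_uio(A, C):
    """For x' = Ax + Bu, y = Cx + Du over GF(2), the output difference of
    two states is C A^t (x1 - x2), independent of the input; hence every
    state has a UIO sequence iff the observability matrix
    [C; CA; ...; CA^{n-1}] has full rank n."""
    n = A.shape[0]
    blocks, P = [], np.eye(n, dtype=int)
    for _ in range(n):
        blocks.append(C @ P % 2)
        P = P @ A % 2
    return gf2_rank(np.vstack(blocks)) == n

A = np.array([[0, 1], [1, 1]])
C = np.array([[1, 0]])
print(all_states_have_uio(A, C))  # observable pair -> True
```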
3.
Encoding and processing information in DNA-, RNA- and other biomolecule-based devices is an important requirement for DNA-based computing, with potentially important applications. To make DNA computing more reliable, much work has focused on designing good DNA sequences. However, this is a demanding task, as the encoding problem is NP-hard. In this paper, a new methodology based on the IWO (Invasive Weed Optimization) algorithm is developed to optimize encoding sequences. First, mathematical models of constrained objective optimization design for encoding problems, based on thermodynamic criteria, are set up. Then, a modified IWO method is developed by redefining the colonizing behavior of weeds, overcoming the obstacle that the original IWO algorithm cannot be applied to discrete problems directly. The experimental results show that the proposed method is effective and convenient for designing and selecting effective DNA sequences in silico for controllable DNA computing.
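A rough picture of sequence-set optimization of this kind can be given with a toy objective. The sketch below is only a loose stand-in: GC-content deviation serves as a proxy for the thermodynamic (melting-temperature) criteria, pairwise Hamming distance as a proxy for cross-hybridization, and the search loop is merely IWO-flavoured (mutation-based seed scattering with survival of the fittest), not the paper's modified IWO.

```python
import itertools, random

BASES = "ACGT"

def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def fitness(pool, target_gc=0.5, min_dist=6):
    """Toy objective: GC content near a target plus a penalty whenever two
    sequences are closer than min_dist.  Lower is better."""
    gc_pen = sum(abs(gc_content(s) - target_gc) for s in pool)
    dist_pen = sum(max(0, min_dist - hamming(a, b))
                   for a, b in itertools.combinations(pool, 2))
    return gc_pen + dist_pen

def mutate(seq, rng):
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(BASES) + seq[i + 1:]

def iwo_like_search(n_seqs=4, length=12, iters=300, seed=0):
    """IWO-flavoured loop: the current pool scatters mutated 'seeds' and
    the fitter pool survives to the next generation."""
    rng = random.Random(seed)
    best = ["".join(rng.choice(BASES) for _ in range(length))
            for _ in range(n_seqs)]
    best_f = fitness(best)
    for _ in range(iters):
        seeds = [mutate(s, rng) if rng.random() < 0.5 else s for s in best]
        f = fitness(seeds)
        if f < best_f:
            best, best_f = seeds, f
    return best, best_f

pool, score = iwo_like_search()
print(pool, score)
```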
4.
This paper addresses computational prediction of protein structural classes. Although progress has been made in this field in recent years, the main drawback of the published prediction methods is the limited scope of their comparison procedures, which in some cases were also improperly performed. Two examples include using protein datasets of varying homology, which has a significant impact on prediction accuracy, and comparing methods in pairs using different datasets. Based on extensive experimental work, the main aim of this paper is to revisit and reevaluate the state of the art in this field. To this end, the paper performs a first-of-its-kind comprehensive and multi-goal study, which includes investigation of eight prediction algorithms, three protein sequence representations, three datasets with different homologies, and three test procedures. The quality of several previously unused prediction algorithms, a newly proposed sequence representation, and a new-to-the-field testing procedure are evaluated. Several important conclusions and findings are made. First, the logistic regression classifier, which was not previously used, is shown to perform better than other prediction algorithms, and the high quality of previously used support vector machines is confirmed. The results also show that the proposed new sequence representation improves the accuracy of the high-quality prediction algorithms, while it does not improve results of the lower-quality classifiers. The study shows that the commonly used jackknife test is computationally expensive, and therefore a computationally less demanding 10-fold cross-validation procedure is proposed; the results show no statistically significant difference between the two procedures. The experiments show that sequence homology has a very significant impact on prediction accuracy, i.e., using highly homologous datasets results in higher accuracies. Thus, results of several past studies that use homologous datasets should not be perceived as reliable. The best achieved prediction accuracy for low-homology datasets is about 57% and confirms results reported by Wang and Yuan [How good is the prediction of protein structural class by the component-coupled method? Proteins 2000;38:165–175]. For a highly homologous dataset, instance-based classification is shown to be better than the previously reported results: it achieved 97% prediction accuracy, demonstrating that homology is a major factor that can result in overestimated prediction accuracy.
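The mechanics of the 10-fold cross-validation procedure favoured by the study can be sketched as follows. The classifier here is a simple nearest-centroid rule standing in for logistic regression (which would need an iterative fit), and the data are synthetic two-class Gaussian blobs; only the fold-splitting protocol mirrors the paper.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def nearest_centroid_cv(X, y, k=10):
    """k-fold cross-validation: each fold is held out once while the
    remaining folds supply the class centroids."""
    folds = kfold_indices(len(X), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classes = np.unique(y[train])
        cents = np.stack([X[train][y[train] == c].mean(axis=0)
                          for c in classes])
        # distance of every test point to every centroid
        d = np.linalg.norm(X[test][:, None, :] - cents[None], axis=2)
        pred = classes[d.argmin(axis=1)]
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

# Two well-separated Gaussian blobs: accuracy should be near 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(6, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
acc = nearest_centroid_cv(X, y)
print(acc)
```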
5.
A program to calculate the optimum alignment between two sequences, which may be DNA, amino acid, or other information, has been written in PASCAL. Sellers' algorithm for calculating the distance between sequences has been modified to reduce its demands on microcomputer memory space by more than half. Gap penalties and mismatch scores are user-adjustable. In 48 K of memory the program aligns sequences up to 170 elements in length; the optimum alignment and total distance between a pair of sequences are displayed. The program aligns longer sequences by subdividing both sequences into corresponding, overlapping sections. Section length and the amount of section overlap are user-defined. More importantly, extending this modification of Sellers' algorithm, given hardware and compilers/languages capable of using a larger memory space (e.g. 640 K), makes it possible to align, without subdivision, sequences with up to 700 elements each. The increase in computation time with increasing sequence length aligned without subdivision is curvilinear, but total times are essentially dependent on the hardware/language/compiler combination. The statistical significance of an alignment is examined with conventional Monte Carlo approaches.
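The kind of memory saving described above can be illustrated with a distance computation kept to two dynamic-programming rows, so that space grows with one sequence length rather than with the full matrix. This is a generic unit-cost edit-distance sketch, not the program's actual scoring scheme (its gap penalties and mismatch scores are user-adjustable):

```python
def sellers_distance(a, b, mismatch=1, gap=1):
    """Sequence distance in the spirit of Sellers' algorithm, kept to two
    DP rows: only the previous row is retained, so memory is O(len(b))
    instead of O(len(a) * len(b))."""
    prev = [j * gap for j in range(len(b) + 1)]   # distance from empty a
    for i, ca in enumerate(a, 1):
        cur = [i * gap]                           # distance from empty b
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + gap,         # gap in b
                           cur[j - 1] + gap,      # gap in a
                           prev[j - 1] + (0 if ca == cb else mismatch)))
        prev = cur
    return prev[-1]

print(sellers_distance("GATTACA", "GCATGCA"))  # → 3
```

Note that this two-row trick recovers only the distance; reconstructing the alignment itself needs either the full matrix or a divide-and-conquer trace-back.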
6.
The main objective of this paper is to describe a realistic framework for understanding the parallel performance of high-dimensional image processing algorithms in the context of heterogeneous networks of workstations (NOWs). As a case study, this paper explores techniques for mapping hyperspectral image analysis techniques onto fully heterogeneous NOWs. Hyperspectral imaging is a new technique in remote sensing that has gained tremendous popularity in many research areas, including satellite imaging and aerial reconnaissance. The automation of techniques able to transform massive amounts of hyperspectral data into scientific understanding in valid response times is critical for space-based Earth science and planetary exploration. Using an evaluation strategy based on comparing the efficiency achieved by a heterogeneous algorithm on a fully heterogeneous NOW with that evidenced by its homogeneous version on a homogeneous NOW with the same aggregate performance as the heterogeneous one, we develop a detailed analysis of parallel algorithms that integrate the spatial and spectral information in the image data through mathematical morphology concepts. For comparative purposes, performance data for the tested algorithms on Thunderhead (a large-scale Beowulf cluster at NASA's Goddard Space Flight Center) are also provided. Our detailed investigation of the parallel properties of the proposed morphological algorithms provides several intriguing findings that may help image analysts in the selection of parallel techniques and strategies for specific applications.
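One basic ingredient of mapping image analysis onto a heterogeneous NOW is dividing the data in proportion to measured node speeds. The sketch below is an assumption-laden simplification (the paper's algorithms also account for communication costs and the spatial/spectral structure of the data): it partitions image rows among workers using largest-remainder rounding so that every row is assigned exactly once.

```python
def proportional_partition(n_rows, speeds):
    """Split n_rows image rows among heterogeneous workers in proportion
    to their measured speeds, using largest-remainder rounding."""
    total = sum(speeds)
    exact = [n_rows * s / total for s in speeds]   # ideal fractional shares
    counts = [int(e) for e in exact]               # floor of each share
    # hand leftover rows to the workers with the largest fractional parts
    leftovers = sorted(range(len(speeds)),
                       key=lambda i: exact[i] - counts[i], reverse=True)
    for i in leftovers[:n_rows - sum(counts)]:
        counts[i] += 1
    return counts

print(proportional_partition(1000, [3.0, 2.0, 1.0]))  # → [500, 333, 167]
```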
7.
This paper concerns the following problem: given a set of multi-attribute records, a fixed number of buckets, and a two-disk system, arrange the records into the buckets and then store the buckets between the disks in such a way that, over all possible orthogonal range queries (ORQs), the disk access concurrency is maximized. We adopt the multiple key hashing (MKH) method for arranging records into buckets and use the disk modulo (DM) allocation method for storing buckets onto disks. Since the DM allocation method has been shown to be superior to any other allocation method for allocating an MKH file onto a two-disk system for answering ORQs, the real issue is how to determine an optimal way of organizing the records into buckets based upon the MKH concept. A performance formula that can be used to evaluate the average response time, over all possible ORQs, of an MKH file in a two-disk system using the DM allocation method is first presented. Based upon this formula, it is shown that our design problem is related to a notoriously difficult problem, namely the Prime Number Problem. A performance lower bound and an efficient algorithm for designing optimal MKH files in certain cases are then presented. It is pointed out that in some cases the optimal MKH file for ORQs in a two-disk system using the DM allocation method is identical to the optimal MKH file for ORQs in a single-disk system, and the optimal average response time in a two-disk system is slightly greater than one half of that in a single-disk system.
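The DM allocation rule itself fits in a few lines: a bucket addressed by attribute indices (i1, ..., iN) is stored on disk (i1 + ... + iN) mod m, and under a simple cost model an ORQ's response time is the largest number of qualifying buckets that land on any one disk. The sketch below assumes that cost model; the paper's performance formula and optimal bucket design are not reproduced.

```python
from itertools import product

def dm_disk(bucket, n_disks=2):
    """Disk Modulo allocation: bucket (i1, ..., iN) goes to disk
    (i1 + ... + iN) mod n_disks."""
    return sum(bucket) % n_disks

def orq_response(query_ranges, n_disks=2):
    """Response time of an orthogonal range query under DM allocation:
    qualifying buckets are fetched from the disks in parallel, so the
    cost is the maximum number of buckets mapped to any one disk."""
    per_disk = [0] * n_disks
    for bucket in product(*query_ranges):
        per_disk[dm_disk(bucket, n_disks)] += 1
    return max(per_disk)

# A 3x3 query rectangle over a 2-attribute MKH file on two disks:
# 9 qualifying buckets split 5/4 by index-sum parity.
print(orq_response([range(0, 3), range(0, 3)]))  # → 5
```

With one disk the same nine buckets would cost 9 accesses, illustrating why the two-disk optimum sits slightly above half of the single-disk response time.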