Similar Documents
20 similar documents found.
1.
In this paper, we propose distributed algorithms referred to as resource-aware dynamic incremental scheduling (RADIS) strategies. Our strategies are specifically designed to handle large volumes of computationally intensive, arbitrarily divisible loads submitted for processing at cluster/grid systems involving multiple sources and sinks (processing nodes). We consider a real-life scenario wherein the buffer space (memory) available at the sinks (required for holding and processing the loads) varies over time and the loads have deadlines, and we propose efficient "pull-based" scheduling strategies with an admission control policy that ensures the admitted loads are processed within their deadline requirements. The design of our proposed strategies adopts the divisible load paradigm, referred to as divisible load theory (DLT), which has been shown to be efficient in handling large-volume loads. We demonstrate the detailed workings of the proposed algorithms via a simulation study using real-life parameters obtained from a major physics experiment.
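The admission-control idea can be illustrated with a minimal sketch (the class and parameter names below are assumptions for illustration, not the RADIS algorithms themselves, and communication delays are ignored): a newly submitted load is admitted only if the sinks' currently free buffer space and processing rates can absorb it before its deadline.

```python
from dataclasses import dataclass

@dataclass
class Sink:
    free_buffer: float   # units of load the sink can hold right now
    rate: float          # load units processed per second

def admissible(load_size, deadline, sinks):
    """Crude feasibility test: can the load be split so that every sink's
    share fits its free buffer and finishes before the deadline?

    Each sink can usefully take at most min(free_buffer, rate * deadline)
    units; the load is admissible if those capacities cover it."""
    capacity = sum(min(s.free_buffer, s.rate * deadline) for s in sinks)
    return capacity >= load_size

# Example: a 100-unit load with a 10 s deadline and two sinks.
print(admissible(100, 10.0, [Sink(60, 4.0), Sink(80, 7.0)]))  # True: 40 + 70 >= 100
```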

2.
The discrete wavelet transform (DWT) is a powerful signal processing tool, but it comes with a considerable computation cost. In this paper, we consider the problem of parallelizing the DWT computation on loosely-coupled networked systems. We first systematically analyze the data dependencies among DWT computations and identify the partitionable portions, and then, by applying divisible load theory (DLT), we derive a novel strategy for scheduling DWT computation onto bus networks. Our study is the first of its kind in the DLT literature to demonstrate handling the highly coupled, recursive computational nature of this problem to gain a significant speed-up.
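The flavour of such a schedule can be seen in the textbook single-bus DLT model (a generic sketch, not the paper's DWT-specific strategy, which must also respect the wavelet data dependencies): load parts are sent to the workers one after another over the shared bus, and the optimal fractions make all workers finish at the same instant.

```python
def bus_fractions(w, z):
    """Optimal load fractions for n workers on a bus (divisible load theory).

    w[i]: time to compute one unit of load on worker i
    z   : time to transmit one unit of load over the shared bus

    Workers receive their parts sequentially; the optimality principle
    (all workers stop computing at the same instant) gives the recursion
        alpha[i+1] = alpha[i] * w[i] / (z + w[i+1]),
    after which the fractions are normalized to sum to 1.
    """
    alpha = [1.0]
    for i in range(len(w) - 1):
        alpha.append(alpha[i] * w[i] / (z + w[i + 1]))
    total = sum(alpha)
    return [a / total for a in alpha]

# Three workers sharing a unit load: faster workers receive larger fractions.
fracs = bus_fractions(w=[2.0, 3.0, 4.0], z=0.5)
print(fracs, sum(fracs))
```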

3.
In this paper, we address the biological sequence alignment problem, which is one of the most commonly used steps in several bioinformatics applications. We employ the Divisible Load Theory (DLT) paradigm, which is suitable for handling large-scale processing on network-based systems, to achieve a high degree of parallelism. Using the DLT paradigm, we propose a strategy in which we carefully partition the computation workload among the processors in the system so as to minimize the overall computation time of determining the maximum similarity between the DNA/protein sequences. We consider handling such a computational problem on networked computing platforms connected as a linear daisy chain. We derive the individual load quantum to be assigned to the processors according to the computation and communication link speeds along the chain. We consider two cases of sequence alignment in which post-processes, i.e., the trace-back processes required to determine an optimal alignment, may or may not be done at individual processors in the system. We derive critical conditions to determine whether our strategies are able to yield an optimal processing time. We apply three different heuristic strategies proposed in the literature to generate sub-optimal processing times when the above conditions cannot be satisfied. To validate the proposed schemes, we use real-life DNA samples of the house mouse mitochondrion and the human mitochondrion obtained from the public database GenBank [GenBank, http://www.ncbi.nlm.nih.gov] in our simulation experiments. Through this study, we conclusively demonstrate the applicability and potential of the DLT paradigm for such biological sequence-related computational problems.
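The work units being distributed are cells of a dynamic-programming similarity matrix; a minimal sequential Needleman–Wunsch scorer (scoring parameters are illustrative) shows the kind of computation each processor performs on its load quantum.

```python
def similarity(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) between sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

print(similarity("GATTACA", "GCATGCU"))
```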

4.
In this paper, we consider a scheduling problem for divisible loads originating from single or multiple sites on arbitrary networks. We first propose a generalized mathematical model and formulate the scheduling problem as an optimization problem whose objective is to minimize the processing time of the loads. We derive a number of theoretical results on the solution of the optimization problem. On the basis of this first set of results, we propose an efficient algorithm for scheduling divisible loads on an arbitrary network configuration using the concept of load balancing via virtual routing. The proposed algorithm has three attractive features. First, it is simple to realize and can be implemented in a distributed fashion. Second, it avoids the need to explicitly generate a timing diagram for complex networks with arbitrary topologies. Third, it can handle divisible loads originating from both single and multiple sites. When divisible loads originate from a single node, we compare the proposed algorithm with the recently proposed RAOLD algorithm, which is based on a minimum cost spanning tree [J. Yao, V. Bharadwaj, Design and performance analysis of divisible load scheduling strategies on arbitrary graphs, Cluster Computing 7(2) (2004) 191–207]. When divisible loads originate from multiple sites, we test the performance on sparse, medium and densely connected networks. This is the first time in the divisible load theory (DLT) literature that such a generic approach, handling divisible loads originating from multiple sites on arbitrary networks through load balancing via virtual routing, has been attempted.

5.
This paper investigates the real-time scheduling problem for handling heterogeneous divisible loads on cluster systems. Divisible load applications occur in many fields of science and engineering. Such applications can be easily parallelized in a master–worker fashion, but pose several scheduling challenges. We consider divisible loads associated with deadlines to enhance quality-of-service (QoS) and provide performance guarantees in distributed computing environments. In addition, since the divisible loads to be performed may widely vary in terms of their required hardware and software, we capture the loads’ various processing requirements in our load distribution strategies, a unique feature that is applicable for running proprietary applications only on certain eligible processing nodes. Thus in our problem formulation each load can only be processed by certain processors as both the loads and processors are heterogeneous. We propose scheduling algorithms referred to as Requirements-Aware Real-Time Scheduling (RARTS) algorithms, which consist of a novel scheduling policy, referred to as Minimum Slack Capacity First (MSCF), and two multi-round load distribution strategies, referred to as All Eligible Processors (AEP) and Least Capability First (LCF). We perform rigorous performance evaluation studies to quantify the performance of our strategies on a variety of scenarios.

6.
In this paper, we consider the problem of scheduling distributed biological sequence comparison applications. This problem lies in the divisible load framework with negligible communication costs. Thus far, very few results have been proposed for this model. We discuss and select relevant metrics for this framework: namely max-stretch and sum-stretch. We explain the relationship between our model and the preemptive single processor case, and we show how to extend algorithms that have been proposed in the literature for the single processor model to the divisible multi-processor problem domain. We recall known results on closely related problems, we show how to minimize the max-stretch on unrelated machines either in the divisible load model or with preemption, we derive new lower bounds on the competitive ratio of any online algorithm, we present new competitiveness results for existing algorithms, and we develop several new online heuristics. We also address the Pareto optimization of max-stretch. Then, we extensively study the performance of these algorithms and heuristics under realistic scenarios. Our study shows that all previously proposed guaranteed heuristics for max-stretch for the single processor model are inefficient in practice. In contrast, we show that our online algorithms based on linear programming are in practice near-optimal solutions for max-stretch. Our study also clearly suggests heuristics that are efficient for both metrics, although a combined optimization is in theory not possible in the general case.
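The two metrics can be stated precisely: the stretch of a job is its flow time (completion minus release) divided by the time it would need if it ran alone on the whole platform; max-stretch and sum-stretch aggregate this over all jobs. A small helper (field names are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Job:
    release: float      # time the job is submitted
    completion: float   # time the schedule finishes it
    alone_time: float   # time it would need alone on the full platform

def stretches(jobs):
    """Return (max-stretch, sum-stretch) of a completed schedule."""
    s = [(j.completion - j.release) / j.alone_time for j in jobs]
    return max(s), sum(s)

jobs = [Job(0.0, 10.0, 4.0), Job(2.0, 7.0, 5.0)]
print(stretches(jobs))  # (2.5, 3.5)
```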

7.
Multiple sequence alignment is a fundamental problem in bioinformatics. With the rapid growth of biological sequence databases, even good serial algorithms can no longer meet practical needs. This work studies the serial algorithm for the star (center-star) alignment model proposed by Gusfield, improves it in both space and time, and proposes a parallel algorithm for cluster-based parallel machines. Tests on a large amount of gene data show that, for large-scale multiple sequence alignment, the algorithm achieves a high speedup.
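A minimal sketch of the center-selection step in the center-star approach (a plain edit distance stands in for whatever pairwise scoring the serial algorithm uses): choose the sequence whose total distance to all the others is smallest, then align every other sequence against it.

```python
from itertools import combinations

def edit_distance(a, b):
    """Plain Levenshtein distance, used here as the pairwise cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def center_star_center(seqs):
    """Index of the sequence minimizing the total distance to all others."""
    total = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        d = edit_distance(seqs[i], seqs[j])
        total[i] += d
        total[j] += d
    return min(range(len(seqs)), key=total.__getitem__)

seqs = ["ACGTAC", "ACGGAC", "ACTTAC", "TCGTAC"]
print(center_star_center(seqs))  # index of the chosen center sequence
```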

8.
Scheduling divisible MapReduce computations
In this paper we analyze MapReduce distributed computations as a divisible load scheduling problem. The two operations of mapping and reducing can be understood as two divisible applications with precedence constraints. A divisible load model of the computation and two load partitioning algorithms are proposed. Performance limits of MapReduce computations are investigated. To the best of our knowledge, this is the first time that processing applications with precedence constraints have been considered on the grounds of divisible load theory.
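A much-simplified view of this model (not the paper's partitioning algorithms, and ignoring communication): the map load is split among machines in proportion to their speeds, and because of the precedence constraint the reduce phase cannot start until the last map part finishes.

```python
def two_phase_makespan(map_load, reduce_load, speeds):
    """Makespan of a naive schedule for two divisible phases separated by a
    precedence constraint (reduce starts only after all map parts finish).

    speeds[i]: work units per second of machine i.
    Each phase is split proportionally to speed, so every machine finishes
    a phase at the same time: total_load / sum(speeds).
    """
    total_speed = sum(speeds)
    map_time = map_load / total_speed
    reduce_time = reduce_load / total_speed
    return map_time + reduce_time

print(two_phase_makespan(map_load=1200, reduce_load=300, speeds=[10, 20, 30]))  # 25.0
```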

9.
Scheduling is a typical technique used to distribute the load in multiprocessor systems. Usually, the manager (dispatcher or operating system) schedules the tasks so that the average finish time is minimized. Constraints related to the characteristics of the load, such as precedence relations and deadlines, must be taken into consideration. With the ever-increasing application of the divisible task paradigm in image processing and parallel processing, one must also account for the characteristics of the system, such as processor speed, link speed, and processor interconnection topology, when distributing the load. By exploiting queueing theory, we derive different bounds on the arrival rate (load) as a function of link speed, processor speed and the size of tasks. A flow control mechanism for multiprocessor systems with different topologies is embedded in our analysis. Moreover, our model indicates to design engineers, depending on the traffic intensity, which element(s) of a parallel system has to be upgraded or replaced to meet the new load. This, of course, has to be justified by cost considerations.
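The kind of bound involved can be illustrated for a single worker fed by one link (an illustrative stability condition, not the paper's exact expressions): the sustainable task arrival rate is capped by whichever of the link or the processor is the slower server, which also identifies the component worth upgrading.

```python
def max_stable_arrival_rate(bytes_per_task, ops_per_task, link_speed, proc_speed):
    """Upper bound on the task arrival rate (tasks/s) that keeps both the
    link queue and the processor queue stable: lambda must stay below the
    slower of the two service rates.

    bytes_per_task: data moved per task; link_speed in bytes/s
    ops_per_task  : work per task; proc_speed in operations/s
    """
    link_rate = link_speed / bytes_per_task   # tasks/s the link can deliver
    proc_rate = proc_speed / ops_per_task     # tasks/s the processor can finish
    return min(link_rate, proc_rate)

# 1 MB tasks needing 5e8 operations, 100 MB/s link, 1 Gop/s processor:
print(max_stable_arrival_rate(1e6, 5e8, 1e8, 1e9))  # min(100, 2) = 2 tasks/s -> CPU-bound
```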

10.
The scheduling of arbitrarily divisible loads on a distributed system is studied by Divisible Load Theory (DLT). DLT has the underlying assumption that the processors will not cheat. In the real world, this assumption is unrealistic as the processors are owned and operated by autonomous rational organizations that have no a priori motivation for cooperation. Consequently, they will manipulate the algorithms if it benefits them to do so. In this work, we propose strategyproof mechanisms for scheduling divisible loads on three types of bus-connected distributed systems. These mechanisms provide incentives to the processors to obey the prescribed algorithms and to truthfully report their parameters, leading to an efficient load allocation and execution.

11.
As a result of the emerging use of mesh-based multicomputers (and recently mesh-based multiprocessor systems-on-chip), issues related to processor management have attracted much attention. In a mesh-based multiprocessor, after repeated submesh allocations and de-allocations, the system network may become fragmented, i.e., the unallocated nodes may be scattered across the network. As a result, in a system with contiguous processor allocation, no new tasks can start running due to the lack of enough free adjacent processors to form a suitable submesh. Although there might be enough free processors available, they remain idle until the allocator can find a set of adjacent free nodes forming a submesh to be used for the new task. This can lead to low system performance. Task migration was introduced as a solution to this problem: tasks running on some submeshes are migrated to other free areas so as to reduce fragmentation by joining the newly freed areas and free nodes into larger submeshes. In this paper, we propose a novel structured and formulated way to code task migration, which is helpful for congestion detection in the different steps of task migration algorithms. Moreover, considering that 3D mesh-based multicomputers are now very popular, a new task migration algorithm for 3D meshes is proposed. We also address the special case of 2D migration in a 3D mesh multicomputer.

12.
In this paper we provide a brief review of current work in the area of multiple sequence alignment (MSA) for DNA and protein sequences using evolutionary computation (EC). We detail the strengths and weaknesses of EC techniques for MSA. In addition, we present two novel approaches for inferring MSA using genetic algorithms. Our first novel approach utilizes a GA to evolve an optimal guide tree in a progressive alignment algorithm and serves as an alternative to more traditional heuristic techniques such as neighbor-joining. The second novel approach facilitates the optimization of a consensus sequence with a GA using a vertically scalable encoding scheme in which the number of iterations needed to find the optimal solution is approximately the same regardless of the number of sequences being aligned. We compare both of our novel approaches to the popular progressive alignment program Clustal W. Experiments have confirmed that EC constitutes an attractive and promising alternative to traditional heuristic algorithms for MSA.

13.

In distributed computing, divisible load theory provides an important system model for allocating data-intensive computations to processing units working in parallel. The main task is to define how a computation job should be split into parts, to which processors those parts should be allocated, and in which sequence. The model is characterized by multiple parameters describing processor availability in time, transfer times of job parts to processors, their computation times and processor usage costs. The main criteria are usually schedule length and cost minimization. In this paper, we provide a generalized formulation of the problem, combining key features of the divisible load models studied in the literature, and prove its NP-hardness even for unrestricted processor availability windows. We formulate a linear program for the version of the problem with a fixed number of processors. For the case with an arbitrary number of processors, we close the gaps in the study of special cases, developing efficient algorithms for the single-criterion and bicriteria versions of the problem when transfer times are negligible.
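A heavily simplified, single-round cousin of such a linear program (it ignores availability windows, transfer sequencing and usage costs, and assumes every processor can receive its part in parallel): minimize the makespan T subject to each processor finishing its share by T and the shares covering the whole load.

```python
import numpy as np
from scipy.optimize import linprog

def min_makespan(volume, transfer, compute):
    """Single-round divisible-load LP: choose parts alpha[i] >= 0 with
    sum(alpha) == volume so that the makespan T is minimal, where processor i
    needs (transfer[i] + compute[i]) seconds per unit of load it receives.
    Decision variables are [alpha_1, ..., alpha_n, T]."""
    n = len(transfer)
    c = np.zeros(n + 1)
    c[-1] = 1.0                                    # objective: minimize T
    A_ub = np.zeros((n, n + 1))
    for i in range(n):
        A_ub[i, i] = transfer[i] + compute[i]      # finish time of processor i ...
        A_ub[i, -1] = -1.0                         # ... must not exceed T
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, n + 1))
    A_eq[0, :n] = 1.0                              # the whole load is distributed
    b_eq = [volume]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    return res.x[:n], res.x[-1]

parts, makespan = min_makespan(100.0, transfer=[0.1, 0.2, 0.1], compute=[1.0, 0.5, 2.0])
print(parts, makespan)
```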


14.
In the bioinformatics community, it is important to find an accurate and simultaneous alignment among diverse biological sequences that are assumed to have an evolutionary relationship. From the alignment, sequence homology is inferred and the shared evolutionary origins among the sequences are extracted by phylogenetic analysis. This problem is known as the multiple sequence alignment (MSA) problem. In the literature, several approaches have been proposed to solve the MSA problem, such as progressive alignment methods, consistency-based algorithms, or genetic algorithms (GAs). In this work, we propose a hybrid multiobjective evolutionary algorithm based on the behaviour of honey bees for solving the MSA problem, the hybrid multiobjective artificial bee colony (HMOABC) algorithm. HMOABC considers two objective functions with the aim of preserving the quality and consistency of the alignment: the weighted sum-of-pairs function with affine gap penalties (WSP) and the number of totally conserved (TC) columns. To assess the accuracy of HMOABC, we have used the BAliBASE benchmark (version 3.0), which, according to its developers, presents challenging test cases representing the real problems encountered when aligning large sets of complex sequences. Our multiobjective approach is compared with 13 well-known methods from the bioinformatics field and with 6 other evolutionary algorithms published in the literature.
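Both objectives are easy to state on an alignment given as equal-length rows. The sketch below uses an unweighted sum-of-pairs score with a simple linear gap penalty rather than the paper's weighted, affine-gap WSP, so treat the scoring constants as placeholders.

```python
from itertools import combinations

def tc_score(alignment):
    """Number of totally conserved columns: every row has the same residue
    (and it is not a gap)."""
    return sum(1 for col in zip(*alignment)
               if len(set(col)) == 1 and col[0] != '-')

def sp_score(alignment, match=1, mismatch=0, gap=-1):
    """Plain (unweighted, linear-gap) sum-of-pairs score over all columns."""
    score = 0
    for col in zip(*alignment):
        for a, b in combinations(col, 2):
            if a == '-' or b == '-':
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score

aln = ["AC-GT", "ACAGT", "AC-GA"]
print(tc_score(aln), sp_score(aln))
```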

15.
Our goal is to develop a robust out-of-core sorting program for a distributed-memory cluster. The literature contains two dominant paradigms for out-of-core sorting algorithms: merging-based and partitioning-based. We explore a third paradigm, that of oblivious algorithms. Unlike the two dominant paradigms, oblivious algorithms do not depend on the input keys and therefore lead to predetermined I/O and communication patterns in an out-of-core setting. Predetermined I/O and communication patterns facilitate overlapping I/O, communication, and computation for efficient implementation. We have developed several out-of-core sorting programs using the paradigm of oblivious algorithms. Our baseline implementation, 3-pass columnsort, was based on Leighton's columnsort algorithm. Though efficient in terms of I/O and communication, 3-pass columnsort has a restriction on the maximum problem size. As our first effort toward relaxing this restriction, we developed two implementations: subblock columnsort and M-columnsort. Both of these implementations incur substantial performance costs: subblock columnsort performs additional disk I/O, and M-columnsort needs substantial amounts of extra communication and computation. In this paper we present slabpose columnsort, a new oblivious algorithm that we have designed explicitly for the out-of-core setting. Slabpose columnsort relaxes the problem-size restriction at no extra I/O or communication cost. Experimental evidence on a Beowulf cluster shows that unlike subblock columnsort and M-columnsort, slabpose columnsort runs almost as fast as 3-pass columnsort. To the best of our knowledge, our implementations are the first out-of-core multiprocessor sorting algorithms that make no assumptions about the keys and produce output that is perfectly load balanced and in the striped order assumed by the Parallel Disk Model.

16.
Optimal multiprocessor real-time schedulers incur significant overhead for preemptions and migrations. We present RUN, an efficient scheduler that reduces the multiprocessor problem to a series of uniprocessor problems. RUN significantly outperforms existing optimal algorithms with an upper bound of O(log m) average preemptions per job on m processors (fewer than 3 per job in all of our simulated task sets) and reduces to Partitioned EDF whenever a proper partitioning is found.

17.
In this paper, we propose a method for task scheduling and data assignment on heterogeneous hybrid-memory multiprocessor systems for real-time applications. In such a system, an important problem is how to schedule real-time application tasks onto processors and assign their data to the hybrid memories. The hybrid memory consists of dynamic random access memory and solid state drives, and the performance of the solid state drives must be taken into account in the scheduling policy. To solve this problem, we propose two heuristic algorithms, called the improvement greedy algorithm and the data assignment according to task scheduling algorithm, which generate near-optimal solutions for real-time applications in polynomial time. We evaluate the performance of our algorithms by comparing them with a greedy algorithm that is commonly used to solve heterogeneous task scheduling problems. Based on our extensive simulation study, we observe that our algorithms exhibit excellent performance and demonstrate that considering data allocation during task scheduling is significant for saving energy. We conduct experiments on two heterogeneous multiprocessor systems.
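A baseline greedy of the kind such algorithms are compared against can be sketched as follows (the rule and the DRAM/SSD penalty model are assumptions for illustration, not the paper's algorithms): each task goes to the processor where it finishes earliest, and its data stays in DRAM while capacity lasts, spilling to the slower SSD otherwise.

```python
def greedy_schedule(tasks, proc_speeds, dram_capacity, dram_cost=1.0, ssd_cost=5.0):
    """tasks: list of (work, data_size); proc_speeds: work units per second.
    Returns per-processor finish times and the placement of each task.

    Greedy rule: give every task to the processor with the smallest resulting
    finish time; keep its data in DRAM while capacity remains, otherwise SSD,
    which inflates the task's run time by ssd_cost / dram_cost."""
    finish = [0.0] * len(proc_speeds)
    placement = []
    dram_left = dram_capacity
    for work, data in tasks:
        in_dram = data <= dram_left
        if in_dram:
            dram_left -= data
        penalty = 1.0 if in_dram else ssd_cost / dram_cost
        # pick the processor whose finish time grows the least
        p = min(range(len(proc_speeds)),
                key=lambda i: finish[i] + penalty * work / proc_speeds[i])
        finish[p] += penalty * work / proc_speeds[p]
        placement.append(("DRAM" if in_dram else "SSD", p))
    return finish, placement

print(greedy_schedule([(10, 4), (8, 3), (6, 5)], proc_speeds=[2.0, 1.0], dram_capacity=6))
```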

18.
The prime factor algorithm (PFA) is an efficient discrete Fourier transform (DFT) computation algorithm used when the sequence length can be decomposed into mutually prime factors. Following our previous results on PFA decomposition carried out at Caltech on hypercube machines, we present in this paper a pipelined PFA implementation suitable for multiprocessor systems with distributed memory. This implementation achieves high efficiency and speed-up when processing multiple sequences of data. The paper shows how an optimized structure can be obtained when the concurrency between computation and communication is exploited at each node of the pipe. Experimental results obtained on transputer-based structures and on the Intel Touchstone Delta system are also reported.
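The index mappings that make the PFA twiddle-free are the textbook Good-Thomas maps; the sketch below computes a length-N = n1·n2 DFT this way, sequentially and naively, so it illustrates only the decomposition, not the paper's pipelined multiprocessor organization.

```python
import cmath
from math import gcd

def pfa_dft(x, n1, n2):
    """Prime factor (Good-Thomas) DFT of length N = n1*n2 with gcd(n1, n2) = 1.

    Inputs are re-indexed as n = (i1*n2 + i2*n1) mod N and outputs as the CRT
    index with k = k1 (mod n1) and k = k2 (mod n2); the length-N DFT then
    separates into length-n1 and length-n2 DFTs with no twiddle factors."""
    N = n1 * n2
    assert len(x) == N and gcd(n1, n2) == 1
    q1 = pow(n2, -1, n1)   # n2^{-1} mod n1
    q2 = pow(n1, -1, n2)   # n1^{-1} mod n2
    X = [0j] * N
    for k1 in range(n1):
        for k2 in range(n2):
            k = (k1 * n2 * q1 + k2 * n1 * q2) % N      # CRT output index
            s = 0j
            for i1 in range(n1):
                for i2 in range(n2):
                    n = (i1 * n2 + i2 * n1) % N        # Ruritanian input index
                    s += x[n] * cmath.exp(-2j * cmath.pi *
                                          (i1 * k1 / n1 + i2 * k2 / n2))
            X[k] = s
    return X

# Sanity check against numpy's FFT for N = 3 * 4:
import numpy as np
x = np.random.rand(12)
print(np.allclose(pfa_dft(list(x), 3, 4), np.fft.fft(x)))  # expected: True
```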

19.
Cluster computing has become an important paradigm for solving large-scale problems. To enhance the quality of service (QoS) and provide performance guarantees in a cluster computing environment, various real-time scheduling algorithms and workload models have been investigated. Computational loads that can be arbitrarily divided into independent tasks represent many real-world applications. However, the problem of providing performance guarantees to divisible load applications has only recently been studied systematically. In this work, three important and necessary design decisions, (1) workload partitioning, (2) node assignment, and (3) task execution order, are identified for real-time divisible load scheduling. A scheduling framework that can configure different policies for each of the three design decisions is proposed and used to generate various algorithms. This paper systematically studies these algorithms and identifies scenarios where the choices of design parameters have significant effects.

20.
This paper presents an evaluation of the solution quality of heuristic algorithms developed for scheduling multiprocessor tasks on a class of multiprocessor architectures designed to exploit temporal and spatial parallelism simultaneously. More specifically, we deal with multi-level or partitionable architectures where MIMD parallelism and multiprogramming support are the two main characteristics of the system. We investigate scheduling a number of pipelined multiprocessor tasks with arbitrary processing times and arbitrary processor requirements on this system. The scheduling problem consists of two interrelated sub-problems: finding a sequence of the pipelined multiprocessor tasks on a processor, and finding a proper mapping to processors of the tasks that have already been sequenced. Various techniques are available for the second sub-problem; the problem of generating a feasible sequence for the pipelined operations remains. We employ three well-known local search heuristics that are known to be robust methods applicable to various optimization problems: Simulated Annealing, Tabu Search, and Genetic Algorithms. We then conduct computational experiments and evaluate the reduction in completion time achieved by each heuristic. We also compare the results with well-known simple list-based heuristics.
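The sequencing sub-problem lends itself to any of the three local-search methods; a minimal simulated-annealing skeleton over task permutations is sketched below (the cost function is supplied by the caller, e.g. the completion time obtained after mapping the sequenced tasks to processors, and the toy cost here is only a stand-in, not the paper's evaluator).

```python
import math
import random

def simulated_annealing(n_tasks, cost, t0=10.0, cooling=0.995, iters=20000):
    """Search for a task sequence (permutation of range(n_tasks)) with low cost.

    cost: callable mapping a sequence (list of task indices) to a float.
    Neighbourhood: swap two random positions; worse moves are accepted with
    probability exp(-delta / temperature), and the temperature decays geometrically."""
    seq = list(range(n_tasks))
    random.shuffle(seq)
    best, best_cost, cur_cost, temp = seq[:], cost(seq), cost(seq), t0
    for _ in range(iters):
        i, j = random.sample(range(n_tasks), 2)
        seq[i], seq[j] = seq[j], seq[i]
        new_cost = cost(seq)
        if new_cost <= cur_cost or random.random() < math.exp((cur_cost - new_cost) / temp):
            cur_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = seq[:], new_cost
        else:
            seq[i], seq[j] = seq[j], seq[i]   # undo the swap
        temp *= cooling
    return best, best_cost

# Toy cost: prefer sequences close to sorted order (optimum is the identity, cost 0).
print(simulated_annealing(8, cost=lambda s: sum(abs(p - t) for p, t in enumerate(s))))
```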

