Similar literature
 Found 20 similar articles; search took 46 ms
1.
IBM microcomputer programs that analyze DNA sequences for tRNA genes   Total citations: 2 (self-citations: 0, citations by others: 2)
A set of four computer programs that search DNA sequence data files for transfer RNA genes has been written in IBM (Microsoft) BASIC for the IBM personal computer. These programs locate and plot predicted secondary structures of tRNA genes in the cloverleaf conformation. The set of programs is applicable to eukaryotic tRNA genes, including those containing intervening sequences, and to prokaryotic and mitochondrial tRNA genes. In addition, two of the programs search up to 150 residues downstream of tRNA gene sequences for possible eukaryotic RNA polymerase III termination sites composed of at least four consecutive T residues. Molecular biologists studying a variety of gene sequences and flanking regions can use these programs to check whether tRNA genes are also present. Furthermore, investigators studying tRNA gene structure-to-function relationships would not need to do extensive restriction mapping to locate tRNA gene sequences within their cloned DNA fragments.
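The downstream terminator search described in this abstract can be illustrated with a short sketch. It is not the original BASIC code: the function name `find_t_runs` and its parameters are hypothetical, and only the two figures stated in the abstract (a 150-residue window and runs of at least four T residues) are taken from the source.

```python
import re


def find_t_runs(seq, gene_end, window=150, min_run=4):
    """Return (start, length) for each run of >= min_run consecutive T residues
    found within `window` residues downstream of `gene_end`.

    `gene_end` is the 0-based index of the first residue after the tRNA gene.
    Start positions are reported in the coordinates of the full sequence.
    """
    downstream = seq[gene_end:gene_end + window].upper()
    pattern = re.compile("T{%d,}" % min_run)
    return [(gene_end + m.start(), len(m.group()))
            for m in pattern.finditer(downstream)]


# Hypothetical example: a 12-residue "gene" followed by a flanking region.
example = "ACGTACGTACGT" + "GGTTTTTCCATTTAG"
print(find_t_runs(example, gene_end=12))
```

In this example the run of five T residues is reported, while the later run of three is ignored because it is shorter than the minimum.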

2.
Modern methods of automated protein sequence analysis can provide high-quality data with which unambiguous amino-acid sequences can be determined, but analyses are more difficult when the sample is not pure. COMSEQ and auxiliary programs were written to facilitate reconciliation of multiple amino-acid sequences potentially contained in noisy data with the known amino-acid sequence of the parent protein. The COMSEQ program prints a matrix in which the first vertical column represents the known amino-acid sequence of a selected protein. Each row of the matrix contains the sequencer yield corresponding to the amino acid in the first column, with each column corresponding to the sequencing reaction cycle. A diagonal which contains net increases of amino acids for each amino acid in the known sequence identifies a peptide potentially contained within the data. The number of matches for each diagonal over the entire known sequence is tabulated and presented as an aid to locating comparisons of greatest interest. The RNDSEQ program conducts multiple analyses using randomized versions of the known amino-acid sequence and tabulates the cumulative frequencies of potential sequence matches irrespective of the true known sequence. TRANSEQ is a utility program that translates edited sequence data from common databases into files that can be used by COMSEQ and RNDSEQ. The programs have been used successfully to identify two co-sequenced peptides from bovine serum albumin and an albumin peptide sequence in the presence of hemoglobin, and to identify two sequences of rat alpha-2u-globulin that differ in their amino termini.

3.
This paper describes a generic algorithm for finding restriction sites within DNA sequences. The 'genericity' of the algorithm is made possible through the use of set theory. Basic elements of DNA sequences, i.e. nucleotides (bases), are represented in sets, and DNA sequences, whether specific, ambiguous or even protein-coding, are represented as sequences of those sets. The set intersection operation demonstrates its ability to perform pattern-matching correctly on various DNA sequences. The performance analysis showed that the degree of complexity of the pattern matching is reduced from exponential to linear. An example is given to show the actual and potential restriction sites, derived by the generic algorithm, in the DNA sequence template coding for a synthetic calmodulin.
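A minimal sketch of set-based site matching follows. The abstract does not give the paper's exact representation, so the use of IUPAC ambiguity codes, the names `to_sets` and `find_sites`, and the reading of "actual" as a subset test and "potential" as a non-empty intersection are all assumptions made for illustration. Protein-coding input is not covered.

```python
# IUPAC nucleotide codes, each mapped to the set of bases it can stand for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def to_sets(seq):
    """Represent a (possibly ambiguous) DNA sequence as a list of base sets."""
    return [IUPAC[ch] for ch in seq.upper()]


def find_sites(sequence, site, potential=True):
    """Return 0-based start positions where `site` matches `sequence`.

    potential=True : every position only needs a non-empty intersection.
    potential=False: every sequence set must be a subset of the site set,
                     i.e. the site is present whatever the ambiguous bases are.
    """
    seq_sets, site_sets = to_sets(sequence), to_sets(site)

    def compatible(s, p):
        return bool(s & p) if potential else s <= p

    width = len(site_sets)
    return [i for i in range(len(seq_sets) - width + 1)
            if all(compatible(seq_sets[i + j], site_sets[j])
                   for j in range(width))]


print(find_sites("AAGAATTCGG", "GAATTC"))            # EcoRI site, unambiguous
print(find_sites("GANTTC", "GAATTC"))                # potential site only
print(find_sites("GANTTC", "GAATTC", potential=False))
```

For a recognition site of fixed length, the scan performs a constant number of set operations per sequence position, so the running time grows linearly with sequence length.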

4.
The emerging field of synthetic biology moves beyond conventional genetic manipulation to construct novel life forms which do not originate in nature. We explore the problem of designing the provably shortest genomic sequence to encode a given set of genes by exploiting alternate reading frames. We present an algorithm for designing the shortest DNA sequence simultaneously encoding two given amino acid sequences. We show that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression. We also investigate the impact of alternate coding matrices on overlapping sequence design. Finally, we discuss an interesting application for overlapping gene design, namely the interleaving of an antibiotic resistance gene into a target gene inserted into a virus or plasmid for amplification.

5.
A set of programs has been developed for the definition and handling of nucleic acid sequence consensus information. The sequences of known genetic control signals are combined in a matrix. The origins and positions of the signals are recorded. Old matrices can be updated dynamically: new signals are included and obsolete ones deleted. Matrices of several different types are computed optionally. Several of these matrices can be combined to find possible new signals. The use of matrices allows the exact quantification of signal qualities. The described programs are part of a program library named GENEXPERT. Application examples given are the search for tRNA genes and the search for promoters in the bacteriophage lambda genome.
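The idea of a consensus matrix that can be updated and then used to score candidate signals can be sketched as follows. This is not GENEXPERT code: the class `ConsensusMatrix`, its methods, the simple count-based score, and the example signals are hypothetical, and the abstract's other matrix types are not modelled.

```python
from collections import Counter


class ConsensusMatrix:
    """Per-position base counts built from aligned signals of equal length."""

    def __init__(self, length):
        self.length = length
        self.counts = [Counter() for _ in range(length)]
        self.n_signals = 0

    def _check(self, signal):
        if len(signal) != self.length:
            raise ValueError("signal length must equal matrix length")

    def add(self, signal):
        """Include a new signal in the matrix."""
        self._check(signal)
        for column, base in zip(self.counts, signal.upper()):
            column[base] += 1
        self.n_signals += 1

    def remove(self, signal):
        """Delete a previously added (obsolete) signal from the matrix."""
        self._check(signal)
        for column, base in zip(self.counts, signal.upper()):
            column[base] -= 1
        self.n_signals -= 1

    def score(self, window):
        """Fraction of (position, signal) pairs that agree with `window`."""
        self._check(window)
        if self.n_signals == 0:
            raise ValueError("matrix is empty")
        hits = sum(column[base]
                   for column, base in zip(self.counts, window.upper()))
        return hits / (self.length * self.n_signals)

    def best_window(self, sequence):
        """Start index of the highest-scoring window in `sequence`."""
        starts = range(len(sequence) - self.length + 1)
        return max(starts,
                   key=lambda i: self.score(sequence[i:i + self.length]))


matrix = ConsensusMatrix(6)
for signal in ("TATAAT", "TATGAT", "TACAAT"):   # illustrative signals only
    matrix.add(signal)
print(matrix.best_window("GGGTATAATCC"), matrix.score("TATAAT"))
```

Keeping raw counts rather than frequencies is a design choice of this sketch: it makes adding and deleting signals a constant-time update per position.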

6.
An algorithm for searching restriction maps   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper presents an algorithm that searches a DNA restriction enzyme map for regions that approximately match a shorter 'probe' map. Both the map and the probe consist of a sequence of address-enzyme pairs denoting restriction sites, and the algorithm penalizes a potential match for undetected or missing sites and for discrepancies in the distance between adjacent sites. The algorithm was designed specifically for comparing relatively short DNA sequences with a long restriction map, a problem that will become increasingly common as large physical maps are generated. The algorithm has been used to extract information from a restriction map of the entire Escherichia coli genome.

7.
A program has been developed for the modelling of modifications in DNA ends, for the construction of ligated junctions, and for the analysis in these junctions of new restriction enzyme recognition sequences. This program allows the analysis of restriction enzyme specificities in ligated junctions of cohesive or blunt DNA ends. Cohesive ends are considered in their natural configuration or after modification by possible blunt-ending procedures. The program also allows the modelling of partial filling-in for 5'-single-stranded ends. This program has proven useful for the design of sequences with new restriction sites or to predict or confirm the sequence of junctions created by the ligation of modified ends.

8.
We have improved an existing clone database management system written in FORTRAN 77 and adapted it to our software environment. One improvement is that the database can be interrogated for any type of information, not just keywords. Also, recombinant DNA constructions can be represented in a simplified 'shorthand', after which a program assembles the full nucleotide sequence from the contributing fragments, which may be obtained from nucleotide sequence databases. Another improvement is the replacement of the database manager by programs running in batch that maintain the databank and verify its consistency automatically. Finally, graphic extensions are written in Graphical Kernel System to draw linear and circular restriction maps of recombinants. Besides restriction sites, recombinant features can be presented from the feature lines of recombinant database entries, or from the feature tables of nucleotide databases. The clone database management system is fully integrated into the sequence analysis software package from the Pasteur Institute, Paris, and is made accessible through the same menu. As a result, recombinant DNA sequences can be analysed directly by the sequence analysis programs.

9.
A fast general purpose DNA handling program has been developed in BASIC and machine language. The program runs on the Apple II plus or on the Apple IIe microcomputer, without additional hardware except for disk drives and printer. The program allows file insertion and editing, translation into protein sequence, reverse translation, search for small strings and restriction enzyme sites. The homology may be shown either as a comparison of two sequences or through a matrix on screen. Two additional features are: (i) drawing restriction site maps on the printer; and (ii) simulating a gel electrophoresis of restriction fragments both on screen and on paper. All the operations are very fast. The more common tasks are carried out almost instantly; only more complex routines, like finding homology between large sequences or searching and sorting all the restriction sites in a long sequence require longer, but still quite acceptable, times (generally under 30 s).

10.
A computer program has been devised to automate rationalization of peptide fragmentation patterns. The program systematically generates all possible linear amino acid sequences which might be attributable to a peptide with a known amino acid composition. The generated sequences are then searched to find those that most closely match the spectrum of an unknown sequence.
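The enumeration step described here, generating every linear sequence consistent with a known composition, can be sketched as below. The function name `sequences_from_composition` and the one-letter residue codes are assumptions of this example; the spectrum-matching step is not shown because the abstract gives no detail on it.

```python
def sequences_from_composition(composition):
    """Yield every distinct linear sequence for a residue -> count mapping.

    Residues are placed in sorted order, so sequences come out in
    lexicographic order and duplicates are never produced.
    """
    residues = sorted(composition)
    remaining = dict(composition)
    total = sum(remaining.values())

    def extend(prefix):
        if len(prefix) == total:
            yield "".join(prefix)
            return
        for residue in residues:
            if remaining[residue] > 0:
                remaining[residue] -= 1
                prefix.append(residue)
                yield from extend(prefix)
                prefix.pop()
                remaining[residue] += 1

    yield from extend([])


# Hypothetical tripeptide containing two glycines and one alanine.
print(list(sequences_from_composition({"G": 2, "A": 1})))
```

The number of candidates is the multinomial coefficient of the composition, so it grows very quickly with peptide length; a generator keeps memory use flat while candidates are screened one at a time.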

11.
Several interactive Pascal programs have been written for the analysis and display of structural information in nucleic acid sequences. Layout procedures were developed to display the homology and repeat matrices of a sequence and to predict and display the secondary structure of RNA/DNA molecules free of overlap and to predict and display internal repeats. No special plotting devices are required because the output is adapted to line printers. Sequences from several DNA database systems can be used as input. These programs are part of a general nucleic acid sequence analysis package.

12.
Computers & Chemistry, 1997, 21(4): 215-222
As the Human Genome Project enters the large-scale sequencing phase, computational gene identification methods are becoming essential for the automatic analysis and annotation of large uncharacterized genomic sequences. Currently available computer programs relying mainly on sequence coding statistics are of great use in pinpointing regions in genomic sequences containing exons. Such programs perform rather poorly, however, when the problem is to fully elucidate gene structure. For this problem, the DNA sequence signals involved in the specification of the genes (start sites and splice sites) carry a lot of information, and simple methods relying on such information can predict gene structure with an accuracy to some extent comparable to that of other more sophisticated computational methods.

13.
We have developed a program for the graphic representation and manipulation of DNA sequences. The program (named CARTE from the French for 'map') is intended as a tool in the planning and analysis of recombinant DNA experiments. DNA sequences are represented as standard restriction maps, using any desired combination of restriction enzymes. Features of interest, such as promoters or coding sequences, can be highlighted. The sequence can be manipulated to mimic cloning, using deletions, insertions or replacements at specified sites. This process is facilitated by the simultaneous display of a graphic map of the entire sequence, a detailed picture of the work in progress, and a menu of functions.

14.

Studies of biological evolution have generally focused on nucleotide or amino acid sequences of certain genes related to specific enzymes. Most phylogenetic tree constructions have been carried out using amino acid sequences and are used as a predictor to show evolutionary relationships. Phylogenetic analysis is usually performed based on multiple sequence alignment of a gene from different organisms including fungi. A number of programs have been introduced for gene clustering and phylogenetic analysis. For example, the most popular web-based program is Clustal Omega, which is commonly used by biologists. When the number of uploaded sequences increases, this program not only works slowly, but the final constructed cladogram is also confusing and incorrect from an evolutionary point of view. In the present study, we used fungal hexosaminidases (FH), which are extracellular enzymes with many applications in biotechnology but are extremely varied and confusing in evolutionary terms. A standard taxonomy-based phylogenetic tree was constructed for 835 FH amino acid sequences retrieved from the National Center for Biotechnology Information (NCBI) on March 16, 2015. Then a supervised multilayer perceptron (MLP) neural network was used to discriminate FH sequences. Based on the relative frequency of amino acids in FH sequences, 41 neural networks were designed for seven levels from phylum to family. The minimum accuracy of the neural networks was 99% at all seven discrimination levels. As a final step, an additional evaluation was performed on the designed model with 143 newly released FH sequences extracted on July 1, 2015. The clustering results showed a proper match with fungal taxonomy in representing evolutionary relationships.
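The input features named in this abstract, the relative frequencies of amino acids in a sequence, can be computed as in the sketch below. The function name, the fixed ordering of the 20 standard residues, and the choice to ignore non-standard symbols are assumptions; the abstract does not describe the authors' preprocessing, and the neural networks themselves are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def relative_frequencies(sequence):
    """Return a 20-element vector of residue frequencies that sums to 1.

    Symbols outside the 20 standard residues (e.g. X or gaps) are ignored.
    """
    sequence = sequence.upper()
    counts = [sequence.count(residue) for residue in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]


# Toy sequence, not a real hexosaminidase.
vector = relative_frequencies("AACD")
print(vector[:3])
```

A vector of this kind has the same length for every protein, which is what allows sequences of different lengths to be fed to a fixed-size classifier without prior alignment.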

15.
An integrated family of amino acid sequence analysis programs   Total citations: 6 (self-citations: 0, citations by others: 6)
In recent years abundant sequence data have become available due to the rapid progress in protein and DNA sequencing techniques. The exact three-dimensional structures, however, are available only for a fraction of proteins with known sequences. For many purposes the primary amino acid sequence of a protein can be directly used to predict important structural parameters. However, mathematical presentation of the calculated values often makes interpretation difficult, especially if many proteins must be analysed and compared. Here we introduce a broad-based, user-defined analysis of amino acid sequence information. The program package is based on published algorithms and is designed to access standard protein databases, calculate hydropathy, surface probability and flexibility values, and perform secondary structure predictions. The data output is in an 'easy-to-read' graphic format and several parameters can be superimposed within a single plot in order to simplify data interpretation. Additionally, this package includes a novel algorithm for the prediction of potential antigenic sites. Thus the software package presented here offers a powerful means of analysing an amino acid sequence for the purpose of structure/function studies as well as antigenic site analyses. These algorithms were written to function in context with the UWGCG (University of Wisconsin Genetics Computer Group) program collection, and are now distributed within that package.
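As an illustration of the kind of hydropathy calculation mentioned above, the following sketch computes a sliding-window average. The abstract says only that published algorithms were used, so the choice of the Kyte-Doolittle scale, the window length, and the function name `hydropathy_profile` are assumptions of this example rather than details of the package.

```python
# Kyte-Doolittle hydropathy index (assumed scale; not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Return the mean hydropathy of each full window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Toy peptide: three hydrophobic residues followed by a lysine.
print(hydropathy_profile("IVLK", window=2))
```

Plotting the returned list against window position gives the familiar hydropathy plot, with sustained positive stretches suggesting hydrophobic segments.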

16.
This paper describes a multiple alignment method using a workstation and supercomputer. The method is based on the alignment of a set of aligned sequences with the new sequence, and uses a recursive procedure of such alignment. The alignment is executed in a reasonable computation time on diverse levels from a workstation to a supercomputer, from the viewpoint of alignment results and computational speed by parallel processing. The application of the algorithm is illustrated by several examples of multiple alignment of 12 amino acid and DNA sequences of HIV (human immunodeficiency virus) env genes. Colour graphic programs on a workstation and parallel processing on a supercomputer are discussed.

17.
18.
As many structures of protein–DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein–DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. The second SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. In both cross-validation and independent testing, the second SVM model, which used both DNA and protein sequence data, performed better than the first model, which used DNA sequence data alone.
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone.
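For readers less familiar with the three measures reported above, the sketch below computes accuracy, F-measure and MCC from a binary confusion matrix using their standard definitions. The function name `binary_metrics` is hypothetical, and the counts in the example are invented to show how the measures relate; they are not the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) for a binary confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Undefined ratios are returned as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, f_measure, mcc


# Invented, perfectly balanced counts (not taken from the paper).
print(binary_metrics(tp=67, fp=33, tn=67, fn=33))
```

With these balanced, invented counts an accuracy of 0.67 corresponds to an MCC of 0.34, which shows why an MCC in that range can accompany an accuracy in the high sixties: MCC is 0 for chance-level prediction and 1 only for perfect prediction.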

19.
Improved sensitivity of biological sequence database searches   Total citations: 7 (self-citations: 0, citations by others: 7)
We have increased the sensitivity of DNA and protein sequence database searches by allowing similar but non-identical amino acids or nucleotides to match. In addition, one can match k-tuples or words instead of matching individual residues in order to speed the search. A matching matrix specifies which k-tuples match each other. The matching matrix can be calculated from a similarity matrix of amino acids and a threshold of similarity required for matching. This permits amino acid similarity matrices or replacement matrices (PAM matrices) to be used in the first step of a sequence comparison rather than in a secondary scoring phase. The concept of matching non-identical k-tuples also increases the power of DNA database searches. For example, a matrix that specifies that any 3-tuple in a DNA sequence can match any other 3-tuple encoding the same amino acid permits a DNA database search using a DNA query sequence for regions that would encode a similar amino acid sequence.
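The DNA example at the end of this abstract, where two 3-tuples match if they encode the same amino acid, can be sketched as follows. The sketch assumes the standard genetic code and a simple exhaustive scan; the names `tuples_match` and `matching_offsets` are hypothetical, and the paper's actual matching-matrix data structure and search strategy are not described in the abstract.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def tuples_match(a, b):
    """Two 3-tuples match when they encode the same amino acid (or both stop)."""
    return CODON_TABLE[a] == CODON_TABLE[b]


def matching_offsets(query, target):
    """Return target offsets where every in-frame 3-tuple of `query` matches.

    `query` must have a length that is a multiple of three.
    """
    if len(query) % 3:
        raise ValueError("query length must be a multiple of 3")
    query, target = query.upper(), target.upper()
    hits = []
    for offset in range(len(target) - len(query) + 1):
        if all(tuples_match(query[i:i + 3], target[offset + i:offset + i + 3])
               for i in range(0, len(query), 3)):
            hits.append(offset)
    return hits


# Query CTG AAA (Leu-Lys) finds TTA AAG (also Leu-Lys) in the target.
print(matching_offsets("CTGAAA", "GGTTAAAGCC"))
```

The two sequences in the example share only three of six nucleotides in the matched region, so an identity-based 3-tuple search would miss a hit that synonymous matching recovers.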

20.
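Construction of a proline peptide bond dataset   Total citations: 1 (self-citations: 0, citations by others: 1)
From 2401 peptide chains with a resolution better than 0.25 nm and sequence identity below 30%, the positions of all cis and trans proline peptide bonds were computed and extracted, giving 1221 cis and 26401 trans bonds and thereby establishing a relatively large proline peptide bond dataset. The basic characteristics of the dataset were analysed statistically: the distribution of the residues N-terminal to the peptide bond, the dihedral angles of those N-terminal residues, the distribution across secondary structure types, and the proportion of cis bonds among all proline peptide bonds. This dataset is valuable for further study of the structure of cis and trans X-Pro peptide bonds, their relationship to amino acid sequence, and the kinetics of peptide chain folding.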
