Similar literature
 Found 20 similar documents; search took 93 ms
1.
MHCPEP is a curated database comprising over 9000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, when available, experimental method, observed activity, binding affinity, source protein, anchor positions and publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via Internet using WWW, FTP or Gopher.

2.
The GenBank® sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (Web) or Sequin programs to format and send sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE® abstracts from published articles describing the sequences are included as an additional source of biological annotation through the PubMed search system. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, Email, and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the URL: http://www.ncbi.nlm.nih.gov

3.
We have found that human organs such as colon, lung, and muscle, as well as their derived tumors, share nearly all mitochondrial hotspot point mutations. Seventeen hotspots, primarily G→A and A→G transitions, have been identified in the mitochondrial sequence of base pairs 10,030-10,130. Mutant fractions increase with the number of cell generations in a human B cell line, TK6, indicating that they are heritable changes. The mitochondrial point mutation rate appears to be more than two orders of magnitude higher than the nuclear point mutation rate in TK6 cells and in human tissues. The similarity of the hotspot sets in vivo and in vitro leads us to conclude that human mitochondrial point mutations in the sequence studied are primarily spontaneous in origin and arise either from DNA replication error or reactions of DNA with endogenous metabolites. The predominance of transition mutations and the high number of hotspots in this short sequence resemble spectra produced by DNA polymerases in vitro.

4.
A method is described for searching protein sequence databases using tandem mass spectra of tryptic peptides. The approach uses a de novo sequencing algorithm to derive a short list of possible sequence candidates which serve as query sequences in a subsequent homology-based database search routine. The sequencing algorithm employs a graph theory approach similar to previously described sequencing programs. In addition, amino acid composition, peptide sequence tags and incomplete or ambiguous Edman sequence data can be used to aid in the sequence determinations. Although sequencing of peptides from tandem mass spectra is possible, one of the frequently encountered difficulties is that several alternative sequences can be deduced from one spectrum. Most of the alternative sequences, however, are sufficiently similar for a homology-based sequence database search to be possible. Unfortunately, the available protein sequence database search algorithms (e.g. BLAST or FASTA) require a single unambiguous sequence as input. Here we describe how the publicly available FASTA computer program was modified in order to search protein databases more effectively in spite of the ambiguities intrinsic in de novo peptide sequencing algorithms.

5.
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets but it is not apparently associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
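
The idea of scoring a sequence by its hexamer composition can be illustrated with a small sketch. The abstract does not state which statistic the authors used, so the log-odds score, the add-one smoothing, and all function names below are assumptions chosen for illustration, not the published test.

```python
from collections import Counter
from itertools import product
from math import log

K = 6  # hexamer length, as named in the abstract


def hexamer_counts(seq):
    """Count overlapping hexamers made up only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - K + 1):
        word = seq[i:i + K]
        if set(word) <= set("ACGT"):  # skip windows with N or other symbols
            counts[word] += 1
    return counts


def hexamer_frequencies(seqs, pseudocount=1.0):
    """Estimate hexamer frequencies from reference sequences.

    The pseudocount (an assumption of this sketch) keeps unseen hexamers
    from getting zero probability.
    """
    counts = Counter()
    for s in seqs:
        counts.update(hexamer_counts(s))
    total = sum(counts.values()) + pseudocount * 4 ** K
    words = ("".join(w) for w in product("ACGT", repeat=K))
    return {w: (counts[w] + pseudocount) / total for w in words}


def log_odds(seq, freq_a, freq_b):
    """Sum of log(freq_a / freq_b) over the hexamers of seq.

    Positive values mean the sequence looks more like reference A,
    negative values more like reference B.
    """
    return sum(n * log(freq_a[w] / freq_b[w])
               for w, n in hexamer_counts(seq).items())
```

In use, `freq_a` and `freq_b` would be estimated from large sets of sequences of known origin (for example, the expected organism and a suspected contaminant), and a clone whose score falls far on the wrong side would be flagged for a follow-up similarity search.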

6.
ACT/DB is a client-server database application for storing clinical trials and outcomes data, which is currently undergoing initial pilot use. It stores most of its data in entity-attribute-value form. Such data are segregated according to data type to allow indexing by value when possible, and binary large object data are managed in the same way as other data. ACT/DB lets an investigator design a study rapidly by defining the parameters (or attributes) that are to be gathered, as well as their logical grouping for purposes of display and data entry. ACT/DB generates customizable data entry forms. The data can be viewed through several standard reports as well as exported as text to external analysis programs. ACT/DB is designed to encourage reuse of parameters across multiple studies and has facilities for dictionary search and maintenance. It uses a Microsoft Access client running on Windows 95 machines, which communicates with an Oracle server running on a UNIX platform. ACT/DB is being used to manage the data for seven studies in its initial deployment.
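
A minimal sketch of entity-attribute-value storage segregated by data type may help make the design concrete. ACT/DB itself uses Microsoft Access and Oracle; the in-memory SQLite database, the table names, and the function names here are stand-ins assumed for illustration and do not reflect the actual ACT/DB schema.

```python
import sqlite3

ALLOWED_TYPES = ("int", "text")  # one value table per data type


def make_eav_store():
    """Create an in-memory store with one indexed value table per type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE attribute (
            id INTEGER PRIMARY KEY, name TEXT UNIQUE, datatype TEXT);
        CREATE TABLE value_int  (entity INTEGER, attr INTEGER, value INTEGER);
        CREATE TABLE value_text (entity INTEGER, attr INTEGER, value TEXT);
        CREATE INDEX idx_int  ON value_int(attr, value);
        CREATE INDEX idx_text ON value_text(attr, value);
    """)
    return conn


def define_attribute(conn, name, datatype):
    """Register a study parameter and the type of value it holds."""
    if datatype not in ALLOWED_TYPES:
        raise ValueError(f"unsupported datatype: {datatype}")
    cur = conn.execute(
        "INSERT INTO attribute(name, datatype) VALUES (?, ?)", (name, datatype))
    return cur.lastrowid


def _lookup(conn, name):
    """Return (attribute id, value table name) for a parameter name."""
    attr_id, datatype = conn.execute(
        "SELECT id, datatype FROM attribute WHERE name = ?", (name,)).fetchone()
    return attr_id, f"value_{datatype}"  # datatype was validated on insert


def store(conn, entity, name, value):
    """Store one (entity, attribute, value) triple in the typed table."""
    attr_id, table = _lookup(conn, name)
    conn.execute(
        f"INSERT INTO {table}(entity, attr, value) VALUES (?, ?, ?)",
        (entity, attr_id, value))


def find_entities(conn, name, value):
    """Find entities by attribute value, using the per-type index."""
    attr_id, table = _lookup(conn, name)
    rows = conn.execute(
        f"SELECT entity FROM {table} WHERE attr = ? AND value = ? "
        "ORDER BY entity", (attr_id, value))
    return [row[0] for row in rows]
```

Keeping integers and text in separate tables is what lets each table carry a native index on `(attr, value)`, which is the point of the type segregation described in the abstract.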

7.
Large-scale DNA sequencing is creating a sequence infrastructure of great benefit to protein biochemistry. Concurrent with the application of large-scale DNA sequencing to whole genome analysis, mass spectrometry has attained the capability to rapidly, and with remarkable sensitivity, determine weights and amino acid sequences of peptides. Computer algorithms have been developed to use the two different types of data generated by mass spectrometers to search sequence databases. When a protein is digested with a site-specific protease, the molecular weights of the resulting collection of peptides, the mass map or fingerprint, can be determined using mass spectrometry. The molecular weights of the set of peptides derived from the digestion of a protein can then be used to identify the protein. Several different approaches have been developed. Protein identification using peptide mass mapping is an effective technique when studying organisms with completed genomes. A second method is based on the use of data created by tandem mass spectrometers. Tandem mass spectra contain highly specific information in the fragmentation pattern as well as sequence information. This information has been used to search databases of translated protein sequences as well as nucleotide databases such as expressed sequence tag (EST) sequences. The ability to search nucleotide databases is an advantage when analyzing data obtained from organisms whose genomes are not yet completed, but a large amount of expressed gene sequence is available (e.g., human and mouse). Furthermore, a strength of using tandem mass spectra to search databases is the ability to identify proteins present in fairly complex mixtures.
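
Peptide mass mapping as described above can be sketched in a few lines. The abstract names only "a site-specific protease"; trypsin (cleavage after K or R, but not before P) is assumed here, as are the simple match-count score, the 0.5 Da tolerance, and the function names. Real search programs use more elaborate scoring.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (assumed trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def match_count(observed, protein, tol=0.5):
    """Number of observed masses within tol Da of a theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def identify(observed, database, tol=0.5):
    """Return the database entry name whose digest explains the most masses."""
    return max(database, key=lambda name: match_count(observed, database[name], tol))
```

Here `database` is a plain dictionary mapping hypothetical entry names to protein sequences; the measured mass map is compared with the in silico digest of every entry and the best-matching entry is reported.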

8.
9.
Simplification of molecular genetic techniques is one of the main features of large-scale clinical applications of mutation analysis. The solid-phase minisequencing method, which is based on single-nucleotide primer extension by a DNA polymerase on a solid support, is an easy way of detecting point mutations of previously known locations. Here the procedure was further simplified by the use of microplates made of scintillating plastics, a microplate format scintillation counter and an automatic microplate washer. DNA samples from patients with either a hereditary aspartylglucosaminidase (AGA) gene point mutation or an acquired N-ras gene mutation were analyzed by three different minisequencing detection procedures utilizing tritiated nucleotides. The new counting method with scintillating plates was compared to traditional liquid scintillation counting in scintillation vials or to another microplate format procedure, which requires addition of scintillation liquid. In all three methods, normal individuals, heterozygous carriers of the AGA mutation and homozygous patients could be unequivocally discriminated. The N-ras mutation in leukemic blasts could also be detected with high resolution. The coefficients of variation and reproducibility of the scintillating microplate method were almost identical to those of the traditional liquid scintillation assay, which was used as a reference method. The technical innovations adopted here for performing minisequencing assays significantly reduce the labor required without affecting the quality of the results.

10.
A computerized database containing DNA sequence information regarding human HPRT mutants has been created. The database itself is in the dBASE format and contains information on about 1500 mutants. In addition, an IBM PC compatible software package to analyze the information in the database has been developed. Both the database and software are freely available via the Internet.

11.
The Ad5 E1A database is a listing of mutations affecting the early region 1A (E1A) proteins of human adenovirus type 5. The database contains the name of the mutation, the nucleic acid sequence changes, the resulting alterations in amino acid sequence and reference. Additional notes and references are provided on the effect of each mutation on E1A function. The database is contained within the Adenovirus 5 E1A page on the World Wide Web at: http://www.geocities.com/CapeCanaveral/Hangar/2541/

12.
The Olfactory Receptor Database (ORDB) is a WWW-accessible database that stores data on Olfactory Receptor-like molecules (ORs) and has been open to the public since June 1996. It contains a public and a private area. The public area includes published DNA and protein sequence data for ORs, links to OR models and data on their expression, chromosomal localization and source organism, as well as (i) links to bibliography through PubMed and (ii) interactive WWW-based tools, such as BLAST homology searching. The private area functions as a service to laboratories that are actively cloning receptors. Source laboratories enter the sequences of the receptor clones they have characterized to the private database and can search for identical or near identical OR sequences in both public and private databases. If another laboratory has cloned and deposited an identical or closely matching sequence there are means for communication between the laboratories to help avoid duplication of work. ORDB is available via the WWW at http://crepe.med.yale.edu/ORDB/HTML

13.
We have studied spontaneous mutagenesis in five hprt cDNA genes integrated at five different genomic positions in a human lymphoblastoid cell line (TK6). The spectra of 40 mutants from each position were combined to obtain a mutation spectrum of the overall genome. This collection of mutants was used to assess the contribution of several mutagenic processes to spontaneous mutagenesis. Deletions and single base pair changes account for the majority of the mutants and arise in approximately equal amounts (43 and 41%, respectively). The majority of the deletions and insertions are < 5 bp and are likely to be caused by template-directed misalignment (slippage) during replication. To account for frameshifts at non-iterated sites we propose a slightly different template-directed replication error model. A considerable amount of the observed base pair changes can also be explained by this last model, but several other processes leading to base pair changes such as depurination, deamination or spontaneously arising DNA damage are likely to contribute as well. We have compared this spectrum with mutation spectra in the endogenous hprt genes using published mutation data. It is shown that in the endogenous genes the contribution of base pair substitutions is much larger (71%) than in the hprt cDNA integrates and that deletions are less frequently observed (20%). The mutation rates of the integrated hprt cDNA genes show a mean increase of 30-fold as compared with the endogenous hprt gene. This results in a 60-fold increase of the absolute rate of deletion in the hprt cDNA genes and in a 15-fold increase of the base pair substitution rate. Replication errors such as slippage or the mechanism proposed in this study probably account to a large extent for this increase.

14.
15.
Q Liu  EC Thorland  SS Sommer 《Canadian Metallurgical Quarterly》1997,22(2):292-4, 296, 298, passim
A T→C point mutation is shown to specifically inhibit PCR amplification when compared to wild-type controls in exon H of the factor IX gene. Multiple primers of different lengths and locations were designed to examine this phenomenon. The experiments suggest that poor annealing and/or extension from the downstream primer are responsible for the observed inhibition and that the mutation can exert an inhibitory effect upon PCR amplification at a distance of at least 84 bp. The inhibition was not alleviated when amplification conditions such as annealing temperature, time of extension, type of DNA polymerase or concentration of DNA template, primer or DNA polymerase were varied. The inhibitory factor(s) are likely to be contained within the amplified segment itself because neither the use of a previously amplified PCR product as template for nested PCRs nor the restriction enzyme digestion of that previously amplified product relieved the inhibition of PCR amplification in the mutant sample. Computer analyses with the FOLDRNA and FOLDDNA programs did not reveal the mechanism of inhibition. Although dramatic inhibition, as shown here, may be uncommon, more subtle inhibition may be frequent. Documentation of differential amplification caused by a single-base substitution in template sequence has implications for certain commonly used PCR-based methods such as quantitative PCR, differential display and DNA fingerprinting. In addition, heterozygous single-base pair mutations downstream of a primer may be missed if the PCR is inhibited; alternatively, the mutation may appear to be homozygous if amplification of the mutated allele is selectively enhanced.

16.
The docking of repressor proteins to DNA starting from the unbound protein and model-built DNA coordinates is modeled computationally. The approach was evaluated on eight repressor/DNA complexes that employed different modes for protein/DNA recognition. The global search is based on a protein-protein docking algorithm that evaluates shape and electrostatic complementarity, which was modified to consider the importance of electrostatic features in DNA-protein recognition. Complexes were then ranked by an empirical score for the observed amino acid/nucleotide pairings (i.e., protein-DNA pair potentials) derived from a database of 20 protein/DNA complexes. A good prediction had at least 65% of the correct contacts modeled. This approach was able to identify a good solution at rank four or better for three out of the eight complexes. Predicted complexes were filtered by a distance constraint based on experimental data defining the DNA footprint. This improved coverage to four out of eight complexes having a good model at rank four or better. The additional use of amino acid mutagenesis and phylogenetic data defining residues on the repressor resulted in between 2 and 27 models that would have to be examined to find a good solution for seven of the eight test systems. This study shows that starting with unbound coordinates one can predict three-dimensional models for protein/DNA complexes that do not involve gross conformational changes on association.

17.
LIGAND: chemical database for enzyme reactions   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: The existing molecular biology databases focus on the sequence and structural aspects of biological macromolecules, i.e. DNAs, RNAs and proteins. However, in order to understand the functional aspects, it is essential to computerize the interaction of these molecules. Furthermore, living cells contain additional molecules, such as metabolic compounds and metal ions, that may also be considered as parts of the basic building blocks of life, but are not well organized in public databases. LIGAND chemical database is our attempt to solve these problems, at least for enzymatic reactions. RESULTS: LIGAND consists of two sections: ENZYME and COMPOUND. The ENZYME section is an extension of previous studies (Suyama et al., Comput. Applic. Biosci., 9, 9-15, 1993), and it is a flat-file representation of 3303 enzymes and 2976 enzymatic reactions in the chemical equation format that can be parsed by machine. The COMPOUND section has been newly constructed for information on the nomenclature and chemical structures of compounds. It contains 5383 chemical compounds. Both ENZYME and COMPOUND entries contain rich cross-reference information, most of which is automatically generated by the DBGET/LinkDB system, thus providing the linkage between chemical and biological databases. LIGAND is updated daily, tightly coupled with the KEGG metabolic pathway database, and forms the basis for reconstruction and computation of pathways. AVAILABILITY: LIGAND can be accessed through the DBGET/LinkDB and KEGG systems in the Japanese GenomeNet database service via http://www.genome.ad.jp/. The flat-file format of the LIGAND database can be downloaded by anonymous FTP via ftp://kegg.genome.ad.jp/molecules/ligand/. CONTACT: goto@kuicr.kyoto-u.ac.jp; nishioka@scl.kyoto-u.ac.jp; kanehisa@kuicr.kyoto-u.ac.jp

18.
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

19.
Written sentences often contain several meaningful components (e.g., causes and effects or events in a sequence). Preliminary studies of technical documents showed that typographically segmenting these components improved raters' judgments of the comprehensibility of the information. In the present paper, this segmentation notion is generalized, suggesting that phrase segmentation and indentation can be used to facilitate comprehension. Five experiments were conducted (with a total of 72 college students or technical aides) in which Ss verified sentences by reading complex information in several technical passages. Meaningfully segmented and indented text resulted in 14–28% faster response times than standard text. Both segmenting and indenting significantly influenced performance; however, once a text had been meaningfully segmented, the addition of indentation cues did not significantly affect response time. These data shed light on persisting issues in typographic design, namely, whether there is an optimal length for lines and whether justified margins are desirable. Such factors appear to be of minor cognitive relevance. The critical variable is whether the format results in a display of easily encoded units, regardless of length or neatness of margins. (16 ref) (PsycINFO Database Record (c) 2010 APA, all rights reserved)

20.
Two-dimensional (2-D) gene scanning (TDGS) is a method for mutation detection based on the electrophoretic separation of PCR-amplified DNA fragments according to size and base pair sequence. The use of denaturing gradient gel electrophoresis (DGGE) as the second separation step provides virtually 100% sensitivity, while the 2-D format allows the inspection of multiple gene fragments simultaneously. Analysis of many exons in parallel is greatly facilitated by extensive PCR multiplexing based on preamplification by long-distance PCR. Recently, TDGS has been applied to detect mutations in the retinoblastoma tumor suppressor gene RB1. Using RB1 as a model, we have now analyzed each step of the protocol, presenting overall improvements and a detailed cost analysis, where the total cost of the assay is found to be about $40 (US). An overall picture of TDGS cost-performance, as compared to direct sequencing, is provided as a function of the number of target fragments.
