Similar documents
 20 similar documents found (search time: 15 ms)
1.
We propose a generalized Lévy walk to model fractal landscapes observed in noncoding DNA sequences. We find that this model provides a very close approximation to the empirical data and explains a number of statistical properties of genomic DNA sequences, such as the distribution of strand-biased regions (those with an excess of one type of nucleotide) as well as local changes in the slope of the correlation exponent alpha. The generalized Lévy-walk model simultaneously accounts for the long-range correlations in noncoding DNA sequences and for the apparently paradoxical finding of long subregions of biased random walks (of length $l_j$) within these correlated sequences. In the generalized Lévy-walk model, the $l_j$ are chosen from a power-law distribution $P(l_j) \propto l_j^{-\mu}$. The correlation exponent $\alpha$ is related to $\mu$ through $\alpha = 2 - \mu/2$ if $2 < \mu < 3$. The model is consistent with the finding of "repetitive elements" of variable length interspersed within noncoding DNA.
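A minimal sketch of the generalized Lévy-walk construction, under assumptions the abstract does not spell out: subregion lengths are drawn from a continuous power law with exponent `mu` and minimum length 1 (rounded down to integers), each subregion is a biased ±1 walk whose steps are +1 with probability `0.5 + 0.5 * bias * direction`, and the bias direction is chosen at random per subregion. The function names and the `bias` parameter are hypothetical illustrations, not the authors' code.

```python
import random

def power_law_length(rng, mu, l_min=1.0):
    """Draw an integer length with P(l) ~ l^(-mu) for l >= l_min (needs mu > 1)."""
    u = 1.0 - rng.random()               # u in (0, 1], so the power below is finite
    return int(l_min * u ** (-1.0 / (mu - 1.0)))

def generalized_levy_walk(n, mu=2.5, bias=0.2, seed=0):
    """Return n steps (+1/-1) made of biased subregions with power-law lengths."""
    rng = random.Random(seed)
    steps = []
    while len(steps) < n:
        length = power_law_length(rng, mu)
        direction = rng.choice((-1, 1))               # which nucleotide type is in excess
        p_up = 0.5 + 0.5 * bias * direction           # probability of a +1 step
        for _ in range(min(length, n - len(steps))):  # truncate the final subregion
            steps.append(1 if rng.random() < p_up else -1)
    return steps

def predicted_alpha(mu):
    """alpha = 2 - mu/2, the relation quoted in the abstract for 2 < mu < 3."""
    if not 2 < mu < 3:
        raise ValueError("the quoted relation holds only for 2 < mu < 3")
    return 2 - mu / 2

walk = generalized_levy_walk(10_000, mu=2.5)
print(len(walk), predicted_alpha(2.5))  # 10000 0.75
```

Truncating the last subregion keeps the output length exact and avoids very long draws from the heavy tail.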

2.
Mosaic organization of DNA nucleotides   (Total citations: 23; self-citations: 0; citations by others: 23)
Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms, one without and one with long-range power-law correlations. Although both types of sequences are highly heterogeneous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.
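The two control classes can be illustrated with a hypothetical generator. The abstract does not describe the authors' algorithms, so the choices here are assumptions: patch lengths come either from an exponential distribution (a characteristic patch size, so only short-range correlations are expected) or from a power law $P(l) \propto l^{-\mu}$ (which is expected to produce long-range ones), and each patch gets its own purine fraction drawn uniformly from 0.3 to 0.7.

```python
import random

def patchy_sequence(n, kind="exponential", mean_patch=100.0, mu=2.5, seed=1):
    """Build an n-nucleotide sequence from patches of varying purine content.

    kind="exponential": patch lengths have a characteristic scale (short-range only).
    kind="power": patch lengths follow P(l) ~ l^(-mu) for l >= 1 (long-range).
    """
    if mu <= 1:
        raise ValueError("mu must exceed 1")
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        if kind == "exponential":
            length = 1 + int(rng.expovariate(1.0 / mean_patch))
        elif kind == "power":
            length = int((1.0 - rng.random()) ** (-1.0 / (mu - 1.0)))
        else:
            raise ValueError("kind must be 'exponential' or 'power'")
        purine_fraction = rng.uniform(0.3, 0.7)     # composition of this patch
        for _ in range(min(length, n - len(out))):  # truncate the final patch
            pool = "AG" if rng.random() < purine_fraction else "CT"
            out.append(rng.choice(pool))
    return "".join(out)

short_range = patchy_sequence(5000, "exponential")
long_range = patchy_sequence(5000, "power")
```

Both outputs are visibly patchy; telling them apart requires a fluctuation analysis that removes local trends.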

3.
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences, each of length larger than 512 base pairs (bp), in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta = 0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and find that it is less than $10^{-10}$. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
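A sketch of estimating the spectral exponent beta from one sequence, assuming the convention $S(f) \propto 1/f^{\beta}$ fitted by least squares on a log-log plot over 10 to 100 bp (frequencies 0.01 to 0.1 cycles per bp). The purine/pyrimidine ±1 mapping, the pure-Python radix-2 FFT, and the unbinned fit are illustrative choices, not the paper's exact procedure. An uncorrelated sequence should give beta near 0.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectral_exponent(seq, f_min=0.01, f_max=0.1):
    """Fit S(f) ~ 1/f^beta for f_min <= f <= f_max (cycles per bp)."""
    u = [1.0 if base in "CT" else -1.0 for base in seq.upper()]  # assumed mapping
    mean = sum(u) / len(u)
    spectrum = fft([v - mean for v in u])
    n = len(u)
    xs, ys = [], []
    for k in range(1, n // 2):
        f = k / n
        if f_min <= f <= f_max:
            xs.append(math.log(f))
            ys.append(math.log(max(abs(spectrum[k]) ** 2, 1e-300)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope  # S(f) ~ f^(-beta), so beta is minus the log-log slope

rng = random.Random(42)
random_seq = "".join(rng.choice("ACGT") for _ in range(4096))
beta = spectral_exponent(random_seq)
```

A single periodogram is noisy, which is one reason averaging over many sequences, as in the abstract, matters.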

4.
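Image analysis was used to study the pore-structure characteristics of graphitic cathodes for aluminum electrolysis baked at different temperatures, and their evolution. Changes in porosity, pore-size distribution, shape factor, apparent specific surface area of the pores, and connectivity were examined, together with the fractal characteristics of pore complexity. The results show that as the baking temperature increases, porosity increases steadily, whereas the apparent specific surface area of the pores, the shape factor, and the connectivity first decrease and then increase. The pores formed in graphitic cathode samples baked at every temperature studied obey fractal laws. Pore-structure parameters obtained by image analysis, together with the fractal dimension, can be used to delineate how the cathode pore structure evolves at typical baking temperatures, and a corresponding model of pore-feature evolution is proposed on this basis.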
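The abstract does not say how the fractal dimension was computed. Box counting is one common method for binary pore images; the sketch below assumes the micrograph has already been segmented into a square 0/1 grid (1 = pore) with a power-of-two side, and it is not the authors' procedure.

```python
import math

def box_count(image, box):
    """Count box x box cells that contain at least one pore pixel (value 1)."""
    n = len(image)
    count = 0
    for i in range(0, n, box):
        for j in range(0, n, box):
            if any(image[r][c]
                   for r in range(i, min(i + box, n))
                   for c in range(j, min(j + box, n))):
                count += 1
    return count

def box_counting_dimension(image):
    """Estimate D from N(s) ~ s^(-D) using box sizes s = 1, 2, 4, ..., side/2."""
    side = len(image)
    if side < 4 or side & (side - 1) or any(len(row) != side for row in image):
        raise ValueError("image must be square with a power-of-two side >= 4")
    if not any(any(row) for row in image):
        raise ValueError("image contains no pore pixels")
    sizes = [2 ** k for k in range(int(math.log2(side)))]
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

def porosity(image):
    """Fraction of pixels labelled as pore."""
    return sum(map(sum, image)) / float(len(image) * len(image[0]))
```

As sanity checks, a completely filled image gives D = 2 and a single straight line of pore pixels gives D = 1; real pore networks fall in between.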

5.
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if sequence evolution obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.
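To make the two estimators concrete: the p-distance is the fraction of aligned sites that differ, and the Jukes-Cantor model corrects it for multiple substitutions at the same site via $d = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$. This sketch assumes gap-free, equal-length aligned sequences; the example sequences are made up.

```python
import math

def p_distance(a, b):
    """Proportion of aligned sites that differ (gap-free, equal-length sequences)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor(p):
    """JC69 distance d = -(3/4) ln(1 - 4p/3); undefined for p >= 3/4."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, round(jukes_cantor(p), 4))  # 0.2 0.2326
```

The corrected distance always exceeds the p-distance for p > 0, and the gap grows as p approaches 3/4, which is where the uncorrected estimator is most biased.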

6.
7.
The dynamics of heartbeat interval time series over large time scales were studied by a modified random walk analysis introduced recently as Detrended Fluctuation Analysis. In this analysis, the intrinsic fractal long-range power-law correlation properties of beat-to-beat fluctuations generated by the dynamical system (i.e., the cardiac rhythm generator), after decomposition from extrinsic uncorrelated sources, can be quantified by the scaling exponent (alpha), which in healthy subjects is approximately 1.0 for time scales of approximately $10^{4}$ beats. The effects of chronic hypoxia were determined from serial heartbeat interval time series of digitized twenty-four-hour ambulatory ECGs recorded in nine healthy subjects (mean age 34 years) at sea level and during a thirty-four-day sojourn at 5,050 m (EvK2-CNR Pyramid Laboratory, Sagarmatha National Park, Nepal). The group-averaged alpha exponent (+/- SD) was 0.99 +/- 0.04 (range 0.93-1.04). Longitudinal assessment of alpha in individual subjects did not reveal any effect of exposure to chronic high-altitude hypoxia. The finding of alpha approximately 1, indicating scale-invariant long-range power-law correlations (1/f noise) of heartbeat fluctuations, would reflect a genuinely self-similar fractal process that typically generates fluctuations on a wide range of time scales. The lack of a characteristic time scale, together with the absence of any effect of chronic hypoxia on scaling properties, suggests that the neuroautonomic cardiac control system is preadapted to hypoxia, which helps prevent excessive mode-locking (error tolerance) that would restrict its functional responsiveness (plasticity) to hypoxic or other physiological stimuli.
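A minimal first-order DFA sketch, assuming the input is a list of beat-to-beat (RR) intervals and that the profile is detrended with a straight line in non-overlapping boxes; the box sizes are illustrative, not those used in the study. As standard reference points, uncorrelated noise gives alpha near 0.5, 1/f noise near 1.0, and Brownian noise near 1.5.

```python
import math
import random

def _detrended_sq_residuals(segment):
    """Sum of squared residuals of segment around its least-squares straight line."""
    m = len(segment)
    mx = (m - 1) / 2.0
    my = sum(segment) / m
    sxx = sum((x - mx) ** 2 for x in range(m))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(segment))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in enumerate(segment))

def dfa(series, box_sizes):
    """First-order DFA: return a list of (n, F(n)) pairs."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:                       # integrate the mean-subtracted series
        total += v - mean
        profile.append(total)
    out = []
    for n in box_sizes:
        if n < 2 or n > len(profile):
            raise ValueError("box sizes must satisfy 2 <= n <= len(series)")
        boxes = len(profile) // n          # non-overlapping boxes; the tail is dropped
        sq = sum(_detrended_sq_residuals(profile[i * n:(i + 1) * n]) for i in range(boxes))
        out.append((n, math.sqrt(sq / (boxes * n))))
    return out

def dfa_alpha(series, box_sizes):
    """Scaling exponent alpha: slope of log F(n) versus log n."""
    pts = dfa(series, box_sizes)
    xs = [math.log(n) for n, _ in pts]
    ys = [math.log(f) for _, f in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

rng = random.Random(7)
white = [rng.gauss(0.0, 1.0) for _ in range(4096)]
alpha_white = dfa_alpha(white, [8, 16, 32, 64, 128, 256])  # expected near 0.5
```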

8.
We analyze the fluctuations in the correlation exponents obtained for noncoding DNA sequences. We find prominent variations in the scaling exponent both from sample to sample and within a single sample. To determine whether these fluctuations may result from finite system size, we generate correlated random sequences of comparable length and study the fluctuations in this control system. We find that the DNA exponent fluctuations are consistent with those obtained from the control sequences having long-range power-law correlations. Finally, we compare our exponents for the DNA sequences with the exponents obtained from power-spectrum analysis and correlation-function techniques, and demonstrate that the original "DNA-walk" method is intrinsically more accurate due to reduced noise.
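A sketch of the "DNA walk" and its root-mean-square fluctuation, assuming the purine/pyrimidine rule of Peng et al. (1992): a pyrimidine (C, T) steps up by 1 and a purine (A, G) steps down by 1. $F(l)$ is taken as the standard deviation of the displacement $y(i+l) - y(i)$ over all start positions, and $F(l) \propto l^{\alpha}$ defines the exponent; for an uncorrelated sequence $F(l) \approx \sqrt{l}$.

```python
import math
import random

def dna_walk(seq):
    """Cumulative walk y(i): +1 for pyrimidines (C, T), -1 for purines (A, G)."""
    y, pos = [0], 0
    for base in seq.upper():
        if base in "CT":
            pos += 1
        elif base in "AG":
            pos -= 1
        else:
            raise ValueError("unexpected symbol: " + base)
        y.append(pos)
    return y

def fluctuation(y, l):
    """F(l) = sqrt(<dy^2> - <dy>^2) with dy = y(i + l) - y(i) over all i."""
    diffs = [y[i + l] - y[i] for i in range(len(y) - l)]
    m = sum(diffs) / len(diffs)
    var = sum(d * d for d in diffs) / len(diffs) - m * m
    return math.sqrt(max(var, 0.0))

rng = random.Random(3)
walk = dna_walk("".join(rng.choice("ACGT") for _ in range(10_000)))
f100 = fluctuation(walk, 100)  # roughly sqrt(100) = 10 for an uncorrelated sequence
```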

9.
The sequences, or primary structures, of existing biopolymers, in particular proteins, are believed to be a product of evolution. Are the sequences random? If not, what is the character of this nonrandomness? To explore the statistics of protein sequences, we use the idea of mapping the sequence onto the trajectory of a random walk, originally proposed by Peng et al. [Peng, C.-K., Buldyrev, S. V., Goldberger, A. L., Havlin, S., Sciortino, F., Simons, M. & Stanley, H. E. (1992) Nature (London) 356, 168-170] in their analysis of DNA sequences. Using three different mappings, corresponding to three basic physical interactions between amino acids, we found pronounced deviations from pure randomness, and these deviations seem directed toward minimization of the energy of the three-dimensional structure. We regard this result as evidence for a physically driven stage of evolution.
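The mapping idea can be sketched for one property. The abstract names three mappings but not their residue assignments, so the hydrophobic set used here (A, V, L, I, M, F, W, C) is an illustrative assumption. Deviation from randomness is gauged by comparing the mean squared displacement of the real walk with that of shuffled copies of the same sequence, which keep the composition but destroy the order.

```python
import random

HYDROPHOBIC = set("AVLIMFWC")  # hypothetical grouping, for illustration only

def protein_walk(seq, positive=HYDROPHOBIC):
    """Walk that steps +1 for residues in `positive` and -1 otherwise."""
    y, pos = [0], 0
    for aa in seq.upper():
        pos += 1 if aa in positive else -1
        y.append(pos)
    return y

def mean_sq_displacement(y, l):
    """Average of (y(i + l) - y(i))^2 over all start positions i."""
    d = [(y[i + l] - y[i]) ** 2 for i in range(len(y) - l)]
    return sum(d) / len(d)

def shuffled_baseline(seq, l, trials=200, seed=0):
    """Mean squared displacement averaged over random shuffles of seq."""
    rng = random.Random(seed)
    letters = list(seq)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(letters)
        total += mean_sq_displacement(protein_walk("".join(letters)), l)
    return total / trials
```

A ratio of the observed value to the shuffled baseline that departs consistently from 1 across many proteins would indicate nonrandom ordering for that property.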

10.
Complete nucleotide sequences, precise endpoints and coding potential of several 3.0-kilobase mitochondrial DNA (mtDNA) repeating units derived from two isofemale lineages of the mermithid nematode Romanomermis culicivorax have been determined. Endpoint analysis has allowed us to infer deletion and inversion events that most likely generated the present-day repeat configuration. Each amplified unit contains the genes for NADH dehydrogenase subunits 3 and 6 (ND3 and ND6), an open reading frame (ORF 1) that represents a cytochrome P450-like gene, and three additional unidentified open reading frames. The primary nucleotide sequences of the R. culicivorax mt-repeat copies within individual haplotypes are highly conserved; three nearly complete copies of the repeat unit vary by 0.01% at the nucleotide level. These observations suggest that concerted evolution mechanisms may be active, resulting in sequence homogenization of these lengthy duplications.

11.
Length variation due to tandem repeats is now recognized as a common feature of animal mitochondrial DNA; however, the evolutionary dynamics of repeated sequences are not well understood. Using phylogenetic analysis, predictions of three models of repeat evolution were tested for arrays of 260-bp repeats in the cyprinid fish Cyprinella spiloptera. Variation at different nucleotide positions in individual repeats supported different models of repeat evolution. One set of characters included several nucleotide variants found in all copies from a limited number of individuals, while the other set included an 8-bp deletion found in a limited number of copies in all individuals. The deletion and an associated nucleotide change appear to be the result of a deterministic, rather than stochastic, mutation process. Parallel origins of repeat arrays in different mitochondrial lineages, possibly coupled with a homogenization mechanism, best explain the distribution of nucleotide variation.

12.
Genes within the major histocompatibility complex (MHC) are characterized by extensive polymorphism within species and also by a remarkable conservation of contemporary human allelic sequences in evolutionarily distant primates. Mechanisms proposed to account for strict nucleotide conservation in the context of highly variable genes include the suggestion that intergenic exchange generates repeated sets of MHC DRB polymorphisms [Gyllensten, U. B., Sundvall, M. & Erlich, H. A. (1991) Proc. Natl. Acad. Sci. USA 88, 3686-3690; Lundberg, A. S. & McDevitt, H. O. (1992) Proc. Natl. Acad. Sci. USA 89, 6545-6549]. We analyzed over 50 primate MHC DRB sequences and identified nucleotide elements within macaque and baboon DRB6-like sequences with deletions corresponding to specific exon 2 hypervariable regions, which encode a discrete alpha-helical segment of the MHC antigen-combining site. This precisely localized deletion provides direct evidence implicating segmental exchange of MHC-encoded DRB gene fragments as one of the evolutionary mechanisms both generating and maintaining MHC diversity. Intergenic exchange at this site may be fundamental to the diversification of immune protection in populations by permitting alteration in the specificity of the MHC that determines the repertoire of antigens bound.

13.
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 kbp). We find that for the three chromosomes we studied, the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced, and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
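A sketch of the two measurements on a nucleotide string. Assumptions: n-grams are counted with overlapping windows, the entropy is Shannon entropy in bits, and redundancy is normalized as $R_n = 1 - H_n / (n \log_2 4)$; the paper's exact counting and normalization may differ. The Zipf function returns rank-frequency pairs that would be plotted on log-log axes.

```python
import math
from collections import Counter

def ngram_counts(seq, n):
    """Counts of overlapping n-grams in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def ngram_entropy(seq, n):
    """Shannon entropy H_n (bits) of the overlapping n-gram distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed normalization: R_n = 1 - H_n / (n * log2(alphabet_size))."""
    return 1.0 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))

def zipf_rank_frequency(seq, n):
    """(rank, count) pairs of n-tuples, most frequent first, for a Zipf plot."""
    ranked = sorted(ngram_counts(seq, n).values(), reverse=True)
    return list(enumerate(ranked, start=1))
```

Lower entropy means some n-grams dominate, which shows up as higher redundancy and a steeper rank-frequency curve.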

14.
Using pulsed field gel electrophoresis (PFGE), we have examined the rat major histocompatibility complex (MHC) for the presence of a number of new class III region genes recently identified in the human MHC. We find rat homologs of the human G1, G2, G4, G7a, G9, G9a, G10, G13, G15, and G18 genes, but not of the G8 gene, and show that these are linked to the rat TNF-alpha and C4/Slp loci. A long-range restriction map has been constructed on the basis of a PFGE analysis, which demonstrates extensive co-linearity in the positions of the homologous sequences in the region between the C4/Slp and TNF loci in the rat MHC when compared with that of the human MHC class III region.

15.
The presence and distribution of the most important highly repetitive DNA sequences of rye in cultivated and wild species of the genus Secale were investigated using fluorescence in situ hybridization. Accurate identification of individual chromosomes in the most commonly recognized species or subspecies of the genus Secale (S. cereale, S. ancestrale, S. segetale, S. afghanicum, S. dighoricum, S. montanum, S. montanum ssp. kuprijanovii, S. africanum, S. anatolicum, S. vavilovii, and S. silvestre) was achieved using three highly repetitive rye DNA sequences (probes pSc119.2, pSc74, and pSc34) and the 5S ribosomal DNA sequence pTa794. It is difficult to superimpose trends in the complexity of repetitive DNA during the evolution of the genus on conclusions from other cytogenetic and morphological assays. However, there are two clear groups. The first comprises the self-pollinated annuals S. silvestre and S. vavilovii that have few repeated nucleotide sequences of the main families of 120 and 480 bp. The second group presents amplification and interstitialization of the repeated nucleotide sequences and includes the perennials S. montanum, S. anatolicum, S. africanum, and S. kuprijanovii, as well as the annual and open-pollinated species S. cereale and its related weedy forms. The appearance of a new locus for 5S rRNA in S. cereale and S. ancestrale suggests that cultivated ryes evolved from this wild weedy species.

16.
17.
IMGT, the international ImMunoGeneTics database, is an integrated database specialising in Immunoglobulins (Ig), T cell Receptors (TcR) and Major Histocompatibility Complex (MHC) of all vertebrate species, created by Marie-Paule Lefranc, CNRS, Montpellier II University, Montpellier, France (lefranc@ligm.crbm.cnrs-mop.fr). IMGT includes three databases: LIGM-DB (for Ig and TcR), MHC/HLA-DB and PRIMER-DB (the last two in development). IMGT comprises expertly annotated sequences and alignment tables. LIGM-DB contains more than 23 000 Immunoglobulin and T cell Receptor sequences from 78 species. MHC/HLA-DB contains Class I and Class II Human Leucocyte Antigen alignment tables. An IMGT tool, DNAPLOT, developed for Ig, TcR and MHC sequence alignments, is also available. IMGT works in close collaboration with the EMBL database. IMGT's goals are to establish common data access to all immunogenetics data, including nucleotide and protein sequences, oligonucleotide primers, gene maps and other genetic data of Ig, TcR and MHC molecules, and to provide graphical, user-friendly data access. IMGT has important implications in medical research (repertoire in autoimmune diseases, AIDS, leukemias, lymphomas), therapeutic approaches (antibody engineering), and genome diversity and genome evolution studies. IMGT is freely available at http://imgt.cnusc.fr:8104

18.
Statistical analyses of DNA sequences have revealed patterns of nonneutral evolution in mitochondrial DNA of mice, humans, and Drosophila. Here we report patterns of mitochondrial sequence evolution in South American marsh rats (genus Holochilus). We sequenced the complete mitochondrial ND3 gene in 82 Holochilus brasiliensis and 21 H. vulpinus to test the neutral prediction that the ratio of nonsynonymous to synonymous nucleotide changes is the same within and between species. Within H. brasiliensis we observed a greater number of amino acid polymorphisms than expected based on interspecific comparisons. This contingency table analysis suggests that many amino acid polymorphisms are mildly deleterious. Several tests of the frequency distribution also revealed departures from a neutral, equilibrium model, and these departures were observed for both nonsynonymous and synonymous sites. In general, an excess of rare sites was observed, consistent with either a recent selective sweep or with populations not at mutation-drift equilibrium.
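The within- versus between-species comparison has the shape of a 2x2 contingency table (the McDonald-Kreitman design): nonsynonymous and synonymous changes, counted as polymorphisms (Pn, Ps) and fixed differences (Dn, Ds). The abstract does not state which test was applied, so this sketch assumes a two-sided Fisher exact test plus the neutrality index NI = (Pn/Ps)/(Dn/Ds), where NI > 1 signals an excess of amino acid polymorphism. Any counts passed in are hypothetical, not the Holochilus data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # sum over all tables at least as extreme (no more probable) than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps) / (Dn/Ds); values above 1 mean excess nonsynonymous polymorphism."""
    return (pn / ps) / (dn / ds)

# Table layout: rows = (polymorphic, fixed), columns = (nonsynonymous, synonymous)
```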

19.
Total cellular RNA preparations were isolated from chicken oviducts at three different developmental stages: (a) immature chicks that were chronically stimulated with estrogen; (b) estrogen-stimulated chicks that were then withdrawn from hormone for 12 days; and (c) laying hens. Total cellular RNA containing 3'-poly(A) sequences (poly(A)-RNA) was then isolated from these preparations using oligo(dT)-cellulose chromatography. The number-average nucleotide length of the poly(A)-RNA preparations in each case was approximately 2000 nucleotides. The number-average length of the poly(A) tract at the 3'-terminal end of each RNA preparation was approximately 70 adenylate residues. Complementary DNA (cDNA) copies of each preparation of poly(A)-RNA were synthesized using avian myeloblastosis virus RNA-directed DNA polymerase. The cDNA_poly(A) preparations were then used in DNA-excess hybridization experiments to analyze the complexity of the DNA sequences from which these RNAs were transcribed. Approximately 22% of each of the total cellular poly(A)-RNAs were transcribed from repeated DNA sequences (average repeat frequency of 35 copies/genome), while the remaining majority were transcribed from single-copy or unique-sequence DNA. It was possible to estimate the number of different poly(A)-RNA sequences per cell by analyzing the kinetics of hybridization of these cDNA_poly(A) preparations to total cellular poly(A)-RNA extracts under conditions of RNA excess. The results revealed that 41% of the poly(A)-RNA from laying hen oviduct consisted of, on average, three different sequences/cell, each of which was present in approximately 25,000 copies/cell. The remainder of the poly(A)-RNA in this tissue consisted of approximately 25,000 different sequences/cell, which were present largely in only two or three copies/cell. A somewhat similar sequence complexity was found for oviduct cells prepared from estrogen-stimulated chicks.
We estimated that there were approximately 20,000 different poly(A)-RNA sequences/cell, each represented in only one to two copies/cell. However, there were five sequences which were present, on average, at a concentration of 5600 copies/cell. The poly(A)-RNAs from hormone-withdrawn tissue, on the other hand, had a lower sequence complexity. There were only approximately 10,000 different poly(A)-RNA sequences/cell, each present in about three copies/cell. Furthermore, the few sequences present in great abundance in hen and hormone-stimulated tissues were apparently absent in oviduct tissue from hormone-withdrawn chicks, suggesting that the intracellular concentrations of these high-frequency RNA sequences are dependent on estrogen.

20.
A framework is outlined to study the evolution of DNA or amino acid sequences when sequence sites do not evolve independently. The units of evolution are nonoverlapping subsequences of length l. Each subsequence evolves independently of the others, but within a subsequence the sequences show a first-order Markov dependency. We describe an algorithm to mimic the evolution of such sequences. The influence of dependencies between sites on distance estimates and on the reliability of tree reconstruction methods is investigated. We show that an inappropriate model of sequence evolution in the tree reconstruction process will lead to a nonempty Felsenstein zone. Finally, we describe a method to infer l from sequence data. Examples from the evolution of DNA sequences as well as from amino acids are given.
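A hypothetical generator for the dependence structure described: the sequence is split into non-overlapping blocks of length l, blocks are drawn independently of one another, and within a block each site depends only on the preceding site (first-order Markov). The initial distribution `init` and transition table `trans` are illustrative assumptions; this produces a single sequence and does not reproduce the authors' algorithm for evolving sequences along a tree.

```python
import random

def sample(rng, probs):
    """Draw a state from a dict {state: probability} whose values sum to ~1."""
    r, acc = rng.random(), 0.0
    for state, p in probs.items():
        acc += p
        if r < acc:
            return state
    return state  # guard against rounding when the probabilities sum slightly below 1

def block_markov_sequence(n_blocks, l, init, trans, seed=0):
    """Concatenate n_blocks independent blocks, each a length-l first-order Markov chain."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_blocks):
        prev = sample(rng, init)             # block start ignores earlier blocks
        out.append(prev)
        for _ in range(l - 1):
            prev = sample(rng, trans[prev])  # depends only on the previous site
            out.append(prev)
    return "".join(out)

uniform = {b: 0.25 for b in "ACGT"}
seq = block_markov_sequence(50, 4, uniform, {b: uniform for b in "ACGT"})
```

With a uniform transition table the sites become independent; skewed rows in `trans` introduce the within-block dependence that distance estimators would need to account for.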


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417