Similar literature
 Found 20 similar documents; search took 625 ms
1.
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences, each longer than 512 base pairs (bp), in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta = 0.00 ± 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 ± 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
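
The DFA step described in this abstract can be sketched roughly as follows. This is a minimal first-order DFA in plain Python, not the authors' implementation: the function names, the non-overlapping boxes, the linear detrending, and the box sizes are all assumptions made for illustration. For an uncorrelated sequence the fitted exponent alpha should come out near 0.5, which corresponds to beta = 0 under the commonly used relation beta = 2*alpha - 1.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_fluctuation(series, box):
    """Root-mean-square fluctuation F(box) of the linearly detrended profile."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:
        total += value - mean          # integrate the mean-subtracted series
        profile.append(total)
    n_boxes = len(profile) // box      # assumption: non-overlapping boxes
    xs = list(range(box))
    squared = 0.0
    for b in range(n_boxes):
        segment = profile[b * box:(b + 1) * box]
        slope, intercept = linear_fit(xs, segment)
        squared += sum((y - (slope * x + intercept)) ** 2
                       for x, y in zip(xs, segment))
    return math.sqrt(squared / (n_boxes * box))


def dfa_exponent(series, boxes):
    """Slope of log F(n) versus log n, i.e. the scaling exponent alpha."""
    log_n = [math.log(n) for n in boxes]
    log_f = [math.log(dfa_fluctuation(series, n)) for n in boxes]
    slope, _ = linear_fit(log_n, log_f)
    return slope


def white_noise(length, seed):
    """Hypothetical control input: uncorrelated +1/-1 steps."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]
```

Calling `dfa_exponent(white_noise(4096, 1), [8, 16, 32, 64, 128])` gives an estimate for an uncorrelated control; a real analysis would choose the box range to match the scales of interest (for example 10 bp to 100 bp in the abstract).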

2.
Mapping nucleotide sequences onto a "DNA walk" produces a novel representation of DNA that can then be studied quantitatively using techniques derived from fractal landscape analysis. We used this method to analyze 11 complete genomic and cDNA myosin heavy chain (MHC) sequences belonging to 8 different species. Our analysis suggests that the fractal complexity of MHC genes increases with evolution, in the order vertebrate > invertebrate > yeast. The increase in complexity is measured by the presence of long-range power-law correlations, which are quantified by the scaling exponent alpha. We develop a simple iterative model, based on known properties of polymeric sequences, that generates long-range nucleotide correlations from an initially noncorrelated coding region. Both this new model and the DNA walk analysis support the intron-late theory of gene evolution.
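
As an illustration of the "DNA walk" mapping, here is a minimal sketch. The abstract does not state the step rule, so the purine/pyrimidine assignment below (pyrimidine = +1, purine = -1) and the names `STEP` and `dna_walk` are assumptions for illustration only.

```python
from itertools import accumulate

# Assumed step rule (not given in the abstract):
# pyrimidines (C, T) step up, purines (A, G) step down.
STEP = {"C": 1, "T": 1, "A": -1, "G": -1}


def dna_walk(sequence):
    """Return the net displacement y(k) after each nucleotide k."""
    return list(accumulate(STEP[base] for base in sequence.upper()))
```

For example, `dna_walk("ACGT")` walks down, up, down, up and ends where it started; the resulting displacement profile is what the fluctuation analysis is then applied to.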

3.
We propose a generalized Lévy walk to model fractal landscapes observed in noncoding DNA sequences. We find that this model provides a very close approximation to the empirical data and explains a number of statistical properties of genomic DNA sequences such as the distribution of strand-biased regions (those with an excess of one type of nucleotide) as well as local changes in the slope of the correlation exponent alpha. The generalized Lévy-walk model simultaneously accounts for the long-range correlations in noncoding DNA sequences and for the apparently paradoxical finding of long subregions of biased random walks (length l_j) within these correlated sequences. In the generalized Lévy-walk model, the l_j are chosen from a power-law distribution, P(l_j) varies as l_j^(-mu). The correlation exponent alpha is related to mu through alpha = 2 - mu/2 if 2 < mu < 3. The model is consistent with the finding of "repetitive elements" of variable length interspersed within noncoding DNA.
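
The relation alpha = 2 - mu/2 and the power-law choice of segment lengths can be sketched as below. This is a hedged illustration, not the authors' model code: the inverse-transform sampler, the minimum length `l_min`, the `bias` strength, and the random choice of direction per segment are all assumptions introduced here.

```python
import random


def correlation_exponent(mu):
    """alpha = 2 - mu/2, stated in the abstract for 2 < mu < 3."""
    if not 2 < mu < 3:
        raise ValueError("relation is only stated for 2 < mu < 3")
    return 2 - mu / 2


def sample_segment_length(mu, rng, l_min=1.0):
    """Draw l from P(l) ~ l^(-mu), l >= l_min, by inverse-transform sampling."""
    u = rng.random()                       # u in [0, 1), so 1 - u in (0, 1]
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))


def generalized_levy_walk(n_steps, mu, bias=0.2, seed=0):
    """Concatenate biased random-walk segments with power-law lengths."""
    rng = random.Random(seed)
    steps = []
    while len(steps) < n_steps:
        length = int(sample_segment_length(mu, rng))
        direction = rng.choice((-1, 1))    # assumed: bias sign picked per segment
        p_up = 0.5 + 0.5 * bias * direction
        for _ in range(min(length, n_steps - len(steps))):
            steps.append(1 if rng.random() < p_up else -1)
    return steps
```

With mu = 2.5 the relation gives alpha = 0.75, between the uncorrelated value 0.5 and 1.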

4.
We analyze the fluctuations in the correlation exponents obtained for noncoding DNA sequences. We find prominent sample-to-sample variations as well as variations within a single sample in the scaling exponent. To determine if these fluctuations may result from finite system size, we generate correlated random sequences of comparable length and study the fluctuations in this control system. We find that the DNA exponent fluctuations are consistent with those obtained from the control sequences having long-range power-law correlations. Finally, we compare our exponents for the DNA sequences with the exponents obtained from power-spectrum analysis and correlation-function techniques, and demonstrate that the original "DNA-walk" method is intrinsically more accurate due to reduced noise.

5.
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
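
The two tests named in this abstract, an n-tuple Zipf ranking and an n-gram entropy, can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' procedure: overlapping windows, entropy in bits, and the normalization of redundancy by the maximum entropy 2n bits of a four-letter alphabet are choices made here, and all function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-tuples (assumption: windows slide by one base)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_ranking(sequence, n):
    """n-tuples sorted by decreasing frequency, as (tuple, count) pairs."""
    return ngram_counts(sequence, n).most_common()


def ngram_entropy(sequence, n):
    """Shannon entropy (bits) of the observed n-tuple distribution."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(sequence, n):
    """Assumed definition: 1 - H(n) / (2n), with 2n bits the 4-letter maximum."""
    return 1.0 - ngram_entropy(sequence, n) / (2.0 * n)
```

A Zipf plot is then the count against rank from `zipf_ranking` on log-log axes; a lower `ngram_entropy` gives a larger `ngram_redundancy`, which is the direction of the coding versus noncoding comparison in the abstract.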

6.
7.
The dynamics of heartbeat interval time series over large time scales were studied by a modified random walk analysis introduced recently as Detrended Fluctuation Analysis. In this analysis, the intrinsic fractal long-range power-law correlation properties of beat-to-beat fluctuations generated by the dynamical system (i.e., cardiac rhythm generator), after decomposition from extrinsic uncorrelated sources, can be quantified by the scaling exponent (alpha) which, in healthy subjects, for time scales of approximately 10^4 beats is approximately 1.0. The effects of chronic hypoxia were determined from serial heartbeat interval time series of digitized twenty-four-hour ambulatory ECGs recorded in nine healthy subjects (mean age thirty-four years old) at sea level and during a sojourn at 5,050 m for thirty-four days (EvK2-CNR Pyramid Laboratory, Sagarmatha National Park, Nepal). The group averaged alpha exponent (± SD) was 0.99 ± 0.04 (range 0.93-1.04). Longitudinal assessment of alpha in individual subjects did not reveal any effect of exposure to chronic high altitude hypoxia. The finding of alpha approximately 1 indicating scale-invariant long-range power-law correlations (1/f noise) of heartbeat fluctuations would reflect a genuinely self-similar fractal process that typically generates fluctuations on a wide range of time scales. Lack of a characteristic time scale along with the absence of any effect from exposure to chronic hypoxia on scaling properties suggests that the neuroautonomic cardiac control system is preadapted to hypoxia which helps prevent excessive mode-locking (error tolerance) that would restrict its functional responsiveness (plasticity) to hypoxic or other physiological stimuli.

8.
9.
10.
Direct activation of the N-myc2 oncogene by insertion of woodchuck hepatitis virus (WHV) DNA is a major oncogenic step in woodchuck hepatocarcinogenesis. We previously reported that WHV enhancer II (We2), which controls expression of the core/pregenome RNA, can also activate the N-myc2 promoter in hepatoma cell lines. To better define the integrated WHV regulatory sequences responsible for N-myc2 promoter activation in woodchuck liver tumors, we analyzed the structure and enhancer activity of a single viral integrant found at the win locus in tumor 2260T1 and mapping approximately 175 kb 3' of N-myc2. This viral insert was made of 11 concatemerized WHV fragments, 5 of which overlapped with We2 sequences and 1 with WHV sequence homologous to that of hepatitis B virus enhancer I (We1). In transient transfection assays in hepatoma-derived cells, the We2 activator was found to be fully effective only when inserted in close proximity to the N-myc2 promoter whereas the We1 element by itself was apparently devoid of activity. In contrast, the 2260T1 viral insert exhibited a potent enhancer capacity that depended both on multimerized We2 and on We1 sequences. In a survey of different woodchuck hepatomas, both elements were commonly found within integrated viral sequences involved in long-range N-myc2 activation.

11.
12.
Short-range and long-range photoreactions between ethidium and DNA have been characterized. While no DNA reaction is observed upon excitation into the visible absorption band of ethidium, higher-energy irradiation (313-340 nm) leads both to direct strand cleavage at the 5'-G of 5'-GG-3' doublets and to piperidine-sensitive lesions at guanine. This reactivity is not consistent with oxidation of guanine by either electron transfer or singlet oxygen as shown by comparison with reactions of a rhodium intercalator and methylene blue, respectively. By covalently tethering ethidium to one end of a DNA duplex, we demonstrate the presence of two distinct reactions, one short-range and the other long-range. The short-range reaction involves a covalent modification of guanine by ethidium, based upon HPLC analysis of the nucleoside products and studies with ethidium derivatives. The long-range reaction is entirely consistent with oxidation of guanine by DNA-mediated electron transfer. The yield of this electron-transfer reaction is not attenuated with distance; equal yields of guanine damage are observed at a proximal (17 Å Et-GG separation) and distal (44 Å Et-GG separation) site. These results are quite similar to those previously observed with a covalently tethered rhodium photooxidant and underscore the unique ability of the DNA base stack to facilitate long-range electron transfer so as to effect oxidative damage from a distance.

13.
Extensive DNA rearrangement occurs during the development of the somatic macronucleus from the germ line micronucleus in ciliated protozoans. The micronuclear junctions and the macronuclear product of a developmentally regulated DNA rearrangement in Tetrahymena thermophila, Tlr1, have been cloned. The intrachromosomal rearrangement joins sequences that are separated by more than 13 kb in the micronucleus with the elimination of moderately repeated micronucleus-specific DNA sequences. There is a long, 825-bp, inverted repeat near the micronuclear junctions. The inverted repeat contains two different 19-bp tandem repeats. The 19-bp repeats are associated with each other and with DNA rearrangements at seven locations in the micronuclear genome. Southern blot analysis is consistent with the occurrence of the 19-bp repeats within pairs of larger repeated sequences. Another family member was isolated. The 19-mers in that clone are also in close proximity to a rearrangement junction. We propose that the 19-mers define a small family of developmentally regulated DNA rearrangements having elements with long inverted repeats near the junction sites. We discuss the possibility that transposable elements evolve by capture of molecular machinery required for essential cellular functions.

14.
15.
Molecular dynamics simulations are carried out to investigate the binding of the estrogen receptor, a member of the nuclear hormone receptor family, to specific and non-specific DNA. Two systems have been simulated, each based on the crystallographic structure of a complex of a dimer of the estrogen receptor DNA binding domain with DNA. One structure includes the dimer and a consensus segment of DNA, ds(CCAGGTCACAGTGACCTGG); the other structure includes the dimer and a nonconsensus segment of DNA, ds(CCAGAACACAGTGACCTGG). The simulations involve an atomic model of the protein-DNA complex, counterions, and a sphere of explicit water with a radius of 45 Å. The molecular dynamics package NAMD was used to obtain 100 ps of dynamics for each system with complete long-range electrostatic interactions. Analysis of the simulations revealed differences in the protein-DNA interactions for consensus and nonconsensus sequences, a bending and unwinding of the DNA, a slight rearrangement of several amino acid side chains, and inclusion of water molecules at the protein-DNA interface region. Our results indicate that binding specificity and stability are conferred by a network of direct and water-mediated protein-DNA hydrogen bonds. For the consensus sequence, the network involves three water molecules, residues Glu-25, Lys-28, Lys-32, Arg-33, and bases of the DNA. The binding differs for the nonconsensus DNA sequence, in which case the fluctuating network of hydrogen bonds allows water molecules to enter the protein-DNA interface. We conclude that water plays a role in furnishing DNA binding specificity to nuclear hormone receptors.

16.
17.
18.
Interspersed repeated DNA sequences are characteristic features of both prokaryotic and eukaryotic genomes. REP sequences are defined as conserved repetitive extragenic palindromic sequences and are found in Escherichia coli, Salmonella typhimurium and other closely related enteric bacteria. These REP sequences may participate in the folding of the bacterial chromosome. In this work we describe a unique class of 28 conserved complex REP clusters, about 100 bp long, in which two inverted REPs are separated by a singular integration host factor (IHF) recognition sequence. We term these sequences RIP (for repetitive IHF-binding palindromic) elements and demonstrate that IHF binds to them specifically. It is estimated that there are about 70 RIP elements in E. coli. Our analysis shows that the RIP elements are evenly distributed around the bacterial chromosome. The possible function of the RIP element is discussed.

19.
20.
We recently reported cloning of Streptococcus anginosus (S. anginosus) DNA fragments containing the 16S ribosomal gene from DNA samples of surgical specimens of gastric cancers. To investigate the specificity of S. anginosus infection, Southern blot analysis with S. anginosus 16S ribosomal DNA probe and PCR analysis with S. anginosus-specific primers were performed in DNA samples prepared from 15 esophageal cancers, 43 gastric cancers, 16 lung cancers, 10 cervical cancers, 14 renal cell carcinomas, 10 colorectal cancers, and 19 bladder cancers. We frequently found S. anginosus DNA sequences in DNA samples from esophageal cancer and gastric cancer tissues, as well as in those from dysplasia of the esophagus of esophageal cancer patients. No S. anginosus DNA bands were detected by Southern blot analysis on DNAs from the noncancerous portions of the esophagus or the stomach. By PCR analysis with 35 cycles, only 7% of the noncancerous portion of the esophagus was shown to contain S. anginosus sequences. No S. anginosus sequences were found in DNAs from cancers in lung, cervix, and kidney, but they were found in 1 of 10 colon cancers.

