Similar articles
 20 similar articles found (search time: 31 ms)
1.
We propose a generalized Lévy walk to model fractal landscapes observed in noncoding DNA sequences. We find that this model provides a very close approximation to the empirical data and explains a number of statistical properties of genomic DNA sequences such as the distribution of strand-biased regions (those with an excess of one type of nucleotide) as well as local changes in the slope of the correlation exponent $\alpha$. The generalized Lévy-walk model simultaneously accounts for the long-range correlations in noncoding DNA sequences and for the apparently paradoxical finding of long subregions of biased random walks (length $l_j$) within these correlated sequences. In the generalized Lévy-walk model, the $l_j$ are chosen from a power-law distribution $P(l_j) \propto l_j^{-\mu}$. The correlation exponent $\alpha$ is related to $\mu$ through $\alpha = 2 - \mu/2$ if $2 < \mu < 3$. The model is consistent with the finding of "repetitive elements" of variable length interspersed within noncoding DNA.
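
The construction described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: segment lengths are drawn by inverse-transform sampling from a continuous power law with exponent mu and then truncated to integers, each segment is a biased ±1 random walk whose bias direction is chosen at random, and the names `levy_segment_lengths`, `generalized_levy_walk`, `predicted_alpha`, `l_min` and `bias` are hypothetical.

```python
import random


def levy_segment_lengths(n_total, mu, l_min=1, rng=None):
    """Draw integer segment lengths with P(l) ~ l**(-mu) until they sum to n_total.

    Assumption: lengths come from a continuous power law (inverse-transform
    sampling) and are truncated to integers; the last segment is clipped.
    """
    rng = rng or random.Random()
    lengths = []
    remaining = n_total
    while remaining > 0:
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids division by zero
        length = int(l_min * u ** (-1.0 / (mu - 1.0)))
        length = max(l_min, min(length, remaining))
        lengths.append(length)
        remaining -= length
    return lengths


def generalized_levy_walk(n_total, mu, bias=0.1, seed=0):
    """Return n_total steps (+1/-1) built from biased random-walk segments."""
    rng = random.Random(seed)
    steps = []
    for length in levy_segment_lengths(n_total, mu, rng=rng):
        sign = rng.choice((-1, 1))   # direction of the bias in this segment
        p_up = 0.5 + sign * bias     # probability of a +1 step (assumed form)
        steps.extend(1 if rng.random() < p_up else -1 for _ in range(length))
    return steps


def predicted_alpha(mu):
    """Correlation exponent from the abstract: alpha = 2 - mu/2 for 2 < mu < 3."""
    if not 2 < mu < 3:
        raise ValueError("the relation is stated only for 2 < mu < 3")
    return 2 - mu / 2
```

For example, `predicted_alpha(2.5)` gives 0.75, and `generalized_levy_walk(1000, 2.5)` returns a list of 1000 steps that could be fed to a fluctuation analysis to compare the measured exponent with that prediction.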

2.
Mosaic organization of DNA nucleotides   Total citations: 23, self-citations: 0, citations by others: 23
Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms, one without and one with long-range power-law correlations. Although both types of sequences are highly heterogeneous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.

3.
Mapping nucleotide sequences onto a "DNA walk" produces a novel representation of DNA that can then be studied quantitatively using techniques derived from fractal landscape analysis. We used this method to analyze 11 complete genomic and cDNA myosin heavy chain (MHC) sequences belonging to 8 different species. Our analysis suggests that the fractal complexity of MHC genes increases over evolution, in the order vertebrate > invertebrate > yeast. The increase in complexity is measured by the presence of long-range power-law correlations, which are quantified by the scaling exponent alpha. We develop a simple iterative model, based on known properties of polymeric sequences, that generates long-range nucleotide correlations from an initially noncorrelated coding region. This new model and the DNA walk analysis both support the intron-late theory of gene evolution.
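
The abstract does not spell out how a sequence is mapped onto a walk. A minimal sketch, assuming the commonly used purine/pyrimidine rule (step +1 for a pyrimidine C or T, step -1 for a purine A or G), might look like this; the function name `dna_walk` is hypothetical.

```python
from itertools import accumulate

PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def dna_walk(sequence):
    """Return the cumulative displacement y(i) of a DNA walk.

    Assumed rule: +1 for a pyrimidine (C, T), -1 for a purine (A, G).
    Any other symbol (e.g. N) raises ValueError instead of being guessed.
    """
    steps = []
    for base in sequence.upper():
        if base in PYRIMIDINES:
            steps.append(1)
        elif base in PURINES:
            steps.append(-1)
        else:
            raise ValueError(f"unexpected symbol {base!r}")
    return list(accumulate(steps))
```

The resulting displacement series is the "landscape" whose fluctuations are then analysed to estimate the scaling exponent alpha.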

4.
We analyze the fluctuations in the correlation exponents obtained for noncoding DNA sequences. We find prominent sample-to-sample variations as well as variations within a single sample in the scaling exponent. To determine if these fluctuations may result from finite system size, we generate correlated random sequences of comparable length and study the fluctuations in this control system. We find that the DNA exponent fluctuations are consistent with those obtained from the control sequences having long-range power-law correlations. Finally, we compare our exponents for the DNA sequences with the exponents obtained from power-spectrum analysis and correlation-function techniques, and demonstrate that the original "DNA-walk" method is intrinsically more accurate due to reduced noise.

5.
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50,000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
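
The two tests named in this abstract can be illustrated with a short Python sketch. It rests on assumptions the abstract does not confirm: n-tuples are counted with overlapping windows, entropy is the Shannon entropy in bits, and redundancy is taken as one minus the entropy divided by its maximum for a four-letter alphabet. All function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-tuples in the sequence (assumed windowing)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_ranking(sequence, n):
    """Return n-tuple counts sorted from most to least frequent (rank 1, 2, ...)."""
    return [count for _, count in ngram_counts(sequence, n).most_common()]


def ngram_entropy(sequence, n):
    """Shannon entropy (bits) of the observed n-tuple distribution."""
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(sequence, n, alphabet_size=4):
    """Assumed definition: 1 - H(n) / (n * log2(alphabet_size))."""
    return 1.0 - ngram_entropy(sequence, n) / (n * math.log2(alphabet_size))
```

Plotting `zipf_ranking` on log-log axes shows whether the rank-frequency curve is close to a power law, and a lower `ngram_entropy` corresponds to a higher `ngram_redundancy`, which is the comparison the abstract draws between noncoding and coding regions.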

6.
7.
In order to analyse the diversity of T-cell receptors (TCRs) expressed by the T-cell population activated by allogeneic HLA-DR stimulation, TCR beta cDNA was synthesized from mRNA of human CD4+ T cells that had been stimulated in a primary mixed lymphocyte reaction (MLR). The TCR beta cDNA was amplified by the polymerase chain reaction (PCR), subjected to bacterial cloning, and sequenced from V beta through J beta. Twenty-six different V beta and 10 different J beta segments were detected among 56 randomly selected cDNA clones. Occurrences of V beta 17.1 and J beta 1.5 were higher than those found in the CD4+ T-cell population activated with a CD3-specific antibody. A total of 53 different CDR3 sequences, two of them occurring more than once, were detected among the 56 cDNA clones. In order to estimate the degree of CDR3 diversity, amino acid similarity in the CDR3 region of the cDNA was calculated and compared with those of the anti-CD3-activated T-cell sequences as well as those of various published T-cell clone sequences, each directed to either alloantigens or single antigenic peptides. It was found that the similarity score among CDR3 sequences obtained from the MLR (56.4 ± 10.3) was comparable to those of anti-CD3-activated T cells (55.7 ± 10.7) and those of T-cell clones directed toward alloantigens (range, 48.4 ± 12.4 to 59.4 ± 13.1), but significantly smaller than those of T-cell clones directed toward single antigenic peptides such as those derived from myelin basic protein (75.6 ± 17.9) and cytochrome c (76.9 ± 20.5). These results provide quantitative proof that TCRs of T cells activated by primary allogeneic HLA-DR stimulation have a larger diversity than those recognizing single antigenic peptides.

8.
The dynamics of heartbeat interval time series over large time scales were studied by a modified random walk analysis introduced recently as Detrended Fluctuation Analysis. In this analysis, the intrinsic fractal long-range power-law correlation properties of beat-to-beat fluctuations generated by the dynamical system (i.e., cardiac rhythm generator), after decomposition from extrinsic uncorrelated sources, can be quantified by the scaling exponent (alpha) which, in healthy subjects, for time scales of approximately 10^4 beats is approximately 1.0. The effects of chronic hypoxia were determined from serial heartbeat interval time series of digitized twenty-four-hour ambulatory ECGs recorded in nine healthy subjects (mean age thirty-four years old) at sea level and during a sojourn at 5,050 m for thirty-four days (EvK2-CNR Pyramid Laboratory, Sagarmatha National Park, Nepal). The group averaged alpha exponent (± SD) was 0.99 ± 0.04 (range 0.93-1.04). Longitudinal assessment of alpha in individual subjects did not reveal any effect of exposure to chronic high altitude hypoxia. The finding of alpha approximately 1 indicating scale-invariant long-range power-law correlations (1/f noise) of heartbeat fluctuations would reflect a genuinely self-similar fractal process that typically generates fluctuations on a wide range of time scales. Lack of a characteristic time scale along with the absence of any effect from exposure to chronic hypoxia on scaling properties suggests that the neuroautonomic cardiac control system is preadapted to hypoxia which helps prevent excessive mode-locking (error tolerance) that would restrict its functional responsiveness (plasticity) to hypoxic or other physiological stimuli.
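
A minimal sketch of first-order Detrended Fluctuation Analysis, the method named in this abstract, is given below. It is a generic textbook variant and not necessarily the exact procedure used in the study: the windows are non-overlapping, the local trend is a least-squares line, and the choice of window sizes is left to the caller. The function names are hypothetical.

```python
import math
from itertools import accumulate


def _residual_ss(y):
    """Sum of squared residuals of a least-squares line fitted to y vs 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    return sum((v - y_mean - slope * (x - x_mean)) ** 2 for x, v in enumerate(y))


def dfa_fluctuation(series, window):
    """Root-mean-square fluctuation F(n) of the integrated, detrended series."""
    mean = sum(series) / len(series)
    profile = list(accumulate(v - mean for v in series))  # integrated series
    n_windows = len(profile) // window
    if window < 2 or n_windows < 1:
        raise ValueError("window must be >= 2 and fit into the series")
    total = sum(
        _residual_ss(profile[k * window:(k + 1) * window])
        for k in range(n_windows)
    )
    return math.sqrt(total / (n_windows * window))


def dfa_exponent(series, windows):
    """Slope of log F(n) versus log n, i.e. the scaling exponent alpha."""
    xs = [math.log(n) for n in windows]
    ys = [math.log(dfa_fluctuation(series, n)) for n in windows]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx
```

Uncorrelated noise should give an exponent near 0.5, while the value near 1.0 reported in the abstract corresponds to 1/f-type long-range correlations. Note that `dfa_exponent` fails on a series whose fluctuation is exactly zero, since the logarithm is undefined there.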

9.
10.
Gradient-enhanced, two-dimensional, homonuclear correlation techniques (GCOSY) of carbohydrates provide numerous correlations based on 4J and 5J long-range interactions. Intraresidue correlations, involving all 1H resonances of a given pyranose ring with its anomeric proton, are consistently observed in alpha-pyranosyl residues at approximately 5 to 10 times lower intensities than vicinal 3J correlation cross peaks. beta-Anomers, pyranosyl residues with axial H1 protons, show very few such effects. Both alpha and beta anomers do, however, exhibit interresidue 4J correlations across the glycosidic linkage as shown for several linear and branched oligosaccharides ranging from three to five residues and are especially useful for spectral assignments in the envelope of pyranosyl ring protons located in the typically very crowded 3 to 4 ppm region. These effects depend on the strength and duration of the applied gradients.

11.
A cDNA encoding human fast skeletal beta troponin T (beta TnTf) has been isolated and characterized from a fetal skeletal muscle library. The cDNA insert is 1,000 bp in length and contains the entire coding region of 777 bp and 5' and 3' untranslated (UT) segments of 12 and 211 bp, respectively. The 3' UT segment shows the predicted stem-loop structure typical of eukaryotic mRNAs. The cDNA-derived amino acid sequence is the first available sequence for human beta TnTf protein. It is encoded by a single-copy gene that is expressed in a tissue-specific manner in fetal and adult fast skeletal muscles. Although the human beta TnTf represents the major fetal isoform, the sequence information indicates that this cDNA and the coded protein are quite distinct from the fetal and neonatal TnTf isoforms reported in other mammalian fetal muscles. The hydropathy plot indicates that human beta TnTf is highly hydrophilic along its entire length. The protein has an extremely high degree of predicted alpha-helical content involving the entire molecule except the carboxy-terminal 30 residues. Comparative sequence analysis reveals that the human beta TnTf shares a high level of sequence similarity in the coding region with other vertebrate TnTf and considerably reduced similarity with slow skeletal and cardiac TnT cDNAs. The TnT isoforms have a large central region consisting of amino acid residues 46-204 which shows a high sequence conservation both at the nucleotide and amino acid levels. This conserved region is flanked by the variable carboxy-terminal and an extremely variable amino-terminal segment. The tropomyosin-binding peptide of TnT, which is represented by amino acid residues 47-151 and also includes a part of troponin I binding region, is an important domain of this central segment. It is suggested that this conserved segment is encoded by an ancestral gene. The variable regions of vertebrate striated TnT isoforms reflect the subsequent addition and modification of genomic sequences to give rise to members of the TnT multigene family.

12.
We examined the stability of microsatellites of different repeat unit lengths in Saccharomyces cerevisiae strains deficient in DNA mismatch repair. The msh2 and msh3 mutations destabilized microsatellites with repeat units of 1, 2, 4, 5, and 8 bp; a poly(G) tract of 18 bp was destabilized several thousand-fold by the msh2 mutation and about 100-fold by msh3. The msh6 mutations destabilized microsatellites with repeat units of 1 and 2 bp but had no effect on microsatellites with larger repeats. These results argue that coding sequences containing repetitive DNA tracts will be preferred target sites for mutations in human tumors with mismatch repair defects. We find that the DNA mismatch repair genes destabilize microsatellites with repeat units from 1 to 13 bp but have no effect on the stability of minisatellites with repeat units of 16 or 20 bp. Our data also suggest that displaced loops on the nascent strand, resulting from DNA polymerase slippage, are repaired differently than loops on the template strand.

13.
We have used the asymmetry between the coding and noncoding strands in different codon positions of coding sequences of DNA as a parameter to evaluate the coding probability for open reading frames (ORFs). The method enables an approximation of the total number of coding ORFs in the set of analyzed sequences as well as an estimation of the coding probability for the ORFs. The asymmetry observed in the nucleotide composition of codons in coding sequences has been used successfully for analysis of the genomes completed at the time of this analysis.

14.
cDNA clones of feline chemokines, MIP-1alpha, MIP-1beta and RANTES, were molecularly isolated with the purpose of using these sequences for future investigation of the inhibitory effects on lentivirus entry and their role in immunological functions. The feline MIP-1alpha and MIP-1beta cDNA clones spanned their entire coding regions encoding 93 and 92 amino acids, respectively. The amino acid sequences of feline MIP-1alpha and MIP-1beta compared to those of their human, mouse and rat counterparts showed similarities of 75.3-79.6% and 73.9-88.0%, respectively. Feline MIP-1alpha and MIP-1beta had four conserved cysteines with a structure made up of the first two cysteines that are characteristic of the CC-chemokine subfamily. The amino terminus of these MIP-1alpha and MIP-1beta sequences was distinctly hydrophobic, suggesting that it may function as a signal peptide. A partial cDNA clone consisting of 193 bp was obtained for feline RANTES, and it also showed a high degree of sequence similarity to those of other species and contained the characteristic structure made up of adjacent cysteines. These molecular clones of feline chemokines will be useful in the examination of their inhibitory effect on the cellular entry of feline immunodeficiency virus.

15.
We report the presence, in the mitochondrial DNA (mtDNA) of all of the sexual species of the salamander family Ambystomatidae, of a shared 240-bp intergenic spacer between tRNAThr and tRNAPro. We place the intergenic spacer in context by presenting the sequence of 1,746 bp of mtDNA from Ambystoma tigrinum tigrinum, describe the nucleotide composition of the intergenic spacer in all of the species of Ambystomatidae, and compare it to other coding and noncoding regions of Ambystoma and several other vertebrate mtDNAs. The nucleotide substitution rate of the intergenic spacer is approximately three times faster than the substitution rate of the control region, as shown by comparisons among six Ambystoma macrodactylum sequences and eight members of the Ambystoma tigrinum complex. We also found additional inserts within the intergenic spacers of five species that varied from 87 to 444 bp in length. The presence of the intergenic spacer in all sexual species of Ambystomatidae suggests that it arose at least 20 MYA and has been a stable component of the ambystomatid mtDNA ever since. As such, it represents one of the few examples of a large and persistent intergenic spacer in the mtDNA of any vertebrate clade.

16.
The stereoselective nitrile hydratase (NHase) from Pseudomonas putida 5B has been over-produced in Escherichia coli. Maximal enzyme activity requires the co-expression of a novel downstream gene encoding a protein (P14K) of 127 amino acids, which shows no significant homology to any sequences in the protein database. Nitrile hydratase produced in transformed E. coli showed activity as high as 472 units/mg dry cell (sixfold higher than 5B), and retained the stereoselectivity observed in the native organism. Separated from the end of the beta subunit by only 51 bp, P14K appears to be part of an operon that includes the alpha and beta structural genes of nitrile hydratase, and other potential coding sequences.

17.
18.
The aim of this study was to evaluate the prevalence of simple sequence variation in the BRCA2 gene. To this end, 71 breast and breast-ovarian cancer (HBC/HBOC) families along with 95 control individuals from a wide range of ethnicities were analyzed by means of denaturing high-performance liquid chromatography (DHPLC) and direct sequence analysis. In the coding (10,257 bp) and non-coding (2,799 bp) sequences of BRCA2, 82 sequence variants were identified. Three different, apparently disease-associated BRCA2 mutations were found in six HBC/HBOC families (8%): two splice site mutations in introns 5 and 21, and one frameshift mutation in exon 11. In the coding region, 53 simple sequence variants were found: 35 missense mutations, one 2 bp deletion (CT) resulting in a stop at codon 3364, one nonsense mutation with a stop at codon 3326, one deletion of a complete codon (AAA) resulting in the loss of leucine, and 15 silent mutations. In the non-coding region, 26 polymorphisms were detected. Of the 79 sequence variants that were not obviously disease-associated, eight were detected only in HBC/HBOC families. The remaining 71 variants were identified in both HBC/HBOC families and control individuals. Sixty-three sequence variants (80%) were specific for a continent. Forty-two percent (33 out of 79) of the sequence variants were detected exclusively in Africa, though only 13% of the 332 chromosomes screened were of African origin. Our data indicate that, in BRCA2, simple sequence variation is frequent: 1 in 194 bp in the coding region ($\theta = 2.2 \times 10^{-4}$) and 1 in 108 bp in the non-coding region ($\theta = 4.4 \times 10^{-4}$).
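
The counts and rounded figures quoted in this abstract can be cross-checked with a few lines of Python. The sketch only re-derives numbers from values stated in the abstract; it assumes the three disease-associated mutations are counted separately from the 53 coding and 26 non-coding variants, and it does not attempt to reproduce the theta estimates, whose estimator the abstract does not specify. The helper names are hypothetical.

```python
# Values taken from the abstract.
CODING_BP = 10257
NONCODING_BP = 2799
CODING_VARIANTS = 35 + 1 + 1 + 1 + 15   # missense, 2 bp del, nonsense, codon del, silent
NONCODING_VARIANTS = 26
DISEASE_MUTATIONS = 3                   # two splice-site, one frameshift (assumed separate)
TOTAL_VARIANTS = CODING_VARIANTS + NONCODING_VARIANTS + DISEASE_MUTATIONS


def bp_per_variant(length_bp, n_variants):
    """Average number of base pairs per observed variant."""
    return length_bp / n_variants


def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100 * part / whole


if __name__ == "__main__":
    print(TOTAL_VARIANTS)
    print(round(bp_per_variant(CODING_BP, CODING_VARIANTS)))
    print(round(bp_per_variant(NONCODING_BP, NONCODING_VARIANTS)))
    print(round(percent(6, 71)), round(percent(63, 79)), round(percent(33, 79)))
```

Under that assumption the totals are internally consistent: 53 + 26 + 3 = 82 variants, roughly one variant per 194 bp (coding) and per 108 bp (non-coding), and the quoted percentages of 8%, 80% and 42% follow from 6/71, 63/79 and 33/79.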

19.
20.
Mycobacterial interspersed repetitive units (MIRUs), a novel class of repeated sequences, were identified within the intercistronic region of an operon coding for a mycobacterial two-component system, named senX3-regX3. Southern blot analysis and homology searches revealed the presence of several homologous sequences in intergenic regions dispersed throughout the genomes of Mycobacterium bovis BCG, Mycobacterium tuberculosis and Mycobacterium leprae. These could be grouped into three major families, containing elements of 77-101 bp, 46-53 bp and 58-101 bp. Based on the available mycobacterial sequences, the total number of MIRUs is estimated to be about 40-50 per genome. Similar to previously identified small repetitive sequences, the MIRUs of the two-component operon are transcribed on a polycistronic mRNA. Unlike previously identified small repetitive sequences, however, MIRUs do not contain dyad symmetries, comprise small open reading frames (ORFs) whose extremities overlap those of the contiguous ORFs and are oriented in the same translational direction as those of the adjacent genes. Analyses of the sequences at the insertion sites suggest that MIRUs disseminate by transposition into DTGA sites involved in translational coupling in polycistronic operons.
