Similar documents
 Found 20 similar documents (search time: 31 ms)
1.
Single nucleotide polymorphisms (SNPs) in human genomes are considered to be highly associated with complex genetic diseases. As a consequence, obtaining all SNPs from human populations is one of the primary goals of recent studies on human genomics. The two sequences of SNPs in diploid human organisms are called haplotypes. In this paper, the problem of haplotype reconstruction from SNP fragments with and without genotype information is studied. Minimum error correction (MEC) is an important model for this problem, but it is effective only when the error rate of the fragments is low. MEC/GI, an extension of the MEC model, uses the related genotype information in addition to the SNP fragments and therefore yields a more accurate inference. We introduce algorithmic neural network-based approaches and show experimentally that our methods are fast and accurate. In particular, compared with a feed-forward (back-propagation-like) neural network, our approach is faster, more accurate, and also applicable to the MEC model.

2.
The single nucleotide polymorphism (SNP), the most common form of genetic variation, has been widely studied to help analyze possible associations between diseases and genomes. To gain more information, the SNPs on a single chromosome, which together constitute a haplotype, are usually studied together. Obtaining haplotypes from biological experiments is usually very costly and time-consuming, which has motivated the development of efficient computational methods for determining haplotypes. Many problems and algorithms concerning haplotypes have been proposed to reduce the cost of disease association studies. In general, four categories of problems are widely researched: the haplotype assembly problem, the haplotype inference problem, the haplotype block partition problem, and the haplotype tagging SNP selection problem. The former two problems have been well reviewed by many researchers, whereas the latter two have not, to our knowledge, been comprehensively surveyed. In this paper, we give a detailed introduction to the four problems, especially the latter two.

3.
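Human action recognition based on depth sequences generally improves recognition accuracy by extracting feature maps, but such feature maps usually lose temporal information. To address this problem, this paper proposes a new representation of depth map sequences, namely depth space time maps (DSTM). DSTM reduces the redundancy of the feature maps and compensates for the missing temporal information. By fusing depth motion maps (DMM), in which spatial information dominates, with DSTM, in which temporal information dominates, the paper studies high-accuracy human action recognition and proposes a multimodal data fusion algorithm called multi-center subspace learning (MCSL). The algorithm constructs multiple projection centers for each class of data, thereby increasing the inter-class distance between samples and reducing the dimensionality of the projection target region. Human action recognition experiments are conducted on the MSR-Action3D and UTD-MHAD datasets. The experimental results show that the proposed method achieves a higher recognition rate than existing human action recognition methods.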

4.
DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. Provided gene maps, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data are compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.

5.
Haplotypes play an important role in genetic association studies of complex diseases. Recently, computational techniques that help to determine human haplotypes have been studied extensively. Given the genotype and the aligned single nucleotide polymorphism (SNP) fragments of an individual, Minimum Error Correction with Genotype Information (MEC/GI) is an important computational model for inferring a pair of haplotypes compatible with the genotype by correcting the minimum number of SNPs in the given SNP fragments. The MEC/GI problem has been proven NP-hard, and no practical exact algorithm for it is known. Despite the rapid advances in molecular biological techniques, modern high-throughput sequencers cannot directly sequence a DNA fragment that contains more than 1200 nucleotide bases. With low SNP density, currently available data reveal that the number k of SNP sites that a DNA fragment covers is usually smaller than 10. Based on this fact, we develop a new dynamic programming algorithm with running time $O(mk2^k + m\log m + mk)$, where m is the number of fragments. Since k is small in real biological applications, the algorithm is practical and efficient.
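To make the minimum error correction objective concrete, here is a minimal Python sketch. It is not the dynamic programming algorithm from the abstract: it is an exhaustive search that is only feasible for a handful of SNP sites, and it ignores the genotype constraint of MEC/GI. The fragment encoding (strings over `0`, `1`, and `-` for an uncovered site) and the function names `mismatches`, `mec_cost`, and `brute_force_mec` are assumptions made for illustration.

```python
from itertools import product

GAP = "-"  # assumed symbol for a site that a fragment does not cover


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def mec_cost(fragments, h1, h2):
    """Corrections needed when each fragment is assigned to its closer haplotype."""
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)


def brute_force_mec(fragments):
    """Try every haplotype pair; exponential, so only usable on tiny inputs."""
    n_sites = len(fragments[0])
    best = None
    for h1 in product("01", repeat=n_sites):
        for h2 in product("01", repeat=n_sites):
            cost = mec_cost(fragments, h1, h2)
            if best is None or cost < best[0]:
                best = (cost, "".join(h1), "".join(h2))
    return best


if __name__ == "__main__":
    fragments = ["01-", "-10", "10-", "-01", "011"]
    print(brute_force_mec(fragments))
```

In this toy input the fragment `011` conflicts with the others, so at least one SNP value has to be corrected whichever haplotype pair is chosen.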

6.
The grouping of pixels based on some similarity criteria is called image segmentation. In this paper the problem of color image segmentation is treated as a clustering problem, and a fixed-length genetic algorithm (GA) is used to handle it. The effectiveness of a GA depends on the objective function (fitness function) and the initialization of the population. A new objective function is proposed to evaluate the quality of the segmentation and the fitness of a chromosome. In a fixed-length genetic algorithm the chromosomes all have the same length, which is normally set by the user. Here, a self-organizing map (SOM) is used to determine the number of segments so that the length of a chromosome is set automatically. An opposition-based strategy is adopted for the initialization of the population in order to diversify the search process. In some cases the proposed method treats small regions of an image as separate segments, which leads to noisy segmentation. A simple ad hoc mechanism is devised to refine the noisy segmentation. The qualitative and quantitative results show that the proposed method performs better than the state-of-the-art methods.

7.
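The haplotype assembly problem is to reconstruct an individual's pair of haplotypes from the DNA sequence data obtained by sequencing that individual's genome. Heuristic algorithms and parameterized exact algorithms already exist for the various optimization models of the haplotype assembly problem, but these algorithms return only one optimal solution, that is, one pair of haplotypes. However, the optimal solution to a biological problem is often not unique, and the true solution may be only near-optimal. This paper designs a new genetic algorithm that can enumerate multiple optimal solutions. Experimental results show that the algorithm achieves high haplotype reconstruction accuracy and makes it possible for biologists to use domain knowledge to choose further among the multiple solutions the algorithm returns.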

8.
For more than two decades, genetic algorithms (GAs) have been studied by researchers from different fields. Over the years, many modifications have been suggested to alleviate the difficulties encountered by GAs in solving different problems. Despite these modifications, as applications have multiplied, traditional GAs remain inadequate for many practical purposes. This paper introduces a new genetic model called the structured genetic algorithm (sGA) to address some of the difficulties encountered by simple genetic approaches in solving various types of problems. The novelty of this genetic model lies primarily in its redundant genetic material and a gene activation mechanism that utilizes a multilayered structure for the chromosome. This representation provides genetic variation and has many advantages in search and optimization. For example, it can retain multiple (alternative) solutions or parameter spaces in its representation. In effect, it also works as a long-term distributed memory within the population, enabling rapid adaptation in nonstationary environments. Theoretical arguments and empirical studies are presented which demonstrate that the sGA can solve complex problems more efficiently than simple GAs. It is also noted that the sGA exhibits greater implicit nondisruptive diversity than other exist-

9.
The aim of this paper is to present a new approach, called the 'Hybrid Chromosome Model' (HXM), which allows both the extraction of regions of similarity between two sequences and the compartmentalization of a set of DNA sequences. The principle of the method is to compact a set of sequences (split into fragments of fixed length) into a 'hybrid chromosome', which results from stacking all of the sequence fragments. We have illustrated our approach on the 32 subtelomeres of Saccharomyces cerevisiae. The compartmentalization of these chromosome extremities into common regions of similarity has been carried out. The HXM approach is a fast and efficient tool for mapping entire genomes and for extracting ancient duplications within or between genomes.

10.
Pharmacophore modeling, including ligand- and structure-based approaches, has become an important tool in drug discovery. However, the ligand-based method often depends strongly on the training set selection, and the structure-based pharmacophore model is usually created from apo structures or a single protein-ligand complex, which might miss some important information. In this study, a multicomplex-based method is proposed to generate a comprehensive pharmacophore map of cyclin-dependent kinase 2 (CDK2) based on a collection of 124 crystal structures of human CDK2-inhibitor complexes. Our multicomplex-based comprehensive pharmacophore map contains almost all the chemical features important for CDK2-inhibitor interactions. A comparison with previously reported ligand-based pharmacophores has revealed that the ligand-based models are just a subset of our comprehensive map. Furthermore, a most-frequent-feature pharmacophore model consisting of the most frequent pharmacophore features was constructed based on the statistical frequency information provided by the comprehensive map. Validation of the most-frequent-feature model shows that it can not only successfully discriminate between known CDK2 inhibitors and the molecules of a focused inactive dataset, but also correctly predict the activities of a wide variety of CDK2 inhibitors in an external active dataset. This investigation provides some new ideas about how to develop a multicomplex-based pharmacophore model that can be used in virtual screening to discover novel potential lead compounds.

11.
This paper compares the effect of using different chromosome representations when developing a genetic algorithm to solve a scheduling problem called the DFJS (distributed flexible job shop scheduling) problem. The DFJS problem is strongly NP-hard; most recent prior studies develop various genetic algorithms (GAs) to solve it. These prior GAs are similar in their algorithmic flows but differ in the chromosome representations they propose. Extending this line of work, this research proposes a new chromosome representation (called SOP) and develops a genetic algorithm (called GA_OP) to solve the DFJS problem. Experimental results indicate that GA_OP outperforms all prior genetic algorithms. This research advocates the importance of developing appropriate chromosome representations when applying genetic algorithms (or other meta-heuristic algorithms) to solve a space search problem, in particular when the solution space is high-dimensional.

12.
The reconstruction of founder genetic sequences of a population is a relevant issue in evolutionary biology research. The problem consists in finding a biologically plausible set of genetic sequences (founders), which can be recombined to obtain the genetic sequences of the individuals of a given population. The reconstruction of these sequences can be modelled as a combinatorial optimisation problem in which one has to find a set of genetic sequences such that the individuals of the population under study can be obtained by recombining founder sequences minimising the number of recombinations. This problem is called the founder sequence reconstruction problem. Solving this problem can contribute to research in understanding the origins of specific genotypic traits. In this paper, we present large neighbourhood search algorithms to tackle this problem. The proposed algorithms combine a stochastic local search with a branch-and-bound algorithm devoted to neighbourhood exploration. The developed algorithms are thoroughly evaluated on three different benchmark sets and they establish the new state of the art for realistic problem instances.

13.
The objective of this paper is to find a sequence of jobs in the flow shop that minimizes makespan. A feed-forward back-propagation neural network is used to solve the problem. The network is trained with the optimal sequences of completely enumerated five-, six- and seven-job, ten-machine problems, and the trained network is then used to solve problems with a greater number of jobs. The sequence obtained using the artificial neural network (ANN) is given as the initial sequence to a heuristic proposed by Suliman, and also to a genetic algorithm (GA) as one of the sequences of the population, for further improvement. The approaches are referred to as the ANN-Suliman heuristic and the ANN-GA heuristic, respectively. The makespans of the sequences obtained by these heuristics are compared with the makespans of the sequences obtained using the heuristic proposed by Nawaz, Enscore and Ham (NEH) and the Suliman heuristic initialized with the Campbell, Dudek and Smith (CDS) heuristic, called the CDS-Suliman approach. It is found that the ANN-GA and ANN-Suliman heuristic approaches perform better than the NEH and CDS-Suliman heuristics for the problems considered.
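For readers unfamiliar with the objective being minimized, the following Python sketch computes the makespan of a job sequence in a permutation flow shop and finds the best sequence by complete enumeration, which is how the abstract says its small training instances were solved. It does not reproduce the ANN, Suliman, NEH, or CDS heuristics. The function names `makespan` and `best_sequence_by_enumeration` and the 3-job, 2-machine processing times are hypothetical.

```python
from itertools import permutations


def makespan(sequence, proc_times):
    """Completion time of the last job on the last machine.

    proc_times[j][m] is the processing time of job j on machine m; every job
    visits the machines in the same order (permutation flow shop).
    """
    n_machines = len(proc_times[0])
    completion = [0] * n_machines  # completion time of the latest job per machine
    for job in sequence:
        for m in range(n_machines):
            # A job starts on machine m once the machine is free and the job
            # has finished on machine m - 1.
            ready = completion[m - 1] if m > 0 else 0
            completion[m] = max(completion[m], ready) + proc_times[job][m]
    return completion[-1]


def best_sequence_by_enumeration(proc_times):
    """Exhaustive search over all job orders; factorial cost, small cases only."""
    jobs = range(len(proc_times))
    return min(permutations(jobs), key=lambda seq: makespan(seq, proc_times))


if __name__ == "__main__":
    proc_times = [[3, 2], [1, 4], [2, 1]]  # hypothetical 3 jobs x 2 machines
    best = best_sequence_by_enumeration(proc_times)
    print(best, makespan(best, proc_times))
```

Enumeration grows factorially with the number of jobs, which is why it is practical only for instances of the size used for training and heuristics are needed for larger ones.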

14.
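The shape or appearance of a dynamic scene is often largely governed by an underlying low-dimensional dynamic process. Exploiting the temporal coherence between video sequences, this work introduces a special bidirectional deep neural network called an autoencoder, which uses a CRBM (continuous restricted Boltzmann machine) network structure, to learn the low-dimensional manifold structure of image sequences. Experiments applying the autoencoder to human gait sequences show that the method provides a mapping from high-dimensional video frames to a low-dimensional sequence that corresponds to a physically meaningful process, and that it can recover the high-dimensional image sequence from the low-dimensional description.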

15.
Peter Z. Revesz 《Constraints》1997,2(3-4):361-375
A genome map is an ordering of a set of clones according to their believed position on a DNA string. Simple heuristics for genome map assembly based on single restriction enzyme with complete digestion data can lead to inaccuracies and ambiguities. This paper presents a method that adds additional constraint checking to the assembly process. An automaton is presented that for any genome map produces a refined genome map where both the clones and the restriction fragments in each clone are ordered satisfying natural constraints called step constraints. Any genome map that cannot be refined is highly likely to be inaccurate and can be eliminated as a possibility.

16.
Routine Discovery of Complex Genetic Models using Genetic Algorithms   Total citations: 1, self-citations: 0, citations by others: 1
Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes (i.e. epistasis or gene-gene interaction). Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. We have previously developed a genetic algorithm approach to discovering complex genetic models in which two single nucleotide polymorphisms (SNPs) influence disease risk solely through nonlinear interactions. In this paper, we extend this approach for the discovery of high-order epistasis models involving three to five SNPs. We demonstrate that the genetic algorithm is capable of routinely discovering interesting high-order epistasis models in which each SNP influences risk of disease only through interactions with the other SNPs in the model. This study opens the door for routine simulation of complex gene-gene interactions among SNPs for the development and evaluation of new statistical and computational approaches for identifying common, complex multifactorial disease susceptibility genes.

17.
Sketching is a natural and easy way for humans to express visual information in everyday life. Despite a number of approaches to understand online sketch maps, the automatic understanding of offline, hand-drawn sketch maps still poses a problem. This paper presents a new approach for novel sketch map understanding. To our knowledge, this is the first comprehensive work dealing with this task in an offline way. This paper presents a system for automatic understanding of sketch maps and the underlying algorithms for all steps. Major parts are a region-growing segmentation for sketch map objects, a classification for isolated objects, and a context-aware classification. The context-aware classification uses probabilistic relaxation labeling to integrate dependencies between objects into the recognition. We show how these algorithms can deal with the major problems of sketch map understanding, such as vagueness in interpretation. Our experiments demonstrate the importance of context-aware classification for sketch map understanding. In addition, a new database of annotated sketch maps was developed and is made publicly available. This can be used for training and evaluation of sketch map understanding algorithms.

18.
As a typical combinatorial optimization problem, the traveling salesman problem (TSP) has attracted extensive research interest. In this paper, we develop a self-organizing map (SOM) with a novel learning rule. It is called the integrated SOM (ISOM) since its learning rule integrates the three learning mechanisms in the SOM literature. Within a single learning step, the excited neuron is first dragged toward the input city, then pushed to the convex hull of the TSP, and finally drawn toward the middle point of its two neighboring neurons. A genetic algorithm is successfully specified to determine the elaborate coordination among the three learning mechanisms as well as the suitable parameter setting. The evolved ISOM (eISOM) is examined on three sets of TSP instances to demonstrate its power and efficiency. The computational complexity of the eISOM is quadratic, which is comparable to other SOM-like neural networks. Moreover, the eISOM can generate more accurate solutions than several typical approaches for the TSP, including the SOM developed by Budinich, the expanding SOM, the convex elastic net, and the FLEXMAP algorithm. Though its solution accuracy is not yet comparable to some sophisticated heuristics, the eISOM is one of the most accurate neural networks for the TSP.

19.
DNA computation simulator based on abstract bases   Total citations: 1, self-citations: 0, citations by others: 1
We developed a simulator to aid those who design algorithms and protocols for DNA computing. In this simulator, abstract sequences instead of real DNA sequences are used to represent molecules in order to increase the efficiency of simulations. Two approaches to simulation are available: threshold and stochastic. The simulator consists of two main parts, one for finding reactions among existing molecules and generating new ones, and the other for numerically solving differential equations to calculate the concentration of each molecule. The two parts rely on each other. In the threshold approach in particular, the former avoids a combinatorial explosion by setting a threshold on the concentrations of molecules that can take part in reactions. The stochastic approach is available for simulations that are hard to carry out with the threshold approach. Some simulation results obtained with both approaches are also presented: computation of Boolean circuits, whiplash PCR, formation of DNA tiles and polymerase chain reaction (PCR). We also integrate the simulation of DNA computation with parameter fitting by a genetic algorithm (GA), where simulation results are used as evaluation functions for the genetic algorithm. The integration is applied to find good protocols for PCR amplification. A trial to refine the reaction model for hybridization is also described before the final discussion on the simulator.

20.
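A multiple-fault localization approach based on a genetic algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Wang Zan, Fan Xiangyu, Zou Yuguo, Chen Xiang. Journal of Software (软件学报), 2016, 27(4): 879-900
Program-spectrum-based fault localization can effectively help developers locate faults inside software, but most existing automated methods perform poorly on multiple-fault localization, and the few that perform acceptably still need improvement because of their high complexity or the amount of developer interaction they require. To address these problems, a genetic-algorithm-based multiple-fault localization method, GAMFal, is proposed. Specifically, the multiple-fault localization problem is first modeled following the ideas of search-based software engineering: a chromosome encoding of candidate fault distributions is constructed, and the fitness of each individual is computed with an extended Ochiai coefficient. Next, a genetic algorithm searches the solution space for the candidate fault distributions with the highest fitness and returns the population of best solutions once the termination condition is met. Finally, program entities are ranked according to this population. Developers can then inspect the program entities in order and eventually determine the exact locations of the multiple faults. The empirical study uses 7 programs from the Siemens suite and 3 Linux programs (gzip, grep, and sed) as subjects, and extends the traditional evaluation metric EXAM to EXAMF and EXAML. Comparison with other classic fault localization methods (Tarantula, Improved Tarantula, and Ochiai), together with the Friedman test and the least significant difference test, shows that GAMFal outperforms the traditional methods in overall localization efficiency and requires less human interaction. In addition, the execution time of GAMFal is within an acceptable range.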
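As background for the spectrum-based ranking mentioned above, here is a minimal Python sketch of the standard Ochiai suspiciousness score, one of the baselines the abstract compares against. It is not GAMFal's extended Ochiai fitness function, whose definition the abstract does not give. The coverage-matrix layout, the function names `ochiai_scores` and `rank_statements`, and the sample data are assumptions made for illustration.

```python
import math


def ochiai_scores(coverage, failed):
    """Standard Ochiai score per statement.

    coverage[t][s] is truthy when test t executes statement s;
    failed[t] is True when test t fails.
    score(s) = ef / sqrt(total_failed * (ef + ep)), where ef and ep are the
    numbers of failing and passing tests that execute s.
    """
    n_statements = len(coverage[0])
    total_failed = sum(1 for f in failed if f)
    scores = []
    for s in range(n_statements):
        ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom > 0 else 0.0)
    return scores


def rank_statements(scores):
    """Statement indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)


if __name__ == "__main__":
    coverage = [[1, 1, 0],  # hypothetical: 3 tests x 3 statements
                [1, 0, 1],
                [1, 1, 1]]
    failed = [True, False, True]
    scores = ochiai_scores(coverage, failed)
    print(scores, rank_statements(scores))
```

A developer would inspect statements in the returned order; in this toy spectrum the statement executed only by failing tests is ranked first.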
