Found 20 similar documents; search time: 0 ms
1.
姜涛 (Jiang Tao), Journal of Computer Science and Technology (《计算机科学技术学报》), 2010, 25(1): 42-52
Genome-scale assignment of orthologous genes is a fundamental and challenging problem in computational biology and has a wide range of applications in comparative genomics, functional genomics, and systems biology. Many methods based on sequence similarity, phylogenetic analysis, chromosomal syntenic information, and genome rearrangement have been proposed in recent years for ortholog assignment. Although these methods produce results that largely agree with each other, their results may still contain signi...
2.
Computational Challenges in Characterization of Bacteria and Bacteria-Host Interactions Based on Genomic Data
With the rapid development of next-generation sequencing technologies, bacterial identification becomes a very important and essential step in processing genomic data, especially for metagenomic data. Many computational methods have been developed and some of them are widely used to address the problems in bacterial identification. In this article we review the algorithms of these methods, discuss their drawbacks, and propose future computational methods that use genomic data to characterize bacteria. In addition, we tackle two specific computational problems in bacterial identification, namely, the detection of host-specific bacteria and the detection of disease-associated bacteria, by offering potential solutions as a starting point for those who are interested in the area.
3.
马斌 (Ma Bin), Journal of Computer Science and Technology (《计算机科学技术学报》), 2010, 25(1): 107-123
Mass spectrometry is an analytical technique for determining the composition of a sample. Recently it has become a primary tool for protein identification and quantification, and for post-translational modification characterization, in proteomics research. Both the size and the complexity of the data produced by this experimental technique impose great computational challenges in the data analysis. This article reviews some of these challenges and serves as an entry point for those who want to study the area in ...
4.
A phylogeny is the evolutionary history of a group of organisms; systematists (and other biologists) attempt to reconstruct this history from various forms of data about contemporary organisms. Phylogeny reconstruction is a crucial step in the understanding of evolution as well as an important tool in biological, pharmaceutical, and medical research. Phylogeny reconstruction from molecular data is very difficult: almost all optimization models give rise to NP-hard (and thus computationally intractable) problems. Yet approximations must be of very high quality in order to avoid outright biological nonsense. Thus many biologists have been willing to run farms of processors for many months in order to analyze just one dataset. High-performance algorithm engineering offers a battery of tools that can reduce, sometimes spectacularly, the running time of existing phylogenetic algorithms, as well as help designers produce better algorithms. We present an overview of algorithm engineering techniques, illustrating them with an application to the breakpoint analysis method of Sankoff et al., which resulted in the GRAPPA software suite. GRAPPA demonstrated a speedup in running time by over eight orders of magnitude over the original implementation on a variety of real and simulated datasets. We show how these algorithmic engineering techniques are directly applicable to a large variety of challenging combinatorial problems in computational biology.
5.
Applications of Petri Nets in Bioinformatics (total citations: 4; self-citations: 1; citations by others: 3)
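Bioinformatics is a rapidly developing discipline that uses mathematics and computer technology to construct and analyze biological models. Petri nets have recently emerged as an effective tool in bioinformatics, but the depth and breadth of their application still require further study. This paper surveys the latest research progress on applying Petri nets to bioinformatics, covering three main areas: using place/transition nets to qualitatively analyze the structural properties of biological objects; using stochastic Petri nets to introduce randomness into biological modeling and analysis; and using hybrid Petri nets to describe and analyze biological systems that have both discrete and continuous characteristics. Finally, it summarizes the application of Petri nets in bioinformatics and looks ahead to future research directions.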
6.
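[Objective] To advance precision medicine research, countries around the world have launched large-scale population cohort genome sequencing projects, which build population-specific maps of genomic variation by whole-genome sequencing of tens of thousands of individuals. The resulting massive output of genomic data places new demands on computing speed and throughput, and faster, higher-throughput computing platforms are urgently needed to process and interpret this biological sequence information. Because of the characteristics of genomic data and the diversity and complexity of the analysis process, high-throughput computing resources are used inefficiently in large-scale population genomic variant analysis: computation is slow and time-consuming, and data exchange between servers and local machines is inconvenient. Genomic variant analysis therefore needs to be optimized in several respects, with software and hardware development addressing the various problems that arise in practice. This paper analyzes and reviews these optimization methods. [Methods] In high-throughput computing systems, the system I/O bottleneck is the main cause of low parallel efficiency in genomic variant analysis. Distributed unstructured storage databases and object storage systems are commonly used to improve the large-scale scalability of I/O and resolve the I/O problems in the analysis pipeline; efficient compression algorithms for genomic data can also reduce I/O and transmission pressure. To speed up genomic data analysis, algorithms such as neural networks can be used on the software side to optimize genome analysis methods, and heterogeneous computing with FPGAs (field-programmable gate arrays) or GPUs can be used on the hardware side to increase data processing speed. [Results] Taken together, these optimizations can substantially improve the performance of high-throughput computing in genomic data analysis, address the storage wall problem in genomic data processing, improve the utilization efficiency of high-throughput computing resources, and greatly reduce the computing time of whole-genome variant analysis. [Conclusions] The various problems of high-throughput computing in genomic data analysis can be solved through software and hardware development and optimization, significantly improving the computational efficiency of high-throughput computing in large-scale population cohort variant analysis and promoting the broad development of population cohort genomic research and applications.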
7.
Recently, biology has been confronted with large multidimensional gene expression data sets where the expression of thousands of genes is measured over dozens of conditions. The patterns in gene expression are frequently explained retrospectively by underlying biological principles. Here we present a method that uses text analysis to help find meaningful gene expression patterns that correlate with the underlying biology described in scientific literature. The main challenge is that the literature about an individual gene is not homogeneous and may address many unrelated aspects of the gene. In the first part of the paper we present and evaluate the neighbor divergence per gene (NDPG) method that assigns a score to a given subgroup of genes indicating the likelihood that the genes share a biological property or function. To do this, it uses only a reference index that connects genes to documents, and a corpus including those documents. In the second part of the paper we present an approach, optimizing separating projections (OSP), to search for linear projections in gene expression data that separate functionally related groups of genes from the rest of the genes; the objective function in our search is the NDPG score of the positively projected genes. A successful search, therefore, should identify patterns in gene expression data that correlate with meaningful biology. We apply OSP to a published gene expression data set; it discovers many biologically relevant projections. Since the method requires only numerical measurements (in this case expression) about entities (genes) with textual documentation (literature), we conjecture that this method could be transferred easily to other domains. The method should be able to identify relevant patterns even if the documentation for each entity pertains to many disparate subjects that are unrelated to each other.
8.
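Genes are the material basis of heredity. All life phenomena of an organism, including birth, growth, disease, aging, and death, are related to genes. Gene sequencing is one way to interpret life. With the development of next-generation high-throughput sequencing technologies, terabytes or more of sequence data are produced every day. Properly interpreting these large-scale, complex, high-dimensional data becomes an even greater difficulty once the data have been obtained; it is a key step in current biological research and is of great practical significance. The storage, processing, and analysis of massive high-throughput sequencing data all severely challenge current computer systems and computing models. Drawing on survey findings, in particular a case study of BGI (华大基因), this paper discusses the current state of high-throughput sequencing data analysis, its problems, and the measures taken by various parties. Nevertheless, meeting the challenges posed by high-throughput sequencing data still requires close multi-party collaboration and long-term, in-depth research.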
9.
As genomes evolve over hundreds of millions of years, the chromosomes become rearranged, with segments of some chromosomes inverted, while other chromosomes reciprocally exchange chunks from their ends. These rearrangements lead to the scrambling of the elements of one genome with respect to another descended from a common ancestor. Multidisciplinary work undertakes to mathematically model these processes and to develop statistical analyses and mathematical algorithms to understand the scrambling in the chromo...
10.
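This paper briefly reviews and analyzes four classes of methods for studying soil microorganisms, namely traditional microbial culture methods, microbial biomarker methods, the BIOLOG GN microplate method, and microbial molecular biology techniques, along with their application characteristics, with the aim of identifying, through comparison, the best method for revealing soil microbial community structure. Combining molecular biology techniques with traditional research methods will be an effective way to advance soil microbiology research.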
11.
12.
Energy Theft Detection in Smart Grids: Taxonomy, Comparative Analysis, Challenges, and Future Research Directions
Mohsin Ahmed, Abid Khan, Mansoor Ahmed, Mouzna Tahir, Gwanggil Jeon, Giancarlo Fortino, Francesco Piccialli, IEEE/CAA Journal of Automatica Sinica, 2022, 9(4): 578-600
Electricity theft is one of the major issues in developing countries and is badly affecting their economies. Especially with the introduction of emerging technologies, this issue became more complicated. Many new energy theft detection (ETD) techniques have nevertheless been proposed by utilising different data mining (DM) techniques, state & network (S&N) based techniques, and game theory (GT) techniques. Here, a detailed survey is presented where many state-of-the-art ETD techniques are studied and analysed for their strengths and limitations. Three levels of taxonomy are presented to classify state-of-the-art ETD techniques. Different types and ways of energy theft and their consequences are studied and summarised, and different parameters to benchmark the performance of proposed techniques are extracted from literature. The challenges of different ETD techniques and their mitigation are suggested for future work. It is observed that the literature on ETD lacks knowledge management techniques that can be more effective, not only for ETD but also for theft tracking. This can help in the prevention of energy theft, in the future, as well as for ETD.
13.
Structure models for each of the secondary structure regions from the Escherichia coli 16S rRNA (58 separate elements) were constructed using a constraint satisfaction modelling program to determine which helices deviated from classic A-form geometry. Constraints for each rRNA element included the comparative secondary structure, H-bonding conformations predicted from patterns of base-pair covariation, tertiary interactions predicted from covariation analysis, chemical probing data, rRNA–rRNA crosslinking information, and coordinates from solved structures. Models for each element were built using the MC-SYM modelling algorithm and subsequently were subjected to energy minimization to correct unfavorable geometry. Approximately two-thirds of the structures that result from the input data are very similar to A-form geometry. In the remaining instances, the presence of internal loops and bulges, some sequences (and sequence covariants) and accessory information require deviation from A-form geometry. The structures of regions containing more complex base-pairing arrangements including the central pseudoknot, the 530 region, and the pseudoknot involving base-pairing between G570-U571/A865-C866 and G861-C862/G867-C868 were predicted by this approach. These molecular models provide insight into the connection between patterns of H-bonding, the presence of unpaired nucleotides, and the overall geometry of each element.
14.
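Wind load is the main lateral load on open-air structures in the atmospheric boundary layer. For the theoretical calculation method and the computational fluid dynamics (CFD) method commonly used in engineering today, this paper describes the principles by which each computes wind pressure and wind load on a structure. Using the Reynolds-averaged Navier-Stokes (N-S) equations and the standard k-ε turbulence model, the wind pressure distribution on a simple structure at different heights is simulated numerically, the distribution characteristics of the surface wind pressure are analyzed, and on this basis the strengths and weaknesses of the two methods are compared. The results show that, because the fluctuating wind load on the structure is not considered, the numerical results are smaller than the theoretical ones; however, the CFD method yields more information about the parameters of the wind field and is more convenient and effective for computing wind loads on complex structures.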
15.
Applications of Data Mining in Bioinformatics (total citations: 3; self-citations: 0; citations by others: 3)
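Bioinformatics is an emerging interdisciplinary field. The launch and implementation of the Human Genome Project has led to rapid growth in nucleic acid and protein data, and how to extract useful information from massive data has become an urgent problem for bioinformatics. Data mining fits well with bioinformatics, and its application potential in the field is receiving increasing attention. This paper introduces the concept of data mining and the steps of mining biological data, and offers a preliminary discussion of the application potential of data mining in bioinformatics and of the development and application of bioinformatics mining tools. The study shows that data mining is a powerful tool for processing biological information, and its application in bioinformatics is expected to make further progress.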
16.
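This paper analyzes the quantitative dependability attributes of six low-cost redundant structures, including reliability, MTTF, availability, MTBF, maintainability, and MTTR. The six structures are: the cold-standby dual-module dynamic redundant structure, the hot-standby dual-module dynamic redundant structure, the dual-module static redundant structure, the dual-module comparison with single-module switchover structure, the dual-module load-sharing with single-module switchover structure, and the N+1 redundant structure. For each structure, according to the maintenance characteristics it may have, expressions for the corresponding dependability attributes are given for both the non-maintainable and the maintainable case.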
17.
Autocatalytic networks, in particular the glycolytic pathway, constitute an important part of the cell metabolism. Changes in the concentration of metabolites and catalyzing enzymes during the lifetime of the cell can lead to perturbations from its nominal operating condition. We investigate the effects of such perturbations on stability properties, e.g., the extent of regions of attraction, of a particular family of autocatalytic network models. Numerical experiments demonstrate that systems that are robust with respect to perturbations in the parameter space have an easily “verifiable” (in terms of proof complexity) region of attraction properties. Motivated by the computational complexity of optimization-based formulations, we take a compositional approach and exploit a natural decomposition of the system, induced by the underlying biological structure, into a feedback interconnection of two input–output subsystems: a small subsystem with complicating nonlinearities and a large subsystem with simple dynamics. This decomposition simplifies the analysis of large pathways by assembling region of attraction certificates based on the input–output properties of the subsystems. It enables numerical as well as analytical construction of block-diagonal Lyapunov functions for a large family of autocatalytic pathways.
18.
G.S. Chirikjian, Advanced Robotics, 2015, 29(13): 817-829
Hyper-redundant (or snakelike) manipulators have many more degrees of freedom than required to position and orient an object in space. They have been employed in a variety of applications ranging from search-and-rescue to minimally invasive surgical procedures, and recently they even have been proposed as solutions to problems in maintaining civil infrastructure and the repair of satellites. The kinematic and dynamic properties of snakelike robots are captured naturally using a continuum backbone curve equipped with a naturally evolving set of reference frames, stiffness properties, and mass density. When the snakelike robot has a continuum architecture, the backbone curve corresponds with the physical device itself. Interestingly, these same modeling ideas can be used to describe conformational shapes of DNA molecules and filamentous protein structures in solution and in cells. This paper reviews several classes of snakelike robots: (1) hyper-redundant manipulators guided by backbone curves; (2) flexible steerable needles; and (3) concentric tube continuum robots. It is then shown how the same mathematical modeling methods used in these robotics contexts can be used to model molecules such as DNA. All of these problems are treated in the context of a common mathematical framework based on the differential geometry of curves, continuum mechanics, and variational calculus. Both coordinate-dependent Euler–Lagrange formulations and coordinate-free Euler–Poincaré approaches are reviewed.
19.
A Review of High-Dimensional Index Structures in Metric Spaces (total citations: 4; self-citations: 0; citations by others: 4)
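1 Introduction. In recent years, applications of high-dimensional databases have developed rapidly, for example massive multimedia databases, large-scale text data, and the huge DNA databases of bioinformatics. Such information is generally mapped to high-dimensional data by methods such as feature extraction, and similarity queries are then carried out by computing the distances between these high-dimensional data. For example, a color histogram is often used to represent an image; when images similar to a given image need to be found in a data set, by computing ...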
20.
Analysis of Consumer Value Using Semantic Network: The Comparison of Hierarchical and Nonhierarchical Value Structures
This study compares the value structure of consumers derived from a nonhierarchical method that only requires grouping similar components and a hierarchical method that needs additional steps to classify components into the levels of abstraction, using semantic network analysis. The overall process of understanding consumers’ value structure consists of data collection, data structuring, and network analysis using UCINET 6.0. A case study was conducted to identify the value structure of teenage Internet use behavior. Based on the relative ranking of words with the smallest farness from others, the nonhierarchical method showed “beauty” as the key value of teenagers, while the hierarchical method revealed “warm relationship” as the critical value in their use of the Internet. The nonhierarchical method showed the ability to elicit more diverse values, depending on the characteristics of consumer groups, when compared with the conventional hierarchical method. © 2016 Wiley Periodicals, Inc.