Similar literature
Found 20 similar articles (search time: 46 ms)
1.
To make full use of all available information in protein structure prediction, including both backbone and side-chain structures, we present a novel algorithm for solving a generalized threading problem. In this problem we consider simultaneous backbone threading and side-chain packing during protein structure prediction. For a given query protein sequence and a template structure, our goal is to find a threading alignment between the query sequence and the template structure, along with a rotamer assignment for each side chain of the query protein, that optimizes an energy function combining a backbone threading energy and a side-chain packing energy. We solve this computationally challenging problem by first formulating it as a graph-based optimization problem. Various graph-theoretic techniques, which take advantage of a number of special properties of the graph representing this generalized threading problem, are employed to make the algorithm efficient enough to be practically useful. The overall framework of our algorithm is dynamic programming on an optimal tree decomposition of the graph representation of our problem. By using additional heuristic techniques such as dead-end elimination, we demonstrate that our algorithm, the first of its kind, can solve a generalized threading problem within a practically acceptable amount of time and space.

2.
The protein threading problem is the problem of determining the three-dimensional structure of a given but arbitrary protein sequence from a set of known structures of other proteins. This problem is known to be NP-hard, and current computational approaches to threading are impractical for long proteins and/or large template data sets. In this paper, we propose an evolution strategy for solving the protein threading problem. We also propose three parallel methods for fast threading. Our experiments produced encouraging preliminary results in terms of threading energy, as well as a significant reduction in threading time.

3.
The increasing popularity of graph data in various domains has led to a renewed interest in developing efficient graph matching techniques, especially for processing large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph. Given a large attributed graph and a query graph, we compute a subgraph of the large graph that best matches the query graph. We propose a novel structure-aware and attribute-aware index to process approximate graph matching in a large attributed graph. We first construct an index on the similarity of the attributed graph, by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity. Then, we construct a connectivity-based index to give a concise representation of inter-partition connections. We use the index to find a set of best matching paths. From these best matching paths, we compute the best matching answer graph using a greedy algorithm. Experimental results on real datasets demonstrate the efficiency of both index construction and query processing. We also show that our approach attains high-quality query answers.

4.
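In protein structure prediction research, an important problem is correctly predicting disulfide bond connectivity. Accurate prediction of disulfide bonds reduces the search space of protein conformations and thus aids prediction of a protein's 3D structure. This paper casts the prediction of disulfide bonds in a protein structure as the equivalent problem of finding a maximum-weight matching in a graph. The vertices of the graph represent the cysteine residues in the sequence; edges connect the vertices, each edge representing one possible bond; and each edge is assigned a weight by a weight function. The EJ algorithm is used to find the maximum-weight matching, which then corresponds to the correct disulfide connectivity. Applying this method to predict the disulfide bonds of protein structures gave good results.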
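
The matching formulation in the abstract above can be illustrated with a minimal sketch. It assumes an even number of cysteines and a symmetric matrix of hypothetical bond scores, and it uses a simple bitmask search over pairings instead of the EJ algorithm named in the abstract, so it is only practical for a handful of residues. The function name and the scores are illustrative assumptions, not taken from the paper.

```python
from functools import lru_cache


def best_disulfide_pairing(weights):
    """Return (total_weight, pairs) of a maximum-weight perfect matching.

    weights[i][j] is a hypothetical score for bonding cysteine i with
    cysteine j (symmetric matrix; an even number of cysteines is assumed).
    This exhaustive search stands in for the EJ algorithm of the abstract.
    """
    n = len(weights)
    if n % 2:
        raise ValueError("this sketch assumes an even number of cysteines")

    @lru_cache(maxsize=None)
    def solve(used):
        # `used` is a bitmask of cysteines that are already paired.
        if used == (1 << n) - 1:
            return 0.0, ()
        # Always pair the lowest-numbered free cysteine, so every pairing
        # is enumerated exactly once.
        i = next(k for k in range(n) if not used & (1 << k))
        best = (float("-inf"), ())
        for j in range(i + 1, n):
            if used & (1 << j):
                continue
            sub_total, sub_pairs = solve(used | (1 << i) | (1 << j))
            total = weights[i][j] + sub_total
            if total > best[0]:
                best = (total, ((i, j),) + sub_pairs)
        return best

    return solve(0)


# Hypothetical scores for four cysteines; higher means a more likely bond.
scores = [
    [0, 9, 2, 1],
    [9, 0, 3, 4],
    [2, 3, 0, 8],
    [1, 4, 8, 0],
]
total, pairs = best_disulfide_pairing(scores)
```

With these made-up scores the selected pairing is cysteines 0 with 1 and 2 with 3. A polynomial-time matching algorithm would be needed for proteins with many cysteines.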

5.
We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.

6.
Computers & Chemistry, 1998, 21(5): 369-375
Six protein pairs, all with known 3D structures, were used to evaluate different protein structure prediction tools. First, alignments between a target sequence and a template sequence or structure were obtained by sequence alignment with QUANTA or by threading with THREADER, 123D and PHD Topits. Second, protein structure models were generated using MODELLER. The two protein structure assessment tools used were the root mean square deviation (RMSD) from the experimental target structure and the total 3D profile score. The accuracy of the active sites of models built in the absence and presence of ligands was also investigated. Our study confirms that threading methods are able to yield more accurate models than comparative modelling in cases of low sequence identity (<30%). However, a gap of 2 Å (RMSD) exists between the theoretically best model and the models obtained by threading methods. For high sequence identities (>30%), comparative modelling using MODELLER resulted in accurate models. Furthermore, the total 3D profile score was not always able to distinguish correct from incorrect folds when different alignment methods were used. Finally, we found it to be important to include possible ligands in the model-building process in order to prevent unrealistic filling of active site areas.
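
As a rough illustration of the RMSD measure mentioned above, the following sketch computes the root mean square deviation between two equally long lists of 3D coordinates. It assumes the model and the experimental structure have already been superimposed and that atoms are listed in matching order; the superposition step itself is not shown, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(model_coords, target_coords):
    """Root mean square deviation between two pre-aligned coordinate lists.

    Each argument is a list of (x, y, z) tuples in the same atom order.
    No superposition is performed here; that is assumed to be done already.
    """
    if len(model_coords) != len(target_coords) or not model_coords:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model_coords, target_coords)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model_coords))


# Two made-up atoms, each displaced by 2 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deviation = rmsd(model, target)  # 2.0
```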

7.
We consider the problem of partial shape matching. We propose to transform shapes into sequences and utilize an algorithm that determines a subsequence of a target sequence that best matches a query. In the proposed algorithm we map the problem of the best matching subsequence to the problem of a cheapest path in a directed acyclic graph (DAG). The approach allows us to compute the optimal scale and translation of sequence values, which is a nontrivial problem in the case of subsequence matching. Our experimental results demonstrate that the proposed algorithm outperforms the commonly used techniques in retrieval accuracy.

8.
Predicting a protein's tertiary structure from its amino-acid sequence is known as the protein folding problem. The protein folding problem in the hydrophobic-hydrophilic lattice model is to find the lowest energy conformation. In order to enhance the performance of predicting protein structure, in this paper we propose an efficient hybrid Taguchi-genetic algorithm that combines a genetic algorithm (GA), the Taguchi method, and particle swarm optimization (PSO). The GA has the capability of powerful global exploration, while the Taguchi method can exploit the optimum offspring. In addition, we present a PSO variant inspired by the mutation mechanism of a genetic algorithm. We demonstrate that our algorithm can be applied successfully to the protein folding problem based on the hydrophobic-hydrophilic lattice model. Simulation results indicate that our approach performs very well against existing evolutionary algorithms.
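
To make the objective concrete, here is a small sketch of how the energy of a conformation is commonly scored in the 2D hydrophobic-hydrophilic lattice model: each pair of hydrophobic residues that are lattice neighbours but not chain neighbours contributes -1. The 2D square lattice, the -1 convention and the function name are assumptions made for illustration; the abstract does not give the exact energy function, and the hybrid search algorithm itself is not reproduced.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D HP-lattice conformation (assumed -1 per H-H contact).

    sequence: string of 'H' (hydrophobic) and 'P' (hydrophilic) residues.
    coords:   list of (x, y) integer lattice points, one per residue.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {point: i for i, point in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


# A four-residue chain folded into a square: the two H ends touch.
square_fold = hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])  # -1
```

A search method such as the one described above would call a function like this repeatedly to compare candidate conformations.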

9.
In this paper, we study the protein threading problem, which was proposed for predicting a folded 3D protein structure from an amino acid sequence. Since this problem has already been proved to be NP-hard, we study polynomial-time approximation algorithms. We show several hardness results for approximation, including a MAX SNP-hardness result. We also give approximation algorithms for a special case and for the general case; in the special case, the graph representing interactions between amino acid residues is restricted to be planar. For this special case, we obtain a constant approximation ratio.

10.
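Regular path querying is a technique that applies regular expressions to query graph data, typically using finite-state automata to perform query matching. The results of existing regular path query methods are sequences of vertex pairs, which fail to fully preserve the structure of the graph. To address this, a structured regular path query method for graph data is proposed; by imposing structural constraints between different sequences, the query result changes from paths to subgraphs. To this end, a structured regular path query language is first defined, and structured query parsing and a matching algorithm based on this structure are designed. Experiments on synthetic and real datasets test and analyze the method, verify the effect of network size on query speed, and include a controlled comparison. The results show that the proposed method achieves structured querying while ensuring that the regular expression constraints are satisfied.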
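
For orientation, the conventional automaton-based matching that the abstract above takes as its starting point can be sketched as a breadth-first search over pairs of (graph vertex, automaton state). This is the plain, unstructured baseline, not the structured method the abstract proposes; the edge-list format, the transition table and the labels are hypothetical.

```python
from collections import deque


def regular_path_query(edges, transitions, start_state, accept_states, source):
    """Vertices reachable from `source` by a path whose labels are accepted.

    edges:       list of (from_vertex, label, to_vertex) triples.
    transitions: dict mapping (state, label) to a set of next states,
                 i.e. a hand-built automaton for the regular expression.
    """
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))

    start = (source, start_state)
    seen = {start}
    queue = deque([start])
    answers = set()
    while queue:
        vertex, state = queue.popleft()
        if state in accept_states:
            answers.add(vertex)
        # Advance graph and automaton together on each edge label.
        for label, next_vertex in adjacency.get(vertex, []):
            for next_state in transitions.get((state, label), ()):
                pair = (next_vertex, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical graph and an automaton for the expression: knows+ worksAt
graph = [("a", "knows", "b"), ("b", "knows", "c"),
         ("c", "worksAt", "d"), ("a", "worksAt", "e")]
automaton = {(0, "knows"): {1}, (1, "knows"): {1}, (1, "worksAt"): {2}}
result = regular_path_query(graph, automaton, 0, {2}, "a")  # {"d"}
```

The result is only a set of end vertices, which shows the loss of graph structure that motivates the structured query method.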

11.
Liu Leijun, Zhu Meng, Zhang Lei. Journal of Computer Applications (《计算机应用》), 2015, 35(11): 3161-3165
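To address the "data sparsity" problem in trajectory prediction for moving objects, namely that the available historical trajectory space cannot cover all possible query trajectories, a sparse trajectory prediction algorithm based on iterative grid partitioning and entropy estimation (TPDS-IGP&EE) is proposed. First, the trajectory region is iteratively partitioned into grids and trajectory sequences are generated. Then, L-Z entropy estimation is introduced to compute the entropy of each trajectory sequence, and trajectories are synthesized on the basis of their entropy values to form a new trajectory space. Finally, sparse trajectory prediction is performed in combination with a sub-trajectory synthesis algorithm. Experimental results show that when trajectory completeness exceeds 90%, the query coverage of the Baseline algorithm is only about 25%, whereas TPDS-IGP&EE is almost unaffected by query trajectory length and can predict nearly 100% of query trajectories. The prediction accuracy of TPDS-IGP&EE is generally about 4% higher than that of the Baseline algorithm. Moreover, the prediction time of the Baseline algorithm is very long, reaching 100 ms, while that of TPDS-IGP&EE (10 μs) is almost negligible. TPDS-IGP&EE can therefore predict trajectories effectively in sparse environments, with wider prediction coverage, faster prediction, and higher prediction accuracy.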

12.
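Approximate graph querying returns a result set that approximately matches the query graph, and it has a wider range of applications than exact querying. To improve the precision and recall of approximate queries, a query algorithm based on graph structure decomposition is proposed. The algorithm decomposes the structure of the query graph and the target graphs and builds a graph decomposition index over them. It then uses the set of minimum spanning trees of the query graph to obtain the set of spanning trees that satisfy a threshold, locates them quickly in the index via canonical graph codes, and finds all possible approximate results. Experimental results show that the algorithm obtains approximate results effectively and improves query speed.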

13.
Semijoin has traditionally been relied upon to reduce the cost of data transmission for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the amount of data transmission required. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed to that of finding a specific type of set of cuts to the corresponding query graph, where a cut to a graph is a partition of nodes in that graph. Then, in light of this concept, we prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, thus justifying the necessity of applying heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers into the one of finding a set of cuts, we develop (for tree and general query graphs, respectively) efficient heuristic algorithms to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulation is performed to evaluate these algorithms.

14.
This paper addresses the classic job shop scheduling problem where sequence-dependent setup times are present. Based on a modified disjunctive graph, we further investigate and generalize structural properties for the problem under study. A tabu search algorithm with a sophisticated neighbourhood structure is then developed. In contrast to most studies in this research area, we are interested in moving internal critical operations rather than merely focusing on non-internal ones. Moreover, neighbourhood functions are defined using insertion techniques instead of simple swaps. Test results show that our algorithm outperforms a recently published simulated annealing algorithm. We have also conducted experiments to assess the efficiency of the developed propositions.

15.
16.
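Motivated by the three-dimensional layout design problem on the bearing plate of a simplified satellite module, we study a class of layout problems with mixed cylindrical and cuboid objects under static imbalance constraints. For this three-dimensional layout problem, the Wang-Landau sampling algorithm, which has been applied successfully in statistical physics and protein structure prediction, is introduced. The Wang-Landau sampling algorithm samples the complex layout space effectively to obtain a flat energy histogram, thereby accurately estimating the density of states of the layout system. By combining the Wang-Landau sampling algorithm with a steepest descent method with an acceleration strategy and with a centroid translation strategy, an improved Wang-Landau sampling algorithm is proposed. Computations on two instances from the literature show that the improved Wang-Landau sampling algorithm considerably improves both convergence speed and solution quality compared with other algorithms in the literature.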

17.
In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graph-structured data be maintained for analytical processing. We call a historical evolving graph sequence an EGS. We observe that in many applications, graphs of an EGS are large and numerous, and they often exhibit much redundancy among them. We study the problem of efficient shortest path query processing on an EGS and put forward a solution framework called FVF. Two algorithms, namely, FVF-F and FVF-H, are proposed. While the FVF-F algorithm works on a sequence of flat graph clusters, the FVF-H algorithm works on a hierarchy of such clusters. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in shortest path query processing on EGSs. Comparing FVF-F and FVF-H, the latter gives a larger speedup, is more flexible in terms of memory requirements, and is far less sensitive to parameter values.

18.
Frequent subgraphs have proved to be powerful features for graph classification and prediction tasks. Their practical use is, however, limited by the computational intractability of pattern enumeration and that of graph embedding into frequent subgraph feature spaces. We propose a simple probabilistic technique that resolves both limitations. In particular, we restrict the pattern language to trees and relax the demand on the completeness of the mining algorithm, as well as on the correctness of the pattern matching operator, by replacing transaction and query graphs with small random samples of their spanning trees. In this way we consider only a random subset of frequent subtrees, called probabilistic frequent subtrees, that can be enumerated efficiently. Our extensive empirical evaluation on artificial and benchmark molecular graph datasets shows that probabilistic frequent subtrees can be listed in practically feasible time and that their predictive and retrieval performance is very close even to those of complete sets of frequent subgraphs. We also present different fast techniques for computing the embedding of unseen graphs into (probabilistic frequent) subtree feature spaces. These algorithms utilize the partial order on tree patterns induced by subgraph isomorphism and, as we show empirically, require far fewer evaluations of subtree isomorphism than the standard brute-force algorithm. We also consider partial embeddings, i.e., when only a part of the feature vector has to be calculated. In particular, we propose a highly effective practical algorithm that significantly reduces the number of pattern matching evaluations required by the classical min-hashing algorithm approximating Jaccard similarities.
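
The classical min-hashing idea referred to at the end of the abstract above can be sketched as follows: each feature set is summarized by the minimum hash value under several hash functions, and the fraction of agreeing minima estimates the Jaccard similarity. This is only the textbook baseline, not the paper's reduced-evaluation algorithm; the salted SHA-256 hash family, the function names and the integer feature identifiers are assumptions made for illustration.

```python
import hashlib


def _salted_hash(seed, item):
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(features, num_hashes=256):
    """Min-hash signature of a non-empty set of feature identifiers."""
    if not features:
        raise ValueError("feature set must not be empty")
    return [min(_salted_hash(seed, f) for f in features)
            for seed in range(num_hashes)]


def estimate_jaccard(signature_a, signature_b):
    """Fraction of positions where the two signatures agree."""
    if len(signature_a) != len(signature_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(signature_a, signature_b))
    return matches / len(signature_a)


# Hypothetical pattern identifiers matched by two graphs.
graph_a_patterns = set(range(0, 100))
graph_b_patterns = set(range(50, 150))   # true Jaccard similarity is 1/3
estimate = estimate_jaccard(minhash_signature(graph_a_patterns),
                            minhash_signature(graph_b_patterns))
```

In this baseline every pattern in the feature set is touched for every hash function, which is the cost that the algorithm described in the abstract aims to reduce.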

19.
Recently, uncertain graph data management and mining techniques have attracted significant interest and research effort due to potential applications such as protein interaction networks and social networks. Specifically, as a fundamental problem, subgraph similarity all-matching is widely applied in exploratory data analysis. The purpose of subgraph similarity all-matching is to find all the similarity occurrences of the query graph in a large data graph. Numerous algorithms and pruning methods have been developed for the subgraph matching problem over a certain graph. However, insufficient effort has been devoted to subgraph similarity all-matching over an uncertain data graph, which is quite challenging due to high computation costs. In this paper, we define the problem of subgraph similarity maximal all-matching over a large uncertain data graph and propose a framework to solve this problem. To further improve efficiency, several speed-up techniques are proposed, such as partial graph evaluation, vertex pruning, calculation model transformation, incremental evaluation, and probability upper bound filtering. Finally, comprehensive experiments are conducted on real graph data to test the performance of our framework and optimization methods. The results verify that our solutions can outperform the basic approach by orders of magnitude in efficiency.

20.
The aim of this work is to construct a tool to assist in the prediction of peptidic properties resulting from the exchange of two amino acids in a protein chain. In the past, others have used experimental properties for this purpose. However, the nature of these data sets severely limits their access to important properties pertaining to secondary structure, and hence the indices used cannot characterize different backbone conformers like alpha helix and beta strands, or side-chain conformations like gauche+, gauche- and trans. In this study we explore the importance of backbone and side-chain angles with regard to conformer similarity measured with theoretical properties calculated in an ab initio manner. For each of the 20 genetically encoded amino acids, we studied five conformers that correspond to alpha helical and beta strand structures, with three different side-chain conformations for each, defined solely by their angles phi, psi and chi1. This methodology allowed each of the 108 conformers to be represented by a mathematical object without ambiguity. The peptidic chain was emulated using two capping models to simulate the effect of nearest neighbors. These are OHC-Xaa-NH2 and Ala-Xaa-Ala, where Xaa is the conformer of interest. We then calculated 40 ab initio quantum chemical and graph theory indices for each backbone-side-chain conformer to obtain a characterization and classification scheme. We found that: (1) while backbone structure is very important to conformer similarity, side-chain conformations do not cluster together in a top-level manner; (2) amino acids with pi electrons group together independent of backbone conformation.
