Similar literature
 Found 20 similar articles; search took 15 ms
1.
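Tumor feature gene selection based on BP neural networks   Total citations: 2, self-citations: 0, citations by others: 2
This paper proposes a sensitivity analysis method based on a BP neural network and applies it to the selection of tumor feature genes. Taking colon cancer gene expression profiles as an example, the method first defines the sensitivity of each gene with respect to the output function of the BP neural network model, then recursively removes a number of genes with low sensitivity to generate a set of nested candidate feature gene subsets. Next, with a support vector machine as the classifier, it tests the contribution of each candidate subset to sample classification and selects the candidate subset with the lowest misclassification rate as the colon cancer feature gene subset. In experimental comparisons, the classification results of this feature gene subset are better than those of other feature gene subsets reported in the literature, which indicates the feasibility and effectiveness of the method.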

2.
The information bottleneck (IB) method is an unsupervised, model-independent data organization technique. Given a joint distribution, p(X, Y), this method constructs a new variable, T, that extracts partitions, or clusters, over the values of X that are informative about Y. Algorithms that are motivated by the IB method have already been applied to text classification, gene expression, neural code, and spectral analysis. Here, we introduce a general principled framework for multivariate extensions of the IB method. This allows us to consider multiple systems of data partitions that are interrelated. Our approach utilizes Bayesian networks for specifying the systems of clusters and which information terms should be maintained. We show that this construction provides insights about bottleneck variations and enables us to characterize the solutions of these variations. We also present four different algorithmic approaches that allow us to construct solutions in practice and apply them to several real-world problems.
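For orientation, the trade-off that the original (single-variable) IB method optimizes is usually written as the Lagrangian below. The abstract above does not state it, so the notation here (the trade-off parameter \beta and the stochastic mapping p(t|x)) is an assumption taken from the standard formulation, not from this paper's multivariate extension.

```latex
\[
  \min_{p(t \mid x)} \; \mathcal{L}\bigl[p(t \mid x)\bigr]
  \;=\; I(T;X) \;-\; \beta\, I(T;Y),
  \qquad \beta > 0 .
\]
```

Minimizing I(T;X) compresses X into T, while the term -\beta I(T;Y) rewards keeping the information that T carries about Y; larger \beta favors preserving information over compression.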

3.
Xintao  Yong   《Pattern recognition》2006,39(12):2439-2449
DNA microarrays provide a powerful basis for the analysis of gene expression. Bayesian networks, which are based on directed acyclic graphs (DAGs) and can provide models of causal influence, have been investigated for gene regulatory networks. The difficulty with this technique is that learning the Bayesian network structure is an NP-hard problem, as the number of DAGs is superexponential in the number of genes, and an exhaustive search is intractable. In this paper, we propose an enhanced constraint-based approach for causal structure learning. We integrate it with graphical Gaussian modeling and use the resulting independence graph as an input to our constraint-based causal learning method. We also present graphical decomposition techniques to further improve the performance. Our enhanced method makes it feasible to explore causal interactions among genes interactively. We have tested our methodology using two microarray data sets. The results show that the technique is both effective and efficient in exploring causal structures from microarray data.

4.
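Gao Juan  Wang Guoyin  Hu Feng 《计算机科学》(Computer Science) 2012,39(10):193-197
From an informatics perspective, finding tumor-related genes and discovering the gene expression characteristics of tumors are of great biological significance for tumor diagnosis and treatment, and classifying tumor versus normal tissue is one important application. Based on multi-class tumor gene expression profiles, an automatic feature selection method is proposed. First, combining a nonparametric method with the filter approach, the randomness of the decision sequence is used to measure and rank the weights of genes; then, correlation information entropy is used to eliminate redundancy, automatically selecting a feature gene subset with high discriminative power and low redundancy. Experimental results show that the proposed method can automatically select 30 feature genes with good classification ability from multi-class tumor gene expression profile data, with a high correct recognition rate.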

5.
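Starting from the analysis of cancer gene expression profiles, and addressing their high dimensionality and small sample sizes, a method for cancer classification based on neighborhood rough sets and an ensemble of probabilistic neural networks is proposed. First, the Relief algorithm is used to rank genes; then neighborhood rough sets are used to select feature genes for classification; finally, an ensemble classification model of probabilistic neural networks is used to classify cancers. Experimental results show that the method can select cancer feature genes quickly and effectively and can obtain better classification...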

6.
In computational biology, gene networks are typically inferred from gene expression data alone. Incorporating multiple types of biological evidence makes it possible to improve gene network estimation. In this paper, we describe an approach for building enzyme gene networks by integrating gene expression data, motif sequences, and metabolic information. To evaluate the approach, we apply it to a pool of E. coli genes related to the aspartate pathway. The results show that the integrative approach has the potential to obtain more accurate gene networks.

7.
The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways: we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and we reduced the number of genes involved in the classification by selecting a small set of informative genes. We were able to make maximal use of the abundant microarray data being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that is applicable to integrated microarrays and to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. First, we performed a direct integration of individual microarrays by transforming each expression value into a rank value within a sample and identified informative genes by calculating the number of swaps needed to reach a perfectly split sequence. Second, we built a classifier, a parameter-free ensemble method that uses only the pre-selected informative genes. By using our classifier, which was derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset.
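The rank transformation and swap counting mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the exact procedure, so everything here is an assumption: the function names `rank_within_sample` and `swaps_to_perfect_split` are hypothetical, ties are ignored, and "swaps" is read as adjacent transpositions needed to separate two classes.

```python
from typing import List, Sequence


def rank_within_sample(values: Sequence[float]) -> List[int]:
    """Replace each expression value by its rank (1 = smallest) within one sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def _adjacent_swaps(sequence: Sequence[int], first_label: int) -> int:
    """Adjacent swaps needed to move every `first_label` item to the front."""
    swaps = 0
    others_seen = 0
    for label in sequence:
        if label == first_label:
            swaps += others_seen  # must hop over every other-class item before it
        else:
            others_seen += 1
    return swaps


def swaps_to_perfect_split(gene_values: Sequence[float], labels: Sequence[int]) -> int:
    """Assumed informativeness score for one gene (lower = more informative).

    Samples are ordered by the gene's (rank) value; the score is the minimum
    number of adjacent swaps that turns the resulting label sequence into a
    perfectly split one (all of one class, then all of the other).
    """
    order = sorted(range(len(labels)), key=lambda i: gene_values[i])
    sequence = [labels[i] for i in order]
    return min(_adjacent_swaps(sequence, 0), _adjacent_swaps(sequence, 1))


if __name__ == "__main__":
    print(rank_within_sample([5.0, 1.0, 3.0]))
    print(swaps_to_perfect_split([1, 3, 2, 4], [0, 0, 1, 1]))
```

Because ranks are computed inside each sample, the score depends only on orderings, which is one plausible reason such a transformation helps when combining microarrays produced by different groups.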

8.
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. One problem arising from these data is how to select a small subset of genes from thousands of genes and a few samples that are inherently noisy. This research aims to select a small subset of informative genes from the gene expression data which will maximize the classification accuracy. A model for gene selection and classification has been developed by using a filter approach, and an improved hybrid of the genetic algorithm and a support vector machine classifier. We show that the classification accuracy of the proposed model is useful for the cancer classification of one widely used gene expression benchmark data set.

9.
This paper investigates Bayesian modeling of known and unknown causes of events in the context of disease-outbreak detection. We introduce a multivariate Bayesian approach that models multiple evidential features of every person in the population. This approach models and detects (1) known diseases (e.g., influenza and anthrax) by using informative prior probabilities and (2) unknown diseases (e.g., a new, highly contagious respiratory virus that has never been seen before) by using relatively non-informative prior probabilities. We report the results of simulation experiments, which support the claim that this modeling method can improve the detection of new disease outbreaks in a population. A contribution of this paper is that it introduces a multivariate Bayesian approach for jointly modeling both known and unknown causes of events. Such modeling has general applicability in domains where the space of known causes is incomplete.

10.
Embar  Varun  Srinivasan  Sriram  Getoor  Lise 《Machine Learning》2021,110(7):1847-1866

Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complex aggregate graph queries (AGQ) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not take into account uncertainty, or when they do, make simplifying assumptions and do not build full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error when compared to existing GNN-based approaches.


11.
Graphical models, such as Bayesian networks and Markov networks, represent joint distributions over a set of variables by means of a graph. When the graph is singly connected, local propagation rules of the sort proposed by Pearl (1988) are guaranteed to converge to the correct posterior probabilities. Recently a number of researchers have empirically demonstrated good performance of these same local propagation schemes on graphs with loops, but a theoretical understanding of this performance has yet to be achieved. For graphical models with a single loop, we derive an analytical relationship between the probabilities computed using local propagation and the correct marginals. Using this relationship we show a category of graphical models with loops for which local propagation gives rise to provably optimal maximum a posteriori assignments (although the computed marginals will be incorrect). We also show how nodes can use local information in the messages they receive in order to correct their computed marginals. We discuss how these results can be extended to graphical models with multiple loops and show simulation results suggesting that some properties of propagation on single-loop graphs may hold for a larger class of graphs. Specifically we discuss the implication of our results for understanding a class of recently proposed error-correcting codes known as turbo codes.

12.
We present a computational framework for identifying a set of initial states from which all trajectories of a piecewise affine (PWA) system with additive uncertainty satisfy a linear temporal logic (LTL) formula over a set of linear predicates in its state variables. Our approach is based on the construction and refinement of finite abstractions of infinite systems. We derive conditions guaranteeing the equivalence of an infinite system and its finite abstraction with respect to a specific LTL formula and propose a method for the construction of such formula-equivalent abstractions. While provably correct, the overall method is conservative and expensive. A tool for PWA systems implementing the proposed procedure using polyhedral operations and analysis of finite graphs is made available. Examples illustrating the analysis of PWA models of gene networks are included.

13.
Synthesizing networks that satisfy multiple requirements, such as high reliability, low diameter, good embeddability, etc., is a difficult problem to which there has been no completely satisfactory solution. We present a simple, yet very effective, approach to this problem. The crux of our approach is a filtration process that takes as input a large set of randomly generated graphs and filters out those that do not meet the specified requirements. Our experimental results show that this approach is both practical and powerful. The use of random regular networks as the raw material for the filtration process was motivated by their surprisingly good performance with regard to almost all properties that characterize a good interconnection network. We provide results related to the generation of networks that have low diameter, high fault tolerance, and good embeddability. Through this, we show that the generated networks are serious competitors to several traditional well-known networks. We also explore how random networks can be used in a packaging hierarchy and comment on the scope of application of these networks.

14.
Wireless sensor networks (WSNs) have become increasingly appealing in recent years for the purpose of data acquisition, surveillance, event monitoring, etc. Optimal positioning of wireless sensor nodes is an important issue for small networks of relatively expensive sensing devices. For such networks, the placement problem requires that multiple objectives be met. These objectives are usually conflicting, e.g. achieving maximum coverage and maximum connectivity while minimizing the network energy cost. A flexible algorithm for sensor placement (FLEX) is presented that uses an evolutionary computational approach to solve this multiobjective sensor placement optimization problem when the number of sensor nodes is not fixed and the maximum number of nodes is not known a priori. FLEX starts with an initial population of simple WSNs and complexifies their topologies over generations. It keeps track of new genes through historical markings, which are used in later generations to assess two networks’ compatibility and also to align genes during crossover. It uses Pareto-dominance to approach Pareto-optimal layouts with respect to the objectives. Speciation is employed to aid the survival of gene innovations and to let networks compete with similar networks. Elitism ensures that the best solutions are carried over to the next generation. The flexibility of the algorithm is illustrated by solving the device/node placement problem for different applications such as facility surveillance, coverage with and without obstacles, preferential surveillance, and forming a clustering hierarchy.
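The Pareto-dominance test that this abstract relies on can be sketched generically. This is not FLEX itself: the objective tuple (coverage, connectivity, negated energy cost), the sample layouts, and the helper names `dominates` and `pareto_front` are illustrative assumptions, with every objective expressed so that larger is better.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` in every objective and strictly
    better in at least one (all objectives are to be maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


if __name__ == "__main__":
    # Hypothetical layouts scored as (coverage, connectivity, -energy_cost).
    layouts = [
        (0.9, 0.8, -5.0),
        (0.7, 0.9, -4.0),
        (0.6, 0.7, -6.0),  # worse than the first layout in every objective
    ]
    print(pareto_front(layouts))
```

Negating the energy cost is a convenience so that a single "larger is better" comparison covers conflicting goals; an evolutionary algorithm would apply such a filter to each generation's population.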

15.
Visualizing pathways, i.e. models of cellular functional networks, is a challenging task in computer-assisted biomedicine. Pathways are represented as large collections of interwoven graphs, with complex structures present in both the individual graphs and their interconnections. This situation requires the development of novel visualization techniques to allow efficient visual exploration. We present the Caleydo framework, which incorporates a number of approaches to handle such pathways. Navigation in the network of pathways is facilitated by a hierarchical approach which dynamically selects a working set of individual pathways for closer inspection. These pathways are interactively rendered together with visual interconnections in a 2.5D view using graphics hardware acceleration. The layout of individual graphs is not computed automatically, but taken from the KEGG and BioCarta databases, which use layouts that life scientists are familiar with. Therefore they encode essential meta‐information. While the KEGG and BioCarta pathways use a pre‐defined layout, interactions such as linking+brushing, neighborhood search or detail on demand are still fully interactive in Caleydo. We have evaluated Caleydo with pathologists working on the determination of unknown gene functions. Informal experiences confirm that Caleydo is useful in both generating and validating such hypotheses. Even though the presented techniques are applied to medical pathways, the proposed way of interaction is not limited to cellular processes and therefore has the potential to open new possibilities in other fields of application.

16.
17.
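Clustering of dynamic graphs has two main shortcomings. First, most existing classical clustering algorithms start from static graph analysis and cannot effectively model the continuously evolving nature of real network graphs; clustering algorithms for dynamic graphs therefore urgently need study, so that the dynamic evolution of a graph can be understood by analyzing the cluster structure of graph snapshots at different times. Second, in real networks the cluster labels of some nodes can be obtained in advance; how to incorporate this prior information into the partitioning of the dynamic graph's cluster structure, and thereby assign cluster labels to the unlabeled nodes, is also a problem this paper needs to solve. To this end, this paper proposes the evolution factor graph model (EFGM) for semi-supervised clustering of dynamic graph nodes. The proposed EFGM captures not only the node attributes and edge adjacency attributes of the dynamic graph but also the temporal snapshot information of the nodes. Experiments on real datasets show that EFGM fuses the dynamic graph and the prior information into a unified evolution factor graph framework, so that the clustering results both satisfy the prior knowledge and fit the overall evolution pattern of the dynamic graph, which verifies the effectiveness of the proposed method.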

18.
Colon cancer gene selection and sample classification based on genetic algorithms   Total citations: 2, self-citations: 1, citations by others: 1
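A new method based on a two-round genetic algorithm is proposed for gene selection and sample classification on colon cancer microarray data. The method first filters out most genes irrelevant to classification according to each gene's Bhattacharyya distance index, then uses GA/CFS, which combines a genetic algorithm with CFS (Correlation-based Feature Selection), to select good gene subsets and archives them. Candidate subsets for further search are chosen according to how frequently genes are selected in the archived subsets, and finally GA/SVM, which combines a genetic algorithm with an SVM, selects the classification feature subset from the candidate gene subsets. This GA/CFS-GA/SVM method was applied to colon cancer microarray data; the experimental results and a comparison with the literature show that the method performs well.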
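As a rough illustration of the first filtering step, the sketch below scores one gene by the Bhattacharyya distance between its expression values in two classes. The abstract does not say how the index is computed, so the closed form for two univariate Gaussians is used here as an assumption, and the function name `bhattacharyya_distance` is hypothetical; both classes must have non-zero variance.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def bhattacharyya_distance(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Bhattacharyya distance between two classes of expression values for one
    gene, assuming each class is univariate Gaussian (an assumption, not the
    paper's stated procedure). Larger values mean better class separation."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    var_a, var_b = pvariance(class_a), pvariance(class_b)
    var_sum = var_a + var_b
    # Term 1: separation of the class means, scaled by the pooled spread.
    mean_term = 0.25 * (mu_a - mu_b) ** 2 / var_sum
    # Term 2: mismatch between the two class variances.
    var_term = 0.5 * math.log(var_sum / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + var_term


if __name__ == "__main__":
    tumor = [1.0, 2.0, 3.0]      # hypothetical expression values
    normal = [11.0, 12.0, 13.0]  # hypothetical expression values
    print(bhattacharyya_distance(tumor, normal))
```

Genes would then be sorted by this score and only the top-scoring ones passed to the later genetic-algorithm stages; the cut-off is a design choice the abstract does not specify.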

19.
Social networks are usually modeled and represented as deterministic graphs, with a set of nodes as users and edges as connections between users. Due to the uncertain and dynamic nature of user behavior and human activities in social networks, their structural and behavioral parameters are time-varying, and for this reason deterministic graphs may not be appropriate for modeling and analyzing the behavior of users. In this paper, we propose that stochastic graphs, in which the weights associated with edges are random variables, may be a better candidate as a graph model for social network analysis. Thus, we first propose generalizations of some network measures for stochastic graphs and then propose six learning-automata-based algorithms for calculating these measures when the probability distribution functions of the edge weights of the graph are unknown. Simulations on different synthetic stochastic graphs for calculating the network measures using the proposed algorithms show that, in order to obtain good estimates of the network measures, the required number of samples taken from edges of the graph is significantly lower than that of the standard sampling method aimed at analyzing human behavior in online social networks.

20.
Modern information networks, such as social networks, communication networks, and citation networks, are often characterized by very large sizes and dynamically changing structures. Common solutions to graph mining tasks (e.g., node classification) usually employ an unrestricted sampling-then-mining paradigm to reduce a large network to a manageable size, followed by subsequent mining tasks. However, real-world networks may not be accessible all at once and must be crawled progressively. This can be because the network is too large, or because of privacy or legal concerns. In this paper, we propose an Active Exploration framework for large graphs, where the goal is to simultaneously carry out network sampling and node labeling in order to build a sampled network from which the trained classifier can achieve the maximum node classification accuracy. To achieve this goal, we consider a network as a Markov chain and compute the stationary distribution of the nodes by deriving supervised random walks. The stationary distribution helps identify specific nodes to be sampled in the next step, and the labeling process labels the most informative node, which in turn strengthens the sampling of the network. To improve the scalability of active exploration for large graphs, we also propose a more efficient multi-seed algorithm that simultaneously runs multiple, parallel exploration processes, and makes joint decisions to determine which nodes are to be sampled and labeled next. The simultaneous, mutually enhanced sampling and labeling processes ensure that the final sampled network contains a maximum number of nodes directly related to the underlying mining tasks. Experiments on both synthetic and real-world networks demonstrate that our active exploration algorithms have a much better chance of including target nodes in the sampled networks than baseline methods.
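The idea of treating a network as a Markov chain and reading node importance off its stationary distribution can be sketched with plain power iteration. This is only an illustration: the paper derives its transition probabilities from supervised random walks, whereas the transition matrix below is a fixed, hypothetical example, and the function name `stationary_distribution` is an assumption.

```python
from typing import List


def stationary_distribution(
    transition: List[List[float]], tol: float = 1e-12, max_iter: int = 10_000
) -> List[float]:
    """Approximate pi with pi = pi * P by power iteration.

    `transition[i][j]` is the probability of stepping from node i to node j;
    each row must sum to 1. Convergence is assumed (e.g. an irreducible,
    aperiodic chain); otherwise the last iterate is returned.
    """
    n = len(transition)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    # Hypothetical two-node chain; its stationary distribution is (5/6, 1/6).
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    print(stationary_distribution(P))
```

Nodes with high stationary probability are the ones a random walker visits most often, which is the sense in which such a distribution can guide which nodes to sample next.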
