Similar literature
18 similar articles found (search time: 0 ms)
1.
Unsupervised clustering methods such as K-means, hierarchical clustering and fuzzy c-means have been widely applied to the analysis of gene expression data to identify biologically relevant groups of genes. Recent studies have suggested that the incorporation of biological information into validation methods to assess the quality of clustering results might be useful in facilitating biological and biomedical knowledge discoveries. In this study, we generalize two bio-validity indices, the biological homogeneity index and the biological stability index, to quantify the abilities of soft clustering algorithms such as fuzzy c-means and model-based clustering. The results of an evaluation of several existing soft clustering algorithms using simulated and real data sets indicate that the soft versions of the indices provide both better precision and better accuracy than the classical ones. The significance of the proposed indices is also discussed.
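The abstract above does not give the soft versions of the indices, so the sketch below only illustrates the classical (hard-clustering) biological homogeneity index as it is commonly defined: for each cluster, the fraction of ordered pairs of annotated genes that share a functional class, averaged over clusters. The function name `biological_homogeneity_index`, the gene names, and the rule that clusters with fewer than two annotated genes contribute zero are assumptions made for illustration.

```python
def biological_homogeneity_index(clusters, annotation):
    """Classical (hard) BHI sketch.

    clusters: list of lists of gene names (one list per cluster).
    annotation: dict mapping gene name -> set of functional classes.
    Unannotated genes are ignored.
    """
    total = 0.0
    for genes in clusters:
        annotated = [g for g in genes if g in annotation]
        n = len(annotated)
        if n < 2:
            # Assumption: a cluster with fewer than two annotated genes adds 0.
            continue
        # Count ordered pairs of distinct genes sharing at least one class.
        matches = sum(
            1
            for a in annotated
            for b in annotated
            if a != b and annotation[a] & annotation[b]
        )
        total += matches / (n * (n - 1))
    return total / len(clusters)


# Hypothetical toy data: two clusters, three functional classes.
toy_clusters = [["g1", "g2", "g3"], ["g4", "g5"]]
toy_annotation = {
    "g1": {"A"}, "g2": {"A"}, "g3": {"B"},
    "g4": {"C"}, "g5": {"C"},
}
toy_bhi = biological_homogeneity_index(toy_clusters, toy_annotation)
```

A value near 1 means genes placed in the same cluster tend to share a functional class. A soft variant would presumably weight each pair by cluster memberships, but that weighting is not specified in the abstract.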

2.
On-line fuzzy modeling via clustering and support vector machines   Cited by: 1 (self-citations: 0, citations by others: 1)
Wen Yu, Xiaoou Li. Information Sciences, 2008, 178(22): 4264-4279
In this paper, we propose a novel approach to identify unknown nonlinear systems with fuzzy rules and support vector machines. Our approach consists of four steps: on-line clustering, structure identification, parameter identification and local model combination. The collected data are first clustered into several groups through an on-line clustering technique; structure identification is then performed on each group using support vector machines, such that the fuzzy rules are automatically generated from the support vectors. Time-varying learning rates are applied to update the membership functions of the fuzzy rules. The modeling errors are proven to be robustly stable with bounded uncertainties by a Lyapunov method and an input-to-state stability technique. Comparisons with other related works are made through a real application to a crude oil blending process. The results demonstrate that our approach has good accuracy and is suitable for on-line fuzzy modeling.

3.
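To address the low accuracy of traditional clustering methods for gene expression data, the application of the support vector data description (SVDD) algorithm to gene expression data clustering is studied. The method clusters the data set effectively by finding the optimal separating hypersphere. Between-class information is incorporated into the cluster validity criterion, a simulated annealing optimization algorithm is used to find the optimal kernel parameter and penalty factor of the SVDD algorithm, and non-sample data are introduced during training to improve computational efficiency. Simulation experiments on a yeast cell-cycle gene expression data set show that parameter optimization under the new cluster validity criterion finds the best parameters faster and more reliably, and that the algorithm offers high clustering accuracy and fast computation.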

4.
In this paper, a fuzzy point symmetry based genetic clustering technique (Fuzzy-VGAPS) is proposed which can automatically determine the number of clusters present in a data set as well as a good fuzzy partitioning of the data. The clusters can be of any size, shape or convexity as long as they possess the property of symmetry. Here the membership values of points to different clusters are computed using the newly proposed point symmetry based distance. A variable number of cluster centers are encoded in the chromosomes. A new fuzzy symmetry based cluster validity index, the FSym-index, is first proposed here and is then used to measure the fitness of the chromosomes. The proposed index can detect non-convex as well as convex non-hyperspherical partitionings with a variable number of clusters. It is mathematically justified via its relationship to a well-defined hard cluster validity function, Dunn's index, for which the condition of uniqueness has already been established. The results of Fuzzy-VGAPS are compared with those obtained by seven other algorithms, including both fuzzy and crisp methods, on four artificial and four real-life data sets. Some real-life applications of Fuzzy-VGAPS to automatically clustering gene expression data and to segmenting magnetic resonance brain images with multiple sclerosis lesions are also demonstrated.

5.
6.
Road slope collapse events are frequent occurrences in Taiwan, often exacerbated by earthquakes and/or heavy rainfall. Such collapses disrupt transportation, damage infrastructure and property, and may cause injuries and fatalities. While significant efforts are regularly invested in reducing road slope collapse risk, most focus exclusively on limiting the potential for slope failure. Collapse prediction efforts may result in inference errors that cause allocated road slope maintenance resources to be expended inefficiently, resulting in relatively higher collapse risk than should be achievable under ideal circumstances. Most maintenance programs rely on the decision maker's risk preferences, as his or her knowledge and experience can contribute to risk assessment decision making. The decision maker is capable of choosing an acceptable balance between two types of inference error, i.e., α and β errors. This preference may later be used as guidance to minimize inference error. This paper proposes the evolutionary risk preference fuzzy support vector machine inference model (ERP-FSIM), a hybrid AI system able to make predictions regarding road slope collapse that take the decision maker's risk preference into account. Validation results demonstrate the viability of ERP-FSIM: the average error for both the training set and the validation set conforms to the decision maker's risk preference ratio and is significantly lower than the error tolerance of ±10%.
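The trade-off between α and β errors described above can be illustrated with a small sketch. It is not the ERP-FSIM model: it only assumes that α is the false-alarm rate (collapse predicted, none occurred), that β is the miss rate (collapse occurred, none predicted), and that the risk preference can be expressed as two hypothetical weights, `w_alpha` and `w_beta`, used to pick a decision threshold on a classifier score.

```python
def error_rates(actual, predicted):
    """Return (alpha, beta) for 0/1 labels, where 1 means 'collapse'."""
    negatives = actual.count(0)
    positives = actual.count(1)
    false_alarms = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    misses = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    alpha = false_alarms / negatives if negatives else 0.0
    beta = misses / positives if positives else 0.0
    return alpha, beta


def pick_threshold(actual, scores, thresholds, w_alpha=1.0, w_beta=1.0):
    """Pick the threshold minimizing w_alpha * alpha + w_beta * beta.

    The weights stand in for the decision maker's risk preference
    (an assumption of this sketch, not the paper's formulation).
    """
    best = None
    for t in thresholds:
        predicted = [1 if s >= t else 0 for s in scores]
        alpha, beta = error_rates(actual, predicted)
        cost = w_alpha * alpha + w_beta * beta
        if best is None or cost < best[0]:
            best = (cost, t, alpha, beta)
    return best[1], best[2], best[3]


# Hypothetical labels and classifier scores for eight slopes.
actual = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.7, 0.9]
thresholds = [0.35, 0.5, 0.65]

cautious = pick_threshold(actual, scores, thresholds, w_alpha=1.0, w_beta=3.0)
frugal = pick_threshold(actual, scores, thresholds, w_alpha=3.0, w_beta=1.0)
```

A decision maker who weights missed collapses more heavily (`cautious`) ends up with a lower threshold and accepts more false alarms; the opposite preference (`frugal`) raises the threshold.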

7.
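Application of a new clustering algorithm to gene expression data analysis   Cited by: 2 (self-citations: 1, citations by others: 1)
Self-organizing feature map (SOM) neural networks and hierarchical clustering are two classical clustering algorithms for analyzing gene expression data, but because gene expression data are complex and unstable, each algorithm has its own strengths and weaknesses. After comparing the differences between the two algorithms, a new algorithm is proposed: the gene expression data are first clustered with the SOM algorithm, and the neuron weights corresponding to each cluster are then clustered a second time with hierarchical clustering. The algorithm is applied to yeast gene expression data, and experiments show that the improved algorithm overcomes some shortcomings of the self-organizing algorithm and improves the effectiveness of gene clustering.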

8.
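In the EM algorithm, the initial number of clusters is hard to determine and the iterations often converge to local optima. To address this, the K-means algorithm is combined with EM-based clustering to give a new model-based clustering method suited to gene expression data. The new method first exploits the global nature and efficiency of K-means to obtain an initial partition quickly; this partition sets the initial parameter values of a Gaussian mixture model, and the EM method then clusters the data to obtain the optimal clustering result. The new algorithm was compared with K-means and with EM in two experiments on real data sets. The results show that the new algorithm is an effective clustering method and improves clustering accuracy.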
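The two-stage idea in the abstract above (a K-means partition used to initialize a Gaussian mixture that EM then refines) can be sketched as follows. This is a minimal one-dimensional illustration under stated assumptions, not the authors' implementation: the function names, the deterministic choice of starting centers, and the variance floor `min_var` are all hypothetical.

```python
import math


def assign(data, centers):
    """Group each value with its nearest center."""
    groups = [[] for _ in centers]
    for x in data:
        j = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
        groups[j].append(x)
    return groups


def kmeans_1d(data, k, iters=20):
    """Plain K-means; initial centers are spread over the sorted data."""
    s = sorted(data)
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = assign(data, centers)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, assign(data, centers)


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, k, iters=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture, initialized from the K-means partition."""
    means, groups = kmeans_1d(data, k)
    n = len(data)
    weights = [len(g) / n for g in groups]
    variances = [
        max(sum((x - m) ** 2 for x in g) / len(g), min_var) if g else 1.0
        for g, m in zip(groups, means)
    ]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1.0  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var,
            )
    return weights, means, variances


# Hypothetical expression values drawn from two well-separated groups.
values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
gmm_weights, gmm_means, gmm_vars = em_gmm_1d(values, k=2)
```

Starting EM from the K-means centers means the first E-step already sees sensible component means, which is the benefit the abstract attributes to the combined method. Real gene expression data would need a multivariate version.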

9.
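In the life sciences, species and genes need to be classified in order to understand the inherent structure of populations. Data clustering methods are used to identify patterns in gene expression data effectively and to classify them: items with high feature similarity are grouped into one class, and items with high dissimilarity are placed in different classes. This is important for studying gene structure, gene function, and the relationships between different kinds of genes. A graph-theoretic method is used to produce an initial clustering of gene expression data in molecular biology, which is then refined by combining it with other algorithms, such as a K-nearest-neighbor self-learning clustering algorithm or a center-based self-learning clustering algorithm. For a given clustering criterion, the method can produce globally optimal clusters. Finally, the algorithm is analyzed and discussed, and verified experimentally on simulated data.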

10.
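To address the effect of noise in gene expression data on the accuracy of cluster analysis, a fuzzy clustering scheme for gene expression data based on wavelet packet decomposition is proposed. The theoretical basis and the algorithm are described, Matlab simulation results are given, and the results are compared with those of other clustering methods. The results show that the proposed method reduces the sensitivity of traditional clustering methods to noise, can uncover the temporal behavior of gene expression data, and partitions gene expression data related to cell-cycle regulation more accurately and in finer detail.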

11.
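To address the limitations of the FCM algorithm when applied to gene expression data analysis, a feature-weighted adaptive FCM algorithm is proposed. Building on FCM, the algorithm introduces a data set preprocessing mechanism that adaptively obtains the number of classes and the initial cluster centers from the distribution of the data set, and determines the feature weights automatically with the ReliefF algorithm. The new algorithm also accounts for the different contributions of different attributes to the classification by introducing feature weights into FCM. Applied to real gene expression data sets, the algorithm adaptively determines the number of clusters, yields stable clustering results, and achieves high clustering accuracy.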
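One common way to introduce feature weights into FCM is to replace the Euclidean distance with a weighted one, d²(x, c) = Σ_f w_f (x_f − c_f)², and keep the usual membership and center updates. The sketch below follows that reading. It is an assumption about the abstract above rather than the authors' exact formulation: the weights and initial centers are simply passed in, whereas the paper obtains them with ReliefF and an adaptive preprocessing step.

```python
def weighted_sq_dist(x, c, w):
    """Feature-weighted squared Euclidean distance."""
    return sum(wf * (xf - cf) ** 2 for xf, cf, wf in zip(x, c, w))


def update_memberships(data, centers, w, m=2.0):
    """Standard FCM membership update using the weighted distance."""
    memberships = []
    for x in data:
        d2 = [weighted_sq_dist(x, c, w) for c in centers]
        if any(d == 0.0 for d in d2):
            # The point coincides with a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            memberships.append([h / sum(hits) for h in hits])
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships


def update_centers(data, memberships, m=2.0):
    """Centers are membership-weighted means (unchanged by positive feature weights)."""
    k, dim = len(memberships[0]), len(data[0])
    centers = []
    for j in range(k):
        um = [u[j] ** m for u in memberships]
        total = sum(um)
        centers.append(
            [sum(u * x[f] for u, x in zip(um, data)) / total for f in range(dim)]
        )
    return centers


def weighted_fcm(data, centers, w, m=2.0, iters=50):
    for _ in range(iters):
        memberships = update_memberships(data, centers, w, m)
        centers = update_centers(data, memberships, m)
    return centers, update_memberships(data, centers, w, m)


# Hypothetical data: feature 0 separates the groups, feature 1 is noise.
points = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0],
          [1.0, 5.2], [0.9, 0.8], [1.1, 9.1]]
feature_weights = [1.0, 0.001]          # assumed, not computed by ReliefF
start_centers = [[0.0, 5.0], [1.0, 5.0]]
final_centers, final_u = weighted_fcm(points, start_centers, feature_weights)
```

With the noisy second feature down-weighted, the memberships follow the informative first feature, which is the effect the abstract attributes to feature weighting.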

12.
Since most biological systems are developmental and dynamic, time-course gene expression profiles provide an important characterization of gene functions. Assigning functions to genes with unknown functions based on time-course gene expression is an important task in functional genomics. Recently, various methods have been proposed for the classification of gene functions based on time-course gene expression data. In this paper, we consider the classification of gene functions from a functional data analysis viewpoint, where a functional support vector machine is adopted. The functional support vector machine can model temporal effects of time-course gene expression data by incorporating the coefficients as well as the basis matrix obtained from a finite expansion of gene expressions on a set of basis functions. We apply the functional support vector machine to both real microarray and simulated data. Our results indicate that the functional support vector machine is effective in discriminating gene functions of time-course gene expressions with predefined functions. The method also provides valuable functional information about interactions between genes and allows the assignment of new functions to genes with unknown functions.

13.
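Application of PLS and SVM to gene expression data classification   Cited by: 4 (self-citations: 3, citations by others: 4)
An important application of gene expression data is the classification of disease samples, for example identifying tumor types. The rapid development of gene chips has made it possible to measure the expression of thousands of genes simultaneously. This measurement capability makes it possible to obtain, in a very short time, a data matrix in which the number of variables p (the number of genes) far exceeds the number of samples N. Standard statistical classification methods, when N…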

14.
This paper aims at automatic classification of power quality events using the Wavelet Packet Transform (WPT) and Support Vector Machines (SVM). The features of the disturbance signals are extracted using WPT and passed to the SVM for classification. Recent literature dealing with power quality establishes that support vector machine methods generally outperform traditional statistical and neural methods in classification problems involving power disturbance signals. However, two vital issues, namely the determination of the most appropriate feature subset and model selection, could, if suitably addressed, pave the way for further improvement in classification accuracy and computation time. This paper addresses these issues through a classification system using two optimization techniques, genetic algorithms and simulated annealing. The system detects the most discriminative features and estimates the best SVM kernel parameters in a fully automatic way. The effectiveness of the proposed detection method is shown by comparison with conventional parameter optimization methods discussed in the literature, such as grid search, with neural classifiers such as the Probabilistic Neural Network (PNN), and with the fuzzy k-nearest neighbor classifier (FkNN); the proposed method is shown to be reliable, as it produces consistently better results.

15.
The effective recognition of unnatural control chart patterns (CCPs) is a critical issue in statistical process control, as unnatural CCPs can be associated with specific assignable causes adversely affecting the process. Machine learning techniques, such as artificial neural networks (ANNs), have been widely used in the research field of CCP recognition. However, ANN approaches can easily overfit the training data, producing models that generalize poorly. This causes a pattern misclassification problem when the training examples contain a high level of background noise (common cause variation). Support vector machines (SVMs) embody the structural risk minimization principle, which has been shown to be superior to the traditional empirical risk minimization principle employed by ANNs. This research presents an SVM-based CCP recognition model for the on-line real-time recognition of seven typical types of unnatural CCP, assuming that the process observations are AR(1) correlated over time. Empirical comparisons indicate that the proposed SVM-based model achieves better performance in both recognition accuracy and recognition speed than the model based on a learning vector quantization network. Furthermore, the proposed model is more robust toward background noise in the process data than the model based on a back propagation network. These results show the great potential of SVM methods for on-line CCP recognition.
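The AR(1) assumption mentioned above means each observation depends on the previous one: x_t = μ + φ (x_{t−1} − μ) + ε_t. The sketch below generates such a series and overlays a mean shift, which is used here only as one familiar example of an unnatural pattern; the abstract does not list the seven pattern types, and the function names and parameter values are hypothetical.

```python
import random


def ar1_series(n, phi, mu=0.0, sigma=1.0, seed=0):
    """Simulate x_t = mu + phi * (x_{t-1} - mu) + e_t with Gaussian noise e_t."""
    rng = random.Random(seed)
    x = mu
    out = []
    for _ in range(n):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def add_shift(series, start, magnitude):
    """Overlay a sudden mean shift beginning at index `start`."""
    return [v + magnitude if t >= start else v for t, v in enumerate(series)]


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation, a quick check of the AR(1) coefficient."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


in_control = ar1_series(5000, phi=0.6, seed=1)
shifted = add_shift(in_control, start=2500, magnitude=2.0)
estimated_phi = lag1_autocorr(in_control)
```

Windows cut from series such as `in_control` and `shifted` could serve as labeled training examples for a pattern classifier; the autocorrelation is what makes natural and unnatural patterns harder to tell apart than in the independent case.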

16.
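For the multi-parameter optimization problem of least squares support vector machines (LS-SVM), a parameter selection method based on gene expression programming (GEP) is proposed. The algorithm takes samples of the LS-SVM parameters (C, σ) as GEP genes and runs with a mutation operator that changes dynamically with the generation number and the number of genes in a chromosome, which greatly improves convergence speed and accuracy. Compared with parameter selection methods based on particle swarm optimization and genetic algorithms, tests on standard benchmark functions confirm that the algorithm has the lowest fitting error. Finally, it is used to build a parameter prediction model for the evaporation process in alumina production, validated with industrial production data; the experimental results show that the method is effective and gives satisfactory results.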

17.
Microarray gene expression data now play a vital role in tumor classification. However, because only a limited number of tissue samples is available compared with the large number of genes in genomic data, many existing methods fail to identify a small subset of discriminative genes. To overcome this limitation, we develop a new hybrid technique for gene selection, called the ensemble multipopulation adaptive genetic algorithm (EMPAGA), that can disregard irrelevant genes and classify cancer accurately. The proposed hybrid gene selection algorithm comprises two phases. In the first phase, an ensemble gene selection (EGS) method is used to filter noisy and redundant genes from high-dimensional datasets by combining multilayer and F-score approaches. Then, an adaptive genetic algorithm based on a multipopulation strategy, with support vector machine and naïve Bayes (NB) classifiers as the fitness function, is applied to select the highly sensitive genes from the reduced datasets. The performance of the proposed method is evaluated on 10 microarray datasets covering various tumors. The comprehensive results and comparisons show that EGS has a marked impact on the efficacy of the adaptive genetic algorithm with the multipopulation strategy and enhances the capability of the proposed approach in terms of convergence rate and solution quality. The experimental results demonstrate the superiority of the proposed method over other standard wrappers with regard to classification accuracy and the optimal number of genes.
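The F-score filter mentioned in the first phase is not defined in the abstract. A widely used two-class definition scores a gene by the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The sketch below assumes that definition; the function names, gene names, and data are hypothetical, and it does not reproduce the paper's ensemble or multilayer step.

```python
def f_score(pos, neg):
    """Two-class F-score of one gene (larger means more discriminative)."""
    all_vals = pos + neg
    mean_all = sum(all_vals) / len(all_vals)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Within-class sample variances (n - 1 in the denominator).
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1)
    denominator = var_pos + var_neg
    if denominator == 0.0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_genes(expr, labels, top_n):
    """Return the top_n gene names by F-score.

    expr: dict mapping gene name -> list of expression values, one per sample.
    labels: list of 0/1 class labels aligned with the samples.
    """
    scores = {}
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = f_score(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical expression matrix: three tumor samples (1), three normal (0).
sample_labels = [1, 1, 1, 0, 0, 0]
expression = {
    "geneA": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],   # separates the classes
    "geneB": [1.0, 2.0, 3.0, 2.0, 1.0, 3.0],   # uninformative
}
top_genes = rank_genes(expression, sample_labels, top_n=1)
```

A filter like this is cheap, which is why it suits a first phase that shrinks the gene set before a costlier wrapper search such as a genetic algorithm.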

18.
Accurately predicting the prognosis of breast cancer from high-throughput microarray data is often a challenging task. Although many statistical methods and machine learning techniques have been applied to predict the prognosis of breast cancer, they suffer from low prediction accuracy (usually lower than 70%). In this paper, we propose a method that combines a genetic algorithm with a support vector machine (GASVM) to significantly improve the accuracy of breast cancer prognosis prediction from gene expression profiles. To further improve classification performance, we also apply the GASVM model to combined clinical and microarray data. We evaluate the performance of the GASVM model on data from 97 breast cancer patients. Four gene sets are used: all genes (All), 70 correlation-selected genes (C70), 15 medical literature-selected genes (R15), and 50 t-test-selected genes (T50). With optimized parameter values identified by the GASVM model, the average predictive accuracy of our model approaches 95% for T50 and 90% for C70 or R15 with all four kernel functions using integrated clinical and microarray data. Our model is more accurate than other machine learning methods, whose average predictive accuracy is about 70%. The results indicate that the GASVM model has the potential to better assist physicians in the prognosis of breast cancer through the use of both clinical and microarray data.
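The abstract above does not say which t-test produced the T50 set. As a hedged illustration of t-test-based gene selection in general, the sketch below ranks genes by the absolute value of Welch's unequal-variance t statistic between two outcome groups and keeps the top few. The choice of Welch's variant, the function names, and the toy data are assumptions.

```python
import math


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0.0:
        # Zero spread in both groups: infinite separation unless means agree.
        return 0.0 if mean_a == mean_b else math.copysign(float("inf"), mean_a - mean_b)
    return (mean_a - mean_b) / se


def select_top_genes(group_a, group_b, top_n):
    """Rank genes by |t| and keep the top_n.

    group_a, group_b: dicts mapping gene name -> expression values for the
    patients in each outcome group.
    """
    scores = {g: abs(welch_t(group_a[g], group_b[g])) for g in group_a}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical expression values for two outcome groups.
good_outcome = {"geneA": [5.0, 6.0, 7.0], "geneB": [1.0, 2.0, 3.0]}
poor_outcome = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 1.0, 3.0]}
selected = select_top_genes(good_outcome, poor_outcome, top_n=1)
```

In a GASVM-style pipeline the selected genes would then be the input features whose SVM parameters the genetic algorithm tunes; that stage is not sketched here.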

