Similar literature
 20 similar documents found
1.
The protein microarray is a powerful chip-based technology for profiling hundreds of proteins simultaneously and is being increasingly used. To study humoral response in pancreatic cancers, scientists have developed a two-dimensional liquid separation technique and built a two-dimensional protein microarray. However, identifying regions of differential expression on the protein microarray requires the use of appropriate statistical methods to assess the large amounts of data generated. A permutation-based test is proposed that incorporates spatial information of the two-dimensional antibody microarray. By borrowing strength from neighboring differentially expressed spots, the procedure is able to detect differentially expressed regions with high power while controlling the familywise type I error at 0.05 in simulation studies. The proposed methodology is also applied to a real microarray dataset.
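The idea of a permutation test that controls the familywise error can be sketched as follows. This is a generic max-statistic permutation procedure, not the spatial method of the abstract: it does not borrow strength from neighboring spots, and the function names, the test statistic (absolute difference of group means), and the default parameters are assumptions made for illustration only.

```python
import random
from statistics import mean


def spot_stats(group_a, group_b):
    """Absolute difference of group means, computed for every spot."""
    n_spots = len(group_a[0])
    return [
        abs(mean(s[j] for s in group_a) - mean(s[j] for s in group_b))
        for j in range(n_spots)
    ]


def max_stat_permutation_test(group_a, group_b, n_perm=500, alpha=0.05, seed=0):
    """Return indices of spots flagged at familywise level alpha.

    group_a, group_b: lists of samples, each sample a list of spot intensities.
    """
    rng = random.Random(seed)
    observed = spot_stats(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    # Null distribution of the largest statistic over all spots,
    # obtained by shuffling the group labels.
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        max_null.append(max(spot_stats(pooled[:n_a], pooled[n_a:])))
    max_null.sort()

    # Threshold: approximately the (1 - alpha) quantile of the permutation maxima.
    threshold = max_null[min(n_perm - 1, int((1 - alpha) * n_perm))]
    return [j for j, t in enumerate(observed) if t > threshold]
```

Comparing every spot against the distribution of the maximum statistic is what gives familywise control; a spatial variant would replace `spot_stats` with a statistic that pools neighboring spots.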

2.
3.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves the state-of-the-art sequential forward floating selection algorithm. The improvement is to add an additional search step called “replacing the weak feature” to check whether removing any feature in the currently selected feature subset and adding a new one at each sequential step can improve the current feature subset. Our method provides the optimal or quasi-optimal (close to optimal) solutions for many selected subsets and requires significantly less computational load than optimal feature selection algorithms. Our experimental results for four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms do, especially when the original number of features of the database is large.
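A floating forward search with a replacement step, as described above, might look like the following simplified sketch. It is not the authors' IFFS implementation: the `criterion` interface, the acceptance rules, and the function name are assumptions chosen to keep the example short.

```python
def floating_selection_with_replacement(n_features, criterion, target_size):
    """Return (subset, score) for the best subset of target_size found.

    criterion(subset) -> number, larger is better (hypothetical interface).
    Assumes 1 <= target_size <= n_features.
    """
    selected = []
    best = {}  # subset size -> (score, subset) of the best subset seen so far

    def record(subset):
        score = criterion(subset)
        if len(subset) not in best or score > best[len(subset)][0]:
            best[len(subset)] = (score, list(subset))

    while len(selected) < target_size:
        # Forward step: add the single feature that raises the criterion most.
        outside = [f for f in range(n_features) if f not in selected]
        selected = selected + [max(outside, key=lambda f: criterion(selected + [f]))]
        record(selected)

        improved = True
        while improved:
            improved = False
            # Backward (floating) step: drop a feature only if the smaller
            # subset beats the best subset of that size seen so far.
            if len(selected) > 1:
                reduced = max(
                    ([g for g in selected if g != f] for f in selected),
                    key=criterion,
                )
                if criterion(reduced) > best[len(reduced)][0]:
                    selected = reduced
                    record(selected)
                    improved = True
                    continue
            # Replacement step: swap one selected feature for one outside feature.
            outside = [f for f in range(n_features) if f not in selected]
            swaps = [
                [g for g in selected if g != f] + [new]
                for f in selected
                for new in outside
            ]
            if swaps:
                swapped = max(swaps, key=criterion)
                if criterion(swapped) > best[len(swapped)][0]:
                    selected = swapped
                    record(selected)
                    improved = True

    score, subset = best[target_size]
    return sorted(subset), score
```

Every accepted backward or replacement move strictly improves the best score recorded for some subset size, so the inner loop cannot cycle.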

4.
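Attribute reduction is an effective technique for coping with the "curse of dimensionality". Fractal dimensionality reduction (FDR) is a recently developed unsupervised attribute selection technique; unfortunately, it requires multiple scans of the data set and therefore copes poorly with high-dimensional data. Attribute reduction based on genetic algorithms outperforms traditional attribute selection on high-dimensional data, but it cannot be applied to unsupervised learning. To address this, the authors combine the inherent stochastic parallel search of genetic algorithms with the unsupervised nature of fractal attribute selection, and design and implement GABUFSS (Genetic Algorithm Based Unsupervised Feature Subset Selection), a genetic-algorithm-based unsupervised fractal attribute subset selection algorithm. Experiments on synthetic and real data sets compare the performance of GABUFSS and FDR. The results show that GABUFSS performs better than FDR and can discover attribute subsets that yield equivalent results.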

5.
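Accurate discrimination of tumor subtypes is important for tumor treatment and is a major topic in current bioinformatics research. This paper first uses the Relief algorithm to rank genes and select an initial subset of informative tumor genes, then uses FARNeM, a forward attribute reduction algorithm based on the neighborhood rough set model, to compute a weighted gene set, and finally applies a weighted KNN algorithm to analyze the tumor data and identify differentially expressed genes. Experimental results demonstrate the feasibility and effectiveness of this approach.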

6.
Microarrays have been widely used to classify cancer samples and discover biological types, for example tumor versus normal phenotypes in cancer research. One of the challenging scientific tasks in the post-genomic epoch is how to identify a subset of differentially expressed genes from thousands of genes in microarray data, which will enable us to understand the underlying molecular mechanisms of diseases, accurately diagnose diseases, and identify novel therapeutic targets. In this paper, we propose a new framework for identifying differentially expressed genes. In the proposed framework, genes are ranked according to their residuals. The performance of the framework is assessed by applying it to several public microarray data sets. Experimental results show that the proposed method gives a more robust and accurate ranking than other statistical test methods, such as the t-test, Wilcoxon rank sum test and KS-test. Another novelty of the method is that we design an algorithm for selecting a small subset of genes that show significant variation in expression (“outlier” genes). The number of genes in the small subset can be controlled via an adjustable confidence-level window. In addition, the results of the proposed method can be visualized. By observing the residual plot, we can easily find genes that show significant variation between two groups of samples and learn the degree of differential expression of genes. Through a comparison study, we found several “outlier” genes that had been verified in previous biological experiments but were either not identified by other methods or ranked lower in standard statistical tests.

7.
A reliable and precise classification of tumors is essential for successful treatment of cancer. Gene selection is an important step for improved diagnostics. In this study, a modified SFFS (sequential forward floating selection) algorithm based on weighted Mahalanobis distance, called MSWM, is proposed to identify optimal informative gene subsets, taking into account joint discriminatory power for accurate discrimination. First, we use the one-dimensional weighted Mahalanobis distance to perform a preliminary selection of genes, and then use the modified SFFS method and the multidimensional weighted Mahalanobis distance to obtain the optimal informative gene subset for tumor classification. Finally, we use the k nearest neighbor and naive Bayes methods to classify tumors based on the optimal gene subset selected by the MSWM method. To validate its efficiency, the proposed MSWM method is applied to classify two different DNA microarray datasets. Our empirical study shows that the MSWM method classifies tumors more effectively than the BWR (the ratio of between-groups to within-groups sum of squares) and IVGA_I (independent variable group analysis I) methods. This suggests that the MSWM gene selection method is able to obtain correct informative gene subsets that take into account genes’ joint discriminatory power for tumor classification.

8.
Multidimensional fingerprinting (MDF) utilizes measurable peptide characteristics to identify proteins. In this study, 3‐D fingerprinting, namely, parent protein molecular weight, peptide mass, and peptide retention time on RPLC, is used to identify 331 differentially expressed proteins between normal and human colon cancer plasma membrane samples. A false discovery rate (FDR) procedure is introduced to evaluate the performance of MDF on the colon cancer dataset. This evaluation establishes a false protein identification rate below 15% for this dataset. Western blot analysis is performed to validate the differential expression of the MDF‐identified protein VDAC1 on the original tissue samples. The limits of MDF are further assessed by a simulation study where key parameters such as database size, query size, and mass accuracy are varied. The results of this simulation study demonstrate that fingerprinting with three dimensions yields low FDR values even for large queries on the complete human proteome without the need for prior peptide sequencing by tandem mass spectrometry. Specifically, when mass accuracy is 10 ppm or lower, full human proteome searches can achieve FDR values of 10% or less.

9.
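Flight data recorders (FDRs) record large volumes of aviation observation data on every flight. Aviation data are multivariate time series that are high-dimensional and heterogeneous. To detect anomalous flight records, the authors propose HDAD (Anomaly Detection for Heterogeneous Data), an anomaly detection model for heterogeneous aviation data. HDAD compresses continuous-feature time series with SMV (Slope-Mean Vector), a vector representation based on local trends, and compresses discrete-feature time series with a change-point-based method. Validation experiments show that SMV represents time-series information more precisely than SAX and PCA. In simulations, HDAD was used for anomaly detection on synthetic and real aviation data sets. The results show that HDAD can detect potential anomalies in FDR data, which helps airlines analyze FDR data further.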

10.
We present four efficient parallel algorithms for computing a nonequijoin, called range-join, of two relations on N-dimensional mesh-connected computers. Range-joins of relations R and S are an important generalization of conventional equijoins and band-joins and are solved by permutation-based approaches in all proposed algorithms. In general, after sorting all subsets of both relations, the proposed algorithms permute every sorted subset of relation S to each processor in turn, where it is joined with the local subset of relation R. To permute the subsets of S efficiently, we propose two data permutation approaches, namely, the shifting approach, which permutes the data recursively from lower dimensions to higher dimensions, and the Hamiltonian-cycle approach, which first constructs a Hamiltonian cycle on the mesh and then permutes the data along this cycle by repeatedly transferring data from each processor to its successor. We apply the shifting approach to meshes with different storage capacities, which results in two different join algorithms. The basic shifting join (BASHJ) algorithm can minimize the number of subsets stored temporarily at a processor, but requires a large number of data transmissions, while the buffering shifting join (BUSHJ) algorithm can achieve a high parallelism and minimize the number of data transmissions, but requires a large number of subsets stored at each processor.
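The cycle-based permutation described above can be illustrated with a small sequential simulation. The sketch below rests on several assumptions that the abstract does not state: the range-join condition is taken to be e1 <= |r - s| <= e2 on a single numeric attribute, the processors are assumed to be already ordered along a Hamiltonian cycle (so the mesh itself is not modeled), and the local join is a plain nested loop rather than a merge over sorted subsets. The function name is hypothetical.

```python
def ring_range_join(r_parts, s_parts, e1, e2):
    """Simulate a range-join on processors arranged along a cycle.

    r_parts[i] and s_parts[i] are the subsets of R and S initially stored
    at processor i. Each subset of S visits every processor exactly once.
    """
    p = len(r_parts)
    held = [list(s) for s in s_parts]  # the subset of S each processor currently holds
    results = []
    for _ in range(p):
        # Every processor joins its local subset of R with the held subset of S.
        for proc in range(p):
            for r in r_parts[proc]:
                for s in held[proc]:
                    if e1 <= abs(r - s) <= e2:
                        results.append((r, s))
        # Every processor passes its held subset of S to its successor on the cycle.
        held = [held[(proc - 1) % p] for proc in range(p)]
    return sorted(results)
```

After p rounds each subset of S has met each subset of R once, so the result equals that of a centralized join while each processor only ever talks to its neighbor on the cycle.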

11.
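The discernibility of feature subsets (DFS) criterion ignores the effect of feature measurement scales on the discriminative ability of a feature subset. To remedy this defect, the coefficient of variation is introduced and the generalized discernibility of feature subsets (GDFS) criterion is proposed. Combining GDFS with four search strategies (sequential forward, sequential backward, sequential forward floating, and sequential backward floating) and using an extreme learning machine as the classifier yields four hybrid feature selection algorithms. Experiments on UCI data sets and gene data sets, together with experimental comparisons against DFS, Relief, DRJMIM, mRMR, LLE Score, AVC, SVM-RFE, VMInaive, AMID, AMID-DWSFS, CFR, and FSSC-SD and statistical significance tests, show that the proposed GDFS outperforms DFS and selects feature subsets with better classification ability.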

12.
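Tumor feature gene selection based on a BP neural network
This paper proposes a sensitivity analysis method based on a BP neural network and uses it to select tumor feature genes. Taking colon cancer gene expression profiles as an example, the sensitivity of each gene with respect to the output function of the BP neural network model is first defined, and genes with low sensitivity are removed recursively, several at a time, to generate a nested set of candidate feature gene subsets. A support vector machine is then used as the classifier to test how much each candidate subset contributes to sample classification, and the candidate subset with the lowest misclassification rate is selected as the colon cancer feature gene subset. In experimental comparisons, this feature gene subset gives better classification results than the other feature gene subsets reported in the literature, which demonstrates the feasibility and effectiveness of the method.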

13.
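Transcriptome sequencing data from RNA-Seq have high feature dimensionality. Finding phenotype-related genes with traditional bioinformatics methods requires substantial computing resources, differential analysis yields a large set of candidate genes, and further screening depends on existing prior knowledge. To address this problem, this paper proposes GA-XGBoost, a transcriptome analysis method that combines a genetic algorithm with XGBoost; incorporating a machine learning algorithm narrows the range of candidate genes for subsequent analysis. On a set of high-quality maize ...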

14.
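Band selection is an effective way to reduce the volume of hyperspectral data and to overcome the Hughes phenomenon in land-cover classification. Subset generation and the evaluation criterion are the two key elements of a selection algorithm. A subset generation method that mixes random search with heuristic search is proposed. The method embeds heuristic search in random search: the population updated at each iteration of a discrete particle swarm optimization algorithm is fine-tuned locally by sequential search, which improves the accuracy of the random search. This embedded fine-tuning also ensures the validity of the solutions produced by the optimization algorithm. Hyperspectral band selection and classification experiments compare the method with a hybrid genetic algorithm, a standard genetic algorithm, and the sequential forward floating selection algorithm, and show that it selects better subsets under the evaluation criterion.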

15.
Priority relationships may exist among criteria in multi-criteria decision making (MCDM) problems. Problems of this kind, which are the focus of this paper, are called prioritized MCDM problems. In order to aggregate the evaluation values of criteria for an alternative, we first develop some weighted prioritized aggregation operators based on triangular norms (t-norms) together with the weights of criteria by extending the prioritized aggregation operators proposed by Yager (Yager, R. R. (2004). Modeling prioritized multi-criteria decision making. IEEE Transactions on Systems, Man, and Cybernetics, 34, 2396–2404). After discussing how the concentration degrees of the evaluation values with respect to each criterion influence the priority relationships, we further develop a method for handling prioritized MCDM problems. Through a simple example, we show that this method can be used in a wider range of situations than the existing prioritized MCDM methods. Finally, the relationships between the weights associated with criteria and the preference relations among alternatives are explored, and two quadratic programming models for determining weights based on multiplicative and fuzzy preference relations are developed.

16.
Abstract: In this work an entropic filtering algorithm (EFA) for feature selection is described, as a workable method to generate a relevant subset of genes. This is a fast feature selection method based on finding feature subsets that jointly maximize the normalized multivariate conditional entropy with respect to the classification ability of tumours. The EFA is tested in combination with several machine learning algorithms on five public domain microarray data sets. It is found that this combination offers subsets yielding similar or much better accuracies than using the full set of genes. The solutions obtained are of comparable quality to previous results, but they are obtained in a maximum of half an hour computing time and use a very low number of genes.

17.
In this paper, we present a gene selection method based on a genetic algorithm (GA) and support vector machines (SVM) for cancer classification. First, the Wilcoxon rank sum test is used to filter noisy and redundant genes in high-dimensional microarray data. Then, different subsets of highly informative genes are selected by GA/SVM using different training sets. The final subset, consisting of highly discriminating genes, is obtained by analyzing the frequency of appearance of each gene in the different gene subsets. The proposed method is tested on three open datasets: leukemia, breast cancer, and colon cancer data. The results show that the proposed method has excellent selection and classification performance, especially for breast cancer data, where it yields 100% classification accuracy using only four genes.
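The final frequency-analysis step can be illustrated with a short sketch. The abstract does not say how frequencies are turned into the final subset, so the minimum-count threshold used here, the function name, and the gene labels are all assumptions.

```python
from collections import Counter


def frequent_genes(subsets, min_count):
    """Return genes that appear in at least min_count of the given subsets.

    subsets: iterable of gene-name collections, one per GA/SVM run.
    min_count: hypothetical cut-off; the source does not specify one.
    """
    # set() ensures a gene is counted at most once per subset.
    counts = Counter(gene for subset in subsets for gene in set(subset))
    return sorted(gene for gene, count in counts.items() if count >= min_count)
```

For example, `frequent_genes([["g1", "g2", "g3"], ["g2", "g3", "g4"], ["g2", "g5"]], 2)` keeps the genes that were selected in at least two of the three runs.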

18.
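As the world shifts from being information-driven to data-driven, fields such as pattern recognition and data mining face growing challenges. The explosive growth in data volume has made feature selection an indispensable step in areas such as big-data pattern recognition. Inspired by resource competition among animals, the authors recast the feature selection model as a resource allocation problem, add resource-competition behavior among individuals, and propose the multi-colony fairness algorithm (MCFA) to judge and handle this behavior in order to obtain a better allocation scheme (that is, a better feature subset). MCFA integrates random search with heuristic search and combines filter and wrapper methods, reducing the computational cost while achieving higher classification accuracy. The proposed algorithm is analyzed, and its convergence and effectiveness are proved theoretically. Comparative experiments on data sets from the UCI machine learning repository against four classical feature selection algorithms, namely sequential forward selection (SFS), sequential backward selection (SBS), sequential floating forward selection (SFFS), and sequential floating backward selection (SBFS), and three mainstream feature selection algorithms, namely relevance-redundancy feature selection (RRFS), minimal-redundancy-maximal-relevance (mRMR), and ReliefF, show that the proposed multi-colony fairness algorithm effectively selects feature subsets that are good in both size and performance.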

19.
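Gene selection and sample classification for colon cancer based on genetic algorithms
A new method based on two rounds of genetic algorithms is proposed for gene selection and sample classification on colon cancer microarray data. The method first filters out most genes that are irrelevant to classification according to each gene's Bhattacharyya distance, then uses GA/CFS, which combines a genetic algorithm with CFS (Correlation-based Feature Selection), to select good gene subsets and archives them. Candidate subsets for further search are chosen according to how frequently genes are selected in the archived subsets, and finally GA/SVM, which combines a genetic algorithm with SVM, selects the classification feature subset from the candidate gene subsets. This GA/CFS-GA/SVM method is applied to colon cancer microarray data; the experimental results and a comparison with the literature show that the method performs well.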

20.
Although Yager has presented a prioritized operator for fuzzy subsets, called the non-monotonic operator, it cannot be used to deal with multi-criteria fuzzy decision-making problems when generalized fuzzy numbers are used to represent the evaluating values of criteria. In this paper, we present a prioritized information fusion algorithm based on the similarity measure of generalized fuzzy numbers. The proposed prioritized information fusion algorithm has the following advantages: (1) It can handle prioritized multi-criteria fuzzy decision-making problems in a more flexible manner because it allows the evaluating values of criteria to be represented by generalized fuzzy numbers or crisp values between zero and one, and (2) it can deal with prioritized information filtering problems based on generalized fuzzy numbers.
