Similar Literature
20 similar records found
1.
A Neural Network Method for DNA Sequence Classification
This paper applies artificial neural networks to DNA classification. Probabilistic and statistical methods are first used to extract features from 20 artificial DNA sequences of known class, forming feature vectors that serve as training samples for a BP (back-propagation) neural network. The networks are trained with the back-propagation algorithm from the MATLAB Neural Network Toolbox. Two three-layer BP networks are constructed and trained on the extracted feature-vector sets; after training, feature vectors for 20 unclassified artificial sequences and 182 natural sequences are fed to the two networks for classification. The results show that the method classifies DNA sequences with high accuracy and precision, demonstrating that artificial neural networks are entirely feasible for DNA sequence classification.
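The abstract does not spell out the statistical features used; as a rough illustration, a base-frequency feature vector of the kind commonly fed to such networks can be computed as follows (the function name and the single-base feature choice are illustrative, not taken from the paper):

```python
from collections import Counter

def dna_features(seq):
    """Return the (A, C, G, T) frequency vector of a DNA sequence.

    A minimal stand-in for the statistical feature extraction the
    paper describes; the actual paper may use richer statistics.
    """
    counts = Counter(seq.upper())
    n = len(seq)
    return [counts.get(b, 0) / n for b in "ACGT"]

print(dna_features("ATGCATGC"))  # → [0.25, 0.25, 0.25, 0.25]
```

Vectors of this form could then be stacked into the training matrix handed to the network.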

2.
A kernel-based fuzzy multi-sphere classification algorithm is proposed. During training, the algorithm constructs several minimum enclosing spheres for each pattern class to cover all of its training samples; during recognition, a fuzzy membership function assigns test samples to classes. An ensemble method built on the proposed classifier is also given. Experiments on four real data sets show that the proposed algorithm achieves good classification performance and is an effective classifier.

3.
The launch of the Human Genome Project in the 1990s greatly advanced DNA sequencing. Finding the distribution patterns of characteristic (functional) fragments within sequences has important applications in genetics and bioinformatics. In teaching and research, the authors found that the string-processing facilities of the mathematical analysis software MATLAB make such functional-fragment analysis straightforward. The system analyzes the degree of association between DNA sequence strands to construct a feature matrix, classifies sets of DNA sequences fairly accurately with the fuzzy C-means algorithm, and uses MATLAB's plotting facilities to display the final clustering result clearly, so that users can see the clustering effect at a glance. The system comprises three main parts: base-sequence analysis of DNA strands, feature-matrix extraction for multiple DNA strands, and DNA classification by fuzzy C-means clustering. First, the system measures the total length of each DNA sequence and of its functional segments, using a one-dimensional array to locate functional fragments within the sequence, completing the base-sequence analysis. Second, it analyzes the features of the user-supplied DNA strands, computing the (A, T, C, G) base densities of each sequence to obtain a feature matrix that serves as the data source for the fuzzy clustering analysis. Finally, the system applies fuzzy C-means clustering to the feature-matrix values to cluster the DNA sequences into two classes.
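The fuzzy C-means step the system relies on can be sketched in a few dozen lines. The implementation below is a generic FCM with Euclidean distance and fuzzifier m = 2; the paper's actual parameters and initialization are not given, so these are assumptions:

```python
import random

def fcm(data, c=2, m=2.0, iters=50, seed=0):
    """Minimal fuzzy C-means: returns (centers, membership matrix).

    A sketch of the clustering step only; the paper's feature matrix
    of (A, T, C, G) base densities would be passed in as `data`.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # random memberships, each row normalised to sum to 1
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[x / sum(row) for x in row] for row in u]
    for _ in range(iters):
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / tot
                            for k in range(d)])
        for i in range(n):
            dist = [sum((data[i][k] - centers[j][k]) ** 2 for k in range(d)) ** 0.5
                    for j in range(c)]
            for j in range(c):
                if dist[j] == 0:          # sample sits exactly on a centre
                    u[i] = [1.0 if t == j else 0.0 for t in range(c)]
                    break
            else:
                u[i] = [1.0 / sum((dist[j] / dist[k]) ** (2 / (m - 1))
                                  for k in range(c)) for j in range(c)]
    return centers, u

# two obvious groups of toy base-density vectors
data = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
centers, u = fcm(data, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)  # first two samples share a cluster, last two the other
```

Hardening the membership, as in the last line, reproduces the two-class split the abstract describes.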

5.
When forecasting with a fuzzy time series model, how the fuzzy intervals are partitioned has a decisive effect on the final prediction accuracy. To partition the fuzzy intervals more effectively and further improve forecasting accuracy, this paper proposes a fuzzy time series forecasting model based on an improved wolf pack algorithm. Fuzzy time series are first reviewed briefly; the wolf pack algorithm is then described and improved by introducing a homing tendency and a death probability into its scouting behavior; finally, the improved wolf pack algorithm is used to partition the fuzzy intervals, yielding a new fuzzy time series forecasting model. The University of Alabama enrollment data are used as experimental data for case analysis and validation. Comparison with several existing models shows that the proposed model attains higher forecasting accuracy, offering a new approach to fuzzy time series prediction.
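Before any optimization of the boundaries, a fuzzy time series model needs the universe of discourse split into intervals and each observation fuzzified. The sketch below shows only this baseline equal-width step; the paper's contribution, tuning the boundaries with the improved wolf pack algorithm, is not reproduced here:

```python
def partition(lo, hi, k):
    """Split the universe of discourse [lo, hi] into k equal intervals.

    The paper tunes these boundaries with an improved wolf pack
    algorithm; equal-width partitioning is only the baseline.
    """
    w = (hi - lo) / k
    return [(lo + i * w, lo + (i + 1) * w) for i in range(k)]

def fuzzify(x, intervals):
    """Map a crisp value to the index of the fuzzy set whose interval holds it."""
    for i, (a, b) in enumerate(intervals):
        if a <= x <= b:
            return i
    raise ValueError("value outside universe of discourse")

# an assumed enrolment range, for illustration only
ivs = partition(13000, 20000, 7)
print(fuzzify(15460, ivs))  # → 2
```

Forecasting rules are then built over these fuzzy-set indices rather than the raw values.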

6.
A DNA Sequence Classification Method Based on an Ant Colony Optimization Clustering Algorithm
To address the inefficiency and low classification accuracy of current clustering algorithms on DNA sequence data, a DNA sequence classification method based on an ant colony optimization clustering algorithm (ACOC) is proposed. An adaptive induction quantity is added to the density function, together with the α-quantity cooling strategy from simulated annealing; features are extracted from DNA sequences using their distributional characteristics, and the Pearson correlation coefficient is introduced into the ant colony clustering algorithm as the similarity measure. Performance tests on four data sets from the EMBL DNA database, compared against statistical clustering and the k-means algorithm, show that the method has advantages in both running time and accuracy and is suitable for large-scale DNA sequence classification.
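The Pearson correlation coefficient used as the similarity measure is standard; a plain-Python version is shown below (the ACOC algorithm itself is not reproduced):

```python
def pearson(x, y):
    """Pearson correlation coefficient, used here as the similarity
    measure between two DNA feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(pearson([1, 2, 3], [2, 4, 6]))  # ≈ 1.0 (perfectly correlated)
```

Values near 1 indicate sequences with near-identical feature profiles, which the ant colony algorithm treats as candidates for the same cluster.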

7.
胡耀炜  段磊  李岭  韩超 《计算机应用》2018,38(2):427-432
Existing pattern-based sequence classification algorithms suffer from unsatisfactory accuracy and long model training times on biological sequences. This paper introduces density-aware patterns and designs a pattern-based biological sequence classification algorithm, BSC. First, frequent sequential patterns with the density-aware property are mined from the biological sequences; the mined patterns are then filtered and ranked to form classification rules; finally, these rules are used to predict the classes of unlabeled sequences. Experiments on four real biological sequence collections analyze the influence of BSC's parameters on the results and provide recommended settings. The classification results show that, compared with four other pattern-based classification algorithms, BSC improves accuracy on the experimental data sets by at least 2.03 percentage points, demonstrating high classification accuracy and execution efficiency on biological sequences.

8.
To solve the multi-attribute data classification problem, a classification method combining a fuzzy optimal selection model with cluster analysis (FO-CA) is proposed. The fuzzy optimal selection model first produces an ordered comprehensive-index data set; in the weighting stage, a distance-based dissimilarity measure is introduced and used to build a weighting scheme that combines subjective and objective weights. Cluster analysis then groups the ordered comprehensive-index data into several clusters for classification. Simulation experiments on the UCI Iris, Wine, and Ruspini data sets show that the method obtains better classification results than the fuzzy optimal selection method or the K-Means algorithm alone, offering useful reference for decision makers.

9.
Constructing Fuzzy Classification Systems Based on Cluster Analysis
童树鸿  沈毅  刘志言 《控制与决策》2001,16(Z1):737-740
A new method for constructing fuzzy classification systems from sample data is presented. Cluster analysis is first performed on the samples of each class, using an iterative algorithm that adaptively determines the number of clusters per class, thereby partitioning the feature space. A fuzzy rule is then generated for each feature subspace, and all the rules together form an initial fuzzy classification system, whose structure is simplified and whose parameters are optimized to further improve classification performance while keeping the system as simple as possible. Finally, the method is applied to two-class samples in a two-dimensional feature space and to the Iris data. Simulation results show that the method achieves a high recognition rate with relatively few fuzzy classification rules.
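A minimal sketch of how one fuzzy rule per cluster can classify samples, assuming Gaussian membership functions parameterized by a cluster's per-feature center and spread; the paper's exact membership functions and optimization are not specified here, so the rule table is illustrative:

```python
import math

def gaussian(x, mu, sigma):
    """Membership of x in a fuzzy set centred at mu with spread sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def fire(rule, sample):
    """Firing strength of one fuzzy rule: product of per-feature memberships."""
    prod = 1.0
    for x, (mu, sigma) in zip(sample, rule["sets"]):
        prod *= gaussian(x, mu, sigma)
    return prod

def classify(rules, sample):
    """Winner-take-all over rules; each rule carries its class label."""
    best = max(rules, key=lambda r: fire(r, sample))
    return best["label"]

# two hypothetical rules, one per cluster/subspace: (mu, sigma) per feature
rules = [
    {"label": "A", "sets": [(0.0, 1.0), (0.0, 1.0)]},
    {"label": "B", "sets": [(5.0, 1.0), (5.0, 1.0)]},
]
print(classify(rules, [0.3, -0.2]))  # → A
print(classify(rules, [4.8, 5.1]))   # → B
```

Each rule corresponds to one cluster found in the feature space, which is exactly the "one rule per subspace" construction the abstract describes.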

10.
The diversity and volatility of online information make its management and filtering extremely difficult. To speed up and sharpen the classification of online information, a Web text classification method based on fuzzy rough sets is proposed, using a machine-learning approach. In the training phase, Web texts are first preprocessed and represented with the vector space model, generating an initial feature attribute space with computed term weights; a fuzzy rough set algorithm then filters the information, and an attribute-reduction algorithm based on fuzzy rough sets generates the classification rules; finally, documents are classified against the knowledge base. In the testing phase, unpreprocessed texts are matched directly on key attributes, weighted by fuzzy rough factors, and classified by spatial distance. Comparative experiments show that the method achieves good classification results.

11.
Abstract: Currently, classifying samples into a fixed number of clusters (i.e. supervised cluster analysis) as well as unsupervised cluster analysis are limited in their ability to support 'cross-algorithm' analysis. It is well known that each cluster analysis algorithm yields different results (i.e. a different classification); even running the same algorithm with two different similarity measures commonly yields different results. Researchers usually choose the preferred algorithm and similarity measure according to the analysis objectives and data set features, but they have neither a formal method nor a tool that supports comparison and evaluation of the different classifications that result from the diverse algorithms. The current research developed and prototyped a decision support methodology, based upon formal quantitative measures and a visual approach, that enables presentation, comparison and evaluation of multiple classification suggestions resulting from diverse algorithms. This methodology and tool were used in two basic scenarios: (I) a classification problem in which a 'true result' is known, using the Fisher iris data set; (II) a classification problem in which there is no 'true result' to compare with, using a small data set from a user profile study (a study that tries to relate users to a set of stereotypes based on sociological aspects and interests). In each scenario, ten diverse algorithms were executed. The suggested methodology and decision support system produce a cross-algorithm presentation: all ten resultant classifications are presented together in a 'Tetris-like' format, where each column represents a specific classification algorithm and each line represents a specific sample, and formal quantitative measures analyse the 'Tetris blocks', arranging them according to their best structures, i.e. the best classification.

12.

In the fields of pattern recognition and machine learning, data preprocessing algorithms have been used increasingly in recent years to achieve high classification performance; for medical datasets with nonlinear and imbalanced data distributions, preprocessing before classification has become indispensable. This study proposes a new data preprocessing method for classifying the Parkinson, hepatitis, Pima Indians, single proton emission computed tomography (SPECT) heart, and thoracic surgery medical datasets, all taken from the UCI machine learning repository. The proposed method consists of three steps. In the first step, the cluster centers of each attribute are calculated using k-means, fuzzy c-means, and mean shift clustering algorithms. In the second step, the absolute differences between the data in each attribute and the cluster centers are calculated, and the average of these differences is computed for each attribute. In the final step, the weighting coefficients are calculated by dividing the cluster centers by the mean difference, and weighting is performed by multiplying the obtained coefficients by the attribute values in the dataset. Three attribute weighting methods are thus proposed: (1) similarity-based attribute weighting with k-means clustering, (2) similarity-based attribute weighting with fuzzy c-means clustering, and (3) similarity-based attribute weighting with mean shift clustering. The aim is to gather the data within each class together and reduce the within-class variance, thereby increasing the discrimination between classes. To compare with other methods in the literature, random subsampling is used to handle imbalanced dataset classification. After the attribute weighting process, four classification algorithms, linear discriminant analysis, the k-nearest neighbor classifier, the support vector machine, and the random forest classifier, are used to classify the imbalanced medical datasets. Classification accuracy, precision, recall, area under the ROC curve, κ value, and F-measure are used to evaluate the proposed models, with three different schemes for training and testing: a 50-50% train-test holdout, a 60-40% train-test holdout, and tenfold cross-validation. The experimental results show that the proposed attribute weighting methods obtain higher classification performance than the random subsampling method in classifying imbalanced medical datasets.
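The three-step weighting can be sketched compactly if each attribute is summarized by a single cluster center (its mean); the paper instead uses k-means, fuzzy c-means, or mean shift centers, so treat this as a simplified illustration of the "center divided by mean absolute difference" coefficient described above:

```python
def attribute_weights(columns):
    """Similarity-based attribute weighting, sketched with one cluster
    per attribute (centre = mean); assumes non-constant attributes.

    weight = centre / mean absolute difference to the centre.
    """
    weights = []
    for col in columns:
        centre = sum(col) / len(col)
        mean_diff = sum(abs(x - centre) for x in col) / len(col)
        weights.append(centre / mean_diff)
    return weights

def apply_weights(columns, weights):
    """Multiply every attribute value by its weighting coefficient."""
    return [[x * w for x in col] for col, w in zip(columns, weights)]

# two toy attributes, stored column-wise
cols = [[1.0, 2.0, 3.0], [10.0, 10.5, 11.0]]
w = attribute_weights(cols)
print([round(v, 2) for v in w])  # → [3.0, 31.5]
```

The tighter an attribute clusters around its center, the larger its coefficient, which is how the method amplifies low-variance, discriminative attributes.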


13.
Motion Estimation/Compensation Based on Macroblock Classification
This paper proposes a simple and effective macroblock classification method and analyzes block-based motion estimation/compensation together with two typical fast motion estimation techniques. Simulation results on MPEG-1 demonstrate the effectiveness of motion estimation based on macroblock classification.

14.
Microblog user gender classification aims to identify a user's gender from user information. Existing research on gender classification mostly targets a single type of feature (textual or social). In contrast, this paper proposes a dual-channel LSTM (Long Short-Term Memory) model that fully combines textual features (the user's microblog posts) and social features (information about the user's followers) for user gender classification. First, a single-channel LSTM model learns each of the two feature groups separately, yielding two feature representations; then a Merge layer is added to the neural network to perform ensemble learning over both representations, fully capturing the relationship between textual and social features. Experimental results show that the dual-channel LSTM classification model achieves better user gender classification than traditional classification algorithms.

15.
Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but comparatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique cluster in the first stage, and then compares each pair of HFSs by utilising the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are joined. The procedure is then repeated until the desired number of clusters is achieved. Moreover, we extend the algorithm to cluster interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms by experimental results.
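A compact sketch of the algorithm under simplifying assumptions: every hesitant membership list has the same length, per-element weights are given, and single linkage is used when comparing merged clusters (the paper does not fix the inter-cluster distance here, so that choice is ours):

```python
def hamming(a, b, w):
    """Weighted Hamming distance between two HFSs; a[i] and b[i] are the
    (equal-length, sorted) hesitant membership lists for element i."""
    return sum(wi * sum(abs(x - y) for x, y in zip(ai, bi)) / len(ai)
               for wi, ai, bi in zip(w, a, b))

def agglomerate(hfss, w, k):
    """Join the closest pair of clusters (single linkage) until k remain."""
    clusters = [[i] for i in range(len(hfss))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(hamming(hfss[i], hfss[j], w)
                        for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters.pop(q)
    return clusters

# three toy HFSs over two elements, equal element weights
hfss = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.15, 0.25], [0.3, 0.45]],
    [[0.8, 0.9], [0.7, 0.9]],
]
print(sorted(map(sorted, agglomerate(hfss, [0.5, 0.5], k=2))))  # → [[0, 1], [2]]
```

Swapping `hamming` for a weighted Euclidean distance gives the algorithm's other variant.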

16.
This paper proposes a novel two-stage fuzzy classification model built from a fuzzy feature extraction agent (FFEA) and a fuzzy classification unit (FCU). First, the FFEA is proposed to extract the feature variables from the original database effectively. Then the FCU, which determines the final classification result, is developed to generate the if–then rules automatically. In fact, both the FFEA and the FCU are themselves fuzzy models. To obtain better classification results, we utilize genetic algorithms (GAs) and an adaptive grade mechanism (AGM) to tune the FFEA and the FCU, respectively, improving the performance of the proposed fuzzy classification model. In this model, GAs determine the distribution of the fuzzy sets for each feature variable of the FFEA, and the AGM is developed to regulate the confidence grade of the principal if–then rule of the FCU. Finally, the well-known Iris, Wine, and Glass databases are used to test performance. Computer simulation results demonstrate that the proposed fuzzy classification model provides a sufficiently high classification rate in comparison with other models in the literature.

17.
Traditional Chinese Medicine diagnoses a wide range of health conditions by examining features of the tongue, including its shape. This paper presents a classification approach for automatically recognizing and analyzing tongue shapes based on geometric features. The approach corrects the tongue deflection by applying three geometric criteria and then classifies tongue shapes according to seven geometric features defined using various measurements of length, area and angle of the tongue. To establish a measurable and machine readable relationship between expert human judgments and the machine classifications of tongue shapes, we use a decision support tool, Analytic Hierarchy Process (AHP), to weight the relative influences of the various length/area/angle factors used in classifying a tongue, and then apply a fuzzy fusion framework that combines seven AHP modules, one for each tongue shape, to represent the uncertainty and imprecision between these quantitative features and tongue shape classes. Experimental results show that the proposed shape correction method reduces the deflection of tongue shapes and that our shape classification approach, tested on a total of 362 tongue samples, achieved an accuracy of 90.3%, making it more accurate than either KNN or LDA.
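An AHP module's priority weights can be approximated from a pairwise comparison matrix with the row geometric-mean method; the 3x3 matrix below, comparing three hypothetical length/area/angle factors, is illustrative and not taken from the paper:

```python
import math

def ahp_weights(M):
    """Approximate AHP priority weights from a pairwise comparison
    matrix using the row geometric-mean method."""
    n = len(M)
    gm = [math.prod(row) ** (1 / n) for row in M]
    s = sum(gm)
    return [g / s for g in gm]

# hypothetical comparisons: factor 1 moderately-to-strongly preferred
M = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w = ahp_weights(M)
print([round(x, 3) for x in w])  # weights sum to 1, ordered w[0] > w[1] > w[2]
```

The seven per-shape AHP modules in the paper would each produce such a weight vector, which the fuzzy fusion framework then combines.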

18.
昌燕 《计算机应用》2011,31(7):1880-1883
Building on existing research into trust types, this paper analyzes the inaccuracy of trust classification for Web services and proposes a scheme for defining trust categories dynamically. To define the trust attributes of Web services reasonably and flexibly, intuitionistic fuzzy numbers are used to describe trust characteristics, considering the influence on trust of three aspects: inherent capability, security properties, and reputation. Methods are given for computing the capability, security, and reputation trust contributions. A similarity matrix of the trust intuitionistic fuzzy sets is constructed; an intuitionistic fuzzy equivalence matrix is obtained by computing the transitive closure; and different cut matrices, and hence different classification results, are obtained by setting different thresholds. The effectiveness and accuracy of dynamically defined trust categories are verified.
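The transitive-closure-and-threshold pipeline is the classical fuzzy clustering construction; the sketch below uses ordinary fuzzy similarity values rather than full intuitionistic fuzzy numbers (which also carry a non-membership degree), so it illustrates only the clustering mechanics:

```python
def compose(R):
    """Max-min composition R ∘ R for a fuzzy similarity matrix."""
    n = len(R)
    return [[max(min(R[i][k], R[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]

def transitive_closure(R):
    """Square the matrix until it stops changing (fuzzy equivalence matrix)."""
    while True:
        R2 = compose(R)
        if R2 == R:
            return R
        R = R2

def lambda_cut(R, lam):
    """Threshold the equivalence matrix; entries >= lam become 1."""
    return [[1 if x >= lam else 0 for x in row] for row in R]

def classes(cut):
    """Group indices whose rows in the cut matrix coincide."""
    seen = {}
    for i, row in enumerate(cut):
        seen.setdefault(tuple(row), []).append(i)
    return sorted(seen.values())

# toy similarity matrix for three Web services
R = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
eq = transitive_closure(R)
print(classes(lambda_cut(eq, 0.8)))  # → [[0, 1], [2]]
```

Lowering the threshold merges classes; raising it splits them, which is exactly how the paper obtains different classification results from different cut matrices.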

19.
Clustering is one of the major operations for analysing genome sequence data. Sophisticated sequencing technologies generate huge amounts of DNA sequence data, so the complexity of sequence analysis increases accordingly, and faster sequence analysis algorithms are in great demand. Most existing tools rely on alignment-based approaches, which are slow for sequence comparison; alignment-free approaches are more successful for fast clustering. State-of-the-art methods have been applied to cluster small genome sequences of various species, but they are sensitive to large sequences. To overcome this limitation, we propose a novel alignment-free method called DNA sequence clustering with MapReduce (DCMR). Initially, the MapReduce paradigm is used to speed up the extraction of eight different types of repeats. The frequency of each type of repeat in a sequence is then used as a feature for clustering. Finally, K-means (DCMR-Kmeans) and K-median (DCMR-Kmedian) algorithms cluster large DNA sequences using the extracted features. The two variants of the proposed method were evaluated on large genome sequences of 21 different species, and the results show that the sequences are clustered very well. Our method is also tested on benchmark data sets such as the viral genome, influenza A virus, mtDNA, and COXI data sets, and compared with MeshClust, UCLUST, STARS, and ClustalW. DCMR-Kmeans outperforms MeshClust, UCLUST, and DCMR-Kmedian with respect to purity and NMI on the virus data sets, and its computational time is lower than that of STARS and DCMR-Kmedian, and much lower than that of UCLUST, on the COXI data set.

20.
Recent work on extracting features of gaps in handwritten text allows a classification of these gaps into inter-word and intra-word classes using suitable classification techniques. In this paper, we first analyse the features of the gaps using mutual information. We then investigate the underlying data distribution by using visualisation methods. These suggest that a complicated structure exists, which makes them difficult to be separated into two distinct classes. We apply five different supervised classification algorithms from the machine learning field on both the original dataset and a dataset with the best features selected using mutual information. Moreover, we improve the classification result with the aid of a set of feature variables of strokes preceding and following each gap. The classifiers are compared by employing McNemar's test. We find that SVMs and MLPs outperform the other classifiers and that preprocessing to select features works well. The best classification result attained suggests that the technique we employ is particularly suitable for digital ink manipulation at the level of words.
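McNemar's test compares two classifiers evaluated on the same samples using only the discordant counts; a one-line version with the usual continuity correction (the counts below are illustrative, not from the paper):

```python
def mcnemar(b, c):
    """McNemar's chi-squared statistic with continuity correction.

    b = cases classifier 1 got right and classifier 2 got wrong,
    c = the reverse; under H0 the statistic follows chi-squared with
    1 degree of freedom (critical value 3.84 at the 5% level).
    """
    return (abs(b - c) - 1) ** 2 / (b + c)

print(mcnemar(25, 10))  # → 5.6, above 3.84, so the classifiers differ
```

A statistic above the critical value indicates a significant difference between the two classifiers' error patterns.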

