Similar Literature
 20 similar documents found (search time: 656 ms)
1.
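DNA Sequence Classification Based on an Ant Colony Optimization Clustering Algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the inefficiency and low classification accuracy of current clustering algorithms when analyzing DNA sequence data, a DNA sequence classification method based on an ant colony optimization clustering algorithm (ACOC) is proposed. An adaptive perception quantity is added to the density function, and a cooling strategy with the α-adaptation quantity from simulated annealing is applied. Features are extracted from the DNA sequences using their distribution characteristics, and the Pearson correlation coefficient is introduced into the ant colony clustering algorithm as the similarity measure. Performance tests on four datasets from the EMBL-DNA database, with comparisons against statistical clustering and the k-means algorithm, show that the method has certain advantages in time and accuracy and is suitable for classifying large-scale DNA sequence data.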
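
The similarity measure named in this abstract can be sketched as follows. The abstract does not say which distribution features are extracted, so this sketch assumes dinucleotide (2-mer) frequencies purely for illustration; `kmer_frequencies` and `pearson` are hypothetical helper names, not the authors' code.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over the alphabet ACGT (assumed feature)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[kmer] / total for kmer in kmers]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


seq_a = "ATATATATGCGC"
seq_b = "ATATATGCGCGC"
similarity = pearson(kmer_frequencies(seq_a), kmer_frequencies(seq_b))
print(round(similarity, 3))
```

A value near 1 means the two sequences have similar 2-mer profiles. How the ant colony algorithm consumes this similarity is not described in the abstract and is not modeled here.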

2.
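To enable a classifier to be trained on activity samples of one intensity level and still classify activities correctly at other intensity levels, a stochastic approximation model (SAM) for activity recognition is proposed. In the training phase, features are extracted from accelerometer time series data and fed into a clustering algorithm. The data are clustered by activity, and the cluster means and variances are combined into the corresponding SAM. In the phase of recognizing random activities, test samples are compared with the SAM of each activity class. Clustering and stochastic approximation are used to create a model for each activity, and a heuristic stochastic approximation nearest-neighbor method is then used to classify the activities. Experiments combining two clustering algorithms, k-means and Gaussian mixture models, verify that the proposed model outperforms several other popular activity classification schemes.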

3.
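Focusing on whole-series clustering of time series, a new similarity measure based on global features is proposed: 11 global features are extracted from three aspects of a time series (statistical distribution characteristics, nonlinearity, and the Fourier spectral transform) to build a feature vector. Describing the original time series by this feature vector not only retains most of the original information but also speeds up the clustering computation. Extensive experiments show that the similarity measure based on global feature extraction yields reasonable clustering results, with an especially clear effect on time series from the economic domain. Experiments on two datasets are presented, and the clustering results are evaluated from both subjective and objective perspectives.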

4.
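In time series mining tasks such as clustering and classification, distances must be computed to measure the similarity between time series samples, and many studies are devoted to time series similarity measures. This work makes full use of nonlinear trend features for time series mining. The autocorrelation function (ACF) of the time series is computed first, nonlinear trend features of the ACF are then constructed, and these features are used as the similarity measure for clustering, which also offers a new way to assess the stationarity of a time series. One simulated dataset and one real dataset are used for validation, with clustering carried out by K-means. The experimental results show that, as a new similarity measure, ACF nonlinear trend features usually need only a small number of feature values to characterize the stationarity of a time series more reasonably than some existing similarity measures.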
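
The first step described here, computing the ACF, can be sketched with the textbook sample autocorrelation estimator. The abstract does not specify how the nonlinear trend features are built from the ACF, so that step is left out; `sample_acf` is a hypothetical name.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased, textbook estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    acf = []
    for lag in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t + lag] - mean)
                  for t in range(n - lag))
        acf.append(num / denom)
    return acf


trending = list(range(10))          # non-stationary: ACF decays slowly
alternating = [1, -1] * 4           # ACF flips sign at every lag
print(sample_acf(trending, 3))
print(sample_acf(alternating, 3))
```

A slowly decaying ACF is a common informal sign of non-stationarity, which is consistent with the abstract's use of ACF-derived features to characterize stationarity.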

5.
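To address the limitations of traditional static clustering of time series, a dynamic clustering method for time series is proposed. The method first extracts the set of key points of each time series and uses an improved FCM algorithm to find the time series with pronounced dynamic characteristics; the proposed dynamic clustering algorithm then determines the class to which each such series belongs in different time periods. Using the Lance-Williams distance in the improved FCM algorithm makes it insensitive to outliers. The experimental results reflect how the classes of time series with pronounced dynamic characteristics evolve over time and demonstrate the feasibility and effectiveness of the method. Compared with existing algorithms, the method reveals part of the dynamic characteristics of time series. It can also be applied to other data mining problems.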
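
The outlier robustness claimed for the distance can be illustrated with a short sketch. It assumes the distance meant in the abstract is the Lance-Williams distance in its common unnormalized form, usually known as the Canberra distance; the abstract gives no formula, and `canberra_distance` is a hypothetical name.

```python
from math import sqrt


def canberra_distance(x, y):
    """Sum over coordinates of |a - b| / (|a| + |b|); 0/0 terms count as 0."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def euclidean_distance(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


clean = [1.0, 2.0, 3.0]
with_outlier = [1.0, 2.0, 1000.0]   # one extreme value in the last coordinate

print(euclidean_distance(clean, with_outlier))
print(canberra_distance(clean, with_outlier))
```

Each coordinate contributes at most 1 to the sum, so a single extreme value cannot dominate the distance the way it does under the Euclidean distance.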

6.
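A DTW-based clustering algorithm for symbolized time series is proposed for clustering the unequal-length symbolic time series obtained after dimensionality reduction. The algorithm first reduces the dimensionality of the time series by extracting their key points and symbolizing them, then computes similarity with DTW, and finally performs cluster analysis using the Normal matrix and the FCM method. Experimental results show that applying DTW to the symbolized time series obtained after key-point extraction considerably improves clustering accuracy.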
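
The similarity step can be sketched with the classic dynamic-programming form of DTW, which handles sequences of unequal length. The symbol alphabet and the local cost (distance between letter positions) are assumptions for illustration; the abstract does not describe the symbolization scheme.

```python
def dtw_distance(a, b, cost):
    """Classic O(len(a) * len(b)) DTW with steps (i-1, j), (i, j-1), (i-1, j-1)."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def symbol_cost(s, t):
    """Assumed local cost: distance between ordered symbols such as 'a' < 'b' < 'c'."""
    return abs(ord(s) - ord(t))


# Two symbolized series of different lengths with the same shape.
print(dtw_distance("abc", "aabbcc", symbol_cost))
# Same length, different final level.
print(dtw_distance("abc", "abd", symbol_cost))
```

Because warping lets one symbol align with several, the stretched series `"aabbcc"` is at distance zero from `"abc"`, which is the property that makes DTW usable on the unequal-length output of key-point extraction.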

7.
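Because current clustering methods for extracting regions of interest (ROI) from dynamic positron emission tomography (PET) images ignore the time series characteristics of the time-activity curve (TAC), an ROI extraction method based on curve clustering is proposed. K-means clustering is first used to remove the background and locate the heart; curve clustering is then applied to the heart to extract the myocardium; finally, the blood pool is extracted according to the spatial relationships of the pixels. The method was applied to ROI delineation in PET images of 14 mice. The experimental results show that, compared with K-means and the hybrid clustering method HCM, the method extracts the blood pools of the 14 mice more accurately, with higher precision and stability.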

8.
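In the 1990s, the launch of the Human Genome Project strongly promoted DNA sequencing. Finding how certain characteristic (functional) fragments are distributed within sequences has important applications in genetics, bioinformatics, and related fields. In teaching and research it was found that the string-processing functions of the mathematical analysis software MATLAB make functional fragment analysis easy. The system analyzes the degree of association between DNA sequence strands, constructs a feature matrix, classifies the set of DNA sequences fairly accurately with the fuzzy C-means algorithm, and uses MATLAB's image display functions to show the final clustering result clearly in a figure so that users can see the clustering effect. The system covers three main parts: analysis of the base sequence of a DNA strand, extraction of a feature matrix from multiple DNA strands, and classification of DNA with the fuzzy C-means clustering algorithm. First, the system measures the total length of the DNA sequence and the length of the functional sequence, and uses a one-dimensional array to determine the positions of the functional fragment in the DNA sequence, which completes the analysis of the DNA base sequence. Second, the system analyzes the features of the several DNA strands supplied by the user, computing the (A, T, C, G) base densities of each sequence to obtain a feature matrix that serves as the input data for fuzzy cluster analysis. Finally, the system applies the fuzzy C-means clustering algorithm to the values of the feature matrix and clusters the DNA sequences into two classes.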
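
The last two stages (base-density feature matrix, then fuzzy C-means into two classes) can be sketched as below. The original system is written in MATLAB; this is a Python illustration using the standard FCM update rules, with hypothetical function names, made-up example sequences, and an assumed deterministic initialization (first and last rows as starting centers).

```python
def base_density(seq):
    """Fraction of A, T, C and G in one DNA sequence (one feature-matrix row)."""
    seq = seq.upper()
    return [seq.count(base) / len(seq) for base in "ATCG"]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def memberships(points, centers, m):
    """Standard FCM membership update; each row sums to 1."""
    rows = []
    for p in points:
        # Clamp to avoid division by zero when a point sits on a center.
        d = [max(squared_distance(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dk / dj) ** (1.0 / (m - 1)) for dj in d)
                     for dk in d])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=30):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = [list(c) for c in centers]
    dims = range(len(points[0]))
    for _ in range(iterations):
        u = memberships(points, centers, m)
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            centers[k] = [sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                          for d in dims]
    return memberships(points, centers, m), centers


# Illustrative sequences only: two AT-rich and two GC-rich strands.
sequences = ["ATATATTAAT", "AATTATATAG", "GCGCGGCCGC", "GCCGCGGCAT"]
features = [base_density(s) for s in sequences]
u, _ = fuzzy_c_means(features, centers=[features[0], features[-1]])
labels = [row.index(max(row)) for row in u]
print(labels)
```

Each row of `u` gives the degree to which a sequence belongs to each of the two classes; taking the larger membership yields a hard two-class assignment like the one the abstract describes.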

10.
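With the rapid development of research and applications in the Internet of Things, big data, and artificial intelligence, time series data of all kinds keep emerging. The features of time series data are surface phenomena that carry rich domain knowledge; how to analyze time series feature patterns efficiently, extract discriminative time series features, and mine the regularities contained in the data is becoming a research hotspot. This paper first introduces the concept of time series and reviews recent progress in time series classification, clustering, and forecasting. It then analyzes and compares in detail the research on machine learning methods for time series problems from three aspects of time series feature extraction: shape features, temporal dependence, and sequence transformation features. Finally, based on current trends, it gives an outlook on the future development of time series feature extraction methods.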

11.
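Weng Xuanqiao (翁楦乔), Wen Chenglin (文成林). 《控制工程》, 2022, 29(1): 175-181
To address the difficulty traditional methods have in using large amounts of time series data and unlabeled data for power grid fault diagnosis, an intelligent power grid fault diagnosis method based on deep feature clustering and a recurrent neural network (RNN) is proposed. The method first builds a feature extractor from a convolutional neural network to extract high-level features of the time series data, and then performs semi-supervised clustering on the extracted features to obtain labels for the unlabeled samples, so that the fault class of each unlabeled sample can be determined and the samples put to use; then...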

12.
13.
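Linear neighborhood propagation (LNP) is a very effective graph-based semi-supervised classification method, but class overlap and imbalanced data distribution can lead LNP to select unsuitable neighbors when constructing the graph, which degrades classification performance. Spectral clustering is used to analyze the data distribution, and the distance measure used in neighbor selection is adjusted according to the clustering result so that more suitable neighbors are chosen. The spectral-clustering-based LNP method is applied to time series classification; experiments on four datasets from the UCR time series mining archive show that it achieves higher classification accuracy than LNP.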

14.
Share price trends can be recognized by using data clustering methods. However, the accuracy of these methods may be rather low. This paper presents a novel supervised classification scheme for the recognition and prediction of share price trends. We first produce a smooth time series using zero-phase filtering and singular spectrum analysis from the original share price data. We train pattern classifiers using the classification results of both original and filtered time series and then use these classifiers to predict the future share price trends. Experimental results obtained from both synthetic data and real share prices show that the proposed method is effective and outperforms the well-known K-means clustering algorithm.

15.
Characteristic-Based Clustering for Time Series Data   Total citations: 1 (self-citations: 0, citations by others: 1)
With the growing importance of time series clustering research, particularly for similarity searches amongst long time series such as those arising in medicine or finance, it is critical for us to find a way to resolve the outstanding problems that make most clustering methods impractical under certain circumstances. When the time series is very long, some clustering algorithms may fail because the very notion of similarity is dubious in high-dimensional space; many methods cannot handle missing data when the clustering is based on a distance metric. This paper proposes a method for clustering of time series based on their structural characteristics. Unlike other alternatives, this method does not cluster point values using a distance metric; rather, it clusters based on global features extracted from the time series. The feature measures are obtained from each individual series and can be fed into arbitrary clustering algorithms, including an unsupervised neural network algorithm, self-organizing map, or hierarchical clustering algorithm. Global measures describing the time series are obtained by applying statistical operations that best capture the underlying characteristics: trend, seasonality, periodicity, serial correlation, skewness, kurtosis, chaos, nonlinearity, and self-similarity. Since the method clusters using extracted global measures, it reduces the dimensionality of the time series and is much less sensitive to missing or noisy data. We further provide a search mechanism to find the best selection from the feature set that should be used as the clustering inputs. The proposed technique has been tested using benchmark time series datasets previously reported for time series clustering and a set of time series datasets with known characteristics. The empirical results show that our approach is able to yield meaningful clusters. The resulting clusters are similar to those produced by other methods, but with some promising and interesting variations that can be intuitively explained with knowledge of the global characteristics of the time series.
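
The idea of replacing each series by a short vector of global measures can be sketched as follows. Only three of the listed characteristics are shown (trend, skewness, kurtosis), using textbook estimators that are assumptions here, since the abstract does not give the paper's exact definitions; `global_features` is a hypothetical name.

```python
def global_features(series):
    """Return [trend slope, skewness, excess kurtosis] for one series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    m2 = sum(d ** 2 for d in dev) / n
    if m2 == 0:
        raise ValueError("features are undefined for a constant series")
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    # Least-squares slope of the series against its time index 0..n-1.
    t_mean = (n - 1) / 2
    slope = (sum((t - t_mean) * d for t, d in enumerate(dev))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [slope, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0]


short_series = [0, 1, 2, 3, 4]
long_series = list(range(50))
print(global_features(short_series))
print(global_features(long_series))
```

Series of different lengths map to vectors of the same length, so any standard clustering algorithm can be applied to the feature vectors, which is the dimensionality reduction the abstract refers to.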

16.
17.
In this paper, a new classification method (SDCC) for high dimensional text data with multiple classes is proposed. In this method, a subspace decision cluster classification (SDCC) model consists of a set of disjoint subspace decision clusters, each labeled with a dominant class to determine the class of new objects falling in the cluster. A cluster tree is first generated from a training data set by recursively calling a subspace clustering algorithm, the Entropy Weighting k-Means algorithm. Then, the SDCC model is extracted from the subspace decision cluster tree. Various tests, including the Anderson–Darling test, are used to determine the stopping condition of the tree growing. A series of experiments on real text data sets have been conducted. Their results show that the new classification method (SDCC) outperforms existing methods such as decision trees and SVM. SDCC is particularly suitable for large, high dimensional sparse text data with many classes.

18.
Dynamic Time Warping (DTW) is a popular and efficient distance measure used in classification and clustering algorithms applied to time series data. By computing the DTW distance not on raw data but on the time series of the (first, discrete) derivative of the data, we obtain the so-called Derivative Dynamic Time Warping (DDTW) distance measure. DDTW, used alone, is usually inefficient, but there exist datasets on which DDTW gives good results, sometimes much better than DTW. To improve the performance of the two distance measures, we can combine them into a new single (parametric) distance function. The literature contains examples of the combining of DTW and DDTW in algorithms for supervised classification of time series data. In this paper, we demonstrate that a combination of DTW and DDTW can also be applied in a method of time series clustering (unsupervised classification). In particular, we focus on hierarchical clustering (with average linkage) of univariate (one-dimensional) time series data. We construct a new parametric distance function, combining DTW and DDTW, where a single real number parameter controls the contribution of each of the two measures to the total value of the combined distances. The parameter is tuned in the initial phase of the clustering algorithm. Using this technique in clustering methods requires a different approach (to address certain specific problems) than for supervised methods. In the clustering process we use three internal cluster validation measures (measures which do not use labels) and three external cluster validation measures (measures which do use clustering data labels). Internal measures are used to select an optimal value of the parameter of the algorithm, whereas external measures give information about the overall performance of the new method and enable comparison with other distance functions. Computational experiments are performed on a large real-world database (UCR Time Series Classification Archive: 84 datasets) from a very broad range of fields, including medicine, finance, multimedia and engineering. The experimental results demonstrate the effectiveness of the proposed approach for hierarchical clustering of time series data. The method with the new parametric distance function outperforms DTW (and DDTW) on the database used. The results are confirmed by graphical and statistical comparison.
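
A parametric combination of the two measures can be sketched as below. The abstract only says that a single real parameter controls each measure's contribution; the convex combination with `alpha` in [0, 1], the absolute-difference local cost, and the simple first difference used as the derivative are all assumptions made for illustration, and the function names are hypothetical.

```python
def dtw(a, b):
    """Classic DTW between two numeric series with |x - y| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def derivative(series):
    """First discrete derivative (successive differences)."""
    return [series[i + 1] - series[i] for i in range(len(series) - 1)]


def combined_distance(a, b, alpha):
    """Assumed form: (1 - alpha) * DTW + alpha * DDTW, with 0 <= alpha <= 1."""
    ddtw = dtw(derivative(a), derivative(b))
    return (1 - alpha) * dtw(a, b) + alpha * ddtw


# Same shape, different level: DTW is large, DDTW is zero.
low = [0, 1, 2, 3]
high = [10, 11, 12, 13]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, combined_distance(low, high, alpha))
```

With `alpha = 0` the function reduces to plain DTW and with `alpha = 1` to DDTW, so intermediate values trade sensitivity to level against sensitivity to shape. Selecting `alpha` with internal validation measures, as the abstract describes, is outside this sketch.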

19.
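Traditional time series classification methods require laborious feature extraction and perform poorly when only a small amount of labeled data is available. By analyzing the characteristics of BP neural networks and naive Bayes classifiers, a time series classification model based on BP and naive Bayes is proposed. The model exploits the nonlinear mapping ability of the BP neural network and the ability of the naive Bayes classifier to classify with little labeled data: features extracted by the BP neural network are fed into the naive Bayes classifier, which addresses the problems of traditional time series classification algorithms fairly effectively. Experimental results show that the model achieves high classification accuracy on time series when labeled data are scarce.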

20.
Time series classification has been extensively explored in many fields of study. Most methods are based on the historical or current information extracted from data. However, if interest is in a specific future time period, methods that directly relate to forecasts of time series are much more appropriate. An approach to time series classification is proposed based on a polarization measure of forecast densities of time series. By fitting autoregressive models, forecast replicates of each time series are obtained via the bias-corrected bootstrap, and a stationarity correction is considered when necessary. Kernel estimators are then employed to approximate forecast densities, and discrepancies of forecast densities of pairs of time series are estimated by a polarization measure, which evaluates the extent to which two densities overlap. Following the distributional properties of the polarization measure, a discriminant rule and a clustering method are proposed to conduct the supervised and unsupervised classification, respectively. The proposed methodology is applied to both simulated and real data sets, and the results show desirable properties.
