Similar literature
 20 similar articles found
1.
Text topic clustering runs slowly on stand-alone architectures at big-data scale. Taking news text as the research object, this paper proposes an LDA (Latent Dirichlet Allocation) text topic clustering algorithm based on the Spark big data platform. Because the TF-IDF (term frequency-inverse document frequency) algorithm under Spark maps words irreversibly, the mapped word indexes cannot be traced back to the original words. The paper therefore proposes an optimized TF-IDF method under Spark that ensures the text words can be restored. First, text features are extracted by the proposed TF-IDF algorithm combined with CountVectorizer; the features are then input to the LDA topic model for training; finally, the text topic clusters are obtained. Experimental results show that, for large data samples, running on Spark improves the processing speed of LDA topic model clustering. Compared with an LDA topic model that takes word-frequency input, the proposed model also achieves lower perplexity.
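The reversibility point in the abstract above can be illustrated with a small sketch. It is not the authors' Spark code: it is plain Python, the function names `fit_vocabulary` and `tfidf` are hypothetical, and the smoothed IDF form log((N + 1) / (df + 1)) is assumed here to mirror the one documented for Spark MLlib. The idea is that a hashed feature index cannot be mapped back to a word, whereas an explicit vocabulary list (the approach CountVectorizer takes) keeps the index-to-word mapping available.

```python
import math
from collections import Counter


def fit_vocabulary(docs):
    """Give every distinct word a stable index (its position in the list)."""
    return sorted({word for doc in docs for word in doc})


def tfidf(docs, vocabulary):
    """Return one {word_index: weight} dict per tokenised document."""
    index_of = {word: i for i, word in enumerate(vocabulary)}
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Assumed smoothed IDF: log((N + 1) / (df + 1)).
            index_of[word]: tf * math.log((n_docs + 1) / (doc_freq[word] + 1))
            for word, tf in counts.items()
        })
    return vectors


docs = [
    ["stock", "market", "rises"],
    ["market", "falls"],
    ["team", "wins", "match"],
]
vocab = fit_vocabulary(docs)
vectors = tfidf(docs, vocab)

# Because the vocabulary is kept, any feature index can be restored to its word.
best_index = max(vectors[0], key=vectors[0].get)
print(vocab[best_index])
```

Vectors of this shape, together with the vocabulary, are what a topic model would need in order to report topics as words rather than as opaque indexes.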

2.
Semi-supervised clustering improves learning performance by using a small number of labeled samples to assist learning from unlabeled samples. This paper implements and compares unsupervised and semi-supervised clustering analysis of BOA-Argo ocean text data. K-Means and Affinity Propagation (AP) are two classical unsupervised clustering algorithms. Because the final number of clusters in AP clustering has proved difficult to keep within a suitable range, the Election-AP algorithm is proposed to control it. The semi-supervised approach samples thermocline data from the BOA-Argo dataset according to the standard definition of the thermocline and uses these data for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen to compare learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). To adapt them to the single-label setting, this paper improves the above algorithms into SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means), and SSAP (improved Semi-supervised Affinity Propagation) and uses them for semi-supervised clustering analysis of the data. A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is also proposed, and the experimental data show that DSAP achieves a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze potential patterns in the marine data.

3.
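To address the fault diagnosis problem of dehumidifier systems and its characteristics, the fuzzy C-means (FCM) clustering algorithm is studied with the CFTZ21 dehumidifier as the object. A genetic algorithm is introduced to improve the traditional fuzzy C-means clustering algorithm and overcome its shortcomings. The improved genetic fuzzy C-means clustering algorithm is tested on data samples collected in experiments, and the results meet expectations, which shows that applying the improved FCM to dehumidifier fault diagnosis is feasible.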

4.
This paper considers the job shop scheduling problem with outsourcing options and presents a novel shuffled frog-leaping algorithm (SFLA) to minimise total tardiness under the condition that the total outsourcing cost does not exceed a given upper bound. In SFLA, a tournament selection-based method decomposes the whole population into memeplexes; the search process in each memeplex is performed on the best solution of the memeplex and consists of a global search step and a multiple neighbourhood search step. SFLA is tested on a number of instances and compared with methods from the literature. Computational results validate the promising performance of SFLA on the considered problem.

5.
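A new distance algorithm for cluster analysis   Total citations: 1, self-citations: 0, citations by others: 1
Research has found that existing cluster analysis algorithms commonly suffer from blindness in clustering, which directly affects clustering quality. To address this problem, this paper takes the Parks distance algorithm as an example and proposes an improved distance algorithm for cluster analysis. The new distance algorithm is applied to the classification of ship assembly products, and a comparison of the algorithm before and after the improvement verifies the superiority of the new algorithm.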

6.
In a large-scale wireless sensor network (WSN), densely distributed sensor nodes process a large amount of data, and aggregating data in the network can consume a great amount of energy. To balance and reduce the energy consumption of nodes in a WSN and extend the network life, this paper proposes a nonuniform clustering routing algorithm based on an improved K-means algorithm. The algorithm uses a clustering method to form and optimize clusters, and it selects appropriate cluster heads to balance network energy consumption and extend the life cycle of the WSN. To ensure that cluster head (CH) selection in the network is fair and that the selected CHs are not concentrated within a certain range, an appropriate CH competition radius is chosen. Simulation results show that, compared with the LEACH, LEACH-C, and DEEC clustering algorithms, this algorithm can effectively balance the energy consumption of the CHs and extend the network life.

7.
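Tian Yuan (田源), Wang Hongtao (王洪涛). 《计量学报》, 2016, 37(6): 582-586
To improve the quality of image edge feature extraction, a quantum kernel clustering algorithm is adopted. First, pixels are mapped to a quantum encoding, and pixel blocks are randomly sampled within the domain in which the code elements are established. Then the clustering distance is used to compute the distance between each data point and every cluster core, each data vector is assigned to the core vector at the smallest distance, and a kernel function determines the effective range of influence. Finally, the dissimilarity of the pixel clustering is analysed and the algorithm flow is given. Simulation experiments show that the algorithm extracts image edge features with clear contours and good continuity, performs well on the evaluation index MS and on clustering accuracy, and converges quickly.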

8.
Air quality prediction is an important part of environmental governance, and its accuracy also affects how people plan outdoor activities. Mining effective information from historical air pollution data, while discarding unimportant factors, in order to predict how pollution changes is of great significance for pollution prevention, pollution control and early warning. This paper takes into account that air pollutants follow different trends and that different climatic factors affect air pollutants differently. First, air pollutant data from different cities are collected with a sliding window technique, and the data of the different cities within the sliding window are clustered by the Kohonen method to find shared trends in air pollutants. On this basis, combined with weather data, the ReliefF method is used to extract the climate-factor features that are helpful for prediction. Finally, the different types of air pollutants and the correspondingly extracted climate-factor features are used to train different sub-models. Experimental results for different algorithms and different air pollutants show that this method improves both the accuracy of air quality prediction and the operating efficiency.

9.
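This paper analyses the concept and categories of text clustering, then focuses on partition-based text clustering methods and describes the core of the algorithm. The method is applied in a clustering experiment on bibliographic data from standards literature, the final experimental results are analysed, and conclusions are drawn.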

10.
The Quadratic Assignment Problem (QAP) is a difficult and important problem studied in the domain of combinatorial optimisation. It is possible to solve QAP instances with 10-20 facilities using exhaustive parallel algorithms within a few days on a cluster machine. However, large QAP instances with more than 100 facilities are not solvable using exhaustive techniques. We have explored a variety of Genetic Algorithm crossover operators for this problem and verified their performance experimentally using well-known instances from the QAPLIB library. By increasing the number of processors, generations and population sizes we have been able to find solutions that are the same as (or very close to) the best reported solutions for large QAP instances in QAPLIB. In order to parallelise the Genetic Algorithm we generate and evolve separate solution pools on each cluster processor, using an island model. This model exchanges 10% of each processor's solutions at the initial stages of optimisation. We show experimentally that both execution times and solution qualities are improved for large QAP instances by using our Island Parallel Genetic Algorithm.
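The island-model exchange mentioned in the abstract above can be sketched roughly as follows. Only the 10% exchange rate comes from the abstract; the ring topology, the choice to send each island's best solutions, and the rule that newcomers replace the receiver's worst solutions are all assumptions made for illustration, and `migrate` is a hypothetical name.

```python
def migrate(islands, fraction=0.1):
    """Exchange solutions between islands arranged in a ring.

    Each island is a list of (cost, solution) pairs, lower cost being better.
    Every island receives copies of the best `fraction` of its predecessor,
    which overwrite its own worst solutions (assumed policy).
    """
    ranked = [sorted(pop, key=lambda pair: pair[0]) for pop in islands]
    migrated = []
    for i, pop in enumerate(ranked):
        k = max(1, int(len(pop) * fraction))   # number of migrants to accept
        donor = ranked[i - 1]                  # previous island in the ring
        newcomers = donor[:k]                  # donor's k best solutions
        migrated.append(pop[:len(pop) - k] + newcomers)
    return migrated


# Three islands of ten dummy solutions each; costs 0-9, 10-19 and 20-29.
islands = [
    [(cost, f"perm-{cost}") for cost in range(start, start + 10)]
    for start in (0, 10, 20)
]
after = migrate(islands)
print([min(cost for cost, _ in pop) for pop in after])
```

In a real parallel run each island would live on its own processor and the exchange would be a message between processes; the list-of-lists form here only shows the bookkeeping.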

11.
The flexible job shop scheduling problem (FJSP) has been extensively investigated, and its objectives are often related to time. With the advent of green manufacturing, energy-related objectives should also be fully considered in FJSP. In this study, FJSP with the minimisation of workload balance and total energy consumption is considered, and the conflict between the two objectives is analysed. A shuffled frog-leaping algorithm (SFLA) is proposed based on a three-string coding approach. The population and a non-dominated set are used to construct memeplexes by tournament selection, and the search process of each memeplex is performed on its non-dominated member. Extensive experiments are conducted to test the search performance of SFLA, and computational results show the conflict between the two objectives of FJSP and the promising advantages of SFLA on the considered FJSP.

12.
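Objective: In tobacco logistics distribution centres the volume of cigarette cartons to be sorted is large, and the way different carton specifications are allocated has a strong effect on total order processing time. This work studies how to balance the allocation of specifications across sorting zones in order to improve sorting efficiency. Methods: A mathematical model is established whose objective function minimises the sum of the specification similarity coefficients of the zones, and it is solved with an improved genetic particle swarm dynamic clustering (GAPSO-K) algorithm. First, the specification similarity coefficient is improved by incorporating the sorting volume of each specification and is used as the fitness function. Then the inertia weight factor in the particle swarm algorithm is improved so that its value changes adaptively. Finally, the crossover and mutation operations of the genetic algorithm are introduced into the particle swarm dynamic clustering algorithm to widen the search range of solutions. The other algorithms considered in the paper are solved and compared in Matlab, and the results are verified by simulation in EM-plant. Results: Simulation with data from a tobacco logistics distribution centre shows that order processing with the GAPSO-K algorithm takes 234.5 s, substantially less than the traditional approach, which effectively improves the sorting efficiency of flexible logistics. Conclusion: The algorithm draws fully on the strengths of the two underlying algorithms, shows better convergence and optimisation ability, and offers a new approach to specification allocation in flexible logistics.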

13.
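A fault diagnosis method based on semi-fuzzy clustering   Total citations: 1, self-citations: 0, citations by others: 1
To meet the real-time and accuracy requirements of fault diagnosis, a fast-converging semi-fuzzy c-means (SFCM) clustering diagnosis method is studied, using thresholding of the within-class distance. It is proved that semi-fuzzy clustering can be achieved when the fuzzy weighting exponent m of the SFCM algorithm takes values in the interval (0, 1); the influence of the threshold η on the algorithm is discussed and the algorithm steps are given. Taking the information channel of an airborne weapon control system as the diagnosis object, the method is used for an unsupervised sample classification verification and a fault pattern recognition diagnosis test on the channel. The results show that the SFCM algorithm can classify and recognise the fault modes of the information channel quickly and accurately.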

14.
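Ping Qingjie (平庆杰). 《工业计量》, 2006, 16(6): 8-10
This article proposes a method that determines the number of hidden-layer nodes of an RBF neural network based on the idea of fuzzy clustering and trains the RBF neural network with the K-Means clustering algorithm. A simulation based on this algorithm shows that it is effective.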

15.
Ye Xu, Ling Wang, Shengyao Wang, Min Liu. 《工程优选》, 2013, 45(12): 1409-1430
In this article, an effective shuffled frog-leaping algorithm (SFLA) is proposed to solve the hybrid flow-shop scheduling problem with identical parallel machines (HFSP-IPM). First, some novel heuristic decoding rules for both job order decision and machine assignment are proposed. Then, three hybrid decoding schemes are designed to decode job order sequences to schedules. A special bi-level crossover and multiple local search operators are incorporated in the searching framework of the SFLA to enrich the memetic searching behaviour and to balance the exploration and exploitation capabilities. Meanwhile, some theoretical analysis for the local search operators is provided for guiding the local search. The parameter setting of the algorithm is also investigated based on the Taguchi method of design of experiments. Finally, numerical testing based on well-known benchmarks and comparisons with some existing algorithms are carried out to demonstrate the effectiveness of the proposed algorithm.

16.
Several pests feed on leaves, stems, bases, and the entire plant, causing plant illnesses. As a result, it is vital to identify and eliminate the disease before it damages the plants. Detecting and treating plant disease manually is quite challenging because it requires much effort and a long processing time, so image processing is employed to detect plant disease instead. The main goal of this study is to identify the disease affecting a plant by creating an image processing system that can recognize and classify four different forms of plant disease: Phytophthora infestans, Fusarium graminearum, Puccinia graminis, and tomato yellow leaf curl. To this end, this work uses a support vector machine (SVM) classifier to detect and classify plant disease through the steps of image acquisition, pre-processing, segmentation, feature extraction, and classification. The gray level co-occurrence matrix (GLCM) and local binary pattern (LBP) features are used to identify the disease-affected portion of the plant leaf. According to the experimental data, the proposed technique can correctly detect and diagnose plant disease with 97.2 percent accuracy.

17.
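An image segmentation algorithm combining simulated annealing and fuzzy C-means clustering   Total citations: 7, self-citations: 0, citations by others: 7
The fuzzy C-means (FCM) clustering algorithm is an image segmentation technique that combines unsupervised clustering with the concept of fuzzy sets. It is fairly effective, but it is sensitive to the initial cluster centres and membership matrix and may converge to a local minimum. The simulated annealing (SA) algorithm is combined with the fuzzy C-means clustering algorithm: with a reasonably chosen cooling schedule, the objective function of the simulated annealing algorithm is built from the fuzzy C-means clustering algorithm, yielding an SA-based fuzzy C-means clustering image segmentation algorithm. Experiments show that the method is able to search for the global optimum and therefore produces good image segmentation results.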
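For readers unfamiliar with FCM, the following is a minimal sketch of the textbook alternating update on 1-D data, written in plain Python. It is not taken from the paper: the function names, the sample points and the starting centres are illustrative assumptions, and the simulated annealing loop itself is not shown. `fcm_objective` computes the standard FCM cost, which is the kind of quantity the abstract describes using as the annealing objective.

```python
def fcm_step(points, centres, m=2.0, eps=1e-12):
    """One FCM iteration: update memberships, then update centres."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # eps keeps the ratio finite if a point coincides with a centre.
        dists = [abs(x - c) + eps for c in centres]
        row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        memberships.append(row)
    new_centres = []
    for i in range(len(centres)):
        weights = [row[i] ** m for row in memberships]
        weighted_sum = sum(w * x for w, x in zip(weights, points))
        new_centres.append(weighted_sum / sum(weights))
    return memberships, new_centres


def fcm_objective(points, centres, memberships, m=2.0):
    """Standard FCM cost: membership-weighted squared distances to the centres."""
    return sum(
        (memberships[k][i] ** m) * (x - c) ** 2
        for k, x in enumerate(points)
        for i, c in enumerate(centres)
    )


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]   # illustrative gray values
centres = [0.0, 6.0]                       # arbitrary starting centres
for _ in range(20):
    memberships, centres = fcm_step(points, centres)
cost = fcm_objective(points, centres, memberships)
print(centres, cost)
```

Plain FCM only ever moves downhill from its starting centres, which is why a poor start can leave it in a local minimum; an annealing wrapper would occasionally accept moves that raise the cost in order to escape.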

18.
Mobile commerce (m-commerce) contributes to the growing popularity of electronic commerce (e-commerce), allowing anybody to sell or buy goods using a mobile device or tablet anywhere and at any time. As demand for e-commerce increases tremendously, delivery companies face growing pressure to organise their transportation plans so as to achieve both profit and customer satisfaction. One important planning problem in this domain is the multi-vehicle profitable pickup and delivery problem (MVPPDP), in which a selected set of pickup and delivery customers must be served within a certain allowed trip time. In this paper, we propose hybrid clustering algorithms combined with the greedy randomised adaptive search procedure (GRASP) to construct an initial solution for the MVPPDP. Our approaches first cluster the search space in order to reduce its dimensionality, then use GRASP to build routes for each cluster. We compare our results with state-of-the-art construction heuristics that have been used to construct initial solutions to this problem. Experimental results show that our proposed algorithms achieve excellent performance in terms of both solution quality and processing time.

19.
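Infrared human target segmentation by K-means cluster centre analysis   Total citations: 5, self-citations: 1, citations by others: 4
Yun Tingjin (云廷进), Guo Yongcai (郭永彩), Gao Chao (高潮). 《光电工程》, 2008, 35(3): 140-144
Automatic threshold selection algorithms for infrared target segmentation are not robust, owing to parameter differences between infrared imaging devices and the influence of the environment around the target. Starting from the mechanism of infrared imaging, this paper proposes and implements a new solution to this problem. K-means clustering is first applied to the image histogram; the segmentation threshold is then determined by analysing how the cluster centres are distributed. The method requires neither prior equalisation of the image nor assumptions about the background distribution. Experimental results show that the algorithm is robust.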
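The first step described above, K-means on the image histogram, can be sketched as weighted 1-D clustering over gray levels. This is an illustration under assumptions rather than the paper's method: the abstract does not say how the threshold is derived from the cluster centres, so the midpoint rule below is only a placeholder, and the synthetic histogram and the name `kmeans_histogram` are hypothetical.

```python
def kmeans_histogram(hist, k, iterations=50):
    """Weighted 1-D k-means over gray levels; each level counts hist[g] times."""
    levels = [g for g, n in enumerate(hist) if n > 0]
    lo, hi = levels[0], levels[-1]
    # Spread the initial centres evenly over the occupied gray range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        sums = [0.0] * k
        counts = [0] * k
        for g in levels:
            nearest = min(range(k), key=lambda i: abs(g - centres[i]))
            sums[nearest] += g * hist[g]
            counts[nearest] += hist[g]
        # Keep the old centre if a cluster received no pixels.
        updated = [sums[i] / counts[i] if counts[i] else centres[i] for i in range(k)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)


# Synthetic 8-bit histogram: a large dark background and a small warm target.
hist = [0] * 256
for g in range(40, 61):
    hist[g] = 100
for g in range(190, 211):
    hist[g] = 10

centres = kmeans_histogram(hist, k=2)
threshold = sum(centres) / 2   # placeholder rule, not the paper's analysis
print(centres, threshold)
```

Clustering the histogram rather than the pixels keeps the cost independent of image size, since at most 256 gray levels are visited per iteration.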

20.
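Li Jiying (李积英), Dang Jianwu (党建武). 《光电工程》, 2013, 40(1): 126-131
The fuzzy C-means algorithm depends on its initial values and easily falls into local optima. To address this, the paper combines the quantum ant colony algorithm with the FCM clustering algorithm. The global search ability, robustness and fast convergence of the quantum ant colony algorithm are first used to determine the initial cluster centres and the number of clusters for the image; these results then serve as the initial parameters of the FCM clustering algorithm, which is used to segment the medical image. Experimental results show that the method effectively removes the FCM algorithm's dependence on initial parameters, overcomes the tendency of both the FCM algorithm and the ant colony algorithm to fall into local extrema, and considerably improves segmentation speed and accuracy.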

