
1.

姜伦丁华福于飞《中国科技博览》2009,(9):208-208

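This paper proposes new methods for selecting the initial cluster centers and handling outliers in the k-means clustering algorithm, addressing the sensitivity of k-means to initial cluster centers and outlier documents, and applies the improved algorithm to Chinese text clustering. Experimental results show that the improved algorithm achieves considerably higher accuracy than the original algorithm and is more stable.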

2.

An improved feature weighting algorithm for text clustering (文本聚类中的改进特征权重算法)

褚蕾蕾常文波李秦《工程数学学报》2012,29(4):523-528

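This paper proposes ETFC, a new feature-term weighting method based on term frequency and document frequency. It first constructs a new function as the class-discrimination measure of a feature term, strengthening the discriminating power of terms with low document frequency, and then runs clustering experiments with the k-means algorithm. The results show that ETFC substantially outperforms the existing weighting schemes TFIDF and TFC in both clustering purity and stability, indicating that the improvement strategy is feasible.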
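For orientation, the TFIDF baseline that ETFC is compared against can be sketched as below. This is only the common textbook form, tf × log(N / df); it is not the ETFC weighting, whose discrimination function the abstract does not give. The function name `tfidf_weights` and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook form tf * log(N / df). Assumption: no smoothing and no
    length normalization, which real systems often add.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy corpus (hypothetical): "cluster" occurs in every document.
docs = [["cluster", "text", "text"], ["cluster", "image"], ["cluster", "coal"]]
w = tfidf_weights(docs)
```

A term that appears in every document gets weight zero under this form, which is one reason a weighting scheme may want a separate class-discrimination factor.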

3.

A study of a Chinese text clustering algorithm (一种中文文本聚类算法的研究)

杨俊廖闻剑彭艳兵《硅谷》2009,(5)

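Clustering is an unsupervised classification method: without predefined categories, it partitions a large data set into clusters so that similarity is as high as possible within a cluster and as low as possible between clusters. As an important data mining method, clustering is receiving growing attention. Common clustering methods today include partition-based, hierarchy-based, locality-based, and model-based methods. Drawing on the essence of these algorithms, the paper proposes a simple clustering algorithm that uses a preset threshold and assigns items to clusters one by one, then refines the clustering results in a back-end precision step. Experiments verify that the method achieves a reasonable clustering effect.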
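The "preset threshold, assign one by one" idea resembles the classic single-pass (leader) clustering scheme, sketched below. This is a generic illustration, not the authors' algorithm: the use of cosine similarity, the running-mean centroid update, and the names `cosine` and `single_pass_cluster` are all assumptions, and the paper's back-end refinement step is not modeled.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def single_pass_cluster(vectors, threshold):
    """Assign each vector to the most similar existing cluster if the
    similarity reaches `threshold`; otherwise start a new cluster."""
    clusters = []  # each entry: {"centroid": [...], "members": [indices]}
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            s = cosine(v, c["centroid"])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append({"centroid": list(v), "members": [i]})
        else:
            n = len(best["members"])
            # Update the centroid as the running mean of its members.
            best["centroid"] = [(m * n + x) / (n + 1)
                                for m, x in zip(best["centroid"], v)]
            best["members"].append(i)
    return [c["members"] for c in clusters]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = single_pass_cluster(vectors, threshold=0.8)
```

The result of a single-pass scheme depends on the input order and on the threshold, which may be why a refinement stage after the first pass is useful.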

4.

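The FSCAMMD clustering algorithm for color image quantization (彩色图像量化的FSCAMMD聚类算法) Total citations: 5; self-citations: 0; citations by others: 5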

凌玲凌卫新《工程图学学报》2001,22(3):65-70

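A new color image quantization algorithm based on pattern recognition techniques is proposed: the fast statistical clustering algorithm based on maximizing the minimum distance (FSCAMMD). The algorithm overcomes the weakness of the SCA algorithm in choosing initial cluster centers by introducing an initial-value selection method that combines maximum frequency with the maximum of the minimum within-class distance. Experimental results show that the algorithm substantially reduces the total variance and the color distortion of the quantized image.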

5.

Chinese text clustering based on quantified relations between words in a dictionary (基于词典中词语量化关系的中文文本聚类研究)

胡熠陆汝占陈玉泉刘慧《高技术通讯》2007,17(8):778-782

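Given the value of lexical knowledge for improving text clustering performance, a text similarity measure is proposed that uses linear interpolation to combine the quantified relations between dictionary words with cosine similarity. Before clustering, on the assumption that a dictionary entry and its definition are semantically equivalent, quantified relations are built between each entry and the words in its definition, and these relation values serve as the knowledge used in text clustering. Within the k-means clustering framework, the new interpolated similarity clearly improves the performance of the text clustering system. The experimental results suggest that word relations quantified from a dictionary may contribute to future text clustering research.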
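The linear interpolation described above can be sketched as a weighted sum of two similarity scores. The abstract does not state the interpolation weight or how the dictionary-based score is computed, so the parameter `lam`, the function name `interpolated_similarity`, and the sample values are assumptions for illustration only.

```python
def interpolated_similarity(cos_sim, dict_sim, lam):
    """Blend cosine similarity with a dictionary-based similarity.

    lam = 1.0 uses only cosine similarity; lam = 0.0 uses only the
    dictionary-based score. Which term carries which weight is an
    assumption, not taken from the paper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * cos_sim + (1.0 - lam) * dict_sim

# Hypothetical scores for one document pair.
score = interpolated_similarity(cos_sim=0.5, dict_sim=1.0, lam=0.25)
```

Because both inputs lie in [0, 1] and the weights sum to one, the blended score stays in [0, 1] and can replace cosine similarity inside a k-means style assignment step.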

6.

Improving text clustering ensemble algorithms with a "divide-merge" strategy (使用"分裂-合并"策略改进文本聚类集成算法的研究)

卢志茂徐森刘远超顾国昌《高技术通讯》2010,20(7)

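The effect of a "divide-merge" (DM) strategy on improving text clustering ensemble algorithms is examined. In the stage that generates ensemble members, the spherical K-means (SKM) algorithm with the DM strategy is run r times; each run produces a relatively large number of text sub-clusters, which are merged by agglomerative hierarchical clustering according to their similarity, giving r ensemble members. In the ensemble stage, two fast spectral clustering algorithms then combine them. In experiments on six real text collections, the two ensemble algorithms using the DM strategy improved average normalized mutual information (NMI) by 4.6 and 7.9 percentage points, respectively, over the algorithms without it, showing that the DM strategy effectively improves the clustering quality of text clustering ensemble algorithms.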
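NMI, the quality measure reported above, compares a clustering with reference labels. A minimal sketch follows; it normalizes mutual information by the geometric mean of the two entropies, which is one of several conventions in use, and the abstract does not say which one the authors applied. The function name `nmi` and the toy labelings are illustrative.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information, MI / sqrt(H(A) * H(B))."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from joint and marginal counts.
    mi = sum((nab / n) * math.log(nab * n / (ca[a] * cb[b]))
             for (a, b), nab in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)

same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # labels renamed only
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])      # no shared structure
```

The measure ignores how clusters are named, so a clustering that matches the reference up to relabeling scores as a perfect match.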

7.

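An experimental and comparative study of partition-based text clustering algorithms on standards literature (基于划分的文本聚类算法在标准文献中的试验与对比研究)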

甘克勤丛超张宝林孙旭凯《标准科学》2013,(10):47-50

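This paper reviews the concept and categories of text clustering, then describes partition-based text clustering methods and the core of their algorithms, applies them in clustering experiments on bibliographic records of standards literature, and analyzes the results to draw conclusions.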

8.

An image segmentation method combining region growing and C-means clustering (区域生长和C-均值聚类结合的图像分割方法)

李媛王浩全张培《测试技术学报》2012,(1):31-34

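Building on fuzzy C-means clustering image segmentation combined with edge detection, this paper proposes an image segmentation method that combines region growing with fuzzy C-means clustering. Using an approach similar to region growing, it finds the mutually independent regions enclosed by closed edges in the image, assigns edge points to regions according to physical proximity, and thereby completes the segmentation. Experiments verify that the target regions are segmented relatively completely.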

9.

A machine-learning-based semi-automatic method for labeling text categories (基于机器学习的文本半自动类别标注方法)

宫衍圣蔡科平王志强李鑫鑫靖稳峰《工程数学学报》2021,38(6):750-762

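In text classification, manual labeling costs substantial labor and money and requires annotators familiar with the domain under study. To make labeling text data more efficient, a semi-automatic method for labeling paper categories is proposed. First, Word2vec combined with TF-IDF yields a vector representation of each paper; next, the K-means algorithm clusters the texts; then $K$ binary classifiers are built with the $L_1$-LR binary classification model; for each binary classifier, the words whose coefficients have the largest absolute values are chosen as topic words; finally, the label of each category is determined from its topic words. Experiments show that the proposed method greatly improves the efficiency of text labeling.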

10.

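Clustering coal types with a clustering method based on Deng's grey relational degree (用基于邓氏灰色关联度的聚类方法对煤种进行聚类的研究) Total citations: 1; self-citations: 0; citations by others: 1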

陈慧清胡小芳吴成宝《中国粉体技术》2010,16(3):19-21,25

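To classify coal types accurately, four parameters (calorific value, ash, volatile matter, and sulfur content) were measured for 25 test coal samples through experimental study and quantitative analysis. A clustering method based on Deng's grey relational degree was applied, and a VB program was written from its principle to automate the method and analyze the four parameters. The computed results show that the best clustering was obtained with a weight of 0.4 and a critical value of 0.68; on this basis, similarities in coal sample characteristics can be studied in terms of relational degree, and coal types with common characteristics can be clustered. Among the 25 arbitrarily chosen test samples, only one was misclassified, an accuracy of 96%.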
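Deng's grey relational degree, on which the clustering above is built, can be sketched in its standard textbook form. The abstract does not say whether its weight of 0.4 is the distinguishing coefficient, so the parameter `rho`, the function name `grey_relational_degree`, and the toy series are assumptions; inputs are assumed to be already normalized to comparable scales.

```python
def grey_relational_degree(reference, candidates, rho=0.5):
    """Deng's grey relational degree of each candidate series
    with respect to a reference series.

    Coefficient at point k: (d_min + rho * d_max) / (d_k + rho * d_max),
    where d_k is the absolute difference at k and d_min, d_max are taken
    over all candidates and points. The degree is the mean coefficient.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, cand)]
              for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate coincides with the reference.
        return [1.0] * len(candidates)
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]

# Toy normalized series (hypothetical values).
degrees = grey_relational_degree([1.0, 1.0], [[1.0, 1.0], [0.0, 0.0]], rho=0.5)
```

A clustering rule in this spirit would group two samples when their relational degree reaches a critical value, which is presumably the role of the 0.68 reported in the abstract.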

11.

Machine transliteration and transliterated text retrieval: a survey

Dinesh Kumar Prabhakar Sukomal Pal 《Sadhana》2018,43(6):93


12.

A text paragraph clustering strategy based on cumulative logistic regression analysis (基于累积Logistic 回归分析的文本段落聚类策略研究)

徐永东徐志明王晓龙《高技术通讯》2006,16(8):789-794

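A new strategy for clustering text paragraphs is proposed. It uses multi-feature fusion to mine as many features within a paragraph as possible, and uses cumulative logistic regression analysis to fit the underlying relationship between these features and paragraph similarity, so that the computed paragraph similarity is more satisfactory. Finally, the complete-link method of hierarchical agglomerative clustering is used to cluster the paragraph set. Paragraph similarity measurement and paragraph clustering experiments on real text from the web demonstrate the feasibility of the method.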
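The complete-link step named above can be sketched on a precomputed similarity matrix. This is a naive generic version for illustration; the stopping rule (a target number of clusters), the function name `complete_link`, and the toy matrix are assumptions, and the paper's learned paragraph similarity is replaced by fixed numbers.

```python
def complete_link(sim, n_clusters):
    """Agglomerative clustering with the complete-link criterion.

    `sim` is a symmetric similarity matrix (list of lists). The
    similarity of two clusters is that of their least similar pair,
    and the most similar pair of clusters is merged at each step.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = min(sim[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy paragraph-similarity matrix (hypothetical values).
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
result = complete_link(sim, n_clusters=2)
```

Complete link tends to produce compact clusters because a merge requires every cross-cluster pair to be reasonably similar, not just the closest one.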

13.

Evaluation of search algorithms and clustering efficiency measures for machine-part matrix clustering Total citations: 1; self-citations: 0; citations by others: 1

M. Shargal S. Shekhar S.A. Irani 《IIE Transactions》1995,27(1):43-59

Clustering a machine-part matrix is the first step in the design of a cellular manufacturing system. It provides a basis for matching the machine groups to the part families that they must produce. The problem of clustering a machine-part matrix can be decomposed into two problems: designing a measure for clustering efficiency (CE) and searching for a permutation of rows and columns of the matrix to maximize this measure. Clustering is done by permuting the rows and columns of the initial machine-part matrix to produce a block diagonal form (BDF). The clustering efficiency of a machine-part matrix measures the desirability of its BDF as a solution to cell design. This paper evaluates six measures of CE and six search methods. Extensive experiments were carried out to find the combination of CE measure and search method that produces the best solution in reasonable CPU time. We used several benchmark machine-part matrices from the literature and several problems obtained from a local manufacturer. We performed a multivariate analysis of variance (MANOVA) to compare the search algorithms and the CE measures.

14.

Genetic algorithms for symbolic clustering

K Chidananda Gowda T V Ravi 《Sadhana》1996,21(4):465-475

This paper introduces a novel methodology for clustering of symbolic objects by making use of Genetic Algorithms (GAs). GAs are a family of computational models inspired by evolution. These algorithms encode potential solutions to specific problems on simple chromosome-like data structures and apply recombination operators to these structures so as to preserve critical information. A new type of representation for chromosome structure is presented here along with a new method for mutation. The efficacy of the proposed method is examined by application to numeric data of known number of classes and also to assertion type of symbolic objects drawn from the domain of fat oil, microcomputers, microprocessors and botany. The validity of the clusters obtained is examined.

15.

Noninterferometric and nontomographic iterative method for field retrieval

Dragoman D 《Applied optics》2004,43(21):4208-4213

A new method of field recovery is proposed based on the fact that an arbitrary light beam can be expressed as a finite sum over orthogonal field distributions with unknown coefficients. If these field distributions are eigenmodes of a specific waveguide, the coefficients of the field decomposition in orthogonal eigenmodes can be determined iteratively when the unknown field is passed through a series of waveguides that support an increasing number of modes. This series of waveguides can be replaced by a single reconfigurable electro-optic waveguide, which one can use to recover the unknown field by performing only intensity measurements.

16.

Simple lossless preprocessing algorithms for text compression

《Software, IET》2009,3(1):37-45

Lossless data compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic coding, the Lempel-Ziv family, prediction by partial matching and Burrows-Wheeler transform based algorithms. One approach for attaining better compression is to develop a generic, reversible transformation that can be applied to a source text and that improves an existing compression algorithm's ability to compress. A few reversible transformation techniques that give better compression ratios are presented. A method, which transforms a text file into an intermediate file with minimum possible byte values, is proposed. An attempt has been made to reduce the number of possible bytes that appear after every byte in the source file. This increases the backend algorithm's compression performance.

17.

Thermodynamic product retrieval methodology and validation for NAST-I Total citations: 1; self-citations: 0; citations by others: 1

Zhou DK Smith WL Li J Howell HB Cantwell GW Larar AM Knuteson RO Tobin DC Revercomb HE Mango SA 《Applied optics》2002,41(33):6957-6967

The National Polar-Orbiting Operational Environmental Satellite System (NPOESS) Airborne Sounder Testbed (NAST) consists of two passive collocated cross-track scanning instruments, an infrared interferometer (NAST-I) and a microwave radiometer (NAST-M), that fly onboard high-altitude aircraft such as the NASA ER-2 at an altitude near 20 km. NAST-I provides relatively high spectral resolution (0.25 cm$^{-1}$) measurements in the 645-2700 cm$^{-1}$ spectral region with moderate spatial resolution (a linear resolution equal to 13% of the aircraft altitude at nadir) cross-track scanning. We report the methodology for retrieval of atmospheric temperature and composition profiles from NAST-I radiance spectra. The profiles were determined by use of a statistical eigenvector regression algorithm and improved, as needed, by use of a nonlinear physical retrieval algorithm. Several field campaigns conducted under varied meteorological conditions have provided the data needed to verify the accuracy of the spectral radiance, the retrieval algorithm, and the scanning capabilities of this instrumentation. Retrieval examples are presented to demonstrate the ability to reveal fine-scale horizontal features with relatively high vertical resolution.

18.

Text classification based on class semantic structure representation (基于类语义结构表示的文本分类)

《中国计量学院学报》2020,(2):215-224

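Objective: For the text classification task, a representation model based on class semantic structure is proposed that takes both semantic and structural information into account. Methods: The model first partitions the word embedding space into class subspaces, selects in each class subspace the feature words most representative of the class, combines the embeddings of those feature words into a class feature vector, and finally concatenates all class feature vectors to form the vector representation of the text. Results: In experiments on several data sets against other weighted word-embedding representations, classification accuracy improved by 5% to 15%. Conclusion: The model performs better on the text classification task.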

19.

Performance analysis for automated storage and retrieval systems Total citations: 1; self-citations: 0; citations by others: 1

Heungsoon Felix Lee 《IIE Transactions》1997,29(1):15-28

Automated storage and retrieval (AS/R) systems have had a dramatic impact on material handling and inventory control in warehouses and production systems. A unit-load AS/R system is generic and other AS/R systems represent its variations. Common techniques that are used to predict performance of a unit-load AS/RS are a static analysis or computer simulation. A static analysis requires guessing a ratio of single cycles to dual cycles, which can lead to poor prediction. Computer simulation can be time-consuming and expensive. In order to resolve these weaknesses of both techniques, we present a stochastic analysis of a unit-load AS/RS by using a single-server queueing model with unique features. To our knowledge, this is the first study of a stochastic analysis of unit-load AS/R systems by an analytical method. Experimental results show that the proposed method is robust against violation of the underlying assumptions and is effective for both short-term and long-term planning of AS/R systems.