1.
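Feature selection is an important step in text clustering. Traditional threshold-truncation feature selection methods favor high-weight terms and depend heavily on the formula used to compute feature-term weights. Genetic algorithms have global search capability and allow low-weight feature terms to take part in genetic evolution with a certain probability. In addition, this paper proposes a HowNet-based feature-term merging algorithm that achieves preliminary dimensionality reduction by merging highly similar feature terms. Experimental results show that the feature selection method for Chinese text clustering based on HowNet and genetic algorithms effectively reduces the dimensionality of the feature vectors and yields relatively stable clustering results.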
3.
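This paper proposes a method for constructing a text classifier based on fuzzy clustering, introduces a measure of the fuzzy similarity between feature words in a text, and gives an algorithm that implements fuzzy clustering using the idea of the "netting method" (编网法). By comparing the fuzzy similarity between feature words, the feature words are clustered, which finally yields a set of feature words that can identify the topic category of a text. Test results on the classifier's performance are also given.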
4.
5.
6.
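To address the short content, sparse feature words, and massive scale of microblog data, a method for discovering hot microblog topics based on the MapReduce programming model is proposed. The method first uses latent topic analysis to handle the short content and sparse feature words of microblogs, then uses the CURE algorithm to mitigate the sensitivity of the K-means algorithm to its initial points, and finally applies a K-means clustering algorithm based on the MapReduce programming model to cluster massive short microblog texts quickly. Experimental results show that the method can effectively improve the efficiency of hot-topic discovery in microblogs.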
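The MapReduce formulation of K-means mentioned in this abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) are hypothetical, the latent topic analysis and CURE-based initialization are omitted, and the initial centroids are assumed to be given.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map step: emit (cluster_index, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce step: average the points assigned to each cluster."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds (assumed stopping rule)."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids
```

In a real MapReduce job the map step would run in parallel over data partitions and the reduce step would run once per cluster key; here both run sequentially only to show the data flow.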
7.
8.
9.
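Color features are an important basis for segmenting stratum images, but the high noise of stratum images and the color mixing at stratum boundaries prevent clustering-based segmentation in color feature space from producing good results. This paper proposes an image segmentation algorithm that fuses fuzzy C-means clustering with the random walk algorithm. During clustering, the algorithm computes pixel memberships using the pixels' spatial information. In the random-walk-based semi-supervised segmentation algorithm, class nodes are inserted as labeled nodes into the four-connected graph formed by the pixel nodes, and the probability that a random walker first reaches a given class node is taken as the membership of the pixel in that class. Experimental results show that the algorithm classifies pixels in boundary regions with mixed colors more accurately, is less sensitive to noise, and effectively solves the segmentation problem for stratum images from tectonic simulation.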
10.
11.
With the rapid development of social media platforms, a huge amount of user-generated content (UGC) is produced continuously. In recent years, content-based microblog retrieval has attracted extensive research attention. Effective microblog retrieval requires complex analysis of short text and multimedia content. In this paper, we present a quality-biased multimedia microblog retrieval framework. First, we develop an anchor-graph-based multiview embedding framework that maps the multimedia content features into a unified latent space. Then, the content matching scores of testing microblogs related to the query are obtained by a Markov random field. Further, we employ a quality model to incorporate both microblog quality and content matching. Experimental results demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods.
12.
Mechatronics, 2014, 24(8): 1189-1202
13.
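Scene text segmentation combining color and MGD features with an MRF model. Cited by: 1 (self-citations: 1, citations by others: 0)
To address the difficulty of effectively segmenting scene text affected by illumination, complex backgrounds, and other factors, a scene text segmentation method is proposed that fuses color and maximum gradient difference (MGD) features with a Markov random field (MRF). First, MGD features, which effectively express the texture characteristics of text, are extracted and combined with color features in a probabilistic framework to model the observed image. Then, the traditional potential function is improved by incorporating spatial relationships and attribute differences between neighboring pixels. Finally, an MRF model for scene text segmentation is built and solved quickly with the graph cut algorithm. Experimental results show that combining color and MGD features and using the improved potential function substantially improve the segmentation results, and that the method outperforms other algorithms especially under uneven illumination and complex backgrounds.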
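The abstract does not spell out how the MGD feature is computed. A minimal sketch under one common definition (the difference between the largest and smallest horizontal gradient inside a sliding 1×n window) might look like the following; the function name `mgd_row`, the `[-1, 1]` gradient mask, and the default window width are assumptions, not details taken from the paper.

```python
def mgd_row(row, window=5):
    """Maximum gradient difference along one image row (a list of grey levels)."""
    # Horizontal gradient with the simple mask [-1, 1]; the last pixel is padded with 0.
    grad = [row[x + 1] - row[x] for x in range(len(row) - 1)] + [0]
    half = window // 2
    out = []
    for x in range(len(row)):
        # Gradients inside the window centred on x, clipped at the row borders.
        local = grad[max(0, x - half): x + half + 1]
        out.append(max(local) - min(local))
    return out
```

Text strokes produce closely spaced positive and negative gradients, so the MGD value is large around a thin bright stroke and zero in flat background regions.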
14.
Abstractive text summarization is the process of summarizing a given text by paraphrasing its facts while keeping the meaning intact. Producing summaries manually is laborious and time-consuming. We present a summary generation model based on a multilayered attentional peephole convolutional long short-term memory (LSTM) network, called MAPCoL, to generate abstractive summaries of large texts automatically. We added attention to a peephole convolutional LSTM to improve the overall quality of a summary by giving weights to important parts of the source text during training. We evaluated the semantic coherence of our MAPCoL model on the popular CNN/Daily Mail dataset and found that MAPCoL outperformed other traditional LSTM-based models. We also found improvements in the performance of MAPCoL in different internal settings when compared to state-of-the-art models of abstractive text summarization.
16.
Sentiment classification has attracted increasing interest in natural language processing. The goal of sentiment classification is to automatically identify whether a given piece of text expresses a positive or negative opinion on a topic of interest. This paper presents an approach that uses an individual model (i-model) based on artificial neural networks (ANNs) for text sentiment classification. The individual model consists of sentiment features, feature weights, and a prior knowledge base. During training, i-models that make the right sentiment judgment correct those that are wrong, so as to predict text sentiment polarity more accurately. Experimental results show that the accuracy of the individual model is higher than that of support vector machine (SVM) and hidden Markov model (HMM) classifiers on a movie review corpus.
17.
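When neural networks handle Chinese text sentiment classification, their ability to extract salient text features is weak and they learn relatively slowly. To address this problem, this paper proposes a hybrid network model based on the attention mechanism. The text corpus is first preprocessed; a conventional convolutional neural network extracts features from the local information of the sample vectors, and these features are fed into a coupled input and forget gate network model to learn the relationships between preceding and following words and sentences. An attention layer is then added, which, for the deeper-level...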
18.
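This work addresses the problem of obtaining users' sentiment orientation from microblog text in order to improve the efficiency of public opinion monitoring. Deep learning is used to classify the sentiment of microblog corpora; a high-quality microblog sentiment classification dataset that matches the text-length distribution of recent years is constructed, and the effect of microblog text length on sentiment classification is analyzed. Because medium-length and long texts are highly subjective and their sentences are only weakly related, detection accuracy on them is low. To address this, the paper proposes a capsule-network-based sentiment analysis model for medium-length and long microblogs. Using an attention mechanism and fusing local and global features, the model extracts deep sentiment features with capsule vectors, which improves detection on medium-length and long texts. Experiments on the dataset collected for this work show that the model outperforms several deep learning algorithms. In comparative experiments on corpora of different text lengths, classification accuracy gradually decreases as text length increases. Compared with the traditional LSTM algorithm, the proposed model's advantage grows as text length increases, which demonstrates its feasibility for sentiment classification of medium-length and long microblog texts.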
19.
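Text information extraction is one of the important means of processing massive amounts of text. The maximum entropy model provides a method for natural language processing. A text information extraction algorithm based on a maximum-entropy hidden Markov model is proposed. The algorithm combines the strength of the maximum entropy model in handling rule-based knowledge with the technical foundation of the hidden Markov model in sequence processing and statistical learning: the weighted sum of all features of each observed text unit is used to adjust the transition probability parameters of the hidden Markov model, thereby performing text information extraction. Experimental results show that the new algorithm performs better than a simple hidden Markov model in precision and recall.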
20.
Stereo matching has been studied for many years and is still a challenging problem. Methods based on the Markov Random Field (MRF) model and the Conditional Random Field (CRF) model have achieved good performance recently. Building on these pioneering works, this paper proposes a deep conditional random field based stereo matching algorithm, which draws a connection between the Convolutional Neural Network (CNN) and the CRF. Object knowledge is used as a soft constraint, which can effectively improve the depth estimation accuracy. Moreover, we propose a CNN potential function that learns the potentials of the CRF in a CNN framework. The inference of the CRF model is formulated as a Recurrent Neural Network (RNN). A variety of experiments have been conducted on the KITTI and Middlebury benchmarks. The results show that the proposed algorithm can produce state-of-the-art results and outperform other MRF-based or CRF-based methods.