Similar Literature
20 similar documents found (search time: 125 ms)
1.
Owing to its excellent generalization ability, the support vector machine (SVM) has become one of the most widely used classifiers in object detection. However, an excessive number of support vectors incurs a large time overhead during detection and degrades the real-time performance of the detection system. To address this problem, a support vector reduction method is proposed to lower the classifier's decision cost and speed up detection. The method iteratively estimates the pre-images of vectors in the feature space and simplifies the SVM by constructing a reduced pre-image set, thereby accelerating classification. A fast detector is then built by combining the reduced SVM with a Selective Search + BoW (bag-of-words) pipeline. Test results show that, by reducing the support vectors, the detector improves the real-time performance of object detection while maintaining the detection rate.
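The reduction idea above is easiest to see with a linear kernel, where the support-vector expansion collapses exactly into a single weight vector; the paper's iterative pre-image estimation extends this to nonlinear kernels, where the collapse is only approximate. A minimal sketch (names and sample values are ours, not the paper's):

```python
def decision_full(svs, alphas, b, x):
    """Decision value using every support vector (linear kernel)."""
    return sum(a * sum(si * xi for si, xi in zip(s, x))
               for a, s in zip(alphas, svs)) + b

def reduce_linear(svs, alphas):
    """Collapse sum_i alpha_i * s_i into one weight vector w."""
    dim = len(svs[0])
    return [sum(a * s[d] for a, s in zip(alphas, svs)) for d in range(dim)]

def decision_reduced(w, b, x):
    """Same decision value, but with a single dot product."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

svs = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.5]]
alphas = [0.3, -0.7, 0.2]   # alpha_i * y_i folded together
b = 0.1
w = reduce_linear(svs, alphas)
x = [1.5, -0.5]
```

With three support vectors the saving is trivial, but the per-sample cost drops from one kernel evaluation per support vector to a single dot product, which is exactly why reduction speeds up detection.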

2.
After extensive preprocessing, text data vectorized by two different methods, bag-of-words and Word2Vec, are fed into SVM and LSTM models respectively to train classifiers that recognize the sentiment polarity of text; the trained models are then used to classify newly arriving comments. To cope with the skew in the actual data, a two-layer support vector machine built on the traditional SVM is proposed, and two different methods are used to train and evaluate the models separately…
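The bag-of-words vectorization step mentioned above can be sketched as follows; the whitespace tokenization and sorted-vocabulary convention are our illustrative choices, not the paper's code:

```python
def build_vocab(docs):
    """Map each distinct token to a fixed column index."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {w: i for i, w in enumerate(vocab)}

def bow_vector(doc, vocab):
    """Count occurrences of each vocabulary token in one document."""
    v = [0] * len(vocab)
    for w in doc.split():
        if w in vocab:
            v[vocab[w]] += 1
    return v

docs = ["good movie good plot", "bad plot"]
vocab = build_vocab(docs)
```

Each document becomes a fixed-length count vector that an SVM can consume directly, whereas the Word2Vec path instead averages dense word embeddings.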

3.
An improved fast face detection method is proposed. In the training stage, the time needed to train the support vector machine (SVM) is reduced. In the detection stage, a skin-color model is built to segment skin regions, exploiting the clustering property of skin color in the YCbCr space and the fact that skin color follows a Gaussian distribution. After connected-component shape analysis of the segmented regions, candidate samples are reduced in dimensionality with PCA and then verified by the SVM. Experimental results show that the method achieves satisfactory detection performance.
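The skin-segmentation step can be sketched with the standard BT.601 full-range RGB to YCbCr conversion and a rectangular Cb/Cr gate; the threshold values below are common literature choices for illustration, not the paper's fitted Gaussian model:

```python
def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range conversion (JPEG convention)."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(rgb):
    """Rectangular Cb/Cr gate; illustrative thresholds only."""
    _, cb, cr = rgb_to_ycbcr(*rgb)
    return 77 <= cb <= 127 and 133 <= cr <= 173

skin_like = is_skin((224, 172, 138))   # a typical skin tone
green = is_skin((0, 255, 0))
```

Working in Cb/Cr rather than RGB is what gives skin color its compact cluster: luminance variation moves Y but leaves the chrominance pair nearly fixed.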

4.
To improve the applicability of support vector machines (SVMs) in embedded environments, a scalable hardware architecture for SVM training and classification is proposed and evaluated on an FPGA platform. The parallelism in the SVM algorithm is analyzed and extracted using a MapReduce model and then mapped onto multiple parallel processing units. Experiments show that the architecture performs SVM training and classification effectively with fixed-point arithmetic units and scales well.
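The fixed-point arithmetic such a datapath relies on can be emulated in software; the Q8.8 format below is an illustrative choice, not the paper's bit width:

```python
FRAC_BITS = 8   # Q8.8: 8 integer bits, 8 fractional bits

def to_fixed(x):
    """Quantize a float to a Q8.8 integer."""
    return int(round(x * (1 << FRAC_BITS)))

def fixed_dot(a, b):
    """Dot product in fixed point: products of two Q8.8 values are
    Q16.16, so the accumulator is shifted back down to Q8.8."""
    acc = sum(ai * bi for ai, bi in zip(a, b))
    return acc >> FRAC_BITS

def to_float(x):
    return x / (1 << FRAC_BITS)

w = [to_fixed(v) for v in (0.5, -1.25, 2.0)]
x = [to_fixed(v) for v in (1.0, 0.5, 0.25)]
```

Because every operation is an integer multiply, add, or shift, the same computation maps directly onto DSP slices on an FPGA; choosing FRAC_BITS trades precision against resource usage.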

5.
A large-scale netflow training dataset is a prerequisite for building a high-quality, stable network traffic classifier. However, as the dimensionality of flow features and the size of the dataset grow, neither the analysis of network flows nor the training of an SVM-based classifier can be completed in acceptable time. This paper builds a CloudSVM network traffic classifier by training the SVM classifier in a distributed fashion with MapReduce on the Hadoop cloud computing platform. Nearly 2 TB of traffic traces mirrored from a campus network egress are stored and processed in a distributed manner, and the sampled datasets are classified. The experiments confirm the efficiency of distributed storage and parallel processing of large network datasets on Hadoop, and show that the CloudSVM classifier converges quickly to the optimum without loss of classification accuracy; as the number of flow samples grows, the SVM training time levels off.
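The map/reduce decomposition of SVM training can be illustrated with one subgradient step of a linear SVM under hinge loss: each mapper computes a partial gradient over its data split and a reducer sums them. Function names and the toy data are ours; a real Hadoop job would run the map calls on separate nodes:

```python
def map_partial_grad(chunk, w, b):
    """Mapper: partial hinge-loss subgradient over one data split."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in chunk:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin < 1:               # only margin violators contribute
            for i, xi in enumerate(x):
                gw[i] -= y * xi
            gb -= y
    return gw, gb

def reduce_grads(parts):
    """Reducer: elementwise sum of the partial gradients."""
    gw = [sum(p[0][i] for p in parts) for i in range(len(parts[0][0]))]
    gb = sum(p[1] for p in parts)
    return gw, gb

data = [([1.0, 2.0], 1), ([-1.0, -1.5], -1),
        ([0.2, 0.1], 1), ([-0.3, 0.4], -1)]
chunks = [data[:2], data[2:]]        # two "HDFS splits"
w, b = [0.0, 0.0], 0.0
parts = [map_partial_grad(c, w, b) for c in chunks]
gw, gb = reduce_grads(parts)
```

Because the gradient is a sum over samples, splitting the data changes nothing about the result, which is the property that makes the training embarrassingly parallel.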

6.
To speed up training of the SVM regression algorithm under the ε-insensitive loss, a new tube compression model is proposed: the regression function obtained with a large ε value is used to predict the support vectors of the regression function with a small ε value, and a new algorithmic structure is derived from this model. In simulations of an SVM predistorter for a power amplifier with memory nonlinearity, Keerthi's SMO algorithm is combined with the new structure; the results show that the new structure significantly speeds up training without degrading the performance of the SVM predistorter.
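The loss underlying the tube model above is Vapnik's ε-insensitive loss, which is zero inside a tube of half-width ε around the regression function; only tube violators can become support vectors, which is why a wide-tube run can screen candidates for a narrow-tube run. A minimal sketch with made-up residuals:

```python
def eps_insensitive(y_true, y_pred, eps):
    """Vapnik's ε-insensitive loss: zero inside the tube of half-width ε."""
    return max(0.0, abs(y_true - y_pred) - eps)

# samples whose residual leaves the tube are the candidate support
# vectors for a subsequent small-ε training run
residuals = [0.05, 0.3, 0.8, 0.15, 0.6]
eps_small = 0.1
candidates = [i for i, r in enumerate(residuals) if r > eps_small]
```

Here only four of the five samples need to be revisited; on real data the screening discards far larger fractions, which is the source of the speed-up.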

7.
As a trainable machine learning method, the support vector machine (SVM) is now widely applied to stock prediction. Using MATLAB and an SVM, this paper fits a regression to recent stock price movements to obtain a trained model, uses the model to predict future stock indices, and analyzes the error between the predicted and original data.

8.
Traditional SVMs are slow and costly to train on large datasets. To address this, the FCM (fuzzy c-means) algorithm is used to preprocess the training set: all likely support vectors are extracted according to the samples' membership degrees and used for SVM training. Validation on the original datasets shows that the algorithm greatly accelerates training while preserving SVM classification accuracy, demonstrating its feasibility.
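The FCM membership degree that drives the selection can be sketched as follows; the centers, points, and the idea of flagging ambiguous points (no dominant membership) as likely support vectors are our illustrative reading of the abstract:

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster:
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))."""
    d = [math.dist(x, c) for c in centers]
    if 0.0 in d:                       # x coincides with a center
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(d)))
            for i in range(len(d))]

centers = [(0.0, 0.0), (4.0, 0.0)]
near_border = fcm_memberships((2.0, 0.0), centers)   # ambiguous point
deep_inside = fcm_memberships((0.1, 0.0), centers)   # clearly cluster 0
```

A point whose largest membership stays below some threshold (say 0.9) lies between clusters and is kept as a candidate support vector, while confidently assigned interior points can be dropped before SVM training.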

9.
孙广路, 王晓龙, 刘秉权, 关毅. 《电子学报》, 2008, 36(12): 2450-2453
A hierarchical word clustering algorithm based on information entropy is proposed, and the resulting word clusters are used as features in a Chinese chunking model. The clustering algorithm rests on information-entropy theory: taking the words in a Chinese chunking corpus and their chunk labels as basic information, binary hierarchical clustering forms word clusters with certain syntactic functions. An optimization is designed to save clustering time. The cluster features replace the traditional part-of-speech features in the chunking model, named-entity and quasi-word recognition modules are added, and on this basis a Chinese chunking system based on a maximum-entropy Markov model is built. Experiments show that the algorithm improves clustering efficiency and that the cluster features effectively improve the performance of the Chinese chunking system.
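One way to realize entropy-driven bottom-up clustering is to merge, at each step, the pair of words whose combined chunk-label histogram increases total weighted entropy the least; the toy histograms and this particular merge criterion are our illustration, not the paper's exact algorithm:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a count histogram."""
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)

def merge_cost(a, b):
    """Increase in count-weighted entropy when histograms a, b merge."""
    merged = Counter(a) + Counter(b)
    na, nb, nm = sum(a.values()), sum(b.values()), sum(merged.values())
    return nm * entropy(merged) - na * entropy(a) - nb * entropy(b)

# toy chunk-label histograms: two noun-like words and one verb-like word
words = {
    "银行": Counter({"B-NP": 8, "I-NP": 2}),
    "公司": Counter({"B-NP": 7, "I-NP": 3}),
    "跑":   Counter({"B-VP": 9, "I-VP": 1}),
}
names = list(words)
best = min(((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
           key=lambda p: merge_cost(words[p[0]], words[p[1]]))
```

Words with similar syntactic behavior have similar label histograms, so merging them costs almost no entropy; repeating the merge yields a binary hierarchy of clusters.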

10.
In distributed optical-fiber disturbance sensing systems based on a dual Mach-Zehnder interferometer, the polarization state of the sensing light must be monitored in real time so that polarization control can be applied promptly. To this end, this paper proposes a fast polarization-state discrimination algorithm based on a support vector machine (SVM). The two acquired interference signals are differenced to obtain the raw data, and the zero-crossing rate and the absolute sum of peaks of the raw data are extracted as the classifier's input feature vector. The SVM classification algorithm is implemented in Verilog HDL, and the classifier weights are solved by fast iteration using the parallel hardware structure and pipelining of an FPGA. By extracting the feature vector of a signal with unknown polarization state and feeding it into the trained SVM model, efficient polarization-state recognition is achieved. Experimental results show that the method identifies the system's polarization state quickly and accurately, with an average recognition rate of 93.25%, classification-model training time under 100 ms, and an average recognition response time under 8 ms.
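The two features named above are cheap to compute from the difference signal; the definitions below (sign changes between consecutive samples, and |value| summed over interior local extrema) are our reading of "zero-crossing rate" and "absolute sum of peaks":

```python
def zero_cross_rate(sig):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(sig, sig[1:]) if (a < 0) != (b < 0))
    return crossings / (len(sig) - 1)

def abs_peak_sum(sig):
    """Sum of |value| at interior local extrema (slope changes sign)."""
    return sum(abs(b) for a, b, c in zip(sig, sig[1:], sig[2:])
               if (b - a) * (c - b) < 0)

# toy difference of the two interference channels
diff = [0.2, -0.5, 0.4, 0.9, -0.1, -0.6, 0.3]
feature = (zero_cross_rate(diff), abs_peak_sum(diff))
```

Both features are a single linear pass over the buffer with no multiplies beyond the peak test, which is what makes them attractive for a pipelined FPGA front end.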

11.
Building a Dynamic Language Model from Web Page Corpora
Building a language model for a speech recognition system starts with corpus preparation, and the corpus source determines the model's performance. Web pages cover the latest language phenomena and provide the most diverse resource for corpus preparation, but the semantically complete character strings in Web pages are usually interleaved with useless strings such as formatting, markup, and advertisements. This paper first introduces the training and update methods for language models, then proposes an algorithm for extracting, from HTML documents, semantically complete Chinese character strings suitable for language model training, and finally reports the results of corpus extraction, language model training, and dynamic model updating, providing a complete solution for dynamically updating language models from Web page corpora.
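The markup-stripping part of such extraction can be done with the standard library's HTML parser; this is a rough stand-in for the paper's algorithm, which additionally judges whether a string is semantically complete:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, skipping script/style content entirely."""
    def __init__(self):
        super().__init__()
        self.skip = 0          # depth inside script/style elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

p = TextExtractor()
p.feed("<html><script>var x=1;</script><p>语言模型训练语料</p></html>")
```

The surviving chunks would still need filtering (navigation labels, ad slogans) before being fed to language-model training, which is where the paper's completeness criterion comes in.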

12.
Training and testing different models in the field of text classification mainly depend on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification: the Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification. In this investigation, well-known and accurate learning models are used, including naive Bayes, random forest, K-nearest neighbor, support vector machine, and logistic regression models. We present relevance and time measures for training the models on these datasets, to enable Arabic-language researchers to select the appropriate dataset on a solid basis of comparison. The performance of the five learning models across the seven datasets is measured and compared with the performance of the same models trained on a well-known English-language dataset. The analysis of the relevance and time scores shows that training the support vector machine model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%.

13.
To address inaccurate boundary detection and poor robustness in biomedical entity recognition, a named entity recognition model is proposed that combines the pre-trained language model BERT with a span-label network. The model uses BERT to capture the contextual information of the text and performs entity classification and boundary determination with the span-label network, markedly improving recognition accuracy. To strengthen robustness, an adversarial training strategy is introduced that iteratively trains on normal and adversarial samples to optimize the model parameters. Experiments on the CCKS2019 evaluation dataset show that adversarial training improves precision, recall, and F1, confirming its effectiveness in improving the model's predictive ability and robustness.
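Span-style boundary decoding can be sketched independently of BERT: given per-token scores for "an entity starts here" and "an entity ends here", enumerate spans whose combined score clears a threshold. The scores, threshold, and length cap below are made up for illustration; real decoders typically also forbid overlapping spans:

```python
def decode_spans(start_scores, end_scores, max_len=5, threshold=1.5):
    """Return (start, end) token spans with combined score >= threshold."""
    spans = []
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            if s + end_scores[j] >= threshold:
                spans.append((i, j))
    return spans

# toy scores over a 5-token sentence
start = [0.9, 0.1, 0.0, 0.8, 0.0]
end   = [0.2, 0.9, 0.1, 0.1, 0.7]
spans = decode_spans(start, end)
```

Predicting boundaries as spans rather than per-token BIO tags is what lets such models sidestep inconsistent tag sequences at entity boundaries.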

14.
To address the scarcity of training data and the problem of text representation in graph-based classification algorithms, this paper proposes LSASGT, a text classification method based on latent semantic analysis (LSA) and a transductive spectral graph algorithm. The method organically combines these two techniques, both grounded in spectral analysis, to model all training and test data jointly and mine multiple latent structures in the data. LSASGT uses LSA to construct the graph representation of the texts, describing semantic relatedness between texts in a latent semantic feature space that reflects human classification criteria; on this representation, the semi-supervised transductive spectral graph algorithm performs classification. Experimental results on the benchmark English text classification dataset Reuters-21578 and the Chinese dataset TanCorp show that LSASGT achieves good classification results.

15.
Text classification is one of the most important topics in Internet information management and natural language processing. Machine-learning-based text classification methods are currently the most popular, performing better than rule-based ones, but they require large numbers of training samples, which not only makes the prior manual classification laborious but also raises the demands on storage and computing resources during post-processing. The naive Bayes algorithm, one of the most effective methods for text classification, shares this problem: only with a large training sample set can it achieve accurate results. This paper studies a naive Bayes classification algorithm for Chinese text based on a Poisson distribution model and feature selection. The experimental results show that this method maintains high classification accuracy even on a small sample set.
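A Poisson naive Bayes classifier models each feature's count in a document as Poisson-distributed with a class-specific rate λ, and scores classes by log prior plus summed log likelihoods. The toy rates and vocabulary below are ours, not the paper's:

```python
import math

def log_poisson(k, lam):
    """log P(k | λ) for a Poisson distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def classify(counts, class_rates, priors):
    """counts: word -> count in the document;
    class_rates: class -> {word: expected Poisson rate λ}."""
    best, best_score = None, -math.inf
    for c, rates in class_rates.items():
        score = math.log(priors[c]) + sum(
            log_poisson(counts.get(w, 0), lam) for w, lam in rates.items())
        if score > best_score:
            best, best_score = c, score
    return best

rates = {
    "sports":  {"球": 3.0, "股市": 0.2},
    "finance": {"球": 0.2, "股市": 3.0},
}
priors = {"sports": 0.5, "finance": 0.5}
label = classify({"球": 4, "股市": 0}, rates, priors)
```

Unlike multinomial naive Bayes, the Poisson model also extracts evidence from words that are *absent* (k = 0 still contributes −λ), which is one reason it can stay accurate on small samples.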

16.
In order to alleviate the model mismatch between training and testing samples caused by inter-language variations, an adaptive Gaussian back-end based on the LDOF criterion is proposed for language recognition. The local distance-based outlier factor (LDOF) criterion is defined to find appropriate model parameters and to dynamically select, from multiple class training sets, the subset of training data similar to the testing samples; the original back-end is then adjusted to obtain a better-matched recognition model. Experimental results on the NIST LRE 2009 easily-confused language dataset show that the proposed method achieves a clear performance improvement in both equal error rate (EER) and average decision cost function.
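LDOF compares how far a point sits from its k nearest neighbors with how spread out those neighbors are among themselves; values well above 1 mark outliers, and values below 1 mark points embedded in a neighborhood. A sketch following the usual definition (toy data and k are ours):

```python
import math
from itertools import combinations

def ldof(x, data, k=3):
    """Local distance-based outlier factor: mean distance from x to its
    k nearest neighbors, divided by the mean pairwise distance among
    those neighbors."""
    nbrs = sorted(data, key=lambda p: math.dist(x, p))[:k]
    d_bar = sum(math.dist(x, p) for p in nbrs) / k
    inner = [math.dist(p, q) for p, q in combinations(nbrs, 2)]
    return d_bar / (sum(inner) / len(inner))

cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
inlier = ldof((0.5, 0.5), cluster)    # sits inside the neighborhood
outlier = ldof((5.0, 5.0), cluster)   # far from every neighbor
```

In the paper's setting the roles are inverted: LDOF is used not to discard outliers but to *keep* the training samples that are local to the test data, so the Gaussian back-end is refit on matched data.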

17.
Sentiment classification has attracted increasing interest in natural language processing. Its goal is to automatically identify whether a given piece of text expresses a positive or negative opinion on a topic of interest. This paper proposes using an individual model (i-model) based on artificial neural networks (ANNs) for text sentiment classification. The individual model consists of sentiment features, feature weights, and a prior knowledge base. During training, i-models that make correct sentiment judgments correct those that are wrong, leading to more accurate prediction of text sentiment polarity. Experimental results show that the accuracy of the individual model is higher than that of support vector machine (SVM) and hidden Markov model (HMM) classifiers on a movie review corpus.

18.
The standard MIDI format is widely used in electronic musical instrument performance. LabVIEW is a powerful graphical programming language, and standard MIDI files can be produced with LabVIEW VIs. Following the standard MIDI format, six sub-VIs were designed: a hexadecimal character appender, a meta-event appender, a MIDI-event appender, a track packer, a header-chunk appender, and a MIDI file writer. A top-level VI reads the score input file, calls the sub-VIs, and finally writes out the standard MIDI file. Experimental results show that producing standard MIDI files with LabVIEW is an efficient, simple, and feasible approach.
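The chunk layout those sub-VIs assemble (header chunk, track chunk, variable-length delta times, end-of-track meta event) is language-independent and small enough to sketch in full; the fixed velocity of 64 and format-0/single-track choice are our simplifications:

```python
import struct

def vlq(n):
    """MIDI variable-length quantity: 7 data bits per byte,
    high bit set on all but the last byte."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def midi_file(notes, division=480):
    """Format-0 SMF with one track; notes are (pitch, duration_ticks)."""
    track = b""
    for pitch, dur in notes:
        track += vlq(0) + bytes([0x90, pitch, 64])    # note-on, channel 1
        track += vlq(dur) + bytes([0x80, pitch, 64])  # note-off after dur
    track += vlq(0) + b"\xff\x2f\x00"                 # end-of-track meta
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, division)
    return header + b"MTrk" + struct.pack(">I", len(track)) + track

data = midi_file([(60, 480), (64, 480)])              # C4 then E4
```

Writing `data` to a `.mid` file yields a playable two-note sequence; the LabVIEW design in the paper factors exactly these byte-assembly steps into separate sub-VIs.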

19.
This paper proposes a novel approach to comment spam identification based on content analysis. Three main features are used: the number of links, content repetitiveness, and text similarity. In practice, content repetitiveness is determined by the length and frequency of the longest common substring, and text similarity is calculated with a vector space model. Preliminary experiments on comment spam identification conducted on Chinese and English achieve precisions as high as 93% and 82% respectively, showing the validity and language independence of the approach. Compared with conventional spam filtering approaches, our method requires no training, no rule sets, and no link relationships, and it can handle new comments as well as existing ones.
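The longest-common-substring computation behind the repetitiveness feature is a standard dynamic program over suffix-match lengths; the sample strings are ours:

```python
def longest_common_substring(a, b):
    """DP over suffix-match lengths; O(len(a) * len(b)) time,
    O(len(b)) extra space."""
    best, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return a[best_end - best:best_end]

s = longest_common_substring("buy cheap meds now", "get cheap meds here")
```

A long shared substring across many comments is strong evidence of paste-style spam, and the computation needs no training data, consistent with the approach's training-free design.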

20.
This study focuses on a method of sequential data augmentation to alleviate data sparseness. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied to language generation: it can generate new sentences from given input sentences. We present a corpus expansion method using a sentence-chain-based seq2seq model. For training the seq2seq model, sentence chains are used as triples: the first two sentences of a triple feed the encoder, while the last sentence becomes the target sequence for the decoder. Using only internal resources, evaluation shows an improvement of approximately 7.6% in relative perplexity over a baseline language model of Korean text. Additionally, compared with a previous study, the sentence-chain approach reduces the size of the training data by 38.4% while generating 1.4 times as many n-grams, with superior performance on English text.
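The triple construction described above can be sketched directly, assuming a chain is simply three consecutive sentences of a document (the function name and toy sentences are ours):

```python
def sentence_chains(sentences):
    """Turn a sentence sequence into (encoder_input, decoder_target)
    pairs: sentences i and i+1 feed the encoder, i+2 is the target."""
    return [(sentences[i] + " " + sentences[i + 1], sentences[i + 2])
            for i in range(len(sentences) - 2)]

doc = ["s1", "s2", "s3", "s4"]
pairs = sentence_chains(doc)
```

Each document of n sentences yields n − 2 training pairs, and the sentences the trained model generates are appended to the corpus to widen n-gram coverage.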
