A WordNet-Based Semantic Similarity Algorithm for Short Texts   Total citations: 3, self-citations: 0, citations by others: 3
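Computing the semantic similarity of short texts is increasingly used in document retrieval, information extraction, and text mining. This paper proposes ST-CW, an algorithm for computing the semantic similarity of short texts. The algorithm uses WordNet and the Brown corpus to compute the similarity between concepts in the texts and, building on this, introduces a new method that combines conceptual, syntactic, and other information to compute the semantic similarity of short texts. Experiments on the R&B and Miller datasets verify the effectiveness of the algorithm.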
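As a rough illustration of combining concept-level and syntactic information, the sketch below mixes a semantic-vector similarity with a word-order similarity, a common scheme for short-text similarity. It is not ST-CW itself. The table `WORD_SIM` is hypothetical and stands in for a WordNet/Brown-corpus concept measure; `delta` and `threshold` are assumed values.

```python
import math

# Hypothetical concept-similarity scores standing in for a WordNet/Brown-corpus measure.
WORD_SIM = {
    frozenset({"car", "automobile"}): 0.95,
    frozenset({"car", "vehicle"}): 0.80,
    frozenset({"drive", "ride"}): 0.60,
}

def word_sim(a, b):
    """Concept similarity in [0, 1]; identical words score 1."""
    return 1.0 if a == b else WORD_SIM.get(frozenset({a, b}), 0.0)

def semantic_vector(tokens, joint):
    """For each word of the joint vocabulary, its best match within `tokens`."""
    return [max(word_sim(w, t) for t in tokens) for w in joint]

def word_order_vector(tokens, joint, threshold=0.4):
    """1-based position of the most similar word in `tokens`, or 0 if too dissimilar."""
    vec = []
    for w in joint:
        best = max(range(len(tokens)), key=lambda i: word_sim(w, tokens[i]))
        vec.append(best + 1 if word_sim(w, tokens[best]) >= threshold else 0)
    return vec

def cosine(u, v):
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def sentence_similarity(s1, s2, delta=0.85):
    """delta * semantic similarity + (1 - delta) * word-order similarity."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(t1 + t2))  # ordered union of both word lists
    sem = cosine(semantic_vector(t1, joint), semantic_vector(t2, joint))
    r1, r2 = word_order_vector(t1, joint), word_order_vector(t2, joint)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    order = 1.0 - diff / total if total else 1.0
    return delta * sem + (1 - delta) * order
```

For example, `sentence_similarity("I drive a car", "I ride an automobile")` scores above zero through the concept table, while two sentences with no related words score 0.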

Lu Jiawei, Chen Wei, Yin Zhong. Electronic Science and Technology, 2009, 33(10): 51-56
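The traditional vector space model (VSM) ignores textual semantics, and the text-feature matrix it builds is sparse. Building on deep-learning word-embedding techniques, this paper proposes a similarity computation method that incorporates an improved TextRank algorithm. The method uses word embeddings to construct the text vector space, so that the resulting vector space model captures semantic relatedness. At the same time, the improved TextRank algorithm extracts text keywords, which strengthens the representation of text features, removes a large amount of redundant information, and reduces the sparsity of the text-feature matrix, making text similarity computation more efficient. Simulation experiments comparing different models show that the method combining the improved TextRank algorithm with BERT word vectors achieves better text-similarity performance.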
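A minimal sketch of the keyword-then-embedding pipeline follows. It uses the standard TextRank co-occurrence graph, not the paper's improved variant, and averages keyword vectors before taking cosine similarity. The 3-dimensional `EMB` table is hypothetical and stands in for BERT or word2vec vectors; `window`, `d`, and `k` are assumed defaults.

```python
import math
from collections import defaultdict

# Hypothetical toy embeddings standing in for BERT/word2vec vectors.
EMB = {
    "graph": [0.9, 0.1, 0.0],
    "ranking": [0.8, 0.3, 0.1],
    "keywords": [0.2, 0.9, 0.1],
    "words": [0.1, 0.8, 0.2],
}

def textrank_keywords(tokens, window=2, d=0.85, iters=50, k=3):
    """Standard TextRank: words co-occurring within `window` tokens share an edge."""
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    score = {w: 1.0 for w in graph}
    for _ in range(iters):  # PageRank-style iteration on the undirected graph
        score = {w: (1 - d) + d * sum(score[u] / len(graph[u]) for u in graph[w])
                 for w in graph}
    return sorted(score, key=lambda w: (-score[w], w))[:k]

def mean_vector(words, dim=3):
    """Average embedding of the keywords that have a vector."""
    vecs = [EMB[w] for w in words if w in EMB]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def text_similarity(tokens_a, tokens_b, k=3):
    """Cosine similarity of the averaged embeddings of each text's top-k keywords."""
    return cosine(mean_vector(textrank_keywords(tokens_a, k=k)),
                  mean_vector(textrank_keywords(tokens_b, k=k)))
```

Keeping only the top-k keywords is what shrinks the feature space. The similarity step then compares dense vectors rather than sparse term counts.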

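Processing text against complex backgrounds has long been a difficult problem in OCR. This paper combines three algorithms (SIFT, K-means, and SVM) into a new scheme for deciding whether text is present in images with complex backgrounds. In the learning phase, SIFT extracts and describes feature points, K-means clusters the descriptors, and an SVM is then trained. In the test phase, SIFT feature points are extracted from the image, and the cluster centers and SVM decision function obtained during learning are used to decide whether the image contains text.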
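The K-means stage of such a pipeline is usually a bag-of-visual-words step. Descriptors are clustered into a vocabulary, and each image becomes a histogram over it, which is what the SVM would then consume. A minimal sketch follows, assuming descriptors are already extracted. Toy 2-D points stand in for 128-dimensional SIFT descriptors, and the SVM stage is not shown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(d, centers):
    """Index of the visual word (cluster centre) closest to descriptor d."""
    return min(range(len(centers)), key=lambda c: sq_dist(d, centers[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centres forming the visual vocabulary."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old centre if a cluster becomes empty.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

def bovw_histogram(descriptors, centers):
    """Normalised histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Usage: `centers = kmeans(training_descriptors, k)` during learning, then `bovw_histogram(test_descriptors, centers)` per test image. The fixed-length histogram is what makes images with different numbers of keypoints comparable by a single classifier.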

This article proposes a new simulation-based method for automated CMOS analog circuit design that applies a multi-objective, non-Darwinian evolutionary algorithm based on the Learnable Evolution Model (LEM). The multi-objective aspect of the design is handled by a modified Strength Pareto Evolutionary Algorithm (SPEA) incorporated into the LEM algorithm. LEM includes a machine learning method, such as decision trees, that distinguishes between high- and low-fitness areas of the design space. The learning process can detect the right directions of evolution and lead to large steps in the evolution of individuals. The learning phase shortens the evolution process and markedly reduces the number of individual evaluations. The expert designer's knowledge of the circuit is applied in the design process to reduce both the design space and the design time. Circuits are evaluated with the HSPICE simulator, and the BSIM3v3 CMOS transistor model is adopted to improve design accuracy. The method is tested on three different operational amplifier circuits, and its performance is verified by comparison with the evolutionary strategy algorithm and other similar methods.
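To make the Pareto machinery concrete, here is a minimal sketch of strength and fitness assignment in the style of the original SPEA, for minimisation objectives. The paper uses a modified SPEA inside LEM whose changes are not described in the abstract, so this shows only the baseline idea. The objective tuples (for example, power and negated gain of an op-amp) are hypothetical.

```python
def dominates(a, b):
    """Minimisation: a dominates b if it is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(population):
    """Pareto-optimal members of the population (candidates for the external archive)."""
    return [p for i, p in enumerate(population)
            if not any(dominates(q, p) for j, q in enumerate(population) if j != i)]

def spea_fitness(population, archive):
    """Strength of archive members and fitness of population members.

    Strength of an archive member = (# population members it dominates) / (N + 1).
    Fitness of a population member = 1 + sum of strengths of the archive members
    that dominate it (lower is better).
    """
    n = len(population)
    strength = [sum(dominates(a, p) for p in population) / (n + 1) for a in archive]
    fitness = [1 + sum(s for a, s in zip(archive, strength) if dominates(a, p))
               for p in population]
    return strength, fitness
```

For `pop = [(1, 5), (2, 2), (3, 4), (5, 1)]`, the archive is every point except `(3, 4)`. That point is dominated by `(2, 2)` and so receives the worst fitness.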

Wireless Personal Communications - Multi-label text classification is a challenging task in many real applications. In most traditional techniques, word2vec is used to show the...

A Text Detection Method Based on Fuzzy Homogeneity Mapping   Total citations: 2, self-citations: 0, citations by others: 2
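Text in video images is highly effective information for describing video content at the semantic level, and text detection makes semantics-based image retrieval possible. This paper proposes a text detection method that combines fuzzy logic with homogeneity mapping. First, the original image is fuzzified using the maximum-entropy criterion. Next, an image homogeneity measure based on edge and texture information is constructed and used to map the image into a fuzzy homogeneity space. Finally, text regions are detected in the fuzzy homogeneity space through texture analysis. Compared with methods that extract features directly in the spatial domain of the image, this method achieves better results on video images with complex backgrounds and applies to text detection in many types of video images.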
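One common way to build an edge-and-texture homogeneity measure is to multiply a normalised "low edge" term by a normalised "low local variation" term. Pixels in flat regions then score near 1, and pixels on strokes or edges score lower. The sketch below assumes that form, with 3×3 standard deviation as the texture term and a central-difference gradient as the edge term. The maximum-entropy fuzzification step is not shown, and the input is taken as raw intensities.

```python
import math

def local_std(img, r, c):
    """Standard deviation of the 3x3 neighbourhood (clipped at the borders)."""
    vals = [img[i][j]
            for i in range(max(r - 1, 0), min(r + 2, len(img)))
            for j in range(max(c - 1, 0), min(c + 2, len(img[0])))]
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

def gradient(img, r, c):
    """Central-difference gradient magnitude (clamped at the borders)."""
    h, w = len(img), len(img[0])
    gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
    gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
    return math.hypot(gx, gy)

def homogeneity(img):
    """H = (1 - E/Emax) * (1 - V/Vmax): 1 in flat areas, lower near edges/texture."""
    h, w = len(img), len(img[0])
    V = [[local_std(img, r, c) for c in range(w)] for r in range(h)]
    E = [[gradient(img, r, c) for c in range(w)] for r in range(h)]
    vmax = max(max(row) for row in V) or 1.0  # avoid division by zero on flat images
    emax = max(max(row) for row in E) or 1.0
    return [[(1 - E[r][c] / emax) * (1 - V[r][c] / vmax) for c in range(w)]
            for r in range(h)]
```

Text regions, with many strokes, show up as clusters of low homogeneity. A later texture-analysis step would then look for exactly those clusters.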

A semantic-extension algorithm for short texts, combining Word2vec and the LDA model, is proposed to improve classification performance, which is often degraded by semantic dependencies and feature sparsity. For every keyword in a short text, weighted synonyms and related words are generated by the Word2Vec and LDA models, respectively, and then inserted to extend the short text to a reasonable length. We not only established a criterion based on similarity estimation to decide whether a sentence should be extended, but also designed a scheme for choosing the number of extended words. The extended text is then classified. Experimental results show that the precision of the proposed algorithm is approximately 5% higher than that of the TF-IDF model and approximately 10% higher than that of the VSM method.
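The extension step can be sketched roughly as follows, under loud assumptions. `W2V_SYNONYMS` and `LDA_RELATED` are hypothetical neighbour lists standing in for Word2Vec and LDA outputs. A candidate offered by both sources keeps its larger weight, and the number of added words is simply whatever brings the text up to a hypothetical `target_len`. The paper's similarity-based criterion for deciding whether to extend at all is not reproduced here.

```python
# Hypothetical neighbour lists: word -> [(related_word, weight), ...]
W2V_SYNONYMS = {
    "phone": [("mobile", 0.82), ("smartphone", 0.78)],
    "cheap": [("inexpensive", 0.75)],
}
LDA_RELATED = {
    "phone": [("battery", 0.30), ("screen", 0.25)],
}

def extend_short_text(tokens, target_len=6):
    """Append the highest-weighted related words until the text reaches target_len."""
    n_extra = max(target_len - len(tokens), 0)
    scores = {}
    for w in tokens:
        for source in (W2V_SYNONYMS, LDA_RELATED):
            for cand, weight in source.get(w, []):
                if cand not in tokens:
                    # Keep the strongest evidence for each candidate word.
                    scores[cand] = max(scores.get(cand, 0.0), weight)
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return tokens + ranked[:n_extra]
```

For instance, `extend_short_text(["cheap", "phone"])` adds `mobile`, `smartphone`, `inexpensive`, and `battery`, which gives a downstream classifier more features to work with.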

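Improving user-perceived network quality and customer satisfaction has always been the main focus of network optimization. However, KPI indicators cannot reflect the quality users actually perceive, and the traditional approach of gauging customer satisfaction through surveys has major limitations. This paper studies in depth the mapping between KPI indicators and actual perceived network quality, and quantifies perception weight factors through big-data mining and machine-learning modeling. On this basis, it builds a machine-learning-based method for evaluating perceived network quality, offering a new analytical approach and supporting tools for improving customer satisfaction.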

Mobile Networks and Applications - Machine learning has been increasingly used for making informed public policy decisions; however, its application in the area of social protection in developing...

Wireless Personal Communications - Consumers increasingly rely on online reviews to assist them in their buying decisions. The rising popularity of e-commerce websites, hotel reviews, and social...

Zhang Xinshu, Guo Ge, Cheng Juan. Electronic Technology, 2010, 47(4): 22-24
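This paper proposes a new approach to analyzing the semantic information of video text. After text regions are extracted, script identification is applied to extract high-level semantic information such as the source and identity of news videos, and the script identification result can also provide prior knowledge for selecting an OCR engine. The main contributions are: 1) for video captions, an algorithm based on spatio-temporal analysis is proposed to detect captions, which are then localized, enhanced, and binarized using projection analysis; 2) for the extracted text regions, a script identification algorithm based on PCA and the wavelet transform is proposed.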
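The projection-analysis step for localizing detected captions can be sketched roughly as follows. This is a minimal illustration, assuming a binary edge map of a frame (1 = edge pixel) is already available. `threshold_ratio` is a hypothetical parameter. Spatio-temporal detection, enhancement, binarization, and PCA/wavelet script identification are not shown.

```python
def projection_bands(profile, threshold):
    """Contiguous index ranges whose projection value exceeds `threshold`."""
    bands, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands

def locate_text(edge_map, threshold_ratio=0.3):
    """Boxes (top, bottom, left, right) from horizontal then vertical projections."""
    rows = [sum(r) for r in edge_map]
    if max(rows) == 0:
        return []
    boxes = []
    for top, bottom in projection_bands(rows, threshold_ratio * max(rows)):
        # Vertical projection restricted to this horizontal text band.
        cols = [sum(edge_map[r][c] for r in range(top, bottom + 1))
                for c in range(len(edge_map[0]))]
        for left, right in projection_bands(cols, 0):
            boxes.append((top, bottom, left, right))
    return boxes
```

Projecting rows first and then columns within each row band fits horizontally laid-out captions, which is the typical case for news subtitles.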

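Service differentiation and a good user experience are key to operating LTE networks and future 5G networks. The current network KPI system is mainly used to evaluate network operation and maintenance and cannot truly reflect how satisfied users are with the network and its services. This paper proposes a method for evaluating per-service user-perceived QoE. By correlating LTE service KQI indicators with related data such as XDRs and MRs, it selects a set of QoE evaluation feature indicators for specific services, then uses machine learning to build a QoE evaluation model for each service. Taking a specific video service as an example, it presents the feature-indicator selection and modeling process for QoE evaluation. The method can evaluate users' service experience in real time with fine granularity and high accuracy, providing accurate data support for subsequent perception-based network optimization and operation.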

A Method for Anomaly Detection of User Behaviors Based on Machine Learning   Total citations: 1, self-citations: 0, citations by others: 1
1 Introduction Intrusion detection techniques can be categorized into misuse detection and anomaly detection. Misuse detection systems model attacks as specific patterns, and use the patterns of known attacks to identify a matched activity as an attack instance. Anomaly detection systems u…
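The contrast between the two categories can be sketched minimally. Misuse detection matches known attack patterns, while anomaly detection, as commonly understood, flags deviations from a profile of normal behaviour. Everything below is illustrative. `KNOWN_ATTACK_PATTERNS`, the event names, and the z-score rule with `z_threshold=3.0` are hypothetical choices, not the paper's machine-learning method.

```python
import statistics

# Hypothetical signature: three consecutive failed logins.
KNOWN_ATTACK_PATTERNS = [("login_fail", "login_fail", "login_fail")]

def misuse_detect(events, patterns=KNOWN_ATTACK_PATTERNS):
    """True if any known attack pattern appears as a contiguous run of events."""
    for pat in patterns:
        n = len(pat)
        for i in range(len(events) - n + 1):
            if tuple(events[i:i + n]) == pat:
                return True
    return False

def anomaly_detect(history, value, z_threshold=3.0):
    """True if `value` deviates from the historical profile by more than z_threshold sigmas."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return value != mean
    return abs(value - mean) / std > z_threshold
```

Misuse detection can only catch patterns already in its list. The anomaly check needs no attack knowledge but will also flag unusual benign behaviour.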

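Because existing Android malware detection methods are easily bypassed, this paper proposes a highly adversarially robust Android malware detection method. First, a mobile-app behavior analysis method combining dynamic and static analysis is designed and implemented. It can defeat the interference of multiple anti-analysis techniques and reliably extract an app's permission, protection, and behavior information. Next, capability features and behavior features that can resist mimicry attacks are extracted from this information, and a neural network model based on long short-term memory (LSTM) performs malware detection. Finally, experiments demonstrate the reliability and advantages of the proposed method.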

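In text classification, texts from different domains often share similar expressions and are correlated, which can help address the shortage of labeled training data. Joint learning through multi-task learning makes use of texts from different domains and improves both the training accuracy and the training speed of the model. This paper proposes a recurrent convolutional multi-task learning (MTL-RC) model for multi-class text classification. It models the texts of multiple tasks jointly, exploiting multi-task learning to capture correlations among multi-domain texts, recurrent neural networks (RNNs) to capture long-term dependencies within texts, and convolutional neural networks (CNNs) to extract local text features. Extensive experiments on a multi-domain text classification dataset show that the proposed recurrent convolutional multi-task learning model (MTL-LC) achieves an average classification accuracy of 90.1% across domains. This is 6.5% higher than the single-task recurrent convolutional model (STL-LC), and 5.4%, 4%, and 2.8% higher, respectively, than the popular multi-task models fully shared multi-task learning (FS-MTL), adversarial multi-task learning (ASP-MTL), and the indirect-communication multi-task learning framework (IC-MTL).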



In cloud computing, cloud assets are often underutilized because of poor allocation of tasks to virtual machines (VMs), and various inconsistent factors affect how tasks are scheduled to VMs. This paper proposes effective scheduling with multi-objective VM selection in cloud data centers, which works as follows. First, the input tasks are gathered in a task queue, and each task's computational time and trust parameters are measured in the task manager. The tasks are then prioritized based on these measures. Finally, the tasks are scheduled to the VMs in the host manager. Multiple objectives are considered for VM selection: power usage, load volume, and resource wastage are evaluated for each VM, and the entropy of these measured objectives is calculated. Based on the entropy values, the prioritized tasks are scheduled to the VMs with the krill herd optimization algorithm. Experimental results show that the proposed entropy-based krill herd optimization scheduling outperforms the existing general krill herd optimization, cuckoo search optimization, cloud list scheduling, minimum completion cloud, cloud task partitioning scheduling, and round-robin techniques.
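The entropy step can be illustrated with the standard entropy-weight method. Objectives whose values vary more across VMs get larger weights, and the weighted scores rank the candidate VMs. This is a minimal sketch, assuming all three objectives are costs (lower is better) and at least two VMs. The krill herd optimizer that the paper uses for the actual scheduling is not shown, and the metric values in the test are made up.

```python
import math

def cost_normalise(matrix):
    """Map each cost column to [0, 1] so that 1 is the best (lowest) value."""
    norm_cols = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        norm_cols.append([(hi - x) / (hi - lo) if hi > lo else 1.0 for x in col])
    return [list(row) for row in zip(*norm_cols)]

def entropy_weights(matrix):
    """Entropy-weight method: w_j proportional to 1 - e_j, with e_j the column's entropy."""
    m = len(matrix)  # number of VMs; must be >= 2 so that log(m) > 0
    divergences = []
    for col in zip(*matrix):
        total = sum(col)
        ps = [x / total for x in col] if total else [1 / m] * m
        e = -sum(p * math.log(p) for p in ps if p > 0) / math.log(m)
        divergences.append(1 - e)
    s = sum(divergences)
    n = len(divergences)
    return [d / s for d in divergences] if s else [1 / n] * n

def rank_vms(metrics):
    """Rank VMs (rows of [power, load, wastage]) best-first by entropy-weighted score."""
    norm = cost_normalise(metrics)
    w = entropy_weights(norm)
    scores = [sum(wj * x for wj, x in zip(w, row)) for row in norm]
    return sorted(range(len(metrics)), key=lambda i: -scores[i]), w
```

A scheduler would then prefer the top-ranked VMs when placing the highest-priority tasks. In the paper, that placement is done by krill herd optimization rather than by this greedy ranking.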



The use of the Internet of Things (IoT) in fields including smart cities, health care, manufacturing, and surveillance is growing rapidly, and IoT devices generate a massive amount of data as a result. Real-time processing of large-scale data streams is one of the main challenges of IoT systems. Analyzing IoT data can help provide better services, predict trends, and support timely decision making in industry. IoT data follows the pattern of big data. This paper proposes a novel approach in which big data tools perform real-time stream processing and analysis on IoT data. We also use Spark's built-in machine learning library to make real-time predictions. The efficiency of the proposed system is evaluated through experiments on an IoT-based weather station case scenario.


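A Dynamic Programming Algorithm for Text Similarity Based on Chinese Word Segmentation   Total citations: 1, self-citations: 0, citations by others: 1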
Xiao Kan, Tan Changgeng, Ding Ling. Modern Electronics Technique, 2011, 34(8): 72-74, 78
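To address the shortcomings of traditional dynamic-programming methods for computing the textual similarity of papers, this paper proposes and discusses a method based on Chinese word segmentation and dynamic programming. The method overcomes the low efficiency and low accuracy of typical dynamic-programming approaches. Tests and analysis on papers from a real paper database show that the algorithm improves accuracy and, to some extent, computation speed, and that it can be applied in anti-plagiarism systems for papers.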
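A minimal sketch of the general idea follows: segment first, then run the dynamic program over words instead of characters. It uses forward maximum matching with a tiny hypothetical dictionary and a longest-common-subsequence (LCS) DP, with similarity taken as 2·LCS / (len_a + len_b). The segmenter, dictionary, and similarity formula are all assumptions, not the paper's exact method.

```python
# Hypothetical toy dictionary; a real system would use a full lexicon or segmenter.
DICTIONARY = {"文本", "相似", "相似度", "计算", "论文", "抄袭", "检测"}

def fmm_segment(text, dictionary=DICTIONARY, max_len=3):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words

def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic programme over word sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def text_similarity(text_a, text_b):
    """2 * LCS / (total words), computed on segmented word sequences."""
    wa, wb = fmm_segment(text_a), fmm_segment(text_b)
    if not wa and not wb:
        return 1.0
    return 2 * lcs_length(wa, wb) / (len(wa) + len(wb))
```

Working on words rather than characters shortens both sequences, which shrinks the DP table. This is one plausible source of the speed-up the abstract mentions.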

SLDA-TC: A New Text Classification Method Based on a Supervised Topic Model   Total citations: 1, self-citations: 0, citations by others: 1
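This paper proposes SLDA-TC (Supervised LDA-Text Categorization), a text classification method based on a supervised topic model. It introduces a topic-category probability distribution parameter to capture the semantic relationship between topics and categories. It also proposes SLDA-TC-Gibbs, a new topic sampling method that samples the latent topic of each word only from the other documents in the same class as that word's document, and gives a theoretical derivation. In addition, the number of topics only needs to be slightly larger than the number of categories. Experiments show that, compared with LDA-TC (LDA-Text Categorization) and SVM, the method improves both classification accuracy and time performance.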

