Similar literature
1.
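This paper builds a model that analyzes the sentiment orientation of stock-related text using a stock sentiment lexicon and LDA. To address the incomplete sentiment lexicons and one-sided sentence analysis found in stock text sentiment analysis, a more comprehensive stock sentiment lexicon is constructed, and the sentiment of stock text is analyzed from three aspects of each sentence: polarity, degree, and relevance. Stock-specific words, degree words, and transition words are introduced to build the more comprehensive lexicon; sentiment words are extracted from the preprocessed sentences of the stock text; a sentence algorithm computes the polarity and degree vectors of each sentence, and the sentence vectors are classified with a support vector machine (SVM) and the K-means algorithm; LDA (latent Dirichlet allocation) is applied to the sentiment words to compute the document-topic and document-word probability distributions, from which sentence relevance is obtained; sentence sentiment is computed by combining polarity, degree, and relevance; finally, the proportion of sentiment orientations in the stock text is derived from the sentence sentiments. In experiments on stock texts collected from the economy section of Baidu News, and in comparison with other algorithms, the model raises classification accuracy for sentences and texts to 82.78% and 84.14%, respectively.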
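The combination of polarity, degree, and relevance described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy lexicons, the rule that a transition word discards the preceding clause, and the `relevance` argument, which simply stands in for the LDA-derived relevance. The abstract does not give the paper's actual scoring formula.

```python
# Hypothetical toy lexicons; the paper's actual lexicon is not reproduced here.
POLARITY = {"rise": 1.0, "profit": 1.0, "fall": -1.0, "loss": -1.0}
DEGREE = {"slightly": 0.5, "sharply": 2.0}
TRANSITION = {"but", "however"}


def sentence_score(tokens, relevance=1.0):
    """Score one tokenized sentence as relevance * sum(degree * polarity)."""
    score = 0.0
    weight = 1.0  # degree multiplier waiting for the next sentiment word
    for tok in tokens:
        if tok in TRANSITION:
            # Assumed rule: the clause after a transition word dominates,
            # so whatever was accumulated before it is discarded.
            score = 0.0
            weight = 1.0
        elif tok in DEGREE:
            weight = DEGREE[tok]
        elif tok in POLARITY:
            score += weight * POLARITY[tok]
            weight = 1.0
    return relevance * score


def orientation_ratio(scored_sentences):
    """Return the share of positive, negative, and neutral sentences."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for tokens, relevance in scored_sentences:
        s = sentence_score(tokens, relevance)
        label = "positive" if s > 0 else "negative" if s < 0 else "neutral"
        counts[label] += 1
    total = len(scored_sentences)
    return {k: v / total for k, v in counts.items()} if total else counts
```

In this sketch the relevance weight only scales a sentence's score; a fuller version could also use it to drop sentences that are off-topic before the ratio is computed.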

2.
Emotion prediction is a core task in affective computing, which aims to infer people's mental states by analyzing their activities. In this paper, we focus on predicting emotions in public online blogs written by different people, extracting as many reasonable emotions as possible for each blog sentence. Concretely, we consider three perspectives for analyzing the multiple emotions in a sentence: (i) predicting sentence emotions by examining emotion-related topics in a global sense; (ii) predicting sentence emotions from context-sensitive word emotions; and (iii) predicting sentence emotions by considering the emotional significance in the local bag of words. We build a probabilistic model from each perspective to generate sentence emotion probabilities separately, and then integrate these models to predict the emotion probabilities jointly. Because the component models rest on different emotional assumptions with distinct features, the integrated predictions should reflect more general perspectives and therefore yield better results. In the experiments, we use several evaluation criteria to compare the multi-emotion predictions of the single and integrated models. Compared with the baseline model, our bi-integrated models achieve 8.69% higher Micro F1 and 7.78% higher Macro F1 scores on average. Moreover, our tri-integrated model achieves 10.00% higher Micro F1 and 9.19% higher Macro F1 scores than the baseline, which supports our assumption and suggests interesting features in the different emotion perspectives. © 2014 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

3.
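Event trigger detection and classification is a crucial first step in event extraction. Traditional extraction and classification methods tend to rely on supervised learning, such as conditional random fields and SVMs, but because these methods require heavy manual annotation and are limited to predefined categories, they are hard to apply in open domains. This paper proposes an unsupervised method for event trigger detection and classification: a topic model is used to obtain the distribution of candidate triggers over topics, and a two-state automaton model then captures the high-probability topics, thereby filtering out the true event triggers and their corresponding classes. Experimental results on a large-scale unlabeled Sina News dataset verify the effectiveness of the method.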

4.
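Knowledge about concrete dam construction management is mostly recorded and stored as text, which is characterized by large volume, severe fragmentation, and poor hierarchy. Intelligently mining construction knowledge from unstructured text data, clarifying the logical relationships among knowledge items, and improving the efficiency with which knowledge is applied are important problems facing concrete dam construction management. This paper proposes a method for intelligently generating a knowledge graph for concrete dam construction management, converting massive text data into directly usable knowledge. Character and word vectors, a BiLSTM-CRF (Bi-directional Long Short-Term Memory-Conditional Random Field) network, and an attention mechanism are combined to build an intelligent entity recognition model for concrete dam construction management, which strengthens construction entity features and extracts the entity words in construction management text. Based on the recognized construction entities, relation types between entities are defined, mutual information is used to extract entity relations, and the results are combined into construction knowledge chains to build the knowledge graph. Applied to the analysis of real concrete dam construction management text, the entity recognition model achieves an F1 score of 92.48%, outperforming other entity recognition models; using the relations among the recognized entities, a knowledge graph for concrete dam construction management is built and a knowledge-graph-based retrieval mechanism for construction knowledge is formed, enabling fast extraction of construction knowledge and improving the efficiency of its application.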
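The abstract says that mutual information is used to extract entity relations but gives no formula. One common reading is pointwise mutual information (PMI) over sentence-level co-occurrence, sketched below. The choice of PMI, the sentence-level counting, and the function name `pmi_table` are all assumptions for illustration, not the paper's confirmed procedure.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Compute PMI for every entity pair that co-occurs in a sentence.

    `sentences` is a list of entity lists, one list per sentence.
    PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), with probabilities
    estimated as the fraction of sentences containing the entity or pair.
    """
    n = len(sentences)
    single = Counter()
    pair = Counter()
    for entities in sentences:
        unique = sorted(set(entities))  # count each entity once per sentence
        single.update(unique)
        pair.update(combinations(unique, 2))
    table = {}
    for (a, b), c in pair.items():
        table[(a, b)] = math.log2((c / n) / ((single[a] / n) * (single[b] / n)))
    return table
```

Pairs with high PMI co-occur more often than chance would predict and are candidates for a relation; a threshold on the score would have to be chosen empirically.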

5.
Research on Chinese-Japanese machine translation has been under way for many years, and the field is becoming increasingly refined. In practical machine translation systems, simple and short Chinese sentences are handled reasonably well, but complex long Chinese sentences remain difficult to translate. For example, these systems still cannot solve the translation of complex ‘BA’ sentences. This article proposes a new method, based on valency theory, for parsing ‘BA’ sentences for machine translation. A ‘BA’ sentence is one that contains the preposition ‘BA’. Its structural characteristic is that the original verb comes after the object. The object that follows the preposition ‘BA’ is used as an adverbial modifier of an active word. First, a large number of grammar items are collected from Chinese grammar books, and some elementary judgment rules are set by classifying and summarizing the collected items. Then, these judgment rules are applied to actual Chinese text and are revised by checking their results immediately, using statistical information from an actual corpus. Next, a five-segment model for ‘BA’ sentence translation is put forward on the basis of this analysis. Finally, we applied the proposed model in the machine translation system we developed and evaluated the experimental results. It achieved an accuracy of 91.3%, and this satisfactory result verified the effectiveness of our five-segment model for ‘BA’ sentence translation. Copyright © 2007 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

6.
This paper proposes a novel approach to generating Chinese sentences from ill-formed Taiwanese Sign Language (TSL) for people with hearing impairments. First, a sign icon-based virtual keyboard is constructed to provide a visual interface for retrieving sign icons from a sign database. A proposed language model (LM), based on a predictive sentence template (PST) tree, integrates a statistical variable n-gram LM and linguistic constraints to handle the translation from ill-formed sign sequences to grammatical written sentences. The PST tree, trained on a corpus collected from deaf schools, is used to model the correspondence between signed and written Chinese. In addition, a set of phrase formation rules, based on trigger pair categories, is derived for sentence pattern expansion. These approaches improve the efficiency of text generation and the accuracy of word prediction and, therefore, improve the input rate. To assess the system as a practical communication aid, a reading-comprehension training program with ten profoundly deaf students was undertaken in a deaf school in Tainan, Taiwan. Evaluation results show that literacy aptitude test scores and subjective satisfaction levels improved significantly.

7.
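At present, power grid enterprises evaluate satisfaction with power equipment suppliers mainly through manual statistics and indicator calculation, whose accuracy depends heavily on the evaluators and the evaluation content. Taking the dialogue text of a power business platform as the research object, and after expanding the entries and attributes of an existing power ontology dictionary, this paper builds a power equipment supplier evaluation model based on text mining. First, a next-sentence prediction analysis method for single-turn dialogue text is proposed that weights the next-sentence prediction of a Transformer-based bidirectional encoder with cosine similarity; a cross-handling procedure for interrupted dialogues and supplier identification rules are established, enabling topic induction of power dialogue text. Then, considering the complexity of semantic sentiment in dialogue text, dialogue sentiment analysis rules are proposed and the supplier evaluation model is built. Finally, case studies verify the accuracy of the proposed method; the results show that evaluating power equipment suppliers by intelligent mining of dialogue text is feasible and effective and can serve as a useful complement to current evaluation methods.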

8.
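In image classification based on the spatial pyramid matching (SPM) model, all features in an image are treated equally when the visual word histogram is built, without considering the influence of features from different image regions. Features in the target region are clearly more important than those in the background region. To keep features from unimportant regions from interfering with image classification, this paper proposes an image classification method that optimizes the spatial pyramid model. First, a clustering algorithm that combines simulated annealing with a genetic algorithm (SAGA) is used to construct the visual dictionary; then a visual attention mechanism is used to construct a weighted visual word histogram. The method accounts for the importance of each image region to classification without losing the image's global information. Finally, the image representation vectors are trained and classified with an SVM. Experiments show that the method improves image classification accuracy.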

9.
We developed a Japanese-language, rapid synthesizing software application for use on a personal digital assistant. It has an unrestricted vocabulary and can synthesize words and sentences within 3 s. Eight hundred common sentences and words are preregistered. By touching the first character of a preregistered sentence or word on an on-screen Kana (Japanese alphabet) chart, the user can select the sentence or word to be spoken. Characters on the Kana chart can also be input sequentially. Two Japanese subjects with speech impairments rated the device highly for its portability and quick response. Whereas communication previously had to be done by writing or sign language, listeners with or without specialized training in communicating with persons with speech impairments could easily understand the output from this device. This made conversation easier, which in turn improved the quality of life and social activity of these persons with speech impairments.

10.
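Traveling-wave protection for HVDC transmission lines requires a high device sampling rate and tolerates transition resistance poorly. Pilot current differential protection, which serves as backup protection, gives up its speed advantage and has a long operating time in order to prevent maloperation caused by factors such as the line's distributed capacitance. Exploiting the polarity difference between the voltage and current fault components (sudden-change quantities) detected by the protection devices at the two line ends during internal and external faults on an HVDC line, this paper proposes a directional pilot protection method based on the Hilbert-Huang transform. Based on an analysis of the phase difference between the voltage and current fault components under different faults, the Hilbert-Huang transform is used to obtain the phase difference of the fault components, identify their polarity difference, and thereby determine the fault direction. An HVDC transmission simulation model was built in PSCAD/EMTDC; simulation results show that the proposed method identifies faults quickly under all fault conditions, is highly reliable, and is little affected by transition resistance.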

11.
Near-duplicate image retrieval aims to find all images that are duplicates or near duplicates of a query image. One of the most popular and practical methods for near-duplicate image retrieval is based on the bag-of-words (BoW) model. However, the fundamental deficiency of the current BoW method is the gap between visual words and an image's semantic meaning. A similar problem also affects text retrieval, where a prevalent remedy is to eliminate text synonymy and polysemy and thereby improve overall performance. Our proposed approach borrows ideas from text retrieval and tries to overcome these deficiencies of the BoW model by treating the semantic gap as a problem of visual synonymy and polysemy. We use visual synonymy in a very general sense to describe the fact that many different visual words refer to the same visual meaning. By visual polysemy, we refer to the general fact that most visual words have more than one distinct meaning. To eliminate visual synonymy, we present an extended similarity function that implicitly extends the query visual words. To eliminate visual polysemy, we use visual patterns and show that the most efficient way to use them is to merge the visual word vector with the visual pattern vector and obtain the similarity score with the cosine function. In addition, we observe that duplicate visual words are highly likely to occur in adjacent areas. Therefore, we modify the traditional Apriori algorithm to mine quantitative patterns, defined as patterns that contain duplicate items. Experiments show that quantitative patterns significantly improve mean average precision (MAP).
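The scoring step mentioned in this abstract, merging the visual word vector with the visual pattern vector and taking the cosine, can be sketched as follows. The function names and the plain concatenation without any relative weighting are assumptions for illustration; the abstract does not say how the two vectors are scaled before merging.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)


def merged_similarity(words_q, patterns_q, words_d, patterns_d):
    """Concatenate word and pattern vectors, then score with cosine."""
    return cosine(list(words_q) + list(patterns_q),
                  list(words_d) + list(patterns_d))
```

With this layout, two images that share no visual words can still receive a nonzero score if they share a visual pattern, which is the effect the merged representation is meant to capture.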

12.
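Concrete dam construction information is mostly presented as document text; it is large in volume, widely distributed, and has complex internal relationships, so manual processing can hardly extract its knowledge content accurately and efficiently or untangle the intricate relationships among construction information. In natural language processing, named entities carry the knowledge in text, and accurate, fast entity recognition is an important prerequisite for mining construction knowledge. This paper proposes a method for intelligent recognition and mining analysis of knowledge in concrete dam construction documents that combines deep learning with association rule techniques. The method couples a bi-directional long short-term memory network (Bi-LSTM) with a conditional random field (CRF), defines the entity types for concrete dam construction, builds a named entity recognition model, and forms a set of concrete dam construction entity knowledge. On this basis, considering the expression patterns of construction text and the entity types, relations between entities are predefined and the forms of construction entity combinations are determined, yielding an entity association rule extraction technique. Guided by this technique, an improved Apriori algorithm computes frequent itemsets and obtains strong association rules between entities. Applied to real weekly supervision reports of concrete dam construction, the method achieves a named entity recognition precision of 86.42%, verifying its accuracy. Analyzing the association rules between entities with the improved Apriori algorithm demonstrates the advantages of the improved algorithm and helps make knowledge analysis of concrete dam construction documents more intelligent and fine-grained.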
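For readers unfamiliar with the frequent-itemset step, the sketch below shows the classic, unmodified Apriori procedure together with rule confidence. It is not the improved variant the abstract refers to, whose details are not given; the entity names in the usage note are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets appearing in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from frequent-itemset counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

As a usage illustration, treating the entities found in each report sentence as one transaction, `apriori([{"dam", "pouring"}, {"dam", "pouring", "crane"}, {"dam"}], 2)` keeps `{"dam", "pouring"}` as a frequent pair, and a rule is called strong when its confidence also exceeds a chosen threshold.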

13.
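Power grid fault handling plans are an important reference for handling grid faults. Extracting fine-grained key entity information, such as the various types of power equipment and their names and numbers, from the text of these plans is an important basis for enabling computers to learn and understand plan content and for further supporting intelligent fault handling. This paper proposes a deep-learning-based named entity recognition technique for power grid fault handling plan text. Character vectors are first used to represent the plan text; an attention mechanism is then combined with a bidirectional long short-term memory network to extract, with selective emphasis, the deep character features of entity words; finally, a conditional random field solves for the optimal sequence labeling. Case studies show that the proposed model does not depend on hand-crafted features, extracts text features automatically and efficiently, accurately recognizes fine-grained entity words in plan text, and meets the requirements for precisely locating and recognizing key entity information in plan text.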

14.
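Schedule control is an important task in hydropower project management, and promptly summarizing schedule management information helps in making and adjusting project schedules. Schedule information in hydropower project construction is mostly presented as semi-structured and unstructured text, which makes information extraction harder; automated, intelligent mining of hydropower project schedule text is a problem in urgent need of a solution. This paper proposes an intelligent extraction method for hydropower project schedule information based on an improved LDA, which intelligently extracts the key information in schedule management text....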

15.
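Tree obstacles are one of the main safety threats to high-voltage transmission lines operating in densely vegetated areas of complex mountainous terrain. Different tree species have different growth cycles, so their tree-obstacle risk within a given period also differs. To identify tree species in forest areas accurately over a large range, this paper proposes a rapid tree species identification method based on airborne radar measurement. First, airborne radar is used to rapidly acquire point cloud data of the ground in the transmission line area, and the data are preprocessed to obtain the canopy point cloud of each individual tree. Next, spatial point cloud features of the canopy are established, including crown height, crown volume, crown point cloud density, canopy laser reflection intensity, and crown shape features. Finally, a K-means clustering model for tree species identification is built from the trees' spatial point cloud features. The results show that, for trees growing in the area, the five spatial point cloud features identify species well; the final K-means clustering model reaches an accuracy of 85.9% on the validation data, with a Kappa coefficient of 0.812. Rapid identification of vegetation species beneath transmission lines is of great significance for tree-obstacle risk assessment and early warning.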

16.
Influenza is an acute respiratory illness with widespread activity that occurs every year. Detecting and preventing influenza at its earliest stage would reduce the spread of the illness. Sina microblog is a popular microblogging service in China which, because of its real-time character, can serve as a valuable reference source for flu detection. A large number of active users post about their daily lives continually. In this paper, we investigate the real-time flu detection problem and propose a flu detection model with emotion factors and semantic information. First, we extract flu-related microblog posts automatically in real time by adopting a support vector machine (SVM) filter and semantic features. To overcome the limitation of 140 words, we use association rule mining to extract strongly associated features, including sentiment information, as additional features for posts, which helps us classify posts that lack flu-related features. Then, the conditional random field model is revised and applied to detect the transition time of flu, so that we can find out where and when an influenza outbreak is more likely to occur in a city or province in China. Experimental results on detecting the flu situation at certain times in some locations show the robustness and effectiveness of the proposed model, which might help health authorities predict flu outbreaks ahead of time and take timely control actions and responses. © 2016 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

17.
The resolution of overlapping ambiguity strings (OAS) is studied based on the maximum entropy model. The model has two outputs: either the first two characters form a word or the last two characters form a word. The features of the model include one word in the context of the OAS, the current OAS, and the word probability relation between the two kinds of segmentation results. OAS in the training text are found by combining the FMM and BMM segmentation methods. After feature tagging, they are used to train the maximum entropy model. The People's Daily corpus of January 1998 is used for training and testing. Experimental results show a closed test precision of 98.64% and an open test precision of 95.01%. The open test precision is 3.76% better than that of the common word probability method. __________ Translated from Transactions of Beijing Institute of Technology, 2005, 25(7): 590–593 (in Chinese)
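The step of finding OAS by combining FMM (forward maximum matching) and BMM (backward maximum matching) can be sketched as below. The sketch assumes the usual reading that a string is flagged when the two segmentations disagree; the toy vocabulary and the `max_len` default are assumptions for illustration, not taken from the paper.

```python
def fmm(text, vocab, max_len=4):
    """Forward maximum matching: scan left to right, longest word first."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                out.append(piece)
                i += size
                break
    return out


def bmm(text, vocab, max_len=4):
    """Backward maximum matching: scan right to left, longest word first."""
    out, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                out.append(piece)
                j -= size
                break
    out.reverse()
    return out


def has_overlap_ambiguity(text, vocab, max_len=4):
    """Flag the string when forward and backward segmentations differ."""
    return fmm(text, vocab, max_len) != bmm(text, vocab, max_len)
```

For the three-character string 从小学 with the toy vocabulary {从小, 小学}, FMM groups the first two characters and BMM groups the last two, which matches the two model outputs described in the abstract.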

18.
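Semantic annotation of music aims to use words or tags to automatically annotate a piece of music with a set of semantic tags. Multi-label learning is usually converted into independent binary classification problems, with each semantic tag modeled separately. To obtain better classification results, the dependencies between tags should be taken into account. This paper attempts joint semantic annotation of music, modeling single tags and highly correlated tag pairs simultaneously. A multi-label conditional random field (CRF) model is used to directly parameterize co-occurring tags in multi-label classification. Two CRF models are used: a collective multi-label classifier (CML) that uses unconditional tag correlations, and a collective multi-label classifier with features (CMLF) that uses conditional tag correlations. Experiments show that, on the CAL10K dataset, both models achieve higher average precision, macro F1, and micro F1 than modeling individual tags with a Gaussian mixture model (GMM).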

19.
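Alarm recognition in power grid dispatching and control is an important part of intelligent grid dispatching. To improve the accuracy of alarm recognition, and to address problems such as the huge volume of grid data, the difficulty of extracting useful information, and the poor knowledge transfer capability of traditional knowledge bases, this paper proposes an online alarm recognition method for grid dispatching and control based on BERT-DSA-CNN and a knowledge base. First, building on a text data mining architecture that combines natural language processing and deep learning, and after steps such as word segmentation and stop-word removal, the BERT model is used to obtain...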

20.
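Preventing the leakage of sensitive and important documents is an important task in information security for the power industry. Binary sort tree techniques are used to pre-sort the base phrase library and the filter keywords; maximum suffix matching is used to perform Chinese word segmentation on the text string to be checked; the segments are then checked and filtered against the keyword binary sort tree, so as to detect sensitive keywords securely and efficiently. Performance tests show that the technique performs well in both speed and accuracy.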
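The pipeline in this abstract, a binary sort tree for lookups plus suffix-first segmentation, might look roughly like the sketch below. It assumes that "maximum suffix matching" means backward maximum matching against the base phrase library; the class and function names, the unbalanced tree, and the sample phrases are all hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class SortTree:
    """Plain (unbalanced) binary sort tree over strings."""

    def __init__(self, keys=()):
        self.root = None
        for k in keys:
            self.insert(k)

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while True:
            if key == node.key:
                return  # already stored
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key))
                return
            node = child

    def __contains__(self, key):
        node = self.root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False


def segment_backward(text, phrases, max_len=4):
    """Segment from the end of the text, preferring the longest known phrase."""
    out, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in phrases:
                out.append(piece)
                j -= size
                break
    out.reverse()
    return out


def find_sensitive(text, phrases, keywords):
    """Return the segments of `text` that appear in the keyword tree."""
    return [w for w in segment_backward(text, phrases) if w in keywords]
```

Segmenting before matching means a keyword is only reported when it stands as a whole word in the text, which is presumably what keeps false positives down; inserting keys in sorted order would degrade this simple tree into a list, so a balanced tree would be the safer choice in practice.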
