Similar literature
 20 similar documents found (search time: 15 ms)
1.

Research in the financial domain has shown that the sentiment of stock news has a profound impact on trading volume, volatility, stock prices, and firm earnings. In-depth analysis of stock news now draws on financial reviews from various social networking and marketing sites to help improve decision making. Such reviews, however, take the form of unstructured text, which requires natural language processing (NLP) to extract the sentiment. Accordingly, this study investigates NLP tasks in an effort to improve sentiment classification when evaluating the information content of financial news as an instrument in an investment decision support system. At present, feature extraction is based mainly on word occurrence frequency, so low-frequency linguistic features that could be critical to sentiment classification are typically ignored. This research attempts to improve current sentiment analysis approaches for financial news classification by focusing on low-frequency but informative linguistic expressions. The proposed combination of low- and high-frequency linguistic expressions contributes a novel set of features for sentiment classification. The experimental results show that optimal N-gram feature selection (a combination of optimal unigram and bigram features) improves sentiment classification accuracy compared with other types of feature sets.
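As a rough illustration of combining unigram and bigram features while keeping rare expressions, the following minimal Python sketch counts both n-gram orders and filters them by document frequency. The function names, the whitespace tokenizer, and the document-frequency criterion are assumptions made for this example; the abstract does not say how the "optimal" features are actually selected.

```python
from collections import Counter

def ngram_features(text, n_values=(1, 2)):
    """Count unigrams and bigrams in a lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def select_features(docs, min_df=1, max_features=None):
    """Rank n-grams by document frequency (a stand-in selection criterion).

    A low min_df keeps low-frequency expressions that a pure
    high-frequency cut-off would discard.
    """
    df = Counter()
    for doc in docs:
        df.update(set(ngram_features(doc)))
    kept = [(gram, count) for gram, count in df.items() if count >= min_df]
    kept.sort(key=lambda gc: (-gc[1], gc[0]))  # most frequent first, ties alphabetical
    if max_features is not None:
        kept = kept[:max_features]
    return [gram for gram, _ in kept]

# Hypothetical toy headlines, for illustration only.
docs = [
    "profit warning issued",
    "profit beats forecast",
    "shares fall after profit warning",
]
vocab = select_features(docs, min_df=1)
```

With min_df=1 every rare bigram such as "beats forecast" stays in the vocabulary; raising min_df reproduces the frequency-only behaviour that the abstract argues against.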


2.
This letter introduces a new method for automatically acquiring paraphrases from bilingual corpora. It uses bilingual dependency relations obtained by projecting a monolingual dependency parse onto the sentence in the other language by means of statistical alignment techniques. Because the proposed method can clearly disambiguate the sense of the original phrases using the bilingual context of dependency relations, it can obtain paraphrases that are interchangeable in a given context. In experiments with Korean-English parallel corpora, the method extracts paraphrases with high precision, achieving success rates of 94.3% and 84.6% for Korean and English, respectively.

3.
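Spatial-domain steganalysis based on wavelet coefficient correlation   Total citations: 2 (self-citations: 2, citations by others: 0)
Based on the correlation among wavelet coefficients, this paper proposes a universal detection method for spatial-domain steganography with a high correct detection rate. First, mutual information is used to analyze how embedding secret information affects the correlation of image wavelet coefficients along the scale and spatial directions, and a Markov model is used to capture the intra-scale and inter-scale correlations among the coefficients, with the transition probability matrices extracted as features. The extracted features are then fused by weighting and classified with a Fisher linear discriminant (FLD) classifier. Experiments on the LSB (least significant bit), LSB matching, and SM (stochastic modulation) steganographic algorithms show that, without increasing computational complexity, the proposed method achieves a clearly higher correct detection rate than existing typical universal detection methods for spatial-domain steganography.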
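The transition probability matrix used as a feature can be sketched as follows. This is a minimal Python illustration under stated assumptions: the input is a hypothetical one-dimensional sequence of already-quantized coefficients, values are clipped to a small range so the matrix stays small, and the wavelet transform, the weighted fusion, and the FLD classifier are all left out. The function name and the clipping threshold are not from the paper.

```python
def transition_matrix(seq, threshold=2):
    """Estimate first-order Markov transition probabilities.

    Values are truncated to integers and clipped to [-threshold, threshold],
    so the result is a (2*threshold+1) x (2*threshold+1) matrix whose row i
    holds P(next = j | current = i).
    """
    clipped = [max(-threshold, min(threshold, int(v))) for v in seq]
    size = 2 * threshold + 1
    counts = [[0] * size for _ in range(size)]
    for current, following in zip(clipped, clipped[1:]):
        counts[current + threshold][following + threshold] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that never occur are left as all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical coefficient sequence, for illustration only.
m = transition_matrix([0, 1, 0, 1, 5], threshold=1)
```

Flattening the matrix gives a fixed-length feature vector; embedding tends to disturb the dependencies between neighbouring coefficients, which is what such features are meant to expose.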

4.
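As a new modeling technique in information retrieval, language modeling has gradually become one of the mainstream techniques in contemporary language information processing. This paper applies it to topic tracking: it introduces language model theory, describes in detail how topic tracking is implemented on top of a language model, builds two topic tracking systems, one modeled with the vector space model and the other with a language model, and compares their performance. Experimental results show that the language model is better suited to the topic tracking task than the vector space model.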
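One common way to track a topic with a language model is to score each incoming story by its likelihood under a smoothed unigram model of the topic. The Python sketch below assumes linear interpolation (Jelinek-Mercer smoothing) with an add-one smoothed background model; the abstract does not state which estimator or smoothing the paper uses, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter

def lm_log_likelihood(story, topic_docs, background, lam=0.5):
    """Log-likelihood of a story under a smoothed unigram topic model.

    P(w) = lam * P_topic(w) + (1 - lam) * P_background(w), where the
    background estimate uses add-one smoothing so unseen words keep a
    non-zero probability.
    """
    topic = Counter(w for doc in topic_docs for w in doc.split())
    bg = Counter(w for doc in background for w in doc.split())
    topic_total = sum(topic.values())
    bg_total = sum(bg.values())
    vocab_size = len(bg) + 1  # one extra slot for unknown words
    score = 0.0
    for w in story.split():
        p_topic = topic[w] / topic_total if topic_total else 0.0
        p_bg = (bg[w] + 1) / (bg_total + vocab_size)
        score += math.log(lam * p_topic + (1 - lam) * p_bg)
    return score

# Hypothetical toy data, for illustration only.
topic_docs = ["earthquake hits city", "earthquake rescue continues"]
background = topic_docs + ["team wins football match"]
on_topic = lm_log_likelihood("earthquake rescue", topic_docs, background)
off_topic = lm_log_likelihood("football match", topic_docs, background)
```

A tracker would compare the score (usually normalized by story length) against a threshold to decide whether the story belongs to the topic.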

5.
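Current knowledge graph research mainly targets fields such as information retrieval and natural language understanding, and integrating knowledge graphs into recommender systems has drawn wide attention from researchers in recommendation. To recover the rich knowledge that a single knowledge graph overlooks, this paper extends the knowledge graph to multiple modalities and proposes a recommendation model that fuses the knowledge graph with image features (KG-I). Unlike other knowledge-graph-based recommendation algorithms, the method adds visual, knowledge, and structural embeddings to mine implicit feedback between users and items. The model draws on the way DeepWalk captures spatial structure and on RippleNet's idea of mining knowledge representations from the knowledge graph, takes the influence of images on user preference into account, and fuses this information effectively. It is compared experimentally with other models on real-world datasets to study the influence of the different features and to analyze performance at different levels of data sparsity. The results show that the personalized recommendation model fusing the knowledge graph and image features outperforms all of the compared algorithms and effectively alleviates data sparsity.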

6.
Chinese Phonetic-Character Conversion (CPCC) is an important issue in Chinese speech recognition and Chinese sentence-level keyboard input systems. Approaches based on large-corpus statistical Markov language models (such as bigram and trigram models) are becoming increasingly popular. This paper presents an improved Chinese word bigram, the space-compressed Chinese word bigram, which stores the bi-word co-articulation frequency in the form of the bi-character co-articulation frequency. The bi-word co-articulation frequency is estimated from the bi-character co-articulation frequency library. A CPCC experiment with the improved Chinese word bigram shows that it reaches a higher correct conversion ratio while occupying less space.

7.
8.

9.
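Distributed fault diagnosis algorithm for multi-domain service environments   Total citations: 1 (self-citations: 0, citations by others: 1)
In a multi-domain service environment, cross-domain symptoms caused by inter-domain fault propagation degrade the performance of fault diagnosis algorithms. This paper proposes a distributed dependency model for multi-domain service environments and, on top of that model, a distributed fault diagnosis algorithm, which is then improved in three respects: reduced communication overhead, a more accurate evaluation function for symptom triggering, and the probability of spurious symptoms. Simulation results show that the algorithm can effectively diagnose service faults in a multi-domain environment.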

10.
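Feng Chong, Liao Chun, Liu Zhirun, Huang Heyan. Acta Electronica Sinica (《电子学报》), 2016, 44(10): 2471-2476
News-style articles on portal sites, blogs, and forums usually carry their own sentiment orientation, and identifying sentiment key sentences plays a very important role in judging an article's sentiment orientation and in understanding social trends and public opinion. Traditional methods rely mainly on lexical features and fail to make full use of latent syntactic and semantic information. This paper proposes a method for identifying sentiment key sentences based on lexical semantics and syntactic dependency. The method first obtains lexical semantic information by building a sentiment lexicon and a keyword lexicon, then obtains syntactic dependency information with a novel extraction algorithm oriented to sentiment key sentences, and finally treats identification as a binary classification problem of whether a sentence is a sentiment key sentence. Experiments on the public COAE2014 evaluation dataset show that both the precision and the recall of the proposed method are significantly better than those of other methods.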

11.
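Semantic communication is a new communication paradigm that can improve reliability at the semantic level and address the limits on communication bandwidth and spectrum resources. To address the problem of ranking semantic importance in semantic communication, this paper proposes a layered semantic communication system based on dependency parsing. First, to obtain the dependency relations within a transmitted sentence, the paper designs a dependency parsing model based on graph decoding that extracts the sentence's dependency tree. Second, it proposes a semantic layering method based on the extracted dependency tree and selectively transmits the semantic information of different layers according to channel quality, so that the key semantics are delivered accurately. In addition, the ERNIE language model is introduced and combined with the dependency relations to improve semantic recovery at the receiver. Simulation results show that the proposed semantic layering method effectively extracts the key semantic information of the transmitted sentence and that, compared with a traditional communication system, the proposed system significantly improves communication reliability at low signal-to-noise ratios.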

12.
13.
Named entity recognition (NER) continues to be an important task in natural language processing because it features as a subtask or subproblem in information extraction and machine translation. In Urdu language processing, it is a very difficult task. This paper proposes several deep recurrent neural network (DRNN) learning models with word embedding. Experimental results demonstrate that they improve upon current state-of-the-art NER approaches for Urdu. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and backpropagation-through-time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embedding for NER in Urdu is demonstrated using three datasets. The results reveal that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches. The best F-measure values achieved on the three benchmark datasets using the proposed deep learning approaches are 81.1%, 79.94%, and 63.21%, respectively.

14.
15.
Content-based image retrieval systems are meant to retrieve the images in a collection that are most similar to a query image. One of the best-known models widely applied to this task is the bag of visual words (BoVW) model. In this paper, we present a study of different information gain models used to construct a visual vocabulary. In the proposed framework, information gain models serve as discriminative information to index image features and to select those with the highest information gain values. We introduce some extensions to further improve the performance of the proposed framework: mixing different vocabularies and extending BoVW to a bag of visual phrases. Extensive experiments demonstrate the benefit of information gain models in our retrieval framework.

16.
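To address the shortcomings of the correlation model in describing and analyzing uncertainty when evaluating the testability of complex equipment, a testability model based on Bayesian networks is presented, which uses conditional probabilities to describe the uncertain information of the system. In testability modeling and evaluation based on statistical data, the statistics on fault symptoms and fault causes are incomplete, so testability modeling becomes a structure and parameter learning problem under incomplete data. To solve it, a discrete particle swarm optimization algorithm is used: the testability data are completed and a Bayesian metric serves as the score to learn the Bayesian network structure. Finally, an example verifies the correctness and effectiveness of the algorithm.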

17.
Rate-distortion optimization (RDO), which employs a Lagrange multiplier to balance distortion against bitrate, is used to select the optimal coding parameters in multi-view video coding (MVC). In this paper, an efficient RDO method for the dependent view (DV) in multi-view video (MVV) is proposed based on inter-view dependency. First, by investigating the sources of distortion in the DV, a new distortion model for the DV is established. Then, based on the proposed distortion model, an efficient Lagrange multiplier decision for B frames is proposed that takes the inter-view dependency into account. Finally, the optimized Lagrange multiplier for P frames is designed using a scaling factor that is deduced to have a linear relationship with the disparity between the I frame and the P frame. Experimental results demonstrate that, compared with the original HTM-16.0 encoder, the proposed overall method reduces the BD-rate for the DV by 12.19% on average, corresponding to a BD-PSNR gain of 0.40 dB.
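For reference, the Lagrangian cost that RDO minimizes in hybrid video coders is usually written as below. This is the standard textbook form only; the symbols p, P, D, R, and J are generic notation chosen here, and the abstract gives neither the paper's DV distortion model nor its derived multipliers.

```latex
% Standard Lagrangian rate-distortion cost (generic notation, not the paper's):
% p ranges over the candidate coding parameters P (modes, partitions, motion data),
% D(p) is the resulting distortion, R(p) the bitrate, \lambda >= 0 the Lagrange multiplier.
\[
  p^{*} = \arg\min_{p \in \mathcal{P}} J(p),
  \qquad
  J(p) = D(p) + \lambda \, R(p)
\]
```

A larger multiplier favours lower bitrate at the cost of more distortion, which is why a multiplier tuned to the inter-view dependency can change the BD-rate of the dependent view.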

18.
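Query expansion based on a statistical machine translation model   Total citations: 1 (self-citations: 0, citations by others: 1)
In practical information retrieval applications such as search engines, the queries users submit usually contain only a few keywords, which causes a term mismatch between relevant documents and the user query and has a fairly serious negative effect on retrieval performance. Building on an analysis of the query generation model, this paper proposes a new query expansion method based on statistical machine translation. A statistical machine translation model extracts the words in the document collection that are associated with the query terms, and these are used to expand the query. Experimental results on TREC datasets show that the translation-based query expansion method consistently improves on the language model method without expansion by 12% to 17%, and achieves average precision comparable to pseudo-relevance feedback, a popular query expansion method.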
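The expansion step can be pictured as ranking candidate words by their translation probability given the query terms. The Python sketch below assumes a translation table t(w | q) is already available (a hypothetical nested dictionary here; in practice it would be learned, for example with a word alignment model) and averages the probabilities over the query terms. The averaging rule, the function name, and the table values are illustrative assumptions, not the paper's formulation.

```python
def expand_query(query_terms, translation_table, k=3):
    """Return the k candidate words most strongly associated with the query.

    translation_table[q][w] is assumed to hold t(w | q), the probability of
    "translating" query term q into document word w. Each candidate is scored
    by its average translation probability over all query terms.
    """
    scores = {}
    for q in query_terms:
        for w, prob in translation_table.get(q, {}).items():
            if w in query_terms:
                continue  # never propose a word that is already in the query
            scores[w] = scores.get(w, 0.0) + prob / len(query_terms)
    ranked = sorted(scores.items(), key=lambda wp: (-wp[1], wp[0]))
    return [w for w, _ in ranked[:k]]

# Hypothetical translation probabilities, for illustration only.
table = {
    "car": {"automobile": 0.4, "vehicle": 0.2, "car": 0.4},
    "repair": {"fix": 0.5, "vehicle": 0.1, "garage": 0.2},
}
expansion = expand_query(["car", "repair"], table, k=3)
```

The expanded query is then the original terms plus the returned words, typically with lower weights on the added terms.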

19.
This letter presents a new discriminative model for Information Retrieval (IR), referred to as the Ordinal Regression Model (ORM). ORM differs from most existing models in that it treats IR as an ordinal regression problem (i.e., a ranking problem) instead of binary classification. Because the task of IR is to rank documents according to the user's information need, IR can be viewed as an ordinal regression problem. Two parameter learning algorithms for ORM are presented: one is a perceptron-based algorithm, the other is the ranking Support Vector Machine (SVM). The effectiveness of the proposed approach has been evaluated on the task of ad hoc retrieval using three English Text REtrieval Conference (TREC) sets and two Chinese TREC sets. Results show that ORM significantly outperforms the state-of-the-art language model approaches and the OKAPI system on all test sets, and that it is more appropriate to view IR as ordinal regression rather than binary classification.
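To make the ranking view concrete, here is a minimal Python sketch of a perceptron trained on document pairs: whenever a more relevant document does not score strictly higher than a less relevant one, the weight vector is moved toward their feature difference. This pairwise reduction is an assumption chosen for illustration; the letter's own perceptron-based algorithm may differ, and the feature vectors and relevance grades below are hypothetical.

```python
def train_ranking_perceptron(examples, n_features, epochs=10, lr=1.0):
    """Learn a linear scoring function from graded documents of one query.

    examples is a list of (feature_vector, grade) pairs, where a higher
    grade means more relevant.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for xi, yi in examples:
            for xj, yj in examples:
                if yi <= yj:
                    continue  # only pairs where xi should outrank xj
                diff = [a - b for a, b in zip(xi, xj)]
                margin = sum(wk * dk for wk, dk in zip(w, diff))
                if margin <= 0:  # misordered or tied pair: perceptron update
                    w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w

def score(w, x):
    """Linear relevance score used to rank documents."""
    return sum(wk * xk for wk, xk in zip(w, x))

# Hypothetical two-feature documents with relevance grades 2 > 1 > 0.
examples = [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 0)]
w = train_ranking_perceptron(examples, n_features=2)
```

Sorting documents by score(w, x) in descending order yields the ranking; a binary classifier, by contrast, would only separate relevant from non-relevant documents without ordering them.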

20.
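Cross-language query expansion based on item-weight ranking mining   Total citations: 1 (self-citations: 0, citations by others: 1)
Huang Mingxuan, Jiang Caoqing. Acta Electronica Sinica (《电子学报》), 2020, 48(3): 568-576
To mitigate the long-standing problems of topic drift and term mismatch in natural language processing applications, this paper first proposes a method for computing weighted itemset support and a pruning method based on item-weight ranking, then gives a weighted association rule mining algorithm based on item-weight ranking and oriented to query expansion, discusses hybrid, consequent, and antecedent expansion models for association rules, and finally proposes a cross-language query expansion algorithm based on item-weight ranking mining. The algorithm mines weighted association rules with the new support measure and pruning strategy, and extracts high-quality expansion terms from the rules according to the expansion model to perform cross-language query expansion. Experimental results show that, compared with existing cross-language expansion algorithms based on weighted association rule mining, the proposed algorithm effectively curbs query topic drift and term mismatch and can be used in information retrieval for various languages to improve retrieval performance. Among the expansion models, consequent expansion achieves the best retrieval performance and hybrid expansion performs worse than both consequent and antecedent expansion; support is more effective for consequent expansion, whereas confidence is more helpful for improving antecedent and hybrid expansion. The mining method can also be applied to text mining, business data mining, and recommender systems to improve their mining performance.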
