Similar literature
Found 20 similar articles (search time: 31 ms)
1.
2.
3.
4.
In this paper, a novel framework named global-local feature attention network with reranking strategy (GLAN-RS) is presented for the image captioning task. Rather than relying only on unitary visual information, as classical models do, GLAN-RS uses the attention mechanism to capture local convolutional salient image maps. Furthermore, we adopt a reranking strategy to adjust the priority of the candidate captions and select the best one. The proposed model is verified on the Microsoft Common Objects in Context (MSCOCO) benchmark dataset across seven standard evaluation metrics. Experimental results show that GLAN-RS significantly outperforms state-of-the-art approaches such as the multimodal recurrent neural network (MRNN) and Google NIC, with an improvement of 20% in BLEU-4 score and 13 points in CIDEr score.
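The reranking step described above can be illustrated with a minimal sketch. The abstract does not say how candidates are scored, so everything here is an assumption made for illustration: the `Candidate` fields, the hypothetical secondary score `rerank_score` (for example an image-text similarity), and the weighted-sum rule in `rerank`.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    caption: str
    log_prob: float      # total decoder log-probability (assumed to be available)
    rerank_score: float  # hypothetical secondary score, e.g. image-text similarity


def rerank(candidates, weight=0.5):
    """Sort candidates, best first, by a weighted sum of two scores.

    The first score is the length-normalized log-probability; the second is
    the secondary score. This rule is an illustrative assumption, not the
    scoring used by GLAN-RS.
    """
    def combined(c):
        length = max(len(c.caption.split()), 1)  # avoid division by zero
        return (1.0 - weight) * (c.log_prob / length) + weight * c.rerank_score

    return sorted(candidates, key=combined, reverse=True)
```

With `weight=0` the order falls back to the length-normalized decoder score alone, which makes it easy to see what the secondary score changes.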

5.
Visual question answering (VQA) is a learning task that spans two major fields, computer vision and natural language processing. The development of deep learning has advanced this research area. Although research on question answering models has made great progress, VQA accuracy remains low, mainly because current model structures are relatively simple, the models' attention mechanisms deviate from human attention, and the models lack higher-level logical reasoning ability. In response to these problems, we propose a VQA model based on multi-objective visual relationship detection. First, appearance features are used in place of the image features of the original object, and the appearance model is extended using the principle of word vector similarity. The appearance features and relationship predicates are then fed into the word vector space and represented by a fixed-length vector. Finally, the image features and the question vector are concatenated and fed into a classifier to generate the answer. Our method is benchmarked on the DAQUAR dataset and evaluated by accuracy, WUPS@0.0, and WUPS@0.9.

6.
To synchronously combine dynamic semantic and visual information in the decoder of an image captioning model, we propose a novel parallel-fusion LSTM (pLSTM) structure. Two parallel LSTMs, one driven by image attributes and the other by visual information, are fused through their hidden states at every time step, so that the attributes and visual information complement or reinforce each other and yield more accurate captions. According to how semantic information from the attribute LSTM is integrated into the visual LSTM, we propose two models: pLSTM with attention (pLSTM-A) and pLSTM with guiding (pLSTM-G). pLSTM-A automatically captures the crucial semantic and visual information for generating captions, while pLSTM-G directly adjusts the hidden state of the visual LSTM, using synchronous semantic information, toward the critical region. To verify the effectiveness of the proposed pLSTM, we conduct a series of experiments on the MSCOCO and Flickr30K datasets, and the results outperform some state-of-the-art image captioning methods.

7.
董波  周燕  王永雄 《电子科技》2009,34(1):23-30
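Current saliency detection algorithms struggle to segment complete salient regions and sharp edge details in complex scenes. To address this problem, the paper proposes a novel feature fusion algorithm. The method uses a fully convolutional neural network to obtain coarse initial features at multiple levels and parses them in depth with a feature pyramid structure. A progressive structured receptive field module transforms the features into spaces of different scales for refinement, achieving progressive fusion and propagation of features and selectively enhancing salient regions. A global attention mechanism suppresses background noise and establishes long-range dependencies among salient pixels, improving the validity of the salient regions and highlighting salient objects; the saliency map is then obtained by learning to fuse the features from each level. Comprehensive experiments show that, with a reduced absolute error, the F-measure far exceeds that of seven other mainstream methods. The proposed saliency model combines the advantages of fully convolutional neural networks and the feature pyramid structure and, together with the progressive structured receptive field and the global attention mechanism designed in the paper, produces saliency maps that are closer to the ground truth.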

8.
金楠  王瑞琴  陆悦聪 《电信科学》2022,38(10):89-97
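Traditional attention-based recommendation algorithms model user behavior sequences using only positional embeddings and ignore the actual timestamp information, which leads to problems such as poor recommendation performance and overfitting during training. A multi-task matrix factorization recommendation model based on temporal attention is proposed. It uses an attention mechanism to extract neighborhood information for embedding users and items, uses the Ebbinghaus forgetting curve to describe how user interest changes over time, and introduces the reinforcement learning strategy of experience replay during training to simulate the human process of memory review. Experimental results on real-world datasets show that the model achieves better recommendation performance than existing recommendation models.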
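As a rough illustration of forgetting-curve time weighting, the sketch below uses the commonly cited exponential form R = exp(-t / S). The abstract does not give the formula or how it enters the attention computation, so the function names, the `strength` parameter, and the normalization are all assumptions.

```python
import math


def retention(elapsed, strength=1.0):
    """Exponential forgetting curve R = exp(-t / S); S is an assumed memory strength."""
    return math.exp(-elapsed / strength)


def temporal_attention_weights(timestamps, now, strength=86400.0):
    """Turn interaction timestamps (seconds) into normalized recency weights.

    Hypothetical helper: more recent interactions receive larger weights.
    Falls back to uniform weights if every retention value underflows to zero.
    """
    if not timestamps:
        return []
    raw = [retention(max(now - t, 0.0), strength) for t in timestamps]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]
```

A larger `strength` flattens the curve, so older interactions keep more of their influence.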

9.
郑萌 《电子科技》2009,33(11):84-87
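Translating English with neural machine translation algorithms is a current research hotspot. Traditional sequence-based neural frameworks capture long-distance information poorly, which limits them considerably, and current improved frameworks such as recurrent neural networks also give unsatisfactory translations. To address the shortcomings of traditional machine translation algorithms, the paper builds an attention-based encoder-decoder model that combines the attention mechanism with a neural network framework, and implements the whole English translation system in TensorFlow, thereby improving translation accuracy. Experimental results show that the BLEU score of the proposed model improves to varying degrees over traditional machine learning algorithms, demonstrating a clear performance gain over traditional models.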
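The core of an attention-based encoder-decoder can be sketched in a few lines. The abstract does not state which attention variant the paper uses, so the example below assumes plain dot-product attention and is written in pure Python rather than TensorFlow; `softmax` and `attend` are illustrative names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(query, keys, values):
    """Dot-product attention: weight each value by how well its key matches the query.

    query: decoder state vector; keys and values: one vector per encoder position.
    Returns the attention weights and the resulting context vector.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context
```

Because the context vector is recomputed at every decoding step, the decoder can draw on any source position directly, which is the property that helps with long-distance information.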

10.
Visual attention for the diagnosis of Autism Spectrum Disorder (ASD), a kind of mental disorder, has attracted the interest of an increasing number of researchers. Although multiple visual attention prediction models have been proposed, the problem remains open. In this paper, considering the shift of visual attention, we propose that an image can be viewed as a pseudo sequence. We further propose a novel visual attention prediction method for ASD with hierarchical semantic fusion (ASD-HSF). Specifically, the proposed model mainly contains a Spatial Feature Module (SFM) and a Pseudo Sequential Feature Module (PSFM). SFM is designed to extract spatial semantic features with a fully convolutional network, while PSFM, implemented with two Convolutional Long Short-Term Memory networks (ConvLSTMs), learns pseudo sequential features. The outputs of these two modules are fused to extract the final saliency map, which includes both spatial semantic information and pseudo sequential information. Experimental results show that the proposed model not only outperforms ten state-of-the-art general saliency prediction counterparts, but also ranks first under four metrics and second under the remaining ones for ASD saliency prediction.

11.
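Reinforcement learning learns a decision policy for a task by interacting with the environment and is characterized by self-learning and online learning. However, the trial-and-error mechanism of interaction often makes the algorithms inefficient and slow to converge. Knowledge embodies human experience and the regularities by which things are understood, and using knowledge to guide the agent's learning is an effective way to address these problems. This paper attempts to introduce qualitative rule knowledge into reinforcement learning: qualitative rules are represented with a cloud reasoning model and used as an exploration strategy to guide the agent's action selection, reducing the blindness of the agent's exploration of the state-action space. OpenAI Gym is used as the test environment, and experiments in a custom CartPole-v2 verify the effectiveness of the proposed cloud-reasoning-based exploration strategy, which improves the learning efficiency of reinforcement learning and speeds up convergence.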

12.
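Inspired by the visual perception mechanism of the human brain, this paper proposes, within a deep learning framework, an action recognition method that fuses spatio-temporal two-stream networks with visual attention. First, a coarse-to-fine Lucas-Kanade estimation method extracts optical flow features of human motion from the video frame by frame. Then, a GoogLeNet network fine-tuned from a pre-trained model convolves layer by layer and aggregates the appearance images and the corresponding optical flow features of the video within a given time window. Next, a multi-layer recurrent long short-term memory network cross-perceives these features to obtain spatio-temporal stream semantic feature sequences containing high-level salient structure, decodes the interdependent hidden states within the time window, and outputs the visual feature description of the spatial stream and the label probability distribution of each frame in the video window. After that, relative entropy is used to compute the attention confidence of each frame along the time dimension, which is fused with the label probability distribution of the sequence perceived by the spatial network stream. Finally, softmax classifies the action category in the video. Experimental results show that, compared with other existing methods, the proposed method has a significant advantage in classification accuracy.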
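One way to read the relative-entropy step is sketched below. The abstract does not say which distributions are compared, so this example assumes that a frame's confidence is the KL divergence of its label distribution from the uniform distribution (a peaked prediction counts as confident) and that frames are fused by a confidence-weighted average. Both choices, and the names `kl_divergence` and `fuse_frames`, are illustrative assumptions rather than the paper's method.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) for discrete distributions, clamped at zero."""
    total = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return max(total, 0.0)  # guard against tiny negative rounding errors


def fuse_frames(frame_probs):
    """Fuse per-frame label distributions using assumed KL-based confidences.

    frame_probs: list of per-frame probability vectors over the same classes.
    Returns (frame weights, fused distribution).
    """
    n_classes = len(frame_probs[0])
    uniform = [1.0 / n_classes] * n_classes
    conf = [kl_divergence(p, uniform) for p in frame_probs]
    total = sum(conf)
    if total == 0.0:  # every frame is uniform: fall back to equal weights
        weights = [1.0 / len(frame_probs)] * len(frame_probs)
    else:
        weights = [c / total for c in conf]
    fused = [sum(w * p[k] for w, p in zip(weights, frame_probs))
             for k in range(n_classes)]
    return weights, fused
```

Under this assumption an uninformative frame with a uniform prediction receives zero weight and does not dilute the fused result.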

13.
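To address the problem that deep neural network models learn only the semantic information of the current coreference chain and ignore the long-term influence of individual coreference chain decisions, a Uyghur personal pronoun anaphora resolution method incorporating deep reinforcement learning is proposed. The method formulates anaphora resolution as a sequential decision process in a reinforcement learning environment, making effective use of antecedent information from previous states to determine the coreference relations of the current chain. It also adopts an optimization strategy based on an overall reward signal; compared with heuristically optimizing particular individual decisions with a loss function, directly optimizing the overall evaluation metric is more efficient. Experiments on a Uyghur dataset show that the method achieves an F-score of 85.80% on Uyghur personal pronoun anaphora resolution, indicating that the deep reinforcement learning model can significantly improve performance on this task.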

14.
Fuzzy inference system learning by reinforcement methods   Total citations: 9, self-citations: 0, citations by others: 9
Fuzzy Actor-Critic Learning (FACL) and Fuzzy Q-Learning (FQL) are reinforcement learning methods based on dynamic programming (DP) principles. In the paper, they are used to tune online the conclusion part of fuzzy inference systems (FIS). The only information available for learning is the system feedback, which describes, in terms of reward and punishment, the task the fuzzy agent has to accomplish. At each time step, the agent receives a reinforcement signal according to the last action it performed in the previous state. The problem involves optimizing not only the immediate reinforcement, but also the total amount of reinforcement the agent can receive in the future. To illustrate the use of these two learning methods, we first applied them to a problem that involves finding a fuzzy controller to drive a boat from one bank to another, across a river with a strong nonlinear current. Then, we used the well-known Cart-Pole Balancing and Mountain-Car problems to compare our methods with other reinforcement learning methods and to focus on important characteristic aspects of FACL and FQL. We found that the genericity of our methods allows us to learn every kind of reinforcement learning problem (continuous states, discrete or continuous actions, various types of reinforcement functions). The experimental studies also show the superiority of these methods over the other related methods found in the literature.
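The trade-off between immediate and future reinforcement is what the standard Q-learning update encodes. The sketch below is the plain tabular form, not the fuzzy variant (FQL) studied in the paper; the dictionary-based table, the function name, and the default `alpha` and `gamma` values are assumptions chosen for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update and return the new Q-value.

    q: dict mapping (state, action) to a value; missing entries count as 0.0.
    alpha: learning rate; gamma: discount applied to future reinforcement.
    Plain tabular Q-learning, shown only to illustrate the principle.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Target = immediate reward plus discounted best future value.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

Setting `gamma` to 0 makes the agent optimize only the immediate reinforcement, while values close to 1 give more weight to what it can collect later.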

15.
Sparse representation has been attracting increasing attention in visual tracking. However, most sparse-representation-based trackers focus only on how to model the target appearance and do not consider how to learn the sparse representation when the training samples are imprecise, and hence may drift or fail in challenging scenes. In this paper, we present a novel online tracking algorithm. The tracker integrates online multiple instance learning into the recent sparse representation scheme. For tracking, an integrated sparse representation combining texture, intensity, and local spatial information is proposed to model the target. This representation takes both occlusion and appearance change into account. Then, an efficient online learning approach is proposed to select the most distinguishable features for separating the target from the background samples. In addition, the sparse representation is dynamically updated online with respect to the current context. Both qualitative and quantitative evaluations on challenging benchmark video sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.

16.
A hierarchically processed framework for an artificial emotion model for intelligent systems is proposed, based on the basic conclusions of emotional psychology. The proposed construction departs from the general approach to emotion processing, which considers only a single layer. An artificial emotional development model is put forward based on the reinforcement learning mechanism of neural networks. The new model takes emotion itself as the reinforcement signal and describes its different influences on action learning efficiency for different individualities. Finally, simulation results based on a child playmate robot are discussed and the effectiveness of the model is verified.

17.
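Current mobile edge computing resource allocation structures are mostly unidirectional and allocate resources inefficiently, which lowers the resource allocation ratio. The paper designs a reinforcement-learning-based resource allocation method for mobile edge computing and verifies its effectiveness experimentally. Based on the current test requirements, resource collection nodes are first deployed; a multi-stage approach is then adopted to raise overall allocation efficiency and build a multi-stage migration resource allocation structure; finally, a reinforcement learning resource allocation model for mobile edge computing is designed, which allocates resources through dynamic assisted cooperative processing. Test results show that, over the 5 selected test periods and after measurement and comparison across 3 allocation groups, the resulting resource allocation ratio reaches 5.5 or higher in every case. This indicates that, with the aid of reinforcement learning, the proposed method is more flexible and adaptable, well targeted, and of practical value.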

18.
19.
Research status and prospects of object tracking methods based on deep learning   Total citations: 1, self-citations: 0, citations by others: 1
罗海波  许凌云  惠斌  常铮 《红外与激光工程》2017,46(5):502002-0502002(7)
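Object tracking is one of the important research directions in computer vision and plays an important role in precision guidance, intelligent video surveillance, human-computer interaction, robot navigation, public safety, and other fields. The basic problem of object tracking is to select a target of interest in a video or image sequence and then, in the subsequent consecutive frames, find the target's exact position and form its motion trajectory. Object tracking is a challenging problem: non-rigid deformation of the target often changes its appearance model, while complex illumination changes, occlusion between the target and the scene, interference from similar objects in the background, and camera shake make the task even harder. In recent years, as deep learning has achieved major breakthroughs in object detection, recognition, and related fields, many researchers have begun to introduce deep learning models into object tracking and have achieved better performance than traditional methods on a series of benchmark datasets, gradually opening a new chapter in the field. The paper first describes the difficulties of the object tracking problem and the basic approaches to solving it; then, according to the different ways in which deep learning algorithms are used to solve the tracking problem, it analyzes the mainstream algorithms of this kind and presents their respective strengths and weaknesses and directions for future work.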

20.
桑海峰  赵子裕  何大阔 《电子学报》2020,48(6):1052-1061
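Visual information in video frames that is unrelated to the action, such as complex backgrounds and lighting conditions, introduces a great deal of redundancy and noise into the spatial features of actions and degrades recognition accuracy to some extent. To address this, the paper proposes a recurrent region attention cell that captures the action-related regional visual information in the spatial features and, based on the temporal nature of video, further proposes a recurrent region attention model. Second, the paper proposes a video frame attention model that highlights the more important frames in the whole action video sequence, reducing the interference that similar contextual dependencies between video sequences of different action classes cause for recognition. Finally, an end-to-end trainable network is proposed: the Recurrent Region Attention and Video Frame Attention based video action recognition Network (RFANet). Experiments on two video action recognition benchmarks, the UCF101 and HMDB51 datasets, show that the proposed end-to-end network RFANet reliably identifies the category of the action in a video. Inspired by the two-stream structure, the paper also builds a two-modality RFANet. Under the same training environment, the two-modality RFANet achieves the best performance on both datasets.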
