Similar Documents
19 similar documents found.
1.
Chu Jinghui, Dong Yue, Lü Wei. Video Engineering, 2014, 38(3): 188-191
Textual information embedded in video is strongly correlated with the video's semantic content; extracting and analyzing this text enables effective understanding of television video semantics and thus supports content-security monitoring of video. For text detection, an effective method is proposed based on the wavelet transform, corner-feature images, and statistical features; a color-space-based text-extraction method then produces a binary image better suited to the subsequent OCR stage.
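As a rough illustration of the kind of pipeline this abstract describes, the sketch below combines wavelet detail energy with a corner-density map to localize candidate text blocks, assuming OpenCV and PyWavelets; the Haar wavelet, Harris parameters, window size, and thresholds are illustrative choices, not the authors' settings.

```python
import cv2
import numpy as np
import pywt

def detect_text_candidates(gray):
    """Locate candidate text blocks via wavelet detail energy plus corner density."""
    # High-frequency wavelet detail coefficients respond strongly to text strokes.
    _, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float32), "haar")
    detail = cv2.resize(np.abs(cH) + np.abs(cV) + np.abs(cD),
                        (gray.shape[1], gray.shape[0]))
    # Corner-feature image: text regions are dense in Harris corners.
    corners = cv2.cornerHarris(gray.astype(np.float32), 2, 3, 0.04)
    corner_map = (corners > 0.01 * corners.max()).astype(np.float32)
    # Combine both cues; keep blocks whose local statistics exceed a threshold.
    score = cv2.boxFilter(detail * corner_map, -1, (16, 16))
    mask = (score > score.mean() + 2 * score.std()).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]
```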

2.
Multimodal techniques make intelligent processing of unstructured data such as text, speech, video, and images possible. Building on research into key multimodal deep-learning techniques such as modality representation, and addressing the operational needs of online audio-visual content supervision, this paper makes a preliminary exploration of applying multimodal techniques to the regulation of online audio-visual content, aiming to improve the accuracy and efficiency of processing large-scale audio-visual data.

3.
All-media news contains multimodal information such as footage, subtitles, speech, and people, whereas existing computer cataloging systems usually process a single modality, so the catalog records they generate are neither comprehensive nor accurate. This paper proposes an intelligent all-media news cataloging system based on multimodal fusion: core algorithms for unified representation and fused understanding of multimodal media content automatically generate key information such as segmentation marks, visual descriptions, keywords, and scene-classification labels, enabling intelligent cataloging of news content across the entire workflow...

4.
At present, most sarcasm-recognition models target text data alone, leaving the image data contained in tweets unused and limiting the accuracy of sarcasm recognition. To address this problem, a joint neural network model with attention, RCBA, is proposed for the multimodal sarcasm-recognition task over mixed image-and-text data. The RCBA model first uses a deep residual network (ResNet101) combined with spatial- and channel-attention mechanisms for adaptive image-feature...
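Since the abstract is truncated, the following is only a generic sketch of combining channel and spatial attention over a CNN feature map (in the spirit of CBAM-style modules), assuming PyTorch; it is not the RCBA architecture itself, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Channel attention followed by spatial attention over a CNN feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        # Channel attention: squeeze spatial dims, weight each channel.
        w = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * w
        # Spatial attention: pool over channels, weight each location.
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(pooled))
```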

5.
Liu Qiang, Zhang Wenying, Chen Enqing. Journal of Signal Processing, 2020, 36(9): 1422-1428
Human action recognition has many applications in human-computer interaction, video content retrieval, and related fields, and is an important research direction in multimedia information processing. Most existing two-stream action-recognition methods apply the same convolutional network to both the RGB and the optical-flow stream, make poor use of multimodal information, and tend to cause network redundancy and confusion between similar actions. In recent years depth video has also been used increasingly for action recognition, but most methods exploit only the spatial information of actions in depth video, not the temporal information. To address these problems, this paper proposes a multimodal action-recognition method based on a heterogeneous multi-stream network. The method first obtains a temporal representation of actions from depth video, namely depth optical flow; it then selects suitable heterogeneous networks for spatio-temporal feature extraction and classification; finally it performs multimodal fusion of the recognition results from RGB data, optical flow extracted from RGB, depth video, and depth optical flow. Experiments on the large, widely used action-recognition dataset NTU RGB+D show that the proposed method outperforms existing state-of-the-art methods.
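A minimal sketch of the final late-fusion step, assuming each of the four streams (RGB, RGB optical flow, depth, depth optical flow) already produces per-class softmax scores; the weighted-average rule and equal weights are assumptions, as the abstract does not specify the fusion operator.

```python
import numpy as np

def fuse_streams(scores, weights=None):
    """Late fusion: weighted average of per-class softmax scores from each stream."""
    scores = np.stack(scores)                   # (num_streams, num_classes)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)
    fused = (np.asarray(weights)[:, None] * scores).sum(axis=0)
    return int(fused.argmax()), fused

# Example with four streams: RGB, RGB optical flow, depth, depth optical flow.
rng = np.random.default_rng(0)
streams = [rng.dirichlet(np.ones(60)) for _ in range(4)]  # NTU RGB+D has 60 classes
label, fused = fuse_streams(streams)
```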

6.
Guo Zhentang, Zhu Yongxin, Tian Li. Laser Journal, 2021, 42(12): 52-58
Using images alone as the data source for vehicle and pedestrian detection cannot determine the objects' spatial positions and yields inaccurate bounding boxes. To address this, a vehicle and pedestrian detection method based on multimodal data is proposed. The preprocessed point cloud is projected onto the image plane; a convolutional neural network detects vehicles and pedestrians in the image; the matching formula proposed by the method and an MLP-based multimodal joint evaluation then fuse the point-cloud and image results. Experiments show that the method obtains accurate spatial positions of vehicles and pedestrians, with 88.5% of target position errors within 1 m; the fused detection boxes improve on YOLOv3 and DeepLabv3 by 2.2% and 3.3%, respectively; and the method is generalizable.
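A minimal sketch of the point-cloud-to-image projection step, assuming a KITTI-style setup with a 4x4 LiDAR-to-camera extrinsic matrix and a 3x3 pinhole intrinsic matrix; the matrix names and camera model are standard assumptions, not taken from the paper.

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project Nx3 LiDAR points to pixel coordinates with a pinhole camera model.

    T_cam_lidar: 4x4 LiDAR-to-camera extrinsic; K: 3x3 camera intrinsic.
    Returns pixel coords of the points in front of the camera, plus their mask.
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])   # homogeneous (N, 4)
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]                   # camera frame (N, 3)
    in_front = cam[:, 2] > 0                                 # drop points behind camera
    uv = (K @ cam[in_front].T).T
    uv = uv[:, :2] / uv[:, 2:3]                              # perspective division
    return uv, in_front
```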

7.
1 Foreword. The content AIGC can generate spans text, images, audio, and video. It can establish correspondences between the symbolic linguistic information or knowledge in text and the visualized information (or knowledge) in images; the two reinforce each other, producing richly illustrated output that stimulates the imagination and expands the space of human thought. The most fundamental of these links is the knowledge association between text (Text) and images (Image). This installment introduces the text-image association and, taking the CLIP model as an example, looks in depth at the behind-the-scenes architecture of multimodal AIGC models.
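A minimal sketch of the text-image association CLIP computes, scoring candidate captions against an image with the openly released openai/clip-vit-base-patch32 checkpoint via the Hugging Face transformers API; the checkpoint, image path, and prompts are illustrative.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                  # any local image
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
# Image-text similarity logits -> probabilities over the candidate captions.
probs = out.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```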

8.
Thanks to the strong feature-extraction and semantic-understanding capabilities of deep convolutional neural networks, semantic segmentation based on deep neural networks has become a hot topic in computer vision research. Autonomous driving, medical imaging, and even virtual interaction and augmented reality all require accurate, efficient semantic segmentation. Semantic segmentation understands an image at the pixel level, assigning each pixel its own class label. This paper surveys deep-neural-network-based semantic segmentation, organizing existing models by their technical characteristics into encoder-decoder architectures, multi-scale feature fusion, convolution optimization, attention mechanisms, hybrid traditional-deep methods, and strategy fusion; it analyzes their strengths and weaknesses, compares the results of current mainstream methods on public datasets, and summarizes the field's open challenges and directions for future research.

9.
Zhang Hao. Audio Engineering, 2022, 46(1): 22-24, 28
Amid the wave of media transformation and development, the storage and management of media assets is especially important as the broadcasting industry moves from traditional to digital workflows. To strengthen media-asset management and analysis and to improve users' retrieval efficiency, Henan Radio and Television built embedded representations of text, image, and video content on deep neural network models, achieving semantic-level retrieval and offering a solution for the development of media-asset content management.
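A minimal sketch of semantic retrieval over embedded content: once text, images, and video are embedded into a shared vector space, cosine similarity ranks assets against a query. The encoder is left abstract and the random vectors are placeholders; this is not the station's actual system.

```python
import numpy as np

def cosine_rank(query_vec, asset_vecs, top_k=5):
    """Rank media assets by cosine similarity between query and asset embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    A = asset_vecs / np.linalg.norm(asset_vecs, axis=1, keepdims=True)
    sims = A @ q
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

# Usage: asset_vecs would come from a shared text/image/video encoder.
asset_vecs = np.random.default_rng(1).normal(size=(1000, 512))
query_vec = np.random.default_rng(2).normal(size=512)
idx, scores = cosine_rank(query_vec, asset_vecs)
```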

10.
Zhou Yitao, Zhang Bin, Liu Zihao. Acta Electronica Sinica, 2022, 50(2): 508-512
To further improve the detection accuracy for application-layer DDoS attacks, a detection model is proposed that combines traffic features with user-behavior features and supports efficient parameter updates. To process the heterogeneous traffic and user-behavior data uniformly, a multimodal deep learning (MDL) neural network extracts deep traffic and user-behavior features from network flows and web logs, which are then fed into an aggregating deep neural network...
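Since the abstract is truncated, the following is only a generic two-branch sketch of the kind of multimodal network it describes, assuming PyTorch; the layer sizes, input dimensions, and concatenation-based aggregation head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchDDoSNet(nn.Module):
    """Separate branches for traffic and user-behavior features, fused for detection."""
    def __init__(self, traffic_dim, behavior_dim, hidden=64):
        super().__init__()
        self.traffic_branch = nn.Sequential(
            nn.Linear(traffic_dim, hidden), nn.ReLU())
        self.behavior_branch = nn.Sequential(
            nn.Linear(behavior_dim, hidden), nn.ReLU())
        # Aggregating head over the concatenated deep features.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))                # benign vs. DDoS

    def forward(self, traffic, behavior):
        z = torch.cat([self.traffic_branch(traffic),
                       self.behavior_branch(behavior)], dim=-1)
        return self.head(z)

net = TwoBranchDDoSNet(traffic_dim=20, behavior_dim=12)
logits = net(torch.randn(8, 20), torch.randn(8, 12))
```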

11.
As social networks grow more popular by the day, large numbers of users are constantly active on them, producing an enormous amount of data. While social networking sites and their dynamic applications are in active use, social network analysis is also attracting increasing interest. Moreover, semantic understanding of the text, images, and video shared on a social network has become a significant topic in network analysis research. To the best of the authors' knowledge, there has been no comprehensive survey of social networks that includes semantic analysis. In this survey, we review over 200 contributions in the field, most of which appeared in recent years. The paper not only provides a comprehensive survey of research on and applications of social network analysis based on semantic analysis but also summarizes the state-of-the-art techniques for analyzing social media data. First, we examine social networks and the basic concepts and components of social network analysis. Second, we explain semantic analysis methods for text, image, and video in social networks and review the related literature. We then discuss emerging approaches in social network analysis research, especially semantic social network analysis. Finally, we highlight trending topics and applications as directions for future research and indicate what kinds of studies may be realized in this area.

12.
Overlay text provides important semantic clues for video content analysis tasks such as video information retrieval and summarization, since inserted text often represents the content of the scene or the editor's intention. Most previous approaches to extracting overlay text from videos are based on low-level features such as edge, color, and texture information, and they struggle with text of varying contrast or text inserted into complex backgrounds. In this paper, we propose a novel framework to detect and extract overlay text from video scenes. Based on our observation that transient colors exist between inserted text and its adjacent background, a transition map is first generated. Candidate regions are then extracted by a reshaping method, and the overlay text regions are determined from the occurrence of overlay text in each candidate. The detected overlay text regions are localized accurately using the projection of overlay text pixels in the transition map, and text extraction is finally conducted. The proposed method is robust to different character sizes, positions, contrasts, and colors, and it is language independent. Overlay text regions are also updated between frames to reduce processing time. Experiments on diverse videos confirm the efficiency of the proposed method.
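A loose sketch of the transition-map intuition: the transient colors around overlay text strokes appear as paired opposite-sign intensity changes within a short horizontal window. The operator below is an illustrative stand-in, not the paper's exact formulation.

```python
import numpy as np

def transition_map(gray, window=3, thresh=40):
    """Flag pixels where intensity rises then falls (or vice versa) within a window,
    a signature of the transient colors around overlay text strokes."""
    g = gray.astype(np.int32)
    diff = np.zeros_like(g)
    diff[:, 1:] = g[:, 1:] - g[:, :-1]            # horizontal intensity change
    tmap = np.zeros(g.shape, dtype=bool)
    for d in range(1, window + 1):
        up_then_down = (diff[:, :-d] > thresh) & (diff[:, d:] < -thresh)
        down_then_up = (diff[:, :-d] < -thresh) & (diff[:, d:] > thresh)
        tmap[:, :-d] |= up_then_down | down_then_up
    return tmap
```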

13.
A Text Detection Method Based on Fuzzy Homogeneity Mapping
Text in video images is highly effective information that describes image content at the semantic level, and text detection is a precondition for semantics-based image retrieval. This paper proposes a text-detection method that combines fuzzy logic with homogeneity mapping. First, the original image is fuzzified using the maximum-information-entropy criterion; then an image homogeneity measure is constructed from edge and texture information and used to map the image into a fuzzy homogeneity space; finally, text regions are detected in that space through texture analysis. Compared with text-detection methods that extract features directly in the image's spatial domain, this method detects text more effectively in video images with complex backgrounds and applies to text detection in many types of video images.
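A minimal sketch of the first step, fuzzifying an image around a maximum-entropy threshold (Kapur's criterion); the membership function used here is an illustrative choice, since the abstract does not give the paper's exact formulation.

```python
import numpy as np

def max_entropy_threshold(gray):
    """Kapur's maximum-entropy threshold over a uint8 grayscale histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_h = 128, -np.inf
    for t in range(1, 256):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 < 1e-9 or p1 < 1e-9:
            continue
        q0 = p[:t][p[:t] > 0] / p0                # class-conditional distributions
        q1 = p[t:][p[t:] > 0] / p1
        h = -(q0 * np.log(q0)).sum() - (q1 * np.log(q1)).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

def fuzzify(gray):
    """Membership in [0, 1]: how close each pixel is to the entropy threshold."""
    t = max_entropy_threshold(gray)
    return 1.0 / (1.0 + np.abs(gray.astype(float) - t) / (gray.std() + 1e-8))
```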

14.
Most semantic video search methods use text-keyword queries or example video clips and images, but such methods have limitations. To address the problems of example-based video search and avoid specialized models, we conduct semantic video search with a reranking method that automatically reorders the initial text-search results based on visual cues and associated context. We developed two general reranking methods that exploit recurrent visual patterns across many contexts, such as the images or video shots returned by initial text queries, and video stories from multiple channels.
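A minimal sketch of reranking by recurrent visual patterns: a shot that visually resembles many other top-ranked results receives a recurrence bonus that is mixed with its initial text score. The cosine similarity measure and mixing weight alpha are illustrative assumptions.

```python
import numpy as np

def rerank_by_visual_recurrence(text_scores, visual_feats, alpha=0.5):
    """Reorder initial text-query results using recurrence of visual patterns."""
    F = visual_feats / np.linalg.norm(visual_feats, axis=1, keepdims=True)
    sim = F @ F.T                               # pairwise visual similarity
    np.fill_diagonal(sim, 0.0)
    # A shot resembling many other top-ranked shots gets a recurrence bonus.
    recurrence = sim.mean(axis=1)
    combined = alpha * np.asarray(text_scores) + (1 - alpha) * recurrence
    return np.argsort(-combined)
```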

15.
On the social Web, the amount of video content, whether originating from wireless devices or received from media servers, has increased enormously in recent years. This astounding growth of Web videos has stimulated researchers to propose new strategies for organizing them into their respective categories. Because of the complex ontology and the large variation in the content and quality of Web videos, it is difficult to obtain sufficient, precisely labeled training data, which hinders automatic video classification. In this paper, we propose a novel content- and context-based Web video classification framework that renders external support through category discriminative terms (CDTs) and a semantic relatedness measure (SRM). A three-step framework is proposed. First, content-based video classification is performed, leveraging a twofold use of high-level concept detectors: category classifiers induced from VIREO-374 detectors are trained to classify Web videos, and the high-confidence concept detectors for each video are then mapped to CDTs through an SRM-assisted semantic content fusion function to further boost the category classifiers, which intuitively provides a more robust measure for Web video classification. Second, context-based video classification is performed, again harnessing contextual information in two ways: cosine similarity and then semantic similarity are measured between the text features of each video and the CDTs through a vector space model (VSM)- and SRM-assisted semantic context fusion function, respectively. Finally, the content and context classification results are fused to compensate for each other's shortcomings, enhancing video classification performance. Experiments on a large-scale video dataset validate the effectiveness of the proposed solution.
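A minimal sketch of the first context measure: VSM cosine similarity between a video's text features and each category's discriminative terms. The toy CDT lists and whitespace tokenization are illustrative; the paper's SRM-based semantic similarity step is not shown.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy category discriminative terms (CDTs) and a video's textual metadata.
cdts = {"sports": Counter(["match", "goal", "league", "player"]),
        "music": Counter(["song", "album", "concert", "band"])}
video_text = Counter("the player scored a late goal in the league match".split())
scores = {cat: cosine(video_text, cdt) for cat, cdt in cdts.items()}
```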

16.
With the rapid development of new media technology, traditional methods can no longer accurately express complex knowledge structures with artificial-intelligence attributes, and cross-media processing has become a focus of attention. Media data perception and analysis is shifting from single modalities such as text, speech, images, and video toward cross-media fusion spanning cyberspace and physical space. Studying a cross-media perception and analysis technology system that meets the requirements of the new-generation AI development plan, and drawing on knowledge graphs, long short-term memory networks, and convolutional neural networks to implement multi-channel web data crawling, unified entity representation, semantic text recognition, and video/image classification, can effectively support cross-media applications in public-opinion analysis, news tracking, and intelligence gathering.

17.
A Multi-Feature Convolutional Neural Network Method for Semantic Segmentation of Road-Scene 3D Point Clouds
To address the low accuracy of semantic segmentation of 3D laser point clouds in road scenes, an end-to-end segmentation method is proposed that combines a convolutional neural network with multiple geometric point-cloud features. First, feature images of point range, adjacent angle, and surface curvature are constructed by spherical projection so that a convolutional neural network can be applied; the network then performs semantic segmentation on the multi-feature images, producing pixel-level results. Folding traditional point-cloud features into the convolutional network improves the segmentation results. Tests on the KITTI point-cloud dataset show that the proposed multi-feature method outperforms segmentation methods such as SqueezeSeg V2 that do not incorporate point-cloud features; compared with SqueezeSeg V2, its precision for vehicles, cyclists, and pedestrians improves by 0.3, 21.4, and 14.5 percentage points, respectively.
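A minimal sketch of the spherical projection that turns a point cloud into a range image, the first of the feature images the method constructs; the 64x512 resolution and vertical field of view are typical KITTI-style assumptions, not the paper's settings.

```python
import numpy as np

def spherical_range_image(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Project Nx3 LiDAR points to an HxW range image via spherical coordinates."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                         # azimuth
    pitch = np.arcsin(z / np.maximum(r, 1e-8))     # elevation
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    u = ((1 - (pitch - fd) / (fu - fd)) * H).astype(int).clip(0, H - 1)
    v = ((0.5 * (1 + yaw / np.pi)) * W).astype(int).clip(0, W - 1)
    img = np.zeros((H, W), dtype=np.float32)
    img[u, v] = r                # range channel; angle/curvature images are analogous
    return img
```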

18.
Zhang Liang, Zhou Changsheng. Electronic Science and Technology, 2011, 24(10): 111-114
This paper analyzes the differences between video data and text data and the problems video data poses for video analysis and retrieval. Starting from the research hotspots in video content analysis, it analyzes and compares techniques for video semantic libraries, the low-level video features relevant to video analysis, video-object segmentation and recognition, and video information description and coding, and it proposes a framework and workflow for video semantic analysis.

19.
Zhang Xinshu, Guo Ge, Cheng Juan. Electronic Technology, 2010, 47(4): 22-24
This paper proposes a new approach to analyzing the semantic information of video text: after text regions are extracted, script identification is applied to derive high-level semantic information such as the source and identity of a news video, and the script-identification result also provides prior knowledge for selecting an OCR engine. The main contributions are: (1) for captions in video, a temporal-spatial analysis algorithm is proposed to detect subtitles, which are then localized, enhanced, and binarized through projection analysis; (2) for the extracted text regions, a script-identification algorithm based on PCA and the wavelet transform is proposed.
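A minimal sketch of a PCA-plus-wavelet feature pipeline for script identification, assuming PyWavelets and scikit-learn; the sub-band energy features, PCA dimensionality, and k-NN classifier are illustrative stand-ins for the authors' algorithm.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def wavelet_features(patch):
    """Energy statistics of wavelet sub-bands as a texture descriptor of a text patch."""
    _, (cH, cV, cD) = pywt.dwt2(patch.astype(float), "db2")
    return np.array([np.mean(np.abs(c)) for c in (cH, cV, cD)]
                    + [np.std(c) for c in (cH, cV, cD)])

def train_script_classifier(patches, labels, n_components=4):
    """PCA-reduce wavelet features, then fit a simple k-NN script classifier."""
    X = np.stack([wavelet_features(p) for p in patches])
    pca = PCA(n_components=n_components).fit(X)
    clf = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(X), labels)
    return pca, clf
```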
