Similar Articles
Found 20 similar articles (search time: 0 ms)
1.
Multimedia Tools and Applications - Television news is an important medium for conveying information to the masses. This motivates several stakeholders to monitor and analyze the news broadcasts....

2.
The paper proposes measures for the weighted indexing of sports news videos. Content-based analyses of sports news videos classify frames or shots into sports categories, and the set of categories reported in a given news video can serve as its representation in a visual information retrieval system. However, this approach does not account for how many events of a given category were reported or for how long they were presented to viewers. Weighting the sports categories in a video representation to reflect their importance within a given video, or across a whole video database, is therefore desirable. The effects of applying the proposed measures are demonstrated on a test video collection. Experiments and evaluations on this collection also show that perfect content-based analysis is not needed for proper weighted indexing of sports news videos: it suffices to recognize the content of only some frames and to determine the number of shots, scenes, or pseudo-scenes detected in the temporal aggregation process, or even only the number of events of a given sports category in the video being indexed.
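The airtime-based weighting idea can be sketched in a few lines. This is an illustrative assumption about the measure, not the paper's actual formula: each category's weight is its share of total reported airtime, given hypothetical per-shot `(category, duration)` annotations.

```python
from collections import defaultdict

def category_weights(shots):
    """Weight each sports category by its share of total airtime.

    shots: list of (category, duration_seconds) pairs from content analysis.
    Returns a dict mapping category -> weight in [0, 1], summing to 1.
    """
    totals = defaultdict(float)
    for category, duration_s in shots:
        totals[category] += duration_s
    grand = sum(totals.values())
    return {c: t / grand for c, t in totals.items()}

# A video with two soccer reports (120 s total) and one tennis report (40 s)
shots = [("soccer", 90.0), ("soccer", 30.0), ("tennis", 40.0)]
print(category_weights(shots))  # soccer dominates the representation
```

Counting events or shots per category instead of durations, as the abstract suggests, would only change what is accumulated in `totals`.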

3.
4.
5.
This paper proposes an automatic story segmentation method for Mandarin broadcast news based on subword chains. Exploiting characteristics of Chinese such as its many homophones, open vocabulary, varied word segmentation, and flexible word formation, the method builds subword chains from Chinese subword units (characters and syllables) over the speech-recognition transcripts of news broadcasts and uses them to segment the broadcasts into stories. This effectively resolves the matching failures between related words that afflict traditional word-chain methods when speech recognition errs, particularly on out-of-vocabulary words. In addition, exploiting the complementarity between lexical representation units at different levels, such as the semantic precision of words and the robustness of subwords to recognition errors, the method fuses the different levels to further improve Mandarin story segmentation performance. On the TDT2 standard Mandarin broadcast news
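The lexical-cohesion intuition behind subword chains can be illustrated with a toy measure: where character-bigram overlap between adjacent transcript windows drops, a story boundary is more likely. The Jaccard measure and window contents below are assumptions for illustration, not the paper's method.

```python
def char_bigrams(text):
    """Character bigrams of a transcript window (subword units)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def cohesion(left, right):
    """Jaccard overlap of character bigrams between two adjacent windows.

    Low cohesion hints at a story boundary; characters still match even
    when the ASR system picked a wrong homophonous word.
    """
    a, b = char_bigrams(left), char_bigrams(right)
    return len(a & b) / max(1, len(a | b))

# Windows sharing a prefix overlap partially; identical windows score 1.0
print(cohesion("abcdabcd", "abcdxyzw"))
```

Real subword chains also track chain length and recurrence, but the boundary decision is driven by the same kind of cross-window overlap statistic.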

6.
7.
We study the broadcast scheduling problem, in which clients send requests to a server in order to receive files available on it; the server may schedule broadcasts so that a single transmission satisfies several requests. When files are transmitted over computer networks, fragmenting them before broadcast adds scheduling flexibility that allows per-user response time to be optimized. The broadcast scheduling algorithm must then determine the number of segments for each file and their transmission order in each round. In this paper, we derive a closed-form formula that approximates the optimal number of segments per file, aiming to minimize the total response time of requests. The formula is a function of several parameters, including those of the underlying network and of the requests arriving at the server. Based on this approximation, we propose a file broadcast scheduling algorithm whose total response time closely matches the optimum. Extensive simulation and numerical study confirm the high accuracy of the analytical approximation. We also investigate the impact of the headers that different network protocols add to each file segment. The segmentation approach is examined for file sizes ranging from 100 KB to 1 GB; over this range it deviates from the optimal total response time by 13% on average, and its accuracy improves as file size grows. The proposed segmentation also yields high goodput for the scheduling algorithm.
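The trade-off the abstract describes, where more segments buy scheduling flexibility but each segment carries protocol header overhead, can be made concrete with a toy cost model. The cost function, constants, and logarithmic flexibility benefit below are invented for illustration and are not the paper's closed-form formula.

```python
import math

def best_segments(file_size, max_segments=64, header=64, flexibility_gain=1000):
    """Pick the segment count minimizing a toy cost:

    bytes on the wire (payload + per-segment headers) minus a
    diminishing flexibility benefit modeled as flexibility_gain * ln(n).
    """
    def cost(n):
        return file_size + n * header - flexibility_gain * math.log(n)
    return min(range(1, max_segments + 1), key=cost)

print(best_segments(100_000))  # → 16 under these toy constants
```

Note how a larger `header` pushes the optimum toward fewer segments, mirroring the abstract's observation that protocol headers matter to the segmentation choice.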

8.
9.
10.
11.
This paper targets the problem of automatic semantic indexing of news videos by presenting a video annotation and retrieval system that automatically annotates news video archives and provides access to them via these annotations. The system relies on video texts as its information source and applies several information extraction techniques to those texts to derive representative semantic information about the underlying videos: named entity recognition, person entity extraction, coreference resolution, and semantic event extraction. Beyond the information extraction components, the system also encompasses modules for news story segmentation, text extraction, and video retrieval, together with a news video database, making it a full-fledged system for practical settings. The system is generic, employing a wide range of techniques to automate semantic video indexing and to bridge the semantic gap between what can be automatically extracted from videos and what people perceive as video semantics. Based on the proposed design, a novel automatic semantic annotation and retrieval system is built for Turkish and evaluated on a broadcast news video collection, with a satisfactory overall performance providing evidence of its feasibility and convenience for news videos.
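One pipeline stage, named entity recognition over video texts, can be illustrated with a deliberately naive capitalization heuristic. Real systems like the one described use trained extractors; the regex below is only a stand-in to show the input/output shape of the stage.

```python
import re

def naive_entities(text):
    """Group runs of consecutive capitalized words into candidate entities.

    A crude stand-in for trained NER: "Angela Merkel" becomes one
    candidate, lowercase function words break the run.
    """
    return re.findall(r"(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*", text)

print(naive_entities("Prime Minister met Angela Merkel in Berlin"))
```

Downstream stages (coreference resolution, event extraction) would then link these candidates across the transcript of a story segment.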

12.
This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues used effectively in English story segmentation deserve re-investigation, since the lexical tones of Mandarin may complicate the expression of pitch declination and reset. Our data-oriented study shows that story boundaries cannot be clearly discriminated from utterance boundaries by speaker-normalized pitch features, owing to their large variation across Mandarin syllable tones. We therefore propose speaker- and tone-normalized pitch features that provide clear separation between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective at separating story boundaries from utterance boundaries, while speaker-normalized speech energy and syllable duration are not. Experiments using decision trees for story boundary detection reinforce the difference between English and Chinese: speaker- and tone-normalized pitch features should be favored in Mandarin story segmentation. We show that combining the different prosodic cues achieves a very high F-measure of 93.04%, owing to the complementarity of pause, pitch, and energy. Analysis of the decision tree uncovered five major heuristics showing how speakers jointly use pause duration and pitch to separate speech into stories.
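The speaker- and tone-normalization the abstract argues for can be sketched as a z-score computed within each (speaker, tone) group, so pitch is compared only against the same speaker producing the same lexical tone. The grouping keys and data layout are illustrative assumptions, not the paper's exact implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_pitch(samples):
    """Speaker- and tone-normalized pitch via per-group z-scores.

    samples: list of (speaker, tone, f0_hz) observations.
    Returns the z-scored f0 for each sample, normalized within its
    (speaker, tone) group so tone-driven pitch variation cancels out.
    """
    groups = defaultdict(list)
    for spk, tone, f0 in samples:
        groups[(spk, tone)].append(f0)
    # fall back to std 1.0 for degenerate single-valued groups
    stats = {k: (mean(v), pstdev(v) or 1.0) for k, v in groups.items()}
    return [(f0 - stats[(s, t)][0]) / stats[(s, t)][1] for s, t, f0 in samples]

print(normalize_pitch([("spkA", 1, 100.0), ("spkA", 1, 120.0), ("spkA", 2, 200.0)]))
```

After this normalization, a large residual pitch reset at a candidate boundary is evidence of a story boundary rather than a tone change.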

13.

Extraction of news text captions aims at a digital understanding of what is happening in a specific region during a certain period; because plain text is easily translated from one language to another, it helps communication between different nations. Moving text captions cause blurring, a significant source of text-quality impairment on news channels. Most existing text caption detection models fail to capture the different dynamic motions of captions, gather a full news story across several frames in the sequence, resolve the blurring effect of text motion, offer a language-independent model, or provide an end-to-end solution for the community to use. We process the frames arriving in sequence and extract edge features using either the Hough transform or our color-based technique. We verify text existence using a pre-trained Convolutional Neural Network (CNN) text detection model, and analyze the caption motion status using a hybrid of a pre-trained Recurrent Neural Network (RNN) of the Long Short-Term Memory (LSTM) type and a correlation-based model. When the motion is determined to be horizontal rotation, two problems arise. First, the text scrolls without stopping, producing a strong blurring effect that degrades text quality and consequently lowers character recognition accuracy. Second, successive news stories are separated by the channel logo or long spaces. We solve the first problem by deblurring the text image using either the Bicubic Spline Interpolation (BSI) technique or a Denoising Autoencoder Neural Network (DANN). We solve the second using a Point Feature Matching (PFM) technique to match the on-screen channel logo against a ground-truth database of channel logos. We evaluate our framework using the Abbyy® SDK as a standalone text recognition tool supporting different languages.
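The correlation-based motion analysis can be illustrated in one dimension: estimate the horizontal shift of a scrolling caption by finding the offset at which two successive scanlines agree best. Real systems correlate full frame regions; the binary scanlines and the agreement score below are simplifying assumptions.

```python
def estimate_shift(prev_row, cur_row, max_shift=5):
    """Estimate horizontal caption motion between two binary scanlines.

    Tries each candidate leftward shift and returns the one maximizing
    pixel agreement; a nonzero result indicates a scrolling caption.
    """
    def score(shift):
        return sum(a == b for a, b in zip(prev_row[shift:], cur_row))
    return max(range(max_shift + 1), key=score)

row = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]
moved = row[2:] + [0, 0]            # same content scrolled left by 2 pixels
print(estimate_shift(row, moved))   # → 2
```

A persistent nonzero shift across many frame pairs would trigger the horizontal-rotation branch of the pipeline (deblurring plus logo-based story separation).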


14.
Given a segmentation result (an alpha matte or a binary mask) for the previous frame, foreground prediction is the process of estimating the probability that each pixel in the current frame belongs to the foreground. It plays a very important role in bilayer segmentation of videos, especially videos with non-static backgrounds. In this paper, a new foreground prediction algorithm called opacity propagation is proposed. It propagates the opacity values of the previous frame to the current frame by minimizing a cost function built on the assumption of spatiotemporally local color smoothness in the video. Optical flow and probability density estimation based on a local color model are employed to find corresponding pixels in adjacent frames. An OPSIC (opacity propagation with sudden illumination changes) algorithm is also proposed; it improves on the opacity propagation algorithm by adding a simple color transformation model and, as far as we know, is the first algorithm that can predict the foreground accurately when the illumination changes suddenly. The opacity map (OM) generated by the opacity propagation algorithm is usually more accurate than the previously used probability map (PM). Experiments demonstrate the effectiveness of our algorithm.

15.

Video-text retrieval, a multimodal retrieval technique widely applied in everyday life, has attracted increasing research attention. Recently, most video-text work has improved cross-modal text-video retrieval by exploiting the vision-language matching relations learned by large-scale pre-trained models. However, these methods ignore the fact that both video and text data are composed of individual events. Capturing the fine-grained similarity relations between video events and text events would help the model compute more accurate semantic similarity between text and video, and thereby improve cross-modal retrieval. This paper therefore proposes a CLIP-based multi-event representation generation method for video-text retrieval (CLIPMERG). First, the video encoder (ViT) and text encoder (Transformer) of the large-scale image-text pre-trained model CLIP convert the video and text into a sequence of video-frame tokens and a sequence of word tokens, respectively. Then, a video event generator (and, correspondingly, a text event generator) transforms the frame token sequence (word token sequence) into k video event representations (k text event representations). Finally, the semantic similarity between video and text is defined by mining the fine-grained relations between the video and text event representations. Experimental results on three widely used public video-text retrieval datasets, MSR-VTT, DiDeMo, and LSMDC, show that the proposed CLIPMERG outperforms existing video-text retrieval methods.
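The fine-grained matching step can be sketched by scoring a video-text pair with the best cosine similarity over all event-representation pairs. The max-pooling aggregation and toy vectors below are assumptions for illustration; CLIPMERG's actual relation mining may aggregate differently.

```python
import math

def cos(a, b):
    """Cosine similarity between two event-representation vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def event_similarity(video_events, text_events):
    """Score a video-text pair by its best-matching event pair (max-pool)."""
    return max(cos(v, t) for v in video_events for t in text_events)

# k=2 video events, k=1 text event: the second video event matches exactly
print(event_similarity([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]))
```

Retrieval then ranks all videos against a text query by this pairwise score, instead of a single pooled clip embedding.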


16.
News segmentation based on multimodal text, audio, and video information   (Cited by 1: 0 self-citations, 1 external)
This paper proposes an automatic television news segmentation scheme that fuses multimodal text, audio, and video features. Taking the characteristics of each medium fully into account, it pre-segments the text using a vector-space model and a GMM, the audio using spectrograms and an HMM, and the video using improved histograms and an SVM classifier. After temporal synchronization, an ANN fuses the pre-segmented data under a composite strategy, yielding video segments with coherent semantic content. Experimental results demonstrate the effectiveness of the method; the resulting segments carry relatively complete semantic information and avoid over-fragmentation.
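A minimal late-fusion step standing in for the ANN fusion stage might combine per-modality boundary scores after time alignment. The linear weights and threshold below are invented for illustration; the described system learns the fusion with a neural network.

```python
def fuse_scores(text_s, audio_s, video_s, weights=(0.4, 0.3, 0.3), thr=0.5):
    """Fuse time-aligned per-modality boundary scores into decisions.

    Each list holds one boundary score in [0, 1] per candidate point;
    a weighted sum over the modalities is thresholded into True/False.
    """
    fused = [weights[0] * t + weights[1] * a + weights[2] * v
             for t, a, v in zip(text_s, audio_s, video_s)]
    return [s >= thr for s in fused]

# Text and audio agree strongly on a boundary; video is uncertain
print(fuse_scores([0.9], [0.8], [0.2]))  # → [True]
```

The benefit over any single modality is that two weak but agreeing cues can outvote one confident but wrong one, which is why fused segments are less over-fragmented.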

17.
Multimedia Tools and Applications - Salient object segmentation in videos is generally broken up into a video segmentation part and a saliency assignment part. Recently, object proposals, which are...

18.
19.
An explosive growth in the volume, velocity, and variety of the data available on the Internet has recently been witnessed. Data originating from mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises, cameras, software logs, and health records poses one of the most challenging research issues of the big-data era. This paper introduces Knowle, an online news management system built upon the semantic link network model. Knowle is a news-event-centric data management system whose core elements are news events on the Web, linked by their semantic relations. It is a hierarchical data system with three layers: concepts at the bottom, resources in the middle, and events at the top. The basic building blocks of the Knowle system are presented: news collection, resource representation, semantic relation mining, and the semantic linking of news events. Knowle does not require data providers to follow semantic standards such as RDF or OWL; it is a semantics-rich, self-organized network that reflects the various semantic relations among concepts, news, and events. Moreover, in a case study, Knowle is used to organize and mine health news, showing its potential to form the basis of a big-data-analytics-based innovation framework in the health domain.
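The three-layer structure (concepts, resources, events) connected by typed semantic links can be sketched as a tiny triple store. The node names, relation labels, and API are hypothetical; the real Knowle system mines and maintains these links automatically.

```python
class SemanticLinkNetwork:
    """Toy store of (source, relation, target) semantic links."""

    def __init__(self):
        self.links = []

    def link(self, source, relation, target):
        self.links.append((source, relation, target))

    def related(self, node):
        """All targets reachable from a node in one hop."""
        return [t for s, _, t in self.links if s == node]

# Concept layer -> resource layer -> event layer (illustrative names)
net = SemanticLinkNetwork()
net.link("flu", "mentioned_in", "article_17")
net.link("article_17", "reports", "event:flu_outbreak")
print(net.related("flu"))  # → ['article_17']
```

Layering falls out of the link types: following `mentioned_in` then `reports` walks from a concept up to the event layer, which is how event-centric queries over health news would be answered.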

20.
Objective: Image semantic segmentation based on fully convolutional networks has become the mainstream research direction in the field. However, in this framework, repeated downsampling of the feature maps progressively reduces image resolution, causing small objects to be lost, edges to become coarse, and segmentation results to deteriorate. To address or mitigate this problem, an image semantic segmentation method based on feature-map slicing is proposed. Method: The method consists of two main operations: slicing the intermediate-layer feature maps, and a corresponding feature extraction step. The slicing module divides an intermediate feature map into several equal parts and upsamples each part to the original feature-map size, increasing the resolution of each sliced region. Each sliced feature map then passes through a parameter-shared feature extraction module, whose multi-scale convolutions and attention mechanism effectively exploit the contextual and discriminative information of each tile, focusing on small objects in local regions and improving their discriminability. The extracted features are further fused with the network's original output, enabling more efficient reuse of intermediate-layer features and clearly improving small-object recognition and localization, edge refinement, and the network's semantic discriminability. Results: Validation experiments on two urban road datasets, CamVid and GATECH, demonstrate the effectiveness of the method, which achieves a mean intersection-over-union of 66.3% on CamVid and 52.6% on GATECH. Conclusion: The feature-map slicing method makes better use of the spatial distribution information of the image, strengthens the network's ability to determine semantic categories at different spatial positions and its attention to small objects, provides more effective contextual and global information, improves the network's discrimination of small objects, and improves overall segmentation performance.
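The slicing-and-upsampling operation can be sketched on a plain 2-D list standing in for a feature map: split it into four equal tiles and nearest-neighbor upsample each tile back to a larger size, so each region is processed at effectively higher resolution. The 2×2 split and nearest-neighbor interpolation are simplifying assumptions for a real framework's tensor ops.

```python
def slice_map(fmap):
    """Split a 2-D feature map (list of rows) into four equal tiles."""
    h, w = len(fmap), len(fmap[0])
    hh, hw = h // 2, w // 2
    return [[row[c:c + hw] for row in fmap[r:r + hh]]
            for r in (0, hh) for c in (0, hw)]

def upsample2x(tile):
    """Nearest-neighbor 2x upsampling: duplicate each value and each row."""
    out = []
    for row in tile:
        doubled = [v for v in row for _ in (0, 1)]
        out += [doubled, list(doubled)]
    return out

tiles = slice_map([[1, 2], [3, 4]])
print([upsample2x(t) for t in tiles])
```

In the described method, each upsampled tile would then go through the shared multi-scale convolution and attention module before being fused with the network's original output.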


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号