Text in videos contains rich semantic information that is useful for content-based video understanding and retrieval. Although many state-of-the-art methods have been proposed to detect text in images and videos, few works focus on spatiotemporal text localization in videos. In this paper, we present a spatiotemporal text localization method with improved detection efficiency and performance. Concretely, we propose a unified framework consisting of a sampling-and-recovery model (SaRM) and a divide-and-conquer model (DaCM). SaRM exploits the temporal redundancy of text to increase detection efficiency for videos. DaCM is designed to efficiently localize text in the spatial and temporal domains simultaneously. In addition, we construct a challenging video overlaid-text dataset named UCAS-STLData, which contains 57070 frames with spatiotemporal ground truths. In the experiments, we comprehensively evaluate the proposed method on publicly available overlaid-text datasets and on UCAS-STLData. Compared with state-of-the-art methods for spatiotemporal text localization, the method achieves a slight performance improvement together with a significant efficiency improvement.


TV news channels present a rich and complete experience of various events through audio-visual content. This makes television news an influential medium for affecting the masses and has prompted social scientists and regulators to monitor and analyze the content of broadcast videos. An organized archive of newscasts is a prerequisite for any such analysis. Creating such an archive requires segmenting continuous news videos into suitable logical units. Depending on the application, these logical units may be the channel content obtained after advertisement removal, individual shows, news stories, or video shots. In this work, we propose an end-to-end system with a software architecture for segmenting TV broadcast videos at all four of these granularities. The videos are first segmented into shots, which serve as the basic unit for all further processing. The shots are subjected to advertisement detection and removal to obtain the non-commercial channel content. This channel content is further processed to identify program boundaries. We propose to identify three types of shows based on presentation format, viz. news bulletins, interviews, and debates. The news bulletins so obtained are processed further to obtain news stories. We propose a modular and scalable framework and software architecture for deploying the broadcast segmentation system on a computation cluster. It comprises a scheduler-based recording module and a broadcast segmentation module. We present the detailed software architecture of the individual modules and the automation of the entire processing pipeline, along with the resource and database management systems. We implemented and verified the software architecture by deploying the proposed system on a cluster of nine desktops and one workstation. The deployed system was used for round-the-clock processing of three Indian English news channels.


We consider the problem of localizing and segmenting objects in weakly labeled video. A video is weakly labeled if it is associated with a tag (e.g., YouTube videos with tags) describing the main object present in the video. It is weakly labeled because the tag only indicates the presence or absence of the object and does not give its detailed spatial or temporal location in the video. Given a weakly labeled video, our method can automatically localize the object in each frame and segment it from the background. Our method is fully automatic and does not require any user input. In principle, it can be applied to a video of any object class. We evaluate our proposed method on a dataset with more than 100 video shots. Our experimental results show that our method outperforms other baseline approaches.

Soccer is the most popular sport around the world, and automatic processing of soccer images is a valuable alternative to manual solutions given the explosive growth of soccer videos. This paper addresses a new multi-player detection algorithm for far-view frames as an initial step toward a wide range of applications, such as player tracking. In the proposed detector, a two-step blob detection (grass-based blob detection followed by edge-based blob detection) is combined with an efficient search mechanism based on particle swarm optimization (PSO) by assigning sub-swarms to each detected blob. Then, a sub-swarm is initialized and tripled to search for three models corresponding to the two teams and the referee. As a result, the most player-like regions in the detected blobs are searched simultaneously by all sub-swarms flying through the solution space, which extends the scope from single-player detection to multi-player detection. Experimental results demonstrate the efficiency and robustness of the algorithm.

Multimedia Tools and Applications - Television news is an important medium to convey information to masses. This motivates several stakeholders to monitor and analyze the news broadcasts....

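Detecting GSM broadcast frequencies plays a very important role in both GSM handset localization and monitoring. This paper introduces a method for detecting GSM broadcast frequencies. In this method, digital down-conversion is used to sample the radio signal on two channels, I and Q; an FFT is applied to the sampled data to detect the FCCH channel, and the GSM broadcast frequency is thereby identified.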
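As a rough illustration of the FFT-based detection step, the following Python sketch checks whether a block of complex I/Q samples is dominated by a single spectral line, which is how an FCCH burst is commonly described as appearing at baseband. It is not the authors' implementation: the function names, the naive DFT used in place of an FFT, and the 0.8 energy-ratio threshold are all assumptions made for this example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for a list of complex I/Q samples (naive O(n^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]


def detect_tone(samples, ratio_threshold=0.8):
    """Return (is_tone, peak_bin).

    is_tone is True when a single DFT bin holds at least ratio_threshold
    of the total spectral energy. The threshold is a hypothetical value
    chosen for illustration, not one taken from the paper.
    """
    mags = dft_magnitudes(samples)
    energy = sum(m * m for m in mags)
    if energy == 0:
        return False, None
    peak_bin = max(range(len(mags)), key=lambda k: mags[k])
    return mags[peak_bin] ** 2 / energy >= ratio_threshold, peak_bin
```

In a scanner built along these lines, each candidate carrier would be down-converted and passed through `detect_tone` block by block; a carrier on which tone-like blocks recur would be treated as a broadcast-frequency candidate. A real receiver would use an FFT library and a threshold tuned to the noise level.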

Determining the pose (position and orientation) of a vehicle at any time is termed localization and is of paramount importance in achieving reliable and robust autonomous navigation. Knowing the pose makes it possible to carry out high-level tasks such as path planning. This article presents a new map-based algorithm for localizing vehicles that operate in harsh outdoor environments. A map-building algorithm that uses observations from a scanning laser rangefinder is developed to construct a polyline map that adequately captures the geometry of the environment. Using this map, the Iterative Closest Point (ICP) algorithm is employed to match laser range images from the rangefinder to the polyline map. Once correspondences are established, an Extended Kalman Filter (EKF) provides reliable vehicle state estimates using a nonlinear observation model based on the vertices of the polyline map. Data gathered during field trials in an outdoor environment are used to test the efficiency of the proposed ICP-EKF algorithm in localizing a four-wheel-drive (4WD) vehicle. © 2005 Wiley Periodicals, Inc.
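The correspondence step of ICP against a polyline map can be sketched as a closest-point query: for each laser point, find the nearest point on any map segment. The Python sketch below assumes a 2D map given as an ordered list of vertices and uses point-to-segment projection; the abstract does not state the exact matching rule, so both the representation and the function names are illustrative assumptions.

```python
import math


def closest_point_on_segment(p, a, b):
    """Project point p onto segment ab, clamping to the endpoints."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return a  # degenerate segment: both endpoints coincide
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def closest_point_on_polyline(p, vertices):
    """Return (closest_point, distance) over all consecutive vertex pairs."""
    best, best_d = None, math.inf
    for a, b in zip(vertices, vertices[1:]):
        q = closest_point_on_segment(p, a, b)
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if d < best_d:
            best, best_d = q, d
    return best, best_d
```

A full ICP iteration would run this query for every scan point, estimate the rigid transform that minimizes the summed squared distances, apply it, and repeat until the change is small.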

A Fast Method for Detecting and Localizing Headline Captions in News Video   Cited by: 1 (self-citations: 0, citations by others: 1)
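Captions in news video carry rich semantic information; headline captions in particular play an important role in analyzing and understanding the high-level semantic content of news video. Exploiting the spatiotemporal distribution of headline captions, this paper proposes a fast method for detecting and localizing headline captions in news video. First, the fact that a headline caption persists over many consecutive frames is used to reduce the number of frames that must be processed. Then, based on the edge and position features of headline captions, candidate caption blocks are marked in each frame image, and a statistical analysis over the images in the frame sequence yields the position of each headline caption in the video and the times at which it appears and disappears. Experimental results show that the proposed method is simple and effective, and that it detects and localizes headline captions in news video quickly and robustly.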
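The temporal part of the abstract above, keeping only candidates that persist across many frames and reading off their appearance and disappearance times, can be sketched in Python as a run-length scan. The input format (one boolean per sampled frame for a given screen region) and the `min_length` parameter are assumptions made for illustration; the paper's actual statistics are not given in the abstract.

```python
def persistent_intervals(flags, min_length):
    """Find runs of consecutive True values at least min_length long.

    flags: one boolean per sampled frame, True when a candidate caption
           block was marked in the region of interest.
    Returns a list of (first_index, last_index) pairs, both inclusive.
    """
    intervals = []
    start = None
    for i, present in enumerate(flags):
        if present and start is None:
            start = i  # a run begins
        elif not present and start is not None:
            if i - start >= min_length:
                intervals.append((start, i - 1))
            start = None  # the run ended
    # Close a run that extends to the last frame.
    if start is not None and len(flags) - start >= min_length:
        intervals.append((start, len(flags) - 1))
    return intervals
```

Multiplying the returned indices by the sampling step and dividing by the frame rate would convert them into appearance and disappearance times.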

Feng Na, Song Zikai, Yu Junqing, Chen Yi-Ping Phoebe, Zhao Yizhu, He Yunfeng, Guan Tao. Multimedia Tools and Applications, 2020, 79(39-40): 28971-28992
Multimedia Tools and Applications - Soccer video analysis is the focus of sports video research as it receives widespread attention around the world. However, the lack of soccer datasets hinders...

Molecular Similarity and the Localization of MOLPRINT2D   Cited by: 2 (self-citations: 2, citations by others: 0)
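Molecular similarity underlies molecular diversity analysis and molecular clustering, and is widely applied in fields such as drug and materials design. It can be expressed in two ways, as a similarity score or as an intermolecular distance; its computation presupposes the computation of molecular descriptors, and the algorithm differs depending on whether the descriptors used are quantitative or qualitative. This paper first briefly reviews the concept of molecular similarity, its computation methods, and their classification, and then localizes MOLPRINT2D, a free software package for computing molecular similarity.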

As a special application of computer vision, automatic sports video analysis has been studied by a number of researchers. Sports video analysis via computer vision is a moderately challenging problem: it is more difficult than analyzing a video of a few laboratory members acting out a simple scenario and easier than analyzing a video of a crowd of people at a subway station. The success of an analysis therefore depends heavily on how well one can exploit prior information about the sport and the setting. The most challenging and important part is the tracking of the players (and the ball). With a multi-camera system, 3D tracking is feasible, and it is much more meaningful for the analysis than 2D tracking. As an initial step toward 3D player tracking from multi-view soccer videos, this paper deals with the automatic initialization of player positions. Initial 3D positions can be estimated by exploiting certain conditions of a soccer match. To make the estimation robust, prior knowledge about the features of players is learnt by support vector machines (SVM). Experimental results show that the proposed system is efficient for general soccer sequences.

Recently, localization methods based on detailed maps constructed using simultaneous localization and mapping have been widely used for mobile robot navigation. However, the cost of building such maps increases rapidly as the target environment expands. Here, we consider the problem of localizing a mobile robot based on existing 2D street maps. Although a large amount of research on this topic has been reported, the majority of previous studies have focused on car-like vehicles that navigate on roadways; thus, the efficacy of such methods for sidewalks is not yet known. In this paper, we propose a novel localization approach that can be applied to sidewalks. Whereas roadways are typically marked, e.g. by white lines, sidewalks are not, and road boundary detection is therefore not straightforward. Thus, obtaining exact correspondence between sensor data and a street map is complex. Our approach to overcoming this difficulty is to maximize the statistical dependence between the sensor data and the map; localization is achieved by maximizing a mutual-information-based criterion. Our method employs a computationally efficient estimator of squared-loss mutual information, through which we achieve near real-time performance. The efficacy of our method is evaluated through localization experiments using real-world datasets.

Visual discomfort is one of the most frequent complaints of viewers watching 3D images and videos. Large disparity and a large amount of motion are its two main causes. To quantify this influence, three objectives are set in this paper. The first is a comparative analysis of the influence of different types of motion, i.e., static stereoscopic image, planar motion, and in-depth motion, on visual discomfort. The second is an investigation of the influencing factors for each motion type, for example, the disparity offset, the disparity amplitude, and the velocity. The third is to propose an objective model for visual discomfort. Thirty-six synthetic stereoscopic video stimuli with different types of motion are used in this study. In the subjective test, an efficient paired comparison method called Adaptive Square Design (ASD) was used to reduce the number of comparisons for each observer while keeping the results reliable. The experimental results showed that motion does not always induce more visual discomfort than static conditions. In-depth motion generally induces more visual discomfort than planar motion. The relative disparity between the foreground and the background and the motion velocity are identified as the main factors for visual discomfort. Based on the subjective results, an objective model for comparing the visual discomfort induced by different types of motion is proposed; it shows high correlation with subjective perception.

Zhao Haijun, Cui Mengtian, Li Mingdong, Li Jia. Journal of Computer Applications (《计算机应用》), 2016, 36(10): 2659-2663
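To address the shortcomings of current localization approaches for mobile wireless sensor networks, a node localization algorithm based on an improved flooding broadcast mechanism and particle filtering is proposed. For a given unknown node, the improved flooding broadcast mechanism is first used to compute its distances to all of its neighbor nodes from the effective average hop distance obtained from its nearest anchor node. A differential error correction algorithm is then applied to reduce the measurement error in the average hop distance caused by multi-hop accumulation. Next, particle filtering and virtual anchor nodes are used to shrink the prediction region and obtain a more effective particle prediction region, further reducing the estimation error of the unknown node's position. Simulation results show that, compared with the localization algorithms DV-Hop, Monte Carlo Baggio (MCB), and test-based Monte Carlo localization (MCL), the proposed algorithm effectively suppresses redundant broadcasts and reduces the message overhead associated with node localization, achieving higher localization accuracy at lower communication cost.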
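For context, the average hop distance on which the abstract above builds can be sketched using the baseline DV-Hop rule: an anchor divides the sum of its straight-line distances to the other anchors by the sum of the corresponding hop counts. The Python sketch below shows only this baseline, not the paper's improved flooding or differential error correction; the function names and data layout are assumptions made for illustration.

```python
import math


def average_hop_distance(anchor_index, anchors, hops):
    """Baseline DV-Hop hop size for one anchor.

    anchors: list of (x, y) anchor coordinates.
    hops:    hops[i][j] is the minimum hop count between anchors i and j.
    """
    xi, yi = anchors[anchor_index]
    total_dist = 0.0
    total_hops = 0
    for j, (xj, yj) in enumerate(anchors):
        if j == anchor_index:
            continue
        total_dist += math.hypot(xi - xj, yi - yj)
        total_hops += hops[anchor_index][j]
    if total_hops == 0:
        raise ValueError("anchor has no multi-hop path to any other anchor")
    return total_dist / total_hops


def estimate_distance(hop_size, hop_count):
    """Distance estimate used by an unknown node: hop size times hop count."""
    return hop_size * hop_count
```

An unknown node would take the hop size broadcast by its nearest anchor and multiply it by its own hop count to each anchor; the resulting distance estimates are what a subsequent correction or filtering stage would refine.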

Multimedia Tools and Applications - In this paper, the proposed algorithm provides a fluent and efficient method for repairing very-low quality depth maps of considerable manifold defects for...

In this paper we present Turk-2, a hybrid multi-modal chess player with a robot arm and a screen-based talking head. Turk-2 can not only play chess but can also see and hear the opponent, talk to them, and display emotions. We were interested in finding out whether a simple embodiment with human-like communication capabilities enhances the experience of playing chess against a computer. First, we give an overview of the development path toward multi-modal communication with computers. Then we motivate our research on a hybrid system, introduce the architecture of Turk-2, and describe the human experiments and their evaluation. The results confirm that multi-modal interaction makes game playing more engaging, more enjoyable, and even more effective. These findings for a specific game situation provide further evidence of the power of human-like interaction in making computer systems more attractive and easier to use.

Liu Zhengxuan, Wang Liang, Li Heping, Cheng Jian. Control and Decision (《控制与决策》), 2023, 38(7): 1861-1868
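High-precision localization is essential for autonomous driving. As a high-precision sensor, 2D LiDAR is widely used in indoor localization systems. In outdoor environments, however, the presence of many dynamic objects makes matching adjacent point clouds particularly difficult, and 2D LiDAR point clouds are sparse, so the localization accuracy of 2D LiDAR outdoors is extremely low or localization fails altogether. To address this, an outdoor localization algorithm that fuses stereo vision and 2D LiDAR is proposed. First, stereo vision is used as an odometer to provide relative poses, and the 2D LiDAR data acquired at multiple instants within a local time window are fused into a local submap. Then, DS evidence theory is used to fuse the temporal information in the local submap, eliminating the noise introduced by dynamic objects. Finally, an ICA-based image matching method matches the local submap against a pre-built global prior map, eliminating the accumulated odometry error and achieving high-precision localization. Experimental results on the KITTI dataset show that relatively high localization accuracy can be achieved using only a low-cost stereo camera and a 2D LiDAR, and that the localization accuracy of the proposed algorithm improves on that of ORB-SLAM2 odometry by up to 37.9%.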
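The evidence-fusion step can be illustrated with Dempster's rule of combination applied to a single map cell. The Python sketch below assumes a two-hypothesis frame (occupied, free) plus an "unknown" mass assigned to the whole frame; how the paper actually assigns masses to LiDAR observations is not stated in the abstract, so the representation and names here are illustrative only.

```python
def combine(m1, m2):
    """Dempster's rule for masses given as (occupied, free, unknown).

    Each input must sum to 1. 'unknown' is the mass on the whole frame
    {occupied, free}. Raises ValueError on total conflict.
    """
    o1, f1, u1 = m1
    o2, f2, u2 = m2
    conflict = o1 * f2 + f1 * o2  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    occupied = (o1 * o2 + o1 * u2 + u1 * o2) / norm
    free = (f1 * f2 + f1 * u2 + u1 * f2) / norm
    unknown = (u1 * u2) / norm
    return (occupied, free, unknown)


def fuse_window(observations):
    """Fold Dempster's rule over all observations of one cell in a time window."""
    fused = (0.0, 0.0, 1.0)  # start from complete ignorance
    for m in observations:
        fused = combine(fused, m)
    return fused
```

Under this scheme a cell that is hit in one scan but seen as free in most others, as happens when a moving object passes through, ends up with a high "free" mass, which is the intuition behind suppressing dynamic-object noise in the submap.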

Multimedia Tools and Applications - In recent years, the demand of 3D video services has gradually increased. More and more bandwidth hungry applications are proposed, such as immersive media...

Multimedia Tools and Applications - With the rapid development of detecting violent behaviors in surveillance cameras, requests on systems that automatically recognize violent events are expanded....
