Similar Documents
20 similar documents found.
1.
Query by video clip (cited by 15: 0 self-citations, 15 by others)
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) Retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (a basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as the query and a different basketball video as the database show the effectiveness of the feature representation and matching schemes.
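A minimal sketch of scheme (ii), sub-sampled frame matching, assuming OpenCV and NumPy are available; the sampling stride, histogram bins, and L1 distance are illustrative choices rather than the paper's exact parameters:

```python
import cv2
import numpy as np

def color_histogram(frame, bins=(8, 8, 8)):
    """L1-normalized HSV color histogram of one frame."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    return hist.flatten() / (hist.sum() + 1e-8)

def subsample_features(video_path, stride=30):
    """Color features of every `stride`-th frame of a video file."""
    cap = cv2.VideoCapture(video_path)
    feats, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            feats.append(color_histogram(frame))
        idx += 1
    cap.release()
    return np.array(feats)

def best_match(query_feats, db_feats):
    """Slide the sub-sampled query over the database; return the offset
    whose mean per-frame L1 histogram distance is smallest."""
    q = len(query_feats)
    scores = [np.abs(db_feats[o:o + q] - query_feats).sum(axis=1).mean()
              for o in range(len(db_feats) - q + 1)]
    return int(np.argmin(scores))
```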

2.
Video Text Tracking and Segmentation Based on Multiple Frames (cited by 8: 2 self-citations, 6 by others)
Text embedded in video is an important source of information for semantic video understanding and retrieval. Exploiting the temporal and spatial redundancy of static text in video, the detection results are refined using the edge bitmap of the text region as a feature, and a fast text tracking algorithm based on binary search is proposed to locate text objects quickly and reliably. In the segmentation stage, in addition to the traditional approach of enhancing text regions with a gray-level fused image, the edge bitmap is also used to further filter out the background. Experiments show that both detection precision and segmentation quality are substantially improved.
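A minimal sketch of the binary-search idea behind the fast text tracking step; the predicate `has_text` is a hypothetical stand-in for the paper's edge-bitmap verification of a detected caption:

```python
def last_frame_with_text(has_text, start, max_frame):
    """Binary search for the last frame in [start, max_frame] in which a
    static caption detected at `start` is still visible. Assumes the
    caption spans one contiguous interval, as static video captions do."""
    lo, hi = start, max_frame
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_text(mid):
            lo = mid          # caption still present; search later frames
        else:
            hi = mid - 1      # caption gone; search earlier frames
    return lo
```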

3.
4.

Video anomaly detection (VAD) automatically recognizes abnormal events in surveillance videos. Existing works have made advances in recognizing whether a video contains abnormal events; however, they cannot temporally localize the abnormal events within videos. This paper presents a novel anomaly attention-based framework for accurately localizing abnormal events in time. Benefiting from the proposed framework, we can achieve frame-level VAD using only video-level labels, which significantly reduces the burden of data annotation. Our method is an end-to-end deep neural network-based approach that contains three modules: an anomaly attention module (AAM), a discriminative anomaly attention module (DAAM), and a generative anomaly attention module (GAAM). Specifically, AAM is trained to generate the anomaly attention, which measures the abnormality of each frame, while DAAM and GAAM alternately augment AAM from two different aspects. On the one hand, DAAM enhances AAM by optimizing video-level classification. On the other hand, GAAM adopts a conditional variational autoencoder to model the likelihood of each frame given the attention, refining AAM. As a result, AAM generates higher anomaly scores for abnormal frames and lower anomaly scores for normal frames. Experimental results show that our proposed approach outperforms state-of-the-art methods, which validates the superiority of our AAVAD.
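A minimal sketch of the discriminative branch (video-level classification driven by frame-level attention), written in PyTorch under the assumption that per-frame features are pre-extracted; module sizes and names are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class AnomalyAttention(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.attn = nn.Sequential(         # AAM: per-frame attention score
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())
        self.cls = nn.Linear(feat_dim, 1)  # video-level classifier head

    def forward(self, frames):             # frames: (T, feat_dim)
        a = self.attn(frames)              # (T, 1) anomaly attention
        pooled = (a * frames).sum(0) / (a.sum() + 1e-8)
        return self.cls(pooled), a.squeeze(-1)

# Training step with only a video-level label y (1 = contains anomaly):
model = AnomalyAttention()
frames = torch.randn(64, 512)              # dummy features for 64 frames
logit, attention = model(frames)
loss = nn.functional.binary_cross_entropy_with_logits(
    logit, torch.tensor([1.0]))
loss.backward()                             # DAAM-style discriminative update
```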


5.

In video streaming services and applications, impulse noise occurs due to transmission errors or is sometimes introduced during signal acquisition. The work presented in this paper proposes a novel impulse noise detection and mitigation (INDAM) method that can significantly recover video frames heavily impaired by impulse noise. The proposed technique uses a cyclic redundancy check (CRC) based method to create an error mask of the received impaired video frames. This error mask contains pixel-by-pixel error information of the video frames and is exploited further to mitigate the error in the impaired video frame: each impaired pixel is replaced by the average of its corresponding error-free neighboring pixels' values. The technique uses the error mask created by the CRC method and uses only error-free pixels when calculating the averages. Results show that INDAM outperforms other contemporary methods in terms of peak signal-to-noise ratio (PSNR) and the structural similarity index metric (SSIM).
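A minimal sketch of the mitigation step, assuming a grayscale frame and a boolean error mask of the kind the CRC check produces:

```python
import numpy as np

def mitigate_impulse_noise(frame, error_mask):
    """Replace each flagged pixel with the mean of its error-free
    8-neighborhood. `frame` is a 2-D grayscale array; `error_mask` is a
    boolean array of the same shape (True = impaired pixel)."""
    out = frame.astype(np.float64)
    h, w = frame.shape
    for y, x in zip(*np.nonzero(error_mask)):
        ys = slice(max(y - 1, 0), min(y + 2, h))
        xs = slice(max(x - 1, 0), min(x + 2, w))
        good = ~error_mask[ys, xs]   # flagged pixels (incl. centre) excluded
        if good.any():
            out[y, x] = frame[ys, xs][good].mean()
    return out.astype(frame.dtype)
```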


6.
Content-based indexing of multimedia databases (cited by 1: 0 self-citations, 1 by others)
Content-based retrieval from multimedia databases calls for content-based indexing techniques. Different from conventional databases, where data items are represented by a set of attributes of elementary data types, multimedia objects in multimedia databases are represented by a collection of features; similarity of object contents depends on context and frame of reference; and features of objects are characterized by multimodal feature measures. These properties pose great challenges for content-based indexing. On the other hand, there are special requirements on content-based indexing: to support visual browsing, similarity retrieval, and fuzzy retrieval, nodes of the index should represent certain meaningful categories; that is, certain semantics must be added when performing indexing. ContIndex, the content-based indexing technique presented in this paper, is proposed to meet these challenges and special requirements. The indexing tree is formally defined by adapting a classification-tree concept. Horizontal links among nodes at the same level enhance the flexibility of the index. A special neural-network model, called Learning based on Experiences and Perspectives (LEP), has been developed to create node categories by fusing multimodal feature measures. It brings to the index the capability of self-organizing nodes with respect to a certain context and frames of reference. An icon image is generated for each intermediate node to facilitate visual browsing. Algorithms have been developed to support multimedia object archival and retrieval using ContIndex.

7.
Numerous web videos associated with rich metadata are available on the Internet today. While metadata such as video tags bring facilitation and opportunities for video search and multimedia content understanding, challenges also arise from the fact that these tags are usually annotated at the video level, while many tags actually describe only parts of the video content. How to localize the relevant parts or frames of a web video for given tags is key to many applications and research tasks. In this paper we propose combining a topic model with relevance filtering to localize relevant frames. Our method is designed in three steps. First, we apply relevance filtering to assign relevance scores to video frames, and a raw relevant frame set is obtained by selecting the top-ranked frames. Then, we separate the frames into topics by mining the underlying semantics using latent Dirichlet allocation, and use the raw relevant frame set as a validation set to select relevant topics. Finally, the topical relevances are used to refine the raw relevant frame set, yielding the final results. Experimental results on two real web video databases validate the effectiveness of the proposed approach.
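A minimal sketch of the topic-validation step, assuming frames have already been quantized into bag-of-visual-words count vectors and that `raw_scores` holds the relevance-filtering scores; all parameter values are illustrative:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def topical_refinement(frame_counts, raw_scores, n_topics=5, top_k=50):
    # Raw relevant set: top-ranked frames from relevance filtering.
    raw_set = np.argsort(raw_scores)[::-1][:top_k]
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    frame_topics = lda.fit_transform(frame_counts).argmax(axis=1)
    # Validate topics against the raw set: keep topics that dominate it.
    topic_hits = np.bincount(frame_topics[raw_set], minlength=n_topics)
    relevant_topics = set(np.nonzero(topic_hits >= topic_hits.max() / 2)[0])
    # Refine: keep raw frames whose topic was judged relevant.
    return [f for f in raw_set if frame_topics[f] in relevant_topics]
```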

8.
Video segmentation is the fundamental step for various applications such as archiving, content-based retrieval, copy detection, and summarization of video data. Shot detection is the first level of segmentation. In this work, a shot detection methodology is presented that revolves around a simple shot transition model based on the similarity of the frames with respect to a reference frame. Frames in an individual shot are very similar in terms of their visual content. Whenever a shot transition occurs, a change in similarity values appears. For an abrupt transition the rate of change is very high, while for a gradual transition it is not so apparent. To overcome the effect of noise in the similarity values, a line is fitted over a small window using linear regression, so that the slope of this line exhibits the underlying pattern of the transition. A novel algorithm for shot detection is hence developed based on the variation pattern of the similarity values of the frames with respect to a reference frame. First an algorithm is proposed that is a direct descendant of the underlying transition model and applies a threshold on the similarity values to detect the transitions. This algorithm is then improved by thresholding the slope of the least-squares linear approximation of the variation in similarity values rather than the absolute values. The threshold on the slope is chosen with a bias towards minimizing the false rejection rate at the cost of the false acceptance rate. Finally, a simple post-processing technique is adopted to reduce false detections. Experiments are conducted on video sequences taken from the TRECVID 2001 database, an action movie, and recorded sports and news video. Comparison with a few other systems indicates that the performance of the proposed scheme is quite satisfactory.
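A minimal sketch of the slope test at the heart of the improved algorithm: fit a least-squares line to the similarity values inside a sliding window and flag windows whose slope magnitude exceeds a threshold; the window size and threshold are illustrative:

```python
import numpy as np

def detect_transitions(similarities, window=7, slope_thresh=0.05):
    """`similarities[i]` is the similarity of frame i to the reference
    frame; returns centre indices of windows that look like transitions."""
    x = np.arange(window)
    hits = []
    for start in range(len(similarities) - window + 1):
        seg = similarities[start:start + window]
        slope = np.polyfit(x, seg, 1)[0]   # degree-1 least-squares fit
        if abs(slope) > slope_thresh:      # steep drop/rise => transition
            hits.append(start + window // 2)
    return hits
```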

9.
Exploiting the baseline property unique to the Uyghur script, a new method is proposed for extracting Uyghur caption frames from video. The caption frames are first read in; candidate shot key frames are then detected in the video segment using pixel-wise differences between adjacent frames and regional pixel statistics. The detected shot key frames are processed region by region to test whether a frame exhibits the baseline property, and a threshold is set according to the baseline; finally, the main frames that represent the video semantics are extracted. Experiments show that the method is simple and effective, with an average caption-frame extraction rate above 85%.
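A minimal sketch of a baseline test of the kind the method relies on: Uyghur text runs along a strong horizontal baseline, so the row-wise edge projection of a caption region shows a dominant peak; the peak-ratio threshold is an illustrative assumption:

```python
import cv2
import numpy as np

def has_baseline(region_gray, peak_ratio=3.0):
    """True if an 8-bit grayscale crop shows a dominant horizontal
    edge row, a crude proxy for the Uyghur baseline property."""
    edges = cv2.Canny(region_gray, 100, 200)
    profile = edges.sum(axis=1).astype(np.float64)  # row-wise edge energy
    if profile.mean() == 0:
        return False
    return profile.max() / profile.mean() > peak_ratio
```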

10.

Closed-circuit television (CCTV) cameras are widely used in monitoring. This paper presents an intelligent CCTV crowd counting system based on two algorithms that estimate the density of each pixel in each frame and use it as a basis for counting people. One algorithm uses scale-invariant feature transform (SIFT) features and clustering to represent the pixels of frames (the SIFT algorithm), and the other uses features from accelerated segment test (FAST) corner points together with SIFT features (the SIFT-FAST algorithm). Each algorithm is designed using a novel combination of pixel-wise processing, motion regions, a grid map, background segmentation using a Gaussian mixture model (GMM), and edge detection. A fusion technique is proposed and used to validate the accuracy by combining the results of the algorithms at the frame level. The proposed system is more practical than state-of-the-art regression methods because it is trained with a small number of frames, so it is relatively easy to deploy. In addition, it reduces the training error, set-up time, and cost, and opens the door to developing more accurate people detection methods. The University of California San Diego (UCSD) and Mall datasets have been used to test the proposed algorithms. The mean deviation error, mean squared error, and mean absolute error of the proposed system are less than 0.1, 16.5, and 3.1, respectively, for the Mall dataset and less than 0.07, 5.5, and 1.9, respectively, for the UCSD dataset.
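A minimal sketch in the spirit of the SIFT-FAST pipeline: GMM background subtraction isolates motion regions, FAST corners are detected inside them, and the corner count serves as a crude crowd-size proxy; the person-per-corner calibration factor is an illustrative stand-in for the trained density estimate:

```python
import cv2

def crowd_count(frame, bg_subtractor, fast, corners_per_person=15.0):
    motion = bg_subtractor.apply(frame)                 # GMM foreground mask
    motion = cv2.threshold(motion, 127, 255, cv2.THRESH_BINARY)[1]
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    keypoints = fast.detect(gray, motion)               # corners in motion regions
    return len(keypoints) / corners_per_person

bg = cv2.createBackgroundSubtractorMOG2()
fast = cv2.FastFeatureDetector_create(threshold=25)
```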


11.
Temporal segmentation of successive actions in a long-term video sequence has been a long-standing problem in computer vision. In this paper, we exploit a novel learning-based framework. Given a video sequence, only a few characteristic frames are selected by the proposed selection algorithm; the likelihood with respect to the trained models is then calculated in a pair-wise way, and segmentation is finally obtained as the optimal model sequence that realizes the maximum likelihood. The average accuracy on the IXMAS dataset reached 80.5% at the frame level, using only 16.5% of all frames, with a computation time of 1.57 s per video (1,160 frames on average).

12.
Chao Li, Zhihua Chen, Bin Sheng, Ping Li, Gaoqi He. Multimedia Tools and Applications, 2020, 79(7-8): 4661-4679

In this paper, we introduce an approach to remove flicker from videos, where the flicker is caused by applying image-based processing methods to the original video frame by frame. First, we propose a multi-frame based video flicker removal method: we utilize multiple temporally corresponding frames to reconstruct the flickering frame. Compared with traditional methods, which reconstruct the flickering frame from a single adjacent frame, reconstruction with multiple temporally corresponding frames reduces warp inaccuracy. We then optimize our flicker removal method in two ways. On the one hand, we detect the flickering frames in the video sequence with temporal consistency metrics; reconstructing only the flickering frames accelerates the algorithm greatly. On the other hand, we choose only the preceding temporally corresponding frames to reconstruct the output frames. We also accelerate flicker removal with the GPU. Qualitative experimental results demonstrate the efficiency of our proposed method, and with algorithmic optimization and GPU acceleration its running time also outperforms traditional video temporal coherence methods.
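A minimal sketch of flicker-frame detection with one simple temporal consistency metric (mean brightness); the deviation threshold is illustrative and the paper's actual metrics may differ:

```python
import numpy as np

def flicker_frames(frames, thresh=8.0):
    """`frames` is a list of grayscale arrays; returns indices of frames
    whose mean brightness jumps against both temporal neighbours."""
    means = np.array([f.mean() for f in frames])
    flagged = []
    for i in range(1, len(frames) - 1):
        d_prev = means[i] - means[i - 1]
        d_next = means[i] - means[i + 1]
        if abs(d_prev) > thresh and abs(d_next) > thresh and d_prev * d_next > 0:
            flagged.append(i)   # brightness spike/dip relative to both sides
    return flagged
```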


13.
The rapid growth of video data raises a series of problems and challenges for video browsing, storage, and retrieval, and video summarization is an effective way to address them. Existing video summarization algorithms construct objective functions from constraints and empirically chosen settings and score sets of frames, which brings uncertainty and high complexity. To address this, a learning-to-rank based video summarization method is proposed. The method casts summary extraction as ranking frames by their relevance to the video content: a ranking function is learned on a training set so that the highest-ranked frames are those most relevant to the video, frames are scored with the learned function, and the top-scoring frames are selected as key frames to form the summary. Moreover, compared with existing methods, this method scores individual frames rather than sets of frames, which significantly reduces the computational complexity. Experiments on the TVSum50 dataset confirm the effectiveness of the method.
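A minimal sketch of the rank-and-select step, assuming per-frame feature vectors and pairwise preference data; a RankSVM-style linear ranker built on sklearn's LinearSVC stands in for the paper's learner:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_ranker(feat_pairs):
    """`feat_pairs` is a list of (more_relevant, less_relevant) feature
    pairs; learns w so that w.(a - b) > 0 when a outranks b."""
    X = np.array([a - b for a, b in feat_pairs]
                 + [b - a for a, b in feat_pairs])
    y = np.array([1] * len(feat_pairs) + [0] * len(feat_pairs))
    return LinearSVC().fit(X, y)

def summarize(frame_feats, ranker, k=10):
    scores = frame_feats @ ranker.coef_.ravel()   # score each frame
    return np.sort(np.argsort(scores)[::-1][:k])  # top-k frames, in time order
```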

14.
In this paper, we propose a novel motion-based video retrieval approach to find desired videos in video databases through trajectory matching. The main component of our approach is extracting representative motion features from the video, which breaks down into the following three steps. First, we extract the motion vectors from each frame and utilize Harris corner points to compensate for the effect of camera motion. Second, we find interesting motion flows in the frames using a sliding-window mechanism and a clustering algorithm. Third, we merge the generated motion flows and select representative ones to capture the motion features of the video. Furthermore, we design a symbolic trajectory matching method for effective video retrieval. The experimental results show that our algorithm effectively extracts motion flows with high accuracy and outperforms existing approaches for video retrieval.
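A minimal sketch of the camera-motion compensation step: track corner points with pyramidal Lucas-Kanade flow and subtract the median (global) motion so that only object motion remains; parameter values are illustrative:

```python
import cv2
import numpy as np

def object_motion_vectors(prev_gray, next_gray):
    """Per-point motion with the dominant camera motion removed.
    Assumes both inputs are 8-bit grayscale frames with trackable corners."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1
    flow = (nxt - pts).reshape(-1, 2)[good]
    camera = np.median(flow, axis=0)    # dominant (camera) motion
    return flow - camera                # residual object motion
```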

15.
Video frame rate determines the perceived quality of a video: the higher the frame rate, the smoother the motion in the picture, the clearer the information expressed, and the better the viewing experience. Video interpolation aims to increase the frame rate by generating a new frame from the relevant information in two consecutive frames, and is essential in the field of computer vision. Traditional motion-compensated interpolation causes holes and overlaps in the reconstructed frame and is easily affected by the quality of the optical flow. Therefore, this paper proposes a video frame interpolation method via optical flow estimation with image inpainting. First, the optical flow between the input frames is estimated via the combined local and global total variation (CLG-TV) optical flow estimation model. Then, the intermediate frames are synthesized under the guidance of the optical flow. Finally, the nonlocal self-similarity between the video frames is used to solve the optimization problem and fix the pixel-loss areas in the interpolated frame. Quantitative and qualitative experimental results show that this method can effectively improve the quality of optical flow estimation, generate realistic and smooth video frames, and effectively increase the video frame rate.
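A minimal sketch of flow-guided synthesis of the middle frame, using OpenCV's Farneback flow as a stand-in for the CLG-TV model; hole filling and the inpainting refinement are omitted:

```python
import cv2
import numpy as np

def interpolate_middle(frame0, frame1):
    """Approximate the frame halfway between frame0 and frame1 by warping
    frame0 half a flow step forward (via backward sampling)."""
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Sample frame0 half a flow step back so content lands mid-way.
    map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame0, map_x, map_y, cv2.INTER_LINEAR)
```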

16.
A retrieval algorithm is proposed that represents the content of video shots with reconstructed static and dynamic key frames. The algorithm fully exploits the temporal characteristics of the video sequence to construct static key frames and uses a three-dimensional wavelet transform to construct dynamic key frames, combining motion information with static information to better reflect the content of the video sequence. Experimental results show that the algorithm outperforms the algorithm based on temporal-maximum key frames as well as its improved variants.

17.
With the growing demand for visual information of rich content, effective and efficient manipulation of large video databases is increasingly desired. Many investigations have been made into content-based video retrieval. However, despite its importance, video subsequence identification, which is to find content similar to a short query clip within a long video sequence, has not been well addressed. This paper presents a graph transformation and matching approach to this problem, with an extension to identify occurrences of potentially different ordering or length due to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and the database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune irrelevant subsequences. During the filtering stage, maximum size matching is deployed for each subgraph constructed from the query and a candidate subsequence to obtain a smaller set of candidates. During the refinement stage, sub-maximum similarity matching is devised to identify the subsequence with the highest aggregate score from all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information. Performance studies conducted on a 50-hour video recording validate that our approach is promising in terms of both search accuracy and speed.
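A minimal sketch of the matching idea: build a similarity matrix between query frames and a candidate subsequence and find the maximum-similarity one-to-one assignment; SciPy's Hungarian solver stands in for the paper's sub-maximum similarity matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def subsequence_score(query_feats, cand_feats):
    """Aggregate similarity of the best one-to-one frame assignment."""
    # Cosine similarity between every query/candidate frame pair.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    c = cand_feats / np.linalg.norm(cand_feats, axis=1, keepdims=True)
    sim = q @ c.T
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    return sim[rows, cols].sum(), list(zip(rows, cols))
```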

18.
This paper presents infrared single-pixel video imaging for sea-surface surveillance. Based on the temporal redundancy of the surveillance video, a two-step scheme is proposed, comprising low-scale detection and high-scale detection. For each frame, low-scale detection performs low-resolution single-pixel imaging to obtain a "preview" image of the scene, in which moving targets can be located. These targets are further refined in the high-scale detection, where high-resolution single-pixel imaging focused on the targets is used. The frame is reconstructed by merging these two levels of images. Simulated experiments show that for a video with 128 × 128 pixels and 150 frames, the sampling rate of our scheme is about 17.8%, and the reconstructed video presents good visual quality.

19.
In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median-moment features with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max-Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called mutual-nearest-neighbor based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to decide whether a block is a true text block. If a frame produces at least one true text block, it is considered a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in terms of recall and precision at both the block and frame levels.
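A minimal sketch of the block-selection step: split a frame into 16 equal blocks, describe each with a simple texture feature (a stand-in for the wavelet and median-moment combination), and cluster the blocks with k-means; the busiest cluster is taken as the probable text blocks:

```python
import numpy as np
from sklearn.cluster import KMeans

def probable_text_blocks(gray):
    """Indices (0-15, row-major) of probable text blocks in a 4x4 grid."""
    h, w = gray.shape
    bh, bw = h // 4, w // 4
    blocks = [gray[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
              for r in range(4) for c in range(4)]
    # Per-block feature: intensity spread and gradient energy (illustrative).
    feats = np.array([[b.std(), np.abs(np.diff(b.astype(float))).mean()]
                      for b in blocks])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
    text_cluster = labels[feats[:, 1].argmax()]   # cluster of busiest block
    return [i for i, l in enumerate(labels) if l == text_cluster]
```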

20.

Automatic detection and counting of vehicles in a video is a challenging task and has become a key application area of traffic monitoring and management. In this paper, an efficient real-time approach for detecting and counting moving vehicles is presented based on YOLOv2 and feature point motion analysis. The work is based on synchronized detection and tracking of vehicle features to achieve accurate counting results. The proposed strategy works in two phases: the first is vehicle detection and the second is counting of the moving vehicles. Different convolutional neural networks, including pixel-by-pixel classification networks and regression networks, are investigated to improve the detection and counting decisions. For initial object detection, we utilize the state-of-the-art deep learning object detector YOLOv2 before refining the detections using K-means clustering and a KLT tracker. An efficient approach is then introduced that uses the temporal information of the detected and tracked feature points between frame sets to assign each vehicle a label together with its corresponding trajectory and count it correctly. Experimental results on twelve challenging videos show that the proposed scheme generally outperforms state-of-the-art strategies. Moreover, the proposed approach using YOLOv2 increases the average speed for the twelve tested sequences to 18.7 frames per second, improvements of 93.4% and 98.9% over the 1.24 frames per second achieved using the Faster Region-based Convolutional Neural Network (Faster R-CNN) and the 0.19 frames per second achieved using the background-subtraction-based CNN approach (BS-CNN), respectively.
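A minimal counting sketch in the spirit of the pipeline: boxes from a detector (a hypothetical `detect_vehicles` standing in for YOLOv2) are reduced to centroids, greedily associated with existing trajectories, and counted when a trajectory crosses a virtual line:

```python
import numpy as np

def update_tracks(tracks, detections, count, line_y=240, max_dist=40.0):
    """tracks: list of [last_centroid, counted_flag]; detections: centroid
    arrays (shape (2,)) from the current frame, e.g. from a hypothetical
    detect_vehicles(frame). Returns updated tracks and count."""
    for det in detections:
        if tracks:
            dists = [np.linalg.norm(det - t[0]) for t in tracks]
            j = int(np.argmin(dists))
            if dists[j] < max_dist:                  # match to nearest track
                prev_y = tracks[j][0][1]
                if not tracks[j][1] and prev_y < line_y <= det[1]:
                    count += 1                       # crossed the count line
                    tracks[j][1] = True
                tracks[j][0] = det
                continue
        tracks.append([det, False])                  # start a new trajectory
    return tracks, count
```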

