11 similar documents found (search time: 0 ms)
1.
《Journal of Visual Communication and Image Representation》2014,25(2):373-383
Many video fingerprints have been proposed to handle the transformations that arise when original content is copied and redistributed. However, most of them do not take flipping and rotation into account. In this paper, we propose a novel video fingerprint based on region binary patterns, aiming to realize robust and fast video copy detection against video transformations including rotation and flipping. We extract two complementary region binary patterns from several rings in keyframes. These two kinds of binary patterns are converted into a new type of pattern that makes the proposed video fingerprint robust against rotation and flipping. The experimental results demonstrate that the proposed video fingerprint is effective for video copy detection, particularly in the case of rotation and flipping. Furthermore, our experiments show that the proposed method offers high storage efficiency and low computational complexity, making it suitable for a practical video copy detection system.
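The rotation/flip robustness of ring-based patterns follows from a simple fact: rotating a square frame about its centre (or flipping it) does not change which pixels lie on a given ring. A toy sketch in Python (the function names, ring count, and one-bit-per-ring rule are our illustrative assumptions, not the authors' exact descriptor):

```python
import math

def ring_fingerprint(frame, n_rings=4):
    """One bit per concentric ring: is the ring brighter than the frame mean?

    Each bit depends only on the set of pixels at a given radius, so the
    code is unchanged by 90-degree rotation or flipping of a square frame.
    """
    h, w = len(frame), len(frame[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    rings = [[] for _ in range(n_rings)]
    for y in range(h):
        for x in range(w):
            r = math.hypot(y - cy, x - cx)
            idx = min(int(r / (max_r / n_rings + 1e-9)), n_rings - 1)
            rings[idx].append(frame[y][x])
    global_mean = sum(sum(row) for row in frame) / (h * w)
    return tuple(int(sum(ring) / len(ring) > global_mean) for ring in rings)

def rot90(frame):
    """Rotate a square frame 90 degrees clockwise."""
    return [list(row) for row in zip(*frame[::-1])]

def flip_h(frame):
    """Mirror a frame horizontally."""
    return [row[::-1] for row in frame]
```

For any square frame, `ring_fingerprint(frame)` equals the fingerprint of its rotated or flipped copy, which is the invariance property the abstract relies on.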
2.
Hyun-seok Min Jae Young Choi Wesley De Neve Yong Man Ro 《Signal Processing: Image Communication》2011,26(10):612-627
The detection of near-duplicate video clips (NDVCs) is an area of current research interest and intense development. Most NDVC detection methods represent video clips with a unique set of low-level visual features, typically describing color or texture information. However, low-level visual features are sensitive to transformations of the video content. Given the observation that transformations tend to preserve the semantic information conveyed by the video content, we propose a novel approach for identifying NDVCs, making use of both low-level visual features (that is, MPEG-7 visual features) and high-level semantic features (that is, 32 semantic concepts detected using trained classifiers). Experimental results obtained for the publicly available MUSCLE-VCD-2007 and TRECVID 2008 video sets show that bimodal fusion of visual and semantic features facilitates robust NDVC detection. In particular, the proposed method is able to identify NDVCs with a low missed detection rate (3% on average) and a low false alarm rate (2% on average). In addition, the combined use of visual and semantic features outperforms the separate use of either of them in terms of NDVC detection effectiveness. Further, we demonstrate that the effectiveness of the proposed method is on par with or better than the effectiveness of three state-of-the-art NDVC detection methods making use of temporal ordinal measurement, features computed using the Scale-Invariant Feature Transform (SIFT), or bag-of-visual-words (BoVW). We also show that the influence of the effectiveness of semantic concept detection on the effectiveness of NDVC detection is limited, as long as the mean average precision (MAP) of the semantic concept detectors used is higher than 0.3. Finally, we illustrate that the computational complexity of our NDVC detection method is competitive with the computational complexity of the three aforementioned NDVC detection methods.
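Bimodal late fusion of this kind is often implemented as a weighted sum of per-modality similarity scores. A minimal sketch, assuming cosine similarity and an equal-weight fusion (the weight `alpha` and threshold are illustrative, not values from the paper):

```python
def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def fused_score(vis_a, vis_b, sem_a, sem_b, alpha=0.5):
    # Late fusion: weighted sum of visual and semantic similarities.
    return alpha * cosine(vis_a, vis_b) + (1 - alpha) * cosine(sem_a, sem_b)

def is_near_duplicate(vis_a, vis_b, sem_a, sem_b, alpha=0.5, thresh=0.8):
    """Flag a pair as an NDVC when the fused similarity clears a threshold."""
    return fused_score(vis_a, vis_b, sem_a, sem_b, alpha) >= thresh
```

The design point the abstract makes is that the semantic channel (`sem_a`, `sem_b`, e.g. concept-detector confidences) is more stable under content transformations than the visual channel, so the fused score degrades gracefully when one modality is disturbed.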
3.
Duan-Yu Chen Yu-Ming Chiu 《Journal of Visual Communication and Image Representation》2013,24(5):544-551
In this paper, to efficiently detect video copies, focus of interest in videos is first localized based on 3D spatiotemporal visual attention modeling. Salient feature points are then detected in the visual attention regions. Before evaluating similarity between source and target video sequences using these feature points, a geometric constraint measurement is employed to conduct bi-directional point matching, removing noisy feature points while maintaining robust feature point pairs. Consequently, video matching is transformed into a frame-based time-series linear-search problem. Our proposed approach achieves a promisingly high detection rate under distinct video copy attacks and thus shows its feasibility in real-world applications.
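Bi-directional point matching is commonly realized as a mutual-nearest-neighbor check: a pair survives only if each point is the other's nearest neighbor. A toy sketch on 2D points (the paper matches descriptor features under a geometric constraint; this simplified version is our assumption):

```python
def mutual_matches(points_a, points_b):
    """Keep only pairs (i, j) where a[i] and b[j] are mutual nearest neighbors.

    One-sided matches (noisy points with no stable partner) are discarded,
    which is the filtering effect bi-directional matching provides.
    """
    def nearest(p, pts):
        return min(range(len(pts)),
                   key=lambda i: (pts[i][0] - p[0]) ** 2 + (pts[i][1] - p[1]) ** 2)

    ab = [nearest(p, points_b) for p in points_a]   # forward pass  A -> B
    ba = [nearest(q, points_a) for q in points_b]   # backward pass B -> A
    return [(i, j) for i, j in enumerate(ab) if ba[j] == i]
```

In the example below, the outlier at (50, 50) has no mutual partner and is dropped, while the two genuine correspondences survive.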
4.
Video semantic detection has been a research hotspot in the field of human-computer interaction. In video-feature-oriented sparse representation, features from videos of the same category may not achieve similar coding results. To address this, Locality-Sensitive Discriminant Sparse Representation (LSDSR) is developed so that video samples belonging to the same category are encoded as similar sparse codes, giving them better category discrimination. In LSDSR, a discriminative loss function based on the sparse coefficients is imposed on the locality-sensitive sparse representation, which makes the optimized dictionary for sparse representation discriminative. LSDSR thus enhances the semantic discriminative power of video features, optimizing the dictionary and building a better discriminant sparse model. Moreover, to further improve the accuracy of video semantic detection after sparse representation, a weighted K-Nearest Neighbor (KNN) classification method, whose loss function integrates reconstruction error and discrimination terms of the sparse representation, is adopted to detect video semantic concepts. The proposed methods are evaluated on related video databases in comparison with existing sparse representation methods. The experimental results show that the proposed methods significantly enhance the discriminative power of video features and consequently improve the accuracy of video semantic concept detection.
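The weighted KNN step can be sketched independently of the sparse-coding front end: neighbors vote with weights that grow as distance shrinks. This is a generic distance-weighted KNN, not the paper's exact loss (which also folds in reconstruction error); names and the `1/(d+eps)` weighting are our assumptions:

```python
def weighted_knn(train, query, k=3):
    """Classify `query` by distance-weighted vote among its k nearest samples.

    train: list of (feature_vector, label) pairs.
    """
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
        for x, label in train
    )[:k]
    votes = {}
    for dist, label in scored:
        votes[label] = votes.get(label, 0.0) + 1.0 / (dist + 1e-6)  # closer => heavier
    return max(votes, key=votes.get)
```

With sparse codes as the feature vectors, a single distant wrong-class neighbor is outvoted by nearby same-class neighbors, which is the robustness the weighting buys.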
5.
Anomalous behavior detection plays a significant role in emergencies such as robberies. Although many methods have been proposed to deal with this problem, performance in real applications is still relatively low. Here, to detect abnormal human behavior in videos, we propose a multiscale spatial temporal attention graph convolution network (MSTA-GCN) to capture and cluster the features of the human skeleton. First, based on the human skeleton graph, a multiscale spatial temporal attention graph convolution block (MSTA-GCB) is built which contains multiscale graphs in the temporal and spatial dimensions. MSTA-GCB can simulate the motion relations of human body components at different scales, where each scale corresponds to a different granularity of annotation on the human skeleton. Then, static, globally-learned and attention-based adjacency matrices in the graph convolution module are proposed to capture hierarchical representations. Finally, extensive experiments are carried out on the ShanghaiTech Campus and CUHK Avenue datasets; the final frame-level AUC/EER results are 0.759/0.311 and 0.876/0.192, respectively. Moreover, the frame-level AUC is 0.768 for the human-related ShanghaiTech subset. These results show that our MSTA-GCN outperforms most methods in video anomaly detection, and we obtain a new state-of-the-art performance in skeleton-based anomalous behavior detection.
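The three adjacency views (static skeleton topology, a globally learned matrix, and a per-sample attention matrix) are typically summed before feature propagation. A bare-bones sketch of one such propagation step, with plain list-of-lists matrices (our simplification; the real layer also applies learned channel weights and nonlinearities):

```python
def matmul(A, B):
    """Naive matrix product of two list-of-lists matrices."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def combine(*mats):
    """Element-wise sum of same-shaped matrices (the three adjacency views)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

def gcn_layer(A_static, A_learned, A_attn, H):
    """One graph-convolution step: node features flow over the summed graph."""
    return matmul(combine(A_static, A_learned, A_attn), H)
```

The static matrix encodes the fixed bone connectivity, the learned matrix lets training add long-range links (e.g. hand-to-foot), and the attention matrix adapts per input; summing them means any one view can open an edge the others miss.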
6.
In this article, we introduce a new public digital watermarking technique for video copyright protection that works in the discrete wavelet transform domain. The scheme uses binary images as watermarks. These are embedded in the detail wavelet coefficients of the middle wavelet sub-bands. The method is a combination of spread-spectrum and quantisation-based watermarking. Every bit of the watermark is spread over a number of wavelet coefficients with the use of a secret key. The resilience of the watermarking algorithm was tested against a series of eight different attacks using different videos. To improve the resilience of the algorithm, we use error correction codes and embed the watermark with spatial and temporal redundancy. The proposed method achieves very good perceptual quality, with mean peak signal-to-noise ratio values of the watermarked videos above 40 dB, and high resistance to a large spectrum of attacks.
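Spreading one bit over many coefficients with a key-seeded pseudo-noise sequence can be sketched as follows. This is the generic spread-spectrum idea operating on a flat list of coefficients, not the paper's exact wavelet-domain scheme; the seeding, `strength`, and sign convention are our assumptions:

```python
import math
import random

def spreading_sequence(key, n):
    """Secret +/-1 sequence; the key acts as the seed, so only the key
    holder can regenerate the same sequence for detection."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed_bit(coeffs, bit, key, strength=2.0):
    """Spread one watermark bit additively over all coefficients."""
    pn = spreading_sequence(key, len(coeffs))
    sign = 1 if bit else -1
    return [c + sign * strength * p for c, p in zip(coeffs, pn)]

def detect_bit(coeffs, key):
    """Correlate with the secret sequence; the sign recovers the bit."""
    pn = spreading_sequence(key, len(coeffs))
    corr = sum(c * p for c, p in zip(coeffs, pn))
    return 1 if corr > 0 else 0
```

Because the host coefficients are roughly uncorrelated with the pseudo-noise sequence, their contribution to the correlation stays small while the embedded term grows linearly with the number of coefficients, which is why spreading buys robustness.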
7.
Objects that occupy a small portion of an image or frame contain fewer pixels and less information, which makes small-object detection a challenging task in computer vision. In this paper, an improved Single Shot MultiBox Detector based on feature fusion and dilated convolution (FD-SSD) is proposed to address the difficulty of detecting small objects. The proposed network uses VGG-16 as the backbone and mainly comprises a multi-layer feature fusion module and a multi-branch residual dilated convolution module. In the multi-layer feature fusion module, the last two feature maps are up-sampled and then concatenated at the channel level with the shallow feature map to enhance the shallow map's semantic information. In the multi-branch residual dilated convolution module, three dilated convolutions with different dilation ratios, built on the residual network, are combined to obtain multi-scale context information without losing the original resolution of the feature map. In addition, deformable convolution is added to each detection layer to better adapt to the shapes of small objects. The proposed FD-SSD achieves 79.1% mAP on the PASCAL VOC2007 dataset and 29.7% mAP on the MS COCO dataset. Experimental results show that FD-SSD effectively improves the utilization of multi-scale information for small objects and thus significantly improves small-object detection.
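The key property of dilated convolution, which the multi-branch module exploits, is that widening the receptive field (via the dilation rate) does not change the output resolution. A 1-D toy version with zero padding (real FD-SSD uses 2-D convolutions in a deep network; this sketch only demonstrates the resolution-preserving property):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Same'-padded 1-D dilated convolution (zeros assumed outside the signal).

    The taps are spaced `dilation` samples apart, so the receptive field is
    (len(kernel) - 1) * dilation + 1, while the output keeps the input length.
    """
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(k):
            idx = i + (j - (k - 1) // 2) * dilation
            if 0 <= idx < len(signal):
                acc += signal[idx] * kernel[j]
        out.append(acc)
    return out
```

Running three such branches with different dilation rates over the same feature map, then merging them, gives multi-scale context at full resolution, which is exactly what small objects need.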
8.
In real-world intelligent transportation systems, accuracy in vehicle license plate detection and recognition is critical. Many algorithms have been proposed for still images, but their accuracy on actual videos is not satisfactory. This stems from several problematic conditions in videos, such as vehicle motion blur, variety in viewpoints, outliers, and the lack of publicly available video datasets. In this study, we focus on these challenges and propose a license plate detection and recognition scheme for videos based on a temporal matching prior network. Specifically, to improve the robustness of detection and recognition accuracy in the presence of motion blur and outliers, forward and bidirectional matching priors between consecutive frames are properly combined with layer structures specifically designed for plate detection. We also built our own video dataset for the deep training of the proposed network. During network training, we perform data augmentation based on image rotation to increase robustness regarding the various viewpoints in videos.
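Rotation-based augmentation generates rotated copies of each training frame so the network sees plates at varied viewpoints. A minimal nearest-neighbor sketch (the angle set and helper names are illustrative; production pipelines use an image library's warp routines):

```python
import math

def rotate_nn(img, angle_deg):
    """Rotate a 2-D grid about its centre using nearest-neighbour sampling.

    Inverse mapping: for each output pixel, find the source pixel it came
    from; positions that fall outside the input are left as zero.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy = cy + (y - cy) * math.cos(a) - (x - cx) * math.sin(a)
            sx = cx + (y - cy) * math.sin(a) + (x - cx) * math.cos(a)
            iy, ix = int(round(sy)), int(round(sx))
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = img[iy][ix]
    return out

def augment(img, angles=(-5, 0, 5)):
    """One augmented copy per angle; labels are shared across copies."""
    return [rotate_nn(img, a) for a in angles]
```

Small angles are the usual choice for plate detection, since real camera viewpoints tilt plates by a few degrees rather than flipping them outright.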
9.
Spatial and temporal inconsistency of depth video deteriorates encoding efficiency in three-dimensional video systems. A depth video processing algorithm based on human perception is presented. Firstly, a just noticeable rendering distortion (JNRD) model is formulated by combining analyses of the influence of depth distortion on virtual view rendering with human visual perception characteristics. Then, depth video is processed based on the JNRD model from two aspects: spatial and temporal correlation enhancement. During spatial correlation enhancement, the depth video is segmented into edge, foreground, and background regions and smoothed by Gaussian and mean filters. The temporal correlation enhancement consists of a temporal-spatial transpose (TST), a temporal smoothing filter, and an inverse TST. Finally, encoding and virtual view rendering experiments are conducted to evaluate the proposed algorithm. Experimental results show that the proposed algorithm greatly reduces the bit rate while maintaining the quality of the virtual view.
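The spatial step's core idea is that depth samples may be smoothed only where the resulting rendering distortion stays below the perceptual threshold, while depth edges (which drive disparity) must be preserved. A 1-D toy illustration (the threshold test and filter are our simplification of the JNRD-guided, region-segmented filtering):

```python
def smooth_depth_row(depth, jnrd=2, win=1):
    """Mean-filter a row of depth samples, skipping 'edge' samples whose
    local gradient exceeds the just-noticeable threshold `jnrd`."""
    out = list(depth)
    n = len(depth)
    for i in range(n):
        left = abs(depth[i] - depth[i - 1]) if i > 0 else 0
        right = abs(depth[i] - depth[i + 1]) if i < n - 1 else 0
        if max(left, right) > jnrd:
            continue  # depth edge: keep the original value
        lo, hi = max(0, i - win), min(n, i + win + 1)
        out[i] = sum(depth[lo:hi]) / (hi - lo)
    return out
```

Flattening the smooth regions removes the frame-to-frame noise that costs bits in the encoder, while leaving edges intact keeps the synthesized virtual view geometrically correct.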
10.
11.
We study a real-time, high-dimensional accumulation technique, based on spatio-temporal diversity and combining, for detecting weak point-like moving targets in image sequences, and show that other algorithms in the related literature can be regarded as special cases of the proposed algorithm. Considering that target size, motion rate, and brightness vary in practice, corresponding solutions are proposed: a pyramid-resolution subsampling scheme for multiple targets moving at different rates; a batch-processing scheme for targets of different sizes; and a variable-integration scheme for targets of varying brightness. The quantitative relations among integration level, motion rate, and target brightness (signal-to-noise ratio) are studied, yielding useful theoretical analyses and simulation results.
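The pyramid-subsampling idea rests on a simple relation: a target moving v pixels per frame at full resolution moves roughly v / 2^k pixels per frame at pyramid level k, so fast targets can be tracked at coarse levels with the same small search window. A minimal mean-pooling pyramid (function names and the pooling choice are our illustration):

```python
def downsample(frame):
    """Half the resolution by 2x2 mean pooling (odd trailing rows/cols dropped)."""
    h, w = len(frame) // 2 * 2, len(frame[0]) // 2 * 2
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def pyramid(frame, levels=3):
    """Resolution pyramid: level k has 1/2^k the linear resolution of level 0."""
    out = [frame]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out
```

Each level then feeds the same accumulation-and-detection stage, so one detector covers targets over a wide range of motion rates.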