Similar Literature
20 similar records found
1.
Relocated I-frames are a key type of abnormal inter-coded frame in double compressed videos with shifted GOP structures. In this work, a frame-wise detection method for relocated I-frames is proposed based on a convolutional neural network (CNN). The proposed detection framework contains a novel network architecture that begins with a preprocessing layer followed by a well-designed CNN. In the preprocessing layer, a high-frequency component extraction operation is applied to eliminate the influence of diverse video contents. To mitigate overfitting, several advanced structures, such as 1 × 1 convolutional filters and a global average-pooling layer, are carefully introduced into the design of the CNN architecture. Publicly available YUV sequences are collected to construct a dataset of double compressed videos with different coding parameters. Experiments show that the proposed framework achieves better relocated I-frame detection performance than a well-known CNN structure (AlexNet) and a method based on the average prediction residual.
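The high-frequency extraction idea can be illustrated with a minimal sketch: convolving a frame with a high-pass kernel suppresses smooth content so that coding artifacts and edges dominate. The Laplacian-style kernel below is an illustrative assumption, not the paper's exact preprocessing layer.

```python
import numpy as np

# Assumed high-pass kernel; the paper's preprocessing filter may differ.
HIGH_PASS = np.array([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=np.float64)

def extract_high_frequency(frame: np.ndarray) -> np.ndarray:
    """Valid-mode 2D convolution of a grayscale frame with a high-pass kernel."""
    h, w = frame.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(frame[i:i + 3, j:j + 3] * HIGH_PASS)
    return out

# A constant (flat-content) region yields an all-zero response: the kernel
# weights sum to zero, so uniform video content is removed.
flat = np.full((8, 8), 128.0)
residual = extract_high_frequency(flat)
```

Because the kernel weights sum to zero, any locally constant content cancels out, which is the sense in which the layer "eliminates the influence of diverse video contents".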

2.
Double JPEG compression detection plays a vital role in multimedia forensics, to find out whether a JPEG image is authentic or manipulated. However, it remains a challenging task when the quality factor of the first compression is much higher than that of the second compression, as well as when the targeted image blocks are quite small. In this work, we present a novel end-to-end deep learning framework taking raw DCT coefficients as input to distinguish between single and double compressed images, which performs better in the above two cases. Our proposed framework can be divided into two stages. In the first stage, we adopt an auxiliary DCT layer with sixty-four 8 × 8 DCT kernels. Using a specific layer to extract DCT coefficients instead of extracting them directly from the JPEG bitstream allows our proposed framework to work even if the double compressed images are stored in the spatial domain, e.g. in PGM, TIFF or other bitmap formats. The second stage is a deep neural network with multiple convolutional blocks to extract more effective features. We have conducted extensive experiments on three different image datasets. The experimental results demonstrate the superiority of our framework over other state-of-the-art double JPEG compression detection methods, whether hand-crafted or learned with deep networks, especially in the two cases mentioned above. Furthermore, our proposed framework can detect triple and even multiple JPEG compressed images, which, as far as we know, is scarce in the literature.
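The fixed DCT layer can be sketched as sixty-four 8 × 8 DCT-II basis kernels, one per (u, v) frequency; applying them to an 8 × 8 spatial block yields that block's DCT coefficients. Normalization and channel ordering here are standard DCT-II conventions and are assumptions about the paper's exact layer.

```python
import numpy as np

def dct_kernels() -> np.ndarray:
    """Sixty-four 8x8 DCT-II basis kernels, indexed (u, v, x, y)."""
    n = 8
    kernels = np.zeros((n, n, n, n))
    for u in range(n):
        for v in range(n):
            au = np.sqrt(1 / n) if u == 0 else np.sqrt(2 / n)
            av = np.sqrt(1 / n) if v == 0 else np.sqrt(2 / n)
            for x in range(n):
                for y in range(n):
                    kernels[u, v, x, y] = (au * av
                        * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                        * np.cos((2 * y + 1) * v * np.pi / (2 * n)))
    return kernels

def block_dct(block: np.ndarray) -> np.ndarray:
    """Applying the 64 kernels to one 8x8 block gives its 2D DCT coefficients,
    exactly what a fixed conv layer with these kernels would compute."""
    return np.tensordot(dct_kernels(), block, axes=([2, 3], [0, 1]))
```

Since the basis is orthonormal, the block is exactly recoverable from its coefficients, so the layer loses no information; this is why spatial-domain bitmaps (PGM, TIFF) can feed the same pipeline as decoded JPEGs.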

3.
Automatic License Plate Recognition (ALPR) is an important task with many applications in Intelligent Transportation and Surveillance systems. This work presents an end-to-end ALPR method based on a hierarchical Convolutional Neural Network (CNN). The core idea of the proposed method is to identify the vehicle and the license plate region using two passes on the same CNN, and then to recognize the characters using a second CNN. The recognition CNN makes extensive use of synthetic and augmented data to cope with limited training datasets, and our results show that the augmentation process significantly increases the recognition rate. In addition, we present a novel temporal coherence technique to better stabilize the OCR output in videos. Our method was tested on publicly available datasets containing Brazilian and European license plates, achieving accuracy rates better than competitive academic methods and a commercial system.
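One simple form of temporal coherence for video OCR is a per-character majority vote across frames. The sketch below is an assumed minimal version (equal-weight voting over reads of the dominant length); the paper's actual stabilization technique may weight frames differently.

```python
from collections import Counter

def stabilize_plate(frame_reads: list[str]) -> str:
    """Fuse noisy per-frame OCR reads of one plate into a single string by
    majority vote: first pick the most common read length, then vote per
    character position among reads of that length."""
    length = Counter(len(r) for r in frame_reads).most_common(1)[0][0]
    reads = [r for r in frame_reads if len(r) == length]
    return "".join(
        Counter(r[i] for r in reads).most_common(1)[0][0]
        for i in range(length)
    )
```

A single misread character in one frame is outvoted by the other frames, which is the intuition behind stabilizing OCR output over a video track.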

4.
Camera-based transmission line detection (TLD) is a fundamental and crucial task for automatically patrolling powerlines by aircraft. Motivated by instance segmentation, a TLD algorithm is proposed in this paper with a novel deep neural network, i.e., CableNet. The network structure is designed based on fully convolutional networks (FCNs) with two major improvements that account for the specific appearance characteristics of transmission lines. First, overlaid dilated convolutional layers and spatial convolutional layers are configured to better represent continuous, long and thin cable shapes. Second, two output branches are arranged to generate multidimensional feature maps for instance segmentation, so cable pixels can be detected and assigned cable IDs simultaneously. Multiple experiments are conducted on aerial images, and the results show that the proposed algorithm obtains reliable detection performance and is superior to traditional TLD methods. Meanwhile, segmented pixels can be accurately identified as cable instances, contributing to line fitting for further applications.

5.
Modern deep convolutional neural networks (CNNs) are often designed to be scalable, leading to the model family concept. A model family is a large (possibly infinite) collection of related neural network architectures. The isomorphism of a model family refers to the fact that the models within it share the same high-level structure; the models within the family are said to be isomorphic to each other. Existing weight initialization methods for CNNs use random initialization or data-driven initialization. Even though these methods can perform satisfactory initialization, the isomorphism of model families is rarely explored. This work proposes an isomorphic model-based initialization method (IM Init) for CNNs. It can initialize any network with another well-trained isomorphic model from the same model family. We first formulate the widely used general network structure of CNNs. Then a structural weight transformation is presented to transform the weights between two isomorphic models. Finally, we apply IM Init to the model down-sampling and up-sampling scenarios and confirm its effectiveness in improving accuracy and convergence speed through experiments on various image classification datasets. In the model down-sampling scenario, IM Init initializes the smaller target model with a larger well-trained source model. It improves the accuracy of RegNet200MF by 1.59% on the CIFAR-100 dataset and 1.9% on the CUB200 dataset. Inversely, in the model up-sampling scenario, IM Init initializes the larger target model with a smaller well-trained source model. It significantly speeds up the convergence of RegNet600MF and improves accuracy by 30.10% under short training schedules. Code will be available.
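A structural weight transformation between isomorphic conv layers can be sketched as a channel index mapping: target channel indices are scaled onto source channel indices, so a target weight tensor of any width is filled from a source of a different width. The nearest-neighbor mapping below is a hypothetical choice for illustration; IM Init's actual transformation may differ.

```python
import numpy as np

def transform_conv_weight(src: np.ndarray, out_c: int, in_c: int) -> np.ndarray:
    """Initialize a (out_c, in_c, k, k) conv weight from a well-trained source
    weight of shape (src_out, src_in, k, k) belonging to an isomorphic model.
    Down-sampling (fewer channels) subsamples source channels; up-sampling
    (more channels) reuses source channels multiple times."""
    src_out, src_in = src.shape[0], src.shape[1]
    oi = np.arange(out_c) * src_out // out_c   # target -> source output channel
    ii = np.arange(in_c) * src_in // in_c      # target -> source input channel
    return src[np.ix_(oi, ii)]

# Example: a 64x32 source layer initializes both a smaller (16x8) and a
# larger (128x64) isomorphic target layer.
source = np.random.default_rng(1).standard_normal((64, 32, 3, 3))
small = transform_conv_weight(source, 16, 8)
large = transform_conv_weight(source, 128, 64)
```

The same high-level structure (layer order, kernel sizes) is what makes this per-layer mapping well defined across the family.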

6.
Multiple JPEG compressions leave artifacts in digital images: residual traces that can be exploited in forensic investigations to recover information about the acquisition device or the image editing software employed. In this paper, a novel First Quantization Estimation (FQE) algorithm based on convolutional neural networks (CNNs) is proposed. In particular, a solution based on an ensemble of CNNs was developed in conjunction with specific regularization strategies that exploit assumptions about neighboring elements of the quantization matrix to be inferred. Although mostly designed for the aligned case, the solution was tested in challenging scenarios involving different input patch sizes, quantization matrices (both standard and custom) and datasets (i.e., the RAISE and UCID collections). Comparisons with state-of-the-art solutions confirmed the effectiveness of the presented solution, demonstrating for the first time coverage of the widest combination of double JPEG compression parameters.

7.
许灵龙,张玉金,吴云 《光电子·激光》2023, 34(12): 1271-1278
Tampering with a JPEG (Joint Photographic Experts Group) image usually leaves double JPEG (DJPE) compression traces; analyzing these traces helps reveal the image's compression history and localize tampered regions. Existing algorithms perform poorly when the image size is small and the quality factor (QF) is low, and they impose restrictions on the combination of the two QFs. This paper proposes an end-to-end forensic network for double JPEG compressed images with mixed QFs, named DJPEGNet. First, a preprocessing layer extracts quantization table (Qtable) features, which characterize the compression history, from the image header file, and the image is transformed from the spatial domain to the DCT (discrete cosine transform) domain to construct statistical histogram features. Then, the two features are fed into a backbone built by stacking depthwise separable convolutions and residual structures, which outputs a binary classification result. Finally, a sliding-window algorithm automatically localizes tampered regions and draws a probability distribution map. Experimental results show that on small-size datasets generated with different Qtable sets, DJPEGNet outperforms existing state-of-the-art algorithms on all metrics, improving ACC by 1.78%, TPR by 2.00%, and TNR by 1.60%.
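The sliding-window localization step can be sketched as follows: a per-block double-compression probability map is smoothed by averaging overlapping windows, yielding a probability distribution over the image from which tampered regions stand out. Window size and stride below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def localize(prob_map: np.ndarray, win: int = 4, stride: int = 2) -> np.ndarray:
    """Slide a win x win window over a per-block probability map; every pixel's
    output is the average of the mean scores of all windows covering it."""
    h, w = prob_map.shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for i in range(0, h - win + 1, stride):
        for j in range(0, w - win + 1, stride):
            acc[i:i + win, j:j + win] += prob_map[i:i + win, j:j + win].mean()
            cnt[i:i + win, j:j + win] += 1
    # Avoid division by zero at uncovered border pixels.
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
```

Thresholding the returned map would then mark the tampered region; overlapping windows act as a vote that suppresses isolated classifier errors.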

8.
A screen content image (SCI) is a composite image containing textual and pictorial regions, which creates many difficulties for image quality assessment (IQA). Large SCIs are divided into image patches to increase the number of training samples for CNN-based IQA models, and this brings two problems: (1) the local quality of each image patch is not equal to the subjective differential mean opinion score (DMOS) of the entire image; (2) different image patches are not equally important for quality assessment. In this paper, we propose a novel no-reference (NR) IQA model based on a convolutional neural network (CNN) for assessing the perceptual quality of SCIs. Our model addresses these two problems with two strategies. First, to imitate the behavior of full-reference (FR) CNN-based models, a CNN-based model is designed for both FR and NR IQA, and the performance of the NR-IQA part improves when the image patch scores predicted by the FR-IQA part are adopted as the ground truth to train the NR-IQA part. Second, the patch qualities of an entire SCI are fused into the image-level quality with an adaptive weighting method that accounts for the different image patch contents. Experimental results verify that our model outperforms all tested NR IQA methods and most FR IQA methods on the screen content image quality assessment database (SIQAD). In cross-database evaluation, the proposed method outperforms the existing NR IQA method by at least 2.4 percent in PLCC and 2.8 percent in SRCC, showing the high generalization ability and effectiveness of our model.
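The second strategy, content-adaptive fusion of patch scores, can be sketched with a simple stand-in weight: here each patch's predicted quality is weighted by its local variance, an assumed proxy for content importance. The paper's weighting is learned, so this is an illustration of the fusion idea only.

```python
import numpy as np

def fuse_patch_scores(scores, patches):
    """Weighted mean of per-patch quality scores, with weights derived from
    patch content activity (variance). Flat, uninformative patches contribute
    almost nothing to the image-level score."""
    scores = np.asarray(scores, dtype=float)
    weights = np.array([p.var() for p in patches]) + 1e-8  # avoid all-zero
    return float(np.sum(weights * scores) / np.sum(weights))

# A flat patch (e.g. blank background) is down-weighted relative to a
# textured patch (e.g. text region).
flat = np.zeros((8, 8))
textured = np.tile(np.array([0.0, 1.0]), (8, 4))
image_score = fuse_patch_scores([0.2, 0.8], [flat, textured])
```

With equal weighting the score would be 0.5; the adaptive weights pull it toward the score of the informative patch.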

9.
For reasons of public security, modeling large crowd distributions for counting or density estimation has attracted significant research interest in recent years. Existing crowd counting algorithms rely on predefined features and regression to estimate the crowd size. However, most of them are constrained by the following limitations: (1) they can handle crowds of a few tens of individuals, but for crowds of hundreds or thousands they can only estimate the crowd density rather than the crowd count; (2) they usually rely on temporal sequences in crowd videos, which are not available for still images. Addressing these problems, in this paper we investigate the use of a deep-learning approach to estimate the number of individuals present in a mid-level or high-level crowd visible in a single image. First, a ConvNet structure is used to extract crowd features. Then two supervisory signals, i.e., crowd count and crowd density, are employed to learn crowd features and estimate the count. We test our approach on a dataset containing 107 crowd images with 45,000 annotated humans in total, with head counts ranging from 58 to 2201 per image. The efficacy of the proposed approach is demonstrated in extensive experiments that quantify the counting performance through multiple evaluation criteria.
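The two supervisory signals can be combined in a single training objective: a pixel-wise density-map loss plus a count loss, where the count is the density map's integral. The simple sum with a balance factor `alpha` is an assumption for illustration; the paper's training may alternate or weight the signals differently.

```python
import numpy as np

def crowd_loss(pred_density: np.ndarray, gt_density: np.ndarray,
               alpha: float = 1.0) -> float:
    """Joint loss: MSE between predicted and ground-truth density maps, plus
    squared error between their integrals (the crowd counts)."""
    density_loss = np.mean((pred_density - gt_density) ** 2)
    count_loss = (pred_density.sum() - gt_density.sum()) ** 2
    return float(density_loss + alpha * count_loss)
```

The density term teaches where people are; the count term directly penalizes miscounting, which matters most for the large crowds the abstract targets.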

10.
Recognizing human interactions in still images is quite a challenging task since, compared to videos, there is only a glimpse of the interaction in a single image. This work investigates the role of human poses in recognizing human–human interactions in still images. To this end, a multi-stream convolutional neural network architecture is proposed, which fuses different levels of human pose information to better recognize human interactions. In this context, several pose-based representations are explored. Experimental evaluations on an extended benchmark dataset show that the proposed multi-stream pose Convolutional Neural Network successfully discriminates a wide range of human–human interactions, and that human pose information, when used in conjunction with the overall context, provides discriminative cues about human–human interactions.

11.
Detection of salient objects in images and video is of great importance in many computer vision applications. Although the state of the art in saliency detection for still images has advanced substantially over the last few years, there have been few improvements in video saliency detection. This paper proposes a novel non-local fully convolutional network architecture for capturing global dependencies more efficiently and investigates the use of recently introduced non-local neural networks in video salient object detection. The effect of non-local operations is studied separately on static and dynamic saliency detection in order to exploit both appearance and motion features. A novel deep non-local fully convolutional network architecture is introduced for video salient object detection and tested on two well-known datasets, DAVIS and FBMS. The experimental results show that the proposed algorithm outperforms state-of-the-art video saliency detection methods.
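The non-local operation that captures global dependencies can be sketched in its embedded-Gaussian form on a flattened set of features: every position's output is an attention-weighted sum over all positions, added back residually. The learned projections (theta, phi, g) are omitted here for brevity, which is an assumption; this shows the pairwise-affinity mechanism only.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x: np.ndarray) -> np.ndarray:
    """Minimal non-local block on (N, C) features, N flattened spatial
    positions: pairwise affinities -> softmax attention -> weighted sum,
    with a residual connection back to the input."""
    attn = softmax(x @ x.T, axis=-1)  # (N, N), each row sums to 1
    return x + attn @ x               # residual: x plus its global context
```

Unlike a convolution, whose receptive field is local, every output position here depends on all positions at once, which is why such blocks help aggregate saliency cues across an entire frame.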

12.
Anomaly detection and localization in crowded scenes have recently attracted a lot of attention in the computer vision research community, due to the increasing use of intelligent surveillance to improve public security. We propose a novel parallel spatial-temporal convolutional neural network model to detect and localize abnormal behavior in video surveillance. Our approach contains two main steps. First, considering the typical camera position and the large amount of background information, we introduce a novel spatial-temporal cuboid-of-interest detection method with a varied-size cell structure and an optical flow algorithm. Then, we use parallel 3D convolutional neural networks to describe the same behavior over different temporal lengths. This step ensures that most of the behavior information in the cuboids is captured, while reducing information unrelated to the major behavior. Evaluation results on benchmark datasets show the superiority of our method compared to state-of-the-art methods.

13.
This work proposes a new multiply-and-accumulate (MAC) processing unit structure that is highly suitable for on-device convolutional neural networks (CNNs). We observe that the bit-lengths representing the numerical values of the input/output neurons and weight parameters in on-device CNNs are small (i.e., low precision), usually no more than 9 bits, and vary across network layers. Based on this, we propose a layer-by-layer composable MAC unit structure that best suits the majority of low-precision operations through maximal parallelism of MAC operations within the unit, with very little subsidiary processing overhead, while remaining sufficiently effective in MAC resource utilization for the remaining operations. Precisely, the two essences of this work are: (1) our MAC unit structure supports two operation modes: (mode-0) operating a single multiplier for every majority multiplication of low precision, and (mode-1) operating a minimal number of multipliers for the remaining multiplications of high precision; (2) for a set of input CNNs, we formulate the exploration of the size of the single internal multiplier in the MAC unit to derive an 'economical' instance, in terms of computation and energy cost, across all network layers. Our strategy contrasts strongly with conventional MAC unit design, in which the MAC input size must be large enough to cover the largest bit-size of the activation inputs/outputs and weight parameters. We show analytically and empirically that our MAC unit structure, together with the exploration of its instances, is very effective, reducing computation cost per multiplication operation by 4.68%–30.3% and saving energy cost by 43.3% on average for the convolutional operations in AlexNet and VGG-16 over conventional MAC unit structures.
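The exploration of the internal multiplier size can be sketched with a toy cost model: a wa × wb multiplication realized on m-bit multipliers needs one multiplier per pair of operand chunks, and (as an assumption for illustration) each m-bit multiplication is charged a cost proportional to m². The paper's actual cost and energy model is richer; this only shows why a single size chosen across all layers can be 'economical'.

```python
import math

def macs_needed(wa: int, wb: int, m: int) -> int:
    """Number of m-bit multipliers needed for one wa x wb multiplication when
    each operand is split into ceil(w/m) chunks."""
    return math.ceil(wa / m) * math.ceil(wb / m)

def economical_multiplier_size(layer_bits, sizes=range(2, 17)) -> int:
    """Pick the internal multiplier size minimizing total cost over all layers.
    layer_bits: list of (activation_bits, weight_bits, op_count) per layer.
    Assumed cost model: (chunks needed) * m^2 per multiplication."""
    cost = {m: sum(macs_needed(wa, wb, m) * m * m * n_ops
                   for wa, wb, n_ops in layer_bits)
            for m in sizes}
    return min(cost, key=cost.get)

# Two hypothetical layers: a dominant 8-bit layer and a rarer 9-bit layer.
best = economical_multiplier_size([(8, 8, 100), (9, 9, 10)])
```

A conventional design would size the multiplier for the worst-case 9-bit layer; the exploration instead lets the majority low-precision layers dominate the choice, with mode-1 composing multiple small multipliers for the rare wide operations.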

14.
The application of convolutional neural networks (CNNs) to image additive white Gaussian noise (AWGN) removal has attracted considerable attention with the rapid development of deep learning in recent years. However, little work has been done on removing multiplicative speckle noise. Moreover, most existing speckle noise removal algorithms are based on traditional methods with human prior knowledge, which means that the parameters of the algorithms must be set manually. Deep learning methods now show clear advantages in image feature extraction. Multiplicative speckle noise is very common in real-life images, especially in medical images. In this paper, a novel neural network structure is proposed to recover images corrupted by speckle noise. Our proposed method consists of three subnetworks. One is a rough clean-image estimation subnetwork; another is a noise estimation subnetwork; the last is an information fusion network based on U-Net and several convolutional layers. Unlike existing speckle denoising models based on image statistics, the proposed network model can handle speckle denoising at different noise levels with a single end-to-end trainable model. Extensive experimental results on several test datasets clearly demonstrate the superior performance of our proposed network over state-of-the-art methods in terms of quantitative metrics and visual quality.

15.
Although convolutional neural networks (CNNs) achieve better denoising performance than traditional image denoising methods, an important issue has not been well resolved: the residual image, obtained by learning the difference between noisy and clean image pairs, contains abundant image detail information, resulting in serious loss of detail in the denoised image. In this paper, to relearn the lost image detail information, a mathematical model is derived from a minimization problem and an end-to-end detail-retaining CNN (DRCNN) is proposed. Unlike most CNN-based denoising methods, DRCNN focuses not only on image denoising but also on the integrity of high-frequency image content. DRCNN needs fewer parameters and less storage space, and therefore has better generalization ability. Moreover, DRCNN can also adapt to different image restoration tasks such as blind image denoising, single image super-resolution (SISR), blind deblurring and image inpainting. Extensive experiments show that DRCNN performs better than some classic and novel methods.

16.
A new deep convolutional network is proposed for detecting small targets in infrared images, converting the detection problem into a classification of the positional distribution of small targets. The detection network consists of a fully convolutional network and a classification network. The fully convolutional network enhances and preliminarily screens small targets, achieving background suppression of the infrared image; the classification network takes the original image and the background-suppressed image as input to further screen candidate target points. SEnet (Squeeze-and-Excitation Networks) is introduced into the network to select feature maps. Experiments verify the advantage of the whole detection network over traditional small-target detection algorithms; the proposed deep-CNN-based method detects small targets well even under complex backgrounds, low signal-to-noise ratios and motion blur.

17.
This paper proposes a new method for estimating the quantization steps (QSs) of an image that was previously JPEG-compressed and then stored in a lossless format. In this method, the DCT coefficients of each frequency band of the JPEG-compressed image aggregate at the QS and its multiples. The entire estimation process is grouped into two categories: alternating-current and direct-current bands. Considering that DCT coefficients under different QSs show different periodicity, QS estimation for each band is further divided into three steps, which identify whether the QS is one, two, or another value. In each step, the periodicity of the DCT coefficients is exploited by analyzing the DCT-coefficient histogram and its corresponding frequency magnitude spectrum. Experimental results demonstrate the efficacy of the proposed method and its superiority in QS estimation for previously JPEG-compressed images, especially when the actual QSs are higher than two.

18.
As the volume of multimedia digital information transmitted over the Internet continues to rise, preventing the unauthorized tampering and dissemination of digital content has emerged as a major concern. The present study therefore proposes a forensic scheme for tracking the dissemination of copyright-protected JPEG images over the Internet. The proposed scheme incorporates two basic mechanisms, namely signature embedding and signature detection. To preserve the quality of the protected JPEG images, the signature is embedded at the application layer. By contrast, the signature detection process is performed at the packet level in order to improve the scalability of the proposed mechanism. For any flow regarded as suspicious, the signatures embedded in the JPEG packet trains are compared with the known digital signatures for forensic purposes. The experimental results show that the embedded signature has no effect on the visual quality of the JPEG image. Moreover, the computational complexity of the proposed detection scheme is significantly lower than that of existing application-level schemes. The scheme thus provides an ideal solution for the forensic analysis of JPEG streams over large-scale network environments such as the Internet. Copyright © 2013 John Wiley & Sons, Ltd.

19.
Targeting the characteristics of infrared oversampled scanning imaging, an infrared point-target detection method based on deep convolutional neural networks is proposed. First, a regression-type deep convolutional network is designed to suppress the cluttered background of the scanned image; the network contains no pooling layers, and its background-suppressed output has the same size as the input image. Second, threshold detection is applied to the suppressed image to extract the raw data of small candidate-target regions. Finally, the candidate-region data are fed into a classification-type deep convolutional network to further discriminate targets and remove false alarms. A large amount of oversampled training data is generated to effectively train the two deep networks. Results show that, under different clutter backgrounds, the method outperforms typical infrared small-target detection methods in target signal-to-clutter ratio gain, detection probability, false-alarm probability and computation time, and is well suited to point-target detection in infrared oversampled scanning systems.

20.
Image resampling detection is an important task in image forensics; its goal is to determine whether an image has undergone a resampling operation. Most existing deep-learning-based resampling detection methods study only specific resampling factors and rarely consider the case where the resampling factor is fully random. Based on the interpolation principles involved in resampling, this paper designs a set of efficient, complementary image preprocessing structures to avoid interference from image content, and employs deformable convolution layers and an efficient channel attention (ECA) mechanism to extract and select resampling features, respectively, effectively improving the convolutional neural network's ability to jointly extract resampling features across different resampling factors. Experimental results show that the method effectively detects both uncompressed resampled images and JPEG-compressed post-processed resampled images, with prediction accuracy substantially higher than that of existing methods.
