Similar Documents
20 similar documents found
1.
Although First Person Vision systems can sense the environment from the user’s perspective, they are generally unable to predict the user’s intentions and goals. Since human activities can be decomposed into atomic actions and interactions with objects, intelligent wearable systems would benefit from the ability to anticipate user–object interactions. Although this task is not trivial, the First Person Vision paradigm provides important cues for addressing the challenge. We propose to exploit the dynamics of the scene to recognize next-active-objects before an object interaction begins. We train a classifier to discriminate trajectories leading to an object activation from all others and forecast next-active-objects by analyzing fixed-length trajectory segments within a temporal sliding window. The proposed method compares favorably with several baselines on the Activity of Daily Living (ADL) egocentric dataset, which comprises 10 hours of video acquired by 20 subjects performing unconstrained interactions with several objects.
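The sliding-window idea above can be sketched as follows. The trained classifier is replaced here by a hypothetical stand-in score (the fraction of steps in a segment that move toward the object); the real method learns this discrimination from data.

```python
import math

def sliding_segments(traj, length, stride=1):
    """Yield fixed-length trajectory segments from a temporal sliding window."""
    for start in range(0, len(traj) - length + 1, stride):
        yield traj[start:start + length]

def approach_score(segment, obj):
    """Toy stand-in for the trained classifier: the fraction of steps in the
    segment that bring the trajectory closer to the object's position."""
    d = [math.dist(p, obj) for p in segment]
    closer = sum(1 for a, b in zip(d, d[1:]) if b < a)
    return closer / (len(d) - 1)

def predict_next_active(traj, obj, length=4, threshold=0.75):
    """Flag the object as next-active if any window scores above threshold."""
    return any(approach_score(s, obj) >= threshold
               for s in sliding_segments(traj, length))
```

A trajectory steadily approaching an object is flagged; one moving away is not.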

2.
In this paper, we present two new articulated motion analysis and object tracking approaches: the decentralized articulated object tracking method and the hierarchical articulated object tracking method. The first approach avoids the common practice of using a high-dimensional joint state representation for articulated object tracking. Instead, we introduce a decentralized scheme and model the interpart interaction within an innovative Bayesian framework. Specifically, we estimate the interaction density by an efficient decomposed interpart interaction model. To handle severe self-occlusions, we further extend the first approach by modeling high-level interunit interaction and develop the second algorithm within a consistent hierarchical framework. Preliminary experimental results have demonstrated the superior performance of the proposed approaches on real-world videos in both robustness and speed compared with other articulated object tracking methods.

3.
Removal of rain from videos: a review
In this paper, algorithms for the detection and removal of rain from videos are reviewed. Rain reduces the visibility of the scene and thus degrades the performance of computer vision algorithms that rely on feature information. Detection and removal of rain require discriminating rain pixels from non-rain pixels, and the accuracy of any such algorithm depends on this discrimination. The merits and demerits of the algorithms are discussed, motivating further research. Rain removal has wide applications in tracking and navigation, consumer electronics, and the entertainment industry.
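The rain/non-rain discrimination at the heart of these methods can be illustrated with one widely used temporal criterion (an assumption here, not this review's own algorithm): a rain streak brightens a pixel for roughly one frame, so a short intensity spike relative to both temporal neighbours marks a rain candidate.

```python
def rain_candidate_frames(intensity, delta=10):
    """Given one pixel's gray-level time series, return the frame indices
    where the pixel is a rain candidate: its intensity exceeds both
    temporal neighbours by at least `delta` (scene edges persist across
    frames; rain streaks do not)."""
    return [t for t in range(1, len(intensity) - 1)
            if intensity[t] - intensity[t - 1] >= delta
            and intensity[t] - intensity[t + 1] >= delta]
```

Removal then typically replaces flagged pixels with a temporal average of their neighbours.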

4.
Several autonomous traffic monitoring systems have been created as a result of the growing number of vehicles in urban areas. Traffic surveillance systems that use roadside cameras, in particular, are becoming widely used for traffic management. For an efficient traffic control and vehicle navigation system, accurate traffic flow information must be obtained from the vehicles detected in surveillance videos. However, vehicles of various scales are difficult to spot in traffic surveillance videos due to the presence of barricades, other vehicles, and poor lighting. Adverse weather conditions such as snow, fog, and heavy rain further diminish the visual quality of the surveillance footage. This paper proposes a multi-scale dense nested deep CNN (MSDN-DCNN) and regional search grasshopper optimization algorithm (RS-GOA) framework to accurately detect vehicles, estimate traffic flow, and find the optimal path with the least travel time. First, the surveillance videos are pre-processed, which includes frame conversion, redundancy removal, and image enhancement. The pre-processed frames are given as input to the MSDN-DCNN for multi-scale vehicle detection. The detected results are used for vehicle counting and traffic flow estimation. Finally, the optimal path is chosen based on the traffic flow information using the RS-GOA algorithm. The performance of the proposed method is compared with existing vehicle detection and path selection techniques. The results show that the proposed Deep CNN-RS-GOA framework achieves high detection accuracy (91.03%), high speed (53.9 fps), lower running time (1,000 ms), shorter travel time, and faster convergence.

5.
This paper studies the practical application of the interference matrix. The principle of the interference matrix is first analyzed, and its characteristics are then described. The interference matrix is applied in three areas: base-station over-coverage analysis, user complaint handling, and frequency planning.

6.
7.
In this paper, we propose a novel approach for facial expression analysis and recognition. The main contributions of the paper are as follows. First, we propose a temporal recognition scheme that classifies a given image in an unseen video into one of the universal facial expression categories using an analysis–synthesis scheme. The proposed approach relies on tracked facial actions provided by a real-time face tracker. Second, we propose an efficient recognition scheme based on the detection of keyframes in videos. Third, we use the proposed method to extend the human–machine interaction functionality of the AIBO robot. More precisely, the robot displays an emotional state in response to the user's recognized facial expression. Experiments using unseen videos demonstrated the effectiveness of the developed methods.

8.
This article proposes a method for learning and robotic replication of dynamic collaborative tasks from offline videos. The objective is to extend the concept of learning from demonstration (LfD) to dynamic scenarios, benefiting from widely available or easily producible offline videos. To achieve this goal, we decode important dynamic information, such as the Configuration Dependent Stiffness (CDS), which reveals the contribution of arm pose to the arm endpoint stiffness, from a three-dimensional human skeleton model. Next, through encoding of the CDS via a Gaussian Mixture Model (GMM) and decoding via Gaussian Mixture Regression (GMR), the robot’s Cartesian impedance profile is estimated and replicated. We demonstrate the proposed method in a collaborative sawing task with a leader–follower structure, considering environmental constraints and dynamic uncertainties. The experimental setup includes two Panda robots, which replicate the leader–follower roles and the impedance profiles extracted from a two-person sawing video.

9.
This paper presents an implementation scheme of Motion-JPEG2000 (MJP2) integrated with invertible deinterlacing. In previous work, we developed an invertible deinterlacing technique that suppresses the comb-tooth artifacts which are caused by field interleaving in interlaced-scan videos and which degrade the quality of scalable frame-based codecs such as MJP2. Our technique has two features: sampling density is preserved, and image quality can be recovered by an inverse process. When no codec is placed between the deinterlacer and the inverse process, the original video is perfectly reconstructed; otherwise, it is almost completely recovered. We suggest an application scenario for this invertible deinterlacer that enhances signal-to-noise ratio scalability in frame-based MJP2 coding. The proposed system suppresses the comb-tooth artifacts at low bitrates, while enabling quality recovery through its inverse process at high bitrates within the standard bitstream format. The main purpose of this paper is to present a system that yields high-quality recovery for an MJP2 codec. We demonstrate that our invertible deinterlacer can be embedded into the discrete wavelet transform employed in MJP2. As a result, the energy gain factor that controls rate-distortion characteristics can be compensated for optimal compression. Simulation results show that the quality recovery is improved by, for example, more than 2.0 dB in peak signal-to-noise ratio by applying the proposed gain compensation when decoding the 8-bit grayscale Football sequence at 2.0 bpp.

10.
Semantic annotation of sports videos
Taking into consideration the unique qualities of sports videos, we propose a system that semantically annotates them at different layers of semantic significance, using different elements of visual content. We decompose each shot into its visual and graphic content elements and, by combining several different low-level visual primitives, capture the semantic content at a higher level of significance.

11.
In this paper, we propose an approach for detecting the ball in broadcast soccer videos. We use hybrid techniques for identifying the ball in medium and long shots. Candidate ball positions are first extracted using features based on shape and size. For medium shots, the ball is identified by filtering the candidates with the help of motion information. In long shots, after motion-based filtering of the non-ball candidates, a directed weighted graph is constructed over the remaining ball candidates. Each node in the graph represents a candidate, and each edge links candidates in a frame with candidates in the next two consecutive frames. Finally, dynamic programming is applied to find the longest path in the graph, which gives the actual ball trajectory. Experiments with several soccer sequences show that the proposed approach is very efficient.
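The graph-plus-dynamic-programming step can be sketched as follows. The edge rule used here (link candidates within a distance budget per frame step) is an assumed plausible criterion, not the paper's exact weighting; the DP itself is the standard longest-path computation on this forward-only graph.

```python
import math

def longest_candidate_path(frames, max_jump=10.0):
    """frames[t] is a list of (x, y) ball candidates in frame t. Each
    candidate is linked to candidates in the next two frames that lie
    within max_jump pixels per frame step; dynamic programming then finds
    the longest such chain, taken as the ball trajectory."""
    n = len(frames)
    best = [[1] * len(f) for f in frames]     # longest chain ending at node
    prev = [[None] * len(f) for f in frames]  # back-pointer for that chain
    for t in range(n):
        for i, p in enumerate(frames[t]):
            for dt in (1, 2):                 # edges to next two frames
                if t + dt >= n:
                    continue
                for j, q in enumerate(frames[t + dt]):
                    if (math.dist(p, q) <= max_jump * dt
                            and best[t][i] + 1 > best[t + dt][j]):
                        best[t + dt][j] = best[t][i] + 1
                        prev[t + dt][j] = (t, i)
    # backtrack from the node with the globally longest chain
    t, i = max(((t, i) for t in range(n) for i in range(len(frames[t]))),
               key=lambda ti: best[ti[0]][ti[1]])
    path = [(t, frames[t][i])]
    while prev[t][i] is not None:
        t, i = prev[t][i]
        path.append((t, frames[t][i]))
    return path[::-1]
```

Spurious candidates far from any plausible motion never join a long chain, so the longest path naturally rejects them.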

12.
The huge amount of video data on the internet requires efficient video browsing and retrieval strategies. One viable solution is to provide summaries of videos in the form of key frames. Video summarization using visual attention modeling has recently been employed: the visually salient frames are extracted as key frames on the basis of theories of human attention modeling. Visual attention modeling schemes have proved effective in video summarization, but the high computational costs incurred by these techniques limit their applicability in practical scenarios. In this context, this paper proposes an efficient key frame extraction method based on a visual attention model. The computational cost is reduced by using temporal-gradient-based dynamic visual saliency detection instead of traditional optical flow methods. Moreover, for static visual saliency, an effective method employing the discrete cosine transform is used. The static and dynamic visual attention measures are fused using a non-linear weighted fusion method. The experimental results indicate that the proposed method is not only efficient but also yields high-quality video summaries.
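The two cheap ingredients named above, temporal-gradient dynamic saliency and non-linear fusion, can be sketched as follows. The specific fusion formula (weighted sum plus a small product term that boosts pixels where both cues agree) is an illustrative assumption, not the paper's exact expression.

```python
def dynamic_saliency(prev_frame, frame):
    """Temporal-gradient dynamic saliency: per-pixel absolute inter-frame
    difference, a cheap substitute for optical flow."""
    return [[abs(a - b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(prev_frame, frame)]

def fuse(static_map, dynamic_map, alpha=0.6):
    """Non-linear weighted fusion of static and dynamic saliency maps:
    a convex combination plus a product term rewarding agreement."""
    return [[alpha * s + (1 - alpha) * d + 0.01 * s * d
             for s, d in zip(rs, rd)]
            for rs, rd in zip(static_map, dynamic_map)]
```

Frames whose fused map has the highest total saliency would then be kept as key frames.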

13.
A new method for rapidly constructing the photographic masks required in the fabrication of microelectronic circuits is described. The method employs an optical projection technique to obtain a precisely defined pencil of light which is used for "writing" on a high-resolution photographic plate. The photomask geometry is exposed in a raster-like format by accurate servo-controlled translation of the high-resolution plate with respect to the projected light beam and by relay-controlled automatic shuttering operated in conjunction with the servosystem. The automatic projection method provides an inherent optical reduction factor of 2 to 22 times and has permitted one-step fabrication of master photomasks with the following image tolerances and definitions: vertical positional tolerance of ±50.0 microns, horizontal positional tolerance of ±10.0 microns, vertical edge definition of less than 2.5 microns, and horizontal edge definition of less than 25.0 microns. Writing speeds of 2.0 to 80.0 mils per second were used to produce line widths from 6.0 to 100.0 microns.

14.
Existing temporal segmentation methods suffer from high computational complexity and complicated processing steps. To address these issues, we present a method that combines a binary tree and a spatio-temporal tunnel (STT) for temporal segmentation of rough videos. First, we compute the initial cumulative spatio-temporal flow to determine the flow overflow of each sub-video divided from a rough video. Second, a decision tree is generated by combining the binary tree with a balance factor to dynamically adjust the sampling line of the STT. Finally, pixels on the sampling line are extracted to generate an adaptive STT for temporal proposals. Experimental results show that the computational complexity of the proposed method is significantly better than that of the comparison methods while maintaining accuracy.

15.
Moving object detection is fundamental to digital image processing and pattern recognition, and is an important area of computer vision research. Using C# as the main research tool, this paper studies video object detection algorithms based on adjacent-frame differencing and background subtraction, focusing on their principles. Finally, using the AForge.NET class library, moving objects are detected by testing whether the absolute gray-level difference exceeds a set threshold. Experimental results show that the algorithm detects moving objects fairly accurately.
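The thresholded absolute-difference test described above (the original uses C# with AForge.NET; this is an equivalent sketch in Python) reduces to a single per-pixel comparison between consecutive gray frames:

```python
def detect_motion_mask(prev_gray, gray, threshold=25):
    """Adjacent-frame differencing: mark a pixel as moving (1) when the
    absolute gray-level difference between consecutive frames exceeds
    the threshold, otherwise static (0)."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(r1, r2)]
            for r1, r2 in zip(prev_gray, gray)]
```

Background subtraction works the same way, except `prev_gray` is replaced by a maintained background model rather than the previous frame.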

16.
《信息技术》2015,(10):42-45
Stereoscopic video quality assessment is an important indicator of stereoscopic video system performance. Since the ultimate viewer of stereoscopic video is human, the most reliable way to judge stereoscopic video quality is subjective assessment, and subjective assessment results are an important basis for optimizing objective quality assessment methods. Based on the characteristics of stereoscopic vision in the human brain, this paper proposes a relatively complete subjective assessment scheme for stereoscopic video quality, covering the construction of a stereoscopic video database, the specification of viewing conditions, the selection of assessors, assessor training and testing, and data processing and analysis of the experimental results.

17.
Recent developments in video coding technology have brought new possibilities for exploiting features inherently embedded in the encoded bit-stream in applications such as video adaptation and analysis. Due to the proliferation of surveillance videos, there is strong demand for highly efficient and reliable object tracking algorithms. This paper presents a new approach for fast compressed-domain analysis that utilises motion data from encoded bit-streams to achieve low processing complexity for object tracking in surveillance videos. The algorithm estimates the trajectory of video objects by using compressed-domain motion vectors extracted directly from standard H.264/MPEG-4 Advanced Video Coding (AVC) and Scalable Video Coding (SVC) bit-streams. The experimental results show comparable tracking precision when evaluated against standard algorithms in the uncompressed domain, while maintaining low computational complexity and fast processing time, making the algorithm suitable for real-time and streaming applications where good estimates of object trajectories must be computed fast.
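One simple way to turn parsed macroblock motion vectors into a trajectory estimate, sketched under the assumption of 16x16 macroblocks and a rigid object box (the paper's actual estimator may differ), is to shift the object box by the mean motion vector of the blocks it covers:

```python
def track_with_motion_vectors(box, mv_field, block=16):
    """box = (x, y, w, h) in pixels; mv_field[r][c] = (dx, dy), the motion
    vector of the block-by-block macroblock grid parsed from the
    bit-stream. Shift the box by the mean motion vector of the
    macroblocks it covers."""
    x, y, w, h = box
    c0, c1 = x // block, (x + w - 1) // block
    r0, r1 = y // block, (y + h - 1) // block
    vecs = [mv_field[r][c]
            for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    dx = sum(v[0] for v in vecs) / len(vecs)
    dy = sum(v[1] for v in vecs) / len(vecs)
    return (x + dx, y + dy, w, h)
```

Because no pixel decoding is needed, this runs per frame in time proportional to the number of macroblocks under the box, which is what makes compressed-domain tracking fast.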

18.
Partial encryption of compressed images and videos
The increased popularity of multimedia applications places a great demand on efficient data storage and transmission techniques. Network communication, especially over a wireless network, can easily be intercepted and must be protected from eavesdroppers. Unfortunately, encryption and decryption are slow, and it is often difficult, if not impossible, to carry out real-time secure image and video communication and processing. Methods have been proposed to combine compression and encryption to reduce the overall processing time, but they are either insecure or too computationally intensive. We propose a novel solution called partial encryption, in which a secure encryption algorithm is used to encrypt only part of the compressed data. Partial encryption is applied to several image and video compression algorithms in this paper. Only 13-27% of the output from quadtree compression algorithms is encrypted for typical images, and less than 2% is encrypted for 512×512 images compressed by the set partitioning in hierarchical trees (SPIHT) algorithm. The results are similar for video compression, resulting in a significant reduction in encryption and decryption time. The proposed partial encryption schemes are fast, secure, and do not reduce the compression performance of the underlying compression algorithm.
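The core idea, encrypting only the security-critical leading portion of the compressed stream, can be sketched as follows. The SHA-256 counter-mode XOR keystream is a toy stand-in for the secure cipher the paper assumes, and treating the leading bytes as the critical part (e.g. quadtree structure bits) is an illustrative simplification.

```python
import hashlib

def keystream(key, n):
    """Toy XOR keystream: SHA-256 of key||counter, concatenated to n bytes.
    A stand-in for a real cipher such as AES; not for production use."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def partial_encrypt(compressed, key, fraction=0.2):
    """Encrypt only the leading `fraction` of the compressed data; the
    remaining bytes are useless to an eavesdropper without the encrypted
    structural information, yet the bulk of the stream needs no cipher work."""
    cut = int(len(compressed) * fraction)
    head = bytes(b ^ k for b, k in zip(compressed[:cut], keystream(key, cut)))
    return head + compressed[cut:]
```

Since XOR is its own inverse, applying `partial_encrypt` again with the same key decrypts, and only `fraction` of the bytes ever pass through the cipher, which is where the speedup comes from.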

19.
In the field of video protection, selective encryption (SE) is a scheme which ensures the visual security of a video by encrypting only a small part of the data. This paper presents a new SE algorithm for H.264/AVC videos in context-adaptive variable-length coding mode. This algorithm controls the amount of encrypted AC coefficients (ACs) of the integer transform in the entropy encoder. Two visual quality measures, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM), are used to measure the visual confidentiality level of each video frame and to control the amount of encrypted ACs. Moreover, a new psychovisual metric to measure flickering is introduced, the so-called temporal structural similarity (TSSIM). The method can be applied to intra- and inter-frame video sequences. Several experimental results show the efficiency of the proposed method.
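The quality-driven control loop can be sketched as follows: raise the share of encrypted ACs until the decoded frame's PSNR falls below a target confidentiality level. The `decode_with_ratio` hook is hypothetical (it stands for re-decoding after encrypting a fraction `r` of the ACs), and the step-search policy is an assumed simple controller, not the paper's exact scheme.

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio between two flat pixel lists, in dB."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def tune_encryption_ratio(ref, decode_with_ratio, target_psnr=15.0, step=0.1):
    """Increase the fraction of encrypted AC coefficients until the decoded
    frame is degraded below the target PSNR, i.e. visually confidential."""
    r = 0.0
    while r < 1.0 and psnr(ref, decode_with_ratio(r)) > target_psnr:
        r += step
    return min(r, 1.0)
```

The same loop works with SSIM (or the TSSIM flicker metric across frames) in place of PSNR; only the quality function and target change.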

20.
In this study, a spatiotemporal saliency detection and salient region determination approach for H.264 videos is proposed. After Gaussian filtering in Lab color space, the phase spectrum of Fourier transform is used to generate the spatial saliency map of each video frame. On the other hand, the motion vector fields from each H.264 compressed video bitstream are backward accumulated. After normalization and global motion compensation, the phase spectrum of Fourier transform for the moving parts is used to generate the temporal saliency map of each video frame. Then, the spatial and temporal saliency maps of each video frame are combined to obtain its spatiotemporal saliency map using adaptive fusion. Finally, a modified salient region determination scheme is used to determine salient regions (SRs) of each video frame. Based on the experimental results obtained in this study, the performance of the proposed approach is better than those of two comparison approaches.
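The adaptive fusion step can be illustrated with one simple scheme: weight each map by its share of total saliency energy so that, frame by frame, the more informative cue dominates. This per-frame weighting rule is an assumption for illustration, not the paper's exact formula.

```python
def adaptive_fusion(spatial, temporal):
    """Fuse spatial and temporal saliency maps with weights proportional
    to each map's total energy, so a frame dominated by motion leans on
    the temporal cue and a static frame leans on the spatial cue."""
    es = sum(sum(row) for row in spatial)
    et = sum(sum(row) for row in temporal)
    ws = es / (es + et) if es + et else 0.5
    return [[ws * s + (1 - ws) * t for s, t in zip(rs, rt)]
            for rs, rt in zip(spatial, temporal)]
```

Salient regions would then be obtained by thresholding the fused map and keeping connected components.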
