Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
Multimedia Tools and Applications - This study proposes an event detection technique for acoustic surveillance that detects emergency situations using acoustic sensors. Most surveillance systems...

2.
Identifying people and tracking their locations is a key prerequisite to achieving context awareness in smart spaces. Moreover, in realistic context-aware applications, these tasks have to be carried out in a non-obtrusive fashion. In this paper we present a set of robust person-identification and tracking algorithms based on audio and visual processing. A main characteristic of these algorithms is that they operate on far-field and unconstrained audio–visual streams, which ensures that they are non-intrusive. We also illustrate that the combination of their outputs can lead to composite multimodal tracking components suitable for supporting a broad range of context-aware services. In combining audio–visual processing results, we exploit a context-modeling approach based on a graph of situations. Accordingly, we discuss the implementation of realistic prototype applications that make use of the full range of audio, visual, and multimodal algorithms.

3.
Face detection and landmark localization have been extensively investigated and are the prerequisite for many face-related applications, such as face recognition and 3D face reconstruction. Most existing methods address only one of the two problems. In this paper, we propose a coupled encoder–decoder network to jointly detect faces and localize facial key points. The encoder and decoder generate response maps for facial landmark localization. Moreover, we observe that the intermediate feature maps from the encoder and decoder represent facial regions, which motivates us to build a unified framework for multi-scale cascaded face detection by coupling the feature maps. Experiments on face detection using two public benchmarks show improved results compared to existing methods. They also demonstrate that face detection as a pre-processing step leads to increased robustness in face recognition. Finally, our experiments show that the landmark localization accuracy is consistently better than the state-of-the-art on three face-in-the-wild databases.

4.
Systems that broadcast/multicast over cellular networks have recently been intensively investigated. Compared to conventional terrestrial or satellite broadcasting systems, the Quality of Service (QoS) experienced by edge users is an important issue due to the inevitable inter-cell interference (ICI) that occurs within multi-cell environments. To resolve this issue, we have developed cooperative sub-band allocation (CSA) and CSA-joint transmission (CSA-JT) techniques that operate as a function of visual importance levels assigned to multi-layer videos, where the number of service users is limited over the macro/micro cell environment. To ensure that an acceptable level of video quality is delivered to edge users, an adaptive sub-band allocation scheme for layered video is designed to enhance the overall experience for all users while maintaining program fairness. In a multiple-input/multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) system, CSA can effectively mitigate ICI at the cell border via base station (BS) cooperation. Moreover, CSA-JT can improve the QoS for cell-border users via joint transmission among the cooperating BSs. To deliver optimal visual quality, an optimization problem is formulated that seeks to maximize the aggregate quality of experience of the multicast users. A dual decomposition technique is applied to reduce the computational complexity of the system. Simulation results show that the CSA and CSA-JT algorithms achieve a remarkable reduction in outage probability.

5.
Pan, Yaohua; Niu, Zhibin; Wu, Jing; Zhang, Jiawan. Computational Visual Media, 2019, 5(4): 375-390
Computational Visual Media - Role–event videos are rich in information but challenging to understand at the story level. The social roles and behavior patterns of characters largely depend...

6.
7.
This paper proposes an emotion recognition system using a deep learning approach on emotional Big Data comprising speech and video. In the proposed system, a speech signal is first processed in the frequency domain to obtain a Mel-spectrogram, which can be treated as an image. This Mel-spectrogram is then fed to a convolutional neural network (CNN). For video signals, some representative frames from a video segment are extracted and fed to a second CNN. The outputs of the two CNNs are fused using two consecutive extreme learning machines (ELMs). The output of the fusion is given to a support vector machine (SVM) for final classification of the emotions. The proposed system is evaluated using two audio–visual emotional databases, one of which is Big Data. Experimental results confirm the effectiveness of the proposed system involving the CNNs and the ELMs.
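The ELM fusion stage is easy to prototype: an extreme learning machine uses a fixed random hidden layer and solves only the output weights in closed form. The sketch below is a minimal illustration in NumPy; the toy features stand in for fused CNN outputs, and all sizes and names are assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, Y, n_hidden=64):
    # Random, fixed hidden layer; only the output weights are learned,
    # in closed form via the pseudo-inverse of the hidden activations.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ Y
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy stand-in for fused CNN features: two well-separated classes,
# one-hot encoded labels.
X = np.vstack([rng.normal(-1.0, 0.3, size=(50, 8)),
               rng.normal(+1.0, 0.3, size=(50, 8))])
Y = np.vstack([np.tile([1.0, 0.0], (50, 1)),
               np.tile([0.0, 1.0], (50, 1))])
W, b, beta = elm_fit(X, Y)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
accuracy = float((pred == Y.argmax(axis=1)).mean())
```

Because training reduces to one pseudo-inverse, ELMs are attractive as fast fusion layers on top of pre-trained feature extractors.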

8.
9.
The design of an embedded audio–visual tracking and speech purification system is described in this paper. The system performs human face tracking, voice activity detection, sound-source direction estimation, and speech enhancement in real time. Estimating the sound-source direction helps to re-initialize the face-tracking module when the target changes direction. The implementation is based on an embedded dual-core processor, the Texas Instruments DM6446 (DaVinci) platform, which contains an ARM core and a DSP core. For speech signal processing, an eight-channel digital microphone array was developed, with the associated pre-processing and interfacing features implemented on an Altera Cyclone II FPGA. All experiments were conducted in a real environment, and the results show that the system executes all of the audio and vision functions in real time.
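The abstract does not name the direction-estimation algorithm; a common choice for microphone arrays is GCC-PHAT, which estimates the inter-channel delay and then maps it to a bearing. The sketch below (NumPy only, synthetic signals; the sampling rate and delay are made-up example values) shows the delay-estimation core.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    # Generalized cross-correlation with phase transform: whiten the
    # cross-spectrum so only phase (i.e., delay) information remains.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs  # delay in seconds

fs = 16000
rng = np.random.default_rng(1)
clean = rng.normal(size=2048)
delayed = np.roll(clean, 5)        # second channel lags by 5 samples
tau = gcc_phat(delayed, clean, fs)
est_delay = int(round(tau * fs))
```

For a pair of microphones spaced d apart, the bearing would follow as arcsin(tau * c / d) with c the speed of sound; that conversion is omitted here.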

10.
11.
In this paper, the problem of non-collaborative person identification for secure access to facilities is addressed. The proposed solution combines face recognition and speaker recognition techniques; integrating the two methods improves performance with respect to either classifier alone.

In non-collaborative scenarios, face recognition first requires detecting the face pattern and then recognizing it even in non-frontal poses. In the current work, histogram normalization, a boosting technique, and linear discriminant analysis are exploited to handle typical problems such as illumination variability, occlusions, and pose variation. In addition, a new temporal classification is proposed to improve the robustness of the frame-by-frame classification. This allows known classification techniques for still-image recognition to be projected into a multi-frame context in which the image capture admits dynamics in the environment.

For the audio, a method for automatic speaker identification in noisy environments is presented. In particular, we optimize a speech de-noising algorithm built around the extended Kalman filter (EKF). As a baseline system for integration with the proposed de-noising algorithm, we use a conventional speaker recognition system based on Gaussian mixture models with mel-frequency cepstral coefficients (MFCCs) as features.

To confirm the effectiveness of these methods, we performed video and speaker recognition tasks first separately and then with the results integrated. Two different corpora were used: (a) public corpora (ELDSR for audio and FERRET for images) and (b) a dedicated audio/video corpus in which the speakers read a list of sentences while wearing a scarf or a full-face motorcycle helmet. Experimental results show that our methods significantly reduce the classification error rate.
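The baseline scoring step (each enrolled speaker modeled by a diagonal-covariance GMM over MFCC frames, identification by highest average log-likelihood) can be sketched as follows. The single-component "speaker models" and synthetic frames below are toy stand-ins, not the paper's trained models.

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)),
                      axis=axis)

def gmm_loglik(X, weights, means, variances):
    # Average log-likelihood of feature frames X (frames x dims) under a
    # diagonal-covariance GMM with the given mixture parameters.
    lp = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                 + (((X[:, None, :] - means) ** 2) / variances).sum(axis=2))
    return logsumexp(np.log(weights) + lp, axis=1).mean()

rng = np.random.default_rng(2)
# Toy "MFCC" statistics: speaker A centered at 0, speaker B at 3.
spk_a = dict(weights=np.array([1.0]), means=np.zeros((1, 13)),
             variances=np.ones((1, 13)))
spk_b = dict(weights=np.array([1.0]), means=np.full((1, 13), 3.0),
             variances=np.ones((1, 13)))
test_frames = rng.normal(0.0, 1.0, size=(200, 13))  # drawn like speaker A
score_a = gmm_loglik(test_frames, **spk_a)
score_b = gmm_loglik(test_frames, **spk_b)
identified = "A" if score_a > score_b else "B"
```

In the full system, the de-noised speech would be converted to MFCC frames before scoring; that front end is not reproduced here.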

12.
Digital cameras, new-generation phones, commercial TV sets and, in general, all modern devices for image acquisition and visualization can benefit from image enhancement algorithms that work in real time, preferably with limited power consumption. Among the various methods described in the scientific literature, Retinex-based approaches provide very good performance, but unfortunately they typically require high computational effort. In this article, we propose a flexible and effective architecture for the real-time enhancement of video frames, suitable for implementation in a single FPGA device. The video enhancement algorithm is based on a modified version of the Retinex approach. This method, developed to control the dynamic range of poorly illuminated images while preserving visual details, has been improved by adopting a new model for illuminance estimation. The video enhancement parameters are controlled in real time through an embedded microprocessor, which allows the system to modify its behavior according to the characteristics of the input images and to information about the surrounding light conditions.
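The paper's modified illuminance model is not given here; the classic single-scale Retinex it builds on estimates illuminance with a Gaussian blur and subtracts it in the log domain. A NumPy-only sketch (kernel scale and test image are illustrative assumptions):

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian blur, NumPy only (zero padding at the borders).
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def single_scale_retinex(img, sigma=15.0):
    # Reflectance estimate: log(image) - log(illuminance), where the
    # blurred image stands in for the slowly varying illuminance.
    img = img.astype(float) + 1.0  # avoid log(0)
    return np.log(img) - np.log(gaussian_blur(img, sigma))

# Dark gradient image with a small bright detail.
img = np.tile(np.linspace(5.0, 50.0, 64), (64, 1))
img[30:34, 30:34] += 40.0
out = single_scale_retinex(img, sigma=8.0)
detail_contrast = out[31, 31] - out[31, 10]
```

Dividing out (in log space) the low-frequency illuminance is what lets Retinex compress dynamic range while keeping local detail, which is also why the blur is the dominant cost that hardware implementations focus on.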

13.
Anomaly detection (AD) has been one of the most active topics of the past ten years in hyperspectral imagery (HSI). The goal of AD is to label as targets those pixels with significant spectral or spatial differences from their neighbours. In this paper, we propose a method that uses both the spectral and spatial information of HSI, based on the human visual system (HVS). Inspired by the functionality of the retina and the visual cortex, multiscale multiresolution analysis is applied to some principal components of the hyperspectral data to extract features from different spatial levels of the image. The global and local relations between features are then considered, inspired by the visual attention mechanism and the inferotemporal (IT) part of the visual cortex. The effect of the attention mechanism is implemented using a logarithmic function, which highlights small variations in pixels' grey levels in the global features; a maximum operation over the local features imitates the function of IT. Finally, an information-theoretic weighting of the global and local detection maps generates the final anomaly map. The proposed method is compared with state-of-the-art methods such as SSRAD, FLD, PCA, RX, KPCA, and AED on two well-known real hyperspectral datasets, the San Diego airport and Pavia city scenes, and on a synthetic hyperspectral dataset. The results demonstrate that the proposed method effectively improves AD capability: it enhances the detection rate while reducing the false-alarm rate and the computational complexity.
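Of the baselines listed, the classical global RX detector is simple to reproduce: each pixel's spectrum is scored by its Mahalanobis distance from the global background statistics. A sketch on a synthetic cube with one implanted anomaly (all sizes are illustrative):

```python
import numpy as np

def rx_detector(cube):
    # Global RX: Mahalanobis distance of each pixel's spectrum from the
    # scene-wide mean and covariance of the background.
    h, w, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(bands)  # regularized
    inv = np.linalg.inv(cov)
    d = X - mu
    scores = np.einsum("ij,jk,ik->i", d, inv, d)
    return scores.reshape(h, w)

rng = np.random.default_rng(3)
cube = rng.normal(0.0, 1.0, size=(32, 32, 10))
cube[16, 16] += 8.0                      # implanted spectral anomaly
scores = rx_detector(cube)
peak = np.unravel_index(scores.argmax(), scores.shape)
```

Local variants replace the global statistics with a sliding window around each pixel, trading robustness to scene structure for computational cost.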

14.
Motion estimation in videos is a computationally intensive process. A popular strategy for dealing with such a high processing load is to accelerate algorithms with dedicated hardware such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and digital signal processors (DSPs). Previous approaches addressed the problem using accelerators together with a general-purpose processor, such as an Acorn RISC Machine (ARM). In this work, we present a co-processing architecture using an FPGA and a DSP. A portable platform for motion estimation based on sparse feature-point detection and tracking is developed for real-time embedded systems and smart video sensor applications. A Harris corner detection IP core is designed with a customized fine-grain pipeline on a Virtex-4 FPGA. The detected feature points are then tracked using the Lucas–Kanade algorithm in a DSP that acts as a co-processor for the FPGA. The hybrid system offers a throughput of 160 frames per second (fps) at VGA resolution. We have also tested the benefits of the proposed solution (FPGA + DSP) against two other traditional architectures and co-processing strategies: hybrid ARM + DSP and DSP only. The proposed FPGA + DSP system offers a speedup of about 20 times and 3 times over the ARM + DSP and DSP-only configurations, respectively. A comparison of the Harris feature detection algorithm's performance across embedded processors (DSP, ARM, and FPGA) reveals that the DSP offers the best performance when scaling up from QVGA to VGA resolution.
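The Harris response computed by the IP core follows the standard formula R = det(M) - k * tr(M)^2, with M the locally smoothed structure tensor. A software reference sketch in NumPy (a box window stands in for the hardware's smoothing window; image and constants are illustrative):

```python
import numpy as np

def harris_response(img, k=0.04, r=2):
    # R = det(M) - k * trace(M)^2, where M is the structure tensor of the
    # image gradients, smoothed over a (2r+1)x(2r+1) box window.
    img = img.astype(float)
    Iy, Ix = np.gradient(img)
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):
        out = np.zeros_like(a)
        h, w = a.shape
        for i in range(h):
            for j in range(w):
                out[i, j] = a[max(0, i - r):i + r + 1,
                              max(0, j - r):j + r + 1].mean()
        return out

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

# Synthetic image: a white square on black has strong corners.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)
corner_score = R[10, 10]
edge_score = R[10, 20]      # middle of the top edge
flat_score = R[2, 2]
```

The sign pattern is what the detector exploits: corners give R > 0, edges give R < 0, and flat regions give R near 0, so corner extraction reduces to thresholding and non-maximum suppression.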

15.
16.
In most existing Wyner–Ziv video coding schemes, a feedback channel (FC) is expected at the decoder in order to allocate a proper bit rate for each Wyner–Ziv frame. However, the FC not only results in additional latency but also increases decoding complexity due to several feedback-decoding iterations. Moreover, an FC may be unavailable in many practical video applications. In this paper, we propose a novel feedback-free rate-allocation scheme for transform-domain Wyner–Ziv video coding (TD-WZVC), which predicts the rate for each Wyner–Ziv frame at the encoder without significantly increasing the encoder's complexity. First, a correlation estimation model is presented to characterize the relationship between the source frame and the reference frame estimated at the encoder in TD-WZVC. Then, an efficient FC-free rate-allocation algorithm is proposed, and a linear model is built to avoid both overestimation and underestimation of the real rate and to obtain optimal rate-distortion performance. Experimental results show that the proposed scheme achieves good encoder rate allocation while maintaining consistent coding efficiency.
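As an illustration only (the paper's correlation model is not reproduced here), a feedback-free allocator can fit a linear rate model offline and apply it per frame at the encoder. All data, coefficients, and the safety margin below are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical offline training data: estimated correlation-noise energy
# per frame vs. the rate (bits) the decoder actually needed.
noise_energy = rng.uniform(1.0, 10.0, size=40)
true_rate = 1200.0 * noise_energy + 3000.0 + rng.normal(0.0, 200.0, size=40)

# Least-squares fit of the linear model: rate ~ a * energy + b.
A = np.vstack([noise_energy, np.ones_like(noise_energy)]).T
(a, b), *_ = np.linalg.lstsq(A, true_rate, rcond=None)

def allocate_rate(energy, margin=1.05):
    # Feedback-free allocation: predict the rate from the estimated
    # correlation noise, plus a small margin against underestimation.
    return margin * (a * energy + b)

predicted = allocate_rate(5.0)
```

The margin trades a small rate overhead for fewer underestimated frames, which is the practical failure mode when no feedback channel can request more bits.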

17.
This paper is concerned with the fault detection problem for two-dimensional (2-D) discrete-time systems described by the Fornasini–Marchesini local state-space model. The goal of the paper is to design a fault detection filter that detects the occurrence of faults in a finite frequency domain. To this end, a finite-frequency H− index is used to describe fault-sensitivity performance, and a finite-frequency H∞ index is used to describe disturbance-attenuation performance. In light of the generalised Kalman–Yakubovich–Popov lemma for 2-D systems and matrix-inequality techniques, convex conditions are derived for this fault detection problem. Based on these conditions, a numerical algorithm is put forward to construct the desired fault detection filter. Finally, a numerical example and an industrial example are given to illustrate the effectiveness of the proposed algorithm.

18.
Ergonomics, 2012, 55(6): 775-797
In a simulated aircraft navigation task, a fusion technique known as triangulation was used to improve the accuracy and onscreen availability of location information from two separate radars. Three experiments investigated whether the reduced cognitive processing required to extract information from the fused environment led to impoverished retention of visual–spatial information. Experienced pilots and students completed various simulated flight missions and were required to make a number of location estimates. Following a retention interval, memory for locations was assessed. Experiment 1 demonstrated, in an applied setting, that retention of fused information was problematic, and Experiment 2 replicated this finding under laboratory conditions. Experiment 3 successfully improved the retention of fused information by limiting its availability within the interface, which, it is argued, shifted participants' strategies from over-reliance on the display as an external memory source to more memory-dependent interaction. These results are discussed within the context of intelligent interface design and effective human–machine interaction.
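The triangulation step itself reduces to intersecting two bearing lines from the two radar positions. A minimal geometric sketch (positions and bearings are made-up example values, not from the study):

```python
import numpy as np

def triangulate(p1, bearing1, p2, bearing2):
    # Locate a target from two position/bearing observations by
    # intersecting the two bearing rays (bearings in radians,
    # measured counter-clockwise from the x-axis).
    d1 = np.array([np.cos(bearing1), np.sin(bearing1)])
    d2 = np.array([np.cos(bearing2), np.sin(bearing2)])
    # Solve p1 + t1*d1 = p2 + t2*d2 for the scalar parameters t1, t2.
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.asarray(p2, float) - np.asarray(p1, float))
    return np.asarray(p1, float) + t[0] * d1

# Two radars observing the same aircraft.
target = np.array([30.0, 40.0])
r1, r2 = np.array([0.0, 0.0]), np.array([100.0, 0.0])
b1 = np.arctan2(target[1] - r1[1], target[0] - r1[0])
b2 = np.arctan2(target[1] - r2[1], target[0] - r2[0])
fused = triangulate(r1, b1, r2, b2)
```

Fusing the two bearings into one plotted point is exactly what spares the operator the mental intersection, which is the cognitive shortcut the retention experiments probe.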

19.
Occlusion, where content is visible in one frame but cannot be seen in the other, is a vital challenge in video stitching: it produces ghost artifacts in the blended area. Meanwhile, traditional image stitching approaches ignore temporal consistency and cannot avoid flicker. To address these challenges, we propose a unified framework in which both stitching quality and stabilization perform well. Specifically, we explicitly detect the potential occlusion regions to guide blending. Then, based on the occlusion maps, we choose a proper strip in the overlapped region as the blending area. With spatial–temporal Bayesian view synthesis, spatial ghost-like artifacts can be significantly reduced and the output videos kept stable. The experimental results show that the proposed approach outperforms state-of-the-art approaches.
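Blending over a strip of the overlapped region is commonly done with linear feathering; the sketch below illustrates that basic step only (the paper's occlusion-aware Bayesian synthesis is not reproduced, and the frame sizes are illustrative).

```python
import numpy as np

def feather_blend(left, right, overlap):
    # Blend two frames whose last/first `overlap` columns coincide,
    # using a linear feathering ramp across the overlap strip.
    ramp = np.linspace(1.0, 0.0, overlap)  # weight for the left frame
    strip = left[:, -overlap:] * ramp + right[:, :overlap] * (1.0 - ramp)
    return np.hstack([left[:, :-overlap], strip, right[:, overlap:]])

left = np.full((4, 10), 100.0)
right = np.full((4, 10), 60.0)
pano = feather_blend(left, right, overlap=4)
```

When an object appears in only one frame, this naive weighted average is precisely what produces the ghost artifacts the paper targets, which motivates choosing the blending strip away from occluded regions.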

20.
Near-field source localization using passive sensor arrays plays an important role in array signal processing. Although many algorithms have been developed for this problem, most suffer from parameter-matching issues, heavy aperture loss, or high computational complexity. To overcome these problems, a new algorithm is proposed in this paper to jointly estimate the ranges, directions of arrival (DOAs), and frequencies of multiple near-field narrow-band sources. Simulation results verify that the proposed algorithm resolves these problems and gives much better performance.
