Similar Literature
 Found 20 similar articles (search time: 46 ms)
1.
This paper proposes an efficient video coding method using audio-visual focus of attention, which is based on the observation that sound-emitting regions in an audio-visual sequence draw viewers’ attention. First, an audio-visual source localization algorithm is presented, where the sound source is identified by using the correlation between the sound signal and the visual motion information. The localization result is then used to encode different regions in the scene with different quality in such a way that regions close to the source are encoded with higher quality than those far from the source. This is implemented in the framework of H.264/AVC by assigning different quantization parameters for different regions. Through experiments with both standard and high definition sequences, it is demonstrated that the proposed method can yield considerable coding gains over the constant quantization mode of H.264/AVC without noticeable degradation of perceived quality.
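
The region-dependent quantization described in this abstract can be illustrated with a minimal sketch. It assumes a linear growth of the quantization parameter (QP) with distance from the macroblock that contains the localized sound source; the function name `qp_map`, the linear rule, and the default values are hypothetical and are not taken from the paper. Only the QP range 0 to 51 comes from H.264/AVC itself.

```python
import math


def qp_map(cols, rows, source, base_qp=26, max_offset=10):
    """Return a rows x cols grid of QP values, one per macroblock.

    The macroblock at `source` (x, y) gets base_qp. QP grows linearly with
    Euclidean distance and reaches base_qp + max_offset at the farthest
    corner (an assumed rule). Values are clamped to the H.264 range 0..51.
    """
    sx, sy = source
    # Largest distance from the source to any corner of the macroblock grid.
    far = max(math.hypot(cx - sx, cy - sy)
              for cx in (0, cols - 1) for cy in (0, rows - 1))
    grid = []
    for y in range(rows):
        row = []
        for x in range(cols):
            d = math.hypot(x - sx, y - sy)
            offset = round(max_offset * d / far) if far > 0 else 0
            row.append(min(51, max(0, base_qp + offset)))
        grid.append(row)
    return grid


if __name__ == "__main__":
    for row in qp_map(5, 3, source=(0, 0)):
        print(row)
```

A lower QP means finer quantization, so macroblocks near the source keep more detail while distant ones are compressed harder.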

2.
Yi ZHANG  Juan LI  Min ZHANG 《通信学报》2019,40(1):102-109
In traditional multi-source localization, the number of microphones must exceed the number of sources. To overcome this constraint, a dual-microphone multi-source localization algorithm based on compressed sensing (CS) is proposed, with which the number of successfully localized sound sources can exceed 3. The algorithm treats multi-source localization as a block-sparse signal reconstruction problem and uses the normalized full room impulse responses to construct the compressed observation matrix in the frequency domain. In the reconstructed block-sparse signal, the positions of the non-zero blocks correspond to the positions of the sound sources in space. Simulations show that, compared with the SRP-sub algorithm, with two microphones and a reverberation time of 0.6 s, the proposed algorithm performs better, reaching an 80% success rate when localizing 3 sound sources using 40 frequency points.

3.
This letter proposes a sound source localization method for digital hearing aids that applies wavelet-based multivariate statistics to the Generalized Cross Correlation (GCC) algorithm. The Haar wavelet is used to decompose GCC sequences and extract four wavelet characteristics. Hotelling's T² statistic is then used to fuse the four wavelet characteristics. The resulting statistic is used to judge the number of sound sources and to obtain the corresponding time delay estimates, from which the position of each sound source is localized. The experimental results show that the proposed method is more robust in environments with severe noise and reverberation. The complexity of the algorithm is moderate, which makes it suitable for sound source localization in hearing aids.
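
The time delay estimation step that underlies this method can be sketched as follows. This is only the plain, unweighted cross-correlation peak search (the GCC with no weighting function), not the wavelet and Hotelling T² stages of the letter. The names `estimate_delay` and `delay_to_angle`, the far-field two-microphone geometry, and the speed of sound of 343 m/s are assumptions made for illustration.

```python
import math


def estimate_delay(x, y, max_lag):
    """Return the lag in samples by which y trails x.

    Evaluates r[lag] = sum_n x[n] * y[n + lag] for every lag in
    [-max_lag, max_lag] and returns the lag with the largest value.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):
                val += xn * y[m]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def delay_to_angle(lag, fs, mic_distance, c=343.0):
    """Convert a lag to a far-field arrival angle in degrees (0 = broadside).

    Uses sin(theta) = c * tau / d with tau = lag / fs; the argument is
    clipped to [-1, 1] so noisy lags cannot break asin.
    """
    s = max(-1.0, min(1.0, lag / fs * c / mic_distance))
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    x = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
    y = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]  # x delayed by three samples
    print(estimate_delay(x, y, max_lag=5))
```

The direct double loop costs O(N · max_lag); practical implementations usually compute the correlation through an FFT instead.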

4.
肖易明  张海剑  孙洪  丁昊 《信号处理》2019,35(12):1969-1978
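In everyday life, visual events are usually accompanied by sound. This suggests a latent link between the video stream and the audio, which this paper calls the joint representation of audio-visual synchronization. The paper fuses the video stream with the audio and learns this joint representation by training a purpose-designed neural network to predict whether the video stream and the audio are temporally synchronized. Unlike conventional audio-visual fusion methods, it introduces an attention mechanism that uses the Pearson correlation coefficient between video features and audio features to weight the video stream along both the temporal and spatial dimensions, tying the video stream more closely to the audio. Building on the learned joint representation, the paper further applies the class activation map method to localize sound sources in video. Experimental results show that the proposed attention-based audio-visual synchronization detection model better determines whether the audio and video of a given clip are synchronized, that is, it better learns the joint representation of audio-visual synchronization, and can therefore also localize sound sources in video effectively.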
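
A rough sketch of correlation-based weighting is given here for illustration. It computes the Pearson correlation coefficient between one audio feature vector and the video feature vector at each spatial location, then turns the coefficients into weights. Clipping negative correlations to zero and normalizing the weights to sum to 1 are assumptions, as are the names `pearson` and `attention_weights`; the abstract does not specify how the network applies the coefficients.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def attention_weights(audio_feature, video_features):
    """Return one weight per location in video_features.

    Negative correlations are clipped to zero and the result is normalized
    to sum to 1 (both are assumed choices). If no location correlates
    positively, the weights fall back to uniform.
    """
    scores = [max(0.0, pearson(audio_feature, v)) for v in video_features]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    audio = [1, 2, 3, 4]
    video = [[2, 4, 6, 8], [4, 3, 2, 1], [1, 1, 1, 1]]
    print(attention_weights(audio, video))
```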

5.
This paper introduces a mechanism for localizing a microphone array when the location of sound sources in the environment is known. Using the proposed spatial observability function based microphone array integration technique, a maximum likelihood estimator for the correct position and orientation of the array is derived. This is used to localize and track a microphone array with a known and fixed geometrical structure, which can be viewed as the inverse sound localization problem. Simulations using a two-element dynamic microphone array illustrate the ability of the proposed technique to correctly localize and estimate the orientation of the array even in a very reverberant environment. Using 1 s male speech segments from three speakers in a 7 m by 6 m by 2.5 m simulated environment, a 30 cm inter-microphone distance, and PHAT histogram SLF generation, the average localization error was approximately 3 cm with an average orientation error of 19°. The same simulation configuration but with 4 s speech segments results in an average localization error less than 1 cm, with an average orientation error of approximately 2°. Experimental examples illustrate localizations for both stationary and dynamic microphone pairs.

6.
Learning multimodal dictionaries.   Total citations: 1 (self-citations: 0, citations by others: 1)
Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrating at each instant perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when considering the signals independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. An algorithm for iteratively learning multimodal generating functions that can be shifted at all positions in the signal is also proposed. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences and it is able to discover underlying structures in the data. The detection of such audio-video patterns in audiovisual clips makes it possible to effectively localize the sound source on the video in the presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms.

7.
We describe the first single microphone sound localization system and its inspiration from theories of human monaural sound localization. Reflections and diffractions caused by the external ear (pinna) allow humans to estimate sound source elevations using only one ear. Our single microphone localization model relies on a specially shaped reflecting structure that serves the role of the pinna. Specially designed analog VLSI circuitry uses echo-time processing to localize the sound. A CMOS integrated circuit has been designed, fabricated, and successfully demonstrated on actual sounds.

8.
For near-field localization of multiple sound sources in reverberant environments, an algorithm model based on an approximated kernel density estimator (KDE) is proposed. Multi-stage (MS) sub-band processing is introduced to resolve the spatial aliasing caused by wide spacing. A spatial likelihood function (SLF) is built for multi-dimensional fusion using two operators, sum (S) and product (P). Four algorithms, S-KDE, P-KDE, S-KDEMS, and P-KDEMS, are then derived. A comprehensive comparison of two statistical indicators, the root mean square error (RMSE) and the percentage of SLF (PSLF), which indicates recognition, confirms P-KDEMS as a near-field multi-source localization algorithm with high robustness and recognition.
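
The two fusion operators named in this abstract, sum and product, can be sketched on a toy example. The sketch assumes each input is a list of likelihood values over the same candidate positions (for instance, one list per microphone pair); the names `fuse_slf` and `peak` are hypothetical, and the KDE and multi-stage sub-band steps of the paper are not reproduced.

```python
import math


def fuse_slf(slfs, op="sum"):
    """Fuse several spatial likelihood lists defined over the same grid.

    slfs is a list of equal-length lists; op is "sum" or "prod".
    Returns one fused likelihood value per candidate position.
    """
    if op == "sum":
        return [sum(values) for values in zip(*slfs)]
    if op == "prod":
        return [math.prod(values) for values in zip(*slfs)]
    raise ValueError("op must be 'sum' or 'prod'")


def peak(fused):
    """Index of the candidate position with the highest fused likelihood."""
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    slfs = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9], [0.2, 0.6, 0.2]]
    print(fuse_slf(slfs, "sum"), fuse_slf(slfs, "prod"))
```

With the product operator, a position that any single input rates near zero is strongly suppressed, whereas the sum lets one confident input carry a position on its own.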

9.
Research on image authentication algorithms based on fragile watermarking   Total citations: 17 (self-citations: 3, citations by others: 17)
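To improve security, many block-based image authentication algorithms use large blocks or block-dependence techniques, and thereby sacrifice localization accuracy. Based on an analysis of various attacks, an image authentication scheme based on fragile watermarking is proposed. The watermark is generated with the SHA512 algorithm and a one-way function based on the knapsack problem, and is embedded using a sliding-window technique and a hierarchical structure, which allows strong cryptographic algorithms to be applied to small blocks. The scheme not only resists the attacks known to date, such as vector quantization, but also localizes tampering to blocks of 2×2 pixels. Theoretical analysis and experimental data show that the scheme effectively improves tamper localization accuracy while preserving system security.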

10.
In this paper, a cross-modal approach is developed for social image clustering and tag cleansing. First, a semantic image clustering algorithm is developed for assigning large-scale weakly-tagged social images into a large number of image topics of interest. Spam tags are detected automatically via sentiment analysis and multiple synonymous tags are merged as one super-topic according to their inter-topic semantic similarity contexts. Second, multiple base kernels are seamlessly combined by maximizing the correlations between the visual similarity contexts and the semantic similarity context, which can achieve more precise characterization of cross-modal (semantic and visual) similarity contexts among weakly-tagged social images. Finally, a K-way min–max cut algorithm is developed for social image clustering by minimizing the cumulative inter-cluster cross-modal similarity contexts while maximizing the cumulative intra-cluster cross-modal similarity contexts. The optimal weights for base kernel combination are simultaneously determined by minimizing the cumulative within-cluster variances. The polysemous tags and their ambiguous images are further split into multiple sub-topics for reducing their within-topic visual diversity. Our experiments on large-scale weakly-tagged Flickr images have provided very positive results.

11.
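Visible light of different colors is essentially electromagnetic waves in different wavelength ranges. This paper tentatively proposes a dynamic color model that simulates the dynamic variation of the electromagnetic waves received by the image plane during the exposure time. After discretization, the color features of a color image can be represented as a K-dimensional vector, called the dynamic color space representation of the image. A fuzzy C-means segmentation algorithm is then built and used to segment color images in both the dynamic color space and the RGB space. Experimental results show that segmentation in the dynamic color space outperforms segmentation in the RGB space, confirming the performance of the dynamic color space. The authors believe that the proposed dynamic color model can also be used in texture analysis and other areas of image processing.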
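
For readers unfamiliar with fuzzy C-means, the standard algorithm can be sketched on small K-dimensional feature vectors. This is the textbook update rule with fuzzifier m, not the paper's segmentation pipeline; the function names, the fixed iteration count, and the caller-supplied initial centers are assumptions made to keep the example deterministic.

```python
import math


def _memberships(points, centers, exponent):
    """Membership u[j][i] of point j in cluster i for the given centers."""
    result = []
    for p in points:
        # Clamp distances so a point lying exactly on a center is handled.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        result.append([1.0 / sum((di / dk) ** exponent for dk in d)
                       for di in d])
    return result


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Run fuzzy C-means from the given initial centers.

    points and centers are lists of equal-length tuples. Returns
    (centers, memberships) where each membership row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])
    for _ in range(iterations):
        u = _memberships(points, centers, exponent)
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * p[k] for wj, p in zip(w, points)) / total
                for k in range(dims)))
        centers = new_centers
    return centers, _memberships(points, centers, exponent)


if __name__ == "__main__":
    pixels = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
              (10, 10, 10), (11, 10, 10), (10, 11, 10)]
    centers, u = fuzzy_c_means(pixels, [(0, 0, 0), (10, 10, 10)])
    print([row.index(max(row)) for row in u])
```

A hard segmentation follows by assigning each pixel to the cluster with the largest membership, as the usage lines do.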

12.
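Combining noise source localization methods with array signal processing techniques, a microphone array capable of extracting transmission line noise was designed. The array was used to collect transmission line noise data under complex sound field conditions, the spectral characteristics and spatial distribution of the noise sources were analyzed, and the effectiveness of wideband frequency-domain near-field focused beamforming for localizing transmission line noise sources was verified. Experimental observations show that the noise sources of the 330 kV high-voltage transmission line at Shangyuan, Xi'an are mainly high-frequency random noise concentrated in the 2 to 10 kHz range, with fairly evenly distributed signal energy. For noise signals in the 2 to 8 kHz band, increasing the processing bandwidth improves source localization accuracy, but localization performance degrades if the processing bandwidth exceeds the dominant frequency range of the transmission line.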

13.
《电子学报:英文版》2017,(6):1302-1307
Source localization using sensor networks usually requires many sensors to localize a small number of sources, and coherent sources remain very troublesome to deal with. When three-dimensional (3-D) space is considered, localization becomes more difficult. A new approach is proposed to localize 3-D wideband coherent sources based on a distributed sensor network that consists of two nodes, each containing only two sensors. Direction-of-arrival (DOA) estimation is performed at each node by employing a newly proposed noise subspace. Combining the pattern matching idea and the prior geometrical information of the sources, a cost function is constructed to estimate the rough positions. A rotational projection algorithm is proposed to estimate the heights of the sources and correct the rough positions, and consequently the localization of 3-D sources can be achieved. Numerical examples are provided to demonstrate the effectiveness of this approach.

14.
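To meet the current demand for high-accuracy indoor positioning algorithms, a fusion indoor positioning algorithm based on received signal strength indication (RSSI) and inertial navigation is proposed. Using the RSSI values of ZigBee nodes in a wireless sensor network, a location fingerprinting algorithm positions the unknown nodes in the network. The RSSI positioning results are then fused with and corrected by the inertial data supplied by an inertial measurement unit (IMU). A Kalman filter is used, with a state equation describing the dynamics of the position coordinates of the node being located, yielding a fusion positioning method in which wireless sensor network positioning is primary and the IMU is auxiliary. Simulation results show that the proposed fusion algorithm both mitigates the strong environmental interference that affects RSSI-only positioning and avoids the accumulated error of inertial-only navigation, greatly improving positioning accuracy.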
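
One way to picture the fusion is a scalar Kalman filter applied to a single position coordinate, sketched here under stated assumptions: the IMU supplies a displacement that drives the prediction step, and the RSSI fingerprint fix acts as the measurement. The class name `FusionFilter1D`, the one-dimensional state, and the noise variances `q` and `r` are hypothetical; the abstract does not give the paper's state equation.

```python
class FusionFilter1D:
    """Scalar Kalman filter for one position coordinate.

    x: position estimate, p: estimate variance,
    q: process noise variance added per IMU step (assumed value),
    r: variance of the RSSI position fix (assumed value).
    """

    def __init__(self, x0, p0, q, r):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def predict(self, imu_displacement):
        # Dead-reckoning step: move by the IMU displacement, grow uncertainty.
        self.x += imu_displacement
        self.p += self.q

    def update(self, rssi_position):
        # Blend in the RSSI fix, weighted by the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (rssi_position - self.x)
        self.p *= (1.0 - k)
        return self.x


if __name__ == "__main__":
    f = FusionFilter1D(x0=0.0, p0=1.0, q=0.01, r=4.0)
    for step, fix in [(1.0, 2.0), (1.0, 2.5), (1.0, 2.9)]:
        f.predict(step)
        print(round(f.update(fix), 3))
```

Each update pulls the dead-reckoned position toward the RSSI fix, which bounds the drift of the inertial data, while the prediction step smooths out the noise of individual RSSI fixes.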

15.
刘云武  杨卫 《压电与声光》2014,36(2):314-316
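Microphone-based sound source localization picks up sound signals with a microphone array and analyzes and processes them with digital signal processing techniques. A sound source localization system based on the MUSIC algorithm is designed. The system uses a digital signal processor (DSP) as its control core; the TMS320C6747 high-speed DSP increases the processing speed of the system and simplifies the design, better meeting the requirements of sound source localization. Tests show that the system is real-time, fast, accurate, and reliable.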

16.
An operational scheme for masking cloud-contaminated pixels in Advanced Very High Resolution Radiometer (AVHRR) daytime data over land is developed, evaluated, and presented. Dynamic thresholding is used with channel 1 reflectance data, channel 3 minus channel 4 temperature difference data, and channel 4 minus channel 5 temperature difference data to automatically create a cloud mask for a single image. The dynamic thresholds can be applied in two different ways: to each pixel individually and to classes of pixels determined by an unsupervised minimum Euclidean distance classifier. The dynamic threshold cloud-masking (DTCM) algorithm presented in this study is used to produce cloud masks based on three different configurations: two channels and individual pixels, three channels and individual pixels, and three channels and classes of pixels. These cloud masks are compared with control masks that were created by visual inspection. The results from the clouds from AVHRR (CLAVR) algorithm and the cloud and surface parameter retrieval (CASPR) algorithm are also compared with the control masks. The results of the comparisons indicate that DTCM, applied on a pixel-by-pixel basis, correctly identifies more clear pixels than CASPR or CLAVR while correctly identifying a comparable or higher number of cloud-contaminated pixels.

17.
In this paper we present techniques for detecting and locating transient pipe burst events in water distribution systems. The proposed method applies multiscale wavelet analysis to recorded high-rate pressure data to detect transient events. Both wavelet coefficients and Lipschitz exponents provide additional information about the nature of the signal feature detected and can be used for feature classification. A local search method is proposed to estimate accurately the arrival time of the pressure transient associated with a pipe burst event. We also propose a graph-based localization algorithm which uses the arrival times of the pressure transient at different measurement points within the water distribution system to determine the actual location (or source) of the pipe burst. The detection and localization performance of these algorithms is validated through leak-off experiments performed on the WaterWiSe@SG wireless sensor network test bed, deployed on the drinking water distribution system in Singapore. Based on these experiments, the average localization error is 37.5 m. We also present a systematic analysis of the sources of localization error and show that even with significant errors in wave speed estimation and time synchronization the localization error is around 56 m.
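
The role of arrival times in locating a burst can be shown on the simplest possible case: a single straight pipe with one sensor at each end. This is a simplification of the graph-based algorithm in the abstract, not a reproduction of it; the function name `locate_burst` and the example numbers are hypothetical. If the burst is at distance x from sensor A on a pipe of length L and the pressure wave travels at speed c, then t_A - t_B = (2x - L) / c, so x = (L + c (t_A - t_B)) / 2.

```python
def locate_burst(pipe_length, wave_speed, t_a, t_b):
    """Distance of the burst from sensor A on a pipe between A and B.

    t_a and t_b are the arrival times (seconds) of the pressure transient
    at the two sensors; pipe_length is in metres, wave_speed in m/s.
    The result is clamped to the pipe so noisy timings stay in range.
    """
    x = (pipe_length + wave_speed * (t_a - t_b)) / 2.0
    return min(float(pipe_length), max(0.0, x))


if __name__ == "__main__":
    # Burst 300 m from A on a 1000 m pipe with a 1000 m/s wave speed.
    print(locate_burst(1000.0, 1000.0, t_a=0.3, t_b=0.7))
```

In this two-sensor case a time synchronization error of Δt between the sensors shifts the estimate by c·Δt/2, which illustrates why wave speed and clock errors feed directly into localization error.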

18.
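A secure steganography algorithm based on n-dimensional hypercube modulus mapping   Total citations: 1 (self-citations: 0, citations by others: 1)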
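In view of the current state of research on modulus-function-based steganography, an n-dimensional hypercube modulus mapping steganography algorithm is proposed. Based on the properties of modular arithmetic, an n-dimensional modulus function is defined that maps n pixel values to one base-a^n digit, so that one base-a^n digit of information can be hidden in n pixels. Different values of the parameter a give different embedding rates and different stego-image visual quality; choosing a larger n with an even a gives better stego-image visual quality. Theoretical analysis and experimental results show that, compared with many steganography algorithms, the proposed algorithm not only offers the functions of those algorithms but also achieves better stego-image visual quality, better security, and greater practicality.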

19.
This paper presents a hybrid image interpolation algorithm to keep details and edges simultaneously. The basic idea is to separate the unknown pixels into two classes and estimate them in different ways. One class of unknown pixels is obtained via shifted linear interpolation and the other class through a statistical signal processing method. The merit of this hybrid algorithm is that each unknown pixel can be estimated through original pixels simultaneously. Simulation results demonstrate that this hybrid interpolation algorithm improves the quality of the interpolated images over conventional interpolation methods.

20.
Many tasks performed by machine vision systems involve processing of natural scenes with large intra-frame illumination ratios. Thus, wide dynamic range visible spectrum image sensors are required to achieve adequate processing performance and reliability. An image sensor implementing an algorithm that linearly increases the illumination dynamic range of solid-state pixels is presented. Optimal exposure is achieved with a predictive pixel saturation decision that allows for multiple integration intervals of different duration to run concurrently for different pixels while keeping the sensor frame rate constant. A proof-of-concept chip was fabricated in a 0.18-μm CMOS process. Added functionality to standard imagers is mainly concentrated off-pixel so fill factor is not sacrificed. Measured data corroborates the algorithm functionality.

