Unique visual features of 4D light field data have been shown to benefit the detection of salient objects; nevertheless, only a few studies have explored them so far. In this study, several useful visual features extracted from light field data are fused in a two-stage Bayesian integration framework for salient object detection. First, background-weighted color contrast is computed in a high-dimensional color space, which is more distinctive for identifying objects of interest. Second, a focusness map of the foreground slice is estimated and then combined with the color contrast results via first-stage Bayesian fusion. Third, background-weighted depth contrast is computed; depth contrast has proved to be an extremely useful cue for salient object detection and is complementary to color contrast. Finally, in the second-stage Bayesian fusion step, the depth-induced contrast saliency is further fused with the first-stage fusion results to obtain the final saliency map. Experiments on light field benchmark datasets comparing against eight existing state-of-the-art methods show that the proposed method handles challenging scenarios such as cluttered backgrounds and achieves the most visually acceptable salient object detection results.
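The two-stage fusion described above can be illustrated with a minimal sketch of Bayesian saliency-map fusion. This is an assumption-laden toy version, not the paper's exact formulation: it assumes saliency maps normalized to [0, 1], uses one map as a pixel-wise prior, estimates the likelihoods of the other map from foreground/background histograms obtained by thresholding the prior at its mean, and averages the two fusion directions symmetrically.

```python
import numpy as np

def bayesian_fuse(s1, s2, bins=16):
    """Fuse two saliency maps in [0, 1]: s1 is the pixel-wise prior p(F);
    likelihoods p(s2 | F) and p(s2 | B) come from histograms of s2 inside and
    outside the region where s1 is salient (illustrative choice: mean threshold)."""
    fg = s1 >= s1.mean()                      # rough foreground from the prior map
    bg = ~fg
    edges = np.linspace(0.0, 1.0, bins + 1)
    p_fg, _ = np.histogram(s2[fg], bins=edges)
    p_bg, _ = np.histogram(s2[bg], bins=edges)
    p_fg = (p_fg + 1.0) / (p_fg.sum() + bins)  # Laplace smoothing
    p_bg = (p_bg + 1.0) / (p_bg.sum() + bins)
    idx = np.clip(np.digitize(s2, edges) - 1, 0, bins - 1)
    like_fg, like_bg = p_fg[idx], p_bg[idx]
    # Posterior p(F | s2) with s1 as the prior
    return s1 * like_fg / (s1 * like_fg + (1.0 - s1) * like_bg + 1e-8)

def two_stage_fusion(color_contrast, focusness, depth_contrast):
    """Stage 1: fuse color contrast with focusness; stage 2: fuse the result
    with depth contrast. Symmetric averaging is an illustrative assumption."""
    stage1 = 0.5 * (bayesian_fuse(color_contrast, focusness) +
                    bayesian_fuse(focusness, color_contrast))
    stage2 = 0.5 * (bayesian_fuse(stage1, depth_contrast) +
                    bayesian_fuse(depth_contrast, stage1))
    return stage2
```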
Emotion recognition from speech signals is an interesting research area with several applications, such as smart healthcare, autonomous voice response systems, assessing situational seriousness by analyzing the caller's affective state in emergency centers, and other smart affective services. In this paper, we present a study of speech emotion recognition based on features extracted from spectrograms using a deep convolutional neural network (CNN) with rectangular kernels. Typically, CNNs have square-shaped kernels and pooling operators at various layers, which are suited to 2D image data. In spectrograms, however, the information is encoded in a slightly different manner: time is represented along the x-axis, the y-axis shows the frequency of the speech signal, and the amplitude is indicated by the intensity value at a particular position. To analyze speech through spectrograms, we propose rectangular kernels of varying shapes and sizes, along with max pooling in rectangular neighborhoods, to extract discriminative features. The proposed scheme effectively learns discriminative features from speech spectrograms and performs better than many state-of-the-art techniques when evaluated on the Emo-DB and Korean speech datasets.
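To make the idea of rectangular kernels and rectangular pooling concrete, below is a minimal PyTorch sketch. The layer counts, channel widths, and specific kernel shapes are illustrative assumptions, not the architecture reported in the paper; the only point it demonstrates is using non-square kernel_size and pooling tuples so that one layer spans many frequency bins while another spans many time frames.

```python
import torch
import torch.nn as nn

class RectKernelCNN(nn.Module):
    """Toy CNN for (1, freq, time) spectrogram inputs using rectangular kernels:
    a tall kernel spans frequency, a wide kernel spans time. Sizes are illustrative."""
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(9, 3), padding=(4, 1)),   # tall: frequency-oriented
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(4, 2)),                       # rectangular pooling
            nn.Conv2d(16, 32, kernel_size=(3, 9), padding=(1, 4)),  # wide: time-oriented
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 4)),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # fixed-size vector regardless of clip length
            nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):  # x: (batch, 1, n_freq_bins, n_frames)
        return self.classifier(self.features(x))

# Example: a batch of 4 spectrograms with 128 frequency bins and 300 frames
logits = RectKernelCNN()(torch.randn(4, 1, 128, 300))
print(logits.shape)  # torch.Size([4, 7])
```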