Similar Documents (20 results)
1.
We carried out several standard natural language processing experiments on song lyrics. Lyrics are an important carrier of a song's semantics, so lyric analysis can complement audio-based song processing. Using Zipf's law, we examined the statistical properties of characters and words in a lyrics corpus; the experiments show that their distributions largely conform to Zipf's law. With a vector space model representation, we can retrieve sets of similar lyrics. We also explored how the time-stamp annotations in lyrics support further analysis, such as detecting repeated sections in a song, rhythm segmentation, and retrieval. Preliminary experiments show that our methods are reasonably effective.
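The rank-frequency check described above can be sketched in a few lines: fit the slope of log(frequency) against log(rank) by least squares; Zipf's law predicts a slope near -1. The tokens below are a toy stand-in, not the paper's lyrics corpus.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Fit log(freq) = a + s*log(rank) by least squares; Zipf predicts s close to -1."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Toy corpus whose frequencies are exactly 12/rank: 12, 6, 4, 3
toks = ["的"] * 12 + ["你"] * 6 + ["我"] * 4 + ["爱"] * 3
print(round(zipf_slope(toks), 2))  # → -1.0
```

On a real corpus the fit is usually done on word tokens after segmentation, and the slope only approximates -1.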

2.
Music and songs are integral parts of Bollywood movies. Every movie of two to three hours contains three to ten songs, each 3–10 min long. Music lovers like to listen to the music and songs of a movie, but searching manually for all the songs in a movie is time-consuming and error-prone. The task becomes much harder when songs must be extracted from a huge archive containing hundreds of movies. This paper presents an approach to automatically extract music and songs from archived musical movies. We use a song grammar to construct a Markov chain model that differentiates song scenes from dialogue and action scenes in a movie. We tested our system on Bollywood, Hollywood, Pakistani, Bengali, and Tamil movies; a total of 20 movies from different industries were selected for the experiments. On Bollywood movies we achieved 97.22% recall in song extraction, whereas recall on Hollywood musical movies was 80%. Recall on Pakistani, Tamil, and Bengali movies was 87.09%.

3.
From lyrics display on electronic music players and karaoke videos to surtitles for live Chinese opera performance, one feature is common to all these everyday functionalities: temporal synchronization of the written text with its corresponding musical phrase. Our goal is to automate the process of lyrics alignment, a procedure which, to date, is still handled manually in the Cantonese popular song (Cantopop) industry. In our system, a vocal signal enhancement algorithm extracts vocal signals from a CD recording in order to detect the onsets of the sung syllables and to determine the corresponding pitches. The proposed system is specifically designed for Cantonese, in which the contour of the musical melody and the tonal contour of the lyrics must match perfectly. With this prerequisite, we use a dynamic time warping algorithm to align the lyrics. The robustness of this approach is supported by experimental results. The system was evaluated on 70 twenty-second music segments, and most samples had their lyrics aligned correctly.
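The dynamic time warping step can be illustrated with a minimal implementation that scores how well a sung pitch sequence matches a melody contour despite timing differences. The pitch sequences below are invented examples, not data from the paper.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two sequences (melody vs. sung contour)."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # each cell extends the cheapest of: insertion, deletion, or match
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

melody = [60, 62, 64, 64, 62]        # MIDI-like pitch contour
sung   = [60, 62, 62, 64, 64, 62]    # same contour, one syllable held longer
print(dtw(melody, sung))             # → 0.0 (a perfect warped match)
```

In practice the alignment path through `D`, not just the final cost, gives the per-syllable correspondence used to place the lyrics.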

4.
《Advanced Robotics》2013,27(6):585-604
We are attempting to introduce a 3D, realistic human-like animated face robot into human-robot communication. The face robot can recognize human facial expressions as well as produce realistic facial expressions in real time. For the animated face robot to communicate interactively, we propose a new concept of 'active human interface', and we investigate the performance of real-time recognition of facial expressions by neural networks (NN) and the expressiveness of facial messages on the face robot. We find that the NN recognition of facial expressions and the face robot's performance in generating facial expressions are almost at the same level as in humans. We also construct an artificial emotion model able to generate the six basic emotions in accordance with the recognition of a given facial expression and the situational context. This implies a high potential for the animated face robot to undertake interactive communication with humans once these three component technologies are integrated into the face robot.

5.
We present LyricAlly, a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually aligned karaoke. We tackle this problem with a multimodal approach, pairing audio and text processing to create the prototype. LyricAlly's acoustic signal processing uses standard audio features, constrained and informed by the musical nature of the signal. The detected hierarchical rhythm structure is used in singing voice detection and chorus detection to produce results of higher accuracy and lower computational cost than their respective baselines. Text processing is employed to approximate the length of the sung passages from the lyrics. Results show an average error of less than one bar for per-line alignment of the lyrics on a test bed of 20 songs (sampled from CD audio and carefully selected for variety). We perform a comprehensive set of system-wide and per-component tests, discuss their results, and conclude by outlining steps for further development.

6.
In this paper, an effective method of facial feature detection is proposed for human-robot interaction (HRI). Given the mobility of a mobile robot, any vision system it carries is bound to face varied imaging conditions such as pose variations, illumination changes, and cluttered backgrounds. To detect faces correctly under such difficult conditions, we focus on the local intensity pattern of the facial features. Their characteristically dark, directionally distinct patterns provide robust cues for detection. Based on this observation, we suggest a new directional template for detecting the major facial features, namely the two eyes and the mouth. Applying this template to a facial image yields a new convolved image, which we refer to as the edge-like blob map. One distinctive characteristic of this map is that it provides local, directional convolution values for each image pixel, which makes it easier to construct candidate blobs of the major facial features without information about the facial boundary. These candidates are then filtered using conditions on the spatial relationship between the two eyes and the mouth, and the face detection process is completed by applying appearance-based facial templates to the refined facial features. Detection results on various color images and gray-level face database images demonstrate the usefulness of the proposed method in HRI applications.
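A minimal sketch of the directional-template idea: cross-correlating a small oriented template with the image produces a map whose strongest responses sit at dark, horizontally elongated features such as eyes and mouths. The template coefficients and toy image below are assumptions for illustration, not the paper's actual template.

```python
def correlate2d(img, tmpl):
    """Valid-mode 2D cross-correlation over a grayscale image given as lists of lists."""
    th, tw = len(tmpl), len(tmpl[0])
    H, W = len(img), len(img[0])
    return [[sum(tmpl[u][v] * img[i + u][j + v] for u in range(th) for v in range(tw))
             for j in range(W - tw + 1)] for i in range(H - th + 1)]

# Illustrative directional template: bright row above minus dark row below,
# so the response peaks where a dark horizontal feature sits under bright skin.
tmpl = [[1, 1, 1], [-1, -1, -1]]
img = [
    [9, 9, 9, 9, 9],
    [9, 9, 9, 9, 9],
    [1, 1, 1, 1, 1],   # dark horizontal stripe, an eye/mouth-like feature
    [9, 9, 9, 9, 9],
]
bmap = correlate2d(img, tmpl)
best = max((v, i) for i, row in enumerate(bmap) for v in row)
print(best)  # → (24, 1): strongest response where the window's lower row hits the stripe
```

In the paper, thresholding such a map yields the candidate blobs that are then filtered by eye-mouth geometry.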

7.
This paper presents an integrated approach for tracking hands, faces, and specific facial features (eyes, nose, and mouth) in image sequences. For hand and face tracking, we employ a state-of-the-art blob tracker specifically trained to track skin-colored regions. We extend this skin-color tracker with an incremental probabilistic classifier that maintains and continuously updates the belief about the class of each tracked blob (left hand, right hand, or face) and associates hand blobs with their corresponding faces. An additional contribution of this paper is a novel method for detecting and tracking specific facial features within each detected facial blob, consisting of an appearance-based detector and a feature-based tracker. The proposed approach is intended to provide input for the analysis of hand gestures and facial expressions that humans use while engaged in various conversational states with robots operating autonomously in public places. It has been integrated into a system that runs in real time on a conventional personal computer located on a mobile robot. Experimental results confirm its effectiveness for the specific task at hand.

8.
Music and songs give musical films their soul and wings, and from the perspective of design art they are an indispensable artistic vehicle. In a musical film, music is an essential component of the cinematic art: it deepens the film's theme, expresses the characters' inner emotions, renders the atmosphere of a scene, and strengthens the coherence and integrity of the artistic structure. A film song is a musical form composed or adapted specifically for a film to convey its ideas and heighten its emotional impact.

9.
Face detection is a challenging problem in computer vision and artificial intelligence, with broad applications in virtual reality, human-computer interaction, and many other fields. This paper studies AdaBoost-based face detection and proposes a method that combines skin color with the AdaBoost algorithm. The input color image is converted from RGB space to YCrCb space; skin-colored regions are then segmented with the aid of morphological operations to remove background interference, and the AdaBoost algorithm is applied to the candidate regions to locate faces. Experiments show that the method is accurate and robust and delivers satisfactory detection results.
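The RGB-to-YCrCb skin segmentation step can be sketched as follows, using the BT.601 conversion and commonly cited Cr/Cb threshold ranges. The thresholds are a widespread heuristic, not necessarily the ones used in the paper, and normally need tuning for lighting conditions and dataset.

```python
def rgb_to_ycrcb(r, g, b):
    """Full-range BT.601 conversion, as used by many skin-color detectors."""
    y  =  0.2990 * r + 0.5870 * g + 0.1140 * b
    cr =  0.5000 * r - 0.4187 * g - 0.0813 * b + 128
    cb = -0.1687 * r - 0.3313 * g + 0.5000 * b + 128
    return y, cr, cb

def is_skin(r, g, b, cr_rng=(133, 173), cb_rng=(77, 127)):
    """Heuristic chrominance box test (an assumption; tune ranges per dataset)."""
    _, cr, cb = rgb_to_ycrcb(r, g, b)
    return cr_rng[0] <= cr <= cr_rng[1] and cb_rng[0] <= cb <= cb_rng[1]

print(is_skin(220, 170, 140), is_skin(30, 90, 200))  # → True False
```

A full pipeline would apply `is_skin` per pixel, clean the mask with morphological opening/closing, and hand the surviving regions to the AdaBoost detector.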

10.
A method for analyzing and categorizing the vowels of a sung query is described and analyzed. The query system uses a combination of spectral analysis and parametric clustering techniques to divide a single query into different vowel regions. The method is applied separately to each query, so no training or repeated measures are necessary. The vowel regions are then transformed into strings, and string search methods are used to compare the results across songs. We apply this method in a small pilot study consisting of 40 sung queries for each of 7 songs. Approximately 60% of the queries are correctly matched to their corresponding song, using only the vowel stream as the identifier.

11.
《Advanced Robotics》2013,27(8):827-852
The purpose of a robot is to execute tasks for people, and people should be able to communicate with robots in a natural way. People naturally express themselves through body language, using facial gestures and expressions. We have built a human-robot interface based on head gestures for use in robot applications. Our interface can track a person's facial features in real time (30 Hz video frame rate); no special illumination or facial makeup is needed to achieve robust tracking. We use dedicated vision hardware based on correlation image matching to implement the face tracking. Tracking via correlation matching suffers from changing shade and from deformation or even disappearance of facial features; by using multiple Kalman filters we are able to overcome these problems. Our system can accurately predict and robustly track the positions of facial features despite disturbances and rapid movements of the head (both translational and rotational motion). Since we can reliably track faces in real time, we are also able to recognize motion gestures of the face. Our system recognizes a large set of gestures (15), ranging from 'yes', 'no', and 'maybe' to winks, blinks, and sleeping. We decompose each gesture into a set of atomic actions, e.g. a nod for 'yes' consists of an atomic up motion followed by a down motion, and the system understands gestures by monitoring the transitions between atomic actions.
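As a sketch of the filtering idea, the alpha-beta tracker below is a fixed-gain simplification of the Kalman filters the paper runs per facial feature: predict each feature's position from its velocity, then correct both position and velocity with the residual against the new measurement. The gains and measurement sequence are illustrative assumptions.

```python
def alpha_beta_track(zs, x0=0.0, v0=0.0, alpha=0.85, beta=0.005, dt=1.0):
    """Fixed-gain position/velocity tracker (a simplified stand-in for a
    constant-velocity Kalman filter). Returns the filtered positions."""
    x, v, out = x0, v0, []
    for z in zs:
        xp = x + dt * v            # predict position from velocity
        r = z - xp                 # innovation: measurement minus prediction
        x = xp + alpha * r         # correct position
        v = v + (beta / dt) * r    # correct velocity
        out.append(x)
    return out

# An eye feature drifting right by 1 px/frame; the estimate follows smoothly.
track = alpha_beta_track([101, 102, 103, 104, 105], x0=100.0)
```

The prediction step is what lets the tracker ride out brief occlusions or correlation-matching failures: it keeps extrapolating the position until measurements return.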

12.
A humanoid head robot based on emotional interaction
The goal of this work is to design a robot that can interact with people and assist them in daily life and in common public settings. To accomplish these tasks, the robot must display emotions in a friendly way and exhibit an amiable character and personality. Drawing on bionics, we developed a humanoid head robot and built a behavioral decision model for it. The robot reproduces the six basic human facial expressions and supports face detection, speech emotion recognition and synthesis, and emotional behavior decision-making, enabling effective emotional interaction with people through machine vision, speech interaction, and emotional expression.

13.
Human face analysis in a mobile robot vision system must cope with difficult problems such as face pose variations, illumination changes, and complex backgrounds, problems mainly induced by the movement of the platform itself. In this paper, to overcome such problems, we present an efficient facial feature detection approach based on local image regions and direct pixel-intensity distributions. We propose two novel concepts: the directional template for evaluating intensity distributions, and the edge-like blob map image with multiple strength intensities. Using this blob map image, we show that the locations of the major facial features, the two eyes and the mouth, can be reliably estimated. Without boundary information for the facial area, the final candidate face region is determined by the obtained locations of the facial features together with weighted correlations against stored facial templates.

14.
Recently, we have proposed a real-time tracker that simultaneously tracks the 3-D head pose and facial actions in monocular video sequences, including those provided by low-quality cameras. This paper has two main contributions. First, we propose an automatic 3-D face pose initialization scheme for the real-time tracker by adopting a 2-D face detector and an eigenface system. Second, we use the proposed methods, initialization and tracking, to enhance the human-machine interaction functionality of an AIBO robot. More precisely, we show how the orientation of the robot's camera (or any active vision system) can be controlled through the estimation of the user's head pose. Applications based on head-pose imitation, such as telepresence, virtual reality, and video games, can directly exploit the proposed techniques. Experiments on real videos confirm the robustness and usefulness of the proposed methods.

15.
We propose a facial expression recognition method based on the LBP features of facial feature points. After analyzing LBP features for expression recognition, we select feature points rich in expression information, around the eyes in the upper face and around the mouth in the lower face, and compute the LBP of each feature point's neighborhood as the expression feature. Experiments show that, unlike traditional LBP features, this feature-point-based method requires no prior face registration and lends itself better to expression recognition.
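The basic LBP code at a single feature point can be computed as below: threshold the 8 neighbours against the centre pixel and pack the comparison results into a byte. The 3x3 patch is a made-up example; an expression recognizer would histogram such codes over each feature point's neighbourhood.

```python
def lbp8(img, i, j):
    """Basic 8-neighbour LBP code at pixel (i, j): each neighbour whose value is
    >= the centre contributes one bit, ordered clockwise from the top-left."""
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    c = img[i][j]
    code = 0
    for bit, (di, dj) in enumerate(nbrs):
        if img[i + di][j + dj] >= c:
            code |= 1 << bit
    return code

patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp8(patch, 1, 1))  # → 241 (bits 0, 4, 5, 6, 7 set)
```

Because each code depends only on relative intensities, the descriptor is robust to monotonic illumination changes, one reason LBP is popular for expression features.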

16.
We combine text-to-speech and speech modification techniques to build a lyrics-to-song conversion system. A text-to-speech system first converts the input lyrics into speech, melody parameters are extracted from the song's MIDI file, and a melody control model then modifies the acoustic features of the speech signal, turning the lyrics into a song. Experimental results show that songs synthesized by the system achieve an average MOS score of 3.29.
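The melody-control step rests on mapping MIDI note numbers to target fundamental frequencies and deriving the pitch-scaling factor that a modification algorithm (e.g. PSOLA) must apply to the TTS output. The sketch below covers only this F0 mapping; the paper's melody control model also adjusts duration and other acoustic features.

```python
def midi_to_f0(note):
    """Equal-temperament mapping from MIDI note number to fundamental frequency (Hz),
    with A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

def pitch_scale(f0_speech, note):
    """Factor by which the TTS output's F0 must be scaled to hit the melody note
    (illustrative; real systems apply this per syllable over the note's duration)."""
    return midi_to_f0(note) / f0_speech

print(round(midi_to_f0(69)))          # → 440
print(round(pitch_scale(220.0, 69), 2))  # → 2.0 (raise a 220 Hz voice by one octave)
```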

17.
Interaction between a personal service robot and a human user is contingent on being aware of the user's posture and facial expression in the home environment. In this work, we propose algorithms to robustly and efficiently track the head, facial gestures, and upper body movements of a user. The face processing module consists of 3D head pose estimation, modeling of nonrigid facial deformations, and expression recognition; it can thus detect and track the face and classify expressions under various poses, which is key for human-robot interaction. For body pose tracking, we develop an efficient algorithm based on bottom-up techniques to search a tree-structured 2D articulated body model and identify multiple pose candidates representing the current body configuration. We validate the face and body modules in experiments on different datasets and report the results. Both modules run in real time, which meets the requirements of real-world human-robot interaction tasks. The two modules have been ported onto a real robot platform by the Electronics and Telecommunications Research Institute.

18.
Human–computer interaction (HCI) lies at the crossroads of many scientific areas including artificial intelligence, computer vision, face recognition, motion tracking, etc. It is argued that to truly achieve effective human–computer intelligent interaction, the computer should be able to interact naturally with the user, similar to the way human–human interaction takes place. In this paper, we discuss training probabilistic classifiers with labeled and unlabeled data for HCI applications. We provide an analysis that shows under what conditions unlabeled data can be used in learning to improve classification performance, and we investigate the implications of this analysis for a specific type of probabilistic classifier, Bayesian networks. Finally, we show how the resulting algorithms are successfully employed in facial expression recognition, face detection, and skin detection.

19.
The pulse coupled neural network (PCNN) is a new generation of artificial neural network motivated by properties of biological vision, with broad application prospects in digital image processing and artificial intelligence. Building on a study of the PCNN model and its dynamics, this paper proposes a method for extracting face features. First, a wavelet transform extracts the low-frequency features of a face image, reducing its dimensionality; a simplified PCNN then produces the time series of the face image reconstructed from the low-frequency wavelet coefficients, which serves as the feature sequence for face recognition. Finally, recognition is performed using the time series and Euclidean distance. Experiments on the ORL face database demonstrate the effectiveness of the method.

20.
Animating song     
We describe techniques used to create animations of song. Modifications to a text-to-audiovisual-speech system allow it to take the extra timing and frequency information for the lyrics from a MIDI file; lip-synchronized animations of song are then produced. We discuss differences between the production of speech and the production of song. Copyright © 2004 John Wiley & Sons, Ltd.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)