Similar Documents
20 similar documents found (search time: 15 ms)
1.
To address the limitations of audio- and image-based emotion recognition, a new modality is proposed: touch-based emotion recognition. A series of touch emotion recognition studies was carried out on the CoST (Corpus of Social Touch) dataset: the data were preprocessed, and several features for touch emotion recognition were proposed. An extreme learning machine (ELM) classifier was used to examine emotion recognition under different gestures, recognizing three emotions (gentle, normal, aggressive) across 14 gestures with high accuracy and short recognition time. The results show that the gesture affects recognition accuracy: the "stroke" gesture achieved the highest classification accuracy under every classifier tested, reaching 72.07%; the extreme learning machine performs well and is fast as a classifier for touch emotion recognition; and some gestures are inherently associated with particular emotions, which influences the classification results.
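The extreme learning machine this abstract relies on fixes random hidden-layer weights and solves only the output layer, in closed form, which is what makes it fast. A minimal sketch on synthetic data (the feature dimensions and class separation below are invented for illustration, not taken from CoST):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, n_hidden=50, n_classes=3):
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases (never trained)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    T = np.eye(n_classes)[y]                      # one-hot targets
    beta = np.linalg.pinv(H) @ T                  # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy "touch feature" data: 3 emotion classes in a 6-D feature space.
X = rng.normal(size=(90, 6)) + np.repeat(np.arange(3), 30)[:, None] * 2.0
y = np.repeat(np.arange(3), 30)
W, b, beta = elm_train(X, y)
acc = (elm_predict(X, W, b, beta) == y).mean()
```

Because only the output weights are learned, training reduces to a single pseudo-inverse, which is consistent with the short recognition times the abstract reports.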

2.
Expression could play a key role in the audio rendering of virtual reality applications. Understanding it is an ambitious research goal, and several studies have investigated analysis techniques for detecting expression in music performances. The knowledge gained from these analyses is widely applicable: embedding expression in audio interfaces can lead to attractive solutions that enhance interfaces in mixed-reality environments. Synthesized expressive sounds can be combined with real stimuli for augmented reality, and they can be used in multi-sensory stimulation to provide the sensation of first-person experience in virtual expressive environments. In this work we focus on the expression of violin and flute performances, with reference to the sensorial and affective domains. By means of selected audio features, we derive a set of parameters describing performers' strategies which are suitable both for tuning expressive synthesis instruments and for enhancing audio in human–computer interfaces.

3.
4.
We present a neural network based system for the visual recognition of human hand pointing gestures from stereo pairs of video camera images. The accuracy of the current system allows the pointing target to be estimated to an accuracy of 2 cm in a workspace area of 50×50 cm. The system consists of several neural networks that perform the tasks of image segmentation, estimation of hand location, estimation of 3D pointing direction and the necessary coordinate transforms. Drawing heavily on the use of learning algorithms, the functions of all network modules were created from data examples only.

5.
Seven classes of design guidelines are described for interfaces which use speech recognition. The guidelines concern: (i) allocation of function within complex systems; (ii) parallel processing of speech with other modalities; (iii) design of command vocabulary; (iv) choice of command syntax; (v) use of feedback; (vi) template and user training; and (vii) choice of evaluation criteria.

6.
To better recognize hand motions, a new approach is proposed that takes the states of individual fingers as the recognition target set. Six channels of surface EMG signals are collected for common combined hand motions, and the motions are organized according to the state of each finger. The sample mean of each channel is extracted to construct feature vectors, and three parallel BP neural networks are designed to learn the state of each finger from the combined-motion samples. This keeps the number of classes per classifier small, reduces classification complexity, and avoids the drawback of traditional multi-class methods, which require many motions to be recorded. Experimental results show that, after collecting EMG signals for 12 hand motions and reducing hand motions to finger motions, training the networks on finger states is sufficient to recognize all combinations of the three finger states, i.e., all 18 common combined hand motions.

7.
In this paper, we present an approach for recognizing pointing gestures in the context of human–robot interaction. In order to obtain input features for gesture recognition, we perform visual tracking of head, hands and head orientation. Given the images provided by a calibrated stereo camera, color and disparity information are integrated into a multi-hypothesis tracking framework in order to find the 3D-positions of the respective body parts. Based on the hands’ motion, an HMM-based classifier is trained to detect pointing gestures. We show experimentally that the gesture recognition performance can be improved significantly by using information about head orientation as an additional feature. Our system aims at applications in the field of human–robot interaction, where it is important to do run-on recognition in real-time, to allow for robot egomotion and not to rely on manual initialization.

8.
Wearable projector and camera (PROCAM) interfaces, which provide a natural, intuitive and spatial experience, have been studied for many years. However, existing hand-input research into such systems has focused on stable settings such as sitting or standing, which does not fully satisfy interaction requirements in real life, especially when people are moving. Moreover, more and more mobile phone users use their phones while walking. As a mobile computing device, the wearable PROCAM system should allow for the fact that mobility can influence usability and user experience. This paper proposes a wearable PROCAM system with which the user can interact by inputting finger gestures, such as the hover gesture and the pinch gesture, on projected surfaces. A lab-based evaluation was organized, which compared the two gestures (pinch and hover) in three situations (sitting, standing and walking) to find out: (1) how, and to what degree, does mobility influence different gesture inputs, and are there significant differences between gesture inputs in different settings? (2) what causes these differences? (3) what do people think about the configuration of such systems, and to what extent does manual focus impact such interactions? From qualitative and quantitative points of view, the main findings imply that mobility impacts gesture interactions to varying degrees. The pinch gesture is less affected than the hover gesture in mobile settings. Both gestures were impacted more in the walking state than in the sitting and standing states by all four negative factors (lack of coordination, jittering hand effect, tired forearms and extra attention paid). Manual focus influenced mobile projection interaction. Based on the findings, implications for the design of a mobile projection interface with gestures are discussed.

9.
10.
This paper presents the design and evaluation of a multi-lingual fingerspelling recognition module that is designed for an information terminal. Through the use of multimodal input and output methods, the information terminal acts as a communication medium between deaf and blind people. The system converts fingerspelled words to speech and vice versa using fingerspelling recognition, fingerspelling synthesis, speech recognition and speech synthesis in the Czech, Russian, and Turkish languages. We describe an adaptive skin-color-based fingersign recognition system with close to real-time performance and present recognition results on 88 different letters signed by five different signers, using over four hours of training and test videos.

11.
12.
A novel framework to context modeling based on the probability of co-occurrence of objects and scenes is proposed. The modeling is quite simple, and builds upon the availability of robust appearance classifiers. Images are represented by their posterior probabilities with respect to a set of contextual models, built upon the bag-of-features image representation, through two layers of probabilistic modeling. The first layer represents the image in a semantic space, where each dimension encodes an appearance-based posterior probability with respect to a concept. Due to the inherent ambiguity of classifying image patches, this representation suffers from a certain amount of contextual noise. The second layer enables robust inference in the presence of this noise by modeling the distribution of each concept in the semantic space. A thorough and systematic experimental evaluation of the proposed context modeling is presented. It is shown that it captures the contextual “gist” of natural images. Scene classification experiments show that contextual classifiers outperform their appearance-based counterparts, irrespective of the precise choice and accuracy of the latter. The effectiveness of the proposed approach to context modeling is further demonstrated through a comparison to existing approaches on scene classification and image retrieval, on benchmark data sets. In all cases, the proposed approach achieves superior results.

13.
Segmentation and recognition of continuous gestures are challenging due to spatio-temporal variations and endpoint localization issues. A novel multi-scale gesture model is presented here as a set of 3D spatio-temporal surfaces of a time-varying contour. Three approaches, which differ mainly in endpoint localization, are proposed: the first uses a motion detection strategy and multi-scale search to find the endpoints; the second uses Dynamic Time Warping to roughly locate the endpoints before a fine search is carried out; the last approach is based on Dynamic Programming. Experimental results on two-arm and single-hand gestures show that all three methods achieve high recognition rates, ranging from 88% to 96% for the two-arm test, with the last method performing best.
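The second approach relies on Dynamic Time Warping to match a candidate segment against a template despite speed variation. A standard DTW distance (a textbook sketch, not the paper's multi-scale surface model) looks like this:

```python
import numpy as np

def dtw(a, b):
    """Classic O(n*m) DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

template = np.sin(np.linspace(0, np.pi, 20))   # reference gesture trajectory
same = np.sin(np.linspace(0, np.pi, 30))       # same shape, performed slower
other = np.linspace(0, 1, 30)                  # a different trajectory
d_same, d_other = dtw(template, same), dtw(template, other)
```

Because the warping path absorbs tempo differences, the slower rendition of the same shape scores a much smaller distance than a different trajectory of the same length — the property exploited for rough endpoint localization.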

14.
With the increasing integration of computer systems through local and wide area communication networks, many organizations now have the capability to retrieve information from databases to support ad hoc decision making by many different users. The idea that information is a corporate resource is now something more than business school hype, but the implications of sharing data are only just dawning on the corporate mind. How do managers interpret data? Where decision making is carried out by several people, perhaps in different locations and for different purposes, the same data is used in multiple decision contexts. This paper explores the role of context as a way of adding value to information from databases. Two types of context are defined and discussed in relation to examples of decisions where the role of context is vital. These examples are taken from empirical research conducted with users of spatial decision support systems, where background information on maps, such as roads, adds context to maps which otherwise simply display statistical data. The paper concludes by suggesting a model of context based on the notion that context acts as a filter between user and database.

15.
We report the recognition in video streams of isolated alphabetic characters and connected cursive textual characters, such as alphabetic, hiragana and kanji characters, that are drawn in the air. This topic involves a number of difficult problems in computer vision, such as the segmentation and recognition of complex motion on videos. We use an algorithm called time–space continuous dynamic programming (TSCDP), which can realize both time- and location-free (spotting) recognition. Spotting means that the prior segmentation of input video is not required. Each reference (model) character is represented by a single stroke that is composed of pixels. We conducted two experiments involving the recognition of 26 isolated alphabetic characters and 23 Japanese hiragana and kanji air-drawn characters. We also conducted gesture recognition experiments based on TSCDP, which showed that TSCDP was free from many of the restrictions imposed by conventional methods.

16.
Findings are summarized from a study which set out to identify cognitive stumbling blocks in the user interface of a large computer system used by telephone operators in Australia. Intermittent observations were made of operators' actions in the workplace over a period of eight months before, during and after system implementation. Numerous weaknesses were identified in the user interface, but the most interesting aspect of the study turned out to be the analysis of the overall jobs operators are required to do, as a by-product of the intended study. The insight into operator job demands led to changes in job selection criteria and training, which were found to match actual job demands quite poorly at the time the study was conducted. The paper describes the process by which the usability assessment of the user-system interface led to a comprehensive job analysis. Supporting quantitative data are presented, together with anecdotal examples which demonstrate the importance of conducting systems analysis in the working context, thereby maximizing the ecological value of the resulting research data.

17.
Freehand gestural interaction, in which the user’s hands move in mid-air to provide input, has been of interest to researchers, but freehand menu selection interfaces have been under-investigated so far. Freehand menu selection is inherently difficult, especially with increasing menu breadth (i.e., the number of items), largely because hands moving in free space cannot achieve precision as high as physical input devices such as mouse and stylus. We have designed a novel menu selection interface called the rapMenu (Ni et al., 2008), which is controlled by wrist tilt and multiple pinch gestures, and takes advantage of multiple discrete gesture inputs to reduce the required precision of the user's hand movements. In this article, we first review the visual design and behavior of the rapMenu technique, as well as related design issues and its potential advantages. In the second part, we present two studies of the rapMenu in order to further investigate the strengths and limitations of the design principle. In the first study, we compared the rapMenu to the extensively studied tilt menu technique (Rahman et al., 2009). Our results revealed that the rapMenu outperforms the tilt menu as menu breadth increases. In the second study, we investigated how the rapMenu affords the opportunity of eyes-free selection and users’ transition from novice to expert. We found that within 10 min of practice, eyes-free selection with the rapMenu is competitive in speed and accuracy with the visual rapMenu and the tilt menu. Finally, we discuss design variations that use other axes of wrist movement and adopt alternative auditory feedback.
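The precision-reducing idea — replacing one fine continuous selection with two coarse discrete inputs — can be illustrated with a hypothetical mapping from a tilt sector and a pinching finger to a menu index. The sector and finger counts below are invented for illustration, not the rapMenu's actual layout:

```python
def rap_select(tilt_deg, pinch_finger, sectors=4, fingers=3):
    """Map a coarse wrist-tilt sector and a pinch-finger choice to one item.

    With `sectors` tilt sectors and `fingers` distinct pinch gestures, a
    sectors * fingers menu is addressable, yet the tilt only has to land
    anywhere inside a wide (360 / sectors)-degree sector.
    """
    sector = int((tilt_deg % 360) // (360 // sectors))
    return pinch_finger * sectors + sector   # index into a fingers*sectors menu

# A 4-sector tilt combined with 3 pinch gestures addresses 12 items,
# while each individual input needs only very coarse accuracy.
item = rap_select(95, 2)   # tilt ~95° falls in sector 1; finger 2 -> item 9
```

Doubling the menu breadth here costs nothing in tilt precision if the extra items go to another finger, which matches the abstract's claim that discrete gesture inputs reduce the precision required of hand movement.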

18.
In this paper, we aim for the recognition of a set of dance gestures from contemporary ballet. Our input data are motion trajectories followed by the joints of a dancing body, provided by a motion-capture system. Direct use of the original signals is unreliable and expensive, so we propose a suitable tool for non-uniform sub-sampling of spatiotemporal signals. The key to our approach is the use of a deformable model to provide a compact and efficient representation of motion trajectories. Our dance gesture recognition method involves a set of hidden Markov models (HMMs), each of them being related to a motion trajectory followed by the joints. The recognition of such movements is then achieved by matching the resulting gesture models with the input data via HMMs. We have validated our recognition system on 12 fundamental movements from contemporary ballet performed by four dancers. This revised version was published online in November 2004 with corrections to the section numbers.

19.
In this paper we propose an improved sinusoidal modeling method, based on perceptual matching pursuits computed in the Bark scale, for parametric audio coding applications. Complex exponentials compose the overcomplete dictionary for the matching pursuits. The main contribution is the minimization of a perceptual distortion measure defined in the Bark scale to select the optimum atom at each iteration of the pursuit. Furthermore, a psychoacoustic stopping criterion for the pursuit is presented. The proposed sinusoidal modeling method is suitable for integration into a parametric audio coder based on the three-part model of sines, transients and noise (STN model), as the experimental results show. Our method provides significant advantages over previous work, mainly because it operates in the Bark scale rather than in the frequency domain.
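Matching pursuit over a dictionary of complex exponentials greedily subtracts the best-correlated atom at each iteration. The sketch below uses a plain least-squares correlation rather than the paper's Bark-scale perceptual distortion measure, and a fixed atom count in place of its psychoacoustic stopping criterion:

```python
import numpy as np

def matching_pursuit(x, freqs, n_atoms=4, sr=1000):
    """Greedy sinusoidal decomposition with complex-exponential atoms."""
    t = np.arange(len(x)) / sr
    D = np.exp(2j * np.pi * np.outer(freqs, t))      # one atom per frequency
    D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit-norm atoms
    r = x.astype(complex).copy()                     # residual starts as the signal
    picked = []
    for _ in range(n_atoms):
        c = D.conj() @ r                             # inner products <r, atom>
        k = int(np.argmax(np.abs(c)))                # best-matching atom
        picked.append(int(freqs[k]))
        r -= c[k] * D[k]                             # subtract its projection
    return picked

sr = 1000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
freqs = np.arange(10, 200, 5)                        # candidate atom frequencies
picked = matching_pursuit(x, freqs)
```

The paper's perceptual variant changes the atom-selection criterion (Bark-scale distortion instead of raw correlation) but keeps this same iterate-and-subtract structure.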

20.
There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on data sets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new data set with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a data set.
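The core benefit of a context model — suppressing detections that are unlikely given the co-occurring categories — can be shown with a toy pairwise rescoring. The categories, co-occurrence values and blending rule below are all hypothetical; the paper's actual model is a learned tree over more than a hundred categories:

```python
import numpy as np

labels = ["car", "road", "cow"]
cooc = np.array([[1.0, 0.9, 0.1],     # hypothetical co-occurrence priors:
                 [0.9, 1.0, 0.3],     # cars and roads appear together often,
                 [0.1, 0.3, 1.0]])    # cows rarely appear with either

def rescore(scores, cooc, alpha=0.5):
    """Blend raw detector confidences with support from co-occurring categories."""
    off = cooc - np.diag(np.diag(cooc))            # drop self co-occurrence
    support = off @ scores / (len(scores) - 1)     # avg co-occurrence-weighted support
    return (1 - alpha) * scores + alpha * support

scores = np.array([0.8, 0.7, 0.6])    # raw detector confidences for one image
adj = rescore(scores, cooc)           # "cow" gets the least contextual support
```

The out-of-context category ends up ranked clearly last after rescoring even though its raw detector score was close to the others, which is the "rule out unlikely combinations" effect the abstract describes.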
