15 search results found.
1.
The usage patterns of speech and visual input modes are investigated as a function of relative input mode efficiency for both desktop and personal digital assistant (PDA) working environments. For this purpose the form-filling part of a multimodal dialogue system is implemented and evaluated; three multimodal modes of interaction are implemented: “Click-to-Talk,” “Open-Mike,” and “Modality-Selection.” “Modality-Selection” implements an adaptive interface where the system selects the most efficient input mode at each turn, effectively alternating between a “Click-to-Talk” and “Open-Mike” interaction style as proposed in “Modality tracking in the multimodal Bell Labs Communicator,” in Proceedings of the Automatic Speech Recognition and Understanding Workshop, by A. Potamianos et al., 2003. The multimodal systems are evaluated and compared with the unimodal systems. Objective and subjective measures used include task completion, task duration, turn duration, and overall user satisfaction. Turn duration is broken down into interaction time and inactivity time to better measure the efficiency of each input mode. Duration statistics and empirical probability density functions are computed as a function of interaction context and user. Results show that the multimodal systems outperform the unimodal systems in terms of objective and subjective criteria. Also, users tend to use the most efficient input mode at each turn; however, a bias towards the default input modality and a general bias towards the speech modality also exist. Results demonstrate that although users exploit some of the available synergies in multimodal dialogue interaction, further efficiency gains can be achieved by designing adaptive interfaces that fully exploit these synergies.
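The per-turn adaptation described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the selector keeps a running mean turn duration for each input mode and picks the currently faster one, falling back to a default mode until both have been observed.

```python
# Hypothetical sketch of per-turn modality selection: keep a running
# estimate of mean turn duration per input mode and pick the faster one.
# Names and numbers are illustrative, not taken from the paper.

class ModalitySelector:
    def __init__(self, modes=("speech", "visual")):
        # running mean duration (seconds) and observation count per mode
        self.stats = {m: [0.0, 0] for m in modes}

    def observe(self, mode, duration):
        mean, n = self.stats[mode]
        self.stats[mode] = [(mean * n + duration) / (n + 1), n + 1]

    def select(self, default="speech"):
        # fall back to the default mode until every mode has data
        if any(n == 0 for _, n in self.stats.values()):
            return default
        return min(self.stats, key=lambda m: self.stats[m][0])

sel = ModalitySelector()
sel.observe("speech", 4.0)
sel.observe("visual", 2.5)
print(sel.select())  # visual is currently the faster mode
```

The default-mode fallback mirrors the bias toward the default modality that the evaluation reports.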
2.
The paper introduces a novel detection and tracking system that provides both frame-view and world-coordinate human location information, based on video from multiple synchronized and calibrated cameras with overlapping fields of view. The system is developed and evaluated for the specific scenario of a seminar lecturer presenting in front of an audience inside a “smart room”, its aim being to track the lecturer’s head centroid in the three-dimensional (3D) space and also yield two-dimensional (2D) face information in the available camera views. The proposed approach is primarily based on a statistical appearance model of human faces by means of well-known AdaBoost-like face detectors, extended to address the head pose variation observed in the smart room scenario of interest. The appearance module is complemented by two novel components and assisted by a simple tracking drift detection mechanism. The first component of interest is the initialization module, which employs a spatio-temporal dynamic programming approach with appropriate penalty functions to obtain optimal 3D location hypotheses. The second is an adaptive subspace learning based 2D tracking scheme with a novel forgetting mechanism, introduced to reduce tracking drift and increase robustness. System performance is benchmarked on an extensive database of realistic human interaction in the lecture smart room scenario, collected as part of the European integrated project “CHIL”. The system consistently achieves excellent tracking precision, with a 3D mean tracking error of less than 16 cm, and is demonstrated to outperform four alternative tracking schemes. Furthermore, the proposed system performs relatively well in detecting frontal and near-frontal faces in the available frame views. This work was performed while Zhenqiu Zhang was on a summer internship with the Human Language Technology Department at the IBM T.J. Watson Research Center.
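The forgetting mechanism in the adaptive subspace tracker can be illustrated schematically. This is a generic sketch under assumed notation, not the paper's exact algorithm: a forgetting factor f in (0, 1] down-weights older frames when the appearance mean and covariance are updated, and the subspace is the span of the leading eigenvectors of the covariance.

```python
import numpy as np

# Illustrative sketch (not the paper's exact update): an appearance
# subspace maintained with a forgetting factor f, so older frames
# contribute less and appearance drift is absorbed more gracefully.

def update_subspace(mean, cov, frame, f=0.95):
    """Blend a new vectorized frame into a forgetting-factor mean/covariance."""
    new_mean = f * mean + (1.0 - f) * frame
    d = frame - new_mean
    new_cov = f * cov + (1.0 - f) * np.outer(d, d)
    return new_mean, new_cov

rng = np.random.default_rng(0)
dim = 16
mean, cov = np.zeros(dim), np.eye(dim)
for _ in range(50):
    mean, cov = update_subspace(mean, cov, rng.normal(size=dim))
# the leading eigenvectors of cov span the adapted appearance subspace
basis = np.linalg.eigh(cov)[1][:, -3:]
print(basis.shape)
```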
3.
4.
A Monte Carlo simulation technique for estimating the partition function of a general Gibbs random field image is proposed. By expressing the partition function as an expectation, an importance sampling approach for estimating it using Monte Carlo simulations is developed. As expected, the resulting estimators are unbiased and consistent. Computations can be performed iteratively by using simple Monte Carlo algorithms with remarkable success, as demonstrated by simulations. The work concentrates on binary, second-order Gibbs random fields defined on a rectangular lattice. However, the proposed methods can be easily extended to more general Gibbs random fields. Their potential contribution to optimal parameter estimation and hypothesis testing problems for general Gibbs random field images using a likelihood approach is anticipated.
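The core identity can be demonstrated on a toy example. Writing Z = Σₓ exp(−E(x)) as an expectation under a proposal q gives the unbiased estimator Ẑ = (1/M) Σᵢ exp(−E(xᵢ))/q(xᵢ) with xᵢ ~ q. The sketch below uses a tiny 2×2 binary field with a uniform proposal, which is illustrative only; the paper's setting is more general.

```python
import itertools, math, random

# Toy illustration (2x2 binary lattice, uniform proposal): the partition
# function Z = sum_x exp(-E(x)) is rewritten as an expectation under
# q(x) = 2**-N, yielding an unbiased importance sampling estimator.
# The coupling value below is illustrative.

def energy(x, beta=0.5):
    # nearest-neighbour coupling on a 2x2 grid, spins in {-1, +1}
    (a, b), (c, d) = (x[0], x[1]), (x[2], x[3])
    return -beta * (a * b + c * d + a * c + b * d)

def exact_Z(beta=0.5):
    # brute-force sum over all 16 configurations (feasible only for toys)
    return sum(math.exp(-energy(x, beta))
               for x in itertools.product((-1, 1), repeat=4))

def is_estimate(samples, beta=0.5, seed=0):
    rng = random.Random(seed)
    q = 2.0 ** -4  # uniform proposal probability of each configuration
    total = 0.0
    for _ in range(samples):
        x = tuple(rng.choice((-1, 1)) for _ in range(4))
        total += math.exp(-energy(x, beta)) / q
    return total / samples

print(exact_Z(), is_estimate(20000))
```

With enough samples the estimate converges to the exact value, consistent with the unbiasedness claimed above.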
5.
During pregnancy, mouse uterine epithelial cells produce and secrete a large amount of macrophage colony-stimulating factor (M-CSF/CSF-1). Macrophages accumulate and proliferate in the undecidualized endometrium of the pregnant uterus. Observations showed that macrophages expressed scavenger receptor class A (type I and type II) and class C (macrosialin). Scavenger receptors appeared to be involved in the removal of apoptotic cells in the degenerated decidual tissue. The expression of class A and class C scavenger receptor mRNAs in the uterus of pregnant mice was elevated but the expression of class B scavenger receptor (CD36) mRNA was similar to that of non-pregnant mice. The expression of various cytokines and chemokines, including M-CSF, monocyte chemoattractant protein-1 (MCP-1) and macrophage inflammatory protein 1-alpha (MIP1-alpha), was enhanced in the uterus of pregnant mice, suggesting that these molecules regulate macrophage chemotaxis and immunological function in the uterus. These findings imply that the pregnant uterus provides a microenvironment for the recruitment, differentiation, and proliferation of macrophages and the regulation of scavenger receptor and cytokine expression for a successful pregnancy.
6.
Recent advances in the automatic recognition of audiovisual speech
Visual speech information from the speaker's mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability estimates to the bimodal recognition process. We also briefly touch upon the issue of audiovisual adaptation. We apply our algorithms to three multisubject bimodal databases, ranging from small- to large-vocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves ASR over all conditions and data considered, though less so for visually challenging environments and large vocabulary tasks.
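Decision fusion with modality reliability estimates, as discussed above, is commonly realized as stream-weighted log-likelihood combination. The sketch below is a hedged illustration of that general idea rather than the paper's exact formulation: audio and visual per-class log-likelihoods are combined with exponents summing to one, with the audio weight lowered as the acoustic channel becomes less reliable.

```python
# Hedged sketch of stream-weighted decision fusion (the general idea,
# not the paper's exact formulation). All scores below are illustrative.

def fuse(audio_loglik, visual_loglik, audio_weight):
    """Combine per-class log-likelihood dicts with stream exponents."""
    w_a, w_v = audio_weight, 1.0 - audio_weight
    return {c: w_a * audio_loglik[c] + w_v * visual_loglik[c]
            for c in audio_loglik}

def classify(audio_loglik, visual_loglik, audio_weight=0.7):
    fused = fuse(audio_loglik, visual_loglik, audio_weight)
    return max(fused, key=fused.get)

# Illustrative scores: noisy audio weakly favours "pa", video favours "ba"
audio = {"ba": -12.0, "pa": -9.0}
video = {"ba": -3.0, "pa": -8.0}
print(classify(audio, video, audio_weight=0.3))  # low audio weight -> "ba"
```

Lowering the audio weight in noise lets the visual stream dominate, which is exactly when the visual modality helps most.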
7.
The automatic recognition of user’s communicative style within a spoken dialog system framework, including the affective aspects, has received increased attention in the past few years. For dialog systems, it is important to know not only what was said but also how something was communicated, so that the system can engage the user in a richer and more natural interaction. This paper addresses the problem of automatically detecting “frustration”, “politeness”, and “neutral” attitudes from a child’s speech communication cues, elicited in spontaneous dialog interactions with computer characters. Several information sources such as acoustic, lexical, and contextual features, as well as their combinations, are used for this purpose. The study is based on a Wizard-of-Oz dialog corpus of 103 children, 7–14 years of age, playing a voice activated computer game. Three-way classification experiments, as well as pairwise classification between polite vs. others and frustrated vs. others, were performed. Experimental results show that lexical information has more discriminative power than acoustic and contextual cues for detection of politeness, whereas context and acoustic features perform best for frustration detection. Furthermore, the fusion of acoustic, lexical and contextual information provided significantly better classification results. Results also showed that classification performance varies with age and gender. Specifically, for the “politeness” detection task, higher classification accuracy was achieved for females and 10–11 year-olds, compared to males and other age groups, respectively.
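The fusion of acoustic, lexical, and contextual information can be sketched as late fusion over cue-specific classifiers. This is a generic illustration under assumed names and scores, not the paper's classifiers: each cue contributes a posterior over the three attitudes, and the fused decision takes the (optionally weighted) average.

```python
# Hypothetical sketch of late fusion over cue-specific classifiers: each
# information source yields a posterior over the three attitudes, and the
# fused decision averages them. Weights and scores are illustrative.

ATTITUDES = ("frustrated", "polite", "neutral")

def fuse_posteriors(posteriors, weights=None):
    """Weighted average of per-cue posterior dicts over ATTITUDES."""
    if weights is None:
        weights = [1.0 / len(posteriors)] * len(posteriors)
    fused = {a: sum(w * p[a] for w, p in zip(weights, posteriors))
             for a in ATTITUDES}
    return max(fused, key=fused.get), fused

acoustic = {"frustrated": 0.5, "polite": 0.2, "neutral": 0.3}
lexical = {"frustrated": 0.1, "polite": 0.7, "neutral": 0.2}
context = {"frustrated": 0.6, "polite": 0.1, "neutral": 0.3}
label, _ = fuse_posteriors([acoustic, lexical, context])
print(label)  # the combined evidence favours "frustrated"
```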
8.
We propose a three-stage pixel-based visual front end for automatic speechreading (lipreading) that results in significantly improved recognition performance of spoken words or phonemes. The proposed algorithm is a cascade of three transforms applied on a three-dimensional video region-of-interest that contains the speaker's mouth area. The first stage is a typical image compression transform that achieves a high-energy, reduced-dimensionality representation of the video data. The second stage is a linear discriminant analysis-based data projection, which is applied on a concatenation of a small amount of consecutive image transformed video data. The third stage is a data rotation by means of a maximum likelihood linear transform that optimizes the likelihood of the observed data under the assumption of their class-conditional multivariate normal distribution with diagonal covariance. We applied the algorithm to visual-only 52-class phonetic and 27-class visemic classification on a 162-subject, 8-hour long, large vocabulary, continuous speech audio-visual database. We demonstrated significant classification accuracy gains by each added stage of the proposed algorithm which, when combined, can achieve up to 27% improvement. Overall, we achieved a 60% (49%) visual-only frame-level visemic classification accuracy with (without) use of test set viseme boundaries. In addition, we report improved audio-visual phonetic classification over the use of a single-stage image transform visual front end. Finally, we discuss preliminary speech recognition results.
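The three-stage cascade can be sketched schematically. This is an illustration with assumed, small dimensions: stage 1 uses a DCT-II as the image-compression transform, stage 2 stacks consecutive frames and applies a projection standing in for LDA, and stage 3 applies a square rotation standing in for the MLLT. The LDA and MLLT matrices here are random placeholders, since fitting them requires labelled training data.

```python
import numpy as np

# Schematic sketch of the three-stage cascade (illustrative dimensions,
# not the paper's): (1) DCT-II compression keeping the top coefficients,
# (2) frame stacking plus an LDA-style projection, (3) a square rotation.
# The stage-2 and stage-3 matrices are placeholders, not fitted transforms.

def dct_matrix(n):
    # orthonormal DCT-II basis, rows indexed by frequency
    k = np.arange(n)[:, None]
    m = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def visual_front_end(frames, keep=6, context=3, out_dim=4, seed=0):
    """frames: (T, n) vectorized mouth-region rows -> (T-context+1, out_dim)."""
    n = frames.shape[1]
    compressed = frames @ dct_matrix(n).T[:, :keep]              # stage 1
    stacked = np.hstack([compressed[i:len(compressed) - context + 1 + i]
                         for i in range(context)])               # stack frames
    rng = np.random.default_rng(seed)
    lda = rng.normal(size=(stacked.shape[1], out_dim))           # stage 2 (placeholder)
    mllt = np.linalg.qr(rng.normal(size=(out_dim, out_dim)))[0]  # stage 3 (placeholder rotation)
    return stacked @ lda @ mllt

feats = visual_front_end(np.random.default_rng(1).normal(size=(10, 16)))
print(feats.shape)
```

Since every stage is linear, the whole cascade collapses into a single matrix at runtime, which keeps the front end cheap.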
9.
For pt. 1 see ibid., vol. 9, p. 3 (2007). In this paper, the task and user interface modules of a multimodal dialogue system development platform are presented. The main goal of this work is to provide a simple, application-independent solution to the problem of multimodal dialogue design for information seeking applications. The proposed system architecture clearly separates the task and interface components of the system. A task manager is designed and implemented that consists of two main submodules: the electronic form module that handles the list of attributes that have to be instantiated by the user, and the agenda module that contains the sequence of user and system tasks. Both the electronic forms and the agenda can be dynamically updated by the user. Next, a spoken dialogue module is designed that implements the speech interface for the task manager. The dialogue manager can handle complex error correction and clarification user input, building on the semantic and pragmatic modules presented in Part I of this paper. The spoken dialogue system is evaluated for a travel reservation task of the DARPA Communicator research program and shown to yield over 90% task completion and good performance for both objective and subjective evaluation metrics. Finally, a multimodal dialogue system, which combines graphical and speech interfaces, is designed, implemented and evaluated. Minor modifications to the unimodal semantic and pragmatic modules were required to build the multimodal system. It is shown that the multimodal system significantly outperforms the unimodal speech-only system both in terms of efficiency (task success and time to completion) and user satisfaction for a travel reservation task.
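The electronic form and agenda submodules can be sketched minimally. All names below are invented for illustration, not the platform's API: the form holds attributes awaiting instantiation, the agenda holds pending tasks, and both update dynamically as the user supplies values or introduces new attributes.

```python
# Minimal illustrative sketch (invented names, not the platform's API):
# a task manager holding an electronic form of attributes to instantiate
# and an agenda of pending tasks, both dynamically updatable by the user.

class TaskManager:
    def __init__(self, attributes):
        self.form = {a: None for a in attributes}       # electronic form module
        self.agenda = [f"ask:{a}" for a in attributes]  # agenda module

    def fill(self, attribute, value):
        if attribute not in self.form:                  # user added a new attribute
            self.agenda.append(f"ask:{attribute}")
        self.form[attribute] = value
        if f"ask:{attribute}" in self.agenda:           # task resolved by the fill
            self.agenda.remove(f"ask:{attribute}")

    def next_task(self):
        return self.agenda[0] if self.agenda else "confirm"

tm = TaskManager(["origin", "destination", "date"])
tm.fill("destination", "Athens")
print(tm.next_task())  # still asks for the first unfilled attribute
```

Because the speech interface only consumes `next_task()`, the same task manager can drive the graphical front end unchanged, which is the application-independence claimed above.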
10.
A comprehensive electrical characterization study which was conducted to optimize the fabrication of SIMOX substrates for VLSI is discussed. The oxygen implantation was carried out using medium-current and high-current implanters. The wafers were annealed at 1275°C and 1300°C to produce high-quality, precipitate-free material. The effect of dose, the effect of multiple implantation (by sequentially implanting and annealing), and the effect of the anneal ambient gas and the capping layer during annealing were studied. MOSFETs of various geometries with a gate oxide of ~20 nm were fabricated by a CMOS process incorporating the addition of a thin epitaxial Si layer. A general evaluation of each transistor was conducted by studying its static characteristics. The interface states, bulk traps, and carrier generation phenomena were studied. Good-quality interfaces were obtained. Better implantation control reduced contamination and suppressed deep traps below the detection limit. Multiple implantation resulted in superior material quality, as evidenced by very long generation lifetime values (> 100 μs).
Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)