Similar Literature
A total of 20 similar documents were found.
1.
Human-computer interaction in medicine differs in only a few respects from other man-machine interfaces. Medical technology does, however — perhaps comparable with the interfaces in nuclear power stations — have unusual requirements such as septic or aseptic operating rooms and 100% availability. For people, the natural interface is language: medical equipment should therefore be operated by speech, and its output should also be speech. Medical staff must work with many different pieces of equipment produced by various manufacturers. Siemens is about to establish a new standard named Syngo. Syngo stands for a philosophy of use for medical equipment that is to be offered independently of the manufacturer.  相似文献

2.
The wireless market is imposing two conflicting requirements on mobile terminals — they must get smaller and lighter, yet at the same time they must deliver ever more advanced multimedia applications. As the terminals shrink, screen size becomes limited and keypads must either occupy less space or vanish entirely. Meanwhile, as the applications grow richer and more complex, the need to control and interact with them only expands. How can these apparently contradictory goals be met? How can rich multimedia applications be controlled as the physical space for keypads shrinks or indeed vanishes? The answer lies in speech technology. By using speech recognition, voice commands can supplement the keypad/stylus, while audio output can deliver additional information that cannot fit on the screen. The multimodal interface, combining the best of voice-only and graphical user interfaces, is the future of the wireless user interface. This paper looks at recent developments in multimodal interfaces and explores how they will affect the evolution of wireless applications.  相似文献

3.
ExtraPlanT is a multiagent production planning system designed for small and medium-sized enterprises with project-oriented production. To make the results of the system available even to users located away from the enterprise, it has been equipped with remote access: a Web and a telephony interface. The multiagent design of ExtraPlanT makes the integration of these interfaces robust and simple. The telephony interface uses VoiceXML technology, so it can be built without extensive knowledge of speech processing. The interface also uses innovative techniques to overcome the common disadvantages of speech as a medium for machine output.  相似文献

4.
This paper presents a vision of the near future in which computer interaction is characterized by natural face-to-face conversations with lifelike characters that speak, emote, and gesture. These animated agents will converse with people much like people converse effectively with assistants in a variety of focused applications. Despite the research advances required to realize this vision, and the lack of strong experimental evidence that animated agents improve human-computer interaction, we argue that initial prototypes of perceptive animated interfaces can be developed today, and that the resulting systems will provide more effective and engaging communication experiences than existing systems. In support of this hypothesis, we first describe initial experiments using an animated character to teach speech and language skills to children with hearing problems, and classroom subjects and social skills to children with autistic spectrum disorder. We then show how existing dialogue system architectures can be transformed into perceptive animated interfaces by integrating computer vision and animation capabilities. We conclude by describing the Colorado Literacy Tutor, a computer-based literacy program that provides an ideal testbed for research and development of perceptive animated interfaces, and consider next steps required to realize the vision.  相似文献   

5.
We propose a novel HMI UI/UX for an in‐vehicle infotainment system. Our proposed HMI UI comprises multimodal interfaces that allow a driver to safely and intuitively manipulate an infotainment system while driving. Our analysis of a touchscreen interface–based HMI UI/UX reveals that a driver's use of such an interface while driving can cause the driver to be seriously distracted. Our proposed HMI UI/UX is a novel manipulation mechanism for a vehicle infotainment service. It consists of several interfaces that incorporate a variety of modalities, such as speech recognition, a manipulating device, and hand gesture recognition. In addition, we provide an HMI UI framework designed to be manipulated using a simple method based on four directions and one selection motion. Extensive quantitative and qualitative in‐vehicle experiments demonstrate that the proposed HMI UI/UX is an efficient mechanism through which to manipulate an infotainment system while driving.  相似文献   
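The four-direction-plus-select scheme described above can be pictured as a small state machine that any modality (speech, knob, or gesture) feeds with the same five events. The following Python sketch is only an illustrative model under assumed menu and event names, not the framework from the paper.

```python
# Minimal sketch of a four-direction + select menu controller (hypothetical names).
from dataclasses import dataclass

@dataclass
class Menu:
    title: str
    items: list            # child Menu objects or leaf action names
    cursor: int = 0

class HmiController:
    """Maps UP/DOWN/LEFT/RIGHT/SELECT events (from speech, knob, or gesture) onto a menu tree."""
    def __init__(self, root: Menu):
        self.stack = [root]          # navigation history; top of stack is the current menu

    @property
    def current(self) -> Menu:
        return self.stack[-1]

    def handle(self, event: str):
        menu = self.current
        if event == "UP":
            menu.cursor = (menu.cursor - 1) % len(menu.items)
        elif event == "DOWN":
            menu.cursor = (menu.cursor + 1) % len(menu.items)
        elif event == "LEFT":                      # go back one level
            if len(self.stack) > 1:
                self.stack.pop()
        elif event in ("RIGHT", "SELECT"):         # descend or trigger the leaf action
            target = menu.items[menu.cursor]
            if isinstance(target, Menu):
                self.stack.append(target)
            else:
                print(f"executing action: {target}")

# Example: the same handler serves speech ("down", "select"), hard keys, and gestures.
root = Menu("Main", [Menu("Media", ["radio", "usb"]), Menu("Phone", ["dial", "contacts"]), "navigation"])
ctrl = HmiController(root)
for ev in ["DOWN", "SELECT", "DOWN", "SELECT"]:
    ctrl.handle(ev)
```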

6.
A multimedia natural-language interface for an expert network-management system is described. The different media discussed are speech, text, and graphics. An object-oriented, multimedia interaction language called Milan is used to describe the multimedia information. Application of the multimedia interface to a digital data services (DDS) network is illustrated. It is believed that such multimedia front ends will play a major role in the design of intelligent human-machine interfaces.  相似文献

7.
Research in human/computer interaction has mainly focused on natural language, text, speech and vision in isolation. Recently, a number of research projects have concentrated on integrating such modalities using intelligent reasoners. The rationale is that many inherent ambiguities in single modes of communication can be resolved if extra information is available. This paper describes an intelligent multi-modal system called the Smart Work Manager. The main characteristics of the Smart Work Manager are that it can process speech, text, face images, gaze information and simulated gestures (using the mouse) as input modalities, and that its output is in the form of speech, text or graphics. The main components of the system are the reasoner, a speech system, a vision system, an integration platform and the application interface. The overall architecture of the system is described together with the integration platform and the components of the system, which include a non-intrusive neural-network-based gaze-tracking system. The paper concludes with a discussion of the applicability of such systems to intelligent human/computer interaction and the lessons learnt in terms of reliability and efficiency.  相似文献

8.
Benefiting from knowledge of speech, language, and hearing, a new technology has arisen to serve users of complex information systems. This technology aims for a natural communication environment, capturing the attributes that humans favor in face-to-face exchange. Conversational interaction bears the central burden, with visual and manual signaling simultaneously supplementing the communication process. In addition to instrumenting sensors for each mode, the interface must incorporate context-aware algorithms for fusing and interpreting the multiple sensory channels. The ultimate objective is a reliable estimate of the user's intent, from which actionable responses can be made. Current research therefore addresses multimodal interfaces that can transcend the limitations of the mouse and keyboard. This report indicates the early status of multimodal interfaces and identifies emerging opportunities for enhanced usability and naturalness. It concludes by advocating focused research on a frontier issue: the formulation of a quantitative language framework for multimodal communication.  相似文献
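One way to picture the fusion step described in this abstract: each sensory channel produces its own belief about the user's intent, and a fusion layer combines them into a single estimate. The Python sketch below shows a simple late-fusion scheme with hypothetical modality names, weights, and intents; it is a generic illustration, not the method advocated in the report.

```python
# Hedged sketch of late fusion across modalities: weighted combination of
# per-modality probability distributions over user intents (all values hypothetical).
from collections import defaultdict

def fuse_intents(modality_posteriors, modality_weights):
    """modality_posteriors: {modality: {intent: probability}}
       modality_weights:    {modality: confidence in [0, 1]}"""
    combined = defaultdict(float)
    total_weight = sum(modality_weights.values()) or 1.0
    for modality, posterior in modality_posteriors.items():
        w = modality_weights.get(modality, 0.0) / total_weight
        for intent, p in posterior.items():
            combined[intent] += w * p
    # Return the most probable intent and the full fused distribution.
    best = max(combined, key=combined.get)
    return best, dict(combined)

posteriors = {
    "speech":  {"open_map": 0.7, "play_music": 0.3},
    "gesture": {"open_map": 0.4, "play_music": 0.6},
    "gaze":    {"open_map": 0.8, "play_music": 0.2},
}
weights = {"speech": 0.5, "gesture": 0.2, "gaze": 0.3}
print(fuse_intents(posteriors, weights))   # -> ('open_map', {...})
```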

9.
Speech Emotion Recognition (SER) is one of the emerging fields in human-computer interaction. The quality of a human-computer interface that recognizes emotions in human speech relies heavily on the types of features used and on the classifier employed for recognition. The main purpose of this paper is to present a wide range of features employed for speech emotion recognition and the acoustic characteristics of those features. We also analyze the performance of these features in terms of precision, recall, F-measure and recognition rate on two commonly used emotional speech databases, the Berlin emotional database and the Danish emotional database. Emotional speech recognition is being applied in modern human-computer interfaces, and an overview of 10 interesting applications is also presented to illustrate the importance of this technique.  相似文献
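To make the evaluation parameters mentioned above concrete, the snippet below computes precision, recall, F-measure, and recognition rate for a toy set of emotion predictions with scikit-learn. The labels and predictions are invented; this illustrates the metrics themselves, not the paper's experiments.

```python
# Illustration of the evaluation metrics used in SER studies (toy data, not the paper's results).
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

y_true = ["anger", "joy", "sadness", "neutral", "anger", "joy", "sadness", "neutral"]
y_pred = ["anger", "joy", "neutral", "neutral", "joy",   "joy", "sadness", "anger"]

precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
recall    = recall_score(y_true, y_pred, average="macro", zero_division=0)
f_measure = f1_score(y_true, y_pred, average="macro", zero_division=0)
recognition_rate = accuracy_score(y_true, y_pred)   # fraction of utterances classified correctly

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"F-measure={f_measure:.2f} recognition rate={recognition_rate:.2f}")
```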

10.
In this paper, we describe our recent work at Microsoft Research, in the project codenamed Dr. Who, aimed at the development of enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present in detail MiPad as the first Dr. Who's application that addresses specifically the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution to the current prevailing problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs or smart phones. Despite its current incomplete implementation, we have observed that speech and pen have the potential to significantly improve user experience in our user study reported in this paper. We describe in this system-oriented paper the main components of MiPad, with a focus on the robust speech processing and spoken language understanding aspects. The detailed MiPad components discussed include: distributed speech recognition considerations for the speech processing algorithm design; a stereo-based speech feature enhancement algorithm used for noise-robust front-end speech processing; Aurora2 evaluation results for this front-end processing; speech feature compression (source coding) and error protection (channel coding) for distributed speech recognition in MiPad; HMM-based acoustic modeling for continuous speech recognition decoding; a unified language model integrating context-free grammar and N-gram model for the speech decoding; schema-based knowledge representation for the MiPad's personal information management task; a unified statistical framework that integrates speech recognition, spoken language understanding and dialogue management; the robust natural language parser used in MiPad to process the speech recognizer's output; a machine-aided grammar learning and development used for spoken language understanding for the MiPad task; Tap & Talk multimodal interaction and user interface design; back channel communication and MiPad's error repair strategy; and finally, user study results that demonstrate the superior throughput achieved by the Tap & Talk multimodal interaction over the existing pen-only PDA interface. These user study results highlight the crucial role played by speech in enhancing the overall user experience in MiPad-like human-computer interaction devices.  相似文献   
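One concrete way to picture the Tap & Talk interaction mentioned above: tapping a form field selects the field-specific grammar used to interpret the next utterance, so recognition and understanding are constrained to that field. The sketch below is a hypothetical Python illustration of that idea only; the field names, vocabularies, and keyword matching are invented and do not reproduce MiPad's actual components.

```python
# Hypothetical illustration of Tap & Talk: the tapped field selects the grammar
# used to interpret the next utterance (fields and vocabularies are invented).
FIELD_GRAMMARS = {
    "attendees": {"alex", "kim", "jordan", "sam"},
    "date":      {"monday", "tuesday", "wednesday", "thursday", "friday"},
    "subject":   None,     # free-form field: fall back to the general dictation model
}

def interpret(tapped_field: str, utterance: str) -> dict:
    """Keep only the words licensed by the tapped field's grammar (toy stand-in for constrained ASR)."""
    words = utterance.lower().split()
    grammar = FIELD_GRAMMARS[tapped_field]
    if grammar is None:
        value = " ".join(words)                       # unconstrained dictation
    else:
        value = " ".join(w for w in words if w in grammar)
    return {tapped_field: value}

# The user taps the 'attendees' field, then speaks; out-of-grammar words are ignored.
print(interpret("attendees", "invite Alex and Kim please"))   # {'attendees': 'alex kim'}
print(interpret("date", "how about friday"))                  # {'date': 'friday'}
```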

11.
Virtual reality interfaces can immerse users into virtual environments from an impressive array of application fields, including entertainment, education, design, and navigation. However, history teaches us that no matter how rich the content is from these applications, it remains out of reach for users without a physical way to interact with it. Multimodal interfaces give users a way to interact with the virtual environment (VE) using more than one complementary modality. Masterpiece (which is short for multimodal authoring tool with similar technologies from European research utilizing a physical interface in an enhanced collaborative environment) is a platform for a multimodal natural interface. We integrated Masterpiece into a new authoring tool for designers and engineers that uses 3D search capabilities to access original database content, supporting natural human-computer interaction.  相似文献   

12.
陈霏  潘昌杰 《信号处理》2020,36(6):816-830
Brain-computer interfaces (BCIs), a technology that lets patients with communication or motor impairments interact with machines, have attracted wide attention in recent years in biomedical engineering, rehabilitation engineering, and related fields. BCIs based on imagined speech are a new class of BCI, and research on them is developing rapidly because of their potential to provide effective and comfortable verbal communication for patients with speech impairments. This paper first introduces the signal acquisition techniques commonly used in imagined-speech BCIs, then reviews the related research and signal processing algorithms in the existing literature, and finally discusses the open problems of imagined-speech BCIs and the outlook for future work.  相似文献
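As a rough illustration of the kind of signal-processing pipeline such surveys cover, the sketch below band-pass filters multichannel EEG, extracts band-power features, and trains a classifier on synthetic data. The bands, channel count, and classifier are assumptions for illustration, not a method taken from the reviewed literature.

```python
# Hedged sketch of an imagined-speech BCI pipeline: filter -> band-power features -> classifier.
# All data are synthetic and the parameters (bands, channels, window length) are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

fs = 256                      # sampling rate (Hz), assumed
rng = np.random.default_rng(0)

def band_power(trial, low, high):
    """Mean power of each channel after band-pass filtering; trial shape: (channels, samples)."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trial, axis=1)
    return (filtered ** 2).mean(axis=1)

def features(trial):
    # Concatenate per-channel power in a few conventional EEG bands.
    return np.concatenate([band_power(trial, lo, hi) for lo, hi in [(4, 8), (8, 13), (13, 30)]])

# Synthetic dataset: 40 trials, 8 channels, 2 s each, two imagined-word classes.
X = np.stack([features(rng.standard_normal((8, 2 * fs))) for _ in range(40)])
y = np.array([0, 1] * 20)

scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print("cross-validated accuracy:", scores.mean())   # roughly chance level on random data
```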

13.
Brooks  K. 《Multimedia, IEEE》2002,9(1):8-11
We're constantly looking for ways to manage the growing complexity of our computer experiences. Software developers keep weaving more features into already bloated applications. It's up to human interface designers, as members of development teams, to quell this feature creep with smarter interface tools that make life easier for computer users. Methods for dealing with the increasing complexity range from simple floating tool palettes (as used in Adobe Photoshop) to complex intelligent interfaces and interface agents making their way out of academia and into the commercial marketplace. The dilemma is that easy software sells better than difficult software, feature-rich software sells better than feature-poor software, and the two best-selling categories, easy and feature-rich, seem to contradict each other. Story interfaces show some promise of mitigating the ever-increasing complexity of information interfaces. Of course, much research remains to be done on applying the story to interface design. We still have to work out a lot of the choreography, such as the interaction between stored information, human relationships, and story representation and expression. But eventually, using a story approach to device design can take us closer to dancing with our technology rather than struggling with it.  相似文献

14.
User-centered modeling and evaluation of multimodal interfaces
Historically, the development of computer interfaces has been a technology-driven phenomenon. However, new multimodal interfaces are composed of recognition-based technologies that must interpret human speech, gesture, gaze, movement patterns, and other complex natural behaviors, which involve highly automatized skills that are not under full conscious control. As a result, it now is widely acknowledged that multimodal interface design requires modeling of the modality-centered behavior and integration patterns upon which multimodal systems aim to build. This paper summarizes research on the cognitive science foundations of multimodal interaction, and on the essential role that user-centered modeling has played in prototyping, guiding, and evaluating the design of next-generation multimodal interfaces. In particular, it discusses the properties of different modalities and the information content they carry, the unique features of multimodal language and its processability, as well as when users are likely to interact multimodally and how their multimodal input is integrated and synchronized. It also reviews research on typical performance and linguistic efficiencies associated with multimodal interaction, and on the user-centered reasons why multimodal interaction minimizes errors and expedites error handling. In addition, this paper describes the important role that selective methodologies and evaluation metrics have played in shaping next-generation multimodal systems, and it concludes by highlighting future directions for designing a new class of adaptive multimodal-multisensor interfaces.  相似文献   

15.
We document the rationale and design of a multimodal interface to a pervasive/ubiquitous computing system that supports independent living by older people in their own homes. The Millennium Home system involves fitting a resident's home with sensors--these sensors can be used to trigger sequences of interaction with the resident to warn them about dangerous events, or to check if they need external help. We draw lessons from the design process and conclude the paper with implications for the design of multimodal interfaces to ubiquitous systems developed for the elderly and in healthcare, as well as for more general ubiquitous computing applications.  相似文献   
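A simple way to picture the sensor-triggered interaction sequences mentioned above is an escalation routine: a sensor event starts a dialogue with the resident, and if no reassuring response arrives the system escalates to outside help. The Python sketch below is a hypothetical illustration of that pattern; the event names, timeouts, and escalation steps are assumptions rather than details of the Millennium Home system.

```python
# Hypothetical escalation sequence for a sensor-triggered check on the resident.
import time

def ask_resident(question: str, timeout_s: int) -> bool:
    """Placeholder for a multimodal prompt (speech + screen); returns True if the resident responds."""
    print(f"[prompt] {question} (waiting up to {timeout_s}s)")
    time.sleep(0.1)        # stand-in for waiting on speech/button input
    return False           # simulate no response for this example

def handle_sensor_event(event: str):
    # Step 1: ask the resident directly.
    if ask_resident(f"A {event} was detected. Are you all right?", timeout_s=60):
        return "resident confirmed OK"
    # Step 2: repeat the check on another modality (e.g. a louder audio prompt).
    if ask_resident("Please press any button or say 'yes' if you are all right.", timeout_s=60):
        return "resident confirmed OK on retry"
    # Step 3: escalate to an external helper.
    print("[escalate] contacting designated carer / emergency contact")
    return "escalated to external help"

print(handle_sensor_event("fall"))
```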

16.
In this paper, we describe some abstract features of human/machine interaction systems that are required for the production of intelligent behaviour. We introduce a subset of intelligent systems called human-centered intelligent systems (HCIS) and argue that such systems must be autonomous, robust and adaptive in order to be intelligent. We also propose soft computing as a promising new technique that can be used to build HCIS, and present examples where this is already being done. The paper defines flexibility to be a combination of the often-conflicting requirements of robustness and adaptability, and based on this we claim that the right balance between these two features is necessary to achieve intelligent behaviour.We describe the intelligent assistant (IA) system and its various components which automatically perform helpful tasks for its user, so as to enable the user to improve productivity. These tasks include time, information and communication management. Time management involves planning and scheduling, decision making and learning user habits. Information management involves information seeking and filtering, information fusion, decision making and learning user preferences. Communication management involves recognising user behaviour and learning user priorities. All these tasks depend on many factors including the type of activity, its originator, the mood of the user, past experience, and the priority of the task. The IA uses a multimodal interface with conventional interfaces such as keyboard and mouse enhanced to include vision, speech and natural language processing. The inclusion of such extra modalities extends the capabilities of existing systems at the cost of introducing extra complexity. The IA is 'smart' because it has the knowledge about tasks and the capability to learn and adapt to new interactions with its user and with other systems.  相似文献   

17.
Larson  J.A. 《Multimedia, IEEE》2003,10(4):91-93
VoiceXML is a markup language for creating voice-user interfaces. It uses speech and telephone touchtone recognition for input and prerecorded audio and text-to-speech synthesis (TTS) for output. It's based on the World Wide Web Consortium's (W3C's) Extensible Markup Language (XML) and leverages the Web paradigm for application development and deployment. By having a common language, application developers, platform vendors, and tool providers all can benefit from code portability and reuse. The paper discusses VoiceXML and the W3C speech interface framework.  相似文献   
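As a small illustration of the kind of dialogue VoiceXML describes, the snippet below assembles a minimal VoiceXML 2.0 form with a prompt, a field, and a grammar reference using Python. The field name and grammar URI are invented for the example; the W3C VoiceXML 2.0 specification remains the authoritative reference for the element set.

```python
# Generates a minimal VoiceXML 2.0 document (field and grammar names are illustrative only).
import xml.etree.ElementTree as ET

vxml = ET.Element("vxml", version="2.0", xmlns="http://www.w3.org/2001/vxml")
form = ET.SubElement(vxml, "form", id="order")

field = ET.SubElement(form, "field", name="drink")
ET.SubElement(field, "prompt").text = "Would you like coffee or tea?"              # TTS or recorded audio
ET.SubElement(field, "grammar", src="drinks.grxml", type="application/srgs+xml")  # speech/DTMF grammar

filled = ET.SubElement(field, "filled")
confirm = ET.SubElement(filled, "prompt")
confirm.text = "You said "
ET.SubElement(confirm, "value", expr="drink")   # echo the recognized value back to the caller

print(ET.tostring(vxml, encoding="unicode"))
```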

18.
张文  张毅  满毅  陈晓峰 《电信科学》2011,27(8):117-121
This paper proposes an integrated sharing platform that exposes a variety of basic, extensible interfaces to accommodate application systems with different interfaces. Based on this platform, multiple systems can invoke one another's services, improving service management while exchanging information without barriers; anomalies in inter-system interactions can be detected and analyzed promptly, enabling intelligent maintenance and management. When a new interface protocol appears, only a new protocol adapter needs to be developed once for the platform in order to bring the new interface online, which improves development efficiency and lowers development and maintenance costs. The platform replaces the traditional point-to-point integration style with a many-to-many bus architecture, enabling simple and convenient information exchange among systems, optimizing the inter-system architecture, easing management and maintenance, and reducing the interface-modification work needed to connect new systems.  相似文献
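The adapter idea in the abstract above, developing one protocol adapter per new interface and plugging it into a shared bus, can be sketched as follows. This is a hypothetical Python illustration with invented adapter and message names, not the platform's actual design.

```python
# Hypothetical sketch of a service bus with pluggable protocol adapters.
import json
from abc import ABC, abstractmethod

class ProtocolAdapter(ABC):
    """One adapter is developed per interface protocol; the bus only sees this interface."""
    @abstractmethod
    def to_canonical(self, raw: bytes) -> dict: ...
    @abstractmethod
    def from_canonical(self, message: dict) -> bytes: ...

class JsonAdapter(ProtocolAdapter):
    def to_canonical(self, raw: bytes) -> dict:
        return json.loads(raw)
    def from_canonical(self, message: dict) -> bytes:
        return json.dumps(message).encode()

class ServiceBus:
    """Many-to-many exchange: each system registers its adapter once and can then talk to all others."""
    def __init__(self):
        self.adapters = {}
    def register(self, system_name: str, adapter: ProtocolAdapter):
        self.adapters[system_name] = adapter
    def forward(self, sender: str, receiver: str, raw: bytes) -> bytes:
        message = self.adapters[sender].to_canonical(raw)       # normalize to the bus format
        return self.adapters[receiver].from_canonical(message)  # re-encode for the receiver

bus = ServiceBus()
bus.register("billing", JsonAdapter())
bus.register("crm", JsonAdapter())
print(bus.forward("billing", "crm", b'{"event": "invoice_paid"}'))
```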

19.
In this tutorial paper, we discuss and compare cooperative content delivery (CCD) techniques that exploit multiple wireless interfaces available on mobile devices to efficiently satisfy the already massive and rapidly growing user demand for content. The discussed CCD techniques include simultaneous use of wireless interfaces, opportunistic use of wireless interfaces, and aggregate use of wireless interfaces. We provide a taxonomy of different ways in which multiple wireless interfaces are exploited for CCD, and also discuss the real measurement studies that evaluate the content delivery performance of different wireless interfaces in terms of energy consumption and throughput. We describe several challenges related to the design of CCD methods using multiple interfaces, and also explain how new technological developments can help in accelerating the performance of such CCD methods. The new technological developments discussed in this paper include wireless interface aggregation, network caching, and the use of crowdsourcing. We provide a case study for selection of devices in a group for CCD using multiple interfaces. We consider this case study based on the observation that in general different CCD users can have different link qualities in terms of transmit/receive performance, and selection of users with good link qualities for CCD can accelerate the content delivery performance of wireless networks. Finally, we discuss some open research issues relating to CCD using multiple interfaces.  相似文献   
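The device-selection case study mentioned above can be pictured with a simple scoring rule: prefer group members whose interfaces offer high throughput at low energy cost. The sketch below is a hypothetical Python illustration; the interface names and the throughput and energy figures are invented, and real selection schemes weigh many more factors.

```python
# Hypothetical device selection for cooperative content delivery (CCD):
# rank candidate devices by throughput per unit of energy across their wireless interfaces.
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    throughput_mbps: float
    energy_mw: float          # average power draw while active

@dataclass
class Device:
    name: str
    interfaces: list

def best_interface(device: Device) -> Interface:
    return max(device.interfaces, key=lambda i: i.throughput_mbps / i.energy_mw)

def efficiency(device: Device) -> float:
    i = best_interface(device)
    return i.throughput_mbps / i.energy_mw

def select_relays(devices, k: int):
    """Pick the k devices whose best interface delivers the most throughput per milliwatt."""
    return sorted(devices, key=efficiency, reverse=True)[:k]

group = [
    Device("phone_a", [Interface("wifi", 40.0, 800.0), Interface("lte", 20.0, 1200.0)]),
    Device("phone_b", [Interface("wifi", 25.0, 700.0)]),
    Device("tablet_c", [Interface("wifi", 60.0, 900.0), Interface("bluetooth", 2.0, 100.0)]),
]
print([d.name for d in select_relays(group, k=2)])   # ['tablet_c', 'phone_a'] for these numbers
```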

20.
This paper presents the results of a study into how structured dialogue techniques in IVR systems can be extended to allow more natural speech interaction with the caller. A spoken language system was produced which allows callers to set reminder calls or bar outgoing or incoming telephone calls. In spite of the limits of speech recognition performance, the resulting system had a highly natural speech interface which allowed the caller to optionally offer one or more pieces of information at a time.  相似文献   
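The ability to "offer one or more pieces of information at a time" described above is essentially mixed-initiative slot filling: whatever the caller supplies is absorbed, and the dialogue asks only for what is still missing. The Python sketch below is a hypothetical illustration of that control loop; the slot names and the trivial keyword matching stand in for a real recognizer and grammar.

```python
# Hypothetical mixed-initiative slot filling for a reminder-call service.
# The caller may supply any subset of the required slots in a single utterance.
import re

REQUIRED_SLOTS = ["service", "day", "time"]

def parse_utterance(utterance: str) -> dict:
    """Toy keyword extraction standing in for speech recognition + grammar parsing."""
    slots = {}
    if "reminder" in utterance:
        slots["service"] = "reminder_call"
    day = re.search(r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b", utterance)
    if day:
        slots["day"] = day.group(1)
    time_match = re.search(r"\b(\d{1,2})\s*(am|pm)\b", utterance)
    if time_match:
        slots["time"] = time_match.group(0)
    return slots

def dialogue(utterances):
    filled = {}
    for utterance in utterances:
        filled.update(parse_utterance(utterance.lower()))
        missing = [s for s in REQUIRED_SLOTS if s not in filled]
        if not missing:
            return f"Setting a {filled['service']} for {filled['day']} at {filled['time']}."
        print(f"System: What {missing[0]} would you like?")   # ask only for what is still missing
    return "Dialogue incomplete: " + ", ".join(s for s in REQUIRED_SLOTS if s not in filled)

# The caller volunteers two slots at once, then supplies the remaining one.
print(dialogue(["I'd like a reminder call on friday", "at 7 am"]))
```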
