Similar Literature
20 similar articles found (search time: 31 ms)
1.
Benefiting from knowledge of speech, language, and hearing, a new technology has arisen to serve users of complex information systems. This technology aims for a natural communication environment, capturing the attributes that humans favor in face-to-face exchange. Conversational interaction bears the central burden, with visual and manual signaling simultaneously supplementing the communication process. In addition to instrumenting sensors for each mode, the interface must incorporate context-aware algorithms for fusing and interpreting the multiple sensory channels. The ultimate objective is a reliable estimate of the user's intent, from which actionable responses can be made. Current research therefore addresses multimodal interfaces that can transcend the limitations of the mouse and keyboard. This report describes the early status of multimodal interfaces and identifies emerging opportunities for enhanced usability and naturalness. It concludes by advocating focused research on a frontier issue - the formulation of a quantitative language framework for multimodal communication.
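To illustrate the kind of multi-channel fusion and intent estimation this report calls for (a generic sketch, not a method from the report), consider confidence-weighted late fusion of per-modality intent posteriors; the modality names, weights, and intents below are hypothetical:

```python
# Minimal sketch of late (decision-level) multimodal fusion: each modality
# reports a posterior over candidate user intents, and a confidence-weighted
# log-linear pooling yields the combined estimate. All names and weights
# here are hypothetical.
from collections import defaultdict

def fuse_intents(modality_posteriors, weights):
    """modality_posteriors: {modality: {intent: P(intent | modality)}}."""
    scores = defaultdict(lambda: 1.0)
    for modality, posterior in modality_posteriors.items():
        w = weights.get(modality, 1.0)
        for intent, p in posterior.items():
            scores[intent] *= p ** w  # weight acts as a reliability exponent
    total = sum(scores.values())
    return {intent: s / total for intent, s in scores.items()}

fused = fuse_intents(
    {"speech": {"open_map": 0.7, "zoom_in": 0.3},
     "gesture": {"open_map": 0.4, "zoom_in": 0.6}},
    weights={"speech": 1.0, "gesture": 0.5},
)
print(max(fused, key=fused.get))  # most probable user intent
```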

2.
Oviatt  S. 《Multimedia, IEEE》1996,3(4):26-35
By modeling difficult sources of linguistic variability in speech and language, we can design interfaces that transparently guide human input to match system processing capabilities. Such work will yield more user-centered and robust interfaces for next-generation spoken language and multimodal systems.

3.
Speech-gesture driven multimodal interfaces for crisis management
Emergency response requires strategic assessment of risks, decisions, and communications that are time critical, while requiring teams of individuals to have fast access to large volumes of complex information and technologies that enable tightly coordinated work. Access to this information by crisis management teams in emergency operations centers can be facilitated through various human-computer interfaces. Unfortunately, these interfaces are hard to use, require extensive training, and often impede rather than support teamwork. Dialogue-enabled devices, based on natural, multimodal interfaces, have the potential to make a variety of information technology tools accessible during crisis management. This paper establishes the importance of multimodal interfaces in various aspects of crisis management and explores many issues in realizing successful speech-gesture driven, dialogue-enabled interfaces for crisis management. The paper is organized in five parts. The first part discusses the needs of crisis management that can potentially be met by the development of appropriate interfaces. The second part discusses issues related to the design and development of multimodal interfaces in the context of crisis management. The third part discusses the state of the art in both the theories and practices involving these human-computer interfaces. In particular, it describes the evolution and implementation details of two representative systems, Crisis Management (XISM) and Dialog Assisted Visual Environment for Geoinformation (DAVE_G). The fourth part speculates on the short-term and long-term research directions that will help address the outstanding challenges in interfaces that support dialogue and collaboration. Finally, the fifth part concludes the paper.

4.
Imagine that you're chatting with a good friend who tells you about her new cashmere pullover, and that you're able to touch it and appreciate the soft sensation of the cashmere wool - remotely over the Web in your virtual chatroom! In this article, the authors present their work on haptic sensing of virtual textiles. This is work done in the context of the Haptex research project that aims at integrating the human sense of touch into multimodal user interfaces. Being able to support haptic perception in the user interface will be a great step toward next-generation immersive multimedia experiences.

5.
The near future promises significant advances in communication capabilities, but one of the keys to success is how well people understand the value and usage of those capabilities. In considering the role of the user in the wireless world of the future, the Human Perspective Working Group (WG1) of the Wireless World Research Forum has gathered input and developed positions in four important areas: methods, processes, and best practices for user-centered research and design; reference frameworks for modeling user needs within the context of wireless systems; user scenario creation and analysis; and user interaction technologies. This article provides an overview of WG1's work in these areas, which are critical to ensuring that the future wireless world meets and exceeds people's expectations in the coming decades.

6.
Virtual reality interfaces can immerse users into virtual environments from an impressive array of application fields, including entertainment, education, design, and navigation. However, history teaches us that no matter how rich the content is from these applications, it remains out of reach for users without a physical way to interact with it. Multimodal interfaces give users a way to interact with the virtual environment (VE) using more than one complementary modality. Masterpiece (which is short for multimodal authoring tool with similar technologies from European research utilizing a physical interface in an enhanced collaborative environment) is a platform for a multimodal natural interface. We integrated Masterpiece into a new authoring tool for designers and engineers that uses 3D search capabilities to access original database content, supporting natural human-computer interaction.

7.
Gesture-based interfaces have become one of the fundamental technologies that can determine the success of new computer games. In fact, computer games today offer interaction paradigms that go well beyond the use of remote controls, letting players directly perform exchanges with the objects and characters that compose the virtual worlds displayed in front of them. To support such exchanges, new algorithms and technologies have been devised, including advanced visual recognition schemes, new video cameras, and accelerometer sensors. At the same time, other important trends are quietly emerging in the same domain: game designers are slowly shifting their attention beyond the homes of gaming fanatics, broadening their interest to computer games that can be played in public spaces such as exhibitions and museums. However, to the best of our knowledge, only a very limited number of research efforts have addressed the problem of producing computer games based on gesture-based interfaces that suit such settings. Hence, in this paper we address the problem of differentiating the design of a gesture-based interface for a console from that of designing one for a public space. Moreover, we show that within a public space it is possible to narrow down the vision algorithms needed to support the recognition of complex actions while relying solely on a simple webcam. In particular, we describe the design and implementation of an interface suited to public immersive scenarios: it is based on a simple and efficient set of algorithms which, combined with knowledge of the context in which a game is played, leads to fast and robust interpretation of hand gestures. To demonstrate this last aspect, we report on the results obtained from the deployment of a computer game we developed specifically for public spaces, termed Tortellino X-Perience, which was enjoyed by hundreds of visitors at the 2010 Shanghai World Expo.
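As a rough illustration of how context can simplify webcam-only gesture recognition (a generic sketch, not the Tortellino X-Perience code), the classifier below matches a hand-centroid trajectory only against the gestures valid in the current game state; the states, gesture names, and thresholds are hypothetical:

```python
# Toy sketch of context-constrained gesture interpretation: the hand-centroid
# trajectory is matched only against gestures that make sense in the current
# game state, keeping the vision side simple enough for a plain webcam.
import numpy as np

VALID_GESTURES = {            # game state -> gestures worth considering
    "menu": {"swipe_left", "swipe_right"},
    "gameplay": {"swipe_left", "swipe_right", "push"},
}

def classify(trajectory, state, min_travel=40.0):
    """trajectory: (N, 2) array of hand-centroid pixel positions over time."""
    dx, dy = trajectory[-1] - trajectory[0]
    candidates = VALID_GESTURES[state]
    if abs(dx) > min_travel and abs(dx) > abs(dy):
        guess = "swipe_right" if dx > 0 else "swipe_left"
    elif -dy > min_travel:        # image y grows downward; upward motion
        guess = "push"
    else:
        return None
    return guess if guess in candidates else None

track = np.array([[100, 200], [140, 198], [190, 205]], dtype=float)
print(classify(track, "menu"))   # -> swipe_right
```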

8.
We document the rationale and design of a multimodal interface to a pervasive/ubiquitous computing system that supports independent living by older people in their own homes. The Millennium Home system involves fitting a resident's home with sensors; these sensors can be used to trigger sequences of interaction with the resident to warn them about dangerous events, or to check whether they need external help. We draw lessons from the design process and conclude the paper with implications for the design of multimodal interfaces to ubiquitous systems developed for the elderly and for healthcare, as well as for more general ubiquitous computing applications.
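A minimal sketch of the kind of escalating interaction sequence described above (not the Millennium Home implementation; the channels, timeouts, and messages are hypothetical):

```python
# A sensor event triggers a prompt to the resident; silence escalates
# through louder modalities toward an external call for help.
ESCALATION = [("speech_prompt", 30), ("alarm_tone", 30), ("call_caregiver", 0)]

def respond_to_event(event, resident_responded):
    """resident_responded: callable(timeout_s) -> bool, e.g. a button press."""
    print(f"sensor event: {event}")
    for channel, timeout_s in ESCALATION:
        print(f"-> {channel}")
        if channel == "call_caregiver":
            return "external help requested"
        if resident_responded(timeout_s):
            return "resident OK"
    return "unreachable"

# Simulated resident who answers the second, louder prompt.
answers = iter([False, True])
print(respond_to_event("no motion for 2h", lambda t: next(answers)))
```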

9.
Object-oriented database systems (OODBS), which aim at meeting the data modeling, performance, cooperative design, and version management requirements of next-generation applications such as CAD, CAM, CASE, hypermedia, and expert systems, are discussed. Key features of OODBS are presented, a taxonomy of approaches is provided, and architectural and implementation issues, design alternatives, and tradeoffs are examined. A variety of OODB systems, both research prototypes and commercial systems, are summarized. Industry efforts to accelerate a consensus that can lead to OODB standards are discussed.

10.
Integration of different wireless radio cellular technologies is emerging as an effective approach to accommodate the increasing demand for next-generation multimedia-based applications. In such systems, user roaming among different technologies, commonly known as vertical handoff, will significantly affect different aspects of network design and planning due to the characteristically wide-ranging diversity in access technologies and supported applications. Hence, the development of new mobility models that accurately depict vertical mobility is crucial for studying different design problems in these heterogeneous systems. This article presents a generic framework for mobility modeling and performance analysis of integrated heterogeneous networks using phase-type distributions. This framework realizes all modeling requirements of next-generation user mobility, including accuracy, analytical tractability, and accommodation of the correlation between residence times within different access technologies. Additionally, we present general guidelines to evaluate application performance based on the new mobility models introduced in this article. We show the accuracy of our modeling approach through simulation and analysis for different applications.
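The phase-type residence times mentioned above can be made concrete with a small simulation: a phase-type distribution is simply the time to absorption of a finite continuous-time Markov chain. The sketch below is illustrative only; the sub-generator and initial vector are made up, not taken from the article:

```python
# Sample a residence time from a phase-type distribution by simulating a
# small continuous-time Markov chain until it hits the absorbing state.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([0.6, 0.4])                 # initial phase probabilities
S = np.array([[-3.0, 1.0],                   # transient sub-generator
              [0.5, -2.0]])
exit_rates = -S.sum(axis=1)                  # rates into the absorbing state

def sample_residence_time():
    phase = rng.choice(len(alpha), p=alpha)
    t = 0.0
    while True:
        rate = -S[phase, phase]
        t += rng.exponential(1.0 / rate)     # holding time in this phase
        # choose the next destination: another phase or absorption
        probs = np.append(np.maximum(S[phase], 0.0), exit_rates[phase]) / rate
        nxt = rng.choice(len(alpha) + 1, p=probs)
        if nxt == len(alpha):                # absorbed: residence time is over
            return t
        phase = nxt

samples = [sample_residence_time() for _ in range(10_000)]
print(f"simulated mean residence time: {np.mean(samples):.3f}")
# analytic mean of a phase-type distribution: alpha @ inv(-S) @ 1
print(f"analytic mean:                 {alpha @ np.linalg.inv(-S) @ np.ones(2):.3f}")
```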

11.
The next-generation convergent microsystems, based on system-on-package (SOP) technology, require up-front system-level design-for-reliability approaches and appropriate reliability assessment methodologies to guarantee the reliability of digital, optical, and radio frequency (RF) functions, as well as their interfaces. A systems approach to reliability requires the development of: i) physics-based reliability models for the various failure mechanisms associated with digital, optical, and RF functions and their interfaces in the system; ii) design optimization models for the selection of suitable materials and processing conditions for reliability as well as functionality; and iii) system-level reliability models capturing component and functional interaction. This paper presents the reliability assessment of digital, optical, and RF functions in SOP-based microsystems. Up-front physics-based design-for-reliability models for various functional failure mechanisms are presented to evaluate design options and material selection even before prototypes are made. Advanced modeling methodologies and algorithms that accommodate material length-scale effects due to enhanced system integration and miniaturization are presented. System-level mixed-signal reliability is discussed through system-level reliability metrics relating component-level failure mechanisms to system-level signal integrity, as well as statistical aspects.

12.
In this paper, we describe our recent work at Microsoft Research, in the project codenamed Dr. Who, aimed at the development of enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present in detail MiPad, the first Dr. Who application, which specifically addresses the mobile user interaction scenario. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution to the prevailing problem of pecking with tiny styluses or typing on minuscule keyboards in today's PDAs and smart phones. Despite its currently incomplete implementation, the user study reported in this paper shows that speech and pen have the potential to significantly improve the user experience. In this system-oriented paper, we describe the main components of MiPad, with a focus on the robust speech processing and spoken language understanding aspects. The MiPad components discussed in detail include: distributed speech recognition considerations for the speech processing algorithm design; a stereo-based speech feature enhancement algorithm used for noise-robust front-end speech processing; Aurora2 evaluation results for this front-end processing; speech feature compression (source coding) and error protection (channel coding) for distributed speech recognition in MiPad; HMM-based acoustic modeling for continuous speech recognition decoding; a unified language model integrating a context-free grammar with an N-gram model for speech decoding; schema-based knowledge representation for MiPad's personal information management task; a unified statistical framework that integrates speech recognition, spoken language understanding, and dialogue management; the robust natural language parser used in MiPad to process the speech recognizer's output; machine-aided grammar learning and development used for spoken language understanding in the MiPad task; Tap & Talk multimodal interaction and user interface design; back-channel communication and MiPad's error repair strategy; and finally, user study results demonstrating the superior throughput achieved by Tap & Talk multimodal interaction over the existing pen-only PDA interface. These user study results highlight the crucial role played by speech in enhancing the overall user experience in MiPad-like human-computer interaction devices.
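To make the unified language model integrating a context-free grammar with an N-gram model more concrete, here is a toy sketch of one common way such integration is done: linear interpolation of a class-grammar probability with a bigram estimate. It is not the actual MiPad model; the grammar, probabilities, and interpolation weight are invented for illustration.

```python
# Toy unified language model: a class-based grammar probability is linearly
# interpolated with a bigram estimate. Grammar, counts, and LAM are made up.
GRAMMAR = {"contact": {"alice", "bob"}}      # CFG-like class -> terminals
BIGRAM = {("call", "alice"): 0.10, ("call", "bob"): 0.05,
          ("call", "home"): 0.02}
P_CLASS = {"contact": 0.3}                   # P(class | previous word "call")
LAM = 0.5                                    # interpolation weight

def p_next(prev, word):
    p_cfg = 0.0
    for cls, terminals in GRAMMAR.items():
        if word in terminals:                # uniform within the class
            p_cfg += P_CLASS[cls] / len(terminals)
    p_ng = BIGRAM.get((prev, word), 1e-4)    # floor for unseen bigrams
    return LAM * p_cfg + (1 - LAM) * p_ng

for w in ("alice", "bob", "home"):
    print(w, round(p_next("call", w), 4))
```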

13.
张文  张毅  满毅  陈晓峰 《电信科学》2011,27(8):117-121
This paper proposes an integrated sharing platform that exposes a set of basic, extensible interfaces in order to accommodate application systems with heterogeneous interfaces. Built on this platform, multiple systems can invoke each other's services, improving service management while exchanging information without barriers; anomalies in inter-system interactions are detected and analyzed promptly, enabling intelligent maintenance management. When a new interface protocol appears, only one new protocol adapter needs to be developed for the platform to bring the new interface online, which raises development efficiency and lowers development and maintenance costs. The platform replaces the traditional point-to-point application pattern with a many-to-many bus pattern, enabling simple and convenient information exchange among systems, optimizing the inter-system architecture, easing management and maintenance, and reducing the interface rework needed to connect new systems.
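A minimal Python sketch of the adapter-plus-bus idea this abstract describes (an illustration based on the abstract alone, not the paper's implementation); the protocol names, message format, and class names are hypothetical:

```python
# The bus speaks one internal message format; adding a new interface
# protocol only requires one new adapter class, and routing is many-to-many
# through the single hub rather than point-to-point.
from abc import ABC, abstractmethod

class ProtocolAdapter(ABC):
    @abstractmethod
    def to_internal(self, raw): ...
    @abstractmethod
    def from_internal(self, msg): ...

class SoapAdapter(ProtocolAdapter):
    def to_internal(self, raw):
        return {"body": raw.replace("<soap>", "").replace("</soap>", "")}
    def from_internal(self, msg):
        return f"<soap>{msg['body']}</soap>"

class ServiceBus:
    def __init__(self):
        self.adapters = {}
    def register(self, name, adapter):
        self.adapters[name] = adapter
    def route(self, src, dst, raw):
        msg = self.adapters[src].to_internal(raw)   # normalize at the hub
        return self.adapters[dst].from_internal(msg)

bus = ServiceBus()
bus.register("billing", SoapAdapter())
bus.register("crm", SoapAdapter())
print(bus.route("billing", "crm", "<soap>hello</soap>"))
```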

14.
Kelly  P. Moezzi  S. 《Multimedia, IEEE》1995,2(1):94-99
It would be difficult to overestimate the importance of visual information in current computer systems. Visual computing, which embraces processing, interpreting, modeling, assimilating, storing, retrieving, and synthesizing visual information, now plays a crucial role in many fields. These include multimedia, virtual reality, robotics, scientific visualization, and communications systems. And the demand for further integration of visual information into these areas shows every sign of continuing unabated. Under the direction of Ramesh Jain, the Visual Computing Laboratory at the University of California, San Diego, was established as a center for innovative visual computing research to address the requirements of these applications in next-generation computer technologies. As such, the Visual Computing Lab hosts a group of researchers working in a variety of areas, notably multimedia databases, information assimilation, interactive video, and visual interaction through gesture recognition. This article presents a high-level overview of activities in the Visual Computing Laboratory and provides some details on prototype systems that we are currently developing

15.
In this paper, we describe some abstract features of human/machine interaction systems that are required for the production of intelligent behaviour. We introduce a subset of intelligent systems called human-centered intelligent systems (HCIS) and argue that such systems must be autonomous, robust and adaptive in order to be intelligent. We also propose soft computing as a promising new technique for building HCIS, and present examples where this is already being done. The paper defines flexibility as a combination of the often-conflicting requirements of robustness and adaptability, and on this basis we claim that the right balance between these two features is necessary to achieve intelligent behaviour. We describe the intelligent assistant (IA) system and its various components, which automatically perform helpful tasks that enable the user to improve productivity. These tasks include time, information and communication management. Time management involves planning and scheduling, decision making and learning user habits. Information management involves information seeking and filtering, information fusion, decision making and learning user preferences. Communication management involves recognising user behaviour and learning user priorities. All these tasks depend on many factors, including the type of activity, its originator, the mood of the user, past experience, and the priority of the task. The IA uses a multimodal interface in which conventional interfaces such as keyboard and mouse are enhanced with vision, speech and natural language processing. Including such extra modalities extends the capabilities of existing systems at the cost of extra complexity. The IA is 'smart' because it has knowledge about tasks and the capability to learn and adapt through new interactions with its user and with other systems.

16.
Toward an affect-sensitive multimodal human-computer interaction
The ability to recognize affective states of a person we are communicating with is the core of emotional intelligence. Emotional intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for successful interpersonal social interaction. This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence - the ability to recognize a user's affective states - in order to become more human-like, more effective, and more efficient. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions). In a face-to-face interaction, humans detect and interpret those interactive signals of their communicator with little or no effort. Yet design and development of an automated system that accomplishes these tasks is rather difficult. This paper surveys the past work in solving these problems by a computer and provides a set of recommendations for developing the first part of an intelligent multimodal HCI - an automatic personalized analyzer of a user's nonverbal affective feedback.
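Purely as an illustration of decision-level fusion of nonverbal cues (the paper is a survey and does not prescribe this algorithm), a toy sketch that combines per-cue affect scores from face and voice; the states, cue names, and weights are hypothetical:

```python
# Fuse per-cue affect scores from face and voice into one state estimate
# by a simple confidence-weighted sum over a fixed set of affective states.
STATES = ("neutral", "frustrated", "pleased")

def fuse_affect(face_scores, voice_scores, w_face=0.6, w_voice=0.4):
    combined = {s: w_face * face_scores[s] + w_voice * voice_scores[s]
                for s in STATES}
    return max(combined, key=combined.get), combined

state, scores = fuse_affect(
    face_scores={"neutral": 0.2, "frustrated": 0.7, "pleased": 0.1},
    voice_scores={"neutral": 0.5, "frustrated": 0.4, "pleased": 0.1},
)
print(state, scores)   # -> frustrated, with the fused score breakdown
```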

17.
ATM has rapidly transitioned from a standards and prototyping concept to become the next-generation switching technology used in products available on the market. With the rapid introduction of ATM switches into networks, there is an urgent need to manage them. The article discusses the telecommunication management network (TMN) interfaces being defined for management systems to communicate with ATM network elements (NEs) and other management systems. ATM management systems will have to communicate with ATM NEs in their jurisdiction using TMN interfaces. Networks will usually contain equipment from different suppliers. Thus, it is vital that there be standard management interfaces so that these different NEs can be managed. Some standard interfaces for ATM networks are defined, while others are being defined. The status of these interfaces is reviewed in the article. Communication between different networks is also needed, both between public networks and between public and private networks. Management personnel of one network need to exchange information with other networks for certain functions (e.g., initial service provisioning), and so management systems of different networks will exchange information through a combination of mechanized and manual interfaces. The status of these interfaces is also reviewed in the article

18.
We describe our work on haptic holography, a combination of computational modeling and multimodal spatial display, which allows a person to see, feel, and interact with three-dimensional freestanding holographic images of material surfaces. In this paper, we combine various holographic displays with a force-feedback device to render multimodal images with programmatically prescribed material properties and behavior. After a brief overview of related work which situates visual display within the manual workspace, we describe our holo-haptic approach and survey three implementations, Touch, Lathe, and Poke, each named for the primitive functional affordance it offers. In Touch, static holographic images of simple geometric scenes are reconstructed in front of the hologram plane, and coregistered with a force model of the same geometry. These images can be visually inspected and haptically explored using a handheld interface. In Lathe, a holo-haptic image can be reshaped by haptic interaction in a dynamic but constrained manner. Finally in Poke, using a new technique for updating interference-modeled holographic fringe patterns, we render a holo-haptic image that permits more flexible interactive reshaping of its reconstructed surface. We situate this work within the context of related research and describe the strengths, shortcomings, and implications of our approach.
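A minimal sketch of penalty-based force rendering, a standard technique for coregistering a force model with static geometry as in the Touch scenario; the sphere geometry and stiffness below are hypothetical and not the paper's force model:

```python
# When the force-feedback probe penetrates the surface geometry, push back
# along the surface normal in proportion to penetration depth (Hooke's law).
import numpy as np

CENTER, RADIUS, STIFFNESS = np.zeros(3), 0.05, 800.0   # 5 cm sphere, N/m

def haptic_force(probe_pos):
    offset = probe_pos - CENTER
    dist = max(np.linalg.norm(offset), 1e-9)  # guard against division by zero
    depth = RADIUS - dist
    if depth <= 0.0:                  # probe outside the surface: no force
        return np.zeros(3)
    normal = offset / dist            # outward surface normal
    return STIFFNESS * depth * normal # penalty force pushing the probe out

print(haptic_force(np.array([0.04, 0.0, 0.0])))  # inside -> push along +x
```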

19.
CRIGOS: a compact robot for image-guided orthopedic surgery
The CRIGOS (compact robot for image-guided orthopedic surgery) project was set up for the development of a compact surgical robot system for image-guided orthopedic surgery based on user requirements. The modular system comprises a compact parallel robot and a software system for planning of surgical interventions and for supervision of the robotic device. Because it is not sufficient to consider only technical aspects in order to improve clinical routines and the therapeutic outcome of conventional interventions, a user-centered and task-oriented design process has been developed which also takes human factors into account. The design process for the CRIGOS system started from a requirement analysis of various orthopedic interventions, using information gathered from literature, questionnaires, and workshops with domain experts. This resulted in identification of conventional interventions for which the robotic system would improve the medical and procedural quality. A system design concept has been elaborated which includes definitions of components, functionalities, and interfaces. Approaches to the acquisition of calibrated X-rays are presented in the paper, together with the design and evaluation of a first human-computer interface. Finally, the first lab-type parallel robot, based on low-cost standard components, is presented together with first evaluation results concerning positioning accuracy

20.
Current trends in microprocessor design integrate several autonomous processing cores onto the same die. These multicore architectures are particularly well-suited for computer vision applications, where it is typical to perform the same set of operations repeatedly over large datasets. These memory- and computation-intensive applications can reap tremendous performance and accuracy benefits from concurrent execution on multi-core processors. However, cost-sensitive embedded platforms place real-time performance and efficiency demands on techniques to accomplish this task. Furthermore, parallelization and partitioning techniques that allow the application to fully leverage the processing capabilities of each computing core are required for multi-core embedded vision systems. In this paper, we evaluate background modeling techniques on a multicore embedded platform, since this process dominates the execution and storage costs of common video analysis workloads. We introduce a new adaptive backgrounding technique, multimodal mean, which balances accuracy, performance, and efficiency to meet embedded system requirements. Our evaluation compares several pixel-level background modeling techniques in terms of their computation and storage requirements, and functional accuracy for three representative video sequences, across a range of processing and parallelization configurations. We show that the multimodal mean algorithm delivers accuracy comparable to the best alternative (Mixture of Gaussians) with a 3.4× improvement in execution time and a 50% reduction in required storage for optimal block processing on each core. In our analysis of several processing and parallelization configurations, we show how this algorithm can be optimized for embedded multicore performance, resulting in a 25% performance improvement over the baseline processing method.
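A simplified per-pixel sketch in the spirit of the multimodal mean technique as far as the abstract describes it: keep a few running means per pixel, match incoming values to the nearest mode, and treat unmatched or weakly supported values as foreground. The mode count, thresholds, learning rate, and eviction policy are illustrative, not the paper's exact parameters:

```python
# Per-pixel background model with K running-mean modes: match the incoming
# value to a nearby mode and update it, or start a new mode, evicting the
# least-supported one when the table is full.
K, MATCH_THRESH = 3, 15.0

class PixelModel:
    def __init__(self):
        self.modes = []                     # list of [mean, weight]

    def observe(self, value):
        """Return True if value is foreground at this pixel."""
        for mode in self.modes:
            if abs(value - mode[0]) < MATCH_THRESH:
                mode[0] += 0.05 * (value - mode[0])   # running-mean update
                mode[1] += 1
                return mode[1] < 10         # young modes still count as fg
        new_mode = [float(value), 1]
        if len(self.modes) < K:
            self.modes.append(new_mode)
        else:                               # evict the least-supported mode
            weakest = min(range(K), key=lambda i: self.modes[i][1])
            self.modes[weakest] = new_mode
        return True

px = PixelModel()
for v in [100] * 20 + [180, 181, 101]:
    fg = px.observe(v)
print(px.modes, fg)   # stable background mode near 100; last pixel is bg
```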
