Similar Literature
20 similar documents found
1.
This article describes a multimodal command language for home robot users, and a robot system which interprets users' messages in the language through microphones, visual and tactile sensors, and control buttons. The command language comprises a set of grammar rules, a lexicon, and nonverbal events detected in hand gestures, in readings of tactile sensors attached to the robots, and in buttons on the controllers in the users' hands. Prototype humanoid systems which immediately execute commands in the language are also presented, along with preliminary experiments on face-to-face interaction and teleoperation. Subjects unfamiliar with the language were able to command the humanoids and complete their tasks with only brief documents at hand, given a short demonstration beforehand. The command understanding system, running on PCs, responded to multimodal commands without significant delay. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

2.
This article describes a user study of a life-supporting humanoid directed in a multimodal language, and discusses the results. Twenty inexperienced users commanded the humanoid in a computer-simulated remote home environment, giving commands in the multimodal language by pressing keypad buttons and speaking to the robot. The results show that they comprehended the language well and were able to give commands successfully. They often chose a button press instead of a verbal phrase to specify a direction, speed, length, angle, or temperature value, and preferred multimodal commands to spoken commands. However, they did not find it very easy to give commands in the language. This article discusses the results and points out both the strong and weak points of the language and our robot.

3.
Commanding a humanoid to move objects in a multimodal language
This article describes a study of a humanoid robot that moves objects at the request of its users. The robot understands commands in a multimodal language which combines spoken messages and two types of hand gesture. All ten novice users directed the robot using gestures when asked to spontaneously direct it to move objects after learning the language for a short period of time. The success rate of multimodal commands was over 90%, and the users completed their tasks without trouble. They thought that gestures were preferable to, and as easy as, verbal phrases for informing the robot of action parameters such as direction, angle, step, width, and height. The results of the study show that the language is fairly easy for nonexperts to learn, and can be made more effective for directing humanoids to move objects by making the language more sophisticated and by improving our gesture detector.

4.
Assistance is currently a pivotal research area in robotics, with huge societal potential. Since assistant robots interact directly with people, finding natural and easy-to-use user interfaces is of fundamental importance. This paper describes a flexible multimodal interface based on speech and gesture modalities for controlling our mobile robot, Jido. The vision system uses a stereo head mounted on a pan-tilt unit and a bank of collaborative particle filters, devoted to the upper-body extremities, to track and recognize pointing and symbolic gestures, both single- and bi-manual. This framework constitutes our first contribution: it is shown to handle natural artifacts (self-occlusion, hands leaving the camera's field of view, hand deformation) when 3D gestures are performed with either hand or with both. A speech recognition and understanding system based on the Julius engine is also developed and embedded in order to process deictic and anaphoric utterances. The second contribution is a probabilistic, multi-hypothesis interpreter framework that fuses results from the speech and gesture components. This interpreter is shown to improve the classification rate of multimodal commands compared with using either modality alone. Finally, we report on successful live experiments in human-centered settings. Results are reported in the context of an interactive manipulation task, where users specify local motion commands to Jido and perform safe object exchanges.
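The fusion idea described in this abstract — combining speech and gesture hypotheses probabilistically rather than trusting either channel alone — can be illustrated with a minimal sketch. The hypothesis sets, confidence values, and compatibility test below are invented for illustration and are not the paper's interpreter.

```python
# Minimal sketch: confidence-weighted fusion of speech and gesture hypotheses.
# Each channel proposes interpretations with a confidence in [0, 1]; the fused
# result is the best compatible pair by joint score.
def fuse(speech_hyps, gesture_hyps, compatible):
    # speech_hyps, gesture_hyps: dicts mapping an interpretation to a confidence
    best, best_score = None, 0.0
    for s, p_s in speech_hyps.items():
        for g, p_g in gesture_hyps.items():
            if compatible(s, g) and p_s * p_g > best_score:
                best, best_score = (s, g), p_s * p_g
    return best, best_score

# Hypothetical example: a deictic utterance resolved against a pointing gesture.
best, score = fuse({"take this": 0.8, "take these": 0.2},
                   {"points at red box": 0.7, "points at blue box": 0.3},
                   compatible=lambda s, g: True)
```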

5.
Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more heavily than other contacts during that time. In this paper, we describe and assess statistical models, learned from a large population of users, for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best-performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization significantly increased the relative reduction in error rate by an additional 5%.
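The kind of personalization this abstract describes — reweighting a population-level command model with online-updated user history — might look like the following sketch. The class, command names, and scoring scheme are assumptions for illustration, not the paper's models.

```python
# Illustrative sketch of a personalized command prior: a population prior is
# blended with per-user counts that are updated online, and the result is used
# to re-rank recognizer hypotheses.
from collections import defaultdict

class PersonalizedCommandPrior:
    def __init__(self, population_prior, learning_rate=0.1):
        self.population_prior = population_prior   # {command: probability}
        self.user_counts = defaultdict(float)      # per-user usage counts
        self.learning_rate = learning_rate         # weight given to personal history

    def probability(self, command):
        total = sum(self.user_counts.values())
        user_prob = self.user_counts[command] / total if total else 0.0
        lam = self.learning_rate
        return (1 - lam) * self.population_prior.get(command, 1e-6) + lam * user_prob

    def update(self, command):
        # Online update after each confirmed user command.
        self.user_counts[command] += 1.0

    def rerank(self, hypotheses):
        # hypotheses: list of (command, acoustic_score) pairs from the recognizer.
        return max(hypotheses, key=lambda h: h[1] * self.probability(h[0]))

# Hypothetical usage with invented commands and scores.
prior = PersonalizedCommandPrior({"call spouse": 0.2, "call office": 0.3, "play music": 0.5})
prior.update("call spouse")
best = prior.rerank([("call spouse", 0.40), ("call office", 0.45)])
```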

6.
This study presents a user interface that was intentionally designed to support multimodal interaction by compensating for the weaknesses of speech relative to pen input and vice versa. The test application was email on a web pad with pen and speech input. For pen input, information was represented as easily accessible visual objects, and graphical metaphors were used to enable faster and easier manipulation of data. Speech input was facilitated by displaying the system's speech vocabulary to the user: all commands and accessible fields with text labels could be spoken by name, and the commands and objects currently accessible via speech were shown dynamically in a window. Multimodal interaction was further enhanced by a flexible object-action order, so that the user could utter or select a command with the pen followed by the object to be acted upon, or the other way round (e.g., New Message or Message New). The flexible action-object interaction design combined with voice and pen input led to eight possible action-object-modality combinations. The complexity of the multimodal interface was further reduced by making generic commands such as New applicable across corresponding objects; this simplified the menu structures by reducing the number of places in which actions appeared, so that more content information could be made visible and consistently accessible via pen and speech input. Results of a controlled experiment indicated that, among the eight possible input conditions, the shortest task completion times occurred when speech alone was used to refer to an object followed by the action to be performed; speech-only input with action-object order was also relatively fast. With pen input only, the shortest task completion times were found when an object was selected first, followed by the action to be performed. In multimodal trials in which both pen and speech were used, no significant effect was found for object-action order, suggesting benefits of providing users with a flexible action-object interaction style in multimodal or speech-only systems.
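One simple way to support the flexible action-object order this study describes is to resolve a command from an unordered pair of tokens. The command and object vocabularies below are invented stand-ins, not the study's interface.

```python
# Toy sketch of order-flexible command parsing: "New Message" and "Message New"
# resolve to the same (action, object) pair, so generic commands apply across objects.
ACTIONS = {"new", "delete", "open"}
OBJECTS = {"message", "contact", "folder"}

def parse_command(tokens):
    action = next((t for t in tokens if t in ACTIONS), None)
    obj = next((t for t in tokens if t in OBJECTS), None)
    if action and obj:
        return (action, obj)
    return None  # incomplete command; wait for the missing part

# Either order yields the same interpretation.
assert parse_command(["new", "message"]) == parse_command(["message", "new"])
```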

7.
This article proposes a multimodal language for communicating with life-supporting robots through a touch screen and a speech interface. The language is designed for untrained users who need support in their daily lives from cost-effective robots. In this language, users can combine spoken and pointing messages interactively in order to convey their intentions to the robots. Spoken messages include verb and noun phrases which describe intentions. Pointing messages are given when the user's finger touches a camera image, a picture of the robot's body, or a button on a touch screen at hand; they convey a location in the environment, a direction, a body part of the robot, a cue, a reply to a query, or other information that helps the robot. This work presents the philosophy and structure of the language.
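A command in such a language pairs a spoken phrase with any pointing events received during the utterance. The data structure below is a hypothetical sketch of that pairing; the field names and target types are assumptions, not the paper's specification.

```python
# Illustrative data structure: a multimodal command combining a spoken phrase
# with pointing messages captured from the touch screen.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PointingMessage:
    target: str                # e.g. "camera_image", "robot_picture", or "button"
    position: Tuple[int, int]  # touch coordinates on the screen

@dataclass
class MultimodalCommand:
    spoken_phrase: str
    pointing: List[PointingMessage] = field(default_factory=list)

# Hypothetical example: a deictic request grounded by a touch on the camera image.
cmd = MultimodalCommand("put the cup there",
                        [PointingMessage("camera_image", (312, 188))])
```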

8.
Advanced Robotics, 2013, 27(3–4): 293–328
This paper presents a method of controlling robot manipulators with fuzzy voice commands. Recently, there has been some research on controlling robots using information-rich fuzzy voice commands such as 'go little slowly' and on learning from such commands. However, the scope of those works was limited to basic fuzzy voice motion commands. In this paper, we introduce a method of controlling the posture of a manipulator using complex fuzzy voice commands. A complex fuzzy voice command is composed of a set of fuzzy voice joint commands. Complex fuzzy voice commands can be used for complicated maneuvering of a manipulator, while fuzzy voice joint commands affect only a single joint. Once joint commands are learned, any complex command can be learned as a combination of some or all of them, so that, using the learned complex commands, a human user can control the manipulator in a complicated manner with natural language commands. Learning of complex commands is discussed in the framework of the fuzzy coach–player model. The proposed idea is demonstrated with a PA-10 redundant manipulator.
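The composition idea — a complex command stored as a set of single-joint fuzzy commands — can be sketched as follows. The fuzzy gains, joint indices, and nominal step size are invented for illustration and are not taken from the paper.

```python
# Minimal sketch: fuzzy modifiers scale joint-level motions, and a learned
# complex command is stored as a combination of joint commands.
FUZZY_GAIN = {"very little": 0.2, "little": 0.5, "": 1.0, "much": 1.5}

def joint_command(joint, direction, modifier=""):
    """A fuzzy voice joint command such as 'move joint 2 up a little'."""
    step = 10.0 * FUZZY_GAIN[modifier]  # nominal step of 10 degrees (assumed)
    return {joint: step if direction == "up" else -step}

def complex_command(*joint_commands):
    """A complex command applies a set of joint commands together."""
    delta = {}
    for cmd in joint_commands:
        for joint, step in cmd.items():
            delta[joint] = delta.get(joint, 0.0) + step
    return delta

# Hypothetical example: 'lean forward slightly' learned from two joint commands.
lean_forward = complex_command(joint_command(2, "down", "little"),
                               joint_command(4, "up", "little"))
```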

9.
10.
In order for robots to effectively understand natural language commands, they must be able to acquire meaning representations that can be mapped to perceptual features in the external world. Previous approaches to learning these grounded meaning representations require detailed annotations at training time. In this paper, we present an approach to grounded language acquisition that jointly learns a policy for following natural language commands such as "Pick up the tire pallet," as well as a mapping between specific phrases in the language and aspects of the external world; for example, the mapping between the words "the tire pallet" and a specific object in the environment. Our approach assumes a parametric form, factored according to the structure of the language, for the policy that the robot uses to choose actions in response to a natural language command. We use a gradient method to optimize the model parameters. Our evaluation demonstrates the effectiveness of the model on a corpus of commands given to a robotic forklift by untrained users.
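A schematic version of this kind of training — a log-linear policy over candidate groundings, optimized by gradient ascent on the likelihood of demonstrated choices — is sketched below. The features, candidates, and factorization are simplified stand-ins, not the authors' model or data.

```python
# Sketch: log-linear policy over candidate groundings/actions for a command,
# trained by gradient ascent on the log-likelihood of the demonstrated choice.
import numpy as np

def policy(theta, features):
    # features: (num_candidates, num_features) matrix for one command.
    s = features @ theta
    e = np.exp(s - s.max())          # softmax over candidates
    return e / e.sum()

def gradient_step(theta, features, chosen, lr=0.1):
    p = policy(theta, features)
    # Gradient of log p(chosen): observed features minus expected features.
    grad = features[chosen] - p @ features
    return theta + lr * grad

# Hypothetical training example for "Pick up the tire pallet":
# candidate 0 is the correct object grounding.
theta = np.zeros(4)
demo_features = np.array([[1.0, 0.0, 0.3, 0.0],   # tire pallet
                          [0.0, 1.0, 0.1, 0.0],   # box pallet
                          [0.0, 0.0, 0.0, 1.0]])  # unrelated object
for _ in range(50):
    theta = gradient_step(theta, demo_features, chosen=0)
```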

11.
In this paper, a voice-activated robot arm with intelligence is presented. The robot arm is controlled with natural connected-speech input. The language input allows a user to interact with the robot in terms which are familiar to most people. The advantages of speech-activated robots are hands-free and fast data input. The proposed robot is capable of understanding the meaning of natural language commands. After interpreting the voice commands, a series of control data for performing a task is generated, and the robot then performs the task. Artificial intelligence techniques are used to make the robot understand voice commands and act in the desired mode. It is also possible to control the robot using keyboard input.

12.
In the near future, people are likely to engage with smart devices by instructing them in natural language. A fundamental question is how intelligent agents might interpret such instructions and learn new tasks. In this article we present the first speech-based virtual assistant that can be taught new commands by speech. A user study on our agent showed that people can teach it new commands. We also show that people see great advantage in using an instructable agent, and we identify what users believe are the most important use cases for such an agent.

13.
Show me: automatic presentation for visual analysis
This paper describes Show Me, an integrated set of user interface commands and defaults that incorporate automatic presentation into a commercial visual analysis system called Tableau. A key aspect of Tableau is VizQL, a language for specifying views, which is used by Show Me to extend automatic presentation to the generation of tables of views (commonly called small multiple displays). A key research issue for the commercial application of automatic presentation is the user experience, which must support the flow of visual analysis. User experience has not been the focus of previous research on automatic presentation. The Show Me user experience includes the automatic selection of mark types, a command to add a single field to a view, and a pair of commands to build views for multiple fields. Although the use of these defaults and commands is optional, user interface logs indicate that Show Me is used by commercial users.

14.
A novel multimodal human-computer interaction mode for assisting elderly users in smart homes is proposed. An avatar-based smart-home interaction prototype system was built, integrating speech processing and gaze tracking to realize dual-channel (visual and auditory) interaction; a rule-based task-reasoning method is also adopted to perceive the user's task information. Test results show that this interaction mode improves the interaction experience of elderly users.

15.
16.
This article presents multiple pattern-formation control of a multi-robot system using the A* search algorithm, avoiding collision points as the robots move on the motion platform. We use a speech recognition algorithm to select among the various pattern formations, and program the mobile robots to act out the movement scenario on the grid-based motion platform. We have developed several pattern formations for game applications, such as the long-snake, phalanx, crane-wing, sword, and cone formations. Each mobile robot contains a controller module, three IR sensor modules, a voice module, a wireless RF module, a compass module, and two DC servomotors. The controller of the mobile robot acquires detection signals from the reflective IR sensor modules and the compass module, and identifies the crossing points of the aisles. The mobile robot receives commands from the supervising computer and transmits the status of the environment back to it via the wireless RF interface. We developed a user interface for the multi-robot system that programs motion paths for the various pattern-formation exchanges using minimum displacement. Users can use speech to command the multiple mobile robots to execute a pattern-formation exchange. In the experiments, a user speaks the desired pattern formation, the speech recognition system interprets the utterance to decide the formation, and the mobile robots receive the formation command from the supervising computer, arrange themselves into the assigned formation on the motion platform, and avoid the other mobile robots.
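A compact A* sketch on a grid is shown below to illustrate the planning step named in this abstract; the platform dimensions, costs, and collision rules are illustrative only and do not reproduce the paper's system. Cells occupied by other robots are treated as blocked so planned paths avoid collisions.

```python
# A* search on a grid with a Manhattan-distance heuristic; blocked cells
# (e.g. cells occupied by other robots) are excluded from expansion.
import heapq

def a_star(start, goal, blocked, width, height):
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height and nxt not in blocked:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None  # no collision-free path

# Hypothetical 6x5 platform with two blocked cells.
path = a_star((0, 0), (4, 3), blocked={(2, 1), (2, 2)}, width=6, height=5)
```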

17.
Mobile robot programming using natural language
How will naive users program domestic robots? This paper describes the design of a practical system that uses natural language to teach a vision-based robot how to navigate in a miniature town. To enable unconstrained speech, the robot is provided with a set of primitive procedures derived from a corpus of route instructions. When the user refers to a route that is not known to the robot, the system learns it by combining primitives as instructed by the user. This paper describes the components of the Instruction-Based Learning architecture and discusses issues of knowledge representation, the selection of primitives, and the conversion of natural language into robot-understandable procedures.
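The primitive-composition idea can be sketched as follows; the primitive set, robot methods, and route phrasing are invented for illustration and are not the Instruction-Based Learning system itself.

```python
# Hypothetical sketch: an unknown route is learned as a named sequence of
# known primitive procedures, then replayed on request.
PRIMITIVES = {
    "go forward": lambda robot: robot.forward(1),
    "turn left": lambda robot: robot.turn(-90),
    "turn right": lambda robot: robot.turn(90),
}
LEARNED_ROUTES = {}

def teach_route(name, instruction_phrases):
    # Store the route as a list of primitive procedures, in instruction order.
    LEARNED_ROUTES[name] = [PRIMITIVES[p] for p in instruction_phrases]

def execute_route(name, robot):
    for step in LEARNED_ROUTES[name]:
        step(robot)

# "To the post office" taught as a combination of known primitives.
teach_route("to the post office", ["go forward", "turn left", "go forward"])
```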

18.
GeeAir: a universal multimodal remote control device for home appliances
In this paper, we present a handheld device called GeeAir for remotely controlling home appliances via a mixed modality of speech, gesture, joystick, buttons, and light. This solution is superior to existing universal remote controls in that it can be used naturally by users with physical and vision impairments. By combining diverse interaction techniques in a single device, GeeAir enables different user groups to control home appliances effectively, satisfying even the unmet needs of physically and vision-impaired users while maintaining high usability and reliability. The experiments demonstrate that the GeeAir prototype achieves strong performance by standardizing a small set of verbal and gesture commands and introducing feedback mechanisms.

19.
In this paper, we describe a prototype interface that facilitates the control of a mobile robot team by a single operator, using a sketch interface on a Tablet PC. The user draws a sketch map of the scene and includes the robots in approximate starting positions. Both path and target position commands are supported as well as editing capabilities. Sensor feedback from the robots is included in the display such that the sketch interface acts as a two-way communication device between the user and the robots. The paper also includes results of a usability study, in which users were asked to perform a series of tasks.

20.
As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit both voice and low-latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that uses natural language understanding combined with a WAP browser to access email messages on a cell phone. We present results from a laboratory trial that evaluated how users worked with the system. The trial also compared the multimodal system with a text-only system representative of current products on the market. We discuss the observed modality issues and highlight implementation problems and usability concerns encountered in the trial. Findings indicate that participants used speech the majority of the time for both input and navigation, even though most of them had little or no prior experience with speech systems (though they did have prior experience with text-only access to applications on their phones). To our knowledge, this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. Design implications resulting from the study findings and the usability issues encountered are presented to inform the design of future conversational multimodal mobile applications.
