20 matching records found (search time: 250 ms)
1.
Tetsushi Oka Toyokazu Abe Kaoru Sugita Masao Yokota 《Artificial Life and Robotics》2009,13(2):455-459
This article describes a multimodal command language for home robot users, and a robot system which interprets users' messages in the language through microphones, visual and tactile sensors, and control buttons. The command language comprises a set of grammar rules, a lexicon, and nonverbal events detected in hand gestures, readings of tactile sensors attached to the robots, and buttons on the controllers in the users' hands. Prototype humanoid systems which immediately execute commands in the language are also presented, along with preliminary experiments on face-to-face interaction and teleoperation. Subjects unfamiliar with the language were able to command the humanoids and complete their tasks with only brief documents at hand, given a short demonstration beforehand. The command understanding system, operating on PCs, responded to multimodal commands without significant delay. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
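The abstract above describes commands assembled from a lexicon plus nonverbal events. As an illustrative sketch only (the article's actual grammar, lexicon, and event format are not given here, so all names below are assumptions), a spoken phrase and nonverbal events such as button presses might be merged into a single command frame like this:

```python
# Toy lexicon mapping words to slot roles (illustrative, not the article's).
LEXICON = {"move": "ACTION", "stop": "ACTION", "forward": "DIRECTION"}

def interpret(spoken_words, nonverbal_events):
    """Merge spoken tokens and nonverbal events into one command frame."""
    frame = {}
    for w in spoken_words:
        role = LEXICON.get(w)
        if role == "ACTION":
            frame["action"] = w
        elif role == "DIRECTION":
            frame["direction"] = w
    # Nonverbal events (button presses, tactile readings) fill slots
    # that the utterance left unspecified.
    for event in nonverbal_events:
        frame.setdefault(event["slot"], event["value"])
    return frame

# e.g. the user says "move" and presses a direction button:
cmd = interpret(["move"], [{"slot": "direction", "value": "left"}])
```

The point of the sketch is the fusion step: each modality contributes partial information, and neither alone need form a complete command.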
2.
Tetsushi Oka Toyokazu Abe Kaoru Sugita Masao Yokota 《Artificial Life and Robotics》2011,16(2):224-228
This article describes a user study of a life-supporting humanoid directed in a multimodal language, and discusses the results. Twenty inexperienced users commanded the humanoid in a computer-simulated remote home environment in the multimodal language by pressing keypad buttons and speaking to the robot. The results show that they comprehended the language well and were able to give commands successfully. They often chose a button press in place of a verbal phrase to specify a direction, speed, length, angle, or temperature value, and they preferred multimodal commands to spoken commands. However, they did not find it very easy to give commands in the language. This article discusses these results and points out the strengths and weaknesses of both the language and our robot.
3.
Commanding a humanoid to move objects in a multimodal language (total citations: 2; self: 2; other: 0)
This article describes a study of a humanoid robot that moves objects at the request of its users. The robot understands commands in a multimodal language which combines spoken messages with two types of hand gesture. All ten novice users directed the robot using gestures when asked to spontaneously direct it to move objects after learning the language for a short period of time. The success rate of multimodal commands was over 90%, and the users completed their tasks without trouble. They found gestures preferable to, and as easy as, verbal phrases for informing the robot of action parameters such as direction, angle, step, width, and height. The results of the study show that the language is fairly easy for non-experts to learn, and that it can be made more effective for directing humanoids to move objects by making the language more sophisticated and improving our gesture detector.
4.
Assistance is currently a pivotal research area in robotics, with huge societal potential. Since assistant robots interact directly with people, finding natural and easy-to-use user interfaces is of fundamental importance. This paper describes a flexible multimodal interface, based on speech and gesture modalities, for controlling our mobile robot named Jido. The vision system uses a stereo head mounted on a pan-tilt unit and a bank of collaborative particle filters devoted to the upper human body extremities in order to track and recognize pointing and symbolic gestures, both mono-manual and bi-manual. This framework constitutes our first contribution: it is shown to properly handle the natural artifacts (self-occlusion, hands leaving the camera's field of view, hand deformation) that arise when 3D gestures are performed with either hand, or with both. A speech recognition and understanding system based on the Julius engine is also developed and embedded in order to process deictic and anaphoric utterances. The second contribution is a probabilistic, multi-hypothesis interpreter framework that fuses the results from the speech and gesture components. This interpreter is shown to improve the classification rates of multimodal commands compared to using either modality alone. Finally, we report on successful live experiments in human-centered settings, in the context of an interactive manipulation task where users specify local motion commands to Jido and perform safe object exchanges.
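The fusion idea in this entry — a probabilistic interpreter combining speech and gesture hypotheses — can be sketched minimally as follows. This is not the paper's actual interpreter: it assumes each modality independently scores candidate commands, and fuses them as a normalized product of the two distributions.

```python
def fuse(speech_probs, gesture_probs):
    """Fuse per-modality command posteriors under an independence assumption."""
    candidates = set(speech_probs) | set(gesture_probs)
    joint = {c: speech_probs.get(c, 0.0) * gesture_probs.get(c, 0.0)
             for c in candidates}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()} if total else joint

# Each modality is individually uncertain, but they agree on "pick_up":
speech = {"pick_up": 0.6, "put_down": 0.4}
gesture = {"pick_up": 0.9, "put_down": 0.1}
fused = fuse(speech, gesture)
```

Fusing sharpens the decision relative to either modality alone, which is the effect the abstract reports for classification rates.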
5.
Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more heavily than other contacts at that time. In this paper, we describe and assess statistical models, learned from a large population of users, for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best-performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization yielded a further significant 5% relative reduction in error rate.
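The "spouse at the end of every workday" example can be made concrete with a toy sketch. This is not the paper's statistical model — a simple per-hour count table stands in for it, and the online update is just an increment — but it shows the shape of time-conditioned prediction with personalization:

```python
from collections import defaultdict

class CommandPredictor:
    """Toy time-of-day command predictor (illustrative stand-in only)."""

    def __init__(self):
        # counts[hour][command] -> how often the user issued it at that hour
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, hour, command):
        # Online personalization: update counts from individual user data.
        self.counts[hour][command] += 1

    def predict(self, hour):
        by_hour = self.counts[hour]
        return max(by_hour, key=by_hour.get) if by_hour else None

p = CommandPredictor()
for _ in range(5):
    p.observe(17, "call spouse")   # end of every workday
p.observe(17, "check email")
```

In a real C&C system these counts would instead adapt the weights of a grammar-constrained language model, as the abstract describes.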
6.
This study presents a user interface that was intentionally designed to support multimodal interaction by compensating for the weaknesses of speech compared with pen input and vice versa. The test application was email using a web pad with pen and speech input. In the case of pen input, information was represented as visual objects, which were easily accessible. Graphical metaphors were used to enable faster and easier manipulation of data. Speech input was facilitated by displaying the system speech vocabulary to the user. All commands and accessible fields with text labels could be spoken in by name. Commands and objects that the user could access via speech input were shown on a dynamic basis in a window. Multimodal interaction was further enhanced by creating a flexible object-action order such that the user could utter or select a command with a pen followed by the object which was to be enacted upon, or the other way round (e.g., New Message or Message New). The flexible action-object interaction design combined with voice and pen input led to eight possible action-object-modality combinations. The complexity of the multimodal interface was further reduced by making generic commands such as New applicable across corresponding objects. Use of generic commands led to a simplification of menu structures by reducing the number of instances in which actions appeared. In this manner, more content information could be made visible and consistently accessible via pen and speech input. Results of a controlled experiment indicated that the shortest task completion times for the eight possible input conditions were when speech-only was used to refer to an object followed by the action to be performed. Speech-only input with action-object order was also relatively fast. In the case of pen input-only, the shortest task completion times were found when an object was selected first followed by the action to be performed. 
In multimodal trials in which both pen and speech were used, no significant effect was found for object-action order, suggesting benefits of providing users with a flexible action-object interaction style in multimodal or speech-only systems.
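The flexible object-action order described above ("New Message" or "Message New") amounts to order-insensitive parsing. A minimal sketch, with illustrative token sets rather than the study's actual command vocabulary:

```python
ACTIONS = {"new", "delete", "open"}
OBJECTS = {"message", "folder", "contact"}

def parse(tokens):
    """Return (action, object) regardless of which was uttered first."""
    action = next((t for t in tokens if t in ACTIONS), None)
    obj = next((t for t in tokens if t in OBJECTS), None)
    if action and obj:
        return (action, obj)
    return None   # incomplete command: one of the two parts is missing
```

Both orders normalize to the same command, which is what lets the interface accept either without increasing the user's learning burden.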
7.
This article proposes a multimodal language for communicating with life-supporting robots through a touch screen and a speech interface. The language is designed for untrained users who need support in their daily lives from cost-effective robots. In this language, users can combine spoken and pointing messages in an interactive manner in order to convey their intentions to the robots. Spoken messages include verb and noun phrases which describe intentions. Pointing messages are given when the user's finger touches a camera image, a picture of the robot's body, or a button on a touch screen at hand; they convey a location in the environment, a direction, a body part of the robot, a cue, a reply to a query, or other information that helps the robot. This work presents the philosophy and structure of the language.
8.
《Advanced Robotics》2013,27(3-4):293-328
This paper presents a method of controlling robot manipulators with fuzzy voice commands. Recently, there has been some research on controlling robots using information-rich fuzzy voice commands such as 'go little slowly', and on learning from such commands. However, the scope of those works was limited to basic fuzzy voice motion commands. In this paper, we introduce a method of controlling the posture of a manipulator using complex fuzzy voice commands. A complex fuzzy voice command is composed of a set of fuzzy voice joint commands. Complex fuzzy voice commands can be used for complicated maneuvering of a manipulator, while fuzzy voice joint commands affect only a single joint. Once the joint commands are learned, any complex command can be learned as a combination of some or all of them, so that, using the learned complex commands, a human user can control the manipulator in a complicated manner with natural language commands. Learning of complex commands is discussed in the framework of the fuzzy coach–player model. The proposed idea is demonstrated with a PA-10 redundant manipulator.
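The composition idea — complex commands built from single-joint fuzzy commands — can be sketched as follows. The hedge values, joint names, and the "complex command as a list of joint commands" encoding are all illustrative assumptions, not the paper's fuzzy-set formulation:

```python
# Linguistic hedges mapped to joint-angle increments in degrees (assumed values).
HEDGES = {"little": 5.0, "moderately": 15.0, "much": 30.0}

def joint_command(joint, direction, hedge):
    """A fuzzy voice joint command affects exactly one joint."""
    sign = 1.0 if direction == "up" else -1.0
    return (joint, sign * HEDGES[hedge])

# A learned complex command is a combination of joint commands:
LEAN_FORWARD = [joint_command("shoulder", "down", "little"),
                joint_command("elbow", "up", "moderately")]

def apply(posture, complex_cmd):
    """Apply every joint command in a complex command to the posture."""
    new = dict(posture)
    for joint, delta in complex_cmd:
        new[joint] = new.get(joint, 0.0) + delta
    return new

pose = apply({"shoulder": 0.0, "elbow": 0.0}, LEAN_FORWARD)
```

A real fuzzy controller would use membership functions rather than crisp increments; the sketch only shows how joint commands compose into a named complex command.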
9.
10.
In order for robots to effectively understand natural language commands, they must be able to acquire meaning representations that can be mapped to perceptual features in the external world. Previous approaches to learning these grounded meaning representations require detailed annotations at training time. In this paper, we present an approach to grounded language acquisition which is capable of jointly learning a policy for following natural language commands such as "Pick up the tire pallet," as well as a mapping between specific phrases in the language and aspects of the external world; for example, the mapping between the words "the tire pallet" and a specific object in the environment. Our approach assumes a parametric form for the policy that the robot uses to choose actions in response to a natural language command, factored according to the structure of the language. We use a gradient method to optimize the model parameters. Our evaluation demonstrates the effectiveness of the model on a corpus of commands given to a robotic forklift by untrained users.
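A heavily simplified sketch of the gradient-based grounding step: a log-linear model scores which object a phrase refers to from features, and each gradient step raises the probability of the intended referent (observed features minus expected features, the standard log-likelihood gradient). The features, learning rate, and training loop are illustrative assumptions, not the paper's factored policy:

```python
import math

def scores(weights, feats_per_obj):
    return [sum(w * f for w, f in zip(weights, feats)) for feats in feats_per_obj]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def grad_step(weights, feats_per_obj, target, lr=0.5):
    """One log-likelihood gradient step toward the target referent."""
    probs = softmax(scores(weights, feats_per_obj))
    return [w + lr * (feats_per_obj[target][i]
                      - sum(p * feats[i] for p, feats in zip(probs, feats_per_obj)))
            for i, w in enumerate(weights)]

# Two candidate objects; object 0 stands in for "the tire pallet".
feats = [[1.0, 0.0], [0.0, 1.0]]
w = [0.0, 0.0]
for _ in range(20):
    w = grad_step(w, feats, target=0)
probs = softmax(scores(w, feats))
```

After training, the model assigns most of the probability mass to the intended object, which is the grounding behavior the abstract describes at the level of whole commands.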
11.
In this paper, a voice-activated robot arm with intelligence is presented. The robot arm is controlled with natural connected-speech input. The language input allows a user to interact with the robot in terms which are familiar to most people. The advantages of speech-activated robots are hands-free operation and fast data input. The proposed robot is capable of understanding the meaning of natural language commands. After interpreting a voice command, the robot generates a series of control data for performing the task, and then actually performs it. Artificial intelligence techniques are used to make the robot understand voice commands and act in the desired mode. It is also possible to control the robot using a keyboard input mode.
12.
Merav Chkroun Amos Azaria 《International journal of human-computer interaction》2013,29(17):1596-1607
In the imminent future, people are likely to engage with smart devices by instructing them in natural language. A fundamental question to ask is how intelligent agents might interpret such instructions and learn new tasks. In this article we present the first speech-based virtual assistant that can be taught new commands by speech. A user study of our agent has shown that people can teach it new commands. We also show that people see great advantage in using an instructable agent, and we determine what users believe are the most important use cases for such an agent.
13.
Show me: automatic presentation for visual analysis (total citations: 1; self: 0; other: 1)
Mackinlay J Hanrahan P Stolte C 《IEEE transactions on visualization and computer graphics》2007,13(6):1137-1144
This paper describes Show Me, an integrated set of user interface commands and defaults that incorporate automatic presentation into a commercial visual analysis system called Tableau. A key aspect of Tableau is VizQL, a language for specifying views, which is used by Show Me to extend automatic presentation to the generation of tables of views (commonly called small multiple displays). A key research issue for the commercial application of automatic presentation is the user experience, which must support the flow of visual analysis. User experience has not been the focus of previous research on automatic presentation. The Show Me user experience includes the automatic selection of mark types, a command to add a single field to a view, and a pair of commands to build views for multiple fields. Although the use of these defaults and commands is optional, user interface logs indicate that Show Me is used by commercial users.
14.
15.
16.
The article presents multiple-pattern formation control of a multi-robot system using an A* search algorithm, avoiding collision points while the robots move on the motion platform. We use a speech recognition algorithm to select among the various pattern formations, and program the mobile robots to carry out the movement scenario on the grid-based motion platform. We have developed a number of pattern formations for game applications, such as the long snake, phalanx, crane wing, sword, and cone formations. Each mobile robot contains a controller module, three IR sensor modules, a voice module, a wireless RF module, a compass module, and two DC servomotors. The controller of the mobile robot acquires detection signals from the reflective IR sensor modules and the compass module, and determines the crossing points of the aisles. The mobile robot receives commands from the supervising computer, and transmits the status of the environment back to it via the wireless RF interface. We developed a user interface for the multi-robot system to program motion paths for the various pattern formation exchanges using minimum displacement. Users can use speech to command the multiple mobile robots to execute a pattern formation exchange. In the experiments, a user speaks the name of a pattern formation; the speech recognition system processes the signal to select the formation, and the mobile robots receive the formation command from the supervising computer, arrange themselves into the assigned formation on the motion platform, and avoid the other mobile robots.
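A compact A* sketch of the kind of grid search used for the formation paths above: 4-connected moves, a Manhattan-distance heuristic, and a set of blocked cells standing in for the other robots. The grid size and layout are assumptions for illustration:

```python
import heapq

def astar(start, goal, blocked, size=8):
    """A* on a 4-connected grid; returns the cell path or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # admissible
    frontier = [(h(start), 0, start, [start])]               # (f, g, pos, path)
    seen = set()
    while frontier:
        _, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        x, y = pos
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in blocked:
                heapq.heappush(frontier, (g + 1 + h((nx, ny)), g + 1,
                                          (nx, ny), path + [(nx, ny)]))
    return None

# Two blocked cells force a detour around the "other robots":
path = astar((0, 0), (3, 0), blocked={(1, 0), (1, 1)})
```

Because the Manhattan heuristic is consistent on a unit-cost grid, the first time the goal is popped the path is optimal, which is what makes A* suitable for minimum-displacement formation exchanges.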
17.
Mobile robot programming using natural language (total citations: 3; self: 0; other: 3)
How will naive users program domestic robots? This paper describes the design of a practical system that uses natural language to teach a vision-based robot how to navigate in a miniature town. To enable unconstrained speech the robot is provided with a set of primitive procedures derived from a corpus of route instructions. When the user refers to a route that is not known to the robot, the system will learn it by combining primitives as instructed by the user. This paper describes the components of the Instruction-Based Learning architecture and discusses issues of knowledge representation, the selection of primitives and the conversion of natural language into robot-understandable procedures.
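The learning-by-combining-primitives idea can be sketched briefly. The primitive names and the teaching interface below are illustrative assumptions, not the Instruction-Based Learning architecture itself; the sketch only shows routes being taught as compositions of known steps and then flattened back into primitives:

```python
class RouteLearner:
    def __init__(self):
        self.primitives = {"turn_left", "turn_right", "go_forward"}
        self.routes = {}   # learned composite routes, by name

    def teach(self, name, steps):
        """Learn a new route as a sequence of known primitives or routes."""
        unknown = [s for s in steps
                   if s not in self.primitives and s not in self.routes]
        if unknown:
            raise ValueError(f"unknown steps: {unknown}")
        self.routes[name] = steps

    def expand(self, name):
        """Flatten a route (possibly built from other routes) into primitives."""
        if name in self.primitives:
            return [name]
        return [p for step in self.routes[name] for p in self.expand(step)]

r = RouteLearner()
r.teach("to_post_office", ["go_forward", "turn_left", "go_forward"])
r.teach("to_bank", ["to_post_office", "turn_right"])   # reuses a learned route
plan = r.expand("to_bank")
```

Learned routes can themselves serve as steps in later routes, which mirrors how the system grows its repertoire from a fixed primitive set.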
18.
Gang Pan Jiahui Wu Daqing Zhang Zhaohui Wu Yingchun Yang Shijian Li 《Personal and Ubiquitous Computing》2010,14(8):723-735
In this paper, we present a handheld device called GeeAir for remotely controlling home appliances via a mixed modality of speech, gesture, joystick, button, and light. This solution is superior to existing universal remote controllers in that it can be used in a natural manner by users with physical or vision impairments. By combining diverse interaction techniques in a single device, GeeAir enables different user groups to control home appliances effectively, satisfying even the previously unmet needs of physically and vision-impaired users while maintaining high usability and reliability. The experiments demonstrate that the GeeAir prototype achieves strong performance by standardizing a small set of verbal and gesture commands and introducing feedback mechanisms.
19.
Marjorie Skubic Derek Anderson Samuel Blisard Dennis Perzanowski Alan Schultz 《Autonomous Robots》2007,22(4):399-410
In this paper, we describe a prototype interface that facilitates the control of a mobile robot team by a single operator, using a sketch interface on a Tablet PC. The user draws a sketch map of the scene and includes the robots in approximate starting positions. Both path and target-position commands are supported, as well as editing capabilities. Sensor feedback from the robots is included in the display, such that the sketch interface acts as a two-way communication device between the user and the robots. The paper also includes results of a usability study in which users were asked to perform a series of tasks.
20.
Jennifer Lai Stella Mitchell Christopher Pavlovski 《International Journal of Speech Technology》2007,10(1):17-30
As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit both voice and low-latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that uses natural language understanding combined with a WAP browser to access email messages on a cell phone. We present results from a laboratory trial that evaluated usage of the system and compared it with a text-only system representative of products currently on the market. We discuss the observed modality issues and highlight implementation problems and usability concerns encountered in the trial. Findings indicate that participants used speech the majority of the time for both input and navigation, even though most of them had little or no prior experience with speech systems (though they did have prior experience with text-only access to applications on their phones). To our knowledge, this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. Design implications resulting from the study findings and the usability issues encountered are presented to inform the design of future conversational multimodal mobile applications.