Similar Documents
20 similar documents found.
1.
In this article, a fuzzy neural network (FNN)-based approach is presented to interpret imprecise natural language (NL) commands for controlling a machine. This system (1) interprets fuzzy linguistic information in NL commands for machines, (2) introduces a methodology to implement the contextual meaning of NL commands, and (3) recognizes machine-sensitive words in running utterances that consist of both in-vocabulary and out-of-vocabulary words. The system achieves these capabilities through an FNN, which is used to interpret fuzzy linguistic information; a hidden Markov model-based keyword-spotting system, which is used to identify machine-sensitive words among unrestricted user utterances; and a framework for inserting the contextual meaning of words into the knowledge base employed in the fuzzy reasoning process. The system is a complete integration that converts imprecise NL command inputs into corresponding output signals in order to control a machine. Its performance is examined by navigating a mobile robot in real time with unconstrained speech utterances. This work was presented, in part, at the Seventh International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.
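No code accompanies the abstract; the sketch below illustrates only the fuzzy-linguistic step, assuming triangular membership functions and centroid defuzzification (the article does not specify its membership shapes or defuzzification method, so all values here are hypothetical):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership rising from a to a peak at b and falling to c."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

speed = np.linspace(0.0, 1.0, 101)          # normalized speed axis
terms = {                                    # hypothetical membership functions
    "very slowly": tri(speed, -0.3, 0.0, 0.3),
    "slowly":      tri(speed,  0.1, 0.3, 0.5),
    "fast":        tri(speed,  0.5, 0.7, 0.9),
    "very fast":   tri(speed,  0.7, 1.0, 1.3),
}

def defuzzify(term):
    """Map one linguistic speed term to a crisp speed (centroid method)."""
    mu = terms[term]
    return float(np.sum(speed * mu) / np.sum(mu))

# A keyword spotter would first pick out the machine-sensitive word ("slowly");
# the fuzzy stage then turns it into a crisp control value.
print(round(defuzzify("slowly"), 2))   # ~0.3 on the normalized scale
```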

2.
Success rates in a multimodal command language for home robot users   (Total citations: 1; self-citations: 1; citations by others: 0)
This article considers the success rates in a multimodal command language for home robot users. In the command language, the user specifies action types and action parameter values to direct robots in multiple modes such as speech, touch, and gesture. The success rates of commands in the language can be estimated by user evaluations in several ways. This article presents some user evaluation methods, as well as results from recent studies on command success rates. The results show that the language enables users without much training to command home robots at success rates as high as 88%–100%. It is also shown that multimodal commands combining speech and button-press actions included fewer words and were significantly more successful than single-modal spoken commands.

3.
Natural language commands are generated by intelligent human beings and, as a result, contain a great deal of information. If it is possible to learn from such commands and reuse that knowledge, the result is a very efficient process. In this paper, learning from such information-rich voice commands for controlling a robot is studied. First, the new concepts of a fuzzy coach-player system and a sub-coach are proposed for controlling robots with natural language commands. Then, the characteristics of the subjective human decision-making process are discussed, and a Probabilistic Neural Network (PNN)-based learning method is proposed to learn from such commands and to reuse the acquired knowledge. Finally, the proposed concept is demonstrated and confirmed with experiments conducted using a PA-10 redundant manipulator.
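As a rough illustration of the PNN idea (a Gaussian kernel density summed per class, with the highest-density class winning), here is a minimal classifier; the feature encoding of voice commands and all numbers are made up, not taken from the paper:

```python
import numpy as np

class PNN:
    """Minimal Parzen-window probabilistic neural network (illustrative only)."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X):
        out = []
        for x in np.asarray(X, float):
            # Pattern layer: Gaussian kernel on every training example;
            # summation layer: add kernels per class; decision: argmax.
            d2 = np.sum((self.X - x) ** 2, axis=1)
            k = np.exp(-d2 / (2 * self.sigma ** 2))
            scores = [k[self.y == c].sum() for c in self.classes]
            out.append(self.classes[int(np.argmax(scores))])
        return np.array(out)

# Hypothetical 2-D features extracted from commands, labels = motion types
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = ["slow", "slow", "fast", "fast"]
print(PNN(sigma=0.3).fit(X, y).predict([[0.15, 0.15], [0.85, 0.9]]))  # ['slow' 'fast']
```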

4.
Advanced Robotics, 2013, 27(3–4): 293–328
This paper presents a method of controlling robot manipulators with fuzzy voice commands. Recently, there has been some research on controlling robots using information-rich fuzzy voice commands such as 'go little slowly' and learning from such commands. However, the scope of all those works was limited to basic fuzzy voice motion commands. In this paper, we introduce a method of controlling the posture of a manipulator using complex fuzzy voice commands. A complex fuzzy voice command is composed of a set of fuzzy voice joint commands. Complex fuzzy voice commands can be used for complicated maneuvering of a manipulator, while fuzzy voice joint commands affect only a single joint. Once joint commands are learned, any complex command can be learned as a combination of some or all of them, so that, using the learned complex commands, a human user can control the manipulator in a complicated manner with natural language commands. Learning of complex commands is discussed in the framework of the fuzzy coach–player model. The proposed idea is demonstrated with a PA-10 redundant manipulator.
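The paper's internal representation is not shown; the toy sketch below merely illustrates the compositional idea of a complex command replaying previously learned joint commands (all command names, joint names, and deltas are hypothetical):

```python
# A fuzzy voice joint command affects one joint; a complex command is a bundle.
joint_commands = {
    "bend elbow slightly": ("elbow", 0.1),     # (joint, normalized delta)
    "raise shoulder":      ("shoulder", 0.3),
    "rotate wrist a lot":  ("wrist", 0.6),
}

complex_commands = {
    # learned as a combination of previously learned joint commands
    "reach forward": ["raise shoulder", "bend elbow slightly"],
}

def execute(command, posture):
    """Apply a complex command by replaying its constituent joint commands."""
    for jc in complex_commands.get(command, [command]):
        joint, delta = joint_commands[jc]
        posture[joint] = posture.get(joint, 0.0) + delta
    return posture

print(execute("reach forward", {"shoulder": 0.0, "elbow": 0.0, "wrist": 0.0}))
```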

5.
6.
Habitability refers to the match between the language people employ when using a computer system and the language that the system can accept. In this paper, the concept of “habitability” is explored in relation to the design of dialogues for speech-based systems. Two studies investigating the role of habitability in speech systems for banking applications are reported. The first study employed a speech-driven automated teller machine (ATM), using a visual display to indicate available vocabulary. Users made several distinct types of error with this system, indicating that habitability in speech systems cannot be achieved simply by displaying the input language. The second study employed a speech input/speech output home banking application, in which system constraints were indicated by either a spoken menu of words or a “query-style” prompt (e.g. “what service do you require?”). Between-subjects comparisons of these two conditions confirmed that the “menu-style” dialogue was rated as more habitable than the “query-style”. It also led to fewer errors, and was rated as easier to use, suggesting that habitability is a key issue in speech system usability. Comparison with the results of the first study suggests that for speech input, spoken menu prompts may be more habitable than similar menus shown on a visual display. The implications of these results to system design are discussed, and some initial dialogue design recommendations are presented.

7.
Speech interaction systems are currently in high demand for quick, hands-free interaction. Conventional speech interaction systems (SISs) are trained on the user's voice, whereas most modern systems learn from interaction experience over time. Because speech expresses human-computer natural interaction (HCNI) with the world, SIS design must lead to computer interfaces that can receive spoken information and act appropriately upon it. In spite of significant advancements in SISs in recent years, a large number of problems must still be solved before SISs can be successfully applied in practice and comfortably accepted by users. Among these, devising efficient models is considered the primary and most important step in deploying speech recognition in hands-free applications. Meanwhile, brain-computer interfaces (BCIs) allow users to control applications by brain activity. The work presented in this paper describes an improved implementation of an SIS that integrates a BCI, associating brain signals with a list of commands as identification criteria for each specific spoken command controlling the wheelchair.
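The paper's integration details are not reproduced here; the following toy sketch only illustrates one plausible fusion rule between an ASR hypothesis and a BCI-derived label (the command set, confidence values, and rejection threshold are all hypothetical):

```python
# Accept a spoken wheelchair command only when a (hypothetical) BCI classifier's
# output corroborates the speech recognizer's hypothesis.
COMMANDS = ["forward", "back", "left", "right", "stop"]

def fuse(asr_hyp, asr_conf, bci_hyp, bci_conf):
    """Use the brain-signal label as an identification criterion for the spoken one."""
    if asr_hyp == bci_hyp:
        return asr_hyp                      # both modalities agree
    if max(asr_conf, bci_conf) < 0.7:       # hypothetical rejection threshold
        return None                         # ask the user to repeat
    return asr_hyp if asr_conf >= bci_conf else bci_hyp

print(fuse("left", 0.6, "left", 0.8))   # left
print(fuse("left", 0.5, "stop", 0.55))  # None -> re-prompt the user
```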

8.
A method based on cascaded conditional random fields (CRFs) is proposed for understanding natural-language navigation commands for rescue robots. The method consists of three CRF layers. The first layer performs navigation part-of-speech tagging, taking words, parts of speech, and context as its feature template to generate navigation POS tags. The second layer extracts navigation processes, building its feature template from words, navigation POS tags, and context to generate navigation-process tags. The third layer identifies start and end points, using words, navigation POS tags, navigation-process tags, and context to decide whether a place name is a start point or an end point. Navigation information can then be extracted from a command according to the correspondence between navigation parts of speech and navigation elements. The method handles completely unrestricted natural-language navigation commands with an overall accuracy of 78.6%, without depending on specific instructions or maps, which is significant for the human-robot interaction tasks of rescue robot navigation.
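A minimal sketch of the three-layer cascade, assuming sklearn-crfsuite (pip install sklearn-crfsuite) as a stand-in for the paper's CRF toolkit; the toy sentence, tag sets, and feature templates are hypothetical and far smaller than the word/POS/context templates described in the paper:

```python
import sklearn_crfsuite

def feats(sent, i, extra=()):
    f = {"w": sent[i],
         "prev": sent[i - 1] if i > 0 else "<s>",
         "next": sent[i + 1] if i < len(sent) - 1 else "</s>"}
    for name, tags in extra:        # labels predicted by earlier layers
        f[name] = tags[i]
    return f

def run_layer(X, y):
    """Train one CRF layer and tag its own training data (toy setting)."""
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, y)
    return crf.predict(X)

sents = [["go", "from", "door", "to", "window"]]
pos   = [["V", "P", "LOC", "P", "LOC"]]        # layer 1: navigation POS (toy)
proc  = [["MOVE", "O", "O", "O", "O"]]         # layer 2: navigation process (toy)
ends  = [["O", "O", "START", "O", "END"]]      # layer 3: start/end points (toy)

X1 = [[feats(s, i) for i in range(len(s))] for s in sents]
y1 = run_layer(X1, pos)
X2 = [[feats(s, i, [("pos", y1[k])]) for i in range(len(s))] for k, s in enumerate(sents)]
y2 = run_layer(X2, proc)
X3 = [[feats(s, i, [("pos", y1[k]), ("proc", y2[k])]) for i in range(len(s))] for k, s in enumerate(sents)]
print(run_layer(X3, ends))   # e.g. [['O', 'O', 'START', 'O', 'END']]
```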

9.
This paper investigates the intellectualization of text input using a system for accelerated text entry on digital devices, with a view to constructing a model of a corpus of spoken Ukrainian and a text-typing system based on this model. Such a system uses a smaller number of commands to input letters and predicts word variants on the basis of a corpus of words and word combinations used in communication. It is shown experimentally that text input using four or six command keys is rather efficient for the constructed corpus.
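A toy illustration of few-key predictive input: letters are grouped onto four command keys and candidates are ranked by corpus frequency. The letter grouping and the tiny English "corpus" below are made up; the paper builds its model from a corpus of spoken Ukrainian:

```python
from collections import defaultdict

KEYMAP = {1: "abcdefg", 2: "hijklm", 3: "nopqrst", 4: "uvwxyz"}
LETTER_TO_KEY = {ch: k for k, letters in KEYMAP.items() for ch in letters}

corpus_freq = {"hello": 120, "help": 95, "hems": 3, "idle": 7}   # word -> count

# Index every corpus word by its key-press code.
index = defaultdict(list)
for word, freq in corpus_freq.items():
    code = tuple(LETTER_TO_KEY[ch] for ch in word)
    index[code].append((freq, word))

def predict(key_presses):
    """Candidate words for a key sequence, most frequent first."""
    return [w for _, w in sorted(index[tuple(key_presses)], reverse=True)]

print(predict([2, 1, 2, 3]))   # ['help', 'hems'] (one key press per letter)
```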

10.
Text-independent speech segmentation is a challenging topic in computer-based speech recognition systems. This paper proposes a novel time-domain algorithm based on fuzzy knowledge for the continuous speech segmentation task via nonlinear speech analysis. Short-term energy, zero-crossing rate, and singularity exponents are the time-domain features calculated at each point of the speech signal in order to exploit relevant information for generating significant segments. This is done to identify phonemes or syllables and their transition fronts. A fuzzy logic technique is used to fuzzify the calculated features into three complementary sets, namely low, medium, and high, and to perform a matching phase using a set of fuzzy rules. The outputs of the proposed algorithm are silence, phonemes, or syllables. Once evaluated, the algorithm produced its best performance, with efficient results, on the Fongbe language (an African tonal language spoken especially in Benin, Togo, and Nigeria).
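A minimal sketch of the time-domain feature extraction and low/medium/high fuzzification; the singularity-exponent feature and the paper's actual rule base are omitted, and the thresholds and toy rule below are made up:

```python
import numpy as np

def frame_features(signal, frame=256, hop=128):
    """Short-term energy and zero-crossing rate per frame (time domain only)."""
    feats = []
    for start in range(0, len(signal) - frame, hop):
        f = signal[start:start + frame]
        ste = float(np.mean(f ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(f))) > 0))
        feats.append((ste, zcr))
    return feats

def fuzzify(value, lo, hi):
    """Memberships in three complementary sets: low / medium / high."""
    x = float(np.clip((value - lo) / (hi - lo + 1e-12), 0, 1))
    return {"low": max(1 - 2 * x, 0.0),
            "medium": 1 - abs(2 * x - 1),
            "high": max(2 * x - 1, 0.0)}

def label(ste, zcr):
    """Toy rule base standing in for the paper's fuzzy rules."""
    e, z = fuzzify(ste, 0.0, 0.1), fuzzify(zcr, 0.0, 0.5)
    if e["low"] > 0.5:
        return "silence"
    return "unvoiced-like" if z["high"] > z["low"] else "voiced-like"

t = np.linspace(0, 1, 8000)
sig = np.concatenate([np.zeros(2000), 0.5 * np.sin(2 * np.pi * 200 * t[:4000]), np.zeros(2000)])
print([label(ste, zcr) for ste, zcr in frame_features(sig)][:8])
```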

11.
Computer games are now a part of our modern culture. However, certain categories of people are excluded from this form of entertainment and social interaction because they are unable to use the interface of the games, whether due to deficits in motor control, vision, or hearing. By using automatic speech recognition (ASR) systems, voice-driven commands can be used to control a game, which opens up the possibility for people with motor-system difficulties to be included in game communities. This paper aims at finding a standard way of using voice commands in games, with a speech recognition system in the backend, that can be universally applied when designing inclusive games. Present speech recognition systems, however, do not support emotions, attitudes, tones, etc., which is a drawback because such expressions can be vital for gaming. Taking multiple existing game genres into account and analyzing their voice-command requirements, a general ASR module is proposed that can work as a common platform for designing inclusive games; a fuzzy logic controller is then proposed to enhance the system. The standard voice-driven module, based on the algorithm or the fuzzy controller, can be used to design software plug-ins or be included in a microchip. It can then be integrated with game engines, creating the possibility of voice-driven universal access for controlling games.

12.
Automatic speech recognition is central to natural person-to-machine interaction. Because of the high disparity of speaking styles, speech recognition demands composite methods to accommodate this variability. A speech recognition method can work in numerous distinct settings: speaker-dependent or speaker-independent speech; isolated, continuous, or spontaneous speech recognition; and small to very large vocabularies. The Punjabi language is spoken by about 104 million people in India, Pakistan, and other countries with Punjabi migrants. Punjabi is written in the Gurmukhi script in Indian Punjab and in the Shahmukhi script in Pakistani Punjab. The objective of this paper is to build a speaker-independent automatic spontaneous speech recognition system for the Punjabi language; the system is also capable of recognizing spontaneous live Punjabi speech. So far, no work had been done in the area of spontaneous speech recognition for the Punjabi language. The user interface for the live Punjabi speech system was created using Java. To date, the automatic speech system has been trained with 6012 Punjabi words and 1433 Punjabi sentences. Performance, measured in terms of recognition accuracy, is 93.79% for Punjabi words and 90.8% for Punjabi sentences.

13.
A recognizer of isolated words spoken in the Italian language is presented. Each level of recognition (segmentation, phonemic classification, and lexical recognition) is controlled by the rules of appropriate grammars whose symbols are fuzzy linguistic variables. The recognition strategy depends on the lexical redundancy of the protocol and is based on a classification of speech units into broad phonetic classes, possibly followed by a classification into more detailed classes if ambiguities remain.

14.
This paper proposes a new technique to test the performance of spoken dialogue systems by artificially simulating the behaviour of three types of user (very cooperative, cooperative and not very cooperative) interacting with a system by means of spoken dialogues. Experiments using the technique were carried out to test the performance of a previously developed dialogue system designed for the fast-food domain and working with two kinds of language model for automatic speech recognition: one based on 17 prompt-dependent language models, and the other based on one prompt-independent language model. The use of the simulated user enables the identification of problems relating to the speech recognition, spoken language understanding, and dialogue management components of the system. In particular, in these experiments problems were encountered with the recognition and understanding of postal codes and addresses and with the lengthy sequences of repetitive confirmation turns required to correct these errors. By employing a simulated user in a range of different experimental conditions sufficient data can be generated to support a systematic analysis of potential problems and to enable fine-grained tuning of the system.
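A minimal sketch of the idea: a simulated user whose cooperativeness level controls how often it gives a usable answer to the slot the system asked about. The scenario, slots, and probabilities are made up; the paper's simulator also models recognition errors and the fast-food domain in detail:

```python
import random

SCENARIO = {"food": "burger", "drink": "cola", "postcode": "28001"}

def simulated_user(prompt_slot, cooperativeness):
    """Answer the system prompt, or stray off-topic with some probability."""
    if random.random() < cooperativeness:
        return {prompt_slot: SCENARIO[prompt_slot]}
    return {}   # unusable answer; the system will have to re-prompt

def dialogue(cooperativeness, slots=("food", "drink", "postcode")):
    """Run one dialogue; return the number of system turns to fill all slots."""
    filled, turns = {}, 0
    while len(filled) < len(slots):
        turns += 1
        asked = next(s for s in slots if s not in filled)
        filled.update(simulated_user(asked, cooperativeness))
    return turns

random.seed(0)
for name, c in [("very cooperative", 0.9), ("cooperative", 0.6), ("not very cooperative", 0.3)]:
    print(name, sum(dialogue(c) for _ in range(1000)) / 1000)  # mean turns per dialogue
```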

15.
The purpose of my research was to develop a novel voice control system for use in robotized manufacturing cells, as well as to create tools providing its simple integration into manufacturing. A comprehensive study of existing problems and their possible solutions was performed. Unlike some other works, it focused on the specific requirements that must be fulfilled by industrially oriented voice control systems. Existing solutions related to natural language processing and to various voice control applications were analyzed, with the goal of establishing the optimal method of voice command analysis for industrially oriented systems. Finally, a voice control system for manufacturing cells was developed, implemented, and practically verified in the laboratory. Unlike many other solutions, it takes into consideration almost all aspects of voice command processing (speech recognition, syntactic and semantic analysis, and spontaneous speech effects) and, most importantly, their mutual influence. To provide simple system customization (integration into any particular manufacturing cell), a special format for defining the syntax of a quasi-natural sublanguage was developed. A novel algorithm for semantic analysis, exploiting specific features of the voice commands used to control industrial devices and machines, was incorporated into the system. Successful implementation in an educational robotized machining cell shows that industrial applications should be possible in the near future.
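The paper's actual sublanguage definition format is not reproduced here; the hypothetical sketch below shows one way a quasi-natural command grammar with slots could be declared and matched against recognized utterances (all patterns, device names, and actions are invented for illustration):

```python
import re

GRAMMAR = {
    # pattern with named slots -> (device, action)
    r"(?:please )?move (?:the )?(?P<axis>[xyz]) (?:axis )?(?:by )?(?P<amount>\d+)":
        ("robot", "move_axis"),
    r"(?:please )?(?P<state>open|close) (?:the )?gripper":
        ("gripper", "set_state"),
}

def parse(utterance):
    """Syntactic + (toy) semantic analysis of one recognized voice command."""
    u = utterance.lower().strip()
    for pattern, (device, action) in GRAMMAR.items():
        m = re.fullmatch(pattern, u)
        if m:
            return {"device": device, "action": action, "args": m.groupdict()}
    return None  # reject: the utterance is outside the sublanguage

print(parse("please move the x axis by 20"))
print(parse("open the gripper"))
```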

16.
Amita Dev. AI & Society, 2009, 23(4): 603–612
As the development of a speech recognition system depends entirely on the spoken language used for its development, and since speech technology is highly language-dependent and reverse engineering is not possible, there is an utmost need to develop such systems for Indian languages. In this paper we present the implementation of a time-delay neural network (TDNN) in a modular fashion, exploiting the hidden structure of a previously trained phonetic-subcategory network, for the recognition of Hindi consonants. For the present study we selected all the Hindi phonemes for recognition. A vocabulary of 207 Hindi words was designed for the task-specific environment and used as a database. For phoneme recognition, a three-layered network was constructed and trained using the back-propagation learning algorithm. Experiments were conducted to categorize Hindi voiced and unvoiced stops, semivowels, vowels, nasals, and fricatives. A close observation of the confusion matrix for Hindi stops revealed maximum confusion of retroflex stops with their non-retroflex counterparts.
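A minimal sketch of the time-delay idea, expressing the delay lines as 1-D convolutions over frame-level features; PyTorch is my assumption, the layer sizes are illustrative rather than the paper's configuration, and back-propagation training is omitted:

```python
import torch
import torch.nn as nn

class TDNN(nn.Module):
    """Toy TDNN: each Conv1d spans a window of frames (the 'time delays')."""
    def __init__(self, n_feats=16, n_classes=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_feats, 32, kernel_size=3),  # delays -1, 0, +1
            nn.Sigmoid(),
            nn.Conv1d(32, 16, kernel_size=5),       # wider temporal context
            nn.Sigmoid(),
        )
        self.out = nn.Linear(16, n_classes)

    def forward(self, x):                 # x: (batch, n_feats, frames)
        h = self.net(x)
        return self.out(h.mean(dim=2))    # integrate evidence over time

model = TDNN()
frames = torch.randn(1, 16, 30)           # ~30 frames of 16-dim features (dummy)
print(model(frames).shape)                # torch.Size([1, 8]) phoneme-class scores
```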

17.
This paper describes the development of LSESpeak, a spoken Spanish generator for Deaf people. This system integrates two main tools: a sign language into speech translation system and an SMS (Short Message Service) into speech translation system. The first tool is made up of three modules: an advanced visual interface (where a deaf person can specify a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally, an emotional text to speech (TTS) converter to generate spoken Spanish. The visual interface allows a sign sequence to be defined using several utilities. The emotional TTS converter is based on Hidden Semi-Markov Models (HSMMs) permitting voice gender, type of emotion, and emotional strength to be controlled. The second tool is made up of an SMS message editor, a language translator and the same emotional text to speech converter. Both translation tools use a phrase-based translation strategy where translation and target language models are trained from parallel corpora. In the experiments carried out to evaluate the translation performance, the sign language-speech translation system reported a 96.45 BLEU and the SMS-speech system a 44.36 BLEU in a specific domain: the renewal of the Identity Document and Driving License. In the evaluation of the emotional TTS, it is important to highlight the improvement in the naturalness thanks to the morpho-syntactic features, and the high flexibility provided by HSMMs when generating different emotional strengths.
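The reported figures are BLEU scores; as a quick illustration of how such a score is computed, the snippet below uses NLTK's implementation on made-up Spanish sentences that are not from the paper's corpus:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference translation and system hypothesis (tokenized by spaces).
reference = "quiero renovar el documento de identidad".split()
hypothesis = "quiero renovar el carnet de identidad".split()

# Smoothing handles zero higher-order n-gram counts on short sentences.
score = sentence_bleu([reference], hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(round(100 * score, 2))   # BLEU on the 0-100 scale used in the paper
```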

18.
This paper contains three studies of factors which affect the choice of words in simple speech-based interactions. It is shown that choice of words is affected by the level of constraint imposed on users, such that variability is much higher when no constraint is applied than when some form of constraint is used and that variability can be reduced by employing different forms of feedback. In particular, the design of visual and auditory feedback has a bearing on users' choice of words. However, it is proposed that these results do not necessarily lead to people copying the computer, but arise from users developing appropriate communication protocols in their transactions. The paper concludes that choice of words will be subject to a number of factors, that some of these factors can be modified through system design but that 'out-task' vocabulary or inappropriate use of commands can still present problems. Until we have a better understanding of the linguistics of speech-based interaction with machines, these problems will remain intractable.

19.
20.
As mobile computing devices grow smaller and as in-car computing platforms become more common, we must augment traditional methods of human-computer interaction. Although speech interfaces have existed for years, the constrained system resources of pervasive devices, such as limited memory and processing capabilities, present new challenges. We provide an overview of embedded automatic speech recognition (ASR) on the pervasive device and discuss its ability to help us develop pervasive applications that meet today's marketplace needs. ASR recognizes spoken words and phrases. State-of-the-art ASR uses a phoneme-based approach for speech modeling: it gives each phoneme (or elementary speech sound) in the language under consideration a statistical representation expressing its acoustic properties.
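As a toy illustration of that phoneme-based statistical representation, the sketch below models each phoneme with a single diagonal Gaussian over made-up 2-D features; real embedded ASR uses HMMs with Gaussian mixtures and far richer acoustic features:

```python
import numpy as np

class PhonemeModel:
    """One phoneme's statistical representation: a diagonal Gaussian."""
    def fit(self, frames):
        frames = np.asarray(frames, float)
        self.mean = frames.mean(axis=0)
        self.var = frames.var(axis=0) + 1e-6   # floor to avoid zero variance
        return self

    def log_likelihood(self, frame):
        frame = np.asarray(frame, float)
        return float(-0.5 * np.sum(np.log(2 * np.pi * self.var)
                                   + (frame - self.mean) ** 2 / self.var))

# Hypothetical training frames for two phonemes
models = {
    "aa": PhonemeModel().fit([[1.0, 0.2], [1.1, 0.1], [0.9, 0.3]]),
    "s":  PhonemeModel().fit([[0.1, 1.0], [0.2, 1.2], [0.0, 0.9]]),
}

frame = [0.95, 0.25]   # an incoming acoustic frame to classify
print(max(models, key=lambda p: models[p].log_likelihood(frame)))  # 'aa'
```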
