Similar Articles
1.
The development of IP telephony in recent years has been substantial. Improvements in voice quality and the integration of voice and data, especially interaction with multimedia, have made 3G communication more promising. Value-added telephony services reduce dependence on the telephone handset and provide a universal platform for multimodal telephony applications. For example, web-based applications using VoiceXML have been developed to simplify human–machine interaction, because they take advantage of speech-enabled services and make telephone access to the web a reality. However, building a voice-only stand-alone web application is not cost-efficient; it is more reasonable to retrofit voice interfaces so that they are compatible or collaborate with existing HTML- or XML-based web applications. This paper therefore argues that a web service should support multiple access modalities, so that users can perceive and interact with the site through visual or spoken responses, or both at once. Following this principle, we develop a prototype multimodal VoIP system with an integrated web-based Mandarin dialog system that combines automatic speech recognition (ASR), text-to-speech (TTS), a VoiceXML browser, and VoIP technologies to create a user-friendly graphical user interface (GUI) and voice user interface (VUI). Users can interact with the VoiceXML server from a traditional telephone, a cellular phone, or a VoIP connection on a personal computer. Meanwhile, they can browse the web and access the same content with a common HTML- or XML-based browser. The proposed system shows excellent performance and can easily be incorporated into a voice ordering service for wider accessibility.

2.
Maptables are increasingly used to support collaborative spatial planning processes. Despite the proven benefits and claimed potential of using a maptable in such processes, software applications specifically designed for this device are still scarce. Moreover, commonly used applications do not fully exploit the touch functionality of a maptable, or have low usability. To address this gap, we developed and evaluated the Open Geospatial Interactive TOol (OGITO), an open-source software application designed to support collaborative spatial planning processes with a maptable. To develop such a tool, we combined human-centred design and Agile software development principles in a co-design effort with intended users and stakeholders. Through iterative development cycles and user feedback, OGITO evolved until it satisfied user expectations. In a case study on community mapping in Sumatra, Indonesia, a sample of users evaluated OGITO's usability. Case study participants reported high satisfaction with the tool for the given tasks and context. Our research shows the added value of iterative development and user feedback for improving and further developing the tool's usability and functionality.

4.
Advanced speech recognition technology has facilitated the development of voice-based smart devices. A voice user interface (VUI) is now a common feature on smartphones, computers, smart home devices, and car systems. The fragmented and context-focused literature on VUIs motivates this examination of the relationship between perceived quality and customer satisfaction in VUI portable devices. This study is the first to introduce extrinsic motivational factors as an extension to Wixom and Todd's model. These additional antecedent factors provide a richer explanation of VUI user behavior. This study is also the first to consider the role of gender in a VUI behavior model. Our findings suggest that the proposed driving factors, including trust, perceived risks, perceived enjoyment, and mobile self-efficacy, significantly affect VUI user attitudes, which in turn influence continuance intention. Our results also address the role of gender in the association between attitude toward VUI use and its antecedents. The findings show that perceived risk (privacy concerns) and perceived ease of use have more influence on the VUI use behavior of males than of females, whereas trust and mobile self-efficacy play a more crucial role for females than for males.

5.
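Federated learning is an effective way to solve the problem of collaborative training across multiple organizations, but existing federated learning schemes do not support user dropout, and their model APIs can leak sensitive information. This paper proposes a user-oriented data privacy protection method for federated learning that supports user dropout and can train a differentially private (perturbed) model even when users drop out and the model parameters are protected. The method uses the federated learning framework to design a deep-learning-based data privacy protection model consisting mainly of two execution protocols: a server protocol and a user protocol. Each user trains a deep model locally and adds differential privacy perturbation to the local model parameters, and the sum of the dropped users' noise is added to the aggregated parameters, so that the federated learning process satisfies (ε,δ)-differential privacy. Experiments show that with 50 users and ε=1, a balance between model privacy and utility can be achieved.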
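The dropout-tolerant noise idea can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes a plain Gaussian mechanism, L2 clipping to bound sensitivity, and a noise budget split evenly across clients so that the noise in the summed parameters has variance σ². When some clients drop out, the aggregator tops up the noise they would have contributed before averaging. All function names (`gaussian_sigma`, `clip`, `client_update`, `aggregate`) are hypothetical.

```python
import math
import random

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (derived for 0 < epsilon < 1)."""
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon

def clip(params, max_norm):
    """Scale a parameter vector so its L2 norm is at most max_norm (bounds sensitivity)."""
    norm = math.sqrt(sum(p * p for p in params))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [p * scale for p in params]

def client_update(params, max_norm, sigma, n_clients, rng):
    """Client clips its local parameters and adds its share of the noise.

    Each of the n_clients adds N(0, sigma^2 / n_clients) per coordinate, so the
    noise in the sum over all clients has variance sigma^2."""
    share = sigma / math.sqrt(n_clients)
    return [p + rng.gauss(0.0, share) for p in clip(params, max_norm)]

def aggregate(updates, n_clients, sigma, rng):
    """Sum the surviving updates, add the noise the dropped clients would have
    contributed (variance sigma^2 * n_dropped / n_clients), then average."""
    n_alive = len(updates)
    total = [sum(coords) for coords in zip(*updates)]
    n_dropped = n_clients - n_alive
    if n_dropped > 0:
        comp = sigma * math.sqrt(n_dropped / n_clients)
        total = [t + rng.gauss(0.0, comp) for t in total]
    return [t / n_alive for t in total]
```

For example, with `rng = random.Random(0)` and `sigma = gaussian_sigma(0.5, 1e-5, 1.0)`, two of three clients could call `client_update(p, 1.0, sigma, 3, rng)` and the server would call `aggregate(updates, 3, sigma, rng)`; the total noise on the sum stays at variance σ² despite the missing client. The even split is a design assumption chosen for simplicity, not a detail taken from the paper.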

6.
Ergonomics, 2012, 55(11): 1943-1957
Abstract

Errors, whether caused by the user, the recognizer, or inadequate system design, are an important consideration for the more widespread and successful use of automatic speech recognition (ASR). An experiment is described in which recognition errors are studied under different types of feedback. Subjects entered data verbally into a microcomputer under four experimental conditions: orthogonal combinations of spoken and visual feedback, presented either concurrently or terminally after six items. Although no significant differences in error rates or data-entry speed were found across the conditions, analysis of the time penalty for error correction indicated that, as a general rule, there is a small timing advantage for terminal feedback when the error rate is low. It was found that subjects do not monitor visual feedback as accurately as spoken feedback, since a larger number of incorrect data-entry strings were confirmed as correct. Further evidence for the use of 'second best' recognition data is given: correct recognition on re-entry could be increased from 83.0% to 92.4% when the first-choice recognition was excluded from the second attempt. Finally, the implications for error correction protocols in system design are discussed.
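The 'second best' re-entry idea can be sketched in a few lines. This is an illustrative assumption, not the experiment's software: the recognizer is assumed to return a score for every vocabulary word, and when the user rejects a result and repeats the word, the rejected first choice is removed from the candidate set before picking the winner. `recognize` and `correct_by_reentry` are hypothetical names.

```python
def recognize(scores, excluded=frozenset()):
    """Return the best-scoring vocabulary item that is not in `excluded`."""
    candidates = {word: s for word, s in scores.items() if word not in excluded}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def correct_by_reentry(first_scores, reentry_scores):
    """After the user rejects the first result, recognise the re-entered word
    with the rejected first choice removed from the candidate set."""
    rejected = recognize(first_scores)
    return recognize(reentry_scores, excluded={rejected})
```

If "five" narrowly beat "nine" on both attempts, excluding "five" on re-entry returns "nine". The trade-off is that the exclusion assumes the rejection really signals a recognition error; if the user rejected a correct result by mistake, the right word can no longer be chosen on that attempt.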

7.
Ergonomics, 2012, 55(13-14): 1386-1407
Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered the dialogue patterns of ATM users in order to design the user interface for a simulated speech ATM system. Using the Wizard-of-Oz methodology together with multiple-mapping and word-spotting techniques, the speech-driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM by comparing it with a simulated manual ATM. The aim is to investigate how natural and enjoyable talking to a speech ATM can be for these first-time users. Subjects performed withdrawal and balance-enquiry tasks. ANOVA was performed on the usability and affective data. The results showed significant differences between the systems in the ability to complete the tasks as well as in transaction errors. Performance was measured by the time subjects took to complete each task and the number of speech recognition errors that occurred. Judging by users' emotions, the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are ready to talk to the ATM when it becomes available for public use.

8.

An evaluation method is proposed based on walkthrough analysis coupled with a taxonomic analysis of observed problems and the causes of usability errors. The model mismatch method identifies usability design flaws and missing requirements from user errors. The method is tested with a comparative evaluation of two information retrieval products. Different profiles of usability and requirements problems were found for the two products, even though their overall performance was similar.

9.
10.
Earcons (nonverbal sound feedback) have been used in electronic products to give appropriate feedback for the user functions selected. This study evaluated the earcon usability of a portable digital electronic product based on cognition time, error rate, and subjective feelings, using 20 male and female subjects. For the subjective evaluation, the study assessed various earcons by the subjective impression of their sounds on 7-point rating scales. For earcon usability performance, major user functions were tested on the product with its currently available earcons and with the new earcons suggested by this study, which took perceptual characteristics such as loudness and melody into account. Statistical results indicated that the new earcons significantly reduced user error rates and therefore generally improved user performance on major functions such as "PLAY," "OFF," "STOP," "FF" (fast forward), and "REW" (rewind). Subjective data also showed that users were more satisfied with the new, melody-based sound feedback. Practical guidelines for the sound feedback design of small digital products are suggested. © 2011 Wiley Periodicals, Inc.

11.
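In natural human–computer dialogue, speech recognition errors caused by factors such as environmental noise and dialect accents, together with insufficient semantic analysis, lead the computer to misinterpret the user's interaction intent. The computer then responds on the wrong topic, and the dialogue breaks down. Taking open-ended chat about coffee as an example, dialogue breakdowns are divided into three cases: breakdowns caused by inappropriate topic feedback, and, when the topic is correct, breakdowns caused by inappropriate vague feedback or by inappropriate precise feedback. Based on records of user–computer dialogues, breakdowns in these three cases are analyzed and compared. The statistics show that inappropriate topic feedback causes the most breakdowns, accounting for 60.1% of breakdown cases; when the topic feedback is correct, inappropriate vague answers and inappropriate precise answers account for 22.2% and 21.6% of topic breakdowns, respectively; and when speech recognition errors occur, semantic analysis produces an even larger number of feedback errors. The analysis indicates that, in the presence of speech recognition errors, using contextual information to improve the accuracy of the computer's topic feedback can effectively reduce dialogue breakdowns and make human–computer dialogue more natural. This work provides data analysis and experimental evidence for the importance of intent classification in natural human–computer dialogue.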

12.
A prototype e-mail system was developed for cognitively disabled users, with four different interfaces (free format, idea prompt, form fill, and menu driven). The interfaces differed in the level of support provided to the user and in the complexity of the facilities for composing e-mail messages. Usability evaluation demonstrated that no single interface was superior, because of individual differences in usability problems, although the majority of users preferred the interface that did not restrict their freedom of expression (free format). In contrast to traditional evaluation studies, no common pattern of usability errors emerged, demonstrating the need to customise interfaces for individual cognitively disabled users. A framework for customising user interfaces to individual users is proposed, and usability principles derived from the study are expressed as claims following the task–artefact cycle.

13.
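To address the shortcomings of computing common vectors with parameters fixed in advance, a common-vector-based speaker recognition algorithm for speech produced under abnormal conditions is proposed. First, the parameters used to compute the common vectors are adjusted adaptively according to the system recognition rate; the parameters that give the highest recognition rate are then taken as optimal and used to extract common vectors from the test speech, and an SVM classifier classifies the speakers of the abnormal speech. Experimental results show that, with the common vectors extracted by this algorithm, the speaker recognition rate for speech affected by a mild cold is 85.4%, which is 16.9%, 15.2%, and 3.2% higher than that of a GMM with unprocessed features, an SVM, and a GMM combined with common vectors, respectively.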

14.
Xiao Xingxing, Feng Rui. Computer Engineering, 2012, 38(24): 171-174
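The performance of existing speaker recognition methods drops markedly for short utterances. To address this, a short-utterance speaker recognition method based on common feature selection is proposed. Gaussian mixture models are trained on the speakers' speech data, and the common overlapping part shared among speakers is extracted to build a common overlap model and a non-overlap model. These two models are used to select features from the test speech; the similarity of the selected features to each speaker's non-overlap model is then computed, and the decision follows the maximum-similarity principle. Experimental results show that the method is fairly robust and achieves a low recognition error rate.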

15.
Multimodal-Fusion-Based Error Correction for Continuous Handwriting Recognition
Ao Xiang, Wang Xugang, Dai Guozhong, Wang Hongan. Journal of Software, 2007, 18(9): 2162-2173
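In recognition-based interfaces, user satisfaction is determined not only by recognition accuracy but also by the process of correcting recognition errors. This paper proposes a multimodal-fusion-based method for correcting recognition errors in continuous handwriting. The method lets users correct character segmentation and recognition errors in handwriting recognition by speaking the written content. Its core is a multimodal fusion algorithm that uses the speech input to constrain the search for the optimal handwriting recognition result, so that both segmentation errors and recognition errors in handwritten characters can be corrected. Evaluation results show that the fusion algorithm corrects errors effectively and is computationally efficient. Compared with two other methods for correcting handwriting recognition errors, this method achieves higher correction efficiency.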
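As a much simplified illustration of speech constraining handwriting recognition, the sketch below rescores character candidates by a weighted sum of handwriting and speech log-scores. It is not the paper's algorithm: it assumes both channels already share one segmentation (so it cannot fix segmentation errors, which the described method also handles), and the weight and the floor score for a character missing from one channel are hypothetical parameters.

```python
def fuse(handwriting_nbest, speech_nbest, weight=0.5, floor=-10.0):
    """Choose the character with the highest weighted sum of log-scores.

    handwriting_nbest / speech_nbest map candidate characters to log-probabilities
    for one handwritten segment; a character missing from one channel gets `floor`."""
    candidates = set(handwriting_nbest) | set(speech_nbest)

    def combined(char):
        return ((1 - weight) * handwriting_nbest.get(char, floor)
                + weight * speech_nbest.get(char, floor))

    return max(candidates, key=combined)

def fuse_sequence(handwriting_segments, speech_segments, weight=0.5):
    """Fuse segment by segment; assumes both channels share the same segmentation."""
    return "".join(fuse(h, s, weight) for h, s in zip(handwriting_segments, speech_segments))
```

For instance, if handwriting slightly prefers 士 over the visually similar 土 but speech clearly indicates 土, the fused choice is 土. A candidate has to be plausible in both channels to win, which is the intuition behind letting one modality constrain the other.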

16.
Experiencing stress during training is a way to prepare professionals for real-life crises. With the help of feedback tools, professionals can train to recognize and overcome the negative effects of stress on task performance. This paper reports two studies that empirically examined the effect of such a feedback system. The system, based on the COgnitive Performance and Error (COPE) model, provides its users with physiological, predicted-performance, and predicted-error-chance feedback. The first experiment focussed on creating stressful scenarios and establishing the parameters of the predictive models for the feedback system. Participants (n=9) performed fire-extinguishing tasks on a virtual ship. Stress was induced by altering time pressure, information uncertainty, and the consequences of performance. COPE variables were measured, and models were established that predicted performance and the chances of specific errors. In the second experiment, a new group of participants (n=29) carried out the same tasks while receiving eight different combinations of the three feedback types in a counterbalanced order. Performance scores improved when feedback was provided during the task, but the number of errors did not decrease. The usability score for the system with physiological feedback was significantly higher than that for a system without physiological feedback, unless it was combined with error feedback. This paper shows the effects of feedback on performance and usability. To improve the effectiveness of the feedback system, more in-depth tutorial sessions are suggested. Design changes are recommended that would make the feedback system more effective in improving performance.

17.
This paper reports results from a controlled experiment (N = 50) measuring the effects of interruption on task completion time, error rate, annoyance, and anxiety. The experiment used a sample of primary and peripheral tasks representative of those often performed by users. Our experiment differs from prior interruption experiments because it measures the effects of interrupting a user's tasks along both performance and affective dimensions, and it controls for task workload by manipulating only the time at which peripheral tasks were displayed: between vs. during the execution of primary tasks. Results show that when peripheral tasks interrupt the execution of primary tasks, users require from 3% to 27% more time to complete the tasks, commit twice as many errors across tasks, experience from 31% to 106% more annoyance, and experience twice the increase in anxiety compared with when the same peripheral tasks are presented at the boundary between primary tasks. An important implication of our work is that attention-aware systems could mitigate the effects of interruption by deferring the presentation of peripheral information until coarse boundaries are reached during task execution. As our results show, deferring presentation for a short time, i.e. just a few seconds, can greatly mitigate disruption.
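The deferral strategy suggested for attention-aware systems can be sketched as a small notification queue. This is a hypothetical design, not the experiment's system: it assumes the application can signal when a primary task starts and ends, holds peripheral messages while a task is in progress, and releases them at the next task boundary. The class and method names are illustrative.

```python
from collections import deque

class DeferringNotifier:
    """Holds peripheral messages during a primary task and shows them at its boundary."""

    def __init__(self):
        self._in_task = False
        self._pending = deque()
        self.shown = []  # messages actually presented to the user, in order

    def start_primary_task(self):
        self._in_task = True

    def end_primary_task(self):
        # A task boundary: release everything that was deferred, oldest first.
        self._in_task = False
        while self._pending:
            self.shown.append(self._pending.popleft())

    def notify(self, message):
        if self._in_task:
            self._pending.append(message)  # defer instead of interrupting
        else:
            self.shown.append(message)     # already at a boundary: show now
```

A real system would also need a way to detect coarse boundaries automatically and a maximum deferral time for urgent messages; both are omitted here to keep the sketch focused on the deferral rule itself.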

18.
Chan FY, Khalid HM. Ergonomics, 2003, 46(13-14): 1386-1407
Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered the dialogue patterns of ATM users in order to design the user interface for a simulated speech ATM system. Using the Wizard-of-Oz methodology together with multiple-mapping and word-spotting techniques, the speech-driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM by comparing it with a simulated manual ATM. The aim is to investigate how natural and enjoyable talking to a speech ATM can be for these first-time users. Subjects performed withdrawal and balance-enquiry tasks. ANOVA was performed on the usability and affective data. The results showed significant differences between the systems in the ability to complete the tasks as well as in transaction errors. Performance was measured by the time subjects took to complete each task and the number of speech recognition errors that occurred. Judging by users' emotions, the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are ready to talk to the ATM when it becomes available for public use.

19.
As interactive voice response systems become more prevalent and provide increasingly complex functionality, it becomes clear that the challenges facing such systems lie not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system and user also play an important role in system usability. In particular, both systems and users have difficulty determining when the other is taking or relinquishing the turn. In this paper, we seek to identify automatically computable turn-taking cues that are correlated with human–human turn exchanges. We compare the presence of potential prosodic, acoustic, and lexico-syntactic turn-yielding cues in prosodic phrases preceding turn changes (smooth switches) vs. turn retentions (holds) vs. backchannels in the Columbia Games Corpus, a large corpus of task-oriented dialogues, to determine which features reliably distinguish among these three. We identify seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems. Testing Duncan's (1972) hypothesis that these turn-yielding cues are linearly correlated with the occurrence of turn-taking attempts, we further demonstrate that the greater the number of turn-yielding cues present, the greater the likelihood that a turn change will occur. We also identify six cues that precede backchannels, which will also be useful for IVR backchannel generation and recognition; these cues correlate with backchannel occurrence in a quadratic manner. We find similar results for overlapping and non-overlapping speech.
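The cue-counting analysis (more turn-yielding cues, higher likelihood of a turn change) can be sketched as a tally of how often a turn change follows phrases with 0, 1, 2, ... cues present. The sketch assumes each phrase has already been annotated with the set of cues it contains and whether a turn change followed; the cue labels in any input are placeholders, not the seven cues identified in the paper.

```python
from collections import defaultdict

def turn_change_rate_by_cue_count(observations):
    """observations: iterable of (cues_present, turn_changed) pairs, where
    cues_present is a set of cue labels and turn_changed is a bool.
    Returns {number_of_cues: fraction of phrases followed by a turn change}."""
    totals = defaultdict(int)
    changes = defaultdict(int)
    for cues, changed in observations:
        k = len(cues)
        totals[k] += 1
        changes[k] += int(changed)
    return {k: changes[k] / totals[k] for k in sorted(totals)}
```

Plotting the returned rates against the cue count is one simple way to inspect whether the relationship looks linear, as tested for turn changes, or quadratic, as reported for backchannels.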

20.
Novice users face many challenges when browsing the Web. The goal of this experiment was to learn how users perceive error situations when using the World Wide Web; specifically, which circumstances cause users to believe that an error has occurred. An exploratory experiment was conducted with 78 novice users, who were asked to identify when they perceived that an error had occurred. The subjects reported a total of 219 error situations, which the researchers classified into four categories: user error, system error, situational error, and poor Web design. Based on the collected data, suggestions are presented for improving the usability of Web browsers and Web sites.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417