Similar Documents
20 similar documents found (search time: 15 ms)
1.
This article discusses the C-language programming interface for voice I/O on the Sound Blaster card and the storage format of voice data, and uses a sample C program to introduce the general method of developing voice I/O applications.
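As a minimal illustration of the kind of storage format such an article deals with: Sound Blaster voice data was commonly stored as 8-bit unsigned PCM with silence at the midpoint value 128. The sketch below (in Python rather than C, purely for brevity; the function name is ours, not the article's) converts such samples to signed amplitudes, the usual first step before further processing.

```python
def pcm8_to_signed(samples):
    """Convert 8-bit unsigned PCM values (0-255, silence at 128)
    to signed amplitudes in [-128, 127]."""
    return [s - 128 for s in samples]

# A near-silent stretch maps to amplitudes close to zero:
quiet = pcm8_to_signed([128, 129, 127, 128])
```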

2.
In this paper, we describe a user study evaluating the usability of an augmented reality (AR) multimodal interface (MMI). We have developed an AR MMI that combines free-hand gesture and speech input in a natural way using a multimodal fusion architecture. We describe the system architecture and present a study exploring the usability of the AR MMI compared with speech-only and 3D-hand-gesture-only interaction conditions. The interface was used in an AR application for selecting 3D virtual objects and changing their shape and color. For each interface condition, we measured task completion time, the number of user and system errors, and user satisfaction. We found that the MMI was more usable than the gesture-only condition, and users felt that the MMI was more satisfying to use than the speech-only condition; however, it was neither more effective nor more efficient than the speech-only interface. We discuss the implications of this research for designing AR MMIs and outline directions for future work. The findings could also be used to help develop MMIs for a wider range of AR applications, for example, AR navigation tasks, mobile AR interfaces, or AR games.

3.
Modern interactive services such as information and e-commerce services are becoming increasingly more flexible in the types of user interfaces they support. These interfaces incorporate automatic speech recognition and natural language understanding and include graphical user interfaces on the desktop and web-based interfaces using applets and HTML forms. To what extent can the user interface software be decoupled from the service logic software (the code that defines the essential function of a service)? Decoupling of user interface from service logic directly impacts the flexibility of services, or how easy they are to modify and extend. To explore these issues, we have developed Sisl, an architecture and domain-specific language for designing and implementing interactive services with multiple user interfaces. A key principle underlying Sisl is that all user interfaces to a service share the same service logic. Sisl provides a clean separation between the service logic and the software for a variety of interfaces, including Java applets, HTML pages, speech-based natural language dialogue, and telephone-based voice access. Sisl uses an event-based model of services that allows service providers to support interchangeable user interfaces (or add new ones) to a single consistent source of service logic and data. As part of a collaboration between research and development, Sisl is being used to prototype a new generation of call processing services for a Lucent Technologies switching product.

4.
Decision support systems have become a very important part of our lives, and modern technologies make them applicable almost everywhere. These systems can simplify solutions to numerous problems, speed up the analysis of medical research results, and contribute to the rapid classification of input patterns. Practical applications may not only contribute to efficient technology but also improve important aspects of our lives. In this article, we propose a model of a decision support system for speech processing. The proposed mechanism can be used in various applications in which a voice sample is evaluated using the proposed methodology. The solution is based on analysis of the speech signal by a developed intelligent technique in which the signal is processed by a composed mathematical transform cooperating with a bio-inspired algorithm and a spiking neural network to evaluate possible voice problems. The novelty of our idea is that it approaches the topic from a different side, because graphical representations of audio signals and heuristic methods are combined for feature extraction. The results are discussed after extensive comparisons of the advantages and disadvantages of the proposed approach. As part of the research, we demonstrate which transformations and heuristic algorithms work better in the process of voice analysis.

5.
Design and implementation of virtual test system software based on LabWindows/CVI (Cited by 3: 1 self-citation, 2 others)
The VxWorks system cannot provide a user-friendly graphical interface comparable to that of Windows, and developing graphical interfaces under VxWorks is difficult, labor-intensive, and visually monotonous. To address this, a test method for VxWorks-based real-time simulation computers was developed using LabWindows/CVI 8.5. The method has been successfully applied in ground hardware-in-the-loop simulation tests of an aircraft, realizing the testing and control of the simulator's external I/O system and providing a practical tool for future fault diagnosis and troubleshooting of the simulator's I/O system.

6.
The development of IP telephony in recent years has been substantial. Improvements in voice quality and the integration of voice and data, especially interaction with multimedia, have made 3G communication more promising. The value-added services of telephony techniques alleviate the dependence on the phone and provide a universal platform for multimodal telephony applications. For example, web-based applications with VoiceXML have been developed to simplify human-machine interaction, because they take advantage of speech-enabled services and make telephone access to the web a reality. However, it is not cost-efficient to build a voice-only stand-alone web application; it is more reasonable to retrofit voice interfaces to be compatible and collaborate with existing HTML- or XML-based web applications. Therefore, this paper argues that a web service should enable multiple access modalities, so that users can perceive and interact with the site through visual and speech responses simultaneously. Under this principle, our research develops a prototype multimodal VoIP system with an integrated web-based Mandarin dialog system that adopts automatic speech recognition (ASR), text-to-speech (TTS), a VoiceXML browser, and VoIP technologies to create a user-friendly graphical user interface (GUI) and voice user interface (VUI). Users can use a traditional telephone, a cellular phone, or even a VoIP connection via a personal computer to interact with the VoiceXML server. At the same time, users can browse the web and access the same content with a common HTML- or XML-based browser. The proposed system shows excellent performance and can easily be incorporated into a voice ordering service for wider accessibility.

7.
The Enhanced Variable Rate Coder: Toll quality speech for CDMA (Cited by 1: 0 self-citations, 1 other)
The Enhanced Variable Rate Coder (EVRC), standardized by the Telecommunications Industry Association (TIA) as IS-127, is intended for use with the IS-95x Rate Set 1 air interface (CDMA). This coder operates at a maximum rate of 8.5 kb/s and an average rate of about 4.1 kb/s on conversational speech. The EVRC consists of three coding modes that are all based on the Code Excited Linear Prediction (CELP) model. Selection among the three modes is based on an estimate of the input signal state, with active speech encoded primarily at 170 bits per 20 ms frame (Rate 1), background noise and silence encoded at 16 bits/frame (Rate 1/8), and some active speech and essentially all transitions between speech and silence encoded at 80 bits/frame (Rate 1/2). To improve performance in the presence of background noise, the EVRC employs an adaptive noise-suppression filter at the input. Subjective test results are presented demonstrating that the EVRC delivers excellent voice quality in clean-speech/clear-channel conditions, and that its performance exceeds that of most currently standardized speech coders for wireless applications under background noise and/or impaired channel conditions.
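The rate-selection scheme described above can be sketched as follows. The bits-per-frame figures are those quoted in the abstract; the energy threshold and the two-feature classifier are illustrative stand-ins for the EVRC's actual signal-state estimator, not the IS-127 algorithm.

```python
# Bits per 20 ms frame, as quoted in the abstract.
RATE_BITS = {"full": 170, "half": 80, "eighth": 16}

def select_rate(frame_energy, is_transition):
    """Toy signal-state classifier: silence/noise -> Rate 1/8,
    speech-silence transitions -> Rate 1/2, active speech -> Rate 1.
    The 0.01 threshold is illustrative."""
    if frame_energy < 0.01:
        return "eighth"
    if is_transition:
        return "half"
    return "full"

def avg_bit_rate(frames):
    """Average rate in kb/s over a list of (energy, is_transition)
    descriptors, one per 20 ms frame."""
    total_bits = sum(RATE_BITS[select_rate(e, t)] for e, t in frames)
    return total_bits / (len(frames) * 0.020) / 1000
```

With every frame at Rate 1 this yields the 8.5 kb/s maximum quoted above; mixing in silence frames pulls the average down toward the ~4.1 kb/s conversational figure.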

9.
A multimodal interactive dialogue automaton (kiosk) for self-service is presented in this paper. The multimodal user interface allows people to interact with the kiosk through natural speech and gestures in addition to the standard input and output devices. The architecture of the kiosk contains key modules for speech processing and computer vision. An array of four microphones is applied for far-field capture and recording of the user's speech commands; it allows the kiosk to detect voice activity, localize the sources of desired speech signals, and eliminate environmental acoustic noise. A noise-robust, speaker-independent recognition system is applied to the automatic interpretation and understanding of continuous Russian speech. The distant speech recognizer uses a grammar of voice queries as well as garbage and silence models to improve recognition accuracy. A pair of portable video cameras is applied for vision-based detection and tracking of the user's head and body position inside the working area. A Russian-speaking talking head serves both for bimodal audio-visual speech synthesis and for improving communication intelligibility by turning the head toward an approaching client. A dialogue manager controls the flow of the dialogue and synchronizes the sub-modules for input-modality fusion and output-modality fission. Experiments with the multimodal kiosk were directed at cognitive and usability studies of human-computer interaction through different communication means.
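As a sketch of how a small microphone array can favour the talker's direction, a textbook delay-and-sum beamformer aligns each channel by an estimated arrival delay and averages. The kiosk's actual localization and noise-suppression pipeline is more elaborate; the integer sample delays here are assumed inputs, not something this sketch estimates.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay and
    average across channels. Signals arriving from the steered
    direction add coherently; off-axis noise averages down."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(n)]
```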

10.
Disclosure of personal information is valuable to individuals, governments, and corporations. This experiment explores the role interface design plays in maximizing disclosure. Participants (N = 100) were asked to disclose personal information to a telephone-based speech user interface (SUI) in a 3 (recorded speech vs. synthesized speech vs. text-based interface) by 2 (gender of participant) by 2 (gender of voice) between-participants experiment (with no voice manipulation in the text conditions). Synthetic-speech participants exhibited significantly less disclosure and less comfort with the system than text-based or recorded-speech participants. Females were more sensitive to differences between synthetic and recorded speech. There were significant interactions between modality and gender of voice, while there were no gender-identification effects. Implications for the design of speech-based information-gathering systems are outlined.

11.
To address the difficulty of cross-platform game development, a 3D shooting game was developed with the Unity 3D engine and ported to Windows, Web, Mac, Mac Dashboard, and other platforms. The development process is described module by module, covering the graphical user interface, level design, animation, sound, and artificial intelligence. Problems solved include making meteors revolve around a planet, using the A* algorithm to have the aircraft automatically seek the nearest meteor, and setting up audio listeners; texture mapping, skyboxes, and related features were also implemented and optimized.
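A minimal grid A* of the kind the abstract mentions for steering the aircraft toward the nearest meteor might look like the sketch below. The 2D occupancy grid and 4-neighbour moves are our simplifying assumptions; the game itself works in Unity's 3D space.

```python
import heapq

def a_star(grid, start, goal):
    """A* over a 2D grid (0 = free, 1 = blocked) with unit step cost
    and a Manhattan-distance heuristic. Returns the path from start
    to goal as a list of cells, or None if unreachable."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, node, path)
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and grid[nx][ny] == 0:
                heapq.heappush(open_set,
                               (g + 1 + h((nx, ny)), g + 1, (nx, ny), path + [(nx, ny)]))
    return None
```

In the game setting, `goal` would be the grid cell of the nearest meteor, chosen by comparing distances to all meteors before planning.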

12.
This paper reports on the utility of eye gaze, voice, and manual response in the design of a multimodal user interface. A device- and application-independent user interface model (VisualMan) for 3D object selection and manipulation was developed and validated in a prototype interface based on a 3D cube manipulation task. The multimodal inputs are integrated in the prototype interface based on the priority of modalities and the interaction context. The implications of the model for virtual reality interfaces are discussed, and a virtual environment using the multimodal user interface model is proposed.
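Priority-based integration of modalities, as mentioned above, can be caricatured as follows. The priority ordering and event format are our assumptions for illustration, not VisualMan's actual scheme.

```python
# Assumed priority ordering: deliberate manual input overrides voice,
# which overrides passive gaze. Higher number wins.
PRIORITY = {"gaze": 1, "voice": 2, "manual": 3}

def fuse(events):
    """events: (modality, command) pairs observed in one time window.
    Returns the command of the highest-priority modality, or None."""
    if not events:
        return None
    return max(events, key=lambda e: PRIORITY[e[0]])[1]
```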

13.
王爱珍, 成守宇. 《计算机工程与设计》 (Computer Engineering and Design), 2012, 33(5): 1790-1794, 1800
To remedy the low communication rate, limited distribution distance, inconvenient expansion, and difficult debugging of power-plant simulator interface systems based on serial-bus technology, a distributed intelligent input/output interface system based on ARM and Ethernet technology is proposed. Following this distributed simulator-interface design, the system was designed and implemented in terms of overall system design, hardware and software implementation, and configuration design. Practical application shows that the ARM- and Ethernet-based interface system offers a high communication rate, long distribution distance, and more convenient expansion, and can meet the input/output interface needs of a full-scope power-plant simulator.

14.
Open graphical framework for interactive TV (Cited by 1: 1 self-citation, 0 others)
Multimedia end-user terminals are expected to perform advanced user-interface-related tasks. These tasks are carried out by user interface runtime tools and include, among others, the visualization of complex graphics and the efficient handling of user input. In addition, the terminal's graphical system is expected, for example, to be able to synchronize audio and video and to control different contexts on the same screen. Finally, the availability of high-level tools that simplify user interface implementation, and the adaptability of the user interfaces to a diversity of configurations, are desirable features as well. This paper presents a layered model that meets these requirements. The architecture is divided into five layers: a hardware abstraction layer, multimedia cross-platform libraries, a graphical environment, a GUI toolkit, and high-level languages. Moreover, this paper presents the experience of developing a prototype system based on the architecture, targeted at digital television receivers. To evaluate the prototype, several already-developed DVB-MHP-compliant digital television applications were tested. Finally, the prototype was extended with a high-level profile (i.e., SMIL support) and a low-level one (i.e., access to the framebuffer memory).

15.
One of the challenges that Ambient Intelligence (AmI) faces is providing a usable interaction concept to its users, especially those with a weak technical background. In this paper, we describe a new approach to integrating interactive services provided by an AmI environment with the television set, one of the most widely used interaction clients in the home environment. The approach supports the integration of different TV set configurations, guaranteeing that universally accessible solutions can be developed. An implementation of this approach has been carried out as a multimodal, multi-purpose natural human-computer interface for elderly people, by creating adapted graphical user interfaces and navigation menus together with multimodal interaction (a simplified TV remote control and voice interaction). This user interface can also be adapted to other user groups. We tested a prototype that adapts the videoconference and information services with a group of 83 users. The results show that the group found the prototype both satisfactory and efficient to use.

16.
For individuals with severe speech impairment, accurate spoken communication can be difficult and require considerable effort. Some may choose to use a voice output communication aid (VOCA) to support their spoken communication needs. A VOCA typically takes input from the user through a keyboard or switch-based interface and produces spoken output using either synthesised or recorded speech. The type and number of synthetic voices that can be accessed with a VOCA are often limited, and this has been implicated as a factor in rejection of the devices. There is therefore a need to provide voices that are more appropriate and acceptable to users. This paper reports on a study that uses recent advances in speech synthesis to produce personalised synthetic voices for three speakers with mild to severe dysarthria, one of the most common speech disorders. Using a statistical parametric approach to synthesis, an average voice trained on data from several unimpaired speakers was adapted using recordings of the impaired speech of the three dysarthric speakers. By careful selection of the speech data and the model parameters, several exemplar voices were produced for each speaker. A qualitative evaluation was conducted with the speakers and with listeners who were familiar with them. The evaluation showed that for one of the three speakers a voice could be created that conveyed many of his personal characteristics, such as regional identity, sex, and age.

17.
Valaer, L.A., Babb, R.G., II. IEEE Software, 1997, 14(4): 29-39
Software developers face many difficult decisions when building new applications, not the least of which is the design of the graphical user interface. The answer to one question (is it better to use a GUI development tool or build it manually?) is relatively straightforward. Today's tools offer several benefits that manual coding does not. Because these tools often provide a simple graphical interface for developing displays, nonprogrammers and human factors engineers can contribute their expertise. Also, if the schedule permits, a tool can be used to build prototypes throughout the development cycle; some tools even provide a test/prototype mode for testing displays without compiling and executing the entire application. Finally, end users can evaluate each prototype and provide feedback, increasing their satisfaction with the final product.

18.
徐文超, 王光艳, 陈雷. 《计算机应用》 (Journal of Computer Applications), 2017, 37(4): 1212-1216
To address the degraded speech quality and poor adaptability of cochlear implants in strong external noise, an improved denoising method combining spectral subtraction with a variable-step-size least mean square (LMS) adaptive filtering algorithm is proposed, and a front-end speech preprocessing system for cochlear implants is built around it. The variable-step-size LMS algorithm uses the square of the output error to adjust the step size, combining fixed and variable step values; this resolves the slow convergence and large steady-state error of the adaptive filtering algorithm, improves adaptability, and raises the communication quality of the speech signal. The system, built around a TMS320VC5416 DSP and a TLV320AIC23B audio codec, achieves high-speed acquisition and real-time processing of speech data through the multichannel buffered serial port (McBSP) and the serial peripheral interface (SPI). Simulations and tests show that the algorithm removes noise effectively: the signal-to-noise ratio improves by about 10 dB at low input SNR, and the perceptual evaluation of speech quality (PESQ) score also increases substantially. The system is stable and can further improve the clarity and intelligibility of front-end cochlear speech.
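The variable-step-size idea described above, letting the LMS step size track the squared output error, can be sketched as follows. The constants `beta` and `mu_max` and the filter length are illustrative, not the paper's tuned values.

```python
def vss_lms(desired, reference, taps=4, beta=0.5, mu_max=0.05):
    """Variable-step-size LMS adaptive filter sketch.
    desired:   primary signal (speech + noise);
    reference: noise reference correlated with the interference.
    The step mu grows with the squared error (fast convergence when
    error is large) and is capped at mu_max (small steady-state
    misadjustment). Returns the error sequence, i.e. the cleaned output."""
    w = [0.0] * taps
    out = []
    for n in range(taps, len(desired)):
        x = reference[n - taps:n][::-1]          # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, x)) # filter output
        e = desired[n] - y                       # error = cleaned sample
        mu = min(mu_max, beta * e * e)           # variable step size
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out
```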

19.
For Part I, see ibid., vol. 9, p. 3 (2007). In this paper, the task and user interface modules of a multimodal dialogue system development platform are presented. The main goal of this work is to provide a simple, application-independent solution to the problem of multimodal dialogue design for information-seeking applications. The proposed system architecture clearly separates the task and interface components of the system. A task manager is designed and implemented that consists of two main submodules: the electronic form module, which handles the list of attributes that have to be instantiated by the user, and the agenda module, which contains the sequence of user and system tasks. Both the electronic forms and the agenda can be dynamically updated by the user. Next, a spoken dialogue module is designed that implements the speech interface for the task manager. The dialogue manager can handle complex error-correction and clarification user input, building on the semantic and pragmatic modules presented in Part I of this paper. The spoken dialogue system is evaluated on a travel reservation task from the DARPA Communicator research program and shown to yield over 90% task completion and good performance on both objective and subjective evaluation metrics. Finally, a multimodal dialogue system that combines graphical and speech interfaces is designed, implemented, and evaluated. Only minor modifications to the unimodal semantic and pragmatic modules were required to build the multimodal system. It is shown that the multimodal system significantly outperforms the unimodal speech-only system both in efficiency (task success and time to completion) and in user satisfaction for a travel reservation task.
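The electronic-form/agenda split described above can be sketched as a tiny slot-filling structure: the form lists the attributes to instantiate, and the agenda simply asks for the next unfilled one. The attribute names below are illustrative travel-reservation slots, not taken from the platform itself.

```python
class EForm:
    """Toy electronic-form module: a list of attributes the dialogue
    must instantiate, with the agenda picking the next unfilled one."""

    def __init__(self, attributes):
        self.slots = {a: None for a in attributes}  # insertion order kept

    def fill(self, attribute, value):
        self.slots[attribute] = value

    def next_unfilled(self):
        for a, v in self.slots.items():
            if v is None:
                return a
        return None  # form complete: the task can be executed

form = EForm(["origin", "destination", "date"])
form.fill("origin", "SFO")  # user: "I'm leaving from San Francisco"
```

A user could also fill slots out of order or overwrite one during error correction; the agenda just keeps asking for whatever is still missing.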

20.
In this paper, a voice-activated robot arm with intelligence is presented. The robot arm is controlled with natural connected-speech input. The language input allows a user to interact with the robot in terms that are familiar to most people. The advantages of speech-activated robots are hands-free and fast data-input operations. The proposed robot is capable of understanding the meaning of natural language commands. After interpreting the voice commands, a series of control data for performing a task is generated. Finally, the robot actually performs the task. Artificial intelligence techniques are used to make the robot understand voice commands and act in the desired mode. It is also possible to control the robot using keyboard input.
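The step from an interpreted command to a series of control data can be caricatured as a verb-to-motion-primitive lookup. The vocabulary and primitive names below are our assumptions for illustration, not the paper's actual grammar or control set.

```python
# Hypothetical mapping from command verbs to motion-primitive sequences.
ACTIONS = {
    "pick": ["open_gripper", "lower_arm", "close_gripper", "raise_arm"],
    "drop": ["lower_arm", "open_gripper", "raise_arm"],
}

def interpret(command):
    """Return the control sequence for the first recognized verb in a
    natural-language command, or an empty list if none is found."""
    for word in command.lower().split():
        if word in ACTIONS:
            return ACTIONS[word]
    return []
```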


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号