Similar literature
20 similar articles found (search time: 15 ms)
1.
The findings of a workshop are discussed; its goals were to identify applications, research problems, and designs of high performance computing and communications (HPCC) systems for supporting applications. In computer vision, the main scientific issues are machine learning, surface reconstruction, inverse optics and integration, model acquisition, and perception and action. In speech and natural language processing (SNLP), the issues identified were statistical analysis in corpus-based speech and language understanding, search strategies for language analysis, auditory and vocal-tract modeling, integration of multiple levels of speech and language analyses, and connectionist systems. In AI, important issues that need immediate attention include the development of efficient machine learning and heuristic search methods that can adapt to different architectural configurations, and the design and construction of scalable and verifiable knowledge bases, active memories, and artificial neural networks.

2.
Two research projects are described that explore the use of spoken natural language interfaces to virtual reality (VR) systems. Both projects combine off-the-shelf speech recognition and synthesis technology with in-house command interpreters that interface to the VR applications. Details about the interpreters and other technical aspects of the projects are provided, together with a discussion of some of the design decisions involved in the creation of speech interfaces. Questions and issues raised by the projects are presented as inspiration for future work. These issues include: requirements for object and information representation in VR models to support natural language interfaces; use of the visual context to establish the interaction context; difficulties with referencing events in the virtual world; and problems related to the usability of speech and natural language interfaces in general.

3.
The age of artificial intelligence (AI) is upon us, and its effect upon society in the coming years will be noteworthy. Artificial intelligence is a field that encompasses such applications as robotics, expert systems, natural language understanding, speech recognition, and computer vision. The effect of these AI systems upon existing and future occupations will be important. This paper looks at artificial intelligence in terms of the creation of new job categories. The introduction of AI into the organization, to better familiarize employees with AI, is also discussed.

4.
5.
蒋胤傑    况琨    吴飞   《智能系统学报》2020,15(1):175-182
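Data-driven machine learning (especially deep learning) has made great progress in natural language processing, computer vision analysis, speech recognition, and other fields, and is a hot topic in artificial intelligence research. However, traditional machine learning uses various optimization algorithms to fit the optimal model on a training dataset, that is, the model with the smallest average loss, whereas in many real-life problems (such as commercial auctions and resource allocation) the goal of an AI algorithm should be an equilibrium solution, that is, one that also performs well under dynamic conditions. This requires applying game-theoretic ideas to big data intelligence. Through methods such as Monte Carlo tree search and reinforcement learning, game theory can be combined with artificial intelligence to seek the equilibrium solution of adversarial game models. Moving from the optimal solution of data fitting to the equilibrium solution of adversarial games gives big data intelligence a broader space for application.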

6.
This paper presents a new technique to enhance the performance of the input interface of spoken dialogue systems, based on a procedure that combines, during speech recognition, the advantages of using prompt-dependent language models with those of using a language model independent of the prompts generated by the dialogue system. The technique proposes to create a new speech recognizer, termed contextual speech recognizer, that uses a prompt-independent language model to allow recognizing any kind of sentence permitted in the application domain and, at the same time, uses contextual information (in the form of prompt-dependent language models) to take into account that some sentences are more likely to be uttered than others at a particular moment of the dialogue. The experiments show that the technique clearly enhances the performance of the input interface of a previously developed dialogue system based exclusively on prompt-dependent language models. Most importantly, in comparison with a standard speech recognizer that uses just one prompt-independent language model without contextual information, the proposed recognizer increases the word accuracy and sentence understanding rates by 4.09% and 4.19% absolute, respectively. These scores are slightly better than those obtained using linear interpolation of the prompt-independent and prompt-dependent language models used in the experiments.
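The linear interpolation baseline mentioned at the end of this abstract can be illustrated with a minimal sketch. It is not the paper's contextual recognizer: the unigram models, the toy sentences, the weight `lam`, and the function names are all assumptions made for illustration only.

```python
from collections import Counter

def unigram_lm(sentences):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_independent, p_dependent, lam):
    """P(w) = lam * P_dependent(w) + (1 - lam) * P_independent(w)."""
    vocab = set(p_independent) | set(p_dependent)
    return {
        w: lam * p_dependent.get(w, 0.0) + (1 - lam) * p_independent.get(w, 0.0)
        for w in vocab
    }

# Hypothetical toy data: sentences allowed anywhere in the domain, and
# sentences typically uttered after a yes/no confirmation prompt.
domain = ["i want a ticket to madrid", "yes", "no", "a ticket to granada please"]
after_confirm_prompt = ["yes", "no", "yes please"]

p_ind = unigram_lm(domain)                # prompt-independent model
p_dep = unigram_lm(after_confirm_prompt)  # prompt-dependent model
mixed = interpolate(p_ind, p_dep, lam=0.7)
```

With this mixture, words such as "yes" become more probable after the confirmation prompt, while words seen only in the domain-wide data (for example "madrid") keep a non-zero probability, so they can still be recognized.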

7.
《Computer》1998,31(1):51-58
What are 250 top researchers from academia and industry working on at Microsoft Research (MSR)? What attracted them to Redmond, Washington, as well as two new facilities in San Francisco and Cambridge, UK? MSR's appeal to its researchers is that their research will likely be “productized” for the mass market. Each of three metagroups has long-term goals but looks for ways to incorporate its research into products shipping now. Their research is already shipping in some form in nearly every Microsoft product. This article gives an overview of the organization and goals of MSR, established in 1991 to look three to five years out to ensure that Microsoft remains well ahead of the technology curve. As a corporate research lab, MSR is unapologetic about its intentions to identify and fund technologies and new applications that are relevant to Microsoft's corporate strategy. Research is tightly coupled to Microsoft's vision of next-generation systems and software development: PCs that are intuitive even to neophytes, programming paradigms and tools that improve programmer productivity and program maintainability, and next-generation systems for the enterprise. Three metagroups (Advanced Interactivity and Intelligent Systems; Programming Tools and Methodologies; and Systems and Architecture) undertake research in roughly 20 areas, including speech technology, vision, natural language processing, user interface development, and decision theory.

8.
SpeechActs is a prototype testbed for developing spoken natural language applications. In developing SpeechActs, our primary goal was to enable software developers without special expertise in speech or natural language to create effective conversational speech applications, that is, applications with which users can speak naturally, as if they were conversing with a personal assistant. We also wanted SpeechActs applications to work with one another without requiring that each have specific knowledge of other applications running in the same suite. A discourse management component is necessary to embody the information that allows such a natural conversational flow. Because technology changes so rapidly, we also did not want to tie developers to specific speech recognizers or synthesizers. We wanted them to be able to use these speech technologies as plug-in components.

9.
The frame rate of conventional vision systems is restricted to video signal formats (e.g., NTSC at 30 fps and PAL at 25 fps) that are designed on the basis of the characteristics of the human eye, which implies that the processing speed of these systems is limited to the recognition speed of the human eye. However, there is a strong demand for real-time high-speed vision sensors in many application fields, such as factory automation, biomedicine, and robotics, where high-speed operations are carried out. These high-speed operations can be tracked and inspected by using high-speed vision systems with intelligent sensors that work at hundreds of hertz or more, especially when the operation is difficult to observe with the human eye. This paper reviews advances in developing real-time high-speed vision systems and their applications in various fields, such as intelligent logging systems, vibration dynamic sensing, vision-based mechanical control, three-dimensional measurement/automated visual inspection, vision-based human interfaces, and biomedical applications.

10.
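Deep neural networks are models with complex structures and multiple nonlinear processing units, and are widely used in computer vision, natural language processing, and other fields. However, deep neural networks suffer from a critical flaw, their lack of interpretability, known as the "black box problem", which remains a major obstacle to applying deep learning in many fields. This paper proposes a new deep neural network model, the knowledge-based stacked denoising autoencoder (KBSDAE). It attempts to explain the network structure and its internal working mechanism effectively by means of a logical language, while ensuring that the logic rules support deep inference. Furthermore, by inserting extracted rules into the deep network, KBSDAE not only constructs the deep network model adaptively with interpretable and visualizable properties, but also effectively improves pattern recognition performance. Extensive experimental results show that the extracted rules not only represent the deep network effectively but can also initialize the network structure to improve KBSDAE's feature learning performance, model interpretability, and visualization, making it more widely applicable.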

11.
As Third Generation (3G) networks emerge, they provide not only higher data transmission rates but also the ability to transmit both voice and low latency data within the same session. This paper describes the architecture and implementation of a multimodal application (voice and text) that uses natural language understanding combined with a WAP browser to access email messages on a cell phone. We present results from the use of the system by users as part of a laboratory trial that evaluated usage. The user trial also compared the multimodal system with a text-only system that is representative of products currently on the market. We discuss the observed modality issues and highlight implementation problems and usability concerns that were encountered in the trial. Findings indicate that speech was used the majority of the time by participants for both input and navigation, even though most of the participants had little or no prior experience with speech systems (yet did have prior experience with text-only access to applications on their phones). To our knowledge, this represents the first implementation and evaluation of its kind using this combination of technologies on an unmodified cell phone. Design implications resulting from the study findings and usability issues encountered are presented to inform the design of future conversational multimodal mobile applications.

12.
The design and implementation of a user-oriented speech recognition interface are described. The interface enables the use of speech recognition in so-called interactive voice response systems, which can be accessed via a telephone connection. In the design of the interface, a synergy of technology and human factors is achieved. This synergy is very important for making speech interfaces a natural and acceptable form of human-machine interaction. Important concepts such as interfaces, human factors, and speech recognition are discussed. Additionally, a sketch of the interface's implementation indicates how the synergy of human factors and technology can be realised. An explanation is also provided of how the interface might be integrated fruitfully into different applications.

13.
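Deep learning, the most active research branch in artificial intelligence, has in recent years achieved fruitful results in computer vision, natural language processing, speech recognition, and other fields. At the same time, the application of deep learning in healthcare has gradually become a research hotspot and has achieved some success in medical image and signal processing, computer-aided detection and diagnosis, clinical decision support, and medical information mining and retrieval, showing great application prospects. After introducing the principles of deep learning and commonly used deep neural...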

14.
A review of speech-based bimodal recognition   Cited by: 1 in total, 0 self-citations, 1 by others
Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. The paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues as well as possible application domains.

15.
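基于k-近似的汉语词类自动判定 (Automatic determination of Chinese word classes based on k-approximation)   Cited by: 6 in total, 0 self-citations, 6 by others
Handling unknown words plays an important role in natural language processing applications aimed at large-scale real-world text. Automatic word-class determination means having the machine automatically assign an appropriate word-class tag to an unknown word whose word class is not known. This paper proposes a k-approximation-based algorithm for automatic word-class determination and constructs corresponding experiments, supported by a 100-million-character Chinese corpus and a 600,000-character Chinese corpus that was manually segmented and tagged with word classes. The experimental results preliminarily show that, for the Chinese open word classes (nouns, verbs, and adjectives), the algorithm's average accuracy in automatic word-class determination is 99.21%, 84.73%, and 76.5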

16.
The results of a survey conducted amongst managers, users and application developers of Natural Language interrogation systems are presented and analysed. Those who were able to develop successful and effective applications using natural language paid careful attention to certain stages. It is proposed that these stages are: 1. Systematic analysis of the company's requirements. 2. Effective integration of the natural language technology with the target database, ensuring current applications are not adversely affected. 3. Introduction of the system to new users. This resulted in realistic user expectations and enabled effective use of the natural language software. The advantages and disadvantages of natural language interfaces from an application developer, manager and user perspective are also discussed and recommendations made.

17.
Language models (LMs) are important components of many applications that work with natural language, such as word prediction and completion programs, automatic speech recognition, and machine translation. In this paper, we introduce various types of improvements for LMs dealing with word prediction and completion in Hebrew. Whereas previous systems for the Hebrew language apply known variants of existing LMs without any alteration, this study presents two types of improvements to the LMs: one is general and the other is specific to the Hebrew language. These improvements enable all tested LMs to improve their keystroke saving abilities.
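The abstract does not define how keystroke savings are measured. A minimal sketch of one common definition (the percentage of characters the user does not have to type) is given below, assuming a simple frequency-ranked prefix completion; the lexicon, the suggestion-list size `n`, and all function names are hypothetical and are not taken from the paper.

```python
def predict(prefix, lexicon, n=3):
    """Top-n lexicon words starting with prefix, most frequent first."""
    matches = [w for w in lexicon if w.startswith(prefix)]
    return sorted(matches, key=lambda w: (-lexicon[w], w))[:n]

def keystrokes_for_word(word, lexicon, n=3):
    """Keys needed to enter a word: typed letters plus one selection key."""
    for typed in range(1, len(word)):
        if word in predict(word[:typed], lexicon, n):
            return typed + 1  # letters typed so far + one key to accept
    return len(word)          # never suggested in time: typed in full

def keystroke_savings(text, lexicon, n=3):
    """Percentage of characters saved (spaces ignored; text must be non-empty)."""
    words = text.split()
    total_chars = sum(len(w) for w in words)
    used = sum(keystrokes_for_word(w, lexicon, n) for w in words)
    return 100.0 * (total_chars - used) / total_chars

# Hypothetical frequency lexicon standing in for a trained language model.
lexicon = {"the": 10, "language": 5, "model": 4, "prediction": 3, "models": 2}
savings = keystroke_savings("the language model", lexicon)
```

A better language model ranks the intended word higher after fewer typed letters, which lowers the keystroke count and raises the savings figure; this is the quantity the improvements described above are reported to increase.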

18.
AppVox is a mobile application that provides support for children with speech and language impairments in their speech therapy sessions, while also allowing autonomous training at home. The application simulates a vocalizer with an audio stimulus feature, which can be used to train and amend the pronunciation of specific words through repetition. In this paper, we aim to present the development of the application as an assistive technology option, by adding new features to the vocalizer as well as assessing it as a usable option for daily training interaction for children with speech and language impairments. To this end, we invited 15 children with speech and language impairments and 20 with no impairments to perform training activities with the application. We also asked three speech therapists and three usability experts to interact with the application, assess it, and give their feedback. In this assessment, we include the following parameters: successful conclusion of the training tasks (effectiveness); number of errors made, as well as number and type of difficulties found (efficiency); and the acceptance and level of comfort in completing the requested tasks (satisfaction). Overall, the results showed that the children concluded the training tasks successfully and that the application helped to improve their language and speech capabilities. Therapists and children gave positive feedback on the AppVox interface.

19.
Giovanni Guida  Carlo Tasso   《Automatica》1983,19(6):759-766
Constructing natural language interfaces to computer systems often requires achievement of advanced reasoning and expert capabilities in addition to basic natural language understanding. In this paper the above issue is tackled in the frame of an actual application concerning the design of a natural language interface for interactive document retrieval. After a short discussion of the peculiarities of this application, which requires both natural language understanding and reasoning capabilities, the general architecture and fundamental design criteria of a system presently being developed at the University of Udine are presented. The system, named IR-NLI, is aimed at allowing non-technical users to directly access through natural language the services offered by online databases. Attention is later focused on the basic functions of IR-NLI, namely understanding, dialogue and reasoning. An example of interaction with IR-NLI is fully worked out to introduce the main features of the system. Knowledge representation methods and algorithms adopted are then illustrated. Perspectives and directions for future research are also discussed.

20.

Speech provides a natural way for human–computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are on the same level in terms of resources and systems for speech synthesis. This work consists of creating publicly available resources for Brazilian Portuguese in the form of a novel dataset along with deep learning models for end-to-end speech synthesis. The dataset contains 10.5 h of speech from a single speaker, on which a Tacotron 2 model with the RTISI-LA vocoder presented the best performance, achieving a MOS value of 4.03. The obtained results are comparable to related works covering the English language and the state of the art in European Portuguese.
