Similar Documents
1.
Text-to-speech systems have attracted a lot of research and development during the last decade. Recently, this work has resulted in relatively inexpensive products with decent speech quality. Text-to-speech systems offer an alternative to presenting text information on screens or paper. The telephone could hence be used as a computer terminal. Very little information is at present available on human factors in the use of these new devices. In this paper we will discuss the use of a multilingual text-to-speech system in various applications related to telecommunications.

2.
Multilingual text processing is useful because the information content found in different languages is complementary, both regarding facts and opinions. While Information Extraction and other text mining software can, in principle, be developed for many languages, most text analysis tools have only been applied to small sets of languages because the development effort per language is large. Self-training tools obviously alleviate the problem, but even the effort of providing training data and of manually tuning the results is usually considerable. In this paper, we gather insights from various multilingual system developers on how to minimise the effort of developing natural language processing applications for many languages. We also explain the main guidelines underlying our own effort to develop complex text mining software for tens of languages. While these guidelines (most of all, extreme simplicity) can be very restrictive and limiting, we believe we have shown the feasibility of the approach through the development of the Europe Media Monitor (EMM) family of applications (http://emm.newsbrief.eu/overview.html). EMM is a set of complex media monitoring tools that process and analyse up to 100,000 online news articles per day in between twenty and fifty languages. We will also touch upon the kind of language resources that would make it easier for all to develop highly multilingual text mining applications. We will argue that, to achieve this, the most needed resources would be freely available, simple, parallel and uniform multilingual dictionaries, corpora and software tools.

3.
Many commercial systems and much R&D work are aimed at easing the information explosion problem resulting from the advent of the Information Superhighway. One solution is to personalize the information to the specific interests of a user. A personalized news system named DeNews has been developed to track multilingual news sources, filter the relevant news articles, learn about the user's interests, sort news articles into defined classes, deliver them in full or summarized form, and translate them into a specific language. Many advanced text and natural language processing techniques are required to implement these functions, to support the multilingual aspect of DeNews, and to manage the huge volume of news articles. It is envisaged that the technology developed with DeNews will be especially suitable in a domain-specific corporate business environment, where accurate and timely information is critical.

4.
A multilingual disaster information system (MLDI) has been developed to overcome the language barrier during times of natural disaster. MLDI is a web-based system that includes templates in nine languages so that translated texts can be made available immediately. Mobile phone e-mail with graphic text is a useful tool for delivering multilingual disaster information. The visibility of graphic text on mobile phones was measured and found to be equivalent to that of the built-in font. However, visibility deteriorates as the character size becomes smaller, especially on displays with poor resolution. This article also discusses the necessity of multilingual information and of measures for a safe and barrier-free society.

5.
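A Survey of Internationalized Text Processing (国际化文字处理综述)   Total citations: 3, self-citations: 0, citations by others: 3
Interaction between a computer and its diverse users usually has to be carried out through the input and output of text in many scripts, so the degree to which an operating system supports multiple scripts is one measure of its functionality. The large differences between the characteristics of the world's scripts make text processing in modern operating systems very complex. This paper summarizes the scope and content of text processing in operating systems, including text input and storage, text processing, and user-interaction handling; outlines general text-processing models and the possible technical approaches, together with their advantages and disadvantages; analyzes the text-processing implementations of common operating systems; and finally looks ahead to the challenges that text processing still faces.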

6.
7.
Pei-Chi Wu, Software, 2000, 30(7): 765-774
Character sets are one of the basic issues for information interchange. Most current national standard character sets extend 7-bit ASCII. These extensions conflict with each other and make the design of multilingual information systems complicated. Unicode, or the Universal Character Set (UCS), is a character set that covers the symbols of the major written languages. Text files and strings usually have no header to indicate which character set is in use, and they currently use one of the national standards by default. The transition from national standards to Unicode may take longer than expected. This paper presents the following methods to help the transition. (1) A text file format of fixed-width characters: if the first character in a text file is a nonzero control code, the file is in UCS; otherwise, it is in the default national standard. The control code indicates which UCS subset or byte order is in use. (2) A tagged string storage: each string has a tag representing which character set or coding format is in use, e.g., the default national standard, the 8-bit subset of UCS-2, UCS-2, or UCS-4. (3) A method for assigning the format of string literals: all string literals use the same syntax notation, and their storage format is the same as that of their source files. These methods can improve multilingual support without introducing much complexity. Copyright © 2000 John Wiley & Sons, Ltd.
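The first two methods above can be illustrated with a short Python sketch. It is not the paper's exact scheme: the abstract only says that a leading nonzero control code marks a UCS file and encodes the subset or byte order, so the byte patterns used here (the position of zero bytes around the control code selecting UCS-2/UCS-4 and big/little endian), the exclusion of tab/LF/CR, and the default national codec `big5` are all assumptions chosen for illustration. `Encoding`, `detect_format`, `decode_file` and `TaggedString` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Encoding(Enum):
    NATIONAL = "national"  # default national standard (assumed: Big5)
    UCS2_BE = "ucs-2-be"
    UCS2_LE = "ucs-2-le"
    UCS4_BE = "ucs-4-be"
    UCS4_LE = "ucs-4-le"


# Python codec and character width for each UCS variant (UCS-2 decoded as UTF-16).
_CODECS = {
    Encoding.UCS2_BE: ("utf-16-be", 2),
    Encoding.UCS2_LE: ("utf-16-le", 2),
    Encoding.UCS4_BE: ("utf-32-be", 4),
    Encoding.UCS4_LE: ("utf-32-le", 4),
}


def _is_marker(b: int) -> bool:
    # Nonzero C0 control code; tab/LF/CR excluded (assumption) since plain files may start with them.
    return 0 < b < 0x20 and b not in (0x09, 0x0A, 0x0D)


def detect_format(data: bytes) -> Encoding:
    """Classify a fixed-width text file by its first character (hypothetical byte patterns)."""
    if len(data) >= 4 and data[:3] == b"\x00\x00\x00" and _is_marker(data[3]):
        return Encoding.UCS4_BE
    if len(data) >= 4 and _is_marker(data[0]) and data[1:4] == b"\x00\x00\x00":
        return Encoding.UCS4_LE
    if len(data) >= 2 and data[0] == 0 and _is_marker(data[1]):
        return Encoding.UCS2_BE
    if len(data) >= 2 and _is_marker(data[0]) and data[1] == 0:
        return Encoding.UCS2_LE
    return Encoding.NATIONAL  # no marker: fall back to the default national standard


def decode_file(data: bytes, national_codec: str = "big5") -> str:
    enc = detect_format(data)
    if enc is Encoding.NATIONAL:
        return data.decode(national_codec)
    codec, width = _CODECS[enc]
    return data[width:].decode(codec)  # skip the leading marker character


@dataclass(frozen=True)
class TaggedString:
    """A string stored together with a tag naming its coding format (method 2)."""
    tag: Encoding
    payload: bytes

    def text(self, national_codec: str = "big5") -> str:
        if self.tag is Encoding.NATIONAL:
            return self.payload.decode(national_codec)
        return self.payload.decode(_CODECS[self.tag][0])
```

For example, `decode_file(b"\x00\x01" + "中文".encode("utf-16-be"))` is recognized as big-endian UCS-2, while a file starting with ordinary bytes such as `b"hello"` is left to the national codec. Files without a marker keep working unchanged, which is the backward-compatibility point the abstract emphasizes.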

8.
9.
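Research on Speech Recognition for Low-Resource Mongolian (资源稀缺蒙语语音识别研究)   Total citations: 1, self-citations: 1, citations by others: 0
张爱英 (Zhang Aiying), 倪崇嘉 (Ni Chongjia), Computer Science (计算机科学), 2017, 44(10): 318-322
With the development of speech recognition technology, research on speech recognition systems for low-resource languages has attracted increasingly broad attention. Taking Mongolian as the target language, this paper studies how information from other languages can be used to improve recognition performance when resources are scarce (e.g., when only 10 hours of transcribed speech are available). Cross-lingual transfer learning based on a multilingual deep neural network, together with features extracted by a multilingual deep bottleneck neural network, yields more discriminative acoustic models. Large amounts of web data, obtained through search engines and targeted web crawling, provide text data that strengthens the language model. Multiple recognition results are then fused to further improve recognition accuracy. Compared with the baseline system, the fusion of multiple systems reduces the absolute recognition error rate by 12%.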

10.
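Given the increasingly abundant multilingual text data, and in order to classify texts in different languages under a single category system and make full use of the value of multilingual text information, this paper proposes BiLSTM-CNN, a multilingual text classification model that combines bidirectional long short-term memory units with a convolutional neural network. For each language, a bidirectional LSTM network extracts text features, and a convolutional neural network is added to refine them, producing a deeper text representation for each language; finally, the representations of all languages are concatenated and fed into a softmax function to predict the category. Experiments on a Chinese-English-Korean parallel dataset of scientific and technical literature show that, compared with the baseline method, the proposed method improves classification accuracy by 4%, classifies text in any one of the languages correctly, and extends well to other languages.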
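The final stage described above (concatenating the per-language representations and predicting a category with softmax) can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the BiLSTM and CNN encoders are not reproduced, so the per-language vectors are taken as given, and the linear layer (`weights`, `bias`) in front of the softmax is an assumption about how the concatenated vector is mapped to class scores. `classify` is a hypothetical name.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(per_language: Sequence[Sequence[float]],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> list[float]:
    """Concatenate per-language text vectors, apply a linear layer, then softmax.

    per_language: one vector per language (e.g. zh, en, ko), assumed to come
    from upstream BiLSTM + CNN encoders that are not shown here.
    weights/bias: hypothetical linear layer, one row per category.
    """
    features = [x for vec in per_language for x in vec]  # concatenation
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the concatenated feature length")
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

Concatenation keeps each language's representation in its own slice of the feature vector, so the classifier can learn language-specific weights; the predicted category is simply the index of the largest probability.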

11.
12.

In this paper we present an implemented account of multilingual linguistic resources for multilingual text generation that significantly improves the degree of reuse of resources both across languages and across applications. We argue that this is a necessary step for multilingual generation in order to reduce the high cost of constructing linguistic resources and to make natural language generation relevant for a wider range of applications, in particular, in this paper, for multilingual software and user interfaces. We begin by contrasting a weak and a strong approach to multilinguality in the state of the art in multilingual text generation. Neither approach has provided sufficient principles for organizing multilingual work. We then introduce our framework, in which multilingual variation is included as an intrinsic feature of all levels of representation. We provide an example of multilingual tactical generation using this approach and discuss some of the performance, maintenance, and development issues that arise.

13.
In this paper, we look at the current scenario in multilingual documentation generation and the types of tools currently being used in support of the translation task, and discuss their shortcomings. We examine emergent trends in the document industry, observing a reorganisation of the workflow which mirrors a shift of attention from translating to authoring and from the ergonomics of post-editing the target text to the ergonomics of producing the source text. We argue that these trends invite the design and development of new tools for the task of producing multilingual texts, and that multilingual generation provides the appropriate technology, shifting attention to an even earlier stage in the authoring process, that of specifying the semantics of the text to be produced. We describe a prototype system which exploits this technology to meet the expressed needs of authors and translators by supporting them in the drafting of multilingual instructions. We suggest that, in the future, a single platform to support multilingual documentation should integrate translation-oriented tools and generation-based tools to be employed as appropriate by different types of users (translators and authors) in different circumstances.

14.
This article describes two different word sense disambiguation (WSD) systems: one applicable to parallel corpora and requiring aligned wordnets, and the other, knowledge-poorer but more relevant for real applications, relying on unsupervised learning methods and only monolingual data (text and wordnet). Comparing the performance of word sense disambiguation systems is a very difficult evaluation task when different sense inventories are used, and even more difficult when the sense distinctions are not of the same granularity. However, as we used the same sense inventory, the performance of the two WSD systems can be objectively compared, and we provide evidence that multilingual WSD is more precise than monolingual WSD.

15.
Enhancing portability with multilingual ontology-based knowledge management   Total citations: 1, self-citations: 0, citations by others: 1
Information systems in multilingual environments, such as the EU, suffer from low portability and high deployment costs. In this paper we propose an ontology-based model for multilingual knowledge management in information systems. Our unique feature is a lightweight mechanism, dubbed context, that is associated with ontological concepts and specified in multiple languages. We use contexts to assist in resolving cross-language and local-variation ambiguities. Equipped with such a model, we next provide a four-step procedure for overcoming the language barrier when deploying a new information system. We also show that our proposed solution can overcome differences that stem from local variations that may accompany the deployment of multilingual information systems. The proposed mechanism was tested in an actual multilingual eGovernment environment and with real-world news syndication traces. Our empirical results serve as a proof of concept of the viability of the proposed model. Our experiments also show that news items in different languages can be identified by a single ontology concept using contexts. We also evaluated the local interpretations of a language's concepts in different geographical locations.

16.
17.
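Because automatic scoring of subjective questions is currently lacking in multilingual teaching, this paper proposes an automatic scoring system for subjective questions based on a Siamese network and the BERT model. The question text and the answer text of a subjective question are passed through the pretrained natural-language model BERT to obtain sentence vectors. Because BERT has been pretrained on large-scale corpora in many languages, the resulting text vectors contain rich contextual semantic information and can handle multiple languages. Then ...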

18.
C. Paris, K. Vander Linden, Computer, 1996, 29(7): 49-56
Machine translation has been the dominant paradigm for automated multilingual document production. In this paradigm, a technical writer produces a source text, which the computer system translates into another language and which is then edited. One problem with machine translation, however, is that its output is typically constrained by the original text's style and language. Automatic language-generation systems, by contrast, start with an underlying knowledge base that represents the text's content without dictating its language or style. However, most automatic systems are stand-alone tools that leave technical writers out of the loop. These systems assume that an underlying knowledge base containing the necessary information is available or can easily be obtained. This is not necessarily the case, though, when producing instruction manuals. For example, the knowledge base required to produce instructions should contain user-oriented information. User-oriented documentation, which concerns the ways the product can help users achieve their goals, is more effective than documentation that focuses on the product. Only a technical writer can specify user-oriented information. It is thus preferable to have a document-generation system that works with the writer. With this in mind, we developed Drafter, an interactive document-drafting tool that can be integrated into technical writers' working practices and that can automatically and simultaneously generate appropriately worded drafts in several languages. Drafter's current domain of application is software manuals.

19.
Text classification systems can help solve the text clustering problem for the Azerbaijani language. Text classification applications exist for other languages, but we set out to build a new system that solves this problem for Azerbaijani. First, we identified potential areas of application. The system will be useful in many areas. It will mostly be used for news feed categorization: news websites can automatically sort news into classes such as sports, business, education, and science. The system is also used for sentiment analysis of product reviews. For example, a company shares a photo of a new product on Facebook and receives a thousand comments about it; the system classifies these comments as positive or negative. The system can also be applied to recommender systems, spam filtering, and similar tasks. Various machine learning techniques, such as Naive Bayes, SVM, and Multi-layer Perceptron, have been applied to the text classification problem for the Azerbaijani language.
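Of the techniques listed above, Naive Bayes is the simplest to show end to end. The following is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing in plain Python; it is not the authors' system. The whitespace tokenizer, the `NaiveBayes` class name, and the toy training sentences and labels are all hypothetical. Note that Python's `str.lower()` is not Turkic-aware (it maps `I` to `i`, not `ı`), so a real Azerbaijani pipeline would need locale-specific case folding.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()                  # documents per class
        self.word_counts: dict = defaultdict(Counter)         # token counts per class
        self.vocab: set = set()

    @staticmethod
    def tokenize(text: str) -> list:
        # Whitespace split + lower(); lower() is NOT Turkic-aware (I -> i, not ı).
        return text.lower().split()

    def fit(self, docs, labels) -> "NaiveBayes":
        for doc, label in zip(docs, labels):
            tokens = self.tokenize(doc)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + v      # smoothed denominator
            score = math.log(n_docs / total_docs)  # log prior
            for tok in self.tokenize(doc):
                if tok in self.vocab:             # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A toy usage, with made-up news snippets:

```python
clf = NaiveBayes().fit(
    ["futbol oyun qol", "komanda futbol matç", "bazar qiymət bank", "bank kredit bazar"],
    ["sports", "sports", "business", "business"],
)
clf.predict("futbol matç")  # "sports"
```

Working in log space avoids underflow when many small probabilities are multiplied, and the add-one term keeps an unseen class/word pair from zeroing out a class.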

20.
Video text often contains highly useful semantic information that can contribute significantly to video retrieval and understanding. Video text can be classified into scene text and superimposed text. Most previous methods detect superimposed or scene text separately because of their different text alignments. Moreover, because the characters of different languages have different edge and texture features, multilingual text is very difficult to detect. In this paper, we first perform a detailed analysis of the motion patterns of video text and show that superimposed and scene text exhibit different motion patterns over consecutive frames, a property that is insensitive to multiple language characters and multiple text alignments. Based on this analysis, we define the Motion Perception Field (MPF) to represent text motion patterns. Finally, we propose a text detection algorithm using MPF for both superimposed and scene text with multiple languages and multiple alignments. Experimental results on diverse videos demonstrate that our algorithm is robust and outperforms previous methods for detecting both superimposed and scene text with multiple languages and multiple alignments.
