Similar literature
1.
This paper provides a case study of a multilingual knowledge management system for a large organization. In so doing, we elicit what it means for a system to be “multilingual” and how that changes some previous research on knowledge management. Some researchers have taken “multilingual” to mean a multilingual user interface. However, that is only a small part of the story. In this case we find that “multilingual” also covers a broad range of concerns, including multilingual knowledge resources, multilingual feedback from users, multilingual search, and multilingual ontologies.

2.
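With the rapid development of multimedia information and communication technology, multilingual speech data on the network is growing steadily. Speech recognition is the core technology of speech analysis and processing, and how to quickly extend the processing capabilities available for a few high-resource major languages, such as Chinese and English, to more low-resource languages is a bottleneck that current recognition technology urgently needs to break through. This paper attempts to summarize recent progress in acoustic modeling and to discuss the difficulties that traditional speech recognition technology may face in moving from monolingual to multilingual recognition. And in...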

3.
Structure and Implementation Methods of Multilingual Web Sites   Total citations: 2, self-citations: 0, citations by others: 2
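The goal is to simplify the management and development of multilingual Web site services. In practice, most of the work in managing and developing a multilingual site lies in keeping the site's different kinds of information independent of one another. Many human roles are involved in developing and managing a multilingual site, such as designers, implementers (e.g., programmers), system administrators, translators, and users. The site's information is strictly classified according to these roles, and the classified information is kept mutually independent within the same site; that is, translators do not need to see scripting languages such as JavaScript. Likewise, graphic designers need not be proficient in multiple languages or work in a multilingual environment. The paper discusses how to design and implement a multilingual Web site from these perspectives.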

4.
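In recent years, methods that convert public security data into graph form and use graph neural networks to construct node representations for downstream tasks have made full use of the entity and relation information in public security data and achieved good results. Improving model effectiveness requires large amounts of high-quality data, but such data usually belongs to governments, companies, and organizations, so it is difficult to learn an effective event detection model by centralizing the data. Because data owners differ in the topics they focus on and in when they collect data, the data is non-IID across owners. Traditional methods, which assume that a single global model suits all clients, struggle with this problem. This paper proposes PPSED, a public security emergency detection method based on reinforced federated graph neural networks, in which clients collaboratively train personalized models to solve their local emergency detection tasks. It designs a local training and gradient quantization module for the federated detection model: GraphSage with a graph-sampling-based minibatch mechanism builds the local detection model to reduce the impact of non-IID data, and gradient quantization reduces the cost of gradient communication. It designs a client state awareness module based on random graph embedding, which better preserves the valuable gradient information of client models while protecting privacy. It designs a personalized gradient aggregation and quantization strategy for the reinforced federated graph neural network: DDPG fits the personalized federated learning weighting strategy for gradient aggregation, and the weights determine whether a gradient is quantized, balancing model performance against communication load. Extensive experiments on a public security dataset collected from the Weibo platform and on three public graph datasets demonstrate the effectiveness of the proposed method.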
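
The gradient quantization step named in the abstract above can be illustrated with a minimal sketch. The abstract does not say which quantization scheme PPSED uses, so the uniform min-max quantizer below is an assumption, and the names `quantize`, `dequantize`, `maybe_quantize`, and the `threshold` rule are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple


def quantize(grad: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Uniformly map each gradient value to an integer code in [0, 2**bits - 1].

    Assumption: simple min-max uniform quantization, not the paper's scheme.
    """
    lo, hi = min(grad), max(grad)
    levels = (1 << bits) - 1
    # Guard against a constant gradient, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((g - lo) / scale) for g in grad]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Recover approximate gradient values from integer codes."""
    return [lo + c * scale for c in codes]


def maybe_quantize(grad: List[float], weight: float,
                   threshold: float = 0.5, bits: int = 8) -> List[float]:
    """Hypothetical rule: keep high-weight gradients exact, quantize the rest.

    The abstract only says the aggregation weight decides whether a gradient
    is quantized; the direction and the threshold here are assumptions.
    """
    if weight >= threshold:
        return grad
    return dequantize(*quantize(grad, bits))
```

With 8 bits each value is sent as one byte plus two shared floats (`lo`, `scale`), and the reconstruction error per value is at most half a quantization step.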

5.
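This paper introduces a method for displaying multilingual HTML text. By setting up a miniature font server, the method transmits character information in non-image form, which saves transmission bandwidth and increases transmission speed. The paper discusses the method's basic principle and implementation, and describes its characteristics in comparison with traditional methods.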

6.
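Multilingual question answering is one of the research hotspots in natural language processing. Its goal is for a model to return the correct answer given questions and texts in different languages. With the rapid development of machine translation and the wide application of multilingual pre-training in natural language processing, multilingual question answering has also advanced quickly. This paper first systematically reviews existing work on multilingual question answering methods, dividing them into feature-based, translation-based, pre-training-based, and dual-encoding-based methods, and describes the use and characteristics of each class. It then systematically examines existing work on multilingual question answering tasks, dividing them into text-based and multimodal tasks, and gives a basic definition of each task. Next, it summarizes the dataset statistics, evaluation metrics, and question answering methods involved in these tasks. Finally, it discusses future directions for multilingual question answering.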

7.
The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which generally are available as documents. To facilitate users' navigation of documents within a knowledge repository, knowledge maps, often created by document clustering techniques, represent an appealing and promising approach. Various document clustering techniques have been proposed in the literature, but most deal with monolingual documents (i.e., written in the same language). However, as a result of increased globalization and advances in Internet technology, an organization often maintains documents in different languages in its knowledge repositories, which necessitates multilingual document clustering (MLDC) to create organizational knowledge maps. Motivated by the significance of this demand, this study designs a Latent Semantic Indexing (LSI)-based MLDC technique capable of generating knowledge maps (i.e., document clusters) from multilingual documents. The empirical evaluation results show that the proposed LSI-based MLDC technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision, and is capable of maintaining a good balance between monolingual and cross-lingual clustering effectiveness when clustering a multilingual document corpus.
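
The abstract above reports cluster recall and cluster precision but does not define them. The sketch below uses one common pairwise definition as an assumption: a pair of documents counts as an "association" when both fall in the same cluster, precision is the share of generated associations that are correct, and recall is the share of reference associations that are recovered. The function names and the toy document IDs are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def same_cluster_pairs(labels: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Return every unordered document pair that shares a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(labels, 2)
        if labels[a] == labels[b]
    }


def cluster_precision_recall(predicted: Dict[str, str],
                             reference: Dict[str, str]) -> Tuple[float, float]:
    """Pairwise cluster precision and recall (assumed definition)."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(reference)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Toy multilingual corpus: two English and three Chinese documents.
reference = {"en1": "A", "zh1": "A", "en2": "B", "zh2": "B", "zh3": "B"}
predicted = {"en1": "0", "zh1": "0", "en2": "0", "zh2": "1", "zh3": "1"}
precision, recall = cluster_precision_recall(predicted, reference)
```

Restricting the pairs to same-language or cross-language documents before counting would give separate monolingual and cross-lingual scores, which is one way to examine the balance the abstract mentions.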

8.
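As Web resources grow richer, people need cross-language knowledge sharing and information retrieval. A multilingual ontology can describe knowledge of related domains in different languages and overcome the barriers posed by different cultures and languages. This paper analyzes and compares existing methods for building multilingual ontologies and proposes a construction method based on a core concept set, which describes the concepts of a domain using a language-independent ontology together with synonym sets of definitions and vocabulary drawn from different natural languages. An ontology built with this method has good extensibility, expressiveness, and reasoning capability, and the method is particularly suited to creating large ontologies in distributed environments.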
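
The structure described in the abstract above, a language-independent concept layer with per-language synonym sets attached, might be modeled roughly as follows. This is a sketch under assumptions, not the paper's design: the class names, the single-parent hierarchy, and the concept IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Concept:
    """A language-independent concept with per-language synsets."""
    concept_id: str
    parent: Optional[str] = None
    synsets: Dict[str, Set[str]] = field(default_factory=dict)
    definitions: Dict[str, str] = field(default_factory=dict)


class MultilingualOntology:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def add_concept(self, concept_id: str, parent: Optional[str] = None) -> None:
        self.concepts[concept_id] = Concept(concept_id, parent)

    def add_terms(self, concept_id: str, lang: str, terms: Iterable[str],
                  definition: Optional[str] = None) -> None:
        """Attach a language's vocabulary without touching the concept layer."""
        concept = self.concepts[concept_id]
        concept.synsets.setdefault(lang, set()).update(terms)
        if definition is not None:
            concept.definitions[lang] = definition

    def lookup(self, term: str, lang: str) -> List[str]:
        """Find the concepts that a term denotes in the given language."""
        return [cid for cid, c in self.concepts.items()
                if term in c.synsets.get(lang, set())]

    def translate(self, term: str, source: str, target: str) -> Set[str]:
        """Cross-language lookup: go through the shared concept layer."""
        result: Set[str] = set()
        for cid in self.lookup(term, source):
            result |= self.concepts[cid].synsets.get(target, set())
        return result

    def ancestors(self, concept_id: str) -> List[str]:
        """Walk the parent chain, a minimal stand-in for reasoning."""
        chain: List[str] = []
        parent = self.concepts[concept_id].parent
        while parent is not None:
            chain.append(parent)
            parent = self.concepts[parent].parent
        return chain
```

Because vocabulary hangs off concepts rather than the other way round, adding a new language only calls `add_terms` and leaves the concept hierarchy unchanged, which is one reading of the extensibility the abstract claims.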

9.
Research on Building an Event-Oriented Multilingual Parallel Corpus   Total citations: 2, self-citations: 0, citations by others: 2
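This paper discusses several basic issues in building a multilingual corpus for the Beijing Olympics. It proposes event-oriented, multi-domain corpus collection principles, defines annotation guidelines that carry classification information, and builds an initial controlled multilingual corpus of nearly 70,000 sentence pairs.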

10.
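Design and Implementation of a Uyghur, Kazakh, and Kyrgyz Multi-Script Full-Text Search Engine   Total citations: 1, self-citations: 0, citations by others: 1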
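Building on existing Web-based full-text information retrieval technology, this paper studies in depth the current state of Web information retrieval for Uyghur, Kazakh, and Kyrgyz and the key problems in computer processing of these languages and scripts, and presents the design and implementation of a Web-based Uyghur, Kazakh, and Kyrgyz full-text search engine. Through the design and implementation of a search engine for minority languages, it describes in detail the system architecture of the multi-script full-text search engine, the function of each module, the key problems, and their solutions, providing Uyghur, Kazakh, and Kyrgyz Web users with a new means of information retrieval.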

11.
A lack of surveillance system infrastructure in the Asia-Pacific region is seen as hindering the global control of rapidly spreading infectious diseases such as the recent avian H5N1 epidemic. As part of improving surveillance in the region, the BioCaster project aims to develop a system based on text mining for automatically monitoring Internet news and other online sources in several regional languages. At the heart of the system is an application ontology which serves the dual purpose of enabling advanced searches on the mined facts and of allowing the system to make intelligent inferences for assessing the priority of events. However, it became clear early on in the project that existing classification schemes did not have the necessary language coverage or semantic specificity for our needs. In this article we present an overview of our needs and explore in detail the rationale and methods for developing a new conceptual structure and multilingual terminological resource that focusses on priority pathogens and the diseases they cause. The ontology is made freely available as an online database and downloadable OWL file.

12.
万芳  袁保社 《现代计算机》2011,(15):71-73,80
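Information systems used in Xinjiang must support Uyghur, Kazakh, and Kyrgyz, and because of the particular way these scripts combine characters and their editing direction, they need special handling in a system. This paper describes how Eclipse 3.2 and Tomcat 5.0 were used to develop a population information management system that collects data from diverse sources and displays, processes, stores, and prints information in Uyghur, Kazakh, Kyrgyz, Chinese, and other languages. Operation in communities, townships, and counties shows that the system is easy to use, runs stably, keeps data secure, and performs well overall. Wider adoption of the software can further strengthen population service management and raise the level of modernization of community population management in ethnic minority areas.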

13.
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting natural language processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that a consensual specification on monolingual, bilingual and multilingual lexicons can be a useful aid for the various NLP actors. Within ISO, one purpose of Lexical Markup Framework (LMF, ISO-24613) is to define a standard for lexicons that covers multilingual lexical data.
Claudia Soria

14.
The article describes aspects of the development of a conversational natural language understanding (NLU) system done during the first year of the European research project CATCH-2004 (Converse in AThens Cologne and Helsinki) [http://www.catch2004.org]. The project is co-funded by the European Union in the scope of the IST programme (IST 1999-11103).

Its objectives focus on multi-modal, multi-lingual conversational natural language access to information systems. The paper emphasises the architecture and the telephony-based speech and NLU components, as well as aspects of the implementation of a city event information (CEI) system in English, Finnish, German and Greek. The CEI system accesses two different databases in Athens and Helsinki using a common retrieval interface. Furthermore, the paper singles out the methodologies involved in acoustic and language modelling for the speech recognition component, and in parsing techniques and dialog modelling for the conversational natural language subsystem. For the implementation, it outlines an incremental system refinement methodology necessary to adapt the system components to real-life data. It addresses the implementation of language-specific characteristics and a common dialog design for all four languages, and also deals with aspects of moving towards a multilingual conversational system. Finally, it presents prospects for further developments of the project.


15.
While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we review related literature and suggest the criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, consisting of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a medical search engine in Chinese are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture is feasible in developing search engines easily in different languages such as Chinese, Spanish, Japanese, and Arabic.

16.
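This paper discusses several basic issues in building a multilingual corpus for the Beijing Olympics. It proposes event-oriented, multi-domain corpus collection principles, defines annotation guidelines that carry classification information, and builds an initial controlled multilingual corpus of nearly 70,000 sentence pairs.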

17.
Research on Low-Resource Mongolian Speech Recognition   Total citations: 1, self-citations: 1, citations by others: 0
张爱英  倪崇嘉 《计算机科学》2017,44(10):318-322
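As speech recognition technology develops, research on speech recognition systems for low-resource languages has attracted wider attention. Taking Mongolian as the target language, this paper studies how to use information from other languages to improve recognition performance when resources are scarce (e.g., only 10 hours of transcribed speech). Cross-lingual transfer learning based on multilingual deep neural networks, and feature extraction based on multilingual deep bottleneck neural networks, yield more discriminative acoustic models. Large amounts of Web page data obtained through search engines and targeted crawling supply text data to strengthen the language model. Multiple recognition results are fused to further improve accuracy. Compared with the baseline system, fusing multiple systems reduces the recognition error rate by 12% absolute.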

18.
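The International Safety Management (ISM) Code is a mandatory code established by the International Maritime Organization to prevent and reduce marine accidents involving ships. In strict compliance with the Code and taking into account the actual circumstances of shipping companies, this paper designs and implements an ISM Code management system, and describes the system's structure, functions, and implementation.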

19.
This paper presents a method for determining the up/down orientation of text in a scanned document of unknown orientation, so that it can be appropriately rotated and processed by an optical character recognition (OCR) engine. The method analyzes the “open” portions of text blobs to determine the direction in which the open portions face. By determining the respective densities of blobs opening in a pair of opposite directions (e.g., right or left), the method can establish the direction in which the text as a whole is oriented. We first describe a method for determining the up/down orientation of roman text based on the asymmetry in the openness of most roman letters in the horizontal direction. For non-roman text such as Pashto and Hebrew, we provide a method that determines a direction that is the most asymmetric, and therefore the most useful for the determination of text orientation, given a training data set of documents of known orientation. This work can be adapted for use in automated mail processing or to determine the orientation of checks in automated teller machine envelopes, scanned or copied documents, documents sent via facsimile, and digital photographs that include text (e.g., road signs, business cards, driver's licenses), among other applications.

20.
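To address the low program-space utilization and poor generality of multi-language implementations in embedded system software design, this paper presents a method for the C51 environment. It builds multiple indexes for the characters of different languages and designs a base character library together with an index structure for display-interface strings; the application then achieves multi-language display on a dot-matrix LCD through simple pointer calls, and a single key switches the language. The method saves space and is simple and practical. It has worked well in a greenhouse environment information acquisition and control system designed by the authors.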
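
The indexing idea in the abstract above, a shared base character library plus per-language string index tables, can be modeled as in the sketch below. The original work targets C51 firmware with dot-matrix bitmaps and pointers; this Python version only illustrates the data layout, and every name in it (`GlyphLibrary`, `StringTable`, the string IDs) is hypothetical. Characters stand in for the glyph bitmaps a real system would store.

```python
from typing import Dict, Iterable, List


class GlyphLibrary:
    """Base character library: each distinct character is stored once."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}
        self.chars: List[str] = []  # stand-in for dot-matrix bitmaps

    def intern(self, ch: str) -> int:
        """Return the glyph index for a character, adding it if new."""
        if ch not in self._index:
            self._index[ch] = len(self.chars)
            self.chars.append(ch)
        return self._index[ch]


class StringTable:
    """Per-language tables mapping a string ID to a list of glyph indexes."""

    def __init__(self, languages: Iterable[str]) -> None:
        self.languages = list(languages)
        self.glyphs = GlyphLibrary()
        self.table: Dict[str, Dict[str, List[int]]] = {
            lang: {} for lang in self.languages
        }
        self.current = 0  # index of the active language

    def add(self, string_id: str, lang: str, text: str) -> None:
        self.table[lang][string_id] = [self.glyphs.intern(c) for c in text]

    def switch_language(self) -> None:
        """Model the single-key language switch: cycle to the next language."""
        self.current = (self.current + 1) % len(self.languages)

    def render(self, string_id: str) -> str:
        """Resolve a string ID in the active language via the glyph indexes."""
        lang = self.languages[self.current]
        return "".join(self.glyphs.chars[i] for i in self.table[lang][string_id])
```

Application code refers only to string IDs, so switching language changes one index rather than any call site, and characters repeated across strings or languages occupy storage once.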
