Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
This paper provides a case study of a multilingual knowledge management system for a large organization. In doing so, we elicit what it means for a system to be “multilingual” and how that changes some previous research on knowledge management. Some researchers have viewed “multilingual” as meaning a multilingual user interface. However, that is only a small part of the story. In this case we find that “multilingual” also covers a broad range of concerns, including multilingual knowledge resources, multilingual feedback from users, multilingual search, and multilingual ontologies.

2.
Traditional Machine Translation (MT) systems are designed to translate documents. In this paper we describe an MT system that translates the closed captions that accompany most North American television broadcasts. This domain has two identifying characteristics. First, the captions themselves have properties quite different from the type of textual input that many MT systems have been designed for, because captions generally represent speech and hence contain many of the phenomena that characterize spoken language. Second, the operational characteristics of the closed-caption domain are also quite distinctive. Unlike most other translation domains, the translated captions are only one of several sources of information available to the user. In addition, the user has limited time to comprehend the translation, since captions only appear on the screen for a few seconds. In this paper, we look at some of the theoretical and implementational challenges that these characteristics pose for MT. We present a fully automatic large-scale multilingual MT system, ALTo. Our approach is based on Whitelock's Shake and Bake MT paradigm, which relies heavily on lexical resources. The system currently provides wide-coverage translation from English to Spanish. In addition to discussing the design of the system, we also address the evaluation issues associated with this domain and report on our current performance.

3.
This paper describes a new advance in solving Cross-Lingual Question Answering (CL-QA) tasks. It is built on three main pillars: (i) the use of several multilingual knowledge resources to reference words between languages (the Inter Lingual Index (ILI) module of EuroWordNet and the multilingual knowledge encoded in Wikipedia); (ii) the consideration of more than one translation per word when searching for candidate answers; and (iii) the analysis of the question in the original language, without any translation process. This novel approach avoids the errors introduced by the Machine Translation (MT) services commonly used by CL-QA systems. We also present studies and experiments that justify the importance of analyzing whether or not a Named Entity should be translated. Experimental results in bilingual scenarios show that our approach outperforms an MT-based CL-QA approach, achieving an average improvement of 36.7%.
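The second pillar above (keeping several candidate translations per word, while leaving named entities untranslated) can be sketched as follows. The tiny English-to-Spanish lexicon and named-entity list are invented for illustration; the paper's actual resources are EuroWordNet's ILI and Wikipedia.

```python
from itertools import product

# Hypothetical English -> Spanish lexicon; each word keeps SEVERAL candidate
# translations (pillar ii), and named entities are left untranslated.
LEXICON = {
    "bank": ["banco", "orilla"],   # two senses, both kept as candidates
    "Paris": ["París"],
}
NAMED_ENTITIES = {"Paris"}

def expand_query(words):
    """Return every combination of candidate translations for the query."""
    options = []
    for w in words:
        if w in NAMED_ENTITIES:
            options.append([w])              # do not translate named entities
        else:
            options.append(LEXICON.get(w, [w]))
    return [" ".join(c) for c in product(*options)]

print(expand_query(["bank", "Paris"]))  # → ['banco Paris', 'orilla Paris']
```

All combinations are retained as search candidates, rather than committing to a single (possibly wrong) translation up front.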

4.
Globalisation of crime poses a serious threat to the international community and is a matter of growing concern to law enforcement agencies all over the world. In the fight against international and organized crime, the European Union (EU) has supported a number of research and development projects within the domain of law enforcement, focusing on cross-border communication, information extraction and data analysis in a multilingual context, as well as terminology and knowledge management. LinguaNet is a case in point and the only such project involving a multilingual messaging system. A high level of user involvement was a prominent feature of the project, and the resulting software – the LinguaNet system – has gained widespread recognition and usage. The paper gives an overview of the LinguaNet approach as a whole, emphasising the temporary experimental embedding of fully automatic MT in a multilingual messaging system. The system is intended for use by professionals with no background in linguistics but in great need of fast and robust communication. One conclusion drawn from this experiment is that authoring errors proved to be far more counter-productive than the insufficiencies of MT. Another is that it is preferable to let the recipient of a message request a machine translation rather than providing it automatically up front. In more general terms, the police liked the approach and reported the need for more message-type templates and MT facilities. Finally, the project led to the formation of a European LinguaNet user group network. This revised version was published online in November 2006 with corrections to the Cover Date.

5.
One may indicate the potentials of an MT system by stating what text genres it can process, e.g., weather reports and technical manuals. This approach is practical, but misleading, unless domain knowledge is highly integrated in the system. Another way to indicate which fragments of language the system can process is to state its grammatical potentials, or more formally, which languages the grammars of the system can generate. This approach is more technical and less understandable to the layman (customer), but it is less misleading, since it stresses the point that the fragments which can be translated by the grammars of a system need not necessarily coincide exactly with any particular genre. Generally, the syntactic and lexical rules of an MT system allow it to translate many sentences other than those belonging to a certain genre. On the other hand it probably cannot translate all the sentences of a particular genre. Swetra is a multilanguage MT system defined by the potentials of a formal grammar (standard referent grammar) and not by reference to a genre. Successful translation of sentences can be guaranteed if they are within a specified syntactic format based on a specified lexicon. The paper discusses the consequences of this approach (Grammatically Restricted Machine Translation, GRMT) and describes the limits set by a standard choice of grammatical rules for sentences and clauses, noun phrases, verb phrases, sentence adverbials, etc. Such rules have been set up for English, Swedish and Russian, mainly on the basis of familiarity (frequency) and computer efficiency, but restricting the grammar and making it suitable for several languages poses many problems for optimization. Sample texts — newspaper reports — illustrate the type of text that can be translated with reasonable success among Russian, English and Swedish.
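The grammatically restricted idea can be reduced to a minimal sketch: a sentence is accepted for translation only if it parses within the system's small grammar over its lexicon. The grammar and lexicon below are toy assumptions, not Swetra's standard referent grammar.

```python
# Toy lexicon mapping words to parts of speech (an invented stand-in).
LEXICON = {"the": "Det", "dog": "N", "report": "N", "barks": "V"}

def in_format(sentence):
    """Accept only Det N V sentences; anything else is outside the format."""
    tags = [LEXICON.get(w) for w in sentence.split()]
    return tags == ["Det", "N", "V"]

print(in_format("the dog barks"))    # True: within the restricted format
print(in_format("dog the barks"))    # False: outside the format
```

Within the accepted fragment, translation quality can be guaranteed; outside it, the system makes no promises, which is exactly the trade-off GRMT accepts.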

6.
This paper describes our work on developing a language-independent technique for the discovery of implicit knowledge from multilingual information sources. Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from around the world. However, most current text mining tools focus mainly on processing monolingual (particularly English) documents: little attention has been paid to applying the techniques to documents in Asian languages, or to extending the mining algorithms to support multilingual information sources. In this work, we attempt to develop a language-neutral method to tackle the linguistic difficulties in the text mining process. Using a variation of automatic clustering techniques based on a neural net approach, namely Self-Organizing Maps (SOM), we have conducted several experiments to uncover associated documents in a Chinese corpus, Chinese-English bilingual parallel corpora, and a hybrid Chinese-English corpus. The experiments show some interesting results and a couple of potential paths for future work in the field of multilingual information discovery. This work is also intended as a starting point for exploring the impact of linguistic issues on machine-learning approaches to mining sensible linguistic elements from multilingual text collections.
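The clustering step can be illustrated with a heavily simplified, SOM-style competitive-learning sketch. Everything here is an assumption for illustration: there is no neighbourhood function, the bag-of-words vectors are tiny, and the map is initialised deterministically rather than randomly, none of which matches the paper's actual setup.

```python
import numpy as np

# Bag-of-words vectors for four tiny "documents" covering two topics.
docs = np.array([[1.0, 1.0, 0.0, 0.0],
                 [1.0, 0.8, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0],
                 [0.0, 0.0, 0.8, 1.0]])

# Two map nodes, initialised from one document of each topic for determinism.
weights = docs[[0, 2]].copy()

for epoch in range(50):
    lr = 0.5 * (1 - epoch / 50)                    # decaying learning rate
    for x in docs:
        # Best matching unit: the node closest to the document vector.
        bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))
        weights[bmu] += lr * (x - weights[bmu])    # pull the winner toward x

clusters = [int(np.argmin(((weights - x) ** 2).sum(axis=1))) for x in docs]
print(clusters)  # → [0, 0, 1, 1]: one node per topic
```

Documents sharing a topic end up mapped to the same node, which is the sense in which the paper's SOM "uncovers associated documents".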

7.
Metadoc: An adaptive hypertext reading system
Presentation of textual information is undergoing rapid transition. Millennia of experience in writing linear documents are gradually being discarded in favor of non-linear hypertext writing. In this paper, we investigate how hypertext — in its current node-and-link form — can be augmented by an adaptive, user-model-driven tool. Currently the reader of a document has to adapt to that document: if the detail level is wrong, the reader either skims the document or has to consult additional sources of information for clarification. The MetaDoc system not only has hypertext capabilities but also has knowledge about the documents it represents. This knowledge enables the document to modify its level of presentation to suit the user. MetaDoc builds and dynamically maintains a user model for each reader, which tailors the presentation of the document to that reader. The three-dimensionality of MetaDoc allows the text presented to be changed either by the user model or through explicit user action. MetaDoc is more a documentation reading system than a hypertext navigation or reading tool. It is a fully developed and debugged system that has been applied to technical documentation.
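The adaptive-presentation idea (detail level driven by a user model) might be sketched as follows. The document structure, the expertise field, and its threshold are invented for illustration, not MetaDoc's actual design.

```python
# A section with two levels of detail; readers modelled as expert see both.
DOC = {
    "summary": "TCP provides reliable, ordered byte streams.",
    "detail": "Reliability comes from sequence numbers, ACKs and retransmission timers.",
}

def render(section, user_model):
    """Choose the presentation level for a section from the reader's model."""
    text = section["summary"]
    if user_model.get("expertise", 0) >= 2:   # hypothetical threshold
        text += " " + section["detail"]
    return text

novice = {"expertise": 0}
expert = {"expertise": 3}
print(render(DOC, novice))   # summary only
print(render(DOC, expert))   # summary plus detail
```

The key point the sketch captures is that the same underlying document yields different presentations per reader, driven by the model rather than by manual navigation alone.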

8.
Toward a lexicalized grammar for interlinguas
In this paper we present one aspect of our research on machine translation (MT): capturing the grammatical and computational relation between (i) the interlingua (IL) as defined declaratively in the lexicon and (ii) the IL as defined procedurally by way of algorithms that compose and decompose pivot IL forms. We begin by examining the interlinguas in the lexicons of a variety of current IL-based approaches to MT. This brief survey makes it clear that no consensus exists among MT researchers on the level of representation for defining the IL. In the section that follows, we explore the consequences of this missing formal framework for MT system builders who develop their own lexical-IL entries. The lack of software tools to support rapid IL respecification and testing greatly hampers their ability to modify representations to handle new data and new domains. Our view is that IL-based MT research needs both (a) a formal framework to specify possible IL grammars and (b) software support tools to implement and test these grammars. With respect to (a), we propose adopting a lexicalized grammar approach, tapping research results from the study of tree grammars for natural language syntax. With respect to (b), we sketch the design and functional specifications for parts of ILustrate, the set of software tools that we need to implement and test the various IL formalisms that meet the requirements of a lexicalized grammar. In this way, we begin to address a basic issue in MT research: how to define and test an interlingua as a computational language — without building a full MT system for each possible IL formalism that might be proposed.

9.
Enhancing portability with multilingual ontology-based knowledge management
Information systems in multilingual environments, such as the EU, suffer from low portability and high deployment costs. In this paper we propose an ontology-based model for multilingual knowledge management in information systems. Our unique feature is a lightweight mechanism, dubbed context, that is associated with ontological concepts and specified in multiple languages. We use contexts to assist in resolving cross-language and local variation ambiguities. Equipped with such a model, we next provide a four-step procedure for overcoming the language barrier in deploying a new information system. We also show that our proposed solution can overcome differences that stem from local variations that may accompany multilingual information systems deployment. The proposed mechanism was tested in an actual multilingual eGovernment environment and by using real-world news syndication traces. Our empirical results serve as a proof-of-concept of the viability of the proposed model. Also, our experiments show that news items in different languages can be identified by a single ontology concept using contexts. We also evaluated the local interpretations of concepts of a language in different geographical locations.
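A rough sketch of how lightweight, language-specific contexts attached to a single ontology concept could identify news items across languages. The concepts and context terms below are invented for illustration, not those of the deployed system.

```python
# Each ontology concept carries per-language "context" term sets.
CONTEXTS = {
    "Election": {
        "en": {"election", "vote", "ballot"},
        "fr": {"élection", "vote", "scrutin"},
    },
    "Budget": {
        "en": {"budget", "deficit", "tax"},
        "fr": {"budget", "déficit", "impôt"},
    },
}

def classify(text, lang):
    """Return the concept whose context overlaps the text the most."""
    words = set(text.lower().split())
    scores = {c: len(words & ctx.get(lang, set())) for c, ctx in CONTEXTS.items()}
    return max(scores, key=scores.get)

# The same event, reported in two languages, maps to one concept.
print(classify("voters cast their ballot in the election", "en"))  # Election
print(classify("le scrutin pour cette élection commence", "fr"))   # Election
```

Because both language-specific context sets hang off the same concept, no translation step is needed to align the two news items.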

11.
We present an integrated knowledge representation system for natural language processing (NLP) whose main distinguishing feature is its emphasis on encoding not only the usual propositional structure of the utterances in the input text, but also capturing an entire complex of nonpropositional — discourse, attitudinal, and other pragmatic — meanings that NL texts always carry. The need for discourse pragmatics, together with generic semantic information, is demonstrated in the context of anaphoric and definite noun phrase resolution for accurate machine translation. The major types of requisite pragmatic knowledge are presented, and an extension of a frame-based formalism developed in the context of the TRANSLATOR system is proposed as a first-pass codification of the integrated knowledge base.

12.
Simulation models involve the concepts of time and space. In designing a distributed simulation programming system, introducing a temporal construct results in a specification language for describing a changing world, while introducing a spatial construct makes it possible to coordinate multiple, simultaneous, nondeterministic activities. In this paper, we present a new distributed logic programming model and discuss its implementation. A distributed program is represented by a virtual space — a set of processes which are logical representations of system objects — and is evaluated with respect to virtual time — a temporal coordinate which is used to measure computational progress and specify synchronization. The major focus of the implementation is the ability to accomplish global backtracking. The proposed implementation collects global knowledge through interprocess communication, controls global backtracking in a distributed fashion according to virtual time and dependency relations, and captures the heuristic that earlier synchronizations may make subsequent synchronizations more likely to succeed. Compared with other distributed logic programming systems, our system provides a simpler syntax, well-defined semantics, and an efficient implementation.

13.
The subject of this paper is the complementary use of both a data base and an expert system in the analysis of urban areas in territorial planning. After a discussion of urban reasoning, the paper states the fundamental principles of the URBYS system: storage of the urban planner's knowledge; the use of this knowledge by the expert system to assist in decision-making; and the transfer of information from the urban data base to the expert system. A special effort was made to make the system easy to use and to keep it close to the expert's method, by allowing URBYS' knowledge to be modified easily without affecting its overall coherence.

14.
In knowledge management (KM) research, effective knowledge sharing is considered one of the most critical components of KM success. For the present research, the authors conducted a longitudinal, two-phase study to evaluate whether the Theory of Reasoned Action (TRA) and three variations of the Theory of Planned Behavior — namely, TPB, decomposed TPB (DTPB), and revised TPB (RTPB) — can adequately predict knowledge sharing behaviors. The first, TRA-based study shows a severe limitation in the ability of intention to predict actual knowledge sharing behaviors collected from a knowledge management platform. In a subsequent study, three variations of TPB-based models were employed to show that, although the independent variables (i.e., attitude, subjective norm, and perceived behavior control, decomposed into controllability and self-efficacy) give satisfactory explanations of variance in intention (R2 > 42%), the intention–behavior gap still exists in each of the three models. Only perceived self-efficacy in the revised TPB can directly predict knowledge sharing behaviors. This gap highlights the importance of knowledge sharing as a fundamentally social activity, for which the actualization of intention into action may be interrupted by barriers such as a mistake-free culture or others' deliberate misinterpretations, which may in turn cause unanticipated negative consequences for the person. The theoretical implication of this study is that, in applying TPB to study knowledge sharing practices, researchers must focus on control beliefs that reflect people's capacity to overcome possible environmental challenges encountered in carrying out their knowledge sharing intentions.
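As a rough illustration of this kind of model fitting, the sketch below regresses a synthetic "intention" score on attitude, subjective norm, and perceived behavioral control. The data and coefficients are fabricated, so only the shape of the analysis is reproduced, not the paper's actual R2 figures.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic survey scores in [0, 1) for the three TPB predictors.
att, sn, pbc = rng.random(n), rng.random(n), rng.random(n)

# Fabricated "true" intention: a weighted sum of predictors plus noise.
intention = 0.5 * att + 0.3 * sn + 0.2 * pbc + 0.05 * rng.standard_normal(n)

# Ordinary least squares fit with an intercept column.
X = np.column_stack([att, sn, pbc, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, intention, rcond=None)

pred = X @ coef
r2 = 1 - ((intention - pred) ** 2).sum() / ((intention - intention.mean()) ** 2).sum()
print(r2 > 0.42)  # the predictors explain most of the variance in intention
```

Note that a high R2 for intention, as in the paper, says nothing about whether intention translates into actual sharing behavior; that intention-behavior gap is precisely the study's point.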

15.
Encoded state feedback refers to the situation in which the state feedback signal is sampled every T units of time and converted (encoded) into a binary representation. In this note, stabilization of nonlinear systems by encoded state feedback is studied. It is shown that any nonlinear control system which can be globally asymptotically stabilized by “standard” (i.e. with no encoding) state feedback can also be globally asymptotically stabilized by encoded state feedback, provided that the number of bits used to encode the samples is not less than an explicitly determined lower bound. By means of this bound, we are able to establish a direct relationship between the size of the expected region of attraction and the data rate, under the stabilizability assumption only, a result which — to the best of our knowledge — has no precedent in the literature.
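The encoding idea can be illustrated on a scalar linear system x(k+1) = a*x(k) + u(k), where the sampled state is uniformly quantised to n bits over a known range before being fed back. The values of a, n, and the quantiser range below are illustrative assumptions; the bit budget here is far above the minimal data rate (roughly log2|a| bits per sample for this linear case), so the state settles into a small quantisation-limited neighbourhood of the origin.

```python
def encode(x, n_bits, x_max):
    """Uniformly quantise x in [-x_max, x_max] to an n-bit integer code."""
    levels = 2 ** n_bits
    x = max(-x_max, min(x_max, x))              # saturate to the known range
    return round((x + x_max) / (2 * x_max) * (levels - 1))

def decode(code, n_bits, x_max):
    levels = 2 ** n_bits
    return code / (levels - 1) * 2 * x_max - x_max

a, n_bits, x_max = 2.0, 8, 10.0                 # unstable plant, 8-bit encoder
x = 7.3
for _ in range(20):
    x_hat = decode(encode(x, n_bits, x_max), n_bits, x_max)  # encoded feedback
    x = a * x - a * x_hat    # u = -a * x_hat cancels dynamics up to quantisation

print(abs(x) < 0.5)  # state driven to a small neighbourhood of the origin
```

With fewer bits the quantisation error grows and, once the data rate drops below the bound, the unstable dynamics can no longer be contained at all, which is the trade-off the note quantifies.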

16.
The (linear) failures semantics is a well-known model for the theoretical version of Hoare's CSP. We generalize this semantics by taking steps (i.e. multisets of simultaneously occurring actions) instead of single actions as the basic execution unit. Hence, as opposed to the linear semantics — where parallelism is modelled as arbitrary interleaving in order to avoid technical complication — the step failures semantics models parallelism explicitly and is equally easy to manage. In particular, a sound and complete proof system is given. In contrast to the linear model, divergence is treated uniformly here. The relation to the linear semantics can be established using our newly introduced deparallelize operator. The first author is supported by an Ernst von Siemens scholarship. A preliminary version of this paper appeared in [37].

17.
A model-based vehicle tracking system for the evaluation of inner-city traffic video sequences has been systematically tested on about 15 minutes of real-world video data. Methodological improvements during preparatory test phases affected — among other changes — the combination of edge element and optical flow estimates in the measurement process, and a more consistent exploitation of background knowledge. The explication of this knowledge in the form of models facilitates the evaluation of video data for different scenes by exchanging the scene-dependent models. An extensive series of experiments with a large test sample demonstrates that the current version of our system appears to have reached a relative optimum: further interactive tuning of tracking parameters no longer promises to improve the overall system performance significantly. Even the incorporation of further knowledge regarding vehicle and scene geometry or illumination has to cope with an increasing level of interaction between different knowledge sources and system parameters. Our results indicate that model-based tracking of rigid objects in monocular image sequences may have to be reappraised more thoroughly than anticipated during the recent past.

18.
We live in a world characterized by evolution — that is, by ongoing processes of development, formation, and growth in both natural and human-created systems. Biology tells us that complex, natural systems are not created all at once but must instead evolve over time. We are becoming increasingly aware that evolutionary processes are ubiquitous and critical for technological innovations as well. This is particularly true for complex software systems, because these systems do not exist in a technological context alone but are embedded within dynamic human organizations. The Center for LifeLong Learning and Design (L3D) at the University of Colorado has been involved in research on software design and other design domains for more than a decade. We understand software design as an evolutionary process in which system requirements and functionality are determined through an iterative process of collaboration among multiple stakeholders, rather than being completely specified before system development occurs. Our research focuses on the following claims about software systems embedded within dynamic human organizations: (1) they must evolve because they cannot be completely designed prior to use, (2) they must evolve to some extent at the hands of the users, and (3) they must be designed for evolution. Our theoretical work builds upon our existing knowledge of design processes and focuses on a software process model and architecture specifically for systems that must evolve. Our theories are instantiated and assessed through the development and evolution of domain-oriented design environments (DODEs) — software systems that support design activities within particular domains and that are built specifically to evolve.

19.
A common practice in operational Machine Translation (MT) and Natural Language Processing (NLP) systems is to assume that a verb has a fixed number of senses and to rely on a precompiled lexicon to achieve large coverage. This paper demonstrates that this assumption is too weak to cope with the related problems of lexical divergences between languages and unexpected uses of words that give rise to cases outside the precompiled lexicon's coverage. We first examine the lexical divergences between English verbs and Chinese verbs. We then focus on a specific lexical selection problem: translating English change-of-state verbs into Chinese verb compounds. We show that an accurate translation depends not only on information about the participants, but also on contextual information. Therefore, selectional restrictions on verb arguments lack the necessary power for accurate lexical selection. Second, we examine verb representation theories and practices in MT systems and show that, under the fixed-sense assumption, the existing representation schemes are not adequate for handling these lexical divergences and for extending existing verb senses to unexpected usages. We then propose a method of verb representation based on conceptual lattices which allows the similarities among different verbs in different languages to be quantitatively measured. A prototype system, UNICON, implements this theory and performs more accurate MT lexical selection for our chosen set of verbs. An additional lexical module for UNICON handles sense extension.
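The idea of quantitatively measuring verb similarity can be sketched crudely by comparing sets of semantic features, a stand-in for the paper's conceptual-lattice measure. The verbs and feature sets below are invented for illustration, not UNICON's actual representations.

```python
# Invented feature sets for three change-of-state verbs.
FEATURES = {
    "break":   {"change-of-state", "cause", "become", "broken"},
    "shatter": {"change-of-state", "cause", "become", "broken", "into-pieces"},
    "open":    {"change-of-state", "cause", "become", "open"},
}

def similarity(v1, v2):
    """Jaccard overlap of feature sets: shared features over all features."""
    a, b = FEATURES[v1], FEATURES[v2]
    return len(a & b) / len(a | b)

# "shatter" is closer to "break" than "open" is, as the measure should reflect.
print(similarity("break", "shatter") > similarity("break", "open"))  # True
```

A graded measure like this (unlike a fixed sense inventory) lets a system rank candidate target-language verbs even for usages the lexicon never anticipated.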

20.
This report describes the current state of our central research thrust in the area of natural language generation. We have already reported on our text-level theory of lexical selection in natural language generation ([59, 60]), on a unification-based syntactic processor for syntactic generation ([73]), and have designed a relatively flexible blackboard-oriented architecture for integrating these and other types of processing activities in generation ([60]). We have implemented these ideas in our prototype generator, Diogenes — a DIstributed, Opportunistic GENEration System — and tested our lexical selection and syntactic generation modules in a comprehensive natural language processing project, the KBMT-89 machine translation system ([15]). At this stage we are developing a more comprehensive Diogenes system, concentrating on both the theoretical and the system-building aspects of a) formulating a more comprehensive theory of distributed natural language generation; b) extending current theories of text organization as they pertain to the task of planning natural language texts; c) improving and extending the knowledge representation and the actual body of background knowledge (both domain and discourse/pragmatic) required for comprehensive text planning; d) designing and implementing algorithms for dynamic realization of text structure and integrating them into the blackboard style of communication and control; and e) designing and implementing control algorithms for distributed text planning and realization. In this document we describe our ideas concerning opportunistic control for a natural language generation planner and present a research and development plan for the Diogenes project.
Many people have contributed to the design and development of the Diogenes generation system over the last four years, especially Eric Nyberg, Rita McCardell, Donna Gates, Christine Defrise, John Leavitt, Scott Huffman, Ed Kenschaft and Philip Werner. Eric Nyberg and Masaru Tomita created genkit, which is used as the syntactic component of Diogenes. A short version of this article appeared in the Proceedings of IJCAI-89, co-authored with Victor Lesser and Eric Nyberg. Many thanks to all of the above; the remaining errors are the responsibility of this author.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号