20 similar documents found; search time: 15 ms
1.
The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation
but also for other applications such as cross-lingual information retrieval. However, the building of such resources (bilingual
single-word and multi-word correspondences, translation rules) demands extensive manual work, and, as a consequence, bilingual
resources are usually more difficult to find than “shallow” monolingual resources such as morphological dictionaries or part-of-speech
taggers, especially when they involve a less-resourced language. This paper describes a methodology to build automatically
both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed
with shallow monolingual resources (morphological analysers, and part-of-speech taggers). We present experiments for Brazilian
Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable
the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation
and other natural language processing tasks.
2.
3.
Bilingual termbanks are important for many natural language processing applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. The initial candidate terminology list is prepared by taking all arbitrary n-gram word sequences from the corpus. Then, a well-known statistical measure (the Dice coefficient) is employed in order to remove any multi-word terms with weak associations from the candidate term list. Thereafter, the log-likelihood comparison method is applied to rank the phrasal candidate term list. Then, using a phrase-based statistical machine translation model, we create a bilingual terminology with the extracted monolingual term lists. We integrate an external knowledge source, the Wikipedia cross-language link databases, into the terminology extraction (TE) model to assist two processes: (a) the ranking of the extracted terminology list, and (b) the selection of appropriate target terms for a source term. First, we report the performance of our monolingual TE model compared to a number of the state-of-the-art TE models on English-to-Turkish and English-to-Hindi data sets. Then, we evaluate our novel bilingual TE model on an English-to-Turkish data set, and report the automatic evaluation results. We also manually evaluate our novel TE model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains.
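The Dice-coefficient pruning step described above can be sketched as follows. The threshold value and the restriction to adjacent two-word candidates are illustrative assumptions, not the paper's exact setup:

```python
from collections import Counter

def dice_scores(tokens, candidates):
    """Score adjacent-word candidates with the Dice coefficient:
    dice(a, b) = 2 * f(a b) / (f(a) + f(b))."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for a, b in candidates:
        denom = uni[a] + uni[b]
        scores[(a, b)] = 2.0 * bi[(a, b)] / denom if denom else 0.0
    return scores

def filter_weak(tokens, candidates, threshold=0.3):
    """Drop multi-word candidates with weak association, as in the
    candidate-pruning step described in the abstract above."""
    scores = dice_scores(tokens, candidates)
    return [c for c in candidates if scores[c] >= threshold]
```

A candidate pair that always co-occurs gets a score near 1.0, while words that mostly appear apart score near 0.0 and are pruned before the log-likelihood ranking stage.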
4.
Petr Strossa, Machine Translation, 1994, 9(1): 61-80
The paper discusses a methodology of machine-assisted translation (MAT) as an alternative to fully automatic MT. A prototype MAT system is described, which integrates a dictionary database system and a text editor. The functional requirements for such a system from the linguistic point of view are presented, along with some general problems of designing and implementing such systems. Special attention is given to language dependence, and to the problem of completeness and efficiency of the linguistic data representation for a very simple system. A statistical analysis of English inflexion and word-derivation patterns is presented.
5.
6.
Expert Systems with Applications, 2014, 41(9): 4494-4504
Key concept extraction is a major step for ontology learning, which aims to build an ontology by identifying relevant domain concepts and their semantic relationships from a text corpus. The success of ontology development using key concept extraction strongly relies on the degree of relevance of the key concepts identified. If the identified key concepts are not closely relevant to the domain, the constructed ontology will not be able to correctly and fully represent the domain knowledge. In this paper, we propose a novel method, named CFinder, for key concept extraction. Given a text corpus in the target domain, CFinder first extracts noun phrases using their linguistic patterns based on Part-Of-Speech (POS) tags as candidates for key concepts. To calculate the weights (or importance) of these candidates within the domain, CFinder combines their statistical knowledge with domain-specific knowledge indicating their relative importance within the domain. The calculated weights are further enhanced by considering an inner structural pattern of the candidates. The effectiveness of CFinder is evaluated with a recently developed ontology for the domain of 'emergency management for mass gatherings' against state-of-the-art methods for key concept extraction, including Text2Onto, KP-Miner, and Moki. The comparative evaluation results show that CFinder statistically significantly outperforms all three methods in terms of F-measure and average precision.
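The idea of combining statistical evidence with domain-specific knowledge when weighting candidates can be illustrated with a toy scorer. The particular combination below (log frequency times a domain-word boost) is an assumption for illustration only, not CFinder's actual formula:

```python
import math
from collections import Counter

def weight_candidates(candidate_phrases, domain_terms):
    """Toy candidate weighting in the spirit of the abstract above:
    combine a statistical score (log frequency of the phrase) with a
    domain-specific boost for phrases containing known domain words.
    Both components here are illustrative assumptions."""
    freq = Counter(candidate_phrases)
    weights = {}
    for phrase in freq:
        stat = 1.0 + math.log(freq[phrase])      # statistical evidence
        words = set(phrase.split())
        boost = 1.0 + len(words & domain_terms)  # domain relevance
        weights[phrase] = stat * boost
    return weights
```

Frequent phrases that share vocabulary with the domain rise to the top of the ranking, which is the behaviour the evaluation above measures with average precision.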
7.
In recent years, several authors have presented algorithms that locate instances of a given string, or set of strings, within a text. The complementary problem of processing a text to find out what strings appear in it, without any preconceived notion of what strings might be present, has received less attention. A system called PATRICIA, developed two decades ago, implements a solution to this problem. The design of PATRICIA is very tightly bound to the assumptions that individual string elements are bits and that the user of the system can provide complete lists of starting and stopping places for strings. This paper presents an approach that drops these assumptions. Our method allows different definitions of indivisible string elements for different applications, and the only information the user provides for determining the beginnings and ends of strings is a specification of a maximum length for output strings. This paper also describes a portable C implementation of the method, called PORTREP. The primary data structure of PORTREP is a trie represented as a ternary tree. PORTREP has a method for eliminating redundancy from the output, and it can function with a bounded number of nodes by employing a heuristic process that reuses seldom-visited nodes. Theoretical analysis and empirical studies, reported here, give confidence in the efficiency of the algorithms. PORTREP can form the basis for a variety of text-analysis applications, and this paper considers one such application, automatic document indexing.
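The trie-represented-as-a-ternary-tree structure named above as PORTREP's primary data structure can be sketched as a standard ternary search tree with occurrence counts. This is a generic sketch of the data structure, not PORTREP's actual C implementation:

```python
class TSTNode:
    """Node of a trie represented as a ternary tree: a character,
    lower/equal/higher children, and a count of strings ending here."""
    __slots__ = ("ch", "lo", "eq", "hi", "count")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.count = 0

def insert(root, s, i=0):
    """Insert string s into the ternary tree, counting repetitions."""
    ch = s[i]
    if root is None:
        root = TSTNode(ch)
    if ch < root.ch:
        root.lo = insert(root.lo, s, i)
    elif ch > root.ch:
        root.hi = insert(root.hi, s, i)
    elif i + 1 < len(s):
        root.eq = insert(root.eq, s, i + 1)
    else:
        root.count += 1  # another occurrence of this exact string
    return root

def lookup(root, s, i=0):
    """Return how many times string s was inserted."""
    if root is None:
        return 0
    ch = s[i]
    if ch < root.ch:
        return lookup(root.lo, s, i)
    if ch > root.ch:
        return lookup(root.hi, s, i)
    if i + 1 < len(s):
        return lookup(root.eq, s, i + 1)
    return root.count
```

Compared with an array-of-children trie, the ternary representation keeps memory proportional to the distinct characters actually seen, which matters when the node budget is bounded as described above.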
8.
Jianyong Duan, Mei Zhang, Wang Jingzhong, Yushi Xu, Expert Systems with Applications, 2011, 38(1): 314-320
Bilingual multiword expression extraction is a significant problem in extracting meaning from free text, as it involves analyzing large amounts of textual information. In this paper we propose a text mining approach to extract bilingual multiword expressions, employing both statistical and rule-based methods. The extraction process has two phases. In the first phase, many candidates are extracted from the corpus by statistical methods; the multiple sequence alignment algorithm used is sensitive to flexible multiword expressions. In the second phase, error-driven rules and patterns are extracted from the corpus. To acquire high-quality instances, manual work with active learning is also performed during sample selection. These trained rules are used to filter the candidates. Bilingual comparisons are performed on a parallel corpus, and some bilingual syntactic patterns are obtained from a bilingual phrase dictionary. Because the system has many parameters, several experiments were designed to achieve the best performance. Experimental results show that our approach performs well.
9.
10.
Although no machine learning technique fully meets human requirements, finding a quick and efficient translation mechanism has become an urgent necessity, owing to the differences among the languages spoken in the world's communities and the vast development that has occurred worldwide; each technique has its own advantages and disadvantages. The purpose of this paper is therefore to shed light on some of the machine translation techniques available in the literature and to encourage researchers to study them. We discuss some linguistic characteristics of the Arabic language. Features of Arabic relevant to machine translation are discussed in detail, along with the difficulties they might present. This paper summarizes the major techniques used in machine translation from Arabic into English and discusses their strengths and weaknesses.
11.
C. Garcia, M. Delakis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(11): 1408-1423
In this paper, we present a novel face detection approach based on a convolutional neural architecture, designed to robustly detect highly variable face patterns, rotated up to ±20 degrees in the image plane and turned up to ±60 degrees, in complex real-world images. The proposed system automatically synthesizes simple problem-specific feature extractors from a training set of face and nonface patterns, without making any assumptions or using any hand-made design concerning the features to extract or the areas of the face pattern to analyze. The face detection procedure acts like a pipeline of simple convolution and subsampling modules that treat the raw input image as a whole. We thereby show that an efficient face detection system does not require any costly local preprocessing before classification of image areas. The proposed scheme provides a very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases. We present extensive experimental results illustrating the efficiency of the proposed approach on difficult test sets, including an in-depth sensitivity analysis with respect to the degrees of variability of the face patterns.
12.
Wu-Chun Chung, Hung-Pin Lin, Shih-Chang Chen, Mon-Fong Jiang, Yeh-Ching Chung, Automated Software Engineering, 2014, 21(4): 489-508
As data exploration has increased rapidly in recent years, datastores and data processing are receiving more and more attention for extracting important information. Finding a scalable solution to process large-scale data is a critical issue in both relational database systems and the emerging NoSQL databases. With the inherent scalability and fault tolerance of Hadoop, MapReduce is attractive for processing massive data in parallel. Most previous research has focused on developing SQL or SQL-like query translators for the Hadoop distributed file system. However, it can be difficult to update data frequently in such a file system. We therefore need a flexible datastore such as HBase, not only to place the data on a scale-out storage system, but also to manipulate changeable data in a transparent way. However, the HBase interface is not friendly enough for most users; a GUI composed of a SQL client application and a database connection to HBase eases the learning curve. In this paper, we propose the JackHare framework, with a SQL query compiler, a JDBC driver, and a systematic method using the MapReduce framework for processing unstructured data in HBase. After importing the JDBC driver into a SQL client GUI, we can exploit HBase as the underlying datastore to execute ANSI-SQL queries. Experimental results show that our approach performs well with efficiency and scalability.
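One way a single SQL-style predicate could be compiled into a row filter over HBase-like rows is sketched below. JackHare's actual compiler targets MapReduce and the HBase API, so this client-side sketch, including the `"family:qualifier"` row layout, is only an illustrative assumption:

```python
def compile_where(column, op, value):
    """Compile one SQL-style predicate (column OP value) into a row
    filter for rows shaped like {"rowkey": ..., "cf:col": value}.
    The column naming and row layout are illustrative assumptions."""
    ops = {
        "=": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    cmp_fn = ops[op]
    return lambda row: column in row and cmp_fn(row[column], value)

def scan(rows, row_filter):
    """Full-table scan with a client-side filter; in HBase this work
    would instead be pushed down as a server-side filter or run as a
    MapReduce job over the table's regions."""
    return [r for r in rows if row_filter(r)]
```

The point of a framework like the one described above is precisely to avoid this naive client-side scan by distributing the predicate evaluation across the cluster.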
13.
Interacting with Computers, 2005, 17(1): 85-104
In a bilingual civil society, such as that in Wales, language and the use of language can be a highly political issue. Within this context, web sites may act as a beneficial influence on the maintenance and revitalisation of the minority language, or may serve to exclude and marginalise that language. Through a study of existing web sites, this paper examines the extent to which the Welsh language is being presented as a usable tool through which individuals may be informed about and participate in civil society in Wales. While this work specifically considers Wales, the issues faced are similar to those of many other bilingual communities.
14.
Differences in language and culture among participants in a meeting can present tremendous barriers to efficient and effective communication. Cultural and lingual barriers are becoming increasingly important issues to international managers as businesses continue to expand globally. This paper describes a group support system (GSS) which reduces many of these lingual and cultural barriers in groups composed of Spanish and English speakers.
15.
16.
In a heterogeneous database system, a query for one type of database system (i.e., a source query) may have to be translated to an equivalent query (or queries) for execution in a different type of database system (i.e., a target query). Usually, for a given source query, there is more than one possible target query translation, and some can be executed more efficiently than others by the receiving database system. Developing a translation procedure for each type of database system is time-consuming and expensive. We abstract a generic hierarchical database system (GHDBS) which has properties common to database systems whose schema contains hierarchical structures (e.g., System 2000, IMS, and some object-oriented database systems). We develop principles of query translation with GHDBS as the receiving database system. Translation into any specific system can be accomplished by a translation into the general system with refinements to reflect the characteristics of the specific system. We develop rules that guarantee correctness of the target queries, where correctness means that the target query is equivalent to the source query. We also provide rules that can guarantee a minimum number of target queries in cases when one source query needs to be translated to multiple target queries. Since the minimum number of target queries implies the minimum number of times the underlying system is invoked, efficiency is taken into consideration.
17.
This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps
and compares web-based translation texts of Statistics Canada (StatCan) news releases in the StatCan publication The Daily. The goal is to extract translations for translation memory systems, for translation terminology building, for cross-language
information retrieval and for corpus-based machine translation systems. Three years of officially published statistical news
release texts were collected to compose the StatCan Daily data bank. The English and French texts in this collection were roughly aligned using the Gale-Church statistical algorithm.
After this, boundary markers of text segments and paragraphs were adjusted and the Gale-Church algorithm was run a second
time for a more fine-grained text segment alignment. To detect misaligned areas of texts and to prevent mismatched translation
pairs from being selected, key textual and structural properties of the mapped texts were automatically identified and used
as anchoring features for comparison and misalignment detection. The proposed method has been tested with web-based bilingual
materials from five other Canadian government websites. Results show that the SDTES model is very efficient in extracting
translations from published government texts, and very accurate in identifying mismatched translations. With parameters tuned,
the text-mapping part can be used to align corpus data collected from official government websites; and the text-comparing
component can be applied in prepublication translation quality control and in evaluating the results of statistical machine
translation systems.
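The Gale-Church length-based matching used in the alignment passes above can be sketched as a cost function for a candidate 1-1 sentence pair. The parameters c and s² below are the values commonly used in the literature for this model; treat them as assumptions rather than the exact settings of the system described:

```python
import math

# Gale-Church parameters: expected target-per-source character ratio
# and the variance of the length difference (values from the
# literature, assumed here, not taken from the SDTES paper).
C, S2 = 1.0, 6.8

def match_cost(len1, len2):
    """-log probability that two text segments of these character
    lengths are mutual translations, under the Gale-Church length
    model. Lower cost means a more plausible 1-1 alignment."""
    if len1 == 0 and len2 == 0:
        return 0.0
    mean = (len1 + len2 / C) / 2.0
    delta = (len2 - len1 * C) / math.sqrt(mean * S2)
    # two-tailed probability of a length deviation at least this large
    phi = 0.5 * (1.0 + math.erf(abs(delta) / math.sqrt(2.0)))
    p = 2.0 * (1.0 - phi)
    return -math.log(p) if p > 0 else float("inf")
```

A dynamic-programming pass over the two segment sequences then picks the alignment minimizing total cost; the misalignment detection described above flags regions where even the best local cost stays high.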
18.
We have designed a mobile robot with a distributed structure for an intelligent living space. The mobile robot was constructed
using an aluminum frame. It has the shape of a cylinder; its diameter, height, and weight are 40 cm, 80
cm, and 40 kg, respectively. There are six systems in the mobile robot, including the structure, an obstacle avoidance and driving
system, a software development system, a detection module system, a remote supervision system, and others. In the obstacle
avoidance and driving system, we use an NI motion control card to drive two DC servomotors in the mobile robot, and detect
obstacles using a laser range finder and a laser positioning system. Finally, we control the mobile robot using an NI motion
control card and a MAXON driver according to the programmed trajectory. The mobile robot can avoid obstacles using the laser
range finder, and follow the programmed trajectory. We developed the user interface with four functions for the mobile robot.
In the security system, we designed module-based security devices to detect dangerous events and transmit the detection results
to the mobile robot using a wireless RF interface. The mobile robot can move to the event position using the laser positioning
system.
19.
Expert Systems with Applications, 2014, 41(8): 3615-3627
XML has become the de facto standard for data representation and exchange over the Web. In addition, fuzzy data are inherent in real-world applications. Although fuzzy information has been extensively investigated in the context of the relational database model, the classical relational model and its fuzzy extensions to date do not satisfy the need for modeling and processing complex objects with imprecision and uncertainty on the Web. Based on fuzzy sets, this paper concentrates on fuzzy information modeling in the EER (enhanced or extended entity-relationship) model and the fuzzy XML model. In particular, a formal approach to mapping the fuzzy EER model to the fuzzy DTD (document type definition) model is developed.
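To make the modeling idea concrete, a fuzzy attribute whose value is a possibility distribution can be rendered as XML in which each candidate value carries a possibility degree. The element layout below (a `Val` element with a `Poss` attribute) follows a convention seen in fuzzy XML work but is an illustrative assumption, not the paper's exact DTD:

```python
def fuzzy_element(name, values):
    """Render a fuzzy attribute as an XML fragment: each candidate
    value becomes a Val element with a Poss (possibility) attribute.
    The element and attribute names are illustrative assumptions."""
    parts = [f"<{name}>"]
    for value, poss in values:
        parts.append(f'  <Val Poss="{poss}">{value}</Val>')
    parts.append(f"</{name}>")
    return "\n".join(parts)
```

For example, an imprecise age attribute from a fuzzy EER entity could map to one such element, with the DTD declaring `Val` as repeatable and `Poss` as a required attribute.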
20.
Computer animation and visualization can facilitate communication between the hearing impaired and those with normal speech. This paper presents a model of a system capable of translating text from a natural language into animated sign language. Techniques have been developed to analyse language and transform it into sign language in a systematic way. A hand motion coding method, as applied to hand motion representation and control, has also been investigated. Two translation examples are given to demonstrate the practicality of the system.