Similar articles
 20 similar articles found (search time: 15 ms)
1.
We describe how to build a large, comprehensive, integrated Arabic lexicon by automatic parsing of newspaper text. We have built a parser system that reads Arabic newspaper articles, isolates the tokens in them, and finds the part of speech and the features of each token. To achieve this goal we designed a set of algorithms, generated several sets of rules, and developed a set of techniques and a set of components to carry out these techniques. As each sentence is processed, new words and features are added to the lexicon, so that it grows continuously as the system runs. To test the system we used 100 articles (80,444 words) from the Al-Raya newspaper. The system consists of several modules: the tokenizer module, which isolates the tokens; the type finder system, which finds the part of speech of each token; the proper noun phrase parser module, which marks the proper nouns and discovers some information about them; and the feature finder module, which finds the features of the words.

2.
3.
As the cognitive processes of natural language understanding and generation become better understood, machine translation is becoming easier to perform. In this paper we present our work on machine translation from Arabic to English and French, and illustrate it with a fully operational system that runs on PC compatibles with an Arabic/Latin interface. This system is an extension of an earlier system whose task was the analysis of Arabic as a natural language. Thanks to the regularity of its phrase structures and word patterns, Arabic lends itself quite naturally to a Fillmore-like analysis. The meaning of a phrase is stored in a star-like data structure, where the verb occupies the center of the star and the various noun sentences occupy specific peripheral nodes of the star. The data structure is then translated into an internal representation in the target language, which is then mapped into the target text.
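
The star-like structure described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' system: the class `StarFrame`, the role names `agent` and `object`, the toy lexicon of transliterated words, and the subject-verb-object realisation are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StarFrame:
    """Star-shaped case frame: the verb is the centre, role-labelled phrases are the periphery."""
    verb: str
    roles: dict = field(default_factory=dict)


# Hypothetical toy lexicon (transliterated Arabic -> English), for illustration only.
TOY_LEXICON = {"kataba": "wrote", "al-walad": "the boy", "ad-dars": "the lesson"}


def transfer(frame, lexicon):
    """Translate every node of the star into the target language, keeping its shape."""
    return StarFrame(
        verb=lexicon[frame.verb],
        roles={role: lexicon[phrase] for role, phrase in frame.roles.items()},
    )


def realize_svo(frame):
    """Map a target-language star to text in subject-verb-object order."""
    parts = [frame.roles.get("agent"), frame.verb, frame.roles.get("object")]
    return " ".join(part for part in parts if part)


source = StarFrame("kataba", {"agent": "al-walad", "object": "ad-dars"})
sentence = realize_svo(transfer(source, TOY_LEXICON))
print(sentence)
```

Keeping the transfer step separate from the realisation step mirrors the two stages the abstract names: translation into an internal target-language representation, then mapping to target text.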

4.
In this paper, we present a comprehensive approach for extracting and relating Arabic multiword expressions (MWE) from Social Networks. 15 million tweets were collected and processed to form our data set. Due to the complexity of processing Arabic and the lack of resources, we built an experimental system to extract and relate similar MWE using statistical methods. We introduce a new metric for measuring valid MWE in Social Networks. We compare the results obtained from our experimental system against a semantic graph obtained from a web knowledge base.

5.
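Using a phrase-based statistical translation approach, we built a simple Arabic-to-Chinese translation system. The core decoder was developed with a log-linear direct translation model, and a large amount of open-source software was used in the system to preprocess the corpus. The unsolved problems and future development trends in this direction are also discussed.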

6.
Recognition of Arabic characters   (cited by: 1 total; 0 self-citations; 1 by others)
A statistical approach for the recognition of Arabic characters is introduced. As a first step, the character is segmented into primary and secondary parts (dots and zigzags). The secondary parts of the character are then isolated and identified separately, thereby reducing the number of classes from 28 to 18. The moments of the horizontal and vertical projections of the remaining primary characters are then calculated and normalized with respect to the zero-order moment. Simple measures of the shape are obtained from the normalized moments. A 9-D feature vector is obtained for each character. Classification is accomplished using quadratic discriminant functions. The approach was evaluated using isolated, handwritten, and printed characters from a database established for this purpose. The results indicate that the technique offers better classification rates in comparison with existing methods.
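
The projection-moment step in this abstract can be sketched as follows. This is a minimal illustration, assuming a binary glyph stored as a list of rows; the function names and the sample glyph are hypothetical, and the paper's actual 9-D feature vector is not specified in the abstract, so only the normalized raw moments are shown.

```python
def projections(image):
    """Return (horizontal, vertical) projections of a binary image given as a list of rows."""
    horizontal = [sum(row) for row in image]        # ink count per row
    vertical = [sum(col) for col in zip(*image)]    # ink count per column
    return horizontal, vertical


def raw_moment(profile, order):
    """Raw moment of the given order: sum over positions i of i**order * profile[i]."""
    return sum((i ** order) * value for i, value in enumerate(profile))


def normalized_moments(profile, max_order=3):
    """Moments of order 1..max_order divided by the zero-order moment (total ink)."""
    m0 = raw_moment(profile, 0)
    if m0 == 0:
        raise ValueError("empty profile: no foreground pixels")
    return [raw_moment(profile, k) / m0 for k in range(1, max_order + 1)]


# Hypothetical 4x4 glyph used only to exercise the functions.
glyph = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
h_profile, v_profile = projections(glyph)
h_features = normalized_moments(h_profile)
v_features = normalized_moments(v_profile)
```

Dividing by the zero-order moment makes the features independent of how much ink the glyph contains, which is the normalization the abstract describes.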

7.
With the growth of Arabic electronic data on the web, information extraction, one of the major challenges of question answering, is used mainly for building document corpora. Building a corpus is itself a research topic that recurs among the major themes of natural language processing (NLP) conferences, alongside information retrieval (IR), question answering (QA), automatic summarization (AS), and others. Generally, a question-answering system provides various passages to answer the user's questions. To make these passages truly informative, the system needs access to an underlying knowledge base, which requires the construction of a corpus. The aim of our research is to build an Arabic question-answering system. Analyzing the question is the first step; next, a passage that can serve as an appropriate answer must be retrieved from the web. In this paper, we propose a method to analyze the question and retrieve the answer passage in the Arabic language. For the question analysis, five factual question types are processed. Additionally, we experiment with generating a logic representation from the declarative form of each question. Several studies dealing with logic approaches to question answering are discussed for languages other than Arabic. This representation is very promising because it later helps in selecting a justifiable answer. 64% of the questions were correctly analyzed and translated into the logic form, and the automatically generated text passages achieved an accuracy of 87% and a c@1 score of 98%.
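
The abstract reports a c@1 score without defining it. The sketch below uses the definition commonly applied in question-answering evaluation, c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of correctly answered questions, n_U the number left unanswered, and n the total. That the paper uses exactly this definition is an assumption, and the counts in the example are made up for illustration.

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1: accuracy that gives unanswered questions partial credit at the observed accuracy rate."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct + n_unanswered > n_total:
        raise ValueError("counts exceed the total number of questions")
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total


# Illustrative counts only: 8 correct, 2 left unanswered, 10 questions in total.
example_score = c_at_1(8, 2, 10)
# With nothing left unanswered, c@1 reduces to plain accuracy.
plain_accuracy = c_at_1(7, 0, 10)
```

Because unanswered questions earn partial credit, c@1 is never lower than accuracy, which is consistent with the 98% c@1 against 87% accuracy reported above.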

8.
The thermodynamic formation energies of cadmium selenide (CdSe) in the molecular and nanoscale range are investigated using density functional theory. The investigation uses wurtzoid and diamondoid clusters, which represent the wurtzite and zincblende structures in the molecular and nanoscale size range, for clusters with n ≤ 26 atoms. Cd and Se atomic clusters are optimized and used to provide the component atomic cluster energies. Although both Cd and Se clusters at the nanoscale have different phases than the bulk, the results show that the Gibbs free energy, enthalpy, and entropy of formation of CdSe are close to their experimental bulk values, within the errors of experimental measurement. CdSe wurtzoids generally have a higher absolute (more negative) Gibbs free energy of formation than CdSe diamondoids, indicating that wurtzoid molecules are more stable, as is also the case in the bulk. The absolute Gibbs free energy of wurtzoids is also higher (more negative) than the experimental value because of surface effects at the nanoscale. The enthalpy of formation indicates an exothermic reaction of Cd and Se clusters, as is the case in the bulk. The entropy of formation of all clusters is size-sensitive and converges towards bulk experimental measurements. Both the wurtzoid and diamondoid families contain the Cd13Se13 cluster, which is the most investigated magic CdSe cluster.

9.
10.
Automatic outline capture of Arabic fonts   (cited by: 2 total; 0 self-citations; 2 by others)
This paper presents an algorithm for automatic outline capture of digital character images, particularly suitable for non-Roman languages like Arabic. In most desktop publishing systems, the shapes of the characters are stored in computer memory in terms of their outlines, and the outlines are expressed as cubic Bezier curves. The process of producing outlines includes several steps: detecting the boundary, discovering corner points and break points, and fitting the curve. This process becomes slow and inaccurate if humans are involved in any of these steps. The work presented in this paper fully automates the process and produces optimal results.
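
Since the outlines are stored as cubic Bezier curves, a short sketch of how one such segment is evaluated may help. It uses the standard Bernstein form B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3; the control points below are made-up values, and the paper's boundary detection and curve-fitting steps are not reproduced here.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a 2-D cubic Bezier segment at parameter t in [0, 1]."""
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_segment(p0, p1, p2, p3, steps=8):
    """Return steps + 1 points along the segment, endpoints included."""
    return [cubic_bezier(p0, p1, p2, p3, i / steps) for i in range(steps + 1)]


# Hypothetical control points for one outline segment.
P0, P1, P2, P3 = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)
midpoint = cubic_bezier(P0, P1, P2, P3, 0.5)
polyline = sample_segment(P0, P1, P2, P3)
```

The curve passes through P0 and P3 only; P1 and P2 shape it, which is why corner points and break points are natural places to split an outline into segments.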

11.
12.
An automatic off-line character recognition system for handwritten cursive Arabic characters is presented. A robust noise-independent algorithm is developed that yields skeletons that reflect the structural relationships of the character components. The character skeleton is converted to a tree structure suitable for recognition. A set of fuzzy constrained character graph models (FCCGMs), which tolerate large variability in writing, is designed. These models are graphs with fuzzily labeled arcs, used as prototypes for the characters. A set of rules is applied in sequence to match a character tree to an FCCGM. Arabic handwriting samples from four writers were used in the learning and testing stages. The system proved strong in its tolerance to variable writing, its speed, and its recognition rate.

13.
Computers & Fluids, 2006, 35(8-9): 986-993
The Lattice–Boltzmann method (LBM) for the simulation of low Mach number flows is evaluated for application in flow acoustics. By linearization and von Neumann analysis, quantitative measures are derived for the accuracy of the phase speed and attenuation of low-amplitude sound waves in the presence of a mean flow. It is shown that only phase errors are relevant when simulating sound waves in the audible frequency range in air. For the two-dimensional 9-bit model and the three-dimensional 19-bit model, the phase speed error is below 0.1% (1%) as long as the wave is resolved with at least 34 (12) points per wavelength. The LBM is applied to the problem of a Helmholtz resonator under a grazing flow and to the trailing edge noise generation problem. The results clearly demonstrate the ability to reproduce relevant flow acoustic effects.

14.
Research on heart murmur extraction and classification   (cited by: 1 total; 0 self-citations; 1 by others)
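To analyze the pathological information contained in heart murmurs, singular spectrum principal component analysis is used to extract the murmur components from pathological heart sound signals. Singular spectrum analysis is applied to four common types of pathological heart sound signals to obtain the principal components and empirical orthogonal functions, and suitable orders are selected to reconstruct the normal heart sound component and the murmur component. The sample entropy of the murmur signal is computed and fed as a feature value to a support vector machine classifier to perform classification, providing reference information for clinical diagnosis.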

15.
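Ontology technology is the key to exchanging data at the semantic level, and how to represent the many kinds of existing data as ontology knowledge bases is a very important problem. To address it, and building on a conversion algorithm from a relational schema to a semantically extended ER model that satisfies correctness, we propose an algorithm that translates a database into an OWL DL ontology through database reverse engineering. We show that the algorithm makes the conversion satisfy correctness, and we verify the algorithm with an experimental implementation.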

16.
This paper describes the design and implementation of a computational model for Arabic natural language semantics, a semantic parser for capturing the deep semantic representation of Arabic text. The parser represents a major part of an Interlingua-based machine translation system for translating Arabic text into Sign Language. The parser follows a frame-based analysis to capture the overall meaning of Arabic text in a formal representation suitable for NLP applications that need a deep semantic representation, such as language generation and machine translation. We will show the representational power of this theory for the semantic analysis of texts in Arabic, a language which differs substantially from English in several ways. We will also show that the integration of WordNet and FrameNet in a single unified knowledge resource can improve disambiguation accuracy. Furthermore, we will propose a rule-based algorithm to generate an equivalent Arabic FrameNet, using a lexical resource alignment of FrameNet 1.3 LUs and WordNet 3.0 synsets for the English language. A pilot study of motion and location verbs was carried out in order to test our system. Our corpus is made up of more than 2000 Arabic sentences in the domain of motion events, collected from Algerian first-level educational Arabic books and other relevant Arabic corpora.

17.
18.
Fuzzy-clustering methods, such as fuzzy k-means and expectation maximization, allow an object to be assigned to multiple clusters with different degrees of membership. However, the memberships that result from fuzzy-clustering algorithms are difficult to analyze and visualize. The memberships, usually converted to 0-1 values, are visualized using parallel coordinates or different color shades. In this paper, we propose a new approach to visualize fuzzy-clustered data. The scheme is based on a geometric visualization, and works by grouping the objects with similar cluster memberships towards the vertices of a hyper-tetrahedron. The proposed method shows clear advantages over the existing methods, demonstrating its capabilities for viewing and navigating inter-cluster relationships in a spatial manner.
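
One simple way to picture the geometric idea is to place each object at the membership-weighted average of the vertices of a regular tetrahedron, one vertex per cluster. The abstract does not give the paper's actual mapping, so this barycentric placement, the four-cluster case, the vertex coordinates, and the function name `place` are all assumptions made for illustration.

```python
# Vertices of a regular tetrahedron in 3-D, one per cluster (four clusters assumed).
TETRAHEDRON = [
    (1.0, 1.0, 1.0),
    (1.0, -1.0, -1.0),
    (-1.0, 1.0, -1.0),
    (-1.0, -1.0, 1.0),
]


def place(memberships, vertices=TETRAHEDRON):
    """Return the membership-weighted average of the vertices (barycentric placement)."""
    if len(memberships) != len(vertices):
        raise ValueError("need exactly one membership per vertex")
    total = sum(memberships)
    if total <= 0:
        raise ValueError("memberships must sum to a positive value")
    weights = [m / total for m in memberships]
    dims = len(vertices[0])
    return tuple(
        sum(w * vertex[d] for w, vertex in zip(weights, vertices))
        for d in range(dims)
    )


crisp = place([1.0, 0.0, 0.0, 0.0])       # lands on the first vertex
leaning = place([0.7, 0.1, 0.1, 0.1])     # pulled towards the first vertex
ambiguous = place([0.25, 0.25, 0.25, 0.25])  # lands on the centroid
```

Under this placement, objects with similar memberships land near each other, and objects dominated by one cluster sit close to that cluster's vertex.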

19.
The task of building an ontology from a textual corpus starts with the conceptualization phase, which extracts ontology concepts. These concepts are linked by semantic relationships. In this paper, we describe an approach to the construction of an ontology from an Arabic textual corpus. We start with the collection and preparation of the corpus through normalization, stop-word removal, and stemming. Then, to extract the terms of our ontology, we apply a statistical method for extracting simple and complex terms, called “the repeated segments method”. To select segments with sufficient weight we apply the term frequency–inverse document frequency (TF–IDF) weighting method, and to link these terms by semantic relationships we apply an automatic method of learning linguistic markers from text. This method requires a dataset of relationship pairs, which are extracted from two external resources: an Arabic dictionary of synonyms and antonyms and the lexical database Arabic WordNet. Finally, we present the results of our experimentation using our textual corpus. The evaluation of our approach shows encouraging results in terms of recall and precision.
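
The TF–IDF weighting step can be sketched as below. TF–IDF has several variants and the abstract does not say which one the authors use; this sketch assumes relative term frequency and idf = ln(N / df). The function name and the placeholder segments are hypothetical and stand in for already normalized and stemmed text.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, with tf = count / length and idf = ln(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))   # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Placeholder segments, not data from the paper's corpus.
corpus = [["seg_a", "seg_b", "seg_a"], ["seg_b", "seg_c"]]
scores = tf_idf(corpus)
```

A segment that appears in every document, such as `seg_b` here, gets weight zero under this variant, so a threshold on the weight filters out segments that do not discriminate between documents.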

20.
Universal Access in the Information Society - Arabic sign language (ArSL) is a full natural language that is used by the deaf in Arab countries to communicate in their community. Unfamiliarity with...
