期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Extracting an Arabic Lexicon from Arabic Newspaper Text

Saleem Abuleil Martha Evens 《Computers and the Humanities》2002,36(2):191-221

We describe how to build a largecomprehensive, integrated Arabic lexicon byautomatic parsing of newspaper text. We havebuilt a parser system to read Arabic newspaperarticles, isolate the tokens from them, findthe part of speech, and the features for eachtoken. To achieve this goal we designed a setof algorithms, we generated several sets ofrules, and we developed a set of techniques,and a set of components to carry out thesetechniques. As each sentence is processed, newwords and features are added to the lexicon, sothat it grows continuously as the system runs.To test the system we have used 100 articles(80,444 words) from the Al-Raya newspaper.The system consists of several modules: thetokenizer module to isolate the tokens, the type findersystem to find the part of speech of eachtoken, the proper noun phrase parser module tomark the proper nouns and to discover someinformation about them and the feature findermodule to find the features of the words. 相似文献

2.

Modeling Vocal Knowledge for Speech Synthesis of Standard Arabic

Tebbi Hanane Azzoune Hamid 《通讯和计算机》2014,(1):104-109

相似文献

3.

Machine translation from Arabic to English and French

《Information Sciences - Applications #》1995,3(2):91-109

As the cognitive processes of natural language understanding and generation are better understood, it is becoming easier, nowadays, to perform machine translation. In this paper we present our work on machine translation from Arabic to English and French, and illustrate it with a fully operational system, which runs on PC compatibles with Arabic/Latin interface. This system is an extension of an earlier system, whose task was the analysis of the natural language Arabic. Thanks to the regularity of its phrase structures and word patterns, Arabic lends itself quite naturally to a Fillmore-like analysis. The meaning of a phrase is stored in a star-like data structure, where the verb occupies the center of the star and the various noun sentences occupy specific peripheral nodes of the star. The data structure is then translated into an internal representation in the target language, which is then mapped into the target text. 相似文献

4.

Time-sensitive Arabic multiword expressions extraction from social networks

Daoud Daoud Akram Al-Kouz Mohammad Daoud 《International Journal of Speech Technology》2016,19(2):249-258

In this paper, we present a comprehensive approach for extracting and relating Arabic multiword expressions (MWE) from Social Networks. 15 million tweets were collected and processed to form our data set. Due to the complexity of processing Arabic and the lack of resources, we built an experimental system to extract and relate similar MWE using statistical methods. We introduce a new metrics for measuring valid MWE in Social Networks. We compare results obtained from our experimental system against semantic graph obtained from web knowledgebase. 相似文献

5.

基于短语的阿拉伯语到中文的机器翻译系统*

李凯郑洁蒋同海《计算机应用研究》2009,26(6):2306-2309

使用基于短语的统计翻译方法,搭建了一个简易的阿拉伯语到中文的翻译系统。核心的解码器采用了loglinear直接翻译模型进行开发,在系统中使用了大量的开源软件进行语料库的预处理,并讨论了该方向上尚未解决的问题和未来的发展趋势。相似文献

6.

Recognition of Arabic characters 总被引：1，自引：0，他引：1

Al-Yousefi H. Udpa S.S. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(8):853-857

A statistical approach for the recognition of Arabic characters is introduced. As a first step, the character is segmented into primary and secondary parts (dots and zigzags). The secondary parts of the character are then isolated and identified separately, thereby reducing the number of classes from 28 to 18. The moments of the horizontal and vertical projections of the remaining primary characters are then calculated and normalized with respect to the zero-order moment. Simple measures of the shape are obtained from the normalized moments. A 9-D feature vector is obtained for each character. Classification is accomplished using quadratic discriminant functions. The approach was evaluated using isolated, handwritten, and printed characters from a database established for this purpose. The results indicate that the technique offers better classification rates in comparison with existing methods 相似文献

7.

A logical representation of Arabic questions toward automatic passage extraction from the Web

Wided Bakari Patrice Bellot Mahmoud Neji 《International Journal of Speech Technology》2017,20(2):339-353

With the expanding growth of Arabic electronic data on the web, extracting information, which is actually one of the major challenges of the question-answering, is essentially used for building corpus of documents. In fact, building a corpus is a research topic that is currently referred to among some other major themes of conferences, in natural language processing (NLP), such as, information retrieval (IR), question-answering (QA), automatic summary (AS), etc. Generally, a question-answering system provides various passages to answer the user questions. To make these passages truly informative, this system needs access to an underlying knowledge base; this requires the construction of a corpus. The aim of our research is to build an Arabic question-answering system. In addition, analyzing the question must be the first step. Next, it is essential to retrieve a passage from the web that can serve as an appropriate answer. In this paper, we propose a method to analysis the question and retrieve the passage answer in the Arabic language. For the question analysis, five factual question types are processed. Additionally, our purpose is to experiment with the generation of a logic representation from the declarative form of each question. Several studies, deal with the logic approaches in question-answering, are discussed in other languages than the Arabic language. This representation is very promising because it helps us later in the selection of a justifiable answer. The accuracy of questions that are correctly analyzed and translated into the logic form achieved 64%. And then, the results of passages of texts that are automatically generated achieved an 87% score for accuracy and a 98% score for c@1. 相似文献

8.

Formation energies of CdSe wurtzoid and diamondoid clusters formed from Cd and Se atomic clusters

《Calphad》2019

Cadmium Selenide thermodynamic formation energies at the molecular and nanoscale range are investigated using density functional theory. The investigation is performed using wurtzoid and diamondoid clusters that represent the wurtzite and zincblende structures at the molecular and nanoscale size range for a cluster number of atoms n ≤ 26. Cd and Se atomic clusters are optimized and used to provide component atomic cluster energies. Although both Cd and Se clusters at the nanoscale have different phases than bulk, the results show that Gibbs free energy, enthalpy, and entropy of formation of CdSe are close to their experimental bulk energies of formation within errors of experimental measurements. CdSe wurtzoids generally have higher absolute (more negative) Gibbs free energy of formation than CdSe diamondoids indicating more stable wurtzoid molecules which is also the case at bulk. The absolute Gibbs free energy of wurtzoids is also higher than experimental value (more negative) because of surface effects at the nanoscale. Enthalpy of formation indicates an exothermic reaction of Cd and Se clusters as is the case at bulk. The entropy of formation of all clusters is size-sensitive and converges towards bulk experimental measurements. Both wurtzoids and diamondoids members contain Cd₁₃Se₁₃ cluster which is the most investigated magic CdSe cluster. 相似文献

9.

On-line recognition of handwritten Arabic characters

Al-Emami S. Usher M. 《IEEE transactions on pattern analysis and machine intelligence》1990,12(7):704-710

相似文献

10.

Automatic outline capture of Arabic fonts 总被引：2，自引：0，他引：2

M. Sarfraz M. A. Khan 《Information Sciences》2002,140(3-4):269-281

This paper presents an algorithm for automatic outline capture of digital character images, particularly suitable for non-Roman languages like Arabic. In most of the desktop publishing systems, the shapes of the characters are stored in the computer memory in terms of their outlines, and the outlines are expressed as cubic Bezier curves. The process of producing outlines includes various steps, detection of boundary, discovering corner points and break points and fitting the curve. This process becomes slow and inaccurate if there is any involvement of humans in any of the above steps. The work done, in this paper, fully automates the process and produces the best optimal results 相似文献

11.

Production of referring expressions in Arabic

Imtiaz Hussain Khan 《International Journal of Speech Technology》2016,19(2):385-392

相似文献

12.

Recognition of handwritten cursive Arabic characters

Abuhaiba I.S.I. Mahmoud S.A. Green R.J. 《IEEE transactions on pattern analysis and machine intelligence》1994,16(6):664-672

An automatic off-line character recognition system for handwritten cursive Arabic characters is presented. A robust noise-independent algorithm is developed that yields skeletons that reflect the structural relationships of the character components. The character skeleton is converted to a tree structure suitable for recognition. A set of fuzzy constrained character graph models (FCCGM's), which tolerate large variability in writing, is designed. These models are graphs, with fuzzily labeled arcs used as prototypes for the characters. A set of rules is applied in sequence to match a character tree to an FCCGM. Arabic handwritings of four writers were used in the learning and testing stages. The system proved to be powerful in tolerance to variable writing, speed, and recognition rate 相似文献

13.

Calculation of sound generation and radiation from instationary flows

《Computers & Fluids》2006,35(8-9):986-993

The Lattice–Boltzmann method (LBM) for simulation of low Mach number flows is evaluated for the application in flow acoustics. By linearization and von-Neumann analysis quantitative measures for the accuracy of phase speed and attenuation of low amplitude sound waves in presence of a mean flow are derived. It is shown that only phase errors are relevant when simulating sound waves in the audible frequency range in air. For the two dimensional 9 bit model and the three dimensional 19 bit model the phase speed error is below 0.1% (1%) as long as the wave is resolved with at least 34 (12) points per wavelength. The LBM is applied to the problem of a Helmholtz resonator under a grazing flow and to the trailing edge noise generation problem. The results clearly demonstrate the ability to reproduce relevant flow acoustic effects. 相似文献

14.

心脏杂音提取和分类识别研究 总被引：1，自引：0，他引：1

郭兴明胡童宜汤丽平《计算机工程与应用》2012,48(15):149-152,167

为了分析心脏杂音中包含的病理信息,采用奇异谱主分量分析方法从病理心音信号中提取杂音成分。对四种常见的病理心音信号进行奇异谱分析,得到各主分量和经验正交函数,选择合适阶次重构正常心音成分和杂音成分。计算杂音信号的样本熵作为特征值输入支持向量机分类器实现分类识别,为临床诊断提供参考信息。相似文献

15.

一种关系模式到OWL DL本体的翻译方法

下载免费PDF全文

彭劲松徐德智《计算机工程与应用》2008,44(12):166-169

本体技术是数据可以达到语义层次交换的关键,如何将当前各类数据形成本体知识库表示是一个非常重要的问题。针对这个问题,以一种关系模式到一种语义扩展ER模型的正确性可满足转换算法为基础,提出了一种通过数据库反向工程到OWL DL本体的翻译算法,说明了该算法使得转换是正确性可满足的,并实验实现验证了算法。相似文献

16.

A frame-based approach for capturing semantics from Arabic text for text-to-sign language MT

Abdelaziz Lakhfif Mohamed Tayeb Laskri 《International Journal of Speech Technology》2016,19(2):203-228

This paper describes the design and implementation of a computational model for Arabic natural language semantics, a semantic parser for capturing the deep semantic representation of Arabic text. The parser represents a major part of an Interlingua-based machine translation system for translating Arabic text into Sign Language. The parser follows a frame-based analysis to capture the overall meaning of Arabic text into a formal representation suitable for NLP applications that need for deep semantics representation, such as language generation and machine translation. We will show the representational power of this theory for the semantic analysis of texts in Arabic, a language which differs substantially from English in several ways. We will also show that the integration of WordNet and FrameNet in a single unified knowledge resource can improve disambiguation accuracy. Furthermore, we will propose a rule based algorithm to generate an equivalent Arabic FrameNet, using a lexical resource alignment of FrameNet1.3 LUs and WordNet3.0 synsets for English Language. A pilot study of motion and location verbs was carried out in order to test our system. Our corpus is made up of more than 2000 Arabic sentences in the domain of motion events collected from Algerian first level educational Arabic books and other relevant Arabic corpora. 相似文献

17.

Phonetization of Arabic: rules and algorithms

Yousif A. El-Imam 《Computer Speech and Language》2004,18(4):339-373

相似文献

18.

Geometric visualization of clusters obtained from fuzzy clustering algorithms

Luis Rueda Yuanquan Zhang 《Pattern recognition》2006,39(8):1415-1429

Fuzzy-clustering methods, such as fuzzy k-means and expectation maximization, allow an object to be assigned to multiple clusters with different degrees of membership. However, the memberships that result from fuzzy-clustering algorithms are difficult to be analyzed and visualized. The memberships, usually converted to 0-1 values, are visualized using parallel coordinates or different color shades. In this paper, we propose a new approach to visualize fuzzy-clustered data. The scheme is based on a geometric visualization, and works by grouping the objects with similar cluster memberships towards the vertices of a hyper-tetrahedron. The proposed method shows clear advantages over the existing methods, demonstrating its capabilities for viewing and navigating inter-cluster relationships in a spatial manner. 相似文献

19.

Extraction of terms and semantic relationships from Arabic texts for automatic construction of an ontology

Ali Benabdallah Mohammed AlaEddine Abderrahim Mohammed El-Amine Abderrahim 《International Journal of Speech Technology》2017,20(2):289-296

The task of building an ontology from a textual corpus starts with the conceptualization phase, which extracts ontology concepts. These concepts are linked by semantic relationships. In this paper, we describe an approach to the construction of an ontology from an Arabic textual corpus, starting first with the collection and preparation of the corpus through normalization, removing stop words and stemming; then, to extract terms of our ontology, a statistical method for extracting simple and complex terms, called “the repeated segments method” are applied. To select segments with sufficient weight we apply the weighting method term frequency–inverse document frequency (TF–IDF), and to link these terms by semantic relationships we apply an automatic method of learning linguistic markers from text. This method requires a dataset of relationship pairs, which are extracted from two external resources: an Arabic dictionary of synonyms and antonyms and the lexical database Arabic WordNet. Finally, we present the results of our experimentation using our textual corpus. The evaluation of our approach shows encouraging results in terms of recall and precision. 相似文献

20.

Automatic translation of Arabic text-to-Arabic sign language

Luqman Hamzah Mahmoud Sabri A. 《Universal Access in the Information Society》2019,18(4):939-951

Universal Access in the Information Society - Arabic sign language (ArSL) is a full natural language that is used by the deaf in Arab countries to communicate in their community. Unfamiliarity with... 相似文献