Similar literature
 20 similar documents found (search time: 62 ms)
1.
This paper presents two grammars for reading numbers in classical and modern Arabic. The grammars make use of the structured Arabic counting system to present an accurate and compact grammar that can be easily implemented on different platforms. Automating the process of reading numbers, from numerical representation to sentential form, has many applications. Inquiring about a bank balance over the phone, automatically writing the amount on checks (from numerical form to letter form), and reading for blind people are some of the fields in which automated reading of numbers can be of service. The parsing problem of the sentential representation of numbers in the Arabic language is also addressed. A grammar to convert from the sentential representation to the numerical representation is also presented. The grammars presented can be used to translate from sentential Arabic numbers to sentential English numbers, and vice versa, by using the common numerical representation as an intermediate code. Such a methodology can be used to aid automatic translation between the two natural languages. All grammars described in this paper have been implemented on a UNIX system. Examples of different number representations and the output of the implementation of the grammars are given as part of the paper.

2.
Nava Ehsan  Heshaam Faili 《Software》2013,43(2):187-206
Producing electronic rather than paper documents has considerable benefits, such as easier organization and data management. Automatic writing assistance tools such as spelling and grammar checkers/correctors can therefore increase the quality of electronic texts by removing noise and correcting erroneous sentences. The errors in a text can be categorized into spelling, grammatical and real‐word errors. In this article, we present a language‐independent approach based on a statistical machine translation framework to develop a proofreading tool, which detects grammatical errors as well as context‐sensitive spelling mistakes (real‐word errors). A hybrid model for grammar checking is suggested by combining this approach with an existing rule‐based grammar checker. Experimental results on both English and Persian indicate that the proposed statistical method and the rule‐based grammar checker are complementary in detecting and correcting syntactic errors. The results of the hybrid grammar checker, applied to some English texts, show an improvement of about 24% in recall with almost the same precision. Experiments on a real‐world data set show that state‐of‐the‐art results are achieved for grammar checking and context‐sensitive spell checking for the Persian language. Copyright © 2012 John Wiley & Sons, Ltd.

3.
Marcin Miłkowski 《Software》2010,40(7):543-566
In this paper, we show how an open‐source, language‐independent proofreading tool has been built. Many languages lack contextual proofreading tools; for many others, only partial solutions are available. Using existing, largely language‐independent tools and collaborative processes, it is possible to develop a practical style and grammar checker and to fight the digital divide in countries where commercial linguistic application software is unavailable or too expensive for average users. The described solution depends on relatively easily available language resources and requires neither a fully formalized grammar nor a deep parser, yet it can detect many frequent context‐dependent spelling mistakes, as well as grammatical, punctuation, usage, and stylistic errors. Copyright © 2010 John Wiley & Sons, Ltd.

4.
The World Wide Web (WWW) today is so vast that it has become more and more difficult to find answers to questions using standard search engines. Current search engines can return ranked lists of documents, but they do not deliver direct answers to the user. The goal of Open Domain Question Answering (QA) systems is to take a natural language question, understand the meaning of the question, and present a short answer as a response based on a repository of information. In this paper we present QARAB, a QA system that combines techniques from Information Retrieval and Natural Language Processing. This combination enables domain independence. The system takes natural language questions expressed in the Arabic language and attempts to provide short answers in Arabic. To do so, it attempts to discover what the user wants by analyzing the question and a variety of candidate answers from a linguistic point of view.

5.
Arabic is the world’s first language, categorized by its rich and complicated grammatical formats. Furthermore, Arabic morphology can be perplexing because nearly 10,000 roots and 900 patterns form the basis for verbs and nouns. The Arabic language consists of distinct variations used in particular communities and situations. Social media sites are a medium for expressing opinions and social phenomena like racism, hatred, offensive language, and all kinds of verbal violence. Such conduct affects not only particular nations, communities, or groups but extends into people’s everyday lives. This study introduces an Improved Ant Lion Optimizer with Deep Learning Driven Offensive and Hate Speech Detection (IALODL-OHSD) on Arabic Cross-Corpora. The presented IALODL-OHSD model mainly aims to detect and classify offensive/hate speech expressed on social media. In the IALODL-OHSD model, a three-stage process is performed, namely pre-processing, word embedding, and classification. First, data pre-processing is performed to transform the Arabic social media text into a useful format. Next, the word2vec word embedding process is used to produce word embeddings. The attention-based cascaded long short-term memory (ACLSTM) model is then used for the classification process. Finally, the IALO algorithm is exploited as a hyperparameter optimizer to boost classifier results. To analyze the results of the IALODL-OHSD model, a detailed set of simulations was performed. The extensive comparison study showed the enhanced performance of the IALODL-OHSD model over other approaches.

6.
Stemming is the process of transforming a word into its root or stem; hence, it is considered a crucial pre-processing step before tackling any natural language processing or information retrieval task. However, in the case of Arabic, finding an effective stemming algorithm seems to be quite difficult, since the Arabic language has a specific morphology that differs from that of many other languages. Although several algorithms in the literature address the Arabic stemming issue, most of them are restricted to a limited number of words, confuse original letters with affixes, and usually employ a dictionary of words or patterns. We therefore propose the design and implementation of a novel Arabic light stemmer, which is based on new rules for stripping prefixes, suffixes and infixes in a smart way. To our knowledge, it is the first work dealing with Arabic infixes with regard to their irregular rules. The empirical evaluation was conducted on a new Arabic data set (called ARASTEM), which was conceived and collected from several Arabic discussion forums containing dialectal Arabic and modern pseudo-Arabic languages. We present a comparative investigation between our new stemmer and other existing stemmers using Paice’s parameters, namely: Under Stemming Index (UI), Over Stemming Index (OI) and Stemming Weight (SW). Results show that the proposed Arabic light stemmer maintains consistently high performance and outperforms several existing light stemmers.

7.
The retrieval of information from scanned handwritten documents is becoming vital with the rapid increase of digitized documents, and word spotting systems have been developed to search for words within documents. These systems can be based on either template matching or learning. This paper presents a coherent learning-based Arabic handwritten word spotting system that can adapt to the nature of Arabic handwriting, which can have no clear boundaries between words. Consequently, the system recognizes Pieces of Arabic Words (PAWs), then reconstructs and spots words using language models. The proposed system produced promising results for Arabic handwritten word spotting when tested on the CENPARMI Arabic documents database.

8.
9.
There has been much interest recently in two-level and associative models for handling morphologically rich inflectional languages. Such models are claimed to have advantages over generative, rule-based approaches in terms of not just conceptual appropriateness but also computational efficiency. The claim with regard to the former is that, whilst generative approaches to morphology may well be useful for inflectionally simple natural languages such as English (where most of the processing is carried out at the sentence level, with dictionaries and lexicons being accessed to identify secondary inflectional information once primitive words are found), this approach is not at all suitable for inflectionally rich languages where grammatical information is carried not by the combination or pattern of distinct and separate words which make up the sentence but by the combination or pattern of inflections within a word where, for instance, there are no clear boundaries between morphological constituents. The claim with regard to the latter is that many generative approaches to natural language are inefficient and, in some cases, computationally intractable, because of the heavy memory and processing demand placed on implementing actual models based on these approaches for anything more than a constrained fragment of a language. This paper describes an application of finite-state automata for Arabic noun inflections which leads to abstractions based on network topology as well as the form and content of network arcs. The idea of specific automata for specific inflection types inheriting some or all of the nodes, arc form and arc content of abstract automata representing more abstract classes of inflection is also introduced. This can lead to novel linguistic generalities and applications, as well as advantages in terms of procedural efficiency and representation.

10.
SAMAR is a system for subjectivity and sentiment analysis (SSA) for Arabic social media genres. Arabic is a morphologically rich language, which presents significant complexities for standard approaches to building SSA systems designed for the English language. Apart from the difficulties presented by processing social media genres, the Arabic language inherently has a high number of variable word forms, leading to data sparsity. In this context, we address the following four pertinent issues: how to best represent lexical information; whether standard features used for English are useful for Arabic; how to handle Arabic dialects; and whether genre-specific features have a measurable impact on performance. Our results show that using either lemma or lexeme information is helpful, as is using the two part-of-speech tagsets (RTS and ERTS). However, the results show that we need individualized solutions for each genre and task, but that lemmatization and the ERTS POS tagset are present in a majority of the settings.

11.
12.
The importance of the parsing task for NLP applications is well understood. However, developing parsers remains difficult because of the complexity of the Arabic language. Most parsers are based on syntactic grammars that describe the syntactic structures of a language. The development of these grammars is laborious and time consuming. In this paper we present our method for building an Arabic parser based on an induced probabilistic context-free grammar (PCFG). We first induce the PCFG from an Arabic treebank. Then, we implement the parser that assigns a syntactic structure to each input sentence. The parser is tested on sentences extracted from the treebank (1650 sentences). We calculate the precision, recall and F-measure. Our experimental results show the efficiency of the proposed parser for parsing Modern Standard Arabic sentences (Precision: 83.59 %, Recall: 82.98 % and F-measure: 83.23 %).
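The grammar-induction step in the abstract above can be illustrated with a minimal sketch. It assumes the usual relative-frequency (maximum-likelihood) estimate, P(A → β) = count(A → β) / count(A); the abstract does not say which estimator the authors used. The tuple-based tree encoding, the function names `rules` and `induce_pcfg`, and the toy English-gloss treebank are all hypothetical and stand in for a real Arabic treebank.

```python
from collections import Counter

# Assumed encoding: a tree is (label, child, child, ...);
# a child that is a plain string is a word (terminal).
def rules(tree):
    """Yield one (lhs, rhs) production for every internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def induce_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

# Hypothetical two-sentence treebank, for illustration only.
toy_treebank = [
    ("S", ("NP", "boy"), ("VP", ("V", "read"), ("NP", "book"))),
    ("S", ("NP", "girl"), ("VP", ("V", "slept"))),
]
pcfg = induce_pcfg(toy_treebank)
```

In this toy grammar the probabilities of all rules sharing a left-hand side sum to 1, which is the property a parser such as CKY relies on when scoring candidate trees.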

13.
Recently, with the spread of online services involving websites, attackers have the opportunity to expose these services to malicious actions. A Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is a technique proposed to protect these services. Since many Arabic countries have developed their online services in Arabic, Arabic text-based CAPTCHA has been introduced to improve usability for their users. Moreover, there exists a visual cryptography (VC) technique which can be exploited to enhance the security of text-based CAPTCHA by encrypting a CAPTCHA image into two shares and decrypting it by asking the user to stack them on each other. However, this technique has not yet been implemented for Arabic text-based CAPTCHA. Therefore, this paper aims to implement an Arabic printed and handwritten text-based CAPTCHA scheme based on the VC technique. To evaluate this scheme, experimental studies are conducted, and the results show that the implemented scheme offers reasonable security and usability levels with text-based CAPTCHA itself.
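The two-share idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the classic (2, 2) visual cryptography construction with two subpixels per pixel, not the scheme implemented in the paper; the function names, the list-of-lists image encoding, and the toy image are assumptions made for the example.

```python
import random

_RNG = random.SystemRandom()       # OS-backed randomness for share generation
PATTERNS = [(1, 0), (0, 1)]        # 1 = black subpixel, 0 = transparent

def make_shares(image, rng=_RNG):
    """Split a binary image (rows of 0/1, 1 = black) into two shares.

    Each pixel becomes two subpixels. A white pixel gets the same random
    pattern on both shares; a black pixel gets complementary patterns.
    """
    share1, share2 = [], []
    for row in image:
        r1, r2 = [], []
        for pixel in row:
            p = rng.choice(PATTERNS)
            q = p if pixel == 0 else (1 - p[0], 1 - p[1])
            r1.extend(p)
            r2.extend(q)
        share1.append(r1)
        share2.append(r2)
    return share1, share2

def stack(share1, share2):
    """Overlay two transparencies: a subpixel is black if black on either share."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]

def decode(stacked):
    """A pixel reads as black only if both of its subpixels are black."""
    return [[row[i] & row[i + 1] for i in range(0, len(row), 2)]
            for row in stacked]

# Toy stand-in for a rendered CAPTCHA glyph.
image = [[1, 0, 1], [0, 0, 1]]
s1, s2 = make_shares(image)
recovered = decode(stack(s1, s2))
```

Each share on its own shows exactly one black subpixel per pixel regardless of the image, so a single share reveals nothing; only the stacked result separates black pixels (fully black) from white ones (half black).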

15.
《Computers & Education》2008,50(4):1122-1146
This paper presents a field study carried out with learners who used a grammar checker in real writing tasks in an advanced course at a Swedish university. The objective of the study was to investigate how students made use of the grammar checker in their writing while learning Swedish as a second language. Sixteen students with different linguistic and cultural backgrounds participated in the study. A judgment procedure was conducted by the learners on the alarms from the grammar checker. The students’ texts were also collected in two versions: one written before the session with the grammar checker and one written after the session. This procedure made it possible to study to what extent the students followed the advice from the grammar checker, and how this was related to their judgments of its behavior. The results demonstrated that although most of the alarms from the grammar checker were accurate, some alarms were very hard for the students to judge correctly. The results also showed that providing students with feedback on different aspects of their target language use, not only on their errors, and facilitating language exploration and reflection are important processes to be supported in second-language learning environments. Based on these results, design principles were identified and integrated in the development of Grim, an interactive language-learning program for Swedish. We present the design of Grim, which is grounded in visualization of grammatical categories and examples of language use, providing tools for focusing on both linguistic code features and language comprehension.

16.
17.
18.
The Quranic Arabic Corpus (http://corpus.quran.com) is a collaboratively constructed linguistic resource initiated at the University of Leeds, with multiple layers of annotation including part-of-speech tagging, morphological segmentation (Dukes and Habash 2010) and syntactic analysis using dependency grammar (Dukes and Buckwalter 2010). The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400-year-old central religious text of Islam. This project contrasts with other Arabic treebanks by providing a deep linguistic model based on the historical traditional grammar known as i′rāb (إعراب). By adapting this well-known canon of Quranic grammar into a familiar tagset, it is possible to encourage online annotation by Arabic linguists and Quranic experts. This article presents a new approach to linguistic annotation of an Arabic corpus: online supervised collaboration using a multi-stage approach. The different stages include automatic rule-based tagging, initial manual verification, and online supervised collaborative proofreading. A popular website attracting thousands of visitors per day, the Quranic Arabic Corpus has approximately 100 unpaid volunteer annotators each suggesting corrections to existing linguistic tagging. To ensure a high-quality resource, a small number of expert annotators are promoted to a supervisory role, allowing them to review or veto suggestions made by other collaborators. The Quran also benefits from a large body of existing historical grammatical analysis, which may be leveraged during this review. In this paper we evaluate and report on the effectiveness of the chosen annotation methodology. We also discuss the unique challenges of annotating Quranic Arabic online and describe the custom linguistic software used to aid collaborative annotation.

19.
Classical mobile phone keypads, which consist of 12 buttons, are commonly used to write short text messages through two common methods: multi-tap and predictive text entry. On Arabic-language mobile keypads, all Arabic letters are distributed over 8 buttons of the keypad, with three or more letters sharing the same button. In this paper, a new text entry environment is proposed. The environment includes two improved approaches for Arabic-language messages that make the multi-tap text entry method faster and easier. The first approach is based on the idea of remapping the distribution of Arabic letters on the keypad according to the frequency of letters. In the second approach, a bigram-based method is used to automatically predict the next letter to be typed. The proposed approaches are evaluated using a corpus of 1514 real Arabic text messages. Several experiments were conducted to evaluate the proposed text entry environment. The results of the experiments showed that the proposed remapped keypad is faster and requires less effort than the classical keypad.
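The two approaches described in the abstract above can be sketched under stated assumptions. The abstract does not give the exact remapping rule, so this example deals letters to keys round-robin in descending frequency order, which places the most frequent letters in the first tap position; the function names, the six-letter Latin alphabet, the two-key layout and the three-message corpus are all hypothetical stand-ins for the Arabic alphabet, the 8 letter keys and the 1514-message corpus.

```python
from collections import Counter, defaultdict

def remap_keypad(corpus, alphabet, n_keys=8):
    """Deal letters to keys round-robin in descending frequency order,
    so the most frequent letters need a single tap."""
    freq = Counter(ch for msg in corpus for ch in msg if ch in alphabet)
    ordered = sorted(alphabet, key=lambda ch: (-freq[ch], ch))
    keys = [[] for _ in range(n_keys)]
    for i, ch in enumerate(ordered):
        keys[i % n_keys].append(ch)
    return keys

def taps(message, keys):
    """Multi-tap cost: the k-th letter on a key needs k presses."""
    cost = {ch: pos + 1 for key in keys for pos, ch in enumerate(key)}
    return sum(cost[ch] for ch in message if ch in cost)

def train_bigram(corpus):
    """Count, for each letter, which letters follow it in the corpus."""
    following = defaultdict(Counter)
    for msg in corpus:
        for a, b in zip(msg, msg[1:]):
            following[a][b] += 1
    return following

def predict_next(following, prev):
    """Return the letter most often seen after `prev`, or None if unseen."""
    if prev not in following:
        return None
    return following[prev].most_common(1)[0][0]

# Toy data, for illustration only.
corpus = ["fed", "fee", "fe"]
classic = [["a", "b", "c"], ["d", "e", "f"]]
remapped = remap_keypad(corpus, "abcdef", n_keys=2)
classic_cost = taps("fee", classic)
remapped_cost = taps("fee", remapped)
model = train_bigram(corpus)
guess = predict_next(model, "f")
```

On this toy corpus the message "fee" costs 7 presses on the alphabetical layout and 3 on the remapped one, and the bigram model proposes "e" after "f". Real gains would depend on the letter statistics of the actual message corpus.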

20.
Printed Arabic character recognition using HMM   Total citations: 1, self-citations: 0, citations by others: 1
The Arabic language has a very rich vocabulary. More than 200 million people speak it as their native language, and over 1 billion people use it in several religion-related activities. In this paper a new technique is presented for recognizing printed Arabic characters. After a word is segmented, each character/word is entirely transformed into a feature vector. The features of printed Arabic characters include strokes and bays in various directions, endpoints, intersection points, loops, dots and zigzags. The word skeleton is decomposed into a number of links in orthographic order, and then it is transformed into a sequence of symbols using vector quantization. A single hidden Markov model is used for recognizing the printed Arabic characters. Experimental results show that the high recognition rate depends on the number of states in each sample.
