Similar literature
 20 similar documents found (search time: 46 ms)
1.
In this paper, we present a system that automatically translates Arabic text embedded in images into English. The system consists of three components: text detection from images, character recognition, and machine translation. We formulate the text detection as a binary classification problem and apply gradient boosting tree (GBT), support vector machine (SVM), and location-based prior knowledge to improve the F1 score of text detection from 78.95% to 87.05%. The detected text images are processed by off-the-shelf optical character recognition (OCR) software. We employ an error correction model to post-process the noisy OCR output, and apply a bigram language model to reduce word segmentation errors. The translation module is tailored with a compact data structure for hand-held devices. The experimental results show substantial improvements in both word recognition accuracy and translation quality. For instance, in the experiment on Arabic transparent font, the BLEU score increases from 18.70 to 33.47 with the use of the error correction module.

2.
The topic of this paper is machine translation (MT) from French text into French sign language (LSF). After arguing in favour of a rule-based method, it presents the architecture of an original MT system, built on two distinct efforts: formalising LSF production rules and triggering them with text processing. The former is made without any concern for text or translation and involves corpus analysis to link LSF form features to linguistic functions. It produces a set of production rules which may constitute a full LSF production grammar. The latter is an information extraction task from text, broken down into as many subtasks as there are rules in the grammar. After discussing this architecture, comparing it to the traditional methods and presenting the methodology for each task, the paper presents the set of production rules found to govern event precedence and duration in LSF and gives a progress report on the implementation of the rule triggering system. With this proposal, it is also hoped to show how MT can benefit today from sign language processing.

3.
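Research and design of a block-based web page information parser   Total citations: 27, self-citations: 1, citations by others: 27
This paper describes in detail the basic techniques of web page information parsing. After weighing their advantages and disadvantages, it proposes a block-partitioning algorithm that is reasonably effective for the complex page structures of news websites and, driven by the requirements of a real project, designs and implements the web page information parser TVPS. Experimental results show that the parser performs well and meets practical needs.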

4.
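Traditional web pages are based on HTML syntax, a structure suited to display on a computer. However, the meaning a page expresses has to be recognized by the user while browsing, which is very inconvenient for information retrieval and knowledge sharing. This article introduces a method that uses the HTML syntactic structure to convert HTML pages into RDF documents; it can structurally convert an HTML document into an RDF document based on XML syntax.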

5.
HTTP cookies have been widely used for maintaining session states, personalizing, authenticating, and tracking user behaviors. Despite their importance and usefulness, cookies have raised public concerns on Internet privacy because they can be exploited by third parties to track user behaviors and build user profiles. In addition, stolen cookies may also incur severe security problems. However, current Web browsers lack secure and convenient mechanisms for cookie management. A cookie management scheme, which is easy-to-use and has minimal privacy risk, is in great demand; but designing such a scheme is a challenge. In this paper, we conduct a large-scale HTTP cookie measurement and introduce CookiePicker, a system that can automatically validate the usefulness of cookies from a Web site and set the cookie usage permission on behalf of users. CookiePicker helps users achieve the maximum benefit brought by cookies, while minimizing the possible privacy and security risks. We implement CookiePicker as an extension to the Firefox Web browser, and obtain promising results in the experiments.

6.
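To improve automatic control performance and speed up translation, an automated control system for a translation robot based on intelligent speech is designed. External speech signals are collected and converted into digital signals by an A/D converter, and a voice wake-up module activates the translation robot. A dictation mode recognizes complex speech signals and a command mode recognizes simple ones, yielding text recognition results. A deep-learning keyword detection method then extracts keywords as the robot's automation control commands, which are recognized by a microcontroller. Experimental results show that the system can effectively collect external speech signals, extract their keywords, and complete the automated control of the translation robot.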

7.
秦颖 《计算机应用研究》2015,(2):326-329,335
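With advances in machine translation research and innovations in translation teaching, automatic evaluation of translation quality has attracted considerable attention in recent years. To capture the ideas and methods of automatic translation quality evaluation, this paper reviews the current lines of research, draws a tree-structured taxonomy organized by research characteristics, and analyzes typical algorithms and the ideas behind their improvements. It also introduces methods for assessing automatic evaluation algorithms, international machine translation evaluation platforms, and open tools for automatic evaluation. Finally, it analyzes the main difficulties and problems in current research and offers an outlook on future directions.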

8.
An integrated automatic test data generation system   Total citations: 3, self-citations: 0, citations by others: 3
The Godzilla automatic test data generator is an integrated collection of tools that implements a relatively new test data generation method, constraint-based testing, that is based on mutation analysis. Constraint-based testing integrates mutation analysis with several other testing techniques, including statement coverage, branch coverage, domain perturbation, and symbolic evaluation. Because Godzilla uses a rule-based approach to generate test data, it is easily extensible to allow new testing techniques to be integrated into the current system. This article describes the system that has been built to implement constraint-based testing. Godzilla's design emphasizes orthogonality and modularity, allowing relatively easy extensions. Godzilla's internal structure and algorithms are described with emphasis on internal structures of the system and the engineering problems that were solved during the implementation. Parts of this research were supported by Contract F30602-85-C-0255 through Rome Air Development Center while the author was a graduate student at the Georgia Institute of Technology.

9.
In this paper we present an automatic algorithm for registering and overlaying imagery. The algorithm attempts to find, by successive approximations, the best affine transformation or second-order polynomial relating the two images. The method requires the specification of only a matching pair of control points; new control points are then found approximately by extrapolating the old affine transformation to larger areas and using correlation to find the best match. Thus an obvious advantage of this algorithm lies in its automatic features in locating and matching more potential ground control points. This paper also discusses the effect of the distribution of control points on the affine transformation. Finally, the method is tested on Landsat data and the results are discussed.
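
As an illustration of the fitting step only, the sketch below estimates an affine transformation from matched control points by ordinary least squares. The abstract does not say how the transformation is estimated, so the least-squares choice and the helper names `solve3`, `fit_affine`, and `apply_affine` are assumptions; the correlation-based search for new control points is not shown.

```python
def solve3(m, v):
    """Solve the 3x3 system m * p = v by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine map taking src points (x, y) to dst points (u, v).

    Returns ((a, b, c), (d, e, f)) with u = a*x + b*y + c and v = d*x + e*y + f.
    Needs at least three non-collinear control points.
    """
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (A^T A) p = A^T u, and the same matrix for v.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atu = [sum(r[i] * u for r, (u, _) in zip(rows, dst)) for i in range(3)]
    atv = [sum(r[i] * v for r, (_, v) in zip(rows, dst)) for i in range(3)]
    return solve3(ata, atu), solve3(ata, atv)


def apply_affine(params, point):
    """Map one (x, y) point through the fitted transformation."""
    (a, b, c), (d, e, f) = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

In an iterative scheme like the one described, `apply_affine` would predict where a candidate control point should fall in the second image, and the fit would be repeated as more matched points are accepted.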

10.
A system is described which combines the facilities of automatic batch processing, tape management and bookkeeping. Although it was developed for use with high-energy physics experiments it could be adapted for more general use and therefore programming details have been excluded.

11.
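The traditional web page ranking algorithm Okapi BM25 often suffers from domain drift, in which returned pages are unrelated to the domain of the query keywords, and improved algorithms require domain vectors to be built manually. To address these problems, a web search ranking algorithm based on BM25 and a Softmax regression classification model is proposed. The method first preprocesses the web page text and represents it as vectors using a bag-of-words model. It then trains a Softmax regression classifier on a small amount of web page data to predict category scores for the test pages, and combines these with the BM25 retrieval scores to obtain the final ranking. Experimental results show that the algorithm achieves good ranking results without manually built domain vectors.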
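
A minimal sketch of how a BM25 score and a classifier's category score might be merged is given below. The abstract does not state the combination rule, so the weighted sum, the weight `alpha`, the BM25 parameters `k1` and `b`, and the function names are all assumptions made for illustration; the classifier itself is replaced by precomputed logits.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_rank(query, docs, class_logits, target_class, alpha=0.5):
    """Rank documents by alpha * normalized BM25 + (1 - alpha) * P(target class).

    The weighted sum is an assumed combination rule, not the paper's.
    """
    bm25 = bm25_scores(query, docs)
    top = max(bm25) or 1.0  # avoid dividing by zero when nothing matches
    finals = []
    for i, s in enumerate(bm25):
        p = softmax(class_logits[i])[target_class]
        finals.append(alpha * s / top + (1 - alpha) * p)
    return sorted(range(len(docs)), key=lambda i: finals[i], reverse=True)
```

With a query such as `["apple"]` and a target class for food pages, two pages that match the query equally well under BM25 would be separated by the classifier's probability for that class, which is the kind of domain-drift correction the abstract describes.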

12.
Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality.

13.
14.
We report an easy-to-set-up, reliable, and automatic microfluidic sample transfer and introduction system. Two liquid detection modules with different functions were developed: one rapidly removes a large, approximate volume of air off chip, and the other performs low-speed, high-precision purging of a small volume of air on chip, incorporating an on-chip liquid handling module. As a proof of concept, we demonstrated that a small volume of radioactive sample as low as 5 μL could be successfully transferred and introduced from vials to the desired location in the microfluidic chip with minimal loss (2.1 ± 0.4 %, n = 3). The total time of the sample transfer and introduction was less than 1 min. The complete automation would facilitate the safe handling of dangerous and toxic materials, such as radioactive compounds.

15.
This paper proposes an integrated system for unconstrained face recognition in complex scenes. The scale and orientation tolerant system comprises a face detector followed by a recognizer. Given a color input image of a person, the face detector encloses the face from the complex scene within a circular boundary, and locates the position of the nose. A radial grid mapping centered on the nose is then performed to extract a feature vector within the boundary. The feature vector is input to a radial basis function neural network classifier for face identification. The proposed face detector achieved an average detection rate of 95.8% while the face recognizer achieved an average recognition rate of 97.5% on a database of 21 persons with variations in scale, orientation, natural illumination and background. The two modules were combined to form an automatic face recognition system that was evaluated in the context of a security system using a video database of 21 users and 10 intruders, acquired in an unconstrained environment. A recognition rate of 93.5% with 0% false acceptance rate was achieved.

16.
An automatic keyphrase extraction system for scientific documents   Total citations: 1, self-citations: 0, citations by others: 1
Automatic keyphrase extraction techniques play an important role for many tasks including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location for potential keyphrases: a new candidate phrase generation method is proposed based on the core word expansion algorithm, which can reduce the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency feature is introduced for selecting the proper granularity. Additional new features are added for phrase weighting. Experiments based on real-world datasets were carried out to evaluate the proposed system. The results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with other state-of-the-art keyphrase extraction systems.
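
To make the overlap-elimination idea concrete, the sketch below uses inverse document frequency to choose between a phrase and a sub-phrase it contains. The abstract only says that an IDF feature is introduced for selecting granularity; the rule used here (keep whichever candidate has the higher IDF), the smoothed IDF formula, and the function names are assumptions, not the paper's method.

```python
import math


def contains(text, phrase):
    """True if the phrase occurs in the text on whole-word boundaries."""
    return f" {phrase} " in f" {text} "


def idf(phrase, documents):
    """Smoothed inverse document frequency of a phrase over plain-text documents."""
    containing = sum(1 for doc in documents if contains(doc, phrase))
    return math.log(len(documents) / (1 + containing))


def drop_overlaps(candidates, documents):
    """For each phrase / sub-phrase pair, keep the one with the higher IDF (assumed rule)."""
    kept = set(candidates)
    for long_phrase in candidates:
        for short_phrase in candidates:
            if short_phrase == long_phrase or not contains(long_phrase, short_phrase):
                continue
            if long_phrase not in kept or short_phrase not in kept:
                continue
            if idf(long_phrase, documents) >= idf(short_phrase, documents):
                kept.discard(short_phrase)
            else:
                kept.discard(long_phrase)
    # Preserve the original candidate order in the output list.
    return [c for c in candidates if c in kept]
```

Under this rule a rarer, more specific phrase such as "support vector machine" would displace the common sub-phrase "vector" when both appear as candidates.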

17.
18.
张红艳  李淼 《计算机应用》2006,26(8):1925-1927
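This paper analyzes the working principle of Bogart, a translator that incorporates re-engineering techniques, and compares it with traditional translators. It also proposes an improved method for storing source programs and adopts a new language processing system that supports programming language definition. Test results show that the method can improve the efficiency of automatic translation.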

19.
In this paper, we present LinkingPark, an automatic semantic annotation system for tabular data to knowledge graph matching. LinkingPark is designed as a modular framework which can handle Cell-Entity Annotation (CEA), Column-Type Annotation (CTA), and Columns-Property Annotation (CPA) altogether. It is built upon our previous SemTab 2020 system, which won the 2nd prize among 28 different teams after four rounds of evaluations. Moreover, the system is unsupervised, stand-alone, and flexible for multilingual support. Its backend offers an efficient RESTful API for programmatic access, as well as an Excel Add-in for ease of use. Users can interact with LinkingPark in near real-time, further demonstrating its efficiency.

20.
Most interactive "query-by-example" based image retrieval systems utilize relevance feedback from the user for bridging the gap between the user's implied concept and the low-level image representation in the database. However, traditional relevance feedback usage in the context of content-based image retrieval (CBIR) may not be very efficient due to a significant overhead in database search and image download time in client-server environments. In this paper, we propose a CBIR system that efficiently addresses the inherent subjectivity in user perception during a retrieval session by employing a novel idea of intra-query modification and learning. The proposed system generates an object-level view of the query image using a new color segmentation technique. Color, shape and spatial features of individual segments are used for image representation and retrieval. The proposed system automatically generates a set of modifications by manipulating the features of the query segment(s). An initial estimate of user perception is learned from the user feedback provided on the set of modified images. This largely improves the precision in the first database search itself and alleviates the overheads of database search and image download. Precision-to-recall ratio is improved in further iterations through a new relevance feedback technique that utilizes both positive as well as negative examples. Extensive experiments have been conducted to demonstrate the feasibility and advantages of the proposed system.
