Similar Documents
20 similar documents found (search time: 31 ms).
1.
Named Entity Recognition and Classification (NERC) is an important component of applications like Opinion Tracking, Information Extraction, or Question Answering. When these applications need to work in several languages, NERC becomes a bottleneck because its development requires language-specific tools and resources, such as lists of names or annotated corpora. This paper presents a lightly supervised system that acquires lists of names and linguistic patterns from large raw text collections in western languages, starting with only a few seeds per class selected by a human expert. Experiments have been carried out with English and Spanish news collections and with the Spanish Wikipedia. Evaluation of NE classification on standard datasets shows that the NE lists achieve high precision and reveals that contextual patterns increase recall significantly. The approach would therefore be helpful for applications where annotated NERC data are not available, such as those that have to deal with several western languages or with information from different domains.
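The seed-driven acquisition loop described above can be sketched as follows. This is a generic illustration of lightly supervised bootstrapping (seed names yield contextual patterns, which in turn yield new names), not the paper's exact algorithm; the corpus, seed, and capitalization heuristic are invented for the example.

```python
def bootstrap_names(corpus, seeds, rounds=2):
    """Lightly supervised bootstrapping sketch: known names yield
    contextual patterns; tokens in known contexts become new names."""
    names = set(seeds)
    for _ in range(rounds):
        # 1. Collect (left word, right word) contexts around known names.
        patterns = set()
        for sent in corpus:
            toks = sent.split()
            for i, tok in enumerate(toks):
                if tok in names and 0 < i < len(toks) - 1:
                    patterns.add((toks[i - 1], toks[i + 1]))
        # 2. Any capitalized token seen in a known context is a candidate.
        for sent in corpus:
            toks = sent.split()
            for i in range(1, len(toks) - 1):
                if (toks[i - 1], toks[i + 1]) in patterns and toks[i][0].isupper():
                    names.add(toks[i])
    return names

corpus = [
    "President Lincoln spoke today",
    "President Obama spoke today",
    "the dog spoke today",
]
print(sorted(bootstrap_names(corpus, {"Lincoln"})))  # ['Lincoln', 'Obama']
```

The pattern ("President", "spoke") learned from the seed "Lincoln" retrieves "Obama" but not "dog", which mirrors how contextual patterns trade a little precision for the recall gains reported above.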

2.
The semantics of modelling languages are not always specified in a precise and formal way, and their rather complex underlying models make it a non-trivial exercise to reuse them in newly developed tools. We report on experiments with a virtual machine-based approach for state space generation. The virtual machine’s (VM) byte-code language is straightforwardly implementable, facilitates reuse and makes it an adequate target for translation of higher-level languages like the SPIN model checker’s Promela, or even C. As added value, it provides efficiently executable operational semantics for modelling languages. Several tools have been built around the VM implementation we developed, to evaluate the benefits of the proposed approach.
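The core idea, a byte-code interpreter doubling as a state-space generator, can be sketched in a few lines. The toy instruction set (`INC`, `JNZ`, `HALT`) and the bounded counter are invented for illustration and are not the VM's actual byte-code language.

```python
from collections import deque

# A toy VM state is (program counter, variable value).
PROGRAM = [("INC",), ("JNZ", 0), ("HALT",)]  # hypothetical byte code

def successors(state):
    pc, x = state
    op = PROGRAM[pc]
    if op[0] == "INC":
        yield (pc + 1, (x + 1) % 3)  # bounded counter keeps the space finite
    elif op[0] == "JNZ":
        yield ((op[1] if x != 0 else pc + 1), x)
    # HALT yields no successors

def explore(initial):
    """Breadth-first enumeration of all reachable VM states."""
    seen, frontier = {initial}, deque([initial])
    while frontier:
        for nxt in successors(frontier.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

print(sorted(explore((0, 0))))  # 7 reachable states for this toy program
```

Executing byte code instead of interpreting a high-level model directly is what makes the operational semantics both reusable and efficient, as the abstract argues.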

3.
Iyengar, Arun K.; Squillante, Mark S.; Zhang, Li. World Wide Web, 1999, 2(1-2):85-100.
In this paper we develop a general methodology for characterizing the access patterns of Web server requests based on a time‐series analysis of finite collections of observed data from real systems. Our approach is used together with the access logs from the IBM Web site for the Olympic Games to demonstrate some of its advantages over previous methods and to construct a particular class of benchmarks for large‐scale heavily‐accessed Web server environments. We then apply an instance of this class of benchmarks to analyze aspects of large‐scale Web server performance, demonstrating some additional problems with methods commonly used to evaluate Web server performance at different request traffic intensities. This revised version was published online in August 2006 with corrections to the Cover Date.
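The first step of any such time-series characterization is turning raw log timestamps into a per-interval request-count series. The sketch below is a generic starting point under that assumption, not the paper's methodology; the function name and sample timestamps are invented.

```python
from collections import Counter

def request_rate_series(timestamps, bin_seconds=1):
    """Bin raw request arrival times (in seconds) into a per-interval
    count series suitable for time-series analysis of access patterns."""
    bins = Counter(int(t // bin_seconds) for t in timestamps)
    lo, hi = min(bins), max(bins)
    # Emit zeros for empty intervals so the series is evenly spaced.
    return [bins.get(b, 0) for b in range(lo, hi + 1)]

# e.g. arrival times extracted from an access log
print(request_rate_series([0.1, 0.4, 1.2, 3.9], bin_seconds=1))  # [2, 1, 0, 1]
```

Keeping the empty intervals explicit matters: burstiness at different traffic intensities, which the abstract highlights, only shows up when the series is evenly spaced.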

4.
Modern literary scholars must combine access to vast collections of text with the traditional close analysis of their field. In this paper, we discuss the design and development of tools to support this work. Based on analysis of the needs of literary scholars, we constructed a suite of visualization tools for the analysis of large collections of tagged text (i.e., text where one or more words have been annotated as belonging to a specific category). These tools unite the aspects of the scholars’ work: large scale overview tools help to identify corpus‐wide statistical patterns, while fine scale analysis tools assist in finding specific details that support these observations. We designed visual tools that support and integrate these levels of analysis. The result is the first tool suite that can support the multilevel text analysis performed by scholars, combining standard visual elements with novel methods for selecting individual texts and identifying representative passages in them.

5.
Conclusion. In this article, we consider a number of important issues of development, implementation, and use of simulation tools and methods as an essential part of the larger scientific and engineering problem of developing methods and tools for computer-aided design of computer systems. We demonstrate the key role of design languages in the simulation technology. The most advanced among the available design languages are already used in most stages of computer-aided design, i.e., they are actually broad-spectrum languages, and can be developed even further. The use of design languages is the basis for the development of expert systems that utilize and accumulate the practical experience of designers and developers: design solutions, design methods and techniques, optimization, simulation, testing and proving of the efficiency and correctness of design solutions. The authors will gladly cooperate with any organizations and individuals for further development, efficient implementation, and practical application of the languages ALGORITM and VHDL. Translated from Kibernetika i Sistemnyi Analiz, No. 3, pp. 48–62, May–June, 1995.

6.
7.
We propose an approach for the word-level indexing of modern printed documents which are difficult to recognize using current OCR engines. By means of word-level indexing, it is possible to retrieve the position of words in a document, enabling queries involving proximity of terms. Web search engines implement this kind of indexing, allowing users to retrieve Web pages on the basis of their textual content. Nowadays, digital libraries hold collections of digitized documents that can be retrieved either by browsing the document images or relying on appropriate metadata assembled by domain experts. Word indexing tools would therefore increase the access to these collections. The proposed system is designed to index homogeneous document collections by automatically adapting to different languages and font styles without relying on OCR engines for character recognition. The approach is based on three main ideas: the use of self-organizing maps (SOMs) to perform unsupervised character clustering, the definition of a suitable vector-based word representation whose size depends on the word aspect ratio, and the run-time alignment of the query word with indexed words to deal with broken and touching characters. The most appropriate applications are for processing modern printed documents (17th to 19th centuries) where current OCR engines are less accurate. Our experimental analysis addresses six data sets containing documents ranging from books of the 17th century to contemporary journals.
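The run-time alignment step can be sketched as a standard edit-distance alignment over sequences of character-cluster IDs (plain characters stand in for cluster IDs here). This is a generic dynamic-programming illustration, not the system's actual matcher; the function name and examples are invented.

```python
def align_cost(query, indexed):
    """Edit-distance alignment between two cluster-ID sequences; low cost
    means the indexed word likely matches the query despite broken or
    touching characters."""
    m, n = len(query), len(indexed)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == indexed[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion (e.g. broken character)
                          d[i][j - 1] + 1,       # insertion (e.g. touching characters)
                          d[i - 1][j - 1] + cost)  # match or cluster substitution
    return d[m][n]

print(align_cost("library", "librarv"))  # one substituted cluster -> cost 1
```

Because clustering errors surface as substitutions, insertions, or deletions in the cluster-ID sequence, an alignment tolerant of all three is what lets the index match words without any OCR step.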

8.
Multidimensional visualization techniques are invaluable tools for analysis of structured and unstructured data with variable dimensionality. This paper introduces PEx-Image (Projection Explorer for Images), a tool aimed at supporting analysis of image collections. The tool supports a methodology that employs interactive visualizations to aid user-driven feature detection and classification tasks, thus offering improved analysis and exploration capabilities. The visual mappings employ similarity-based multidimensional projections and point placement to lay out the data on a plane for visual exploration. In addition to its application to image databases, we also illustrate how the proposed approach can be successfully employed in simultaneous analysis of different data types, such as text and images, offering a common visual representation for data expressed in different modalities.

9.
Interactive history tools, ranging from basic undo and redo to branching timelines of user actions, facilitate iterative forms of interaction. In this paper, we investigate the design of history mechanisms for information visualization. We present a design space analysis of both architectural and interface issues, identifying design decisions and associated trade-offs. Based on this analysis, we contribute a design study of graphical history tools for Tableau, a database visualization system. These tools record and visualize interaction histories, support data analysis and communication of findings, and contribute novel mechanisms for presenting, managing, and exporting histories. Furthermore, we have analyzed aggregated collections of history sessions to evaluate Tableau usage. We describe additional tools for analyzing users’ history logs and how they have been applied to study usage patterns in Tableau.
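The architectural difference between linear undo and a branching timeline can be made concrete with a small tree of states. This is a minimal generic sketch, not Tableau's implementation; the class and state names are invented.

```python
class HistoryNode:
    def __init__(self, state, parent=None):
        self.state, self.parent, self.children = state, parent, []

class BranchingHistory:
    """Branching undo model: undoing and then acting creates a new
    branch rather than discarding the redo path."""
    def __init__(self, initial):
        self.current = HistoryNode(initial)

    def act(self, state):
        node = HistoryNode(state, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def undo(self):
        if self.current.parent:
            self.current = self.current.parent
        return self.current.state

h = BranchingHistory("empty view")
h.act("add bar chart")
h.undo()
h.act("add scatter plot")              # branches from "empty view"
print(len(h.current.parent.children))  # 2: both branches survive
```

Keeping both branches is what makes histories exportable and analyzable after the fact, which the session-log studies described above depend on.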

10.
Object modelling languages are graphical semi-formal specification languages. They are tools to capture and formalise requirements in the earlier phases of software development, as well as providing support for describing designs, software architecture and even detailed implementations later in the process. One can consider these languages to have reached some level of maturity, especially because their precursors, the Object-Oriented Analysis and Design methods, have now been used and tested intensively in industry for many years. In addition, these modelling languages have been the subject of many improvements by the scientific community. Nevertheless, some dissatisfaction persists. In this paper, we aim to re-analyse several parts of the deep structure of two leading object modelling languages: OML and UML, in order to show how they can really increase software quality. Their structure is based on metamodelling, which is the way the semantics of these two languages is expressed. This structure is also the source of a proliferation of modelling constructs (for example, different forms of inheritance associated with distinct notational elements) whose use must clearly influence, in particular, reusability — a key expectation in a software engineering process. More generally, we identify some deficiencies in these languages, which allows us to highlight some appropriate evolutionary paths. In discussing dynamic metamodelling and scalability, we specifically outline that a main current drawback is the difficulty of implementing these languages in Computer-Aided Software Engineering tools. This revised version was published online in June 2006 with corrections to the Cover Date.

11.
The merger of three-dimensional graphics with the X Window System has recently been standardized by adapting PHIGS, the Programmer's Hierarchical Interactive Graphics System, to the X Window System with PEX, the PHIGS Extension to X. The standard programming library for PEX has been defined to be identical to PHIGS PLUS allowing PHIGS programs to port directly to the X environment. X uses a client server model to run applications as client processes which communicate with a server to perform graphical display and input. For improved performance, the PEX extension defines new server resources to reduce network traffic and to take advantage of graphics hardware existing on high-end servers. A side effect of this distributed model of computation is a distribution of PHIGS structures leading to a relaxation of the exclusive access which a PHIGS application usually maintains over its Central Structure Store. We exploit the distributed nature of a PEX/PHIGS client's Central Structure Store to provide access to it for other applications besides the originating PEX/PHIGS client. We refer to these other applications as tools since one of our primary goals is to create development tools for PHIGS programmers. Rather than concentrate on particular debugging tools, we focus upon easing the process of actually developing tools. Our goal is to supply a collection of routines which can be used by PHIGS programmers to create custom tools or other programs which require access to the graphics data of remote PHIGS processes. Our Tool Development Library provides the PHIGS programmer a small number of management routines which orchestrate the connection and mapping to the data of one or more remote PHIGS applications. Manipulation of remote PHIGS structures is accomplished just as easily as local operations and is performed using standard PHIGS calls. The remote application being accessed requires no changes to its source code. 
Obvious uses for the Tool Development Library are in the construction of PHIGS tools such as structure browsers, editors and debugging aids. Less obvious is the potential for developing collections of cooperating graphics applications which share graphics data.

12.
Easy on that trigger dad: a study of long term family photo retrieval
We examine the effects of new technologies for digital photography on people’s longer term storage and access to collections of personal photos. We report an empirical study of parents’ ability to retrieve photos related to salient family events from more than a year ago. Performance was relatively poor with people failing to find almost 40% of pictures. We analyze participants’ organizational and access strategies to identify reasons for this poor performance. Possible reasons for retrieval failure include: storing too many pictures, rudimentary organization, use of multiple storage systems, failure to maintain collections and participants’ false beliefs about their ability to access photos. We conclude by exploring the technical and theoretical implications of these findings.

13.
The definition of security policies in information systems and programming applications is often accomplished through traditional low-level languages that are difficult to use. This is a remarkable drawback if we consider that security policies are often specified and maintained by top-level enterprise managers, who would probably prefer to use simplified, metaphor-oriented policy management tools. To support all the different kinds of users, we propose a suite of visual languages to specify access and security policies according to the role-based access control (RBAC) model. Moreover, a system implementing the proposed visual languages is presented. The system provides a set of tools that enable a user to visually edit security policies and to translate them into XACML (eXtensible Access Control Markup Language) code, which can be managed by a Policy Based Management System supporting such a policy language. The system and the visual approach have been assessed by means of usability studies and several case studies. The one presented in this paper concerns the configuration of access policies for a multimedia content management platform providing video streaming services also accessible through mobile devices.
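The RBAC semantics that the visual languages target can be summarized in a few lines: users map to roles, roles map to permissions, and a request is allowed if any of the user's roles grants the permission. This is a sketch of the standard RBAC model, not the tool's code; the users, roles, and permission names are invented.

```python
# Role -> permissions and user -> roles tables (hypothetical policy data).
ROLE_PERMS = {"manager": {"edit_policy", "view_stream"},
              "viewer": {"view_stream"}}
USER_ROLES = {"alice": {"manager"}, "bob": {"viewer"}}

def allowed(user, permission):
    """RBAC check: a permission is granted if any role held by the user
    includes it."""
    return any(permission in ROLE_PERMS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(allowed("alice", "edit_policy"))  # True
print(allowed("bob", "edit_policy"))    # False
```

Because the policy reduces to two small tables, it maps naturally both onto a visual editor and onto XACML rules generated from the same data.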

14.
Miro is a set of languages and tools that support the visual specification of file system security. Two visual languages are presented: the instance language, which allows specification of file system access, and the constraint language, which allows specification of security policies. The Miro visual languages and tools are used to specify security configurations. A visual language is one whose entities are graphical, such as boxes and arrows; specifying means stating, independently of any implementation, the desired properties of a system. Security here means file system protection: ensuring that files are protected from unauthorized access and that privileges are granted to some users but not others. The tools implemented and examples of how these languages can be applied to real security specification problems are described.

15.
16.
17.
The majority of the world’s languages have little to no NLP resources or tools. This is due to a lack of training data (“resources”) over which tools, such as taggers or parsers, can be trained. In recent years, there have been increasing efforts to apply NLP methods to a much broader swath of the world’s languages. In many cases this involves bootstrapping the learning process with enriched or partially enriched resources. We propose that Interlinear Glossed Text (IGT), a very common form of annotated data used in the field of linguistics, has great potential for bootstrapping NLP tools for resource-poor languages. Although IGT is generally very richly annotated, and can be enriched even further (e.g., through structural projection), much of the content is not easily consumable by machines since it remains “trapped” in linguistic scholarly documents and in human readable form. In this paper, we describe the expansion of the ODIN resource—a database containing many thousands of instances of IGT for over a thousand languages. We enrich the original IGT data by adding word alignment and syntactic structure. To make the data in ODIN more readily consumable by tool developers and NLP researchers, we adopt and extend a new XML format for IGT, called Xigt. We also develop two packages for manipulating IGT data: one, INTENT, enriches raw IGT automatically, and the other, XigtEdit, is a graphical IGT editor.
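An IGT instance consists of a language line, a gloss line aligned word-for-word with it, and a free translation. The sketch below shows the simplest machine-readable form of that structure; it is a simplified illustration of IGT in general, not the Xigt schema, and the function name and Japanese example are invented.

```python
def parse_igt(lang_line, gloss_line, translation):
    """Parse one Interlinear Glossed Text instance into word/gloss pairs
    plus a free translation."""
    words, glosses = lang_line.split(), gloss_line.split()
    if len(words) != len(glosses):
        raise ValueError("language and gloss lines must align one-to-one")
    return {"alignment": list(zip(words, glosses)),
            "translation": translation}

igt = parse_igt("inu ga hoeru",
                "dog NOM bark",
                "The dog barks.")
print(igt["alignment"][1])  # ('ga', 'NOM')
```

Once the word/gloss alignment is explicit, enrichments like projecting syntactic structure from the translation onto the language line become straightforward, which is the kind of processing the abstract attributes to INTENT.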

18.
Wilensky, R. Computer, 1996, 29(5):37-44.
Work-centered digital information services are a set of library-like services meant to address work group needs. Workplace users especially need to access legacy documents and external collections. They also frequently want to retrieve information (rather than documents per se), and they require that digital information systems be integrated into established work practices. Realizing work-centered digital information systems requires a broad technical agenda. Three types of analysis (document image, natural language, and computer vision) are necessary to facilitate information extraction. Users also need new user interface paradigms and authoring tools to better access multimedia information, as well as improved protocols for client-program interaction with repositories (collections). Moreover, entirely new types of documents must be developed to exploit these capabilities. The system developed by the authors follows a client-server architecture, in which the servers are repositories implemented as databases supporting user-defined functions and user-defined access methods. The repositories also serve as indexing servers. The authors are creating a prototype set of information services called the California Environmental Digital Information System, which includes a diverse collection of environmental data.

19.
20.
The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web, as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding (i) linguistic information for data and vocabularies in different languages, (ii) mappings between data with labels in different languages, and (iii) services to dynamically access and traverse Linked Data across different languages. In this article, we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.
