Similar Literature
A total of 20 similar documents were found (search time: 46 ms).
1.
2.
This paper discusses the basic design of the encoding scheme described by the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange (TEI document number TEI P3, hereafter simply P3 or the Guidelines). It first reviews the basic design goals of the TEI project and their development during the course of the project. Next, it outlines some basic notions relevant for the design of any markup language and uses those notions to describe the basic structure of the TEI encoding scheme. It also describes briefly the core tag set defined in chapter 6 of P3, and the default text structure defined in chapter 7 of that work. The final section of the paper attempts an evaluation of P3 in the light of its original design goals, and outlines areas in which further work is still needed. C. M. Sperberg-McQueen is a Senior Research Programmer at the academic computer center of the University of Illinois at Chicago; his interests include medieval Germanic languages and literatures and the theory of electronic text markup. Since 1988 he has been editor in chief of the ACH/ACL/ALLC Text Encoding Initiative. Lou Burnard is Director of the Oxford Text Archive at Oxford University Computing Services, with interests in electronic text and database technology. He is European Editor of the Text Encoding Initiative's Guidelines.
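For orientation, the scheme the paper describes pairs a TEI header (metadata about the electronic text) with the text proper, whose body is built from the default structural elements and the core tag set. The following minimal sketch uses TEI P3/P4-style element names; all content is hypothetical.

    <!-- Minimal TEI document: header plus default text structure.
         Element names follow TEI P3/P4; content is illustrative only. -->
    <TEI.2>
      <teiHeader>
        <fileDesc>
          <titleStmt><title>A Sample Text</title></titleStmt>
          <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
          <sourceDesc><p>Born-digital; no pre-existing source.</p></sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <div type="chapter" n="1">
            <head>Chapter One</head>
            <p>A paragraph encoded with the core tag set.</p>
          </div>
        </body>
      </text>
    </TEI.2>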

3.
One recurring theme in the TEI project has been the need to represent non-hierarchical information in a natural way — or at least in a way that is acceptable to those who must use it — using a technical tool that assumes a single hierarchical representation. This paper proposes solutions to a variety of such problems: the encoding of segments which do not reflect a document's primary hierarchy; relationships among non-adjacent segments of texts; ambiguous content; overlapping structures; parallel structures; cross-references; vague locations. David T. Barnard is Professor of Computing and Information Science at Queen's University. His research interests are in structured text processing and the compilation of programming languages. His recent publications include Tree-to-tree Correction for Document Trees (Queen's Technical Report) and Error Handling in a Parallel LR Substring Parser, Computer Languages, 19, 4 (1993), 247–59. Lou Burnard is Director of the Oxford Text Archive at Oxford University Computing Services, with interests in electronic text and database technology. He is European Editor of the Text Encoding Initiative's Guidelines. Jean-Pierre Gaspart is with Associated Consultants and Software Engineers. Lynne A. Price (Ph.D., computer sciences, University of Wisconsin-Madison) is a senior software engineer at Frame Technology Corp. Her main area of research has been representing text structure for automatic processing. She has served on both the US and international SGML standards committees for several years and is the editor of International Standard ISO/IEC 13673 on Conformance Testing for Standard Generalized Markup Language (SGML) Systems. C. M. Sperberg-McQueen is a Senior Research Programmer at the academic computer center of the University of Illinois at Chicago; his interests include medieval Germanic languages and literatures and the theory of electronic text markup. Since 1988 he has been editor in chief of the ACH/ACL/ALLC Text Encoding Initiative. Giovanni Battista Varile works for the Commission of the European Communities. This paper is derived from a working paper of the Metalanguage Committee entitled Notes on SGML Solutions to Markup Problems, which was produced following a meeting of the committee in Luxembourg. The co-authors all participated in that meeting and provided input to this paper. Others serving on the committee at other times included David Durand (Boston University), Nancy Ide (Vassar College) and Frank Tompa (University of Waterloo).
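One standard TEI device for the overlapping-structures problem catalogued above is to keep a single primary hierarchy and reduce a competing hierarchy to empty milestone elements. A minimal sketch with hypothetical verse content: a page boundary (the physical hierarchy) falls inside a verse line (the primary hierarchy), so the page break is recorded as the empty element <pb/> rather than as a container.

    <!-- Verse is the primary hierarchy; the overlapping physical page
         structure is reduced to the empty milestone element <pb/>. -->
    <lg>
      <l>First line of the stanza,</l>
      <l>a line interrupted by <pb n="42"/> a page break,</l>
      <l>and a final line.</l>
    </lg>

Other devices in the same family include fragmenting an element into parts and relating non-adjacent segments with pointer elements.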

4.
5.
This paper presents some aspects of the Silfide server, a system dedicated to the delivery of linguistic resources on the web. After presenting the main issues behind the design of such a system, we focus on the editorial choices related to the use of the Text Encoding Initiative to represent our textual documents. In particular, we focus on the accommodations we have had to make with regard to the TEI header, and address the trade-off between extensive enrichment and genericity of the primary data when one wants to mark up a given document's content precisely. As a whole, we show how essential the TEI has proven to be for a project such as ours, from both a practical and a conceptual point of view.

6.
Dictionary markup is one of the concerns of the Text Encoding Initiative (TEI), an international project for text encoding. In this paper, we investigate ways to use and extend the TEI encoding scheme for the markup of Korean dictionary entries. Since the TEI suggestions for dictionary markup are aimed mainly at western-language dictionaries, we must address problems specific to encoding Korean dictionary entries. We extend and modify the TEI encoding scheme in the way the TEI itself suggests. We also restrict the content model so that the encoded dictionary can be viewed both as a database and as a computerized version of the originally printed dictionary. This revised version was published online in July 2006 with corrections to the Cover Date.
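For orientation, the TEI print-dictionary tag set encodes an entry roughly as follows. The Korean headword and romanization here are hypothetical, and the paper's actual extensions and content-model restrictions are not reproduced.

    <!-- TEI dictionary markup: form, grammatical information, and sense.
         Content is illustrative only. -->
    <entry>
      <form>
        <orth>사랑</orth>
        <pron>sarang</pron>
      </form>
      <gramGrp><pos>noun</pos></gramGrp>
      <sense>
        <def>love; affection</def>
      </sense>
    </entry>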

7.
This paper reports on a method for exploiting a bitext as the primary linguistic information source for the design of a generation environment for specialized bilingual documentation. The paper discusses such issues as the Text Encoding Initiative (TEI) proposals for specialized corpus tagging, text segmentation and alignment of translation units and their allocation into translation memories, Document Type Definition (DTD) abstraction from tagged texts, and DTD deployment for bilingual text generation. The parallel corpus used for experimentation has two main features:
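A common TEI pattern for aligning translation units is stand-off linking: each unit carries an identifier, and a link group pairs units across the two languages. A minimal sketch with invented sentence identifiers; the pointer syntax follows TEI P5, while earlier versions used a targets attribute with bare IDs.

    <!-- Source and target segments, each with an identifier -->
    <seg xml:id="en-s1">The committee approved the proposal.</seg>
    <seg xml:id="fr-s1">Le comité a approuvé la proposition.</seg>

    <!-- Stand-off alignment pairing the translation units -->
    <linkGrp type="alignment">
      <link target="#en-s1 #fr-s1"/>
    </linkGrp>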

8.
This article begins by emphasizing the importance of terminology in this modern age of technical innovations and machine-based translation systems, establishing the need for a terminology interchange format, and distinguishing between lexicography and terminology. It then reviews previous attempts to establish terminology interchange formats and concludes with a forceful argument for a new system built on the TEI notions of elements and attributes. Alan K. Melby is Professor of Linguistics at Brigham Young University. He is involved in translation, technical communication, and computational language projects. He is the CEO of Linguatech International, the Chair of the Translation and Computers committee of the American Translators Association, and a member of the editorial board of Machine Translation. He serves as Chair of the Terminology Workgroup of the Text Encoding Initiative, and as a member of the U.S. TAG of ISO TC 37. He is an accredited translator (French to English). This paper is based on work performed in the TEI work group on Terminology, whose members include Alan Melby, Brigham Young U. (chair), Sue Ellen Wright, American Translators Association and other affiliations, Greg Shreve, Kent State U., Gerhard Budin, Infoterm, and Richard Strehlow, ASTM and other affiliations. Portions of this chapter appear in the following work: Melby, Alan, and Wright, Sue Ellen. 1995. Terminology Interchange. In: The Terminology Handbook, Sue Ellen Wright and Gerhard Budin, Eds. Amsterdam: John Benjamins.
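The element-and-attribute approach argued for here fed into the MARTIF (ISO 12200) and later TBX interchange formats. A rough sketch of a concept-oriented term entry in that lineage, with invented identifiers and terms; attribute details vary between the format versions.

    <!-- Concept-oriented terminology entry in MARTIF/TBX style;
         the concept id and terms are invented for illustration. -->
    <termEntry id="c42">
      <langSet xml:lang="en">
        <tig><term>hard disk</term></tig>
      </langSet>
      <langSet xml:lang="fr">
        <tig><term>disque dur</term></tig>
      </langSet>
    </termEntry>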

9.
This paper chronicles the work of the TEI textual criticism working groups through several phases, documenting how and why the design goals were shaped by the requirements of several distinct user communities and by the nature of the textual evidence itself. Encoding schemes for the representation of physical details of textual witnesses were unified with encoding schemes for critical editing practices when it was observed that the two phenomena were inextricably layered and linked within real texts. Rationale is offered for the development teams' adherence to exceedingly general design principles: (a) the requirement that the encoding notations be neutral in text-theoretic terms; (b) the need to accommodate dramatically different text-transmission phenomena and research goals within diverse text-critical arenas; (c) the need for commensurability of the text-critical markup with encoding notations used in closely related text-analytic research. The paper also assesses the results of the effort in terms of the encoding scheme's adequacy for several scholarly purposes: suggestions are made concerning the need for programmatic testing, for refinement, and for extension of the encoding model to support a broader range of text-transmission phenomena and research objectives. Robin Cover serves as humanities computing technical consultant for the CELLAR Project (Computing Environment for Linguistic, Literary, and Anthropological Research), sponsored by SIL's Department of Academic Computing. His research involves conceptual modelling and functional design specification for a multilingual object-oriented document processing system and integrated bibliographic database management subsystem. Peter Robinson is Executive Officer for the Canterbury Tales Project, was chair of the TEI work-group on textual criticism, and is developer of the computer program Collate, widely used in the preparation of critical editions based on multiple witnesses. He acts as consultant to several publishers and critical edition projects.
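The resulting TEI apparatus markup records readings and their witnesses side by side. A minimal sketch; the witness sigla and readings are invented for illustration.

    <!-- Declare the witnesses... -->
    <listWit>
      <witness xml:id="A">Manuscript A</witness>
      <witness xml:id="B">Manuscript B</witness>
    </listWit>

    <!-- ...then record a point of variation as an apparatus entry -->
    <app>
      <lem wit="#A">Experience</lem>
      <rdg wit="#B">Experiens</rdg>
    </app>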

10.
11.
12.
The TEI Header plays a vital role in the documentation and interchange of TEI conformant electronic texts. Moreover, this role is becoming increasingly important as more people follow the recommendations set out in TEI P3, and libraries, archives, and electronic text centres seek to share their holdings of electronic texts. However, the fact that TEI P3 allows for flexibility in the structure and content of TEI Headers has meant that divergent practices have begun to emerge within the numerous projects and initiatives creating TEI texts. With this in mind, the Oxford Text Archive hosted a one-day colloquium of leading TEI exponents, at which invited participants were encouraged to share their views and expertise on creating TEI Headers, and work together to develop some recommendations towards good practice.
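The flexibility at issue is visible in the header itself: beyond the mandatory fileDesc, nearly everything is optional, so projects diverge in what they record. A sketch of the kind of fuller header such recommendations aim at, with hypothetical content; the change notation follows TEI P5.

    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>Title of the electronic text</title>
          <respStmt><resp>encoded by</resp><name>Project staff</name></respStmt>
        </titleStmt>
        <publicationStmt>
          <distributor>An electronic text centre</distributor>
          <availability><p>Terms of use stated here.</p></availability>
        </publicationStmt>
        <sourceDesc><bibl>Details of the printed source.</bibl></sourceDesc>
      </fileDesc>
      <revisionDesc>
        <change when="1998-06-01">Header revised for interchange.</change>
      </revisionDesc>
    </teiHeader>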

13.
A late 1990 survey found that most historical editors in the United States continue to use the computer primarily as a word processing tool to prepare texts and editorial apparatus. Among older projects, a migration from mainframe or mini-computers to PCs has been the norm. New developments in the field include the Founding Fathers CD-ROM project, the impending release of Version 2.0 of NLCindex, and a strong interest in the Text Encoding Initiative. David R. Chesnutt is a Research Professor of History at the University of South Carolina, senior editor of The Papers of Henry Laurens, and president of the Association for Documentary Editing.

14.
While letters and correspondence materials serve as (in)valuable sources of information for historians, philologists, (socio-)linguists, biographers, and textual critics, modern editorial theory merely assigns them a secondary role. Contrary to this traditional documentary view, the authors of this article argue for a treatment of epistolary materials as primary sources in their own right. They propose a generalized text-base approach to encoded and annotated correspondence materials that can accommodate the generation of versatile user-driven electronic editions. This approach needs to address current lacunae in markup theory and practice, which lack both provisions for the encoding of letter-specific phenomena in texts and encoding features for such generative editions. A closer look at broader editorial theories reveals a deeper lack of understanding of the nature, and hence the definition, of correspondence materials. The authors propose a Jakobsonian communicative definition of letters that can to a great extent be mapped onto the textual model of the Text Encoding Initiative (TEI). The second part of this article discusses the motivation for and practical realization of the Digital Archive of Letters in Flanders (DALF), a formal framework for encoding correspondence materials which is defined as a TEI customization. Its most important features for capturing detailed metadata as well as letter-specific source phenomena are analysed and discussed against the text-ontological background sketched out before.
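Standard TEI already provides elements for the conventional parts of a letter, which a customization such as DALF builds on. A minimal sketch with hypothetical content; DALF's own letter-specific additions are not shown here.

    <div type="letter">
      <opener>
        <dateline>Antwerp, <date when="1889-04-12">12 April 1889</date></dateline>
        <salute>Dear friend,</salute>
      </opener>
      <p>Body of the letter.</p>
      <closer>
        <salute>Yours sincerely,</salute>
        <signed>A. Correspondent</signed>
      </closer>
    </div>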

15.
This article summarizes the activities of the Istituto di Linguistica Computazionale. We discuss the Italian Multi-functional Lexical Databases; the projects focussing on linguistic analysis and generation; corpora in the MRF, textual databases and linguistic workstations; computer-assisted humanities teaching; and the various cooperative ventures, seminars and conferences offered by the Institute. Antonio Zampolli is a professor of computational linguistics at the University of Pisa, where he is the director of the Institute of Computational Linguistics of the CNR, Pisa. He is also the president of the ALLC and a member of the Steering Committee of the Text Encoding Initiative. Professor Zampolli's research interests are computational linguistics and lexicology, lexicography, and text analysis. His publications include Linguistic Structures Processing (North-Holland, 1977); he is also co-editor of Automating the Lexicon (OUP, in press).

16.
There is a great deal of variation in the encoding of spoken texts in electronic form, both with respect to the types of features represented and the way particular features are rendered. This paper surveys problems in the electronic representation of speech and presents the solutions proposed by the Text Encoding Initiative. The special tags needed for the encoding of spoken texts are discussed, including a mechanism for temporal alignment. Further work is needed on phonological aspects, parallel representation, and on the development of software which connects the systematic underlying representation with a workable format for input and display. Stig Johansson is Professor of English Language at the Department of British and American Studies, University of Oslo. He is co-ordinating secretary of the International Computer Archive of Modern English (ICAME) and editor of the ICAME Journal. Recent publications include Frequency Analysis of English Vocabulary and Grammar (with Knut Hofland, Clarendon Press, 1989) and English Computer Corpora (with Anna-Brita Stenström, Mouton de Gruyter, 1991).
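The temporal-alignment mechanism mentioned here works by declaring a timeline of reference points and anchoring utterances to them. A sketch with invented speaker and timing values; the attribute names follow TEI P5, which postdates this paper.

    <!-- A timeline of reference points, measured in seconds -->
    <timeline unit="s" origin="#t0">
      <when xml:id="t0" absolute="00:00:00"/>
      <when xml:id="t1" interval="2.5" since="#t0"/>
    </timeline>

    <!-- An utterance anchored to the timeline, with a marked pause -->
    <u who="#spkA" start="#t0" end="#t1">you know <pause/> it was never finished</u>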

17.
We present our experience in developing an on-line infrastructure to support collaborative analysis of text, which we distinguish from existing, well-explored efforts to create annotative electronic editions. Using Faulkner's The Sound and the Fury as our primary text case, we outline the features and rationale of our collaborative framework, called Callimachus. We present our findings concerning that text and explore how these findings only became possible after breaking with the received wisdom concerning the application of XML and the Text Encoding Initiative to such analytical projects.

18.

19.
Formatting messages with a text encoding increases the volume of data transmitted between the client and the service channel, and that volume grows linearly with file length, so performance suffers badly. WCF (Windows Communication Foundation), a new-generation framework for developing networked applications, offers enormous productivity for building service-oriented applications. Taking this framework as its blueprint, this paper adopts, at the message layer, MTOM (...
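For context, in WCF the switch from the default text encoding to MTOM is a binding setting rather than a code change. A minimal configuration sketch; the service and contract names are hypothetical.

    <!-- app.config fragment: messageEncoding="Mtom" replaces the default
         text encoding so large binary payloads travel as raw MIME parts -->
    <system.serviceModel>
      <bindings>
        <basicHttpBinding>
          <binding name="mtomBinding" messageEncoding="Mtom"
                   maxReceivedMessageSize="67108864"/>
        </basicHttpBinding>
      </bindings>
      <services>
        <service name="FileTransferService"> <!-- hypothetical name -->
          <endpoint address="" binding="basicHttpBinding"
                    bindingConfiguration="mtomBinding"
                    contract="IFileTransferService"/> <!-- hypothetical contract -->
        </service>
      </services>
    </system.serviceModel>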

20.
Projects that attempt to encode variorum texts with the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange will likely encounter situations where the text varies in its structure, as well as in its content. Although encoding textual variants at a separate level using a version control system may be attractive, the advantages in encoding text and variants in the same format are considerable. This paper proposes solutions to three problems that require more than the standard TEI textual critical elements: transposition, variation of meta-data, and insertion of incomplete structures.
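For transposition in particular, the TEI P5 genetic-editing additions later supplied a dedicated construct: an ordered list of pointers recording that a witness presents segments in a different sequence. The sketch below uses invented identifiers and is not necessarily the solution this paper proposes.

    <p>
      <seg xml:id="seg-a">first passage</seg>
      <seg xml:id="seg-b">second passage</seg>
    </p>
    <!-- In a hypothetical second witness the segments occur in reverse order -->
    <listTranspose>
      <transpose>
        <ptr target="#seg-b"/>
        <ptr target="#seg-a"/>
      </transpose>
    </listTranspose>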
