Similar Articles (20 results)
1.
A dramatic work may be seen either as an event or as a text; the TEI guidelines make it possible to encode a dramatic work in either way, but do not attempt to solve the difficult problem of doing both at once. The basic element of a dramatic work, when seen as a text, is the speech; the guidelines also provide elements for encoding other familiar parts of dramatic texts (such as stage directions and cast lists), as well as for encoding analytic information on various aspects of texts and performances that is not normally included in printed dramatic texts. There are often other formal structures in dramatic works that intersect with the structure of speeches — metrical structures, for example; we discuss approaches for encoding these structures.

John Lavagnino is a graduate student in English and American Literature at Brandeis University. His fields of interest include Renaissance drama, modern literature, textual scholarship, and electronic textuality. He is Electronics Editor of The Collected Works of Thomas Middleton (forthcoming from Oxford University Press).

Elli Mylonas is a Lead Project Analyst for the Scholarly Technology Group at Brown University. Formerly she was the Managing Editor of the Perseus Project. Her areas of interest are Roman poetry, textual markup and SGML, and hypertext.

The work described in this paper is the outcome of the discussions of the Performance Working Group, whose members are Elli Mylonas (chair), Rosanne G. Potter, John Lavagnino, and Lou Burnard. The authors wish to thank the other two members for their contributions.
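As a rough illustration, hedged rather than drawn from the paper itself, the following Python sketch assembles one speech of the kind the abstract describes: a speech element with a speaker label, a verse line, and a stage direction. The element names (sp, speaker, l, stage) are TEI conventions; the sample text and attribute values are invented for the example.

```python
# A minimal sketch of TEI-style speech markup, built with the standard
# library; the sample line and attributes are invented, not from the paper.
import xml.etree.ElementTree as ET

sp = ET.Element("sp", who="#hamlet")          # one speech, attributed to a speaker
ET.SubElement(sp, "speaker").text = "Hamlet"  # the speaker label as printed
line = ET.SubElement(sp, "l")                 # <l> marks a verse line, so metrical
line.text = "To be, or not to be, that is the question:"  # structure can coexist
stage = ET.SubElement(sp, "stage", type="delivery")        # with the speech
stage.text = "aside"

print(ET.tostring(sp, encoding="unicode"))
```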

2.
This paper discusses one of the tools which may be used for representing texts in machine-readable form, i.e. encoding systems or markup languages. This discussion is at the same time a report on current tendencies in the field. An attempt is made at reconstructing some of the main conceptions of text lying behind these tendencies. It is argued that, although the conceptions of texts and text structures inherent in these tendencies seem to be misguided, text encoding is nevertheless a fruitful approach to the study of texts. Finally, some conclusions are drawn concerning the relevance of this discussion to themes in text linguistics.

Claus Huitfeldt studied philosophy at the University of Trondheim, writing his dissertation on the nature of transcendental arguments. He then worked for several years at the Norwegian Computing Centre for the Humanities, at the Norwegian Wittgenstein Project, and as Research Fellow in philosophy, before becoming Director of the Wittgenstein Archives at the University of Bergen. He has published a number of papers on text encoding.

3.
This paper reports on some of the concrete outcomes of a larger research project on the study of syntactic change. In this part of the project, we are collecting and encoding historical texts and tagging them for syntactic analysis. We have so far produced a TEI-conformant version of an Old French text, La Vie de Saint Louis, written by Jehan de Joinville around 1305, and we are in the process of adding syntactic tags to this text. Those syntactic tags are derived from the Penn-Helsinki coding scheme, which had been devised for the syntactic encoding of Middle English texts, and have been translated into TEI. Thus this paper addresses two issues: the development of a TEI encoding for the text, and the adaptation of the Penn-Helsinki syntactic coding scheme. While the first part of this work raises issues of a textual nature independent of the language of the text and proposes concrete immediate solutions, the second part points to a more general extension of the Penn-Helsinki tagset to other types of texts and to other languages.

4.
A Logical Model for Chinese-English Bilingual Cross-Language Filtering
This article briefly describes the background of text filtering and proposes a logical model for Chinese-English bilingual cross-language filtering based on latent semantic indexing. The basic idea is to improve on the word-for-word translation approach to bilingual cross-language filtering by instead exploiting the latent semantic structure of bilingual texts as the basis for matching user profiles against documents. Bilingual terms and texts are mapped to vectors in a semantic space, so that matching can succeed without translating term pairs, and even when the corresponding translated terms do not appear at all. This greatly improves the precision of cross-language filtering, with good results.
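As a hedged sketch of the idea (not the authors' implementation), the snippet below uses scikit-learn's truncated SVD as the latent semantic indexing step: paired bilingual texts are projected into a shared semantic space, where an English profile can match a Chinese document without any shared surface words. The toy training texts, the profile, and the component count are all invented.

```python
# Sketch of LSI-based cross-language matching: documents and the user
# profile are projected into a shared latent semantic space, so a match
# no longer depends on word-for-word translation. Toy data, invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Each training "document" concatenates a text with its translation, so
# co-occurring terms across the two languages share latent dimensions.
train = [
    "text filtering 文本 过滤",
    "semantic index 语义 索引",
    "user profile 用户 模板",
]
vec = CountVectorizer(token_pattern=r"\S+")
X = vec.fit_transform(train)

lsi = TruncatedSVD(n_components=2).fit(X)    # the latent semantic space

profile = lsi.transform(vec.transform(["text filtering"]))  # English profile
doc = lsi.transform(vec.transform(["文本 过滤"]))            # Chinese document
print(cosine_similarity(profile, doc))       # similar despite no shared words
```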

5.
The digitisation of cultural heritage and linguistics texts has long been troubled by the problem of how to represent overlapping structures arising from different markup perspectives (‘overlapping hierarchies’) or from different versions of the same work (‘textual variation’). These two problems can be reduced to one by observing that every case of overlapping hierarchies is also a case of textual variation. Overlapping textual structures can be accurately modelled either as a minimally redundant directed graph, or, more practically, as an ordered list of pairs, each containing a set of versions and a fragment of text or data. This ‘pairs-list’ representation is provably equivalent to the graph representation. It can record texts consisting of thousands of versions or perspectives without becoming overloaded with data, and the most common operations on variant text, e.g. comparison between two versions, can be performed in linear time. This representation also separates variation or other overlapping structures from the document content, leading to a simplification of markup suitable for wiki-like web applications.
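A minimal Python sketch of the pairs-list model, with invented version names and fragments: each pair holds the set of versions that share a fragment, any single version can be rebuilt in one linear pass, and two versions can be compared in linear time.

```python
# Sketch of the 'pairs-list' model: an ordered list of
# (set-of-versions, text-fragment) pairs. Version names and
# fragments are invented for illustration.
pairs = [
    ({"A", "B"}, "The quick brown "),          # shared by both versions
    ({"A"},      "fox "),                      # version A only
    ({"B"},      "dog "),                      # version B only
    ({"A", "B"}, "jumps over the lazy dog."),
]

def extract(version):
    """Rebuild one version in a single linear pass over the pairs."""
    return "".join(frag for versions, frag in pairs if version in versions)

def compare(v1, v2):
    """Linear-time comparison: report shared and version-specific fragments."""
    for versions, frag in pairs:
        if v1 in versions and v2 in versions:
            print(f"both: {frag!r}")
        elif v1 in versions:
            print(f"{v1} only: {frag!r}")
        elif v2 in versions:
            print(f"{v2} only: {frag!r}")

print(extract("A"))   # The quick brown fox jumps over the lazy dog.
compare("A", "B")
```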

6.
Projects that attempt to encode variorum texts with the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange will likely encounter situations where the text varies in its structure, as well as in its content. Although encoding textual variants at a separate level using a version control system may be attractive, the advantages in encoding text and variants in the same format are considerable. This paper proposes solutions to three problems that require more than the standard TEI textual critical elements: transposition, variation of meta-data, and insertion of incomplete structures.

7.
Drawing on theories in evolutionary biology, research on hypertext navigation has posited two profiles to capture how students navigate information sources: the satisficing and sampling approaches to text access. While students engaged in sampling work to identify an optimal source to exploit for information, students who adopt a satisficing approach devote their time to the first text they visit that meets some threshold of acceptability. This study examines the manifestation of these profiles when students navigate multiple, non-hyperlinked texts, without time limitations. Evidence was found for a satisficing, but not a sampling, approach to multiple text navigation. Four sub-profiles of satisficing were identified. Students in the limited navigation profile devoted little time to text access. Students in the primary profile devoted the bulk of access time to a single text. Those in the distributed profile visited the texts they accessed for fairly uniform periods of time. Students in the discriminating profile visited certain texts for substantial periods of time, while accessing other texts to a more limited extent. These four navigation profiles were found to be differentially associated with other metrics of text access (e.g., whether texts were revisited), ratings of text usefulness, and task performance.

8.
This article focuses on the conceptual issues faced by scholarly editors and textual studies specialists. Theoretical debate in this general field is still active, as digital texts present special problems and magnify others. Older theory and methodology are hampered by unacknowledged, sometimes inappropriate cultural values and other limitations, and are not always useful in connection with digital texts. Nevertheless, the distinction between the abstract work and its concrete expression is influential both within and outside the field. In this approach, the concept of authenticity relates to the degree of change a work undergoes or the accuracy of the ‘instructions’ for its reconstitution. Whether the digital text is best thought of as immaterial or material is not as crucial as might first appear. The way a digital text is made visible is important, though potentially paradoxical. In order to be workable, the concept of authentication by instructions needs further technical assistance, like that provided by the Just-in-Time Markup System. But, despite its limitations, traditional textual scholarship still has much to offer textual studies in digital environments.

9.
This work is in the context of a system that watches over users as they type a translation and repeatedly suggests completions for the text already entered. The users may either accept, modify, or ignore these suggestions. We describe the design, implementation, and performance of a prototype which suggests completions of units of text that are longer than one word.
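The abstract gives no implementation details, so the following is only a hedged stand-in: a toy bigram model that greedily suggests a multi-word completion for the words typed so far. The training sentences and the scoring are invented; they illustrate the idea of completions longer than one word, not the prototype's actual models.

```python
# Hedged sketch: rank multi-word completions with a bigram model trained
# on a toy corpus. Invented data; only illustrates multi-word suggestion.
from collections import defaultdict

corpus = [
    "the committee approved the report",
    "the committee approved the budget",
    "the committee rejected the proposal",
]
follow = defaultdict(list)               # word -> observed successor words
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 1):
        follow[words[i]].append(words[i + 1])

def suggest(prefix, length=3):
    """Greedily extend the last typed word by the most frequent successor."""
    completion, word = [], prefix.split()[-1]
    for _ in range(length):
        nxt = follow.get(word)
        if not nxt:
            break
        word = max(set(nxt), key=nxt.count)   # most frequent bigram successor
        completion.append(word)
    return " ".join(completion)

print(suggest("the committee"))   # -> 'approved the committee' on this toy corpus
```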

10.
Collocations are understood in this work as nonrandom combinations of two or more lexical units that are typical both of a language as a whole (texts of any type) and of a particular type of text. A text is a structured sequence of units of different levels; collocations, as complex text substructures, are therefore an important object when investigating text analysis procedures. Selecting collections of different types as our material, we study both the general patterns and the properties of the analyzed collections. This paper devotes its main attention to digrams extracted from a collection of news texts.
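As a hedged sketch of this kind of extraction (assuming that digrams here means adjacent word pairs), the snippet below counts word pairs in a toy collection and ranks them by frequency; a real study would use far larger collections and association measures beyond raw counts.

```python
# Sketch: extract and rank digrams (adjacent word pairs) from a toy
# text collection. The sample texts are invented.
from collections import Counter

texts = [
    "the central bank raised interest rates",
    "the central bank cut interest rates again",
]

digrams = Counter()
for text in texts:
    words = text.split()
    digrams.update(zip(words, words[1:]))    # count adjacent pairs

for pair, count in digrams.most_common(3):
    print(count, " ".join(pair))
# 2 the central / 2 central bank / 2 interest rates
```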

11.
In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain together with two simplified versions of each sentence. The simplified versions were created following two different approaches: the structural one, by a court translator following easy-to-read guidelines, and the intuitive one, by a teacher drawing on her experience. The aim of this corpus is to enable a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The scheme distinguishes eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation, and other. These macro-operations are further divided into finer-grained operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.

12.
Text categorization is an important research area of text mining. The original purpose of text categorization is to recognize, understand, and organize different types of texts or documents. General categorization approaches are treated as supervised learning, which infers similarity among a collection of categorized texts for training purposes. Existing categorization approaches, however, are not content-oriented and are constrained to the single-word level.

This paper introduces an innovative content-oriented text categorization approach named CogCate. Inspired by cognitive situation models, CogCate exploits a human cognitive procedure in categorizing texts. In addition to traditional statistical analysis at the word level, CogCate also applies lexical/semantic analysis, which ensures the accuracy of categorization. Evaluation experiments have verified the performance of CogCate. Meanwhile, CogCate remarkably reduces the time and effort spent on software training and maintenance of text collections. Our research attests that interdisciplinary research efforts benefit text categorization.

13.
14.
We present an integrated knowledge representation system for natural language processing (NLP) whose main distinguishing feature is its emphasis on encoding not only the usual propositional structure of the utterances in the input text, but also capturing an entire complex of nonpropositional — discourse, attitudinal, and other pragmatic — meanings that NL texts always carry. The need for discourse pragmatics, together with generic semantic information, is demonstrated in the context of anaphoric and definite noun phrase resolution for accurate machine translation. The major types of requisite pragmatic knowledge are presented, and an extension of a frame-based formalism developed in the context of the TRANSLATOR system is proposed as a first-pass codification of the integrated knowledge base.

15.
The automatic generation of summaries using cases (GARUCAS) environment was designed as an intelligent system to help one learn to summarize narrative texts by means of examples within a case-based reasoning (CBR) approach. Each example, modeled as a case, contains a conceptual representation of the initial textual state, the different steps of the summarization method, and the representation of the final textual state obtained. The CBR approach allows the environment to summarize new texts in order to produce new text summarization examples with respect to some predefined educational objectives. Within GARUCAS, this approach is used at two levels: the event level (EL), to identify the essential elements of a story, and the clause level (CL), to make the summary more readable. The purpose of this article is to describe the GARUCAS environment and the model used to build story summarization examples and summarize new texts. This model is based on important psycholinguistic work concerning event and narrative structures and text revision rules. An experiment was conducted with 12 short stories. The GARUCAS environment can classify the stories according to their structural analogy and reuse the summarization method of the most similar text. Such an approach can be reused for any kind of text or summary type.

16.
A novel approach is introduced in this paper for the implementation of a question–answering based tool for the extraction of information and knowledge from texts. This effort resulted in the computer implementation of a system answering bilingual questions directly from a text using Natural Language Processing. The system uses domain knowledge concerning categories of actions and implicit semantic relations. The present state of the art in information extraction is based on the template approach, which relies on a predefined user model. The model guides the extraction of information and the instantiation of a template, similar to a frame or a set of attribute-value pairs, as the result of the extraction process. Our question–answering based approach aims to create flexible information extraction tools that accept natural language questions and generate answers containing information extracted from text either directly or after applying deductive inference. Our approach also addresses the problem of implicit semantic relations occurring either in the questions or in the texts from which information is extracted. These relations are made explicit with the use of domain knowledge. Examples of the application of our methods are presented for four domains of quite different nature: oceanography, medical physiology, aspirin pharmacology, and ancient Greek law. Questions are expressed both in Greek and English. Another important point of our method is that it processes text directly, avoiding any kind of formal representation when inference is required for the extraction of facts not mentioned explicitly in the text. This idea of using text as a knowledge base was first presented in Kontos [7] and further elaborated in [9,11,12] as the ARISTA method, a new method for knowledge acquisition from texts that is based on using natural language itself for knowledge representation.

17.
Research on text mining has recently attracted considerable attention from both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge from a large text repository. The problem is not easy to tackle owing to the semi-structured or even unstructured nature of the texts under consideration. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to some predefined category if the document falls within the theme of the category. Traditionally the categories are arranged hierarchically to achieve effective searching and indexing as well as easy comprehension by human beings, and the determination of category themes and their hierarchical structures was mostly done by human experts. In this work, we developed an approach to automatically generate category themes and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. The document collection was used to train a self-organizing map, forming two feature maps. These maps were then analyzed to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language once they are transformed into lists of separated terms.
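A minimal self-organizing map sketch in NumPy, with invented toy term vectors: documents with similar term profiles are pulled toward nearby grid nodes. The original system trains two feature maps on a full collection and derives the category hierarchy from them, which this sketch does not attempt.

```python
# Minimal self-organizing map sketch: toy term-frequency vectors are
# mapped onto a 2x2 grid, so similar documents land on nearby nodes.
# Invented data; the original work trains two maps on a real collection.
import numpy as np

rng = np.random.default_rng(0)
docs = np.array([            # rows: documents; columns: 4 toy terms
    [3, 0, 0, 1],            # "sports"-like profile
    [2, 1, 0, 0],
    [0, 3, 2, 0],            # "finance"-like profile
    [0, 2, 3, 1],
], dtype=float)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

grid = rng.random((2, 2, 4))             # 2x2 map of 4-dim weight vectors
coords = np.argwhere(np.ones((2, 2)))    # node coordinates on the grid

for t in range(200):
    lr = 0.5 * (1 - t / 200)             # decaying learning rate
    x = docs[t % len(docs)]
    winner = np.unravel_index(
        np.argmin(((grid - x) ** 2).sum(axis=2)), (2, 2))
    dist = np.abs(coords - winner).sum(axis=1).reshape(2, 2)
    h = np.exp(-dist)[..., None]         # neighborhood function
    grid += lr * h * (x - grid)          # pull nodes toward the input

for d in docs:                           # map each document to its node
    print(np.unravel_index(np.argmin(((grid - d) ** 2).sum(axis=2)), (2, 2)))
```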

18.
Objective: Current text detection methods based on convolutional neural networks (CNNs) find it very difficult to locate small-scale text in natural scenes. However, text in natural scene images is strongly associated with other objects: it usually appears together with particular objects such as billboards and road signs. Based on this observation, this paper proposes a cascaded CNN text detection method for natural scenes that takes object association into account. Method: First, a CNN detects both text targets and the associated objects that contain text, yielding text candidate boxes and candidate boxes for text-bearing associated objects. The associated-object candidate regions are then enlarged and cropped from the original image, and the cropped images are fed to the CNN again to detect text candidate boxes more precisely. Finally, non-maximum suppression is used to merge the text candidate boxes produced by the two steps into the final detection result. Results: The method detects small-scale text effectively, achieving a recall of 0.817, a precision of 0.880, and an F-measure of 0.847 on the ICDAR-2013 dataset. Conclusion: By exploiting the strong association between text and text-bearing objects in natural scenes, the method improves the recall of small-scale text detection in natural scene images.
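The detector itself is beyond a short example, but the fusion step named in the abstract, non-maximum suppression, is standard; below is a NumPy sketch that merges candidate boxes from the two stages by score and overlap. The boxes, scores, and IoU threshold are invented for illustration.

```python
# Sketch of the fusion step: standard non-maximum suppression over the
# text candidate boxes produced by the two detection stages.
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N,4) array of [x1, y1, x2, y2]; returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep

stage1 = np.array([[10, 10, 60, 30], [100, 40, 150, 60]], dtype=float)
stage2 = np.array([[12, 11, 58, 29]], dtype=float)   # refined duplicate
boxes = np.vstack([stage1, stage2])
scores = np.array([0.7, 0.8, 0.9])
print(nms(boxes, scores))   # the refined box suppresses its stage-1 twin
```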

19.
We report two studies investigating readers' ability to allocate limited time adaptively across online texts of varying difficulty. In both studies participants were asked to learn about the human heart and were free to allocate time across 4 separate online texts about the heart, but did not have enough time to read them all thoroughly. Of particular interest was whether readers attempted to select the best text for them (by sampling the texts before reading) or to monitor texts while reading them and continue reading any text judged good enough (a satisficing strategy). We argue that both strategies can be considered adaptive, depending on properties of readers, texts, and tasks. Experiment 1 tested readers with a range of background knowledge and allowed them either 7 or 15 min of study time. It showed that participants were adaptive in how they allocated their time, in that more knowledgeable readers spent more time reading more difficult texts. Satisficing was a much more common strategy than sampling. Experiment 2 showed that providing outline overviews of each text dramatically increased the number of participants using a sampling strategy, so that it became the modal strategy. However, this change in strategy had no effect on learning. Outline overviews presumably changed readers' perception of the ease with which relevant dimensions of text quality can be judged.

20.
Guidelines are given to meet the observed need for rules about layout, the use of colour, and typography on display screens, so as to create texts with optimal legibility. Examples of videotex pages are used to illustrate right and wrong layouts, applications of colour, and letter types. The guidelines can be generalized to other types of display, such as those used in personal computers, and, to a more limited extent, to the use of graphics instead of text. Finally, figures are given on the general public's subjective appreciation of some alternative display layouts.
