Similar literature
 20 similar documents found (search time: 31 ms)
1.
Although lexical frequencies are familiar measures in stylistic and thematic analysis, only recently have some stylostatisticians begun to investigate the relationship between the frequency and the topography of repeated lexical items. In the present paper the authors turn to the study of the four focal types of discursive narratology, using Marguerite Duras's Moderato Cantabile. Their intent is to uncover aspects of narratological performance that further elucidate the communicative strategies in the story. Part 1 summarizes the problem of the relation between frequency and topography. It describes how a topographical index can be computed for any repeated item and how a Global Topography Index (GTI) can summarize the major topographical characteristics of any text sequence. Part 2 presents a four-cell typology of narrational mode: a segmentation of the verbal chain into narrating and narrated speech acts, with each text sequence tagged according to its discursive function: overt sender intervention for story coherence or comment on the focal level of a narrating present; representation of discrete or unlocalized events on the focal level of a mimeticized past. In Part 3 the focal encodings are displayed in numerical and graphic form, first according to the eight surface chapter divisions and then according to twenty-six subsets of approximately equal length. The fluctuations of the topography indices are reviewed, with particular attention paid to the manifestation of cluster effects. Although sender interventions predominate, the relativized behavior of each focal type contributes to a climactic unraveling of the intrigue in the final chapters. In conclusion, the authors stress the dichotomy between the calm surface of the chapters and the agitated tensions of the twenty-six subsets.
Richard Frautschi, Professor of French at Penn State University, is co-authoring a bibliography of French prose fiction, 1700–1800, and is preparing a book on the theory and practice of quantified discourse focalization.
Philippe Thoiron, Professor and Director of the Centre de Recherche en Terminologie et Traduction at the Université de Lyon-2, specializes in quantitative studies of vocabulary (lexical richness and topography of repeated items). He is also engaged in research on languages for specific purposes and terminology.

2.
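Liu Jinling (刘金岭), Liu Dan (刘丹), Zhou Hong (周泓). Computer Engineering (《计算机工程》), 2012, 38(10): 67-69
A method based on HowNet is proposed for extracting lexical chains from Chinese SMS text. Using HowNet's semantic relations, words of the same semantic class supply contextual lexical-item information, from which multiple lexical chains are built to represent the multiple narrative threads of an SMS text. The chains richest in information are then extracted to represent the semantics of the message, and the keyword sets of those chains are used for text classification. Experimental results show that the method achieves high extraction accuracy and fast text classification.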

3.
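A lexical-chain-based method for extracting topic sentences from Chinese SMS messages   Total citations: 1, self-citations: 0, citations by others: 1
A lexical-chain-based method is proposed for extracting the topic of Chinese SMS text. The method first constructs multiple lexical chains to represent the narrative threads of the message, then extracts the chains richest in topic information and uses them as the keyword sequences from which topic sentences are built. Experiments show that topics extracted this way cover the information in the message more completely and eliminate the redundancy of several keyword sequences expressing the same topic. The method clearly outperforms topic extraction based on statistical information.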

4.
It has been assumed theoretically and established empirically that text signals influence text memorization and comprehension. This study investigates whether restoring a text's visual signals improves memorization and comprehension when the text is automatically converted into speech. Participants listened to a restaurant menu rendered by a text-to-speech synthesizer. The visual signals used in the menu were restored with discursive segments, with prosodic cues, or with a picture of the text displayed before or during listening. Participants then performed tasks assessing their memorization and comprehension of the text. Restoring the visual signals influences participants' recall, but the effects vary with the means of restoration and with the task. When visual signals are not restored, individuals construct an erroneous representation of the situation described in the text, leading to a misinterpretation of its meaning, whereas the discursive and prosodic restorations lead to the construction of an adequate representation.

5.
This paper describes an experimental method for automatic text genre recognition based on 45 statistical, lexical, syntactic, positional, and discursive parameters. The suggested method includes: (1) the development of software permitting heterogeneous parameters to be normalized and clustered using the k-means algorithm; (2) the verification of parameters; (3) the selection of the parameters that are the most significant for scientific, newspaper, and artistic texts using two-factor analysis algorithms. Adaptive summarization algorithms have been developed based on these parameters.

6.
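Text clustering algorithms face the problem that text vectors are high-dimensional and extremely sparse. Most traditional dimensionality-reduction methods extract features statistically under the assumption that keywords are mutually independent; this often ignores the semantic relations of the text in context, so that much of its meaning is lost. Using the HowNet (《知网》) knowledge base, the authors compute semantic-class similarity to build multiple weighted lexical chains and select the two chains with the largest and second-largest weights to form the keyword sequence that represents the text. On this basis they propose TCABTLC, a text clustering algorithm based on topic lexical chains, which addresses the low efficiency caused by high-dimensional, sparse text vectors and also yields good clustering results. Experiments show that the algorithm's time efficiency improves substantially while good accuracy is maintained.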
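
The chain-selection step described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `TOY_CLASSES` is a hypothetical word-to-class map standing in for a HowNet lookup, and chain weight is taken to be the number of member occurrences, which is not necessarily the weighting used in TCABTLC.

```python
from collections import defaultdict

# Hypothetical word -> semantic class map standing in for a HowNet lookup.
TOY_CLASSES = {
    "bank": "finance", "loan": "finance", "interest": "finance",
    "match": "sport", "goal": "sport",
    "rain": "weather",
}

def top_chains(tokens, classes, k=2):
    """Group tokens into chains by semantic class and keep the k heaviest chains."""
    chains = defaultdict(list)
    for tok in tokens:
        if tok in classes:
            chains[classes[tok]].append(tok)
    # Assumed weighting: a chain's weight is its number of member occurrences.
    ranked = sorted(chains.items(), key=lambda kv: len(kv[1]), reverse=True)
    return ranked[:k]

tokens = "bank loan interest bank match goal rain".split()
# Keyword sequence: distinct words of the two heaviest chains, in order of appearance.
keywords = [w for _, chain in top_chains(tokens, TOY_CLASSES)
            for w in dict.fromkeys(chain)]
print(keywords)
```

Representing each text by this short keyword sequence rather than by its full term vector is what reduces dimensionality before clustering.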

7.
Modern web technologies are enabling authors to create various forms of text-visualization integration for storytelling. This integration may shape the stories' flow and thereby affect the reading experience. In this paper, we seek to understand two forms of text-visualization integration: (i) different spatial arrangements of text and visualization (layout), namely, vertical and slideshow; and (ii) interactive linking of text and visualization (linking). Here, linking refers to a bidirectional interaction mode that explicitly highlights the explanatory visualization element when narrative text is selected, and vice versa. Through a crowdsourced study with 180 participants, we measured the effect of layout and linking on the degree to which users engage with the story (user engagement), their understanding of the story content (comprehension), and their ability to recall the story information (recall). We found that participants performed significantly better in comprehension tasks with the slideshow layout. Participant recall was better with the slideshow layout under conditions with linking versus no linking. We also found that linking significantly increased user engagement. Additionally, linking and the slideshow layout were preferred by the participants. We also explored user reading behaviors under the different conditions.

8.
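An automatic Chinese feature-word extraction method for text classification   Total citations: 1, self-citations: 0, citations by others: 1
Noting that mainstream text classification models are sensitive only to word frequency and attend only to medium- and high-frequency terms, the article designs and implements a dictionary-free method for automatic feature-word extraction based on multi-step filtering of Chinese character combination patterns, and compares it experimentally with traditional dictionary-based word segmentation. The results show that its recognition rate for medium- and high-frequency terms approaches that of dictionary-based segmentation, while it runs far faster, meeting the need for rapid automatic feature-word extraction from large-scale open-domain text.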

9.
How Variable May a Constant be? Measures of Lexical Richness in Perspective   Total citations: 1, self-citations: 0, citations by others: 1
A well-known problem in the domain of quantitative linguistics and stylistics concerns the evaluation of the lexical richness of texts. Since the most obvious measure of lexical richness, the vocabulary size (the number of different word types), depends heavily on the text length (measured in word tokens), a variety of alternative measures has been proposed which are claimed to be independent of the text length. This paper has a threefold aim. Firstly, we have investigated to what extent these alternative measures are truly textual constants. We have observed that in practice all measures vary substantially and systematically with the text length. We also show that in theory, only three of these measures are truly constant or nearly constant. Secondly, we have studied the extent to which these measures tap into different aspects of lexical structure. We have found that there are two main families of constants, one measuring lexical richness and one measuring lexical repetition. Thirdly, we have considered to what extent these measures can be used to investigate questions of textual similarity between and within authors. We propose to carry out such comparisons by means of the empirical trajectories of texts in the plane spanned by the dimensions of lexical richness and lexical repetition, and we provide a statistical technique for constructing confidence intervals around the empirical trajectories of texts. Our results suggest that the trajectories tap into a considerable amount of authorial structure without, however, guaranteeing that spatial separation implies a difference in authorship.
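
The length dependence described in this abstract is easy to see on a small scale. The sketch below computes vocabulary size V (word types) and the type-token ratio V/N on growing prefixes of a text; the tokenizer and the sample sentence are assumptions chosen for illustration, not material from the paper.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def richness_profile(tokens, step):
    """Return (N, V, V/N) for prefixes of length step, 2*step, ... of the token list."""
    profile = []
    for n in range(step, len(tokens) + 1, step):
        types = set(tokens[:n])  # distinct word types seen in the first n tokens
        profile.append((n, len(types), len(types) / n))
    return profile

sample = "the cat sat on the mat and the dog sat on the log"
for n, v, ttr in richness_profile(tokenize(sample), 4):
    print(n, v, round(ttr, 2))
```

Even in this tiny sample V keeps growing while V/N falls as N increases, which is why neither quantity can be compared directly across texts of different lengths.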

10.
11.
Collocations are understood in this work as nonrandom combinations of two or more lexical units that are typical both of a language as a whole (texts of any type) and of a definite type of text. A text is a structured sequence of units of different levels; collocations, as complex text substructures, are an important object of study for text analysis procedures. Using collections of different types as material, we study both the general patterns and the properties of the analyzed collections. This paper devotes its main attention to digrams extracted from a collection of news texts.
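
Digram (bigram) extraction of the kind mentioned above can be sketched as follows. Ranking candidates by pointwise mutual information (PMI) and the `min_count` threshold are assumptions added for illustration; the abstract does not say how the authors score nonrandomness.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

tokens = "prime minister said the prime minister met the press".split()
scores = bigram_pmi(tokens)
print(scores)
```

A positive score means the pair co-occurs more often than its parts would by chance, which matches the "nonrandom combination" reading of a collocation.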

12.
This paper considers the uniqueness of the texts and discourses produced by a specific group of World Wide Web (WWW) users. These characteristics include the intertextuality of the WWW text and the resulting formation of textual domains where no particular text can claim centrality. This decentering is reported as the result of a process of reciprocal intertextuality. These unique characteristics of the WWW text eventually produce an image of the group of people who write and read the text. The specific characteristics of Web discourse suggest alternative ways of thinking about cyber-communities, centered on the specific discursive strategies used by the authors.

13.
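Using 51 contemporary and Ming-Qing literary works, including Dream of the Red Chamber (《红楼梦》), as the corpus, this paper applies a document embedding algorithm. Based on the unitary invariance of document embedding vectors, it defines document embedding matrices for works by different authors and a document embedding loss function, builds a model for selecting the optimal dimension and optimal window of the document embedding model, and constructs document embedding vectors in high-dimensional space from word usage and topical semantic features. Several groups of experiments, using unsupervised manifold-learning dimensionality reduction and supervised classification algorithms, verify that the vector space model obtained through document embedding can effectively distinguish the writing styles of different authors: classification accuracy for works of known authorship reaches 99.6%, and authors with similar styles, such as Lu Yao (路遥) and Chen Zhongshi (陈忠实), can also be told apart. Building on this classifier, a variable-scale sliding-window classification model is used for a deeper analysis of Dream of the Red Chamber. The analysis supports the view that the first 80 chapters and the last 40 may come from different authors, and also finds a marked stylistic difference between the first 100 chapters and the last 20, so a further change of author cannot be ruled out. At the level of computational technique, the paper offers supporting evidence and a new perspective on the authorship question of Dream of the Red Chamber.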

14.
This research investigates the classification of documents according to the ethnic group of their authors and/or the historical period in which the documents were written. The classification uses various combinations of six sets of stylistic features: quantitative, orthographic, topographic, lexical, function, and vocabulary richness. The application domain is Jewish Law articles written in Hebrew-Aramaic, languages that are rich in morphological forms. Four popular machine learning methods were applied. Logistic regression gave the best accuracy: about 99.6% when classifying by the ethnic group of the authors or by the historical period in which the articles were written, and about 98.3% when classifying by both. The quantitative feature set was found to be very successful and superior to all other sets. The lexical and function feature sets were also found to be useful. The quantitative and the function features are domain independent and language independent. These two feature sets might be generalized to similar classification tasks for other languages and can therefore be useful for the text classification community at large.

15.
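A lexical-chain-based method is proposed for identifying mutated (variant) spam SMS messages. The method constructs multiple lexical chains to represent the narrative threads of an SMS text, then extracts the chains richest in content information, eliminating the redundancy of several keyword sequences expressing the same content. The resulting lexical chains are used as the representation of each message and compared against one another to identify variants of spam messages. Experimental results show that the method identifies variant spam messages fairly accurately.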

16.
We introduce a dual-use methodology for automating the maintenance and growth of two types of knowledge sources that are crucial for natural language text understanding: background knowledge of the underlying domain, and linguistic knowledge about the lexicon and the grammar of the underlying natural language. A particularity of this approach is that learning occurs simultaneously with the ongoing text understanding process. The knowledge assimilation process is centered around the linguistic and conceptual ‘quality’ of various forms of evidence underlying the generation, assessment and ongoing refinement of lexical and concept hypotheses. On the basis of the strength of evidence, hypotheses are ranked according to qualitative plausibility criteria, and the most reasonable ones are selected for assimilation into the already given lexical class hierarchy and domain ontology.

17.
We propose a novel probabilistic method, based on latent variable models, for unsupervised topographic visualisation of dynamically evolving, coherent textual information. This can be seen as a complementary tool for topic detection and tracking applications. It is achieved by exploiting the available a priori domain knowledge that there are relatively homogeneous temporal segments in the data stream. Unlike topographical techniques previously used for static text collections, the topography in the proposed model is an outcome of the coherence in time of the data stream. Simulation results on toy-data settings and an actual application to the analysis of Internet chat line discussions are presented by way of demonstration.

18.
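… and navigation, where it is widely used. Text clustering is an unsupervised learning technique that rests on the cluster hypothesis: documents in the same class are highly similar, and documents in different classes are not. This paper studies the application of Chinese text clustering to news headlines. First, the collected headlines are segmented into words and features are extracted, turning the segmented text into a term matrix. The term matrix is then weighted with TF-IDF to obtain a new term matrix based on word weights; singular value decomposition of this matrix yields a principal component score matrix, from which principal components are taken as text features and K-means and hierarchical clustering are run. Finally, the clustering results are displayed as word clouds and the quality of the clustering is evaluated. The empirical results show that singular value decomposition of the term matrix reduces the dimensionality of the vector space and improves both the accuracy and the speed of clustering.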
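
The TF-IDF weighting step of this pipeline can be sketched in plain Python. The headlines below are invented, already-segmented examples, and the IDF variant `log(N / df)` is an assumption, since the abstract does not state which formula was used. The later SVD and clustering steps are left out here.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a TF-IDF weighted document-term matrix from pre-segmented documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for doc in docs:
        tf = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        matrix.append([(tf[t] / length) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, matrix

# Invented, already-segmented headlines (assumed input format).
docs = [["stocks", "rise"], ["stocks", "fall"], ["team", "wins", "title"]]
vocab, matrix = tfidf_matrix(docs)
for row in matrix:
    print([round(x, 3) for x in row])
```

Terms shared by several headlines ("stocks") receive a lower weight than terms unique to one headline, which is the property the subsequent decomposition and clustering rely on.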

19.
The Middle Dutch Arthurian romance Roman van Walewein (‘Romance of Gawain’) is attributed in the text itself to two authors, Penninc and Vostaert. Very little quantitative research into this dual authorship has been done. This article describes our progress in applying different non-traditional authorship attribution methods to the text of Walewein. After providing an introduction to the romance and an overview of earlier research, we evaluate previous statements on authorship and stylistics by applying both Yule's measure of lexical richness and Burrows's Delta. To find out whether these new methods would confirm or even enhance our present knowledge about the differences between the two authors, we applied an adapted version of John Burrows's Delta procedure. The adapted version seems to be able to distinguish the double authorship of the romance. It also helps us to confirm some and to reject other earlier statements about the position in the text where the second author started his work.
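
For readers unfamiliar with the two measures named in this abstract, here is a minimal sketch of Yule's K and of Delta in its commonly described basic form (mean absolute difference of z-scored frequencies of the most frequent words). It does not reproduce the adapted Delta procedure the authors used; the toy texts, the `n_words` parameter, and the use of the population standard deviation are assumptions.

```python
import statistics
from collections import Counter

def yules_k(tokens):
    """Yule's K = 10^4 * (sum of squared type frequencies - N) / N^2."""
    n = len(tokens)
    s2 = sum(f * f for f in Counter(tokens).values())
    return 1e4 * (s2 - n) / (n * n)

def burrows_delta(texts, n_words=3):
    """texts: dict mapping a label to a token list. Returns pairwise Delta values."""
    total = Counter(tok for toks in texts.values() for tok in toks)
    mfw = [w for w, _ in total.most_common(n_words)]  # most frequent words overall
    rel = {name: [toks.count(w) / len(toks) for w in mfw]
           for name, toks in texts.items()}
    z = {name: [] for name in texts}
    for i in range(len(mfw)):
        column = [rel[name][i] for name in texts]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column) or 1.0  # avoid division by zero
        for name in texts:
            z[name].append((rel[name][i] - mean) / sd)
    names = list(texts)
    return {(a, b): statistics.mean([abs(x - y) for x, y in zip(z[a], z[b])])
            for i, a in enumerate(names) for b in names[i + 1:]}

# Toy texts, invented for illustration.
texts = {
    "a": "the knight rode and the king spoke".split(),
    "b": "the knight rode and the king spoke".split(),
    "c": "and and and the queen".split(),
}
deltas = burrows_delta(texts)
print(yules_k(texts["a"]), deltas)
```

A smaller Delta indicates more similar function-word profiles; in an authorship study the texts would be long segments of the romance rather than single sentences, and many more frequent words would be used.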

20.
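A knowledge representation model is proposed for the sentence-planning stage of a multilingual automatic text generation system. It determines the concrete form of the text using sentence structure classes, syntactic rules, and a semantic dictionary. The structure of the model and its matching criteria are described in detail.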
