Similar documents
 20 similar documents found (search time: 15 ms)
1.
The search for a reliable expression to measure an author's lexical richness has constituted many statisticians' holy grail over the last decades in their attempt to solve some controversial authorship attributions. The greatest effort has been devoted to finding a formula grounded on the computation of tokens, word types, most-frequent word(s), hapax legomena, hapax dislegomena, etc., such that it would characterize a text successfully, independently of its length. In this line, Yule's K and Zipf's Z seem to be generally accepted by scholars as reliable measures of lexical repetition and lexical richness, computing content and function words altogether. Given the latter's higher frequency, function words prove to be more reliable identifiers when computed in isolation in PCA- and Delta-based attribution studies, and their ratio to content words also measures the functional density of a text. In this paper, we aim to show that each constant serves to measure a specific feature and that, as such, they complement one another, since a text that is supposedly rich in terms of its lemmas does not necessarily have to be characterized by low functional density, and vice versa. For this purpose, an annotated corpus of the West Saxon Gospels (WSG) and Apollonius of Tyre (AoT) has been used along with a huge raw corpus.
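Yule's K mentioned in this abstract has a standard closed form, K = 10⁴ (Σᵢ i²Vᵢ − N)/N², where N is the token count and Vᵢ the number of word types occurring exactly i times. A minimal sketch of that computation follows; the toy text is invented for illustration and is not taken from the paper's corpora:

```python
from collections import Counter

def yules_k(tokens):
    """Yule's characteristic K, a length-independent measure of lexical
    repetition: higher K means more repetition (lower lexical richness)."""
    n = len(tokens)
    type_freqs = Counter(tokens)              # frequency of each word type
    vi = Counter(type_freqs.values())         # V_i: number of types occurring i times
    m2 = sum(i * i * v for i, v in vi.items())
    return 10_000 * (m2 - n) / (n * n)

tokens = "the cat sat on the mat and the cat slept".split()
print(yules_k(tokens))  # prints 800.0
```

Because K depends only on the frequency spectrum Vᵢ normalized by N², it stays roughly stable as a text grows, which is why the abstract treats it as length-independent.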

2.
This paper considers the question of authorship attribution techniques when faced with a pastiche. We ask whether the techniques can distinguish the real thing from the fake, or whether the author can fool the computer. If the latter, is this because the pastiche is good, or because the technique is faulty? Using a number of mainly vocabulary-based techniques, Gilbert Adair's pastiche of Lewis Carroll, Alice Through the Needle's Eye, is compared with the original 'Alice' books. The standard measures of lexical richness, Yule's K and Orlov's Z, both distinguish Adair from Carroll, though Z also distinguishes the two originals from each other. A principal component analysis based on word frequencies finds that the main differences are not due to authorship. A discriminant analysis based on word usage and lexical richness successfully distinguishes the pastiche from the originals. Weighted cusum tests were also unable to distinguish the two authors in a majority of cases. As a cross-validation, we made similar comparisons with control texts: another children's story from the same era, and other work by Carroll and Adair. The implications of these findings are discussed.

3.
4.
The statement 'Results of most non-traditional authorship attribution studies are not universally accepted as definitive' is explicated. A variety of problems in these studies are listed and discussed: studies governed by expediency; a lack of competent research; flawed statistical techniques; corrupted primary data; lack of expertise in allied fields; a dilettantish approach; inadequate treatment of errors. Various solutions are suggested: construct a correct and complete experimental design; educate the practitioners; study style in its totality; identify and educate the gatekeepers; develop a complete theoretical framework; form an association of practitioners. This revised version was published online in July 2006 with corrections to the Cover Date.

5.
Markov chains are used as a formal mathematical model for sequences of elements of a text. This model is applied for authorship attribution of texts. As elements of a text, we consider sequences of letters or sequences of grammatical classes of words. It turns out that the frequencies of occurrences of letter pairs and pairs of grammatical classes in a Russian text are rather stable characteristics of an author and, apparently, they could be used in disputed authorship attribution. A comparison of results for various modifications of the method using both letters and grammatical classes is given. Experimental research involves 385 texts of 82 writers. In the Appendix, the research of D.V. Khmelev is described, where data compression algorithms are applied to authorship attribution.
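The letter-pair idea described above can be sketched as follows: train a first-order Markov model on each candidate author and attribute the disputed text to whichever model assigns it the higher likelihood. The toy texts, add-one smoothing, and alphabet size here are illustrative assumptions, not the authors' exact procedure (which was applied to Russian text):

```python
import math
from collections import Counter

def bigram_model(text):
    """First-order Markov model over characters: counts of letter pairs
    and of pair-initial letters."""
    pairs = Counter(zip(text, text[1:]))
    starts = Counter(text[:-1])
    return pairs, starts

def log_likelihood(text, model, alphabet=27):
    """Log-probability of a text under a character-bigram model, with
    add-one smoothing over an assumed alphabet size."""
    pairs, starts = model
    return sum(
        math.log((pairs[(a, b)] + 1) / (starts[a] + alphabet))
        for a, b in zip(text, text[1:])
    )

# Attribute a disputed text to the candidate whose model fits it best.
author_a = bigram_model("the quick brown fox jumps over the lazy dog " * 20)
author_b = bigram_model("zzzz xq qx zzzz " * 50)
disputed = "the lazy fox"
print(log_likelihood(disputed, author_a) > log_likelihood(disputed, author_b))
# prints True: the disputed text's letter pairs match author A's habits
```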

6.
The most important approaches to computer-assisted authorship attribution are exclusively based on lexical measures that either represent the vocabulary richness of the author or simply comprise frequencies of occurrence of common words. In this paper we present a fully automated approach to the identification of the authorship of unrestricted text that excludes any lexical measure. Instead we adapt a set of style markers to the analysis of the text performed by an already existing natural language processing tool, using three stylometric levels, i.e., token-level, phrase-level, and analysis-level measures. The latter represent the way in which the text has been analyzed. The presented experiments on a Modern Greek newspaper corpus show that the proposed set of style markers is able to distinguish reliably the authors of a randomly chosen group and performs better than a lexically based approach. However, the combination of these two approaches provides the most accurate solution (i.e., 87% accuracy). Moreover, we describe experiments on various sizes of the training data as well as tests dealing with the significance of the proposed set of style markers.

7.
Authorship Attribution with Support Vector Machines
In this paper we explore the use of text-mining methods for the identification of the author of a text. We apply the support vector machine (SVM) to this problem: as it is able to cope with half a million inputs, it requires no feature selection and can process the frequency vector of all words of a text. We performed a number of experiments with texts from a German newspaper. With nearly perfect reliability the SVM was able to reject other authors, and it detected the target author in 60–80% of the cases. In a second experiment, we ignored nouns, verbs, and adjectives and replaced them with grammatical tags and bigrams. This resulted in slightly reduced performance. Author detection with SVMs on full word forms was remarkably robust even when the author wrote about different topics.
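The setup described above, a linear SVM over the raw frequency vector of all words with no feature selection, can be sketched with scikit-learn (an assumption; the paper's own implementation and its German newspaper corpus are not reproduced here, and the texts and author labels below are invented toy data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

train_texts = [
    "the market rose sharply as investors bought stocks today",
    "investors sold their stocks and the market fell sharply",
    "whisk the eggs with sugar then fold in the flour gently",
    "the recipe needs flour sugar butter and four fresh eggs",
]
train_authors = ["A", "A", "B", "B"]

# Frequency vector over the full vocabulary: no feature selection.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)

# A linear SVM copes well with this very high-dimensional sparse input.
clf = LinearSVC()
clf.fit(X, train_authors)

test = vectorizer.transform(["stocks rose and investors bought more"])
print(clf.predict(test)[0])  # prints A
```

With a realistic corpus the feature space would run to hundreds of thousands of word forms, which is exactly the regime the abstract says the SVM handles without pruning.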

8.
9.
10.
Authorship attribution, also known as authorship classification, is the problem of identifying the authors (reviewers) of a set of documents (reviews). The common approach is to build a classifier usin...

11.
Authorship identification is the analysis of an author's individual writing style. Although this task has been studied extensively in many languages, for Chinese the research has not yet reached the domain of classical poetry. Tang poems are at once discontinuous in their imagery and holistic as wholes; to accommodate both characteristics, this paper proposes a two-channel Capsule-Transformer ensemble model. The upper-channel Capsule model extracts features while reducing information loss, and so better captures the semantic features of each individual image in a Tang poem; the lower-channel Transformer model uses multi-head self-attention to fully learn the deep semantic information jointly conveyed by all the images. Experiments show that the proposed model is well suited to Tang-poem authorship identification. Through error analysis, and in view of the particularities of Tang-poem text, the paper also discusses the open problems of this task and the directions and challenges for future research.

12.
The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author's writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of these measurements been tested on the same data set. A large-scale comparison of textual measurements is crucial if current techniques are to be used effectively and if new and more powerful techniques are to be developed. This article presents the results of a comparison of thirty-nine different types of textual measurements commonly used in attribution studies, in order to determine which are the best indicators of authorship. Based on the results of these tests, a more accurate approach to quantitative authorship attribution is proposed, which involves the analysis of many different textual measurements.

13.
14.
15.
16.
17.
This paper describes how traditional and non-traditional methods were used to identify seventeen previously unknown articles that we believe to be by Stephen Crane, published in the New-York Tribune between 1889 and 1892. The articles, printed without byline in what was at the time New York City's most prestigious newspaper, report on activities in a string of summer resort towns on New Jersey's northern shore. Scholars had previously identified fourteen shore reports as Crane's; these possible attributions more than double that corpus. The seventeen articles confirm how remarkably early Stephen Crane set his distinctive writing style and artistic agenda. In addition, the sheer quantity of the articles from the summer of 1892 reveals how vigorously the twenty-year-old Crane sought to establish himself in the role of professional writer. Finally, our discovery of an article about the New Jersey National Guard's summer encampment reveals another way in which Crane immersed himself in nineteenth-century military culture, and helps to explain how a young man who had never seen a battle could write so convincingly of war in his soon-to-come masterpiece, The Red Badge of Courage. We argue that the joint interdisciplinary approach employed in this paper should be the way in which attributional research is conducted.

18.
19.
Slicing is an important technique for analyzing concurrent programs. Targeting the shared-variable communication mechanism of multithreaded programs, this work constructs a program reachability graph on top of the basic program information extracted by the analysis tool CodeSurfer, builds a concurrent program dependence graph whose nodes are (program state, statement) pairs, and implements a prototype slicing system based on the reachability graph. Preliminary experimental results show that, compared with traditional slicing methods, slicing based on the program reachability graph effectively handles the non-transitivity of dependence relations and yields more precise slices of concurrent programs.

20.
Zhang Yang and Jiang Minghu. Acta Automatica Sinica (自动化学报), 2021, 47(11): 2501–2520
Authorship identification is an interdisciplinary field that infers the author of an unknown text from known texts. Traditional studies usually relied on experiential knowledge from literature or linguistics, whereas modern studies mainly quantify an author's writing style with mathematical methods. In recent years, with the development of cognitive science, systems science, and information technology, authorship identification has attracted growing attention from researchers. This paper surveys the methods and ideas of modern research in the field from the perspective of computational linguistics. It first briefly reviews the history of authorship identification; it then details stylistic features, attribution methods, and the multiple levels of research in the field; next it introduces the relevant evaluation campaigns, datasets, and evaluation metrics; finally, it points out open problems in the field and, in light of them, analyzes and forecasts development trends in authorship identification.
