
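词语序差的分布特点与文本间词汇异同 (Distribution Characteristics of Word Frequency Rank Differences and Lexical Similarities and Differences Between Texts)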
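Cite this article: LIU Rui (刘锐), SUN Bize (孙碧泽), LONG Yunfei (龙云飞), WANG Shan (王珊). 词语序差的分布特点与文本间词汇异同 [J]. 中文信息学报 (Journal of Chinese Information Processing), 2017, 31(5): 8-13.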
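Authors: 刘锐 (LIU Rui), 孙碧泽 (SUN Bize), 龙云飞 (LONG Yunfei), 王珊 (WANG Shan)
Affiliations: 1. Department of Chinese Language and Literature, Xiamen University, Xiamen, Fujian 361005, China;
2. Department of Chinese Language and Literature, Nanjing University, Nanjing, Jiangsu 210023, China;
3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China;
4. Department of Chinese Language Studies, The Education University of Hong Kong, Hong Kong, China
Funding: The Education University of Hong Kong (Internal Research Grant; Project No.: 15214, Activity Code: R3733, Reference Number: RG 92/2015-2016)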
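Abstract: Building on existing research into "frequency level" (频级) and "frequency rank" (频序), this paper applies lexical quantitative analysis to two different types of corpora to examine the distributional characteristics of the frequency rank difference (序差) of words. The study finds that, for the set of words shared by two texts, rank differences are distributed symmetrically and concentrated around the median, with some outlying values. On a rank-difference plot this appears as a "two-tailed distribution": flat in the middle and curving at both tails. Based on this distribution, the words shared by the two texts can be divided into three levels: the "middle", the "lower tail", and the "upper tail". Words in the middle reflect what the two texts have in common, while words in the lower and upper tails reflect how they differ. These features are linguistically meaningful because they reflect the thematic content and stylistic features of the texts.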

Keywords: frequency rank difference; two-tailed distribution; thematic content; stylistic features
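To make the central quantity concrete, here is a minimal Python sketch of how a frequency rank difference could be computed for the words two texts share. It assumes pre-tokenized input (Chinese text would first need word segmentation), ranks words by descending frequency with ties broken alphabetically, and takes the difference as rank in text A minus rank in text B. The abstract does not state the paper's conventions, so these choices and the function names `frequency_ranks` and `rank_differences` are hypothetical.

```python
from collections import Counter


def frequency_ranks(tokens: list[str]) -> dict[str, int]:
    """Map each word to its 1-based frequency rank (1 = most frequent).

    Assumption (not from the paper): ties are broken alphabetically so the
    ranking is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def rank_differences(tokens_a: list[str], tokens_b: list[str]) -> dict[str, int]:
    """Rank difference for every word common to both texts.

    Assumption (not from the paper): the sign convention is rank in A minus rank in B.
    """
    ranks_a = frequency_ranks(tokens_a)
    ranks_b = frequency_ranks(tokens_b)
    common = ranks_a.keys() & ranks_b.keys()  # shared vocabulary only
    return {w: ranks_a[w] - ranks_b[w] for w in sorted(common)}


if __name__ == "__main__":
    text_a = "the cat sat on the mat the cat slept".split()
    text_b = "the dog sat on the rug the dog slept on".split()
    print(rank_differences(text_a, text_b))
```

With these toy inputs the call returns `{'on': 1, 'sat': 0, 'slept': 0, 'the': 0}`: "on" sits one place further down the frequency list in the first text, and the other shared words hold the same rank in both.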

Lexical Frequency Rank Difference Distributions Between Texts
LIU Rui, SUN Bize, LONG Yunfei, WANG Shan. Lexical Frequency Rank Difference Distributions Between Texts [J]. Journal of Chinese Information Processing, 2017, 31(5): 8-13.
Authors: LIU Rui, SUN Bize, LONG Yunfei, WANG Shan
Affiliations: 1. Department of Chinese Language and Literature, Xiamen University, Xiamen, Fujian 361005, China;
2. Department of Chinese Language and Literature, Nanjing University, Nanjing, Jiangsu 210023, China;
3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China;
4. Department of Chinese Language Studies, The Education University of Hong Kong, Hong Kong, China
Abstract: Building on previous studies of word frequency and frequency rank, this paper analyzes the frequency rank difference (FRD) of words from the perspective of lexical quantitative analysis. It shows that, for the words common to two texts, FRDs are distributed symmetrically and cluster around the median. This pattern takes the form of a "two-tailed distribution" that is flat in the middle and curves at both ends. Based on the FRD distribution, the common words fall into three lexical levels: the middle, the lower tail, and the upper tail. Words in the middle reflect characteristics the two texts share, while words in the two tails reflect the distinctive features of each text. These features are linguistically significant because they reflect the thematic content and stylistic features of the texts.
Keywords: frequency rank difference; two-tailed distribution; thematic content; stylistic features of the texts
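The three-level split can be sketched as follows, as an illustration rather than the authors' procedure. The abstract only says that outlying rank differences form the two tails; this sketch swaps in Tukey's fences (quartiles ± 1.5 × IQR, computed with Python's `statistics.quantiles`) as the cut-off, and the function name `split_by_rank_difference` and its `k` parameter are hypothetical. Under a sign convention of rank in text A minus rank in text B, the lower tail collects words ranked much higher in text A and the upper tail words ranked much higher in text B.

```python
import statistics


def split_by_rank_difference(
    frd: dict[str, float], k: float = 1.5
) -> dict[str, list[str]]:
    """Split shared words into 'lower', 'middle' and 'upper' groups by FRD.

    Assumption (not from the paper): tails are delimited with Tukey's fences,
    Q1 - k*IQR and Q3 + k*IQR.
    """
    values = list(frd.values())
    if len(values) < 2:
        # Too few values to estimate quartiles: treat everything as middle.
        return {"lower": [], "middle": sorted(frd), "upper": []}

    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    groups: dict[str, list[str]] = {"lower": [], "middle": [], "upper": []}
    # Visit words in ascending FRD order so each group is sorted by FRD.
    for word, diff in sorted(frd.items(), key=lambda item: item[1]):
        if diff < low_fence:
            groups["lower"].append(word)
        elif diff > high_fence:
            groups["upper"].append(word)
        else:
            groups["middle"].append(word)
    return groups


if __name__ == "__main__":
    example = {"a": -40, "b": -2, "c": -1, "d": 0, "e": 0, "f": 1, "g": 2, "h": 35}
    print(split_by_rank_difference(example))
```

For the example values the fences fall at -5.0 and 5.0, so "a" lands in the lower tail, "h" in the upper tail, and the remaining words in the middle. A fixed fence multiplier is only one way to flag outliers; a corpus study might instead choose cut-offs by inspecting where the rank-difference plot starts to curve.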