1.

Student papers across the curriculum: Designing and developing a corpus of British student writing

Hilary Gerard Lisa 《Computers and Composition》2004,21(4):439-450

This paper reports on a collaborative project, currently being carried out by the Centre for English Language Teacher Education and the Warwick Writing Programme at the University of Warwick, England, to compile a multimillion-word corpus of student writing. Since May 2001, we have collected samples of proficient written coursework produced by students at all levels and in a range of disciplines. We believe this student writing collection will eventually provide an invaluable database for use by researchers and writing teachers, enabling them to identify and describe, in a systematic way, the characteristics of assigned work across disciplines and levels of study. Our corpus is confined to shorter assignments assessed within departments, the most common form of student writing, but unpublished and therefore generally unavailable to researchers. This paper describes the project, and explains the rationale for developing the corpus. It also considers the corpus’ potential role as a resource for research and teaching within and across subject disciplines.

2.

Building and Using a Comprehensive Language Knowledge Base Total citations: 15, self-citations: 4, citations by others: 15

俞士汶 段慧明 朱学锋 张化瑞 《中文信息学报》2004,18(5):2-11

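The scale and quality of its language knowledge base determine whether a natural language processing system succeeds or fails. After 18 years of work, the Institute of Computational Linguistics at Peking University has accumulated a series of sizeable, high-quality language data resources: a grammatical information dictionary of contemporary Chinese, a large-scale corpus with basic annotation, a semantic dictionary of contemporary Chinese, a Chinese concept dictionary, bilingual corpora aligned at several unit levels, terminology banks for a number of specialized domains, a phrase-structure rule base for contemporary Chinese, a corpus of classical Chinese poetry, and others. This project will integrate these resources into a single comprehensive language knowledge base. Integrating different language data resources requires bridging the "gaps" between them. Besides a unified, user-friendly interface and convenient application programming interfaces, the planned knowledge base will provide software tools that support knowledge mining, helping existing resources evolve from primary products into deeply processed ones. It will also offer multiple mechanisms for knowledge dissemination and information services, so that the knowledge base can give comprehensive, multi-level support to language information processing research, theoretical linguistics research and language teaching.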

3.

A Survey of Research on Software Architecture for Language Engineering

冯冲 陈肇雄 黄河燕 《中文信息学报》2004,18(6):54-60,72

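Software architecture for language engineering has gradually become one of the main research areas of language engineering. It targets general-purpose natural language applications and offers them architecture-level reference solutions. Its research covers the computational resources, language resources, methods and applications that relate to architecture. In a sense, it can be regarded as a domain-specific software architecture (DSSA) for the language engineering domain. This paper outlines the field's development and research significance, then describes and analyzes its basic concepts and the main current research progress, and looks ahead to further trends.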

4.

Research on Automatic Summarization of Internet News Text Total citations: 6, self-citations: 0, citations by others: 6

官礼和《计算机工程与设计》2007,28(14):3518-3520,F0003

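Presents the basic approach and steps for automatic summarization of Chinese Internet news text, and discusses sentence segmentation and word segmentation algorithms. Based on four formal features of news text, a new summarization scheme is proposed: words and sentences are first assigned different weights by combining the four features; sentences are then selected by weight according to a given ratio and smoothed, producing a fluent summary of reasonable quality. Experimental analysis indicates that the method works well.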
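
The weight-then-select procedure in this abstract can be sketched roughly as follows. The abstract does not name the four formal features, so the two used here (overlap with the headline and sentence position) are illustrative assumptions, as are the function names and numeric weights. The sketch also assumes text whose words are already separated by spaces, and it omits the smoothing step.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Break after sentence-final punctuation (Chinese or Western).
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # Assumes words are already space-separated; real Chinese text
    # would first need a word segmenter.
    return re.findall(r"\w+", sentence.lower())


def summarize(title, body, ratio=0.3):
    """Score every sentence, keep the top `ratio` share, restore order."""
    sentences = split_sentences(body)
    if not sentences:
        return []
    title_words = set(tokenize(title))
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(index):
        words = tokenize(sentences[index])
        if not words:
            return 0.0
        # Word weight (assumed): frequency, doubled for headline words.
        total = sum(freq[w] * (2.0 if w in title_words else 1.0) for w in words)
        # Position weight (assumed): bonus for the opening sentence.
        bonus = 1.5 if index == 0 else 1.0
        return bonus * total / len(words)

    keep = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    chosen = sorted(ranked[:keep])  # back to document order
    return [sentences[i] for i in chosen]
```

Calling `summarize(headline, article_text, ratio=0.3)` returns roughly the top 30% of sentences in their original order; the ratio plays the role of the "given ratio" mentioned in the abstract.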

5.

Cross-Fertilization Between Human Computer Interaction and Natural Language Processing: Why and How

Nadine Ozkan Cécile Paris 《International Journal of Speech Technology》2002,5(2):135-146

Many of the central notions and ultimate goals of both human-computer interaction (HCI) and natural language processing (NLP) are common to both disciplines. Both are concerned with communication as a core concept, and both attempt to maximize the naturalness of this communication for the end-user. A central challenge to both disciplines is the issue of the choice and adaptation of the appropriate form of communication for the specific user and context at hand. Despite these strong commonalities, we observe very little collaboration, cross-references or even mutual knowledge between the HCI and NLP communities. And, surprisingly enough, although their goals might be very similar, the methods and the evaluation frameworks used in both research and applicative work in both areas are distinct. We think that it is time to step back and re-assess the potential for collaboration between the two disciplines. In this paper, we argue that importing ideas and methods from each discipline into the other can be fruitful, and we review specific areas where this is the case. We argue that cross-fertilization between HCI and NLP is desirable in wider and in more fundamental ways than only for the design of natural language interfaces. The reflection presented in this paper is motivated by our own work over the last four years in a team comprising specialists in both HCI and natural language generation (NLG), a subfield of NLP specifically concerned with the automatic generation of language.

6.

Annotating Expressions of Opinions and Emotions in Language Total citations: 3, self-citations: 0, citations by others: 3

Janyce Wiebe Theresa Wilson Claire Cardie 《Language Resources and Evaluation》2005,39(2-3):165-210

This paper describes a corpus annotation project to study issues in the manual annotation of opinions, emotions, sentiments, speculations, evaluations and other private states in language. The resulting corpus annotation scheme is described, as well as examples of its use. In addition, the manual annotation process and the results of an inter-annotator agreement study on a 10,000-sentence corpus of articles drawn from the world press are presented.
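
Inter-annotator agreement of the kind reported here is often summarized with a chance-corrected statistic. The sketch below computes Cohen's kappa for two annotators. The abstract does not say which measure the authors used, so kappa is an assumed choice, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently
    # according to their own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which makes the figure easier to compare across label sets than raw percentage agreement.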

7.

Using SGML as a Basis for Data-Intensive Natural Language Processing

D. McKelvie C. Brew H.S. Thompson 《Language Resources and Evaluation》1997,31(5):367-388

This paper describes the LT NSL system (McKelvie et al., 1996), an architecture for writing corpus processing tools. This system is then compared with two other systems which address similar issues, the GATE system (Cunningham et al., 1995) and the IMS Corpus Workbench (Christ, 1994). In particular we address the advantages and disadvantages of an SGML approach compared with a non-SGML database approach. This revised version was published online in July 2006 with corrections to the Cover Date.

8.

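Morphological Analysis of Uyghur Nouns for Natural Language Information Processing Total citations: 2, self-citations: 3, citations by others: 2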

阿依克孜·卡德尔 开沙尔·卡德尔 吐尔根·依布拉音 《中文信息学报》2006,20(3):45-48,98

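Nouns are one of the basic word classes in human language. Uyghur is a language with very complex morphology, and nouns are among its morphologically complex word classes. Morphological analysis of nouns is therefore important both for grammatical research and for language information processing. This paper gives a formal description and analysis of the morphological changes of Uyghur nouns (grammatical categories such as number, person and case). It identifies the basic morphological parameters of Uyghur nouns, summarizes the rules by which these parameters combine and counts their types, and explores a suffix-stripping method for Uyghur nouns. This work offers effective methods and new ideas for processing Uyghur noun morphology.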
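
Suffix stripping of the kind mentioned in this abstract can be illustrated with a deliberately naive sketch. The suffix lists are a small illustrative subset in Latin transliteration, not the paper's parameter inventory; person (possessive) suffixes are left out; and the slot order stem + number + case, the function names and the minimum stem length are all assumptions made for the example.

```python
# Illustrative subset of suffixes in Latin transliteration (an assumption,
# not the paper's inventory). Assumed slot order: stem + number + case.
NUMBER_SUFFIXES = ("lar", "ler")
CASE_SUFFIXES = ("ning", "din", "tin", "ni", "da", "de", "ta", "te")


def strip_suffix(word, suffixes, min_stem=2):
    """Remove the longest matching suffix, keeping at least `min_stem` letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], suffix
    return word, None


def analyze_noun(word):
    """Peel case, then number, from the right; return (stem, [suffixes])."""
    found = []
    stem, case = strip_suffix(word, CASE_SUFFIXES)
    if case:
        found.append(case)
    stem, number = strip_suffix(stem, NUMBER_SUFFIXES)
    if number:
        found.append(number)
    # Suffixes were collected right to left; report them left to right.
    return stem, list(reversed(found))
```

For a form such as `kitablarning` ("of the books"), the analysis peels `ning` and then `lar`, leaving the stem `kitab`. A stripper this simple will over-strip stems that happen to end in a suffix string, which is one reason a real analyzer needs the combination rules the paper describes.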

9.

A Survey of Pre-trained Models for Natural Language Processing

余同瑞 金冉 韩晓臻 李家辉 郁婷 《计算机工程与应用》2020,56(23):12-22

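In recent years, deep learning has been widely applied in many fields, and pre-trained models based on deep learning have brought natural language processing into a new era. The goal of pre-training is to put the pre-trained model in a good initial state so that it achieves better performance on downstream tasks. The paper introduces pre-training techniques and their history, and surveys them in two groups by model characteristics: traditional models based on probability and statistics, and newer models based on deep learning. It briefly analyzes the characteristics and limitations of traditional pre-trained models, focuses on deep-learning-based pre-trained models, and compares and evaluates their performance on downstream tasks. It then reviews instructive new pre-trained models, outlining their improvement mechanisms and the performance gains they achieve on downstream tasks. Finally, it summarizes the problems that pre-trained models currently face and looks ahead to future trends.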

10.

Applications of Graph Neural Networks in Natural Language Processing

陈雨龙 付乾坤 张岳 《中文信息学报》2021,35(3):1-23

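In recent years, neural networks have gradually replaced traditional machine learning as the basic models for natural language processing tasks, owing to their strong representational power. However, classical neural network models can only handle data in Euclidean space, whereas in natural language processing, discourse structure, syntax and even sentences themselves exist as graph data. Graph neural networks have therefore attracted wide attention in the research community and have been applied successfully in many areas of natural language processing. On the applications of graph neural networks in natural language processing, this paper ...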

11.

An Adversarial Pre-trained Language Modeling Method Based on Reinforcement Learning

颜俊琦 孙水发 吴义熔 裴伟 董方敏 《中文信息学报》2022,36(4):20-28

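Pre-trained language models such as BERT and XLNet, trained on large-scale unsupervised corpora, are usually trained with a language modeling task based on a cross-entropy loss. They are evaluated, however, by perplexity or by performance on other downstream natural language processing tasks, so the loss function and the evaluation metrics are mismatched. To address this, the paper proposes RL-XLNet (Reinforcement Learning-XLNet), an adversarial pre-trained language model that incorporates reinforcement learning. RL-XLNet uses adversarial training to train a generator that predicts selected words from context, and a discriminator that judges whether the generator's predictions are correct. The mutual reinforcement of generator and discriminator strengthens the generator's grasp of semantics and improves the model's learning ability. Because text generation involves a sampling step, the final loss cannot be backpropagated directly, so the generator is trained with reinforcement learning. In experiments on the General Language Understanding Evaluation benchmark (GLUE) and the Stanford question answering task (SQuAD 1.1), RL-XLNet shows a clear advantage over existing BERT and XLNet methods on several tasks: it ranks first on six GLUE tasks, second on one and third on one, and ranks first in F₁ on SQuAD 1.1. Given limited computing resources, the model trained on a small corpus also reaches state-of-the-art performance in the field.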
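
The abstract notes that sampling a token blocks direct backpropagation, which is why the generator is trained with reinforcement learning. A minimal sketch of that idea follows, using the REINFORCE score-function estimator on a toy softmax policy. The abstract does not name the algorithm, so REINFORCE, the reward of 1 for a correct token, and every name below are assumptions; `discriminator_reward` is a stand-in that simply checks the target rather than a trained network.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_reward(token, target):
    # Stand-in for a trained discriminator: 1.0 if the token is right.
    return 1.0 if token == target else 0.0


def reinforce_step(logits, target, rng, lr=0.5):
    """One policy-gradient update of the toy generator's logits."""
    probs = softmax(logits)
    # Sampling is the non-differentiable step.
    token = rng.choices(range(len(logits)), weights=probs)[0]
    reward = discriminator_reward(token, target)
    # d log p(token) / d logit_k = 1[k == token] - p_k
    return [
        logit + lr * reward * ((1.0 if k == token else 0.0) - probs[k])
        for k, logit in enumerate(logits)
    ]


def train(vocab_size=5, target=3, steps=500, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * vocab_size
    for _ in range(steps):
        logits = reinforce_step(logits, target, rng)
    return softmax(logits)
```

Running `train()` returns the final token distribution, with probability mass shifted toward the rewarded token. The gradient never passes through the sampling operation itself: the update only needs the log-probability of the sampled token and a scalar reward.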

12.

An Action Representation Formalism to Interpret Natural Language Instructions

Barbara Di Eugenio 《Computational Intelligence》1998,14(1):89-133

13.

Automatic Extraction of Lexicalized Tree Adjoining Grammar

许云 樊孝忠 张锋 《计算机应用》2005,25(1):4-6

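Proposes an algorithm for automatically extracting a lexicalized tree adjoining grammar (LTAG) from the Penn Chinese Treebank. The main idea is to induce three types of lexicalized trees from the lexicalized treebank, then use the approach of head-driven phrase structure grammar to extract well-formed lexicalized trees from the corpus automatically, and finally filter out invalid lexicalized trees with linguistic rules. Compared with building an LTAG by hand, the algorithm needs less human effort and avoids the omissions and inconsistencies that manual construction can cause.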

14.

Chinese Natural Language Processing for Automatic Generation of Ancient Architecture Animation

孙凯《网络安全技术与应用》2011,(9):52-55

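This paper proposes a system model for natural language processing in the domain of ancient architecture. It is used in a system that automatically generates animations of ancient buildings, where it computes domain-specific semantic results from simple Chinese descriptions. The model has three parts: preprocessing, general semantic computation, and semantic computation specific to ancient architecture. Word segmentation and syntactic parsing are done by calling Stanford University's Chinese word segmentation and parsing programs; general semantic computation is implemented in Prolog; and the final, domain-specific semantic computation works out the building components together with their assembly order, dimensions and positions.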

15.

Word Embedding: Continuous Space Representation for Natural Language

陈恩红 丘思语 许畅 田飞 刘铁岩 《数据采集与处理》2014,29(1):19-29

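Word embedding is a technique that uses machine learning to map each word, which lies in a high-dimensional discrete space (whose dimension equals the number of words in the vocabulary), to a real-valued vector in a low-dimensional continuous space. In many text processing tasks, word embeddings provide better semantic-level word feature representations, which greatly benefits those tasks. Meanwhile, the massive unlabeled text of the big data era, together with advances in machine learning led by deep learning, has made efficient word embedding techniques possible. This paper gives the definition and practical significance of word embedding and surveys several typical methods, including neural-network-based methods, methods based on restricted Boltzmann machines, and methods based on factorizing the word-context co-occurrence matrix. It describes the mathematical definition, physical meaning and training method of each model in detail and compares them.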
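
The mapping this abstract defines, from a one-hot vector of vocabulary size to a short real-valued vector, can be sketched as a table lookup. Everything below is illustrative: the function names are hypothetical, and the matrix is filled with random values purely as a placeholder for parameters that one of the surveyed methods would learn.

```python
import random


def build_vocab(tokens):
    """Give each distinct word an index; the one-hot space has len(vocab) dims."""
    return {word: i for i, word in enumerate(sorted(set(tokens)))}


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def init_embeddings(vocab_size, dim, seed=0):
    """Random |V| x dim matrix; in practice these values are learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(word, vocab, matrix):
    """Embedding lookup: select the matrix row for this word."""
    return matrix[vocab[word]]


def embed_by_product(word, vocab, matrix):
    """The same result written as one-hot vector times embedding matrix."""
    x = one_hot(vocab[word], len(vocab))
    dim = len(matrix[0])
    return [sum(x[i] * matrix[i][j] for i in range(len(vocab))) for j in range(dim)]
```

The two functions return the same vector, which shows why an embedding layer is usually implemented as an index lookup rather than an explicit matrix product: the one-hot input selects exactly one row.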

16.

Research on Usage Rules of Modal Particles across Multiple Corpora

周溢辉 昝红英 穆玲玲 《计算机工程与应用》2011,47(28):135-138

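Automatic recognition of modal particle usage is the core problem for a knowledge base of modern Chinese modal particles. Using a rule-based method, the paper studies the recognition of modal particle usages across several corpora. Starting from the actual usages found in those corpora, it revises and improves the modal particle usage dictionary and the usage rule base. Experimental data show that after these revisions, recognition accuracy improved in each corpus, making the knowledge base more widely applicable.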

17.

A Survey of Language Processing Methods in Visual Question Answering

王瑞平 吴士泓 张美航 王小平 《计算机工程与应用》2022,58(17):50-60

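Language processing methods have a major impact on the performance of visual question answering (VQA) models. These methods originate in natural language processing but, as they developed, fell out of step with the state of the art in that field, which hampers question understanding and answer generation in VQA. Subjectively, the cause is that researchers underestimate the importance of language processing methods; objectively, it is the scarcity of relevant literature. To address this, the paper analyzes the value of language processing for VQA, investigates the language processing methods used in VQA and the latest research results, and categorizes these methods, giving researchers a basis for appreciating the importance of language processing. It also discusses how natural language processing techniques can advance language processing in VQA and looks ahead to future directions.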

18.

Manar Ahmed Hamza Hala J. Alshahrani Khaled Tarmissi Ayman Yafoz Amal S. Mehanna Ishfaq Yaseen Amgad Atta Abdelmageed Mohamed I. Eldesouki 《计算机系统科学与工程》2023,46(3):3303-3319

The term ‘corpus’ refers to a huge volume of structured datasets containing machine-readable texts. Such texts are generated in a natural communicative setting. The explosion of social media permitted individuals to spread data with minimal examination and filters freely. Due to this, the old problem of fake news has resurfaced. It has become an important concern due to its negative impact on the community. To manage the spread of fake news, automatic recognition approaches have been investigated earlier using Artificial Intelligence (AI) and Machine Learning (ML) techniques. To perform the medicinal text classification tasks, the ML approaches were applied, and they performed quite effectively. Still, a huge effort is required from the human side to generate the labelled training data. The recent progress of the Deep Learning (DL) methods seems to be a promising solution to tackle difficult types of Natural Language Processing (NLP) tasks, especially fake news detection. To unlock social media data, an automatic text classifier is highly helpful in the domain of NLP. The current research article focuses on the design of the Optimal Quad Channel Hybrid Long Short-Term Memory-based Fake News Classification (QCLSTM-FNC) approach. The presented QCLSTM-FNC approach aims to identify and differentiate fake news from actual news. To attain this, the proposed QCLSTM-FNC approach follows two methods such as the pre-processing data method and the Glove-based word embedding process. Besides, the QCLSTM model is utilized for classification. To boost the classification results of the QCLSTM model, a Quasi-Oppositional Sandpiper Optimization (QOSPO) algorithm is utilized to fine-tune the hyperparameters. The proposed QCLSTM-FNC approach was experimentally validated against a benchmark dataset. The QCLSTM-FNC approach successfully outperformed all other existing DL models under different measures.

19.

Research on a Multi-level Web Page Filter Based on Natural Language Processing

康海燕 任俊玲 陈昕 王鹤沩 《信息安全与技术》2011,(10):66-69

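To address the shortcomings of existing web page filtering systems and the new challenges of real-time filtering of network information, a new-generation multi-level intelligent web page filtering solution is proposed. It mainly uses Mimefilter technology combined with a multi-level filtering method to filter web pages. A classification algorithm learns from known training samples, extracts feature vectors and builds a binary classifier. The classifier is then applied to new web pages, and the filtering results are presented to users, who can rate them and give feedback, ...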

20.

Examining effectiveness of communities of practice in online English for academic purposes (EAP) assessment in virtual classes

《Computers & Education》2014

The literature on English for academic purposes (EAP) methodology highlights the significance of learners' engagement in learning language (Hyland, 2006) in mainstream general and online contexts. Blogs have been recommended in many studies as having the potential to bring the sense of community and collaboration in online classes. Therefore, this study sought to investigate whether blogs in large classes would help students enhance their perceptions of learning. To this end, forty-two undergraduate students of Information Technology (IT) at an Iranian university participated in a weblog writing course in order to promote collaboration and reflective learning. Instrumentation included a questionnaire of perceived learning and sense of community, semi-structured interviews, and participant observations. The findings revealed a significant difference in perceived learning between the students with a low sense of community and those with a high sense of community. Based on the qualitative findings of the study, we suggest an assessment framework incorporating constructivist and social-interactionist theories of learning in order to treat students as members of a community of learning. The findings may promise implications for gearing EAP assessment to more collaborative modes in online courses and suggest a model framework for the assessment of students in EAP online classes.