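Design of a Parallel Corpus Processing System Based on WordSmith Software
Cite this article: LI Ning. Design of a parallel corpus processing system based on WordSmith software [J]. Automation & Instrumentation, 2021(2): 131-134.
Author: LI Ning
Affiliation: Xi'an Vocational and Technical College
Funding: Shaanxi Provincial Department of Education 2020 Special Scientific Research Program project "English Translation Strategies for Culture-Specific Words in Confucian Classics under the Belt and Road Initiative: A Relevance Theory Perspective" (No. 20JK0366).
Abstract: In parallel corpus processing, traditional systems struggle to count the co-occurrence features between the tag of the current character and the other characters in the sequence, which leads to frequent segmentation errors. To address this, a parallel corpus processing system based on WordSmith software is designed. On the hardware side, an S3C6410 processor performs text analysis and generates annotation files for subsequent processing. On the software side, WordSmith is used to extract the word list from the corpus and trim its tail; a 6-tag character labeling set is used to segment the corpus into words; once segmentation is complete, the corpus is aligned according to the computed word similarity. This completes the system design. Experimental results show that the designed system produced no segmentation anomalies in the word segmentation experiment, and that in the multi-category word disambiguation experiment it reached a recall of 95.6 and a K value of 97.2, both higher than the traditional processing system.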

Keywords: WordSmith  parallel corpus  processing  system design  alignment
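
The abstract mentions a 6-tag labeling set for word segmentation but does not list the tags. The sketch below illustrates the general idea under the assumption that the set is the commonly used B, B2, B3, M, E, S scheme (first, second, third, middle, and last character of a word, plus single-character word). The function names `tag_word`, `encode`, and `decode` are hypothetical and are not taken from the paper.

```python
# Minimal sketch of character-position tagging with an ASSUMED 6-tag set.
# The paper's abstract does not specify the tags; B/B2/B3/M/E/S is an assumption.

def tag_word(word):
    """Return one position tag per character of a single word."""
    n = len(word)
    if n == 1:
        return ["S"]                      # single-character word
    head = ["B", "B2", "B3"][: n - 1]     # up to three leading-position tags
    middle = ["M"] * (n - 1 - len(head))  # remaining interior characters
    return head + middle + ["E"]          # last character always gets E


def encode(words):
    """Turn a segmented sentence (list of words) into (chars, tags)."""
    chars, tags = [], []
    for word in words:
        chars.extend(word)
        tags.extend(tag_word(word))
    return chars, tags


def decode(chars, tags):
    """Rebuild the word list from characters and their position tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:  # a new word starts; flush any leftover
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):              # the word ends on this character
            words.append(current)
            current = ""
    if current:                            # tolerate a truncated final word
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = ["平行", "语料库", "的", "加工处理系统"]
    chars, tags = encode(sentence)
    print(list(zip(chars, tags)))
    print(decode(chars, tags))
```

In a real segmenter the tags would be predicted by a sequence model from character co-occurrence features; here they are derived from an already segmented sentence purely to show the labeling and its inverse.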

Design of parallel corpus processing system based on WordSmith software
LI Ning. Design of parallel corpus processing system based on WordSmith software [J]. Automation & Instrumentation, 2021(2): 131-134.
Authors: LI Ning
Affiliation: (Xi'an Vocational and Technical College, Xi'an, Shaanxi 710077, China)
Abstract: When processing a parallel corpus, traditional systems have difficulty counting the co-occurrence features between the tag of the current character and the other characters in the sequence, which results in frequent segmentation errors. Therefore, a parallel corpus processing system based on WordSmith software is designed. In the hardware design, the S3C6410 processor implements the text analysis function and generates the annotation file used in subsequent processing. In the software design, WordSmith is used to extract the word list from the corpus and trim its tail, and a 6-tag character labeling set is used to perform word segmentation on the corpus. After segmentation, the corpus is aligned according to the computed word similarity. This completes the system design. The experimental results show that the designed system has no segmentation anomalies in the word segmentation experiment, and that in the multi-category word disambiguation experiment the recall is 95.6 and the K value is 97.2, both higher than those of the traditional processing system.
Keywords: WordSmith  parallel corpus  processing  system design  alignment
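
The abstract states that alignment is driven by a computed word similarity but gives no formula. As a rough illustration only, the sketch below assumes a small bilingual lexicon and a Dice coefficient over the translated word sets, and pairs each source sentence with its best-scoring target sentence. The lexicon, the threshold, and the names `dice`, `translate`, and `align` are all hypothetical; the paper's actual similarity measure may differ.

```python
# Minimal sketch of similarity-based sentence alignment.
# ASSUMPTIONS: Dice coefficient as the similarity measure, a toy bilingual
# lexicon, and greedy best-match pairing. None of these come from the paper.

def dice(a, b):
    """Dice coefficient between two collections of words (as sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def translate(words, lexicon):
    """Map source words to target words, dropping words not in the lexicon."""
    return [lexicon[w] for w in words if w in lexicon]


def align(src_sents, tgt_sents, lexicon, threshold=0.3):
    """Pair each source sentence with its most similar target sentence.

    Returns a list of (source_index, target_index, score) for pairs whose
    score reaches the threshold.
    """
    pairs = []
    for i, src in enumerate(src_sents):
        mapped = translate(src, lexicon)
        scores = [dice(mapped, tgt) for tgt in tgt_sents]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    lexicon = {"平行": "parallel", "语料库": "corpus",
               "系统": "system", "设计": "design"}
    src = [["平行", "语料库"], ["系统", "设计"]]
    tgt = [["system", "design"], ["parallel", "corpus"]]
    for i, j, score in align(src, tgt, lexicon):
        print(i, j, round(score, 2))
```

Greedy best-match pairing is the simplest choice and can map two source sentences to the same target; a production aligner would typically add an order or one-to-one constraint.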
This article is indexed in VIP (维普) and other databases.