
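Administrative division extracting algorithm for non-normalized Chinese addresses
Cite this article: LI Xiaolin, HUANG Shuang, LU Tao, LI Lin. Administrative division extracting algorithm for non-normalized Chinese addresses [J]. Journal of Computer Applications, 2017, 37(3): 876-882.
Authors: LI Xiaolin  HUANG Shuang  LU Tao  LI Lin
Affiliations: 1. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205; 2. Hubei Provincial Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205; 3. School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079
Funding: Special Scientific Research Project for the Public Welfare Industry of Surveying, Mapping and Geoinformation (201412014); National 863 Program project (2013AA12A202); Natural Science Foundation of Hubei Province (2013CFA125); Seventh Graduate Innovation Fund of Wuhan Institute of Technology (CX2015053).
Abstract: Because Chinese addresses on the Internet are expressed in non-normalized ways, the address information they carry is difficult to use directly in location-based services. To address this problem, an algorithm for extracting administrative divisions from non-normalized Chinese addresses is proposed. First, the raw data is preprocessed by grouping on the "road" feature word. Next, an administrative division dictionary and a moving-window maximum matching algorithm are used to extract the set of all possible administrative divisions from each Chinese address. Then, exploiting the hierarchical relationship among the administrative division elements of a Chinese address, conditional set operation rules for administrative divisions are established and applied to the extracted data sets. A parsing rule for administrative division sets, based on the matching degree of administrative divisions, is then established to compute the credibility of each administrative division. Finally, the administrative division with the highest credibility and the most complete information is obtained for the address. The usability of the algorithm was verified on about 250,000 Chinese addresses extracted from the Internet, with and without "road" feature word grouping and with and without the credibility calculation; in a comparison with current address matching techniques, the algorithm reached an accuracy of 93.51%.

Keywords: set operation  administrative division  Chinese address  moving window  matching degree  parsing rule
Received: 2016-08-26
Revised: 2016-10-18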

Administrative division extracting algorithm for non-normalized Chinese addresses
LI Xiaolin, HUANG Shuang, LU Tao, LI Lin. Administrative division extracting algorithm for non-normalized Chinese addresses [J]. Journal of Computer Applications, 2017, 37(3): 876-882.
Authors:LI Xiaolin  HUANG Shuang  LU Tao  LI Lin
Affiliation:1. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan Hubei 430205, China;2. Hubei Provincial Key Laboratory of Intelligent Robot(Wuhan Institute of Technology), Wuhan Hubei 430205, China;3. School of Resource and Environmental Sciences, Wuhan University, Wuhan Hubei 430079, China
Abstract: Chinese addresses on the Internet are often non-normalized and therefore cannot be used directly in location-based services. To solve this problem, an algorithm to extract administrative divisions from non-normalized Chinese addresses was proposed. Firstly, the original data was preprocessed by "road" feature word grouping; then, using an administrative division dictionary and a moving window maximum matching algorithm, all possible administrative division data sets were extracted from each Chinese address. Next, based on the hierarchical relationship between the administrative division elements of a Chinese address, conditional set operation rules for administrative divisions were established and applied to the acquired data sets. Using the matching degree of administrative divisions, a parsing rule for administrative division sets was established to calculate the credibility of each administrative division. Finally, the administrative division with the maximum credibility and the most complete information was obtained for the Chinese address. The availability of the algorithm was verified on about 250000 Chinese addresses extracted from the Internet, with and without "road" feature word grouping and with and without the credibility calculation. In a comparison with current address matching technology, the accuracy rate reached 93.51%.
Keywords:set operation  administrative division  Chinese address  moving window  matching degree  analytical rule  
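
The abstract names the steps but gives no formulas, so the following is only a minimal Python sketch of two of them: dictionary-based moving window maximum matching, followed by a hierarchy consistency check. Everything in it is an assumption made for illustration and is not taken from the paper: the toy dictionaries `DIVISIONS` and `ALIASES`, the function names, the window size, and in particular the score (the number of extracted candidates lying on one ancestor chain), which stands in for the paper's matching degree and credibility. The "road" feature word grouping step is not modeled.

```python
# Hypothetical toy gazetteer: canonical name -> parent name (None for a province).
DIVISIONS = {
    "湖北省": None,
    "武汉市": "湖北省",
    "洪山区": "武汉市",
    "吉林省": None,
}

# Hypothetical alias table: surface form found in an address -> canonical name.
ALIASES = {
    "湖北省": "湖北省", "湖北": "湖北省",
    "武汉市": "武汉市", "武汉": "武汉市",
    "洪山区": "洪山区", "洪山": "洪山区",
    "吉林省": "吉林省", "吉林": "吉林省",
}


def extract_candidates(address, max_len=4):
    """Slide a window over the address, always trying the longest match first."""
    found = []
    i = 0
    while i < len(address):
        for size in range(min(max_len, len(address) - i), 1, -1):
            word = address[i:i + size]
            if word in ALIASES:
                found.append(ALIASES[word])
                i += size  # skip past the matched word
                break
        else:
            i += 1  # nothing matched at this position, move the window by one
    return found


def ancestor_chain(name):
    """Return the chain from province down to the given division."""
    chain = [name]
    while DIVISIONS[chain[-1]] is not None:
        chain.append(DIVISIONS[chain[-1]])
    return chain[::-1]


def resolve(address):
    """Pick the chain supported by the most candidates (ties: the longer chain)."""
    candidates = set(extract_candidates(address))
    best, best_score = [], 0
    for name in candidates:
        chain = ancestor_chain(name)
        score = len(candidates.intersection(chain))  # assumed stand-in score
        if (score, len(chain)) > (best_score, len(best)):
            best, best_score = chain, score
    return best


if __name__ == "__main__":
    print(extract_candidates("湖北武汉洪山区雄楚大道693号"))
    print(resolve("武汉洪山区吉林路12号"))
```

In the second call the address omits the province and contains a road named after another province. Under the assumed scoring, the candidate "吉林省" is supported by no other extracted division and is discarded, while the missing "湖北省" is filled in from the hierarchy.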