Similar Literature
20 similar documents found (search time: 15 ms)
1.
2.
The Journal of Supercomputing - Record linkage is a technique widely used to gather data stored in disparate data sources that presumably pertain to the same real-world entity. This integration can...

3.
4.
5.
Record linkage is used to link records from two different files that correspond to the same individuals; such algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual: the degree of success when linking the original (unprotected) data with the protected data gives an estimate of the disclosure risk. In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form, and the supervised learning method is formalized as an optimization problem whose target is to find the values of the aggregation parameters that maximize the number of re-identifications (correct links). We evaluate and compare our proposal with non-parameterized variants of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most widely used approaches for this purpose). We also compare it with previously proposed parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. These comparisons show that the proposed aggregation operator is able to outperform, or at least match, the other parameterized operators. Finally, we study the conditions on the optimization problem under which the described aggregation functions are metrics.
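The abstract does not reproduce the operator or the learning method in detail; as a rough illustration of the underlying idea, the sketch below scores candidate links with a symmetric bilinear form d(x, y) = (x - y)^T M (x - y) and counts re-identifications against protected data, with the matrix M standing in for the learned aggregation parameters (the data and function names are purely illustrative, not taken from the paper).

```python
import numpy as np

def bilinear_distance(x, y, M):
    """Symmetric bilinear form d(x, y) = (x - y)^T M (x - y)."""
    d = x - y
    return d @ M @ d

def reidentifications(original, protected, M):
    """Count records whose nearest protected record (under M) is their true counterpart."""
    hits = 0
    for i, x in enumerate(original):
        dists = [bilinear_distance(x, y, M) for y in protected]
        if int(np.argmin(dists)) == i:   # correct link re-identified
            hits += 1
    return hits

# Toy data: 'protected' is a noisy copy of 'original' (illustrative only).
rng = np.random.default_rng(0)
original = rng.normal(size=(50, 3))
protected = original + rng.normal(scale=0.3, size=original.shape)

# The Euclidean distance corresponds to M = I; a learned M would instead be
# chosen to maximize the number of re-identifications.
print("Euclidean re-identifications:", reidentifications(original, protected, np.eye(3)))
```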

6.
7.
In data privacy, record linkage can be used as an estimator of the disclosure risk of protected data. To model the worst-case scenario, one normally attempts to link records from the original data to the protected data. In this paper we introduce a parametrization of record linkage in terms of a weighted mean and its weights, and provide a supervised learning method to determine the optimal weights for the linkage process, that is, the parameters yielding maximal record linkage between the protected and original data. We compare our method to standard record linkage on data from several protection methods widely used in statistical disclosure control, and analyse the results in terms of both linkage performance and computational effort.
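The paper's supervised learning method is not spelled out in the abstract; the following sketch only illustrates the weighted-mean parametrization itself, using a crude search over a few hand-picked weight vectors as a stand-in for the actual learning step (all names and data are hypothetical).

```python
import numpy as np

def weighted_mean_links(original, protected, w):
    """Link each original record to the protected record with the smallest
    weighted mean of per-attribute absolute differences."""
    w = np.asarray(w, dtype=float) / np.sum(w)
    links = []
    for x in original:
        scores = np.abs(protected - x) @ w   # weighted mean distance per candidate
        links.append(int(np.argmin(scores)))
    return links

def correct_links(original, protected, w):
    return sum(1 for i, j in enumerate(weighted_mean_links(original, protected, w)) if i == j)

rng = np.random.default_rng(1)
original = rng.normal(size=(40, 4))
# The last two attributes are protected with much more noise (illustrative).
protected = original + rng.normal(scale=[0.1, 0.1, 1.0, 1.0], size=original.shape)

# Crude stand-in for the supervised learning step: pick the weight vector
# (from a few candidates) that maximizes the number of correct links.
candidates = [np.ones(4), np.array([2, 2, 1, 1]), np.array([4, 4, 1, 1])]
best = max(candidates, key=lambda w: correct_links(original, protected, w))
print("best weights:", best, "correct links:", correct_links(original, protected, best))
```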

8.
The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage poses various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because personal identifying data, such as names, addresses and dates of birth of individuals, are commonly used in the linkage process. When databases are linked across organizations, how to protect the privacy and confidentiality of such sensitive information is crucial to the successful application of record linkage.

9.
A fast online identification method for MIMO systems based on LSSVM
Online identification of time-varying nonlinear multiple-input multiple-output (MIMO) systems is difficult; to address this, a fast online identification method based on the least squares support vector machine (LSSVM) is proposed. Existing incremental and online LSSVM learning algorithms are reviewed, and several acceleration strategies are introduced to obtain a fast online LSSVM learning algorithm. The MIMO system is decomposed into several multiple-input single-output (MISO) subsystems, and each MISO subsystem is modelled online by one LSSVM running the fast online learning algorithm. Numerical simulations show that the method builds models quickly and that the resulting models have high prediction accuracy.
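The fast online algorithm and its acceleration strategies are not given in the abstract; as a minimal illustration, the sketch below fits an ordinary batch LSSVM regressor to a toy MISO subsystem by solving the standard LSSVM dual linear system with an RBF kernel (function names and data are illustrative only).

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between row-vector sets A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Batch LSSVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # bias b, dual coefficients alpha

def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy MISO subsystem: one output driven by two inputs (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=100)
b, alpha = lssvm_fit(X, y)
print("training RMSE:", np.sqrt(np.mean((lssvm_predict(X, alpha, b, X) - y) ** 2)))
```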

10.
Online/offline signatures are a form of digital signature that uses precomputation to speed up the online signing phase. This paper constructs an online/offline signature scheme with message recovery: since the message itself does not need to be transmitted, the scheme saves considerable transmission bandwidth. The new scheme is proven secure in the random oracle model.

11.
The results discussed in this paper concern a large database consisting of consumer profile information together with behavioral (transaction) patterns. We introduce the concept of profile association rules, which relate consumer buying behavior to profile information, and discuss the problem of online mining of profile association rules in this large database. We show how multidimensional indexing structures can be used to actually perform the mining; using them for profile mining provides considerable advantages in the ability to answer very generic range-based online queries.

12.
A digital signature is an important type of authentication in a public-key (asymmetric) cryptographic system, and it is widely used in many digital government applications. We note, however, that the performance of an Internet server computing digital signatures online is limited by the high cost of modular arithmetic. One simple way to improve the performance of the server is to reduce the number of computed digital signatures by combining a set of documents into a batch in a smart way and signing each batch only once. This approach reduces the demand on the CPU but requires more network bandwidth for sending extra information to clients. In this paper, we investigate the performance of different online digital signature batching schemes. That is, we provide a framework for studying and analyzing the performance of a variety of such schemes. The results show that substantial computational benefits can be obtained from batching without significant increases in the amount of additional information that needs to be sent to the clients. Furthermore, we explore the potential benefits of more sophisticated batching schemes. The proposed analytical framework uses a semi-Markov model of a batch-based digital signature server; emulation and simulation results show the accuracy and effectiveness of the framework.
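The specific batching schemes analysed in the paper are not described in the abstract; the sketch below only illustrates the general batching idea (hash each document, sign the concatenation of the hashes once, and ship the hash list to clients), using an HMAC as a runnable stand-in for the expensive public-key signature. All names are illustrative, not the paper's schemes.

```python
import hashlib, hmac

KEY = b"demo-key"   # stand-in: a real server would hold a private signing key

def sign(data: bytes) -> bytes:
    """Stand-in for an expensive public-key signature (HMAC keeps the sketch runnable)."""
    return hmac.new(KEY, data, hashlib.sha256).digest()

def sign_batch(documents):
    """Hash every document, sign the concatenation of the hashes once,
    and return (per-document hashes, single batch signature)."""
    hashes = [hashlib.sha256(d).digest() for d in documents]
    batch_sig = sign(b"".join(hashes))
    return hashes, batch_sig

def verify_document(doc, index, hashes, batch_sig):
    """A client checks its own document hash, then the batch signature over all hashes."""
    ok_hash = hashlib.sha256(doc).digest() == hashes[index]
    ok_sig = hmac.compare_digest(sign(b"".join(hashes)), batch_sig)
    return ok_hash and ok_sig

docs = [b"doc-A", b"doc-B", b"doc-C"]
hashes, sig = sign_batch(docs)                      # one signature for the whole batch
print(verify_document(b"doc-B", 1, hashes, sig))    # True: the extra data sent is just the hash list
```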

13.
Application of exact ODDS for partial agreements of names in record linkage
Automated methods for linking records pertaining to the same individuals have in the past made only crude use of name information. A perceptive filing clerk is more sophisticated, because humans retain a lifetime memory of instances in which variant forms of names were used interchangeably, and of synonyms that sometimes do not even resemble one another. This limitation of the machine can be rectified, but the body of knowledge required to serve as its memory must be large. The needed data have now been brought together on a suitable scale, from many past searches of Canada's Mortality Data Base. Described here are the development and use of the resulting tables of essentially exact discriminating powers (or ODDS) for comparisons of male given names. The aim is to reduce the proportion of ambiguously linked pairs of records requiring labor-intensive clerical resolution. The tables are intended for general use in this country, and as a model for similar facilities appropriate to other populations.
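The exact ODDS tables are not reproduced in the abstract; the sketch below shows the standard way such discriminating powers are computed, as the base-2 log of the ratio between an outcome's frequency among truly linked pairs and its frequency among unlinked pairs, using made-up frequencies for a given-name comparison.

```python
import math

# Hypothetical outcome frequencies for a given-name comparison:
#   m = probability of the outcome among truly linked pairs
#   u = probability of the outcome among unlinked (random) pairs
outcomes = {
    "exact agreement":          (0.85, 0.010),
    "variant form (Bill/Will)": (0.10, 0.005),
    "disagreement":             (0.05, 0.985),
}

for name, (m, u) in outcomes.items():
    weight = math.log2(m / u)          # discriminating power (odds) in bits
    print(f"{name:28s} weight = {weight:+.2f}")

# A candidate pair's total score is the sum of such weights over all compared
# fields; ambiguous pairs (scores near zero) are what the tables aim to reduce.
```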

14.
Record linkage aims at finding the matching records from two or more different databases. Many approximate string matching methods for privacy-preserving record linkage have been presented. In this paper, we study the problem of secure record linkage between two data files in the two-party computation setting. We consider two records linked when all the Hamming distances between their attributes are smaller than some fixed thresholds. We first construct two efficient secure protocols for computing the powers of an encrypted value and for performing a zero test on an encrypted value, and then, building on these two protocols, present an efficient constant-round protocol for Hamming distance based record linkage in the presence of malicious adversaries. We also discuss an extension of our protocol to the Levenshtein distance based record linkage problem.
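The secure protocols themselves (powers of an encrypted value, zero test) are not reproduced here; the sketch below shows only the plaintext matching rule the protocol ultimately evaluates, linking two records when every attribute-wise Hamming distance is below its threshold (records and thresholds are made up).

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length attribute encodings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def linked(record1, record2, thresholds):
    """Link two records iff every attribute's Hamming distance is below its threshold.
    (This is the plaintext rule; the paper evaluates it under encryption.)"""
    return all(hamming(a, b) < t for a, b, t in zip(record1, record2, thresholds))

# Toy records: (name code, postcode) as fixed-length strings; thresholds are illustrative.
r1 = ("JOHNSON ", "08001")
r2 = ("JONSSON ", "08001")
print(linked(r1, r2, thresholds=(3, 1)))   # True: 2 mismatches in the name, 0 in the postcode
```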

15.
Record linkage is a process of identifying records that refer to the same real-world entity. Many existing approaches to record linkage apply supervised machine learning techniques to generate a classification model that classifies a pair of records as either a match or a non-match. The main requirement of such an approach is a labelled training dataset. In many real-world applications no labelled dataset is available, so manual labelling is required to create a sufficiently large training dataset for a supervised machine learning algorithm. Semi-supervised machine learning techniques, such as self-learning or active learning, which require only a small manually labelled training dataset, have been applied to record linkage. These techniques reduce the manual labelling requirement but have yet to achieve a level of accuracy similar to that of supervised learning techniques. In this paper we propose a new approach to unsupervised record linkage based on a combination of ensemble learning and enhanced automatic self-learning. In the proposed approach, an ensemble of automatic self-learning models is generated with different similarity measure schemes. To further improve the automatic self-learning process we incorporate field weighting into the automatic seed selection for each of the self-learning models. We propose an unsupervised diversity measure to ensure high diversity among the selected self-learning models. Finally, we propose to use the contribution ratios of self-learning models to remove those with poor accuracy from the ensemble. We have evaluated our approach on 4 publicly available datasets which are commonly used in the record linkage community. Our experimental results show that the proposed approach has advantages over state-of-the-art semi-supervised and unsupervised record linkage techniques, and on 3 out of the 4 datasets it achieves results comparable to those of supervised approaches.
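The ensemble construction, diversity measure and contribution ratios are too involved for a short example; the sketch below illustrates just the automatic seed selection step with field weighting, taking the highest weighted-similarity pairs as match seeds and the lowest as non-match seeds (the weights and data are hypothetical, not from the paper).

```python
import numpy as np

def select_seeds(similarity_vectors, field_weights, n_seeds=10):
    """Automatic seed selection: weight the per-field similarities, then take the
    top-scoring pairs as 'match' seeds and the bottom-scoring as 'non-match' seeds."""
    w = np.asarray(field_weights, dtype=float) / np.sum(field_weights)
    scores = similarity_vectors @ w
    order = np.argsort(scores)
    return order[-n_seeds:], order[:n_seeds]   # match seeds, non-match seeds

# Toy similarity vectors (one row per candidate record pair, one column per field).
rng = np.random.default_rng(2)
sims = np.vstack([rng.uniform(0.7, 1.0, size=(20, 3)),    # likely matches
                  rng.uniform(0.0, 0.4, size=(80, 3))])   # likely non-matches
match_seeds, nonmatch_seeds = select_seeds(sims, field_weights=[0.5, 0.3, 0.2], n_seeds=5)
print("match seed rows:", match_seeds)

# The seeds would then train an initial classifier that relabels the remaining
# pairs; repeating this step is the self-learning loop the ensemble is built from.
```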

16.
In this paper, nonlinear model predictive control (NMPC) schemes based on fast update methods (fast NMPC schemes) are reviewed. These schemes strive to provide a fast but typically suboptimal update of the control variables at each sampling instant with negligible computational delay. The review focuses on schemes that employ one of two subclasses of fast update methods developed for direct solution approaches: suboptimal update methods and sensitivity-based update methods. The connections and similarities of the fast update methods, the elements of fast NMPC, the control architecture, and the fast NMPC schemes as a whole are highlighted to support the assessment of the benefits and limitations of each individual scheme. In this way, the review facilitates the choice of a suitable fast NMPC scheme among the many fast NMPC schemes available in the literature.

17.
Drawing upon the 'stimulus-organism-response' framework, this paper examines how the ease of navigation (EON) of a website affects consumers' online impulse buying. We also compared the effects of various virtual layouts (grid, freeform, racetrack, and mixed grid-freeform) on consumers' perceptions of EON, emotional responses, and the urge to buy impulsively. Based on questionnaire responses from 216 participants in a stratified survey, we found that EON significantly influences consumers' emotional responses (pleasantness and arousal), which subsequently affect their urge to buy impulsively. We also found that, compared with the other three layouts, the freeform layout is the easiest to navigate and elicits the highest level of pleasantness and the strongest urge to buy impulsively. The findings of this study provide important implications for impulse buying research and practice.

18.
The development of a generalized iterative record linkage system for use in the follow-up of cohorts in epidemiologic studies is described. The availability of this system makes such large-scale studies feasible and economical. The methodology for linking records is described, as well as the different modules of the computer system developed to apply the methodology. Two applications of record linkage using the generalized system are discussed, together with some considerations regarding strategies for conducting linkages efficiently.

19.
To improve the convergence rate of distributed online optimization algorithms, a fast first-order distributed online dual averaging (FODD) algorithm is proposed in which edges are successively added to the underlying network topology. First, for the distributed online optimization problem, an edge-addition method is applied so that the selected edges mix rapidly with the network model; a mathematical model is then established and the FODD algorithm is designed to solve it. Second, the relationship between the network topology and the convergence rate of online distributed dual averaging is revealed: the regret bound is improved by increasing the algebraic connectivity of the underlying network topology, the online distributed dual averaging (ODDA) algorithm is extended from static to time-varying network topologies, and the convergence of FODD is proven, with the convergence rate given analytically. Finally, numerical simulations show that the proposed FODD algorithm converges faster than the ODDA algorithm.
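The FODD update itself is not given in the abstract; as a small illustration of the quantity it exploits, the sketch below computes the algebraic connectivity (the second-smallest eigenvalue of the graph Laplacian) of a ring topology and shows how adding one edge increases it, which is what improves the regret bound (the example graph is made up).

```python
import numpy as np

def algebraic_connectivity(adjacency):
    """Second-smallest eigenvalue of the graph Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]

# Ring of 6 nodes: a typical sparse topology for distributed optimization.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

print("ring:        ", algebraic_connectivity(A))
A[0, 3] = A[3, 0] = 1          # add one 'chord' edge across the ring
print("ring + chord:", algebraic_connectivity(A))   # larger value => faster mixing, better regret bound
```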

20.
This paper presents the integration of the files of the Brazilian Cervical Cancer Information System (SISCOLO) in order to identify all women in the system. SISCOLO takes the exam as the unit of observation, and women are not uniquely identified. It has two main tables, histology and cytology, containing the histological and cytological examinations of women, respectively. In this study, data from June 2006 to December 2009 were used. Each table was linked with itself and with the other through record linkage methods. The integration identified 6236 women in the histology table and 1,678,993 in the cytology table; 5324 women from the histology table had records in the cytology table. The sensitivities were above 90% and the specificities and precisions near 100%. This study showed that it is possible to integrate SISCOLO to produce indicators for the evaluation of the cervical cancer screening programme taking the woman as the unit of observation.
