Similar Documents
20 similar documents found (search time: 15 ms)
1.
The rough-set theory proposed by Pawlak has been widely used in dealing with data classification problems. The original rough-set model is, however, quite sensitive to noisy data. Ziarko thus proposed the variable precision rough-set model to deal with noisy data and uncertain information. This model allows for some degree of uncertainty and misclassification in the mining process. Conventionally, mining algorithms based on the rough-set theory identify the relationships among data using crisp attribute values; however, data with quantitative values are commonly seen in real-world applications. This paper thus deals with the problem of producing a set of fuzzy certain and fuzzy possible rules from quantitative data with a predefined tolerance degree of uncertainty and misclassification. A new method, which combines the variable precision rough-set model and fuzzy set theory, is proposed to solve this problem. It first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions and then calculates the fuzzy β-lower and fuzzy β-upper approximations. The certain and possible rules are then generated based on these fuzzy approximations. These rules can then be used to classify unknown objects. The paper thus extends existing rough-set mining approaches to process quantitative data with tolerance of noise and uncertainty.
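As a hedged illustration of the fuzzy β-approximation step described in this abstract (a minimal sketch of one plausible reading, not the authors' actual algorithm), the Python fragment below assumes objects have already been fuzzified into membership degrees for two hypothetical linguistic terms and computes a fuzzy inclusion degree per term before thresholding with β:

```python
import numpy as np

# Hypothetical toy data: fuzzy membership of each object in two linguistic
# terms of a quantitative attribute (rows = objects, columns = terms).
term_membership = np.array([
    [0.9, 0.1],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.0, 1.0],
])
# Crisp indicator of membership in the decision class X (1 = belongs to X).
in_class = np.array([1, 1, 0, 1])

beta = 0.75  # tolerance threshold, in the spirit of the variable precision model

def fuzzy_inclusion(term_mu, class_indicator):
    """Degree to which the fuzzy set of a linguistic term is included in class X."""
    overlap = np.minimum(term_mu, class_indicator).sum()
    return overlap / term_mu.sum()

for t in range(term_membership.shape[1]):
    d = fuzzy_inclusion(term_membership[:, t], in_class)
    in_lower = d >= beta          # term could support a "certain" rule
    in_upper = d > 1.0 - beta     # term could support a "possible" rule
    print(f"term {t}: inclusion={d:.2f}, beta-lower={in_lower}, beta-upper={in_upper}")
```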

2.
Most existing personalized privacy-preserving algorithms protect sensitive attributes in one of two ways: by setting a different threshold for each sensitive attribute, or by generalizing the sensitive attribute, replacing the original sensitive value with a coarser, less precise one. Data anonymized by either approach still risks leaking sensitive information or suffers heavy information loss, which limits data utility. To address this, a personalized (p,α,k)-anonymity privacy-preserving algorithm is proposed: according to the sensitivity level of the sensitive attribute, different anonymization methods are applied to the sensitive values of each level within an equivalence class, thereby achieving personalized privacy protection for sensitive attributes. Experiments show that, compared with other personalized privacy-preserving algorithms, the proposed algorithm has a similar time cost and lower information loss.

3.
Identifying sensitive information is key to keeping the Internet environment clean. In today's era of information explosion, people obtain massive amounts of information from the Internet every day, and filtering out the sensitive information it contains matters greatly for social stability and harmony. Existing approaches mainly filter content with sensitive-keyword lists, which must be continually updated and generalize poorly. Deep learning based on a pretrained model can learn deeper semantic information in Internet news text and thus identify and filter sensitive information more effectively, with better generalization, but using deep learning alone loses some of the sensitive-keyword features. This paper is the first to combine the traditional sensitive-keyword approach with deep learning for Internet sensitive-information identification, proposing Mer-HiBert, a model that fuses sensitive-keyword features. Experimental results show that the model further improves performance over both the earlier keyword-based methods and deep learning models.

4.

The continuous k-nearest neighbor query is one of the most important query types for sharing multimedia data and for continuously identifying mobile users in location-based services (LBS). Various methods have been proposed to efficiently process the continuous k-NN query. However, most of the existing methods suffer from high computation time and large memory requirements because they unnecessarily access cells while finding the nearest cells on a grid index. Furthermore, most methods do not consider the movement of a query. In this paper, we propose a new processing scheme for the continuous k-nearest neighbor query that efficiently supports multimedia data sharing and transmission in LBS. The proposed method uses the patterns of the distance relationships among the cells of a grid index. The basic idea is to normalize the distance relationships as certain patterns. Using this approach, the proposed scheme significantly improves the overall performance of query processing. Various experiments show that our proposed method outperforms the existing methods in terms of query processing time and storage overhead.
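The abstract's distance-pattern scheme is not spelled out here, so the sketch below only illustrates the baseline it builds on: k-NN search over a grid index that expands cell rings around the query and prunes rings by a lower bound on their distance to the query. The cell width, point layout, and function names are illustrative assumptions:

```python
import heapq, math
from collections import defaultdict

CELL = 10.0  # hypothetical grid cell width

def build_grid(points):
    """Hash 2-D points (tuples) into grid cells keyed by integer cell coordinates."""
    grid = defaultdict(list)
    for p in points:
        grid[(int(p[0] // CELL), int(p[1] // CELL))].append(p)
    return grid

def grid_knn(grid, q, k):
    """Visit cells ring by ring around the query cell, pruning rings whose
    minimum possible distance already exceeds the current k-th best distance."""
    qc = (int(q[0] // CELL), int(q[1] // CELL))
    max_ring = max(max(abs(cx - qc[0]), abs(cy - qc[1])) for cx, cy in grid)
    best = []  # max-heap of (-distance, point)
    for ring in range(max_ring + 1):
        ring_min = max(0.0, (ring - 1) * CELL)  # lower bound for this ring
        if len(best) == k and ring_min > -best[0][0]:
            break  # no closer point can exist in this or any outer ring
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                if max(abs(dx), abs(dy)) != ring:
                    continue  # only cells on the ring's border
                for p in grid.get((qc[0] + dx, qc[1] + dy), ()):
                    d = math.dist(p, q)
                    if len(best) < k:
                        heapq.heappush(best, (-d, p))
                    elif d < -best[0][0]:
                        heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best)

# toy usage
pts = [(3, 4), (12, 1), (25, 30), (8, 9), (40, 2)]
print(grid_knn(build_grid(pts), q=(5, 5), k=2))
```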


5.
Locality preserving embedding for face and handwriting digital recognition
Most supervised manifold learning-based methods preserve the original neighbor relationships to pursue discriminating power; thus, structure information of the data distributions might be neglected or destroyed in the low-dimensional space. In this paper, a novel supervised method, called locality preserving embedding (LPE), is proposed for feature extraction and dimensionality reduction. LPE gives a low-dimensional embedding for discriminative multi-class sub-manifolds and preserves the principal structure information of the local sub-manifolds. In the LPE framework, supervised and unsupervised ideas are combined to learn the optimal discriminant projections. On the one hand, class information is taken into account to characterize the compactness of local sub-manifolds and the separability of different sub-manifolds. On the other hand, all the samples in the local neighborhood are used to characterize the original data distributions and preserve their structure in the low-dimensional subspace. The most significant difference from existing methods is that LPE takes the distribution directions of local neighbor data into account and preserves them in the low-dimensional subspace, instead of only preserving each local sub-manifold's original neighbor relationships. Therefore, LPE optimally preserves both the local sub-manifolds' original neighborhood relationships and the distribution directions of local neighbor data, separating different sub-manifolds as far as possible. The criterion, similar to the classical Fisher criterion, is a Rayleigh quotient in form, and the optimal linear projections are obtained by solving a generalized eigenvalue problem. Furthermore, the framework can be directly used in semi-supervised learning, and semi-supervised LPE and semi-supervised kernel LPE are given. The proposed LPE is applied to face recognition (on the ORL and Yale face databases) and handwritten digit recognition (on the USPS database). The experimental results show that LPE consistently outperforms classical linear methods, e.g., principal component analysis and linear discriminant analysis, and recent manifold learning-based methods, e.g., marginal Fisher analysis and constrained maximum variance mapping.
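The projections for a Rayleigh-quotient criterion of this kind are typically obtained from a generalized eigenvalue problem; the sketch below shows only that generic step, with placeholder scatter matrices rather than LPE's actual ones:

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh_projections(S_b, S_w, n_components, reg=1e-6):
    """Maximize a Rayleigh quotient tr(W^T S_b W) / tr(W^T S_w W) via a
    generalized eigenproblem. S_b and S_w are symmetric matrices standing in
    for between-/within-neighborhood scatter (placeholders, not LPE's)."""
    d = S_w.shape[0]
    # a small ridge keeps S_w positive definite, as is common in practice
    w, V = eigh(S_b, S_w + reg * np.eye(d))
    order = np.argsort(w)[::-1]          # largest generalized eigenvalues first
    return V[:, order[:n_components]]    # columns are projection directions

# toy usage with random symmetric matrices standing in for the real scatters
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); S_b = A @ A.T
B = rng.normal(size=(5, 5)); S_w = B @ B.T
W = rayleigh_projections(S_b, S_w, n_components=2)
print(W.shape)  # (5, 2)
```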

6.
吴琼  潘欣  于超 《计算机仿真》2010,27(4):248-251
Land-use information is essential data for land planning and management and has significant economic value. To obtain accurate land-use data, automatic classification of remote-sensing imagery by computer simulation is an effective means, and rough-set theory offers some advantages in handling the uncertainty, inconsistency, and attribute selection involved in remote-sensing imagery. However, existing rough-set methods are overly sensitive to the phenomena of the same object showing different spectra and different objects showing the same spectrum, and in particular their classification rules tend to overfit during simulation, which limits the classification ability of rough sets. To address this, a new rough-set-based classification method for remote-sensing imagery is proposed, with an improved rule-matching mechanism. Simulation results show that the improved method handles the overfitting of rough sets to remote-sensing imagery well and improves classification accuracy.

7.
8.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are commonly seen in real-world applications. This paper thus proposes a new data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the Apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were made to verify the performance of the proposed algorithm.
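A minimal sketch of the fuzzification and fuzzy-support counting that such fuzzy Apriori approaches rely on; the membership functions, attribute, and threshold are illustrative assumptions, not the paper's:

```python
# Map a quantitative value to linguistic terms with triangular membership
# functions, then compute the fuzzy support of each term as the sum of
# memberships over all transactions (frequent fuzzy 1-itemsets).
def triangular(x, a, b, c):
    """Triangular membership function peaking at b with support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical linguistic terms for a 'score' attribute
TERMS = {"low": (0, 30, 60), "middle": (40, 70, 90), "high": (70, 100, 130)}

def fuzzify(value):
    return {t: triangular(value, *p) for t, p in TERMS.items()}

transactions = [55, 68, 92, 80, 47]  # one quantitative value per transaction

support = {t: 0.0 for t in TERMS}
for v in transactions:
    for term, mu in fuzzify(v).items():
        support[term] += mu

min_support = 1.0  # illustrative threshold on the fuzzy count
frequent_1_itemsets = {t: s for t, s in support.items() if s >= min_support}
print(frequent_1_itemsets)
```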

9.
Spatio-temporal predicates
This paper investigates temporal changes of topological relationships and thereby integrates two important research areas: first, 2D topological relationships, which have been investigated quite intensively, and second, the change of spatial information over time. We investigate spatio-temporal predicates, which describe developments of well-known spatial topological relationships. A framework is developed in which spatio-temporal predicates can be obtained by temporal aggregation of elementary spatial predicates and sequential composition. We compare our framework with two other possible approaches: one is based on the observation that spatio-temporal objects correspond to 3D spatial objects, for which existing topological predicates can be exploited; the other is to consider possible transitions between spatial configurations. These considerations help to identify a canonical set of spatio-temporal predicates.
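A hedged sketch of the temporal-aggregation idea: collapsing a time-indexed sequence of elementary topological predicates into a development (sequential composition). The predicate names and the "Enter" development are illustrative, not the paper's canonical set:

```python
from itertools import groupby

def development(pred_sequence):
    """Collapse consecutive repetitions of the same spatial predicate into a
    sequential composition describing how the relationship develops over time."""
    return [p for p, _ in groupby(pred_sequence)]

# e.g. one object gradually entering another, sampled at discrete instants
samples = ["Disjoint", "Disjoint", "Meet", "Overlap", "Overlap",
           "CoveredBy", "Inside", "Inside"]
print(" > ".join(development(samples)))
# Disjoint > Meet > Overlap > CoveredBy > Inside

def satisfies(pred_sequence, pattern):
    """Check whether an observed development matches a spatio-temporal
    predicate given as a sequence of elementary predicates."""
    return development(pred_sequence) == pattern

ENTER = ["Disjoint", "Meet", "Overlap", "CoveredBy", "Inside"]  # hypothetical
print(satisfies(samples, ENTER))  # True
```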

10.
Nowadays, thanks to the rapid evolvement of information technology, an explosively large amount of information with very high-dimensional features for customers is being accumulated in companies. These companies, in turn, are exerting every effort to develop more efficient churn prediction models for managing customer relationships effectively. In this paper, a novel method is proposed to deal with a high-dimensional large data set for constructing better churn prediction models. The proposed method starts by partitioning a data set into small-sized data subsets, and applies sequential manifold learning to reduce high-dimensional features and give consistent results for combined data subsets. The performance of the constructed churn prediction model using the proposed method is tested using an E-commerce data set by comparing it with other existing methods. The proposed method works better and is much faster for high-dimensional large data sets without the need for retraining the original data set to reduce the dimensions of new test samples.

11.
Applications in the water treatment domain generally rely on complex sensors located at remote sites. The processing of the corresponding measurements for generating higher-level information, such as optimization of coagulation dosing, must therefore account for possible sensor failures and imperfect input data. In this paper, self-organizing map (SOM)-based methods are applied to multiparameter data validation and missing data reconstruction in drinking water treatment. The SOM is a special kind of artificial neural network used for the analysis and visualization of large high-dimensional data sets. It performs a nonlinear mapping from a high-dimensional data space to a low-dimensional space that aims to preserve the most important topological and metric relationships of the original data elements, and thus it inherently clusters the data. Combining the SOM results with those obtained by a fuzzy technique that uses the marginal adequacy concept to identify functional states (normal or abnormal), the validation and reconstruction performance of the SOM is tested successfully on experimental data stemming from a coagulation process involved in drinking water treatment.
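A minimal, self-contained sketch of the general SOM-based reconstruction idea (train a map, then fill a sample's missing entries from its best-matching unit found on the observed dimensions only); the map size, training schedule, and synthetic data are assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(6, 6), iters=2000, lr0=0.5, sigma0=3.0):
    """Train a small SOM on complete data with exponentially decaying
    learning rate and neighborhood radius."""
    n_units, dim = grid[0] * grid[1], data.shape[1]
    weights = rng.normal(size=(n_units, dim))
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        lr = lr0 * np.exp(-t / iters)
        sigma = sigma0 * np.exp(-t / iters)
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def reconstruct(weights, sample):
    """Fill NaNs in `sample` from the codebook vector of its best-matching
    unit, where the BMU is found using only the observed dimensions."""
    obs = ~np.isnan(sample)
    bmu = np.argmin(((weights[:, obs] - sample[obs]) ** 2).sum(axis=1))
    filled = sample.copy()
    filled[~obs] = weights[bmu, ~obs]
    return filled

# toy usage on synthetic sensor-like data
data = rng.normal(size=(200, 4))
W = train_som(data)
x = data[0].copy(); x[2] = np.nan
print(reconstruct(W, x))
```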

12.
A quantitative analysis method for inference risk based on rough set theory
This paper studies a rough-set-based method for assessing the inference risk to sensitive data in secure databases. Rough set theory is used to discover the latent rules that exist within a relation, and the inference risk faced by sensitive data is assessed from these inference rules. If the inference risk exceeds a threshold, the security level of some or all attributes of the tuples containing the sensitive data is raised, so that the sensitive data in the database remain inference-secure.
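One plausible, heavily simplified reading of the risk-assessment step: measure the confidence of rules from non-sensitive to sensitive attribute values and flag antecedents whose confidence exceeds a threshold. The rule form, toy relation, and threshold are my assumptions, not the paper's:

```python
from collections import Counter, defaultdict

# Toy relation: rows of (non_sensitive_value, sensitive_value).
rows = [("clerk", "low"), ("clerk", "low"), ("clerk", "low"),
        ("manager", "high"), ("manager", "high"), ("manager", "low")]

THRESHOLD = 0.8  # illustrative inference-risk threshold

def inference_risk(rows):
    """Confidence of the rule non_sensitive -> sensitive for each antecedent;
    a high confidence means the sensitive value can be inferred."""
    ante = Counter(ns for ns, _ in rows)
    joint = Counter(rows)
    risk = defaultdict(float)
    for (ns, s), c in joint.items():
        risk[ns] = max(risk[ns], c / ante[ns])
    return dict(risk)

for ns, r in inference_risk(rows).items():
    action = "raise security level" if r > THRESHOLD else "ok"
    print(f"{ns}: risk={r:.2f} -> {action}")
```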

13.
Computational information granules are usually built with probabilistic methods in a discriminative learning manner. When the nature of the classification task is to recognize patterns of specific categories, as in emotion detection, multiple emotions can be recognized from the same person at the same time, which usually indicates that different emotions may involve specific relationships rather than being mutually exclusive. This paper uses pattern time series to classify dense real-world data instances. Experiments on life-science UCI data sets compare the proposed method with commonly used probabilistic methods; the results show that the method can not only serve as an alternative to probabilistic methods but also capture more patterns than probabilistic methods can.

14.
CLARANS: a method for clustering objects for spatial data mining
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper has three main contributions. First, it proposes a new clustering method called CLARANS, whose aim is to identify spatial structures that may be present in the data. Experimental results indicate that, when compared with existing clustering methods, CLARANS is very efficient and effective. Second, the paper investigates how CLARANS can handle not only point objects, but also polygon objects efficiently. One of the methods considered, called the IR-approximation, is very efficient in clustering convex and nonconvex polygon objects. Third, building on top of CLARANS, the paper develops two spatial data mining algorithms that aim to discover relationships between spatial and nonspatial attributes. Both algorithms can discover knowledge that is difficult to find with existing spatial data mining algorithms.
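A simplified sketch of a CLARANS-style randomized medoid search (random single-medoid swaps, restarting the failure counter on every improvement); the parameter names and toy data are illustrative, and this is not the paper's exact implementation:

```python
import random

def total_cost(data, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(x, data[m]) for m in medoids) for x in data)

def clarans(data, k, numlocal=2, maxneighbor=50, dist=None, seed=0):
    """Repeat `numlocal` randomized descents; in each, try random single-medoid
    swaps and accept any swap that lowers the total clustering cost, giving up
    after `maxneighbor` consecutive failed attempts."""
    rng = random.Random(seed)
    dist = dist or (lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5)
    best, best_cost = None, float("inf")
    for _ in range(numlocal):
        current = rng.sample(range(len(data)), k)
        cost = total_cost(data, current, dist)
        fails = 0
        while fails < maxneighbor:
            i = rng.randrange(k)             # medoid position to replace
            cand = rng.randrange(len(data))  # random candidate point
            if cand in current:
                continue
            neighbor = current[:i] + [cand] + current[i + 1:]
            c = total_cost(data, neighbor, dist)
            if c < cost:
                current, cost, fails = neighbor, c, 0  # accept improvement
            else:
                fails += 1
        if cost < best_cost:
            best, best_cost = current, cost
    return [data[m] for m in best], best_cost

# toy usage on two obvious spatial groups
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(clarans(pts, k=2))
```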

15.
A corrected imputation method for incomplete data under constructive covering
Handling incomplete data is an important problem in data mining, machine learning, and related fields, and missing-value imputation is the mainstream way to deal with it. Most existing imputation methods apply techniques from statistics and machine learning to analyze the remaining information in the original data and derive reasonably plausible values to replace the missing parts. Imputation methods can be broadly divided into single imputation and multiple imputation, each with its own advantages in different scenarios. However, few methods go further and use neighborhood information from the sample-space distribution to correct the imputed values. In view of this, this paper proposes a framework, applicable to many existing imputation methods, for improving their imputation results; it consists of three parts: pre-imputation, mining of spatial neighborhood information, and corrected imputation. Experiments with seven imputation methods on eight UCI data sets verify the effectiveness and robustness of the proposed framework.
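A minimal sketch of the pre-fill-then-correct idea under stated assumptions: mean pre-imputation followed by re-estimating each originally missing entry from the k nearest neighbors in the pre-filled space. The paper's constructive-covering neighborhood step is not reproduced here:

```python
import numpy as np

def prefill_mean(X):
    """Pre-imputation: replace NaNs with the column means."""
    filled = X.copy()
    col_mean = np.nanmean(X, axis=0)
    idx = np.where(np.isnan(X))
    filled[idx] = np.take(col_mean, idx[1])
    return filled

def neighborhood_correct(X, filled, k=5):
    """Correction: re-estimate each originally missing entry from the k
    nearest neighbors found in the pre-filled space."""
    corrected = filled.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        d = np.linalg.norm(filled - filled[i], axis=1)
        d[i] = np.inf                              # exclude the sample itself
        neighbors = np.argsort(d)[:k]
        corrected[i, j] = filled[neighbors, j].mean()
    return corrected

# toy usage on synthetic data with ~10% missing entries
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[rng.random(X.shape) < 0.1] = np.nan
X_hat = neighborhood_correct(X, prefill_mean(X), k=5)
print(np.isnan(X_hat).any())  # False
```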

16.
This paper presents a decision support tool that can be used by practitioners and industrialists to solve practical cell formation problems. The tool is based on a cell formation algorithm that employs a set of heuristic rules to obtain a quasi-optimal solution from both component routing information and other significant production data. The algorithm has been tested on a number of data sets obtained from the literature. The test results have demonstrated that in many cases the algorithm has produced an exceptional performance in terms of the grouping efficiency, grouping efficacy and quality index measures. The algorithm, to an extent, overcomes common problems in existing cell formation methods such as in dealing with ill-structured matrices and achieving rational cell sizes.

17.
A privacy-preserving method for multiple sensitive attributes in data publishing
Existing privacy-preserving data publishing techniques usually focus on data with a single sensitive attribute; applying them directly to data with multiple sensitive attributes leaks a large amount of private information. This paper is the first detailed study of publishing data with multiple sensitive attributes. Inheriting the idea of protecting private data through lossy joins, it proposes a multi-dimensional bucketization grouping technique for publishing multi-sensitive-attribute data, MSB (Multi-Sensitive Bucketization). To avoid a high-complexity exhaustive method, three different linear-time greedy algorithms are first proposed: maximal-bucket first (MBF), maximal-single-dimension-capacity first (MSDCF), and maximal-multi-dimension-capacity first (MMDCF). In addition, to account for differences in the importance of published data in real applications, a weighted multi-dimensional bucketization technique is proposed. Extensive experiments on real data sets show that the first three algorithms achieve an additional information loss of 0.04 with a suppression ratio below 0.06, and that the weighted technique publishes more than 70% of the information the data owner defines as important.

18.
Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between them, or a mix of both. Existing methods for relational clustering have strong and often implicit biases in this respect. In this paper, we introduce a novel dissimilarity measure for relational data. It is the first approach to incorporate a wide variety of types of similarity, including similarity of attributes, similarity of relational context, and proximity in a hypergraph. We experimentally evaluate the proposed dissimilarity measure on both clustering and classification tasks using data sets of very different types. Considering the quality of the obtained clustering, the experiments demonstrate that (a) using this dissimilarity in standard clustering methods consistently gives good results, whereas other measures work well only on data sets that match their bias; and (b) on most data sets, the novel dissimilarity outperforms even the best among the existing ones. On the classification tasks, the proposed method outperforms the competitors on the majority of data sets, often by a large margin. Moreover, we show that learning the appropriate bias in an unsupervised way is a very challenging task, and that the existing methods offer a marginal gain compared to the proposed similarity method, and can even hurt performance. Finally, we show that the asymptotic complexity of the proposed dissimilarity measure is similar to the existing state-of-the-art approaches. The results confirm that the proposed dissimilarity measure is indeed versatile enough to capture relevant information, regardless of whether that comes from the attributes of vertices, their proximity, or connectedness of vertices, even without parameter tuning.

19.
Most existing knowledge-discovery models for hybrid information systems cover symbolic and numeric condition attributes and symbolic decision attributes, and most of them focus on attribute reduction or feature selection; relatively little work addresses rule extraction. This paper builds a dynamic rule-extraction model for hybrid information systems covering more data types. First, the existing formula for the distance between attribute values is revised, and a definition is given for the distance between cross-level attribute values, yielding a new hybrid distance. Second, three methods are proposed for inducing decision classes from numeric decision attributes. A generalized neighborhood rough set model is then constructed, upper and lower approximations and a rule-extraction algorithm under dynamic granularity are proposed, and a dynamic rule-extraction model based on neighborhood granulation is built. The model can extract rules from information systems with the following characteristics: (1) the condition attributes may include single-level symbolic, cross-level symbolic, numeric, interval-valued, set-valued, and unknown attributes; (2) the decision attributes may be symbolic or numeric. Comparative experiments on data sets from the UCI repository show, in terms of classification accuracy, the effectiveness of the rule-extraction algorithm.
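The paper's revised distance formulas are not reproduced here; the sketch below is only a generic heterogeneous distance over mixed attribute types and a neighborhood granule built from it, with every per-type rule (overlap for symbolic, range-normalized difference for numeric, midpoint difference for intervals, Jaccard for set-valued, worst case for unknown) being an assumption:

```python
import math

def value_distance(a, b, kind, value_range=1.0):
    """Distance between two values of one attribute, by attribute type."""
    if a is None or b is None:          # unknown value: assume worst case
        return 1.0
    if kind == "symbolic":
        return 0.0 if a == b else 1.0
    if kind == "numeric":
        return abs(a - b) / value_range
    if kind == "interval":              # compare interval midpoints
        return abs(sum(a) / 2 - sum(b) / 2) / value_range
    if kind == "set":
        union = len(set(a) | set(b))
        return 1.0 - len(set(a) & set(b)) / union if union else 0.0
    raise ValueError(kind)

def hybrid_distance(x, y, schema):
    """Euclidean combination of per-attribute distances.
    schema: list of (kind, value_range) per attribute."""
    return math.sqrt(sum(
        value_distance(a, b, kind, rng) ** 2
        for a, b, (kind, rng) in zip(x, y, schema)))

def neighborhood(sample, data, schema, delta=0.5):
    """Samples within distance delta form the neighborhood granule of `sample`."""
    return [z for z in data if hybrid_distance(sample, z, schema) <= delta]

# toy usage over one symbolic, one numeric, one interval, one set-valued attribute
schema = [("symbolic", 1.0), ("numeric", 50.0), ("interval", 50.0), ("set", 1.0)]
x = ["red", 23.0, (10, 20), {"a", "b"}]
y = ["blue", 40.0, (15, 30), {"b", "c"}]
print(hybrid_distance(x, y, schema))
```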

20.
To address the high computational cost, expensive equipment, and low robustness of current indoor gait-recognition methods, a highly robust indoor gait-recognition method based on channel state information (CSI), WiKown, is proposed. An energy indicator built with the fast Fourier transform monitors walking activity; the collected CSI gait data are filtered and denoised, features are extracted with a sliding window, and observation sequences are built from the resulting CSI gait information. Finally, after fitting with superimposed Gaussian distributions, a hidden Markov model is introduced to compute the observation-sequence probabilities and generate the gait parameter model. In real multi-person corridor, laboratory, and hall environments, WiKown achieves an average recognition rate of 92.71% for a single subject's gait. Experimental results show that, compared with decision tree, dynamic time warping, and long short-term memory methods, the proposed method effectively recognizes gait information and improves recognition accuracy and robustness.
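A hedged sketch of the per-person Gaussian-HMM scoring idea on CSI-like feature sequences; it assumes the third-party hmmlearn package and synthetic features, and is not the WiKown pipeline itself:

```python
import numpy as np
from hmmlearn import hmm   # third-party package, assumed available

rng = np.random.default_rng(0)

def make_walk(offset, n_frames=120, n_features=6):
    """Stand-in for a preprocessed CSI feature sequence of one walk."""
    return rng.normal(loc=offset, scale=1.0, size=(n_frames, n_features))

# train one Gaussian HMM per person on a few of their walks
models = {}
for person, offset in {"alice": 0.0, "bob": 2.0}.items():
    walks = [make_walk(offset) for _ in range(5)]
    X = np.vstack(walks)
    lengths = [len(w) for w in walks]
    m = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
    m.fit(X, lengths)
    models[person] = m

# identify an unseen walk by the model with the highest log-likelihood
test = make_walk(2.0)
scores = {p: m.score(test) for p, m in models.items()}
print(max(scores, key=scores.get))   # expected: "bob"
```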
