Similar literature
20 similar articles found.
1.
Similarity trees, such as Neighbor Joining trees, are an alternative to multidimensional projections for the visual analysis of data represented in multidimensional spaces. They organize data objects on the visual plane according to their levels of similarity and are highly capable of detecting and separating groups and subgroups of objects. Besides this similarity-based hierarchical organization, their advantages include reduced point clutter, high precision, and a consistent view of the data set during focusing, which offers an intuitive way to view the general structure of the data set and to drill down to groups and subgroups of interest. Disadvantages of similarity trees based on neighbor joining strategies include their computational cost and the presence of virtual nodes that take up too much of the visual space. This paper presents an improved version of the similarity tree technique, based on two procedures. The first is a strategy that replaces virtual nodes by promoting real leaf nodes to their place, saving large portions of display space while maintaining the expressiveness and precision of the technique. The second is an implementation that significantly accelerates the algorithm, making it usable for larger data sets. We also illustrate the applicability of the technique in visual data mining, showing its advantages in supporting visual classification of data sets, with special attention to image classification. We demonstrate the capabilities of the tree for analysis and iterative manipulation and employ those capabilities to support evolving toward a satisfactory data organization and classification.

2.
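To bridge the semantic gap between low-level image features and high-level semantics, a method for constructing color semantic features is proposed that establishes a new semantic mapping to improve image classification accuracy. Low-level color features are extracted, a semantic network containing color concepts is built, color semantic feature triples are established, and machine learning classification algorithms are then used to classify images. Experimental results show that classifying images with the semantic feature vectors constructed by the proposed method not only achieves excellent classification results but is also robust across different classification algorithms.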

3.
The application of data types to database semantic integrity   Cited by: 1 (self-citations: 0, citations by others: 1)
Data type concepts are used to investigate the extent to which database semantic integrity can be defined and ensured through database structures. Database and data type concepts are extended mutually to improve the semantic capabilities of both database models and data type systems and to resolve apparent discrepancies between databases and programming languages. To meet database needs, data structuring is developed to form an algebra of data types. A semantically rich database model is used to show that database models can be expressed in terms of data types. Finally, a schema specification language is presented to demonstrate the power of data type tools for the definition of database schemas and for the maintenance of database semantic integrity.

4.
In many real-life problems, obtaining labeled data can be a very expensive and laborious task, while unlabeled data can be abundant. The limited availability of labeled data can seriously limit the performance of supervised learning methods. Here, we propose a semi-supervised classification tree induction algorithm that can exploit both labeled and unlabeled data while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance, and producing readily interpretable models. Moreover, we further improve their predictive performance by using them as base predictive models in random forests. We performed an extensive empirical evaluation on 12 binary and 12 multi-class classification datasets. The results showed that the proposed methods improve the predictive performance of their supervised counterparts. Moreover, we show that, in cases with limited availability of labeled data, the semi-supervised decision trees often yield models that are smaller and easier to interpret than supervised decision trees.

5.
Making the non-terminal nodes of a binary tree classifier fuzzy can mitigate tree brittleness. Two optimization techniques based on a genetic algorithm are explored. In the first, each generation minimizes classification error by optimizing a common fuzzy percent, pT, used to determine the parameters at every node. In the second, each generation yields a sequence of minimized node-specific parameters. In general, an input vector takes both paths at each node, with a weighting factor determined by the node membership functions, and the output value is obtained through defuzzification. Experiments conducted using this geno-fuzzy approach yield an improvement compared with other classical algorithms.
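The idea of an input taking both paths at each node can be illustrated with a minimal sketch. The linear ramp membership function, the `width` parameter, and all names below are assumptions made for illustration; the abstract does not specify the membership functions, and the genetic optimization of the parameters is not shown.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float  # output value stored at a terminal node


@dataclass
class FuzzyNode:
    feature: int      # index of the input component tested at this node
    threshold: float  # crisp split point
    width: float      # half-width of the fuzzy region (0 gives a crisp split)
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, FuzzyNode]


def right_membership(x: float, threshold: float, width: float) -> float:
    """Degree (0..1) to which x follows the right branch (assumed linear ramp)."""
    if width <= 0:
        return 1.0 if x > threshold else 0.0
    ramp = (x - (threshold - width)) / (2 * width)
    return min(1.0, max(0.0, ramp))


def predict(tree: Tree, x: Sequence[float]) -> float:
    """Blend both subtrees, weighted by the node membership."""
    if isinstance(tree, Leaf):
        return tree.value
    mu = right_membership(x[tree.feature], tree.threshold, tree.width)
    return (1 - mu) * predict(tree.left, x) + mu * predict(tree.right, x)


tree = FuzzyNode(feature=0, threshold=5.0, width=1.0,
                 left=Leaf(0.0), right=Leaf(1.0))
print(predict(tree, [5.5]))  # input inside the fuzzy region: a blend of both leaves
```

Inputs far from the threshold behave exactly as in a crisp tree, while inputs near it receive a blended output, which is one way the brittleness of a hard split can be softened.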

6.
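Text classification is a key method for organizing and processing massive amounts of text. Current text classification models mostly describe text resources with keyword feature vectors, which makes the vectors high-dimensional and sparse. This work introduces concept features of text resources, raising the description of a text resource from the keyword level to the concept level and making it more accurate, and proposes a semantic text classification model based on concept features. Simulation results show that the model effectively overcomes the high dimensionality and sparsity of the resource feature vector space, ensures the orthogonality of the vector space, and performs well in both the efficiency and the correctness of semantic text classification.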

7.
The distributed nature of the Web, as a decentralized system exchanging information between heterogeneous sources, has underlined the need to manage interoperability, i.e., the ability to automatically interpret information in Web documents exchanged between different sources, which is necessary for efficient information management and search applications. In this context, XML was introduced as a data representation standard that simplifies interoperation and integration among heterogeneous data sources, allowing data to be represented in (semi-)structured documents consisting of hierarchically nested elements and atomic attributes. However, while XML has proven most effective in exchanging data, i.e., in syntactic interoperability, it has proven limited when it comes to handling semantics, i.e., semantic interoperability, since it only specifies the syntactic and structural properties of the data without any further semantic meaning. As a result, semantic-aware XML processing has become a motivating challenge in Web data management, requiring dedicated semantic analysis and disambiguation methods to assign well-defined meaning to XML elements and attributes. In this context, most existing approaches: (i) ignore the problem of identifying ambiguous XML elements/nodes, (ii) only partially consider their structural relationships/context, (iii) use syntactic information in processing XML data regardless of the semantics involved, and (iv) are static in adopting fixed disambiguation constraints, thus limiting user involvement. In this paper, we provide a new XML Semantic Disambiguation Framework, titled XSDF, designed to address each of the above limitations. It takes an XML document as input and produces as output a semantically augmented XML tree made of unambiguous semantic concepts extracted from a reference machine-readable semantic network.
XSDF consists of four main modules for: (i) linguistic pre-processing of simple/compound XML node labels and values, (ii) selecting ambiguous XML nodes as targets for disambiguation, (iii) representing target nodes as special sphere neighborhood vectors including all XML structural relationships within a (user-chosen) range, and (iv) running context vectors through a hybrid disambiguation process that combines two approaches, concept-based and context-based disambiguation, allowing the user to tune disambiguation parameters according to her needs. Experiments demonstrate the effectiveness and efficiency of our approach in comparison with alternative methods. We also discuss some practical applications of our method, including semantic-aware query rewriting, semantic document clustering and classification, mobile and Web services search and discovery, as well as blog analysis and event detection in social networks and tweets.

8.
International Journal of Computer Mathematics, 2012, 89(3-4): 189-208
Execution of sub-processes within a program segment is subject to a partial ordering. In certain cases (such as expressions and assignment statements) this ordering reduces to a tree which, according to the characteristics of the operators present, may be manipulated to influence the extent to which the parallel processing capabilities of multiple-processor configurations can be utilized in its evaluation. A strategy is presented which uses the associativity of certain operators to adjust the shape of the trees to allow a degree of overlap between adjacent subtrees. Although only optimal in the local sense, the transformation yields significant improvements in the “parallel dimensions” of the tree and, more importantly, can be couched in syntactic terms. Consequently, it is possible in principle to perform these manipulations within the syntax analysis phase of compilation, regardless of other operational characteristics of the operators or of the parallel capabilities of the target run-time system.
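As a rough illustration of how associativity can reshape an expression tree, the sketch below flattens a chain of one associative operator and rebuilds it as a balanced tree, which shortens the longest dependency path. The tuple representation and the function names are assumptions for illustration; the paper's actual syntactic transformation is not described in the abstract and may differ.

```python
def flatten(expr, op):
    """Collect the operands of a chain of the same associative operator.

    Sub-expressions using any other operator are kept as opaque operands.
    """
    if isinstance(expr, tuple) and expr[0] == op:
        return flatten(expr[1], op) + flatten(expr[2], op)
    return [expr]


def balance(operands, op):
    """Rebuild the operands as a balanced binary tree, preserving their order."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (op, balance(operands[:mid], op), balance(operands[mid:], op))


def depth(expr):
    """Number of operator levels on the longest path (sequential steps needed)."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max(depth(expr[1]), depth(expr[2]))


# ((a + b) + c) + d, as a left-to-right parser would typically build it.
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")
balanced = balance(flatten(chain, "+"), "+")
print(depth(chain), depth(balanced))
```

In the balanced form, `a + b` and `c + d` do not depend on each other and could be evaluated on separate processors. Only associativity is used here, so operand order is preserved; the result is exact for mathematically associative operators but may differ slightly for floating-point arithmetic.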

9.
Knowledge, 2002, 15(5-6): 301-308
The automatic induction of classification rules from examples in the form of a decision tree is an important technique used in data mining. One of the problems encountered is the overfitting of rules to training data. In some cases this can lead to an excessively large number of rules, many of which have very little predictive value for unseen data. This paper is concerned with the reduction of overfitting during decision tree generation. It introduces a technique known as J-pruning, based on the J-measure, an information theoretic means of quantifying the information content of a rule.
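For orientation, the J-measure of a rule "IF Y = y THEN X = x", as commonly defined by Smyth and Goodman, is the probability that the rule fires multiplied by the relative entropy between the posterior and prior distributions of the conclusion. The sketch below computes that quantity; the function and parameter names are illustrative assumptions, and the way J-pruning uses the value during tree generation is not described in the abstract and is not shown here.

```python
import math


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """J-measure (in bits) of the rule IF Y=y THEN X=x.

    p_y          probability that the rule antecedent holds
    p_x          prior probability of the conclusion
    p_x_given_y  probability of the conclusion when the antecedent holds
    """
    def term(posterior: float, prior: float) -> float:
        # By convention 0 * log(0) is taken to be 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    information = term(p_x_given_y, p_x) + term(1 - p_x_given_y, 1 - p_x)
    return p_y * information


# A rule that fires half of the time and turns a 50/50 prior into certainty.
print(j_measure(p_y=0.5, p_x=0.5, p_x_given_y=1.0))
```

A rule whose posterior equals the prior carries no information and scores zero, while rules that fire often and shift the distribution strongly score high.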

10.
Access to legal information and, in particular, to legal literature is examined for the creation of a search and retrieval system for Italian legal literature. The design and implementation of services such as integrated access to a wide range of resources are described, with a particular focus on the importance of exploiting metadata assigned to disparate legal material. The integration of structured repositories and Web documents is the main purpose of the system: it is constructed on the basis of a federation system with service provider functions, aiming at creating a centralized index of legal resources. The index is based on a uniform metadata view created for structured data by means of the OAI approach and for Web documents by a machine learning approach, which, in this paper, is assessed with regard to document classification. Semantic searching is a major requirement for legal literature users, and a solution has been implemented that is based on the exploitation of Dublin Core metadata, as well as on the use of legal ontologies and related terms prepared for accessing indexed articles.
E. Francesconi

11.
12.
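Jia Shengbin, Xiang Yang. Journal of Computer Applications, 2018, 38(3): 620-625
To address the difficulty of processing temporal semantics when formulating and providing intelligent services, a temporal semantic information understanding model for intelligent service systems is proposed. Working on service message texts written in natural language, it extracts, maps, and semantically models temporal information, providing a general temporal semantic representation for intelligent service systems. First, the model uses a heuristic strategy to extract temporal phrases automatically and build a temporal information knowledge base without manual intervention. Next, a mapping method based on temporal primitives is proposed, which gives a quantitative representation of absolute time and supports logical reasoning over relative time. Finally, temporal information and contextual information are combined to build the temporal semantic model. Experimental results show that, on a test set of natural-language service texts, the model reaches 97.58% accuracy in temporal information extraction and above 85% accuracy in temporal information mapping, and that semantic modeling works well.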

13.
14.
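Traditional text classification relies on the surface features of a text. The most common approach is based on the vector space model: texts are represented as vectors, and the class is determined by comparing similarities. To overcome the term-independence assumption of the vector space model, this paper proposes a text classification model based on latent semantic indexing. Statistical analysis of a large text collection reveals the contextual usage of words, and singular value decomposition effectively reduces the dimensionality of the vector space and removes the influence of synonyms and polysemous words, improving the precision of text classification.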
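The baseline that this abstract starts from, classification by vector similarity in the vector space model, can be sketched as follows. This is only the traditional approach under simple assumptions (whitespace tokenization, raw term counts, nearest labeled document); it is not the proposed latent semantic indexing model, and the singular value decomposition step is deliberately left out. All names are illustrative.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, labelled: list) -> str:
    """Return the label of the most similar labelled document."""
    query = vectorize(text)
    best = max(labelled, key=lambda item: cosine(query, vectorize(item[0])))
    return best[1]


labelled = [
    ("stock market prices fall", "finance"),
    ("team wins football match", "sport"),
]
print(classify("football team scores", labelled))
print(cosine(vectorize("car"), vectorize("automobile")))
```

The second call shows the limitation the abstract refers to: because each term is an independent dimension, two synonyms have zero similarity. Projecting the vectors into a lower-dimensional space obtained by singular value decomposition is what latent semantic indexing uses to bring such terms closer together.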

15.
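Automatic image annotation is a challenging task that is important for image analysis and understanding as well as image retrieval. In automatic image annotation, a model of the relationship between the semantic concept space and the visual feature space is learned from a set of annotated images, and this model is then used to annotate unannotated images. Because of the intricate correspondence between low-level features and high-level semantics, the accuracy of automatic image annotation is still low. Under scene constraints, however, the mapping between annotations and visual features can be simplified, improving the reliability of automatic annotation. An image annotation method based on scene semantic trees is therefore proposed. First, the annotated training images are automatically clustered into semantic scenes, and a visual scene space is generated for each scene semantic category; a semantic tree is then built for each scene space. For an image to be annotated, its semantic category is determined first, and its final annotation is obtained through the corresponding scene semantic tree. On the Corel5K image set, the method obtains better annotation results than models such as TM (translation model), CMRM (cross-media relevance model), CRM (continuous-space relevance model), and PLSA-GMM (probabilistic latent semantic analysis with Gaussian mixture model).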

16.
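Image emotional semantic classification based on feature fusion   Cited by: 1 (self-citations: 0, citations by others: 1)
Image classification methods based on color or color-spatial information do not consider the shape features of the objects in an image, so their classification results are unsatisfactory. Using clothing images as the data source, a two-dimensional color and edge-direction-angle histogram is proposed and designed, fusing the color and shape features of an image for classification. Low-level visual features of an image are closely related to high-level emotional concepts; the correlation between the fused color and shape features of clothing images and emotions is analyzed, and a probabilistic neural network is used as the classifier to perform emotional semantic classification. Experimental results show that the method clearly improves classification accuracy.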

17.
Classification trees are widely used in the data mining community. Typically, trees are constructed to maximize their mean classification accuracy. In this paper, we propose an alternative to using the mean accuracy as the performance measure of a tree. We investigate the use of various percentiles (representing the risk aversion of a decision maker) of the distribution of classification accuracy in place of the mean. We develop a genetic algorithm (GA) to build decision trees based on this new criterion. We develop this GA further by explicitly creating diversity in the population by simultaneously considering two fitness criteria within the GA. We show that our bicriterion GA performs quite well, scales up to handle large data sets, and requires only a small sample of the original data to build a good decision tree.
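The difference between a mean-based and a percentile-based criterion can be shown with a small sketch. The accuracy values are made-up illustrative numbers, the nearest-rank percentile definition is an assumption, and the genetic algorithm itself is not shown.

```python
import math


def percentile(values, q):
    """Nearest-rank q-th percentile of a list of values, for 0 < q <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def fitness(accuracies, q=None):
    """Mean accuracy if q is None, otherwise the q-th percentile of accuracy."""
    if q is None:
        return sum(accuracies) / len(accuracies)
    return percentile(accuracies, q)


# Hypothetical accuracies of two trees measured on five test samples.
tree_a = [0.70, 0.95, 0.95, 0.95, 0.95]  # higher on average, one poor outcome
tree_b = [0.86, 0.87, 0.88, 0.88, 0.86]  # slightly lower on average, but steady

print(fitness(tree_a), fitness(tree_b))          # mean criterion
print(fitness(tree_a, 10), fitness(tree_b, 10))  # 10th-percentile criterion
```

Under the mean criterion tree A ranks higher, whereas a risk-averse decision maker who looks at a low percentile would prefer tree B, whose worst outcomes are better.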

18.
FRCT: fuzzy-rough classification trees   Cited by: 1 (self-citations: 1, citations by others: 0)
Using fuzzy-rough hybrids, we have proposed a measure to quantify the functional dependency of decision attribute(s) on condition attribute(s) within fuzzy data. We have shown that the proposed measure of dependency degree is a generalization of the measure proposed by Pawlak for crisp data. In this paper, this new measure of dependency degree is incorporated into the decision tree generation mechanism to produce fuzzy-rough classification trees (FRCT): efficient, top-down, multi-class decision tree structures geared to solving classification problems from feature-based learning examples. The developed FRCT generation algorithm has been applied to 16 real-world benchmark datasets. It is experimentally compared with the five fuzzy decision tree generation algorithms reported so far and with the rough decomposition tree algorithm. Comparison has been made in terms of number of rules, average training time, and classification accuracy. Experimental results show that the proposed algorithm to generate FRCT outperforms existing fuzzy decision tree generation techniques and the rough decomposition tree induction algorithm.
Rajen B. Bhatt
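For reference, the crisp dependency degree of Pawlak that the FRCT abstract says it generalizes is the fraction of objects whose condition-attribute values determine the decision unambiguously (the size of the positive region divided by the size of the universe). The sketch below computes only this crisp measure; the fuzzy-rough generalization proposed in the paper is not described in the abstract and is not shown. The toy data and all names are illustrative assumptions.

```python
from collections import defaultdict


def dependency_degree(rows, condition, decision):
    """Crisp (Pawlak) degree of dependency of `decision` on `condition`.

    rows       list of dicts mapping attribute name to value
    condition  list of condition attribute names
    decision   name of the decision attribute
    """
    # Group objects that are indiscernible on the condition attributes.
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in condition)].append(row[decision])
    # A block lies in the positive region if all its objects share one decision.
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return positive / len(rows)


rows = [
    {"temp": "high", "wind": "yes", "play": "no"},
    {"temp": "high", "wind": "no", "play": "yes"},
    {"temp": "low", "wind": "no", "play": "yes"},
    {"temp": "high", "wind": "yes", "play": "yes"},
]
print(dependency_degree(rows, ["temp", "wind"], "play"))
print(dependency_degree(rows, ["temp"], "play"))
```

A value of 1 means the condition attributes determine the decision completely, and lower values mean that more objects fall into inconsistent groups.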

Dr. Rajen Bhatt obtained his B.E. and M.E., both in Control and Instrumentation, from S.S. Engineering College, Bhavnagar, and from Delhi College of Engineering, New Delhi, in 1999 and 2002, respectively. He obtained his Ph.D. from the Department of Electrical Engineering, Indian Institute of Technology Delhi, India, in 2006. He was actively engaged in the development of a multimedia course on Control Engineering under the National Program on Technology Enabled Learning (NPTEL). He is a regular reviewer for international journals such as Pattern Recognition, Information Sciences, Pattern Analysis and Applications, and IEEE Trans. on Systems, Man and Cybernetics. Since June 2005, he has been working with the Imaging team of Samsung India Software Centre as a Lead Engineer. He also serves as a Member of the Patent Review Committee at Samsung. He has published several research papers in reputed journals and conferences. His current research interests are Pattern Classification and Regression, Soft Computing, Data Mining, Patents and Trademarks, and Information Technology for Education. He has expertise in industry-standard software project management.

Dr. M. Gopal obtained his B.Tech. (Electrical), M.Tech. (Control Systems), and Ph.D. (Control Systems) degrees, all from Birla Institute of Technology and Science, Pilani, in 1968, 1970, and 1976, respectively. He has been in the teaching and research field for the last three and a half decades, associated with NIT Jaipur, BITS Pilani, IIT Bombay, City University London, University Technology Malaysia, and IIT Delhi. Since January 1986 he has been a Professor with the Electrical Engineering Department, Indian Institute of Technology Delhi. He has published six books in the area of Control Engineering, and a video course on Control Engineering including complete presentation and student questionnaires. He has also published an interactive web-compatible multimedia course on Control Engineering under the National Program on Technology Enabled Learning (NPTEL). He has published several research papers in refereed journals and conferences. His current research interests include Machine Learning, Soft Computing Technologies, Intelligent Control, and e-Learning.

19.
Many national and international governments establish organizations for funding applied science research. Several of these organizations have defined procedures for identifying relevant projects based on prioritized technologies. For applied science research projects that combine several technologies, it is difficult to identify all corresponding technologies of all research-funding organizations. In this paper, we present an approach that supports researchers and research-funding planners by classifying applied science research projects according to the corresponding technologies of research-funding organizations. In contrast to related work, this problem is solved by considering results from the literature concerning application-based technological relationships and by creating a new approach based on latent semantic indexing (LSI) as a semantic text classification algorithm. Technologies that occur together in the process of creating an application are grouped into classes, semantic textual patterns are identified as representative of each class, and projects are assigned to one of these classes. This enables the assignment of each project to all technologies semantically grouped by means of LSI. The approach is evaluated using the example of defense and security based technological research, because the growing importance of this application field leads to an increasing number of research projects and to the appearance of many new technologies.

20.
It has been recognized that classification trees (CART) are unstable: a small perturbation in the input variables or a fresh sample can lead to a very different classification tree. Some approaches exist that try to correct this instability. However, their benefits can, at present, be appreciated only qualitatively. A similarity measure between two classification trees is introduced that can measure their closeness. Its usefulness is illustrated with synthetic data on the impact of radioactivity deposit through the environment. In this context, a modified node-level stabilizing technique, referred to as the NLS-REP method, is introduced and shown to be more stable than the classical CART method.
