Similar Literature
 Found 20 similar documents; search took 15 ms
1.
2.
3.
4.
Listwise approaches are an important class of learning-to-rank methods, which use automatic learning techniques to discover useful information. Most previous research on listwise approaches has focused on optimizing ranking models using weights and has used imprecisely labeled training data; optimizing ranking models using features has been largely ignored, which has hindered the continued performance improvement of these approaches. To address these limitations, we propose a quasi-KNN model to discover the ranking of features and employ the rank addition rule to calculate the weights of the combination. On this basis, we propose three listwise algorithms: FeatureRank, BLFeatureRank, and DiffRank. The experimental results show that the proposed algorithms can be applied to a strictly ordered ranking training set and achieve better performance than state-of-the-art listwise algorithms.
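For readers unfamiliar with the listwise setting, the following is a minimal sketch of a generic listwise loss: the cross-entropy between "top-one" probability distributions, in the style of ListNet. It is not FeatureRank, BLFeatureRank, or DiffRank, whose details the abstract does not give; the function names and the sample scores are illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn a list of scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(predicted_scores, relevance_labels):
    """Cross-entropy between the top-one distributions of labels and predictions.

    The whole list of documents for one query is scored at once, which is
    what distinguishes listwise from pointwise or pairwise objectives.
    """
    target = softmax(relevance_labels)
    predicted = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# Hypothetical query with three documents, graded 3 (best) to 0 (worst).
labels = [3.0, 1.0, 0.0]
loss_good_order = listwise_loss([2.5, 0.8, 0.1], labels)  # agrees with labels
loss_bad_order = listwise_loss([0.1, 0.8, 2.5], labels)   # reverses the labels
```

A model whose scores order the documents as the labels do receives the smaller loss, so minimizing this quantity pushes the predicted ranking toward the labeled one.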

5.
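Under the operating mechanism of the discovery feature subspace model (DFSSM), a structural model for mining unstructured data, a new text classification algorithm is proposed: text classification based on DFSSM (TCDFSSM). On top of the text training and classification stages, the algorithm adds an automatic feedback stage, which gives TCDFSSM a self-learning capability, and an algorithm for selecting the feedback threshold of the text classification process is given. The results show that the algorithm classifies well and that its self-learning ability, adaptability, and robustness are superior.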

6.
7.
8.
9.
10.
11.
12.
13.
Fraudulent financial reporting (FFR) involves conscious efforts to mislead others regarding the financial condition of a business. It usually consists of deliberate actions to deceive regulators, investors, or the general public, which also hinder effective detection by systematic approaches. The challenge comes from distinguishing dichotomous samples whose major attributes fall in the same distribution. This study pioneers a dual GHSOM (Growing Hierarchical Self-Organizing Map) approach to discover the topological patterns of FFR, achieving effective FFR detection and feature extraction. Specifically, the proposed approach uses fraudulent and non-fraudulent samples to train a pair of dual GHSOMs under the same training parameters and examines hypotheses about counterpart relationships among their subgroups, taking advantage of the unsupervised learning nature and growing hierarchical structure of GHSOMs. This study further presents (1) an effective classification rule to detect FFR based on the topological patterns and (2) an expert-competitive feature extraction mechanism to capture the salient characteristics of fraud behaviors. The experimental results on 762 annual financial statements from 144 publicly traded companies in Taiwan (of which 72 are fraudulent and 72 are non-fraudulent) reveal that the topological pattern of FFR follows the non-fraud-central spatial relationship, and show the promise of using the topological patterns for FFR detection and feature extraction.

14.
It has been widely reported that the reuse of previously created components, or features, in new engineering designs will improve the efficiency of a company’s product development process. Although the reuse of engineering components has established metrics and methodologies, the reuse of specific design features (e.g. stiffening ribs, hole patterns, or lubrication grooves) has received less attention in the literature. Typically, researchers have reported approaches to partial design reuse that identify patterns predominantly in terms of geometrically similar shapes (i.e. a set of features) whose elements are adjacent, cohesive, and decoupled from the overall form of a component. In contrast, this paper defines a common design structure (CDS) as a collection of frequently occurring features (e.g. holes) with common parametric values (e.g. diameters) in a CAD database, irrespective of their locations or spatial connectivity with other features on a component. By exploiting the established data-mining technology of association rules and item-sets, the authors show how CDSs can be efficiently computed for hundreds of 3D CAD models. A case study, with hole data extracted from a publicly available dataset of hydraulic valves, is presented to illustrate how item-sets associated with CDSs can be computed and used to support predictive design by identifying potentially ‘substitutable features’ during an interactive design process. This is done using a combination of association rules and geometric compatibility checks to ensure the system’s suggestions are implementable. The use of the Kullback–Leibler divergence to assess the degree of similarity between components is identified as a crucial step in the process of identifying the “best” suggestions. The results illustrate how the prototype implementation successfully mines the CDSs and identifies substitutable hole features in a dataset of industrial valve designs.
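The item-set and association-rule idea can be illustrated with a small sketch. It assumes each CAD model is reduced to the set of hole diameters it contains; the sample data, the function names, and the brute-force enumeration (rather than an Apriori-style search, which a real system would likely need) are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each CAD model reduced to its set of hole diameters (mm).
models = [
    {6.0, 8.0, 12.0},
    {6.0, 8.0},
    {6.0, 8.0, 10.0},
    {8.0, 12.0},
    {6.0, 10.0},
]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for item-sets up to max_size items.

    Support is the fraction of models that contain every item in the set.
    """
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    total = len(transactions)
    return {c: n / total for c, n in counts.items() if n / total >= min_support}


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A and C) / support(A)."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return itemsets[both] / itemsets[tuple(sorted(antecedent))]


itemsets = frequent_itemsets(models, min_support=0.4)
# How often does a model with a 6 mm hole also have an 8 mm hole?
rule_conf = confidence(itemsets, (6.0,), (8.0,))
```

In this toy data the pair (6.0, 8.0) occurs in three of five models, so it survives the support threshold, while (6.0, 12.0) occurs only once and is dropped. A high-confidence rule is the kind of evidence that could back a suggestion of a companion feature, subject to the geometric compatibility checks the abstract mentions.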

15.
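To address the inadequacy of existing e-commerce feature models for identifying paid posters ("water army" accounts) who target cultural products, and the low accuracy of single classifiers, a multi-view feature representation and recognition model for cultural-product paid posters is proposed. Based on three characteristics of cultural products (rich semantics, strict timeliness, and online interactivity), features are proposed from three views, namely content, behavior, and attributes: review topic similarity, average helpfulness, behavioral correlation, interest correlation, average rating positivity, and comprehensive quality evaluation. ...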

16.
17.
Automated scientific discovery, a topic in artificial intelligence, has mainly been used to generate scientific insight from data. Our work follows the knowledge-driven discovery approach and introduces the use of category theory as the foundation for modeling diverse engineering fields represented with combinatorial representations. We show how category theory provides support for all stages of the discovery process, starting from modeling the engineering knowledge. We demonstrate the use of the approach to rediscover previous discoveries in mechanics and to discover new devices, some of which need to be realized to be appreciated. Category theory allows expanding the process to disciplines not modeled with combinatorial representations. We intend to demonstrate this in future studies.

18.
19.
20.
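To address the poor performance of general-purpose new-word discovery methods on long patent terms, the limited flexibility of part-of-speech collocation templates for patent terms, and the lack of unsupervised methods for recognizing long Chinese patent terms, a new bidirectional cohesion (aggregation degree) feature extraction method for discovering new patent words is proposed. First, based on bidirectional conditional probability statistics of the components within a word, a bidirectional cohesion statistical feature over two-component words is constructed. Second, this feature is extended into word-boundary filtering rules. Finally, new patent words are extracted based on the new feature and the word-boundary rules. Experimental results show that, in overall F-measure, the new method improves by 6.7 percentage points over a general-domain new-word discovery method and by 19.2 and 17.2 percentage points, respectively, over two recent patent part-of-speech collocation template methods, and that it markedly improves the F-measure for new patent words of 4 to 8 characters. Overall, the proposed method improves new patent word discovery, extracts compound long terms from patent text more effectively, and reduces dependence on pre-training and on additional complex rule bases, making it more practical.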
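The bidirectional conditional probabilities that the feature is built on can be sketched as follows. The abstract does not give the exact formula for the cohesion feature, so combining the two directions with `min` is purely an assumption for illustration, as are the function name and the sample character sequence.

```python
from collections import Counter


def bidirectional_scores(tokens):
    """Score each adjacent pair (a, b) by its forward and backward conditionals.

    forward  = P(b follows | a) = count(a, b) / count(a as first element)
    backward = P(a precedes | b) = count(a, b) / count(b as second element)
    The combination rule (min of the two) is an assumption, not the paper's.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    as_first = Counter()
    as_second = Counter()
    for (a, b), n in bigrams.items():
        as_first[a] += n
        as_second[b] += n
    scores = {}
    for (a, b), n in bigrams.items():
        forward = n / as_first[a]
        backward = n / as_second[b]
        scores[(a, b)] = min(forward, backward)
    return scores


# Hypothetical character sequence: 半导体器件 followed by 半导体材料.
tokens = list("半导体器件半导体材料")
scores = bidirectional_scores(tokens)
```

Here 半 is always followed by 导 and 导 is always preceded by 半, so that pair scores 1.0, whereas 体 is followed by 器 only half the time, so (体, 器) scores 0.5. A low score in either direction is the kind of signal a word-boundary rule could use to decide where a candidate term should end.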
