Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Large corporations increasingly utilize business process models for documenting and redesigning their operations. The extent of such modeling initiatives, with several hundred models and dozens of modelers who often have little training, calls for automated quality assurance. While formal properties of control flow can easily be checked by existing tools, there is a notable gap for checking the quality of the textual content of models, in particular, their activity labels. In this paper, we address the problem of activity label quality in business process models. We designed a technique for the recognition of labeling styles and the automatic refactoring of labels with quality issues. More specifically, we developed a parsing algorithm that is able to deal with the shortness of activity labels and that integrates natural language tools like WordNet and the Stanford Parser. Using three business process model collections from practice with differing labeling style distributions, we demonstrate the applicability of our technique. In comparison to a straightforward application of standard natural language tools, our technique provides much more stable results. As an outcome, the technique shifts the boundary of process model quality issues that can be checked automatically from syntactic to semantic aspects.

2.
Flowcharts are considered in this work as a specific 2D handwritten language where the basic strokes are the terminal symbols of a graphical language governed by a 2D grammar. In this way, they can be regarded as structured objects, and we propose to use an MRF to model them and to assign a label to each of the strokes. We use structured SVM as the learning algorithm, maximizing the margin between true labels and incorrect labels. The model automatically learns the implicit grammatical information encoded among strokes, which greatly improves the stroke labeling accuracy compared to previous research that incorporated human prior knowledge of flowchart structure. We further complete the recognition by using grammatical analysis, which finally brings coherence to the whole flowchart recognition by labeling the relations between the detected objects.

3.
Grammatical inference – used successfully in a variety of fields such as pattern recognition, computational biology and natural language processing – is the process of automatically inferring a grammar by examining the sentences of an unknown language. Software engineering can also benefit from grammatical inference. Unlike these other fields, which use grammars as a convenient tool to model naturally occurring patterns, software engineering treats grammars as first-class objects typically created and maintained for a specific purpose by human designers. We introduce the theory of grammatical inference and review the state of the art as it relates to software engineering.

4.
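As a fundamental task in Chinese natural language processing, Chinese word segmentation directly affects the quality of downstream natural language processing tasks. Most current Chinese word segmentation approaches are based on machine learning, but they require a large number of hand-crafted features. To address this problem, the paper proposes a new deep-learning-based segmentation model built on BLSTM (bidirectional long short-term memory network), CNN (convolutional neural network), and CRF (conditional random field), which combines the ability of BLSTM to exploit long-range information with the ability of CNN to extract local information. Experiments on three datasets verify the correctness and superiority of the proposed model for Chinese word segmentation.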

5.
Identifying syntactical information from natural‐language texts requires the use of sophisticated parsing techniques mainly based on statistical and machine‐learning methods. However, due to complexity and efficiency issues many intensive natural‐language processing applications using full syntactic analysis methods may not be effective when processing large amounts of natural‐language texts. These tasks can adequately be performed by identifying partial syntactical information through shallow parsing (or chunking) techniques. In this work, a new approach to natural‐language chunking using an evolutionary model is proposed. It uses previously captured training information to guide the evolution of the model. In addition, a multiobjective optimization strategy is used to produce unique quality values for objective functions involving the internal and the external quality of chunking. Experiments and the main results obtained using the model and state‐of‐the‐art approaches are discussed.

6.
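As a variant of supervised learning, multi-instance learning (MIL) attempts to learn a classifier from the instances in bags. In MIL, labels are associated with bags rather than with individual instances: bag labels are known, while instance labels are unknown. MIL can handle label ambiguity, but problems with weak labels are not easy to solve. In the weak-label problem, the labels of both bags and instances are unknown and are treated as latent variables. Given multiple labels and instances, the labels of bags and instances can be approximately estimated by weighting the different labels. A new multi-instance learning framework based on transfer learning is proposed to solve the weak-label problem. First, a transfer learning model based on the multi-instance approach is constructed; it transfers knowledge from the source task to the target task, thereby converting the weak-label problem into a multi-instance learning problem. On this basis, an iterative framework for solving the multi-instance transfer learning model is proposed. Experimental results show that the method outperforms existing multi-instance learning methods.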
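
The relationship between bag labels and instance labels in the abstract above can be illustrated with a minimal sketch. It assumes the standard multi-instance assumption (a bag is positive when at least one of its instances is positive), which the abstract does not state explicitly. The names `bag_label` and `bags`, the instance scores, and the 0.5 threshold are hypothetical and chosen only for illustration; this is not the transfer-learning framework the abstract proposes.

```python
from typing import Dict, List


def bag_label(instance_scores: List[float], threshold: float = 0.5) -> int:
    """Return 1 if any instance score exceeds the threshold, else 0.

    Assumes the standard MIL rule: one positive instance makes the bag positive.
    The list must be non-empty.
    """
    return int(max(instance_scores) > threshold)


# Hypothetical instance-level scores for two bags.
bags: Dict[str, List[float]] = {
    "bag_a": [0.1, 0.2, 0.9],  # one high-scoring instance
    "bag_b": [0.1, 0.3, 0.4],  # no instance above the threshold
}
labels = {name: bag_label(scores) for name, scores in bags.items()}
```

Under this rule only the highest-scoring instance decides the bag label, which is why instance labels can stay unobserved during training.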

7.
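Single-image depth estimation is a classic problem in computer vision and is important for 3D scene reconstruction and for occlusion and illumination handling in augmented reality. This paper reviews related work on single-image depth estimation and introduces the commonly used datasets and models. By scene type, datasets can be divided into indoor, outdoor, and virtual-scene datasets. By mathematical model, monocular depth estimation methods can be divided into methods based on traditional machine learning and methods based on deep learning. Monocular depth estimation methods based on traditional machine learning generally use a Markov random field (MRF) or conditional random field (CRF) to model depth relationships and, within a maximum a posteriori framework, solve for depth by minimizing an energy function. Depending on whether the model contains parameters, these methods can be further divided into parametric and non-parametric learning methods: the former assume that the model contains unknown parameters, and training amounts to solving for them; the latter infer depth by similarity retrieval over existing datasets and do not need to learn parameters. For deep-learning-based monocular depth estimation, this paper describes in detail the state of research in China and abroad and the strengths and weaknesses of the methods, and classifies them level by level, bottom-up, according to different criteria. The first level distinguishes single-task methods that predict only depth from multi-task methods that jointly predict depth and other information such as semantics. Because the depth and semantic information of an image are closely related, some work studies joint multi-task prediction. The second level distinguishes absolute depth prediction from relative depth prediction. Absolute depth is the actual distance from objects in the scene to the camera, whereas relative depth concerns the relative ordering of objects in the image by distance. Given an arbitrary image, human vision is better at judging the relative distance ordering of objects in the scene. The third level comprises supervised regression methods, supervised classification methods, and unsupervised methods. For single-image depth estimation, most work focuses on predicting absolute depth, and most early methods use supervised regression models, that is, the training data are labeled and the model regresses continuous depth values. Considering the far-to-near nature of scenes, some methods instead treat depth estimation as a classification problem. Supervised learning requires a corresponding depth label for every RGB image, and collecting depth labels usually requires a depth camera or lidar; the former has limited range and the latter is expensive. Moreover, the raw depth labels collected are usually sparse points that cannot be matched well to the original image. Unsupervised estimation methods that need no depth labels are therefore a research trend; their basic idea is to use left and right views and combine epipolar geometry with autoencoder ideas to solve for depth.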

8.
We present ECOC-DRF, a framework where potential functions for Discriminative Random Fields are formulated as an ensemble of classifiers. We introduce the label trick, a technique to express transitions in the pairwise potential as meta-classes. This allows any possible transition between labels to be learned independently, without assuming any pre-defined model. The Error Correcting Output Codes matrix is used as the ensemble framework for the combination of margin classifiers. We apply ECOC-DRF to a large set of classification problems, covering synthetic, natural and medical images for binary and multi-class cases, outperforming the state of the art in almost all the experiments.

9.
10.
Multilabel classification via calibrated label ranking   Cited by: 3 (self-citations: 0, citations by others: 3)
Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Hitherto existing approaches to label ranking implicitly operate on an underlying (utility) scale which is not calibrated in the sense that it lacks a natural zero point. We propose a suitable extension of label ranking that incorporates the calibrated scenario and substantially extends the expressive power of these approaches. In particular, our extension suggests a conceptually novel technique for extending the common learning by pairwise comparison approach to the multilabel scenario, a setting not previously amenable to the pairwise decomposition technique. The key idea of the approach is to introduce an artificial calibration label that, in each example, separates the relevant from the irrelevant labels. We show that this technique can be viewed as a combination of pairwise preference learning and the conventional relevance classification technique, where a separate classifier is trained to predict whether a label is relevant or not. Empirical results in the area of text categorization, image classification and gene analysis underscore the merits of the calibrated model in comparison to state-of-the-art multilabel learning methods.
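
The key idea in the abstract above, an artificial calibration label that splits a ranking into relevant and irrelevant labels, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pairwise models are replaced by a hypothetical `prefers(a, b)` callback driven by toy scores, the aggregation is simple vote counting, and the names `CALIBRATION`, `calibrated_ranking`, and `toy_scores` are invented for the example. Ties are not handled specially.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

CALIBRATION = "__calibration__"  # hypothetical name for the artificial label


def calibrated_ranking(
    labels: List[str], prefers: Callable[[str, str], bool]
) -> Tuple[List[str], List[str]]:
    """Rank labels by pairwise votes and split the ranking at the calibration label.

    prefers(a, b) returns True when the (hypothetical) pairwise model
    prefers label a over label b for the current instance.
    """
    candidates = list(labels) + [CALIBRATION]
    votes: Dict[str, int] = {label: 0 for label in candidates}
    # Every pair of candidates, including pairs with the calibration label, casts one vote.
    for a, b in combinations(candidates, 2):
        winner = a if prefers(a, b) else b
        votes[winner] += 1
    ranking = sorted(candidates, key=lambda label: votes[label], reverse=True)
    # Labels that collect more votes than the calibration label are predicted relevant.
    relevant = [
        label
        for label in ranking
        if label != CALIBRATION and votes[label] > votes[CALIBRATION]
    ]
    return ranking, relevant


# Toy scores standing in for trained pairwise classifiers (illustrative only).
toy_scores = {"sports": 0.9, "politics": 0.2, "tech": 0.7, CALIBRATION: 0.5}
ranking, relevant = calibrated_ranking(
    ["sports", "politics", "tech"], lambda a, b: toy_scores[a] > toy_scores[b]
)
```

With these toy scores, "sports" and "tech" rank above the calibration label and are predicted relevant, while "politics" falls below it.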

11.
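Machine reading comprehension requires a machine to understand natural language text and answer related questions; it is a core technology in natural language processing and one of its most challenging tasks. Extractive machine reading comprehension is an important branch of the task. Because it is closer to practical settings and better reflects a machine's comprehension ability, it has become a research focus in both academia and industry. This paper comprehensively surveys extractive machine reading comprehension from four aspects: it introduces the machine reading comprehension task and its development; introduces the extractive task and its current difficulties; summarizes the main datasets and methods for the extractive task; and discusses future directions for extractive machine reading comprehension.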

12.
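Chinese named entity recognition (CNER) is a fundamental underlying task for natural language applications such as question answering, machine translation, and information extraction. Traditional CNER systems rely on manually designed domain dictionaries and grammar rules; they achieve good experimental results but suffer from weak generalization, poor robustness, and difficult maintenance. Deep learning techniques that have emerged in recent years automatically extract text features in an end-to-end manner, making up for these shortcomings. This paper surveys the latest research progress on deep-learning-based Chinese named entity recognition. It first introduces the concept, current applications, and difficulties of the task, then briefly introduces the commonly used datasets and evaluation methods, classifies and reviews deep learning models for the task according to their main network architectures, and finally discusses future research directions for the task.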

13.
Predicting labels of structured data such as sequences or images is a very important problem in statistical machine learning and data mining. The conditional random field (CRF) is perhaps one of the most successful approaches for structured label prediction via conditional probabilistic modeling. In such models, it is traditionally assumed that each label is a random variable from a nominal category set (e.g., class categories) where all categories are symmetric and unrelated to one another. In this paper we consider a different situation of ordinal-valued labels where each label category bears a particular meaning of preference or order. This setup fits many interesting problems/datasets for which one is interested in predicting labels that represent certain degrees of intensity or relevance. We propose a fairly intuitive and principled CRF-like model that can effectively deal with the ordinal-scale labels within an underlying correlation structure. Unlike standard log-linear CRFs, learning the proposed model incurs non-convex optimization. However, the new model can be learned accurately using efficient gradient search. We demonstrate the improved prediction performance achieved by the proposed model on several intriguing sequence/image label prediction tasks.

14.
A clean map visualization requires the fewest possible overlaps and depends on how labels are attached to point features. In this paper, we address the cartographic label placement variant problem whose objective is to label a set of points maximizing the number of conflict‐free points. Thus, we propose a hybrid data mining heuristic to solve the point‐feature cartographic label placement problem based on a clustering search (CS) heuristic, a state‐of‐the‐art method for this problem. Although several works have investigated the combination of data mining and multistart metaheuristics, this is the first time data mining has been used to improve CS and simulated annealing based heuristics. Computational experiments showed that the proposed hybrid heuristic was able to reach better cost solutions than the original strategy, with the same time effort. The proposed heuristic also could find almost all known optimal solutions and improved most of the best results for the set of large instances reported so far in the literature.

15.
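In multi-label classification, a label may be determined only by certain attributes specific to it; these attributes are called label-specific features. Using label-specific features for multi-label classification can effectively prevent useless features from degrading the performance of the classification model. However, label-specific feature algorithms extract important features only from the label perspective and ignore extracting important labels from the feature perspective. In fact, if certain labels are attended to in advance from the feature perspective, it becomes easier to obtain the specific attributes of those labels. Based on this, the paper proposes a...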

16.
Interactive image segmentation has remained an active research topic in image processing and graphics, since the user intention can be incorporated to enhance the performance. It can be employed on mobile devices, which now allow user interaction as an input, enabling various applications. Most interactive segmentation methods assume that the initial labels are correctly and carefully assigned to some parts of the regions to be segmented. Inaccurate labels, such as foreground labels in background regions, lead to incorrect segments even when they are few in number, which makes such methods unsuitable for practical usage such as mobile applications. In this paper, we present an interactive segmentation method that is robust to inaccurate initial labels (scribbles). To address this problem, we propose a structure-aware labeling method using occurrence and co-occurrence probability (OCP) of color values for each initial label in a unified framework. Occurrence probability captures a global distribution of all color values within each label, while co-occurrence probability encodes a local distribution of color values around the label. We show that nonlocal regularization together with the OCP makes image segmentation robust to inaccurately assigned labels and alleviates a small-cut problem. We analyze theoretical relations of our approach to other segmentation methods. Extensive experiments with synthetic and manual labels show that our approach outperforms the state of the art.

17.
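Connected component labeling of binary images is a basic problem in image processing. To improve efficiency, a fast one-scan connected component labeling algorithm is proposed, based on the multi-scan algorithm of Suzuki et al. The algorithm makes a single forward scan of the image. It first computes the minimum label in the neighborhood of each current pixel, then uses a recurrence to find the node with a smaller label in that connected component and links the connected branch containing the updated node to that node, so that no equivalence information is lost. At the same time, the minimum label is used to update the provisional labels of the nodes along the search path, reducing the depth of the branch. Each node obtains its final label by updating the connection table. The algorithm requires neither dynamic data structures nor recursion and needs little storage; it improves speed by nearly a factor of two over the original algorithm and is also faster than some recently proposed run-based algorithms.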
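
For readers unfamiliar with the problem, connected component labeling with a label-equivalence table can be sketched as below. This is the classical two-pass union-find formulation with 4-connectivity, not the one-scan algorithm described in the abstract; it only shows how provisional labels are merged toward the smallest label and how lookups shorten the search path. The function name `label_components` and the sample image are hypothetical.

```python
from typing import Dict, List


def label_components(image: List[List[int]]) -> List[List[int]]:
    """Label 4-connected foreground regions of a binary image (1 = foreground)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    parent = [0]  # parent[i] is the parent of provisional label i; 0 is background
    labels = [[0] * cols for _ in range(rows)]

    def find(x: int) -> int:
        # Follow parents to the root, shortening the path on the way (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # the smaller label becomes the root

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            neighbors = [n for n in (up, left) if n]
            if not neighbors:
                parent.append(len(parent))  # new label is its own root
                labels[r][c] = len(parent) - 1
            else:
                labels[r][c] = min(neighbors)
                if len(neighbors) == 2:
                    union(up, left)

    # Second pass: replace each provisional label by a consecutive final label.
    final: Dict[int, int] = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels


sample = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
]
labeled = label_components(sample)
```

In the sample, the two pixels in the top row first receive different provisional labels and are merged when the second row connects them, which is the equivalence information the abstract says must not be lost.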

18.
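Traditional multi-label text classification algorithms fall short in mining the correlations among labels and in extracting discriminative information between text and labels. This paper therefore proposes a multi-label text classification algorithm based on a label-combination pre-training model and multi-granularity fused attention. A text encoder that captures label correlations is obtained by training the label-combination pre-training model. A gated fusion strategy fuses the pre-trained language model and word vectors to obtain the word embedding representation, which is fed into the pre-trained encoder to generate a text representation based on label semantics. Self-attention and label attention enhanced by multi-layer dilated convolution yield global information and fine-grained semantic information, respectively; these are adaptively fused and fed into a multilayer perceptron for multi-label prediction. Experimental results on a specific threat identification dataset and two general multi-label text classification datasets show that the proposed method effectively captures the correlations between labels and text and achieves clear improvements in F1 score, Hamming loss, and recall.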
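
The evaluation metrics named in the abstract above can be computed as in the following sketch. It assumes micro-averaging for recall and F1, which the abstract does not specify, and represents each document's labels as a set of integer label ids. The function names and the toy data are hypothetical and are not taken from the paper's experiments.

```python
from typing import List, Set, Tuple


def hamming_loss(y_true: List[Set[int]], y_pred: List[Set[int]], num_labels: int) -> float:
    """Fraction of (sample, label) pairs on which prediction and truth disagree."""
    mismatches = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return mismatches / (len(y_true) * num_labels)


def micro_recall_f1(y_true: List[Set[int]], y_pred: List[Set[int]]) -> Tuple[float, float]:
    """Micro-averaged recall and F1 over all samples and labels."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))  # correctly predicted labels
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))  # predicted but not true
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))  # true but not predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Toy example with 4 possible labels (ids 0 to 3) and 2 documents.
truth = [{0, 1}, {2}]
predicted = [{0}, {2, 3}]
loss = hamming_loss(truth, predicted, num_labels=4)
recall, f1 = micro_recall_f1(truth, predicted)
```

Lower Hamming loss is better, while higher recall and F1 are better, so an improvement on all three means the loss decreases and the other two increase.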

19.
Currently a consensus on multi-label classification is to exploit label correlations for performance improvement. Many approaches build one classifier for each label based on the one-versus-all strategy, and integrate classifiers by enforcing a regularization term on the global weights to exploit label correlations. However, this strategy might be suboptimal since it may be only part of the global weights that support the assumption. This paper proposes clustered intrinsic label correlations for multi-label classification (CILC), which extends the traditional support vector machine to the multi-label setting. The predictive function of each classifier consists of two components: one component is the common information among all labels, and the other component is a label-specific one which highly depends on the corresponding label. The label-specific one representing the intrinsic label correlations is regularized by a clustered structure assumption. The appealing features of the proposed method are that it separates the common information and the label-specific information of the labels and utilizes clustered structures among labels represented by the label-specific parts. Practical multi-label classification problems, such as text categorization, image annotation and sentiment analysis, can be directly solved by the proposed CILC method. Experiments across five data sets validate the effectiveness of CILC, compared with six well-established multi-label classification algorithms.

20.
Machine Intelligence Research - This paper presents a state of the art machine learning-based approach for automation of a varied class of Internet of things (IoT) analytics problems targeted on...


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号