首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 234 毫秒
1.
Noise filtering is most frequently used in data preprocessing to improve the accuracy of induced classifiers. The focus of this work is different: we aim at detecting noisy instances for improved data understanding, data cleaning and outlier identification. The paper is composed of three parts. The first part presents an ensemble-based noise ranking methodology for explicit noise and outlier identification, named Noise- Rank, which was successfully applied to a real-life medical problem as proven in domain expert evaluation. The second part is concerned with quantitative performance evaluation of noise detection algorithms on data with randomly injected noise. A methodology for visual performance evaluation of noise detection algorithms in the precision-recall space, named Viper, is presented and compared to standard evaluation practice. The third part presents the implementation of the NoiseRank and Viper methodologies in a web-based platform for composition and execution of data mining workflows. This implementation allows public accessibility of the developed approaches, repeatability and sharing of the presented experiments as well as the inclusion of web services enabling to incorporate new noise detection algorithms into the proposed noise detection and performance evaluation workflows.  相似文献   

2.
Łukasz Jeż 《Algorithmica》2013,67(4):498-515
We give a memoryless scale-invariant randomized algorithm ReMix for Packet Scheduling that is e/(e?1)-competitive against an adaptive adversary. ReMix unifies most of previously known randomized algorithms, and its general analysis yields improved performance guarantees for several restricted variants, including the s-bounded instances. In particular, ReMix attains the optimum competitive ratio of 4/3 on 2-bounded instances. Our results are applicable to a more general problem, called Item Collection, in which only the relative order between packets’ deadlines is known. ReMix is the optimal memoryless randomized algorithm against adaptive adversary for that problem.  相似文献   

3.
In this paper, we present Para Miner which is a generic and parallel algorithm for closed pattern mining. Para Miner is built on the principles of pattern enumeration in strongly accessible set systems. Its efficiency is due to a novel dataset reduction technique (that we call EL-reduction), combined with novel technique for performing dataset reduction in a parallel execution on a multi-core architecture. We illustrate Para Miner’s genericity by using this algorithm to solve three different pattern mining problems: the frequent itemset mining problem, the mining frequent connected relational graphs problem and the mining gradual itemsets problem. In this paper, we prove the soundness and the completeness of Para Miner. Furthermore, our experiments show that despite being a generic algorithm, Para Miner can compete with specialized state of the art algorithms designed for the pattern mining problems mentioned above. Besides, for the particular problem of gradual itemset mining, Para Miner outperforms the state of the art algorithm by two orders of magnitude.  相似文献   

4.
Inclusion/exclusion and measure and conquer are two central techniques from the field of exact exponential-time algorithms that recently received a lot of attention. In this paper, we show that both techniques can be used in a single algorithm. This is done by looking at the principle of inclusion/exclusion as a branching rule. This inclusion/exclusion-based branching rule can be combined in a branch-and-reduce algorithm with traditional branching rules and reduction rules. The resulting algorithms can be analysed using measure and conquer allowing us to obtain good upper bounds on their running times. In this way, we obtain the currently fastest exact exponential-time algorithms for a number of domination problems in graphs. Among these are faster polynomial-space and exponential-space algorithms for #Dominating Set and Minimum Weight Dominating Set (for the case where the set of possible weight sums is polynomially bounded), and a faster polynomial-space algorithm for Domatic Number. This approach is also extended in this paper to the setting where not all requirements in a problem need to be satisfied. This results in faster polynomial-space and exponential-space algorithms for Partial Dominating Set, and faster polynomial-space and exponential-space algorithms for the well-studied parameterised problem k-Set Splitting and its generalisation k-Not-All-Equal Satisfiability.  相似文献   

5.
多示例多标记学习(Multi-Instance Multi-Label,MIML)是一种新的机器学习框架,基于该框架上的样本由多个示例组成并且与多个类别相关联,该框架因其对多义性对象具有出色的表达能力,已成为机器学习界研究的热点.解决MIML分类问题的最直接的思路是采用退化策略,通过向多示例学习或多标记学习的退化,将MIML框架下的分类问题简化为一系列的二类分类问题进行求解.但是在退化过程中会丢失标记之间的关联信息,降低分类的准确率.针对此问题,本文提出了MIMLSVM-LOC算法,该算法将改进的MIMLSVM算法与一种局部标记相关性的方法ML-LOC相结合,在训练过程中结合标记之间的关联信息进行分类.算法首先对MIMLSVM算法中的K-medoids聚类算法进行改进,采用的混合Hausdorff距离,将每一个示例包转化为一个示例,将MIML问题进行了退化.然后采用单示例多标记的算法ML-LOC算法继续以后的分类工作.在实验中,通过与其他多示例多标记算法对比,得出本文提出的算法取得了比其他分类算法更优的分类效果.  相似文献   

6.
7.
8.
This paper presents a distributed (Bulk-Synchronous Parallel or bsp) algorithm to compute on-the-fly whether a structured model of a security protocol satisfies a ctl \(^*\) formula. Using the structured nature of the security protocols allows us to design a simple method to distribute the state space under consideration in a need-driven fashion. Based on this distribution of the states, the algorithm for logical checking of a ltl formula can be simplified and optimised allowing, with few tricky modifications, the design of an efficient algorithm for ctl \(^*\) checking. Some prototype implementations have been developed, allowing to run benchmarks to investigate the parallel behaviour of our algorithms.  相似文献   

9.
Self-organization of autonomous mobile nodes using bio-inspired algorithms in mobile ad hoc networks (manets) has been presented in earlier work of the authors. In this paper, the convergence speed of our force-based genetic algorithm (called fga) is provided through analysis using homogeneous Markov chains. The fga is run by each mobile node as a topology control mechanism to decide a corresponding node??s next speed and movement direction so that it guides an autonomous mobile node over an unknown geographical area to obtain a uniform node distribution while only using local information. The stochastic behavior of fga, like all ga-based approaches, makes it difficult to analyze the effects that various manet characteristics have on its convergence speed. Metrically transitive homogeneous Markov chains have been used to analyze the convergence of our fga with respect to various communication ranges of mobile nodes and also the number of nodes in various scenarios. The Dobrushin contraction coefficient of ergodicity is used for measuring convergence speed for Markov chain model of our fga. Two different testbed platforms are presented to illustrate effectiveness of our bio-inspired algorithm in terms of area coverage.  相似文献   

10.
We consider the problem of optimal real-time scheduling of periodic and sporadic tasks on identical multiprocessors. A number of recent papers have used the notions of fluid scheduling and deadline partitioning to guarantee optimality and improve performance. This article develops a unifying theory with the DP-Fair scheduling policy and examines how it overcomes problems faced by greedy scheduling algorithms. In addition, we present DP-Wrap, a simple DP-Fair scheduling algorithm which serves as a least common ancestor to other recent algorithms. The DP-Fair scheduling policy is extended to address the problem of scheduling sporadic task sets with arbitrary deadlines.  相似文献   

11.
Kernelization algorithms for the cluster editing problem have been a popular topic in the recent research in parameterized computation. Most kernelization algorithms for the problem are based on the concept of critical cliques. In this paper, we present new observations and new techniques for the study of kernelization algorithms for the cluster editing problem. Our techniques are based on the study of the relationship between cluster editing and graph edge-cuts. As an application, we present a simple algorithm that constructs a 2k-vertex kernel for the integral-weighted version of the cluster editing problem. Our result matches the best kernel bound for the unweighted version of the cluster editing problem, and significantly improves the previous best kernel bound for the weighted version of the problem. For the more general real-weighted version of the problem, our techniques lead to a simple kernelization algorithm that constructs a kernel of at most 4k vertices.  相似文献   

12.
Multi-instance multi-label learning (MIML) is a newly proposed framework, in which the multi-label problems are investigated by representing each sample with multiple feature vectors named instances. In this framework, the multi-label learning task becomes to learn a many-to-many relationship, and it also offers a possibility for explaining why a concerned sample has the certain class labels. The connections between instances and labels as well as the correlations among labels are equally crucial information for MIML. However, the existing MIML algorithms can rarely exploit them simultaneously. In this paper, a new MIML algorithm is proposed based on Gaussian process. The basic idea is to suppose a latent function with Gaussian process prior in the instance space for each label and infer the predictive probability of labels by integrating over uncertainties in these functions using the Bayesian approach, so that the connection between instances and every label can be exploited by defining a likelihood function and the correlations among labels can be identified by the covariance matrix of the latent functions. Moreover, since different relationships between instances and labels can be captured by defining different likelihood functions, the algorithm may be used to deal with the problems with various multi-instance assumptions. Experimental results on several benchmark data sets show that the proposed algorithm is valid and can achieve superior performance to the existing ones.  相似文献   

13.
The AtMostSeqCard constraint is the conjunction of a cardinality constraint on a sequence of n variables and of n???q?+?1 constraints AtMost u on each subsequence of size q. This constraint is useful in car-sequencing and crew-rostering problems. In van Hoeve et al. (Constraints 14(2):273–292, 2009), two algorithms designed for the AmongSeq constraint were adapted to this constraint with an O(2 q n) and O(n 3) worst case time complexity, respectively. In Maher et al. (2008), another algorithm similarly adaptable to filter the AtMostSeqCard constraint with a time complexity of O(n 2) was proposed. In this paper, we introduce an algorithm for achieving arc consistency on the AtMostSeqCard constraint with an O(n) (hence optimal) worst case time complexity. Next, we show that this algorithm can be easily modified to achieve arc consistency on some extensions of this constraint. In particular, the conjunction of a set of m AtMostSeqCard constraints sharing the same scope can be filtered in O(nm). We then empirically study the efficiency of our propagator on instances of the car-sequencing and crew-rostering problems.  相似文献   

14.
Due to significant advances in SAT technology in the last years, its use for solving constraint satisfaction problems has been gaining wide acceptance. Solvers for satisfiability modulo theories (SMT) generalize SAT solving by adding the ability to handle arithmetic and other theories. Although there are results pointing out the adequacy of SMT solvers for solving CSPs, there are no available tools to extensively explore such adequacy. For this reason, in this paper we introduce a tool for translating FLATZINC (MINIZINC intermediate code) instances of CSPs to the standard SMT-LIB language. We provide extensive performance comparisons between state-of-the-art SMT solvers and most of the available FLATZINC solvers on standard FLATZINC problems. The obtained results suggest that state-of-the-art SMT solvers can be effectively used to solve CSPs.  相似文献   

15.
The Frequency Assignment Problem (fap) is one of the key issues in the design of Global System for Mobile Communications (gsm) networks. The formulation of the fap used here focuses on aspects that are relevant to real gsm networks. In this paper, we adapt a parallel model to tackle a multiobjectivised version of the fap. It is a hybrid model which combines an island-based model and a hyperheuristic. The main aim of this paper is to design a strategy that facilitates the application of the current best-behaved algorithm. Specifically, our goal is to decrease the user effort required to set its parameters. At the same time, the usage of such an algorithm in parallel environments was enabled. As a result, the time required to attain high-quality solutions was decreased. We also conduct a robustness analysis of this parallel model. In this analysis we study the relationship between the migration stage of the parallel model and the quality of the resulting solutions. In addition, we also carry out a scalability study of the parallel model. In this case, we analyse the impact that the migration stage has on the scalability of the entire parallel model. Computational results with several real network instances have validated our proposed approach. The best-known frequency plans for two real-world network instances are improved with this strategy.  相似文献   

16.
Measure & Conquer (M&C) is a prominent technique for analyzing exact algorithms for computationally hard problems, in particular, graph problems. It tries to balance worse and better situations within the algorithm analysis. This has led, e.g., to algorithms for Minimum Vertex Cover with a running time of $\mathcal{O}(c^{n})$ for some constant c??1.2, where n is the number of vertices in the graph. Several obstacles prevent the application of this technique in parameterized algorithmics, making it rarely applied in this area. However, these difficulties can be handled in some situations. We will exemplify this with two problems related to Vertex Cover, namely Connected Vertex Cover and Edge Dominating Set. For both problems, several parameterized algorithms have been published, all based on the idea of first enumerating minimal vertex covers. Using M&C in this context will allow us to improve on the hitherto published running times. In contrast to some of the earlier suggested algorithms, ours will use polynomial space.  相似文献   

17.
The adoption of Artificial Neural Networks (ANNs) in safety-related applications is often avoided because it is difficult to rule out possible misbehaviors with traditional analytical or probabilistic techniques. In this paper we present NeVer, our tool for checking safety of ANNs. NeVer encodes the problem of verifying safety of ANNs into the problem of satisfying corresponding Boolean combinations of linear arithmetic constraints. We describe the main verification algorithm and the structure of NeVer. We present also empirical results confirming the effectiveness of NeVer on realistic case studies.  相似文献   

18.
We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit indicating whether the predicted label is correct or not, rather than the true label. Our algorithm is based on the second-order Perceptron, and uses upper-confidence bounds to trade-off exploration and exploitation, instead of random sampling as performed by most current algorithms. We analyze this algorithm in a partial adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model which is also chosen adversarially. We show a regret of $\mathcal{O}(\sqrt{T}\log T)$ , which improves over the current best bounds of $\mathcal{O}(T^{2/3})$ in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems and on four vowel recognition tasks, often obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.  相似文献   

19.
A DIN Kernel LISP Draft (DKLisp) has been developed by DIN as Reaction to Action D1 (N79), short term goal, of ISO WG16. It defines a subset language, as compatible as possible with the ANSICommon-Lisp draft, but also with theEuLisp draft. It combines the most important LISP main stream features in a single, compact, but nevertheless complete language definition, which thereby could be well suited as basis for a short term InternationalLisp Standard. Besides the functional and knowledge processing features, the expressive power of the language is well comparable with contemporary procedural languages, as e.g. C++ (of course without libraries). Important features ofDKLisp are:
  • to be a “Lisp-1,” but allowing an easy “Lisp-2” transformation;
  • to be a simple, powerful and standardized educationalLisp;
  • to omit all features, which are unclean or in heavy discussion;
  • DKLisp programs run nearly unchanged inCommon-Lisp;
  • DKLisp contains a simple object and package system;
  • DKLisp contains those data classes and control structures also common to most modernLisp and non-Lisp languages;
  • DKLisp offers a simple stream I/O;
  • DKLisp contains a clean unified hierarchical class/type system;
  • DKLisp contains the typical “Lisp-features” in an orthogonal way;
  • DKLisp allows and encourages really small but powerful implementations;
  • DKLisp comes in levels, so allowing ANSICommon-Lisp to be an extension ofDKLisp level-1.
  • The present is the second version of the proposal, namely version 1.2, with slight differences with respect to the one sent to ISO. Sources of such changes were the remarks generously sent by many readers of the previous attempt.  相似文献   

    20.
    The Hamiltonian Cycle problem is the problem of deciding whether an n-vertex graph G has a cycle passing through all vertices of G. This problem is a classic NP-complete problem. Finding an exact algorithm that solves it in ${\mathcal {O}}^{*}(\alpha^{n})$ time for some constant α<2 was a notorious open problem until very recently, when Björklund presented a randomized algorithm that uses ${\mathcal {O}}^{*}(1.657^{n})$ time and polynomial space. The Longest Cycle problem, in which the task is to find a cycle of maximum length, is a natural generalization of the Hamiltonian Cycle problem. For a claw-free graph G, finding a longest cycle is equivalent to finding a closed trail (i.e., a connected even subgraph, possibly consisting of a single vertex) that dominates the largest number of edges of some associated graph H. Using this translation we obtain two deterministic algorithms that solve the Longest Cycle problem, and consequently the Hamiltonian Cycle problem, for claw-free graphs: one algorithm that uses ${\mathcal {O}}^{*}(1.6818^{n})$ time and exponential space, and one algorithm that uses ${\mathcal {O}}^{*}(1.8878^{n})$ time and polynomial space.  相似文献   

    设为首页 | 免责声明 | 关于勤云 | 加入收藏

    Copyright©北京勤云科技发展有限公司  京ICP备09084417号