期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Ecosystem on the Web: non-linear mining and forecasting of co-evolving online activities

Yasuko Matsubara Yasushi Sakurai Christos Faloutsos 《World Wide Web》2017,20(3):439-465

Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb, (i.e., Ecosystem on the Web), which is an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it can capture long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it can provide accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed. 相似文献

2.

Adaptive partitioning schemes for bipartite ranking

Stéphan Clémençon Marine Depecker Nicolas Vayatis 《Machine Learning》2011,83(1):31-69

Recursive partitioning methods are among the most popular techniques in machine learning. The purpose of this paper is to investigate how to adapt this methodology to the bipartite ranking problem. Following in the footsteps of the TreeRank approach developed in Clémençon and Vayatis (Proceedings of the 2008 Conference on Algorithmic Learning Theory, 2008 and IEEE Trans. Inf. Theory 55(9):4316–4336, 2009), we present tree-structured algorithms designed for learning to rank instances based on classification data. The main contributions of the present work are the following: the practical implementation of the TreeRank algorithm, well-founded solutions to the crucial issues related to the splitting rule and the choice of the “right” size for the ranking tree. From the angle embraced in this paper, splitting is viewed as a cost-sensitive classification task with data-dependent cost. Hence, up to straightforward modifications, any classification algorithm may serve as a splitting rule. Also, we propose to implement a cost-complexity pruning method after the growing stage in order to produce a “right-sized” ranking sub-tree with large AUC. In particular, performance bounds are established for pruning schemes inspired by recent work on nonparametric model selection. Eventually, we propose indicators for variable importance and variable dependence, plus various simulation studies illustrating the potential of our method. 相似文献

3.

<Emphasis Type="SmallCaps">EnTagRec</Emphasis><Superscript>++</Superscript>: An enhanced tag recommendation system for software information sites

Shaowei Wang David Lo Bogdan Vasilescu Alexander Serebrenik 《Empirical Software Engineering》2018,23(2):800-832

相似文献

4.

Human computation for constraint-based recommenders

Thomas Ulz Michael Schwarz Alexander Felfernig Sarah Haas Amal Shehadeh Stefan Reiterer Martin Stettinger 《Journal of Intelligent Information Systems》2017,49(1):37-57

PeopleViews is a Human Computation based environment for the construction of constraint-based recommenders. Constraint-based recommender systems support the handling of complex items where constraints (e.g., between user requirements and item properties) can be taken into account. When applying such systems, users are articulating their requirements and the recommender identifies solutions on the basis of the constraints in a recommendation knowledge base. In this paper, we provide an overview of the PeopleViews environment and show how recommendation knowledge can be collected from users of the environment on the basis of micro-tasks. We also show how PeopleViews exploits this knowledge for automatically generating recommendation knowledge bases. In this context, we compare the prediction quality of the recommendation approaches integrated in PeopleViews using a DSLR camera dataset. 相似文献

5.

Explaining anomalies in groups with characterizing subspace rules

Meghanath Macha Leman Akoglu 《Data mining and knowledge discovery》2018,32(5):1444-1480

相似文献

6.

Multi-ML: Programming Multi-BSP Algorithms in ML

V. Allombert F. Gava J. Tesson 《International journal of parallel programming》2017,45(2):340-361

bsp is a bridging model between abstract execution and concrete parallel systems. Structure and abstraction brought by bsp allow to have portable parallel programs with scalable performance predictions, without dealing with low-level details of architectures. In the past, we designed bsml for programming bsp algorithms in ml. However, the simplicity of the bsp model does not fit the complexity of today’s hierarchical architectures such as clusters of machines with multiple multi-core processors. The multi-bsp model is an extension of the bsp model which brings a tree-based view of nested components of hierarchical architectures. To program multi-bsp algorithms in ml, we propose the multi-ml language as an extension of bsml where a specific kind of recursion is used to go through a hierarchy of computing nodes. We define a formal semantics of the language and present preliminary experiments which show performance improvements with respect to bsml. 相似文献

7.

Mobile phone name extraction from internet forums: a semi-supervised approach

Yangjie Yao Aixin Sun 《World Wide Web》2016,19(5):783-805

Collecting users’ feedback on products from Internet forums is challenging because users often mention a product with informal abbreviations or nicknames. In this paper, we propose a method named Gren to recognize and normalize mobile phone names from domain-specific Internet forums. Instead of directly recognizing phone names from sentences as in most named entity recognition tasks, we propose an approach to generating candidate names as the first step. The candidate names capture short forms, spelling variations, and nicknames of products, but are not noise free. To predict whether a candidate name mention in a sentence indeed refers to a specific phone model, a Conditional Random Field (CRF)-based name recognizer is developed. The CRF model is trained by using a large set of sentences obtained in a semi-automatic manner with minimal manual labeling effort. Lastly, a rule-based name normalization component maps a recognized name to its formal form. Evaluated on more than 4000 manually labeled sentences with about 1000 phone name mentions, Gren outperforms all baseline methods. Specifically, it achieves precision and recall of 0.918 and 0.875 respectively, with the best feature setting. We also provide detailed analysis of the intermediate results obtained by each of the three components in Gren. 相似文献

8.

MTP: discovering high quality partitions in real world graphs

Yongsub?Lim Won-Jo?Lee Ho-Jin?Choi U?Kang Email author 《World Wide Web》2017,20(3):491-514

Given a real world graph, how can we find a large subgraph whose partition quality is much better than the original? How can we use a partition of that subgraph to discover a high quality global partition? Although graph partitioning especially with balanced sizes has received attentions in various applications, it is known NP-hard, and also known that there is no good cut at a large scale for real graphs. In this paper, we propose a novel approach for graph partitioning. Our first focus is on finding a large subgraph with high quality partitions, in terms of conductance. Despite the difficulty of the task for the whole graph, we observe that there is a large connected subgraph whose partition quality is much better than the original. Our proposed method MTP finds such a subgraph by removing “hub” nodes with large degrees, and taking the remaining giant connected component. Further, we extend MTP to gb MTP (Global Balanced MTP) for discovering a global balanced partition. gb MTP attaches the excluded nodes in MTP to the partition found by MTP in a greedy way. In experiments, we demonstrate that MTP finds a subgraph of a large size with low conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs while our method does, by construction. This improvement in partition quality for the subgraph is especially noticeable for large scale cuts—for a balanced partition, down to 14 % of the original conductance with the subgraph size 70 % of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original. Moreover, gb MTP generally discovers global balanced partitions whose conductance are lower than those found by METIS, the state-of-the-art graph partitioning method. 相似文献

9.

Robust and simple database evolution

Kai Herrmann Hannes Voigt Jonas Rausch Andreas Behrend Wolfgang Lehner 《Information Systems Frontiers》2018,20(1):45-61

相似文献

10.

Translation-based approaches for solving disjunctive temporal problems with preferences

Enrico Giunchiglia Marco Maratea Luca Pulina 《Constraints》2018,23(4):383-402

Disjunctive Temporal Problems (DTPs) with Preferences (DTPPs) extend DTPs with piece-wise constant preference functions associated to each constraint of the form l ≤ x ? y ≤ u, where x,y are (real or integer) variables, and l,u are numeric constants. The goal is to find an assignment to the variables of the problem that maximizes the sum of the preference values of satisfied DTP constraints, where such values are obtained by aggregating the preference functions of the satisfied constraints in it under a “max” semantic. The state-of-the-art approach in the field, implemented in the native DTPP solver Maxilitis, extends the approach of the native DTP solver Epilitis. In this paper we present alternative approaches that translate DTPPs to Maximum Satisfiability of a set of Boolean combination of constraints of the form l?x ? y?u, ? ∈{<,≤}, that extend previous work dealing with constant preference functions only. We prove correctness and completeness of the approaches. Results obtained with the Satisfiability Modulo Theories (SMT) solvers Yices and MathSAT on randomly generated DTPPs and DTPPs built from real-world benchmarks, show that one of our translation is competitive to, and can be faster than, Maxilitis (This is an extended and revised version of Bourguet et al. 2013). 相似文献

11.

Performance and energy impact of parallelization and vectorization techniques in modern microprocessors

Juan M. Cebrián Lasse Natvig Jan Christian Meyer 《Computing》2014,96(12):1179-1193

While Moore’s law states that the number of transistors is approximately doubled every 2 years, powering these transistors simultaneously is only possible as long as Dennard scaling continues. Unfortunately, voltage scaling has slowed down in recent years, and microprocessor designers have hit what is known as the “utilization wall” or the “dark silicon” effect. Vectorization, parallelization, specialization and heterogeneity are the key approaches to deal with this utilization wall. However, how software developers can maximize energy efficiency of these architectures remains an open question. This paper presents an energy evaluation of parallelization using both physical and logical cores (i.e., SMT/Hyper-Threading), vectorization (SSE, Advanced Vector Extensions and NEON) and dynamic core reconfiguration [ \(\hbox {Intel}^{\circledR }\) ’s Turbo Boost Technology (TBT)]. The evaluation spans microprocessors for embedded, laptop, desktop and server markets, since there is a convergence among them towards energy efficiency. The analyzed processors include Intel’s Core \(^\mathrm{TM}\) i5 and i7 family and ARM \(^{\circledR }\) ’s Cortex \(^\mathrm{TM}\) A9 and A15. Results show that software developers should prioritize vectorization over thread parallelism when possible, as it yields better energy efficiency, especially on the Intel platforms. Application scalability can be reduced drastically when using vectorization and threading simultaneously since vectorization increases pressure on the memory subsystem. Intel’s TBT further improves energy efficiency by an additional 10–20 % depending on the number of active threads. 相似文献

12.

Approximating the Sparsest k-Subgraph in Chordal Graphs

Rémi Watrigant Marin Bougeret Rodolphe Giroudeau 《Theory of Computing Systems》2016,58(1):111-132

Given a simple undirected graph G = (V, E) and an integer k < |V|, the Sparsest k-Subgraph problem asks for a set of k vertices which induces the minimum number of edges. As a generalization of the classical independent set problem, Sparsest k-Subgraph is ????-hard and even not approximable unless ?????? in general graphs. Thus, we investigate Sparsest k-Subgraph in graph classes where independent set is polynomial-time solvable, such as subclasses of perfect graphs. Our two main results are the ????-hardness of Sparsest k-Subgraph on chordal graphs, and a greedy 2-approximation algorithm. Finally, we also show how to derive a P T A S for Sparsest k-Subgraph on proper interval graphs. 相似文献

13.

StarTrack: The Next Generation (of Product Review Management Tools)

Stefano Baccianella Andrea Esuli Fabrizio Sebastiani 《New Generation Computing》2013,31(1):47-70

Online product reviews are increasingly being recognized as a gold mine of information for determining product and brand positioning, and more and more companies look for ways of digging this gold mine for nuggets of knowledge that they can then bring to bear in decision making. We present a software system, called StarTrack, that automatically rates a product review according to a number of “stars,” i.e., according to how positive it is. In other words, given a text-only review (i.e., one with no explicit star-rating attached), StarTrack attempts to guess the star-rating that the reviewer would have attached to the review. StarTrack is thus useful for analysing unstructured word-of-mouth on products, such as the comments and reviews about products that are to be found in spontaneous discussion forums, such as newsgroups, blogs, and the like. StarTrack is based on machine learning technology, and as such does not require any re-programming for porting it from one product domain to another. Based on the star-ratings it attributes to reviews, StarTrack can subsequently rank the products in a given set according to how favourably they have been reviewed by consumers. We present controlled experiments in which we evaluate, on two large sets of product reviews crawled from the Web, the accuracy of StarTrack at (i) star-rating reviews, and (ii) ranking the reviewed products based on the automatically attributed star-ratings. 相似文献

14.

BSP-Why: A Tool for Deductive Verification of BSP Algorithms with Subgroup Synchronisation

Jean Fortin Frédéric Gava 《International journal of parallel programming》2016,44(3):574-597

We present bsp-why, a tool for deductive verification of bsp algorithms with subgroup synchronisation. From bsp programs, bsp-why generates sequential codes for the back-end condition generator why and thus benefits from its large range of existing provers. By enabling subgroups, the user can prove the correctness of programs that run on hierarchical machines—e.g. clusters of multi-cores. In general, bsp-why is able to generate proof obligations of mpi programs that only use collective operations. Our case studies are distributed state-space construction algorithms, the basis of model-checking. 相似文献

15.

Improved Approximation Algorithms for Label Cover Problems

Moses Charikar MohammadTaghi Hajiaghayi Howard Karloff 《Algorithmica》2011,61(1):190-206

In this paper we consider both the maximization variant Max Rep and the minimization variant Min Rep of the famous Label Cover problem. So far the best approximation ratios known for these two problems were \(O(\sqrt{n})\) and indeed some authors suggested the possibility that this ratio is the best approximation factor for these two problems. We show, in fact, that there are a O(n ^1/3)-approximation algorithm for Max Rep and a O(n ^1/3log?^2/3 n)-approximation algorithm for Min Rep. In addition, we also exhibit a randomized reduction from Densest k-Subgraph to Max Rep, showing that any approximation factor for Max Rep implies the same factor (up to a constant) for Densest k-Subgraph. 相似文献

16.

Verification as a parameterized testing (experiments with the SCP4 supercompiler)

A. P. Lisitsa A. P. Nemytykh 《Programming and Computer Software》2007,33(1):14-23

Let a program-predicate t testing another program p with respect to a given postcondition be given. Concrete tests d (data of the program p) are input data for t. Let us consider the program t when values of its argument d are unknown. Then a proof of the fact that the prediate t is true for all input data of the program p is verification of p with respect to the given postcondition. In this paper, we describe experiments on automatic verification of a number of cache coherence protocols with the SCP4 supercompiler (an optimizer of programs written in the REFAL-5 functional language). 相似文献

17.

Optimizing network robustness by edge rewiring: a general framework

Hau Chan Leman Akoglu 《Data mining and knowledge discovery》2016,30(5):1395-1425

Spectral measures have long been used to quantify the robustness of real-world graphs. For example, spectral radius (or the principal eigenvalue) is related to the effective spreading rates of dynamic processes (e.g., rumor, disease, information propagation) on graphs. Algebraic connectivity (or the Fiedler value), which is a lower bound on the node and edge connectivity of a graph, captures the “partitionability” of a graph into disjoint components. In this work we address the problem of modifying a given graph’s structure under a given budget so as to maximally improve its robustness, as quantified by spectral measures. We focus on modifications based on degree-preserving edge rewiring, such that the expected load (e.g., airport flight capacity) or physical/hardware requirement (e.g., count of ISP router traffic switches) of nodes remain unchanged. Different from a vast literature of measure-independent heuristic approaches, we propose an algorithm, called EdgeRewire, which optimizes a specific measure of interest directly. Notably, EdgeRewire is general to accommodate six different spectral measures. Experiments on real-world datasets from three different domains (Internet AS-level, P2P, and airport flights graphs) show the effectiveness of our approach, where EdgeRewire produces graphs with both (i) higher robustness, and (ii) higher attack-tolerance over several state-of-the-art methods. 相似文献

18.

An investigation of the fault-proneness of clone evolutionary patterns

Liliane Barbour Le An Foutse Khomh Ying Zou Shaohua Wang 《Software Quality Journal》2018,26(4):1187-1222

Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice. Therefore, a developer needs to be aware of any clone pairs in order to properly propagate any changes between clones. A clone pair may experience many changes during the creation and maintenance of a software system. A change can either maintain or remove the similarity between clones in a clone pair. If a change maintains the similarity between clones, the clone pair is left in a consistent state. When a change makes the clones no longer similar, the clone pair is left in an inconsistent state. The set of states and changes experienced by clone pairs over time form an evolution history known as a clone genealogy. In this paper, we examine clone genealogies to identify fault-prone “patterns” of states and changes. We explore the use of clone genealogy information in fault prediction. We conduct a quasi-experiment with four long-lived software systems (i.e., Apache Ant, ArgoUML, JEdit, Maven) and identify clones using the NiCad and iClones clone detection tools. Overall, we find that the size of the clone can impact the fault-proneness of a clone pair. However, there is no clear impact of the time interval between changes to a clone pair on the fault-proneness of the clone pair. We also discover that adding clone genealogy information can increase the explanatory power of fault prediction models. 相似文献

19.

Toward the complexity of the existence of wonderfully stable partitions and strictly core stable coalition structures in enemy-oriented hedonic games

Anja Rey Jörg Rothe Hilmar Schadrack Lena Schend 《Annals of Mathematics and Artificial Intelligence》2016,77(3-4):317-333

We study the computational complexity of the existence and the verification problem for wonderfully stable partitions (WSPE and WSPV) and of the existence problem for strictly core stable coalition structures (SCSCS) in enemy-oriented hedonic games. In this note, we show that WSPV is NP-complete and both WSPE and SCSCS are DP-hard, where DP is the second level of the boolean hierarchy, and we discuss an approach for classifying the latter two problems in terms of their complexity. 相似文献

20.

Advanced load application on deformable ring gears in planetary gearboxes

T. Kamper S. Flock R. Schelenz 《Multibody System Dynamics》2016,36(4):323-337

This paper discusses the implementation of deformable ring gears in Multi-Body-Simulation-Models (MBS-Models) of planetary gearboxes using the example of a Wind Turbine (WT). For this purpose an add-on to the MBS-Software Simpack was developed and tested by the project partners Bosch Rexroth and the Institute for Machine Elements and Machine Design at Rwth-Aachen University (Ime).The presented research includes a measurement campaign on a test rig owned by Bosch Rexroth to obtain data for a validation of the newly developed add-on. 相似文献