Found 20 similar documents; search took 31 ms
1.
Many environmental, scientific, technical or medical database applications require effective and efficient mining of time
series, sequences or trajectories of measurements taken at different time points and positions forming large temporal or spatial
databases. Particularly the analysis of concurrent and multidimensional sequences poses new challenges in finding clusters
of arbitrary length and varying number of attributes. We present a novel algorithm capable of finding parallel clusters in
different subspaces and demonstrate our results for temporal and spatial applications. Our analysis of structural quality
parameters in rivers is successfully used by hydrologists to develop measures for river quality improvements.
2.
Nowadays data mining plays an important role in decision making. Since many organizations do not possess the in-house expertise
of data mining, it is beneficial to outsource data mining tasks to external service providers. However, most organizations
hesitate to do so due to the concern of loss of business intelligence and customer privacy. In this paper, we present a Bloom
filter based solution that enables organizations to outsource association rule mining tasks while protecting
their business intelligence and customer privacy. Our approach can achieve high precision in data mining by trading off
storage requirements.
This research was supported by the USA National Science Foundation Grants CCR-0310974 and IIS-0546027.
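For readers unfamiliar with the data structure named in this abstract, the following is a minimal sketch of a generic Bloom filter in Python. It is not the outsourcing scheme from the paper, whose details the abstract does not give. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the salted SHA-256 hashing are assumptions made purely for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Generic Bloom filter over strings (illustrative; not the paper's scheme)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: derive the k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical itemsets encoded as strings.
bf = BloomFilter(num_bits=4096, num_hashes=4)
for itemset in ("bread,milk", "beer,diapers"):
    bf.add(itemset)
```

A Bloom filter never reports a stored item as absent, but it can report an absent item as present. Increasing `num_bits` lowers that false-positive rate, which is one plausible reading of the precision-versus-storage trade-off the abstract mentions.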
3.
Support vector machines (SVMs) have been promising methods for classification and regression analysis due to their solid mathematical
foundations, which include two desirable properties: margin maximization and nonlinear classification using kernels. However,
despite these prominent properties, SVMs are usually not chosen for large-scale data mining problems because their training
complexity is highly dependent on the data set size. Unlike traditional pattern recognition and machine learning, real-world
data mining applications often involve huge numbers of data records. Thus it is too expensive to perform multiple scans on
the entire data set, and it is also infeasible to put the data set in memory. This paper presents a method, Clustering-Based SVM (CB-SVM), that maximizes the SVM performance for very large data sets given a limited amount of resource, e.g., memory. CB-SVM applies
a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high quality samples.
These samples carry statistical summaries of the data and maximize the benefit of learning. Our analyses show that the training
complexity of CB-SVM is quadratically dependent on the number of support vectors, which is usually much less than that of
the entire data set. Our experiments on synthetic and real-world data sets show that CB-SVM is highly scalable for very large
data sets and very accurate in terms of classification.
A preliminary version of the paper, “Classifying Large Data Sets Using SVM with Hierarchical Clusters”, by H. Yu, J. Yang, and J. Han, appeared in Proc. 2003 Int. Conf. on Knowledge Discovery in Databases (KDD'03), Washington, DC, August 2003. However, this submission substantially extends the previous paper and contains new
technical contributions that add major value compared with the conference publication.
4.
We present a study of using camera-phones and visual-tags to access mobile services. Firstly, a user-experience study is described in which participants were both observed learning to interact with a prototype mobile service and interviewed
about their experiences. Secondly, a pointing-device task is presented in which quantitative data was gathered regarding the speed and accuracy with which participants aimed and clicked
on visual-tags using camera-phones. We found that participants’ attitudes to visual-tag-based applications were broadly positive,
although they had several important reservations about camera-phone technology more generally. Data from our pointing-device
task demonstrated that novice users were able to aim and click on visual-tags quickly (well under 3 s per pointing-device
trial on average) and accurately (almost all meeting our defined speed/accuracy tradeoff of 6% error-rate). Based on our findings,
design lessons for camera-phone and visual-tag applications are presented.
5.
An important area of Human Reliability Assessment in interactive systems is the ability to understand the causes of human
error and to model their occurrence. This paper investigates a new approach to analysis of task failures based on patterns
of operator behaviour, in contrast with more traditional event-based approaches. It considers, as a case study, a formal model
of an Air Traffic Control system operator’s task which incorporates a simple model of the high-level cognitive processes involved.
The cognitive model is formalised in the CSP process algebra. Various patterns of behaviour that could lead to task failure
are described using temporal logic. Then a model-checking technique is used to verify whether the set of selected behavioural
patterns is sound and complete with respect to the definition of task failure. The decomposition is shown to be incomplete
and a new behavioural pattern is identified, which appears to have been overlooked in the informal analysis of the problem.
This illustrates how formal analysis of operator models can yield fresh insights into how failures may arise in interactive
systems.
6.
In this paper, we deal with mining sequential patterns in multiple time sequences. Building on a state-of-the-art sequential
pattern mining algorithm PrefixSpan for mining transaction databases, we propose MILE (MIning in muLtiple sEquences), an efficient algorithm to facilitate the mining process. MILE recursively utilizes the knowledge of existing patterns
to avoid redundant data scanning, and therefore can effectively speed up the new patterns’ discovery process. Another unique
feature of MILE is that it can incorporate prior knowledge of the data distribution in time sequences into the mining process
to further improve the performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. As
MILE consumes more memory than PrefixSpan, we also present a solution that trades time efficiency for lower memory use in memory-constrained environments.
7.
We provide the complete record of methodology that let us evolve BrilliAnt, the winner of the Ant Wars contest. Ant Wars contestants are virtual ants collecting food on a grid board in the presence
of a competing ant. BrilliAnt has been evolved through a competitive one-population coevolution using genetic programming
and fitnessless selection. In this paper, we detail the evolutionary setup that led to BrilliAnt’s emergence, assess its
direct and indirect human-competitiveness, and describe the behavioral patterns observed in its strategy.
8.
Recently, a new class of data mining methods, known as privacy preserving data mining (PPDM) algorithms, has been developed by the research community working on security and knowledge discovery. The aim of these
algorithms is the extraction of relevant knowledge from large amounts of data, while at the same time protecting sensitive
information. Several data mining techniques, incorporating privacy protection mechanisms, have been developed that allow one
to hide sensitive itemsets or patterns, before the data mining process is executed. Privacy preserving classification methods,
instead, prevent a miner from building a classifier which is able to predict sensitive data. Additionally, privacy preserving
clustering techniques have been recently proposed, which distort sensitive numerical attributes, while preserving general
features for clustering analysis. A crucial issue is to determine which ones among these privacy-preserving techniques better
protect sensitive information. However, this is not the only criterion with respect to which these algorithms can be evaluated.
It is also important to assess the quality of the data resulting from the modifications applied by each algorithm, as well
as the performance of the algorithms. There is thus a need to identify a comprehensive set of criteria with respect to
which to assess the existing PPDM algorithms and determine which algorithm meets specific requirements.
In this paper, we present a first evaluation framework for estimating and comparing different kinds of PPDM algorithms. Then,
we apply our criteria to a specific set of algorithms and discuss the evaluation results we obtain. Finally, some considerations
about future work and promising directions in the context of privacy preservation in data mining are discussed.
*The work reported in this paper has been partially supported by the EU under the IST Project CODMINE and by the Sponsors of
CERIAS.
Editor: Geoff Webb
9.
This paper describes the simulated car racing competition that was arranged as part of the 2007 IEEE Congress on Evolutionary
Computation. The game used as the domain for the competition, the controllers submitted as entries,
and the results are presented. With this paper, we hope to provide some insight into the efficacy of various computational
intelligence methods on a well-defined game task, as well as an example of one way of running a competition. In the process,
we provide a set of reference results for those who wish to use the simplerace game to benchmark their own algorithms. The paper is co-authored by the organizers and participants of the competition.
10.
Listening to music on personal, digital devices whilst mobile is an enjoyable, everyday activity. We explore a scheme for
exploiting this practice to immerse listeners in navigation cues. Our prototype, ONTRACK, continuously adapts audio, modifying
the spatial balance and volume to lead listeners to their target destination. First we report on an initial lab-based evaluation
that demonstrated the approach’s efficacy: users were able to complete tasks within a reasonable time and their subjective
feedback was positive. Encouraged by these results we constructed a handheld prototype. Here, we discuss this implementation
and the results of field-trials. These indicate that even with a low-fidelity realisation of the concept, users can quite
effectively navigate complicated routes.
11.
The paper reflects on the unique experience of social and technological development in Lithuania since the regaining of independence
as a newly reshaped society constructing a distinctive competitive IST-based model at a global level. This has produced a Lithuanian
pattern of how to integrate different experiences and relations between generations in implementing complex information society
approaches. The resulting programme in general is linked to the Lisbon objectives of the European Union. The experience of
transitional countries in Europe, each different but facing some common problems, may be useful to developing countries in
Africa.
12.
This paper presents an algorithm for the complete specification of multinomial discrete choice models to predict the spatial
preferences of attackers. The formulation employed is a modification of models previously applied in transportation flow and
crime analysis. A breaking and entering crime data set is employed to compare the efficacy of this model with traditional
hot spot models. Discrete choice models are shown to perform as well as, or better than, such models and offer more interpretable
results.
13.
Quantitative usability requirements are a critical but challenging, and hence often neglected, aspect of a usability engineering process. A case study is described where quantitative usability requirements played a key role in the development of a new user interface of a mobile phone. Within the practical constraints of the project, existing methods for determining usability requirements and evaluating the extent to which these are met could not be applied as such; tailored methods therefore had to be developed. These methods and their applications are discussed.
14.
Constrained pattern mining extracts patterns based on their individual merit. Usually this results in far more patterns than
a human expert or a machine learning technique could make use of. Often different patterns or combinations of patterns cover
a similar subset of the examples, thus being redundant and not carrying any new information. To remove the redundant information
contained in such pattern sets, we propose two general heuristic algorithms—Bouncer and Picker—for selecting a small subset
of patterns. We identify several selection techniques for use in this general algorithm and evaluate those on several data
sets. The results show that both techniques succeed in severely reducing the number of patterns, while at the same time apparently
retaining much of the original information. Additionally, the experiments show that reducing the pattern set indeed improves
the quality of classification results. Both results show that the developed solutions are very well suited for the goals we
aim at.
15.
In this paper, we present an Inverse Multi-Objective Robust Evolutionary (IMORE) design methodology that handles the presence
of uncertainty without making assumptions about the uncertainty structure. We model the clustering of uncertain events in
families of nested sets using a multi-level optimization search. To reduce the high computational costs of the proposed methodology
we propose schemes for (1) adapting the step-size in estimating the uncertainty, and (2) trimming down the number of calls
to the objective function in the nested search. Both offline and online adaptation strategies are considered in conjunction
with the IMORE design algorithm. Design of Experiments (DOE) approaches further reduce the number of objective function calls
in the online adaptive IMORE algorithm. Empirical studies conducted on a series of test functions having diverse complexities
show that the proposed algorithms converge to a set of Pareto-optimal design solutions with non-dominated nominal and robustness
performances efficiently.
16.
The complexity of group dynamics occurring in small group interactions often hinders the performance of teams. The availability
of rich multimodal information about what is going on during the meeting makes it possible to explore the possibility of providing
support to dysfunctional teams from facilitation to training sessions addressing both the individuals and the group as a whole.
A necessary step in this direction is that of capturing and understanding group dynamics. In this paper, we discuss a particular
scenario, in which meeting participants receive multimedia feedback on their relational behaviour, as a first step towards
increasing self-awareness. We describe the background and the motivation for a coding scheme for annotating meeting recordings
partially inspired by Bales’ Interaction Process Analysis. This coding scheme was aimed at identifying suitable observable
behavioural sequences. The study is complemented with an experimental investigation on the acceptability of such a service.
17.
Awareness systems have attracted significant research interest for their potential to support interpersonal relationships.
Investigations of awareness systems for the domestic environment have suggested that such systems can help individuals stay
in touch with dear friends or family and provide affective benefits to their users. Our research provides empirical evidence
to refine and substantiate such suggestions. We report our experience with designing and evaluating the ASTRA awareness system,
for connecting households and mobile family members. We introduce the concept of connectedness and its measurement through
the Affective Benefits and Costs of communication questionnaire (ABC-Q). We report results that attest to the benefits of sharing
experiences at the moment they happen without interrupting potential receivers. Finally, we document the role that lightweight,
picture-based communication can play in the range of communication media available.
18.
There are only a few ethical regulations that deal explicitly with robots, in contrast to a vast number of regulations that
may be applied. We will focus on ethical issues with regard to “responsibility and autonomous robots”, “machines as a replacement
for humans”, and “tele-presence”. Furthermore we will examine examples from special fields of application (medicine and healthcare,
armed forces, and entertainment). We do not claim to present a complete list of ethical issues or of regulations in the field
of robotics, but we will demonstrate that there are legal challenges with regard to these issues.
19.
Frequent pattern mining on data streams has recently attracted interest. However, it is not easy for users to determine a proper frequency
threshold. It is more reasonable to ask users to set a bound on the result size. We study the problem of mining top K frequent itemsets in data streams. We introduce a method based on the Chernoff bound with a guarantee of the output quality
and also a bound on the memory usage. We also propose an algorithm based on the Lossy Counting Algorithm. In most of the experiments
of the two proposed algorithms, we obtain perfect solutions and the memory space occupied by our algorithms is very small.
In addition, we propose adapted versions of these two algorithms to handle the case in which we are interested in
mining the data in a sliding window. The experiments show that the results are accurate.
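As background for the second algorithm mentioned above, here is a minimal sketch of the classic Lossy Counting procedure for single items, with a simple top-K readout added on top. It is not the paper's algorithm, which the abstract does not specify. The function names `lossy_count` and `top_k`, the parameter `epsilon`, and the example stream are assumptions for illustration only.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple

Entry = Tuple[int, int]  # (observed count, maximum possible undercount)


def lossy_count(stream: Iterable[Hashable], epsilon: float) -> Tuple[Dict[Hashable, Entry], int]:
    """Classic Lossy Counting: approximate item counts in bounded memory."""
    width = math.ceil(1 / epsilon)  # number of items per bucket
    entries: Dict[Hashable, Entry] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in entries:
            count, delta = entries[item]
            entries[item] = (count + 1, delta)
        else:
            # A new item may have been seen and pruned in earlier buckets.
            entries[item] = (1, bucket - 1)
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            entries = {k: (c, d) for k, (c, d) in entries.items() if c + d > bucket}
    return entries, n


def top_k(entries: Dict[Hashable, Entry], k: int) -> List[Tuple[Hashable, int]]:
    """Return the k surviving items with the largest observed counts."""
    ranked = sorted(entries.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(item, count) for item, (count, _delta) in ranked[:k]]


# Hypothetical stream of 100 items: "a" and "b" dominate.
stream = ["a"] * 50 + ["b"] * 30 + ["c", "d", "e", "f"] * 5
entries, n = lossy_count(stream, epsilon=0.1)
print(top_k(entries, 2))
```

In this scheme an observed count can undershoot the true count by at most `epsilon * n`, so a smaller `epsilon` gives tighter counts at the cost of keeping more entries in memory.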
20.
Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial
search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support.
Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly
correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns.
In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the
items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine
similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support
property which can help efficiently eliminate spurious patterns involving items with substantially different support levels.
Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures.
In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties
of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that
hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.
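The abstract does not spell out the h-confidence measure. The sketch below assumes the commonly used definition: the support of the itemset divided by the largest support of any single item in it. The function names `support` and `h_confidence` and the toy transactions are hypothetical, and this is not the hyperclique miner algorithm itself.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def h_confidence(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Assumed definition: supp(P) divided by the maximum single-item support in P."""
    items = set(itemset)
    max_item_support = max(support([i], transactions) for i in items)
    if max_item_support == 0:
        return 0.0
    return support(items, transactions) / max_item_support


# Toy data: "a" and "b" always co-occur; "c" is frequent but unrelated to "a".
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}] + [{"c"} for _ in range(7)]

print(h_confidence({"a", "b"}, transactions))  # strongly associated pair
print(h_confidence({"a", "c"}, transactions))  # cross-support pair
```

Under this definition, an itemset that mixes a rare item with a very frequent one cannot score highly, because its support is capped by the rare item while the denominator is set by the frequent one. That is the intuition behind the cross-support property described in the abstract.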