Found 20 similar documents; search time: 0 ms
1.
Christos Anagnostopoulos 《Applied Intelligence》2016,45(4):1034-1046
Online statistical and machine learning analytic tasks over large-scale contextual data streams, coming from, e.g., wireless sensor networks and Internet of Things environments, have gained great popularity due to their significance in knowledge extraction, regression and classification tasks, and, more generally, in making sense of large-scale streaming data. The quality of the received contextual information, however, impacts predictive analytics tasks, especially when dealing with uncertain data, outliers, and missing values. Low-quality contextual data significantly spoils progressive inference and online statistical reasoning tasks, introducing bias into the induced knowledge, e.g., classification and decision making. To alleviate such situations, which are not rare in real-time contextual information processing systems, we propose a progressive, time-optimized, data quality-aware mechanism that attempts to deliver contextual information of high quality to predictive analytics engines by progressively introducing a certain controlled delay. This mechanism delivers data of as high quality as possible, thus eliminating possible biases in knowledge extraction and predictive analysis tasks. We propose an analytical model for this mechanism and show the benefits stemming from this approach through a comprehensive experimental evaluation and a comparative assessment with quality-unaware methods over real multivariate sensory contextual data.
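The delivery policy sketched in this abstract (hold low-quality readings for a bounded, controlled delay in the hope that better data arrives) can be illustrated as follows. This is a hypothetical sketch of the general idea, not the authors' analytical model; the quality threshold, delay budget, and flush policy are invented for illustration.

```python
from collections import deque

def quality_aware_deliver(stream, q_threshold=0.8, max_delay=3):
    """Deliver readings to an analytics engine, holding low-quality
    ones for up to `max_delay` steps in the hope that a higher-quality
    reading arrives (illustrative policy only)."""
    held = deque()          # (age, reading) pairs awaiting delivery
    delivered = []
    for reading in stream:  # reading = (value, quality in [0, 1])
        # Age the held readings; flush any that hit the delay budget.
        held = deque((age + 1, r) for age, r in held)
        while held and held[0][0] >= max_delay:
            delivered.append(held.popleft()[1])
        if reading[1] >= q_threshold:
            # High quality: release anything still held, then deliver.
            delivered.extend(r for _, r in held)
            held.clear()
            delivered.append(reading)
        else:
            held.append((0, reading))
    delivered.extend(r for _, r in held)   # end of stream: flush
    return delivered

stream = [(10.0, 0.9), (11.5, 0.4), (10.9, 0.95), (99.0, 0.2)]
print(quality_aware_deliver(stream))
```

Nothing is dropped here; the controlled delay merely reorders delivery so that high-quality readings reach the engine as early as possible.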
2.
Seyed-Mehdi-Reza Beheshti, Boualem Benatallah, Hamid Reza Motahari-Nezhad 《Distributed and Parallel Databases》2016,34(3):379-423
In today’s knowledge-, service-, and cloud-based economy, businesses accumulate massive amounts of data from a variety of sources. Understanding a business may require considerable analytics over large hybrid collections of heterogeneous and partially unstructured data captured during process execution. This data, usually modeled as graphs, increasingly shows all the typical properties of big data: wide physical distribution, diversity of formats, non-standard data models, and independently managed, heterogeneous semantics. We use the term big process graph to refer to such large hybrid collections of heterogeneous and partially unstructured process-related execution data. Online analytical processing (OLAP) of big process graphs is challenging, as extending existing OLAP techniques to the analysis of graphs is not straightforward. Moreover, process data analysis methods should be capable of processing and querying large amounts of data effectively and efficiently, and therefore have to scale well with the infrastructure. While traditional analytics solutions (relational databases, data warehouses, and OLAP) do a great job of collecting data and providing answers to known questions, key business insights remain hidden in the interactions among objects: it is hard to discover concept hierarchies for entities based on both data objects and their interactions in process graphs. In this paper, we introduce a framework and a set of methods to support scalable graph-based OLAP analytics over process execution data. The goal is to facilitate analytics over big process graphs by summarizing the process graph and providing multiple views at different granularities. To achieve this goal, we present a model for process OLAP (P-OLAP) and define OLAP-specific abstractions in the process context, such as process cubes, dimensions, and cells. We present a MapReduce-based graph processing engine to support big data analytics over process graphs. We have implemented the P-OLAP framework and integrated it into our existing process data analytics platform, ProcessAtlas, which introduces a scalable architecture for querying, exploring, and analyzing large process data. We report on experiments performed on both synthetic and real-world datasets that show the viability and efficiency of the approach.
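The MapReduce-style aggregation underlying such graph OLAP can be sketched in miniature: map over edges emitting (cube cell, count) pairs, then reduce by summing per cell. The toy graph, the single "month" dimension, and the node-degree measure are invented for illustration and are not taken from the P-OLAP framework.

```python
from collections import defaultdict

# Toy process graph: (source task, target task, month) execution edges.
edges = [
    ("order", "ship", "Jan"), ("order", "bill", "Jan"),
    ("order", "ship", "Feb"), ("ship", "close", "Feb"),
]

def map_phase(edge):
    """Emit one (cell, 1) pair per endpoint; a cell is a (month, node)
    coordinate of a toy process cube with a single 'month' dimension."""
    src, dst, month = edge
    yield ((month, src), 1)
    yield ((month, dst), 1)

def reduce_phase(pairs):
    """Sum counts per cube cell (the shuffle + reduce of MapReduce)."""
    cells = defaultdict(int)
    for key, value in pairs:
        cells[key] += value
    return dict(cells)

pairs = [kv for e in edges for kv in map_phase(e)]
cells = reduce_phase(pairs)
print(cells[("Jan", "order")])   # degree of 'order' within the Jan cell -> 2
```

Drilling down or rolling up amounts to changing the key emitted by the map phase, which is what makes this pattern attractive for multi-granularity process views.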
3.
Yuan X, He X, Guo H, Guo P, Kendall W, Huang J, Zhang Y 《IEEE transactions on visualization and computer graphics》2010,16(6):1413-1420
Over the past few years, large human populations around the world have been affected by an increase in significant seismic activity. For both conducting basic scientific research and setting critical government policies, it is crucial to be able to explore and understand seismic and geographical information obtained through scientific instruments. In this work, we present a visual analytics system that enables explorative visualization of seismic data together with satellite-based observational data, and we introduce a suite of visual analytical tools. Seismic and satellite data are integrated temporally and spatially. Users can select temporal and spatial ranges to zoom in on specific seismic events, as well as inspect changes both during and after the events. Tools for designing high-dimensional transfer functions have been developed to enable efficient and intuitive comprehension of the multi-modal data. Spreadsheet-style comparisons are used for data drill-down as well as presentation. Comparisons between distinct seismic events are also provided for characterizing event-wise differences. Our system has been designed for scalability in terms of data size, complexity (i.e., number of modalities), and varying form factors of display environments.
4.
Research into service provision and innovation is becoming progressively more important as automated service provision via the web matures as a technology. We describe a web-based targeting platform that uses advanced dynamic model-building techniques to conduct intelligent reporting and modeling. The impact of the automated targeting services is realized through a knowledge base that drives the development of predictive models. The knowledge base comprises a rules engine that guides and evaluates an automated, template-driven model-building process. The template defines the model classifier (e.g., logistic regression, multinomial logit, ordinary least squares) together with rules for data filling and transformations. The template also defines which variables to test (“include” rules) and which variables to retain (“keep” rules). The final model emerges from the iterative steps undertaken by the rules engine and is used to target, or rank, the best prospects. This automated modeling approach is designed to cost-effectively assist businesses in their targeting activities, independent of the firm’s size and targeting needs. We describe how the service has been used to provide targeting services for a small-to-medium business direct marketing campaign and for direct sales-force targeting in a larger firm. Empirical results suggest that the automated modeling approach provides superior service in terms of cost and timing compared with more traditional manual service provision.
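A minimal sketch of such an "include"/"keep" rules loop, assuming a plain OLS model and an R-squared keep criterion; both the criterion and the synthetic data are invented here, and the paper's rules engine and templates are far richer.

```python
import numpy as np

def r2(X, y):
    """R-squared of an OLS fit of y on X (intercept included)."""
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    yc = y - y.mean()
    return 1.0 - (resid @ resid) / (yc @ yc)

def rules_engine(X, y, keep_gain=0.01):
    """'Include' rule: try every not-yet-kept variable.
    'Keep' rule: retain a variable only if it lifts R-squared by
    at least keep_gain. Iterate until no rule fires."""
    kept, best = [], 0.0
    fired = True
    while fired:
        fired = False
        for j in range(X.shape[1]):
            if j in kept:
                continue
            score = r2(X[:, kept + [j]], y)
            if score - best >= keep_gain:
                kept, best = kept + [j], score
                fired = True
    return kept, best

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                       # column 2 is pure noise
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)
kept, best = rules_engine(X, y)
print(kept)    # -> [0, 1]: the engine keeps only the informative variables
```

Swapping the classifier or the keep criterion corresponds to changing the template, which is the point of the template-driven design.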
7.
Li Li, Li Xiaotong, Qi Wenmin, Zhang Yue, Yang Wensheng 《Electronic Commerce Research》2022,22(2):321-350
Electronic coupon (e-coupon) is one of the most important marketing tools in B2C e-commerce. To improve the e-coupon redemption rate and reduce marketing costs, it is...
8.
Quality-guaranteed aggregation-based model predictive control and stability analysis (cited by 1: 0 self-citations, 1 by others)
The input aggregation strategy can reduce the online computational burden of the model predictive controller, but an aggregation-based MPC controller may generally yield poor control quality. Therefore, a new concept, equivalent aggregation, is proposed to guarantee the control quality of aggregation-based MPC. Within the general framework of input linear aggregation, design methods for equivalent aggregation are developed for unconstrained and terminal-zero-constrained MPC, guaranteeing that the actual control inputs are exactly equal to those of the original MPC. For constrained MPC, quasi-equivalent aggregation strategies are also discussed, aiming to make the difference between the control inputs of aggregation-based MPC and those of the original MPC as small as possible. Stability conditions are also given for quasi-equivalent aggregation-based MPC.
Supported by the National Natural Science Foundation of China (Grant No. 60674041) and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20070248004).
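The flavour of input linear aggregation can be shown numerically with a toy unconstrained LQ-MPC in which the horizon's moves are blocked into a few held values, U = M·u_agg. The double-integrator model, horizon, weights, and blocking pattern below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Double-integrator model x+ = A x + B u (illustrative).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
N = 8                      # prediction horizon

def prediction_matrices(A, B, N):
    """Stack predictions: x_k = Phi x0 + Gamma U for k = 1..N."""
    n, m = B.shape
    Phi = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
    Gamma = np.zeros((N * n, N * m))
    for k in range(1, N + 1):
        for j in range(k):
            Gamma[(k-1)*n:k*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, k-1-j) @ B
    return Phi, Gamma

Phi, Gamma = prediction_matrices(A, B, N)
x0 = np.array([5.0, 0.0])
r = 0.1                    # input weight

# Full MPC: minimise ||Phi x0 + Gamma U||^2 + r||U||^2 over all N moves.
U_full, *_ = np.linalg.lstsq(
    np.vstack([Gamma, np.sqrt(r) * np.eye(N)]),
    np.concatenate([-Phi @ x0, np.zeros(N)]), rcond=None)

# Input linear aggregation (move blocking): U = M u_agg, only 3 free moves.
blocks = [2, 3, 3]
M = np.zeros((N, len(blocks)))
row = 0
for i, b in enumerate(blocks):
    M[row:row + b, i] = 1.0    # each free move is held over a block
    row += b
u_agg, *_ = np.linalg.lstsq(
    np.vstack([Gamma @ M, np.sqrt(r) * M]),
    np.concatenate([-Phi @ x0, np.zeros(N)]), rcond=None)
U_blocked = M @ u_agg

print("free variables:", N, "->", len(blocks))
print("first move (full vs blocked):", U_full[0], U_blocked[0])
```

The online problem shrinks from N to len(blocks) decision variables; equivalent aggregation, as described in the abstract, chooses M so the first applied move matches the original MPC exactly.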
9.
10.
A visual analytics agenda (cited by 5: 0 self-citations, 5 by others)
Researchers have made significant progress in disciplines such as scientific and information visualization, statistically based exploratory and confirmatory analysis, data and knowledge representation, and the perceptual and cognitive sciences. Although some research is being done in this area, the pace at which new technologies and technical talent become available is far too slow to meet the urgent need. The National Visualization and Analytics Center's goal is to advance the state of the science to enable analysts to detect the expected and discover the unexpected from massive and dynamic information streams and databases consisting of data of multiple types and from multiple sources, even though the data are often conflicting and incomplete. Visual analytics is a multidisciplinary field that includes the following focus areas: (i) analytical reasoning techniques, (ii) visual representations and interaction techniques, (iii) data representations and transformations, and (iv) techniques to support the production, presentation, and dissemination of analytical results. The R&D agenda for visual analytics addresses technical needs in each of these focus areas, as well as recommendations for speeding the movement of promising technologies into practice. This article provides only a concise summary of the R&D agenda. We encourage reading, discussion, and debate, as well as active innovation toward the agenda for visual analysis.
11.
Harry Jiannan Wang, J. Leon Zhao 《Decision Support Systems》2011,51(3):562-575
In a globalized economic environment with volatile business requirements, continuous process improvement needs to be done regularly in various organizations. However, maintaining the consistency of workflow models under frequent changes is a significant challenge in the management of corporate information services. Unfortunately, few formal approaches are found in the literature for managing workflow changes systematically. In this paper, we propose an analytical framework for workflow change management through formal modeling of workflow constraints, leading to an approach called Constraint-centric Workflow Change Analytics (CWCA). A core component of CWCA is the formal definition and analysis of workflow change anomalies. We operationalize CWCA by developing a change anomaly detection algorithm and validate it in the context of procurement management. A prototype system based on an open-source rule engine is presented to provide a proof-of-concept implementation of CWCA.
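A toy version of constraint-centric change checking, with one invented ordering constraint and one invented change operation; the paper's anomaly definitions and detection algorithm are far more general.

```python
# Toy constraint-centric change check (illustrative; not the CWCA algorithm).
# A workflow is a set of tasks plus flows; a change is validated by
# re-evaluating every declared constraint on the changed model.

workflow = {
    "tasks": {"request", "approve", "pay"},
    "flows": {("request", "approve"), ("approve", "pay")},
}

# Constraint: an approval step must exist and precede payment.
def approval_before_payment(wf):
    return "approve" in wf["tasks"] and ("approve", "pay") in wf["flows"]

constraints = {"approval_before_payment": approval_before_payment}

def detect_anomalies(wf, change):
    """Apply a change (a function mutating the model) and report every
    constraint the changed workflow now violates."""
    change(wf)
    return [name for name, check in constraints.items() if not check(wf)]

# Change: delete the approval step -- a classic change anomaly.
def drop_approval(wf):
    wf["tasks"].discard("approve")
    wf["flows"] = {f for f in wf["flows"] if "approve" not in f}

anomalies = detect_anomalies(workflow, drop_approval)
print(anomalies)   # -> ['approval_before_payment']
```

Encoding constraints as executable predicates is what lets a rules engine, as in the paper's prototype, re-check a whole model automatically after every change.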
12.
13.
Ricardo Dunia, Thomas F. Edgar, Terry Blevins, Willy Wojsznis 《Journal of Process Control》2012,22(8):1445-1456
Batch process monitoring methods, such as multiway PCA and multiblock multiway PLS, use process variable time profiles to normalize and define the most likely trajectories for statistical process control. Nevertheless, a counterpart for continuous process analytics has not been developed or addressed in the literature. This paper presents a novel methodology that defines “state variables” to determine the multiple operating points around which a continuous process operates. In this manner, the operating region is divided into multiple regions (states), and shifts in operating conditions are captured by such state variables. Transition trajectories between states are calculated to determine the most likely path from one state to another. This methodology is referred to as multistate analytics and can be implemented in the context of empirical monitoring methods, namely multistate PLS and multistate PCA. A case study using data from a carbon dioxide removal process shows that multistate analytics is beneficial for the statistical monitoring of continuous processes.
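The multistate idea (divide the operating region into states, then monitor each sample against its own state's limits) can be caricatured without PCA or PLS, using centroids and distance limits in place of latent-variable charts. Everything below, data included, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two operating regions (states) of a continuous process -- toy data.
state_a = rng.normal([10.0, 50.0], 0.5, size=(100, 2))
state_b = rng.normal([20.0, 30.0], 0.5, size=(100, 2))

# "State variables" reduced here to the centroids of the regions.
centroids = np.array([state_a.mean(axis=0), state_b.mean(axis=0)])

def dists(X, c):
    return np.linalg.norm(np.atleast_2d(X) - c, axis=1)

# Per-state control limit: 99th percentile of in-state training distances.
limits = np.array([np.percentile(dists(state_a, centroids[0]), 99),
                   np.percentile(dists(state_b, centroids[1]), 99)])

def monitor(x):
    """Assign a new sample to its nearest state; flag it if it falls
    outside that state's control limit (a stand-in for T2/SPE charts)."""
    d = np.array([dists(x, c)[0] for c in centroids])
    state = int(np.argmin(d))
    return state, bool(d[state] > limits[state])

print(monitor([10.2, 49.8]))   # near state 0 -> not flagged
print(monitor([15.0, 40.0]))   # between states -> flagged
```

In the multistate PCA/PLS setting of the paper, the distance check is replaced by per-state latent-variable statistics, and transitions between states get their own trajectory models.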
14.
Ricardo Almeida Silva, João Moura Pires, Nuno Datia, Maribel Yasmina Santos, Bruno Martins, Fernando Birra 《Multimedia Tools and Applications》2019,78(23):32805-32847
Crimes, forest fires, accidents, infectious diseases, or human interactions with mobile devices (e.g., tweets) are being logged as spatiotemporal events. For...
15.
Wong PC, Foote H, Chin G, Mackey P, Perrine K 《IEEE transactions on visualization and computer graphics》2006,12(6):1399-1413
We present a visual analytics technique for exploring graphs using the concept of a data signature. A data signature, in our context, is a multidimensional vector that captures the local topology information surrounding each graph node. Signature vectors extracted from a graph are projected onto a low-dimensional scatterplot through the use of scaling. The resulting scatterplot, which reflects the similarities of the vectors, allows analysts to examine the graph structures and their corresponding real-life interpretations through repeated brushing and linking between the two visualizations. The interpretation of the graph structures is based on the outcomes of multiple participatory analysis sessions with intelligence analysts conducted by the authors at the Pacific Northwest National Laboratory. The paper first uses three public-domain data sets with either well-known or obvious features to explain the rationale of our design and illustrate its results. More advanced examples are then used in a customized usability study to evaluate the effectiveness and efficiency of our approach. The study results reveal not only the limitations and weaknesses of the traditional approach based solely on graph visualization, but also the advantages and strengths of the signature-guided approach presented in this paper.
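A minimal sketch of the signature idea: compute a local-topology vector per node and project the vectors to 2-D for a scatterplot. The particular signature (own degree plus the largest neighbour degrees) and the use of PCA in place of the paper's scaling method are assumptions made for illustration.

```python
import numpy as np

# Toy undirected graph as an adjacency dict (illustrative).
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},   # a triangle ...
    3: {2, 4}, 4: {3, 5}, 5: {4},         # ... trailing into a chain
}

deg = {v: len(nbrs) for v, nbrs in graph.items()}

def signature(v, k=3):
    """Local-topology vector: own degree plus the k largest neighbour
    degrees, zero-padded. One of many possible signature choices."""
    nd = sorted((deg[u] for u in graph[v]), reverse=True)[:k]
    return [deg[v]] + nd + [0] * (k - len(nd))

S = np.array([signature(v) for v in sorted(graph)], dtype=float)

# Project the signatures to 2-D with PCA (a stand-in for scaling).
Sc = S - S.mean(axis=0)
U, s, Vt = np.linalg.svd(Sc, full_matrices=False)
coords = Sc @ Vt[:2].T
print(coords.shape)     # one 2-D point per node for the scatterplot
```

Nodes with the same local topology, such as the two degree-2 triangle corners here, land on the same scatterplot point, which is exactly what lets brushing a cluster select structurally similar nodes in the graph view.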
16.
Trends in big data analytics (cited by 1: 0 self-citations, 1 by others)
Karthik Kambatla, Giorgos Kollias, Vipin Kumar, Ananth Grama 《Journal of Parallel and Distributed Computing》2014
One of the major applications of future-generation parallel and distributed systems is big-data analytics. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size. Beyond their sheer magnitude, these datasets and their associated applications pose significant challenges for method and software development. Datasets are often distributed, and their size and privacy considerations warrant distributed techniques. Data often resides on platforms with widely varying computational and network capabilities. Considerations of fault tolerance, security, and access control are critical in many applications (Dean and Ghemawat, 2004; Apache Hadoop). Analysis tasks often have hard deadlines, and data quality is a major concern in yet other applications. For most emerging applications, data-driven models and methods capable of operating at scale are as yet unknown. Even when known methods can be scaled, validation of results is a major issue. Characteristics of hardware platforms and the software stack fundamentally impact data analytics. In this article, we provide an overview of the state of the art and focus on emerging trends to highlight the hardware, software, and application landscape of big-data analytics.
17.
We intend to make the growing amount of sports performance data easier to interpret by finding extreme data points. In archetypoid analysis, each datum is expressed as a mixture of actual observations (archetypoids). This allows us to identify not only extreme athletes and teams, but also the composition of other athletes (or teams) in terms of the archetypoid athletes, and to establish a ranking. The utility of archetypoids in sports is illustrated with basketball and soccer data in three scenarios. First, with multivariate data, where they are compared with alternative methods and show the best results. Second, although functional data (time series or trajectories) are common in sports, functional data analysis has not been exploited until now due to the sparseness of the functions; we extend archetypoid analysis to sparse functional data, further showing the potential of functional data analysis in sports analytics. Finally, in the third scenario, features are not available, so we use proximities, extending archetypoid analysis to data with asymmetric relations. This study provides valuable knowledge about player, team, and league performance that can be used to analyze athletes’ careers.
19.
Paulo Sérgio Almeida, Carlos Baquero, Nuno Preguiça 《Information Processing Letters》2007,101(6):255-261
Bloom filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori, based on the number of elements to store and the desired false-positive probability, and it is impossible to store extra elements without increasing the false-positive probability. This typically leads to a conservative assumption about the maximum set size, possibly off by orders of magnitude, and consequent wasted space. This paper proposes Scalable Bloom Filters, a variant of Bloom filters that can adapt dynamically to the number of elements stored while assuring a maximum false-positive probability.
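A compact sketch of the scheme, assuming invented sizing constants (initial capacity 128, growth factor 2, tightening ratio 0.5): when a stage fills, a larger stage with a tighter error bound is appended, so the stage errors form a geometric series bounded by the target.

```python
import hashlib
import math

class BloomFilter:
    """Classic Bloom filter with double hashing (illustrative sizing)."""
    def __init__(self, capacity, error):
        self.capacity, self.error, self.count = capacity, error, 0
        self.k = max(1, math.ceil(-math.log2(error)))        # hash count
        self.m = math.ceil(capacity * self.k / math.log(2))  # bit count
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1
                   for p in self._positions(item))

class ScalableBloomFilter:
    """When a stage fills, append a larger stage with a tighter error,
    keeping the compound false-positive rate bounded by `error`."""
    def __init__(self, error=0.01, growth=2, tightening=0.5):
        self.growth, self.tightening = growth, tightening
        # Stage i gets error * (1 - r) * r**i, a series summing to < error.
        self.filters = [BloomFilter(128, error * (1 - tightening))]

    def add(self, item):
        last = self.filters[-1]
        if last.count >= last.capacity:
            self.filters.append(BloomFilter(last.capacity * self.growth,
                                            last.error * self.tightening))
        self.filters[-1].add(item)

    def __contains__(self, item):
        return any(item in f for f in self.filters)

sbf = ScalableBloomFilter()
for i in range(1000):              # well past the initial capacity of 128
    sbf.add(f"item-{i}")
print(len(sbf.filters))            # -> 4: the filter grew by adding stages
print(all(f"item-{i}" in sbf for i in range(1000)))   # no false negatives
```

Queries probe every stage, so lookup cost grows with the number of stages; the paper's geometric capacity growth keeps that number logarithmic in the set size.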