Similar literature
1.
Today, a large volume of hotel reviews is available on many websites, such as TripAdvisor (http://www.tripadvisor.com) and Orbitz (http://www.orbitz.com). A typical review contains an overall rating, several aspect ratings, and review text. A rating summarizes the review as a numerical score. The task of aspect-based opinion summarization is to extract the aspect-specific opinions hidden in reviews that do not have aspect ratings, so that users can quickly digest them without reading through the full text. The task consists of aspect identification and aspect rating inference. Most existing studies cannot utilize aspect ratings, which are becoming increasingly abundant on review sites. In this paper, we propose two topic models which explicitly model aspect ratings as observed variables to improve the performance of aspect rating inference on unrated reviews. The experimental results show that our approaches outperform the existing methods on a data set crawled from the TripAdvisor website.

2.
In the era of big data, with digital information being collected and/or produced at unprecedented volumes in several application domains, it is becoming more and more difficult to manage and query large data repositories. In the framework of the PetaSky project (http://com.isima.fr/Petasky), we focus on the problem of managing scientific data in the field of cosmology. The data we consider are those of the LSST project (http://www.lsst.org/). The overall size of the database that will be produced is expected to exceed 60 PB (LSST data challenge handbook, 2012). In order to evaluate the performance of existing SQL-on-MapReduce data management systems, we conducted extensive experiments using data and queries from the area of cosmology. The goal of this work is to report on the ability of such systems to support large-scale declarative queries. We mainly investigated the impact of data partitioning, indexing and compression on query execution performance.

3.
Twitter (http://twitter.com) is one of the most popular social networking platforms. Twitter users can easily broadcast disaster-specific information, which, if effectively mined, can assist in relief operations. However, the brevity and informal nature of tweets pose a challenge to Information Retrieval (IR) researchers. In this paper, we successfully use word embedding techniques to improve ranking for ad-hoc queries on microblog data. Our experiments with the ‘Social Media for Emergency Relief and Preparedness’ (SMERP) dataset provided at an ECIR 2017 workshop show that these techniques outperform conventional term-matching based IR models. In addition, we show that, for the SMERP task, our word embedding based method is more effective when the embeddings are generated from the disaster-specific SMERP data than when they are trained on the large social media collection provided for the TREC (http://trec.nist.gov/) 2011 Microblog track.

4.
Rapid building detection using machine learning (total citations: 1; self-citations: 0; citations by others: 1)
This work describes algorithms for performing discrete object detection, specifically in the case of buildings, where usually only low-quality RGB-only geospatial reflective imagery is available. We utilize new candidate search and feature extraction techniques to reduce the problem to a machine learning (ML) classification task. Here we can harness the complex patterns of contrast features contained in training data to establish a model of buildings. We avoid costly sliding windows to generate candidates; instead, we stitch together well-known image processing techniques to produce candidates for building detection that cover 80–85 % of buildings. Reducing the number of possible candidates is important due to the scale of the problem. Each candidate is subjected to classification which, although linear, costs time and prohibits large-scale evaluation. We propose a candidate alignment algorithm that boosts classification performance to 80–90 % precision in linear time, and show that it has negligible cost. We also propose a new concept called a Permutable Haar Mesh (PHM), which we use to form and traverse a search space to recover candidate buildings that were lost in the initial preprocessing phase. All code and datasets from this paper are made available online (http://kdl.cs.umb.edu/w/datasets/ and https://github.com/caitlinkuhlman/ObjectDetectionCLUtility).

5.
With the explosion of data production, the efficiency of data management and analysis has become a concern for both industry and academia. Meanwhile, more and more energy is consumed by IT infrastructure, especially by large-scale distributed systems. In this paper, a novel idea for optimizing the Energy Consumption (EC for short) of a MapReduce system is proposed. We argue that fair data placement helps to save energy; we then propose three goals of data placement and a modulo-based Data Placement Algorithm (DPA for short) which achieves these goals. The correctness of the proposed DPA is then established from both theoretical and experimental perspectives. Three different systems which implement the MapReduce model with different DPAs are compared in our experiments. Our algorithm is shown to optimize EC effectively, without introducing additional costs or delaying data loading. With the help of our DPA, the EC for WordCount (https://src/examples/org/apache/hadoop/examples/), Sort (https://src/examples/org/apache/hadoop/examples/sort) and MRBench (https://src/examples/org/apache/hadoop/mapred/) can be reduced by 10.9 %, 8.3 % and 17 % respectively, and time consumption is reduced by 7 %, 6.3 % and 7 % respectively.
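The abstract does not spell out the DPA itself, so the following is only a minimal sketch of what a generic modulo-based placement could look like, assuming blocks are numbered consecutively and each block goes to the node given by its index modulo the node count. The function name `place_blocks` and its parameters are hypothetical and are not taken from the paper.

```python
def place_blocks(num_blocks, num_nodes):
    """Assign block i to node (i mod num_nodes).

    Hypothetical illustration of modulo placement, not the paper's DPA.
    Returns a dict mapping node index -> list of block indices.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    placement = {node: [] for node in range(num_nodes)}
    for block in range(num_blocks):
        placement[block % num_nodes].append(block)
    return placement


if __name__ == "__main__":
    # With 7 blocks on 3 nodes, per-node block counts differ by at most one.
    for node, blocks in place_blocks(7, 3).items():
        print(node, blocks)
```

Under this assumption the placement is "fair" in the narrow sense that no node holds more than one block above any other; the paper's three placement goals may ask for more than that.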

6.
In our studies of global software engineering (GSE) teams, we found that informal, non-work-related conversations are positively associated with trust. Seeking to use novel analytical techniques to investigate this phenomenon more carefully, we described these non-work-related conversations by adapting the economics literature concept of “cheap talk,” and studied it using Evolutionary Game Theory (EGT). More specifically, we modified the classic Stag-hunt game and analyzed the dynamics in a fixed population setting (an abstraction of a GSE team). Doing so, we were able to demonstrate how cheap talk over the Internet (e-cheap talk) was powerful enough to facilitate the emergence of trust and improve the probability of cooperation where the punishment for uncooperative behavior is comparable to the cost of the cheap talk. To validate the results of our theoretical approach, we conducted two empirical case studies that analyzed the logged IRC development discussions of Apache Lucene (http://lucene.apache.org/) and Chromium OS (http://www.chromium.org/chromium-os) using both quantitative and qualitative methods. The results provide general support for the theoretical propositions. We discuss our findings and the theoretical and practical implications for GSE collaborations and research.

7.
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing \(\sim \)0.25 M images, \(\sim \)0.76 M questions, and \(\sim \)10 M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).

8.
Fine particulate matter (\(\hbox {PM}_{2.5}\)) has a considerable impact on human health, the environment and climate change. It is estimated that with better predictions, US$9 billion can be saved over a 10-year period in the USA (State of the science fact sheet air quality. http://www.noaa.gov/factsheets/new, 2012). Therefore, it is crucial to keep developing models and systems that can accurately predict the concentration of major air pollutants. In this paper, our target is to predict \(\hbox {PM}_{2.5}\) concentration in Japan from environmental monitoring data obtained from physical sensors, with improved accuracy over the currently employed prediction models. To do so, we propose a deep recurrent neural network (DRNN) that is enhanced with a novel auto-encoder pre-training method especially designed for time series prediction. Additionally, sensor selection is performed within the DRNN, without harming the accuracy of the predictions, by taking advantage of the sparsity found in the network. The numerical experiments show that, for time series prediction, the DRNN with our proposed pre-training method is superior to the same network trained with a canonical and with a state-of-the-art auto-encoder training method. The experiments confirm that when compared against the \(\hbox {PM}_{2.5}\) prediction system VENUS (National Institute for Environmental Studies. Visual Atmospheric Environment Utility System. http://envgis5.nies.go.jp/osenyosoku/, 2014), our technique improves the accuracy of \(\hbox {PM}_{2.5}\) concentration level predictions that are being reported in Japan.

9.
As the volume of data generated each day continues to increase, interest in Big Data algorithms and the insight they provide keeps growing. Since these analyses require a substantial amount of resources, including physical machines, power, and time, reliable execution of the algorithms becomes critical. This paper analyzes the error resilience of a select group of popular Big Data algorithms and shows how they can effectively be made more fault-tolerant. Using KULFI (http://github.com/quadpixels/kulfi, 2013) together with the LLVM compiler (Proceedings of the 2004 international symposium on code generation and optimization (CGO 2004), San Jose, CA, USA, 2004) allows injection of artificial soft faults throughout these algorithms, giving a thorough analysis of how faults in different locations can affect the outcome of the program. This information is then used to help guide the incorporation of fault tolerance mechanisms into the program, making it as impervious as possible to soft faults.

10.
Website Archivability (WA) is a notion established to capture the core aspects of a website that are crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. In this work, aiming at measuring WA, we introduce and elaborate on all aspects of CLEAR+, an extended version of the Credible Live Evaluation Method for Archive Readiness (CLEAR). We use a systematic approach to evaluate WA from multiple perspectives, which we call Website Archivability Facets. We then analyse archiveready.com, a web application we created as the reference implementation of CLEAR+, and discuss the implementation of the evaluation workflow. Finally, we conduct thorough evaluations of all aspects of WA to support the validity, the reliability and the benefits of our method using real-world web data.

11.
We describe a scheme for subdividing long-running, variable-length analyses into short, fixed-length BOINC workunits, using phylogenetic analyses as an example. Fixed-length workunits decrease variance in analysis runtime, improve overall system throughput, and make BOINC a more useful resource for analyses that require a relatively fast turnaround time, such as the phylogenetic analyses submitted by users of the GARLI web service at molecularevolution.org. Additionally, we explain why these changes will benefit volunteers who contribute their processing power to BOINC projects, such as the Lattice BOINC Project (http://boinc.umiacs.umd.edu). Our results, which demonstrate the advantages of relatively short workunits, should be of general interest to anyone who develops and deploys an application on the BOINC platform.
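The subdivision idea can be illustrated with a small sketch. It assumes, purely for illustration, that an analysis is measured in iterations and that a fixed iteration budget stands in for a fixed runtime; the abstract does not say how the authors measure workunit length or how state is carried between workunits. The name `split_into_workunits` is hypothetical and is not part of BOINC or GARLI.

```python
def split_into_workunits(total_steps, steps_per_unit):
    """Split an analysis of `total_steps` iterations into fixed-size chunks.

    Returns a list of half-open (start, stop) ranges. Every chunk except
    possibly the last covers exactly `steps_per_unit` iterations.
    Hypothetical illustration only; not an API of BOINC or GARLI.
    """
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    if steps_per_unit <= 0:
        raise ValueError("steps_per_unit must be positive")
    units = []
    start = 0
    while start < total_steps:
        stop = min(start + steps_per_unit, total_steps)
        units.append((start, stop))
        start = stop
    return units


if __name__ == "__main__":
    # A 10-iteration analysis cut into workunits of at most 4 iterations.
    print(split_into_workunits(10, 4))
```

In a real deployment each chunk would presumably resume from a checkpoint written by the previous one, which this sketch leaves out.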

12.
In recent years, deep learning has been successfully applied to diverse multimedia research areas, with the aim of learning powerful and informative representations for a variety of visual recognition tasks. In this work, we propose convolutional fusion networks (CFN) to integrate multi-level deep features and fuse a richer visual representation. Despite recent advances in deep fusion networks, they still have limitations due to their parameter cost and weak fusion modules. Instead, CFN uses 1 × 1 convolutional layers and global average pooling to generate side branches with few parameters, and employs a locally-connected fusion module, which can learn adaptive weights for different side branches and form a better fused feature. Specifically, we introduce three key components of the proposed CFN, and discuss its differences from other deep models. Moreover, we propose fully convolutional fusion networks (FCFN), an extension of CFN for pixel-level classification that is applied to several tasks, such as semantic segmentation and edge detection. Our experiments demonstrate that CFN (and FCFN) achieve promising performance, with consistent improvements over a plain CNN on both image-level and pixel-level classification tasks. We release our code at https://github.com/yuLiu24/CFN. We also provide a live demo (goliath.liacs.nl) using a CFN model trained on the ImageNet dataset.
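The two operations named for the side branches, a 1 × 1 convolution followed by global average pooling, can be sketched in plain Python on nested lists. This is an illustrative sketch only: the function names are hypothetical, the weights are fixed by hand rather than learned, and the fusion step assumes one weight per branch and per channel, which is one plausible reading of "locally-connected" rather than the authors' exact layer.

```python
def conv1x1(feature_map, weights):
    """1x1 convolution: mix channels independently at every pixel.

    feature_map: list of C_in channels, each an H x W list of lists.
    weights: list of C_out rows, each holding C_in coefficients.
    Returns C_out channels of size H x W.
    """
    c_in = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [
            [sum(row[c] * feature_map[c][i][j] for c in range(c_in))
             for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


def global_average_pool(feature_map):
    """Reduce each H x W channel to its mean, giving one value per channel."""
    return [
        sum(sum(line) for line in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def fuse(branches, branch_weights):
    """Weighted sum of side-branch vectors.

    Assumes a separate weight for every (branch, channel) pair.
    """
    dim = len(branches[0])
    return [
        sum(branch_weights[k][d] * branches[k][d] for k in range(len(branches)))
        for d in range(dim)
    ]


if __name__ == "__main__":
    fmap = [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]  # 2 channels, 2 x 2 pixels
    side = global_average_pool(conv1x1(fmap, [[1, 1]]))
    print(side)
    print(fuse([side, [1.5]], [[0.5], [2.0]]))
```

Because a 1 × 1 convolution needs only C_out × C_in coefficients and pooling needs none, a side branch built this way adds few parameters, which matches the motivation stated in the abstract.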

13.
Uncertain variables are used to describe phenomena where uncertainty appears in a complex system. For modeling multi-objective decision-making problems with uncertain parameters, a class of uncertain optimization called uncertain multi-objective programming is suggested for decision systems in Liu and Chen (2013, http://orsc.edu.cn/online/131020). In order to solve the proposed uncertain multi-objective programming, an interactive uncertain satisficing approach involving the decision-maker’s flexible demands is proposed in this paper. It is an improvement over noninteractive methods. Finally, a numerical example of the capital budget problem is given to illustrate the effectiveness of the proposed model and the relevant solving approach.

14.
15.
Multi-Verse Optimizer: a nature-inspired algorithm for global optimization (total citations: 1; self-citations: 0; citations by others: 1)
This paper proposes a novel nature-inspired algorithm called Multi-Verse Optimizer (MVO). The algorithm is inspired by three concepts in cosmology: white hole, black hole, and wormhole. The mathematical models of these three concepts are developed to perform exploration, exploitation, and local search, respectively. The MVO algorithm is first benchmarked on 19 challenging test problems. It is then applied to five real engineering problems to further confirm its performance. To validate the results, MVO is compared with four well-known algorithms: Grey Wolf Optimizer, Particle Swarm Optimization, Genetic Algorithm, and Gravitational Search Algorithm. The results show that the proposed algorithm provides very competitive results and outperforms the best algorithms in the literature on the majority of the test beds. The results of the real case studies also demonstrate the potential of MVO in solving real problems with unknown search spaces. Note that the source code of the proposed MVO algorithm is publicly available at http://www.alimirjalili.com/MVO.html.

16.
Recently, Gao et al. (J Time Ser Anal, 2016, doi: 10.1111/jtsa.12178) proposed a new estimation method for the dynamic panel probit model with random effects and derived the theoretical properties of the estimator. In this paper, we extend their estimation method to the \(T\ge 3\) case, and present some Monte Carlo simulations to illustrate the extended estimator.

17.
In 2013, Farid and Vasiliev [arXiv:1310.4922 [quant-ph]] for the first time proposed a way to construct a protocol for the realisation of “Classical to Quantum” one-way hash function, a derivative of the quantum one-way function as defined by Gottesman and Chuang [Technical Report arXiv:quant-ph/0105032] and used it for constructing quantum digital signatures. We, on the other hand, for the first time, propose the idea of a different kind of one-way function, which is “quantum-classical” in nature, that is, it takes an n-qubit quantum state of a definite kind as its input and produces a classical output. We formally define such a one-way function and propose a way to construct and realise it. The proposed one-way function turns out to be very useful in authenticating a quantum state in any quantum money scheme, and so we can construct many different quantum money schemes based on such a one-way function. Later in the paper, we also give explicit constructions of some interesting quantum money schemes like quantum bitcoins and quantum currency schemes, solely based on the proposed one-way function. The security of such schemes can be explained on the basis of the security of the underlying one-way functions.

18.
This educational paper describes the implementation aspects, user interface design considerations and workflow potential of the recently published TopOpt 3D App. The app solves the standard minimum compliance problem in 3D and allows the user to change design settings interactively at any point during the optimization. Apart from its educational nature, the app may point towards future ways of performing industrial design. Instead of the usual "geometrize, then model and optimize" approach, the geometry now automatically adapts to the varying boundary and loading conditions. The app is freely available for iOS at Apple’s App Store and at http://www.topopt.dtu.dk/TopOpt3D for Windows and OSX.

19.
The amount of multimedia data collected in museum databases is growing fast, while the capacity of museums to display information to visitors is acutely limited by physical space. Museums must seek the perfect balance of information given on individual pieces in order to provide sufficient information to aid visitor understanding while maintaining sparse usage of the walls and guaranteeing high appreciation of the exhibit. Moreover, museums often target the interests of average visitors instead of the entire spectrum of different interests each individual visitor might have. Finally, visiting a museum should not be an experience contained in the physical space of the museum but a door opened onto a broader context of related artworks, authors, artistic trends, etc. In this paper we describe the MNEMOSYNE system that attempts to address these issues through a new multimedia museum experience. Based on passive observation, the system builds a profile of the artworks of interest for each visitor. These profiles of interest are then used to drive an interactive table that personalizes multimedia content delivery. The natural user interface on the interactive table uses the visitor’s profile, an ontology of museum content and a recommendation system to personalize exploration of multimedia content. At the end of their visit, the visitor can take home a personalized summary of their visit on a custom mobile application. In this article we describe in detail each component of our approach as well as the first field trials of our prototype system built and deployed at our permanent exhibition space at LeMurate (http://www.lemurate.comune.fi.it/lemurate/) in Florence together with the first results of the evaluation process during the official installation in the National Museum of Bargello (http://www.uffizi.firenze.it/musei/?m=bargello).

20.
We present the twelfth edition of the Multi-Agent Programming Contest (https://multiagentcontest.org), an annual, community-serving competition that attracts groups from all over the world. Our contest facilitates comparison of multi-agent systems and provides a concrete problem that is interesting in itself and well-suited to be tackled in educational environments. This time, seven teams competed using strictly agent-based as well as traditional programming approaches.
