Similar literature
 Found 20 similar documents; search took 62 ms
1.
This paper studies supervised clustering in the context of label ranking data. The goal is to partition the feature space into K clusters, such that they are compact in both the feature and label ranking space. This type of clustering has many potential applications. For example, in target marketing we might want to come up with K different offers or marketing strategies for our target audience. Thus, we aim at clustering the customers’ feature space into K clusters by leveraging the revealed or stated, potentially incomplete customer preferences over products, such that the preferences of customers within one cluster are more similar to each other than to those of customers in other clusters. We establish several baseline algorithms and propose two principled algorithms for supervised clustering. In the first baseline, the clusters are created in an unsupervised manner, followed by assigning a representative label ranking to each cluster. In the second baseline, the label ranking space is clustered first, followed by partitioning the feature space based on the central rankings. In the third baseline, clustering is applied on a new feature space consisting of both features and label rankings, followed by mapping back to the original feature and ranking space. The RankTree principled approach is based on a Ranking Tree algorithm previously proposed for label ranking prediction. Our modification starts with K random label rankings and iteratively splits the feature space to minimize the ranking loss, followed by re-calculation of the K rankings based on cluster assignments. The MM-PL approach is a multi-prototype supervised clustering algorithm based on the Plackett-Luce (PL) probabilistic ranking model. It represents each cluster with a union of Voronoi cells that are defined by a set of prototypes, and assigns each cluster a set of PL label scores that determine the cluster central ranking. 
Cluster membership and ranking prediction for a new instance are determined by cluster membership of its nearest prototype. The unknown cluster PL parameters and prototype positions are learned by minimizing the ranking loss, based on two variants of the expectation-maximization algorithm. Evaluation of the proposed algorithms was conducted on synthetic and real-life label ranking data by considering several measures of cluster goodness: (1) cluster compactness in feature space, (2) cluster compactness in label ranking space and (3) label ranking prediction loss. Experimental results demonstrate that the proposed MM-PL and RankTree models are superior to the baseline models. Further, MM-PL is shown to be much better than other algorithms at handling situations with a significant fraction of missing label preferences.
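
The prediction step described in this abstract (nearest prototype, then the cluster's central ranking from its PL scores) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the data layout, and the choice to score an incomplete ranking using only the labels it contains are assumptions made here. The EM-based learning of scores and prototype positions is not shown.

```python
import math

def pl_log_likelihood(scores, ranking):
    """Log-probability of a ranking under the Plackett-Luce model.

    scores:  dict mapping label -> real-valued PL score (hypothetical layout).
    ranking: list of labels, most preferred first. Labels missing from the
             list are simply left out (assumption for incomplete rankings).
    """
    log_p = 0.0
    for i, label in enumerate(ranking):
        # Each position picks its label from those not yet ranked,
        # with probability proportional to exp(score).
        remaining = ranking[i:]
        log_norm = math.log(sum(math.exp(scores[l]) for l in remaining))
        log_p += scores[label] - log_norm
    return log_p

def central_ranking(scores):
    """Most probable PL ranking: labels sorted by decreasing score."""
    return sorted(scores, key=scores.get, reverse=True)

def predict(x, prototypes, cluster_scores):
    """Assign x to the cluster of its nearest prototype.

    prototypes:     list of (position, cluster_id) pairs.
    cluster_scores: dict mapping cluster_id -> PL score dict.
    """
    _, cluster = min(prototypes, key=lambda p: math.dist(x, p[0]))
    return cluster, central_ranking(cluster_scores[cluster])
```

Because a cluster may own several prototypes, its region is the union of their Voronoi cells, which is why `predict` looks up the cluster of the single nearest prototype rather than the nearest cluster centroid.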

2.
We present an evolutionary approach for the computation of exact answers to natural language (NL) questions. Answers are extracted directly from the N-best snippets, which have been identified by a standard Web search engine using NL questions. The core idea of our evolutionary approach to Web question answering is to search for those substrings in the snippets whose contexts are most similar to contexts of already known answers. This context model, together with the words mentioned in the NL question, is used to evaluate the fitness of answer candidates, which are actually randomly selected substrings from randomly selected sentences of the snippets. New answer candidates are then created by applying specialized operators for crossover and mutation, which either stretch and shrink the substring of an answer candidate or transpose the span to new sentences. Since we have no predefined notion of patterns, our context alignment methods are very dynamic and strictly data-driven. We assessed our system with seven different datasets of question/answer pairs. The results show that this approach is promising, especially when it deals with specific questions.

3.
This paper proposes a novel approach to ranking fuzzy numbers based on the left and right deviation degree (L-R deviation degree). In the approach, the maximal and minimal reference sets are defined to measure the L-R deviation degree of a fuzzy number, and then the transfer coefficient is defined to measure the relative variation of the L-R deviation degree of a fuzzy number. Furthermore, the ranking index value is obtained based on the L-R deviation degree and relative variation of fuzzy numbers. Additionally, to compare the proposed approach with the existing approaches, five numerical examples are used. The comparative results illustrate that the approach proposed in this paper is simpler and better.

4.
Evolutionary developmental biology, evo-devo, emerged in the 1980s to integrate evolution and development; it conceptualizes evolution as heritable changes in development. Recently, the field has been moving to a new synthesis: ecological evolutionary developmental biology, eco-evo-devo. We believe that an artificial life (ALife) approach will provide new insights into the field, and also contribute to the field of robotics through the emergence perspective and the constructive methodology. This paper explores the potential of such an artificial life approach based on the evolution of virtual creatures by presenting our ongoing studies with three models: (1) a metamorphosis model, (2) an exaptation model and (3) a prey-predator model.

5.
Basing cluster analysis on mixture models has become a classical and powerful approach. Until now, this approach, which explains some classic clustering criteria such as the well-known k-means criterion and makes it possible to propose general criteria, has been developed to classify a set of objects measured on a set of variables. But, for this kind of data, although most clustering procedures are designed to construct an optimal partition of objects or, sometimes, of variables, there exist other methods, named block clustering methods, which consider the two sets simultaneously and organize the data into homogeneous blocks. In this work, a new mixture model called the block mixture model is proposed to take this situation into account. This model embeds simultaneous clustering of objects and variables in a mixture approach. We first consider this probabilistic model in a general context and we develop a new algorithm of simultaneous partitioning based on the CEM algorithm. Then, we focus on the case of binary data and we show that our approach allows us to extend a block clustering method which had been proposed in this case. Simplicity, fast convergence and the possibility to process large data sets are the major advantages of the proposed approach.

6.
In this paper, we study the problem of keyword proximity search in XML documents. We take the disjunctive semantics among the keywords into consideration and find top-k relevant compact connected trees (CCTrees) as the answers of keyword proximity queries. We first introduce the notions of compact lowest common ancestor (CLCA) and maximal CLCA (MCLCA), and then propose compact connected trees and maximal CCTrees (MCCTrees) to efficiently and effectively answer keyword proximity queries. We give the theoretical upper bounds of the numbers of CLCAs, MCLCAs, CCTrees and MCCTrees, respectively. We devise an efficient algorithm to generate all MCCTrees, and propose a ranking mechanism to rank MCCTrees. Our extensive experimental study shows that our method achieves both high efficiency and effectiveness, and outperforms existing state-of-the-art approaches significantly.

7.
Platforms for community-based Question Answering (cQA) are playing an increasing role in the synergy of information-seeking and social networks. Being able to categorize user questions is very important, since these categories are good predictors for the underlying question goal, viz. informational or subjective. Furthermore, an effective cQA platform should be capable of detecting similar past questions and relevant answers, because it is known that a high number of best answers are reusable. Therefore, question paraphrasing is not only a useful but also an essential ingredient for effective search in cQA. However, the generated paraphrases do not necessarily lead to the same answer set, and might differ in their expected quality of retrieval, for example, in their power of identifying and ranking best answers higher. We propose a novel category-specific learning to rank approach for effectively ranking paraphrases for cQA. We describe a number of different large-scale experiments using logs from Yahoo! Search and Yahoo! Answers, and demonstrate that the subjective and objective nature of cQA questions dramatically affects the recall and ranking of past answers when fine-grained category information is put into place. Then, category-specific models are able to adapt well to the different degree of objectivity and subjectivity of each category, and the more specific the models are, the better the results, especially when benefiting from effective semantic and syntactic features.

8.
Objective: We investigate a knowledge-based help system for developers of an integrated clinical information system (CIS). The first objective in the study was to determine the system's ability to answer users' questions effectively. User performance and behavior were studied. The second objective was to evaluate the effect of using questions and answers to augment or replace traditional program documentation. Design: A comparative study of user and system effectiveness using a collection of 47 veritable questions regarding the CIS, solicited from various CIS developers, is conducted. Most questions concerned the clinical data model and acquiring the data. Measurements: Answers using current documentation known by users were compared to answers found using the help system. Answers existing within traditional documentation were compared to answers existing within question–answer exchanges (Q-A's). Results: The support system augmented 39% of users' answers to test questions. Though the Q-A's were less than 5% of the total documentation collected, these files contained answers to nearly 50% of the questions in the test group. The rest of the documentation contained about 75% of the answers. Conclusions: A knowledge-based help system built by collecting questions and answers can be a viable alternative to large documentation files, provided the questions and answers can be collected effectively.

9.
Image segmentation denotes a process of partitioning an image into distinct regions. A large variety of different segmentation approaches for images have been developed. Among them, the clustering methods have been extensively investigated and used. In this paper, a clustering-based approach using a hierarchical evolutionary algorithm (HEA) is proposed for medical image segmentation. The HEA can be viewed as a variant of conventional genetic algorithms. By means of a hierarchical structure in the chromosome, the proposed approach can automatically classify the image into appropriate classes and avoid the difficulty of searching for the proper number of classes. The experimental results indicate that the proposed approach can produce more continuous and smoother segmentation results in comparison with four existing methods: competitive Hopfield neural networks (CHNN), dynamic thresholding, k-means, and fuzzy c-means.

10.
A Web-Based Platform for User-Interactive Question-Answering   Total citations: 2 (self-citations: 0, citations by others: 2)
A user-interactive question-answering (QA) platform named BuyAns (at ) is presented. The platform is a special kind of online community and mainly features a rewarding scheme for answering questions among all users, a pattern-based user interface (UI) for questioning and answering, and a pattern-based representation and storage scheme for accumulated question-answer pairs. The system actually proposes and promotes a C2C business model for exchanging and commercializing knowledge from ordinary people. It can also be used as an incentive and collaborative approach to knowledge acquisition. Driven by the business model, prompt and quality answers are quickly accumulated. Due to the patterns used, accurate answers can be provided automatically for repeated questions. Facilitating features and technologies, including user modeling, reputation management, and answer clustering and fusion, are also developed and briefly described. Preliminary user studies show the potential attraction of the system to its users as well as reasonable usability and user satisfaction. We anticipate hot applications of such a system in the Web 2.0 era.

11.
Modern Community Question Answering (CQA) web forums provide the possibility to browse their archives using question-like search queries as in Information Retrieval (IR) systems. Although these traditional IR methods have become very successful at fetching semantically related questions, they typically leave their temporal relations unconsidered. That is to say, a group of questions may be asked more often during specific recurring time lines despite being semantically unrelated. In fact, predicting temporal aspects would not only assist these platforms in widening the semantic diversity of their search results, but also in re-stating questions that need to refresh their answers and in producing more dynamic, especially temporally-anchored, displays. In this paper, we devise a new set of time-frame specific categories for CQA questions, which is obtained by fusing two distinct earlier taxonomies (i.e., [29] and [50]). These new categories are then utilized in a large crowd-sourcing based human annotation effort. Accordingly, we present a systematic analysis of its results in terms of complexity and degree of difficulty as it relates to the different question topics. Incidentally, through a large number of experiments, we investigate the effectiveness of a wider variety of linguistic features compared to what has been done in previous works. We additionally mix evidence/features distilled directly and indirectly from questions by capitalizing on their related web search results. We finally investigate the impact and effectiveness of multi-view learning to boost a large variety of multi-class supervised learners by optimizing a latent layer built on top of two views: one composed of features harvested from questions, and the other from CQA meta data and evidence extracted from web resources (i.e., snippets and Internet archives).

12.
13.
14.
In the process of reviewing and ranking projects by a group of reviewers, the allocation of the subset of projects to each reviewer has a major impact on the robustness of the outcome ranking. We address here this problem where each reviewer is assigned, out of the list of all projects, a subset of up to k projects. Each individual reviewer then ranks and compares all pairs of k projects. The k-allocation problem is to determine an allocation of up to k projects to each reviewer, that lie within the expertise set of the reviewer, so that the resulting union of reviewed projects has certain desirable properties. The k-complete problem is a k-allocation with the property that all pairs of projects have been compared by at least one reviewer. A k-complete allocation is desirable as otherwise there may be projects that were not compared by any reviewer, leading to possible adverse properties in the outcome ranking. When a k-complete allocation cannot be achieved, one might settle for other properties. One basic requirement is that each pair of projects is comparable via a ranking path, which is a sequence of pairwise rankings of projects implying a comparison of all pairs on the path. A k-allocation with a ranking path between each pair is the connectivity-k-aloc. Since the robustness of relative comparisons deteriorates with increased length of the ranking path, another goal is that between each pair of projects there will be at least one ranking path that has at most two hops or q hops for fixed values of q. An alternative means for increasing the robustness of the ranking is to use a k-allocation with at least p disjoint ranking paths between each pair. We model all these problems as graph problems. We demonstrate that the connectivity-k-aloc problem is polynomially solvable, using matroid intersection; we prove that the k-complete problem is NP-hard unless k = 2; and we provide approximation algorithms for a related optimization problem. 
All other variants are shown to be NP-complete for all values of k ≥ 2.
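
The properties defined in this abstract can be checked for a given allocation with a few lines of code. The sketch below only verifies a candidate allocation; it does not construct one, and it does not reproduce the paper's matroid-intersection or approximation algorithms. The function names and the dictionary layout (reviewer mapped to a set of projects) are assumptions chosen for illustration.

```python
from itertools import combinations

def is_k_allocation(allocation, expertise, k):
    """Each reviewer gets at most k projects, all within their expertise set."""
    return all(len(assigned) <= k and assigned <= expertise[reviewer]
               for reviewer, assigned in allocation.items())

def uncovered_pairs(projects, allocation):
    """Pairs of projects that no single reviewer compares.

    An empty result means the allocation is k-complete.
    """
    covered = set()
    for assigned in allocation.values():
        # A reviewer compares every pair among their assigned projects.
        covered.update(frozenset(p) for p in combinations(assigned, 2))
    return [pair for pair in combinations(sorted(projects), 2)
            if frozenset(pair) not in covered]

def has_ranking_paths(projects, allocation):
    """True if every pair of projects is linked by a chain of comparisons.

    Uses union-find: all projects seen by one reviewer join one component.
    """
    parent = {p: p for p in projects}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for assigned in allocation.values():
        members = list(assigned)
        for other in members[1:]:
            parent[find(other)] = find(members[0])
    return len({find(p) for p in projects}) <= 1
```

For example, with projects A to D and two reviewers holding {A, B, C} and {C, D}, the pairs (A, D) and (B, D) are uncovered, so the allocation is not k-complete, yet every pair is still joined by a ranking path through C.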

15.
The growth of social media usage questions the old-style idea of customer relationship management (CRM). Social CRM strategy is a novel version of CRM empowered by social media technology that offers a new way of managing relationships with customers effectively. This study aims to identify the predictors of social CRM strategy adoption by small and medium enterprises (SMEs). The proposed model used in this study derived its theoretical support from IT/IS, marketing, and CRM literature. In the proposed Technology-Organization-Environment-Process (TOEP) adoption model, several hypotheses are developed which examine the role of Technological factors, such as Cost of Adoption, Relative Advantages, Complexity, and Compatibility; Organizational factors, such as IT/IS knowledge of employees and Top management support; Environmental factors, such as Competitive Pressure and Customer Pressure; and Process factors, such as Information Capture, Information Use, and Information Sharing; all having a positive relationship with social CRM adoption. This research applied a two-stage SEM-neural network method combining both structural equation modelling (SEM) and neural network analyses. The proposed hypothetical model is examined by using SEM on the collected data of SMEs in Kuala Lumpur, the central city of Malaysia. The SEM approach with a neural network method can be used to investigate the complicated relations involved in the adoption of social CRM. The study finds that compatibility, information capture, IT/IS knowledge of employees, top management support, information sharing, competitive pressure, cost, relative advantage, and customer pressure are the most important factors influencing social CRM adoption. Remarkably, the results of neural network analysis show that compatibility and information capture of social CRM are the most significant factors which affect SMEs' adoption of this form of customer relationship management. 
The outcomes of this research benefit executives' decision-making by identifying and ranking factors that enable them to discover how they can advance the usage of social CRM in their firms. Furthermore, the findings of this study can help the managers/owners of SMEs assign their resources, according to the ranking of social CRM adoption factors, when they are making plans to adopt social CRM. This study differs from previous studies as it proposes an innovative approach to determine what influences the adoption of social CRM. By proposing the TOEP adoption model, additional information process factors advance the traditional TOE adoption model.

16.
This paper describes work that adapts group technology and integrates it with fuzzy c-means, genetic algorithms and tabu search to realize a fuzzy c-means based hybrid evolutionary approach to the clustering of supply chains. The proposed hybrid approach is able to organise supply chain units, transportation modes and work orders into different unit-transportation-work order families. It can determine the optimal clustering parameters, namely the number of clusters, c, and the weighting exponent, m, dynamically, and is able to eliminate the necessity of pre-defining suitable values for these clustering parameters. A new fuzzy c-means validity index that takes into account inter-cluster transportation and group efficiency is formulated. It is employed to determine the promise level that estimates how good a set of clustering parameters is. The capability of the proposed hybrid approach is illustrated using three experiments and comparative studies. The results show that the proposed hybrid approach is able to suggest suitable clustering parameters and that near-optimal supply chain clusters can be obtained readily.
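
For readers unfamiliar with the two parameters named in this abstract, the sketch below shows one iteration of standard fuzzy c-means with a fixed number of clusters c (the number of centers passed in) and a fixed weighting exponent m. It is the textbook update only; the paper's genetic algorithm, tabu search, and validity index, which choose c and m dynamically, are not reproduced here. Function names and the tuple-based point representation are assumptions for illustration.

```python
import math

def update_memberships(points, centers, m=2.0):
    """Standard FCM membership update.

    u[j][i] = 1 / sum_k (d(x_j, v_i) / d(x_j, v_k)) ** (2 / (m - 1)),
    where m > 1 is the weighting exponent. Each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, v) for v in centers]
        if 0.0 in dists:
            # Point coincides with one or more centers:
            # share full membership among those centers.
            hits = [d == 0.0 for d in dists]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships

def update_centers(points, memberships, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return centers
```

Alternating the two updates until the centers stop moving gives the usual algorithm. A larger m makes memberships softer (closer to uniform), while m close to 1 approaches hard k-means assignments, which is why the choice of m and c matters for the quality of the resulting clusters.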

17.
The flexible architecture of evolutionary algorithms allows specialised models to be obtained with the aim of performing as other search methods do, but more satisfactorily. In fact, there exist several evolutionary proposals in the literature that play the role of local search methods. In this paper, we take a step forward by presenting a specialised evolutionary approach that carries out a search process equivalent to that of simulated annealing. An empirical study comparing the new model with classic simulated annealing methods, hybrid algorithms and state-of-the-art optimisers concludes that the new alternative scheme for combining ideas from simulated annealing and evolutionary algorithms introduced by our proposal may outperform hybrid algorithms of this kind, and achieve competitive results with regard to proposals presented in the literature for binary-coded optimisation problems.

18.
Adaptive multilevel rough entropy evolutionary thresholding   Total citations: 1 (self-citations: 0, citations by others: 1)
In this study, comprehensive research into rough set entropy-based thresholding image segmentation techniques has been performed, producing new and robust algorithmic schemes. Segmentation is the low-level image transformation routine that partitions an input image into distinct disjoint and homogeneous regions using thresholding algorithms most often applied in practical situations, especially when there is a pressing need for algorithm implementation simplicity, high segmentation quality, and robustness. Combining entropy-based thresholding with rough sets results in the rough entropy thresholding algorithm. The authors propose a new algorithm based on granular multilevel rough entropy evolutionary thresholding that operates on a multilevel domain. The MRET algorithm performance has been compared to the iterative RET algorithm and standard k-means clustering methods on the basis of β-index as a representative validation measure. Performance in experimental assessment suggests that granular multilevel rough entropy threshold based segmentations (MRET) present high quality, comparable with and often better than k-means clustering based segmentations. In this context, the rough entropy evolutionary thresholding MRET algorithm is suitable for specific segmentation tasks, when seeking solutions that incorporate spatial data features with particular characteristics.

19.
The synergy between peer-to-peer systems and semantic Web technologies supports large-scale sharing of semantically rich data, usually represented through schemas such as RDF. Peers rarely share the same vocabulary, so the resulting heterogeneity of data representations introduces new challenges for the efficient and effective retrieval of relevant information. The authors leverage the presence of semantic approximations between peers' schemas to improve query routing by identifying the peers that best satisfy the user's requests, and to inform users of the relevance of the returned answers through a ranking mechanism that promotes the most semantically related results.

20.
Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and Stack Overflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for Stack Overflow questions.
