Similar Documents
1.
2.
Existing ranking-oriented collaborative filtering algorithms have two main drawbacks: when computing user similarity, they only consider whether two users' preferences over the same item pair agree, ignoring both the strength of each user's preference for the pair and how popular that preference is across users; and when fusing preferences and ranking, an intermediate step is needed to construct a value function before a greedy algorithm can produce the recommendation list. To address these problems, we use a TF-IDF-like weighting strategy to jointly account for preference strength and preference popularity, and compute inter-user similarity with a weighted Kendall Tau correlation coefficient; for preference fusion and ranking, we apply the vote-based Schulze method to produce the recommendation list directly. On two movie datasets, the proposed algorithm clearly outperforms other popular collaborative filtering algorithms on the NDCG metric.
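A minimal sketch of the similarity step, assuming an illustrative pair-weighting rather than the paper's exact TF-IDF-style formula:

```python
import numpy as np
from itertools import combinations

def weighted_kendall_tau(ranks_u, ranks_v, weights):
    """Weighted Kendall tau similarity between two users.
    `ranks_u`/`ranks_v` map item ids to each user's preference scores;
    `weights` maps an item pair (i, j) with i < j to how informative
    agreement on that pair is (assumption: illustrative weighting)."""
    common = sorted(set(ranks_u) & set(ranks_v))
    num = den = 0.0
    for i, j in combinations(common, 2):
        w = weights.get((i, j), 1.0)
        num += w * np.sign(ranks_u[i] - ranks_u[j]) * np.sign(ranks_v[i] - ranks_v[j])
        den += w
    return num / den if den else 0.0
```

In the paper's analogy, a pair's weight would be high when the users feel strongly about it (the TF analogue) and when few users rank that pair at all (the IDF analogue).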

3.
We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi's decoding and training algorithms (IEEE Trans. Inform. Theory IT-13:260–269, 1967), as well as to the forward-backward and Baum-Welch (Inequalities 3:1–8, 1972) algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. Initially, we show how to exploit repetitions of all sufficiently small substrings (this is similar to the Four Russians method). Then, we describe four algorithms based alternatively on run-length encoding (RLE), Lempel-Ziv (LZ78) parsing, grammar-based compression (SLP), and byte pair encoding (BPE). Compared to Viterbi's algorithm, we achieve a speedup of Θ(log n) using the Four Russians method, corresponding compression-dependent speedups using RLE, LZ78, and SLP, and a speedup of Ω(r) using BPE, where k is the number of hidden states, n is the length of the observed sequence, and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that the new algorithms are indeed faster in practice. We also discuss a parallel implementation of our algorithms. A preliminary version of this paper appeared in Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 4–15, 2007. Y. Lifshits' research was supported by the Center for the Mathematics of Information and the Lee Center for Advanced Networking. S. Mozes' work was conducted while visiting MIT.
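As a rough illustration of the compression idea, here is a hedged sketch of RLE-accelerated Viterbi scoring in the (max,+) semiring; it mirrors the paper's use of repetitions but is not its exact algorithm (for one thing, it returns only the optimal score, not the state path):

```python
import numpy as np

def viterbi_rle(runs, log_A, log_B, log_pi):
    """Viterbi scoring over a run-length-encoded observation sequence.
    `runs` is a list of (symbol, length) pairs; log_A[i, j] is the log
    transition probability i -> j, log_B[j, s] the log emission
    probability of symbol s in state j, log_pi the initial log
    probabilities. A run of length ell costs O(k^3 log ell) via
    repeated squaring in the (max,+) semiring instead of O(k^2 ell)."""
    def mp_mul(X, Y):
        # (max,+) matrix product: out[i, j] = max_l X[i, l] + Y[l, j]
        return (X[:, :, None] + Y[None, :, :]).max(axis=1)

    def mp_pow(M, e):
        # (max,+) matrix power by repeated squaring
        R = np.full_like(M, -np.inf)
        np.fill_diagonal(R, 0.0)  # (max,+) identity
        while e:
            if e & 1:
                R = mp_mul(R, M)
            M = mp_mul(M, M)
            e >>= 1
        return R

    (s0, l0), rest = runs[0], runs[1:]
    v = log_pi + log_B[:, s0]  # first observation of the first run
    v = mp_mul(v[None, :], mp_pow(log_A + log_B[:, s0][None, :], l0 - 1))[0]
    for s, ell in rest:
        step = log_A + log_B[:, s][None, :]  # one step emitting s
        v = mp_mul(v[None, :], mp_pow(step, ell))[0]
    return v.max()  # log-probability of the best state path
```

The LZ78, SLP, and BPE variants in the paper replace run-length blocks with parsed phrases, precomputing one such matrix per repeated phrase.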

4.
Traditionally, collaborative recommender systems have been based on a single-shot model of recommendation, where a single set of recommendations is generated from a user's (past) stored preferences. However, content-based recommender system research has begun to look towards more conversational models of recommendation, in which the user is actively engaged in directing the search at recommendation time. Such interactions can range from high-level dialogues with the user, possibly in natural language, to simpler interactions where the user is, for example, asked to indicate a preference for one of k suggested items. Importantly, the feedback obtained from these interactions can help to differentiate between the user's long-term stored preferences and her current (short-term) requirements, which may be quite different. We argue that such interactions can also benefit collaborative recommendation, and we provide experimental evidence to support this claim.

5.

Collaborative filtering algorithms find useful patterns in rating and consumption data and exploit these patterns to guide users to good items. Many of these patterns reflect important real-world phenomena driving interactions between the various users and items; other patterns may be irrelevant or reflect undesired discrimination, such as discrimination in publishing or purchasing against authors who are women or ethnic minorities. In this work, we examine the response of collaborative filtering recommender algorithms to the distribution of their input data with respect to one dimension of social concern, namely content creator gender. Using publicly available book ratings data, we measure the distribution of the genders of the authors of books in user rating profiles and recommendation lists produced from this data. We find that common collaborative filtering algorithms tend to propagate at least some of each user’s tendency to rate or read male or female authors into their resulting recommendations, although they differ in both the strength of this propagation and the variance in the gender balance of the recommendation lists they produce. The data, experimental design, and statistical methods are designed to be reusable for studying potentially discriminatory social dimensions of recommendations in other domains and settings as well.
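A minimal sketch of the measurement, assuming simple binary author-gender labels (the paper's statistical treatment is richer):

```python
import pandas as pd

def gender_balance(rec_lists, author_gender):
    """For each user's recommendation list, compute the share of items
    whose author is labeled 'female' among items with a known label.
    `rec_lists` maps user -> list of item ids; `author_gender` maps
    item id -> 'male' / 'female' / None (assumption: toy schema)."""
    rows = []
    for user, items in rec_lists.items():
        labels = [author_gender.get(i) for i in items]
        known = [g for g in labels if g in ('male', 'female')]
        share = sum(g == 'female' for g in known) / len(known) if known else float('nan')
        rows.append({'user': user, 'n_known': len(known), 'share_female': share})
    return pd.DataFrame(rows)
```

Comparing this distribution over users' rating profiles against the same distribution over their recommendation lists is the propagation measurement described above.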


6.
The notion of P-simple points was introduced by Bertrand to design parallel thinning algorithms. In 'A 3D fully parallel thinning algorithm for generating medial faces' (Pattern Recogn. Lett. 16:83–87, 1995), Ma proposed an algorithm that fails to preserve the topology of certain objects. In this paper, we propose a new application of P-simple points: automatically correcting Ma's algorithm.

7.
Mobile devices need to provide more accurate and personalized information in a computing environment with a small screen and limited information-retrieval functions. This paper presents a user-selectable recommendation system that reflects a user's interest group by employing collaborative filtering as a technique to provide useful information in a mobile environment. We form similar groups by simultaneously considering users' information preferences and demographics. Then we reconstruct final recommendation lists based on which search results the similar demographic group has chosen. This acts as an optional filter for the search results, meaning that we provide an interactive, flexible recommendation list that considers a user's intent actively rather than unilaterally. We report Mean Absolute Error results to evaluate the recommendations and finally present a prototype implemented for both the iPhone and Android environments.
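The evaluation metric is standard; a minimal sketch, with hypothetical (predicted, actual) rating pairs:

```python
def mean_absolute_error(pairs):
    """MAE over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# hypothetical predictions vs. held-out ratings
print(mean_absolute_error([(4.1, 4), (2.8, 3), (4.9, 5)]))  # ~0.133
```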

8.
In order to make a recommendation, a recommender system typically first predicts a user's ratings for items and then recommends a list of items with high predicted ratings. The quality of predictions is measured by accuracy, that is, how close the predicted ratings are to the actual ratings. The quality of recommendation lists, on the other hand, is evaluated from more than one perspective: since accuracy of predicted ratings is not enough for customer satisfaction, metrics such as novelty, serendipity, and diversity are also used. Aggregate diversity is one of these metrics; it measures the diversity of items across the recommendation lists of all users. Increasing aggregate diversity is important because it leads to a more even distribution of items across recommendation lists, which mitigates the long-tail problem. In this study, we propose two novel methods to increase the aggregate diversity of a recommender system. The first is a reranking approach that takes a user's ranked list of recommendations and reranks it to increase aggregate diversity. While the reranking approach is applied after model generation as a wrapper, the second method is applied in the model-generation phase, which has the advantage of being more efficient in generating recommendation lists. We compare our methods with well-known methods in the field and show their superiority using real-world datasets.
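A greedy sketch of the reranking idea, assuming an illustrative score-tolerance rule rather than the paper's method:

```python
def rerank_for_aggregate_diversity(candidates, top_n, exposure, tol=0.2):
    """Walk a user's candidates (item, predicted_score) and, among the
    items within `tol` of the best remaining score, pick the one that
    has so far been recommended to the fewest users. `exposure` is a
    dict item -> count shared across all users, mutated in place."""
    pool = sorted(candidates, key=lambda c: -c[1])
    recs = []
    while pool and len(recs) < top_n:
        best_score = pool[0][1]
        near_best = [c for c in pool if best_score - c[1] <= tol]
        pick = min(near_best, key=lambda c: exposure.get(c[0], 0))
        pool.remove(pick)
        exposure[pick[0]] = exposure.get(pick[0], 0) + 1
        recs.append(pick[0])
    return recs
```

The tolerance bounds the accuracy loss per position while spreading exposure over more items, which is the accuracy-versus-aggregate-diversity trade-off discussed above.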

9.
In many E-commerce recommender systems, a special class of recommendation involves recommending items to users over a life cycle. For example, customers who have babies will shop on Diapers.com over a relatively long period, purchasing different products at different growth stages. Traditional recommendation algorithms produce lists of items similar to those the target user has accessed before (content filtering), or compute recommendations by analyzing the items purchased by users who are similar to the target user (collaborative filtering). Such recommendation paradigms cannot effectively handle a life cycle, i.e., the needs of customers at different stages may vary significantly. In this paper, we model users' behavior over life cycles by employing hand-crafted item taxonomies, whose background knowledge can be tailored to the computation of personalized recommendations. In particular, our method first formalizes a user's long-term behavior using the item taxonomy, and then identifies the user's current stage. By incorporating collaborative filtering into the recommendation, we can provide a personalized item list to the user through other similar users within the same stage. An empirical evaluation conducted on purchasing data obtained from Diapers.com demonstrates the efficacy of our proposed method.

10.
Uzquiano (Analysis 70:39–44, 2010) showed that the Hardest Logic Puzzle Ever (HLPE) [in its amended form due to Rabern and Rabern (Analysis 68:105–112, 2008)] has a solution using only two questions. Uzquiano concludes his paper by noting that his solution strategy naturally suggests a harder variation of the puzzle which, as he remarks, he does not know how to solve in two questions. Wheeler and Barahona (J Philos Logic, to appear, 2011) formulated a three-question solution to Uzquiano's puzzle and gave an information-theoretic argument to establish that a two-question solution does not exist. However, their argument crucially relies on a certain conception of what it means to answer self-referential yes–no questions truly and falsely. We propose an alternative conception which, as we show, allows one to solve Uzquiano's puzzle in two questions. The solution strategy adopted suggests an even harder variation of Uzquiano's puzzle which, as we will show, can also be solved in two questions. Like all previous solutions to versions of HLPE, ours is presented informally. The second part of the paper investigates the prospects of formally representing solutions to HLPE by exploiting theories of truth.

11.
In 2003, Maurer et al. (IEEE Trans. Pattern Anal. Mach. Intell. 25:265–270, 2003) published a paper describing an algorithm that computes the exact distance transform in linear time (with respect to image size) for rectangular binary images in the k-dimensional space ℝ^k, with distance measured in the L_p-metric for 1 ≤ p ≤ ∞, which includes the Euclidean distance L_2. In this paper we discuss this algorithm from theoretical and practical points of view. On the practical side, we concentrate on its Euclidean distance version, discuss possible ways of implementing it as a signed distance transform, and experimentally compare the implemented algorithms. We also describe the parallelization of these algorithms and discuss the associated savings in computation time. All these implementations will be made available as part of the CAVASS software system developed and maintained in our group (Grevera et al. in J. Digit. Imaging 20:101–118, 2007). On the theoretical side, we prove that our version of the signed distance transform algorithm, GBDT, returns the exact value of the distance from the geometrically defined object boundary. We provide a complete proof (which was not given in Maurer et al.) that all these algorithms work correctly for the L_p-metric with 1 < p < ∞. We also point out that the precise form of the algorithm from Maurer et al. is not well defined for the L_1 and L_∞ metrics. In addition, we show that the algorithm can be used to find, in linear time, the exact value of the diameter of an object, that is, the largest possible distance between any two of its elements.
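A minimal signed-distance-transform sketch built on SciPy's exact Euclidean distance transform; this is not the authors' GBDT, and it uses a pixel-grid rather than geometric boundary convention:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_transform(obj):
    """Positive outside the object, negative inside. `obj` is a boolean
    mask with True on object pixels; each term is an exact EDT."""
    return distance_transform_edt(~obj) - distance_transform_edt(obj)

# toy usage: a disk of radius 10 in a 64x64 image
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 <= 10 ** 2
sdt = signed_distance_transform(disk)
```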

12.
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) approaches that utilize classical tools of supervised learning (e.g. decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, discarding these planning algorithms in favor of a supervised learning approach often leads to myopic, poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed for learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration. The work we present extends the Maximum Margin Planning (MMP) (Ratliff et al. in Twenty-second international conference on machine learning (ICML06), 2006a) framework to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case studies: legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners and that may be constrained by efficiency requirements and imperfect expert demonstration.

13.
In this paper, we propose a context-aware food recommendation system for well-being care applications. The proposed system, called u-BabSang, provides individualized food recommendation lists at the dining table, based on dietary advice from a traditional Korean medical text. It receives a user's profile, physiological signals, and environmental information around the dining table in real time. To operate the system, we present a method for user-specific analysis, and describe time-division layered context integration, which integrates the multiple contexts obtained from the sensors. The system thus recommends foods appropriate for each individual's health at the table in real time.

14.
Autobiographical memory (AM) is the “memory for the events in one’s life” [1]. It is often assumed that, in order to remember all those events, one just needs to record everything and replay the recordings later. This corresponds to the library metaphor that has been used to explain AM under the record-keeping approach. However, after many years of AM research it was concluded that AM works differently, namely according to the constructionist approach, in which remembering is typically initiated by memory cues. This paper explains these AM theories, surveys the literature on existing augmented memory systems, and describes our own work in this area. All this input is combined into eight design recommendations for future augmented memory systems.

15.
This paper presents a performance study of a one-dimensional search algorithm for solving general high-dimensional optimization problems. The proposed approach is a hybrid between a line search algorithm of Glover (The 3-2-3, stratified split and nested interval line search algorithms. Research report, OptTek Systems, Boulder, CO, 2010) and an improved variant of the global method of Gardeux et al. (Unidimensional search for solving continuous high-dimensional optimization problems. In: ISDA ’09: Proceedings of the 2009 ninth international conference on intelligent systems design and applications, IEEE Computer Society, Washington, DC, USA, pp 1096–1101, 2009), which uses line search algorithms as subroutines. The resulting algorithm, called EM323, was tested on 19 scalable benchmark functions, with a view to observing how continuous optimization techniques respond as the dimension increases. To this end, we report the algorithm’s performance on the 50-, 100-, 200-, 500- and 1,000-dimensional versions of each function. Computational results are given comparing our method with three leading evolutionary algorithms. Statistical analysis discloses that our method outperforms the other methods by a significant margin.
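A toy illustration of one-dimensional search applied coordinate-wise to a high-dimensional problem; this is a generic sketch, not EM323, and the golden-section step assumes each one-dimensional slice is unimodal:

```python
import numpy as np

def golden_section(f, a, b, tol=1e-6):
    """Classic golden-section search for a 1-D minimum on [a, b]."""
    phi = (np.sqrt(5) - 1) / 2
    c, d = b - phi * (b - a), a + phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

def coordinate_line_search(f, x0, lo=-5.0, hi=5.0, sweeps=20):
    """Repeatedly minimize f along one coordinate at a time."""
    x = np.array(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            def f_i(t, i=i):
                y = x.copy()
                y[i] = t
                return f(y)
            x[i] = golden_section(f_i, lo, hi)
    return x

# toy usage: 100-dimensional sphere function
x_star = coordinate_line_search(lambda x: float((x ** 2).sum()), [3.0] * 100)
```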

16.
k-Anonymity is a privacy-preserving method for limiting the disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves upon an algorithm due to Aggarwal et al. (Proceedings of the international conference on database theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, 2007), which was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited to the theory of k-anonymization: the proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues ℓ-diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.
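A deliberately crude sketch of k-anonymization using whole-column suppression (the coarsest possible generalization); the paper's algorithm is far more refined and carries the O(ln k) guarantee this sketch lacks:

```python
from collections import Counter

def k_anonymize_by_suppression(records, quasi_ids, k):
    """Suppress entire quasi-identifier columns (replace values with
    '*') until every combination of the remaining quasi-identifier
    values occurs at least k times."""
    records = [dict(r) for r in records]
    active = list(quasi_ids)

    def smallest_class():
        counts = Counter(tuple(r[a] for a in active) for r in records)
        return min(counts.values())

    while active and records and smallest_class() < k:
        # drop the attribute with the most distinct values, since it
        # splits the table into the smallest equivalence classes
        worst = max(active, key=lambda a: len({r[a] for r in records}))
        active.remove(worst)
        for r in records:
            r[worst] = '*'
    return records
```

Column-level suppression destroys far more information than entry-level generalization; minimizing that loss is exactly the optimization problem the abstract describes.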

17.
18.
This article reports the results of an extensive experimental analysis of efficient algorithms for computing graph spanners in the data streaming model, where an (α,β)-spanner of a graph G is a subgraph S of G such that for each pair of vertices the distance in S is at most α times the distance in G plus β. To the best of our knowledge, this is the first computational study of graph spanner algorithms in a streaming setting. We compare experimentally the randomized algorithms proposed by Baswana and by Elkin (In: Proceedings of the 34th International Colloquium on Automata, Languages and Programming (ICALP 2007), Wroclaw, Poland, pp. 716–727, 9–13 July 2007) for general stretch factors with the deterministic algorithm presented by Ausiello et al. (In: Proceedings of the 15th Annual European Symposium on Algorithms (ESA 2007), Engineering and Applications Track, Eilat, Israel, 8–10 October 2007. LNCS, vol. 4698, pp. 605–617, 2007), designed for building small-stretch spanners. All the algorithms we implemented work in a data streaming model where the input graph is given as a stream of edges in arbitrary order, and all of them need a single pass over the data. Differently from the algorithm of Ausiello et al., the algorithms of Baswana and Elkin need to know in advance the number of vertices in the graph. The results of our experimental investigation on several input families confirm that all these algorithms are very efficient in practice, finding spanners with stretch and size much smaller than the theoretical bounds and comparable to those obtainable by off-line algorithms. Moreover, our experimental findings confirm that small values of the stretch factor are the case of interest in practice, and that the algorithm by Ausiello et al. tends to produce spanners of better quality than the algorithms by Baswana and Elkin, while still using a comparable amount of time and space resources. Work partially supported by the Italian Ministry of University and Research under Project MAINSTREAM “Algorithms for Massive Information Structures and Data Streams”. A preliminary version of this paper was presented at the 15th Annual European Symposium on Algorithms (ESA 2007).
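A one-pass sketch of spanner construction using the classic greedy rule, not the randomized algorithms of Baswana or Elkin; it also spends a truncated BFS per edge, a cost the streaming algorithms above are designed to avoid:

```python
from collections import defaultdict, deque

def greedy_stream_spanner(edge_stream, t):
    """Keep an edge only if its endpoints are not already within t hops
    in the spanner built so far; the kept edges form a multiplicative
    t-spanner of the unweighted input graph, since distances in the
    spanner only shrink as later edges arrive."""
    adj = defaultdict(set)
    kept = []
    for u, v in edge_stream:
        if hop_distance(adj, u, v, t) > t:
            adj[u].add(v)
            adj[v].add(u)
            kept.append((u, v))
    return kept

def hop_distance(adj, src, dst, limit):
    """BFS distance from src to dst, cut off after `limit` hops."""
    if src == dst:
        return 0
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d >= limit:
            continue
        for w in adj[node]:
            if w == dst:
                return d + 1
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return limit + 1  # treated as "farther than limit"
```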

19.
We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approximate algorithms for several applications, such as:
•  Determining the set of categories in a given taxonomy spanned by the search results;
•  Finding the range of metadata values associated with the result set in order to enable “multi-faceted search”;
•  Estimating the size of the result set;
•  Data mining associations to the query terms.
We present and analyze efficient algorithms for obtaining uniform random samples applicable to any search engine that is based on posting lists and document-at-a-time evaluation. (To our knowledge, all popular Web search engines, for example, Google, Yahoo Search, MSN Search, Ask, belong to this class.) Furthermore, our algorithm can be modified to follow the modern object-oriented approach whereby posting lists are viewed as streams equipped with a next method, and the next method for Boolean and other complex queries is built from the next method for primitive terms. In our case we show how to construct a basic sample-next(p) method that samples term posting lists with probability p, and show how to construct sample-next(p) methods for Boolean operators (AND, OR, WAND) from primitive methods. Finally, we test the efficiency and quality of our approach on both synthetic and real-world data. A preliminary version of this work has appeared in [3]. Work performed while A. Anagnostopoulos and A.Z. Broder were at IBM T. J. Watson Research Center.
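A toy sketch of the sample-next(p) composition, assuming integer document ids in sorted streams; the version below keeps the rate-p guarantee by sampling after the AND, whereas the paper pushes sampling into the primitive next methods for efficiency:

```python
import random

def sample_next(stream, p):
    """Pass each document from a sorted posting-list iterator with
    independent probability p."""
    for doc in stream:
        if random.random() < p:
            yield doc

def next_and(stream_a, stream_b):
    """next method for AND: sorted merge emitting common doc ids."""
    a, b = next(stream_a, None), next(stream_b, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(stream_a, None), next(stream_b, None)
        elif a < b:
            a = next(stream_a, None)
        else:
            b = next(stream_b, None)

# Sampling after the AND keeps each intersection doc with probability
# exactly p, but evaluates the full AND first; sampling each primitive
# stream instead would thin the intersection to rate p**2.
sampled = list(sample_next(next_and(iter([1, 3, 5, 8]), iter([3, 4, 5, 9])), p=0.5))
```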

20.
We provide sharp estimates for the probabilistic behaviour of the main parameters of the Euclid algorithms, both on polynomials and on integers. We study in particular the distribution of the bit-complexity, which involves two main parameters: digit-costs and lengths of remainders. We first show that an asymptotic Gaussian law holds for the length of remainders at a fraction of the execution, which exhibits a deep regularity phenomenon. Then, we study in each framework—polynomials (P) and integers (I)—two gcd algorithms: the standard one (S), which only computes the gcd, and the extended one (E), which also computes the Bezout pair and is widely used for computing modular inverses. The extended algorithm is more regular than the standard one, which explains why our results are more precise for the extended algorithm: we exhibit an asymptotic Gaussian law for its bit-complexity in both cases (P) and (I). We also prove an asymptotic Gaussian law for the bit-complexity of the standard gcd in case (P), but we do not succeed in obtaining a similar result in case (I). The integer study is more involved than the polynomial study, as is usually the case. In the polynomial case, we deal with the central tools of the distributional analysis of algorithms, namely bivariate generating functions. In the integer case, we are led to dynamical methods, which heavily use the dynamical system underlying the Euclidean algorithm on integers and its transfer operator. Baladi and Vallée (J. Number Theory 110(2):331–386, 2005) recently designed a general framework for “distributional dynamical analysis”, in which they exhibited asymptotic Gaussian laws for a large family of parameters. However, this family contains neither the bit-complexity cost nor the size of remainders, and we have to extend their methods to obtain our results. Even though these dynamical methods are not necessary in case (P), we explain how the polynomial dynamical system can also be used to prove our results. This provides a common framework for both analyses, which well explains the similarities and differences between the two cases (P) and (I), both for the algorithms themselves and for their analysis. An extended abstract of this paper can be found in Lhote and Vallée (Proceedings of LATIN’06, Lecture Notes in Computer Science, vol. 3887, pp. 689–702, 2006).
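A sketch of the extended Euclid algorithm with a toy digit-cost tally (an assumption standing in for the paper's bit-complexity parameters, which are not reproduced exactly here):

```python
def extended_gcd(a, b):
    """Extended Euclid: returns (g, x, y, bit_cost) with
    a*x + b*y == g == gcd(a, b). `bit_cost` is a toy proxy for the
    bit-complexity: the work charged per division step."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    bit_cost = 0
    while b:
        q, r = divmod(a, b)
        # charge roughly the cost of computing q and the new remainder
        bit_cost += max(q.bit_length(), 1) * b.bit_length()
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0, bit_cost

# modular inverse, the main use of the extended algorithm noted above
g, x, _, cost = extended_gcd(17, 3120)
assert g == 1 and (17 * x) % 3120 == 1
```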
