Similar documents
Found 20 similar documents (search time: 15 ms)
1.
How users rate a mobile app via star ratings and user reviews is of utmost importance for the success of an app. Recent studies and surveys show that users rely heavily on the star ratings and user reviews provided by other users when deciding which app to download. However, understanding star ratings and user reviews is a complicated matter, since they are influenced by many factors, such as the actual quality of the app and how the user perceives that quality relative to their expectations, which are in turn shaped by their prior experiences and expectations of other apps on the platform (e.g., iOS versus Android). Nevertheless, star ratings and user reviews provide developers with valuable information for improving the overall impression of their app. In an effort to expand their revenue and reach more users, app developers commonly build cross-platform apps, i.e., apps that are available on multiple platforms. Because star ratings and user reviews are so important in the mobile app industry, it is essential for developers of cross-platform apps to maintain a consistent level of star ratings and user reviews across the various platforms on which their apps are available. In this paper, we investigate whether cross-platform apps achieve a consistent level of star ratings and user reviews. We manually identify 19 cross-platform apps and conduct an empirical study on their star ratings and user reviews. By manually tagging 9,902 one- and two-star reviews of the studied cross-platform apps, we discover that the distribution of complaint-type frequencies varies across platforms. We also study the negative-impact ratio of complaint types and find that for some apps, users have higher expectations on one platform. All our proposed techniques and methodologies are generic and can be used for any app.
Our findings show that at least 79% of the studied cross-platform apps do not have consistent star ratings, which suggests that developers need to consider different quality-assurance efforts for the different platforms they wish to support.

2.

Heterogeneous information networks, which consist of multi-typed vertices representing objects and multi-typed edges representing relations between objects, are ubiquitous in the real world. In this paper, we study the problem of entity matching for heterogeneous information networks based on distributed network embedding and a multi-layer perceptron with a highway network, and we propose a new method named DEM (short for Deep Entity Matching). In contrast to traditional entity matching methods, DEM utilizes the multi-layer perceptron with a highway network to explore hidden relations and improve matching performance. Importantly, we incorporate DEM with the network embedding methodology, enabling highly efficient computation in a vectorized manner. DEM's generic modeling of both the network structure and the entity attributes enables it to model various heterogeneous information networks flexibly. To illustrate its functionality, we apply the DEM algorithm to two real-world entity matching applications: user linkage in the social network analysis scenario, which predicts matched users across different social platforms, and record linkage, which predicts matched records across different citation networks. Extensive experiments on real-world datasets demonstrate DEM's effectiveness and rationality.
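As a rough illustration of the highway component mentioned above, here is a generic highway layer in plain Python. This is a sketch of the standard highway formulation only; the weight shapes, the tanh transform, and all variable names are assumptions, not details taken from the paper:

```python
import math

def highway_layer(x, w_h, w_t, b_h, b_t):
    """One highway layer: y = t * H(x) + (1 - t) * x, where the transform
    gate t = sigmoid(W_t x + b_t) decides how much of the transformed
    signal H(x) = tanh(W_h x + b_h) passes versus the raw input x."""
    def matvec(w, v, b):
        # Dense affine map: each output component is row . v + bias.
        return [sum(wi * vi for wi, vi in zip(row, v)) + bi
                for row, bi in zip(w, b)]
    h = [math.tanh(v) for v in matvec(w_h, x, b_h)]
    t = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w_t, x, b_t)]
    # Gated mix of transformed signal and the carried-through input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

When the transform gate saturates near zero, the layer reduces to the identity, which is what lets such networks train at depth.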


3.
郑永广  岳昆  尹子都  张学杰 《计算机应用》2017,37(11):3101-3106
To quickly and effectively select key users with strong information-propagation ability from a large-scale social network and the historical data of messages its users have posted, a key-user selection method is proposed. First, the structural information of the social network is used to build a directed graph with users as nodes; based on the Spark computing framework, the historical message data is used to quantitatively compute edge weights characterized by user activity, retweet interaction, and share of information volume, yielding a directed weighted graph model of the social network. Then, drawing on the PageRank algorithm, a metric for a user's information-propagation ability is established, and a Spark-based method for computing this ability in large-scale social networks is given. Further, a Spark-based d-distance selection algorithm is presented, which iterates so that the propagation ranges of the selected key users overlap as little as possible. Experimental results on Sina Weibo data show that the proposed method is efficient, feasible, and scalable, and can support the control of harmful breaking news and the monitoring of public opinion on social networks.
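The propagation-ability measure above builds on PageRank. As an illustrative sketch only — plain single-machine Python with uniform edge weights, not the paper's activity/retweet-weighted, Spark-parallel version:

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank over a directed edge list [(src, dst), ...].
    Dangling-node mass is simply dropped for brevity."""
    nodes = {n for e in edges for n in e}
    out_deg = {n: 0 for n in nodes}
    for src, _ in edges:
        out_deg[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, dst in edges:
            # Each node splits its rank evenly among its out-links.
            new[dst] += damping * rank[src] / out_deg[src]
        rank = new
    return rank
```

In the paper's setting, the uniform contribution `rank[src] / out_deg[src]` would be replaced by the activity/interaction/volume weights computed on Spark.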

4.

One of the most challenging issues in the big data research area is the inability to process a large volume of information in a reasonable time. Hadoop and Spark are two frameworks for distributed data processing. Hadoop is a very popular and general platform for big data processing. Thanks to its in-memory programming model, the open-source Spark framework is well suited to iterative algorithms. In this paper, the Hadoop and Spark big data processing platforms are evaluated and compared in terms of runtime, memory and network usage, and CPU efficiency. To this end, the K-nearest neighbor (KNN) algorithm is implemented on datasets of different sizes within both frameworks. The results show that the runtime of the KNN algorithm on Spark is 4 to 4.5 times faster than on Hadoop. The evaluations also show that Hadoop consumes more resources, including CPU and network, so Spark uses the CPU more efficiently; on the other hand, Hadoop uses less memory than Spark.
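For reference, the KNN classification that the comparison is built around can be sketched in a few lines of single-machine Python; the paper's subject is the distributed Hadoop/Spark implementations, which this sketch does not attempt to reproduce:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points, using squared Euclidean distance.
    `train` is a list of ((features...), label) pairs."""
    neighbors = sorted(
        train,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

The distributed versions differ mainly in how the distance computation and the vote are partitioned across workers, not in this core logic.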


5.
With the rise of homestays and online short-term rental platforms, the phenomenon of hosts multi-homing across platforms has drawn continued attention and research. This phenomenon offers a new research angle, and identifying the same host across different platforms is the first problem to solve. Building on traditional user matching, this paper explores a cross-platform host matching algorithm for C2C online short-term rentals. Because hosts' personal information is sparse, we introduce listing information and design a two-stage host matching algorithm based on listing information (TSHM). On an ordinary dataset and a hard-case dataset, both built from real data of two major domestic online short-term rental platforms, our method achieves 99.69% and 81.97% accuracy respectively, outperforming traditional classifiers such as SVM and DT, which validates the effectiveness of the matching model and the matching features. It offers a new approach to cross-platform host matching and can match hosts effectively even when their personal information is lacking. However, our experiments only cover domestic platform data and do not incorporate text or image features, which is a limitation.

6.
Spatial analytics systems (SASs) represent a technology capable of managing huge volumes of spatial data using frameworks such as Apache Hadoop and Apache Spark. An increasing number of SASs have been proposed, requiring a comparison among them. However, existing comparisons in the literature provide a system-centric view based on performance evaluations. Thus, there is a lack of comparisons based on the user-centric view, that is, comparisons that help users to understand how the characteristics of SASs are useful to meet the specific requirements of their spatial applications. In this article, we provide a user-centric comparison of the following SASs based on Hadoop and Spark: Hadoop-GIS, SpatialHadoop, SpatialSpark, GeoSpark, GeoMesa Spark, SIMBA, LocationSpark, STARK, Magellan, SparkGIS, and Elcano. This comparison employs an extensive set of criteria related to the general characteristics of these systems, to the aspects of spatial data handling, and to the aspects inherent to distributed systems. Based on this comparison, we introduce guidelines to help users to choose an appropriate SAS. We also describe two case studies based on real-world applications to illustrate the use of these guidelines. Finally, we discuss chronological tendencies related to SASs and identify limitations that SASs should address to improve user experience.

7.
付博  刘挺 《软件学报》2016,27(11):2843-2854
Unlike existing work on explicit consumption-intent recognition, this paper proposes a method for automatically recognizing users' implicit consumption intent in social media. The method treats implicit consumption-intent recognition as a multi-label classification problem and combines features based on users' follow behavior, intent-related follow behavior, intent-related retweet behavior, and profile information. Since implicit consumption-intent recognition is hard to evaluate, we automatically extracted a large number of cross-social-media user linkages, obtaining more than 120,000 user-link pairs. Experimental results on this automatically built evaluation set show that the multi-label classification method is effective for recognizing users' implicit consumption intent, and that each of the features used helps improve recognition performance.

8.
It is becoming common practice to use recommendation systems to serve users of web-based platforms such as social networking platforms, review websites, and e-commerce websites. Each platform produces recommendations by capturing, maintaining, and analyzing data related to its users and their behavior. However, people generally use different web-based platforms for different purposes, so each platform captures its own data, which may reflect certain aspects of its users. Integrating data from multiple platforms may widen the perspective of the analysis and help model users more effectively. Motivated by this, we developed a recommendation framework which integrates data collected from multiple platforms. For this purpose, we collected and anonymized datasets which contain information from several social networking and social media platforms, namely BlogCatalog, Twitter, Flickr, Facebook, YouTube and LastFm. The collected and integrated data forms a consolidated repository that may become a valuable source for researchers and practitioners. We implemented a number of recommendation methodologies to observe their performance for various cases involving single versus multiple features from a single source versus multiple sources. The conducted experiments show that using multiple features from multiple sources produces a more concrete and wider perspective of users' behavior and preferences, leading to improved recommendation outcomes.

9.
Social networks have become the main medium for information propagation and diffusion in the real world. Modeling and predicting hot information in them has broad application scenarios and commercial value, such as information-diffusion mining, advertisement recommendation, and user behavior analysis. Existing studies mainly model with features and time series, but do not consider the effect of users' social circles on information diffusion. This paper proposes a popularity prediction model based on social circles and an attention mechanism, S...

10.
In the field of recommender systems, understanding the behavioral intent of online users on e-commerce platforms is crucial. Existing methods usually treat the historical interactions between users and items as an ordered sequence, but ignore the time intervals between interactions. Moreover, a user's online behavior may reflect not just one intent but several: for example, a user browsing the sports category may intend to buy both a football and a sports shirt. However, existing methods for predicting user intent on e-commerce platforms struggle to model the time-interval information of user-item interactions and to capture users' multiple shopping intents. To address these problems, we propose a time-aware hierarchical self-attention network, THSNet, to predict user intent on e-commerce platforms more effectively. Specifically, THSNet uses a hierarchical attention mechanism to capture the time-span information in user-item interaction histories and thus model users' multiple intents: the lower attention layer models the user-item interactions within each session, while the upper layer learns long-term dependencies across sessions. In addition, to improve the robustness and accuracy of predictions, we adopt BERT-style pre-training: by randomly masking the feature representations of some sessions, we construct a cloze task and couple it with the intent-prediction task in a multi-task learning model, which helps the model learn robust, bidirectional session representations. We validate the proposed method on two real-world datasets; the experimental results show that THSNet clearly outperforms state-of-the-art methods.
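The hierarchical layers described above are built from a self-attention primitive. The sketch below is the generic scaled dot-product form for a single query, not the paper's time-aware variant, and all names are illustrative:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector:
    a softmax over query-key similarities weights the value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of value vectors, component by component.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

A time-aware variant would additionally bias the scores by the interaction time gaps before the softmax.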

11.
Existing social-network de-anonymization methods are mainly based on network structure, so learning and representing the network structure is the key to de-anonymization. User identity linkage aims to detect the same user across different social network platforms. Deep-learning-based cross-network user alignment techniques learn the structural features of different social networks well and achieve user alignment across networks. We apply this technique to identifying anonymous users within a single social network; the experimental results outperform traditional de-anonymization methods.

12.
Compressed bitmap indexes are used in databases and search engines. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). However, on unsorted data, we can get superior performance with a hybrid compression technique that uses both uncompressed bitmaps and packed arrays inside a two-level tree. An instance of this technique, Roaring, has recently been proposed. Due to its good performance, it has been adopted by several production platforms (e.g., Apache Lucene, Apache Spark, Apache Kylin, and Druid). Yet there are cases where run-length-encoded bitmaps are smaller than the original Roaring bitmaps, typically when the data are sorted so that the bitmaps contain long compressible runs. To better handle these cases, we build a new Roaring hybrid that combines uncompressed bitmaps, packed arrays, and RLE-compressed segments. The result is a new Roaring format that compresses better. Overall, our new implementation of Roaring can be several times faster (up to two orders of magnitude) than the implementations of traditional RLE-based alternatives (WAH, Concise, and EWAH) while compressing better. We review the design choices and optimizations that make these good results possible. Copyright © 2016 John Wiley & Sons, Ltd.
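The space trade-off driving the three-container hybrid can be illustrated with a toy cost model. The byte costs below are simplified assumptions for illustration, not the exact Roaring format specification:

```python
def choose_container(values):
    """Pick a Roaring-style container for one chunk of sorted, distinct
    16-bit values: 'run' when RLE is smallest, 'array' for sparse
    chunks, 'bitmap' otherwise. Costs are approximate byte counts."""
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if b != a + 1)
    run_cost = 2 + 4 * runs        # run count + (start, length) pairs
    array_cost = 2 * len(values)   # one 16-bit entry per value
    bitmap_cost = 8192             # fixed 2^16-bit bitset
    return min([("run", run_cost), ("array", array_cost),
                ("bitmap", bitmap_cost)], key=lambda c: c[1])[0]
```

Sorted data full of long runs picks the RLE container, sparse chunks pick the packed array, and dense irregular chunks fall back to the plain bitmap, which is exactly the case analysis the new format automates.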

13.
Mobile application development has long faced the practical problems of duplicated development across platforms and inconsistent user experience between them. Introducing Web technology into mobile applications can meet the cross-platform requirement. This paper introduces the design of mobile Web applications and hybrid applications, and proposes two hybrid development modes, one Web-centric and one native-centric, which strike a good balance between development efficiency and runtime performance; concrete implementation methods are provided.

14.
Many recommender systems are currently available for proposing content (movies, TV series, music, etc.) to users according to different profiling metrics, such as ratings of previously consumed items and ratings from people with similar tastes. Recommendation algorithms are typically executed by powerful servers, as they are computationally expensive. In this paper, we propose a new software solution to improve the performance of recommender systems. Its implementation relies heavily on Apache Spark technology to speed up the computation of recommendation algorithms. It also includes a webserver, a REST API, and a content cache. To prove that our solution is valid and adequate, we have developed a movie recommender system based on two methods, both tested on the freely available Movielens and Netflix datasets. Performance was assessed by calculating root-mean-square error (RMSE) values and the times needed to produce a recommendation. We also provide quantitative measures of the speed improvement of the recommendation algorithms when the implementation is supported by a computing cluster. The contribution of this paper lies in the fact that our solution, which improves the performance of competitor recommender systems, is the first proposal combining a webserver, a REST API, a content cache, and Apache Spark technology. Copyright © 2016 John Wiley & Sons, Ltd.
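For reference, the root-mean-square-error metric used in the assessment has the standard definition; a minimal single-machine sketch (the paper computes it over cluster-generated predictions):

```python
def rmse(predicted, actual):
    """Root-mean-square error between parallel lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and equal length")
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual))
            / len(actual)) ** 0.5
```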

15.
Yin  Minghao  Liu  Yanheng  Zhou  Xu  Sun  Geng 《Multimedia Tools and Applications》2021,80(30):36215-36235

The point of interest (POI) recommendation problem in location-based social networks (LBSNs) is of great importance, and its challenges lie in data sparsity, implicit user feedback, and personalized preference. To improve recommendation precision, a tensor decomposition based collaborative filtering (TDCF) algorithm is proposed for POI recommendation. Tensor decomposition is used to fill the missing values in the user-category-time tensor; specifically, locations are replaced by location categories to reduce dimensionality in the first phase, which effectively addresses the data sparsity problem. In the second phase, users' preference ratings for POIs are obtained from time- and user-similarity computation and from a hypertext induced topic search (HITS) algorithm with spatial constraints, respectively. Finally, a user's preference scores for locations are determined by the two components with different weights, and the top-N locations are recommended for the user to visit at a given time. Experimental results on two LBSN datasets demonstrate that the proposed model achieves much higher precision and recall than the other three recommendation methods.
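The HITS step of the second phase can be sketched as follows. This is the plain, unconstrained HITS iteration on a user-to-POI edge list; the paper's spatial constraints are omitted, and the example graph is hypothetical:

```python
def hits(edges, iterations=50):
    """HITS on a directed edge list: hubs point to good authorities,
    authorities are pointed to by good hubs; scores are
    L2-normalised after each update."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: sum(hub[s] for s, d in edges if d == n) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        hub = {n: sum(auth[d] for s, d in edges if s == n) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

In the POI setting, authority scores of locations feed into the preference rating, weighted against the similarity-based component.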


16.
Inferring user profiles based on texts created by users on social networks has a variety of applications in recommender systems such as job offering, item recommendation, and targeted advertisement. The problem becomes more challenging when working with short texts like tweets on Twitter or posts on Facebook. This work proposes an integrated framework for the user profiling problem based on Dempster–Shafer theory of evidence, word embedding, and k-means clustering, which is capable not only of working well with short texts but also of dealing with the uncertainty inherent in user texts. The proposed framework is essentially composed of three phases: (1) learning abstract concepts at multiple levels of abstraction from user corpora; (2) evidential inference and combination for user modeling; and (3) user profile extraction. In the first phase, a word embedding technique converts preprocessed texts into vectors that capture the semantics of words in the user corpus, and k-means clustering then learns abstract concepts at multiple levels of abstraction, each of which reflects appropriate semantics of user profiles. In the second phase, considering each document in the user corpus as an evidential source carrying partial information for inferring user profiles, we first infer a mass function for each user document by maximum a posteriori estimation, and then apply Dempster's rule of combination to fuse all documents' mass functions into an overall one for the user corpus. Finally, in the third phase, we apply the pignistic probability principle to extract the top-n keywords from the user's overall mass function to define the user profile. Thanks to its ability to combine pieces of information from many documents, the proposed framework is flexible enough to scale when input data come not only in multiple modes but also from different sources in web environments.
Besides, the resulting profiles are interpretable, visualizable, and usable in practical applications. The effectiveness of the proposed framework is validated by experimental studies conducted on datasets crawled from Twitter and Facebook.
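The combination step of the second phase uses Dempster's rule. A minimal sketch over focal sets represented as frozensets follows; the topic labels in the usage below are hypothetical, and the highly conflicting case (conflict = 1) is not handled:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions, each a
    dict mapping a focal set (frozenset) to its mass. Intersect focal
    elements pairwise, then renormalise by 1 - conflict."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                # Mass assigned to the empty set counts as conflict.
                conflict += wa * wb
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```

Folding this pairwise combination over all of a user's documents yields the overall mass function from which the top-n keywords are extracted.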

17.
In big data sources, real-world entities are typically represented with a variety of schemata and formats (e.g., relational records, JSON objects, etc.). Different profiles (i.e., representations) of an entity often contain redundant and/or inconsistent information. Thus identifying which profiles refer to the same entity is a fundamental task (called Entity Resolution) to unleash the value of big data. The naïve all-pairs comparison solution is impractical on large data, hence blocking methods are employed to partition a profile collection into (possibly overlapping) blocks and limit the comparisons to profiles that appear in the same block together. Meta-blocking is the task of restructuring a block collection, removing superfluous comparisons. Existing meta-blocking approaches rely exclusively on schema-agnostic features, under the assumption that handling the schema variety of big data does not pay-off for such a task. In this paper, we demonstrate how “loose” schema information (i.e., statistics collected directly from the data) can be exploited to enhance the quality of the blocks in a holistic loosely schema-aware (meta-)blocking approach that can be used to speed up your favorite Entity Resolution algorithm. We call it Blast (Blocking with Loosely-Aware Schema Techniques). We show how Blast can automatically extract the loose schema information by adopting an LSH-based step for efficiently handling volume and schema heterogeneity of the data. Furthermore, we introduce a novel meta-blocking algorithm that can be employed to efficiently execute Blast on MapReduce-like systems (such as Apache Spark). Finally, we experimentally demonstrate, on real-world datasets, how Blast outperforms the state-of-the-art (meta-)blocking approaches.
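For context, the schema-agnostic baseline that loose schema information refines is standard token blocking, which can be sketched as follows. The profile layout and attribute names are hypothetical, and this is the textbook method rather than Blast itself:

```python
def token_blocking(profiles):
    """Schema-agnostic token blocking: every token appearing in any
    attribute value becomes a block key, and profiles sharing a token
    land in the same block. `profiles` maps profile id -> attr dict."""
    blocks = {}
    for pid, attrs in profiles.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks.setdefault(token, set()).add(pid)
    # Keep only blocks that actually generate comparisons.
    return {t: ids for t, ids in blocks.items() if len(ids) > 1}
```

Meta-blocking then restructures the resulting block collection to prune the superfluous comparisons this naive keying produces.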

18.
Our social media experience is no longer limited to a single site. We use different social media sites for different purposes and our information on each site is often partial. By collecting complementary information for the same individual across sites, one can better profile users. These profiles can help improve online services such as advertising or recommendation across sites. To combine complementary information across sites, it is critical to understand how information for the same individual varies across sites. In this study, we aim to understand how two fundamental properties of users vary across social media sites. First, we study how user friendship behavior varies across sites. Our findings show how friend distributions for individuals change as they join new sites. Next, we analyze how user popularity changes across sites as individuals join different sites. We evaluate our findings and demonstrate how our findings can be employed to predict how popular users are likely to be on new sites they join.

19.
Networks with billions of vertices introduce new challenges for performing graph analysis in a reasonable time. The clustering coefficient is an important analytical measure of networks such as social networks and biological networks. To compute the clustering coefficient in big graphs, existing distributed algorithms suffer from low efficiency: they may fail because they demand large amounts of memory, or, even if they complete successfully, their execution time is unacceptable for real-world applications. We present a distributed MapReduce-based algorithm, called CCFinder, to efficiently compute the clustering coefficient in very big graphs. CCFinder is executed on Apache Spark, a scalable data processing platform. It efficiently detects existing triangles using our proposed data structure, called FONL, which is cached in the distributed memory provided by Spark and reused multiple times. Because the data items in the FONL are fine-grained and contain the minimum required information, CCFinder requires less storage space and has better parallelism than its competitors. To find the clustering coefficient, our triangle-counting solution is extended so that degree information of the vertices is available in the appropriate places. We performed several experiments on a Spark cluster with 60 processors. The results show that CCFinder achieves acceptable scalability and outperforms six existing competitor methods: four are based on graph processing systems (the GraphX, NScale, NScaleSpark, and Pregel frameworks), and the other two are Cohen's method and NodeIterator++, both based on MapReduce.
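The quantity CCFinder distributes is the local clustering coefficient; a single-machine sketch of its definition follows. CCFinder's FONL-based distributed computation is far more involved, so this only pins down what is being computed:

```python
def clustering_coefficient(adj, v):
    """Local clustering coefficient of vertex v in an undirected graph
    given as {vertex: set(neighbours)}: the fraction of neighbour
    pairs that are themselves connected (i.e., triangles through v)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count each neighbour pair once via the a < b ordering.
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

The distributed challenge is precisely the `b in adj[a]` triangle check, which naive approaches answer by shuffling full neighbour lists across the cluster.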

20.
易佳  薛晨  王树鹏 《计算机科学》2017,44(5):172-177
Distributed stream querying is a real-time query method over data streams that has attracted wide attention and developed rapidly in recent years. This paper surveys the research achievements of distributed stream-processing frameworks in real-time relational querying, and analyzes and compares products for distributed data loading, distributed stream-computing frameworks, and distributed stream querying. It proposes a distributed stream-query model built on Spark Streaming and Apache Kafka, which loads multiple file sources concurrently and designs an in-memory file system for fast data loading, more than doubling the loading speed compared with Apache Flume-based loading. On top of Spark Streaming, a distributed stream-query interface based on Spark SQL is implemented, together with a hand-coded SQL parsing approach for distributed querying. Test results show that for complex query statements, the hand-coded SQL parsing approach has a clear efficiency advantage.

