Similar documents
Found 20 similar documents (search time: 15 ms)
1.
How users rate a mobile app via star ratings and user reviews is of utmost importance for the success of an app. Recent studies and surveys show that users rely heavily on the star ratings and user reviews provided by other users when deciding which app to download. However, understanding star ratings and user reviews is a complicated matter, since they are influenced by many factors, such as the actual quality of the app and how the user perceives that quality relative to their expectations, which are in turn shaped by their prior experiences and expectations of other apps on the platform (e.g., iOS versus Android). Nevertheless, star ratings and user reviews provide developers with valuable information for improving the overall impression of their app. In an effort to expand their revenue and reach more users, app developers commonly build cross-platform apps, i.e., apps that are available on multiple platforms. Because star ratings and user reviews are so important in the mobile app industry, it is essential for developers of cross-platform apps to maintain a consistent level of star ratings and user reviews across the various platforms on which their apps are available. In this paper, we investigate whether cross-platform apps achieve a consistent level of star ratings and user reviews. We manually identify 19 cross-platform apps and conduct an empirical study on their star ratings and user reviews. By manually tagging 9,902 one- and two-star reviews of the studied cross-platform apps, we discover that the distribution of complaint-type frequencies varies across platforms. We also study the negative-impact ratio of complaint types and find that for some apps, users have higher expectations on one platform. All our proposed techniques and methodologies are generic and can be used for any app.
Our findings show that at least 79% of the studied cross-platform apps do not have consistent star ratings, which suggests that developers need to consider different quality-assurance efforts for the different platforms they wish to support.

2.

Heterogeneous information networks, which consist of multi-typed vertices representing objects and multi-typed edges representing relations between objects, are ubiquitous in the real world. In this paper, we study the problem of entity matching for heterogeneous information networks based on distributed network embedding and a multi-layer perceptron with a highway network, and we propose a new method named DEM (short for Deep Entity Matching). In contrast to traditional entity matching methods, DEM utilizes the multi-layer perceptron with a highway network to explore hidden relations and improve matching performance. Importantly, we incorporate DEM with the network embedding methodology, enabling highly efficient computation in a vectorized manner. DEM's generic modeling of both the network structure and the entity attributes enables it to model various heterogeneous information networks flexibly. To illustrate its functionality, we apply the DEM algorithm to two real-world entity matching applications: user linkage in the social network analysis scenario, which predicts matched users across different social platforms, and record linkage, which predicts matched records across different citation networks. Extensive experiments on real-world datasets demonstrate DEM's effectiveness and rationality.
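As a rough illustration of the highway component mentioned above, here is a generic highway layer in plain Python. This is a sketch of the standard highway formulation only; the weight shapes, the tanh transform, and all variable names are assumptions, not details taken from the paper:

```python
import math

def highway_layer(x, w_h, w_t, b_h, b_t):
    """One highway layer: y = t * H(x) + (1 - t) * x, where the transform
    gate t = sigmoid(W_t x + b_t) decides how much of the transformed
    signal H(x) = tanh(W_h x + b_h) passes versus the raw input x."""
    def matvec(w, v, b):
        # Dense affine map: each output component is row . v + bias.
        return [sum(wi * vi for wi, vi in zip(row, v)) + bi
                for row, bi in zip(w, b)]
    h = [math.tanh(v) for v in matvec(w_h, x, b_h)]
    t = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w_t, x, b_t)]
    # Gated mix of transformed signal and the carried-through input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

When the transform gate saturates near zero, the layer reduces to the identity, which is what lets such networks train at depth.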


3.
郑永广  岳昆  尹子都  张学杰 《计算机应用》2017,37(11):3101-3106
To quickly and effectively select key users with strong information-propagation ability from a large-scale social network and the historical data of messages its users have posted, a key-user selection method is proposed. First, the structural information of the social network is used to build a directed graph with users as nodes; based on the Spark computing framework, the historical message data is used to quantitatively compute edge weights characterized by user activity, retweet interaction, and share of information volume, yielding a directed weighted graph model of the social network. Then, drawing on the PageRank algorithm, a metric for a user's information-propagation ability is established, and a Spark-based method for computing this ability in large-scale social networks is given. Further, a Spark-based d-distance selection algorithm is presented, which iterates so that the propagation ranges of the selected key users overlap as little as possible. Experimental results on Sina Weibo data show that the proposed method is efficient, feasible, and scalable, and can support the control of harmful breaking news and the monitoring of public opinion on social networks.
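The propagation-ability measure above builds on PageRank. As an illustrative sketch only — plain single-machine Python with uniform edge weights, not the paper's activity/retweet-weighted, Spark-parallel version:

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank over a directed edge list [(src, dst), ...].
    Dangling-node mass is simply dropped for brevity."""
    nodes = {n for e in edges for n in e}
    out_deg = {n: 0 for n in nodes}
    for src, _ in edges:
        out_deg[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, dst in edges:
            # Each node splits its rank evenly among its out-links.
            new[dst] += damping * rank[src] / out_deg[src]
        rank = new
    return rank
```

In the paper's setting, the uniform contribution `rank[src] / out_deg[src]` would be replaced by the activity/interaction/volume weights computed on Spark.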

4.

One of the most challenging issues in the big data research area is the inability to process a large volume of information in a reasonable time. Hadoop and Spark are two frameworks for distributed data processing. Hadoop is a very popular and general platform for big data processing. Thanks to its in-memory programming model, the open-source Spark framework is well suited to iterative algorithms. In this paper, the Hadoop and Spark big data processing platforms are evaluated and compared in terms of runtime, memory and network usage, and CPU efficiency. To this end, the K-nearest neighbor (KNN) algorithm is implemented on datasets of different sizes within both frameworks. The results show that the runtime of the KNN algorithm on Spark is 4 to 4.5 times faster than on Hadoop. The evaluations also show that Hadoop consumes more resources, including CPU and network, so Spark uses the CPU more efficiently; on the other hand, Hadoop uses less memory than Spark.
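For reference, the KNN classification that the comparison is built around can be sketched in a few lines of single-machine Python; the paper's subject is the distributed Hadoop/Spark implementations, which this sketch does not attempt to reproduce:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points, using squared Euclidean distance.
    `train` is a list of ((features...), label) pairs."""
    neighbors = sorted(
        train,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

The distributed versions differ mainly in how the distance computation and the vote are partitioned across workers, not in this core logic.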


5.
With the rise of homestays and online short-term rental platforms, the phenomenon of hosts multi-homing across platforms has drawn continued attention and research. This phenomenon offers a new research angle, and identifying the same host across different platforms is the first problem to solve. Building on traditional user matching, this paper explores a cross-platform host matching algorithm for C2C online short-term rentals. Because hosts' personal information is sparse, we introduce listing information and design a two-stage host matching algorithm based on listing information (TSHM). On an ordinary dataset and a hard-case dataset, both built from real data of two major domestic online short-term rental platforms, our method achieves 99.69% and 81.97% accuracy respectively, outperforming traditional classifiers such as SVM and DT, which validates the effectiveness of the matching model and the matching features. It offers a new approach to cross-platform host matching and can match hosts effectively even when their personal information is lacking. However, our experiments only cover domestic platform data and do not incorporate text or image features, which is a limitation.

6.
Spatial analytics systems (SASs) represent a technology capable of managing huge volumes of spatial data using frameworks such as Apache Hadoop and Apache Spark. An increasing number of SASs have been proposed, requiring a comparison among them. However, existing comparisons in the literature provide a system-centric view based on performance evaluations. Thus, there is a lack of comparisons based on the user-centric view, that is, comparisons that help users to understand how the characteristics of SASs are useful to meet the specific requirements of their spatial applications. In this article, we provide a user-centric comparison of the following SASs based on Hadoop and Spark: Hadoop-GIS, SpatialHadoop, SpatialSpark, GeoSpark, GeoMesa Spark, SIMBA, LocationSpark, STARK, Magellan, SparkGIS, and Elcano. This comparison employs an extensive set of criteria related to the general characteristics of these systems, to the aspects of spatial data handling, and to the aspects inherent to distributed systems. Based on this comparison, we introduce guidelines to help users to choose an appropriate SAS. We also describe two case studies based on real-world applications to illustrate the use of these guidelines. Finally, we discuss chronological tendencies related to SASs and identify limitations that SASs should address to improve user experience.

7.
付博  刘挺 《软件学报》2016,27(11):2843-2854
Unlike existing work on explicit consumption-intent recognition, this paper proposes a method for automatically recognizing users' implicit consumption intent in social media. The method treats implicit consumption-intent recognition as a multi-label classification problem and combines features based on users' follow behavior, intent-related follow behavior, intent-related retweet behavior, and profile information. Since implicit consumption-intent recognition is hard to evaluate, we automatically extracted a large number of cross-social-media user linkages, obtaining more than 120,000 user-link pairs. Experimental results on this automatically built evaluation set show that the multi-label classification method is effective for recognizing users' implicit consumption intent, and that each of the features used helps improve recognition performance.

8.
It is becoming common practice to use recommendation systems to serve users of web-based platforms such as social networking platforms, review websites, and e-commerce websites. Each platform produces recommendations by capturing, maintaining, and analyzing data related to its users and their behavior. However, people generally use different web-based platforms for different purposes, so each platform captures its own data, which may reflect certain aspects of its users. Integrating data from multiple platforms may widen the perspective of the analysis and help model users more effectively. Motivated by this, we developed a recommendation framework which integrates data collected from multiple platforms. For this purpose, we collected and anonymized datasets which contain information from several social networking and social media platforms, namely BlogCatalog, Twitter, Flickr, Facebook, YouTube and LastFm. The collected and integrated data forms a consolidated repository that may become a valuable source for researchers and practitioners. We implemented a number of recommendation methodologies to observe their performance for various cases involving single versus multiple features from a single source versus multiple sources. The conducted experiments show that using multiple features from multiple sources produces a more concrete and wider perspective of users' behavior and preferences, leading to improved recommendation outcomes.

9.
Social networks have become the main medium for information propagation and diffusion in the real world. Modeling and predicting hot information in them has broad application scenarios and commercial value, such as information-diffusion mining, advertisement recommendation, and user behavior analysis. Existing studies mainly model with features and time series, but do not consider the effect of users' social circles on information diffusion. This paper proposes a popularity prediction model based on social circles and an attention mechanism, S...

10.
In the field of recommender systems, understanding the behavioral intent of online users on e-commerce platforms is crucial. Existing methods usually treat the historical interactions between users and items as an ordered sequence, but ignore the time intervals between interactions. Moreover, a user's online behavior may reflect not just one intent but several: for example, a user browsing the sports category may intend to buy both a football and a sports shirt. However, existing methods for predicting user intent on e-commerce platforms struggle to model the time-interval information of user-item interactions and to capture users' multiple shopping intents. To address these problems, we propose a time-aware hierarchical self-attention network, THSNet, to predict user intent on e-commerce platforms more effectively. Specifically, THSNet uses a hierarchical attention mechanism to capture the time-span information in user-item interaction histories and thus model users' multiple intents: the lower attention layer models the user-item interactions within each session, while the upper layer learns long-term dependencies across sessions. In addition, to improve the robustness and accuracy of predictions, we adopt BERT-style pre-training: by randomly masking the feature representations of some sessions, we construct a cloze task and couple it with the intent-prediction task in a multi-task learning model, which helps the model learn robust, bidirectional session representations. We validate the proposed method on two real-world datasets; the experimental results show that THSNet clearly outperforms state-of-the-art methods.
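The hierarchical layers described above are built from a self-attention primitive. The sketch below is the generic scaled dot-product form for a single query, not the paper's time-aware variant, and all names are illustrative:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector:
    a softmax over query-key similarities weights the value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of value vectors, component by component.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

A time-aware variant would additionally bias the scores by the interaction time gaps before the softmax.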

11.
Existing social-network de-anonymization methods are mainly based on network structure, so learning and representing the network structure is the key to de-anonymization. User identity linkage aims to detect the same user across different social network platforms. Deep-learning-based cross-network user alignment techniques learn the structural features of different social networks well and achieve user alignment across networks. We apply this technique to identifying anonymous users within a single social network; the experimental results outperform traditional de-anonymization methods.

12.
Compressed bitmap indexes are used in databases and search engines. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). However, on unsorted data, we can get superior performance with a hybrid compression technique that uses both uncompressed bitmaps and packed arrays inside a two-level tree. An instance of this technique, Roaring, has recently been proposed. Due to its good performance, it has been adopted by several production platforms (e.g., Apache Lucene, Apache Spark, Apache Kylin, and Druid). Yet there are cases where run-length-encoded bitmaps are smaller than the original Roaring bitmaps, typically when the data are sorted so that the bitmaps contain long compressible runs. To better handle these cases, we build a new Roaring hybrid that combines uncompressed bitmaps, packed arrays, and RLE-compressed segments. The result is a new Roaring format that compresses better. Overall, our new implementation of Roaring can be several times faster (up to two orders of magnitude) than the implementations of traditional RLE-based alternatives (WAH, Concise, and EWAH) while compressing better. We review the design choices and optimizations that make these good results possible. Copyright © 2016 John Wiley & Sons, Ltd.
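The space trade-off driving the three-container hybrid can be illustrated with a toy cost model. The byte costs below are simplified assumptions for illustration, not the exact Roaring format specification:

```python
def choose_container(values):
    """Pick a Roaring-style container for one chunk of sorted, distinct
    16-bit values: 'run' when RLE is smallest, 'array' for sparse
    chunks, 'bitmap' otherwise. Costs are approximate byte counts."""
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if b != a + 1)
    run_cost = 2 + 4 * runs        # run count + (start, length) pairs
    array_cost = 2 * len(values)   # one 16-bit entry per value
    bitmap_cost = 8192             # fixed 2^16-bit bitset
    return min([("run", run_cost), ("array", array_cost),
                ("bitmap", bitmap_cost)], key=lambda c: c[1])[0]
```

Sorted data full of long runs picks the RLE container, sparse chunks pick the packed array, and dense irregular chunks fall back to the plain bitmap, which is exactly the case analysis the new format automates.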

13.
Mobile application development has long faced the practical problems of duplicated development across platforms and inconsistent user experience between them. Introducing Web technology into mobile applications can meet the cross-platform requirement. This paper introduces the design of mobile Web applications and hybrid applications, and proposes two hybrid development modes, one Web-centric and one native-centric, which strike a good balance between development efficiency and runtime performance; concrete implementation methods are provided.

14.
Many recommender systems are currently available for proposing content (movies, TV series, music, etc.) to users according to different profiling metrics, such as ratings of previously consumed items and ratings from people with similar tastes. Recommendation algorithms are typically executed by powerful servers, as they are computationally expensive. In this paper, we propose a new software solution to improve the performance of recommender systems. Its implementation relies heavily on Apache Spark technology to speed up the computation of recommendation algorithms. It also includes a webserver, a REST API, and a content cache. To prove that our solution is valid and adequate, we have developed a movie recommender system based on two methods, both tested on the freely available Movielens and Netflix datasets. Performance was assessed by calculating root-mean-square error (RMSE) values and the times needed to produce a recommendation. We also provide quantitative measures of the speed improvement of the recommendation algorithms when the implementation is supported by a computing cluster. The contribution of this paper lies in the fact that our solution, which improves the performance of competitor recommender systems, is the first proposal combining a webserver, a REST API, a content cache, and Apache Spark technology. Copyright © 2016 John Wiley & Sons, Ltd.
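For reference, the root-mean-square-error metric used in the assessment has the standard definition; a minimal single-machine sketch (the paper computes it over cluster-generated predictions):

```python
def rmse(predicted, actual):
    """Root-mean-square error between parallel lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and equal length")
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual))
            / len(actual)) ** 0.5
```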

15.
Yin  Minghao  Liu  Yanheng  Zhou  Xu  Sun  Geng 《Multimedia Tools and Applications》2021,80(30):36215-36235

The point of interest (POI) recommendation problem in location-based social networks (LBSNs) is of great importance, and its challenges lie in data sparsity, implicit user feedback, and personalized preference. To improve recommendation precision, a tensor decomposition based collaborative filtering (TDCF) algorithm is proposed for POI recommendation. Tensor decomposition is used to fill the missing values in the user-category-time tensor; specifically, locations are replaced by location categories to reduce dimensionality in the first phase, which effectively addresses the data sparsity problem. In the second phase, users' preference ratings for POIs are obtained from time- and user-similarity computation and from a hypertext induced topic search (HITS) algorithm with spatial constraints, respectively. Finally, a user's preference scores for locations are determined by the two components with different weights, and the top-N locations are recommended for the user to visit at a given time. Experimental results on two LBSN datasets demonstrate that the proposed model achieves much higher precision and recall than the other three recommendation methods.
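The HITS step of the second phase can be sketched as follows. This is the plain, unconstrained HITS iteration on a user-to-POI edge list; the paper's spatial constraints are omitted, and the example graph is hypothetical:

```python
def hits(edges, iterations=50):
    """HITS on a directed edge list: hubs point to good authorities,
    authorities are pointed to by good hubs; scores are
    L2-normalised after each update."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: sum(hub[s] for s, d in edges if d == n) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        hub = {n: sum(auth[d] for s, d in edges if s == n) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

In the POI setting, authority scores of locations feed into the preference rating, weighted against the similarity-based component.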


16.
Inferring user profiles based on texts created by users on social networks has a variety of applications in recommender systems such as job offering, item recommendation, and targeted advertisement. The problem becomes more challenging when working with short texts like tweets on Twitter or posts on Facebook. This work proposes an integrated framework for the user profiling problem based on Dempster–Shafer theory of evidence, word embedding, and k-means clustering, which is capable not only of working well with short texts but also of dealing with the uncertainty inherent in user texts. The proposed framework is essentially composed of three phases: (1) learning abstract concepts at multiple levels of abstraction from user corpora; (2) evidential inference and combination for user modeling; and (3) user profile extraction. In the first phase, a word embedding technique converts preprocessed texts into vectors that capture the semantics of words in the user corpus, and k-means clustering then learns abstract concepts at multiple levels of abstraction, each of which reflects appropriate semantics of user profiles. In the second phase, considering each document in the user corpus as an evidential source carrying partial information for inferring user profiles, we first infer a mass function for each user document by maximum a posteriori estimation, and then apply Dempster's rule of combination to fuse all documents' mass functions into an overall one for the user corpus. Finally, in the third phase, we apply the pignistic probability principle to extract the top-n keywords from the user's overall mass function to define the user profile. Thanks to its ability to combine pieces of information from many documents, the proposed framework is flexible enough to scale when input data come not only in multiple modes but also from different sources in web environments.
Besides, the resulting profiles are interpretable, visualizable, and usable in practical applications. The effectiveness of the proposed framework is validated by experimental studies conducted on datasets crawled from Twitter and Facebook.
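The combination step of the second phase uses Dempster's rule. A minimal sketch over focal sets represented as frozensets follows; the topic labels in the usage below are hypothetical, and the highly conflicting case (conflict = 1) is not handled:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions, each a
    dict mapping a focal set (frozenset) to its mass. Intersect focal
    elements pairwise, then renormalise by 1 - conflict."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                # Mass assigned to the empty set counts as conflict.
                conflict += wa * wb
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```

Folding this pairwise combination over all of a user's documents yields the overall mass function from which the top-n keywords are extracted.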

17.
In big data sources, real-world entities are typically represented with a variety of schemata and formats (e.g., relational records, JSON objects, etc.). Different profiles (i.e., representations) of an entity often contain redundant and/or inconsistent information. Thus identifying which profiles refer to the same entity is a fundamental task (called Entity Resolution) to unleash the value of big data. The naïve all-pairs comparison solution is impractical on large data, hence blocking methods are employed to partition a profile collection into (possibly overlapping) blocks and limit the comparisons to profiles that appear in the same block together. Meta-blocking is the task of restructuring a block collection, removing superfluous comparisons. Existing meta-blocking approaches rely exclusively on schema-agnostic features, under the assumption that handling the schema variety of big data does not pay-off for such a task. In this paper, we demonstrate how “loose” schema information (i.e., statistics collected directly from the data) can be exploited to enhance the quality of the blocks in a holistic loosely schema-aware (meta-)blocking approach that can be used to speed up your favorite Entity Resolution algorithm. We call it Blast (Blocking with Loosely-Aware Schema Techniques). We show how Blast can automatically extract the loose schema information by adopting an LSH-based step for efficiently handling volume and schema heterogeneity of the data. Furthermore, we introduce a novel meta-blocking algorithm that can be employed to efficiently execute Blast on MapReduce-like systems (such as Apache Spark). Finally, we experimentally demonstrate, on real-world datasets, how Blast outperforms the state-of-the-art (meta-)blocking approaches.
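For context, the schema-agnostic baseline that loose schema information refines is standard token blocking, which can be sketched as follows. The profile layout and attribute names are hypothetical, and this is the textbook method rather than Blast itself:

```python
def token_blocking(profiles):
    """Schema-agnostic token blocking: every token appearing in any
    attribute value becomes a block key, and profiles sharing a token
    land in the same block. `profiles` maps profile id -> attr dict."""
    blocks = {}
    for pid, attrs in profiles.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks.setdefault(token, set()).add(pid)
    # Keep only blocks that actually generate comparisons.
    return {t: ids for t, ids in blocks.items() if len(ids) > 1}
```

Meta-blocking then restructures the resulting block collection to prune the superfluous comparisons this naive keying produces.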

18.
Our social media experience is no longer limited to a single site. We use different social media sites for different purposes and our information on each site is often partial. By collecting complementary information for the same individual across sites, one can better profile users. These profiles can help improve online services such as advertising or recommendation across sites. To combine complementary information across sites, it is critical to understand how information for the same individual varies across sites. In this study, we aim to understand how two fundamental properties of users vary across social media sites. First, we study how user friendship behavior varies across sites. Our findings show how friend distributions for individuals change as they join new sites. Next, we analyze how user popularity changes across sites as individuals join different sites. We evaluate our findings and demonstrate how our findings can be employed to predict how popular users are likely to be on new sites they join.

19.
Networks with billions of vertices introduce new challenges for performing graph analysis in a reasonable time. The clustering coefficient is an important analytical measure of networks such as social networks and biological networks. To compute the clustering coefficient in big graphs, existing distributed algorithms suffer from low efficiency: they may fail because they demand large amounts of memory, or, even if they complete successfully, their execution time is unacceptable for real-world applications. We present a distributed MapReduce-based algorithm, called CCFinder, to efficiently compute the clustering coefficient in very big graphs. CCFinder is executed on Apache Spark, a scalable data processing platform. It efficiently detects existing triangles using our proposed data structure, called FONL, which is cached in the distributed memory provided by Spark and reused multiple times. Because the data items in the FONL are fine-grained and contain the minimum required information, CCFinder requires less storage space and has better parallelism than its competitors. To find the clustering coefficient, our triangle-counting solution is extended so that degree information of the vertices is available in the appropriate places. We performed several experiments on a Spark cluster with 60 processors. The results show that CCFinder achieves acceptable scalability and outperforms six existing competitor methods: four are based on graph processing systems (the GraphX, NScale, NScaleSpark, and Pregel frameworks), and the other two are Cohen's method and NodeIterator++, both based on MapReduce.
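The quantity CCFinder distributes is the local clustering coefficient; a single-machine sketch of its definition follows. CCFinder's FONL-based distributed computation is far more involved, so this only pins down what is being computed:

```python
def clustering_coefficient(adj, v):
    """Local clustering coefficient of vertex v in an undirected graph
    given as {vertex: set(neighbours)}: the fraction of neighbour
    pairs that are themselves connected (i.e., triangles through v)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count each neighbour pair once via the a < b ordering.
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

The distributed challenge is precisely the `b in adj[a]` triangle check, which naive approaches answer by shuffling full neighbour lists across the cluster.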

20.
易佳  薛晨  王树鹏 《计算机科学》2017,44(5):172-177
Distributed stream querying is a real-time query method over data streams that has attracted wide attention and developed rapidly in recent years. This paper surveys the research achievements of distributed stream-processing frameworks in real-time relational querying, and analyzes and compares products for distributed data loading, distributed stream-computing frameworks, and distributed stream querying. It proposes a distributed stream-query model built on Spark Streaming and Apache Kafka, which loads multiple file sources concurrently and designs an in-memory file system for fast data loading, more than doubling the loading speed compared with Apache Flume-based loading. On top of Spark Streaming, a distributed stream-query interface based on Spark SQL is implemented, together with a hand-coded SQL parsing approach for distributed querying. Test results show that for complex query statements, the hand-coded SQL parsing approach has a clear efficiency advantage.

