Similar literature
 20 similar documents found
1.
Statistical analysis of baseball has long been popular, albeit only in limited capacity until relatively recently. In particular, analysts can now apply machine learning algorithms to large baseball data sets to derive meaningful insights into player and team performance. In the interest of stimulating new research and serving as a go-to resource for academic and industrial analysts, we perform a systematic literature review of machine learning applications in baseball analytics. The approaches employed in the literature fall mainly under three problem classes: Regression, Binary Classification, and Multiclass Classification. We categorize these approaches, provide our insights on possible future applications, and conclude with a summary of our findings. We find two algorithms dominate the literature: (1) Support Vector Machines for classification problems and (2) k-nearest neighbors for both classification and regression problems. We postulate that recent proliferation of neural networks in general machine learning research will soon carry over into baseball analytics.

2.
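As an encrypted communication protocol, SSH not only secures services such as remote login; its tunneling feature can also encapsulate other, unknown applications, which poses a potential threat to network security. These applications therefore need to be identified accurately so that appropriate measures can be taken promptly to maintain network security. Because SSH traffic is encrypted, identification usually relies on statistical features of the traffic flows and mostly uses supervised machine learning. This paper contrasts unsupervised with supervised machine learning by comparing the classification performance of five methods (C4.5, SVM, BayesNet, K-means, and EM) on SSH applications, and confirms that identifying SSH applications with machine learning is feasible. The experimental results show that the unsupervised K-means method classifies best and reaches its highest accuracy, above 99%, when identifying HTTP applications inside SSH tunnels.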

3.
Credit scoring and monitoring are the two important dimensions of the decision-making process for loan institutions. In the first part of this study, we investigate the role of machine learning for applicant reassessment and propose a complementary screening step to an existing scoring system. We use a real data set from one of the prominent loan companies in Turkey. The information provided by the applicants forms the variables in our analysis. The company's experts have already labeled the clients as bad and good according to their ongoing payments. Using this labeled data set, we execute several methods to classify the bad applicants and to identify the significant variables in this classification. As the data set consists of applicants who have passed the initial scoring system, most of the clients are marked as good. To deal with this imbalanced nature of the problem, we employ a set of different approaches to improve the performance of predicting the applicants who are likely to default. In the second part of this study, we aim to predict the payment behavior of clients based on their static (demographic and financial) and dynamic (payment) information. Furthermore, we analyze the effect of the length of the payment history and the staying power of the proposed prediction models.

4.
5.
Multicore hardware and software are becoming increasingly complex. The programmability problem of multicore software has led to the use of parallel patterns. Parallel patterns reduce the effort and time required to develop multicore software by effectively capturing its thread communication and data sharing characteristics. Hence, detecting the parallel pattern used in a multi-threaded application is crucial for performance improvements and enables many architectural optimizations; however, this topic has not been widely studied. We apply machine learning techniques in a novel approach to automatically detect parallel patterns and compare these techniques in terms of accuracy and speed. We experimentally validate the detection ability of our techniques on benchmarks including PARSEC and Rodinia. Our experiments show that the k-nearest neighbor, decision trees, and naive Bayes classifier are the most accurate techniques. Overall, decision trees are the fastest technique with the lowest characterization overhead producing the best combination of detection results. We also show the usefulness of the proposed techniques on synthetic benchmark generation.

6.
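Although machine learning has achieved great success in many fields, its lack of interpretability severely limits its wide use in real-world tasks, especially security-sensitive ones. To overcome this weakness, many researchers have studied in depth how to improve the interpretability of machine learning models and have proposed a large number of explanation methods to help users understand how models work internally. However, interpretability research is still at an early stage, and many scientific questions remain open. Moreover, researchers approach the problem from different angles, attach different meanings to interpretability, and emphasize different aspects in the explanation methods they propose. To date, the academic community lacks a unified understanding of model interpretability, and the overall architecture of interpretability research remains unclear. This survey reviews the interpretability problem in machine learning and systematically summarizes and categorizes existing work. It also discusses potential applications of interpretability techniques, analyzes the relationship between interpretability and the security of interpretable machine learning, and explores the current challenges and potential future research directions, with the aim of further advancing the development and application of interpretability research.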

7.
8.
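P2P traffic has gradually become a major component of Internet traffic. While it has greatly driven the growth of the Internet, it has also brought problems such as security risks and network congestion caused by excessive resource consumption, which hinder normal network services. This paper proposes a machine-learning-based scheme for identifying P2P traffic. It uses the FCBF (Fast Correlation-Based Filter) feature selection algorithm to form a subset of traffic features, builds a machine learning model for P2P traffic identification, and compares the performance of several common machine learning algorithms on traffic identification. The test results show that both the C4.5 algorithm and the Bayesian network algorithm are suitable for P2P traffic detection, with some individual models reaching an identification rate above 90%.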

9.
Risk assessment of financial intermediaries is an area of renewed interest due to the financial crises of the 1980's and 90's. An accurate estimation of risk, and its use in corporate or global financial risk models, could be translated into a more efficient use of resources. One important ingredient to accomplish this goal is to find accurate predictors of individual risk in the credit portfolios of institutions. In this context we make a comparative analysis of different statistical and machine learning modeling methods of classification on a mortgage loan data set with the motivation to understand their limitations and potential. We introduced a specific modeling methodology based on the study of error curves. Using state-of-the-art modeling techniques we built more than 9,000 models as part of the study. The results show that CART decision-tree models provide the best estimation for default with an average 8.31% error rate for a training sample of 2,000 records. As a result of the error curve analysis for this model we conclude that if more data were available, approximately 22,000 records, a potential 7.32% error rate could be achieved. Neural Networks provided the second best results with an average error of 11.00%. The K-Nearest Neighbor algorithm had an average error rate of 14.95%. These results outperformed the standard Probit algorithm which attained an average error rate of 15.13%. Finally we discuss the possibilities to use this type of accurate predictive model as ingredients of institutional and global risk models.

10.
11.
This paper contrasts the shallow representational methods used in many of today's commercial expert systems with methods which reason about the function and structure of the objects under consideration. These methods are used to build 'deep knowledge' systems and are still being researched. The paper gives examples of the experimental application of these methods in different domains.

12.
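In recent years, data analysis and data publishing techniques based on machine learning have become a hot research direction. Compared with traditional data analysis techniques, machine learning has the advantage of accurately analyzing the structure and patterns of big data. However, the privacy and security problems of machine-learning-based data analysis are increasingly prominent, and incidents in which machine learning models leak private information from users' training sets occur frequently. For example, membership inference attacks reveal whether a record was present in the training data, and member attribute attacks leak private attribute information from a model's training set. Differential privacy, a commonly used technique for traditional data privacy protection, is now being integrated into machine learning to protect user privacy. However, research at the intersection of privacy security, machine learning, and attacks on machine learning remains rare. This paper covers the following. First, it surveys the development of differential privacy, including the definitions, properties, and implementation mechanisms of its common variants, and illustrates the application scenarios of several differential privacy mechanisms with examples. In addition, it discusses in detail the recent Rényi differential privacy definition and the moments accountant technique for accumulating privacy loss. Second, it summarizes in detail the common privacy threat models in machine learning, concrete forms of privacy attacks, and how well differential privacy resists each of them. Third, taking the discriminative and generative models common in machine learning as examples, it explains how differential privacy is applied to protect machine learning models, including differentially private stochastic gradient perturbation (DP-SGD) and differentially private knowledge transfer (PATE). Finally, it discusses several research directions and open problems for differential privacy mechanisms aimed at machine learning.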

13.
张幸幸, 朱振峰, 赵亚威, 赵耀. 《软件学报》 (Journal of Software), 2022, 33(10): 3732-3753
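As information technology penetrates ever deeper into all areas of society, the total amount of data held by human society has reached an unprecedented level. On the one hand, massive data gives data-driven machine learning methods ample room to extract valuable information; on the other hand, high dimensionality, heavy redundancy, and high noise are inherent properties of such large and complex data. Prototype learning is an effective way to eliminate data redundancy, discover data structure, and improve data quality. It seeks a prototype set to represent the target set, thereby reducing the data in the sample space, which improves data usability and also speeds up the execution of machine learning algorithms. Its feasibility has been demonstrated in many application areas. Research on the theory and methods of prototype learning is therefore a current hot topic and focus in machine learning. This survey first introduces the research background and application value of prototype learning and outlines the basic characteristics of the various prototype learning methods, the quality evaluation of prototypes, and typical applications. It then reviews research progress from two perspectives, the supervision mode and the model design: the former covers unsupervised, semi-supervised, and fully supervised settings, and the latter covers four families of prototype learning methods based on similarity, determinantal point processes, data reconstruction, and low-rank approximation. Finally, it looks ahead to future directions for prototype learning.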

14.
Langley, Pat. Machine Learning, 1986, 1(3): 243-248
Summary: Although science can be characterized in terms of search, some search methods let one explore multiple paths in parallel. We have argued that more machine learning researchers should focus their efforts on modeling human behavior, but we have not argued that the field should limit itself to this approach. For those interested in general principles, the study of nonhuman learning methods is also necessary for useful results. In terms of applications, some of machine learning's greatest achievements have involved nonincremental methods that are clearly poor models of human learning. Planes are terrible imitations of birds (and fly less efficiently), but there are still excellent reasons for using aircraft. However, we do believe that too little research has focused on results from the literature on human learning, and that greater attention in this direction would benefit the field as a whole. Science is a complex and bewildering process, and the scientist should employ all available knowledge to direct his steps in useful directions. This strategy seems especially important in young fields like machine learning, in which conflicting views and methods abound. We encourage the reader to join us in applying machine learning techniques to explain the mysteries of human behavior, and in using knowledge of human behavior to constrain our computational theories of learning.

15.
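To address the growing scarcity of spectrum resources in the electromagnetic spectrum space, the emerging field of radio frequency machine learning aims to design dedicated machine learning models that incorporate domain knowledge of the electromagnetic spectrum, offering the advantages of speed, small-sample or even zero-sample learning, interpretability, and high performance. Following a five-layer network structure, from the physical layer through the data link, network, and transport layers to the application layer, this paper categorizes and analyzes the latest results on concrete applications of radio frequency machine learning in wireless communications. Building on these results, and by examining the interaction between data-driven and knowledge-driven approaches, it summarizes four radio frequency machine learning frameworks (serial, parallel, coupled, and feedback dual-driven frameworks). Finally, to promote research on and practical application of radio frequency machine learning, it discusses key challenges and open problems.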

16.
17.
18.
19.
20.
Ring, Mark B. Machine Learning, 1997, 28(1): 77-104
Continual learning is the constant development of increasingly complex behaviors; the process of building more complicated skills on top of those already developed. A continual-learning agent should therefore learn incrementally and hierarchically. This paper describes CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development. CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still.
