1.

How well can machine learning predict the outcome of a soccer game, given the most commonly and freely available match data? To help answer this question and to facilitate machine learning research in soccer, we have developed the Open International Soccer Database. Version v1.0 of the Database contains essential information from 216,743 league soccer matches from 52 leagues in 35 countries. The earliest entries in the Database are from the year 2000, which is when football leagues generally adopted the “three points for a win” rule. To demonstrate the use of the Database for machine learning research, we organized the 2017 Soccer Prediction Challenge. One of the goals of the Challenge was to estimate where the limits of predictability lie, given the type of match data contained in the Database. Another goal of the Challenge was to pose a real-world machine learning problem with a fixed time line and a genuine prediction task: to develop a predictive model from the Database and then to predict the outcome of the 206 future soccer matches taking place from 31 March 2017 to the end of the regular season. The Open International Soccer Database is released as an open science project, providing a valuable resource for soccer analysts and a unique benchmark for advanced machine learning methods. Here, we describe the Database and the 2017 Soccer Prediction Challenge and its results.

2.
Data Mining and Knowledge Discovery - The statistical comparison of machine learning classifiers is frequently underpinned by null hypothesis significance testing. Here, we provide a survey and...
3.
Ecological data suggest that a long-term diet high in plant material rich in biologically active compounds, such as the lignans, can significantly influence the development of prostate cancer over the lifetime of an individual. The capacity of a pure mammalian lignan, enterolactone (ENL), to influence the proliferation of the LNCaP human prostate cancer cell line was investigated as a function of cell density, metabolic activity, expression and secretion of prostate specific antigen (PSA), cell cycle profile, and the expression of genes involved in the development and progression of prostate cancer. Treatment with a subcytotoxic concentration of ENL (60 μM for 72 h) was found to reduce cell density (57.5%, SD 7.23, p < 0.001), metabolic activity (55%, SD 0.03, p < 0.001), and secretion of PSA (48.50%, SD 4.74, p = 0.05), and to induce apoptosis (8.33-fold, SD 0.04, p = 0.001) compared to untreated cells. Cotreatment with 10 μM etoposide was found to increase apoptosis by 50.17% (SD 0.02, p < 0.001). Additionally, several key genes (e.g., MCMs, survivin and CDKs) were beneficially regulated by ENL treatment (p < 0.05). The data suggest that the antiproliferative activity of ENL is a consequence of altered expression of cell cycle associated genes and provide novel molecular evidence for the antiproliferative properties of a pure lignan in prostate cancer.
5.
Null hypothesis significance testing is routinely used for comparing the performance of machine learning algorithms. Here, we provide a detailed account of the major underrated problems that this common practice entails. For example, omnibus tests, such as the widely used Friedman test, are not appropriate for the comparison of multiple classifiers over diverse data sets. In contrast to the view that significance tests are essential to a sound and objective interpretation of classification results, our study suggests that no such tests are needed. Instead, greater emphasis should be placed on the magnitude of the performance difference and the investigator’s informed judgment. As an effective tool for this purpose, we propose confidence curves, which depict nested confidence intervals at all levels for the performance difference. These curves enable us to assess the compatibility of an infinite number of null hypotheses with the experimental results. We benchmarked several classifiers on multiple data sets and analyzed the results with both significance tests and confidence curves. Our conclusion is that confidence curves effectively summarize the key information needed for a meaningful interpretation of classification results while avoiding the intrinsic pitfalls of significance tests.
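The idea of nested confidence intervals at all levels can be illustrated with a minimal sketch. It assumes paired per-data-set accuracy differences between two classifiers and uses a normal approximation for the interval; the function name `confidence_curve`, the sample differences, and the chosen levels are all hypothetical, and the abstract does not state which interval construction the authors use.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_curve(diffs, levels):
    """Return (level, lower, upper) triples for the mean performance difference.

    Assumption: normal-approximation intervals around the sample mean;
    every level must lie in [0, 1).
    """
    center = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    curve = []
    for level in levels:
        # Two-sided interval: put (1 - level) / 2 in each tail.
        z = NormalDist().inv_cdf(0.5 + level / 2.0)
        curve.append((level, center - z * std_err, center + z * std_err))
    return curve


# Made-up accuracy differences (classifier A minus classifier B) on 8 data sets.
example_diffs = [0.02, 0.05, -0.01, 0.03, 0.04, 0.00, 0.02, 0.01]
for level, lower, upper in confidence_curve(example_diffs, [0.0, 0.5, 0.8, 0.95, 0.99]):
    print(f"{level:.2f}: [{lower:+.4f}, {upper:+.4f}]")
```

Plotting the lower and upper bounds against the level gives the curve: it collapses to the point estimate at level 0 and widens as the level approaches 1, so any hypothesized difference can be read off against the level at which it first falls inside the interval.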
6.
Visualization techniques for high-dimensional data sets play a pivotal role in exploratory analysis in a wide range of disciplines. Gene expression data based on microarray technology pose a particularly challenging problem: the number of features (genes) typically exceeds 20,000, whereas the number of samples is frequently below 200. We investigated class-specific discrimination coefficients for each feature and each pair of classes for an effective nonlinear mapping to lower-dimensional space. We applied the technique to three microarray data sets and compared the projections to two-dimensional space with the results from a conventional multidimensional scaling method, a score plot resulting from principal component analysis, and projections from linear discriminant analysis. In the experiments, we observed that the discrimination coefficients allowed for an improved visualization of high-dimensional genomic data.
7.
Comparative genomic hybridization (CGH) is a molecular cytogenetic analysis method that allows the detection of chromosomal imbalances in entire genomes. The CGH approach is used in cancer research to identify over- and under-representations of chromosomal regions. To search for and analyze tumor-relevant aberration patterns in CGH data, we designed, implemented, and deployed a relational database system. This project is part of a more complex and comprehensive effort to compile, integrate, fuse, and analyze biological and clinical data from heterogeneous and distributed information sources. In this article, we discuss the obstacles and pitfalls that were encountered in the design process, describe the resulting CGH database model and the underlying technical infrastructure, and present the first results based on mining the CGH database.
8.
Null hypothesis significance tests and their p-values currently dominate the statistical evaluation of classifiers in machine learning. Here, we discuss fundamental problems of this research practice. We focus on the problem of comparing multiple fully specified classifiers on a small-sample test set. On the basis of the method by Quesenberry and Hurst, we derive confidence intervals for the effect size, i.e. the difference in true classification performance. These confidence intervals disentangle the effect size from its uncertainty and thereby provide information beyond the p-value. This additional information can drastically change the way in which classification results are currently interpreted, published and acted upon. We illustrate how our reasoning can change, depending on whether we focus on p-values or confidence intervals. We argue that the conclusions from comparative classification studies should be based primarily on effect size estimation with confidence intervals, and not on significance tests and p-values.
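To make "effect size with a confidence interval" concrete, here is a rough sketch for two classifiers evaluated on the same small test set. It deliberately swaps in a simple Wald interval for a paired difference in accuracy; it is not the Quesenberry and Hurst based derivation mentioned in the abstract, and the function name `paired_accuracy_diff_ci` and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def paired_accuracy_diff_ci(correct_a, correct_b, level=0.95):
    """Estimate accuracy(A) - accuracy(B) with a Wald interval (an assumption).

    correct_a / correct_b: per-instance 0/1 flags on the SAME test instances.
    Returns (difference, lower, upper).
    """
    if not correct_a or len(correct_a) != len(correct_b):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(correct_a)
    # Only discordant instances contribute to the difference.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    diff = (only_a - only_b) / n
    variance = max(0.0, (only_a + only_b - (only_a - only_b) ** 2 / n) / n ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(variance)
    return diff, diff - half_width, diff + half_width


# Made-up results on a 10-instance test set.
a_flags = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
b_flags = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(paired_accuracy_diff_ci(a_flags, b_flags))
```

The returned triple separates the estimated difference from its uncertainty: a large estimate with a wide interval reads very differently from the same estimate with a narrow one, which a single p-value would not show. The Wald interval is known to behave poorly on very small samples, so it serves here only as an illustration.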
10.
Berrar, Daniel; Lopes, Philippe; Dubitzky, Werner. Machine Learning, 2019, 108(1): 97-126

The task of the 2017 Soccer Prediction Challenge was to use machine learning to predict the outcome of future soccer matches based on a data set describing the match outcomes of 216,743 past soccer matches. One of the goals of the Challenge was to gauge where the limits of predictability lie with this type of commonly available data. Another goal was to pose a real-world machine learning challenge with a fixed time line, involving the prediction of real future events. Here, we present two novel ideas for integrating soccer domain knowledge into the modeling process. Based on these ideas, we developed two new feature engineering methods for match outcome prediction, which we denote as recency feature extraction and rating feature learning. Using these methods, we constructed two learning sets from the Challenge data. The top-ranking model of the 2017 Soccer Prediction Challenge was our k-nearest neighbor model trained on the rating feature learning set. In further experiments, we could slightly improve on this performance with an ensemble of extreme gradient boosted trees (XGBoost). Our study suggests that a key factor in soccer match outcome prediction lies in the successful incorporation of domain knowledge into the machine learning modeling process.
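The abstract does not spell out how recency features or the k-nearest neighbor model are defined, so the following is only an illustrative sketch under stated assumptions: each match is described by each team's average goals scored and conceded over its last `n` matches, and outcome probabilities are the label frequencies among the `k` nearest training matches. The names `recency_features` and `knn_outcome_probs`, the feature layout, and the sample matches are hypothetical and should not be read as the authors' method.

```python
from collections import Counter, defaultdict, deque
from math import dist


def recency_features(matches, n=3):
    """Build (features, label) rows from chronologically ordered matches.

    matches: iterable of (home, away, home_goals, away_goals).
    Assumed features: [home scored, home conceded, away scored, away conceded],
    each averaged over the team's previous n matches.
    """
    history = defaultdict(lambda: deque(maxlen=n))  # team -> recent (scored, conceded)
    rows = []
    for home, away, home_goals, away_goals in matches:
        # Use a match only once both teams have n earlier matches on record.
        if len(history[home]) == n and len(history[away]) == n:
            feats = []
            for team in (home, away):
                feats.append(sum(s for s, _ in history[team]) / n)
                feats.append(sum(c for _, c in history[team]) / n)
            label = "H" if home_goals > away_goals else "A" if home_goals < away_goals else "D"
            rows.append((feats, label))
        history[home].append((home_goals, away_goals))
        history[away].append((away_goals, home_goals))
    return rows


def knn_outcome_probs(train_rows, query, k=3):
    """Estimate P(home win), P(draw), P(away win) from the k nearest rows."""
    nearest = sorted(train_rows, key=lambda row: dist(row[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {outcome: counts[outcome] / len(nearest) for outcome in ("H", "D", "A")}


# Tiny made-up league, for illustration only.
sample = [("A", "B", 2, 0), ("B", "A", 1, 1), ("A", "B", 0, 1), ("B", "A", 3, 1)]
rows = recency_features(sample, n=1)
print(knn_outcome_probs(rows, [1.0, 1.0, 1.0, 1.0], k=2))
```

Returning probabilities rather than a single predicted class keeps the sketch usable with probabilistic scoring; how the Challenge scored predictions is not stated in the abstract.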
