Face datasets are considered a primary tool for evaluating the efficacy of face recognition methods. Here we show that in many of the commonly used face datasets, face images can be recognized at a rate significantly higher than chance even when no face, hair, or clothing features appear in the image. The experiments were performed by cropping a small background area from each face image, so that each face dataset yielded a new image dataset containing only seemingly blank images. An image classification method was then applied to measure the classification accuracy on these background-only images. Experimental results show that the classification accuracy ranged from 13.5% (color FERET) to 99% (YaleB). These results indicate that the performance of face recognition methods measured on face image datasets may be biased. Compilable source code used for this experiment is freely available for download via the Internet.
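The abstract does not specify the crop location, patch size, or classifier, so the sketch below is only a minimal illustration of a background-only experiment of this kind. It assumes a fixed 10x10 corner patch, per-subject folders as labels, and a linear SVM on raw pixel values; all of these are assumptions, not the paper's released code.

```python
# Minimal sketch of a background-only classification experiment.
# Assumptions (not stated in the abstract): a 10x10 patch is cropped from the
# top-left corner of every image, subject IDs come from parent folder names,
# and a linear SVM on raw pixel values serves as the image classifier.
import numpy as np
from pathlib import Path
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

PATCH = 10  # assumed background-patch size in pixels

def load_background_patches(root="face_dataset"):
    """Crop a small corner patch (assumed to contain no face, hair, or clothes)
    from every image; the subject label is the parent folder name."""
    X, y = [], []
    for img_path in Path(root).rglob("*.png"):
        img = Image.open(img_path).convert("L")
        patch = np.asarray(img)[:PATCH, :PATCH]           # background-only crop
        X.append(patch.flatten().astype(np.float32) / 255.0)
        y.append(img_path.parent.name)                     # subject label
    return np.array(X), np.array(y)

X, y = load_background_patches()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = LinearSVC().fit(X_tr, y_tr)
print("background-only accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Accuracy well above the chance level (one over the number of subjects) on such patches would indicate that the background alone carries identity information.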
The algorithm selection problem is defined as identifying the best-performing machine learning (ML) algorithm for a given combination of dataset, task, and evaluation measure. The human expertise required to evaluate the increasing number of available ML algorithms has created a need to automate the algorithm selection task. Various approaches have emerged to handle the automatic algorithm selection challenge, including meta-learning. Meta-learning is a popular approach that leverages accumulated experience for future learning and typically involves dataset characterization. Existing meta-learning methods often represent a dataset using predefined features and thus cannot generalize across different ML tasks, or alternatively learn a dataset's representation in a supervised manner and therefore are unable to deal with unsupervised tasks. In this study, we propose a novel learning-based, task-agnostic method for producing dataset representations. We then introduce TRIO, a meta-learning approach that utilizes the proposed dataset representations to accurately recommend top-performing algorithms for previously unseen datasets. TRIO first learns graph representations of the datasets, using four tools to capture the latent interactions among dataset instances, and then applies a graph convolutional neural network to extract embedding representations from the resulting graphs. We extensively evaluate the effectiveness of our approach on 337 datasets and 195 ML algorithms, demonstrating that TRIO significantly outperforms state-of-the-art methods in algorithm selection for both supervised (classification and regression) and unsupervised (clustering) tasks.
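The abstract gives only the outline of the representation step, so the following is a minimal sketch of the general idea, assuming a single k-nearest-neighbor graph over the instances, one untrained GCN-style propagation step with a shared random weight matrix, and mean pooling. The actual TRIO pipeline (four graph-construction tools and a trained GCN) is not reproduced here.

```python
# Minimal sketch: embed a dataset by building a k-NN graph over its instances,
# applying one GCN-style propagation step, and mean-pooling the node features.
# The graph construction, k value, random (untrained) weights, and pooling are
# all assumptions made for illustration only.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def dataset_embedding(X, k=10, dim=32, seed=0):
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
    A = A + np.eye(len(X))                      # add self-loops
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))         # symmetric normalization
    rng = np.random.default_rng(seed)           # same seed for every dataset,
    W = rng.normal(size=(X.shape[1], dim))      # so the projection is shared
    H = np.tanh(A_hat @ X @ W)                  # one propagation step
    return H.mean(axis=0)                       # mean-pool into one vector

def recommend(new_X, known_embs, known_best_algos, k=10):
    """Recommend the best algorithms of the most similar known dataset."""
    e = dataset_embedding(new_X, k)
    sims = [e @ v / (np.linalg.norm(e) * np.linalg.norm(v)) for v in known_embs]
    return known_best_algos[int(np.argmax(sims))]
```

In this simplified form, recommendation reduces to nearest-neighbor transfer of past performance; the point is only to illustrate how an instance-level graph can be turned into a single task-agnostic dataset vector.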
This paper deals with the interpretation and feasibility check of line drawings representing polyhedral scenes. The polyhedra are of general types and there are no restrictions on camera position. The geometric consistency check and the line labeling are carried out through constructions in the image plane. An algorithm for the geometric construction is suggested, and the necessary conditions for these constructions are discussed. The image plane construction can be used for preparing labeled junction catalogs for junctions other than trihedral. In addition, the paper analyzes the relation between the image plane construction and the gradient space construction suggested by Mackworth [7] for the same purpose.
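The labeling and consistency check are described only at a high level here; purely as a generic illustration (not the paper's image-plane construction), the sketch below enumerates edge labelings of a small line drawing and keeps those consistent with a given junction catalog, which is left as a hypothetical placeholder.

```python
# Generic illustration of line labeling against a junction catalog.
# The label set, catalog, and example drawing are hypothetical placeholders;
# the paper's own method works through constructions in the image plane.
from itertools import product

LABELS = ["+", "-", ">", "<"]   # convex, concave, and two occluding directions

def consistent_labelings(junctions, edges, catalog):
    """junctions: {name: junction_type}
    edges: list of (j1, j2) pairs
    catalog[junction_type]: set of allowed label tuples, one label per incident
    edge, ordered as the incident edges appear in `edges`."""
    incident = {j: [i for i, e in enumerate(edges) if j in e] for j in junctions}
    ok = []
    for labels in product(LABELS, repeat=len(edges)):
        if all(tuple(labels[i] for i in incident[j]) in catalog[jt]
               for j, jt in junctions.items()):
            ok.append(labels)
    return ok   # an empty list means the drawing admits no consistent labeling
```

A drawing is rejected as infeasible when no labeling survives; richer (non-trihedral) junction catalogs simply enlarge the allowed tuple sets per junction type.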
We present a new concept, Wikiometrics: the derivation of metrics and indicators from Wikipedia. Wikipedia provides an accurate representation of the real world due to its size, structure, editing policy, and popularity. We demonstrate an innovative "mining" methodology in which different elements of Wikipedia (content, structure, editorial actions, and reader reviews) are used to rank items in a manner that is by no means inferior to rankings produced by experts or other methods. We test the proposed method by applying it to two real-world ranking problems: top world universities and academic journals. Our rankings were compared to leading and widely accepted benchmarks and were found to be highly correlated with them, with the added advantage that the underlying data are publicly available.
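As a rough illustration of the kind of comparison described above (not the paper's actual features or data), the sketch below combines a few hypothetical Wikipedia-derived signals into a composite score and checks its agreement with a benchmark ranking via Spearman correlation.

```python
# Illustrative sketch: combine hypothetical Wikipedia-derived signals into a
# composite score per item and compare the resulting ranking with an expert
# benchmark. Signals, weights, and numbers are placeholders, not real data.
import numpy as np
from scipy.stats import spearmanr

items = ["Univ A", "Univ B", "Univ C", "Univ D"]
signals = np.array([
    # article_length, edit_count, page_views, inlinks  (hypothetical values)
    [52_000, 4_100, 9.1e6, 3_200],
    [31_000, 2_700, 4.3e6, 1_900],
    [47_000, 3_800, 7.8e6, 2_800],
    [18_000, 1_200, 1.6e6,   700],
], dtype=float)

# Z-score each signal, then average into one composite score per item.
z = (signals - signals.mean(axis=0)) / signals.std(axis=0)
composite = z.mean(axis=1)
wiki_order = np.argsort(-composite)               # best first

benchmark_rank = np.array([0, 2, 1, 3])           # hypothetical expert ranks (0 = best)
rho, p = spearmanr(composite, -benchmark_rank)    # positive rho = agreement
print("Wikipedia-based order:", [items[i] for i in wiki_order])
print(f"Spearman rho vs. benchmark: {rho:.2f} (p={p:.2f})")
```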
Bibliometric analysis of publication metadata is an important tool for investigating emerging fields of technology. However, applying field definitions to an emerging technology is complicated by ongoing, and at times rapid, change in the underlying technology itself. There is limited prior work on adapting the bibliometric definitions of emerging technologies as these technologies change over time; this paper addresses that gap. We draw on the example of the modular keyword nanotechnology search strategy developed at the Georgia Institute of Technology in 2006. This search approach has seen extensive use in analyzing emerging trends in nanotechnology research and innovation, yet with the growth of the nanotechnology field, novel materials, particles, technologies, and tools have appeared. We report on the process and results of reviewing and updating this nanotechnology search strategy. By employing structured text-mining software to profile keyword terms, and by soliciting input from domain experts, we identify new nanotechnology-related keywords. We retroactively apply the revised evolutionary lexical query to 20 years of publication data and analyze the results. Our findings indicate that the updated search approach offers an incremental improvement over the original strategy in terms of recall and precision. Additionally, the updated strategy reveals the importance of several emerging cited-subject categories for nanotechnology, particularly in the biomedical sciences, suggesting a further extension of the nanotechnology knowledge domain. The implications of this work for applying bibliometric definitions to emerging technologies are discussed.
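As an illustration of how the recall and precision of a keyword search strategy can be measured against expert labels, the sketch below applies a small hypothetical keyword list to a few mock publication records; neither the terms nor the records come from the actual Georgia Tech query.

```python
# Illustrative evaluation of a lexical (keyword) search strategy.
# The keyword list and records are hypothetical placeholders.
import re

keywords = ["nanoparticle", "nanotube", "quantum dot", "graphene"]  # assumed terms
pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)

records = [  # (title/abstract text, relevant according to expert labeling)
    ("Synthesis of carbon nanotube composites", True),
    ("Graphene-based biosensors for glucose detection", True),
    ("Atomic layer deposition of ultrathin oxide films", True),   # missed by the query
    ("Macroscale steel fatigue under cyclic loading", False),
    ("The nanotube metaphor in organizational theory", False),    # false match
]

retrieved = [bool(pattern.search(text)) for text, _ in records]
relevant = [label for _, label in records]

tp = sum(r and g for r, g in zip(retrieved, relevant))
precision = tp / sum(retrieved)   # share of retrieved records that are relevant
recall = tp / sum(relevant)       # share of relevant records that are retrieved
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Updating the keyword list (for example, adding newly emerged terms) changes which records are retrieved, and the same two measures quantify whether the revision is an improvement.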