Higher order feature selection for text classification |
| |
Authors: | Jan Bakus Mohamed S Kamel |
| |
Affiliation: | (1) Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada;(2) Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada |
| |
Abstract: | In this paper, we present the MIFS-C variant of the mutual information feature-selection (MIFS) algorithms. We present an algorithm
to find the optimal value of the redundancy parameter, which is a key parameter in the MIFS-type algorithms. Furthermore,
we present an algorithm that speeds up the execution time of all the MIFS variants. Overall, MIFS-C achieves classification
accuracy comparable to (and in some cases better than) that of the other MIFS algorithms, while running faster.
We also compared this feature selector with other feature selectors and found that it performs better in most cases. MIFS-C
performed especially well on the breakeven point and the F-measure, because the algorithm can be tuned to optimise these evaluation measures.
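The abstract does not define MIFS-C itself, so the following is only a minimal sketch of the classic MIFS greedy criterion (Battiti's formulation), shown to illustrate what the redundancy parameter does. At each step it selects the feature f maximising I(C; f) - beta * sum over selected s of I(f; s), where `beta` stands in for the redundancy parameter. Keeping a running redundancy sum per candidate, so that each step only computes mutual information against the newly selected feature, is one common way to speed up MIFS-type selection; it is an assumption here and not necessarily the speed-up the authors propose. The function names `mutual_information` and `mifs`, and the use of discrete (e.g. binary term-presence) feature columns, are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mifs(columns: List[Sequence[int]], labels: Sequence[int], k: int, beta: float) -> List[int]:
    """Greedy MIFS-style selection of k feature indices (hypothetical sketch).

    Score of candidate i = I(C; f_i) - beta * sum_{s in selected} I(f_i; f_s).
    """
    relevance = [mutual_information(col, labels) for col in columns]
    # Running redundancy sums: updated once per selection step instead of
    # recomputing I(f_i; f_s) for every already selected s at every step.
    redundancy = [0.0] * len(columns)
    remaining = set(range(len(columns)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        # Ties are broken in favour of the lower index (the -i term).
        best = max(remaining, key=lambda i: (relevance[i] - beta * redundancy[i], -i))
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            redundancy[i] += mutual_information(columns[i], columns[best])
    return selected
```

For example, with `beta=0.0` the selector ranks terms by relevance alone and will pick an exact duplicate of an already selected term; with `beta=1.0` that duplicate is penalised by its full redundancy and a different term is chosen instead.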
Jan Bakus received the B.A.Sc. and M.A.Sc. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada,
in 1996 and 1998, respectively, and the Ph.D. degree in systems design engineering in 2005. He is currently working at Maplesoft,
Waterloo, ON, Canada, as an applications engineer, where he is responsible for the development of application-specific toolboxes
for the Maple scientific computing software.
His research interests are in the area of feature selection for text classification, text classification, text clustering,
and information retrieval. He is the recipient of the Carl Pollock Fellowship award from the University of Waterloo and the
Datatel Scholars Foundation scholarship from Datatel.
Mohamed S. Kamel holds a Ph.D. in computer science from the University of Toronto, Canada. He is currently Professor and Director of the
Pattern Analysis and Machine Intelligence Laboratory in the Department of Electrical and Computer Engineering, University
of Waterloo, Canada. Professor Kamel holds a Canada Research Chair in Cooperative Intelligent Systems.
Dr. Kamel's research interests are in machine intelligence, neural networks and pattern recognition with applications in robotics
and manufacturing. He has authored and coauthored over 200 papers in journals and conference proceedings, 2 patents and numerous
technical and industrial project reports. Under his supervision, 53 Ph.D. and M.A.Sc. students have completed their degrees.
Dr. Kamel is a member of ACM, AAAI, CIPS and APEO and was named a Fellow of IEEE (2005). He is the editor-in-chief of
the International Journal of Robotics and Automation, an Associate Editor of IEEE SMC, Part A, the International Journal
of Image and Graphics, and Pattern Recognition Letters, and a member of the editorial board of Intelligent Automation and
Soft Computing. He has served as a consultant to many companies, including NCR, IBM, Nortel, VRP and CSA. He is a member of
the board of directors and a cofounder of Virtek Vision International in Waterloo. |
| |
Keywords: | Feature selection; Text classification |
This article is indexed in SpringerLink and other databases. |
|