
Supervised two-step feature extraction for structured representation of text data

Affiliation:	1. Department of Computers, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic; 2. Department of Computer Systems, Faculty of Information Technology, Czech Technical University in Prague, Czech Republic

Abstract:	The training data matrix used to classify text documents into multiple categories has a large number of dimensions, while the number of manually classified training documents is relatively small. Suitable dimensionality reduction techniques are therefore required to develop the classifier. The article describes a two-step supervised feature extraction method that takes advantage of projections of terms into document and category spaces. We propose several enhancements that make the method more efficient and faster than the version presented in our former paper. We also introduce an adjustment score that makes it possible to correct defective targets and helps to identify improper training examples that bias the extracted features.
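
The abstract does not spell out the algorithm. As a rough illustration of the general idea of projecting terms into category space and then re-expressing documents in that space, the sketch below assumes a dense document-term count matrix and integer category labels. The function name `two_step_features`, the row normalisation, and the toy data are all assumptions made for illustration; this is not the authors' published method.

```python
import numpy as np

def two_step_features(X, y, n_categories):
    """Hypothetical sketch: reduce documents from term space to category space."""
    X = np.asarray(X, dtype=float)          # documents x terms
    Y = np.eye(n_categories)[np.asarray(y)]  # one-hot targets, documents x categories

    # Step 1 (assumed): project each term into category space through the
    # training documents it occurs in. Result is terms x categories.
    T = X.T @ Y

    # Assumed normalisation: scale each term vector to unit length so that
    # frequent terms do not dominate; all-zero rows are left as zeros.
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    T = np.divide(T, norms, out=np.zeros_like(T), where=norms > 0)

    # Step 2 (assumed): project documents into category space through their
    # terms. Result is documents x categories, far fewer columns than X.
    F = X @ T
    return F, T

# Toy data: 4 documents, 4 terms, 2 categories.
X = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 1, 2],
     [0, 0, 2, 1]]
y = [0, 0, 1, 1]
F, T = two_step_features(X, y, n_categories=2)
print(F.shape)  # one column per category instead of one per term
```

In this sketch the extracted feature matrix has one column per category, so a classifier trained on it works with far fewer dimensions than the original term space.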

This article is indexed in ScienceDirect and other databases.
