

DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources
Affiliation:1. Department of Industrial Engineering, University of Chile, Av. República 701, Chile;2. Department of Computer Science, The University of Waikato, Private Bag 3105, Hamilton 3240, New Zealand;1. Departamento de Ingeniería Industrial, Universidad de Chile, Chile;2. Departement d’Informatique, École normale supérieure, France;3. Facultad de Matemáticas & Escuela de Ingeniería, Pontificia Universidad Católica de Chile, Chile;1. Department of Telecommunications and Information Processing, Ghent University, Sint-Pietersnieuwstraat 41, Ghent, Belgium;2. Systems Research Institute, Polish Academy of Sciences, Newelska 6, Warsaw, Poland;1. CNR-IMM, Via S. Sofia 64, 95123 Catania, Italy;2. Department of Physics and Astronomy, University of Catania, Via S. Sofia 64, 95123 Catania, Italy;3. Department of Chemical Sciences, University of Catania, Viale Andrea Doria 6, 95125 Catania, Italy;1. Department of Computer Science, School of Science and Technology, Middlesex University, The Burroughs, London NW4 4BT, UK;3. Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, Surrey GU2 7XH, UK;1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China;2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: Plagiarism is the act of presenting another person’s words, thoughts, or ideas as one’s own without referencing the sources from which they were taken. The exponential growth of digital document sources available on the Web has facilitated the spread of this practice, making its accurate detection a crucial task for educational institutions. In this article, we present DOCODE 3.0, a Web system for educational institutions that automatically analyzes large quantities of digital documents with respect to their degree of originality. Since plagiarism is a complex problem, frequently tackled at different levels, our system applies algorithms that perform an information fusion process over multiple data sources at all of these levels. These algorithms have been successfully tested in the scientific community on tasks such as the identification of plagiarized passages and the retrieval of source candidates from the Web and from other data sources such as digital libraries, and have proven to be very effective. We integrate these algorithms into a multi-tier, robust and scalable JEE architecture, allowing many different types of clients with different requirements to consume our services. For users, DOCODE produces a number of visualizations and reports from the different outputs, giving teachers and professors insight into the originality of the documents they review. This helps them discover, understand and handle possible plagiarism cases, and makes analyzing a vast number of documents easier and much faster. Our experience so far is focused on the Chilean situation and the Spanish language, offering solutions to Chilean educational institutions in any of their preferred Virtual Learning Environments. However, DOCODE can easily be adapted to increase language coverage.
Keywords: Plagiarism detection; Text patterns information fusion; Multi documental data sources
This article is indexed in ScienceDirect and other databases.