Type II fuzzy set-based data analytics to explore amino acid associations in protein sequences of Swine Influenza Virus
Abstract: The veracity of molecular data available in biological databases poses new challenges for data analytics. Analysing the molecular data of various diseases can provide vital information for better understanding the molecular mechanism of a disease. This paper proposes a model that addresses the issue of veracity in data analytics for amino acid association patterns in protein sequences of Swine Influenza Virus. The veracity issue is caused by intra-sequential and inter-sequential biases present in the sequences, which arise from varying degrees of relationship among amino acids. A complete dataset of 63,682 protein sequences was downloaded from NCBI and refined. The refined dataset consists of 26,594 sequences, which are used in the present study. A type I fuzzy set is employed to explore amino acid association patterns in the dataset. The type I fuzzy support is refined to partially remove the inter-sequential biases that cause veracity problems in the data. The inter-sequential biases remaining in the refined fuzzy support are evaluated and eliminated using a type II fuzzy set. It is therefore concluded that a combination of the type II fuzzy and refined fuzzy approaches is the optimal approach for extracting a better picture of amino acid association patterns in the molecular dataset.
Keywords: Type II fuzzy set; Swine Influenza Virus; Data analytics
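
The abstract does not define how the type I fuzzy support is computed, so the following is only a minimal sketch of one common formulation from fuzzy association rule mining, not the authors' method. It assumes, purely for illustration, that the membership grade of an amino acid in a sequence is its relative frequency in that sequence, and that the fuzzy support of a set of amino acids is the mean over all sequences of the minimum membership grade in the set. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def memberships(seq):
    """Assumed membership grade: relative frequency of each residue in one sequence."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: c / len(seq) for aa, c in counts.items()}


def fuzzy_support(sequences, itemset):
    """Assumed type I fuzzy support: mean over sequences of the minimum
    membership grade among the residues in `itemset`."""
    if not sequences:
        raise ValueError("no sequences given")
    total = 0.0
    for seq in sequences:
        mu = memberships(seq)
        # A residue absent from the sequence has membership 0.
        total += min(mu.get(aa, 0.0) for aa in itemset)
    return total / len(sequences)


# Toy, made-up sequences; not taken from the NCBI dataset.
toy = ["MKAAL", "MKKLV", "AALLG"]
print(fuzzy_support(toy, ("A", "L")))
```

Normalising by sequence length is one simple way to dampen the effect of sequences of different lengths; the refinement and type II steps described in the abstract are not reproduced here.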
This article is indexed in ScienceDirect and other databases.