Abstract: | Veracity issues in the molecular data available in biological databases pose new challenges for data analytics. Analysis of the molecular data of various diseases can provide vital information for better understanding the molecular mechanism of a disease. In this paper, a model is proposed that addresses the issue of veracity in data analytics for amino acid association patterns in protein sequences of Swine Influenza Virus. The veracity issue is caused by intra-sequential and inter-sequential biases present in the sequences due to varying degrees of relationships among amino acids. A complete dataset of 63,682 protein sequences was downloaded from NCBI and refined. The refined dataset consists of 26,594 sequences, which are employed in the present study. A type I fuzzy set is employed to explore amino acid association patterns in the dataset. The type I fuzzy support is refined to partially remove the inter-sequential biases causing veracity issues in the data. The remaining inter-sequential biases present in the refined fuzzy support are evaluated and eliminated using a type II fuzzy set. Hence, it is concluded that a combination of the type II fuzzy and refined fuzzy approaches is the optimal approach for extracting a better picture of amino acid association patterns in the molecular dataset. |