Abstract: | Text mining, or text analytics, is important for applications in areas such as market analysis and biomedicine because it enables efficient retrieval of information from large datasets. As the dimensionality of the data increases, the performance of the whole system degrades, because irrelevant text may be retrieved and introduce errors. This paper therefore applies big data and data mining techniques to analyse large volumes of information when mining texts, emails, blogs, online forums, news, and call centre documents. First, the data are collected from various sources; these data contain noise, which is removed by applying normalization techniques. Data mining techniques then eliminate the irrelevant information and noise, and the relevant features are selected using a rough set-based particle swarm optimization algorithm. The selected features are clustered using a fuzzy set combined with the particle swarm optimization algorithm, which improves the efficiency of the mining process. Finally, the efficiency of the system is evaluated on the University of California Irvine Machine Learning Repository knowledge process mining database using the sum of the intra-cluster distances, the mean squared error rate, and the accuracy. |
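
The normalization step and two of the evaluation measures named above can be illustrated with a minimal sketch. It assumes min-max scaling as the normalization technique and takes the mean squared error to be the mean squared Euclidean distance between each point and its assigned cluster centroid; the abstract specifies neither, so both choices, along with all function names, are hypothetical. The rough set and particle swarm optimization stages are not reproduced here; cluster labels and centroids are simply taken as given.

```python
import math


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0.

    Assumption: min-max scaling stands in for the unspecified
    normalization technique mentioned in the abstract.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / span if span else 0.0
         for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def intra_cluster_distance_sum(points, centroids, labels):
    """Sum of Euclidean distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[k]) for p, k in zip(points, labels))


def mean_squared_error(points, centroids, labels):
    """Mean squared distance to the assigned centroid (assumed definition)."""
    total = sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))
    return total / len(points)


# Toy data: four two-feature points forming two clusters (illustrative only).
raw = [[0, 0], [0, 2], [10, 0], [10, 2]]
points = min_max_normalize(raw)
centroids = [[0.0, 0.5], [1.0, 0.5]]  # given, not learned
labels = [0, 0, 1, 1]                 # cluster index for each point

sicd = intra_cluster_distance_sum(points, centroids, labels)
mse = mean_squared_error(points, centroids, labels)
```

Lower values of both measures indicate tighter clusters, which is why they are natural objectives for a swarm-based clustering search.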