Affiliation: | 1. Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai, India. Contribution: Conceptualization (supporting), Formal analysis (lead), Investigation (lead), Methodology (lead), Software (equal), Validation (lead), Visualization (lead), Writing - original draft (lead), Writing - review & editing (supporting); 2. Department of Instrumentation Engineering, Madras Institute of Technology, Chennai, India. Contribution: Data curation (equal), Formal analysis (supporting), Investigation (supporting), Software (equal); 3. Gyan Data Pvt. Ltd., Indian Institute of Technology Madras Research Park, Chennai, India. Contribution: Data curation (equal), Methodology (supporting), Software (supporting); 4. Department of Chemical Engineering, Indian Institute of Technology Madras, Chennai, India |
Abstract: | With the ever-increasing volume of scientific literature, there is a strong need for methods that identify relevant information rigorously. In this contribution, a state-of-the-art natural language processing (NLP) model was used to select perovskite materials for electrocatalytic applications from the literature. This was accomplished by obtaining word embeddings for perovskite materials from the NLP model and then designing downstream tasks to discover perovskite-based electrocatalyst materials. However, embeddings could be obtained only for materials that already appear in the literature. Consequently, a novel methodology was devised to generate embeddings for newly designed materials. The analysis showed that the computed embeddings could be used to rank materials by their suitability for electrocatalytic applications. The word embeddings were also employed as features for predicting the electrocatalytic activity of perovskite-based electrocatalysts, and the fidelity of the regression models increased when the embeddings were included as features. |
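The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that suitability is scored by cosine similarity between a material's embedding and the embedding of an application word, which is a common choice for word embeddings; the abstract does not state the measure actually used. The function names, material labels, and 3-dimensional vectors below are hypothetical and invented for illustration only; they are not data or code from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_materials(embeddings, target):
    """Return (material, score) pairs, most similar to `target` first."""
    scores = {
        name: cosine_similarity(vector, target)
        for name, vector in embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors (assumption): real embeddings would come from the NLP model
# and typically have hundreds of dimensions.
toy_embeddings = {
    "material_A": [0.9, 0.1, 0.0],
    "material_B": [0.1, 0.9, 0.0],
    "material_C": [0.6, 0.6, 0.1],
}

# Stand-in for the embedding of an application word such as "electrocatalyst".
toy_target = [1.0, 0.0, 0.0]

ranking = rank_materials(toy_embeddings, toy_target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the material whose embedding points most nearly in the direction of the target vector is ranked first. The same embedding vectors could also be appended to a feature matrix for a regression model, which is the second use mentioned in the abstract.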