Machine Learning Aided Quantum Chemistry Discovery in the Solution Phase


Machine learning (ML) and big data play increasingly important roles in both experimental and theoretical chemistry. Although many critical chemical processes occur in the solution phase, datasets (computational or experimental) and ML models for solution-phase molecular systems remain scarce. My research group aims to close this gap by building a big-data ecosystem for quantum chemistry research on solution-phase molecular systems.

To efficiently generate computational datasets of solvated molecular systems, we developed strategies to accelerate both implicit and explicit solvent models for quantum chemistry (QC) calculations. For the implicit conductor-like polarizable continuum model (C-PCM), we developed algorithms that run on graphics processing units (GPUs) to accelerate the calculation. For the explicit solvent model, we developed AutoSolvate, an open-source toolkit that streamlines the QC calculation workflow for explicitly solvated molecules. To make these tools more accessible, we created a web-based, chatbot-assisted platform that offers automated simulations on cloud computing resources.

To improve the accuracy of the generated datasets, we are developing ML models that reduce the discrepancy between experimentally measured and computationally predicted molecular properties in both implicit and explicit solvent models. We are also using explainable ML models to reveal the design rules of photoredox catalysts.