Transfer Learning 4 DTI

Transfer Learning for Drug-Target Interaction

Abstract
Utilizing AI-driven approaches for drug–target interaction (DTI) prediction requires large volumes of training data, which are not available for the majority of target proteins. In this work, we investigate the use of deep transfer learning for the prediction of interactions between drug candidate compounds and understudied target proteins with scarce training data. The idea is to first train a deep neural network classifier on a large, generalized source training dataset and then to reuse this pre-trained network as the initial configuration for re-training/fine-tuning on a small, specialized target training dataset. We selected six protein families that have critical importance in biomedicine: kinases, G-protein-coupled receptors (GPCRs), ion channels, nuclear receptors, proteases, and transporters. The performance of deep transfer learning is evaluated and compared with that of training the same deep neural network from scratch. We found that when the training dataset contains fewer than 100 compounds, transfer learning outperforms the conventional strategy of training the system from scratch, suggesting that transfer learning is advantageous for predicting binders to understudied targets. The source code and datasets are available at github.com/cansyl/TransferLearning4DTI. This web-based service provides the pre-trained models, ready to be used for producing bioactivity predictions either by further fine-tuning on user-defined datasets or by directly using the source models.

How to use the TL4DTI service?
This web-based service is provided so that users can try the methods described in this study on their own datasets. The TL4DTI service consists of three modules: Fine-tune, Predict and Fine-tune&Predict. For all of the modules, six pre-trained source models are available, each trained on the source training dataset of one of the six protein families.

The Fine-tune module allows the user to create a target model from a pre-trained source model via transfer learning. It accepts an uploaded file or text pasted directly into the text box. Before uploading a dataset for training, one of the six pre-trained source models must be chosen; Kinase is the default source model. Each line of the training dataset file should include a compound's identifier, its SMILES representation, and a binary interaction value (1 for an active compound and 0 for an inactive compound). After uploading the training file or pasting your entry, click the submit button. If an e-mail address is entered, the associated job ID will be emailed to the user when the corresponding job is complete; otherwise, the user must save the job ID and store it safely. Using the job ID, the user can locate the job under the Job List tab and download the fine-tuned model.
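
Before submitting a training dataset, it can help to check locally that every line has the three expected fields. The following is a minimal sketch of such a check, not part of the TL4DTI service itself. The comma separator, the function name parse_training_lines and the compound identifiers are assumptions made for illustration; the description above does not state which separator the service expects, so adjust `sep` accordingly.

```python
from typing import List, NamedTuple


class TrainingRecord(NamedTuple):
    compound_id: str
    smiles: str
    label: int  # 1 = active, 0 = inactive


def parse_training_lines(text: str, sep: str = ",") -> List[TrainingRecord]:
    """Parse 'identifier<sep>SMILES<sep>label' lines.

    The separator is an assumption; the service description does not name one.
    Raises ValueError with the offending line number on malformed input.
    """
    records = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(sep)]
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        compound_id, smiles, label = fields
        if not compound_id or not smiles:
            raise ValueError(f"line {line_no}: empty identifier or SMILES")
        if label not in ("0", "1"):
            raise ValueError(f"line {line_no}: label must be 0 or 1, got {label!r}")
        records.append(TrainingRecord(compound_id, smiles, int(label)))
    return records


# Hypothetical identifiers and arbitrary labels, for illustration only.
example = """cmpd_1,CC(=O)Oc1ccccc1C(=O)O,1
cmpd_2,CC(=O)Nc1ccc(O)cc1,0
"""
records = parse_training_lines(example)
print(len(records), "records,", sum(r.label for r in records), "labeled active")
```

This check only verifies the line layout and the binary label; it does not test whether a SMILES string is chemically valid.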

The Predict module lets the user make predictions directly with one of the pre-trained source models (default source model: Kinase). After choosing the source model, the user uploads a test dataset, which should contain the identifiers of the compounds and their SMILES representations. The results (i.e., predictions for the test dataset) can be accessed using the corresponding job ID.
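
A test dataset differs from a training dataset only in that it carries no interaction value. The sketch below shows one way to produce such a file from a list of (identifier, SMILES) pairs. As before, the comma separator, the function name format_test_dataset and the compound identifiers are assumptions for illustration, not details confirmed by the service description.

```python
from typing import Iterable, Tuple


def format_test_dataset(compounds: Iterable[Tuple[str, str]], sep: str = ",") -> str:
    """Return one 'identifier<sep>SMILES' line per compound.

    The separator is an assumption; the service description does not name one.
    """
    lines = []
    for compound_id, smiles in compounds:
        if not compound_id or not smiles:
            raise ValueError("identifier and SMILES must both be non-empty")
        # A field containing the separator or whitespace would break the layout.
        for field in (compound_id, smiles):
            if sep in field or any(ch.isspace() for ch in field):
                raise ValueError(f"field {field!r} contains a separator or whitespace")
        lines.append(f"{compound_id}{sep}{smiles}")
    return "\n".join(lines) + "\n"


# Hypothetical identifiers, for illustration only.
test_text = format_test_dataset([("cmpd_1", "CCO"), ("cmpd_2", "c1ccccc1")])
print(test_text, end="")
```

The resulting text can be saved to a file for upload.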

The Fine-tune&Predict module combines the previous two modules. The user should provide a training dataset and a test dataset in the formats explained above. After the job is complete, predictions for the test dataset can be downloaded.

Link to pre-print