Transfer Learning is a machine learning approach where knowledge from a model trained on one task is reused and adapted for a related task. Instead of training a model from scratch, transfer learning applies pre-trained models, often trained on large, general datasets, to improve accuracy, reduce training time, and support cases where labeled data is limited. This technique is widely used in computer vision, natural language processing (NLP), speech recognition, and other domains where deep learning is applied.
Core Characteristics of Transfer Learning
- Source and Target Domains
A model is first trained on a source domain with abundant labeled data and later adapted to a target domain with less data. Knowledge transfer improves learning efficiency and helps overcome data scarcity.
- Feature Extraction vs. Fine-Tuning
Two primary strategies are used (a minimal PyTorch sketch of both appears after this list):
- Feature Extraction: Freeze most of the model's layers and retrain only the final layers to adapt to the new task.
- Fine-Tuning: Continue training the full model with a low learning rate to adjust its weights to the new dataset.
- Model Architecture Adaptation
Deep learning architectures such as CNNs for vision tasks and transformer models (e.g., BERT, GPT) for language tasks are commonly used due to their strong generalization capabilities.
- Mathematical Representation
Transfer learning seeks to minimize the target loss function using knowledge from the source parameters:
$$\min L_T\big(f_T(D_T \mid \theta_S)\big)$$
where $\theta_S$ represents the parameters learned by the source model and adapted for the target model.
- Domain Similarity and Transferability
Effectiveness increases when the source and target domains are closely related. When domains are too different, negative transfer may occur, reducing model accuracy.
- Common Pre-Trained Models
Examples include (see the loading sketch after this list):
- ResNet, VGG, Inception (Computer Vision)
- BERT, GPT, RoBERTa (NLP)
- Whisper, Wav2Vec (Speech Recognition)
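To make the two adaptation strategies concrete, here is a minimal PyTorch/torchvision sketch that adapts an ImageNet pre-trained ResNet-18 to a hypothetical 10-class target task; the architecture, class count, and learning rates are illustrative assumptions rather than values prescribed above.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet (the source domain).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# --- Strategy 1: Feature extraction ---
# Freeze the pre-trained layers so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer to match the target task
# (10 classes is an illustrative assumption).
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head's parameters are passed to the optimizer.
feature_extraction_optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# --- Strategy 2: Fine-tuning ---
# Unfreeze the whole network and train it end to end with a low
# learning rate so the pre-trained weights shift only slightly.
for param in model.parameters():
    param.requires_grad = True

fine_tuning_optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```

Feature extraction is typically the cheaper option when target data is scarce, while full fine-tuning tends to pay off once more labeled target data is available.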
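A pre-trained language model can be adapted in the same way by attaching a fresh task-specific head. The sketch below uses the Hugging Face transformers library with a bert-base-uncased checkpoint and a two-label classification head; the checkpoint and label count are assumptions made for illustration.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a pre-trained BERT checkpoint and attach a newly initialized
# 2-class classification head for the target task.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Tokenize a target-domain example and run a forward pass; the encoder
# carries the transferred knowledge, while the new head is trained on
# target data.
inputs = tokenizer("Transfer learning reuses pre-trained weights.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])
```

The encoder weights carry the knowledge transferred from pre-training; only the randomly initialized classification head starts from scratch on the target task.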
Related Terms