Fine-tuning
Fine-tuning is a process in deep learning in which a pre-trained model is trained further (“fine-tuned”) on a new dataset, which may be smaller than the original training data or come from a different domain. The goal is to reuse the knowledge the model has already acquired and adapt it to the new task or dataset. This is particularly useful when the new dataset is too small to train a model effectively from scratch[1][2].
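To make the idea concrete, here is a minimal sketch in plain Python. A two-parameter linear model stands in for a neural network (an assumption made so the gradients fit in a few lines). It is first trained on a hypothetical source task with plenty of data, then fine-tuned on a hypothetical, related target task with only three examples, using a learning rate ten times smaller than during pre-training. The helper names (`mse`, `gd_step`, `train`) and all data are illustrative only.

```python
# Toy stand-in for a neural network: a linear model y = w * x + b,
# trained with full-batch gradient descent on mean squared error.

def mse(params, data):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gd_step(params, data, lr):
    w, b = params
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def train(params, data, lr, steps):
    for _ in range(steps):
        params = gd_step(params, data, lr)
    return params


# Hypothetical source task: plenty of data drawn from y = 2x + 1.
source = [(k / 10, 2 * (k / 10) + 1) for k in range(-20, 21)]
# Hypothetical target task: a related function, y = 2.2x + 0.8, with only three examples.
target = [(x, 2.2 * x + 0.8) for x in (-1.0, 0.5, 1.5)]

# Pre-training from a zero initialization with a relatively large learning rate.
pretrained = train((0.0, 0.0), source, lr=0.1, steps=500)

# Fine-tuning: start from the pre-trained weights, use a 10x smaller
# learning rate, and run only a few steps on the small target set.
finetuned = train(pretrained, target, lr=0.01, steps=20)

# Baseline: the same small training budget, but starting from scratch.
scratch = train((0.0, 0.0), target, lr=0.01, steps=20)

print(f"pre-trained, on target: {mse(pretrained, target):.4f}")
print(f"fine-tuned:             {mse(finetuned, target):.4f}")
print(f"from scratch:           {mse(scratch, target):.4f}")
```

With these made-up numbers, the pre-trained weights already give a target loss of 0.06 before any fine-tuning, and a few fine-tuning steps lower it further, while the same 20 steps from a zero initialization leave the model far from the target function.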
Key aspects of fine-tuning:
- Transfer Learning: Fine-tuning is a form of transfer learning where the knowledge from a source task is transferred to a target task. The pre-trained model has learned a representation that can be useful for the new task[1][4].
- Partial Training: Fine-tuning often retrains only a subset of the model’s layers. The earlier layers, which capture more generic features, are commonly kept frozen, while the later layers are updated to adapt to the specifics of the new task[1].
- Adapters: Instead of fine-tuning the entire network or a subset of its layers, small modules called adapters, which have far fewer parameters than the original model, can be added to the model, and only these are trained. This is more parameter-efficient, since the rest of the model’s weights stay frozen[1].
- Learning Rate: The learning rate used during fine-tuning is usually much smaller than the one used during initial training, so that the weights change only slightly and the pre-learned features are not distorted too much[5].
- Domain Similarity: Fine-tuning is most effective when the source and target tasks are similar. If the domains differ too much, the pre-trained model may not provide a useful starting point[5].
- Robustness: Fine-tuning can sometimes degrade a model’s robustness to distribution shifts, meaning it may perform worse on data that is not represented in the fine-tuning dataset[1].
- Efficiency: Fine-tuning can be more efficient than training a model from scratch, especially when computational resources are limited or when the available dataset for the new task is small[2].
- Commercial Models: Some commercial models, such as those offered by OpenAI and Microsoft, can be fine-tuned through the provider’s fine-tuning API, allowing users to customize them for specific applications[1].
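The adapter idea from the list above can be sketched in plain Python. This is a simplified, hypothetical setup rather than any library’s implementation: a frozen 4x4 "pre-trained" layer feeds a bottleneck adapter (down-projection, ReLU, up-projection, plus a residual connection), and gradient descent updates only the adapter’s 8 weights. The zero initialization of the up-projection, the toy target, and all names (`Adapter`, `train_adapter`, `frozen_w`) are assumptions for illustration.

```python
import copy


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, t) for t in v]


class Adapter:
    """Hypothetical bottleneck adapter: out = h + up @ relu(down @ h)."""

    def __init__(self, dim, bottleneck):
        # Assumed initialization: small positive down-projection and a zero
        # up-projection, so the adapter starts out as the identity map.
        self.down = [[0.1] * dim for _ in range(bottleneck)]
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        return sum(map(len, self.down)) + sum(map(len, self.up))

    def forward(self, h):
        z = matvec(self.down, h)
        s = relu(z)
        delta = matvec(self.up, s)
        return [hi + di for hi, di in zip(h, delta)], z, s


def loss(frozen_w, adapter, data):
    total = 0.0
    for x, y in data:
        out, _, _ = adapter.forward(matvec(frozen_w, x))
        total += sum((o - t) ** 2 for o, t in zip(out, y))
    return total / len(data)


def train_adapter(frozen_w, adapter, data, lr, steps):
    n = len(data)
    for _ in range(steps):
        g_down = [[0.0] * len(row) for row in adapter.down]
        g_up = [[0.0] * len(row) for row in adapter.up]
        for x, y in data:
            h = matvec(frozen_w, x)  # output of the frozen pre-trained layer
            out, z, s = adapter.forward(h)
            d_out = [2 * (o - t) / n for o, t in zip(out, y)]
            for i, row in enumerate(g_up):
                for k in range(len(row)):
                    row[k] += d_out[i] * s[k]
            for k, row in enumerate(g_down):
                if z[k] <= 0:
                    continue  # ReLU blocks the gradient for inactive units
                d_s = sum(adapter.up[i][k] * d_out[i] for i in range(len(d_out)))
                for j in range(len(row)):
                    row[j] += d_s * h[j]
        # Only the adapter's weights are updated; frozen_w is never modified.
        for weights, grads in ((adapter.down, g_down), (adapter.up, g_up)):
            for w_row, g_row in zip(weights, grads):
                for j in range(len(w_row)):
                    w_row[j] -= lr * g_row[j]


# Hypothetical pre-trained layer (4 inputs -> 4 features) that stays frozen.
frozen_w = [
    [0.5, 0.1, 0.0, 0.2],
    [0.1, 0.6, 0.1, 0.0],
    [0.0, 0.2, 0.4, 0.1],
    [0.3, 0.0, 0.1, 0.5],
]
# Hypothetical new task: shift every pre-trained feature by 0.3 * (sum of features).
inputs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
data = []
for x in inputs:
    h = matvec(frozen_w, x)
    data.append((x, [hi + 0.3 * sum(h) for hi in h]))

adapter = Adapter(dim=4, bottleneck=1)
frozen_before = copy.deepcopy(frozen_w)
before = loss(frozen_w, adapter, data)
train_adapter(frozen_w, adapter, data, lr=0.05, steps=600)
after = loss(frozen_w, adapter, data)
print(f"trainable: {adapter.num_params()}, frozen: {sum(map(len, frozen_w))}")
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

The frozen matrix is identical before and after training; only the 8 adapter weights change, against 16 frozen ones. Real adapter modules are typically inserted inside each layer of a large network; a single layer is used here to keep the backpropagation readable.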
Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are “frozen” (not updated during the training process)[1].
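Freezing can be sketched in plain Python by computing gradients for every layer but skipping the update for the layers marked as frozen. The two-layer network, its "pre-trained" weights, the target task, and the helper names (`forward`, `gradients`, `fine_tune`, and its `frozen` argument) are all hypothetical.

```python
import copy


def relu(v):
    return [max(0.0, t) for t in v]


def forward(params, x):
    # Earlier layer: generic feature extractor.
    h = relu([sum(w * xj for w, xj in zip(row, x)) for row in params["layer1"]])
    # Later layer: task-specific output head.
    pred = sum(w * hk for w, hk in zip(params["layer2"], h)) + params["bias2"]
    return pred, h


def mse(params, data):
    return sum((forward(params, x)[0] - y) ** 2 for x, y in data) / len(data)


def gradients(params, data):
    n = len(data)
    g1 = [[0.0] * len(row) for row in params["layer1"]]
    g2 = [0.0] * len(params["layer2"])
    gb = 0.0
    for x, y in data:
        pred, h = forward(params, x)
        err = 2 * (pred - y) / n  # d(loss)/d(pred) for this example
        gb += err
        for k, hk in enumerate(h):
            g2[k] += err * hk
            if hk > 0:  # ReLU passes gradient only through active units
                for j, xj in enumerate(x):
                    g1[k][j] += err * params["layer2"][k] * xj
    return {"layer1": g1, "layer2": g2, "bias2": gb}


def sgd_update(p, g, lr):
    # Handles scalars, vectors, and nested lists (matrices).
    if isinstance(p, list):
        return [sgd_update(pi, gi, lr) for pi, gi in zip(p, g)]
    return p - lr * g


def fine_tune(params, data, lr, steps, frozen=("layer1",)):
    for _ in range(steps):
        grads = gradients(params, data)
        for name, g in grads.items():
            if name in frozen:
                continue  # frozen: keep the pre-trained values
            params[name] = sgd_update(params[name], g, lr)
    return params


# Hypothetical "pre-trained" weights: 2 inputs -> 3 hidden units -> 1 output.
params = {
    "layer1": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "layer2": [0.5, -0.5, 0.2],
    "bias2": 0.0,
}
# Hypothetical target task: y = x0 + 2 * x1 + 0.5.
data = [(x, x[0] + 2 * x[1] + 0.5) for x in ([1, 0], [0, 1], [1, 1], [2, 1], [1, 2])]

layer1_before = copy.deepcopy(params["layer1"])
before = mse(params, data)
fine_tune(params, data, lr=0.02, steps=500)
after = mse(params, data)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

Here the frozen layer’s gradient is computed and then discarded, which keeps the sketch short. In a framework such as PyTorch, the usual way to get the same effect is to set `requires_grad` to `False` on the frozen parameters, so their gradients are not computed at all.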
Citations:
[1] https://en.wikipedia.org/wiki/Fine-tuning_%28deep_learning%29
[2] https://stats.stackexchange.com/questions/331369/what-is-meant-by-fine-tuning-of-neural-network
[3] https://community.openai.com/t/how-does-fine-tuning-really-work/39972
[4] http://d2l.ai/chapter_computer-vision/fine-tuning.html
[5] https://www.baeldung.com/cs/fine-tuning-nn
[6] https://blog.pangeanic.com/what-is-fine-tuning
[7] https://deeplizard.com/learn/video/5T-iXNNiwIs
[11] https://platform.openai.com/docs/guides/fine-tuning
[12] https://encord.com/blog/training-vs-fine-tuning/
[13] https://www.turing.com/resources/finetuning-large-language-models
[14] https://innodata.com/quick-concepts-fine-tuning-in-generative-ai/
[15] https://chrisalbon.com/Large+Language+Models/Fine-Tuning+Vs.+Training
[16] https://intellipaat.com/blog/fine-tuning/
[19] https://blog.pangeanic.com/what-is-fine-tuning
[20] https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)
[22] https://deeplizard.com/learn/video/5T-iXNNiwIs
[24] https://discuss.huggingface.co/t/does-fine-tuning-mean-retraining-the-entire-model/26263
[25] https://www.lakera.ai/blog/llm-fine-tuning-guide
[26] https://deci.ai/deep-learning-glossary/fine-tuning/
[28] https://www.deeplearning.ai/short-courses/finetuning-large-language-models/
[29] https://www.ankursnewsletter.com/p/pre-training-vs-fine-tuning-large
[30] https://www.techtarget.com/searchenterpriseai/definition/fine-tuning
[31] https://www.linkedin.com/pulse/difference-between-retraining-fine-tuning-sanjay-kumar-mba-ms-phd