<!DOCTYPE article
PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20190208//EN"
       "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.4" xml:lang="en">
 <front>
  <journal-meta>
   <journal-id journal-id-type="publisher-id">Intellectual Technologies on Transport</journal-id>
   <journal-title-group>
    <journal-title xml:lang="en">Intellectual Technologies on Transport</journal-title>
    <trans-title-group xml:lang="ru">
     <trans-title>Интеллектуальные технологии на транспорте</trans-title>
    </trans-title-group>
   </journal-title-group>
   <issn publication-format="online">2413-2527</issn>
  </journal-meta>
  <article-meta>
   <article-id pub-id-type="publisher-id">89539</article-id>
   <article-id pub-id-type="doi">10.20295/2413-2527-2024-339-5-12</article-id>
   <article-categories>
    <subj-group subj-group-type="toc-heading" xml:lang="ru">
     <subject>Искусственный интеллект и машинное обучение</subject>
    </subj-group>
    <subj-group subj-group-type="toc-heading" xml:lang="en">
     <subject>Artificial intelligence and machine learning</subject>
    </subj-group>
    <subj-group>
     <subject>Искусственный интеллект и машинное обучение</subject>
    </subj-group>
   </article-categories>
   <title-group>
    <article-title xml:lang="en">Methods for Optimizing the Training and Fine-Tuning Large Language Models</article-title>
    <trans-title-group xml:lang="ru">
     <trans-title>Методы оптимизации процесса обучения и тонкой настройки больших языковых моделей</trans-title>
    </trans-title-group>
   </title-group>
   <contrib-group content-type="authors">
    <contrib contrib-type="author">
     <name-alternatives>
      <name xml:lang="ru">
       <surname>Самонов</surname>
       <given-names>Александр Валерьянович</given-names>
      </name>
      <name xml:lang="en">
       <surname>Samonov</surname>
       <given-names>Aleksandr Valer'yanovich</given-names>
      </name>
     </name-alternatives>
     <email>a.samonov@mail.ru</email>
     <bio xml:lang="ru">
      <p>кандидат технических наук;</p>
     </bio>
     <bio xml:lang="en">
      <p>candidate of technical sciences;</p>
     </bio>
     <xref ref-type="aff" rid="aff-1"/>
    </contrib>
   </contrib-group>
   <aff-alternatives id="aff-1">
    <aff>
     <institution xml:lang="ru">Военно-космическая академия имени А. Ф. Можайского</institution>
     <city>Санкт-Петербург</city>
     <country>Россия</country>
    </aff>
    <aff>
     <institution xml:lang="en">Mozhaisky Military Aerospace Academy</institution>
      <city>Saint Petersburg</city>
     <country>Russian Federation</country>
    </aff>
   </aff-alternatives>
   <pub-date publication-format="print" date-type="pub" iso-8601-date="2024-10-09T00:00:00+03:00">
    <day>09</day>
    <month>10</month>
    <year>2024</year>
   </pub-date>
   <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2024-10-09T00:00:00+03:00">
    <day>09</day>
    <month>10</month>
    <year>2024</year>
   </pub-date>
   <issue>3</issue>
   <fpage>5</fpage>
   <lpage>12</lpage>
   <history>
    <date date-type="received" iso-8601-date="2024-10-09T00:00:00+03:00">
     <day>09</day>
     <month>10</month>
     <year>2024</year>
    </date>
   </history>
   <self-uri xlink:href="https://atjournal.ru/en/nauka/article/89539/view">https://atjournal.ru/en/nauka/article/89539/view</self-uri>
   <abstract xml:lang="ru">
     <p>Основными проблемными вопросами при разработке и специализации больших языковых моделей (Large Language Model, LLM) являются катастрофическое забывание, риск переобучения, галлюцинации, некорректная обработка исключительных ситуаций, а также исключительно высокие требования к производительности используемых при этом вычислительных средств. Целями исследования являются выбор и разработка методов оптимизации процесса обучения и настройки LLM, обеспечивающих существенное снижение необходимых для этого вычислительных ресурсов. Для достижения данной цели предложено использовать следующие методы оптимизации LLM и алгоритмов их обучения: LoRA и QLoRA, Batch size choice (выбор оптимального размера пакета), Gradient Accumulation (накопление градиента), Gradient Checkpointing (контрольные точки градиента), Mixed precision training (смешанная точность), FlashAttention-2. Для получения кумулятивного положительного эффекта при совместном использовании этих методов необходимо выполнить ряд практических экспериментов. При настройке гиперпараметров обучения LLM сначала следует определить, какой размер пакета дает наилучшие результаты, а затем выбрать адекватные методы оптимизации используемых вычислительных ресурсов. Применение представленных методов позволит повысить эффективность использования вычислительных ресурсов при настройке больших языковых моделей и обеспечит сокращение необходимых для этого временных и финансовых затрат.</p>
   </abstract>
   <trans-abstract xml:lang="en">
     <p>The main problematic issues in the development and specialization of large language models (LLMs) are catastrophic forgetting, the risk of overfitting, hallucinations, incorrect handling of exceptional situations, as well as exceptionally high performance requirements for the computing hardware involved. The purpose of the study is to select and develop methods for optimizing the LLM training and fine-tuning process that provide a significant reduction in the computing resources required. To achieve this goal, it is proposed to use the following methods for optimizing LLMs and their training algorithms: LoRA and QLoRA, batch size choice, gradient accumulation, gradient checkpointing, mixed precision training, and FlashAttention-2. To obtain a cumulative positive effect when using these methods together, a number of practical experiments must be performed. When tuning LLM training hyperparameters, one should first determine which batch size gives the best results and then choose adequate methods for optimizing the computing resources used. The application of the presented methods will increase the efficiency of computing resource usage when training and fine-tuning large language models and will reduce the associated time and financial costs.</p>
   </trans-abstract>
   <kwd-group xml:lang="ru">
    <kwd>большая языковая модель</kwd>
    <kwd>графический процессор</kwd>
    <kwd>накопление градиента</kwd>
    <kwd>смешанная точность</kwd>
    <kwd>точная настройка LLM</kwd>
    <kwd>Large Language Model</kwd>
    <kwd>Low-Rank Adaptation</kwd>
   </kwd-group>
   <kwd-group xml:lang="en">
    <kwd>fine-tuning</kwd>
    <kwd>gradient accumulation</kwd>
    <kwd>graphics processing unit</kwd>
    <kwd>Large Language Model</kwd>
    <kwd>Low-Rank Adaptation</kwd>
    <kwd>mixed precision</kwd>
   </kwd-group>
  </article-meta>
 </front>
 <body>
  <p></p>
 </body>
 <back>
  <ref-list>
   <ref id="B1">
    <label>1.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">A Survey of Large Language Models / W. Zhao [et al.] // ArXiv. 2023. Vol. 2303.18223. 124 p. DOI: 10.48550/arXiv.2303.18223</mixed-citation>
      <mixed-citation xml:lang="en">A Survey of Large Language Models / W. Zhao [et al.] // ArXiv. 2023. Vol. 2303.18223. 124 p. DOI: 10.48550/arXiv.2303.18223</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B2">
    <label>2.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning / V. Lialin [et al.] // ArXiv. 2023. Vol. 2303.15647. 21 p. DOI: 10.48550/arXiv.2303.15647</mixed-citation>
      <mixed-citation xml:lang="en">Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning / V. Lialin [et al.] // ArXiv. 2023. Vol. 2303.15647. 21 p. DOI: 10.48550/arXiv.2303.15647</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B3">
    <label>3.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Matrix Multiplication Background User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Matrix Multiplication Background User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B4">
    <label>4.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Bekman S. Benchmarking Transformers with HF Trainer on a Single A100 40GB // Github. URL: http://github.com/huggingface/transformers/issues/15026 (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Bekman S. Benchmarking Transformers with HF Trainer on a Single A100 40GB // Github. URL: http://github.com/huggingface/transformers/issues/15026 (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B5">
    <label>5.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">LoRA: Low-Rank Adaptation of Large Language Models / E. Hu [et al.] // ArXiv. 2021. Vol. 2106.09685. 26 p. DOI: 10.48550/arXiv.2106.09685</mixed-citation>
      <mixed-citation xml:lang="en">LoRA: Low-Rank Adaptation of Large Language Models / E. Hu [et al.] // ArXiv. 2021. Vol. 2106.09685. 26 p. DOI: 10.48550/arXiv.2106.09685</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B6">
    <label>6.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">LLaMA-Adapter: Efficient Fine-Tuning of Language Models with Zero-Init Attention / R. Zhang [et al.] // ArXiv. 2023. Vol. 2303.16199. 22 p. DOI: 10.48550/arXiv.2303.16199</mixed-citation>
     <mixed-citation xml:lang="en">LLaMA-Adapter: Efficient Fine-Tuning of Language Models with Zero-Init Attention / R. Zhang [et al.] // ArXiv. 2023. Vol. 2303.16199. 22 p. DOI: 10.48550/arXiv.2303.16199</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B7">
    <label>7.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-Trained Language Models / N. Ding [et al.] // ArXiv. 2022. Vol. 2203.06904. 49 p. DOI: 10.48550/arXiv.2203.06904</mixed-citation>
     <mixed-citation xml:lang="en">Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-Trained Language Models / N. Ding [et al.] // ArXiv. 2022. Vol. 2203.06904. 49 p. DOI: 10.48550/arXiv.2203.06904</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B8">
    <label>8.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models / Y. Xu [et al.] // ArXiv. 2023. Vol. 2309.14717. 16 p. DOI: 10.48550/arXiv.2309.14717</mixed-citation>
     <mixed-citation xml:lang="en">QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models / Y. Xu [et al.] // ArXiv. 2023. Vol. 2309.14717. 16 p. DOI: 10.48550/arXiv.2309.14717</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B9">
    <label>9.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning / H. Rajabzadeh [et al.] // ArXiv. 2024. Vol. 2402.10462. 6 p. DOI: 10.48550/arXiv.2402.10462</mixed-citation>
     <mixed-citation xml:lang="en">QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning / H. Rajabzadeh [et al.] // ArXiv. 2024. Vol. 2402.10462. 6 p. DOI: 10.48550/arXiv.2402.10462</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B10">
    <label>10.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Methods and Tools for Efficient Training on a Single GPU // Hugging Face Community. URL: http://huggingface.co/docs/transformers/perf_train_gpu_one (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Methods and Tools for Efficient Training on a Single GPU // Hugging Face Community. URL: http://huggingface.co/docs/transformers/perf_train_gpu_one (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B11">
    <label>11.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">Goodfellow I., Bengio Y., Courville A. Optimization for Training Deep Models // Deep Learning. Cambridge (MA): MIT Press, 2016. Pp. 267–320.</mixed-citation>
      <mixed-citation xml:lang="en">Goodfellow I., Bengio Y., Courville A. Optimization for Training Deep Models // Deep Learning. Cambridge (MA): MIT Press, 2016. Pp. 267–320.</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B12">
    <label>12.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">Bekman S. Benchmarking Transformers with HF Trainer on RTX-3090 // Github. URL: http://github.com/huggingface/transformers/issues/14608 (accessed 26 Mar 2024).</mixed-citation>
      <mixed-citation xml:lang="en">Bekman S. Benchmarking Transformers with HF Trainer on RTX-3090 // Github. URL: http://github.com/huggingface/transformers/issues/14608 (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B13">
    <label>13.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Linear/Fully Connected Layers User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Linear/Fully Connected Layers User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B14">
    <label>14.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models / M. Weyssow [et al.] // ArXiv. 2023. Vol. 2308.10462. 23 p. DOI: 10.48550/arXiv.2308.10462</mixed-citation>
     <mixed-citation xml:lang="en">Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models / M. Weyssow [et al.] // ArXiv. 2023. Vol. 2308.10462. 23 p. DOI: 10.48550/arXiv.2308.10462</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B15">
    <label>15.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">FP8-LM: Training FP8 Large Language Models / H. Peng [et al.] // ArXiv. 2023. Vol. 2310.18313. 23 p. DOI: 10.48550/arXiv.2310.18313</mixed-citation>
      <mixed-citation xml:lang="en">FP8-LM: Training FP8 Large Language Models / H. Peng [et al.] // ArXiv. 2023. Vol. 2310.18313. 23 p. DOI: 10.48550/arXiv.2310.18313</mixed-citation>
    </citation-alternatives>
   </ref>
  </ref-list>
 </back>
</article>
