<!DOCTYPE article
PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20190208//EN"
       "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.4" xml:lang="en">
 <front>
  <journal-meta>
   <journal-id journal-id-type="publisher-id">Intellectual Technologies on Transport</journal-id>
   <journal-title-group>
    <journal-title xml:lang="en">Intellectual Technologies on Transport</journal-title>
    <trans-title-group xml:lang="ru">
     <trans-title>Интеллектуальные технологии на транспорте</trans-title>
    </trans-title-group>
   </journal-title-group>
   <issn publication-format="online">2413-2527</issn>
  </journal-meta>
  <article-meta>
   <article-id pub-id-type="publisher-id">89539</article-id>
   <article-id pub-id-type="doi">10.20295/2413-2527-2024-339-5-12</article-id>
   <article-categories>
    <subj-group subj-group-type="toc-heading" xml:lang="ru">
     <subject>Искусственный интеллект и машинное обучение</subject>
    </subj-group>
    <subj-group subj-group-type="toc-heading" xml:lang="en">
     <subject>Artificial intelligence and machine learning</subject>
    </subj-group>
    <subj-group>
     <subject>Искусственный интеллект и машинное обучение</subject>
    </subj-group>
   </article-categories>
   <title-group>
    <article-title xml:lang="en">Methods for Optimizing the Training and Fine-Tuning Large Language Models</article-title>
    <trans-title-group xml:lang="ru">
     <trans-title>Методы оптимизации процесса обучения и тонкой настройки больших языковых моделей</trans-title>
    </trans-title-group>
   </title-group>
   <contrib-group content-type="authors">
    <contrib contrib-type="author">
     <name-alternatives>
      <name xml:lang="ru">
       <surname>Самонов</surname>
       <given-names>Александр Валерьянович</given-names>
      </name>
      <name xml:lang="en">
       <surname>Samonov</surname>
       <given-names>Aleksandr Valer'yanovich</given-names>
      </name>
     </name-alternatives>
     <email>a.samonov@mail.ru</email>
     <bio xml:lang="ru">
      <p>кандидат технических наук;</p>
     </bio>
     <bio xml:lang="en">
      <p>candidate of technical sciences;</p>
     </bio>
     <xref ref-type="aff" rid="aff-1"/>
    </contrib>
   </contrib-group>
   <aff-alternatives id="aff-1">
    <aff>
     <institution xml:lang="ru">Военно-космическая академия имени А. Ф. Можайского</institution>
     <city>Санкт-Петербург</city>
     <country>Россия</country>
    </aff>
    <aff>
     <institution xml:lang="en">Mozhaisky Military Aerospace Academy</institution>
      <city>Saint Petersburg</city>
     <country>Russian Federation</country>
    </aff>
   </aff-alternatives>
   <pub-date publication-format="print" date-type="pub" iso-8601-date="2024-10-09T00:00:00+03:00">
    <day>09</day>
    <month>10</month>
    <year>2024</year>
   </pub-date>
   <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2024-10-09T00:00:00+03:00">
    <day>09</day>
    <month>10</month>
    <year>2024</year>
   </pub-date>
   <issue>3</issue>
   <fpage>5</fpage>
   <lpage>12</lpage>
   <history>
    <date date-type="received" iso-8601-date="2024-10-09T00:00:00+03:00">
     <day>09</day>
     <month>10</month>
     <year>2024</year>
    </date>
   </history>
   <self-uri xlink:href="https://atjournal.ru/en/nauka/article/89539/view">https://atjournal.ru/en/nauka/article/89539/view</self-uri>
   <abstract xml:lang="ru">
     <p>Основными проблемными вопросами при разработке и специализации больших языковых моделей (Large Language Model, LLM) являются катастрофическое забывание, риск переобучения, галлюцинации, некорректная обработка исключительных ситуаций, а также исключительно высокие требования к производительности используемых при этом вычислительных средств. Целями исследования являются выбор и разработка методов оптимизации процесса обучения и настройки LLM, обеспечивающих существенное снижение необходимых для этого вычислительных ресурсов. Для достижения данной цели предложено использовать следующие методы оптимизации LLM и алгоритмов их обучения: LoRA и QLoRA, Batch size choice (выбор оптимального размера пакета), Gradient Accumulation (накопление градиента), Gradient Checkpointing (контрольные точки градиента), Mixed precision training (смешанная точность), FlashAttention-2. Для получения кумулятивного положительного эффекта при совместном использовании этих методов необходимо выполнить ряд практических экспериментов. При настройке гиперпараметров обучения LLM сначала следует определить, какой размер пакета дает наилучшие результаты, а затем выбрать адекватные методы оптимизации используемых вычислительных ресурсов. Применение представленных методов позволит повысить эффективность использования вычислительных ресурсов при настройке больших языковых моделей и обеспечит сокращение необходимых для этого временных и финансовых затрат.</p>
   </abstract>
   <trans-abstract xml:lang="en">
     <p>The main problematic issues in the development and specialization of large language models (LLMs) are catastrophic forgetting, the risk of overfitting, hallucinations, incorrect handling of exceptional situations, as well as exceptionally high performance requirements for the computing hardware involved. The purpose of the study is to select and develop methods for optimizing the LLM training and fine-tuning process that provide a significant reduction in the computing resources required. To achieve this goal, it is proposed to use the following methods for optimizing LLMs and their training algorithms: LoRA and QLoRA, batch size choice, gradient accumulation, gradient checkpointing, mixed precision training, and FlashAttention-2. To obtain a cumulative positive effect when using these methods together, a number of practical experiments must be performed. When tuning LLM training hyperparameters, one should first determine which batch size gives the best results and then choose adequate methods for optimizing the computing resources used. The application of the presented methods will increase the efficiency of computing resource usage when training and fine-tuning large language models and will reduce the associated time and financial costs.</p>
   </trans-abstract>
   <kwd-group xml:lang="ru">
    <kwd>большая языковая модель</kwd>
    <kwd>графический процессор</kwd>
    <kwd>накопление градиента</kwd>
    <kwd>смешанная точность</kwd>
    <kwd>точная настройка LLM</kwd>
    <kwd>Large Language Model</kwd>
    <kwd>Low-Rank Adaptation</kwd>
   </kwd-group>
   <kwd-group xml:lang="en">
    <kwd>fine-tuning</kwd>
    <kwd>gradient accumulation</kwd>
    <kwd>graphics processing unit</kwd>
    <kwd>Large Language Model</kwd>
    <kwd>Low-Rank Adaptation</kwd>
    <kwd>mixed precision</kwd>
   </kwd-group>
  </article-meta>
 </front>
 <body>
  <p></p>
 </body>
 <back>
  <ref-list>
   <ref id="B1">
    <label>1.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">A Survey of Large Language Models / W. Zhao [et al.] // ArXiv. 2023. Vol. 2303.18223. 124 p. DOI: 10.48550/arXiv.2303.18223</mixed-citation>
      <mixed-citation xml:lang="en">A Survey of Large Language Models / W. Zhao [et al.] // ArXiv. 2023. Vol. 2303.18223. 124 p. DOI: 10.48550/arXiv.2303.18223</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B2">
    <label>2.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning / V. Lialin [et al.] // ArXiv. 2023. Vol. 2303.15647. 21 p. DOI: 10.48550/arXiv.2303.15647</mixed-citation>
      <mixed-citation xml:lang="en">Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning / V. Lialin [et al.] // ArXiv. 2023. Vol. 2303.15647. 21 p. DOI: 10.48550/arXiv.2303.15647</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B3">
    <label>3.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Matrix Multiplication Background User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Matrix Multiplication Background User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B4">
    <label>4.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Bekman S. Benchmarking Transformers with HF Trainer on a Single A100 40GB // Github. URL: http://github.com/huggingface/transformers/issues/15026 (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Bekman S. Benchmarking Transformers with HF Trainer on a Single A100 40GB // Github. URL: http://github.com/huggingface/transformers/issues/15026 (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B5">
    <label>5.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">LoRA: Low-Rank Adaptation of Large Language Models / E. Hu [et al.] // ArXiv. 2021. Vol. 2106.09685. 26 p. DOI: 10.48550/arXiv.2106.09685</mixed-citation>
      <mixed-citation xml:lang="en">LoRA: Low-Rank Adaptation of Large Language Models / E. Hu [et al.] // ArXiv. 2021. Vol. 2106.09685. 26 p. DOI: 10.48550/arXiv.2106.09685</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B6">
    <label>6.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">LLaMA-Adapter: Efficient Fine-Tuning of Language Models with Zero-Init Attention / R. Zhang [et al.] // ArXiv. 2023. Vol. 2303.16199. 22 p. DOI: 10.48550/arXiv.2303.16199</mixed-citation>
     <mixed-citation xml:lang="en">LLaMA-Adapter: Efficient Fine-Tuning of Language Models with Zero-Init Attention / R. Zhang [et al.] // ArXiv. 2023. Vol. 2303.16199. 22 p. DOI: 10.48550/arXiv.2303.16199</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B7">
    <label>7.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-Trained Language Models / N. Ding [et al.] // ArXiv. 2022. Vol. 2203.06904. 49 p. DOI: 10.48550/arXiv.2203.06904</mixed-citation>
     <mixed-citation xml:lang="en">Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-Trained Language Models / N. Ding [et al.] // ArXiv. 2022. Vol. 2203.06904. 49 p. DOI: 10.48550/arXiv.2203.06904</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B8">
    <label>8.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models / Y. Xu [et al.] // ArXiv. 2023. Vol. 2309.14717. 16 p. DOI: 10.48550/arXiv.2309.14717</mixed-citation>
     <mixed-citation xml:lang="en">QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models / Y. Xu [et al.] // ArXiv. 2023. Vol. 2309.14717. 16 p. DOI: 10.48550/arXiv.2309.14717</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B9">
    <label>9.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning / H. Rajabzadeh [et al.] // ArXiv. 2024. Vol. 2402.10462. 6 p. DOI: 10.48550/arXiv.2402.10462</mixed-citation>
     <mixed-citation xml:lang="en">QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning / H. Rajabzadeh [et al.] // ArXiv. 2024. Vol. 2402.10462. 6 p. DOI: 10.48550/arXiv.2402.10462</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B10">
    <label>10.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Methods and Tools for Efficient Training on a Single GPU // Hugging Face Community. URL: http://huggingface.co/docs/transformers/perf_train_gpu_one (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Methods and Tools for Efficient Training on a Single GPU // Hugging Face Community. URL: http://huggingface.co/docs/transformers/perf_train_gpu_one (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B11">
    <label>11.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">Goodfellow I., Bengio Y., Courville A. Optimization for Training Deep Models // Deep Learning. Cambridge (MA): MIT Press, 2016. Pp. 267–320.</mixed-citation>
      <mixed-citation xml:lang="en">Goodfellow I., Bengio Y., Courville A. Optimization for Training Deep Models // Deep Learning. Cambridge (MA): MIT Press, 2016. Pp. 267–320.</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B12">
    <label>12.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">Bekman S. Benchmarking Transformers with HF Trainer on RTX-3090 // Github. URL: http://github.com/huggingface/transformers/issues/14608 (accessed 26 Mar 2024).</mixed-citation>
      <mixed-citation xml:lang="en">Bekman S. Benchmarking Transformers with HF Trainer on RTX-3090 // Github. URL: http://github.com/huggingface/transformers/issues/14608 (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B13">
    <label>13.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Linear/Fully Connected Layers User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected (accessed 26 Mar 2024).</mixed-citation>
     <mixed-citation xml:lang="en">Linear/Fully Connected Layers User’s Guide // NVIDIA Documentation Hub. URL: http://docs.nvidia.com/deeplearning/performance/dl-performance-fully-connected (accessed 26 Mar 2024).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B14">
    <label>14.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models / M. Weyssow [et al.] // ArXiv. 2023. Vol. 2308.10462. 23 p. DOI: 10.48550/arXiv.2308.10462</mixed-citation>
     <mixed-citation xml:lang="en">Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models / M. Weyssow [et al.] // ArXiv. 2023. Vol. 2308.10462. 23 p. DOI: 10.48550/arXiv.2308.10462</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B15">
    <label>15.</label>
    <citation-alternatives>
      <mixed-citation xml:lang="ru">FP8-LM: Training FP8 Large Language Models / H. Peng [et al.] // ArXiv. 2023. Vol. 2310.18313. 23 p. DOI: 10.48550/arXiv.2310.18313</mixed-citation>
      <mixed-citation xml:lang="en">FP8-LM: Training FP8 Large Language Models / H. Peng [et al.] // ArXiv. 2023. Vol. 2310.18313. 23 p. DOI: 10.48550/arXiv.2310.18313</mixed-citation>
    </citation-alternatives>
   </ref>
  </ref-list>
 </back>
</article>
