Russian Federation
UDC 004.891.3 Diagnostic expert systems
One of the most serious obstacles to applying intelligent methods of diagnostic data processing to complex technical objects is the difficulty of assembling a training sample that covers every class of object state in a volume sufficient for high-quality training of reference diagnostic models or classifiers. The difficulty stems from the high absolute reliability of such objects. An effective remedy is augmentation, that is, artificial expansion of the training data. Training samples in technical diagnostics generally have an unknown distribution in feature space, yet the additional "synthetic" data must be distributed in the same way as the actual training set if the diagnostic model is to be trained well. An analysis of existing data augmentation methods shows that generative models based on variational autoencoders (VAE) and generative adversarial networks (GAN) can determine the distribution parameters of the training sample during training and then reproduce them in the generated samples, with GAN achieving the best results. For intelligent classification of the state of a diagnostic object from labeled training samples, a conditional GAN (CGAN) is the preferred way to generate additional data. A serious practical problem in generating additional data from a small training sample is assessing the uniformity of the training and generated samples, because the result of this assessment determines the duration (number of epochs) of training of the generative model. The paper proposes and substantiates an original method for estimating the uniformity of multidimensional samples based on Ripley’s G and F functions, which are used in spatial cluster analysis of point processes. Based on this method, a quantitative indicator is defined for controlling the quality and duration of training of the generative model. The effectiveness of the proposed method is confirmed on the problem of augmenting training data for the reference diagnostic model of the gas-air path of a diesel locomotive.
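The abstract does not give the formula of the proposed indicator, so the following is only a minimal sketch of how Ripley’s G function (distribution of nearest-neighbour distances between sample points) and F function (distribution of distances from arbitrary probe locations to the nearest sample point) could be used to compare a training sample with a generated one. The function names, the bounding-box probes, the radius grid, the Gaussian toy data and the `uniformity_gap` score (largest absolute difference between the two pairs of curves) are illustrative assumptions, not the authors' method, and no edge correction is applied.

```python
import bisect
import math
import random


def nearest_distances(queries, points, skip_self=False):
    """Euclidean distance from each query to its nearest point."""
    result = []
    for i, q in enumerate(queries):
        best = math.inf
        for j, p in enumerate(points):
            if skip_self and i == j:
                continue  # a point is not its own neighbour
            best = min(best, math.dist(q, p))
        result.append(best)
    return result


def ecdf(values, grid):
    """Empirical CDF of `values` evaluated at every radius in `grid`."""
    ordered = sorted(values)
    return [bisect.bisect_right(ordered, r) / len(ordered) for r in grid]


def ripley_g(points, grid):
    """G(r): share of points whose nearest neighbour lies within r."""
    return ecdf(nearest_distances(points, points, skip_self=True), grid)


def ripley_f(points, grid, probes):
    """F(r): share of probe locations that have a sample point within r."""
    return ecdf(nearest_distances(probes, points), grid)


def box_probes(points, n, rng):
    """Hypothetical helper: n uniform probes in the bounding box of `points`."""
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in range(n)]


def uniformity_gap(real, synth, grid, probes):
    """Assumed score: largest gap between the G curves or the F curves."""
    g_gap = max(abs(a - b) for a, b in
                zip(ripley_g(real, grid), ripley_g(synth, grid)))
    f_gap = max(abs(a - b) for a, b in
                zip(ripley_f(real, grid, probes), ripley_f(synth, grid, probes)))
    return max(g_gap, f_gap)


def gaussian_sample(n, dim, scale, rng):
    """Toy stand-in for a feature-space sample (not real diagnostic data)."""
    return [tuple(rng.gauss(0.0, scale) for _ in range(dim)) for _ in range(n)]


rng = random.Random(0)
real = gaussian_sample(150, 3, 1.0, rng)       # "training" sample
good = gaussian_sample(150, 3, 1.0, rng)       # generator matching the data
collapsed = gaussian_sample(150, 3, 0.1, rng)  # generator stuck near one mode

grid = [0.025 * k for k in range(1, 61)]       # radii at which curves are compared
probes = box_probes(real, 150, rng)            # shared probes for both F curves

good_gap = uniformity_gap(real, good, grid, probes)
bad_gap = uniformity_gap(real, collapsed, grid, probes)
print(f"matching generator:  {good_gap:.3f}")
print(f"collapsed generator: {bad_gap:.3f}")
```

In a training loop, a score of this kind could be evaluated every few epochs on a batch of generated samples, and training stopped once it no longer decreases. The stopping rule is likewise an assumption made for this sketch.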
machine learning model, training sample, intelligent classifier, diagnostic object, generative adversarial networks, data augmentation, multidimensional samples uniformity control, spatial analysis, Ripley’s function