Abstract and keywords
Abstract (English):
One of the most serious obstacles to applying intelligent methods of diagnostic information processing to complex technical objects is the difficulty of forming a training sample that covers every class of object state in a volume sufficient for high-quality training of reference diagnostic models or classifiers. The difficulty arises because such objects are highly reliable, so faulty states are rarely observed. An effective solution is augmentation (artificial expansion) of the training data. A distinctive feature of training samples in technical diagnostics is that their distribution in the feature space is generally unknown, whereas the additional "synthetic" data must be distributed in the same way as the actual training set to ensure high-quality training of the diagnostic model. An analysis of existing data augmentation methods established that generative models based on variational autoencoders (VAE) and generative adversarial networks (GAN) can determine the distribution parameters of the training sample during training and reproduce them in the generated samples, with GAN achieving the best results. For intelligent classification of the state of a diagnostic object with labeled training samples, a conditional GAN (CGAN) is preferable for generating additional data. A serious practical problem in generating additional data from a small training sample is assessing the uniformity of the training and generated samples, that is, whether both follow the same distribution; the result of this assessment determines the duration (number of epochs) of training of the generative model. The paper proposes and substantiates an original method for estimating the uniformity of multidimensional samples based on Ripley’s G and F functions, which are used in spatial cluster analysis of point processes. Based on this method, a quantitative indicator is defined for controlling the quality and training duration of the generative model. The efficiency of the proposed method is confirmed on the example of augmenting training data for the reference diagnostic model of the gas-air path of a diesel locomotive.
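
To make the two functions named above concrete, the following is a minimal sketch of their standard empirical estimates: G is the cumulative distribution of distances from each sample point to its nearest neighbour in the same sample, and F is the cumulative distribution of distances from arbitrary probe locations to the nearest sample point. The abstract does not state how the paper combines G and F into its quantitative indicator, so the `max_gap` comparison, the probe locations, the radii grid, and all function names here are illustrative assumptions, and edge correction is omitted.

```python
import math
import random


def nearest_distances(queries, events, skip_self=False):
    """Distance from each query point to its nearest event (brute force)."""
    out = []
    for i, q in enumerate(queries):
        best = math.inf
        for j, e in enumerate(events):
            if skip_self and i == j:
                continue  # a point is not its own neighbour
            best = min(best, math.dist(q, e))
        out.append(best)
    return out


def ecdf(values, radii):
    """Fraction of values that are <= r, for each radius r."""
    n = len(values)
    return [sum(v <= r for v in values) / n for r in radii]


def ripley_g(sample, radii):
    """Empirical G: CDF of point-to-nearest-point distances within the sample."""
    return ecdf(nearest_distances(sample, sample, skip_self=True), radii)


def ripley_f(sample, radii, probes):
    """Empirical F: CDF of distances from probe locations to the nearest sample point."""
    return ecdf(nearest_distances(probes, sample), radii)


def max_gap(curve_a, curve_b):
    """Largest absolute difference between two curves (assumed comparison rule)."""
    return max(abs(a - b) for a, b in zip(curve_a, curve_b))


# Toy data standing in for a training sample and a generated sample.
random.seed(0)
dim = 3
real = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(100)]
fake = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(100)]
probes = [[random.uniform(-3, 3) for _ in range(dim)] for _ in range(100)]
radii = [0.1 * k for k in range(1, 31)]

gap_g = max_gap(ripley_g(real, radii), ripley_g(fake, radii))
gap_f = max_gap(ripley_f(real, radii, probes), ripley_f(fake, radii, probes))
print("G gap:", gap_g, "F gap:", gap_f)
```

Under these assumptions, small gaps on both curves would suggest that the generated sample has a spatial structure similar to that of the training sample, and such a value could be tracked across training epochs of the generative model.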

Keywords:
machine learning model, training sample, intelligent classifier, diagnostic object, generative adversarial networks, data augmentation, multidimensional samples uniformity control, spatial analysis, Ripley’s function
