The object of this research is sentiment analysis of a Russian-language text corpus. The subject of the research is a comparison of the effectiveness of approaches to preliminary text cleaning before sentiment analysis. The aim is to develop a generalized method of preliminary data cleaning for building a neural network model. A distinctive feature of the proposed solutions is the use of modern, lightweight libraries to prepare text for neural network training. In addition, the hypothesis that a truncated dictionary can be used, based on the assumption of data redundancy, was tested. The results show that the developed algorithm improves the results obtained during training and indicate that, owing to its versatility, it can be applied to other text data.
mining, data analysis, sentiment analysis, neural networks, text processing
1.