Россия
Россия
Россия
В статье представлены алгоритмы оценивания показателей качества распределенных систем хранения данных, применяемых для накопления и обработки данных автоматизированных информационных систем различного назначения, обслуживающих большое количество территориально распределенных клиентов. Приведены математические выражения показателей конфиденциальности, доступности и стоимости хранения данных. Предложенные алгоритмы позволяют проанализировать распределение информационных ресурсов по элементам системы хранения данных для выбора рациональных решений по организации хранения конфиденциальной информации.
распределенная система хранения данных, конфиденциальность данных
Modern automated information systems that serve a
large number of users contain data storage systems (DSS),
the quality of which is subject to high requirements [1–3].
In order to analyze the values of quality indicators
of the functioning of storage systems, a methodological
apparatus is needed for assessing the following
indicators of storing data located in storage systems:
confidentiality, availability and cost.
A feature of the proposed approach is the use of quantitative
estimates of the indicators under consideration.
The article discusses mathematical expressions that
make it possible to quantify the values of the listed indicators
and proposes algorithms for their calculation.
Quality indicators
of a distributed data storage system
A distributed information system (RIS), including
client automated workstations (AWS) and data processing
centers (DPCs), united by a global computer
network (fig. 1), allows solving applied problems
based on the collection, storage, processing and transmission
of target information [4–7].
We will assume that the operating technology
of such a RIS involves collecting information
from client workstations and processing it
in one or more data centers. In this case, information
is accumulated in a database (DB) and stored
in a data center.
14 Интеллектуальные технологии на транспорте. 2024. № 2
Математическое моделирование, численные методы и комплексы программ
Fig. 1. Distributed information system
Information received from clients is stored in relational
databases with a table structure (fig. 2). The database
consists of n records or their blocks, each of which
contains service information from one RIS client.
Records
Recording storage location
α1 α2 ... αm
r1 0 1 0
r2 1 0 0
...
rn 1 0 1
Fig. 2. Structure of stored information in the database
In order to increase reliability, database records
can be stored on several elements of a data storage
system, which can have a different architecture and
represent a data storage of various capacities, accessed
by users via the global Internet. At the same
time, storing data on each storage system element
has a certain cost, depending on the volume of stored
information and the quality of the services provided
for its storage (reliability, confidentiality, communication
channel capacity, etc.).
In normal mode, RIS uses a database located
in the data center, and the remaining data centers are
used for backup information storage.
Each data center can be subject to destructive
effects, both external (cyber-attack, terrorist
attack, natural disaster, etc.) and internal (breakdown,
failure, unauthorized access to data), as a result
of which the integrity, availability and confidentiality
of stored information can be violated (or parts thereof),
which leads to denial of service to the RIS.
In order to meet the requirements for the quality
of information storage, database records can be
distributed among storage elements (data center
and workstation) in such a way as to ensure restoration
of access to data in the event of a data center service
failure within a given time, subject to restrictions on
the cost and confidentiality of their storage.
Data storage confidentiality indicator
Let a rectangular matrix be given H[n,m], each
element hij of which is equal to 1 if the i-th record,
1 ≤ i ≤ n, is placed on the j-th element of the storage
system, 1 ≤ j ≤ m and is equal to 0 otherwise.
Let also be given a vector C〈m〉 = 〈c1, c2,..., cm〉, each
element cj of which is the probability (guarantee level)
Intellectual Technologies on Transport. 2024. No. 2 15
Mathematical modeling, numerical methods and software packages
of ensuring confidentiality of data storage on the j-th
element of the storage system.
Let’s consider two options for ensuring the confidentiality
of data storage, each of which is advisable
to use in appropriate conditions [8–12].
Option 1. The confidentiality requirement applies
to each record separately.
Data confidentiality indicator — the minimum
level of confidentiality L for all database records will be
where β (x,y) = 1 at x = 0; 1 at x = 1
Option 2. The confidentiality requirement applies
to the entirety of the records.
Data confidentiality indicator — average value
of information storage confidentiality:
,
where combjk is the value of the j-th digit (0 or 1) in the
binary representation (combination) of the number k.
Data storage cost indicator
The cost of data storage is determined by the total
cost of storing it on all storage elements. If the cost of
storage is calculated through the price Z〈m〉 = 〈z1, z2,..., zm〉
for storing a unit of data volume on the j-th storage
element, then the total cost of storing all records on all
storage systems will be:
,
where zj is the cost of storing a unit of volume (GB,
TB) of data on the j-th storage system;
vj — vector component;
Vn = v1,..., vn — volume of the i-th record.
Data availability indicator
The availability of data depends on the readiness
of the storage system elements to provide the databases
stored in it. This indicator can be estimated
based on the availability factors g of the corresponding
storage elements [13–15]:
,
tp — the time of regular operation of the RIS,
during which the requested database records must
be provided with the established efficiency;
tв — time during which the requested database records
are unavailable.
If the availability coefficient of the j-th storage element
is gj, then the minimum level of availability G
for all data records will be
.
Obviously, this indicator depends on the placement
of database records, the readiness and state of storage
elements (serviceable/faulty).
To increase data availability, storage elements are
combined into clusters, duplicating information on
each element of such a cluster. However, duplication
of information increases the risk of unauthorized access
to it, which leads to a decrease in the confidentiality
of data storage.
Thus, the optimization problem arises of placing
information on storage elements in order to ensure
its maximum confidentiality under restrictions on the
availability and cost of data storage.
To ensure the adequacy of the solution to this
problem, algorithms are proposed for assessing the
confidentiality, availability and cost of data storage,
the use of which will make it possible to select
a rational configuration for placing information on
storage elements.
The following describes the step-by-step operation
of algorithms for assessing quality indicators
of storage systems, common to which are the following
notations: matrix H[n,m] — matrix for placing records
on storage systems; n — number of database
records (blocks); m — number of storage elements.
16 Интеллектуальные технологии на транспорте. 2024. № 2
Математическое моделирование, численные методы и комплексы программ
Algorithm for assessing information availability
in a distributed data storage system
In the presented algorithm gi, the availability
coefficient of the i-th storage element.
Algorithm 1.
Step 1. Start.
Step 2. i :=1 , G :=1.
Step 3. j :=1, d :=1.
Step 4. If hij = 1, then d := d·(1 – gi).
Step 5. j := j + 1.
Step 6. If j > N, then go to step 7, otherwise —
to step 4.
Step 7. G := min(G,1 – d).
Step 8. i := i + 1.
Step 9. If i > m, then go to step 10, otherwise —
to step 3.
Step 10. End.
As a result of the algorithm’s operation, the variable
G will have the value of the minimum level of
data recording availability. By comparing this value
with the required one, it is possible to assess the suitability
of data distribution among storage elements
from the point of view of information availability requirements.
The computational complexity of the algorithm
is O(mn).
Algorithm for estimating the cost of information
storage in a distributed data storage system
In the presented algorithm zj, is the price of storing
a unit of data volume on the j-th storage element, vi
and is the volume of the i-th data block.
Algorithm 2.
Step 1. Start.
Step 2. i := 0, S := 0.
Step 3. j := 1.
Step 4. If hij = 1, then S := S + zj · vi .
Step 5. j := j + 1.
Step 6. If j > N, then go to step 7, otherwise —
to step 4.
Step 7. i := i + 1.
Step 8. If i > m, then go to step 10, otherwise —
to step 3.
Step 9. Finish.
At the end of the algorithm, the variable will have
the value of the cost of data storage. By comparing
this value with the required one, it is possible to assess
the suitability of data distribution among storage elements
from the point of view of requirements for the
cost of information storage. The computational complexity
of the algorithm is O(mn).
Algorithms for assessing information
confidentiality in a distributed data storage system
In the algorithms under consideration ci, the level
of confidentiality of information storage on the i-th
storage element.
As a result of the operation of the algorithms,
the variable C will have a value of a certain level
of data storage confidentiality. Comparing this value
with the required one, it is possible to evaluate
the value of the confidentiality of data distributed
among storage elements in terms of information confidentiality
requirements.
Algorithm 3. Estimation of the minimum level
of data confidentiality.
Step 1. Start.
Step 2. i := 1, C := 1.
Step 3. j := 1, d := 1.
Step 4. If hij = 1, then d := d · ci.
Step 5. j := j + 1.
Step 6. If j > N, then go to step 7, otherwise —
to step 4.
Step 7. C := min(C,d).
Step 8. i := i + 1.
Step 9. If i > m, then go to step 10, otherwise —
to step 3.
Step 10. Finish.
Thus, the variable C will have the value of the minimum
level of confidentiality of data storage. The computational
complexity of the algorithm is O(m2).
Algorithm 4. Estimating the average level of information
confidentiality (brute force).
Step 1. Start.
Step 2. i := 1, C := 0.
Step 3. j := 1, u := 0.
Step 4. k := 1, d := 1.
Step 5. If hik = 1⌒comb(i,k – 1) = 1, then d := d · ci .
Step 6. k := k + 1.
Step 7. If k > m, then go to step 8, otherwise —
to step 5.
Intellectual Technologies on Transport. 2024. No. 2 17
Mathematical modeling, numerical methods and software packages
Step 8. u := u + d.
Step 9. j := j + 1.
Step 10. If j > N, then go to step 11, otherwise —
to step 4.
Step 11. C := C + u.
Step 12. i := i + 1.
Step 13. If i > 2m – 1, then go to step 14, otherwise —
to step 3.
Step 14. If C := C / n2m – 1.
Step 15. End.
In the presented algorithm, the value of the function
β is equal to the value β of the digit β∈{0,1},
in the binary code of the integer α, 0 ≤ α ≤ 2m – 1.
As a result of the algorithm’s operation, the variable
C will have the value of the average level of confidentiality
of data storage.
It should be noted that the algorithm has high computational
complexity O(2m) and is applicable for relatively
small values of the number of storage elements.
Since in practice there is a need to assess the confidentiality
of data storage on a significant number
of storage elements, it is advisable to use an approximate
approach to assessment based on “greedy algorithms”.
The algorithm proposed below, which belongs
to this type, implements a strategy of “greedy”
data collection on storage elements with the goal
of unauthorized collection of the maximum amount
of information at minimal cost.
Algorithm 5. Evaluating the confidentiality of data
storage (greedy algorithm).
Step 1. Start.
Step 2. Define the vector F = f1, f2,...,fm, fi ∈ {0,1}.
Step 3. F := 0, C := 0, k := 1.
Step 4. i := 1, d := 0.
Step 5. If fi = 1, then go to step 13, otherwise —
to step 6.
Step 6. j := 1, u := 0.
Step 7. u := u + H(j,i).
Step 8. j := j + 1.
Step 9. If j > N , then go to step 10, otherwise —
to step 7.
Step 10. If u · (1 – ci) ≥ d, then go to step 11, otherwise
— to step 13.
Step 11. d : = u · (1 – ci), l := i.
Step 12. fl : = 1, c : = c + d.
Step 13. i := i + 1.
Step 14. If i > m, then go to step 15, otherwise —
to step 5.
Step 15. If hjl = 1, then hjr = 0 ∀r = 1,...,m.
Step 16. k := k + 1.
Step 17. If k > m, then go to step 18, otherwise —
to step 4.
Step 18. End.
As a result of the algorithm’s operation, the variable
C will have the value of the pessimistic (minimum)
level of data storage confidentiality. The computational
complexity of the algorithm is O(m2).
Thus, the presented algorithms provide estimates
of data storage quality indicators.
Conclusion
The given algorithms have a practical focus
on studying the effectiveness of using storage systems
for collecting, storing and processing confidential information.
The main problem of using algorithms is obtaining
initial data, namely, estimates of the values of indicators
of the availability of storage elements and the
level of their provision of confidentiality of stored
information. Availability estimates can be obtained
based on software agents that operate on each storage
element and keep track of the active location of
the corresponding storage element on the network.
Obtaining confidentiality estimates can be found using
fuzzy data analysis models [16], taking into account
many parameters of their storage on storage elements.
1. Сухорослов О. В. Новые технологии распределенного хранения и обработки больших массивов данных. М.: Институт системного анализа РАН, 2021. 40 с.
2. Радченко Г. И. Распределенные вычислительные системы. Челябинск: Фотохудожник, 2012. 184 с.
3. Докучаев В. А., Кальфа А. А., Маклачкова В. В. Архитектура центров обработки данных. М.: Горячая линия — Телеком, 2023. 240 с.
4. Цилькер Б. Я., Орлов С. А. Организация ЭВМ и систем: учебник для вузов. 2-е изд. СПб.: Питер, 2011. 668 с.
5. Мелехин В. Ф., Павловский Е. Г. Вычислительные машины, системы и сети. М.: Академия, 2007. 560 с.
6. Хорошевский В. Г. Архитектура вычислительных систем: учеб. пособие. 2-е изд. М.: МГТУ им. Н. Э. Баумана, 2008. 520 с.
7. Косяков М. С. Введение в распределенные вычисления. СПб.: НИУ ИТМО, 2014. 115 с.
8. Амелин Р. В. Информационная безопасность. М.: Юристъ, 2010. С. 121.
9. Данилов А. Д. Ценность информации. Технологии информационного общества. 2011. № 10. С. 137–140.
10. Степанов Е. А., Корнеев И. К. Информационная безопасность и защита информации: учебное пособие. М.: Инфра-М. 2010. 336 с.
11. Фатьянов А. А. Проблемы защиты конфиденциальной информации, не составляющей государственную тайну. Информационное общество. 2010. № 5. С. 49–56.
12. Информационная безопасность. М.: Оружие и технологии, 2011. С. 35–44.
13. Рахман П. А. Коэффициент готовности системы обработки данных с основным и резервным узлами. Международный журнал прикладных и фундаментальных исследований. 2015. № 9. С. 608–611.
14. Половко А. М., Гуров С. В. Основы теории надежности. 2-е изд., перераб. и доп. СПб.: БХВ-Петербург, 2006. 704 с.
15. Шишмарев В. Ю. Надежность технических систем: учебник для студ. высш. учеб. заведений. М.: Издательский центр «Академия», 2010. 304 с.
16. Броневич А. Г., Лепский А. Е. Нечеткие модели анализа данных и принятия решений: учебное пособие. Нац. исслед. ун-т «Высшая школа экономики». М.: Изд. дом Высшей школы экономики, 2022. 264 с.