4.1 Analysis of database optimization requirements
Database optimization requires state information from each node of the database, such as CPU utilization, the load of the database system, and internal storage (memory) usage. This information is transmitted in real time to the performance module of the database optimization. The most important part of database optimization is receiving this dynamic data, which involves several aspects: the form of real-time data transmission; the destination address and carrier of the dynamic data; and the form of the curve used to display changes in the real-time data. The database optimization process is shown in Fig. 2.
As can be seen from Fig. 2, the database optimization process mainly covers the overall usage of the data set, the space usage of each terminal, the load of each server node, and storage and CPU usage. The load, storage and CPU usage are real-time dynamic data, and the overall usage of the data set can also be transmitted in real time.
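The real-time state data described above can be pictured as a small message sent from each node to the performance module. The following is a minimal sketch only: the `NodeStatus` class, its field names, and the choice of JSON as the carrier are illustrative assumptions, not part of the original system.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class NodeStatus:
    """One real-time state sample for a database node (hypothetical field names)."""
    node_id: str
    cpu_utilization: float   # fraction in [0, 1]
    load: float              # load of the database system on this node
    storage_used_mb: float   # storage in use, in MB
    timestamp: float         # seconds since the epoch


def encode_status(status: NodeStatus) -> str:
    """Serialize a sample to JSON (assumed carrier) for real-time transmission."""
    return json.dumps(asdict(status))


def decode_status(payload: str) -> NodeStatus:
    """Rebuild the sample on the receiving side."""
    return NodeStatus(**json.loads(payload))


sample = NodeStatus("node-1", 0.42, 1.7, 20480.0, time.time())
restored = decode_status(encode_status(sample))
```

A receiver could append each decoded sample to a per-node time series to draw the change curves mentioned above.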
4.2 Task processing model
To handle the interaction between the server terminal and the edge node, a server terminal coordination mechanism is proposed. When the server terminal considers the information from the edge node unreliable, it decides which tasks to complete locally using only its own task load, computing capacity and storage status, and the remaining tasks are handed to the edge nodes for processing. The advantages of this approach are that the state kept by the server terminal and the edge node is simple, the computational complexity is low, the decision can be made with limited information, and the quality of the decision result can be guaranteed.
The task scheduling model between the server terminal and the edge node is formulated as follows:
Objective:
$$\max \sum_{i\in I}\sum_{j\in J_i} a_{ij} \cdot \mathrm{success}_{ij} \tag{7}$$
Subject to:
$$\mathrm{success}_{ij}=\begin{cases}1, & t_{ij}^{\mathrm{finish}}\le d_{ij}\\ 0, & t_{ij}^{\mathrm{finish}}>d_{ij}\end{cases} \tag{8}$$
$$a_{ij}=\begin{cases}1, & o_{ij}=1\\ 0, & \text{otherwise}\end{cases} \tag{9}$$
$$t_{ij}^{\mathrm{finish}}=t_{\mathrm{begin}}+\mathrm{wait}_{i}+\sum_{q_{ik}^{i}<q_{jk}^{i}}\frac{c_{ik}}{f_{i}}+\frac{c_{ij}}{f_{i}}, \quad a_{ij}=1 \tag{10}$$
$$X_{i,n_{i}}=1, \quad \forall i\in I,\ \forall j\in J_{i} \tag{11}$$
From the above formulas, it can be seen that the task scheduling of each server terminal is independent: in the scheduling between the server terminals and the edge nodes, the terminals do not affect each other, so the scheduling problem can be solved on each server terminal separately. Building on results from traditional task scheduling, a heuristic method is proposed that can effectively schedule the tasks of the server terminal and the edge nodes. This method is based on an optimized form of dynamic programming.
The first step in solving the problem with dynamic programming is to determine the state transition equation:
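Because each server terminal is handled independently, the objective (7) can be evaluated one terminal at a time. The sketch below does this for a single terminal under stated assumptions: the `Task` class and the function name are hypothetical, tasks are supplied in queue order, and only tasks executed locally (a_ij = 1) are assumed to contribute to the queueing sum in (10).

```python
from dataclasses import dataclass


@dataclass
class Task:
    cycles: float     # c_ij: computation required by the task
    deadline: float   # d_ij: deadline of the task
    local: bool       # True when a_ij = 1 (task runs on the server terminal)


def count_local_successes(tasks, f_i, t_begin=0.0, wait_i=0.0):
    """Evaluate sum_j a_ij * success_ij for one terminal, following (7)-(10).

    `tasks` is in queue order. Tasks with local=False are skipped and are
    assumed to add no processing time on this terminal.
    """
    elapsed = t_begin + wait_i
    successes = 0
    for task in tasks:
        if not task.local:
            continue
        # Add c_ij / f_i on top of the c_ik / f_i of all earlier local tasks.
        elapsed += task.cycles / f_i
        if elapsed <= task.deadline:   # success_ij = 1 per (8)
            successes += 1
    return successes


queue = [Task(4.0, 3.0, True), Task(2.0, 2.0, False), Task(6.0, 4.0, True)]
result = count_local_successes(queue, f_i=2.0)
```

With f_i = 2.0 the first task finishes at time 2 and meets its deadline, while the third finishes at time 5 and misses its deadline of 4, so one task succeeds.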
$$\mathrm{successDP}[j+1]=\max\left\{\mathrm{successDP}[j],\ \mathrm{successDP}[k]+1\right\} \tag{12}$$
The index k is selected so that successDP[k] is as large as possible among the earlier states k' after which task j+1 can still be completed before its deadline, that is, formula (13) is satisfied:
$$k\leftarrow \arg\max_{k'}\left\{\mathrm{successDP}[k']\right\}, \quad k'<j+1\ \wedge\ \mathrm{totalTime}[k']+e_{j+1}\le d_{j+1} \tag{13}$$
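The recurrence (12) with the selection rule (13) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the source does not state how totalTime is updated or how tasks are ordered, so the sketch assumes tasks are sorted by deadline, that totalTime[j+1] is totalTime[k] + e_{j+1} when task j+1 is accepted and totalTime[j] otherwise, and that ties are broken in favor of the smaller total time.

```python
def max_on_time_tasks(tasks):
    """Heuristic count of tasks finished by their deadlines.

    `tasks` is a list of (exec_time, deadline) pairs. State 0 is the empty
    schedule; state j+1 is reached after considering task j+1.
    """
    ordered = sorted(tasks, key=lambda t: t[1])   # assumed: earliest deadline first
    success_dp = [0]      # successDP[0]
    total_time = [0.0]    # totalTime[0]
    for exec_time, deadline in ordered:
        j = len(success_dp) - 1
        # Rule (13): earlier states k' from which the new task still meets its deadline.
        feasible = [k for k in range(j + 1)
                    if total_time[k] + exec_time <= deadline]
        skip = (success_dp[j], total_time[j])
        best = skip
        if feasible:
            k = max(feasible, key=lambda s: (success_dp[s], -total_time[s]))
            take = (success_dp[k] + 1, total_time[k] + exec_time)
            # Recurrence (12); on equal counts keep the shorter schedule (assumption).
            if take[0] > skip[0] or (take[0] == skip[0] and take[1] < skip[1]):
                best = take
        success_dp.append(best[0])
        total_time.append(best[1])
    return success_dp[-1]


completed = max_on_time_tasks([(2, 2), (1, 3), (3, 4), (2, 6)])
```

In this example the third task cannot be added without missing its deadline, so the schedule keeps the first, second and fourth tasks.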
The task scheduling model for the edge nodes is given in (14) to (20):
Objective:
$$\max \sum_{i\in I}\sum_{j\in J_{i}^{\mathrm{off}}} \mathrm{success}_{ij} \tag{14}$$
Subject to:
$$\mathrm{success}_{ij}=\begin{cases}1, & t_{ij}^{\mathrm{finish}}\le d_{ij}\\ 0, & t_{ij}^{\mathrm{finish}}>d_{ij}\end{cases} \tag{15}$$
$$t_{ij}^{\mathrm{finish}}=\begin{cases}t_{\mathrm{design}}+\mathrm{trans}_{i}^{o_{ij}}+\mathrm{wait}_{o_{ij}}+\sum\limits_{q_{ik}^{o_{ij}}<q_{ij}^{o_{ij}}}\dfrac{c_{ik}}{f_{o_{ik}}}+\dfrac{c_{ij}}{f_{o_{ij}}}, & o_{ij}\ne \mathrm{cloud}\\ \mathrm{trans}_{i}^{o_{ij}}, & \text{otherwise}\end{cases} \tag{16}$$
$$\mathrm{trans}_{i}^{o_{ij}}=\begin{cases}\dfrac{\mathrm{input}_{i}}{B_{i}^{\mathrm{edge}_{i}}}, & o_{ij}=\mathrm{edge}_{i}\\ \dfrac{\mathrm{input}_{i}}{B_{i}^{\mathrm{edge}_{i}}}+\dfrac{\mathrm{input}_{i}}{B_{\mathrm{edge}_{i}}^{z}}, & o_{ij}=z,\ z\in E_{\mathrm{edge}_{i}}\\ \mathrm{trans}_{i}^{\mathrm{cloud}}, & \text{otherwise}\end{cases} \tag{17}$$
$$o_{ij}\in E_{\mathrm{edge}_{i}}\cup \left\{\mathrm{edge}_{i},\mathrm{cloud}\right\} \tag{18}$$
$$\mathrm{wait}_{o_{ij}}+\sum_{q_{ik}^{o_{ij}}<q_{ij}^{o_{ij}}}\frac{c_{ik}}{f_{o_{ik}}}\ge \mathrm{trans}_{i}^{o_{ij}} \tag{19}$$
$$X_{o_{ij},m_{ij}}=1, \quad \forall o_{ij}\in E_{\mathrm{edge}_{i}}\cup \left\{\mathrm{edge}_{i}\right\},\ \forall j\in J_{i}^{\mathrm{off}} \tag{20}$$
The formulas show that a task is considered successful only when its completion time does not exceed its deadline. A task that is not processed at an edge node is judged complete if its data transmission time and service time are within the deadline. If the edge node connected to the task's server terminal cannot meet the deadline, the task is migrated to the database system (the cloud) for optimization. The transfer time of a migrated task is the time at which the task reaches the database system, so the arrival time trans_i^cloud is taken as the migration cost.
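The transmission time (17) and finish time (16) can be computed directly once the bandwidths are known. The sketch below is illustrative only: the function and parameter names are hypothetical, neighboring edge nodes are represented by a dictionary from node name to bandwidth, and all tasks queued on a node are assumed to run at that node's frequency.

```python
CLOUD = "cloud"


def transmission_time(input_size, target, own_edge, bw_terminal_edge,
                      bw_edge_neighbor, cloud_trans):
    """Transmission time trans_i^{o_ij} following (17).

    bw_edge_neighbor maps each neighboring edge node z to the bandwidth
    from the terminal's own edge node to z (assumed representation).
    """
    if target == own_edge:
        return input_size / bw_terminal_edge
    if target in bw_edge_neighbor:
        # One hop to the own edge node, then one hop to the neighbor z.
        return (input_size / bw_terminal_edge
                + input_size / bw_edge_neighbor[target])
    return cloud_trans   # any other target is treated as the cloud


def finish_time(trans, target, t_design, wait, queued_cycles, cycles, freq):
    """Finish time t_ij^finish following (16)."""
    if target == CLOUD:
        return trans
    return t_design + trans + wait + sum(queued_cycles) / freq + cycles / freq


neighbors = {"edge-2": 10.0}
to_own = transmission_time(10.0, "edge-1", "edge-1", 5.0, neighbors, 8.0)
to_neighbor = transmission_time(10.0, "edge-2", "edge-1", 5.0, neighbors, 8.0)
to_cloud = transmission_time(10.0, CLOUD, "edge-1", 5.0, neighbors, 8.0)
done_at = finish_time(to_own, "edge-1", 0.0, 1.0, [4.0, 2.0], 6.0, 2.0)
```

Comparing `done_at` with the task deadline then gives success_ij as in (15).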
4.3 Optimization analysis
First, the size and distribution of the table data are known. The data are therefore sampled and the size of the database system is calculated to obtain the growth multiple of the data. To ensure the accuracy and reasonableness of the result, the sampling is repeated many times and the average is taken. When the amount of data is very large, the sampling error can be ignored, and the actual storage size M of each node is:
$$M=\min\left(M',\ \omega \cdot \sum_{i=1}^{p} f_{i}\right) \tag{21}$$
where M' represents the size of the data stored in the node database, p represents the number of files counted as the total approaches M', and f_i represents the actual size of file i in MB. Because the total data size F is larger than the actual available storage size M of the node, the optimization is performed in batches. The total number of batches t is given by formula (22):
$$t=\left\lceil \frac{F}{M}\right\rceil =\left\lceil \frac{\sum_{i=1}^{j} f_{i}}{\min\left(\left(x\cdot 2^{10}-300\right)\cdot \epsilon,\ \omega \cdot \sum_{i=1}^{p} f_{i}\right)}\right\rceil \tag{22}$$
The formula shows that, over the t rounds of database system optimization, the maximum amount of data imported in each round is M, which is equal to the storage capacity of the node. The data are read on K nodes, and each node runs y threads, so all data can be transmitted within t rounds. In addition, to keep the data volume F manageable on each node, load balancing and similar operations should be performed on the database.
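Formulas (21) and (22) amount to a minimum followed by a ceiling division. The following sketch works through them with made-up numbers; the function names and all values are hypothetical, and M' is passed in directly rather than derived from the expression in (22).

```python
import math


def node_capacity(m_prime, omega, sampled_sizes):
    """Actual storage size M = min(M', omega * sum(f_i)) per (21), in MB."""
    return min(m_prime, omega * sum(sampled_sizes))


def batch_count(file_sizes, capacity):
    """Number of batches t = ceil(F / M) per (22), with F the total size in MB."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return math.ceil(sum(file_sizes) / capacity)


# Illustrative values only (not taken from the experiments).
capacity = node_capacity(m_prime=1000.0, omega=0.5,
                         sampled_sizes=[400.0, 300.0, 500.0])
batches = batch_count([400.0, 300.0, 500.0, 700.0, 250.0], capacity)
```

Here M = min(1000, 0.5 × 1200) = 600 MB and F = 2150 MB, so t = ⌈2150 / 600⌉ = 4 batches.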
Queries are run on the generated database optimization test data set. Three configurations are compared: the traditional database system, indiscriminate operator pushdown, and selective operator pushdown. The results for the same data scale and the same query operations are shown in Table 2.
Table 2
Near-data processing modes with operator pushdown
Model | Traditional database | Storage engine pushdown only | Storage engine pushdown + streaming programming model |
Operator pushdown | No operators pushed down | Some operators pushed down | Operators selectively pushed down |
Query time (s) | 5020.57 | 621.11 | 334.70 |
System power consumption (kJ) | 215.24 | 32.85 | 16.98 |
Table 2 leads to three conclusions. In terms of query time, both pushdown modes are faster than the traditional database (5020.57 s): storage engine pushdown alone takes 621.11 s, and adding the streaming programming model takes 334.70 s, because pushdown reduces operator migration calls and the amount of data transmitted. In terms of power consumption, storage engine pushdown lowers system consumption from 215.24 kJ to 32.85 kJ, and combining the two processing modes lowers it further to 16.98 kJ. In terms of stability, both modes improve the query efficiency of the system by reducing data transmission. For the traditional database, the server should buffer data at run time to make data transmission more stable.
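The relative improvements implied by Table 2 can be checked with a few lines of arithmetic. The sketch below only divides the baseline values by the values of each pushdown mode; the dictionary keys are hypothetical labels for the table columns.

```python
TABLE_2 = {
    "traditional": {"time_s": 5020.57, "energy_kj": 215.24},
    "storage_engine_pushdown": {"time_s": 621.11, "energy_kj": 32.85},
    "pushdown_plus_streaming": {"time_s": 334.70, "energy_kj": 16.98},
}


def improvement_factors(table, baseline="traditional"):
    """Return baseline / value for every metric of every non-baseline mode."""
    base = table[baseline]
    return {
        name: {metric: base[metric] / value for metric, value in row.items()}
        for name, row in table.items()
        if name != baseline
    }


factors = improvement_factors(TABLE_2)
for name, f in factors.items():
    print(f"{name}: {f['time_s']:.1f}x faster, {f['energy_kj']:.1f}x less energy")
```

Storage engine pushdown alone is roughly 8.1 times faster and uses about 6.6 times less energy than the traditional database; with the streaming programming model the factors are roughly 15.0 and 12.7.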
4.4 Test results
The performance indexes measured during database optimization are analyzed and compared according to the optimization process, and the comprehensive index values under each algorithm are obtained. The experimental results are shown in Fig. 3.
From the database optimization index in Fig. 3, it can be seen that the default database optimization algorithm is strongly affected by the distribution of data. When the data are heavily skewed and unevenly distributed, the overloaded databases show a high index, while the other databases carry little load and show a low index.
The batch database optimization algorithm is not affected by the distribution of data. The optimization indexes of the databases are relatively balanced, so their performance is relatively consistent. The comprehensive index analysis is shown in Fig. 4.
As can be seen in Fig. 4, the index under default database optimization is generally high, and the index under batch database optimization is generally low. Among the databases, database 3 shows the most significant optimization effect.