A correctly designed system architecture avoids many problems, in particular with horizontal and vertical scaling, the addition of new functionality, and the minimization of operating errors.
One of the important stages in designing an information-analytical tool is defining its set of components. To implement a high-quality solution for building an aeromonitoring databank, the following minimum set of components and nodes should be used:
- Web crawlers. These components are responsible for collecting data from third-party resources that do not provide an API for interaction.
- Data and user-query processing server. As its name suggests, this server processes the data received from the web crawlers and places it in the database. It also processes user requests against the stored data, bringing the results to the form required for display on the client side; this saves network traffic by not transferring redundant data and reduces the load on the client's computing resources.
- Message-queue server. This node centralizes the collection, transmission, and processing of large numbers of messages in continuous information flows, and also buffers this data in intermediate storage, removing concerns about data loss and system throughput.
- Database server(s). The heart of the system, providing data storage and responsible for data consistency and availability.
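The decoupling provided by the message-queue node can be sketched with an in-process queue standing in for a broker such as Kafka. This is a simplification under assumed names: a real deployment would use a persistent broker, and the record fields here are hypothetical.

```python
import queue
import threading

# Minimal sketch: a crawler thread publishes raw flight records to a queue,
# and the processing server consumes them independently, so neither side
# blocks the other. In the real system the queue is a persistent broker
# (e.g., Apache Kafka), which also survives restarts.
broker = queue.Queue()
processed = []

def crawler():
    # Hypothetical raw records collected from a third-party resource
    for record in ({"flight": "PS101"}, {"flight": "LH1493"}):
        broker.put(record)          # publish raw data
    broker.put(None)                # end-of-stream marker

def processor():
    while (record := broker.get()) is not None:
        processed.append({**record, "status": "stored"})  # enrich and store

t1 = threading.Thread(target=crawler)
t2 = threading.Thread(target=processor)
t1.start(); t2.start()
t1.join(); t2.join()
```

The queue absorbs bursts from the crawlers, so the processing server can consume at its own pace, which is exactly the property the broker node provides at system scale.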
Among the features of the above architecture (Fig. 1), the following can be distinguished:
- no additional software is needed on the client side, which automatically makes the client side multi-platform;
- an almost unlimited number of clients can be connected;
- with a single storage location and a single database management system, the minimum requirements for maintaining data integrity are met; and
- regarding the amount of data, the architecture of web systems imposes no significant restrictions.
To fill the databank with the help of web crawlers, it is first necessary to identify the main entities needed to build the model. These entities make it possible to track daily flights on various routes, calculate the amount of emissions for each flight based on the aircraft type, and view statistics in the context of airlines.
In addition to the data needed for accounting for flights and calculating emissions, it is appropriate to maintain geostructural data related to pollution. This data is required for display to the user on the website.
In this case, flight air corridors rather than static geo-squares are used as the basis for displaying data on the future visualization map. Since flights are strictly confined to a specific flight corridor, it can be assumed that all aircraft on the same route have the same flight path. In practice, individual paths may differ, but these differences are not significant: they are leveled out by the scale of emission dispersion and remain within the flight corridor. Therefore, the following entities can be distinguished:
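The corridor assumption above can be sketched as a membership test: a flight's track belongs to a corridor if every track point stays within some tolerance of the reference path. The tolerance value and the distance approximation below are illustrative assumptions, not part of the original system.

```python
import math

def cross_track_ok(track, reference, tolerance_km=10.0):
    """Crude corridor-membership check: every (lat, lon) point of `track`
    must lie within `tolerance_km` of the nearest `reference` point."""
    def dist_km(p, q):
        # Equirectangular approximation; adequate at corridor scale
        lat1, lon1 = map(math.radians, p)
        lat2, lon2 = map(math.radians, q)
        x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
        y = lat2 - lat1
        return 6371.0 * math.hypot(x, y)
    return all(min(dist_km(p, q) for q in reference) <= tolerance_km
               for p in track)
```

A flight passing this check can be attributed to the corridor's reference path, so its emissions are aggregated under that path rather than under a separate trajectory.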
Taking into account the entity structure presented above and the need to provide information to the client quickly, several DBMSs should be used in this case.
On the one hand, the document-oriented DBMS MongoDB should be used to store the operational data obtained in real time by the web crawlers [10–11], since it uses JSON-like documents in its storage schema. It is also convenient for displaying data during web development, in particular within a JavaScript-oriented stack. Thus, the geodata entities to be stored in this DBMS must first be reformatted into JSON documents that describe the reference flight trajectory, contain information about all flights along this path, and record the pollution amount for each flight.
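A minimal sketch of such a document might look as follows; all field names, the route identifier, and the values are hypothetical illustrations of the structure described above, not the authors' actual schema.

```python
import json

# Hypothetical MongoDB document: one document per reference corridor,
# embedding the reference trajectory and the flights (with per-flight
# emissions) that follow it.
corridor_doc = {
    "route": "UKBB-EDDF",                 # hypothetical route identifier
    "reference_path": [                   # reference trajectory, [lat, lon]
        [50.345, 30.894],
        [50.110, 14.260],
        [50.033, 8.570],
    ],
    "flights": [
        {
            "flight_no": "LH1493",        # hypothetical flight
            "date": "2023-05-01",
            "aircraft_type": "A320",
            "co2_per_pax_kg": 123.4,      # computed emissions per passenger
        }
    ],
}

# MongoDB stores BSON, so a document that round-trips cleanly through JSON
# can be inserted directly with pymongo's collection.insert_one().
assert json.loads(json.dumps(corridor_doc)) == corridor_doc
```

Keeping the trajectory and its flights in one document means the client can render a corridor with a single query, which matches the traffic-saving goal stated earlier.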
On the other hand, the relational DBMS PostgreSQL should be used as a long-term data archive. It also has its advantages, as it supports the appropriate formats and allows spatial data to be stored.
As data preprocessing, in particular for calculating the carbon footprint of an aircraft flight, the prototype implements a formula regulated by best practices used in the aviation industry, following the existing methodologies of the International Civil Aviation Organization (ICAO) and the International Air Transport Association (IATA):

E = ((a·x² + b·x + c) / (S · PLF)) · (1 − CF) · CW · (EF · M + P) + AF · x + A,

where:
- E – the amount of CO2 emissions per passenger, measured in kilograms;
- x – the flight distance, defined as the sum of the great-circle distances (GCD) of the aircraft's flight segments x1, x2, …, xn, measured in kilometers;
- S – the average number of seats across all cabin classes;
- PLF – passenger load factor, the percentage of occupied seats relative to the total number of available seats;
- CF – cargo load factor, the percentage of the aircraft's commercial payload not attributable to passengers;
- CW – cabin-class weight coefficient, measured as a percentage; it represents the weight ratio of seats of different classes;
- EF – CO2 emission factor for aviation fuel combustion, measured in kilograms of CO2 per kilogram of fuel burned;
- M – a multiplier that accounts for potential climate effects not related to CO2;
- P – the amount of CO2 emissions attributable to starting the aircraft, measured in kilograms;
- AF – aircraft weight, measured in kilograms;
- A – CO2 emissions from airport infrastructure, measured in kilograms.
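The distance term x, the sum of great-circle distances over the flight segments, can be sketched with the haversine formula; the Earth-radius constant and function names are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def gcd_km(p, q):
    """Great-circle distance between two (lat, lon) points given in degrees,
    computed with the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def flight_distance_km(waypoints):
    """x = GCD(x1) + GCD(x2) + ... + GCD(xn): sum the great-circle
    distances over consecutive flight segments."""
    return sum(gcd_km(a, b) for a, b in zip(waypoints, waypoints[1:]))
```

For example, one degree of longitude along the equator is roughly 111.2 km, and a two-segment path accumulates each segment's distance.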
The ax² + bx + c part is a non-linear approximation of the function f(x) + LTO, where LTO is the fuel consumption during landing and take-off, including taxiing to the runway, measured in kilograms. A flight is defined as short-haul when x < 1500 km and long-haul when x > 2500 km; for distances between these values, linear interpolation is used.
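The calculation above can be sketched as follows, assuming the formula E = ((a·x² + b·x + c) / (S·PLF)) · (1 − CF) · CW · (EF·M + P) + AF·x + A with linear interpolation of the (a, b, c) fuel-curve coefficients between the short- and long-haul parameter sets. All coefficient values used with this sketch are hypothetical placeholders, not the published ICAO/IATA constants.

```python
def interpolate_coeffs(x, short, long_, short_max=1500.0, long_min=2500.0):
    """Linearly interpolate the (a, b, c) fuel-curve coefficients between
    the short-haul (x < 1500 km) and long-haul (x > 2500 km) sets."""
    if x <= short_max:
        return short
    if x >= long_min:
        return long_
    t = (x - short_max) / (long_min - short_max)
    return tuple(s + t * (l - s) for s, l in zip(short, long_))

def co2_per_passenger_kg(x, S, PLF, CF, CW, EF, M, P, AF, A, short, long_):
    """E = ((a*x^2 + b*x + c) / (S*PLF)) * (1-CF) * CW * (EF*M + P) + AF*x + A."""
    a, b, c = interpolate_coeffs(x, short, long_)
    fuel = a * x * x + b * x + c          # approximates f(x) + LTO, kg of fuel
    return (fuel / (S * PLF)) * (1 - CF) * CW * (EF * M + P) + AF * x + A
```

Splitting the interpolation into its own function keeps the emission formula readable and lets the coefficient sets be swapped per aircraft type.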
For data visualization, a website with the architecture shown in Fig. 2 was developed.
It is a custom-developed site built with HTML5 and CSS, together with additional JavaScript modules. To process user queries and display the results on an interactive map, a REST API was written in Flask; it links the queries to the MongoDB and PostgreSQL databases and to the visual part implemented with the Leaflet library. The Apache Kafka message broker is used to move data between the REST API and the databases in both directions [12]. This tool provides smooth and convenient message exchange between microservices, which in turn ensures stable real-time data streams and reliable delivery to the requesting party [13].
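One endpoint of such a Flask REST API might be sketched as below; the route path, parameter name, and the in-memory dictionary standing in for the MongoDB collection are all assumptions for illustration, not the authors' actual API.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# In-memory stand-in for the MongoDB corridor collection; a real handler
# would query the database (e.g., via pymongo) instead.
CORRIDORS = {
    "UKBB-EDDF": {"route": "UKBB-EDDF", "flights": []},
}

@app.route("/api/corridors/<route_id>")
def get_corridor(route_id):
    """Return one corridor document as JSON for the Leaflet front end."""
    doc = CORRIDORS.get(route_id)
    if doc is None:
        return jsonify({"error": "unknown route"}), 404
    return jsonify(doc)
```

The front end can then fetch a corridor by route identifier and draw its reference path on the Leaflet map.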