Big Data and Hadoop in a new way: trends in Big Data and BI development
27 March 2023
On March 22, the Big Data & BI Day 2023 conference was held in Moscow, dedicated to modern approaches to Big Data analytics and their use in business. About a hundred guests, including Eastwind representatives, came to listen to the speakers and learn about the development of Big Data & BI.
Even though the event was held under the auspices of Business Intelligence, the Big Data section was filled with quite interesting presentations. Based on them, we’ve found several major trends in the technology.
Developing Big Data in-house
One of the trends for large companies in Big Data analytics is the desire to develop their own teams. Businesses clearly want to keep all stages of development and implementation under control, and at the same time keep all the data on their side.
For example, NLMK has built all of its processes in-house, starting with Big Data analytics and building machine learning models, and ending with deployment to production and further support. Growing its own competencies and working as closely as possible with open-source solutions is the company's choice, not least dictated by the desire to save budget.
The UBRD (Ural Bank for Reconstruction and Development) case was quite interesting as well. The bank built a cross-selling service on top of a Big Data infrastructure in the cloud. In effect, its specialists abandoned the classic approach of buying their own hardware and instead moved the calculations to cloud platforms. The format is as follows: you quickly deploy a Hadoop cluster in the cloud, run the calculations on it, and immediately shut it down; a minimal sketch of this pattern follows the list below.
You get a switch effect: when you need it, you turn it on; when you don't, you turn it off. In theory this leads to cost savings, but it is not quite that simple:
- According to UBRD's own calculations, it is profitable to invest in the cloud on a horizon of up to five years. After that, the cost of a cloud subscription becomes equal to the cost of in-house hardware.
- As for rapidly deploying a Hadoop cluster in the cloud, this only works if the amount of data is measured in gigabytes. Terabytes, let alone petabytes, will not be processed quickly in the cloud.
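To make the "switch" pattern concrete, here is a minimal Python sketch. The CloudCluster class and its methods are hypothetical stand-ins for whatever provisioning API a particular cloud provider actually exposes; the point is only the deploy, compute, tear down lifecycle.

```python
# A minimal sketch of the "deploy -> compute -> shut down" pattern.
# CloudCluster and its methods are hypothetical stand-ins, not a real provider API.

from contextlib import contextmanager


class CloudCluster:
    """Hypothetical wrapper around a provider's managed Hadoop/Spark API."""

    def __init__(self, name: str, workers: int):
        self.name = name
        self.workers = workers

    def start(self) -> None:
        print(f"provisioning cluster '{self.name}' with {self.workers} workers ...")

    def submit(self, job: str) -> None:
        print(f"running job: {job}")

    def terminate(self) -> None:
        print(f"tearing down cluster '{self.name}' -- billing stops here")


@contextmanager
def ephemeral_cluster(name: str, workers: int):
    """Make sure the cluster is shut down even if the job fails."""
    cluster = CloudCluster(name, workers)
    cluster.start()
    try:
        yield cluster
    finally:
        cluster.terminate()


if __name__ == "__main__":
    # The switch effect: turn it on, do the work, turn it off.
    with ephemeral_cluster("cross-sell-scoring", workers=8) as c:
        c.submit("spark-submit score_customers.py")
```

The context manager's finally block is what makes the economics work: the cluster is torn down, and billing stops, even if a job crashes halfway through.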
Modifying Hadoop
Another trend is the desire to separate data storage and data processing in Big Data systems. While classic Hadoop has multiple nodes that simultaneously store data and do the computation, there are now attempts to do things differently.
The new paradigm means that there are two separate subsystems: storage and compute. On the one hand, this requires more resources, hardware in particular. On the other hand, the system becomes more stable.
It looks like this: storage servers have many disks, few cores, and modest RAM. The data on these servers is divided into three categories according to how often it is accessed: cold, warm, and hot. When calculations need to be performed, data from the storage subsystem is loaded into RAM, not into the storage nodes' own memory but into the memory of the computing subsystem. The compute nodes have far more RAM and cores, so the calculations run at high speed.
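The cold/warm/hot split can be thought of as a simple rule over access recency. The thresholds in the sketch below are made-up examples, not values reported at the conference; real systems derive them from actual access statistics.

```python
# A toy illustration of splitting stored data into hot/warm/cold tiers
# by how recently it was accessed. The thresholds are illustrative only.

from datetime import datetime, timedelta


def classify_tier(last_access: datetime, now: datetime) -> str:
    """Return 'hot', 'warm' or 'cold' for a dataset based on its last access."""
    age = now - last_access
    if age <= timedelta(days=1):
        return "hot"    # frequently used, pulled into compute-node RAM on demand
    if age <= timedelta(days=30):
        return "warm"   # kept on ordinary disks in the storage subsystem
    return "cold"       # rarely touched, cheapest storage


if __name__ == "__main__":
    now = datetime.utcnow()
    datasets = {
        "clickstream_today": now - timedelta(hours=2),
        "sales_last_month": now - timedelta(days=12),
        "archive_2019": now - timedelta(days=900),
    }
    for name, last_access in datasets.items():
        print(name, "->", classify_tier(last_access, now))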
What this approach gives:
- High fault tolerance for the disk subsystem (see the sketch after this list).
- Maximum performance for the computing subsystem.
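A toy sketch of why the split helps fault tolerance: blocks are replicated across several storage nodes, so losing one disk or node does not lose data. The placement policy below is a made-up round-robin, not HDFS's real rack-aware algorithm; the replication factor of 3 simply mirrors the common HDFS default.

```python
# Toy replica placement across storage nodes only; compute nodes hold no data,
# so they can fail or be replaced without affecting durability.

import itertools


def place_replicas(block_id: str, storage_nodes: list[str], replication: int = 3) -> list[str]:
    """Pick `replication` distinct storage nodes for one block (round-robin toy policy)."""
    start = sum(block_id.encode()) % len(storage_nodes)
    ring = itertools.islice(itertools.cycle(storage_nodes), start, start + replication)
    return list(ring)


if __name__ == "__main__":
    storage = ["storage-1", "storage-2", "storage-3", "storage-4"]
    for block in ("block-a", "block-b", "block-c"):
        print(block, "->", place_replicas(block, storage))
```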
Conclusions
It is still an open question how well these emerging trends will be accepted in the rapidly changing world of Big Data. So far, we can see a clear desire to develop solutions in-house.
We will keep our finger on the pulse of AI and Big Data development so that high tech remains part of the information field.