
Eastwind Develops DataFlow Module

21 June 2018

Data scientists will be able to work with analytical models directly on the Hadoop cluster.

 

SUCH A PAIN

 
Data is the fuel of today’s business. Marketing, customer care, process optimization, market monitoring and much else all rely on information. How fast a data scientist works, and how accurate and timely their predictions are, determines how a business and its individual parts develop. As long as analysts can rely on their proven, familiar tools, no external factors affect their work. But once the data grows too big, it is moved to Hadoop. At this point, handling large data volumes often runs into a problem of interaction between the DevOps and analytics departments.
 

BAD NEWS:
FOR A MAJORITY OF COMPANIES, DATA SCIENCE VS DEVOPS IS A MESS

 
The problem arises when a single task – say, customer analysis – has to be tackled by multiple specialists. Each relies on their own stack of tools, with deliverables that are difficult to transfer between them. They code in different languages using different approaches. For instance, a data scientist at a telecom operator may build a brilliant analytical model working with a sample, on her local machine, in Python. But when she brings it to a DevOps engineer to apply it to the entire subscriber base, he may have to rewrite the whole model in, say, Java and run it on Hadoop. Most likely, it will not be the first time this has happened. Often the model cannot be rewritten exactly, for technical reasons. Add miscommunication between colleagues who literally speak different (programming) languages. All things considered, the task gets seriously delayed, the quality of the deliverables suffers, and everybody’s nerves are frayed.
 

GOOD NEWS: WE KNOW WHAT TO DO

 
EW DataFlow helps a data scientist handle data on Hadoop, bypassing DevOps. The module connects directly to the cluster and displays all the relevant data in an intuitive UI. In this way, EW DataFlow acts as a Hadoop adapter. The comfortable and friendly environment allows a data scientist to manipulate big data using familiar tools, quickly and without middlemen. All that is left for developers to do is roll out the system. The module has cluster computing tools under the hood, but the data scientist writes all of the code in Python in the UI.
 

WHAT DATA SCIENTIST TASKS ARE ENABLED BY EW DATAFLOW

 
  • Connecting new data sources,
  • Any type of data processing (sampling, analysis, model building, monitoring, etc.),
  • Launching models into operation and fine-tuning,
  • Instant trouble alerting and trouble-shooting,
  • Managing all the computations in the cluster,
  • Exporting the deliverables into files. 
 

POSSIBLE APPLICATIONS

           
Exported data files can be fed to any analytical system. We offer the EW DataFlow module in two ways: as a standalone product, for those who already work with data manually or in an automated way, or together with the EW Social Analytics platform, for those who need a comprehensive analytics solution.
 
“Previously, whenever we had data problems on Hadoop, we had to have two people work on the task: a data scientist and a developer,” says Pavel Olifer, head of social analytics at Eastwind. “The company was wasting time and money. We created EW DataFlow to prevent that from happening. With the module, the work of the data scientist on the Hadoop cluster is transparent. You write the code, you start it, and you monitor it all by yourself. You can correct it if need be. After all, business analytics must be quick and relevant. That’s the only way it can be effective and profitable.”
 

WHO DOESN’T NEED EW DATAFLOW

 
  • Those who do not work with data on Hadoop.
  • Those who foster universal workers (teaching analysts development, and teaching developers analytics) and are prepared for competitors to constantly lure them away.
  • Those who have a custom solution to the problem and are prepared to keep it up to date, because Hadoop is constantly changing.
  • Those who do not pursue efficiency and speed of analytics. 

If that is not about you, find out more and order a product presentation at Eastwind’s website.
 