Why is data science important?

In an interview with COMPUTERWOCHE, manager Holger Clever and his data scientist Achim Strunk argue that data scientists perform best when they have experienced and worked on their employer's core processes themselves.

What are the skills of a data scientist?

Holger Clever: "At first you automatically think of programming knowledge, well-founded know-how in statistics and mathematics, and technical expertise. In practice, however, another element is important to us: the communicative component. It matters because a data scientist usually develops models that, if they prove suitable, can advance, change, or even replace entire business processes.

But it is unrealistic to believe that this will happen in isolation, behind closed doors. It only works if the data scientist is in constant contact with his colleagues in the business departments and can make clear to them what insights he has gained and why they affect business processes. In our experience, this works best when the analysts have experienced and worked on the core process themselves, which in our case is energy trading.

The question of what can be done better, more efficiently, or more automatically can only be answered by someone who has been behind the wheel and is aware of the possible shortcomings. A withdrawn, uncommunicative analyst will find it difficult to achieve progress and recognition. The ability to abstract results to a level that enables decisions is, in our view, one of the most important skills of the data scientist, and one that distinguishes him from the data engineer."

What is the everyday life of a data scientist like?

Achim Strunk: "The everyday life of a data scientist may look dry at first glance, as much of the search for suitable methods to answer a specific question consists of extensive data preparation, writing program code, and evaluating the results. Seen that way, the work does not differ much from that of a software developer. Ideally, however, the data scientist has a comprehensive view of the company's strategic and operational processes and has insight into product development, or even significant influence on it. Participation in project management is therefore also part of his everyday work."

What know-how does he need in his day-to-day work?

Strunk: "The data scientist should develop a comprehensive understanding of the subject matter ('domain know-how'). To cast the resulting product and solution ideas into appropriate methods and models, he also needs skills from software development, or at least the ability to operate suitable tools, as well as the ability to judge statistical methods and to evaluate and assess their results.

Given the multitude of options for combining data with machine learning methods, a good overview of the existing tool landscape and a command of programming languages are prerequisites. Continuous training in this area and the freedom to simply try things out hold, in our opinion, more opportunities than risks for the company. Of course, this also requires companies to tolerate missteps and to treat them as a learning process rather than as failures."

What skills does the data scientist need for his position in the company? Which ones does he learn on the job?

Strunk: "Robust training in statistics is certainly the most important basis for making successful use of data. Compared to the situation ten years ago, a data scientist can almost settle into a ready-made nest when it comes to code development, as there is now a wide range of established tools. Of course, a talent for programming is important. However, the programming languages and working environments used differ greatly from company to company, so this is certainly an area in which a company has to invest in its data scientist.

The same applies to subject-specific know-how. It becomes all the more important the more intuition is needed to find the appropriate solution during model development. On closer inspection, this is also one of the core tasks of a data scientist: finding the Pareto optimum of the solution. Given the diversity of machine learning methods, from linear regression to deep learning, this is a great challenge. Complex, non-linear heavyweights such as neural networks can handle almost any problem given appropriate data; on the other hand, they are black-box methods, which means that the relationships they have learned are difficult or impossible to extract.

This in turn makes it more difficult to generate new 'domain know-how' on the basis of which simpler and easier-to-implement algorithms for subsequent problems can be developed. A good data scientist must therefore be able to draw on his experience to select the right methodology for the problem at hand: as complex as necessary, as transparent and robust as possible."
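The rule of thumb "as complex as necessary, as transparent and robust as possible" can be illustrated with a small sketch. It is not the interviewees' method, only one simple heuristic under stated assumptions: each candidate model has a validation error and a hand-assigned complexity rank, and among all candidates whose error is within a chosen tolerance of the best one, the simplest is picked. The names `Candidate` and `pick_model`, the complexity ranks, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    complexity: int  # hypothetical rank: higher means less transparent
    error: float     # validation error, lower is better (made-up values)


def pick_model(candidates, tolerance):
    """Return the simplest candidate whose error is within `tolerance` of the best."""
    best_error = min(c.error for c in candidates)
    good_enough = [c for c in candidates if c.error <= best_error + tolerance]
    return min(good_enough, key=lambda c: c.complexity)


# Illustrative candidates only; the errors are not measured results.
candidates = [
    Candidate("linear regression", complexity=1, error=0.120),
    Candidate("gradient boosting", complexity=2, error=0.105),
    Candidate("neural network", complexity=3, error=0.100),
]

chosen = pick_model(candidates, tolerance=0.01)
print(chosen.name)
```

The tolerance expresses how much accuracy one is willing to give up for transparency: with a tolerance of zero the most accurate model always wins, while a larger tolerance shifts the choice toward simpler models.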

In which areas, for which topics and at which point within the organizational structure is the data scientist deployed?

Clever: "Given our size, we are in no position to answer this for large corporations. Because the role spans such a wide range, from data engineer on one side to product manager on the other, the data scientist is certainly often deployed as a project manager. This is especially true in the area of business intelligence."

What does cooperation with the IT department look like?

Strunk: "Without close ties to the IT department, the work of data engineers and data scientists cannot succeed. This begins with the provision of working materials and environments, including computing capacity and operating system, which often look different from those of an 'ordinary' employee of the company.

Since, in our experience, new requirements often arise in the course of a project, uncomplicated and unbureaucratic support from IT administration with regard to resources, software, and available tools is essential. When new data models are implemented or deployed, especially as automated solutions, the support of experienced software developers is needed, and they often sit in the IT department. Here, too, a close exchange between IT and data scientists is required, since the IT framework conditions always influence the Pareto optimum mentioned above."

What conclusions does the management draw from the work of the data scientist?

Strunk: "A solution that results from a new use of data models must first of all prove itself on historical data and demonstrate its potential. In most common applications, such a backtest is possible if sufficient data is available to assess how well the relationships, which are usually learned from past data, can be expected to hold in the future.

If an algorithm turns out to have been consistently useful on past data, the management's decision to roll it out is of course easier. With predictive models in particular, the data scientist has the crucial task of supporting management in evaluating the validity of the 'past = future' hypothesis. In the best case, the models can also be tested live on a small scale before a real migration takes place."
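The idea of a backtest can be made concrete with a minimal walk-forward sketch. It assumes a simple time series and two deliberately naive forecasting rules; the function names (`walk_forward_backtest`, `last_value`, `moving_average`) and the price series are hypothetical and do not come from the interview. At each step the forecast sees only the values before the point being predicted, which is what makes the test a fair check of the 'past = future' hypothesis.

```python
from statistics import mean


def walk_forward_backtest(series, window, forecast):
    """Forecast each point from the `window` values before it; return the mean absolute error."""
    errors = []
    for t in range(window, len(series)):
        history = series[t - window:t]  # only data available before time t
        prediction = forecast(history)
        errors.append(abs(series[t] - prediction))
    return mean(errors)


def last_value(history):
    """Naive baseline: tomorrow looks like today."""
    return history[-1]


def moving_average(history):
    """Forecast the next value as the average of the recent window."""
    return mean(history)


# Made-up series for illustration only, not real market data.
prices = [50.0, 52.0, 51.0, 53.0, 55.0, 54.0, 56.0, 58.0, 57.0, 59.0]

mae_naive = walk_forward_backtest(prices, window=3, forecast=last_value)
mae_ma = walk_forward_backtest(prices, window=3, forecast=moving_average)
print(f"naive baseline MAE: {mae_naive:.2f}, moving average MAE: {mae_ma:.2f}")
```

Comparing a candidate against a trivial baseline such as `last_value` is a common sanity check: a model that cannot beat the baseline on past data is unlikely to justify a rollout.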

Clever: "Which conclusions the management draws from the data scientist's results naturally depends on the respective project and on the relationships that emerge from the data. Especially with simple machine learning algorithms that bring a gain in domain knowledge with them, management readily draws the corresponding conclusions. If, on the other hand, the core message is 'it works great, but we don't know exactly why', there is always a certain amount of skepticism about the solution presented.

Honest communication of the strengths and weaknesses of a solution, and of the associated opportunities and risks for the company, is one of the core challenges. Trust in the methods and algorithms, built on this communication and presentation, promotes the willingness to replace the old with the new, in the best sense of the word."