What are predictive modeling analytics

Making predictions from data: not so unlikely!

A look into the crystal ball: Predictive analytics in theory and practice

Predictive analytics is currently one of the most important areas of application for big data. It is a sub-discipline of business analytics that picks up where OLAP or reporting leave off. Instead of just analyzing the existing situation, predictive analytics tries to use data models to make predictions about possible future events. There is a close connection to data mining.

The volume of data in companies and in the private sector is already gigantic. Mobile devices such as smartphones continuously collect data on all kinds of environmental conditions. When surfing the Internet, each of us leaves a long digital trail. The Internet of Things (IoT) promises comprehensive networking of all the everyday devices and production tools that surround us. At the same time, the modern knowledge society has to face the question of whether we actually make active use of all this data and whether usable knowledge has increased as a result. The answer to this question is not trivial. It is true that today's possibilities for evaluating data, condensing it into information and thus generating knowledge from it are greater than ever before. But it is also true that this new knowledge, which lies in the hidden connections in the data, does not appear before our mind's eye by itself. We need to do research to bring it to the surface. We have to try to recognize patterns in the data. In the last step, these patterns have to be interpreted correctly.

This is exactly where predictive analytics comes in. It is an attempt to make predictions about the future behavior of people or the occurrence of certain events. This is a very exciting topic with the potential to lastingly change the way we work in many areas of life. The underlying methods are developed on an interdisciplinary basis. At the center is the new job profile of the data scientist, which requires extensive knowledge of data analysis, databases, mathematical and statistical models and, of course, software development. Regardless of the specific role you find yourself in, an understanding of the basic procedure is always necessary.

With this article we would like to provide a compact introduction to the topic. To this end, we shed light on the theoretical background and look at how predictive analytics is similar to and differs from other trend topics such as business intelligence or business analytics. But then the question quickly arises: what can you do with it in practice? We use examples to examine which applications can benefit from predictive analytics and where it is already being used. Is that interesting for us software developers? Yes, and more than that! In the future, we will have to equip data-based applications with the appropriate methods, primarily through access to the cloud. To master this, it is more than helpful to be familiar with the subject.

Predictive Analytics: in a nutshell

Predictive analytics is primarily concerned with predicting the probable future. An example: thanks to such analyses, companies know in advance how customers are most likely to react. What products will they buy? What price are they willing to pay? When will they show their interest in buying at the dealer of their choice? With this information, the dealer can plan better and keep the right products in stock. Customer satisfaction grows and sales can be increased. If the predictions are correct, this is a tool for surviving the competition or even for standing out from it. We will come to other uses of predictive analytics later.

How does this work? People cannot look into the future! Is that witchcraft? Clearly not! There are methods that make it possible to predict future trends and developments from past data. Please note: predictive analytics is not just about collecting the relevant data, but about analyzing it, drawing the right conclusions from it, and then deciding and acting accordingly. People who have dealt with market research and the related statistics in the past will rightly object at this point that the underlying methods of data evaluation and the resulting analyses are not new. That's right. What is new, however, is the availability of the relevant data and the ability to access it. Until now, expensive primary surveys often had to be carried out to answer individual questions, but today we have a lot of data at hand without any further effort. Mass databases, data from social platforms and data from IoT devices are just a few examples. This data is just waiting to be analyzed in what is called secondary research.

It is a fact that predictive analytics is becoming more and more important in the course of digitization and is even becoming a driving force behind innovation. Data mining is the foundation of predictive analytics. Regression analyses, clustering, neural networks and association analyses are the classic methods of data mining. Predictive analytics also uses statistical calculations, machine learning, artificial intelligence (AI), elements of game theory and methods of operations research such as optimization calculations and simulation processes. A whole lot of math, statistics, linguistics, text mining and text analytics is behind the inconspicuous term predictive analytics.
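
To make the idea of predicting from past data a little more concrete, here is a minimal sketch of the simplest of the methods mentioned, a regression analysis. It fits a straight line to past sales by ordinary least squares and extends it one month into the future. The sales figures and the helper name `fit_line` are hypothetical and only serve as illustration; real projects would typically use a statistics library and check whether a linear trend is appropriate at all.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sales history (units sold) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [102, 108, 121, 131, 138, 152]

intercept, slope = fit_line(months, sales)

# Extend the fitted trend line to the next, not yet observed month.
forecast_month_7 = intercept + slope * 7
print(f"Trend: {slope:.1f} units per month")
print(f"Forecast for month 7: {forecast_month_7:.1f} units")
```

The forecast is only as good as the assumption that the past trend continues, which is exactly the caveat that applies to every predictive model.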

Predictive Analytics as a part of Business Intelligence

Predictive analytics is viewed as a sub-area of business intelligence (BI). Where does BI end and where does predictive analytics begin? BI includes a number of architectures and technologies that turn raw data into meaningful and useful information. In summary, it includes procedures for the systematic collection, evaluation and presentation of data. BI tools also offer great help in interpreting extensive data collections. Business analytics includes queries, reporting, and online analytical processing (known as OLAP). BI mainly provides answers to questions of the following types:

  • What happened: "Which products were primarily bought by which customers?"
  • How often has a certain connection occurred: "Which products were in particularly high demand in the past, for example before Christmas?"

So BI is mainly about the creation and maintenance of data warehouses. In this way, operational and strategic decisions can be better prepared. The application of BI often provides a good basis for achieving company goals more quickly. Business processes can be made more profitable, costs can be reduced, risks can be minimized and value creation can be improved overall. However, it is important not to forget that the usual methods of BI initially focus only on data from the past. Which decisions can be derived from this, i.e. how the data is projected into the future or what conclusions are drawn from it, is usually left to the BI users themselves.

Predictive analytics goes a decisive step further. The term is geared towards the creation of forecasts and is closely related to the term advanced analytics (box: “Delimitation of terms”). Advanced analytics is understood to mean the automatic discovery of patterns in structured and unstructured data. It means going beyond the limits of BI. The question is no longer “What happened?” but “What will happen?”. It is an attempt to predict the future. This can concern the behavior of individuals or groups or the occurrence of certain events. In other words, it is about optimization, correlation and forecasting the next best action or the most likely next action.
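
The step from "What happened?" to "What will happen?" can be illustrated with a small sketch. It first counts, purely descriptively, which purchase followed which in past customer histories, and then uses these counts to estimate the most likely next action. The purchase histories and the function names are hypothetical; the approach shown is a simple frequency count, not a method prescribed by any particular BI product.

```python
from collections import Counter, defaultdict


def count_transitions(histories):
    """Count how often each action was directly followed by another one."""
    transitions = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            transitions[current][following] += 1
    return transitions


def most_likely_next(transitions, action):
    """Return (next_action, estimated_probability), or None if the action was never seen."""
    followers = transitions.get(action)
    if not followers:
        return None
    next_action, count = followers.most_common(1)[0]
    return next_action, count / sum(followers.values())


# Hypothetical purchase histories, one list per customer.
histories = [
    ["phone", "case", "charger"],
    ["phone", "case", "headphones"],
    ["phone", "charger"],
    ["laptop", "mouse"],
]

transitions = count_transitions(histories)       # descriptive: what happened?
prediction = most_likely_next(transitions, "phone")  # predictive: what comes next?
print(prediction)
```

With so few observations the estimated probability is of course very uncertain; the sketch only shows how a forecast is derived from counted past behavior.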

Delimitation of terms
The terms relating to modern and advanced analysis techniques are not always used clearly and without overlapping. The following brief definitions should help with the classification:
  • Predictive Analytics: Deals with the probability of a possible future event occurring.
  • Prescriptive Analytics: Goes one step further than predictive analytics and provides suggestions based on it. This form of analysis provides additional explanations as to why an event will occur in the future and recommendations on how to react to such an event.
  • Descriptive Analytics: In contrast to prescriptive analytics, it deals with the past. The goal is to learn from past events in order to make better decisions in the future.
  • Advanced Analytics: This includes processes such as predictive analytics, data mining, big data analysis and location intelligence. It is therefore an umbrella term for various analysis methods to filter information from data and thus to lay the basis for knowledge.

For predictive analytics you need software that can handle mass data. Much of that happens in the cloud today. Services from AI providers and other service providers are used to evaluate the data and generate the patterns. Figure 1 shows a classification of BI software based on the dimensions of complexity and degrees of freedom for users. It becomes immediately apparent that standard reporting tools are easy to use, but offer hardly any options for adapting the analyses to a specific question. Special tools for advanced and predictive analytics open up the greatest scope, but their complexity is also many times higher. Users need to understand the interrelationships in order to use these tools to unearth new knowledge.

Predictive Analytics in Practice

Before we get to the bottom of the technology and processes, let's take a look at other areas of application of predictive analytics. In addition to the example from the introduction, i.e. the prediction of customers' future buying behavior, the approach is already used today as follows [cf. 4]:

  • Fraud detection: The spectrum is broad and ranges from the identification of a duplicate or incorrect invoice to manipulation of balance sheets. Specially developed algorithms aim to automatically detect these irregularities.
  • Identification of dissatisfied customers: The aim is to specifically prevent the churn of customers with an adjusted price setting or new offer packages. An example: A customer who frequently calls the call center of a cellular network provider is identified as "at risk". Attempts are made to prevent his “migration” with special offers tailored to his needs.
  • Prediction of the time of maintenance of devices and machines: This use case is also known under the keyword predictive maintenance. The motto is: "[T]he failure-related repairs should be replaced by preventive maintenance". Special algorithms continuously analyze and monitor the behavior of the machines concerned, taking historical data into account. The best possible time for the next inspection is calculated, and this date should lie before the probable failure of the machine. The choice of date should also be more intelligent than simply deriving it from elapsed time or production volume. Background: both a machine failure and maintenance carried out too early cost time unnecessarily and thus cause avoidable costs.
  • Reduction of rejects: This is known as predictive quality or predictive quality management. The aim is to identify defective products at an early stage and remove them from the production process.
  • Identification of upselling potential: From the previous behavior of a customer, the individual potential for the sale of further products to this customer is calculated through the use of algorithms. This leads to answers to the following questions: Which customers are worth calling? Which customers are likely to respond better to letters and with whom are you more successful with an email?
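
The predictive maintenance case from the list above can be sketched in a few lines. The example assumes that a wear indicator (here a vibration reading) rises roughly linearly with operating hours, fits a trend line to the historical readings and extrapolates when a failure threshold would be reached. The inspection is then scheduled a safety margin before that point. All readings, the threshold, the margin and the function names are hypothetical; real systems use far richer models and sensor data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_failure_time(hours, readings, failure_threshold):
    """Extrapolate a linear wear trend to the time the threshold is reached."""
    intercept, slope = fit_line(hours, readings)
    if slope <= 0:
        return None  # no upward wear trend, so no crossing can be extrapolated
    return (failure_threshold - intercept) / slope


# Hypothetical history: operating hours and vibration readings in mm/s.
hours = [0, 100, 200, 300, 400]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]

failure_threshold = 7.0    # assumed reading at which a failure becomes likely
safety_margin_hours = 100  # assumed buffer before the estimated failure

failure_at = estimate_failure_time(hours, vibration, failure_threshold)
inspection_at = failure_at - safety_margin_hours
print(f"Estimated failure after about {failure_at:.0f} operating hours")
print(f"Schedule inspection at about {inspection_at:.0f} operating hours")
```

The date is derived from the observed behavior of the machine rather than from a fixed interval, which is the point the text makes about choosing the inspection time more intelligently.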

Another interesting application of predictive analytics is smart apps (box: “Smart apps with predictive analytics”).

Smart apps with predictive analytics

What are smart apps? Smart apps show the user information appropriate to the context. There are basically two ways to make an app smart:

  • The graphical adaptation of the user interface to the user context, e.g. large and clear symbols for children. In this case, the user interface is specified by the developer and remains static.
  • The content adjustment of the app: The possibilities of the app change from user to user. For example, the app offers different operating sequences depending on the situation.

Ease of use is a key objective in both cases. With the help of predictive analytics, attempts are made to make even more extensive adjustments, i.e. to predict the behavior of the user and thus improve user-friendliness. An example: Melanie Müller regularly buys a second-class ticket from Hamburg to Berlin on Friday evenings using a travel app. One suggestion for a smarter app with a forecast function is: The app uses the geolocation to check whether Melanie Müller will be near Hamburg Central Station on Friday evening. If so, the customer will be spared a longer interaction in the app with several booking steps and the app immediately offers an option to buy the train ticket from Hamburg to Berlin with one click. In this way, the ticket purchase is designed individually for the user. The day of the week, time and geolocation factors are the triggers for the specific “smart” proposal.
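
The trigger logic of the travel app example can be sketched as follows. The function checks the three factors named above (day of the week, time and geolocation) and returns a one-click offer only if all of them match. The station coordinates are approximate, and the evening time window, the distance limit and all names in the code are assumptions made for this illustration.

```python
import math
from datetime import datetime

# Approximate coordinates of Hamburg Central Station (assumption for this sketch).
HAMBURG_HBF = (53.5528, 10.0067)


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def suggest_one_click_ticket(now, position, max_distance_km=2.0):
    """Return a one-click offer on Friday evenings near the station, else None."""
    is_friday_evening = now.weekday() == 4 and 16 <= now.hour < 22
    is_near_station = haversine_km(position, HAMBURG_HBF) <= max_distance_km
    if is_friday_evening and is_near_station:
        return "Buy 2nd-class ticket Hamburg -> Berlin with one click"
    return None


# A Friday evening close to the station triggers the offer.
print(suggest_one_click_ticket(datetime(2024, 3, 15, 18, 30), (53.5530, 10.0060)))
# A Monday evening at the same place does not.
print(suggest_one_click_ticket(datetime(2024, 3, 18, 18, 30), (53.5530, 10.0060)))
```

In a real app the rule itself (route, class, weekday, time window) would not be hard-coded but learned from the user's booking history.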

Use in practice

Of course, the question arises how often and to what extent predictive and advanced analytics are actually being used in practice today. The BARC application study provides the results presented in Figure 2.

We comment on the result as follows: We are still at the beginning. According to the study, only 10 companies, or about 5 percent of the 210 companies surveyed, use advanced and predictive analytics frequently. Almost 45 percent of the companies surveyed plan to use such advanced analyses in the short or long term. Interestingly, almost all of the companies surveyed consider predictive and advanced analytics to be very important for the future and want to take steps in this direction soon. The main areas of application are finance, controlling, IT and management. Other possible fields of application are marketing, sales, research and development, logistics, production, procurement and human resources. In the following section we give a brief insight into the technology of predictive and advanced analytics.

The role of big data

We have heard of big data many times. Everyone talks about it, but its classification and meaning are often unclear. And what does big data have to do with predictive analytics? One thing at a time! The term big data stands for two aspects: on the one hand, the ever faster growing mountains of diverse data, and on the other hand the IT solutions and systems that help companies cope with this flood of data. Due to the increasing complexity of the data structures, classic business intelligence structures are reaching their limits and companies are facing new challenges. The characterization in terms of volume, variety and speed goes back to Douglas Laney in 2001. Laney himself defines big data as “data with a large volume, a large variety of data formats and a high speed of data generation.” Big data is thus to be understood multidimensionally and does not refer only to the amount of data to be processed, as is often assumed. We can therefore describe big data using the following five dimensions (Fig. 3):

  1. Amount of data: Not only is the amount of data flowing into the company from outside growing, but also the amount of self-generated data. Big data are those datasets whose size exceeds the capabilities of typical database software.
  2. Data diversity: Nowadays, the data come from a wide variety of sources. The data structures and formats are therefore also very diverse. There are different languages, there are text and image files, there are data from different input systems and applications. Structured and unstructured data are increasingly mixed.
  3. Speed: For companies, it is very important how quickly the data can be evaluated. Evaluation in real time helps to gain strategic competitive advantages.
  4. Data sources: Data origin is also an important aspect of big data. It is checked which data is already available in the company and which of it is already being actively used. Failure to use existing data can be a waste of resources.
  5. Complexity: Tremendous computing power is required to analyze poly-structured data.

As can be seen, big data represents an enormous challenge for companies. In this sense, big data is both a task and an opportunity and forms the basis for applying advanced analysis methods as we describe them here. Big data is used in a wide variety of areas. Would you like an example? Google Flu Trends was a project started by Google in 2008 that is unfortunately no longer supported. The idea was that people affected by the flu would often search the Internet for related terms and symptoms. This enabled Google to estimate the current spread of the flu. In addition to the search term, the time and place of the search query were also taken into account. A daily trend map for the spread of the flu could be created for more than 25 countries. Such information was available more quickly than data from institutional monitoring programs, so measures to prevent and combat the flu could be implemented at an early stage. Today you can no longer find current data on the Internet, only values from the past.
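
The basic idea behind such a trend map can be sketched in a strongly simplified form: for each region, the share of flu-related searches among all searches is compared with a typical baseline, and regions with a clearly elevated share are flagged. This is not Google's actual model; the regions, the counts, the baseline and the factor are hypothetical values chosen for illustration.

```python
def flag_elevated_regions(query_counts, baseline_share, factor=2.0):
    """Return regions whose share of flu-related searches exceeds factor * baseline."""
    flagged = {}
    for region, (flu_queries, total_queries) in query_counts.items():
        share = flu_queries / total_queries
        if share > factor * baseline_share:
            flagged[region] = share
    return flagged


# Hypothetical counts for one day: (flu-related searches, all searches).
query_counts = {
    "North": (1200, 100_000),
    "South": (300, 80_000),
    "West": (450, 90_000),
}
baseline_share = 0.004  # assumed typical share outside the flu season

flagged = flag_elevated_regions(query_counts, baseline_share)
for region, share in flagged.items():
    print(f"{region}: {share:.2%} of searches are flu-related")
```

Because the time and place of each query are known, such a comparison can be repeated daily and per region, which is what made the information available faster than institutional reporting.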

Predictive analytics and similar analysis methods can be based directly on big data, i.e. the existing diversity of data in terms of volume and structure forms the basis for further data analyses. It is not always possible to differentiate precisely between the two approaches, i.e. big data and predictive analytics. The transitions are fluid, with predictive analytics focusing on the interpretation of the data for the future.

The technical implementation of predictive analytics

There are a number of software products that companies can choose from when implementing analyses based on predictive analytics. These can be divided into the following four classes:

  • Application-specific solutions: These are mainly ready-made input screens, predefined models and processes for use out of the box. In addition to the algorithms and procedures required for the application, there are also user interfaces specific to the department. Various tools with predefined templates, content and corresponding user interfaces are offered on the software market. However, the application-specific solutions reach their limits if the number of parameters is too high or if new case-specific variables have to be generated. In this case, these tools need to be further developed and expanded.
  • BI tools with advanced analytics functions: Here, analyses are specified with the aid of formula editors and algorithm libraries. Statistical languages and models are also integrated. BI tools are thus expanded to include the methods of predictive analytics.
  • Data mining software: Development environments are used to create individual data mining models. Algorithm libraries and statistical languages are also used.
  • Generic development environments: In this case, generic development environments are used to develop individual applications. It is about the integration or the control of statistics modules. This approach undoubtedly offers the most options, but it is also very complex and requires extensive knowledge of the methods behind it.

The implementation of predictive analytics with the help of extensions to BI software and data mining tools is the most widely used technology today. Every second company uses this variant.

Predictive analytics tools

According to a list by Predictive Analytics Today, the top 20 predictive analytics software tools are RapidMiner Studio, KNIME Analytics Platform, IBM Predictive Analytics, SAP Predictive Analytics, Dataiku DSS, SAS Predictive Analytics, Oracle Data Mining, Angoss Predictive Analytics, Microsoft R, Minitab, Microsoft Azure Machine Learning, TIBCO Spotfire, STATISTICA, Anaconda, Google Cloud Prediction API, AdvancedMiner, DataRobot, Alteryx Analytics, ABM and HP Haven Predictive Analytics. We have compiled some information on selected tools in Table 1.

RapidMiner Studio: An environment for machine learning and data mining that contains more than 500 operators for all tasks of knowledge discovery in databases. RapidMiner is written in Java and can be used on all common operating systems.
KNIME Analytics Platform: KNIME is the abbreviation for "Konstanz Information Miner". It is free software for interactive data analysis. It has a modular structure (pipelining concept) and therefore enables the integration of numerous machine learning and data mining processes.
IBM Predictive Analytics: A tool developed by IBM that consists of IBM SPSS Modeler and IBM SPSS Statistics. IBM SPSS Modeler is the predictive analytics platform that provides predictive information for decision-making. IBM SPSS Statistics takes care of everything from planning to data collection and analysis to reporting and implementation.
SAP Predictive Analytics: A tool for predictive analytics developed by SAP. It includes the automation of data preparation, the creation of forecast models, extended visualization functions and predictive scoring for a variety of different target systems.

Table 1: Overview of predictive analytics tools


Data, data and more data. Data is the capital of knowledge-based companies. It forms the basis for good business decisions and can become a competitive tool. Advanced analyses like predictive analytics try to predict the future. The basis for this is a careful analysis of previous behavior. It should not be forgotten that this approach only works if the data actually contains patterns. This is certainly the case with many natural phenomena and with the behavior of individuals and groups. Our behavior is therefore often easier to predict than we think. However, the last step is always common sense, because every prediction is and remains the result of a statistical analysis.

Predictive analytics always works with a wide variety of data from different sources. Often personal data is also used. The relationships determined are not always intended for all eyes. Companies that are getting started with predictive analytics should never disregard data protection. Not everything that is possible is permitted, let alone ethically justifiable.
