Predictive Analytics 101: A Beginner's Guide

Predictive analytics has become an integral part of data scientists' everyday work. A short scenario illustrates how important it is, both for the field and for companies in general:

In a few minutes you are due to present your latest data analysis project to your company's C-level executives. You are proud of your analysis and confident, because the data sets have the potential to shape new marketing campaigns, serve as the basis for sales proposals and drive sales. The data is in the cloud and is therefore easy to access and interpret. You even have a dashboard with visualizations that perfectly convey the power of the data set. The presentation can only be a success.

After five minutes, one of the C-level executives interrupts you and asks: "How will this data change in the future?" Before you can answer, there is another question: "How do we know that this dashboard really tells us everything?"

You pause, puzzled, and think. The data you presented to senior management is correct - your testing department has been reviewing it extensively for months. But can you really say whether and how this data will change in the future? The dataset and dashboard are just snapshots. Nobody can predict the future.

You don't have a crystal ball, but what if you could come very close to valid forecasts of the future? Modern companies need more than one-off reporting. They have to minimize future risks, increase sales and customer satisfaction, and optimize their processes. To achieve this, companies across industries use predictive analytics. To apply predictive analytics fully, organizations need to understand its current applications, its interfaces with the cloud, and the science behind it.

What is predictive analytics?

Predictive analytics describes the collection and analysis of historical data in order to anticipate future developments. Combining several data sets links different departments, business processes and data types (structured vs. unstructured).

However, simply collating data points does not by itself indicate future behavior. Predictive analytics uses statistical techniques such as data modeling, machine learning, and even artificial intelligence to uncover patterns in big data.

While these patterns cannot predict exactly what will happen in the future, predictive analytics can identify trends, flag disruptive industry changes early, and enable more data-driven decision-making.

Typical areas of application of predictive analytics

Any area in which data is collected is suitable for predictive analytics. From improving cybersecurity and data security, to more targeted marketing, to stronger actuarial performance, almost anything is conceivable.

Predictive analysis in healthcare

Predictive analytics is used especially widely in the healthcare sector. A major problem in healthcare is predicting patient risk. Actuarial teams need to determine optimal insurance rates and government reimbursement requests for members with a variety of health issues.

Because of this internal need, health insurers were at the forefront of adopting big data. Actuaries use predictive analytics to determine:

  • The likelihood that a patient's health will deteriorate
  • The likelihood that a patient will take part in sponsored wellness activities or make use of health treatments

Predictive analytics enables health insurers to examine risk patterns among patients who share characteristics such as:

  • Age
  • Health status
  • Social determinants of health

With this information, health insurers can make more informed financial and ethical decisions.

Predictive analysis in finance

Lending, an essential part of the financial services sector, has also been transformed by predictive analytics. Before a bank issues a loan, it wants to make sure that the customer is trustworthy. After all, it wants its money back. But how do lenders measure this trust?

Until a few years ago, underwriters assessed applicants based on past experience and personal intuition. The review process involved checking the applicant's history and debt-to-income ratio to arrive at an interest rate. However, with the advent of new financial legislation, lenders had to develop a more statistically sound method of underwriting.

The banking industry went through a revolution when predictive analytics models such as VantageScore and FICO Score became available. These models allowed lenders to calculate accurate, risk-based interest rates and limited subjective elements. Rather than basing interest rates on a few outdated metrics, the VantageScore and FICO Score models are based on data from millions of borrowers with similar spending trends.

Predictive Analytics: Three Practical Examples

After these general areas of application, let's look at a few real use cases.

1. Improve patient care

CenterLight is a managed care organization with 13 locations in New York that provides services for the disabled, the elderly and the chronically ill. For years CenterLight used an in-house system to manage its data. However, this system did not keep pace with constantly changing compliance guidelines and the various options for patient treatment. It became increasingly difficult to track patient progress and manage patient care.

For this reason, CenterLight applied predictive analytics to its data warehouse, which integrated data from the CRM system (Salesforce), the in-house system eCHAMP, and other service and provider databases. With this foundation, the business intelligence team was able to identify patterns in nurses' behavior that increased member loyalty and better prepared patients for their examinations.

The predictive analysis helped CenterLight save time and money while effectively managing the care of its members.

2. Addressing users with the right recommendations

Lenovo is a technology company with customers in over 160 countries that makes computers and smartphones. Lenovo realized that innovative products were not enough to stand out in a highly competitive industry. The company had to develop new product categories to improve the customer experience.

To be particularly effective, Lenovo set out to use data sets to understand customers' needs, so that it could map customer expectations, behaviors, and preferences. For this purpose, Lenovo developed a channel-independent predictive analytics approach that collects data from a large number of touchpoints in real time. This predictive analytics model helped Lenovo improve the customer experience and increase revenue per unit of sale by eleven percent.

3. Take a 360-degree view of the customer

Air France-KLM is a global company leading the market in its three main businesses - passenger and cargo transportation and aircraft maintenance. With 90 million customers per year and 2.5 million monthly website visitors, data management is a top priority for Air France-KLM to ensure customer satisfaction.

Air France-KLM developed a 360-degree customer approach built on predictive analysis of its own data. From giving all call center employees the complete customer history, to sending targeted promotional offers, to introducing chatbots in customer service, the company created an outstanding customer experience by anticipating potential customer needs. Air France-KLM even went a step further and identified the main stress factors for customers. Based on this knowledge, it was able to develop a proactive plan of action to resolve a large number of potential problems on the customer side.

This is how predictive analytics works

Predictive analytics is not magic; it works with ordinary means. It is based on statistics: predictive modeling assigns a weight or value to the existing variables in a large data set. Data scientists can use these values to calculate how likely it is that a certain event will occur in the future.
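
The idea of weighting variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names, the weights and the intercept are invented here, whereas a real model would learn them from historical data. The weighted sum is passed through the logistic function so that the result can be read as a probability between 0 and 1.

```python
import math

# Hypothetical weights for illustration only; a real model learns these from data.
WEIGHTS = {"age": 0.03, "missed_appointments": 0.4, "years_as_member": -0.25}
INTERCEPT = -2.0

def event_probability(record):
    """Weighted sum of the variables, squashed into a probability between 0 and 1."""
    score = INTERCEPT + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))

# One made-up record: the weighted sum here is -2.0 + 2.1 + 1.2 - 0.5 = 0.8.
member = {"age": 70, "missed_appointments": 3, "years_as_member": 2}
print(round(event_probability(member), 2))
```

A positive weight pushes the probability up as the variable grows, a negative weight pushes it down.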

There are two main approaches to statistical modeling used in predictive analysis: classification models and regression models.

Classification models in predictive analytics

Classification models are typically binary. An example: suppose you are a member of CenterLight. A classification model predicts, based on certain criteria, whether a member will stay with CenterLight or opt out within a certain period of time.
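
A minimal sketch of the binary output, assuming some model has already produced a probability of opting out for each member. The scores, the known outcomes and the 0.5 threshold below are all made up for illustration. The code turns each score into a 0 or a 1 and checks how often that label matches what actually happened.

```python
def classify(probability_of_leaving, threshold=0.5):
    """Return 1 (member opts out) or 0 (member stays)."""
    return 1 if probability_of_leaving >= threshold else 0

def accuracy(predicted, actual):
    """Share of predictions that match the known outcome."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical model scores and known outcomes for five members.
scores = [0.10, 0.80, 0.45, 0.65, 0.30]
actual = [0, 1, 0, 0, 0]

predicted = [classify(s) for s in scores]
print(predicted, accuracy(predicted, actual))
```

Moving the threshold trades one kind of error for the other: a lower threshold flags more members as likely to leave, at the cost of more false alarms.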

Regression models in predictive analytics

Regression models are more fine-grained. Instead of a 0 or a 1, regression models predict an actual number. Let's stick with the CenterLight example: let's say a member has a BMI of 29. A regression model could predict that with a consistent and healthy diet, the BMI could drop three points over the next year.
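
To make "predicting an actual number" concrete, here is a small sketch that fits a straight line to a few BMI readings using ordinary least squares and extends it to twelve months. The readings are invented for illustration, and a straight-line trend is an assumption, not something the CenterLight example states.

```python
# Hypothetical monthly BMI readings for one member on a diet programme.
months = [0, 1, 2, 3, 4]
bmi = [29.0, 28.8, 28.4, 28.3, 28.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(months, bmi)
predicted_bmi = intercept + slope * 12  # projection after one year
print(round(slope, 2), round(predicted_bmi, 1))
```

With these made-up readings the fitted slope is about -0.25 BMI points per month, so the projection after twelve months is about 26, a drop of three points.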

Three methods of predictive analysis: decision trees, regression analysis and neural networks

There are several methods that data scientists use to construct classification and regression models.

Decision trees

Every fork in a decision tree corresponds to a decision: each branch represents a possible choice between two or more options, while each leaf represents a classification (a yes or a no). Decision trees are a popular modeling method because they can handle missing values and are easy to understand.
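
The branch-and-leaf structure can be sketched as nested dictionaries. The tree below is written by hand purely for illustration: its features and thresholds are assumptions, whereas in practice a training algorithm chooses the splits from data.

```python
# Hypothetical hand-written tree: inner nodes ask a question, leaves hold the class.
TREE = {
    "question": ("missed_appointments", 2),  # asks: is the value greater than 2?
    "yes": {
        "question": ("years_as_member", 1),  # asks: is the value greater than 1?
        "yes": "stays",
        "no": "leaves",
    },
    "no": "stays",
}

def predict(node, record):
    """Follow one branch per question until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        feature, threshold = node["question"]
        node = node["yes"] if record[feature] > threshold else node["no"]
    return node

print(predict(TREE, {"missed_appointments": 3, "years_as_member": 0}))
print(predict(TREE, {"missed_appointments": 1, "years_as_member": 0}))
```

Because every prediction is just a path of yes/no answers, the reasoning behind each result can be read directly off the tree.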

Regression analysis

Regression analysis is another popular modeling tool. It is typically used when the data is continuous rather than binary. Different questions call for different kinds of regression. Linear regression is used, for example, when a single independent variable is related to the outcome. If more than one variable could influence the outcome, multiple regression is the better fit. Logistic regression is a further variant that does not follow the same rules as linear or multiple regression: in contrast to those two, it is suited to cases where the dependent variable is binary. Let's look at the CenterLight example again: a logistic regression could answer the question of how the probability that a member will have a heart attack (binary variable) changes with each additional BMI point (continuous variable).
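
The heart attack question can be sketched with a small logistic regression written from scratch. Everything here is an assumption made for illustration: the ten data points are invented, and the fit uses plain gradient ascent rather than whatever a statistics package would use. The exponential of the fitted slope is the factor by which the odds change for each additional BMI point.

```python
import math

# Hypothetical data: BMI values and whether a heart attack occurred (1) or not (0).
bmi = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
heart_attack = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.05, steps=5000):
    """Fit intercept and slope by gradient ascent on the mean log-likelihood."""
    n = len(xs)
    mean_x = sum(xs) / n
    centered = [x - mean_x for x in xs]  # centering keeps the updates well behaved
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(centered, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, centered)) / n
    return b0 - b1 * mean_x, b1  # express the intercept on the original BMI scale

intercept, slope = fit_logistic(bmi, heart_attack)
odds_ratio = math.exp(slope)

def probability(bmi_value):
    """Estimated probability of a heart attack at the given BMI."""
    return sigmoid(intercept + slope * bmi_value)

print(f"odds multiply by about {odds_ratio:.2f} per additional BMI point")
print(f"estimated probability at BMI 29: {probability(29):.2f}")
```

An odds ratio above 1 means the estimated risk rises with BMI; the probability itself does not rise by a fixed amount per point, because the logistic curve is S-shaped.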

Neural networks

Neural networks are considered the most complex of these techniques. Demand for this method continues to grow because perfectly linear relationships are rare in real data. Neural networks are a form of artificial intelligence, which is why they make more sophisticated pattern recognition possible.
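
Why non-linear relationships matter can be shown with a tiny network. This sketch is deliberately artificial: the weights are chosen by hand rather than learned, and the pattern is the classic "exactly one of two inputs is set" case, which no single weighted sum of the two inputs can reproduce. One hidden layer with a simple non-linear activation is enough to capture it.

```python
def relu(value):
    """Non-linear activation: negative values are cut off at zero."""
    return max(0.0, value)

def tiny_network(x1, x2):
    """Two hidden units and one output; all weights are hand-picked, not trained."""
    h1 = relu(1.0 * x1 + 1.0 * x2)        # active when at least one input is set
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)  # active only when both inputs are set
    return 1.0 * h1 - 2.0 * h2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, tiny_network(a, b))
```

The output is 1 when exactly one input is set and 0 otherwise. In a real neural network the weights are found by training on data, and there are usually far more units and layers.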

Although these statistical methods are not new, their adoption keeps growing. This can be attributed to the rising popularity of the cloud.

Big data, the cloud and the future of predictive analytics

Before the cloud existed, predictive analytics at this scale seemed impossible. Computers didn't have the capacity to store petabytes of data, let alone enough processing power to run highly complex data models. The cloud offers companies the opportunity to create and combine several large data sets and to scale their models easily.

There are many emerging cloud-based predictive analytics tools. In the future, companies will be able to develop their own machine learning models thanks to the cloud. The advantage: computers that can find patterns in data make manual work unnecessary and enable broader and more precise interpretations and projections.

The cloud also brings better customization options and more flexibility. With the rise of the Internet of Things in the cloud, predictive analytics tools could become even more granular in their assessment of people's everyday habits.

Modern predictive analytics software and tools

Because companies can access large data sets in the cloud at any time, demand for big data analysis is very high, and the market for cloud-based predictive analytics tools keeps growing. While a team of experts to interpret the data models is essential, software is just as important for reducing the time spent collecting, cleaning, and analyzing the data. Predictive analytics tools can process both stored and real-time data and also help with appropriate formatting.

In addition, most predictive analytics tools can be easily integrated with the ERP systems, digital analytics software, and business intelligence platforms that most companies already use. Business intelligence teams can also use predictive analytics software to visually show the value of predictive analytics using dashboards.

Talend offers big data software that can be used universally. Since Talend is an open source integration platform, it is versatile enough to support data preparation, data management and cloud integration. The first task for companies, once they have reached a certain level of maturity and have developed their own predictive analytics process, will be to migrate their data to the cloud.

Are you ready to get started with predictive analytics? Try Talend Data Fabric today and transform your business data.