How and why we added predictive analytics to myTracker

Today we are going to look at how predictive analytics works, how the myTracker team developed its own predictive analytics tool, and, most importantly, why it is useful.

Why predictive analytics

Many businesses have long been aware that analytics is essential for any project. In our age of information, the more data and opportunities for analysis you have, the stronger your position. Many major companies and governments gather petabytes of data to support their objectives, for example to identify the future course of development or potential risks.

Analytics is a ubiquitous tool in marketing, finance, and product development, where decisions are hardly ever made without using data already accumulated and available.

It is now fairly obvious that the future belongs to algorithms, machine learning, AI, and Big Data. For example, the city of Boston has already replaced the people who used to draw school bus routes with algorithms. In 30 minutes, the algorithm created a system-level route map that was 20% more efficient than the ones drawn by hand, something people had been unable to achieve even after spending thousands of hours. As a result, the city saved USD 5 million and reduced both its fleet and its carbon dioxide emissions.

As a logical next step, predictive analytics was introduced as a tool that relies on a large volume of previously collected data to forecast user behaviour. The amount of data is so vast that no human would be able to review and analyse it, which is where machine learning solutions enter the picture.

As of today, predictive analytics has found lots of applications. For example, webstores offer personalised recommendations based on previous purchases or activity, while insurers and banks have quite a long history of analysing the customer’s background and assessing risks prior to making decisions. DeepFace, a system trained on images from Google, has already left behind Hollywood visual effects artists. Netflix gathers data to capture user tastes in order to create or purchase the right movies or TV shows. Facebook and Google analyse user preferences to make ads more relevant.

Past experience has shown that businesses making the most of their data are destined to win in the long term. As AI and computational capacities improve, ordinary people are gaining access, sometimes even free of charge, to technologies that used to be available only to industry giants.

In myTracker, these in-depth analysis tools are now available to marketing specialists, product managers, and mobile app developers and owners. myTracker generates a free forecast of your revenues from each group of users engaged. It also helps to identify the most profitable user acquisition channels and growth points and gives insights into user demands, resulting in a marked increase in earnings.

How it all began

Although we have always had very ambitious goals for using predictive analytics in our service, we started with financial performance forecasts. We drew on a simple idea that most businesses are established to make a profit. Based on this, the first solution we introduced was forecasting LTV (lifetime value), an estimate of the average revenue that a customer will generate before they delete, or lose interest in, your app.

Almost every company wants to forecast LTV in order to identify users who are not worth targeting with an ad campaign. If users generate less revenue than you spend to attract them, the campaign no longer makes any economic sense. When you can see your most valuable customers, the best opportunities for ad placement, and the platforms that will pay off, you avoid unnecessary costs and make better decisions.
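The break-even rule above can be written down in a few lines. This is only a minimal sketch: the `Campaign` fields, the campaign names, and all the numbers are made up for illustration and do not come from myTracker.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    spend: float          # total ad spend, USD (illustrative value)
    installs: int         # number of users the campaign attracted
    predicted_ltv: float  # forecast revenue per user, USD


def cost_per_install(campaign: Campaign) -> float:
    """Average amount spent to attract one user."""
    return campaign.spend / campaign.installs


def is_profitable(campaign: Campaign) -> bool:
    """A campaign pays off only if a user earns more than they cost."""
    return campaign.predicted_ltv > cost_per_install(campaign)


# Hypothetical campaigns with made-up figures.
campaigns = [
    Campaign("network_a", spend=1000.0, installs=500, predicted_ltv=3.5),
    Campaign("network_b", spend=1200.0, installs=300, predicted_ltv=2.0),
]

profitable = [c.name for c in campaigns if is_profitable(c)]
print(profitable)
```

Here only the first campaign passes the check, because the second one costs USD 4 per install while each user is expected to bring in USD 2.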

What influences user behaviour

An approximate LTV can be predicted without dedicated machine learning tools. We talked about some of those methods in our article for VC.ru. Unfortunately, such predictions are too general, take a long time to produce, and lack hard data and detail.

However, if we try to produce a more detailed LTV, we will find that user behaviours differ: one person may pay a large lump sum, while another will make smaller, more regular payments over a long period of time. It is quite tricky to calculate everything manually.

Usually, a single user's payments look somewhat like the chart below:

Revenue generated by the user since app installation (time and LTV are shown on the horizontal and vertical axes, respectively).

A forecast for a single user has two objectives:

  1. Predict the number of payments.
  2. Predict the payment amount.

The main issue is that users tend to be chaotic and random in their behaviour. This makes it impossible to approximate the way a single user behaves with one function. Training a machine learning model to produce a reliable forecast for an individual user will not work either.

We have chosen a different approach, with myTracker classifying users into different segments (cohorts). Users from each cohort share a number of common traits. The main challenge is to choose the right cohorts, i.e. the rule for classifying the users.

First, we identified the high-level factors that impact user behaviour:

  • platform (Android or iOS);
  • app;
  • country;
  • ad campaigns that attracted them;
  • interaction with certain functions of the app;
  • other cross sections.

Payment patterns differ from country to country. For example, over a six-month period, a mobile app user from Liechtenstein may pay an average of USD 350, while a Brazilian will only pay USD 20. In fact, many things depend on the specific app, ads, type of payment, and a number of other factors. Analysing users coming from different campaigns brings us to the same conclusion: different ads and platforms attract different customers. Sometimes they are really engaged users willing to spend more in the app, and sometimes they only care about what's free of charge and are not ready to spend a penny.

The final choice of cohorts can be unique to each app. It should rely on hard data, expert judgement, and testing on historical information. Based on the data accumulated over the years of myTracker operations, we identify the cross sections that categorise users by LTV most effectively. As a next step, we use these cohorts for prediction.

Finally, we get several groups of users with similar patterns of behaviour. It is important to avoid cohorts that are too small, otherwise the cohort prediction becomes excessively personalised and therefore unstable.
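The cohort idea can be sketched as a grouping step with a minimum size. This is an illustration only: the field names, the `MIN_COHORT_SIZE` threshold, and the fallback "other" cohort are assumptions made for the example, not a description of how myTracker actually builds its cohorts.

```python
from collections import defaultdict

# Hypothetical threshold; a real value would come from testing on history.
MIN_COHORT_SIZE = 3


def build_cohorts(users, keys, min_size=MIN_COHORT_SIZE):
    """Group users by the chosen cross sections (for example platform, country).

    Groups smaller than min_size are merged into a single "other" cohort
    so that no forecast rests on just a handful of users.
    """
    groups = defaultdict(list)
    for user in users:
        groups[tuple(user[k] for k in keys)].append(user)

    cohorts = {}
    leftovers = []
    for key, members in groups.items():
        if len(members) >= min_size:
            cohorts[key] = members
        else:
            leftovers.extend(members)
    if leftovers:
        cohorts[("other",)] = leftovers
    return cohorts


# Made-up users for illustration.
users = [
    {"id": 1, "platform": "android", "country": "BR"},
    {"id": 2, "platform": "android", "country": "BR"},
    {"id": 3, "platform": "android", "country": "BR"},
    {"id": 4, "platform": "ios", "country": "LI"},
    {"id": 5, "platform": "ios", "country": "BR"},
]

cohorts = build_cohorts(users, keys=("platform", "country"))
for key, members in cohorts.items():
    print(key, [u["id"] for u in members])
```

Adding more keys makes each cohort more homogeneous but also smaller, which is exactly the trade-off between precision and stability described above.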

How to calculate LTV

Now let's look at a number of ways to forecast user-generated revenue, or LTV.

1. The most obvious solution is to get historical data for the old cohort, feed them to a machine learning model, and make calculations for the new cohort with shared characteristics. The more similar the two cohorts, the more accurate the prediction.

Yet there are many factors affecting this approach to measuring LTV from new acquisitions. These include promos and ad campaigns, which may dramatically change user behaviour. This makes it impossible to simply collect data for the new segment and come up with a perfectly correct prediction. On top of that, this approach would require reliable historical data. As a result, when it comes to customers from a new market or campaign which represent an entirely different cohort, we will be able to build forecasts only after a certain period of time has elapsed, depending on the size of the training data sets.

2. Another option is to analyse the cohort's behaviour over the first several days and then approximate LTV with the function that suits the cohort best. The choice of function is underpinned by historical data for a similar cohort. While this approach does rely on historical data, they are not key to generating a prediction for a new user group. In this case, the main tools would be a logarithmic function or a linear model. Their accuracy may also be affected by promos or app features (both for the current cohort and for the old cohort used to train the model). The forecast graph will then look like this:

Time and cohort LTV are shown on the horizontal and vertical axes, respectively. The zero point is the app installation time. The model starts generating LTV predictions some time after the installation.
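The second approach can be illustrated with a logarithmic curve fitted to the first days of a cohort. The sketch below assumes the form LTV(t) = a + b·ln(t) and fits it with ordinary least squares; both the functional form and the data points are assumptions chosen for the example, not myTracker's actual model.

```python
import math


def fit_log_curve(days, ltv):
    """Least-squares fit of ltv = a + b * ln(day). Returns (a, b)."""
    xs = [math.log(d) for d in days]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ltv) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ltv))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_ltv(a, b, day):
    """Extrapolate cumulative cohort LTV to a later day."""
    return a + b * math.log(day)


# Synthetic observations for the first week (days since installation).
# They are generated from a known curve so the fit is easy to check.
observed_days = [1, 2, 3, 5, 7]
observed_ltv = [0.5 + 0.4 * math.log(d) for d in observed_days]

a, b = fit_log_curve(observed_days, observed_ltv)
forecast_30 = predict_ltv(a, b, 30)
print(round(a, 3), round(b, 3), round(forecast_30, 3))
```

Real cohort data will not sit exactly on a curve, and a promo in the first week would bend the fit, which is why the observation window and the choice of function matter.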

How the final model works

To improve an LTV forecast, we use a mix-model, which is a combination of the models best suited to a specific cohort. Our measurements help predict revenue from any cohort, enabling our customers to obtain an accurate estimate for any cross section. There is no ideal one-size-fits-all solution, as we make predictions for a variety of apps, platforms, and payment methods. We apply a set of linear and gradient tree boosting models along with those described above.

The choice is driven by a range of factors, including:

  • in-app payment history;
  • payment breakdown vs previous periods;
  • cohort size.

We seek to identify and eliminate the effect of promos and features by adjusting our models, so that the peaks and troughs they cause in the graph are flattened.

As myTracker models are adjusted to each user, they have no difficulty analysing new cohorts. Thanks to the mix-model, we are less dependent on historical data for a similar cohort.

Also, there are various payment types such as in-app payments, subscriptions, and advertising, which require totally different approaches. Some of them would need a regression tree model, while for others a simple approximation of the logarithm function works better.

This is a fairly complicated system that generates forecasts by revenue type using a variety of models and picks the best model for each cohort within an app or payment type. As a result, we operate over 500 models trained for each app to make accurate predictions in nearly all circumstances. That said, our predictions are not perfect and are often affected by powerful marketing campaigns that can change user behaviour patterns. Another factor likely to affect the analysis is a change in ad monetisation or in the overall app development strategy.
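Picking the best model for a cohort can be pictured as a small competition on held-out data. The sketch below is an assumption-laden illustration: it compares only two candidates (a linear and a logarithmic curve), uses mean absolute error on the last few observed days as the criterion, and runs on made-up data. The actual myTracker selection logic and its gradient boosting models are not shown in the article.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def make_trainer(transform):
    """Build a trainer that fits a + b * transform(day) and returns a predictor."""
    def train(days, ltv):
        a, b = fit_line([transform(d) for d in days], ltv)
        return lambda day: a + b * transform(day)
    return train


# Hypothetical candidate set; a real system would hold many more models.
CANDIDATES = {
    "linear": make_trainer(float),
    "logarithmic": make_trainer(math.log),
}


def pick_best_model(days, ltv, holdout=2):
    """Train each candidate on the early days, score it on the last ones."""
    train_days, test_days = days[:-holdout], days[-holdout:]
    train_ltv, test_ltv = ltv[:-holdout], ltv[-holdout:]
    errors = {}
    for name, train in CANDIDATES.items():
        predict = train(train_days, train_ltv)
        errors[name] = sum(
            abs(predict(d) - y) for d, y in zip(test_days, test_ltv)
        ) / holdout
    return min(errors, key=errors.get), errors


# Made-up cohorts: one saturating, one growing steadily.
days = [1, 2, 3, 5, 7, 10, 14]
saturating_ltv = [1.0 + 0.8 * math.log(d) for d in days]
steady_ltv = [0.1 * d for d in days]

best_saturating, _ = pick_best_model(days, saturating_ltv)
best_steady, _ = pick_best_model(days, steady_ltv)
print(best_saturating, best_steady)
```

The same loop extends naturally to one candidate set per revenue type, for example different candidates for subscriptions and for in-app purchases.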

The more data we collect by app, revenue type or other metrics, the more accurate myTracker's prediction will be. To provide our customers with the most precise insights and quickly eliminate any errors, we regularly validate and monitor the quality of our models.

This enables us to continually train and update the models using incoming data. With automated training, update and validation in place, we can continue boosting the reliability of our forecasts and offer our customers an effective ready-to-use solution capable of processing app data almost immediately after they are fed into myTracker.

Results

The key takeaway is that myTracker can now predict LTV with an accuracy of 85–90% for horizons of 30, 60, 90 and 180 days after app installation.
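The article does not say how this accuracy figure is measured. One common reading, assumed here purely for illustration, is one minus the relative error between the predicted and the actual cohort revenue at each horizon. The function name and all revenue figures below are made up.

```python
def forecast_accuracy(predicted: float, actual: float) -> float:
    """Accuracy as 1 - relative error, floored at zero.

    This definition is an assumption for the example; the source does not
    state which metric stands behind its figures.
    """
    if actual <= 0:
        raise ValueError("actual revenue must be positive")
    return max(0.0, 1.0 - abs(predicted - actual) / actual)


HORIZONS = (30, 60, 90, 180)  # days after installation

# Made-up cohort revenue per horizon, USD.
predicted = {30: 88.0, 60: 130.0, 90: 171.0, 180: 270.0}
actual = {30: 100.0, 60: 150.0, 90: 190.0, 180: 300.0}

accuracy_by_horizon = {
    h: forecast_accuracy(predicted[h], actual[h]) for h in HORIZONS
}
for horizon, accuracy in accuracy_by_horizon.items():
    print(f"day {horizon}: {accuracy:.0%}")
```

Under this reading, a forecast of USD 88 against an actual USD 100 counts as 88% accurate, regardless of whether the model overshot or undershot.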

LTV prediction is the best marketing tool to analyse apps with in-app purchases or subscriptions and assess revenues by advertising channel and campaign. Relying on billions of transactions in thousands of apps, myTracker predictive models are capable of making forecasts from the first days of use. As the app data accumulate and the models are trained to adapt to a specific audience, predictions become increasingly accurate. All you need to do to launch this complex system is go to our Constructor, select the Metrics section, and set the desired forecast horizon.

This tool will help you better engage your audience, fine-tune your promotion strategy, and cut down on marketing expenses. Predictive analytics has made life easier for many of our customers by tackling a range of business challenges at lower cost. What's more, Mail.ru Group has transitioned a lot of its own apps to this platform. We are confident that going forward this technology will make a difference for the entire industry. In parallel, the myTracker team is working on other domains to benefit the Group and our customers who seek to increase the earnings they get from their apps. Check out our next article to learn more.