MyTracker Case Study: How MY.COM saved 20% of its mobile game promotion budget using predictive models

MY.COM Group's portfolio includes a wide variety of mobile games; the company develops them, promotes them and acquires users for them. An important part of this process is assessing traffic quality. The earlier a judgement can be made, the better the budget can be allocated. Reliable information about traffic makes it possible to shut off inefficient channels and focus on those that bring the most loyal, highest-spending users.

In order to minimize costs and increase monetization, the Gaming Analyst teams, together with the myWidget recommendations service, created a predictive analytics tool that is based on the MyTracker service and uses machine learning. For a number of MY.COM gaming projects (Juggernaut Wars, Evolution, Hawk, Hustle Castle), models were developed that predict key metrics such as LTV (lifetime value) for any sample of players. The predictions rely on data collected within 2-5 days after the game has been installed.

The graph below shows a report on the quality of predictions for the media buying team. It can be seen that the predictions were accurate and the decisions based on them were justified.

Technique

The new predictive analytics tool is meant to predict the most complex metric, 90th-day LTV, for every user. Because predictions are made per user, any group of users, even the most exotic, can be viewed without significant computing load.

The decision to shut off inefficient channels should be made as early as possible, based on accurate predictions. However, the costs of underestimating and overestimating a user are not the same, so we worked on the assumption that it is better to underestimate a user.
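The article does not say how the preference for underestimation was built into the models. One common way to express such a preference is an asymmetric error that charges more for overestimates than for underestimates. The sketch below is only an illustration of that idea: the function name `asymmetric_loss`, the parameter `over_penalty` and its default of 2.0 are assumptions, not details from MY.COM.

```python
def asymmetric_loss(actual, predicted, over_penalty=2.0):
    """Mean absolute error in which overestimates cost more.

    `over_penalty` is a hypothetical weight: an overestimate of 1 unit
    counts as `over_penalty` units, an underestimate of 1 unit as 1.
    """
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        error = y_hat - y
        if error > 0:
            # The model overestimated this user's LTV: penalize harder.
            total += over_penalty * error
        else:
            # The model underestimated (or was exact): plain absolute error.
            total += -error
    return total / len(actual)


# Same absolute miss of 2, but the overestimate scores worse.
over = asymmetric_loss([10.0], [12.0])
under = asymmetric_loss([10.0], [8.0])
```

A model tuned against a score like this tends to produce more conservative LTV estimates, which matches the stated preference.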

The predictions were based on data about users' time spent in the game, their payments and device information, as well as socio-demographic data.

Because the proportion of paying users in games is low (anywhere between 1% and 10%, depending on region, genre and monetization model), the large number of non-paying users sometimes degraded the quality of LTV prediction. The problem was solved by sampling non-paying users and adding a "probability of paying" indicator. During cross-validation, the "paying / non-paying" classifier showed rather good results (70% recall and 95% ROC AUC), and the predicted probabilities correlated with payments.
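As a rough illustration of the sampling step, the sketch below keeps every paying user and only a random fraction of the non-paying ones before training. The record layout (a dict with a `revenue` key), the function name `downsample_non_payers` and the `keep_fraction` value are assumptions made for the example; the article does not state the sampling rate that was actually used.

```python
import random


def downsample_non_payers(users, keep_fraction, seed=0):
    """Keep all payers and a random share of non-payers.

    `users` is a list of dicts with a hypothetical "revenue" field;
    `keep_fraction` is the share of non-payers to retain (0..1).
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    payers = [u for u in users if u["revenue"] > 0]
    non_payers = [u for u in users if u["revenue"] <= 0]
    n_keep = round(len(non_payers) * keep_fraction)
    return payers + rng.sample(non_payers, n_keep)


# Toy data: 5 payers and 95 non-payers (a 5% paying share).
users = [{"id": i, "revenue": 4.99} for i in range(5)]
users += [{"id": i, "revenue": 0.0} for i in range(5, 100)]

training_set = downsample_non_payers(users, keep_fraction=0.2)
payer_share = sum(u["revenue"] > 0 for u in training_set) / len(training_set)
```

In this toy case the paying share in the training set rises from 5% to roughly 21%, which gives a classifier or regressor more balanced data to learn from.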

A separate issue was handling the numerous categorical features. Here, we resorted to mean encoding: e.g., for every country we calculated the share of paying users from that country based on historical data.
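The country example can be sketched in a few lines. The function names and the fallback to the overall paying share for countries missing from the history are assumptions added for illustration; the article only describes the per-country share itself.

```python
from collections import defaultdict


def fit_payer_share(history):
    """Compute the share of paying users per country.

    `history` is a non-empty iterable of (country, is_payer) pairs
    taken from historical data.
    """
    installs = defaultdict(int)
    payers = defaultdict(int)
    for country, is_payer in history:
        installs[country] += 1
        payers[country] += int(is_payer)
    shares = {c: payers[c] / installs[c] for c in installs}
    global_share = sum(payers.values()) / sum(installs.values())
    return shares, global_share


def encode_country(country, shares, global_share):
    """Replace a country code with its historical paying share.

    Unseen countries fall back to the global share (an assumption).
    """
    return shares.get(country, global_share)


history = [("US", True), ("US", False), ("DE", False), ("DE", False)]
shares, global_share = fit_payer_share(history)
us_feature = encode_country("US", shares, global_share)
fr_feature = encode_country("FR", shares, global_share)
```

To avoid leaking the target into the features, the shares should be computed on data that precedes the users being scored.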

As a result, we developed an algorithm which predicts LTV as well as the probability of any user making a payment.

The graph below shows the distributions of predicted LTVs (from a Random Forest model) and actual LTVs for a cohort of users of a gaming application.

We've also considered predicting LTV for segments (e.g. 'app - country' or 'app - campaign - partner') using linear or more complex models: gradient boosting, Random Forest, Poisson regression and others. The only data needed was the dynamics of aggregate payments during the first week of user interaction with the app. Thus, we have an LTV prediction for a user segment at our disposal and can make informed decisions on whether a certain acquisition channel should be used. During cross-validation for different apps, user segment LTV was predicted with a relative error between 0 and 25%.
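The simplest member of this family is a linear model that scales first-week revenue up to 90th-day LTV. The sketch below fits a single multiplier by least squares through the origin and is a deliberate simplification: it uses only the 7-day total rather than the full day-by-day dynamics, and all names and numbers are made up for the example.

```python
def fit_week_to_ltv_multiplier(history):
    """Fit k in ltv_90d ~= k * revenue_7d by least squares through the origin.

    `history` is a list of (revenue_7d, ltv_90d) pairs, one per past
    segment, both expressed per user.
    """
    sum_xy = sum(x * y for x, y in history)
    sum_xx = sum(x * x for x, _ in history)
    return sum_xy / sum_xx


def predict_segment_ltv(revenue_7d, multiplier):
    """Predict 90th-day LTV of a segment from its first-week revenue."""
    return multiplier * revenue_7d


def relative_error(actual, predicted):
    """Relative error of a segment prediction, as a fraction of the actual."""
    return abs(predicted - actual) / abs(actual)


# Hypothetical past segments: (revenue per user after 7 days, LTV at day 90).
past_segments = [(1.0, 3.0), (2.0, 6.0)]
k = fit_week_to_ltv_multiplier(past_segments)

predicted = predict_segment_ltv(0.5, k)
error = relative_error(actual=2.0, predicted=predicted)
```

The more complex models mentioned above would replace the single multiplier with a learned function of the daily payment curve.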

How it works

The general outline of how the predictive model works is as follows:

  1. Data accumulation for machine learning algorithms. A minimum of 2-3 months' worth of data is needed from the moment the SDK is installed and user acquisition has started. To determine the required amount of data (both in time and in the number of payments), validation curves are constructed, from which the constants for each application, namely the minimum sample size (number of payments and events), are determined automatically. To analyze the stability of model predictions, bootstrap confidence intervals are constructed and their variance is calculated.
  2. Initial training and model validation (model selection and validation, hyperparameter tuning, feature selection). Several models need to be trained, because predictions are made on the 1st, 2nd, 3rd, 7th, 14th and 21st day from the moment of user registration, and each day requires a separate model. To speed up calculation, we evaluate feature importance based on random forest and gradient boosting and choose the most significant features. To select the best model at a given time, a modified MAPE + sMAPE criterion is used. Model hyperparameters are tuned during time-series cross-validation. Currently, several model types are evaluated for all days and for all MY.COM gaming projects; most often, gradient boosting and random forest prove to be the most accurate.
  3. Using the model. After optimal settings of models have been determined, you can start using them. At the end of each day, the first iteration of computing predictions is performed for every user who installed the app on the previous day. Further, predictions are updated for user cohorts on their 2nd, 3rd and subsequent days since installing the game. With new data coming in daily (new payments, sessions, app launches and more) predictions are updated and improved daily as well.
  4. Continuous model improvement. New data is added daily to the training set and the quality of the model is continually evaluated. If its accuracy declines, we re-train the model with new sample weights. Extra training also takes place if there has been a sharp change in project metrics (usually due to the introduction of new functionality or modes, or when entering a new market).
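The model selection in step 2 can be sketched as follows. The article does not specify how the MAPE + sMAPE criterion was modified, so the example uses a plain sum of the two standard metrics; skipping zero actual values in MAPE, the expanding-window split and all names are assumptions made for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error.

    Zero actual values are skipped (an assumption); at least one
    non-zero actual value is required.
    """
    terms = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return sum(terms) / len(terms)


def smape(actual, predicted):
    """Symmetric MAPE; pairs where both values are zero count as no error."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        terms.append(0.0 if denom == 0 else 2.0 * abs(p - a) / denom)
    return sum(terms) / len(terms)


def selection_score(actual, predicted):
    """Unmodified MAPE + sMAPE; lower is better."""
    return mape(actual, predicted) + smape(actual, predicted)


def pick_best_model(actual, candidates):
    """Return the name of the candidate with the lowest score.

    `candidates` maps a model name to its list of predictions.
    """
    return min(candidates, key=lambda name: selection_score(actual, candidates[name]))


def expanding_window_splits(n_periods, min_train):
    """Yield (train, test) index lists for time-series cross-validation.

    Each fold trains on all periods before the test period, so the
    model is never validated on data older than its training data.
    """
    for test_idx in range(min_train, n_periods):
        yield list(range(test_idx)), [test_idx]


actual_ltv = [100.0, 200.0]
candidates = {
    "random_forest": [110.0, 180.0],
    "linear": [150.0, 100.0],
}
best = pick_best_model(actual_ltv, candidates)
folds = list(expanding_window_splits(n_periods=5, min_train=2))
```

In practice, the score would be averaged over the folds for each model and each prediction day before choosing a winner.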

Conclusion

Developed at MY.COM, the LTV prediction model helps the company optimize its game promotion budget, saving up to 20%. Previously, money was spent on acquiring non-paying or irrelevant users, which was usually discovered only several weeks later when analyzing losses. In addition, up to 15% of the UA department's time was saved: previously, employees manually analyzed traffic quality based on 2 to 4 weeks of data. As a result, predictive analytics allowed MY.COM to significantly improve the effectiveness of its marketing.

Tags: LTV gaming predictive analytics