MY.COM Group's portfolio includes a wide variety of mobile games; the company develops them, promotes them and acquires users for them. An important part of this process is assessing traffic quality. The earlier a judgement can be made, the better the budget can be allocated. Reliable information about traffic quality makes it possible to shut off inefficient channels and focus on those that bring in the most loyal and highest-spending users.
To minimize costs and increase monetization, the gaming analyst teams, together with the myWidget recommendations service, created a machine-learning-based predictive analytics tool built on the MyTracker service. For a number of MY.COM gaming projects (Juggernaut Wars, Evolution, Hawk, Hustle Castle), models were developed that predict key metrics such as LTV (lifetime value) for any sample of players. The predictions are based on data collected within 2-5 days after the game has been installed.
The graph below shows a report on prediction quality prepared for the media buying team. It shows that the predictions were accurate and that the decisions based on them were justified.
The new predictive analytics tool is meant to predict the most complex metric, 90th-day LTV, for every individual user. Because predictions are made per user, any group of users, even the most exotic one, can be examined without significant computing load.
The decision to shut off inefficient channels should be made as early as possible and be based on accurate predictions. However, the costs of underestimating and overestimating a user are not the same, so we worked on the assumption that it is better to underestimate a user than to overestimate one.
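One common way to encode a preference for underestimation is an asymmetric error measure that penalizes overestimates more heavily than underestimates. The sketch below only illustrates that idea; the article does not say how the preference was actually implemented, and the function name and the `over_penalty` value are assumptions.

```python
def asymmetric_squared_error(actual, predicted, over_penalty=3.0):
    """Mean squared error in which overestimates are weighted more heavily.

    over_penalty is a hypothetical weight; values above 1.0 make
    overestimating a user's LTV costlier than underestimating it.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    total = 0.0
    for a, p in zip(actual, predicted):
        error = p - a
        weight = over_penalty if error > 0 else 1.0  # error > 0 means overestimate
        total += weight * error * error
    return total / len(actual)


# Two predictions that miss the same true LTV of 10.0 by the same amount.
loss_over = asymmetric_squared_error([10.0], [12.0])   # overestimate by 2
loss_under = asymmetric_squared_error([10.0], [8.0])   # underestimate by 2
```

With the same absolute miss, the overestimate receives the larger loss, so a model selected or tuned by this measure would lean towards conservative LTV values.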
The predictions were based on data about users' time spent in the game, their payments and device information, as well as socio-demographic data.
Because the proportion of paying users in games is low (anywhere between 1% and 10%, depending on region, genre and monetization model), the large number of non-paying users meant that LTV predictions were not always accurate. The problem was solved by sampling non-paying users and adding a "probability of paying" indicator. During cross-validation, the "paying vs. non-paying" classifier showed rather good results (70% recall and 95% ROC AUC), and the predicted probabilities correlated with payments.
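The sampling step can be sketched as follows: keep every paying user and only a random fraction of the non-paying ones, so that the training set is less dominated by zero-revenue users. This is a minimal illustration, not the team's actual pipeline; the `revenue` field, the `keep_fraction` value and the toy population are assumptions.

```python
import random


def downsample_non_payers(users, keep_fraction, seed=0):
    """Keep every paying user and a random fraction of non-paying users."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    payers = [u for u in users if u["revenue"] > 0]
    non_payers = [u for u in users if u["revenue"] <= 0]
    n_keep = round(len(non_payers) * keep_fraction)
    return payers + rng.sample(non_payers, n_keep)


# Toy population (hypothetical): 5 payers out of 100 users, a 5% paying share.
users = [{"user_id": i, "revenue": 4.99 if i < 5 else 0.0} for i in range(100)]
training_set = downsample_non_payers(users, keep_fraction=0.2)
paying_share = sum(u["revenue"] > 0 for u in training_set) / len(training_set)
```

In this toy case the paying share in the training set rises from 5% to roughly 21%. A classifier trained on such a sample sees more payers than the real population contains, so its raw probabilities would typically need recalibration before being read as true payment rates.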
A separate issue was handling the numerous categorical features. Here, we resorted to mean encoding: e.g., for every country we calculated the share of paying users from that country based on historical data.
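As an illustration of mean encoding, the sketch below replaces each country with the share of paying users observed for it in historical data. The optional smoothing towards the global mean (`prior_weight`) is an addition that the article does not mention, and the sample data is made up.

```python
from collections import defaultdict


def mean_encode(categories, targets, prior_weight=0.0):
    """Return ({category: encoded value}, global mean of the target).

    With prior_weight=0.0 the encoded value is the plain per-category mean.
    A positive prior_weight (an assumption, not from the article) pulls
    categories with few observations towards the global mean.
    """
    if len(categories) != len(targets) or not targets:
        raise ValueError("categories and targets must be non-empty and equal in length")
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        c: (sums[c] + prior_weight * global_mean) / (counts[c] + prior_weight)
        for c in counts
    }
    return encoding, global_mean


# Hypothetical history: one row per user, paid = 1 if the user ever paid.
countries = ["US", "US", "DE", "DE", "DE", "BR"]
paid = [1, 0, 0, 0, 1, 0]
encoding, fallback = mean_encode(countries, paid)

# Countries never seen in the history fall back to the global paying share.
encoded_feature = [encoding.get(c, fallback) for c in ["US", "DE", "FR"]]
```

To avoid leaking the target into the feature, the encoding would normally be computed on historical or out-of-fold data only, never on the rows being predicted.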
As a result, we developed an algorithm which predicts LTV as well as the probability of any user making a payment.
The graph below shows the distributions of predicted LTVs (from a Random Forest model) and actual LTVs for a cohort of users of a gaming application.
We've also considered predicting LTV at the level of user segments (e.g. 'app - country' or 'app - campaign - partner') using linear or more complex models: gradient boosting, Random Forest, Poisson regression and others. The only data needed was the dynamics of aggregate payments during the first week of user interaction with the app. Thus, we have an LTV prediction for a user segment at our disposal and can make sound decisions on where a given acquisition channel should be used. During cross-validation for different apps, user segment LTV was predicted with a relative error between 0 and 25%.
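The simplest variant of the segment-level approach, a linear model, can be sketched as a one-feature least-squares fit from first-week revenue per user to 90th-day LTV, evaluated by relative error. The feature choice, the function names and all numbers below are illustrative assumptions rather than the models or data actually used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_error(actual, predicted):
    """Absolute prediction error as a fraction of the actual value."""
    return abs(predicted - actual) / abs(actual)


# Hypothetical historical segments:
# first-week revenue per user and the eventual 90th-day LTV per user.
week1_revenue = [0.10, 0.20, 0.40, 0.80]
ltv_day90 = [0.35, 0.62, 1.30, 2.45]
slope, intercept = fit_line(week1_revenue, ltv_day90)


def predict_segment_ltv(first_week_revenue):
    """Predict 90th-day LTV per user for a new segment."""
    return slope * first_week_revenue + intercept


# Check the prediction for a held-out segment whose true LTV is assumed known.
error = relative_error(actual=1.00, predicted=predict_segment_ltv(0.30))
```

Averaging `relative_error` over held-out segments in each cross-validation fold gives the kind of figure quoted above.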
The general outline of how the predictive model works is as follows:
The LTV prediction model developed at MY.COM helps the company optimize its game promotion budget, saving up to 20%. Previously, money was spent on acquiring non-paying or irrelevant users, which was usually discovered only several weeks later when analyzing losses. In addition, up to 15% of the UA (user acquisition) department's time was saved: previously, employees manually analyzed traffic quality based on 2 to 4 weeks of data. As a result, predictive analytics allowed MY.COM to significantly improve its marketing effectiveness.