How to spot mobile ad fraud

At myTracker, we often encounter fraud in mobile ads, and today we're going to talk about how to detect it and protect your budget. Mobile fraud is rampant in advertising, affecting any and all mobile app owners who seek to promote their product and attract users.

Why is mobile ad fraud such a pressing issue? According to Forrester’s recent survey, about a third of enterprise marketers believe that at least 40% of their ad budgets are exposed to fraud. Given the often massive ad spend of modern businesses, the damage can easily run into millions of dollars. So, we have to combat fraud to prevent your marketing budget from being siphoned away to crooks.

Let’s start with a definition of mobile ad fraud. We’ll use this term to describe various types of fraud, such as click hijacking, impression flooding, emulated devices and other schemes that distort the results of ad campaigns and drain your budget.

There are certain user actions for which you pay money to the ad network. These can be divided into two types:

  • pre-install events such as ad impressions or banner clicks;
  • post-install events such as the app install itself or desired in-app actions (e.g. levelling up).

The tricky thing about pre-install events is that the only source of information about users’ actions in ad networks is the ad networks themselves. If a user doesn't end up installing your app, there’s no way for you to find out about their actions directly, which means you pay the ad network based on the information it chooses to share with you. So, the simplest fraud here is to feed customers fabricated click or view data and get paid for it.

Bot farms are another example of mobile ad fraud that immediately comes to mind. This is a more refined and sophisticated technique targeting expensive conversion events, such as payments or other in-game actions. Mobile advertising is a fast-growing business, with ad networks ramping up capabilities to meet ever-increasing advertiser needs. The defence against the previous fraud method is to pay only for events you can track yourself (e.g. app installs), and larger ad networks even let you define complex billable events (e.g. in-game payments or continuous playtime of over N hours). Bot farms, however, can simulate the behaviour patterns of ordinary users and perform the target actions of your ad campaign. Modern bots can disguise themselves by emulating screen taps and other user actions, as well as the specs of real mobile devices.

Now let’s focus on some ways to detect such fraud. To begin with, we’ll try to look for malicious ad networks seeking to cheat us by selling air, i.e. simulated data, or stealing users from other ad networks. We assume that the ad network can provide us with click and view data.

CTtI – Click Time to Install

The idea here is to consider the time elapsed between the last ad banner click before the app is installed and the install itself. Let’s make it clear right away that we count the first app launch as the install. A whole sequence of actions takes place between the two events, as the user:

  • clicks on an ad link to the store page;
  • possibly evaluates the app itself;
  • clicks the install button;
  • enters a password, if necessary;
  • downloads the application files;
  • installs and launches it.

All these actions take time, and in most cases, that time can't be significantly reduced.
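To make the metric concrete, here's a minimal sketch of the CTtI computation. The `clicks`/`installs` schema and the `ctti_deltas` name are illustrative, not part of any real SDK; real attribution data would be richer.

```python
from datetime import datetime

def ctti_deltas(clicks, installs):
    """Click Time to Install per user: seconds between the LAST ad click
    preceding the first app launch and that launch.

    clicks   -- iterable of (user_id, click_timestamp) pairs
    installs -- dict mapping user_id -> first_launch_timestamp
    """
    last_click = {}
    for user, ts in clicks:
        install_ts = installs.get(user)
        # only clicks that happened before the install can be attributed to it
        if install_ts is not None and ts < install_ts:
            if user not in last_click or ts > last_click[user]:
                last_click[user] = ts
    return {u: (installs[u] - ts).total_seconds()
            for u, ts in last_click.items()}
```

Users who clicked but never installed simply drop out of the result, matching the point above: for them, we only have the ad network's word.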

Let’s assume that the time spent on this sequence of actions should follow a log-normal distribution. We tested this on organic traffic, where users install the app on their own, without being redirected from an ad. Organic traffic usually sees no fraud because nobody gets paid for it. Here's an example of a log-normal distribution.

The X-axis shows the time elapsed between the last click and the app install (delta), and the Y-axis shows the proportion of users with such delta.

Users’ deltas vary with internet speed (i.e. download time) and with interest (a user may open the store page but put off the download). Importantly, a user who clicked on an ad is most likely interested in the app and wants to download it. Thus, the peak of the distribution skews to the left, with a long tail attached (due to slow internet speeds, failure to download the app right away, etc.). Now let’s move on to the most important question: how do we separate the sheep from the goats? This is a bit of a conundrum, so let's try to untangle it.

First, have a look at the chart, and then we'll explain how we understand which traffic is fraud-ridden and which is not.

The X-axis shows the time elapsed between the last click and the app install (delta), and the Y-axis shows the proportion of users with such delta.

The quickest test to help clear up the situation is as follows: “We consider any traffic beyond a log-normal distribution to be fraud-ridden.”
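As a sketch of this test, a log-normal can be fitted to organic deltas with SciPy. The deltas here are synthetic; in practice they would come from your own attribution logs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for organic CTtI deltas (seconds): a peak near the
# left with a long right tail, which is what a log-normal produces.
organic = rng.lognormal(mean=np.log(40), sigma=0.6, size=10_000)

# Fit a log-normal with the location pinned at zero.
shape, loc, scale = stats.lognorm.fit(organic, floc=0)
print(f"median ~ {scale:.1f} s, log-sigma ~ {shape:.2f}")

# Kolmogorov-Smirnov statistic as a rough goodness-of-fit check
# (small values mean the sample tracks the fitted curve closely).
ks = stats.kstest(organic, "lognorm", args=(shape, loc, scale))
print(f"KS statistic: {ks.statistic:.4f}")
```

The fitted parameters then serve as the organic baseline against which paid traffic is compared.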

Things are not that simple, though: we don’t always have a multi-million sample, and our sample is not always free of outliers. So, the first thing to do is clean the traffic of outliers that could’ve resulted from incorrect calculations or glitches in the data-gathering system (negative values on the X-axis being the most trivial case). Once we have a more or less clean sample, we can plot a histogram, assess how closely it resembles a log-normal distribution, and then flag the traffic that falls outside it. If too little traffic remains, we can try to expand the sample by adding similar traffic or by widening the time interval in question.
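The cleanup-then-compare step could be sketched like this, with the share of suspiciously fast installs measured against what the organic fit predicts. The function and argument names are illustrative.

```python
import numpy as np
from scipy import stats

def excess_fast_installs(paid_deltas, organic_fit, threshold_s=10.0):
    """Share of paid installs with a delta under `threshold_s` seconds,
    minus the share the organic log-normal fit predicts.

    paid_deltas -- CTtI deltas (seconds) for one paid traffic source
    organic_fit -- (shape, loc, scale) tuple from fitting organic traffic
    A large positive excess hints at simulated clicks injected
    right before installs.
    """
    deltas = np.asarray(paid_deltas, dtype=float)
    deltas = deltas[deltas > 0]  # drop negative/zero deltas: logging glitches
    shape, loc, scale = organic_fit
    expected = stats.lognorm.cdf(threshold_s, shape, loc, scale)
    observed = np.mean(deltas < threshold_s)
    return observed - expected
```

Clean traffic should yield an excess near zero; a traffic source where a third of installs land within a few seconds of the click will stand out immediately.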

VTtI – View Time to Install

This metric is very similar to CTtI. This time we consider the interval between the last ad banner view and the app install, rather than between the last click and the install. We believe this distribution should also be close to log-normal, though the minimum time to install grows, since users who only viewed the ad (without clicking) need time to find the app in the store before downloading it. Many rogue advertisers try to take credit for ad views secured by other ad networks, and such metrics give them away quickly.

The X-axis shows the time elapsed between the last ad view and the app install (delta), and the Y-axis shows the proportion of users with such delta.

CTtC – Click Time to Click

Here we look at the time between a user's last and second-to-last clicks.

As we mentioned for the VTtI metric, many rogue advertisers try to steal clicks (or views) from other ad networks. This metric is quite helpful in detecting such fraud, so going forward we can reallocate ad budgets with a clear understanding that some of our ad networks don’t bring in new users, but simply steal those who would've come to us anyway. If a user clicked on an ad, they were most likely interested in the app, went to the store and started downloading it. It would be odd to see two clicks on different ad banners within a very short time, and odder still if they came from two different ad networks. Even if a user didn’t like the store page, they most likely went back to the game or website where they were spending time before the click, rather than switching to other websites. So, this metric is also expected to follow a log-normal distribution and should be handled in much the same way as CTtI.

The X-axis shows the time elapsed between a user's last two clicks (delta), and the Y-axis shows the proportion of users with such delta.

It's important to note for this metric that we consider the ad network providing us with the last click to be the one that's trying to steal our user. It shouldn't be used for ad views, which are usually independent as users can visit multiple websites or watch a lot of ads in their apps.
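A minimal CTtC computation, under the same illustrative assumptions as before (the `cttc_seconds` name and per-user click list are hypothetical):

```python
from datetime import datetime

def cttc_seconds(click_times):
    """Click Time to Click: seconds between a user's last and
    second-to-last ad clicks, or None if the user has fewer than
    two clicks.

    click_times -- timestamps of all ad clicks attributed to one user
    """
    if len(click_times) < 2:
        return None
    ordered = sorted(click_times)
    return (ordered[-1] - ordered[-2]).total_seconds()

# Two clicks three seconds apart -- suspicious on its own, and even more
# so if the two clicks came from different ad networks.
print(cttc_seconds([datetime(2024, 5, 1, 12, 0, 0),
                    datetime(2024, 5, 1, 12, 0, 3)]))  # 3.0
```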

It's way harder to discover the theft of ad views, which is of lesser relevance in most cases because ad impressions cost very little these days as compared to other ad types. So, in our opinion, efforts to analyse recent impressions to keep track of stolen ad views are too costly with little or no profit in the end.

Conclusion

The last thing to note about these metrics is their applicable time interval. All three are worth examining only within roughly the first 10–15 seconds, because they are designed to flag events that happened implausibly quickly.
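Applying that window per ad network might look like this; the function name and the per-network dict layout are assumptions for the sketch.

```python
def fast_share_by_network(deltas_by_network, threshold_s=15.0):
    """Per ad network, the share of deltas (CTtI, VTtI or CTtC, in
    seconds) that fall inside the suspicious 0..threshold_s window.
    Networks whose share sits far above the organic baseline deserve
    a closer look."""
    return {
        network: sum(1 for d in deltas if 0 < d < threshold_s) / len(deltas)
        for network, deltas in deltas_by_network.items()
        if deltas  # skip networks with no attributed installs
    }
```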

The metrics described above target bad ad networks or marketing partners rather than individual fraudulent users. This is important to keep in mind when negotiating with ad networks or taking in-app action against fraudulent users. It also means we don’t have to determine which specific devices fall outside the log-normal distribution, which is a much bigger challenge with less obvious solutions.

This article opens a discussion on issues related to buying mobile ads. If you want to buy cleaner traffic and pay for real users, you can cut off a chunk of mala fide ad networks using the above metrics. Unfortunately, this is just the tip of the iceberg, and most ad networks know how to bypass these metrics, at least partially.

To take advantage of these metrics, you need to meet some requirements:

  • collect and store an array of events in your app;
  • integrate and collect data from marketing partners;
  • know how to handle attribution of app installs and other events;
  • have control of computing capacities and analyst time to collect all the data and implement the metrics described above.

We know that it's quite difficult to analyse data manually, which is why we provide a handy solution – Fraud Scanner. It automatically detects all types of mobile ad fraud and protects you throughout the entire life cycle of your app.