MyTracker can now flag fraud in audience segments. Customers will have an easier time analyzing app traffic, and they can export fraudulent installs for further analysis or send them to their partners to request a refund.
In this guide, we will look at working with raw data and fraud using the Quick Installs metric.
With the help of raw data, we will plot the CTtI (Click Time to Install) metric, which shows the time elapsed between the last ad banner click before the install and the app install itself. Since many actions take place between these two events, this time cannot legitimately be abnormally short.
Before exporting installs and analyzing them for fraud, we should create two segments in the MyTracker interface using the new functionality. This will give us two groups of installs – those flagged as Quick Installs and “clean” installs based on this metric.
In the top navigation menu, go to “Reports” → “Segments”. Click “Add” on the page that opens. Fill in the segment name, select “Devices” as the audience type, and pick the account, projects, and applications.
Then add the Quick Installs metric.
Save the segment and use its idSegment to export data with the help of RAW API.
For this test, we will also create a similar segment with the Quick Installs metric, but this time activate “Exclude from segment” to single out clean traffic.
To analyze fraud data, we will need to export several selectors.
For the dialogue with partners, device identifiers should be exported for both iOS and Android (IDFA/GAID).
To see all parameters that can be exported from MyTracker, go to Request to export raw data in the Documentation section.
So, we have set up two segments – one with the Quick Installs metric and one with an inverted metric, and downloaded two files with saved installs for fraudulent and clean devices.
In our example, all identifiers and partner titles (except Organic) have been changed.
After extracting the archives, we open the files in Excel and split the data into columns, which gives us two tables containing fraudulent and clean installs respectively. Now, we create one more column in each table, name it isFraud, and fill it with ones for fraudulent installs and zeroes for clean installs.
To analyze the data, we need to combine the two tables into one containing both fraud and fraudless installs. Just copying the data over to one table will do the trick.
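The labeling and merging steps above can also be done outside Excel. Here is a minimal pandas sketch; the column names are assumptions, and tiny inline samples stand in for the two exported files:

```python
import pandas as pd

# In practice these would come from the two RAW API export files, e.g.:
#   fraud = pd.read_csv("quick_installs.csv")
#   clean = pd.read_csv("clean_installs.csv")
# Inline samples stand in for them here (column names are assumptions).
fraud = pd.DataFrame({"idDevice": ["a1", "a2"],
                      "idPartnerTitle": ["Partner1", "Partner2"]})
clean = pd.DataFrame({"idDevice": ["b1"],
                      "idPartnerTitle": ["Organic"]})

fraud["isFraud"] = 1   # installs flagged by the Quick Installs segment
clean["isFraud"] = 0   # installs from the inverted ("exclude") segment

# One combined table with both fraudulent and clean installs
installs = pd.concat([fraud, clean], ignore_index=True)
print(installs)
```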
Now, we need to calculate the click time to install to detect fraud.
But before that, some preparation is needed. First, we have to filter out installs that can't be meaningfully analyzed with the CTtI metric.
Uncheck Organic in the idPartnerTitle column's filter.
Uncheck 0 in the tsClick column's filter.
Check “Post click” only in the idAdEventTypeTitle column's filter.
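The three filter steps above translate to a pandas sketch like this (the column names mirror the export fields mentioned in the text; the sample rows are made up):

```python
import pandas as pd

# Sample data standing in for the combined installs table
installs = pd.DataFrame({
    "idPartnerTitle":     ["Organic", "Partner1", "Partner1", "Partner2"],
    "tsClick":            [0, 0, 1700000000, 1700000100],
    "idAdEventTypeTitle": ["Post view", "Post click", "Post click", "Post click"],
})

# Keep only installs that can be analyzed for a short CTtI:
# no Organic traffic, no zero click timestamps, post-click attribution only
filtered = installs[
    (installs["idPartnerTitle"] != "Organic")
    & (installs["tsClick"] != 0)
    & (installs["idAdEventTypeTitle"] == "Post click")
]
print(filtered)
```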
For iOS apps, we need to subtract the tsClick value from the tsEvent value.
For Android apps (like in our example), there are two ways:
In our case, we can do it with the formula =C2-IF(F2<>0,F2,E2).
Input the calculated value in the delta column.
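The same calculation can be sketched in pandas. The column meanings here are assumptions based on the spreadsheet layout: the event timestamp in column C, the click timestamp in column E, and an alternative click timestamp in column F (called tsClickAlt below):

```python
import numpy as np
import pandas as pd

# Assumed columns: tsEvent (C), tsClick (E), tsClickAlt (F)
df = pd.DataFrame({
    "tsEvent":    [1700000100, 1700000500],
    "tsClick":    [1700000080, 1700000440],
    "tsClickAlt": [0,          1700000490],
})

# Excel equivalent: =C2-IF(F2<>0,F2,E2)
# Use the alternative click timestamp when present, tsClick otherwise
df["delta"] = df["tsEvent"] - np.where(
    df["tsClickAlt"] != 0, df["tsClickAlt"], df["tsClick"]
)
print(df["delta"].tolist())
```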
We are only looking at quick installs, so we need to limit the delta values to arrive at our chart.
To filter out all installs that occur more than an hour after the click, set the filter to show only deltas less than or equal to 3600 seconds (the exact cutoff is specific to each app).
To detect fraud in an app, we need to check each partner's traffic for fraud.
If we look at the entire app, the data will be jumbled, making it impossible to identify the fraud source. This happens because different partners may take different approaches to advertising. Besides, concerns about traffic quality are raised with each partner separately.
In our example, we will keep only Partner1.
Select the delta column and create a histogram, which should give us the following chart:
It shows abnormal timing between ad banner click and install – the first peak (0–25 seconds).
This is CTtI fraud for Partner1.
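The delta cutoff, the per-partner filter, and the histogram above can be sketched together in Python as well (column names and sample values are made up for illustration):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Assumed columns: delta (CTtI in seconds) and idPartnerTitle
df = pd.DataFrame({
    "delta": [5, 12, 20, 24, 300, 900, 2500, 7200],
    "idPartnerTitle": ["Partner1"] * 7 + ["Partner2"],
})

# Limit to the first hour after the click and to one partner's traffic
subset = df[(df["delta"] <= 3600) & (df["idPartnerTitle"] == "Partner1")]

# Plot the CTtI distribution; an early peak suggests abnormal timing
subset["delta"].plot.hist(bins=30, title="CTtI distribution, Partner1")
plt.xlabel("seconds between click and install")
plt.savefig("ctti_hist.png")
```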
You can use the fraud installs segment and regularly export such installs to improve your interaction with advertising partners.
Unfortunately, even after all this work there is still undetected fraud in the app, for several reasons.
Even in our example, we did not detect all fraud for Partner1. Below is a chart from our internal system. It shows that some installs have a different delta, but are still flagged as fraud – this is because in Fraud Scanner, our anti-fraud system, an abnormal peak is detected automatically following a comprehensive analysis, as opposed to manual detection by an analyst. This gives us a much more accurate result.
But the hurdles don't stop there. Here is a chart where two peaks and an abnormal distribution are clearly visible across many installs.
In practice, such cases are relatively rare, either because the number of installs is small or because the distribution is more even. With nothing but a histogram, weeding out fraud becomes much harder. This is where Fraud Scanner really comes in handy, as it can detect fraud even in small cohorts.