Google came out today with a click-fraud report, claiming that less than 2% of all clicks on both AdWords and AdSense are click-fraud. Shuman Ghosemajumder, Product Manager on Google’s Trust and Safety Team, claims that this is true because of Google’s 4-step filtering system.
Prevent Adsense Click Fraud
According to Ghosemajumder, below is an accurate picture of click-fraud and invalid clicks based on Google’s internal data:
Google uses a 4-step filtering system, as pictured below:
In Ghosemajumder’s words:
The first layer is purely automatic and is used to filter clicks from both search and AdSense partners (contextual ads). This filter is able to detect invalid clicks in real-time, with the goal of removing them before they ever show up in the AdWords console.
The second and third layers are aimed at filtering only AdSense clicks. The second layer is what Google calls its flagging system and is an automatic process to remove invalid clicks from the AdWords system. The third layer of filtering is a manual review process with more than two dozen Google employees manually reviewing and removing any suspicious clicks.
Google’s goal is to have the first three layers of filtering identify 100% of all invalid and fraudulent clicks. Those clicks that manage to escape Google’s filters are what causes many advertisers to raise concerns and has spawned the growth of many so-called click fraud detection companies. The fourth layer of click fraud detection falls to these advertisers and detection companies and is what Google calls requested investigations.
Ghosemajumder goes on to explain that not every suspected case of click-fraud is bona fide click-fraud; for example, multiple clicks from the same IP address might come from a corporate site. He concludes that the current estimates of 20% or more click-fraud are inflated and untrue.
Google & Measurement System Analysis (MSA)
I have several friends working at Google; I believe that they hire smart people. But a blanket percentage for this type of phenomenon is insufficient. These types of scenarios are subject to intra-subject and inter-subject variability. That is, how does one know that a click is valid or invalid? That question alone points to the need for additional metrics such as specificity and sensitivity:
Specificity = [(number of true negatives) / (number of true negatives + number of false positives)]
The specificity metric gives us an idea of how accurate the measurement tool is on valid clicks: in this case, how reliably it declares that a valid click is not an invalid click. Without that metric, the blanket percentages declared by Google don’t have much meaning.
Another metric that is important to know is the sensitivity of the test:
Sensitivity = [(number of true positives) / (number of true positives + number of false negatives)]
The sensitivity metric gives us an idea of the accuracy of the test for demonstrating true click-fraud. Again, without this measurement, blanket percentages purported by Google don’t carry much meaning.
The two measurements above give rise to 2 more metrics that will give us a better picture into the true Click-fraud rate and the accuracy of the measurement system in question:
False Positive Rate = (Number of False Positives / Number of Negative Instances)
This metric gives us an idea of the proportion of negative instances that were incorrectly reported as positive. On the other side, we can also derive the following:
False Negative Rate = (Number of False Negatives / Number of Positive Instances)
That metric gives us an idea of the proportion of positive instances that were reported as negative. Below is a helpful table for reference:
Without the reporting of the 4 measurements above, it is truly difficult, if not impossible, for Google to claim very much.
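As a rough illustration, the four measurements can be computed from a confusion matrix built by auditing a click classifier against known ground truth. The sketch below is only that: the `ConfusionCounts` class, the helper names, and every count in `audit` are invented for illustration and are not Google's data. Here "positive" means an invalid click, and the false positive rate is taken over all negative instances, matching its description as the proportion of negative instances reported as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Audit counts; "positive" = invalid click, "negative" = valid click."""
    true_positives: int   # invalid clicks correctly flagged
    false_positives: int  # valid clicks wrongly flagged
    true_negatives: int   # valid clicks correctly passed
    false_negatives: int  # invalid clicks that slipped through

    @property
    def total(self) -> int:
        return (self.true_positives + self.false_positives
                + self.true_negatives + self.false_negatives)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of truly invalid clicks that the filter catches.
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ConfusionCounts) -> float:
    # Share of truly valid clicks that the filter lets through.
    return c.true_negatives / (c.true_negatives + c.false_positives)


def false_positive_rate(c: ConfusionCounts) -> float:
    # Share of valid clicks wrongly flagged; equals 1 - specificity.
    return c.false_positives / (c.false_positives + c.true_negatives)


def false_negative_rate(c: ConfusionCounts) -> float:
    # Share of invalid clicks that were missed; equals 1 - sensitivity.
    return c.false_negatives / (c.false_negatives + c.true_positives)


def flagged_rate(c: ConfusionCounts) -> float:
    # What a filter would *report* as invalid.
    return (c.true_positives + c.false_positives) / c.total


def actual_invalid_rate(c: ConfusionCounts) -> float:
    # What is *actually* invalid, known only from the audit.
    return (c.true_positives + c.false_negatives) / c.total


# Hypothetical audit of 10,000 clicks (numbers invented for illustration).
audit = ConfusionCounts(true_positives=150, false_positives=50,
                        true_negatives=9500, false_negatives=300)

for name, fn in [("sensitivity", sensitivity),
                 ("specificity", specificity),
                 ("false positive rate", false_positive_rate),
                 ("false negative rate", false_negative_rate),
                 ("flagged rate", flagged_rate),
                 ("actual invalid rate", actual_invalid_rate)]:
    print(f"{name}: {fn(audit):.4f}")
```

In this made-up audit the filter flags 2% of clicks while 4.5% are actually invalid, which is the kind of gap that only the sensitivity and false negative rate would reveal.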
Google, Click-Fraud, and The Liar’s Paradox
Because Google really didn’t present much today in terms of meaningful data to help the audience reach a conclusion about the accuracy of the measurement system or the true numbers that make up invalid clicks and click-fraud, Google is reduced to the Liar’s Paradox.
Following my previous post on axiomatizing majority rule, I present how Google’s claim today is a Liar’s Paradox:
With the absence of meaningful data, Google claims:
Statement One: 2% of clicks constitute click-fraud.
Statement Two: Statement One is False.
Statement Three: Statement Two is True.
Statement Four: Statement One is both True and False
Statement Five: Statement Four is a contradiction
QED
I’m stretching the Liar’s paradox here a little bit, but I do it to demonstrate that Google’s statement today, without the metrics I described above, presented us with nothing meaningful.
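To put a number on why a bare percentage says so little, here is a minimal sketch of the standard correction for an imperfect test. It assumes the reported rate equals sensitivity × true rate + (1 − specificity) × (1 − true rate) and solves for the true rate. The function name `implied_true_rate` and all sensitivity and specificity values below are assumptions chosen for illustration; nothing here comes from Google's report beyond the 2% figure.

```python
def implied_true_rate(observed_rate: float,
                      sensitivity: float,
                      specificity: float) -> float:
    """Back out the true invalid-click rate from a reported rate.

    Assumes: observed = sensitivity * p + (1 - specificity) * (1 - p),
    where p is the true rate. Solving for p gives the expression below.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # A filter no better than chance cannot identify the true rate.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_rate - (1.0 - specificity)) / denom
    # Clamp to a valid proportion; sampling noise or bad inputs can overshoot.
    return min(1.0, max(0.0, estimate))


reported = 0.02  # the 2% figure from the report

# Hypothetical sensitivities, with specificity assumed perfect for simplicity.
for assumed_sensitivity in (1.0, 0.5, 0.25, 0.1):
    p = implied_true_rate(reported, assumed_sensitivity, specificity=1.0)
    print(f"sensitivity={assumed_sensitivity:.2f} -> implied true rate={p:.1%}")
```

Under these assumptions, the same reported 2% is consistent with a true rate of 2% if the filters catch everything, and with a true rate of roughly 20% if they catch only one invalid click in ten. Without the sensitivity and specificity, there is no way to tell which.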
wioota says
I read Andy Beal’s post and baulked instantly. Fortunately I can focus on getting some work done tonight as you’ve brought some sober analysis to the table already. In general the blogosphere seems to give numbers levels of meaning previously unseen in other mediums – numbers are regularly quoted [without/incorrectly] even quoting the metric they quantify.
تور مالزی says
Hi Erin,
Thanks for this article, it makes a lot of sense, but nowadays many fraudsters use different tactics to avoid detection. When it comes to stopping attackers that use VPN and proxy software, I’d recommend using a third-party click-fraud protection service like Clickcease.com or improvely.com. These services even offer to automatically add the IPs to the exclusion list so the attacker can’t keep clicking your ads. I couldn’t find a better way to stop click-fraud these days.