> "Online ad delivery may take website popularity and keywords in mind, but may have little or no knowledge of audience exclusivity."
Except most online ad delivery doesn't work that way anymore. Most ads are bought and sold on the exchanges and are targeted based on user cookie data rather than broad demographics or keywords (most notably retargeting, which has experienced hypergrowth over the last few years). Using ad delivery examples from 2011 is inane, as ad delivery and targeting have both changed massively since then.
I'm not sure why you are getting down-voted. I was surprised when reading the article at how inaccurate it was, as almost all major systems run online auctions based on cookie data.
I spoke to several CEOs and chief data scientists of the companies that run or purchase these ads. They go almost entirely off your browsing history, because nothing shows intent to buy more than one's browser history. This is important to think about because they have to buy the ad regardless of whether you buy or not, but they only collect money if you make a purchase.
I'm just going to make up an example, so don't get too caught up in the details, but say rich people like to use cameras more than poorer people. Naive logic would make one think you should know the demographics of the people visiting and then send them a camera ad if they 'have money'. However, just because a rich person 'may' like cameras, it probably doesn't mean they are about to buy one. Jumping around to certain sites, on the other hand, does show a lot about your intent regardless of demographics.
To be clear, they don't have your whole browsing history, just (at best) the history of sites visited in the same browser that also happened to use ads from the same network. And some ads are CPL/CPA where the advertiser only pays when the user takes some action, but many are still CPM and the advertiser is paying for impressions.
But really, I think the dirty little secret of these ad networks is that the targeting doesn't actually work all that well, despite all the data they should theoretically have about you.
Also, I'd argue search terms are a much better indicator of intent to buy than browsing history or demographics -- one of the reasons Google ads do so much better than Facebook ads, even when Facebook knows much more about you.
There are companies that re-populate cookies. The company formerly known as Rapleaf did this. You use computer A, then you go to device B and log in to some random site that is a Rapleaf partner, and they drop a cookie connecting you to your other device based on your email address. Not particularly shocking anymore, certainly not as Facebook's new cookieless multi-device ad network rolls out.
It is going to be very interesting 10 years or so out when we have physical billboards running facial recognition delivering targeted ads. Perhaps 20 years out, a very customized AI salesperson creates a demographic profile and knows exactly what to say to maximize the probability of us buying.
The curious thing about advertising law is that this big data & super targeting stuff has become critical for political campaigns. Laws which throttle back on the privacy encroachment would likely neutralize a lot of the effectiveness of a modern US Presidential campaign.
Cookie-based ads tend to show me things I've already made a decision about. Often, it's something I've thought about but decided against, or sometimes even things I've already spent money on (which has to be the ultimate waste of an ad).
In any case, at the very least, I've already formed a preference on the thing, whereas with a search I typically have no leaning in mind and am thus much more influenceable.
Exactly, you nailed this. It is a lot more about your cookies than anything else right now. That might change in the future, but the impressions most valuable to the advertiser are still those very basic retargeting impressions.
If it had been published by an advocacy group, [the article]
would have simply been the latest in an unending series of
baseless, pseudo-scientific attacks on online advertising
over imagined harms. But coming from the chief technologist
of the advertising industry’s primary regulator and
enforcement agency, the post takes on a far more coercive
tone.
I think the FTC makes a valid point, and it's an interesting topic of discussion. I realize, though, that they don't argue that advertisers in, say, Gourmet don't really target people going to Walmart.
Is the FTC alleging the online ad equivalent of redlining?
In part, these sorts of ads are determined by a designed bid/match threshold, so why is Omega Psi Phi not accepting better matches on the yield side? (Also remember, if they deny the ad placement, they deny themselves ad revenue.)
Also, this raises a fundamental oddball question: being part of a protected group could be correlated with, but not causative for, the targeting on that site, but not on other sites. For example, for the lower quality credit cards, targeting could be tied to credit scores, and there could be a higher percentage of people in these protected groups showing up on these sites (but not on other sites that these protected groups visit that don't have this correlation with low credit scores, which is the actual targeting criterion).
You've brought up an interesting point: correlation vs causation. One of the issues in discrimination contexts today is disparate impact. For example, we know that a bank couldn't say, "we don't want to lend money to black people." Race is a protected class and that's discrimination against a protected class. What if the bank said, "we don't want to lend money to people in cities"? Now we're into disparate impact land. Racial minorities are often more concentrated in cities, and this policy could have an adverse impact on certain groups along racial lines. The kinds of questions that are then often asked are things like, "is there a business purpose?" and "is there no tighter indicator that the bank could use for their business purpose that wouldn't have the same disparate impact?"
Ultimately, disparate impact is about not using indicators that have a racial bias when better indicators exist. The bank might say, "we use city residence because people in cities usually have a history of missed debt payments." Regulators could retort, "then why not just use an individual's history of made and missed debt payments?" Disparate impact isn't about preventing the use of all indicators that correlate with a protected class. However, companies often use indicators that are of dubious predictive power (things where one can't show a significant correlation with the bad outcome the company wants to avoid, but can show a strong correlation with a protected class).
Since the article is about analytics, one can imagine building a Bayesian model and a few test subjects to put through it. We could think about features that are probably predictive, like missed payments. We could also have features like height, which should be nonsense. For the sake of this silly model, let's say that one racial group is shorter and is a greater credit risk. You create your "height < 6ft" and "has > 2 missed payments" Bernoulli variables and start training your model on your training set. You find that "has > 2 missed payments" is very predictive, but you also find that "height < 6ft" is predictive. But there's a certain thought that might go through one's head: "um, we know that the feature is correlated with a protected class and there's no good, logical reason to use it unless you're trying to create a proxy for that protected class so you can discriminate."
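The height example can be made concrete with a toy joint distribution. This is only a sketch: every probability below is made up for illustration, and it enumerates exact conditional probabilities rather than training a real model. Height is wired to depend on group membership alone, so any predictive power it shows is borrowed from the group.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
P_GROUP = {0: 0.5, 1: 0.5}          # group 1 is the shorter, higher-risk group
P_SHORT = {0: 0.5, 1: 0.9}          # P(height < 6ft | group)
P_DEFAULT = {0: 0.1, 1: 0.3}        # P(default | group)
P_MISSED = {False: 0.1, True: 0.8}  # P(has > 2 missed payments | default)


def bern(p, value):
    """Probability that a Bernoulli(p) variable equals `value`."""
    return p if value else 1.0 - p


def joint():
    """Yield (group, short, default, missed, probability) for every outcome."""
    for g, s, d, m in product((0, 1), (False, True), (False, True), (False, True)):
        p = (P_GROUP[g] * bern(P_SHORT[g], s)
             * bern(P_DEFAULT[g], d) * bern(P_MISSED[d], m))
        yield g, s, d, m, p


def p_default_given(**cond):
    """P(default | conditions), e.g. p_default_given(short=True, group=1)."""
    num = den = 0.0
    for g, s, d, m, p in joint():
        row = {"group": g, "short": s, "missed": m}
        if all(row[k] == v for k, v in cond.items()):
            den += p
            if d:
                num += p
    return num / den


# Marginally, "short" looks predictive of default...
print(p_default_given(short=True), p_default_given(short=False))
# ...but within a group it adds nothing: it is only a proxy for the group.
print(p_default_given(short=True, group=1), p_default_given(short=False, group=1))
# Missed payments are informative in their own right.
print(p_default_given(missed=True), p_default_given(missed=False))
```

With these made-up numbers, the default rate is about 0.23 for short people versus about 0.13 for tall people, yet it is 0.3 for both once you condition on group 1. That is the pattern a regulator would read as a proxy rather than a genuine signal.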
> "Disparate impact isn't about preventing the use of all indicators that correlate with a protected class. However, companies often use indicators that are of dubious predictive power (things where one can't show a significant correlation with the bad outcome the company wants to avoid, but show a strong correlation with a protected class)."
This is done in a LOT of machine learning settings - the vast majority of variables used in classification/regression scenarios fall into this category. Most machine learning consists of turning a large number of nearly insignificant correlations into a single big correlation.
Consider category A, which has features x[i] located in the ball of radius 100 around the point [0,0,...,0] in N dimensional space. Category B has features x[i] located in the ball of radius 100 around the point [1,1,...,1]. Any individual feature is useless - if it's from category A, it'll be located in [-100,100], whereas if it's from category B it'll be located in [-99,101].
On the other hand, the classifier sum(x) > N/2 works perfectly once N is large enough. That's because in N dimensions, the distance between [0,0,...,0] and [1,1,...,1] is sqrt(N) (proof: use the Pythagorean theorem), so the two balls stop overlapping once sqrt(N) > 200, i.e. N > 40,000.
That's an oversimplified example, but I hope it illustrates the point.
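For anyone who wants to check the geometry, here is a minimal Python sketch of the worst case. The function names are mine, not from any library. It uses the fact that sum(x) over a ball is extremised along the diagonal direction [1,...,1]/sqrt(N), so the sum moves at most radius * sqrt(N) away from its value at the centre.

```python
import math


def sum_range(centre_value, radius, n):
    """Min and max of sum(x) over the ball of `radius` around [c, ..., c] in n dims."""
    centre_sum = centre_value * n
    spread = radius * math.sqrt(n)  # largest change in sum(x) inside the ball
    return centre_sum - spread, centre_sum + spread


def separable_by_sum(radius, n):
    """True if the sum(x) ranges of category A (centre 0) and B (centre 1) are disjoint."""
    _, a_max = sum_range(0.0, radius, n)
    b_min, _ = sum_range(1.0, radius, n)
    return a_max < b_min


def classify(x):
    """Label a point 'B' if its coordinate sum is past the midpoint n/2, else 'A'."""
    return "B" if sum(x) > len(x) / 2 else "A"


n = 50_000
worst_a = [100 / math.sqrt(n)] * n      # point of A pushed furthest toward B
worst_b = [1 - 100 / math.sqrt(n)] * n  # point of B pushed furthest toward A
print(classify(worst_a), classify(worst_b))  # prints "A B"
print(separable_by_sum(100, 100), separable_by_sum(100, 50_000))  # prints "False True"
```

Any single coordinate still ranges over [-100,100] for A and [-99,101] for B no matter how large N gets. With radius 100, the sum(x) ranges only pull apart once sqrt(N) > 200, that is, N > 40,000.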
It's a bit scary that the FTC wants to start regulating feature engineering. Will they also take a position on Bayesian classifiers vs geometric/vector space methods?
That's part of a larger set of questions about protected classes and targeting. We don't know until we see the full campaign and targeting criteria. RTB makes this even harder, as do yield management issues.
This is but a dance. If we don't know all the criteria of who is asking for what at a given time, there's no telling whether the FTC is right.
Some complete tangents into the data presentation/slicing of this article.
Figure 3 is a horrible graph, a perfect illustration of why scatterplots suck! Asians completely overwhelm everything else due solely to the fact that they were plotted last and because of the huge space filling dots. Not to mention that because of all the overlap, density is nearly invisible.
Secondly, the "Asian" category is clearly nonsensical here, given that the highly exclusive Asian domains were indiatimes.com and baidu.com. It should have taken the author about a second to recognize that maybe this category is overly broad.
This was educational. MixRank looks like an interesting way to derive ties between audiences and vendors, for customer discovery. At least until ad networks start hiding information from MixRank scrapers.