Could generative ad media be doomed by doppelgängers?
And is it actually possible for brands to respond to that risk?
Have you ever met your own doppelgänger? Can you think of two people you know whom you aren’t sure you could tell apart if you had them both in the same room? Do you work with someone who is the spitting image of that one celebrity, just without the ridiculous hair? Have you read the stories about celebrities losing their own lookalike competitions?
It’s a widely stated “fact” that we all have seven doppelgängers. But is that true? And, more importantly, could it be true for your ad?
Because, situationally, it’s easy to see how that could become a massive problem.
I would hope that it goes without saying that anyone even considering creating ad media using AI needs to understand a lot about the source they plan to use for those images. And much has been said on the topic of the accidental recreation of humans who haven’t given permission for their likeness to be used.
But I think a much less discussed aspect of this is whether due diligence here might simply be doomed by the relatively small distinctiveness “space” of human faces. That no matter how carefully we create fully synthetic likenesses, we can still expect them to have unexpected twins.
Consider:
You might be sued by someone claiming that you have used their identity, or that the image was created by a model that was using their likeness without permission.
You might be sued by the ex-partner or other acquaintance of someone connected to your ad media pipeline, who is alleging that an unflattering model was created in their likeness.
Ignoring any discussion of the reliability of systems designed to secure permission for an image’s use, or to produce entirely synthetic faces: is it actually credible that either of those situations could arise purely by chance?
I think so, yes.
This is a surprisingly difficult question to answer conclusively, especially without writing a mini whitepaper. There are a few important dimensions to it, especially the issue of “how similar is similar enough?”, and there isn’t actually as much published research to draw on as one might assume. I’ll link some references and notes at the end of this article with a more detailed discussion, and I’m happy to write a more in-depth examination if people would like to see one.
Beyond that data, I think this is well in tune with most people’s intuition. Most people have around 150 acquaintances that they know well enough to really start pattern matching against, and so if you personally happen to have lookalike anecdotes, then it starts to become very likely that an ad campaign served to 20 million people might acquire some too. Likewise, if you have 50 people in an organisation collectively creating 500 ads a year, it feels very plausible that they could duplicate the ex-wife of a random editor somewhere along the way with absolutely no malice involved, and without anyone noticing*.
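To put rough numbers on that intuition, here’s a back-of-the-envelope sketch in Python. Every figure in it is an illustrative assumption: the audience size and acquaintance count come from the paragraph above, and the per-pair match probability borrows Apple’s advertised Face ID false match rate (discussed in the technical notes below) as a loose proxy for the much fuzzier human judgement of “that looks like someone I know”.

```python
# Back-of-the-envelope: expected "accidental doppelgänger" sightings
# for a single synthetic face served at campaign scale.
# All inputs are illustrative assumptions, not measured values.

audience_size = 20_000_000      # people who see the ad
acquaintances_per_viewer = 150  # faces a typical viewer knows well
p_pairwise_match = 1e-6         # assumed chance a random acquaintance "matches"
                                # the synthetic face (loose proxy: Face ID's
                                # advertised false match rate)

# Each viewer implicitly compares the ad's face against everyone they know.
comparisons = audience_size * acquaintances_per_viewer
expected_matches = comparisons * p_pairwise_match

print(f"Implicit comparisons: {comparisons:,}")
print(f"Expected lookalike sightings: {expected_matches:,.0f}")
# With these assumptions: 3,000,000,000 comparisons -> ~3,000 sightings.
# Even if the true pairwise probability were 100x lower, dozens remain.
```

The exact numbers aren’t the point; the point is that the comparison count scales multiplicatively with reach, so even a tiny per-pair probability stops being negligible at campaign scale.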
So the next obvious question is what the actual legal risk might be. How likely is this to actually result in damaging litigation?
That question is also complicated. I am not a lawyer, and so I have no business giving legal advice or trying to interpret specific legislation. Neither do I have access to the resources and experience needed to properly gauge the potential for theory to translate into plausible risk here. That being said, I’d offer the following points for consideration:
That you can still be sued even if you are legally airtight. Defending will still cost time and money, even if you win.
That the specific case history and regulation in this area is very much in flux, with the consequent potential to create new risk suddenly, and potentially even retrospectively. Neither is it safe to trust legislators to consider complex edge cases as regulation rolls out. And advertising typically needs to worry about legal issues across many jurisdictions.
That generative human faces would seem to remove a keystone of historical defence on these issues. (“Here’s the model, here’s their consent”).
That some high-risk ad types might be even more prone to issues here. Apple notes lower distinctiveness “space” (and hence lower Face ID effectiveness) in the faces of those under the age of 13 (1). Different geographies will have different risk, because facial variability is tied to genetic variability. Diversity of training data matters too. Celebrity images may be higher risk for a few reasons: the preprint paper here describes an experiment in which entirely synthetic images were judged a better match to at least one celebrity than the celebrity’s own image at least 2.1% of the time, and were flagged as a match 14% of the time. Just by itself, that would seem to suggest potential for risk here. (2)
That even very well considered “explainability” systems might struggle to protect you here. You might need to prove a negative within a very complicated “black box” or a proprietary dataset. And even if you can actually do that, it could still involve diverting expensive expertise for a long time.
That, more broadly, deep expertise in this area could be staggeringly expensive, with plenty of potential for very costly cases.
That the absence of a person’s image from your training data can’t protect you from that “accidental acquaintance” scenario, because any specific likeness could plausibly have been created using prompts alone. Defence in this case might become almost impossible if you have no effective documentation of your workflows (or straightforward if you do have that in place).
That the broad negative sentiment on AI in media (and creative content generation more broadly) might increase the tendency to sue. That same sentiment, along with the value of involvement in high-profile work, may also mean that plaintiffs here could find it relatively easy to find willing, or even free, representation for test cases.
That it might be difficult to draft legislation that would allow for the possibility of coincidental image generation without also leaving space for abuse by advertisers, especially with regard to the replication of specific likenesses.
That if a favourable ruling or settlement lands in a friendly venue, it might be possible for plaintiff firms, using off-the-shelf tech, to match faces from your historical generative ad campaigns in order to identify and recruit potential plaintiffs after the fact (the sketch after this list shows how little that tech demands).
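On that last point, here is a minimal sketch of what “off-the-shelf” face matching looks like today, using the open-source face_recognition library as one example (any embedding model would serve). The file paths are hypothetical, and the 0.6 distance threshold is simply that library’s conventional default, not any kind of legal standard of similarity.

```python
# Minimal sketch of retrospective face matching with off-the-shelf tools.
# Paths are hypothetical; 0.6 is the face_recognition library's
# conventional distance threshold, not a legal standard of similarity.
from pathlib import Path

import face_recognition  # pip install face_recognition

# Embed the prospective claimant's face once.
claimant_image = face_recognition.load_image_file("claimant.jpg")
claimant_encoding = face_recognition.face_encodings(claimant_image)[0]

# Scan every still from historical generative campaigns.
for ad_path in Path("historical_ad_stills").glob("*.jpg"):
    ad_image = face_recognition.load_image_file(str(ad_path))
    for ad_encoding in face_recognition.face_encodings(ad_image):
        distance = face_recognition.face_distance([ad_encoding], claimant_encoding)[0]
        if distance < 0.6:  # smaller distance = more similar
            print(f"Possible likeness: {ad_path.name} (distance {distance:.2f})")
```

A dozen lines, a laptop and an archive of published ads: that is roughly the entire barrier to entry for the recruitment scenario above.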
I’m sure I’m missing aspects of this. But if I were contemplating using AI to create depictions of realistic humans for use in an ad, you’d better believe I’d be getting an opinion on the above from an actual legal authority.
Other possible questions to ask them:
To what extent can this risk be mitigated by still securing a model’s likeness to base your AI images on?
Do any legal guarantees provided by model vendors cover this type of risk? If they do, would you need to have any process safeguards in place to retain that coverage?
What pipeline safeguards can be put in place here? E.g. tamper-proof process documentation, identification of high-risk or sensitive ad types, output distinctiveness testing? (A sketch of one such safeguard follows below.)
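On the process documentation point, even something very simple can make a workflow record tamper-evident. Below is a minimal sketch, standard library only, of a hash-chained generation log in which every entry commits to the one before it, so any retrospective edit breaks the chain. All field names are illustrative, and a production system would obviously add signing, trusted timestamps and access controls.

```python
# Minimal sketch of a tamper-evident (hash-chained) generation log.
# Each entry's hash commits to the previous entry, so editing any
# historical record invalidates every hash after it. Field names are
# illustrative; production use would add signing and trusted timestamps.
import hashlib
import json

def append_entry(log: list[dict], record: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "genesis"
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    log.append({"record": record, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(log: list[dict]) -> bool:
    prev_hash = "genesis"
    for entry in log:
        payload = json.dumps({"prev": prev_hash, "record": entry["record"]},
                             sort_keys=True)
        if entry["prev"] != prev_hash or \
           entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, {"ad_id": "A-101", "prompt": "smiling barista, 30s",
                   "model": "example-image-model-v2", "seed": 42})
append_entry(log, {"ad_id": "A-101", "step": "distinctiveness check passed"})
assert verify(log)
log[0]["record"]["prompt"] = "something else"  # retrospective tampering...
assert not verify(log)                         # ...breaks the chain
```

The value here isn’t cryptographic sophistication; it’s being able to show, cheaply and convincingly, exactly what your pipeline did and when.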
* I think it’s worth putting safeguards against actual malice in place too, but that’s not the specific issue under discussion here.
…………………..
The technical stuff
The key technical complication here is that mathematically “perfect” doppelgängers are effectively impossible, whereas “similar enough to cause confusion” doppelgängers are probably common enough to cause issues.
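A toy simulation shows the shape of the problem, under the loose assumption that faces behave like points in a bounded feature space. The eight dimensions, uniform sampling and 0.3 threshold are all arbitrary choices for illustration; real face embeddings are higher-dimensional and anything but uniform.

```python
# Toy illustration: in a continuous feature space, exact coincidences
# essentially never happen, but "close enough" coincidences are routine.
# The 8 dimensions, uniform sampling and 0.3 threshold are arbitrary
# assumptions; real face embeddings are larger and far from uniform.
import numpy as np

rng = np.random.default_rng(0)
population = rng.uniform(0, 1, size=(100_000, 8))  # 100k random "faces"
probe = rng.uniform(0, 1, size=8)                  # one synthetic "face"

distances = np.linalg.norm(population - probe, axis=1)

print("Exact duplicates:", int(np.sum(distances == 0)))               # ~always 0
print("Within 0.3 (confusably close):", int(np.sum(distances < 0.3)))  # usually > 0
```

The interesting quantity is never the probability of an identical face; it’s the expected count inside the confusion threshold, which grows with the number of faces you generate and the number of people who see them.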
The most direct discussion I can see specifically on this topic is in the preprint paper here, Coincidental Generation, submitted to Management Science. This discusses the problem in some detail, specifically in the context of enterprise usage, and includes some examination of the mathematics of the problem. It also contains the details of the experiment I mention involving celebrity images.
Outside of this, much of the existing research is focused on facial ID and false match rates within those systems. This is necessarily a balancing act for vendors between the false match rate and the rate of missed matches. Apple currently advertises a “less than 1 in 1,000,000” false match rate here for its own offering, something that seems to have gone mostly unchanged since its introduction in 2017. That doesn’t mean performance hasn’t improved over that period, but it might imply that improvements have been used to decrease the missed match rate rather than the false match rate. False match rates in these systems also tell us little about how significant those matches might be perceived to be by an actual human, but the persistence of issues here may support the argument that feature overlap is quite common and unavoidable.
Other relevant research and articles to check out here:
Look-alike humans identified by facial recognition algorithms show genetic similarities - Raises the possibility that there may be areas of the human “space” that are less likely to have actual people in them.
Finding Your Unknown Twins
Francois Brunelle - Artist specializing in doppelgängers

