Estimating the Micro-Targeting Effect

Evidence from a Survey Experiment
during the 2020 US Election

Musashi Harukawa

Politics in Progress Colloquium, HT21

Introduction

tl;dr

  • Survey experiment during 2020 US election to estimate effect of micro-targeting.
  • Trained algorithm to target participants with anti-Biden ads run by Trump campaign.
  • Among unaligned respondents who had not pre-voted at the time of the survey, targeting:
    • Increased proportion anti-Biden by 8.7 percentage points.
    • Decreased proportion intending to vote Biden by 7.1 percentage points.

Definitions and Scope

  • Tailoring is constructing a message so that it appeals to a specific audience.
  • Targeting is delivering the message so only the intended audience sees it.
  • Micro- is on the basis of individual characteristics.

Context

  • In 2016, Cambridge Analytica was reportedly able to influence election outcomes using individuals’ Facebook data (Simon 2019).
  • Significant media and scholarly attention warning of threats and consequences for society:
    • informational “filter bubbles” threaten civil discourse (e.g. Burkell & Regan 2020).
  • Many of these arguments presume that micro-targeting works:
    • “micro-targeting of voters can pay very handsome electoral dividends for a relatively modest investment” (Krotoszynski Jr. 2020).

Contradicting Evidence

  • A decade of political science research leaves little room for micro-targeting to make a difference.
    • Coppock et al. (2020) test 49 advertisements on 34,000 people and find little evidence of heterogeneous effects.
  • Psychology research simulating targeting suggests it should work:
    • Madsen and Pilditch (2018) use ABM.
    • Zarouali et al (2020) use experiment (N=158).

Gaps and Challenges

  • Strategies and algorithms proprietary (Edelson et al 2019).
  • Data difficult to obtain (Liberini et al 2020).

Research Question

Does micro-targeting work?

or:

Is it possible to improve the effectiveness of a campaign by optimally allocating advertisements on the basis of individual traits?

Research Design

Case Selection

  • US 2020 Presidential Election
  • Participants: US citizens, resident in the US, of voting age.
  • Payment and recruitment via Prolific.
  • Redirected to external website https://survey.polinfo.org

Design Summarized

  • Five advertisements.
  • First stage (N=1,500), respondents shown random ad.
  • This data used to train targeting algorithm.
  • Second stage (N=900), respondents shown optimal ad.
  • Difference between Stage 1 and 2 is treatment effect of targeting.

Advertisements

Title                               Description
“They Mock Us”                      In-group: Clinton and Biden mocking
“Why did Biden let him do it?”      Hunter Biden’s ostensible corruption
“Biden will come for your guns”     2A; Biden will steal guns
“Insult”                            Biden: Black Trump supporters not Black
“Real Leadership”                   Obama/Biden caused wars, neglected veterans

Stage 1

For individual \(i \in \{1,...,1500\}\):

  • Pre-treatment covariates \(\mathbf{X}_i\).
  • Shown \(d_a\), where \(a \sim \mathcal{U}\{1, 5\}\).
  • Post-treatment outcome(s) \(Y_i\)
    • Biden and Trump favorability (1-5)
    • Voting preference

Predicting Optimal Advertisement

Stage 1 results are used to learn the outcome as a function of pre-treatment traits and the advertisement:

\[ Y_i = f(X_i, d_a) \]

Thus for any person, I can predict their hypothetical outcome under each of the five advertisements:

\[ \hat{f}(X_i, d_a) = \hat{Y}_i(d_a) \] \[ \{\hat{Y}_i(d_1), \hat{Y}_i(d_2), \dots, \hat{Y}_i(d_5)\} \]
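A minimal sketch of this step, on simulated data (the covariates, the outcome model, and the choice of a Random Forest here are my illustrative assumptions): the ad index is appended to the covariate matrix as one extra feature, so a single regressor learns \(f(X, d)\) and can then be queried with any respondent–ad pair.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated Stage 1 data: 1,500 respondents, 3 toy covariates, 5 ads.
X = rng.normal(size=(1500, 3))
ad = rng.integers(1, 6, size=1500)            # uniform over {1, ..., 5}
y = 3 + 0.5 * X[:, 0] - 0.2 * ad + rng.normal(scale=0.5, size=1500)

# Learn Y = f(X, d): the ad index enters as one extra feature column.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(np.column_stack([X, ad]), y)

# For one new respondent, predict the hypothetical outcome under every ad.
x_new = rng.normal(size=3)
y_hat = np.array([
    model.predict(np.append(x_new, a).reshape(1, -1))[0] for a in range(1, 6)
])
```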

Five Candidate Algorithms

Chosen for speed and ability to learn highly conditional relationships:

  • Random Forest (RF)
  • AdaBoost
  • Gradient Boosted Decision Trees (GBDT)
  • Multi-Layer Perceptron Regressor (MLPR)
  • Support Vector Machine (SVM)

Choosing the Best Model

  • Models trained/tested with 30-fold cross-validation
  • Compared on RMSE, max error and prediction time
  • RF and AdaBoost generally better, RF weakly better than AdaBoost and less likely to give ties
  • Fitted model uploaded to web server

Stage 2

  • Same pre-treatment questions \(\mathbf{X}_i\).
  • Predictive model chooses optimal advertisement as a function of \(\mathbf{X}_i\):

\[ d^*(X_i) = \arg\min_a \hat{f}(X_i, d_a) \]
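A sketch of the assignment rule, under the same toy setup as before (the data and model are my assumptions): since the ads are anti-Biden, the "optimal" ad here is the one with the lowest predicted Biden favorability.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# Toy Stage 1 training data.
X = rng.normal(size=(500, 3))
ad = rng.integers(1, 6, size=500)
y = 3 + 0.5 * X[:, 0] - 0.2 * ad + rng.normal(scale=0.5, size=500)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(np.column_stack([X, ad]), y)

def optimal_ad(x):
    """Return the ad with the lowest predicted Biden favorability for x."""
    preds = [model.predict(np.append(x, a).reshape(1, -1))[0]
             for a in range(1, 6)]
    return int(np.argmin(preds)) + 1

# Stage 2: a new respondent answers the same pre-treatment questions,
# and the server serves the ad the model picks.
d_star = optimal_ad(rng.normal(size=3))
```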

Average Targeting Effect

Compare average outcome between randomly assigned stage 1 and optimally assigned stage 2:

\[ ATE = \mathbb{E}_i[Y_i(d^*(X_i))] - \mathbb{E}_a[\mathbb{E}_i[Y_i(d_{i, a})]] \]
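The estimator is a difference in means across the two stages; a permutation test (mentioned under Robustness) gives a simple null distribution. The outcome values below are fabricated for illustration only, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy outcomes: Stage 1 (randomly assigned ads) vs Stage 2 (targeted ads).
y1 = rng.normal(loc=3.0, size=1500)
y2 = rng.normal(loc=2.8, size=900)

# Average targeting effect: mean under targeting minus mean under random.
ate = y2.mean() - y1.mean()

# Permutation test: shuffle stage labels to build the null distribution.
pooled = np.concatenate([y1, y2])
null = np.empty(2000)
for b in range(2000):
    p = rng.permutation(pooled)
    null[b] = p[:900].mean() - p[900:].mean()
p_value = float((np.abs(null) >= abs(ate)).mean())
```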

Hypotheses

Three outcomes of interest:

  • Hypothesis 1 (Micro-targeting Affects Favorability): \(\mathbb{E}_i[y_{i, Biden}(d^*)] < \mathbb{E}_a[\mathbb{E}_i[y_{i, Biden}(d_{i, a})]]\)
  • Hypothesis 2 (Micro-targeting Affects Voting Preference): \(\mathbb{E}_i[v_{i, Biden}(d^*)] < \mathbb{E}_a[\mathbb{E}_i[v_{i, Biden}(d_{i, a})]]\)
  • Hypothesis 3 (Micro-targeting Affects Turnout): \(\mathbb{E}_i[u_{i}(d^*)] < \mathbb{E}_a[\mathbb{E}_i[u_{i}(d_{i, a})]]\)

CATEs of Interest

Account for conditional effect of two pre-treatment covariates:

  • Partisan self-identification
  • Early voting
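A conditional effect is just the same difference in means restricted to a subgroup defined by pre-treatment covariates. The sketch below uses fabricated indicators for illustration; the subgroup shares and effect size are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2400
targeted = rng.integers(0, 2, size=n).astype(bool)   # Stage 2 indicator
unaligned = rng.random(n) < 0.3                      # no party ID (toy)
pre_voted = rng.random(n) < 0.4                      # already voted (toy)
# Toy outcome: drops only for targeted, unaligned, not-yet-voted respondents.
y = rng.normal(loc=3.0 - 0.2 * (targeted & unaligned & ~pre_voted), size=n)

# CATE among unaligned respondents who had not yet voted.
subgroup = unaligned & ~pre_voted
cate = y[targeted & subgroup].mean() - y[~targeted & subgroup].mean()
```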

Results

Stage 1 - Treatment Assignments

Stage 1 - Feature Importances

Stage 1 - Predicted Outcome as Function of Partisanship

Stage 2

Robustness

  • Pre-treatment covariate balance check.
  • Multiple comparisons correction (Holm, Benjamini-Hochberg).
  • Variety of operationalizations of the outcome (linear, binary, ordered categorical).
  • Pre-experimental power check using Coppock et al (2020) data (along with permutation test for bias in mechanism).
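With three hypotheses tested on the same experiment, a family-wise correction is needed. A minimal implementation of the Holm step-down procedure (the p-values below are toy numbers, not the study's results):

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm step-down correction: test the smallest p-value at alpha/m,
    the next at alpha/(m-1), ..., and stop at the first failure."""
    p = np.asarray(pvals)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break   # all remaining (larger) p-values also fail
    return reject

# Three hypotheses (favorability, vote preference, turnout): toy p-values.
decisions = holm([0.01, 0.04, 0.30])
```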

Discussion

Key Results

  • Among unaligned respondents who had not pre-voted at the time of the survey, targeting:
    • Increased proportion anti-Biden by 8.7 percentage points.
    • Decreased proportion intending to vote Biden by 7.1 percentage points.
  • These effects are larger than the margins of victory in many crucial districts.

Normative Aspects

  • Data and algorithms to influence the outcome of elections?
    • Whose preferences are being represented?
    • Creates incentives to harvest voter data (Krotoszynski Jr. 2020)
  • When is it permissible to target political advertisements?
    • Is negative advertising the problem?
    • Issues of consent and privacy.

Ethical Considerations

  • CUREC approved
  • Participant consent obtained
  • Extensive end-of-survey debrief detailing purpose and potential manipulation

Limitations

  • Survey Response vs. Vote Choice
  • Convenience Sample
  • Psychometric Profiling vs Data-Driven Optimization
    • Relatedly: Mechanisms?
  • Reference Category?

What’s Next

  • Follow-up experiment planned, possibly in Germany.
    • Additional treatments to test effect of informing participants of targeting.
  • Further exploration of normative aspects