# Twitter's Academic API

Let's go through a quick demo of how to use Twitter's new Academic Track full archive search.

## Step 0: Set-Up

So far, I have obtained an academic track Twitter account, and put my credentials into my preferred location: `~/.cfg/twitter.yaml`. The file looks like this:

```{yaml}
search_tweets_v2:
  endpoint:  https://api.twitter.com/2/tweets/search/all
  consumer_key: <CONSUMER_KEY>
  consumer_secret: <CONSUMER_SECRET>
  bearer_token: <BEARER_TOKEN>
```

If you are using Colab, you will need to give Colab some way of accessing the file.

One solution is to put the credentials file in the root of your Google Drive and then mount the drive with the following command:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Requirements - use conda install if you have a local installation
!pip install searchtweets-v2
!pip install requests pyyaml

## Step 1: Setting up credentials

The `searchtweets` library provides a convenience function
for reading in the credentials and passing them on in a
standardised format.

In [2]:
import searchtweets as tw

# If using colab, credentials might be at "/content/drive/.twitter_keys.yaml"
search_args = tw.load_credentials("~/.cfg/twitter.yaml",
                                     yaml_key="search_tweets_v2",
                                     env_overwrite=False)

## Step 2: First Request

This library contains two ways of making requests.

I'll demonstrate the easier one first.

Let's get some tweets about political science.

In [4]:
query = tw.gen_request_parameters('polisci', results_per_call=10)
print(query)

{"query": "polisci", "max_results": 10}


In [6]:
tweets = tw.collect_results(
    query=query,
    max_tweets=10,
    result_stream_args=search_args
)

## Step 3: Parsing the Results

Let's look at our results:

In [9]:
print(
    f"The function returns a {type(tweets)},",
    f"where each tweet is held in a {type(tweets[0])}.",
    f"The total number of objects is: {len(tweets)}",
    f"The first 10 have the following keys (attributes): {tweets[0].keys()}", 
    f"The last is a token for the next query, with the following keys:\n {tweets[-1].keys()}",
    sep="\n"
)

The function returns a <class 'list'>,
where each tweet is held in a <class 'dict'>.
The total number of objects is: 11
The first 10 have the following keys (attributes): dict_keys(['id', 'text'])
The last is a token for the next query, with the following keys:
 dict_keys(['newest_id', 'oldest_id', 'result_count', 'next_token'])


In [11]:
for tweet in tweets[:-1]:
    print(f"{tweet['id']}\n\t{tweet['text']}")

1369281707577335809
	@vsbc_ @californiadem20 it's not that it's impossible - there's plenty of polisci material that does. but it does become very hard even for political scientists, let alone for us, the riffraff.
1369279033192677377
	RT @CU_CDCC: Looking forward to this timely conversation, featuring @carleton_music @cu_polisci  @CU_History and @CBCAdrianH ! Don't forget…
1369278788970946564
	RT @CU_FASS: Thurs at 7, @carleton_music welcomes you to join Prof. James Deaville &amp; panelists Prof. Melissa Haussman (@cu_polisci), Prof.…
1369278577527685124
	RT @MYBISA: Who's registered for #BISA2021?

With 3 days of panels/roundtables, 3 awesome keynotes and fringe events inc @BISAPGN's 'meet t…
1369277721931624452
	per the good folks at facebook, 10 years ago i got into a polisci phd program i was really excited about and i am so, so glad i didn't do it
1369277221777575936
	RT @CU_CDCC: Looking forward to this timely conversation, featuring @carleton_music @cu_polisci  @CU_Histor
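
The slice `tweets[:-1]` works here because the metadata dictionary sits at the end of the list. As a slightly more defensive alternative, here is a minimal sketch that separates the two kinds of dictionaries by their keys instead of by position. The helper name `split_results` and the sample data are hypothetical; only the key names are taken from the output above, and the sketch assumes that metadata dictionaries are the only ones carrying `result_count`.

```python
def split_results(results):
    """Separate tweet dictionaries from page-metadata dictionaries."""
    tweets, metadata = [], []
    for item in results:
        # Assumption: only metadata dictionaries contain 'result_count'.
        if "result_count" in item:
            metadata.append(item)
        else:
            tweets.append(item)
    return tweets, metadata


# Hypothetical sample shaped like the output above (values are made up).
sample = [
    {"id": "1", "text": "first tweet"},
    {"id": "2", "text": "second tweet"},
    {"newest_id": "2", "oldest_id": "1", "result_count": 2, "next_token": "abc"},
]

only_tweets, only_metadata = split_results(sample)
print(len(only_tweets), len(only_metadata))
```

Filtering by key rather than by position should keep working even if a metadata dictionary shows up somewhere other than the last position.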

## Step 4: Customizing Queries

References:

- Documentation on `searchtweets.gen_request_parameters`
- https://developer.twitter.com/en/docs/twitter-api/fields
- List of all tweet.fields: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

In [15]:
query = tw.gen_request_parameters(
    query = "#metoo (place_country:MX OR place_country:IN) -is:retweet -is:nullcast",
    results_per_call = 100,
    start_time = "2021-01-01", 
    end_time = "2021-01-31",
    tweet_fields = "id,created_at,text,author_id,context_annotations,entities"
)

`"#metoo (place_country:MX OR place_country:IN) -is:retweet -is:nullcast"`

- `#metoo`: the query begins with the special hashtag operator. This will match tweets that contain the hashtag: #metoo. Note that this will not match the text “metoo”, or a longer hashtag “#metookutsu”
- ` ` Spaces act as boolean AND operators.
- `( )` brackets (parentheses) group operators together.
- `place_country:MX` filters for tweets geo-tagged in Mexico. Note that countries are written as two-letter ISO codes, and that the vast majority of tweets do not have a country code, so adding this filter will also filter out many tweets that were in fact tweeted in your country of interest.
- `OR` a boolean OR operator.
- `-is:retweet` removes all retweets; in other words, you only get “primary” tweets. The minus sign negates the following argument, so we are saying IS NOT RETWEET. As a researcher, this is an incredibly useful one.
- `-is:nullcast` removes tweets created purely as promotions/ads.
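
To make the operator rules above concrete, here is a minimal sketch that assembles the same query string from its parts. The function `build_query` and its parameters are hypothetical helpers of my own, not part of `searchtweets`; it only does string formatting and sends nothing to the API.

```python
def build_query(hashtag, countries, exclude_retweets=True, exclude_ads=True):
    """Compose a search query string from its parts."""
    parts = [f"#{hashtag}"]
    if countries:
        # OR-joined alternatives are wrapped in parentheses so that the
        # implicit AND (a space) applies to the group as a whole.
        country_filters = " OR ".join(f"place_country:{code}" for code in countries)
        parts.append(f"({country_filters})")
    if exclude_retweets:
        parts.append("-is:retweet")  # the minus sign negates the operator
    if exclude_ads:
        parts.append("-is:nullcast")
    # Spaces between the parts act as boolean AND.
    return " ".join(parts)


query_string = build_query("metoo", ["MX", "IN"])
print(query_string)
```

The resulting string can be passed as the `query` argument of `tw.gen_request_parameters`, exactly like the hand-written one.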

In [17]:
rs = tw.ResultStream(
    request_parameters=query,
    max_results=100,
    max_pages=1,
    **search_args
)

In [18]:
custom_tweets = list(rs.stream())

## Analyzing the extended output

Here are some tricks we haven't looked at so far, such as set comprehensions:

In [28]:
# Seeing the unique combinations of keys (attributes) of the tweet
{
    tuple(sorted(tweet)) for tweet in custom_tweets
}

{('author_id', 'context_annotations', 'created_at', 'entities', 'id', 'text'),
 ('author_id', 'created_at', 'entities', 'id', 'text'),
 ('newest_id', 'oldest_id', 'result_count')}
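
The set shows which key combinations exist, but not how often each one occurs. A minimal sketch using `collections.Counter` from the standard library answers that as well. The helper `key_combinations` and the sample records are hypothetical; the key names mirror the output above.

```python
from collections import Counter


def key_combinations(records):
    """Count how many records share each (sorted) combination of keys."""
    return Counter(tuple(sorted(record)) for record in records)


# Hypothetical sample shaped like the results above (values are made up).
sample = [
    {"id": "1", "text": "a", "author_id": "10",
     "created_at": "2021-01-30T21:33:08.000Z", "entities": {}},
    {"id": "2", "text": "b", "author_id": "11",
     "created_at": "2021-01-29T17:22:39.000Z", "entities": {},
     "context_annotations": []},
    {"id": "3", "text": "c", "author_id": "12",
     "created_at": "2021-01-28T16:14:17.000Z", "entities": {}},
    {"newest_id": "1", "oldest_id": "3", "result_count": 3},
]

counts = key_combinations(sample)
for keys, n in counts.most_common():
    print(n, keys)
```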

In [32]:
# We can put these things in a pandas dataframe!
import pandas as pd

df = pd.DataFrame(custom_tweets[:-1])
df

| | author_id | id | text | created_at | entities | context_annotations |
|---|---|---|---|---|---|---|
| 0 | 25187518 | 1355630147639119876 | Ojalá estén tomando nota del #metoo que se le ... | 2021-01-30T21:33:08.000Z | {'hashtags': [{'start': 29, 'end': 35, 'tag': ... | |
| 1 | 337753203 | 1355600319116271619 | #BB14 is promoting male sexual harassment . Sh... | 2021-01-30T19:34:36.000Z | {'mentions': [{'start': 220, 'end': 230, 'user... | [{'domain': {'id': '3', 'name': 'TV Shows', 'd... |
| 2 | 1196830101511364608 | 1355204724375449601 | @rohini_sgh @Dev_Fadnavis Ha ha ha ha. #Come... | 2021-01-29T17:22:39.000Z | {'mentions': [{'start': 0, 'end': 11, 'usernam... | [{'domain': {'id': '10', 'name': 'Person', 'de... |
| 3 | 50599372 | 1355119454342922240 | I remember @factordaily was the only publicati... | 2021-01-29T11:43:49.000Z | {'urls': [{'start': 241, 'end': 264, 'url': 'h... | |
| 4 | 3192855216 | 1354825131894349826 | @insia_dariwala @pragyavats @CSR_Environment @... | 2021-01-28T16:14:17.000Z | {'urls': [{'start': 141, 'end': 164, 'url': 'h... | [{'domain': {'id': '10', 'name': 'Person', 'de... |
| 5 | 1353554760729649152 | 1354654438225399813 | Happiness is something you want every second..... | 2021-01-28T04:56:00.000Z | {'hashtags': [{'start': 95, 'end': 105, 'tag':... | |
| 6 | 1140129434 | 1354599860272553984 | Justicia, admiración y respeto para mi compañe... | 2021-01-28T01:19:08.000Z | {'urls': [{'start': 255, 'end': 278, 'url': 'h... | [{'domain': {'id': '3', 'name': 'TV Shows', 'd... |
| 7 | 206377381 | 1354311464903876609 | One of the most disgusting things about the MJ... | 2021-01-27T06:13:09.000Z | {'annotations': [{'start': 44, 'end': 51, 'pro... | [{'domain': {'id': '10', 'name': 'Person', 'de... |
| 8 | 159347165 | 1353746960147230723 | https://t.co/X91gJXBsQa\n\n#metoomovement #MeT... | 2021-01-25T16:50:01.000Z | {'urls': [{'start': 0, 'end': 23, 'url': 'http... | |
| 9 | 252676883 | 1353691911689793536 | @ExSecular #MeToo - MJ Akbar. | 2021-01-25T13:11:16.000Z | {'mentions': [{'start': 0, 'end': 10, 'usernam... | [{'domain': {'id': '10', 'name': 'Person', 'de... |
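
The `created_at` values arrive as plain strings, which makes sorting or filtering by date awkward. Here is a minimal sketch, using only the standard library, that turns one of them into a timezone-aware `datetime`. The helper name `parse_created_at` is hypothetical, and the format string assumes every timestamp looks like the ones shown above.

```python
from datetime import datetime, timezone


def parse_created_at(value):
    """Parse a timestamp such as '2021-01-30T21:33:08.000Z'."""
    # %z accepts the trailing 'Z' (meaning UTC) on Python 3.7 and later.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


parsed = parse_created_at("2021-01-30T21:33:08.000Z")
print(parsed.year, parsed.month, parsed.day, parsed.tzinfo)
```

For the whole dataframe column, `pd.to_datetime(df['created_at'])` is probably the more convenient route.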


Two of the fields do not like being coerced into tabular structures. Here are a few tricks
for unpacking them and breaking them down.

The 'entities' field contains information such as hashtags, mentions, urls and so on:

In [70]:
{
    tuple(sorted(tweet['entities'])) for tweet in custom_tweets
        if 'entities' in tweet.keys()
}

{('annotations', 'hashtags'),
 ('annotations', 'hashtags', 'mentions'),
 ('annotations', 'hashtags', 'mentions', 'urls'),
 ('annotations', 'hashtags', 'urls'),
 ('hashtags',),
 ('hashtags', 'mentions'),
 ('hashtags', 'mentions', 'urls'),
 ('hashtags', 'urls')}

In [38]:
[
    print(tweet['text'],  tweet['entities']['mentions'], '', sep='\n') for tweet in custom_tweets
        if 'entities' in tweet.keys() and 'mentions' in tweet['entities'].keys()
];

#BB14 is promoting male sexual harassment . Shocked to see #SalmanKhan calling #RakhiSawant ‘s harassment of #AbhinavSukla in BB house as entertainment. #HimToo is as serious issue as #MeToo #AbhinavShuklaDeservesBetter @ashukla09 - we are with you..
[{'start': 220, 'end': 230, 'username': 'ashukla09'}]

@rohini_sgh @Dev_Fadnavis  Ha ha ha ha.  #ComedyCircus by #BJPJokers. What is relevance here  original Dhongi and #Dimbendra. After this paid news ,people came to know that #Anna  was starting a #MeToo drama. BTW Ghantaa farak Nahi padta. #MVA is dealing farmer protest sensibly to keep peace
[{'start': 0, 'end': 11, 'username': 'rohini_sgh'}, {'start': 12, 'end': 25, 'username': 'Dev_Fadnavis'}]

I remember @factordaily was the only publication who consistently published and followed up on #MeToo before it blew up in2018 and they will forever have my respect for that. All the best folks! So excited to see you back and the new model! https://t.co/omS6mafU2Y
[{'start': 11, 'end': 23, '
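
If you only need the usernames or hashtag texts rather than the full entity dictionaries, a small helper can pull them out. This is a minimal sketch: the function `entity_values` is hypothetical, and the sample tweet is abridged from the output above. It relies on the `username` key of mentions and the `tag` key of hashtags as they appear in the printed results.

```python
def entity_values(tweet, kind, field):
    """Return `field` from every entity of the given `kind`, or [] if absent."""
    entities = tweet.get("entities", {})
    return [entry[field] for entry in entities.get(kind, [])]


# Sample abridged from the output above (text shortened).
sample_tweet = {
    "id": "1355204724375449601",
    "text": "@rohini_sgh @Dev_Fadnavis ...",
    "entities": {
        "mentions": [
            {"start": 0, "end": 11, "username": "rohini_sgh"},
            {"start": 12, "end": 25, "username": "Dev_Fadnavis"},
        ]
    },
}

usernames = entity_values(sample_tweet, "mentions", "username")
hashtags = entity_values(sample_tweet, "hashtags", "tag")
print(usernames)
print(hashtags)
```

Because of the `.get` calls, the helper returns an empty list for tweets without that entity type, and for the metadata dictionary, so no `if` guard is needed when looping over `custom_tweets`.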

The 'context_annotations' field contains some very cool stuff.

However, it is structured as a list of nested dictionaries.

In [58]:
[
    print(tweet['text'], 
    [
        (t['domain']['name'], t['entity']['name']) for t in tweet['context_annotations']
    ], '', sep='\n') for tweet in custom_tweets
        if 'context_annotations' in tweet.keys()
];

#BB14 is promoting male sexual harassment . Shocked to see #SalmanKhan calling #RakhiSawant ‘s harassment of #AbhinavSukla in BB house as entertainment. #HimToo is as serious issue as #MeToo #AbhinavShuklaDeservesBetter @ashukla09 - we are with you..
[('TV Shows', 'Big Brother UK'), ('Person', 'Abhinav Shukla'), ('Actor', 'Abhinav Shukla'), ('Person', 'Salman Khan'), ('Actor', 'Salman Khan')]

@rohini_sgh @Dev_Fadnavis  Ha ha ha ha.  #ComedyCircus by #BJPJokers. What is relevance here  original Dhongi and #Dimbendra. After this paid news ,people came to know that #Anna  was starting a #MeToo drama. BTW Ghantaa farak Nahi padta. #MVA is dealing farmer protest sensibly to keep peace
[('Person', 'Devendra Fadnavis'), ('Politician', 'Devendra Fadnavis')]

@insia_dariwala @pragyavats @CSR_Environment @#metoo @ProfSonoraJha @BDUTT @TheRestlessQuil @MasalaBai @captraman @_AdilHussain @AzmiShabana https://t.co/3Q4Vdlk4di
[('Person', 'Barkha Dutt'), ('Journalist', 'Barkha Dutt')]

Justicia, a

In [65]:
# The same code, but easier to read (probably)

for tweet in custom_tweets:
    if 'context_annotations' in tweet.keys():
        print('TEXT:',
              tweet['text'],
              '\nNAMED ENTITIES:', sep='\n')
        for entry in tweet['context_annotations']:
            print(f"{entry['entity']['name']}: {entry['domain']['name']}")
        print('\n')

TEXT:
#BB14 is promoting male sexual harassment . Shocked to see #SalmanKhan calling #RakhiSawant ‘s harassment of #AbhinavSukla in BB house as entertainment. #HimToo is as serious issue as #MeToo #AbhinavShuklaDeservesBetter @ashukla09 - we are with you..

NAMED ENTITIES:
Big Brother UK: TV Shows
Abhinav Shukla: Person
Abhinav Shukla: Actor
Salman Khan: Person
Salman Khan: Actor


TEXT:
@rohini_sgh @Dev_Fadnavis  Ha ha ha ha.  #ComedyCircus by #BJPJokers. What is relevance here  original Dhongi and #Dimbendra. After this paid news ,people came to know that #Anna  was starting a #MeToo drama. BTW Ghantaa farak Nahi padta. #MVA is dealing farmer protest sensibly to keep peace

NAMED ENTITIES:
Devendra Fadnavis: Person
Devendra Fadnavis: Politician


TEXT:
@insia_dariwala @pragyavats @CSR_Environment @#metoo @ProfSonoraJha @BDUTT @TheRestlessQuil @MasalaBai @captraman @_AdilHussain @AzmiShabana https://t.co/3Q4Vdlk4di

NAMED ENTITIES:
Barkha Dutt: Person
Barkha Dutt: Journalist


TEXT:
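
Since the nested annotations do not fit into a single dataframe cell very well, one option is to flatten them into one row per (tweet, annotation) pair. This is a minimal sketch: the function `annotation_rows` and the column names `tweet_id`, `domain` and `entity` are my own choices, and the sample data is abridged from the output above.

```python
def annotation_rows(tweets):
    """Flatten context annotations into one dictionary per annotation."""
    rows = []
    for tweet in tweets:
        # Tweets without annotations (and the metadata dictionary)
        # simply contribute no rows.
        for entry in tweet.get("context_annotations", []):
            rows.append({
                "tweet_id": tweet["id"],
                "domain": entry["domain"]["name"],
                "entity": entry["entity"]["name"],
            })
    return rows


# Sample abridged from the output above.
sample = [
    {
        "id": "1355204724375449601",
        "text": "@rohini_sgh @Dev_Fadnavis ...",
        "context_annotations": [
            {"domain": {"name": "Person"}, "entity": {"name": "Devendra Fadnavis"}},
            {"domain": {"name": "Politician"}, "entity": {"name": "Devendra Fadnavis"}},
        ],
    },
    {"id": "1355119454342922240", "text": "I remember @factordaily ..."},
    {"newest_id": "1355204724375449601", "oldest_id": "1355119454342922240", "result_count": 2},
]

rows = annotation_rows(sample)
for row in rows:
    print(row)
```

The resulting list of flat dictionaries can be handed to `pd.DataFrame(rows)` and joined back to the main table on the tweet id.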

## Step 5: Saving tweets

The tweets are in JSON format, which we can save and read with the `json`
library.

In [67]:
import json

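def save_tweets(tweets, filename, mode='w'):
    # Write the whole list of tweet dictionaries as one JSON document
    with open(filename, mode) as fout:
        json.dump(tweets, fout)

def read_tweets(filename):
    # Load the list of tweet dictionaries back from disk
    with open(filename, 'r') as fin:
        tweets = json.load(fin)
    return tweets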

save_tweets(custom_tweets, 'metoo_tweets.json')

In [68]:
read_tweets('metoo_tweets.json') == custom_tweets

True
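
One caveat: `save_tweets` accepts a `mode` argument, but calling it with `mode='a'` writes a second JSON list after the first one, and `json.load` then fails on the file. If you want to append results from several requests to the same file, one alternative (not what the code above does) is the JSON Lines layout, with one tweet per line. Here is a minimal sketch; the function names and the file name `tweets.jsonl` are hypothetical, and the demo writes to a temporary directory.

```python
import json
import os
import tempfile


def append_tweets_jsonl(tweets, filename):
    """Append each tweet as one JSON document per line."""
    with open(filename, "a", encoding="utf-8") as fout:
        for tweet in tweets:
            fout.write(json.dumps(tweet, ensure_ascii=False) + "\n")


def read_tweets_jsonl(filename):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(filename, "r", encoding="utf-8") as fin:
        return [json.loads(line) for line in fin if line.strip()]


# Made-up batches standing in for the results of two separate requests.
first_batch = [{"id": "1", "text": "first batch"}]
second_batch = [{"id": "2", "text": "second batch"}]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tweets.jsonl")
    append_tweets_jsonl(first_batch, path)
    append_tweets_jsonl(second_batch, path)
    restored = read_tweets_jsonl(path)

print(restored == first_batch + second_batch)
```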