This short vignette demonstrates how to download and annotate images embedded in Tweets. Besides imgrec, we use the rtweet package to retrieve Twitter data and dplyr for data wrangling.

Setup

Before we start, access credentials are required for both Twitter and Google Cloud Vision. In this example, all credentials are stored as R environment variables. Check out this rtweet vignette for obtaining and using Twitter API access tokens. Authentication for Google Cloud Vision is described in this imgrec vignette.

# load libraries
library(imgrec)
library(rtweet)
library(dplyr)

# prepare twitter credentials
app_name <- Sys.getenv('twitter_app_name')
consumer_key <- Sys.getenv('twitter_consumer_key')
consumer_secret <- Sys.getenv('twitter_consumer_secret')
access_token <- Sys.getenv('twitter_access_token')
access_token_secret <- Sys.getenv('twitter_access_secret')

# obtain twitter access token
token <- create_token(app = app_name,
                      consumer_key = consumer_key,
                      consumer_secret = consumer_secret,
                      access_token = access_token,
                      access_secret = access_token_secret,
                      set_renv = TRUE)

# setup authentification for google vision
gvision_init()

Download tweets

If you know the status IDs of the Tweets that you would like to obtain, you can use lookup_tweets(), which takes a vector of status IDs (preferably as character strings, to avoid precision loss with large numbers) and retrieves all corresponding Tweets. URLs of images (and videos) are stored in the list column media_url.

We use one of the most-retweeted tweets posted by Barack Obama as an example:

“No one is born hating another person because of the color of his skin or his background or his religion…” (Barack Obama, Twitter Status)

example <- lookup_tweets('896523232098078720')
example$media_url
## [[1]]
## [1] "http://pbs.twimg.com/media/DHEXH7RV0AAUwKj.jpg"

Annotate Tweets

Now, we retrieve and parse annotations for the Tweet image:

results <- get_annotations(images = example$media_url[[1]], 
                           max_res = 10, # max. number of results per feature
                           mode = "url", # we pass an image URL
                           features = 'all') %>% 
           parse_annotations()
## [1] "Sending API request(s).."
names(results) # features obtained by Google Cloud Vision
##  [1] "labels"            "web_labels"        "web_similar"      
##  [4] "web_match_partial" "web_match_full"    "web_match_pages"  
##  [7] "faces"             "objects"           "safe_search"      
## [10] "colors"            "crop_hints"

And that’s it! The results are stored in a list object which includes data frames for all annotations retrieved from Google Cloud Vision:

results$labels %>% head()
## # A tibble: 6 x 5
##   mid       description score topicality img_id                                 
##   <chr>     <chr>       <dbl>      <dbl> <chr>                                  
## 1 /m/0d4v4  Window      0.732      0.732 http://pbs.twimg.com/media/DHEXH7RV0AA~
## 2 /m/06ht1  Room        0.657      0.657 http://pbs.twimg.com/media/DHEXH7RV0AA~
## 3 /m/01l0mw Home        0.637      0.637 http://pbs.twimg.com/media/DHEXH7RV0AA~
## 4 /m/0244x1 Gesture     0.558      0.558 http://pbs.twimg.com/media/DHEXH7RV0AA~
## 5 /m/03jm5  House       0.537      0.537 http://pbs.twimg.com/media/DHEXH7RV0AA~
## 6 /m/0ytgt  Child       0.520      0.520 http://pbs.twimg.com/media/DHEXH7RV0AA~
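
Because every annotation data frame carries the image URL in the img_id column, the labels can be joined back to the Tweet metadata with dplyr. Below is a minimal sketch, assuming each Tweet contains exactly one image and that the rtweet data frame provides status_id and text columns (as in older rtweet versions); tweet_info and labeled_tweets are hypothetical names:

# attach the Tweet's status_id and text to each label by matching the image URL
tweet_info <- example %>%
  mutate(img_id = unlist(media_url)) %>%
  select(status_id, text, img_id)

labeled_tweets <- results$labels %>%
  left_join(tweet_info, by = "img_id")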