Analysing RSS data with tidyRSS

Robert Myles McDonnell

2019-05-22

Introduction

tidyRSS is a package for extracting data from RSS feeds. It has one function, tidyfeed(), whose main argument is the URL of the feed, and which returns the feed's contents as a tidy data frame. The package also includes a simple dataset, a list of feed URLs to experiment with (they were taken from here). This vignette gives an idea of what tidyRSS can be used for.

Installation

tidyRSS can be installed from GitHub using the devtools package, or directly from CRAN.

# From GitHub, using devtools
devtools::install_github("robertmyles/tidyrss")

# From CRAN
install.packages("tidyRSS")

Usage

Because tidyRSS is built around the idea of tidy data, it fits naturally into the tidyverse and works well with any tidy-style package. Visualizations, summaries, and further manipulation are all easy thanks to this common underlying structure. For example, we can take a look at how statistician and political scientist Andrew Gelman has been feeling, at least by the (very!) superficial means of judging his blog post titles.

library(tidyRSS)

rss <- tidyfeed("http://andrewgelman.com/feed/") 

Since the data are already in a tidy format, they are straightforward to use with other tidy-style packages such as tidytext.

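library(tidytext)
library(dplyr)
library(ggplot2)
library(lubridate)  # provides week()

# stop_words ships with tidytext
data("stop_words")

rss_t <- rss %>%
  # one row per word in each post title
  unnest_tokens(word, item_title) %>%
  # drop common words such as "the" and "of"
  anti_join(stop_words, by = "word") %>%
  # keep only words in the Bing lexicon, adding a positive/negative label
  inner_join(get_sentiments("bing"), by = "word") %>%
  # week of the year in which each post was published
  mutate(week = week(item_date))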
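The pipeline above does three things: split each title into words, drop stop words, and keep only words that carry a sentiment label. To make those mechanics concrete without fetching a feed, here is a minimal base-R sketch on two made-up titles. The data frame, the tiny stop-word list and the three-word lexicon are hypothetical stand-ins for the feed, tidytext's stop_words and get_sentiments("bing"), and splitting on non-letters is a simplification of the tokenizer unnest_tokens() actually uses.

```r
# Hypothetical toy data standing in for the output of tidyfeed();
# only item_title is used here
titles <- data.frame(
  item_id = 1:2,
  item_title = c("A great new paper", "The terrible problem with p-values"),
  stringsAsFactors = FALSE
)

# Hypothetical mini stop-word list and lexicon (stand-ins for tidytext's
# stop_words and get_sentiments("bing"))
stop_words_toy <- c("a", "the", "with", "new")
lexicon_toy <- data.frame(
  word = c("great", "terrible", "problem"),
  sentiment = c("positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# 1. Tokenize: one row per word occurrence, lower-cased, split on non-letters
tokens <- do.call(rbind, lapply(seq_len(nrow(titles)), function(i) {
  words <- tolower(unlist(strsplit(titles$item_title[i], "[^A-Za-z]+")))
  words <- words[nzchar(words)]
  data.frame(item_id = titles$item_id[i], word = words, stringsAsFactors = FALSE)
}))

# 2. Anti-join: drop stop words
tokens <- tokens[!tokens$word %in% stop_words_toy, ]

# 3. Inner join: keep only words that appear in the lexicon, adding their label
scored <- merge(tokens, lexicon_toy, by = "word")

table(scored$sentiment)
```

Words such as "paper" and "values" survive the stop-word filter but are dropped by the inner join, which is why only sentiment-bearing words reach the plot.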

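# Bar chart of how many title words are negative vs positive
ggplot(rss_t, aes(x = sentiment)) +
  geom_bar(aes(fill = sentiment), colour = "black") +
  theme_classic() +
  # colours are matched to levels in order: negative (grey), positive (gold)
  scale_fill_manual(values = c("#616161", "#FFD700"))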
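The pipeline also adds a week column, which the bar chart does not use. One possible use, sketched here under assumptions, is a net sentiment score per week (positive words minus negative words). The sketch runs in base R on a hypothetical four-row data frame standing in for rss_t, and computes the week number with the same rule lubridate's week() documents (complete seven-day periods since 1 January, plus one), so it can be checked without a network connection.

```r
# Hypothetical stand-in for rss_t: one row per sentiment-bearing word,
# with the date of the post it came from
rss_toy <- data.frame(
  item_date = as.Date(c("2019-01-02", "2019-01-03", "2019-01-09", "2019-01-10")),
  sentiment = c("positive", "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Week number following lubridate::week(): complete 7-day periods since
# 1 January, plus one (POSIXlt's yday is 0-based, so no extra offset is needed)
rss_toy$week <- as.POSIXlt(rss_toy$item_date)$yday %/% 7 + 1

# Net sentiment per week: count of positive words minus count of negative words
net <- tapply(rss_toy$sentiment == "positive", rss_toy$week, sum) -
  tapply(rss_toy$sentiment == "negative", rss_toy$week, sum)

net
```

With the real data, dplyr's count(rss_t, week, sentiment) gives the same per-week tallies in tidy form, ready for plotting.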