Fixes for Known Issues

While writing this book, we anticipated issues cropping up in the printed code. The systems our workflows rely on are in constant flux: developers update their packages, website administrators change URLs, and, of course, we may have made errors (we are only human). So, we agreed to maintain this website to offer fixes for any issues that come up. If you run into one, please let us know!

You can use the feedback form, email us directly, or open a new issue on the GitLab repository. We also re-run the code quarterly to detect any problems.

Below are the error or warning messages that might print in your R console, each followed by a potential solution. We will also keep the code on the GitLab repository up to date.


Chapter 3: Computing Basics

unexpected string constant in
Yes, there is a minor but consequential typo in the code that installs all the `R` packages: a misplaced comma (p. 28). It produces the error message `unexpected string constant in`. Below is the corrected code block:

cran_pkgs <- c(
  "backbone", "caret", "factoextra", "gender", "ggpubr", "ggraph",
  "ggrepel", "ggtern", "glmnet", "gmodels", "googleLanguageR",
  "guardianapi", "gutenbergr", "hunspell", "igraph", "irr",
  "lexicon", "lsa", "marginaleffects", "Matrix", "network", "proustr",
  "qdapDictionaries", "quanteda", "quanteda.textmodels", "remotes",
  "reshape2", "reticulate", "rsample", "rsvd", "rtrek",
  "semgram", "sentimentr", "sna", "stm", "stminsights",
  "stringi", "tesseract", "text2map", "text2vec", "textclean",
  "textstem", "tidygraph", "tidymodels", "tidyquant", "tidytext",
  "tidyverse", "tokenizers", "topicdoc", "topicmodels", "udpipe"
)

install.packages(cran_pkgs)
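
For the curious, this error appears whenever the R parser encounters two string constants side by side with no comma separating them, which is what the typo on p. 28 produces. A minimal, hypothetical illustration (not from the book):

c("igraph" "irr")

Error: unexpected string constant in "c("igraph" "irr""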

Chapter 7: Wrangling Words

Could not download a book

In Chapter 7 (p. 113), when trying to download Lewis Carroll’s books from Project Gutenberg:

book_ids <- c(11, 12, 13, 620, 651)
my_mirror <- "http://mirrors.xmission.com/gutenberg/"

carroll <- gutenberg_download(book_ids,
                              meta_fields = "title",
                              mirror = my_mirror)

We may see the following:


Warning message:
! Could not download a book at http://mirrors.xmission.com/gutenberg//1/11/11.zip.
ℹ The book may have been archived.
ℹ Alternatively, You may need to select a different mirror.
→ See https://www.gutenberg.org/MIRRORS.ALL for options.

The solution is in the message. Go to https://www.gutenberg.org/MIRRORS.ALL and select a different mirror. For example:


my_mirror <- "https://gutenberg.pglaf.org/"
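
With the new mirror set, we can simply re-run the download call from p. 113:

carroll <- gutenberg_download(book_ids,
                              meta_fields = "title",
                              mirror = my_mirror)
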
subscript out of bounds

In Chapter 7 (p. 112), when running the following code:

which.max(leven[, "Jabberwock"] )
which.min(leven[, "Jabberwock"] )

We may see the error message:

Error in leven[, "Jabberwock"] : subscript out of bounds

This is an issue of letter case. In a previous step (p. 111), tokenize_words() lowercased the text by default, so leven has a column named “jabberwock”, not “Jabberwock”. Thus, there are two solutions.

We can use lowercased “jabberwock”:

which.max(leven[, "jabberwock"] )
which.min(leven[, "jabberwock"] )

Or, we can set lowercase to FALSE in the tokenize_words() function:

tokenize_words(carroll$text, lowercase = FALSE)

Chapter 8: Tagging Words

"object 'rating' not found"
In Chapter 8 (p. 136 and p. ) there is a missing package required: `tidyverse`.

When running:

text_lib <- blogs |> filter(rating == "Liberal")

We will see the error:

Error: object 'rating' not found

This is odd, because the blogs dataframe definitely has a column named “rating”! What is happening is that, without loading tidyverse (which loads the dplyr package), R defaults to the filter() function in the base stats package. Therefore, fix it with:

library(tidyverse)

Furthermore, we won’t be able to run several other code chunks in this section without this package loaded.
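
If we only needed this one chunk, an explicit call with the double colon operator would also work (a minimal sketch, assuming dplyr is installed), though loading tidyverse is the better fix here because the later chunks need it too:

text_lib <- blogs |> dplyr::filter(rating == "Liberal")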

No spaCy environment found
In Chapter 8 (p. 143) after running:
spacy_initialize(model = "en_core_web_sm", condaenv = "myenv") 

We may see the error message:

No spaCy environment found. Use `spacy_install()` to get started.

Luckily, running spacy_install() – as the spacyr message states – does appear to resolve the issue!

From inspecting the spacyr package further, it seems that spacy_initialize() no longer takes the condaenv argument (i.e. it is deprecated). This is why it cannot find the Python package spacy that we installed during our setup (in Chapter 3, p. 29).
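
Putting the pieces together, a minimal sketch of the fix (spacy_install() creates its own self-contained spaCy environment, so the deprecated condaenv argument is simply dropped):

spacy_install()
spacy_initialize(model = "en_core_web_sm")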

Chapter 11: Extended Inductive

Using `size` aesthetic for lines was deprecated
In Chapter 11 (p. 200-1), when we run:
df_effs |>
  ggplot(aes(x = rank, y = proportion)) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
      width = 0.1, size = 1) +
  geom_point(size = 3) +
  facet_grid(~Topics) +
  coord_flip() +
  labs(x = "Rank", y = "Topic Proportion")

We may see the following warning:

Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0. Please use `linewidth` instead.

The geom_errorbar() function draws lines, and therefore we need to change its size argument to linewidth:

df_effs |>
  ggplot(aes(x = rank, y = proportion)) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
      width = 0.1, linewidth = 1) +
  geom_point(size = 3) +
  facet_grid(~Topics) +
  coord_flip() +
  labs(x = "Rank", y = "Topic Proportion")

A numeric `legend.position` argument in `theme()` was deprecated
In Chapter 11 (p. 233), when we run:
df_plot |>
  ggplot(aes(x = as.Date(date), y = value, color = name)) +
    geom_smooth(aes(linetype = name)) +
    scale_linetype_manual(values = c("twodash", "solid")) +
    labs(x = NULL, y = "Average Similarity to Press Releases") +
    guides(linetype = guide_legend(nrow = 2)) +
    theme(legend.position = c(.65, .1)) +
    facet_wrap(~lean)

We may see the following warning:

A numeric `legend.position` argument in `theme()` was deprecated in ggplot2 3.5.0. Please use the `legend.position.inside` argument of `theme()` instead. 

The fix follows from the warning message, with one addition: we set legend.position = "inside" and pass the coordinates to legend.position.inside:

df_plot |>
  ggplot(aes(x = as.Date(date), y = value, color = name)) +
    geom_smooth(aes(linetype = name)) +
    scale_linetype_manual(values = c("twodash", "solid")) +
    labs(x = NULL, y = "Average Similarity to Press Releases") +
    guides(linetype = guide_legend(nrow = 2)) +
    theme(legend.position = "inside",
          legend.position.inside = c(.65, .1)) +
    facet_wrap(~lean)

Chapter 12: Extended Deductive

unused arguments
In Chapter 12 (p. 244), when we run:
y_trn <- to_categorical(df_trn$label, num_classes = 3)

We may see the following error:

Error in (function (x, num_classes = NULL)  : 
  unused arguments (y =

This is odd because y is the primary argument of the function. It seems that moving to keras3, an upgraded version of the keras package for R, does the trick. After installing keras3, we will need to start with a fresh R session and then replace library(keras) with library(keras3). That should do it!
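
A minimal sketch of the switch (assuming Python and the backend were already configured during the setup in Chapter 3):

install.packages("keras3")
# Restart R, then load keras3 in place of keras
library(keras3)
y_trn <- to_categorical(df_trn$label, num_classes = 3)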

could not find function "replace_non_ascii"
In Chapter 12 (p. 253), when we run:
df_shake <- df_shake |>
  mutate(gutenberg_id = as.character(gutenberg_id)) |>
  rename(line = text) |>
  mutate(text = replace_non_ascii(line),
        text = replace_curly_quote(text),
        text = replace_contraction(text),
        text = gsub("[[:punct:]]+", " ", text),
        text = gsub("[[:digit:]]+", " ", text),
        text = tolower(text),
        text = str_squish(text)) |>
  filter(text != "")

We see the following error:

In argument: `text = replace_non_ascii(line)`.
Caused by error in `replace_non_ascii()`:
! could not find function "replace_non_ascii"

We need to include library(textclean) when we load libraries for the session.
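
For example (replace_non_ascii(), replace_curly_quote(), and replace_contraction() all come from textclean):

library(textclean)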

as.edgelist.sna input must be an adjacency matrix/array
In Chapter 12 (p. 263), when we run:
get_betweenness(doc_proj) |>
  slice_max(centrality, n = 4, with_ties = FALSE)

We see the following error:

as.edgelist.sna input must be an adjacency matrix/array, edgelist matrix, network, or sparse matrix, or list thereof.

We will also see a warning:

`graph.adjacency()` was deprecated in igraph 2.0.0. Please use `graph_from_adjacency_matrix()` instead.

The get_betweenness() function is one we defined in the session (also on p. 263), so we’ll need to tweak it a bit. Here is the original:

get_betweenness <- function(x) {
  gr <- graph.adjacency(x, mode = "undirected",
                        weighted = TRUE, diag = FALSE)
  E(gr)$weight <- 1 / E(gr)$weight
  df <- data.frame(centrality = betweenness(gr))
  return(df)
}

The culprit is the order in which we loaded the packages. The igraph package is loaded and then the sna package, but both have a function called betweenness(), so the betweenness() from sna masks the one from igraph. We can fix it either by reordering how we load the packages, or by using an explicit call with the double colon operator. While we are at it, we also swap the deprecated graph.adjacency() for graph_from_adjacency_matrix(), which takes care of the warning. Here is the updated version of our own function:

get_betweenness <- function(x) {
  gr <- graph_from_adjacency_matrix(x, mode = "undirected",
                        weighted = TRUE, diag = FALSE)
  E(gr)$weight <- 1 / E(gr)$weight
  df <- data.frame(centrality = igraph::betweenness(gr))
  return(df)
}
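
With the updated function in place, the original call from p. 263 should now run as written:

get_betweenness(doc_proj) |>
  slice_max(centrality, n = 4, with_ties = FALSE)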