Class 13
Saving Stuff for New Projects
Preparation Materials
Agenda
Today we’ll focus on:
- regex resources
- saving files
- preparing for Homework 2
Regex
Regex is short for “regular expressions”, a tool available in almost every programming language for matching complex string patterns. For example, what if you want to find all sentences containing a word that starts with the letter “b”, or all of the word-final letters?
Have you used regex for anything before?
Regex is very useful but a bit arcane, and in practice most people get help matching the pattern they're interested in through a combination of Google, Stack Overflow, and now AI tools like ChatGPT. You are welcome to use these tools to generate regular expressions for assignments in this class, but even to do that you need to understand some basics, and how to use regex in R!
Today I just want to give a quick demo and point you to some resources in case you want to use regex in the final steps of your projects.
You can use regex with various R functions, such as grep(), separate_wider_regex(), list.files(), str_detect(), other {stringr} functions and many more.
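Here is a minimal base R sketch of the two questions above, using made-up example sentences. It uses `grepl()` (TRUE/FALSE per string) and `regmatches()` with `gregexpr()` (extract every match) with `perl = TRUE`. The {stringr} equivalents `str_detect()` and `str_extract_all()` should accept the same patterns.

```r
sentences <- c(
  "The bag is on the table.",
  "I like apricots.",
  "Bob brought a big box."
)

# \\b is a word boundary, so "\\b[bB]" only matches a b/B at the START of a word
# (the "b" inside "table" doesn't count)
has_b_word <- grepl("\\b[bB]", sentences, perl = TRUE)
has_b_word
#> [1]  TRUE FALSE  TRUE

# A letter immediately followed by a word boundary is a word-final letter
final_letters <- regmatches(sentences, gregexpr("[A-Za-z]\\b", sentences, perl = TRUE))
final_letters[[1]]
#> [1] "e" "g" "s" "n" "e" "e"
```

Note the doubled backslashes: in an R string, `"\\b"` is how you write the regex `\b`.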
This is a good reference to start with for R:
Some more resources:
Saving R Objects as Files
Often you want to share objects created in an analysis across projects, or with other people, without repeating the code. There are multiple ways to do this.
More on writing output at https://r4ds.hadley.nz/data-import.html#sec-writing-to-a-file
RDS files
You can save any kind of R object as an RDS file if you want to use it in another R script or project. RDS is an R-specific “binary” format: you cannot read it as text, and only R can interpret it.
Advantages of RDS format:
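- size efficient (base R saveRDS() compresses by default; readr's write_rds() only compresses if you set its compress argument)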
- works for all types of R objects (not just rectangular dataframes)
- maintains all variable types and object structure
To write and read RDS, you can use write_rds() and read_rds() (or base R saveRDS() - not writeRDS! - and readRDS()).
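A small sketch of why RDS preserves more than a spreadsheet would. It saves a non-rectangular object (a list holding a factor and a fitted model) to a temporary file with base R `saveRDS()` and reads it back. The object itself is an arbitrary example built from the built-in `mtcars` data.

```r
obj <- list(
  groups = factor(c("low", "high", "low"), levels = c("low", "high")),
  model  = lm(mpg ~ wt, data = mtcars)
)

path <- tempfile(fileext = ".rds")  # a throwaway location, just for the demo
saveRDS(obj, path)                  # readr::write_rds(obj, path) works the same way
restored <- readRDS(path)

levels(restored$groups)   # factor levels (and their order) survive
#> [1] "low"  "high"
class(restored$model)     # the fitted model is still a model object
#> [1] "lm"
```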
Create a vector containing whatever you like. Use the help pages to write the code that saves the vector to a file called myvec.rds. Where does the file end up?
CSV, TSV, Excel files
You can also save to CSV, TSV, or Excel formats, just as you can read them in. The disadvantage is that you lose some of the structure of the data (for example, factors come back as plain character columns), and these formats do not work at all for objects like statistical models or plots.
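For CSV and TSV, use the write_ function that parallels each read_ function: write_csv() and write_tsv() from {readr}. {readxl} can only read Excel files, so writing .xlsx needs another package, such as {writexl} and its write_xlsx() function.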
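A quick sketch of the structure you lose with CSV, using a toy data frame. It uses base R `write.csv()`/`read.csv()` so it runs without extra packages. The assumption here is R 4.0 or later, where `read.csv()` no longer turns text into factors; `readr::write_csv()`/`read_csv()` behave the same way in this respect.

```r
toy <- data.frame(
  id    = 1:3,
  group = factor(c("b", "a", "b"), levels = c("b", "a"))  # custom level order
)

csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)
back <- read.csv(csv_path)

class(toy$group)
#> [1] "factor"
class(back$group)   # plain text again, and the "b" before "a" level order is gone
#> [1] "character"
```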
Google Sheets
You can use the googlesheets4 package to write sheets as well, using the write_sheet() function. As with other spreadsheet formats, you lose some of the type information that RDS files preserve, but it is very convenient to be able to share the sheet easily with others.
When you use a write or save function, it will overwrite any existing file with the same name in the same directory.
If you only want to save something once, it’s safest to do so using the console instead of leaving the function in your script, so that it doesn’t overwrite the original version every time you render/run.
If you want to regularly save output, but not always overwrite, you might leave the function in your script but comment it out to only run as needed.
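Besides commenting the line out, another option (not from the lesson, just one possible pattern) is to guard the save with `file.exists()` so a re-run never replaces an earlier version. The output path and the saved object below are hypothetical placeholders.

```r
out_path <- file.path(tempdir(), "summary_stats.rds")  # hypothetical output file

results <- summary(mtcars$mpg)

# Write only if the file isn't there yet, so re-running or rendering the
# script can't clobber the version you saved the first time
if (!file.exists(out_path)) {
  saveRDS(results, out_path)
}

# Alternative: keep the line commented out and run it by hand when needed
# saveRDS(results, out_path)
```

The trade-off is that if you *do* want fresh output, you have to delete the old file (or run the commented line) yourself.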
Saving Figures
You can use ggsave() to save ggplot figures. Adjust the arguments (file extension, width, height, units, dpi) to get the right format and dimensions for your image!
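A minimal sketch with a throwaway scatterplot of the built-in `mtcars` data. The file name and the 6 × 4 inch size are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

png_path <- file.path(tempdir(), "mpg_vs_wt.png")  # hypothetical file name

# The file type comes from the extension (.png, .pdf, .jpg, ...);
# width and height are measured in `units`, and dpi sets the raster resolution
ggsave(png_path, plot = p, width = 6, height = 4, units = "in", dpi = 300)
```

If you leave out `plot`, ggsave() saves the last plot displayed, which is easy to get wrong in a long script, so passing it explicitly is safer.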
Preparing for Homework 2
Let’s get this project started!
Create a new project in a new folder.
Create a “data” folder in your new project.
Collect your data from Homework 1 and save it as an RDS file. Later you will copy this file to your Homework 2 data folder.
In your Homework 1 project, if you haven't already, let's work on the bonus question: getting those sentences out of the column headers.
Here is one approach, which gives us a chance to use separate_wider_position():
library(tidyverse)

# df: the Homework 1 survey responses, read in from the Google Sheet
# "LING 343 Dialect Survey (Responses)", range 'Form Responses 1'

# Put the column headers into a one-column tibble
df_sentences <- tibble(sentences = names(df))

df_sentences |>
  # keep only the question columns (headers containing "Q")
  filter(str_detect(sentences, "Q")) |>
  # first 4 characters -> number; up to the next 100 -> sentences
  separate_wider_position(sentences, widths = c(number = 4, sentences = 100), too_few = "align_start")
#> # A tibble: 9 × 2
#> number sentences
#> <chr> <chr>
#> 1 "Q94:" " “He used to nap on the couch, but he sprawls out in that new lounge …
#> 2 "Q95:" " “I do exclusively figurative paintings anymore.”"
#> 3 "Q96:" " “Pantyhose are so expensive anymore that I just try to get a good su…
#> 4 "Q97:" " “Forget the nice clothes anymore” (referring to babies eating messil…
#> 5 "Q264" " What do you call the area of grass that occurs in the middle of some…
#> 6 "Q142" " What do you call the night before Halloween?"
#> 7 "Q30 " "What is your generic term for a sweetened carbonated beverage?"
#> 8 "Q151" " How do you pronounce “bag”?"
#> 9 "Q178" " How do you pronounce the first vowel in “apricot”?"
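Fixed widths split "Q264" and "Q30 " inconsistently and leave stray spaces, so a regex-based alternative is separate_wider_regex() (in tidyr 1.3.0 or later). This sketch runs on a few hypothetical headers modeled on the output above; with the real data you would start from tibble(sentences = names(df)). The output column name `question` is an arbitrary choice.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Hypothetical headers: one non-question column plus three question formats
headers <- tibble(sentences = c(
  "Timestamp",
  "Q95: I do exclusively figurative paintings anymore.",
  "Q142 What do you call the night before Halloween?",
  "Q30 What is your generic term for a sweetened carbonated beverage?"
))

parsed <- headers |>
  filter(str_detect(sentences, "^Q\\d+")) |>   # headers that START with Q + digits
  separate_wider_regex(
    sentences,
    patterns = c(
      number   = "Q\\d+",   # named piece -> `number` column
      ":?\\s*",             # unnamed piece: optional colon and spaces, dropped
      question = ".*"       # everything else -> `question` column
    )
  )

parsed
```

Named pieces of `patterns` become columns and unnamed pieces are matched but discarded, which is how the colon and spaces disappear.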
Save your sentences/questions to another RDS file.
Copy your RDS files to your Homework 2 project so you can work on them there.
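You can copy the files in your file browser, or do it from R with file.copy() plus list.files() and a regex pattern. In this sketch the folder layout and file names are hypothetical and built inside a temporary directory so it runs anywhere; in practice the two paths would point at your real Homework 1 and Homework 2 data folders.

```r
# Hypothetical folder layout standing in for your two projects
hw1_data <- file.path(tempdir(), "hw1", "data")
hw2_data <- file.path(tempdir(), "hw2", "data")
dir.create(hw1_data, recursive = TRUE, showWarnings = FALSE)
dir.create(hw2_data, recursive = TRUE, showWarnings = FALSE)

# Stand-ins for the RDS files saved in Homework 1
saveRDS(mtcars, file.path(hw1_data, "responses.rds"))
saveRDS(letters, file.path(hw1_data, "sentences.rds"))

# The pattern is a regex: files whose names end in ".rds"
rds_files <- list.files(hw1_data, pattern = "\\.rds$", full.names = TRUE)

# Copy them into the Homework 2 data folder (won't overwrite existing files)
file.copy(rds_files, hw2_data, overwrite = FALSE)

list.files(hw2_data)
#> [1] "responses.rds" "sentences.rds"
```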