The only prerequisite for the R training is to install the latest version of R and RStudio on your computer. These should be available in the Company Portal in Entra, and shouldn’t require special permissions to install. We’ll talk about the difference between R and RStudio on the first day, but for now, just make sure they’re installed.
Your R version should be at least 4.4.3 to make sure everyone’s code behaves the same way. Most likely the R version offered is 4.5.2 and the RStudio version is 2025.06.2-418 or higher; this is how they may appear in Company Portal. If your machine hasn’t migrated to Entra yet, the listings may look different.
A number of packages are required to follow along with the data wrangling and visualization sessions. Please try to install these in RStudio ahead of time by running the code below. If you don’t know how to run the code, watch the Running Code Screencast below to see how.
packages <- c("tidyverse", # for Day 2 and 3 data wrangling
"RColorBrewer", "viridis", "patchwork", # for Day 3 ggplot
"readxl", "writexl") # for Day 1 importing from Excel
install.packages(setdiff(packages, rownames(installed.packages())))
# Check that installation worked
library(tidyverse) # turns on core tidyverse packages
library(RColorBrewer) # palette generator
library(viridis) # more palettes
library(patchwork) # multipanel plots
library(readxl) # reading xlsx
library(writexl) # writing xlsx

Goals for Day 1:
Feedback: Please leave feedback in the
training feedback
form. You can submit feedback multiple times and don’t need to
answer every question. Responses are anonymous.
R is a programming language originally developed by statisticians for statistical computing and graphics. R is free and open source: you will never need a paid license to use it, and you can view the underlying source code of any function and suggest fixes and improvements. Since its first official release in 1995, R has remained one of the leading programming languages for statistics and data visualization, and its capabilities continue to grow.
When you install R, it comes with a simple user interface that lets you write and execute code. However, writing code in this interface is similar to writing a report in Notepad: it’s simple and straightforward, but you likely need more features than Notepad has to format your document. This is where RStudio comes in.
For more information on the history of R, visit the R Project website.

The source pane is primarily where you write code. When you create a new script or open an existing one, it displays here. In the screenshot above, there’s a script called bat_data_wrangling.R open in the source pane. Note that if you haven’t yet opened or created a script, you won’t see this pane until you do.
The source pane color-codes your code to make it easier to read, and detects syntax errors (the coding equivalent of a spell checker) by flagging the line number with a red “x” and showing a squiggly line under the offending code.
When you’re ready to run all or part of your script, you’ll send it to the console.

The console is where the code actually runs. When you first open RStudio, the console will tell you the version of R that you’re running (it should be R 4.4.3 or greater).
While most often you’ll run code from a script in the source pane, you can also run code directly in the console. Code in the console won’t get saved to a file, but it’s a great way to experiment and test out lines of code before adding them to your script in the source pane. The console is also where errors appear if your code breaks. Deciphering errors can be a challenge that gets easier over time. Googling errors is a good place to start.

File organization is an important part of being a good coder. Keeping code, input data, and results together in one place will protect your sanity and the sanity of the person who inherits the project. RStudio projects help with this. Creating a new RStudio project for each new code project makes it easier to manage settings and file paths.
Before we create a project, take a look at the Console tab.
Notice that at the top of the console there is a folder path. That path
is your current working directory.
If you refer to a file in R using a relative path, for example
./data/my_data_file.csv, R will look in your current
working directory for a folder called data containing a
file called my_data_file.csv.
Note the use of forward slashes instead of backslashes in file paths. R accepts either a forward slash (/) or a double backslash (\\) as a separator, but not a single backslash. For example, the two full paths below are equivalent, and both are what the relative path above would resolve to on a hypothetical machine:
C:/Users/yourname/Documents/R/my_project/data/my_data_file.csv
C:\\Users\\yourname\\Documents\\R\\my_project\\data\\my_data_file.csv
Using relative paths is helpful because a full path is specific to your computer and likely won’t work on a different computer. But there’s no guarantee that everyone has the same default R working directory. This is where projects come in. Projects package all of your code, data, output, etc. into a self-contained folder that is easily transferable to other machines regardless of file location.

Name your new project imd_r_intro. Next, you’ll select what folder to keep your project folder in. Documents/R is a good place to store all of your R projects, but it’s up to you. When you are done, click on Create Project.
If you successfully started a project named
imd_r_intro, you should see it listed at the very top right
of your screen. As you start new projects, you’ll want to check that
you’re working in the right one before you start coding. Take a look at
the Console tab again. Notice that your current working directory
is now your project folder. When you look in the Files tab of the
bottom right pane, you’ll see that it also defaults to the project
folder.
You can confirm this with the list.files() function, which lists everything in the working directory of your project.
Next, create a script called day_1_script.R. Make sure you are working in the imd_r_intro project that you just created. Click on the New File icon and save the new script as day_1_script.R.
We’ll start with something simple. Basic math in R is straightforward, and the syntax is similar to using a graphing calculator. You can use the examples below or come up with your own. Even if you’re using the examples, try to actually type the code instead of copy-pasting; you’ll learn to code faster that way.
To run a single line of code in your script, place your cursor anywhere in that line and press CTRL+ENTER (or click the Run button in the top right of the script pane). To run multiple lines of code, highlight the lines you want to run and hit CTRL+ENTER or click Run.
To leave notes in your script, use the hashtag/pound sign (#). This will change the color of text that R reads as a comment and doesn’t run. Commenting your code is one of the best habits you can form. Comments are a gift to your future self and anyone else who tries to use your code.
Type code below in your script and run each line.
# Commented text: try this line to generate some basic text and become familiar with where results will appear:
print("Welcome to R!")
## [1] "Welcome to R!"
# basic arithmetic, including order of operations:
1 + 1
## [1] 2
3 / 2
## [1] 1.5
1 + 4 / 2
## [1] 3
2 - 3
## [1] -1
Coding Tip: Notice that when you run a line of code, the code and the result appear in the console. You can also type code directly into the console, but it won’t be saved anywhere. As you get more comfortable with R, it can be helpful to use the console as a “scratchpad” for experimenting and troubleshooting. For now, it’s best to err on the side of saving your code as a script so that you don’t accidentally lose useful work.
Occasionally, it’s enough to just run a line of code and display the result in the console. But typically our code is more complex than adding one plus one, and we want to store the result and use it later in the script. This is where variables come in. Variables allow you to assign a value (whether that’s a number, a data table, a chunk of text, or any other type of data that R can handle) to a short, human-readable name. Anywhere you put a variable in your code, R will replace it with its value when your code runs. Variables are also called objects in R.
R uses the <- symbol for variable assignment. If
you’ve used other programming languages, you may be tempted to use
= instead. It will work, but there are subtle differences
between <- and =, so you should get in the
habit of using <-.
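For instance (using throwaway names), both operators assign a value here, but <- is the convention we’ll stick to:

```r
score <- 5  # assignment with <-, the preferred style
total = 5   # = also assigns at the top level
score + total
## [1] 10
```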
R is case-sensitive. So if you name one object treedata
and another Treedata or TREEDATA, R will
interpret these all as unique objects. While you can do things like
this, it’s best practice not to use the same name for different objects,
as it makes code difficult to follow.
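A quick sketch of what that looks like (the values here are made up):

```r
treedata <- 10
Treedata <- 20
TREEDATA <- 30
# R treats these as three distinct objects
treedata + Treedata + TREEDATA
## [1] 60
```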
Type the code below in your script to assign values to variables named a and b.
# the value of 12.098 is assigned to variable 'a'
a <- 12.098
# and the value 65.3475 is assigned to variable 'b'
b <- 65.3475
# we can now perform whatever mathematical operations we want using these two
# variables without having to repeatedly type out the actual numbers:
a*b
## [1] 790.5741
In the code above, we assign the variables a and
b once. We can then reuse them as often as we want. This is
helpful because we save ourselves some typing, reduce the chances of
making a typo somewhere, and if we need to change the value of
a or b, we only have to do it in one
place.
Also notice that when you assign variables, you can see them listed in your Environment tab (top right pane). Remember, everything you see in the environment is just in R’s temporary memory and won’t be saved when you close out of RStudio.
All of the examples you’ve seen so far are fairly contrived for the sake of simplicity. Let’s take a look at some code that everyone here will make use of at some point: reading data from a CSV.

It’s hard to get very far in R without making use of functions. Think of a function as a programmed task that takes some kind of input (the argument(s)) from the user and outputs a result (the return value).
Coding Tip: Note the difference in how RStudio color codes what it thinks are functions. There are a lot of pre-programmed functions in base R, which is what comes along with R when you install R. Installing R packages will add additional functions. You can also build your own. Names that R recognizes as a function are color coded differently than what R recognizes as text, numbers, etc. It’s good practice to not use existing functions as new object names.
Commonly used base R functions include:
mean(): calculate the mean of a set of numbers
min(): calculate the minimum of a set of numbers
max(): calculate the maximum of a set of numbers
range(): calculate the min and max of a set of numbers
sd(): calculate the standard deviation of a set of numbers
sqrt(): calculate the square root of a value
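For example, here are a few of those functions applied to a small made-up vector:

```r
heights <- c(4, 9, 16)  # a throwaway vector for illustration
min(heights)
## [1] 4
max(heights)
## [1] 16
sqrt(heights) # works element-by-element on a vector
## [1] 2 3 4
```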
Calculate the mean and range of a vector to see how functions work:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# equivalent to x <- 1:10
# bad coding: this would overwrite the name of the mean() function
# mean <- mean(x)
# good coding
mean_x <- mean(x)
mean_x
## [1] 5.5
range(x)
## [1] 1 10
Most of the work we do in R relies on one or more existing datasets that we want to query or summarize, rather than data we create in R. Importing data is therefore an important skill. R can import just about any file type, including CSV and MS Excel files. You can also import tables from MS Access and SQL databases using ODBC drivers. That’s beyond the scope of this class, but I can share examples for anyone needing to import from a database. For now, I’ll show how to work with CSVs and Excel spreadsheets.
We use the read.csv() function to import CSVs in R. The
read.csv() function takes the file path or url to the CSV
as input and outputs a data frame containing the data from the CSV. Here
we’re going to read a CSV from a website, then save that in the data
folder of our project. We’ll talk more about what data frames are
next.
Run the following line to import a teaching ACAD wetland dataset from the GitHub repository for this training:
# read in the data from ACAD_wetland_data_clean.csv and assign it as a dataframe to the variable "ACAD_wetland"
ACAD_wetland <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_wetland_data_clean.csv"
)
View the data in a separate window by running the View() function.

Or check out the first few or last few records in your console with head() and tail():
head(ACAD_wetland)
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 1 SEN-01 Sentinel Acer rubrum red maple 2011 0 0.02
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary 2011 80 2.22
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry 2011 100 2.64
## 6 SEN-01 Sentinel Carex exilis coastal sedge 2011 60 6.60
## Invasive Protected X_Coord Y_Coord
## 1 FALSE FALSE 574855.5 4911909
## 2 FALSE FALSE 574855.5 4911909
## 3 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 5 FALSE FALSE 574855.5 4911909
## 6 FALSE FALSE 574855.5 4911909
tail(ACAD_wetland)
## Site_Name Site_Type Latin_Name
## 503 RAM-05 RAM Vaccinium oxycoccos
## 504 RAM-05 RAM Vaccinium vitis-idaea
## 505 RAM-05 RAM Viburnum nudum var. cassinoides
## 506 RAM-05 RAM Viburnum nudum var. cassinoides
## 507 RAM-05 RAM Xyris montana
## 508 RAM-05 RAM Xyris montana
## Common Year PctFreq Ave_Cov Invasive Protected X_Coord
## 503 small cranberry 2012 100 0.04 FALSE FALSE 553186
## 504 lingonberry 2017 25 0.02 FALSE FALSE 553186
## 505 northern wild raisin 2017 100 0.84 FALSE FALSE 553186
## 506 northern wild raisin 2012 100 63.00 FALSE FALSE 553186
## 507 northern yellow-eyed-grass 2017 50 0.44 FALSE FALSE 553186
## 508 northern yellow-eyed-grass 2012 50 1.24 FALSE FALSE 553186
## Y_Coord
## 503 4899764
## 504 4899764
## 505 4899764
## 506 4899764
## 507 4899764
## 508 4899764
Now write the CSV to disk, then confirm you can import it from your computer.
# Write the data frame to your data folder using a relative path.
# Note: the data folder must already exist in your project for this to work.
# By default, write.csv adds a column with row names that are numbers. I don't
# like that, so I turn that off.
write.csv(ACAD_wetland, "./data/ACAD_wetland_data_clean.csv", row.names = FALSE)

Make sure the write to disk worked by importing the CSV from your computer:
# Read the data frame in using a relative path
ACAD_wetland <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Equivalent code using the full path on my computer; it won't match another user's machine.
ACAD_wetland <- read.csv("C:/Users/KMMiller/OneDrive - DOI/NETN/R_Dev/IMD_R_Training_2026/data/ACAD_wetland_data_clean.csv")

Base R does not have a way to import MS Excel files. The first step
for working with Excel files (i.e., files with .xls or .xlsx
extensions), therefore, is to install the readxl package to
import .xlsx files and writexl to write files to .xlsx. The
readxl package has a couple of options for loading Excel
spreadsheets, depending on whether the extension is .xls, .xlsx, or
unknown, along with options to import different worksheets within a
spreadsheet.
The code below installs the required packages (if you forgot to ahead of time), loads them, and then writes the ACAD_wetland data frame we just imported to an .xlsx file. The last step imports the .xlsx version of the ACAD wetland data.
Note that the read_xlsx() function can’t read from a URL the way read.csv() can.
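A minimal sketch of those steps (assuming the packages installed at the start of this page and a data folder in your project):

```r
library(readxl)   # provides read_xlsx()
library(writexl)  # provides write_xlsx()

# write the data frame we imported earlier to an .xlsx file
write_xlsx(ACAD_wetland, "./data/ACAD_wetland_data_clean.xlsx")

# read the .xlsx version back in; read_xlsx() returns a tibble
ACAD_wetland_xlsx <- read_xlsx("./data/ACAD_wetland_data_clean.xlsx")
head(ACAD_wetland_xlsx)
```

The head() call produces the tibble output shown below.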
## # A tibble: 6 × 11
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov Invasive Protected
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl> <lgl>
## 1 SEN-01 Sentinel Acer rubr… red m… 2011 0 0.02 FALSE FALSE
## 2 SEN-01 Sentinel Amelanchi… servi… 2011 20 0.02 FALSE FALSE
## 3 SEN-01 Sentinel Andromeda… bog r… 2011 80 2.22 FALSE FALSE
## 4 SEN-01 Sentinel Arethusa … drago… 2011 40 0.04 FALSE TRUE
## 5 SEN-01 Sentinel Aronia me… black… 2011 100 2.64 FALSE FALSE
## 6 SEN-01 Sentinel Carex exi… coast… 2011 60 6.6 FALSE FALSE
## # ℹ 2 more variables: X_Coord <dbl>, Y_Coord <dbl>
The data frame we just examined is a type of data structure. A data structure is what it sounds like: a structure that holds data in an organized way. There are multiple data structures in R, including vectors, lists, arrays, matrices, data frames, and tibbles (more on this data structure later). Today we’ll focus on vectors and data frames.
Vectors are the simplest data structure in R. Vectors are like a single column of data in an Excel spreadsheet. Vectors only have one dimension, and can be accessed by their row number. Here are some examples of vectors:
digits <- 1:10
digits
## [1] 1 2 3 4 5 6 7 8 9 10
digits + 1
## [1] 2 3 4 5 6 7 8 9 10 11
is_odd <- (digits + 1) %% 2 == 1 # TRUE when digits + 1 is odd
is_odd
## [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
tree_dbh <- c(12.5, 20.4, 18.1, 38.5, 19.3)
tree_dbh
## [1] 12.5 20.4 18.1 38.5 19.3
bird_ids <- c("black-capped chickadee", "dark-eyed junco", "golden-crowned kinglet", "dark-eyed junco")
bird_ids
## [1] "black-capped chickadee" "dark-eyed junco" "golden-crowned kinglet"
## [4] "dark-eyed junco"
Note the use of c(). The c() function stands for combine, and it combines elements into a single vector, with elements separated by commas. The c() function is a fairly universal way to combine multiple elements in R, and you’re going to see it over and over. Note how in digits, when we added 1, every value in digits increased by 1. This highlights the concept of vectorization in R: you can apply a single operation to a vector (or a column in a data frame), and it will apply to all elements of that vector.
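As a quick illustration of vectorization, reusing the tree_dbh values from above:

```r
tree_dbh <- c(12.5, 20.4, 18.1, 38.5, 19.3)
tree_dbh * 2  # the multiplication applies to every element
## [1] 25.0 40.8 36.2 77.0 38.6
```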
If you need to access a single element of a vector, you can use the
syntax my_vector[x] where x is the element’s
index (the number corresponding to its position in the vector).
You can also use a vector of indices to extract multiple elements from
the vector. Note that in R, indexing starts at 1
(i.e. my_vector[1] is the first element of
my_vector). If you’ve coded in other languages, you may be
used to indexing starting at 0.
bird_ids[2]
## [1] "dark-eyed junco"
bird_ids[c(1, 2)]
## [1] "black-capped chickadee" "dark-eyed junco"
You can also return only unique values from a vector. The
bird_ids vector has dark-eyed juncos listed twice. To get
only unique species, run the following code. I also added
sort() to sort the list alphabetically.
sort(unique(bird_ids))
## [1] "black-capped chickadee" "dark-eyed junco" "golden-crowned kinglet"
In the examples above, each vector contains a different type of data:
digits contains integers, is_odd contains
logical (TRUE/FALSE) values, bird_ids contains text, and
tree_dbh contains decimal numbers. That’s because a given
vector can only contain a single type of data.
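You can see this single-type rule in action when you mix types: R quietly converts everything to the most general type (here, character):

```r
mixed <- c(1, 2, "three")  # numbers and text in one vector
class(mixed)  # the numbers were coerced to character
## [1] "character"
```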
In R, there are six main data types. The four you’ll use most often are:
character: text, always wrapped in quotes (e.g. "hello", "3", "R is my favorite programming language")
numeric: numbers with or without decimals (e.g. 23, 3.1415)
integer: whole numbers; to store a number as an integer, append an L to it or use as.integer() (e.g. 5L, as.integer(30))
logical: TRUE/FALSE values (TRUE, FALSE). Note that TRUE and FALSE must be all-uppercase.
The remaining two types, complex and raw, rarely come up in everyday data work.
You can use the class() function to get the data type of a vector:
class(bird_ids)
## [1] "character"
class(tree_dbh)
## [1] "numeric"
class(digits)
## [1] "integer"
class(is_odd)
## [1] "logical"
Data frames are the main way we’ll interact with data in R. They’re essentially like spreadsheets in Excel, but with specific properties:
Each column is a vector, so a given column can only contain a single data type.
Every column has the same number of rows.
Columns have names, which you use to refer to them in code.
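For illustration, here’s a small made-up data frame built by hand; each column is a vector of a single type:

```r
sites <- data.frame(
  Site_Name = c("SEN-01", "SEN-02", "SEN-03"),  # character
  Visits    = c(3L, 2L, 4L),                    # integer
  Wetland   = c(TRUE, TRUE, FALSE)              # logical
)
str(sites)  # shows the type assigned to each column
```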
Coding Tip: R is strict about assigning data types to columns,
such that any text in an otherwise numeric field will turn the entire
column into a character. Similarly, if there’s anything besides TRUE,
FALSE, or a blank in a field meant to be TRUE/FALSE, R will treat that
as a character field instead of logical. So, if R treats as a character
something that should be a numeric field, it’s a good clue there may be
a typo or issue in your data needing attention. You can check the
assigned data types using the str() function:
str(ACAD_wetland)
## 'data.frame': 508 obs. of 11 variables:
## $ Site_Name : chr "SEN-01" "SEN-01" "SEN-01" "SEN-01" ...
## $ Site_Type : chr "Sentinel" "Sentinel" "Sentinel" "Sentinel" ...
## $ Latin_Name: chr "Acer rubrum" "Amelanchier" "Andromeda polifolia" "Arethusa bulbosa" ...
## $ Common : chr "red maple" "serviceberry" "bog rosemary" "dragon's mouth" ...
## $ Year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ PctFreq : int 0 20 80 40 100 60 100 60 100 100 ...
## $ Ave_Cov : num 0.02 0.02 2.22 0.04 2.64 6.6 10.2 0.06 0.86 4.82 ...
## $ Invasive : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Protected : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ X_Coord : num 574855 574855 574855 574855 574855 ...
## $ Y_Coord : num 4911909 4911909 4911909 4911909 4911909 ...
One way to access a column of a data frame is with the $ syntax. The $ separates the data frame name from the column name. It’s similar to the [table_name].[column_name] syntax in Access.
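For example, with the ACAD_wetland data frame from earlier loaded, the $ syntax pulls out one column as a vector:

```r
# the Year column as a vector (assumes ACAD_wetland has been imported)
years <- ACAD_wetland$Year
# columns extracted with $ can be passed straight to functions
range(years)
```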
To view the names of the columns in a data frame, you can use the
names() function, or use head() to see the
first 6 rows with column names. Whatever you prefer. I’ll use the former
for now.
See the column names in the wetland data:
names(ACAD_wetland)
## [1] "Site_Name" "Site_Type" "Latin_Name" "Common" "Year"
## [6] "PctFreq" "Ave_Cov" "Invasive" "Protected" "X_Coord"
## [11] "Y_Coord"
See the list of all sites and species in the wetland data:
ACAD_wetland$Site_Name
## [1] "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01"
## [9] "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01"
## [17] "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01"
## [25] "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01" "SEN-01"
## [33] "SEN-01" "SEN-01" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02"
## [41] "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02"
## [49] "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02"
## [57] "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02"
## [65] "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02" "SEN-02"
## [73] "SEN-02" "SEN-02" "SEN-02" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03"
## [81] "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03"
## [89] "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03"
## [97] "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03" "SEN-03"
## [105] "SEN-03" "SEN-03" "SEN-03" "SEN-03" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [113] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [121] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [129] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [137] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [145] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [153] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [161] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [169] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41"
## [177] "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-41" "RAM-53" "RAM-53" "RAM-53"
## [185] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [193] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [201] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [209] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [217] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [225] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [233] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [241] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [249] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [257] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [265] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53"
## [273] "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-53" "RAM-62"
## [281] "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62"
## [289] "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62"
## [297] "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62"
## [305] "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62"
## [313] "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62"
## [321] "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62" "RAM-62"
## [329] "RAM-62" "RAM-62" "RAM-62" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [337] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [345] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [353] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [361] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [369] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [377] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [385] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [393] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [401] "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44" "RAM-44"
## [409] "RAM-44" "RAM-44" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [417] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [425] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [433] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [441] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [449] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [457] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [465] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [473] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [481] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [489] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [497] "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05" "RAM-05"
## [505] "RAM-05" "RAM-05" "RAM-05" "RAM-05"
ACAD_wetland$Latin_Name
## [1] "Acer rubrum" "Amelanchier"
## [3] "Andromeda polifolia" "Arethusa bulbosa"
## [5] "Aronia melanocarpa" "Carex exilis"
## [7] "Chamaedaphne calyculata" "Drosera intermedia"
## [9] "Drosera rotundifolia" "Empetrum nigrum"
## [11] "Eriophorum angustifolium" "Eriophorum vaginatum"
## [13] "Gaylussacia baccata" "Gaylussacia dumosa"
## [15] "Ilex mucronata" "Juniperus communis"
## [17] "Kalmia angustifolia" "Kalmia polifolia"
## [19] "Larix laricina" "Myrica gale"
## [21] "Nuphar variegata" "Picea mariana"
## [23] "Rhododendron canadense" "Rhododendron groenlandicum"
## [25] "Rhynchospora alba" "Sarracenia purpurea"
## [27] "Solidago uliginosa" "Symplocarpus foetidus"
## [29] "Trichophorum cespitosum" "Trientalis borealis"
## [31] "Utricularia cornuta" "Vaccinium oxycoccos"
## [33] "Viburnum nudum var. cassinoides" "Xyris montana"
## [35] "Acer rubrum" "Arethusa bulbosa"
## [37] "Aronia melanocarpa" "Betula populifolia"
## [39] "Carex exilis" "Carex lasiocarpa"
## [41] "Carex stricta" "Carex utriculata"
## [43] "Chamaedaphne calyculata" "Drosera intermedia"
## [45] "Drosera rotundifolia" "Dulichium arundinaceum"
## [47] "Eriophorum virginicum" "Gaylussacia baccata"
## [49] "Ilex mucronata" "Ilex verticillata"
## [51] "Juncus acuminatus" "Kalmia angustifolia"
## [53] "Kalmia polifolia" "Larix laricina"
## [55] "Lysimachia terrestris" "Maianthemum trifolium"
## [57] "Muhlenbergia uniflora" "Myrica gale"
## [59] "Oclemena nemoralis" "Picea mariana"
## [61] "Pinus strobus" "Rhododendron canadense"
## [63] "Rhododendron groenlandicum" "Rhynchospora alba"
## [65] "Rubus hispidus" "Sarracenia purpurea"
## [67] "Scutellaria lateriflora" "Spiraea alba"
## [69] "Spiraea tomentosa" "Thuja occidentalis"
## [71] "Triadenum virginicum" "Vaccinium angustifolium"
## [73] "Vaccinium corymbosum" "Vaccinium macrocarpon"
## [75] "Vaccinium oxycoccos" "Acer rubrum"
## [77] "Alnus incana++" "Amelanchier"
## [79] "Aronia melanocarpa" "Carex stricta"
## [81] "Carex trisperma" "Chamaedaphne calyculata"
## [83] "Cornus canadensis" "Drosera rotundifolia"
## [85] "Eriophorum angustifolium" "Eurybia radula"
## [87] "Gaultheria hispidula" "Gaylussacia baccata"
## [89] "Ilex mucronata" "Ilex verticillata"
## [91] "Kalmia angustifolia" "Kalmia polifolia"
## [93] "Larix laricina" "Morella pensylvanica"
## [95] "Osmundastrum cinnamomea" "Picea mariana"
## [97] "Pinus banksiana" "Rhododendron canadense"
## [99] "Rhododendron groenlandicum" "Sarracenia purpurea"
## [101] "Sorbus americana" "Spiraea alba"
## [103] "Thuja occidentalis" "Trientalis borealis"
## [105] "Vaccinium angustifolium" "Vaccinium macrocarpon"
## [107] "Vaccinium oxycoccos" "Viburnum nudum"
## [109] "Acer rubrum" "Acer rubrum"
## [111] "Alnus incana" "Amelanchier"
## [113] "Amelanchier" "Aronia melanocarpa"
## [115] "Berberis thunbergii" "Calamagrostis canadensis"
## [117] "Carex folliculata" "Carex trisperma"
## [119] "Carex trisperma" "Carex"
## [121] "Chamaedaphne calyculata" "Chamaedaphne calyculata"
## [123] "Cornus canadensis" "Cornus canadensis"
## [125] "Doellingeria umbellata" "Doellingeria umbellata"
## [127] "Dryopteris cristata" "Empetrum nigrum"
## [129] "Eriophorum angustifolium" "Eriophorum angustifolium"
## [131] "Gaylussacia baccata" "Gaylussacia baccata"
## [133] "Ilex mucronata" "Ilex mucronata"
## [135] "Ilex verticillata" "Iris versicolor"
## [137] "Juncus effusus" "Kalmia angustifolia"
## [139] "Kalmia angustifolia" "Larix laricina"
## [141] "Larix laricina" "Maianthemum canadense"
## [143] "Maianthemum canadense" "Maianthemum trifolium"
## [145] "Myrica gale" "Myrica gale"
## [147] "Onoclea sensibilis" "Osmundastrum cinnamomea"
## [149] "Osmundastrum cinnamomea" "Picea glauca"
## [151] "Picea mariana" "Picea mariana"
## [153] "Prenanthes" "Rhododendron canadense"
## [155] "Rhododendron canadense" "Rhododendron groenlandicum"
## [157] "Rhododendron groenlandicum" "Rosa nitida"
## [159] "Rosa nitida" "Rosa palustris"
## [161] "Rubus flagellaris" "Rubus hispidus"
## [163] "Rubus hispidus" "Sarracenia purpurea"
## [165] "Solidago uliginosa" "Solidago uliginosa"
## [167] "Spiraea alba" "Spiraea alba"
## [169] "Symplocarpus foetidus" "Thelypteris palustris"
## [171] "Trientalis borealis" "Trientalis borealis"
## [173] "Vaccinium angustifolium" "Vaccinium angustifolium"
## [175] "Vaccinium corymbosum" "Vaccinium corymbosum"
## [177] "Vaccinium myrtilloides" "Vaccinium myrtilloides"
## [179] "Vaccinium vitis-idaea" "Viburnum nudum var. cassinoides"
## [181] "Viburnum nudum var. cassinoides" "Acer rubrum"
## [183] "Acer rubrum" "Alnus incana"
## [185] "Alnus incana" "Amelanchier"
## [187] "Amelanchier" "Aronia melanocarpa"
## [189] "Berberis thunbergii" "Berberis thunbergii"
## [191] "Calamagrostis canadensis" "Calamagrostis canadensis"
## [193] "Carex atlantica" "Carex atlantica"
## [195] "Carex folliculata" "Carex folliculata"
## [197] "Carex lasiocarpa" "Carex lasiocarpa"
## [199] "Carex stricta" "Carex stricta"
## [201] "Celastrus orbiculatus" "Chamaedaphne calyculata"
## [203] "Chamaedaphne calyculata" "Cornus canadensis"
## [205] "Cornus canadensis" "Danthonia spicata"
## [207] "Dichanthelium acuminatum" "Doellingeria umbellata"
## [209] "Doellingeria umbellata" "Drosera intermedia"
## [211] "Drosera rotundifolia" "Drosera rotundifolia"
## [213] "Dulichium arundinaceum" "Dulichium arundinaceum"
## [215] "Epilobium leptophyllum" "Eriophorum angustifolium"
## [217] "Eriophorum tenellum" "Eriophorum virginicum"
## [219] "Eriophorum virginicum" "Eurybia radula"
## [221] "Gaylussacia baccata" "Gaylussacia baccata"
## [223] "Glyceria" "Ilex mucronata"
## [225] "Ilex mucronata" "Ilex verticillata"
## [227] "Ilex verticillata" "Juncus canadensis"
## [229] "Kalmia angustifolia" "Kalmia angustifolia"
## [231] "Larix laricina" "Lysimachia terrestris"
## [233] "Lysimachia terrestris" "Morella pensylvanica"
## [235] "Myrica gale" "Myrica gale"
## [237] "Oclemena nemoralis" "Oclemena nemoralis"
## [239] "Osmunda regalis" "Osmunda regalis"
## [241] "Osmundastrum cinnamomea" "Osmundastrum cinnamomea"
## [243] "Picea rubens" "Pinus strobus"
## [245] "Pinus strobus" "Pogonia ophioglossoides"
## [247] "Quercus rubra" "Rhamnus frangula"
## [249] "Rhamnus frangula" "Rhododendron canadense"
## [251] "Rhododendron canadense" "Rhynchospora alba"
## [253] "Rosa palustris" "Rubus flagellaris"
## [255] "Rubus hispidus" "Scirpus cyperinus"
## [257] "Scirpus cyperinus" "Solidago rugosa"
## [259] "Solidago uliginosa" "Spiraea alba"
## [261] "Spiraea alba" "Spiraea tomentosa"
## [263] "Spiraea tomentosa" "Symphyotrichum novi-belgii"
## [265] "Thelypteris palustris" "Thelypteris palustris"
## [267] "Triadenum virginicum" "Triadenum"
## [269] "Trientalis borealis" "Typha latifolia"
## [271] "Typha latifolia" "Vaccinium angustifolium"
## [273] "Vaccinium corymbosum" "Vaccinium corymbosum"
## [275] "Vaccinium macrocarpon" "Vaccinium macrocarpon"
## [277] "Viburnum nudum var. cassinoides" "Viburnum nudum var. cassinoides"
## [279] "Viola" "Acer rubrum"
## [281] "Acer rubrum" "Amelanchier"
## [283] "Aronia melanocarpa" "Carex limosa"
## [285] "Carex trisperma" "Carex trisperma"
## [287] "Chamaedaphne calyculata" "Chamaedaphne calyculata"
## [289] "Cornus canadensis" "Cornus canadensis"
## [291] "Eriophorum virginicum" "Eriophorum virginicum"
## [293] "Gaultheria hispidula" "Gaultheria hispidula"
## [295] "Gaylussacia baccata" "Gaylussacia baccata"
## [297] "Ilex mucronata" "Ilex mucronata"
## [299] "Ilex verticillata" "Kalmia angustifolia"
## [301] "Kalmia angustifolia" "Larix laricina"
## [303] "Larix laricina" "Maianthemum trifolium"
## [305] "Maianthemum trifolium" "Monotropa uniflora"
## [307] "Monotropa uniflora" "Myrica gale"
## [309] "Myrica gale" "Picea mariana"
## [311] "Picea mariana" "Rhododendron canadense"
## [313] "Rhododendron canadense" "Rhododendron groenlandicum"
## [315] "Rhododendron groenlandicum" "Symplocarpus foetidus"
## [317] "Symplocarpus foetidus" "Trientalis borealis"
## [319] "Trientalis borealis" "Vaccinium angustifolium"
## [321] "Vaccinium angustifolium" "Vaccinium corymbosum"
## [323] "Vaccinium corymbosum" "Vaccinium myrtilloides"
## [325] "Vaccinium myrtilloides" "Vaccinium oxycoccos"
## [327] "Vaccinium oxycoccos" "Vaccinium vitis-idaea"
## [329] "Vaccinium vitis-idaea" "Viburnum nudum var. cassinoides"
## [331] "Viburnum nudum var. cassinoides" "Acer rubrum"
## [333] "Acer rubrum" "Alnus incana"
## [335] "Alnus incana" "Apocynum androsaemifolium"
## [337] "Apocynum androsaemifolium" "Betula populifolia"
## [339] "Betula populifolia" "Calamagrostis canadensis"
## [341] "Calamagrostis canadensis" "Carex atlantica"
## [343] "Carex lacustris" "Carex lacustris"
## [345] "Carex lasiocarpa" "Carex lasiocarpa"
## [347] "Carex Ovalis group" "Carex stricta"
## [349] "Carex stricta" "Chamaedaphne calyculata"
## [351] "Chamaedaphne calyculata" "Comptonia peregrina"
## [353] "Doellingeria umbellata" "Doellingeria umbellata"
## [355] "Dryopteris cristata" "Dulichium arundinaceum"
## [357] "Equisetum arvense" "Eurybia macrophylla"
## [359] "Festuca filiformis" "Ilex mucronata"
## [361] "Ilex verticillata" "Ilex verticillata"
## [363] "Iris versicolor" "Iris versicolor"
## [365] "Juncus canadensis" "Juncus canadensis"
## [367] "Juncus effusus" "Lupinus polyphyllus"
## [369] "Lysimachia terrestris" "Lysimachia terrestris"
## [371] "Malus" "Myrica gale"
## [373] "Myrica gale" "Onoclea sensibilis"
## [375] "Onoclea sensibilis" "Osmunda regalis"
## [377] "Osmundastrum cinnamomea" "Phleum pratense"
## [379] "Pinus strobus" "Pinus strobus"
## [381] "Populus grandidentata" "Populus grandidentata"
## [383] "Populus tremuloides" "Populus tremuloides"
## [385] "Quercus rubra" "Ranunculus acris"
## [387] "Rhamnus frangula" "Rhamnus frangula"
## [389] "Rhododendron canadense" "Rosa nitida"
## [391] "Rosa palustris" "Rosa virginiana"
## [393] "Rubus hispidus" "Rubus"
## [395] "Salix petiolaris" "Salix"
## [397] "Scirpus cyperinus" "Scirpus cyperinus"
## [399] "Scutellaria" "Solidago rugosa"
## [401] "Spiraea alba" "Spiraea alba"
## [403] "Spiraea tomentosa" "Triadenum virginicum"
## [405] "Triadenum" "Utricularia cornuta"
## [407] "Vaccinium corymbosum" "Veronica officinalis"
## [409] "Vicia cracca" "Vicia cracca"
## [411] "Acer rubrum" "Acer rubrum"
## [413] "Alnus incana" "Amelanchier"
## [415] "Arethusa bulbosa" "Arethusa bulbosa"
## [417] "Aronia melanocarpa" "Aronia melanocarpa"
## [419] "Calamagrostis canadensis" "Calopogon tuberosus"
## [421] "Calopogon tuberosus" "Carex atlantica"
## [423] "Carex exilis" "Carex folliculata"
## [425] "Carex folliculata" "Carex magellanica"
## [427] "Carex pauciflora" "Carex trisperma"
## [429] "Carex trisperma" "Chamaedaphne calyculata"
## [431] "Chamaedaphne calyculata" "Cornus canadensis"
## [433] "Cornus canadensis" "Drosera intermedia"
## [435] "Drosera intermedia" "Drosera rotundifolia"
## [437] "Drosera rotundifolia" "Empetrum nigrum"
## [439] "Empetrum nigrum" "Eriophorum angustifolium"
## [441] "Eriophorum angustifolium" "Eriophorum virginicum"
## [443] "Eriophorum virginicum" "Eurybia radula"
## [445] "Gaultheria hispidula" "Gaultheria hispidula"
## [447] "Gaylussacia baccata" "Gaylussacia baccata"
## [449] "Gaylussacia dumosa" "Glyceria striata"
## [451] "Glyceria" "Ilex mucronata"
## [453] "Ilex mucronata" "Iris versicolor"
## [455] "Iris versicolor" "Kalmia angustifolia"
## [457] "Kalmia angustifolia" "Kalmia polifolia"
## [459] "Larix laricina" "Larix laricina"
## [461] "Lonicera - Exotic" "Maianthemum trifolium"
## [463] "Maianthemum trifolium" "Melampyrum lineare"
## [465] "Myrica gale" "Myrica gale"
## [467] "Oclemena nemoralis" "Oclemena nemoralis"
## [469] "Oclemena X blakei" "Osmundastrum cinnamomea"
## [471] "Picea mariana" "Picea mariana"
## [473] "Pogonia ophioglossoides" "Pogonia ophioglossoides"
## [475] "Rhododendron canadense" "Rhododendron canadense"
## [477] "Rhododendron groenlandicum" "Rhododendron groenlandicum"
## [479] "Rhynchospora alba" "Rhynchospora alba"
## [481] "Rosa nitida" "Rosa palustris"
## [483] "Rubus flagellaris" "Rubus hispidus"
## [485] "Sarracenia purpurea" "Sarracenia purpurea"
## [487] "Solidago uliginosa" "Solidago uliginosa"
## [489] "Spiraea alba" "Symphyotrichum novi-belgii"
## [491] "Symplocarpus foetidus" "Symplocarpus foetidus"
## [493] "Trichophorum cespitosum" "Trientalis borealis"
## [495] "Trientalis borealis" "Utricularia cornuta"
## [497] "Utricularia cornuta" "Vaccinium angustifolium"
## [499] "Vaccinium angustifolium" "Vaccinium corymbosum"
## [501] "Vaccinium corymbosum" "Vaccinium oxycoccos"
## [503] "Vaccinium oxycoccos" "Vaccinium vitis-idaea"
## [505] "Viburnum nudum var. cassinoides" "Viburnum nudum var. cassinoides"
## [507] "Xyris montana" "Xyris montana"
[ , ]
Remember that every data frame has 2 dimensions: the first dimension is rows and the second is columns. Keeping that order in mind, rows then columns, helps you understand how brackets work.
Square brackets, as in dataframe[rows, columns], are how you access specific rows and columns in a data frame using base R.
Square brackets were one of the hardest concepts for me when I was starting out, so don't worry if this isn't immediately intuitive. There are easier ways to work with data frame rows and columns, which you'll learn on Day 2. It's still useful to have a basic understanding of how to interpret square brackets, because you will likely encounter them on StackOverflow and other R help sites. We'll work through some examples of using square brackets to access rows, columns, or both.
The code below asks for the dimensions of the ACAD_wetland data frame and returns 508 11. That means there are 508 rows and 11 columns.
Return the number of rows and columns by checking the data frame's dimensions.
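The code chunk is collapsed in this rendering; a minimal sketch of the calls that would produce the three outputs below, assuming the data frame is loaded as ACAD_wetland:

```r
dim(ACAD_wetland)  # dimensions: number of rows, then number of columns
nrow(ACAD_wetland) # number of rows only
ncol(ACAD_wetland) # number of columns only
```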
## [1] 508 11
## [1] 508
## [1] 11
Return the first 5 rows of the wetland data frame.
Note the comma with nothing to the right of it. That means return all columns.
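The bracket call itself is not shown in this rendering; a sketch of what likely produced the output below, assuming the data frame is named ACAD_wetland:

```r
# Rows 1 through 5, all columns (nothing after the comma)
ACAD_wetland[1:5, ]
```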
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 1 SEN-01 Sentinel Acer rubrum red maple 2011 0 0.02
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary 2011 80 2.22
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry 2011 100 2.64
## Invasive Protected X_Coord Y_Coord
## 1 FALSE FALSE 574855.5 4911909
## 2 FALSE FALSE 574855.5 4911909
## 3 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 5 FALSE FALSE 574855.5 4911909
Return all rows and a subset of columns of the data frame.
Note how the left side of the comma is empty. That means return all rows.
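A sketch of selecting columns by name while keeping every row. The column names here match the output below, though the exact call in the original chunk may differ:

```r
# All rows (nothing before the comma), five columns selected by name
ACAD_wetland[ , c("Site_Name", "Latin_Name", "Common", "Year", "PctFreq")]
```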
## Site_Name Latin_Name
## 1 SEN-01 Acer rubrum
## 2 SEN-01 Amelanchier
## 3 SEN-01 Andromeda polifolia
## 4 SEN-01 Arethusa bulbosa
## 5 SEN-01 Aronia melanocarpa
## ... (rows 6-508 omitted)
## Common Year PctFreq
## 1 red maple 2011 0
## 2 serviceberry 2011 20
## 3 bog rosemary 2011 80
## 4 dragon's mouth 2011 40
## 5 black chokeberry 2011 100
## ... (rows 6-275 omitted)
## 276 large cranberry 2012 100
## 277 northern wild raisin 2012 25
## 278 northern wild raisin 2017 100
## 279 violet 2012 25
## 280 red maple 2017 25
## 281 red maple 2012 25
## 282 serviceberry 2012 25
## 283 black chokeberry 2017 50
## 284 mud sedge 2012 25
## 285 threeseeded sedge 2012 100
## 286 threeseeded sedge 2017 100
## 287 leatherleaf 2012 100
## 288 leatherleaf 2017 100
## 289 bunchberry dogwood 2017 100
## 290 bunchberry dogwood 2012 100
## 291 tawny cottongrass 2017 25
## 292 tawny cottongrass 2012 100
## 293 creeping snowberry 2017 100
## 294 creeping snowberry 2012 100
## 295 black huckleberry 2017 100
## 296 black huckleberry 2012 100
## 297 mountain holly/catberry 2012 100
## 298 mountain holly/catberry 2017 100
## 299 common winterberry 2017 100
## 300 sheep laurel 2017 100
## 301 sheep laurel 2012 100
## 302 tamarack 2017 100
## 303 tamarack 2012 75
## 304 threeleaf false lily of the valley 2017 100
## 305 threeleaf false lily of the valley 2012 100
## 306 Indianpipe 2012 25
## 307 Indianpipe 2017 25
## 308 sweetgale 2017 100
## 309 sweetgale 2012 75
## 310 black spruce 2012 100
## 311 black spruce 2017 100
## 312 rhodora 2017 100
## 313 rhodora 2012 100
## 314 bog Labrador tea 2017 100
## 315 bog Labrador tea 2012 100
## 316 skunk cabbage 2012 100
## 317 skunk cabbage 2017 100
## 318 starflower 2012 25
## 319 starflower 2017 25
## 320 lowbush blueberry 2012 75
## 321 lowbush blueberry 2017 100
## 322 highbush blueberry 2012 100
## 323 highbush blueberry 2017 100
## 324 velvetleaf huckleberry 2012 25
## 325 velvetleaf huckleberry 2017 100
## 326 small cranberry 2017 100
## 327 small cranberry 2012 100
## 328 lingonberry 2012 100
## 329 lingonberry 2017 100
## 330 northern wild raisin 2012 100
## 331 northern wild raisin 2017 100
## 332 red maple 2012 50
## 333 red maple 2017 25
## 334 gray alder 2017 100
## 335 gray alder 2012 100
## 336 spreading dogbane 2017 25
## 337 spreading dogbane 2012 25
## 338 gray birch 2017 75
## 339 gray birch 2012 75
## 340 bluejoint 2017 100
## 341 bluejoint 2012 100
## 342 prickly bog sedge 2017 50
## 343 hairy sedge 2012 100
## 344 hairy sedge 2017 100
## 345 woolly-fruit sedge 2012 50
## 346 woolly-fruit sedge 2017 100
## 347 <NA> 2012 50
## 348 upright sedge 2017 100
## 349 upright sedge 2012 100
## 350 leatherleaf 2012 100
## 351 leatherleaf 2017 100
## 352 sweet fern 2012 25
## 353 parasol whitetop 2017 50
## 354 parasol whitetop 2012 25
## 355 crested woodfern 2017 50
## 356 threeway sedge 2017 50
## 357 field horsetail 2017 25
## 358 bigleaf aster 2017 50
## 359 fineleaf sheep fescue 2017 50
## 360 mountain holly/catberry 2017 50
## 361 common winterberry 2012 75
## 362 common winterberry 2017 75
## 363 harlequin blueflag 2012 50
## 364 harlequin blueflag 2017 50
## 365 Canadian rush 2017 50
## 366 Canadian rush 2012 25
## 367 common rush 2017 25
## 368 garden lupine 2012 50
## 369 earth loosestrife 2012 75
## 370 earth loosestrife 2017 100
## 371 apple 2012 25
## 372 sweetgale 2012 100
## 373 sweetgale 2017 100
## 374 sensitive fern 2017 50
## 375 sensitive fern 2012 25
## 376 royal fern 2012 25
## 377 cinnamon fern 2012 25
## 378 timothy 2012 50
## 379 eastern white pine 2017 50
## 380 eastern white pine 2012 50
## 381 bigtooth aspen 2017 50
## 382 bigtooth aspen 2012 50
## 383 quaking aspen 2012 75
## 384 quaking aspen 2017 75
## 385 northern red oak 2017 50
## 386 tall buttercup 2017 25
## 387 glossy buckthorn 2012 75
## 388 glossy buckthorn 2017 75
## 389 rhodora 2017 25
## 390 shining rose 2017 75
## 391 swamp rose 2012 25
## 392 Virginia rose 2017 25
## 393 bristly dewberry 2017 50
## 394 blackberry 2017 50
## 395 slender willow 2017 100
## 396 willow 2012 100
## 397 woolgrass 2017 25
## 398 woolgrass 2012 50
## 399 Huachuca skullcap 2012 50
## 400 wrinkleleaf goldenrod 2017 50
## 401 white meadowsweet 2012 100
## 402 white meadowsweet 2017 100
## 403 steeplebush 2017 25
## 404 Virginia marsh St. Johnswort 2012 25
## 405 marsh St. Johnswort 2017 75
## 406 horned bladderwort 2017 25
## 407 highbush blueberry 2017 25
## 408 common gypsyweed 2017 50
## 409 bird vetch 2012 50
## 410 bird vetch 2017 50
## 411 red maple 2012 100
## 412 red maple 2017 100
## 413 gray alder 2017 25
## 414 serviceberry 2017 50
## 415 dragon's mouth 2012 50
## 416 dragon's mouth 2017 50
## 417 black chokeberry 2012 100
## 418 black chokeberry 2017 100
## 419 bluejoint 2012 50
## 420 grasspink 2017 100
## 421 grasspink 2012 75
## 422 prickly bog sedge 2017 75
## 423 coastal sedge 2017 75
## 424 long sedge 2012 50
## 425 long sedge 2017 50
## 426 boreal bog sedge 2017 50
## 427 fewflower sedge 2017 25
## 428 threeseeded sedge 2012 75
## 429 threeseeded sedge 2017 75
## 430 leatherleaf 2012 100
## 431 leatherleaf 2017 100
## 432 bunchberry dogwood 2012 100
## 433 bunchberry dogwood 2017 100
## 434 spoonleaf sundew 2012 50
## 435 spoonleaf sundew 2017 75
## 436 round-leaf sundew 2012 100
## 437 round-leaf sundew 2017 100
## 438 black crowberry 2017 100
## 439 black crowberry 2012 100
## 440 tall cottongrass 2012 100
## 441 tall cottongrass 2017 100
## 442 tawny cottongrass 2017 75
## 443 tawny cottongrass 2012 25
## 444 low rough aster 2017 50
## 445 creeping snowberry 2012 75
## 446 creeping snowberry 2017 50
## 447 black huckleberry 2012 50
## 448 black huckleberry 2017 100
## 449 dwarf huckleberry 2012 100
## 450 fowl manna grass 2017 50
## 451 mannagrass 2012 50
## 452 mountain holly/catberry 2017 100
## 453 mountain holly/catberry 2012 100
## 454 harlequin blueflag 2017 75
## 455 harlequin blueflag 2012 75
## 456 sheep laurel 2012 100
## 457 sheep laurel 2017 100
## 458 bog laurel 2017 25
## 459 tamarack 2017 100
## 460 tamarack 2012 100
## 461 exotic bush honeysuckle 2017 50
## 462 threeleaf false lily of the valley 2012 75
## 463 threeleaf false lily of the valley 2017 100
## 464 narrowleaf cowwheat 2012 50
## 465 sweetgale 2012 100
## 466 sweetgale 2017 100
## 467 bog aster 2017 75
## 468 bog aster 2012 100
## 469 Blake's aster 2017 50
## 470 cinnamon fern 2017 25
## 471 black spruce 2017 100
## 472 black spruce 2012 100
## 473 rose pogonia 2012 100
## 474 rose pogonia 2017 100
## 475 rhodora 2012 100
## 476 rhodora 2017 100
## 477 bog Labrador tea 2012 100
## 478 bog Labrador tea 2017 100
## 479 white beaksedge 2017 100
## 480 white beaksedge 2012 100
## 481 shining rose 2017 50
## 482 swamp rose 2012 25
## 483 northern dewberry 2012 25
## 484 bristly dewberry 2017 50
## 485 purple pitcherplant 2017 100
## 486 purple pitcherplant 2012 100
## 487 bog goldenrod 2017 75
## 488 bog goldenrod 2012 100
## 489 white meadowsweet 2017 25
## 490 New York aster 2017 75
## 491 skunk cabbage 2017 100
## 492 skunk cabbage 2012 100
## 493 tufted bulrush 2012 75
## 494 starflower 2012 100
## 495 starflower 2017 75
## 496 horned bladderwort 2012 50
## 497 horned bladderwort 2017 75
## 498 lowbush blueberry 2012 50
## 499 lowbush blueberry 2017 75
## 500 highbush blueberry 2017 50
## 501 highbush blueberry 2012 50
## 502 small cranberry 2017 100
## 503 small cranberry 2012 100
## 504 lingonberry 2017 25
## 505 northern wild raisin 2017 100
## 506 northern wild raisin 2012 100
## 507 northern yellow-eyed-grass 2017 50
## 508 northern yellow-eyed-grass 2012 50
Return first 5 rows and a subset of columns of the data frame
## Site_Name Latin_Name Common Year PctFreq
## 1 SEN-01 Acer rubrum red maple 2011 0
## 2 SEN-01 Amelanchier serviceberry 2011 20
## 3 SEN-01 Andromeda polifolia bog rosemary 2011 80
## 4 SEN-01 Arethusa bulbosa dragon's mouth 2011 40
## 5 SEN-01 Aronia melanocarpa black chokeberry 2011 100
Return all rows and first 4 columns of the data frame
ACAD_sub <- ACAD_wetland[ , 1:4] # works, but risky
ACAD_sub2 <-
ACAD_wetland[ , c("Site_Name", "Site_Type", "Latin_Name", "Common")] # same result, but better
Coding Tip: As shown above, you can specify columns by name or by column number. However, it’s almost always best to refer to columns by name. It makes your code easier to read and prevents it from breaking if columns get reordered.
CHALLENGE: How would you look at the first 4 even rows
(2, 4, 6, 8), and the first 2 columns of the ACAD_wetland data
frame?
## Site_Name Site_Type
## 2 SEN-01 Sentinel
## 4 SEN-01 Sentinel
## 6 SEN-01 Sentinel
## 8 SEN-01 Sentinel
## [1] "Site_Name" "Site_Type" "Latin_Name" "Common" "Year"
## [6] "PctFreq" "Ave_Cov" "Invasive" "Protected" "X_Coord"
## [11] "Y_Coord"
## Site_Name Site_Type
## 2 SEN-01 Sentinel
## 4 SEN-01 Sentinel
## 6 SEN-01 Sentinel
## 8 SEN-01 Sentinel
= vs == vs %in%
A common early stumbling block is the difference between a single = and a
double ==. To assign a value, such as in a function argument, use
=. To test whether two values are equal, use
==. The != operator is interpreted as not equal to for similar
use.
To test whether a value matches any of several values, use
%in%. This operator works just like
==, but for multiple conditions. The ==
operator is not designed to take more than 1 condition, even though it
won’t give you an error. Instead, it recycles the conditions across the
rows and silently matches only some of them.
As you get more comfortable with R, this will become natural. If you
forget, R will error and may even give you a hint when you used
= instead of ==.
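A minimal sketch of the difference (the vector x here is made up for illustration):

```r
x <- c("a", "b", "b", "c")

x == "b"            # FALSE  TRUE  TRUE FALSE
x %in% c("a", "b")  #  TRUE  TRUE  TRUE FALSE

# Recycling pitfall: == compares element-by-element, recycling the
# shorter vector, so this does NOT test "a or b" for every element:
x == c("a", "b")    #  TRUE  TRUE FALSE FALSE
```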
Pattern match (filter) to return a data frame of species that are not invasive and return all columns
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov
## 1 SEN-01 Sentinel Acer rubrum red maple 2011 0 0.02
## 2 SEN-01 Sentinel Amelanchier serviceberry 2011 20 0.02
## 3 SEN-01 Sentinel Andromeda polifolia bog rosemary 2011 80 2.22
## 4 SEN-01 Sentinel Arethusa bulbosa dragon's mouth 2011 40 0.04
## 5 SEN-01 Sentinel Aronia melanocarpa black chokeberry 2011 100 2.64
## 6 SEN-01 Sentinel Carex exilis coastal sedge 2011 60 6.60
## Invasive Protected X_Coord Y_Coord
## 1 FALSE FALSE 574855.5 4911909
## 2 FALSE FALSE 574855.5 4911909
## 3 FALSE FALSE 574855.5 4911909
## 4 FALSE TRUE 574855.5 4911909
## 5 FALSE FALSE 574855.5 4911909
## 6 FALSE FALSE 574855.5 4911909
##
## FALSE TRUE
## 499 9
##
## FALSE
## 499
Filter data to only return the Latin_Name column of rows where Invasive is TRUE. Click on R Output below to view results.
## [1] "Berberis thunbergii" "Berberis thunbergii" "Berberis thunbergii"
## [4] "Celastrus orbiculatus" "Rhamnus frangula" "Rhamnus frangula"
## [7] "Rhamnus frangula" "Rhamnus frangula" "Lonicera - Exotic"
## [1] "Berberis thunbergii" "Berberis thunbergii" "Berberis thunbergii"
## [4] "Celastrus orbiculatus" "Rhamnus frangula" "Rhamnus frangula"
## [7] "Rhamnus frangula" "Rhamnus frangula" "Lonicera - Exotic"
Filter data to return any plot where Arethusa bulbosa, Calopogon tuberosus, or Pogonia ophioglossoides were detected.
orchid_spp <- c("Arethusa bulbosa", "Calopogon tuberosus", "Pogonia ophioglossoides")
ACAD_orchid_plots <- ACAD_wetland[ACAD_wetland$Latin_Name %in% orchid_spp,
c("Site_Name", "Year", "Latin_Name")]
ACAD_orchid_plots
## Site_Name Year Latin_Name
## 4 SEN-01 2011 Arethusa bulbosa
## 36 SEN-02 2011 Arethusa bulbosa
## 246 RAM-53 2017 Pogonia ophioglossoides
## 415 RAM-05 2012 Arethusa bulbosa
## 416 RAM-05 2017 Arethusa bulbosa
## 420 RAM-05 2017 Calopogon tuberosus
## 421 RAM-05 2012 Calopogon tuberosus
## 473 RAM-05 2012 Pogonia ophioglossoides
## 474 RAM-05 2017 Pogonia ophioglossoides
Coding Tip: There are often multiple ways to perform a task. The best code is code that 1) works, 2) is easy to follow, and 3) is unlikely to break (e.g. use column names instead of numbers). That still means there are typically multiple equally valid approaches. There are other ways to judge good code as you advance, but for now, aspire to write code that meets these three qualities.
unique(), sort(),
length()
Determining the number of records that match a certain condition can be
useful too. Say we want to know how many unique sites were sampled in
the ACAD_wetland data frame. We can use a combination of
brackets and other functions to summarize that, like below.
Sort alphabetically a list of unique site names.
# Return a vector of unique site names, sorted alphabetically
sites_unique <- sort(unique(ACAD_wetland[ , "Site_Name"]))
sites_unique
## [1] "RAM-05" "RAM-41" "RAM-44" "RAM-53" "RAM-62" "SEN-01" "SEN-02" "SEN-03"
Determine number of unique sites
## [1] 8
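The hidden code behind that count is presumably just the length of the unique vector:

```r
length(sites_unique)  # number of unique site names
```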
CHALLENGE: How many unique species are there in the
ACAD_wetland data frame?
CHALLENGE: Which sites have species that are considered protected on them (Protected = TRUE)?
We’ve already explored the wetland data a bit using
head(), str(), names(), and
View(). These are functions that you will use over and over
as you work with data in R. Below, I’m going to show how I get to know a
data set in R.
Read in example NETN tree data
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
Look at first few records
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN ScientificName
## 1 MIMA 12 6/16/2025 FALSE 2025 13 183385 Pinus strobus
## 2 MIMA 12 6/16/2025 FALSE 2025 12 28728 Acer rubrum
## 3 MIMA 12 6/16/2025 FALSE 2025 11 28728 Acer rubrum
## 4 MIMA 12 6/16/2025 FALSE 2025 2 28728 Acer rubrum
## 5 MIMA 12 6/16/2025 FALSE 2025 10 28728 Acer rubrum
## 6 MIMA 12 6/16/2025 FALSE 2025 7 28728 Acer rubrum
## DBHcm TreeStatusCode CrownClassCode DecayClassCode
## 1 24.9 AS 5 <NA>
## 2 10.9 AB 5 <NA>
## 3 18.8 AS 3 <NA>
## 4 51.2 AS 3 <NA>
## 5 38.2 AS 3 <NA>
## 6 22.5 AS 4 <NA>
Look at structure of each column
## 'data.frame': 164 obs. of 12 variables:
## $ ParkUnit : chr "MIMA" "MIMA" "MIMA" "MIMA" ...
## $ PlotCode : int 12 12 12 12 12 12 12 12 12 12 ...
## $ SampleDate : chr "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" ...
## $ IsQAQC : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ SampleYear : int 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 ...
## $ TagCode : int 13 12 11 2 10 7 5 9 1 3 ...
## $ TSN : int 183385 28728 28728 28728 28728 28728 28728 28728 28728 28728 ...
## $ ScientificName: chr "Pinus strobus" "Acer rubrum" "Acer rubrum" "Acer rubrum" ...
## $ DBHcm : num 24.9 10.9 18.8 51.2 38.2 22.5 26.4 42.9 12.3 49 ...
## $ TreeStatusCode: chr "AS" "AB" "AS" "AS" ...
## $ CrownClassCode: int 5 5 3 3 3 4 NA NA NA NA ...
## $ DecayClassCode: chr NA NA NA NA ...
Look at summary of the columns
## ParkUnit PlotCode SampleDate IsQAQC
## Length:164 Min. :11.00 Length:164 Mode :logical
## Class :character 1st Qu.:14.00 Class :character FALSE:164
## Mode :character Median :16.50 Mode :character
## Mean :16.05
## 3rd Qu.:19.00
## Max. :20.00
##
## SampleYear TagCode TSN ScientificName
## Min. :2025 Min. : 1.0 Min. : 19049 Length:164
## 1st Qu.:2025 1st Qu.: 7.0 1st Qu.: 24764 Class :character
## Median :2025 Median :12.5 Median : 28728 Mode :character
## Mean :2025 Mean :13.6 Mean : 62361
## 3rd Qu.:2025 3rd Qu.:19.0 3rd Qu.: 32929
## Max. :2025 Max. :36.0 Max. :565478
##
## DBHcm TreeStatusCode CrownClassCode DecayClassCode
## Min. : 10.00 Length:164 Min. :1.000 Length:164
## 1st Qu.: 13.12 Class :character 1st Qu.:3.000 Class :character
## Median : 19.00 Mode :character Median :5.000 Mode :character
## Mean : 25.47 Mean :4.165
## 3rd Qu.: 28.45 3rd Qu.:5.000
## Max. :443.00 Max. :6.000
## NA's :25
Check for complete cases for the first 10 columns that should always have data.
##
## TRUE
## 164
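One way to run that check (a sketch; the hidden code may differ) uses complete.cases() on the first 10 columns:

```r
# TRUE means the row has no NAs in columns 1 through 10
table(complete.cases(trees[ , 1:10]))
```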
To keep data frames rectangular, R treats missing data (i.e. blanks)
as NA (short for not available). A foundational philosophy of R is that
the user must tell R functions what to do if NAs are in the data.
Ideally that forces the user to investigate the NAs to determine their
reason for being there, whether there’s a way to fix them, whether those
records should be dropped, etc. If you try to calculate the mean of a
column that has a blank in it, and you don’t tell R what to do with NAs,
the returned value will be NA. Most summary functions in R
have an argument na.rm, which is logical (TRUE/FALSE). To
drop NAs, you include na.rm = TRUE.
It’s important every time you have NAs in your data to think about what they mean and how best to treat them. Sometimes, it’s best to drop them. Other times, converting the blanks to 0 is the best approach. It depends entirely on your data and what you intend to do with it.
Test NA use with mean() function
## [1] NA
## [1] 4
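The outputs above come from hidden code; a toy reconstruction (the vector x is illustrative, chosen so the mean works out to 4):

```r
x <- c(2, 4, NA, 6)
mean(x)               # returns NA because of the missing value
mean(x, na.rm = TRUE) # returns 4 after dropping the NA
```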
Look at unique values for DecayClassCode.
## [1] "1" "2" "3" "PM"
##
## 1 2 3 PM
## 9 6 8 2
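The hidden code likely resembles the following (assuming the unique()/table() calls operate on the DecayClassCode column):

```r
unique(na.omit(trees$DecayClassCode))  # distinct non-NA values
table(trees$DecayClassCode)            # counts per value; NAs are dropped by default
```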
There are 2 records called “PM”, which stands for Permanently Missing in our forest data. We will convert PM to a blank, which R calls NA, and create a new decay class column that is converted to numeric.
Convert “PM” to blank. I will first make a copy of the data frame.
trees2 <- trees
trees2$DecayClassCode[trees2$DecayClassCode == "PM"] <- NA
trees2$DecayClassCode_num <- as.numeric(trees2$DecayClassCode)
# check that it worked
str(trees2) # DecayClassCode_num is numeric
## 'data.frame': 164 obs. of 13 variables:
## $ ParkUnit : chr "MIMA" "MIMA" "MIMA" "MIMA" ...
## $ PlotCode : int 12 12 12 12 12 12 12 12 12 12 ...
## $ SampleDate : chr "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" ...
## $ IsQAQC : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ SampleYear : int 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 ...
## $ TagCode : int 13 12 11 2 10 7 5 9 1 3 ...
## $ TSN : int 183385 28728 28728 28728 28728 28728 28728 28728 28728 28728 ...
## $ ScientificName : chr "Pinus strobus" "Acer rubrum" "Acer rubrum" "Acer rubrum" ...
## $ DBHcm : num 24.9 10.9 18.8 51.2 38.2 22.5 26.4 42.9 12.3 49 ...
## $ TreeStatusCode : chr "AS" "AB" "AS" "AS" ...
## $ CrownClassCode : int 5 5 3 3 3 4 NA NA NA NA ...
## $ DecayClassCode : chr NA NA NA NA ...
## $ DecayClassCode_num: num NA NA NA NA NA NA 1 3 2 3 ...
## [1] 1 2 3
Using the trees2 data frame, which fixed the decay class
column by making the DecayClassCode_num field numeric,
we’re now going to drop visits that were for QAQC using a new base R
function called subset(). The subset()
function allows you to reduce the dimensions of a data frame. You can
reduce rows, columns, or both in the same function call. I will also
show the bracket approach.
Remove QAQC visits (IsQAQC == TRUE) and drop the DecayClassCode column
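A sketch of both approaches: subset() filters rows with its second argument and drops columns via select =, while the bracket version does the same with logical indexing.

```r
# subset() approach: keep non-QAQC rows, drop the DecayClassCode column
trees3 <- subset(trees2, IsQAQC == FALSE, select = -DecayClassCode)

# bracket equivalent
trees3_br <- trees2[trees2$IsQAQC == FALSE, names(trees2) != "DecayClassCode"]
```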
Convert SampleDate into a Date class column instead of character.
## [1] "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025"
# Create new column called Date
trees3$Date <- as.Date(trees3$SampleDate, format = "%m/%d/%Y")
str(trees3)
## 'data.frame': 164 obs. of 13 variables:
## $ ParkUnit : chr "MIMA" "MIMA" "MIMA" "MIMA" ...
## $ PlotCode : int 12 12 12 12 12 12 12 12 12 12 ...
## $ SampleDate : chr "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" ...
## $ IsQAQC : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ SampleYear : int 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 ...
## $ TagCode : int 13 12 11 2 10 7 5 9 1 3 ...
## $ TSN : int 183385 28728 28728 28728 28728 28728 28728 28728 28728 28728 ...
## $ ScientificName : chr "Pinus strobus" "Acer rubrum" "Acer rubrum" "Acer rubrum" ...
## $ DBHcm : num 24.9 10.9 18.8 51.2 38.2 22.5 26.4 42.9 12.3 49 ...
## $ TreeStatusCode : chr "AS" "AB" "AS" "AS" ...
## $ CrownClassCode : int 5 5 3 3 3 4 NA NA NA NA ...
## $ DecayClassCode_num: num NA NA NA NA NA NA 1 3 2 3 ...
## $ Date : Date, format: "2025-06-16" "2025-06-16" ...
Renaming columns in base R is kind of a pain, to the point that I have to look it up every time I need to do it. I’ll show you an easier way to do that tomorrow.
Rename ScientificName column
## [1] "ParkUnit" "PlotCode" "SampleDate"
## [4] "IsQAQC" "SampleYear" "TagCode"
## [7] "TSN" "ScientificName" "DBHcm"
## [10] "TreeStatusCode" "CrownClassCode" "DecayClassCode_num"
## [13] "Date"
## [1] "ParkUnit" "PlotCode" "SampleDate"
## [4] "IsQAQC" "SampleYear" "TagCode"
## [7] "TSN" "Species" "DBHcm"
## [10] "TreeStatusCode" "CrownClassCode" "DecayClassCode_num"
## [13] "Date"
Plot_Name column via paste()
The paste() and paste0() functions are very
handy for creating new columns that are combinations of existing
columns. The code below will create a new column named
Plot_Name that’s a combination of ParkUnit and
PlotCode.
Create new Plot_Name column
trees3$Plot_Name <- paste(trees3$ParkUnit, trees3$PlotCode, sep = "-")
trees3$Plot_Name <- paste0(trees3$ParkUnit, "-", trees3$PlotCode) # equivalent; paste0() places no separator between elements by default
Coding Tip: In most cases, it does not matter whether you use
single ’ or double “, as long as you open and close with the same. The
cases where it matters are where you have quotes within quotes. There
you have to alternate your usage, like
print("Text in outer quote 'text printed as being within quotes' end with closing quote").
Option 1. Subset data then calculate number of rows
## [1] 6
Option 2. Subset the data with brackets and use the
table() function to tally status codes.
##
## AB AS DB DM DS
## 1 6 3 1 1
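A hedged sketch of the two options above (the status code "AS" and plot MIMA-11 are assumptions for illustration; the hidden code's actual condition may differ):

```r
# Option 1: subset to one status code in one plot, then count rows
nrow(trees3[trees3$Plot_Name == "MIMA-11" & trees3$TreeStatusCode == "AS", ])

# Option 2: tally every status code in that plot with table()
table(trees3$TreeStatusCode[trees3$Plot_Name == "MIMA-11"])
```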
CHALLENGE: Find the tree record with a DBH > 400 cm.
There are multiple ways to do this. Two examples are below.
Option 1. View the data and sort by DBH.
Option 2. Find the max DBH value and subset the data frame
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN Species
## 26 MIMA 16 6/17/2025 FALSE 2025 1 19447 Quercus robur
## DBHcm TreeStatusCode CrownClassCode DecayClassCode_num Date Plot_Name
## 26 443 AS 3 NA 2025-06-17 MIMA-16
CHALLENGE: What is the exact value of the largest DBH, and which record does it belong to?
There are multiple ways to do this. Two examples are below.
Option 1. View the data and sort by DBH.
Option 2. Find the max DBH value and subset the data frame
## [1] 443
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN ScientificName
## 26 MIMA 16 6/17/2025 FALSE 2025 1 19447 Quercus robur
## DBHcm TreeStatusCode CrownClassCode DecayClassCode
## 26 443 AS 3 <NA>
CHALLENGE: Fix the DBH typo by replacing 443.0 with 44.3.
Let’s say that you looked at the datasheet, and the actual DBH for that tree was 44.3 instead of 443.0. You can change that value in the original CSV by hand. But even better is to document that change in code. There are multiple ways to do this. Two examples are below.
But first, it’s good to create a new data frame when modifying the original data frame, so you can refer back to the original if needed. I also use a really specific filter to make sure I’m not accidentally changing other data.
Replace 443 with 44.3
# create copy of trees data
trees_fix <- trees3
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-16" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
Check that it worked by showing the range of the original and fixed data frames.
## [1] 10 443
## [1] 10 443
Visualizing the data is also important to get a sense for the data
and look for potential errors and outliers. Base R has plotting
functions that allow you to create quick plots without having to know a
lot of code. I often use Base R plot functions when I’m exploring data
but not making plots I plan to use for publication. When I need to
create more complex plots, I use ggplot2, which we’ll cover
on Day 2 and 3.
Histograms are a great start. The code below generates a basic
histogram of a specific column in the data frame using the
hist() function.
Plot histogram of DBH measurements
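The hidden plotting code likely resembles this (axis and title labels are illustrative):

```r
hist(trees3$DBHcm, xlab = "DBH (cm)", main = "Histogram of tree DBH")
```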
Looking at the histogram, it looks like all of the measurements are
below 100cm except for one that’s way out in 400 range. You can also
make a scatterplot of the data. If you only specify one column, the x
axis will be the row number for each record, and the y axis will be the
specified column.
Make point plot of DBH measurements
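A one-column point plot can be sketched as below; with a single argument, plot() puts the row number on the x axis.

```r
plot(trees3$DBHcm, ylab = "DBH (cm)")  # x axis is the record (row) number
```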
Again, you can see there’s one value that’s greater than all of the
others.
We can also plot two variables in a scatterplot.
Make scatterplot of crown class vs. DBH measurements (Option 1)
Make scatterplot of crown class vs. DBH measurements (Option 2: better axis labels)
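A sketch of both scatterplot options (the label text is illustrative):

```r
# Option 1: quick default plot
plot(trees3$CrownClassCode, trees3$DBHcm)

# Option 2: same plot with better axis labels
plot(trees3$CrownClassCode, trees3$DBHcm,
     xlab = "Crown Class Code", ylab = "DBH (cm)")
```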
Again, you can see there’s one value that’s greater than all of the
others, and it’s crown class code 3 (codominant).
Goals:
Subset rows and columns with dplyr.
Use ifelse() and case_when() conditional statements.
Wrangle data with dplyr.
Summarize and create columns with summarize() and mutate().
We are now going to learn how to subset rows and columns and other
common data wrangling tasks using packages in the
tidyverse. Taken directly from
tidyverse.org: “The tidyverse is an
opinionated collection of R packages designed for data science. All
packages share an underlying design philosophy, grammar, and data
structures.”
You should have installed all of the tidyverse packages in preparation for this training. If you missed that step, install tidyverse packages using code below. It can take a few minutes for all the packages to install.
Only run if you haven’t installed these packages yet
Load the tidyverse
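If you missed the pre-training setup, the install and load steps look like this (installation only needs to run once per machine):

```r
install.packages("tidyverse")  # only run if not already installed
library(tidyverse)             # loads the core tidyverse packages
```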
Coding Tip: When you type library(tidyverse), you’re
loading all nine of the core tidyverse packages. If you’re only using one
or two packages, it’s better to load just those packages. That makes it
clearer to the user which packages are needed to run your code and
reduces dependencies. For this session, we’re only going to use
dplyr, so I will just load that.
purrr: has iteration functions like map() that allow you
to apply functions or processes like a for loop.
readr: provides read functions for csv and other
formats. The read_csv() function, for example, has more
bells and whistles than the base R read.csv() function.
I’ve never needed those extra features, so I just use
read.csv().
tibble: provides the tibble, the tidyverse’s take on the data
frame. I prefer the print format of head(data.frame) over the format for
head(tibble).
dplyr
The dplyr package is perhaps the single most useful
package in R for working with your data.
Artwork by
@allison_horst
Commonly used dplyr functions and their use: select() subsets columns, filter() subsets rows, mutate() creates or modifies columns, arrange() sorts rows, summarize() aggregates data, and group_by() sets up grouped operations.
Now, using the dplyr package in the
tidyverse, we’re going to do the same operations we did
yesterday with brackets.
Read in example NETN tree data
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
Replace decay class code “PM” with NA (blank).
# Base R
trees2 <- trees
trees2$DecayClassCode[trees2$DecayClassCode == "PM"] <- NA
trees2$DecayClassCode_num <- as.numeric(trees2$DecayClassCode)
# dplyr approach with mutate
trees2 <- mutate(trees, DecayClassCode_num = as.numeric(replace(DecayClassCode, DecayClassCode == "PM", NA)))
str(trees2)
## 'data.frame': 164 obs. of 13 variables:
## $ ParkUnit : chr "MIMA" "MIMA" "MIMA" "MIMA" ...
## $ PlotCode : int 12 12 12 12 12 12 12 12 12 12 ...
## $ SampleDate : chr "6/16/2025" "6/16/2025" "6/16/2025" "6/16/2025" ...
## $ IsQAQC : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ SampleYear : int 2025 2025 2025 2025 2025 2025 2025 2025 2025 2025 ...
## $ TagCode : int 13 12 11 2 10 7 5 9 1 3 ...
## $ TSN : int 183385 28728 28728 28728 28728 28728 28728 28728 28728 28728 ...
## $ ScientificName : chr "Pinus strobus" "Acer rubrum" "Acer rubrum" "Acer rubrum" ...
## $ DBHcm : num 24.9 10.9 18.8 51.2 38.2 22.5 26.4 42.9 12.3 49 ...
## $ TreeStatusCode : chr "AS" "AB" "AS" "AS" ...
## $ CrownClassCode : int 5 5 3 3 3 4 NA NA NA NA ...
## $ DecayClassCode : chr NA NA NA NA ...
## $ DecayClassCode_num: num NA NA NA NA NA NA 1 3 2 3 ...
Convert SampleDate (character) to Date class.
# dplyr approach with mutate
trees3 <- mutate(trees2, Date = as.Date(SampleDate, format = "%m/%d/%Y"))
Rename the ScientificName column to Species.
## [1] "ParkUnit" "PlotCode" "SampleDate"
## [4] "IsQAQC" "SampleYear" "TagCode"
## [7] "TSN" "Species" "DBHcm"
## [10] "TreeStatusCode" "CrownClassCode" "DecayClassCode"
## [13] "DecayClassCode_num"
Create a Plot_Name column that’s a combination of ParkUnit and PlotCode.
# dplyr approach with mutate
trees2 <- mutate(trees2, Plot_Name = paste(ParkUnit, PlotCode, sep = "-"))
Drop records that are QAQC visits.
# Base R
trees3 <- subset(trees2, IsQAQC == FALSE, select = -DecayClassCode) # Note the importance of FALSE in all caps
# dplyr
trees3a <- filter(trees2, IsQAQC == FALSE)
trees3 <- select(trees3a, -DecayClassCode)
head(trees3)
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN Species
## 1 MIMA 12 6/16/2025 FALSE 2025 13 183385 Pinus strobus
## 2 MIMA 12 6/16/2025 FALSE 2025 12 28728 Acer rubrum
## 3 MIMA 12 6/16/2025 FALSE 2025 11 28728 Acer rubrum
## 4 MIMA 12 6/16/2025 FALSE 2025 2 28728 Acer rubrum
## 5 MIMA 12 6/16/2025 FALSE 2025 10 28728 Acer rubrum
## 6 MIMA 12 6/16/2025 FALSE 2025 7 28728 Acer rubrum
## DBHcm TreeStatusCode CrownClassCode DecayClassCode_num Plot_Name
## 1 24.9 AS 5 NA MIMA-12
## 2 10.9 AB 5 NA MIMA-12
## 3 18.8 AS 3 NA MIMA-12
## 4 51.2 AS 3 NA MIMA-12
## 5 38.2 AS 3 NA MIMA-12
## 6 22.5 AS 4 NA MIMA-12
Note the division of labor in dplyr: the
filter() function reduces rows, and the
select() function reduces columns.
|>
The pipe (|> or %>%) makes
dplyr and other tidyverse packages even more powerful. The
pipe |> allows you to string together commands. So,
taking all of the code above, we can do it all in one chained sequence
of calls.
Wrangle tree data with pipes
trees_final <- trees |>
mutate(DecayClassCode_num = as.numeric(replace(DecayClassCode, DecayClassCode == "PM", NA)),
Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
Date = as.Date(SampleDate, format = "%m/%d/%Y")) |>
rename("Species" = "ScientificName") |>
filter(IsQAQC == FALSE) |>
select(-DecayClassCode) |>
arrange(Plot_Name, TagCode)
head(trees_final)
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN
## 1 MIMA 11 6/16/2025 FALSE 2025 1 28728
## 2 MIMA 11 6/16/2025 FALSE 2025 2 28728
## 3 MIMA 11 6/16/2025 FALSE 2025 3 28728
## 4 MIMA 11 6/16/2025 FALSE 2025 4 28728
## 5 MIMA 11 6/16/2025 FALSE 2025 5 19281
## 6 MIMA 11 6/16/2025 FALSE 2025 6 19300
## Species DBHcm TreeStatusCode CrownClassCode DecayClassCode_num
## 1 Acer rubrum 21.5 AS 5 NA
## 2 Acer rubrum 10.4 DS NA 2
## 3 Acer rubrum 16.8 AS 5 NA
## 4 Acer rubrum 13.6 AS 5 NA
## 5 Quercus palustris 61.8 AS 1 NA
## 6 Quercus bicolor 15.5 AS 5 NA
## Plot_Name Date
## 1 MIMA-11 2025-06-16
## 2 MIMA-11 2025-06-16
## 3 MIMA-11 2025-06-16
## 4 MIMA-11 2025-06-16
## 5 MIMA-11 2025-06-16
## 6 MIMA-11 2025-06-16
The warning in the console tells us that in converting DecayClassCode to numeric, some NAs were introduced. This means that any row of DecayClassCode that contained text was converted to NA. In this case it’s the ‘PM’ records, so we’re expecting this warning.
The arrange() line just shows how to order the data by
plot name and tree tag number.
Hopefully you agree that pipes are amazing! They allow for more
efficient coding in relatively easy-to-follow steps, and they make the
dplyr functions, like mutate(), so much more useful. Outside
of pipes, for example, mutate() doesn’t feel more useful
than base R for creating a new column. From now on, I will use pipes
regularly in the code.
There's another operator, %>%, that also functions as a pipe. The %>% pipe was the original pipe, introduced to the tidyverse by the magrittr package. The magrittr pipe was so popular that, starting in R 4.1, a base R pipe (|>) was introduced. It's better optimized for order of operations and removes a package you'd otherwise need to install. So, in general, use the base R pipe |>. It's also why I had you set the default pipe in Global Options to |>. A useful keyboard shortcut for the pipe is Ctrl + Shift + M. You should see the |> pipe in your script when you type that shortcut. If you get the %>% pipe instead, you need to change that default setting in Global Options (see Day 1 > R and RStudio > RStudio Global Options > Step 3. Change default pipe.)
Coding Tip: While the number of steps you can pipe together is virtually endless, piping many tasks, especially complex ones, can make code hard to read and troubleshoot. It's best to limit the number of pipes to 3-4, and/or to run complex tasks that might fail or require checking on their own.
CHALLENGE: What is the exact value of the largest DBH, and which record does it belong to?
# Base R and dplyr combo
max_dbh <- max(trees_final$DBHcm, na.rm = TRUE)
trees_final |>
filter(DBHcm == max_dbh) |>
  select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
## Plot_Name SampleYear TagCode Species DBHcm
## 1 MIMA-16 2025 1 Quercus robur 443
# dplyr with slice
trees_final |>
arrange(desc(DBHcm)) |> # arrange DBHcm high to low via desc()
slice(1) |> # slice the top record
  select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
## Plot_Name SampleYear TagCode Species DBHcm
## 1 MIMA-16 2025 1 Quercus robur 443
CHALLENGE: Fix the DBH typo by replacing 443.0 with 44.3.
# Base R
# create a copy of the trees data
trees_fix <- trees_final
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-16" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
# dplyr via replace
trees_fix <- trees_final |> mutate(DBHcm = replace(DBHcm, DBHcm == 443.0, 44.3))
Check that it worked by showing the range of the original and fixed data frames.
## [1] 10 443
## [1] 10.0 81.5
Conditional functions ifelse(), if(){ }else{ }, and case_when() allow you to return results that depend on specified conditions.
ifelse(): Primarily for use with data frames. Takes 3 arguments: 1) the condition to test; 2) the value to return if the condition is true; 3) the value to return if the condition is false. The function can only handle 2 possible outcomes, although nested ifelse() statements are possible (see example below). This function is vectorized, which means it's optimized for working on columns in data frames. Of the 3 conditionals, it tends to perform the fastest on large data sets.
case_when(): Primarily for use with data frames. Can take
any number of condition statements and their value to return. Requires
dplyr package to be loaded. Syntax is a bit tricky to
figure out at first, but once you have it, it’s about as easy as using
ifelse(). This function is akin to SQL CASE WHEN. On large
data sets, it consistently performs slower than ifelse().
if(){ }else{ }: Can be used with data frames, but is more
commonly used for operations outside of data frames. An example would be
only running a chunk of code if a certain condition is met (e.g., if the
data frame has > 0 rows, run next line of code.)
ifelse()
The ifelse() function takes 3 arguments organized like:
ifelse(condition == TRUE, return this, return this instead).
The first is the condition you’re testing. The second argument is what
to return if the condition is met. The third is what to return if the
condition is not met. You can also nest ifelse() to include
more than 2 conditions, but it can quickly get out of hand and hard to
follow (see below).
Let’s start by adding a column to the NETN tree data that uses the
TreeStatusCode to create a new column called
status that is either live or dead conditional on the
abbreviated code in TreeStatusCode in
trees_final.
Create status column conditioning on TreeStatusCode
# Check the levels of TreeStatusCode
sort(unique(trees_final$TreeStatusCode))
alive <- c("AB", "AL", "AS", "RS")
dead <- c("DB", "DM", "DS")
trees_final <- trees_final |>
mutate(status = ifelse(TreeStatusCode %in% alive, "live", "dead"))
# nested ifelse to make alive, dead, and recruit
trees_final <- trees_final |>
mutate(status2 = ifelse(TreeStatusCode %in% dead, "dead",
ifelse(TreeStatusCode %in% "RS", "recruit",
                         "live")))
Note that we used %in% instead of == because alive contains multiple status codes. In TreeStatusCode %in% "RS" we could have used == instead, with the same result, because only one status code is considered a recruit; == would not work for matching the multiple alive status codes.
case_when()
The case_when() function allows you to have multiple
conditions, each with their own return. The syntax is a bit different
than ifelse() to allow for the multiple conditions and
returns. Using the same approach as above, we’ll create a third status
code with case_when(). We’ll specify the output for when a
tree status code is in the dead category, recruit, and live category.
We’ll then add a fourth output for status codes that don’t match any of
the previous conditions and set that as ‘unknown’. Basically the
TRUE just means, any records left are assigned ‘unknown’.
Note also the order of operations in case_when(). The alive group includes "RS", but those trees are not assigned "live", because they were already matched. The case_when() function starts with the first statement at the top (i.e., dead trees). Any record that matches the first statement is then dropped from subsequent statements. The second statement considers all trees not matched as dead, the third considers all trees not matched as dead or recruit, and the fourth considers any trees not matched as dead, recruit, or live.
Rather than relying on this function behavior, it’s better to not have
overlapping categories (e.g. not include “RS” in alive). I include it
here to demonstrate the point.
Create status column conditioning on TreeStatusCode
# Check the levels of TreeStatusCode
alive <- c("AB", "AL", "AS", "RS")
dead <- c("DB", "DM", "DS")
trees_final <- trees_final |>
mutate(status3 = case_when(TreeStatusCode %in% dead ~ 'dead',
TreeStatusCode %in% 'RS' ~ 'recruit',
TreeStatusCode %in% alive ~ 'live',
TRUE ~ 'unknown'))
table(trees_final$status2, trees_final$status3) # check that the output is the same
##
## dead live recruit
## dead 25 0 0
## live 0 130 0
## recruit 0 0 9
if(){ }else{ }
This style of conditional, if(){ }else{ } (hereafter called if/else), is best used for operations outside of data frames, like
turning code on or off based on specific conditions. I use if/else with
ggplot (graphing R package we’ll cover later) to turn certain features
on or off based on a condition in the data or a condition I set. If/else
statements are also helpful for error handling in your code. For example,
if you want the code to send a warning when your data frame is empty (no
rows), you can have an if/else statement that prints to the console. You
can string together multiple conditions to test by adding more
else{ } statements.
Print warning in console that indicates if invasive species are found in wetland data
inv <- ACAD_wetland |> filter(Invasive == TRUE)
if(nrow(inv) > 0){print("Invasive species were detected in the data.")
} else {print("No invasive species were detected in the data.")}
## [1] "Invasive species were detected in the data."
Force the else statement to print, by filtering out invasive species before testing. I added another potential else statement just to show that syntax.
native_only <- ACAD_wetland |> filter(Invasive == FALSE)
inv2 <- native_only |> filter(Invasive == TRUE)
if(nrow(inv2) > 0){print("Invasive species were detected in the data.")
} else if(nrow(inv2) == 0){print("No invasive species were detected in the data.")
} else {"Invasive species detections unclear"}
## [1] "No invasive species were detected in the data."
# read in wetland data if you don't already have it loaded.
ACAD_wetland <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Base R using the with() function
ACAD_wetland$Status <- with(ACAD_wetland, ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected)
# Tidyverse
ACAD_wetland <- ACAD_wetland |> mutate(Status = ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected) # check your work##
## FALSE TRUE
## protected 0 9
## public 499 0
# Base R using the with() function and nested ifelse()
ACAD_wetland$abundance_cat <- with(ACAD_wetland, ifelse(Ave_Cov < 10, "Low",
                                        ifelse(Ave_Cov >= 10 & Ave_Cov <= 50, "Medium", "High")))
# Tidyverse using case_when() and between()
ACAD_wetland <- ACAD_wetland |> mutate(abundance_cat = case_when(Ave_Cov < 10 ~ "Low",
between(Ave_Cov, 10, 50) ~ "Medium",
TRUE ~ "High"))
table(ACAD_wetland$abundance_cat)
##
## High Low Medium
## 6 464 38
Note the use of the between() function, which saves typing. Both bounds are inclusive, i.e., it matches as >= and <=.
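A quick sketch showing that between() is inclusive on both ends and is equivalent to the longer comparison (the vector x is made up):

```r
library(dplyr)

x <- c(5, 10, 30, 50, 51)
between(x, 10, 50)  # FALSE TRUE TRUE TRUE FALSE
x >= 10 & x <= 50   # same result, more typing
```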
group_by() and summarize()
Yesterday, we used functions like mean(),
min(), and max() to summarize entire datasets.
Now we’re going to use those same functions to summarize data by
grouping variables, such as park, year, plot, etc. The process is
similar to using Totals in Access or subtotals in Excel, although it is
more flexible and efficient in R.
summarize() and mutate():
mutate() returns the same number of rows as the original data frame. This function also returns all of the columns that were in the original data frame.
summarize() returns as many rows as there are grouping levels in the original data frame. This function only returns the columns that were part of the group_by() and those created in the summarize() function.
mean(): calculate the group means
min(): calculate the group minimums
max(): calculate the group maximums
sum(): calculate the group sums
sd(): calculate the group standard deviations
n(): tally the number of rows within each group
Sum the number of trees per plot, year, and species using mutate.
Note the use of n(), which counts the number of rows
within a group. Be careful here. If there are NAs in a group, they are
counted by n(). Whether that’s okay or not depends on the
data.
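If NAs shouldn't count toward a group's tally, one common workaround is to sum over !is.na() instead of relying on n(). A minimal sketch with made-up data:

```r
library(dplyr)

df <- data.frame(grp = c("a", "a", "b"), val = c(1, NA, 2))
counts <- df |>
  group_by(grp) |>
  summarize(n_rows = n(),              # counts every row, NAs included
            n_vals = sum(!is.na(val)), # counts only non-NA values
            .groups = "drop")
counts # group "a" has 2 rows but only 1 non-NA value
```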
num_trees_mut <- trees_final |>
group_by(Plot_Name, SampleYear, Species) |>
mutate(num_trees = n()) |>
select(Plot_Name, SampleYear, Species, num_trees)
nrow(trees_final) #164
## [1] 164
## [1] 164
## # A tibble: 6 × 4
## # Groups: Plot_Name, SampleYear, Species [3]
## Plot_Name SampleYear Species num_trees
## <chr> <int> <chr> <int>
## 1 MIMA-11 2025 Acer rubrum 6
## 2 MIMA-11 2025 Acer rubrum 6
## 3 MIMA-11 2025 Acer rubrum 6
## 4 MIMA-11 2025 Acer rubrum 6
## 5 MIMA-11 2025 Quercus palustris 1
## 6 MIMA-11 2025 Quercus bicolor 1
Note how the format of head(num_trees_mut) differs from the output for a data frame. This is how you know your dataset has been turned into a tibble.
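If you prefer the plain data frame printing, a tibble can be converted back with as.data.frame(); sketched here on a small made-up tibble:

```r
library(dplyr)

tb <- tibble(x = 1:3, y = c("a", "b", "c"))
class(tb)               # a tibble inherits from "tbl_df", "tbl", "data.frame"
df <- as.data.frame(tb) # convert back to a plain data frame
class(df)               # "data.frame"
```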
Sum the number of trees per plot, year, and species using summarize.
num_trees_sum <- trees_final |>
group_by(Plot_Name, SampleYear, Species) |>
summarize(num_trees = n())
nrow(trees_final) #164
## [1] 164
## [1] 41
## # A tibble: 6 × 4
## # Groups: Plot_Name, SampleYear [2]
## Plot_Name SampleYear Species num_trees
## <chr> <int> <chr> <int>
## 1 MIMA-11 2025 Acer rubrum 6
## 2 MIMA-11 2025 Fraxinus pennsylvanica 1
## 3 MIMA-11 2025 Quercus bicolor 1
## 4 MIMA-11 2025 Quercus palustris 1
## 5 MIMA-12 2025 Acer rubrum 9
## 6 MIMA-12 2025 Fraxinus 1
The group_by() + mutate() approach is
helpful if you’re trying to standardize values within your group. But in
most cases, the group_by() + summarize()
approach, which collapses to the group level, is what you’re looking
for.
Note the message that summarize() gave you in the console. The tidyverse is chatty: its functions have a lot of checks built in based on how the developers think you should be using them. The message from summarize() is particularly annoying. To turn it off, you can specify .groups = 'drop'. I'll show that next.
There’s also a new .by argument in summarize that allows
you to skip the group_by() step. It came after I learned
dplyr, so I often forget to use it. Most examples I see online use the
original way, I include both approaches below. Interestingly, using
.by returns a data frame, whereas group_by()
returns a tibble. The differences are small and not something to be too
concerned about.
Summarize the average and standard error of tree DBH by plot and year
tree_dbh <- trees_final |>
group_by(Plot_Name, SampleYear) |>
summarize(mean_dbh = mean(DBHcm),
num_trees = n(),
se_dbh = sd(DBHcm)/sqrt(num_trees),
.groups = 'drop') # prevents warning in console
tree_dbh2 <- trees_final |>
summarize(mean_dbh = mean(DBHcm),
num_trees = n(),
se_dbh = sd(DBHcm)/sqrt(num_trees),
.by = c(Plot_Name, SampleYear))
tree_dbh == tree_dbh2 # tests that all the values in 1 data frame match the 2nd
## Plot_Name SampleYear mean_dbh num_trees se_dbh
## [1,] TRUE TRUE TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE TRUE TRUE
## [4,] TRUE TRUE TRUE TRUE TRUE
## [5,] TRUE TRUE TRUE TRUE TRUE
## [6,] TRUE TRUE TRUE TRUE TRUE
## [7,] TRUE TRUE TRUE TRUE TRUE
## [8,] TRUE TRUE TRUE TRUE TRUE
## [9,] TRUE TRUE TRUE TRUE TRUE
## [10,] TRUE TRUE TRUE TRUE TRUE
# Using group_by()
ACAD_inv <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(Pct_Cov = sum(Ave_Cov),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv)
## # A tibble: 6 × 4
## Site_Name Year Invasive Pct_Cov
## <chr> <int> <lgl> <dbl>
## 1 RAM-05 2012 FALSE 155.
## 2 RAM-05 2017 FALSE 152.
## 3 RAM-05 2017 TRUE 0.06
## 4 RAM-41 2012 FALSE 48.6
## 5 RAM-41 2017 FALSE 107.
## 6 RAM-41 2017 TRUE 10.2
# Using summarize(.by)
ACAD_inv2 <- ACAD_wetland |>
summarize(Pct_Cov = sum(Ave_Cov), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv2) # should be the same as ACAD_inv
## Site_Name Year Invasive Pct_Cov
## 1 RAM-05 2012 FALSE 155.42
## 2 RAM-05 2017 FALSE 152.04
## 3 RAM-05 2017 TRUE 0.06
## 4 RAM-41 2017 FALSE 107.04
## 5 RAM-41 2012 FALSE 48.56
## 6 RAM-41 2017 TRUE 10.20
# Using group_by()
ACAD_spp <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(num_spp = n(),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp)
## # A tibble: 6 × 4
## Site_Name Year Invasive num_spp
## <chr> <int> <lgl> <int>
## 1 RAM-05 2012 FALSE 44
## 2 RAM-05 2017 FALSE 53
## 3 RAM-05 2017 TRUE 1
## 4 RAM-41 2012 FALSE 33
## 5 RAM-41 2017 FALSE 39
## 6 RAM-41 2017 TRUE 1
# Using summarize(.by)
ACAD_spp2 <- ACAD_wetland |>
summarize(num_spp = n(), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp2) # should be the same as ACAD_spp
## Site_Name Year Invasive num_spp
## 1 RAM-05 2012 FALSE 44
## 2 RAM-05 2017 FALSE 53
## 3 RAM-05 2017 TRUE 1
## 4 RAM-41 2017 FALSE 39
## 5 RAM-41 2012 FALSE 33
## 6 RAM-41 2017 TRUE 1
Most efficient solution figured out during training
# using the .by within mutate (newer solution)
ACAD_wetland <- ACAD_wetland |>
mutate(Site_Cover = sum(Ave_Cov),
.by = c(Site_Name, Year)) |>
mutate(rel_cov = (Ave_Cov/Site_Cover)*100,
         .by = c(Site_Name, Year, Latin_Name, Common))
Original Solution: First sum site-level cover using mutate() to return a value for every original row.
ACAD_wetland <- ACAD_wetland |> group_by(Site_Name, Year) |>
mutate(Site_Cover = sum(Ave_Cov)) |>
  ungroup() # good practice to ungroup after group_by()
table(ACAD_wetland$Site_Name, ACAD_wetland$Site_Cover) # check that each site has a unique value
##
## 48.56 70.6 104.78 106.72 111.4 117.24 152.1 153.8 155.42 165.64 178.34
## RAM-05 0 0 0 0 0 0 54 0 44 0 0
## RAM-41 33 0 0 0 0 40 0 0 0 0 0
## RAM-44 0 0 45 0 0 0 0 0 0 0 34
## RAM-53 0 0 0 0 0 0 0 0 0 0 0
## RAM-62 0 0 0 0 0 0 0 26 0 26 0
## SEN-01 0 34 0 0 0 0 0 0 0 0 0
## SEN-02 0 0 0 41 0 0 0 0 0 0 0
## SEN-03 0 0 0 0 33 0 0 0 0 0 0
##
## 188.84 196.52
## RAM-05 0 0
## RAM-41 0 0
## RAM-44 0 0
## RAM-53 48 50
## RAM-62 0 0
## SEN-01 0 0
## SEN-02 0 0
## SEN-03 0 0
## # A tibble: 6 × 15
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov Invasive Protected
## <chr> <chr> <chr> <chr> <int> <int> <dbl> <lgl> <lgl>
## 1 SEN-01 Sentinel Acer rubr… red m… 2011 0 0.02 FALSE FALSE
## 2 SEN-01 Sentinel Amelanchi… servi… 2011 20 0.02 FALSE FALSE
## 3 SEN-01 Sentinel Andromeda… bog r… 2011 80 2.22 FALSE FALSE
## 4 SEN-01 Sentinel Arethusa … drago… 2011 40 0.04 FALSE TRUE
## 5 SEN-01 Sentinel Aronia me… black… 2011 100 2.64 FALSE FALSE
## 6 SEN-01 Sentinel Carex exi… coast… 2011 60 6.6 FALSE FALSE
## # ℹ 6 more variables: X_Coord <dbl>, Y_Coord <dbl>, Status <chr>,
## # abundance_cat <chr>, Site_Cover <dbl>, rel_cov <dbl>
Next calculate relative cover grouped on Site_Name, Year, Latin_Name, and Common
# Create new dataset because collapsing rows on grouping variables
# Using group_by() and summarize()
ACAD_wetland_relcov <- ACAD_wetland |> group_by(Site_Name, Year, Latin_Name, Common) |>
summarize(rel_cov = (Ave_Cov/Site_Cover)*100,
.groups = 'drop') |>
  ungroup()
Check that relative cover sums to 100% within each site
# Using summarize(.by = )
ACAD_wetland_relcov2 <- ACAD_wetland |> #group_by(Site_Name, Year, Latin_Name, Common) |>
summarize(rel_cov = Ave_Cov/Site_Cover,
.by = c("Site_Name", "Year", "Latin_Name", "Common"))
# Check that your relative cover sums to 100 for each site
relcov_check <- ACAD_wetland_relcov2 |> group_by(Site_Name, Year) |>
summarize(tot_relcov = sum(rel_cov)*100, .groups = 'drop')
table(relcov_check$tot_relcov) # they should all be 100
##
## 100
## 13
Consider three example data visualizations that demonstrate how some approaches are more effective than others in conveying patterns.
Most people can understand this figure of daily Covid cases faster than they can understand the table of daily Covid cases.
| state | timestamp | cases | total_population |
|---|---|---|---|
| AK | 2022-01-25T04:00:00Z | 203110 | 731545 |
| AL | 2022-01-25T04:00:00Z | 1153149 | 4903185 |
| AR | 2022-01-25T04:00:00Z | 738652 | 3017804 |
| AZ | 2022-01-25T04:00:00Z | 1767303 | 7278717 |
| CA | 2022-01-25T04:00:00Z | 7862003 | 39512223 |
| CO | 2022-01-25T04:00:00Z | 1207991 | 5758736 |
| CT | 2022-01-25T04:00:00Z | 683731 | 3565287 |
This table shows average monthly revenue for Acme products.
| category | product | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| party supplies | balloons | 892 | 1557 | 1320 | 972 | 1309 | 1174 | 1153 | 1138 | 1275 | 1178 | 1325 | 1422 |
| party supplies | confetti | 1271 | 1311 | 829 | 1020 | 1233 | 1061 | 1088 | 1395 | 1376 | 1152 | 1568 | 1412 |
| party supplies | party hats | 1338 | 1497 | 1445 | 956 | 1372 | 1482 | 1048 | 877 | 1404 | 1030 | 1458 | 1547 |
| party supplies | wrapping paper | 1396 | 1026 | 932 | 891 | 1364 | 896 | 900 | 1221 | 1146 | 967 | 1394 | 1507 |
| school supplies | backpacks | 1802 | 1773 | 1611 | 1723 | 1799 | 1730 | 1813 | 1676 | 1748 | 1652 | 1819 | 1759 |
| school supplies | notebooks | 1153 | 1471 | 1541 | 1371 | 1592 | 1514 | 1725 | 1702 | 1457 | 1604 | 1729 | 1279 |
| school supplies | pencils | 1679 | 1304 | 1054 | 1259 | 1425 | 1608 | 1972 | 1811 | 1610 | 1004 | 1417 | 1283 |
| school supplies | staplers | 1074 | 1708 | 1439 | 1154 | 1551 | 1099 | 1793 | 1601 | 1647 | 1666 | 1389 | 1511 |
Use the table above to answer these questions:
Now let’s display the same table as a heat map, with larger numbers represented by darker color cells. How quickly can we answer those same two questions? What patterns can we see in the heat map that were not obvious in the table above?
In 1973, Francis Anscombe published “Graphs in statistical analysis”, a paper describing four bivariate datasets with identical means, variances, and correlations.
| x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 |
|---|---|---|---|---|---|---|---|
| 10 | 8.04 | 10 | 9.14 | 10 | 7.46 | 8 | 6.58 |
| 8 | 6.95 | 8 | 8.14 | 8 | 6.77 | 8 | 5.76 |
| 13 | 7.58 | 13 | 8.74 | 13 | 12.74 | 8 | 7.71 |
| 9 | 8.81 | 9 | 8.77 | 9 | 7.11 | 8 | 8.84 |
| 11 | 8.33 | 11 | 9.26 | 11 | 7.81 | 8 | 8.47 |
| 14 | 9.96 | 14 | 8.10 | 14 | 8.84 | 8 | 7.04 |
| 6 | 7.24 | 6 | 6.13 | 6 | 6.08 | 8 | 5.25 |
| 4 | 4.26 | 4 | 3.10 | 4 | 5.39 | 19 | 12.50 |
| 12 | 10.84 | 12 | 9.13 | 12 | 8.15 | 8 | 5.56 |
| 7 | 4.82 | 7 | 7.26 | 7 | 6.42 | 8 | 7.91 |
| 5 | 5.68 | 5 | 4.74 | 5 | 5.73 | 8 | 6.89 |
| x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4 | |
|---|---|---|---|---|---|---|---|---|
| mean | 9 | 7.50 | 9 | 7.50 | 9 | 7.50 | 9 | 7.50 |
| var | 11 | 4.13 | 11 | 4.13 | 11 | 4.12 | 11 | 4.12 |
Anscombe data as plots: Despite their identical statistics, when we plot the data we see the four datasets are actually very different. Anscombe’s point was to understand the data, we must plot the data.
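Anscombe's quartet ships with base R as the anscombe data frame, so you can check the matching summary statistics yourself:

```r
# anscombe is a built-in dataset with columns x1-x4 and y1-y4
data(anscombe)

round(sapply(anscombe[, 1:4], mean), 2) # all x means are 9
round(sapply(anscombe[, 5:8], mean), 2) # all y means are ~7.5
round(sapply(anscombe[, 1:4], var), 2)  # all x variances are 11

# correlations within each x/y pair are all ~0.816
cor(anscombe$x1, anscombe$y1)
cor(anscombe$x4, anscombe$y4)
```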
ggplot2
The ggplot2 package is the most popular R
package for plotting. It takes a little effort to learn how the pieces
of a ggplot object fit together. However, once you get the hang of it,
you can create and customize a large variety of attractive plots with
just a few lines of R code. The package is called ggplot2 because there was originally a ggplot package. The developer, Hadley Wickham, didn't want to break the original package while improving it, so he created ggplot2.
The ggplot2 online book and
cheat sheets can be very helpful while you are learning to use the
ggplot2 package.
The ggplot2 package was developed using the grammar of
graphics as the underlying philosophy, which basically breaks a plot up
into individual building blocks related to aesthetics (e.g., color,
size, shape), geometries (e.g. points, lines, boxes), and themes
(e.g. axis label font size, legend placement, etc.).
The main building blocks of ggplot2:
Data: the first argument in the ggplot() call is the data. This also means you can pipe data into a ggplot object.
Aesthetics: variables are mapped to plot features (e.g., x, y, color) with the aes() argument, either at the ggplot() level or within the specific geom.
Scales: control how the aesthetics specified in aes() are rendered. If you specify aes(color = park), then the scale is where you can specify a custom color for each park instead of ggplot's default color scheme. The scale is also where you can set the labels of groups in a legend (if different than how the data are labeled), and where you can customize axis ranges, breaks, and labels.
We will build our first ggplot object step-by-step to demonstrate how each component contributes to the final plot. Our first ggplot object will create a line graph of trends in visits to Acadia National Park from 1994 to 2024.
Step 1. Import the visitation data and load ggplot2
library(ggplot2)
library(dplyr) # for filter
visits <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_annual_visits.csv")
Step 2. Look over the data for any potential problems.
# Examine the data to understand data structure, data types, and potential problems
head(visits)
summary(visits)
table(visits$Year)
table(complete.cases(visits))
## 'data.frame': 31 obs. of 3 variables:
## $ Park : chr "ACAD" "ACAD" "ACAD" "ACAD" ...
## $ Year : int 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 ...
## $ Annual_Visits: chr "2,710,749" "2,845,378" "2,704,831" "2,760,306" ...
Step 3. Fix the annual visitation variable
Note that Annual_Visits is treated like a character column because of the "," thousands separators. We have to fix that before we can plot the data. There are multiple ways to do this. I'm going to use the gsub() function, which behaves like gsub([find "pattern"], [replace with "pattern"], [column to search]). The empty "" removes the ",". Then as.numeric() converts the character to numeric. As long as you don't get the "NAs introduced by coercion" warning in the console, all rows in the column of interest were successfully converted to numbers.
# Base R
visits$Annual_Visits <- as.numeric(gsub(",", "", visits$Annual_Visits))
# Tidyverse
library(dplyr) # load package first
visits <- visits |> mutate(Annual_Visits = as.numeric(gsub(",", "", Annual_Visits)))
str(visits) # check that it worked
## 'data.frame': 31 obs. of 3 variables:
## $ Park : chr "ACAD" "ACAD" "ACAD" "ACAD" ...
## $ Year : int 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 ...
## $ Annual_Visits: num 2710749 2845378 2704831 2760306 2594497 ...
Step 4. Create the ggplot template of annual visitation per 1000 visitors over time
Step 5. Add line and point geometry, starting with default, and ending with customized symbols
The plot below customizes the shape, color (outline), fill, and size of the points, but all points are the same. Note the hex code used for the fill color. This is a 6-digit code that gives maximum flexibility in selecting colors. I often use HTML color codes to find colors and their associated hex codes.
Note that I usually specify the geom_line() before the
geom_point(), because the geoms are drawn in the order
they’re specified. Specifying the opposite order will make the lines
cross over the points and doesn’t look as nice. I also made the
linewidth a bit thicker than default.
p1 <- p +
geom_line(linewidth = 0.6) +
geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24)
p1
Step 6. Fine tune the Y and X axis breaks
p2 <- p1 + scale_y_continuous(name = "Annual visitors in 1000's",
limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) + # label at 2000, 2500, ... up to 4500
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) # label at 1994, 1999, ... up to 2024
p2
Step 7. Change the labels
Note how the axis labels can be specified either in the scale (step
4) or in the labs() function.
Step 8. Modify the theme using built-in themes
There are built in themes that change the default formatting of the
plot. Here I show theme_bw(), but there are many options.
Play around with the other themes to get a feel for the options:
theme_linedraw(), theme_light(),
theme_dark(), theme_minimal(),
theme_classic(), theme_void(). The two I use
most often are theme_bw() and theme_classic().
There’s a package called ggthemes that you can install with
even more themes. You can also create your own themes.
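A minimal sketch of rolling your own theme (the name my_theme is made up): start from a built-in theme, override pieces with theme(), and save the result as an object you can add to any plot.

```r
library(ggplot2)

my_theme <- theme_bw() +
  theme(panel.grid.minor = element_blank(),
        axis.text = element_text(size = 10),
        legend.position = "bottom")

# Add it to a plot like any other theme (mtcars is a built-in dataset)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  my_theme
```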
Or change theme elements manually
Note that ?theme takes you to the help that shows all
the options.
p4b <- p3 + theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1), # make x axis text bigger and angle
                  panel.grid.major = element_blank(), # turns off major grid lines
                  panel.grid.minor = element_blank(), # turns off minor grid lines
panel.background = element_rect(fill = 'white', color = 'dimgrey'), # panel white w/ grey border
plot.margin = margin(2, 3, 2, 3), # increase white margin around plot
title = element_text(size = 10) # reduce title size
)
p4b
Note that the order of margins in ggplot is Top, Right, Bottom, Left (TRBL), which sounds like trouble.
Putting it all together
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line() +
geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24) +
labs(x = "Year", title = "Annual visitation/1000 people in Acadia NP 1994 - 2024") +
scale_y_continuous(name = "Annual visitors in 1000's",
limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1), # make x axis text bigger and angle
      panel.grid.major = element_blank(), # turns off major grid lines
      panel.grid.minor = element_blank(), # turns off minor grid lines
panel.background = element_rect(fill = 'white', color = 'dimgrey'), # make panel white w/ grey border
plot.margin = margin(2, 3, 2, 3), # increase white margin around plot
title = element_text(size = 10) # reduce title size
  )
CHALLENGE: Recreate the plot below (or customize your own plot). Note that the fill color is "#0080FF", the shape is 21, and the theme is classic. The linewidth = 0.75, and linetype = 'dashed'.
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line(linewidth = 0.75, linetype = 'dashed') +
geom_point(color = "black", fill = "#0080FF", size = 2.5, shape = 21) +
labs(x = "Year", y = "Annual visits in 1,000s") +
scale_y_continuous(limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
  theme_classic()
So far, the data we've plotted don't include any kind of variance. In many cases, we do want to show the distribution of data among years, parks, or other categories. That's where plots like bar plots with error bars and boxplots can be handy.
First we’ll look at how to make barplots with error bars. The visitation data doesn’t have error bars, so we’ll use a water chemistry dataset, and filter it to only include Jordan Pond and Dissolved Oxygen.
Load the data and packages
library(dplyr)
library(ggplot2)
chem <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_water_chemistry_data.csv")
Filter on Jordan Pond and Dissolved Oxygen and extract the month from the date in a VERY hacky way.
We’ll learn about working with dates in a more formal way on Day 3.
jordDO <- chem |>
filter(SiteCode == "ACJORD") |>
filter(Parameter == "DO_mgL") |>
  mutate(month = as.numeric(gsub("/", "", substr(EventDate, 1, 2))))
head(jordDO)
unique(jordDO$SiteCode) # check filter worked
unique(jordDO$Parameter) # check filter worked
Let's assume it's appropriate to calculate the mean and standard error as if the data were normally distributed. The code below calculates the mean and standard error by month, so we can plot error bars later.
Calculate standard error of DO by month
jordDO_sum <- jordDO |> group_by(month) |>
summarize(mean_DO = mean(Value),
num_meas = n(),
se_DO = sd(Value)/sqrt(num_meas))
jordDO_sum
## # A tibble: 6 × 4
## month mean_DO num_meas se_DO
## <dbl> <dbl> <int> <dbl>
## 1 5 11.1 17 0.168
## 2 6 9.54 19 0.0908
## 3 7 8.72 19 0.0445
## 4 8 8.73 19 0.0570
## 5 9 9.46 19 0.0902
## 6 10 10.3 19 0.0837
Bar charts with error bars are a common way to show mean and variance. Bar chart y-axes should always start at 0, which doesn't necessarily let you see the patterns in the data all that well. Boxplots are often a better approach in that case, which we'll show next.
A couple of notes on the code below:
Bars with heights taken directly from values in the data use geom_col(). If instead you want the bar heights to be proportional to the number of cases in each group, use geom_bar(stat = 'count').
width = 0.75 makes the bar width narrower, allowing some white space between the bars.
Setting x = NULL in labs() means the x axis won't include a label.
Bar plot of Jordan Pond average DO with 95% CI error bars
ggplot(data = jordDO_sum, aes(x = month, y = mean_DO)) +
geom_col(fill = "#74AAE3", color = "dimgrey", width = 0.75) +
geom_errorbar(aes(ymin = mean_DO - 1.96*se_DO, ymax = mean_DO + 1.96*se_DO),
width = 0.75) +
theme_bw() +
labs(x = NULL, y = "Dissolved Oxygen mg/L") +
scale_x_continuous(limits = c(4, 11),
breaks = c(seq(5, 10, by = 1)),
labels = c("May", "Jun", "Jul",
                                "Aug", "Sep", "Oct"))
Boxplots are another way to show variance in responses. In most cases, the middle line of a boxplot is the median. The lower and upper limits of the box represent the 25th and 75th percentiles. The whiskers extend to 1.5 times the interquartile range (the 25th to 75th percentiles) or to the min/max of the data, whichever is closer to the box. Points beyond the whiskers are considered outlying points. In this case, I turned off the outlying points, because I plotted the actual data behind the boxplots as slightly transparent points. This is good practice to see how the points are distributed.
Boxplots of Jordan Pond DO
ggplot(data = jordDO, aes(x = month, y = Value, group = month)) +
  geom_boxplot(outliers = FALSE) +
geom_point(alpha = 0.2) +
theme_bw() +
labs(x = NULL, y = "Dissolved Oxygen mg/L") +
scale_x_continuous(limits = c(4, 11),
breaks = c(seq(5, 10, by = 1)),
labels = c("May", "Jun", "Jul",
"Aug", "Sep", "Oct"))Goals for Day 3:
Artwork by
@allison_horst
Feedback: Please leave feedback in the
training feedback
form. You can submit feedback multiple times and don’t need to
answer every question. Responses are anonymous.
Reshaping data from long to wide and wide to long is a common task with our data. Datasets are usually described as long or wide. The long form, which is the structure database tables often take, consists of each row being an observation and each column being a variable (i.e., tidy format). However, for summary tables, we often want to reshape the data to be wide so it's easier to read.
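To make the distinction concrete, here's a tiny made-up example (hypothetical site/year counts, not the bat data) showing the same data in long and wide form:

```r
library(tidyr)

# long form: one row per site x year observation
long <- data.frame(site  = c("A", "A", "B", "B"),
                   year  = c(2024, 2025, 2024, 2025),
                   count = c(3, 1, 0, 2))

# wide form: one row per site, one column per year
wide <- pivot_wider(long, names_from = year, values_from = count,
                    names_prefix = "yr")
wide
```

The wide version is easier to scan in a report; the long version is what most plotting and modeling functions expect.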
We’ll work with a fake bat capture dataset to see how this works. To get started, load the dataset and packages, as shown below.
The example dataset contains simplified capture data in long form. Every row represents an individual bat that was captured at a given site and date. We want to turn this into a wide data frame that has a column for every species, and the number of individuals of that species that were caught for each year and site combination.
Before we start, we'll make a species code that will be easier to work with (R doesn't like spaces). We'll also summarize the data, so we have a count of the number of individuals of each species found per site and year.
Create sppcode
bat_cap <- bat_cap |>
mutate(genus = toupper(word(Latin, 1)), # capitalize and extract first word in Latin
species = toupper(word(Latin, 2)), # capitalize and extract second word in Latin
sppcode = paste0(substr(genus, 1, 3), # combine first 3 characters of genus and species
substr(species, 1, 3))) |>
select(-genus, -species) # drop temporary columns
head(bat_cap)
## Site Julian Year Latin Common sppcode
## 1 site_001 195 2025 Myotis leibii small-footed MYOLEI
## 2 site_001 214 2025 Myotis leibii small-footed MYOLEI
## 3 site_001 237 2022 Myotis leibii small-footed MYOLEI
## 4 site_001 230 2021 Myotis lucifugus little brown MYOLUC
## 5 site_001 201 2022 Lasiurus cinereus hoary LASCIN
## 6 site_001 230 2020 Myotis septentrionalis northern long-eared MYOSEP
Summarize # individuals per species, site and year
bat_sum <- bat_cap |>
summarize(num_indiv = sum(!is.na(sppcode)), # I prefer this over n()
.by = c("Site", "Year", "sppcode")) |>
arrange(Site, Year, sppcode) # helpful for ordering the future wide columns
Note that I'm using a trick in R with logicals to calculate
num_indiv. Logical expressions (TRUE/FALSE) are treated as
1/0 values under the hood in R. Remember that ! is
interpreted in R as “not”, so !is.na() reads as “not
blank”. Every row in sppcode is checked to see if it’s
blank or not. If it’s not blank, that returns a TRUE statement, which is
then treated as 1. The sum() function is then summing all
of the 1s in the data. This is a way to perform a count of rows that
meet a certain condition, and is safer than using n() in my
opinion.
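The trick is easy to see on a small made-up vector:

```r
# logicals are 1/0 under the hood, so sum() counts TRUEs
x <- c(2.1, NA, 3.5, NA, 4.0)

sum(!is.na(x))            # number of non-missing values: 3
sum(x > 3, na.rm = TRUE)  # number of values meeting a condition: 2
```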
Now that we have bat_sum, we’re going to pivot the data
wide to make each species be a separate column and the values in each
cell be the number of individuals captured. The code below is pretty
straightforward with names_from being the column you want
to turn into column names, and the values_from being the
value you want in the cells.
Pivot bat summary data to wide
bat_sum |> pivot_wider(names_from = sppcode,
                       values_from = num_indiv) |>
  head()
## # A tibble: 6 × 6
## Site Year LASCIN MYOLEI MYOSEP MYOLUC
## <chr> <int> <int> <int> <int> <int>
## 1 site_001 2019 1 NA NA NA
## 2 site_001 2020 NA 1 1 NA
## 3 site_001 2021 NA 1 NA 1
## 4 site_001 2022 1 2 NA 1
## 5 site_001 2023 NA 1 NA NA
## 6 site_001 2024 NA NA NA 2
That was pretty simple. But there are a lot of blanks where a species wasn't caught in a given year and site. We can use the values_fill argument to fill those blanks with 0s.
Pivot bat summary data to wide filling blanks as 0
bat_wide <- bat_sum |> pivot_wider(names_from = sppcode,
values_from = num_indiv,
values_fill = 0)
head(bat_wide)
## # A tibble: 6 × 6
## Site Year LASCIN MYOLEI MYOSEP MYOLUC
## <chr> <int> <int> <int> <int> <int>
## 1 site_001 2019 1 0 0 0
## 2 site_001 2020 0 1 1 0
## 3 site_001 2021 0 1 0 1
## 4 site_001 2022 1 2 0 1
## 5 site_001 2023 0 1 0 0
## 6 site_001 2024 0 0 0 2
table(complete.cases(bat_wide)) # check that no missing values remain
##
## TRUE
## 29
Now we see that every cell has a value. Another useful argument in pivot_wider() is names_prefix, which adds a string before the column names generated in the pivot. This is helpful if you're pivoting on a numeric column, like year or plot number, because R doesn't like column names that start with a number. The names_prefix argument is a quick way to fix that. I'll show it with the bat capture data as an example, even though it isn't needed here.
bat_wide2 <- bat_sum |> pivot_wider(names_from = sppcode,
values_from = num_indiv,
values_fill = 0,
names_prefix = "spp_")
head(bat_wide2)
## # A tibble: 6 × 6
## Site Year spp_LASCIN spp_MYOLEI spp_MYOSEP spp_MYOLUC
## <chr> <int> <int> <int> <int> <int>
## 1 site_001 2019 1 0 0 0
## 2 site_001 2020 0 1 1 0
## 3 site_001 2021 0 1 0 1
## 4 site_001 2022 1 2 0 1
## 5 site_001 2023 0 1 0 0
## 6 site_001 2024 0 0 0 2
CHALLENGE: Pivot the bat_sum data frame on year
instead of species, so that you have a column for every year of
captures. Remember to avoid column names starting with a number.
bat_wide_yr <- pivot_wider(bat_sum,
names_from = Year,
values_from = num_indiv,
values_fill = 0,
names_prefix = "yr")
head(bat_wide_yr)
## # A tibble: 6 × 9
## Site sppcode yr2019 yr2020 yr2021 yr2022 yr2023 yr2024 yr2025
## <chr> <chr> <int> <int> <int> <int> <int> <int> <int>
## 1 site_001 LASCIN 1 0 0 1 0 0 0
## 2 site_001 MYOLEI 0 1 1 2 1 0 2
## 3 site_001 MYOSEP 0 1 0 0 0 0 0
## 4 site_001 MYOLUC 0 0 1 1 0 2 0
## 5 site_002 LASCIN 1 0 1 0 0 0 1
## 6 site_002 MYOLEI 1 2 1 0 0 2 0
We can reshape the capture data back to long, which will give us a similar data frame as before, but with the 0s now included in the data. For the pivot_longer() function, you have to tell it which columns to pivot on. If you don't specify, it will stack the entire dataset into 2 long columns, which you typically don't want. Here I tell R not to pivot on the Site and Year columns, because I know they're in the data frame and unlikely to change. If I instead specified the sppcode columns to pivot on, I'd have to update this code whenever a new species showed up in a future year of sampling.
bat_long <- bat_wide |> pivot_longer(cols = -c(Site, Year),
names_to = "sppcode",
values_to = "num_indiv")
head(bat_long)
## # A tibble: 6 × 4
## Site Year sppcode num_indiv
## <chr> <int> <chr> <int>
## 1 site_001 2019 LASCIN 1
## 2 site_001 2019 MYOLEI 0
## 3 site_001 2019 MYOSEP 0
## 4 site_001 2019 MYOLUC 0
## 5 site_001 2020 LASCIN 0
## 6 site_001 2020 MYOLEI 1
CHALLENGE: Pivot the resulting data frame from the previous question to long on the year columns, and remove the "yr" from the year names using names_prefix = 'yr'.
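One possible solution, sketched on a small stand-in data frame with the same shape as the year-wide data (hypothetical values, not the real capture counts):

```r
library(tidyr)

# stand-in for the year-wide data frame from the previous challenge
bat_wide_yr <- data.frame(Site    = c("site_001", "site_002"),
                          sppcode = c("LASCIN", "MYOLEI"),
                          yr2024  = c(1, 0),
                          yr2025  = c(0, 2))

bat_long_yr <- bat_wide_yr |>
  pivot_longer(cols = -c(Site, sppcode),  # pivot everything except Site and sppcode
               names_to = "Year",
               names_prefix = "yr",       # strips the leading "yr" from the names
               values_to = "num_indiv")
bat_long_yr
```

Note that names_prefix in pivot_longer() removes the prefix on the way back to long, and Year comes back as character, so convert it with as.integer() if you need a number.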
We often need to combine data from separate tables in our work (e.g., relational database tables). In R we do this using either the merge() function in base R or the *_join() functions in dplyr (e.g., left_join()). Because I find the dplyr join functions to be more intuitive and to perform faster than base R's merge, I'm going to show how to use dplyr. If you understand the basic concepts of the join functions, you can figure out how to merge in base R.
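For reference, here's the same left join written both ways on two toy data frames (made-up values):

```r
library(dplyr)

sites <- data.frame(Site = c("A", "B", "C"), X = 1:3)
caps  <- data.frame(Site = c("A", "B", "B"), n = c(2, 1, 4))

merge(sites, caps, by = "Site", all.x = TRUE)  # base R left join
left_join(sites, caps, by = "Site")            # dplyr equivalent
```

Both return all rows of sites, with NA in n for site C; all.x = TRUE is what makes merge() behave like a left join.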
To demonstrate the different joins, we’ll join the
bat_wide capture data frame we just created with a dataset
that includes more information about the bat capture sites.
Read in bat site data
bat_sites <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/bat_site_info.csv")
sort(unique(bat_sites$Site)) # Sites 1, 2, 3, 4, 5
## [1] "site_001" "site_002" "site_003" "site_004" "site_005"
sort(unique(bat_wide$Site)) # Sites 1, 2, 3, 5, 6
## [1] "site_001" "site_002" "site_003" "site_005" "site_006"
The key in the two bat datasets is the “Site” column. In the
bat_sites data frame, there are 5 unique sites, numbered
1:5. In the bat_wide data there are 5 unique sites,
numbered 1, 2, 3, 5, 6. Therefore site_004 is only found in
bat_sites and site_006 is only found in
bat_wide.
Full join
bat_full <- full_join(bat_sites, bat_wide, by = "Site")
table(bat_full$Site)
##
## site_001 site_002 site_003 site_004 site_005 site_006
## 7 7 7 1 7 1
| Site | Unit | X | Y | SiteName | Year | LASCIN | MYOLEI | MYOSEP | MYOLUC |
|---|---|---|---|---|---|---|---|---|---|
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2019 | 1 | 0 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2020 | 0 | 1 | 1 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2021 | 0 | 1 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2022 | 1 | 2 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2023 | 0 | 1 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2024 | 0 | 0 | 0 | 2 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2025 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2019 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2020 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2021 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2022 | 0 | 0 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2023 | 0 | 0 | 1 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2024 | 0 | 2 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2025 | 1 | 0 | 1 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2019 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2020 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2021 | 0 | 3 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2022 | 0 | 2 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2023 | 0 | 1 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2024 | 0 | 2 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2025 | 0 | 1 | 0 | 1 |
| site_004 | Mount Desert Island | 549931 | 4903409 | Western Mtns | NA | NA | NA | NA | NA |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2019 | 0 | 1 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2020 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2021 | 0 | 1 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2022 | 0 | 0 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2023 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2024 | 0 | 0 | 0 | 2 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2025 | 0 | 1 | 0 | 0 |
| site_006 | NA | NA | NA | NA | 2025 | 0 | 0 | 1 | 0 |
Note how site_004, which was not in the bat_wide capture data but was in the bat_sites data, is included with NAs for the columns that came from bat_wide. Likewise, site_006, which was only in the bat_wide capture data and not in bat_sites, has NAs for the columns that came from bat_sites.
Inner join
bat_inner <- inner_join(bat_sites, bat_wide, by = "Site")
table(bat_inner$Site)
##
## site_001 site_002 site_003 site_005
## 7 7 7 7
| Site | Unit | X | Y | SiteName | Year | LASCIN | MYOLEI | MYOSEP | MYOLUC |
|---|---|---|---|---|---|---|---|---|---|
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2019 | 1 | 0 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2020 | 0 | 1 | 1 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2021 | 0 | 1 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2022 | 1 | 2 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2023 | 0 | 1 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2024 | 0 | 0 | 0 | 2 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2025 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2019 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2020 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2021 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2022 | 0 | 0 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2023 | 0 | 0 | 1 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2024 | 0 | 2 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2025 | 1 | 0 | 1 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2019 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2020 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2021 | 0 | 3 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2022 | 0 | 2 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2023 | 0 | 1 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2024 | 0 | 2 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2025 | 0 | 1 | 0 | 1 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2019 | 0 | 1 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2020 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2021 | 0 | 1 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2022 | 0 | 0 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2023 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2024 | 0 | 0 | 0 | 2 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2025 | 0 | 1 | 0 | 0 |
The inner join only returns records from both datasets that have a site in common. Therefore, site_004 in the bat_sites data and site_006 in the bat_wide capture data were dropped.
Left join
bat_left <- left_join(bat_sites, bat_wide, by = "Site")
table(bat_left$Site)
##
## site_001 site_002 site_003 site_004 site_005
## 7 7 7 1 7
| Site | Unit | X | Y | SiteName | Year | LASCIN | MYOLEI | MYOSEP | MYOLUC |
|---|---|---|---|---|---|---|---|---|---|
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2019 | 1 | 0 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2020 | 0 | 1 | 1 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2021 | 0 | 1 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2022 | 1 | 2 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2023 | 0 | 1 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2024 | 0 | 0 | 0 | 2 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2025 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2019 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2020 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2021 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2022 | 0 | 0 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2023 | 0 | 0 | 1 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2024 | 0 | 2 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2025 | 1 | 0 | 1 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2019 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2020 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2021 | 0 | 3 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2022 | 0 | 2 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2023 | 0 | 1 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2024 | 0 | 2 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2025 | 0 | 1 | 0 | 1 |
| site_004 | Mount Desert Island | 549931 | 4903409 | Western Mtns | NA | NA | NA | NA | NA |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2019 | 0 | 1 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2020 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2021 | 0 | 1 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2022 | 0 | 0 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2023 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2024 | 0 | 0 | 0 | 2 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2025 | 0 | 1 | 0 | 0 |
The left join takes every row in the left data, bat_sites, and only the rows in the right data, bat_wide, that have a matching site. Note how site_004, which is only in bat_sites, is included with NAs for the columns from bat_wide that didn't have a match. site_006, which was only in the bat_wide data, was dropped.
Coding tip: I use left joins more than any other join because I'm usually joining tables that have a 1-to-many relationship, where the left dataset has 1 row for 1 or more rows in the right dataset. For example, say I have a dataset that only includes data for plots where an invasive species was detected and I want to do summary statistics that require the full number of plots. Using a left join, where the left dataset is a table of all of the plots and the right dataset is the invasive detections, will return the full set of plots to calculate summary statistics from. You may also have to fill 0s where NAs were introduced before generating summary statistics, but only do that where a missing row truly means zero (i.e., the plot was sampled and nothing was detected).
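A minimal sketch of that workflow, using made-up plot and invasive-detection tables:

```r
library(dplyr)
library(tidyr)

plots     <- data.frame(plot = c("p1", "p2", "p3"))              # all sampled plots
invasives <- data.frame(plot = c("p1", "p3"), num_inv = c(4, 2)) # detections only

inv_full <- plots |>
  left_join(invasives, by = "plot") |>
  # p2 was sampled with no detections, so a 0 is truly a 0 here
  mutate(num_inv = replace_na(num_inv, 0))
inv_full
```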
Right join
bat_right <- right_join(bat_sites, bat_wide, by = "Site")
table(bat_right$Site)
##
## site_001 site_002 site_003 site_005 site_006
## 7 7 7 7 1
| Site | Unit | X | Y | SiteName | Year | LASCIN | MYOLEI | MYOSEP | MYOLUC |
|---|---|---|---|---|---|---|---|---|---|
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2019 | 1 | 0 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2020 | 0 | 1 | 1 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2021 | 0 | 1 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2022 | 1 | 2 | 0 | 1 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2023 | 0 | 1 | 0 | 0 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2024 | 0 | 0 | 0 | 2 |
| site_001 | Mount Desert Island | 559205 | 4907461 | Jordan Pond | 2025 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2019 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2020 | 0 | 2 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2021 | 1 | 1 | 0 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2022 | 0 | 0 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2023 | 0 | 0 | 1 | 0 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2024 | 0 | 2 | 0 | 1 |
| site_002 | Schoodic | 574712 | 4909721 | SERC Campus | 2025 | 1 | 0 | 1 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2019 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2020 | 0 | 1 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2021 | 0 | 3 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2022 | 0 | 2 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2023 | 0 | 1 | 0 | 0 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2024 | 0 | 2 | 0 | 1 |
| site_003 | Mount Desert Island | 554607 | 4895800 | Bass Harbor | 2025 | 0 | 1 | 0 | 1 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2019 | 0 | 1 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2020 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2021 | 0 | 1 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2022 | 0 | 0 | 1 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2023 | 1 | 0 | 0 | 0 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2024 | 0 | 0 | 0 | 2 |
| site_005 | Mount Desert Island | 563101 | 4912371 | Sieur de Monts | 2025 | 0 | 1 | 0 | 0 |
| site_006 | NA | NA | NA | NA | 2025 | 0 | 0 | 1 | 0 |
The right join takes every row in the right data, bat_wide, and only the rows in the left data, bat_sites, that have a matching site. Note how site_006, which is only in bat_wide, is included with NAs for the columns from bat_sites that didn't have a match. site_004, which was only in the bat_sites data, was dropped.
Anti join to find sites not in bat_wide
anti_join(bat_sites, bat_wide, by = "Site")
## Site Unit X Y SiteName
## 1 site_004 Mount Desert Island 549931 4903409 Western Mtns
Anti join to find sites not in bat_sites
anti_join(bat_wide, bat_sites, by = "Site")
## # A tibble: 1 × 6
## Site Year LASCIN MYOLEI MYOSEP MYOLUC
## <chr> <int> <int> <int> <int> <int>
## 1 site_006 2025 0 0 1 0
Import tree and species tables
spp_tbl <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_species_table.csv")
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
intersect(names(spp_tbl), names(trees)) # shared columns are the join keys
## [1] "TSN" "ScientificName"
# left join species to trees, because don't want to include species not found in tree data
trees_spp <- left_join(trees,
spp_tbl |> select(TSN, ScientificName, CommonName),
by = c("TSN", "ScientificName"))
head(trees_spp)
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN ScientificName
## 1 MIMA 12 6/16/2025 FALSE 2025 13 183385 Pinus strobus
## 2 MIMA 12 6/16/2025 FALSE 2025 12 28728 Acer rubrum
## 3 MIMA 12 6/16/2025 FALSE 2025 11 28728 Acer rubrum
## 4 MIMA 12 6/16/2025 FALSE 2025 2 28728 Acer rubrum
## 5 MIMA 12 6/16/2025 FALSE 2025 10 28728 Acer rubrum
## 6 MIMA 12 6/16/2025 FALSE 2025 7 28728 Acer rubrum
## DBHcm TreeStatusCode CrownClassCode DecayClassCode CommonName
## 1 24.9 AS 5 <NA> eastern white pine
## 2 10.9 AB 5 <NA> red maple
## 3 18.8 AS 3 <NA> red maple
## 4 51.2 AS 3 <NA> red maple
## 5 38.2 AS 3 <NA> red maple
## 6 22.5 AS 4 <NA> red maple
## [1] "TSN" "ScientificName"
# anti join of trees against species table, selecting only columns of interest
anti_join(trees, spp_tbl, by = c("TSN", "ScientificName")) |>
select(ParkUnit, PlotCode, SampleYear, ScientificName)
## ParkUnit PlotCode SampleYear ScientificName
## 1 MIMA 16 2025 Quercus robur
There are a number of other, more advanced joins out there, the rolling join being one of them. For more information on all possible joins, refer to Chapter 19 in R for Data Science.
Rolling joins can come in handy if the key values in your two datasets don't perfectly match and you want to join on the closest match. An example of where I've used rolling joins is to relate the timing of high tide to the nearest water temperature measurement from a HOBO logger. You can allow the nearest match in both directions or specify the direction (e.g., >= or <=).
Unfortunately, dplyr's rolling join doesn't perform the way I've needed it to. It only matches in one direction, like the closest temperature measurement after high tide, or the closest temperature measurement before high tide. If you need a rolling join that matches in both directions, the data.table package is your best bet. It requires learning a new syntax and coding approach, so I'm not covering it here. But it's helpful to know that if you're working with huge datasets, data.table tends to perform much faster than dplyr and may have more features for joining and summarizing your data.
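To illustrate the one-directional behavior described above, here's a sketch of dplyr's closest-match join using join_by(closest()) (requires dplyr >= 1.1; the tide and logger times are made up):

```r
library(dplyr)

tides <- data.frame(
  tide_time = as.POSIXct(c("2026-03-12 03:00", "2026-03-12 15:20"), tz = "UTC"))
temps <- data.frame(
  temp_time = as.POSIXct(c("2026-03-12 02:45", "2026-03-12 15:00",
                           "2026-03-12 15:30"), tz = "UTC"),
  temp_C = c(4.1, 5.3, 5.6))

# closest temperature at or before each high tide (one direction only)
left_join(tides, temps, by = join_by(closest(tide_time >= temp_time)))
```

Note that closest() only looks in the direction of the inequality: the 15:20 tide matches the 15:00 reading even though 15:30 is nearer in absolute time.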
Dates, times, and date-times are all special types of data in R. When you read in a dataset that has any of these, they typically read in as character. You then have to convert them to a date/time class to do anything meaningful with them. The first place to start is knowing the codes R uses to define year, month, day, hours, minutes, and seconds. For the full list, check out the help for strptime by running ?strptime. The codes below are the ones you're most likely to come across, either to define a date/time format, or to return a specific format (like day of the week, month written in full, Julian day, etc.).
| Code | Definition |
|---|---|
| %a | Abbreviated weekday name in the current locale on this platform. |
| %A | Full weekday name in the current locale. |
| %b | Abbreviated month name in the current locale on this platform. Case-insensitive on input. |
| %B | Full month name in the current locale. Case-insensitive on input. |
| %d | Day of the month as decimal number (01-31). |
| %H | Hours as decimal number (00-23). As a special exception, strings such as "24:00:00" are accepted for input. |
| %I | Hours as decimal number (01-12). |
| %j | Day of year (Julian) as decimal number (001-366): For input, 366 is only valid in a leap year. |
| %m | Month as decimal number (01-12). |
| %M | Minute as decimal number (00-59). |
| %p | AM/PM indicator in the locale. Used in conjunction with %I and not with %H. For input the match is case-insensitive. |
| %S | Second as decimal number (00-61), allowing for up to two leap seconds. |
| %u | Weekday as a decimal number (1-7, Monday is 1). |
| %y | Year without century (00-99). |
| %Y | Year with century. |
Look at current time and date output.
Sys.time()
## [1] "2026-03-30 13:28:42 EDT"
class(Sys.time())
## [1] "POSIXct" "POSIXt"
Sys.Date()
## [1] "2026-03-30"
class(Sys.Date())
## [1] "Date"
For date-only columns, you convert to the Date type. A few different ways of defining dates are below, based on the format of the input date. The format argument must match the input exactly: if there are dashes (-) between day, month, and year, or slashes (/), you need to specify the right symbol. If the output returns NA instead of a Date, either the format you specified was wrong, or the column you're trying to convert has more than one format represented.
Example formatting for dates
# date with slashes and full year
date_chr1 <- "3/12/2026"
date1 <- as.Date(date_chr1, format = "%m/%d/%Y")
str(date1)
# date with dashes and 2-digit year
date_chr2 <- "3-12-26"
date2 <- as.Date(date_chr2, format = "%m-%d-%y")
str(date2)
# date written out
date_chr3 <- "March 12, 2026"
date3 <- as.Date(date_chr3, format = "%b %d, %Y")
str(date3)
## Date[1:1], format: "2026-03-12"
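As noted above, a mismatched format fails silently with NA rather than an error, so it's worth checking the result:

```r
# wrong separator in the format string: returns NA, not an error
as.Date("3/12/2026", format = "%m-%d-%Y")

# matching format parses correctly
as.Date("3/12/2026", format = "%m/%d/%Y")
```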
Extract information about dates
as.numeric(format(date1, "%j")) # Julian day of year
## [1] 71
format(date1, "%A") # full weekday name
## [1] "Thursday"
format(date1, "%a") # abbreviated weekday name
## [1] "Thu"
format(date1, "%B %d, %Y") # full month name
## [1] "March 12, 2026"
format(date1, "%b %d, %Y") # abbreviated month name
## [1] "Mar 12, 2026"
Do math with dates
date1 + 1 # add one day
## [1] "2026-03-13"
date1 + 7 # add one week
## [1] "2026-03-19"
Create a vector of evenly spaced dates.
This can be helpful for setting up axis labels where one axis is dates.
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
# by 15 days
seq.Date(date_list[1], date_list[2], by = "15 days")## [1] "2026-01-01" "2026-01-16" "2026-01-31" "2026-02-15" "2026-03-02"
## [6] "2026-03-17" "2026-04-01" "2026-04-16" "2026-05-01" "2026-05-16"
## [11] "2026-05-31" "2026-06-15" "2026-06-30" "2026-07-15" "2026-07-30"
## [16] "2026-08-14" "2026-08-29" "2026-09-13" "2026-09-28" "2026-10-13"
## [21] "2026-10-28" "2026-11-12" "2026-11-27" "2026-12-12" "2026-12-27"
## [1] "2026-01-01" "2026-02-01" "2026-03-01" "2026-04-01" "2026-05-01"
## [6] "2026-06-01" "2026-07-01" "2026-08-01" "2026-09-01" "2026-10-01"
## [11] "2026-11-01" "2026-12-01"
## [1] "2026-01-01" "2026-07-01"
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
seq.Date(date_list[1], date_list[2], by = "1 week")## [1] "2026-01-01" "2026-01-08" "2026-01-15" "2026-01-22" "2026-01-29"
## [6] "2026-02-05" "2026-02-12" "2026-02-19" "2026-02-26" "2026-03-05"
## [11] "2026-03-12" "2026-03-19" "2026-03-26" "2026-04-02" "2026-04-09"
## [16] "2026-04-16" "2026-04-23" "2026-04-30" "2026-05-07" "2026-05-14"
## [21] "2026-05-21" "2026-05-28" "2026-06-04" "2026-06-11" "2026-06-18"
## [26] "2026-06-25" "2026-07-02" "2026-07-09" "2026-07-16" "2026-07-23"
## [31] "2026-07-30" "2026-08-06" "2026-08-13" "2026-08-20" "2026-08-27"
## [36] "2026-09-03" "2026-09-10" "2026-09-17" "2026-09-24" "2026-10-01"
## [41] "2026-10-08" "2026-10-15" "2026-10-22" "2026-10-29" "2026-11-05"
## [46] "2026-11-12" "2026-11-19" "2026-11-26" "2026-12-03" "2026-12-10"
## [51] "2026-12-17" "2026-12-24" "2026-12-31"
If your dataset is huge, the lighter-weight POSIXct class may be best, because it stores a date-time as a single number (seconds since 1970-01-01) rather than a list of components like POSIXlt. Outside of that, whichever you choose may not matter much in your workflow. We'll use POSIXct for our examples.
Look under the hood of the info stored by the two POSIX types
unclass(as.POSIXct("2026-03-12 01:30:00", tz = "America/New_York"))
## [1] 1773293400
## attr(,"tzone")
## [1] "America/New_York"
unclass(as.POSIXlt("2026-03-12 01:30:00", tz = "America/New_York"))
## $sec
## [1] 0
##
## $min
## [1] 30
##
## $hour
## [1] 1
##
## $mday
## [1] 12
##
## $mon
## [1] 2
##
## $year
## [1] 126
##
## $wday
## [1] 4
##
## $yday
## [1] 70
##
## $isdst
## [1] 1
##
## $zone
## [1] "EDT"
##
## $gmtoff
## [1] NA
##
## attr(,"tzone")
## [1] "America/New_York"
## attr(,"balanced")
## [1] TRUE
Note the use of the timezone (tz) argument in the code above; here I specified the Eastern timezone. There are two handy ways to check timezones in R.
Check the timezone of your computer
Sys.timezone()
## [1] "America/New_York"
Check the timezones built into base R
OlsonNames()
## [1] "Africa/Abidjan" "Africa/Accra"
## [3] "Africa/Addis_Ababa" "Africa/Algiers"
## [5] "Africa/Asmara" "Africa/Asmera"
## [7] "Africa/Bamako" "Africa/Bangui"
## [9] "Africa/Banjul" "Africa/Bissau"
## [11] "Africa/Blantyre" "Africa/Brazzaville"
## [13] "Africa/Bujumbura" "Africa/Cairo"
## [15] "Africa/Casablanca" "Africa/Ceuta"
## [17] "Africa/Conakry" "Africa/Dakar"
## [19] "Africa/Dar_es_Salaam" "Africa/Djibouti"
## [21] "Africa/Douala" "Africa/El_Aaiun"
## [23] "Africa/Freetown" "Africa/Gaborone"
## [25] "Africa/Harare" "Africa/Johannesburg"
## [27] "Africa/Juba" "Africa/Kampala"
## [29] "Africa/Khartoum" "Africa/Kigali"
## [31] "Africa/Kinshasa" "Africa/Lagos"
## [33] "Africa/Libreville" "Africa/Lome"
## [35] "Africa/Luanda" "Africa/Lubumbashi"
## [37] "Africa/Lusaka" "Africa/Malabo"
## [39] "Africa/Maputo" "Africa/Maseru"
## [41] "Africa/Mbabane" "Africa/Mogadishu"
## [43] "Africa/Monrovia" "Africa/Nairobi"
## [45] "Africa/Ndjamena" "Africa/Niamey"
## [47] "Africa/Nouakchott" "Africa/Ouagadougou"
## [49] "Africa/Porto-Novo" "Africa/Sao_Tome"
## [51] "Africa/Timbuktu" "Africa/Tripoli"
## [53] "Africa/Tunis" "Africa/Windhoek"
## [55] "America/Adak" "America/Anchorage"
## [57] "America/Anguilla" "America/Antigua"
## [59] "America/Araguaina" "America/Argentina/Buenos_Aires"
## [61] "America/Argentina/Catamarca" "America/Argentina/ComodRivadavia"
## [63] "America/Argentina/Cordoba" "America/Argentina/Jujuy"
## [65] "America/Argentina/La_Rioja" "America/Argentina/Mendoza"
## [67] "America/Argentina/Rio_Gallegos" "America/Argentina/Salta"
## [69] "America/Argentina/San_Juan" "America/Argentina/San_Luis"
## [71] "America/Argentina/Tucuman" "America/Argentina/Ushuaia"
## [73] "America/Aruba" "America/Asuncion"
## [75] "America/Atikokan" "America/Atka"
## [77] "America/Bahia" "America/Bahia_Banderas"
## [79] "America/Barbados" "America/Belem"
## [81] "America/Belize" "America/Blanc-Sablon"
## [83] "America/Boa_Vista" "America/Bogota"
## [85] "America/Boise" "America/Buenos_Aires"
## [87] "America/Cambridge_Bay" "America/Campo_Grande"
## [89] "America/Cancun" "America/Caracas"
## [91] "America/Catamarca" "America/Cayenne"
## [93] "America/Cayman" "America/Chicago"
## [95] "America/Chihuahua" "America/Ciudad_Juarez"
## [97] "America/Coral_Harbour" "America/Cordoba"
## [99] "America/Costa_Rica" "America/Creston"
## [101] "America/Cuiaba" "America/Curacao"
## [103] "America/Danmarkshavn" "America/Dawson"
## [105] "America/Dawson_Creek" "America/Denver"
## [107] "America/Detroit" "America/Dominica"
## [109] "America/Edmonton" "America/Eirunepe"
## [111] "America/El_Salvador" "America/Ensenada"
## [113] "America/Fort_Nelson" "America/Fort_Wayne"
## [115] "America/Fortaleza" "America/Glace_Bay"
## [117] "America/Godthab" "America/Goose_Bay"
## [119] "America/Grand_Turk" "America/Grenada"
## [121] "America/Guadeloupe" "America/Guatemala"
## [123] "America/Guayaquil" "America/Guyana"
## [125] "America/Halifax" "America/Havana"
## [127] "America/Hermosillo" "America/Indiana/Indianapolis"
## [129] "America/Indiana/Knox" "America/Indiana/Marengo"
## [131] "America/Indiana/Petersburg" "America/Indiana/Tell_City"
## [133] "America/Indiana/Vevay" "America/Indiana/Vincennes"
## [135] "America/Indiana/Winamac" "America/Indianapolis"
## [137] "America/Inuvik" "America/Iqaluit"
## [139] "America/Jamaica" "America/Jujuy"
## [141] "America/Juneau" "America/Kentucky/Louisville"
## [143] "America/Kentucky/Monticello" "America/Knox_IN"
## [145] "America/Kralendijk" "America/La_Paz"
## [147] "America/Lima" "America/Los_Angeles"
## [149] "America/Louisville" "America/Lower_Princes"
## [151] "America/Maceio" "America/Managua"
## [153] "America/Manaus" "America/Marigot"
## [155] "America/Martinique" "America/Matamoros"
## [157] "America/Mazatlan" "America/Mendoza"
## [159] "America/Menominee" "America/Merida"
## [161] "America/Metlakatla" "America/Mexico_City"
## [163] "America/Miquelon" "America/Moncton"
## [165] "America/Monterrey" "America/Montevideo"
## [167] "America/Montreal" "America/Montserrat"
## [169] "America/Nassau" "America/New_York"
## [171] "America/Nipigon" "America/Nome"
## [173] "America/Noronha" "America/North_Dakota/Beulah"
## [175] "America/North_Dakota/Center" "America/North_Dakota/New_Salem"
## [177] "America/Nuuk" "America/Ojinaga"
## [179] "America/Panama" "America/Pangnirtung"
## [181] "America/Paramaribo" "America/Phoenix"
## [183] "America/Port-au-Prince" "America/Port_of_Spain"
## [185] "America/Porto_Acre" "America/Porto_Velho"
## [187] "America/Puerto_Rico" "America/Punta_Arenas"
## [189] "America/Rainy_River" "America/Rankin_Inlet"
## [191] "America/Recife" "America/Regina"
## [193] "America/Resolute" "America/Rio_Branco"
## [195] "America/Rosario" "America/Santa_Isabel"
## [197] "America/Santarem" "America/Santiago"
## [199] "America/Santo_Domingo" "America/Sao_Paulo"
## [201] "America/Scoresbysund" "America/Shiprock"
## [203] "America/Sitka" "America/St_Barthelemy"
## [205] "America/St_Johns" "America/St_Kitts"
## [207] "America/St_Lucia" "America/St_Thomas"
## [209] "America/St_Vincent" "America/Swift_Current"
## [211] "America/Tegucigalpa" "America/Thule"
## [213] "America/Thunder_Bay" "America/Tijuana"
## [215] "America/Toronto" "America/Tortola"
## [217] "America/Vancouver" "America/Virgin"
## [219] "America/Whitehorse" "America/Winnipeg"
## [221] "America/Yakutat" "America/Yellowknife"
## [223] "Antarctica/Casey" "Antarctica/Davis"
## [225] "Antarctica/DumontDUrville" "Antarctica/Macquarie"
## [227] "Antarctica/Mawson" "Antarctica/McMurdo"
## [229] "Antarctica/Palmer" "Antarctica/Rothera"
## [231] "Antarctica/South_Pole" "Antarctica/Syowa"
## [233] "Antarctica/Troll" "Antarctica/Vostok"
## [235] "Arctic/Longyearbyen" "Asia/Aden"
## [237] "Asia/Almaty" "Asia/Amman"
## [239] "Asia/Anadyr" "Asia/Aqtau"
## [241] "Asia/Aqtobe" "Asia/Ashgabat"
## [243] "Asia/Ashkhabad" "Asia/Atyrau"
## [245] "Asia/Baghdad" "Asia/Bahrain"
## [247] "Asia/Baku" "Asia/Bangkok"
## [249] "Asia/Barnaul" "Asia/Beirut"
## [251] "Asia/Bishkek" "Asia/Brunei"
## [253] "Asia/Calcutta" "Asia/Chita"
## [255] "Asia/Choibalsan" "Asia/Chongqing"
## [257] "Asia/Chungking" "Asia/Colombo"
## [259] "Asia/Dacca" "Asia/Damascus"
## [261] "Asia/Dhaka" "Asia/Dili"
## [263] "Asia/Dubai" "Asia/Dushanbe"
## [265] "Asia/Famagusta" "Asia/Gaza"
## [267] "Asia/Harbin" "Asia/Hebron"
## [269] "Asia/Ho_Chi_Minh" "Asia/Hong_Kong"
## [271] "Asia/Hovd" "Asia/Irkutsk"
## [273] "Asia/Istanbul" "Asia/Jakarta"
## [275] "Asia/Jayapura" "Asia/Jerusalem"
## [277] "Asia/Kabul" "Asia/Kamchatka"
## [279] "Asia/Karachi" "Asia/Kashgar"
## [281] "Asia/Kathmandu" "Asia/Katmandu"
## [283] "Asia/Khandyga" "Asia/Kolkata"
## [285] "Asia/Krasnoyarsk" "Asia/Kuala_Lumpur"
## [287] "Asia/Kuching" "Asia/Kuwait"
## [289] "Asia/Macao" "Asia/Macau"
## [291] "Asia/Magadan" "Asia/Makassar"
## [293] "Asia/Manila" "Asia/Muscat"
## [295] "Asia/Nicosia" "Asia/Novokuznetsk"
## [297] "Asia/Novosibirsk" "Asia/Omsk"
## [299] "Asia/Oral" "Asia/Phnom_Penh"
## [301] "Asia/Pontianak" "Asia/Pyongyang"
## [303] "Asia/Qatar" "Asia/Qostanay"
## [305] "Asia/Qyzylorda" "Asia/Rangoon"
## [307] "Asia/Riyadh" "Asia/Saigon"
## [309] "Asia/Sakhalin" "Asia/Samarkand"
## [311] "Asia/Seoul" "Asia/Shanghai"
## [313] "Asia/Singapore" "Asia/Srednekolymsk"
## [315] "Asia/Taipei" "Asia/Tashkent"
## [317] "Asia/Tbilisi" "Asia/Tehran"
## [319] "Asia/Tel_Aviv" "Asia/Thimbu"
## [321] "Asia/Thimphu" "Asia/Tokyo"
## [323] "Asia/Tomsk" "Asia/Ujung_Pandang"
## [325] "Asia/Ulaanbaatar" "Asia/Ulan_Bator"
## [327] "Asia/Urumqi" "Asia/Ust-Nera"
## [329] "Asia/Vientiane" "Asia/Vladivostok"
## [331] "Asia/Yakutsk" "Asia/Yangon"
## [333] "Asia/Yekaterinburg" "Asia/Yerevan"
## [335] "Atlantic/Azores" "Atlantic/Bermuda"
## [337] "Atlantic/Canary" "Atlantic/Cape_Verde"
## [339] "Atlantic/Faeroe" "Atlantic/Faroe"
## [341] "Atlantic/Jan_Mayen" "Atlantic/Madeira"
## [343] "Atlantic/Reykjavik" "Atlantic/South_Georgia"
## [345] "Atlantic/St_Helena" "Atlantic/Stanley"
## [347] "Australia/ACT" "Australia/Adelaide"
## [349] "Australia/Brisbane" "Australia/Broken_Hill"
## [351] "Australia/Canberra" "Australia/Currie"
## [353] "Australia/Darwin" "Australia/Eucla"
## [355] "Australia/Hobart" "Australia/LHI"
## [357] "Australia/Lindeman" "Australia/Lord_Howe"
## [359] "Australia/Melbourne" "Australia/North"
## [361] "Australia/NSW" "Australia/Perth"
## [363] "Australia/Queensland" "Australia/South"
## [365] "Australia/Sydney" "Australia/Tasmania"
## [367] "Australia/Victoria" "Australia/West"
## [369] "Australia/Yancowinna" "Brazil/Acre"
## [371] "Brazil/DeNoronha" "Brazil/East"
## [373] "Brazil/West" "Canada/Atlantic"
## [375] "Canada/Central" "Canada/Eastern"
## [377] "Canada/Mountain" "Canada/Newfoundland"
## [379] "Canada/Pacific" "Canada/Saskatchewan"
## [381] "Canada/Yukon" "CET"
## [383] "Chile/Continental" "Chile/EasterIsland"
## [385] "CST6CDT" "Cuba"
## [387] "EET" "Egypt"
## [389] "Eire" "EST"
## [391] "EST5EDT" "Etc/GMT"
## [393] "Etc/GMT-0" "Etc/GMT-1"
## [395] "Etc/GMT-10" "Etc/GMT-11"
## [397] "Etc/GMT-12" "Etc/GMT-13"
## [399] "Etc/GMT-14" "Etc/GMT-2"
## [401] "Etc/GMT-3" "Etc/GMT-4"
## [403] "Etc/GMT-5" "Etc/GMT-6"
## [405] "Etc/GMT-7" "Etc/GMT-8"
## [407] "Etc/GMT-9" "Etc/GMT+0"
## [409] "Etc/GMT+1" "Etc/GMT+10"
## [411] "Etc/GMT+11" "Etc/GMT+12"
## [413] "Etc/GMT+2" "Etc/GMT+3"
## [415] "Etc/GMT+4" "Etc/GMT+5"
## [417] "Etc/GMT+6" "Etc/GMT+7"
## [419] "Etc/GMT+8" "Etc/GMT+9"
## [421] "Etc/GMT0" "Etc/Greenwich"
## [423] "Etc/UCT" "Etc/Universal"
## [425] "Etc/UTC" "Etc/Zulu"
## [427] "Europe/Amsterdam" "Europe/Andorra"
## [429] "Europe/Astrakhan" "Europe/Athens"
## [431] "Europe/Belfast" "Europe/Belgrade"
## [433] "Europe/Berlin" "Europe/Bratislava"
## [435] "Europe/Brussels" "Europe/Bucharest"
## [437] "Europe/Budapest" "Europe/Busingen"
## [439] "Europe/Chisinau" "Europe/Copenhagen"
## [441] "Europe/Dublin" "Europe/Gibraltar"
## [443] "Europe/Guernsey" "Europe/Helsinki"
## [445] "Europe/Isle_of_Man" "Europe/Istanbul"
## [447] "Europe/Jersey" "Europe/Kaliningrad"
## [449] "Europe/Kiev" "Europe/Kirov"
## [451] "Europe/Kyiv" "Europe/Lisbon"
## [453] "Europe/Ljubljana" "Europe/London"
## [455] "Europe/Luxembourg" "Europe/Madrid"
## [457] "Europe/Malta" "Europe/Mariehamn"
## [459] "Europe/Minsk" "Europe/Monaco"
## [461] "Europe/Moscow" "Europe/Nicosia"
## [463] "Europe/Oslo" "Europe/Paris"
## [465] "Europe/Podgorica" "Europe/Prague"
## [467] "Europe/Riga" "Europe/Rome"
## [469] "Europe/Samara" "Europe/San_Marino"
## [471] "Europe/Sarajevo" "Europe/Saratov"
## [473] "Europe/Simferopol" "Europe/Skopje"
## [475] "Europe/Sofia" "Europe/Stockholm"
## [477] "Europe/Tallinn" "Europe/Tirane"
## [479] "Europe/Tiraspol" "Europe/Ulyanovsk"
## [481] "Europe/Uzhgorod" "Europe/Vaduz"
## [483] "Europe/Vatican" "Europe/Vienna"
## [485] "Europe/Vilnius" "Europe/Volgograd"
## [487] "Europe/Warsaw" "Europe/Zagreb"
## [489] "Europe/Zaporozhye" "Europe/Zurich"
## [491] "GB" "GB-Eire"
## [493] "GMT" "GMT-0"
## [495] "GMT+0" "GMT0"
## [497] "Greenwich" "Hongkong"
## [499] "HST" "Iceland"
## [501] "Indian/Antananarivo" "Indian/Chagos"
## [503] "Indian/Christmas" "Indian/Cocos"
## [505] "Indian/Comoro" "Indian/Kerguelen"
## [507] "Indian/Mahe" "Indian/Maldives"
## [509] "Indian/Mauritius" "Indian/Mayotte"
## [511] "Indian/Reunion" "Iran"
## [513] "Israel" "Jamaica"
## [515] "Japan" "Kwajalein"
## [517] "Libya" "MET"
## [519] "Mexico/BajaNorte" "Mexico/BajaSur"
## [521] "Mexico/General" "MST"
## [523] "MST7MDT" "Navajo"
## [525] "NZ" "NZ-CHAT"
## [527] "Pacific/Apia" "Pacific/Auckland"
## [529] "Pacific/Bougainville" "Pacific/Chatham"
## [531] "Pacific/Chuuk" "Pacific/Easter"
## [533] "Pacific/Efate" "Pacific/Enderbury"
## [535] "Pacific/Fakaofo" "Pacific/Fiji"
## [537] "Pacific/Funafuti" "Pacific/Galapagos"
## [539] "Pacific/Gambier" "Pacific/Guadalcanal"
## [541] "Pacific/Guam" "Pacific/Honolulu"
## [543] "Pacific/Johnston" "Pacific/Kanton"
## [545] "Pacific/Kiritimati" "Pacific/Kosrae"
## [547] "Pacific/Kwajalein" "Pacific/Majuro"
## [549] "Pacific/Marquesas" "Pacific/Midway"
## [551] "Pacific/Nauru" "Pacific/Niue"
## [553] "Pacific/Norfolk" "Pacific/Noumea"
## [555] "Pacific/Pago_Pago" "Pacific/Palau"
## [557] "Pacific/Pitcairn" "Pacific/Pohnpei"
## [559] "Pacific/Ponape" "Pacific/Port_Moresby"
## [561] "Pacific/Rarotonga" "Pacific/Saipan"
## [563] "Pacific/Samoa" "Pacific/Tahiti"
## [565] "Pacific/Tarawa" "Pacific/Tongatapu"
## [567] "Pacific/Truk" "Pacific/Wake"
## [569] "Pacific/Wallis" "Pacific/Yap"
## [571] "Poland" "Portugal"
## [573] "PRC" "PST8PDT"
## [575] "ROC" "ROK"
## [577] "Singapore" "Turkey"
## [579] "UCT" "Universal"
## [581] "US/Alaska" "US/Aleutian"
## [583] "US/Arizona" "US/Central"
## [585] "US/East-Indiana" "US/Eastern"
## [587] "US/Hawaii" "US/Indiana-Starke"
## [589] "US/Michigan" "US/Mountain"
## [591] "US/Pacific" "US/Samoa"
## [593] "UTC" "W-SU"
## [595] "WET" "Zulu"
## attr(,"Version")
## [1] "2025a"
If you understand how to set up a Date type in R, setting up date-times isn't much different; it just takes a bit more attention to get the format right. To demonstrate, we'll read in HOBO temperature data and set the timestamp column as a POSIXct date-time. HOBO data usually need a bit of cleaning beyond converting the timestamp, so I'll show the whole process below.
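Before the full HOBO workflow, here's the core conversion in isolation, a minimal sketch using a made-up timestamp string in the same M/D/YYYY H:MM format:

```r
# Parse a character timestamp into POSIXct; the format string must
# describe the text exactly, piece by piece
dt <- as.POSIXct("7/18/2021 10:26",
                 format = "%m/%d/%Y %H:%M",
                 tz = "America/New_York")
dt
# → "2021-07-18 10:26:00 EDT"
class(dt)
# → "POSIXct" "POSIXt"
```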
Read in temperature data and look at it
temp_data1 <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv")
head(temp_data1)

## Plot.Title.HOBO_temp_example.csv X
## 1 # Date Time, GMT-05:00
## 2 1 7/18/2021 10:26
## 3 2 7/18/2021 11:26
## 4 3 7/18/2021 12:26
## 5 4 7/18/2021 13:26
## 6 5 7/18/2021 14:26
## X.1
## 1 Temp, °F (LGR S/N: 20672839, SEN S/N: 20672839)
## 2 58.842
## 3 58.712
## 4 58.109
## 5 56.208
## 6 56.208
## X.2 X.3
## 1 Coupler Detached (LGR S/N: 20672839) Coupler Attached (LGR S/N: 20672839)
## 2 Logged
## 3
## 4
## 5
## 6
## X.4 X.5
## 1 Stopped (LGR S/N: 20672839) End Of File (LGR S/N: 20672839)
## 2
## 3
## 4
## 5
## 6
Note the extra row on top showing the file name. HOBO data often has some metadata in the first row. The next code chunk imports a cleaner version of the data by skipping the first row, only pulling in the first 3 columns (we don’t care about the columns that report Logged), and cleaning up the column names.
Clean up non-date HOBO data
temp_data <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "date_time", "tempF")

| index | date_time | tempF |
|---|---|---|
| 1 | 7/18/2021 10:26 | 58.842 |
| 2 | 7/18/2021 11:26 | 58.712 |
| 3 | 7/18/2021 12:26 | 58.109 |
| 4 | 7/18/2021 13:26 | 56.208 |
| 5 | 7/18/2021 14:26 | 56.208 |
| 6 | 7/18/2021 15:26 | 55.342 |
| 7 | 7/18/2021 16:26 | 55.602 |
| 8 | 7/18/2021 17:26 | 55.949 |
| 9 | 7/18/2021 18:26 | 55.602 |
| 10 | 7/18/2021 19:26 | 55.733 |
| 11 | 7/18/2021 20:26 | 55.819 |
| 12 | 7/18/2021 21:26 | 55.776 |
| 13 | 7/18/2021 22:26 | 56.469 |
| 14 | 7/18/2021 23:26 | 56.642 |
| 15 | 7/19/2021 0:26 | 56.556 |
| 16 | 7/19/2021 1:26 | 55.863 |
| 17 | 7/19/2021 2:26 | 55.819 |
| 18 | 7/19/2021 3:26 | 55.733 |
| 19 | 7/19/2021 4:26 | 55.733 |
| 20 | 7/19/2021 5:26 | 55.733 |
| 21 | 7/19/2021 6:26 | 55.949 |
| 22 | 7/19/2021 7:26 | 55.776 |
| 23 | 7/19/2021 8:26 | 56.035 |
| 24 | 7/19/2021 9:26 | 56.079 |
| 25 | 7/19/2021 10:26 | 56.901 |
| 26 | 7/19/2021 11:26 | 63.090 |
| 27 | 7/19/2021 12:26 | 63.732 |
| 28 | 7/19/2021 13:26 | 57.420 |
| 29 | 7/19/2021 14:26 | 56.685 |
| 30 | 7/19/2021 15:26 | 56.383 |
| 31 | 7/19/2021 16:26 | 56.469 |
| 32 | 7/19/2021 17:26 | 56.512 |
| 33 | 7/19/2021 18:26 | 56.815 |
| 34 | 7/19/2021 19:26 | 56.122 |
| 35 | 7/19/2021 20:26 | 57.074 |
| 36 | 7/19/2021 21:26 | 56.469 |
| 37 | 7/19/2021 22:26 | 56.122 |
| 38 | 7/19/2021 23:26 | 56.772 |
| 39 | 7/20/2021 0:26 | 57.979 |
| 40 | 7/20/2021 1:26 | 57.807 |
| 41 | 7/20/2021 2:26 | 56.469 |
| 42 | 7/20/2021 3:26 | 56.728 |
| 43 | 7/20/2021 4:26 | 56.295 |
| 44 | 7/20/2021 5:26 | 56.035 |
| 45 | 7/20/2021 6:26 | 56.079 |
| 46 | 7/20/2021 7:26 | 56.079 |
| 47 | 7/20/2021 8:26 | 56.165 |
| 48 | 7/20/2021 9:26 | 56.469 |
| 49 | 7/20/2021 10:26 | 57.031 |
| 50 | 7/20/2021 11:26 | 57.979 |
Convert date_time to POSIXct
We can see that the date is formatted as M/D/YYYY, followed by a space, then the time formatted as HH:MM, with hours on the 0-23 (24-hour) clock and minutes 00-59. There are no seconds.
temp_data$timestamp <- as.POSIXct(temp_data$date_time,
format = "%m/%d/%Y %H:%M",
tz = "America/New_York")
head(temp_data)

## index date_time tempF timestamp
## 1 1 7/18/2021 10:26 58.842 2021-07-18 10:26:00
## 2 2 7/18/2021 11:26 58.712 2021-07-18 11:26:00
## 3 3 7/18/2021 12:26 58.109 2021-07-18 12:26:00
## 4 4 7/18/2021 13:26 56.208 2021-07-18 13:26:00
## 5 5 7/18/2021 14:26 56.208 2021-07-18 14:26:00
## 6 6 7/18/2021 15:26 55.342 2021-07-18 15:26:00
Extract the YYYYMMDD date, month, Julian day, time, and hour from the timestamp. Note that %I returns the hour on a 12-hour clock (01-12), which is why 13:26 shows up as hour 1 below; use %H if you want the 24-hour clock instead.
temp_data$date <- format(temp_data$timestamp, "%Y%m%d")
temp_data$month <- format(temp_data$timestamp, "%b")
temp_data$time <- format(temp_data$timestamp, "%I:%M")
temp_data$hour <- as.numeric(format(temp_data$timestamp, "%I"))
head(temp_data)

## index date_time tempF timestamp date month time hour
## 1 1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 20210718 Jul 10:26 10
## 2 2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 20210718 Jul 11:26 11
## 3 3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 20210718 Jul 12:26 12
## 4 4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 20210718 Jul 01:26 1
## 5 5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 20210718 Jul 02:26 2
## 6 6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 20210718 Jul 03:26 3
# Add a numeric month column
temp_data$month_num <- as.numeric(format(temp_data$timestamp, "%m"))
head(temp_data)

## index date_time tempF timestamp date month time hour
## 1 1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 20210718 Jul 10:26 10
## 2 2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 20210718 Jul 11:26 11
## 3 3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 20210718 Jul 12:26 12
## 4 4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 20210718 Jul 01:26 1
## 5 5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 20210718 Jul 02:26 2
## 6 6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 20210718 Jul 03:26 3
## month_num
## 1 7
## 2 7
## 3 7
## 4 7
## 5 7
## 6 7
# Add a Julian day (day of year) column
temp_data$julian <- as.numeric(format(temp_data$timestamp, "%j"))
head(temp_data)

## index date_time tempF timestamp date month time hour
## 1 1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 20210718 Jul 10:26 10
## 2 2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 20210718 Jul 11:26 11
## 3 3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 20210718 Jul 12:26 12
## 4 4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 20210718 Jul 01:26 1
## 5 5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 20210718 Jul 02:26 2
## 6 6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 20210718 Jul 03:26 3
## month_num julian
## 1 7 199
## 2 7 199
## 3 7 199
## 4 7 199
## 5 7 199
## 6 7 199
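As a quick sanity check on the %j (Julian day) format code, day 199 of 2021 should round-trip back to July 18:

```r
# %j gives day of year; July 18, 2021 is day 199
as.numeric(format(as.Date("2021-07-18"), "%j"))
# → 199

# Converting back: day 1 is Jan 1, so add (199 - 1) days to the origin
as.Date(199 - 1, origin = "2021-01-01")
# → "2021-07-18"
```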
For this section, we're going to use NETN water quality data to customize ggplot objects. This is an abbreviated dataset from the NETN water package. The data contain surface lake measurements recorded with a YSI in a subset of lakes in Acadia NP. We'll filter the data to plot different combinations of parameters and sites.
library(dplyr)
library(ggplot2)
library(patchwork) # for arranging ggplot objects
library(RColorBrewer) # for palettes
library(viridis) # for palettes
chem <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_water_chemistry_data.csv")
str(chem)

## 'data.frame': 6088 obs. of 13 variables:
## $ SiteCode : chr "ACBUBL" "ACBUBL" "ACBUBL" "ACBUBL" ...
## $ SiteName : chr "Bubble Pond" "Bubble Pond" "Bubble Pond" "Bubble Pond" ...
## $ UnitCode : chr "ACAD" "ACAD" "ACAD" "ACAD" ...
## $ SubUnitCode : logi NA NA NA NA NA NA ...
## $ EventDate : chr "5/23/2006" "5/23/2006" "5/23/2006" "5/23/2006" ...
## $ SiteType : chr "Lake" "Lake" "Lake" "Lake" ...
## $ Project : chr "NETN_LS" "NETN_LS" "NETN_LS" "NETN_LS" ...
## $ QCtype : int 0 0 0 0 0 0 0 0 0 0 ...
## $ SampleDepth_m: num 0.995 0.995 0.995 0.995 0.995 0.995 0.503 0.503 0.503 0.503 ...
## $ Parameter : chr "DO_mgL" "DOsat_pct" "SpCond_uScm" "Temp_C" ...
## $ Value : num 10.4 99.2 29 13.4 56.1 ...
## $ ValueFlag : logi NA NA NA NA NA NA ...
## $ FlagComments : logi NA NA NA NA NA NA ...
To start with, we're going to plot temperature data for 8 lakes monitored in Acadia NP. Before we start plotting, we need to convert the EventDate column to a Date type and extract year, month, and day of year columns for easier plotting later on.
Add date columns to chem, then filter on sites and Temp_F.
chem <- chem |> mutate(date = as.Date(EventDate, "%m/%d/%Y"),
year = as.numeric(format(date, "%Y")),
mon = as.numeric(format(date, "%m")),
doy = as.numeric(format(date, "%j")))
ACAD_lakes <- c("ACBUBL", "ACEAGL", "ACECHO", "ACJORD",
"ACLONG", "ACSEAL", "ACUHAD", "ACWHOL")
lakes_temp <- chem |> filter(SiteCode %in% ACAD_lakes) |>
filter(Parameter %in% "Temp_F")
head(lakes_temp)

## SiteCode SiteName UnitCode SubUnitCode EventDate SiteType Project
## 1 ACBUBL Bubble Pond ACAD NA 5/23/2006 Lake NETN_LS
## 2 ACBUBL Bubble Pond ACAD NA 6/21/2006 Lake NETN_LS
## 3 ACBUBL Bubble Pond ACAD NA 7/20/2006 Lake NETN_LS
## 4 ACBUBL Bubble Pond ACAD NA 8/10/2006 Lake NETN_LS
## 5 ACBUBL Bubble Pond ACAD NA 9/26/2006 Lake NETN_LS
## 6 ACBUBL Bubble Pond ACAD NA 10/17/2006 Lake NETN+ACID
## QCtype SampleDepth_m Parameter Value ValueFlag FlagComments date year
## 1 0 0.9950 Temp_F 56.102 NA NA 2006-05-23 2006
## 2 0 0.5030 Temp_F 67.100 NA NA 2006-06-21 2006
## 3 0 1.5405 Temp_F 75.281 NA NA 2006-07-20 2006
## 4 0 0.5020 Temp_F 71.969 NA NA 2006-08-10 2006
## 5 0 1.0720 Temp_F 61.988 NA NA 2006-09-26 2006
## 6 0 0.9670 Temp_F 54.482 NA NA 2006-10-17 2006
## mon doy
## 1 5 143
## 2 6 172
## 3 7 201
## 4 8 222
## 5 9 269
## 6 10 290
Now that we have the data set up, we’re going to make a line and point time series plot of temperature for the different sites.
Make a generic plot with the black and white built in theme
Set color and symbol by SiteCode using default colors and shapes
ggplot(lakes_temp,
aes(x = date, y = Value, color = SiteCode, shape = SiteCode)) +
theme_bw() +
  geom_point()

## Warning: The shape palette can deal with a maximum of 6 discrete values because more
## than 6 becomes difficult to discriminate
## ℹ you have requested 8 values. Consider specifying shapes manually if you need
## that many of them.
## Warning: Removed 222 rows containing missing values or values outside the scale range
## (`geom_point()`).
Note the warning that we have 8 groups, but ggplot by default only provides 6 different shapes. The 222 rows removed correspond to the points that belonged to the 7th and 8th sites (ACUHAD and ACWHOL). To use 8 symbols, you have to specify them manually, which we'll do next.
In addition, the default colors in ggplot aren’t great. Whenever you see these colors used in publications, you kind of know the author either barely knows ggplot or is lazy. We’re going to start by specifying our own colors and shapes manually. Then we’ll use color palettes from different packages.
Before you start plotting, it's helpful to know the point symbol codes. To view them, run ?points, or search "pch in R plot" and you'll get the info below. Note that symbols 0-14 are outlines with no fill; to change their color, use the color aesthetic. Symbols 15-20 are solid and also use the color aesthetic. Symbols 21-25 have both a color (outline) and a fill (inside) aesthetic.
Figure of symbol codes in R.
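If the figure above isn't handy, you can generate the same chart yourself in base R; this minimal sketch plots all 26 symbol codes with their numbers underneath:

```r
# Plot pch symbols 0-25 in two rows with the code printed below each one.
# bg sets the fill, which only symbols 21-25 use.
x <- rep(1:13, 2)
y <- rep(c(2, 1), each = 13)
plot(x, y, pch = 0:25, cex = 2, bg = "grey70",
     xlim = c(0.5, 13.5), ylim = c(0.5, 2.5),
     xlab = "", ylab = "", axes = FALSE)
text(x, y - 0.3, labels = 0:25, cex = 0.8)
```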
Specify manual color and shape connected to SiteCode
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode,
shape = SiteCode)) +
theme_bw() +
geom_point() +
scale_color_manual(values = c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")) +
scale_fill_manual(values = c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")) +
scale_shape_manual(values = c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24))The code above gave us a different color/fill and shape for each point, but coding it was cumbersome. Specifying the colors like above gets tedious fast. Imagine needing to make a plot for a dozen different parameters. A more efficient approach is defining them outside of ggplot, and then referencing them in the plot, like the example below.
Specify manual color and shape more efficiently
site_cols <- c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")
site_shps <- c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24)
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point() +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
  scale_shape_manual(values = site_shps, name = "Lake")

Similar plot, but make the outline the same color for each point and increase point size
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point(color = "dimgrey", size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
  scale_shape_manual(values = site_shps, name = "Lake")

## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's colour values.
The warning appears because color is set both in aes() and in geom_point(), but it's not a problem. The ggplot2 package is pretty chatty in the console. It's good to read the warnings to make sure you didn't drop values you wanted to plot (e.g., ACWHOL dropped in the earlier example), but often they're not issues.
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point(color = "dimgrey", size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") # answerNow we’re going to play with adding lines to the graphs. First we’ll
add the geom_line() to see how it looks. Notice the order
has the line plotting before the point, so it doesn’t cross over the
points.
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_line() + # new line
geom_point(color = "dimgrey") +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") It’s not easy to follow the line and not really what we’re looking
for at this scale of the data (2006-2025). This is where we have to
think about what we’re actually interested in, and in this case, it’s
whether temperature is changing over time. This is where the
geom_smooth() is really helpful. The
geom_smooth() plots a line assuming the y ~ x formula
(unless you specify a different formula). By default the method is a
LOESS smoother, but you can specify a range of methods, including linear
regression by adding method = 'lm' to
geom_smooth().
Note that I turned off the standard error ribbon that plots by default using se = FALSE. It's too busy for this plot, and I also don't use the SE unless I've fit an actual model and checked the diagnostics. The stats under the hood of geom_smooth() are pretty black-box, and I don't always know if I can trust its calculation of the SE.
I added a transparency via alpha = 0.5 to the
geom_point(), so the lines show up better.
Add a LOESS smoother and make points more transparent
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = FALSE, span = 0.5, linewidth = 1) + # new line
geom_point(color = "dimgrey", alpha = 0.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") # Need to run these first
site_cols <- c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")
site_shps <- c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24)
# Now generate the plot
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) +
geom_point(color = "dimgrey", alpha = 0.85) + # or remove the alpha
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
geom_point(color = "dimgrey", alpha = 0.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") The plot looks okay so far, but there are so many points, it’s hard
to see what’s going on. This is where facets are useful. If your data
have grouping variables, in this case SiteCode, then you can plot
separate panels for each of the grouping levels. The code below plots
each site separately. I used ncol = 4 to set the number of
columns that result from the facet wrap.
Facet on SiteCode
p_site <-
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 4)
p_site

Facet on year instead of site
Faceting on year can also be a handy way to see how consistent seasonal patterns are across years. Note that we changed the x variable from date to doy (day of year) in the code below. I also filtered the dataset within the ggplot call to only include years after 2015, increased the point size, and switched to geom_line() instead of the smoother to just connect the points. We'll revisit this plot to code more meaningful x-axis labels in the next section.
p_year <-
ggplot(lakes_temp |> filter(year > 2015),
aes(x = doy, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_line(linewidth = 0.7) +
geom_point(color = "dimgrey", size = 2.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Year") +
facet_wrap(~year, ncol = 3)
p_year

CHALLENGE: Recreate the plot below. Note that the symbol outline is black. The alpha level is 0.6.
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "black", alpha = 0.6, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temp. (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 2) +
  theme(legend.position = 'bottom')

Faceting is helpful when your observations are all within the same column. But say you have data in multiple columns (e.g., each water quality parameter is a column) and you want to arrange those plots into a grid. Faceting won't help, because the data to plot are in different columns. Several packages make it easy to arrange multiple plots into a grid that looks similar to a faceted plot, including grid (and gridExtra), cowplot, ggpubr, and patchwork. We're going to use patchwork, a relative newcomer and one of the easiest I've found to code and customize. Here we're going to plot pH, temperature, DO, and conductance for Jordan Pond and arrange them using patchwork.
The patchwork package has a lot of options to customize plot layouts. See the patchwork package website for more information.
Prepare the data to plot
pH <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "pH")
temp <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "Temp_F")
dosat <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "DOsat_pct")
cond <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "SpCond_uScm")
p_pH <-
ggplot(pH, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "pH", x = "Year")
p_temp <-
ggplot(temp, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "Temp (F)", x = "Year")
p_do <-
ggplot(dosat, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "DO (%sat.)", x = "Year")
p_cond <-
ggplot(cond, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "Spec. Cond. (uScm)", x = "Year") Arrange plots using patchwork
This is almost too easy to be true, but it really is this easy with patchwork. The patchwork package includes a bunch of options to customize sizes, add annotations, and share axes across plots.
Arrange plots using patchwork in column of 4 and share x axis.
You can also collect the legend using a similar approach to collecting the axes.
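The patchwork calls for those steps aren't shown in this chunk; here's a minimal sketch of what they look like, assuming the p_pH, p_temp, p_do, and p_cond objects created above (axis collecting needs patchwork >= 1.2.0):

```r
library(patchwork)

# Stack the four plots in a single column (/ stacks; | puts side by side)
p_pH / p_temp / p_do / p_cond

# Same column, but collect the shared x axis and any legends
(p_pH / p_temp / p_do / p_cond) +
  plot_layout(axes = "collect", guides = "collect")
```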
Starting with the plot faceted on SiteCode, I don’t like that the
first year in the data (2006) is missing from the axis. The x-axis is a
Date type, which gives us some useful options to set breaks and labels.
Below, I set the breaks to be every 2 years and to label only years
(%Y). I added the theme() for the
axis.text.x to make the years plot vertically, and the
vjust = 0.5 centers the labels on the tick marks. Note that
I'm assigning the code below to the object p_site2, so I don't
have to keep typing the original ggplot code over and over.
Improve x-axis labels
p_site2 <- p_site +
scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p_site2
Returning to the p_year plot faceted by year: instead of
the day of year on the x-axis, we want to manually set up date axis
labels that show the month and day at the beginning of each month.
This is a bit trickier because doy isn't a Date type. It's
still doable, though there may be easier ways to do it.
Manually set up date labels for doy axis
# Set up the date range as a Date type
range_date <- as.Date(c("5/1/2025", "11/01/2025"), format = "%m/%d/%Y")
axis_dates <- seq.Date(range_date[1], range_date[2], by = "1 month")
axis_dates
## [1] "2025-05-01" "2025-06-01" "2025-07-01" "2025-08-01" "2025-09-01"
## [6] "2025-10-01" "2025-11-01"
axis_dates_label <- format(axis_dates, "%b-%d")
# Find the doy value that matches each of the axis dates
axis_doy <- as.numeric(format(axis_dates, "%j"))
axis_doy
## [1] 121 152 182 213 244 274 305
axis_doy_limits <- c(min(axis_doy)-1, max(axis_doy) + 1)
# Set the limits of the x axis as before and after the last sample,
# otherwise it cuts off May 1 on the axis
p_year +
scale_x_continuous(limits = axis_doy_limits,
breaks = axis_doy,
labels = axis_dates_label) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
One of the easier tasks to do with legends is to change their location.
In the next plot we'll put the legend on the bottom. If you don't
want to plot a legend (you don't really need one when lakes are faceted,
for example), you can turn the legend off using
legend.position = 'none'.
Move legend to bottom
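A minimal sketch, assuming a plot object with a legend like the p_site2 built above:

```r
# Move the legend below the plot area;
# legend.position = 'none' would remove it entirely
p_site2 + theme(legend.position = 'bottom')
```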
The plots above made legends seem easy, but legends in ggplot can be
really tedious. For example, the legend only shows up if you’re setting
grouping variables in the aes(). There was no legend in the
very first plot we made because everything was the same color and
symbol. If, for example, you want to plot thresholds on a plot, their
color needs to be added to the scale_color_manual() to show
in the legend. Let’s pretend, for example, that 50F and 75F are lower
and upper water quality thresholds that we want to plot.
Add horizontal threshold lines to the plot
p_site3 + geom_hline(yintercept = 75, linetype = 'dashed', linewidth = 1) +
geom_hline(yintercept = 50, linetype = 'dotted', linewidth = 1)
These lines are not showing in the legend. To make them show, you
need to wrap them in aes(), as below. Note the difference
in how linetype is specified: in the example above, it sets
whether the line is dashed or dotted. Inside the
aes(), linetype is being used to label the
line in the legend. We then have to use
scale_linetype_manual() to indicate the type of line to
plot. Admittedly, it took me a couple of Stack Overflow posts to figure
out how to make this work properly. This stuff can be tedious.
Add horizontal threshold lines to the plot and legend
p_site4 <- p_site3 +
geom_hline(aes(yintercept = 75, linetype = "Upper"), linewidth = 1) +
geom_hline(aes(yintercept = 50, linetype = "Lower"), linewidth = 1) +
scale_linetype_manual(values = c("dashed", "dotted"), name = "WQ Threshold")
p_site4
Another option is turning certain geometries off in the legend via
show.legend = F. Let's say we don't like the lines in the
legend. I have to go back to the full code to change the geom's legend
settings. I'm also overriding the alpha level of the symbols, so they
show up better in the legend.
Remove smoothed lines from legend and increase alpha of symbols in legend
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5, show.legend = FALSE) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 4) +
scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
legend.position = 'bottom') +
geom_hline(aes(yintercept = 75, linetype = "Upper"), linewidth = 1) +
geom_hline(aes(yintercept = 50, linetype = "Lower"), linewidth = 1) +
scale_linetype_manual(values = c("dashed", "dotted"), name = "WQ Threshold") +
guides(fill = guide_legend(override.aes = list(alpha = 1)))
Another option is to remove the SiteCode fill, color, and shape from
the legend via guides(). You can also do this by turning a
geom off using show.legend = F, like we did with the
smoother. In the code below we are telling ggplot that any shape, color,
or fill used in an aes() should not be included in the
legend. The WQ thresholds still show up because their aes is
linetype.
Finally, the width of the lines in the WQ thresholds makes it appear that the Lower threshold is solid instead of dashed. I’ll use an option in theme to make the key wider.
Remove SiteCode keys from legend and increase key width of lines.
p_site4 + guides(shape = 'none', color = 'none', fill = 'none') +
theme(legend.key.width = unit(0.8, "cm"))
The palettes in RColorBrewer can be viewed by running the code below.
The first group shows the sequential palettes (e.g. YlOrRd - Yellow
Orange Red). The second group shows the qualitative colors. The last
group shows the diverging palettes. The main drawback of these palettes
is they are limited by the number of levels in your data. So, if you
specify Set2 to color code different levels of a factor,
there are only 8 colors available to you. If your factor has more than 8
levels (e.g., 9 sites, 10 parks, etc.), then the levels beyond 8 won’t
get plotted and you’ll get a warning in the console similar to what we
saw for ggplot’s default number of symbols.
View RColorBrewer palettes
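The RColorBrewer package ships a helper for exactly this:

```r
library(RColorBrewer)
# Plots all palettes in one panel: sequential on top,
# then qualitative, then diverging
display.brewer.all()
```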
Going back to the temperature plots we made before, we'll use RColorBrewer to color code each site instead of doing this manually. We'll build the plot in the next chunk, then swap its color palettes in later plots.
Create basic plot
p_pal <- ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5, linewidth = 1) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 4) +
scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
geom_hline(aes(yintercept = 75, linetype = "Upper"), linewidth = 1) +
geom_hline(aes(yintercept = 50, linetype = "Lower"), linewidth = 1) +
scale_linetype_manual(values = c("dashed", "dotted"), name = "WQ Threshold")
Use Set2 palette on temperature plot and remove transparency of symbols in legend
p_pal + scale_color_brewer(name = "Lake", palette = "Set2", aesthetics = c("fill", "color")) +
guides(fill = guide_legend(override.aes = list(alpha = 1))) # solid symbols in legend
Note how I used the aesthetics argument in
scale_color_brewer() to set fill and color at the same
time. We could have done this in the code above too. I also changed the
symbols in the legend to not be transparent, so they're easier to see,
using override.aes in the guide.
Use Dark2 palette on temperature plot
p_pal + scale_color_brewer(name = "Lake", palette = "Dark2", aesthetics = c("fill", "color")) +
guides(fill = guide_legend(override.aes = list(alpha = 1))) # makes symbols not transparent in legend
The viridis package comes with 8 palettes. The benefit of viridis is the number of levels is not limited to 8 like RColorBrewer. The palette options are below for 12 levels.
View viridis palettes with hexcodes
You can view the hexcodes of the different palettes by running the
code below. Just change viridis() to one of the other
palette names to get the hexcodes for those levels.
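A sketch of that chunk:

```r
library(viridis)
# Hexcodes for 12 levels of the viridis palette; swap viridis()
# for magma(), plasma(), turbo(), etc. to see other palettes
viridis(12)
```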
Use viridis default palette on temperature plot
The scale_color_viridis_d() selects the viridis palette
option (purple, green, yellow) for discrete values (i.e. categories).
For a continuous scale (e.g. temperature), you would specify
scale_color_viridis_c().
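A sketch of the chunk, assuming the p_pal object built above:

```r
# Apply the default viridis discrete palette to both fill and color
p_pal + scale_color_viridis_d(name = "Lake", aesthetics = c("fill", "color")) +
  guides(fill = guide_legend(override.aes = list(alpha = 1))) # solid symbols in legend
```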
Use turbo palette on temperature plot
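A sketch of the turbo version, assuming the same p_pal object:

```r
# Same plot, but using the turbo viridis palette option
p_pal + scale_color_viridis_d(name = "Lake", option = "turbo",
                              aesthetics = c("fill", "color")) +
  guides(fill = guide_legend(override.aes = list(alpha = 1)))
```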
Heatmaps via geom_tile() are a place where viridis
palettes are especially helpful for producing sequential or diverging
color palettes. We’ll use the temperature data to plot heatmaps by month
for each site. Heatmaps are a bit different than other plots we’ve seen,
as the x and y values create a discrete grid, and the color in the cell
represents the value for that level of x and y. That means we have to
change how the x, y and color aesthetics are specified. Here we will
plot temperature by month and year faceted on site.
Basic heatmap code
Note the use of base R’s month.abb to set the labels on
the x-axis. The month.abb is a vector of the 12 months
abbreviated as 3 letters. By setting 5:10, I’m taking the months May -
Oct.
p_heat <-
ggplot(lakes_temp, aes(x = mon, y = year, color = Value, fill = Value)) +
theme_bw() +
geom_tile() +
labs(y = "Year", x = "Month") +
facet_wrap(~SiteCode, ncol = 4) +
scale_x_continuous(breaks = c(5, 6, 7, 8, 9, 10),
limits = c(4, 11),
labels = month.abb[5:10]) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Plot heatmap with viridis continuous palette
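A sketch of that chunk, assuming the p_heat object above:

```r
# Default viridis continuous scale, applied to both fill and color
p_heat + scale_color_viridis_c(name = "Temp. (F)",
                               aesthetics = c("fill", "color"))
```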
Plot heatmap with plasma continuous palette, reverse scale
p_heat + scale_color_viridis_c(name = "Temp. (F)", aesthetics = c("fill", "color"),
option = "plasma", direction = -1)
You can also create your own color ramp via
scale_color_gradient(), which creates a 2-color gradient,
scale_color_gradient2(), which creates a diverging color
gradient (low-mid-high), and a scale_color_gradientn(),
which creates an n-color gradient.
Create 2-color gradient
p_heat + scale_color_gradient(low = "#FCFC9A", high = "#F54927",
aesthetics = c("fill", 'color'),
name = "Temp. (F)")
Create diverging gradient
For the divergent palette to be meaningful, you usually need to set the midpoint if it’s not 0.
p_heat + scale_color_gradient2(low = "navy", mid = "#FCFC9A", high = "#F54927",
aesthetics = c("fill", 'color'),
midpoint = mean(lakes_temp$Value),
name = "Temp. (F)")
Create diverging gradient with multiple colors
Note the change in the legend by using guide = 'legend'.
Default is guide = 'colorbar'. I also customized the breaks
into 5-degree bins using the breaks argument and
seq().
p_heat + scale_color_gradientn(colors = c("#805A91", "#406AC2", "#FBFFAD", "#FFA34A", "#AB1F1F"),
aesthetics = c("fill", 'color'),
guide = "legend",
breaks = c(seq(40, 85, 5)),
name = "Temp. (F)")
Knowing how to code is only part of being a good coder. Below are general best practices to make code easier to run and understand, and more stable with relatively low maintenance cost. Many of these suggestions come from lessons learned working with my own and other people's code. R for Data Science also has a lot of great information on coding best practices.
# libraries
library(dplyr) # for mutate and filter
# parameters
analysis_year <- 2017
# data sets
df <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Filtering on RAM sites, create new site as a number column, and only include data from specified year
df2 <- df |> filter(Site_Type == "RAM") |>
filter(Year == analysis_year) |>
mutate(site_num = as.numeric(substr(Site_Name, 5, 6)))
Object names must start with a letter and can only contain letters, numbers, underscores, and periods. Spaces aren't allowed in object names, and are best avoided in column names of data frames too. Descriptive object names will help you digest code, and often you'll want more than one word in the name. There are multiple cases that people tend to use, the most common of which is snake_case. Other examples are below.
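For example (illustrative names only):

```r
# snake_case (most common in the tidyverse)
ave_tree_dbh <- 25
# camelCase
aveTreeDBH <- 25
# PascalCase
AveTreeDBH <- 25
# period.case (common in older base R code)
ave.tree.dbh <- 25
```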
Order words in names so that objects that are similar or derived from each other sort together. This also makes coding easier, as like objects will sort together in the autocomplete popups you see as you code.
# good word order
ACAD_wet <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_wet2 <- ACAD_wet |> filter(year > 2020)
ACAD_wet3 <- ACAD_wet2 |> mutate(plot_type = "RAM")
# bad word order
wet_ACAD <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_after_2020 <- wet_ACAD |> filter(year > 2020)
RAM_ACAD_2020 <- ACAD_after_2020 |> mutate(plot_type = "RAM")
It's helpful to balance descriptive names with length. The longer the object name, the more typing you have to do to refer to that object. Coding with long names, such as long column names in data frames, is cumbersome and inefficient. Compare the two objects below. While I doubt many would make super long object names like this, I commonly see excessively long column names in data packages. Limiting column names to 12 characters or fewer is very helpful for coders using those data.
# super long names
ACAD_wetland_sampling_data <- data.frame(years_plots_were_sampled = c(2020:2025), wetland_plots_sampled = c(1:6))
ACAD_wetland_sampling_data2 <- ACAD_wetland_sampling_data |> filter(years_plots_were_sampled > 2020)
# shorter still meaningful
ACAD_wet <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_wet2 <- ACAD_wet |> filter(year > 2020)
Code style refers to consistent use of case, indenting, spacing, line width, etc. There are several style conventions out there. I tend to use the tidyverse style guide, which is based on Google's R style guide.
Style conventions I follow include consistent spacing around operators like <-,
=, ==, |>, +,
etc.
Example 1. Style for pipes
# Good code
trees_final <- trees |>
mutate(DecayClassCode_num = as.numeric(DecayClassCode),
Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
Date = as.Date(SampleDate, format = "%m/%d/%Y")) |>
rename("Species" = "ScientificName") |>
filter(IsQAQC == FALSE) |>
select(-DecayClassCode) |>
arrange(Plot_Name, TagCode)
# Same code, but much harder to follow
trees_final <- trees|>mutate(DecayClassCode_num=as.numeric(DecayClassCode), Plot_Name=paste(ParkUnit,PlotCode,sep = "-"), Date=as.Date(SampleDate,format="%m/%d/%Y"))|> rename("Species"="ScientificName")|>filter(IsQAQC==FALSE)|>select(-DecayClassCode)|>arrange(Plot_Name,TagCode)
Example 2. Style for ggplot object
# Good code
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line() +
geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24) +
labs(x = "Year",
y = "Annual visitors in 1000's") +
scale_y_continuous(limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', color = 'dimgrey'),
title = element_text(size = 10)
)
# Same code but hard to follow
ggplot(data=visits,aes(x=Year,y=Annual_Visits/1000))+geom_line()+geom_point(color="black",fill="#82C2a3",size=2.5,shape=24) +
labs(x = "Year", y = "Annual visitors in 1000's")+
scale_y_continuous(limits=c(2000,4500),breaks=seq(2000,4500,by=500))+
scale_x_continuous(limits=c(1994,2024),breaks=c(seq(1994,2024,by=5)))+
theme(axis.text.x=element_text(size=10,angle=45,hjust=1), panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),panel.background=element_rect(fill='white',color='dimgrey'),
title = element_text(size = 10))
Using projects instead of standalone scripts helps keep the various pieces of an analysis project in one place and makes them more easily transferable across computers. Logical naming of scripts, so they sort easily, is also helpful.
Order and purpose of file names easy to follow
Hard to know script order and purpose
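For example (hypothetical file names):

```r
# Order and purpose easy to follow:
#   01_import_data.R
#   02_clean_data.R
#   03_summarize_data.R
#   04_plot_results.R

# Hard to know order and purpose:
#   analysis.R
#   misc_code.R
#   final_version2.R
```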
If you’re starting a new R session to answer these questions, you’ll need to read in the wetland and tree data frames again.
Read in example ACAD wetland data from url
ACAD_wetland <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_wetland_data_clean.csv"
)
Read in example NETN tree data from url
How would you look at the first 4 even rows (2, 4, 6, 8) and first 2 columns?
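One possible base R answer, assuming the ACAD_wetland data frame loaded above:

```r
# Even rows 2, 4, 6, 8 and the first 2 columns
ACAD_wetland[c(2, 4, 6, 8), 1:2]
# or generate the even rows with seq()
ACAD_wetland[seq(2, 8, by = 2), 1:2]
```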
How many unique species are there in the ACAD_wetland data frame?
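One way to count them, assuming species are stored in the Latin_Name column used later on this page:

```r
# Number of distinct species names in the data
length(unique(ACAD_wetland$Latin_Name))
```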
Which sites have species that are considered protected on them (Protected = TRUE)?
Option 1. Subset data then calculate number of rows
## [1] 6
Option 2. Subset the data with brackets and use the
table() function to tally status codes.
## < table of extent 0 >
Find the DBH record that’s > 400cm DBH.
There are multiple ways to do this. Two examples are below.
Option 1. View the data and sort by DBH.
Option 2. Find the max DBH value and subset the data frame
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN ScientificName
## 26 MIMA 16 6/17/2025 FALSE 2025 1 19447 Quercus robur
## DBHcm TreeStatusCode CrownClassCode DecayClassCode
## 26 443 AS 3 <NA>
What is the exact value of the largest DBH, and which record does it belong to?
There are multiple ways to do this. Two examples are below.
Option 1. View the data and sort by DBH.
Option 2. Find the max DBH value and subset the data frame
## [1] 443
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN ScientificName
## 26 MIMA 16 6/17/2025 FALSE 2025 1 19447 Quercus robur
## DBHcm TreeStatusCode CrownClassCode DecayClassCode
## 26 443 AS 3 <NA>
Fix the DBH typo by replacing 443.0 with 44.3.
Let’s say that you looked at the datasheet, and the actual DBH for that tree was 44.3 instead of 443.0. You can change that value in the original CSV by hand. But even better is to document that change in code. There are multiple ways to do this. Two examples are below.
But first, it’s good to create a new data frame when modifying the original data frame, so you can refer back to the original if needed. I also use a really specific filter to make sure I’m not accidentally changing other data.
Replace 443 with 44.3
# create copy of trees data
trees_fix <- trees
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-016" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
Check that it worked by showing the range of the original and fixed data frames.
## [1] 10 443
## [1] 10 443
Load dplyr
Read in example NETN tree data from url
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
Create the tree_final data frame
trees_final <- trees |>
mutate(DecayClassCode_num = as.numeric(DecayClassCode),
Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
Date = as.Date(SampleDate, format = "%m/%d/%Y")) |>
rename("Species" = "ScientificName") |>
filter(IsQAQC == FALSE) |>
select(-DecayClassCode)
Read in example ACAD wetland data from url
Load packages
Prep the data
How many trees are on Plot MIMA-12 (using trees_final)?
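One possible dplyr answer (a sketch; Plot_Name was created above as ParkUnit and PlotCode pasted with a hyphen):

```r
library(dplyr)
trees_final |>
  filter(Plot_Name == "MIMA-12") |> # subset to the plot of interest
  nrow()                            # count the remaining rows (trees)
```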
What is the exact value of the largest DBH, and which record does it belong to?
# Base R and dplyr combo
max_dbh <- max(trees_final$DBHcm, na.rm = TRUE)
trees_final |>
filter(DBHcm == max_dbh) |>
select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
## Plot_Name SampleYear TagCode Species DBHcm
## 1 MIMA-16 2025 1 Quercus robur 443
# dplyr with slice
trees_final |>
arrange(desc(DBHcm)) |> # arrange DBHcm high to low via desc()
slice(1) |> # slice the top record
select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
## Plot_Name SampleYear TagCode Species DBHcm
## 1 MIMA-16 2025 1 Quercus robur 443
Fix the DBH typo by replacing 443.0 with 44.3.
# Base R
# create copy of trees data
trees_fix <- trees3
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-016" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
# dplyr via replace
trees_fix <- trees |> mutate(DBHcm = replace(DBHcm, DBHcm == 443.0, 44.3))
Check that it worked by showing the range of the original and fixed data frames.
## [1] 10 443
## [1] 10.0 81.5
Using the ACAD_wetland data, create a new column called Status that has “protected” for Protected = TRUE and “public” values for Protected = FALSE.
# read in wetland data if you don't already have it loaded.
ACAD_wetland <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Base R using the with() function
ACAD_wetland$Status <- with(ACAD_wetland, ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected)
# Tidyverse
ACAD_wetland <- ACAD_wetland |> mutate(Status = ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected) # check your work
##
## FALSE TRUE
## protected 0 9
## public 499 0
# Base R using the with() function and nested ifelse()
ACAD_wetland$abundance_cat <- with(ACAD_wetland, ifelse(Ave_Cov < 10, "Low",
ifelse(Ave_Cov >= 10 & Ave_Cov <= 50, "Medium", "High")))
# Tidyverse using case_when() and between()
ACAD_wetland <- ACAD_wetland |> mutate(abundance_cat = case_when(Ave_Cov < 10 ~ "Low",
between(Ave_Cov, 10, 50) ~ "Medium",
TRUE ~ "High"))
table(ACAD_wetland$abundance_cat)
##
## High Low Medium
## 6 464 38
Note the use of the between() function that saves
typing. This function matches as >= and <=.
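A quick illustration of the inclusive matching:

```r
library(dplyr)
# 10 and 50 are included in the range; 9.9 and 50.1 are not
between(c(9.9, 10, 30, 50, 50.1), 10, 50)
```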
# Using group_by()
ACAD_inv <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(Pct_Cov = sum(Ave_Cov),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv)
## # A tibble: 6 × 4
## Site_Name Year Invasive Pct_Cov
## <chr> <int> <lgl> <dbl>
## 1 RAM-05 2012 FALSE 155.
## 2 RAM-05 2017 FALSE 152.
## 3 RAM-05 2017 TRUE 0.06
## 4 RAM-41 2012 FALSE 48.6
## 5 RAM-41 2017 FALSE 107.
## 6 RAM-41 2017 TRUE 10.2
# Using summarize(.by)
ACAD_inv2 <- ACAD_wetland |>
summarize(Pct_Cov = sum(Ave_Cov), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv2) # should be the same as ACAD_inv
## # A tibble: 6 × 4
## Site_Name Year Invasive Pct_Cov
## <chr> <int> <lgl> <dbl>
## 1 RAM-05 2012 FALSE 155.
## 2 RAM-05 2017 FALSE 152.
## 3 RAM-05 2017 TRUE 0.06
## 4 RAM-41 2017 FALSE 107.
## 5 RAM-41 2012 FALSE 48.6
## 6 RAM-41 2017 TRUE 10.2
# Using group_by()
ACAD_spp <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(num_spp = n(),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp)
## # A tibble: 6 × 4
## Site_Name Year Invasive num_spp
## <chr> <int> <lgl> <int>
## 1 RAM-05 2012 FALSE 44
## 2 RAM-05 2017 FALSE 53
## 3 RAM-05 2017 TRUE 1
## 4 RAM-41 2012 FALSE 33
## 5 RAM-41 2017 FALSE 39
## 6 RAM-41 2017 TRUE 1
# Using summarize(.by)
ACAD_spp2 <- ACAD_wetland |>
summarize(num_spp = n(), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp2) # should be the same as ACAD_spp
## # A tibble: 6 × 4
## Site_Name Year Invasive num_spp
## <chr> <int> <lgl> <int>
## 1 RAM-05 2012 FALSE 44
## 2 RAM-05 2017 FALSE 53
## 3 RAM-05 2017 TRUE 1
## 4 RAM-41 2017 FALSE 39
## 5 RAM-41 2012 FALSE 33
## 6 RAM-41 2017 TRUE 1
Most efficient solution figured out during training
# using the .by within mutate (newer solution)
ACAD_wetland <- ACAD_wetland |>
mutate(Site_Cover = sum(Ave_Cov),
.by = c(Site_Name, Year)) |>
mutate(rel_cov = (Ave_Cov/Site_Cover)*100,
.by = c(Site_Name, Year, Latin_Name, Common))
Original Solution: First sum site-level cover using mutate to return a value for every original row.
# older solution
ACAD_wetland <- ACAD_wetland |> group_by(Site_Name, Year) |>
mutate(Site_Cover = sum(Ave_Cov)) |>
ungroup() # good practice to ungroup after group.
table(ACAD_wetland$Site_Name, ACAD_wetland$Site_Cover) # check that each site has a unique value.
##
## 48.56 70.6 104.78 106.72 111.4 117.24 152.1 153.8 155.42 165.64 178.34
## RAM-05 0 0 0 0 0 0 54 0 44 0 0
## RAM-41 33 0 0 0 0 40 0 0 0 0 0
## RAM-44 0 0 45 0 0 0 0 0 0 0 34
## RAM-53 0 0 0 0 0 0 0 0 0 0 0
## RAM-62 0 0 0 0 0 0 0 26 0 26 0
## SEN-01 0 34 0 0 0 0 0 0 0 0 0
## SEN-02 0 0 0 41 0 0 0 0 0 0 0
## SEN-03 0 0 0 0 33 0 0 0 0 0 0
##
## 188.84 196.52
## RAM-05 0 0
## RAM-41 0 0
## RAM-44 0 0
## RAM-53 48 50
## RAM-62 0 0
## SEN-01 0 0
## SEN-02 0 0
## SEN-03 0 0
## # A tibble: 6 × 15
## Site_Name Site_Type Latin_Name Common Year PctFreq Ave_Cov Invasive Protected
## <chr> <chr> <chr> <chr> <int> <int> <dbl> <lgl> <lgl>
## 1 SEN-01 Sentinel Acer rubr… red m… 2011 0 0.02 FALSE FALSE
## 2 SEN-01 Sentinel Amelanchi… servi… 2011 20 0.02 FALSE FALSE
## 3 SEN-01 Sentinel Andromeda… bog r… 2011 80 2.22 FALSE FALSE
## 4 SEN-01 Sentinel Arethusa … drago… 2011 40 0.04 FALSE TRUE
## 5 SEN-01 Sentinel Aronia me… black… 2011 100 2.64 FALSE FALSE
## 6 SEN-01 Sentinel Carex exi… coast… 2011 60 6.6 FALSE FALSE
## # ℹ 6 more variables: X_Coord <dbl>, Y_Coord <dbl>, Status <chr>,
## # abundance_cat <chr>, Site_Cover <dbl>, rel_cov <dbl>
Next calculate relative cover grouped on Site_Name, Year, Latin_Name, and Common
# Create new dataset because collapsing rows on grouping variables
# Using group_by() and summarize()
ACAD_wetland_relcov <- ACAD_wetland |> group_by(Site_Name, Year, Latin_Name, Common) |>
summarize(rel_cov = (Ave_Cov/Site_Cover)*100,
.groups = 'drop') |>
ungroup()
Check that relative cover sums to 100% within each site
# Using summarize(.by = )
ACAD_wetland_relcov2 <- ACAD_wetland |> #group_by(Site_Name, Year, Latin_Name, Common) |>
summarize(rel_cov = Ave_Cov/Site_Cover,
.by = c("Site_Name", "Year", "Latin_Name", "Common"))
# Check that your relative cover sums to 100 for each site
relcov_check <- ACAD_wetland_relcov2 |> group_by(Site_Name, Year) |>
summarize(tot_relcov = sum(rel_cov)*100, .groups = 'drop')
table(relcov_check$tot_relcov) # they should all be 100
##
## 100
## 13
Recreate the plot below (or customize your own plot). Note that the fill color is "#0080FF", the shape is 21, and the theme is classic. The linewidth = 0.75, and linetype = 'dashed'.
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line(linewidth = 0.75, linetype = 'dashed') +
geom_point(color = "black", fill = "#0080FF", size = 2.5, shape = 21) +
labs(x = "Year", y = "Annual visits in 1,000s") +
scale_y_continuous(limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme_classic()
Load data and dplyr
Load data and dplyr
library(dplyr)
# tree data
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
# tree species table
spp_tbl <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_species_table.csv")
Load and prep data.
# Hobo Temp data
temp_data <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "temp_time", "tempF")
temp_data$timestamp_temp <- as.POSIXct(temp_data$temp_time,
format = "%m/%d/%Y %H:%M",
tz = "America/New_York")
Load packages and prep data for ggplot sections
# Water chemistry data for ggplot section
library(dplyr)
library(ggplot2)
library(patchwork) # for arranging ggplot objects
library(RColorBrewer) # for palettes
library(viridis) # for palettes
chem <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_water_chemistry_data.csv")
chem <- chem |> mutate(date = as.Date(EventDate, "%m/%d/%Y"),
year = as.numeric(format(date, "%Y")),
mon = as.numeric(format(date, "%m")),
doy = as.numeric(format(date, "%j")))
ACAD_lakes <- c("ACBUBL", "ACEAGL", "ACECHO", "ACJORD",
"ACLONG", "ACSEAL", "ACUHAD", "ACWHOL")
lakes_temp <- chem |> filter(SiteCode %in% ACAD_lakes) |>
filter(Parameter %in% "Temp_F")
Pivot the bat_sum data frame on year instead of
species, so that you have a column for every year of captures. Remember
to avoid column names starting with a number.
bat_wide_yr <- pivot_wider(bat_sum, names_from = Year,
values_from = num_indiv,
values_fill = 0,
names_prefix = "yr")
head(bat_wide_yr)
## # A tibble: 6 × 9
## Site sppcode yr2019 yr2020 yr2021 yr2022 yr2023 yr2024 yr2025
## <chr> <chr> <int> <int> <int> <int> <int> <int> <int>
## 1 site_001 LASCIN 1 0 0 1 0 0 0
## 2 site_001 MYOLEI 0 1 1 2 1 0 2
## 3 site_001 MYOSEP 0 1 0 0 0 0 0
## 4 site_001 MYOLUC 0 0 1 1 0 2 0
## 5 site_002 LASCIN 1 0 1 0 0 0 1
## 6 site_002 MYOLEI 1 2 1 0 0 2 0
Pivot the resulting data frame from the previous question to
long on the years columns, and remove the “yr” from the year names using
names_prefix = 'yr'.
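A sketch of one possible answer with tidyr, assuming the bat_wide_yr data frame from the previous question:

```r
library(tidyr)
bat_long_yr <- pivot_longer(bat_wide_yr,
                            cols = starts_with("yr"), # the year columns
                            names_to = "Year",
                            names_prefix = "yr",      # strips the leading "yr"
                            values_to = "num_indiv")
head(bat_long_yr)
```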
Join the NETN_tree_data.csv and NETN_tree_species_table.csv to connect the common name to the tree data.
## [1] "TSN" "ScientificName"
# left join species to trees, because don't want to include species not found in tree data
trees_spp <- left_join(trees,
spp_tbl |> select(TSN, ScientificName, CommonName),
by = c("TSN", "ScientificName"))
head(trees_spp)
## ParkUnit PlotCode SampleDate IsQAQC SampleYear TagCode TSN ScientificName
## 1 MIMA 12 6/16/2025 FALSE 2025 13 183385 Pinus strobus
## 2 MIMA 12 6/16/2025 FALSE 2025 12 28728 Acer rubrum
## 3 MIMA 12 6/16/2025 FALSE 2025 11 28728 Acer rubrum
## 4 MIMA 12 6/16/2025 FALSE 2025 2 28728 Acer rubrum
## 5 MIMA 12 6/16/2025 FALSE 2025 10 28728 Acer rubrum
## 6 MIMA 12 6/16/2025 FALSE 2025 7 28728 Acer rubrum
## DBHcm TreeStatusCode CrownClassCode DecayClassCode CommonName
## 1 24.9 AS 5 <NA> eastern white pine
## 2 10.9 AB 5 <NA> red maple
## 3 18.8 AS 3 <NA> red maple
## 4 51.2 AS 3 <NA> red maple
## 5 38.2 AS 3 <NA> red maple
## 6 22.5 AS 4 <NA> red maple
## [1] "TSN" "ScientificName"
# anti join of trees against species table, selecting only columns of interest
anti_join(trees, spp_tbl, by = c("TSN", "ScientificName")) |>
select(ParkUnit, PlotCode, SampleYear, ScientificName)
## ParkUnit PlotCode SampleYear ScientificName
## 1 MIMA 16 2025 Quercus robur
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
seq.Date(date_list[1], date_list[2], by = "1 week")
## [1] "2026-01-01" "2026-01-08" "2026-01-15" "2026-01-22" "2026-01-29"
## [6] "2026-02-05" "2026-02-12" "2026-02-19" "2026-02-26" "2026-03-05"
## [11] "2026-03-12" "2026-03-19" "2026-03-26" "2026-04-02" "2026-04-09"
## [16] "2026-04-16" "2026-04-23" "2026-04-30" "2026-05-07" "2026-05-14"
## [21] "2026-05-21" "2026-05-28" "2026-06-04" "2026-06-11" "2026-06-18"
## [26] "2026-06-25" "2026-07-02" "2026-07-09" "2026-07-16" "2026-07-23"
## [31] "2026-07-30" "2026-08-06" "2026-08-13" "2026-08-20" "2026-08-27"
## [36] "2026-09-03" "2026-09-10" "2026-09-17" "2026-09-24" "2026-10-01"
## [41] "2026-10-08" "2026-10-15" "2026-10-22" "2026-10-29" "2026-11-05"
## [46] "2026-11-12" "2026-11-19" "2026-11-26" "2026-12-03" "2026-12-10"
## [51] "2026-12-17" "2026-12-24" "2026-12-31"
## index temp_time tempF timestamp_temp month_num
## 1 1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 7
## 2 2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 7
## 3 3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 7
## 4 4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 7
## 5 5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 7
## 6 6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 7
## index temp_time tempF timestamp_temp month_num julian
## 1 1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 7 199
## 2 2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 7 199
## 3 3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 7 199
## 4 4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 7 199
## 5 5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 7 199
## 6 6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 7 199
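The julian column shown above is the day of year. One way to derive it from a timestamp in base R (a sketch using one of the timestamps from the output; tz is set explicitly to avoid daylight saving surprises):

```r
# %j in format() returns the day of year as a zero-padded string; convert to integer
ts <- as.POSIXct("2021-07-18 10:26:00", tz = "UTC")
as.integer(format(ts, "%j")) # 199, matching the julian column above
```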
# Will need to run these first
site_cols <- c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")
site_shps <- c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24)
# Now generate the plot
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point(color = "dimgrey", size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") # answer
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) +
geom_point(color = "dimgrey", alpha = 0.85) + # or remove the alpha
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
# Need to run these first
site_cols <- c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")
site_shps <- c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24)
# Now generate the plot
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) +
geom_point(color = "dimgrey", alpha = 0.85) + # or remove the alpha
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
geom_point(color = "dimgrey", alpha = 0.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
CHALLENGE: Recreate the plot below. Note that the symbol outline is black. The alpha level is 0.6.
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = FALSE, span = 0.5) +
geom_point(color = "black", alpha = 0.6, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temp. (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 2) +
theme(legend.position = 'bottom')
There are a number of options to get help with R. If you're trying to figure out how to use a function, you can type ?function_name. For example, ?plot will show the R documentation for that function in the Help panel.
Get help for the functions below
You can also press F1 while the cursor is on a function name to access the help for that function. Help documents in R are standardized to help you find what you're looking for.
Great online resources for finding answers include Stack Exchange and Stack Overflow. Google searches are usually my first step, and I include "in R" and the package name (if applicable) in every search related to R code. If you're troubleshooting an error message, copying and pasting the error message verbatim into a search engine often helps.
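For reference, the main help operators can be sketched as follows (all open in the Help panel; ?? is shorthand for help.search()):

```r
?plot            # opens the help page for plot()
help("plot")     # equivalent function-call form
??"linear model" # fuzzy search across installed packages via help.search()
```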
Don’t hesitate to reach out to colleagues for help as well! If you are stuck on something and the answers on Google are more confusing than helpful, don’t be afraid to ask a human. Every experienced R programmer was a beginner once, so chances are they’ve encountered the same problem as you at some point. There is an R-focused Data Science Community of Practice for I&M folks, which anyone working in R (regardless of experience!) is invited and encouraged to join.
Unmatched parenthesis
mean_x <- mean(c(1, 3, 5, 7, 8, 21) # missing closing parenthesis
mean_x <- mean(c(1, 3, 5, 7, 8, 21)) # correct
Unmatched quotes
birds <- c("black-capped chickadee", "golden-crowned kinglet, "wood thrush") # missing quote after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
Missing a comma between elements
birds <- c("black-capped chickadee", "golden-crowned kinglet" "wood thrush") # missing comma after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
Misspelled function name
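A minimal sketch of what a misspelled function name looks like (sqtr is a deliberate typo for sqrt; the broken call is commented out so the block runs):

```r
# sqtr(9) # Error in sqtr(9) : could not find function "sqtr"
sqrt(9)   # correct spelling works
```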
Incorrect use of dimensions with brackets
# Missing comma to indicate subsetting rows (records)
ACAD_wetland2 <- ACAD_wetland[!is.na(ACAD_wetland$Site_Name)]
## Error in `ACAD_wetland[!is.na(ACAD_wetland$Site_Name)]`:
## ! Can't subset columns with `!is.na(ACAD_wetland$Site_Name)`.
## ✖ Logical subscript `!is.na(ACAD_wetland$Site_Name)` must be size 1 or 15, not 508.
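The fix is the comma flagged in the comment above: the logical vector goes before the comma so it subsets rows, not columns. A self-contained sketch with a toy data frame (hypothetical values) standing in for ACAD_wetland:

```r
# toy stand-in for ACAD_wetland (hypothetical values)
wetland_toy <- data.frame(Site_Name = c("Site-1", NA, "Site-2"), Ave_Cov = c(10, 2, 5))
# logical vector before the comma subsets rows; empty column slot keeps all columns
wetland_toy2 <- wetland_toy[!is.na(wetland_toy$Site_Name), ]
nrow(wetland_toy2) # 2
```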
There’s a lot of great online material for learning new applications of R. The ones we’ve used the most are listed below.
While we won’t get to these topics this week, the 2022 Advanced R training has sessions covering all of these topics. The Resources tab includes other online resources that cover these topics as well.
library(tidyverse)
#------------------------------------
# Day 0 - prep code
#------------------------------------
rm(list = ls())
packages <- c("tidyverse", # for Day 2 and 3 data wrangling
"RColorBrewer", "viridis", "patchwork", # for Day 3 ggplot
"readxl", "writexl") # for day 1 importing from excel
install.packages(setdiff(packages, rownames(installed.packages())))
# Check that installation worked
library(tidyverse) # turns on core tidyverse packages
library(RColorBrewer) # palette generator
library(viridis) # more palettes
library(patchwork) # multipanel plots
library(readxl) # reading xlsx
library(writexl) # writing xlsx
#------------------------------------
# Day 1: Project Setup Code
#------------------------------------
# forward slash file path approach
"C:/Users/KMMiller/OneDrive - DOI/data/"
# backward slash file path approach
"C:\\Users\\KMMiller\\OneDrive - DOI\\data\\"
dir.create("data")
list.files() # you should see a data folder listed
#------------------------------------
# Day 1: Start Coding Code
#------------------------------------
# Commented text: try this line to generate some basic text and become familiar with where results will appear:
print("Welcome to R!")
# simple math
1+1
(2*3)/4
sqrt(9)
# calculate basal area of tree with 14.6cm diameter; note pi is built in constant in R
(14.6^2)*pi
# get the cosine of 180 degrees - note that trig functions in R expect angles in radians
cos(pi)
# the value of 12.098 is assigned to variable 'a'
a <- 12.098
# and the value 65.3475 is assigned to variable 'b'
b <- 65.3475
# we can now perform whatever mathematical operations we want using these two
# variables without having to repeatedly type out the actual numbers:
a*b
(a^b)/((b+a))
sqrt((a^7)/(b*2))
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# equivalent to x <- 1:10
# bad coding
#mean <- mean(x)
# good coding
mean_x <- mean(x)
mean_x
range_x <- range(x)
range_x
#------------------------------------
# Day 1: Read and Write Code
#------------------------------------
# read in the data from ACAD_wetland_data_clean.csv and assign it as a dataframe to the variable "ACAD_wetland"
ACAD_wetland <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_wetland_data_clean.csv"
)
# View the ACAD_wetland data frame we just created
View(ACAD_wetland)
# Look at the top 6 rows of the data frame
head(ACAD_wetland)
# Look at the bottom 6 rows of the data frame
tail(ACAD_wetland)
# Write the data frame to your data folder using a relative path.
# By default, write.csv adds a column with row names that are numbers. I don't
# like that, so I turn that off.
write.csv(ACAD_wetland, "./data/ACAD_wetland_data_clean.csv", row.names = FALSE)
# Read the data frame in using a relative path
ACAD_wetland <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Equivalent code to read in the data frame using full path on my computer, but won't match another user.
ACAD_wetland <- read.csv("C:/Users/KMMiller/OneDrive - DOI/NETN/R_Dev/IMD_R_Training_2026/data/ACAD_wetland_data_clean.csv")
install.packages("readxl") # only need to run once.
install.packages("writexl")
library(writexl) # saving xlsx
library(readxl) # importing xlsx
write_xlsx(ACAD_wetland, "./data/ACAD_wetland_data_clean.xlsx")
ACAD_wetxls <- read_xlsx(path = "./data/ACAD_wetland_data_clean.xlsx", sheet = "Sheet1")
head(ACAD_wetxls)
#------------------------------------
# Day 1: Vectors Code
#------------------------------------
digits <- c(1:10) # Use x:y to create a sequence of integers starting at x and ending at y
digits
digits + 1 # note how 1 was added to every element of digits.
is_odd <- rep(c(FALSE, TRUE), 5) # Use rep(x, n) to create a vector by repeating x n times
is_odd
tree_dbh <- c(12.5, 20.4, 18.1, 38.5, 19.3)
tree_dbh
bird_ids <- c("black-capped chickadee", "dark-eyed junco", "golden-crowned kinglet", "dark-eyed junco")
bird_ids
second_bird <- bird_ids[2]
second_bird
top_two_birds <- bird_ids[c(1,2)]
top_two_birds
sort(unique(bird_ids))
class(bird_ids)
class(tree_dbh)
class(digits)
class(is_odd)
str(ACAD_wetland)
names(ACAD_wetland)
ACAD_wetland$Site_Name
ACAD_wetland$Latin_Name
dim(ACAD_wetland)
nrow(ACAD_wetland) # first dim
ncol(ACAD_wetland) # second dim
ACAD_wetland[1:5,]
ACAD_wetland[c(1, 2, 3, 4, 5),] #equivalent but more typing
ACAD_wetland[, c("Site_Name", "Latin_Name", "Common", "Year", "PctFreq")]
ACAD_wetland[1:5, c("Site_Name", "Latin_Name", "Common", "Year", "PctFreq")]
ACAD_sub <- ACAD_wetland[ , 1:4] # works, but risky
ACAD_sub2 <-
ACAD_wetland[,c("Site_Name", "Site_Type", "Latin_Name", "Common")] #same result, but better
# compare the two data frames to the original
head(ACAD_wetland)
head(ACAD_sub)
head(ACAD_sub2)
ACAD_wetland[c(2, 4, 6, 8), c(1, 2)]
names(ACAD_wetland) # get the names of the first 2 columns
ACAD_wetland[c(2, 4, 6, 8), c("Site_Name", "Site_Type")]
head(ACAD_wetland)
ACAD_nat <- ACAD_wetland[ACAD_wetland$Invasive == FALSE, ]
table(ACAD_wetland$Invasive) # 9 T
table(ACAD_nat$Invasive) # No T
ACAD_wetland$Latin_Name[ACAD_wetland$Invasive == TRUE]
ACAD_wetland[ACAD_wetland$Invasive == TRUE, "Latin_Name"] # equivalent
orchid_spp <- c("Arethusa bulbosa", "Calopogon tuberosus", "Pogonia ophioglossoides")
ACAD_orchid_plots <- ACAD_wetland[ACAD_wetland$Latin_Name %in% orchid_spp,
c("Site_Name", "Year", "Latin_Name")]
ACAD_orchid_plots
# Return a vector of unique site names, sorted alphabetically
sites_unique <- sort(unique(ACAD_wetland[,"Site_Name"]))
sites_unique
# Returns the number of elements in sites_unique vector
length(sites_unique) # 8
# Option 1
length(unique(ACAD_wetland[, "Latin_Name"]))
# Option 2
length(unique(ACAD_wetland$Latin_Name)) # equivalent
# Option 1 - used unique to just return unique site name
unique(ACAD_wetland$Site_Name[ACAD_wetland$Protected == TRUE])
# Option 2
unique(ACAD_wetland[ACAD_wetland$Protected == TRUE, "Site_Name"])
#-----------------------------------------
# Day 1: Data Exploration Code
#-----------------------------------------
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
head(trees)
str(trees)
summary(trees)
table(complete.cases(trees[, 1:10])) # all TRUE
x <- c(1, 3, 8, 3, 5, NA)
mean(x) # returns NA
mean(x, na.rm = TRUE)
sort(unique(trees$DecayClassCode)) # sorts the unique values in the column
table(trees$DecayClassCode) # shows the number of records per value - very handy
trees2 <- trees
trees2$DecayClassCode[trees2$DecayClassCode == "PM"] <- NA
trees2$DecayClassCode_num <- as.numeric(trees2$DecayClassCode)
# check that it worked
str(trees2) # DecayClassCode_num is numeric
sort(unique(trees2$DecayClassCode_num)) # Only numbers show in table
trees3 <- subset(trees2, IsQAQC == FALSE, select = -DecayClassCode) # Note the importance of FALSE all caps
trees3 <- subset(trees2, IsQAQC != TRUE, select = -DecayClassCode) # equivalent
trees3 <- trees2[trees2$IsQAQC == FALSE, -12] #equivalent but not as easy to follow
# Look at the sample date format
head(trees3$SampleDate) # month/day/year
# Create new column called Date
trees3$Date <- as.Date(trees3$SampleDate, format = "%m/%d/%Y")
str(trees3)
names(trees3) # original names
names(trees3)[names(trees3) == "ScientificName"] <- "Species"
names(trees3) # check that it worked
trees3$Plot_Name <- paste(trees3$ParkUnit, trees3$PlotCode, sep = "-")
trees3$Plot_Name <- paste0(trees3$ParkUnit, "-", trees3$PlotCode) #equivalent- by default no separation between elements of paste.
mima12 <- subset(trees3, Plot_Name == "MIMA-12")
nrow(mima12) # 12
mima12_as <- subset(trees3, Plot_Name == "MIMA-12" & TreeStatusCode == "AS")
nrow(mima12_as) # 6
# OPTION 2
mima12 <- trees3[trees3$Plot_Name == "MIMA-12",]
table(mima12$TreeStatusCode) # AS = 6, matching Option 1
View(trees3)
max_dbh <- max(trees3$DBHcm, na.rm = TRUE)
trees3[trees3$DBHcm == max_dbh,]
View(trees)
max_dbh <- max(trees3$DBHcm, na.rm = TRUE)
max_dbh #443
trees[trees3$DBHcm == max_dbh,]
# Plot MIMA-016, TagCode = 1.
# create copy of trees data
trees_fix <- trees3
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-016" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
range(trees$DBHcm)
range(trees_fix$DBHcm)
#------------------------------------
# Day 1: Basic Plotting Code
#------------------------------------
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
hist(x = trees$DBHcm)
plot(trees$DBHcm)
plot(trees$DBHcm ~ trees$CrownClassCode)
plot(DBHcm ~ CrownClassCode, data = trees) # equivalent but cleaner axis titles
hist(ACAD_wetland$Ave_Cov)
#------------------------------------
# Day 2: Tidyverse Code
#------------------------------------
install.packages('tidyverse')
library(tidyverse)
library(dplyr)
#------------------------------------
# Day 2: Data Wrangling Code
#------------------------------------
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
# Base R
trees2 <- trees
trees2$DecayClassCode[trees2$DecayClassCode == "PM"] <- NA
trees2$DecayClassCode_num <- as.numeric(trees2$DecayClassCode)
# dplyr approach with mutate
trees2 <- mutate(trees, DecayClassCode_num = as.numeric(replace(DecayClassCode, DecayClassCode == "PM", NA)))
str(trees2)
# Base R
trees3$Date <- as.Date(trees3$SampleDate, format = "%m/%d/%Y")
# dplyr approach with mutate
trees3 <- mutate(trees2, Date = as.Date(SampleDate, format = "%m/%d/%Y"))
# Base R code
names(trees2)[names(trees2) == "ScientificName"] <- "Species"
# dplyr approach with rename
trees2 <- rename(trees2, "Species" = "ScientificName")
names(trees2)
# Base R
trees2$Plot_Name <- paste(trees2$ParkUnit, trees2$PlotCode, sep = "-")
# dplyr approach with mutate
trees2 <- mutate(trees2, Plot_Name = paste(ParkUnit, PlotCode, sep = "-"))
# Base R
trees3 <- subset(trees2, IsQAQC == FALSE, select = -DecayClassCode) # Note the importance of FALSE all caps
# dplyr
trees3a <- filter(trees2, IsQAQC == FALSE)
trees3 <- select(trees3a, -DecayClassCode)
head(trees3)
trees_final <- trees |>
mutate(DecayClassCode_num = as.numeric(replace(DecayClassCode, DecayClassCode == "PM", NA)),
Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
Date = as.Date(SampleDate, format = "%m/%d/%Y")) |>
rename("Species" = "ScientificName") |>
filter(IsQAQC == FALSE) |>
select(-DecayClassCode) |>
arrange(Plot_Name, TagCode)
head(trees_final)
trees_final |> filter(Plot_Name == "MIMA-12") |> nrow()
trees_final |> filter(Plot_Name == "MIMA-12" & TreeStatusCode == "AS") |> nrow()
# Base R and dplyr combo
max_dbh <- max(trees_final$DBHcm, na.rm = TRUE)
trees_final |>
filter(DBHcm == max_dbh) |>
select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
# dplyr with slice
trees_final |>
arrange(desc(DBHcm)) |> # arrange DBHcm high to low via desc()
slice(1) |> # slice the top record
select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
# Base R
# create copy of trees data
trees_fix <- trees3
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-016" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
# dplyr via replace
trees_fix <- trees |> mutate(DBHcm = replace(DBHcm, DBHcm == 443.0, 44.3))
range(trees$DBHcm)
range(trees_fix$DBHcm)
#------------------------------------
# Day 2: Conditionals Code
#------------------------------------
# Check the levels of TreeStatusCode
sort(unique(trees_final$TreeStatusCode))
alive <- c("AB", "AL", "AS", "RS")
dead <- c("DB", "DM", "DS")
trees_final <- trees_final |>
mutate(status = ifelse(TreeStatusCode %in% alive, "live", "dead"))
# nested ifelse to make alive, dead, and recruit
trees_final <- trees_final |>
mutate(status2 = ifelse(TreeStatusCode %in% dead, "dead",
ifelse(TreeStatusCode %in% "RS", "recruit",
"live")))
# Check the levels of TreeStatusCode
alive <- c("AB", "AL", "AS", "RS")
dead <- c("DB", "DM", "DS")
trees_final <- trees_final |>
mutate(status3 = case_when(TreeStatusCode %in% dead ~ 'dead',
TreeStatusCode %in% 'RS' ~ 'recruit',
TreeStatusCode %in% alive ~ 'live',
TRUE ~ 'unknown'))
table(trees_final$status2, trees_final$status3) # check that the output is the same
inv <- ACAD_wetland |> filter(Invasive == TRUE)
if(nrow(inv) > 0){print("Invasive species were detected in the data.")
} else {print("No invasive species were detected in the data.")}
native_only <- ACAD_wetland |> filter(Invasive == FALSE)
inv2 <- native_only |> filter(Invasive == TRUE)
if(nrow(inv2) > 0){print("Invasive species were detected in the data.")
} else if(nrow(inv2) == 0){print("No invasive species were detected in the data.")
} else {print("Invasive species detections unclear")}
# read in wetland data if you don't already have it loaded.
ACAD_wetland <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Base R using the with() function
ACAD_wetland$Status <- with(ACAD_wetland, ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected)
# Tidyverse
ACAD_wetland <- ACAD_wetland |> mutate(Status = ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected) # check your work
# Base R using the with() function and nested ifelse()
ACAD_wetland$abundance_cat <- with(ACAD_wetland, ifelse(Ave_Cov < 10, "Low",
ifelse(Ave_Cov >= 10 & Ave_Cov <= 50, "Medium", "High")))
# Tidyverse using case_when() and between
ACAD_wetland <- ACAD_wetland |> mutate(abundance_cat = case_when(Ave_Cov < 10 ~ "Low",
between(Ave_Cov, 10, 50) ~ "Medium",
TRUE ~ "High"))
table(ACAD_wetland$abundance_cat)
#------------------------------------
# Day 2: Summarizing Code
#------------------------------------
num_trees_mut <- trees_final |>
group_by(Plot_Name, SampleYear, Species) |>
mutate(num_trees = n()) |>
select(Plot_Name, SampleYear, Species, num_trees)
nrow(trees_final) #164
nrow(num_trees_mut) #164
head(num_trees_mut)
num_trees_sum <- trees_final |>
group_by(Plot_Name, SampleYear, Species) |>
summarize(num_trees = n())
nrow(trees_final) #164
nrow(num_trees_sum) # fewer rows: summarize collapses to one row per group
head(num_trees_sum)
tree_dbh <- trees_final |>
group_by(Plot_Name, SampleYear) |>
summarize(mean_dbh = mean(DBHcm),
num_trees = n(),
se_dbh = sd(DBHcm)/sqrt(num_trees),
.groups = 'drop') # prevents warning in console
tree_dbh2 <- trees_final |>
summarize(mean_dbh = mean(DBHcm),
num_trees = n(),
se_dbh = sd(DBHcm)/sqrt(num_trees),
.by = c(Plot_Name, SampleYear))
tree_dbh == tree_dbh2 # tests that all the values in 1 data frame match the 2nd.
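One caveat about the element-wise == check above: values that differ only by floating point noise will show FALSE even when two summaries are effectively identical. all.equal() is a more forgiving whole-object comparison. A self-contained sketch with toy data (illustrative values):

```r
# toy data frames with a tiny floating point difference
df1 <- data.frame(x = c(1, 2), y = c(0.1 + 0.2, 1))
df2 <- data.frame(x = c(1, 2), y = c(0.3, 1))
df1 == df2                  # element-wise; the first y comparison is FALSE due to floating point
isTRUE(all.equal(df1, df2)) # TRUE: all.equal tolerates tiny numeric differences
```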
# Using group_by()
ACAD_inv <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(Pct_Cov = sum(Ave_Cov),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv)
# Using summarize(.by)
ACAD_inv2 <- ACAD_wetland |>
summarize(Pct_Cov = sum(Ave_Cov), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv2) # should be the same as ACAD_inv
# Using group_by()
ACAD_spp <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(num_spp = n(),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp)
# Using summarize(.by)
ACAD_spp2 <- ACAD_wetland |>
summarize(num_spp = n(), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp2) # should be the same as ACAD_spp
# using the .by within mutate (newer solution)
ACAD_wetland <- ACAD_wetland |>
mutate(Site_Cover = sum(Ave_Cov),
.by = c(Site_Name, Year)) |>
mutate(rel_cov = (Ave_Cov/Site_Cover)*100,
.by = c(Site_Name, Year, Latin_Name, Common))
ACAD_wetland <- ACAD_wetland |> group_by(Site_Name, Year) |>
mutate(Site_Cover = sum(Ave_Cov)) |>
ungroup() # good practice to ungroup after group.
table(ACAD_wetland$Site_Name, ACAD_wetland$Site_Cover) # check that each site has a unique value.
head(ACAD_wetland)
# Create new dataset because collapsing rows on grouping variables
# Using group_by() and summarize()
ACAD_wetland_relcov <- ACAD_wetland |> group_by(Site_Name, Year, Latin_Name, Common) |>
summarize(rel_cov = (Ave_Cov/Site_Cover)*100,
.groups = 'drop') # .groups = 'drop' already returns an ungrouped result
# Using summarize(.by = )
ACAD_wetland_relcov2 <- ACAD_wetland |> #group_by(Site_Name, Year, Latin_Name, Common) |>
summarize(rel_cov = Ave_Cov/Site_Cover,
.by = c("Site_Name", "Year", "Latin_Name", "Common"))
# Check that your relative cover sums to 100 for each site
relcov_check <- ACAD_wetland_relcov2 |> group_by(Site_Name, Year) |>
summarize(tot_relcov = sum(rel_cov)*100, .groups = 'drop')
table(relcov_check$tot_relcov) # they should all be 100
#----------------------------------------------
# Day 2: Data Viz. Best Practices Code
#----------------------------------------------
library(knitr)
library(kableExtra)
covid_numbers <- read.csv("./data/covid_numbers.csv")
head(covid_numbers, 7) |>
knitr::kable(align = "c", caption = "<h6><b>Table 1.</b> Daily Covid cases and population numbers by state (only showing first 7 records)</h6>") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 12) |>
kableExtra::column_spec(1:4, background = 'white', include_thead = T)
acme_in <- read.csv("./data/acme_sales.csv") |>
dplyr::arrange(category, product)
acme_in |>
knitr::kable(align = "c", caption = "<h6><b>Table 2. </b>Average monthly revenue (in $1000's) from Acme product sales, 1950 - 2020</h6>") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 12) |>
kableExtra::column_spec(1:14, background = 'white', include_thead = T)
acme <- acme_in |>
pivot_longer(-c(category, product), names_to = "month", values_to = "revenue")
acme$month <- factor(acme$month, levels = month.abb)
ggplot(acme, aes(x=month, y=product, fill=revenue)) +
geom_raster() +
geom_text(aes(label=revenue, color = revenue > 1250)) + # color of text conditional on revenue relative to 1250
scale_color_manual(guide = "none", values = c("black", "white")) + # set color of text
scale_fill_viridis_c(direction = -1, name = "Monthly revenue,\nin $1000's") +
scale_y_discrete(limits=rev) + # reverses order of y-axis bc ggplot reverses it from the data
labs(#title = "Average monthly revenue (in $1000's) from Acme product sales, 1950 - 2020",
x = "Month", y = "Product") +
theme_bw(base_size = 11) +
facet_grid(rows = vars(category), scales = "free") # set scales to free so each facet only shows its own levels
ansc <- anscombe |>
dplyr::select(x1, y1, x2, y2, x3, y3, x4, y4)
ansc |>
knitr::kable(align = "c", caption = "<h6><b>Table 3.</b> Anscombe's Quartet - Four bivariate datasets with identical summary statistics</h6>") |>
kableExtra::column_spec (c(2,4,6),border_left = F, border_right = T) |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 12) |>
kableExtra::column_spec(1:8, background = 'white', include_thead = T)
sapply(ansc, function(x) c(mean=round(mean(x), 2), var=round(var(x), 2))) |>
knitr::kable(align = "c", caption = "<h6><b>Table 4. </b>Means and variances are identical in the four datasets. The correlation between x and y (r = 0.82) is also identical across the datasets.</h6>") |>
kableExtra::column_spec (c(1,3,5,7), border_left = F, border_right = T) |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 12) |>
kableExtra::column_spec(1:9, background = 'white', include_thead = T)
#------------------------------------
# Day 2: Intro to ggplot Code
#------------------------------------
knitr::opts_chunk$set(warning=FALSE, message=FALSE, fig.align = 'center', fig.height = 3, fig.width = 5)
library(ggplot2)
library(dplyr) # for filter
visits <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_annual_visits.csv")
library(ggplot2)
visits <- read.csv("./data/ACAD_annual_visits.csv")
# Examine the data to understand data structure, data types, and potential problems
head(visits)
summary(visits)
table(visits$Year)
table(complete.cases(visits))
str(visits)
# Base R
visits$Annual_Visits <- as.numeric(gsub(",", "", visits$Annual_Visits))
# Tidyverse
library(dplyr) # load package first
visits <- visits |> mutate(Annual_Visits = as.numeric(gsub(",", "", Annual_Visits)))
str(visits) #check that it worked
p <- ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000))
p
p1a <- p + geom_line() + geom_point() # default color and shape to points
p1a
p1 <- p +
geom_line(linewidth = 0.6) +
geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24)
p1
p2 <- p1 + scale_y_continuous(name = "Annual visitors in 1000's",
limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) + # label at 2000, 2500, ... up to 4500
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) # label at 1994, 1999, ... up to 2024
p2
p3 <- p2 + labs(x = "Year",
title = "Annual visitation/1000 people in Acadia NP 1994 - 2024")
p3
p4 <- p3 + theme_bw()
p4
p4b <- p3 + theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1), # make x axis text bigger and angle
panel.grid.major = element_blank(), # turns off major grids
panel.grid.minor = element_blank(), # turns off minor grids
panel.background = element_rect(fill = 'white', color = 'dimgrey'), # panel white w/ grey border
plot.margin = margin(2, 3, 2, 3), # increase white margin around plot
title = element_text(size = 10) # reduce title size
)
p4b
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line() +
geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24) +
labs(x = "Year", title = "Annual visitation/1000 people in Acadia NP 1994 - 2024") +
scale_y_continuous(name = "Annual visitors in 1000's",
limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1), # make x axis text bigger and angle
panel.grid.major = element_blank(), # turns off major grids
panel.grid.minor = element_blank(), # turns off minor grids
panel.background = element_rect(fill = 'white', color = 'dimgrey'), # make panel white w/ grey border
plot.margin = margin(2, 3, 2, 3), # increase white margin around plot
title = element_text(size = 10) # reduce title size
)
pq <-
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line(linewidth = 0.75, linetype = 'dashed') +
geom_point(color = "black", fill = "#0080FF", size = 2.5, shape = 21) +
labs(x = "Year", y = "Annual visits in 1,000s") +
scale_y_continuous(limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme_classic()
pq
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line(linewidth = 0.75, linetype = 'dashed') +
geom_point(color = "black", fill = "#0080FF", size = 2.5, shape = 21) +
labs(x = "Year", y = "Annual visits in 1,000s") +
scale_y_continuous(limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme_classic()
library(dplyr)
library(ggplot2)
chem <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_water_chemistry_data.csv")
library(dplyr)
chem <- read.csv("./data/NETN_water_chemistry_data.csv")
jordDO <- chem |>
filter(SiteCode == "ACJORD") |>
filter(Parameter == "DO_mgL") |>
mutate(month = as.numeric(gsub("/", "", substr(EventDate, 1, 2))))
head(jordDO)
unique(jordDO$SiteCode) # check filter worked
unique(jordDO$Parameter) # check filter worked
jordDO_sum <- jordDO |> group_by(month) |>
summarize(mean_DO = mean(Value),
num_meas = n(),
se_DO = sd(Value)/sqrt(num_meas))
jordDO_sum
ggplot(data = jordDO_sum, aes(x = month, y = mean_DO)) +
geom_col(fill = "#74AAE3", color = "dimgrey", width = 0.75) +
geom_errorbar(aes(ymin = mean_DO - 1.96*se_DO, ymax = mean_DO + 1.96*se_DO),
width = 0.75) +
theme_bw() +
labs(x = NULL, y = "Dissolved Oxygen mg/L") +
scale_x_continuous(limits = c(4, 11),
breaks = c(seq(5, 10, by = 1)),
labels = c("May", "Jun", "Jul",
"Aug", "Sep", "Oct"))
ggplot(data = jordDO, aes(x = month, y = Value, group = month)) +
geom_boxplot(outliers = FALSE) +
geom_point(alpha = 0.2) +
theme_bw() +
labs(x = NULL, y = "Dissolved Oxygen mg/L") +
scale_x_continuous(limits = c(4, 11),
breaks = c(seq(5, 10, by = 1)),
labels = c("May", "Jun", "Jul",
"Aug", "Sep", "Oct"))
#------------------------------------
# Day 3: Pivot Code
#------------------------------------
library(dplyr)
library(stringr) # for word
library(tidyr) # for pivot_wider and pivot_longer
bat_cap <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/example_bat_capture_data.csv")
head(bat_cap)
str(bat_cap)
bat_cap <- read.csv("./data/example_bat_capture_data.csv")
bat_cap <- bat_cap |>
mutate(genus = toupper(word(Latin, 1)), # capitalize and extract first word in Latin
species = toupper(word(Latin, 2)), # capitalize and extract second word in Latin
sppcode = paste0(substr(genus, 1, 3), # combine first 3 characters of genus and species
substr(species, 1, 3))) |>
select(-genus, -species) # drop temporary columns
head(bat_cap)
bat_sum <- bat_cap |>
summarize(num_indiv = sum(!is.na(sppcode)), # I prefer this over n()
.by = c("Site", "Year", "sppcode")) |>
arrange(Site, Year, sppcode) # helpful for ordering the future wide columns
bat_wide <- bat_sum |> pivot_wider(names_from = sppcode, values_from = num_indiv)
head(bat_wide)
bat_wide <- bat_sum |> pivot_wider(names_from = sppcode,
values_from = num_indiv,
values_fill = 0)
head(bat_wide)
table(complete.cases(bat_wide)) # all true; no blanks
bat_wide2 <- bat_sum |> pivot_wider(names_from = sppcode,
values_from = num_indiv,
values_fill = 0,
names_prefix = "spp_")
head(bat_wide2)
bat_wide_yr <- pivot_wider(bat_sum,
names_from = Year,
values_from = num_indiv,
values_fill = 0,
names_prefix = "yr")
head(bat_wide_yr)
bat_long <- bat_wide |> pivot_longer(cols = -c(Site, Year),
names_to = "sppcode",
values_to = "num_indiv")
head(bat_long)
bat_long_yr <- pivot_longer(bat_wide_yr,
cols = -c(Site, sppcode),
names_to = "Year",
values_to = "num_indiv",
names_prefix = "yr") # strips this prefix from the new Year values
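# Note that pivot_longer() returns Year as character. If you plan to plot or
# filter on it, convert it back to numeric (a sketch using bat_long_yr above):
bat_long_yr$Year <- as.numeric(bat_long_yr$Year)
str(bat_long_yr)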
#------------------------------------
# Day 3: Join Code
#------------------------------------
bat_sites <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/bat_site_info.csv")
sort(unique(bat_sites$Site)) # Sites 1, 2, 3, 4, 5
sort(unique(bat_wide$Site)) # Sites 1, 2, 3, 5, 6
bat_full <- full_join(bat_sites, bat_wide, by = "Site")
table(bat_full$Site)
knitr::kable(bat_full, align = 'c') |>
kableExtra::scroll_box(height = "300px") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |>
kableExtra::column_spec(1:10, background = 'white', include_thead = T)
bat_inner <- inner_join(bat_sites, bat_wide, by = "Site")
table(bat_inner$Site)
knitr::kable(bat_inner) |>
kableExtra::scroll_box(height = "300px") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |>
kableExtra::column_spec(1:10, background = 'white', include_thead = T)
bat_left <- left_join(bat_sites, bat_wide, by = "Site")
table(bat_left$Site)
knitr::kable(bat_left) |>
kableExtra::scroll_box(height = "300px") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |>
kableExtra::column_spec(1:10, background = 'white', include_thead = T)
bat_right <- right_join(bat_sites, bat_wide, by = "Site")
table(bat_right$Site)
knitr::kable(bat_right) |>
kableExtra::scroll_box(height = "300px") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |>
kableExtra::column_spec(1:10, background = 'white', include_thead = T)
anti_join(bat_sites, bat_wide, by = "Site")
anti_join(bat_wide, bat_sites, by = "Site")
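# The counterpart to anti_join() is semi_join(): it keeps rows of x that have a
# match in y, without adding any columns from y. A sketch with the same bat data:
semi_join(bat_sites, bat_wide, by = "Site") # site info only for sites with captures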
spp_tbl <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_species_table.csv")
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
# find the columns in common
intersect(names(spp_tbl), names(trees)) # TSN and ScientificName
# left join species table onto trees, so we don't pull in species that never occur in the tree data
trees_spp <- left_join(trees,
spp_tbl |> select(TSN, ScientificName, CommonName),
by = c("TSN", "ScientificName"))
head(trees_spp)
# find the columns in common
intersect(names(spp_tbl), names(trees)) # TSN and ScientificName
# anti join of trees against species table, selecting only columns of interest
anti_join(trees, spp_tbl, by = c("TSN", "ScientificName")) |>
select(ParkUnit, PlotCode, SampleYear, ScientificName)
#------------------------------------
# Day 3: Dates and Time Code
#------------------------------------
codes <- read.csv("./data/datetime_codes.csv", encoding = "Latin-1")
knitr::kable(codes) |>
#kableExtra::scroll_box(width = "300px") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 11,
bootstrap_options = "condensed") |>
kableExtra::column_spec(1:2, background = 'white', include_thead = T)
Sys.time()
class(Sys.time()) # POSIXct POSIXt
Sys.Date()
class(Sys.Date()) # Date
# date with slashes and full year
date_chr1 <- "3/12/2026"
date1 <- as.Date(date_chr1, format = "%m/%d/%Y")
str(date1)
# date with dashes and 2-digit year
date_chr2 <- "3-12-26"
date2 <- as.Date(date_chr2, format = "%m-%d-%y")
str(date2)
# date written out
date_chr3 <- "March 12, 2026"
date3 <- as.Date(date_chr3, format = "%B %d, %Y") # %B = full month name
str(date3)
# Julian date as numeric
as.numeric(format(date1, format = "%j"))
# Return day of week
format(date1, format = "%A")
# Return abbreviated day of week
format(date1, format = "%a")
# Return date written out with full month name
format(date1, format = "%B %d, %Y")
# Return date written out with abbreviated month name
format(date1, format = "%b %d, %Y")
date1 + 1 # add a day
date1 + 7 # add a week
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
# by 15 days
seq.Date(date_list[1], date_list[2], by = "15 days")
# by month
seq.Date(date_list[1], date_list[2], by = "1 month")
# by 6 months
seq.Date(date_list[1], date_list[2], by = "6 months")
format(date1, format = "%Y%m%d")
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
seq.Date(date_list[1], date_list[2], by = "3 months")
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
seq.Date(date_list[1], date_list[2], by = "1 week")
unclass(as.POSIXct("2026-03-12 01:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York"))
unclass(as.POSIXlt("2026-03-12 01:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York"))
Sys.timezone()
OlsonNames()
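# Arithmetic on POSIXct works like it does on Dates, and difftime() lets you
# control the units. A minimal sketch with hypothetical timestamps:
t1 <- as.POSIXct("2026-03-12 01:30:00", tz = "America/New_York")
t2 <- as.POSIXct("2026-03-12 14:00:00", tz = "America/New_York")
difftime(t2, t1, units = "hours")
difftime(t2, t1, units = "mins")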
temp_data1 <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv")
head(temp_data1)
temp_data <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "date_time", "tempF")
View(temp_data)
knitr::kable(temp_data[1:50,], caption = "First 50 rows of temp_data") |>
kableExtra::scroll_box(height = "300px") |>
kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |>
kableExtra::column_spec(1:3, background = 'white', include_thead = T)
temp_data$timestamp <- as.POSIXct(temp_data$date_time,
format = "%m/%d/%Y %H:%M",
tz = "America/New_York")
head(temp_data)
temp_data$date <- format(temp_data$timestamp, "%Y%m%d")
temp_data$month <- format(temp_data$timestamp, "%b")
temp_data$time <- format(temp_data$timestamp, "%I:%M") # 12-hour clock; add %p for AM/PM
temp_data$hour <- as.numeric(format(temp_data$timestamp, "%I")) # use %H for the 24-hour clock
head(temp_data)
temp_data$month_num <- as.numeric(format(temp_data$timestamp, "%m"))
head(temp_data)
temp_data$julian <- as.numeric(format(temp_data$timestamp, "%j"))
head(temp_data)
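# With a date column pulled out of the timestamp, summarizing the logger data to
# daily values is short. A sketch assuming dplyr is loaded and temp_data is
# built as above:
temp_daily <- temp_data |>
summarize(mean_tempF = mean(tempF, na.rm = TRUE),
max_tempF = max(tempF, na.rm = TRUE),
.by = date)
head(temp_daily)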
#------------------------------------
# Day 2: Adv. ggplot Code
#------------------------------------
library(dplyr)
library(ggplot2)
library(patchwork) # for arranging ggplot objects
library(RColorBrewer) # for palettes
library(viridis) # for palettes
chem <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_water_chemistry_data.csv")
str(chem)
library(scales) # for show_col
# Local alternative if you've downloaded the data folder:
# chem <- read.csv("./data/NETN_water_chemistry_data.csv")
str(chem)
chem <- chem |> mutate(date = as.Date(EventDate, "%m/%d/%Y"),
year = as.numeric(format(date, "%Y")),
mon = as.numeric(format(date, "%m")),
doy = as.numeric(format(date, "%j")))
ACAD_lakes <- c("ACBUBL", "ACEAGL", "ACECHO", "ACJORD",
"ACLONG", "ACSEAL", "ACUHAD", "ACWHOL")
lakes_temp <- chem |> filter(SiteCode %in% ACAD_lakes) |>
filter(Parameter %in% "Temp_F")
head(lakes_temp)
ggplot(lakes_temp, aes(x = date, y = Value)) +
theme_bw() +
geom_point()
ggplot(lakes_temp,
aes(x = date, y = Value, color = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point()
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode,
shape = SiteCode)) +
theme_bw() +
geom_point() +
scale_color_manual(values = c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")) +
scale_fill_manual(values = c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")) +
scale_shape_manual(values = c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24))
site_cols <- c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")
site_shps <- c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24)
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point() +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake")
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point(color = "dimgrey", size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake")
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point(color = "dimgrey", size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") # answer
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_line() + # new line
geom_point(color = "dimgrey") +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = FALSE, span = 0.5, linewidth = 1) + # new line
geom_point(color = "dimgrey", alpha = 0.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
# Need to run these first
site_cols <- c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")
site_shps <- c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24)
# Now generate the plot
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) +
geom_point(color = "dimgrey", alpha = 0.85) + # or remove the alpha
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
geom_point(color = "dimgrey", alpha = 0.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
p_site <-
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 4)
p_site
p_year <-
ggplot(lakes_temp |> filter(year > 2015),
aes(x = doy, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_line(linewidth = 0.7) +
geom_point(color = "dimgrey", size = 2.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Day of year") +
facet_wrap(~year, ncol = 3)
p_year
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "black", alpha = 0.6, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temp. (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 2) +
theme(legend.position = 'bottom')
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "black", alpha = 0.6, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temp. (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 2) +
theme(legend.position = 'bottom')
pH <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "pH")
temp <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "Temp_F")
dosat <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "DOsat_pct")
cond <- chem |> filter(SiteCode == "ACJORD") |> filter(Parameter == "SpCond_uScm")
p_pH <-
ggplot(pH, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "pH", x = "Year")
p_temp <-
ggplot(temp, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "Temp (F)", x = "Year")
p_do <-
ggplot(dosat, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "DO (%sat.)", x = "Year")
p_cond <-
ggplot(cond, aes(x = date, y = Value)) +
theme_bw() +
geom_smooth(se = F, span = 0.5) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
labs(y = "Spec. Cond. (uScm)", x = "Year")
library(patchwork)
p_pH + p_temp + p_do + p_cond
library(patchwork)
p_pH / p_temp / p_do / p_cond + plot_layout(axes = "collect_x")
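# patchwork also supports mixed layouts and panel tags. A sketch reusing the
# four plots above:
(p_pH + p_temp) / (p_do + p_cond) + plot_annotation(tag_levels = "a")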
p_site2 <- p_site +
scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p_site2
# Find range of months in the data
range_mon <- range(lakes_temp$mon)
range_mon # 5:10
# Set up the date range as a Date type
range_date <- as.Date(c("5/1/2025", "11/01/2025"), format = "%m/%d/%Y")
axis_dates <- seq.Date(range_date[1], range_date[2], by = "1 month")
axis_dates
axis_dates_label <- format(axis_dates, "%b-%d")
# Find the doy value that matches each of the axis dates
axis_doy <- as.numeric(format(axis_dates, "%j"))
axis_doy
axis_doy_limits <- c(min(axis_doy)-1, max(axis_doy) + 1)
# Set the limits of the x axis as before and after the last sample,
# otherwise cuts off May 1 in axis
p_year +
scale_x_continuous(limits = axis_doy_limits,
breaks = axis_doy,
labels = axis_dates_label) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p_site +
scale_x_date(date_labels = "%Y", date_breaks = "4 years") +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
p_site3 <- p_site2 + theme(legend.position = 'bottom')
p_site3
p_site3 + geom_hline(yintercept = 75, linetype = 'dashed', linewidth = 1) +
geom_hline(yintercept = 50, linetype = 'dotted', linewidth = 1)
p_site4 <- p_site3 +
geom_hline(aes(yintercept = 75, linetype = "Upper"), linewidth = 1) +
geom_hline(aes(yintercept = 50, linetype = "Lower"), linewidth = 1) +
scale_linetype_manual(values = c("Upper" = "dashed", "Lower" = "dotted"), name = "WQ Threshold")
p_site4
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5, show.legend = FALSE) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 4) +
scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
legend.position = 'bottom') +
geom_hline(aes(yintercept = 75, linetype = "Upper"), linewidth = 1) +
geom_hline(aes(yintercept = 50, linetype = "Lower"), linewidth = 1) +
scale_linetype_manual(values = c("Upper" = "dashed", "Lower" = "dotted"), name = "WQ Threshold") +
guides(fill = guide_legend(override.aes = list(alpha = 1)))
p_site4 + guides(shape = 'none', color = 'none', fill = 'none') +
theme(legend.key.width = unit(0.8, "cm"))
#------------------------------------
# Day 3: ggplot Palettes Code
#------------------------------------
display.brewer.all(colorblindFriendly = TRUE)
p_pal <- ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(se = F, span = 0.5, linewidth = 1) +
geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temperature (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 4) +
scale_x_date(date_labels = "%Y", date_breaks = "2 years") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
geom_hline(aes(yintercept = 75, linetype = "Upper"), linewidth = 1) +
geom_hline(aes(yintercept = 50, linetype = "Lower"), linewidth = 1) +
scale_linetype_manual(values = c("Upper" = "dashed", "Lower" = "dotted"), name = "WQ Threshold")
p_pal + scale_color_brewer(name = "Lake", palette = "Set2", aesthetics = c("fill", "color")) +
guides(fill = guide_legend(override.aes = list(alpha = 1))) # solid symbols in legend
p_pal + scale_color_brewer(name = "Lake", palette = "Dark2", aesthetics = c("fill", "color")) +
guides(fill = guide_legend(override.aes = list(alpha = 1))) # makes symbols not transparent in legend
p_pal + scale_color_brewer(name = "Lake", palette = "RdYlBu", aesthetics = c("fill", "color")) +
guides(fill = guide_legend(override.aes = list(alpha = 1))) # makes symbols not transparent in legend
# viridis
scales::show_col(viridis(12), cex_label = 0.45, ncol = 6)
p_pal + scale_color_viridis_d(name = "Lake", aesthetics = c("fill", "color")) #default viridis
p_pal + scale_color_viridis_d(name = "Lake", aesthetics = c("fill", "color"), option = 'turbo')
p_heat <-
ggplot(lakes_temp, aes(x = mon, y = year, color = Value, fill = Value)) +
theme_bw() +
geom_tile() +
labs(y = "Year", x = "Month") +
facet_wrap(~SiteCode, ncol = 4) +
scale_x_continuous(breaks = c(5, 6, 7, 8, 9, 10),
limits = c(4, 11),
labels = month.abb[5:10]) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p_heat + scale_color_viridis_c(name = "Temp. (F)", aesthetics = c("fill", "color"))
p_heat + scale_color_viridis_c(name = "Temp. (F)", aesthetics = c("fill", "color"),
option = "plasma", direction = -1)
p_heat + scale_color_gradient(low = "#FCFC9A", high = "#F54927",
aesthetics = c("fill", 'color'),
name = "Temp. (F)")
p_heat + scale_color_gradient2(low = "navy", mid = "#FCFC9A", high = "#F54927",
aesthetics = c("fill", 'color'),
midpoint = mean(lakes_temp$Value),
name = "Temp. (F)")
p_heat + scale_color_gradientn(colors = c("#805A91", "#406AC2", "#FBFFAD", "#FFA34A", "#AB1F1F"),
aesthetics = c("fill", 'color'),
guide = "legend",
breaks = c(seq(40, 85, 5)),
name = "Temp. (F)")
p_heat + scale_color_gradient2(low = "#3E693D", mid = "#FDFFC7", high = "#7A6646",
aesthetics = c("fill", 'color'),
midpoint = mean(lakes_temp$Value),
name = "Temp. (F)")
#------------------------------------
# Day 3: Best Practices Code
#------------------------------------
# libraries
library(dplyr) # for mutate and filter
# parameters
analysis_year <- 2017
# data sets
df <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Filter to RAM sites, keep only the specified year, and add a numeric site column
df2 <- df |> filter(Site_Type == "RAM") |>
filter(Year == analysis_year) |>
mutate(site_num = as.numeric(substr(Site_Name, 5, 6)))
# Naming conventions (illustrative only; these aren't runnable code):
# snake_case # most common in R
# camelCase # capitalize new words after the first
# period.separation # separate words by periods
# whyWOULDyouDOthisTOsomeone # excess capitalization is a pain
# good word order
ACAD_wet <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_wet2 <- ACAD_wet |> filter(year > 2020)
ACAD_wet3 <- ACAD_wet2 |> mutate(plot_type = "RAM")
# bad word order
wet_ACAD <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_after_2020 <- wet_ACAD |> filter(year > 2020)
RAM_ACAD_2020 <- ACAD_after_2020 |> mutate(plot_type = "RAM")
# super long names
ACAD_wetland_sampling_data <- data.frame(years_plots_were_sampled = c(2020:2025), wetland_plots_sampled = c(1:6))
ACAD_wetland_sampling_data2 <- ACAD_wetland_sampling_data |> filter(years_plots_were_sampled > 2020)
# shorter still meaningful
ACAD_wet <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_wet2 <- ACAD_wet |> filter(year > 2020)
# Good code
trees_final <- trees |>
mutate(DecayClassCode_num = as.numeric(DecayClassCode),
Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
Date = as.Date(SampleDate, format = "%m/%d/%Y")) |>
rename("Species" = "ScientificName") |>
filter(IsQAQC == FALSE) |>
select(-DecayClassCode) |>
arrange(Plot_Name, TagCode)
# Same code, but much harder to follow
trees_final <- trees|>mutate(DecayClassCode_num=as.numeric(DecayClassCode), Plot_Name=paste(ParkUnit,PlotCode,sep = "-"), Date=as.Date(SampleDate,format="%m/%d/%Y"))|> rename("Species"="ScientificName")|>filter(IsQAQC==FALSE)|>select(-DecayClassCode)|>arrange(Plot_Name,TagCode)
# Good code
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line() +
geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24) +
labs(x = "Year",
y = "Annual visitors in 1000's") +
scale_y_continuous(limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', color = 'dimgrey'),
title = element_text(size = 10)
)
# Same code but hard to follow
ggplot(data=visits,aes(x=Year,y=Annual_Visits/1000))+geom_line()+geom_point(color="black",fill="#82C2a3",size=2.5,shape=24) +
labs(x = "Year", y = "Annual visitors in 1000's")+
scale_y_continuous(limits=c(2000,4500),breaks=seq(2000,4500,by=500))+
scale_x_continuous(limits=c(1994,2024),breaks=c(seq(1994,2024,by=5)))+
theme(axis.text.x=element_text(size=10,angle=45,hjust=1), panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),panel.background=element_rect(fill='white',color='dimgrey'),
title = element_text(size = 10))
#------------------------------------
# Day 1: Challenges Code
#------------------------------------
ACAD_wetland <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_wetland_data_clean.csv"
)
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
ACAD_wetland[c(2, 4, 6, 8), c(1, 2)]
# Option 1
length(unique(ACAD_wetland[, "Latin_Name"]))
# Option 2
length(unique(ACAD_wetland$Latin_Name)) # equivalent
# Option 1 - used unique to just return unique site name
unique(ACAD_wetland$Site_Name[ACAD_wetland$Protected == TRUE])
# Option 2
unique(ACAD_wetland[ACAD_wetland$Protected == TRUE, "Site_Name"])
mima12 <- subset(trees, PlotCode == 12)
nrow(mima12) # 12
mima12_as <- subset(trees, PlotCode == 12 & TreeStatusCode == "AS")
nrow(mima12_as) # 6
mima12 <- trees[trees$Plot_Name == "MIMA-012",]
table(mima12$TreeStatusCode) # 6
View(trees)
max_dbh <- max(trees$DBHcm, na.rm = TRUE)
trees[trees$DBHcm == max_dbh,]
View(trees)
max_dbh <- max(trees$DBHcm, na.rm = TRUE)
max_dbh #443
trees[trees$DBHcm == max_dbh,]
# Plot MIMA-016, TagCode = 1.
# create copy of trees data
trees_fix <- trees
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-016" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
range(trees$DBHcm, na.rm = TRUE)
range(trees_fix$DBHcm, na.rm = TRUE)
hist(ACAD_wetland$Ave_Cov)
#---- Day 2: Challenges Code ----
library(dplyr)
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
trees_final <- trees |>
mutate(DecayClassCode_num = as.numeric(DecayClassCode),
Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
Date = as.Date(SampleDate, format = "%m/%d/%Y")) |>
rename("Species" = "ScientificName") |>
filter(IsQAQC == FALSE) |>
select(-DecayClassCode)
ACAD_wetland <- read.csv(
"https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_wetland_data_clean.csv"
)
library(ggplot2)
library(dplyr) # for mutate
visits <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/ACAD_annual_visits.csv")
visits <- visits |> mutate(Annual_Visits = as.numeric(gsub(",", "", Annual_Visits)))
trees_final |> filter(Plot_Name == "MIMA-12") |> nrow()
trees_final |> filter(Plot_Name == "MIMA-12" & TreeStatusCode == "AS") |> nrow()
# Base R and dplyr combo
max_dbh <- max(trees_final$DBHcm, na.rm = TRUE)
trees_final |>
filter(DBHcm == max_dbh) |>
select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
# dplyr with slice
trees_final |>
arrange(desc(DBHcm)) |> # arrange DBHcm high to low via desc()
slice(1) |> # slice the top record
select(Plot_Name, SampleYear, TagCode, Species, DBHcm)
# Base R
# create copy of trees data
trees_fix <- trees
# find the problematic DBH value, and change it to 44.3
trees_fix$DBHcm[trees_fix$Plot_Name == "MIMA-016" & trees_fix$TagCode == 1 & trees_fix$DBHcm == 443] <- 44.3
# dplyr via replace
trees_fix <- trees |> mutate(DBHcm = replace(DBHcm, DBHcm == 443.0, 44.3))
range(trees$DBHcm, na.rm = TRUE)
range(trees_fix$DBHcm, na.rm = TRUE)
# read in wetland data if you don't already have it loaded.
ACAD_wetland <- read.csv("./data/ACAD_wetland_data_clean.csv")
# Base R using the with() function
ACAD_wetland$Status <- with(ACAD_wetland, ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected)
# Tidyverse
ACAD_wetland <- ACAD_wetland |> mutate(Status = ifelse(Protected == TRUE, "protected", "public"))
table(ACAD_wetland$Status, ACAD_wetland$Protected) # check your work
# Base R using the with() function and nested ifelse()
ACAD_wetland$abundance_cat <- with(ACAD_wetland, ifelse(Ave_Cov < 10, "Low",
ifelse(Ave_Cov >= 10 & Ave_Cov <= 50, "Medium", "High")))
# Tidyverse using case_when() and between(); note that rows with NA Ave_Cov
# fall through to the final TRUE and are labeled "High"
ACAD_wetland <- ACAD_wetland |> mutate(abundance_cat = case_when(Ave_Cov < 10 ~ "Low",
between(Ave_Cov, 10, 50) ~ "Medium",
TRUE ~ "High"))
table(ACAD_wetland$abundance_cat)
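# If NAs should stay NA, an NA-safe variant using .default (dplyr >= 1.1.0) is
# worth knowing. A sketch assuming the same ACAD_wetland object:
ACAD_wetland <- ACAD_wetland |>
mutate(abundance_cat = case_when(is.na(Ave_Cov) ~ NA_character_,
Ave_Cov < 10 ~ "Low",
between(Ave_Cov, 10, 50) ~ "Medium",
.default = "High"))
table(ACAD_wetland$abundance_cat, useNA = "ifany")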
# Using group_by()
ACAD_inv <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(Pct_Cov = sum(Ave_Cov),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv)
# Using summarize(.by)
ACAD_inv2 <- ACAD_wetland |>
summarize(Pct_Cov = sum(Ave_Cov), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_inv2) # should be the same as ACAD_inv
# Using group_by()
ACAD_spp <- ACAD_wetland |> group_by(Site_Name, Year, Invasive) |>
summarize(num_spp = n(),
.groups = 'drop') |> # optional line to keep console from being chatty
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp)
# Using summarize(.by)
ACAD_spp2 <- ACAD_wetland |>
summarize(num_spp = n(), .by = c(Site_Name, Year, Invasive)) |>
arrange(Site_Name) # sort by Site_Name for easier comparison
head(ACAD_spp2) # should be the same as ACAD_inv
# using the .by within mutate (newer solution)
ACAD_wetland <- ACAD_wetland |>
mutate(Site_Cover = sum(Ave_Cov),
.by = c(Site_Name, Year)) |>
mutate(rel_cov = (Ave_Cov/Site_Cover)*100,
.by = c(Site_Name, Year, Latin_Name, Common))
# older solution
ACAD_wetland <- ACAD_wetland |> group_by(Site_Name, Year) |>
mutate(Site_Cover = sum(Ave_Cov)) |>
ungroup() # good practice to ungroup after group_by()
table(ACAD_wetland$Site_Name, ACAD_wetland$Site_Cover) # check that each site has a unique value.
head(ACAD_wetland)
# Create new dataset because collapsing rows on grouping variables
# Using group_by() and summarize()
ACAD_wetland_relcov <- ACAD_wetland |> group_by(Site_Name, Year, Latin_Name, Common) |>
summarize(rel_cov = (Ave_Cov/Site_Cover)*100,
.groups = 'drop') # .groups = 'drop' already ungroups; no ungroup() needed
# Using summarize(.by = )
ACAD_wetland_relcov2 <- ACAD_wetland |>
summarize(rel_cov = (Ave_Cov/Site_Cover)*100,
.by = c("Site_Name", "Year", "Latin_Name", "Common"))
# Check that your relative cover sums to 100 for each site
relcov_check <- ACAD_wetland_relcov2 |> group_by(Site_Name, Year) |>
summarize(tot_relcov = sum(rel_cov), .groups = 'drop')
table(relcov_check$tot_relcov) # they should all be 100
pq <-
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
geom_line(linewidth = 0.75, linetype = 'dashed') +
geom_point(color = "black", fill = "#0080FF", size = 2.5, shape = 21) +
labs(x = "Year", y = "Annual visits in 1,000s") +
scale_y_continuous(limits = c(2000, 4500),
breaks = seq(2000, 4500, by = 500)) +
scale_x_continuous(limits = c(1994, 2024),
breaks = c(seq(1994, 2024, by = 5))) +
theme_classic()
pq
#------------------------------------
# Day 3: Challenges Code
#------------------------------------
library(dplyr)
library(tidyr) # for pivot_wider/pivot_longer
# bat capture data
bat_cap <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/example_bat_capture_data.csv")
library(dplyr)
# tree data
trees <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_data.csv")
# tree species table
spp_tbl <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_tree_species_table.csv")
# Hobo Temp data
temp_data <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "temp_time", "tempF")
temp_data$timestamp_temp <- as.POSIXct(temp_data$temp_time,
format = "%m/%d/%Y %H:%M",
tz = "America/New_York")
# Water chemistry data for ggplot section
library(dplyr)
library(ggplot2)
library(patchwork) # for arranging ggplot objects
library(RColorBrewer) # for palettes
library(viridis) # for palettes
chem <- read.csv("https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/NETN_water_chemistry_data.csv")
chem <- chem |> mutate(date = as.Date(EventDate, "%m/%d/%Y"),
year = as.numeric(format(date, "%Y")),
mon = as.numeric(format(date, "%m")),
doy = as.numeric(format(date, "%j")))
ACAD_lakes <- c("ACBUBL", "ACEAGL", "ACECHO", "ACJORD",
"ACLONG", "ACSEAL", "ACUHAD", "ACWHOL")
lakes_temp <- chem |> filter(SiteCode %in% ACAD_lakes) |>
filter(Parameter %in% "Temp_F")
bat_wide_yr <- pivot_wider(bat_sum, names_from = Year,
values_from = num_indiv,
values_fill = 0,
names_prefix = "yr")
head(bat_wide_yr)
bat_long_yr <- pivot_longer(bat_wide_yr,
cols = -c(Site, sppcode),
names_to = "Year",
values_to = "num_indiv",
names_prefix = "yr") # strips this prefix from the new Year values
# find the columns in common
intersect(names(spp_tbl), names(trees)) # TSN and ScientificName
# left join the species table to trees, because we don't want to include species absent from the tree data
trees_spp <- left_join(trees,
spp_tbl |> select(TSN, ScientificName, CommonName),
by = c("TSN", "ScientificName"))
head(trees_spp)
# find the columns in common
intersect(names(spp_tbl), names(trees)) # TSN and ScientificName
# anti join of trees against species table, selecting only columns of interest
anti_join(trees, spp_tbl, by = c("TSN", "ScientificName")) |>
select(ParkUnit, PlotCode, SampleYear, ScientificName)
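As a quick self-contained illustration of what `anti_join()` keeps (the data frames `a` and `b` below are made up): it returns the rows of the first table that have no match in the second.

```r
library(dplyr)

a <- data.frame(TSN = c(1, 2, 3), name = c("x", "y", "z"))
b <- data.frame(TSN = c(1, 3))

# Only TSN = 2 lacks a match in b, so only that row is returned
anti_join(a, b, by = "TSN")
```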
# date1 is assumed to be defined earlier in the Day 3 materials
format(date1, format = "%Y%m%d")
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
seq.Date(date_list[1], date_list[2], by = "3 months")
# "2026-01-01" "2026-04-01" "2026-07-01" "2026-10-01"
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y")
seq.Date(date_list[1], date_list[2], by = "1 week")
temp_data$month_num <- as.numeric(format(temp_data$timestamp_temp, "%m"))
head(temp_data)
temp_data$julian <- as.numeric(format(temp_data$timestamp_temp, "%j"))
head(temp_data)
# Will need to run these first
site_cols <- c("ACBUBL" = "#7FC7E1", "ACEAGL" = "#ED5C5C",
"ACECHO" = "#F59617", "ACJORD" = "#A26CF5",
"ACLONG" = "#1952CF", "ACSEAL" = "#F3F56C",
"ACUHAD" = "#FFA6C8", "ACWHOL" = "#8FD184")
site_shps <- c("ACBUBL" = 21, "ACEAGL" = 23,
"ACECHO" = 24, "ACJORD" = 25,
"ACLONG" = 23, "ACSEAL" = 21,
"ACUHAD" = 25, "ACWHOL" = 24)
# Now generate the plot
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_point(color = "dimgrey", size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)") # answer
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) +
geom_point(color = "dimgrey", alpha = 0.85) + # or remove the alpha
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
geom_point(color = "dimgrey", alpha = 0.5) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(x = "Year", y = "Temp. (F)")
ggplot(lakes_temp, aes(x = date, y = Value, color = SiteCode,
fill = SiteCode, shape = SiteCode)) +
theme_bw() +
  geom_smooth(se = FALSE, span = 0.5) +
geom_point(color = "black", alpha = 0.6, size = 2) +
scale_color_manual(values = site_cols, name = "Lake") +
scale_fill_manual(values = site_cols, name = "Lake") +
scale_shape_manual(values = site_shps, name = "Lake") +
labs(y = "Temp. (F)", x = "Year") +
facet_wrap(~SiteCode, ncol = 2) +
theme(legend.position = 'bottom')
# p_site, p_pal, and p_heat below are ggplot objects built earlier in the Day 3 materials
p_site +
scale_x_date(date_labels = "%Y", date_breaks = "4 years") +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
p_pal + scale_color_brewer(name = "Lake", palette = "RdYlBu", aesthetics = c("fill", "color")) +
guides(fill = guide_legend(override.aes = list(alpha = 1))) # makes symbols not transparent in legend
p_heat + scale_color_gradient2(low = "#3E693D", mid = "#FDFFC7", high = "#7A6646",
aesthetics = c("fill", 'color'),
midpoint = mean(lakes_temp$Value),
name = "Temp. (F)")
?plot
?dplyr::filter
mean_x <- mean(c(1, 3, 5, 7, 8, 21) # missing closing parenthesis
mean_x <- mean(c(1, 3, 5, 7, 8, 21)) # correct
birds <- c("black-capped chickadee", "golden-crowned kinglet, "wood thrush") # missing quote after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
birds <- c("black-capped chickadee", "golden-crowned kinglet" "wood thrush") # missing comma after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
x_mean <- maen(x) # misspelled mean
x_mean <- mean(x) # Corrected
# Missing comma to indicate subsetting rows (records)
ACAD_wetland2 <- ACAD_wetland[!is.na(ACAD_wetland$Site_Name)]
# Correct
ACAD_wetland2 <- ACAD_wetland[!is.na(ACAD_wetland$Site_Name), ]
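To see why the comma matters (`df` below is a made-up data frame): with single brackets, a data frame subset of the form `df[i, ]` selects rows, while `df[i]` selects columns.

```r
df <- data.frame(Site_Name = c("A", NA, "B"), Count = c(1, 2, 3))

# With the comma: keep the rows where Site_Name is not NA (all columns)
df[!is.na(df$Site_Name), ]

# Without the comma, R treats the logical vector as a column selector;
# here its length (3) exceeds the number of columns (2), so this fails
# rather than dropping the NA row
```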