Maine Maritime Academy Intro to R Training: April 2026

Prep for Training

Installing required software

The only prerequisite for the R training is to install the latest version of R and RStudio on your computer and the packages we're using for the training. These are available online and are free to download and install. We'll talk about the difference between R and RStudio on the first day, but for now, just make sure they're installed.

  1. Install R: Download and install R 4.5.3 for Windows or for macOS.
  2. Install RStudio: The main RStudio page has links to download RStudio-2026.01.01-403.EXE for Windows (assuming your operating system is at least Windows 10), and RStudio-2026.01.01-403.DMG for macOS (assuming macOS 14 or greater).

NOTE: If you already have R and RStudio on your computer, the R version should be at least 4.4.3 and the RStudio version at least 2025.06.2-418 to make sure everyone's code behaves the same way. You can check your software versions by opening RStudio. The console in the bottom left will display the R version. Check the RStudio version by going to Help > About RStudio. The first 4 numbers should be 2025 or higher.
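
You can also check the R version with code instead of reading it off the console banner. The sketch below is one minimal way to do it; `r_ok` is just an illustrative name, and the RStudio version still needs to be checked through Help > About RStudio.

```r
# Print the full R version string (the same text shown when the console starts)
R.version.string

# Compare the installed R version to the minimum for this training
r_ok <- getRversion() >= "4.4.3"
r_ok  # TRUE means your R version is new enough
```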


Required Packages

A number of packages are required to follow along with the data wrangling and visualization sessions. Please try to install these in RStudio ahead of time by running the code below. If you don't know how to run the code, view the Running Code Screencast below to see how.

packages <- c("tidyverse", # for Day 2 and 3 data wrangling
              "RColorBrewer", "viridis", "patchwork", # for Day 3 ggplot
              "readxl", "writexl", # for day 1 importing from excel
              "car") # for Levene's test - also a great stats R package

install.packages(setdiff(packages, rownames(installed.packages())))  

# Check that installation worked
library(tidyverse) # turns on core tidyverse packages
library(RColorBrewer) # palette generator
library(viridis) # more palettes
library(patchwork) # multipanel plots
library(readxl) # reading xlsx
library(writexl) # writing xlsx
library(car) # Levene's test and other stats tools

Note that when you load the tidyverse via library(tidyverse), you will get the following message in your console. The message is not an error; it means the tidyverse loaded successfully on your machine.
tidyverse message
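
If you'd like a quick report of which packages installed successfully, a minimal sketch like the one below may help. It repeats the packages vector from the install code above; `is_installed` and `missing_pkgs` are illustrative names, not part of the training code.

```r
packages <- c("tidyverse", "RColorBrewer", "viridis", "patchwork",
              "readxl", "writexl", "car")

# requireNamespace() returns TRUE if a package can be loaded, FALSE if not
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need to be installed
missing_pkgs <- packages[!is_installed]
missing_pkgs  # character(0) means nothing is missing
```

If anything is listed in `missing_pkgs`, rerun the install.packages() line above for those packages.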


Running Code Screencast


Check you can download data from github

We'll download the training datasets from github as we run through the code. This check makes sure your computer doesn't have security restrictions that prevent importing datasets from github into R. All datasets used in this training will also be available via a zip file. Hosting the datasets on github just makes it easier to ensure everyone downloads the same version from the same place.

Copy the code below into RStudio and run it. If you don't get an error message, you're good.

If you do get an error message, you'll just need to import the datasets locally using the provided zip file.

motinv <- read.csv(
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/BASHAR_motile_invert_counts.csv")

Optional Reading

This is completely optional, but if you have any time before training starts, I highly recommend reading Chapter 2: R basics and workflows in STAT545. This is an online book based on a graduate-level statistics class taught by Dr. Jenny Bryan. She's a stats professor turned senior programmer at RStudio, and I really like how she approaches programming in R in this book.


About the Training
  • Timing: The training will take place over three consecutive Fridays (4/3, 4/10, 4/17). Each day will run from 12:30 - 3:30pm EST. We'll take short breaks between topics.
  • Office Hours: The day prior to training, I will be available virtually from 12:30 - 2pm EST for office hours in case there are questions that couldn't be handled before or during training. I will email participants a Teams link for these office hours. Folks are also welcome to email me questions.
  • Structure: For most of the training, I will share my screen as I go through the website and then demo with live coding. Participants should have their own laptop to follow along with the code.
  • Objectives: Three days is barely enough to scratch the surface on what you can do in R. My goals with this training are to:
    1. Help you get beyond the initial learning curve that can be really challenging to climb on your own.
    2. Expose you to what I consider to be the useful things R can do for us.
    3. Provide you the tools needed to continue advancing your R skills on your own.
  • Credit: Much of this training was borrowed heavily from NPS IMD Intro to R training in 2022. A ton of credit for this training goes to the developers of those lessons:
    • Day 1 Intro to R: Sarah Wright and Andrew Birch
    • Day 2 Data Wrangling: John Paul Schmit and Lauren Pandori
    • Day 3 Data Visualization: Ellen Cheng and Kate Miller (Spatial Data)
    • Day 4 Programming Best Practices: Sarah Wright and Thomas Parr
  • Feedback: To help me improve this training for future sessions, please leave feedback in the training feedback form in the initial email. You can submit feedback multiple times and don't need to answer every question. Responses are anonymous.


Day 1: Intro to R

Day 1 Goals

Goals for Day 1:

  1. Get comfortable navigating RStudio, such as opening a new project or script, running code and viewing the output, etc.
  2. Ability to import and save .csv and .xlsx files.
  3. Basic understanding of variables, functions, and data frames.
  4. Ability to explore data frames, such as dimensions (rows and columns), min/max of different columns, data type of column, basic plotting, etc.
  5. Basic understanding of square brackets to view and filter data.frame[rows, columns].
  6. Exposure to NAs (blanks).
  7. Able to access help within and outside of R.


Feedback: Please leave feedback in the training feedback form. You can submit feedback multiple times and don't need to answer every question. Responses are anonymous.

R journey
Artwork by @allison_horst


Intro to R

Why I love R:

R welcoming illustration Artwork by @allison_horst

There are many reasons to use R. Some of my top reasons are below:
  • It's free!
  • Thorough, helpful, and welcoming user community, including a ton of freely available online help and learning resources (see Resources tab).
  • Large user community among researchers to collaborate and share code.
  • Language was designed by statisticians to facilitate data analysis and visualization.
    • Relatively shallow learning curve compared to other coding languages (e.g. python).
    • The developers' philosophy was that you shouldn't have to know how to program to learn R. As you become more advanced and take on more complicated tasks, learning how to program will benefit your work.
  • Code documents your workflow.
  • Code builds on code.
  • Automating tasks, like QA/QC, compiling/querying data, calculating summary statistics, and building dashboards has improved our data quality and made our data accessible and easy to work with for other users.

Other benefits of R:
  • Base R maintains backwards compatibility, so code written in base R should run regardless of R version.
    • Caveats are that packages are not guaranteed to be backwards compatible.
    • Python is not backwards compatible
  • The tidyverse, which is a collection of really useful R packages, makes code more readable and consistently formatted. Tidyverse packages aren't always backwards compatible, but they tend to be pretty stable and are super helpful for data wrangling and plotting.
  • RStudio can integrate with other coding languages, such as SQL, HTML/CSS, python and javascript.

Recipe for learning R

Everyone learns differently, but the ingredients that I see most reliably lead to success are:
  • Community: Finding a group of other R users you can reach out to when you're stuck or need feedback is invaluable. I was lucky to be part of a group that was learning R together. I still collaborate with many of these folks. I'm hoping you all see this group as your R community.
  • Persistence: Keep trying new things in R, even if you ultimately have to abandon attempts and go back to what you know, like Access or Excel. Persistence pays off.
  • Fearlessness: You have to be okay with failing the first, second, or tenth time to solve a problem, at least at first. As you get more comfortable, your success rate will improve. Throughout that entire process, you're learning R.
  • Googling: Half of being a good coder is learning how to google what you're trying to do, or the issue you're having. At first, you may not find the answers you're looking for, but by reading help pages, like StackOverflow, you're learning to read code and seeing solutions that may help you in the future.
Debugging rollercoaster
Artwork by @allison_horst

AI soapbox

Why I don't use AI to write code:
  • Behind the scenes, AI is taking answers from websites and other sources without crediting them.
  • There's a huge environmental footprint to run the generative AI servers.
  • Research has shown that people who use AI to write for them lose their ability to write and think critically over time. Writing code isn't that different. If you're not actively writing the code you are using, your ability to debug code and verify that it is doing what you expect may be weakened. A recent article in Frontiers in Ecology and the Environment cautioning early career scientists against using AI for scientific writing captures this concern well.
  • To exclude AI responses from google search results, include -ai in the google search box.

R and RStudio

About R

R is a programming language that was originally developed by statisticians for statistical computing and graphics. R is free and open source. That means you will never need a paid license to use it, and you can view the underlying source code of any function and suggest fixes and improvements. Since its first official release in 1995, R has remained one of the leading programming languages for statistics and data visualization, and its capabilities continue to grow.

When you install R, it comes with a simple user interface that lets you write and execute code. However, writing code in this interface is similar to writing a report in Notepad: it's simple and straightforward, but you likely need more features than Notepad has to format your document. This is where RStudio comes in.

For more information on the history of R, visit the R Project website.


About RStudio

RStudio is what's called an integrated development environment (IDE), which is essentially a shell around the R program. RStudio makes programming in R easier by color coding different types of code, auto-completing code, flagging mistakes, and providing many useful tools with the push of a button or key stroke (e.g. viewing help info).


RStudio Anatomy

When you open RStudio, you typically see 4 panes:
RStudio panes

Source

This is primarily where you write code. When you create a new script or open an existing one, it displays here. In the screenshot above, there's a script called bat_data_wrangling.R open in the source pane. Note that if you haven't yet opened or created a new script, you won't see this pane until you do.

The source pane color-codes your code to make it easier to read, and detects syntax errors (the coding equivalent of a spell checker) by flagging the line number with a red "x" and showing a squiggly line under the offending code.

When you're ready to run all or part of your script:
  • Highlight the line(s) of code you want to run
  • Either click the "Run" button (top right of the source pane) or press Ctrl+Enter (Cmd+Return on macOS).
At this point, the code is sent to the console (the bottom left pane). You'll first see your code appear in the console, and then you'll see the output if there is any.

Console

This is where the code actually runs. When you first open RStudio, the console will tell you the version of R that you're running (should be R 4.4.3 or greater).

While most often you'll run code from a script in the source pane, you can also run code directly in the console. Code in the console won't get saved to a file, but it's a great way to experiment and test out lines of code before adding them to your script in the source pane. The console is also where errors appear if your code breaks. Deciphering errors can be a challenge that gets easier over time. Googling errors is a good place to start.

Environment/History/Connections
  • Environment: This is where you can see what is currently in your environment. Think of the environment as temporary storage for objects - things like datasets and stored values - that you are using in your script(s). You can also click on objects and view them. Anything you see in your environment is temporary and it will disappear when you restart R. If there is something in your environment that you want to access in the future, make sure your script is able to reproduce it (or save it to a file).
  • History: This shows the code you've run in the current session. It's not good to rely on it, but it can be a way to recover code you ran in the console and later realized you needed in your script.
  • Connections: This is one way to connect R to a database.
  • Git: If you have installed Git on your computer, you may see a Git tab. We won't talk much about it in this training, but this is where you'll keep track of changes to your code.
  • Tutorial: This has some interactive tutorials that you can check out if you are interested.

Workspace
  • Files: This tab shows the files within your working directory (typically the folder where your current code project lives). More on this later.
  • Plots: This tab will show plots that you create.
  • Packages: This tab allows you to install, load, and update packages, and also view the help files within each package. You can also access these files in code.
  • Help: Allows you to search for and view documentation for packages that are installed on your computer.
  • Viewer: Shows HTML outputs produced by R Markdown, R Shiny, and some plotting and mapping packages.


RStudio Global Options

There are several settings in the Global Options that everyone should check to make sure we all have consistent settings. Go to Tools -> Global Options and follow the steps below.
  1. Under the General tab, you should see that your R Version is [64-bit] and the version is R-4.4.3 or greater. If it's not, you probably need to update R. Let me know if you need help with this.
  2. Also in the General tab, make sure you are not saving your environment. To do this, uncheck the Restore .RData into your workspace at startup option. When this option is checked, RStudio saves your current working environment (the stuff in the Environment tab) when you exit. The next time you open RStudio, it restores that same environment.
    • This may seem useful, but part of the point of using R is that your code should return the same results every time you run it. Clearing the environment when you close RStudio forces you to run your code with a clean slate.
    • Set Save workspace to .RData on exit: to Never. The only reason not to set to "Never" is if you are working with a huge dataset that takes a long time to load and process. In that case, you may want to set Save workspace to .RData on exit to "Ask". When you close RStudio, it will ask you if you want to save your workspace image.
  3. Change the default pipe to the base R pipe by going to the Code tab and checking the box Use native pipe operator, |> (requires R 4.1+). We will discuss what this pipe means later in the training.
  4. Most other settings are whatever you prefer. For example, to change the color of your background and text, go to the Appearance tab. I prefer Cobalt.
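
As a small preview of the native pipe from step 3, the sketch below shows the same calculation written with and without it. The numbers and the name `piped_total` are made up for illustration.

```r
# Without the pipe: read from the inside out
sum(sqrt(c(1, 4, 9)))

# With the native pipe: read from left to right
piped_total <- c(1, 4, 9) |> sqrt() |> sum()
piped_total  # 6
```

Both versions take the square root of each number and then add the results; the pipe just passes the output of one step into the next.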

Project and File Setup

File organization

File organization is an important part of being a good coder. Keeping code, input data, and results together in one place will protect your sanity and the sanity of the person who inherits the project. RStudio projects help with this. Creating a new RStudio project for each new code project makes it easier to manage settings and file paths.

Before we create a project, take a look at the Console tab. Notice that at the top of the console there is a folder path. That path is your current working directory.
Default working directory

If you refer to a file in R using a relative path, for example ./data/my_data_file.csv, R will look in your current working directory for a folder called data containing a file called my_data_file.csv.

Note the use of forward slashes instead of back slashes for file paths. You can use either a forward slash (/) or a double back slash (\\) in file paths. The two paths below are equivalent, and show the kind of full file path that the relative path above stands in for.

# forward slash file path approach
"C:/Users/KMMiller/OneDrive - DOI/data/"

# backward slash file path approach
"C:\\Users\\KMMiller\\OneDrive - DOI\\data\\"

Using relative paths is helpful because the full path will be specific to your computer and likely won't work on a different computer. But there's no guarantee that everyone has the same default R working directory. This is where projects come in. Projects package all of your code, data, output, etc. into a file type that is easily transferable to other machines regardless of file location.
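
To see how working directories and relative paths fit together, here is a minimal sketch using base R functions. The file my_data_file.csv is the hypothetical example from the text above, so file.exists() will return FALSE unless you actually have a file by that name.

```r
# Where is R currently looking for files?
getwd()

# Build a relative path; file.path() joins the pieces with forward slashes
rel_path <- file.path("data", "my_data_file.csv")
rel_path  # "data/my_data_file.csv"

# Check whether a file exists before trying to read it
file.exists(rel_path)
```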

Start a new Project

To demonstrate the value of a project, we'll create and use one for this class. Click File > New Project. In the window that appears, select New Directory, then select New Project. You will be prompted for a directory name. This is the name of your project folder. For this class, call it mma_r_intro. Next, you'll select what folder to keep your project folder in. Documents/R is a good place to store all of your R projects but it's up to you. When you are done, click on Create Project.

Step 1. Select New Directory
New project step 1

Step 2. Select New Project
New project step 2

Step 3. Name project mma_r_intro. Save project to a place you can find it. Don't worry about whether the git repository box is checked or not.
New project step 3


If you successfully started a project named mma_r_intro, you should see it listed at the very top right of your screen. As you start new projects, you'll want to check that you're working in the right one before you start coding. Take a look at the Console tab again. Notice that your current working directory is now your project folder. When you look in the Files tab of the bottom right pane, you'll see that it also defaults to the project folder.

We also want to create a folder called "data", where we will store datasets we're using for this class. To do that, you can either add a new folder in Windows File Explorer (or Finder on macOS), or run the code below. As long as you're working within your project (project name should be at the top right of window), a folder named data will appear within your project. You can check that it worked by using the list.files() function, which lists everything in the working directory of your project.


Add data folder and download data

We're going to store all of our datasets for this training in a data folder. First, create the data folder in your project using the code below.

Create data folder

dir.create("data")
list.files() # you should see a data folder listed 
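
Running dir.create("data") a second time gives a warning because the folder already exists. If you expect to rerun your script, one option is to check for the folder first, as in this minimal sketch:

```r
# Only create the data folder if it isn't already there
if (!dir.exists("data")) {
  dir.create("data")
}

dir.exists("data")  # TRUE once the folder exists
```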

Download files from github repo

We're going to try downloading all of the datasets you're going to use for the training into this data folder. Copy the code in the chunk below and paste it into a new script (File > New File > R Script). Then select and run the code.

file_list <- c(
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/ACAD_Jordan_Pond_water_chem.csv",
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/BASHAR_motile_invert_counts.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/BASHAR_Point_Intercept_data.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/bat_site_info.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/bat_captures.csv", 
  "https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/motile_invert_species_table.csv",
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/SHIHAR_photoplot_cover.csv")

file_names <- sub(".*data/", "",  file_list)

lapply(seq_along(file_list), function(x){
    download.file(file_list[x], 
                  destfile = paste0("./data/", file_names[x]))
})

If you're not able to download, extract the MMI_R_training_data.zip file into this new data folder.
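
The download code above builds each file name by stripping everything in the URL up to the data/ folder. The sketch below walks through that step for one of the URLs, and shows basename() as an alternative that gives the same result here. The object names are illustrative, and the final line only returns TRUE if the download worked.

```r
# One of the URLs from file_list above
url <- "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/bat_captures.csv"

# sub() replaces everything up to and including "data/" with nothing
name_from_sub <- sub(".*data/", "", url)

# basename() keeps only the text after the last slash
name_from_basename <- basename(url)

name_from_sub       # "bat_captures.csv"
name_from_basename  # "bat_captures.csv"

# After downloading, check whether the file landed in the data folder
file.exists(file.path("data", name_from_sub))
```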

Start Coding

Start a new script

First let's create a new R script file called day_1_script.R. Make sure you are working in the mma_r_intro project that you just created. Click on the New File icon in the top left corner. In the dropdown, select R Script. The source pane will appear with an untitled empty script. Go ahead and save it by clicking the Save icon (and make sure the Source on Save checkbox is deselected). Call your new script day_1_script.R.

Coding basics

We'll start with something simple. Basic math in R is pretty straightforward and the syntax is similar to simply using a graphing calculator. You can use the examples below or come up with your own. Even if you're using the examples, try to actually type the code instead of copy-pasting - you’ll learn to code faster that way.

To run a single line of code in your script, place your cursor anywhere in that line and press CTRL+ENTER (or click the Run button in the top right of the script pane). To run multiple lines of code, highlight the lines you want to run and hit CTRL+ENTER or click Run.

To leave notes in your script, use the hashtag/pound sign (#). R treats any text after a # as a comment and doesn't run it, and RStudio displays comments in a different color. Commenting your code is one of the best habits you can form. Comments are a gift to your future self and anyone else who tries to use your code.

Type code below in your script and run each line.

# Commented text: try this line to generate some basic text and become familiar with where results will appear:
print("Welcome to R!")
## [1] "Welcome to R!"
# simple math
1+1
## [1] 2
(2*3)/4
## [1] 1.5
sqrt(9)
## [1] 3
# calculate basal area (cm^2) of a tree with 14.6 cm diameter: area = pi * radius^2
# note that pi is a built-in constant in R
((14.6/2)^2)*pi
## [1] 167.4155
# get the cosine of 180 degrees - note that trig functions in R expect angles in radians
cos(pi)
## [1] -1
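
If your angles are in degrees, multiply by pi/180 to convert them to radians before using the trig functions. A quick sketch:

```r
# 180 degrees converted to radians inside the call
cos(180 * pi / 180)  # same result as cos(pi)

# 90 degrees converted to radians
sin(90 * pi / 180)
```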

Coding Tip: Notice that when you run a line of code, the code and the result appear in the console. You can also type code directly into the console, but it won't be saved anywhere. As you get more comfortable with R, it can be helpful to use the console as a "scratchpad" for experimenting and troubleshooting. For now, it's best to err on the side of saving your code as a script so that you don't accidentally lose useful work.


Variables

Occasionally, it's enough to just run a line of code and display the result in the console. But typically our code is more complex than adding one plus one, and we want to store the result and use it later in the script. This is where variables come in. Variables allow you to assign a value (whether that's a number, a data table, a chunk of text, or any other type of data that R can handle) to a short, human-readable name. Anywhere you put a variable in your code, R will replace it with its value when your code runs. Variables are also called objects in R.

R uses the <- symbol for variable assignment. If you've used other programming languages, you may be tempted to use = instead. It will work, but there are subtle differences between <- and =, so you should get in the habit of using <-.

R is case-sensitive. So if you name one object fishdata and another Fishdata or FISHDATA, R will interpret these all as unique objects. While you can do things like this, it's best practice not to use the same name for different objects, as it makes code difficult to follow.
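
A quick sketch of case sensitivity, using made-up values:

```r
fishdata <- 10
Fishdata <- 20

fishdata              # 10
Fishdata              # 20
fishdata == Fishdata  # FALSE: these are two separate objects
```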

Type code below to assign values to variables named a and b

# the value of 12.098 is assigned to variable 'a'
a <- 12.098

# and the value 65.3475 is assigned to variable 'b'
b <- 65.3475

# we can now perform whatever mathematical operations we want using these two 
# variables without having to repeatedly type out the actual numbers:

a*b
## [1] 790.5741
(a^b)/((b+a))
## [1] 7.305156e+68
sqrt((a^7)/(b*2))
## [1] 538.7261

In the code above, we assign the variables a and b once. We can then reuse them as often as we want. This is helpful because we save ourselves some typing, reduce the chances of making a typo somewhere, and if we need to change the value of a or b, we only have to do it in one place.

Also notice that when you assign variables, you can see them listed in your Environment tab (top right pane). Remember, everything you see in the environment is just in R's temporary memory and won't be saved when you close out of RStudio.
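
You can also inspect and clean up the environment with code. The sketch below re-creates a and b so it runs on its own, then uses the base R functions ls() and rm().

```r
a <- 12.098
b <- 65.3475

ls()             # lists the names of objects in your environment, including "a" and "b"

rm(a)            # removes a from the environment
"a" %in% ls()    # FALSE: a is gone
"b" %in% ls()    # TRUE: b is still there
```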

All of the examples you've seen so far are fairly contrived for the sake of simplicity. Let's take a look at some code that everyone here will make use of at some point: reading data from a CSV.


Functions

It's hard to get very far in R without making use of functions. Think of a function as a programmed task that takes some kind of input (the argument(s)) from the user and outputs a result (the return value).

anatomy of a function
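
Here is a short sketch of arguments and return values using two base R functions, round() and mean(). The input numbers are made up.

```r
# Arguments can be matched by name...
round(x = 3.14159, digits = 2)   # 3.14

# ...or by position, in the order the function defines them
round(3.14159, 2)                # 3.14

# Some arguments are optional and change how the function behaves
mean(c(1, NA, 3))                # NA, because one value is missing
mean(c(1, NA, 3), na.rm = TRUE)  # 2
```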


Coding Tip: Note the difference in how RStudio color codes what it thinks are functions. There are a lot of pre-programmed functions in base R, which is what comes along with R when you install R. Installing R packages will add additional functions. You can also build your own. Names that R recognizes as a function are color coded differently than what R recognizes as text, numbers, etc. It's good practice to not use existing functions as new object names.

Commonly used base R functions include:
  • mean(): calculate the mean of a set of numbers
  • min(): calculate the minimum of a set of numbers
  • max(): calculate the maximum of a set of numbers
  • range(): calculate the min and max of a set of numbers
  • sd(): calculate the standard deviation of a set of numbers
  • sqrt(): calculate the square root of a value

Calculate mean and range to see how functions work

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# equivalent to x <- 1:10

# bad coding
#mean <- mean(x)

# good coding 
mean_x <- mean(x)
mean_x
## [1] 5.5
range_x <- range(x)
range_x
## [1]  1 10
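
To see why the "bad coding" line above is worth avoiding, the sketch below runs it on purpose. Nothing breaks, because R looks for a function when a name is followed by parentheses, but the code becomes harder to read.

```r
x <- 1:10

mean <- mean(x)  # legal, but now "mean" is both a function and a number
mean             # 5.5 (the stored number)
mean(x)          # 5.5 (R still finds the function)

rm(mean)         # remove the confusing object; the function is untouched
```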


Importing and Saving Data

Most of the work we do in R relies on one or more existing datasets that we want to query or summarize, rather than creating our own in R. Importing data in R is therefore an important skill. R can import just about any data type, including CSV and MS Excel files. You can also import tables from MS Access and SQL databases using ODBC drivers. That's beyond the scope of this class, but I can share examples for anyone needing to import from a database. For now, I'll show how to work with CSVs and Excel spreadsheets.

Import CSV

We use the read.csv() function to import CSVs in R. The read.csv() function takes the file path or url to the CSV as input and outputs a data frame containing the data from the CSV. Here we're going to read a CSV from a website, then save that in the data folder of our project. We'll talk more about what data frames are next.

Run the following line to import a teaching dataset of motile invertebrates collected in the Bass Harbor rocky intertidal zone from the github repository for this training.

# read in the data from BASHAR_motile_invert_counts.csv on github and assign it as a data frame to the variable "motinv"
motinv <- read.csv(
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/BASHAR_motile_invert_counts.csv")

View the data in a separate window by running the View() function.

# View the motinv data frame we just created
View(motinv)

Or, check out the first few or last few records in your console.

# Look at the top 6 rows of the data frame
head(motinv)
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
## 6 Littorina littorea Common periwinkle      LITLIT      0         2         No

# Look at the bottom 6 rows of the data frame
tail(motinv)
##     Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 677    NETN     ACAD   BASHAR 7/18/2021 2021 FALSE       R5     Red Algae
## 678    NETN     ACAD   BASHAR 5/22/2022 2022 FALSE       R5     Red Algae
## 679    NETN     ACAD   BASHAR 5/22/2022 2022 FALSE       R5     Red Algae
## 680    NETN     ACAD   BASHAR 5/22/2022 2022 FALSE       R5     Red Algae
## 681    NETN     ACAD   BASHAR 6/12/2023 2023 FALSE       R5     Red Algae
## 682    NETN     ACAD   BASHAR 6/29/2024 2024 FALSE       R5     Red Algae
##                ScientificName        CommonName SpeciesCode Damage No.Damage
## 677 Testudinalia testudinalis            Limpet      TECTES      0         1
## 678        Littorina littorea Common periwinkle      LITLIT      5        45
## 679        Littorina obtusata Smooth periwinkle      LITOBT      0         1
## 680 Testudinalia testudinalis            Limpet      TECTES      0         2
## 681        Littorina littorea Common periwinkle      LITLIT      2        26
## 682        Littorina littorea Common periwinkle      LITLIT      0         5
##     Subsampled
## 677         No
## 678         No
## 679         No
## 680         No
## 681         No
## 682         No


Save CSV

Now we'll write the data frame to disk as a CSV, then import it from your computer.

# Write the data frame to your data folder using a relative path. 
# By default, write.csv adds a column with row names that are numbers. I don't
# like that, so I turn that off.
write.csv(motinv, "./data/BASHAR_motile_invert_counts.csv", row.names = FALSE)

Make sure the writing to disk worked by importing the CSV from your computer

# Read the data frame in using a relative path
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

# Equivalent code to read in the data frame using full path on my computer, but won't match another user.
motinv <- read.csv("C:/Users/KMMiller/OneDrive - DOI/NETN/R_Dev/MMA_R_Training_2026/data/BASHAR_motile_invert_counts.csv")
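
To see what the row.names argument changes, the sketch below writes a small made-up data frame both ways and compares the column names after reading the files back in. It uses temporary files so nothing is added to your project; `toy`, `with_rownames`, and `no_rownames` are illustrative names.

```r
# Small made-up data frame, for illustration only
toy <- data.frame(species = c("LITLIT", "LITOBT"), count = c(2, 6))

# Temporary files so nothing is written to your project folder
with_rownames <- tempfile(fileext = ".csv")
no_rownames   <- tempfile(fileext = ".csv")

write.csv(toy, with_rownames)                   # default: row names included
write.csv(toy, no_rownames, row.names = FALSE)  # row names turned off

names(read.csv(with_rownames))  # "X" "species" "count"
names(read.csv(no_rownames))    # "species" "count"
```

The extra X column in the first file is the row numbers that write.csv() added by default.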

Import from XLSX

Base R does not have a way to import MS Excel files. The first step for working with Excel files (i.e., files with .xls or .xlsx extensions), therefore, is to install the readxl package to import .xlsx files and writexl to write files to .xlsx. The readxl package has a couple of options for loading Excel spreadsheets, depending on whether the extension is .xls, .xlsx, or unknown, along with options to import different worksheets within a spreadsheet.

The code below installs the required packages (if you forgot to ahead of time), loads them, then writes the motile invertebrate data we just imported to an .xlsx file. The last step imports the .xlsx version of the data.

  1. Install packages (only if you haven't already)
    install.packages("readxl") # only need to run once. 
    install.packages("writexl")
  2. Load packages
    library(writexl) # saving xlsx
    library(readxl) # importing xlsx
  3. Write CSV to .xlsx to data folder. I'm going in this order to keep this training stand-alone. The read_xlsx() function can't read from a url like read.csv() can.
    write_xlsx(motinv, "./data/BASHAR_motile_invert_counts.xlsx")
  4. Import spreadsheet. Note that the default settings import the first sheet, so I didn't really need to specify the sheet below. I included the sheet argument to show how it's done.
    motinv_xls <- read_xlsx(path = "./data/BASHAR_motile_invert_counts.xlsx", sheet = "Sheet1") 
  5. View top 6 rows to check the data
    head(motinv_xls)
    ## # A tibble: 6 × 14
    ##   Network UnitCode SiteCode StartDate   Year QAQC  PlotName CommunityType
    ##   <chr>   <chr>    <chr>    <chr>      <dbl> <lgl> <chr>    <chr>        
    ## 1 NETN    ACAD     BASHAR   2013-06-21  2013 FALSE A1       Ascophyllum  
    ## 2 NETN    ACAD     BASHAR   2013-06-21  2013 FALSE A1       Ascophyllum  
    ## 3 NETN    ACAD     BASHAR   2014-06-21  2014 FALSE A1       Ascophyllum  
    ## 4 NETN    ACAD     BASHAR   2014-06-21  2014 FALSE A1       Ascophyllum  
    ## 5 NETN    ACAD     BASHAR   2016-06-28  2016 FALSE A1       Ascophyllum  
    ## 6 NETN    ACAD     BASHAR   2016-06-28  2016 FALSE A1       Ascophyllum  
    ## # ℹ 6 more variables: ScientificName <chr>, CommonName <chr>,
    ## #   SpeciesCode <chr>, Damage <dbl>, No.Damage <dbl>, Subsampled <chr>


Data Structures

Vectors

The data frame we just examined is a type of data structure. A data structure is what it sounds like: a structure that holds data in an organized way. There are multiple data structures in R, including vectors, lists, arrays, matrices, data frames, and tibbles (more on this data structure later). Today we'll focus on vectors and data frames.

Vectors are the simplest data structure in R. A vector is like a single column of data in an Excel spreadsheet. Vectors have only one dimension, and their elements can be accessed by position (like a row number within that column). Here are some examples of vectors:

digits <- c(1:10)  # Use x:y to create a sequence of integers starting at x and ending at y
digits
##  [1]  1  2  3  4  5  6  7  8  9 10
digits + 1 # note how 1 was added to every element of digits. 
##  [1]  2  3  4  5  6  7  8  9 10 11
is_odd <- rep(c(TRUE, FALSE), 5)  # Use rep(x, n) to create a vector by repeating x n times 
is_odd
##  [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
tree_dbh <- c(12.5, 20.4, 18.1, 38.5, 19.3)
tree_dbh
## [1] 12.5 20.4 18.1 38.5 19.3
bird_ids <- c("song sparrow", "dark-eyed junco", "golden-crowned kinglet", "dark-eyed junco")
bird_ids
## [1] "song sparrow"           "dark-eyed junco"        "golden-crowned kinglet"
## [4] "dark-eyed junco"

Note the use of c(). The c() function stands for combine, and it combines elements into a single vector, with each element separated by a comma in code. The c() function is a fairly universal way to combine multiple elements in R, and you're going to see it over and over. Note how in digits, when we added a 1, every value in digits increased by 1. This highlights the concept of vectorization in R. The general idea is that you can apply a single operation to a vector (or a column in a data frame), and it will apply to all elements of that vector.
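
Here's a short sketch of vectorization beyond adding 1. It reuses the tree_dbh values from above. The assumption that those values are in centimeters, and the 20 cutoff for a "large" tree, are mine for illustration and aren't part of the training data.

```r
tree_dbh <- c(12.5, 20.4, 18.1, 38.5, 19.3)  # same values as above; units assumed to be cm

# One division converts every element from cm to inches
tree_dbh_in <- tree_dbh / 2.54
round(tree_dbh_in, 1)

# Comparisons are vectorized too, and return a logical vector
is_large <- tree_dbh > 20   # hypothetical cutoff
is_large
## [1] FALSE  TRUE FALSE  TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the large trees
sum(is_large)
## [1] 2
```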

If you need to access a single element of a vector, you can use the syntax my_vector[x] where x is the element's index (the number corresponding to its position in the vector). You can also use a vector of indices to extract multiple elements from the vector. Note that in R, indexing starts at 1 (i.e. my_vector[1] is the first element of my_vector). If you've coded in other languages, you may be used to indexing starting at 0.

second_bird <- bird_ids[2]
second_bird
## [1] "dark-eyed junco"
top_two_birds <- bird_ids[c(1,2)]
top_two_birds
## [1] "song sparrow"    "dark-eyed junco"
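
A few other indexing patterns are worth knowing about. This is just a sketch using the same bird_ids vector, not a complete list of what square brackets can do.

```r
bird_ids <- c("song sparrow", "dark-eyed junco", "golden-crowned kinglet", "dark-eyed junco")

# A sequence of indices returns a range of elements
bird_ids[2:3]
## [1] "dark-eyed junco"        "golden-crowned kinglet"

# A negative index drops that element instead of selecting it
all_but_first <- bird_ids[-1]
length(all_but_first)
## [1] 3

# A logical test inside the brackets keeps only elements where the test is TRUE
juncos <- bird_ids[bird_ids == "dark-eyed junco"]
length(juncos)
## [1] 2

# length() gives the number of elements, so this returns the last one
last_bird <- bird_ids[length(bird_ids)]
last_bird
## [1] "dark-eyed junco"

# Asking for an index past the end returns NA rather than an error
bird_ids[5]
## [1] NA
```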

You can also return only unique values from a vector. The bird_ids vector has dark-eyed juncos listed twice. To get only unique species, run the following code. I also added sort() to sort the list alphabetically.

sort(unique(bird_ids))
## [1] "dark-eyed junco"        "golden-crowned kinglet" "song sparrow"


Data Types

In the examples above, each vector contains a different type of data: digits contains integers, is_odd contains logical (TRUE/FALSE) values, bird_ids contains text, and tree_dbh contains decimal numbers. That's because a given vector can only contain a single type of data.

In R, there are six main data types:

  • character: Regular text, denoted with double or single quotation marks (e.g. "hello", "3", "R is my favorite programming language")
  • numeric: Decimal numbers (e.g. 23, 3.1415)
  • integer: Integers. If you want to explicitly denote a number as an integer in R, append L to it or use as.integer() (e.g. 5L, as.integer(30)).
  • logical: True or false values (TRUE, FALSE). Note that TRUE and FALSE must be all-uppercase.
  • date-time: Specially formatted values for dates or times, using POSIX format.
  • factor: These are strings that have defined levels (e.g., parks in your network) that are kept with the column even if no records exist for a given factor level. Factors used to be a lot more common in R. I typically only use them in plotting to force a level order that's not alphabetical.
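
To make the factor bullet more concrete, here's a small sketch. The size_class values and the level names are made up for illustration. It shows the two behaviors described above: a level order that isn't alphabetical, and a level ("huge") that is kept even though no records use it.

```r
size_class <- c("small", "large", "medium", "small")  # hypothetical character vector

# As plain text, sorting is alphabetical
sort(unique(size_class))
## [1] "large"  "medium" "small"

# As a factor, the levels argument sets the order and can include unused levels
size_fac <- factor(size_class, levels = c("small", "medium", "large", "huge"))
class(size_fac)
## [1] "factor"
levels(size_fac)
## [1] "small"  "medium" "large"  "huge"

# table() counts records per level, in level order, including a 0 for "huge"
size_counts <- table(size_fac)
size_counts
```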

You can use the class() function to get the data type of a vector:

class(bird_ids)
## [1] "character"
class(tree_dbh)
## [1] "numeric"
class(digits)
## [1] "integer"
class(is_odd)
## [1] "logical"
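
Because a vector can only hold one data type, R quietly converts mixed values to a single type. The sketch below uses a made-up vector (mixed) to show what that looks like, and how converting text back to numbers turns anything that isn't a number into NA.

```r
# Mixing numbers and text forces everything to character
mixed <- c(1, "two", 3)
class(mixed)
## [1] "character"
mixed
## [1] "1"   "two" "3"

# Converting back to numeric turns non-numbers into NA (R also prints a warning,
# which is suppressed here to keep the output clean)
mixed_num <- suppressWarnings(as.numeric(mixed))
mixed_num
## [1]  1 NA  3

# The L suffix is the difference between integer and numeric
class(5L)
## [1] "integer"
class(5)
## [1] "numeric"
```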


Data Frames

Data Frame Properties

Data frames are the main way we will be interacting with data in R. They're essentially like spreadsheets in Excel, with a few specific properties.

Properties of data frames:
  1. As the name implies, data frames are rectangular. That means every column has the same number of rows, and every row has the same number of columns. The number of rows doesn't have to equal the number of columns, though (i.e., rectangular, not square).
  2. Data frames have 2 dimensions: First is always Rows, and second is always Columns.
  3. You can access the data within data frames by specifying the row, the column, or both at the same time.
  4. Each column in a data frame is assigned one of the main data types we discussed above: numeric, integer, character, logical, date-time, or factor.
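
The properties above can be sketched with a small made-up data frame (toy, with hypothetical values loosely modeled on the motile invertebrate data). The square bracket syntax is always [row, column], and leaving one side blank means "all of them".

```r
# Hypothetical toy data frame for illustration
toy <- data.frame(PlotName = c("A1", "A2", "B1"),
                  Year = c(2013L, 2014L, 2016L),
                  Count = c(2L, 6L, 41L))

dim(toy)            # number of rows, then number of columns
## [1] 3 3

toy[2, ]            # row 2, all columns
toy[, "Count"]      # all rows, Count column (returned as a vector)
## [1]  2  6 41
toy[2, "Count"]     # row 2 of the Count column
## [1] 6

# A logical test in the row position keeps only the matching rows
recent <- toy[toy$Year > 2013, ]
nrow(recent)
## [1] 2
```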

Coding Tip: R is strict about assigning data types to columns, such that any text in an otherwise numeric field will turn the entire column into a character. Similarly, if there's anything besides TRUE, FALSE, or a blank in a field meant to be TRUE/FALSE, R will treat that as a character field instead of logical. So, if R treats as a character something that should be a numeric field, it's a good clue there may be a typo or issue in your data needing attention. You can check the assigned data types using the str() function.

str(motinv)
## 'data.frame':    682 obs. of  14 variables:
##  $ Network       : chr  "NETN" "NETN" "NETN" "NETN" ...
##  $ UnitCode      : chr  "ACAD" "ACAD" "ACAD" "ACAD" ...
##  $ SiteCode      : chr  "BASHAR" "BASHAR" "BASHAR" "BASHAR" ...
##  $ StartDate     : chr  "6/24/2013" "6/21/2013" "6/24/2013" "6/21/2013" ...
##  $ Year          : int  2013 2013 2013 2013 2013 2014 2014 2016 2016 2017 ...
##  $ QAQC          : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
##  $ PlotName      : chr  "A1" "A1" "A1" "A1" ...
##  $ CommunityType : chr  "Ascophyllum" "Ascophyllum" "Ascophyllum" "Ascophyllum" ...
##  $ ScientificName: chr  "Littorina littorea" "Littorina littorea" "Littorina obtusata" "Littorina obtusata" ...
##  $ CommonName    : chr  "Common periwinkle" "Common periwinkle" "Smooth periwinkle" "Smooth periwinkle" ...
##  $ SpeciesCode   : chr  "LITLIT" "LITLIT" "LITOBT" "LITOBT" ...
##  $ Damage        : chr  "0" "0" "1" "0" ...
##  $ No.Damage     : int  2 3 2 6 1 2 1 6 9 41 ...
##  $ Subsampled    : chr  "No" "No" "No" "No" ...
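
When a column that should be numeric comes in as character (like Damage above), one way to track down the culprit is to convert the column to numeric and see which values turn into NA. This is only a sketch of the idea using a made-up data frame. The letter "O" typed in place of a zero is a hypothetical typo, not necessarily what is going on in the real data.

```r
# Hypothetical column where someone typed the letter O instead of a zero
toy <- data.frame(Damage = c("0", "1", "O", "3"))
class(toy$Damage)
## [1] "character"

# Values that can't be converted become NA (warning suppressed for clean output)
damage_num <- suppressWarnings(as.numeric(toy$Damage))

# Flag values that were not blank to begin with but failed to convert
is_bad <- is.na(damage_num) & !is.na(toy$Damage)
which(is_bad)         # row number(s) of the problem values
## [1] 3
toy$Damage[is_bad]    # the problem values themselves
## [1] "O"
```

Once you know which rows are the problem, you can decide whether to fix the source data or correct the values in code.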


Accessing Rows and Columns
Show me the $

One way to access the column dimension in data frames is to use the $ syntax. The $ is used to separate the data frame name from the column name. It's similar to the [table_name].[column_name] syntax in Access.

To view the names of the columns in a data frame, you can use the names() function, or use head() to see the first 6 rows with column names. Whatever you prefer. I'll use the former for now.

See column names in the motile invertebrate data.

names(motinv)
##  [1] "Network"        "UnitCode"       "SiteCode"       "StartDate"     
##  [5] "Year"           "QAQC"           "PlotName"       "CommunityType" 
##  [9] "ScientificName" "CommonName"     "SpeciesCode"    "Damage"        
## [13] "No.Damage"      "Subsampled"

See all rows in the PlotName and ScientificName columns in the motile invertebrate data. Each column is printed in full, so the output is long.

motinv$PlotName
##   [1] "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1"
##  [16] "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A1" "A2" "A2" "A2"
##  [31] "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2"
##  [46] "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2" "A2"
##  [61] "A2" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3"
##  [76] "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3" "A3"
##  [91] "A3" "A3" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4"
## [106] "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4" "A4"
## [121] "A4" "A4" "A4" "A4" "A4" "A4" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5"
## [136] "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5" "A5"
## [151] "A5" "A5" "A5" "A5" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1"
## [166] "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1" "B1"
## [181] "B1" "B1" "B1" "B1" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2"
## [196] "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B2"
## [211] "B2" "B2" "B2" "B2" "B2" "B2" "B2" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3"
## [226] "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3" "B3"
## [241] "B3" "B3" "B3" "B3" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4"
## [256] "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B4"
## [271] "B4" "B4" "B4" "B4" "B4" "B4" "B4" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5"
## [286] "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5" "B5"
## [301] "B5" "B5" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1"
## [316] "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1"
## [331] "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1" "F1"
## [346] "F1" "F1" "F1" "F1" "F1" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2"
## [361] "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2"
## [376] "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2" "F2"
## [391] "F2" "F2" "F2" "F2" "F2" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3"
## [406] "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3"
## [421] "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3" "F3"
## [436] "F3" "F3" "F3" "F3" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4"
## [451] "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4"
## [466] "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4" "F4"
## [481] "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5"
## [496] "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5"
## [511] "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5" "F5"
## [526] "F5" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1"
## [541] "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R1"
## [556] "R1" "R1" "R1" "R1" "R1" "R1" "R1" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2"
## [571] "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2"
## [586] "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R2" "R3"
## [601] "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3"
## [616] "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R3" "R4" "R4" "R4"
## [631] "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4"
## [646] "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R4" "R5" "R5"
## [661] "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5" "R5"
## [676] "R5" "R5" "R5" "R5" "R5" "R5" "R5"
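
Printing every value in a column gets long quickly. Because the $ syntax returns an ordinary vector, you can wrap it in other functions to get a more condensed view. Here's a sketch using a small made-up data frame (toy) with hypothetical values, so it runs on its own.

```r
# Hypothetical toy data frame for illustration
toy <- data.frame(PlotName = c("A1", "A1", "A2", "B1", "B1", "B1"),
                  No.Damage = c(2L, 3L, 6L, 1L, 9L, 41L))

# Only the distinct plot names
unique(toy$PlotName)
## [1] "A1" "A2" "B1"

# Number of rows for each plot
plot_counts <- table(toy$PlotName)
plot_counts

# Numeric columns work with math functions
sum(toy$No.Damage)
## [1] 62

# Square brackets work on the result, too, since it's a vector
toy$PlotName[1:3]
## [1] "A1" "A1" "A2"
```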

motinv$ScientificName
##   [1] "Littorina littorea"        "Littorina littorea"       
##   [3] "Littorina obtusata"        "Littorina obtusata"       
##   [5] "Nucella lapillus"          "Littorina littorea"       
##   [7] "Littorina obtusata"        "Littorina littorea"       
##   [9] "Littorina obtusata"        "Littorina littorea"       
##  [11] "Littorina obtusata"        "Littorina littorea"       
##  [13] "Littorina obtusata"        "Littorina littorea"       
##  [15] "Littorina obtusata"        "Carcinus maenas"          
##  [17] "Littorina littorea"        "Littorina obtusata"       
##  [19] "Carcinus maenas"           "Littorina littorea"       
##  [21] "Littorina obtusata"        "Carcinus maenas"          
##  [23] "Littorina littorea"        "Littorina obtusata"       
##  [25] "Carcinus maenas"           "Littorina littorea"       
##  [27] "Littorina saxatilis"       "Littorina littorea"       
##  [29] "Littorina littorea"        "Littorina obtusata"       
##  [31] "Littorina obtusata"        "Nucella lapillus"         
##  [33] "Nucella lapillus"          "Littorina littorea"       
##  [35] "Littorina obtusata"        "Littorina saxatilis"      
##  [37] "Littorina littorea"        "Littorina obtusata"       
##  [39] "Nucella lapillus"          "Littorina littorea"       
##  [41] "Littorina obtusata"        "Nucella lapillus"         
##  [43] "Littorina littorea"        "Littorina obtusata"       
##  [45] "Littorina littorea"        "Littorina obtusata"       
##  [47] "Littorina littorea"        "Littorina obtusata"       
##  [49] "Carcinus maenas"           "Littorina littorea"       
##  [51] "Littorina obtusata"        "Carcinus maenas"          
##  [53] "Littorina littorea"        "Littorina obtusata"       
##  [55] "Nucella lapillus"          "Carcinus maenas"          
##  [57] "Littorina littorea"        "Littorina obtusata"       
##  [59] "Carcinus maenas"           "Littorina littorea"       
##  [61] "Littorina obtusata"        "Littorina littorea"       
##  [63] "Littorina littorea"        "Littorina obtusata"       
##  [65] "Littorina obtusata"        "Nucella lapillus"         
##  [67] "Nucella lapillus"          "Littorina littorea"       
##  [69] "Littorina obtusata"        "Littorina littorea"       
##  [71] "Littorina obtusata"        "Littorina littorea"       
##  [73] "Littorina obtusata"        "Nucella lapillus"         
##  [75] "Littorina littorea"        "Littorina obtusata"       
##  [77] "Littorina littorea"        "Littorina obtusata"       
##  [79] "Carcinus maenas"           "Littorina littorea"       
##  [81] "Littorina obtusata"        "Carcinus maenas"          
##  [83] "Littorina littorea"        "Littorina obtusata"       
##  [85] "Carcinus maenas"           "Littorina littorea"       
##  [87] "Littorina obtusata"        "Carcinus maenas"          
##  [89] "Littorina littorea"        "Littorina obtusata"       
##  [91] "Littorina littorea"        "Littorina obtusata"       
##  [93] "Littorina littorea"        "Littorina littorea"       
##  [95] "Littorina obtusata"        "Littorina obtusata"       
##  [97] "Nucella lapillus"          "Littorina littorea"       
##  [99] "Littorina obtusata"        "Nucella lapillus"         
## [101] "Littorina littorea"        "Littorina obtusata"       
## [103] "Littorina littorea"        "Littorina obtusata"       
## [105] "Nucella lapillus"          "Littorina littorea"       
## [107] "Littorina obtusata"        "Littorina littorea"       
## [109] "Littorina obtusata"        "Nucella lapillus"         
## [111] "Carcinus maenas"           "Littorina littorea"       
## [113] "Littorina obtusata"        "Carcinus maenas"          
## [115] "Littorina littorea"        "Littorina obtusata"       
## [117] "Carcinus maenas"           "Littorina littorea"       
## [119] "Littorina obtusata"        "Nucella lapillus"         
## [121] "Carcinus maenas"           "Littorina littorea"       
## [123] "Littorina obtusata"        "Nucella lapillus"         
## [125] "Littorina littorea"        "Littorina obtusata"       
## [127] "Littorina littorea"        "Littorina littorea"       
## [129] "Littorina obtusata"        "Littorina obtusata"       
## [131] "Littorina littorea"        "Littorina obtusata"       
## [133] "Littorina littorea"        "Littorina obtusata"       
## [135] "Littorina littorea"        "Littorina obtusata"       
## [137] "Littorina littorea"        "Littorina obtusata"       
## [139] "Nucella lapillus"          "Littorina littorea"       
## [141] "Littorina obtusata"        "Littorina littorea"       
## [143] "Littorina obtusata"        "Carcinus maenas"          
## [145] "Littorina littorea"        "Littorina obtusata"       
## [147] "Littorina littorea"        "Littorina obtusata"       
## [149] "Nucella lapillus"          "Carcinus maenas"          
## [151] "Littorina littorea"        "Littorina obtusata"       
## [153] "Littorina littorea"        "Littorina obtusata"       
## [155] "Littorina littorea"        "Littorina saxatilis"      
## [157] "Nucella lapillus"          "Littorina littorea"       
## [159] "Littorina obtusata"        "Nucella lapillus"         
## [161] "Littorina littorea"        "Littorina obtusata"       
## [163] "Nucella lapillus"          "Littorina littorea"       
## [165] "Littorina obtusata"        "Littorina littorea"       
## [167] "Littorina obtusata"        "Littorina saxatilis"      
## [169] "Littorina littorea"        "Littorina obtusata"       
## [171] "Littorina saxatilis"       "Carcinus maenas"          
## [173] "Littorina littorea"        "Littorina obtusata"       
## [175] "Littorina saxatilis"       "Nucella lapillus"         
## [177] "Littorina littorea"        "Littorina obtusata"       
## [179] "Nucella lapillus"          "Littorina littorea"       
## [181] "Littorina obtusata"        "Littorina saxatilis"      
## [183] "Littorina littorea"        "Littorina obtusata"       
## [185] "Littorina littorea"        "Littorina obtusata"       
## [187] "Littorina littorea"        "Littorina obtusata"       
## [189] "Littorina littorea"        "Littorina obtusata"       
## [191] "Littorina littorea"        "Littorina obtusata"       
## [193] "Nucella lapillus"          "Littorina littorea"       
## [195] "Littorina obtusata"        "Nucella lapillus"         
## [197] "Littorina littorea"        "Littorina obtusata"       
## [199] "Littorina littorea"        "Littorina obtusata"       
## [201] "Nucella lapillus"          "Carcinus maenas"          
## [203] "Littorina littorea"        "Littorina obtusata"       
## [205] "Nucella lapillus"          "Littorina littorea"       
## [207] "Littorina obtusata"        "Nucella lapillus"         
## [209] "Carcinus maenas"           "Littorina littorea"       
## [211] "Littorina obtusata"        "Nucella lapillus"         
## [213] "Carcinus maenas"           "Littorina littorea"       
## [215] "Littorina obtusata"        "Littorina saxatilis"      
## [217] "Nucella lapillus"          "Littorina obtusata"       
## [219] "Littorina littorea"        "Littorina obtusata"       
## [221] "Littorina littorea"        "Littorina obtusata"       
## [223] "Nucella lapillus"          "Littorina littorea"       
## [225] "Littorina obtusata"        "Littorina littorea"       
## [227] "Littorina obtusata"        "Littorina saxatilis"      
## [229] "Littorina littorea"        "Littorina obtusata"       
## [231] "Nucella lapillus"          "Carcinus maenas"          
## [233] "Littorina littorea"        "Littorina obtusata"       
## [235] "Littorina littorea"        "Littorina obtusata"       
## [237] "Nucella lapillus"          "Littorina littorea"       
## [239] "Littorina obtusata"        "Nucella lapillus"         
## [241] "Carcinus maenas"           "Littorina littorea"       
## [243] "Littorina obtusata"        "Nucella lapillus"         
## [245] "Littorina obtusata"        "Littorina saxatilis"      
## [247] "Littorina littorea"        "Littorina obtusata"       
## [249] "Littorina littorea"        "Littorina obtusata"       
## [251] "Nucella lapillus"          "Littorina littorea"       
## [253] "Littorina obtusata"        "Nucella lapillus"         
## [255] "Testudinalia testudinalis" "Littorina littorea"       
## [257] "Littorina obtusata"        "Littorina saxatilis"      
## [259] "Littorina littorea"        "Littorina obtusata"       
## [261] "Littorina saxatilis"       "Littorina littorea"       
## [263] "Littorina obtusata"        "Littorina saxatilis"      
## [265] "Nucella lapillus"          "Littorina littorea"       
## [267] "Littorina obtusata"        "Littorina saxatilis"      
## [269] "Nucella lapillus"          "Littorina littorea"       
## [271] "Littorina obtusata"        "Littorina saxatilis"      
## [273] "Nucella lapillus"          "Carcinus maenas"          
## [275] "Littorina littorea"        "Littorina obtusata"       
## [277] "Nucella lapillus"          "Littorina obtusata"       
## [279] "Littorina littorea"        "Littorina obtusata"       
## [281] "Littorina littorea"        "Littorina littorea"       
## [283] "Littorina obtusata"        "Littorina littorea"       
## [285] "Littorina obtusata"        "Nucella lapillus"         
## [287] "Carcinus maenas"           "Littorina littorea"       
## [289] "Littorina obtusata"        "Littorina saxatilis"      
## [291] "Nucella lapillus"          "Testudinalia testudinalis"
## [293] "Littorina littorea"        "Littorina obtusata"       
## [295] "Littorina saxatilis"       "Carcinus maenas"          
## [297] "Littorina littorea"        "Littorina obtusata"       
## [299] "Nucella lapillus"          "Littorina littorea"       
## [301] "Littorina obtusata"        "Nucella lapillus"         
## [303] "Littorina littorea"        "Littorina littorea"       
## [305] "Littorina obtusata"        "Littorina obtusata"       
## [307] "Nucella lapillus"          "Testudinalia testudinalis"
## [309] "Testudinalia testudinalis" "Littorina littorea"       
## [311] "Littorina obtusata"        "Testudinalia testudinalis"
## [313] "Littorina littorea"        "Littorina obtusata"       
## [315] "Nucella lapillus"          "Testudinalia testudinalis"
## [317] "Littorina littorea"        "Littorina obtusata"       
## [319] "Nucella lapillus"          "Littorina littorea"       
## [321] "Littorina obtusata"        "Nucella lapillus"         
## [323] "Testudinalia testudinalis" "Littorina littorea"       
## [325] "Littorina obtusata"        "Nucella lapillus"         
## [327] "Testudinalia testudinalis" "Carcinus maenas"          
## [329] "Littorina littorea"        "Littorina obtusata"       
## [331] "Testudinalia testudinalis" "Carcinus maenas"          
## [333] "Littorina littorea"        "Littorina obtusata"       
## [335] "Nucella lapillus"          "Testudinalia testudinalis"
## [337] "Carcinus maenas"           "Littorina littorea"       
## [339] "Littorina obtusata"        "Nucella lapillus"         
## [341] "Testudinalia testudinalis" "Carcinus maenas"          
## [343] "Littorina littorea"        "Littorina obtusata"       
## [345] "Nucella lapillus"          "Testudinalia testudinalis"
## [347] "Carcinus maenas"           "Littorina littorea"       
## [349] "Littorina obtusata"        "Testudinalia testudinalis"
## [351] "Littorina littorea"        "Littorina obtusata"       
## [353] "Littorina obtusata"        "Nucella lapillus"         
## [355] "Nucella lapillus"          "Testudinalia testudinalis"
## [357] "Littorina littorea"        "Littorina obtusata"       
## [359] "Nucella lapillus"          "Testudinalia testudinalis"
## [361] "Littorina littorea"        "Littorina obtusata"       
## [363] "Nucella lapillus"          "Littorina littorea"       
## [365] "Littorina obtusata"        "Littorina saxatilis"      
## [367] "Nucella lapillus"          "Testudinalia testudinalis"
## [369] "Littorina littorea"        "Littorina obtusata"       
## [371] "Nucella lapillus"          "Littorina littorea"       
## [373] "Littorina obtusata"        "Nucella lapillus"         
## [375] "Testudinalia testudinalis" "Littorina littorea"       
## [377] "Littorina obtusata"        "Nucella lapillus"         
## [379] "Carcinus maenas"           "Littorina littorea"       
## [381] "Littorina obtusata"        "Nucella lapillus"         
## [383] "Carcinus maenas"           "Littorina littorea"       
## [385] "Littorina obtusata"        "Littorina saxatilis"      
## [387] "Nucella lapillus"          "Testudinalia testudinalis"
## [389] "Carcinus maenas"           "Littorina littorea"       
## [391] "Littorina obtusata"        "Nucella lapillus"         
## [393] "Littorina littorea"        "Littorina obtusata"       
## [395] "Nucella lapillus"          "Littorina littorea"       
## [397] "Littorina littorea"        "Littorina obtusata"       
## [399] "Littorina obtusata"        "Nucella lapillus"         
## [401] "Nucella lapillus"          "Testudinalia testudinalis"
## [403] "Testudinalia testudinalis" "Littorina littorea"       
## [405] "Littorina obtusata"        "Nucella lapillus"         
## [407] "Testudinalia testudinalis" "Littorina littorea"       
## [409] "Littorina obtusata"        "Testudinalia testudinalis"
## [411] "Littorina littorea"        "Littorina obtusata"       
## [413] "Nucella lapillus"          "Littorina littorea"       
## [415] "Littorina obtusata"        "Nucella lapillus"         
## [417] "Testudinalia testudinalis" "Littorina littorea"       
## [419] "Littorina obtusata"        "Nucella lapillus"         
## [421] "Testudinalia testudinalis" "Littorina littorea"       
## [423] "Littorina obtusata"        "Littorina littorea"       
## [425] "Littorina obtusata"        "Nucella lapillus"         
## [427] "Testudinalia testudinalis" "Littorina littorea"       
## [429] "Littorina obtusata"        "Nucella lapillus"         
## [431] "Testudinalia testudinalis" "Carcinus maenas"          
## [433] "Littorina littorea"        "Littorina obtusata"       
## [435] "Nucella lapillus"          "Carcinus maenas"          
## [437] "Littorina littorea"        "Littorina obtusata"       
## [439] "Nucella lapillus"          "Littorina littorea"       
## [441] "Littorina littorea"        "Littorina obtusata"       
## [443] "Littorina obtusata"        "Nucella lapillus"         
## [445] "Testudinalia testudinalis" "Testudinalia testudinalis"
## [447] "Littorina littorea"        "Littorina obtusata"       
## [449] "Nucella lapillus"          "Littorina littorea"       
## [451] "Littorina obtusata"        "Littorina littorea"       
## [453] "Littorina obtusata"        "Nucella lapillus"         
## [455] "Testudinalia testudinalis" "Littorina littorea"       
## [457] "Littorina obtusata"        "Nucella lapillus"         
## [459] "Littorina littorea"        "Littorina obtusata"       
## [461] "Nucella lapillus"          "Testudinalia testudinalis"
## [463] "Littorina littorea"        "Littorina obtusata"       
## [465] "Testudinalia testudinalis" "Carcinus maenas"          
## [467] "Littorina littorea"        "Littorina obtusata"       
## [469] "Nucella lapillus"          "Testudinalia testudinalis"
## [471] "Littorina littorea"        "Littorina obtusata"       
## [473] "Nucella lapillus"          "Testudinalia testudinalis"
## [475] "Littorina littorea"        "Littorina obtusata"       
## [477] "Nucella lapillus"          "Carcinus maenas"          
## [479] "Littorina littorea"        "Littorina obtusata"       
## [481] "Littorina littorea"        "Littorina littorea"       
## [483] "Littorina obtusata"        "Littorina obtusata"       
## [485] "Nucella lapillus"          "Nucella lapillus"         
## [487] "Testudinalia testudinalis" "Testudinalia testudinalis"
## [489] "Littorina littorea"        "Littorina obtusata"       
## [491] "Testudinalia testudinalis" "Littorina littorea"       
## [493] "Littorina obtusata"        "Nucella lapillus"         
## [495] "Testudinalia testudinalis" "Littorina littorea"       
## [497] "Littorina obtusata"        "Nucella lapillus"         
## [499] "Testudinalia testudinalis" "Littorina littorea"       
## [501] "Littorina obtusata"        "Nucella lapillus"         
## [503] "Testudinalia testudinalis" "Littorina littorea"       
## [505] "Littorina obtusata"        "Nucella lapillus"         
## [507] "Testudinalia testudinalis" "Littorina littorea"       
## [509] "Littorina obtusata"        "Nucella lapillus"         
## [511] "Carcinus maenas"           "Littorina littorea"       
## [513] "Littorina obtusata"        "Nucella lapillus"         
## [515] "Testudinalia testudinalis" "Littorina littorea"       
## [517] "Littorina obtusata"        "Nucella lapillus"         
## [519] "Littorina littorea"        "Littorina obtusata"       
## [521] "Nucella lapillus"          "Testudinalia testudinalis"
## [523] "Carcinus maenas"           "Littorina littorea"       
## [525] "Littorina obtusata"        "Nucella lapillus"         
## [527] "Littorina littorea"        "Littorina littorea"       
## [529] "Littorina obtusata"        "Littorina obtusata"       
## [531] "Nucella lapillus"          "Testudinalia testudinalis"
## [533] "Littorina littorea"        "Littorina obtusata"       
## [535] "Nucella lapillus"          "Testudinalia testudinalis"
## [537] "Littorina littorea"        "Littorina obtusata"       
## [539] "Testudinalia testudinalis" "Littorina littorea"       
## [541] "Nucella lapillus"          "Testudinalia testudinalis"
## [543] "Littorina littorea"        "Littorina obtusata"       
## [545] "Littorina littorea"        "Littorina obtusata"       
## [547] "Testudinalia testudinalis" "Littorina littorea"       
## [549] "Littorina obtusata"        "Testudinalia testudinalis"
## [551] "Carcinus maenas"           "Littorina littorea"       
## [553] "Littorina obtusata"        "Testudinalia testudinalis"
## [555] "Littorina littorea"        "Littorina obtusata"       
## [557] "Nucella lapillus"          "Littorina littorea"       
## [559] "Carcinus maenas"           "Littorina littorea"       
## [561] "Nucella lapillus"          "Testudinalia testudinalis"
## [563] "Littorina obtusata"        "Littorina littorea"       
## [565] "Littorina obtusata"        "Testudinalia testudinalis"
## [567] "Littorina littorea"        "Littorina obtusata"       
## [569] "Nucella lapillus"          "Testudinalia testudinalis"
## [571] "Littorina littorea"        "Littorina obtusata"       
## [573] "Testudinalia testudinalis" "Littorina littorea"       
## [575] "Littorina obtusata"        "Testudinalia testudinalis"
## [577] "Littorina littorea"        "Littorina obtusata"       
## [579] "Nucella lapillus"          "Testudinalia testudinalis"
## [581] "Littorina littorea"        "Littorina obtusata"       
## [583] "Nucella lapillus"          "Testudinalia testudinalis"
## [585] "Carcinus maenas"           "Littorina littorea"       
## [587] "Littorina obtusata"        "Testudinalia testudinalis"
## [589] "Littorina littorea"        "Littorina obtusata"       
## [591] "Nucella lapillus"          "Testudinalia testudinalis"
## [593] "Littorina littorea"        "Testudinalia testudinalis"
## [595] "Carcinus maenas"           "Littorina littorea"       
## [597] "Littorina obtusata"        "Nucella lapillus"         
## [599] "Testudinalia testudinalis" "Littorina littorea"       
## [601] "Nucella lapillus"          "Littorina littorea"       
## [603] "Testudinalia testudinalis" "Littorina littorea"       
## [605] "Littorina obtusata"        "Nucella lapillus"         
## [607] "Testudinalia testudinalis" "Littorina littorea"       
## [609] "Testudinalia testudinalis" "Littorina littorea"       
## [611] "Littorina obtusata"        "Littorina saxatilis"      
## [613] "Testudinalia testudinalis" "Littorina littorea"       
## [615] "Testudinalia testudinalis" "Littorina littorea"       
## [617] "Littorina obtusata"        "Nucella lapillus"         
## [619] "Testudinalia testudinalis" "Littorina littorea"       
## [621] "Littorina littorea"        "Littorina obtusata"       
## [623] "Nucella lapillus"          "Carcinus maenas"          
## [625] "Littorina littorea"        "Testudinalia testudinalis"
## [627] "Littorina littorea"        "Littorina littorea"       
## [629] "Littorina littorea"        "Littorina littorea"       
## [631] "Littorina obtusata"        "Testudinalia testudinalis"
## [633] "Littorina littorea"        "Littorina obtusata"       
## [635] "Testudinalia testudinalis" "Littorina littorea"       
## [637] "Testudinalia testudinalis" "Littorina littorea"       
## [639] "Littorina obtusata"        "Nucella lapillus"         
## [641] "Testudinalia testudinalis" "Littorina littorea"       
## [643] "Littorina obtusata"        "Nucella lapillus"         
## [645] "Testudinalia testudinalis" "Littorina littorea"       
## [647] "Littorina obtusata"        "Testudinalia testudinalis"
## [649] "Carcinus maenas"           "Littorina littorea"       
## [651] "Nucella lapillus"          "Testudinalia testudinalis"
## [653] "Littorina littorea"        "Littorina obtusata"       
## [655] "Testudinalia testudinalis" "Littorina littorea"       
## [657] "Testudinalia testudinalis" "Littorina littorea"       
## [659] "Nucella lapillus"          "Nucella lapillus"         
## [661] "Littorina littorea"        "Littorina littorea"       
## [663] "Littorina littorea"        "Nucella lapillus"         
## [665] "Testudinalia testudinalis" "Littorina littorea"       
## [667] "Nucella lapillus"          "Testudinalia testudinalis"
## [669] "Littorina littorea"        "Nucella lapillus"         
## [671] "Testudinalia testudinalis" "Littorina littorea"       
## [673] "Littorina obtusata"        "Testudinalia testudinalis"
## [675] "Littorina littorea"        "Nucella lapillus"         
## [677] "Testudinalia testudinalis" "Littorina littorea"       
## [679] "Littorina obtusata"        "Testudinalia testudinalis"
## [681] "Littorina littorea"        "Littorina littorea"

Square brackets [ , ]

Remember that every data frame has 2 dimensions. The first dimension is rows and the second is columns. Thinking of the data in two dimensions, in the order of rows then columns, helps you understand how brackets work.

Square brackets [rows, columns] are how you access specific rows and columns in a data frame using base R. Examples include:
  • Specifying row numbers, or a logical condition so that only the rows where the condition is TRUE are returned.
  • Specifying column numbers or names to return specific columns.
  • Returning specific columns and all rows (leave the left side of the "," blank).
  • Returning specific rows and all columns (leave the right side of the "," blank).
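
As a minimal sketch of the four patterns listed above, the code below uses a small made-up data frame called toy. The data frame and its column names are hypothetical and only for illustration; they are not part of the training dataset.

```r
# Hypothetical toy data frame, made up for this example
toy <- data.frame(
  site    = c("A", "A", "B", "B"),
  species = c("crab", "snail", "crab", "limpet"),
  count   = c(2, 10, 1, 4)
)

# Row 2, column 3: a single value
toy[2, 3]

# Rows where the condition is TRUE, all columns (right side of the comma is blank)
toy[toy$count > 3, ]

# All rows (left side of the comma is blank), two columns chosen by name
toy[, c("site", "count")]

# Rows 1 and 2 of a single column; base R returns this as a vector
toy[1:2, "species"]
```

In each case the part before the comma describes rows and the part after the comma describes columns.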

Square brackets were one of the hardest concepts when I was starting out, so don't worry if they aren't immediately intuitive. There are easier ways to work with data frame rows and columns, which you'll learn on Day 2. It is still useful to have a basic understanding of how to interpret square brackets, as you will likely encounter them on StackOverflow or other R help sites. We'll work through some examples of using square brackets to access rows, columns, or both.

The code below asks for the dimensions of the motinv data frame, and returns 682 14. That means there are 682 rows, and 14 columns.

Return the number of rows and columns by checking the data frame's dimensions.

dim(motinv)
## [1] 682  14
nrow(motinv) # first dim
## [1] 682
ncol(motinv) # second dim
## [1] 14
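
Because dim() returns rows first and columns second, its two values can themselves be pulled out with single square brackets. The sketch below shows this on a hypothetical data frame named toy (made up for illustration, not the motinv data), and uses nrow() inside the brackets to grab the last row without knowing its number in advance.

```r
# Hypothetical data frame with 6 rows and 2 columns
toy <- data.frame(x = 1:6, y = letters[1:6])

d <- dim(toy)       # vector of length 2: rows, then columns
d[1] == nrow(toy)   # first element is the number of rows
d[2] == ncol(toy)   # second element is the number of columns

# Use nrow() as the row index to return the last row and all columns
toy[nrow(toy), ]
```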

Return the first 5 rows of the motile invertebrate data frame.

Note the comma with nothing to the right. That means return all columns.

motinv[1:5,]
motinv[c(1, 2, 3, 4, 5),] #equivalent but more typing
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
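
To check that the two row-selection styles above really are interchangeable, here is a small sketch on a hypothetical data frame called toy (an assumption for illustration, not the motinv data). It also compares them with head(), a base R function that returns the first n rows, and shows that row numbers inside c() do not have to be consecutive.

```r
# Hypothetical data frame with 8 rows
toy <- data.frame(id = 1:8, value = seq(10, 80, by = 10))

first5_a <- toy[1:5, ]              # colon shorthand for 1 through 5
first5_b <- toy[c(1, 2, 3, 4, 5), ] # same rows, typed out
first5_c <- head(toy, 5)            # base R helper for the first n rows

all.equal(first5_a, first5_b)       # TRUE if the two results match
all.equal(first5_a, first5_c)

# Row numbers do not need to be consecutive
toy[c(1, 3, 8), ]
```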

Return all rows and a subset of the columns in the data frame.

Note how the left side of the comma is empty. That means return all rows.

motinv[, c("SiteCode", "ScientificName", "CommonName", "Year", "Damage", "No.Damage")]
##     SiteCode            ScientificName        CommonName Year Damage No.Damage
## 1     BASHAR        Littorina littorea Common periwinkle 2013      0         2
## 2     BASHAR        Littorina littorea Common periwinkle 2013      0         3
## 3     BASHAR        Littorina obtusata Smooth periwinkle 2013      1         2
## 4     BASHAR        Littorina obtusata Smooth periwinkle 2013      0         6
## 5     BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 6     BASHAR        Littorina littorea Common periwinkle 2014      0         2
## 7     BASHAR        Littorina obtusata Smooth periwinkle 2014      0         1
## 8     BASHAR        Littorina littorea Common periwinkle 2016      0         6
## 9     BASHAR        Littorina obtusata Smooth periwinkle 2016      1         9
## 10    BASHAR        Littorina littorea Common periwinkle 2017      0        41
## 11    BASHAR        Littorina obtusata Smooth periwinkle 2017      0         1
## 12    BASHAR        Littorina littorea Common periwinkle 2018      1        11
## 13    BASHAR        Littorina obtusata Smooth periwinkle 2018      0         3
## 14    BASHAR        Littorina littorea Common periwinkle 2019      0         9
## 15    BASHAR        Littorina obtusata Smooth periwinkle 2019      0         5
## 16    BASHAR           Carcinus maenas        Green crab 2021      0         1
## 17    BASHAR        Littorina littorea Common periwinkle 2021      0         2
## 18    BASHAR        Littorina obtusata Smooth periwinkle 2021      0        16
## 19    BASHAR           Carcinus maenas        Green crab 2022      0         1
## 20    BASHAR        Littorina littorea Common periwinkle 2022      0         4
## 21    BASHAR        Littorina obtusata Smooth periwinkle 2022      0         5
## 22    BASHAR           Carcinus maenas        Green crab 2023      0         2
## 23    BASHAR        Littorina littorea Common periwinkle 2023      0         1
## 24    BASHAR        Littorina obtusata Smooth periwinkle 2023      0         1
## 25    BASHAR           Carcinus maenas        Green crab 2024      0         2
## 26    BASHAR        Littorina littorea Common periwinkle 2024      7        35
## 27    BASHAR       Littorina saxatilis  Rough periwinkle 2024      1         1
## 28    BASHAR        Littorina littorea Common periwinkle 2013      0         3
## 29    BASHAR        Littorina littorea Common periwinkle 2013      0         8
## 30    BASHAR        Littorina obtusata Smooth periwinkle 2013      4        19
## 31    BASHAR        Littorina obtusata Smooth periwinkle 2013      0        25
## 32    BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 33    BASHAR          Nucella lapillus          Dogwhelk 2013      0         3
## 34    BASHAR        Littorina littorea Common periwinkle 2014      0         4
## 35    BASHAR        Littorina obtusata Smooth periwinkle 2014      0        29
## 36    BASHAR       Littorina saxatilis  Rough periwinkle 2014      0         1
## 37    BASHAR        Littorina littorea Common periwinkle 2015      1        28
## 38    BASHAR        Littorina obtusata Smooth periwinkle 2015      0        12
## 39    BASHAR          Nucella lapillus          Dogwhelk 2015      1         0
## 40    BASHAR        Littorina littorea Common periwinkle 2016      5        52
## 41    BASHAR        Littorina obtusata Smooth periwinkle 2016      3        50
## 42    BASHAR          Nucella lapillus          Dogwhelk 2016      0         1
## 43    BASHAR        Littorina littorea Common periwinkle 2017      0        65
## 44    BASHAR        Littorina obtusata Smooth periwinkle 2017      0        20
## 45    BASHAR        Littorina littorea Common periwinkle 2018      4        75
## 46    BASHAR        Littorina obtusata Smooth periwinkle 2018      4        31
## 47    BASHAR        Littorina littorea Common periwinkle 2019      0        70
## 48    BASHAR        Littorina obtusata Smooth periwinkle 2019      0        23
## 49    BASHAR           Carcinus maenas        Green crab 2021      0         5
## 50    BASHAR        Littorina littorea Common periwinkle 2021      0        23
## 51    BASHAR        Littorina obtusata Smooth periwinkle 2021      0        39
## 52    BASHAR           Carcinus maenas        Green crab 2022      0         1
## 53    BASHAR        Littorina littorea Common periwinkle 2022      0        67
## 54    BASHAR        Littorina obtusata Smooth periwinkle 2022      0        30
## 55    BASHAR          Nucella lapillus          Dogwhelk 2022      0         2
## 56    BASHAR           Carcinus maenas        Green crab 2023      0         3
## 57    BASHAR        Littorina littorea Common periwinkle 2023      2        46
## 58    BASHAR        Littorina obtusata Smooth periwinkle 2023      0         2
## 59    BASHAR           Carcinus maenas        Green crab 2024      0         2
## 60    BASHAR        Littorina littorea Common periwinkle 2024      6        78
## 61    BASHAR        Littorina obtusata Smooth periwinkle 2024      1        16
## 62    BASHAR        Littorina littorea Common periwinkle 2013      2        14
## 63    BASHAR        Littorina littorea Common periwinkle 2013      0        16
## 64    BASHAR        Littorina obtusata Smooth periwinkle 2013      5        23
## 65    BASHAR        Littorina obtusata Smooth periwinkle 2013      0        34
## 66    BASHAR          Nucella lapillus          Dogwhelk 2013      0         2
## 67    BASHAR          Nucella lapillus          Dogwhelk 2013      0         4
## 68    BASHAR        Littorina littorea Common periwinkle 2014      0        14
## 69    BASHAR        Littorina obtusata Smooth periwinkle 2014      0        18
## 70    BASHAR        Littorina littorea Common periwinkle 2015      1        27
## 71    BASHAR        Littorina obtusata Smooth periwinkle 2015      0        19
## 72    BASHAR        Littorina littorea Common periwinkle 2016      0        59
## 73    BASHAR        Littorina obtusata Smooth periwinkle 2016      2        20
## 74    BASHAR          Nucella lapillus          Dogwhelk 2016      0         1
## 75    BASHAR        Littorina littorea Common periwinkle 2017      0        54
## 76    BASHAR        Littorina obtusata Smooth periwinkle 2017      0        10
## 77    BASHAR        Littorina littorea Common periwinkle 2018      5        66
## 78    BASHAR        Littorina obtusata Smooth periwinkle 2018      2        20
## 79    BASHAR           Carcinus maenas        Green crab 2019      0         4
## 80    BASHAR        Littorina littorea Common periwinkle 2019      0       137
## 81    BASHAR        Littorina obtusata Smooth periwinkle 2019      0        49
## 82    BASHAR           Carcinus maenas        Green crab 2021      0         7
## 83    BASHAR        Littorina littorea Common periwinkle 2021      0        30
## 84    BASHAR        Littorina obtusata Smooth periwinkle 2021      0        37
## 85    BASHAR           Carcinus maenas        Green crab 2022      0         1
## 86    BASHAR        Littorina littorea Common periwinkle 2022      4        81
## 87    BASHAR        Littorina obtusata Smooth periwinkle 2022      0        26
## 88    BASHAR           Carcinus maenas        Green crab 2023      0         3
## 89    BASHAR        Littorina littorea Common periwinkle 2023      5        26
## 90    BASHAR        Littorina obtusata Smooth periwinkle 2023      1         5
## 91    BASHAR        Littorina littorea Common periwinkle 2024      6        81
## 92    BASHAR        Littorina obtusata Smooth periwinkle 2024      0        12
## 93    BASHAR        Littorina littorea Common periwinkle 2013      0        20
## 94    BASHAR        Littorina littorea Common periwinkle 2013      1        29
## 95    BASHAR        Littorina obtusata Smooth periwinkle 2013      1         6
## 96    BASHAR        Littorina obtusata Smooth periwinkle 2013      5        15
## 97    BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 98    BASHAR        Littorina littorea Common periwinkle 2014      0        35
## 99    BASHAR        Littorina obtusata Smooth periwinkle 2014      0        22
## 100   BASHAR          Nucella lapillus          Dogwhelk 2014      0         1
## 101   BASHAR        Littorina littorea Common periwinkle 2015      4        98
## 102   BASHAR        Littorina obtusata Smooth periwinkle 2015      2        27
## 103   BASHAR        Littorina littorea Common periwinkle 2016      5       113
## 104   BASHAR        Littorina obtusata Smooth periwinkle 2016      1        52
## 105   BASHAR          Nucella lapillus          Dogwhelk 2016      1         1
## 106   BASHAR        Littorina littorea Common periwinkle 2017      3       134
## 107   BASHAR        Littorina obtusata Smooth periwinkle 2017      1        19
## 108   BASHAR        Littorina littorea Common periwinkle 2018      1        96
## 109   BASHAR        Littorina obtusata Smooth periwinkle 2018      2        11
## 110   BASHAR          Nucella lapillus          Dogwhelk 2018      0         1
## 111   BASHAR           Carcinus maenas        Green crab 2019      0         2
## 112   BASHAR        Littorina littorea Common periwinkle 2019      0        92
## 113   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        37
## 114   BASHAR           Carcinus maenas        Green crab 2021      0         5
## 115   BASHAR        Littorina littorea Common periwinkle 2021      9       124
## 116   BASHAR        Littorina obtusata Smooth periwinkle 2021      1        54
## 117   BASHAR           Carcinus maenas        Green crab 2022      0         2
## 118   BASHAR        Littorina littorea Common periwinkle 2022      0        78
## 119   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        29
## 120   BASHAR          Nucella lapillus          Dogwhelk 2022      1         0
## 121   BASHAR           Carcinus maenas        Green crab 2023      0         3
## 122   BASHAR        Littorina littorea Common periwinkle 2023     24        75
## 123   BASHAR        Littorina obtusata Smooth periwinkle 2023      3        12
## 124   BASHAR          Nucella lapillus          Dogwhelk 2023      0         1
## 125   BASHAR        Littorina littorea Common periwinkle 2024     14       101
## 126   BASHAR        Littorina obtusata Smooth periwinkle 2024      3        16
## 127   BASHAR        Littorina littorea Common periwinkle 2013      5        17
## 128   BASHAR        Littorina littorea Common periwinkle 2013      1        22
## 129   BASHAR        Littorina obtusata Smooth periwinkle 2013      0        23
## 130   BASHAR        Littorina obtusata Smooth periwinkle 2013      4        23
## 131   BASHAR        Littorina littorea Common periwinkle 2014      0        49
## 132   BASHAR        Littorina obtusata Smooth periwinkle 2014      0        22
## 133   BASHAR        Littorina littorea Common periwinkle 2015      1        65
## 134   BASHAR        Littorina obtusata Smooth periwinkle 2015     PM        11
## 135   BASHAR        Littorina littorea Common periwinkle 2016      4       113
## 136   BASHAR        Littorina obtusata Smooth periwinkle 2016      2        30
## 137   BASHAR        Littorina littorea Common periwinkle 2017      4        62
## 138   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        14
## 139   BASHAR          Nucella lapillus          Dogwhelk 2017      0         2
## 140   BASHAR        Littorina littorea Common periwinkle 2018      1        65
## 141   BASHAR        Littorina obtusata Smooth periwinkle 2018      1        15
## 142   BASHAR        Littorina littorea Common periwinkle 2019      1        93
## 143   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        24
## 144   BASHAR           Carcinus maenas        Green crab 2021      0         4
## 145   BASHAR        Littorina littorea Common periwinkle 2021      1        45
## 146   BASHAR        Littorina obtusata Smooth periwinkle 2021      0        45
## 147   BASHAR        Littorina littorea Common periwinkle 2022      3        45
## 148   BASHAR        Littorina obtusata Smooth periwinkle 2022      1        21
## 149   BASHAR          Nucella lapillus          Dogwhelk 2022      0         1
## 150   BASHAR           Carcinus maenas        Green crab 2023      0         3
## 151   BASHAR        Littorina littorea Common periwinkle 2023      5        45
## 152   BASHAR        Littorina obtusata Smooth periwinkle 2023      0         4
## 153   BASHAR        Littorina littorea Common periwinkle 2024     14        74
## 154   BASHAR        Littorina obtusata Smooth periwinkle 2024      2         7
## 155   BASHAR        Littorina littorea Common periwinkle 2013      0         1
## 156   BASHAR       Littorina saxatilis  Rough periwinkle 2013      0         1
## 157   BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 158   BASHAR        Littorina littorea Common periwinkle 2015      0         3
## 159   BASHAR        Littorina obtusata Smooth periwinkle 2015      4        40
## 160   BASHAR          Nucella lapillus          Dogwhelk 2015      1         0
## 161   BASHAR        Littorina littorea Common periwinkle 2016      0         5
## 162   BASHAR        Littorina obtusata Smooth periwinkle 2016      0        40
## 163   BASHAR          Nucella lapillus          Dogwhelk 2016      0         8
## 164   BASHAR        Littorina littorea Common periwinkle 2017      1        12
## 165   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        17
## 166   BASHAR        Littorina littorea Common periwinkle 2018      6        47
## 167   BASHAR        Littorina obtusata Smooth periwinkle 2018      7        44
## 168   BASHAR       Littorina saxatilis  Rough periwinkle 2018      0         5
## 169   BASHAR        Littorina littorea Common periwinkle 2019      0        27
## 170   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        44
## 171   BASHAR       Littorina saxatilis  Rough periwinkle 2019      0         1
## 172   BASHAR           Carcinus maenas        Green crab 2021      0         2
## 173   BASHAR        Littorina littorea Common periwinkle 2021      0        21
## 174   BASHAR        Littorina obtusata Smooth periwinkle 2021      0        53
## 175   BASHAR       Littorina saxatilis  Rough periwinkle 2021      0         2
## 176   BASHAR          Nucella lapillus          Dogwhelk 2021      0         2
## 177   BASHAR        Littorina littorea Common periwinkle 2022      0        35
## 178   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        75
## 179   BASHAR          Nucella lapillus          Dogwhelk 2022      0         7
## 180   BASHAR        Littorina littorea Common periwinkle 2023      3        46
## 181   BASHAR        Littorina obtusata Smooth periwinkle 2023      0        15
## 182   BASHAR       Littorina saxatilis  Rough periwinkle 2023      0         1
## 183   BASHAR        Littorina littorea Common periwinkle 2024      1        18
## 184   BASHAR        Littorina obtusata Smooth periwinkle 2024      1         7
## 185   BASHAR        Littorina littorea Common periwinkle 2013      0         1
## 186   BASHAR        Littorina obtusata Smooth periwinkle 2013      0         1
## 187   BASHAR        Littorina littorea Common periwinkle 2014      0         2
## 188   BASHAR        Littorina obtusata Smooth periwinkle 2014      0         1
## 189   BASHAR        Littorina littorea Common periwinkle 2015      0        10
## 190   BASHAR        Littorina obtusata Smooth periwinkle 2015      2        20
## 191   BASHAR        Littorina littorea Common periwinkle 2016      0         2
## 192   BASHAR        Littorina obtusata Smooth periwinkle 2016      0        61
## 193   BASHAR          Nucella lapillus          Dogwhelk 2016      0         2
## 194   BASHAR        Littorina littorea Common periwinkle 2017      1        56
## 195   BASHAR        Littorina obtusata Smooth periwinkle 2017      1        47
## 196   BASHAR          Nucella lapillus          Dogwhelk 2017      0         3
## 197   BASHAR        Littorina littorea Common periwinkle 2018      4        52
## 198   BASHAR        Littorina obtusata Smooth periwinkle 2018      6        25
## 199   BASHAR        Littorina littorea Common periwinkle 2019      1        41
## 200   BASHAR        Littorina obtusata Smooth periwinkle 2019      1        43
## 201   BASHAR          Nucella lapillus          Dogwhelk 2019      0         4
## 202   BASHAR           Carcinus maenas        Green crab 2021      0         1
## 203   BASHAR        Littorina littorea Common periwinkle 2021      0        54
## 204   BASHAR        Littorina obtusata Smooth periwinkle 2021      0        91
## 205   BASHAR          Nucella lapillus          Dogwhelk 2021      0         4
## 206   BASHAR        Littorina littorea Common periwinkle 2022      0        20
## 207   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        24
## 208   BASHAR          Nucella lapillus          Dogwhelk 2022      0         1
## 209   BASHAR           Carcinus maenas        Green crab 2023      0         1
## 210   BASHAR        Littorina littorea Common periwinkle 2023      1        23
## 211   BASHAR        Littorina obtusata Smooth periwinkle 2023      0        10
## 212   BASHAR          Nucella lapillus          Dogwhelk 2023      0         2
## 213   BASHAR           Carcinus maenas        Green crab 2024      0         1
## 214   BASHAR        Littorina littorea Common periwinkle 2024      0        25
## 215   BASHAR        Littorina obtusata Smooth periwinkle 2024      0        13
## 216   BASHAR       Littorina saxatilis  Rough periwinkle 2024      0         1
## 217   BASHAR          Nucella lapillus          Dogwhelk 2024      0         2
## 218   BASHAR        Littorina obtusata Smooth periwinkle 2014      1         4
## 219   BASHAR        Littorina littorea Common periwinkle 2015      0         1
## 220   BASHAR        Littorina obtusata Smooth periwinkle 2015      0        10
## 221   BASHAR        Littorina littorea Common periwinkle 2016      0         3
## 222   BASHAR        Littorina obtusata Smooth periwinkle 2016      0        20
## 223   BASHAR          Nucella lapillus          Dogwhelk 2016      0        11
## 224   BASHAR        Littorina littorea Common periwinkle 2017      1        57
## 225   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        21
## 226   BASHAR        Littorina littorea Common periwinkle 2018      2        51
## 227   BASHAR        Littorina obtusata Smooth periwinkle 2018      4        23
## 228   BASHAR       Littorina saxatilis  Rough periwinkle 2018      0         1
## 229   BASHAR        Littorina littorea Common periwinkle 2019      0        17
## 230   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        39
## 231   BASHAR          Nucella lapillus          Dogwhelk 2019      0         6
## 232   BASHAR           Carcinus maenas        Green crab 2021      0         9
## 233   BASHAR        Littorina littorea Common periwinkle 2021      0         9
## 234   BASHAR        Littorina obtusata Smooth periwinkle 2021      0        11
## 235   BASHAR        Littorina littorea Common periwinkle 2022      1        31
## 236   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        45
## 237   BASHAR          Nucella lapillus          Dogwhelk 2022      0         4
## 238   BASHAR        Littorina littorea Common periwinkle 2023      1        23
## 239   BASHAR        Littorina obtusata Smooth periwinkle 2023      0         8
## 240   BASHAR          Nucella lapillus          Dogwhelk 2023      0         1
## 241   BASHAR           Carcinus maenas        Green crab 2024      0         2
## 242   BASHAR        Littorina littorea Common periwinkle 2024      0        25
## 243   BASHAR        Littorina obtusata Smooth periwinkle 2024      0         5
## 244   BASHAR          Nucella lapillus          Dogwhelk 2024      0         1
## 245   BASHAR        Littorina obtusata Smooth periwinkle 2013      0         1
## 246   BASHAR       Littorina saxatilis  Rough periwinkle 2013      0         1
## 247   BASHAR        Littorina littorea Common periwinkle 2014      0         1
## 248   BASHAR        Littorina obtusata Smooth periwinkle 2014      2         0
## 249   BASHAR        Littorina littorea Common periwinkle 2015      0         1
## 250   BASHAR        Littorina obtusata Smooth periwinkle 2015      2        19
## 251   BASHAR          Nucella lapillus          Dogwhelk 2015      0         2
## 252   BASHAR        Littorina littorea Common periwinkle 2016      0         4
## 253   BASHAR        Littorina obtusata Smooth periwinkle 2016      0        21
## 254   BASHAR          Nucella lapillus          Dogwhelk 2016      1         4
## 255   BASHAR Testudinalia testudinalis            Limpet 2016      0         1
## 256   BASHAR        Littorina littorea Common periwinkle 2017      1        25
## 257   BASHAR        Littorina obtusata Smooth periwinkle 2017      1        26
## 258   BASHAR       Littorina saxatilis  Rough periwinkle 2017      0         1
## 259   BASHAR        Littorina littorea Common periwinkle 2018      1        28
## 260   BASHAR        Littorina obtusata Smooth periwinkle 2018      2        29
## 261   BASHAR       Littorina saxatilis  Rough periwinkle 2018      0         1
## 262   BASHAR        Littorina littorea Common periwinkle 2019      3        19
## 263   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        56
## 264   BASHAR       Littorina saxatilis  Rough periwinkle 2019      0         1
## 265   BASHAR          Nucella lapillus          Dogwhelk 2019      0         1
## 266   BASHAR        Littorina littorea Common periwinkle 2021      0        18
## 267   BASHAR        Littorina obtusata Smooth periwinkle 2021      0        28
## 268   BASHAR       Littorina saxatilis  Rough periwinkle 2021      0         1
## 269   BASHAR          Nucella lapillus          Dogwhelk 2021      0         2
## 270   BASHAR        Littorina littorea Common periwinkle 2022      1        29
## 271   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        72
## 272   BASHAR       Littorina saxatilis  Rough periwinkle 2022      0         2
## 273   BASHAR          Nucella lapillus          Dogwhelk 2022      0        11
## 274   BASHAR           Carcinus maenas        Green crab 2024      0         1
## 275   BASHAR        Littorina littorea Common periwinkle 2024      0         8
## 276   BASHAR        Littorina obtusata Smooth periwinkle 2024      0         5
## 277   BASHAR          Nucella lapillus          Dogwhelk 2024      0         4
## 278   BASHAR        Littorina obtusata Smooth periwinkle 2015      1        10
## 279   BASHAR        Littorina littorea Common periwinkle 2016      0         1
## 280   BASHAR        Littorina obtusata Smooth periwinkle 2016      0        26
## 281   BASHAR        Littorina littorea Common periwinkle 2017      0         5
## 282   BASHAR        Littorina littorea Common periwinkle 2018      1        10
## 283   BASHAR        Littorina obtusata Smooth periwinkle 2018      2        20
## 284   BASHAR        Littorina littorea Common periwinkle 2019      2        14
## 285   BASHAR        Littorina obtusata Smooth periwinkle 2019      0         9
## 286   BASHAR          Nucella lapillus          Dogwhelk 2019      0         1
## 287   BASHAR           Carcinus maenas        Green crab 2021      0         1
## 288   BASHAR        Littorina littorea Common periwinkle 2021      0         7
## 289   BASHAR        Littorina obtusata Smooth periwinkle 2021      1        13
## 290   BASHAR       Littorina saxatilis  Rough periwinkle 2021      0         1
## 291   BASHAR          Nucella lapillus          Dogwhelk 2021      0         1
## 292   BASHAR Testudinalia testudinalis            Limpet 2021      1         0
## 293   BASHAR        Littorina littorea Common periwinkle 2022      0        47
## 294   BASHAR        Littorina obtusata Smooth periwinkle 2022      0       107
## 295   BASHAR       Littorina saxatilis  Rough periwinkle 2022      0         1
## 296   BASHAR           Carcinus maenas        Green crab 2023      0         1
## 297   BASHAR        Littorina littorea Common periwinkle 2023      0        21
## 298   BASHAR        Littorina obtusata Smooth periwinkle 2023      0         6
## 299   BASHAR          Nucella lapillus          Dogwhelk 2023      0         3
## 300   BASHAR        Littorina littorea Common periwinkle 2024      0        17
## 301   BASHAR        Littorina obtusata Smooth periwinkle 2024      0        17
## 302   BASHAR          Nucella lapillus          Dogwhelk 2024      0         7
## 303   BASHAR        Littorina littorea Common periwinkle 2013      2        18
## 304   BASHAR        Littorina littorea Common periwinkle 2013      2        30
## 305   BASHAR        Littorina obtusata Smooth periwinkle 2013      0        31
## 306   BASHAR        Littorina obtusata Smooth periwinkle 2013      1        40
## 307   BASHAR          Nucella lapillus          Dogwhelk 2013      0         5
## 308   BASHAR Testudinalia testudinalis            Limpet 2013      0         4
## 309   BASHAR Testudinalia testudinalis            Limpet 2013      0        11
## 310   BASHAR        Littorina littorea Common periwinkle 2014      0        43
## 311   BASHAR        Littorina obtusata Smooth periwinkle 2014      0        58
## 312   BASHAR Testudinalia testudinalis            Limpet 2014      0        12
## 313   BASHAR        Littorina littorea Common periwinkle 2015      0        10
## 314   BASHAR        Littorina obtusata Smooth periwinkle 2015      3        31
## 315   BASHAR          Nucella lapillus          Dogwhelk 2015      0         1
## 316   BASHAR Testudinalia testudinalis            Limpet 2015      0        31
## 317   BASHAR        Littorina littorea Common periwinkle 2016      0        84
## 318   BASHAR        Littorina obtusata Smooth periwinkle 2016      0        74
## 319   BASHAR          Nucella lapillus          Dogwhelk 2016      0         2
## 320   BASHAR        Littorina littorea Common periwinkle 2017      0        96
## 321   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        19
## 322   BASHAR          Nucella lapillus          Dogwhelk 2017      0         9
## 323   BASHAR Testudinalia testudinalis            Limpet 2017      0         2
## 324   BASHAR        Littorina littorea Common periwinkle 2018      4        66
## 325   BASHAR        Littorina obtusata Smooth periwinkle 2018      0        18
## 326   BASHAR          Nucella lapillus          Dogwhelk 2018      0         1
## 327   BASHAR Testudinalia testudinalis            Limpet 2018      0         1
## 328   BASHAR           Carcinus maenas        Green crab 2019      0         1
## 329   BASHAR        Littorina littorea Common periwinkle 2019      0       116
## 330   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        38
## 331   BASHAR Testudinalia testudinalis            Limpet 2019      0         6
## 332   BASHAR           Carcinus maenas        Green crab 2021      0         3
## 333   BASHAR        Littorina littorea Common periwinkle 2021      2       131
## 334   BASHAR        Littorina obtusata Smooth periwinkle 2021      1        90
## 335   BASHAR          Nucella lapillus          Dogwhelk 2021      0         6
## 336   BASHAR Testudinalia testudinalis            Limpet 2021      0         1
## 337   BASHAR           Carcinus maenas        Green crab 2022      0         3
## 338   BASHAR        Littorina littorea Common periwinkle 2022      0       234
## 339   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        46
## 340   BASHAR          Nucella lapillus          Dogwhelk 2022      1         2
## 341   BASHAR Testudinalia testudinalis            Limpet 2022      0         2
## 342   BASHAR           Carcinus maenas        Green crab 2023      0         4
## 343   BASHAR        Littorina littorea Common periwinkle 2023     11       188
## 344   BASHAR        Littorina obtusata Smooth periwinkle 2023      3        24
## 345   BASHAR          Nucella lapillus          Dogwhelk 2023      0         5
## 346   BASHAR Testudinalia testudinalis            Limpet 2023      0         2
## 347   BASHAR           Carcinus maenas        Green crab 2024      0         1
## 348   BASHAR        Littorina littorea Common periwinkle 2024      1       116
## 349   BASHAR        Littorina obtusata Smooth periwinkle 2024      1        18
## 350   BASHAR Testudinalia testudinalis            Limpet 2024      0         2
## 351   BASHAR        Littorina littorea Common periwinkle 2013      0         5
## 352   BASHAR        Littorina obtusata Smooth periwinkle 2013      0        16
## 353   BASHAR        Littorina obtusata Smooth periwinkle 2013      1        80
## 354   BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 355   BASHAR          Nucella lapillus          Dogwhelk 2013      0         6
## 356   BASHAR Testudinalia testudinalis            Limpet 2013      0         3
## 357   BASHAR        Littorina littorea Common periwinkle 2014      0        10
## 358   BASHAR        Littorina obtusata Smooth periwinkle 2014      0        47
## 359   BASHAR          Nucella lapillus          Dogwhelk 2014      0         1
## 360   BASHAR Testudinalia testudinalis            Limpet 2014      0         4
## 361   BASHAR        Littorina littorea Common periwinkle 2015      1        29
## 362   BASHAR        Littorina obtusata Smooth periwinkle 2015      2        26
## 363   BASHAR          Nucella lapillus          Dogwhelk 2015      1         1
## 364   BASHAR        Littorina littorea Common periwinkle 2016      1        42
## 365   BASHAR        Littorina obtusata Smooth periwinkle 2016      2       103
## 366   BASHAR       Littorina saxatilis  Rough periwinkle 2016      0         1
## 367   BASHAR          Nucella lapillus          Dogwhelk 2016      0         6
## 368   BASHAR Testudinalia testudinalis            Limpet 2016      0         1
## 369   BASHAR        Littorina littorea Common periwinkle 2017      0        71
## 370   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        35
## 371   BASHAR          Nucella lapillus          Dogwhelk 2017      0         4
## 372   BASHAR        Littorina littorea Common periwinkle 2018      1       100
## 373   BASHAR        Littorina obtusata Smooth periwinkle 2018      1        29
## 374   BASHAR          Nucella lapillus          Dogwhelk 2018      0         1
## 375   BASHAR Testudinalia testudinalis            Limpet 2018      0         1
## 376   BASHAR        Littorina littorea Common periwinkle 2019      1        85
## 377   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        34
## 378   BASHAR          Nucella lapillus          Dogwhelk 2019      0         1
## 379   BASHAR           Carcinus maenas        Green crab 2021      0         5
## 380   BASHAR        Littorina littorea Common periwinkle 2021      0        90
## 381   BASHAR        Littorina obtusata Smooth periwinkle 2021      0        87
## 382   BASHAR          Nucella lapillus          Dogwhelk 2021      0         2
## 383   BASHAR           Carcinus maenas        Green crab 2022      0         1
## 384   BASHAR        Littorina littorea Common periwinkle 2022      4       123
## 385   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        44
## 386   BASHAR       Littorina saxatilis  Rough periwinkle 2022      0         2
## 387   BASHAR          Nucella lapillus          Dogwhelk 2022      0         2
## 388   BASHAR Testudinalia testudinalis            Limpet 2022      0         2
## 389   BASHAR           Carcinus maenas        Green crab 2023      0         1
## 390   BASHAR        Littorina littorea Common periwinkle 2023      1       148
## 391   BASHAR        Littorina obtusata Smooth periwinkle 2023      0        37
## 392   BASHAR          Nucella lapillus          Dogwhelk 2023      0         3
## 393   BASHAR        Littorina littorea Common periwinkle 2024      1        84
## 394   BASHAR        Littorina obtusata Smooth periwinkle 2024      0        41
## 395   BASHAR          Nucella lapillus          Dogwhelk 2024      0         3
## 396   BASHAR        Littorina littorea Common periwinkle 2013      0         3
## 397   BASHAR        Littorina littorea Common periwinkle 2013      0         4
## 398   BASHAR        Littorina obtusata Smooth periwinkle 2013      0        17
## 399   BASHAR        Littorina obtusata Smooth periwinkle 2013      0        24
## 400   BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 401   BASHAR          Nucella lapillus          Dogwhelk 2013      0         2
## 402   BASHAR Testudinalia testudinalis            Limpet 2013      0         4
## 403   BASHAR Testudinalia testudinalis            Limpet 2013      0        13
## 404   BASHAR        Littorina littorea Common periwinkle 2014      0        41
## 405   BASHAR        Littorina obtusata Smooth periwinkle 2014      0        59
## 406   BASHAR          Nucella lapillus          Dogwhelk 2014      0         3
## 407   BASHAR Testudinalia testudinalis            Limpet 2014      0         3
## 408   BASHAR        Littorina littorea Common periwinkle 2015      3        23
## 409   BASHAR        Littorina obtusata Smooth periwinkle 2015      0        25
## 410   BASHAR Testudinalia testudinalis            Limpet 2015      0         5
## 411   BASHAR        Littorina littorea Common periwinkle 2016      2        46
## 412   BASHAR        Littorina obtusata Smooth periwinkle 2016      1        42
## 413   BASHAR          Nucella lapillus          Dogwhelk 2016      1         2
## 414   BASHAR        Littorina littorea Common periwinkle 2017      0       121
## 415   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        30
## 416   BASHAR          Nucella lapillus          Dogwhelk 2017      0         2
## 417   BASHAR Testudinalia testudinalis            Limpet 2017      0         1
## 418   BASHAR        Littorina littorea Common periwinkle 2018      7       165
## 419   BASHAR        Littorina obtusata Smooth periwinkle 2018      0        34
## 420   BASHAR          Nucella lapillus          Dogwhelk 2018      0         4
## 421   BASHAR Testudinalia testudinalis            Limpet 2018      0         6
## 422   BASHAR        Littorina littorea Common periwinkle 2019      0        86
## 423   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        27
## 424   BASHAR        Littorina littorea Common periwinkle 2021      7       249
## 425   BASHAR        Littorina obtusata Smooth periwinkle 2021      1        42
## 426   BASHAR          Nucella lapillus          Dogwhelk 2021      0         2
## 427   BASHAR Testudinalia testudinalis            Limpet 2021      0         1
## 428   BASHAR        Littorina littorea Common periwinkle 2022      0       165
## 429   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        34
## 430   BASHAR          Nucella lapillus          Dogwhelk 2022      0         1
## 431   BASHAR Testudinalia testudinalis            Limpet 2022      0         4
## 432   BASHAR           Carcinus maenas        Green crab 2023      0         1
## 433   BASHAR        Littorina littorea Common periwinkle 2023      2       151
## 434   BASHAR        Littorina obtusata Smooth periwinkle 2023      0        16
## 435   BASHAR          Nucella lapillus          Dogwhelk 2023      0         3
## 436   BASHAR           Carcinus maenas        Green crab 2024      0         3
## 437   BASHAR        Littorina littorea Common periwinkle 2024      5       106
## 438   BASHAR        Littorina obtusata Smooth periwinkle 2024      0        26
## 439   BASHAR          Nucella lapillus          Dogwhelk 2024      0         1
## 440   BASHAR        Littorina littorea Common periwinkle 2013      1        22
## 441   BASHAR        Littorina littorea Common periwinkle 2013      1        26
## 442   BASHAR        Littorina obtusata Smooth periwinkle 2013      2        32
## 443   BASHAR        Littorina obtusata Smooth periwinkle 2013      0        80
## 444   BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 445   BASHAR Testudinalia testudinalis            Limpet 2013      0         2
## 446   BASHAR Testudinalia testudinalis            Limpet 2013      0         9
## 447   BASHAR        Littorina littorea Common periwinkle 2014      0        25
## 448   BASHAR        Littorina obtusata Smooth periwinkle 2014      0        25
## 449   BASHAR          Nucella lapillus          Dogwhelk 2014      0         1
## 450   BASHAR        Littorina littorea Common periwinkle 2015      2        18
## 451   BASHAR        Littorina obtusata Smooth periwinkle 2015      1         4
## 452   BASHAR        Littorina littorea Common periwinkle 2016      0        68
## 453   BASHAR        Littorina obtusata Smooth periwinkle 2016      0        51
## 454   BASHAR          Nucella lapillus          Dogwhelk 2016      0         1
## 455   BASHAR Testudinalia testudinalis            Limpet 2016      0        12
## 456   BASHAR        Littorina littorea Common periwinkle 2017      0       124
## 457   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        41
## 458   BASHAR          Nucella lapillus          Dogwhelk 2017      0         1
## 459   BASHAR        Littorina littorea Common periwinkle 2018      4       181
## 460   BASHAR        Littorina obtusata Smooth periwinkle 2018      1        28
## 461   BASHAR          Nucella lapillus          Dogwhelk 2018      0         2
## 462   BASHAR Testudinalia testudinalis            Limpet 2018      0         3
## 463   BASHAR        Littorina littorea Common periwinkle 2019      0       102
## 464   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        31
## 465   BASHAR Testudinalia testudinalis            Limpet 2019      0         4
## 466   BASHAR           Carcinus maenas        Green crab 2021      0         3
## 467   BASHAR        Littorina littorea Common periwinkle 2021      2       212
## 468   BASHAR        Littorina obtusata Smooth periwinkle 2021      0        34
## 469   BASHAR          Nucella lapillus          Dogwhelk 2021      0         1
## 470   BASHAR Testudinalia testudinalis            Limpet 2021      0         1
## 471   BASHAR        Littorina littorea Common periwinkle 2022     17       282
## 472   BASHAR        Littorina obtusata Smooth periwinkle 2022      4        33
## 473   BASHAR          Nucella lapillus          Dogwhelk 2022      0         9
## 474   BASHAR Testudinalia testudinalis            Limpet 2022      1         5
## 475   BASHAR        Littorina littorea Common periwinkle 2023      1       130
## 476   BASHAR        Littorina obtusata Smooth periwinkle 2023      0         7
## 477   BASHAR          Nucella lapillus          Dogwhelk 2023      0         1
## 478   BASHAR           Carcinus maenas        Green crab 2024      0         2
## 479   BASHAR        Littorina littorea Common periwinkle 2024     11       138
## 480   BASHAR        Littorina obtusata Smooth periwinkle 2024      1        21
## 481   BASHAR        Littorina littorea Common periwinkle 2013      0         5
## 482   BASHAR        Littorina littorea Common periwinkle 2013      0        13
## 483   BASHAR        Littorina obtusata Smooth periwinkle 2013      0        45
## 484   BASHAR        Littorina obtusata Smooth periwinkle 2013      1        89
## 485   BASHAR          Nucella lapillus          Dogwhelk 2013      1         0
## 486   BASHAR          Nucella lapillus          Dogwhelk 2013      0        11
## 487   BASHAR Testudinalia testudinalis            Limpet 2013      0         5
## 488   BASHAR Testudinalia testudinalis            Limpet 2013      0        13
## 489   BASHAR        Littorina littorea Common periwinkle 2014      0         9
## 490   BASHAR        Littorina obtusata Smooth periwinkle 2014      0        35
## 491   BASHAR Testudinalia testudinalis            Limpet 2014      0         1
## 492   BASHAR        Littorina littorea Common periwinkle 2015      5        17
## 493   BASHAR        Littorina obtusata Smooth periwinkle 2015      4        35
## 494   BASHAR          Nucella lapillus          Dogwhelk 2015      1         6
## 495   BASHAR Testudinalia testudinalis            Limpet 2015      0         2
## 496   BASHAR        Littorina littorea Common periwinkle 2016      0        61
## 497   BASHAR        Littorina obtusata Smooth periwinkle 2016      2        49
## 498   BASHAR          Nucella lapillus          Dogwhelk 2016      1        10
## 499   BASHAR Testudinalia testudinalis            Limpet 2016      0         4
## 500   BASHAR        Littorina littorea Common periwinkle 2017      0        80
## 501   BASHAR        Littorina obtusata Smooth periwinkle 2017      0        28
## 502   BASHAR          Nucella lapillus          Dogwhelk 2017      0         1
## 503   BASHAR Testudinalia testudinalis            Limpet 2017      0         2
## 504   BASHAR        Littorina littorea Common periwinkle 2018      0        97
## 505   BASHAR        Littorina obtusata Smooth periwinkle 2018      2        39
## 506   BASHAR          Nucella lapillus          Dogwhelk 2018      0         3
## 507   BASHAR Testudinalia testudinalis            Limpet 2018      0        10
## 508   BASHAR        Littorina littorea Common periwinkle 2019      0        70
## 509   BASHAR        Littorina obtusata Smooth periwinkle 2019      0        18
## 510   BASHAR          Nucella lapillus          Dogwhelk 2019      1         7
## 511   BASHAR           Carcinus maenas        Green crab 2021      0         1
## 512   BASHAR        Littorina littorea Common periwinkle 2021      3       130
## 513   BASHAR        Littorina obtusata Smooth periwinkle 2021      2        39
## 514   BASHAR          Nucella lapillus          Dogwhelk 2021      0         3
## 515   BASHAR Testudinalia testudinalis            Limpet 2021      0         1
## 516   BASHAR        Littorina littorea Common periwinkle 2022      6       134
## 517   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        63
## 518   BASHAR          Nucella lapillus          Dogwhelk 2022      0        10
## 519   BASHAR        Littorina littorea Common periwinkle 2023      1       168
## 520   BASHAR        Littorina obtusata Smooth periwinkle 2023      0        11
## 521   BASHAR          Nucella lapillus          Dogwhelk 2023      0         3
## 522   BASHAR Testudinalia testudinalis            Limpet 2023      0         1
## 523   BASHAR           Carcinus maenas        Green crab 2024      0         2
## 524   BASHAR        Littorina littorea Common periwinkle 2024      8        73
## 525   BASHAR        Littorina obtusata Smooth periwinkle 2024      3        29
## 526   BASHAR          Nucella lapillus          Dogwhelk 2024      0         2
## 527   BASHAR        Littorina littorea Common periwinkle 2013      0         2
## 528   BASHAR        Littorina littorea Common periwinkle 2013      0         4
## 529   BASHAR        Littorina obtusata Smooth periwinkle 2013      0         1
## 530   BASHAR        Littorina obtusata Smooth periwinkle 2013      0         1
## 531   BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 532   BASHAR Testudinalia testudinalis            Limpet 2013      0         2
## 533   BASHAR        Littorina littorea Common periwinkle 2014      0         6
## 534   BASHAR        Littorina obtusata Smooth periwinkle 2014      0         2
## 535   BASHAR          Nucella lapillus          Dogwhelk 2014      0         2
## 536   BASHAR Testudinalia testudinalis            Limpet 2014      0         3
## 537   BASHAR        Littorina littorea Common periwinkle 2015      9        69
## 538   BASHAR        Littorina obtusata Smooth periwinkle 2015      1        18
## 539   BASHAR Testudinalia testudinalis            Limpet 2015      0         6
## 540   BASHAR        Littorina littorea Common periwinkle 2016      3        18
## 541   BASHAR          Nucella lapillus          Dogwhelk 2016      0         2
## 542   BASHAR Testudinalia testudinalis            Limpet 2016      0         6
## 543   BASHAR        Littorina littorea Common periwinkle 2017      0        92
## 544   BASHAR        Littorina obtusata Smooth periwinkle 2017      0         2
## 545   BASHAR        Littorina littorea Common periwinkle 2018      5        94
## 546   BASHAR        Littorina obtusata Smooth periwinkle 2018      0         5
## 547   BASHAR Testudinalia testudinalis            Limpet 2018      0         2
## 548   BASHAR        Littorina littorea Common periwinkle 2019      0       234
## 549   BASHAR        Littorina obtusata Smooth periwinkle 2019      0         9
## 550   BASHAR Testudinalia testudinalis            Limpet 2019      0         6
## 551   BASHAR           Carcinus maenas        Green crab 2021      0         3
## 552   BASHAR        Littorina littorea Common periwinkle 2021     18       261
## 553   BASHAR        Littorina obtusata Smooth periwinkle 2021      0         4
## 554   BASHAR Testudinalia testudinalis            Limpet 2021      0         5
## 555   BASHAR        Littorina littorea Common periwinkle 2022     11       233
## 556   BASHAR        Littorina obtusata Smooth periwinkle 2022      0        12
## 557   BASHAR          Nucella lapillus          Dogwhelk 2022      0         1
## 558   BASHAR        Littorina littorea Common periwinkle 2023     10       182
## 559   BASHAR           Carcinus maenas        Green crab 2024      0         2
## 560   BASHAR        Littorina littorea Common periwinkle 2024     10       153
## 561   BASHAR          Nucella lapillus          Dogwhelk 2024      0         3
## 562   BASHAR Testudinalia testudinalis            Limpet 2024      0         1
## 563   BASHAR        Littorina obtusata Smooth periwinkle 2013      0         1
## 564   BASHAR        Littorina littorea Common periwinkle 2014      0        23
## 565   BASHAR        Littorina obtusata Smooth periwinkle 2014      0         3
## 566   BASHAR Testudinalia testudinalis            Limpet 2014      0         1
## 567   BASHAR        Littorina littorea Common periwinkle 2015     10        94
## 568   BASHAR        Littorina obtusata Smooth periwinkle 2015      0         5
## 569   BASHAR          Nucella lapillus          Dogwhelk 2015      1         0
## 570   BASHAR Testudinalia testudinalis            Limpet 2015      0        12
## 571   BASHAR        Littorina littorea Common periwinkle 2016      0        30
## 572   BASHAR        Littorina obtusata Smooth periwinkle 2016      1         0
## 573   BASHAR Testudinalia testudinalis            Limpet 2016      0         2
## 574   BASHAR        Littorina littorea Common periwinkle 2017      0       106
## 575   BASHAR        Littorina obtusata Smooth periwinkle 2017      0         1
## 576   BASHAR Testudinalia testudinalis            Limpet 2017      0         1
## 577   BASHAR        Littorina littorea Common periwinkle 2018     12        95
## 578   BASHAR        Littorina obtusata Smooth periwinkle 2018      0         4
## 579   BASHAR          Nucella lapillus          Dogwhelk 2018      0         1
## 580   BASHAR Testudinalia testudinalis            Limpet 2018      0         3
## 581   BASHAR        Littorina littorea Common periwinkle 2019      1       170
## 582   BASHAR        Littorina obtusata Smooth periwinkle 2019      0         8
## 583   BASHAR          Nucella lapillus          Dogwhelk 2019      0         1
## 584   BASHAR Testudinalia testudinalis            Limpet 2019      0         1
## 585   BASHAR           Carcinus maenas        Green crab 2021      1         0
## 586   BASHAR        Littorina littorea Common periwinkle 2021    257         0
## 587   BASHAR        Littorina obtusata Smooth periwinkle 2021      1         0
## 588   BASHAR Testudinalia testudinalis            Limpet 2021      0         1
## 589   BASHAR        Littorina littorea Common periwinkle 2022      9       123
## 590   BASHAR        Littorina obtusata Smooth periwinkle 2022      0         8
## 591   BASHAR          Nucella lapillus          Dogwhelk 2022      0         4
## 592   BASHAR Testudinalia testudinalis            Limpet 2022      0         4
## 593   BASHAR        Littorina littorea Common periwinkle 2023      6       169
## 594   BASHAR Testudinalia testudinalis            Limpet 2023      0         2
## 595   BASHAR           Carcinus maenas        Green crab 2024      0         2
## 596   BASHAR        Littorina littorea Common periwinkle 2024      6        96
## 597   BASHAR        Littorina obtusata Smooth periwinkle 2024      0         1
## 598   BASHAR          Nucella lapillus          Dogwhelk 2024      0         1
## 599   BASHAR Testudinalia testudinalis            Limpet 2024      0         1
## 600   BASHAR        Littorina littorea Common periwinkle 2013      0         1
## 601   BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 602   BASHAR        Littorina littorea Common periwinkle 2014      0         6
## 603   BASHAR Testudinalia testudinalis            Limpet 2014      0         3
## 604   BASHAR        Littorina littorea Common periwinkle 2015      5        15
## 605   BASHAR        Littorina obtusata Smooth periwinkle 2015      0         3
## 606   BASHAR          Nucella lapillus          Dogwhelk 2015      1         3
## 607   BASHAR Testudinalia testudinalis            Limpet 2015      0         3
## 608   BASHAR        Littorina littorea Common periwinkle 2016      0        51
## 609   BASHAR Testudinalia testudinalis            Limpet 2016      0         2
## 610   BASHAR        Littorina littorea Common periwinkle 2017      0        63
## 611   BASHAR        Littorina obtusata Smooth periwinkle 2017      0         1
## 612   BASHAR       Littorina saxatilis  Rough periwinkle 2017      0         1
## 613   BASHAR Testudinalia testudinalis            Limpet 2017      0         2
## 614   BASHAR        Littorina littorea Common periwinkle 2018      5       101
## 615   BASHAR Testudinalia testudinalis            Limpet 2018      0         4
## 616   BASHAR        Littorina littorea Common periwinkle 2019      7       125
## 617   BASHAR        Littorina obtusata Smooth periwinkle 2019      0         2
## 618   BASHAR          Nucella lapillus          Dogwhelk 2019      0         1
## 619   BASHAR Testudinalia testudinalis            Limpet 2019      0         5
## 620   BASHAR        Littorina littorea Common periwinkle 2021     13       107
## 621   BASHAR        Littorina littorea Common periwinkle 2022      0       148
## 622   BASHAR        Littorina obtusata Smooth periwinkle 2022      0         2
## 623   BASHAR          Nucella lapillus          Dogwhelk 2022      0         2
## 624   BASHAR           Carcinus maenas        Green crab 2023      0         1
## 625   BASHAR        Littorina littorea Common periwinkle 2023     34       180
## 626   BASHAR Testudinalia testudinalis            Limpet 2023      0         2
## 627   BASHAR        Littorina littorea Common periwinkle 2024      4        36
## 628   BASHAR        Littorina littorea Common periwinkle 2013      0         1
## 629   BASHAR        Littorina littorea Common periwinkle 2013      0         1
## 630   BASHAR        Littorina littorea Common periwinkle 2014      2        12
## 631   BASHAR        Littorina obtusata Smooth periwinkle 2014      0         1
## 632   BASHAR Testudinalia testudinalis            Limpet 2014      0         6
## 633   BASHAR        Littorina littorea Common periwinkle 2015      9        58
## 634   BASHAR        Littorina obtusata Smooth periwinkle 2015      0         1
## 635   BASHAR Testudinalia testudinalis            Limpet 2015      0         5
## 636   BASHAR        Littorina littorea Common periwinkle 2016      1         6
## 637   BASHAR Testudinalia testudinalis            Limpet 2016      0         1
## 638   BASHAR        Littorina littorea Common periwinkle 2017      1       131
## 639   BASHAR        Littorina obtusata Smooth periwinkle 2017      0         1
## 640   BASHAR          Nucella lapillus          Dogwhelk 2017      0         1
## 641   BASHAR Testudinalia testudinalis            Limpet 2017      0         2
## 642   BASHAR        Littorina littorea Common periwinkle 2018     12       106
## 643   BASHAR        Littorina obtusata Smooth periwinkle 2018      0         1
## 644   BASHAR          Nucella lapillus          Dogwhelk 2018      0         3
## 645   BASHAR Testudinalia testudinalis            Limpet 2018      0         5
## 646   BASHAR        Littorina littorea Common periwinkle 2019     11      1960
## 647   BASHAR        Littorina obtusata Smooth periwinkle 2019      0         1
## 648   BASHAR Testudinalia testudinalis            Limpet 2019      0         3
## 649   BASHAR           Carcinus maenas        Green crab 2021      0         2
## 650   BASHAR        Littorina littorea Common periwinkle 2021     15       224
## 651   BASHAR          Nucella lapillus          Dogwhelk 2021      3         0
## 652   BASHAR Testudinalia testudinalis            Limpet 2021      0         3
## 653   BASHAR        Littorina littorea Common periwinkle 2022      3       100
## 654   BASHAR        Littorina obtusata Smooth periwinkle 2022      0         4
## 655   BASHAR Testudinalia testudinalis            Limpet 2022      0         6
## 656   BASHAR        Littorina littorea Common periwinkle 2023     26       150
## 657   BASHAR Testudinalia testudinalis            Limpet 2023      0         3
## 658   BASHAR        Littorina littorea Common periwinkle 2024      3        62
## 659   BASHAR          Nucella lapillus          Dogwhelk 2013      0         1
## 660   BASHAR          Nucella lapillus          Dogwhelk 2013      0         2
## 661   BASHAR        Littorina littorea Common periwinkle 2014      0         1
## 662   BASHAR        Littorina littorea Common periwinkle 2015      4        16
## 663   BASHAR        Littorina littorea Common periwinkle 2016      1         5
## 664   BASHAR          Nucella lapillus          Dogwhelk 2016      0         2
## 665   BASHAR Testudinalia testudinalis            Limpet 2016      0         2
## 666   BASHAR        Littorina littorea Common periwinkle 2017      0        96
## 667   BASHAR          Nucella lapillus          Dogwhelk 2017      0         1
## 668   BASHAR Testudinalia testudinalis            Limpet 2017      0         2
## 669   BASHAR        Littorina littorea Common periwinkle 2018      0       101
## 670   BASHAR          Nucella lapillus          Dogwhelk 2018      0         3
## 671   BASHAR Testudinalia testudinalis            Limpet 2018      0         2
## 672   BASHAR        Littorina littorea Common periwinkle 2019      0        39
## 673   BASHAR        Littorina obtusata Smooth periwinkle 2019      0         1
## 674   BASHAR Testudinalia testudinalis            Limpet 2019      0         1
## 675   BASHAR        Littorina littorea Common periwinkle 2021      6        11
## 676   BASHAR          Nucella lapillus          Dogwhelk 2021      2         0
## 677   BASHAR Testudinalia testudinalis            Limpet 2021      0         1
## 678   BASHAR        Littorina littorea Common periwinkle 2022      5        45
## 679   BASHAR        Littorina obtusata Smooth periwinkle 2022      0         1
## 680   BASHAR Testudinalia testudinalis            Limpet 2022      0         2
## 681   BASHAR        Littorina littorea Common periwinkle 2023      2        26
## 682   BASHAR        Littorina littorea Common periwinkle 2024      0         5

Return first 5 rows and a subset of columns of the data frame

motinv[1:5, c("SiteCode", "ScientificName", "CommonName", "Year", "Damage", "No.Damage")]
View R output
##   SiteCode     ScientificName        CommonName Year Damage No.Damage
## 1   BASHAR Littorina littorea Common periwinkle 2013      0         2
## 2   BASHAR Littorina littorea Common periwinkle 2013      0         3
## 3   BASHAR Littorina obtusata Smooth periwinkle 2013      1         2
## 4   BASHAR Littorina obtusata Smooth periwinkle 2013      0         6
## 5   BASHAR   Nucella lapillus          Dogwhelk 2013      0         1

Return all rows and first 4 columns of the data frame

motinv_sub <- motinv[, 1:4] # works, but risky
motinv_sub2 <- motinv[, c("Network", "UnitCode", "SiteCode", "StartDate")]  #same result, but better
# compare the two data frames to the original
head(motinv)
head(motinv_sub)
head(motinv_sub2)

Coding Tip: As shown above, you can specify columns by name or by column number. However, it's almost always best to refer to columns by name. It makes your code easier to read and prevents it from breaking if columns get reordered.
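
To see why names are safer, here is a minimal sketch using a small made-up data frame (toy and its values are hypothetical and not part of the training data). After the columns are reordered, the numeric index silently returns different columns, while the name-based index still returns the same ones.

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(SiteCode = c("A", "B"),
                  Year = c(2013, 2014),
                  Damage = c(0, 1))

# Reorder the columns, as might happen if the source file changes
toy_reordered <- toy[, c("Damage", "SiteCode", "Year")]

# Selecting by position returns different columns after reordering
by_number_before <- names(toy[, 1:2])
by_number_after  <- names(toy_reordered[, 1:2])

# Selecting by name returns the same columns either way
by_name_before <- names(toy[, c("SiteCode", "Year")])
by_name_after  <- names(toy_reordered[, c("SiteCode", "Year")])

identical(by_number_before, by_number_after) # FALSE
identical(by_name_before, by_name_after)     # TRUE
```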


Test your skills!

CHALLENGE: How would you look at the first 4 even rows (2, 4, 6, 8) and the first 2 columns of the motinv data frame?

Answer
Answer that works
motinv[c(2, 4, 6, 8), c(1, 2)]
##   Network UnitCode
## 2    NETN     ACAD
## 4    NETN     ACAD
## 6    NETN     ACAD
## 8    NETN     ACAD
Better answer that's more stable
names(motinv) # get the names of the first 2 columns
##  [1] "Network"        "UnitCode"       "SiteCode"       "StartDate"     
##  [5] "Year"           "QAQC"           "PlotName"       "CommunityType" 
##  [9] "ScientificName" "CommonName"     "SpeciesCode"    "Damage"        
## [13] "No.Damage"      "Subsampled"
motinv[c(2, 4, 6, 8), c("Network", "UnitCode")]
##   Network UnitCode
## 2    NETN     ACAD
## 4    NETN     ACAD
## 6    NETN     ACAD
## 8    NETN     ACAD
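
As a small variation, if you would rather not type each row number, base R's seq() can generate the even row numbers for you. This sketch uses a hypothetical stand-in data frame (toy) so that it runs on its own; the same row index should work with motinv.

```r
# Hypothetical stand-in data frame with 10 rows, for illustration only
toy <- data.frame(Network = rep("NETN", 10),
                  UnitCode = rep("ACAD", 10),
                  RowID = 1:10)

# Generate the even row numbers from 2 to 8
even_rows <- seq(2, 8, by = 2)

# Use the generated row numbers inside the brackets
toy_even <- toy[even_rows, c("Network", "UnitCode")]
toy_even
```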

Advanced Bracketry

You can do more than just subset by row numbers and column names. Pattern matching to return certain rows or columns is a common and more advanced use of brackets.

Using = vs == vs %in%
A key point in R is knowing when to use a single = or a double ==.
  • To assign a value to a new object or column (stay tuned for tomorrow), use a single =.
  • To test whether a value matches another value, use the double ==.
  • Conversely, != means not equal to, and is used the same way as ==.
  • Another operator is %in%. It works like ==, but tests each value against a set of several values. The == operator is not designed to match against more than 1 value, even though it usually won't give you an error. Instead, it compares the two sides position by position, recycling the shorter one, so it can silently miss matches.

As you get more comfortable with R, this will become natural. If you forget and use = where you meant ==, R will usually throw an error and may even give you a hint about the mistake.
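The difference between == and %in% is easiest to see on a small example. The sketch below uses a made-up vector of species names (spp and targets are hypothetical objects, not the training data) and matches it against two values at once.

```r
# Toy vector of species names (hypothetical, not the training data)
spp <- c("Littorina littorea", "Nucella lapillus",
         "Littorina obtusata", "Carcinus maenas")
targets <- c("Littorina littorea", "Littorina obtusata")

# == compares position by position, recycling targets along spp,
# so the "Littorina obtusata" in position 3 is missed
eq_match <- spp == targets    # TRUE FALSE FALSE FALSE

# %in% checks each element of spp against every value in targets
in_match <- spp %in% targets  # TRUE FALSE TRUE FALSE

# != returns TRUE wherever the value does not match
not_crab <- spp != "Carcinus maenas"  # TRUE TRUE TRUE FALSE

spp[in_match]  # the two Littorina species
```

Because == gave no error or warning here, this kind of missed match is easy to overlook, which is why %in% is the safer choice whenever you are matching against more than one value.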

Pattern match (filter) to return a data frame of surveys that were not QAQC visits (QAQC == FALSE), and return all columns.

head(motinv)
View R output
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
## 6 Littorina littorea Common periwinkle      LITLIT      0         2         No

motinv_nonQ <- motinv[motinv$QAQC == FALSE, ]
table(motinv$QAQC) # 42 T
View R output
## 
## FALSE  TRUE 
##   640    42

table(motinv_nonQ$QAQC) # 0 T
View R output
## 
## FALSE 
##   640
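The filter above works because the comparison inside the brackets produces a logical vector with one TRUE or FALSE per row, and only the TRUE rows are kept. A minimal sketch of that idea, using a made-up data frame (toy_visits and its values are hypothetical):

```r
# Toy data frame (hypothetical values) with a logical QAQC column
toy_visits <- data.frame(PlotName = c("A1", "A1", "A2", "A2"),
                         QAQC = c(TRUE, FALSE, FALSE, TRUE))

# The comparison returns one TRUE/FALSE per row
keep <- toy_visits$QAQC == FALSE  # FALSE TRUE TRUE FALSE

# Rows where keep is TRUE are returned, with all columns
toy_nonQ <- toy_visits[keep, ]
nrow(toy_nonQ)  # 2

# For a logical column, negating it with ! gives the same rows
toy_nonQ2 <- toy_visits[!toy_visits$QAQC, ]
identical(toy_nonQ, toy_nonQ2)  # TRUE
```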

Filter data to only return the ScientificName column of rows where CommunityType is "Barnacle". Click on R Output below to view results.

motinv$ScientificName[motinv$CommunityType == "Barnacle"]
View R output
##   [1] "Littorina littorea"        "Littorina saxatilis"      
##   [3] "Nucella lapillus"          "Littorina littorea"       
##   [5] "Littorina obtusata"        "Nucella lapillus"         
##   [7] "Littorina littorea"        "Littorina obtusata"       
##   [9] "Nucella lapillus"          "Littorina littorea"       
##  [11] "Littorina obtusata"        "Littorina littorea"       
##  [13] "Littorina obtusata"        "Littorina saxatilis"      
##  [15] "Littorina littorea"        "Littorina obtusata"       
##  [17] "Littorina saxatilis"       "Carcinus maenas"          
##  [19] "Littorina littorea"        "Littorina obtusata"       
##  [21] "Littorina saxatilis"       "Nucella lapillus"         
##  [23] "Littorina littorea"        "Littorina obtusata"       
##  [25] "Nucella lapillus"          "Littorina littorea"       
##  [27] "Littorina obtusata"        "Littorina saxatilis"      
##  [29] "Littorina littorea"        "Littorina obtusata"       
##  [31] "Littorina littorea"        "Littorina obtusata"       
##  [33] "Littorina littorea"        "Littorina obtusata"       
##  [35] "Littorina littorea"        "Littorina obtusata"       
##  [37] "Littorina littorea"        "Littorina obtusata"       
##  [39] "Nucella lapillus"          "Littorina littorea"       
##  [41] "Littorina obtusata"        "Nucella lapillus"         
##  [43] "Littorina littorea"        "Littorina obtusata"       
##  [45] "Littorina littorea"        "Littorina obtusata"       
##  [47] "Nucella lapillus"          "Carcinus maenas"          
##  [49] "Littorina littorea"        "Littorina obtusata"       
##  [51] "Nucella lapillus"          "Littorina littorea"       
##  [53] "Littorina obtusata"        "Nucella lapillus"         
##  [55] "Carcinus maenas"           "Littorina littorea"       
##  [57] "Littorina obtusata"        "Nucella lapillus"         
##  [59] "Carcinus maenas"           "Littorina littorea"       
##  [61] "Littorina obtusata"        "Littorina saxatilis"      
##  [63] "Nucella lapillus"          "Littorina obtusata"       
##  [65] "Littorina littorea"        "Littorina obtusata"       
##  [67] "Littorina littorea"        "Littorina obtusata"       
##  [69] "Nucella lapillus"          "Littorina littorea"       
##  [71] "Littorina obtusata"        "Littorina littorea"       
##  [73] "Littorina obtusata"        "Littorina saxatilis"      
##  [75] "Littorina littorea"        "Littorina obtusata"       
##  [77] "Nucella lapillus"          "Carcinus maenas"          
##  [79] "Littorina littorea"        "Littorina obtusata"       
##  [81] "Littorina littorea"        "Littorina obtusata"       
##  [83] "Nucella lapillus"          "Littorina littorea"       
##  [85] "Littorina obtusata"        "Nucella lapillus"         
##  [87] "Carcinus maenas"           "Littorina littorea"       
##  [89] "Littorina obtusata"        "Nucella lapillus"         
##  [91] "Littorina obtusata"        "Littorina saxatilis"      
##  [93] "Littorina littorea"        "Littorina obtusata"       
##  [95] "Littorina littorea"        "Littorina obtusata"       
##  [97] "Nucella lapillus"          "Littorina littorea"       
##  [99] "Littorina obtusata"        "Nucella lapillus"         
## [101] "Testudinalia testudinalis" "Littorina littorea"       
## [103] "Littorina obtusata"        "Littorina saxatilis"      
## [105] "Littorina littorea"        "Littorina obtusata"       
## [107] "Littorina saxatilis"       "Littorina littorea"       
## [109] "Littorina obtusata"        "Littorina saxatilis"      
## [111] "Nucella lapillus"          "Littorina littorea"       
## [113] "Littorina obtusata"        "Littorina saxatilis"      
## [115] "Nucella lapillus"          "Littorina littorea"       
## [117] "Littorina obtusata"        "Littorina saxatilis"      
## [119] "Nucella lapillus"          "Carcinus maenas"          
## [121] "Littorina littorea"        "Littorina obtusata"       
## [123] "Nucella lapillus"          "Littorina obtusata"       
## [125] "Littorina littorea"        "Littorina obtusata"       
## [127] "Littorina littorea"        "Littorina littorea"       
## [129] "Littorina obtusata"        "Littorina littorea"       
## [131] "Littorina obtusata"        "Nucella lapillus"         
## [133] "Carcinus maenas"           "Littorina littorea"       
## [135] "Littorina obtusata"        "Littorina saxatilis"      
## [137] "Nucella lapillus"          "Testudinalia testudinalis"
## [139] "Littorina littorea"        "Littorina obtusata"       
## [141] "Littorina saxatilis"       "Carcinus maenas"          
## [143] "Littorina littorea"        "Littorina obtusata"       
## [145] "Nucella lapillus"          "Littorina littorea"       
## [147] "Littorina obtusata"        "Nucella lapillus"

motinv[motinv$CommunityType == "Barnacle", "ScientificName"] # equivalent
View R output
##   [1] "Littorina littorea"        "Littorina saxatilis"      
##   [3] "Nucella lapillus"          "Littorina littorea"       
##   [5] "Littorina obtusata"        "Nucella lapillus"         
##   [7] "Littorina littorea"        "Littorina obtusata"       
##   [9] "Nucella lapillus"          "Littorina littorea"       
##  [11] "Littorina obtusata"        "Littorina littorea"       
##  [13] "Littorina obtusata"        "Littorina saxatilis"      
##  [15] "Littorina littorea"        "Littorina obtusata"       
##  [17] "Littorina saxatilis"       "Carcinus maenas"          
##  [19] "Littorina littorea"        "Littorina obtusata"       
##  [21] "Littorina saxatilis"       "Nucella lapillus"         
##  [23] "Littorina littorea"        "Littorina obtusata"       
##  [25] "Nucella lapillus"          "Littorina littorea"       
##  [27] "Littorina obtusata"        "Littorina saxatilis"      
##  [29] "Littorina littorea"        "Littorina obtusata"       
##  [31] "Littorina littorea"        "Littorina obtusata"       
##  [33] "Littorina littorea"        "Littorina obtusata"       
##  [35] "Littorina littorea"        "Littorina obtusata"       
##  [37] "Littorina littorea"        "Littorina obtusata"       
##  [39] "Nucella lapillus"          "Littorina littorea"       
##  [41] "Littorina obtusata"        "Nucella lapillus"         
##  [43] "Littorina littorea"        "Littorina obtusata"       
##  [45] "Littorina littorea"        "Littorina obtusata"       
##  [47] "Nucella lapillus"          "Carcinus maenas"          
##  [49] "Littorina littorea"        "Littorina obtusata"       
##  [51] "Nucella lapillus"          "Littorina littorea"       
##  [53] "Littorina obtusata"        "Nucella lapillus"         
##  [55] "Carcinus maenas"           "Littorina littorea"       
##  [57] "Littorina obtusata"        "Nucella lapillus"         
##  [59] "Carcinus maenas"           "Littorina littorea"       
##  [61] "Littorina obtusata"        "Littorina saxatilis"      
##  [63] "Nucella lapillus"          "Littorina obtusata"       
##  [65] "Littorina littorea"        "Littorina obtusata"       
##  [67] "Littorina littorea"        "Littorina obtusata"       
##  [69] "Nucella lapillus"          "Littorina littorea"       
##  [71] "Littorina obtusata"        "Littorina littorea"       
##  [73] "Littorina obtusata"        "Littorina saxatilis"      
##  [75] "Littorina littorea"        "Littorina obtusata"       
##  [77] "Nucella lapillus"          "Carcinus maenas"          
##  [79] "Littorina littorea"        "Littorina obtusata"       
##  [81] "Littorina littorea"        "Littorina obtusata"       
##  [83] "Nucella lapillus"          "Littorina littorea"       
##  [85] "Littorina obtusata"        "Nucella lapillus"         
##  [87] "Carcinus maenas"           "Littorina littorea"       
##  [89] "Littorina obtusata"        "Nucella lapillus"         
##  [91] "Littorina obtusata"        "Littorina saxatilis"      
##  [93] "Littorina littorea"        "Littorina obtusata"       
##  [95] "Littorina littorea"        "Littorina obtusata"       
##  [97] "Nucella lapillus"          "Littorina littorea"       
##  [99] "Littorina obtusata"        "Nucella lapillus"         
## [101] "Testudinalia testudinalis" "Littorina littorea"       
## [103] "Littorina obtusata"        "Littorina saxatilis"      
## [105] "Littorina littorea"        "Littorina obtusata"       
## [107] "Littorina saxatilis"       "Littorina littorea"       
## [109] "Littorina obtusata"        "Littorina saxatilis"      
## [111] "Nucella lapillus"          "Littorina littorea"       
## [113] "Littorina obtusata"        "Littorina saxatilis"      
## [115] "Nucella lapillus"          "Littorina littorea"       
## [117] "Littorina obtusata"        "Littorina saxatilis"      
## [119] "Nucella lapillus"          "Carcinus maenas"          
## [121] "Littorina littorea"        "Littorina obtusata"       
## [123] "Nucella lapillus"          "Littorina obtusata"       
## [125] "Littorina littorea"        "Littorina obtusata"       
## [127] "Littorina littorea"        "Littorina littorea"       
## [129] "Littorina obtusata"        "Littorina littorea"       
## [131] "Littorina obtusata"        "Nucella lapillus"         
## [133] "Carcinus maenas"           "Littorina littorea"       
## [135] "Littorina obtusata"        "Littorina saxatilis"      
## [137] "Nucella lapillus"          "Testudinalia testudinalis"
## [139] "Littorina littorea"        "Littorina obtusata"       
## [141] "Littorina saxatilis"       "Carcinus maenas"          
## [143] "Littorina littorea"        "Littorina obtusata"       
## [145] "Nucella lapillus"          "Littorina littorea"       
## [147] "Littorina obtusata"        "Nucella lapillus"
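One way to confirm that the two forms return the same thing is identical(). The sketch below uses a made-up data frame (toy_inv and its values are hypothetical). It assumes a base R data frame; a tibble would return a one-column tibble from the bracket form rather than a vector.

```r
# Toy data frame (hypothetical values)
toy_inv <- data.frame(
  CommunityType = c("Barnacle", "Fucus", "Barnacle"),
  ScientificName = c("Nucella lapillus", "Littorina littorea", "Carcinus maenas")
)

# Form 1: pull the column with $, then filter the resulting vector
via_dollar <- toy_inv$ScientificName[toy_inv$CommunityType == "Barnacle"]

# Form 2: filter rows and name the column inside one set of brackets
via_bracket <- toy_inv[toy_inv$CommunityType == "Barnacle", "ScientificName"]

identical(via_dollar, via_bracket)  # TRUE

# drop = FALSE keeps the result as a one-column data frame instead of a vector
as_df <- toy_inv[toy_inv$CommunityType == "Barnacle", "ScientificName", drop = FALSE]
class(as_df)  # "data.frame"
```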

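Filter data to return every row where a Littorina species was detected, keeping only the SiteCode, PlotName, ScientificName, and Year columns. Note that the code below matches on species only; it does not restrict the results to the barnacle plots.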

lit_spp <- c("Littorina littorea", "Littorina obtusata", "Littorina saxatilis")
motinv_lit <- motinv[motinv$ScientificName %in% lit_spp, 
                     c("SiteCode", "PlotName", "ScientificName", "Year")]
motinv_lit
View R output
##     SiteCode PlotName      ScientificName Year
## 1     BASHAR       A1  Littorina littorea 2013
## 2     BASHAR       A1  Littorina littorea 2013
## 3     BASHAR       A1  Littorina obtusata 2013
## 4     BASHAR       A1  Littorina obtusata 2013
## 6     BASHAR       A1  Littorina littorea 2014
## 7     BASHAR       A1  Littorina obtusata 2014
## 8     BASHAR       A1  Littorina littorea 2016
## 9     BASHAR       A1  Littorina obtusata 2016
## 10    BASHAR       A1  Littorina littorea 2017
## 11    BASHAR       A1  Littorina obtusata 2017
## 12    BASHAR       A1  Littorina littorea 2018
## 13    BASHAR       A1  Littorina obtusata 2018
## 14    BASHAR       A1  Littorina littorea 2019
## 15    BASHAR       A1  Littorina obtusata 2019
## 17    BASHAR       A1  Littorina littorea 2021
## 18    BASHAR       A1  Littorina obtusata 2021
## 20    BASHAR       A1  Littorina littorea 2022
## 21    BASHAR       A1  Littorina obtusata 2022
## 23    BASHAR       A1  Littorina littorea 2023
## 24    BASHAR       A1  Littorina obtusata 2023
## 26    BASHAR       A1  Littorina littorea 2024
## 27    BASHAR       A1 Littorina saxatilis 2024
## 28    BASHAR       A2  Littorina littorea 2013
## 29    BASHAR       A2  Littorina littorea 2013
## 30    BASHAR       A2  Littorina obtusata 2013
## 31    BASHAR       A2  Littorina obtusata 2013
## 34    BASHAR       A2  Littorina littorea 2014
## 35    BASHAR       A2  Littorina obtusata 2014
## 36    BASHAR       A2 Littorina saxatilis 2014
## 37    BASHAR       A2  Littorina littorea 2015
## 38    BASHAR       A2  Littorina obtusata 2015
## 40    BASHAR       A2  Littorina littorea 2016
## 41    BASHAR       A2  Littorina obtusata 2016
## 43    BASHAR       A2  Littorina littorea 2017
## 44    BASHAR       A2  Littorina obtusata 2017
## 45    BASHAR       A2  Littorina littorea 2018
## 46    BASHAR       A2  Littorina obtusata 2018
## 47    BASHAR       A2  Littorina littorea 2019
## 48    BASHAR       A2  Littorina obtusata 2019
## 50    BASHAR       A2  Littorina littorea 2021
## 51    BASHAR       A2  Littorina obtusata 2021
## 53    BASHAR       A2  Littorina littorea 2022
## 54    BASHAR       A2  Littorina obtusata 2022
## 57    BASHAR       A2  Littorina littorea 2023
## 58    BASHAR       A2  Littorina obtusata 2023
## 60    BASHAR       A2  Littorina littorea 2024
## 61    BASHAR       A2  Littorina obtusata 2024
## 62    BASHAR       A3  Littorina littorea 2013
## 63    BASHAR       A3  Littorina littorea 2013
## 64    BASHAR       A3  Littorina obtusata 2013
## 65    BASHAR       A3  Littorina obtusata 2013
## 68    BASHAR       A3  Littorina littorea 2014
## 69    BASHAR       A3  Littorina obtusata 2014
## 70    BASHAR       A3  Littorina littorea 2015
## 71    BASHAR       A3  Littorina obtusata 2015
## 72    BASHAR       A3  Littorina littorea 2016
## 73    BASHAR       A3  Littorina obtusata 2016
## 75    BASHAR       A3  Littorina littorea 2017
## 76    BASHAR       A3  Littorina obtusata 2017
## 77    BASHAR       A3  Littorina littorea 2018
## 78    BASHAR       A3  Littorina obtusata 2018
## 80    BASHAR       A3  Littorina littorea 2019
## 81    BASHAR       A3  Littorina obtusata 2019
## 83    BASHAR       A3  Littorina littorea 2021
## 84    BASHAR       A3  Littorina obtusata 2021
## 86    BASHAR       A3  Littorina littorea 2022
## 87    BASHAR       A3  Littorina obtusata 2022
## 89    BASHAR       A3  Littorina littorea 2023
## 90    BASHAR       A3  Littorina obtusata 2023
## 91    BASHAR       A3  Littorina littorea 2024
## 92    BASHAR       A3  Littorina obtusata 2024
## 93    BASHAR       A4  Littorina littorea 2013
## 94    BASHAR       A4  Littorina littorea 2013
## 95    BASHAR       A4  Littorina obtusata 2013
## 96    BASHAR       A4  Littorina obtusata 2013
## 98    BASHAR       A4  Littorina littorea 2014
## 99    BASHAR       A4  Littorina obtusata 2014
## 101   BASHAR       A4  Littorina littorea 2015
## 102   BASHAR       A4  Littorina obtusata 2015
## 103   BASHAR       A4  Littorina littorea 2016
## 104   BASHAR       A4  Littorina obtusata 2016
## 106   BASHAR       A4  Littorina littorea 2017
## 107   BASHAR       A4  Littorina obtusata 2017
## 108   BASHAR       A4  Littorina littorea 2018
## 109   BASHAR       A4  Littorina obtusata 2018
## 112   BASHAR       A4  Littorina littorea 2019
## 113   BASHAR       A4  Littorina obtusata 2019
## 115   BASHAR       A4  Littorina littorea 2021
## 116   BASHAR       A4  Littorina obtusata 2021
## 118   BASHAR       A4  Littorina littorea 2022
## 119   BASHAR       A4  Littorina obtusata 2022
## 122   BASHAR       A4  Littorina littorea 2023
## 123   BASHAR       A4  Littorina obtusata 2023
## 125   BASHAR       A4  Littorina littorea 2024
## 126   BASHAR       A4  Littorina obtusata 2024
## 127   BASHAR       A5  Littorina littorea 2013
## 128   BASHAR       A5  Littorina littorea 2013
## 129   BASHAR       A5  Littorina obtusata 2013
## 130   BASHAR       A5  Littorina obtusata 2013
## 131   BASHAR       A5  Littorina littorea 2014
## 132   BASHAR       A5  Littorina obtusata 2014
## 133   BASHAR       A5  Littorina littorea 2015
## 134   BASHAR       A5  Littorina obtusata 2015
## 135   BASHAR       A5  Littorina littorea 2016
## 136   BASHAR       A5  Littorina obtusata 2016
## 137   BASHAR       A5  Littorina littorea 2017
## 138   BASHAR       A5  Littorina obtusata 2017
## 140   BASHAR       A5  Littorina littorea 2018
## 141   BASHAR       A5  Littorina obtusata 2018
## 142   BASHAR       A5  Littorina littorea 2019
## 143   BASHAR       A5  Littorina obtusata 2019
## 145   BASHAR       A5  Littorina littorea 2021
## 146   BASHAR       A5  Littorina obtusata 2021
## 147   BASHAR       A5  Littorina littorea 2022
## 148   BASHAR       A5  Littorina obtusata 2022
## 151   BASHAR       A5  Littorina littorea 2023
## 152   BASHAR       A5  Littorina obtusata 2023
## 153   BASHAR       A5  Littorina littorea 2024
## 154   BASHAR       A5  Littorina obtusata 2024
## 155   BASHAR       B1  Littorina littorea 2013
## 156   BASHAR       B1 Littorina saxatilis 2013
## 158   BASHAR       B1  Littorina littorea 2015
## 159   BASHAR       B1  Littorina obtusata 2015
## 161   BASHAR       B1  Littorina littorea 2016
## 162   BASHAR       B1  Littorina obtusata 2016
## 164   BASHAR       B1  Littorina littorea 2017
## 165   BASHAR       B1  Littorina obtusata 2017
## 166   BASHAR       B1  Littorina littorea 2018
## 167   BASHAR       B1  Littorina obtusata 2018
## 168   BASHAR       B1 Littorina saxatilis 2018
## 169   BASHAR       B1  Littorina littorea 2019
## 170   BASHAR       B1  Littorina obtusata 2019
## 171   BASHAR       B1 Littorina saxatilis 2019
## 173   BASHAR       B1  Littorina littorea 2021
## 174   BASHAR       B1  Littorina obtusata 2021
## 175   BASHAR       B1 Littorina saxatilis 2021
## 177   BASHAR       B1  Littorina littorea 2022
## 178   BASHAR       B1  Littorina obtusata 2022
## 180   BASHAR       B1  Littorina littorea 2023
## 181   BASHAR       B1  Littorina obtusata 2023
## 182   BASHAR       B1 Littorina saxatilis 2023
## 183   BASHAR       B1  Littorina littorea 2024
## 184   BASHAR       B1  Littorina obtusata 2024
## 185   BASHAR       B2  Littorina littorea 2013
## 186   BASHAR       B2  Littorina obtusata 2013
## 187   BASHAR       B2  Littorina littorea 2014
## 188   BASHAR       B2  Littorina obtusata 2014
## 189   BASHAR       B2  Littorina littorea 2015
## 190   BASHAR       B2  Littorina obtusata 2015
## 191   BASHAR       B2  Littorina littorea 2016
## 192   BASHAR       B2  Littorina obtusata 2016
## 194   BASHAR       B2  Littorina littorea 2017
## 195   BASHAR       B2  Littorina obtusata 2017
## 197   BASHAR       B2  Littorina littorea 2018
## 198   BASHAR       B2  Littorina obtusata 2018
## 199   BASHAR       B2  Littorina littorea 2019
## 200   BASHAR       B2  Littorina obtusata 2019
## 203   BASHAR       B2  Littorina littorea 2021
## 204   BASHAR       B2  Littorina obtusata 2021
## 206   BASHAR       B2  Littorina littorea 2022
## 207   BASHAR       B2  Littorina obtusata 2022
## 210   BASHAR       B2  Littorina littorea 2023
## 211   BASHAR       B2  Littorina obtusata 2023
## 214   BASHAR       B2  Littorina littorea 2024
## 215   BASHAR       B2  Littorina obtusata 2024
## 216   BASHAR       B2 Littorina saxatilis 2024
## 218   BASHAR       B3  Littorina obtusata 2014
## 219   BASHAR       B3  Littorina littorea 2015
## 220   BASHAR       B3  Littorina obtusata 2015
## 221   BASHAR       B3  Littorina littorea 2016
## 222   BASHAR       B3  Littorina obtusata 2016
## 224   BASHAR       B3  Littorina littorea 2017
## 225   BASHAR       B3  Littorina obtusata 2017
## 226   BASHAR       B3  Littorina littorea 2018
## 227   BASHAR       B3  Littorina obtusata 2018
## 228   BASHAR       B3 Littorina saxatilis 2018
## 229   BASHAR       B3  Littorina littorea 2019
## 230   BASHAR       B3  Littorina obtusata 2019
## 233   BASHAR       B3  Littorina littorea 2021
## 234   BASHAR       B3  Littorina obtusata 2021
## 235   BASHAR       B3  Littorina littorea 2022
## 236   BASHAR       B3  Littorina obtusata 2022
## 238   BASHAR       B3  Littorina littorea 2023
## 239   BASHAR       B3  Littorina obtusata 2023
## 242   BASHAR       B3  Littorina littorea 2024
## 243   BASHAR       B3  Littorina obtusata 2024
## 245   BASHAR       B4  Littorina obtusata 2013
## 246   BASHAR       B4 Littorina saxatilis 2013
## 247   BASHAR       B4  Littorina littorea 2014
## 248   BASHAR       B4  Littorina obtusata 2014
## 249   BASHAR       B4  Littorina littorea 2015
## 250   BASHAR       B4  Littorina obtusata 2015
## 252   BASHAR       B4  Littorina littorea 2016
## 253   BASHAR       B4  Littorina obtusata 2016
## 256   BASHAR       B4  Littorina littorea 2017
## 257   BASHAR       B4  Littorina obtusata 2017
## 258   BASHAR       B4 Littorina saxatilis 2017
## 259   BASHAR       B4  Littorina littorea 2018
## 260   BASHAR       B4  Littorina obtusata 2018
## 261   BASHAR       B4 Littorina saxatilis 2018
## 262   BASHAR       B4  Littorina littorea 2019
## 263   BASHAR       B4  Littorina obtusata 2019
## 264   BASHAR       B4 Littorina saxatilis 2019
## 266   BASHAR       B4  Littorina littorea 2021
## 267   BASHAR       B4  Littorina obtusata 2021
## 268   BASHAR       B4 Littorina saxatilis 2021
## 270   BASHAR       B4  Littorina littorea 2022
## 271   BASHAR       B4  Littorina obtusata 2022
## 272   BASHAR       B4 Littorina saxatilis 2022
## 275   BASHAR       B4  Littorina littorea 2024
## 276   BASHAR       B4  Littorina obtusata 2024
## 278   BASHAR       B5  Littorina obtusata 2015
## 279   BASHAR       B5  Littorina littorea 2016
## 280   BASHAR       B5  Littorina obtusata 2016
## 281   BASHAR       B5  Littorina littorea 2017
## 282   BASHAR       B5  Littorina littorea 2018
## 283   BASHAR       B5  Littorina obtusata 2018
## 284   BASHAR       B5  Littorina littorea 2019
## 285   BASHAR       B5  Littorina obtusata 2019
## 288   BASHAR       B5  Littorina littorea 2021
## 289   BASHAR       B5  Littorina obtusata 2021
## 290   BASHAR       B5 Littorina saxatilis 2021
## 293   BASHAR       B5  Littorina littorea 2022
## 294   BASHAR       B5  Littorina obtusata 2022
## 295   BASHAR       B5 Littorina saxatilis 2022
## 297   BASHAR       B5  Littorina littorea 2023
## 298   BASHAR       B5  Littorina obtusata 2023
## 300   BASHAR       B5  Littorina littorea 2024
## 301   BASHAR       B5  Littorina obtusata 2024
## 303   BASHAR       F1  Littorina littorea 2013
## 304   BASHAR       F1  Littorina littorea 2013
## 305   BASHAR       F1  Littorina obtusata 2013
## 306   BASHAR       F1  Littorina obtusata 2013
## 310   BASHAR       F1  Littorina littorea 2014
## 311   BASHAR       F1  Littorina obtusata 2014
## 313   BASHAR       F1  Littorina littorea 2015
## 314   BASHAR       F1  Littorina obtusata 2015
## 317   BASHAR       F1  Littorina littorea 2016
## 318   BASHAR       F1  Littorina obtusata 2016
## 320   BASHAR       F1  Littorina littorea 2017
## 321   BASHAR       F1  Littorina obtusata 2017
## 324   BASHAR       F1  Littorina littorea 2018
## 325   BASHAR       F1  Littorina obtusata 2018
## 329   BASHAR       F1  Littorina littorea 2019
## 330   BASHAR       F1  Littorina obtusata 2019
## 333   BASHAR       F1  Littorina littorea 2021
## 334   BASHAR       F1  Littorina obtusata 2021
## 338   BASHAR       F1  Littorina littorea 2022
## 339   BASHAR       F1  Littorina obtusata 2022
## 343   BASHAR       F1  Littorina littorea 2023
## 344   BASHAR       F1  Littorina obtusata 2023
## 348   BASHAR       F1  Littorina littorea 2024
## 349   BASHAR       F1  Littorina obtusata 2024
## 351   BASHAR       F2  Littorina littorea 2013
## 352   BASHAR       F2  Littorina obtusata 2013
## 353   BASHAR       F2  Littorina obtusata 2013
## 357   BASHAR       F2  Littorina littorea 2014
## 358   BASHAR       F2  Littorina obtusata 2014
## 361   BASHAR       F2  Littorina littorea 2015
## 362   BASHAR       F2  Littorina obtusata 2015
## 364   BASHAR       F2  Littorina littorea 2016
## 365   BASHAR       F2  Littorina obtusata 2016
## 366   BASHAR       F2 Littorina saxatilis 2016
## 369   BASHAR       F2  Littorina littorea 2017
## 370   BASHAR       F2  Littorina obtusata 2017
## 372   BASHAR       F2  Littorina littorea 2018
## 373   BASHAR       F2  Littorina obtusata 2018
## 376   BASHAR       F2  Littorina littorea 2019
## 377   BASHAR       F2  Littorina obtusata 2019
## 380   BASHAR       F2  Littorina littorea 2021
## 381   BASHAR       F2  Littorina obtusata 2021
## 384   BASHAR       F2  Littorina littorea 2022
## 385   BASHAR       F2  Littorina obtusata 2022
## 386   BASHAR       F2 Littorina saxatilis 2022
## 390   BASHAR       F2  Littorina littorea 2023
## 391   BASHAR       F2  Littorina obtusata 2023
## 393   BASHAR       F2  Littorina littorea 2024
## 394   BASHAR       F2  Littorina obtusata 2024
## 396   BASHAR       F3  Littorina littorea 2013
## 397   BASHAR       F3  Littorina littorea 2013
## 398   BASHAR       F3  Littorina obtusata 2013
## 399   BASHAR       F3  Littorina obtusata 2013
## 404   BASHAR       F3  Littorina littorea 2014
## 405   BASHAR       F3  Littorina obtusata 2014
## 408   BASHAR       F3  Littorina littorea 2015
## 409   BASHAR       F3  Littorina obtusata 2015
## 411   BASHAR       F3  Littorina littorea 2016
## 412   BASHAR       F3  Littorina obtusata 2016
## 414   BASHAR       F3  Littorina littorea 2017
## 415   BASHAR       F3  Littorina obtusata 2017
## 418   BASHAR       F3  Littorina littorea 2018
## 419   BASHAR       F3  Littorina obtusata 2018
## 422   BASHAR       F3  Littorina littorea 2019
## 423   BASHAR       F3  Littorina obtusata 2019
## 424   BASHAR       F3  Littorina littorea 2021
## 425   BASHAR       F3  Littorina obtusata 2021
## 428   BASHAR       F3  Littorina littorea 2022
## 429   BASHAR       F3  Littorina obtusata 2022
## 433   BASHAR       F3  Littorina littorea 2023
## 434   BASHAR       F3  Littorina obtusata 2023
## 437   BASHAR       F3  Littorina littorea 2024
## 438   BASHAR       F3  Littorina obtusata 2024
## 440   BASHAR       F4  Littorina littorea 2013
## 441   BASHAR       F4  Littorina littorea 2013
## 442   BASHAR       F4  Littorina obtusata 2013
## 443   BASHAR       F4  Littorina obtusata 2013
## 447   BASHAR       F4  Littorina littorea 2014
## 448   BASHAR       F4  Littorina obtusata 2014
## 450   BASHAR       F4  Littorina littorea 2015
## 451   BASHAR       F4  Littorina obtusata 2015
## 452   BASHAR       F4  Littorina littorea 2016
## 453   BASHAR       F4  Littorina obtusata 2016
## 456   BASHAR       F4  Littorina littorea 2017
## 457   BASHAR       F4  Littorina obtusata 2017
## 459   BASHAR       F4  Littorina littorea 2018
## 460   BASHAR       F4  Littorina obtusata 2018
## 463   BASHAR       F4  Littorina littorea 2019
## 464   BASHAR       F4  Littorina obtusata 2019
## 467   BASHAR       F4  Littorina littorea 2021
## 468   BASHAR       F4  Littorina obtusata 2021
## 471   BASHAR       F4  Littorina littorea 2022
## 472   BASHAR       F4  Littorina obtusata 2022
## 475   BASHAR       F4  Littorina littorea 2023
## 476   BASHAR       F4  Littorina obtusata 2023
## 479   BASHAR       F4  Littorina littorea 2024
## 480   BASHAR       F4  Littorina obtusata 2024
## 481   BASHAR       F5  Littorina littorea 2013
## 482   BASHAR       F5  Littorina littorea 2013
## 483   BASHAR       F5  Littorina obtusata 2013
## 484   BASHAR       F5  Littorina obtusata 2013
## 489   BASHAR       F5  Littorina littorea 2014
## 490   BASHAR       F5  Littorina obtusata 2014
## 492   BASHAR       F5  Littorina littorea 2015
## 493   BASHAR       F5  Littorina obtusata 2015
## 496   BASHAR       F5  Littorina littorea 2016
## 497   BASHAR       F5  Littorina obtusata 2016
## 500   BASHAR       F5  Littorina littorea 2017
## 501   BASHAR       F5  Littorina obtusata 2017
## 504   BASHAR       F5  Littorina littorea 2018
## 505   BASHAR       F5  Littorina obtusata 2018
## 508   BASHAR       F5  Littorina littorea 2019
## 509   BASHAR       F5  Littorina obtusata 2019
## 512   BASHAR       F5  Littorina littorea 2021
## 513   BASHAR       F5  Littorina obtusata 2021
## 516   BASHAR       F5  Littorina littorea 2022
## 517   BASHAR       F5  Littorina obtusata 2022
## 519   BASHAR       F5  Littorina littorea 2023
## 520   BASHAR       F5  Littorina obtusata 2023
## 524   BASHAR       F5  Littorina littorea 2024
## 525   BASHAR       F5  Littorina obtusata 2024
## 527   BASHAR       R1  Littorina littorea 2013
## 528   BASHAR       R1  Littorina littorea 2013
## 529   BASHAR       R1  Littorina obtusata 2013
## 530   BASHAR       R1  Littorina obtusata 2013
## 533   BASHAR       R1  Littorina littorea 2014
## 534   BASHAR       R1  Littorina obtusata 2014
## 537   BASHAR       R1  Littorina littorea 2015
## 538   BASHAR       R1  Littorina obtusata 2015
## 540   BASHAR       R1  Littorina littorea 2016
## 543   BASHAR       R1  Littorina littorea 2017
## 544   BASHAR       R1  Littorina obtusata 2017
## 545   BASHAR       R1  Littorina littorea 2018
## 546   BASHAR       R1  Littorina obtusata 2018
## 548   BASHAR       R1  Littorina littorea 2019
## 549   BASHAR       R1  Littorina obtusata 2019
## 552   BASHAR       R1  Littorina littorea 2021
## 553   BASHAR       R1  Littorina obtusata 2021
## 555   BASHAR       R1  Littorina littorea 2022
## 556   BASHAR       R1  Littorina obtusata 2022
## 558   BASHAR       R1  Littorina littorea 2023
## 560   BASHAR       R1  Littorina littorea 2024
## 563   BASHAR       R2  Littorina obtusata 2013
## 564   BASHAR       R2  Littorina littorea 2014
## 565   BASHAR       R2  Littorina obtusata 2014
## 567   BASHAR       R2  Littorina littorea 2015
## 568   BASHAR       R2  Littorina obtusata 2015
## 571   BASHAR       R2  Littorina littorea 2016
## 572   BASHAR       R2  Littorina obtusata 2016
## 574   BASHAR       R2  Littorina littorea 2017
## 575   BASHAR       R2  Littorina obtusata 2017
## 577   BASHAR       R2  Littorina littorea 2018
## 578   BASHAR       R2  Littorina obtusata 2018
## 581   BASHAR       R2  Littorina littorea 2019
## 582   BASHAR       R2  Littorina obtusata 2019
## 586   BASHAR       R2  Littorina littorea 2021
## 587   BASHAR       R2  Littorina obtusata 2021
## 589   BASHAR       R2  Littorina littorea 2022
## 590   BASHAR       R2  Littorina obtusata 2022
## 593   BASHAR       R2  Littorina littorea 2023
## 596   BASHAR       R2  Littorina littorea 2024
## 597   BASHAR       R2  Littorina obtusata 2024
## 600   BASHAR       R3  Littorina littorea 2013
## 602   BASHAR       R3  Littorina littorea 2014
## 604   BASHAR       R3  Littorina littorea 2015
## 605   BASHAR       R3  Littorina obtusata 2015
## 608   BASHAR       R3  Littorina littorea 2016
## 610   BASHAR       R3  Littorina littorea 2017
## 611   BASHAR       R3  Littorina obtusata 2017
## 612   BASHAR       R3 Littorina saxatilis 2017
## 614   BASHAR       R3  Littorina littorea 2018
## 616   BASHAR       R3  Littorina littorea 2019
## 617   BASHAR       R3  Littorina obtusata 2019
## 620   BASHAR       R3  Littorina littorea 2021
## 621   BASHAR       R3  Littorina littorea 2022
## 622   BASHAR       R3  Littorina obtusata 2022
## 625   BASHAR       R3  Littorina littorea 2023
## 627   BASHAR       R3  Littorina littorea 2024
## 628   BASHAR       R4  Littorina littorea 2013
## 629   BASHAR       R4  Littorina littorea 2013
## 630   BASHAR       R4  Littorina littorea 2014
## 631   BASHAR       R4  Littorina obtusata 2014
## 633   BASHAR       R4  Littorina littorea 2015
## 634   BASHAR       R4  Littorina obtusata 2015
## 636   BASHAR       R4  Littorina littorea 2016
## 638   BASHAR       R4  Littorina littorea 2017
## 639   BASHAR       R4  Littorina obtusata 2017
## 642   BASHAR       R4  Littorina littorea 2018
## 643   BASHAR       R4  Littorina obtusata 2018
## 646   BASHAR       R4  Littorina littorea 2019
## 647   BASHAR       R4  Littorina obtusata 2019
## 650   BASHAR       R4  Littorina littorea 2021
## 653   BASHAR       R4  Littorina littorea 2022
## 654   BASHAR       R4  Littorina obtusata 2022
## 656   BASHAR       R4  Littorina littorea 2023
## 658   BASHAR       R4  Littorina littorea 2024
## 661   BASHAR       R5  Littorina littorea 2014
## 662   BASHAR       R5  Littorina littorea 2015
## 663   BASHAR       R5  Littorina littorea 2016
## 666   BASHAR       R5  Littorina littorea 2017
## 669   BASHAR       R5  Littorina littorea 2018
## 672   BASHAR       R5  Littorina littorea 2019
## 673   BASHAR       R5  Littorina obtusata 2019
## 675   BASHAR       R5  Littorina littorea 2021
## 678   BASHAR       R5  Littorina littorea 2022
## 679   BASHAR       R5  Littorina obtusata 2022
## 681   BASHAR       R5  Littorina littorea 2023
## 682   BASHAR       R5  Littorina littorea 2024

Coding Tip: There are often multiple ways to perform a task. The best code is code that 1) works, 2) is easy to follow, and 3) is unlikely to break (e.g. use column names instead of numbers). That still means there are typically multiple equally valid approaches. There are other ways to judge good code as you advance, but for now, aspire to write code that meets these three qualities.


Functions unique(), sort(), length()

Determining the number of records that match a certain condition can be useful too. Say we want to know how many unique plots were sampled in the motinv data frame. We can use a combination of brackets and other functions to summarize that, like below.

Sort alphabetically a list of unique plot names.

# Return a vector of unique plot names, sorted alphabetically
plots_unique <- sort(unique(motinv[,"PlotName"]))
plots_unique
##  [1] "A1" "A2" "A3" "A4" "A5" "B1" "B2" "B3" "B4" "B5" "F1" "F2" "F3" "F4" "F5"
## [16] "R1" "R2" "R3" "R4" "R5"

Determine number of unique plots

# Returns the number of elements in the plots_unique vector
length(plots_unique) # 20
## [1] 20
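
To see what each of the three functions contributes, here is a minimal sketch on a made-up vector of plot names. The vector plot_names and its values are assumptions for illustration, not the motinv data.

```r
# Hypothetical plot names with repeats (toy data for illustration only)
plot_names <- c("B2", "A1", "A1", "R3", "B2", "A1")

unique(plot_names)                    # drops repeats, keeps order of first appearance
sort(unique(plot_names))              # same values, sorted alphabetically
n_plots <- length(unique(plot_names)) # number of distinct plot names
n_plots
```

Nesting the calls like this reads from the inside out: unique() runs first, then sort() or length() works on its result.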

CHALLENGE: How many unique species are there in the motinv data frame?

Answer
# Option 1
length(unique(motinv[, "ScientificName"])) # 6
# Option 2
length(unique(motinv$ScientificName)) # equivalent
## [1] 6
CHALLENGE: What years were QAQC visits conducted (QAQC = TRUE)?
Answer
# Option 1 - use unique() to return each year only once
unique(motinv$Year[motinv$QAQC == TRUE]) # 2013
# Option 2
unique(motinv[motinv$QAQC == TRUE, "Year"])
## [1] 2013


Data Exploration

Exploring the data

We've already explored the motile invertebrate data a bit using head(), str(), names(), and View(). These are functions that you will use over and over as you work with data in R. Below, I'm going to show how I get to know a data set in R. First, to help you picture how these data are collected, here's a site map of the Bass Harbor monitoring site.

Bass Harbor site map


Read in example rocky intertidal motile invertebrate data

motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

Look at first few records

head(motinv)
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
## 6 Littorina littorea Common periwinkle      LITLIT      0         2         No

Look at structure of each column

str(motinv)
## 'data.frame':    682 obs. of  14 variables:
##  $ Network       : chr  "NETN" "NETN" "NETN" "NETN" ...
##  $ UnitCode      : chr  "ACAD" "ACAD" "ACAD" "ACAD" ...
##  $ SiteCode      : chr  "BASHAR" "BASHAR" "BASHAR" "BASHAR" ...
##  $ StartDate     : chr  "6/24/2013" "6/21/2013" "6/24/2013" "6/21/2013" ...
##  $ Year          : int  2013 2013 2013 2013 2013 2014 2014 2016 2016 2017 ...
##  $ QAQC          : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
##  $ PlotName      : chr  "A1" "A1" "A1" "A1" ...
##  $ CommunityType : chr  "Ascophyllum" "Ascophyllum" "Ascophyllum" "Ascophyllum" ...
##  $ ScientificName: chr  "Littorina littorea" "Littorina littorea" "Littorina obtusata" "Littorina obtusata" ...
##  $ CommonName    : chr  "Common periwinkle" "Common periwinkle" "Smooth periwinkle" "Smooth periwinkle" ...
##  $ SpeciesCode   : chr  "LITLIT" "LITLIT" "LITOBT" "LITOBT" ...
##  $ Damage        : chr  "0" "0" "1" "0" ...
##  $ No.Damage     : int  2 3 2 6 1 2 1 6 9 41 ...
##  $ Subsampled    : chr  "No" "No" "No" "No" ...

Look at summary of the columns

summary(motinv)
##    Network            UnitCode           SiteCode          StartDate        
##  Length:682         Length:682         Length:682         Length:682        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##       Year         QAQC           PlotName         CommunityType     
##  Min.   :2013   Mode :logical   Length:682         Length:682        
##  1st Qu.:2015   FALSE:640       Class :character   Class :character  
##  Median :2018   TRUE :42        Mode  :character   Mode  :character  
##  Mean   :2018                                                        
##  3rd Qu.:2022                                                        
##  Max.   :2024                                                        
##  ScientificName      CommonName        SpeciesCode           Damage         
##  Length:682         Length:682         Length:682         Length:682        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    No.Damage       Subsampled       
##  Min.   :   0.0   Length:682        
##  1st Qu.:   2.0   Class :character  
##  Median :   8.0   Mode  :character  
##  Mean   :  30.9                     
##  3rd Qu.:  35.0                     
##  Max.   :1960.0

Check for complete cases, assuming every column requires a value (i.e. no blanks).

table(complete.cases(motinv[,1:13]))# first 13 columns are all complete
## 
## TRUE 
##  682

table(complete.cases(motinv$Subsampled))# where the FALSE are introduced
## 
## FALSE  TRUE 
##    12   670
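
complete.cases() tells you which rows are complete, but not which column is responsible. One way to check that, sketched here on a toy data frame (the object toy and its values are assumptions), is to count NAs per column with colSums(is.na()).

```r
# Toy data frame with one missing value (hypothetical values)
toy <- data.frame(PlotName = c("A1", "A2", "A3"),
                  Count = c(2, 5, 1),
                  Subsampled = c("No", NA, "Yes"))

table(complete.cases(toy))        # tally of incomplete vs. complete rows
na_per_col <- colSums(is.na(toy)) # number of NAs in each column
na_per_col
toy[!complete.cases(toy), ]       # show the rows that contain an NA
```

This saves guessing at column ranges when a data frame has many columns.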

There's a lot to digest from the summary results.
  • We can see that Network, UnitCode, and SiteCode are treated as characters, which makes sense.
  • StartDate is interpreted as a character, not date. We'll fix that later.
  • Year is treated as an integer, meaning it's a number with no decimals. That's correct.
  • QAQC is treated as TRUE/FALSE. We'll use that to filter out QAQC visits.
  • PlotName through SpeciesCode are characters, which also makes sense.
  • Damage, which is a count of the individuals of a species that are found damaged, is coming in as a character instead of a number. That's weird, we'll look deeper into why and fix this.
  • No.Damage is interpreted as an integer, which again makes sense. This is the count of non-damaged individuals found in a given Plot. The summary shows a max count of 1960, which seems high.
  • Subsampled is interpreted as a character consisting of "No" and "Yes" values. This would be better handled as a logical TRUE/FALSE data type. There are also 12 NAs in that column.


Sidebar on NAs

To keep data frames rectangular, R treats missing data (i.e. blanks) as NA (stands for not available). A foundational philosophy of R is that the user must tell R functions what to do if NAs are in the data. Ideally that forces the user to investigate the NAs to determine their reason for being there, whether there's a way to fix it, if those records should be dropped, etc. If you try to calculate the mean of a column that has a blank in it, and you don't tell R what to do with NAs, the returned value will be NA. Most summary functions in R have an argument na.rm, which is logical (TRUE/FALSE). To drop NAs, you include na.rm = TRUE.

It's important every time you have NAs in your data to think about what they mean and how best to treat them. Sometimes, it's best to drop them. Other times, converting the blanks to 0 is the best approach. It depends entirely on your data and what you intend to do with it.

Test NA use with mean() function

x <- c(1, 3, 8, 3, 5, NA)
mean(x) # returns NA
## [1] NA
mean(x, na.rm = TRUE) 
## [1] 4
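
Dropping NAs and converting them to 0 give different answers, which is why the choice matters. A small sketch using the same x vector (x_zero is a name introduced here for illustration):

```r
x <- c(1, 3, 8, 3, 5, NA)

sum(is.na(x))               # count the NAs first
sum(x, na.rm = TRUE)        # drop the NA before summing
x_zero <- x
x_zero[is.na(x_zero)] <- 0  # alternative: treat the blank as a true zero
mean(x_zero)                # 20 / 6, lower than mean(x, na.rm = TRUE)
```

With na.rm = TRUE the mean is taken over 5 values. With the NA converted to 0 it is taken over 6, so the result is smaller.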


Fix the data

The steps we are going to take with the Bass Harbor motile invertebrate data are:
  1. Figure out why Damage is a character, and fix the issue.
  2. Convert StartDate (character) to a Date type.
  3. Rename the ScientificName column to Species.
  4. Create a Site_Plot column that's a combination of SiteCode and PlotName.
  5. Change No.Damage count that's 1960 to 196.
  6. Drop records that are QAQC visits.
Fix the Damage column

Look at unique values for Damage.

sort(unique(motinv$Damage)) # sorts the unique values in the column
##  [1] "0"   "1"   "10"  "11"  "12"  "13"  "14"  "15"  "17"  "18"  "2"   "24" 
## [13] "257" "26"  "3"   "34"  "4"   "5"   "6"   "7"   "8"   "9"   "PM"
table(unique(motinv$Damage)) # each unique value is tallied once; table(motinv$Damage) would give the number of records per value
## 
##   0   1  10  11  12  13  14  15  17  18   2  24 257  26   3  34   4   5   6   7 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##   8   9  PM 
##   1   1   1

One of the values is "PM", which stands for Permanently Missing in our data. We will convert PM to a blank, which R calls NA, and create a new Damage_num column that is converted to numeric.
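
Note that table() only counts records when it is given the column itself. Wrapped around unique(), every value is tallied exactly once. A quick sketch on a made-up vector (damage_toy is a stand-in, not the real Damage column):

```r
# Hypothetical values standing in for the Damage column
damage_toy <- c("0", "0", "1", "PM", "0", "1")

table(unique(damage_toy))   # every distinct value is tallied exactly once
counts <- table(damage_toy) # number of records per value
counts
```

The second form is the one to use when you want to know how many records carry a given code.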

Convert "PM" to blank. I will first make a copy of the data frame.

motinv2 <- motinv
motinv2$Damage[motinv2$Damage == "PM"] <- NA
motinv2$Damage_num <- as.numeric(motinv2$Damage)

# check that it worked
str(motinv2) # Damage_num is numeric
## 'data.frame':    682 obs. of  15 variables:
##  $ Network       : chr  "NETN" "NETN" "NETN" "NETN" ...
##  $ UnitCode      : chr  "ACAD" "ACAD" "ACAD" "ACAD" ...
##  $ SiteCode      : chr  "BASHAR" "BASHAR" "BASHAR" "BASHAR" ...
##  $ StartDate     : chr  "6/24/2013" "6/21/2013" "6/24/2013" "6/21/2013" ...
##  $ Year          : int  2013 2013 2013 2013 2013 2014 2014 2016 2016 2017 ...
##  $ QAQC          : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
##  $ PlotName      : chr  "A1" "A1" "A1" "A1" ...
##  $ CommunityType : chr  "Ascophyllum" "Ascophyllum" "Ascophyllum" "Ascophyllum" ...
##  $ ScientificName: chr  "Littorina littorea" "Littorina littorea" "Littorina obtusata" "Littorina obtusata" ...
##  $ CommonName    : chr  "Common periwinkle" "Common periwinkle" "Smooth periwinkle" "Smooth periwinkle" ...
##  $ SpeciesCode   : chr  "LITLIT" "LITLIT" "LITOBT" "LITOBT" ...
##  $ Damage        : chr  "0" "0" "1" "0" ...
##  $ No.Damage     : int  2 3 2 6 1 2 1 6 9 41 ...
##  $ Subsampled    : chr  "No" "No" "No" "No" ...
##  $ Damage_num    : num  0 0 1 0 0 0 0 0 1 0 ...

sort(unique(motinv2$Damage_num)) # Only numbers show in table
##  [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  17  18  24
## [20]  26  34 257
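
If you skip the step of replacing "PM" first, as.numeric() still turns it into NA, but with a warning and without telling you which values were the problem. Here is one possible way to find the offending values before converting, sketched on a toy vector (damage_chr and bad are names made up for this example).

```r
# Hypothetical character vector with a non-numeric code
damage_chr <- c("0", "12", "PM", "3")

# Flag values that would fail conversion, without triggering a warning
bad <- is.na(suppressWarnings(as.numeric(damage_chr))) & !is.na(damage_chr)
damage_chr[bad]                      # the values that are not numbers

damage_chr[bad] <- NA                # replace the code with NA first
damage_num <- as.numeric(damage_chr) # now converts without a warning
damage_num
```

Looking at damage_chr[bad] before overwriting anything gives you a chance to decide whether NA is really the right replacement.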

Remove QAQC visits

Using the motinv2 data frame, which fixed the Damage column by making the Damage_num field numeric, we're now going to drop visits that were for QAQC using a new base R function called subset(). The subset() function allows you to reduce the dimensions of a data frame. You can reduce rows, columns, or both in the same function call. I will also show the bracket approach.

Remove QAQC visits (QAQC == TRUE) and drop the original Damage column

motinv3 <- subset(motinv2, QAQC == FALSE, select = -Damage) # Note the importance of FALSE all caps
motinv3 <- subset(motinv2, QAQC != TRUE, select = -Damage) # equivalent
motinv3 <- motinv2[motinv2$QAQC == FALSE, -12] #equivalent but not as easy to follow
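
The three lines above give the same result here because QAQC has no NAs. When the column you filter on does contain NAs, subset() and brackets behave differently. A sketch on a toy data frame (the NA in QAQC is hypothetical):

```r
# Toy data frame; the NA in QAQC is hypothetical
toy <- data.frame(PlotName = c("A1", "A2", "A3"),
                  QAQC = c(FALSE, TRUE, NA),
                  Count = c(4, 7, 2))

by_subset  <- subset(toy, QAQC == FALSE) # drops rows where the test is NA
by_bracket <- toy[toy$QAQC == FALSE, ]   # returns an all-NA row for the NA test
nrow(by_subset)
nrow(by_bracket)

# Wrapping the condition in which() makes brackets drop NA rows too
by_which <- toy[which(toy$QAQC == FALSE), ]
nrow(by_which)
```

That extra row of NAs from the bracket approach is a common surprise, and another reason to check for NAs before filtering.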
Convert StartDate to Date

Convert StartDate into a Date instead of a character.

# Look at the start date format
head(motinv3) # month/day/year
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
## 7    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
## 8    NETN     ACAD   BASHAR 6/28/2016 2016 FALSE       A1   Ascophyllum
## 9    NETN     ACAD   BASHAR 6/28/2016 2016 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode No.Damage Subsampled
## 2 Littorina littorea Common periwinkle      LITLIT         3         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT         6         No
## 6 Littorina littorea Common periwinkle      LITLIT         2         No
## 7 Littorina obtusata Smooth periwinkle      LITOBT         1         No
## 8 Littorina littorea Common periwinkle      LITLIT         6         No
## 9 Littorina obtusata Smooth periwinkle      LITOBT         9         No
##   Damage_num
## 2          0
## 4          0
## 6          0
## 7          0
## 8          0
## 9          1

# Create new column called Date
motinv3$Date <- as.Date(motinv3$StartDate, format = "%m/%d/%Y")
str(motinv3)
## 'data.frame':    640 obs. of  15 variables:
##  $ Network       : chr  "NETN" "NETN" "NETN" "NETN" ...
##  $ UnitCode      : chr  "ACAD" "ACAD" "ACAD" "ACAD" ...
##  $ SiteCode      : chr  "BASHAR" "BASHAR" "BASHAR" "BASHAR" ...
##  $ StartDate     : chr  "6/21/2013" "6/21/2013" "6/21/2014" "6/21/2014" ...
##  $ Year          : int  2013 2013 2014 2014 2016 2016 2017 2017 2018 2018 ...
##  $ QAQC          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ PlotName      : chr  "A1" "A1" "A1" "A1" ...
##  $ CommunityType : chr  "Ascophyllum" "Ascophyllum" "Ascophyllum" "Ascophyllum" ...
##  $ ScientificName: chr  "Littorina littorea" "Littorina obtusata" "Littorina littorea" "Littorina obtusata" ...
##  $ CommonName    : chr  "Common periwinkle" "Smooth periwinkle" "Common periwinkle" "Smooth periwinkle" ...
##  $ SpeciesCode   : chr  "LITLIT" "LITOBT" "LITLIT" "LITOBT" ...
##  $ No.Damage     : int  3 6 2 1 6 9 41 1 11 3 ...
##  $ Subsampled    : chr  "No" "No" "No" "No" ...
##  $ Damage_num    : num  0 0 0 0 0 1 0 0 1 0 ...
##  $ Date          : Date, format: "2013-06-21" "2013-06-21" ...
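
The format argument has to match how the dates are written in the character column. A short sketch with made-up date strings (start_chr, d, and wrong are names introduced for illustration) shows what a matching and a non-matching format do:

```r
# Hypothetical date strings in month/day/year order
start_chr <- c("6/21/2013", "6/28/2016")

d <- as.Date(start_chr, format = "%m/%d/%Y") # %m = month, %d = day, %Y = 4-digit year
d
format(d, "%Y")                              # pull the year back out as text

# A format that doesn't match returns NA rather than an error
wrong <- as.Date(start_chr, format = "%d/%m/%Y") # there is no month 21 or 28
wrong
```

Because a bad format fails quietly, it's worth checking for NAs in the new column right after converting.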

Rename ScientificName to Species for shorter typing

Renaming columns in base R is kind of a pain, and I have to look it up every time I need to do it. I'll show you an easier way to do this tomorrow.

Rename ScientificName column

names(motinv3) # original names
##  [1] "Network"        "UnitCode"       "SiteCode"       "StartDate"     
##  [5] "Year"           "QAQC"           "PlotName"       "CommunityType" 
##  [9] "ScientificName" "CommonName"     "SpeciesCode"    "No.Damage"     
## [13] "Subsampled"     "Damage_num"     "Date"

names(motinv3)[names(motinv3) == "ScientificName"] <- "Species"
names(motinv3) # check that it worked
##  [1] "Network"       "UnitCode"      "SiteCode"      "StartDate"    
##  [5] "Year"          "QAQC"          "PlotName"      "CommunityType"
##  [9] "Species"       "CommonName"    "SpeciesCode"   "No.Damage"    
## [13] "Subsampled"    "Damage_num"    "Date"

Create a Site_Plot column via paste()

The paste() and paste0() functions are very handy for creating new columns that are combinations of existing columns. The code below will create a new column named Site_Plot that's a combination of SiteCode and PlotName.

Create new Site_Plot column

motinv3$Site_Plot <- paste(motinv3$SiteCode, motinv3$PlotName, sep = "-")
motinv3$Site_Plot <- paste0(motinv3$SiteCode, "-", motinv3$PlotName) # equivalent - paste0() puts no separator between elements by default
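
To see the separator behavior on its own, here is a small sketch with made-up inputs (site, plots, and site_plot are names introduced for this example). Both functions are vectorized, so a single site code is recycled across all plot names.

```r
site  <- "BASHAR"
plots <- c("A1", "A2", "R5")          # hypothetical subset of plot names

paste(site, plots)                    # default separator is a space
paste(site, plots, sep = "-")         # separator set to a dash
site_plot <- paste0(site, "-", plots) # paste0() never adds a separator
site_plot
```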

Coding Tip: In most cases, it does not matter whether you use single ' or double ", as long as you open and close with the same. The cases where it matters are where you have quotes within quotes. There you have to alternate your usage, like print("Text in outer quote 'text printed as being within quotes' end with closing quote").


Test your skills!
CHALLENGE: Using motinv3, how many species are found in PlotName A1 in 2024?
Answer

Option 1. Use brackets, then calculate the number of rows.

# with brackets
A1_2024 <- motinv3[motinv3$PlotName == "A1" & motinv3$Year == 2024, ]
nrow(A1_2024) # 3
## [1] 3

Option 2. Use base R subset, then view the data.frame to see.

# with base R subset
A1_2024b <- subset(motinv3, PlotName == "A1" & Year == 2024)
View(A1_2024b) # 3
CHALLENGE: What years have green crabs (Latin: Carcinus maenas) been detected?
Answer

Option 1. Subset the data with brackets and use the sort(unique()) to give an easier to read output.

# Option 1
gcrab <- motinv3[motinv3$Species == "Carcinus maenas",]
sort(unique(gcrab$Year)) #2019, 2021, 2022, 2023, 2024
## [1] 2019 2021 2022 2023 2024

Option 2. Subset data then use table() to tally the years and number of rows green crabs were found.

gcrab2 <- subset(motinv3, Species == "Carcinus maenas")
table(gcrab2$Year)
## 
## 2019 2021 2022 2023 2024 
##    3   16    6   11   11
CHALLENGE: Find the highest value recorded in No.Damage column.
Answer

There are multiple ways to do this. Two examples are below.

Option 1. View the data and sort by No.Damage.

View(motinv3)

Option 2. Find the max No.Damage count and subset the data frame

max_nd <- max(motinv3$No.Damage, na.rm = TRUE)
motinv3[motinv3$No.Damage == max_nd,]
##     Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 646    NETN     ACAD   BASHAR 6/11/2019 2019 FALSE       R4     Red Algae
##                Species        CommonName SpeciesCode No.Damage Subsampled
## 646 Littorina littorea Common periwinkle      LITLIT      1960         No
##     Damage_num       Date Site_Plot
## 646         11 2019-06-11 BASHAR-R4

CHALLENGE: Fix the No.Damage typo by replacing 1960 with 196.

Answer

Let's say that you looked at the datasheet, and the actual count for No.Damage was 196 instead of 1960. You can change that value in the original CSV by hand. But even better is to document that change in code. There are multiple ways to do this. Two examples are below.

But first, it's good to create a new data frame when modifying the original data frame, so you can refer back to the original if needed. I also use a really specific filter to make sure I'm not accidentally changing other data.

Replace 1960 with 196

# create copy of motinv data
motinv_fix <- motinv3

# find the problematic value, and change it to 196
motinv_fix$No.Damage[motinv_fix$Year == 2019 & 
                       motinv_fix$PlotName == "R4" & 
                       motinv_fix$No.Damage == 1960] <- 196

# check your work
range(motinv3$No.Damage) #1960
## [1]    0 1960
range(motinv_fix$No.Damage) # now 282
## [1]   0 282
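
One extra safeguard when replacing a value is to store the filter and confirm it matches exactly one row before assigning. This is a sketch on toy data standing in for motinv3 (toy, toy_fix, and is_typo are hypothetical names and values):

```r
# Toy data standing in for motinv3 (values are hypothetical)
toy <- data.frame(Year = c(2019, 2019, 2021),
                  PlotName = c("R4", "R4", "R4"),
                  No.Damage = c(1960, 12, 30))
toy_fix <- toy

# Build the filter once, then confirm it matches exactly one row
is_typo <- toy_fix$Year == 2019 &
  toy_fix$PlotName == "R4" &
  toy_fix$No.Damage == 1960
sum(is_typo)                      # should be 1 before assigning
toy_fix$No.Damage[is_typo] <- 196
max(toy_fix$No.Damage)
```

If sum(is_typo) comes back as 0 or more than 1, the filter isn't specific enough and the assignment shouldn't be run yet.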


Basic Plotting

Basic plotting

Visualizing the data is also important to get a sense for the data and look for potential errors and outliers. Base R has plotting functions that allow you to create quick plots without having to know a lot of code. I often use Base R plot functions when I'm exploring data but not making plots I plan to use for publication. When I need to create more complex plots, I use ggplot2, which we'll cover on Day 2 and 3.

Histograms are a great start. The code below generates a basic histogram plot of a specific column in the dataframe using the hist() function.

Plot histogram of motile invertebrate No.Damage counts

hist(x = motinv3$No.Damage)


Looking at the histogram, it looks like all of the counts are below 500 except for one that's way out in the 2000 range. You can also make a scatterplot of the data. If you only specify one column, the x axis will be the row number for each record, and the y axis will be the specified column.

Make point plot of No.Damage counts

plot(motinv3$No.Damage)


Again, you can see there's one value that's greater than all of the others.

We can also plot two variables in a scatterplot.

Make scatterplot of No.Damage vs. Damage_num (Option 1)

plot(motinv3$No.Damage ~ motinv3$Damage_num)

Make scatterplot of No.Damage vs. Damage_num (Option 2- better axis labels)

plot(No.Damage ~ Damage_num, data = motinv3) # equivalent but cleaner axis titles


Here you can see there's one value that's greater than all of the others in both sets of counts. These would be worth looking at more carefully to determine if they're errors in the data.

CHALLENGE: Plot a histogram of Damage_num in the motinv3 data frame

Answer
hist(motinv3$Damage_num)
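
Base R plot functions also take arguments to control bins and labels. The sketch below uses made-up counts (the counts vector is an assumption, not the motinv data) and the breaks, main, and xlab arguments of hist().

```r
# Hypothetical counts with one large outlier
counts <- c(0, 2, 3, 8, 8, 12, 35, 40, 196, 1960)

# breaks suggests the number of bins; main and xlab set the title and axis label
h <- hist(counts, breaks = 20,
          main = "Toy count data", xlab = "Count per plot")
sum(h$counts) # hist() also returns the bin tallies, which add up to length(counts)
```

More bins make it easier to see how far a single outlier sits from the rest of the data.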


Day 2: Data Wrangling

Day 2 Goals

Goals for Day 2:
  • Understanding of tidy data format (rows are observations; columns are variables)
  • Exposure to the main tidyverse packages and philosophy behind it
  • Comfortable filtering and selecting data, and renaming and creating new columns in dplyr
  • How to use ifelse() and case_when() conditional statements
  • Comfortable grouping and summarizing data in dplyr.
  • Difference between summarize() and mutate().
  • Learn how to pivot data from long to wide and wide to long
  • Learn how to join tables and apply the different join types
  • Working with dates and times

Feedback: Please leave feedback in the training feedback form. You can submit feedback multiple times and don't need to answer every question. Responses are anonymous.

friends with tidy
Artwork by @allison_horst


Tidyverse

Tidyverse background

Tidyverse packages From tidyverse.org

We are now going to learn how to subset rows and columns and other common data wrangling tasks using packages in the tidyverse. Taken directly from tidyverse.org: "The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures."

Why I like the tidyverse:
  • Function names and arguments are well-named, clear, and consistent, making code easier to read/understand than base R.
  • Functions assume the tidy data format (rows = observations; columns = variables).
  • Availability of tidyverse help and free online learning materials is excellent.
  • Functions take data as the first argument, making it easy to pipe together multiple tasks (more on pipes later).
  • Column names don't need to be quoted to refer to them in function arguments (called non-standard evaluation), making typing faster.
  • Taken together, these features make the learning curve much shallower than base R.

You should have installed all of the tidyverse packages in preparation for this training. If you missed that step, install tidyverse packages using code below. It can take a few minutes for all the packages to install.

Only run if you haven't installed these packages yet

install.packages('tidyverse')

Load the tidyverse

library(tidyverse)

Coding Tip: When you type library(tidyverse), you're loading all nine of the core packages in the tidyverse. If you're only using one or two packages, it's better to load just those packages. That makes it clearer to the user which packages are needed to run your code and reduces dependencies. For this session, we're only going to use dplyr, so I will just load that.

library(dplyr)

Tidy Data Format

For most of the tasks you will do in R, you will want your data organized in a particular format, often referred to as Tidy Data (see below). Tidy data is organized such that columns are variables, like plot_number, date, measurement, and rows are observations. Each cell is a value. Base R and most R packages are designed to work with data in this format. Always following this approach saves mental and coding time in trying to figure out how to organize your data. The tidyverse suite of packages is especially optimized to work with data in this format.

Tidy Data
Figure From R for Data Science


Core tidyverse packages (ordered from my most to least used):
  • ggplot2: plotting package based on The Grammar of Graphics (more on that later).
  • dplyr: filtering, selecting, renaming, and summarizing based on SQL.
  • purrr: contains functions like map() that allow you to iterate functions or processes like a for loop.
  • tidyr: reshaping data from wide to long and long to wide.
  • stringr: functions that help you work with strings, such as extracting specific patterns, splitting strings using a specific character (e.g. the '_' in ACAD_101), left-padding a number with 0s and turning it into a string, etc. There's generally always a base R version of a stringr function, but stringr functions tend to be easier to use and read.
  • readr: includes read functions for csv and other formats. The read_csv() function, for example, has more bells and whistles than the base R read.csv() function. I've never needed those extra features, so I just use read.csv().
  • lubridate: package that makes working with dates easier. However, the functions are stricter than base R date functions. I tend to prefer base R for that reason.
  • forcats: package to work with factors, which are special character columns that are categorical with defined levels.
  • tibble: these are basically data frames with more checks on the data. When you create a new object with a tidyverse function, the default result will be a tibble instead of a data frame. 99% of the time, it won't matter whether your data object is a data frame or tibble. The 1% of the time it does matter has made me strongly dislike tibbles. From tidyverse.org: "Tibbles are data.frames that are lazy and surly: they do less and complain more forcing you to confront problems earlier, typically leading to cleaner, more expressive code". I also prefer the base R data frame format of head(data.frame) over the format for head(tibble).
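
One concrete case where the data frame vs. tibble difference shows up is selecting a single column with brackets. This is a minimal sketch on toy data (df and tbl are made-up objects), assuming the tibble package is installed as part of the tidyverse.

```r
library(tibble)

df  <- data.frame(x = 1:3, y = c("a", "b", "c"))
tbl <- as_tibble(df)

class(df[, "x"])  # a data frame drops to a plain vector
class(tbl[, "x"]) # a tibble stays a tibble
tbl$x             # $ returns the vector in both cases
```

Code that expects a vector from df[, "x"] can break when it's handed a tibble, so using $ or [[ ]] to pull out a column is the safer habit.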


Wrangling with dplyr

Introduction to dplyr

The dplyr package is perhaps the single most useful package in R for working with your data.

dplyr filter
Artwork by @allison_horst

Commonly used dplyr functions and their use:
  • filter(): filters data for observations that meet specific criteria.
  • select(): subsets columns by either selecting or removing them.
  • arrange(): sorts data by specified column.
  • mutate(): adds new column(s) to a data frame.
  • slice(): slices the data based on specified number of rows
  • summarize(.by = groups): summarizes by groups/factor levels and returns data for each group (e.g. mean cover grouped by plot).
  • mutate(.by = groups): summarizes by groups/factor levels and returns the original number of rows and columns.
  • rename(): renames columns
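
As a quick preview, the sketch below runs each of the functions listed above on a toy data frame. The data frame toy and its values are made up for illustration, and the .by argument assumes dplyr version 1.1.0 or later.

```r
library(dplyr)

# Toy counts (hypothetical values, not the motinv data)
toy <- data.frame(PlotName = c("A1", "A1", "B2", "B2"),
                  Species  = c("snail", "crab", "snail", "crab"),
                  Count    = c(4, 1, 10, 3))

snails  <- filter(toy, Species == "snail") # keep rows meeting a condition
slim    <- select(toy, PlotName, Count)    # keep only these columns
sorted  <- arrange(toy, desc(Count))       # sort, largest count first
doubled <- mutate(toy, Count2 = Count * 2) # add a new column
first2  <- slice(toy, 1:2)                 # rows 1 and 2 by position
renamed <- rename(toy, Plot = PlotName)    # new name = old name

# One row per plot
plot_sum <- summarize(toy, total = sum(Count), .by = PlotName)
# Same number of rows as toy, with each plot's total repeated
plot_mut <- mutate(toy, total = sum(Count), .by = PlotName)
plot_sum
```

Comparing nrow(plot_sum) with nrow(plot_mut) is the quickest way to see the difference between summarize() and mutate() with groups.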

Now, using the dplyr package in the tidyverse, we're going to do the same operations we did yesterday with brackets.

The steps we took with the Bass Harbor motile invertebrate data were:
  • Convert Damage to a number.
  • Convert StartDate (character) to a Date type.
  • Rename the ScientificName column to Species.
  • Create a Site_Plot column that's a combination of SiteCode and PlotName.
  • Change No.Damage count that's 1960 to 196.
  • Drop records that are QAQC visits.


Wrangle in dplyr
  1. Read in example motile invertebrate data

    motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

  2. Replace "PM" with NA (blank) in the Damage column.

    # Base R
    motinv2 <- motinv
    motinv2$Damage[motinv2$Damage == "PM"] <- NA
    motinv2$Damage_num <- as.numeric(motinv2$Damage)
    # dplyr approach with mutate
    motinv2 <- mutate(motinv, Damage_num = as.numeric(replace(Damage, Damage == "PM", NA)))
    str(motinv2)
    ## 'data.frame':    682 obs. of  15 variables:
    ##  $ Network       : chr  "NETN" "NETN" "NETN" "NETN" ...
    ##  $ UnitCode      : chr  "ACAD" "ACAD" "ACAD" "ACAD" ...
    ##  $ SiteCode      : chr  "BASHAR" "BASHAR" "BASHAR" "BASHAR" ...
    ##  $ StartDate     : chr  "6/24/2013" "6/21/2013" "6/24/2013" "6/21/2013" ...
    ##  $ Year          : int  2013 2013 2013 2013 2013 2014 2014 2016 2016 2017 ...
    ##  $ QAQC          : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
    ##  $ PlotName      : chr  "A1" "A1" "A1" "A1" ...
    ##  $ CommunityType : chr  "Ascophyllum" "Ascophyllum" "Ascophyllum" "Ascophyllum" ...
    ##  $ ScientificName: chr  "Littorina littorea" "Littorina littorea" "Littorina obtusata" "Littorina obtusata" ...
    ##  $ CommonName    : chr  "Common periwinkle" "Common periwinkle" "Smooth periwinkle" "Smooth periwinkle" ...
    ##  $ SpeciesCode   : chr  "LITLIT" "LITLIT" "LITOBT" "LITOBT" ...
    ##  $ Damage        : chr  "0" "0" "1" "0" ...
    ##  $ No.Damage     : int  2 3 2 6 1 2 1 6 9 41 ...
    ##  $ Subsampled    : chr  "No" "No" "No" "No" ...
    ##  $ Damage_num    : num  0 0 1 0 0 0 0 0 1 0 ...

  3. Convert StartDate (character) to Date.

    # Base R
    motinv2$Date <- as.Date(motinv2$StartDate, format = "%m/%d/%Y")
    # dplyr approach with mutate
    motinv2 <- mutate(motinv2, Date = as.Date(StartDate, format = "%m/%d/%Y"))

  4. Rename the ScientificName column to Species.

    # Base R code
    names(motinv2)[names(motinv2) == "ScientificName"] <- "Species"
    # dplyr approach with rename (run one approach or the other: rename() errors
    # if the base R line above has already renamed the column)
    motinv2 <- rename(motinv2, "Species" = "ScientificName")
    names(motinv2)
    ##  [1] "Network"       "UnitCode"      "SiteCode"      "StartDate"    
    ##  [5] "Year"          "QAQC"          "PlotName"      "CommunityType"
    ##  [9] "Species"       "CommonName"    "SpeciesCode"   "Damage"       
    ## [13] "No.Damage"     "Subsampled"    "Damage_num"    "Date"

  5. Create a Site_Plot column that's a combination of SiteCode and PlotName.

    # Base R
    motinv2$Site_Plot <- paste(motinv2$SiteCode, motinv2$PlotName, sep = "-")
    # dplyr approach with mutate
    motinv2 <- mutate(motinv2, Site_Plot = paste(SiteCode, PlotName, sep = "-"))

  6. Drop records that are QAQC visits and drop original Damage column.

    # Base R
    motinv3 <- subset(motinv2, QAQC == FALSE, select = -Damage) # Note the importance of FALSE all caps
    # dplyr
    motinv3a <- filter(motinv2, QAQC == FALSE)
    motinv3 <- select(motinv3a, -Damage)

    head(motinv3)
    ##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
    ## 1    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
    ## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
    ## 3    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
    ## 4    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
    ## 5    NETN     ACAD   BASHAR 6/28/2016 2016 FALSE       A1   Ascophyllum
    ## 6    NETN     ACAD   BASHAR 6/28/2016 2016 FALSE       A1   Ascophyllum
    ##              Species        CommonName SpeciesCode No.Damage Subsampled
    ## 1 Littorina littorea Common periwinkle      LITLIT         3         No
    ## 2 Littorina obtusata Smooth periwinkle      LITOBT         6         No
    ## 3 Littorina littorea Common periwinkle      LITLIT         2         No
    ## 4 Littorina obtusata Smooth periwinkle      LITOBT         1         No
    ## 5 Littorina littorea Common periwinkle      LITLIT         6         No
    ## 6 Littorina obtusata Smooth periwinkle      LITOBT         9         No
    ##   Damage_num       Date Site_Plot
    ## 1          0 2013-06-21 BASHAR-A1
    ## 2          0 2013-06-21 BASHAR-A1
    ## 3          0 2014-06-21 BASHAR-A1
    ## 4          0 2014-06-21 BASHAR-A1
    ## 5          0 2016-06-28 BASHAR-A1
    ## 6          1 2016-06-28 BASHAR-A1

    Note that subsetting data frames in R, which refers to reducing rows, columns, or both, is split between 2 functions in dplyr. The filter() function reduces rows. The select() function reduces columns.
  13. Reclass No.Damage outlier

  14. motinv4 <-  mutate(motinv3, No.Damage = replace(No.Damage, No.Damage == 1960, 196))
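The two conversions in the steps above (text to numeric, and text to Date) can be checked on toy vectors before touching the real data. This is a minimal sketch with made-up values; the object names damage and start are just for illustration and are not part of the training data.

```r
# Toy vector standing in for the Damage column (values are made up)
damage <- c("0", "1", "PM", "3")

# Swap "PM" for NA first, then convert; this avoids a coercion warning
damage_num <- as.numeric(replace(damage, damage == "PM", NA))
damage_num

# as.Date() needs a format string that matches the text exactly
start <- c("6/24/2013", "6/21/2013")
good_date <- as.Date(start, format = "%m/%d/%Y") # matches month/day/4-digit year
bad_date  <- as.Date(start, format = "%Y-%m-%d") # wrong format returns NA
good_date
bad_date
```

If a date column comes back as all NA, the format string is the first thing I'd check.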


The magic pipe |>

The pipe (|> or %>%) makes dplyr and other tidyverse packages even more powerful. The pipe |> allows you to string together commands, passing the result of one step on to the next. So, taking all of the code above, we can do it all in a single chained statement.

Wrangle motile invertebrate data with pipes

motinv_final <- motinv |> 
  mutate(Damage_num = as.numeric(replace(Damage, Damage == "PM", NA)), # Fix Damage PM
         SitePlot = paste(SiteCode, PlotName, sep = "-"), # create new SitePlot column
         Date = as.Date(StartDate, format = "%m/%d/%Y"), # create new Date column
         No.Damage_fix = replace(No.Damage, No.Damage == 1960, 196)) |> # fix error in No.Damage
  rename("Species" = "ScientificName") |> # change column name 
  filter(QAQC == FALSE) |> # drop QAQC visits
  select(-Damage) |> # drop original Damage column
  arrange(SitePlot, Year, Species) # optional sorting the data

head(motinv_final)  
View R output
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/28/2016 2016 FALSE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/28/2016 2016 FALSE       A1   Ascophyllum
##              Species        CommonName SpeciesCode No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT         3         No
## 2 Littorina obtusata Smooth periwinkle      LITOBT         6         No
## 3 Littorina littorea Common periwinkle      LITLIT         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT         1         No
## 5 Littorina littorea Common periwinkle      LITLIT         6         No
## 6 Littorina obtusata Smooth periwinkle      LITOBT         9         No
##   Damage_num  SitePlot       Date No.Damage_fix
## 1          0 BASHAR-A1 2013-06-21             3
## 2          0 BASHAR-A1 2013-06-21             6
## 3          0 BASHAR-A1 2014-06-21             2
## 4          0 BASHAR-A1 2014-06-21             1
## 5          0 BASHAR-A1 2016-06-28             6
## 6          1 BASHAR-A1 2016-06-28             9

Hopefully you agree that pipes are amazing! They allow for more efficient coding, make the steps relatively easy to follow, and make dplyr functions like mutate() so much more useful. Outside of pipes, for example, mutate() doesn't feel more useful than base R for creating a new column. From now on, I will use pipes regularly in the code.

If you've ever seen %>%, that also functions as a pipe. The %>% pipe was the original pipe, introduced to the tidyverse through the magrittr package. The magrittr pipe was so popular that, starting in R 4.1, a base R pipe (|>) was introduced. It's supposed to be better optimized for order of operations and is one less package you need to install. So, in general, use the base R pipe |>. It's also why I had you set the default pipe in Global Options to |>. A useful keyboard shortcut for the pipe is Ctrl + Shift + M. You should see the |> pipe in your script when you type that shortcut. If you get the %>% pipe instead, you need to change that default setting in Global Options (see Day 1 > R and RStudio > RStudio Global Options > Step 3. Change default pipe.)
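To make the pipe less magical, here is a small sketch showing that a piped chain is just another way of writing nested function calls. It uses only base R (version 4.1 or later for |>), and the vector x is made up for illustration.

```r
x <- c(4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read top to bottom; each result becomes the
# first argument of the next function
piped <- x |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped) # both approaches return the same value
```

The two versions do the same work. The piped version just lets you read the steps in the order they happen.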

Coding Tip: While the number of steps you can pipe together is virtually endless, piping many tasks, especially complex ones, can make code hard to read and troubleshoot. It's best to limit the number of piped steps to 3-4, and/or to run complex tasks that might fail or require checking on their own.


Test your skills with dplyr!
CHALLENGE: Using motinv, how many species are found in PlotName A1 in 2024?
Answer
# with filter()
A1_2024 <- motinv |> filter(PlotName == "A1" & Year == 2024)
nrow(A1_2024) # 3
View R output
## [1] 3

CHALLENGE: What years have green crabs (Latin: Carcinus maenas) been detected?
Answer
gcrab <- motinv |> filter(ScientificName == "Carcinus maenas") |> 
  select(Year) |> unique()

gcrab
##   Year
## 1 2021
## 2 2022
## 3 2023
## 4 2024
## 9 2019
CHALLENGE: Find the highest value recorded in No.Damage column.
Answer
max_nd <- max(motinv$No.Damage, na.rm = TRUE)

motinv |> filter(No.Damage == max_nd)
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/11/2019 2019 FALSE       R4     Red Algae
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT     11      1960         No

CHALLENGE: Fix the No.Damage typo by replacing 1960 with 196.

Answer

Let's say that you looked at the datasheet, and the actual count for No.Damage was 196 instead of 1960. You can change that value in the original CSV by hand. But even better is to document that change in code. There are multiple ways to do this. Two examples are below.

But first, it's good to create a new data frame when modifying the original data frame, so you can refer back to the original if needed. I also use a really specific filter to make sure I'm not accidentally changing other data.

Replace 1960 with 196

# Reminder of the base R approach

# create copy of motinv data
motinv_fix <- motinv

# find the problematic value, and change it to 196
motinv_fix$No.Damage[motinv_fix$Year == 2019 & 
                     motinv_fix$PlotName == "R4" & 
                     motinv_fix$No.Damage == 1960] <- 196

# dplyr approach
motinv_fix <- motinv |> mutate(No.Damage = replace(No.Damage, No.Damage == 1960, 196))
range(motinv_fix$No.Damage)
## [1]   0 282


Conditional Functions

Conditionals 101

Conditional functions ifelse(), if(){ }else{ }, and case_when() allow you to return results that depend on specified conditions.

The main differences between the most common conditional functions:
  • ifelse(): Primarily for use with data frames. Takes 3 arguments: 1) the condition to test; 2) the value to return if the condition is true; 3) the value to return if the condition is false. A single call can only handle 2 possible outcomes, although nested ifelse() statements are possible (see example below). This function is vectorized, which means it tests every element of a column in one call. Of the 3 conditionals, it tends to perform the fastest on large data sets.
  • case_when(): Primarily for use with data frames. Can take any number of condition statements and their value to return. Requires dplyr package to be loaded. Syntax is a bit tricky to figure out at first, but once you have it, it's about as easy as using ifelse(). This function is akin to SQL CASE WHEN. On large data sets, it consistently performs slower than ifelse().
  • if(){ }else{ }: Can be used with data frames, but is more commonly used for operations outside of data frames. An example would be only running a chunk of code if a certain condition is met (e.g., if the data frame has > 0 rows, run next line of code.)
ifelse()

The ifelse() function takes 3 arguments organized like: ifelse(condition == TRUE, return this, return this instead). The first is the condition you're testing. The second argument is what to return if the condition is met. The third is what to return if the condition is not met. You can also nest ifelse() to include more than 2 conditions, but it can quickly get out of hand and hard to follow (see below).

Let's start by adding a column to the motile invertebrate data that uses SpeciesCode to create a new column called native that is either TRUE for native species, or FALSE for non-native species. We'll add a second column named native_status that is "native", "exotic", or "invasive". The invasive group includes Asian shore crabs and green crabs.

Create nativity column conditioning on SpeciesCode

# green crab, Asian shore crab, and common periwinkle species codes
exo_spp <- c("CARMAE", "HEMISAN", "LITLIT") 

# smooth periwinkle, rough periwinkle, dogwhelk, and limpet species codes
nat_spp <- c("LITOBT", "LITSAX", "NUCLAP", "TECTES")

# Make a table of species codes in BASHAR
table(motinv$SpeciesCode)
View R output
## 
## CARMAE LITLIT LITOBT LITSAX NUCLAP TECTES 
##     47    220    197     20    116     82

# Add native column with ifelse
motinv <- motinv |> mutate(native = ifelse(SpeciesCode %in% nat_spp, TRUE, FALSE))

# Add native_status column with nested ifelse
motinv <- motinv |> mutate(native_status = ifelse(SpeciesCode %in% nat_spp, "native",
                                                  ifelse(SpeciesCode %in% c("CARMAE", "HEMISAN"), "invasive", 
                                                         "exotic")))

table(motinv$SpeciesCode, motinv$native)
View R output
##         
##          FALSE TRUE
##   CARMAE    47    0
##   LITLIT   220    0
##   LITOBT     0  197
##   LITSAX     0   20
##   NUCLAP     0  116
##   TECTES     0   82

table(motinv$SpeciesCode, motinv$native_status)
View R output
##         
##          exotic invasive native
##   CARMAE      0       47      0
##   LITLIT    220        0      0
##   LITOBT      0        0    197
##   LITSAX      0        0     20
##   NUCLAP      0        0    116
##   TECTES      0        0     82

Remember that we used %in% instead of == because we're matching against multiple species codes. Only use == when there's only one value to match against. The %in% approach can work with any number of values to match, so I almost always use %in% instead of ==.
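Here is a small sketch of why == is risky with more than one value to match. The codes vector below is made up for illustration. With ==, R recycles the shorter vector and compares position by position, which is almost never what you want.

```r
# Toy vector of species codes (made up for illustration)
codes <- c("LITLIT", "CARMAE", "LITOBT", "CARMAE")
targets <- c("CARMAE", "LITLIT")

# == recycles targets and compares element 1 to "CARMAE",
# element 2 to "LITLIT", element 3 to "CARMAE", and so on
eq_match <- codes == targets
eq_match

# %in% asks whether each code appears anywhere in targets
in_match <- codes %in% targets
in_match
```

In this toy case == returns FALSE for every element without any warning, even though three of the four codes are in targets.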


case_when()

The case_when() function allows you to have multiple conditions, each with their own return. The syntax is a bit different than ifelse() to allow for the multiple conditions and returns. Using the same approach as above, we'll recreate the native_status column with case_when(). We'll then add a fourth output for species codes that don't match any of the previous conditions and set that as 'unknown'. Basically the TRUE just means, any records left are assigned 'unknown'.

Note the order of operations in case_when(). The first step assigns native species a 'native' status. Then, only non-native species are left to condition on. The next step assigns CARMAE and HEMISAN as 'invasive'. The third step conditions on species in the exo_spp group, but only those that weren't already handled in previous steps. Then the fourth statement assigns 'unknown' to any species not matched as native, invasive, or exotic. Rather than relying on this function behavior, it's better to not have overlapping categories (e.g. not include CARMAE and HEMISAN in the exo_spp). I include it here to demonstrate the point.
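The first-match-wins behavior is easier to see on a toy vector. In this sketch the codes vector and the "XXXXXX" code are made up for illustration. The same conditions are given in two different orders to show how the order changes the result for overlapping categories.

```r
library(dplyr)

codes <- c("CARMAE", "LITLIT", "LITOBT", "XXXXXX") # "XXXXXX" is a made-up code
exo <- c("CARMAE", "LITLIT") # overlaps with the invasive condition below

# Invasive is tested before exotic, so CARMAE is labeled invasive
status <- case_when(codes %in% "CARMAE" ~ "invasive",
                    codes %in% exo ~ "exotic",
                    codes %in% "LITOBT" ~ "native",
                    TRUE ~ "unknown")
status

# Exotic is tested first, so CARMAE never reaches the invasive condition
status_rev <- case_when(codes %in% exo ~ "exotic",
                        codes %in% "CARMAE" ~ "invasive",
                        codes %in% "LITOBT" ~ "native",
                        TRUE ~ "unknown")
status_rev
```

Each element gets the value from the first condition it matches, which is why non-overlapping categories are the safer design.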

Create status column conditioning on SpeciesCode

# green crab, Asian shore crab, and common periwinkle species codes
exo_spp <- c("CARMAE", "HEMISAN", "LITLIT") 

# smooth periwinkle, rough periwinkle, dogwhelk, and limpet species codes
nat_spp <- c("LITOBT", "LITSAX", "NUCLAP", "TECTES")

motinv <- motinv |> 
  mutate(native_status = case_when(SpeciesCode %in% nat_spp ~ 'native',
                                   SpeciesCode %in% c("CARMAE", "HEMISAN") ~ 'invasive',
                                   SpeciesCode %in% exo_spp ~ 'exotic', 
                                   TRUE ~ 'unknown'))

table(motinv$SpeciesCode, motinv$native_status) # check that the output worked
View R output
##         
##          exotic invasive native
##   CARMAE      0       47      0
##   LITLIT    220        0      0
##   LITOBT      0        0    197
##   LITSAX      0        0     20
##   NUCLAP      0        0    116
##   TECTES      0        0     82


if(){ }else{ }

The if(){ }else{ } style of conditional, hereafter called if/else, is best used for operations outside of data frames, like turning code on or off based on specific conditions. I use if/else with ggplot (graphing R package we'll cover later) to turn certain features on or off based on a condition in the data or a condition I set. If/else statements are also helpful for error handling in your code. For example, if you want the code to send a warning when your data frame is empty (no rows), you can have an if/else statement that prints to the console. You can string together multiple conditions to test by adding else if(){ } statements between the if and the final else.

Print warning in console that indicates if invasive species are found in the motile invertebrate data.

inv <- motinv |> filter(native_status == "invasive")
spp_det <- unique(inv$CommonName)

if(nrow(inv) > 0){print(paste0("The following invasive species were detected in the data: ", 
                               paste0(spp_det, collapse = ", ")))
  } else {print("No invasive species were detected in the data.")}
## [1] "The following invasive species were detected in the data: Green crab"

Force the else statement to print by filtering down to native species before testing, so that no invasive records remain.

inv <- motinv |> filter(SpeciesCode %in% nat_spp) |> 
  filter(native_status == "invasive")
spp_det <- unique(inv$CommonName)

if(nrow(inv) > 0){print(paste0("The following invasive species were detected in the data: ", 
                               paste0(spp_det, collapse = ", ")))
  } else {print("No invasive species were detected in the data.")}
## [1] "No invasive species were detected in the data."
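To show the syntax for testing more than two conditions, here is a minimal sketch using else if. The function name describe_rows and the cutoff of 10 are hypothetical, chosen only to illustrate the structure.

```r
# Hypothetical helper: returns a message based on a row count
describe_rows <- function(n_rows) {
  if (n_rows == 0) {
    "No records found."
  } else if (n_rows < 10) { # only checked if the first condition was FALSE
    "Only a few records found."
  } else { # runs when none of the conditions above were TRUE
    "Plenty of records found."
  }
}

describe_rows(0)
describe_rows(3)
describe_rows(47)
```

Unlike ifelse(), if/else tests a single TRUE or FALSE value, so it's a good fit for checks like nrow(inv) > 0 but not for whole columns.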


Test your skills with conditionals
CHALLENGE: Using the motinv data, create a new column called trophic that indicates whether the species is an herbivore or predator.
Hint: predator species codes are c("CARMAE", "NUCLAP"), and herbivore species codes are c("LITLIT", "LITOBT", "LITSAX", "TECTES").
Answer
pred <- c("CARMAE", "NUCLAP")

# base R
motinv$trophic <- ifelse(motinv$SpeciesCode %in% pred, "predator", "herbivore")
table(motinv$trophic, motinv$SpeciesCode)
View R output
##            
##             CARMAE LITLIT LITOBT LITSAX NUCLAP TECTES
##   herbivore      0    220    197     20      0     82
##   predator      47      0      0      0    116      0

# tidyverse
motinv <- motinv |> mutate(trophic = ifelse(SpeciesCode %in% pred, "predator", "herbivore"))
table(motinv$trophic, motinv$SpeciesCode)
View R output
##            
##             CARMAE LITLIT LITOBT LITSAX NUCLAP TECTES
##   herbivore      0    220    197     20      0     82
##   predator      47      0      0      0    116      0

CHALLENGE: Using the motile invertebrate data, create a new column called count_level that has levels High, Medium, Low, based on No.Damage, where "High" is > 35, "Medium" is 10 - 35, and "Low" is < 10.
Answer
# Base R using a nested ifelse()
motinv$count_level <- 
  ifelse(motinv$No.Damage > 35, "High", 
         ifelse(motinv$No.Damage >= 10 & motinv$No.Damage <= 35, "Medium", "Low"))

table(motinv$count_level) # check that it worked
View R output
## 
##   High    Low Medium 
##    167    352    163

# Tidyverse using case_when() and between
motinv <- motinv |> mutate(count_level = case_when(No.Damage > 35 ~ "High",
                                                   between(No.Damage, 10, 35) ~ "Medium",
                                                   No.Damage < 10 ~ "Low"))

table(motinv$count_level) # check that it worked
View R output
## 
##   High    Low Medium 
##    167    352    163

Note the use of the between() function that saves typing. This function is inclusive on both ends, so it matches values >= the lower bound and <= the upper bound.
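A quick way to confirm how between() treats its boundaries is to test values right at and just outside the limits. The counts vector below is made up for illustration.

```r
library(dplyr)

counts <- c(9, 10, 35, 36) # values at and just outside the 10-35 range

# between() includes both endpoints
btw <- between(counts, 10, 35)
btw

# The longhand equivalent
longhand <- counts >= 10 & counts <= 35
identical(btw, longhand)
```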



Summarizing with dplyr

Using summarize()

Yesterday, we used functions like mean(), min(), and max() to summarize entire datasets. Now we're going to use those same functions to summarize data by grouping variables, such as park, year, plot, etc. The process is similar to using Totals in Access or subtotals in Excel, although it is more flexible and efficient in R.

Difference between summarize() and mutate():
  • mutate() returns the same number of rows as the original data frame. This function also returns all of the columns that were in the original data frame, plus any new columns you create.
  • summarize() returns the same number of rows as there are grouping levels in the original data frame. This function only returns the columns that were part of the .by = c() and that were created in the summarize() function.
Common functions used to summarize:
  • mean(): calculate the group means
  • min(): calculate the group minimums
  • max(): calculate the group maximums
  • sum(): calculate the group sums
  • sd(): calculate the group standard deviations
  • n(): tally the number of rows within each group
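Before working with the real data, here is a minimal sketch of the difference between mutate() and summarize() on a made-up data frame. The toy data frame and its column names are for illustration only, and the .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy data: 2 plots with 2 records each (values are made up)
toy <- data.frame(plot = c("T1", "T1", "T2", "T2"),
                  cover = c(10, 20, 30, 50))

# mutate() keeps every row and column, and repeats the group mean on each row
toy_mut <- toy |> mutate(avg_cover = mean(cover), .by = plot)
toy_mut

# summarize() collapses to one row per plot
toy_sum <- toy |> summarize(avg_cover = mean(cover),
                            n_rows = n(),
                            .by = plot)
toy_sum

nrow(toy_mut) # same as the original
nrow(toy_sum) # one per plot
```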

To demonstrate summarize and mutate in dplyr, we're going to use the point intercept data collected along 3 transects in the Bass Harbor site. The data have already been summarized by transect (T1, T2, T3). We now want to calculate site-level median elevation and percent frequency for each cover type.

Read in the point intercept data

pi_dat <- read.csv("./data/BASHAR_Point_Intercept_data.csv")

head(pi_dat)
View R output
##   SiteCode PlotName Year              CoverType CoverCode med_elev num_counts
## 1   BASHAR       T1 2018                   Rock      ROCK 4.340922         19
## 2   BASHAR       T1 2018 Crustose non-coralline    NONCOR 3.389422         14
## 3   BASHAR       T1 2018                  Water     WATER 4.404461          4
## 4   BASHAR       T1 2018                   Bolt      BOLT 4.107183          2
## 5   BASHAR       T1 2018    Other Algae - Green    ALGGRE 3.823654          7
## 6   BASHAR       T1 2018               Barnacle    BARSPP 2.519213          6
##   samp_counts  pct_freq
## 1         153 12.418301
## 2         153  9.150327
## 3         153  2.614379
## 4         153  1.307190
## 5         153  4.575163
## 6         153  3.921569

Using mutate(), calculate the average percent frequency and median elevation by CoverType and Year

pi_dat_mut <- pi_dat |> mutate(med_elev_sl = median(med_elev), 
                               avg_pct_freq = mean(pct_freq), 
                               .by = c(SiteCode, Year, CoverType, CoverCode))
nrow(pi_dat) #314
nrow(pi_dat_mut) #314
head(pi_dat_mut)

Note how pi_dat_mut has the same number of rows and all the original columns plus the two we calculated (site-level median, and average % frequency). More often we're interested in reducing the data to one row per grouping level. That's what summarize() is for.

Using summarize(), calculate the average percent frequency, median elevation, min/max of frequency and elevation by CoverType and Year.

pi_dat_sum <- pi_dat |> summarize(elev_sl_med = median(med_elev), 
                                  elev_sl_min = min(med_elev),
                                  elev_sl_max = max(med_elev),
                                  avg_pct_freq = mean(pct_freq), 
                                  .by = c(SiteCode, Year, CoverType, CoverCode))
nrow(pi_dat) #314
nrow(pi_dat_sum) #124
head(pi_dat_sum)

Note how pi_dat_sum has far fewer rows (124 instead of 314) and only the grouping columns plus the four we calculated (site-level median, minimum, and maximum elevation, and average % frequency).

The mutate(.by = c()) approach is helpful if you're trying to standardize values within your group. But in most cases, the summarize(.by = c()) approach, which collapses to the group level, is what you're looking for. Note that the .by argument was added in dplyr 1.1.0. In older versions of dplyr, the syntax was group_by() |> summarize() with no .by = c(), and that syntax still works.
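In case you run into the older syntax in someone else's code, here is a small sketch comparing the two on a made-up data frame. The toy data and object names are for illustration only.

```r
library(dplyr)

toy <- data.frame(plot = c("T1", "T1", "T2", "T2"),
                  cover = c(10, 20, 30, 50))

# Newer syntax: grouping is specified inside summarize()
new_way <- toy |> summarize(avg_cover = mean(cover), .by = plot)

# Older syntax: group first, then summarize, then drop the grouping
old_way <- toy |>
  group_by(plot) |>
  summarize(avg_cover = mean(cover), .groups = "drop")

# Same groups and same values either way
identical(new_way$plot, old_way$plot)
identical(new_way$avg_cover, old_way$avg_cover)
```

One difference to be aware of: group_by() sorts the output by the grouping columns, while .by keeps groups in the order they first appear in the data. In this toy example the two orders happen to match.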

Summarize the average and standard error of total counts for the motile invertebrate data by CommunityType.

We will first fix the 1960 and PM errors from before, drop QAQC visits, then combine the Damage and No.Damage columns to make a total_count column.

# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196), # fix the 1960 typo in No.Damage
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))

head(motinv_sum)
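One thing to watch for when summarizing a column that contains NA values (like Damage_fix after the "PM" records are set to NA): sum() and sd() return NA unless you tell them to skip missing values. The sketch below uses a made-up vector, and dividing by the number of non-missing values is my assumption for illustration. Whether to divide by the number of plots or the number of non-missing values is a judgment call that depends on the sampling design.

```r
# Toy counts for 5 plots, with one missing value (made up)
total_count <- c(4, NA, 6, 2, 8)

sum(total_count)                          # NA, because one value is missing
sum_known <- sum(total_count, na.rm = TRUE) # sums the 4 known values
sum_known

# Standard error using only the non-missing values
n_obs <- sum(!is.na(total_count))
se_count <- sd(total_count, na.rm = TRUE) / sqrt(n_obs)
se_count
```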


Test your summarizing skills
CHALLENGE: Using the point intercept data (pi_dat), calculate the average percent frequency of each non-vegetated substrate by year. Note that non-vegetated substrates are CoverCode = c('BOLT', 'ROCK', 'WATER').
Answer
pi_nonveg <- pi_dat |> filter(CoverCode %in% c("BOLT", "ROCK", "WATER")) |> # filter nonveg grps
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, CoverCode, CoverType)) # grouping variables 

head(pi_nonveg) # check output
CHALLENGE: Using the point intercept data (pi_dat), calculate the average percent frequency of non-vegetated vs. vegetated cover types by year. Note that non-vegetated substrates are CoverCode = c('BOLT', 'ROCK', 'WATER').
Answer
pi_subtype <- pi_dat |>
  mutate(sub_type = ifelse(CoverCode %in% c("BOLT", "ROCK", "WATER"), "nonveg", "veg")) |> # classify as nonveg or veg
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, sub_type)) |> # grouping variables 
  arrange(SiteCode, Year, sub_type) # sort variables

head(pi_subtype) # check output


Pivoting Tables

Reshaping 101

Reshaping data from long to wide and wide to long is a common task with our data. Datasets are usually described as long, or wide. The long form, which is the structure database tables often take, consists of each row being an observation, and each column being a variable (i.e. tidy format). However, in summary tables, we often want to reshape the data to be wide for better digestion.

We’ll work with the point intercept data again to demonstrate pivoting, and will use a data frame that summarizes median elevation and average percent frequency of each cover code by year. Run the code below to create it with the column names used in this section (med_elev_sl and avg_pct_freq).

# load the package
library(dplyr)
library(tidyr) # for pivot functions

#--- import the raw point intercept data
pi_dat <- read.csv("./data/BASHAR_Point_Intercept_data.csv")


# summarize data by site, year, and cover type
pi_dat_sum <- pi_dat |> summarize(med_elev_sl = median(med_elev, na.rm = T), 
                                  avg_pct_freq = mean(pct_freq, na.rm = T), 
                                  .by = c(SiteCode, Year, CoverType, CoverCode))


Pivot from long to wide

With pi_dat_sum we're going to pivot the data wide to make each CoverCode a separate column and the values in each cell be the avg_pct_freq. The code below is pretty straightforward with names_from being the column you want to turn into column names, and the values_from being the value you want in the cells.

Pivot point intercept data to wide

When pivoting long to wide, you're reducing the number of rows to have one observation for each level of the variable you're pivoting on. If you have other variables in your data frame, like CoverType and med_elev_sl in this data frame, you have to drop those columns for the pivot to result in one observation per level of the pivoted variable. If that doesn't make sense, try running the pivot_wider() without dropping CoverType or med_elev_sl, and you'll see what I mean. I also added the arrange() by CoverCode, so the columns were sorted alphabetically in the pivot.
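To see why the extra columns matter, here is a toy sketch before we run the real thing. The toy data frame and its values are made up. It has a CoverType column that varies along with CoverCode, just like the real data.

```r
library(dplyr)
library(tidyr)

# Toy long data: 2 years x 2 cover codes (values are made up)
toy <- data.frame(Year = c(2013, 2013, 2014, 2014),
                  CoverCode = c("ROCK", "WATER", "ROCK", "WATER"),
                  CoverType = c("Rock", "Water", "Rock", "Water"),
                  pct = c(15, 3, 14, 2))

# Leaving CoverType in: it becomes part of what identifies a row,
# so the rows don't collapse and NAs fill the gaps
wide_extra <- toy |>
  pivot_wider(names_from = CoverCode, values_from = pct)

# Dropping CoverType first: one row per year
wide_clean <- toy |>
  select(-CoverType) |>
  pivot_wider(names_from = CoverCode, values_from = pct)

nrow(wide_extra) # still 4 rows
nrow(wide_clean) # 2 rows, one per year
```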

pi_wide <- pi_dat_sum |> 
  arrange(CoverCode, Year) |> # sort by CoverCode and year
  select(-CoverType, -med_elev_sl) |> # Drop extra column
  pivot_wider(names_from = CoverCode, # column that will produce column names
              values_from = avg_pct_freq) # column to make the values
head(pi_wide)
View R output
## # A tibble: 6 × 22
##   SiteCode  Year ALGGRE ALGRED ARTCOR ASCNOD BARSPP   BOLT CHOMAS CRUCOR FUCEPI
##   <chr>    <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 BASHAR    2013  0.654  0.983  1.99  NA      16.2  NA      10.5  NA       1.94
## 2 BASHAR    2014  1.98  NA      0.652 NA      14.9   0.662   7.46 NA       3.50
## 3 BASHAR    2015  4.16  NA     NA      0.976  12.1  NA       5.47 NA       2.40
## 4 BASHAR    2016  3.98  NA      0.667 NA       9.14 NA       6.86  0.667   1.99
## 5 BASHAR    2017 10.3    0.983  0.987 NA      12.6  NA      10.1  NA       4.16
## 6 BASHAR    2018  5.19  NA      0.650 NA       8.59  1.30    3.71 NA       6.53
## # ℹ 11 more variables: FUCSPP <dbl>, NONCOR <dbl>, OTHINV <dbl>, OTHSUB <dbl>,
## #   PALPAL <dbl>, PORSPP <dbl>, ROCK <dbl>, ULVINT <dbl>, ULVLAC <dbl>,
## #   UNIDEN <dbl>, WATER <dbl>

That was pretty simple. But there are a lot of NAs where a CoverCode wasn't detected in a given year and site. We can use the values_fill argument to save us time filling those NAs with 0s.

Pivot point intercept data to wide filling blanks as 0

pi_wide <- pi_dat_sum |> 
  arrange(CoverCode, Year) |> 
  select(-CoverType, -med_elev_sl) |> 
  pivot_wider(names_from = CoverCode, 
              values_from = avg_pct_freq, 
              values_fill = 0) # new line

head(pi_wide)
View R output
## # A tibble: 6 × 22
##   SiteCode  Year ALGGRE ALGRED ARTCOR ASCNOD BARSPP  BOLT CHOMAS CRUCOR FUCEPI
##   <chr>    <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1 BASHAR    2013  0.654  0.983  1.99   0      16.2  0      10.5   0       1.94
## 2 BASHAR    2014  1.98   0      0.652  0      14.9  0.662   7.46  0       3.50
## 3 BASHAR    2015  4.16   0      0      0.976  12.1  0       5.47  0       2.40
## 4 BASHAR    2016  3.98   0      0.667  0       9.14 0       6.86  0.667   1.99
## 5 BASHAR    2017 10.3    0.983  0.987  0      12.6  0      10.1   0       4.16
## 6 BASHAR    2018  5.19   0      0.650  0       8.59 1.30    3.71  0       6.53
## # ℹ 11 more variables: FUCSPP <dbl>, NONCOR <dbl>, OTHINV <dbl>, OTHSUB <dbl>,
## #   PALPAL <dbl>, PORSPP <dbl>, ROCK <dbl>, ULVINT <dbl>, ULVLAC <dbl>,
## #   UNIDEN <dbl>, WATER <dbl>

Now we see that every cell has a value. Another useful argument in pivot_wider() is names_prefix. That allows you to add a string before the column names that are generated in the pivot. This is helpful if you're pivoting on a number column, like year or plot number. R doesn't like column names that start with a number. The names_prefix is a quick way to fix that. To demonstrate, I'll pivot on year instead of CoverCode.

Pivot point intercept data wide using Year instead of CoverCode and add prefix.

pi_wide_yr <- pi_dat_sum |> 
  arrange(Year) |>
  select(-med_elev_sl) |> 
  pivot_wider(names_from = Year, # pivot on year instead of CoverCode 
              values_from = avg_pct_freq, 
              values_fill = 0,
              names_prefix = "yr_") # new line

head(pi_wide_yr)
View R output
## # A tibble: 6 × 14
##   SiteCode CoverType   CoverCode yr_2013 yr_2014 yr_2015 yr_2016 yr_2017 yr_2018
##   <chr>    <chr>       <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 BASHAR   Rock        ROCK        15.4   14.6     16.4    10.5    16.2    11.3 
## 2 BASHAR   Water       WATER        2.82   1.95     3.03    2.84    1.96    2.82
## 3 BASHAR   Crustose n… NONCOR      10.2   10.3      8.03    7.72    4.57    8.65
## 4 BASHAR   Barnacle    BARSPP      16.2   14.9     12.1     9.14   12.6     8.59
## 5 BASHAR   Rockweed    FUCSPP      31.2   43.7     45.7    56.3    35.7    50.1 
## 6 BASHAR   Unidentifi… UNIDEN       2.65   0.658    0       0       0       0   
## # ℹ 5 more variables: yr_2019 <dbl>, yr_2021 <dbl>, yr_2022 <dbl>,
## #   yr_2023 <dbl>, yr_2024 <dbl>
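The point about column names that start with a number can be checked on a toy table. In this sketch the data are made up. Without a prefix, the year columns can only be referenced by wrapping the name in backticks.

```r
library(tidyr)

toy <- data.frame(CoverCode = c("ROCK", "ROCK", "WATER", "WATER"),
                  Year = c(2013, 2014, 2013, 2014),
                  pct = c(15, 14, 3, 2))

no_prefix <- toy |>
  pivot_wider(names_from = Year, values_from = pct)

with_prefix <- toy |>
  pivot_wider(names_from = Year, values_from = pct, names_prefix = "yr_")

names(no_prefix)
names(with_prefix)

no_prefix$`2013`    # backticks are required for a name starting with a number
with_prefix$yr_2013 # no backticks needed
```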

CHALLENGE: Use the motinv_sum data frame from the "Summarizing with dplyr" tab to pivot on SpeciesCode and mean_count, and fill the NAs with 0s. If you don't have the motinv_sum data frame handy, run the code below to create it.
Hint: Drop the ScientificName and CommonName columns before you pivot.

# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196), # fix the 1960 typo in No.Damage
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
Answer
motinv_wide <- motinv_sum |> 
  arrange(SpeciesCode) |> # sorting so columns are alphabetical 
  select(-ScientificName, -CommonName) |> 
  pivot_wider(names_from = SpeciesCode,
              values_from = mean_count, 
              values_fill = 0)

head(motinv_wide)
## # A tibble: 6 × 10
##   SiteCode  Year CommunityType se_counts CARMAE LITLIT LITOBT LITSAX NUCLAP
##   <chr>    <int> <chr>             <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 BASHAR    2021 Ascophyllum       0.980    4.4      0      0      0      0
## 2 BASHAR    2022 Ascophyllum       0.224    1        0      0      0      0
## 3 BASHAR    2023 Ascophyllum       0.2      2.8      0      0      0      0
## 4 BASHAR    2024 Ascophyllum       0        0.8      0      0      0      0
## 5 BASHAR    2019 Ascophyllum       0.632    1.2      0      0      0      0
## 6 BASHAR    2021 Barnacle          1.73     2.6      0      0      0      0
## # ℹ 1 more variable: TECTES <dbl>

CHALLENGE: Use the motinv_sum data frame from the "Summarizing with dplyr" tab to pivot on Year and mean_count, fill the NAs with 0s, and add "yr_" to the column names to prevent column names starting with numbers. If you don't have the motinv_sum data frame handy, run the code below to create it.
Hint: Drop the se_counts column before you pivot.

# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, Damage == 1960, 196),
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
Answer
motinv_wide_yr <- motinv_sum |> 
  arrange(Year) |> # sorting so year columns are in chronological order 
  select(-se_counts) |> 
  pivot_wider(names_from = Year,
              values_from = mean_count, 
              values_fill = 0, 
              names_prefix = "yr_")

head(motinv_wide_yr)
## # A tibble: 6 × 16
##   SiteCode CommunityType ScientificName   CommonName SpeciesCode yr_2013 yr_2014
##   <chr>    <chr>         <chr>            <chr>      <chr>         <dbl>   <dbl>
## 1 BASHAR   Ascophyllum   Littorina litto… Common pe… LITLIT         14      20.8
## 2 BASHAR   Ascophyllum   Littorina obtus… Smooth pe… LITOBT         19      18.4
## 3 BASHAR   Ascophyllum   Nucella lapillus Dogwhelk   NUCLAP          0.6     0.2
## 4 BASHAR   Barnacle      Littorina litto… Common pe… LITLIT          0.4     0.6
## 5 BASHAR   Barnacle      Littorina obtus… Smooth pe… LITOBT          0.2     1.6
## 6 BASHAR   Fucus         Littorina litto… Common pe… LITLIT         13.6    25.6
## # ℹ 9 more variables: yr_2015 <dbl>, yr_2016 <dbl>, yr_2017 <dbl>,
## #   yr_2018 <dbl>, yr_2019 <dbl>, yr_2021 <dbl>, yr_2022 <dbl>, yr_2023 <dbl>,
## #   yr_2024 <dbl>


Pivot wide to long

We can reshape the pi_wide data back to long, which gives us data similar to what we started with, but with the 0s added. For the pivot_longer() function, you have to tell it which columns to pivot on using the cols argument. You rarely want to pivot every column, because that would stack the entire dataset into just 2 long columns. Here I tell R not to pivot on the SiteCode and Year columns, because I know they're in the data frame and unlikely to change. If I instead listed the species codes to pivot on and a new species were found in the next year of sampling, I'd have to update this code to include that new species.

pi_long <- pi_wide |> pivot_longer(cols = -c(SiteCode, Year), 
                                   names_to = "SpeciesCode", 
                                   values_to = "Avg_Pct_Freq")
head(pi_long)
## # A tibble: 6 × 4
##   SiteCode  Year SpeciesCode Avg_Pct_Freq
##   <chr>    <int> <chr>              <dbl>
## 1 BASHAR    2013 ALGGRE             0.654
## 2 BASHAR    2013 ALGRED             0.983
## 3 BASHAR    2013 ARTCOR             1.99 
## 4 BASHAR    2013 ASCNOD             0    
## 5 BASHAR    2013 BARSPP            16.2  
## 6 BASHAR    2013 BOLT               0
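
To see why excluding the ID columns is the safer choice, here's a small sketch using made-up data. The wide_2024 and wide_2025 data frames and the NEWSPP species code are hypothetical and not part of the training data. The same pivot_longer() call picks up the new species column without any changes to the code.

```r
library(tidyr)

# Hypothetical wide data: 2025 has a species column (NEWSPP) that 2024 lacks
wide_2024 <- data.frame(SiteCode = "BASHAR", Year = 2024,
                        ALGGRE = 0.5, ASCNOD = 12)
wide_2025 <- data.frame(SiteCode = "BASHAR", Year = 2025,
                        ALGGRE = 0.7, ASCNOD = 10, NEWSPP = 1.2)

# Pivot everything except the ID columns
long_2024 <- pivot_longer(wide_2024, cols = -c(SiteCode, Year),
                          names_to = "SpeciesCode", values_to = "Avg_Pct_Freq")
long_2025 <- pivot_longer(wide_2025, cols = -c(SiteCode, Year),
                          names_to = "SpeciesCode", values_to = "Avg_Pct_Freq")

long_2024$SpeciesCode # "ALGGRE" "ASCNOD"
long_2025$SpeciesCode # "ALGGRE" "ASCNOD" "NEWSPP"
```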

Note that in pivot_longer(), the names_prefix argument removes the string you specify from the names of the columns you're pivoting on, whereas in pivot_wider() it adds the string to the new column names. In other words, it does the opposite.
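
Here's a minimal round trip showing both behaviors of names_prefix. The toy_long data frame is made up for illustration. Note that the year values come back as character after pivot_longer(), so convert them if you need numbers.

```r
library(tidyr)

# Hypothetical toy data: one row per site and year
toy_long <- data.frame(Site = c("A", "A", "B", "B"),
                       Year = c(2023, 2024, 2023, 2024),
                       count = c(5, 7, 2, 0))

# pivot_wider(): names_prefix ADDS "yr_" to the new column names
toy_wide <- pivot_wider(toy_long, names_from = Year, values_from = count,
                        names_prefix = "yr_")
names(toy_wide) # "Site" "yr_2023" "yr_2024"

# pivot_longer(): names_prefix REMOVES "yr_" before filling the Year column
toy_long2 <- pivot_longer(toy_wide, cols = -Site, names_to = "Year",
                          values_to = "count", names_prefix = "yr_")
toy_long2$Year # "2023" "2024" "2023" "2024" (character)

# Convert back to numeric if needed
toy_long2$Year <- as.numeric(toy_long2$Year)
```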

CHALLENGE: Pivot the motinv_wide_yr data frame on the year columns, and remove the "yr_" from the year names using names_prefix = 'yr_'.

Answer
motinv_long_yr <- pivot_longer(motinv_wide_yr, 
                               cols = -c(SiteCode, CommunityType, ScientificName, 
                                         CommonName, SpeciesCode),
                               names_to = "Year", 
                               values_to = "mean_counts", 
                               names_prefix = "yr_") # drops this string from the year column names


Joining Tables

Joining tables 101

We often need to combine data from separate tables in our work (e.g., relational database tables). In R we do this using either the merge() function in base R or the *_join() functions in dplyr (e.g., left_join()). Because I find dplyr's join functions to be more intuitive and to perform faster than base R's merge(), I'm going to show how to use dplyr. If you understand the basic concepts of the join functions, you can figure out how to merge in base R.

Joining tables requires that the two datasets to join have at least one column in common, which is referred to as the key. The key is used to match records. The join type will determine whether all rows from both datasets are returned, or if only a portion are returned based on values in either or both of the two datasets. We think of joining as consisting of a left and right dataset. The left is specified first in the function argument, and the right is specified second. It generally doesn't matter which you make left or right, just that you know which is left or right. In general, I put site-level datasets on the left, and sample event datasets on the right.


Types of joins
  • Full Join: keeps all observations that appear in either the left or the right dataset. Any key value in the left dataset that is not found in the right dataset will return NAs for columns coming from the right dataset, and vice versa. The full join treats both the left and right datasets equally.
    [Figure: full join diagram, from R for Data Science]

  • Inner Join: keeps only observations that are matched in both the left and right dataset. Any key value in the left dataset that is not found in the right dataset will be dropped, and vice versa. The inner join treats the left and right datasets equally.
    [Figure: inner join diagram, from R for Data Science]

  • Left Join: keeps all observations that appear in the left dataset (first specified) and only those matched in the right dataset. Any key value in the left dataset that is not found in the right dataset will return NAs for columns coming from the right dataset. Any key value in the right dataset not found in the left data frame will be dropped.
    [Figure: left join diagram, from R for Data Science]

  • Right Join: keeps all observations that appear in the right dataset (second specified) and only those matched in the left dataset. Any key value in the right dataset that is not found in the left dataset will return NAs for columns coming from the left dataset. Any key value in the left dataset not found in the right dataset will be dropped.
    [Figure: right join diagram, from R for Data Science]

  • Anti Join: returns records in the left dataset that are not found in the right dataset. An anti join only works in one direction, so to find all records not in common, you have to run it twice, swapping which dataset is on the left. I use anti joins when I'm trying to check that two datasets have the same sites or years represented.
    [Figure: anti join diagram, from R for Data Science]


Joins in practice

To demonstrate the different joins, we're going to use some fake bat capture data. One table has species captured by year and site. The other has additional site information, like X/Y coordinates, full site names, etc.

Read in fake bat site and capture data

#site data
bat_sites <- read.csv("./data/bat_site_info.csv")
# bat capture data
bat_cap <- read.csv("./data/bat_captures.csv")

# View sites listed in each
sort(unique(bat_sites$Site)) # Sites 1, 2, 3, 4, 5
## [1] "site_001" "site_002" "site_003" "site_004" "site_005"
sort(unique(bat_cap$Site)) # Sites 1, 2, 3, 5, 6
## [1] "site_001" "site_002" "site_003" "site_005" "site_006"

The key in the two bat datasets is the "Site" column. In the bat_sites data frame, there are 5 unique sites, numbered 1:5. In the bat_cap data there are 5 unique sites, numbered 1, 2, 3, 5, 6. Therefore site_004 is only found in bat_sites and site_006 is only found in bat_cap.

Full join

bat_full <- full_join(bat_sites, bat_cap, by = "Site")
table(bat_full$Site)
## 
## site_001 site_002 site_003 site_004 site_005 site_006 
##        7        7        7        1        7        1
Site Unit X Y SiteName Year LASCIN MYOLEI MYOSEP MYOLUC
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2019 1 0 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2020 0 1 1 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2021 0 1 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2022 1 2 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2023 0 1 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2024 0 0 0 2
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2025 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2019 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2020 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2021 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2022 0 0 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2023 0 0 1 0
site_002 Schoodic 574712 4909721 SERC Campus 2024 0 2 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2025 1 0 1 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2019 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2020 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2021 0 3 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2022 0 2 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2023 0 1 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2024 0 2 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2025 0 1 0 1
site_004 Mount Desert Island 549931 4903409 Western Mtns NA NA NA NA NA
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2019 0 1 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2020 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2021 0 1 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2022 0 0 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2023 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2024 0 0 0 2
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2025 0 1 0 0
site_006 NA NA NA NA 2025 0 0 1 0

Note how site_004, which was in the bat_sites data but not in the bat_cap capture data, is included with NAs for the columns that came from bat_cap. Additionally, site_006, which was in the bat_cap capture data but not in the bat_sites data, has NAs for the columns that came from bat_sites.

Inner join

bat_inner <- inner_join(bat_sites, bat_cap, by = "Site")
table(bat_inner$Site)
## 
## site_001 site_002 site_003 site_005 
##        7        7        7        7
Site Unit X Y SiteName Year LASCIN MYOLEI MYOSEP MYOLUC
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2019 1 0 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2020 0 1 1 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2021 0 1 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2022 1 2 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2023 0 1 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2024 0 0 0 2
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2025 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2019 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2020 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2021 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2022 0 0 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2023 0 0 1 0
site_002 Schoodic 574712 4909721 SERC Campus 2024 0 2 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2025 1 0 1 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2019 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2020 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2021 0 3 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2022 0 2 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2023 0 1 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2024 0 2 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2025 0 1 0 1
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2019 0 1 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2020 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2021 0 1 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2022 0 0 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2023 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2024 0 0 0 2
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2025 0 1 0 0

The inner join only returns records for sites that the two datasets have in common. Therefore, site_004 in the bat_sites data and site_006 in the bat_cap capture data were dropped.

Left join

bat_left <- left_join(bat_sites, bat_cap, by = "Site")
table(bat_left$Site)
## 
## site_001 site_002 site_003 site_004 site_005 
##        7        7        7        1        7
Site Unit X Y SiteName Year LASCIN MYOLEI MYOSEP MYOLUC
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2019 1 0 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2020 0 1 1 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2021 0 1 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2022 1 2 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2023 0 1 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2024 0 0 0 2
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2025 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2019 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2020 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2021 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2022 0 0 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2023 0 0 1 0
site_002 Schoodic 574712 4909721 SERC Campus 2024 0 2 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2025 1 0 1 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2019 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2020 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2021 0 3 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2022 0 2 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2023 0 1 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2024 0 2 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2025 0 1 0 1
site_004 Mount Desert Island 549931 4903409 Western Mtns NA NA NA NA NA
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2019 0 1 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2020 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2021 0 1 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2022 0 0 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2023 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2024 0 0 0 2
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2025 0 1 0 0

The left join takes every row in the left data, bat_sites, and only the rows in the right data, bat_cap, that have a matching site. Note how site_004, which is only in bat_sites, is included with NAs for the columns that came from bat_cap, because it had no match there. site_006, which was only in the bat_cap data, was dropped.

Coding tip: I use left joins more than any other join because I'm usually joining tables that have a 1-to-many relationship, where the left dataset has 1 row for 1 or more rows in the right dataset. For example, say I have a dataset that only includes data for plots where an invasive species was detected and I want to do summary statistics that require the full number of plots. Using a left join, where the left dataset is a table of all of the plots and the right dataset is the invasive detections, will return the full set of plots to calculate summary statistics from. You may also have to fill 0s where NAs are introduced in the data before generating summary statistics. Do this with care: only fill a 0 where an NA truly means nothing was detected.
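
The tip above can be sketched with a small made-up example. The plots and detections data frames and the inv_cover column are hypothetical. I'm assuming here that a plot missing from the detections table really had 0 cover, which is what justifies filling the NAs with 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: 4 plots surveyed, invasive species detected in only 2
plots <- data.frame(Plot = c("P1", "P2", "P3", "P4"))
detections <- data.frame(Plot = c("P2", "P4"), inv_cover = c(10, 30))

# Averaging only the plots with detections overestimates cover
mean(detections$inv_cover) # 20

# Left join keeps all 4 plots; plots without detections get NA, then 0
plot_cover <- left_join(plots, detections, by = "Plot") |>
  mutate(inv_cover = replace_na(inv_cover, 0))

mean(plot_cover$inv_cover) # 10
```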

Right join

bat_right <- right_join(bat_sites, bat_cap, by = "Site")
table(bat_right$Site)
## 
## site_001 site_002 site_003 site_005 site_006 
##        7        7        7        7        1
Site Unit X Y SiteName Year LASCIN MYOLEI MYOSEP MYOLUC
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2019 1 0 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2020 0 1 1 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2021 0 1 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2022 1 2 0 1
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2023 0 1 0 0
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2024 0 0 0 2
site_001 Mount Desert Island 559205 4907461 Jordan Pond 2025 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2019 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2020 0 2 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2021 1 1 0 0
site_002 Schoodic 574712 4909721 SERC Campus 2022 0 0 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2023 0 0 1 0
site_002 Schoodic 574712 4909721 SERC Campus 2024 0 2 0 1
site_002 Schoodic 574712 4909721 SERC Campus 2025 1 0 1 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2019 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2020 0 1 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2021 0 3 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2022 0 2 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2023 0 1 0 0
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2024 0 2 0 1
site_003 Mount Desert Island 554607 4895800 Bass Harbor 2025 0 1 0 1
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2019 0 1 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2020 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2021 0 1 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2022 0 0 1 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2023 1 0 0 0
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2024 0 0 0 2
site_005 Mount Desert Island 563101 4912371 Sieur de Monts 2025 0 1 0 0
site_006 NA NA NA NA 2025 0 0 1 0

The right join takes every row in the right data, bat_cap, and only the rows in the left data, bat_sites, that have a matching site. Note how site_006, which is only in bat_cap, is included with NAs for the columns that came from bat_sites, because it had no match there. site_004, which was only in the bat_sites data, was dropped.

Anti join to find sites not in bat_cap

anti_join(bat_sites, bat_cap, by = "Site")
##       Site                Unit      X       Y     SiteName
## 1 site_004 Mount Desert Island 549931 4903409 Western Mtns

Anti join to find sites not in bat_sites

anti_join(bat_cap, bat_sites, by = "Site")
##       Site Year LASCIN MYOLEI MYOSEP MYOLUC
## 1 site_006 2025      0      0      1      0


Test your skills!

CHALLENGE: Join the motile invertebrate count data frame to the motile invertebrate species table to get Invasive and Exotic columns added to the data.

Import motinv data frames

#--- Read in motinv data if you haven't yet
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

#--- Read in species table
motspp <- read.csv("./data/motile_invert_species_table.csv")

head(motspp)
##              ScientificName        CommonName SpeciesCode Invasive Exotic
## 1        Littorina littorea Common periwinkle      LITLIT    FALSE   TRUE
## 2        Littorina obtusata Smooth periwinkle      LITOBT    FALSE  FALSE
## 3           Carcinus maenas        Green crab      CARMAE     TRUE   TRUE
## 4       Littorina saxatilis  Rough periwinkle      LITSAX    FALSE  FALSE
## 5          Nucella lapillus          Dogwhelk      NUCLAP    FALSE  FALSE
## 6 Testudinalia testudinalis            Limpet      TECTES    FALSE  FALSE

intersect(names(motinv), names(motspp)) # 3 columns in common
## [1] "ScientificName" "CommonName"     "SpeciesCode"

Answer
# left join species to motinv, because we don't want to include species not found in the count data
motinv_spp <- left_join(motinv, 
                        motspp, 
                        by = c("SpeciesCode", "ScientificName", "CommonName"))

head(motinv_spp)
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
## 6 Littorina littorea Common periwinkle      LITLIT      0         2         No
##   Invasive Exotic
## 1    FALSE   TRUE
## 2    FALSE   TRUE
## 3    FALSE  FALSE
## 4    FALSE  FALSE
## 5    FALSE  FALSE
## 6    FALSE   TRUE

CHALLENGE: Find species in motspp data frame that don't have a match in the motinv data frame.
Answer
# anti join with motspp on the left returns species that have no records in motinv
anti_join(motspp, motinv, by = c("SpeciesCode", "ScientificName", "CommonName"))
##           ScientificName      CommonName SpeciesCode Invasive Exotic
## 1 Hemigrapsus sanguineus Asian shorecrab     HEMISAN     TRUE   TRUE
## 2         Locusta marina         lobster      LOCMAR    FALSE  FALSE


Rolling joins

There are a number of other more advanced joins out there, the rolling join being one of them. For more information on all possible joins, refer to Chapter 19 in R for Data Science.

Rolling joins can come in handy if the key values in your two datasets don't perfectly match, and you want to join on the closest match. An example of where I've used rolling joins is to relate timing of high tide to the nearest water temperature measurement from a HOBO logger. You can allow for the nearest match in both directions or specify the direction (e.g., >= or <=).
[Figure: rolling join diagram, from R for Data Science]


Unfortunately, dplyr's rolling join doesn't perform the way I've needed it to. It only matches in one direction, like the closest temperature measurement after high tide, or the closest temperature measurement before high tide. If you need to do a rolling join that matches the nearest value in either direction, the data.table package is your best bet. It requires learning a new syntax and coding approach, so I'm not covering it here. But it's helpful to know that if you're working with huge datasets, data.table tends to perform much faster than dplyr and may have more features for joining and summarizing your data than dplyr.
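
For reference, here's a minimal sketch of what the one-directional rolling join looks like in dplyr, assuming dplyr 1.1.0 or later (where join_by() and closest() were introduced). The tides and temps data frames are made up for illustration. Notice that the 06:10 high tide is matched to the 06:00 reading in the "before" join, even though the 06:15 reading is nearer in time.

```r
library(dplyr)

# Hypothetical data: two high tides and four logger readings
tides <- data.frame(
  tide_time = as.POSIXct(c("2026-03-12 06:10:00", "2026-03-12 18:40:00"),
                         tz = "America/New_York"))
temps <- data.frame(
  temp_time = as.POSIXct(c("2026-03-12 06:00:00", "2026-03-12 06:15:00",
                           "2026-03-12 18:30:00", "2026-03-12 18:45:00"),
                         tz = "America/New_York"),
  temp_c = c(3.1, 3.3, 4.8, 4.6))

# Closest reading at or BEFORE each high tide
before <- left_join(tides, temps,
                    by = join_by(closest(tide_time >= temp_time)))

# Closest reading at or AFTER each high tide
after <- left_join(tides, temps,
                   by = join_by(closest(tide_time <= temp_time)))

before$temp_c # 3.1 4.8
after$temp_c  # 3.3 4.6
```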


Dates and Times

Dates and times 101

Dates, times and date-times are all special types of data in R. When you read in a dataset that has any of these, they will typically read in as character. You then have to convert them into a date/time type to do anything meaningful with them. The first place to start is knowing the codes R uses to define year, month, day, hours, minutes, and seconds. The codes below are the ones you're most likely to come across, either to define a date/time format or to return a specific format (like day of the week, month written in full, Julian day, etc.). For the full list, check out the help for strptime by running: ?strptime.

Code Definition
%a Abbreviated weekday name in the current locale on this platform.
%A Full weekday name in the current locale.
%b Abbreviated month name in the current locale on this platform. Case-insensitive on input.
%B Full month name in the current locale. Case-insensitive on input.
%d Day of the month as decimal number (01-31).
%H Hours as decimal number (00-23). As a special exception, strings such as "24:00:00" are accepted for input.
%I Hours as decimal number (01-12).
%j Day of year (Julian) as decimal number (001-366): For input, 366 is only valid in a leap year.
%m Month as decimal number (01-12).
%M Minute as decimal number (00-59).
%p AM/PM indicator in the locale. Used in conjunction with %I and not with %H. For input the match is case-insensitive.
%S Second as integer (00-61)
%u Weekday as a decimal number (1-7, Monday is 1).
%y Year without century (00-99).
%Y Year with century.

Look at current time and date output.

Sys.time()
## [1] "2026-04-27 13:00:49 EDT"
class(Sys.time()) # POSIXct POSIXt
## [1] "POSIXct" "POSIXt"
Sys.Date()
## [1] "2026-04-27"
class(Sys.Date()) # Date
## [1] "Date"


Dates in R

For date-only columns, you convert to a Date type. A few ways of defining dates are below, each based on the format of the input date. The format you specify has to match the input exactly, including the separators: if the day, month, and year are separated by - or /, the format needs the same symbol. If the output returns NA instead of a Date, either the format was specified incorrectly or the column you're trying to convert has more than 1 format represented.
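
Here's a small sketch of how a column with more than 1 format shows up as NAs, and one way to track down and fix the problem values. The date_chr vector is made up for illustration.

```r
# Hypothetical column with two date formats mixed together
date_chr <- c("3/12/2026", "3/13/2026", "2026-03-14")

# Only the values matching the format convert; the rest become NA
dates <- as.Date(date_chr, format = "%m/%d/%Y")
dates # "2026-03-12" "2026-03-13" NA

# Which input values failed to convert?
failed <- date_chr[is.na(dates)]
failed # "2026-03-14"

# Convert the failures using their own format
dates[is.na(dates)] <- as.Date(failed, format = "%Y-%m-%d")
dates # "2026-03-12" "2026-03-13" "2026-03-14"
```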

Example formatting for dates

# date with slashes and full year
date_chr1 <- "3/12/2026"
date1 <- as.Date(date_chr1, format = "%m/%d/%Y")
str(date1)
# date with dashes and 2-digit year
date_chr2 <- "3-12-26"
date2 <- as.Date(date_chr2, format = "%m-%d-%y")
str(date2)
# date written out
date_chr3 <- "March 12, 2026"
date3 <- as.Date(date_chr3, format = "%B %d, %Y") # %B = full month name
str(date3)
##  Date[1:1], format: "2026-03-12"

Extract information about dates

#Julian date as numeric
as.numeric(format(date1, format = "%j"))
## [1] 71
#Return day of week
format(date1, format = "%A") 
## [1] "Thursday"
#Return abbreviated day of week
format(date1, format = "%a") 
## [1] "Thu"
#Return written out date with month name
format(date1, format = "%B %d, %Y") 
## [1] "March 12, 2026"
#Return abbreviated written out date with month name
format(date1, format = "%b %d, %Y") 
## [1] "Mar 12, 2026"

Do math with dates

date1 + 1 # add a day
## [1] "2026-03-13"
date1 + 7 # add a week
## [1] "2026-03-19"
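
Subtracting one Date from another also works, and returns the time between them. A minimal sketch, using two made-up dates (start_date and end_date are hypothetical names):

```r
start_date <- as.Date("2026-03-12")
end_date <- as.Date("2026-04-27")

# Subtracting dates returns a difftime object in days
end_date - start_date

# Convert to a plain number of days for further math
n_days <- as.numeric(end_date - start_date)
n_days # 46

# Or ask difftime() for different units
n_weeks <- as.numeric(difftime(end_date, start_date, units = "weeks"))
```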

Create a vector of evenly spaced dates.

This can be helpful for setting up axis labels where one axis is dates.

date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
# by 15 days
seq.Date(date_list[1], date_list[2], by = "15 days")
##  [1] "2026-01-01" "2026-01-16" "2026-01-31" "2026-02-15" "2026-03-02"
##  [6] "2026-03-17" "2026-04-01" "2026-04-16" "2026-05-01" "2026-05-16"
## [11] "2026-05-31" "2026-06-15" "2026-06-30" "2026-07-15" "2026-07-30"
## [16] "2026-08-14" "2026-08-29" "2026-09-13" "2026-09-28" "2026-10-13"
## [21] "2026-10-28" "2026-11-12" "2026-11-27" "2026-12-12" "2026-12-27"
# by month
seq.Date(date_list[1], date_list[2], by = "1 month")
##  [1] "2026-01-01" "2026-02-01" "2026-03-01" "2026-04-01" "2026-05-01"
##  [6] "2026-06-01" "2026-07-01" "2026-08-01" "2026-09-01" "2026-10-01"
## [11] "2026-11-01" "2026-12-01"
# by 6 months
seq.Date(date_list[1], date_list[2], by = "6 months")
## [1] "2026-01-01" "2026-07-01"
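
As a sketch of the axis label idea, you can pair a vector of evenly spaced dates with formatted labels. The month_breaks and month_labels names are hypothetical, and the exact month abbreviations depend on your computer's locale.

```r
# Evenly spaced monthly breaks for 2026
month_breaks <- seq.Date(as.Date("2026-01-01"), as.Date("2026-12-31"),
                         by = "1 month")

# Matching labels with abbreviated month and day (locale-dependent text)
month_labels <- format(month_breaks, format = "%b %d")

length(month_breaks) # 12
month_labels
```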
CHALLENGE: How would you return date1 as YYYYMMDD (20260312)?
Answer
format(date1, format = "%Y%m%d")
## [1] "20260312"

CHALLENGE: How would you create a list of dates in 2026 that are evenly spaced by 3 months?
Answer
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "3 months")
## [1] "2026-01-01" "2026-04-01" "2026-07-01" "2026-10-01"

CHALLENGE: How would you create a list of dates in 2026 that are evenly spaced by 1 week?
Answer
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "1 week")
##  [1] "2026-01-01" "2026-01-08" "2026-01-15" "2026-01-22" "2026-01-29"
##  [6] "2026-02-05" "2026-02-12" "2026-02-19" "2026-02-26" "2026-03-05"
## [11] "2026-03-12" "2026-03-19" "2026-03-26" "2026-04-02" "2026-04-09"
## [16] "2026-04-16" "2026-04-23" "2026-04-30" "2026-05-07" "2026-05-14"
## [21] "2026-05-21" "2026-05-28" "2026-06-04" "2026-06-11" "2026-06-18"
## [26] "2026-06-25" "2026-07-02" "2026-07-09" "2026-07-16" "2026-07-23"
## [31] "2026-07-30" "2026-08-06" "2026-08-13" "2026-08-20" "2026-08-27"
## [36] "2026-09-03" "2026-09-10" "2026-09-17" "2026-09-24" "2026-10-01"
## [41] "2026-10-08" "2026-10-15" "2026-10-22" "2026-10-29" "2026-11-05"
## [46] "2026-11-12" "2026-11-19" "2026-11-26" "2026-12-03" "2026-12-10"
## [51] "2026-12-17" "2026-12-24" "2026-12-31"


Times in R

Date-time variables (e.g., a HOBO logger timestamp) need to be converted into a POSIX type before you can work with them. POSIX (Portable Operating System Interface) is an international standard that comes from Unix and covers things like dates and times. The idea is that POSIX date-times are transferable across software. There are 2 POSIX types for date-times in R.
  1. POSIXct: is lighter weight and only stores the date-time as the number of seconds since January 1, 1970 (UTC). Times prior to 1970 are stored as negative numbers.
  2. POSIXlt: stores more information that's easily accessible as named elements, including sec, min, hour, mday (day of the month), mon (month, counted from 0), year (years since 1900), yday (day of the year, counted from 0), etc.

If your dataset is huge, working with the lighter weight POSIXct may be best. Outside of that, whatever you choose may not matter too much in your workflow. We will use the lighter weight POSIXct version for our examples.
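
Here's a minimal sketch of a typical POSIXct workflow, using made-up logger timestamps (time_chr is hypothetical): convert from character, calculate the time between readings, pull out pieces of the date-time, and display the same times in another time zone.

```r
# Hypothetical logger timestamps read in as character
time_chr <- c("2026-03-12 01:30:00", "2026-03-12 02:00:00",
              "2026-03-12 02:30:00")

# Convert to POSIXct, stating the format and time zone explicitly
time_ct <- as.POSIXct(time_chr, format = "%Y-%m-%d %H:%M:%S",
                      tz = "America/New_York")
class(time_ct) # "POSIXct" "POSIXt"

# Minutes between consecutive readings
gap_mins <- as.numeric(diff(time_ct), units = "mins")
gap_mins # 30 30

# Extract pieces with format(), or convert to POSIXlt and use $
format(time_ct, format = "%H:%M") # "01:30" "02:00" "02:30"
as.POSIXlt(time_ct)$hour          # 1 2 2

# The same instants displayed in UTC (4 hours ahead of EDT)
time_utc <- format(time_ct, format = "%Y-%m-%d %H:%M", tz = "UTC")
time_utc
```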

Look under the hood of the info stored by the two POSIX types

unclass(as.POSIXct("2026-03-12 01:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York"))
## [1] 1773293400
## attr(,"tzone")
## [1] "America/New_York"
unclass(as.POSIXlt("2026-03-12 01:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York"))
## $sec
## [1] 0
## 
## $min
## [1] 30
## 
## $hour
## [1] 1
## 
## $mday
## [1] 12
## 
## $mon
## [1] 2
## 
## $year
## [1] 126
## 
## $wday
## [1] 4
## 
## $yday
## [1] 70
## 
## $isdst
## [1] 1
## 
## $zone
## [1] "EDT"
## 
## $gmtoff
## [1] NA
## 
## attr(,"tzone")
## [1] "America/New_York"
## attr(,"balanced")
## [1] TRUE

Note the use of the tz (time zone) argument in the code above. Here I specified the Eastern time zone. There are two handy ways to check time zones in R.

Check the timezone of your computer

Sys.timezone()
## [1] "America/New_York"

Check the timezones built into base R

OlsonNames()
##   [1] "Africa/Abidjan"                   "Africa/Accra"                    
##   [3] "Africa/Addis_Ababa"               "Africa/Algiers"                  
##   [5] "Africa/Asmara"                    "Africa/Asmera"                   
##   [7] "Africa/Bamako"                    "Africa/Bangui"                   
##   [9] "Africa/Banjul"                    "Africa/Bissau"                   
##  [11] "Africa/Blantyre"                  "Africa/Brazzaville"              
##  [13] "Africa/Bujumbura"                 "Africa/Cairo"                    
##  [15] "Africa/Casablanca"                "Africa/Ceuta"                    
##  [17] "Africa/Conakry"                   "Africa/Dakar"                    
##  [19] "Africa/Dar_es_Salaam"             "Africa/Djibouti"                 
##  [21] "Africa/Douala"                    "Africa/El_Aaiun"                 
##  [23] "Africa/Freetown"                  "Africa/Gaborone"                 
##  [25] "Africa/Harare"                    "Africa/Johannesburg"             
##  [27] "Africa/Juba"                      "Africa/Kampala"                  
##  [29] "Africa/Khartoum"                  "Africa/Kigali"                   
##  [31] "Africa/Kinshasa"                  "Africa/Lagos"                    
##  [33] "Africa/Libreville"                "Africa/Lome"                     
##  [35] "Africa/Luanda"                    "Africa/Lubumbashi"               
##  [37] "Africa/Lusaka"                    "Africa/Malabo"                   
##  [39] "Africa/Maputo"                    "Africa/Maseru"                   
##  [41] "Africa/Mbabane"                   "Africa/Mogadishu"                
##  [43] "Africa/Monrovia"                  "Africa/Nairobi"                  
##  [45] "Africa/Ndjamena"                  "Africa/Niamey"                   
##  [47] "Africa/Nouakchott"                "Africa/Ouagadougou"              
##  [49] "Africa/Porto-Novo"                "Africa/Sao_Tome"                 
##  [51] "Africa/Timbuktu"                  "Africa/Tripoli"                  
##  [53] "Africa/Tunis"                     "Africa/Windhoek"                 
##  [55] "America/Adak"                     "America/Anchorage"               
##  [57] "America/Anguilla"                 "America/Antigua"                 
##  [59] "America/Araguaina"                "America/Argentina/Buenos_Aires"  
##  [61] "America/Argentina/Catamarca"      "America/Argentina/ComodRivadavia"
##  [63] "America/Argentina/Cordoba"        "America/Argentina/Jujuy"         
##  [65] "America/Argentina/La_Rioja"       "America/Argentina/Mendoza"       
##  [67] "America/Argentina/Rio_Gallegos"   "America/Argentina/Salta"         
##  [69] "America/Argentina/San_Juan"       "America/Argentina/San_Luis"      
##  [71] "America/Argentina/Tucuman"        "America/Argentina/Ushuaia"       
##  [73] "America/Aruba"                    "America/Asuncion"                
##  [75] "America/Atikokan"                 "America/Atka"                    
##  [77] "America/Bahia"                    "America/Bahia_Banderas"          
##  [79] "America/Barbados"                 "America/Belem"                   
##  [81] "America/Belize"                   "America/Blanc-Sablon"            
##  [83] "America/Boa_Vista"                "America/Bogota"                  
##  [85] "America/Boise"                    "America/Buenos_Aires"            
##  [87] "America/Cambridge_Bay"            "America/Campo_Grande"            
##  [89] "America/Cancun"                   "America/Caracas"                 
##  [91] "America/Catamarca"                "America/Cayenne"                 
##  [93] "America/Cayman"                   "America/Chicago"                 
##  [95] "America/Chihuahua"                "America/Ciudad_Juarez"           
##  [97] "America/Coral_Harbour"            "America/Cordoba"                 
##  [99] "America/Costa_Rica"               "America/Coyhaique"               
## [101] "America/Creston"                  "America/Cuiaba"                  
## [103] "America/Curacao"                  "America/Danmarkshavn"            
## [105] "America/Dawson"                   "America/Dawson_Creek"            
## [107] "America/Denver"                   "America/Detroit"                 
## [109] "America/Dominica"                 "America/Edmonton"                
## [111] "America/Eirunepe"                 "America/El_Salvador"             
## [113] "America/Ensenada"                 "America/Fort_Nelson"             
## [115] "America/Fort_Wayne"               "America/Fortaleza"               
## [117] "America/Glace_Bay"                "America/Godthab"                 
## [119] "America/Goose_Bay"                "America/Grand_Turk"              
## [121] "America/Grenada"                  "America/Guadeloupe"              
## [123] "America/Guatemala"                "America/Guayaquil"               
## [125] "America/Guyana"                   "America/Halifax"                 
## [127] "America/Havana"                   "America/Hermosillo"              
## [129] "America/Indiana/Indianapolis"     "America/Indiana/Knox"            
## [131] "America/Indiana/Marengo"          "America/Indiana/Petersburg"      
## [133] "America/Indiana/Tell_City"        "America/Indiana/Vevay"           
## [135] "America/Indiana/Vincennes"        "America/Indiana/Winamac"         
## [137] "America/Indianapolis"             "America/Inuvik"                  
## [139] "America/Iqaluit"                  "America/Jamaica"                 
## [141] "America/Jujuy"                    "America/Juneau"                  
## [143] "America/Kentucky/Louisville"      "America/Kentucky/Monticello"     
## [145] "America/Knox_IN"                  "America/Kralendijk"              
## [147] "America/La_Paz"                   "America/Lima"                    
## [149] "America/Los_Angeles"              "America/Louisville"              
## [151] "America/Lower_Princes"            "America/Maceio"                  
## [153] "America/Managua"                  "America/Manaus"                  
## [155] "America/Marigot"                  "America/Martinique"              
## [157] "America/Matamoros"                "America/Mazatlan"                
## [159] "America/Mendoza"                  "America/Menominee"               
## [161] "America/Merida"                   "America/Metlakatla"              
## [163] "America/Mexico_City"              "America/Miquelon"                
## [165] "America/Moncton"                  "America/Monterrey"               
## [167] "America/Montevideo"               "America/Montreal"                
## [169] "America/Montserrat"               "America/Nassau"                  
## [171] "America/New_York"                 "America/Nipigon"                 
## [173] "America/Nome"                     "America/Noronha"                 
## [175] "America/North_Dakota/Beulah"      "America/North_Dakota/Center"     
## [177] "America/North_Dakota/New_Salem"   "America/Nuuk"                    
## [179] "America/Ojinaga"                  "America/Panama"                  
## [181] "America/Pangnirtung"              "America/Paramaribo"              
## [183] "America/Phoenix"                  "America/Port-au-Prince"          
## [185] "America/Port_of_Spain"            "America/Porto_Acre"              
## [187] "America/Porto_Velho"              "America/Puerto_Rico"             
## [189] "America/Punta_Arenas"             "America/Rainy_River"             
## [191] "America/Rankin_Inlet"             "America/Recife"                  
## [193] "America/Regina"                   "America/Resolute"                
## [195] "America/Rio_Branco"               "America/Rosario"                 
## [197] "America/Santa_Isabel"             "America/Santarem"                
## [199] "America/Santiago"                 "America/Santo_Domingo"           
## [201] "America/Sao_Paulo"                "America/Scoresbysund"            
## [203] "America/Shiprock"                 "America/Sitka"                   
## [205] "America/St_Barthelemy"            "America/St_Johns"                
## [207] "America/St_Kitts"                 "America/St_Lucia"                
## [209] "America/St_Thomas"                "America/St_Vincent"              
## [211] "America/Swift_Current"            "America/Tegucigalpa"             
## [213] "America/Thule"                    "America/Thunder_Bay"             
## [215] "America/Tijuana"                  "America/Toronto"                 
## [217] "America/Tortola"                  "America/Vancouver"               
## [219] "America/Virgin"                   "America/Whitehorse"              
## [221] "America/Winnipeg"                 "America/Yakutat"                 
## [223] "America/Yellowknife"              "Antarctica/Casey"                
## [225] "Antarctica/Davis"                 "Antarctica/DumontDUrville"       
## [227] "Antarctica/Macquarie"             "Antarctica/Mawson"               
## [229] "Antarctica/McMurdo"               "Antarctica/Palmer"               
## [231] "Antarctica/Rothera"               "Antarctica/South_Pole"           
## [233] "Antarctica/Syowa"                 "Antarctica/Troll"                
## [235] "Antarctica/Vostok"                "Arctic/Longyearbyen"             
## [237] "Asia/Aden"                        "Asia/Almaty"                     
## [239] "Asia/Amman"                       "Asia/Anadyr"                     
## [241] "Asia/Aqtau"                       "Asia/Aqtobe"                     
## [243] "Asia/Ashgabat"                    "Asia/Ashkhabad"                  
## [245] "Asia/Atyrau"                      "Asia/Baghdad"                    
## [247] "Asia/Bahrain"                     "Asia/Baku"                       
## [249] "Asia/Bangkok"                     "Asia/Barnaul"                    
## [251] "Asia/Beirut"                      "Asia/Bishkek"                    
## [253] "Asia/Brunei"                      "Asia/Calcutta"                   
## [255] "Asia/Chita"                       "Asia/Choibalsan"                 
## [257] "Asia/Chongqing"                   "Asia/Chungking"                  
## [259] "Asia/Colombo"                     "Asia/Dacca"                      
## [261] "Asia/Damascus"                    "Asia/Dhaka"                      
## [263] "Asia/Dili"                        "Asia/Dubai"                      
## [265] "Asia/Dushanbe"                    "Asia/Famagusta"                  
## [267] "Asia/Gaza"                        "Asia/Harbin"                     
## [269] "Asia/Hebron"                      "Asia/Ho_Chi_Minh"                
## [271] "Asia/Hong_Kong"                   "Asia/Hovd"                       
## [273] "Asia/Irkutsk"                     "Asia/Istanbul"                   
## [275] "Asia/Jakarta"                     "Asia/Jayapura"                   
## [277] "Asia/Jerusalem"                   "Asia/Kabul"                      
## [279] "Asia/Kamchatka"                   "Asia/Karachi"                    
## [281] "Asia/Kashgar"                     "Asia/Kathmandu"                  
## [283] "Asia/Katmandu"                    "Asia/Khandyga"                   
## [285] "Asia/Kolkata"                     "Asia/Krasnoyarsk"                
## [287] "Asia/Kuala_Lumpur"                "Asia/Kuching"                    
## [289] "Asia/Kuwait"                      "Asia/Macao"                      
## [291] "Asia/Macau"                       "Asia/Magadan"                    
## [293] "Asia/Makassar"                    "Asia/Manila"                     
## [295] "Asia/Muscat"                      "Asia/Nicosia"                    
## [297] "Asia/Novokuznetsk"                "Asia/Novosibirsk"                
## [299] "Asia/Omsk"                        "Asia/Oral"                       
## [301] "Asia/Phnom_Penh"                  "Asia/Pontianak"                  
## [303] "Asia/Pyongyang"                   "Asia/Qatar"                      
## [305] "Asia/Qostanay"                    "Asia/Qyzylorda"                  
## [307] "Asia/Rangoon"                     "Asia/Riyadh"                     
## [309] "Asia/Saigon"                      "Asia/Sakhalin"                   
## [311] "Asia/Samarkand"                   "Asia/Seoul"                      
## [313] "Asia/Shanghai"                    "Asia/Singapore"                  
## [315] "Asia/Srednekolymsk"               "Asia/Taipei"                     
## [317] "Asia/Tashkent"                    "Asia/Tbilisi"                    
## [319] "Asia/Tehran"                      "Asia/Tel_Aviv"                   
## [321] "Asia/Thimbu"                      "Asia/Thimphu"                    
## [323] "Asia/Tokyo"                       "Asia/Tomsk"                      
## [325] "Asia/Ujung_Pandang"               "Asia/Ulaanbaatar"                
## [327] "Asia/Ulan_Bator"                  "Asia/Urumqi"                     
## [329] "Asia/Ust-Nera"                    "Asia/Vientiane"                  
## [331] "Asia/Vladivostok"                 "Asia/Yakutsk"                    
## [333] "Asia/Yangon"                      "Asia/Yekaterinburg"              
## [335] "Asia/Yerevan"                     "Atlantic/Azores"                 
## [337] "Atlantic/Bermuda"                 "Atlantic/Canary"                 
## [339] "Atlantic/Cape_Verde"              "Atlantic/Faeroe"                 
## [341] "Atlantic/Faroe"                   "Atlantic/Jan_Mayen"              
## [343] "Atlantic/Madeira"                 "Atlantic/Reykjavik"              
## [345] "Atlantic/South_Georgia"           "Atlantic/St_Helena"              
## [347] "Atlantic/Stanley"                 "Australia/ACT"                   
## [349] "Australia/Adelaide"               "Australia/Brisbane"              
## [351] "Australia/Broken_Hill"            "Australia/Canberra"              
## [353] "Australia/Currie"                 "Australia/Darwin"                
## [355] "Australia/Eucla"                  "Australia/Hobart"                
## [357] "Australia/LHI"                    "Australia/Lindeman"              
## [359] "Australia/Lord_Howe"              "Australia/Melbourne"             
## [361] "Australia/North"                  "Australia/NSW"                   
## [363] "Australia/Perth"                  "Australia/Queensland"            
## [365] "Australia/South"                  "Australia/Sydney"                
## [367] "Australia/Tasmania"               "Australia/Victoria"              
## [369] "Australia/West"                   "Australia/Yancowinna"            
## [371] "Brazil/Acre"                      "Brazil/DeNoronha"                
## [373] "Brazil/East"                      "Brazil/West"                     
## [375] "Canada/Atlantic"                  "Canada/Central"                  
## [377] "Canada/Eastern"                   "Canada/Mountain"                 
## [379] "Canada/Newfoundland"              "Canada/Pacific"                  
## [381] "Canada/Saskatchewan"              "Canada/Yukon"                    
## [383] "CET"                              "Chile/Continental"               
## [385] "Chile/EasterIsland"               "CST6CDT"                         
## [387] "Cuba"                             "EET"                             
## [389] "Egypt"                            "Eire"                            
## [391] "EST"                              "EST5EDT"                         
## [393] "Etc/GMT"                          "Etc/GMT-0"                       
## [395] "Etc/GMT-1"                        "Etc/GMT-10"                      
## [397] "Etc/GMT-11"                       "Etc/GMT-12"                      
## [399] "Etc/GMT-13"                       "Etc/GMT-14"                      
## [401] "Etc/GMT-2"                        "Etc/GMT-3"                       
## [403] "Etc/GMT-4"                        "Etc/GMT-5"                       
## [405] "Etc/GMT-6"                        "Etc/GMT-7"                       
## [407] "Etc/GMT-8"                        "Etc/GMT-9"                       
## [409] "Etc/GMT+0"                        "Etc/GMT+1"                       
## [411] "Etc/GMT+10"                       "Etc/GMT+11"                      
## [413] "Etc/GMT+12"                       "Etc/GMT+2"                       
## [415] "Etc/GMT+3"                        "Etc/GMT+4"                       
## [417] "Etc/GMT+5"                        "Etc/GMT+6"                       
## [419] "Etc/GMT+7"                        "Etc/GMT+8"                       
## [421] "Etc/GMT+9"                        "Etc/GMT0"                        
## [423] "Etc/Greenwich"                    "Etc/UCT"                         
## [425] "Etc/Universal"                    "Etc/UTC"                         
## [427] "Etc/Zulu"                         "Europe/Amsterdam"                
## [429] "Europe/Andorra"                   "Europe/Astrakhan"                
## [431] "Europe/Athens"                    "Europe/Belfast"                  
## [433] "Europe/Belgrade"                  "Europe/Berlin"                   
## [435] "Europe/Bratislava"                "Europe/Brussels"                 
## [437] "Europe/Bucharest"                 "Europe/Budapest"                 
## [439] "Europe/Busingen"                  "Europe/Chisinau"                 
## [441] "Europe/Copenhagen"                "Europe/Dublin"                   
## [443] "Europe/Gibraltar"                 "Europe/Guernsey"                 
## [445] "Europe/Helsinki"                  "Europe/Isle_of_Man"              
## [447] "Europe/Istanbul"                  "Europe/Jersey"                   
## [449] "Europe/Kaliningrad"               "Europe/Kiev"                     
## [451] "Europe/Kirov"                     "Europe/Kyiv"                     
## [453] "Europe/Lisbon"                    "Europe/Ljubljana"                
## [455] "Europe/London"                    "Europe/Luxembourg"               
## [457] "Europe/Madrid"                    "Europe/Malta"                    
## [459] "Europe/Mariehamn"                 "Europe/Minsk"                    
## [461] "Europe/Monaco"                    "Europe/Moscow"                   
## [463] "Europe/Nicosia"                   "Europe/Oslo"                     
## [465] "Europe/Paris"                     "Europe/Podgorica"                
## [467] "Europe/Prague"                    "Europe/Riga"                     
## [469] "Europe/Rome"                      "Europe/Samara"                   
## [471] "Europe/San_Marino"                "Europe/Sarajevo"                 
## [473] "Europe/Saratov"                   "Europe/Simferopol"               
## [475] "Europe/Skopje"                    "Europe/Sofia"                    
## [477] "Europe/Stockholm"                 "Europe/Tallinn"                  
## [479] "Europe/Tirane"                    "Europe/Tiraspol"                 
## [481] "Europe/Ulyanovsk"                 "Europe/Uzhgorod"                 
## [483] "Europe/Vaduz"                     "Europe/Vatican"                  
## [485] "Europe/Vienna"                    "Europe/Vilnius"                  
## [487] "Europe/Volgograd"                 "Europe/Warsaw"                   
## [489] "Europe/Zagreb"                    "Europe/Zaporozhye"               
## [491] "Europe/Zurich"                    "GB"                              
## [493] "GB-Eire"                          "GMT"                             
## [495] "GMT-0"                            "GMT+0"                           
## [497] "GMT0"                             "Greenwich"                       
## [499] "Hongkong"                         "HST"                             
## [501] "Iceland"                          "Indian/Antananarivo"             
## [503] "Indian/Chagos"                    "Indian/Christmas"                
## [505] "Indian/Cocos"                     "Indian/Comoro"                   
## [507] "Indian/Kerguelen"                 "Indian/Mahe"                     
## [509] "Indian/Maldives"                  "Indian/Mauritius"                
## [511] "Indian/Mayotte"                   "Indian/Reunion"                  
## [513] "Iran"                             "Israel"                          
## [515] "Jamaica"                          "Japan"                           
## [517] "Kwajalein"                        "Libya"                           
## [519] "MET"                              "Mexico/BajaNorte"                
## [521] "Mexico/BajaSur"                   "Mexico/General"                  
## [523] "MST"                              "MST7MDT"                         
## [525] "Navajo"                           "NZ"                              
## [527] "NZ-CHAT"                          "Pacific/Apia"                    
## [529] "Pacific/Auckland"                 "Pacific/Bougainville"            
## [531] "Pacific/Chatham"                  "Pacific/Chuuk"                   
## [533] "Pacific/Easter"                   "Pacific/Efate"                   
## [535] "Pacific/Enderbury"                "Pacific/Fakaofo"                 
## [537] "Pacific/Fiji"                     "Pacific/Funafuti"                
## [539] "Pacific/Galapagos"                "Pacific/Gambier"                 
## [541] "Pacific/Guadalcanal"              "Pacific/Guam"                    
## [543] "Pacific/Honolulu"                 "Pacific/Johnston"                
## [545] "Pacific/Kanton"                   "Pacific/Kiritimati"              
## [547] "Pacific/Kosrae"                   "Pacific/Kwajalein"               
## [549] "Pacific/Majuro"                   "Pacific/Marquesas"               
## [551] "Pacific/Midway"                   "Pacific/Nauru"                   
## [553] "Pacific/Niue"                     "Pacific/Norfolk"                 
## [555] "Pacific/Noumea"                   "Pacific/Pago_Pago"               
## [557] "Pacific/Palau"                    "Pacific/Pitcairn"                
## [559] "Pacific/Pohnpei"                  "Pacific/Ponape"                  
## [561] "Pacific/Port_Moresby"             "Pacific/Rarotonga"               
## [563] "Pacific/Saipan"                   "Pacific/Samoa"                   
## [565] "Pacific/Tahiti"                   "Pacific/Tarawa"                  
## [567] "Pacific/Tongatapu"                "Pacific/Truk"                    
## [569] "Pacific/Wake"                     "Pacific/Wallis"                  
## [571] "Pacific/Yap"                      "Poland"                          
## [573] "Portugal"                         "PRC"                             
## [575] "PST8PDT"                          "ROC"                             
## [577] "ROK"                              "Singapore"                       
## [579] "Turkey"                           "UCT"                             
## [581] "Universal"                        "US/Alaska"                       
## [583] "US/Aleutian"                      "US/Arizona"                      
## [585] "US/Central"                       "US/East-Indiana"                 
## [587] "US/Eastern"                       "US/Hawaii"                       
## [589] "US/Indiana-Starke"                "US/Michigan"                     
## [591] "US/Mountain"                      "US/Pacific"                      
## [593] "US/Samoa"                         "UTC"                             
## [595] "W-SU"                             "WET"                             
## [597] "Zulu"                            
## attr(,"Version")
## [1] "2025b"
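
As a quick sketch of how these two functions can be put to work, the code below checks that a time zone name is spelled the way R expects, then prints the same instant in a second time zone. The date-time and the object names (tz_name, x) are made up for illustration.

```r
# Hypothetical time zone name to validate before passing it to tz =
tz_name <- "America/New_York"
tz_name %in% OlsonNames()   # TRUE when the name is a recognized Olson name

# Search the built-in names by pattern instead of scrolling the full list
grep("New_York", OlsonNames(), value = TRUE)

# Illustrative date-time stored in the eastern time zone
x <- as.POSIXct("2026-03-12 01:30:00", tz = "America/New_York")

# Print the same instant in UTC; the value stored in x is unchanged
format(x, tz = "UTC", usetz = TRUE)
## [1] "2026-03-12 05:30:00 UTC"
```

A misspelled time zone name is easy to miss because R may fall back to UTC, often with only a warning, so the %in% check is a cheap safeguard.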

If you understand how to set up a Date type in R, setting up date-times isn't that different. It just takes a bit more attention to get the format right. To demonstrate, we'll read in HOBO temperature data and set the timestamp column as a POSIXct date-time. HOBO data usually needs a bit of cleaning beyond setting the timestamp as a POSIXct date-time. I'll show the whole process below.

Read in temperature data and look at it

temp_data1 <- read.csv("./data/HOBO_temp_example.csv")

# check data
head(temp_data1)
##   Plot.Title.HOBO_temp_example.csv                    X
## 1                                # Date Time, GMT-05:00
## 2                                1      7/18/2021 10:26
## 3                                2      7/18/2021 11:26
## 4                                3      7/18/2021 12:26
## 5                                4      7/18/2021 13:26
## 6                                5      7/18/2021 14:26
##                                               X.1
## 1 Temp, °F (LGR S/N: 20672839, SEN S/N: 20672839)
## 2                                          58.842
## 3                                          58.712
## 4                                          58.109
## 5                                          56.208
## 6                                          56.208
##                                    X.2                                  X.3
## 1 Coupler Detached (LGR S/N: 20672839) Coupler Attached (LGR S/N: 20672839)
## 2                               Logged                                     
## 3                                                                          
## 4                                                                          
## 5                                                                          
## 6                                                                          
##                           X.4                             X.5
## 1 Stopped (LGR S/N: 20672839) End Of File (LGR S/N: 20672839)
## 2                                                            
## 3                                                            
## 4                                                            
## 5                                                            
## 6

Note the extra row on top showing the file name. HOBO data often has some metadata in the first row. The next code chunk imports a cleaner version of the data by skipping the first row, only pulling in the first 3 columns (we don't care about the columns that report Logged), and cleaning up the column names.

Clean up non-date HOBO data

temp_data <- read.csv("./data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "date_time", "tempF")
View(temp_data)
First 50 rows of temp_data
index date_time tempF
1 7/18/2021 10:26 58.842
2 7/18/2021 11:26 58.712
3 7/18/2021 12:26 58.109
4 7/18/2021 13:26 56.208
5 7/18/2021 14:26 56.208
6 7/18/2021 15:26 55.342
7 7/18/2021 16:26 55.602
8 7/18/2021 17:26 55.949
9 7/18/2021 18:26 55.602
10 7/18/2021 19:26 55.733
11 7/18/2021 20:26 55.819
12 7/18/2021 21:26 55.776
13 7/18/2021 22:26 56.469
14 7/18/2021 23:26 56.642
15 7/19/2021 0:26 56.556
16 7/19/2021 1:26 55.863
17 7/19/2021 2:26 55.819
18 7/19/2021 3:26 55.733
19 7/19/2021 4:26 55.733
20 7/19/2021 5:26 55.733
21 7/19/2021 6:26 55.949
22 7/19/2021 7:26 55.776
23 7/19/2021 8:26 56.035
24 7/19/2021 9:26 56.079
25 7/19/2021 10:26 56.901
26 7/19/2021 11:26 63.090
27 7/19/2021 12:26 63.732
28 7/19/2021 13:26 57.420
29 7/19/2021 14:26 56.685
30 7/19/2021 15:26 56.383
31 7/19/2021 16:26 56.469
32 7/19/2021 17:26 56.512
33 7/19/2021 18:26 56.815
34 7/19/2021 19:26 56.122
35 7/19/2021 20:26 57.074
36 7/19/2021 21:26 56.469
37 7/19/2021 22:26 56.122
38 7/19/2021 23:26 56.772
39 7/20/2021 0:26 57.979
40 7/20/2021 1:26 57.807
41 7/20/2021 2:26 56.469
42 7/20/2021 3:26 56.728
43 7/20/2021 4:26 56.295
44 7/20/2021 5:26 56.035
45 7/20/2021 6:26 56.079
46 7/20/2021 7:26 56.079
47 7/20/2021 8:26 56.165
48 7/20/2021 9:26 56.469
49 7/20/2021 10:26 57.031
50 7/20/2021 11:26 57.979
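
If you don't have the HOBO file handy, the same cleaning steps can be tried on a few lines of text. This is only a sketch: hobo_lines is a made-up, simplified stand-in for the real file, and toy is a hypothetical object name.

```r
# Made-up stand-in for a HOBO export: a title row, a header row, then data
hobo_lines <- c(
  '"Plot Title: HOBO_temp_example"',
  '"#","Date Time, GMT-05:00","Temp, F","Coupler Detached","Coupler Attached"',
  '1,7/18/2021 10:26,58.842,Logged,',
  '2,7/18/2021 11:26,58.712,,',
  '3,7/18/2021 12:26,58.109,,'
)

# skip = 1 drops the title row; [, 1:3] keeps index, timestamp, temperature
toy <- read.csv(text = paste(hobo_lines, collapse = "\n"), skip = 1)[, 1:3]
colnames(toy) <- c("index", "date_time", "tempF")

# Check the column types: date_time is still plain text at this point
str(toy)
```

The str() output is a reminder that read.csv() imports the timestamp as a character column, which is why the conversion to POSIXct is still needed.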

Convert date_time to POSIXct

We can see that the date is formatted as M/D/YYYY, followed by a space, then the time as H:MM, with hours on the 0-23 clock and minutes 00-59. There are no seconds.

temp_data$timestamp <- as.POSIXct(temp_data$date_time, 
                                  format = "%m/%d/%Y %H:%M", 
                                  tz = "America/New_York")
head(temp_data)
##   index       date_time  tempF           timestamp
## 1     1 7/18/2021 10:26 58.842 2021-07-18 10:26:00
## 2     2 7/18/2021 11:26 58.712 2021-07-18 11:26:00
## 3     3 7/18/2021 12:26 58.109 2021-07-18 12:26:00
## 4     4 7/18/2021 13:26 56.208 2021-07-18 13:26:00
## 5     5 7/18/2021 14:26 56.208 2021-07-18 14:26:00
## 6     6 7/18/2021 15:26 55.342 2021-07-18 15:26:00
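
Two checks are worth running after a conversion like this, sketched below on made-up timestamps (date_time, ts, and ts_fixed are hypothetical objects, not part of the training data). The first counts values that failed to parse. The second is based on an assumption: the HOBO header reports GMT-05:00, which is a fixed offset, while "America/New_York" switches to daylight saving time in summer. If your logger really recorded a fixed offset, "Etc/GMT+5" may be the better match.

```r
# Toy timestamps in the same layout as the HOBO file; the last one is
# deliberately malformed (day and month swapped) to show a failed parse
date_time <- c("7/18/2021 10:26", "7/18/2021 13:26", "18/7/2021 10:26")

ts <- as.POSIXct(date_time, format = "%m/%d/%Y %H:%M",
                 tz = "America/New_York")

# Values that do not match the format come back as NA, not as an error
sum(is.na(ts))
## [1] 1
date_time[is.na(ts)]
## [1] "18/7/2021 10:26"

# Assumption: the logger used a fixed GMT-05:00 offset all year.
# In Olson names the sign is inverted, so GMT-05:00 is "Etc/GMT+5".
ts_fixed <- as.POSIXct(date_time, format = "%m/%d/%Y %H:%M",
                       tz = "Etc/GMT+5")

# The same instant shown on the eastern daylight saving clock
format(ts_fixed[1], tz = "America/New_York", usetz = TRUE)
## [1] "2021-07-18 11:26:00 EDT"
```

Whether the one-hour difference matters depends on how the logger was configured, so check the logger settings before choosing a time zone.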

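Extract the YYYYMMDD date, abbreviated month name, time, and hour of the timestamp. Note that %I returns the hour on a 12-hour clock (01-12), so 13:26 shows up as 01:26 below. Use %H instead if you want to keep the 24-hour clock.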

temp_data$date <- format(temp_data$timestamp, "%Y%m%d") 
temp_data$month <- format(temp_data$timestamp, "%b")
temp_data$time <- format(temp_data$timestamp, "%I:%M") 
temp_data$hour <- as.numeric(format(temp_data$timestamp, "%I")) 
head(temp_data)
##   index       date_time  tempF           timestamp     date month  time hour
## 1     1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 20210718   Jul 10:26   10
## 2     2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 20210718   Jul 11:26   11
## 3     3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 20210718   Jul 12:26   12
## 4     4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 20210718   Jul 01:26    1
## 5     5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 20210718   Jul 02:26    2
## 6     6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 20210718   Jul 03:26    3
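
Because %I drops the morning/afternoon distinction, 1:26 in the afternoon and 1:26 at night look the same in the time and hour columns. A minimal sketch of the 24-hour alternative, using two made-up timestamps (ts is a hypothetical object):

```r
# Two illustrative timestamps, one before noon and one after
ts <- as.POSIXct(c("2021-07-18 10:26", "2021-07-18 13:26"),
                 tz = "America/New_York")

# %H keeps the 24-hour clock, so afternoon hours stay unambiguous
format(ts, "%H:%M")
## [1] "10:26" "13:26"

as.numeric(format(ts, "%H"))
## [1] 10 13

# If a 12-hour clock is needed, pair %I with %p (the AM/PM marker).
# The text that %p prints depends on your locale.
format(ts, "%I:%M %p")
```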
CHALLENGE: How would you extract the month as a number ranging from 1-12 in temp_data?
Answer
temp_data$month_num <- as.numeric(format(temp_data$timestamp, "%m"))
head(temp_data)
##   index       date_time  tempF           timestamp     date month  time hour
## 1     1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 20210718   Jul 10:26   10
## 2     2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 20210718   Jul 11:26   11
## 3     3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 20210718   Jul 12:26   12
## 4     4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 20210718   Jul 01:26    1
## 5     5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 20210718   Jul 02:26    2
## 6     6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 20210718   Jul 03:26    3
##   month_num
## 1         7
## 2         7
## 3         7
## 4         7
## 5         7
## 6         7

CHALLENGE: How would you extract the Julian date in temp_data?
Answer
temp_data$julian <- as.numeric(format(temp_data$timestamp, "%j"))
head(temp_data)
##   index       date_time  tempF           timestamp     date month  time hour
## 1     1 7/18/2021 10:26 58.842 2021-07-18 10:26:00 20210718   Jul 10:26   10
## 2     2 7/18/2021 11:26 58.712 2021-07-18 11:26:00 20210718   Jul 11:26   11
## 3     3 7/18/2021 12:26 58.109 2021-07-18 12:26:00 20210718   Jul 12:26   12
## 4     4 7/18/2021 13:26 56.208 2021-07-18 13:26:00 20210718   Jul 01:26    1
## 5     5 7/18/2021 14:26 56.208 2021-07-18 14:26:00 20210718   Jul 02:26    2
## 6     6 7/18/2021 15:26 55.342 2021-07-18 15:26:00 20210718   Jul 03:26    3
##   month_num julian
## 1         7    199
## 2         7    199
## 3         7    199
## 4         7    199
## 5         7    199
## 6         7    199
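
The same pieces can also be pulled from a POSIXlt version of a timestamp, like the list of components printed earlier in this section. The sketch below uses a single made-up timestamp (ts and lt are hypothetical objects). The main thing to watch is that several POSIXlt components start counting at 0, and year counts from 1900.

```r
# One illustrative timestamp in the same layout as the HOBO data
ts <- as.POSIXct("7/18/2021 13:26", format = "%m/%d/%Y %H:%M",
                 tz = "America/New_York")

# POSIXlt stores the date-time as a list of named components
lt <- as.POSIXlt(ts)

lt$mon + 1      # months are stored as 0-11, so add 1
## [1] 7
lt$yday + 1     # day of year is stored as 0-365, so add 1
## [1] 199
lt$hour         # hours are stored as 0-23
## [1] 13
lt$year + 1900  # years are stored as years since 1900
## [1] 2021
```

These match what format() returns with "%m", "%j", "%H", and "%Y", but come back as numbers without needing as.numeric().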


Day 3: Data Viz

Day 3 Goals

Goals for Day 3:
Poor file management illustration Artwork by @allison_horst

Data Visualization:
  • Data visualization best practices
  • Understanding the building blocks of the ggplot2 plotting package
  • Custom colors and shapes by grouping variables
  • Customizing axes
  • Combining plots with facets and patchwork
  • Working with legends
  • Color palettes
Coding best practices:
  • Commenting code
  • Putting packages, datasets, and parameters on top of script
  • Using consistent coding style
  • Logical object naming
  • Using projects instead of standalone scripts where possible
  • How to choose R packages


Feedback: Please leave feedback in the training feedback form. You can submit feedback multiple times and don't need to answer every question. Responses are anonymous.


Best Practices

The Power of (Good) Data Visualizations

Data are useful only when used. Data are used only when understood.

Consider three example data visualizations that demonstrate how some approaches are more effective than others in conveying patterns.

Example 1. Plots convey messages faster than tables

Most people can understand this figure of daily Covid cases faster than they can understand the table of daily Covid cases.

Figure 1. Average daily Covid cases per 100k people, by region (Sources: State and local health agencies [cases]; Census Bureau [population data])

Table 1. Daily Covid cases and population numbers by state (only showing first 7 records)
state timestamp cases total_population
AK 2022-01-25T04:00:00Z 203110 731545
AL 2022-01-25T04:00:00Z 1153149 4903185
AR 2022-01-25T04:00:00Z 738652 3017804
AZ 2022-01-25T04:00:00Z 1767303 7278717
CA 2022-01-25T04:00:00Z 7862003 39512223
CO 2022-01-25T04:00:00Z 1207991 5758736
CT 2022-01-25T04:00:00Z 683731 3565287


Example 2. Plots reveal patterns and highlight extremes

This table shows average monthly revenue for Acme products.

Table 2. Average monthly revenue (in $1000's) from Acme product sales, 1950 - 2020
category         product           Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
party supplies   balloons          892  1557  1320   972  1309  1174  1153  1138  1275  1178  1325  1422
party supplies   confetti         1271  1311   829  1020  1233  1061  1088  1395  1376  1152  1568  1412
party supplies   party hats       1338  1497  1445   956  1372  1482  1048   877  1404  1030  1458  1547
party supplies   wrapping paper   1396  1026   932   891  1364   896   900  1221  1146   967  1394  1507
school supplies  backpacks        1802  1773  1611  1723  1799  1730  1813  1676  1748  1652  1819  1759
school supplies  notebooks        1153  1471  1541  1371  1592  1514  1725  1702  1457  1604  1729  1279
school supplies  pencils          1679  1304  1054  1259  1425  1608  1972  1811  1610  1004  1417  1283
school supplies  staplers         1074  1708  1439  1154  1551  1099  1793  1601  1647  1666  1389  1511

Use the table above to answer these questions:

  1. What product and month had the highest average monthly revenue?
  2. What product and month had the lowest average monthly revenue?

Now let's display the same table as a heat map, with larger numbers represented by darker color cells. How quickly can we answer those same two questions? What patterns can we see in the heat map that were not obvious in the table above?

Figure 2. Heat map of average monthly revenue (in $1000's) from Acme product sales, 1950 - 2020


Example 3. Plots provide insights that statistics obscure

In 1973, Francis Anscombe published "Graphs in statistical analysis", a paper describing four bivariate datasets with identical means, variances, and correlations.

Table 3. Anscombe's Quartet - Four bivariate datasets with identical summary statistics
x1 y1 x2 y2 x3 y3 x4 y4
10 8.04 10 9.14 10 7.46 8 6.58
8 6.95 8 8.14 8 6.77 8 5.76
13 7.58 13 8.74 13 12.74 8 7.71
9 8.81 9 8.77 9 7.11 8 8.84
11 8.33 11 9.26 11 7.81 8 8.47
14 9.96 14 8.10 14 8.84 8 7.04
6 7.24 6 6.13 6 6.08 8 5.25
4 4.26 4 3.10 4 5.39 19 12.50
12 10.84 12 9.13 12 8.15 8 5.56
7 4.82 7 7.26 7 6.42 8 7.91
5 5.68 5 4.74 5 5.73 8 6.89
Table 4. Means and variances are identical in the four datasets. The correlation between x and y (r = 0.82) is also identical across the datasets.
x1 y1 x2 y2 x3 y3 x4 y4
mean 9 7.50 9 7.50 9 7.50 9 7.50
var 11 4.13 11 4.13 11 4.12 11 4.12
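
R ships with these data as the built-in anscombe data frame, so the summary statistics in Table 4 can be checked directly. The sketch below is one way to do that in base R; the object name quartet_stats is only illustrative.

```r
# anscombe is a built-in dataset with columns x1-x4 and y1-y4
quartet_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y),
    r = cor(x, y))
})
colnames(quartet_stats) <- paste0("set", 1:4)

# Rounded to 2 decimals, the four columns are nearly indistinguishable
round(quartet_stats, 2)
```

Each column of the result is one of the four datasets, and each row is one statistic, so reading across a row shows how closely the datasets match.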


Anscombe data as plots: Despite their identical statistics, when we plot the data we see the four datasets are actually very different. Anscombe's point was that to understand the data, we must plot the data.



Guidelines for Effective Data Visualizations

Anscombe used clever thinking and simple plots to demonstrate the importance of data visualizations. But it's not enough to just plot the data. To have impact, a plot must convey a clear message. How can we do this?
  • Have a purpose. Every data visualization should tell a story.
  • Consider your audience. Avoid scientific names, acronyms, or jargon unless your audience is well-versed in that language. Use color-blind friendly colors.
  • Use an appropriate visualization. For example:
    • Line graphs work well for showing changes in continuous data over time
    • Bar charts compare counts or proportions in categorical data (pie charts get a bad rap in the data viz world, but can be useful in certain situations)
    • For statistics (e.g., means) with confidence intervals, point plots with error bars are preferred over bar charts (nice explanation here)
    • Scatterplots are useful for showing the relationship (correlation) between two continuous variables.
    • Matrix heat maps can efficiently compare the magnitude of numbers when we have lots of data structured in table format, especially when colors have a clear connection to the numbers (e.g., scorecard data)
    • Box plots, violin plots, and histograms show distributions and outliers for continuous data. Dot plots are a useful alternative when sample sizes are small.
  • Keep it simple:
    • Every plot element and aesthetic should have a purpose.
    • Avoid 3D charts unless you have good reason otherwise.
    • Don't try to cram everything into one plot (e.g., juxtapose two plots instead of adding a secondary y-axis. Nice explanation here).
  • Use appropriate font size for axis labels. The audience should not need a magnifying glass to read the axis labels on your figure. This is particularly true for PowerPoint slides. Either make the font bigger in R, or manually fix it in PowerPoint.
  • Figures should not require 5 minutes of explaining before people can understand them. If that's the case for your figure, think about how you can simplify or more clearly convey information.
  • Use informative text and arrows wisely. Clear, meaningful titles, subtitles, axes titles/labels, and annotations help convey the message of a plot. Use lines and arrows (sparingly but effectively) to emphasize important thresholds, data points or other plot features. Fonts should be large enough with good contrast (against the background) and sufficient white space to be easily readable.


Intro to ggplot2


The ggplot2 package is the most popular R package for plotting. It takes a little effort to learn how the pieces of a ggplot object fit together. However, once you get the hang of it, you can create and customize a large variety of attractive plots with just a few lines of R code. The package is called ggplot2 because there was originally a ggplot package. The developer, Hadley Wickham, didn't want to break the original package while improving it, so he created ggplot2.

The ggplot2 online book and cheatsheets can be very helpful while you are learning to use the ggplot2 package.

The ggplot2 package was developed using the grammar of graphics as the underlying philosophy, which basically breaks a plot up into individual building blocks related to aesthetics (e.g., color, size, shape), geometries (e.g. points, lines, boxes), and themes (e.g. axis label font size, legend placement, etc.).

Important concepts with ggplot2:
  • ggplot(data, aes()): Every ggplot object starts with this line, which tells R the data you're plotting and, in the aes() argument, which variables go where.
  • data: Every plot requires data. The first argument of every ggplot() call is the data. This also means you can pipe data into a ggplot object.
  • aes: Short for aesthetics. This is where you tell ggplot what your x and y variables are. If you want aesthetics, like color, fill, or size to vary by the data (e.g., color code a figure by park, use a dashed vs. solid line to distinguish between significant/non-significant trend, etc.), those variables are specified within the aes() argument either at the ggplot() level, or within the specific geom.
  • geom: Short for geometries, geoms represent what you see in the plot, such as the points in a scatter plot, the lines in a trend plot, or the boxes in a boxplot.
  • scale: Scales go hand in hand with aes(). If you specify aes(color = park), then scale is where you can specify a custom color for each park instead of ggplot's default color scheme. The scale is where you can set the labels of groups in a legend (if different than how the data are labeled). You can also customize axis ranges, breaks, and labels with different scales.
  • theme: This is where you can change the format of the figure, such as removing the gridlines in the default ggplot format, making axis labels larger, changing the position of the legend, etc.
  • facet: Faceting the data allows you to graph multiple plots based on a grouping variable (e.g., site, species, year, etc.) in the same ggplot object.
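
As a rough sketch of how these pieces fit together, the code below touches each concept once. It uses R's built-in iris dataset, which is an assumption for illustration and not part of the training data; the colors and the object name p_demo are arbitrary.

```r
library(ggplot2)

p_demo <- ggplot(data = iris,                            # data
                 aes(x = Sepal.Length, y = Sepal.Width,  # aes: x and y variables
                     color = Species)) +                 # aes: color varies by group
  geom_point() +                                         # geom: what gets drawn
  scale_color_manual(values = c("setosa" = "#1b9e77",    # scale: custom colors
                                "versicolor" = "#d95f02",
                                "virginica" = "#7570b3"),
                     name = "Iris species") +            # scale: legend title
  theme(legend.position = "bottom") +                    # theme: figure formatting
  facet_wrap(~Species)                                   # facet: one panel per group

p_demo
```

Removing any one line after ggplot() and rerunning is a quick way to see what that component contributes.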


Building a ggplot

Building a ggplot object

We're going to use site-level photoplot % cover data from Ship Harbor to create the plot below, and will work through the code one piece at a time.

Mean percent cover by community type (panels) in Ship Harbor. Error bars represent +/- 1 SE.



Import photoplot cover data from Ship Harbor

# load package
library(ggplot2) 
# import data
pcov <- read.csv("./data/SHIHAR_photoplot_cover.csv") 
# check out the data
head(pcov) 

Create the ggplot template of average cover over time.

I'm going to assign this to an object named p, so we can build it one layer at a time. We're going to have unique colors and shapes for each level of CoverCode in our data, so we need to indicate that in the aes() along with our x and y variables. If we don't include color, fill, and shape in the aes(), the points would all be the same color (default = black) and shape (default is filled circle).

p <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                             color = CoverCode, 
                             fill = CoverCode,
                             shape = CoverCode))

p


Add point and error bars

Add point and errorbar geometry

Geometries are drawn in the order they're added. I prefer the look of the points drawn on top of the error bars, so geom_point() comes after geom_errorbar().

p2 <- p + 
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover)) +
  geom_point()

p2

Specify colors and shapes

The default colors and symbols in ggplot aren't great. We're going to start by specifying our own colors and shapes manually. Then we'll use color palettes from different packages. Specifying shapes in R requires knowing the shape's symbol code. To view that, run ?points, or search "pch in R plot" and you'll get the info below. Note that 0-14 are just lines with no fill. To change their color, use the color aesthetic. Symbols 15-20 are solid, but also use color to change their aesthetic. Symbols 21-25 have both a color (outline) and fill (inside) aesthetic.

R pch symbols

Figure of symbol codes in R.
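
If you'd rather generate the symbol chart yourself, the sketch below draws all 26 codes with ggplot2. The data frame pch_df and the 9-column grid layout are made up for illustration. Every symbol gets the same color and fill, so you can see that fill only shows up in symbols 21-25.

```r
library(ggplot2)

# One row per symbol code, laid out on a grid of 9 columns
pch_df <- data.frame(code = 0:25)
pch_df$col <- pch_df$code %% 9
pch_df$row <- pch_df$code %/% 9

p_pch <- ggplot(pch_df, aes(x = col, y = -row)) +
  # color sets lines/outlines for every symbol; fill only shows for 21-25
  geom_point(aes(shape = code), size = 5, color = "dimgrey", fill = "#bcb02f") +
  geom_text(aes(label = code), nudge_y = -0.35, size = 3) +
  scale_shape_identity() + # use the numbers in code directly as shape codes
  theme_void()

p_pch
```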


p3 <- p2 + 
  scale_fill_manual(values = c("ASCNOD" = "#bcb02f", "BARSPP" = "#CAC7B6", 
                               "NONCOR" = "#420816", "FUCSPP" = "#646519",
                               "MUSSPP" = "#170461", "REDGRP" = "#9e224d")) +
  scale_color_manual(values = c("ASCNOD" = "#bcb02f", "BARSPP" = "#CAC7B6", 
                                "NONCOR" = "#420816", "FUCSPP" = "#646519",
                                "MUSSPP" = "#170461", "REDGRP" = "#9e224d")) +
  scale_shape_manual(values = c("ASCNOD" = 23, "BARSPP" = 24, "NONCOR" = 23,
                                "FUCSPP" = 25, "MUSSPP" = 23, "REDGRP" = 25))
p3
The code above is redundant and tedious to write. Changing a color means changing it in two places. Imagine you're making one of these plots for every rocky intertidal site in Acadia. It quickly becomes cumbersome. A more efficient way to code this is below.

Assign colors and shapes to variables

Adding the name to the scales renames the legend title. But to keep the shapes and colors in the same legend, you have to name all of them the same name.

cols <- c("ASCNOD" = "#bcb02f", "BARSPP" = "#CAC7B6", 
          "NONCOR" = "#420816", "FUCSPP" = "#646519",
          "MUSSPP" = "#170461", "REDGRP" = "#9e224d")

shps <- c("ASCNOD" = 23, "BARSPP" = 24, "NONCOR" = 23,
          "FUCSPP" = 25, "MUSSPP" = 23, "REDGRP" = 25)

p3 <- p2 + 
  geom_point(color = 'dimgrey', size = 2.5) + # setting point outline to dark grey 
  scale_fill_manual(values = cols, name = "Species Group") +
  scale_color_manual(values = cols, name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group")
  
p3


Improve formatting

Improve axis ticks and labels

# Determine year range (so not hard coded/easily updated in future)
xrange <- range(pcov$Year)

p4 <- p3 + 
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") 
  
p4
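
To see what the limits and breaks expressions evaluate to, here they are with a made-up year range. In the plot the range comes from pcov$Year; the years below are only for illustration.

```r
# Hypothetical range standing in for range(pcov$Year)
xrange_demo <- c(2013, 2021)

# Pad the axis by one year on each side
limits_demo <- c(xrange_demo[1] - 1, xrange_demo[2] + 1)
limits_demo

# A tick every 2 years across the padded range
breaks_demo <- seq(xrange_demo[1] - 1, xrange_demo[2] + 1, 2)
breaks_demo
```

Because the limits and breaks are computed from the data, adding another year of data updates the axis without editing the plot code.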

Improve theme components

p5 <- p4 +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), # angle x text
          panel.grid.major = element_blank(), # turns off major grids
          panel.grid.minor = element_blank(), # turns off minor grids
          panel.background = element_rect(fill = 'white', color = 'dimgrey'),# makes background white 
          legend.key = element_blank()) # removes square fill around symbols in legend

p5

Facet on CommunityType

p6 <- p5 + facet_wrap(~CommunityType)

p6  


Full code and save figure
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), # turns off major grids
        panel.grid.minor = element_blank(), # turns off minor grids
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)

Save plot to disk

Copying and pasting figures from R into Word documents or PowerPoint slides usually results in a poor-resolution figure. The better approach is to save figures to disk using ggsave(). By default, the function saves the most recent figure generated in the Plots tab in the bottom right pane. You can also specify the ggplot object to save via the plot argument, which comes after the file name. If you want a jpg or png instead of svg, just use that as the file extension. An svg is a vector image, which for figures with lines is usually the best quality because it won't become pixelated when zoomed. The caveats are that not all software supports svgs, and that ggsave() needs the svglite package installed to write them.

ggsave("SHIHAR_photoplot_cover.svg", height = 8, width = 7)
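
As a sketch of naming the plot object explicitly, the example below saves a small stand-in plot as a png. The mtcars data, the object names, and the temporary output folder are assumptions for illustration, not part of the training materials.

```r
library(ggplot2)

# Small stand-in plot built from the built-in mtcars data
p_save <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary folder so the example doesn't clutter your project
out_file <- file.path(tempdir(), "example_plot.png")

# filename comes first; plot names the object to save
ggsave(filename = out_file, plot = p_save,
       height = 8, width = 7, units = "in", dpi = 300)

# Confirm the file was written
file.exists(out_file)
```

Naming the plot object is safer in scripts, since the most recent plot may not be the one you intended to save.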


Test your ggplot skills
CHALLENGE: Using the full plot code above, make the point size 1.5.
Answer
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 1.5) + # changed this line
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)


CHALLENGE: Using the full plot code above, make the error bar lines thicker.
Answer
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 1.2) + # changed this line
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") + 
  scale_shape_manual(values = shps, name = "Species Group") + 
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)


CHALLENGE: Using the full plot code above, change the x axis label to "Year".
Answer
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg. Percent Cover") + # changed this line
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)



Other geometries

Lines and smoothers

Add line geometry to figure

Note that for lines to plot properly in ggplot, you have to assign a grouping variable in the aes(). The fill, color, and shape aesthetics in the beginning already did that for us, but here's how it would look within geom_line() if you needed to do that.

p6 + geom_line(linewidth = 0.8, aes(group = CoverCode))

Add smoother to figure

geom_smooth() plots a line assuming the y ~ x formula (unless you specify a different formula). By default the method is a LOESS smoother for datasets with fewer than 1,000 observations, but you can specify a range of methods, including linear regression by adding method = 'lm' to geom_smooth().

Note that I turned off the standard error ribbon that plots by default using se = FALSE. It's too busy for this plot. I also don't use the SE unless I've fit an actual model and checked the diagnostics. The statistics under the hood of geom_smooth() are also pretty black boxy, and I don't always know if I can trust its calculation of SE.

p6 + geom_smooth(se = FALSE, span = 0.75)

CHALLENGE: How would you change the smoother in the code above from LOESS to linear model and make the line dashed? Hint: method = 'lm'.
Answer
p6 + geom_smooth(method = 'lm', se = FALSE, linetype = 'dashed')



Add horizontal dashed line at 50% cover

p6 + geom_hline(aes(yintercept = 50), linetype = "dashed") 

Add horizontal dashed line called 50% line and make it show in the legend

p6 + geom_hline(aes(yintercept = 50, linetype = "50% line")) +
     scale_linetype_manual(values = c("50% line" = "dashed"), 
                           name = "Threshold")


Theme changes

Move legend to the bottom

p6 + theme(legend.position = 'bottom')

CHALLENGE: How would you make the legend title larger and bold, and legend text larger?
Answer
p6 + theme(legend.title = element_text(size = 12, face = 'bold'),
           legend.text = element_text(size = 11))



Change facet strips fill, color, and font size

p6 + theme(strip.background = element_rect(fill = "#F5F0DC", color = "black"),
           strip.text = element_text(size = 10))


Other geometries

Stacked bars instead of points

ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_bar(stat = 'identity', position = 'fill', color = 'dimgrey') +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Proportion of Avg. Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), # turns off major grids
        panel.grid.minor = element_blank(), # turns off minor grids
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)

Bars with error bars instead of points (note I filtered on Barnacle CommunityType)

ggplot(data = pcov |> dplyr::filter(CommunityType == "Barnacle"), # note filter (from dplyr)
       aes(x = Year, y = avg_cover, 
           color = CoverCode, group = CoverCode,
           fill = CoverCode, shape = CoverCode)) +
  geom_bar(stat = 'identity', position = 'dodge', color = 'dimgrey') + # new line
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), linewidth = 0.6) +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CoverCode) # Different facet


Multipanels

Multi-panel plots via patchwork

Faceting is helpful when your observations are all within the same column. But say you have data in multiple columns and want to arrange those plots into a grid. Faceting won't help because the data to plot are in different columns. There are multiple packages that make it easy to arrange multiple plots into a grid to look similar to faceted plots. Packages include grid (and gridExtra), cowplot, ggpubr, and patchwork. We're going to use patchwork, a relative newcomer, and one of the easiest I've found to code and customize. Here we're going to plot pH, temperature, DO, and conductance for Jordan Pond in Acadia NP and arrange them using patchwork.

The patchwork package has a lot of options to customize plot layouts. See the patchwork package website for more information.

Load patchwork and read in water chemistry data

# load packages
library(ggplot2) 
library(patchwork) # multipanel plots

# load data
chem <- read.csv("./data/ACAD_Jordan_Pond_water_chem.csv")

# make date field a date
chem$date <- as.Date(chem$date, format = "%Y-%m-%d")

Create a ggplot object for each parameter

# pH plot
p_pH <-
  ggplot(chem, aes(x = date, y = pH)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "pH", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

# temp plot
p_temp <-
  ggplot(chem, aes(x = date, y = Temp_F)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "Temp (F)", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
 
# Diss. Oxygen plot
p_do <-
  ggplot(chem, aes(x = date, y = DO_mgL)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "DO (mg/L)", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

# Conductance plot
p_cond <-
  ggplot(chem, aes(x = date, y = SpCond_uScm)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "Spec. Cond. (uS/cm)", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
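
The four blocks above differ only in the y column and the axis label. One way to cut the repetition, sketched below, is a small helper function. The name plot_param and the toy data frame are hypothetical; with the real data you would pass chem and one of its column names instead. The .data[[ ]] pronoun lets aes() take a column name stored as a string.

```r
library(ggplot2)

# Hypothetical helper: same layers as the blocks above, with the
# y column (as a string) and y-axis label passed in as arguments
plot_param <- function(df, ycol, ylab) {
  ggplot(df, aes(x = date, y = .data[[ycol]])) +
    theme_bw() +
    geom_smooth(se = FALSE, span = 0.5) +
    geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
    labs(y = ylab, x = "Year") +
    scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
}

# Made-up stand-in for chem, only so the example runs on its own
set.seed(1)
toy_chem <- data.frame(
  date = seq(as.Date("2010-06-01"), by = "month", length.out = 60),
  pH = rnorm(60, mean = 6.8, sd = 0.2)
)

p_toy <- plot_param(toy_chem, ycol = "pH", ylab = "pH")
p_toy
```

With the real data, a call like plot_param(chem, "Temp_F", "Temp (F)") would stand in for the temperature block, and a formatting change would only need to be made in one place.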


Arrange plots using patchwork

This is almost too easy to be true, but it really is this easy with patchwork. The patchwork package includes a bunch of options to customize sizes, add annotations, and share axes across plots.

p_pH + p_temp + p_do + p_cond

Arrange plots using patchwork in column of 4 and share x axis.

You can also collect the legend using a similar approach to collecting the axes.

library(patchwork)
p_pH / p_temp / p_do / p_cond + plot_layout(axes = "collect_x")
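
The water chemistry plots have no legends, so here is a sketch of collecting legends using two stand-in plots built from the built-in iris data (an assumption for illustration). plot_layout(guides = "collect") merges matching legends into one, and the & operator applies a theme to every plot in the patchwork.

```r
library(ggplot2)
library(patchwork)

# Two plots that share the same color legend
p_sepal <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()
p_petal <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point()

# Stack the plots, keep a single shared legend, and move it to the bottom
p_combined <- p_sepal / p_petal +
  plot_layout(guides = "collect") &
  theme(legend.position = "bottom")

p_combined
```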


Palettes

External ggplot Palettes

A number of R packages have color palettes available that are colorblind friendly. The two most commonly used are RColorBrewer and viridis. Between them, these packages include palettes for three main types of data:
  • Sequential: for continuous variables that grade from low to high or vice versa. These tend to be one hue that increases in saturation as values increase.
  • Diverging: for data where low values have different colors than high values (e.g. starts red, then grades to blue).
  • Qualitative: data for categorical variables where all colors are a similar saturation but different hues.
RColorBrewer palette

The palettes in RColorBrewer can be viewed by running the code below. The first group shows the sequential palettes (e.g. YlOrRd - Yellow Orange Red). The second group shows the qualitative colors. The last group shows the diverging palettes. The main drawback of these palettes is they are limited by the number of levels in your data. So, if you specify Set2 to color code different levels of a factor, there are only 8 colors available to you. If your factor has more than 8 levels (e.g., 9 sites, 10 parks, etc.), then the levels beyond 8 won't get plotted and you'll get a warning in the console similar to what we saw for ggplot's default number of symbols.

View RColorBrewer palettes

display.brewer.all(colorblindFriendly = TRUE)
RColorBrewer palettes
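
To check how many colors a palette offers before using it, RColorBrewer includes a lookup table called brewer.pal.info. The short sketch below pulls the rows for the three palettes used in this section; the object names are only illustrative.

```r
library(RColorBrewer)

# brewer.pal.info lists every palette with its maximum number of colors,
# its type (seq, div, or qual), and whether it is colorblind friendly
pal_info <- brewer.pal.info[c("Set2", "Dark2", "RdYlBu"), ]
pal_info

# Maximum number of colors available in Set2
set2_max <- brewer.pal.info["Set2", "maxcolors"]
set2_max
```

Comparing maxcolors to the number of levels in your factor tells you ahead of time whether a palette will run out of colors.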


Going back to the photoplot cover plots we made before, we'll use RColorBrewer to color code each CoverCode instead of doing this manually. We'll build a base plot in the next chunk, then swap in different color palettes in the plots that follow.


Create basic plot

p_pal <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)


Use Set2 palette for species groups

p_pal +  scale_color_brewer(name = "Species Group", palette = "Set2", 
                            aesthetics = c("fill", "color")) 

Note how I used the aesthetics argument in scale_color_brewer() to set fill and color at the same time. We could have done this in the code above too.


Use Dark2 palette for species groups

p_pal +  scale_color_brewer(name = "Species Group", palette = "Dark2", 
                            aesthetics = c("fill", "color")) 


CHALLENGE: How would you specify the 'RdYlBu' palette instead of the ones used above?
Hint: Start with p_pal to save time coding.
Answer
p_pal +  scale_color_brewer(name = "Species Group", palette = "RdYlBu", 
                            aesthetics = c("fill", "color"))  


viridis palettes

The viridis package comes with 8 palettes. The benefit of viridis is that the number of levels is not capped the way RColorBrewer palettes are (e.g., 8 for Set2). The palette options are shown below for 12 levels.

viridis palettes


View viridis palettes with hexcodes

You can view the hexcodes of the different palettes by running the code below. Just change viridis() to one of the other palette names to get the hexcodes for those levels.

# viridis 
scales::show_col(viridis(12), cex_label = 0.45, ncol = 6)
viridis palette

Use viridis default palette on CoverCode

The scale_color_viridis_d() selects the viridis palette option (purple, green, yellow) for discrete values (i.e. categories). For a continuous scale (e.g. temperature), you would specify scale_color_viridis_c().

p_pal + scale_color_viridis_d(name = "Species Group", aesthetics = c("fill", "color"))  #default viridis 

Use turbo palette on CoverCode

Adding option = 'turbo' to scale_color_viridis_d() selects the turbo palette instead of the default viridis option. As before, for a continuous scale (e.g. temperature), you would use scale_color_viridis_c().

p_pal + scale_color_viridis_d(name = "Species Group", aesthetics = c("fill", "color"), option = 'turbo') 

Continuous palette with heatmaps

Heatmaps via geom_tile() are a place where viridis palettes are especially helpful for producing useful sequential or diverging color palettes. We'll use the Jordan Pond temperature data to plot a heatmap by month and year. Heatmaps are a bit different than other plots we've seen, as the x and y values create a discrete grid, and the color in the cell represents the value for that level of x and y. That means we have to change how the x, y and color aesthetics are specified.

Basic heatmap code

Note the use of base R's month.abb to set the labels on the x-axis. The month.abb is a vector of the 12 months abbreviated as 3 letters. By setting 5:10, I'm taking the months May - Oct.

p_heat <- 
ggplot(chem, aes(x = mon, y = year, color = Temp_F, fill = Temp_F)) + 
  theme_bw() +
  geom_tile() + 
  labs(y = "Year", x = "Month") +
  scale_x_continuous(breaks = c(5, 6, 7, 8, 9, 10),
                     limits = c(4, 11), 
                     labels = month.abb[5:10]) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
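
The heatmap code above assumes the data already have numeric mon and year columns. If your copy of the data only has a date column, the sketch below shows one base R way to derive them. The toy data frame and its dates are made up for illustration; with the real data you would apply the same two lines to chem.

```r
# Made-up dates standing in for chem$date
toy_dates <- data.frame(date = as.Date(c("2020-05-15", "2021-10-02")))

# format() pulls out the year and month as text; as.numeric() converts them
toy_dates$year <- as.numeric(format(toy_dates$date, "%Y"))
toy_dates$mon  <- as.numeric(format(toy_dates$date, "%m"))

toy_dates

# month.abb turns the month numbers into 3-letter labels
month.abb[toy_dates$mon]
```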

Plot heatmap with viridis continuous palette

p_heat + scale_color_viridis_c(name = "Temp. (F)", aesthetics = c("fill", "color")) 

Plot heatmap with plasma continuous palette, reverse scale

p_heat + scale_color_viridis_c(name = "Temp. (F)", aesthetics = c("fill", "color"), 
                               option = "plasma", direction = -1) 


Create your own color ramp

You can also create your own color ramp via scale_color_gradient(), which creates a 2-color gradient; scale_color_gradient2(), which creates a diverging color gradient (low-mid-high); and scale_color_gradientn(), which creates an n-color gradient.

Create 2-color gradient

p_heat + scale_color_gradient(low = "#FCFC9A", high = "#F54927", 
                              aesthetics = c("fill", 'color'), 
                              name = "Temp. (F)") 

Create diverging gradient

For the divergent palette to be meaningful, you usually need to set the midpoint if it's not 0.

p_heat + scale_color_gradient2(low = "navy", mid = "#FCFC9A", high = "#F54927", 
                               aesthetics = c("fill", 'color'),
                               midpoint = mean(chem$Temp_F), 
                               name = "Temp. (F)") 

Create diverging gradient with multiple colors

Note the change in the legend by using guide = 'legend'. Default is guide = 'colorbar'. I also customized the breaks into 5-degree bins using the breaks argument and seq().

p_heat + scale_color_gradientn(colors = c("#805A91", "#406AC2", "#FBFFAD", "#FFA34A", "#AB1F1F"), 
                               aesthetics = c("fill", 'color'),
                               guide = "legend",
                               breaks = c(seq(40, 85, 5)), 
                               name = "Temp. (F)") 

CHALLENGE: Create your own palette with at least three colors.
Hint: Start with p_heat to save time coding.
Answer
p_heat + scale_color_gradient2(low = "#3E693D", mid = "#FDFFC7", high = "#7A6646", 
                               aesthetics = c("fill", 'color'),
                               midpoint = mean(chem$Temp_F), 
                               name = "Temp. (F)") 



CTD data

Example code for plotting CTD data in ggplot.

Load libraries and import data

library(tidyverse)
library(readxl)

ctd_mma <- read_xlsx("./data/PR_PF_2903444 (2).xlsx") |> data.frame()

Graph data with reversed Y axis and color coded by station

ggplot(ctd_mma, aes(x = `TEMP..degree_Celsius.`, 
                    y = `PRES..decibar.`,
                    group = Station, 
                    color = Station)) +
  geom_line() + 
  theme_bw() +
  labs(x = "Temp. (C)", y = "Pressure (dbars)") +
  scale_color_gradientn(colors = c("#805A91", "#406AC2", "#FBFFAD", "#FFA34A", "#AB1F1F"), 
                        aesthetics = c('color'),
                        guide = "legend", # makes legend distinct, rather than color band
                        breaks = 1:15, # number of stations
                        name = "Station ID") +
  scale_y_reverse() + # flip y axis
  scale_x_continuous(limits = c(0, 30),
                     breaks = seq(0, 30, 5),
                     position = 'top') + # plot x-axis on top
  theme(legend.position = 'bottom')


Statistical Analysis

R was designed to facilitate statistical analysis and data visualization, making tests like linear regression and Analysis of Variance relatively straightforward. Using the motile invertebrate count data by community type, I'll demonstrate how to set up the models, check diagnostics (at least how I do it), and summarize output. For the linear regression, we'll look at whether common periwinkle (SpeciesCode LITLIT) counts have changed over time in the Red Algae community plots (we'll ignore the lack of independence between years). For the ANOVA, we'll test whether common periwinkle counts differ between the community types.

Import and prep data

Import data and load packages

library(dplyr)
library(ggplot2)
# install.packages('car') # uncomment and run if you don't have this package installed
library(car) # for levene's test

# import data
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
head(motinv)
View R output
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
## 6 Littorina littorea Common periwinkle      LITLIT      0         2         No

Prep data for analysis

# prep data for analysis
motinv_final <- motinv |> 
  mutate(Damage = as.numeric(replace(Damage, Damage == "PM", NA)), # Fix Damage PM
         SitePlot = paste(SiteCode, PlotName, sep = "-"), # create new SitePlot column
         Date = as.Date(StartDate, format = "%m/%d/%Y"), # create new Date column
         No.Damage_fix = replace(No.Damage, No.Damage == 1960, 196), # fix typo in No.Damage
         total_count = Damage + No.Damage, # original counts, typo included
         total_count_fix = Damage + No.Damage_fix, # counts with the typo fixed
         year_st = Year - 2012) |> # set start year to 1 instead of 2013 for better interpretation 
  filter(QAQC == FALSE) |> # drop QAQC visits
  arrange(SitePlot, Year, ScientificName) # optional sorting the data

# summarize counts, so 1 count per year, species and community type
motinv_sum <- motinv_final |> summarize(mean_count = mean(total_count, na.rm = TRUE),
                                        mean_count_fix = mean(total_count_fix, na.rm = TRUE),
                                        .by = c(SiteCode, year_st, CommunityType, SpeciesCode))

# prep for linear regression
motinv_reg <- motinv_sum |> filter(SpeciesCode == "LITLIT" & CommunityType == "Red Algae") 
head(motinv_reg)
View R output
##   SiteCode year_st CommunityType SpeciesCode mean_count mean_count_fix
## 1   BASHAR       1     Red Algae      LITLIT   1.333333       1.333333
## 2   BASHAR       2     Red Algae      LITLIT  10.000000      10.000000
## 3   BASHAR       3     Red Algae      LITLIT  57.800000      57.800000
## 4   BASHAR       4     Red Algae      LITLIT  23.000000      23.000000
## 5   BASHAR       5     Red Algae      LITLIT  97.800000      97.800000
## 6   BASHAR       6     Red Algae      LITLIT 106.200000     106.200000

# prep for analysis of variance
motinv_aov <- motinv_sum |> filter(SpeciesCode == "LITLIT") |> 
  mutate(ComCode = toupper(substr(CommunityType, 1, 3))) # create community code for easier plotting
head(motinv_aov)
View R output
##   SiteCode year_st CommunityType SpeciesCode mean_count mean_count_fix ComCode
## 1   BASHAR       1   Ascophyllum      LITLIT       14.0           14.0     ASC
## 2   BASHAR       2   Ascophyllum      LITLIT       20.8           20.8     ASC
## 3   BASHAR       4   Ascophyllum      LITLIT       71.4           71.4     ASC
## 4   BASHAR       5   Ascophyllum      LITLIT       72.6           72.6     ASC
## 5   BASHAR       6   Ascophyllum      LITLIT       65.0           65.0     ASC
## 6   BASHAR       7   Ascophyllum      LITLIT       80.4           80.4     ASC


Linear Regression

Model formula

lm_mod <- lm(mean_count ~ year_st, data = motinv_reg)

Model diagnostic plots

par(mfrow = c(2,2)) # makes diagnostic plots 2 x 2 grid 
plot(lm_mod)

par(mfrow = c(1,1)) # resets to 1 plot 
hist(resid(lm_mod))

Check outliers

# detect outliers as > 2 SD of residuals
outliers <- which(abs(resid(lm_mod)) > 2 * sd(resid(lm_mod)))

# Highlight the outliers in a scatterplot
plot(mean_count ~ year_st, data = motinv_reg)
points(motinv_reg$year_st[outliers], motinv_reg$mean_count[outliers], col = "red", pch = 19)

The residual and QQ plots clearly show an outlier. This is the error in No.Damage that has 1960 instead of 196 for a count. I'll show the diagnostics for the fixed data now.
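
To get a feel for how the 2 SD rule behaves, here is a small sketch on made-up data where one count was entered with an extra digit. The data frame toy and its values are hypothetical and only meant to mimic the kind of typo found above.

```r
# Hypothetical yearly counts with one value (400) entered with an extra digit
toy <- data.frame(year_st = 1:11,
                  count = c(5, 18, 30, 41, 55, 66, 400, 90, 101, 115, 126))

toy_mod <- lm(count ~ year_st, data = toy)

# Flag residuals more than 2 SD from zero, as in the outlier check above
toy_out <- which(abs(resid(toy_mod)) > 2 * sd(resid(toy_mod)))
toy[toy_out, ] # the row with the suspicious count
```

Only the row with the typo is flagged here. With real data the rule is a screening tool, so a flagged point is a reason to check the datasheet, not an automatic reason to drop the value.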

Rerun model with fixed counts

lm_mod_fix <- lm(mean_count_fix ~ year_st, data = motinv_reg)

Model diagnostic plots, take 2

par(mfrow = c(2,2)) # makes diagnostic plots 2 x 2 grid 
plot(lm_mod_fix)

par(mfrow = c(1,1)) # resets to 1 plot 
hist(resid(lm_mod_fix))

Summarize output

summary(lm_mod_fix) 
## 
## Call:
## lm(formula = mean_count_fix ~ year_st, data = motinv_reg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -86.400 -24.845   8.066  21.626  58.397 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   11.813     27.538   0.429  0.67802   
## year_st       12.466      3.773   3.304  0.00917 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.73 on 9 degrees of freedom
## Multiple R-squared:  0.5481, Adjusted R-squared:  0.4979 
## F-statistic: 10.92 on 1 and 9 DF,  p-value: 0.009172

Estimates are the betas, such that the Estimate for year_st is the slope. These results suggest that, on average, about 12 more common periwinkles are found in Red Algae plots each year. Note, though, that when plotting the results of the linear regression (using geom_smooth() with method = 'lm'), the trend does not look linear.
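
If the link between the Estimate column and the slope feels abstract, the sketch below may help. It fits a regression to hypothetical data built to rise by roughly 12 per year, then shows that the year_st coefficient matches the change in predicted count from one year to the next. All object names and values are invented for illustration.

```r
# Hypothetical data: counts rise by about 12 per year, with a little noise
toy <- data.frame(year_st = 1:10)
toy$count <- 10 + 12 * toy$year_st + rep(c(2, -2), times = 5)

toy_mod <- lm(count ~ year_st, data = toy)

# coef() returns the same Estimates that summary() prints
coef(toy_mod)

# The slope equals the change in predicted count between consecutive years
pred <- predict(toy_mod, newdata = data.frame(year_st = c(5, 6)))
diff(pred)
```

The same coef() and predict() calls should work on lm_mod_fix if you want the estimates as numbers rather than printed output.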

Plot model results

ggplot(data = motinv_reg, aes(x = year_st, y = mean_count_fix)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  scale_x_continuous(breaks = seq(1, 13, 2),
                     labels = seq(1, 13, 2) + 2012) +
  labs(x = "Year", y = "Mean common periwinkle count") +
  theme_bw()


Analysis of Variance

Model formula

aov_mod <- aov(mean_count_fix ~ ComCode, data = motinv_aov)

Model diagnostic plots

par(mfrow = c(2,2)) # makes diagnostic plots 2 x 2 grid 
plot(aov_mod)

par(mfrow = c(1,1)) # resets to 1 plot 
hist(resid(aov_mod))

We are going to say that we're okay with the model diagnostics.

Levene's test of equal variance

library(car)
leveneTest(aov_mod)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value    Pr(>F)    
## group  3  7.7178 0.0003493 ***
##       40                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Significant p-value for Levene test indicates non-equal variance among groups (not surprisingly).
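
With center = median, Levene's test amounts to a one-way ANOVA on how far each observation sits from its own group median. The base R sketch below works through that on hypothetical data with two groups that share a center but differ in spread. It is meant to show the idea, and it should agree with car::leveneTest() on the same data, though I'd treat that as something to confirm rather than assume.

```r
# Hypothetical counts for two groups with similar centers but very different spread
toy <- data.frame(group = rep(c("tight", "spread"), each = 5),
                  count = c(9, 10, 11, 10, 10, 0, 10, 20, 5, 15))

# Absolute deviation of each count from its own group median
toy$abs_dev <- abs(toy$count - ave(toy$count, toy$group, FUN = median))

# One-way ANOVA on the deviations: a small p-value suggests unequal variances
lev_tab <- anova(lm(abs_dev ~ group, data = toy))
lev_tab
lev_p <- lev_tab[["Pr(>F)"]][1]
lev_p
```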

Shapiro-Wilk test of normality

A significant p-value rejects the null hypothesis of normality. The non-significant p-value here suggests normality isn't a problem.

shapiro.test(rstandard(aov_mod)) 
## 
##  Shapiro-Wilk normality test
## 
## data:  rstandard(aov_mod)
## W = 0.96188, p-value = 0.153

Summarize output

summary(aov_mod) 
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## ComCode      3  42785   14262   6.719 0.000888 ***
## Residuals   40  84903    2123                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Significant p-value indicates at least one community type has a different mean count of common periwinkle.

Tukey's pairwise comparisons

TukeyHSD(aov_mod, conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mean_count_fix ~ ComCode, data = motinv_aov)
## 
## $ComCode
##               diff       lwr       upr     p adj
## BAR-ASC -36.681818 -89.33845  15.97481 0.2582119
## FUC-ASC  41.031818 -11.62481  93.68845 0.1742661
## RED-ASC  35.443939 -17.21269  88.10057 0.2864043
## FUC-BAR  77.713636  25.05700 130.37027 0.0016647
## RED-BAR  72.125758  19.46913 124.78239 0.0037846
## RED-FUC  -5.587879 -58.24451  47.06875 0.9918492

Plot Tukey's pairwise comparisons

plot(TukeyHSD(aov_mod, conf.level = 0.95), las = 2)
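
When there are many pairs, it can be easier to pull the significant comparisons out of the TukeyHSD object than to read them off the printout. The sketch below uses the PlantGrowth dataset that ships with R so it runs without the training data. The object names are just for illustration.

```r
# PlantGrowth ships with R: plant weights under a control and two treatments
pg_mod <- aov(weight ~ group, data = PlantGrowth)
pg_tukey <- TukeyHSD(pg_mod, conf.level = 0.95)

# The comparison table is stored as a matrix named after the model term
pg_tab <- pg_tukey$group
pg_tab

# Names of the pairs that differ at alpha = 0.05
sig_pairs <- rownames(pg_tab)[pg_tab[, "p adj"] < 0.05]
sig_pairs
```

Pairs that are not returned don't differ significantly, which is the information used to assign shared letters to groups on a bar chart.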

Plot model results

# reorder community by elevation
motinv_aov$ComCode_fac <- factor(motinv_aov$ComCode, levels = c("BAR", "ASC", "FUC", "RED"))

ggplot(data = motinv_aov, aes(x = ComCode_fac, y = mean_count_fix)) +
  stat_summary(geom = 'bar', fun.data = mean_se, fill = 'grey', color = 'dimgrey') +
  stat_summary(geom = 'errorbar', fun.data = mean_se, color = 'dimgrey', width = 0.3) +
  labs(x = "Community Type", y = "Mean common periwinkle count") +
  # letters follow Tukey's test: BAR differs from FUC and RED; ASC differs from none
  annotate("text", x = 1, y = 30, label = "A", size = 5) +
  annotate("text", x = 2, y = 70, label = "AB", size = 5) +
  annotate("text", x = 3, y = 125, label = "B", size = 5) +
  annotate("text", x = 4, y = 118, label = "B", size = 5) +
  theme_bw()


Coding Best Practices

Background

Knowing how to code is only part of being a good coder. Below are general best practices to make code easier to run and understand, and more stable with a relatively low maintenance cost. Many of these suggestions come from lessons learned working with my own and other people's code. The R for Data Science book also has a lot of great information on coding best practices.


Tips for good code
Thorough commenting. Dependencies, like packages, datasets, and parameters, are at the top.
# libraries
library(dplyr) # for mutate and filter

# parameters
analysis_year <- 2023

# import data set
photo_dat <- read.csv("./data/SHIHAR_photoplot_cover.csv")

# Filtering on Barnacle community type and analysis year
photo_dat2 <- photo_dat |> filter(CommunityType == "Barnacle") |> 
                           filter(Year == analysis_year) 
Descriptive names and consistent case

Object names must start with a letter and can only contain letters, numbers, underscore, and period. Spaces aren't allowed in object names, and are best avoided in column names of data frames too. Descriptive object names will help you digest code, and often you'll want more than one word in the name. There are multiple cases that people tend to use, the most common of which tends to be snake_case. Other examples are below.

snake_case # most common in R
camelCase # capitalize new words after the first
period.separation # separate words by periods
whyWOULDyouDOthisTOsomeone # excess capitalization is a pain
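
One way to check whether a name follows R's rules is base R's make.names(), which returns a syntactically valid version of whatever you give it. In the sketch below, a name is treated as valid if make.names() leaves it unchanged. The candidate names are made up for illustration.

```r
# Candidate object names, some valid and some not
candidates <- c("snake_case", "camelCase", "period.separation", "2nd_try", "my data")

# make.names() fixes invalid names; a valid name comes back unchanged
suggested <- make.names(candidates)
valid <- suggested == candidates

data.frame(candidates, valid, suggested)
```

The same function is what turns a column header with spaces into a name with periods when a file is read in with name checking on.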
Thoughtful word ordering

Order the words in names so that objects that are similar or derived from each other sort together. This also makes coding easier, as like objects will sort together in the popups that you see as you code.

# good word order
ACAD_rocky <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_rocky2 <- ACAD_rocky |> filter(year > 2020)
ACAD_rocky3 <- ACAD_rocky2 |> mutate(plot_type = "vital signs")

# bad word order
rocky_ACAD <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_after_2020 <- rocky_ACAD |> filter(year > 2020)
vital_ACAD_2020 <- ACAD_after_2020 |> mutate(plot_type = "vital signs")
Avoid long names

It's helpful to balance descriptive names with length. The longer the object name, the more typing you have to do to refer to that object. Coding long names, such as long column names in data frames, is cumbersome and inefficient. Compare the two objects below. While I doubt many would make super long object names like this, I commonly see excessively long column names in data packages. Limiting column names to 12 characters or less is super helpful for coders using those data.

# super long names
ACAD_rocky_intertidal_sampling_data <- data.frame(years_plots_were_sampled = c(2020:2025), wetland_plots_sampled = c(1:6))
ACAD_rocky_intertidal_sampling_data2 <- ACAD_rocky_intertidal_sampling_data |> filter(years_plots_were_sampled > 2020)

# shorter still meaningful
ACAD_rocky <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_rocky2 <- ACAD_rocky |> filter(year > 2020)
Use a consistent code style

Code style refers to consistent use of case, indenting, spacing, line width, etc. There are several style conventions out there. I tend to use the tidyverse style guide, which is based on Google's R style guide.

Style conventions I follow:
  • Space before and after operators, like <-, =, ==, |>, +, etc.
  • Space after commas
  • Keep line width narrow enough to prevent scrolling to the right to view code
  • Indent code in the same function or list together
  • One pipe per line

Example 1. Style for pipes

# Good code
trees_final <- trees |> 
  mutate(DecayClassCode_num = as.numeric(DecayClassCode),
         Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
         Date = as.Date(SampleDate, format = "%m/%d/%Y")) |> 
  rename("Species" = "ScientificName") |> 
  filter(IsQAQC == FALSE) |> 
  select(-DecayClassCode) |> 
  arrange(Plot_Name, TagCode)

# Same code, but much harder to follow
trees_final <- trees|>mutate(DecayClassCode_num=as.numeric(DecayClassCode), Plot_Name=paste(ParkUnit,PlotCode,sep = "-"),  Date=as.Date(SampleDate,format="%m/%d/%Y"))|> rename("Species"="ScientificName")|>filter(IsQAQC==FALSE)|>select(-DecayClassCode)|>arrange(Plot_Name,TagCode)

Example 2. Style for ggplot object

# Good code
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
  geom_line() + 
  geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24) +
  labs(x = "Year", 
       y = "Annual visitors in 1000's") +
  scale_y_continuous(limits = c(2000, 4500),
                     breaks = seq(2000, 4500, by = 500)) + 
  scale_x_continuous(limits = c(1994, 2024),
                     breaks = c(seq(1994, 2024, by = 5))) + 
  theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1), 
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'),
        title = element_text(size = 10) 
        )

# Same code but hard to follow
ggplot(data=visits,aes(x=Year,y=Annual_Visits/1000))+geom_line()+geom_point(color="black",fill="#82C2a3",size=2.5,shape=24) +
labs(x = "Year", y = "Annual visitors in 1000's")+
scale_y_continuous(limits=c(2000,4500),breaks=seq(2000,4500,by=500))+ 
scale_x_continuous(limits=c(1994,2024),breaks=c(seq(1994,2024,by=5)))+ 
theme(axis.text.x=element_text(size=10,angle=45,hjust=1), panel.grid.major=element_blank(), 
panel.grid.minor=element_blank(),panel.background=element_rect(fill='white',color='dimgrey'),
title = element_text(size = 10))

Higher level practices
Logical file naming

Using projects instead of stand alone scripts helps keep the various pieces of an analysis project in one place and more easily transferable across computers. Logical naming of scripts, so they sort easily, is also helpful.

[Figure: Order and purpose of file names easy to follow]

[Figure: Hard to know script order and purpose]
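
As a rough illustration of script names that sort in run order, the sketch below uses hypothetical file names with a zero-padded step number in front. The names are invented for this example and aren't from the training project.

```r
# Hypothetical script names with a zero-padded step number in front
scripts <- c("03_fit_models.R", "01_import_data.R", "10_make_report.R", "02_clean_data.R")

# Sorting by name now matches the order the scripts should be run in
sort(scripts, method = "radix")

# Without zero-padding, step 10 sorts ahead of steps 1 and 2
sort(c("2_clean_data.R", "10_make_report.R", "1_import_data.R"), method = "radix")
```

Pairing the number with a short verb and noun (import, clean, fit) covers both things a reader needs from a file name: where it falls in the sequence and what it does.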

Caution choosing packages for core work

R packages add a ton of functionality that is not available in base R. They save us a lot of work having to build tasks from scratch and are the product of developers sharing their work for free to the benefit of the rest of us. In that way, R packages are amazing. However, there is a dark side to R packages. While base R code is backwards compatible, meaning anything built in R 1.0 should run without breaking in R 5.0, R packages do not generally come with that promise. The more packages your code uses, the more susceptible your code is to changes in package dependencies that break your code.

For one-time tasks I don't expect to repeat, or if I don't have time to build the thing I need that a package does for me, I am pretty package promiscuous. For coding tasks I expect to perform repeatedly, and therefore will have a cost to maintaining code, I use packages sparingly. Here's how I choose whether or not to trust a package:
  • Package is hosted on the Comprehensive R Archive Network (CRAN). Hosting packages on CRAN is a high bar. Packages have to meet certain standards and go through rigorous testing before they're accepted. This means there's likely a long-term plan for this package to be maintained and updated as needed.
  • If a package is instead on GitHub.com or another coding repository, counting on that project is a bit riskier. I still use GitHub packages, but I look for ones that had active development within the past year or so, and that have good help documentation. That again usually means there's a good long-term plan for maintaining this package, and it's less likely to disappear or have sloppy code-breaking changes.
  • The tidyverse collection of packages is incredible, but there are pros and cons. While some packages and functions have been really stable, like dplyr and ggplot2, there's a lot of active development in tidyverse packages. Developers do a good job documenting the lifecycle of functions in help documentation, which I encourage you to pay attention to when you're coding. I have had updates to tidyverse packages break my code, mostly because functions have become stricter in what they accept over time. The lubridate package, for example, has burned me a couple of times, so I primarily use base R code to work with date times.
  • Packages that have a lot of other dependencies add to the risk of code breaking changes. You can view dependencies in the DESCRIPTION file in every package. Those listed under Imports or Depends are the primary dependencies.
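
You don't have to open the DESCRIPTION file by hand to see a package's dependencies. Below is a minimal sketch using packageDescription() from the utils package. It reads the stats package only because stats ships with every R installation. Swap in any installed package name.

```r
# Read the DESCRIPTION file of an installed package ("stats" ships with R)
desc <- packageDescription("stats")

# Primary dependencies are listed under Depends and Imports
# (a field that isn't in the DESCRIPTION file comes back as NULL)
desc$Depends
desc$Imports

# Version and package name are stored the same way
desc$Package
desc$Version
```

A long Imports field isn't automatically a problem, but it is a quick signal of how many other packages have to keep working for yours to keep working.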


Challenges

Day 1 Questions

Load Data

If you're starting a new R session to answer these questions, you'll need to read in the motile invertebrate data frame again.

Read in example Bass Harbor motile invertebrate data

motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")


Data Structures

CHALLENGE: How would you look at the first 4 even rows (2, 4, 6, 8), and first 2 columns of the motinv data frame?

Answer
Answer that works
motinv[c(2, 4, 6, 8), c(1, 2)]
##   Network UnitCode
## 2    NETN     ACAD
## 4    NETN     ACAD
## 6    NETN     ACAD
## 8    NETN     ACAD
Better answer that's more stable
names(motinv) # get the names of the first 2 columns
##  [1] "Network"        "UnitCode"       "SiteCode"       "StartDate"     
##  [5] "Year"           "QAQC"           "PlotName"       "CommunityType" 
##  [9] "ScientificName" "CommonName"     "SpeciesCode"    "Damage"        
## [13] "No.Damage"      "Subsampled"
motinv[c(2, 4, 6, 8), c("Network", "UnitCode")]
##   Network UnitCode
## 2    NETN     ACAD
## 4    NETN     ACAD
## 6    NETN     ACAD
## 8    NETN     ACAD
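
Here is a small sketch of why selecting columns by name is more stable than selecting by position. It uses a hypothetical data frame with the same first two columns as motinv, then imagines the columns arriving in a different order, for example after a new export.

```r
# Hypothetical data frame with the same first columns as motinv
toy <- data.frame(Network = "NETN", UnitCode = "ACAD", Year = 2013:2016)

# Same data, but the columns arrive in a different order
toy_reordered <- toy[, c("Year", "Network", "UnitCode")]

# Selecting by position silently returns different columns
names(toy[, c(1, 2)])
names(toy_reordered[, c(1, 2)])

# Selecting by name returns the same columns either way
by_name <- isTRUE(all.equal(toy[, c("Network", "UnitCode")],
                            toy_reordered[, c("Network", "UnitCode")]))
by_name
```

Position-based code keeps running after the column order changes. It just returns the wrong thing, which is harder to catch than an error.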

CHALLENGE: How many unique species are there in the motinv data frame?

Answer
# Option 1
length(unique(motinv[, "ScientificName"])) # 6
# Option 2
length(unique(motinv$ScientificName)) # equivalent
## [1] 6
CHALLENGE: What years were QAQC visits conducted (QAQC = TRUE)?
Answer
# Option 1 - used unique to just return unique years
unique(motinv$Year[motinv$QAQC == TRUE]) # 2013
# Option 2
unique(motinv[motinv$QAQC == TRUE, "Year"])
## [1] 2013


Data Exploration
CHALLENGE: Using motinv, how many species are found in PlotName A1 in 2024?
Answer

Option 1. Use brackets, then calculate the number of rows.

# with brackets
A1_2024 <- motinv[motinv$PlotName == "A1" & motinv$Year == 2024, ]
nrow(A1_2024) # 3
View R output
## [1] 3

Option 2. Use base R subset, then view the data.frame to see.

# with base R subset
A1_2024b <- subset(motinv, PlotName == "A1" & Year == 2024)
View(A1_2024b) # 3
CHALLENGE: What years have green crabs (Latin: Carcinus maenas) been detected?
Answer

Option 1. Subset the data with brackets and use the sort(unique()) to give an easier to read output.

# Option 1
gcrab <- motinv[motinv$ScientificName == "Carcinus maenas",]
sort(unique(gcrab$Year)) #2019, 2021, 2022, 2023, 2024
## [1] 2019 2021 2022 2023 2024

Option 2. Subset data then use table() to tally the years and number of rows green crabs were found.

gcrab2 <- subset(motinv, ScientificName == "Carcinus maenas")
table(gcrab2$Year)
## 
## 2019 2021 2022 2023 2024 
##    3   16    6   11   11
CHALLENGE: Find the highest value recorded in No.Damage column.
Answer

There are multiple ways to do this. Two examples are below.

Option 1. View the data and sort by No.Damage.

View(motinv)

Option 2. Find the max No.Damage count and subset the data frame

max_nd <- max(motinv$No.Damage, na.rm = TRUE)
motinv[motinv$No.Damage == max_nd, ] # returns the 2019 R4 common periwinkle row with No.Damage = 1960

CHALLENGE: Fix the No.Damage typo by replacing 1960 with 196.

Answer

Let's say that you looked at the datasheet, and the actual count for No.Damage was 196 instead of 1960. You can change that value in the original CSV by hand. But even better is to document that change in code. There are multiple ways to do this. One example is below.

But first, it's good to create a new data frame when modifying the original data frame, so you can refer back to the original if needed. I also use a really specific filter to make sure I'm not accidentally changing other data.

Replace 1960 with 196

# create copy of motinv data
motinv_fix <- motinv

# find the problematic value, and change it to 196
motinv_fix$No.Damage[motinv_fix$Year == 2019 & 
                     motinv_fix$PlotName == "R4" & 
                     motinv_fix$No.Damage == 1960] <- 196

# check your work
range(motinv$No.Damage) # 0 1960
## [1]    0 1960
range(motinv_fix$No.Damage) # 0 282
## [1]   0 282


Basic Plotting
CHALLENGE: Plot a histogram of Damage in the motinv data frame.
Hint: make the Damage column numeric first.
Answer
hist(as.numeric(motinv$Damage))


Day 2 Questions

Load Data and Packages

If you're starting a new session to answer these questions, you'll need to load dplyr and read in the motile invertebrate data frame again.

Load dplyr

library(dplyr)

Read in example motile invertebrate and point intercept data

#--- Point intercept data ---
pi_dat <- read.csv("./data/BASHAR_Point_Intercept_data.csv")

#--- Motile invert count ---
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

#--- Motile invert species ---
motspp <- read.csv("./data/motile_invert_species_table.csv")

#--- hobo temp data ---
temp_data <- read.csv("./data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "date_time", "tempF")


Data Wrangling with dplyr
CHALLENGE: Using motinv, how many species are found in PlotName = A1 in 2024?
Answer
# with dplyr filter()
A1_2024 <- motinv |> filter(PlotName == "A1" & Year == 2024)
nrow(A1_2024) # 3
View R output
## [1] 3

CHALLENGE: What years have green crabs (Latin: Carcinus maenas) been detected?
Answer
gcrab <- motinv |> filter(ScientificName == "Carcinus maenas") |> 
  select(Year) |> unique()

gcrab
##   Year
## 1 2021
## 2 2022
## 3 2023
## 4 2024
## 9 2019
CHALLENGE: Find the highest value recorded in No.Damage column.
Answer
max_nd <- max(motinv$No.Damage, na.rm = TRUE)
motinv |> filter(No.Damage == max_nd)
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/11/2019 2019 FALSE       R4     Red Algae
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT     11      1960         No

CHALLENGE: Fix the No.Damage typo by replacing 1960 with 196.

Answer

Let's say that you looked at the datasheet, and the actual count for No.Damage was 196 instead of 1960. You can change that value in the original CSV by hand. But even better is to document that change in code. There are multiple ways to do this. One example using dplyr is below.

But first, it's good to create a new data frame when modifying the original data frame, so you can refer back to the original if needed. I also use a really specific filter to make sure I'm not accidentally changing other data.

Replace 1960 with 196

# dplyr approach
motinv_fix <- motinv |> mutate(No.Damage = replace(No.Damage, No.Damage == 1960, 196))
range(motinv$No.Damage)
## [1]    0 1960
range(motinv_fix$No.Damage)
## [1]   0 282
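
The replace() call is worth trying on a small vector first, since the condition decides which values get swapped. The sketch below uses a hypothetical vector of counts with one typo. The values are invented, apart from the 1960 and 196 from the challenge.

```r
# Hypothetical No.Damage counts with one typo (1960 should be 196)
no_damage <- c(2, 3, 1960, 6, 282)

# replace(x, condition, value) swaps in value wherever condition is TRUE
no_damage_fix <- replace(no_damage, no_damage == 1960, 196)

range(no_damage)     # original still has the typo
range(no_damage_fix) # typo is gone

# Confirm that only one value changed
sum(no_damage != no_damage_fix)
```

Counting how many values changed is a cheap check that the condition matched what you expected, and nothing more.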


Conditionals
CHALLENGE: Using the motinv data, create a new column called trophic that indicates whether the species is an herbivore or predator.
Hint: predator species codes are c("CARMAE", "NUCLAP"), and herbivore species codes are c("LITLIT", "LITOBT", "LITSAX", "TECTES").
Answer
pred <- c("CARMAE", "NUCLAP")

# base R
motinv$trophic <- ifelse(motinv$SpeciesCode %in% pred, "predator", "herbivore")
table(motinv$trophic, motinv$SpeciesCode)
View R output
##            
##             CARMAE LITLIT LITOBT LITSAX NUCLAP TECTES
##   herbivore      0    220    197     20      0     82
##   predator      47      0      0      0    116      0

# tidyverse
motinv <- motinv |> mutate(trophic = ifelse(SpeciesCode %in% pred, "predator", "herbivore"))
table(motinv$trophic, motinv$SpeciesCode)
View R output
##            
##             CARMAE LITLIT LITOBT LITSAX NUCLAP TECTES
##   herbivore      0    220    197     20      0     82
##   predator      47      0      0      0    116      0

CHALLENGE: Using the motile invertebrate data, create a new column called count_level that has levels High, Medium, Low, based on No.Damage, where "High" is > 35, "Medium" is 10 - 35, and "Low" is < 10.
Answer
# Base R using a nested ifelse()
motinv$count_level <- 
  ifelse(motinv$No.Damage > 35, "High", 
         ifelse(motinv$No.Damage >= 10 & motinv$No.Damage <= 35, "Medium", "Low"))

table(motinv$count_level) # check that it worked
View R output
## 
##   High    Low Medium 
##    167    352    163

# Tidyverse using case_when() and between()
motinv <- motinv |> mutate(count_level = case_when(No.Damage > 35 ~ "High",
                                                   between(No.Damage, 10, 35) ~ "Medium",
                                                   No.Damage < 10 ~ "Low"))

table(motinv$count_level) # check that it worked
View R output
## 
##   High    Low Medium 
##    167    352    163

Note the use of the between() function, which saves typing. between(x, left, right) is inclusive on both ends, so it matches x >= left & x <= right.
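
Boundary values are where binning code usually goes wrong, so it's worth testing on a few counts that sit right at the cutoffs. The sketch below does that in base R on a hypothetical vector, and compares the nested ifelse() to cut() as an alternative. The cut() breaks shown assume whole-number counts, which is an assumption to revisit if your data have decimals.

```r
# Hypothetical counts that sit on and around the cutoffs
counts <- c(0, 9, 10, 35, 36, 282)

# Inclusive range check, the same logic between(counts, 10, 35) applies
in_range <- counts >= 10 & counts <= 35

# Nested ifelse(), as in the base R answer
level_ifelse <- ifelse(counts > 35, "High",
                       ifelse(in_range, "Medium", "Low"))

# cut() bins are (lower, upper], so breaks at 9 and 35 work for whole numbers
level_cut <- as.character(cut(counts,
                              breaks = c(-Inf, 9, 35, Inf),
                              labels = c("Low", "Medium", "High")))

data.frame(counts, level_ifelse, level_cut)
```

Both approaches put 10 and 35 in Medium, which matches the challenge definition.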


Summarizing
CHALLENGE: Using the point intercept data (pi_dat), calculate the average percent frequency of each non-vegetated substrate by year. Note that non-vegetated substrates are CoverCode = c('BOLT', 'ROCK', 'WATER').
Answer
pi_nonveg <- pi_dat |> filter(CoverCode %in% c("BOLT", "ROCK", "WATER")) |> # filter nonveg grps
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, CoverCode, CoverType)) # grouping variables 

head(pi_nonveg) # check output
View R output
##   SiteCode Year CoverCode CoverType  avg_freq
## 1   BASHAR 2018      ROCK      Rock 11.314530
## 2   BASHAR 2018     WATER     Water  2.824469
## 3   BASHAR 2018      BOLT      Bolt  1.301859
## 4   BASHAR 2022     WATER     Water  1.774598
## 5   BASHAR 2022      ROCK      Rock  8.864728
## 6   BASHAR 2019      ROCK      Rock 16.006219

CHALLENGE: Using the point intercept data (pi_dat), calculate the average percent frequency of non-vegetated vs. vegetated cover types by year. Note that non-vegetated substrates are CoverCode = c('BOLT', 'ROCK', 'WATER').
Answer
pi_subtype <- pi_dat |>
  mutate(sub_type = ifelse(CoverCode %in% c("BOLT", "ROCK", "WATER"), "nonveg", "veg")) |> # label nonveg vs. veg
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, sub_type)) |> # grouping variables 
  arrange(SiteCode, Year, sub_type) # sort variables

head(pi_subtype) # check output
View R output
##   SiteCode Year sub_type  avg_freq
## 1   BASHAR 2013   nonveg  9.134165
## 2   BASHAR 2013      veg  9.807801
## 3   BASHAR 2014   nonveg  7.170307
## 4   BASHAR 2014      veg  9.992314
## 5   BASHAR 2015   nonveg  9.699333
## 6   BASHAR 2015      veg 10.075167


Pivoting Tables

CHALLENGE: Use the motinv_sum data frame from the "Summarizing with dplyr" tab to pivot on SpeciesCode and mean_count, and fill the NAs with 0s. If you don't have the motinv_sum data frame handy, run the code below to create it.
Hint: Drop the ScientificName and CommonName columns before you pivot.

# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196),
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
Answer
library(tidyr) # for pivot_wider()

motinv_wide <- motinv_sum |> 
  arrange(SpeciesCode) |> # sorting so columns are alphabetical 
  select(-ScientificName, -CommonName) |> 
  pivot_wider(names_from = SpeciesCode,
              values_from = mean_count, 
              values_fill = 0)

head(motinv_wide)
## # A tibble: 6 × 10
##   SiteCode  Year CommunityType se_counts CARMAE LITLIT LITOBT LITSAX NUCLAP
##   <chr>    <int> <chr>             <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 BASHAR    2021 Ascophyllum       0.980    4.4      0      0      0      0
## 2 BASHAR    2022 Ascophyllum       0.224    1        0      0      0      0
## 3 BASHAR    2023 Ascophyllum       0.2      2.8      0      0      0      0
## 4 BASHAR    2024 Ascophyllum       0        0.8      0      0      0      0
## 5 BASHAR    2019 Ascophyllum       0.632    1.2      0      0      0      0
## 6 BASHAR    2021 Barnacle          1.73     2.6      0      0      0      0
## # ℹ 1 more variable: TECTES <dbl>

CHALLENGE: Use the motinv_sum data frame from the "Summarizing with dplyr" tab to pivot on Year and mean_count, fill the NAs with 0s, and add "yr_" to the column names to prevent column names starting with numbers. If you don't have the motinv_sum data frame handy, run the code below to create it.
Hint: Drop the se_counts column before you pivot.

# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, Damage == 1960, 196),
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
Answer
motinv_wide_yr <- motinv_sum |> 
  arrange(Year) |> # sorting so year columns are in chronological order
  select(-se_counts) |> 
  pivot_wider(names_from = Year,
              values_from = mean_count, 
              values_fill = 0, 
              names_prefix = "yr_")

head(motinv_wide_yr)
## # A tibble: 6 × 16
##   SiteCode CommunityType ScientificName   CommonName SpeciesCode yr_2013 yr_2014
##   <chr>    <chr>         <chr>            <chr>      <chr>         <dbl>   <dbl>
## 1 BASHAR   Ascophyllum   Littorina litto… Common pe… LITLIT         14      20.8
## 2 BASHAR   Ascophyllum   Littorina obtus… Smooth pe… LITOBT         19      18.4
## 3 BASHAR   Ascophyllum   Nucella lapillus Dogwhelk   NUCLAP          0.6     0.2
## 4 BASHAR   Barnacle      Littorina litto… Common pe… LITLIT          0.4     0.6
## 5 BASHAR   Barnacle      Littorina obtus… Smooth pe… LITOBT          0.2     1.6
## 6 BASHAR   Fucus         Littorina litto… Common pe… LITLIT         13.6    25.6
## # ℹ 9 more variables: yr_2015 <dbl>, yr_2016 <dbl>, yr_2017 <dbl>,
## #   yr_2018 <dbl>, yr_2019 <dbl>, yr_2021 <dbl>, yr_2022 <dbl>, yr_2023 <dbl>,
## #   yr_2024 <dbl>

CHALLENGE: Pivot the motinv_wide_yr data frame longer on the year columns, and remove the "yr_" prefix from the year names using names_prefix = 'yr_'.

Answer
motinv_long_yr <- pivot_longer(motinv_wide_yr, 
                               cols = -c(SiteCode, CommunityType, ScientificName, 
                                         CommonName, SpeciesCode),
                               names_to = "Year", 
                               values_to = "mean_counts", 
                               names_prefix = "yr_") # drops this string from values
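A quick way to convince yourself that pivot_wider() and pivot_longer() undo each other is to round-trip a small data frame. The sketch below uses a made-up data frame (toy, with hypothetical columns site, Year, and mean_count), not the training data:

```r
library(tidyr)

# Toy data (hypothetical values, not from motinv_sum)
toy <- data.frame(site = c("A", "A", "B"),
                  Year = c(2013, 2014, 2013),
                  mean_count = c(1.2, 3.4, 5.6))

# Wide: one column per year, prefixed so names don't start with a number
toy_wide <- pivot_wider(toy, names_from = Year, values_from = mean_count,
                        values_fill = 0, names_prefix = "yr_")

# Long again: names_prefix strips "yr_", but Year comes back as character
toy_long <- pivot_longer(toy_wide, cols = starts_with("yr_"),
                         names_to = "Year", values_to = "mean_count",
                         names_prefix = "yr_")
toy_long$Year <- as.numeric(toy_long$Year) # convert back to a number

nrow(toy_long) # 4 rows: site B now has an explicit 0 for 2014
```

Note that the round trip is not an exact copy of the original: values_fill = 0 turned the missing site B / 2014 combination into a real row.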


Joining Tables

CHALLENGE: Join the motile invertebrate count data frame to the motile invertebrate species table to get Invasive and Exotic columns added to the data.

Import motinv data frames

# Read in motinv data if you haven't yet
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
head(motinv)
View R output
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
## 6 Littorina littorea Common periwinkle      LITLIT      0         2         No

# Read in species table
motspp <- read.csv("./data/motile_invert_species_table.csv")
head(motspp)
View R output
##              ScientificName        CommonName SpeciesCode Invasive Exotic
## 1        Littorina littorea Common periwinkle      LITLIT    FALSE   TRUE
## 2        Littorina obtusata Smooth periwinkle      LITOBT    FALSE  FALSE
## 3           Carcinus maenas        Green crab      CARMAE     TRUE   TRUE
## 4       Littorina saxatilis  Rough periwinkle      LITSAX    FALSE  FALSE
## 5          Nucella lapillus          Dogwhelk      NUCLAP    FALSE  FALSE
## 6 Testudinalia testudinalis            Limpet      TECTES    FALSE  FALSE

intersect(names(motinv), names(motspp)) # 3 columns in common
View R output
## [1] "ScientificName" "CommonName"     "SpeciesCode"

Answer
# left join species to motinv, because we don't want to include species not found in the count data
motinv_spp <- left_join(motinv, 
                        motspp, 
                        by = c("SpeciesCode", "ScientificName", "CommonName"))

head(motinv_spp)
View R output
##   Network UnitCode SiteCode StartDate Year  QAQC PlotName CommunityType
## 1    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 2    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 3    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 4    NETN     ACAD   BASHAR 6/21/2013 2013 FALSE       A1   Ascophyllum
## 5    NETN     ACAD   BASHAR 6/24/2013 2013  TRUE       A1   Ascophyllum
## 6    NETN     ACAD   BASHAR 6/21/2014 2014 FALSE       A1   Ascophyllum
##       ScientificName        CommonName SpeciesCode Damage No.Damage Subsampled
## 1 Littorina littorea Common periwinkle      LITLIT      0         2         No
## 2 Littorina littorea Common periwinkle      LITLIT      0         3         No
## 3 Littorina obtusata Smooth periwinkle      LITOBT      1         2         No
## 4 Littorina obtusata Smooth periwinkle      LITOBT      0         6         No
## 5   Nucella lapillus          Dogwhelk      NUCLAP      0         1         No
## 6 Littorina littorea Common periwinkle      LITLIT      0         2         No
##   Invasive Exotic
## 1    FALSE   TRUE
## 2    FALSE   TRUE
## 3    FALSE  FALSE
## 4    FALSE  FALSE
## 5    FALSE  FALSE
## 6    FALSE   TRUE

CHALLENGE: Find species in motspp data frame that don't have a match in the motinv data frame.
Answer
# anti join returns the rows in motspp that have no match in motinv
anti_join(motspp, motinv, by = c("SpeciesCode", "ScientificName", "CommonName"))
View R output
##           ScientificName      CommonName SpeciesCode Invasive Exotic
## 1 Hemigrapsus sanguineus Asian shorecrab     HEMISAN     TRUE   TRUE
## 2         Locusta marina         lobster      LOCMAR    FALSE  FALSE
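The same join logic can be tried on a tiny, made-up pair of tables, which makes it easier to see what each join keeps. In this sketch, counts and spp are hypothetical data frames standing in for motinv and motspp:

```r
library(dplyr)

# Hypothetical stand-ins for the count data and the species table
counts <- data.frame(SpeciesCode = c("LITLIT", "LITLIT", "NUCLAP"),
                     count = c(2, 3, 1))
spp <- data.frame(SpeciesCode = c("LITLIT", "NUCLAP", "HEMISAN"),
                  Exotic = c(TRUE, FALSE, TRUE))

# left_join keeps every row of counts and adds matching columns from spp
counts_spp <- left_join(counts, spp, by = "SpeciesCode")
nrow(counts_spp) == nrow(counts) # TRUE when the key is unique in spp

# anti_join returns rows of spp with no match in counts
unmatched <- anti_join(spp, counts, by = "SpeciesCode")
unmatched$SpeciesCode # "HEMISAN"
```

Comparing row counts before and after a left join is a cheap check that the lookup table didn't contain duplicate keys.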


Dates and Times

CHALLENGE: How would you return date1 as YYYYMMDD (20260312)?

# Create date1 if you don't have it already
date1 <- as.Date("3/12/2026", format = "%m/%d/%Y")
Answer
format(date1, format = "%Y%m%d")
View R output
## [1] "20260312"
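The format string is just a template, so rearranging the % codes changes the layout of the result. A short sketch with the same date1 and a few common layouts:

```r
date1 <- as.Date("3/12/2026", format = "%m/%d/%Y")

format(date1, "%Y-%m-%d") # "2026-03-12" ISO format, R's default for dates
format(date1, "%m/%d/%y") # "03/12/26" two-digit year
format(date1, "%Y%m%d")   # "20260312" no separators

# format() returns character, so convert if you need a number
date_num <- as.numeric(format(date1, "%Y%m%d"))
date_num # 20260312
```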

CHALLENGE: How would you create a list of dates in 2026 that are evenly spaced by 3 months?
Answer
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "3 months")
View R output
## [1] "2026-01-01" "2026-04-01" "2026-07-01" "2026-10-01"

CHALLENGE: How would you create a list of dates in 2026 that are evenly spaced by 1 week?
Answer
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "1 week")
View R output
##  [1] "2026-01-01" "2026-01-08" "2026-01-15" "2026-01-22" "2026-01-29"
##  [6] "2026-02-05" "2026-02-12" "2026-02-19" "2026-02-26" "2026-03-05"
## [11] "2026-03-12" "2026-03-19" "2026-03-26" "2026-04-02" "2026-04-09"
## [16] "2026-04-16" "2026-04-23" "2026-04-30" "2026-05-07" "2026-05-14"
## [21] "2026-05-21" "2026-05-28" "2026-06-04" "2026-06-11" "2026-06-18"
## [26] "2026-06-25" "2026-07-02" "2026-07-09" "2026-07-16" "2026-07-23"
## [31] "2026-07-30" "2026-08-06" "2026-08-13" "2026-08-20" "2026-08-27"
## [36] "2026-09-03" "2026-09-10" "2026-09-17" "2026-09-24" "2026-10-01"
## [41] "2026-10-08" "2026-10-15" "2026-10-22" "2026-10-29" "2026-11-05"
## [46] "2026-11-12" "2026-11-19" "2026-11-26" "2026-12-03" "2026-12-10"
## [51] "2026-12-17" "2026-12-24" "2026-12-31"
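It's worth checking that a date sequence really is evenly spaced. The sketch below repeats the weekly sequence and also shows length.out, which asks for a fixed number of dates instead of a fixed end date. The object names (start, end, weekly, first_four) are just for illustration:

```r
start <- as.Date("2026-01-01")
end <- as.Date("2026-12-31")

# seq() on Date objects dispatches to seq.Date()
weekly <- seq(start, end, by = "1 week")
length(weekly)         # 53 dates
all(diff(weekly) == 7) # TRUE: every gap is 7 days

# length.out sets how many dates to return
first_four <- seq(start, by = "3 months", length.out = 4)
first_four # "2026-01-01" "2026-04-01" "2026-07-01" "2026-10-01"
```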

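CHALLENGE: How would you extract the month as a number ranging from 1-12 in temp_data?
Answer
# Create timestamp from date_time if you don't have it already
# (assumes date_time is formatted as month/day/year hour:minute, as shown below)
temp_data$timestamp <- as.POSIXct(temp_data$date_time, format = "%m/%d/%Y %H:%M")
temp_data$month_num <- as.numeric(format(temp_data$timestamp, "%m"))
head(temp_data)
View R output
##   index       date_time  tempF           timestamp month_num
## 1     1 7/18/2021 10:26 58.842 2021-07-18 10:26:00         7
## 2     2 7/18/2021 11:26 58.712 2021-07-18 11:26:00         7
## 3     3 7/18/2021 12:26 58.109 2021-07-18 12:26:00         7
## 4     4 7/18/2021 13:26 56.208 2021-07-18 13:26:00         7
## 5     5 7/18/2021 14:26 56.208 2021-07-18 14:26:00         7
## 6     6 7/18/2021 15:26 55.342 2021-07-18 15:26:00         7

CHALLENGE: How would you extract the Julian date (day of year) in temp_data?
Answer
temp_data$julian <- as.numeric(format(temp_data$timestamp, "%j"))
head(temp_data)
View R output
##   index       date_time  tempF           timestamp month_num julian
## 1     1 7/18/2021 10:26 58.842 2021-07-18 10:26:00         7    199
## 2     2 7/18/2021 11:26 58.712 2021-07-18 11:26:00         7    199
## 3     3 7/18/2021 12:26 58.109 2021-07-18 12:26:00         7    199
## 4     4 7/18/2021 13:26 56.208 2021-07-18 13:26:00         7    199
## 5     5 7/18/2021 14:26 56.208 2021-07-18 14:26:00         7    199
## 6     6 7/18/2021 15:26 55.342 2021-07-18 15:26:00         7    199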
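If a month or Julian column comes back as all NA, the usual culprit is a date-time column that never parsed. Here's a minimal sketch of how to check, using made-up strings in the same month/day/year hour:minute layout. The tz = "UTC" setting is an assumption for the example; use whatever time zone your logger recorded in.

```r
# Toy date-time strings (hypothetical, not read from the HOBO file)
dt_chr <- c("7/18/2021 10:26", "7/18/2021 11:26", "12/1/2021 13:26")

# tz = "UTC" is an assumption for this example
dt <- as.POSIXct(dt_chr, format = "%m/%d/%Y %H:%M", tz = "UTC")
sum(is.na(dt)) # 0 means every value parsed

# A format that doesn't match the strings returns NA instead of an error
bad <- as.POSIXct(dt_chr, format = "%Y-%m-%d %H:%M", tz = "UTC")
sum(is.na(bad)) # 3

as.numeric(format(dt, "%m")) # 7 7 12
as.numeric(format(dt, "%j")) # 199 199 335
```

Because a mismatched format fails silently, counting the NAs right after parsing saves a lot of confusion later.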


Day 3 Questions

Load Data and Packages

Load packages and prep data for ggplot sections

# packages
library(dplyr)
library(ggplot2)
library(patchwork) # for arranging ggplot objects
library(RColorBrewer) # for palettes
library(viridis) # for palettes

# load data
pcov <- read.csv("./data/SHIHAR_photoplot_cover.csv") # import data

# define color and shape objects
cols <- c("ASCNOD" = "#C5B47B", "BARSPP" = "#A9A9A9", 
         "NONCOR" = "#574F91", "FUCSPP" = "#FFD560",
         "MUSSPP" = "#6F88BF", "REDGRP" = "#FF4C53")

shps <- c("ASCNOD" = 23, "BARSPP" = 24, "NONCOR" = 23,
          "FUCSPP" = 25, "MUSSPP" = 23, "REDGRP" = 25)

# Set x axis range
xrange <- range(pcov$Year)
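The xrange object feeds the limits and breaks arguments of scale_x_continuous() in the plots that follow. To see what those expressions evaluate to without loading the data, here's a sketch with a made-up vector of years (years, xr, x_limits, and x_breaks are hypothetical names):

```r
# Hypothetical survey years standing in for pcov$Year
years <- c(2013, 2015, 2018, 2021, 2024)
xr <- range(years) # 2013 2024

# Pad the limits by one year so points at the ends aren't clipped
x_limits <- c(xr[1] - 1, xr[2] + 1)
# Breaks every 2 years, starting from the lower limit
x_breaks <- seq(xr[1] - 1, xr[2] + 1, by = 2)

x_limits # 2012 2025
x_breaks # 2012 2014 2016 2018 2020 2022 2024
```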


Building a ggplot object
CHALLENGE: Using the full plot code above, make the point size 1.5.
Answer
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 1.5) + # changed this line
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)

CHALLENGE: Using the full plot code above, make the error bars wider.
Answer
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 1.2) + # changed this line
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") + 
  scale_shape_manual(values = shps, name = "Species Group") + 
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)

CHALLENGE: Using the full plot code above, change the x axis label to "Year".
Answer
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg. Percent Cover") + # changed this line
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)


Improve formatting

CHALLENGE: How would you change the smoother in the code below from LOESS to linear model and make the line dashed? Hint: method = 'lm'.
Code for p6 if you don't have it.

p6 <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), # turns off major grids
        panel.grid.minor = element_blank(), # turns off minor grids
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
p6 + geom_smooth(se = FALSE, span = 0.75)

Answer
p6 + geom_smooth(method = 'lm', se = FALSE, linetype = 'dashed')
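With method = 'lm', geom_smooth() draws the straight line from a linear regression of y on x (the default formula is y ~ x), fit separately for each group and facet. To see what that line represents, you can fit the same kind of model yourself with lm(). The data below are made up for illustration:

```r
# Hypothetical cover values that increase by exactly 2 per year
toy <- data.frame(Year = 2013:2018,
                  avg_cover = c(10, 12, 14, 16, 18, 20))

# Same model form that geom_smooth(method = 'lm') fits by default
fit <- lm(avg_cover ~ Year, data = toy)
slope <- unname(coef(fit)["Year"])
slope # 2: cover rises 2 percentage points per year
```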

CHALLENGE: How would you make the legend title larger and bold, and legend text larger?
Answer
p6 + theme(legend.title = element_text(size = 12, face = 'bold'),
           legend.text = element_text(size = 11))


Palettes

CHALLENGE: How would you specify the 'RdYlBu' palette instead of the ones used above?
Hint: Start with p_pal to save time coding. Code below for p_pal if you don't have it.

p_pal <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
Answer
p_pal +  scale_color_brewer(name = "Species Group", palette = "RdYlBu", 
                            aesthetics = c("fill", "color"))  
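If you want to see the actual colors a Brewer palette will hand to ggplot, RColorBrewer can return the hex codes directly. A small sketch (rdylbu is just an illustrative object name):

```r
library(RColorBrewer)

# Hex codes for 6 colors from the diverging 'RdYlBu' palette
rdylbu <- brewer.pal(n = 6, name = "RdYlBu")
rdylbu

# Palette metadata: max number of colors, type, and colorblind-friendliness
brewer.pal.info["RdYlBu", ]
```

Checking maxcolors is handy because a Brewer palette can't supply more colors than it was designed with.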

CHALLENGE: Create your own palette with at least three colors.
Hint: Start with p_heat to save time coding. Code below, if you don't have it.

# Create p_heat for 'palettes' section
p_heat <- 
ggplot(chem, aes(x = mon, y = year, color = Temp_F, fill = Temp_F)) + 
  theme_bw() +
  geom_tile() + 
  labs(y = "Year", x = "Month") +
  scale_x_continuous(breaks = c(5, 6, 7, 8, 9, 10),
                     limits = c(4, 11), 
                     labels = month.abb[5:10]) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Answer
p_heat + scale_color_gradient2(low = "#3E693D", mid = "#FDFFC7", high = "#7A6646", 
                               aesthetics = c("fill", 'color'),
                               midpoint = mean(chem$Temp_F), 
                               name = "Temp. (F)") 
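Another way to build a custom palette is base R's colorRampPalette(), which interpolates between any colors you give it. This is a sketch of an alternative approach, not what the answer above uses; my_cols, my_pal, and pal5 are illustrative names.

```r
# Same three hex codes as the gradient above
my_cols <- c("#3E693D", "#FDFFC7", "#7A6646")

# colorRampPalette() returns a function; call it with the number of colors you want
my_pal <- colorRampPalette(my_cols)
pal5 <- my_pal(5)
pal5 # 5 hex codes running from green through pale yellow to brown
```

A vector like pal5 could then be passed to scale_fill_gradientn(colors = pal5) in place of the low/mid/high arguments.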


Getting Help

Help Documentation

There are a number of options to get help with R. If you're trying to figure out how to use a function, you can type ?function_name. For example, ?plot will show the R documentation for the plot function in the Help panel.

Get help for the functions below

?plot
?dplyr::filter
You can also press F1 while the cursor is on a function name to access the help for that function. Help documents in R are standardized to help you find what you're looking for.
  • Top left shows the function name with its {package}. "base" means it's a function in the base R install.
  • Description: tells you what the function does.
  • Usage: shows the arguments and their default values. The "..." means there are other potential arguments, but isn't something we need to talk about right now.
  • Arguments: defines each argument and the kind of input it takes, for example whether an argument should be TRUE or FALSE, or a text string.
  • Value: describes what the function returns (not always included).
  • See Also: Sometimes functions build on other functions. This section links to similar or building block functions.
  • Examples: Functions that provide good examples are invaluable. Sometimes these are afterthoughts or not included in help documentation, which is too bad. Unfortunately, base R functions tend to have some of the most obscure, hard to understand examples.
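Besides ? and F1, a few base R functions pull pieces of the help into the console. The sketch below is one way to use them; the functions chosen (sd and mean) are just examples:

```r
# Usage at a glance: prints the arguments and their defaults
args(sd) # function (x, na.rm = FALSE)

# Argument names as a character vector
names(formals(sd)) # "x" "na.rm"

# Find functions when you only remember part of the name
mean_fxns <- apropos("mean")
head(mean_fxns)

# Run the Examples section of a help page in the console
example("mean")
```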


Troubleshooting errors

Great online resources for finding answers include Stack Exchange and Stack Overflow. Google searches are usually my first step, and I include "in R" and the package name (if applicable) in every search related to R code. If you're troubleshooting an error message, copying and pasting the error message verbatim into a search engine often helps.

Don't hesitate to reach out to colleagues for help as well! If you are stuck on something and the answers on Google are more confusing than helpful, don't be afraid to ask a human. Every experienced R programmer was a beginner once, so chances are they've encountered the same problem as you at some point. There is an R-focused Data Science Community of Practice for I&M folks, which anyone working in R (regardless of experience!) is invited and encouraged to join.


Common errors and how to fix them
  1. Unmatched parenthesis

    mean_x <- mean(c(1, 3, 5, 7, 8, 21) # missing closing parentheses
    mean_x <- mean(c(1, 3, 5, 7, 8, 21)) # correct
  2. Unmatched quotes

    birds <- c("black-capped chickadee", "golden-crowned kinglet, "wood thrush") # missing quote after kinglet
    birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
  3. Missing a comma between elements

    birds <- c("black-capped chickadee", "golden-crowned kinglet" "wood thrush") # missing comma after kinglet
    birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
  4. Misspelled function name

    x_mean <- maen(x) # misspelled mean
    x_mean <- mean(x) # Corrected
  5. Incorrect use of dimensions with brackets

    # Missing comma to indicate subsetting rows (records)
    motinv <- motinv[!is.na(motinv$SiteCode)]
    ## Error in `[.data.frame`:
    ## ! undefined columns selected
    # Correct
    motinv <- motinv[!is.na(motinv$SiteCode),]
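Errors that make it past typing (like the misspelled function and the missing comma in brackets) produce a message you can read and search for. One way to capture that message as text is tryCatch(). This is a sketch using a made-up data frame (df), and the exact wording of messages may vary with your R version and language settings:

```r
x <- c(1, 3, 5, 7, 8, 21)

# tryCatch() runs the code and, if it errors, hands the error to our function
msg <- tryCatch(
  maen(x), # misspelled mean, so R can't find the function
  error = function(e) conditionMessage(e)
)
msg # the text of the error, which names the function R could not find

# The same approach on the bracket error, using a small made-up data frame
df <- data.frame(site = c("A", NA, "B"), count = c(1, 2, 3))
msg2 <- tryCatch(df[!is.na(df$site)], error = function(e) conditionMessage(e))
msg2 # the "undefined columns selected" message

# With the comma, the logical vector subsets rows and no error occurs
df_ok <- df[!is.na(df$site), ]
nrow(df_ok) # 2
```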
Other resources that may help:



Resources

Online Resources

There's a lot of great online material for learning new applications of R. The ones we've used the most are listed below.

Online Books
  • R for Data Science First author is Hadley Wickham, one of the main programmers behind the tidyverse. There's a lot of good stuff in here. This book is the first place to look for anything you want to follow up on from this training.
  • ggplot2: Elegant Graphics for Data Analysis A great reference on ggplot2 also by Hadley Wickham.
  • Mastering Software Development in R First author is Roger Peng, a Biostatistics professor at Johns Hopkins, who has taught a lot of undergrad/grad students how to use R. He's also one of the hosts of Not So Standard Deviations podcast. His intro to ggplot is great. He's also got a lot of more advanced topics in this book, like making functions and packages.
  • R Packages Another book by Hadley Wickham that teaches you how to build, debug, and test R packages.
  • Advanced R Yet another book by Hadley Wickham that helps you understand more about how R works under the hood, how it relates to other programming languages, and how to build packages.
  • Mastering Shiny And another Hadley Wickham book on building shiny apps.
Other useful sites
  • NPS_IMD_Data_Science_and_Visualization > Community of Practice is an IMD work group that meets once a month to talk about R and Data Science. There are also notes, materials and recordings from previous meetings, a Wiki with helpful tips, and the chat is a great place to post questions or cool tips you've come across.
  • STAT545 Jenny Bryan's site that accompanies the graduate level stats class of the same name. She includes topics on best practices for coding, and not just how to make scripts work. It's really well done.
  • RStudio home page There's a ton of info in the Resources tab on this site, including cheatsheets for each package developed by RStudio (i.e., tidyverse packages), webinars, presentations from past RStudio Conferences, etc.
  • RStudio list of useful R packages by topic
  • patchwork R package tutorial for arranging multiple ggplot figures.
  • R Markdown: The Definitive Guide provides nearly everything you need to know about building R Markdown documents, a really useful way to document your code, output, and notes all in one place. This website was developed in R Markdown, for example.
  • Happy Git with R If you find yourself wanting to go down the path of hosting your code on GitHub, this site will walk you through the process of linking GitHub to RStudio.

Keyboard Shortcuts

Once you get in the swing of coding, you'll find that minimizing the number of times you have to use your mouse will help you code faster. RStudio has done a great job creating lots of really useful keyboard shortcuts designed to keep your hands on the keyboard instead of having to click through menus. One way to see all of the shortcuts RStudio has built in is to press Alt+Shift+K. A window should appear with a bunch of shortcuts listed. These are also listed on List of RStudio IDE Keyboard Shortcuts. The shortcuts I use the most often are listed below:
  • Undo: Ctrl Z
  • Redo: Ctrl Shift Z
  • Run highlighted code: Ctrl Enter
  • Insert "<-" : Alt -
  • Zoom in to make text bigger: Ctrl roll mouse forward (set in Global Options)
  • Zoom out: Ctrl - or Ctrl roll mouse backward (set in Global Options)
  • Move line of code up or down: Alt arrow up or down
  • Comment out whole line: Ctrl Shift C
  • Duplicate line of code: Ctrl Shift D
  • Move cursor to beginning of line: Home
  • Move cursor to end of line: End
  • View help for a given function: Put cursor on function name and press F1
  • Esc escapes out of the command currently being executed in the console
  • Restart R Session: Ctrl Shift F10
  • Insert pipe (|>): Ctrl Shift M
  • View RStudio's keyboard shortcuts: Alt Shift K

Advanced topics

Additional skills that can greatly improve your workflow are:
  • Using R Markdown (stand alone websites/docs) or R Shiny (interactive websites) for automated reporting or visualizing your data.
  • Writing your own functions and iterating tasks.
  • Version control using git/GitHub.
  • Building R packages.

While we won't get to these topics during this training, the 2022 Advanced R training has sessions covering all of them. The Resources tab includes other online resources that cover these topics as well.

R Markdown wizards
Artwork by @allison_horst

Code printout

library(tidyverse)
#------------------------------------
#        Day 0 - prep code 
#------------------------------------
rm(list = ls())
packages <- c("tidyverse", # for Day 2 and 3 data wrangling
              "RColorBrewer", "viridis", "patchwork", # for Day 3 ggplot
              "readxl", "writexl", # for day 1 importing from excel
              "car") # for Levene's test - also a great stats R package

install.packages(setdiff(packages, rownames(installed.packages())))  

# Check that installation worked
library(tidyverse) # turns on core tidyverse packages
library(RColorBrewer) # palette generator
library(viridis) # more palettes
library(patchwork) # multipanel plots
library(readxl) # reading xlsx
library(writexl) # writing xlsx
motinv <- read.csv(
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/BASHAR_motile_invert_counts.csv")
#------------------------------------
#     Day 1: Project Setup Code 
#------------------------------------
# forward slash file path approach
"C:/Users/KMMiller/OneDrive - DOI/data/"

# backslash file path approach (each backslash must be doubled)
"C:\\Users\\KMMiller\\OneDrive - DOI\\data\\"

dir.create("data")
list.files() # you should see a data folder listed 
file_list <- c(
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/ACAD_Jordan_Pond_water_chem.csv",
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/BASHAR_motile_invert_counts.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/BASHAR_Point_Intercept_data.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/bat_site_info.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/bat_captures.csv", 
  "https://raw.githubusercontent.com/KateMMiller/IMD_R_Training_2026/refs/heads/main/data/HOBO_temp_example.csv", 
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/motile_invert_species_table.csv",
  "https://raw.githubusercontent.com/KateMMiller/MMA_R_Training_2026/refs/heads/main/data/SHIHAR_photoplot_cover.csv")

file_names <- sub(".*data/", "",  file_list)

lapply(seq_along(file_list), function(x){
    download.file(file_list[x], 
                  destfile = paste0("./data/", file_names[x]))
})

#------------------------------------
#     Day 1: Start Coding Code 
#------------------------------------
# Commented text: try this line to generate some basic text and become familiar with where results will appear:
print("Welcome to R!")

# simple math
1+1

(2*3)/4

sqrt(9)

# calculate basal area (in cm^2) of a tree with 14.6 cm diameter: pi * radius^2; note pi is a built-in constant in R
((14.6/2)^2)*pi

# get the cosine of 180 degrees - note that trig functions in R expect angles in radians
cos(pi)

# the value of 12.098 is assigned to variable 'a'
a <- 12.098

# and the value 65.3475 is assigned to variable 'b'
b <- 65.3475

# we can now perform whatever mathematical operations we want using these two 
# variables without having to repeatedly type out the actual numbers:

a*b

(a^b)/((b+a))

sqrt((a^7)/(b*2))

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# equivalent to x <- 1:10

# bad coding
#mean <- mean(x)

# good coding 
mean_x <- mean(x)
mean_x

range_x <- range(x)
range_x
#------------------------------------
#     Day 1: Read and Write Code 
#------------------------------------
# read in the data from BASHAR_motile_invert_counts.csv and assign it as a dataframe to the variable "motinv"
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
# View the motinv data frame we just created
View(motinv)
# Look at the top 6 rows of the data frame
head(motinv)
# Look at the bottom 6 rows of the data frame
tail(motinv)
# Write the data frame to your data folder using a relative path. 
# By default, write.csv adds a column with row names that are numbers. I don't
# like that, so I turn that off.
write.csv(motinv, "./data/BASHAR_motile_invert_counts.csv", row.names = FALSE)
# Read the data frame in using a relative path
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

# Equivalent code to read in the data frame using full path on my computer, but won't match another user.
motinv <- read.csv("C:/Users/KMMiller/OneDrive - DOI/NETN/R_Dev/MMA_R_Training_2026/data/BASHAR_motile_invert_counts.csv")

install.packages("readxl") # only need to run once. 
install.packages("writexl")
library(writexl) # saving xlsx
library(readxl) # importing xlsx
write_xlsx(motinv, "./data/BASHAR_motile_invert_counts.xlsx")
motinv_xls <- read_xlsx(path = "./data/BASHAR_motile_invert_counts.xlsx", sheet = "Sheet1") 
head(motinv_xls)
#------------------------------------
#       Day 1: Vectors Code 
#------------------------------------
digits <- c(1:10)  # Use x:y to create a sequence of integers starting at x and ending at y
digits
digits + 1 # note how 1 was added to every element of digits. 

is_odd <- rep(c(TRUE, FALSE), 5)  # Use rep(x, n) to create a vector by repeating x n times; lines up with digits 1:10
is_odd

tree_dbh <- c(12.5, 20.4, 18.1, 38.5, 19.3)
tree_dbh

bird_ids <- c("song sparrow", "dark-eyed junco", "golden-crowned kinglet", "dark-eyed junco")
bird_ids
second_bird <- bird_ids[2]
second_bird
top_two_birds <- bird_ids[c(1,2)]
top_two_birds
sort(unique(bird_ids))
class(bird_ids)
class(tree_dbh)
class(digits)
class(is_odd)
str(motinv)
names(motinv)
motinv$PlotName
motinv$ScientificName
dim(motinv)
nrow(motinv) # first dim
ncol(motinv) # second dim
motinv[1:5,]
motinv[c(1, 2, 3, 4, 5),] #equivalent but more typing
motinv[, c("SiteCode", "ScientificName", "CommonName", "Year", "Damage", "No.Damage")]
motinv[1:5, c("SiteCode", "ScientificName", "CommonName", "Year", "Damage", "No.Damage")]
motinv_sub <- motinv[, 1:4] # works, but risky
motinv_sub2 <- motinv[, c("Network", "UnitCode", "SiteCode", "StartDate")]  #same result, but better
# compare the two data frames to the original
head(motinv)
head(motinv_sub)
head(motinv_sub2)
motinv[c(2, 4, 6, 8), c(1, 2)]
names(motinv)[1:2] # get the names of the first 2 columns
motinv[c(2, 4, 6, 8), c("Network", "UnitCode")]
head(motinv)

motinv_nonQ <- motinv[motinv$QAQC == FALSE, ]
table(motinv$QAQC) # 42 T
table(motinv_nonQ$QAQC) # 0 T
motinv$ScientificName[motinv$CommunityType == "Barnacle"]
motinv[motinv$CommunityType == "Barnacle", "ScientificName"] # equivalent
lit_spp <- c("Littorina littorea", "Littorina obtusata", "Littorina saxatilis")
motinv_lit <- motinv[motinv$ScientificName %in% lit_spp, 
                     c("SiteCode", "PlotName", "ScientificName", "Year")]
motinv_lit
# Return a vector of unique plot names, sorted alphabetically
plots_unique <- sort(unique(motinv[,"PlotName"]))
plots_unique
# Returns the number of elements in the plots_unique vector
length(plots_unique) # 20
# Option 1
length(unique(motinv[, "ScientificName"])) # 6
# Option 2
length(unique(motinv$ScientificName)) # equivalent
# Option 1 - use unique() to return only the unique years of QAQC visits
unique(motinv$Year[motinv$QAQC == TRUE]) # 2013
# Option 2
unique(motinv[motinv$QAQC == TRUE, "Year"])
#-----------------------------------------
#     Day 1: Data Exploration Code 
#-----------------------------------------
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
head(motinv)
str(motinv)
summary(motinv)
table(complete.cases(motinv[,1:13]))# first 13 columns are all complete
table(complete.cases(motinv$Subsampled))# where the FALSE are introduced

x <- c(1, 3, 8, 3, 5, NA)
mean(x) # returns NA
mean(x, na.rm = TRUE) 
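# A small sketch of other ways to check for missing values before summarizing.
# toy_x is a hypothetical vector made up for illustration (not training data).
toy_x <- c(1, 3, 8, 3, 5, NA)
is.na(toy_x)                 # TRUE only for the last element
sum(is.na(toy_x))            # 1 missing value
median(toy_x, na.rm = TRUE)  # 3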
sort(unique(motinv$Damage)) # sorts the unique values in the column
table(motinv$Damage) # shows the number of records per value - very handy
motinv2 <- motinv
motinv2$Damage[motinv2$Damage == "PM"] <- NA
motinv2$Damage_num <- as.numeric(motinv2$Damage)

# check that it worked
str(motinv2) # Damage_num is numeric
sort(unique(motinv2$Damage_num)) # Only numbers show in table
motinv3 <- subset(motinv2, QAQC == FALSE, select = -Damage) # Note the importance of FALSE all caps
motinv3 <- subset(motinv2, QAQC != TRUE, select = -Damage) # equivalent
motinv3 <- motinv2[motinv2$QAQC == FALSE, -12] #equivalent but not as easy to follow
# Look at the start date format
head(motinv3) # month/day/year

# Create new column called Date
motinv3$Date <- as.Date(motinv3$StartDate, format = "%m/%d/%Y")
str(motinv3)

names(motinv3) # original names
names(motinv3)[names(motinv3) == "ScientificName"] <- "Species"
names(motinv3) # check that it worked
motinv3$Site_Plot <- paste(motinv3$SiteCode, motinv3$PlotName, sep = "-")
motinv3$Site_Plot <- paste0(motinv3$SiteCode, "-", motinv3$PlotName) # equivalent - paste0() puts no separator between elements by default

# with brackets
A1_2024 <- motinv3[motinv3$PlotName == "A1" & motinv3$Year == 2024, ]
nrow(A1_2024) # 3
# with base R subset
A1_2024b <- subset(motinv3, PlotName == "A1" & Year == 2024)
View(A1_2024b) # 3
# OPTION 2
gcrab <- motinv3[motinv3$Species == "Carcinus maenas",]
sort(unique(gcrab$Year)) #2019, 2021, 2022, 2023, 2024

gcrab2 <- subset(motinv3, Species == "Carcinus maenas")
table(gcrab2$Year)
View(motinv3)
max_nd <- max(motinv3$No.Damage, na.rm = TRUE)
motinv3[motinv3$No.Damage == max_nd,]
# create copy of motinv data
motinv_fix <- motinv3

# find the problematic value, and change it to 196
motinv_fix$No.Damage[motinv_fix$Year == 2019 & 
                       motinv_fix$PlotName == "R4" & 
                       motinv_fix$No.Damage == 1960] <- 196

# check your work
range(motinv3$No.Damage) #1960
range(motinv_fix$No.Damage) # now 282

#------------------------------------
#     Day 1: Basic Plotting Code 
#------------------------------------
hist(x = motinv3$No.Damage)
plot(motinv3$No.Damage)
plot(motinv3$No.Damage ~ motinv3$Damage_num)
plot(No.Damage ~ Damage_num, data = motinv3) # equivalent but cleaner axis titles
hist(motinv3$Damage_num)
#------------------------------------
#       Day 2: Tidyverse Code 
#------------------------------------
install.packages('tidyverse')
library(tidyverse)
library(dplyr)
#------------------------------------
#     Day 2: Data Wrangling Code 
#------------------------------------
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
# Base R
motinv2 <- motinv
motinv2$Damage[motinv2$Damage == "PM"] <- NA
motinv2$Damage_num <- as.numeric(motinv2$Damage)
# dplyr approach with mutate
motinv2 <- mutate(motinv, Damage_num = as.numeric(replace(Damage, Damage == "PM", NA)))
str(motinv2)
# Base R
motinv2$Date <- as.Date(motinv2$StartDate, format = "%m/%d/%Y")
# dplyr approach with mutate
motinv2 <- mutate(motinv2, Date = as.Date(StartDate, format = "%m/%d/%Y"))
# Base R code
names(motinv2)[names(motinv2) == "ScientificName"] <- "Species"
# dplyr approach with rename
motinv2 <- rename(motinv2, "Species" = "ScientificName")
names(motinv2)
# Base R
motinv2$Site_Plot <- paste(motinv2$SiteCode, motinv2$PlotName, sep = "-")
# dplyr approach with mutate
motinv2 <- mutate(motinv2, Site_Plot = paste(SiteCode, PlotName, sep = "-"))
# Base R
motinv3 <- subset(motinv2, QAQC == FALSE, select = -Damage) # Note the importance of FALSE all caps
# dplyr
motinv3a <- filter(motinv2, QAQC == FALSE)
motinv3 <- select(motinv3a, -Damage)

head(motinv3)
motinv4 <-  mutate(motinv3, No.Damage = replace(No.Damage, No.Damage == 1960, 196))
motinv_final <- motinv |> 
  mutate(Damage_num = as.numeric(replace(Damage, Damage == "PM", NA)), # Fix Damage PM
         SitePlot = paste(SiteCode, PlotName, sep = "-"), # create new SitePlot column
         Date = as.Date(StartDate, format = "%m/%d/%Y"), # create new Date column
         No.Damage_fix = replace(No.Damage, No.Damage == 1960, 196)) |> # fix error in No.Damage
  rename("Species" = "ScientificName") |> # change column name 
  filter(QAQC == FALSE) |> # drop QAQC visits
  select(-Damage) |> # drop original Damage column
  arrange(SitePlot, Year, Species) # optional sorting the data

head(motinv_final)  
# with dplyr filter()
A1_2024 <- motinv |> filter(PlotName == "A1" & Year == 2024)
nrow(A1_2024) # 3
gcrab <- motinv |> filter(ScientificName == "Carcinus maenas") |> 
  select(Year) |> unique()

gcrab
max_nd <- max(motinv$No.Damage, na.rm = TRUE)

motinv |> filter(No.Damage == max_nd)
# Reminder of the base R approach

# create copy of motinv data
motinv_fix <- motinv

# find the problematic value, and change it to 196
motinv_fix$No.Damage[motinv_fix$Year == 2019 & 
                     motinv_fix$PlotName == "R4" & 
                     motinv_fix$No.Damage == 1960] <- 196

# dplyr approach
motinv_fix <- motinv |> mutate(No.Damage = replace(No.Damage, No.Damage == 1960, 196))
range(motinv_fix$No.Damage)
#------------------------------------
#      Day 2: Conditionals Code 
#------------------------------------
# green crab, Asian shore crab, and common periwinkle species codes
exo_spp <- c("CARMAE", "HEMISAN", "LITLIT") 

# smooth periwinkle, rough periwinkle, dogwhelk, and limpet species codes
nat_spp <- c("LITOBT", "LITSAX", "NUCLAP", "TECTES")

# Make a table of species codes in BASHAR
table(motinv$SpeciesCode)

# Add native column with ifelse
motinv <- motinv |> mutate(native = ifelse(SpeciesCode %in% nat_spp, TRUE, FALSE))

# Add native_status column with nested ifelse
motinv <- motinv |> mutate(native_status = ifelse(SpeciesCode %in% nat_spp, "native",
                                                  ifelse(SpeciesCode %in% c("CARMAE", "HEMISAN"), "invasive", 
                                                         "exotic")))

table(motinv$SpeciesCode, motinv$native)
table(motinv$SpeciesCode, motinv$native_status)
# green crab, Asian shore crab, and common periwinkle species codes
exo_spp <- c("CARMAE", "HEMISAN", "LITLIT") 

# smooth periwinkle, rough periwinkle, dogwhelk, and limpet species codes
nat_spp <- c("LITOBT", "LITSAX", "NUCLAP", "TECTES")

motinv <- motinv |> 
  mutate(native_status = case_when(SpeciesCode %in% nat_spp ~ 'native',
                                   SpeciesCode %in% c("CARMAE", "HEMISAN") ~ 'invasive',
                                   SpeciesCode %in% exo_spp ~ 'exotic', 
                                   TRUE ~ 'unknown'))

table(motinv$SpeciesCode, motinv$native_status) # check that the output worked

inv <- motinv |> filter(native_status == "invasive")
spp_det <- unique(inv$CommonName)

if(nrow(inv) > 0){print(paste0("The following invasive species were detected in the data: ", 
                               paste0(spp_det, collapse = ", ")))
  } else {print("No invasive species were detected in the data.")}

# Keep only native species first, so no invasive rows remain and the else branch runs
inv <- motinv |> filter(SpeciesCode %in% nat_spp) |> 
  filter(native_status == "invasive")
spp_det <- unique(inv$CommonName)

if(nrow(inv) > 0){print(paste0("The following invasive species were detected in the data: ", 
                               paste0(spp_det, collapse = ", ")))
  } else {print("No invasive species were detected in the data.")}
pred <- c("CARMAE", "NUCLAP")

# base R
motinv$trophic <- ifelse(motinv$SpeciesCode %in% pred, "predator", "herbivore")
table(motinv$trophic, motinv$SpeciesCode)

# tidyverse
motinv <- motinv |> mutate(trophic = ifelse(SpeciesCode %in% pred, "predator", "herbivore"))
table(motinv$trophic, motinv$SpeciesCode)
# Base R using a nested ifelse()
motinv$count_level <- 
  ifelse(motinv$No.Damage > 35, "High", 
         ifelse(motinv$No.Damage >= 10 & motinv$No.Damage <= 35, "Medium", "Low"))

table(motinv$count_level) # check that it worked
# Tidyverse using case_when() and between
motinv <- motinv |> mutate(count_level = case_when(No.Damage > 35 ~ "High",
                                                   between(No.Damage, 10, 35) ~ "Medium",
                                                   No.Damage < 10 ~ "Low"))

table(motinv$count_level) # check that it worked
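# A possible base R alternative, sketched with cut() on a made-up vector.
# toy_counts and toy_level are hypothetical names. This assumes whole-number
# counts, so breaks at 9 and 35 give the same Low (< 10), Medium (10-35),
# and High (> 35) bins used above.
toy_counts <- c(0, 9, 10, 35, 36, 200)
toy_level <- cut(toy_counts, 
                 breaks = c(-Inf, 9, 35, Inf), # intervals are closed on the right by default
                 labels = c("Low", "Medium", "High"))
table(toy_level) # 2 records in each level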
#------------------------------------
#      Day 2: Summarizing Code 
#------------------------------------
pi_dat <- read.csv("./data/BASHAR_Point_Intercept_data.csv")

head(pi_dat)
pi_dat_mut <- pi_dat |> mutate(med_elev_sl = median(med_elev), 
                               avg_pct_freq = mean(pct_freq), 
                               .by = c(SiteCode, Year, CoverType, CoverCode))
nrow(pi_dat) #314
nrow(pi_dat_mut) #314
head(pi_dat_mut)
pi_dat_sum <- pi_dat |> summarize(elev_sl_med = median(med_elev), 
                                  elev_sl_min = min(med_elev),
                                  elev_sl_max = max(med_elev),
                                  avg_pct_freq = mean(pct_freq), 
                                  .by = c(SiteCode, Year, CoverType, CoverCode))
nrow(pi_dat) #314
nrow(pi_dat_sum) #124
head(pi_dat_sum)
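# A minimal sketch of the mutate() vs. summarize() difference on a made-up
# data frame. toy_df and its columns are hypothetical. Assumes dplyr 1.1.0 or
# later, which added the .by argument.
library(dplyr)
toy_df <- data.frame(site = c("A", "A", "B", "B", "B"), 
                     cover = c(10, 20, 30, 40, 50))
toy_mut <- toy_df |> mutate(avg_cover = mean(cover), .by = site)    # keeps every row
toy_sum <- toy_df |> summarize(avg_cover = mean(cover), .by = site) # one row per group
nrow(toy_mut)     # 5
nrow(toy_sum)     # 2
toy_sum$avg_cover # 15 40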
# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196),
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))

head(motinv_sum)
pi_nonveg <- pi_dat |> filter(CoverCode %in% c("BOLT", "ROCK", "WATER")) |> # filter nonveg grps
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, CoverCode, CoverType)) # grouping variables 

head(pi_nonveg) # check output
pi_subtype <- pi_dat |>
  mutate(sub_type = ifelse(CoverCode %in% c("BOLT", "ROCK", "WATER"), "nonveg", "veg")) |> # label nonveg vs. veg groups
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, sub_type)) |> # grouping variables 
  arrange(SiteCode, Year, sub_type) # sort variables

head(pi_subtype) # check output
#------------------------------------
#         Day 3: Pivot Code 
#------------------------------------
# load the packages
library(dplyr)
library(tidyr) # for pivot functions

#--- import the raw point intercept data
pi_dat <- read.csv("./data/BASHAR_Point_Intercept_data.csv")


# summarize data by site, year, and cover type
pi_dat_sum <- pi_dat |> summarize(med_elev_sl = median(med_elev, na.rm = T), 
                                  avg_pct_freq = mean(pct_freq, na.rm = T), 
                                  .by = c(SiteCode, Year, CoverType, CoverCode))

pi_wide <- pi_dat_sum |> 
  arrange(CoverCode, Year) |> # sort by CoverCode and year
  select(-CoverType, -med_elev_sl) |> # Drop extra column
  pivot_wider(names_from = CoverCode, # column that will produce column names
              values_from = avg_pct_freq) # column to make the values
head(pi_wide)
pi_wide <- pi_dat_sum |> 
  arrange(CoverCode, Year) |> 
  select(-CoverType, -med_elev_sl) |> 
  pivot_wider(names_from = CoverCode, 
              values_from = avg_pct_freq, 
              values_fill = 0) # new line

head(pi_wide)
pi_wide_yr <- pi_dat_sum |> 
  arrange(Year) |>
  select(-med_elev_sl) |> 
  pivot_wider(names_from = Year, # pivot on year instead of CoverCode 
              values_from = avg_pct_freq, 
              values_fill = 0,
              names_prefix = "yr_") # new line

head(pi_wide_yr)
# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196),
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
motinv_wide <- motinv_sum |> 
  arrange(SpeciesCode) |> # sorting so columns are alphabetical 
  select(-ScientificName, -CommonName, -se_counts) |> # drop se_counts too, or rows won't collapse when pivoting
  pivot_wider(names_from = SpeciesCode,
              values_from = mean_count, 
              values_fill = 0)

head(motinv_wide)

# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196),
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
motinv_wide_yr <- motinv_sum |> 
  arrange(Year) |> # sorting so year columns are in chronological order
  select(-se_counts) |> 
  pivot_wider(names_from = Year,
              values_from = mean_count, 
              values_fill = 0, 
              names_prefix = "yr_")

head(motinv_wide_yr)
pi_long <- pi_wide |> pivot_longer(cols = -c(SiteCode, Year), 
                                   names_to = "CoverCode", # column names in pi_wide came from CoverCode
                                   values_to = "Avg_Pct_Freq")
head(pi_long)
motinv_long_yr <- pivot_longer(motinv_wide_yr, 
                               cols = -c(SiteCode, CommunityType, ScientificName, 
                                         CommonName, SpeciesCode),
                               names_to = "Year", 
                               values_to = "mean_counts", 
                               names_prefix = "yr_") # drops this string from values
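# A minimal round-trip sketch on a made-up data frame, showing how
# values_fill = 0 turns a missing combination into an explicit 0 that is kept
# when pivoting back to long. toy_long, toy_wide, and toy_back are
# hypothetical names, not training data.
library(tidyr)
toy_long <- data.frame(site = c("A", "A", "B"), 
                       code = c("ROCK", "KELP", "ROCK"), 
                       pct = c(60, 40, 100))
toy_wide <- pivot_wider(toy_long, names_from = code, values_from = pct, values_fill = 0)
toy_back <- pivot_longer(toy_wide, cols = -site, names_to = "code", values_to = "pct")
nrow(toy_long) # 3
nrow(toy_wide) # 2 (one row per site)
nrow(toy_back) # 4 (site B now has an explicit KELP record of 0)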
#------------------------------------
#        Day 3: Join Code 
#------------------------------------
#site data
bat_sites <- read.csv("./data/bat_site_info.csv")
# bat capture data
bat_cap <- read.csv("./data/bat_captures.csv")

# View sites listed in each
sort(unique(bat_sites$Site)) # Sites 1, 2, 3, 4, 5
sort(unique(bat_cap$Site)) # Sites 1, 2, 3, 5, 6
bat_full <- full_join(bat_sites, bat_cap, by = "Site")
table(bat_full$Site)
knitr::kable(bat_full, align = 'c') |> 
  kableExtra::scroll_box(height = "300px") |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |> 
  kableExtra::column_spec(1:10, background = 'white', include_thead = T)
bat_inner <- inner_join(bat_sites, bat_cap, by = "Site")
table(bat_inner$Site)
knitr::kable(bat_inner) |> 
  kableExtra::scroll_box(height = "300px") |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |> 
  kableExtra::column_spec(1:10, background = 'white', include_thead = T)
bat_left <- left_join(bat_sites, bat_cap, by = "Site")
table(bat_left$Site)
knitr::kable(bat_left) |> 
  kableExtra::scroll_box(height = "300px") |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |> 
  kableExtra::column_spec(1:10, background = 'white', include_thead = T)
bat_right <- right_join(bat_sites, bat_cap, by = "Site")
table(bat_right$Site)
knitr::kable(bat_right) |> 
  kableExtra::scroll_box(height = "300px") |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |> 
  kableExtra::column_spec(1:10, background = 'white', include_thead = T)
anti_join(bat_sites, bat_cap, by = "Site")
anti_join(bat_cap, bat_sites, by = "Site")
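# A minimal sketch comparing row counts across join types, using two made-up
# tables. toy_sites and toy_caps (and the species codes in them) are
# hypothetical, not the bat data.
library(dplyr)
toy_sites <- data.frame(Site = c(1L, 2L, 3L), 
                        Habitat = c("forest", "field", "wetland"))
toy_caps <- data.frame(Site = c(2L, 3L, 3L, 4L), 
                       Species = c("MYLU", "EPFU", "MYLU", "LABO"))
nrow(full_join(toy_sites, toy_caps, by = "Site"))  # 5: all sites from both tables
nrow(inner_join(toy_sites, toy_caps, by = "Site")) # 3: only sites 2 and 3 are in both
nrow(left_join(toy_sites, toy_caps, by = "Site"))  # 4: all of toy_sites; site 1 gets NA
nrow(right_join(toy_sites, toy_caps, by = "Site")) # 4: all of toy_caps; site 4 gets NA
anti_join(toy_sites, toy_caps, by = "Site")        # site 1: in toy_sites with no captures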
#--- Read in motinv data if you haven't yet
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

#--- Read in species table
motspp <- read.csv("./data/motile_invert_species_table.csv")

head(motspp)

intersect(names(motinv), names(motspp)) # 3 columns in common
# left join species to motinv, because we don't want to include species not found in the count data
motinv_spp <- left_join(motinv, 
                        motspp, 
                        by = c("SpeciesCode", "ScientificName", "CommonName"))

head(motinv_spp)
# anti join to find species in the species table that are not in the count data
anti_join(motspp, motinv, by = c("SpeciesCode", "ScientificName", "CommonName"))

#------------------------------------
#     Day 3: Dates and Time Code 
#------------------------------------
codes <- read.csv("./data/datetime_codes.csv", encoding = "latin1") # read.csv only recognizes "latin1" or "UTF-8" here
knitr::kable(codes) |> 
  #kableExtra::scroll_box(width = "300px") |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 11,
                            bootstrap_options = "condensed") |> 
  kableExtra::column_spec(1:2, background = 'white', include_thead = T)
Sys.time()
class(Sys.time()) # POSIXct POSIXt
Sys.Date()
class(Sys.Date()) # Date
# date with slashes and full year
date_chr1 <- "3/12/2026"
date1 <- as.Date(date_chr1, format = "%m/%d/%Y")
str(date1)
# date with dashes and 2-digit year
date_chr2 <- "3-12-26"
date2 <- as.Date(date_chr2, format = "%m-%d-%y")
str(date2)
# date written out
date_chr3 <- "March 12, 2026"
date3 <- as.Date(date_chr3, format = "%b %d, %Y")
str(date3)
#Julian date as numeric
as.numeric(format(date1, format = "%j"))

#Return day of week
format(date1, format = "%A") 
#Return abbreviated day of week
format(date1, format = "%a") 

#Return written out date with month name
format(date1, format = "%B %d, %Y") 
#Return abbreviated written out date with month name
format(date1, format = "%b %d, %Y") 

date1 + 1 # add a day
date1 + 7 # add a week
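# A small sketch of the reverse operation: finding the number of days between
# two dates. toy_start and toy_end are hypothetical dates made up for
# illustration.
toy_start <- as.Date("2026-03-12")
toy_end <- as.Date("2026-04-01")
toy_end - toy_start                                      # Time difference of 20 days
as.numeric(difftime(toy_end, toy_start, units = "days")) # 20 as a plain number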
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
# by 15 days
seq.Date(date_list[1], date_list[2], by = "15 days")
# by month
seq.Date(date_list[1], date_list[2], by = "1 month")
# by 6 months
seq.Date(date_list[1], date_list[2], by = "6 months")
format(date1, format = "%Y%m%d")
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "3 months")
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "1 week")
unclass(as.POSIXct("2026-03-12 01:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York"))
unclass(as.POSIXlt("2026-03-12 01:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "America/New_York"))
Sys.timezone()
OlsonNames()
temp_data1 <- read.csv("./data/HOBO_temp_example.csv")

# check data
head(temp_data1)
temp_data <- read.csv("./data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "date_time", "tempF")

View(temp_data)
knitr::kable(temp_data[1:50,], caption = "First 50 rows of temp_data") |> 
  kableExtra::scroll_box(height = "300px") |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 10) |> 
  kableExtra::column_spec(1:3, background = 'white', include_thead = T)
temp_data$timestamp <- as.POSIXct(temp_data$date_time, 
                                  format = "%m/%d/%Y %H:%M", 
                                  tz = "America/New_York")
head(temp_data)
temp_data$date <- format(temp_data$timestamp, "%Y%m%d") 
temp_data$month <- format(temp_data$timestamp, "%b")
temp_data$time <- format(temp_data$timestamp, "%H:%M") # %H is the 24-hour clock; %I (12-hour) would lose AM/PM
temp_data$hour <- as.numeric(format(temp_data$timestamp, "%H")) 
head(temp_data)
temp_data$month_num <- as.numeric(format(temp_data$timestamp, "%m"))
head(temp_data)
temp_data$julian <- as.numeric(format(temp_data$timestamp, "%j"))
head(temp_data)
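# A small sketch of the difference between the %H (24-hour) and %I (12-hour)
# codes when pulling the hour out of a timestamp. toy_time is a hypothetical
# afternoon timestamp made up for illustration.
toy_time <- as.POSIXct("2026-03-12 15:30:00", 
                       format = "%Y-%m-%d %H:%M:%S", 
                       tz = "America/New_York")
format(toy_time, "%H")       # "15"
format(toy_time, "%I")       # "03" - ambiguous without an AM/PM indicator
format(toy_time, "%I:%M %p") # adds the AM/PM indicator (its text depends on your locale)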
#----------------------------------------------
#     Day 2: Data Viz. Best Practices Code 
#----------------------------------------------
library(knitr)
library(kableExtra)
covid_numbers <- read.csv("./data/covid_numbers.csv")
head(covid_numbers, 7) |> 
  knitr::kable(align = "c", caption = "<h6><b>Table 1.</b> Daily Covid cases and population numbers by state (only showing first 7 records)</h6>") |> 
  kableExtra::kable_styling(full_width = F,  html_font = 'Arial', font_size = 12) |> 
  kableExtra::column_spec(1:4, background = 'white', include_thead = T)
acme_in <- read.csv("./data/acme_sales.csv") |> 
  dplyr::arrange(category, product) 
acme_in |> 
  knitr::kable(align = "c", caption = "<h6><b>Table 2. </b>Average monthly revenue (in $1000's) from Acme product sales, 1950 - 2020</h6>") |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 12) |> 
  kableExtra::column_spec(1:14, background = 'white', include_thead = T)

acme <- acme_in |> 
  pivot_longer(-c(category, product), names_to = "month", values_to = "revenue")
acme$month <- factor(acme$month, levels = month.abb)

ggplot(acme, aes(x=month, y=product, fill=revenue)) + 
  geom_raster() +
  geom_text(aes(label=revenue, color = revenue > 1250)) + # color of text conditional on revenue relative to 1250
  scale_color_manual(guide = "none", values = c("black", "white")) + # set color of text
  scale_fill_viridis_c(direction = -1, name = "Monthly revenue,\nin $1000's") +
  scale_y_discrete(limits=rev) + # reverses order of y-axis bc ggplot reverses it from the data
  labs(#title = "Average monthly revenue (in $1000's) from Acme product sales, 1950 - 2020", 
       x = "Month", y = "Product") + 
  theme_bw(base_size = 11) +
  facet_grid(rows = vars(category), scales = "free") # set scales to free so each facet only shows its own levels
ansc <- anscombe |> 
  dplyr::select(x1, y1, x2, y2, x3, y3, x4, y4) 

ansc |> 
  knitr::kable(align = "c", caption = "<h6><b>Table 3.</b> Anscombe's Quartet - Four bivariate datasets with identical summary statistics</h6>") |> 
  kableExtra::column_spec (c(2,4,6),border_left = F, border_right = T) |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 12) |> 
  kableExtra::column_spec(1:8, background = 'white', include_thead = T)

sapply(ansc, function(x) c(mean=round(mean(x), 2), var=round(var(x), 2))) |> 
  knitr::kable(align = "c", caption = "<h6><b>Table 4. </b>Means and variances are identical in the four datasets. The correlation between x and y (r = 0.82) is also identical across the datasets.</h6>") |> 
  kableExtra::column_spec (c(1,3,5,7), border_left = F, border_right = T) |> 
  kableExtra::kable_styling(full_width = F, html_font = 'Arial', font_size = 12) |> 
  kableExtra::column_spec(1:9, background = 'white', include_thead = T)
#------------------------------------
#    Day 2: Intro to ggplot Code 
#------------------------------------
knitr::opts_chunk$set(warning=FALSE, message=FALSE, fig.align = 'center', fig.height = 4, fig.width = 6)
library(ggplot2) # load ggplot
library(dplyr)
pcov <- read.csv("./data/SHIHAR_photoplot_cover.csv") # import data

cols <- c("ASCNOD" = "#bcb02f", "BARSPP" = "#CAC7B6", 
          "NONCOR" = "#420816", "FUCSPP" = "#646519",
          "MUSSPP" = "#170461", "REDGRP" = "#9e224d")

shps <- c("ASCNOD" = 23, "BARSPP" = 24, "NONCOR" = 23,
          "FUCSPP" = 25, "MUSSPP" = 23, "REDGRP" = 25)

xrange <- range(pcov$Year)

ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  #geom_smooth(se = F, span = 0.75) +
  #geom_line(linewidth = 0.6) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), # turns off major grids
        panel.grid.minor = element_blank(), # turns off minor grids
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
# load package
library(ggplot2) 
# import data
pcov <- read.csv("./data/SHIHAR_photoplot_cover.csv") 
# check out the data
head(pcov) 
p <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                             color = CoverCode, 
                             fill = CoverCode,
                             shape = CoverCode))

p
p2 <- p + 
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover)) +
  geom_point()

p2
p3 <- p2 + 
  scale_fill_manual(values = c("ASCNOD" = "#bcb02f", "BARSPP" = "#CAC7B6", 
                               "NONCOR" = "#420816", "FUCSPP" = "#646519",
                               "MUSSPP" = "#170461", "REDGRP" = "#9e224d")) +
  scale_color_manual(values = c("ASCNOD" = "#bcb02f", "BARSPP" = "#CAC7B6", 
                                "NONCOR" = "#420816", "FUCSPP" = "#646519",
                                "MUSSPP" = "#170461", "REDGRP" = "#9e224d")) +
  scale_shape_manual(values = c("ASCNOD" = 23, "BARSPP" = 24, "NONCOR" = 23,
                                "FUCSPP" = 25, "MUSSPP" = 23, "REDGRP" = 25))
p3
cols <- c("ASCNOD" = "#bcb02f", "BARSPP" = "#CAC7B6", 
          "NONCOR" = "#420816", "FUCSPP" = "#646519",
          "MUSSPP" = "#170461", "REDGRP" = "#9e224d")

shps <- c("ASCNOD" = 23, "BARSPP" = 24, "NONCOR" = 23,
          "FUCSPP" = 25, "MUSSPP" = 23, "REDGRP" = 25)

p3 <- p2 + 
  geom_point(color = 'dimgrey', size = 2.5) + # setting point outline to dark grey 
  scale_fill_manual(values = cols, name = "Species Group") +
  scale_color_manual(values = cols, name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group")
  
p3
# Determine year range (so not hard coded/easily updated in future)
xrange <- range(pcov$Year)

p4 <- p3 + 
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") 
  
p4
p5 <- p4 +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), # angle x text
          panel.grid.major = element_blank(), # turns off major grids
          panel.grid.minor = element_blank(), # turns off minor grids
          panel.background = element_rect(fill = 'white', color = 'dimgrey'),# makes background white 
          legend.key = element_blank()) # removes square fill around symbols in legend

p5
p6 <- p5 + facet_wrap(~CommunityType)

p6  
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), # turns off major grids
        panel.grid.minor = element_blank(), # turns off minor grids
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
ggsave("SHIHAR_photoplot_cover.svg", height = 8, width = 7)
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 1.5) + # changed this line
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 1.2) + # changed this line
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") + 
  scale_shape_manual(values = shps, name = "Species Group") + 
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg Percent Cover") + # changed this line
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
p6 + geom_line(linewidth = 0.8, aes(group = CoverCode))
p6 + geom_smooth(se = F, span = 0.75)
p6 + geom_smooth(method = 'lm', se = F, linetype = 'dashed')
p6 + geom_hline(aes(yintercept = 50), linetype = "dashed") 
p6 + geom_hline(aes(yintercept = 50, linetype = "50% line")) +
     scale_linetype_manual(values = c("50% line" = "dashed"), 
                           name = "Threshold")
p6 + theme(legend.position = 'bottom')
p6 + theme(legend.title = element_text(size = 12, face = 'bold'),
           legend.text = element_text(size = 11))
p6 + theme(strip.background = element_rect(fill = "#F5F0DC", color = "black"),
           strip.text = element_text(size = 10))
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_bar(stat = 'identity', position = 'fill', color = 'dimgrey') +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Median. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), # turns off major grids
        panel.grid.minor = element_blank(), # turns off minor grids
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)

ggplot(data = pcov |> filter(CommunityType == "Barnacle"), # note filter 
       aes(x = Year, y = avg_cover, 
           color = CoverCode, group = CoverCode,
           fill = CoverCode, shape = CoverCode)) +
  geom_bar(stat = 'identity', position = 'dodge', color = 'dimgrey') + # new line
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), linewidth = 0.6) +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CoverCode) # Different facet


# load packages
library(ggplot2) 
library(patchwork) # multipanel plots

# load data
chem <- read.csv("./data/ACAD_Jordan_Pond_water_chem.csv")

# make date field a date
chem$date <- as.Date(chem$date, format = "%Y-%m-%d")
# pH plot
p_pH <-
  ggplot(chem, aes(x = date, y = pH)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "pH", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

# temp plot
p_temp <-
  ggplot(chem, aes(x = date, y = Temp_F)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "Temp (F)", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
 
# Diss. Oxygen plot
p_do <-
  ggplot(chem, aes(x = date, y = DO_mgL)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "DO (mg/L)", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

# Conductance plot
p_cond <-
  ggplot(chem, aes(x = date, y = SpCond_uScm)) + 
  theme_bw() +
  geom_smooth(se = F, span = 0.5) +
  geom_point(color = "dimgrey", alpha = 0.5, size = 2) +
  labs(y = "Spec. Cond. (uS/cm)", x = "Year") +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
p_pH + p_temp + p_do + p_cond
library(patchwork)
p_pH / p_temp / p_do / p_cond + plot_layout(axes = "collect_x")
#------------------------------------
#     Day 3: ggplot Palettes Code 
#------------------------------------
display.brewer.all(colorblindFriendly = TRUE)
p_pal <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg Percent Cover") + # changed this line
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
p_pal +  scale_color_brewer(name = "Species Group", palette = "Set2", 
                            aesthetics = c("fill", "color")) 

p_pal +  scale_color_brewer(name = "Species Group", palette = "Dark2", 
                            aesthetics = c("fill", "color")) 

p_pal +  scale_color_brewer(name = "Species Group", palette = "RdYlBu", 
                            aesthetics = c("fill", "color"))  
# viridis 
scales::show_col(viridis(12), cex_label = 0.45, ncol = 6)
p_pal + scale_color_viridis_d(name = "Species Group", aesthetics = c("fill", "color"))  #default viridis 
p_pal + scale_color_viridis_d(name = "Species Group", aesthetics = c("fill", "color"), option = 'turbo') 
p_heat <- 
ggplot(chem, aes(x = mon, y = year, color = Temp_F, fill = Temp_F)) + 
  theme_bw() +
  geom_tile() + 
  labs(y = "Year", x = "Month") +
  scale_x_continuous(breaks = c(5, 6, 7, 8, 9, 10),
                     limits = c(4, 11), 
                     labels = month.abb[5:10]) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

p_heat + scale_color_viridis_c(name = "Temp. (F)", aesthetics = c("fill", "color")) 
p_heat + scale_color_viridis_c(name = "Temp. (F)", aesthetics = c("fill", "color"), 
                               option = "plasma", direction = -1) 
p_heat + scale_color_gradient(low = "#FCFC9A", high = "#F54927", 
                              aesthetics = c("fill", 'color'), 
                              name = "Temp. (F)") 

p_heat + scale_color_gradient2(low = "navy", mid = "#FCFC9A", high = "#F54927", 
                               aesthetics = c("fill", 'color'),
                               midpoint = mean(chem$Temp_F), 
                               name = "Temp. (F)") 

p_heat + scale_color_gradientn(colors = c("#805A91", "#406AC2", "#FBFFAD", "#FFA34A", "#AB1F1F"), 
                               aesthetics = c("fill", 'color'),
                               guide = "legend",
                               breaks = c(seq(40, 85, 5)), 
                               name = "Temp. (F)") 

p_heat + scale_color_gradient2(low = "#3E693D", mid = "#FDFFC7", high = "#7A6646", 
                               aesthetics = c("fill", 'color'),
                               midpoint = mean(chem$Temp_F), 
                               name = "Temp. (F)") 
library(tidyverse)
library(readxl)

ctd_mma <- read_xlsx("./data/PR_PF_2903444 (2).xlsx") |> data.frame()
ggplot(ctd_mma, aes(x = `TEMP..degree_Celsius.`, 
                    y = `PRES..decibar.`,
                    group = Station, 
                    color = Station)) +
  geom_line() + 
  theme_bw() +
  labs(x = "Temp. (C)", y = "Pressure (dbars)") +
  scale_color_gradientn(colors = c("#805A91", "#406AC2", "#FBFFAD", "#FFA34A", "#AB1F1F"), 
                        aesthetics = c('color'),
                        guide = "legend", # makes legend distinct, rather than color band
                        breaks = 1:15, # number of stations
                        name = "Station ID") +
  scale_y_reverse() + # flip y axis
  scale_x_continuous(limits = c(0, 30),
                     breaks = seq(0, 30, 5),
                     position = 'top') + # plot x-axis on top
  theme(legend.position = 'bottom')

library(dplyr)
library(ggplot2)
# install.packages('car') # uncomment and run if you don't have this package installed
library(car) # for levene's test

# import data
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
head(motinv)
# prep data for analysis
motinv_final <- motinv |> 
  mutate(Damage = as.numeric(replace(Damage, Damage == "PM", NA)), # Fix Damage PM
         SitePlot = paste(SiteCode, PlotName, sep = "-"), # create new SitePlot column
         Date = as.Date(StartDate, format = "%m/%d/%Y"), # create new Date column
         No.Damage_fix = replace(No.Damage, No.Damage == 1960, 196), 
         total_count = Damage + No.Damage,
         total_count_fix = Damage + No.Damage_fix, # fix error in No.Damage 
         year_st = Year - 2012) |> # set start year to 1 instead of 2013 for better interpretation 
  filter(QAQC == FALSE) |> # drop QAQC visits
  arrange(SitePlot, Year, ScientificName) # optional sorting the data

# summarize counts, so 1 count per year, species and community type
motinv_sum <- motinv_final |> summarize(mean_count = mean(total_count, na.rm = TRUE),
                                        mean_count_fix = mean(total_count_fix, na.rm = T),
                                        .by = c(SiteCode, year_st, CommunityType, SpeciesCode))

# prep for linear regression
motinv_reg <- motinv_sum |> filter(SpeciesCode == "LITLIT" & CommunityType == "Red Algae") 
head(motinv_reg)

# prep for analysis of variance
motinv_aov <- motinv_sum |> filter(SpeciesCode == "LITLIT") |> 
  mutate(ComCode = toupper(substr(CommunityType, 1, 3))) # create community code for easier plotting
head(motinv_aov)
lm_mod <- lm(mean_count ~ year_st, data = motinv_reg)
par(mfrow = c(2,2)) # makes diagnostic plots 2 x 2 grid 
plot(lm_mod)
par(mfrow = c(1,1)) # resets to 1 plot 
hist(resid(lm_mod))
# detect outliers as > 2 SD of residuals
outliers <- which(abs(resid(lm_mod)) > 2 * sd(resid(lm_mod)))

# Highlight the outliers in a scatterplot
plot(mean_count ~ year_st, data = motinv_reg)
points(motinv_reg$year_st[outliers], motinv_reg$mean_count[outliers], col = "red", pch = 19)
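# Illustrative sketch (not part of the original training code): the same
# "> 2 SD of residuals" outlier rule applied to made-up data, so it can be
# run without the motile invertebrate file. The data frame `toy` and its
# values are hypothetical.
toy <- data.frame(x = 1:10,
                  y = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 60)) # last value is a planted error
toy_mod <- lm(y ~ x, data = toy)
toy_res <- resid(toy_mod)
toy_out <- which(abs(toy_res) > 2 * sd(toy_res)) # row numbers flagged as outliers
toy[toy_out, ] # inspect flagged rows before deciding how to handle them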
lm_mod_fix <- lm(mean_count_fix ~ year_st, data = motinv_reg)
par(mfrow = c(2,2)) # makes diagnostic plots 2 x 2 grid 
plot(lm_mod_fix)
par(mfrow = c(1,1)) # resets to 1 plot 
hist(resid(lm_mod_fix))
summary(lm_mod_fix) 
ggplot(data = motinv_reg, aes(x = year_st, y = mean_count_fix)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  scale_x_continuous(breaks = seq(1, 13, 2),
                     labels = seq(1, 13, 2) + 2012) +
  labs(x = "Year", y = "Mean common periwinkle count") +
  theme_bw()

aov_mod <- aov(mean_count_fix ~ ComCode, data = motinv_aov)
par(mfrow = c(2,2)) # makes diagnostic plots 2 x 2 grid 
plot(aov_mod)
par(mfrow = c(1,1)) # resets to 1 plot 
hist(resid(aov_mod))
library(car)
leveneTest(aov_mod)
shapiro.test(rstandard(aov_mod)) 
summary(aov_mod) 
TukeyHSD(aov_mod, conf.level = 0.95)
plot(TukeyHSD(aov_mod, conf.level = 0.95), las = 2)
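# Illustrative sketch (not from the training data): the ANOVA and Tukey steps
# above run on R's built-in PlantGrowth data set, which has one numeric
# response (weight) and one 3-level factor (group). Object names starting
# with `pg_` are hypothetical.
pg_mod <- aov(weight ~ group, data = PlantGrowth)
summary(pg_mod) # overall test of differences among groups
pg_tuk <- TukeyHSD(pg_mod, conf.level = 0.95)
pg_tuk$group # one row per pairwise comparison: diff, lwr, upr, p adj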
# reorder community by elevation
motinv_aov$ComCode_fac <- factor(motinv_aov$ComCode, levels = c("BAR", "ASC", "FUC", "RED"))

ggplot(data = motinv_aov, aes(x = ComCode_fac, y = mean_count_fix)) +
  stat_summary(geom = 'bar', fun.data = mean_se, fill = 'grey', color = 'dimgrey') +
  stat_summary(geom = 'errorbar', fun.data = mean_se, color = 'dimgrey', width = 0.3) +
  labs(x = "Community Type", y = "Mean common periwinkle count") +
  annotate("text", x = 1, y = 30, label = "AB", size = 5) + # annotate() draws each label once
  annotate("text", x = 2, y = 70, label = "A", size = 5) +
  annotate("text", x = 3, y = 125, label = "B", size = 5) +
  annotate("text", x = 4, y = 118, label = "B", size = 5) +
  theme_bw()

#------------------------------------
#     Day 3: Best Practices Code 
#------------------------------------
# libraries
library(dplyr) # for mutate and filter

# parameters
analysis_year <- 2023

# import data set
photo_dat <- read.csv("./data/SHIHAR_photoplot_cover.csv")

# Filtering on Barnacle community type and analysis year
photo_dat2 <- photo_dat |> filter(CommunityType == "Barnacle") |> 
                           filter(Year == analysis_year) 
snake_case # most common in R
camelCase # capitalize new words after the first
period.separation # separate words by periods
whyWOULDyouDOthisTOsomeone # excess capitalization is a pain
# good word order
ACAD_rocky <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_rocky2 <- ACAD_rocky |> filter(year > 2020)
ACAD_rocky3 <- ACAD_rocky2 |> mutate(plot_type = "vital signs")

# bad word order
rocky_ACAD <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_after_2020 <- rocky_ACAD |> filter(year > 2020)
vital_ACAD_2020 <- ACAD_after_2020 |> mutate(plot_type = "vital signs")

# super long names
ACAD_rocky_intertidal_sampling_data <- data.frame(years_plots_were_sampled = c(2020:2025), wetland_plots_sampled = c(1:6))
ACAD_rocky_intertidal_sampling_data2 <- ACAD_rocky_intertidal_sampling_data |> filter(years_plots_were_sampled > 2020)

# shorter still meaningful
ACAD_rocky <- data.frame(year = 2020:2025, plot = 1:6)
ACAD_rocky2 <- ACAD_rocky |> filter(year > 2020)
# Good code
trees_final <- trees |> 
  mutate(DecayClassCode_num = as.numeric(DecayClassCode),
         Plot_Name = paste(ParkUnit, PlotCode, sep = "-"),
         Date = as.Date(SampleDate, format = "%m/%d/%Y")) |> 
  rename("Species" = "ScientificName") |> 
  filter(IsQAQC == FALSE) |> 
  select(-DecayClassCode) |> 
  arrange(Plot_Name, TagCode)

# Same code, but much harder to follow
trees_final <- trees|>mutate(DecayClassCode_num=as.numeric(DecayClassCode), Plot_Name=paste(ParkUnit,PlotCode,sep = "-"),  Date=as.Date(SampleDate,format="%m/%d/%Y"))|> rename("Species"="ScientificName")|>filter(IsQAQC==FALSE)|>select(-DecayClassCode)|>arrange(Plot_Name,TagCode)
# Good code
ggplot(data = visits, aes(x = Year, y = Annual_Visits/1000)) +
  geom_line() + 
  geom_point(color = "black", fill = "#82C2a3", size = 2.5, shape = 24) +
  labs(x = "Year", 
       y = "Annual visitors in 1000's") +
  scale_y_continuous(limits = c(2000, 4500),
                     breaks = seq(2000, 4500, by = 500)) + 
  scale_x_continuous(limits = c(1994, 2024),
                     breaks = c(seq(1994, 2024, by = 5))) + 
  theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1), 
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'),
        title = element_text(size = 10) 
        )

# Same code but hard to follow
ggplot(data=visits,aes(x=Year,y=Annual_Visits/1000))+geom_line()+geom_point(color="black",fill="#82C2a3",size=2.5,shape=24) +
labs(x = "Year", y = "Annual visitors in 1000's")+
scale_y_continuous(limits=c(2000,4500),breaks=seq(2000,4500,by=500))+ 
scale_x_continuous(limits=c(1994,2024),breaks=c(seq(1994,2024,by=5)))+ 
theme(axis.text.x=element_text(size=10,angle=45,hjust=1), panel.grid.major=element_blank(), 
panel.grid.minor=element_blank(),panel.background=element_rect(fill='white',color='dimgrey'),
title = element_text(size = 10))
#------------------------------------
#     Day 1: Challenges Code 
#------------------------------------
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
motinv[c(2, 4, 6, 8), c(1, 2)]
names(motinv) # get the names of the first 2 columns
motinv[c(2, 4, 6, 8), c("Network", "UnitCode")]
# Option 1
length(unique(motinv[, "ScientificName"])) # 6
# Option 2
length(unique(motinv$ScientificName)) # equivalent
# Option 1 - use unique() to return just the unique years
unique(motinv$Year[motinv$QAQC == TRUE]) # 2013
# Option 2
unique(motinv[motinv$QAQC == TRUE, "Year"])
# with brackets
A1_2024 <- motinv[motinv$PlotName == "A1" & motinv$Year == 2024, ]
nrow(A1_2024) # 3
# with base R subset
A1_2024b <- subset(motinv, PlotName == "A1" & Year == 2024)
nrow(A1_2024b) # 3
# Option 1 - with brackets
gcrab <- motinv[motinv$ScientificName == "Carcinus maenas",]
sort(unique(gcrab$Year)) #2019, 2021, 2022, 2023, 2024

# Option 2 - with base R subset
gcrab2 <- subset(motinv, ScientificName == "Carcinus maenas")
table(gcrab2$Year)
View(motinv)
max_nd <- max(motinv$No.Damage, na.rm = TRUE)
motinv[motinv$No.Damage == max_nd,]
# create copy of motinv data
motinv_fix <- motinv

# find the problematic value, and change it to 196
motinv_fix$No.Damage[motinv_fix$Year == 2019 & 
                     motinv_fix$PlotName == "R4" & 
                     motinv_fix$No.Damage == 1960] <- 196

# check your work
range(motinv$No.Damage) # 0 1960
range(motinv_fix$No.Damage) # 0 282

hist(as.numeric(motinv$Damage))
#------------------------------------
#     Day 2: Challenges Code 
#------------------------------------
library(dplyr)
library(tidyr) # for pivot_wider() and pivot_longer()
#--- Point intercept data ---
pi_dat <- read.csv("./data/BASHAR_Point_Intercept_data.csv")

#--- Motile invert count ---
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")

#--- Motile invert site ---
motspp <- read.csv("./data/motile_invert_species_table.csv")

#--- hobo temp data ---
temp_data <- read.csv("./data/HOBO_temp_example.csv", skip = 1)[,1:3]
colnames(temp_data) <- c("index", "date_time", "tempF")

# with dplyr filter()
A1_2024 <- motinv |> filter(PlotName == "A1" & Year == 2024)
nrow(A1_2024) # 3
gcrab <- motinv |> filter(ScientificName == "Carcinus maenas") |> 
  select(Year) |> unique()

gcrab
max_nd <- max(motinv$No.Damage, na.rm = TRUE)
motinv |> filter(No.Damage == max_nd)
# dplyr approach
motinv_fix <- motinv |> mutate(No.Damage = replace(No.Damage, No.Damage == 1960, 196))
range(motinv$No.Damage)
range(motinv_fix$No.Damage)

pred <- c("CARMAE", "NUCLAP")

# base R
motinv$trophic <- ifelse(motinv$SpeciesCode %in% pred, "predator", "herbivore")
table(motinv$trophic, motinv$SpeciesCode)

# tidyverse
motinv <- motinv |> mutate(trophic = ifelse(SpeciesCode %in% pred, "predator", "herbivore"))
table(motinv$trophic, motinv$SpeciesCode)
# Base R using a nested ifelse()
motinv$count_level <- 
  ifelse(motinv$No.Damage > 35, "High", 
         ifelse(motinv$No.Damage >= 10 & motinv$No.Damage <= 35, "Medium", "Low"))

table(motinv$count_level) # check that it worked
# Tidyverse using case_when() and between()
motinv <- motinv |> mutate(count_level = case_when(No.Damage > 35 ~ "High",
                                                   between(No.Damage, 10, 35) ~ "Medium",
                                                   No.Damage < 10 ~ "Low"))

table(motinv$count_level) # check that it worked
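# Illustrative alternative (not in the original training code): base R cut()
# can build the same three bins in one call. This sketch assumes counts are
# whole numbers, so a break at 9 puts 10 in "Medium". `toy_counts` is made up.
toy_counts <- c(0, 9, 10, 35, 36, NA)
toy_level <- cut(toy_counts, 
                 breaks = c(-Inf, 9, 35, Inf), # bins: (-Inf, 9], (9, 35], (35, Inf]
                 labels = c("Low", "Medium", "High"))
table(toy_level, useNA = "ifany") # NA counts stay NA rather than being binned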
pi_nonveg <- pi_dat |> filter(CoverCode %in% c("BOLT", "ROCK", "WATER")) |> # filter nonveg grps
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, CoverCode, CoverType)) # grouping variables 

head(pi_nonveg) # check output
pi_subtype <- pi_dat |>
  mutate(sub_type = ifelse(CoverCode %in% c("BOLT", "ROCK", "WATER"), "nonveg", "veg")) |> # label nonveg vs. veg grps
  summarize(avg_freq = mean(pct_freq), # calc avg.
            .by = c(SiteCode, Year, sub_type)) |> # grouping variables 
  arrange(SiteCode, Year, sub_type) # sort variables

head(pi_subtype) # check output
# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196), # fix typo in No.Damage
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
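# Illustrative sketch (not in the original training code): sum() and sd()
# return NA if any value is NA, so one missing plot count makes the whole
# site-level mean and SE missing. One possible option is a small helper that
# drops NAs and uses the number of non-missing plots. Whether to divide by 5
# or by the plots actually counted is an analysis decision. The name `se_na`
# and the values below are hypothetical.
se_na <- function(x) {
  x <- x[!is.na(x)]        # drop missing values
  sd(x) / sqrt(length(x))  # SE based on the plots actually counted
}
toy_total <- c(12, 15, NA, 9, 14)
sd(toy_total) / sqrt(5) # NA because of the missing value
se_na(toy_total)        # SE from the 4 non-missing values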
motinv_wide <- motinv_sum |> 
  arrange(SpeciesCode) |> # sorting so columns are alphabetical 
  select(-ScientificName, -CommonName, -se_counts) |> # drop se_counts so rows collapse to 1 per site, year, community
  pivot_wider(names_from = SpeciesCode,
              values_from = mean_count, 
              values_fill = 0)

head(motinv_wide)

# Fix the data issues again
motinv <- motinv |> 
  mutate(NoDamage_fix = replace(No.Damage, No.Damage == 1960, 196), # fix typo in No.Damage
         Damage_fix = as.numeric(replace(Damage, Damage == "PM", NA)),
         total_count = NoDamage_fix + Damage_fix) |> 
  filter(QAQC == FALSE)

# Summarize the mean count per plot of each species by year and community type
motinv_sum <- motinv |> 
  summarize(mean_count = sum(total_count)/5, # 5 plots per site
            se_counts = sd(total_count)/sqrt(5), # 5 plots per site
            .by = c(SiteCode, Year, CommunityType, 
                    ScientificName, CommonName, SpeciesCode))
motinv_wide_yr <- motinv_sum |> 
  arrange(Year) |> # sorting so year columns are in chronological order
  select(-se_counts) |> 
  pivot_wider(names_from = Year,
              values_from = mean_count, 
              values_fill = 0, 
              names_prefix = "yr_")

head(motinv_wide_yr)
motinv_long_yr <- pivot_longer(motinv_wide_yr, 
                               cols = -c(SiteCode, CommunityType, ScientificName, 
                                         CommonName, SpeciesCode),
                               names_to = "Year", 
                               values_to = "mean_counts", 
                               names_prefix = "yr_") # drops this string from values
# Read in motinv data if you haven't yet
motinv <- read.csv("./data/BASHAR_motile_invert_counts.csv")
head(motinv)

# Read in species table
motspp <- read.csv("./data/motile_invert_species_table.csv")
head(motspp)

intersect(names(motinv), names(motspp)) # 3 columns in common
# left join species to motinv, because don't want to include species not found in count data
motinv_spp <- left_join(motinv, 
                        motspp, 
                        by = c("SpeciesCode", "ScientificName", "CommonName"))

head(motinv_spp)
# anti join to find species in the species table that have no records in the count data
anti_join(motspp, motinv, by = c("SpeciesCode", "ScientificName", "CommonName"))
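# Illustrative sketch (not in the original training code): the idea behind
# left_join() and anti_join() shown with two tiny made-up tables and base R.
# The species codes come from the code above, but the counts and the `toy_`
# objects are hypothetical. Note merge() sorts rows by the key, unlike left_join().
toy_cnt_tbl <- data.frame(SpeciesCode = c("LITLIT", "CARMAE", "LITLIT"),
                          count = c(10, 2, 7))
toy_spp_tbl <- data.frame(SpeciesCode = c("LITLIT", "CARMAE", "NUCLAP"),
                          trophic = c("herbivore", "predator", "predator"))

# like left_join(toy_cnt_tbl, toy_spp_tbl): keep every count row, add species info
toy_left <- merge(toy_cnt_tbl, toy_spp_tbl, by = "SpeciesCode", all.x = TRUE)

# like anti_join(toy_spp_tbl, toy_cnt_tbl): species with no rows in the count data
toy_anti <- toy_spp_tbl[!toy_spp_tbl$SpeciesCode %in% toy_cnt_tbl$SpeciesCode, ]
toy_anti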

# Create date1 if you don't have it already
date1 <- as.Date("3/12/2026", format = "%m/%d/%Y")
format(date1, format = "%Y%m%d")
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "3 months")
date_list <- as.Date(c("01/01/2026", "12/31/2026"), format = "%m/%d/%Y") 
seq.Date(date_list[1], date_list[2], by = "1 week")
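# note: assumes temp_data$timestamp already exists as a date-time (POSIXct) column created from date_time
temp_data$month_num <- as.numeric(format(temp_data$timestamp, "%m"))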
head(temp_data)
temp_data$julian <- as.numeric(format(temp_data$timestamp, "%j"))
head(temp_data)
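# Illustrative sketch (not in the original training code): month and
# day-of-year can only be extracted with format() once a character column has
# been converted to a date-time. The date-time format and time zone below are
# assumptions for made-up values; check them against your own file.
toy_temp <- data.frame(date_time = c("03/12/2026 14:30", "12/31/2026 23:00"))
toy_temp$timestamp <- as.POSIXct(toy_temp$date_time, 
                                 format = "%m/%d/%Y %H:%M", tz = "UTC")
toy_temp$month_num <- as.numeric(format(toy_temp$timestamp, "%m")) # 3, 12
toy_temp$julian <- as.numeric(format(toy_temp$timestamp, "%j"))    # 71, 365
toy_temp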
#------------------------------------
#       Day 3: Challenges Code 
#------------------------------------
# packages
library(dplyr)
library(ggplot2)
library(patchwork) # for arranging ggplot objects
library(RColorBrewer) # for palettes
library(viridis) # for palettes

# load data
pcov <- read.csv("./data/SHIHAR_photoplot_cover.csv") # import data

# define color and shape objects
cols <- c("ASCNOD" = "#C5B47B", "BARSPP" = "#A9A9A9", 
         "NONCOR" = "#574F91", "FUCSPP" = "#FFD560",
         "MUSSPP" = "#6F88BF", "REDGRP" = "#FF4C53")

shps <- c("ASCNOD" = 23, "BARSPP" = 24, "NONCOR" = 23,
          "FUCSPP" = 25, "MUSSPP" = 23, "REDGRP" = 25)

# Set x axis range
xrange <- range(pcov$Year)

ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 1.5) + # changed this line
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 1.2) + # changed this line
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") + 
  scale_shape_manual(values = shps, name = "Species Group") + 
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg Percent Cover") + # changed this line
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
p6 <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) +
  scale_fill_manual(values = cols, aesthetics = c("fill", "color"), 
                    name = "Species Group") +
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = NULL, y = "Avg. Percent Cover") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), # turns off major grids
        panel.grid.minor = element_blank(), # turns off minor grids
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)

p6 + geom_smooth(se = F, span = 0.75)
p6 + geom_smooth(method = 'lm', se = F, linetype = 'dashed')
p6 + theme(legend.title = element_text(size = 12, face = 'bold'),
           legend.text = element_text(size = 11))
p_pal <- ggplot(data = pcov, aes(x = Year, y = avg_cover, 
                        color = CoverCode, group = CoverCode,
                        fill = CoverCode, shape = CoverCode)) +
  geom_errorbar(aes(ymin = avg_cover - se_cover, ymax = avg_cover + se_cover), 
                linewidth = 0.6) +
  geom_point(color = "dimgrey", size = 2.5) + 
  scale_shape_manual(values = shps, name = "Species Group") +
  scale_x_continuous(limits = c(xrange[1] - 1, xrange[2] + 1),
                     breaks = c(seq(xrange[1] - 1, xrange[2] + 1, 2))) +
  labs(x = "Year", y = "Avg Percent Cover") + # changed this line
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_rect(fill = 'white', color = 'dimgrey'), 
        legend.key = element_blank()) +
  facet_wrap(~CommunityType)
p_pal +  scale_color_brewer(name = "Species Group", palette = "RdYlBu", 
                            aesthetics = c("fill", "color"))  
# Create p_heat for 'palettes' section
p_heat <- 
ggplot(chem, aes(x = mon, y = year, color = Temp_F, fill = Temp_F)) + 
  theme_bw() +
  geom_tile() + 
  labs(y = "Year", x = "Month") +
  scale_x_continuous(breaks = c(5, 6, 7, 8, 9, 10),
                     limits = c(4, 11), 
                     labels = month.abb[5:10]) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p_heat + scale_color_gradient2(low = "#3E693D", mid = "#FDFFC7", high = "#7A6646", 
                               aesthetics = c("fill", 'color'),
                               midpoint = mean(chem$Temp_F), 
                               name = "Temp. (F)") 
?plot
?dplyr::filter
mean_x <- mean(c(1, 3, 5, 7, 8, 21) # missing closing parentheses
mean_x <- mean(c(1, 3, 5, 7, 8, 21)) # correct
               
birds <- c("black-capped chickadee", "golden-crowned kinglet, "wood thrush") # missing quote after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected

birds <- c("black-capped chickadee", "golden-crowned kinglet" "wood thrush") # missing comma after kinglet
birds <- c("black-capped chickadee", "golden-crowned kinglet", "wood thrush") # corrected
x_mean <- maen(x) # misspelled mean
x_mean <- mean(x) # Corrected

# Missing comma to indicate subsetting rows (records)
motinv <- motinv[!is.na(motinv$SiteCode)]
# Correct
motinv <- motinv[!is.na(motinv$SiteCode),]
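
# Illustrative sketch (not in the original training code): the missing-comma
# mistake above reproduced on a tiny made-up data frame. Without the comma,
# R treats the logical vector as a column selector. tryCatch() is used here
# only so the script keeps running after the error. `toy_df` is hypothetical.
toy_df <- data.frame(site = c("A", NA, "C"), count = c(1, 2, 3))
toy_err <- tryCatch(toy_df[!is.na(toy_df$site)], # no comma: tries to select columns
                    error = function(e) conditionMessage(e))
toy_err # the error message returned by R
toy_ok <- toy_df[!is.na(toy_df$site), ] # comma: selects rows, keeps all columns
nrow(toy_ok) # 2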