Week 2 (Activity A) Description
A marketing consultant observed 50 consecutive shoppers at a grocery store, and recorded how much money each shopper spent in the store. The dataset is listed below.
First things first: create a new RStudio Project. Using a separate project for each assignment is probably best. Don’t worry about version control unless you either a) plan on working on these assignments across multiple devices, or b) already know and are comfortable with Git or Subversion.
You’ll need to use this spending.csv data for your assignment. The data is also displayed below.
spending = amount spent in USD. Use the following R code to import the data; it loads the file remotely. Alternatively, you can download the file, keep a local copy, and use dfa <- read.csv("spending.csv", header = TRUE) (this assumes your .csv is in the project’s working directory).
Why are we calling it dfa? df is often used as an abbreviation for data frame, which is what our new csv/table is called in R. It’s dfa because it’s the data frame for Activity A. It’s important to name data and variables meaningfully.
To use the spending column you’ll need to reference the dataframe AND the variable: dfa$spending.
    # Load knitr, which provides kable().
    library(knitr)

    # Create the dataframe called dfa.
    # This loads the data from the remote .csv file and saves it in our environment.
    dfa <- read.csv(url("https://302.ryanstraight.com/spending.csv"), header = TRUE)

    # Display our newly found data as a nice-looking table.
    # You could also simply type dfa. Try them both out.
    kable(dfa, caption = "Spending")
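To illustrate referencing a column through its data frame, here is a minimal sketch. It uses a small made-up data frame named example_df (a hypothetical stand-in, not the course data) so that it runs without downloading anything; with the real data you would write dfa$spending instead.

```r
# Hypothetical stand-in for dfa; these values are NOT the real spending data.
example_df <- data.frame(spending = c(12.50, 40.25, 7.80, 22.10))

# Reference the data frame AND the variable with the $ operator.
spending_values <- example_df$spending
print(spending_values)

# The same pattern works inside any function call.
example_mean <- mean(example_df$spending)
print(example_mean)
```

Typing only spending would fail with an "object not found" error, because the column lives inside the data frame rather than in your environment.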
Make sure the echo = TRUE flag is set on your knitr::opts_chunk$set(echo = FALSE) line in the r setup chunk! If it is set to FALSE, switch it to TRUE. This will display your code and the results.
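For reference, the line in the setup chunk would look something like the following once it is switched. This is only a sketch of that single line; the rest of your setup chunk may contain other options.

```r
# Goes inside the setup chunk near the top of your .Rmd file.
# echo = TRUE tells knitr to show each chunk's code along with its output.
knitr::opts_chunk$set(echo = TRUE)
```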
For this activity, create and submit a document with the following (doing the coding in an R script file and then putting that code in an RMarkdown file is required!):
- Summarize the data by creating and describing the following descriptive statistics:
  - standard deviation
  - interquartile range
  - (optional) any other descriptive statistics you find interesting
- Show with a histogram that, although the distribution of the data is slightly skewed with a long right tail, it is approximately normally distributed.
- It’s easiest to write your code in a .R file (called an R script file) so you can easily test it while working. Then, when you’ve got everything above taken care of, create a .Rmd file (RMarkdown) and use that to present your data rather than simply turning in code and the results. Here’s a great write-up on how code from an R script can be used in an R Markdown file.
- For this assignment and all others, having read the introduction to RMarkdown page is absolutely key.
- This is very likely going to take some trial and error. Set aside 2-3 times the amount of time you think this will take to account for fixing errors and debugging. R code is relatively straightforward and easy to use, but it can be somewhat intimidating to a beginner. You’re encouraged to read through most of the R Markdown book, as it will make things much easier on you in the long run. When in doubt: copy example code that works and tweak it to your specifications.
- Submitting the assignment:
- Submit both your
- Remember: the point of using this file system is reproducibility. If I can’t see the content you won’t get credit for it. That sounds obvious, right? This is why a PDF is important: if you just knit your html, you may be referencing local files in that page, files that I possibly don’t have in the same location as you. So:
- Submit both your
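
As a starting point for the descriptive statistics and histogram requested above, here is a minimal sketch using base R functions (sd, IQR, summary, hist). The vector example_spending holds hypothetical stand-in values, not the course data, so the sketch runs on its own; in your assignment you would pass dfa$spending instead. The plot title and axis label are likewise only suggestions.

```r
# Hypothetical stand-in values so the sketch runs on its own;
# in the assignment you would use dfa$spending instead.
example_spending <- c(5, 8, 12, 15, 18, 20, 22, 25, 30, 60)

# Standard deviation and interquartile range.
spending_sd  <- sd(example_spending)
spending_iqr <- IQR(example_spending)

# summary() reports the minimum, quartiles, median, mean, and maximum.
print(summary(example_spending))
print(c(sd = spending_sd, iqr = spending_iqr))

# Histogram to inspect the shape of the distribution.
hist(example_spending,
     main = "Spending (example values)",
     xlab = "Amount spent (USD)")
```

When you describe the results, comparing the mean with the median is one simple way to comment on skew: a mean noticeably above the median is consistent with a longer right tail.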