Here is a collection of problems to solve if you consider yourself a medium to good R coder. I have chosen the examples in a way that should guarantee you learn something useful. My motivation was to improve people's R skills (mine included) and to create something fun. Feel free to send me your solutions and/or feedback and I will have a look (you can also attempt only parts of it).

**1. What will the following expressions evaluate to and why?**

y <- "0"; print(is.numeric(y))

X <- 0; print(x == 0)

myframe <- data.frame(a1 = rep(0, 100), a2 = -100:-1, a3 = seq(0.01, 1, 0.1)); attach(myframe); a1 <- a2 + a3 + a1; print(myframe$a1)

**2. Data Frame handling:**

- Create a data frame named myresults containing 1000 simulated participants with 3 variables: sex (binomial with p(male)=0.5), height (normal with mu=167, sd=10), and nationality (British, German, and Chinese, all equally likely).
- From myresults, create another data frame by selecting only British participants. What are the dimensions of this data frame?
- Create a third data frame that only contains German citizens who are taller than 175cm. How many of them are there? How many are male?
- Select a random subset of 100 participants for a new data frame. How many of them are Chinese?
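Below is a minimal sketch of one way to set up and query myresults. The seed, the "male"/"female" labels and the helper object names (british, tall_germans, random100) are illustrative choices, not part of the exercise. Because the data are simulated, the exact counts depend on the seed.

```r
set.seed(42)  # illustrative seed, only for reproducibility
n <- 1000
myresults <- data.frame(
  # one Bernoulli draw per participant, p(male) = 0.5
  sex = ifelse(rbinom(n, size = 1, prob = 0.5) == 1, "male", "female"),
  height = rnorm(n, mean = 167, sd = 10),
  nationality = sample(c("British", "German", "Chinese"), n, replace = TRUE)
)

# Logical indexing keeps only the British rows (all 3 columns)
british <- myresults[myresults$nationality == "British", ]
dim(british)

# subset() evaluates the condition inside the data frame
tall_germans <- subset(myresults, nationality == "German" & height > 175)
nrow(tall_germans)
sum(tall_germans$sex == "male")

# sample() draws 100 row indices without replacement
random100 <- myresults[sample(nrow(myresults), 100), ]
sum(random100$nationality == "Chinese")
```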

**3. The apply-family:**

Given the following matrix:

- Print the dimensions of that matrix.
- Calculate the mean of each column by using a loop.
- Calculate the median of each row by using apply.
- Calculate the standard error for every second column and store the result in a list.
- Change the matrix into a data frame. Then turn the data frame into a single vector in which the last column of the former data frame supplies the first 100 observations, the penultimate column the next 100 observations, and so forth.
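The matrix for this exercise is not reproduced here. The sketch below therefore uses a hypothetical stand-in: a 100 x 10 matrix of standard normal values. The 100 rows match the "100 observations" per column in the last task. The standard error is taken to be the standard error of the mean, sd / sqrt(n).

```r
set.seed(1)
# Hypothetical stand-in for the exercise's matrix: 100 rows, 10 columns
mymatrix <- matrix(rnorm(1000), nrow = 100, ncol = 10)

dim(mymatrix)  # 100 10

# Column means with an explicit loop (colMeans() would do this directly)
colmeans_loop <- numeric(ncol(mymatrix))
for (j in seq_len(ncol(mymatrix))) {
  colmeans_loop[j] <- mean(mymatrix[, j])
}

# Row medians with apply(); MARGIN = 1 means "over rows"
rowmedians <- apply(mymatrix, 1, median)

# Standard error of the mean for columns 1, 3, 5, ..., collected in a list
se_list <- lapply(seq(1, ncol(mymatrix), by = 2),
                  function(j) sd(mymatrix[, j]) / sqrt(nrow(mymatrix)))

# Reverse the column order, then stack the columns into one vector
mydf <- as.data.frame(mymatrix)
myvector <- unlist(mydf[, ncol(mydf):1], use.names = FALSE)
```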

**4. Everyday encounters:**

- What is the difference between ? and ?? in R?
- What is the difference between require and library?
- How would you find out what elements an R-object contains?
- What is the difference between <-, =, and <<- in R? When would you use which?
- What is the difference between tapply, lapply, and apply?
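Two of these questions can be explored directly in the console. The sketch below uses illustrative object names (counter, increment_local, increment_global, m). It shows `<-` versus `<<-` inside a function, and what apply, lapply and tapply each iterate over.

```r
# `<-` inside a function creates a local variable; `<<-` assigns to the
# variable found in an enclosing environment (here the global one)
counter <- 0
increment_local  <- function() { counter <- counter + 1; counter }
increment_global <- function() { counter <<- counter + 1; counter }
increment_local()   # returns 1, but the global counter is still 0
increment_global()  # returns 1 and the global counter is now 1

# apply(): over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)                      # row sums: 9 12
# lapply(): over the elements of a list/vector, always returns a list
lapply(list(a = 1:3, b = 4:6), mean)
# tapply(): over groups of a vector defined by a factor
tapply(mtcars$mpg, mtcars$cyl, mean)  # mean mpg per number of cylinders
```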

**5. Further Functions:**

- Write a function that takes a number mynumb as an input and returns FALSE if the number is even and TRUE otherwise.
- Write a function that takes a vector as input and returns a single value that is the product of all elements in the vector, without using a loop. For example:
mycumproduct(1:4) == 24

Should evaluate to TRUE.

- Write a function permute that takes in a number n and returns a list of all possible permutations of 1:n. For example:
Should evaluate to TRUE.

- Write a function prime that takes in a number n and returns all prime numbers up to n. For example:
prime(9) == c(2, 3, 5, 7)

Should evaluate to TRUE.
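Possible solutions to the four function exercises are sketched below. The exercise does not name the odd/even function, so is_odd is a hypothetical name. permute uses straightforward recursion, and prime uses the Sieve of Eratosthenes. Both are one choice among several valid approaches.

```r
# Hypothetical name: TRUE for odd numbers, FALSE for even ones
is_odd <- function(mynumb) {
  mynumb %% 2 != 0  # %% is R's modulo operator
}

# prod() multiplies all elements without an explicit loop
mycumproduct <- function(x) {
  prod(x)
}

# Recursive permutations: take each element in turn as the first one and
# prepend it to every permutation of the remaining elements
permute <- function(n) {
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (p in perms(v[-i])) {
        out[[length(out) + 1]] <- c(v[i], p)
      }
    }
    out
  }
  perms(seq_len(n))
}

# Sieve of Eratosthenes: cross out the multiples of every prime up to sqrt(n)
prime <- function(n) {
  if (n < 2) return(integer(0))
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  i <- 2
  while (i * i <= n) {
    if (is_prime[i]) is_prime[seq(i * i, n, by = i)] <- FALSE
    i <- i + 1
  }
  which(is_prime)
}

mycumproduct(1:4) == 24         # TRUE
length(permute(3))              # 6
all(prime(9) == c(2, 3, 5, 7))  # TRUE
```

Note that `prime(9) == c(2, 3, 5, 7)` compares element-wise and returns a vector of four TRUE values. Wrap it in all() to get a single TRUE.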

**6. Further Modelling and Plotting:**

- Have a look at the mtcars data set. Plot a histogram of mpg (miles per gallon) for the different cars using both hist from base R and truehist from the package MASS. Why do you think truehist is called **true**hist?
- Using lm, regress mpg on the variables gear and carb (i.e., mpg ~ gear + carb) and look at the summary of the model. Call the resulting model msq.
- Plot the residuals vs. fitted values of msq and judge the model's quality.
- What does the rlm command from MASS do? Using rlm, repeat the regression from above and look at the rlm summary. Call this model mrl.
- Using the package ggplot2, create one figure with two panels (using facet_wrap), each showing the residuals vs. fitted values of one model: the left panel for msq and the right one for mrl. Additionally, add a smooth line to both panels using stat_smooth.
- Which model would you prefer?
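Here is a sketch of the modelling steps, assuming mpg is the response and gear and carb are the predictors. The long data frame resids is an illustrative helper that stacks both models' residuals so facet_wrap can split them into panels. The ggplot2 part only runs if that package is installed.

```r
library(MASS)  # truehist() and rlm(); ships with standard R installations

hist(mtcars$mpg, xlab = "mpg")
truehist(mtcars$mpg, xlab = "mpg")  # plots densities by default (prob = TRUE)

# Ordinary least squares
msq <- lm(mpg ~ gear + carb, data = mtcars)
summary(msq)
plot(fitted(msq), resid(msq), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Robust regression (M-estimation, Huber weights by default), downweights outliers
mrl <- rlm(mpg ~ gear + carb, data = mtcars)
summary(mrl)

# Long format: one row per car and model; factor levels fix the panel order
resids <- data.frame(
  model  = factor(rep(c("msq", "mrl"), each = nrow(mtcars)), levels = c("msq", "mrl")),
  fitted = unname(c(fitted(msq), fitted(mrl))),
  resid  = unname(c(resid(msq), resid(mrl)))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(resids, aes(x = fitted, y = resid)) +
    geom_point() +
    stat_smooth(method = "loess", formula = y ~ x) +
    facet_wrap(~ model)  # left panel msq, right panel mrl
  print(p)
}
```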

**7. Further Packages:**

- What does the %>% operator from the package magrittr do?
- What does the | operator from the package lattice do? Use it to create one scatterplot of mpg for each class of gears in the mtcars data set we have used above.
- What does the following code do?
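For the %>% and | questions in this section, here is a small sketch. The exercise does not say what to plot mpg against, so the vehicle weight wt is an assumption. magrittr is optional, so that part is guarded.

```r
library(lattice)  # ships with standard R installations

# In a lattice formula, `y ~ x | g` means "y against x, conditioned on g":
# one panel per number of gears
p <- xyplot(mpg ~ wt | factor(gear), data = mtcars)
print(p)

# magrittr's %>% passes its left-hand side as the first argument of the call
# on its right-hand side: x %>% f(y) is the same as f(x, y)
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  # Equivalent to print(nrow(subset(mtcars, gear == 4)))
  mtcars %>% subset(gear == 4) %>% nrow() %>% print()
}
```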

**8. Performance in R:**

- Install the package microbenchmark.
- Write a function allcomparisons1 that takes a data frame as an input and creates a data frame of all possible row-wise comparisons by using simple for-loops.
- Microbenchmark this function on the mtcars data set:
library(microbenchmark); timeone <- microbenchmark(allcomparisons1(mtcars), times = 100)

- What does the following code do?
- Microbenchmark this function and compare the performance of the two functions.
- Install the Rcpp package.
- Save the following code as a .cpp file named allcomparisons3.cpp in your current working directory:
#include <Rcpp.h>
using namespace Rcpp;
// Returns the differences data[i, ] - data[j, ] for every pair of rows i < j
// [[Rcpp::export]]
DataFrame allcomparisons3(NumericMatrix data) {
  int nrow = data.nrow();
  int ncol = data.ncol();
  int size = ((nrow - 1) * nrow) / 2;  // number of row pairs
  NumericMatrix out(size, ncol);
  int counter = 0;
  for (int i = 0; i < nrow - 1; i++) {
    for (int j = i + 1; j < nrow; j++) {
      for (int k = 0; k < ncol; k++) {
        out(counter, k) = data(i, k) - data(j, k);
      }
      counter++;
    }
  }
  return Rcpp::DataFrame::create(Named("comparisons") = out);
}

Now you can compile and load this function with:

library(Rcpp); sourceCpp("allcomparisons3.cpp")

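You can then call it like any other R function. Because it takes a NumericMatrix, convert the data frame first, e.g. allcomparisons3(as.matrix(mtcars)).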

- Compare all three functions by microbenchmarking them. What do you conclude?
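The exercise does not define a "comparison". The Rcpp function allcomparisons3 computes the difference between every pair of rows (i < j), so the sketch below assumes the same definition for a plain-loop allcomparisons1. allcomparisons_vec is a hypothetical, vectorized variant based on combn(). It is not the code the exercise refers to, but it gives the loop version something to be benchmarked against. The benchmark only runs if microbenchmark is installed.

```r
# Plain for-loops, mirroring the C++ logic: one output row per pair i < j
allcomparisons1 <- function(data) {
  data <- as.matrix(data)
  n <- nrow(data)
  out <- matrix(NA_real_, nrow = n * (n - 1) / 2, ncol = ncol(data),
                dimnames = list(NULL, colnames(data)))
  counter <- 1
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      for (k in seq_len(ncol(data))) {
        out[counter, k] <- data[i, k] - data[j, k]
      }
      counter <- counter + 1
    }
  }
  as.data.frame(out)
}

# Hypothetical vectorized variant: combn() lists all pairs in the same order
# as the loops, (1,2), (1,3), ..., (n-1,n); one matrix subtraction does the rest
allcomparisons_vec <- function(data) {
  data <- as.matrix(data)
  rownames(data) <- NULL  # avoid duplicated row names in the result
  pairs <- combn(nrow(data), 2)
  as.data.frame(data[pairs[1, ], , drop = FALSE] - data[pairs[2, ], , drop = FALSE])
}

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    loop = allcomparisons1(mtcars),
    vectorized = allcomparisons_vec(mtcars),
    times = 20
  ))
}
```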