A quiz to improve your R-skills

Here is a collection of problems to solve if you consider yourself a medium to good R-coder. I have chosen the examples so that, hopefully, you will learn something useful from each of them. My motivation was to improve people’s R-skills (mine included) and to create something fun. Feel free to send me your solutions and/or feedback and I will have a look (you can also attempt just parts of it).

1. What will the following expressions evaluate to and why?

y <- "0"
print(is.numeric(y))

X <- 0
print(x == 0)

z <- 1/0
print(is.na(z))

length(matrix(0, nrow=10, ncol=10))

myframe <- data.frame(a1 = rep(0, 100), a2 = -100:-1, a3 = seq(0.01, 1, 0.1))
attach(myframe)
a1 <- a2 + a3 + a1
print(myframe$a1)

amazingstop <- function(x, y) {
    x + 20
}
print(amazingstop(10, stop("You R amazing!")))

f <- function(x) {
    f <- function(x) {
        f <- function(x) {
            x ^ 2
        }
        f(x) + 1
    }
    f(x) * 2
}
f(5)

mutual <- local({
    f <- function(y) g(y)
    g <- function(x) if(x) x * f(x-1) else 1
})
sapply(1:6, mutual)

2. Data Frame handling:

  1. Create a data frame named myresults containing 1000 simulated participants with 3 variables: sex (binomial with p(male)=0.5), height (normal with mu=167, sd=10), and nationality (British, German, and Chinese, all equally likely).
  2. From myresults, create another data frame by selecting only British participants. What are the dimensions of this data frame?
  3. Create a third data frame that only contains German citizens who are taller than 175cm. How many of them are there? How many are male?
  4. Select a random subset of 100 participants for a new data frame. How many of them are Chinese?
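As a warm-up for the tasks above, here is a minimal sketch on a made-up toy data frame. The names toy, group, and score are hypothetical and deliberately differ from the variables in the exercise; the sketch only illustrates sample(), rnorm(), and logical row indexing, and is not meant as the reference solution.

```r
set.seed(1)  # only so that the toy example is reproducible

n <- 20
toy <- data.frame(
  group = sample(c("a", "b"), n, replace = TRUE),  # equal probabilities by default
  score = rnorm(n, mean = 50, sd = 5)
)

# Rows are selected with a logical condition before the comma.
only_a <- toy[toy$group == "a", ]
dim(only_a)

# Conditions can be combined with & (and) or | (or).
high_b <- toy[toy$group == "b" & toy$score > 50, ]
nrow(high_b)

# A random subset of rows: sample row indices without replacement.
picked <- toy[sample(nrow(toy), 5), ]
table(picked$group)
```

table() counts how often each value occurs, which is one convenient way to answer "how many of them are ..." questions.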

3. The apply-family:

Given the following matrix:

mymat <- matrix(rnorm(1e4), nrow = 100, ncol = 100)
  1. Print the dimensions of that matrix.
  2. Calculate the mean of each column by using a loop.
  3. Calculate the median of each row by using apply.
  4. Calculate the standard error for every second column and store the result in a list.
  5. Change the matrix into a data frame. Then change the data frame into a single vector in which the last column of the former data frame forms the first 100 observations, the penultimate column the next 100 observations, and so forth.
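The difference between an explicit loop and the apply family can be sketched on a tiny matrix. The object toy_mat and the use of sum are assumptions chosen for illustration, so the tasks above still need their own statistics.

```r
# Toy matrix, deliberately different from mymat in the exercise.
toy_mat <- matrix(1:6, nrow = 2, ncol = 3)

# Column sums with an explicit loop over the column indices.
loop_sums <- numeric(ncol(toy_mat))
for (j in seq_len(ncol(toy_mat))) {
  loop_sums[j] <- sum(toy_mat[, j])
}

# The same with apply(): MARGIN = 2 works over columns, MARGIN = 1 over rows.
apply_sums <- apply(toy_mat, 2, sum)

# lapply() always returns a list, here for every second column only.
every_second <- seq(2, ncol(toy_mat), by = 2)
sums_as_list <- lapply(every_second, function(j) sum(toy_mat[, j]))

all(loop_sums == apply_sums)
```

Preallocating loop_sums with numeric() avoids growing the result vector inside the loop.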

4. Everyday encounters:

  1. What is the difference between ? and ?? in R?
  2. What is the difference between require and library?
  3. How would you find out what elements an R-object contains?
  4. What is the difference between <-, =, and <<- in R? When would you use which?
  5. What is the difference between tapply, lapply, and apply?

5. Further Functions:

  1. Write a function that takes a number mynumb as an input and returns FALSE if the number is even and TRUE otherwise.
  2. Write a function that takes a vector as input and, without using a loop, returns a single value that is the product of all elements in the vector. For example:
    mycumproduct(1:4) == 24

    Should evaluate to TRUE.

  3. Write a function permute that takes in a number n and returns a list of all possible permutations of 1:n. For example:
    isTRUE(all.equal(permute(2), list(c(1,2), c(2,1))))

    Should evaluate to TRUE.

  4. Write a function prime that takes in a number n and returns all prime numbers up to n. For example:
    all(prime(9) == c(2, 3, 5, 7))

    Should evaluate to TRUE.
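When checking answers like the ones in this section, it may help to remember that == compares element by element, so a vector result yields a vector of logicals rather than a single TRUE. The following sketch uses hand-written stand-in vectors (candidate is hypothetical, not the output of any function from the quiz) to show some base R ways of collapsing a comparison into one value.

```r
expected  <- c(2, 3, 5, 7)
candidate <- c(2, 3, 5, 7)  # stand-in for the result of some function call

candidate == expected            # elementwise: a logical vector of length 4
all(candidate == expected)       # collapses it to a single TRUE or FALSE
identical(candidate, expected)   # additionally compares type and attributes

# Integer versus double: == and all.equal() agree, identical() does not.
all(1:4 == c(1, 2, 3, 4))              # TRUE
isTRUE(all.equal(1:4, c(1, 2, 3, 4)))  # TRUE
identical(1:4, c(1, 2, 3, 4))          # FALSE

# Lists are compared component by component by all.equal().
isTRUE(all.equal(list(1:2, 2:1), list(c(1, 2), c(2, 1))))  # TRUE
```

all.equal() returns a character description of the differences instead of FALSE, which is why it is wrapped in isTRUE() here.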

6. Further Modelling and Plotting:

  1. Have a look at the mtcars data set. Plot a histogram of mpg (miles per gallon) for different cars by using both hist from R-base and truehist from the package MASS. Why do you think truehist is called truehist?
  2. Using lm, regress mpg on the variables gear and carb and look at the summary of the model. Call the resulting model msq.
  3. Plot the residuals vs. fitted values of msq and judge the model’s quality.
  4. What does the rlm command from the package MASS do? Using rlm, repeat the regression from above and look at the rlm summary. Call this model mrl.
  5. Using the package ggplot2, create a figure with two panels (using facet_wrap), each showing the residuals vs. fitted values of one of the two models: msq on the left and mrl on the right. Additionally, add a smooth line to both panels using stat_smooth.
  6. Which model would you prefer?

7. Further Packages:

  1. What does the %>% operator from the package magrittr do?
  2. What does the | operator do in formulas of the package lattice? Use it to create one scatterplot of mpg for each number of gears (gear) in the mtcars data set we have used above.
  3. What does the following code do?
    packages <- c('dplyr', 'nycflights13')
    lapply(packages, library, character.only = TRUE)

    data <- filter(
        summarise(
            select(
                group_by(flights, year, month, day),
                arr_delay, dep_delay
            ),
            arr = mean(arr_delay, na.rm = TRUE),
            dep = mean(dep_delay, na.rm = TRUE)
        ),
        arr > 30 | dep > 30
    )

8. Performance in R:

  1. Install the package microbenchmark.
  2. Write a function allcomparisons1 that takes a data frame as an input and creates a data frame of all possible row-wise comparisons by using simple for-loops.
  3. Microbenchmark this function for a data frame of the mtcars data set:
    library(microbenchmark)
    timeone <- microbenchmark(allcomparisons1(mtcars), times = 100)
  4. What does the following code do?
    library(plyr)

    allcomparisons2 <- function(dataf){
        combos <- combn(nrow(dataf), 2)
        dout <- adply(combos, 2, function(x) {
            out <- data.frame(dataf[x[1] , ] - dataf[x[2] , ])
            return(out)
        })
        return(dout)
    }
  5. Microbenchmark this function and compare the performance of the two functions.
  6. Install the Rcpp package.
  7. Save the following code as a .cpp file named allcomparisons3.cpp in your current working directory:
    #include <Rcpp.h>
    using namespace Rcpp;
    // [[Rcpp::export]]
    DataFrame allcomparisons3(NumericMatrix data) {
        int nrow = data.nrow();
        int ncol = data.ncol();
        int size = ((nrow - 1) * nrow) / 2;
        NumericMatrix out(size, ncol);
        int counter = 0;

        for (int i = 0; i < nrow - 1; i++) {
            for (int j = i + 1; j < nrow; j++) {
                for (int k = 0; k < ncol; k++) {
                    out(counter, k) = data(i, k) - data(j, k);
                }
                counter++;
            }
        }

        return Rcpp::DataFrame::create(Named("comparisons") = out);
    }
    

    Now you can load this function (after attaching Rcpp with library(Rcpp)) by using:

    sourceCpp("allcomparisons3.cpp")

    And use it just like any other function.
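A minimal sketch of the call, under two assumptions: Rcpp is installed, and the C++ code was saved as allcomparisons3.cpp in the working directory. Because the function is declared with a NumericMatrix argument while mtcars is a data frame, the sketch converts the data with as.matrix() first; the name cars_matrix is made up for this example. The compile step is guarded so that the snippet simply skips it when the file or the package is missing.

```r
# mtcars contains only numeric columns, so as.matrix() gives a numeric matrix.
cars_matrix <- as.matrix(mtcars)

# The C++ code allocates one output row per pair of input rows.
expected_rows <- choose(nrow(cars_matrix), 2)

# Compile and call only if Rcpp and the .cpp file are available (assumption).
if (requireNamespace("Rcpp", quietly = TRUE) &&
    file.exists("allcomparisons3.cpp")) {
  Rcpp::sourceCpp("allcomparisons3.cpp")
  res <- allcomparisons3(cars_matrix)
  print(dim(res))
}
```

Passing the matrix rather than the data frame keeps the conversion explicit, which also matters when timing the function, since the conversion cost can then be included or excluded deliberately.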

  8. Compare all three functions by microbenchmarking them. What do you conclude?