Class 6: R Functions

Author

Brian Wong (PID: A18639001)

Background

All functions in R have at least 3 things:

  • A name that we use to call the function.
  • One or more input arguments.
  • The body: the lines of R code that do the work.

Our first function

Let’s write a silly wee function called add() to add some numbers (the input arguments):

add <- function(x, y){
  x + y
}

Now we can use this function:

add(100, 1)
[1] 101
add(x=10, y=10)
[1] 20
add(x=c(100, 1, 100), y=1)
[1] 101   2 101

Q. What if I give multi-element vectors to both x and y?

add(x=c(100, 1), y=c(100, 1))
[1] 200   2
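These calls work because + in R is vectorized: it adds element by element and recycles the shorter input to match the longer one. A minimal sketch of that behavior, re-using the add() function from above (the input values are illustrative only):

```r
add <- function(x, y) {
  x + y
}

# Equal lengths: elements are added pairwise
same_len <- add(x = c(100, 1), y = c(100, 1))

# x has length 4 and y has length 2, so y is recycled twice
recycled <- add(x = c(1, 2, 3, 4), y = c(10, 20))

same_len
recycled
```

If the longer length is not a multiple of the shorter one, R still returns a result but issues a warning.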

Q. What if I give three inputs to the function?

#add(x=c(100, 1), y=1, z=1)
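The call above is commented out because it stops with an error: add() only defines x and y, so R rejects the extra z argument. One way to see the message without halting a script is base R's tryCatch(). This is a sketch, not part of the original class code:

```r
add <- function(x, y) {
  x + y
}

# Capture the error message instead of stopping the script
msg <- tryCatch(
  add(x = c(100, 1), y = 1, z = 1),
  error = function(e) conditionMessage(e)
)
msg
```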

Q. What if I give only one input to the add function?

addnew <- function(x, y=1){
  x + y
}
addnew(x=100)
[1] 101
addnew(c(100,1), 100)
[1] 200 101

If an input argument has no default value, the user must supply it every time they call the function. We can make an argument optional by giving it a sensible “default” value in the function definition, e.g. y=1 in the addnew() function.
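A minimal sketch contrasting the two cases, assuming the add() and addnew() definitions from above. tryCatch() is used only so the missing-argument error is captured rather than stopping the script:

```r
add <- function(x, y) {
  x + y
}
addnew <- function(x, y = 1) {
  x + y
}

# add() has no default for y, so calling it with one input fails
res <- tryCatch(
  add(x = 100),
  error = function(e) conditionMessage(e)
)
res

# addnew() falls back to y = 1 when y is not supplied
addnew(x = 100)

# A supplied value overrides the default
addnew(x = 100, y = 10)
```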

A second function

Let’s try something more interesting: Make a sequence generating tool…

The sample() function can be a useful starting point here:

sample(1:10, size = 4)
[1] 9 6 7 4
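Because sample() is random, the output shown in these notes will differ each time the code runs. If reproducible results are wanted, base R's set.seed() fixes the random number stream. The seed value 123 below is an arbitrary choice:

```r
# Setting the same seed before each call gives the same draw
set.seed(123)
first <- sample(1:10, size = 4)

set.seed(123)
second <- sample(1:10, size = 4)

identical(first, second)
```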

Q. Generate 9 random numbers taken from the input vector x=1:10.

sample(1:10, size = 9)
[1]  7  6  1  5  8  4 10  9  2

Q. Generate 12 random numbers taken from the input vector x=1:10.

sample(1:10, size = 12, replace = TRUE)
 [1] 9 8 7 7 6 4 9 2 8 1 4 5
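The replace = TRUE argument is needed here because we are asking for more values (12) than the input vector holds (10). A short sketch of both cases, using tryCatch() to capture the error rather than stop:

```r
# Without replacement, 12 draws from 10 values is impossible
no_replace <- tryCatch(
  sample(1:10, size = 12),
  error = function(e) conditionMessage(e)
)
no_replace

# With replacement, values can repeat, so any size works
draws <- sample(1:10, size = 12, replace = TRUE)
length(draws)

# 12 draws from only 10 values must contain at least one repeat
anyDuplicated(draws) > 0
```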

Q. Write code using the sample() function that generates a nucleotide sequence of length 6.

sample(c("A","T","C","G"), size = 6, replace = TRUE)
[1] "C" "A" "T" "C" "T" "A"

Q. Write a first function, generate_dna(), that returns a DNA sequence of a user-specified length:

generate_dna <- function(len=6){
  sample(c("A","T","C","G"), size = len, replace = TRUE)
}
generate_dna(len=100)
  [1] "A" "T" "G" "A" "A" "A" "G" "G" "A" "A" "T" "G" "C" "A" "A" "T" "T" "A"
 [19] "A" "A" "G" "G" "A" "C" "A" "A" "A" "T" "G" "A" "T" "G" "A" "C" "G" "T"
 [37] "A" "T" "T" "T" "A" "A" "G" "G" "A" "C" "C" "C" "A" "C" "C" "A" "C" "A"
 [55] "G" "C" "T" "C" "T" "G" "C" "G" "C" "A" "G" "T" "A" "G" "C" "C" "T" "C"
 [73] "G" "T" "A" "T" "T" "G" "A" "C" "G" "T" "A" "G" "T" "G" "A" "G" "T" "T"
 [91] "C" "C" "T" "G" "A" "A" "A" "A" "A" "G"

Key-Points: Every function in R has fundamentally the same structure. It has 3 parts: a name, input arguments, and a body.

name <- function(input){
  body
}

Functions can have multiple inputs. These can be required arguments or optional arguments, where optional arguments have a set default value.

Q. Modify and improve our generate_dna() function to return its generated sequence in a more standard format like “AGTAGTA” rather than the vector “A”, “C”, “G”, “A”

generate_dna <- function(len=6, fasta=TRUE){
  ans <- sample(c("A","T","C","G"), size = len, replace = TRUE)
  if(fasta){
    cat("Single-element vector output")
    ans <- paste(ans, collapse = "")
  }else{
    cat("Multi-element vector output")
  }
  
  return(ans)
}

generate_dna()
Single-element vector output
[1] "AACTGG"
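The fasta argument switches between the two output formats. The sketch below uses a hypothetical copy of the function, generate_dna_quiet(), which drops the cat() messages for brevity, and compares what each setting returns:

```r
# Hypothetical variant of generate_dna() without the cat() messages
generate_dna_quiet <- function(len = 6, fasta = TRUE) {
  ans <- sample(c("A", "T", "C", "G"), size = len, replace = TRUE)
  if (fasta) {
    # Join the bases into one string
    ans <- paste(ans, collapse = "")
  }
  ans
}

# One element holding a 10-character string
one_string <- generate_dna_quiet(len = 10)
# Ten elements of one character each
base_vector <- generate_dna_quiet(len = 10, fasta = FALSE)

length(one_string)
nchar(one_string)
length(base_vector)
```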

The paste() function joins (pastes) its input strings together. The sep argument sets the separator placed between inputs.

paste("alice", "loves R", sep="****")
[1] "alice****loves R"
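paste() has two arguments that are easy to mix up: sep goes between separate inputs, while collapse joins the elements of a single vector. Both must be passed to paste() itself, not placed inside c(). A minimal sketch with illustrative inputs:

```r
# sep: separator placed between separate input strings
a <- paste("alice", "loves R", sep = "****")
a  # "alice****loves R"

# collapse: join the elements of one vector into a single string
b <- paste(c("A", "C", "G", "T"), collapse = "")
b  # "ACGT"

# With vector inputs, sep is applied element by element
d <- paste(c("seq1", "seq2"), c("A", "T"), sep = "_")
d  # "seq1_A" "seq2_T"
```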

Flow control determines which parts of your code R runs, and in what order:

good_mood <- TRUE

if(good_mood){
  cat("Great!")
}else{
  cat("Bummer!")
}
Great!

A Protein generating function

Q. Write a function, called generate_protein(), that generates a protein sequence of a user-specified length.

There are 20 natural amino-acids:

aa <- c("A","R","N","D","C","E","Q","G","H",
        "I","L","K","M","F","P",
        "S","T","W","Y","V")
generate_protein <- function(len){

  # The amino-acids to sample from
  aa <- c("A","R","N","D","C","E","Q","G","H",
          "I","L","K","M","F","P",
          "S","T","W","Y","V")
  # Draw n=len amino-acids to make our sequence
  ans <- sample(aa, size = len, replace = TRUE)
  # Collapse the vector of residues into a single string
  ans <- paste(ans, collapse = "")
  return(ans)
}
myseq <- generate_protein(42)
myseq
[1] "VASNEGKPTYDEWSMRHIQSDKIPMSTIMAPTWDIVVFDPHH"
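A quick sanity check on the output can catch mistakes in a function like this. The sketch below uses a condensed version of generate_protein() and confirms that the result has the requested length and contains only letters from the aa vector:

```r
# The 20 amino acids used in these notes
aa <- c("A","R","N","D","C","E","Q","G","H","I",
        "L","K","M","F","P","S","T","W","Y","V")

generate_protein <- function(len) {
  paste(sample(aa, size = len, replace = TRUE), collapse = "")
}

myseq <- generate_protein(42)

# Is the sequence the requested length?
nchar(myseq) == 42

# Split the string back into single residues and check each one
residues <- strsplit(myseq, "")[[1]]
all(residues %in% aa)
```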

Q. Use that function to generate random protein sequences of lengths 6 to 12.

for(i in 6:12){
  # FASTA ID line, e.g. ">6"
  cat(">", i, "\n", sep="")
  # Protein sequence line
  cat(generate_protein(i), "\n")
}
>6
WVMDQG 
>7
VEFLHTN 
>8
QFHAIEIS 
>9
DPTEHVRNF 
>10
FRIIMTIVGQ 
>11
DWNYGLPSQDG 
>12
DREKIRWKKAAR 
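The same FASTA-style output can also be built without a loop. The following is one possible alternative, not the approach used in class: vapply() generates one sequence per length and paste0() assembles the ID and sequence lines. The function is repeated in condensed form so the block runs on its own:

```r
generate_protein <- function(len) {
  aa <- c("A","R","N","D","C","E","Q","G","H","I",
          "L","K","M","F","P","S","T","W","Y","V")
  paste(sample(aa, size = len, replace = TRUE), collapse = "")
}

lens <- 6:12

# One sequence per requested length
seqs <- vapply(lens, generate_protein, character(1))

# Each record is an ">id" line followed by the sequence line
fasta <- paste0(">", lens, "\n", seqs)
cat(fasta, sep = "\n")
```

Building the lines with paste0() also avoids the trailing space that cat(x, "\n") leaves at the end of each sequence line.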

Q. Are any of your sequences unique i.e. not found anywhere in nature?

Yes. The sequences of 9 to 12 amino acids are unique.
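One way to reason about why it is the longer sequences that have no match: with 20 amino acids there are 20^len possible sequences of a given length, so a random sequence becomes less and less likely to coincide with a known one as it gets longer. This is a back-of-the-envelope sketch, not a substitute for an actual database search:

```r
lens <- 6:12

# Number of distinct sequences possible at each length
n_possible <- 20^lens

data.frame(
  len = lens,
  n_possible = format(n_possible, big.mark = ",", scientific = FALSE)
)
```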