Archive for the ‘R’ Category

How to Conditionally Remove Character of a Vector Element in R

I have (sometimes incomplete) data on addresses that looks like this:

data <- c("1600 Pennsylvania Avenue, Washington DC", 
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")  

where I need to remove the first and/or last character if either one of them is a comma.

Avinash Raj was able to help me with this on S.O. and the question turned out to be a popular one, so I’ll show the solution here:

> data <- c("1600 Pennsylvania Avenue, Washington DC", 
+           ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
> gsub("(?<=^),|,(?=$)", "", data, perl=TRUE)
[1] "1600 Pennsylvania Avenue, Washington DC"
[2] "Siem Reap,FC"                           
[3] "11 Wall Street, New York, NY"           
[4] "Addis Ababa,FC" 

Pattern explanation:

  • (?<=^), In regex, (?<=) is called a positive look-behind. In our case it asserts that what precedes the comma must be a line start ^. So it matches the starting comma.
  • | Logical OR operator, usually used to combine (i.e., OR) two regexes.
  • ,(?=$) The look-ahead asserts that what follows the comma must be a line end $. So it matches the comma at the line end.
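
Because ^ and $ are already zero-width anchors, the lookarounds are probably not strictly needed here. As a minimal sketch (my own simplification, not part of the original answer), a plain anchored pattern should give the same result on this data:

```r
data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")

# Remove a comma at the very start or the very end of each element
trimmed <- gsub("^,|,$", "", data)

# Compare against the lookaround version shown above
identical(trimmed, gsub("(?<=^),|,(?=$)", "", data, perl = TRUE))
```

Commas in the middle of an element are left alone by either pattern.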


Write an R Package from Scratch with Github

Writing an R package is simple. Writing an R package via Github is simple and smart. Github adds all the traditional benefits of version control, while also showing off your work and making it easier to publish your package. This tutorial was inspired by a blog post from the beautiful Hilary Parker last year. I used her tutorial myself, but integrating it with Github led to some headaches, and I felt there were a couple of other small additions to be made.

This has been sitting in my Evernote for some time, so I figured it was about time to upload it to my own highly neglected blog. As a caveat, I still need to append more sample code and such, so watch for updates.

Step 0: Load the necessary packages  
if (!require("pacman")) install.packages("pacman") # Don't use pacman yet? Get ready to fall in love
pacman::p_load("devtools", "roxygen2")


Step 1: Create your package directory
* Create a new repo on Github with the name of your package 
* Create a new project in RStudio from the Github repo
* Open a .R file to begin writing code
* Open the automatically generated file and edit appropriately

Step 2: Add functions
* Enter your functions and save the file (e.g. dog_function.R)
* You can move this to the R folder once it has been automatically created in Step 3, or feel free to create the folder before saving the .R file (remember not to overwrite it in the next step)
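
As a rough sketch of what such a file might contain: the function below is purely illustrative (its name, argument, and messages are my own invention), and the #' lines are roxygen2 comments that can later be turned into a manual page.

```r
# dog_function.R -- a hypothetical example function with roxygen2 comments

#' Say how you feel about dogs
#'
#' @param love Logical; do you love dogs? Defaults to TRUE.
#' @return The message, invisibly. It is also printed to the console.
#' @export
#' @examples
#' dog_function()
dog_function <- function(love = TRUE) {
  msg <- if (isTRUE(love)) "I love dogs!" else "I am not a cool person."
  print(msg)
  invisible(msg)
}
```

Calling dog_function(FALSE) prints the second message.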

Step 3: Add minimal documentation

* Use devtools by typing create("packagename")
* Copy the files in this newly created folder (except the .Rproj and .gitignore files) to the top level folder you cloned from Github
* Delete the folder created by create()
* Edit the files to reflect the details of your package, such as its license and author

Step 4: Add optional, but recommended, examples and docs


4a. data
* dir.create("data") # Example .RData goes here (optional, but strongly recommended)
* Include a file called datalist to list the data in this folder, for example:
4b. vignettes
* dir.create("vignettes") # From the top level folder that you created on Github
* Add .pdf or .Rnw vignette files here
4c. man
* dir.create("man") # From the top level folder that you created on Github
* Add .Rd manual files here
Step 5: Process your documentation
Step 6: Install your package!
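
Steps 5 and 6 are not spelled out above. A minimal sketch, assuming devtools and roxygen2 are installed and that the working directory is the top-level package folder (the guard is my own addition, so the snippet does nothing when run anywhere else):

```r
# Only proceed if devtools is available and we are inside a package folder
pkg_ready <- requireNamespace("devtools", quietly = TRUE) &&
  file.exists("DESCRIPTION")

if (pkg_ready) {
  devtools::document()  # Step 5: build man/*.Rd and NAMESPACE from roxygen2 comments
  devtools::install()   # Step 6: install the package from the current folder
}
```

After installing, library(packagename) should load the package like any other.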

R: Happy Pi Day

Today, 3/14/2015, is Pi Day.

In honor of Pi Day, I threw together a little R code on Github that discusses pi, prints it, and creates Julia set (fractal) images based on it.

Happy Pi Day!


R: How to Transform “prob” Predictions to a Single Column of Predicted Values

# Recombine Test + Training ---------------------------------
a <- cbind(x1, y1)
b <- cbind(x, y)

a$actual <- a$y1
b$actual <- b$y
a$y1     <- NULL
b$y      <- NULL

c <- rbind(a, b)

# Run Predictions for Entire Data Set -----------------------
all_preds <- predict(rf, newdata = c, type = "prob")

c$predicted <- apply(all_preds, 1, which.max)

You can then replace the column index with the descriptive category name, since the columns of all_preds are named after the classes.
NOTE: This is NOT the same result that you’ll get by doing colSums on the “prob” type prediction, however.
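
A small sketch of that last step, using a made-up probability matrix in place of the real predict() output (the class names and numbers are invented for illustration):

```r
# Made-up class probabilities standing in for predict(rf, newdata, type = "prob")
all_preds <- matrix(c(0.7, 0.2, 0.1,
                      0.1, 0.3, 0.6,
                      0.2, 0.5, 0.3),
                    ncol = 3, byrow = TRUE,
                    dimnames = list(NULL, c("cat", "dog", "fish")))

# Column index of the most probable class in each row
idx <- apply(all_preds, 1, which.max)

# Swap the index for the class name
predicted <- colnames(all_preds)[idx]
predicted  # "cat" "fish" "dog"
```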

Machine Learning: Definition of %Var(y) in R’s randomForest package’s regression method

The second column is simply the first column divided by the variance of the responses that have been OOB (out-of-bag) up to that point (20 trees), times 100.
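
To make the arithmetic concrete, here is a tiny sketch with made-up numbers. The helper name pct_var is hypothetical and not part of randomForest:

```r
# Hypothetical helper: express an MSE as a percentage of the response variance
pct_var <- function(mse, var_y) 100 * mse / var_y

mse_oob <- 2.5   # made-up value for the first column (OOB mean squared error)
var_oob <- 10    # made-up variance of the responses that have been OOB so far

pct_var(mse_oob, var_oob)  # 25
```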

R: Add smoother to ggplot2 plot (geom_smooth()) in 1 line

Just use qplot(votes, rating, data = movies) + geom_smooth()
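
Note that the movies data set has since moved out of ggplot2 into the separate ggplot2movies package, and qplot() is deprecated in recent ggplot2 releases. A sketch of the same idea with ggplot(), using the built-in mtcars data purely as a stand-in:

```r
library(ggplot2)

# Scatter plot with a smoother layered on top; mtcars is only a stand-in data set
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth()

print(p)
```

By default geom_smooth() picks a smoothing method based on the data size; pass method = "lm" for a straight-line fit instead.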

Did you know? Source of ggplot2 in R

You thought it was Hadley Wickham, right? Nope!

ggplot2 is an implementation of the Grammar of Graphics, which was developed by Leland Wilkinson; Hadley Wickham wrote the package itself.

R: Annotate the panels in a multi-panel lattice plot in 1 line

Just use panel.lmline()
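
For context, panel.lmline() adds a least-squares line to each panel and is called from inside a panel function. A minimal sketch, assuming the lattice package and using mtcars only as example data:

```r
library(lattice)

# One panel per cylinder count; each panel gets its points plus a fitted line
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)  # draw the points
              panel.lmline(x, y, ...)  # add the regression line for this panel
            })

print(p)
```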

Ruby: Use R in Ruby via “rinruby”!

>>  require "rinruby"
>>  sample_size = 10
>>  R.eval "x <- rnorm(#{sample_size})"
>>  R.eval "summary(x)"
>>  R.eval "sd(x)"

With a here document:

require "rinruby"      
#Set all your variables in Ruby
n = 10
beta_0 = 1
beta_1 = 0.25
alpha = 0.05
seed = 23423
R.x = (1..n).entries
#Use actual R code to perform the analysis
R.eval <<EOF
  y <- #{beta_0} + #{beta_1}*x + rnorm(#{n})
  fit <- lm( y ~ x )
  est <- round(coef(fit),3)
  pvalue <- summary(fit)$coefficients[2,4]
EOF

Quick-tip: Read a table or other data from your clipboard in R

xxx <- read.delim("clipboard") 
If you want to copy data from an R variable named rdat into the Windows clipboard (for example, to copy into Excel) use:
write.table(rdat, "clipboard", sep="\t", row.names=FALSE, col.names=FALSE)