Archive for the ‘R’ Category

Write an R Package from Scratch with Github

Writing an R package is simple. Writing an R package via Github is simple and smart. Github gives you all the traditional benefits of version control, shows off your work, and makes it easier to publish your package. This tutorial was inspired by a blog post from Hilary Parker last year. I used her tutorial myself, but integrating it with Github led to some headaches, and I felt a couple of other small additions were worth making.

 This has been sitting in my Evernote for some time, so I figured it was about time to upload it to my own highly neglected blog. As a caveat, I still need to append more sample code and such, so watch for updates.

 
Step 0: Load the necessary packages  
if (!require("pacman")) install.packages("pacman") # Don't use pacman yet? Get ready to fall in love
pacman::p_load("devtools", "roxygen2")

 

Step 1: Create your package directory
 
* Create a new repo on Github with the name of your package 
* Create a new project in RStudio from the Github repo
* Open a .R file to begin writing code
* Open the automatically generated README.md file and edit appropriately
 
Step 2: Add functions
 
* Enter your functions and save the file (e.g. dog_function.R)
* You can move this to the R folder once it has been automatically created in Step 3, or feel free to create the folder before saving the .R file (remember not to overwrite it in the next step)
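
As a minimal sketch of what such a file might contain, here is a hypothetical dog_function.R. The file name follows the example above; the function body and the roxygen2 comment block (the #' lines, which document() later turns into a manual page) are my own illustrative assumptions, not part of any real package.

```r
#' A Dog Function
#'
#' Hypothetical example: reports whether you love dogs.
#'
#' @param love Logical; do you love dogs? Defaults to TRUE.
#' @return A character string.
#' @examples
#' dog_function()
#' @export
dog_function <- function(love = TRUE) {
  # Pick the message based on the single logical argument.
  if (love) "I love dogs!" else "I am not a cool person."
}

dog_function()
```

The @export tag is what makes the function visible to users once the package is installed; without it the function stays internal.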
 

Step 3: Add minimal documentation

* Use devtools by typing create("packagename")

* Copy the files in this newly created folder (except the .Rproj and .gitignore files) to the top level folder you cloned from Github
* Delete the folder created by create()
* Edit the files to reflect the details of your package, such as its license and author
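
Most of that editing happens in the DESCRIPTION file. As a rough sketch of the fields involved, the snippet below writes a DESCRIPTION with base R's write.dcf() and reads it back. Every value here (the package name "dogs", the author, the license) is a placeholder assumption, and it writes to a temporary folder rather than your real package.

```r
# Hypothetical metadata for a package called "dogs"; replace every value.
desc <- data.frame(
  Package     = "dogs",
  Title       = "A Package About Dogs",
  Version     = "0.0.0.9000",
  Author      = "Your Name",
  Maintainer  = "Your Name <you@example.com>",
  Description = "A short paragraph describing the package.",
  License     = "MIT + file LICENSE",
  stringsAsFactors = FALSE
)

# Written to a temporary file here; in a real package this is ./DESCRIPTION.
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = desc_file)

# Read it back to confirm the fields parse as expected.
parsed <- read.dcf(desc_file)
print(parsed[1, c("Package", "Version")])
```

In practice you can just as well edit DESCRIPTION by hand in RStudio; the point is only to show which fields you will probably want to touch.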
 

Step 4: Add optional but recommended examples and docs

 

4a. data
 
* dir.create("data") # Example .RData goes here (optional, but strongly recommended)
* Include a file called datalist that lists the data sets in this folder, one name per line
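
Here is a small sketch of the data step. The data set dog_breeds and its contents are made up for illustration, and a temporary folder stands in for the top level package folder so nothing in your real project is touched.

```r
# Stand-in for the package's top level folder (a temp dir keeps this harmless).
pkg_dir  <- file.path(tempdir(), "dogs")
data_dir <- file.path(pkg_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# A tiny made-up data set, purely for illustration.
dog_breeds <- data.frame(
  breed  = c("Beagle", "Poodle", "Pug"),
  weight = c(10, 25, 8)
)

# One .RData file per data set; data(dog_breeds) will look for this name.
save(dog_breeds, file = file.path(data_dir, "dog_breeds.RData"))

# datalist: one line per name that data() should find.
writeLines("dog_breeds", file.path(data_dir, "datalist"))

list.files(data_dir)
```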
 
4b. vignettes
 
* dir.create("vignettes") # From the top level folder that you created on Github
* Add .pdf or .Rnw vignette files here
 
 
4c. man
 
* dir.create("man") # From the top level folder that you created on Github
* Add .Rd manual files here (document() in Step 5 generates these from your roxygen2 comments)
 
Step 5: Process your documentation
 
setwd("./dogs")
document()
 
Step 6: Install your package!
 
setwd("..")
install("dogs")


R: Happy Pi Day

Today, 3/14/2015, is Pi Day (see http://piday.org).

In honor of Pi Day, I threw together a little R code on Github, which discusses pi, prints it, and creates Julia set (fractal) images based on it:

https://github.com/hack-r/Rpiday

Happy Pi Day!

[Image: pi_fractal]

R: How to Transform “prob” Predictions to a Single Column of Predicted Values

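# Recombine Test + Training ---------------------------------------------
a <- cbind(x1, y1)
b <- cbind(x, y)

a$actual <- a$y1
b$actual <- b$y
a$y1     <- NULL
b$y      <- NULL

# Named "combined" rather than "c" to avoid masking base R's c()
combined <- rbind(a, b)

# Run Predictions for Entire Data Set ------------------------------------
all_preds <- predict(rf, newdata = combined, type = "prob")
colSums(all_preds)
summary(combined$actual)

# Column index of the most probable class for each row
combined$predicted <- apply(all_preds, 1, which.max)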

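You can then replace the column number with the descriptive category name (for example, by using it to index colnames(all_preds)) or whatever else you need.
NOTE: Tabulating these predicted classes does NOT give the same result as running colSums on the "prob" predictions, because colSums adds up probabilities rather than counting the winning class in each row.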
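
To make both points concrete, here is a self-contained sketch that uses a made-up probability matrix in place of real predict() output. The class names and numbers are assumptions chosen only for illustration.

```r
# Made-up class probabilities, shaped like predict(..., type = "prob") output:
# one row per observation, one named column per class.
probs <- matrix(
  c(0.6, 0.4, 0.0,
    0.6, 0.4, 0.0,
    0.5, 0.3, 0.2,
    0.1, 0.5, 0.4),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("cat", "dog", "fish"))
)

# Column index of the most probable class in each row.
idx <- apply(probs, 1, which.max)

# Map the index back to the class name.
predicted <- colnames(probs)[idx]

# Hard class counts versus summed probabilities: related, but not equal.
class_counts <- table(factor(predicted, levels = colnames(probs)))
print(class_counts)
print(colSums(probs))
```

Here three rows are won by "cat" and one by "dog", while the column sums are 1.8, 1.6 and 0.6, which is why the two summaries should not be expected to match.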

Machine Learning: Definition of %Var(y) in R’s randomForest package’s regression method

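In the progress output that randomForest prints for regression, the second column, %Var(y), is simply the first column (the MSE) divided by the variance of the responses that have been OOB (out-of-bag) up to that point (20 trees in this case), times 100.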
Source:
https://stat.ethz.ch/pipermail/r-help/2008-July/167748.html
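
As plain arithmetic, the definition above looks roughly like this. The response values and the MSE are made-up numbers, and var() here is base R's sample variance, which may not match the package's internal calculation exactly.

```r
# Made-up numbers for illustration only.
y       <- c(3.1, 4.7, 5.0, 6.2, 7.9)  # responses that have been OOB so far
oob_mse <- 1.2                          # hypothetical first-column value (MSE)

# Second column: MSE as a percentage of the response variance.
# Note: var() divides by n - 1; treat the result as approximate.
pct_var_y <- 100 * oob_mse / var(y)
pct_var_y
```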

R: Add smoother to ggplot2 plot (geom_smooth()) in 1 line

Just use qplot(votes, rating, data = movies) + geom_smooth()
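
In recent ggplot2 releases the movies data set lives in the separate ggplot2movies package and qplot() is deprecated, so the one-liner may not run as written. A roughly equivalent sketch with ggplot() follows; it uses the built-in mtcars data as a stand-in, which is my substitution rather than the original example.

```r
# Only build the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_smooth()  # adds the smoother with its default settings
  print(p)
} else {
  p <- NULL
}
```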

Did you know? Source of ggplot2 in R

You thought it was all Hadley Wickham, right? Not quite!

Hadley Wickham wrote the ggplot2 package, but it implements the Grammar of Graphics developed by Leland Wilkinson.

R: Annotate the panels in a multi-panel lattice plot in 1 line

Just call panel.lmline() inside your panel function; it adds a least-squares regression line to every panel.
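
A minimal sketch of where that call goes, using the built-in mtcars data (my choice of example data, not from the original tip):

```r
library(lattice)  # ships with standard R installations

# One panel per cylinder count; each panel gets points plus a fitted lm line.
p <- xyplot(
  mpg ~ wt | factor(cyl),
  data = mtcars,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)  # the raw points
    panel.lmline(x, y, ...)  # least-squares line for this panel
  }
)
print(p)
```

If you do not need a custom panel function, passing type = c("p", "r") to xyplot() should give a similar result.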

Ruby: Use R in Ruby via “rinruby”!

>> require "rinruby"
>> sample_size = 10
>> R.eval "x <- rnorm(#{sample_size})"
>> R.eval "summary(x)"
>> R.eval "sd(x)"

With a here document:

require "rinruby"      
#Set all your variables in Ruby
n = 10
beta_0 = 1
beta_1 = 0.25
alpha = 0.05
seed = 23423
R.x = (1..n).entries
#Use actual R code to perform the analysis
R.eval <<EOF
  set.seed(#{seed})
  y <- #{beta_0} + #{beta_1}*x + rnorm(#{n})
  fit <- lm( y ~ x )
  est <- round(coef(fit),3)
  pvalue <- summary(fit)$coefficients[2,4]
EOF

Quick-tip: Read a table or other data from your clipboard in R


xxx <- read.delim("clipboard")

If you want to copy data from an R variable named rdat into the Windows clipboard (for example, to copy into Excel) use:

write.table(rdat, "clipboard", sep="\t", row.names=FALSE, col.names=FALSE)

R: Automatic Re-installation of R Packages (Quick Tip)

  1. package_df <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/2.15/Resources/library"))
  2. package_list <- as.character(package_df$Package)
  3. install.packages(package_list)
  4. ???????????
  5. PROFIT
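
If you would rather not reinstall everything, a small variation is to install only what is missing. The helper below is a hypothetical sketch (missing_packages is my own name, not a real function); it compares an old library folder against what is currently installed. It is demonstrated against the current library so it runs anywhere, and the install.packages() call is left commented out.

```r
# Hypothetical helper: which packages from an old library are missing now?
missing_packages <- function(old_lib) {
  old_pkgs <- rownames(installed.packages(lib.loc = old_lib))
  new_pkgs <- rownames(installed.packages())
  setdiff(old_pkgs, new_pkgs)
}

# Demonstration against the current library, so nothing should be missing.
to_install <- missing_packages(.libPaths()[1])
print(to_install)

# With a real old library path you would then run:
# install.packages(to_install)
```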