## Posts Tagged ‘r’

## How to Conditionally Remove Character of a Vector Element in R

I have (sometimes incomplete) data on addresses that looks like this:

```
data <- c("1600 Pennsylvania Avenue, Washington DC",
",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
```

where I need to remove the first and/or last character if either of them is a comma.

Avinash Raj was able to help me with this on Stack Overflow, and the question turned out to be a popular one, so I’ll show the solution here:

```
> data <- c("1600 Pennsylvania Avenue, Washington DC",
+ ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
> gsub("(?<=^),|,(?=$)", "", data, perl=TRUE)
[1] "1600 Pennsylvania Avenue, Washington DC"
[2] "Siem Reap,FC"
[3] "11 Wall Street, New York, NY"
[4] "Addis Ababa,FC"
```

**Pattern explanation:**

* `(?<=^),`: In regex, `(?<=...)` is called a positive lookbehind. Here it asserts that what precedes the comma must be the start of the line, `^`, so it matches the leading comma.
* `|`: The logical OR operator, used to combine (i.e., OR) two regexes.
* `,(?=$)`: The positive lookahead asserts that what follows the comma must be the end of the line, `$`, so it matches the trailing comma.

Lookarounds are not supported by R's default regex engine, which is why the call needs `perl=TRUE`.
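Because `^` and `$` are already zero-width anchors, the lookarounds are not strictly needed. A plain alternation of anchored commas should give the same result with R's default regex engine. Here is a quick check. The `data` vector is redefined so the snippet runs on its own.

```r
data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")

# Remove a comma at the very start (^,) or the very end (,$) of each element
cleaned <- gsub("^,|,$", "", data)
cleaned

# Compare against the lookaround version (which needs perl = TRUE)
identical(cleaned, gsub("(?<=^),|,(?=$)", "", data, perl = TRUE))
```

Like the lookaround pattern, this strips at most one comma from each end of an element.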

## Write an R Package from Scratch with Github

Writing an R package is simple. Writing an R package via Github is simple and smart. Github adds all the traditional benefits of version control, and it also shows off your work and makes it easier to publish your package. This tutorial was inspired by a blog post from the beautiful Hillary Parker last year. I used her tutorial myself, but integrating it with Github led to some headaches, and I felt a couple of other small additions were needed.

This has been sitting in my Evernote for some time, so I figured it was about time to upload it to my own highly neglected blog. As a caveat, I still need to add more sample code and such, so watch for updates.

**Step 0: Load the necessary packages**

`pacman::p_load("devtools", "roxygen2")`


**Step 1: Create your package directory**

* Open a .R file to begin writing code

* Open the automatically generated README.md file and edit appropriately

**Step 2: Add functions**


**Step 3: Add minimal documentation**

* Set up the package skeleton by typing `create("packagename")` (this function comes from devtools, not roxygen2)

* Delete the folder created by roxygen2
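The sketch below shows what Steps 2 and 3 produce together. It is a hypothetical file `R/dog_function.R` for the `dogs` package used in the later steps, and the function name and behaviour are made up for illustration. The `#'` lines are roxygen2 comments that `document()` turns into a help page, and `@export` adds the function to the package NAMESPACE.

```r
#' Dog greeting
#'
#' Returns a short greeting from a dog (hypothetical example function).
#'
#' @param loud Logical; if TRUE, the greeting is returned in upper case.
#' @return A character string.
#' @export
#' @examples
#' dog_function()
#' dog_function(loud = TRUE)
dog_function <- function(loud = FALSE) {
  msg <- "Woof!"
  # Upper-case the greeting only when asked to
  if (loud) toupper(msg) else msg
}
```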

**Step 4: Add optional but recommended examples and docs**

**4a. data**

**4b. vignettes**

**For more detail:** *http://r-pkgs.had.co.nz/vignettes.html*

**4c. man**
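For 4a, a package's datasets conventionally live as `.rda` files in its `data/` directory. The sketch below calls base R `save()` on a made-up data frame. It writes into a temporary folder that stands in for the package directory. In practice, `usethis::use_data()` does the same job when run from inside the package directory.

```r
# Hypothetical dataset to ship with the "dogs" package (values are made up)
dog_breeds <- data.frame(
  breed = c("Beagle", "Poodle", "Husky"),
  weight_kg = c(10, 25, 23)
)

# Stand-in for the package directory; replace with the real path
pkg_dir <- file.path(tempdir(), "dogs")
data_dir <- file.path(pkg_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# One object per .rda file, named after the object it contains
save(dog_breeds, file = file.path(data_dir, "dog_breeds.rda"), compress = "bzip2")
```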

**Step 5: Process your documentation**

```
setwd("./dogs")  # move into the package directory
document()       # roxygen2 comments -> man/*.Rd help files and NAMESPACE
```

**Step 6: Install your package!**

```
setwd("..")      # back to the parent directory
install("dogs")  # build and install the package from its source folder
```

## R: Happy Pi Day

Today, 3/14/2015, is Pi Day (see http://piday.org).

In honor of Pi Day, I threw together a little R code on Github, which discusses pi, prints it, and creates Julia set (fractal) images based on it:

https://github.com/hack-r/Rpiday

Happy Pi Day!

## R: How to Transform “prob” Predictions to a Single Column of Predicted Values

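```
# Assumes a fitted classifier `rf` (e.g. from randomForest), training data x/y,
# and test data x1/y1, with x and x1 stored as data frames
a <- cbind(x1, y1)
b <- cbind(x, y)

# Give the response the same column name in both sets so they can be stacked
a$actual <- a$y1
b$actual <- b$y
a$y1 <- NULL
b$y <- NULL

combined <- rbind(a, b)

# Run predictions for the entire data set --------------------------------
all_preds <- predict(rf, newdata = combined, type = "prob")

colSums(all_preds)
summary(combined$actual)

# which.max() gives the column index of the most probable class in each row;
# index into colnames() to turn that into the predicted class label
combined$predicted <- colnames(all_preds)[apply(all_preds, 1, which.max)]
```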
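The sketch below shows the conversion on its own. It uses a small made-up probability matrix shaped like `predict(..., type = "prob")` output, with one row per observation and one column per class. The class names are arbitrary. It uses base R's `max.col()`, which returns the same column indices as `apply(..., 1, which.max)` when `ties.method = "first"`.

```r
# Toy class-probability matrix (made-up numbers)
all_preds <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.2, 0.5, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

# Column index of the most probable class per row, mapped to the class name
predicted <- colnames(all_preds)[max.col(all_preds, ties.method = "first")]
predicted  # "setosa" "virginica" "versicolor"
```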

## Machine Learning: Definition of %Var(y) in R’s randomForest package’s regression method

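When `randomForest()` is run in regression mode with `do.trace`, the second column of the trace output, `%Var(y)`, is simply the first column (the out-of-bag MSE) divided by the variance of the responses that have been OOB up to that point (e.g., after 20 trees), times 100.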
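Written out as code, the definition from the post amounts to the one-liner below. `pct_var_y()` is a hypothetical helper name. The inputs are made-up numbers, not output from a real model.

```r
# Hypothetical helper: %Var(y) as defined in the r-help post, i.e. the OOB MSE
# as a percentage of the variance of the responses that have been OOB so far
pct_var_y <- function(oob_mse, oob_var_y) {
  100 * oob_mse / oob_var_y
}

pct_var_y(oob_mse = 9, oob_var_y = 36)  # 25
```

Since this is the share of variance left unexplained, lower values are better.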

Source:

https://stat.ethz.ch/pipermail/r-help/2008-July/167748.html

## R: Add smoother to ggplot2 plot (geom_smooth()) in 1 line

Just use `qplot(votes, rating, data = movies) + geom_smooth()`.
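In recent ggplot2 releases, `qplot()` has been deprecated, and the `movies` data has moved to the separate ggplot2movies package. The sketch below is a rough equivalent using the full `ggplot()` syntax. It swaps in the built-in `mtcars` data for illustration.

```r
library(ggplot2)

# Scatterplot of fuel efficiency against weight, plus a smoothed trend line;
# geom_smooth() picks a default smoothing method based on the number of points
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth()

print(p)
```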

## Did you know? Source of ggplot2 in R

You thought it was Hadley Wickham, right? Nope!

ggplot2 comes from the *Grammar of Graphics*, developed by Leland Wilkinson.

## R: Annotate the panels in a multi-panel lattice plot in 1 line

Just use `panel.lmline()` in your panel function, which adds a least-squares regression line to each panel.
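Here is a minimal sketch using the `mtcars` data that ships with R, chosen for illustration. `xyplot()` draws one panel per number of cylinders. The custom panel function draws the points and then calls `panel.lmline()`, so each panel gets its own fitted line. Passing `type = c("p", "r")` to `xyplot()` is a shorter way to get the same points-plus-regression-line display.

```r
library(lattice)

# One panel per cylinder count; each panel gets its own least-squares line
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)          # draw the points
              panel.lmline(x, y, col = "red")  # add lm(y ~ x) for this panel
            })

print(p)
```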

## Ruby: Use R in Ruby via “rinruby”!

```
>> require "rinruby"
>> sample_size = 10
>> R.eval "x <- rnorm(#{sample_size})"
>> R.eval "summary(x)"
>> R.eval "sd(x)"
```

With a here document:

```
require "rinruby"
#Set all your variables in Ruby
n = 10
beta_0 = 1
beta_1 = 0.25
alpha = 0.05
seed = 23423
R.x = (1..n).entries
#Use actual R code to perform the analysis
R.eval <<EOF
set.seed(#{seed})
y <- #{beta_0} + #{beta_1}*x + rnorm(#{n})
fit <- lm( y ~ x )
est <- round(coef(fit),3)
pvalue <- summary(fit)$coefficients[2,4]
EOF
```

## Quick-tip: Read a table or other data from your clipboard in R

To read a table you have copied (for example, from a spreadsheet), use:

```
xxx <- read.delim("clipboard")
```

To write a data frame such as `rdat` into the Windows clipboard (for example, to paste into Excel), use:

```
write.table(rdat, "clipboard", sep="\t", row.names=FALSE, col.names=FALSE)
```