How to Conditionally Remove Character of a Vector Element in R

I have (sometimes incomplete) data on addresses that looks like this:

data <- c("1600 Pennsylvania Avenue, Washington DC", 
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")  

where I need to remove the first and/or last character if either one of them are a comma.

Avinash Raj was able to help me with this on S.O. and the question turned out to be a popular one, so I’ll show the solution here:

> data <- c("1600 Pennsylvania Avenue, Washington DC", 
+           ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
> gsub("(?<=^),|,(?=$)", "", data, perl=TRUE)
[1] "1600 Pennsylvania Avenue, Washington DC"
[2] "Siem Reap,FC"                           
[3] "11 Wall Street, New York, NY"           
[4] "Addis Ababa,FC" 

Pattern explanation:

  • (?<=^), In regex (?<=) called positive look-behind. In our case it asserts What precedes the comma must be a line start ^. So it matches the starting comma.
  • | Logical OR operator usually used to combine(ie, ORing) two regexes.
  • ,(?=$) Lookahead aseerts that what follows comma must be a line end $. So it matches the comma present at the line end.

Rlogo