Created February 26, 2014 20:38
Easy way to split strings and keep id or row number in R?
Have:

  id name
1 11 rick
2 32  tom
3 37  joe

Want:

   id letters
1  11       r
2  11       i
3  11       c
4  11       k
5  32       t
6  32       o
7  32       m
8  37       j
9  37       o
10 37       e

sckott commented Feb 26, 2014

What about this?

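library(plyr)

df <- data.frame(id = c(11, 32, 37), name = c("rick", "tom", "joe"),
                 stringsAsFactors = FALSE)

# split a single string into a vector of its characters
foo <- function(x){
  strsplit(x, "")[[1]]
}

# one group per (id, name); summarise() returns one row per letter
ddply(df, .(id, name), summarise, letters = foo(name))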
   id name letters
1  11 rick       r
2  11 rick       i
3  11 rick       c
4  11 rick       k
5  32  tom       t
6  32  tom       o
7  32  tom       m
8  37  joe       j
9  37  joe       o
10 37  joe       e
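For comparison, the same reshaping can be done in base R without plyr. This is a sketch that does not come from the thread; it assumes R 3.2.0 or later for `lengths()`, and unlike the `ddply()` result it keeps only `id` and `letters`, as in the desired output above.

```r
df <- data.frame(id = c(11, 32, 37), name = c("rick", "tom", "joe"),
                 stringsAsFactors = FALSE)

# strsplit() with split = "" gives a list holding one character vector per name
chars <- strsplit(df$name, "")

# repeat each id once per character, then flatten the characters next to it
out <- data.frame(id = rep(df$id, lengths(chars)),
                  letters = unlist(chars),
                  stringsAsFactors = FALSE)
out
```

Because `rep()` and `unlist()` both walk the names in the same order, each letter stays lined up with the id it came from.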

sckott commented Feb 26, 2014

Updated: deleted unneeded stuff.

rvidal commented Feb 26, 2014

Clean solution. Thanks.

rvidal commented Jul 9, 2014

I have a more complicated example. How would I go about it if I wanted all combinations of two letters from the name?

Here's another example:

1   F       6,10      Cancer     6,10
2   F       8,10      Cancer     8,10
3   F      12,13    NoCancer    12,13
4   F   3,4,5,10      Cancer         
5   F       7,10      Cancer     7,10
6   F        4,8    NoCancer      4,8

Which I would like to transform into:

1   F       6,10      Cancer     6,10
2   F       8,10      Cancer     8,10
3   F      12,13    NoCancer    12,13
4   F   3,4,5,10      Cancer      3,4
4   F   3,4,5,10      Cancer      3,5
4   F   3,4,5,10      Cancer     3,10
4   F   3,4,5,10      Cancer      4,5
4   F   3,4,5,10      Cancer     4,10
4   F   3,4,5,10      Cancer     5,10
5   F       7,10      Cancer     7,10
6   F        4,8    NoCancer      4,8

Note how row 4 is expanded into one row for each pair of its numbers.

I was trying something with:

combn(x, 2, simplify = FALSE, FUN = function(y) paste(y, collapse = ","))

I'm treating the comma-separated numbers as characters. Any ideas?
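The `combn()` idea works once the string has been split into its pieces. A minimal sketch on row 4's value (the names `x` and `pairs` are only illustrative); four numbers give choose(4, 2) = 6 pairs:

```r
# split one comma-separated string into its pieces (kept as character)
x <- strsplit("3,4,5,10", ",")[[1]]

# all pairs, each re-joined as "a,b"; simplify = FALSE returns a list, so unlist it
pairs <- unlist(combn(x, 2, FUN = function(y) paste(y, collapse = ","),
                      simplify = FALSE))
pairs
# "3,4" "3,5" "3,10" "4,5" "4,10" "5,10"
```

What remains is to repeat the rest of the row once per pair.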

sckott commented Jul 9, 2014

Hi, I'm not sure I understand the question. Why is row 4 the only one that gets split up? And I'm not sure what "name" refers to in this context: is it the column with "Cancer" and "NoCancer"?

rvidal commented Jul 9, 2014

I'm looking for combinations of 2 numbers. In all the other rows, the third column already holds exactly 2 numbers. Wherever it holds more than two, I'd like to split it into every pair and put each pair in the last column, one row per pair.
Here's the more elaborate question on SO http://stackoverflow.com/questions/24662637/split-a-string-into-combinations-of-2-characters-and-expand-into-data-frame-in-r

Thanks

sckott commented Jul 9, 2014

Maybe this:

df <- data.frame(
  iter=1:6,
  a=rep("F", 6),
  b=c('6,10','8,10','12,13','3,4,5,10','7,10','4,8'),
  c=c('Cancer','Cancer','NoCancer','Cancer','Cancer','NoCancer'),
  d=c('6,10','8,10','12,13','','7,10','4,8'), stringsAsFactors = FALSE)

library(plyr)

# x is the one-row data frame for a single iter value
foo <- function(x){
  tmp <- strsplit(x$b, ",")[[1]]
  if(length(tmp) > 2){
    # every pair of numbers, each re-joined as "a,b"
    combos <- combn(tmp, 2, simplify = FALSE)
    combos <- sapply(combos, function(y) paste0(y, collapse=","))
    # one output row per pair; the other columns are recycled
    data.frame(iter=x$iter, a=x$a, b=x$b, c=x$c, d=combos,
               stringsAsFactors = FALSE)
  } else { x }  # two numbers or fewer: keep the row unchanged
}

ddply(df, .(iter), foo)
   iter a        b        c     d
1     1 F     6,10   Cancer  6,10
2     2 F     8,10   Cancer  8,10
3     3 F    12,13 NoCancer 12,13
4     4 F 3,4,5,10   Cancer   3,4
5     4 F 3,4,5,10   Cancer   3,5
6     4 F 3,4,5,10   Cancer  3,10
7     4 F 3,4,5,10   Cancer   4,5
8     4 F 3,4,5,10   Cancer  4,10
9     4 F 3,4,5,10   Cancer  5,10
10    5 F     7,10   Cancer  7,10
11    6 F      4,8 NoCancer   4,8
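For reference, a base-R sketch of the same expansion without plyr. It is not from the thread: it assumes R 3.2.0 or later for `lengths()`, computes `d` from `b` instead of reusing the original `d` column, and `pairs_of()` and `d_list` are hypothetical names chosen for the example.

```r
df <- data.frame(
  iter = 1:6,
  a = rep("F", 6),
  b = c("6,10", "8,10", "12,13", "3,4,5,10", "7,10", "4,8"),
  c = c("Cancer", "Cancer", "NoCancer", "Cancer", "Cancer", "NoCancer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all pairs from one comma-separated string, re-joined as "a,b".
# A two-number string yields exactly one pair, itself, so it needs no special case.
pairs_of <- function(s) {
  parts <- strsplit(s, ",")[[1]]
  if (length(parts) < 2) return(s)  # combn() errors when there are fewer than 2 items
  unlist(combn(parts, 2, FUN = paste, collapse = ",", simplify = FALSE))
}

d_list <- lapply(df$b, pairs_of)

# repeat each row once per pair, then attach the pairs as column d
out <- df[rep(seq_len(nrow(df)), lengths(d_list)), ]
out$d <- unlist(d_list)
rownames(out) <- NULL
out
```

Indexing rows with `rep()` keeps every original column without naming each one, which is the main difference from building a new `data.frame()` per group.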
