data.frame(data.frame(numbers=c(1,2,3)), data.frame(letters=c("a", "b", "c"))) makes a data frame:
      numbers letters
    1       1       a
    2       2       b
    3       3       c
data.frame(data.frame(numbers=c(1,2,3)), data.frame(letters=c("a", "b", "c", "d"))) does not:
    Error in data.frame(data.frame(numbers = c(1, 2, 3)), data.frame(letters = c("a", :
      arguments imply differing number of rows: 3, 4
data.frame(data.frame(numbers=c(1,2,3)), data.frame(letters=c("a", "b", "c", "d", "e","f"))) makes a data frame, but not the one I'd expect:
      numbers letters
    1       1       a
    2       2       b
    3       3       c
    4       1       d
    5       2       e
    6       3       f
What's the reasoning behind this? I get that if one column's length is a factor of the other's (numerically speaking), R will just repeat the shorter column to make it fit. But I'm not sure why you'd want that as a default, without an error, or what about the structure of data frames or of R made this seem like a good idea.
Thanks, Ben. Someone on #R also tipped me off to the "recycling" term.
The lack of a warning for when recycling is done evenly seems to rely on the assumption that if that happens, it must be intentional. That seems unwarranted to me.
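For anyone who would rather get an error than a silently recycled column, here is a minimal sketch of a wrapper that refuses to recycle. `strict_data_frame` is a hypothetical helper name, not something base R provides, and the sketch assumes every argument is a vector or a data frame.

```r
# Hypothetical helper (not part of base R): build a data frame only when
# every argument has the same number of rows, so nothing gets recycled.
strict_data_frame <- function(...) {
  parts <- list(...)
  rows <- vapply(parts, NROW, integer(1))
  if (length(unique(rows)) > 1) {
    stop("refusing to recycle; row counts differ: ",
         paste(rows, collapse = ", "))
  }
  data.frame(...)
}

# Equal lengths: behaves like data.frame().
ok <- strict_data_frame(numbers = c(1, 2, 3), letters = c("a", "b", "c"))

# 3 rows against 6 rows: data.frame() accepts this silently,
# the wrapper stops with an error instead.
result <- tryCatch(
  strict_data_frame(numbers = c(1, 2, 3),
                    letters = c("a", "b", "c", "d", "e", "f")),
  error = function(e) conditionMessage(e)
)
```

Note that this sketch also rejects length-1 arguments, which people often recycle on purpose (for example, a constant column), so it is stricter than most code would want.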
I wrote a quick (and possibly buggy) piece of code to see how often a number is divisible by another number:
It gave me a result of 0.02632785643547798, which I interpret as, "For a random number between 1 and 1000, there is roughly a 2.6% chance that a smaller number will go into it evenly."
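The code behind that figure is not shown here, so the following is only one possible way to set up such an estimate in R, not a reconstruction of it. It assumes the quantity of interest is the fraction of pairs (small, big) with 1 <= small < big <= limit where small divides big exactly. Other counting choices (for example, excluding 1 as a divisor, or averaging per number rather than over all pairs) give different figures, so it should not be expected to reproduce the number above. `divisible_fraction` is a name made up for this sketch.

```r
# Fraction of pairs (small, big), 1 <= small < big <= limit,
# where small divides big with no remainder.
divisible_fraction <- function(limit) {
  hits <- 0
  pairs <- 0
  for (big in 2:limit) {
    smaller <- seq_len(big - 1)               # every candidate below `big`
    hits <- hits + sum(big %% smaller == 0)   # exact divisors among them
    pairs <- pairs + length(smaller)
  }
  hits / pairs
}

estimate <- divisible_fraction(1000)
```

Counting 1 as a divisor matches R's behaviour, since a length-1 vector is always recycled without complaint.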
I think this is a conservative estimate of how often someone might accidentally recycle silently, because I think people are more likely to be working with multiples when working with real data. If I want to divide my 100 treatment+control values of measure A by my 100 treatment+control values of measure B, but accidentally divide by my 50 treatment-only values of measure B, I'm not going to notice that I've done anything wrong.
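A small sketch of that scenario, using made-up stand-in numbers rather than real measurements (the variable names are illustrative only):

```r
# Made-up stand-in data: 100 values per measure; the first 50 play the
# role of the treatment group.
measure_a <- seq(1, 100)
measure_b <- seq(101, 200)
measure_b_treatment <- measure_b[1:50]

# Intended: element-wise ratio of two length-100 vectors.
ratio_intended <- measure_a / measure_b

# Mistake: 50 goes into 100 evenly, so the divisor is recycled twice and
# R raises no warning. The result still has length 100.
ratio_mistake <- measure_a / measure_b_treatment

# A divisor whose length does not go evenly into 100 does trigger a warning.
warned <- tryCatch(
  { measure_a / measure_b[1:30]; FALSE },
  warning = function(w) TRUE
)
```

The first 50 ratios of the mistaken result are correct and the last 50 are not, and nothing in the length or type of the result gives that away.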
Now I'm tempted to search through published R code and see if there are any recycling errors.