maelle/thumbs.md

## thumbs.md

      
You can get issues ordered by thumbs up votes on the first comment via
ghrecipes! An idea of Kirill Müller’s.

```r
issues <- ghrecipes::get_issues_thumbs(owner = "tidyverse",
                                       repo = "dplyr")
```

You could `View(issues)` or e.g.

```r
issues <- dplyr::mutate_if(issues, is.character, stringr::str_squish)
knitr::kable(issues)
```
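
The `mutate_if()` call above squishes whitespace in every character column so that multi-line issue bodies fit on one table row. As a rough illustration of what that step does, here is a base R sketch; `squish_base()` and the `demo` data frame are hypothetical names and values made up for this example, not part of ghrecipes or stringr.

```r
# Hypothetical base R stand-in for stringr::str_squish():
# collapse every run of whitespace to one space, then trim both ends.
squish_base <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

# Made-up one-row data frame standing in for the real `issues` table.
demo <- data.frame(
  number = 1L,
  body = "  first line\n\n  second line\t",
  stringsAsFactors = FALSE
)

# Apply the helper to character columns only, mirroring
# dplyr::mutate_if(issues, is.character, stringr::str_squish).
is_chr <- vapply(demo, is.character, logical(1))
demo[is_chr] <- lapply(demo[is_chr], squish_base)

print(demo$body)
```

Only the `body` column is touched here; the integer `number` column is left as it was.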



| number | created_at | title | author | thumbs_up_no | body | labels | url | milestone_no | milestone_desc | owner | repo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2240 | 2016-11-08 08:46:46 | join_by(): Syntax for generic joins | krlmlr | 34 | hadley#557 (comment) and hadley#378 (comment) propose a syntax for generic and rolling joins: left_join( FundMonths, Returns, join_by(FundID == FundID, yearmonth > gmonth + 3, yearmonth <= gmonth + 15) ) left_join( events, days, join_by(collector_id == collector_id, event_timestamp >= la | featuregeneric | tidyverse/dplyr#2240 | NA | NA | tidyverse | dplyr |
| 341 | 2014-03-20 12:57:03 | Preserve zero-length groups | hadley | 9 | http://stackoverflow.com/questions/22523131 Not sure what the interface to this should be - probably should default to drop = FALSE. | data framefeature | tidyverse/dplyr#341 | 10 |  | tidyverse | dplyr |
| 2326 | 2016-12-15 23:18:13 | Single table verbs should accept tibbles in conditions | hadley | 8 | Currently mutate() and summarise() only work with vectorised functions: functions that take a vector as input and return a vector (or “scalar”) as output. I don’t see any reason why summarise() and mutate() couldn’t also accept tibbles. The existing restrictions would continue to apply so that in s | data framefeature | tidyverse/dplyr#2326 | NA | NA | tidyverse | dplyr |
| 2047 | 2016-07-30 22:11:31 | FR: `before` and `after` arguments to mutate() | krlmlr | 6 | These arguments would specify the position where the new columns are inserted, before or after some column given as index or name. See also tidyverse/tibble#99. | featuregeneric | tidyverse/dplyr#2047 | NA | NA | tidyverse | dplyr |
| 1185 | 2015-05-30 17:43:23 | Create a group_indices as a new variable | matthieugomez | 5 | Some packages like ggplot2 act on groups defined by one variable only (as opposed to groups defined by several variables). It would be nice to have a function, say group(), that creates a new integer variable from groups defined by multiple variables: Batting %>% mutate(group = group(teamID, yearID) | data framefeature | tidyverse/dplyr#1185 | NA | NA | tidyverse | dplyr |
| 1792 | 2016-04-28 15:09:13 | set_key | hadley | 5 | library(nycflights13) weather <- flights %>% set_key(year, month, day, hour, origin) planes <- planes %>% set_key(tailnum) airlines <- airlines %>% set_key(carrier) airports <- airports %>% set_key(faa) This would check that the combination of variables is a valid key (i.e. no duplicates and no miss | data framefeature | tidyverse/dplyr#1792 | 10 |  | tidyverse | dplyr |
| 977 | 2015-02-17 19:26:36 | n_distinct way slower than length(unique) | dpeterson71 | 3 | I would like to move to more uniform implementation of dplyr memes; I really like the syntax. However, I am seeing several instances where dplyr analogues to plyr or base-R functions incur a severe performance hit on my data sets. Here is a simple example ilustrating that dplyr’s n_distinct is a fa | data framefeatureperformance | tidyverse/dplyr#977 | 10 |  | tidyverse | dplyr |
| 2132 | 2016-09-21 17:42:51 | Summarising verbs with variable-length outputs | lionel- | 2 | A new dplyr family of verbs for variable-length output may be useful. Like summarise() it would discard all input columns except for the grouping variables. This allows the output to have a different number of rows than the input. Unlike summarise(), it would not require length 1 results and would | featuregeneric | tidyverse/dplyr#2132 | NA | NA | tidyverse | dplyr |
| 2432 | 2017-02-16 20:47:38 | Better support combining for non-base types | hadley | 2 | This is a meta issue related for all bind/grouped-mutate/join/filter issues related to custom S3 + S4 classes chron (#1829) lubridate (#1581, #1708) difftime (#2059) table (#2406) POSIXct (#2322) bit64 (#3210) See also r-lib/vctrs#27 | bugdata frame | tidyverse/dplyr#2432 | 13 |  | tidyverse | dplyr |
| 2993 | 2017-07-23 09:06:53 | DBI sources can’t provide sampling implementation | hannesmuehleisen | 2 | When using src_sql, it is possible to attach a custom class to tbl objects. But the documentation for src_sql states that it is deprecated and that src_dbi should be used instead. When using src_dbi, no custom class can be set for tbl objects (at least not as far as I can tell). This is mostly fine, | databasefeaturegeneric | tidyverse/dplyr#2993 | NA | NA | tidyverse | dplyr |
| 2995 | 2017-07-25 01:19:26 | dplyr feature request, between() for character variables. | lhunsicker | 2 | I was referred here by the folks at RStudio. I’m not sure that I’m in the right place, as this is not a bug, but a suggestion for a new feature. If there is a more appropriate place for this, let me know. The between() function in dplyr is very nice and much appreciated. But it presently only w | data framefeature | tidyverse/dplyr#2995 | NA | NA | tidyverse | dplyr |
| 3259 | 2017-12-22 14:53:54 | bind_rows() using tibbles with attributes loses attributes | DavisVaughan | 2 | I assume this is very similar to #2457. Using bind_rows() on two tibbles where either one has extra attributes removes all extra attributes. Perhaps an approach similar to #1692 can be taken where the attributes of the first are kept? Ideally I would like bind_rows() to be generic but I’ve read all | data framefeature | tidyverse/dplyr#3259 | NA | NA | tidyverse | dplyr |
| 3314 | 2018-01-19 14:09:20 | Link to tidyselect from ?select and ?rename | krlmlr | 2 | The documentation of select() and rename() should link to the functions in tidyselect for more details and examples. | docs | tidyverse/dplyr#3314 | NA | NA | tidyverse | dplyr |
| 3357 | 2018-02-13 15:35:15 | order_by() could have an error hint when confused with arrange() | econandrew | 2 | For those of us who sometimes have SQL on the brain… > df <- df %>% order_by(value) Error: `call` must be a function call, not a symbol > df <- df %>% order_by(-value) Error: Can’t use matrix or array for column indexing > df <- df %>% arrange(-value) e.g. Did you mean to use arrange()? | data framefeature | tidyverse/dplyr#3357 | NA | NA | tidyverse | dplyr |
| 1092 | 2015-04-21 18:26:54 | Support for integer64 column in data frame? | coloneltriq | 1 | group_by() doesn’t appear to support integer64 columns. I saw an issue from about a year ago that said that support for 64 bit integers wasn’t available, but might be added. Has anything changed in that regard? This is a significant issue for me, as I’m dealing with a database whose index column | data framefeature | tidyverse/dplyr#1092 | 10 |  | tidyverse | dplyr |
| 2183 | 2016-10-18 14:57:49 | feature request : add `merge` indicator after a merge in dplyr | randomgambit | 1 | Hello there, Congrats for the great work here! I have a suggestion to make. Is there a way to get the equivalent of a _merge indicator variable after a merge in Dplyr? Something similar to Pandas’ indicator = True option that essentially tells you how the merge went (how many matches from each datas | featuregeneric | tidyverse/dplyr#2183 | NA | NA | tidyverse | dplyr |
| 2355 | 2017-01-07 21:53:18 | Use serialization for columns of type “list” | krlmlr | 1 | in join (#2194) and distinct (#2222) operations. We should be able to serialize all elements of a list efficiently by calling R_Serialize() for each. A specialized version of JoinVisitorImpl and VectorVisitor would then operate on the serializations. Perhaps hashing the serialization will be good en | data framefeature | tidyverse/dplyr#2355 | NA | NA | tidyverse | dplyr |
| 2922 | 2017-06-27 20:20:32 | Investigate R_ObjectTables | krlmlr | 1 | instead of bindrcpp. benchmark robustness drop-in replacement? See RProtoBuf for an example implementation: https://github.com/eddelbuettel/rprotobuf/blob/20cc4ab41b36c9582adeed182f1c429e38df6be4/src/lookup.cpp#L222. Existing package: http://www.omegahat.net/RObjectTables/. | data framefeature | tidyverse/dplyr#2922 | NA | NA | tidyverse | dplyr |
| 2984 | 2017-07-19 21:59:06 | Show example of vars() with mutate_at() in select_helpers | eamoncaddigan | 1 | Tidy evaluation has hit CRAN and data analyzers everywhere are probably stumbling over it (I know I did). The deprecation warning that appeared during a call mutate_each() was helpful for me, but perusing the docs wasn’t enough to get my code to work. The documentation for select_helpers would be m | docs | tidyverse/dplyr#2984 | NA | NA | tidyverse | dplyr |
| 3059 | 2017-08-28 14:22:37 | Add warning when two different timezones are joined | danielsjf | 1 | This relates to this issue: #2643 The decision was taken to convert to UTC when two different posixct columns are joined. Indeed, the timezone is often only used for presentation, but some functions such as lubridate::year() work on the current timezone. Therefore such a decision could have an impac | data framefeature | tidyverse/dplyr#3059 | NA | NA | tidyverse | dplyr |
| 3128 | 2017-09-27 15:21:23 | More precise error messages for mutate() et al. | krlmlr | 1 | When propagating an error from the R interpreter, we should mention specifically what caused the error, because the messages may not be very helpful, as in this SO example where the cause is a mistyped column name but error is simply: Error in mutate_impl(.data, dots): cannot coerce type ‘closure’ t | data framefeature | tidyverse/dplyr#3128 | NA | NA | tidyverse | dplyr |
| 3205 | 2017-11-13 14:40:45 | translate_sql, as.character in combination with %in% | jessekps | 1 | dbplyr version 1.1.0 If I use %in% in combination with as.character, translate_sql generates SQL that is not valid for sqlite library(RSQLite) library(dbplyr) db = dbConnect(SQLite(), ‘:memory:’) translate_sql(booklet_id %in% as.character(1:4), con=db ) #> “booklet_id” IN CAST((1, 2, 3, 4) AS | bugdatabase | tidyverse/dplyr#3205 | NA | NA | tidyverse | dplyr |
| 3267 | 2017-12-29 21:28:00 | Feature Request: Preferred Column Values After Merge | billdenney | 1 | I often want to join two data.frames and then select the “best” result from the output columns. This happens when I may have two sources of information with partially overlapping information. One source may be more reliable than the other, so I would prefer to use source 1 if it has a value. If so | featuregeneric | tidyverse/dplyr#3267 | NA | NA | tidyverse | dplyr |
| 3278 | 2018-01-01 21:45:11 | bind_rows should have col_type argument | hadley | 1 | So you could prespecify the types of the columns. For performance and safety. | data framefeature | tidyverse/dplyr#3278 | NA | NA | tidyverse | dplyr |
| 3335 | 2018-02-03 18:24:15 | filtering wide tibble is slow | cnjr2 | 1 | I find that filtering operations can be quite slow with wide tibbles. Here is an example of a 500 x 100,001 table (which is still quite modest), where the first column has a sample_id information. library(dplyr) library(purrr) n_samples <- 500 n_features <- 100000 df <- bind_cols( tibble(sample_ | data frameperformance | tidyverse/dplyr#3335 | NA | NA | tidyverse | dplyr |
| 3347 | 2018-02-08 01:03:51 | Teradata ROW_NUMBER() OVER (PARTITION BY …) issue | jakefrost | 1 | Hi all, thanks for all your work on the Teradata translations for dbplyr. One issue I’ve come across is that ROW_NUMBER() window functions generated by dbplyr produce errors. For example, if I run this code: flights %>% select(record_id, record_create_dt, acct_num, dep_dt, origin) %>% group_by | database | tidyverse/dplyr#3347 | NA | NA | tidyverse | dplyr |
| 3370 | 2018-02-22 05:47:07 | transmute does not work with DBI | danielmcauley | 1 | transmute() exhibits desired behavior with data frames but throws an error when used in the context of a DBI. # this runs library(dplyr) transmute(mtcars, blah = cyl) # this does not library(dbplyr) con <- DBI::dbConnect(RSQLite::SQLite(), “:memory:”) copy_to(con, mtcars) mtcars2 <- tbl(con, “mt | databasereprex | tidyverse/dplyr#3370 | NA | NA | tidyverse | dplyr |
| 3383 | 2018-02-28 21:08:27 | Update performance measurements | krlmlr | 1 | per #2557 (comment), CC @lionel-. Perhaps add tests from tpchr. | data frameperformance | tidyverse/dplyr#3383 | NA | NA | tidyverse | dplyr |
| 3429 | 2018-03-15 14:42:08 | Implement and use reconstruct() internally | krlmlr | 1 | to get rid of redundant tbl_df methods which complicate navigating the code. When sloop is ready, we’ll be ready to switch. | architecturefeaturegeneric | tidyverse/dplyr#3429 | NA | NA | tidyverse | dplyr |
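
Since the result is an ordinary data frame, you could also narrow it down to the most up-voted issues before printing. Below is a minimal base R sketch; `issues_demo` is a hypothetical stand-in built from four rows of the table above (so it runs without calling the GitHub API), and the cutoff of 5 thumbs up is an arbitrary choice for illustration.

```r
# Hypothetical stand-in for `issues`, using four rows copied from the table.
issues_demo <- data.frame(
  number = c(3314L, 2240L, 341L, 2326L),
  title = c(
    "Link to tidyselect from ?select and ?rename",
    "join_by(): Syntax for generic joins",
    "Preserve zero-length groups",
    "Single table verbs should accept tibbles in conditions"
  ),
  thumbs_up_no = c(2L, 34L, 9L, 8L),
  stringsAsFactors = FALSE
)

# Sort by votes, highest first, then keep issues at or above the cutoff.
min_votes <- 5L  # arbitrary threshold, chosen only for this example
ranked <- issues_demo[order(issues_demo$thumbs_up_no, decreasing = TRUE), ]
popular <- ranked[ranked$thumbs_up_no >= min_votes, ]

print(popular[, c("number", "title", "thumbs_up_no")], row.names = FALSE)
```

The same two steps should work unchanged on the real `issues` object, assuming it still has the `thumbs_up_no` column shown in the table.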