@maelle
Created April 12, 2018 13:11
Thumbs up via ghrecipes

You can get a repository's issues ordered by the number of thumbs-up votes on their first comment via ghrecipes! The idea is Kirill Müller’s.

# Fetch the issues of tidyverse/dplyr, ordered by thumbs-up votes on the first comment
issues <- ghrecipes::get_issues_thumbs(owner = "tidyverse",
                                       repo = "dplyr")

You could then inspect the result with View(issues) or, for example, render it as a table:

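# Collapse repeated whitespace (including line breaks) in every character column
# so that each issue fits on a single table row
issues <- dplyr::mutate_if(issues, is.character, stringr::str_squish)
# Render the data frame as a table
knitr::kable(issues)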
| number | created_at | title | author | thumbs_up_no | body | labels | url | milestone_no | milestone_desc | owner | repo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2240 | 2016-11-08 08:46:46 | join_by(): Syntax for generic joins | krlmlr | 34 | hadley#557 (comment) and hadley#378 (comment) propose a syntax for generic and rolling joins: left_join( FundMonths, Returns, join_by(FundID == FundID, yearmonth > gmonth + 3, yearmonth <= gmonth + 15) ) left_join( events, days, join_by(collector_id == collector_id, event_timestamp >= la | featuregeneric | tidyverse/dplyr#2240 | NA | NA | tidyverse | dplyr |
| 341 | 2014-03-20 12:57:03 | Preserve zero-length groups | hadley | 9 | http://stackoverflow.com/questions/22523131 Not sure what the interface to this should be - probably should default to drop = FALSE. | data framefeature | tidyverse/dplyr#341 | 10 | | tidyverse | dplyr |
| 2326 | 2016-12-15 23:18:13 | Single table verbs should accept tibbles in conditions | hadley | 8 | Currently mutate() and summarise() only work with vectorised functions: functions that take a vector as input and return a vector (or “scalar”) as output. I don’t see any reason why summarise() and mutate() couldn’t also accept tibbles. The existing restrictions would continue to apply so that in s | data framefeature | tidyverse/dplyr#2326 | NA | NA | tidyverse | dplyr |
| 2047 | 2016-07-30 22:11:31 | FR: before and after arguments to mutate() | krlmlr | 6 | These arguments would specify the position where the new columns are inserted, before or after some column given as index or name. See also tidyverse/tibble#99. | featuregeneric | tidyverse/dplyr#2047 | NA | NA | tidyverse | dplyr |
| 1185 | 2015-05-30 17:43:23 | Create a group_indices as a new variable | matthieugomez | 5 | Some packages like ggplot2 act on groups defined by one variable only (as opposed to groups defined by several variables). It would be nice to have a function, say group(), that creates a new integer variable from groups defined by multiple variables: Batting %>% mutate(group = group(teamID, yearID) | data framefeature | tidyverse/dplyr#1185 | NA | NA | tidyverse | dplyr |
| 1792 | 2016-04-28 15:09:13 | set_key | hadley | 5 | library(nycflights13) weather <- flights %>% set_key(year, month, day, hour, origin) planes <- planes %>% set_key(tailnum) airlines <- airlines %>% set_key(carrier) airports <- airports %>% set_key(faa) This would check that the combination of variables is a valid key (i.e. no duplicates and no miss | data framefeature | tidyverse/dplyr#1792 | 10 | | tidyverse | dplyr |
| 977 | 2015-02-17 19:26:36 | n_distinct way slower than length(unique) | dpeterson71 | 3 | I would like to move to more uniform implementation of dplyr memes; I really like the syntax. However, I am seeing several instances where dplyr analogues to plyr or base-R functions incur a severe performance hit on my data sets. Here is a simple example ilustrating that dplyr’s n_distinct is a fa | data framefeatureperformance | tidyverse/dplyr#977 | 10 | | tidyverse | dplyr |
| 2132 | 2016-09-21 17:42:51 | Summarising verbs with variable-length outputs | lionel- | 2 | A new dplyr family of verbs for variable-length output may be useful. Like summarise() it would discard all input columns except for the grouping variables. This allows the output to have a different number of rows than the input. Unlike summarise(), it would not require length 1 results and would | featuregeneric | tidyverse/dplyr#2132 | NA | NA | tidyverse | dplyr |
| 2432 | 2017-02-16 20:47:38 | Better support combining for non-base types | hadley | 2 | This is a meta issue related for all bind/grouped-mutate/join/filter issues related to custom S3 + S4 classes chron (#1829) lubridate (#1581, #1708) difftime (#2059) table (#2406) POSIXct (#2322) bit64 (#3210) See also r-lib/vctrs#27 | bugdata frame | tidyverse/dplyr#2432 | 13 | | tidyverse | dplyr |
| 2993 | 2017-07-23 09:06:53 | DBI sources can’t provide sampling implementation | hannesmuehleisen | 2 | When using src_sql, it is possible to attach a custom class to tbl objects. But the documentation for src_sql states that it is deprecated and that src_dbi should be used instead. When using src_dbi, no custom class can be set for tbl objects (at least not as far as I can tell). This is mostly fine, | databasefeaturegeneric | tidyverse/dplyr#2993 | NA | NA | tidyverse | dplyr |
| 2995 | 2017-07-25 01:19:26 | dplyr feature request, between() for character variables. | lhunsicker | 2 | I was referred here by the folks at RStudio. I’m not sure that I’m in the right place, as this is not a bug, but a suggestion for a new feature. If there is a more appropriate place for this, let me know. The between() function in dplyr is very nice and much appreciated. But it presently only w | data framefeature | tidyverse/dplyr#2995 | NA | NA | tidyverse | dplyr |
| 3259 | 2017-12-22 14:53:54 | bind_rows() using tibbles with attributes loses attributes | DavisVaughan | 2 | I assume this is very similar to #2457. Using bind_rows() on two tibbles where either one has extra attributes removes all extra attributes. Perhaps an approach similar to #1692 can be taken where the attributes of the first are kept? Ideally I would like bind_rows() to be generic but I’ve read all | data framefeature | tidyverse/dplyr#3259 | NA | NA | tidyverse | dplyr |
| 3314 | 2018-01-19 14:09:20 | Link to tidyselect from ?select and ?rename | krlmlr | 2 | The documentation of select() and rename() should link to the functions in tidyselect for more details and examples. | docs | tidyverse/dplyr#3314 | NA | NA | tidyverse | dplyr |
| 3357 | 2018-02-13 15:35:15 | order_by() could have an error hint when confused with arrange() | econandrew | 2 | For those of us who sometimes have SQL on the brain… > df <- df %>% order_by(value) Error: call must be a function call, not a symbol > df <- df %>% order_by(-value) Error: Can’t use matrix or array for column indexing > df <- df %>% arrange(-value) e.g. Did you mean to use arrange()? | data framefeature | tidyverse/dplyr#3357 | NA | NA | tidyverse | dplyr |
| 1092 | 2015-04-21 18:26:54 | Support for integer64 column in data frame? | coloneltriq | 1 | group_by() doesn’t appear to support integer64 columns. I saw an issue from about a year ago that said that support for 64 bit integers wasn’t available, but might be added. Has anything changed in that regard? This is a significant issue for me, as I’m dealing with a database whose index column | data framefeature | tidyverse/dplyr#1092 | 10 | | tidyverse | dplyr |
| 2183 | 2016-10-18 14:57:49 | feature request : add merge indicator after a merge in dplyr | randomgambit | 1 | Hello there, Congrats for the great work here! I have a suggestion to make. Is there a way to get the equivalent of a _merge indicator variable after a merge in Dplyr? Something similar to Pandas’ indicator = True option that essentially tells you how the merge went (how many matches from each datas | featuregeneric | tidyverse/dplyr#2183 | NA | NA | tidyverse | dplyr |
| 2355 | 2017-01-07 21:53:18 | Use serialization for columns of type “list” | krlmlr | 1 | in join (#2194) and distinct (#2222) operations. We should be able to serialize all elements of a list efficiently by calling R_Serialize() for each. A specialized version of JoinVisitorImpl and VectorVisitor would then operate on the serializations. Perhaps hashing the serialization will be good en | data framefeature | tidyverse/dplyr#2355 | NA | NA | tidyverse | dplyr |
| 2922 | 2017-06-27 20:20:32 | Investigate R_ObjectTables | krlmlr | 1 | instead of bindrcpp. benchmark robustness drop-in replacement? See RProtoBuf for an example implementation: https://github.com/eddelbuettel/rprotobuf/blob/20cc4ab41b36c9582adeed182f1c429e38df6be4/src/lookup.cpp#L222. Existing package: http://www.omegahat.net/RObjectTables/. | data framefeature | tidyverse/dplyr#2922 | NA | NA | tidyverse | dplyr |
| 2984 | 2017-07-19 21:59:06 | Show example of vars() with mutate_at() in select_helpers | eamoncaddigan | 1 | Tidy evaluation has hit CRAN and data analyzers everywhere are probably stumbling over it (I know I did). The deprecation warning that appeared during a call mutate_each() was helpful for me, but perusing the docs wasn’t enough to get my code to work. The documentation for select_helpers would be m | docs | tidyverse/dplyr#2984 | NA | NA | tidyverse | dplyr |
| 3059 | 2017-08-28 14:22:37 | Add warning when two different timezones are joined | danielsjf | 1 | This relates to this issue: #2643 The decision was taken to convert to UTC when two different posixct columns are joined. Indeed, the timezone is often only used for presentation, but some functions such as lubridate::year() work on the current timezone. Therefore such a decision could have an impac | data framefeature | tidyverse/dplyr#3059 | NA | NA | tidyverse | dplyr |
| 3128 | 2017-09-27 15:21:23 | More precise error messages for mutate() et al. | krlmlr | 1 | When propagating an error from the R interpreter, we should mention specifically what caused the error, because the messages may not be very helpful, as in this SO example where the cause is a mistyped column name but error is simply: Error in mutate_impl(.data, dots): cannot coerce type ‘closure’ t | data framefeature | tidyverse/dplyr#3128 | NA | NA | tidyverse | dplyr |
| 3205 | 2017-11-13 14:40:45 | translate_sql, as.character in combination with %in% | jessekps | 1 | dbplyr version 1.1.0 If I use %in% in combination with as.character, translate_sql generates SQL that is not valid for sqlite library(RSQLite) library(dbplyr) db = dbConnect(SQLite(), ‘:memory:’) translate_sql(booklet_id %in% as.character(1:4), con=db ) #> “booklet_id” IN CAST((1, 2, 3, 4) AS | bugdatabase | tidyverse/dplyr#3205 | NA | NA | tidyverse | dplyr |
| 3267 | 2017-12-29 21:28:00 | Feature Request: Preferred Column Values After Merge | billdenney | 1 | I often want to join two data.frames and then select the “best” result from the output columns. This happens when I may have two sources of information with partially overlapping information. One source may be more reliable than the other, so I would prefer to use source 1 if it has a value. If so | featuregeneric | tidyverse/dplyr#3267 | NA | NA | tidyverse | dplyr |
| 3278 | 2018-01-01 21:45:11 | bind_rows should have col_type argument | hadley | 1 | So you could prespecify the types of the columns. For performance and safety. | data framefeature | tidyverse/dplyr#3278 | NA | NA | tidyverse | dplyr |
| 3335 | 2018-02-03 18:24:15 | filtering wide tibble is slow | cnjr2 | 1 | I find that filtering operations can be quite slow with wide tibbles. Here is an example of a 500 x 100,001 table (which is still quite modest), where the first column has a sample_id information. library(dplyr) library(purrr) n_samples <- 500 n_features <- 100000 df <- bind_cols( tibble(sample_ | data frameperformance | tidyverse/dplyr#3335 | NA | NA | tidyverse | dplyr |
| 3347 | 2018-02-08 01:03:51 | Teradata ROW_NUMBER() OVER (PARTITION BY …) issue | jakefrost | 1 | Hi all, thanks for all your work on the Teradata translations for dbplyr. One issue I’ve come across is that ROW_NUMBER() window functions generated by dbplyr produce errors. For example, if I run this code: flights %>% select(record_id, record_create_dt, acct_num, dep_dt, origin) %>% group_by | database | tidyverse/dplyr#3347 | NA | NA | tidyverse | dplyr |
| 3370 | 2018-02-22 05:47:07 | transmute does not work with DBI | danielmcauley | 1 | transmute() exhibits desired behavior with data frames but throws an error when used in the context of a DBI. # this runs library(dplyr) transmute(mtcars, blah = cyl) # this does not library(dbplyr) con <- DBI::dbConnect(RSQLite::SQLite(), “:memory:”) copy_to(con, mtcars) mtcars2 <- tbl(con, “mt | databasereprex | tidyverse/dplyr#3370 | NA | NA | tidyverse | dplyr |
| 3383 | 2018-02-28 21:08:27 | Update performance measurements | krlmlr | 1 | per #2557 (comment), CC @lionel-. Perhaps add tests from tpchr. | data frameperformance | tidyverse/dplyr#3383 | NA | NA | tidyverse | dplyr |
| 3429 | 2018-03-15 14:42:08 | Implement and use reconstruct() internally | krlmlr | 1 | to get rid of redundant tbl_df methods which complicate navigating the code. When sloop is ready, we’ll be ready to switch. | architecturefeaturegeneric | tidyverse/dplyr#3429 | NA | NA | tidyverse | dplyr |
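
Since the result is a regular data frame, you could also narrow it down before printing. Below is a minimal sketch in base R that keeps only issues at or above a vote threshold. So that it runs offline, `issues` here is a small stand-in built by hand from four rows and three columns of the table above, not a fresh call to ghrecipes; `min_thumbs` and `popular` are hypothetical names chosen for illustration.

```r
# Stand-in for the output of ghrecipes::get_issues_thumbs(): four rows and
# three columns copied from the table above (assumption: the real result
# has these column names, as the table header suggests).
issues <- data.frame(
  number = c(2240, 341, 2326, 3429),
  title = c("join_by(): Syntax for generic joins",
            "Preserve zero-length groups",
            "Single table verbs should accept tibbles in conditions",
            "Implement and use reconstruct() internally"),
  thumbs_up_no = c(34, 9, 8, 1),
  stringsAsFactors = FALSE
)

# Hypothetical threshold: keep issues with at least this many thumbs up.
min_thumbs <- 5
popular <- issues[issues$thumbs_up_no >= min_thumbs, ]

# Sort from most to least voted, in case the input order was changed.
popular <- popular[order(-popular$thumbs_up_no), ]

print(popular[, c("number", "thumbs_up_no", "title")])
```

With the real `issues` object the same two subsetting lines should work unchanged, provided the `thumbs_up_no` column is numeric.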