class: center, middle, inverse, title-slide

.title[
# A toolbox for debugging and refactoring in R
]
.author[
### Antoine Fabri
]
.author[
### cynkra GmbH
]
.date[
### June 8th, 2023
]

---
class: small20

<style type="text/css">
.small20 { font-size: 20px; }
.big40 { font-size: 40px; }
.xlarge { font-size: 150% }
.large { font-size: 130% }
.medium { font-size: 80% }
.small { font-size: 70% }
.xsmall { font-size: 50% }
.caption { text-align: center; font-size: .8rem; }
.small-code .remark-code{ font-size: 80% }
.xsmall-code .remark-code{ font-size: 50% }
</style>

## Hi I'm Antoine

.pull-left[
]

.pull-right[
]

.pull-left[

* https://www.cynkra.com/about/
* https://github.com/moodymudskipper
* https://github.com/cynkra
* https://stackoverflow.com/
* https://twitter.com/antoine_fabri

]

.pull-right[
![](images/me.jpg)
]

---
class: small20

## Resources

.small-code[
```r
install.packages(c("tidyverse", "lintr", "styler", "usethis", "renv", "covr", "flow", "constructive"))
remotes::install_github("moodymudskipper/refactor")
remotes::install_github("moodymudskipper/boomer")
```
]

- {flow} : https://moodymudskipper.github.io/flow/
- {constructive} : https://cynkra.github.io/constructive/
- {boomer} : https://moodymudskipper.github.io/boomer/
- {refactor} : https://github.com/moodymudskipper/refactor
- Good practice: https://design.tidyverse.org
- Style: https://style.tidyverse.org
- Git: https://happygitwithr.com
- Packages: https://r-pkgs.org
- Base R debugging:
    * https://adv-r.hadley.nz/debugging.html
    * https://www.youtube.com/watch?v=9vABzGCQeqU

---
class: big40

## Warm up

- **Warm up** 📌
- Setup
- Clean up
- Step up
- Fix up
- Wrap up

---

## Warm up: Bugs everywhere

- You'll spend about half of your time fixing bugs or introducing new ones
- Debugging : bugs bugs bugs
- Refactoring : planning for fewer bugs, introducing new bugs
- Features : design them, try to break them, solve the bugs
- R users are not developers

---
class: small20

## Warm up: Mental game

- You really really want to believe that the code you wrote is good
- You don't know when you'll be done, your boss is waiting, your client is paying
- You feel like an imposter because it "should be easy"
- You catch yourself just staring at the screen, you're not even thinking anymore
- Or you're thinking hard, but you don't really know what you're thinking about

.center[
![](images/debugger.png)
.caption[https://www.xkcd.com/]
]

---

## Warm up: Mental game

.pull-left[
Confused? Distressed?

- Have a walk
- Write it down
- Talk it out
]

.pull-right[.center[
![](images/duck.png)
.caption[https://www.smbc-comics.com/]
]]

<!-- You don't need to be as smart as your younger brother, you'll beat him with experience, knowledge, wisdom, mental fortitude. And also, you shouldn't care about being better than your brother -->

---

## Warm up: Art or science ? A bit of both ?

.pull-left[
<img src="images/art.png" style="width: 80%" />
]

.pull-right[
<img src="images/science.jpg" style="width: 80%" />
]

---

## Warm up: How to get out of trouble? Don't get in trouble!

.center[
<img src="images/wondering.jpg" style="width: 50%" />
]

- There are ways to get out of trouble
- But did you really have to get into trouble ?

---

## Warm up: How not to get into trouble

- Follow good practice for your own code
- Depend on good code, don't reinvent the wheel
- Don't let technical debt accumulate

.center[
<img src="images/debt.png" style="width: 80%" />
]

---

## Warm up: Technical debt

- Cost of taking shortcuts or making compromises
- Tight project deadlines
- Lack of resources
- Inexperience
- Devs just wanna have fun ?
- ...

Refactoring = investment to reduce technical debt (and recover your sanity)

---
class: big40

## Setup

- Warm up
- **Setup** 📌
- Clean up
- Step up
- Fix up
- Wrap up

---

## Setup

* We have a messy code base (maybe not your code)
* We want a clean package to enjoy dedicated tools

... Where do we start ?

![](images/hadoken.jpeg)

---
class: small20

## Setup

A path to refactor a messy codebase into a package.

In a real project the order might not be strict, and steps might overlap.

- Create a project
- Version control
- Syntactic code
- No absolute paths
- Create a package
- Declare dependencies
- Extract existing functions

---

## Setup: Create a project

- Create a project 📌
- Version control
- Syntactic code
- No absolute paths
- Create a package
- Declare dependencies
- Extract existing functions

---

## Setup: Create a project

- No more messy flying scripts on your desktop

```r
usethis::create_project()
```

Organization / collaboration / reproducibility...

=> Sets you up for the next steps

---

## Setup: Version control

- Create a project : `usethis::create_project()`
- Version control 📌
- Syntactic code
- No absolute paths
- Create a package
- Declare dependencies
- Extract existing functions

---

## Setup: Version control

Without version control :

- Email team about updates
- Updates directly on production server
- Previous code is lost, accidental deletions are deadly
- No trace of who made the changes
- Versions of code between users might be out of sync
- Users shy to make any change

---

## Setup: Version control

With version control :

- Version control itself is a communication tool
- Work on branches without affecting production code until confident
- All changes can be reverted
- All changes and their author can be identified
- Everyone is synced
- No harm is irreversible, so users are more confident

---

## Setup: Version control

.pull-left[

* Not much knowledge needed to start
* Can be done from RStudio
* Jenny Bryan https://happygitwithr.com

```r
usethis::use_git()
```

]

.pull-right[.center[
![](images/git_2x.png)
.caption[http://www.xkcd.com]
]]

---

## Setup: Syntactic code

- Create a project : `usethis::create_project()`
- Version control : `usethis::use_git()`
- Syntactic code 📌
- No absolute paths
- Create a package
- Declare dependencies
- Extract existing functions

---

## Setup: Syntactic code

- Codebases often contain non syntactic code
- Messy WIP files
- Code carelessly commented out
- Incorrect copy and paste
- ...

`refactor::check_files_parse()` will check all files of the project and make sure that R scripts really are R scripts and that their code is syntactic.

```r
refactor::check_files_parse()
```

---

## Setup: No absolute paths

- Create a project : `usethis::create_project()`
- Version control : `usethis::use_git()`
- Syntactic code : `refactor::check_files_parse()`
- No absolute paths 📌
- Create a package
- Declare dependencies
- Extract existing functions

---

## Setup: No absolute paths

- Avoiding absolute paths is the norm in software development
- They force all users to use the same directory layout
- First reason why your code is not reproducible

If external data is stored in files outside of the project :

- Paths to files should be set in environment variables, options or config files
- Could you have a data package with those ?

---

## Setup: No absolute paths

Relative paths, relative to what ?
- Relative paths are relative to the working directory
- By default the project folder in an R script if working in a project
- By default the Rmd file's folder in case of a report
- A function might call `setwd()` and alter it and then your scripts don't work anymore
- They are often built with `file.path()`

Using `setwd()` sets you up for bad surprises: other scripts can call `setwd()` too and disrupt our code, possibly writing files in the wrong places.

---

## Setup: No absolute paths

`here::here()` creates a path relative to the project folder. When {here} is loaded it fetches the current working directory (often but not always the project root itself) and finds the project root using heuristics.

- It guarantees your scripts and Rmds will refer to the same project root
- Functions that use it won't be polluted by a user or function calling `setwd()`

---

## Setup: No absolute paths

* Use {lintr} to detect absolute paths so you can convert them, and to find problematic function calls.
* Use `here::here()` in markdown reports so they have the same wd as your R scripts

```r
library(lintr)
## Find absolute paths
lint_dir(linters = absolute_path_linter())
## Find uses of undesirable functions setwd and getwd
lint_dir(linters = undesirable_function_linter(c(setwd = NA, getwd = NA)))
## in a markdown report
here::here("hello", "world.png")
```

---

## Setup: Create a package

- Create a project : `usethis::create_project()`
- Version control : `usethis::use_git()`
- Syntactic code : `refactor::check_files_parse()`
- No absolute paths : `lintr::lint_dir()`
- Create a package 📌
- Declare dependencies
- Extract existing functions

---

## Setup: Create a package

- Let's make our current project a package!

```r
usethis::create_package() # locally
usethis::create_package(path) # at chosen location
```

- Move or copy our current project into an "inst/" subfolder.
- Or edit `.Rbuildignore` to ignore some folders
- We have a package! (with no object yet!)
- Hadley Wickham https://r-pkgs.org

---

## Setup: Declare dependencies

- Create a project : `usethis::create_project()`
- Version control : `usethis::use_git()`
- Syntactic code : `refactor::check_files_parse()`
- No absolute paths : `lintr::lint_dir()`
- Create a package : `usethis::create_package()`
- Declare dependencies 📌
- Extract existing functions

---

## Setup: Declare dependencies

- With scripts we declare dependencies with `library()` calls
- Or we use the `dplyr::select()` notation and we often don't declare anything at all
- For packages we need to add dependencies to the DESCRIPTION file, then we can use the `dplyr::select()` notation
- If we want to call `select()` without prefix we also need to import it in our package

---

## Setup: Declare dependencies

- `renv::dependencies()` will analyse your code and attempt to retrieve all dependencies.
- Then edit the DESCRIPTION file to list those in `Imports`, or use `usethis::use_package("dplyr")`
- For meta packages like {tidyverse}, mention names separately: ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, forcats

---

## Setup: Declare dependencies

- Find all the `library()`/`require()` calls in your project
- Create an "R/imports.R" script with one line per package attached with `library()` :

```r
#' @import dplyr
#' @import ggplot2
NULL
```

- Ctrl + Shift + D ( or `devtools::document()`) will populate the NAMESPACE file
- Ctrl + Shift + L ( or `devtools::load_all()`) will attach the imported functions

---
class: small20

## Setup: Extract existing functions

- Create a project : `usethis::create_project()`
- Version control : `usethis::use_git()`
- Syntactic code : `refactor::check_files_parse()`
- No absolute paths : `lintr::lint_dir()`
- Create a package : `usethis::create_package()`
- Declare dependencies : `renv::dependencies()` `usethis::use_package()`
- Extract existing functions 📌

---

## Setup: Extract existing functions

Function definitions :

- Are not expensive to source
- Can be called in any order
- Can be run before the analysis
- Might clutter the scripts if they are not isolated
- Are better ultimately stored in packages
- Let's move them all to scripts under "R/"

```r
# Find scripts that contain both function definitions and other objects
refactor::identify_hybrid_scripts()
```

---
class: small20

## Setup

- Create a project : `usethis::create_project()`
- Version control : `usethis::use_git()`
- Syntactic code : `refactor::check_files_parse()`
- No absolute paths : `lintr::lint_dir()`
- Create a package : `usethis::create_package()`
- Declare dependencies : `renv::dependencies()` `usethis::use_package()`
- Extract existing functions : `refactor::identify_hybrid_scripts()`

---

## Clean up

- Warm up
- Setup
- **Clean up** 📌
- Step up
- Fix up
- Wrap up

---

## Clean up

* We have a working package
* We didn't lose any information
* We can recover an older situation if we observe unintended behavior
* Now we can tighten things up a bit!

---

## Clean up: Unit tests

- Unit tests 📌
- Extract new functions
- lint and style
- Explicit function imports
- Strive for quiet code

---
class: small20

## Clean up: Unit tests

Unit tests :

* Ensure that your functions work
* Ensure that what works keeps working

- {testthat} is a common framework for unit tests
- Snapshot tests are the easiest, they capture the output of a call and test it against subsequent runs
- They play very well with version control
- The priority is to write tests for functions you want to change

.small-code[
```r
usethis::use_testthat()
usethis::use_test()
expect_snapshot({
  x <- "hello"
  y <- "world"
  my_function(x, y)
})
```
]

---
class: small20

## Clean up: Unit tests coverage

- Coverage is a measure of how much of the code is tested
- Also makes testing a bit more fun

.small-code[
```r
covr::report()
```
]

![](images/coverage1.png)

---
class: small20

## Clean up: Unit tests coverage

.small-code[
```r
covr::report()
```
]

![](images/coverage2.png)

---

## Clean up: Extract new functions

- Unit tests
- Extract new functions 📌
- lint and style
- Explicit function imports
- Strive for quiet code

---

## Clean up: Extract new functions

Scripts are easier to start than functions :

- No need to think about a function name, no need to isolate arguments and output value
- We can run them line by line, no need for `browser()` or `debug()`/`debugonce()`

BUT :

* They're a slippery slope, the garden grows!
* If a script is confusing, it should probably be refactored into one or more functions
* Too many functions : rarely an issue, what about too many scripts ?

---

## Clean up: Extract new functions

* Build functions from scripts and place them in scripts under "R/"
* Hunt `source()` calls and convert them to good function calls
* Use `refactor::detect_similar_code()` to identify duplicated logic
* Go through these resources :
    - https://design.tidyverse.org
    - https://style.tidyverse.org
* Write new unit tests!

---

## Clean up: Extract new functions

- A good script is self contained, which means :
    - It loads all it needs
    - It writes its output and stops there: it's not there to populate the global environment with variables for further scripts to pick up

---

## Clean up: Extract new functions

- A good function works like a sourced script except it has :
    - Well defined inputs aka arguments (with optional defaults)
    - A well defined return value OR a well defined side-effect
    - A scope - less worry about name collision

---

## Clean up: Extract new functions

Pure functions:

- A pure function's only effect is to return an output
- This output depends only on its inputs

Side effect functions:

- Return their main argument or `NULL` invisibly

Avoid hybrid functions!
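
A minimal sketch of the distinction, assuming two made-up functions (`add_vat()` and `save_report()` are hypothetical names used only for illustration):

```r
# Pure function: the result depends only on the inputs and nothing else changes.
# `add_vat()` is a hypothetical example.
add_vat <- function(price, rate = 0.2) {
  price * (1 + rate)
}

# Side effect function: called for what it does (here, writing a file).
# It returns its main argument invisibly, so it can sit in the middle of a pipe.
# `save_report()` is a hypothetical example.
save_report <- function(data, path) {
  write.csv(data, path, row.names = FALSE)
  invisible(data)
}

prices <- data.frame(item = c("a", "b"), price = c(10, 20))
prices$price_vat <- add_vat(prices$price)

path <- tempfile(fileext = ".csv")
out <- save_report(prices, path)
```

A hybrid version would both write the file and return a freshly computed value, which makes it harder to test and to reuse.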
---

## Clean up: lint and style

- Unit tests
- Extract new functions
- lint and style 📌
- Explicit function imports
- Strive for quiet code

---

## Clean up: lint and style

- We can use the {lintr} package to improve our code step by step, we might use `lintr::lint_package()`
- `styler::style_pkg()` will make your code look good and consistent.
- `refactor:::use_lintr_template_on_dir()` will open a script where we can go through different linters

```r
refactor:::use_lintr_template_on_dir()
```

---

## Clean up: Explicit function imports

- Unit tests
- Extract new functions
- lint and style
- Explicit function imports 📌
- Strive for quiet code

---

## Clean up: Explicit function imports

- `#' @import dplyr` to `#' @importFrom dplyr select`
- `#' @import dplyr` to `dplyr::select()`
- to avoid conflicts due to new functions
- to be notified quickly if a function disappears
- to document what features we need from other packages

```r
refactor::find_pkg_funs("dplyr")
```

---

## Clean up: Strive for quiet code

- Unit tests
- Extract new functions
- lint and style
- Explicit function imports
- Strive for quiet code 📌

---

## Clean up: Strive for quiet code

* Code that talks too much is a sign some things are not robust
* We tend not to read it

- Treat most warnings as if they were errors
- Avoid messages too when you can
- Give an explicit `by` argument to {dplyr} join functions
- Provide the `col_types` argument to `readr::read_csv()`
- Ungroup your data!!! Use `.groups = "drop"` or `.by`
- ...
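
A minimal sketch of what quiet calls can look like, assuming {dplyr} and {readr} (version 2.0 or later, for literal data wrapped in `I()`) are installed. The inline CSV is made up for illustration:

```r
library(dplyr, warn.conflicts = FALSE)

# Explicit `by`: the join no longer reports which columns it guessed
joined <- left_join(band_members, band_instruments, by = "name")

# Explicit `.groups`: summarise() no longer reports how the output is grouped
counts <- mtcars |>
  group_by(cyl, gear) |>
  summarise(n = n(), .groups = "drop")

# Explicit `col_types`: read_csv() no longer prints the guessed column spec
# (the CSV content here is invented for the example)
csv <- readr::read_csv(I("id,name\n1,a\n2,b\n"), col_types = "ic")
```

Each explicit argument also documents an assumption, so a change in the data is more likely to fail loudly than to be silently absorbed.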
---

## Clean up: Strive for quiet code

- Unit tests
- Extract new functions
- lint and style
- Explicit function imports
- Strive for quiet code 📌

---

## Clean up: Extra steps

Not always required

* Export functions called directly by the scripts
* Install your package
* Move remaining scripts to a different project that calls `library(yourpkg)`
* OR create Rmd reports that call `devtools::load_all()` or `library(yourpkg)`

---

## Step up

- Warm up
- Setup
- Clean up
- **Step up** 📌
- Fix up
- Wrap up

---

## Step up

* We have a clean package
* We have a set of steps/principles to apply to improve it and keep it clean

Some additional tools might help

* {refactor}
* {flow}

---
class: big40

## Step up : {refactor}

* {refactor} 📌
* {flow}

---

## Step up : {refactor}

* `remotes::install_github("moodymudskipper/refactor")`
* `%refactor_value%`
* `%refactor_chunk%`

- If functions are well tested we can change them fearlessly
- But how much do you trust your tests ?
- How nice would it be to test your refactoring safely on real cases continuously for some time ?

---
class: small20

## Step up : {refactor}

`%refactor_value%`

* Runs each side
* Fails explicitly when the output is different
* Often used with functions

.small-code[
```r
library(refactor) # or import in your pkg
multiply <- function(x, y) {
  purrr::reduce(
    replicate(y, x),
    .init = x,
    \(x, y) x + y
  ) - x
} %refactor_value% {
  x * y
}
```
]

---
class: small20

## Step up : {refactor}

Then we can use it for some time: both sides will be executed and we'll be notified if they give a different output

.small-code[
```r
multiply(2, 3)
```

```
## [1] 6
```

```r
multiply(2, 4.5)
```

```
## Error: The refactored expression returns a different value from the original one.
##
## `original`: 8
## `refactored`: 9
```
]

---
class: small20

## Step up : {refactor}

`%refactor_chunk%`

* Runs each side
* Fails explicitly when the environment changes are different
* Useful in scripts

.small-code[
```r
{
  data1 <- dplyr::filter(cars, speed < 5)
  data2 <- dplyr::mutate(data1, speed = speed * 1.60934, speed2 = speed * 1000/3600)
} %refactor_chunk% {
  data1 <- subset(cars, speed < 5)
  data2 <- transform(data1, speed = speed * 1.60934, speed2 = speed * 1000/3600)
}
```
]

---
class: small20

## Step up : {refactor}

.small-code[
```r
{
  data1 <- dplyr::filter(cars, speed < 5)
  data2 <- dplyr::mutate(data1, speed = speed * 1.60934, speed2 = speed * 1000/3600)
} %refactor_chunk% {
  data1 <- subset(cars, speed < 5)
  data2 <- transform(data1, speed = speed * 1.60934, speed2 = speed * 1000/3600)
}
```

```
## Error: The variable `data2` is bound to a different value after the original and refactored code
## original vs refactored
##                     speed2
## - original[1, ]   1.788156
## + refactored[1, ] 1.111111
## - original[2, ]   1.788156
## + refactored[2, ] 1.111111
##
## `original$speed2`: 1.8 1.8
## `refactored$speed2`: 1.1 1.1
```
]

---
class: big40

## Step up : {flow}

* {refactor}
* {flow} 📌

---
class: small20

## Step up : {flow}

{flow} helps you visualize:

* The logic of individual functions or scripts
* The dependencies between variables in a given function or script
* The dependencies between functions in a package, or scripts in a folder

```r
library(flow)
```

---
class: small20

## Step up : {flow}

Visualize the **logic of individual functions or scripts**

* `flow_view()` can be used on functions or paths

.xsmall-code[
.pull-left[
```r
flow_view(rle)
```

.center[
<img src="images/flow_view_rle.png" style="width: 80%" />
]
]

.pull-right[
```r
rle
```

```
## function (x)
## {
##     if (!is.vector(x) && !is.list(x))
##         stop("'x' must be a vector of an atomic type")
##     n <- length(x)
##     if (n == 0L)
##         return(structure(list(lengths = integer(), values = x),
##             class = "rle"))
##     y <- x[-1L] != x[-n]
##     i <- c(which(y | is.na(y)), n)
##     structure(list(lengths = diff(c(0L, i)), values = x[i]),
##         class = "rle")
## }
## <bytecode: 0x13b2ad840>
## <environment: namespace:base>
```
]
]

---
class: small20

## Step up : {flow}

* Easier to follow logic for long functions
* A good format to share with colleagues and management
* Lets you identify branches to refactor
* We can use the `out` arg to export or open temp files

```r
flow_view(data.frame)
```

---
class: small20

## Step up : {flow}

Visualize the **dependencies between variables** in a given function or script

* `flow_view_vars()` can be used on functions or paths

.xsmall-code[
.pull-left[
```r
flow_view_vars(ave)
```

.center[
<img src="images/flow_view_vars_ave.png" style="width: 50%" />
]
]

.pull-right[
```r
ave
```

```
## function (x, ..., FUN = mean)
## {
##     if (missing(...))
##         x[] <- FUN(x)
##     else {
##         g <- interaction(...)
##         split(x, g) <- lapply(split(x, g), FUN)
##     }
##     x
## }
## <bytecode: 0x11f74ec78>
## <environment: namespace:stats>
```
]
]

---
class: small20

## Step up : {flow}

Visualize the **dependencies between variables** in a given function or script

* `flow_view_vars()` can be used on functions or paths

.xsmall-code[
.pull-left[
```r
flow_view_vars(ave, expand = FALSE)
```

.center[
<img src="images/flow_view_vars_ave2.png" style="width: 50%" />
]
]

.pull-right[
```r
ave
```

```
## function (x, ..., FUN = mean)
## {
##     if (missing(...))
##         x[] <- FUN(x)
##     else {
##         g <- interaction(...)
##         split(x, g) <- lapply(split(x, g), FUN)
##     }
##     x
## }
## <bytecode: 0x11f74ec78>
## <environment: namespace:stats>
```
]
]

---
class: small20

## Step up : {flow}

Visualize the **dependencies between functions** in a package, or scripts in a folder

* `flow_view_deps()` shows the objects called recursively by its input

.small-code[
```r
flow_view_deps(dplyr::if_else)
```
]

<img src="images/flow_view_deps_if_else.png" style="width: 100%" />

---
class: small20

## Step up : {flow}

Visualize the **dependencies between functions** in a package, or scripts in a folder

* `flow_view_deps()` shows the objects called recursively by its input

.small-code[
```r
flow_view_deps(dplyr::if_else, show_imports = "packages")
```
]

<img src="images/flow_view_deps_if_else2.png" style="width: 100%" />

---
class: small20

## Step up : {flow}

Visualize the **dependencies between functions** in a package, or scripts in a folder

* `flow_view_deps()` shows the objects called recursively by its input

.small-code[
```r
flow_view_deps(dplyr::if_else, show_imports = "none")
```
]

<img src="images/flow_view_deps_if_else3.png" style="width: 100%" />

---
class: small20

## Step up : {flow}

Visualize the **dependencies between functions** in a package, or scripts in a folder

* `flow_view_uses()` shows the functions which call the input directly or indirectly

.small-code[
```r
flow_view_uses(dplyr::if_else)
```
]

.center[
<img src="images/flow_view_uses_if_else.png" style="width: 60%"/>
]

---
class: small20

## Step up : {flow}

Visualize the **dependencies between functions** in a package, or scripts in a folder

* `flow_view_shiny()` shows the modular structure of a shiny app

.small-code[
```r
flow_view_shiny(esquisse::esquisser, show_imports = "none")
```
]

<img src="images/flow_view_shiny_esquisse.png" style="width: 100%"/>

---

## Step up : {flow}

* {refactor} : continuous live testing
* {flow} : Understand the logic, improve the vision

---

## Fix up

- Warm up
- Setup
- Clean up
- Step up
- **Fix up** 📌
- Wrap up

---

## Fix up : Bugs!

Despite our efforts, trouble found us

.center[
<img src="images/wondering.jpg" style="width: 50%" />
]

---

## Fix up

* What's in a bug ? 📌
* Base R toolkit
* {flow}
* {constructive}
* {boomer}

---
class: small20

## Fix up: What's in a bug ?

A bug is a misunderstanding by the dev or user of either

* The Logic
* The State
* The Data

Where state is made of :

* versions
* OS
* system dependencies
* internet access etc.
* Random seed
* environment stack
* ...

---

## Fix up: What's in a bug ?

* The error message is an attempt to summarize those for you
* The documentation is an attempt to give you just the knowledge about the logic that you need
* You should really read them!
* But it might not be enough

---
class: small20

## Fix up: What's in a bug ?

Minimal reprex:

* Minimize the logic: minimal code, minimal dependencies
* Minimize the state: Run in new session
* Minimize the data

In those 3 dimensions we can either:

* start from nothing and add elements until it breaks
* start from a complex case and remove elements until it works

---
class: small20

## Fix up: What's in a bug ?

* Use the {reprex} package!
* Capture those 3 dimensions into something pretty and concise
* You're halfway there
* And likely to get an answer if you ask
    * http://www.stackoverflow.com
    * RStudio community
    * twitter

---

## Fix up

* What's in a bug ?
* Base R toolkit 📌
* {flow}
* {constructive}
* {boomer}

---
class: small20

## Fix up: Base R toolkit

Many resources are already available, this one is great:

* Quant Psych: debugging strategy for R
* https://www.youtube.com/watch?v=9vABzGCQeqU

I'll do a quick summary

---
class: small20

## Fix up: Base R toolkit

- `options(warn = 2)` : Turn warnings into errors to better identify the situation at the time of the warning
- `options(error = recover)` : Explore data at different places in the call stack
- `traceback()` : The sequence of calls that got you to an error
- `browser()`, `debug()`, `debugonce()` : Explore the logic step by step from a given point, explore the data.
- `typeof()`, `attributes()`, `str()`, `dput()`: often better than `print()` to understand objects.
- `message()`, `cat()`: Log information to the console or to a file. You can use `trace()`, `trace(, edit = TRUE)`, `untrace()` to insert logging calls in any function temporarily.
- `try()`, `tryCatch()`: Capture an error and, for instance, log it or start browsing
- `search()`, `sessionInfo()`, `Sys.info()`: Explore the state
- `on.exit()`: run some code whenever a function is exited, including on error

---

## Fix up: {flow}

* What's in a bug ?
* Base R toolkit
* {flow} 📌
* {constructive}
* {boomer}

---
class: small20

## Fix up: {flow}

Let's see how `flow_run()` can help us understand a bug better

.small-code[
```r
# this works
df <- data.frame(
  x = "Keep calm and",
  y = "love",
  z = "Ukraine"
)
df
```

```
##               x    y       z
## 1 Keep calm and love Ukraine
```
]

---
class: small20

## Fix up: {flow}

.small-code[
```r
# this also works
df$y <- emo::ji("heart")
df$z <- emo::ji("ukraine")
df
```

```
##               x  y  z
## 1 Keep calm and ❤️ 🇺🇦
```
]

---
class: small20

## Fix up: {flow}

.small-code[
```r
# but this doesn't, why ?
df <- data.frame(
  x = "Keep calm and",
  y = emo::ji("heart"),
  z = emo::ji("ukraine")
)
```

```
## Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors): cannot coerce class '"emoji"' to a data.frame
```
]

---
class: small20

## Fix up: {flow}

Let's explore

.small-code[
```r
flow_run(data.frame(
  x = "Keep calm and",
  y = emo::ji("heart"),
  z = emo::ji("ukraine")
), out = "png")
```
]

---
class: small20

## Fix up: {flow}

The function `flow_compare_runs()` will compare 2 calls to the same function.

.small-code[
```r
flow_compare_runs(rle(NULL), rle(c(1, 2, 2, 3)))
```
]

<img src="images/flow_compare_runs_rle.png" style="width: 40%"/>

---

## Fix up: {constructive}

* What's in a bug ?
* Base R toolkit
* {flow}
* {constructive} 📌
* {boomer}

---
class: small20

## Fix up: {constructive}

* {constructive} strives to represent data simply and accurately.
* Accuracy is crucial when debugging
* We tend to trust `print()` but we shouldn't

.small-code[
```r
df1 <- data.frame(date = as.Date("2023-08-24"), country = factor("UA"))
df2 <- data.frame(date = "2023-08-24", country = "UA")
attr(df2$date, "attr_date") <- "national_day"
attr(df2, "attr_df") <- "some_dataset"
df1
```

```
##         date country
## 1 2023-08-24      UA
```

```r
df2
```

```
##         date country
## 1 2023-08-24      UA
```
]

---
class: small20

## Fix up: {constructive}

`dput()` and `str()` are more accurate here, but hard to read:

.small-code[
```r
dput(df1)
```

```
## structure(list(date = structure(19593, class = "Date"), country = structure(1L, levels = "UA", class = "factor")), class = "data.frame", row.names = c(NA,
## -1L))
```

```r
dput(df2)
```

```
## structure(list(date = structure("2023-08-24", attr_date = "national_day"),
##     country = "UA"), row.names = c(NA, -1L), class = "data.frame", attr_df = "some_dataset")
```

```r
str(df1)
```

```
## 'data.frame': 1 obs. of 2 variables:
##  $ date   : Date, format: "2023-08-24"
##  $ country: Factor w/ 1 level "UA": 1
```

```r
str(df2)
```

```
## 'data.frame': 1 obs. of 2 variables:
##  $ date   : chr "2023-08-24"
##   ..- attr(*, "attr_date")= chr "national_day"
##  $ country: chr "UA"
##  - attr(*, "attr_df")= chr "some_dataset"
```
]

---
class: small20

## Fix up: {constructive}

By contrast `constructive::construct()` produces idiomatic code

.small-code[
```r
library(constructive)
construct(df1)
```

```
## data.frame(date = as.Date("2023-08-24"), country = factor("UA"))
```

```r
construct(df2)
```

```
## data.frame(
##   date = "2023-08-24" |>
##     structure(attr_date = "national_day"),
##   country = "UA"
## ) |>
##   structure(attr_df = "some_dataset")
```
]

---
class: small20

## Fix up: {constructive}

`constructive::construct_diff()` can be used to compare them

.small-code[
```r
construct_diff(df1, df2)
```
]

<img src="images/construct_diff.png" style="width: 80%" />

---
class: small20

## Fix up: {constructive}

Sometimes `dput()` is inaccurate.

.small-code[
```r
dput(dplyr::select)
```

```
## function (.data, ...)
## {
##     UseMethod("select")
## }
```

```r
construct(dplyr::select)
```

```
## (function(.data, ...) {
##   UseMethod("select")
## }) |>
##   (`environment<-`)(asNamespace("dplyr"))
```

```r
dput(RSQLite::SQLite())
```

```
## new("SQLiteDriver", )
```

```r
construct(RSQLite::SQLite())
```

```
## new(
##   "SQLiteDriver" |>
##     structure(package = "RSQLite")
## )
```
]

---
class: small20

## Fix up: {constructive}

Sometimes it is even non syntactic

.small-code[
```r
dput(environment(dplyr::select))
```

```
## <environment>
```

```r
construct(environment(dplyr::select))
```

```
## asNamespace("dplyr")
```
]

---
class: small20

## Fix up: {constructive}

Sometimes it is even non syntactic

.small-code[
```r
dt <- data.table::data.table(a=1)
dput(dt)
```

```
## structure(list(a = 1), row.names = c(NA, -1L), class = c("data.table",
## "data.frame"), .internal.selfref = <pointer: 0x14c80bee0>)
```

```r
construct(dt)
```

```
## data.table::data.table(a = 1)
```
]

---
class: small20

## Fix up: {constructive}

* We have a lot of control over how the code is generated
* This is done by using the `opts_*` functions implemented for supported classes

.small-code[
```r
construct(dplyr::band_members)
```

```
## tibble::tibble(name = c("Mick", "John", "Paul"), band = c("Stones", "Beatles", "Beatles"))
```

```r
construct(dplyr::band_members, opts_tbl_df("tribble"))
```

```
## tibble::tribble(
##   ~name,  ~band,
##   "Mick", "Stones",
##   "John", "Beatles",
##   "Paul", "Beatles",
## )
```
]

---
class: small20

## Fix up: {constructive}

Using the constructor "next" we can opt out of the idiomatic constructor and use the next method, yet still get a faithful object.
.small-code[
```r
construct(dplyr::band_members, opts_tbl_df("next"))
```

```
## data.frame(name = c("Mick", "John", "Paul"), band = c("Stones", "Beatles", "Beatles")) |>
##   structure(class = c("tbl_df", "tbl", "data.frame"))
```

```r
construct(dplyr::band_members, opts_tbl_df("next"), opts_data.frame("next"))
```

```
## list(name = c("Mick", "John", "Paul"), band = c("Stones", "Beatles", "Beatles")) |>
##   structure(class = c("tbl_df", "tbl", "data.frame"), row.names = 1:3)
```
]

---
class: small20

## Fix up: {constructive}

To reproduce a bug `construct_multi()` is handy:

.small-code[
```r
a <- head(cars, 2)
b <- letters
construct_multi(list(a = a, b = b))
```

```
## a <- data.frame(speed = c(4, 4), dist = c(2, 10))
##
## b <- c(
##   "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
##   "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
## )
```
]

---

## Fix up: {boomer}

* What's in a bug ?
* Base R toolkit
* {flow}
* {constructive}
* {boomer} 📌

---
class: small20

## Fix up: {boomer}

{boomer} makes expressions or functions verbose, so they print the output of each intermediate step.

`boom()` explodes a call

.small-code[
```r
# remotes::install_github("moodymudskipper/boomer")
library(boomer)
boom(1 + !1 * 2)
boom(subset(head(mtcars, 2), qsec > 17))
```
]

---
class: small20

## Fix up: {boomer}

* `rig()` sets up a function so when it's called it prints all intermediate steps
* Use `rig_in_namespace()` to permanently rig a package function during development

.small-code[
```r
hello <- function(x) {
  if(!is.character(x) || length(x) != 1) {
    stop("`x` should be a string")
  }
  paste0("Hello ", x, "!")
}
hello2 <- rig(hello)
hello2("world")
```
]

---
class: small20

## Fix up: {boomer}

* `boom()` and `rig()` have a `print` arg to tweak the way values are printed.
* We can use {constructive} there.

Example from SO.
A user wanted to understand what this does

.small-code[
```r
library(dplyr, warn.conflicts = FALSE)
fun <- function(df, Country_name){
  Country_name <- rlang::parse_expr(quo_name(enquo(Country_name)))
  df %>% filter(Country == Country_name)
}
df <- data.frame(x = 1:2, Country = c("Belgium", "Ukraine"))
df
```
]

---
class: small20

## Fix up: {boomer}

.small-code[
```r
fun2 <- boomer::rig(fun, print = constructive::construct)
fun2(df, Ukraine)
```
]

---

## Fix up: {boomer}

* What's in a bug ?
* Base R toolkit
* {flow}
* {constructive}
* {boomer}

---

## Wrap up

- Warm up : general ideas
- Setup : make a messy code base into a proper package
- Clean up : make it better
- Step up : Useful refactoring tools
- Fix up : Useful debugging tools
- **Wrap up** 📌

---
class: center, middle

# Questions?

---
class: center, middle

# Thank you!