Attaching package: 'blockr'
The following object is masked from 'package:graphics':
layout
Code
library(pracma)
David Granjon
September 16, 2024
Since 2023, BristolMyersSquibb, the Y company and cynkra have teamed up to develop a novel no-code solution for R.
Attaching package: 'blockr'
The following object is masked from 'package:graphics':
layout
library(pracma)
blockr is an R package designed to democratize data analysis by providing a flexible, intuitive, and code-free approach to building data pipelines. It has 2 main user targets:
blockr is data agnostic, meaning it can work with any kind of dataset, that is pharmaceutical data or sport analytics data. It builds on top of shiny to ensure real time feedback to any data change. Finally, it allows to export code to create reproducible data analysis.
As a simple user, you’re not expected to write any single line of code to use blockr. You can use the below kitchen sink to get started. This example is based on the palmer penguins data and running a single stack with 3 blocks: the first block to select the data, another one to create the plot and then add the points to it.
blockr has a its own validation system. For instance, using the below example, you can try to press return on the first block select box (penguins is the selected default). You’ll notice an immediate feedback message. A global message is displayed in the block upper middle part: “1 error(s) found in this block”. You get more detailed mesages next to the faulty input(s): “selected value(s) not among provided choices”. You can repeat the same experience with the last plot layer block, by emptying the color and shape select inputs. Error messages can accumulate.
You can dynamically add blocks to a current stack, that gathers a set of related blocks. You can think a stack as a data analysis recipe as in cooking, where blocks are instructions. To add a new block, you can click on the +
icon on the stack top right corner. This opens a sidebar on the left side, where one may search for blocks that are compatible with the current state of the pipeline. With an empty stack, only entry point blocks are suggested, so you can import data. Then, after clicking on the block, the suggestion list changes so you can, for instance, filter data or select only a subset of columns, and much more.
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| components: [viewer]
#| column: screen-inset-shaded
#| viewerHeight: 800
webr::install("blockr", repos = c("https://bristolmyerssquibb.github.io/webr-repos", "https://repo.r-wasm.org"))
library(blockr)
library(palmerpenguins)
library(ggplot2)
new_ggplot_block <- function(col_x = character(), col_y = character(), ...) {
data_cols <- function(data) colnames(data)
new_block(
fields = list(
x = new_select_field(col_x, data_cols, type = "name"),
y = new_select_field(col_y, data_cols, type = "name")
),
expr = quote(
ggplot(mapping = aes(x = .(x), y = .(y)))
),
class = c("ggplot_block", "plot_block"),
...
)
}
new_geompoint_block <- function(color = character(), shape = character(), ...) {
data_cols <- function(data) colnames(data$data)
new_block(
fields = list(
color = new_select_field(color, data_cols, type = "name"),
shape = new_select_field(shape, data_cols, type = "name")
),
expr = quote(
geom_point(aes(color = .(color), shape = .(shape)), size = 2)
),
class = c("geompoint_block", "plot_layer_block", "plot_block"),
...
)
}
stack <- new_stack(
data_block = new_dataset_block("penguins", "palmerpenguins"),
plot_block = new_ggplot_block("flipper_length_mm", "body_mass_g"),
layer_block = new_geompoint_block("species", "species")
)
serve_stack(stack)
Let’s consider this dataset, which contains 120 years of olympics athletes data until Rio in 2016. In the below kitchen sink, we first add an upload block:
Add stack
.+
button and search for browser
, then select the new_filesbrowser_block
.File select
and select the downloaded file at step 1 (athlete_events.csv
).new_csv_block
. Repeat step 3 to add the new_csv_block
. The table is 271116
rows and 15
columns.new_filter_block
and select Sex
as column and then F
in the values input. We leave the comparison to ==
and click on the Run
button. Notice we now have 74522 rows.new_mutate_block
with the following expression: birth_year = Year - Age
(this gives us an approximate birth year). Click on submit.From now on, we leave the first stack as is and will reuse it in other stacks. We want to display the average height distribution for female athletes. Let’s do it below.
Add stack
.new_result_block
. This allows to import the data from the first stack (and potentially any stack from the dashboard). If you don’t see any data, select another stack name from the dropdown menu.new_ggplot_block
, leave x
as default function and select Height
as variable in the columns input.new_geomhistogram_block
. Now we have our distribution plot.Alternatively, you could remove the 2 plot blocks and add a new_summarize_block
using mean
as function and Height
as column (result: 168 cm).
In the following, we create a look-up table to be able to retrieve the athlete names based on their ID
.
new_select_block
and only select ID
, Name
, birth_year
, Team
and Sport
as columns.Our goal is now to find which athlete did 2 or more different sports.
new_filter_block
, select Medal
as column, !=
as comparison operator and leave the value empty. Click on run, which will only get athletes with medals.new_group_by_block
, grouping by ID
(as some athletes have the same name).new_summarize_block
by choising the function n_distinct
applied on the Sport
columns.new_filter_block
, select N_DISTINCT
as column, >=
as comparison operator and set the value to 2. Click on run. This gives us the athletes that are doing 2 sports or more.new_join_block
. Select left_join
as join function, select the third stack (lookup table) as join table and ID
as column.new_arrange_block
for the birth_year
column.As a conclusion, Hjrdis Viktoria Tpel (1904) was the first recorded athlete to compete in 2 different sports, swimming and diving for Sweden. Lauryn Chenet Williams (1984) is the latest for US with Athletics and Bobsleigh. It’s actually quite amazing to see people competing in two quite unrelated sports like swimming and handbain the case of Roswitha Krause.
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| components: [viewer]
#| column: screen-inset-shaded
#| viewerHeight: 800
webr::install("blockr", repos = c("https://bristolmyerssquibb.github.io/webr-repos", "https://repo.r-wasm.org"))
webr::install("blockr.ggplot2", repos = c("https://bristolmyerssquibb.github.io/webr-repos", "https://repo.r-wasm.org"))
library(blockr)
library(blockr.ggplot2)
options(shiny.maxRequestSize = 100*1024^2)
do.call(set_workspace, args = list(title = "My workspace"))
serve_workspace(clear = FALSE)
As an end-user, you are not supposed to write code. As such, if you think anything is missing, you can open an issue here, or ask any developer you are working with to create new blocks. This leads us to the second part of this blog post … How to use blockr as a developers?
How to install it:
pak::pak("BristolMyersSquibb/blockr")
blockr can’t provide any single data manipulation or visualization block. That’s the reason why we made it easily extensible. You can get an introduction to blockr for developers here.
In the following, we create an ordinary differential equations solver block using the pracma package. We choose the Lorenz attractor. With R, equations may be written as:
lorenz <- function(t, y, parms) {
c(
X = parms[1] * y[1] + y[2] * y[3],
Y = parms[2] * (y[2] - y[3]),
Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
)
}
where t
is the time, y
a vector of solutions and params
the various parameters. If you are familiar with deSolve, equations are defined with similar functions. For this blog post, we selected pracma as deSolve does not run in shinylive, so you could not see the embedded demonstration.
We want to add interactivity on the 3 different parameters. Hence, we create our new block function with 3 fields inside a list. Since the expected values are numbers, we leverage the new_numeric_field
. Parameters are only explicitly shown for the first field:
new_ode_block <- function(...) {
fields <- list(
a = new_numeric_field(value = -8 / 3, min = -10, max = 20),
b = new_numeric_field(-10, -50, 100),
c = new_numeric_field(28, 1, 100)
)
# TBD
# ...
}
As you may imagine, these fields are subsequently translated into shiny inputs, that is numericInput
in our example. If you face a situation where you need to implement a custom field, not included in blockr, you can read this vignette.
As next step, we instantiate our block with the new_block
blockr constructor:
A block is composed of fields, a quoted expression which involved fields (to delay the evaluation), somes classes which control the block behavior, and extra parameters passed with ...
. Finally, submit
allows to delay the block evaluation by requiring the user to click on a submit button (FALSE by default). This prevents from triggering unwanted intensive computations.
In our example, the expression calls the ode45
function. Notice the usage of substitute
to inject the lorenz
function within the expression. This is necessary since lorenz
is defined outside of the expression, and using quote
would fail. Fields are invoked with .(field_name)
, a rather strange notation, required by bquote
to process the expression. It is not mandory to understand this technical underlying detail, but this standard must be respected. You may also notice that some parameters like the initial conditions y0
or time values are hardcoded. We leave the reader to transform them into fields, as an exercise:
new_block(
fields = fields,
expr = substitute(
as.data.frame(
ode45(
fun,
y0 = c(X = 1, Y = 1, Z = 1),
t0 = 0,
tfinal = 100,
parms = c(.(a), .(b), .(c))
)
),
list(fun = lorenz)
)
# TBD
)
We give our block 2 classes, namely ode_block
and data_block
:
new_ode_block <- function(...) {
fields <- list(
a = new_numeric_field(-8 / 3, -10, 20),
b = new_numeric_field(-10, -50, 100),
c = new_numeric_field(28, 1, 100)
)
new_block(
fields = fields,
expr = substitute(
as.data.frame(
ode45(
fun,
y0 = c(X = 1, Y = 1, Z = 1),
t0 = 0,
tfinal = 100,
parms = c(.(a), .(b), .(c))
)
),
list(fun = lorenz)
),
...,
class = c("ode_block", "data_block")
)
}
As explained earlier, they are required to control the block behavior, as blockr is build with S3. For instance, data_block
have a specific evaluation method, to calculate the expression:
evaluate_block.data_block <- function (x, ...)
{
stopifnot(...length() == 0L)
eval(generate_code(x), new.env())
}
where generate_code
processes the block code. Data blocks are considered as entry point blocks, as opposed to transformation blocks, that operate on data. Therefore, you may easily understand that the evaluation method for a transform block requires to pass the data from the previous block with %>%
:
evaluate_block.block <- function (x, data, ...)
{
stopifnot(...length() == 0L)
eval(substitute(data %>% expr, list(expr = generate_code(x))), list(data = data))
}
If you want to build a plot block and plot layers blocks, you would have to design a specific evaluate method, that accounts for the +
operator required by ggplot2. To learn more about how to create a plot block, you can read this article.
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| components: [viewer, editor]
#| column: screen-inset-shaded
#| viewerHeight: 800
webr::install("blockr", repos = c("https://bristolmyerssquibb.github.io/webr-repos", "https://repo.r-wasm.org"))
webr::install("blockr.ggplot2", repos = c("https://bristolmyerssquibb.github.io/webr-repos", "https://repo.r-wasm.org"))
library(blockr)
library(pracma)
library(blockr.ggplot2)
lorenz <- function(t, y, parms) {
c(
X = parms[1] * y[1] + y[2] * y[3],
Y = parms[2] * (y[2] - y[3]),
Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
)
}
new_ode_block <- function(...) {
fields <- list(
a = new_numeric_field(-8 / 3, -10, 20),
b = new_numeric_field(-10, -50, 100),
c = new_numeric_field(28, 1, 100)
)
new_block(
fields = fields,
expr = substitute(
as.data.frame(
ode45(
fun,
y0 = c(X = 1, Y = 1, Z = 1),
t0 = 0,
tfinal = 100,
parms = c(.(a), .(b), .(c))
)
),
list(fun = lorenz)
),
...,
class = c("ode_block", "data_block")
)
}
stack <- new_stack(
new_ode_block,
new_ggplot_block(
func = c("x", "y"),
default_columns = c("y.1", "y.2")
),
new_geompoint_block
)
serve_stack(stack)
In the above example, we define the block on the fly. However, an other outstanding feature of blockr is the registry, which you can see as a blocks supermarket. From the R side, the registry is an environment that can be extended by developers who bring their own blocks packages:
To get an overview of all available blocks within the blockr core package, we call get_registry
:
ctor description category
1 arrange_block Arrange columns transform
2 csv_block Read a csv dataset parser
3 dataset_block Choose a dataset from a package data
4 filesbrowser_block Select files on the server file system data
5 filter_block filter rows in a table transform
6 group_by_block Group by columns transform
7 head_block Select n first rows of dataset transform
8 join_block Join 2 datasets transform
9 json_block Read a json dataset parser
10 mutate_block Mutate block transform
11 rds_block Read a rds dataset parser
12 result_block Shows result of another stack as data source data
13 select_block select columns in a table transform
14 summarize_block summarize data groups transform
15 upload_block Upload files from location data
16 xpt_block Read a xpt dataset parser
classes input output
1 arrange_block, transform_block, block data.frame data.frame
2 csv_block, parser_block, transform_block, block string data.frame
3 dataset_block, data_block, block <NA> data.frame
4 filesbrowser_block, data_block, block <NA> string
5 filter_block, transform_block, block data.frame data.frame
6 group_by_block, transform_block, block data.frame data.frame
7 head_block, transform_block, block data.frame data.frame
8 join_block, transform_block, block data.frame data.frame
9 json_block, parser_block, transform_block, block string data.frame
10 mutate_block, transform_block, block data.frame data.frame
11 rds_block, parser_block, transform_block, block string data.frame
12 result_block, data_block, block <NA> data.frame
13 select_block, transform_block, block data.frame data.frame
14 summarize_block, transform_block, block data.frame data.frame
15 upload_block, data_block, block <NA> string
16 xpt_block, parser_block, transform_block, block string data.frame
package
1 blockr
2 blockr
3 blockr
4 blockr
5 blockr
6 blockr
7 blockr
8 blockr
9 blockr
10 blockr
11 blockr
12 blockr
13 blockr
14 blockr
15 blockr
16 blockr
This function returns a dataframe containing information about blocks such as their constructors, like new_ode_block
, the description, the category (data, transform, plot … this is user defined), classes, accepted input, returned output and package.
To register a block we call register_block
(or register_blocks
for multiple blocks):
register_my_blocks <- function() {
register_block(
constructor = new_ode_block,
name = "ode block",
description = "Computed the Lorent attractor solutions",
classes = c("ode_block", "data_block"),
input = NA_character_,
output = "data.frame",
package = "<YOUR_PACKAGE>",
category = "data"
)
# You can register any other blocks here ...
}
where <YOUR_PACKAGE>
must be replaced by your real package name.
Within a zzz.R
script, you can ensure to register any block when the package loads with a hook:
.onLoad <- function(libname, pkgname) {
register_my_blocks()
invisible(NULL)
}
After the registration, you can check whether the registry is updated, by looking at the ode block:
register_my_blocks()
reg <- get_registry()
reg[reg$package == "<YOUR_PACKAGE>", ]
ctor description category
11 ode_block Computed the Lorent attractor solutions data
classes input output package
11 ode_block, data_block, block <NA> data.frame <YOUR_PACKAGE>