Title: | Exploring Data with Tree Data Structures |
---|---|
Description: | A simple tool allowing users to easily and dynamically explore or document a data set using a tree structure. |
Authors: | Justin Alford [aut, cre] |
Maintainer: | Justin Alford <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0.9000 |
Built: | 2025-01-19 03:14:57 UTC |
Source: | https://github.com/alforj/muir |
This function allows users to easily and dynamically explore or document a dataset using a tree structure.
build_tree(data, tree.dir = "LR", tree.height = NULL, tree.width = NULL)
data |
A muir-generated data frame to be processed into a tree using DiagrammeR |
tree.dir |
Direction of tree graph. Use either "LR" for left-to-right, "RL" for right-to-left, "TB" for top-to-bottom, or "BT" for bottom-to-top. |
tree.height |
Control tree height to zoom in/out on nodes. Defaults to NULL |
tree.width |
Control tree width to zoom in/out on nodes. Defaults to NULL |
An object of class htmlwidget
(via DiagrammeR) that will
intelligently print itself into HTML in a variety of contexts
including the R console, within R Markdown documents,
and within Shiny output bindings.
This function allows users to easily and dynamically explore or document a data.frame using a tree data structure. Columns of interest in the data.frame can be provided to the function, along with criteria for how they should be represented in discrete nodes, to generate a data tree representing those columns and filters.
muir(data, node.levels, node.limit = 3, level.criteria = NULL, label.vals = c("n():n"), tree.dir = "LR", show.percent = TRUE, num.precision = 2, show.empty.child = FALSE, tree.height = -1, tree.width = -1)
data |
A data.frame to be explored using trees |
node.levels |
A character vector of columns from For each column, the user can add a suffix to the column name to indicate whether to generate
nodes for all distinct values of the column in the data.frame, a specific number of values
(i.e., the "Top (n)" values), and whether or not to aggregate remaining values into a separate
"Other" node, or to use user-provided filter criteria for the column as provided in
the Values can be provided as "colname", "colname:*", "colname:3", "colname:+", or "colname:*+". The separator character ":" and the special characters in the suffix that follow (as outlined below) indicate which approach to take for each column.
|
node.limit |
Numeric value. When providing a column in |
level.criteria |
A data.frame consisting of 4 character columns containing
column names (matching – without suffixes – those columns in |
label.vals |
Character vector of additional values to include in each node.
Defaults to show "Count" ( Similar to |
tree.dir |
Character. The direction the tree graph should be rendered. Defaults to "LR"
|
show.percent |
Logical. Should nodes show the percent of records represented by
that node compared to the total number of records in |
num.precision |
Number of digits to print numeric label values out to (currently only for percent) |
show.empty.child |
Logical. If TRUE, show a balanced tree that includes child nodes that are all empty; if FALSE, stop expanding the tree once a parent node is empty. Defaults to FALSE (don't show empty child nodes) |
tree.height |
Numeric. Control tree height to zoom in/out on nodes. Passed to DiagrammeR
as |
tree.width |
Numeric. Control tree width to zoom in/out on nodes. Passed to DiagrammeR
as |
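The suffix notation for node.levels ("colname", "colname:*", "colname:3", "colname:+", "colname:*+") can be illustrated with a small base R sketch. The helper parse_node_level below is hypothetical and is not part of muir; it only shows how an entry might be split into a column name, a limit token, and an "Other" flag, and it makes no claim about how the package parses these strings internally.

```r
# Hypothetical helper (NOT part of muir): split a node.levels entry such as
# "carb:2+" into its column name and the parts of its suffix.
parse_node_level <- function(x) {
  parts  <- strsplit(x, ":", fixed = TRUE)[[1]]
  column <- parts[1]
  suffix <- if (length(parts) > 1) parts[2] else ""
  list(
    column = column,
    # a "+" in the suffix asks for remaining values to go into an "Other" node
    other  = grepl("+", suffix, fixed = TRUE),
    # what is left after dropping "+": "", "*", or a number such as "3"
    limit  = sub("+", "", suffix, fixed = TRUE)
  )
}

# Parse each of the documented forms and inspect the result
parsed <- lapply(c("cyl", "cyl:*", "carb:3", "carb:+", "carb:*+"),
                 parse_node_level)
str(parsed)
```

Keeping the limit as a character token leaves room for both "*" and numeric limits; how an empty token interacts with node.limit is left open here because the sketch is not derived from the package source.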
An object of class htmlwidget
(via DiagrammeR) that will
intelligently print itself into HTML in a variety of contexts
including the R console, within R Markdown documents,
and within Shiny output bindings.
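To make the level.criteria argument more concrete, the following base R sketch builds a four-column character data.frame and counts the rows of mtcars that each criterion matches. The column names col, oper, val, and title are taken from the package examples; the helper count_matches is hypothetical, assumes numeric comparison values, and is not how muir itself evaluates criteria.

```r
# Criteria table with four character columns: column name, operator,
# comparison value, and node title (names follow the package examples).
criteria <- data.frame(
  col   = c("cyl", "cyl", "carb"),
  oper  = c("<", ">=", "=="),
  val   = c("4", "4", "2"),
  title = c("Less Than 4 Cylinders", "4 or More Cylinders", "2 Carburetors"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT muir's code): count the rows of `data` matched by
# one criteria row by looking the operator up as a function.
# Assumption: the value column holds numbers stored as text.
count_matches <- function(data, crit_row) {
  op <- match.fun(crit_row$oper)
  sum(op(data[[crit_row$col]], as.numeric(crit_row$val)))
}

counts <- vapply(seq_len(nrow(criteria)),
                 function(i) count_matches(mtcars, criteria[i, ]),
                 integer(1))
setNames(counts, criteria$title)
```

A table like this could equally be read from a CSV file, as the package examples note, which keeps filter definitions separate from the plotting call.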
## Not run: 
# Load in the 'mtcars' dataset
data(mtcars)

# Basic exploration - show all values
mtTree <- muir(data = mtcars, node.levels = c("cyl:*", "carb:*"))
mtTree

# Basic exploration - show all values overriding default node.limit
mtTree <- muir(data = mtcars, node.levels = c("cyl:*", "carb:*"), node.limit = 5)
mtTree

# Show all values overriding default node.limit differently for each column
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:5"))
mtTree

# Show all values overriding default node.limit for each column
# and aggregating all distinct values above the node.limit into a
# separate "Other" column to collect remaining values

# Top 2 occurring 'carb' values will be returned in their own nodes,
# remaining values/counts will be aggregated into a separate "Other" node
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"))
mtTree

# Add additional calculations to each node output (dplyr::summarise functions)
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"),
               label.vals = c("n():n", "min(wt)", "max(wt)"))
mtTree

# Make new label values more reader-friendly
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"),
               label.vals = c("n():n", "min(wt):Min Weight", "max(wt):Max Weight"))
mtTree

# Instead of just returning top counts for columns provided in node.levels,
# provide custom filter criteria and custom node titles in label.vals
# (criteria could also be read in from a csv file as a data.frame)
criteria <- data.frame(col = c("cyl", "cyl", "carb"),
                       oper = c("<", ">=", "=="),
                       val = c(4, 4, 2),
                       title = c("Less Than 4 Cylinders", "4 or More Cylinders", "2 Carburetors"))

mtTree <- muir(data = mtcars, node.levels = c("cyl", "carb"),
               level.criteria = criteria,
               label.vals = c("n():n", "min(wt):Min Weight", "max(wt):Max Weight"))
mtTree

# Use same criteria but show all other values for the column where NOT
# EQUAL to the combination of the filters provided for that column (e.g., for cyl
# where !(cyl < 4 | cyl >= 4)) in an "Other" node
mtTree <- muir(data = mtcars, node.levels = c("cyl:+", "carb:+"),
               level.criteria = criteria,
               label.vals = c("n():n", "min(wt):Min Weight", "max(wt):Max Weight"))
mtTree

# Show empty child nodes (balanced tree)
mtTree <- muir(data = mtcars, node.levels = c("cyl:+", "carb:+"),
               level.criteria = criteria,
               label.vals = c("n():n", "min(wt):Min Weight", "max(wt):Max Weight"),
               show.empty.child = TRUE)
mtTree

# Save tree to HTML file with htmlwidgets package to working directory
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"))
htmlwidgets::saveWidget(mtTree, "mtTree.html")

## End(Not run)
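The label.vals entries in the examples above pair a summary expression with an optional display title, as in "min(wt):Min Weight". The base R sketch below shows one way such strings could be split and evaluated against mtcars. The helper split_label is hypothetical and not part of muir; falling back to the expression text when no title is given is an assumption, and "n()" is left out because it is only meaningful inside dplyr.

```r
# Hypothetical helper (NOT muir's code): split an "expression:Title" label.
# Assumption: when no title is supplied, the expression text is used instead.
split_label <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)[[1]]
  list(expr  = parts[1],
       title = if (length(parts) > 1) parts[2] else parts[1])
}

labels <- lapply(c("min(wt):Min Weight", "max(wt)"), split_label)

# Evaluate each expression with the columns of mtcars in scope
vals <- vapply(labels,
               function(l) eval(parse(text = l$expr), mtcars),
               numeric(1))
setNames(vals, vapply(labels, function(l) l$title, character(1)))
```

Evaluating text with parse() and eval() is acceptable for a short illustration, but it should only be applied to trusted input.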