4.2 Unit Test Diagnostics
When tests fail—and they will—you need clear, actionable information about what went wrong. The {testthat} package provides several reporting formats that help you understand test results at different levels of detail.
4.2.1 Understanding Test Reporters
Test reporters control how {testthat} displays test results. You configure them in your {tests/init.r} file through the reporter parameter in test_dir():
box::use(testthat[...])

.on_load = function (ns) {
    test_dir(box::file(), reporter = "progress") # or "summary", "check", etc.
}

box::export()

Here are the different types of reporters:
The minimal reporter (reporter = 'minimal') shows only dots and failures:
$ Rscript module/matrix_ops.r
.....

Each . represents a passing test. If a test fails, you'll see an F at the position where the failing test ran:

...F..

This is perfect for quick checks during development when you just want to know if everything still works.
For slightly more detail, use reporter = 'progress':
.on_load = function (ns) {
    test_dir(box::file(), reporter = 'progress')
}

Output:
$ Rscript module/matrix_ops.r
✔ | F W S OK | Context
✔ | 5 | matrix_ops
══ Results ═════════════════════════════════════════════════
Duration: 0.1 s
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 5 ]

This shows a summary table with pass/fail counts per context, making it easier to spot which test files have issues.
And by the way, this is the default reporter of test_dir().
The check reporter provides the most readable output for development:
.on_load = function (ns) {
    test_dir(box::file(), reporter = 'check')
}

Output:
$ Rscript module/statistics/models/linear.r
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 26 ]

Each test's description appears on its own line, making it immediately clear which specific assertions passed or failed.
For comprehensive reporting, especially in CI/CD pipelines, use reporter = 'summary':
.on_load = function (ns) {
    test_dir(box::file(), reporter = 'summary')
}

This gives you:
$ Rscript module/statistics/models/linear.r
linear-reg: ............
logistic-reg: ..............
══ DONE ═══════════════════════════════════════════════════════════════════════

If a test fails, you'll see a number in place of the dot for that failing test.
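If you want different verbosity locally and on CI, one option is to pick the reporter at load time. The following is a minimal sketch, not part of the original setup; it assumes the CI environment variable that most CI services (including GitHub Actions) set:

box::use(testthat[...])

.on_load = function (ns) {
    # Assumption: CI is set on the build server, so use the quieter
    # 'check' reporter there and the richer 'progress' reporter locally.
    chosen = if (nzchar(Sys.getenv('CI'))) 'check' else 'progress'
    test_dir(box::file(), reporter = chosen)
}

box::export()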
4.2.2 Reading Test Failures
Understanding failure messages is crucial for efficient debugging. Here’s what a typical failure looks like:
$ Rscript module/statistics/models/linear.r
── Failure ('test-linear.r:15:5'): Linear Regression calculates correct coefficients ──
`model$out$coefficients` not equal to `as.vector(coef(base_model))`.
Component "coefficients": Mean relative difference: 0.0523
Backtrace:
1. testthat::expect_equal(...)
at test-linear.r:15:4
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 3 ]

Let's break down this failure message:
- Location: 'test-linear.r:15:5' tells you exactly where the failure occurred (line 15, column 5)
- Context: The test description helps identify what functionality broke
- Expectation: Shows what you expected vs. what you got
- Details: Specific information about the mismatch (e.g., “Mean relative difference: 0.0523”)
- Backtrace: The call stack leading to the failure
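For reference, a test along these lines could produce the failure above. The formula, the relative module path, and the linear_reg() call are assumptions reconstructed from the labels in the message, not the book's exact test file:

box::use(testthat[...])
box::use(../linear[linear_reg])   # hypothetical relative path to the module under test

test_that('Linear Regression calculates correct coefficients', {
    base_model = lm(mpg ~ wt + hp, data = mtcars)      # reference fit from base R
    model = linear_reg(mpg ~ wt + hp, data = mtcars)   # assumed signature
    # The failure above points at this expectation (test-linear.r:15)
    expect_equal(model$out$coefficients, as.vector(coef(base_model)))
})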
4.2.3 Common Failure Patterns
4.2.3.1 Numerical Precision Concern
Floating-point arithmetic can cause unexpected failures:
test_that('matrix operations match', {
    sols = A ^ -1 * b
    expected = solve(A) %*% b
    # This might fail!
    expect_equal(sols, expected)
})

The fix is simple: use a tolerance for floating-point comparisons.
For example:
test_that('matrix operations match', {
    sols = A ^ -1 * b
    expected = solve(A) %*% b
    # This accounts for rounding errors
    expect_equal(sols, expected, tolerance = 1e-6)
})

4.2.3.2 Dimension Mismatches
Matrix operations are particularly sensitive to dimensions:
test_that('matrix multiplication with dimension mismatch tries reverse order', {
    m1 = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
    # m1 %*% m2 is non-conformable, but the reverse order m2 %*% m1 works (1x2 times 2x3)
    m2 = matrix(c(7, 8), nrow = 1, ncol = 2)
    result = m1 * m2
    expected = m2 %*% m1
    expect_equal(result, expected)
})

If this fails with a dimension error, check:
- Are you multiplying in the right order?
- Do the matrix dimensions actually allow multiplication?
- Is your operator overloading logic handling edge cases? (see the sketch below)
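For that last check, here is a minimal, hypothetical sketch of a multiplication helper that falls back to the reverse order when the forward order is non-conformable. It is not the book's actual implementation, only the general shape such dispatch logic tends to take:

# Hypothetical helper: try x %*% y first, fall back to y %*% x
mat_mult = function (x, y) {
    if (ncol(x) == nrow(y)) {
        x %*% y
    } else if (ncol(y) == nrow(x)) {
        y %*% x
    } else {
        stop('non-conformable dimensions: ', ncol(x), ' columns vs ', nrow(y), ' rows')
    }
}

With the matrices from the test above, mat_mult(m1, m2) would return the 1-by-3 product m2 %*% m1.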
4.2.3.3 Type Coercion Concern
R’s automatic type conversion can cause subtle bugs:
test_that('data frame conversion works', {
    df = data.frame(a = c(1, 2), b = c(3, 4))
    result = df * 2
    # Might fail if df isn't properly converted to matrix
    expect_s3_class(result, "matrix")
})
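One way to sidestep the ambiguity is to make the coercion explicit in the test itself. This sketch uses only base R's as.matrix(), so it does not depend on the module's own conversion logic:

test_that('explicit conversion to matrix avoids coercion surprises', {
    df = data.frame(a = c(1, 2), b = c(3, 4))
    m = as.matrix(df)    # coerce explicitly instead of relying on implicit rules
    result = m * 2
    expect_true(is.matrix(result))
    expect_equal(unname(result), matrix(c(2, 4, 6, 8), nrow = 2))
})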
4.2.4 Testing Best Practices

Write Descriptive Test Names
Good test names explain what is being tested and why it matters.
Bad descriptions:

test_that('test 1', { ... })
test_that('it works', { ... })

Good descriptions:

test_that('matrix inverse (^-1) works correctly', { ... })
test_that('combined operations work (inverse then multiply)', { ... })
Test Both Success and Failure Cases
Don't just test the happy path; also test the cases that are supposed to fail.
For example, the logistic_reg() implementation requires the response variable to be a factor or a binary (0/1) variable, with exactly one response column and exactly 2 unique classes. It must reject a response that:
- has a type other than factor or binary, such as continuous numeric data
- consists of 2 or more variables
- contains 3 or more unique classes
# Test success
test_that('binary response works with 0/1', {
    model = logistic_reg(am ~ wt, data = mtcars)
    expect_s3_class(model, "logistic_reg")
})

# Test failure
test_that('error thrown for non-binary response', {
    test_data = mtcars
    test_data$multi_class1 = sample(c("A", "B", "C"), nrow(test_data), replace = TRUE)
    test_data$multi_class2 = sample(c("A", "B", "C"), nrow(test_data), replace = TRUE)
    expect_error(
        logistic_reg(cbind(multi_class1, multi_class2) ~ wt + hp, data = test_data),
        "must be binary with exactly 2 unique values"
    )
    expect_error(
        logistic_reg(mpg ~ wt + hp, data = test_data),
        "must be binary with exactly 2 unique values or a factor class"
    )
})

4.2.5 Continuous Integration
R is also used in production, after all. For production code, I recommend integrating your tests into a CI/CD pipeline.
Here’s a simple GitHub Actions workflow:
# .github/workflows/test.yml
name: Run Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: '4.3.0'
      - name: Install dependencies
        run: |
          install.packages(c("box", "testthat", "dplyr", "purrr", "rlang"))
        shell: Rscript {0}
      - name: Test matrix_ops
        run: Rscript module/matrix_ops.r
      - name: Test linear regression
        run: Rscript module/statistics/models/linear.r
      - name: Test logistic regression
        run: Rscript module/statistics/models/logistic.r