Coding in Space | someday.
posted by Josh Leverette on Jun 25, 2018 tags: zapper

Thirty times faster than Handlebars, half the features!

I am introducing a new templating engine for Rust that is designed to be robust and very fast!

Why Zapper? Runtime templating is amazing: you can reload templates on the fly or even allow users to provide their own templates, yet runtime templating engines are rarely fast. Templates that are statically compiled into your application can be super fast, but are completely inflexible; recompiling and restarting your application just to change a template is especially tedious. Zapper combines the flexibility of runtime templating with great performance!

Some of the many potential uses for Zapper:

  • Web backends with live reloading of templates
  • Automatic A/B testing systems where the variants are templates
  • Formatting a large dataset on-the-fly for easier consumption using user-provided templates
  • Sending out templated emails to an email list

This engine is Zapper! [crates.io link]

There is an interactive web demo of Zapper available here! The web demo was written using Rust and then compiled to WebAssembly. The demo allows you to edit the template in real time, adjust the number of rows being rendered (the number of template instantiations), as well as to observe the intermediate and final representations of the template.

Zapper is both a templating engine (the Zapper VM) and a templating language. Zapper-the-language currently makes no attempt to be compatible with Handlebars or any other templating language, but it’s entirely possible to build other language frontends for the templating engine. In the future, it would be great to see other language frontends built on the Zapper VM, and to see the Zapper VM extended to support more complex templating functionality.

At the moment, Zapper’s two biggest limitations are that it does not support conditional rendering or nested templates (such as foreach loops over the elements of an interior list). Zapper could be extended to support both; I just haven’t implemented them yet. Due to the design of Zapper, those features should have effectively no performance impact on templates that choose not to use them, which is nice. For applications rendering websites from templates, these limitations might be reason enough to stick with an option like Handlebars for the time being!

To see Zapper in action, look at this example, simplified here somewhat:

#[macro_use] extern crate zapper;

use zapper::compile;
use std::io::stdout;

#[derive(ZapperRunner)]
// sqrt is a filter function that operates on a numeric input
#[filter = "sqrt/0n"]
// toupper is a filter function that operates on a stringifiable input
#[filter = "toupper/0s"]
// round is a filter function that takes a numeric input *and* an argument
#[filter = "round/1n"]
struct Person {
    id: u64,
    name: String,
    age: u32,
    weight: f64,
}

// the "environment" struct provides values that are templatable,
// but should be considered constant when the template is compiled to bytecode
#[derive(ZapperEnv)]
// the "runner" is the struct that the bytecode will render
#[runner = "Person"]
struct Provider {
    provider: String,
    provider_code: u32,
}


//
// three example filter functions that can be used in our Zapper templates:
//

// given this (simple) template:
// {{ 32.54578 | round 3 }}
// this filter function receives:
//  input: 32.54578
//   args: [3]
//
// returns 32.546
fn round(_data: &Person, args: &[f64], input: f64) -> f64 {
    let digits = args[0] as u32;
    if digits > 10 {
        return input;
    }
    let factor = 10u32.pow(digits) as f64;
    let value = (input * factor).round();
    value / factor
}

fn sqrt(_data: &Person, _args: &[f64], input: f64) -> f64 {
    input.sqrt()
}

fn toupper(_data: &Person, _args: &[f64], input: &str, buffer: &mut String) {
    for c in input.as_bytes() {
        buffer.push(c.to_ascii_uppercase() as char);
    }
}

fn main() {
    let template = "{{provider}} {{provider_code + 4}} {{id}} {{name | toupper}} {{age | sqrt}} {{weight}}kg\n";

    let env = Provider {
        provider: "john doe".to_string(),
        provider_code: 31,
    };

    let mut bytecode = match compile(template, &env) {
        Ok(bc) => bc,
        Err(err) => {
            eprintln!("error compiling template: {}", err);
            return;
        }
    };

    // build up a group of 100 (similar) people
    let mut group = vec![];
    for i in 0..100 {
        group.push(Person {
            id: 12 + i,
            name: "Bob".to_string(),
            age: 49,
            weight: 170.3 + i as f64,
        });
    }

    for person in group {
        bytecode.render(&person, &mut stdout()).unwrap();
    }
}

Zapper Design Overview

The two most important parts of this architecture:

  • no hashmaps
  • compiling the template (at runtime) into optimized bytecode for a simple bytecode VM.

At compile time for your Rust code, a series of enums are created which describe the data that is going to be rendered with templates. There are two basic data types in Zapper:

  • numeric values (f64 internally)
  • values which can be converted into strings (also includes numeric values)

In Zapper, the fields of a struct are divided along these lines: one enum lists the fields that are numeric, and another lists the fields that either are strings or must be converted to strings. Developer-defined filter functions are also available, which a template author can use to perform more advanced manipulations before rendering.
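To make the "no hashmaps" point concrete, here is a rough sketch of the kind of enums and lookup code the derive could generate for the `Person` struct from the example. The names (`NumericField`, `StringField`, `num`) are illustrative, not Zapper's actual generated code:

```rust
struct Person {
    id: u64,
    name: String,
    age: u32,
    weight: f64,
}

// One enum for the numeric fields...
#[derive(Clone, Copy)]
enum NumericField {
    Id,
    Age,
    Weight,
}

// ...and one for the fields that are strings or can be converted to strings
// (numeric fields are stringifiable too, so they appear here as well).
#[allow(dead_code)]
enum StringField {
    Id,
    Name,
    Age,
    Weight,
}

impl Person {
    // Field lookup is a match on an enum variant -- effectively an integer
    // jump, with no hashmap lookup or string comparison at render time.
    fn num(&self, field: NumericField) -> f64 {
        match field {
            NumericField::Id => self.id as f64,
            NumericField::Age => self.age as f64,
            NumericField::Weight => self.weight,
        }
    }
}
```

Because every field reference in a compiled template is resolved to one of these variants at template-compile time, rendering never has to look a field up by name.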

Filter functions may be defined to work only on numeric values, or to work with any stringifiable type. It is also possible to define filter functions which operate on completely custom data types, but those data types must also be stringifiable in case the user wants to print the values directly.

Zapper offers Custom Derive support that makes it easy to autogenerate all of the required enums, as well as the implementations of the necessary traits, as seen in the example. The syntax for defining a filter function (seen here) deserves a quick mention:

      #[filter = "round/1n"]
                    |  |||
                    |  ||| 
   name ------------   |||
   separator ---------- ||
   number of arguments - |
   input data type ------

Valid data types are n for numeric, s for string, and x for custom.

If we look at the template expression {{acceleration | round 2}}, acceleration is the input expression to the filter function, and in this case it will be required to be a numeric value. The required first argument is the number 2. For now, only numeric literal arguments are allowed, but I would like to extend this to string literals as well.

At runtime, templates can be compiled and rendered at will. Compilation proceeds through several distinct phases:

  • The Tokenizer, which separates the incoming character stream into useful, labeled chunks known as tokens.
  • Assembling an Abstract Syntax Tree (AST), which forms a tree-like structure that makes the tokens easier to manipulate for optimization.
  • Optimizing the AST, which attempts to reduce the size of the AST by evaluating constant expressions and joining adjacent strings.
  • Emitting Bytecode, which is the set of instructions that must be executed sequentially to render an instance of the template.
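As a rough illustration of the first stage, a minimal tokenizer for the {{ ... }} syntax might look like the following. This is a simplified sketch, not Zapper's actual tokenizer; in particular, the real tokenizer also labels the contents of an expression (identifiers, operators, filter names) rather than keeping them as one string:

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Raw(String),  // literal text, copied through verbatim
    Expr(String), // the contents of a {{ ... }} block
}

fn tokenize(template: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        if start > 0 {
            tokens.push(Token::Raw(rest[..start].to_string()));
        }
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                tokens.push(Token::Expr(after[..end].trim().to_string()));
                rest = &after[end + 2..];
            }
            None => {
                // unterminated expression; a real tokenizer would report
                // an error here instead of passing the text through
                tokens.push(Token::Raw(rest[start..].to_string()));
                rest = "";
            }
        }
    }
    if !rest.is_empty() {
        tokens.push(Token::Raw(rest.to_string()));
    }
    tokens
}
```

For a template like "{{name}} is {{age}}", this yields an alternating stream of expression and raw-text tokens, which is exactly the shape the AST-building stage wants to consume.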

The AST and bytecode steps each identify different errors that make the template invalid. If a template compiles successfully, it will run without errors for any arbitrary set of values, assuming that the filter functions provided by the project using Zapper are not themselves faulty. If a template is invalid, a reasonably useful error message should be returned. Templates are statically typed: doing arithmetic on a string field will not result in runnable bytecode, and neither will passing the wrong number of arguments to a filter or an input of the wrong type. All of this can be seen on the interactive demo page linked at the top of the post.

In many situations that involve templating, it is common to have some data that is held constant, and some data that varies for each instantiation. In Zapper, this is the Environment and the Runner. Data from the Environment is folded into the AST during the optimization step, and those fields cease to exist by the time the bytecode is executed.
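A sketch of what that folding step does, using the earlier example where the Environment holds provider_code = 31. The AST types and function here are hypothetical stand-ins for Zapper's internals:

```rust
enum Expr {
    Num(f64),
    EnvField(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

struct Provider {
    provider_code: u32,
}

// Substitute Environment fields with their constant values, then fold the
// resulting constant arithmetic, so an expression like
// {{provider_code + 4}} becomes the literal 35 before any bytecode runs.
fn fold_env(expr: Expr, env: &Provider) -> Expr {
    match expr {
        Expr::EnvField("provider_code") => Expr::Num(env.provider_code as f64),
        Expr::Add(a, b) => match (fold_env(*a, env), fold_env(*b, env)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x + y),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}
```

After this pass, the bytecode contains only per-instantiation field accesses and pre-computed constants, which is part of why rendering is so cheap.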

The final bytecode is designed to be easily (de)serializable in case an application wants to cache it on disk, although no serialization is currently implemented. Runtime compilation cost is mainly a concern when a template will be executed only a handful of times after it is compiled.

I spent a few hours using AFL trying to discover and fix any crashes that could be caused by user-provided templates. Zapper seems to be safe from panics or other errors, as long as the user isn’t allowed to give you an infinitely large template.

Performance

In one small benchmark, Zapper is at least 30x faster than the Rust implementation of Handlebars. The output of one benchmarking run from my desktop (reproduced in full here) showed 38x faster when both are using only a single core, and 133x faster when allowed to use multiple cores.

Criterion benchmark output:

zapper      time:   [369.25 us 369.52 us 369.82 us]
Found 3 outliers among 200 measurements (1.50%)
3 (1.50%) high mild

zapper_par  time:   [105.51 us 105.60 us 105.68 us]
Found 22 outliers among 200 measurements (11.00%)
5 (2.50%) low severe
10 (5.00%) low mild
7 (3.50%) high mild

hbs         time:   [14.063 ms 14.091 ms 14.121 ms]
Found 10 outliers among 200 measurements (5.00%)
7 (3.50%) high mild
3 (1.50%) high severe

Each of these benchmarks measures the time it took to render 1,000 instantiations of a given template. This particular run was performed on my desktop, which has an AMD Ryzen 7 2700X processor running at stock clocks. The parallel benchmark doesn’t seem to gain much beyond 8 threads, because rendering only 1,000 instantiations doesn’t take enough CPU time to really utilize 16 threads. Computing 1000 / t for each measurement converts the benchmarks above into instantiations per second, or insts/s, as seen here:

zapper      time:   [2,704,018.1 insts/s  2,706,213.4 insts/s  2,708,192.2 insts/s]
zapper_par  time:   [9,462,528.4 insts/s  9,469,697.0 insts/s  9,477,774.6 insts/s]
hbs         time:   [   70,816.5 insts/s     70,967.3 insts/s     71,108.6 insts/s]
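The conversion is just 1000 / t with the time expressed in seconds; as a quick check against the Criterion output above:

```rust
// Convert a per-1,000-instantiation benchmark time (in microseconds)
// into instantiations per second: insts/s = 1000 / t_seconds.
fn insts_per_sec(time_us: f64) -> f64 {
    1000.0 / (time_us * 1e-6)
}
```

Feeding in the midpoint times (369.52 us, 105.60 us, and 14.091 ms = 14,091 us) reproduces the insts/s figures in the table.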

Conclusion

Zapper is lightning fast, stable, and flexible. If you make use of Zapper, I would love to hear about it! If you come up with better or more comprehensive benchmarks, I’m happy to merge benchmark PRs.

The general design for Zapper was described to me by Joe Wilm, who graciously allowed me to implement it.