Skip to content

How Wombat works

When using Wombat, you will rely on two different things. The first is the wombat binary which you can download from the releases page. The second is a pipeline configuration file that you will need to write. This configuration file is written in YAML and describes the input, processing, and output stages of your pipeline:

input:
kafka:
addresses: [ TODO ]
topics: [ foo, bar ]
consumer_group: foogroup
pipeline:
processors:
- mapping: |
root.message = this
root.meta.link_count = this.links.length()
output:
aws_s3:
bucket: TODO
path: '${! metadata("kafka_topic") }/${! json("message.id") }.json'

Once you have the binary and the configuration file, you can run the pipeline with the following command:

Terminal window
wombat -c <your_config.yaml>

This will start the pipeline and it will begin processing messages from the input source, applying the processors you defined as part of the pipeline and then sending the processed messages to the output target.

Addressing Fields

Many components within Wombat allow you to target certain fields using a JSON dot path. The syntax of a path within Wombat is similar to JSON Pointers, except with dot separators instead of slashes (and no leading dot.) When a path is used to set a value any path segment that does not yet exist in the structure is created as an object.

For example, if we had the following JSON structure:

{
"foo": {
"bar": 21
}
}

The query path foo.bar would return 21.

The characters ~ (%x7E) and . (%x2E) have special meaning in Wombat paths. Therefore ~ needs to be encoded as ~0 and . needs to be encoded as ~1 when these characters appear within a key.

For example, if we had the following JSON structure:

{
"foo.foo": {
"bar~bo": {
"": {
"baz": 22
}
}
}
}

The query path foo~1foo.bar~0bo..baz would return 22.

Arrays

When Wombat encounters an array whilst traversing a JSON structure it requires the next path segment to be either an integer of an existing index, or, depending on whether the path is used to query or set the target value, the character * or - respectively.

For example, if we had the following JSON structure:

{
"foo": [
0,
1,
{
"bar": 23
}
]
}

The query path foo.2.bar would return 23.

Querying

When a query reaches an array the character * indicates that the query should return the value of the remaining path from each element of the array (within an array.)

Setting

When an array is reached the character - indicates that a new element should be appended to the end of the existing elements, if this character is not the final segment of the path then an object is created.

Reloading

It’s possible to have a running instance of Wombat reload configurations, including resource files imported with -r/ --resources, automatically when the files are updated without needing to manually restart the service. This is done by specifying the -w/--watcher flag when running Wombat in normal mode or in streams mode:

Terminal window
# Normal mode
wombat -w -r ./production/request.yaml -c ./config.yaml
Terminal window
# Streams mode
wombat -w -r ./production/request.yaml streams ./stream_configs/*.yaml

If a file update results in configuration parsing or linting errors then the change is ignored (with logs informing you of the problem) and the previous configuration will continue to be run (until the issues are fixed).

Shutting down

Under normal operating conditions, the Wombat process will shut down when there are no more messages produced by inputs and the final message has been processed. The shutdown procedure can also be initiated by sending the process a interrupt (SIGINT) or termination (SIGTERM) signal. There are two top-level configuration options that control the shutdown behaviour: shutdown_timeout and shutdown_delay.

Shutdown delay

The shutdown_delay option can be used to delay the start of the shutdown procedure. This is useful for pipelines that need a short grace period to have their metrics and traces scraped. While the shutdown delay is in effect, the HTTP metrics endpoint continues to be available for scraping and any active tracers are free to flush remaining traces.

The shutdown delay can be interrupted by sending the Wombat process a second OS interrupt or termination signal.

Shutdown timeout

The shutdown_timeout option sets a hard deadline for Wombat process to gracefully terminate. If this duration is exceeded then the process is forcefully terminated and any messages that were in-flight will be dropped.

This option takes effect after the shutdown_delay duration has passed if that is enabled.