Getting Started

Wombat is a declarative data streaming service that solves a wide range of data engineering problems with simple, chained, stateless processing steps. It implements transaction based resiliency with back pressure, so when connecting to at-least-once sources and sinks it’s able to guarantee at-least-once delivery without needing to persist messages during transit.

It’s simple to deploy, comes with a wide range of connectors, and is totally data agnostic, making it easy to drop into your existing infrastructure. Wombat has functionality that overlaps with integration frameworks, log aggregators and ETL workflow engines, and can therefore be used to complement these traditional data engineering tools or act as a simpler alternative.

Wombat is ready to commit to this relationship, are you?

Wombat is distributed as a single binary which you can download from the releases page.

If you have docker installed you can pull the latest official Wombat image with:

docker pull ghcr.io/wombatwisdom/wombat
docker run --rm -v /path/to/your/config.yaml:/wombat.yaml ghcr.io/wombatwisdom/wombat

Create a pipeline config file

A Wombat stream pipeline is configured with a single pipeline config file, you can generate a fresh one with:

wombat create > config.yaml

The main sections that make up a config are input, pipeline and output. When you generate a fresh config it’ll simply pipe stdin to stdout like this:

input:
    stdin: {}

pipeline:
    processors: []

output:
    stdout: {}

Eventually we’ll want to configure a more useful input and output, but for now this is useful for quickly testing processors.

Run your pipeline

Now that you have a simple pipeline configuration, you can pass it to the wombat command to actually run it:

wombat -c ./config.yaml

Anything you write to stdin will get written unchanged to stdout, cool! Resist the temptation to play with this for hours, there’s more stuff to try out.

Add a processor

Next, let’s add some processing steps in order to mutate messages. The most powerful one is the mapping processor which allows us to perform mappings, let’s add a mapping to uppercase our messages:

input:
  stdin: {}

pipeline:
  processors:
    - mapping: root = content().uppercase()

output:
  stdout: {}

Now your messages should come out in all caps, how whacky! IT’S LIKE WOMBAT IS SHOUTING BACK AT YOU! You can add as many processing steps as you like, and since processors are what make Wombat powerful they are worth experimenting with.

Working with structured data

Let’s create a more advanced pipeline that works with JSON documents:

input:
  stdin: {}

pipeline:
  processors:
    - sleep:
        duration: 500ms
    - mapping: |
        root.doc = this
        root.first_name = this.names.index(0).uppercase()
        root.last_name = this.names.index(-1).hash("sha256").encode("base64")

output:
  stdout: {}

First, we sleep for 500 milliseconds just to keep the suspense going. Next, we restructure our input JSON document by nesting it within a field doc, we map the upper-cased first element of names to a new field first_name. Finally, we map the hashed and base64 encoded value of the last element of names to a new field last_name.

Try running that config with some sample documents:

echo '{"id":"1","names":["celine","dion"]}
{"id":"2","names":["chad","robert","kroeger"]}' | wombat -c ./config.yaml

You should see (amongst some logs):

{"doc":{"id":"1","names":["celine","dion"]},"first_name":"CELINE","last_name":"1VvPgCW9sityz5XAMGdI2BTA7/44Wb3cANKxqhiCo50="}
{"doc":{"id":"2","names":["chad","robert","kroeger"]},"first_name":"CHAD","last_name":"uXXg5wCKPjpyj/qbivPbD9H9CZ5DH/F0Q1Twytnt2hQ="}

How exciting! I don’t know about you but I’m going to need to lie down for a while. Now that you are a Wombat expert might I suggest you peruse these sections to see if anything tickles your fancy?

Bloblang Dive deeper into the mistery mapping language.

Guides Practical examples of how to use Wombat.

Reference Navigate the reference documentation to find the meaning of that specific field.