parquet_decode
Decodes Parquet files into a batch of structured messages.
Introduced in version 4.4.0.
This processor uses https://github.com/parquet-go/parquet-go, which is itself experimental. Changes to how this processor functions could therefore be made outside of major version releases.
Examples
In this example we consume files from AWS S3 as they’re written by listening to an SQS queue for upload events. We use the to_the_end scanner, which reads each file into memory in full and so allows a parquet_decode processor to expand each file into a batch of messages. Finally, we write the data out to local files as newline-delimited JSON.
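
A configuration along these lines might look like the sketch below. The bucket name, key prefix, SQS URL and output path are placeholders rather than values taken from this page, and the output path assumes the aws_s3 input attaches an s3_key metadata field to each message.

```yaml
input:
  aws_s3:
    bucket: TODO            # placeholder: the bucket receiving Parquet uploads
    prefix: foos/           # placeholder: only consume keys under this prefix
    scanner:
      to_the_end: {}        # read each object into memory in full
    sqs:
      url: TODO             # placeholder: queue that receives S3 upload events
  processors:
    - parquet_decode: {}    # expand each file into a batch of messages

output:
  file:
    codec: lines            # newline-delimited output, one message per line
    # assumed: s3_key metadata is set by the aws_s3 input
    path: './foos/${! meta("s3_key") }.jsonl'
```

Because the to_the_end scanner buffers whole files, memory usage grows with the size of the largest object, so this approach is best suited to files that fit comfortably in memory.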