parquet_encode
Encodes Parquet files from a batch of structured messages.
Introduced in version 4.4.0.
This processor uses https://github.com/parquet-go/parquet-go, which is itself experimental. Therefore changes could be made to how this processor functions outside of major version releases.
Examples
In this example we use the batching mechanism of an aws_s3 output to collect a batch of messages in memory, which is then encoded as a parquet file and uploaded.
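A config along these lines might look like the following sketch (the bucket name, batching thresholds, and schema fields are illustrative):

```yaml
output:
  aws_s3:
    bucket: example-bucket # hypothetical bucket name
    path: 'stuff-${! timestamp_unix() }.parquet'
    batching:
      count: 1000
      period: 10s
      processors:
        - parquet_encode:
            schema:
              - name: id
                type: INT64
              - name: weight
                type: DOUBLE
              - name: content
                type: BYTE_ARRAY
            default_compression: zstd
```

Each batch of up to 1000 messages (or whatever accumulates within 10 seconds) is encoded into a single parquet file before being written to the bucket.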
Fields
schema
Parquet schema.
Type: array
schema[].name
The name of the column.
Type: string
schema[].type
The type of the column, only applicable for leaf columns with no child fields. Some logical types, such as UTF8, can be specified here.
Type: string
Options: BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BYTE_ARRAY, UTF8.
schema[].repeated
Whether the field is repeated.
Type: bool
Default: false
schema[].optional
Whether the field is optional.
Type: bool
Default: false
schema[].fields
A list of child fields, used to declare nested group columns; see the sketch below.
Type: array
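As a hedged sketch, a schema mixing repeated, optional, and nested columns might be declared like this (the column names are illustrative):

```yaml
parquet_encode:
  schema:
    - name: id
      type: INT64
    - name: tags
      type: UTF8
      repeated: true # a repeated leaf column, encoded as a list of strings
    - name: metadata
      optional: true # the whole group may be absent from a message
      fields: # child fields make this a group column, so no type is set
        - name: source
          type: UTF8
        - name: size
          type: INT64
```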
default_compression
The default compression type to use for fields.
Type: string
Default: "uncompressed"
Options: uncompressed, snappy, gzip, brotli, zstd, lz4raw.
default_encoding
The default encoding type to use for fields. A custom default encoding is only necessary when consuming data with libraries that do not support DELTA_LENGTH_BYTE_ARRAY and is therefore best left unset where possible.
Type: string
Default: "DELTA_LENGTH_BYTE_ARRAY"
Requires version 4.11.0 or newer
Options: DELTA_LENGTH_BYTE_ARRAY, PLAIN.
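For example, forcing PLAIN encoding for compatibility with such readers might look like this (the schema is illustrative):

```yaml
parquet_encode:
  schema:
    - name: content
      type: BYTE_ARRAY
  default_encoding: PLAIN
```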