openai_embeddings

Generates vector embeddings to represent input text, using the OpenAI API.

Introduced in version 4.32.0.

```yaml
# Config fields, showing default values
label: ""
openai_embeddings:
  server_address: https://api.openai.com/v1
  api_key: "" # No default (required)
  model: text-embedding-3-large # No default (required)
  text_mapping: "" # No default (optional)
  dimensions: 0 # No default (optional)
```

This processor sends text strings to the OpenAI API, which generates vector embeddings. By default, the processor submits the entire payload of each message as a string. To submit only part of the payload, use the `text_mapping` configuration field to select the text you want to embed.

To learn more about vector embeddings, see the OpenAI API documentation.

Examples

Compute embeddings for some generated data and store them in Pinecone:

```yaml
input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
    - openai_embeddings:
        model: text-embedding-3-large
        api_key: "${OPENAI_API_KEY}"
        text_mapping: "root = this.text"
output:
  pinecone:
    host: "${PINECONE_HOST}"
    api_key: "${PINECONE_API_KEY}"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"
```

Fields

server_address

The OpenAI API endpoint that the processor sends requests to. Update the default value to use another OpenAI-compatible service.

Type: string

Default: "https://api.openai.com/v1"
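For example, a minimal sketch of pointing the processor at a locally hosted OpenAI-compatible server. The address, model name, and key handling here are placeholders, not defaults; substitute whatever your service actually exposes:

```yaml
openai_embeddings:
  # Hypothetical local OpenAI-compatible endpoint
  server_address: http://localhost:11434/v1
  # Some self-hosted services ignore the key but the field is still required
  api_key: "unused"
  # Placeholder embedding model name served by the local endpoint
  model: my-local-embedding-model
```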

api_key

The API key for the OpenAI API.

Type: string

model

The name of the OpenAI model to use.

Type: string

```yaml
# Examples
model: text-embedding-3-large
model: text-embedding-3-small
model: text-embedding-ada-002
```

text_mapping

The text you want to generate a vector embedding for. By default, the processor submits the entire payload as a string.

Type: string
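As in the example above, this field takes a mapping whose result is the string to embed. A sketch of a mapping that concatenates two fields into a single string (the `title` and `body` field names are assumptions about your message shape):

```yaml
openai_embeddings:
  model: text-embedding-3-large
  api_key: "${OPENAI_API_KEY}"
  # Embed a concatenation of hypothetical title and body fields
  text_mapping: 'root = this.title + "\n" + this.body'
```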

dimensions

The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

Type: int
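For example, a sketch of requesting embeddings truncated to a fixed size with a `text-embedding-3` model; 256 here is an arbitrary illustrative value, not a recommendation:

```yaml
openai_embeddings:
  model: text-embedding-3-small
  api_key: "${OPENAI_API_KEY}"
  # Request 256-dimensional embeddings instead of the model's default size
  dimensions: 256
```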