
cohere_embeddings

Generates vector embeddings to represent input text, using the Cohere API.

Introduced in version 4.37.0.

# Config fields, showing default values
label: ""
cohere_embeddings:
  base_url: https://api.cohere.com
  api_key: "" # No default (required)
  model: embed-english-v3.0 # No default (required)
  text_mapping: "" # No default (optional)
  input_type: search_document
  dimensions: 0 # No default (optional)

This processor sends text strings to the Cohere API, which generates vector embeddings. By default, the processor submits the entire payload of each message as a string, unless you use the text_mapping configuration field to customize it.
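The embedding replaces the message payload, which is why the Qdrant example reads the vector with root = this. If you want to keep the original message and attach its vector, one option is to wrap the processor in a branch processor. This is a minimal sketch. It assumes each message has a text field (a hypothetical name) and that the branch result is the embedding array:

```yaml
pipeline:
  processors:
    - branch:
        # Send only the (hypothetical) text field to the embeddings processor.
        request_map: 'root = this.text'
        processors:
          - cohere_embeddings:
              model: embed-english-v3.0
              api_key: "${COHERE_API_KEY}"
        # Assumption: the branch result is the embedding array, which is
        # copied back onto the original message as an "embedding" field.
        result_map: 'root.embedding = this'
```

Because request_map already selects the text, this layout does not need text_mapping. The original message continues down the pipeline with an extra embedding field.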

To learn more about vector embeddings, see the Cohere API documentation.

Examples

Compute embeddings for some generated data and store them in Qdrant.

input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
    - cohere_embeddings:
        model: embed-english-v3.0
        api_key: "${COHERE_API_KEY}"
        text_mapping: "root = this.text"
output:
  qdrant:
    grpc_host: localhost:6334
    collection_name: "example_collection"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"

Fields

base_url

The base URL to use for API requests.

Type: string

Default: "https://api.cohere.com"
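You can override base_url when requests must go through a gateway or proxy rather than directly to api.cohere.com. This sketch assumes the gateway forwards requests to the Cohere API unchanged. The host name is hypothetical:

```yaml
pipeline:
  processors:
    - cohere_embeddings:
        # Hypothetical gateway host; replace with your own endpoint.
        base_url: https://cohere-proxy.internal.example.com
        model: embed-english-v3.0
        api_key: "${COHERE_API_KEY}"
```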

api_key

The API key for the Cohere API.

Type: string

model

The name of the Cohere model to use.

Type: string

# Examples
model: embed-english-v3.0
model: embed-english-light-v3.0
model: embed-multilingual-v3.0
model: embed-multilingual-light-v3.0

text_mapping

A Bloblang mapping that returns the text you want to generate a vector embedding for, for example root = this.text. By default, the processor submits the entire payload of each message as a string.

Type: string
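A text_mapping can also build the text to embed from several fields. This minimal sketch assumes each message has title and body fields (hypothetical names). It joins them with a blank line before embedding:

```yaml
pipeline:
  processors:
    - cohere_embeddings:
        model: embed-english-v3.0
        api_key: "${COHERE_API_KEY}"
        # Hypothetical fields: concatenate title and body into one string.
        text_mapping: 'root = this.title + "\n\n" + this.body'
```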

input_type

Specifies the type of input passed to the model.

Type: string

Default: "search_document"

Option            Summary
classification    Used for embeddings passed through a text classifier.
clustering        Used for embeddings run through a clustering algorithm.
search_document   Used for embeddings stored in a vector database for search use cases.
search_query      Used for embeddings of search queries run against a vector database to find relevant documents.
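The search_document and search_query options work as a pair. Documents are embedded with search_document when they are indexed, and incoming queries are embedded with search_query at lookup time, using the same model. This sketch shows the query side and assumes each query arrives in a query field (a hypothetical name):

```yaml
pipeline:
  processors:
    - cohere_embeddings:
        # Use the same model that embedded the indexed documents.
        model: embed-english-v3.0
        api_key: "${COHERE_API_KEY}"
        # Query-side embeddings for searching a vector database.
        input_type: search_query
        text_mapping: 'root = this.query'
```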

dimensions

The number of dimensions of the output embedding. This is only available for embed-v4 and newer models. Possible values are 256, 512, 1024, and 1536.

Type: int
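This sketch requests shorter vectors from a v4 model. It assumes the Cohere model name embed-v4.0, so check Cohere's model list for the current identifier. Whatever store receives the vectors must expect the same size. For example, a Qdrant collection would need a vector size of 1024:

```yaml
pipeline:
  processors:
    - cohere_embeddings:
        # Assumed model name for Cohere Embed v4.
        model: embed-v4.0
        api_key: "${COHERE_API_KEY}"
        # One of 256, 512, 1024, or 1536.
        dimensions: 1024
```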