cohere_embeddings

Generates vector embeddings to represent input text, using the Cohere API.

Introduced in version 4.37.0.

# Config fields, showing default values
label: ""
cohere_embeddings:
  base_url: https://api.cohere.com
  api_key: "" # No default (required)
  model: embed-english-v3.0 # No default (required)
  text_mapping: "" # No default (optional)
  input_type: search_document
  dimensions: 0 # No default (optional)

This processor sends text strings to the Cohere API, which generates vector embeddings. By default, the processor submits the entire payload of each message as a string, unless you use the text_mapping configuration field to customize it.

To learn more about vector embeddings, see the Cohere API documentation.

Examples

Store embedding vectors in Qdrant

Compute embeddings for some generated data and store it within xrefs:component:outputs/qdrant.adoc[Qdrant]

input:
  generate:
    interval: 1s
    mapping: |
      root = {"text": fake("paragraph")}
pipeline:
  processors:
  - cohere_embeddings:
      model: embed-english-v3
      api_key: "${COHERE_API_KEY}"
      text_mapping: "root = this.text"
output:
  qdrant:
    grpc_host: localhost:6334
    collection_name: "example_collection"
    id: "root = uuid_v4()"
    vector_mapping: "root = this"

Fields

`base_url`

The base URL to use for API requests.

Type: string

Default: "https://api.cohere.com"

`api_key`

The API key for the Cohere API.

Type: string

`model`

The name of the Cohere model to use.

Type: string

# Examples

model: embed-english-v3.0

model: embed-english-light-v3.0

model: embed-multilingual-v3.0

model: embed-multilingual-light-v3.0

`text_mapping`

The text you want to generate a vector embedding for. By default, the processor submits the entire payload as a string.

Type: string

`input_type`

Specifies the type of input passed to the model.

Type: string

Default: "search_document"

Option	Summary
`classification`	Used for embeddings passed through a text classifier.
`clustering`	Used for the embeddings run through a clustering algorithm.
`search_document`	Used for embeddings stored in a vector database for search use-cases.
`search_query`	Used for embeddings of search queries run against a vector DB to find relevant documents.

`dimensions`

The number of dimensions of the output embedding. This is only available for embed-v4 and newer models. Possible values are 256, 512, 1024, and 1536.

Type: int