Inference

class elasticsearch.client.InferenceClient

To use this client, access client.inference from an Elasticsearch client. For example:

from elasticsearch import Elasticsearch

# Create the client instance
client = Elasticsearch(...)
# Use the inference client
client.inference.<method>(...)

completion(*, inference_id, input=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Perform completion inference on the service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-inference

Parameters:

inference_id (str) – 推理ID
input (str | Sequence[str] | None) – 推理输入。可以是字符串或字符串数组
task_settings (Any | None) – 可选的任务设置
timeout (str | Literal[-1] | ~typing.Literal[0] | None) – 指定等待推理请求完成的超时时间
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
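Here is a minimal sketch of calling `completion` and extracting the generated text. The helper name `ask` is hypothetical, and `client` is assumed to be an `elasticsearch.Elasticsearch` instance. The response shape `{"completion": [{"result": ...}]}` is also an assumption, so check it against a real response before relying on it.

```python
from typing import Any


def ask(client: Any, inference_id: str, prompt: str) -> str:
    """Send one prompt to a completion endpoint and return the generated text.

    Assumption: the response body looks like {"completion": [{"result": ...}]}.
    """
    resp = client.inference.completion(
        inference_id=inference_id,
        input=prompt,   # completion endpoints accept a single string
        timeout="30s",  # arbitrary example; how long to wait for the request
    )
    return resp["completion"][0]["result"]
```

For example, `ask(Elasticsearch("http://localhost:9200"), "my-completion-endpoint", "Say hello")`, where the URL and the endpoint ID are placeholders.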

delete(*, inference_id, task_type=None, dry_run=None, error_trace=None, filter_path=None, force=None, human=None, pretty=None)

Delete an inference endpoint.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-delete

Parameters:

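inference_id (str) – The inference identifier.
task_type (str | Literal['chat_completion', 'completion', 'rerank', 'sparse_embedding', 'text_embedding'] | None) – The task type.
dry_run (bool | None) – When true, the endpoint is not deleted, and a list of the ingest processors that reference this endpoint is returned.
force (bool | None) – When true, the inference endpoint is forcefully deleted even if it is still being used by ingest processors or semantic text fields.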
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)

Return type:

ObjectApiResponse[Any]
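The `dry_run` and `force` flags are easiest to use as two separate steps: first preview, then delete. The sketch below assumes `client` is an `elasticsearch.Elasticsearch` instance. The helper names are hypothetical.

```python
from typing import Any


def preview_delete(client: Any, inference_id: str) -> Any:
    """Ask Elasticsearch what would break, without deleting anything.

    With dry_run=True the endpoint is kept, and the response lists the ingest
    processors that still reference it.
    """
    return client.inference.delete(inference_id=inference_id, dry_run=True)


def delete_endpoint(client: Any, inference_id: str, force: bool = False) -> Any:
    """Delete the endpoint; force=True removes it even while it is still in use."""
    return client.inference.delete(inference_id=inference_id, force=force)
```

Inspect the preview before passing `force=True`. A forced delete can leave ingest processors or `semantic_text` fields pointing at an endpoint that no longer exists.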

get(*, task_type=None, inference_id=None, error_trace=None, filter_path=None, human=None, pretty=None)

Get an inference endpoint.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-get

Parameters:

task_type (str | Literal['chat_completion', 'completion', 'rerank', 'sparse_embedding', 'text_embedding'] | None) – The task type.
inference_id (str | None) – The inference ID.
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)

Return type:

ObjectApiResponse[Any]
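The following sketch lists endpoint IDs. It makes two assumptions: calling `get()` with no arguments returns every endpoint, and the response shape is `{"endpoints": [{"inference_id": ..., "task_type": ..., "service": ...}]}`. Filtering by task type happens client-side, in the hypothetical helper `endpoint_ids`.

```python
from typing import Any, List, Optional


def endpoint_ids(client: Any, task_type: Optional[str] = None) -> List[str]:
    """Return the IDs of all inference endpoints, optionally filtered by task type."""
    resp = client.inference.get()
    return [
        ep["inference_id"]
        for ep in resp["endpoints"]
        if task_type is None or ep.get("task_type") == task_type
    ]
```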

inference(*, inference_id, input=None, task_type=None, error_trace=None, filter_path=None, human=None, input_type=None, pretty=None, query=None, task_settings=None, timeout=None, body=None)

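Perform inference on the service.

This API enables you to use machine learning models to perform specific tasks on data that you provide as input. It returns a response with the results of the tasks.

The inference endpoint you use can perform one specific task, which was defined when the endpoint was created with the create inference API.

For details about using this API with a service such as Amazon Bedrock, Anthropic, or HuggingFace, refer to the service-specific documentation.

Info: The inference API enables you to use built-in machine learning models (ELSER, E5), models uploaded through Eland, and services such as Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference API offers an alternative way to use and manage trained models. However, if you do not plan to use the inference API to use these models, or if you want to use non-NLP models, use the machine learning trained model APIs.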

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-inference

Parameters:

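inference_id (str) – The unique identifier for the inference endpoint.
input (str | Sequence[str] | None) – The text on which you want to perform the inference task. It can be a single string or an array. Info: Inference endpoints for the `completion` task type currently only support a single string as input.
task_type (str | Literal['chat_completion', 'completion', 'rerank', 'sparse_embedding', 'text_embedding'] | None) – The type of inference task that the model performs.
input_type (str | None) – Specifies the input data type for the text embedding model. The `input_type` parameter only applies to inference endpoints with the `text_embedding` task type. Possible values include `SEARCH`, `INGEST`, `CLASSIFICATION`, and `CLUSTERING`. Not all services support all values, and unsupported values trigger a validation exception. Accepted values depend on the configured inference service; refer to the relevant service-specific documentation for details. Info: The `input_type` parameter specified at the root level of the request body takes precedence over the `input_type` parameter specified in `task_settings`.
query (str | None) – The query input, which is required only for the `rerank` task. It is not required for other tasks.
task_settings (Any | None) – Task settings for the individual inference request. These settings are specific to the task type you specified and override the task settings specified when initializing the service.
timeout (str | Literal[-1] | Literal[0] | None) – The amount of time to wait for the inference request to complete.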
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
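The generic `inference` method takes the task type explicitly. That makes `input_type` (text embedding only) and `query` (rerank only) easy to see side by side. The helper names in this sketch are hypothetical, and the endpoints are assumed to already exist with matching task types.

```python
from typing import Any, List


def embed_query(client: Any, inference_id: str, text: str) -> Any:
    # input_type only applies to text_embedding endpoints; "SEARCH" marks the
    # text as a search query. Not every service accepts every value.
    return client.inference.inference(
        inference_id=inference_id,
        task_type="text_embedding",
        input=text,
        input_type="SEARCH",
    )


def rerank_docs(client: Any, inference_id: str, query: str, docs: List[str]) -> Any:
    # query is required for rerank and not needed for the other task types.
    return client.inference.inference(
        inference_id=inference_id,
        task_type="rerank",
        query=query,
        input=docs,
    )
```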

put(*, inference_id, inference_config=None, body=None, task_type=None, error_trace=None, filter_path=None, human=None, pretty=None, timeout=None)

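Create an inference endpoint.

IMPORTANT: The inference API enables you to use built-in machine learning models (ELSER, E5), models uploaded through Eland, and services such as Cohere, OpenAI, Mistral, Azure OpenAI, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference API offers an alternative way to use and manage trained models. However, if you do not plan to use the inference API to use these models, or if you want to use non-NLP models, use the machine learning trained model APIs.

The following integrations are available through the inference API. The available task types are listed next to each integration name:

AlibabaCloud AI Search (completion, rerank, sparse_embedding, text_embedding)
Amazon Bedrock (completion, text_embedding)
Anthropic (completion)
Azure AI Studio (completion, rerank, text_embedding)
Azure OpenAI (completion, text_embedding)
Cohere (completion, rerank, text_embedding)
DeepSeek (completion, chat_completion)
Elasticsearch (rerank, sparse_embedding, text_embedding; this service is for built-in models and models uploaded through Eland)
ELSER (sparse_embedding)
Google AI Studio (completion, text_embedding)
Google Vertex AI (rerank, text_embedding)
Hugging Face (chat_completion, completion, rerank, text_embedding)
Mistral (chat_completion, completion, text_embedding)
OpenAI (chat_completion, completion, text_embedding)
VoyageAI (text_embedding, rerank)
Watsonx inference integration (text_embedding)
JinaAI (text_embedding, rerank)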

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put

Parameters:

inference_id (str) – The inference ID.
inference_config (Mapping[str, Any] | None)
task_type (str | Literal['chat_completion', 'completion', 'rerank', 'sparse_embedding', 'text_embedding'] | None) – The task type. Refer to the integration list in the API description for the available task types.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.
body (Mapping[str, Any] | None)
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)

Return type:

ObjectApiResponse[Any]
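With the generic `put`, the whole endpoint definition goes in `inference_config`. The sketch below builds a sparse_embedding endpoint on the `elasticsearch` service. The model ID `.elser_model_2` and the `service_settings` keys reflect the Elasticsearch service documentation as I understand it, so treat them as assumptions to verify for your cluster version. The helper names are hypothetical.

```python
from typing import Any, Dict


def elser_config(num_allocations: int = 1, num_threads: int = 1) -> Dict[str, Any]:
    """Build an inference_config for ELSER served by the elasticsearch service."""
    return {
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".elser_model_2",  # assumed built-in ELSER model ID
            "num_allocations": num_allocations,
            "num_threads": num_threads,
        },
    }


def create_sparse_endpoint(client: Any, inference_id: str) -> Any:
    return client.inference.put(
        task_type="sparse_embedding",
        inference_id=inference_id,
        inference_config=elser_config(),
    )
```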

put_alibabacloud(*, task_type, alibabacloud_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Create an AlibabaCloud AI Search inference endpoint.

Create an inference endpoint to perform an inference task with the alibabacloud-ai-search service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-alibabacloud

Parameters:

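task_type (str | Literal['completion', 'rerank', 'space_embedding', 'text_embedding']) – The type of the inference task that the model will perform.
alibabacloud_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['alibabacloud-ai-search'] | None) – The type of service supported for the specified task type. In this case, `alibabacloud-ai-search`.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the `alibabacloud-ai-search` service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.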
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_amazonbedrock(*, task_type, amazonbedrock_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

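Create an Amazon Bedrock inference endpoint.

Create an inference endpoint to perform an inference task with the amazonbedrock service.

Tip: You need to provide the access and secret keys only once, when you create the inference model. The get inference API does not retrieve your access or secret keys. After you create the inference model, you cannot change the associated key pair. To use a different access and secret key pair, delete the inference model and recreate it with the same name and the updated keys.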

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-amazonbedrock

Parameters:

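task_type (str | Literal['completion', 'text_embedding']) – The type of the inference task that the model will perform.
amazonbedrock_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['amazonbedrock'] | None) – The type of service supported for the specified task type. In this case, `amazonbedrock`.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the `amazonbedrock` service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.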
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
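Because the AWS keys are sent only once and cannot be read back, it helps to load them from the environment at creation time. In this sketch, the `service_settings` field names (`access_key`, `secret_key`, `region`, `provider`, `model`) are assumptions to check against the Amazon Bedrock service documentation. Reading the standard AWS environment variables is an illustrative choice, and the helper name is hypothetical.

```python
import os
from typing import Any


def create_bedrock_embeddings(
    client: Any, inference_id: str, provider: str, model: str, region: str
) -> Any:
    """Create an Amazon Bedrock text_embedding endpoint.

    The keys are sent only here: the get inference API will not return them,
    and changing them later means deleting and recreating the endpoint.
    """
    return client.inference.put_amazonbedrock(
        task_type="text_embedding",
        amazonbedrock_inference_id=inference_id,
        service="amazonbedrock",
        service_settings={
            "access_key": os.environ["AWS_ACCESS_KEY_ID"],
            "secret_key": os.environ["AWS_SECRET_ACCESS_KEY"],
            "region": region,
            "provider": provider,
            "model": model,
        },
    )
```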

put_anthropic(*, task_type, anthropic_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Create an Anthropic inference endpoint.

Create an inference endpoint to perform an inference task with the anthropic service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-anthropic

Parameters:

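task_type (str | Literal['completion']) – The task type. The only valid task type for the model to perform is `completion`.
anthropic_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['anthropic'] | None) – The type of service supported for the specified task type. In this case, `anthropic`.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the `anthropic` service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.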
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
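An Anthropic endpoint supports only `completion`, so the task type is fixed in this sketch. The setting names are assumptions to verify against the Anthropic service documentation: `api_key` and `model_id` in `service_settings`, and `max_tokens` in `task_settings`. The model ID is left to the caller, and the helper name is hypothetical.

```python
from typing import Any


def create_anthropic_completion(
    client: Any, inference_id: str, api_key: str, model_id: str, max_tokens: int = 1024
) -> Any:
    """Create an Anthropic endpoint; completion is its only valid task type."""
    return client.inference.put_anthropic(
        task_type="completion",
        anthropic_inference_id=inference_id,
        service="anthropic",
        service_settings={"api_key": api_key, "model_id": model_id},
        task_settings={"max_tokens": max_tokens},  # assumed cap on generated tokens
    )
```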

put_azureaistudio(*, task_type, azureaistudio_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Create an Azure AI Studio inference endpoint.

Create an inference endpoint to perform an inference task with the azureaistudio service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-azureaistudio

Parameters:

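task_type (str | Literal['completion', 'rerank', 'text_embedding']) – The type of the inference task that the model will perform.
azureaistudio_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['azureaistudio'] | None) – The type of service supported for the specified task type. In this case, `azureaistudio`.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the `azureaistudio` service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.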
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_azureopenai(*, task_type, azureopenai_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

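Create an Azure OpenAI inference endpoint.

Create an inference endpoint to perform an inference task with the azureopenai service.

The chat completion models that you can choose from in your Azure OpenAI deployment are listed in the Azure OpenAI models documentation.

The list of embedding models that you can choose from in your deployment can be found in the Azure models documentation.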

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-azureopenai

Parameters:

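task_type (str | Literal['completion', 'text_embedding']) – The type of the inference task that the model will perform. NOTE: The `chat_completion` task type only supports streaming, and only through the _stream API.
azureopenai_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['azureopenai'] | None) – The type of service supported for the specified task type. In this case, `azureopenai`.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the `azureopenai` service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.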
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_cohere(*, task_type, cohere_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Create a Cohere inference endpoint.

Create an inference endpoint to perform an inference task with the cohere service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-cohere

Parameters:

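task_type (str | Literal['completion', 'rerank', 'text_embedding']) – The type of the inference task that the model will perform.
cohere_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['cohere'] | None) – The type of service supported for the specified task type. In this case, cohere.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the cohere service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.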
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_custom(*, task_type, custom_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, body=None)

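Create a custom inference endpoint.

The custom service gives more control over how to interact with external inference services that aren't explicitly supported through dedicated integrations. It lets you define the headers, URL, query parameters, request body, and secrets. The custom service supports template replacement: you define a template, and it is replaced with the value associated with that key. Templates are portions of a string that start with ${ and end with }. The secret_parameters and task_settings parameters are checked for keys to use in template replacement. Template replacement is supported in the request, headers, url, and query_parameters. If the definition (key) for a template is not found, an error message is returned. Consider an endpoint definition like this: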

PUT _inference/text_embedding/test-text-embedding
{
  "service": "custom",
  "service_settings": {
     "secret_parameters": {
          "api_key": "<some api key>"
     },
     "url": "...endpoints.huggingface.cloud/v1/embeddings",
     "headers": {
         "Authorization": "Bearer ${api_key}",
         "Content-Type": "application/json"
     },
     "request": "{\"input\": ${input}}",
     "response": {
         "json_parser": {
             "text_embeddings":"$.data[*].embedding[*]"
         }
     }
  }
}

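To replace ${api_key}, the secret_parameters and task_settings are checked for a key named api_key.

Info: Templates should not be surrounded by quotes.

Pre-defined templates:

${input} refers to the array of input strings that comes from the input field of the subsequent inference requests.
${input_type} refers to the input type translation values.
${query} refers to the query field, used specifically for reranking tasks.
${top_n} refers to the top_n field available when performing rerank requests.
${return_documents} refers to the return_documents field available when performing rerank requests.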
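The substitution rule described above can be illustrated with a small Python function. This is only an illustration of the documented behavior, not Elasticsearch's implementation. The lookup order between `secret_parameters` and `task_settings` is an assumption. Pre-defined templates such as `${input}` are filled from the inference request itself and are not modelled here. The name `render` is hypothetical.

```python
import re
from typing import Mapping

# A template starts with ${ and ends with }.
_TEMPLATE = re.compile(r"\$\{([^}]+)\}")


def render(template: str, secret_parameters: Mapping[str, str], task_settings: Mapping[str, str]) -> str:
    """Replace ${key} placeholders with values from secret_parameters or task_settings."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        for source in (secret_parameters, task_settings):
            if key in source:
                return str(source[key])
        # The docs state that a missing definition produces an error.
        raise KeyError(f"no definition found for template ${{{key}}}")

    return _TEMPLATE.sub(substitute, template)
```

For example, `render("Bearer ${api_key}", {"api_key": "abc"}, {})` gives `"Bearer abc"`.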

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-custom

Parameters:

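task_type (str | Literal['completion', 'rerank', 'sparse_embedding', 'text_embedding']) – The type of the inference task that the model will perform.
custom_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['custom'] | None) – The type of service supported for the specified task type. In this case, custom.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the custom service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.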
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
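The REST example above maps onto `put_custom` roughly as follows. The URL is a caller-supplied parameter because the example's URL is elided. The `${...}` templates are passed through as plain strings, so Elasticsearch, not Python, fills them in. The helper names are hypothetical.

```python
from typing import Any, Dict


def custom_embedding_settings(api_key: str, url: str) -> Dict[str, Any]:
    # Mirrors service_settings from the REST example; note these are plain
    # strings (not f-strings), so ${api_key} and ${input} stay literal.
    return {
        "secret_parameters": {"api_key": api_key},
        "url": url,
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json",
        },
        "request": '{"input": ${input}}',
        "response": {
            "json_parser": {"text_embeddings": "$.data[*].embedding[*]"}
        },
    }


def create_custom_endpoint(client: Any, inference_id: str, api_key: str, url: str) -> Any:
    return client.inference.put_custom(
        task_type="text_embedding",
        custom_inference_id=inference_id,
        service="custom",
        service_settings=custom_embedding_settings(api_key, url),
    )
```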

put_deepseek(*, task_type, deepseek_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, timeout=None, body=None)

Create a DeepSeek inference endpoint.

Create an inference endpoint to perform an inference task with the deepseek service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-deepseek

Parameters:

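task_type (str | Literal['chat_completion', 'completion']) – The type of the inference task that the model will perform.
deepseek_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['deepseek'] | None) – The type of service supported for the specified task type. In this case, deepseek.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the deepseek service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.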
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_elasticsearch(*, task_type, elasticsearch_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

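Create an Elasticsearch inference endpoint.

Create an inference endpoint to perform an inference task with the elasticsearch service.

Info: Your Elasticsearch deployment contains preconfigured ELSER and E5 inference endpoints. You only need to create endpoints using the API if you want to customize the settings.

If you use the ELSER or the E5 model through the elasticsearch service, the API request automatically downloads and deploys the model if it isn't downloaded yet.

Info: You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If you use the Python client, you can set the timeout parameter to a higher value.

After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response, and make sure that "allocation_count" matches "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, because each endpoint consumes significant resources.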
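The readiness check described above can be automated by polling `client.ml.get_trained_models_stats`. This sketch assumes each entry in `trained_model_stats` carries `deployment_stats.allocation_status` with the `state`, `allocation_count`, and `target_allocation_count` fields named above. The helper names and the default polling intervals are hypothetical.

```python
import time
from typing import Any, Mapping


def is_fully_allocated(stats: Mapping[str, Any]) -> bool:
    """Check one entry of the get trained model statistics response."""
    status = stats.get("deployment_stats", {}).get("allocation_status", {})
    return (
        status.get("state") == "fully_allocated"
        and status.get("allocation_count") == status.get("target_allocation_count")
    )


def wait_for_deployment(client: Any, model_id: str, timeout_s: float = 600, poll_s: float = 10) -> bool:
    """Poll until a deployment of model_id is fully allocated or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = client.ml.get_trained_models_stats(model_id=model_id)
        if any(is_fully_allocated(s) for s in resp["trained_model_stats"]):
            return True
        time.sleep(poll_s)
    return False
```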

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-elasticsearch

Parameters:

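task_type (str | Literal['rerank', 'sparse_embedding', 'text_embedding']) – The type of the inference task that the model will perform.
elasticsearch_inference_id (str) – The unique identifier of the inference endpoint. It must not match the model_id.
service (str | Literal['elasticsearch'] | None) – The type of service supported for the specified task type. In this case, elasticsearch.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the elasticsearch service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.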
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_elser(*, task_type, elser_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, timeout=None, body=None)

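Create an ELSER inference endpoint.

Create an inference endpoint to perform an inference task with the elser service. You can also deploy ELSER by using the Elasticsearch inference integration.

Info: Your Elasticsearch deployment contains a preconfigured ELSER inference endpoint. You only need to create the endpoint using the API if you want to customize the settings.

The API request automatically downloads and deploys the ELSER model if it isn't already downloaded.

Info: You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If you use the Python client, you can set the timeout parameter to a higher value.

After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response, and make sure that "allocation_count" matches "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, because each endpoint consumes significant resources.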

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-elser

Parameters:

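task_type (str | Literal['sparse_embedding']) – The type of the inference task that the model will perform.
elser_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['elser'] | None) – The type of service supported for the specified task type. In this case, elser.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the elser service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.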
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
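The first creation may trigger a model download, so this sketch gives it more time in two places. `client.options(request_timeout=...)` raises the Python client's own HTTP timeout, and the `timeout` argument is the API parameter listed above. Both values are arbitrary examples. The `num_allocations` and `num_threads` keys are assumptions based on the ELSER service settings, and the helper name is hypothetical.

```python
from typing import Any


def create_elser_endpoint(client: Any, inference_id: str) -> Any:
    """Create an ELSER endpoint, allowing time for a first-time model download."""
    return client.options(request_timeout=600).inference.put_elser(
        task_type="sparse_embedding",
        elser_inference_id=inference_id,
        service="elser",
        service_settings={"num_allocations": 1, "num_threads": 1},
        timeout="10m",
    )
```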

put_googleaistudio(*, task_type, googleaistudio_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, timeout=None, body=None)

Create a Google AI Studio inference endpoint.

Create an inference endpoint to perform an inference task with the googleaistudio service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-googleaistudio

Parameters:

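task_type (str | Literal['completion', 'text_embedding']) – The type of the inference task that the model will perform.
googleaistudio_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['googleaistudio'] | None) – The type of service supported for the specified task type. In this case, googleaistudio.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the googleaistudio service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.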
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_googlevertexai(*, task_type, googlevertexai_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Create a Google Vertex AI inference endpoint.

Create an inference endpoint to perform an inference task with the googlevertexai service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-googlevertexai

Parameters:

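task_type (str | Literal['chat_completion', 'completion', 'rerank', 'text_embedding']) – The type of the inference task that the model will perform.
googlevertexai_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['googlevertexai'] | None) – The type of service supported for the specified task type. In this case, googlevertexai.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the googlevertexai service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.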
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_hugging_face(*, task_type, huggingface_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

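Create a Hugging Face inference endpoint.

Create an inference endpoint to perform an inference task with the hugging_face service. Supported tasks include text_embedding, completion, and chat_completion.

To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.

For Elastic's text_embedding task: The selected model must support the Sentence Embeddings task. On the new endpoint creation page, select the Sentence Embeddings task under the Advanced Configuration section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for the text_embedding task: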

all-MiniLM-L6-v2
all-MiniLM-L12-v2
all-mpnet-base-v2
e5-base-v2
e5-small-v2
multilingual-e5-base
multilingual-e5-small

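For Elastic's chat_completion and completion tasks: The selected model must support the Text Generation task and expose the OpenAI API. HuggingFace supports both serverless and dedicated endpoints for Text Generation. When creating a dedicated endpoint, select the Text Generation task. After the endpoint is initialized (dedicated) or ready (serverless), make sure it supports the OpenAI API and that its URL includes the /v1/chat/completions part. Then copy the full endpoint URL for use. Recommended models for the chat_completion and completion tasks: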

Mistral-7B-Instruct-v0.2
QwQ-32B
Phi-3-mini-128k-instruct

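For Elastic's rerank task: The selected model must support the sentence-ranking task and expose the OpenAI API. HuggingFace currently supports only dedicated (not serverless) endpoints for Rerank. After the endpoint is initialized, copy the full endpoint URL for use. Tested models for the rerank task: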

bge-reranker-base
jina-reranker-v1-turbo-en-GGUF

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-hugging-face

Parameters:

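task_type (str | Literal['chat_completion', 'completion', 'rerank', 'text_embedding']) – The type of the inference task that the model will perform.
huggingface_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['hugging_face'] | None) – The type of service supported for the specified task type. In this case, hugging_face.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the hugging_face service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.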
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
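Once the Hugging Face endpoint URL has been copied as described above, creating the Elastic endpoint is a single call. The `service_settings` keys `api_key` and `url` are assumptions to check against the Hugging Face service documentation. The helper name is hypothetical.

```python
from typing import Any


def create_hf_embeddings(client: Any, inference_id: str, api_key: str, endpoint_url: str) -> Any:
    """Register a Hugging Face Sentence Embeddings endpoint for text_embedding."""
    return client.inference.put_hugging_face(
        task_type="text_embedding",
        huggingface_inference_id=inference_id,
        service="hugging_face",
        # endpoint_url is the URL copied from the Hugging Face endpoint page.
        service_settings={"api_key": api_key, "url": endpoint_url},
    )
```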

put_jinaai(*, task_type, jinaai_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

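Create a JinaAI inference endpoint.

Create an inference endpoint to perform an inference task with the jinaai service.

To review the available rerank models, refer to https://jina.ai/reranker. To review the available text_embedding models, refer to https://jina.ai/embeddings/.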

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-jinaai

Parameters:

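task_type (str | Literal['rerank', 'text_embedding']) – The type of the inference task that the model will perform.
jinaai_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['jinaai'] | None) – The type of service supported for the specified task type. In this case, jinaai.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the jinaai service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.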
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_mistral(*, task_type, mistral_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, timeout=None, body=None)

Create a Mistral inference endpoint.

Create an inference endpoint to perform an inference task with the mistral service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-mistral

Parameters:

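task_type (str | Literal['chat_completion', 'completion', 'text_embedding']) – The type of the inference task that the model will perform.
mistral_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['mistral'] | None) – The type of service supported for the specified task type. In this case, mistral.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the mistral service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.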
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_openai(*, task_type, openai_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Create an OpenAI inference endpoint.

Create an inference endpoint to perform an inference task with the openai service or with OpenAI-compatible APIs.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-openai

Parameters:

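task_type (str | Literal['chat_completion', 'completion', 'text_embedding']) – The type of the inference task that the model will perform. NOTE: The chat_completion task type only supports streaming, and only through the _stream API.
openai_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['openai'] | None) – The type of service supported for the specified task type. In this case, openai.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the openai service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.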
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
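The same call covers both OpenAI itself and OpenAI-compatible APIs. In this sketch, the difference is an optional `url` in `service_settings`. The key names `api_key`, `model_id`, and `url` are assumptions to verify against the OpenAI service documentation. The helper name is hypothetical.

```python
from typing import Any, Dict, Optional


def create_openai_embeddings(
    client: Any, inference_id: str, api_key: str, model_id: str, url: Optional[str] = None
) -> Any:
    settings: Dict[str, Any] = {"api_key": api_key, "model_id": model_id}
    if url is not None:
        # For an OpenAI-compatible API, point the service at that endpoint.
        settings["url"] = url
    return client.inference.put_openai(
        task_type="text_embedding",
        openai_inference_id=inference_id,
        service="openai",
        service_settings=settings,
    )
```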

put_voyageai(*, task_type, voyageai_inference_id, service=None, service_settings=None, chunking_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

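Create a VoyageAI inference endpoint.

Create an inference endpoint to perform an inference task with the voyageai service.

Avoid creating multiple endpoints for the same model unless required, because each endpoint consumes significant resources.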

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-voyageai

Parameters:

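task_type (str | Literal['rerank', 'text_embedding']) – The type of the inference task that the model will perform.
voyageai_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['voyageai'] | None) – The type of service supported for the specified task type. In this case, voyageai.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the voyageai service.
chunking_settings (Mapping[str, Any] | None) – The chunking configuration object.
task_settings (Mapping[str, Any] | None) – Settings to configure the inference task. These settings are specific to the task type you specified.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.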
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

put_watsonx(*, task_type, watsonx_inference_id, service=None, service_settings=None, error_trace=None, filter_path=None, human=None, pretty=None, timeout=None, body=None)

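Create a Watsonx inference endpoint.

Create an inference endpoint to perform an inference task with the watsonxai service. You need an IBM Cloud Databases for Elasticsearch deployment to use the watsonxai inference service. You can provision one through the IBM catalog, the Cloud Databases CLI plug-in, the Cloud Databases API, or Terraform.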

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-watsonx

Parameters:

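task_type (str | Literal['chat_completion', 'completion', 'text_embedding']) – The type of the inference task that the model will perform.
watsonx_inference_id (str) – The unique identifier of the inference endpoint.
service (str | Literal['watsonxai'] | None) – The type of service supported for the specified task type. In this case, watsonxai.
service_settings (Mapping[str, Any] | None) – Settings used to install the inference model. These settings are specific to the watsonxai service.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference endpoint to be created.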
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]

rerank(*, inference_id, input=None, query=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Perform reranking inference on the service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-inference

Parameters:

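inference_id (str) – The unique identifier for the inference endpoint.
input (str | Sequence[str] | None) – The text on which you want to perform the inference task. It can be a single string or an array. Info: Inference endpoints for the completion task type currently only support a single string as input.
query (str | None) – The query input.
task_settings (Any | None) – Task settings for the individual inference request. These settings are specific to the task type you specified and override the task settings specified when initializing the service.
timeout (str | Literal[-1] | Literal[0] | None) – The amount of time to wait for the inference request to complete.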
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
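A typical use of `rerank` is to map the scored results back to the original documents. This sketch assumes the response looks like `{"rerank": [{"index": 0, "relevance_score": 0.9, ...}, ...]}`, where `index` points into the input list. Sorting client-side guards against relying on the server's result order. The helper name is hypothetical.

```python
from typing import Any, List, Tuple


def rerank(client: Any, inference_id: str, query: str, docs: List[str]) -> List[Tuple[str, float]]:
    """Return (document, relevance_score) pairs, best first."""
    resp = client.inference.rerank(inference_id=inference_id, query=query, input=docs)
    ranked = sorted(resp["rerank"], key=lambda r: r["relevance_score"], reverse=True)
    return [(docs[r["index"]], r["relevance_score"]) for r in ranked]
```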

sparse_embedding(*, inference_id, input=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Perform sparse embedding inference on the service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-inference

Parameters:

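inference_id (str) – The inference ID.
input (str | Sequence[str] | None) – Inference input. Either a string or an array of strings.
task_settings (Any | None) – Optional task settings.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference request to complete.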
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
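A sparse embedding maps tokens to weights, so inspecting the heaviest tokens is a quick sanity check. The response shape `{"sparse_embedding": [{"is_truncated": ..., "embedding": {token: weight}}]}` is an assumption. The helper name is hypothetical.

```python
from typing import Any, List, Tuple


def top_tokens(client: Any, inference_id: str, text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-weighted tokens for a single input string."""
    resp = client.inference.sparse_embedding(inference_id=inference_id, input=text)
    weights = resp["sparse_embedding"][0]["embedding"]
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
```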

text_embedding(*, inference_id, input=None, error_trace=None, filter_path=None, human=None, pretty=None, task_settings=None, timeout=None, body=None)

Perform text embedding inference on the service.

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-inference

Parameters:

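inference_id (str) – The inference ID.
input (str | Sequence[str] | None) – Inference input. Either a string or an array of strings.
task_settings (Any | None) – Optional task settings.
timeout (str | Literal[-1] | Literal[0] | None) – Specifies the amount of time to wait for the inference request to complete.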
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)
body (Dict[str, Any] | None)

Return type:

ObjectApiResponse[Any]
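Dense vectors from `text_embedding` are usually compared with cosine similarity. This sketch assumes the response looks like `{"text_embedding": [{"embedding": [...]}, ...]}`, with one entry per input string in order. The helper names are hypothetical.

```python
import math
from typing import Any, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embed(client: Any, inference_id: str, texts: List[str]) -> List[List[float]]:
    """Embed several strings in one request and return their vectors."""
    resp = client.inference.text_embedding(inference_id=inference_id, input=texts)
    return [item["embedding"] for item in resp["text_embedding"]]
```

For example, `a, b = embed(client, "my-embedding-endpoint", ["cat", "kitten"])` followed by `cosine(a, b)`, where the endpoint ID is a placeholder.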

update(*, inference_id, inference_config=None, body=None, task_type=None, error_trace=None, filter_path=None, human=None, pretty=None)

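Update an inference endpoint.

Modify the task_settings, the secrets (within service_settings), or num_allocations of an inference endpoint, depending on the specific endpoint service and task_type.

IMPORTANT: The inference API enables you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, and models from Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference API offers an alternative way to use and manage trained models. However, if you do not plan to use the inference API to use these models, or if you want to use non-NLP models, use the machine learning trained model APIs.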

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-update

Parameters:

inference_id (str) – The unique identifier of the inference endpoint.
inference_config (Mapping[str, Any] | None)
task_type (str | Literal['chat_completion', 'completion', 'rerank', 'sparse_embedding', 'text_embedding'] | None) – The type of inference task that the model performs.
body (Mapping[str, Any] | None)
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)

Return type:

ObjectApiResponse[Any]
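As an example of the `update` call, this sketch changes only `num_allocations`, passed through `inference_config`. It assumes the endpoint's service (for example elasticsearch or elser) allows `num_allocations` to be updated. Other services may reject the change. The helper name is hypothetical.

```python
from typing import Any


def scale_endpoint(client: Any, inference_id: str, num_allocations: int) -> Any:
    """Change num_allocations on an existing endpoint, leaving other settings as-is."""
    return client.inference.update(
        inference_id=inference_id,
        inference_config={"service_settings": {"num_allocations": num_allocations}},
    )
```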