Elasticsearch

class elasticsearch.Elasticsearch

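Elasticsearch low-level client. Provides a straightforward mapping from Python to the Elasticsearch REST APIs.

The client instance has additional attributes for accessing APIs in different namespaces, such as async_search, indices, security, and more:

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")

# Get Document API
client.get(index="*", id="1")

# Get Index API
client.indices.get(index="*")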

Transport options can be set in the client constructor or with the options() method:

# Set 'api_key' in the constructor
client = Elasticsearch(
    "http://localhost:9200",
    api_key="api_key",
)
client.search(...)

# Set 'api_key' per request
client.options(api_key="api_key").search(...)

__init__(hosts=None, *, cloud_id=None, api_key=None, basic_auth=None, bearer_auth=None, opaque_id=None, headers=<DEFAULT>, connections_per_node=<DEFAULT>, http_compress=<DEFAULT>, verify_certs=<DEFAULT>, ca_certs=<DEFAULT>, client_cert=<DEFAULT>, client_key=<DEFAULT>, ssl_assert_hostname=<DEFAULT>, ssl_assert_fingerprint=<DEFAULT>, ssl_version=<DEFAULT>, ssl_context=<DEFAULT>, ssl_show_warn=<DEFAULT>, transport_class=<class 'elastic_transport.Transport'>, request_timeout=<DEFAULT>, node_class=<DEFAULT>, node_pool_class=<DEFAULT>, randomize_nodes_in_pool=<DEFAULT>, node_selector_class=<DEFAULT>, dead_node_backoff_factor=<DEFAULT>, max_dead_node_backoff=<DEFAULT>, serializer=None, serializers=<DEFAULT>, default_mimetype='application/json', max_retries=<DEFAULT>, retry_on_status=<DEFAULT>, retry_on_timeout=<DEFAULT>, sniff_on_start=<DEFAULT>, sniff_before_requests=<DEFAULT>, sniff_on_node_failure=<DEFAULT>, sniff_timeout=<DEFAULT>, min_delay_between_sniffing=<DEFAULT>, sniffed_node_callback=None, meta_header=<DEFAULT>, http_auth=<DEFAULT>, _transport=None)

Parameters:

hosts (str | Sequence[str | Mapping[str, str | int] | NodeConfig] | None)
cloud_id (str | None)
api_key (str | Tuple[str, str] | None)
basic_auth (str | Tuple[str, str] | None)
bearer_auth (str | None)
opaque_id (str | None)
headers (DefaultType | Mapping[str, str])
connections_per_node (DefaultType | int)
http_compress (DefaultType | bool)
verify_certs (DefaultType | bool)
ca_certs (DefaultType | str)
client_cert (DefaultType | str)
client_key (DefaultType | str)
ssl_assert_hostname (DefaultType | str)
ssl_assert_fingerprint (DefaultType | str)
ssl_version (DefaultType | int)
ssl_context (DefaultType | Any)
ssl_show_warn (DefaultType | bool)
transport_class (Type[Transport])
request_timeout (DefaultType | None | float)
node_class (DefaultType | Type[BaseNode])
node_pool_class (DefaultType | Type[NodePool])
randomize_nodes_in_pool (DefaultType | bool)
node_selector_class (DefaultType | Type[NodeSelector])
dead_node_backoff_factor (DefaultType | float)
max_dead_node_backoff (DefaultType | float)
serializer (Serializer | None)
serializers (DefaultType | Mapping[str, Serializer])
default_mimetype (str)
max_retries (DefaultType | int)
retry_on_status (DefaultType | int | Collection[int])
retry_on_timeout (DefaultType | bool)
sniff_on_start (DefaultType | bool)
sniff_before_requests (DefaultType | bool)
sniff_on_node_failure (DefaultType | bool)
sniff_timeout (DefaultType | None | float)
min_delay_between_sniffing (DefaultType | None | float)
sniffed_node_callback (Callable[[Dict[str, Any], NodeConfig], NodeConfig | None] | None)
meta_header (DefaultType | bool)
http_auth (DefaultType | Any)
_transport (Transport | None)

Return type:

None

bulk(*, operations=None, body=None, index=None, error_trace=None, filter_path=None, human=None, include_source_on_error=None, list_executed_pipelines=None, pipeline=None, pretty=None, refresh=None, require_alias=None, require_data_stream=None, routing=None, source=None, source_excludes=None, source_includes=None, timeout=None, wait_for_active_shards=None)

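Bulk index or delete documents. Perform multiple index, create, delete, and update actions in a single request. This reduces overhead and can greatly increase indexing speed.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

To use the create action, you must have the create_doc, create, index, or write index privilege. Data streams support only the create action.
To use the index action, you must have the create, index, or write index privilege.
To use the delete action, you must have the delete or write index privilege.
To use the update action, you must have the index or write index privilege.
To automatically create a data stream or index with a bulk API request, you must have the auto_configure, create_index, or manage index privilege.
To make the result of a bulk operation visible to search using the refresh parameter, you must have the maintenance or manage index privilege.

Automatic data stream creation requires a matching index template with data stream enabled.

The actions are specified in the request body using a newline-delimited JSON (NDJSON) structure: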

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

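The index and create actions expect a source on the next line and have the same semantics as the op_type parameter in the standard index API. A create action fails if a document with the same ID already exists in the target. An index action adds or replaces a document as necessary.

NOTE: Data streams support only the create action. To update or delete a document in a data stream, you must target the backing index that contains the document.

An update action expects the partial doc, upsert, and script and its options to be specified on the next line.

A delete action does not expect a source on the next line and has the same semantics as the standard delete API.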
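In the Python client, the same action/source pairs can be passed to bulk() through the operations parameter as a flat sequence of dicts, and the client serializes them to NDJSON. The following is a minimal sketch covering all four action types. The index name, IDs, and document fields are hypothetical.

```python
# Hypothetical index name and documents, for illustration only.
INDEX = "my-index"

operations = [
    # index: the source follows as the next entry; adds or replaces doc "1"
    {"index": {"_index": INDEX, "_id": "1"}},
    {"title": "first", "views": 0},
    # create: the source follows; fails if doc "2" already exists
    {"create": {"_index": INDEX, "_id": "2"}},
    {"title": "second", "views": 0},
    # update: the next entry holds the partial doc (a script or upsert could go here instead)
    {"update": {"_index": INDEX, "_id": "1"}},
    {"doc": {"views": 1}},
    # delete: no source entry follows
    {"delete": {"_index": INDEX, "_id": "3"}},
]

# Against a live cluster (not executed here):
# resp = client.bulk(operations=operations)
```

Only delete is a single entry. Every other action occupies two consecutive entries.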

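NOTE: The final line of data must end with a newline character (\n). Each newline character may be preceded by a carriage return (\r). When sending NDJSON data to the _bulk endpoint, use a Content-Type header of application/json or application/x-ndjson. Because this format uses literal newline characters (\n) as delimiters, make sure that the JSON actions and sources are not pretty-printed.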
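These rules fit in a small serializer: one compact JSON document per line, followed by a trailing newline. This is only a sketch for building a raw _bulk body by hand. The Python client already does this for you when you pass operations. The name to_ndjson is hypothetical.

```python
import json
from typing import Any, Iterable, Mapping


def to_ndjson(entries: Iterable[Mapping[str, Any]]) -> bytes:
    """Serialize bulk action/source entries as an NDJSON request body.

    Each entry is dumped as compact JSON (no pretty-printing). json.dumps
    escapes newlines inside string values, so every entry stays on one
    physical line. Every line, including the last, ends with "\\n".
    """
    lines = (json.dumps(entry, separators=(",", ":")) for entry in entries)
    return "".join(line + "\n" for line in lines).encode("utf-8")


body = to_ndjson([
    {"index": {"_index": "test", "_id": "1"}},
    {"field1": "value1"},
])
# body == b'{"index":{"_index":"test","_id":"1"}}\n{"field1":"value1"}\n'
```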

If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index argument.

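A note on the format: the idea is to make processing as fast as possible. Because some actions are redirected to shards on other nodes, only action_meta_data is parsed on the receiving node.

Client libraries using this protocol should strive to do something similar on the client side and reduce buffering as much as possible.

There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of an HTTP request to 100mb by default, so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace it with a link to the external system in the documents that you send to Elasticsearch.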
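One way to respect the 100mb default is to split operations into batches by their serialized size. The sketch below is a hypothetical helper, batch_by_size, and not part of the client. It assumes compact JSON, so its size estimate may differ slightly from the client's own serializer. It rejects any single operation that alone exceeds the limit. elasticsearch.helpers.streaming_bulk offers similar chunking through its chunk_size and max_chunk_bytes arguments.

```python
import json
from typing import Any, Iterable, Iterator, List, Mapping, Optional, Tuple

# Elasticsearch's default maximum HTTP request size (100mb).
DEFAULT_MAX_BYTES = 100 * 1024 * 1024

# (action/metadata entry, source entry or None for delete)
Pair = Tuple[Mapping[str, Any], Optional[Mapping[str, Any]]]


def _encoded_size(entry: Mapping[str, Any]) -> int:
    # Compact JSON plus the trailing "\n" delimiter.
    return len(json.dumps(entry, separators=(",", ":")).encode("utf-8")) + 1


def batch_by_size(pairs: Iterable[Pair], max_bytes: int = DEFAULT_MAX_BYTES) -> Iterator[List[Mapping[str, Any]]]:
    """Yield flat operation lists whose estimated NDJSON size stays within max_bytes."""
    batch: List[Mapping[str, Any]] = []
    size = 0
    for action, source in pairs:
        entries = [action] if source is None else [action, source]
        pair_size = sum(_encoded_size(e) for e in entries)
        if pair_size > max_bytes:
            # A single operation over the limit cannot be sent as-is;
            # the document must be split or slimmed down first.
            raise ValueError(f"operation of {pair_size} bytes exceeds {max_bytes}")
        if batch and size + pair_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.extend(entries)
        size += pair_size
    if batch:
        yield batch
```

Each yielded batch can then be sent with client.bulk(operations=batch). Action/source pairs are never split across batches.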

Client support for bulk requests

Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:

Go: See esutil.BulkIndexer
Perl: See Search::Elasticsearch::Client::5_0::Bulk and Search::Elasticsearch::Client::5_0::Scroll
Python: See elasticsearch.helpers.*
JavaScript: See client.helpers.*
.NET: See BulkAllObservable
PHP: See bulk indexing.
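For Python, a typical pattern is to feed a generator of actions to elasticsearch.helpers.bulk. The helper batches and sends the actions, then returns a (success_count, errors) tuple. In helper-style actions, keys starting with "_" are metadata, "_op_type" defaults to "index", and "_source" holds the document. The sketch below uses hypothetical function names. It imports the library lazily so that the generator can be inspected without a cluster.

```python
from typing import Any, Dict, Iterable, Iterator


def generate_actions(index: str, docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one helpers-style action per document (IDs are auto-generated)."""
    for doc in docs:
        yield {"_index": index, "_source": doc}


def index_all(client: Any, index: str, docs: Iterable[Dict[str, Any]]) -> int:
    """Bulk-index docs with the helper and return the number of successful actions."""
    from elasticsearch import helpers  # lazy import; requires the elasticsearch package

    success, _errors = helpers.bulk(client, generate_actions(index, docs))
    return success
```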

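Submitting bulk requests with cURL

If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn't preserve newlines. For example: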

$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
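A bulk request can succeed as a whole while individual items fail. The top-level "errors" flag shows whether any item failed, and each entry in "items" is a one-key dict named after its action. The sketch below, using a hypothetical helper named failed_items, collects the failed items from a response body shaped like the one above. With the Python client you would pass the deserialized body, for example resp.body.

```python
from typing import Any, Dict, List, Mapping


def failed_items(resp: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Return the per-item results of a bulk response that carry an "error" key."""
    if not resp.get("errors"):
        # Cheap check: no item failed.
        return []
    failures: List[Dict[str, Any]] = []
    for item in resp.get("items", []):
        # Each item looks like {"index" | "create" | "update" | "delete": {...}}.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append({"action": action, **result})
    return failures
```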

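Optimistic concurrency control

Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in its action and metadata line. The if_seq_no and if_primary_term parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.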

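Versioning

Each bulk item can include a version value using the version field. It automatically follows the behavior of the index or delete operation based on the _version mapping. It also supports version_type.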

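Routing

Each bulk item can include a routing value using the routing field. It automatically follows the behavior of the index or delete operation based on the _routing mapping.

NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.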
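The concurrency, versioning, and routing options described above all live in the per-item action/metadata entry. The sketch below uses a hypothetical helper, action_line, that builds such an entry and includes only the options you pass. The sequence number and primary term values in the usage lines are placeholders. In practice they come from the _seq_no and _primary_term of a previous read.

```python
from typing import Any, Dict, Optional


def action_line(
    op: str,
    index: str,
    doc_id: str,
    *,
    if_seq_no: Optional[int] = None,
    if_primary_term: Optional[int] = None,
    version: Optional[int] = None,
    version_type: Optional[str] = None,
    routing: Optional[str] = None,
) -> Dict[str, Dict[str, Any]]:
    """Build one bulk action/metadata entry such as {"index": {"_index": ..., "_id": ...}}."""
    meta: Dict[str, Any] = {"_index": index, "_id": doc_id}
    optional = {
        "if_seq_no": if_seq_no,
        "if_primary_term": if_primary_term,
        "version": version,
        "version_type": version_type,
        "routing": routing,
    }
    # Omit unset options so the server applies its defaults.
    meta.update({k: v for k, v in optional.items() if v is not None})
    return {op: meta}


# Placeholder values, for illustration only.
occ = action_line("index", "test", "1", if_seq_no=5, if_primary_term=1)
routed = action_line("delete", "test", "2", routing="user-1")
```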

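Wait for active shards

When making bulk calls, you can set the wait_for_active_shards parameter to require a minimum number of shard copies to be active before processing of the bulk request starts.

Refresh

Controls when the changes made by this request become visible to search.

NOTE: Only the shards that receive the bulk request are affected by refresh. Imagine a _bulk?refresh=wait_for request with three documents in it that happen to be routed to different shards in an index with five shards. The request will wait only for those three shards to refresh. The other two shards that make up the index do not participate in the _bulk request at all.

You might want to disable the refresh interval temporarily to improve indexing throughput for large bulk requests. Refer to the linked documentation for step-by-step instructions using the index settings API.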
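The following is a minimal sketch of that pattern with the Python client, written as a context manager. It sets index.refresh_interval to "-1" (disabled) through client.indices.put_settings and restores the interval afterwards, even if the load fails. It assumes the index previously used the default "1s" interval. Pass the original value as restore_to if it did not. refresh_disabled is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def refresh_disabled(client: Any, index: str, restore_to: str = "1s") -> Iterator[None]:
    """Disable periodic refresh on index for the duration of the block."""
    client.indices.put_settings(index=index, settings={"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Assumption: restore_to matches the interval in use before the load.
        client.indices.put_settings(index=index, settings={"index": {"refresh_interval": restore_to}})


# Usage against a live cluster (not executed here):
# with refresh_disabled(client, "my-index"):
#     client.bulk(operations=operations)
```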

https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk

Parameters:

operations (Sequence[Mapping[str, Any]] | None)
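index (str | None) – The name of the data stream, index, or index alias to perform bulk actions on.
include_source_on_error (bool | None) – Whether to include the document source in the error message in case of parsing errors (true or false).
list_executed_pipelines (bool | None) – If true, the response includes the ingest pipelines that were run for each index or create.
pipeline (str | None) – The pipeline identifier to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none turns off the default ingest pipeline for this request. If a final pipeline is configured, it always runs regardless of the value of this parameter.
refresh (bool | str | Literal['false', 'true', 'wait_for'] | None) – If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, wait for a refresh to make this operation visible to search. If false, do nothing with refreshes. Valid values: true, false, wait_for.
require_alias (bool | None) – If true, the request's actions must target an index alias.
require_data_stream (bool | None) – If true, the request's actions must target a data stream (existing or to be created).
routing (str | None) – A custom value that is used to route operations to a specific shard.
source (bool | str | Sequence[str] | None) – Indicates whether to return the _source field (true or false) or contains a list of fields to return.
source_excludes (str | Sequence[str] | None) – A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in the _source_includes query parameter. If the _source parameter is false, this parameter is ignored.
source_includes (str | Sequence[str] | None) – A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.
timeout (str | Literal[-1] | Literal[0] | None) – The period each action waits for the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. The default is 1m (one minute), which guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.
wait_for_active_shards (int | str | Literal['all', 'index-setting'] | None) – The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default is 1, which waits for each primary shard to be active.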
body (Sequence[Mapping[str, Any]] | None)
error_trace (bool | None)
filter_path (str | Sequence[str] | None)
human (bool | None)
pretty (bool | None)

Return type:

Precision	Unique tile bins	H3 resolution	Unique hex bins	Ratio (hex bins / tile bins)
1	4	0	122	30.5
2	16	0	122	7.625
3	64	1	842	13.15625
4	256	1	842	3.2890625
5	1024	2	5882	5.744140625
6	4096	2	5882	1.436035156
7	16384	3	41162	2.512329102
8	65536	3	41162	0.6280822754
9	262144	4	288122	1.099098206
10	1048576	4	288122	0.2747745514
11	4194304	5	2016842	0.4808526039
12	16777216	6	14117882	0.8414913416
13	67108864	6	14117882	0.2103728354
14	268435456	7	98825162	0.3681524172
15	1073741824	8	691776122	0.644266719
16	4294967296	8	691776122	0.1610666797
17	17179869184	9	4842432842	0.2818666889
18	68719476736	10	33897029882	0.4932667053
19	274877906944	11	237279209162	0.8632167343
20	1099511627776	11	237279209162	0.2158041836
21	4398046511104	12	1660954464122	0.3776573213
22	17592186044416	13	11626681248842	0.6609003122
23	70368744177664	13	11626681248842	0.165225078
24	281474976710656	14	81386768741882	0.2891438866
25	1125899906842624	15	569707381193162	0.5060018015
26	4503599627370496	15	569707381193162	0.1265004504
27	18014398509481984	15	569707381193162	0.03162511259
28	72057594037927936	15	569707381193162	0.007906278149
29	288230376151711744	15	569707381193162	0.001976569537
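Apart from the H3 resolution column, which the sketch takes from the table rather than deriving, the table's columns follow closed forms. Precision p yields 4^p tile bins, because each level splits every tile into four. H3 defines 2 + 120 * 7^r cells at resolution r, which gives 122 at resolution 0. The ratio is the number of hex bins divided by the number of tile bins. The following is a small sketch for checking or extending rows. The function names are hypothetical.

```python
def tile_bins(precision: int) -> int:
    # Each precision level splits every tile into 4.
    return 4 ** precision


def hex_bins(h3_resolution: int) -> int:
    # Number of H3 cells at a given resolution: 2 + 120 * 7**r.
    return 2 + 120 * 7 ** h3_resolution


def ratio(precision: int, h3_resolution: int) -> float:
    # Unique hex bins per unique tile bin, as in the last column.
    return hex_bins(h3_resolution) / tile_bins(precision)
```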