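Learning ES, Part 1

Elasticsearch

ES Core Concepts

ES concept	Relational database analogy	Plain-language explanation
Index	Database (closer to a table now that mapping types are gone)	A collection that stores one kind of data, e.g. a "users index"
Document	A row in a table	One concrete data record, in JSON format
Field	A column in a table	An attribute of the data, e.g. "name" or "age"
Mapping	Table schema definition	Defines each field's data type and properties
Shard	Partition	A slice of the data, enabling distributed storage and querying
Replica	Replica	A backup copy of the data that improves fault tolerance and query throughput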

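ES Use Cases

In which scenarios can we use ES?

Main capabilities:

1) Distributed storage of massive data sets plus cluster management, giving high availability for both the service and the data, along with horizontal scaling;

2) Near-real-time search with strong performance, covering structured, full-text, and geo data;

3) Near-real-time analytics over massive data sets (aggregations).

Application scenarios:

1) Site search, vertical search, code search;

2) Log management and analysis, security metrics monitoring, application performance monitoring, web crawling and public-opinion analysis.

ES Architecture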

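Inverted Index

Text is split into terms, and the terms are sorted lexicographically to form the term dictionary. For each term, the associated information (such as the IDs of the documents that contain it) is stored alongside it as the posting list.

Both structures are stored on disk.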
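
The term dictionary and posting lists described above can be sketched in a few lines of Go. This is only an in-memory illustration: the `invertedIndex` type, `buildIndex`, and `lookup` are hypothetical names, and whitespace splitting stands in for a real analyzer.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// invertedIndex pairs a sorted term dictionary with one posting list per term.
type invertedIndex struct {
	terms    []string         // term dictionary, sorted lexicographically
	postings map[string][]int // term -> ascending document IDs
}

// buildIndex tokenizes each document on whitespace (a stand-in for a real
// analyzer) and records which documents contain each term.
func buildIndex(docs []string) invertedIndex {
	idx := invertedIndex{postings: map[string][]int{}}
	for id, doc := range docs {
		seen := map[string]bool{}
		for _, term := range strings.Fields(strings.ToLower(doc)) {
			if seen[term] {
				continue // list each document only once per term
			}
			seen[term] = true
			idx.postings[term] = append(idx.postings[term], id)
		}
	}
	for term := range idx.postings {
		idx.terms = append(idx.terms, term)
	}
	sort.Strings(idx.terms)
	return idx
}

// lookup binary-searches the term dictionary and returns the posting list.
func (idx invertedIndex) lookup(term string) []int {
	i := sort.SearchStrings(idx.terms, term)
	if i < len(idx.terms) && idx.terms[i] == term {
		return idx.postings[term]
	}
	return nil
}

func main() {
	idx := buildIndex([]string{"go is fast", "search is fast", "go search"})
	fmt.Println(idx.terms)
	fmt.Println(idx.lookup("fast"))
}
```

Searching for a term is then a dictionary lookup followed by reading its posting list, instead of scanning every document.
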
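Term Index

A prefix tree built over the prefixes of the terms (Lucene implements it as an FST), together with pointers into the term dictionary. It is small enough to be kept in memory, where it speeds up term lookup.
Stored Fields

Holds the complete original content of each document.
Doc Values

Column-oriented storage used for sorting and aggregations.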
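Segment

The four parts above together form the smallest searchable unit, which is called a segment.
Lucene

Many segments together make up the search library Lucene. A segment cannot be modified once it has been written: existing segments serve reads, while new data is written into new segments. Segments are merged from time to time so that the number of open file handles does not run out.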
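
The write-once, merge-later behavior can be illustrated with a toy model. This is a simplified sketch, not how Lucene is implemented: the `store` and `segment` types are hypothetical, and documents are plain strings keyed by ID.

```go
package main

import "fmt"

// segment is a read-only batch of documents keyed by ID (sketch only).
type segment struct{ docs map[string]string }

// store holds segments oldest-first; writes never touch existing segments.
type store struct{ segments []segment }

// flush copies a write buffer into a new segment that is never modified again.
func (s *store) flush(buffer map[string]string) {
	docs := make(map[string]string, len(buffer))
	for id, body := range buffer {
		docs[id] = body
	}
	s.segments = append(s.segments, segment{docs: docs})
}

// get checks the newest segment first so a later write shadows an older one.
func (s *store) get(id string) (string, bool) {
	for i := len(s.segments) - 1; i >= 0; i-- {
		if body, ok := s.segments[i].docs[id]; ok {
			return body, true
		}
	}
	return "", false
}

// merge rewrites all segments into one, keeping the newest version of each doc.
func (s *store) merge() {
	merged := map[string]string{}
	for _, seg := range s.segments { // oldest first, so newer values overwrite
		for id, body := range seg.docs {
			merged[id] = body
		}
	}
	s.segments = []segment{{docs: merged}}
}

func main() {
	var s store
	s.flush(map[string]string{"1": "v1", "2": "a"})
	s.flush(map[string]string{"1": "v2"})
	fmt.Println(len(s.segments))
	s.merge()
	body, _ := s.get("1")
	fmt.Println(len(s.segments), body)
}
```

Each segment costs open files and a separate lookup, which is why fewer, larger segments are preferable after a merge.
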
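High Performance
1. Data written to Lucene is categorized, and each category of data goes into its own Lucene index.
2. Each category of data is split into shards, and every shard is an independent Lucene index.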
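
One way to picture how a document ends up in a particular shard is hash-based routing. The sketch below is an assumption-laden illustration: `shardFor` is a hypothetical helper, and FNV-1a is used only because it ships with Go's standard library, not because it is the hash Elasticsearch uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a routing key (here: the document ID) to a shard number.
// FNV-1a is an illustrative choice, not the hash Elasticsearch uses.
func shardFor(routing string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(routing))
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	for _, id := range []string{"1", "2", "3", "4"} {
		fmt.Printf("doc %s -> shard %d\n", id, shardFor(id, 3))
	}
}
```

Because the shard number depends on the modulus, changing the shard count would send existing documents to different shards, which is one reason the number of primary shards is chosen when the index is created.
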
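High Scalability
1. Shards are deployed across different machines, and each machine is a node.
High Availability
1. Replicas are added for each shard. The primary shard syncs its data to the replicas; replicas serve reads and are promoted to primary when the primary shard fails (similar to a primary/standby setup).
2. Role separation. Different nodes are responsible for different functions (for example master, data, or coordinating duties).
3. Decentralization. Every node runs a heavily modified Raft-style module that keeps the cluster state consistent across nodes and lets each node track the health of the others so that a master can be elected.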

ES Setup

`curl -fsSL https://elastic.co/start-local | sh`

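Using the Official Go SDK

Connecting

// Import paths used in this section (assuming go-elasticsearch v8):
//   github.com/elastic/go-elasticsearch/v8
//   github.com/elastic/go-elasticsearch/v8/typedapi/core/search
//   github.com/elastic/go-elasticsearch/v8/typedapi/core/update
//   github.com/elastic/go-elasticsearch/v8/typedapi/types
client, err := elasticsearch.NewTypedClient(elasticsearch.Config{
	Addresses: []string{"http://localhost:9200"},
	Username:  "elastic",
	Password:  "LUB7jymz", // replace with the password of your own instance
})
if err != nil {
	panic(err)
}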

Creating an index

// Create the index
res, err := client.Indices.Create("first_try").Do(context.Background())
if err != nil {
	log.Panicf("create index err: %v", err)
}
fmt.Println("Index created successfully", res)

Storing a document in the index

// Index a document
document := struct {
	Name string `json:"name"`
}{
	Name: "woodQQQ",
}
res, err := client.Index("first_try").Id("1").Request(document).Do(context.Background())
if err != nil {
	panic(err)
}
fmt.Println("Document indexed successfully", res)

Getting a document by ID

// Get a document
res, err := client.Get("first_try", "1").Do(context.Background())
if err != nil {
	log.Panic(err)
}
fmt.Println("Document fetched successfully", string(res.Source_))

Searching documents

// Search documents
res, err := client.Search().
	Index("index_name").
	Request(&search.Request{
		Query: &types.Query{
			Match: map[string]types.MatchQuery{
				"name": {Query: "Foo"},
			},
		},
	}).Do(context.Background())
if err != nil {
	log.Panic(err)
}
for _, hit := range res.Hits.Hits {
	fmt.Println(string(hit.Source_))
}

Updating a document

// Update a document
res, err := client.Update("first_try", "1").Request(&update.Request{
	Doc: json.RawMessage(`{ "name" : "woodQ" }`),
}).Do(context.Background())
if err != nil {
	log.Panic(err)
}
fmt.Println(res.Result)

Using the Typed API

✅ 0. Prerequisite: create a typed client

es, err := elasticsearch.NewTypedClient(elasticsearch.Config{
    Addresses: []string{"http://localhost:9200"},
})
if err != nil {
    log.Fatalf("create client: %v", err)
}

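✅ 1. `Match` query: full-text matching (loose search)

Matches against the analyzed content of a field (for example, finding documents whose title contains "golang"):

// search is github.com/elastic/go-elasticsearch/v8/typedapi/core/search
res, err := es.Search().
    Index("articles").
    Request(&search.Request{
        Query: &types.Query{
            Match: map[string]types.MatchQuery{
                "title": {
                    Query: "golang",
                },
            },
        },
    }).
    Do(context.Background())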

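✅ 2. `Term` query: exact matching (suited to keyword fields)

Matches the field value exactly (for non-analyzed fields), e.g. status: "published":

// search is github.com/elastic/go-elasticsearch/v8/typedapi/core/search
res, err := es.Search().
    Index("articles").
    Request(&search.Request{
        Query: &types.Query{
            Term: map[string]types.TermQuery{
                "status": {
                    Value: "published",
                },
            },
        },
    }).
    Do(context.Background())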

✅ 3. `Bool` query: combining conditions (must, should, must_not)

Compound queries are commonly used to combine several AND/OR conditions:

// search is github.com/elastic/go-elasticsearch/v8/typedapi/core/search
res, err := es.Search().
    Index("articles").
    Request(&search.Request{
        Query: &types.Query{
            Bool: &types.BoolQuery{
                Must: []types.Query{
                    {Match: map[string]types.MatchQuery{
                        "title": {Query: "golang"},
                    }},
                    {Term: map[string]types.TermQuery{
                        "status": {Value: "published"},
                    }},
                },
            },
        },
    }).
    Do(context.Background())
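
When checking a query in Kibana or with `curl`, it helps to see the JSON request body that a Bool query like the one above corresponds to. The sketch below builds that body with the standard library only; `boolQueryBody` is a hypothetical helper and is not part of the SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// boolQueryBody returns the request body for
// "title matches <title> AND status equals <status>".
func boolQueryBody(title, status string) ([]byte, error) {
	body := map[string]any{
		"query": map[string]any{
			"bool": map[string]any{
				"must": []any{
					map[string]any{"match": map[string]any{"title": title}},
					map[string]any{"term": map[string]any{"status": status}},
				},
			},
		},
	}
	return json.Marshal(body)
}

func main() {
	b, err := boolQueryBody("golang", "published")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

The printed body can be pasted into a `GET /articles/_search` request to confirm the logic before relying on the typed client.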

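✅ 4. `Range` query: numeric or date ranges

Used for price ranges, time ranges, ratings, and so on:

// In recent v8 releases types.RangeQuery is a union type;
// NumberRangeQuery is its numeric variant and takes *types.Float64 bounds.
gte, lte := types.Float64(100), types.Float64(300)
res, err := es.Search().
    Index("products").
    Request(&search.Request{
        Query: &types.Query{
            Range: map[string]types.RangeQuery{
                "price": types.NumberRangeQuery{Gte: &gte, Lte: &lte},
            },
        },
    }).
    Do(context.Background())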

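✅ 5. Aggregations: statistical analysis

Aggregations can compute counts, averages, maximums, and more:

// some is github.com/elastic/go-elasticsearch/v8/typedapi/some (pointer helpers)
res, err := es.Search().
    Index("sales").
    Request(&search.Request{
        Size: some.Int(0), // return no documents, only the aggregation result
        Aggregations: map[string]types.Aggregations{
            "total_sales": {
                Sum: &types.SumAggregation{
                    Field: some.String("amount"),
                },
            },
        },
    }).
    Do(context.Background())
if err != nil {
    log.Panic(err)
}

// Aggregation results are a union type, so assert the concrete type first.
if sum, ok := res.Aggregations["total_sales"].(*types.SumAggregate); ok && sum.Value != nil {
    fmt.Println("Sum result:", *sum.Value)
}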

✅ 6. Pagination + sorting

// sortorder is github.com/elastic/go-elasticsearch/v8/typedapi/types/enums/sortorder
// some is github.com/elastic/go-elasticsearch/v8/typedapi/some
res, err := es.Search().
    Index("articles").
    Request(&search.Request{
        From: some.Int(0),
        Size: some.Int(10),
        Sort: []types.SortCombinations{
            types.SortOptions{
                SortOptions: map[string]types.FieldSort{
                    "created_at": {
                        Order: &sortorder.Desc,
                    },
                },
            },
        },
    }).
    Do(context.Background())
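
The `From` and `Size` values above usually come from a page number supplied by the caller. A minimal sketch of that conversion follows; `pageWindow` and its fallback page size of 10 are assumptions made for illustration.

```go
package main

import "fmt"

// pageWindow converts a 1-based page number into from/size values.
// Invalid input falls back to page 1 and a page size of 10.
func pageWindow(page, pageSize int) (from, size int) {
	if page < 1 {
		page = 1
	}
	if pageSize < 1 {
		pageSize = 10
	}
	return (page - 1) * pageSize, pageSize
}

func main() {
	from, size := pageWindow(3, 10)
	fmt.Println(from, size) // 20 10
}
```

Note that `from + size` is capped by the index setting `index.max_result_window` (10000 by default), so this style of paging suits the first pages of a result set rather than deep scrolling.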

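🔚 Summary: common query types supported by the Typed API

Type	Example use
Match	Loose full-text search (e.g. titles, descriptions)
Term	Exact matching (e.g. status, category ID)
Bool	Combining multiple conditions
Range	Date, price, and numeric ranges
Aggregation	Statistics, grouping, analysis
Sort + Page	Sorting and pagination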

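A Sound Workflow for Real Projects

✅ Step 1: define the document structure (matching the Elasticsearch mapping)

In Go, first define the struct for the documents you want to index: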

type Article struct {
	ID        string    `json:"id,omitempty"`
	Title     string    `json:"title"`
	Author    string    `json:"author"`
	Tags      []string  `json:"tags,omitempty"`
	Published bool      `json:"published"`
	CreatedAt time.Time `json:"created_at"`
}

✅ Step 2: create the index and set the mapping (field types)

Create the index structure explicitly with the typed client:

// create is github.com/elastic/go-elasticsearch/v8/typedapi/indices/create
_, err := es.Indices.Create("articles").
	Request(&create.Request{
		Mappings: &types.TypeMapping{
			Properties: map[string]types.Property{
				"title":      types.NewTextProperty(),
				"author":     types.NewKeywordProperty(),
				"tags":       types.NewKeywordProperty(),
				"published":  types.NewBooleanProperty(),
				"created_at": types.NewDateProperty(),
			},
		},
	}).
	Do(context.Background())

if err != nil {
	log.Fatalf("Index creation failed: %v", err)
}

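🔎 Why not rely on automatic index creation? An automatically created index gets a default mapping, which easily causes problems, for example keyword and text getting mixed up so that a field cannot be aggregated or sorted on.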
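
For comparison, the explicit mapping from step 2 can also be written out as the raw JSON body of a create-index request, which is convenient for trying it in Kibana or with `curl` first. The sketch uses only the standard library; `articleMapping` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// articleMapping builds the create-index body for the articles index.
func articleMapping() map[string]any {
	field := func(esType string) map[string]string {
		return map[string]string{"type": esType}
	}
	return map[string]any{
		"mappings": map[string]any{
			"properties": map[string]any{
				"title":      field("text"),    // analyzed, full-text searchable
				"author":     field("keyword"), // exact match, aggregatable
				"tags":       field("keyword"),
				"published":  field("boolean"),
				"created_at": field("date"),
			},
		},
	}
}

func main() {
	body, err := json.MarshalIndent(articleMapping(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // usable as the body of PUT /articles
}
```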

✅ Step 3: write a document (Index)

Write a document into the index created above:

doc := Article{
	ID:        "1",
	Title:     "Intro to Elasticsearch with Go",
	Author:    "Alice",
	Tags:      []string{"elasticsearch", "go"},
	Published: true,
	CreatedAt: time.Now(),
}

_, err := es.Index("articles").
	Id(doc.ID).
	Document(doc).
	Do(context.Background())

✅ Step 4: build the search logic (Search)

Find published articles by Alice whose title contains "go", newest first:

// search is github.com/elastic/go-elasticsearch/v8/typedapi/core/search
// sortorder is github.com/elastic/go-elasticsearch/v8/typedapi/types/enums/sortorder
res, err := es.Search().
	Index("articles").
	Request(&search.Request{
		Query: &types.Query{
			Bool: &types.BoolQuery{
				Must: []types.Query{
					{Match: map[string]types.MatchQuery{
						"title": {Query: "go"},
					}},
					{Term: map[string]types.TermQuery{
						"author": {Value: "Alice"},
					}},
					{Term: map[string]types.TermQuery{
						"published": {Value: true},
					}},
				},
			},
		},
		Sort: []types.SortCombinations{
			types.SortOptions{
				SortOptions: map[string]types.FieldSort{
					"created_at": {Order: &sortorder.Desc},
				},
			},
		},
	}).
	Do(context.Background())

✅ Step 5: parse the search results

for _, hit := range res.Hits.Hits {
	var a Article
	err := json.Unmarshal(hit.Source_, &a)
	if err != nil {
		log.Printf("Failed to unmarshal hit: %v", err)
		continue
	}
	fmt.Printf("Found article: %+v\n", a)
}
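
The same parsing step can be tried without a running cluster by decoding a sample response body directly. In the sketch below, `searchResponse` mirrors only the fields that are read, `parseArticles` is a hypothetical helper, and the sample JSON is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type Article struct {
	ID        string    `json:"id,omitempty"`
	Title     string    `json:"title"`
	Author    string    `json:"author"`
	Tags      []string  `json:"tags,omitempty"`
	Published bool      `json:"published"`
	CreatedAt time.Time `json:"created_at"`
}

// searchResponse mirrors only the parts of a search response read here.
type searchResponse struct {
	Hits struct {
		Hits []struct {
			ID     string          `json:"_id"`
			Source json.RawMessage `json:"_source"`
		} `json:"hits"`
	} `json:"hits"`
}

// parseArticles decodes every hit's _source into an Article.
func parseArticles(body []byte) ([]Article, error) {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	articles := make([]Article, 0, len(resp.Hits.Hits))
	for _, hit := range resp.Hits.Hits {
		var a Article
		if err := json.Unmarshal(hit.Source, &a); err != nil {
			return nil, fmt.Errorf("hit %s: %w", hit.ID, err)
		}
		if a.ID == "" {
			a.ID = hit.ID // fall back to the document's _id
		}
		articles = append(articles, a)
	}
	return articles, nil
}

func main() {
	// Made-up sample body containing a single hit.
	sample := []byte(`{"hits":{"hits":[{"_id":"1","_source":{"title":"Intro to Elasticsearch with Go","author":"Alice","published":true,"created_at":"2025-06-01T00:00:00Z"}}]}}`)
	articles, err := parseArticles(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", articles[0])
}
```

Unlike the loop above, which logs and skips a bad hit, this helper returns on the first decoding error; which behavior fits better depends on the caller.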

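🧠 Lessons from practice

Stage	Recommendation
Index creation	Create the mapping explicitly, and be clear about which fields are `keyword` (aggregatable) and which are `text` (searchable)
Writing documents	Build data from structs to avoid misspelled JSON keys
Searching	Build queries consistently with `typedapi/types.Query` instead of hand-assembling JSON
Parsing results	`json.Unmarshal(hit.Source_, &obj)` is the standard approach
Testing & debugging	Confirm the query logic with Kibana or `curl` first, then port it to the typed client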

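ES Built-in Vector Search Options

Exact search with Script Score
Vectors are stored in a dense_vector field of each document. At query time a script (Painless) computes the similarity between the query vector and each document vector (common metrics are cosine similarity, Euclidean distance, and dot product).
1. Pros: no vector index has to be built in advance
2. Cons: the script runs for every candidate document, so query cost grows with the number of documents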
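
To make the cost concrete, here is a rough sketch of the computation an exact search performs: score every stored vector against the query and keep the best. `cosine` and `bestMatch` are hypothetical helpers, and both vectors are assumed to have the same length.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 when either vector has zero length.
func cosine(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// bestMatch scores every document vector, which is linear work per query.
func bestMatch(query []float64, docs map[string][]float64) (string, float64) {
	bestID, bestScore := "", math.Inf(-1)
	for id, vec := range docs {
		if s := cosine(query, vec); s > bestScore {
			bestID, bestScore = id, s
		}
	}
	return bestID, bestScore
}

func main() {
	docs := map[string][]float64{
		"a": {1, 0},
		"b": {0, 1},
		"c": {1, 1},
	}
	id, score := bestMatch([]float64{1, 0.1}, docs)
	fmt.Printf("%s %.3f\n", id, score)
}
```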
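Approximate nearest neighbor search with kNN
Introduces the HNSW index structure to speed up similarity search over large sets of vectors.
1. How it works: at indexing time, each inserted vector is added to the HNSW graph (every node is a vector, and the edges link similar vectors)
2. At query time, the search starts from a few entry nodes in the graph, descends layer by layer toward nodes similar to the query vector, and gradually converges on the closest neighbors.
3. Pros: high throughput and fast queries
4. Cons: high resource usage, memory in particular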
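
The core move of the graph search can be shown with a deliberately reduced sketch: a single layer and a single entry point, walking greedily to whichever neighbor is closer to the query. Real HNSW adds multiple layers and candidate lists; the `graph` type and `greedySearch` are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// graph stores one vector per node plus an adjacency list.
type graph struct {
	vectors   [][]float64
	neighbors [][]int
}

// dist is the Euclidean distance between two equal-length vectors.
func dist(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// greedySearch moves to the neighbor closest to the query until no neighbor
// improves on the current node, then returns that node's index.
func (g graph) greedySearch(query []float64, entry int) int {
	current := entry
	best := dist(query, g.vectors[current])
	for {
		next := current
		for _, n := range g.neighbors[current] {
			if d := dist(query, g.vectors[n]); d < best {
				best, next = d, n
			}
		}
		if next == current {
			return current // local minimum: no neighbor is closer
		}
		current = next
	}
}

func main() {
	// Five points on a line, each linked to its immediate neighbors.
	g := graph{
		vectors:   [][]float64{{0}, {1}, {2}, {3}, {4}},
		neighbors: [][]int{{1}, {0, 2}, {1, 3}, {2, 4}, {3}},
	}
	fmt.Println(g.greedySearch([]float64{3.2}, 0))
}
```

The walk visits only a handful of nodes instead of all of them, which is where the speed comes from. It is also why the result is approximate: a greedy walk can stop at a local minimum that is not the true nearest neighbor.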
Hybrid search

Filter on certain fields first, then run vector similarity search over the documents that pass the filter.
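
The filter-then-rank order can be sketched in memory as follows. This is an illustration under assumptions rather than how Elasticsearch executes it: the `doc` type, the `category` field, and `hybridSearch` are hypothetical, and dot product is used as the similarity for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// doc carries one filterable field and one vector.
type doc struct {
	id       string
	category string
	vec      []float64
}

// dot returns the dot product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// hybridSearch keeps documents in the given category, ranks the survivors
// by similarity to the query, and returns the IDs of the top k.
func hybridSearch(docs []doc, category string, query []float64, k int) []string {
	type scored struct {
		id    string
		score float64
	}
	var candidates []scored
	for _, d := range docs {
		if d.category != category {
			continue // filtered out before any vector math
		}
		candidates = append(candidates, scored{d.id, dot(query, d.vec)})
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].score > candidates[j].score
	})
	if k > len(candidates) {
		k = len(candidates)
	}
	ids := make([]string, 0, k)
	for _, c := range candidates[:k] {
		ids = append(ids, c.id)
	}
	return ids
}

func main() {
	docs := []doc{
		{"a", "go", []float64{1, 0}},
		{"b", "go", []float64{0.5, 0.5}},
		{"c", "java", []float64{1, 0}},
		{"d", "go", []float64{0, 1}},
	}
	fmt.Println(hybridSearch(docs, "go", []float64{1, 0}, 2))
}
```

Document "c" is as similar to the query as "a", but it never reaches the ranking step because the filter removes it first.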

Author

WoodQ

Published

June 1, 2025
