Elastic 搜索开发实战
需求描述:
- 智能补全提示
- 结果的高亮显示
- 结果的聚合统计和过滤
- 相关搜索结果的推荐
- 短语纠错(fuzziness)
- 数据的实时同步与更新(logstash)
- 查得到
- 查得全
- 查得准
- 支持同义词
- 支持简繁体
- 支持拼音(pinyin)
- 支持 PPT 的搜索
- 支持自定义排序,按时间,按相关度
- 支持结果过滤,按分类、按标签、按时间范围等
- 搜索结果展示待加强,UI 设计
版本:
- Elasticsearch v6.2.4
- Kibana v6.2.4
- Logstash v6.2.4
- mysql-connector-java-8.0.20.tar.gz
kibana dev tools快捷键
command + i 自动缩进 command + enter 提交请求
常用 API 介绍
1、增删改查
# 创建文档 POST twitter/doc/1 { "name": "Jack", "age": 30 } # 取回文档 GET twitter/doc/1 # 完全替换 PUT twitter/doc/1 { "name": "Jack", "age": 35 } # 部分更新 POST twitter/doc/1/_update { "doc": { "name": "Mark" } } # 删除文档 DELETE twitter/doc/1
2、搜索的使用
# 创建两个索引文档 POST twitter/doc/1 { "name": "Jack", "age": 30 } POST twitter/doc/2 { "name": "Mark", "age": 35 } # 通过名称搜索 GET twitter/_search?q=Jack # 通过年龄检索 GET twitter/_search?q=35 # 限定查询年龄字段 GET twitter/_search?q=age:35 # QueryDSL查询表达式 POST twitter/_search { "query": { "match": { "age": 35 } } }
3、聚合的使用
# 再创建两个索引文档 POST twitter/doc/3 { "name":"john", "age":30 } POST twitter/doc/4 { "name":"mark", "age":40 } # 统计年龄的分布情况 POST twitter/_search { "size": 0, "aggs": { "age_stats": { "terms": { "field": "age", "size": 10 } } } }
聚合查询表达式说明
- 最外层的 size 为 0 不返回搜索命中的文档,只返回聚合的统计结果,
- aggs 就是我们描述聚合查询语句根节点,
- 使用 terms 聚合类型来对 age 字段的值进行统计,并且只返回前 10 个统计值,
- 然后这些统计结果我们命名为 age_stats
4、索引的管理
# 查看索引列表 GET _cat/indices?v # 删除索引 DELETE twitter # 查看集群的健康状况 GET _cluster/health
字段名 | 说明 |
---|---|
health | 索引健康状态,green 表示健康;yellow 表示数据完整,但是缺少副本;red 则表示有数据损坏。 |
status | 索引工作状态,open 表示索引打开中,可以被使用;close 表示索引被关闭,不能使用。 |
index | 索引名称。 |
uuid | 索引的唯一 ID 标识。 |
pri | 索引的主分片个数。 |
rep | 索引的副本分片个数。 |
docs.count | 索引内的文档个数。 |
docs.deleted | 索引内已经删除的文档个数。 |
搜索示例
数据准备
创建表
CREATE TABLE `blog` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键id', `title` varchar(60) DEFAULT NULL COMMENT '标题', `author` varchar(60) DEFAULT NULL COMMENT '作者', `content` text COMMENT '内容', `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间', `update_time` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间', PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
获取测试数据
# -*- coding: utf-8 -*- from pprint import pprint import requests from parsel import Selector from puremysql import PureMysql def get_data(url): """ 获取古诗文网数据 eg: https://www.gushiwen.cn/ :return: list """ response = requests.get(url) sel = Selector(text=response.text) rows = sel.css(".main3 .left .sons") lst = [] for row in rows: title = row.css("b::text").extract_first() author = row.css(".source").xpath("string(.)").extract_first() content = row.css(".contson").xpath("string(.)").extract_first() if not title: continue item = { "title": title.strip(), "author": author.strip(), "content": content.replace('\n', ''), } pprint(item) lst.append(item) return lst def insert_data(lst): """ 数据入库 """ con = PureMysql(db_url="mysql://root:123456@127.0.0.1:3306/data?charset=utf8") table = con.table("blog") ret = table.insert(lst) con.close() print("成功入库", ret) def main(): # url = "https://www.gushiwen.cn/" for page in range(1, 11): url = f"https://www.gushiwen.cn/default.aspx?page={page}" lst = get_data(url) insert_data(lst) if __name__ == '__main__': main()
logstash同步数据配置
config/jdbc.conf
input { jdbc { jdbc_driver_library => "mysql-connector-java-8.0.16.jar" jdbc_driver_class => "com.mysql.cj.jdbc.Driver" jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/data" jdbc_user => "root" jdbc_password => "123456" statement => "SELECT id, title, content, author, create_time, update_time FROM blog" jdbc_paging_enabled => "true" jdbc_page_size => "5000" } } filter { } output { stdout { codec => rubydebug } elasticsearch { index => "blog", document_id => "%{id}" } }
同步数据
# 检查配置文件 $ ./bin/logstash -t -f config/jdbc.conf # 执行配置文件 $ ./bin/logstash -f config/jdbc.conf
问题及处理
处理elasticsearch跨域问题
config/elasticsearch.yml
http.cors.enabled: true http.cors.allow-origin: "*"
搜索提示
高亮结果显示
POST /blog/_search { "query": { "match": { "author": "李白" } }, "highlight": { "fields": { "author": {} } } }
搜索模板
将查询和参数分离
POST /blog/_search/template { "source": { "query": { "match": { "{{key}}": "{{value}}" } }, "size": "{{size}}" }, "params": { "key": "author", "value": "李白", "size": 10 } }
其他语句
# 调试模板渲染结果: GET _render/template # 取回模板定义的语法: GET _scripts/<templatename> # 删除模板定义的语法: DELETE _scripts/<templatename>
创建模板
POST /_scripts/blog_template_v1 { "script": { "lang": "mustache", "source": { "query": { "match": { "{{key}}": "{{value}}" } }, "highlight": { "fields": { "{{key}}": {} } }, "size": "{{size}}" } } }
使用模板
POST /blog/_search/template { "id": "blog_template_v1", "params": { "key": "author", "value": "李白", "size": 10 } }
模糊查询
GET test/_search { "query": { "match": { "doc":{ "query": "elastix", "fuzziness": "AUTO" } } } }
优化查询
POST _scripts/blog_template_v1 { "script": { "lang": "mustache", "source": { "size": "{{size}}", "query": { "bool": { "should": [ { "prefix": { "{{field}}.keyword": { "value": "{{query}}", "boost": 10 } } }, { "match_phrase_prefix": { "{{field}}": { "query": "{{query}}", "boost": 2 } } }, { "match": { "{{field}}": "{{query}}" } } ] } }, "_source": [ "title", "id", "uid", "views" ] } } }
重建索引
# 新建索引 PUT blog_v1 # 查看原索引的mapping GET blog/_mapping # 设置索引的mapping POST blog_v1/doc/_mapping { "doc": { "properties": { "@timestamp": { "type": "date" }, "@version": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "content": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "create_time": { "type": "date" }, "id": { "type": "long" }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "update_time": { "type": "date" } } } } # 索引迁移 POST _reindex { "source": {"index": "blog"}, "dest": {"index": "blog_v1"} } # 查询测试 POST /blog_v1/_search
索引别名
# 查看别名 GET _cat/aliases # 添加别名 POST /_aliases { "actions": [ { "add": { "index": "blog", "alias": "my-blog" } } ] } # 切换别名 POST /_aliases { "actions": [ { "add": { "index": "blog_v1", "alias": "my-blog" } }, { "remove": { "index": "blog", "alias": "my-blog" } } ] } # 通过别名搜索 POST my-blog/_search
拼音处理的插件
https://github.com/medcl/elasticsearch-analysis-pinyin/releases/tag/v6.3.2
添加拼音搜索字段
# 关闭索引 POST my-blog/_close # 设置索引支持拼音分析器 PUT my-blog/_settings { "index": { "analysis": { "analyzer": { "pinyin_analyzer": { "tokenizer": "my_pinyin" } }, "tokenizer": { "my_pinyin": { "type": "pinyin", "keep_first_letter": true, "keep_separate_first_letter": true, "keep_full_pinyin": true, "keep_original": false, "limit_first_letter_length": 16, "lowercase": true } } } } } # 打开索引 POST my-blog/_open # 获取原索引mapping GET my-blog/_mapping # 添加字段 PUT my-blog/doc/_mapping { "doc": { "properties": { "@timestamp": { "type": "date" }, "@version": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 }, "pinyin": { "type": "text", "analyzer": "pinyin_analyzer" } } }, "content": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "create_time": { "type": "date" }, "id": { "type": "long" }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "update_time": { "type": "date" } } } } # 更新索引 POST my-blog/_update_by_query?conflicts=proceed # 测试拼音搜索 POST my-blog/_search { "query": {"match": { "author.pinyin": "libai" }} }
前端显示
<html lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <title>Document</title> <!-- 开发环境版本,包含了有帮助的命令行警告 --> <script src="https://cdn.jsdelivr.net/npm/vue/dist/vue.js"></script> <!-- 引入样式 --> <link rel="stylesheet" href="https://unpkg.com/element-ui/lib/theme-chalk/index.css" /> <!-- 引入组件库 --> <script src="https://unpkg.com/element-ui/lib/index.js"></script> <!-- axios --> <script src="https://unpkg.com/axios/dist/axios.min.js"></script> <style> /* 居中显示 */ #app { width: 200px; margin: 0 auto; margin-top: 300px; } /* 搜索结果高亮 */ em { color: red; } </style> </head> <body> <div id="app"> <el-autocomplete v-model="state" :fetch-suggestions="querySearchAsync" placeholder="请输入内容" @select="handleSelect" > <!-- 自定义显示 --> <template slot-scope="{ item }"> <div v-html="item.highlight.author[0]"></div> </template> </el-autocomplete> </div> <script> new Vue({ el: "#app", data() { return { list: [], state: "", }; }, methods: { async querySearchAsync(queryString, cb) { // 查询地址 const QUERY_URL = "http://localhost:9200/blog/_search/template"; // 查询语句 let query = { id: "blog_template_v1", params: { field: "author", query: queryString, size: 10, }, }; const res = await axios.post(QUERY_URL, query); console.log(res.data.hits.hits); cb(res.data.hits.hits); }, handleSelect(item) { console.log(item); }, }, }); </script> </body> </html>