Springboot ElasticSearch

Building a blog search system with Springboot + ElasticSearch

References:

Elastic Chinese Community

Full-text search engine Elasticsearch: getting-started tutorial

Download page

Project tools
- SpringBoot
- ElasticSearch
- Kibana
- PostMan
- Vue

ElasticSearch

  • Distributed
  • Full-text search
  • Near real-time and fast
  • RESTful

| MySQL    | ES       |
|----------|----------|
| Database | Index    |
| Table    | Type     |
| Row      | Document |
| Column   | Field    |
| Schema   | Mapping  |

MySQL: 
select * from user.user_info where name = "张三"

ES: 
GET /user/user_info/_search?q=name:张三
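
The ES request above follows the pattern /<index>/<type>/_search?q=<field>:<value>. As a minimal sketch, the Python helper below assembles that path from its parts. The function name `uri_search_path` is hypothetical, and percent-encoding the value is an assumption about what an HTTP client needs for non-ASCII terms such as 张三.

```python
from urllib.parse import quote


def uri_search_path(index: str, doc_type: str, field: str, value: str) -> str:
    """Build an ES 6.x URI-search path: /<index>/<type>/_search?q=<field>:<value>."""
    # Percent-encode the term so non-ASCII text survives inside a URL.
    return f"/{index}/{doc_type}/_search?q={field}:{quote(value)}"


print(uri_search_path("user", "user_info", "name", "Tom"))
```

The index plays the role of the database, the type the role of the table, and the q parameter the role of the WHERE clause.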

Download and install

Versions to download
- ElasticSearch 6.3.2
- Kibana 6.3.2

Mirrors in China
- ElasticSearch 6.3.2
- Kibana 6.3.2

Start

# Start elasticsearch
cd elasticsearch-6.3.2
bash ./bin/elasticsearch

# Start kibana
cd kibana-6.3.2-darwin-x86_64
bash ./bin/kibana

Check:

Interactive operations

# List all indices
GET /_all

# Create an index
PUT /person

# Add a document
PUT /person/_doc/1
{
  "name": "Tom",
  "pets": ["pig", "cat"]
}

# Add a document
PUT /person/_doc/2
{
  "name": "Jack",
  "pets": ["dog", "cat"]
}

# Get a document
GET /person/_doc/1

# Search
GET /person/_doc/_search?q=name:Tom

# Complex query; _doc can be omitted
POST /person/_search
{
  "query": {
    "bool": {
      "should": {
        "match": {
          "name": "Tom"
        }
      }
    }
  }
}

# OR query (any clause may match)
POST /person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "Tom"
          }
        },
        {
          "match": {
            "name": "Jack"
          }
        }
      ]
    }
  }
}


# AND query (every clause must match)
POST /person/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "Tom"
          }
        },
        {
          "match": {
            "name": "Jack"
          }
        }
      ]
    }
  }
}

# Delete the index
DELETE /person
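
The OR and AND requests above differ only in the key under "bool": "should" versus "must". A minimal Python sketch of that idea follows. The helper name `bool_match_query` is hypothetical, and it only builds the request body as a dict; it does not talk to ES.

```python
import json


def bool_match_query(occur: str, field: str, values: list) -> dict:
    """Build a bool query with one match clause per value.

    occur is "should" (OR: any clause may match) or "must" (AND: all must match).
    """
    if occur not in ("should", "must"):
        raise ValueError("occur must be 'should' or 'must'")
    clauses = [{"match": {field: value}} for value in values]
    return {"query": {"bool": {occur: clauses}}}


# Same body as the OR example above.
print(json.dumps(bool_match_query("should", "name", ["Tom", "Jack"]), indent=2))
```

With the two sample documents, the "must" variant returns no hits, because no single name matches both Tom and Jack.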

MySQL-based implementation

create table blog(
  id int(11) not null primary key auto_increment,
  title varchar(60) default null,
  content text,
  create_time datetime default null,
  update_time datetime default null
)

select * from blog where title like "%spring%" or content like "%spring%";
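
To try the LIKE-based search without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in (an assumption: SQLite is not MySQL, and the sample rows are made up). It runs the same title-or-content pattern match. A pattern with a leading % generally cannot use an ordinary B-tree index, which is one motivation for moving this search to ES.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for portability).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE blog ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " title TEXT, content TEXT)"
)
con.executemany(
    "INSERT INTO blog (title, content) VALUES (?, ?)",
    [
        ("Spring Boot basics", "Getting started"),
        ("Search notes", "Indexing with spring data"),
        ("Vue components", "Front-end only"),
    ],
)


def search_blog(keyword: str) -> list:
    """Return titles whose title or content contains keyword (LIKE '%kw%')."""
    pattern = f"%{keyword}%"
    rows = con.execute(
        "SELECT title FROM blog WHERE title LIKE ? OR content LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [title for (title,) in rows]


print(search_blog("spring"))
```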

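ES-based implementation

MySQL -> ES data synchronization

Full sync
Incremental sync

Open-source middleware

binlog subscription:
- alibaba/canal
- siddontang/go-mysql-elasticsearch (under development)
- logstash (polls on an id/time column rather than the binlog)

Full and incremental sync with logstash

Download logstash 6.3.2 from a mirror in China

Download the MySQL driver mysql-connector-java.jar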

Sync example

create table user(
  id int(11) not null primary key auto_increment,
  name varchar(60) default null,
  age int(11),
  create_time datetime default CURRENT_TIMESTAMP,
  update_time datetime default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)

Sync configuration: mysql.conf

input {
  jdbc {
    # Location of the JDBC driver jar
    jdbc_driver_library => "./mysql-connector-java-8.0.16.jar"
    # Driver class
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    # Connection string; for version 8.0 and later, be sure to add serverTimezone=UTC
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/data?characterEncoding=utf8&useSSL=false&serverTimezone=UTC&rewriteBatchedStatements=true"
    # User
    jdbc_user => "root"
    # Password
    jdbc_password => "123456"
    # Schedule in cron syntax. With six fields the first one is seconds,
    # so this runs every second; use "* * * * *" to run once a minute
    schedule => "* * * * * *"
    # For a full sync, uncomment to discard the stored sql_last_value
    # clean_run => true
    # Statement to run
    statement => "SELECT * FROM user WHERE update_time >= :sql_last_value"
    # Paging
    jdbc_paging_enabled => "true"
    jdbc_page_size => "5000"
    # Track the value of an incrementing column
    use_column_value => true
    # Type of the tracking column
    tracking_column_type => "timestamp"
    # Name of the tracking column
    tracking_column => "update_time"
    # File that stores the sync point
    last_run_metadata_path => "user_syncpoint.txt"
  }
}
output {
  elasticsearch {
    # ES host and port
    hosts => ["http://127.0.0.1:9200"]
    # Index name (customizable)
    index => "user"
    # The source table needs an id column; it is used as the document ID
    document_id => "%{id}"
  }
  stdout {
    # Print events as JSON lines
    codec => json_lines
  }
}
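
The configuration above relies on two ideas: a stored tracking value (:sql_last_value) that limits each run to rows changed since the previous run, and document_id => "%{id}", which makes re-sent rows overwrite instead of duplicate. The Python sketch below simulates just that logic in memory. It is an illustration under simplifying assumptions (no paging, no time zones), not how Logstash is implemented; `sync_once` and the sample rows are hypothetical.

```python
from datetime import datetime


def sync_once(rows, index, last_value):
    """One scheduled run: copy rows with update_time >= last_value into index.

    rows: list of dicts with "id" and "update_time" (stand-in for the MySQL table).
    index: dict keyed by document id (stand-in for the ES index).
    Returns the new tracking value (largest update_time seen, or last_value).
    """
    changed = [r for r in rows if r["update_time"] >= last_value]
    for row in changed:
        # Keying by id mirrors document_id => "%{id}": re-sent rows overwrite.
        index[row["id"]] = dict(row)
    return max((r["update_time"] for r in changed), default=last_value)


table = [
    {"id": 1, "name": "Tom", "update_time": datetime(2019, 1, 1, 10, 0)},
    {"id": 2, "name": "Jack", "update_time": datetime(2019, 1, 1, 11, 0)},
]
index = {}
last = sync_once(table, index, datetime(1970, 1, 1))  # first run: full copy

table[0].update(name="Tommy", update_time=datetime(2019, 1, 2, 9, 0))
# Second run: the changed row plus the boundary row (because of >=) are re-sent.
last = sync_once(table, index, last)
print(len(index), index[1]["name"], last)
```

Because the comparison is >=, the row sitting exactly on the boundary is fetched again on the next run; keying documents by id keeps that harmless.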

Start the sync

$ ./bin/logstash -f ./config/mysql.conf

Configure pipelines.yml

- pipeline.id: table-user
  path.config: "./config/mysql.conf"

Start the sync (without -f, Logstash reads pipelines.yml)

$ ./bin/logstash

Insert test data into the user table

# -*- coding: utf-8 -*-

from puremysql import PureMysql
from faker import Faker
import random

con = PureMysql(db_url="mysql://root:123456@127.0.0.1:3306/data?charset=utf8")
user_table = con.table("user")

# Generate mock data: 100 * 5000 = 500,000 rows
faker = Faker(locale="zh_CN")
for i in range(0, 100):
    lst = []
    for j in range(0, 5000):
        lst.append({
            "name": faker.name(),
            "age": random.randint(1, 100)
        })

    count = user_table.insert(lst)
    print(count)

con.close()

Jdbc input plugin configuration options

| Setting | Input type | Required | Default |
|---------|------------|----------|---------|
| clean_run | boolean | No | false |
| columns_charset | hash | No | {} |
| connection_retry_attempts | number | No | 1 |
| connection_retry_attempts_wait_time | number | No | |
| jdbc_connection_string | string | Yes | - |
| jdbc_default_timezone | string | No | - |
| jdbc_driver_class | string | Yes | - |
| jdbc_driver_library | string | No | - |
| jdbc_fetch_size | number | No | - |
| jdbc_page_size | number | No | 100000 |
| jdbc_paging_enabled | boolean | No | false |
| jdbc_password | password | No | - |
| jdbc_password_filepath | a valid filesystem path | No | - |
| jdbc_pool_timeout | number | No | 5 |
| jdbc_user | string | Yes | - |
| jdbc_validate_connection | boolean | No | false |
| jdbc_validation_timeout | number | No | 3600 |
| last_run_metadata_path | string | No | "$HOME/.logstash_jdbc_last_run" |
| lowercase_column_names | boolean | No | true |
| parameters | hash | No | {} |
| plugin_timezone | string, one of ["local", "utc"] | No | "utc" |
| prepared_statement_bind_values | array | No | [] |
| prepared_statement_name | string | No | "" |
| record_last_run | boolean | No | true |
| schedule | string | No | - |
| sequel_opts | hash | No | {} |
| sql_log_level | string, one of ["fatal", "error", "warn", "info", "debug"] | No | "info" |
| statement | string | No | - |
| statement_filepath | a valid filesystem path | No | - |
| tracking_column | string | No | - |
| tracking_column_type | string, one of ["numeric", "timestamp"] | No | "numeric" |
| use_column_value | boolean | No | false |
| use_prepared_statements | boolean | No | false |

Configuration reference: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html

Analyzers

standard: splits Chinese text into single characters
simple
whitespace: does not support Chinese
language: does not support Chinese

POST _analyze
{
  "analyzer": "standard",
  "text": "hello world"
}

# hello world

POST _analyze
{
  "analyzer": "standard",
  "text": "中国人"
}
# 中 国 人
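
The same _analyze call can be scripted. The sketch below uses only the Python standard library to build the request and to read token strings out of a response. The helper names are hypothetical, the base URL is the local address used elsewhere in these notes, and the request is built but not sent so the snippet runs without a live node.

```python
import json
from urllib.request import Request


def analyze_request(text: str, analyzer: str = "standard",
                    base_url: str = "http://127.0.0.1:9200") -> Request:
    """Build (but do not send) a POST /_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response: dict) -> list:
    """Extract the token strings from an _analyze response body."""
    return [item["token"] for item in response.get("tokens", [])]


req = analyze_request("中国人")
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```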

ik analyzer

elasticsearch-analysis-ik

Download and unzip it into the plugins folder of ES, then restart ES for it to take effect

Sentence to analyze: 我是中国人

ik_smart: 我/是/中国人
ik_max_word: 我/是/中国人/中国/国人

Custom dictionary

Add custom words (here, 我是) to the file

elasticsearch-analysis-ik-6.3.2/config/main.dic

Analyze again

ik_smart: 我是/中国人
ik_max_word: 我是/中国人/中国/国人
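
Why does one dictionary entry change the result? The toy segmenter below uses forward maximum matching, a simple dictionary-based technique chosen here for illustration. It is not IK's actual algorithm, and the three-word dictionary is an assumption. It still reproduces the coarse-grained (ik_smart-style) outputs above, before and after 我是 is added.

```python
def forward_max_match(text: str, dictionary: set) -> list:
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


words = {"中国人", "中国", "国人"}
before = forward_max_match("我是中国人", words)
words.add("我是")  # the custom dictionary entry
after = forward_max_match("我是中国人", words)
print(before, after)
```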

Integrating ES with SpringBoot

POST blog/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "title": "杏花"
          }
        },
        {
          "match_phrase": {
            "content": "杏花"
          }
        }
      ]
    }
  }
}
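
A search endpoint would typically fill the keyword into this body and return the matching documents. The notes do not show the Spring Boot service code, so the sketch below stays in Python and covers only the two pure steps: building the body for an arbitrary keyword and pulling _source out of a response. Both helper names and the sample response are hypothetical.

```python
def blog_search_body(keyword: str, fields=("title", "content")) -> dict:
    """OR together one match_phrase clause per field."""
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {f: keyword}} for f in fields]
            }
        }
    }


def hit_sources(response: dict) -> list:
    """Pull the stored documents (_source) out of a search response."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


body = blog_search_body("杏花")
# Hypothetical response, trimmed to the fields used here.
sample = {"hits": {"hits": [{"_id": "1", "_source": {"title": "杏花村"}}]}}
print(body["query"]["bool"]["should"][0], hit_sources(sample))
```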