
How to add elasticsearch to systemd as a system service

I Background

This is a 3-node elasticsearch cluster. Currently the service on each node is started by hand as the elastic user. Machine information:

[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)
[root@localhost ~]# uname -rm
3.10.0-1160.el7.x86_64 x86_64
[root@localhost ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7821        5013        2458           9         349        2568
Swap:          8063           0        8063
[root@localhost ~]# 

The elasticsearch version is 7.17.1:

[root@localhost ~]# su - elastic
上一次登录:五 5月 10 11:56:57 CST 2024pts/0 上
-bash-4.2$ pwd
/home/elastic
-bash-4.2$ ll
总用量 0
drwxr-xr-x 11 elastic elastic 179 6月   7 2023 elasticsearch-7.17.1
-bash-4.2$ ./elasticsearch-7.17.1/
bin/     config/ data/   jdk/     lib/     logs/   modules/ –p/     plugins/
-bash-4.2$ ./elasticsearch-7.17.1/bin/elasticsearch --version
warning: usage of JAVA_HOME is deprecated, use ES_JAVA_HOME
Future versions of Elasticsearch will require Java 11; your Java version from [/usr/local/jdk1.8.0_341/jre] does not meet this requirement. Consider switching to a distribution of Elasticsearch with a bundled JDK. If you are already using a distribution with a bundled JDK, ensure the JAVA_HOME environment variable is not set.
Version: 7.17.1, Build: default/tar/e5acb99f822233d62d6444ce45a4543dc1c8059a/2022-02-23T22:20:54.153567231Z, JVM: 1.8.0_341
-bash-4.2$

Settings for the elastic user:

-bash-4.2$ id
uid=1001(elastic) gid=1001(elastic) 组=1001(elastic)
-bash-4.2$ ulimit -a
core file size         (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31191
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                     (-n) 65536
pipe size           (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority             (-r) 0
stack size             (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes             (-u) 4096
virtual memory         (kbytes, -v) unlimited
file locks                     (-x) unlimited
-bash-4.2$

The manual start command is:

-bash-4.2$ /home/elastic/elasticsearch-7.17.1/bin/elasticsearch &

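II Requirement

The 3-node elasticsearch cluster should start automatically when the operating system boots, so that nobody has to log in to each machine and start the elasticsearch service by hand after every reboot.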

III Solution

Since the system is CentOS 7.9, the service can be registered with systemd so that the operating system manages it.

1 Create the unit file

cat <<EOF > /etc/systemd/system/elasticsearch.service
[Unit]
Description=elasticsearch

[Service]
User=elastic
ExecStart=/home/elastic/elasticsearch-7.17.1/bin/elasticsearch
Restart=always

[Install]
WantedBy=multi-user.target
EOF

2 Reload the systemd daemon

systemctl daemon-reload 

3 Enable and start the elasticsearch service

systemctl enable elasticsearch.service 

systemctl start elasticsearch.service

systemctl status elasticsearch.service

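IV Troubleshooting

1 Startup log

After systemctl start elasticsearch, the service appeared to start normally. However, ports 9200 and 9300 were not open on the server at all, and clients could not reach the elasticsearch service.

Following the log with journalctl -f showed elasticsearch restarting over and over, with errors like the following: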

[root@localhost ~]# journalctl -f
.....
5月 10 15:11:36 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:36,047][INFO ][o.e.t.NettyAllocator     ] [ES01] creating NettyAllocator with the following configs: [name=elasticsearch_configured, chunk_size=1mb, suggested_max_allocation_size=1mb, factors={es.unsafe.use_netty_default_chunk_and_page_size=false, g1gc_enabled=true, g1gc_region_size=4mb}]
5月 10 15:11:36 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:36,088][INFO ][o.e.i.r.RecoverySettings ] [ES01] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]
5月 10 15:11:36 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:36,139][INFO ][o.e.d.DiscoveryModule   ] [ES01] using discovery type [zen] and seed hosts providers [settings]
5月 10 15:11:36 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:36,737][INFO ][o.e.g.DanglingIndicesState] [ES01] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
5月 10 15:11:37 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:37,774][INFO ][o.e.n.Node               ] [ES01] initialized
5月 10 15:11:37 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:37,775][INFO ][o.e.n.Node               ] [ES01] starting ...
5月 10 15:11:37 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:37,798][INFO ][o.e.x.s.c.f.PersistentCache] [ES01] persistent cache index loaded
5月 10 15:11:37 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:37,800][INFO ][o.e.x.d.l.DeprecationIndexingComponent] [ES01] deprecation component started
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:38,009][INFO ][o.e.t.TransportService   ] [ES01] publish_address {10.0.9.239:9300}, bound_addresses {[::]:9300}
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:38,877][INFO ][o.e.b.BootstrapChecks   ] [ES01] bound or publishing to a non-loopback address, enforcing bootstrap checks
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: bootstrap check failure [1] of [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: ERROR: Elasticsearch did not exit normally - check the logs at /home/elastic/elasticsearch-7.17.1/logs/es-cluster.log
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:38,904][INFO ][o.e.n.Node               ] [ES01] stopping ...
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:38,929][INFO ][o.e.n.Node               ] [ES01] stopped
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:38,930][INFO ][o.e.n.Node               ] [ES01] closing ...
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: [2024-05-10T15:11:38,950][INFO ][o.e.n.Node               ] [ES01] closed
5月 10 15:11:39 localhost.localdomain systemd[1]: elasticsearch.service: main process exited, code=exited, status=78/n/a
5月 10 15:11:39 localhost.localdomain systemd[1]: Unit elasticsearch.service entered failed state.
5月 10 15:11:39 localhost.localdomain systemd[1]: elasticsearch.service failed.
5月 10 15:11:39 localhost.localdomain systemd[1]: elasticsearch.service holdoff time over, scheduling restart.
5月 10 15:11:39 localhost.localdomain systemd[1]: Stopped elasticsearch.
5月 10 15:11:39 localhost.localdomain systemd[1]: Started elasticsearch.
5月 10 15:11:44 localhost.localdomain elasticsearch[8755]: [2024-05-10T15:11:44,383][INFO ][o.e.n.Node               ] [ES01] version[7.17.1], pid[8755], build[default/tar/e5acb99f822233d62d6444ce45a4543dc1c8059a/2022-02-23T22:20:54.153567231Z], OS[Linux/3.10.0-1160.el7.x86_64/amd64], JVM[Eclipse Adoptium/OpenJDK 64-Bit Server VM/17.0.2/17.0.2+8]
5月 10 15:11:44 localhost.localdomain elasticsearch[8755]: [2024-05-10T15:11:44,385][INFO ][o.e.n.Node               ] [ES01] JVM home [/home/elastic/elasticsearch-7.17.1/jdk], using bundled JDK [true]
5月 10 15:11:44 localhost.localdomain elasticsearch[8755]: [2024-05-10T15:11:44,386][INFO ][o.e.n.Node               ] [ES01] JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.locale.providers=SPI,COMPAT, --add-opens=java.base/java.io=ALL-UNNAMED, -Xms4g, -Xmx4g, -XX:+UseG1GC, -Djava.io.tmpdir=/tmp/elasticsearch-2589143606734508201, -XX:+HeapDumpOnOutOfMemoryError, -XX:+ExitOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -XX:MaxDirectMemorySize=2147483648, -XX:G1HeapRegionSize=4m, -XX:InitiatingHeapOccupancyPercent=30, -XX:G1ReservePercent=15, -Des.path.home=/home/elastic/elasticsearch-7.17.1, -Des.path.conf=/home/elastic/elasticsearch-7.17.1/config, -Des.distribution.flavor=default, -Des.distribution.type=tar, -Des.bundled_jdk=true]
^C
[root@localhost ~]#

The key lines are:

5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: bootstrap check failure [1] of [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
5月 10 15:11:38 localhost.localdomain elasticsearch[8504]: ERROR: Elasticsearch did not exit normally - check the logs at /home/elastic/elasticsearch-7.17.1/logs/es-cluster.log
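The journal output is long, and the lines that matter are easy to miss among the INFO messages. A minimal sketch of filtering for them, assuming the message wording of 7.17.1 shown in the log; the helper name bootstrap_failures is made up for this example:

```shell
# Hypothetical helper: keep only bootstrap-check failure lines from log text on stdin.
# The patterns match the 7.17.1 wording seen in the journal output of this article.
bootstrap_failures() {
    grep -E 'bootstrap checks failed|bootstrap check failure'
}

# Usage on the server (needs permission to read the journal):
#   journalctl -u elasticsearch.service --no-pager | bootstrap_failures
```

Because the helper only reads stdin, it can also be fed the file named in the error message, logs/es-cluster.log.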

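2 Analyzing the problem

Switching to the elastic user and starting the service by hand works without any problem, and the elastic user's limits do include a file descriptor setting (open files is 65536 in the ulimit -a output above).

So why does starting the service through systemd raise this error?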
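One way to investigate is to compare the limits of the login shell with the limits of the process that systemd started. A minimal sketch, assuming a Linux /proc filesystem; nofile_limits is a name chosen for this example, not an existing tool:

```shell
# Hypothetical helper: print the soft and hard "Max open files" limits of a
# running process, read from /proc/<pid>/limits (Linux only).
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Limits of the current shell; these should match `ulimit -Sn` and `ulimit -Hn`.
nofile_limits "$$"

# Limits of the service's main process, run as root while the process is alive:
#   nofile_limits "$(systemctl show -p MainPID elasticsearch.service | cut -d= -f2)"
```

While the service is between restarts, MainPID may be 0 and the second command fails, so it may need to be repeated a few times.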

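3 Solving the problem

The limits seen in a login shell are applied to login sessions such as su - elastic (on CentOS 7 typically by pam_limits, from /etc/security/limits.conf). A service started by systemd does not go through that path; its limits come from the unit file. The fix is therefore to add a limit parameter to the unit file when registering elasticsearch with systemd, as follows: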

cat <<EOF > /etc/systemd/system/elasticsearch.service
[Unit]
Description=elasticsearch

[Service]
User=elastic
ExecStart=/home/elastic/elasticsearch-7.17.1/bin/elasticsearch
Restart=always
LimitNOFILE=65535


[Install]
WantedBy=multi-user.target
EOF
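Before restarting, it can be worth confirming that the limit really is in place. A minimal sketch, assuming the value is written as a single integer as in the unit file above (forms such as infinity are not handled); nofile_at_least is a hypothetical helper name:

```shell
# Hypothetical helper: read "Key=Value" lines on stdin and succeed only if
# LimitNOFILE is set to an integer of at least the given minimum.
# 65535 is the minimum named in the bootstrap check error.
nofile_at_least() {
    minimum=${1:-65535}
    value=$(sed -n 's/^LimitNOFILE=\([0-9][0-9]*\)[[:space:]]*$/\1/p' | tail -n 1)
    [ -n "$value" ] && [ "$value" -ge "$minimum" ]
}

# Check the file on disk:
#   nofile_at_least 65535 < /etc/systemd/system/elasticsearch.service && echo "file ok"
# Check what systemd has loaded (after systemctl daemon-reload):
#   systemctl show -p LimitNOFILE elasticsearch.service | nofile_at_least 65535 && echo "loaded ok"
```

The second form is the more telling one, since it reflects the unit as systemd currently sees it and not just the file contents.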

Then run the following commands in order. The service stays enabled so that it still starts at boot:

systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl restart elasticsearch.service
systemctl status elasticsearch.service
[root@localhost ~]# netstat -anp|grep 9200
tcp6       0      0 :::9200                 :::*                   LISTEN      22472/java          
[root@localhost ~]#
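Elasticsearch takes several seconds to start, so a port check run immediately after the start command can come up empty. A minimal sketch of polling until the port accepts connections, assuming bash (it relies on bash's /dev/tcp redirection); wait_for_port and the 120-second timeout are choices made for this example:

```shell
# Hypothetical helper: poll until a TCP port accepts connections or the
# timeout (in seconds) expires. Relies on bash's /dev/tcp pseudo-device.
wait_for_port() {
    host=$1
    port=$2
    timeout=${3:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage on the server:
#   wait_for_port 127.0.0.1 9200 120 && echo "elasticsearch is listening on 9200"
```

The same helper can be pointed at port 9300 or at the addresses of the other two nodes.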

V References

On configuring /etc/systemd/system/elasticsearch.service:

https://stackoverflow.com/questions/46771233/max-file-descriptors-for-elasticsearch-process-is-too-low

On viewing the elasticsearch log:

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/starting-elasticsearch.html#_running_elasticsearch_from_the_command_line
