
Possible causes of NGINX failing to start automatically after a server reboot

Part 0 Problem and symptoms

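First thing in the morning, the PM reported in the WeChat group that the application was throwing errors on access, and asked me to look into it and fix it as soon as I got to the office. The PM also passed along one detail: the UPS in the customer's machine room had faulted the previous night and the room had lost power, so the servers had certainly rebooted. My first thought was that some service had not come back up correctly after the reboot. I quickly went over the project in my head: NGINX is set up as an auto-start service, the backend services run in Docker containers that are also set to restart automatically, and the underlying database is configured as an auto-start service as well.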

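Once at the office, I opened the application at https://policy.swywtg.cn/matters and it did indeed return an error. I connected to the application server over VPN, intending to start with the NGINX error logs, and found that NGINX was not running at all. So I started the NGINX service by hand: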

You have logged onto a secured server..All accesses logged
Authorized users only. All activity may be monitored and reported
Last login: Mon Jun 29 17:26:12 2026 from 172.16.1.29
[root@ywtg-app-13 ~]# docker ps
CONTAINER ID   IMAGE                                                                               COMMAND                  CREATED        STATUS        PORTS                                         NAMES
5d2b3186c6e0   swr.cn-south-1.myhuaweicloud.com/xmsme/shaowu/portal:2026-6-5.1                     "docker-entrypoint.s…"   3 weeks ago    Up 12 hours   0.0.0.0:82->3500/tcp, [::]:82->3500/tcp       portal
eb8b1495e7a9   swr.cn-south-1.myhuaweicloud.com/xmsme/fundamental_service/docker_registry:latest   "/entrypoint.sh /etc…"   4 months ago   Up 12 hours   0.0.0.0:5000->5000/tcp, [::]:5000->5000/tcp   registry
[root@ywtg-app-13 ~]# uptime 
 08:45:26 up 12:25,  1 user,  load average: 0.07, 0.02, 0.00
[root@ywtg-app-13 ~]# systemctl status nginx
● nginx.service - nginx
   Loaded: loaded (/etc/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2026-07-02 20:20:26 CST; 12h ago
  Process: 946 ExecStart=/etc/nginx/sbin/nginx (code=exited, status=1/FAILURE)

Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Service RestartSec=100ms expired, scheduling restart.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: Stopped nginx.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Start request repeated too quickly.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: Failed to start nginx.
[root@ywtg-app-13 ~]# 
[root@ywtg-app-13 ~]# systemctl start nginx 
[root@ywtg-app-13 ~]# 
[root@ywtg-app-13 ~]# systemctl status nginx
● nginx.service - nginx
   Loaded: loaded (/etc/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2026-07-03 08:46:15 CST; 3s ago
  Process: 380580 ExecStart=/etc/nginx/sbin/nginx (code=exited, status=0/SUCCESS)
 Main PID: 380584 (nginx)
    Tasks: 2
   Memory: 6.1M
   CGroup: /system.slice/nginx.service
           ├─380584 nginx: master process /etc/nginx/sbin/nginx
           └─380585 nginx: worker process

Jul 03 08:46:14 ywtg-app-13 systemd[1]: Starting nginx...
Jul 03 08:46:15 ywtg-app-13 systemd[1]: Started nginx.
[root@ywtg-app-13 ~]# 

 

After that the application loaded normally again, and I reported back to the PM in the group.

So is the problem solved at this point?

Solved, yes, but not completely and not at the root.

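Why did NGINX, which is configured to start automatically, fail to come up after the operating system rebooted, while a manual start worked fine? Look again at the NGINX service status from before the manual start: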

[root@ywtg-app-13 ~]# systemctl status nginx
● nginx.service - nginx
   Loaded: loaded (/etc/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2026-07-02 20:20:26 CST; 12h ago
  Process: 946 ExecStart=/etc/nginx/sbin/nginx (code=exited, status=1/FAILURE)

Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Service RestartSec=100ms expired, scheduling restart.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: Stopped nginx.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Start request repeated too quickly.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: Failed to start nginx.
[root@ywtg-app-13 ~]# 

 

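The status line "since Thu 2026-07-02 20:20:26" says NGINX entered the failed state 12 hours earlier. Before that, systemd had tried to start the service 5 times, about 100 ms apart (the RestartSec=100ms in the log), and every attempt failed. Restarting that quickly tripped systemd's start rate limit (by default, at most 5 starts within 10 seconds), so systemd stopped retrying and left the unit in the failed state. It stayed that way until I logged in and started NGINX by hand, which succeeded.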
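
The restart counter and the rate-limit message can also be pulled out of the journal mechanically. The following is a minimal sketch, not part of the original troubleshooting session: the function name summarize_restarts is made up for illustration, and it assumes the journal wording shown above, which can differ between systemd versions.

```shell
# Read journal lines for one unit on stdin and report the last restart
# counter plus whether systemd's start rate limit was hit.
# Hypothetical helper; relies on the message wording seen in this post.
summarize_restarts() {
    awk '
        /Scheduled restart job, restart counter is at/ {
            n = $NF              # last field, e.g. "5."
            sub(/\.$/, "", n)    # drop the trailing period
            last = n
        }
        /Start request repeated too quickly/ { limited = 1 }
        END {
            printf "restart_counter=%s rate_limited=%s\n",
                   (last == "" ? 0 : last), (limited ? "yes" : "no")
        }
    '
}

# Possible usage on the affected host (current boot only):
#   journalctl -u nginx.service -b --no-pager | summarize_restarts
```

For the failed boot described here, this should report a counter of 5 with the rate limit hit, matching what systemctl status showed.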

Part 1 Analysis and fix

0 Check the NGINX error log

[root@ywtg-app-13 ~]# tail -n 50 /etc/nginx/logs/error.log
...
2026/06/25 10:59:49 [notice] 254669#0: signal process started
2026/06/26 08:58:36 [notice] 924447#0: signal process started
2026/06/26 15:46:31 [notice] 1131680#0: signal process started
2026/06/29 09:23:51 [notice] 3131300#0: signal process started
2026/07/02 17:59:38 [notice] 1393351#0: signal process started
2026/07/02 20:20:23 [emerg] 860#0: host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
2026/07/02 20:20:25 [emerg] 906#0: host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
2026/07/02 20:20:25 [emerg] 913#0: host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
2026/07/02 20:20:25 [emerg] 931#0: host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
2026/07/02 20:20:25 [emerg] 946#0: host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
[root@ywtg-app-13 ~]#

 

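So NGINX did start along with the operating system last night, but it gave up after 5 failed attempts; the log has exactly 5 [emerg] entries. They give us a lead: line 113 of the configuration file /etc/nginx/conf.d/8443https_policy_pc.conf, where the upstream host linye.swywtg.cn could not be found.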
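
When the error log is long, grouping the [emerg] lines by message makes it obvious whether all failed starts share one cause. A small sketch under stated assumptions: emerg_summary is a hypothetical helper name, and it expects the error_log line format shown above (timestamp, level, pid#tid, message).

```shell
# Read an NGINX error log on stdin and print each distinct [emerg]
# message with the number of times it occurred, most frequent first.
emerg_summary() {
    grep -F '[emerg]' \
        | sed -E 's/^.*\[emerg\] [0-9]+#[0-9]+: //' \
        | sort | uniq -c | sort -rn
}

# Possible usage with the log path from this post:
#   tail -n 50 /etc/nginx/logs/error.log | emerg_summary
```

A single output line with a count of 5 would confirm that every failed start attempt died on the same upstream lookup.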

1 Check the systemd journal next

[root@ywtg-app-13 ~]# journalctl -u nginx.service --since "2026-07-02 20:15:00" --until "2026-07-02 20:25:00"
-- Logs begin at Thu 2026-01-08 10:44:56 CST, end at Fri 2026-07-03 08:54:54 CST. --
Jul 02 20:20:23 ywtg-app-13 systemd[1]: Starting nginx...
Jul 02 20:20:24 ywtg-app-13 nginx[860]: nginx: [emerg] host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
Jul 02 20:20:24 ywtg-app-13 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jul 02 20:20:24 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:24 ywtg-app-13 systemd[1]: Failed to start nginx.
Jul 02 20:20:24 ywtg-app-13 systemd[1]: nginx.service: Service RestartSec=100ms expired, scheduling restart.
Jul 02 20:20:24 ywtg-app-13 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 1.
Jul 02 20:20:24 ywtg-app-13 systemd[1]: Stopped nginx.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Starting nginx...
Jul 02 20:20:25 ywtg-app-13 nginx[906]: nginx: [emerg] host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Failed to start nginx.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Service RestartSec=100ms expired, scheduling restart.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 2.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Stopped nginx.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Starting nginx...
Jul 02 20:20:25 ywtg-app-13 nginx[913]: nginx: [emerg] host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Failed to start nginx.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Service RestartSec=100ms expired, scheduling restart.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 3.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Stopped nginx.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Starting nginx...
Jul 02 20:20:25 ywtg-app-13 nginx[931]: nginx: [emerg] host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Failed to start nginx.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Service RestartSec=100ms expired, scheduling restart.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 4.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Stopped nginx.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Starting nginx...
Jul 02 20:20:25 ywtg-app-13 nginx[946]: nginx: [emerg] host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jul 02 20:20:25 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:25 ywtg-app-13 systemd[1]: Failed to start nginx.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Service RestartSec=100ms expired, scheduling restart.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: Stopped nginx.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Start request repeated too quickly.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: nginx.service: Failed with result 'exit-code'.
Jul 02 20:20:26 ywtg-app-13 systemd[1]: Failed to start nginx.
[root@ywtg-app-13 ~]# 

 

The command journalctl -u nginx.service --since "2026-07-02 20:15:00" --until "2026-07-02 20:25:00" shows every journal entry for the nginx.service unit in that 10-minute window. It reports the same error: host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113.

This confirms that when NGINX was started automatically during boot last night, the reason it never came up was: host not found in upstream "linye.swywtg.cn" in /etc/nginx/conf.d/8443https_policy_pc.conf:113.

2 Following the error message, look at /etc/nginx/conf.d/8443https_policy_pc.conf


   110  # 2026.05.15  Proxy http://linye.swywtg.cn:9104 as https://policy.swywtg.cn/linye (core proxy configuration)
   111      location /linye/ {
   112          # Forward requests to the backend address
   113          proxy_pass http://linye.swywtg.cn:9104/;
   114
   115          # Pass the original request headers so the backend sees the real client information
   116          proxy_set_header Host $host;
   117          proxy_set_header X-Real-IP $remote_addr;
   118          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   119          proxy_set_header X-Forwarded-Proto $scheme;
   120
   121          # Avoid possible redirect path problems
   122          proxy_redirect off;
   123      }
   124

 

Sure enough, line 113 of the configuration file contains a reverse-proxy target: http://linye.swywtg.cn:9104/

On the server, I tried pinging the domain and connecting to the port with telnet:

[root@ywtg-app-13 ~]# telnet linye.swywtg.cn 9104
Trying 218.67.107.130...
Connected to linye.swywtg.cn.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
[root@ywtg-app-13 ~]# ping linye.swywtg.cn       
PING linye.swywtg.cn (218.67.107.130) 56(84) bytes of data.
^C
--- linye.swywtg.cn ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7186ms

[root@ywtg-app-13 ~]# 

 

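Opening the address directly from the Internet, either by domain name at http://linye.swywtg.cn:9104/

or by IP address and port at http://218.67.107.130:9104/#/, works normally in both cases.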

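At this point things got more puzzling. The NGINX log says startup failed because the upstream linye.swywtg.cn could not be resolved. Yet right now the name resolves to 218.67.107.130 on the server, the port accepts TCP connections over telnet, and the site loads in a browser. (The ping packets were all lost, but the name still resolved; presumably ICMP is simply filtered somewhere along the path.)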

That is a contradiction; the logic does not hold.
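
Part of the confusion is that ping and telnet each mix two questions: does the name resolve, and is the port reachable? NGINX only needs the first one to start. The sketch below keeps them apart. It is an illustration rather than what was run during the incident; check_upstream is a hypothetical helper name, and it assumes getent, timeout and bash are available on the host.

```shell
# Check name resolution and TCP reachability as two separate steps.
# Returns 1 if the name does not resolve, 2 if the port is unreachable.
check_upstream() {
    host=$1
    port=$2
    # Step 1: resolve through the system resolver (hosts file, DNS, ...).
    if ! getent hosts "$host" >/dev/null 2>&1; then
        echo "dns: FAIL"
        return 1
    fi
    echo "dns: ok"
    # Step 2: try a TCP connect using bash's /dev/tcp, with a 3 s cap.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "tcp: ok"
    else
        echo "tcp: FAIL"
        return 2
    fi
}

# Possible usage with the upstream from this post:
#   check_upstream linye.swywtg.cn 9104
```

A "dns: FAIL" at boot time combined with "dns: ok" hours later would be consistent with the timing explanation explored in the next section.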

3 Examine the NGINX unit file /etc/systemd/system/nginx.service

[root@ywtg-app-13 ~]# cat /etc/systemd/system/nginx.service 
[Unit]
Description=nginx
After=network.target
 
[Service]
Type=forking
ExecStart=/etc/nginx/sbin/nginx
ExecReload=/etc/nginx/sbin/nginx -s reload
ExecStop=/etc/nginx/sbin/nginx -s quit
PrivateTmp=true
Restart=always

[Install]
WantedBy=multi-user.target
[root@ywtg-app-13 ~]# 

 

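The unit has After=network.target, so systemd tries to start NGINX as soon as that target is reached during boot. Could it be that at that moment DNS resolution was not available yet (the DNS server this machine depends on was not yet reachable), so resolving the upstream linye.swywtg.cn from the NGINX configuration failed and NGINX could not start? Or that some other network component was not fully up yet, so the lookup failed and NGINX ultimately failed to start?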

After some reading:

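NGINX has its own validation at startup. When it starts or restarts, it resolves every hostname that appears literally in a proxy_pass directive in any of its configuration files. If a lookup fails, NGINX treats it as a fatal [emerg] error and refuses to start.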
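
To see which names NGINX will need to resolve at startup, the proxy_pass targets can be listed from the configuration. This is a rough sketch, not an NGINX feature: extract_proxy_hosts is a hypothetical helper name, it only looks at proxy_pass lines with a literal http or https target, and it will also print the names of upstream groups, which are not DNS names.

```shell
# Read NGINX configuration text on stdin and print the host part of every
# literal proxy_pass target. Commented-out lines and targets that start
# with a variable (resolved at request time) are skipped.
extract_proxy_hosts() {
    sed -nE 's#^[[:space:]]*proxy_pass[[:space:]]+https?://([^/:;$[:space:]]+).*#\1#p' \
        | sort -u
}

# Possible usage: report names that do not resolve right now.
#   cat /etc/nginx/conf.d/*.conf | extract_proxy_hosts | while read -r h; do
#       getent hosts "$h" >/dev/null || echo "unresolved: $h"
#   done
```

Any name reported as unresolved at boot time is a candidate for the [emerg] failure seen above.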

4 Modify the NGINX unit file

Edit /etc/systemd/system/nginx.service and add:

After=network.target network-online.target nss-lookup.target
Wants=network-online.target

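This means NGINX is started only after the network is actually up (network-online.target) and name resolution is available (nss-lookup.target). In addition, if a start attempt fails, systemd should wait 5 s before trying again (the RestartSec=5s setting in the unit file). With 5 s between attempts, the default limit of 5 starts within 10 seconds is no longer reached, so systemd keeps retrying instead of giving up.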

[root@ywtg-app-13 ~]# cat /etc/systemd/system/nginx.service 
[Unit]
Description=nginx

# Key change: add network-online.target and nss-lookup.target (name resolution)
After=network.target network-online.target nss-lookup.target
Wants=network-online.target
 
[Service]
Type=forking
ExecStart=/etc/nginx/sbin/nginx
ExecReload=/etc/nginx/sbin/nginx -s reload
ExecStop=/etc/nginx/sbin/nginx -s quit
PrivateTmp=true
Restart=always
# On failure, wait 5 seconds before retrying instead of blindly trying 5 times within 1 second
RestartSec=5s

[Install]
WantedBy=multi-user.target
[root@ywtg-app-13 ~]# systemctl daemon-reload 
[root@ywtg-app-13 ~]# /etc/nginx/sbin/nginx -t
nginx: the configuration file /etc/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/conf/nginx.conf test is successful
[root@ywtg-app-13 ~]# /etc/nginx/sbin/nginx -s reload
[root@ywtg-app-13 ~]# systemctl restart nginx
[root@ywtg-app-13 ~]# systemctl status nginx
● nginx.service - nginx
   Loaded: loaded (/etc/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2026-07-03 14:53:06 CST; 3s ago
  Process: 567200 ExecStart=/etc/nginx/sbin/nginx (code=exited, status=0/SUCCESS)
 Main PID: 567201 (nginx)
    Tasks: 2
   Memory: 1.9M
   CGroup: /system.slice/nginx.service
           ├─567201 nginx: master process /etc/nginx/sbin/nginx
           └─567202 nginx: worker process

Jul 03 14:53:06 ywtg-app-13 systemd[1]: Starting nginx...
Jul 03 14:53:06 ywtg-app-13 systemd[1]: Started nginx.
[root@ywtg-app-13 ~]#  
[root@ywtg-app-13 ~]# ps -ef|grep nginx
root      567201       1  0 14:53 ?        00:00:00 nginx: master process /etc/nginx/sbin/nginx
root      567546  567201  0 14:53 ?        00:00:00 nginx: worker process
root      567632  385040  0 14:54 pts/2    00:00:00 grep nginx
[root@ywtg-app-13 ~]# 

 

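Note that I ran systemctl restart nginx here. Be careful with that on a production system; normally configuration changes should be applied with a hot reload. I used restart only to verify the new unit settings.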
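
To double-check that a unit really orders itself after network-online.target and also pulls that target in, the unit text can be inspected. A minimal sketch, with unit_waits_for_network as a hypothetical helper name; it only checks for the two directives discussed in this post and does not evaluate drop-in overrides on its own.

```shell
# Read unit file text on stdin. Print "ok" and return 0 if the unit both
# orders itself after network-online.target and pulls it in; otherwise
# print what is missing and return 1.
unit_waits_for_network() {
    awk '
        /^After=/ && /network-online\.target/ { after = 1 }
        /^(Wants|Requires)=/ && /network-online\.target/ { wants = 1 }
        END {
            if (after && wants) { print "ok"; exit 0 }
            if (!after) print "missing After=network-online.target"
            if (!wants) print "missing Wants=network-online.target"
            exit 1
        }
    '
}

# Possible usage (systemctl cat prints the unit file and any drop-ins):
#   systemctl cat nginx.service | unit_waits_for_network
```

One caveat worth checking on the host: network-online.target only delays startup if a wait service such as NetworkManager-wait-online.service or systemd-networkd-wait-online.service is enabled. Otherwise the target is reached immediately and the new ordering has little effect.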

Part 2 Postmortem

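Putting together the UPS failure in the customer's machine room that the PM reported this morning, the resulting server reboot and outage, and the fact that my manual start of NGINX this morning worked, the timeline looks like this: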

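  • Last night, July 2, 2026, the server had just powered on: the network interface may already have been up, but DNS resolution (for example systemd-resolved or NetworkManager) was not fully ready yet, or this NGINX server could not yet reach its DNS server;
  • NGINX rushed to start along with the operating system: nginx.service only had After=network.target, which only says that the network stack has been brought up, not that the network is actually reachable or that DNS can resolve names;
  • It hit the wall 5 times in a row: NGINX insisted on resolving linye.swywtg.cn, the lookup failed (DNS failure or no network connectivity), and it logged the error 5 times in succession;
  • The protection kicked in and the service gave up for good: systemd saw NGINX die 5 times within a few seconds, triggered its start rate limit, marked the service as failed, and stopped trying to start it;
  • When I started NGINX by hand this morning: the server had been up for more than ten hours, and networking and DNS had long been fully available. So when I pinged linye.swywtg.cn it naturally resolved to 218.67.107.130, and telnet linye.swywtg.cn 9104 connected as well. A manual systemctl start nginx then simply worked, and the application was reachable again.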
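
As an extra guard against the boot-time race in this timeline, a start could be held back until the upstream name actually resolves. This goes beyond the fix applied in this post and is only a sketch: wait_for_dns is a hypothetical helper, the one-second polling interval is an arbitrary choice, and it assumes getent is available.

```shell
# Poll the system resolver until the given host resolves, or give up
# after a number of one-second attempts (default 30).
wait_for_dns() {
    host=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if getent hosts "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "gave up resolving $host after $tries attempts" >&2
    return 1
}

# Possible usage before starting NGINX by hand or from a wrapper script:
#   wait_for_dns linye.swywtg.cn 30 && systemctl start nginx
```

Saved as a script (the path would be your own choice), something like this could in principle be called from an ExecStartPre= line in the unit, as long as the total wait stays below the unit's start timeout.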

 
