The Complete Guide to Nginx Performance Optimization: From Basics to 100K+ Concurrency

Published: 2025-12-01

Introduction

As a high-performance web server and reverse proxy, Nginx has become a staple of internet infrastructure. With its default configuration, however, Nginx is nowhere near its real performance potential.

This article covers Nginx performance optimization systematically across several dimensions, from basic configuration to system tuning, and from single-machine optimization to cluster architecture.

What you'll learn:

✅ A deep dive into Nginx configuration parameters
✅ System-level performance tuning
✅ Caching strategy and CDN integration
✅ SSL/TLS performance optimization
✅ Advanced load-balancing techniques
✅ Monitoring and troubleshooting
✅ Hands-on experience scaling from 10K to 100K+ concurrency

Keywords: Nginx optimization, high concurrency, performance tuning, load balancing, caching strategy


Contents

1. Performance Benchmarking and Problem Diagnosis
2. Nginx Configuration Tuning
3. Operating System Tuning
4. Network-Layer Tuning
5. Caching Strategy
6. SSL/TLS Performance Optimization
7. Load Balancing and High Availability
8. Monitoring and Tuning
9. Case Studies
10. Performance Tuning Checklist

1. Performance Benchmarking and Problem Diagnosis

1.1 Establishing a Performance Baseline

Why benchmark first?

- Understand where the current bottlenecks are
- Make optimization results measurable
- Avoid premature or unnecessary optimization

Benchmarking tools compared

Tool     Best for              Pros                              Cons
ab       quick smoke tests     simple to use                     limited features
wrk      stress testing        high performance, Lua scripting   harder to configure
siege    concurrency testing   detailed statistics               modest performance
JMeter   complex scenarios     feature-rich, GUI                 heavy resource usage
Locust   distributed testing   Python scripts, easy to extend    learning curve

Benchmarking in practice

# 1. Quick test with ab (Apache Bench)
ab -n 10000 -c 100 http://localhost/

# Options:
# -n: total number of requests
# -c: concurrency level
# -t: test duration (seconds)
# -k: enable HTTP keep-alive

# Sample output:
# Requests per second:    3421.56 [#/sec] (mean)
# Time per request:       29.226 [ms] (mean)
# Transfer rate:          684.31 [Kbytes/sec] received

# 2. Stress test with wrk (recommended)
wrk -t 12 -c 400 -d 30s --latency http://localhost/

# Options:
# -t: number of threads (rule of thumb: = CPU cores)
# -c: number of connections
# -d: test duration
# --latency: print the latency distribution

# Sample output:
# Running 30s test @ http://localhost/
#   12 threads and 400 connections
#   Thread Stats   Avg      Stdev     Max   +/- Stdev
#     Latency    45.32ms   12.89ms  201.15ms   87.23%
#     Req/Sec     8.12k     1.33k   12.45k    72.15%
#   Latency Distribution
#      50%   43.21ms
#      75%   51.34ms
#      90%   62.45ms
#      99%   89.12ms
#   2897456 requests in 30.01s, 2.13GB read
# Requests/sec:  96523.45
# Transfer/sec:     72.67MB

# 3. Sustained load test with siege
siege -c 200 -t 60s http://localhost/

# 4. Advanced wrk usage (custom request)
cat > post.lua <<EOF
wrk.method = "POST"
wrk.body   = '{"key":"value"}'
wrk.headers["Content-Type"] = "application/json"
EOF

wrk -t 4 -c 100 -d 30s -s post.lua http://localhost/api

1.2 Reading the Performance Numbers

Key metrics

Metric          Meaning                             Excellent        Good        Needs work
QPS             requests per second                 >50k             10k-50k     <10k
Latency (P50)   response time for 50% of requests   <10ms            10-50ms     >50ms
Latency (P99)   response time for 99% of requests   <50ms            50-200ms    >200ms
Error rate      share of failed requests            <0.01%           0.01-0.1%   >0.1%
Bandwidth       network throughput                  near link limit  >50%        <50%
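The P50/P99 columns can be reproduced from raw latency samples. Below is a minimal sketch using the nearest-rank percentile method; wrk's internal histogram is more sophisticated, but the idea is the same.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of all samples are <= that value."""
    ordered = sorted(samples)
    # ceil(p/100 * n) as a 1-based rank, clamped to at least 1
    rank = max(1, -(-len(ordered) * p // 100))
    return ordered[rank - 1]

latencies_ms = [12, 8, 45, 9, 30, 11, 95, 10, 14, 13]
print(percentile(latencies_ms, 50))  # 12
print(percentile(latencies_ms, 99))  # 95
```

Feeding it your access log's `$request_time` values gives the same P50/P99 view as wrk's `--latency` output.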
Bottleneck analysis

# 1. Monitor system resources in real time
# CPU usage
top -p $(pgrep nginx | head -1)

# Memory usage
ps aux | grep nginx | awk '{sum+=$6} END {print sum/1024 "MB"}'

# Number of network connections
netstat -an | grep :80 | wc -l

# 2. Nginx status monitoring
# Enable the stub_status module
# nginx.conf:
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

# Query it
curl http://localhost/nginx_status

# Output:
# Active connections: 291
# server accepts handled requests
#  16630948 16630948 31070465
# Reading: 6 Writing: 179 Keepalive: 106

# 3. Connection-state summary
ss -s

# 4. TCP listen-queue overflows
netstat -s | grep -i listen

# 5. File-descriptor usage
lsof -n | grep nginx | wc -l
cat /proc/sys/fs/file-nr

1.3 Common Performance Bottlenecks

Bottleneck          Symptom                   How to check
CPU                 CPU usage >80%            top, perf
Memory              frequent swapping         free -h, vmstat
Disk I/O            high iowait               iostat, iotop
Network bandwidth   link saturated            iftop, nethogs
Connection limits   "Too many open files"     ulimit -n
Slow upstream       slow backend responses    $upstream_response_time

2. Nginx Configuration Tuning

2.1 Worker Process Tuning

Core parameters

# nginx.conf core settings

# 1. Number of worker processes
worker_processes auto;  # recommended: auto (matches the CPU core count)
# or set explicitly
# worker_processes 8;

# 2. Worker CPU affinity (pin workers to cores, fewer context switches)
worker_cpu_affinity auto;
# or bind manually (4-core example)
# worker_cpu_affinity 0001 0010 0100 1000;
# 8-core example:
# worker_cpu_affinity 00000001 00000010 00000100 00001000 00010000 00100000 01000000 10000000;

# 3. Worker priority (range -20 to 19; lower means higher priority)
worker_priority -5;

# 4. Maximum connections per worker
events {
    worker_connections 65535;  # adjust to the system ulimit
    
    # use epoll (the high-performance I/O model on Linux)
    use epoll;
    
    # accept as many new connections as possible at once
    multi_accept on;
    
    # accept mutex (thundering-herd protection)
    accept_mutex off;  # off is recommended (and the default) since Nginx 1.11.3
}

# 5. Maximum open files per worker
worker_rlimit_nofile 65535;
The math

Max concurrent connections = worker_processes × worker_connections

Theoretical QPS = max concurrent connections / average response time

Example:
8 workers × 65535 connections = 524,280 concurrent connections
With an average response time of 50ms, theoretical QPS = 524,280 / 0.05 = 10,485,600
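The arithmetic above is easy to sanity-check in a few lines. It is Little's law (throughput = concurrency / latency), and the result is a theoretical ceiling, not a promise:

```python
workers = 8
worker_connections = 65535
avg_response_s = 0.05  # 50 ms average response time

# max concurrency = worker_processes * worker_connections
max_concurrent = workers * worker_connections

# theoretical QPS = concurrency / average response time (an upper bound;
# CPU, bandwidth, and the backend will cap real throughput far earlier)
theoretical_qps = round(max_concurrent / avg_response_s)

print(max_concurrent, theoretical_qps)  # 524280 10485600
```

In practice a well-tuned single box tops out around 50K-100K QPS (see the case study in section 9), which is why the later sections spend so much time on the limits this formula ignores.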

2.2 HTTP Core Tuning


http {
    # ========== Basics ==========
    
    # Hide the Nginx version string (security)
    server_tokens off;
    
    # Efficient file transfer
    sendfile on;
    tcp_nopush on;    # batch data up to a full packet before sending
    tcp_nodelay on;   # send small packets immediately (does not conflict with tcp_nopush)
    
    # ========== Timeouts ==========
    
    # Client request-header timeout
    client_header_timeout 15s;
    
    # Client request-body timeout
    client_body_timeout 15s;
    
    # Timeout for sending the response to the client
    send_timeout 15s;
    
    # Keep-alive timeout (important!)
    keepalive_timeout 65s;
    keepalive_requests 100;  # max requests per connection
    
    # ========== Buffers ==========
    
    # Client request-header buffers
    client_header_buffer_size 4k;
    large_client_header_buffers 4 32k;
    
    # Client request-body buffer
    client_body_buffer_size 128k;
    client_max_body_size 50m;
    
    # Output buffers
    output_buffers 4 32k;
    postpone_output 1460;  # accumulate 1460 bytes (one MSS at a 1500-byte MTU) before sending
    
    # ========== Gzip compression ==========
    
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;      # levels 1-9; 6 balances CPU cost and compression ratio
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/json
        application/javascript
        application/xml+rss
        application/rss+xml
        font/truetype
        font/opentype
        application/vnd.ms-fontobject
        image/svg+xml;
    gzip_min_length 1000;   # skip files smaller than ~1KB
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    
    # ========== Logging ==========
    
    # Access-log format
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'rt=$request_time uct="$upstream_connect_time" '
                    'uht="$upstream_header_time" urt="$upstream_response_time"';
    
    # Buffered logging (fewer disk writes)
    access_log /var/log/nginx/access.log main buffer=32k flush=5s;
    
    # Under very high concurrency, disabling the access log helps
    # access_log off;
    
    # Error-log level (warn or error)
    error_log /var/log/nginx/error.log warn;
    
    # ========== File cache ==========
    
    # Open-file cache
    open_file_cache max=10000 inactive=60s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    
    # ========== Misc ==========
    
    # Reset timed-out connections, freeing their memory
    reset_timedout_connection on;
    
    # Server-name hash tables
    server_names_hash_bucket_size 128;
    server_names_hash_max_size 512;
    
    # MIME-types hash table
    types_hash_max_size 2048;
}

2.3 Static Asset Tuning


server {
    listen 80;
    server_name static.example.com;
    
    root /var/www/static;
    
    # Static assets
    location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
        # Browser caching
        expires 1y;
        add_header Cache-Control "public, immutable";
        
        # Access log (safe to disable for static assets)
        access_log off;
        
        # CORS header
        add_header Access-Control-Allow-Origin *;
        
        # Serve pre-compressed .gz files
        gzip_static on;  # requires building with --with-http_gzip_static_module
        
        # Zero-copy transfer (large files)
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        
        # Direct I/O for large files (bypasses the page cache)
        directio 4m;
        directio_alignment 512;
        
        # Chunked output for large files
        output_buffers 1 128k;
    }
    
    # Small-file tuning
    location ~* \.(html|xml|json)$ {
        expires 1h;
        add_header Cache-Control "public";
        
        # Cache open file handles for small files
        open_file_cache max=1000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 2;
    }
}

2.4 Reverse Proxy Tuning


upstream backend {
    # Load-balancing strategies:
    # 1. round robin (default)
    # 2. least_conn - fewest active connections
    # 3. ip_hash - hash by client IP (session affinity)
    # 4. hash $request_uri consistent - consistent hashing
    
    least_conn;
    
    # Backend servers
    server 192.168.1.101:8080 weight=3 max_fails=2 fail_timeout=30s;
    server 192.168.1.102:8080 weight=2 max_fails=2 fail_timeout=30s;
    server 192.168.1.103:8080 weight=1 max_fails=2 fail_timeout=30s;
    
    # Backend keep-alive pool (important!)
    keepalive 100;          # keep up to 100 idle connections
    keepalive_requests 100; # max requests per connection
    keepalive_timeout 60s;  # idle-connection timeout
}

server {
    listen 80;
    server_name api.example.com;
    
    location / {
        # Proxy target
        proxy_pass http://backend;
        
        # ========== Proxy headers ==========
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # ========== Timeouts ==========
        proxy_connect_timeout 5s;      # connecting to the backend
        proxy_send_timeout 60s;        # sending the request
        proxy_read_timeout 60s;        # reading the response
        
        # ========== Buffers ==========
        proxy_buffering on;
        proxy_buffer_size 8k;
        proxy_buffers 32 8k;
        proxy_busy_buffers_size 64k;
        
        # ========== Backend keep-alive (critical!) ==========
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        
        # ========== Failover ==========
        proxy_next_upstream error timeout http_500 http_502 http_503;
        proxy_next_upstream_tries 2;
        proxy_next_upstream_timeout 10s;
        
        # ========== Temp files ==========
        proxy_max_temp_file_size 0;  # disable buffering to temp files
    }
}

3. Operating System Tuning

3.1 File Descriptor Limits


# 1. Check the current limit
ulimit -n
# the default is usually 1024, far too low

# 2. Temporary change (lost on reboot)
ulimit -n 65535

# 3. Permanent change (recommended)
sudo vi /etc/security/limits.conf

# Add the following:
*  soft  nofile  65535
*  hard  nofile  65535
root  soft  nofile  65535
root  hard  nofile  65535

# 4. System-wide limits
sudo vi /etc/sysctl.conf

fs.file-max = 2097152
fs.nr_open = 2097152

# Apply the settings
sudo sysctl -p

# 5. Systemd service limits (if nginx runs under systemd)
sudo mkdir -p /etc/systemd/system/nginx.service.d
sudo vi /etc/systemd/system/nginx.service.d/limits.conf

[Service]
LimitNOFILE=65535
LimitNPROC=65535

sudo systemctl daemon-reload
sudo systemctl restart nginx

# 6. Verify
cat /proc/$(pgrep nginx | head -1)/limits | grep "open files"

3.2 Kernel Network Parameter Tuning


# /etc/sysctl.conf
# A complete high-performance network profile

# ========== Basic TCP parameters ==========

# TCP connection queues
net.core.somaxconn = 65535                    # max listen backlog
net.core.netdev_max_backlog = 65535           # NIC receive queue
net.ipv4.tcp_max_syn_backlog = 65535          # SYN queue length

# Connection capacity
net.ipv4.ip_local_port_range = 1024 65535     # usable ephemeral port range
net.ipv4.tcp_max_tw_buckets = 20000           # cap on TIME_WAIT sockets

# ========== TCP performance ==========

# TCP Fast Open (cuts handshake latency)
net.ipv4.tcp_fastopen = 3

# Congestion-control algorithm (BBR performs best in most cases)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

# TCP window scaling
net.ipv4.tcp_window_scaling = 1

# Socket buffer sizes (auto-tuned within these bounds)
net.core.rmem_max = 16777216                  # max receive buffer
net.core.wmem_max = 16777216                  # max send buffer
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.ipv4.tcp_rmem = 4096 87380 16777216       # min default max
net.ipv4.tcp_wmem = 4096 65536 16777216

# ========== Connection recycling ==========

# Reuse sockets in TIME_WAIT for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# FIN timeout (release connections faster)
net.ipv4.tcp_fin_timeout = 15

# Keepalive settings
net.ipv4.tcp_keepalive_time = 600             # idle time before probing starts
net.ipv4.tcp_keepalive_probes = 3             # number of probes
net.ipv4.tcp_keepalive_intvl = 15             # probe interval

# ========== SYN flood protection ==========

# SYN cookies
net.ipv4.tcp_syncookies = 1

# SYN+ACK / SYN retry counts
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2

# ========== Memory ==========

# TCP memory: min pressure max (unit: pages, 1 page = 4KB)
net.ipv4.tcp_mem = 786432 1048576 1572864

# Discourage swapping (optional, depends on available RAM)
# vm.swappiness = 0

# ========== Misc ==========

# Timestamps (needed for accurate RTT estimation)
net.ipv4.tcp_timestamps = 1

# Selective acknowledgments
net.ipv4.tcp_sack = 1

# Path MTU probing
net.ipv4.tcp_mtu_probing = 1

# Apply the settings
sudo sysctl -p

# Verify that BBR is active
sysctl net.ipv4.tcp_congestion_control
lsmod | grep bbr

3.3 The BBR Congestion Control Algorithm

BBR vs. traditional algorithms

Scenario                        Cubic (traditional)  BBR       Gain
Low-latency network             950Mbps              980Mbps   +3%
High-latency network (200ms)    120Mbps              850Mbps   +608%
Lossy network (1% packet loss)  450Mbps              780Mbps   +73%

Enabling BBR (requires kernel 4.9+)


# 1. Check the kernel version
uname -r
# upgrade the kernel if it is below 4.9

# 2. Check whether BBR is available
grep -i bbr /boot/config-$(uname -r)

# 3. Enable BBR
echo "net.core.default_qdisc=fq" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_congestion_control=bbr" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# 4. Verify
sysctl net.ipv4.tcp_congestion_control
# expected: net.ipv4.tcp_congestion_control = bbr

lsmod | grep bbr
# expected: tcp_bbr  20480  1

4. Network-Layer Tuning

4.1 NIC Multi-Queue


# 1. Show NIC queue counts
ethtool -l eth0

# Output:
# Channel parameters for eth0:
# Pre-set maximums:
# RX:             8
# TX:             8
# Combined:       8
# Current hardware settings:
# RX:             4
# TX:             4
# Combined:       4

# 2. Set the queue count (match the CPU core count)
sudo ethtool -L eth0 combined 8

# 3. Check interrupt distribution
cat /proc/interrupts | grep eth0

# 4. Enable RPS/RFS (software multi-queue)
# RPS: Receive Packet Steering
for i in /sys/class/net/eth0/queues/rx-*/rps_cpus; do
    echo "ff" | sudo tee $i
done

# RFS: Receive Flow Steering
echo 32768 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
for i in /sys/class/net/eth0/queues/rx-*/rps_flow_cnt; do
    echo 2048 | sudo tee $i
done

4.2 NIC Ring Buffers


# 1. Show the current ring-buffer sizes
ethtool -g eth0

# Output:
# Ring parameters for eth0:
# Pre-set maximums:
# RX:             4096
# TX:             4096
# Current hardware settings:
# RX:             512
# TX:             512

# 2. Enlarge the ring buffers (fewer drops)
sudo ethtool -G eth0 rx 4096 tx 4096

# 3. Drop statistics
ethtool -S eth0 | grep -i drop
ethtool -S eth0 | grep -i error

# 4. Live monitoring
watch -n 1 'ethtool -S eth0 | grep -E "rx_dropped|tx_dropped"'

4.3 Network Performance Monitoring


# 1. Live bandwidth monitoring
iftop -i eth0

# 2. Connection summary
ss -s

# Output:
# Total: 1324
# TCP:   1200 (estab 980, closed 180, orphaned 0, timewait 150)

# 3. Connections per TCP state
ss -ant | awk '{print $1}' | sort | uniq -c

# 4. Watch a specific port
watch -n 1 'ss -tan state established "( dport = :80 or sport = :80 )" | wc -l'

# 5. TCP retransmission counters
nstat -az | grep -i retrans

# 6. Network latency test
ping -c 100 -i 0.2 -q target-server
# -q: quiet mode, statistics only

5. Caching Strategy

5.1 Nginx Proxy Caching


http {
    # ========== Cache paths ==========
    
    # Proxy cache path
    proxy_cache_path /var/cache/nginx/proxy
        levels=1:2                  # two-level directory layout
        keys_zone=proxy_cache:100m  # in-memory key index (100MB)
        max_size=10g                # at most 10GB on disk
        inactive=7d                 # evict entries unused for 7 days
        use_temp_path=off;          # write directly into the cache directory
    
    # FastCGI cache
    fastcgi_cache_path /var/cache/nginx/fastcgi
        levels=1:2
        keys_zone=fastcgi_cache:100m
        max_size=5g
        inactive=7d
        use_temp_path=off;
    
    # Cache key
    proxy_cache_key "$scheme$request_method$host$request_uri";
    
    # ========== Upstream servers ==========
    upstream backend {
        server 192.168.1.101:8080;
        server 192.168.1.102:8080;
        
        keepalive 100;
    }
    
    server {
        listen 80;
        server_name www.example.com;
        
        # ========== Proxy cache settings ==========
        location / {
            proxy_pass http://backend;
            
            # Enable caching
            proxy_cache proxy_cache;
            
            # TTLs by HTTP status code
            proxy_cache_valid 200 302 1h;
            proxy_cache_valid 301 1d;
            proxy_cache_valid 404 1m;
            proxy_cache_valid any 1m;
            
            # What to cache
            proxy_cache_methods GET HEAD;
            proxy_cache_min_uses 2;        # cache only after the 2nd request
            
            # Cache lock (prevents a stampede on cold keys)
            proxy_cache_lock on;
            proxy_cache_lock_timeout 5s;
            proxy_cache_lock_age 5s;
            
            # Serve stale entries when the backend misbehaves
            proxy_cache_use_stale error timeout updating
                                  http_500 http_502 http_503 http_504;
            
            # Refresh expired entries in the background
            proxy_cache_background_update on;
            
            # Ignore the backend's cache-control headers
            proxy_ignore_headers Cache-Control Expires;
            
            # Expose the cache status in a response header
            add_header X-Cache-Status $upstream_cache_status;
            # HIT: served from cache
            # MISS: not in cache
            # EXPIRED: entry expired
            # STALE: stale entry served
            # UPDATING: entry is being refreshed
            # REVALIDATED: entry revalidated
            # BYPASS: cache bypassed
            
            # When to bypass / skip storing
            proxy_cache_bypass $http_pragma $http_authorization;
            proxy_no_cache $http_pragma $http_authorization;
            
            # Proxy headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
        
        # ========== Never cache dynamic content ==========
        location ~* \.(php|jsp|cgi|asp|aspx)$ {
            proxy_pass http://backend;
            proxy_cache off;
        }
        
        # ========== Cache purge endpoint ==========
        # (requires the third-party ngx_cache_purge module)
        location ~ /purge(/.*) {
            allow 127.0.0.1;
            deny all;
            
            proxy_cache_purge proxy_cache "$scheme$request_method$host$1";
        }
    }
}

5.2 Browser Caching


server {
    listen 80;
    server_name static.example.com;
    
    root /var/www/static;
    
    # ========== Immutable (strong) caching ==========
    
    # Images, fonts (1 year)
    location ~* \.(jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|eot)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }
    
    # CSS, JS (1 month)
    location ~* \.(css|js)$ {
        expires 30d;
        add_header Cache-Control "public";
        
        # ETag for conditional revalidation
        etag on;
    }
    
    # HTML (no caching; always revalidate)
    location ~* \.html$ {
        expires -1;
        add_header Cache-Control "no-cache";
        etag on;
        if_modified_since exact;
    }
    
    # ========== Conditional (revalidation) caching ==========
    
    # Enable ETag
    etag on;
    
    # Last-Modified
    if_modified_since exact;  # exact match
}
}

5.3 Redis Cache Integration


# Requires the third-party ngx_http_redis module (compiled in)

http {
    upstream redis {
        server 127.0.0.1:6379;
        keepalive 10;
    }
    
    server {
        listen 80;
        
        location /api/ {
            set $redis_key "$uri$is_args$args";
            
            redis_pass redis;
            default_type text/html;
            
            # Redis timeouts
            redis_connect_timeout 1s;
            redis_read_timeout 1s;
            
            # Fall back to the backend when Redis misses or is down
            error_page 404 502 504 = @fallback;
        }
        
        location @fallback {
            proxy_pass http://backend;
        }
    }
}
}

5.4 Monitoring the Cache Hit Rate


# 1. Watch cache statuses in real time
tail -f /var/log/nginx/access.log | awk '{print $(NF-1)}' | sort | uniq -c

# Sample output:
#   1523 HIT
#    234 MISS
#     12 EXPIRED

# 2. Aggregate cache statuses
awk '{print $(NF-1)}' /var/log/nginx/access.log | 
    awk '{count[$1]++} END {for(i in count) print i, count[i]}'

# 3. Cache size on disk
du -sh /var/cache/nginx/*

# 4. Number of cached files
find /var/cache/nginx/proxy -type f | wc -l

# 5. Computing the hit rate
# hit rate = HIT / (HIT + MISS) × 100%
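The formula above, applied to the kind of counts the awk pipeline produces. A small sketch (how EXPIRED/STALE should count is a policy choice; here they are ignored):

```python
def cache_hit_rate(status_counts):
    """Hit rate = HIT / (HIT + MISS) * 100, per the formula above.
    Other statuses (EXPIRED, STALE, ...) are ignored here; you may
    prefer to count EXPIRED as a miss, depending on your definition."""
    hits = status_counts.get("HIT", 0)
    misses = status_counts.get("MISS", 0)
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

# counts taken from the sample output above
counts = {"HIT": 1523, "MISS": 234, "EXPIRED": 12}
print(round(cache_hit_rate(counts), 1))  # 86.7
```

A hit rate below the ~80% target from section 11.2 usually points at a cache key that is too specific or TTLs that are too short.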

6. SSL/TLS Performance Optimization

6.1 SSL Certificate Configuration


server {
    listen 443 ssl http2;
    server_name www.example.com;
    
    # ========== Certificates ==========
    
    # SSL certificate
    ssl_certificate /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    
    # Certificate chain (better compatibility)
    ssl_trusted_certificate /etc/nginx/ssl/chain.pem;
    
    # ========== Protocols and cipher suites ==========
    
    # TLS 1.2 and 1.3 only
    ssl_protocols TLSv1.2 TLSv1.3;
    
    # Prefer the server's cipher order
    ssl_prefer_server_ciphers on;
    
    # Cipher suites (balance of security and performance)
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
    
    # TLS 1.3 cipher suites
    ssl_conf_command Ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256;
    
    # ========== Session cache (the key optimization!) ==========
    
    # Shared session cache (shared across all workers)
    ssl_session_cache shared:SSL:50m;
    
    # Session timeout
    ssl_session_timeout 1d;
    
    # Session tickets
    ssl_session_tickets on;
    ssl_session_ticket_key /etc/nginx/ssl/ticket.key;
    
    # ========== OCSP stapling ==========
    
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.8.8 8.8.4.4 valid=300s;
    resolver_timeout 5s;
    
    # ========== SSL buffer ==========
    
    ssl_buffer_size 4k;  # a smaller value improves time-to-first-byte (good for small responses)
    
    # ========== HTTP/2 settings ==========
    # (the first two directives were deprecated in nginx 1.19.7;
    # keep them only on older versions)
    
    http2_max_field_size 16k;
    http2_max_header_size 32k;
    http2_max_requests 1000;
    
    # ========== Security headers ==========
    
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    
    location / {
        proxy_pass http://backend;
    }
}
}

6.2 SSL Performance Testing


# 1. SSL handshake throughput
openssl s_time -connect www.example.com:443 -new -time 10

# Output:
# 1234 connections in 10.00s; 123.4 connections/user sec

# 2. Session-reuse test
echo | openssl s_client -connect www.example.com:443 -reconnect 2>/dev/null | grep "Session-ID"

# 3. Inspect the SSL configuration
openssl s_client -connect www.example.com:443 -tls1_3

# 4. Test OCSP stapling
openssl s_client -connect www.example.com:443 -status

# 5. SSL Labs test (online)
# https://www.ssllabs.com/ssltest/

6.3 Let's Encrypt Auto-Renewal


# 1. Install Certbot
sudo apt install certbot python3-certbot-nginx

# 2. Obtain a certificate
sudo certbot --nginx -d www.example.com -d example.com

# 3. Test renewal
sudo certbot renew --dry-run

# 4. Schedule automatic renewal
sudo crontab -e
# add:
0 0,12 * * * /usr/bin/certbot renew --quiet --post-hook "systemctl reload nginx"

# 5. Check the certificate validity dates
openssl x509 -in /etc/letsencrypt/live/example.com/fullchain.pem -noout -dates

7. Load Balancing and High Availability

7.1 Load-Balancing Algorithms


upstream backend {
    # ========== Algorithm 1: round robin (default) ==========
    # Requests are distributed in order
    # server 192.168.1.101:8080;
    # server 192.168.1.102:8080;
    
    # ========== Algorithm 2: weighted round robin ==========
    # higher weight, more requests
    # server 192.168.1.101:8080 weight=3;
    # server 192.168.1.102:8080 weight=2;
    # server 192.168.1.103:8080 weight=1;
    
    # ========== Algorithm 3: least_conn ==========
    # picks the server with the fewest active connections
    least_conn;
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
    
    # ========== Algorithm 4: ip_hash ==========
    # the same client IP always hits the same server (session affinity)
    # ip_hash;
    # server 192.168.1.101:8080;
    # server 192.168.1.102:8080;
    
    # ========== Algorithm 5: consistent hashing ==========
    # hash on the request URI
    # hash $request_uri consistent;
    # server 192.168.1.101:8080;
    # server 192.168.1.102:8080;
    
    # ========== Server parameters ==========
    # weight=N          weight
    # max_fails=N       mark unavailable after N failures
    # fail_timeout=Ns   failure window / recovery time
    # backup            backup server
    # down              mark as unavailable
    # max_conns=N       connection cap
    
    server 192.168.1.101:8080 max_fails=3 fail_timeout=30s max_conns=1000;
    server 192.168.1.102:8080 max_fails=3 fail_timeout=30s max_conns=1000;
    server 192.168.1.103:8080 backup;  # backup server
    
    # ========== Keep-alive pool ==========
    keepalive 100;
    keepalive_requests 100;
    keepalive_timeout 60s;
    
    # ========== Active health checks (NGINX Plus only) ==========
    # health_check interval=5s fails=3 passes=2;
}
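For weighted round robin, nginx (since 1.3.1) uses a "smooth" variant that spreads a heavy server's turns evenly instead of bunching them together. A simplified sketch of that algorithm, without health checks or max_fails handling:

```python
class SmoothWRR:
    """Nginx-style smooth weighted round-robin, a simplified sketch.
    Each pick: every peer's current score grows by its weight, the
    highest score wins, and the winner is penalized by the weight sum."""
    def __init__(self, servers):
        # servers: list of (name, weight) tuples
        self.peers = [{"name": n, "weight": w, "current": 0} for n, w in servers]
        self.total = sum(w for _, w in servers)

    def pick(self):
        for p in self.peers:
            p["current"] += p["weight"]
        best = max(self.peers, key=lambda p: p["current"])
        best["current"] -= self.total
        return best["name"]

lb = SmoothWRR([("a", 3), ("b", 2), ("c", 1)])
print([lb.pick() for _ in range(6)])  # ['a', 'b', 'a', 'c', 'b', 'a']
```

Note how `a` (weight 3) never gets three consecutive turns; plain weighted round robin would emit `a a a b b c`, which hammers one backend in bursts.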

7.2 Session Persistence Options


# ========== Option 1: IP hash ==========
upstream backend_iphash {
    ip_hash;
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
}

# ========== Option 2: sticky cookie ==========
upstream backend_cookie {
    # requires the nginx-sticky-module-ng module
    sticky cookie srv_id expires=1h domain=.example.com path=/;
    
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
}

# ========== Option 3: custom hash ==========
upstream backend_custom {
    # hash on the session_id cookie
    hash $cookie_session_id consistent;
    
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
}

# ========== Option 4: shared backend sessions ==========
# Recommended: store sessions in Redis so nginx needs no affinity at all

7.3 Cross-Datacenter Load Balancing


# Scenario: multi-datacenter deployment with locality-aware routing

# ========== Define the clusters ==========
upstream beijing_cluster {
    zone beijing 64k;
    
    server 10.1.1.101:8080;
    server 10.1.1.102:8080;
    
    keepalive 50;
}

upstream shanghai_cluster {
    zone shanghai 64k;
    
    server 10.2.1.101:8080;
    server 10.2.1.102:8080;
    
    keepalive 50;
}

# ========== Route by client IP ==========
geo $backend_cluster {
    default shanghai_cluster;
    
    # Beijing-area IP ranges (illustrative)
    1.0.0.0/8        beijing_cluster;
    58.0.0.0/8       beijing_cluster;
    
    # Shanghai-area IP ranges (illustrative)
    60.0.0.0/8       shanghai_cluster;
    61.0.0.0/8       shanghai_cluster;
}

server {
    listen 80;
    server_name www.example.com;
    
    location / {
        proxy_pass http://$backend_cluster;
    }
}

7.4 Canary Releases


# Scenario: canary a new version with 5% of traffic

split_clients "${remote_addr}" $backend_pool {
    5%     new_version;
    *      stable_version;
}

upstream stable_version {
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
}

upstream new_version {
    server 192.168.1.201:8080;
    server 192.168.1.202:8080;
}

server {
    listen 80;
    
    location / {
        proxy_pass http://$backend_pool;
    }
}
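split_clients hashes the source string (MurmurHash2 in nginx) into a 32-bit space and carves that space into the configured percentage slices, so each client is assigned deterministically. A rough Python illustration of the idea, with md5 standing in for MurmurHash2:

```python
import hashlib

def split_clients(key, buckets):
    """Rough model of nginx split_clients: hash the key into a 32-bit
    space and map fixed percentage slices onto it. nginx uses
    MurmurHash2; md5 here is only for illustration."""
    h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)  # 0 .. 2^32-1
    point = h / 2**32 * 100
    acc = 0.0
    for pct, name in buckets:
        if pct is None:        # the '*' catch-all bucket
            return name
        acc += pct
        if point < acc:
            return name
    return buckets[-1][1]

# mirrors the 5% / * config above
buckets = [(5, "new_version"), (None, "stable_version")]
hits = sum(split_clients("10.0.%d.%d" % (i // 256, i % 256), buckets)
           == "new_version" for i in range(10000))
print(hits)  # roughly 500 of 10000, i.e. about 5%
```

The useful property is stickiness: the same `$remote_addr` always lands in the same bucket, so a given user sees a consistent version across requests.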

7.5 High Availability with Keepalived


# ========== Architecture ==========
# Nginx master (VRRP priority 100) + Keepalived
# Nginx backup (VRRP priority 90)  + Keepalived
# Virtual IP: 192.168.1.100

# ========== Install Keepalived ==========
sudo apt install keepalived

# ========== Master configuration ==========
# /etc/keepalived/keepalived.conf
global_defs {
    router_id LB_MASTER
}

vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -20
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    
    virtual_ipaddress {
        192.168.1.100
    }
    
    track_script {
        check_nginx
    }
}

# ========== Backup configuration ==========
# Same as above, except:
# state BACKUP
# priority 90

# ========== Health-check script ==========
# /etc/keepalived/check_nginx.sh
#!/bin/bash
# If nginx is down, try restarting it; if it still will not start,
# stop keepalived so the VIP fails over to the backup node.
counter=$(ps -C nginx --no-heading|wc -l)
if [ $counter -eq 0 ]; then
    systemctl start nginx
    sleep 2
    counter=$(ps -C nginx --no-heading|wc -l)
    if [ $counter -eq 0 ]; then
        systemctl stop keepalived
    fi
fi

chmod +x /etc/keepalived/check_nginx.sh

# ========== Start the service ==========
sudo systemctl enable keepalived
sudo systemctl start keepalived

# ========== Verify ==========
ip addr show eth0 | grep 192.168.1.100

8. Monitoring and Tuning

8.1 Nginx Monitoring Metrics


# ========== Enable stub_status ==========
server {
    listen 8080;
    server_name localhost;
    
    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}

# Query: curl http://localhost:8080/nginx_status
# Output:
# Active connections: 291
# server accepts handled requests
#  16630948 16630948 31070465
# Reading: 6 Writing: 179 Keepalive: 106

# What the fields mean:
# Active connections: currently active connections
# accepts: total accepted connections
# handled: total handled connections
# requests: total requests
# Reading: connections reading the request header
# Writing: connections writing a response
# Keepalive: idle keep-alive connections

8.2 Prometheus Monitoring


# ========== Install nginx-prometheus-exporter ==========
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
tar -xzf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
sudo mv nginx-prometheus-exporter /usr/local/bin/

# ========== Start the exporter ==========
nginx-prometheus-exporter -nginx.scrape-uri=http://localhost:8080/nginx_status

# ========== Prometheus scrape config ==========
# prometheus.yml
scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']
        labels:
          instance: 'web-server-1'

# ========== Grafana dashboard ==========
# Dashboard ID: 12708 (Nginx Overview)

8.3 Log Analysis


# ========== Analyzing the access log ==========

# 1. Live QPS
tail -f /var/log/nginx/access.log | pv -l -i 1 -r > /dev/null

# 2. Status-code breakdown
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# 3. Response-time stats (assumes $request_time is the last field)
awk '{print $NF}' /var/log/nginx/access.log | 
    awk '{sum+=$1; count++} END {print "Avg:", sum/count, "Count:", count}'

# 4. Top 10 request URIs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# 5. Top 10 client IPs
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# 6. Slow requests (response time > 1 second)
awk '$NF > 1 {print $0}' /var/log/nginx/access.log

# 7. Error-log summary
grep -E "error|warn" /var/log/nginx/error.log | awk '{print $9}' | sort | uniq -c

8.4 Profiling Tools


# ========== 1. strace: system-call profile ==========
sudo strace -p $(pgrep nginx | head -1) -c

# ========== 2. perf: CPU profile ==========
sudo perf record -p $(pgrep nginx | head -1) -g -- sleep 10
sudo perf report

# ========== 3. FlameGraph ==========
git clone https://github.com/brendangregg/FlameGraph
sudo perf record -F 99 -p $(pgrep nginx | head -1) -g -- sleep 30
sudo perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > nginx.svg

# ========== 4. SystemTap ==========
# Trace Nginx function calls
sudo stap -e 'probe process("/usr/sbin/nginx").function("*") {
    printf("%s -> %s\n", thread_indent(1), probefunc())
}'

9. Case Studies

9.1 Case 1: From 5K to 50K QPS

Starting point

- Configuration: defaults
- Performance: 5,000 QPS
- CPU: 40%
- Memory: 2GB

Optimization steps


# Step 1: tune the workers
worker_processes auto;  # 8-core CPU
worker_connections 65535;
worker_rlimit_nofile 65535;

# Result: QPS up to 8,000 (+60%)

# Step 2: enable backend keep-alive
upstream backend {
    keepalive 100;
}
proxy_http_version 1.1;
proxy_set_header Connection "";

# Result: QPS up to 15,000 (+87%)

# Step 3: enable caching
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=cache:100m;
proxy_cache cache;
proxy_cache_valid 200 1h;

# Result: QPS up to 35,000 (+133%), 80% cache hit rate

# Step 4: system-level tuning
# BBR + kernel parameters
echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf
sysctl -p

# Result: QPS up to 45,000 (+28%)

# Step 5: gzip compression
gzip on;
gzip_comp_level 6;
gzip_types text/plain text/css application/json;

# Result: bandwidth down 60%, QPS stable at 50,000

# Final numbers:
# - QPS: 50,000 (+900%)
# - Latency (P99): 45ms
# - CPU: 65%
# - Memory: 3GB

9.2 Case 2: E-commerce Flash Sale

Scenario

- Expected: 1 million concurrent users
- Peak QPS: 50,000
- Static assets: images, CSS, JS

Optimization plan


# ========== 1. Separate static assets ==========
server {
    listen 80;
    server_name static.example.com;
    
    root /data/static;
    
    location ~* \.(jpg|png|css|js)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
        
        # Serve pre-compressed files
        gzip_static on;
        
        # Zero-copy
        sendfile on;
        tcp_nopush on;
    }
}

# ========== 2. API rate limiting ==========
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;

server {
    listen 80;
    server_name api.example.com;
    
    location /api/ {
        limit_req zone=api_limit burst=200 nodelay;
        proxy_pass http://backend;
    }
}

# ========== 3. Maintenance / degradation switch ==========
# Only whitelisted IPs get through; everyone else receives the
# maintenance page while the switch is active.
geo $is_whitelisted {
    default 0;
    # whitelisted IP
    192.168.1.100 1;
}

server {
    listen 80;
    
    if ($is_whitelisted = 0) {
        return 503;
    }
    
    error_page 503 @maintenance;
    
    location @maintenance {
        root /usr/share/nginx/html;
        rewrite ^(.*)$ /maintenance.html break;
    }
}
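The limit_req directive used above implements a leaky-bucket limiter: requests drain from the bucket at the configured rate, up to `burst` excess requests are tolerated (served immediately when `nodelay` is set), and anything beyond that is rejected with a 503. A deterministic sketch of that accounting with simulated timestamps (not nginx's actual code, which tracks excess in milliseconds per zone key, but the same drain-and-compare idea):

```python
class LeakyBucket:
    """Sketch of limit_req accounting for rate=100r/s burst=200."""
    def __init__(self, rate, burst, start=0.0):
        self.rate = rate      # requests per second
        self.burst = burst    # tolerated excess
        self.excess = 0.0
        self.last = start

    def allow(self, now):
        # the bucket drains continuously at `rate` requests/second
        self.excess = max(0.0, self.excess - (now - self.last) * self.rate)
        self.last = now
        if self.excess > self.burst:
            return False      # rejected: 503 (see limit_req_status)
        self.excess += 1
        return True

bucket = LeakyBucket(rate=100, burst=200)
# 500 requests arriving at the same instant: 1 + burst pass, the rest fail
print(sum(bucket.allow(now=0.0) for _ in range(500)))  # 201
# one second later, 100 requests' worth of capacity has drained back
print(bucket.allow(now=1.0))  # True
```

This is why `burst` matters for flash sales: without it, any two requests from the same client closer together than 10ms (at 100r/s) would be rejected.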

Results

✅ Sustained 50K QPS throughout the event
✅ 95% static-asset cache hit rate
✅ Rate limiting prevented API avalanches
✅ The degradation switch protected core services

9.3 Case 3: Cross-Region Acceleration

Scenario

- Users spread across Beijing, Shanghai, and Shenzhen
- Servers: a single Beijing datacenter
- Problem: high latency (150ms+) for southern users

Solution: SD-WAN multi-datacenter deployment


# ========== Architecture ==========
# Beijing DC: primary (10.168.1.100)
# Shanghai DC: replica (10.168.2.100)
# Shenzhen DC: replica (10.168.3.100)
# Interconnected over the StarryLink (星空组网) SD-WAN

# ========== Step 1: deploy the SD-WAN mesh ==========
# Install the StarryLink client in all three datacenters
curl -O https://dl.starrylink.cn/install.sh
sudo bash install.sh

# Join the same network
sudo starrylink-cli network join <network-id>

# Verify connectivity
ping 10.168.2.100
ping 10.168.3.100

# ========== Step 2: configure Nginx load balancing ==========
# Beijing (primary) configuration
upstream multi_region {
    # Prefer the local datacenter
    server 127.0.0.1:8080 weight=10;
    
    # Other datacenters as backups (reached over the SD-WAN)
    server 10.168.2.100:8080 weight=1 backup;
    server 10.168.3.100:8080 weight=1 backup;
    
    keepalive 50;
}

# ========== Step 3: geo-aware DNS ==========
# Use a DNS provider with regional resolution (e.g. DNSPod)
# Beijing users  → beijing.example.com  → 10.168.1.100
# Shanghai users → shanghai.example.com → 10.168.2.100
# Shenzhen users → shenzhen.example.com → 10.168.3.100

# ========== Step 4: data synchronization ==========
# rsync static assets over the SD-WAN
# Beijing → Shanghai
rsync -avz --bwlimit=10000 /data/static/ \
    admin@10.168.2.100:/data/static/

# Beijing → Shenzhen
rsync -avz --bwlimit=10000 /data/static/ \
    admin@10.168.3.100:/data/static/

Results

Region    Latency before  Latency after  Improvement
Beijing   15ms            12ms           20%
Shanghai  150ms           20ms           87%
Shenzhen  180ms           25ms           86%

Key advantages

✅ P2P direct links cut latency by 85%+
✅ Automatic NAT traversal, no public IPs to configure
✅ One unified virtual network, simple to manage
✅ Encrypted transport

10. Performance Tuning Checklist

10.1 Basic Configuration Checks


# ========== Nginx configuration ==========
- [ ] worker_processes = auto or the CPU core count
- [ ] worker_connections >= 10000
- [ ] worker_rlimit_nofile >= 65535
- [ ] use epoll (Linux)
- [ ] multi_accept on
- [ ] sendfile on
- [ ] tcp_nopush on
- [ ] tcp_nodelay on
- [ ] keepalive_timeout set sensibly (30-65s)
- [ ] gzip on (compression level 5-6)
- [ ] access_log buffered or disabled (high concurrency)
- [ ] open_file_cache configured

# ========== System configuration ==========
- [ ] ulimit -n >= 65535
- [ ] net.core.somaxconn >= 65535
- [ ] net.ipv4.tcp_max_syn_backlog >= 65535
- [ ] net.ipv4.ip_local_port_range = 1024 65535
- [ ] net.ipv4.tcp_tw_reuse = 1
- [ ] net.ipv4.tcp_fin_timeout <= 30
- [ ] net.ipv4.tcp_congestion_control = bbr
- [ ] fs.file-max >= 2097152
- [ ] vm.swappiness = 0 (optional)

# ========== Reverse proxy ==========
- [ ] proxy_buffering on
- [ ] proxy_http_version 1.1
- [ ] proxy_set_header Connection ""
- [ ] upstream keepalive configured
- [ ] proxy_cache enabled (where applicable)
- [ ] proxy_next_upstream configured for failover

# ========== SSL/TLS ==========
- [ ] ssl_session_cache shared:SSL:50m
- [ ] ssl_session_timeout >= 1h
- [ ] ssl_protocols TLSv1.2 TLSv1.3
- [ ] ssl_stapling on
- [ ] http2 enabled

# ========== Monitoring ==========
- [ ] stub_status enabled
- [ ] log format includes $request_time
- [ ] Prometheus exporter deployed
- [ ] Grafana dashboards configured
- [ ] alert rules in place

10.2 Performance Reference Table

Concurrency  QPS        Key configuration
1K           <10K       basic tuning is enough
5K           10K-50K    + keep-alive pools + system tuning
10K          50K-100K   + caching + load balancing
50K          100K-500K  + multiple datacenters + CDN
100K+        500K+      + dedicated solutions (LVS/F5)

10.3 Troubleshooting Checklist


# ========== Problem 1: QPS will not climb ==========
1. Check CPU usage (pegged at 100%?)
2. Check network bandwidth (saturated?)
3. Check file descriptors ("Too many open files")
4. Check backend response times
5. Check whether keep-alive is enabled

# ========== Problem 2: high latency ==========
1. Compare $request_time and $upstream_response_time
2. Check disk I/O (iowait)
3. Check network latency (ping/mtr)
4. Check DNS resolution time
5. Check SSL handshake time

# ========== Problem 3: abnormal connection counts ==========
1. Count TIME_WAIT sockets (netstat -an | grep TIME_WAIT | wc -l)
2. Check whether tcp_tw_reuse is enabled
3. Check the keepalive_timeout setting
4. Check backend keep-alive

# ========== Problem 4: 502/504 errors ==========
1. Check that the backend service is alive
2. Check the proxy_connect_timeout setting
3. Check the backend logs
4. Check network connectivity
5. Check SELinux/firewall rules

# ========== Problem 5: caching not working ==========
1. Check the X-Cache-Status response header
2. Check the cache directory size
3. Check the proxy_cache_valid settings
4. Check the backend's Cache-Control headers
5. Check error.log

11. Summary

11.1 Optimization Priorities


1. [High] System-level tuning (file descriptors, kernel parameters)
   ├─ ROI: ⭐⭐⭐⭐⭐
   └─ Difficulty: ⭐⭐

2. [High] Basic Nginx configuration (workers, keep-alive)
   ├─ ROI: ⭐⭐⭐⭐⭐
   └─ Difficulty: ⭐

3. [Medium] Caching strategy (proxy_cache, browser caching)
   ├─ ROI: ⭐⭐⭐⭐
   └─ Difficulty: ⭐⭐⭐

4. [Medium] SSL/TLS tuning (session reuse, OCSP stapling)
   ├─ ROI: ⭐⭐⭐
   └─ Difficulty: ⭐⭐

5. [Low] Network-layer tuning (NIC multi-queue, ring buffers)
   ├─ ROI: ⭐⭐
   └─ Difficulty: ⭐⭐⭐⭐

11.2 Key Performance Indicators

Metric          Target                 How to monitor
QPS             >50K                   wrk/ab
Latency (P99)   <50ms                  $request_time
Error rate      <0.01%                 4xx/5xx counts
Cache hit rate  >80%                   $upstream_cache_status
CPU usage       <70%                   top/htop
Connections     < worker_connections   stub_status

11.3 Next Steps

1. Establish a performance baseline
2. Apply the basic optimizations from this article
3. Deploy monitoring (Prometheus + Grafana)
4. Set up alerting rules
5. Load-test to verify the gains
6. Keep iterating based on monitoring data
7. Document your configuration and lessons learned

References

- Nginx documentation: http://nginx.org/en/docs/
- Tuning NGINX for Performance: https://www.nginx.com/blog/tuning-nginx/
- Linux kernel networking docs: https://www.kernel.org/doc/Documentation/networking/
- BBR congestion control: https://github.com/google/bbr
- TCP/IP Illustrated, W. Richard Stevens
- High Performance Browser Networking, Ilya Grigorik
- nginx-prometheus-exporter: https://github.com/nginxinc/nginx-prometheus-exporter

If this article helped you, a like or a bookmark is appreciated! ⭐

Questions are welcome in the comments 👇
