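# Telegram Management System - Operations Manual

This manual gives detailed guidance for the day-to-day operation of the Telegram Management System: routine tasks, monitoring, troubleshooting, performance tuning, security management, backup and recovery, version updates, and emergency response.

## Table of Contents

- [Daily Operations](#daily-operations)
- [System Monitoring](#system-monitoring)
- [Troubleshooting](#troubleshooting)
- [Performance Tuning](#performance-tuning)
- [Security Management](#security-management)
- [Backup and Recovery](#backup-and-recovery)
- [Version Updates](#version-updates)
- [Emergency Response](#emergency-response)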

## Daily Operations

### Service Status Checks

**Check the application service**:

```bash
# PM2 process status
pm2 status
pm2 monit          # interactive dashboard; exit with Ctrl+C

# Check processes (the [n]/[t] bracket stops grep from matching its own command line)
ps aux | grep '[n]ode'
ps aux | grep '[t]elegram-management'

# Check that port 3000 is listening (ss is the modern replacement for netstat)
sudo netstat -tlnp | grep :3000
sudo ss -tlnp | grep :3000

# Check that the service responds
curl -I http://localhost:3000/health
curl -s http://localhost:3000/health/detailed | jq .
```

**Check the database**:

```bash
# MySQL service status
sudo systemctl status mysql
mysqladmin -u root -p status
mysqladmin -u root -p processlist

# Connection counts (GLOBAL: a plain SHOW STATUS reports session-scoped values)
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections';"

# Slow query counter (cumulative since the server started)
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';"
```

**Check Redis**:

```bash
# Redis service status
sudo systemctl status redis
redis-cli ping        # should reply PONG

# Redis information
redis-cli info server
redis-cli info memory
redis-cli info stats

# Client connections
redis-cli info clients
```
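
`info memory` prints raw byte counts, while alerts are usually set on utilisation. The helper below is a sketch (the function name `redis_mem_pct` is hypothetical) that turns `used_memory` and `maxmemory` into a percentage. It assumes `redis-cli` is writing to a pipe, in which case INFO lines end in `\r\n`, so the carriage returns are stripped before parsing.

```bash
#!/usr/bin/env bash
# redis_mem_pct: read `redis-cli info memory` output on stdin and print
# used_memory as an integer percentage of maxmemory, or "unlimited"
# when maxmemory is 0 (no memory limit configured).
redis_mem_pct() {
  awk -F: '
    { sub(/\r$/, "") }                  # INFO uses CRLF line endings
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max == 0) { print "unlimited"; exit }
      printf "%d\n", used * 100 / max
    }'
}

# Usage:
# redis-cli info memory | redis_mem_pct
```

If the result is `unlimited`, eviction policies such as `allkeys-lru` never trigger, because Redis only evicts once `maxmemory` is reached.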

### Log Management

**View application logs**:

```bash
# PM2 logs
pm2 logs telegram-management-backend
pm2 logs telegram-management-backend --lines 100

# Application log files (paths relative to /var/www/telegram-management)
tail -f backend/logs/app.log
tail -f backend/logs/error.log
tail -f backend/logs/access.log

# Filter error entries
grep -i error backend/logs/app.log
grep -i "500\|error\|exception" backend/logs/access.log
```

**View system logs**:

```bash
# systemd journal. The first command only applies if the backend is also
# registered as a systemd unit; under PM2 use `pm2 logs` instead.
sudo journalctl -u telegram-management-backend -f
sudo journalctl -u mysql -f
sudo journalctl -u redis -f

# Nginx logs
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log
```
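
Grepping an access log for the string `500` also matches response sizes, URLs and timestamps that merely contain those digits. If Nginx uses its default `combined` log format (an assumption: check the `log_format` directive in your configuration), the status code is the ninth whitespace-separated field, so counting by field is more reliable. The function name below is illustrative.

```bash
#!/usr/bin/env bash
# count_5xx: count requests with a 5xx status in an Nginx access log that
# uses the default "combined" format, where field $9 is the status code.
# Reads the files given as arguments, or stdin when none are given.
count_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }' "$@"
}

# Usage (sudo cannot run shell functions, so elevate only the read):
# sudo cat /var/log/nginx/access.log | count_5xx
```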

**Log rotation**:

```bash
# Force an immediate rotation
sudo logrotate -f /etc/logrotate.d/telegram-management

# Dry run: -d (debug mode) shows what would be rotated without changing anything
sudo logrotate -d /etc/logrotate.d/telegram-management

# Delete rotated logs older than 30 days
find backend/logs -name "*.log.*" -mtime +30 -delete
```
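
The commands above assume a rule file at `/etc/logrotate.d/telegram-management`, whose contents are not shown here. One possible rule set is sketched below as a shell function that prints the config, so it can be reviewed before installing. The function name, the log directory and the 30-day retention are assumptions. `copytruncate` is chosen because a Node.js process typically keeps its log file descriptors open and would otherwise keep writing to the renamed file; the trade-off is that a few lines written during the copy can be lost.

```bash
#!/usr/bin/env bash
# render_logrotate_conf: print a logrotate rule for the backend logs.
#   $1 - log directory (absolute path)
#   $2 - number of daily rotations to keep
render_logrotate_conf() {
  local dir=$1 keep=$2
  cat <<EOF
${dir}/*.log {
    daily
    rotate ${keep}
    missingok
    notifempty
    compress
    delaycompress
    copytruncate
}
EOF
}

# Review, install, then dry-run:
# render_logrotate_conf /var/www/telegram-management/backend/logs 30 \
#   | sudo tee /etc/logrotate.d/telegram-management
# sudo logrotate -d /etc/logrotate.d/telegram-management
```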

### Disk Space Management

**Check disk usage**:

```bash
# Filesystem usage
df -h
du -sh /var/www/telegram-management/*

# Find files larger than 100 MB
find /var/www/telegram-management -type f -size +100M -exec ls -lh {} \;

# Size of each top-level directory
du -h --max-depth=1 /var/www/telegram-management/
```

**Clean up temporary files**:

```bash
# Application temp files (caution: this also deletes files a running task may still be using)
rm -rf backend/tmp/*
rm -rf backend/sessions/tmp_*

# System temp files
sudo rm -rf /tmp/telegram-*
sudo rm -rf /var/tmp/telegram-*

# npm cache
npm cache clean --force
```
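
Deleting everything under `tmp/` while the backend is running can remove files that an active task still needs. A more conservative variant, sketched below, removes only regular files that have not been modified for a given number of days. The function name and the one-day default are illustrative choices, not part of the application.

```bash
#!/usr/bin/env bash
# clean_old_tmp: delete regular files under directory $1 that were last
# modified more than $2 days ago (default 1), leaving newer files alone.
# Prints each deleted path.
clean_old_tmp() {
  local dir=$1 days=${2:-1}
  [ -d "$dir" ] || return 0          # nothing to do if the directory is missing
  find "$dir" -type f -mmin +"$((days * 1440))" -print -delete
}

# Usage:
# clean_old_tmp backend/tmp 1
```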

### Database Maintenance

**Routine maintenance**:

```bash
# Rebuild tables and reclaim space. For InnoDB, OPTIMIZE TABLE is carried out as a
# table rebuild plus ANALYZE; run it off-peak on large tables.
mysql -u root -p -e "OPTIMIZE TABLE telegram_management.group_tasks;"
mysql -u root -p -e "OPTIMIZE TABLE telegram_management.tg_account_pool;"
mysql -u root -p -e "OPTIMIZE TABLE telegram_management.risk_logs;"

# Refresh index statistics
mysql -u root -p -e "ANALYZE TABLE telegram_management.group_tasks;"

# Check table integrity
mysql -u root -p -e "CHECK TABLE telegram_management.group_tasks;"

# Repair a table (only if needed). REPAIR TABLE applies to MyISAM, ARCHIVE and CSV;
# InnoDB does not support it, so restore an InnoDB table from backup instead.
mysql -u root -p -e "REPAIR TABLE telegram_management.group_tasks;"
```

**Purge historical data**:

```sql
-- Delete risk-control logs older than 30 days
DELETE FROM risk_logs WHERE createdAt < DATE_SUB(NOW(), INTERVAL 30 DAY);

-- Delete anomaly logs older than 90 days
DELETE FROM anomaly_logs WHERE createdAt < DATE_SUB(NOW(), INTERVAL 90 DAY);

-- Delete completed task records (keep 6 months)
DELETE FROM group_tasks
WHERE status = 'completed'
  AND completedAt < DATE_SUB(NOW(), INTERVAL 6 MONTH);

-- Reclaim table space after the deletes
OPTIMIZE TABLE risk_logs, anomaly_logs, group_tasks;
```
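
On a large table, a single `DELETE` holds its locks for the whole statement and can stall other writers and replication. A common alternative is to delete in fixed-size batches until nothing is left. The sketch below assumes the MySQL client can authenticate without a prompt (for example through a `[client]` section in `~/.my.cnf`); the function names and the 5000-row batch size are illustrative.

```bash
#!/usr/bin/env bash
# batch_delete_sql: build one batched DELETE that removes up to $4 rows of
# table $1 whose column $2 is older than $3 days, then reports ROW_COUNT().
batch_delete_sql() {
  local table=$1 column=$2 days=$3 limit=$4
  printf 'DELETE FROM %s WHERE %s < DATE_SUB(NOW(), INTERVAL %d DAY) LIMIT %d; SELECT ROW_COUNT();\n' \
    "$table" "$column" "$days" "$limit"
}

# purge_in_batches: run batches against telegram_management until a batch
# deletes nothing. Credentials come from the client option file (assumption).
purge_in_batches() {
  local sql deleted
  sql=$(batch_delete_sql "$@")
  while :; do
    deleted=$(mysql -N telegram_management -e "$sql") || return 1
    echo "deleted ${deleted} rows from $1"
    [ "$deleted" -gt 0 ] || break
    sleep 1    # leave room for replication and other queries between batches
  done
}

# Usage:
# purge_in_batches risk_logs createdAt 30 5000
```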

## System Monitoring

### Key Metrics

**System resource monitoring script** (`monitor.sh`):

```bash
#!/bin/bash

LOG_FILE="/var/log/telegram-management-monitor.log"
ALERT_THRESHOLD_CPU=80
ALERT_THRESHOLD_MEM=85
ALERT_THRESHOLD_DISK=90

# Collect metrics
# CPU: total usage = 100 - idle, where idle ("id") is column 15 of a 1-second vmstat sample
CPU_USAGE=$(vmstat 1 2 | tail -1 | awk '{print 100 - $15}')
# Memory: used / total
MEM_USAGE=$(free | awk '/^Mem:/ {printf("%.2f", $3 / $2 * 100.0)}')
# Disk: usage of the root filesystem, without the % sign
DISK_USAGE=$(df -P / | awk 'NR==2 {sub(/%/, "", $5); print $5}')

# Record metrics
echo "$(date '+%Y-%m-%d %H:%M:%S') - CPU: ${CPU_USAGE}%, MEM: ${MEM_USAGE}%, DISK: ${DISK_USAGE}%" >> "$LOG_FILE"

# Check alert conditions (bc handles the floating-point comparisons)
if (( $(echo "$CPU_USAGE > $ALERT_THRESHOLD_CPU" | bc -l) )); then
    echo "ALERT: High CPU usage: ${CPU_USAGE}%" | logger -t telegram-management
fi

if (( $(echo "$MEM_USAGE > $ALERT_THRESHOLD_MEM" | bc -l) )); then
    echo "ALERT: High memory usage: ${MEM_USAGE}%" | logger -t telegram-management
fi

if [ "$DISK_USAGE" -gt "$ALERT_THRESHOLD_DISK" ]; then
    echo "ALERT: High disk usage: ${DISK_USAGE}%" | logger -t telegram-management
fi
```
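
`monitor.sh` relies on `bc`, which is not installed on every minimal server image. Where it is missing, the same floating-point comparison can be done in `awk`, as in the sketch below (the helper name `exceeds` is hypothetical).

```bash
#!/usr/bin/env bash
# exceeds: succeed (exit 0) when value $1 is strictly greater than
# threshold $2. Both may be decimals; awk performs the float comparison.
exceeds() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Usage inside monitor.sh:
# if exceeds "$CPU_USAGE" "$ALERT_THRESHOLD_CPU"; then
#     echo "ALERT: High CPU usage: ${CPU_USAGE}%" | logger -t telegram-management
# fi
```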

**Application performance checks**:

```bash
# HTTP response time
curl -o /dev/null -s -w "Response time: %{time_total}s\n" http://localhost:3000/health

# Database connections (without the PROCESS privilege, tg_user only sees its own threads)
mysql -u tg_user -p -e "SELECT COUNT(*) AS active_connections FROM information_schema.processlist;"

# Redis latency, one summary line per interval (Ctrl+C to stop)
redis-cli --latency-history -i 1

# PM2 process details (memory, restarts, uptime)
pm2 show telegram-management-backend
```
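
A single `curl` timing is noisy. Sampling the endpoint several times and reading a percentile gives a steadier picture. The helper below computes a nearest-rank percentile from numbers on stdin; the function name and the 20-sample loop in the usage comment are illustrative.

```bash
#!/usr/bin/env bash
# percentile: print the P-th percentile (nearest-rank method) of the
# numbers read from stdin, one per line. Fails on empty input.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      x = p * NR / 100
      rank = int(x)
      if (rank < x) rank++          # round up: nearest-rank definition
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Usage: 20 samples of /health, then the p95 latency in seconds
# for i in $(seq 20); do
#   curl -o /dev/null -s -w '%{time_total}\n' http://localhost:3000/health
# done | percentile 95
```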

### Automated Monitoring Scripts

**Health check script** (`health-check.sh`):

```bash
#!/bin/bash

SERVICE_NAME="telegram-management-backend"
HEALTH_URL="http://localhost:3000/health"
EMAIL_ALERT="admin@yourdomain.com"

# Check the PM2 process
if ! pm2 list | grep -q "$SERVICE_NAME.*online"; then
    echo "Service $SERVICE_NAME is not running, restarting..."
    pm2 restart "$SERVICE_NAME"

    # Give the service time to start
    sleep 10

    # Check again
    if ! pm2 list | grep -q "$SERVICE_NAME.*online"; then
        echo "Restart failed, sending alert email"
        echo "Service $SERVICE_NAME failed to restart; investigate immediately" | mail -s "URGENT: service down" "$EMAIL_ALERT"
    fi
fi

# Check the HTTP health endpoint (--max-time keeps a hung service from blocking the script)
HTTP_CODE=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$HEALTH_URL")
if [ "$HTTP_CODE" != "200" ]; then
    echo "Health check failed, HTTP status: $HTTP_CODE"
    echo "Health check failed, HTTP status: $HTTP_CODE" | mail -s "ALERT: health check failed" "$EMAIL_ALERT"
fi

# Check the database connection. DB_PASSWORD must be exported for cron jobs,
# which do not load your interactive shell profile.
if [ -z "${DB_PASSWORD:-}" ]; then
    echo "DB_PASSWORD is not set; skipping database check"
elif ! mysql -u tg_user -p"$DB_PASSWORD" -e "SELECT 1;" &> /dev/null; then
    echo "Database connection failed"
    echo "Database connection failed; check the database service" | mail -s "ALERT: database connection failed" "$EMAIL_ALERT"
fi

# Check the Redis connection (compare the reply: errors such as NOAUTH are not PONG)
if [ "$(redis-cli ping 2>/dev/null)" != "PONG" ]; then
    echo "Redis connection failed"
    echo "Redis connection failed; check the Redis service" | mail -s "ALERT: Redis connection failed" "$EMAIL_ALERT"
fi
```
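
Run every five minutes, `health-check.sh` sends a fresh email on every failed run for as long as an outage lasts. One way to rate-limit alerts is a per-alert timestamp file, sketched below. The function name, the state directory `/var/tmp/tm-alert-state` and the one-hour cooldown are hypothetical choices.

```bash
#!/usr/bin/env bash
# should_alert: succeed at most once per cooldown window for a given key.
#   $1 - alert key (e.g. "redis-down"), $2 - cooldown in seconds,
#   $3 - state directory holding one timestamp file per key.
should_alert() {
  local key=$1 cooldown=$2 dir=$3 now last
  now=$(date +%s)
  mkdir -p "$dir"
  if [ -f "$dir/$key" ]; then
    last=$(cat "$dir/$key")
    [ $((now - last)) -ge "$cooldown" ] || return 1   # still cooling down
  fi
  echo "$now" > "$dir/$key"                          # record this alert
}

# Usage inside health-check.sh:
# if should_alert redis-down 3600 /var/tmp/tm-alert-state; then
#     echo "Redis connection failed" | mail -s "ALERT: Redis connection failed" "$EMAIL_ALERT"
# fi
```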

**Scheduled jobs**:

```bash
# Edit the crontab
crontab -e

# Add the following entries:
# Check system resources every minute
* * * * * /path/to/monitor.sh

# Run the health check every 5 minutes
*/5 * * * * /path/to/health-check.sh

# Back up important data every hour
0 * * * * /path/to/backup.sh

# Clean up logs daily at 02:00
0 2 * * * /path/to/cleanup-logs.sh
```

## Troubleshooting

### Common Failures

**Service not responding**:

```bash
# 1. Check process status
pm2 status
ps aux | grep '[n]ode'

# 2. Check what is holding port 3000
sudo netstat -tlnp | grep :3000
sudo lsof -i :3000

# 3. Check system resources
top
free -h
df -h

# 4. Read the error logs
pm2 logs telegram-management-backend --err
tail -f backend/logs/error.log

# 5. Restart the service
pm2 restart telegram-management-backend
```

**Database connection problems**:

```bash
# 1. Check the MySQL service (collect diagnostics before restarting)
sudo systemctl status mysql
sudo systemctl restart mysql

# 2. Compare current connections with the limit
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"

# 3. Lock problems: the most recent deadlock, then transactions currently waiting on locks
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G" | grep -A 20 "LATEST DETECTED DEADLOCK"
mysql -u root -p -e "SELECT * FROM sys.innodb_lock_waits\G"

# 4. Slow queries (the log path is set by slow_query_log_file)
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';"
tail -f /var/log/mysql/slow.log
```
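
**Diagnosing memory leaks**:

```bash
# Resolve the backend PID through PM2 (fork mode; in cluster mode this
# prints one PID per instance)
PID=$(pm2 pid telegram-management-backend)

# 1. Write a heap snapshot. Node only does this on a signal if it was started
#    with --heapsnapshot-signal=SIGUSR2 (e.g. via node_args in ecosystem.config.js);
#    without that flag, SIGUSR2 terminates the process. The .heapsnapshot file
#    is written to the process's working directory.
kill -USR2 "$PID"

# 2. Analyse memory in Chrome DevTools: SIGUSR1 makes the running process open
#    the inspector on 127.0.0.1:9229 (no second instance competing for port 3000).
#    Then connect from chrome://inspect and take or compare heap snapshots.
kill -USR1 "$PID"

# 3. Watch memory growth (VSZ and RSS in KB, every minute)
while true; do
    ps -p "$PID" -o pid,vsz,rss,comm
    sleep 60
done

# 4. Restart to release memory (a stopgap, not a fix)
pm2 restart telegram-management-backend
```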
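
For a longer record of memory growth, reading `VmRSS` from `/proc` avoids spawning `ps` each time and produces output that is easy to graph. The sketch below is Linux-only; the function name `rss_kb` and the CSV path in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# rss_kb: print the resident set size (in kB) of process $1, read from
# the VmRSS line of /proc/<pid>/status. Fails if the process is gone.
rss_kb() {
  awk '$1 == "VmRSS:" { print $2; found = 1 } END { exit !found }' "/proc/$1/status" 2>/dev/null
}

# Usage: append "timestamp,rss_kb" every minute until the process exits
# PID=$(pm2 pid telegram-management-backend)
# while rss=$(rss_kb "$PID"); do
#     echo "$(date '+%F %T'),$rss" >> /tmp/backend-rss.csv
#     sleep 60
# done
```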

### Incident Handling

**Severity levels**:

- **P0 (critical)**: service completely unavailable
- **P1 (major)**: core functionality broken
- **P2 (moderate)**: some features broken
- **P3 (minor)**: performance degradation or warnings

**Handling a P0 incident**:

```bash
# 1. Assess the impact immediately
curl -I http://localhost:3000/health
pm2 status

# 2. Restore service quickly
pm2 restart telegram-management-backend

# 3. Check the critical dependencies
sudo systemctl status mysql redis nginx

# 4. If service cannot be restored quickly, switch to the fallback
# (depending on the situation, this may mean failing over to a standby server)

# 5. Record the incident
echo "$(date): P0 incident - service unavailable" | sudo tee -a /var/log/incidents.log > /dev/null
```

**Diagnosing performance problems**:

```bash
PID=$(pm2 pid telegram-management-backend)

# 1. CPU analysis
top -p "$PID"
# perf only resolves JavaScript function names if node runs with --perf-basic-prof
sudo perf top -p "$PID"

# 2. Database analysis
mysql -u root -p -e "SHOW FULL PROCESSLIST;"
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"

# 3. Redis analysis
redis-cli --bigkeys
# --hotkeys only works when maxmemory-policy is an LFU policy (allkeys-lfu or volatile-lfu)
redis-cli --hotkeys
# MONITOR streams every command and can noticeably slow a busy server; run it briefly
redis-cli monitor

# 4. Network analysis
netstat -i
sudo iftop
```
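
## Performance Tuning

### Application Layer

**Node.js runtime flags**:

```bash
# Raise the V8 old-generation heap limit to 4 GB (make sure the host has the RAM).
# --optimize-for-size reduces memory use at the cost of speed; drop it if throughput matters more.
pm2 start ecosystem.config.js --node-args="--max-old-space-size=4096 --optimize-for-size"

# Heap limit via the environment. NODE_OPTIONS accepts --max-old-space-size,
# but Node refuses to start if --optimize-for-size is placed there.
export NODE_OPTIONS="--max-old-space-size=4096"
```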
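
**Connection settings**:

```javascript
// Database connection pool (option names follow the Sequelize `pool` settings)
const dbConfig = {
  pool: {
    max: 50,         // maximum connections in the pool
    min: 10,         // connections kept open even when idle
    acquire: 30000,  // ms to wait for a free connection before throwing
    idle: 10000      // ms a connection may sit idle before being released
  }
};

// Redis client options (names follow ioredis, which multiplexes commands over
// a single connection rather than using a pool)
const redisConfig = {
  family: 4,                  // connect over IPv4
  keepAlive: 30000,           // TCP keep-alive initial delay in ms (a number)
  lazyConnect: true,          // connect on redis.connect(), not in the constructor
  maxRetriesPerRequest: 3,
  enableOfflineQueue: false   // fail fast while disconnected; together with lazyConnect,
                              // call `await redis.connect()` before the first command
};
// retryDelayOnFailover only applies to Redis Cluster clients, and the eviction
// policy (maxmemory-policy allkeys-lru) is a server setting made in redis.conf
// or with CONFIG SET, not a client option.
```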
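
### Database Optimization

**Query optimization**:

```sql
-- Inspect recent slow queries. mysql.slow_log is only populated when log_output
-- includes TABLE; with the default FILE output, read the slow query log file instead.
SELECT * FROM mysql.slow_log WHERE start_time > DATE_SUB(NOW(), INTERVAL 1 HOUR);

-- Composite indexes
CREATE INDEX idx_task_status_created ON group_tasks(status, createdAt);
CREATE INDEX idx_account_health_status ON tg_account_pool(healthScore, status);

-- Partition a large table by year. MySQL requires the partitioning column to be part
-- of every unique key (including the primary key), and YEAR() is only allowed on
-- DATE/DATETIME columns. The MAXVALUE partition accepts rows dated 2026 and later,
-- which would otherwise be rejected.
ALTER TABLE risk_logs PARTITION BY RANGE (YEAR(createdAt)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```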
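
When a range-partitioned table ends with a catch-all `PARTITION pmax VALUES LESS THAN MAXVALUE` (without one, inserts dated after the last range fail), each new year can be added by splitting `pmax` instead of re-partitioning the whole table. The helper below only prints the statement so it can be reviewed first; its name is hypothetical and it assumes partitions are named `pYYYY` plus `pmax`.

```bash
#!/usr/bin/env bash
# next_year_partition_sql: print the DDL that splits pmax so that table $1
# gets its own partition for year $2 (e.g. 2026 -> p2026).
next_year_partition_sql() {
  local table=$1 year=$2
  printf 'ALTER TABLE %s REORGANIZE PARTITION pmax INTO (PARTITION p%d VALUES LESS THAN (%d), PARTITION pmax VALUES LESS THAN MAXVALUE);\n' \
    "$table" "$year" "$((year + 1))"
}

# Usage (before the new year starts, while pmax is still empty and the split is cheap):
# next_year_partition_sql risk_logs 2026 | mysql -u root -p telegram_management
```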
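
**Server configuration**:

```ini
# MySQL configuration tuning
[mysqld]
# InnoDB. Size the buffer pool to the host: on a dedicated database server,
# MySQL's documentation suggests up to 80% of physical memory.
innodb_buffer_pool_size = 8G
# Deprecated since MySQL 8.0.30 in favour of innodb_redo_log_capacity
innodb_log_file_size = 512M
innodb_log_buffer_size = 128M
# 2 = flush the redo log to disk about once per second: faster, but an OS crash
# or power loss can lose up to about a second of committed transactions
innodb_flush_log_at_trx_commit = 2

# Query cache: MySQL 5.7 and earlier only. It was removed in MySQL 8.0, where
# these options stop the server from starting, so leave them commented out there.
# query_cache_type = 1
# query_cache_size = 512M
# query_cache_limit = 32M

# Connections
max_connections = 1000
thread_cache_size = 100

# In-memory temporary tables (the effective limit is the smaller of the two)
tmp_table_size = 256M
max_heap_table_size = 256M
```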
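
### Cache Optimization

**Redis tuning**:

```bash
# Eviction only takes effect once maxmemory is set (e.g. CONFIG SET maxmemory 2gb)
redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG SET tcp-keepalive 300
redis-cli CONFIG SET timeout 0
# CONFIG SET changes are lost on restart unless written back to the config file
# (CONFIG REWRITE requires Redis to have been started with a config file)
redis-cli CONFIG REWRITE

# Reset a 1-hour TTL on all account cache keys. SCAN walks the keyspace
# incrementally, whereas KEYS blocks the server until it has scanned everything.
redis-cli --scan --pattern 'cache:account:*' | while read -r key; do
    redis-cli EXPIRE "$key" 3600 > /dev/null
done
```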
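
**Application caching strategy**:

```javascript
// Two-level cache: an in-process Map (L1) in front of Redis (L2).
// `redis` is an ioredis-style client supplied by the caller; the L1 TTL and
// size limit below are illustrative defaults.
class CacheManager {
  constructor(redis, { l1TtlMs = 60_000, l1MaxEntries = 1000 } = {}) {
    this.l1Cache = new Map();      // key -> { value, expiresAt }
    this.l2Cache = redis;          // Redis cache
    this.l1TtlMs = l1TtlMs;        // short L1 lifetime limits stale reads across instances
    this.l1MaxEntries = l1MaxEntries;
  }

  setL1(key, value) {
    // Evict the oldest entry (Map keeps insertion order) so memory stays bounded
    if (this.l1Cache.size >= this.l1MaxEntries && !this.l1Cache.has(key)) {
      this.l1Cache.delete(this.l1Cache.keys().next().value);
    }
    this.l1Cache.set(key, { value, expiresAt: Date.now() + this.l1TtlMs });
  }

  async get(key) {
    // L1 lookup
    const hit = this.l1Cache.get(key);
    if (hit && hit.expiresAt > Date.now()) {
      return hit.value;
    }
    this.l1Cache.delete(key);      // drop the expired entry, if any

    // L2 lookup
    const raw = await this.l2Cache.get(key);
    if (raw !== null) {
      const value = JSON.parse(raw);   // parse once, reuse for L1 and the caller
      this.setL1(key, value);
      return value;
    }

    return null;
  }

  async set(key, value, ttl = 3600) {
    this.setL1(key, value);
    await this.l2Cache.setex(key, ttl, JSON.stringify(value));   // ttl in seconds
  }
}
```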

## Security Management

### Access Control

**Operator accounts**:

```bash
# Create an operations user with sudo rights
sudo useradd -m -s /bin/bash telegram-ops
sudo usermod -aG sudo telegram-ops

# Set up SSH key authentication (run as an administrator; the files must belong to telegram-ops)
sudo install -d -m 700 -o telegram-ops -g telegram-ops /home/telegram-ops/.ssh
echo "ssh-rsa YOUR_PUBLIC_KEY telegram-ops@management" | sudo tee -a /home/telegram-ops/.ssh/authorized_keys > /dev/null
sudo chmod 600 /home/telegram-ops/.ssh/authorized_keys
sudo chown telegram-ops:telegram-ops /home/telegram-ops/.ssh/authorized_keys
```
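
**Database security**:

```sql
-- Read-only user for monitoring (replace the placeholder passwords)
CREATE USER 'monitor'@'localhost' IDENTIFIED BY 'monitor_password';
GRANT SELECT ON telegram_management.* TO 'monitor'@'localhost';

-- Backup user. mysqldump also needs SHOW VIEW for views and TRIGGER for triggers,
-- plus PROCESS unless it runs with --no-tablespaces. Dumping stored routines
-- (--routines) may need further privileges depending on the MySQL version.
CREATE USER 'backup'@'localhost' IDENTIFIED BY 'backup_password';
GRANT SELECT, LOCK TABLES, SHOW VIEW, TRIGGER ON telegram_management.* TO 'backup'@'localhost';

-- Rotate the application password regularly (update the application configuration too)
ALTER USER 'tg_user'@'localhost' IDENTIFIED BY 'new_secure_password';
FLUSH PRIVILEGES;
```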

### Security Auditing

**Log audit script** (`security-audit.sh`):

```bash
#!/bin/bash

AUDIT_LOG="/var/log/security-audit.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

echo "[$DATE] Security audit started" >> "$AUDIT_LOG"

# Failed SSH login attempts in the current auth log
FAILED_LOGINS=$(grep -c "Failed password" /var/log/auth.log)
echo "[$DATE] Failed login attempts: $FAILED_LOGINS" >> "$AUDIT_LOG"

# World-writable files in the application directory
echo "[$DATE] World-writable files:" >> "$AUDIT_LOG"
find /var/www/telegram-management -type f -perm /o+w >> "$AUDIT_LOG"

# Root-owned shell processes that do not belong to the known services
echo "[$DATE] Root shell processes:" >> "$AUDIT_LOG"
ps -u root -o pid,args | grep -E '(^|[ /-])(ba)?sh( |$)' \
    | grep -vE "telegram-management|mysql|redis|nginx" >> "$AUDIT_LOG"

# Established connections to the backend port
ESTABLISHED=$(ss -tn state established '( sport = :3000 )' | tail -n +2 | wc -l)
echo "[$DATE] Established connections on :3000: $ESTABLISHED" >> "$AUDIT_LOG"

echo "[$DATE] Security audit finished" >> "$AUDIT_LOG"
```
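
A single failure count does not show whether attempts come from one scanner or from many sources. The sketch below groups `Failed password` lines by source address. It assumes the standard OpenSSH message format (`... from <ip> port <n> ssh2`); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# failed_logins_by_ip: read sshd log lines on stdin and print
# "<count> <ip>" for failed password attempts, most frequent first.
failed_logins_by_ip() {
  grep 'Failed password' \
    | awk '{ for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break } }' \
    | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage:
# sudo cat /var/log/auth.log | failed_logins_by_ip | head
```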
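
**Hardening checks**:

```bash
# Pending package updates
sudo apt list --upgradable

# Open ports (OS detection with -O requires root)
sudo nmap -sT -O localhost

# File integrity: record a baseline once (store it outside the application
# directory), then verify against it on later runs
find /var/www/telegram-management -type f -name "*.js" -exec sha256sum {} + > checksums.txt
sha256sum -c --quiet checksums.txt

# SSL certificate expiry; -checkend exits non-zero if it expires within 30 days
openssl x509 -in /path/to/cert.pem -noout -enddate
openssl x509 -in /path/to/cert.pem -noout -checkend 2592000
```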
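
For alerting, the number of days remaining is often handier than the raw `notAfter` date. The sketch below does the arithmetic, assuming GNU `date` (which parses OpenSSL's date format); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# days_until: whole days from now until the given date string
# (any format GNU date -d understands); 0 or negative once it has passed.
days_until() {
  echo $(( ($(date -d "$1" +%s) - $(date +%s)) / 86400 ))
}

# cert_days_left: days until the certificate in PEM file $1 expires.
cert_days_left() {
  local end
  end=$(openssl x509 -in "$1" -noout -enddate) || return 1
  days_until "${end#notAfter=}"         # e.g. "Jun  1 12:00:00 2025 GMT"
}

# Usage:
# days=$(cert_days_left /path/to/cert.pem)
# [ "$days" -lt 30 ] && echo "Certificate expires in $days days" | logger -t telegram-management
```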

## Backup and Recovery

### Automated Backups

**Full backup script** (`full-backup.sh`):

```bash
#!/bin/bash
set -o pipefail

BACKUP_BASE="/backup"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

# The backup user's password must be supplied through the environment
: "${BACKUP_PASS:?BACKUP_PASS is not set}"

# Create backup directories
mkdir -p "$BACKUP_BASE"/{mysql,redis,files,logs}/"$DATE"

# Database backup, compressed on the fly (--no-tablespaces avoids needing PROCESS)
mysqldump -u backup -p"$BACKUP_PASS" --single-transaction --no-tablespaces --routines --triggers \
    telegram_management | gzip > "$BACKUP_BASE/mysql/$DATE/full_backup.sql.gz" \
    || echo "ERROR: database dump failed" >&2

# Redis backup
redis-cli --rdb "$BACKUP_BASE/redis/$DATE/dump.rdb"

# File backups. GNU tar strips the leading "/", so members are stored as
# var/www/telegram-management/...; extract with -C / to restore them in place.
tar -czf "$BACKUP_BASE/files/$DATE/application.tar.gz" /var/www/telegram-management
tar -czf "$BACKUP_BASE/files/$DATE/sessions.tar.gz" /var/www/telegram-management/backend/sessions

# Log backup
tar -czf "$BACKUP_BASE/logs/$DATE/logs.tar.gz" /var/www/telegram-management/backend/logs

# Write the backup manifest
cat > "$BACKUP_BASE/manifest_$DATE.txt" << EOF
Backup time: $(date)
Database size: $(du -h "$BACKUP_BASE/mysql/$DATE/full_backup.sql.gz" | cut -f1)
Redis size: $(du -h "$BACKUP_BASE/redis/$DATE/dump.rdb" | cut -f1)
Application files size: $(du -h "$BACKUP_BASE/files/$DATE/application.tar.gz" | cut -f1)
Session files size: $(du -h "$BACKUP_BASE/files/$DATE/sessions.tar.gz" | cut -f1)
Log files size: $(du -h "$BACKUP_BASE/logs/$DATE/logs.tar.gz" | cut -f1)
EOF

# Remove expired backups (-mindepth 1 keeps $BACKUP_BASE itself)
find "$BACKUP_BASE" -type f -mtime +"$RETENTION_DAYS" -delete
find "$BACKUP_BASE" -mindepth 1 -type d -empty -delete

echo "Backup finished: $DATE"
```
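
A backup that was never checked can turn out to be truncated exactly when it is needed. mysqldump ends its output with a `-- Dump completed` comment (unless run with `--skip-comments`), so a cheap sanity check is to test the gzip stream and look for that trailer. This sketch does not replace a periodic test restore; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# verify_sql_dump: succeed if $1 is a valid gzip file whose decompressed
# content ends with mysqldump's "-- Dump completed" trailer.
verify_sql_dump() {
  local f=$1
  gzip -t "$f" 2>/dev/null || { echo "corrupt gzip: $f" >&2; return 1; }
  gzip -dc "$f" | tail -n 5 | grep -q '^-- Dump completed' \
    || { echo "incomplete dump: $f" >&2; return 1; }
}

# Usage (e.g. at the end of full-backup.sh):
# verify_sql_dump "$BACKUP_BASE/mysql/$DATE/full_backup.sql.gz"
```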
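
### Restore Procedures

**Database restore**:

```bash
# Full restore from a compressed dump
gunzip < /backup/mysql/<DATE>/full_backup.sql.gz | mysql -u root -p telegram_management

# Single-table restore: first load the dump into a scratch database...
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS backup_telegram_management;"
gunzip < /backup/mysql/<DATE>/full_backup.sql.gz | mysql -u root -p backup_telegram_management
# ...then replace the live table with the copy (one command per password prompt)
mysqldump -u root -p backup_telegram_management group_tasks > /tmp/group_tasks.sql
mysql -u root -p telegram_management -e "DROP TABLE IF EXISTS group_tasks;"
mysql -u root -p telegram_management < /tmp/group_tasks.sql

# Verify the restore
mysql -u root -p -e "SELECT COUNT(*) FROM telegram_management.group_tasks;"
```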

**Application restore**:

```bash
# Stop the service
pm2 stop telegram-management-backend

# Restore the application files. The archives store paths as
# var/www/telegram-management/..., so extract them relative to /.
sudo rm -rf /var/www/telegram-management
sudo tar -xzf /backup/files/20240101_020000/application.tar.gz -C /

# Restore the session files (same path layout)
sudo tar -xzf /backup/files/20240101_020000/sessions.tar.gz -C /

# Restore ownership and permissions
sudo chown -R telegram-ops:telegram-ops /var/www/telegram-management
sudo chmod +x /var/www/telegram-management/backend/src/app.js

# Start the service (from the directory containing ecosystem.config.js)
pm2 start ecosystem.config.js --env production
```

### Disaster Recovery

**Failover steps**:

```bash
# 1. Assess the failure
curl -I http://primary-server:3000/health
ping -c 4 primary-server

# 2. Restore the latest backup on the standby server
#    (restore-from-backup.sh is a site-specific script, not included in this manual)
./restore-from-backup.sh latest

# 3. Verify the standby before sending it traffic
curl -I http://backup-server:3000/health
./health-check.sh

# 4. Point DNS at the standby server
# (procedure depends on your DNS provider)

# 5. Notify the people involved
echo "Failover complete; the standby server is now serving traffic" | mail -s "Failover notice" team@company.com
```

## Version Updates

### Update Procedure

**Update script** (`rolling-update.sh`):

```bash
#!/bin/bash
# Stop-and-replace update: the backend is offline between steps 6 and 10.
set -euo pipefail

NEW_VERSION=${1:-}
BACKUP_DIR="/backup/pre-update-$(date +%Y%m%d_%H%M%S)"
SRC_DIR="/tmp/telegram-management-system"
APP_DIR="/var/www/telegram-management"

if [ -z "$NEW_VERSION" ]; then
    echo "Usage: $0 <version>"
    exit 1
fi

echo "Updating to version: $NEW_VERSION"

# 1. Pre-update backup of code and database (BACKUP_PASS must be set;
#    rollback.sh restores database.sql when it is present)
echo "Creating pre-update backup..."
mkdir -p "$BACKUP_DIR"
cp -r "$APP_DIR" "$BACKUP_DIR/"
mysqldump -u backup -p"$BACKUP_PASS" --single-transaction --no-tablespaces \
    telegram_management > "$BACKUP_DIR/database.sql"

# 2. Fetch the new version (remove any checkout left by an earlier run)
echo "Fetching new version..."
rm -rf "$SRC_DIR"
git clone -b "$NEW_VERSION" https://github.com/your-org/telegram-management-system.git "$SRC_DIR"

# 3. Show dependency changes (diff exits 1 when the files differ)
echo "Checking dependency changes..."
diff "$APP_DIR/backend/package.json" "$SRC_DIR/backend/package.json" || true

# 4. Check for pending database migrations (needs the new backend's dependencies)
echo "Checking database migrations..."
cd "$SRC_DIR/backend"
npm install
npm run migrate:check

# 5. Build the frontend
echo "Building frontend..."
cd "$SRC_DIR/frontend"
npm install
npm run build

# From here on, do not abort on errors: the health check in step 11
# decides whether to keep the new version or roll back.
set +e

# 6. Stop the service
echo "Stopping service..."
pm2 stop telegram-management-backend

# 7. Deploy the new version
echo "Deploying new version..."
cp -r "$SRC_DIR"/backend/* "$APP_DIR/backend/"
cp -r "$SRC_DIR"/frontend/dist/* "$APP_DIR/frontend/dist/"

# 8. Install production dependencies (npm 7+; older npm uses --production)
cd "$APP_DIR/backend"
npm install --omit=dev

# 9. Run database migrations
npm run migrate

# 10. Start the service
echo "Starting service..."
pm2 start ecosystem.config.js --env production

# 11. Health check, rolling back on failure
sleep 10
if curl -f http://localhost:3000/health; then
    echo "Update succeeded!"
    # Remove the temporary checkout
    rm -rf "$SRC_DIR"
else
    echo "Update failed, rolling back..."
    pm2 stop telegram-management-backend
    # Restores the code only: migrations from step 9 remain applied
    # (restore $BACKUP_DIR/database.sql if they are not backward compatible)
    cp -r "$BACKUP_DIR/telegram-management/"* "$APP_DIR/"
    pm2 start ecosystem.config.js --env production
    exit 1
fi
```
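
The fixed `sleep 10` before the health check is a guess: a slow start fails the update needlessly, and a fast one wastes time. Polling until the endpoint answers or a deadline passes, as sketched below, is a common alternative. The function name and the timings are illustrative.

```bash
#!/usr/bin/env bash
# wait_for_health: poll URL $1 about once per second until it returns a
# 2xx response, giving up after $2 seconds. Returns 0 on success, 1 on timeout.
wait_for_health() {
  local url=$1 timeout=$2
  local deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage in place of "sleep 10; if curl -f ...":
# if wait_for_health http://localhost:3000/health 60; then echo "Update succeeded!"; fi
```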

### Rollback

**Quick rollback script** (`rollback.sh`):

```bash
#!/bin/bash

BACKUP_DIR=$1

if [ -z "$BACKUP_DIR" ]; then
    echo "Usage: $0 <backup directory>"
    exit 1
fi

if [ ! -d "$BACKUP_DIR/telegram-management" ]; then
    echo "No telegram-management directory in $BACKUP_DIR; aborting"
    exit 1
fi

echo "Rolling back to: $BACKUP_DIR"

# Stop the current service
pm2 stop telegram-management-backend

# Restore the backed-up files (files added by the newer version are left in place)
cp -r "$BACKUP_DIR"/telegram-management/* /var/www/telegram-management/

# Restore the database if the backup contains a dump
if [ -f "$BACKUP_DIR/database.sql" ]; then
    mysql -u root -p telegram_management < "$BACKUP_DIR/database.sql"
fi

# Restart the service
pm2 start ecosystem.config.js --env production

# Verify the rollback
sleep 10
if curl -f http://localhost:3000/health; then
    echo "Rollback succeeded!"
else
    echo "Rollback failed, check manually!"
    exit 1
fi
```
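
`rollback.sh` takes the backup directory as an argument. Assuming pre-update backups are named `/backup/pre-update-<date>` as in `rolling-update.sh`, the newest one can be selected automatically, because date stamps of the same format sort in time order. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# latest_backup_dir: print the newest pre-update-* directory under $1.
# Glob results are sorted, so the last match carries the latest date stamp.
latest_backup_dir() {
  local base=$1 newest="" d
  for d in "$base"/pre-update-*/; do
    [ -d "$d" ] || continue      # no match: the glob stays literal
    newest=${d%/}
  done
  [ -n "$newest" ] || return 1
  echo "$newest"
}

# Usage:
# ./rollback.sh "$(latest_backup_dir /backup)"
```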

## Emergency Response

### Response Process

**P0 incident response**:

1. **Immediate response** (0-5 minutes)
   - Confirm the incident and assess its impact
   - Activate the emergency response team
   - Attempt quick recovery actions

2. **Mitigation** (5-15 minutes)
   - Apply a temporary workaround
   - Switch to the standby system (if available)
   - Notify users and stakeholders

3. **Root cause analysis** (15 minutes to 1 hour)
   - Collect information about the incident
   - Identify the root cause
   - Draw up a remediation plan

4. **Permanent fix** (1-4 hours)
   - Implement the permanent fix
   - Verify that it works
   - Update monitoring and alerting

5. **Post-incident review** (within 24 hours)
   - Write the incident report
   - Capture lessons learned
   - Improve preventive measures

### Emergency Contacts

**Contact list**:

```bash
# On-call contacts
PRIMARY_ONCALL="张三 <zhangsan@company.com> +86-138-0000-0000"
SECONDARY_ONCALL="李四 <lisi@company.com> +86-138-1111-1111"
MANAGER="王五 <wangwu@company.com> +86-138-2222-2222"

# External service contacts
CLOUD_PROVIDER_SUPPORT="+86-400-xxx-xxxx"
DNS_PROVIDER_SUPPORT="support@dns-provider.com"
SSL_PROVIDER_SUPPORT="support@ssl-provider.com"
```

### Incident Notification Template

**Incident notification email**:

```
Subject: [P0 Incident] Telegram Management System service outage

Severity: P0 - Critical
Start time: 2024-01-01 14:30:00
Impact: all users
Symptoms: service unresponsive; all API calls failing

Current status: being handled
Estimated recovery: 15:00:00

Actions taken:
1. Restarted the application service
2. Checked the database connection
3. Brought up the standby server

The next update will be sent within 30 minutes.

Operations Team
Telegram Management System
```
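
---

This manual covers the complete operation of the Telegram Management System: routine tasks, monitoring, troubleshooting, performance tuning, security management, backup and recovery, version updates, and emergency response. The operations team should follow these procedures as written to keep the system running reliably.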