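# Telegram Management System - Operations Manual
This manual gives detailed guidance for the day-to-day operation of the Telegram Management System, covering routine tasks, troubleshooting, performance tuning, and security management.
## Table of Contents
- [Daily Operations](#daily-operations)
- [System Monitoring](#system-monitoring)
- [Fault Diagnosis and Handling](#fault-diagnosis-and-handling)
- [Performance Tuning](#performance-tuning)
- [Security Management](#security-management)
- [Backup and Recovery](#backup-and-recovery)
- [Version Updates](#version-updates)
- [Emergency Response](#emergency-response)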
## Daily Operations
### Service Status Checks
**Check the application service**:
```bash
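# PM2 process status
pm2 status
pm2 monit
# Check the processes
ps aux | grep node
ps aux | grep telegram-management
# Check that port 3000 is listening (-p needs root to show other users' processes)
sudo netstat -tlnp | grep :3000
sudo ss -tlnp | grep :3000
# Check that the service responds
curl -I http://localhost:3000/health
curl -s http://localhost:3000/health/detailed | jq .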
```
**Check the database**:
```bash
# MySQL service status
sudo systemctl status mysql
mysqladmin -u root -p status
mysqladmin -u root -p processlist
# Connection counts
mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected';"
mysql -u root -p -e "SHOW STATUS LIKE 'Max_used_connections';"
# Slow query count (cumulative since the server started)
mysql -u root -p -e "SHOW STATUS LIKE 'Slow_queries';"
```
**Check Redis**:
```bash
# Redis service status (on Debian/Ubuntu the unit may be named redis-server)
sudo systemctl status redis
redis-cli ping
# Redis info
redis-cli info server
redis-cli info memory
redis-cli info stats
# Client connections
redis-cli info clients
```
### Log Management
**View application logs**:
```bash
# PM2 logs
pm2 logs telegram-management-backend
pm2 logs telegram-management-backend --lines 100
# Application log files (relative paths assume the working directory is /var/www/telegram-management)
tail -f backend/logs/app.log
tail -f backend/logs/error.log
tail -f backend/logs/access.log
# Filter for errors
grep -i error backend/logs/app.log
grep -i "500\|error\|exception" backend/logs/access.log
```
**View system logs**:
```bash
# systemd journal (the backend unit only exists if the backend runs under systemd rather than PM2)
sudo journalctl -u telegram-management-backend -f
sudo journalctl -u mysql -f
sudo journalctl -u redis -f
# Nginx logs
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log
```
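**Log rotation**:
```bash
# Force a rotation now
sudo logrotate -f /etc/logrotate.d/telegram-management
# Dry run: show what logrotate would do without rotating anything
sudo logrotate -d /etc/logrotate.d/telegram-management
# Delete rotated logs older than 30 days
find backend/logs -name "*.log.*" -mtime +30 -delete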
```
### Disk Space Management
**Check disk usage**:
```bash
# Filesystem usage
df -h
du -sh /var/www/telegram-management/*
# Find files larger than 100 MB
find /var/www/telegram-management -type f -size +100M -exec ls -lh {} \;
# Size of each top-level directory
du -h --max-depth=1 /var/www/telegram-management/
```
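**Clean up temporary files**:
```bash
# Application temporary files (run from /var/www/telegram-management; stop the backend first if it may still be writing to them)
rm -rf backend/tmp/*
rm -rf backend/sessions/tmp_*
# System temporary files
sudo rm -rf /tmp/telegram-*
sudo rm -rf /var/tmp/telegram-*
# npm cache
npm cache clean --force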
```
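### Database Maintenance
**Routine maintenance**:
```bash
# Rebuild tables to reclaim space (for InnoDB, OPTIMIZE is mapped to a table rebuild plus ANALYZE, which can take a while on large tables)
mysql -u root -p -e "OPTIMIZE TABLE telegram_management.group_tasks;"
mysql -u root -p -e "OPTIMIZE TABLE telegram_management.tg_account_pool;"
mysql -u root -p -e "OPTIMIZE TABLE telegram_management.risk_logs;"
# Refresh index statistics
mysql -u root -p -e "ANALYZE TABLE telegram_management.group_tasks;"
# Check a table for errors
mysql -u root -p -e "CHECK TABLE telegram_management.group_tasks;"
# Repair a table if needed (MyISAM, ARCHIVE and CSV only; InnoDB does not support REPAIR TABLE)
mysql -u root -p -e "REPAIR TABLE telegram_management.group_tasks;"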
```
**Purge historical data**:
```sql
USE telegram_management;
-- Delete risk-control logs older than 30 days
DELETE FROM risk_logs WHERE createdAt < DATE_SUB(NOW(), INTERVAL 30 DAY);
-- Delete anomaly logs older than 90 days
DELETE FROM anomaly_logs WHERE createdAt < DATE_SUB(NOW(), INTERVAL 90 DAY);
-- Delete completed tasks, keeping the last 6 months
DELETE FROM group_tasks
WHERE status = 'completed'
AND completedAt < DATE_SUB(NOW(), INTERVAL 6 MONTH);
-- Reclaim table space
OPTIMIZE TABLE risk_logs, anomaly_logs, group_tasks;
```
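A single `DELETE` spanning months of rows runs as one large transaction: it holds locks and grows the undo log until it finishes. Deleting in fixed-size batches keeps each transaction short. A minimal Node.js sketch, assuming a `query(sql, params)` adapter around whichever MySQL driver the backend uses that resolves to an object with `affectedRows`; `purgeInBatches` and the batch size are illustrative, and `group_tasks` is left out because its purge has an extra `status` condition:

```javascript
// Tables that may be purged, with their timestamp column. Identifiers cannot be
// bound as query parameters, so only these whitelisted names are interpolated.
const PURGEABLE = {
  risk_logs: 'createdAt',
  anomaly_logs: 'createdAt',
};

// Delete rows older than `days` from `table`, at most `batchSize` rows per
// statement, until a statement deletes fewer rows than the batch size.
// `query(sql, params)` is an assumed adapter that resolves to { affectedRows }.
async function purgeInBatches(query, table, days, batchSize = 5000) {
  const column = PURGEABLE[table];
  if (!column) throw new Error(`table not allowed: ${table}`);
  if (!Number.isInteger(batchSize) || batchSize <= 0) throw new Error('bad batchSize');
  const sql =
    `DELETE FROM \`${table}\` WHERE \`${column}\` < DATE_SUB(NOW(), INTERVAL ? DAY) ` +
    `LIMIT ${batchSize}`; // validated integer, inlined rather than bound
  let total = 0;
  for (;;) {
    const { affectedRows } = await query(sql, [days]);
    total += affectedRows;
    if (affectedRows < batchSize) return total; // last, partial batch
  }
}
```

A short pause between batches (for example a `setTimeout` of a few hundred milliseconds) gives replicas and foreground queries room to keep up; an index on `createdAt` keeps each batch cheap.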
## System Monitoring
### Key Metrics
**System resource monitoring script** (`monitor.sh`):
```bash
#!/bin/bash
LOG_FILE="/var/log/telegram-management-monitor.log"
ALERT_THRESHOLD_CPU=80
ALERT_THRESHOLD_MEM=85
ALERT_THRESHOLD_DISK=90
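# Collect metrics. CPU usage = 100 - idle% from top's summary line, so it covers
# user, system and iowait time; bc is required for the float comparisons below.
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | sed 's/.*, *\([0-9.]*\)%* *id.*/\1/' | awk '{print 100 - $1}')
MEM_USAGE=$(free | grep Mem | awk '{printf("%.2f", ($3/$2) * 100.0)}')
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
# Record the metrics
echo "$(date '+%Y-%m-%d %H:%M:%S') - CPU: ${CPU_USAGE}%, MEM: ${MEM_USAGE}%, DISK: ${DISK_USAGE}%" >> "$LOG_FILE"
# Check alert thresholds (alerts go to syslog with the tag telegram-management)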
if (( $(echo "$CPU_USAGE > $ALERT_THRESHOLD_CPU" | bc -l) )); then
echo "ALERT: High CPU usage: ${CPU_USAGE}%" | logger -t telegram-management
fi
if (( $(echo "$MEM_USAGE > $ALERT_THRESHOLD_MEM" | bc -l) )); then
echo "ALERT: High memory usage: ${MEM_USAGE}%" | logger -t telegram-management
fi
if [ "$DISK_USAGE" -gt "$ALERT_THRESHOLD_DISK" ]; then
echo "ALERT: High disk usage: ${DISK_USAGE}%" | logger -t telegram-management
fi
```
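`top -bn1` reports CPU usage over a very short window, and its summary-line layout differs between procps versions. An alternative is to read the aggregate `cpu` line of `/proc/stat` twice and derive usage from the deltas: busy share = 1 - Δ(idle + iowait) / Δ(total), where total sums the user through steal fields. A minimal Node.js sketch of that calculation (Linux only; the function names are illustrative):

```javascript
const fs = require('node:fs');

// Parse the aggregate "cpu" line of /proc/stat into idle and total jiffies.
// Fields 1..8 are user, nice, system, idle, iowait, irq, softirq, steal;
// guest time is already counted inside user/nice, so it is not added again.
function parseCpuLine(line) {
  const fields = line.trim().split(/\s+/).slice(1, 9).map(Number);
  const idle = fields[3] + fields[4]; // idle + iowait
  const total = fields.reduce((a, b) => a + b, 0);
  return { idle, total };
}

// CPU usage (%) between two samples of the aggregate "cpu" line.
function cpuPercent(prevLine, nextLine) {
  const a = parseCpuLine(prevLine);
  const b = parseCpuLine(nextLine);
  const totalDelta = b.total - a.total;
  if (totalDelta <= 0) return 0;
  return (1 - (b.idle - a.idle) / totalDelta) * 100;
}

// Read the first line of /proc/stat (the aggregate over all CPUs).
function readCpuLine() {
  return fs.readFileSync('/proc/stat', 'utf8').split('\n')[0];
}
```

To use it, call `readCpuLine()`, wait about a second, call it again, and pass both lines to `cpuPercent()`.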
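**Application performance checks**:
```bash
# HTTP response time
curl -o /dev/null -s -w "Response time: %{time_total}s\n" http://localhost:3000/health
# Database connections (without the PROCESS privilege, tg_user only sees its own sessions)
mysql -u tg_user -p -e "SELECT COUNT(*) AS active_connections FROM information_schema.processlist;"
# Redis latency, sampled continuously (Ctrl+C to stop)
redis-cli --latency-history -i 1
# PM2 process details
pm2 show telegram-management-backend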
```
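A single `curl` timing says little about tail latency. If the `time_total` values from repeated requests are collected (for example in a shell loop), a nearest-rank percentile turns them into p50/p95 figures that are easier to compare over time. A small sketch; the function names are illustrative:

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Summarise response times, e.g. values printed by
// curl -o /dev/null -s -w "%{time_total}\n" http://localhost:3000/health
function summarize(samples) {
  return {
    count: samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: Math.max(...samples),
  };
}
```

Nearest-rank always returns an observed value rather than an interpolated one, which keeps small sample sets easy to reason about.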
### Automated Monitoring Scripts
**Health check script** (`health-check.sh`):
```bash
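#!/bin/bash
# Intended for cron: cron's PATH is minimal, so make sure pm2, mysql and redis-cli are
# found, and that DB_PASSWORD is set (e.g. sourced from a root-only env file).
SERVICE_NAME="telegram-management-backend"
HEALTH_URL="http://localhost:3000/health"
EMAIL_ALERT="admin@yourdomain.com"
# Check the PM2 process
if ! pm2 list | grep -q "$SERVICE_NAME.*online"; then
echo "Service $SERVICE_NAME is not running, attempting restart..."
pm2 restart "$SERVICE_NAME"
# Wait for the service to start
sleep 10
# Check again
if ! pm2 list | grep -q "$SERVICE_NAME.*online"; then
echo "Restart failed, sending alert email"
echo "Service $SERVICE_NAME failed to restart, please check immediately" | mail -s "URGENT: service failure" "$EMAIL_ALERT"
fi
fi
# Check the HTTP response (-m 10: give up after 10 seconds instead of hanging)
HTTP_CODE=$(curl -s -o /dev/null -m 10 -w "%{http_code}" "$HEALTH_URL")
if [ "$HTTP_CODE" != "200" ]; then
echo "Health check failed, HTTP status: $HTTP_CODE"
echo "Health check failed, HTTP status: $HTTP_CODE" | mail -s "ALERT: health check failed" "$EMAIL_ALERT"
fi
# Check the database connection
if ! mysql -u tg_user -p"$DB_PASSWORD" -e "SELECT 1;" &> /dev/null; then
echo "Database connection failed"
echo "Database connection failed, please check the database service" | mail -s "ALERT: database connection failed" "$EMAIL_ALERT"
fi
# Check the Redis connection (compare the reply rather than relying on the exit code)
if [ "$(redis-cli ping 2> /dev/null)" != "PONG" ]; then
echo "Redis connection failed"
echo "Redis connection failed, please check the Redis service" | mail -s "ALERT: Redis connection failed" "$EMAIL_ALERT"
fi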
```
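As written, `health-check.sh` emails on every failed run, so a single transient timeout pages someone and a long outage sends a mail every five minutes. A common refinement is to alert only after N consecutive failures and to send one recovery notice. A minimal sketch of that state logic; the class name and the threshold of 3 are illustrative:

```javascript
// Tracks consecutive health-check failures and decides when to notify.
class FailureTracker {
  constructor(threshold = 3) {
    this.threshold = threshold; // consecutive failures before alerting
    this.failures = 0;
    this.alerting = false; // true once an alert has been sent for this outage
  }

  // Record one check result; returns 'alert', 'recovered' or null.
  record(ok) {
    if (ok) {
      const wasAlerting = this.alerting;
      this.failures = 0;
      this.alerting = false;
      return wasAlerting ? 'recovered' : null;
    }
    this.failures += 1;
    if (!this.alerting && this.failures >= this.threshold) {
      this.alerting = true;
      return 'alert'; // fires once per outage
    }
    return null;
  }
}
```

Because cron starts the check as a fresh process each time, the counter and flag would have to be persisted between runs (for example in a small state file); that part is not shown.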
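**Cron configuration**:
```bash
# Edit the crontab (use root's crontab, or make sure the user can write the log files under /var/log)
crontab -e
# Add the following entries:
# Check system resources every minute
* * * * * /path/to/monitor.sh
# Health check every 5 minutes
*/5 * * * * /path/to/health-check.sh
# Back up important data every hour
0 * * * * /path/to/backup.sh
# Clean up logs every day at 02:00
0 2 * * * /path/to/cleanup-logs.sh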
```
## Fault Diagnosis and Handling
### Common Faults
**Service not responding**:
```bash
# 1. Check process status
pm2 status
ps aux | grep node
# 2. Check what is listening on port 3000
sudo netstat -tlnp | grep :3000
sudo lsof -i :3000
# 3. Check system resources
top
free -h
df -h
# 4. Check the error logs
pm2 logs telegram-management-backend --err
tail -f backend/logs/error.log
# 5. Restart the service
pm2 restart telegram-management-backend
```
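**Database connection problems**:
```bash
# 1. Check the MySQL service (restart only if it is actually unhealthy)
sudo systemctl status mysql
sudo systemctl restart mysql
# 2. Compare current connections with the configured limit
mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected';"
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
# 3. Show the most recent deadlock (transactions currently waiting on locks appear under TRANSACTIONS)
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G" | grep -A 20 "LATEST DETECTED DEADLOCK"
# 4. Slow queries (the log path is whatever slow_query_log_file is set to)
mysql -u root -p -e "SHOW STATUS LIKE 'Slow_queries';"
tail -f /var/log/mysql/slow.log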
```
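**Memory leak diagnosis**:
```bash
# 1. Write a heap snapshot. This only works if node was started with
#    --heapsnapshot-signal=SIGUSR2; without that flag SIGUSR2 terminates the process.
#    The .heapsnapshot file is written to the process's working directory by default.
kill -USR2 $(pm2 pid telegram-management-backend)
# 2. Analyse memory: SIGUSR1 enables the inspector on the running process (127.0.0.1:9229),
#    then connect Chrome DevTools (chrome://inspect), e.g. through an SSH tunnel
kill -USR1 $(pm2 pid telegram-management-backend)
# 3. Watch memory growth (VSZ/RSS in KB, once a minute)
PID=$(pm2 pid telegram-management-backend)
while true; do
ps -p "$PID" -o pid,vsz,rss,comm
sleep 60
done
# 4. Restart the service to release memory (a stopgap, not a fix)
pm2 restart telegram-management-backend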
```
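As an alternative to signal-triggered dumps, Node.js (v11.13+) can write a heap snapshot from inside the application with `v8.writeHeapSnapshot()`, which could be wired to an admin-only endpoint or a timer. A minimal sketch; the output directory and function names are assumptions. Per the Node.js documentation, taking a snapshot blocks the event loop and can temporarily need about twice the current heap size in memory, so take it on one instance at a time:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const v8 = require('node:v8');

// Write a heap snapshot into `dir` and return the file path.
// Synchronous: the event loop is blocked until the file is written.
function dumpHeap(dir = os.tmpdir()) {
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}

// Current memory figures in MB, to compare with the RSS column from `ps`.
function heapUsageMB() {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  const mb = (n) => Math.round(n / 1024 / 1024);
  return { heapUsed: mb(heapUsed), heapTotal: mb(heapTotal), rss: mb(rss) };
}
```

Two snapshots taken some minutes apart can be loaded into Chrome DevTools' Memory tab and compared to see which object types keep growing.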
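### Incident Handling Process
**Severity levels**:
- **P0 (critical)**: service completely unavailable
- **P1 (major)**: core functionality broken
- **P2 (moderate)**: some functionality broken
- **P3 (minor)**: performance degradation or warnings
**Handling a P0 incident**: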
```bash
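# 1. Assess the impact immediately
curl -I http://localhost:3000/health
pm2 status
# 2. Restore service quickly
pm2 restart telegram-management-backend
# 3. Check the key components
sudo systemctl status mysql redis nginx
# 4. If service cannot be restored quickly, fall back to the standby plan
# (depending on the situation, this may mean switching to a standby server)
# 5. Record the incident (/var/log is writable only by root, hence tee)
echo "$(date): P0 incident - service unavailable" | sudo tee -a /var/log/incidents.log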
```
**Performance diagnosis**:
```bash
# 1. CPU
top -p $(pm2 pid telegram-management-backend)
sudo perf top -p $(pm2 pid telegram-management-backend)
# 2. Database
mysql -u root -p -e "SHOW PROCESSLIST;"
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"
# 3. Redis. --hotkeys only works with an LFU maxmemory-policy; MONITOR streams every
#    command and can noticeably slow a busy server, so stop it after a few seconds
redis-cli --bigkeys
redis-cli --hotkeys
redis-cli monitor
# 4. Network
netstat -i
sudo iftop
```
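For a Node.js service, a busy CPU typically shows up as event-loop delay: timers and I/O callbacks run late. `perf_hooks.monitorEventLoopDelay()` (Node.js v11.10+) records this delay in a histogram; the sketch below samples it for a fixed window. Where the numbers are logged or exposed (a periodic log line, the `/health/detailed` endpoint) is an assumption about how the application is structured:

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Sample event-loop delay for `windowMs`; resolves with stats in milliseconds.
function sampleEventLoopDelay(windowMs = 1000) {
  const h = monitorEventLoopDelay({ resolution: 10 }); // sampling interval in ms
  h.enable();
  return new Promise((resolve) => {
    setTimeout(() => {
      h.disable();
      const NS_PER_MS = 1e6; // histogram values are recorded in nanoseconds
      resolve({
        meanMs: h.mean / NS_PER_MS,
        p99Ms: h.percentile(99) / NS_PER_MS,
        maxMs: h.max / NS_PER_MS,
      });
    }, windowMs);
  });
}
```

A p99 that stays in the tens of milliseconds or more points at CPU-bound work on the main thread (large JSON parsing, synchronous crypto, tight loops) rather than at MySQL or Redis.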
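## Performance Tuning
### Application Layer
**Node.js flags**:
```bash
# Heap limit for the PM2-managed process, in MB (keep it well below physical RAM).
# --optimize-for-size trades speed for lower memory use; it is not a general speed-up.
# These flags can also be set with node_args in ecosystem.config.js.
pm2 start ecosystem.config.js --node-args="--max-old-space-size=4096 --optimize-for-size"
# Same heap limit for node processes started from this shell.
# NODE_OPTIONS accepts only an allow-list of V8 flags, and --optimize-for-size is not on it.
export NODE_OPTIONS="--max-old-space-size=4096"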
```
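To confirm that a `--max-old-space-size` value actually reached the process (for example after changing the PM2 configuration), the effective limit can be read back from `v8.getHeapStatistics()`. A small sketch; run it with the same flags as the service:

```javascript
const v8 = require('node:v8');

// Effective V8 heap limit in MB. Under --max-old-space-size=4096 this should be
// close to 4096; the exact figure also depends on young-generation sizing.
function heapLimitMB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

console.log(`heap_size_limit: ${heapLimitMB()} MB`);
```

For example, `node --max-old-space-size=4096 heap-limit.js`, where `heap-limit.js` is a hypothetical file holding the snippet above.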
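**Connection pool tuning**:
```javascript
// Database pool (Sequelize-style options; limits apply per process)
const dbConfig = {
  pool: {
    max: 50, // maximum connections
    min: 10, // connections kept open at minimum
    acquire: 30000, // ms to wait for a free connection before throwing
    idle: 10000 // ms a connection may stay idle before it is released
  }
};
// Redis client options (assuming ioredis, which uses one connection rather than a pool).
// lazyConnect plus enableOfflineQueue: false means commands sent before the connection
// is ready are rejected, so call `await redis.connect()` during startup.
const redisConfig = {
  family: 4,
  keepAlive: 30000, // ms before TCP keep-alive probes start (ioredis expects a number)
  lazyConnect: true,
  maxRetriesPerRequest: 3,
  retryDelayOnFailover: 100, // an ioredis Cluster option
  enableOfflineQueue: false
  // maxmemory-policy is a server setting (redis.conf or CONFIG SET), not a client option
};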
```
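Pool limits are per process. If PM2 runs the backend in cluster mode (an assumption; the ecosystem file is not shown here), each instance opens its own pool, so the worst case is `instances × pool.max` connections, which has to stay below MySQL's `max_connections` with room left for admin, monitoring and backup sessions. A small sketch of that check; the reserve of 20 connections is illustrative:

```javascript
// Worst-case MySQL connections held by the application's pools.
function worstCaseConnections(instances, poolMax) {
  return instances * poolMax;
}

// True if every app pool at full size still leaves `reserve` connections for
// admin, monitoring and backup sessions. reserve = 20 is an illustrative default.
function poolFits({ instances, poolMax, maxConnections, reserve = 20 }) {
  return worstCaseConnections(instances, poolMax) + reserve <= maxConnections;
}

// 8 instances x pool.max 50 = 400 connections, well under max_connections = 1000.
console.log(poolFits({ instances: 8, poolMax: 50, maxConnections: 1000 })); // true
```

The same arithmetic applies to `min`: with 8 instances and `min: 10`, at least 80 connections stay open even when the system is idle.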
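### Database Optimization
**Query optimization**:
```sql
-- Slow queries from the last hour (mysql.slow_log is only populated when log_output includes TABLE)
SELECT * FROM mysql.slow_log WHERE start_time > DATE_SUB(NOW(), INTERVAL 1 HOUR);
-- Composite indexes
CREATE INDEX idx_task_status_created ON group_tasks(status, createdAt);
CREATE INDEX idx_account_health_status ON tg_account_pool(healthScore, status);
-- Partition a large table by year. Every PRIMARY KEY/UNIQUE key must include createdAt,
-- and YEAR() cannot be used if createdAt is a TIMESTAMP column (DATETIME works).
-- The MAXVALUE partition accepts rows beyond the last year instead of rejecting them.
ALTER TABLE risk_logs PARTITION BY RANGE (YEAR(createdAt)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION pmax VALUES LESS THAN MAXVALUE
);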
```
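Yearly `RANGE` partitions need a new partition before each year starts; otherwise inserts for the new year fail unless a `MAXVALUE` catch-all exists (which can later be split with `REORGANIZE PARTITION`). A hypothetical helper that generates the partition clause for a span of years, so the statement does not have to be edited by hand:

```javascript
// Build a RANGE-by-YEAR partition clause for [fromYear, toYear], plus a
// MAXVALUE catch-all so inserts beyond the last year do not fail.
function yearPartitionClause(column, fromYear, toYear) {
  // The column name is interpolated into SQL, so accept plain identifiers only.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(column)) throw new Error('bad column name');
  const parts = [];
  for (let y = fromYear; y <= toYear; y++) {
    parts.push(`  PARTITION p${y} VALUES LESS THAN (${y + 1})`);
  }
  parts.push('  PARTITION pmax VALUES LESS THAN MAXVALUE');
  return `PARTITION BY RANGE (YEAR(${column})) (\n${parts.join(',\n')}\n)`;
}

console.log(`ALTER TABLE risk_logs ${yearPartitionClause('createdAt', 2023, 2025)};`);
```

Old years can then be removed with `ALTER TABLE risk_logs DROP PARTITION p2023`, which is much faster than a range `DELETE`.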
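**Server configuration**:
```ini
# MySQL configuration (my.cnf)
[mysqld]
# InnoDB
# Size this to the host's RAM; the MySQL manual suggests up to ~80% on a dedicated DB server
innodb_buffer_pool_size = 8G
# Deprecated since MySQL 8.0.30 in favour of innodb_redo_log_capacity
innodb_log_file_size = 512M
innodb_log_buffer_size = 128M
# Faster commits, but up to about one second of transactions can be lost on an OS crash or power failure
innodb_flush_log_at_trx_commit = 2
# Query cache: MySQL 5.7 and earlier only. It was removed in MySQL 8.0, where these
# options stop the server from starting, so leave them commented out on 8.0+.
# query_cache_type = 1
# query_cache_size = 512M
# query_cache_limit = 32M
# Connections
max_connections = 1000
thread_cache_size = 100
# Temporary tables
tmp_table_size = 256M
max_heap_table_size = 256M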
```
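`innodb_buffer_pool_size = 8G` assumes a server with considerably more RAM than that. The MySQL manual suggests up to about 80% of physical memory on a dedicated database server, and InnoDB adjusts the value to a multiple of `innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances`. A rough sizing helper under those assumptions; the 0.7 default fraction (for a host that also runs the application) and the chunk/instance defaults are illustrative, so check the real values with `SHOW VARIABLES LIKE 'innodb_buffer_pool_%'`:

```javascript
const MiB = 1024 * 1024;

// Suggest innodb_buffer_pool_size in bytes: `fraction` of RAM, rounded DOWN to a
// multiple of chunkSize * instances so InnoDB's adjustment cannot exceed the budget.
// Returns 0 if RAM is below one unit; use instances = 1 on small hosts.
function suggestBufferPool(ramBytes, { fraction = 0.7, chunkSize = 128 * MiB, instances = 8 } = {}) {
  const unit = chunkSize * instances;
  const target = Math.floor(ramBytes * fraction);
  return Math.floor(target / unit) * unit;
}

// 16 GiB host: 0.7 * 16 GiB = 11.2 GiB, rounded down to 11 GiB (unit = 1 GiB).
console.log(suggestBufferPool(16 * 1024 * MiB) / 1024 / MiB, 'GiB');
```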
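### Cache Optimization
**Redis tuning**:
```bash
# Runtime settings. CONFIG SET is lost on restart unless followed by CONFIG REWRITE or
# mirrored in redis.conf; maxmemory-policy only takes effect when maxmemory is set.
redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG SET tcp-keepalive 300
redis-cli CONFIG SET timeout 0
# Set a 1-hour TTL on every cache:account:* key. KEYS blocks Redis while it walks the
# whole keyspace, so avoid running this on a large production dataset.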
redis-cli EVAL "
local keys = redis.call('KEYS', 'cache:account:*')
for i=1,#keys do
redis.call('EXPIRE', keys[i], 3600)
end
return #keys
" 0
```
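`KEYS` inside `EVAL` scans the whole keyspace in one blocking call. A non-blocking alternative iterates with `SCAN` and sets TTLs in small pipelined batches. The sketch assumes an ioredis-style client whose `scan(cursor, 'MATCH', pattern, 'COUNT', n)` resolves to `[nextCursor, keys]` and which provides `pipeline()`; the function name is hypothetical:

```javascript
// Set a TTL on every key matching `pattern`, iterating with SCAN instead of KEYS.
// Returns the number of keys processed. SCAN can return a key more than once;
// re-applying EXPIRE is harmless, but the count may then be slightly high.
async function expireMatching(client, pattern, ttlSeconds, count = 500) {
  let cursor = '0';
  let touched = 0;
  do {
    const [next, keys] = await client.scan(cursor, 'MATCH', pattern, 'COUNT', count);
    cursor = next;
    if (keys.length > 0) {
      const pipe = client.pipeline(); // one round trip per batch
      for (const key of keys) pipe.expire(key, ttlSeconds);
      await pipe.exec();
      touched += keys.length;
    }
  } while (cursor !== '0'); // a cursor of "0" means the iteration is complete
  return touched;
}
```

Usage, assuming an ioredis instance `redis`: `await expireMatching(redis, 'cache:account:*', 3600)`.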
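**Application caching strategy**:
```javascript
// Two-level cache: L1 = in-process Map, L2 = Redis.
// `redisClient` is assumed to be an ioredis instance (get/setex).
class CacheManager {
  constructor(redisClient, { l1TtlMs = 60000, l1MaxEntries = 10000 } = {}) {
    this.l1Cache = new Map(); // key -> { value, expiresAt }
    this.l2Cache = redisClient;
    this.l1TtlMs = l1TtlMs; // short L1 lifetime limits staleness when another process updates a key
    this.l1MaxEntries = l1MaxEntries; // bounds L1 memory use
  }

  setL1(key, value) {
    // When full, evict the oldest inserted entry (Map preserves insertion order)
    if (this.l1Cache.size >= this.l1MaxEntries && !this.l1Cache.has(key)) {
      this.l1Cache.delete(this.l1Cache.keys().next().value);
    }
    this.l1Cache.set(key, { value, expiresAt: Date.now() + this.l1TtlMs });
  }

  async get(key) {
    // L1 lookup, ignoring expired entries
    const entry = this.l1Cache.get(key);
    if (entry && entry.expiresAt > Date.now()) {
      return entry.value;
    }
    this.l1Cache.delete(key);
    // L2 lookup; Redis returns null for a missing key
    const raw = await this.l2Cache.get(key);
    if (raw === null) {
      return null;
    }
    const value = JSON.parse(raw); // parse once and reuse
    this.setL1(key, value);
    return value;
  }

  async set(key, value, ttl = 3600) {
    this.setL1(key, value);
    await this.l2Cache.setex(key, ttl, JSON.stringify(value)); // ttl in seconds
  }
}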
```
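## Security Management
### Access Control
**Ops user management**:
```bash
# Create an ops user (the admin group is "sudo" on Debian/Ubuntu, "wheel" on RHEL-family systems)
sudo useradd -m -s /bin/bash telegram-ops
sudo usermod -aG sudo telegram-ops
# Set up SSH key authentication (the files belong to telegram-ops, so these steps need root too)
sudo mkdir -p /home/telegram-ops/.ssh
sudo tee -a /home/telegram-ops/.ssh/authorized_keys > /dev/null << EOF
ssh-rsa YOUR_PUBLIC_KEY telegram-ops@management
EOF
sudo chmod 700 /home/telegram-ops/.ssh
sudo chmod 600 /home/telegram-ops/.ssh/authorized_keys
sudo chown -R telegram-ops:telegram-ops /home/telegram-ops/.ssh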
```
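**Database accounts**:
```sql
-- Read-only account for monitoring
CREATE USER 'monitor'@'localhost' IDENTIFIED BY 'monitor_password';
GRANT SELECT ON telegram_management.* TO 'monitor'@'localhost';
-- Backup account. mysqldump --single-transaction --routines --triggers also needs SHOW VIEW
-- and TRIGGER; dumping tablespace information additionally needs the global PROCESS
-- privilege unless mysqldump is run with --no-tablespaces.
CREATE USER 'backup'@'localhost' IDENTIFIED BY 'backup_password';
GRANT SELECT, LOCK TABLES, SHOW VIEW, TRIGGER ON telegram_management.* TO 'backup'@'localhost';
-- Rotate passwords regularly (and replace the placeholder passwords above)
ALTER USER 'tg_user'@'localhost' IDENTIFIED BY 'new_secure_password';
-- Not required after CREATE USER / GRANT / ALTER USER, but harmless
FLUSH PRIVILEGES;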
```
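For the periodic password rotation above, a random password can be generated with Node's built-in `crypto` module rather than chosen by hand. A minimal sketch (Node.js 16+ for the `base64url` encoding); the 24-byte length is an assumption, and base64url output avoids quote characters that would need escaping inside the SQL string literal. If the server's `validate_password` policy demands particular special characters, the output may need adjusting:

```javascript
const crypto = require('node:crypto');

// 24 random bytes -> 32 base64url characters (A-Z, a-z, 0-9, '-', '_'), no padding.
function generatePassword(bytes = 24) {
  return crypto.randomBytes(bytes).toString('base64url');
}

// Print a statement to paste into the mysql client (keep it out of shell history).
console.log(`ALTER USER 'tg_user'@'localhost' IDENTIFIED BY '${generatePassword()}';`);
```

After rotating, update the application's database credentials and restart the backend, otherwise new connections from the pool will fail.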
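### Security Auditing
**Log audit script** (`security-audit.sh`):
```bash
#!/bin/bash
AUDIT_LOG="/var/log/security-audit.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Security audit started" >> "$AUDIT_LOG"
# Failed SSH logins (/var/log/auth.log on Debian/Ubuntu, /var/log/secure on RHEL)
FAILED_LOGINS=$(grep -c "Failed password" /var/log/auth.log)
echo "[$DATE] Failed login attempts: $FAILED_LOGINS" >> "$AUDIT_LOG"
# World-writable files under the application directory
find /var/www/telegram-management -type f -perm /o+w >> "$AUDIT_LOG"
# Shells running as root (user, pid, elapsed time, command line)
ps -eo user,pid,etime,args | awk '$1 == "root" && $4 ~ /(^|\/|-)(ba)?sh$/' >> "$AUDIT_LOG"
# Established connections on port 3000
echo "[$DATE] Established connections on :3000: $(netstat -an | grep ':3000 ' | grep -c ESTABLISHED)" >> "$AUDIT_LOG"
echo "[$DATE] Security audit finished" >> "$AUDIT_LOG"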
```
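**Security hardening checks**:
```bash
# Pending package updates
sudo apt list --upgradable
# Open ports (-O OS detection requires root)
sudo nmap -sT -O localhost
# File integrity: record a baseline after a known-good deployment...
find /var/www/telegram-management -type f -name "*.js" -exec sha256sum {} \; > checksums.txt
# ...and verify against it later (reports only mismatched or missing files)
sha256sum -c --quiet checksums.txt
# SSL certificate expiry date, and a check that fails if it expires within 30 days
openssl x509 -in /path/to/cert.pem -noout -enddate
openssl x509 -in /path/to/cert.pem -noout -checkend 2592000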
```
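A checksum comparison only means something against a baseline taken from a known-good deployment. The same idea in Node.js: hash each file and diff against the stored baseline, reporting added, removed and modified paths. A sketch; the function names and the idea of storing the baseline as JSON are assumptions:

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Recursively hash all .js files under `root`: { relativePath: sha256hex }.
function hashTree(root) {
  const result = {};
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile() && entry.name.endsWith('.js')) {
        const hash = crypto.createHash('sha256').update(fs.readFileSync(full)).digest('hex');
        result[path.relative(root, full)] = hash;
      }
    }
  };
  walk(root);
  return result;
}

// Compare a saved baseline with the current hashes.
function diffHashes(baseline, current) {
  return {
    added: Object.keys(current).filter((p) => !(p in baseline)),
    removed: Object.keys(baseline).filter((p) => !(p in current)),
    modified: Object.keys(current).filter((p) => p in baseline && baseline[p] !== current[p]),
  };
}
```

After a deployment, save `JSON.stringify(hashTree('/var/www/telegram-management'))` to a file outside the application directory; later runs load it and pass it to `diffHashes` together with a fresh `hashTree`.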
## Backup and Recovery
### Automated Backups
**Full backup script** (`full-backup.sh`):
```bash
#!/bin/bash
BACKUP_BASE="/backup"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30
# BACKUP_PASS must be set in the environment (e.g. from a root-only env file)
# Create the backup directories
mkdir -p "$BACKUP_BASE"/{mysql,redis,files,logs}/"$DATE"
# Database backup (--no-tablespaces avoids needing the global PROCESS privilege)
mysqldump -u backup -p"$BACKUP_PASS" --single-transaction --routines --triggers --no-tablespaces telegram_management > "$BACKUP_BASE/mysql/$DATE/full_backup.sql"
gzip "$BACKUP_BASE/mysql/$DATE/full_backup.sql"
# Redis backup (redis-cli --rdb downloads an RDB snapshot from the running server)
redis-cli --rdb "$BACKUP_BASE/redis/$DATE/dump.rdb"
# File backups (tar strips the leading /, so these archives are restored with -C /)
tar -czf "$BACKUP_BASE/files/$DATE/application.tar.gz" /var/www/telegram-management
tar -czf "$BACKUP_BASE/files/$DATE/sessions.tar.gz" /var/www/telegram-management/backend/sessions
# Log backup
tar -czf "$BACKUP_BASE/logs/$DATE/logs.tar.gz" /var/www/telegram-management/backend/logs
# Write the backup manifest
cat > "$BACKUP_BASE/manifest_$DATE.txt" << EOF
Backup time: $(date)
Database size: $(du -h "$BACKUP_BASE/mysql/$DATE/full_backup.sql.gz" | cut -f1)
Redis size: $(du -h "$BACKUP_BASE/redis/$DATE/dump.rdb" | cut -f1)
Application files size: $(du -h "$BACKUP_BASE/files/$DATE/application.tar.gz" | cut -f1)
Session files size: $(du -h "$BACKUP_BASE/files/$DATE/sessions.tar.gz" | cut -f1)
Log files size: $(du -h "$BACKUP_BASE/logs/$DATE/logs.tar.gz" | cut -f1)
EOF
# Remove expired backups and the empty directories they leave (-mindepth 1 keeps $BACKUP_BASE itself)
find "$BACKUP_BASE" -type f -mtime +"$RETENTION_DAYS" -delete
find "$BACKUP_BASE" -mindepth 1 -type d -empty -delete
echo "Backup finished: $DATE"
```
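`find -mtime` selects by file modification time, which changes when backups are copied back from offsite storage without preserving timestamps. Since `full-backup.sh` already names each run by its timestamp (`YYYYMMDD_HHMMSS`), retention can instead be decided from the directory names. A sketch of that selection; it only lists candidates and never deletes, always keeps the newest backup, and its function names are illustrative:

```javascript
// Parse a backup directory name such as "20240101_020000" (local time) into a Date.
function parseBackupName(name) {
  const m = /^(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})$/.exec(name);
  if (!m) return null;
  const [y, mo, d, h, mi, s] = m.slice(1).map(Number);
  return new Date(y, mo - 1, d, h, mi, s);
}

// Names older than `retentionDays` relative to `now`. Names that do not parse
// are ignored, and the most recent backup is never selected.
function expiredBackups(names, retentionDays, now = new Date()) {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return names
    .map((name) => ({ name, date: parseBackupName(name) }))
    .filter((b) => b.date !== null)
    .sort((a, b) => a.date - b.date)
    .slice(0, -1) // keep the newest backup even if it is old
    .filter((b) => b.date.getTime() < cutoff)
    .map((b) => b.name);
}
```

It would be applied per subdirectory (`/backup/mysql`, `/backup/redis`, and so on), since each holds one directory per run.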
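### Restore Procedures
**Database restore**:
```bash
# Full restore from a plain SQL dump
mysql -u root -p telegram_management < backup_file.sql
# Full restore from a gzip-compressed dump (as written by full-backup.sh)
gunzip < backup_file.sql.gz | mysql -u root -p telegram_management
# Restore a single table. This assumes the backup was first loaded into a separate staging
# database named backup_telegram_management. Both clients below prompt for a password at
# the same time, so storing credentials in ~/.my.cnf makes this step easier.
mysql -u root -p telegram_management -e "DROP TABLE IF EXISTS group_tasks;"
mysqldump -u backup -p backup_telegram_management group_tasks | mysql -u root -p telegram_management
# Verify the restore
mysql -u root -p -e "SELECT COUNT(*) FROM telegram_management.group_tasks;"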
```
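Since `full-backup.sh` stores the database dump gzip-compressed, it is worth confirming the archive is intact before dropping anything (`gzip -t` does this on the command line). The equivalent check in Node.js, streaming so that large dumps are never held in memory; the function name is hypothetical:

```javascript
const fs = require('node:fs');
const zlib = require('node:zlib');
const { pipeline } = require('node:stream/promises');
const { Writable } = require('node:stream');

// Resolve with the uncompressed size if `file` is a complete gzip stream;
// reject (e.g. "unexpected end of file") if it is truncated or corrupt.
async function verifyGzip(file) {
  let bytes = 0;
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      bytes += chunk.length; // count and discard the decompressed data
      callback();
    },
  });
  await pipeline(fs.createReadStream(file), zlib.createGunzip(), sink);
  return bytes;
}
```

The uncompressed size can also be compared with earlier runs: a dump far smaller than usual is a warning sign even when the archive itself is valid.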
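**Application restore**:
```bash
# Stop the service
pm2 stop telegram-management-backend
# Move the current tree aside instead of deleting it, in case anything in it is still needed
cd /var/www
sudo mv telegram-management "telegram-management.old.$(date +%Y%m%d%H%M%S)"
# Restore the application files. The archive stores paths as var/www/telegram-management/...,
# so extract relative to / (extracting inside /var/www would create /var/www/var/www/...)
sudo tar -xzf /backup/files/20240101_020000/application.tar.gz -C /
# Restore session files (same path layout; only needed to take sessions from a different snapshot)
sudo tar -xzf /backup/files/20240101_020000/sessions.tar.gz -C /
# Restore ownership and permissions
sudo chown -R telegram-ops:telegram-ops /var/www/telegram-management
sudo chmod +x /var/www/telegram-management/backend/src/app.js
# Restart the service (assuming ecosystem.config.js lives in backend/, as in rolling-update.sh)
cd /var/www/telegram-management/backend
pm2 start ecosystem.config.js --env production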
```
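### Disaster Recovery
**Failover procedure**:
```bash
# 1. Assess the impact
curl -I http://primary-server:3000/health
ping primary-server
# 2. Point DNS at the standby server
# (the exact steps depend on the DNS provider; a low TTL set in advance speeds this up,
#  and switching only after step 4 succeeds avoids sending users to an unready server)
# 3. Restore the latest backup on the standby server
./restore-from-backup.sh latest
# 4. Verify the service
curl -I http://backup-server:3000/health
./health-check.sh
# 5. Notify the people involved
echo "Failover complete; traffic is now served by the standby server" | mail -s "Failover notice" team@company.com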
```
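## Version Updates
### Rolling Update Procedure
**Update script** (`rolling-update.sh`). Despite its name, it stops the service while files are replaced, so expect a short outage: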
```bash
#!/bin/bash
NEW_VERSION=$1
BACKUP_DIR="/backup/pre-update-$(date +%Y%m%d_%H%M%S)"
APP_DIR="/var/www/telegram-management"
SRC_DIR="/tmp/telegram-management-system"
if [ -z "$NEW_VERSION" ]; then
echo "Usage: $0 <version>"
exit 1
fi
echo "Updating to version: $NEW_VERSION"
# 1. Back up the current deployment (application files only; the database is not included)
echo "Creating pre-update backup..."
mkdir -p "$BACKUP_DIR"
cp -r "$APP_DIR" "$BACKUP_DIR/"
# 2. Fetch the new version (removing any checkout left over from a previous run)
echo "Downloading new version..."
rm -rf "$SRC_DIR"
git clone --depth 1 -b "$NEW_VERSION" https://github.com/your-org/telegram-management-system.git "$SRC_DIR" || exit 1
cd "$SRC_DIR" || exit 1
# 3. Show backend dependency changes
echo "Checking dependency changes..."
diff backend/package.json "$APP_DIR/backend/package.json"
# 4. Check for pending database migrations
echo "Checking database migrations..."
cd backend || exit 1
npm run migrate:check
# 5. Build the frontend
echo "Building frontend..."
cd ../frontend || exit 1
npm install
npm run build || exit 1
# 6. Stop the service
echo "Stopping service..."
pm2 stop telegram-management-backend
# 7. Deploy the new version
echo "Deploying new version..."
cp -r "$SRC_DIR"/backend/* "$APP_DIR/backend/"
cp -r "$SRC_DIR"/frontend/dist/* "$APP_DIR/frontend/dist/"
# 8. Install production dependencies (npm 7+ prefers --omit=dev)
cd "$APP_DIR/backend" || exit 1
npm install --production
# 9. Run database migrations (the rollback below restores files only, not the schema)
npm run migrate
# 10. Start the service
echo "Starting service..."
pm2 start ecosystem.config.js --env production
# 11. Health check
sleep 10
if curl -fsS -o /dev/null http://localhost:3000/health; then
echo "Update succeeded!"
# Remove the temporary checkout
rm -rf "$SRC_DIR"
else
echo "Update failed, rolling back..."
pm2 stop telegram-management-backend
cp -r "$BACKUP_DIR/telegram-management/"* "$APP_DIR/"
pm2 start ecosystem.config.js --env production
fi
```
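### Rollback
**Quick rollback script** (`rollback.sh`):
```bash
#!/bin/bash
BACKUP_DIR=$1
if [ -z "$BACKUP_DIR" ]; then
echo "Usage: $0 <backup directory>"
exit 1
fi
echo "Rolling back to: $BACKUP_DIR"
# Stop the current service
pm2 stop telegram-management-backend
# Restore the application files
cp -r "$BACKUP_DIR/telegram-management/"* /var/www/telegram-management/
# Restore the database if the backup contains a dump (rolling-update.sh does not create one)
if [ -f "$BACKUP_DIR/database.sql" ]; then
mysql -u root -p telegram_management < "$BACKUP_DIR/database.sql"
fi
# Restart the service (assuming ecosystem.config.js lives in backend/, as in rolling-update.sh)
cd /var/www/telegram-management/backend || exit 1
pm2 start ecosystem.config.js --env production
# Verify the rollback
sleep 10
if curl -fsS -o /dev/null http://localhost:3000/health; then
echo "Rollback succeeded!"
else
echo "Rollback failed, check manually!"
fi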
```
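## Emergency Response
### Response Process
**P0 incident response**:
1. **Immediate response** (0-5 minutes)
   - Confirm the incident and assess its impact
   - Assemble the incident response team
   - Attempt quick recovery actions
2. **Mitigation** (5-15 minutes)
   - Apply a temporary workaround
   - Switch to standby systems (if available)
   - Notify users and stakeholders
3. **Root cause analysis** (15 minutes to 1 hour)
   - Collect information related to the incident
   - Identify the root cause
   - Draw up a remediation plan
4. **Permanent fix** (1-4 hours)
   - Implement the permanent fix
   - Verify that it works
   - Update monitoring and alerting
5. **Post-incident review** (within 24 hours)
   - Write the incident report
   - Capture lessons learned
   - Improve preventive measures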
### Emergency Contacts
**Contact list**:
```bash
# On-call contacts
PRIMARY_ONCALL="张三 <zhangsan@company.com> +86-138-0000-0000"
SECONDARY_ONCALL="李四 <lisi@company.com> +86-138-1111-1111"
MANAGER="王五 <wangwu@company.com> +86-138-2222-2222"
# External service contacts
CLOUD_PROVIDER_SUPPORT="+86-400-xxx-xxxx"
DNS_PROVIDER_SUPPORT="support@dns-provider.com"
SSL_PROVIDER_SUPPORT="support@ssl-provider.com"
```
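### Incident Notification Template
**Incident notification email template**:
```
Subject: [P0 Incident] Telegram Management System service disruption
Severity: P0 - Critical
Start time: 2024-01-01 14:30:00
Impact: All users
Symptoms: Service unresponsive; all API calls failing
Current status: Being handled
Estimated recovery: 15:00:00
Actions taken:
1. Restarted the application service
2. Checked database connectivity
3. Brought up the standby server
The next update will be sent within 30 minutes.
Operations Team
Telegram Management System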
```
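Filling in the template by hand during an incident invites mistakes in times and fields. A small sketch that renders it from an incident record; the field names are assumptions that mirror the template lines:

```javascript
// Render the incident notification email from a plain object.
function renderIncidentEmail(incident) {
  const actions = incident.actions.map((a, i) => `${i + 1}. ${a}`).join('\n');
  return [
    `Subject: [${incident.level} Incident] Telegram Management System service disruption`,
    `Severity: ${incident.level} - ${incident.levelName}`,
    `Start time: ${incident.startedAt}`,
    `Impact: ${incident.impact}`,
    `Symptoms: ${incident.symptoms}`,
    `Current status: ${incident.status}`,
    `Estimated recovery: ${incident.eta}`,
    'Actions taken:',
    actions,
    `The next update will be sent within ${incident.nextUpdateMinutes} minutes.`,
    'Operations Team',
    'Telegram Management System',
  ].join('\n');
}
```

Keeping the incident record as data also makes it easy to reuse for the follow-up updates and the post-incident report.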
---
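This manual covers the full range of operational work for the Telegram Management System: daily operations, monitoring, fault handling, performance tuning, security management, backup and recovery, version updates, and emergency response. The operations team should follow these procedures closely to keep the system running reliably.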