const MGroupService = require("@src/service/MGroupService");
const MGroupUserService = require("@src/service/MGroupUserService");
const MTgAccountService = require("@src/service/MTgAccountService");
const AccountUsage = require("@src/util/AccountUsage");
const logger = require("@src/util/Log4jUtil");
const MApiDataService = require("@src/service/MApiDataService");
const ClientBus = require("@src/client/ClientBus");
const { Api } = require("telegram");
const GroupMembersService = require("@src/dbService/GroupMembersService");
const MGroupMembers = require("@src/dbModes/MGroupMembers");
const axios = require("axios");
const cheerio = require("cheerio");
const MUserService = require("@src/service/MUserService");
const { Op } = require("sequelize");
const RedisService = require("@src/service/RedisService");

// Collects group members
class GroupCollectionMemberBus {

    static getInstance() {
        // Locking
        if (!GroupCollectionMemberBus.instance) {
            GroupCollectionMemberBus.instance = new GroupCollectionMemberBus();
        }
        return GroupCollectionMemberBus.instance;
    }

    constructor() {
        this.logger = logger.getLogger("GroupCollectionMemberBus");
        // Ids of the groups currently being collected
        this.collectionGroupIdsObj = {};
    }

    // Keys (group ids) of the groups currently being collected
    getAllCollectionGroupIds() {
        return Object.keys(this.collectionGroupIdsObj);
    }

    // Get a Telegram account to use for the given group id
    async getAccount(groupId) {
        // Looking up an account that has already joined the group is not necessarily better,
        // because the group may have been disbanded or the account kicked out.
        // Previously we first looked for an account already in the group and fell back to any usable account.
        //let account = await MTgAccountService.getInstance().getRandomAccountByUsageIdAndAddGroupId(AccountUsage.采集,groupId);
        let account = await MTgAccountService.getInstance().getOneCollectAccountByApiName("getParticipants");
        if (!account) {
            this.logger.info("No usable account");
        }
        this.logger.info("Member collection, account obtained: " + JSON.stringify(account));
        return account;
    }

    // Get a client for the account
    async getClient(account) {
        // Pick a random apiId
        let item = await MApiDataService.getInstance().getCanWorkRandomItem();
        if (!item) {
            this.logger.error("Member collection: no apiId available");
            return null;
        }
        let client = await ClientBus.getInstance().getClient(account, item.apiId, item.apiHash, true);
        if (!client) {
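            this.logger.error("Member collection: no client");
            return null;
        }
        try {
            await client.connect();
        } catch (e) {
            this.logger.error("Member collection: client connection failed: " + e);
        }
        return client;
    }

    // Get the description from the public t.me page
    async getDescriptionFromWebByUsername(username) {
        let desc = null;
        // Fetch the page over HTTP and scrape the description from it
        let tgPage = await axios({
            method: "GET",
            url: "https://t.me/" + username,
        });
        if (tgPage.status === 200) {
            let $ = cheerio.load(tgPage.data);
            desc = $(".tgme_page_description").text();
        }
        return desc;
    }

    // Get the description through the API
    async getDescriptionByApi(client, username) {
        let desc = null;
        // Fetch the description through the API
        let tgPage = await client.getChat(username);
        if (tgPage.status === 200) {
            desc = tgPage.data.description;
        }
        return desc;
    }

    // Collects all members of the given group. Can collection run for several groups at once?
    // Return value: originally only a single group was collected, so no return value was needed.
    // Now that group-set collection calls this method, a return value is needed so the caller can handle the outcome.
    // Make sure every branch returns a proper value.
    async start(groupId) {
        // if(this.collectionGroupIdsObj[groupId])return;
        let group = await MGroupService.getInstance().findById(groupId);
        if (!group) return {
            err: "群组不存在" // "group does not exist"
        };

        let account = await this.getAccount(groupId);
        if (!account) {
            this.logger.error("Member collection: no account found");
            return {
                err: "群成员采集找不到账号" // "member collection could not find an account"
            };
        }
        let client = await this.getClient(account);
        if (!client) {
            this.logger.error("Member collection: no client found");
            return {
                err: "群成员采集找不到client" // "member collection could not find a client"
            };
        }
        this.collectionGroupIdsObj[groupId] = 1;

        let groupInfo;
        let channelId;
        let accessHash;
        let fullChannelInfo = null;
        let arr = [];
        let offset = 0;

        // Drop the cached client and clear the local references
        let deleteClient = () => {
            if (client) {
                ClientBus.getInstance().deleteCache(client);
            }
            account = null;
            client = null;
        }
        // Currently members are invited only after collection has finished.
        // Does this define a function or execute it? It defines one, bound to a name.
        let finished = () => {
            delete this.collectionGroupIdsObj[groupId];
        }
        // Turn the endless loop into recursion?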
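        // waitUntil().interval(500)
        // .times(100000000)
        while (true) {
            if (!client) {
                account = await this.getAccount(groupId);
                if (!account) {
                    this.logger.error("Member collection: no account found");
                    finished();
                    return {
                        err: "群成员采集找不到账号" // "member collection could not find an account"
                    };
                }
                client = await this.getClient(account);
                if (!client) {
                    this.logger.error("Member collection: no client found");
                    finished();
                    return {
                        err: "群成员采集找不到client" // "member collection could not find a client"
                    };
                }
            }

            if (!groupInfo || groupInfo.err) { // group info is missing or carries an error
                groupInfo = await client.getGroupInfoByLink(group.link);
                if (!groupInfo || groupInfo.err) {
                    this.logger.error("Member collection: failed to get group info, skipping");
                    const err = groupInfo && groupInfo.err;
                    // True when the error text contains the given code.
                    // indexOf returns -1 (truthy) on a miss, so it must be compared explicitly.
                    const errHas = code => Boolean(err) && err.indexOf(code) !== -1;
                    if (err) {
                        this.logger.error("GroupInfo.err : " + err);
                    }
                    if (errHas("USERNAME_NOT_OCCUPIED")
                        || errHas("CHANNEL_INVALID")
                        || errHas("USERNAME_INVALID")) { // USERNAME_INVALID was missed before, which caused an endless loop
                        deleteClient();
                        finished();
                        return {
                            err: err
                        }
                    }
                    if (errHas("CHAT_ADMIN_REQUIRED")) {
                        // It is a channel and the account has no admin rights, so leave the loop.
                        this.logger.info("This group is actually a channel and the current account is not an admin, group username: " + group.username);
                        // Should this return, or do something else?
                        deleteClient();
                        finished();
                        return {
                            err: err
                        }
                    }
                    if (errHas("群组已经被禁用")) { // error text meaning "the group has been disabled"
                        // Remove this kind of group from the group set as well.
                        // The account can no longer be used here: usually it was banned by the group, or the account is abnormal.
                        // Some large official verified groups, such as https://t.me/BinanceRussian, produce this error,
                        // or return only a hundred-odd members even though the group has well over a hundred thousand.
                        deleteClient();
                        finished();
                        return {
                            err: err
                        }
                    }
                    // Other errors: check whether the client itself is the problem
                    if (!client.canContinue()) {
                        deleteClient();
                    }
                    continue; // back to the top of the loop, which fetches a new account if the client was dropped
                }
                // Group info was fetched without error
                channelId = groupInfo.id;
                accessHash = groupInfo.accessHash;
            } // end: group info missing or carrying an error

            //fullChannelInfo =await client.getFullChannel(channelId, accessHash);
            // This call is rate-limited to once per second, so should we wait one second here?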
            // Forgetting the await also causes an error: the call becomes fire-and-forget, and once the members
            // have been fetched a message appears saying a network operation was attempted after disconnect.
            //this.logger.info("fullChannelInfo: "+ JSON.stringify(fullChannelInfo));
            // Sample fullChannelInfo output (binary Buffer payloads, photo sizes and repeated null fields elided):
            // {"fullChat":{"flags":139273,"canViewParticipants":true,"canSetUsername":false,"canSetStickers":false,
            //    "hiddenPrehistory":false,"canSetLocation":false,"hasScheduled":false,"canViewStats":false,"blocked":false,
            //    "flags2":0,"canDeleteChannel":false,"id":"1334373934","about":"","participantsCount":4934,
            //    "adminsCount":null,"kickedCount":null,"bannedCount":null,"onlineCount":40,"readInboxMaxId":0,
            //    "readOutboxMaxId":157777,"unreadCount":0,
            //    "chatPhoto":{"flags":0,"hasStickers":false,"id":"6174610775218432400","accessHash":"-5621437017086602737",
            //       "fileReference":{"type":"Buffer","data":[...]},"date":1588980498,"sizes":[...],
            //       "videoSizes":null,"dcId":5,"className":"Photo"},
            //    "notifySettings":{"flags":0,...,"className":"PeerNotifySettings"},
            //    "exportedInvite":null,"botInfo":[],...,"slowmodeSeconds":30,...,"pts":212205,...,"className":"ChannelFull"},
            //  "chats":[{"flags":4464964,"creator":false,"left":true,"broadcast":false,"verified":false,"megagroup":true,
            //    "restricted":false,...,"slowmodeEnabled":true,...,"id":"1334373934","accessHash":"-1124390018036649201",
            //    "title":"CET 2021 (Ck)","username":"CkCET2021","photo":{...,"className":"ChatPhoto"},"date":1586679026,
            //    "defaultBannedRights":{"flags":165368,...,"className":"ChatBannedRights"},
            //    "participantsCount":null,"className":"Channel"}],
            //  "users":[],"className":"messages.ChatFull"}

            this.logger.info("About to fetch group members");
            let res = null;
            try {
                res = await client.getParticipants({
                    channel: new Api.InputChannel({ channelId, accessHash }),
                    // Filter that selects which participants to fetch: admins, kicked users, and so on
                    filter: new Api.ChannelParticipantsRecent({}),
                    offset: offset,
                    limit: 10000,
                    hash: 2176980
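                    // Not sure yet what this value is for
                });
            } catch (e) {
                this.logger.error(e.toString());
            }

            this.logger.info("Saving the record of this getParticipants call");
            await RedisService.getInstance().saveGetParticipantsRecord(client.phone, groupId, new Date().getTime());

            // Fetching members failed, or the page is empty
            if (!res || res.err || !res.users || res.users.length === 0) {
                if (!client.canContinue()) {
                    // Check the client before and after every request?
                    deleteClient();
                }
                if (res && res.err && res.err.indexOf("CHAT_ADMIN_REQUIRED") !== -1) {
                    // This is a channel. The database may say it is a group, but a group can be turned into a channel,
                    // so it has to be checked here.
                    // The outcome must be returned so the caller can drop channels from the group set,
                    // or the group set could be filtered at the source.
                    this.logger.info("This group is actually a channel and the current account is not an admin, group username: " + group.username);
                    deleteClient();
                    finished();
                    return {
                        err: res.err
                    }
                    // return {
                    //     err:"频道读取成员需要是管理员权限:CHAT_ADMIN_REQUIRED"
                    // }
                }
                if (res && res.users && res.users.length === 0) {
                    this.logger.info("Member collection: fetch failed or this was the last page, leaving the loop");
                    // Zero users could also simply mean the last page
                    break;
                }
                // Other errors, for example the group does not exist or is not actually a group
                continue;
            }

            this.logger.info("Members fetched in this page: " + res.users.length);
            // NOTE: the source text is truncated between `for(let i=0;i` and the filter callback below.
            // The loop body, the offset update, the end of the while loop and the `arr = arr.filter(` head
            // are reconstructed from context. They are an assumption, not the original code.
            for (let i = 0; i < res.users.length; i++) {
                arr.push(res.users[i]);
            }
            offset += res.users.length;
        }

        arr = arr.filter(item => {
            // Check for bots; many bots turn out to have no firstname or lastname
            let isOk = true
            if (item.bot || (!item.firstName && !item.lastName) || item.deleted || !item.username) {
                // Skip users without a username for now; handle them later once users can be invited by uid.
                // this.logger.info("bot, not inserted");
                isOk = false
            }
            return isOk;
        });

        await this.saveGroupUserTran(arr, groupId)

        deleteClient();
        finished();
        // Return the collected members that passed the filter
        return arr;
    }

    async saveGroupUserTran(tgUserArr, groupId) {
        await this.saveUser(tgUserArr, groupId)
        await this.saveUserGroup(tgUserArr, groupId)
        await MGroupService.getInstance().updateUserCountAndLastUpdateTime(groupId);
    }

    async saveUser(userArray, groupId, opts) {
        // Drop users that are already in the database
        this.logger.info("User insert started", userArray.length)
        let memberIdArr = userArray.map(item => {
            return item.id
        })
        let dbUserArr = await MUserService.getInstance().findAllByParam({
            attributes: ["user_id"],
            where: {
                user_id: {
                    [Op.in]: memberIdArr
                }
            }
        })
        let dbUserIdArr = dbUserArr.map(e => e['user_id'])

        // Filter out the users already stored in the database
        userArray = userArray.filter(e => !dbUserIdArr.includes(e.id))

        // Convert to database objects
        let userDbArray = userArray.map(item => {
            return {
                group_id: groupId,
                username: item.username, // username, may be empty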
                //fist_name:item.firstName,
                // Misspelling first_name as fist_name left the field empty, yet the insert still succeeded
                // without any error; the rows simply had no firstname or lastname.
                // Field names should not be typed by hand; going through objects is less error-prone.
                // This took a long time to debug.
                first_name: item.firstName,
                last_name: item.lastName, // must not be empty
                user_id: item.id,
                // Stored as a string to avoid
                // SequelizeValidationError: string violation: access_hash cannot be an array or an object
                access_hash: item.accessHash + "",
                phone: item.phone, // may be empty
                // description: ,
                //user_status:item.status.className, // throws when status is empty; needs handling
                update_status_time: new Date().getTime(),
                update_time: new Date().getTime(),
                //last_online_time:item.status.wasOnline, // throws when status is empty; needs handling
            }
        })
        // Bulk insert
        let result = await MUserService.getInstance().bulkCreate(userDbArray, opts)
        this.logger.debug("User insert finished", userDbArray.length)
        return result;
    }

    // Save the relation between users and the group
    async saveUserGroup(tgUserArray, groupId, opts = {}) {
        if (!tgUserArray || tgUserArray.length === 0 || !groupId) {
            return []
        }
        this.logger.debug("User-group relation insert started", tgUserArray.length)
        const tgUserIdArr = tgUserArray.map(e => e.id);
        // Look up the database ids of these users
        let userDbArr = await MUserService.getInstance().getModel().findAll({
            attributes: ['id'],
            where: {
                user_id: { [Op.in]: tgUserIdArr }
            }
        })
        let dbUserIdArr = userDbArr.map(e => e.id)
        // Query the existing rows in the relation table
        const guDbArr = await MGroupUserService.getInstance().findAllByParam({
            attributes: ['user_id'],
            where: {
                group_id: groupId,
                user_id: {
                    [Op.in]: dbUserIdArr
                }
            }
        })
        // Filter out relations that already exist
        const guDbUserIdArr = guDbArr.map(e => e.user_id)
        dbUserIdArr = dbUserIdArr.filter(e => !guDbUserIdArr.includes(e))

        // Insert the group-user relations
        const groupUserDbArr = dbUserIdArr.map(e => {
            return {
                user_id: e,
                group_id: groupId
            }
        })
        const result = await MGroupUserService.getInstance().getModel().bulkCreate(groupUserDbArr, opts)
        this.logger.debug("User-group relation insert finished", groupUserDbArr.length)
        return result
    }
}

module.exports = GroupCollectionMemberBus;

// Insert into the mongo database
// let member=new MGroupMembers();
// member.groupId=groupId;
// member.username=item.username;
// member.firstname=item.firstName;
// member.lastname=item.lastName;
// member.desc=desc;
// let status=item.status;
// if(status){
//     member.userStatus=status.className;
//     member.updateStatusTime=new Date().getTime();
//     if(status.className === "userStatusOffline"){
//         member.lastOnlineTime=status.wasOnline;
//     }
// }
// await GroupMembersService.getInstance().create(member.getObject());
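
// The error-code checks and the member filter used by start() can be sketched as two
// small pure helpers, which makes them easy to test in isolation. This is a minimal
// sketch and not part of the original module: errorHasCode and isCollectableMember
// are hypothetical names, and the user shape (bot, deleted, firstName, lastName,
// username) is assumed from the fields the filter in start() reads.

```javascript
// Hypothetical helper: true only when `err` is a string containing `code`.
// A bare `err.indexOf(code)` is not a safe condition, because a miss
// returns -1, which is truthy in JavaScript.
function errorHasCode(err, code) {
    return typeof err === "string" && err.includes(code);
}

// Hypothetical helper: mirrors the filter in start(). A user is kept only
// if it is not a bot, not deleted, has a first or last name, and has a username.
function isCollectableMember(user) {
    if (!user) {
        return false;
    }
    const hasName = Boolean(user.firstName || user.lastName);
    const hasUsername = Boolean(user.username);
    return !user.bot && !user.deleted && hasName && hasUsername;
}
```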