Limit aggregation in grouped aggregation

I have a collection like this, but with much more data.

    {
        _id: ObjectId("db759d014f70743495ef1000"),
        tracked_item_origin: "winword",
        tracked_item_type: "Software",
        machine_user: "mmm.mmm",
        organization_id: ObjectId("a91864df4f7074b33b020000"),
        group_id: ObjectId("20ea74df4f7074b33b520000"),
        tracked_item_id: ObjectId("1a050df94f70748419140000"),
        tracked_item_name: "Word",
        duration: 9540
    }
    {
        _id: ObjectId("2b769d014f70743495fa1000"),
        tracked_item_origin: "http://www.facebook.com",
        tracked_item_type: "Site",
        machine_user: "gabriel.mello",
        organization_id: ObjectId("a91864df4f7074b33b020000"),
        group_id: ObjectId("3f6a64df4f7074b33b040000"),
        tracked_item_id: ObjectId("6f3466df4f7074b33b080000"),
        tracked_item_name: "Facebook",
        duration: 7920
    }

I run an aggregation that returns grouped data like this:

    {"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Twitter"}, "duration"=>288540},
    {"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"ANoticia"}, "duration"=>237300},
    {"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Facebook"}, "duration"=>203460},
    {"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Word"}, "duration"=>269760},
    {"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Excel"}, "duration"=>204240}

The aggregation code is simple:

    AgentCollector.collection.aggregate(
      {'$match' => {group_id: '20ea74df4f7074b33b520000'}},
      {'$group' => {
        _id: {tracked_item_type: '$tracked_item_type', tracked_item_name: '$tracked_item_name'},
        duration: {'$sum' => '$duration'}
      }},
      {'$sort' => {
        '_id.tracked_item_type' => 1,
        duration: -1
      }}
    )

Is there a way to limit the results to only 2 items per tracked_item_type key? For example, 2 Sites and 2 Software.

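As your question is currently unclear, I really hope you mean that you want to specify two Site keys and two Software keys, because that has a nice and simple answer that you can just add to your $match stage, as in: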

    {$match: {
        group_id: "20ea74df4f7074b33b520000",
        tracked_item_name: {$in: ['Twitter', 'Facebook', 'Word', 'Excel' ] }
    }},

And we can all cheer and be happy ;)

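But if your question is something more diabolical, such as getting the top 2 Site and Software entries from the result by duration, then we thank you very much for spawning this abomination.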

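Warning:

Your mileage may vary depending on what you actually want to do, or on whether this is going to blow up from the sheer size of your results. But this follows as an example of what you are in for: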

    db.collection.aggregate([

        // Match items first to reduce the set
        {$match: {group_id: "20ea74df4f7074b33b520000" }},

        // Group on the types and "sum" of duration
        {$group: {
            _id: {
                tracked_item_type: "$tracked_item_type",
                tracked_item_name: "$tracked_item_name"
            },
            duration: {$sum: "$duration"}
        }},

        // Sort by type and duration descending
        {$sort: { "_id.tracked_item_type": 1, duration: -1 }},

        /* The fun part */

        // Re-shape results to "sites" and "software" arrays
        {$group: {
            _id: null,
            sites: {$push:
                {$cond: [
                    {$eq: ["$_id.tracked_item_type", "Site" ]},
                    { _id: "$_id", duration: "$duration" },
                    null
                ]}
            },
            software: {$push:
                {$cond: [
                    {$eq: ["$_id.tracked_item_type", "Software" ]},
                    { _id: "$_id", duration: "$duration" },
                    null
                ]}
            }
        }},

        // Remove the null values for "software"
        {$unwind: "$software"},
        {$match: { software: {$ne: null} }},
        {$group: {
            _id: "$_id",
            software: {$push: "$software"},
            sites: {$first: "$sites"}
        }},

        // Remove the null values for "sites"
        {$unwind: "$sites"},
        {$match: { sites: {$ne: null} }},
        {$group: {
            _id: "$_id",
            software: {$first: "$software"},
            sites: {$push: "$sites"}
        }},

        // Project out software and limit to the *top* 2 results
        // (the original also listed "_id: 0" here; as a duplicate key it was
        // overridden by the "_id" below, so it is dropped)
        {$unwind: "$software"},
        {$project: {
            _id: { _id: "$software._id", duration: "$software.duration" },
            sites: "$sites"
        }},
        {$limit : 2},

        // Project sites, grouping multiple software per key, requires a sort
        // then limit the *top* 2 results
        {$unwind: "$sites"},
        {$group: {
            _id: { _id: "$sites._id", duration: "$sites.duration" },
            software: {$push: "$_id" }
        }},
        {$sort: { "_id.duration": -1 }},
        {$limit: 2}

    ])

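Now, what that results in is not exactly the clean set of results that would be ideal, but it is something that can be worked with programmatically, and it is better than filtering the previous results in a loop. (My data from testing)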

    {
        "result" : [
            {
                "_id" : {
                    "_id" : {
                        "tracked_item_type" : "Site",
                        "tracked_item_name" : "Digital Blasphemy"
                    },
                    "duration" : 8000
                },
                "software" : [
                    {
                        "_id" : {
                            "tracked_item_type" : "Software",
                            "tracked_item_name" : "Word"
                        },
                        "duration" : 9540
                    },
                    {
                        "_id" : {
                            "tracked_item_type" : "Software",
                            "tracked_item_name" : "Notepad"
                        },
                        "duration" : 4000
                    }
                ]
            },
            {
                "_id" : {
                    "_id" : {
                        "tracked_item_type" : "Site",
                        "tracked_item_name" : "Facebook"
                    },
                    "duration" : 7920
                },
                "software" : [
                    {
                        "_id" : {
                            "tracked_item_type" : "Software",
                            "tracked_item_name" : "Word"
                        },
                        "duration" : 9540
                    },
                    {
                        "_id" : {
                            "tracked_item_type" : "Software",
                            "tracked_item_name" : "Notepad"
                        },
                        "duration" : 4000
                    }
                ]
            }
        ],
        "ok" : 1
    }

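So you see that you get the top 2 Sites in the array, with the top 2 Software items embedded in each. Aggregation itself cannot clean this up any further, because we would need to re-merge the items we split apart in order to do so, and as yet there is no operator that we could use to perform that action.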

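But that was fun. It's not done all the way, but it is most of the way there, and turning that into a 4-document response would be relatively trivial code. But my head hurts already.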
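For what it's worth, here is a rough sketch of what that "relatively trivial code" might look like in plain JavaScript. It assumes the nested result shape shown in my test output above; the function name flattenTopItems and the sample variable are made up for illustration and are not part of any driver API.

```javascript
// Sample input: the "result" array from the test output above.
const sample = [
  {
    _id: { _id: { tracked_item_type: "Site", tracked_item_name: "Digital Blasphemy" }, duration: 8000 },
    software: [
      { _id: { tracked_item_type: "Software", tracked_item_name: "Word" }, duration: 9540 },
      { _id: { tracked_item_type: "Software", tracked_item_name: "Notepad" }, duration: 4000 }
    ]
  },
  {
    _id: { _id: { tracked_item_type: "Site", tracked_item_name: "Facebook" }, duration: 7920 },
    software: [
      { _id: { tracked_item_type: "Software", tracked_item_name: "Word" }, duration: 9540 },
      { _id: { tracked_item_type: "Software", tracked_item_name: "Notepad" }, duration: 4000 }
    ]
  }
];

// Hypothetical helper: flatten the nested output into one document per item.
function flattenTopItems(results) {
  const docs = [];
  const seen = new Set();

  // Add an item only once, keyed on its type and name.
  const add = (item) => {
    const key = item._id.tracked_item_type + "\u0000" + item._id.tracked_item_name;
    if (!seen.has(key)) {
      seen.add(key);
      docs.push({ _id: item._id, duration: item.duration });
    }
  };

  for (const site of results) {
    // The outer _id carries the site's grouping key and its summed duration.
    add({ _id: site._id._id, duration: site._id.duration });
    // The same software array is repeated under every site, so duplicates are skipped.
    for (const software of site.software) {
      add(software);
    }
  }

  // Same ordering as the $sort stage: type ascending, duration descending.
  docs.sort((a, b) =>
    a._id.tracked_item_type.localeCompare(b._id.tracked_item_type) ||
    b.duration - a.duration
  );
  return docs;
}

const flat = flattenTopItems(sample);
console.log(JSON.stringify(flat, null, 2));
```

With the sample data this yields four documents, two Sites followed by two Software items, in the same shape as the output of the original simple aggregation.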