Hey, since i’m new using Mongo (python driver) i have a problem using look up in pipeline
when i try to join 2 collections by a field it return an empty list although the fields have similar value
So after checking i think because the field that i want to join isn’t an ObjectId so how can I make it an ObjectID (knowing that before making join , i did a project aggregation to get rid of the old ObjectId and keep only 2 field from the first collection)
Another question if i could use look up on a pipeline aggregation instead a collection thank you
Please provide sample documents from the source collections.
Please provide expected results.
Please provide the pipeline you have tried.
As for
you may be able to create a view from the pipeline and then lookup on the view. See https://docs.mongodb.com/manual/core/views/
thanks so here my code here in picture i did a project pipeline for the collections then join i want to group the count and length by id (which i get from different collections after projection stage to remove irrelevent data ) (here i joined 2 pipeline but even if i make a project pipeline on one collection then i do a join to the other collection i get similar result )
thank you
We cannot cut-n-paste an image in our installation. Documents and code is requested so that we do not start from scratch and enter documents and code manually.
See Formatting code and log snippets in posts before posting.
sorry here is my code
collection=db[“c1”]
collection2=db[“c2”]
results=collection2.aggregate([
{ “$unwind”: “$related.list_of_doc”},
{ “$project”: { “related.list_of_doc”: 1,"_id":0} },
{ “$group”: {"_id": “$related.list_of_doc.job_id”,“count”: {"$sum": 1}}}])
print(“Collection1 processing:”)
for i in results:
pprint.pprint(i)
result2=collection.aggregate([
{ "$project": { "job_id": 1,"_id":0} },
{ "$group": {"_id": "$job_id","length": {"$sum": 1}}},
print("******************")
print(“Collection2 processing:”)
for i in result2:
pprint.pprint(i)
resultf=collection.aggregate([
{ "$project": { "job_id": 1,"_id":0} },
{ "$group": {"_id": "$job_id","length": {"$sum": 1}}},
{ "$lookup":
{
"from": "results",
"localField": "_id",
"foreignField": "_id",
"as": "merging"
}}
])
As i mentioned i even tried to join the second collection before aggregation and didn’t work also
the format before projection
for the first collection:
{"_id":“val”,“field1.field1_1.field1_1_1.job_id”:“val”,…} i want only job_id to get the count after doing calculus
for the second collection:
{"_id":“val”,“job_id”:“val”,…} i want only the job_id to get the length which is the count of each job_id
then joining by job_id to get length and count thanks
…
Still no
and
is not marked down as specified in the link I provided. So it means that to experiment with one would have to replace all fancy html quotes with real quotes.
one row from collection1
“_id”: {"$oid": “614358432de31fd”}, “dup_id”: [{“object/field/@id”: “6546ae4b-771-98c7-99544c64b517”}, {“object/field/@last-modified”: “2021-09-13T08:19:15Z”}], “related”: {“list_of_doc”: [{“job_id”: 467, “id_mongo”: {"$oid": “614358184832de31fc”}}, {“job_id”: 498, “id_mongo”: {"$oid": “615462d9d0dc2dec2”}}, {“job_id”: 430, “id_mongo”: {"$oid": “6181642f9671d”}}, {“job_id”: 437, “id_mongo”: {"$oid": “619a5ef98fbc17f9”}}]}}
one row from collection2
{"_id": {"$oid": “603e23f6ecf2”}, “job_id”: 192, “object”: {“OpenIOC”: {"@id": “5ff109ac-8225-4749-8b85-cd3176e24d1e”, “@last-modified”: “2020-12-01T15:39:51Z”, “@published-date”: “0001-01-01T00:00:00”}}
i want to do some calculus from collection 1 and 2 as shown in the code then group by the “job_id”
Your documents cannot be inserted in our installation because the quotes are all wrong. Like
and as already mention please
can i send you .json format of the collection?
Yes well formed json documents formatted like indicated in the provided link.
OpenIOC.json (866.7 KB)
OpenIOC_dup.json (29.1 KB)
there is the json format of the input (_dup is collection 2) thank you
The aggregation that counts works perfectly.
If you add the following stage with will get an array named OpenIOC that contains documents from OpenIOC that matches the job_id (which is not named _id since it is the grouping key).
{
'$lookup':
{
from: 'OpenIOC',
localField: '_id',
foreignField: 'job_id',
as: 'OpenIOC'
}
}
/* Gives something like
[
{ _id: 9000, count: 1, OpenIOC : [ /* 0 documents */ ] }
{ _id: 467, count: 9, OpenIOC: [ /* some documents from collection OpenIOC */ ] }
/* ... more top level documents */
{ _id: 498, count: 9, OpenIOC: [ /* some objects from collection OpenIOC */ }
]
*/
For testing I counted sized the OpenIOC array with
_size_OpenIOC = { "$project" : { "_id" : 1 , "count" : 1 , "_size_OpenIOC" : { "$size" : "$ioc" } } }
which gave:
{ _id: 9000, count: 1, _size_OpenIOC: 0 }
{ _id: 467, count: 9, _size_OpenIOC: 9 }
{ _id: 430, count: 14, _size_OpenIOC: 13 }
{ _id: 445, count: 103, _size_OpenIOC: 13 }
{ _id: 437, count: 238, _size_OpenIOC: 13 }
{ _id: 464, count: 3, _size_OpenIOC: 10 }
{ _id: 441, count: 24, _size_OpenIOC: 13 }
{ _id: 498, count: 9, _size_OpenIOC: 9 }
thank you for the solution but what i’m looking for is summing job_id from collection one (which is _dup.json)
{ “$unwind”: “$related.list_of_doc”},
{ “$project”: { “related.list_of_doc”: 1,"_id":0} },
{ “$group”: {"_id": “$related.list_of_doc.job_id”,“count”: {"$sum": 1}}}])
doing the sale thing on collection 2
{ “$project”: { “job_id”: 1,"_id":0} },
{ “$group”: {"_id": “$job_id”,“length”: {"$sum": 1}}},
then grouping by job_id to have the count and the length thank you
This is what you get with
and this is the result on the shared data set:
I called the field _size_OpenIOC but you may name it length if you wish.
Not necessarily in the most efficient way. With 5.0 you may be able to count from OpenIOC without building the whole array. But the starting point is already not optimal. For example, you $project:“related.list_of_doc” but you only use job_id. You might as well skip the $project stage completely or
$project:{"job_id":"$related.list_of_doc.job_id"}
Before pressing ↰ Reply, I tried the following which works.
// Replace previous $lookup with
_lookup_2 =
{
"$lookup" :
{
"from" : "OpenIOC" ,
"let" : { "job_id" : "$_id" } ,
pipeline :
[
{ "$match" : { "$expr" : { "$eq" : [ "$job_id" , "$$job_id" ]}}} ,
{ "$group" : { "_id" : "$job_id","length":{"$sum":1}}}
] ,
"as" : "ioc"
}
}
// And add stage _project_2 to get the desired format
_project_2 = { "$project" : { "count":1 , "length":{ "$arrayElemAt" : [ "$ioc.length" , 0 ]}}}
thank you so much it helps me
This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.