Docs Menu

Docs Home โ†’ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ฐœ๋ฐœ โ†’ Python ๋“œ๋ผ์ด๋ฒ„ โ†’ PyMongo

๊ทธ๋ฃน ๋ฐ ํ•ฉ๊ณ„

์ด ํŽ˜์ด์ง€์˜ ๋‚ด์šฉ

  • ์„œ๋ก 
  • ์ง‘๊ณ„ ์ž‘์—… ์š”์•ฝ
  • ์‹œ์ž‘ํ•˜๊ธฐ ์ „์—
  • ํŠœํ† ๋ฆฌ์–ผ
  • 2020๋…„ ์ฃผ๋ฌธ์— ๋งค์น˜ ๋‹จ๊ณ„ ์ถ”๊ฐ€
  • ์ฃผ๋ฌธ ๋‚ ์งœ๋ณ„๋กœ ์ •๋ ฌํ•˜๋Š” sort ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค
  • ์ด๋ฉ”์ผ ์ฃผ์†Œ๋ณ„๋กœ ๊ทธ๋ฃนํ™”ํ•  ๊ทธ๋ฃน ๋‹จ๊ณ„ ์ถ”๊ฐ€
  • ์ฒซ ์ฃผ๋ฌธ ๋‚ ์งœ๋ณ„๋กœ ์ •๋ ฌํ•˜๋Š” ์ •๋ ฌ ๋‹จ๊ณ„ ์ถ”๊ฐ€
  • ์ด๋ฉ”์ผ ์ฃผ์†Œ๋ฅผ ํ‘œ์‹œํ•˜๋Š” ๋‹จ๊ณ„ ์ถ”๊ฐ€
  • ์„ค์ •๋˜์ง€ ์•Š์€ ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ๋ถˆํ•„์š”ํ•œ ํ•„๋“œ ์ œ๊ฑฐ
  • ์ง‘๊ณ„ ํŒŒ์ดํ”„๋ผ์ธ์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ฒฐ๊ณผ ํ•ด์„

์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ๋Š” PyMongo๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ง‘๊ณ„ ํŒŒ์ดํ”„๋ผ์ธ์„ ๊ตฌ์„ฑํ•˜๊ณ , ์ปฌ๋ ‰์…˜์—์„œ ์ง‘๊ณ„๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ณ , ์ƒ˜ํ”Œ ์•ฑ์„ ์™„๋ฃŒํ•˜๊ณ  ์‹คํ–‰ํ•˜์—ฌ ๊ฒฐ๊ณผ๋ฅผ ์ธ์‡„ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์• ๊ทธ๋ฆฌ๊ฒŒ์ด์…˜์€ ๋‹ค์Œ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

  • ๋ฌธ์„œ์˜ ํ•˜์œ„ ์ง‘ํ•ฉ์„ ํ•„๋“œ ๊ฐ’์œผ๋กœ ์ผ์น˜

  • ๊ณตํ†ต ํ•„๋“œ ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ๋ฌธ์„œ ๊ทธ๋ฃนํ™”

  • ๊ฐ ๊ฒฐ๊ณผ ๋ฌธ์„œ์— ๊ณ„์‚ฐ๋œ ํ•„๋“œ๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.

์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ๋Š” ๊ณ ๊ฐ ์ฃผ๋ฌธ ๋ฐ์ดํ„ฐ๋ฅผ ๊ทธ๋ฃนํ™”ํ•˜๊ณ  ๋ถ„์„ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ์—๋Š” 2020๋…„์— ํ’ˆ๋ชฉ์„ ๊ตฌ๋งคํ•œ ๊ณ ๊ฐ ๋ชฉ๋ก์ด ํ‘œ์‹œ๋˜๋ฉฐ ๊ฐ ๊ณ ๊ฐ์˜ 2020๋…„ ์ฃผ๋ฌธ ๋‚ด์—ญ์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

์ด ์˜ˆ์—์„œ๋Š” ๊ฐœ๋ณ„ ์ œํ’ˆ ์ฃผ๋ฌธ์„ ์„ค๋ช…ํ•˜๋Š” ๋ฌธ์„œ๊ฐ€ ํฌํ•จ๋œ orders collection ํ•˜๋‚˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ์ฃผ๋ฌธ์€ ํ•œ ๋ช…์˜ ๊ณ ๊ฐ์—๊ฒŒ๋งŒ ํ•ด๋‹นํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์ฃผ๋ฌธ ๋ฌธ์„œ๋Š” ๊ณ ๊ฐ ์ด๋ฉ”์ผ ์ฃผ์†Œ๊ฐ€ ํฌํ•จ๋œ customer_id ํ•„๋“œ๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๊ทธ๋ฃนํ™”๋ฉ๋‹ˆ๋‹ค.

์ด ํŠœํ† ๋ฆฌ์–ผ์„ ์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ์• ๊ทธ๋ฆฌ๊ฒŒ์ด์…˜ ํ…œํ”Œ๋ฆฟ ์•ฑ ์ง€์นจ์„ ์™„๋ฃŒํ•˜์—ฌ ์ž‘๋™ํ•˜๋Š” Python ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ์„ ์„ค์ •ํ•˜๋‹ค ํ•˜์„ธ์š”.

์•ฑ์„ ์„ค์ •ํ•œ ํ›„ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ๋‹ค์Œ ์ฝ”๋“œ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ orders ์ปฌ๋ ‰์…˜์— ์•ก์„ธ์Šคํ•ฉ๋‹ˆ๋‹ค.

orders_coll = agg_db["orders"]

๋‹ค์Œ ์ฝ”๋“œ์— ํ‘œ์‹œ๋œ ๋Œ€๋กœ ๊ธฐ์กด ๋ฐ์ดํ„ฐ๋ฅผ ์‚ญ์ œํ•˜๊ณ  ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ๋ฅผ orders collection์— ์‚ฝ์ž…ํ•ฉ๋‹ˆ๋‹ค.

orders_coll.delete_many({})
order_data = [
{
"customer_id": "elise_smith@myemail.com",
"orderdate": datetime(2020, 5, 30, 8, 35, 52),
"value": 231
},
{
"customer_id": "elise_smith@myemail.com",
"orderdate": datetime(2020, 1, 13, 9, 32, 7),
"value": 99
},
{
"customer_id": "oranieri@warmmail.com",
"orderdate": datetime(2020, 1, 1, 8, 25, 37),
"value": 63
},
{
"customer_id": "tj@wheresmyemail.com",
"orderdate": datetime(2019, 5, 28, 19, 13, 32),
"value": 2
},
{
"customer_id": "tj@wheresmyemail.com",
"orderdate": datetime(2020, 11, 23, 22, 56, 53),
"value": 187
},
{
"customer_id": "tj@wheresmyemail.com",
"orderdate": datetime(2020, 8, 18, 23, 4, 48),
"value": 4
},
{
"customer_id": "elise_smith@myemail.com",
"orderdate": datetime(2020, 12, 26, 8, 55, 46),
"value": 4
},
{
"customer_id": "tj@wheresmyemail.com",
"orderdate": datetime(2021, 2, 28, 7, 49, 32),
"value": 1024
},
{
"customer_id": "elise_smith@myemail.com",
"orderdate": datetime(2020, 10, 3, 13, 49, 44),
"value": 102
}
]
orders_coll.insert_many(order_data)
1

๋จผ์ € 2020 ์— ์ ‘์ˆ˜๋œ ์ฃผ๋ฌธ๊ณผ ์ผ์น˜ํ•˜๋Š” $match ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.

pipeline.append({
"$match": {
"orderdate": {
"$gte": datetime(2020, 1, 1, 0, 0, 0),
"$lt": datetime(2021, 1, 1, 0, 0, 0)
}
}
})
2

๋‹ค์Œ์œผ๋กœ, $sort ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ orderdate ํ•„๋“œ์— ์˜ค๋ฆ„์ฐจ์ˆœ ์ •๋ ฌ์„ ์„ค์ •ํ•˜์—ฌ ๊ฐ ๊ณ ๊ฐ์˜ ๊ฐ€์žฅ ์ด๋ฅธ 2020 ๊ตฌ๋งค๋ฅผ ๋‹ค์Œ ๋‹จ๊ณ„์—์„œ ํ‘œ์‹œํ•ฉ๋‹ˆ๋‹ค.

pipeline.append({
"$sort": {
"orderdate": 1
}
})
3

$group ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ customer_id ํ•„๋“œ ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ์ฃผ๋ฌธ์„ ๊ทธ๋ฃนํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋‹จ๊ณ„์—์„œ๋Š” ๊ฒฐ๊ณผ ๋ฌธ์„œ์— ๋‹ค์Œ ํ•„๋“œ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์• ๊ทธ๋ฆฌ๊ฒŒ์ด์…˜ ์ž‘์—…์„ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.

  • first_purchase_date: ๊ณ ๊ฐ์˜ ์ฒซ ๊ตฌ๋งค ๋‚ ์งœ

  • total_value: ๊ณ ๊ฐ์˜ ๋ชจ๋“  ๊ตฌ๋งค ์ด์•ก

  • total_orders: ๊ณ ๊ฐ์˜ ์ด ๊ตฌ๋งค ํšŸ์ˆ˜

  • orders: ๊ฐ ๊ตฌ๋งค์˜ ๋‚ ์งœ ๋ฐ ๊ธˆ์•ก์„ ํฌํ•จํ•œ ๊ณ ๊ฐ์˜ ๋ชจ๋“  ๊ตฌ๋งค ๋ชฉ๋ก

pipeline.append({
"$group": {
"_id": "$customer_id",
"first_purchase_date": {"$first": "$orderdate"},
"total_value": {"$sum": "$value"},
"total_orders": {"$sum": 1},
"orders": {"$push": {"orderdate": "$orderdate", "value": "$value"}}
}
})
4

๋‹ค์Œ์œผ๋กœ, ๋‹ค๋ฅธ $sort ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ first_purchase_date ํ•„๋“œ์— ์˜ค๋ฆ„์ฐจ์ˆœ ์ •๋ ฌ์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

pipeline.append({
"$sort": {
"first_purchase_date": 1
}
})
5

$set ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ $group ๋‹จ๊ณ„์—์„œ ์„ค์ •ํ•œ _id ํ•„๋“œ์˜ ๊ฐ’์—์„œ customer_id ํ•„๋“œ๋ฅผ ๋‹ค์‹œ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

pipeline.append({
"$set": {
"customer_id": "$_id"
}
})
6

๋งˆ์ง€๋ง‰์œผ๋กœ $unset ๋‹จ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. $unset ๋‹จ๊ณ„๋Š” ๊ฒฐ๊ณผ ๋ฌธ์„œ์—์„œ _id ํ•„๋“œ๋ฅผ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค.

pipeline.append({"$unset": ["_id"]})
7

orders collection์—์„œ ์• ๊ทธ๋ฆฌ๊ฒŒ์ด์…˜์„ ์ˆ˜ํ–‰ํ•˜๋ ค๋ฉด ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋์— ๋‹ค์Œ ์ฝ”๋“œ๋ฅผ ์ถ”๊ฐ€ํ•˜์„ธ์š”.

aggregation_result = orders_coll.aggregate(pipeline)

๋งˆ์ง€๋ง‰์œผ๋กœ shell์—์„œ ๋‹ค์Œ ๋ช…๋ น์„ ์‹คํ–‰ํ•˜์—ฌ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค.

python3 agg_tutorial.py
8

์ด ์• ๊ทธ๋ฆฌ๊ฒŒ์ด์…˜์€ 2020๋…„์˜ ๊ณ ๊ฐ ์ฃผ๋ฌธ์— ๋Œ€ํ•œ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์š”์•ฝ์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

{
'first_purchase_date': datetime.datetime(2020, 1, 1, 8, 25, 37),
'total_value': 63,
'total_orders': 1,
'orders': [ { 'orderdate': datetime.datetime(2020, 1, 1, 8, 25, 37), 'value': 63 } ],
'customer_id': 'oranieri@warmmail.com'
}
{
'first_purchase_date': datetime.datetime(2020, 1, 13, 9, 32, 7),
'total_value': 436,
'total_orders': 4,
'orders': [
{ 'orderdate': datetime.datetime(2020, 1, 13, 9, 32, 7), 'value': 99 },
{ 'orderdate': datetime.datetime(2020, 5, 30, 8, 35, 52), 'value': 231 },
{ 'orderdate': datetime.datetime(2020, 10, 3, 13, 49, 44), 'value': 102 },
{ 'orderdate': datetime.datetime(2020, 12, 26, 8, 55, 46), 'value': 4 }
],
'customer_id': 'elise_smith@myemail.com'
}
{
'first_purchase_date': datetime.datetime(2020, 8, 18, 23, 4, 48),
'total_value': 191,
'total_orders': 2,
'orders': [
{ 'orderdate': datetime.datetime(2020, 8, 18, 23, 4, 48), 'value': 4 },
{ 'orderdate': datetime.datetime(2020, 11, 23, 22, 56, 53), 'value': 187 }
],
'customer_id': 'tj@wheresmyemail.com'
}

๊ฒฐ๊ณผ ๋ฌธ์„œ์—๋Š” ํŠน์ • ๊ณ ๊ฐ์˜ ๋ชจ๋“  ์ฃผ๋ฌธ์— ๋Œ€ํ•œ ์„ธ๋ถ€ ์ •๋ณด๊ฐ€ ๊ณ ๊ฐ์˜ ์ด๋ฉ”์ผ ์ฃผ์†Œ๋ณ„๋กœ ๊ทธ๋ฃนํ™”๋˜์–ด ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ํŠœํ† ๋ฆฌ์–ผ์˜ ์ „์ฒด ์ฝ”๋“œ๋ฅผ ๋ณด๋ ค๋ฉด ์™„๋ฃŒ๋œ ๊ทธ๋ฃน ๋ฐ ์ „์ฒด ์•ฑ ์„ Github ์ฐธ์กฐํ•˜์„ธ์š”. ์—์„œ .

โ†ย ํ•„ํ„ฐ๋ง๋œ ํ•˜์œ„ ์ง‘ํ•ฉ