skycave's Blog

[AI Paper] 📄 CRAG: Comprehensive RAG Benchmark

By skycave
January 25, 2026 · 7 Min Read

📋 Meta Information

Item | Details
Paper title | CRAG: Comprehensive RAG Benchmark
Publication year | 2024
Venue | NeurIPS 2024 (Datasets and Benchmarks Track)
Authors | Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, and 21 others (27 in total)
Affiliation | Meta AI, HKUST (GZ)
arXiv | 2406.04744
GitHub | facebookresearch/CRAG
OpenReview | Q7lAqY41HH
KDD Cup 2024 | Meta Comprehensive RAG Benchmark Challenge
License | CC BY-NC 4.0

🎯 One-Line Summary

CRAG is a comprehensive RAG benchmark of 4,409 QA pairs that reflects the diversity and dynamism of real-world QA tasks. It shows that the most advanced LLMs reach at most 34% accuracy, that even SOTA RAG solutions reach only 63%, and that hallucination rates still sit at 16-25%.


🔍 Research Background and Motivation

Limitations of existing RAG benchmarks

  1. Static datasets
    • Existing benchmarks such as Natural Questions, TriviaQA, and MS MARCO are built on static facts
    • They do not capture information that changes in real time (stock prices, sports results, and so on)
  2. Single-source dependence
    • Most consider only web search or only KG search
    • They do not reflect the multi-source integration problem that real RAG systems face
  3. Entity popularity bias
    • They focus mainly on popular (head) entities
    • Long-tail entities are under-evaluated
  4. Limited question types
    • Mostly simple factual questions
    • Few complex question types such as comparison, aggregation, and multi-hop reasoning
  5. No RAG-specific evaluation
    • Little distinction between evaluating an LLM alone and evaluating a RAG system
    • Retrieval quality and generation quality are not evaluated together

Why is this research needed?

  • RAG has emerged as a promising way to address the knowledge gaps of LLMs
  • Existing benchmarks, however, cannot properly measure how RAG systems perform in practice
  • A comprehensive benchmark is needed that spans domains, temporal dynamism, entity popularity, and question complexity

💡 Key Ideas

Design principles for a dynamic QA benchmark

┌──────────────────────────────────────────────────────────┐
│                  CRAG Design Principles                  │
├──────────────────────────────────────────────────────────┤
│  1. Temporal Dynamism                                    │
│     - Covers facts that change in real time up to years  │
│                                                          │
│  2. Entity Popularity                                    │
│     - Balanced mix of head, torso, and tail entities     │
│                                                          │
│  3. Question Complexity                                  │
│     - 8 types, from simple facts to multi-hop reasoning  │
│                                                          │
│  4. Domain Coverage                                      │
│     - 5 domains: Finance, Sports, Music, Movie, Open     │
└──────────────────────────────────────────────────────────┘

Composition of the 4,409 QA pairs

  • Web-based questions: 2,425 (simulating web search)
  • KG-based questions: 1,984 (simulating knowledge graph search)
  • Mock APIs provided: simulate both web search and KG search

Scale of the retrieval content

  • Web pages: up to 50 full HTML pages per question
  • Mock KG: 2.6 million entities
  • Total web pages: about 220,000

🏗️ Benchmark Composition

1. Question types (8 categories)

Type | Description | Example
Simple | Simple factual question | "What is Taylor Swift's date of birth?"
Simple w/ Condition | Simple question with a condition | "What was Apple's stock price on January 15, 2023?"
Comparison | Compares two entities | "Who debuted first, Adele or Ed Sheeran?"
Aggregation | Requires aggregating information | "How many Oscars has Leonardo DiCaprio won?"
Set | Answer is a set of items | "Which continents are in the Southern Hemisphere?"
Multi-hop | Multiple reasoning steps | "What is the population of the city where the leader of BTS was born?"
Post-processing | Requires post-processing | "Which three stocks rose the most over the past five years?"
False Premise | Contains a false premise | "Who won the 2025 World Cup?" (it was not held)

Note: Simple and Simple w/ Condition questions make up about 43% of the total.

2. Domains (5 areas)

┌────────────────────────────────────────────────────────────────────────────────┐
│                             Domain Characteristics                             │
├────────────────────────────────────────────────────────────────────────────────┤
│                                                                                │
│  Fast-changing ←──────────────────────────────────────────────→ Slow-changing  │
│                                                                                │
│  Finance         Sports          Music            Movie           Open Domain  │
│  (real-time)     (fast)          (gradual)        (gradual)       (stable)     │
│                                                                                │
│  - stock prices  - match results - album releases - release dates - historical │
│  - FX rates      - rankings      - tour dates     - cast          - scientific │
│  - market trends - player stats  - awards         - awards        - geographic │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘

3. Temporal dynamism (4 levels)

Dynamism level | Change interval | Examples
Real-time | Seconds to minutes | Live stock prices, live match scores
Fast-changing | Hours to days | Daily stock prices, match results
Slow-changing | Months to years | Album releases, movie releases
Stable | Almost never changes | Historical facts, geographic information

4. Entity popularity (3 levels)

Popularity | Description | Likelihood of appearing in LLM training data
Head | Very well-known entities | High (covered in pre-training)
Torso | Moderately popular entities | Medium
Tail | Long-tail entities | Low (must rely on RAG)

5. KDD Cup 2024 Challenge setup

Three tasks:

  1. Task 1: Condense information from web pages to generate accurate answers
  2. Task 2: Integrate structured data from the mock KG
  3. Task 3: Select and integrate key data from a broader set of web pages and APIs

Data split:
- Validation: 30%
- Public Test: 30%
- Private Test: 40%
- A total of 2,706 examples are shared as the validation and public test sets
- Total prize money: USD 31,500


📊 Experiments and Results

Main results

1. LLM-only performance vs. RAG performance

System | Accuracy | Hallucination rate
GPT-4 Turbo (LLM only) | ≤34% | -
GPT-4 Turbo + Naive RAG | 44% | -
SOTA Industry RAG | 63% | 16-25%
KDD Cup 2024 best result | 51% | 17-25%

2. Performance by entity popularity (GPT-4 Turbo)

Accuracy (Truthfulness)
│
│  21% ████████████████████░░░░░░░░░░░░░░░ Head
│  11% ██████████░░░░░░░░░░░░░░░░░░░░░░░░░ Torso
│   8% ███████░░░░░░░░░░░░░░░░░░░░░░░░░░░░ Tail
│
└─────────────────────────────────────────────

Finding: adding RAG improves Torso (+7%) and Tail (+6%) entities, but performance on Head entities drops (-4%).

3. Performance by temporal dynamism

Dynamism | GPT-4 accuracy | Notes
Fast-changing | <15% | Hardest category
Real-time | Very low | Finance and Sports domains
Slow-changing | Medium | -
Stable | Relatively good | -

4. Difficulty by question type

The hardest question types:
- Set questions (answers that are sets)
- Questions that require post-processing
- False Premise questions

5. Performance by domain

Domain | RAG Truthfulness | Characteristics
Finance | Low | Depends on real-time data
Sports | Low | Changes quickly
Music | Medium | -
Movie | Medium | -
Open | Relatively good | Stable information

Evaluation metric

CRAG distinguishes hallucinated answers from missing answers:
- Hallucinated answer: higher penalty (it damages user trust)
- Missing answer: comparatively lower penalty

Score = Correct - alpha * Hallucinated - beta * Missing
(alpha > beta, so hallucination is penalized more heavily)
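To make the formula concrete, here is a minimal sketch that computes it from per-answer labels. The function name `crag_style_score`, the label strings, and the default weights (alpha = 1, beta = 0, under which the score reduces to accuracy minus hallucination rate) are assumptions chosen for illustration, not values taken from the benchmark's evaluation code.

```python
from collections import Counter

VALID_LABELS = {"correct", "hallucinated", "missing"}


def crag_style_score(labels, alpha=1.0, beta=0.0):
    """Return Correct - alpha * Hallucinated - beta * Missing as rates.

    `labels` holds one label per answer. alpha and beta are assumed
    penalty weights; the summary above only requires alpha > beta.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    n = len(labels)
    correct = counts["correct"] / n
    hallucinated = counts["hallucinated"] / n
    missing = counts["missing"] / n
    return correct - alpha * hallucinated - beta * missing


# Hypothetical run: 5 correct, 2 hallucinated, 3 missing answers.
example = ["correct"] * 5 + ["hallucinated"] * 2 + ["missing"] * 3
print(crag_style_score(example))
```

With these weights, a system that answers "I don't know" is never scored worse than one that answers wrongly, which is the behavior the penalty scheme is meant to encourage.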

Evaluation of industry RAG solutions

Solution | Truthfulness | Latency
Copilot Pro | Highest (~51%) | Highest (~11.6 s)
Perplexity | Upper-middle | Medium
ChatGPT Plus (GPT-4o) | Upper-middle | Medium

💪 Strengths and Contributions

Academic contributions

  1. The first comprehensive RAG benchmark
    • Considers temporal dynamism, entity popularity, and question complexity together
    • Realistically reflects the challenges that real RAG systems face
  2. A new evaluation paradigm
    • Scores hallucinated and missing answers differently
    • Proposes a RAG-specific evaluation framework
  3. Large-scale community validation
    • Thousands of participants through the KDD Cup 2024 challenge
    • USD 31,500 in prize money

Practical contributions

  1. Mock APIs
    • Simulate web search and KG search
    • Provide a reproducible experimental environment
  2. Broad domain coverage
    • Allows RAG systems to be evaluated per industry
    • Helps identify domain-specific areas for improvement
  3. Open-source release
    • Dataset, code, and evaluation tools are all public
    • Keeps the benchmark accessible to the research community

Insights

  1. Makes the limits of RAG explicit
    • Even the latest SOTA reaches only 63% accuracy with a 16-25% hallucination rate
    • Systems are especially weak on dynamic information and long-tail entities
  2. Points to directions for improvement
    • Stronger handling of real-time information
    • More precise retrieval for long-tail entities
    • Better complex reasoning

⚠️ Limitations

Benchmark design limitations

  1. First-stage retrieval is not evaluated
    • Construction of the retrieval candidate pool is not evaluated directly
    • This was a design decision made with the three-month KDD Cup schedule in mind
    • Users can still build their own retriever using the 220,000 web pages as a corpus
  2. English-centric
    • Limited multilingual support
    • Extension is planned
  3. Single-turn QA only
    • Multi-turn conversation scenarios are not included
    • This differs from real usage patterns
  4. Text-based
    • No multimodal support (images, video, and so on)
    • Extension is planned

Data limitations

  1. Reliance on mock data
    • May differ from real search engines and KGs
    • Limits how realistic the retrieval quality is
  2. Staleness over time
    • Some dynamic questions have answers that can change
    • Continuous updates are required
  3. Limited domains
    • Restricted to five domains
    • Specialized domains such as medicine and law are not included

Evaluation metric limitations

  1. Exact Match based
    • Hard to handle semantically equivalent answers
    • Limited ability to credit partially correct answers
  2. Hallucination judgment criteria
    • Automated hallucination detection has limits
    • Human evaluation is costly

Future research directions

Direction | Description
Multilingual extension | Add QA pairs in more languages
Multimodal extension | Include images and video (CRAG-MM in progress)
Multi-turn conversation | Evaluate RAG in a conversational context
Tail entity improvement | Improve retrieval accuracy for long-tail information
Hallucination reduction | Develop more robust grounding mechanisms
Stronger temporal reasoning | Improve the handling of real-time information

🔗 Related Work

Existing QA benchmarks

Paper | Characteristics | Difference from CRAG
Natural Questions (2019) | Real questions from Google search logs | Static, single source
TriviaQA (2017) | Trivia-style, multiple sources | Static, limited complexity
HotpotQA (2018) | Multi-hop reasoning | Static, no KG
MS MARCO (2016) | Large-scale reading comprehension | Static, no dynamism
QALD-10 (2013) | KG-based QA | No web search

RAG research

  • RAG: Retrieval-Augmented Generation (Lewis et al., 2020)
  • REALM (Guu et al., 2020)
  • RETRO (Borgeaud et al., 2022)
  • Self-RAG (Asai et al., 2023)

Evaluation frameworks

  • RAGCHECKER (NeurIPS 2024): fine-grained evaluation of RAG systems
  • RAGAS: automated RAG evaluation framework

💻 Practical Takeaways

Considerations when building a RAG system

1. Strategy for dynamic information

# Example: pick a retrieval strategy from a question's temporal dynamism.
# `dynamism` is one of CRAG's four levels; the strategy names are illustrative.
def get_retrieval_strategy(dynamism):
    if dynamism == "real-time":
        return "live_api_call"  # call a live API
    elif dynamism == "fast-changing":
        return "cached_with_short_ttl"  # cache with a short TTL
    elif dynamism == "slow-changing":
        return "cached_with_long_ttl"  # cache with a long TTL
    else:
        return "static_knowledge_base"  # static knowledge base
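The two cached strategies named above can be backed by a small time-to-live cache. The sketch below is one possible shape, assuming hypothetical TTL values (one hour and thirty days) and a hypothetical `TTLCache` class; neither comes from the CRAG paper or its code.

```python
import time

# Assumed TTLs in seconds for the two cached strategies (illustrative only).
TTL_SECONDS = {
    "cached_with_short_ttl": 60 * 60,            # one hour
    "cached_with_long_ttl": 30 * 24 * 60 * 60,   # thirty days
}


class TTLCache:
    """In-memory cache whose entries expire after a per-entry TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (value, expiry timestamp)

    def put(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Stale entry: drop it so the caller re-fetches fresh data.
            del self._entries[key]
            return None
        return value
```

A `None` result tells the caller to fetch again from the underlying source, so a fast-changing fact is never served for longer than its TTL.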

2. Adjusting retrieval by entity popularity

  • Head entities: the LLM's internal knowledge is usable, so retrieval plays a supporting role
  • Torso/Tail entities: rely more heavily on retrieved results, so retrieval quality matters
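One way to act on this split is a small gate in front of the retriever. The sketch below assumes the system already has a popularity label for the entity and some self-reported confidence from the model; `should_retrieve`, the `model_confidence` input, and the 0.8 threshold are all hypothetical.

```python
def should_retrieve(popularity, model_confidence, head_threshold=0.8):
    """Decide whether to call the retriever for a question.

    popularity: "head", "torso", or "tail" (assumed to be known upstream).
    model_confidence: assumed score in [0, 1] for the LLM-only answer.
    """
    if popularity in ("torso", "tail"):
        # Less popular entities: always lean on retrieved evidence.
        return True
    if popularity == "head":
        # Popular entities: retrieve only when the model seems unsure.
        return model_confidence < head_threshold
    raise ValueError(f"unknown popularity level: {popularity!r}")
```

Skipping retrieval for confident head-entity answers is one way to avoid the Head regression reported above, where added context hurt rather than helped.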

3. Pipeline design by question type

Question type | Recommended pipeline
Simple | Single retrieval -> direct answer
Comparison | Multiple retrievals -> comparison logic
Aggregation | Multiple retrievals -> aggregation
Multi-hop | Iterative retrieval -> chained reasoning
False Premise | Premise verification -> conditional answer
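The table maps naturally onto a lookup that a router can consult once a question has been classified. In this sketch the stage names and the fallback to the simple pipeline for unlisted types are assumptions made for illustration.

```python
# Stage names are illustrative labels, not functions from any real library.
PIPELINES = {
    "simple": ["single_retrieval", "direct_answer"],
    "comparison": ["multi_retrieval", "comparison_logic"],
    "aggregation": ["multi_retrieval", "aggregation"],
    "multi_hop": ["iterative_retrieval", "chained_reasoning"],
    "false_premise": ["premise_verification", "conditional_answer"],
}


def plan_pipeline(question_type):
    """Return the ordered stages for a classified question type.

    Types without an entry fall back to the simple pipeline, which is
    an assumption of this sketch.
    """
    return list(PIPELINES.get(question_type, PIPELINES["simple"]))
```

Returning a copy of the stage list keeps callers from accidentally mutating the shared table.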

4. Hallucination prevention strategies

  1. Cite sources: include the retrieved sources in the answer
  2. Confidence scores: adjust the answer based on how reliable the retrieved results are
  3. "I don't know" option: admit uncertainty when unsure
  4. Fact-checking: add a verification step for the generated answer
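The first three strategies can be combined in a single post-processing step. The sketch below assumes each retrieved passage carries a confidence score in [0, 1]; the `Evidence` type, the `answer_or_abstain` function, and the 0.5 cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str   # where the passage came from
    text: str     # the retrieved passage
    score: float  # assumed retrieval confidence in [0, 1]


def answer_or_abstain(draft_answer, evidence, min_score=0.5):
    """Attach sources to a draft answer, or abstain without trusted evidence."""
    trusted = [e for e in evidence if e.score >= min_score]
    if not trusted:
        # A missing answer costs less than a hallucinated one.
        return "I don't know"
    sources = ", ".join(sorted({e.source for e in trusted}))
    return f"{draft_answer} (sources: {sources})"
```

The fourth strategy, fact-checking the draft against the trusted passages, would sit between filtering and formatting and is left out here.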

5. Evaluation checklist

[ ] Measure performance separately per domain
[ ] Analyze performance by temporal dynamism
[ ] Analyze performance by entity popularity
[ ] Analyze performance by question complexity
[ ] Measure hallucination rate and missing rate separately
[ ] Evaluate retrieval quality and generation quality separately
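Most of the checklist comes down to slicing the same labeled results along different dimensions. A minimal sketch follows, assuming each result is a dict with a `label` field ("correct", "hallucinated", or "missing") plus whatever slice fields are needed; the field names and the `rates_by_slice` helper are hypothetical.

```python
from collections import defaultdict


def rates_by_slice(records, key):
    """Return per-slice rates of correct, hallucinated, and missing answers.

    records: iterable of dicts with a "label" field and a `key` field.
    key: the field to slice on, e.g. "domain" or "popularity".
    """
    buckets = defaultdict(lambda: {"correct": 0, "hallucinated": 0, "missing": 0})
    for record in records:
        # An unexpected label raises KeyError instead of being miscounted.
        buckets[record[key]][record["label"]] += 1
    report = {}
    for name, counts in buckets.items():
        total = sum(counts.values())
        report[name] = {label: count / total for label, count in counts.items()}
    return report
```

Calling it once per dimension (domain, dynamism, popularity, question type) covers the first five checklist items while keeping hallucinated and missing answers apart.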

How to use the CRAG benchmark

Installation and data access

# Clone CRAG from GitHub
git clone https://github.com/facebookresearch/CRAG.git
cd CRAG

# Download the dataset (requires accepting the license)
# Check the CC BY-NC 4.0 license terms

Usage scenarios

  1. System evaluation: run your own RAG system on CRAG to find its weak points
  2. A/B testing: compare CRAG scores before and after a change
  3. Domain focus: evaluate on the subset for a specific domain
  4. Competitive benchmarking: compare objectively against other systems

Practical insights

┌────────────────────────────────────────────────────────────────┐
│                  Practical Lessons from CRAG                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  * RAG does not always help                                    │
│    - Can even hurt performance on head entities                │
│    - Decide when to retrieve vs. generate directly             │
│                                                                │
│  * Measuring hallucination is essential                        │
│    - Accuracy alone is not enough; track hallucination too     │
│    - Even SOTA systems still hallucinate 17-25% of the time    │
│                                                                │
│  * Each domain needs its own strategy                          │
│    - Finance/Sports: need real-time data sources               │
│    - Open Domain: can rely on existing knowledge               │
│                                                                │
│  * Long-tail entities need special care                        │
│    - RAG adds the most value on tail entities                  │
│    - Better retrieval quality is the key                       │
│                                                                │
└────────────────────────────────────────────────────────────────┘

🏷️ Tags

#RAG #Benchmark #NeurIPS2024 #MetaAI #QuestionAnswering #LLM #Retrieval #KnowledgeGraph #Hallucination #TemporalDynamism #EntityPopularity #KDDCup2024 #FactualQA #Evaluation #Dataset


📚 References

  • Paper (arXiv)
  • NeurIPS 2024 Proceedings
  • GitHub Repository
  • KDD Cup 2024 Challenge
  • OpenReview