skycave's Blog
[AI Paper] 📄 Self-RAG: Learning to Retrieve, Generate, and Critique

By skycave
January 25, 2026 · 11 Min Read

📋 Meta information

| Item | Details |
|---|---|
| Paper title | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection |
| Authors | Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi |
| Affiliations | University of Washington, IBM Research AI, Allen Institute for AI |
| Venue | ICLR 2024 (Oral Presentation, top 1%) |
| Year | 2024 |
| arXiv | 2310.11511 |
| GitHub | AkariAsai/self-rag |
| Project page | selfrag.github.io |
| HuggingFace | selfrag/selfrag_llama2_7b, selfrag/selfrag_llama2_13b |

🎯 One-line summary

Self-RAG is a self-reflective RAG framework in which the LLM uses reflection tokens to decide for itself whether retrieval is needed, then grades the relevance of the retrieved documents and the quality of its own output, which substantially improves factuality and accuracy.


🔍 Background and motivation

Fundamental limitations of LLMs

An LLM relies only on the knowledge encoded in its parameters (parametric knowledge), which leads to the following problems:

  • Factual errors (hallucination): nonexistent information is generated as if it were fact
  • Knowledge cutoff: the model cannot handle information that postdates its training data
  • Weakness in specialized domains: detailed knowledge of niche fields is missing

Limitations of conventional RAG

┌──────────────────────────────────────────────────────────────────┐
│             Structural problems of conventional RAG              │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Query ──► [Always retrieve] ──► Top-K documents ──► LLM output  │
│                                                                  │
│  Problems:                                                       │
│  1. Always retrieves without checking necessity (inefficient)    │
│  2. No relevance check on retrieved results (noise leaks in)     │
│  3. No check that the output is grounded in the retrieved docs   │
│  4. No flexible, per-task control over behavior                  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

1. Indiscriminate retrieval

  • Conventional RAG always retrieves a fixed number of documents, whether or not retrieval is needed
  • Even simple common-sense questions trigger a retrieval step, which hurts efficiency
  • Example: a question such as “What is 2+2?” still results in a Wikipedia search

2. No evaluation of retrieval quality

  • Nothing checks whether the retrieved documents are actually relevant to the query
  • Irrelevant or incorrect information enters the generation step and can lower quality instead of raising it
  • Noise caused by retriever limitations is passed to the LLM unchanged

3. Mismatch between generation and evidence

  • There is no way to verify that the generated content is actually supported by the retrieved documents
  • The LLM sometimes ignores the retrieved results and answers from its own knowledge
  • Citations and content can disagree

4. Lack of flexibility

  • Model behavior cannot be adjusted to the requirements of different tasks
  • Tasks with different needs, such as fact verification and creative writing, are handled the same way

💡 Key ideas

1. Self-reflection tokens

The core innovation of Self-RAG is to add special tokens, called reflection tokens, to the model vocabulary so that the model can control and evaluate its own behavior.

┌────────────────────────────────────────────────────────────────────┐
│                      Reflection token scheme                       │
├─────────────────┬──────────────────────────────────────────────────┤
│   [Retrieve]    │  Decides whether retrieval is needed (on-demand) │
│   [ISREL]       │  Rates the relevance of a retrieved passage      │
│   [ISSUP]       │  Rates whether the passage supports the output   │
│   [ISUSE]       │  Rates the overall usefulness of the answer      │
└─────────────────┴──────────────────────────────────────────────────┘

Key advantages:
– Everything happens inside a single model, with no separate external evaluator at inference time
– Behavior can be controlled at inference time by adjusting token weights
– Generation and self-critique are trained together under one objective
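To make the idea concrete, the sketch below separates reflection tokens from the answer text in a generated string. The bracketed `[NAME=Value]` spelling follows the notation used in this post and is an assumption for illustration; the released checkpoints may spell their special tokens differently, in which case the regular expression would need adapting.

```python
import re
from typing import Dict, Tuple

# Matches tokens written in this post's notation, e.g. "[ISREL=Relevant]".
# The notation is an illustrative assumption, not the checkpoint's exact vocabulary.
TOKEN_PATTERN = re.compile(r"\[(Retrieve|ISREL|ISSUP|ISUSE)=([^\]]+)\]")


def split_reflection_tokens(text: str) -> Tuple[str, Dict[str, str]]:
    """Return the plain answer text and a {token_type: value} mapping."""
    tokens = {name: value for name, value in TOKEN_PATTERN.findall(text)}
    plain = TOKEN_PATTERN.sub("", text)
    plain = re.sub(r"\s+", " ", plain).strip()  # collapse gaps left by removed tokens
    return plain, tokens


raw = (
    "[Retrieve=Yes] Paris is the capital of France. "
    "[ISREL=Relevant] [ISSUP=Fully Supported] [ISUSE=5]"
)
answer, tokens = split_reflection_tokens(raw)
print(answer)
print(tokens)
```

Keeping the tokens in a dictionary makes it easy to log or threshold each critique separately from the text shown to the user.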

2. Adaptive retrieval

Conventional RAG: Query → [Always retrieve] → Documents → Generation
Self-RAG: Query → [Retrieval needed?] → [Selective retrieval] → [Quality check] → Generation
  • On-demand retrieval: retrieval runs only when it is actually needed
  • Multi-step retrieval: the model can retrieve several times during generation or skip retrieval entirely
  • Task-adaptive: retrieval frequency adjusts automatically to the task
    • Fact verification: retrieve more often to secure accuracy
    • Creative writing: keep retrieval to a minimum to preserve fluency
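The on-demand decision can be pictured as a threshold on the normalized probability of the retrieve token. The following is a minimal sketch under that assumption; the function names and the default threshold of 0.5 are hypothetical, and the probabilities would come from the model's next-token distribution.

```python
def retrieval_probability(p_yes: float, p_no: float) -> float:
    """Probability of [Retrieve=Yes], normalized over the two retrieve outcomes."""
    total = p_yes + p_no
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    return p_yes / total


def should_retrieve(p_yes: float, p_no: float, threshold: float = 0.5) -> bool:
    """Trigger retrieval when the normalized probability exceeds the threshold."""
    return retrieval_probability(p_yes, p_no) > threshold


# A lower threshold retrieves more often (fact verification);
# a higher threshold retrieves less often (creative writing).
print(should_retrieve(0.3, 0.1))                 # normalized probability 0.75
print(should_retrieve(0.3, 0.1, threshold=0.8))
```

Raising or lowering `threshold` is one way to trade retrieval cost against factual coverage without retraining.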

3. Critique and generation in one model

Self-RAG performs generation and evaluation within a single model:

  1. Retrieval decision: judge whether retrieval would help in the current context
  2. Relevance evaluation: rate the relevance of each retrieved document
  3. Support assessment: check whether the generated content is grounded in the evidence
  4. Utility scoring: rate the overall usefulness of the final answer

🏗️ Architecture / Methodology

Reflection tokens in detail

1. Retrieve token (retrieval decision)

| Token | Meaning | When it is used |
|---|---|---|
| [Retrieve=Yes] | External knowledge is needed | Fact-based, up-to-date, or specialist information is required |
| [Retrieve=No] | Internal knowledge is sufficient | Common sense, general knowledge, creative work |
| [Retrieve=Continue] | Keep using the previous retrieval results | The same context is maintained while generating a long answer |

Decision criteria:
– Whether the model can answer the current query reliably from its internal knowledge alone
– Whether retrieval would materially improve answer quality

2. ISREL token (relevance)

| Token | Meaning | Follow-up |
|---|---|---|
| [ISREL=Relevant] | The document helps answer the query | Generation proceeds from this document |
| [ISREL=Irrelevant] | The document is unrelated to the query | The document is excluded and others are used |

What is evaluated:
– Whether the retrieved passage contains information that actually helps answer the input question

3. ISSUP token (support)

| Token | Meaning | Hallucination risk |
|---|---|---|
| [ISSUP=Fully Supported] | Every claim is grounded in the document | Low |
| [ISSUP=Partially Supported] | Only some claims are grounded in the document | Medium |
| [ISSUP=No Support] | Nothing is grounded in the document | High |

Key role:
– Checks whether each verifiable statement in the generated answer is backed by the retrieved document
– This is the main mechanism for preventing hallucination

4. ISUSE token (utility)

| Token | Score | Meaning |
|---|---|---|
| [ISUSE=5] | 5 | Perfectly useful answer |
| [ISUSE=4] | 4 | Very useful |
| [ISUSE=3] | 3 | Adequately useful |
| [ISUSE=2] | 2 | Somewhat useful |
| [ISUSE=1] | 1 | Not useful |

Evaluation criterion:
– An overall rating of how useful the final answer is for the original question, independent of the retrieved documents
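The four token families above form a small closed vocabulary, which can be written down as enumerations. This is a sketch for illustration only: the class names and the `render` helper are hypothetical, and the string spellings follow this post's notation, not necessarily the tokens stored in the released tokenizer.

```python
from enum import Enum


class Retrieve(Enum):
    YES = "Yes"
    NO = "No"
    CONTINUE = "Continue"


class IsRel(Enum):
    RELEVANT = "Relevant"
    IRRELEVANT = "Irrelevant"


class IsSup(Enum):
    FULLY_SUPPORTED = "Fully Supported"
    PARTIALLY_SUPPORTED = "Partially Supported"
    NO_SUPPORT = "No Support"


class IsUse(Enum):
    U1 = "1"
    U2 = "2"
    U3 = "3"
    U4 = "4"
    U5 = "5"


# Prefix used when writing each family in the post's "[NAME=Value]" notation.
PREFIX = {Retrieve: "Retrieve", IsRel: "ISREL", IsSup: "ISSUP", IsUse: "ISUSE"}


def render(token: Enum) -> str:
    """Format a token the way this post writes it, e.g. "[ISREL=Relevant]"."""
    return f"[{PREFIX[type(token)]}={token.value}]"


def all_reflection_tokens() -> list:
    """Every token string that would be added to the vocabulary."""
    return [render(member) for family in PREFIX for member in family]


print(all_reflection_tokens())
```

Listing the tokens this way gives the exact set of strings that would have to be registered as special tokens before fine-tuning.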


Training pipeline

Self-RAG is trained with a four-step pipeline:

┌──────────────────────────────────────────────────────────────────────┐
│                      Self-RAG training pipeline                      │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │  Step 1: Critic data creation (using GPT-4)                 │     │
│  │  ────────────────────────────────────────────────────────   │     │
│  │  • Generate reflection tokens by few-shot prompting GPT-4   │     │
│  │  • Collect 4K-20K samples per token type                    │     │
│  │  • Validate quality against human evaluation                │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                              ↓                                       │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │  Step 2: Critic model training                              │     │
│  │  ────────────────────────────────────────────────────────   │     │
│  │  • Uses Llama 2-7B as the base model                        │     │
│  │  • Adds the reflection tokens to the vocabulary             │     │
│  │  • Trains with standard next-token prediction               │     │
│  │  • Reaches over 90% agreement with GPT-4 predictions        │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                              ↓                                       │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │  Step 3: Generator data creation                            │     │
│  │  ────────────────────────────────────────────────────────   │     │
│  │  • Augments the original corpus with the critic + retriever │     │
│  │  • Inserts reflection tokens into each segment offline      │     │
│  │  • Adds the Top-K passages when Retrieve=Yes                │     │
│  │  • Masks retrieved text chunks out of the training loss     │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                              ↓                                       │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │  Step 4: Generator model training                           │     │
│  │  ────────────────────────────────────────────────────────   │     │
│  │  • Trains the final generator with the expanded vocabulary  │     │
│  │  • Predicts target output and reflection tokens together    │     │
│  │  • Much lower training cost than RLHF                       │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Why training is efficient:
– The critic model is used offline to insert the tokens
– The critic does not need to be hosted while the generator is training
– More memory-efficient and more stable than RLHF (PPO and similar)
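Steps 3 and 4 can be sketched as building a sequence of pieces, each flagged as counting toward the loss or not. This is a simplified illustration, not the authors' data pipeline: `ToyCritic`, its keyword rules, the `Piece` type, and the `<paragraph>` wrapper are all assumptions made so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Piece:
    text: str
    in_loss: bool  # False means the piece is masked out of the training loss


class ToyCritic:
    """Hypothetical stand-in for the trained critic; the rules are illustrative only."""

    def needs_retrieval(self, segment: str) -> bool:
        return any(ch.isdigit() for ch in segment)  # pretend numeric facts need evidence

    def relevance(self, segment: str, passage: str) -> str:
        return "Relevant"

    def support(self, segment: str, passage: str) -> str:
        return "Fully Supported" if segment in passage else "Partially Supported"

    def utility(self, answer: str) -> int:
        return 5


def augment(segments: List[str], critic: ToyCritic,
            retrieve: Callable[[str], str]) -> List[Piece]:
    """Interleave segments with reflection tokens, offline, as in Step 3."""
    pieces: List[Piece] = []
    for seg in segments:
        if critic.needs_retrieval(seg):
            passage = retrieve(seg)
            pieces.append(Piece("[Retrieve=Yes]", True))
            # The retrieved passage is visible as context but excluded from the loss.
            pieces.append(Piece(f"<paragraph>{passage}</paragraph>", False))
            pieces.append(Piece(f"[ISREL={critic.relevance(seg, passage)}]", True))
            pieces.append(Piece(seg, True))
            pieces.append(Piece(f"[ISSUP={critic.support(seg, passage)}]", True))
        else:
            pieces.append(Piece("[Retrieve=No]", True))
            pieces.append(Piece(seg, True))
    pieces.append(Piece(f"[ISUSE={critic.utility(' '.join(segments))}]", True))
    return pieces


segments = ["Paris is lovely.", "It has about 2 million residents."]
pieces = augment(segments, ToyCritic(), lambda seg: "Paris has about 2 million residents.")
loss_text = " ".join(p.text for p in pieces if p.in_loss)
print(loss_text)
```

Because the critic runs once while the data is built, the generator in Step 4 only ever sees a fixed text corpus, which is what keeps training close to ordinary fine-tuning.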


Inference algorithm

Input Query
     │
     ▼
┌────────────────────────┐
│  Predict [Retrieve]    │
│  ──────────────────    │
│  Is retrieval needed?  │
└────────────────────────┘
     │
     ├── [Retrieve=No] ──────────────────────┐
     │                                       ▼
     │                           ┌────────────────────────┐
     │                           │  Direct generation     │
     │                           │  (internal knowledge)  │
     │                           └────────────────────────┘
     │
     └── [Retrieve=Yes/Continue]
                 │
                 ▼
        ┌────────────────────────┐
        │  Retriever             │
        │  (Top-K search)        │
        └────────────────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  [ISREL] evaluation    │
        │  ──────────────────    │
        │  Score each passage's  │
        │  relevance and filter  │
        └────────────────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  Segment generation    │
        │  ──────────────────    │
        │  One candidate output  │
        │  per relevant passage  │
        └────────────────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  [ISSUP] evaluation    │
        │  ──────────────────    │
        │  Check evidence        │
        │  support for output    │
        └────────────────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  [ISUSE] evaluation    │
        │  ──────────────────    │
        │  Overall utility score │
        └────────────────────────┘
                 │
                 ▼
        ┌────────────────────────┐
        │  Segment-level         │
        │  beam search           │
        │  ──────────────────    │
        │  Pick the best by      │
        │  critique score        │
        └────────────────────────┘
                 │
                 ▼
          Final Output
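The control flow in the diagram can be sketched for a single output segment, with every model call injected as a plain function. All callables and the toy corpus below are hypothetical stand-ins; a real system would replace them with the generator's token predictions and an actual retriever.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    text: str
    passage: Optional[str]
    score: float


def answer_segment(
    query: str,
    predict_retrieve: Callable[[str], bool],
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, Optional[str]], str],
    critique_score: Callable[[str, str, str], float],
    top_k: int = 3,
) -> Candidate:
    """Produce one output segment following the flow in the diagram."""
    # [Retrieve=No]: answer directly from parametric knowledge.
    if not predict_retrieve(query):
        return Candidate(generate(query, None), None, 0.0)
    # [Retrieve=Yes]: one candidate per retrieved passage, each scored by critique.
    candidates = []
    for passage in retrieve(query, top_k):
        text = generate(query, passage)
        candidates.append(Candidate(text, passage, critique_score(query, passage, text)))
    if not candidates:  # the retriever returned nothing: fall back to direct generation
        return Candidate(generate(query, None), None, 0.0)
    return max(candidates, key=lambda c: c.score)


# Toy stand-ins so the sketch runs end to end.
CORPUS = ["Paris is the capital of France.", "Bananas are rich in potassium."]


def toy_predict_retrieve(query: str) -> bool:
    return "capital" in query.lower()


def toy_retrieve(query: str, k: int) -> List[str]:
    return CORPUS[:k]


def toy_generate(query: str, passage: Optional[str]) -> str:
    return passage if passage else "I am not sure."


def toy_score(query: str, passage: str, text: str) -> float:
    # Word overlap between query and passage, used as a proxy for the critique score.
    q_words = set(query.lower().replace("?", "").split())
    p_words = set(passage.lower().replace(".", "").split())
    return float(len(q_words & p_words))


best = answer_segment("What is the capital of France?", toy_predict_retrieve,
                      toy_retrieve, toy_generate, toy_score)
print(best.text)
```

Passing the components in as functions keeps the routing logic testable on its own, independent of which model or index sits behind it.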

Critique score computation

In segment-level beam search, the score of each segment is computed as a weighted sum of reflection token probabilities:

Score = w_rel × P([ISREL]=Relevant)
      + w_sup × P([ISSUP]=Fully_Supported)
      + w_use × P([ISUSE]=5)

| Weight | Role | When to increase it |
|---|---|---|
| w_rel | Importance of relevance | To filter noise more aggressively |
| w_sup | Importance of support | To guard more strongly against hallucination |
| w_use | Importance of utility | To emphasize answer quality |

Control at inference time:
– Adjusting these weights tailors the model's behavior to each task
– They set the trade-off between prioritizing factuality and prioritizing creativity
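The weighted sum can be tried out directly. The sketch below scores two hypothetical candidate segments under two weight settings; the probability values and the weights are made up for illustration and are not taken from the paper.

```python
from typing import Dict, List


def critique_score(probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the three desirable reflection-token probabilities."""
    return (
        weights["rel"] * probs["rel"]    # P([ISREL]=Relevant)
        + weights["sup"] * probs["sup"]  # P([ISSUP]=Fully_Supported)
        + weights["use"] * probs["use"]  # P([ISUSE]=5)
    )


def pick_best(candidates: List[Dict], weights: Dict[str, float]) -> str:
    """Return the name of the candidate with the highest critique score."""
    return max(candidates, key=lambda c: critique_score(c["probs"], weights))["name"]


candidates = [
    {"name": "fluent_but_weakly_grounded", "probs": {"rel": 0.9, "sup": 0.4, "use": 0.9}},
    {"name": "well_grounded", "probs": {"rel": 0.8, "sup": 0.9, "use": 0.6}},
]

balanced = {"rel": 1.0, "sup": 1.0, "use": 1.0}
utility_first = {"rel": 1.0, "sup": 0.2, "use": 1.0}

print(pick_best(candidates, balanced))
print(pick_best(candidates, utility_first))
```

With balanced weights the well-grounded candidate wins (2.3 against 2.2); lowering `sup` flips the choice, which is the inference-time control described above.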


📊 Experiments and results

Evaluation datasets

| Dataset | Task type | Characteristics | Metric |
|---|---|---|---|
| PopQA | Open-domain QA | Wikidata-based, includes long-tail (rare) entities | Accuracy |
| TriviaQA | Open-domain QA | Trivia questions | Accuracy (EM) |
| PubHealth | Fact Verification | Fact-checking of health claims | Accuracy |
| ARC-Challenge | Reasoning | Science reasoning | Accuracy |
| ASQA | Long-form QA | Ambiguous questions that need a comprehensive answer | EM, Citation Precision/Recall |
| Biography | Long-form Generation | Biography generation | FactScore |

Main results

Short-form Generation (QA, Fact Verification, Reasoning)

| Model | PopQA | TriviaQA | PubHealth | ARC-C |
|---|---|---|---|---|
| Llama2-13B (No Retrieval) | 14.7% | 47.0% | – | 29.4% |
| Alpaca-13B (No Retrieval) | 24.4% | 66.9% | 51.1% | 57.6% |
| Llama2-chat + RAG | – | – | – | – |
| ChatGPT | 29.3% | – | 70.0% | – |
| Self-RAG 7B | 50.5% | 66.4% | 72.4% | 67.3% |
| Self-RAG 13B | 55.8% | 69.3% | 74.5% | 73.1% |

Long-form Generation (tasks that require citations)

| Model | ASQA EM | Citation Prec. | Citation Rec. | Bio FactScore |
|---|---|---|---|---|
| Llama2-chat-13B + RAG | 17.1% | 41.7% | 17.4% | – |
| ChatGPT | 25.2% | 61.8% | 68.9% | 71% |
| Retrieval-augmented ChatGPT | – | – | – | – |
| Self-RAG 13B | 29.3% | 70.3% | 71.3% | 80% |

Key findings:
– Self-RAG 7B/13B improve consistently over ChatGPT and conventional RAG models
– Citation precision improves most visibly (70.3% vs 61.8%)
– Biography generation reaches a FactScore of 80%

Ablation study results

| Setting | PopQA | PubHealth | ASQA |
|---|---|---|---|
| Full Self-RAG | 54.9% | 72.2% | 28.7% |
| w/o Retriever | 32.9% (-40% relative) | 70.8% (-2% relative) | 25.1% |
| w/o Critic | 47.2% | 68.4% | 26.9% |
| Always Retrieve (Top-1) | 51.3% | 71.5% | 27.8% |
| No Adaptive Retrieval | – | Slight drop | – |

Ablation analysis:

  1. Removing the retriever:
    • PopQA: 40% relative drop (retrieval is essential)
    • PubHealth: 2% relative drop (internal knowledge covers much of the task)
    • Shows that the importance of retrieval depends on the task
  2. Removing the critic:
    • Performance degrades across the board
    • Confirms the importance of the self-evaluation mechanism
  3. Always Retrieve (relevance ignored):
    • Performance drops on PopQA and ASQA (by 3.6 and 0.9 points in the table above)
    • Confirms that irrelevant documents act as noise
  4. Performance as a function of retrieval frequency:
    • PubHealth: lowering retrieval frequency costs little
    • PopQA: lowering retrieval frequency costs a lot
    • Supports the need for adaptive retrieval
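The percentages quoted for the retriever ablation are relative drops, not percentage-point differences. A short calculation on the table's own numbers makes the distinction explicit; the helper name is hypothetical.

```python
def relative_drop(full: float, ablated: float) -> float:
    """Relative performance drop in percent, measured against the full model."""
    return (full - ablated) / full * 100


# PopQA: 54.9 -> 32.9 is 22.0 points absolute, about 40% relative.
print(round(relative_drop(54.9, 32.9), 1))
# PubHealth: 72.2 -> 70.8 is 1.4 points absolute, about 2% relative.
print(round(relative_drop(72.2, 70.8), 1))
```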

💪 Strengths and contributions

1. Adaptive retrieval mechanism

  • Decides dynamically whether and how often to retrieve, based on the query and the task
  • Removes the noise introduced by unnecessary retrieval
  • Balances efficiency and accuracy

2. Multi-level self-assessment

  • Four evaluation stages: Retrieve → Relevance → Support → Utility
  • Fine-grained quality control greatly reduces hallucination
  • Problems are detected and handled early, at each stage

3. Efficient training method

  • Much lower training cost than RLHF
  • Memory-efficient, because the critic inserts tokens offline
  • Distills GPT-4's judgments into a small critic model

4. Inference-time controllability

  • Reflection token weights can be tuned per task
  • The trade-off between factuality and creativity can be adjusted flexibly
  • Behavior can be changed without retraining

5. Unified framework

  • A single generator model produces both the answer and its critique tokens, so no separate critic model is needed at inference time (the retriever remains an external component)
  • Generation and self-critique are learned together with one next-token prediction objective
  • Lower deployment and maintenance complexity

6. Strong experimental results

  • The 7B and 13B models outperform ChatGPT and conventional RAG
  • The largest gains are in factuality and citation accuracy
  • Improvements are consistent across a range of tasks

⚠️ Limitations

1. Requires access to token probabilities

  • Applying the Self-RAG algorithm requires access to token probabilities
  • It is hard to apply directly to API-based models (GPT-4, Claude, and so on)
  • Full functionality is available only with open-source models
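The reason token probabilities matter is that each critique is read off the model's distribution over one family of reflection tokens. Below is a minimal sketch, assuming the decoder exposes a mapping from token string to log-probability at the position of interest; the function name and the token spellings are illustrative.

```python
import math
from typing import Dict, List


def normalized_group_probs(logprobs: Dict[str, float], group: List[str]) -> Dict[str, float]:
    """Renormalize probabilities over one family of reflection tokens."""
    exps = {tok: math.exp(logprobs[tok]) for tok in group if tok in logprobs}
    total = sum(exps.values())
    if total == 0:
        raise ValueError("none of the group tokens appear in logprobs")
    return {tok: value / total for tok, value in exps.items()}


# Hypothetical log-probabilities at the position where an ISREL token is expected.
step_logprobs = {
    "[ISREL=Relevant]": math.log(0.6),
    "[ISREL=Irrelevant]": math.log(0.2),
    "the": math.log(0.2),
}
isrel = normalized_group_probs(step_logprobs, ["[ISREL=Relevant]", "[ISREL=Irrelevant]"])
print(isrel)
```

An API that returns only sampled text gives no way to compute these values, which is the limitation described above.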

2. Relatively stronger on extractive tasks

  • Effective on tasks where information is extracted or copied from retrieved documents
  • Gains are limited on tasks that require reasoning or synthesis (PubHealth, ARC-Challenge)
  • Complex multi-step reasoning needs additional mechanisms

3. Dependence on training data

  • Training the critic model requires GPT-4-generated data
  • Data quality affects the performance of the whole pipeline
  • GPT-4 API cost and reproducibility are concerns

4. Hallucination cannot be eliminated entirely

  • Self-RAG can still generate unsupported claims
  • It depends on how reliable the critic's judgments are
  • It is limited for information that is missing from the retrieval corpus

5. Higher computational cost

  • Per-segment retrieval and evaluation increase inference time
  • Beam search increases memory usage
  • Storing embeddings for a large Wikipedia corpus needs roughly 100GB of RAM
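A back-of-envelope estimate shows where a figure of this size can come from. The passage count, embedding dimension, and float32 storage below are assumptions chosen to be typical of dense Wikipedia indexes, not values stated in this post; index structures and the model itself add to the raw vector size.

```python
def embedding_memory_gb(num_passages: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw size of a dense embedding matrix in gigabytes (1 GB = 1e9 bytes)."""
    return num_passages * dim * bytes_per_value / 1e9


# Assumed: about 21 million passages, 768-dimensional float32 vectors.
raw_gb = embedding_memory_gb(21_000_000, 768)
print(f"{raw_gb:.1f} GB for the raw vectors")

# Halving the precision (float16) halves the footprint.
print(f"{embedding_memory_gb(21_000_000, 768, bytes_per_value=2):.1f} GB at float16")
```

Under these assumptions the vectors alone take about 65 GB, so a total in the region of 100GB once overhead is included is plausible.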

6. Dependence on the retrieval corpus

  • Performance depends on the quality and coverage of the retrieval corpus
  • Keeping knowledge current is an issue (Wikipedia update cycle)
  • Specialized domains need their own corpus

🔗 Related papers

Prior work: RAG-based approaches

| Paper | Year | Key idea | Difference from Self-RAG |
|---|---|---|---|
| RAG (Lewis et al.) | 2020 | Combines retrieval and generation | Fixed retrieval, no quality evaluation |
| REALM (Guu et al.) | 2020 | Pre-training with retrieval | Retrieval timing is fixed |
| RETRO (Borgeaud et al.) | 2022 | Integrates retrieval into large-scale pre-training | Needs an extra encoder, no adaptive retrieval |
| Atlas (Izacard et al.) | 2022 | Few-shot learning + RAG | No self-evaluation mechanism |
| REPLUG (Shi et al.) | 2023 | Fine-tunes the retriever with LLM feedback | The generator is not trained |

Prior work: Active/Adaptive Retrieval

| Paper | Year | Key idea | Difference from Self-RAG |
|---|---|---|---|
| FLARE / Active Retrieval (Jiang et al.) | 2023 | Triggers retrieval on low-confidence tokens, using token probabilities to time retrieval | No evaluation of generation quality, no critique mechanism |

Prior work: Self-Critique/Reflection

| Paper | Year | Key idea | Difference from Self-RAG |
|---|---|---|---|
| Self-Refine (Madaan et al.) | 2023 | Iterative self-feedback after generation | Decoupled from retrieval |
| Constitutional AI (Bai et al.) | 2022 | Principle-based self-critique | No retrieval mechanism |

Follow-up work

| Paper | Year | Direction |
|---|---|---|
| CRAG (Yan et al.) | 2024 | Corrective RAG: evaluates retrieval quality, then falls back to web search |
| Speculative RAG (Wang et al.) | 2024 | Drafting with a small specialist LM, verification with a large model |
| RAT (Wang et al.) | 2024 | Combines RAG with Chain-of-Thought |
| Adaptive RAG | 2024 | Chooses the strategy dynamically according to query complexity |

Comparison table

| Method | Adaptive retrieval | Relevance evaluation | Support evaluation | Utility evaluation | Training |
|---|---|---|---|---|---|
| Standard RAG | No | No | No | No | None |
| FLARE | Yes (partial) | No | No | No | None |
| CRAG | No | Yes (external) | No | No | Plug-and-play |
| Self-RAG | Yes | Yes (internal) | Yes (internal) | Yes (internal) | End-to-end |

💻 Practical application points

1. Suitable scenarios

| Scenario | Suitability | Reason |
|---|---|---|
| Knowledge-intensive QA | Very high | Fact-based answers require retrieval |
| Fact Verification | Very high | Well suited to evidence-based verification |
| Document-grounded Generation | High | Improves citation accuracy |
| Citation-required Tasks | High | Academic and legal domains that need explicit sources |
| Customer Support Bot | Medium | Answers based on policies and manuals |
| Creative Writing | Low | Creativity matters more than retrieval |

2. Implementation options

Option A: Using the official Self-RAG model

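# Load the Self-RAG model from HuggingFace
from vllm import LLM, SamplingParams

model = LLM("selfrag/selfrag_llama2_7b", dtype="half")
sampling_params = SamplingParams(
    temperature=0.0,
    top_p=1.0,
    max_tokens=100,
    skip_special_tokens=False  # keep reflection tokens visible in the output
)

def format_prompt(instruction, paragraph=None):
    # Instruction-style prompt format described for the released checkpoints
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(instruction)
    if paragraph is not None:
        # Optionally supply a retrieved passage to ground the answer
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

# Run a query
prompt = format_prompt("What is the capital of France?")
outputs = model.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)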

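Option B: Implementing the Self-RAG pattern with LangGraph

from typing import TypedDict, List
from langchain_core.documents import Document
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores import Chroma

# Assumed to be defined elsewhere (not shown in this post), for example built
# from ChatOpenAI and a Chroma vector store: retriever, rag_chain,
# retrieval_grader, relevance_grader, hallucination_grader, answer_grader.

# State definition
class GraphState(TypedDict):
    question: str
    generation: str
    documents: List[Document]  # retrieved documents (accessed via .page_content below)
    relevance_scores: List[float]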
# Node functions
def should_retrieve(state) -> str:
    """Plays the role of the [Retrieve] token: decide whether retrieval is needed"""
    question = state["question"]
    # Ask an LLM grader whether retrieval is needed
    # (assumes retrieval_grader returns the plain string "yes" or "no")
    decision = retrieval_grader.invoke({
        "question": question
    })
    return "retrieve" if decision == "yes" else "generate_direct"

def retrieve(state):
    """Retrieve documents"""
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents}

def grade_documents(state):
    """Plays the role of the [ISREL] token: grade document relevance"""
    question = state["question"]
    documents = state["documents"]

    filtered_docs = []
    for doc in documents:
        # Grade relevance with an LLM
        score = relevance_grader.invoke({
            "question": question,
            "document": doc.page_content
        })
        # Keep only the documents graded as relevant
        if score.binary_score == "relevant":
            filtered_docs.append(doc)

    return {"documents": filtered_docs}

def generate(state):
    """Generate the answer from the filtered documents"""
    question = state["question"]
    documents = state["documents"]
    generation = rag_chain.invoke({
        "context": "\n\n".join([d.page_content for d in documents]),
        "question": question
    })
    return {"generation": generation}

def grade_generation(state) -> str:
    """Plays the role of the [ISSUP] and [ISUSE] tokens"""
    # Hallucination check (ISSUP): is the answer grounded in the documents?
    hallucination_score = hallucination_grader.invoke({
        "documents": state["documents"],
        "generation": state["generation"]
    })

    if hallucination_score.binary_score == "no":
        return "regenerate"

    # Answer usefulness check (ISUSE): does the answer address the question?
    answer_score = answer_grader.invoke({
        "question": state["question"],
        "generation": state["generation"]
    })

    return "useful" if answer_score.binary_score == "yes" else "not_useful"

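# Nodes that the graph below refers to but the listing above does not define.
# These are hypothetical fill-ins: web_search_tool is an assumed search tool
# that returns a list of dicts with a "content" key.
def generate_direct(state):
    """Answer without retrieval, from the model's own knowledge"""
    generation = rag_chain.invoke({"context": "", "question": state["question"]})
    return {"generation": generation}

def websearch(state):
    """Fallback search when no relevant document remains"""
    from langchain_core.documents import Document
    results = web_search_tool.invoke({"query": state["question"]})
    return {"documents": [Document(page_content=r["content"]) for r in results]}

# Build the graph
workflow = StateGraph(GraphState)

# Add nodes
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("generate", generate)
workflow.add_node("generate_direct", generate_direct)
workflow.add_node("websearch", websearch)

# Conditional entry point: retrieve or answer directly
workflow.set_conditional_entry_point(
    should_retrieve,
    {
        "retrieve": "retrieve",
        "generate_direct": "generate_direct"
    }
)

workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    lambda x: "generate" if x["documents"] else "websearch",
    {"generate": "generate", "websearch": "websearch"}
)
workflow.add_conditional_edges(
    "generate",
    grade_generation,
    {"useful": END, "not_useful": "websearch", "regenerate": "generate"}
)
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate_direct", END)

# Note: the "regenerate" edge can loop indefinitely; cap the retries in production.
app = workflow.compile()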

3. Parameter tuning guide

| Task type | w_rel | w_sup | w_use | Notes |
|---|---|---|---|---|
| Fact verification | High | Very high | Medium | Preventing hallucination comes first |
| Open QA | High | High | High | Balanced setting |
| Long-form generation | Medium | High | Very high | Emphasizes usefulness and quality |
| Creative writing | Low | Low | Very high | Emphasizes fluency and creativity |
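The qualitative levels in the table can be turned into a configuration by mapping each level to a number. The numeric values and task keys below are hypothetical placeholders, meant only as a starting point to tune on a validation set.

```python
from typing import Dict

# Hypothetical mapping from the table's qualitative levels to numeric weights.
LEVELS = {"low": 0.2, "medium": 0.5, "high": 1.0, "very_high": 2.0}

# One row per task type, in the column order of the table: w_rel, w_sup, w_use.
PRESETS = {
    "fact_verification": ("high", "very_high", "medium"),
    "open_qa": ("high", "high", "high"),
    "long_form": ("medium", "high", "very_high"),
    "creative_writing": ("low", "low", "very_high"),
}


def weights_for(task: str) -> Dict[str, float]:
    """Look up the numeric critique weights for a task type."""
    if task not in PRESETS:
        raise KeyError(f"unknown task type: {task}")
    rel, sup, use = PRESETS[task]
    return {"w_rel": LEVELS[rel], "w_sup": LEVELS[sup], "w_use": LEVELS[use]}


print(weights_for("fact_verification"))
print(weights_for("creative_writing"))
```

Keeping the presets as data makes it straightforward to add a new task type or to sweep the numeric levels during evaluation.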

4. Deployment considerations

| Item | Recommendation |
|---|---|
| Memory | Embedding all of Wikipedia needs 100GB+ of RAM |
| Retriever | Contriever or DPR recommended |
| Chunk size | 250 tokens recommended (paper setting) |
| Inference optimization | Use vLLM to improve throughput |
| Model choice | 7B (speed first) vs 13B (quality first) |
| Corpus | Build a custom corpus for each domain |
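For the chunk-size row, a simple fixed-size splitter is enough to get started. The sketch below counts whitespace-separated words as a stand-in for tokens, which is an assumption; a production pipeline would count tokens with the retriever's own tokenizer.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 250) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


document = " ".join(f"word{i}" for i in range(600))
chunks = chunk_text(document)
print([len(chunk.split()) for chunk in chunks])
```

The last chunk is simply shorter than the others; overlapping windows are a common refinement when answers may straddle a chunk boundary.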

5. Choosing evaluation metrics

| Task | Recommended metrics |
|---|---|
| QA | Accuracy, Exact Match (EM), F1 |
| Long-form | FactScore, MAUVE, Citation Precision/Recall |
| Fact Verification | Accuracy, F1 |
| General | ROUGE, BERTScore |
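For the QA row, Exact Match and token-level F1 are simple enough to implement directly. The sketch below uses a common normalization for English answers (lowercasing, dropping punctuation and articles); the exact normalization used by any particular benchmark may differ, so treat it as an assumption.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(round(token_f1("Paris France", "Paris"), 3))
```

EM rewards only fully correct answers, while F1 gives partial credit, so reporting both shows whether errors are near misses or outright failures.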

6. Recommendations by use case

  1. Enterprise QA systems:
    • RAG over internal documents combined with the Self-RAG pattern
    • Showing citations makes answers more trustworthy
  2. Legal and medical domains:
    • Set w_sup high to enforce evidence-based answers
    • A domain-specific corpus is essential
  3. Research assistant tools:
    • Generate comprehensive, ASQA-style answers
    • Enable citation tracking
  4. Customer support bots:
    • Consistent answers based on policies and manuals
    • Adaptive retrieval keeps the system efficient

🏷️ Tags

#RAG #Self-RAG #Retrieval-Augmented-Generation #LLM #Self-Reflection #Adaptive-Retrieval #ICLR2024 #Hallucination-Reduction #Fact-Verification #Knowledge-Grounding #Critique-Model #Reflection-Tokens #Llama2 #LangGraph #NLP #InformationRetrieval
