๋ณธ๋ฌธ์œผ๋กœ ๊ฑด๋„ˆ๋›ฐ๊ธฐ
-
skycave's Blog
[AI Paper] ๐Ÿ“„ ReAct: Synergizing Reasoning and Acting in Language Models

By skycave
January 25, 2026 · 11 Min Read

๐Ÿ“„ ReAct: Synergizing Reasoning and Acting in Language Models

๐Ÿ“‹ ๋ฉ”ํƒ€ ์ •๋ณด

ํ•ญ๋ชฉ ๋‚ด์šฉ
์ €์ž Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
์†Œ์† Princeton University (Department of Computer Science), Google Research (Brain Team)
๋ฐœํ‘œ์ฒ˜ ICLR 2023 (International Conference on Learning Representations)
์—ฐ๋„ 2022๋…„ 10์›” (arXiv), 2023๋…„ (ICLR ๋ฐœํ‘œ)
์ˆ˜์ƒ ICLR 2023 Notable Top 5% Paper, Oral Presentation
arXiv https://arxiv.org/abs/2210.03629
OpenReview https://openreview.net/forum?id=WE_vluYUL-X
GitHub https://github.com/ysymyth/ReAct
ํ”„๋กœ์ ํŠธ ํŽ˜์ด์ง€ https://react-lm.github.io/

๐ŸŽฏ ํ•œ์ค„ ์š”์•ฝ

LLM์ด ์ถ”๋ก (Reasoning)๊ณผ ํ–‰๋™(Acting)์„ ๊ต์ฐจ(interleaved) ๋ฐฉ์‹์œผ๋กœ ์ƒ์„ฑํ•˜๋„๋ก ํ•˜์—ฌ, ๋™์  ์ถ”๋ก ์„ ํ†ตํ•œ ํ–‰๋™ ๊ณ„ํš ์ˆ˜๋ฆฝ๊ณผ ์™ธ๋ถ€ ํ™˜๊ฒฝ๊ณผ์˜ ์ƒํ˜ธ์ž‘์šฉ์„ ํ†ตํ•œ ์ •๋ณด ์ˆ˜์ง‘์„ ๋™์‹œ์— ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•œ AI Agent์˜ ํ•ต์‹ฌ ํ”„๋ ˆ์ž„์›Œํฌ


๐Ÿ” ์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ ๋ฐ ๋™๊ธฐ

๊ธฐ์กด ๋ฌธ์ œ์ 

  1. Reasoning๊ณผ Acting์˜ ๋ถ„๋ฆฌ๋œ ์—ฐ๊ตฌ
    • LLM์˜ ์ถ”๋ก  ๋Šฅ๋ ฅ(์˜ˆ: Chain-of-Thought prompting)๊ณผ ํ–‰๋™ ๋Šฅ๋ ฅ(์˜ˆ: action plan generation)์ด ๋ณ„๊ฐœ์˜ ์—ฐ๊ตฌ ์ฃผ์ œ๋กœ ๋‹ค๋ค„์ ธ ์˜ด
    • ๋‘ ๋Šฅ๋ ฅ ๊ฐ„์˜ ์‹œ๋„ˆ์ง€ ํšจ๊ณผ๋ฅผ ํ™œ์šฉํ•˜์ง€ ๋ชปํ•จ
  2. Chain-of-Thought (CoT)์˜ ํ•œ๊ณ„
    • ์™ธ๋ถ€ ์„ธ๊ณ„์— ๋Œ€ํ•œ ์ ‘๊ทผ ๋ถˆ๊ฐ€
    • ์ง€์‹ ์—…๋ฐ์ดํŠธ ๋ถˆ๊ฐ€๋Šฅ
    • ํ™˜๊ฐ(Hallucination) ๋ฌธ์ œ: ๋…ผ๋ฆฌ์ ์œผ๋กœ ๋“ค๋ฆฌ์ง€๋งŒ ์‚ฌ์‹ค๊ณผ ๋‹ค๋ฅธ ๋‚ด์šฉ ์ƒ์„ฑ
      • HotpotQA์—์„œ CoT์˜ ํ™˜๊ฐ์œผ๋กœ ์ธํ•œ false positive rate์ด ReAct ๋Œ€๋น„ 2๋ฐฐ ์ด์ƒ (14% vs 6%)
      • CoT์˜ ์ฃผ์š” ์‹คํŒจ ์›์ธ ์ค‘ 56%๊ฐ€ ํ™˜๊ฐ์œผ๋กœ ์ธํ•œ ๊ฒƒ
    • ์˜ค๋ฅ˜ ์ „ํŒŒ(Error Propagation): ์ดˆ๊ธฐ ์ถ”๋ก  ์˜ค๋ฅ˜๊ฐ€ ํ›„์† ๋‹จ๊ณ„๋กœ ์ „ํŒŒ
  3. Action-only ์ ‘๊ทผ๋ฒ•์˜ ํ•œ๊ณ„
    • ๊ณ ์ˆ˜์ค€ ๋ชฉํ‘œ ์ถ”์  ๋ฐ ๊ณ„ํš ์กฐ์ • ๋Šฅ๋ ฅ ๋ถ€์กฑ
    • ์˜ˆ์™ธ ์ƒํ™ฉ ์ฒ˜๋ฆฌ ์–ด๋ ค์›€
    • ํ–‰๋™์˜ ๊ทผ๊ฑฐ๋‚˜ ์˜๋„ ํŒŒ์•… ๋ถˆ๊ฐ€
    • ๋ชฉํ‘œ๋ฅผ ํ•˜์œ„ ๋ชฉํ‘œ๋กœ ์ ์ ˆํžˆ ๋ถ„ํ•ดํ•˜์ง€ ๋ชปํ•จ

์™œ ์ด ์—ฐ๊ตฌ๊ฐ€ ํ•„์š”ํ•œ๊ฐ€?

  • ์ธ๊ฐ„์˜ ๋ฌธ์ œ ํ•ด๊ฒฐ ๋ฐฉ์‹: ์ถ”๋ก ๊ณผ ํ–‰๋™์ด ๊ธด๋ฐ€ํ•˜๊ฒŒ ์—ฐ๊ฒฐ
  • ์˜ˆ: ์š”๋ฆฌํ•  ๋•Œ ๋ ˆ์‹œํ”ผ๋ฅผ ์ถ”๋ก ํ•˜๋ฉด์„œ(reason to act) ๋™์‹œ์— ์žฌ๋ฃŒ ์ƒํƒœ๋ฅผ ํ™•์ธํ•˜๋ฉฐ ๊ณ„ํš ์กฐ์ •(act to reason)
  • ๋˜ ๋‹ค๋ฅธ ์˜ˆ: “์‹ํƒ ์œ„์— ์—†์œผ๋‹ˆ ์„œ๋ž์„ ํ™•์ธํ•ด๋ด์•ผ๊ฒ ๋‹ค”๋Š” ์ƒ๊ฐ(์ถ”๋ก ) -> ์„œ๋ž ์—ด๊ธฐ(ํ–‰๋™) -> ๊ฒฐ๊ณผ ํ™•์ธ(๊ด€์ฐฐ)
  • LLM๋„ ์ด๋Ÿฌํ•œ ์ธ๊ฐ„๊ณผ ์œ ์‚ฌํ•œ ์‹œ๋„ˆ์ง€์  ์ ‘๊ทผ์ด ํ•„์š”

๐Ÿ’ก ํ•ต์‹ฌ ์•„์ด๋””์–ด

ReAct = Reasoning + Acting

ReAct๋Š” LLM์ด ์ถ”๋ก  ํŠธ๋ ˆ์ด์Šค(reasoning traces)์™€ ํƒœ์Šคํฌ๋ณ„ ํ–‰๋™(task-specific actions)์„ ์ธํ„ฐ๋ฆฌ๋น™(interleaving) ๋ฐฉ์‹์œผ๋กœ ์ƒ์„ฑํ•˜๋„๋ก ํ•จ.

์–‘๋ฐฉํ–ฅ ์‹œ๋„ˆ์ง€

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                        ReAct Framework                         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                 โ”‚
โ”‚   Reason to Act                    Act to Reason               โ”‚
โ”‚   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€                    โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€               โ”‚
โ”‚   โ€ข ๋™์  ์ถ”๋ก ์œผ๋กœ ๊ณ ์ˆ˜์ค€ ๊ณ„ํš ์ˆ˜๋ฆฝ    โ€ข ์™ธ๋ถ€ ํ™˜๊ฒฝ๊ณผ ์ƒํ˜ธ์ž‘์šฉ      โ”‚
โ”‚   โ€ข ๊ณ„ํš ์œ ์ง€ ๋ฐ ์กฐ์ •                โ€ข ์ถ”๊ฐ€ ์ •๋ณด ํš๋“            โ”‚
โ”‚   โ€ข ์˜ˆ์™ธ ์ƒํ™ฉ ์ฒ˜๋ฆฌ                  โ€ข ์ถ”๋ก ์— ์ƒˆ ์ •๋ณด ๋ฐ˜์˜       โ”‚
โ”‚                                                                 โ”‚
โ”‚         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”           โ”‚
โ”‚         โ”‚Thoughtโ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚Actionโ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚Observationโ”‚           โ”‚
โ”‚         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜           โ”‚
โ”‚              โ–ฒ                               โ”‚                  โ”‚
โ”‚              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                  โ”‚
โ”‚                         (๋ฐ˜๋ณต)                                  โ”‚
โ”‚                                                                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Thought-Action-Observation ๋ฃจํ”„

๊ตฌ์„ฑ์š”์†Œ ์—ญํ•  ํŠน์ง• ์˜ˆ์‹œ
Thought ์ถ”๋ก  ์™ธ๋ถ€ ํ™˜๊ฒฝ์— ์˜ํ–ฅ ์—†์Œ, ๋‚ด๋ถ€ ์ถ”๋ก ๋งŒ “Colorado orogeny์˜ ๋™๋ถ€ ๊ตฌ์—ญ์„ ์ฐพ์•„๋ด์•ผ๊ฒ ๋‹ค”
Action ํ–‰๋™ ์™ธ๋ถ€ ํ™˜๊ฒฝ๊ณผ ์ƒํ˜ธ์ž‘์šฉ Search[High Plains], Lookup[eastern sector]
Observation ๊ด€์ฐฐ ํ–‰๋™ ๊ฒฐ๊ณผ ์ˆ˜์‹  “High Plains๋Š” ํ•ด๋ฐœ 1,800~7,000ft์— ์œ„์น˜…”

ํ•ต์‹ฌ ์ฐจ๋ณ„์ 

  • Reasoning traces๋Š” ์™ธ๋ถ€ ํ™˜๊ฒฝ์— ์˜ํ–ฅ์„ ์ฃผ์ง€ ์•Š์Œ (์ˆœ์ˆ˜ ๋‚ด๋ถ€ ์ถ”๋ก )
  • Actions๋Š” ์™ธ๋ถ€ ํ™˜๊ฒฝ๊ณผ ์ƒํ˜ธ์ž‘์šฉํ•˜์—ฌ observation feedback์„ ๋ฐ›์Œ
  • ๋‘ ๊ฐ€์ง€๊ฐ€ ๊ต์ฐจ(interleaved)๋˜๋ฉด์„œ ์‹œ๋„ˆ์ง€ ํšจ๊ณผ ๋ฐœ์ƒ

๐Ÿ—๏ธ ์•„ํ‚คํ…์ฒ˜ / ๋ฐฉ๋ฒ•๋ก 

์‹œ์Šคํ…œ ๊ตฌ์กฐ

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                         ReAct System                               โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                    โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚   Prompt    โ”‚     โ”‚              Frozen LLM                 โ”‚  โ”‚
โ”‚  โ”‚ (Few-shot   โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚       (PaLM-540B / GPT-3 ๋“ฑ)            โ”‚  โ”‚
โ”‚  โ”‚ Exemplars)  โ”‚     โ”‚                                         โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                                         โ”‚                          โ”‚
โ”‚                                         โ–ผ                          โ”‚
โ”‚                      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”          โ”‚
โ”‚                      โ”‚     Autoregressive Generation    โ”‚          โ”‚
โ”‚                      โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚          โ”‚
โ”‚                      โ”‚  โ”‚ Thought โ”‚ or โ”‚   Action    โ”‚  โ”‚          โ”‚
โ”‚                      โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚          โ”‚
โ”‚                      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜          โ”‚
โ”‚                                         โ”‚                          โ”‚
โ”‚                           โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”            โ”‚
โ”‚                           โ–ผ                           โ–ผ            โ”‚
โ”‚                    Thought์ธ ๊ฒฝ์šฐ              Action์ธ ๊ฒฝ์šฐ       โ”‚
โ”‚                    (์ปจํ…์ŠคํŠธ์—                (์™ธ๋ถ€ ํ™˜๊ฒฝ ํ˜ธ์ถœ)     โ”‚
โ”‚                     ์ถ”๊ฐ€๋งŒ)                          โ”‚             โ”‚
โ”‚                                                      โ–ผ             โ”‚
โ”‚                                        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”‚
โ”‚                                        โ”‚  External Environmentโ”‚     โ”‚
โ”‚                                        โ”‚  - Wikipedia API     โ”‚     โ”‚
โ”‚                                        โ”‚  - ALFWorld          โ”‚     โ”‚
โ”‚                                        โ”‚  - WebShop           โ”‚     โ”‚
โ”‚                                        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ”‚
โ”‚                                                   โ”‚                โ”‚
โ”‚                                                   โ–ผ                โ”‚
โ”‚                                        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”‚
โ”‚                                        โ”‚    Observation      โ”‚     โ”‚
โ”‚                                        โ”‚  (์ปจํ…์ŠคํŠธ์— ์ถ”๊ฐ€)   โ”‚     โ”‚
โ”‚                                        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ”‚
โ”‚                                                                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

ReAct ํ”„๋กฌํ”„ํŠธ ๊ตฌ์กฐ

[Few-shot ์˜ˆ์‹œ๋“ค]
Question: {์งˆ๋ฌธ}
Thought 1: {ํ˜„์žฌ ์ƒํ™ฉ ๋ถ„์„ ๋ฐ ๊ณ„ํš}
Action 1: {์ˆ˜ํ–‰ํ•  ์•ก์…˜}[{ํŒŒ๋ผ๋ฏธํ„ฐ}]
Observation 1: {์•ก์…˜ ๊ฒฐ๊ณผ}
Thought 2: {๊ด€์ฐฐ ๊ฒฐ๊ณผ ๋ถ„์„ ๋ฐ ๋‹ค์Œ ๊ณ„ํš}
Action 2: ...
...
Thought N: {์ตœ์ข… ๋‹ต๋ณ€ ๋„์ถœ ์ถ”๋ก }
Action N: Finish[{์ตœ์ข… ๋‹ต๋ณ€}]

์ง€์› ์•ก์…˜ ํƒ€์ž… (QA/Fact Verification)

์•ก์…˜ ์„ค๋ช… ์˜ˆ์‹œ
Search[entity] Wikipedia์—์„œ ์—”ํ‹ฐํ‹ฐ ๊ฒ€์ƒ‰, ์ฒซ 5๋ฌธ์žฅ ๋ฐ˜ํ™˜. ์—†์œผ๋ฉด ์ƒ์œ„ 5๊ฐœ ์œ ์‚ฌ ์—”ํ‹ฐํ‹ฐ ์ œ์•ˆ Search[Colorado orogeny]
Lookup[keyword] ํ˜„์žฌ ํŽ˜์ด์ง€์—์„œ ํ‚ค์›Œ๋“œ๊ฐ€ ํฌํ•จ๋œ ๋‹ค์Œ ๋ฌธ์žฅ ๋ฐ˜ํ™˜ (Ctrl+F ์‹œ๋ฎฌ๋ ˆ์ด์…˜) Lookup[eastern sector]
Finish[answer] ์ตœ์ข… ๋‹ต๋ณ€ ์ œ์ถœ ๋ฐ ํƒœ์Šคํฌ ์ข…๋ฃŒ Finish[1,800 to 7,000 ft]
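A tiny dispatcher for these three action types, run against an in-memory stand-in for Wikipedia (the `WIKI` dict, the sentence granularity, and the `state` shape are illustrative assumptions, not the paper's actual environment):

```python
import re

# Toy "Wikipedia": entity -> list of sentences (illustrative data only).
WIKI = {
    "High Plains": [
        "High Plains refers to one of two distinct land regions.",
        "They rise in elevation from around 1,800 to 7,000 ft.",
    ],
}

def execute(action, state):
    """Run one Search/Lookup/Finish action against the mock environment."""
    kind, arg = re.match(r"(\w+)\[(.+)\]", action).groups()
    if kind == "Search":
        state["page"] = WIKI.get(arg, [])          # load the entity's page
        return " ".join(state["page"][:5]) or f"Could not find {arg}."
    if kind == "Lookup":                           # Ctrl+F over the current page
        hits = [s for s in state["page"] if arg.lower() in s.lower()]
        return hits[0] if hits else f"No more results for {arg}."
    if kind == "Finish":                           # submit and stop
        state["done"] = True
        return arg
    return f"Unknown action: {kind}"

state = {"page": [], "done": False}
print(execute("Search[High Plains]", state))
print(execute("Lookup[elevation]", state))
```

A faithful `Lookup` would keep a cursor so repeated calls return successive matching sentences; this sketch returns only the first hit.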

์‹ค์ œ ์˜ˆ์‹œ (HotpotQA)

Question: What is the elevation range of the area that the eastern sector of the Colorado orogeny extends into?

Thought 1: I need to search Colorado orogeny and find the area that its eastern sector extends into.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building...
              (๋™๋ถ€ ๊ตฌ์—ญ์— ๋Œ€ํ•œ ์ง์ ‘์  ์–ธ๊ธ‰ ์—†์Œ)

Thought 2: ๋™๋ถ€ ๊ตฌ์—ญ์— ๋Œ€ํ•œ ์–ธ๊ธ‰์ด ์—†๋‹ค. ๋™๋ถ€ ๊ตฌ์—ญ์„ lookupํ•ด์•ผ ํ•œ๋‹ค.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1/1) The eastern sector extends into the High Plains
              and is called the Central Plains orogeny.

Thought 3: ๋™๋ถ€ ๊ตฌ์—ญ์ด High Plains๋กœ ํ™•์žฅ๋œ๋‹ค. High Plains์˜ ๊ณ ๋„๋ฅผ ๊ฒ€์ƒ‰ํ•ด์•ผ ํ•œ๋‹ค.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of the two distinct land regions...
              rise in elevation from around 1,800 to 7,000 ft.

Thought 4: The elevation range of the High Plains is 1,800 to 7,000 ft, so that is the answer.
Action 4: Finish[1,800 to 7,000 ft]

Interactive Decision-Making Example (ALFWorld)

Task: put a clean apple in the fridge.

Thought: To complete this task, I need to find an apple, clean it, then put it in the fridge.
Action: go to countertop 1
Observation: On the countertop 1, you see an apple 1.

Thought: I found an apple. Now I need to take it.
Action: take apple 1 from countertop 1
Observation: You pick up the apple 1.

Thought: I have the apple. Now I need to clean it at the sink.
Action: go to sinkbasin 1
Observation: On the sinkbasin 1, you see nothing.

Thought: I'm at the sink. Let me clean the apple.
Action: clean apple 1 with sinkbasin 1
Observation: You clean the apple 1 using the sinkbasin 1.

Thought: The apple is clean. Now I need to put it in the fridge.
Action: go to fridge 1
Observation: The fridge 1 is closed.

Action: open fridge 1
Observation: You open the fridge 1.

Action: put apple 1 in/on fridge 1
Observation: You put the apple 1 in/on the fridge 1.

์•Œ๊ณ ๋ฆฌ์ฆ˜ Pseudo Code

import re

MAX_STEPS = 7  # step budget; the paper uses 5-10 steps depending on the task


def react_agent(question, few_shot_examples, llm, environment):
    """
    The ReAct agent loop.

    Args:
        question: the question/task to solve
        few_shot_examples: example trajectories for in-context learning
        llm: a frozen pretrained language model (e.g., PaLM-540B)
        environment: the external environment (Wikipedia API, ALFWorld, etc.)

    Returns:
        final_answer: the final answer
    """
    # Initialize the prompt
    context = few_shot_examples + f"\nQuestion: {question}\n"
    step = 1

    while step <= MAX_STEPS:
        # 1. Generate a Thought (reasoning step)
        thought = llm.generate(
            context + f"Thought {step}:",
            stop_tokens=["Action"]
        )
        context += f"Thought {step}: {thought}\n"

        # 2. Generate an Action (decide what to do)
        action = llm.generate(
            context + f"Action {step}:",
            stop_tokens=["Observation"]
        )
        context += f"Action {step}: {action}\n"

        # 3. Parse the action
        action_type, action_arg = parse_action(action)

        # Termination check
        if action_type and action_type.lower() == "finish":
            return action_arg

        # 4. Interact with the environment to obtain an Observation
        observation = environment.execute(action_type, action_arg)
        context += f"Observation {step}: {observation}\n"

        step += 1

    # Step budget exhausted
    return "Unable to find answer"


def parse_action(action_string):
    """
    Parse an action string into its type and argument.
    e.g., "Search[Colorado orogeny]" -> ("Search", "Colorado orogeny")
    """
    match = re.match(r"(\w+)\[(.+)\]", action_string.strip())
    if match:
        return match.group(1), match.group(2)
    return None, None
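The loop above can be exercised end to end by scripting the model's outputs. `ScriptedLLM` and `mock_env` below are stand-ins of my own, not the paper's setup, but the control flow is the same:

```python
import re

class ScriptedLLM:
    """Replays a fixed list of completions; a real LLM conditions on `context`."""
    def __init__(self, completions):
        self._completions = iter(completions)
    def generate(self, context, stop_tokens=()):
        return next(self._completions)

def mock_env(action_type, action_arg):
    pages = {"High Plains": "rise in elevation from around 1,800 to 7,000 ft"}
    return pages.get(action_arg, f"Could not find {action_arg}.")

def react_loop(question, llm, env, max_steps=7):
    context = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        thought = llm.generate(context + f"Thought {step}:", stop_tokens=["Action"])
        context += f"Thought {step}: {thought}\n"
        action = llm.generate(context + f"Action {step}:", stop_tokens=["Observation"])
        context += f"Action {step}: {action}\n"
        kind, arg = re.match(r"(\w+)\[(.+)\]", action).groups()
        if kind.lower() == "finish":
            return arg                      # terminal action: submit the answer
        context += f"Observation {step}: {env(kind, arg)}\n"
    return None                             # step budget exhausted

llm = ScriptedLLM([
    "I should search High Plains.",         # Thought 1
    "Search[High Plains]",                  # Action 1
    "The range is 1,800 to 7,000 ft.",      # Thought 2
    "Finish[1,800 to 7,000 ft]",            # Action 2
])
print(react_loop("What is the elevation range of the High Plains?", llm, mock_env))
# 1,800 to 7,000 ft
```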

4๊ฐ€์ง€ ํ”„๋กฌํ”„ํŒ… ๋ฐฉ๋ฒ• ๋น„๊ต

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚              Prompting Methods Comparison                  โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                            โ”‚
โ”‚  (a) Standard:   Question โ†’ Answer                         โ”‚
โ”‚                  (์ง์ ‘ ๋‹ต๋ณ€, ์ถ”๋ก  ์—†์Œ)                     โ”‚
โ”‚                                                            โ”‚
โ”‚  (b) CoT:        Question โ†’ Thoughtโ‚ โ†’ ... โ†’ Answer        โ”‚
โ”‚      (Reason     (๋‚ด๋ถ€ ์ถ”๋ก ๋งŒ, ์™ธ๋ถ€ ์ƒํ˜ธ์ž‘์šฉ ์—†์Œ)          โ”‚
โ”‚       Only)                                                โ”‚
โ”‚                                                            โ”‚
โ”‚  (c) Act-only:   Question โ†’ Actionโ‚ โ†’ Obsโ‚ โ†’ ... โ†’ Answer  โ”‚
โ”‚                  (ํ–‰๋™๋งŒ, ์ถ”๋ก  ์—†์Œ)                        โ”‚
โ”‚                                                            โ”‚
โ”‚  (d) ReAct:      Question โ†’ Thoughtโ‚ โ†’ Actionโ‚ โ†’ Obsโ‚ โ†’    โ”‚
โ”‚                            Thoughtโ‚‚ โ†’ Actionโ‚‚ โ†’ Obsโ‚‚ โ†’ ... โ”‚
โ”‚                  (์ถ”๋ก  + ํ–‰๋™ ๊ต์ฐจ)                         โ”‚
โ”‚                                                            โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ“Š ์‹คํ—˜ ๋ฐ ๊ฒฐ๊ณผ

์‚ฌ์šฉ ๋ชจ๋ธ ๋ฐ ์„ค์ •

  • Base Model: PaLM-540B (frozen, prompting only)
  • Few-shot Examples: 1-6 in-context examples
  • Comparison model: GPT-3 (text-davinci-002)

๋ฒค์น˜๋งˆํฌ ๊ฐœ์š”

๋ฒค์น˜๋งˆํฌ ํƒœ์Šคํฌ ์œ ํ˜• ํ™˜๊ฒฝ ํ‰๊ฐ€ ์ง€ํ‘œ
HotpotQA ๋‹ค์ค‘ ํ™‰ ์งˆ๋ฌธ ๋‹ต๋ณ€ Wikipedia API Exact Match (EM)
FEVER ์‚ฌ์‹ค ๊ฒ€์ฆ (SUPPORTS/REFUTES/NOT ENOUGH INFO) Wikipedia API Accuracy
ALFWorld ํ…์ŠคํŠธ ๊ธฐ๋ฐ˜ ๊ฒŒ์ž„ ๊ฐ€์ƒ ๊ฐ€์ • ํ™˜๊ฒฝ Success Rate
WebShop ์›นํŽ˜์ด์ง€ ํƒ์ƒ‰ ๋ฐ ์‡ผํ•‘ ์›น ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ Success Rate, Reward

Results on Knowledge-Intensive Tasks

HotpotQA (Multi-hop Question Answering)

๋ฐฉ๋ฒ• Exact Match (EM) ํŠน์ง•
Standard 25.7% ๋‹จ์ˆœ ๋‹ต๋ณ€ ์ƒ์„ฑ
CoT (Chain-of-Thought) 29.4% ๋‚ด๋ถ€ ์ถ”๋ก ๋งŒ ์‚ฌ์šฉ
Act-only 25.7% ํ–‰๋™๋งŒ ์ˆ˜ํ–‰, ์ถ”๋ก  ์—†์Œ
ReAct 27.4% ์ถ”๋ก  + ํ–‰๋™ ๊ฒฐํ•ฉ
ReAct + CoT-SC 35.1% ๋‘ ๋ฐฉ๋ฒ• ๊ฒฐํ•ฉ ์‹œ ์ตœ๊ณ  ์„ฑ๋Šฅ

Note: CoT๊ฐ€ HotpotQA์—์„œ ์•ฝ๊ฐ„ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๋‚˜, ReAct๋Š” ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ๊ณผ ์‹ ๋ขฐ์„ฑ์—์„œ ์šฐ์œ„

FEVER (Fact Verification)

๋ฐฉ๋ฒ• Accuracy False Positive Rate
CoT 56.3% 14% (ํ™˜๊ฐ์œผ๋กœ ์ธํ•œ)
Act-only 58.9% –
ReAct 60.9% 6%
ReAct + CoT-SC 64.6% –

ReAct๊ฐ€ FEVER์—์„œ CoT ๋Œ€๋น„ 4.6%p ํ–ฅ์ƒ, ํ™˜๊ฐ์œผ๋กœ ์ธํ•œ ์˜ค๋ฅ˜ ๋Œ€ํญ ๊ฐ์†Œ

Results on Decision-Making Tasks

ALFWorld (Text-based Game)

๋ฐฉ๋ฒ• Success Rate Training Data
BUTLER (IL) 26% 10^5 instances
BUTLER (IL+RL) 37% 10^5 instances
Act-only (1-shot) 45% 1 example
ReAct (1-shot) 71% 1 example
ReAct (best trial) 71% –
ReAct (worst trial) 48% –

ReAct๊ฐ€ ๊ฐ•ํ™”ํ•™์Šต/๋ชจ๋ฐฉํ•™์Šต ๋Œ€๋น„ 34%p ์ ˆ๋Œ€์  ํ–ฅ์ƒ, ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ ๊ฑฐ์˜ ๋ถˆํ•„์š”

WebShop (Web Navigation)

๋ฐฉ๋ฒ• Success Rate Reward
IL (Imitation Learning) 29.1% –
IL + RL 30% –
ReAct (1-shot) 40% –
Human Expert 50% –

ReAct๊ฐ€ ๊ธฐ์กด ์ตœ๊ณ  ๋ฐฉ๋ฒ• ๋Œ€๋น„ 10%p ํ–ฅ์ƒ, ์ธ๊ฐ„ ์ˆ˜์ค€(50%)์— ๊ทผ์ ‘

์˜ค๋ฅ˜ ๋ถ„์„

์˜ค๋ฅ˜ ์œ ํ˜• CoT ReAct ์„ค๋ช…
ํ™˜๊ฐ (Hallucination) 56% ๋‚ฎ์Œ CoT์˜ ์ฃผ์š” ์‹คํŒจ ์›์ธ
๋น„์ •๋ณด์  ๊ฒ€์ƒ‰ – 23% ReAct์˜ ์ฃผ์š” ์‹คํŒจ ์›์ธ
์ถ”๋ก  ์˜ค๋ฅ˜ ์ค‘๊ฐ„ ๋‚ฎ์Œ –
False Positive Rate 14% 6% ํ™˜๊ฐ์œผ๋กœ ์ธํ•œ ์ž˜๋ชป๋œ ๊ธ์ •

ํ•ต์‹ฌ ๋ฐœ๊ฒฌ

  1. ํ™˜๊ฐ ๊ฐ์†Œ: ReAct๋Š” CoT ๋Œ€๋น„ ํ™˜๊ฐ์œผ๋กœ ์ธํ•œ ์‹คํŒจ๋ฅผ ํฌ๊ฒŒ ๊ฐ์†Œ์‹œํ‚ด
  2. ์ƒ˜ํ”Œ ํšจ์œจ์„ฑ: 1-2๊ฐœ์˜ few-shot ์˜ˆ์ œ๋งŒ์œผ๋กœ 10^3~10^5๊ฐœ ์ธ์Šคํ„ด์Šค๋กœ ํ›ˆ๋ จ๋œ IL/RL ๋ฐฉ๋ฒ• ๋Šฅ๊ฐ€
  3. ReAct + CoT ๊ฒฐํ•ฉ์ด ์ตœ์ : ๋‚ด๋ถ€ ์ง€์‹(CoT)๊ณผ ์™ธ๋ถ€ ์ •๋ณด(ReAct) ๋ชจ๋‘ ํ™œ์šฉ ๊ฐ€๋Šฅ
  4. ๊ฒ€์ƒ‰ ํ’ˆ์งˆ์˜ ์ค‘์š”์„ฑ: ReAct์—์„œ ๋น„์ •๋ณด์  ๊ฒ€์ƒ‰(non-informative search)์€ ์˜ค๋ฅ˜์˜ 23% ์ฐจ์ง€

๐Ÿ’ช ๊ฐ•์  ๋ฐ ๊ธฐ์—ฌ

ํ•™์ˆ ์  ๊ธฐ์—ฌ

  1. ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„ ์ œ์‹œ: ์ถ”๋ก ๊ณผ ํ–‰๋™์˜ ์‹œ๋„ˆ์ง€๋ฅผ ํ†ตํ•œ LLM ์—์ด์ „ํŠธ์˜ ๊ธฐ์ดˆ ํ”„๋ ˆ์ž„์›Œํฌ ํ™•๋ฆฝ

  2. ํ™˜๊ฐ ๋ฌธ์ œ ์™„ํ™”: ์™ธ๋ถ€ ์ง€์‹ ๊ธฐ๋ฐ˜๊ณผ์˜ ์ƒํ˜ธ์ž‘์šฉ์„ ํ†ตํ•ด ์‚ฌ์‹ค์  ์ •ํ™•์„ฑ ํ–ฅ์ƒ

  3. ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ ํ–ฅ์ƒ: ์ธ๊ฐ„๊ณผ ์œ ์‚ฌํ•œ ๋ฌธ์ œ ํ•ด๊ฒฐ ๊ถค์  ์ƒ์„ฑ์œผ๋กœ ๋””๋ฒ„๊น… ์šฉ์ด

์‹ค์šฉ์  ์žฅ์ 

์žฅ์  ์„ค๋ช…
ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ ๊ฐ ๋‹จ๊ณ„์˜ ์ถ”๋ก  ๊ณผ์ •์ด ๋ช…์‹œ์ ์œผ๋กœ ํ‘œํ˜„๋จ
๋””๋ฒ„๊น… ์šฉ์ด ์–ด๋А ๋‹จ๊ณ„์—์„œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ๋Š”์ง€ ์‰ฝ๊ฒŒ ํŒŒ์•… ๊ฐ€๋Šฅ
ํ™˜๊ฐ ๊ฐ์†Œ ์™ธ๋ถ€ ์ง€์‹ ์†Œ์Šค์™€ ์ƒํ˜ธ์ž‘์šฉํ•˜์—ฌ ์‚ฌ์‹ค ํ™•์ธ
์˜ค๋ฅ˜ ์ „ํŒŒ ๋ฐฉ์ง€ ์ค‘๊ฐ„ ๋‹จ๊ณ„์—์„œ ์™ธ๋ถ€ ํ”ผ๋“œ๋ฐฑ์œผ๋กœ ์˜ค๋ฅ˜ ์ˆ˜์ •
๋ฐ์ดํ„ฐ ํšจ์œจ์„ฑ 1-2๊ฐœ์˜ few-shot ์˜ˆ์‹œ๋งŒ์œผ๋กœ ๊ฐ•๋ ฅํ•œ ์„ฑ๋Šฅ
์œ ์—ฐ์„ฑ ๋‹ค์–‘ํ•œ ๋„๋ฉ”์ธ(QA, ๊ฒŒ์ž„, ์›น)์— ์ ์šฉ ๊ฐ€๋Šฅ
์ œ์–ด ๊ฐ€๋Šฅ์„ฑ ์ธ๊ฐ„์ด ์ถ”๋ก  ๊ณผ์ •์— ๊ฐœ์ž…ํ•˜์—ฌ ์ˆ˜์ • ๊ฐ€๋Šฅ

Human-in-the-Loop Potential

  • ํ™˜๊ฐ์ด ๋ฐœ์ƒํ•œ ๋ฌธ์žฅ์„ ์ธ๊ฐ„์ด ์ˆ˜์ •ํ•˜๋ฉด ๋ชจ๋ธ์ด ์ด๋ฅผ ๋ฐ˜์˜ํ•˜์—ฌ ์˜ฌ๋ฐ”๋ฅธ ๋‹ต์„ ๋„์ถœ
  • ์†Œ์ˆ˜์˜ Thought๋งŒ ์ˆ˜์ •ํ•˜๋ฉด ๋˜๋ฏ€๋กœ ์ธ๊ฐ„-๊ธฐ๊ณ„ ํ˜‘์—…์— ํšจ๊ณผ์ 

์‹ค๋ฌด์  ์˜ํ–ฅ

  • Basis of LangChain Agents: adopted as the core structure of commercial AI-agent frameworks
  • Foundation of AI-agent research: the basis for follow-up work (Reflexion, ReSpAct, AutoGPT, etc.)
  • Industry standardization: has become the de facto standard pattern for tool-calling AI systems

โš ๏ธ ํ•œ๊ณ„์  ๋ฐ ํ–ฅํ›„ ์—ฐ๊ตฌ

๋…ผ๋ฌธ์—์„œ ์–ธ๊ธ‰ํ•œ ํ•œ๊ณ„

  1. ๊ฒ€์ƒ‰ ์˜์กด์„ฑ

    • ๋น„์ •๋ณด์  ๊ฒ€์ƒ‰(non-informative search)์ด ์˜ค๋ฅ˜์˜ 23% ์ฐจ์ง€
    • ๊ฒ€์ƒ‰ ์‹คํŒจ ์‹œ ์ถ”๋ก  ๋ณต๊ตฌ ์–ด๋ ค์›€
    • ๋ชจ๋ธ์ด ์ถ”๋ก ์„ ์žฌ๊ตฌ์„ฑํ•˜๊ธฐ ์–ด๋ ค์›€
  2. ๊ตฌ์กฐ์  ๊ฒฝ์ง์„ฑ
    • Thought -> Action -> Observation์˜ ๊ณ ์ •๋œ ์ˆœ์„œ
    • ์œ ์—ฐ์„ฑ ๋ถ€์กฑ
  3. ํ”„๋กฌํ”„ํŒ… ๊ธฐ๋ฐ˜์˜ ํ•œ๊ณ„
    • Few-shot ์˜ˆ์‹œ์˜ ํ’ˆ์งˆ์— ์„ฑ๋Šฅ ์˜์กด
    • ๋„๋ฉ”์ธ๋ณ„ ํ”„๋กฌํ”„ํŠธ ์„ค๊ณ„ ํ•„์š”
  4. ์ถ”๋ก  ๋ฃจํ”„ ๋ฌธ์ œ
    • ๊ฐ™์€ Thought๋‚˜ Action์„ ๋ฐ˜๋ณตํ•˜๋Š” ๋ฃจํ”„์— ๋น ์งˆ ์ˆ˜ ์žˆ์Œ
  5. ์ธ๊ฐ„ ์ˆ˜์ค€ ๋Œ€๋น„ ์„ฑ๋Šฅ ๊ฒฉ์ฐจ
    • WebShop์—์„œ ์ธ๊ฐ„ ์ „๋ฌธ๊ฐ€(50%) ๋Œ€๋น„ ์—ฌ์ „ํžˆ ๋‚ฎ์€ ์„ฑ๋Šฅ(40%)

ํ–ฅํ›„ ์—ฐ๊ตฌ ๋ฐฉํ–ฅ

๋ฐฉํ–ฅ ์„ค๋ช…
๊ฐ•ํ™”ํ•™์Šต ํ†ตํ•ฉ RL์„ ํ†ตํ•œ ์žฅ๊ธฐ ๊ณ„ํš ๋ฐ ์ „๋žต ํ•™์Šต, ๋ณด์ƒ/ํŒจ๋„ํ‹ฐ ๊ธฐ๋ฐ˜ ํ–‰๋™ ๊ฐœ์„ 
๋ฉ€ํ‹ฐํƒœ์Šคํฌ ํ•™์Šต ์—ฌ๋Ÿฌ ํƒœ์Šคํฌ ๋™์‹œ ํ•™์Šต์œผ๋กœ ์ผ๋ฐ˜ํ™” ํ–ฅ์ƒ, few-shot ์˜์กด๋„ ๊ฐ์†Œ
Fine-tuning ReAct ๊ถค์ ์œผ๋กœ ์†Œํ˜• ๋ชจ๋ธ ์ง์ ‘ ํ•™์Šต, ๋” ๋‚˜์€ ์„ฑ๋Šฅ
Human Feedback ์ธ๊ฐ„ ํ”ผ๋“œ๋ฐฑ ํ†ตํ•ฉ์œผ๋กœ ์ถ”๋ก  ํ’ˆ์งˆ ํ–ฅ์ƒ
๋” ๋‚˜์€ ๊ฒ€์ƒ‰ ์ •๋ณด ๊ฒ€์ƒ‰ ํ’ˆ์งˆ ๊ฐœ์„ ์„ ํ†ตํ•œ ๋น„์ •๋ณด์  ๊ฒ€์ƒ‰ ๋ฌธ์ œ ํ•ด๊ฒฐ
๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํ™•์žฅ ๋น„์ „, ์˜ค๋””์˜ค ๋“ฑ ๋‹ค์–‘ํ•œ ๋ชจ๋‹ฌ๋ฆฌํ‹ฐ ์ง€์›

ํ›„์† ์—ฐ๊ตฌ๋“ค์˜ ๊ฐœ์„ 

์—ฐ๊ตฌ ALFWorld WebShop ๊ฐœ์„ ์ 
ReAct (์›๋ณธ) 71% 40% Baseline
ReSpAct 77% (+6%) 44% (+4%) Self-Planning ์ถ”๊ฐ€
A3T (ActRe) 96% (+25%) 49% (+9%) ActRe ๊ธฐ๋ฐ˜ ํ•™์Šต
REBACT 98.51% 61% Reflection ๊ฐ•ํ™”

๐Ÿ”— ๊ด€๋ จ ๋…ผ๋ฌธ

์„ ํ–‰ ์—ฐ๊ตฌ

๋…ผ๋ฌธ ๊ด€๊ณ„ ํ•ต์‹ฌ ๋‚ด์šฉ
Chain-of-Thought Prompting (Wei et al., 2022) Reasoning ๊ธฐ๋ฐ˜ ์ถ”๋ก  ๊ณผ์ •์„ ๋‹จ๊ณ„๋ณ„๋กœ ์ƒ์„ฑํ•˜์—ฌ ๋ณต์žกํ•œ ๋ฌธ์ œ ํ•ด๊ฒฐ
Self-Consistency (Wang et al., 2022) Reasoning ๊ธฐ๋ฐ˜ ๋‹ค์ค‘ ์ถ”๋ก  ๊ฒฝ๋กœ ์ƒ˜ํ”Œ๋ง ๋ฐ ์ผ๊ด€์„ฑ ๊ฒ€์ฆ
Zero-shot CoT (Kojima et al., 2022) Reasoning ๊ธฐ๋ฐ˜ “Let’s think step by step”์œผ๋กœ ์ œ๋กœ์ƒท ์ถ”๋ก 
Inner Monologue (Huang et al., 2022) Acting ๊ธฐ๋ฐ˜ ๋กœ๋ด‡ ํ™˜๊ฒฝ์—์„œ closed-loop ํ”ผ๋“œ๋ฐฑ ์‹œ์Šคํ…œ
SayCan (Ahn et al., 2022) Acting ๊ธฐ๋ฐ˜ LLM + ๋กœ๋ด‡ ํ–‰๋™ ๊ฒฐํ•ฉ
WebGPT (Nakano et al., 2021) Acting ๊ธฐ๋ฐ˜ ์›น ๋ธŒ๋ผ์šฐ์ง•์„ ํ†ตํ•œ QA

ํ›„์†/๊ด€๋ จ ์—ฐ๊ตฌ

๋…ผ๋ฌธ ๊ด€๊ณ„ ํ•ต์‹ฌ ๋‚ด์šฉ
Reflexion (Shinn et al., 2023) ReAct ํ™•์žฅ ์ž๊ธฐ ๋ฐ˜์„ฑ์„ ํ†ตํ•œ ์—์ด์ „ํŠธ ๊ฐœ์„ , ๋ฉ”๋ชจ๋ฆฌ ํ”ผ๋“œ๋ฐฑ
Tree of Thoughts (Yao et al., 2023) Reasoning ํ™•์žฅ ๋‹ค์ค‘ ์ถ”๋ก  ๊ฒฝ๋กœ ํŠธ๋ฆฌ ํƒ์ƒ‰
Toolformer (Schick et al., 2023) ๋„๊ตฌ ์‚ฌ์šฉ LLM์˜ ์ž๊ฐ€ ์ง€๋„ API ํ˜ธ์ถœ ํ•™์Šต
MRKL Systems (Karpas et al., 2022) ๊ด€๋ จ ํ”„๋ ˆ์ž„์›Œํฌ Modular Reasoning, Knowledge, Language
AutoGPT, BabyAGI (2023) ์‹ค์šฉ ์‘์šฉ ReAct ๊ธฐ๋ฐ˜ ์ž์œจ ์—์ด์ „ํŠธ

ReAct vs. Inner Monologue

  • Inner Monologue: an early closed-loop feedback system and an inspiration for ReAct
  • The ReAct authors argue that Inner Monologue does not contain "inner thoughts" in a real sense
  • ReAct provides a clearer reasoning process through explicit reasoning traces

๐Ÿ’ป ์‹ค๋ฌด ์ ์šฉ ํฌ์ธํŠธ

ReAct ํ”„๋กฌํ”„ํŠธ ํ…œํ”Œ๋ฆฟ (QA์šฉ)

Solve a question answering task with interleaving Thought, Action, Observation steps.

Thought can reason about the current situation.
Action can be three types:
(1) Search[entity]: searches the exact entity on Wikipedia and returns the first paragraph.
(2) Lookup[keyword]: returns the next sentence containing keyword in the current passage.
(3) Finish[answer]: returns the answer and finishes the task.

Here are some examples:
{few_shot_examples}

Question: {input_question}
{agent_scratchpad}
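Since the template above is a plain format string, assembling one model call is a single `str.format`; the `build_prompt` helper and its variable names are mine, not part of any library:

```python
TEMPLATE = """Solve a question answering task with interleaving Thought, Action, Observation steps.

Thought can reason about the current situation.
Action can be three types:
(1) Search[entity]: searches the exact entity on Wikipedia and returns the first paragraph.
(2) Lookup[keyword]: returns the next sentence containing keyword in the current passage.
(3) Finish[answer]: returns the answer and finishes the task.

Here are some examples:
{few_shot_examples}

Question: {input_question}
{agent_scratchpad}"""

def build_prompt(question, few_shot_examples, agent_scratchpad=""):
    """Fill the ReAct QA template for one model call; the scratchpad holds
    the Thought/Action/Observation lines accumulated so far."""
    return TEMPLATE.format(
        few_shot_examples=few_shot_examples,
        input_question=question,
        agent_scratchpad=agent_scratchpad,
    )

prompt = build_prompt(
    "What is the elevation range of the High Plains?",
    few_shot_examples="Question: ...\nThought 1: ...\nAction 1: Finish[...]",
    agent_scratchpad="Thought 1:",
)
```

Ending the prompt with `Thought 1:` nudges the model to continue the trajectory rather than answer directly.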

LangChain ๊ตฌํ˜„ ์˜ˆ์‹œ (Classic)

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

# 1. ํ”„๋กฌํ”„ํŠธ ๋กœ๋“œ
prompt = hub.pull("hwchase17/react")

# 2. Initialize the LLM
llm = ChatOpenAI(model="gpt-4", temperature=0)

# 3. ๋„๊ตฌ ์ •์˜
tools = [TavilySearchResults(max_results=3)]

# 4. Create the ReAct agent
agent = create_react_agent(llm, tools, prompt)

# 5. ์‹คํ–‰๊ธฐ ์ƒ์„ฑ
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True
)

# 6. ์‹คํ–‰
result = agent_executor.invoke({
    "input": "What is the elevation range of the area that the eastern sector of Colorado orogeny extends into?"
})

LangGraph ๊ตฌํ˜„ (Modern Approach – ๊ถŒ์žฅ)

from langgraph.prebuilt import create_react_agent
from langchain_core.tools import tool

@tool
def search_wikipedia(query: str) -> str:
    """Search Wikipedia for information about a topic."""
    # Wikipedia API call goes here
    return wikipedia_api.search(query)

@tool
def lookup_keyword(keyword: str) -> str:
    """Look up a keyword in the current page."""
    return current_page.find(keyword)

# ReAct ์—์ด์ „ํŠธ ์ƒ์„ฑ
graph = create_react_agent(
    model="anthropic:claude-3-7-sonnet-latest",
    tools=[search_wikipedia, lookup_keyword],
    prompt="You are a helpful research assistant."
)

# ์‹คํ–‰
inputs = {"messages": [{"role": "user", "content": "your question here"}]}
for chunk in graph.stream(inputs, stream_mode="updates"):
    print(chunk)

์ปค์Šคํ…€ ReAct ํ”„๋กฌํ”„ํŠธ ํ…œํ”Œ๋ฆฟ (ํ•œ๊ตญ์–ด)

REACT_PROMPT_KR = """๋‹ค์Œ ๋„๊ตฌ๋“ค์„ ์‚ฌ์šฉํ•˜์—ฌ ์งˆ๋ฌธ์— ๋‹ตํ•˜์„ธ์š”:

{tools}

๋‹ค์Œ ํ˜•์‹์„ ์‚ฌ์šฉํ•˜์„ธ์š”:

Question: ๋‹ตํ•ด์•ผ ํ•  ์งˆ๋ฌธ
Thought: ๋ฌด์—‡์„ ํ•ด์•ผ ํ• ์ง€ ํ•ญ์ƒ ์ƒ๊ฐํ•˜์„ธ์š”
Action: [{tool_names}] ์ค‘ ํ•˜๋‚˜์˜ ์•ก์…˜
Action Input: ์•ก์…˜์— ๋Œ€ํ•œ ์ž…๋ ฅ
Observation: ์•ก์…˜์˜ ๊ฒฐ๊ณผ
... (Thought/Action/Action Input/Observation์„ N๋ฒˆ ๋ฐ˜๋ณต)
Thought: ์ด์ œ ์ตœ์ข… ๋‹ต์„ ์•Œ์•˜์Šต๋‹ˆ๋‹ค
Final Answer: ์›๋ž˜ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ์ตœ์ข… ๋‹ต๋ณ€

์‹œ์ž‘ํ•˜์„ธ์š”!

Question: {input}
{agent_scratchpad}"""

์‹ค๋ฌด ์ ์šฉ ์‹œ ๊ณ ๋ ค์‚ฌํ•ญ

๊ณ ๋ ค์‚ฌํ•ญ ๊ถŒ์žฅ ์‚ฌํ•ญ
Few-shot ์˜ˆ์‹œ ๋„๋ฉ”์ธ์— ๋งž๋Š” ๊ณ ํ’ˆ์งˆ ์˜ˆ์‹œ 3-6๊ฐœ ์ค€๋น„
๋„๊ตฌ ์„ค๊ณ„ ๋ช…ํ™•ํ•œ ์„ค๋ช…๊ณผ ํŒŒ๋ผ๋ฏธํ„ฐ ์ •์˜ ํ•„์ˆ˜
์ตœ๋Œ€ ๋ฐ˜๋ณต ํšŸ์ˆ˜ ๋ฌดํ•œ ๋ฃจํ”„ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•ด max_iterations ์„ค์ • (7-10 ๊ถŒ์žฅ)
์—๋Ÿฌ ์ฒ˜๋ฆฌ ๊ฒ€์ƒ‰ ์‹คํŒจ ์‹œ ๋Œ€์ฒด ์ „๋žต ๊ตฌํ˜„, ํŒŒ์‹ฑ ์—๋Ÿฌ ํ•ธ๋“ค๋ง
ํ† ํฐ ๊ด€๋ฆฌ Thought ์ƒ์„ฑ์œผ๋กœ ์ธํ•œ ์ถ”๊ฐ€ ํ† ํฐ ๋น„์šฉ ๊ณ ๋ ค
๋กœ๊น… Thought/Action/Observation ์ „์ฒด ๋กœ๊น…์œผ๋กœ ๋””๋ฒ„๊น…
ํด๋ฐฑ ์ „๋žต ReAct ์‹คํŒจ ์‹œ CoT ํด๋ฐฑ ๊ตฌํ˜„ ๊ถŒ์žฅ

๋””๋ฒ„๊น… ํŒ

# verbose=True๋กœ ์„ค์ •ํ•˜์—ฌ ์ „์ฒด ์ถ”๋ก  ๊ณผ์ • ํ™•์ธ
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # print the reasoning trace
    return_intermediate_steps=True  # return the intermediate steps
)

result = agent_executor.invoke({"input": "your question"})

# ์ค‘๊ฐ„ ๋‹จ๊ณ„ ๋ถ„์„
for step in result["intermediate_steps"]:
    action, observation = step
    print(f"Action: {action.tool}")
    print(f"Input: {action.tool_input}")
    print(f"Observation: {observation}")
    print("---")

๐Ÿท๏ธ Tags

#AIAgent #ReAct #Reasoning #Acting #LLM #PromptEngineering #ChainOfThought #CoT #Hallucination #HotpotQA #FEVER #ALFWorld #WebShop #LangChain #LangGraph #ICLR2023 #Princeton #GoogleBrain #LanguageModel #AgenticAI #ToolUse #ThoughtActionObservation #InteractiveDecisionMaking #FewShotLearning #WikipediaAPI


๐Ÿ“š ์ฐธ๊ณ  ์ž๋ฃŒ

  • arXiv paper
  • OpenReview
  • Official GitHub
  • Project page
  • Google Research Blog
  • LangChain ReAct Agent documentation
  • LangGraph ReAct Agent
  • Prompt Engineering Guide – ReAct
์ž‘์„ฑ์ž

skycave

Follow Me
๋‹ค๋ฅธ ๊ธฐ์‚ฌ
Previous

[AI Paper] ๐Ÿ“„ RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems

Next

[AI Paper] ๐Ÿ“„ Reflexion: Language Agents with Verbal Reinforcement Learning

๋Œ“๊ธ€ ์—†์Œ! ์ฒซ ๋Œ“๊ธ€์„ ๋‚จ๊ฒจ๋ณด์„ธ์š”.

๋‹ต๊ธ€ ๋‚จ๊ธฐ๊ธฐ ์‘๋‹ต ์ทจ์†Œ

์ด๋ฉ”์ผ ์ฃผ์†Œ๋Š” ๊ณต๊ฐœ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ํ•„์ˆ˜ ํ•„๋“œ๋Š” *๋กœ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค

์ตœ์‹ ๊ธ€

  • ๐Ÿ“Š ์ผ์ผ ๋‰ด์Šค ๊ฐ์„ฑ ๋ฆฌํฌํŠธ – 2026-01-28
  • AI ์‹œ์Šคํ…œ์˜ ๋ฌธ๋งฅ ๊ธฐ๋ฐ˜ ๊ฒ€์ƒ‰(Contextual Retrieval) | Anthropic
  • “Think” ํˆด: Claude๊ฐ€ ๋ฉˆ์ถฐ์„œ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ธฐ | Anthropic
  • Claude Code ๋ชจ๋ฒ” ์‚ฌ๋ก€ \ Anthropic
  • ์šฐ๋ฆฌ๊ฐ€ ๋ฉ€ํ‹ฐ ์—์ด์ „ํŠธ ์—ฐ๊ตฌ ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•œ ๋ฐฉ๋ฒ•
Copyright 2026 — skycave's Blog. All rights reserved. Blogsy WordPress Theme