[AI Paper] 📄 Beyond Pipelines: Model-Native Agentic AI Survey (2025)

By skycave
January 25, 2026 · 5 Min Read

📄 Beyond Pipelines: Model-Native Agentic AI Survey (2025)

📋 Metadata

| Item | Content |
|---|---|
| Title | Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI |
| Authors | Jitao Sang and 7 others (Beijing Jiaotong University) |
| Published | October 2025 (arXiv:2510.16720) |
| Version | v2 (2025.10.26) |
| Field | Agentic AI, LLM, Reinforcement Learning |
| arXiv | https://arxiv.org/abs/2510.16720 |
| GitHub | https://github.com/ADaM-BJTU/model-native-agentic-ai |

🎯 One-Line Summary

The paradigm for building LLM-based agents is shifting from the pipeline approach, in which agents are orchestrated by external logic, to the model-native approach, in which agentic capabilities are internalized in the model parameters; the key driver of this shift is reinforcement learning (RL).


๐Ÿ” ์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ ๋ฐ ๋™๊ธฐ

Agentic AI์˜ ์ƒˆ๋กœ์šด ๊ตญ๋ฉด

  • LLM์ด ๋‹จ์ˆœํžˆ ์‘๋‹ตํ•˜๋Š” ๊ฒƒ์„ ๋„˜์–ด ํ–‰๋™ํ•˜๊ณ (act), ์ถ”๋ก ํ•˜๊ณ (reason), ์ ์‘ํ•˜๋Š”(adapt) ์ƒˆ๋กœ์šด AI ๋ฐœ์ „ ๋‹จ๊ณ„ ์ง„์ž…
  • AI ์—์ด์ „ํŠธ๊ฐ€ ์‹ค์„ธ๊ณ„ ํ™˜๊ฒฝ๊ณผ ์ƒํ˜ธ์ž‘์šฉํ•˜๋ฉฐ ๋ณต์žกํ•œ ๋ชฉํ‘œ๋ฅผ ๋‹ฌ์„ฑํ•˜๋Š” ์‹œ์Šคํ…œ์œผ๋กœ ์ง„ํ™”

Limitations of the Pipeline-Based Approach

| Limitation | Description |
|---|---|
| Rigidity | Executes along predefined workflows, so unexpected situations cannot be handled |
| Brittleness | Over-reliance on carefully engineered pipelines |
| High cost | Knowledge formalization and prompt design are expensive |
| Passive role | Treats the LLM as a passive tool rather than an active decision-maker |
| Lack of adaptability | Struggles to adapt to dynamically changing environments |

💡 Key Ideas

The Pipeline → Model-Native Paradigm Shift

┌────────────────────────────────────────────────────────────────────┐
│                 The Essence of the Paradigm Shift                  │
├────────────────────────────────────────────────────────────────────┤
│  Pipeline-based: LLM + external modules (prompt/workflow wiring)   │
│       ↓                                                            │
│  Model-native: unified model (capabilities internalized end-to-end)│
└────────────────────────────────────────────────────────────────────┘

The Role of Reinforcement Learning (RL)

  • Serves as the algorithmic engine of the paradigm shift
  • Shifts training from imitating static data to outcome-driven exploration (see the sketch below)
  • Provides the foundation for a unified LLM + RL + Task solution
  • Applied across language, vision, and embodied domains
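
To make "outcome-driven exploration" concrete, here is a minimal REINFORCE-style sketch: the agent receives no step-level supervision, only a scalar reward on its final result. The `policy` and `env` interfaces are illustrative assumptions, not anything specified in the survey.

```python
import torch

def outcome_driven_update(policy, env, optimizer):
    """One REINFORCE-style update driven purely by the final outcome.

    `policy.sample_action` and the `env` interface are hypothetical;
    no step-level supervision is used anywhere.
    """
    obs, done, log_probs = env.reset(), False, []
    while not done:
        # The model itself chooses each reasoning/acting step.
        action, log_prob = policy.sample_action(obs)
        obs, done = env.step(action)
        log_probs.append(log_prob)
    # A single scalar reward, granted only for the final result.
    reward = 1.0 if env.final_answer_correct() else 0.0
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```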

Core Hypothesis

A strong base model can autonomously explore and internalize effective reasoning policies through feedback on final outcomes alone.


๐Ÿ—๏ธ ๋ถ„๋ฅ˜ ์ฒด๊ณ„

1. Pipeline-based Agents

Implements agent capabilities through external logic and modules

Planning (๊ณ„ํš)

  • Prompt-based: Chain-of-Thought (CoT) and Tree-of-Thought (ToT); a minimal sketch follows this list
  • External planner integration: symbolic planners, graph-based planners
  • Decomposition strategies: break complex tasks into subtasks
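
As a quick illustration of the prompt-based flavor, here is a minimal CoT wrapper: the "planning" lives entirely in a fixed template rather than in the model weights. `call_llm` is a hypothetical stand-in for any chat-completion client.

```python
# A fixed template: the planning behavior lives in prompt text, not weights.
COT_TEMPLATE = (
    "Question: {question}\n"
    "Let's think step by step, then give the final answer on a line "
    "starting with 'Answer:'."
)

def cot_plan(question: str, call_llm) -> str:
    # `call_llm` is a hypothetical chat-completion client: str -> str.
    return call_llm(COT_TEMPLATE.format(question=question))
```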

Tool Use (๋„๊ตฌ ์‚ฌ์šฉ)

  • ํ”„๋ ˆ์ž„์›Œํฌ: LangChain, ReAct
  • ๋„๊ตฌ ํ˜ธ์ถœ: API, ๊ฒ€์ƒ‰ ์—”์ง„, ์ฝ”๋“œ ์‹คํ–‰ ํ™˜๊ฒฝ ์—ฐ๊ฒฐ
  • ํŒŒ์‹ฑ ๊ธฐ๋ฐ˜: LLM ์ถœ๋ ฅ์„ ํŒŒ์‹ฑํ•˜์—ฌ ๋„๊ตฌ ํ˜ธ์ถœ๋กœ ๋ณ€ํ™˜
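
A minimal sketch of the parsing-based pattern, loosely in the ReAct style: external orchestration code, not the model, interprets the text and executes the tool. The `Action: tool[input]` format and the tool registry are illustrative assumptions.

```python
import re

# Hypothetical tool registry; real frameworks register callables similarly.
TOOLS = {"search": lambda q: f"(search results for {q!r})"}
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def dispatch(llm_output: str) -> str:
    """Parse the model's free text; the orchestrator, not the model, acts."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        return llm_output  # no tool call: treat the text as a final answer
    tool = TOOLS.get(match.group(1))
    return tool(match.group(2)) if tool else f"unknown tool: {match.group(1)}"
```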

Memory

  • ์Šฌ๋ผ์ด๋”ฉ ์œˆ๋„์šฐ: ์ตœ๊ทผ ์ปจํ…์ŠคํŠธ๋งŒ ์œ ์ง€
  • ์š”์•ฝ ๊ธฐ๋ฐ˜: ๊ธด ์ปจํ…์ŠคํŠธ๋ฅผ ์š”์•ฝํ•˜์—ฌ ๊ด€๋ฆฌ
  • RAG (Retrieval-Augmented Generation): ์™ธ๋ถ€ ์ง€์‹ ๋ฒ ์ด์Šค ๊ฒ€์ƒ‰
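
A sliding-window memory, for instance, is pure external bookkeeping, as this minimal sketch shows; the window size is an arbitrary choice and nothing about it is learned. Summary-based and RAG variants swap the deque for a summarizer or a vector index, but the model still never decides what to keep.

```python
from collections import deque

class SlidingWindowMemory:
    """External bookkeeping: older turns silently fall out of the window."""

    def __init__(self, max_turns: int = 8):  # window size is arbitrary
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def as_context(self) -> str:
        # Rendered into the prompt on every call; nothing here is learned.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)
```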

2. Model-native Agents

์—์ด์ „ํŠธ ๋Šฅ๋ ฅ์„ ๋ชจ๋ธ ํŒŒ๋ผ๋ฏธํ„ฐ ๋‚ด๋ถ€์— ๋‚ด์žฌํ™”ํ•˜๋Š” ๋ฐฉ์‹

Planning (๊ณ„ํš)

  • OpenAI o1: first to internalize planning capability through large-scale RL
  • DeepSeek R1: learns reasoning/planning behavior from outcome-based rewards alone (a minimal reward sketch follows this list)
    • Dramatically cuts the cost of step-by-step supervision
    • Outcome reward: verifies only the correctness of the final answer
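
A minimal sketch of such an outcome reward: only the final answer is checked, and the intermediate reasoning is never graded. The `Answer:` format and exact-match check are illustrative assumptions in the spirit of this rule-based verification.

```python
import re

def outcome_reward(model_output: str, gold_answer: str) -> float:
    """Score only the final answer; intermediate reasoning is never graded."""
    match = re.search(r"Answer:\s*(.+)", model_output)
    if match is None:
        return 0.0  # unparseable outputs earn nothing
    return 1.0 if match.group(1).strip() == gold_answer.strip() else 0.0
```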

Tool Use (๋„๊ตฌ ์‚ฌ์šฉ)

  • OpenAI o3: integrates tool use into the reasoning process (see the rollout sketch below)
  • Moonshot K2: large-scale tool-use trajectory synthesis + multi-stage RL
  • The model learns for itself when and how to invoke tools

Memory

  • MemAct: redefines context management as a tool the agent invokes (sketched below)
  • MemoryLLM: parameterizes memory directly
    • Latent memory tokens are continually updated as part of the forward pass
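
A minimal sketch in the spirit of MemAct's memory-as-tool idea: read and write operations are exposed as callable tools, so deciding what to store or retrieve becomes part of the learned policy. The store and the tool signatures here are illustrative assumptions.

```python
class MemoryToolbox:
    """Illustrative store; MemAct's actual operations may differ."""

    def __init__(self):
        self.notes: list[str] = []

    def memory_write(self, note: str) -> str:
        self.notes.append(note)
        return "ok"

    def memory_search(self, query: str) -> str:
        hits = [n for n in self.notes if query.lower() in n.lower()]
        return "\n".join(hits) or "(no matching notes)"

# Registered like any other tool; the policy learns when to call them.
memory = MemoryToolbox()
tools = {"memory_write": memory.memory_write,
         "memory_search": memory.memory_search}
```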

📊 Paradigm Comparison

| Aspect | Pipeline-based | Model-native |
|---|---|---|
| Architecture | LLM + external module composition | Single unified model |
| Capability location | Spread across external scripts/modules | Internalized in model parameters |
| Training method | Prompt engineering, fine-tuning | End-to-end RL |
| Adaptability | Limited (predefined workflows) | High (learns from experience) |
| Flexibility | Gained by swapping modules | Gained through learned policies |
| Interpretability | Easy to trace via the explicit pipeline | Requires interpreting internal representations |
| Maintenance | Requires prompt/module tuning | Improved through retraining |
| Scalability | Extend by adding modules | Extend training data/environments |
| Representative systems | LangChain, ReAct, AutoGPT | OpenAI o1/o3, DeepSeek R1, UI-TARS |

💪 Advantages of Model-Native

1. ํ–ฅ์ƒ๋œ ์ ์‘์„ฑ

  • ์˜ˆ์ƒ์น˜ ๋ชปํ•œ ์ƒํ™ฉ์—์„œ๋„ ํ•™์Šต๋œ ์ •์ฑ…์œผ๋กœ ๋Œ€์‘
  • ๋™์ ์œผ๋กœ ๋ณ€ํ™”ํ•˜๋Š” ํ™˜๊ฒฝ์— ์ ์‘ ๊ฐ€๋Šฅ

2. ๊ฒฌ๊ณ ์„ฑ(Robustness)

  • ์™ธ๋ถ€ ๋ชจ๋“ˆ ์˜์กด๋„ ๊ฐ์†Œ๋กœ ์‹œ์Šคํ…œ ์•ˆ์ •์„ฑ ํ–ฅ์ƒ
  • ๋‹จ์ผ ๋ชจ๋ธ ๋‚ด ํ†ตํ•ฉ์œผ๋กœ ์˜ค๋ฅ˜ ์ „ํŒŒ ์ตœ์†Œํ™”

3. ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ

  • ๋‹ค์–‘ํ•œ ๋„๋ฉ”์ธ๊ณผ ์ž‘์—…์— ๊ฑธ์ณ ํ•™์Šต๋œ ๋Šฅ๋ ฅ ์ „์ด
  • ์ƒˆ๋กœ์šด ์‹œ๋‚˜๋ฆฌ์˜ค์— ๋Œ€ํ•œ zero-shot ์ ์šฉ ๊ฐ€๋Šฅ

4. ํšจ์œจ์  ํ•™์Šต

  • Outcome-based reward: ์ค‘๊ฐ„ ๋‹จ๊ณ„ ํ‰๊ฐ€ ์—†์ด ์ตœ์ข… ๊ฒฐ๊ณผ๋กœ ํ•™์Šต
  • DeepSeek R1 ๋ฐฉ์‹: ๋‹จ๊ณ„๋ณ„ ๊ฐ๋… ๋น„์šฉ ๋Œ€ํญ ์ ˆ๊ฐ

5. ์ž์œจ์  ์˜์‚ฌ๊ฒฐ์ •

  • ๋ชจ๋ธ์ด ์Šค์Šค๋กœ ๊ณ„ํš, ๋„๊ตฌ ํ˜ธ์ถœ, ๋ฉ”๋ชจ๋ฆฌ ๊ด€๋ฆฌ ๊ฒฐ์ •
  • ํ”„๋กœ์•กํ‹ฐ๋ธŒํ•œ ์—์ด์ „ํŠธ ํ–‰๋™ ๊ฐ€๋Šฅ

โš ๏ธ ๋„์ „ ๊ณผ์ œ

๊ธฐ์ˆ ์  ๋„์ „

  1. ํ•™์Šต ํšจ์œจ์„ฑ: ๋Œ€๊ทœ๋ชจ RL ํ•™์Šต์— ํ•„์š”ํ•œ ์ปดํ“จํŒ… ์ž์›
  2. ๋ณด์ƒ ์„ค๊ณ„: ํšจ๊ณผ์ ์ธ outcome-based reward ์„ค๊ณ„
  3. ํƒ์ƒ‰-ํ™œ์šฉ ๊ท ํ˜•: ์ƒˆ๋กœ์šด ์ „๋žต ํƒ์ƒ‰๊ณผ ๊ธฐ์กด ์ง€์‹ ํ™œ์šฉ์˜ ๊ท ํ˜•
  4. ์žฅ๊ธฐ ๊ณ„ํš: ๊ธด horizon์—์„œ์˜ ์•ˆ์ •์ ์ธ ํ•™์Šต

์•ˆ์ „์„ฑ ๋ฐ ์ •๋ ฌ ๋„์ „

  1. ํ–‰๋™ ์˜ˆ์ธก ๋ถˆ๊ฐ€๋Šฅ์„ฑ: ๋‚ด์žฌํ™”๋œ ์ •์ฑ…์˜ ํ•ด์„ ์–ด๋ ค์›€
  2. ์•ˆ์ „ํ•œ ํƒ์ƒ‰: RL ํƒ์ƒ‰ ๊ณผ์ •์—์„œ์˜ ์•ˆ์ „์„ฑ ๋ณด์žฅ
  3. ๋ชฉํ‘œ ์ •๋ ฌ: ์—์ด์ „ํŠธ ๋ชฉํ‘œ์™€ ์ธ๊ฐ„ ์˜๋„์˜ ์ •๋ ฌ

์‹ค์šฉ์  ๋„์ „

  1. ๋ฐ์ดํ„ฐ ์š”๊ตฌ๋Ÿ‰: ๋‹ค์–‘ํ•œ ์ƒํ˜ธ์ž‘์šฉ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ํ•„์š”
  2. ํ™˜๊ฒฝ ๊ตฌ์ถ•: ํ•™์Šต์„ ์œ„ํ•œ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ํ™˜๊ฒฝ ๊ตฌ์ถ•
  3. ํ‰๊ฐ€ ์ง€ํ‘œ: Model-native ์—์ด์ „ํŠธ ์„ฑ๋Šฅ ์ธก์ • ๋ฐฉ๋ฒ•๋ก 

🔬 Key Case Studies

Deep Research Agent

  • Characteristic: emphasizes long-horizon reasoning
  • Paradigm shift: external search/summarization modules → an internalized research capability
  • Example: OpenAI's research agent feature

GUI Agent

  • Characteristic: emphasizes embodied interaction
  • UI-TARS (ByteDance):
    • Predicts low-level actions from visual/UI context via end-to-end training
    • Outperforms GPT-4o, Claude, and Gemini on certain benchmarks
  • GUI-Owl, OpenCUA:
    • Fully internalize GUI planning and execution through RL
    • Optimize over long horizons with outcome-based rewards

🔮 Future Directions

1. Internalizing Multi-Agent Collaboration

  • Today: collaboration between agents is orchestrated by external protocols
  • Future: the collaboration capability itself is internalized in the model

2. Internalizing Reflection

  • Learning self-evaluation and self-improvement
  • Integrating metacognitive capabilities into the model

3. Evolving Roles of the System Layer vs. the Model Layer

  • System layer: infrastructure, safety, and interfaces
  • Model layer: core agentic capabilities
  • The division of responsibility between the two layers will keep evolving

4. A Unified Training Framework

  • LLM + RL + Task: an integrated training solution
  • Applied across domains (language, vision, embodiment)

🔗 Related Papers

Key Reference Papers

| Paper | Contribution |
|---|---|
| OpenAI o1 | First to internalize planning capability via RL |
| DeepSeek R1 | Learns reasoning from outcome-based rewards alone |
| OpenAI o3 | Integrates tool use into the reasoning process |
| UI-TARS | End-to-end GUI agent |
| MemAct | Turns memory management into a tool |
| MemoryLLM | Parameterizes memory |

๊ด€๋ จ ์„œ๋ฒ ์ด

  • The Landscape of Agentic Reinforcement Learning for LLMs: A Survey (2025)
  • Memory in the Age of AI Agents (2025)
  • Multi-Agent Collaboration Mechanisms: A Survey of LLMs (2025)

💻 Practical Implications

Considerations for Agent System Design

  1. Paradigm selection criteria
    • Interpretability and control are critical → stay pipeline-based
    • Adaptability and robustness are critical → consider moving to model-native
  2. Hybrid approach (see the sketch after this list)
    • Prefer gradual internalization over a wholesale switch
    • Move core capabilities to model-native step by step
  3. RL adoption strategy
    • Cut supervision cost with outcome-based rewards
    • Explore safely in simulation environments
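
One way the hybrid approach might look, as a hedged sketch: planning is delegated to a model-native reasoning model, while tool execution stays in an explicit, auditable pipeline shell. Every interface here (`plan`, `finalize`, the step fields) is an illustrative assumption.

```python
def hybrid_agent(task: str, reasoning_model, tools: dict, log) -> str:
    # Internalized capability: the model plans end-to-end (hypothetical API).
    plan = reasoning_model.plan(task)
    # Explicit shell: tool execution stays inspectable and loggable.
    for step in plan.tool_calls:
        log.info("executing %s(%r)", step.tool, step.arg)
        step.result = tools[step.tool](step.arg)
    # The model writes the final answer from the executed plan.
    return reasoning_model.finalize(task, plan)
```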

ํ”„๋ ˆ์ž„์›Œํฌ ์„ ํƒ ๊ฐ€์ด๋“œ

์ƒํ™ฉ ๊ถŒ์žฅ ์ ‘๊ทผ๋ฒ•
๋น ๋ฅธ ํ”„๋กœํ† ํƒ€์ดํ•‘ LangChain, ReAct (Pipeline)
๋ณต์žกํ•œ ์›Œํฌํ”Œ๋กœ์šฐ LangGraph (Pipeline)
๋†’์€ ์ ์‘์„ฑ ํ•„์š” Model-native ๋˜๋Š” ํ•˜์ด๋ธŒ๋ฆฌ๋“œ
๋ฆฌ์†Œ์Šค ์ œํ•œ Pipeline-based
์žฅ๊ธฐ ํˆฌ์ž ๊ฐ€๋Šฅ Model-native ์—ฐ๊ตฌ/๊ฐœ๋ฐœ

Key Trends to Monitor

  1. Progress in reasoning models from OpenAI, DeepSeek, and others
  2. Advances in end-to-end training for GUI/embodied agents
  3. Development of multi-modal agentic AI
  4. Progress in safe RL exploration techniques

📚 Glossary of Key Concepts

์šฉ์–ด ์ •์˜
Agentic AI ํ–‰๋™ํ•˜๊ณ , ์ถ”๋ก ํ•˜๊ณ , ์ ์‘ํ•˜๋Š” AI ์‹œ์Šคํ…œ
Pipeline-based ์™ธ๋ถ€ ๋กœ์ง/๋ชจ๋“ˆ๋กœ ์—์ด์ „ํŠธ ๋Šฅ๋ ฅ์„ ์กฐ์œจํ•˜๋Š” ๋ฐฉ์‹
Model-native ์—์ด์ „ํŠธ ๋Šฅ๋ ฅ์„ ๋ชจ๋ธ ํŒŒ๋ผ๋ฏธํ„ฐ์— ๋‚ด์žฌํ™”ํ•˜๋Š” ๋ฐฉ์‹
Outcome-based reward ์ตœ์ข… ๊ฒฐ๊ณผ์˜ ์ •ํ™•์„ฑ๋งŒ์œผ๋กœ ๋ณด์ƒ์„ ์ œ๊ณตํ•˜๋Š” ๋ฐฉ์‹
End-to-end learning ์ž…๋ ฅ์—์„œ ์ถœ๋ ฅ๊นŒ์ง€ ์ „์ฒด๋ฅผ ํ†ตํ•ฉ ํ•™์Šตํ•˜๋Š” ๋ฐฉ์‹

๐Ÿท๏ธ Tags

#AI-Agent #LLM #Reinforcement-Learning #Model-Native #Pipeline #Survey #DeepSeek-R1 #OpenAI-o1 #Planning #Tool-Use #Memory #GUI-Agent #2025 #Paradigm-Shift #End-to-End-Learning
