# [AI Paper] Cognitive Architectures for Language Agents (CoALA)
## Meta Information
| Item | Detail |
|---|---|
| Title | Cognitive Architectures for Language Agents |
| Authors | Theodore R. Sumers*, Shunyu Yao*, Karthik Narasimhan, Thomas L. Griffiths (*equal contribution) |
| Affiliation | Princeton University |
| Venue | Transactions on Machine Learning Research (TMLR) |
| Published | February 22, 2024 |
| arXiv | 2309.02427 |
| GitHub | awesome-language-agents |
## One-line Summary
Drawing on cognitive science and the history of symbolic AI, the paper proposes CoALA, a unified framework that organizes LLM-based language agents along three axes (memory, action space, and decision-making procedure), uses it to classify existing agents, and charts future directions toward language-based general intelligence.
## Research Background and Motivation
### Existing Problems
- No systematic framework: "language agents" combining LLMs with external resources (internet, APIs) or internal control flows (prompt chaining) had emerged, but there was no framework for understanding and comparing them systematically
- Inconsistent terminology and concepts: different studies used different terms and approaches, hampering communication between researchers
- Missing design principles: no systematic design guidelines to consult when building a new agent
### Why a Cognitive Architecture?
- Link to cognitive science: applies decades of findings from classic cognitive architectures such as Soar and ACT-R to modern LLM agents
- Integrative perspective: offers a lens for pinpointing what individual agents share and where they differ
- Reinterpreting production systems: situates the LLM as a "probabilistic production system," connecting it to classical AI
- Charting the future: lays out a roadmap toward language-based general intelligence
## Core Ideas
### Viewing the LLM as a Probabilistic Production System
CoALA interprets the LLM as a modern extension of the classical production system:

```
Classical production system : IF condition THEN action (rule-based string rewriting)
                                  ↓
LLM-based system            : a probability distribution over string completions
                              (based on learned patterns)
```
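To make the contrast concrete, here is a minimal sketch (not from the paper; the rule table and the toy distribution are invented for illustration) of a deterministic production rule next to an LLM-style stochastic completion:

```python
import random

# Classical production system: IF condition THEN action, applied deterministically.
RULES = {"hungry": "eat"}  # a single hand-coded production

def classical_step(state: str) -> str:
    """Fire the matching rule, or leave the string unchanged."""
    return RULES.get(state, state)

# LLM as a probabilistic production system: a learned distribution over
# possible completions, sampled rather than matched. A real LLM would
# condition this distribution on `state`; the toy version ignores it.
COMPLETIONS = {"eat": 0.7, "nap": 0.2, "wait": 0.1}  # toy stand-in for P(y|x)

def llm_step(state: str) -> str:
    """Sample a completion from the (here hard-coded) distribution."""
    outcomes, weights = zip(*COMPLETIONS.items())
    return random.choices(outcomes, weights=weights)[0]

print(classical_step("hungry"))  # always "eat"
print(llm_step("hungry"))        # usually "eat", sometimes "nap" or "wait"
```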
### The CoALA Framework: Three Core Dimensions
CoALA organizes a language agent along three dimensions:

```
CoALA Framework
├── Memory (information storage)
│     • Working · Episodic · Semantic · Procedural
├── Action Space
│     • Internal: Reasoning, Retrieval, Learning
│     • External: Grounding
└── Decision Making
      • Planning stage: propose/evaluate via reasoning and retrieval
      • Execution stage: carry out the selected action
```
### Key Design Principles
- Modularity: each component can be designed and improved independently
- Flexibility: the LLM replaces hand-coded rules, enabling flexible reasoning
- Text-centric: text serves as the de facto internal representation
- Cyclic structure: adaptive behavior emerges from a continual perceive-act loop
## Framework Structure
### 1. Memory System

```
Memory System
├── Working Memory
│     • active information for the current decision cycle
│     • perceptual input, active knowledge, current goals
│     • central hub between the LLM, long-term memory, and the environment
└── Long-term Memory
      ├── Episodic Memory
      │     • records of past experiences and events
      │     • e.g., "What happened last time we tried solution X?"
      │     • retrieval: weighted combination of recency + importance + relevance scores
      ├── Semantic Memory
      │     • factual, generalized knowledge about the world
      │     • implementations: knowledge bases, symbolic AI, vector embeddings
      │     • e.g., "Birds can fly, but ostriches are an exception"
      └── Procedural Memory
            • how to perform tasks (embedded in code, LLM parameters, prompts)
            • governs how the agent actually behaves
            • e.g., Voyager's code-based skill library
```
#### Memory Types in Detail
| Memory type | Role | Implementation | Example |
|---|---|---|---|
| Working | maintain current context | LLM context window | recent dialogue, partial solutions |
| Episodic | store past experiences | vector DB + timestamps | event memory in Generative Agents |
| Semantic | store factual knowledge | knowledge graphs, vector embeddings | domain knowledge, rules |
| Procedural | store how to act | code, prompts, model weights | Voyager's skill library |
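As a rough illustration of how the four memory types might be separated in code, here is a minimal sketch; all class names and storage choices are hypothetical, not from the paper (a production system would back the episodic and semantic stores with a real vector DB):

```python
import time
from typing import Callable, Dict, List

class WorkingMemory:
    """Current context, bounded like an LLM context window."""
    def __init__(self, limit: int = 10):
        self.items: List[str] = []
        self.limit = limit
    def update(self, item: str) -> None:
        self.items = (self.items + [item])[-self.limit:]  # keep only the most recent

class EpisodicMemory:
    """Timestamped records of past experiences."""
    def __init__(self):
        self.episodes: List[dict] = []
    def store(self, event: str) -> None:
        self.episodes.append({"event": event, "t": time.time()})

class SemanticMemory:
    """Generalized facts; a dict here, a knowledge base or embedding index in practice."""
    def __init__(self):
        self.facts: Dict[str, str] = {}
    def add(self, key: str, fact: str) -> None:
        self.facts[key] = fact

class ProceduralMemory:
    """Executable skills, in the spirit of Voyager's code-based skill library."""
    def __init__(self):
        self.skills: Dict[str, Callable] = {}
    def register(self, name: str, fn: Callable) -> None:
        self.skills[name] = fn
```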
### 2. Action Space

```
Action Space
├── Internal Actions
│     ├── Reasoning
│     │     • update working memory with the LLM
│     │     • generate new knowledge and heuristics
│     ├── Retrieval
│     │     • read from long-term memory into working memory
│     │     • rule-based / sparse / dense retrieval
│     └── Learning
│           • write to long-term memory
│           • store experiences, knowledge, and skills
└── External Actions (Grounding)
      ├── Physical environment: robot control, physical interaction
      ├── Digital environment: API calls, web browsing, code execution, file manipulation
      └── Communicative: dialogue with users and other agents
```
#### Internal Actions in Detail
| Action type | Description | Memory access | Example implementations |
|---|---|---|---|
| Reasoning | update working memory with the LLM; synthesize new knowledge and heuristics | write to working | Chain-of-Thought, situation analysis |
| Retrieval | read information from long-term memory into working memory | read from long-term | dense / sparse / rule-based retrieval |
| Learning | write experiences, knowledge, and skills to long-term memory | write to long-term | storing episodes, updating a skill library |
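The read/write pattern in the table can be captured in a few lines; the enum and mapping below are an illustrative convention, not an API from the paper:

```python
from enum import Enum

class InternalAction(Enum):
    REASONING = "reasoning"
    RETRIEVAL = "retrieval"
    LEARNING = "learning"

# (source, destination) memory access for each internal action, mirroring the table.
MEMORY_ACCESS = {
    InternalAction.REASONING: ("llm", "working_memory"),               # write to working
    InternalAction.RETRIEVAL: ("long_term_memory", "working_memory"),  # read from long-term
    InternalAction.LEARNING:  ("working_memory", "long_term_memory"),  # write to long-term
}

for action, (src, dst) in MEMORY_ACCESS.items():
    print(f"{action.value}: {src} -> {dst}")
```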
### 3. Decision-Making Process

```
Perception  <- input from the environment
    │
PLANNING STAGE
    1. Reasoning:  analyze the situation, propose candidate actions
    2. Retrieval:  fetch relevant experiences / knowledge / procedures
    3. Evaluation & selection: evaluate the candidates, select the best
    │
EXECUTION STAGE
    • Learning action:  update long-term memory
    • Grounding action: interact with the external environment
    │
[repeat on the next cycle]
```
#### Decision Cycle Pseudocode

```python
class CoALAAgent:
    def __init__(self):
        self.working_memory = WorkingMemory()
        self.episodic_memory = EpisodicMemory()
        self.semantic_memory = SemanticMemory()
        self.procedural_memory = ProceduralMemory()
        self.llm = LanguageModel()

    def decision_cycle(self):
        """Main decision loop -- runs continually."""
        while True:
            # 1. Receive perceptual input
            perception = self.receive_perception()
            self.working_memory.update(perception)
            # 2. Planning stage
            action = self.planning_stage()
            # 3. Execution stage
            self.execution_stage(action)

    def planning_stage(self):
        """Planning: alternate reasoning and retrieval until an action is chosen."""
        action = None
        while action is None:
            # Reasoning: analyze the situation and generate candidates with the LLM
            analysis = self.reasoning_action(self.working_memory)
            # Retrieval: pull relevant information from long-term memory
            relevant_info = self.retrieval_action(analysis)
            self.working_memory.update(relevant_info)
            # Propose candidate actions, then evaluate and select one
            candidates = self.propose_actions(self.working_memory)
            action = self.evaluate_and_select(candidates)
        return action  # a learning or grounding action

    def execution_stage(self, action):
        """Execution: carry out the selected action."""
        if action.type == "learning":
            self.update_long_term_memory(action)
        elif action.type == "grounding":
            result = self.execute_external_action(action)
            self.working_memory.update(result)
```
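The class above relies on undefined helpers, so as a sanity check here is a self-contained toy instantiation of the same cycle; every name and the scoring heuristic are invented for illustration:

```python
class ToyAgent:
    """A runnable miniature of the CoALA decision cycle with stub components."""

    def __init__(self):
        self.working = []    # working memory: current context
        self.episodic = []   # long-term memory: past (perception, action) pairs

    def decision_cycle(self, perception: str) -> str:
        self.working.append(perception)               # perceive
        candidates = self.propose(perception)         # planning: propose
        action = max(candidates, key=self.evaluate)   # planning: evaluate & select
        self.episodic.append((perception, action))    # execution: learning action
        return action                                 # execution: grounding action

    def propose(self, perception: str):
        # Stub for LLM reasoning: two fixed candidates.
        return [f"search({perception})", f"answer({perception})"]

    def evaluate(self, candidate: str) -> float:
        # Stub value function: prefer answering once experience has accumulated.
        return 2.0 * len(self.episodic) if candidate.startswith("answer") else 1.0

agent = ToyAgent()
print(agent.decision_cycle("What is CoALA?"))  # search(...) on the first cycle
print(agent.decision_cycle("What is CoALA?"))  # answer(...) once experience exists
```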
## Agent Classification
### Existing Agents Analyzed with the CoALA Framework
CoALA is a framework/survey paper rather than an experimental one; it classifies and analyzes existing agents systematically:
| Agent | Working memory | Long-term memory | Main actions | Characteristic |
|---|---|---|---|---|
| ReAct | LLM context | limited | Reasoning + Grounding | thought-action interleaving |
| Toolformer | LLM context | - | Grounding (API calls) | extends tool-use capability |
| AutoGPT/BabyAGI | vector store | task list + results | complex task loop | proof of concept for autonomous agents |
| Voyager | LLM context | procedural (skill library) | + procedural learning | code-based skill learning |
| Generative Agents | LLM context | episodic + semantic | all four action types | most complete memory system |
| Tree of Thoughts | LLM context | - | iterative reasoning | tree-structured decision making |
| Reflexion | LLM context | episodic + semantic | + learning | self-reflection and adaptation |
### Per-Agent Deep Dive from the CoALA Perspective
#### ReAct
- Memory: working memory only (the LLM context window)
- Actions: external tools (web search, lookup) plus LLM-based reasoning
- Decision: interleaves thoughts and actions so each informs the other (a minimal loop is sketched below)
- Limitation: with no long-term memory, experience cannot accumulate
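The interleaving can be sketched as a loop; the `llm` callable and `tools` dict below are hypothetical stand-ins (the real ReAct drives this loop with a single few-shot prompt):

```python
def react_loop(question: str, llm, tools: dict, max_steps: int = 5) -> str:
    """Minimal ReAct-style thought/action/observation loop."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        # Reasoning: the LLM emits a thought plus a proposed action and argument.
        thought, action, arg = llm(context)
        context += f"Thought: {thought}\nAction: {action}[{arg}]\n"
        if action == "finish":  # terminal action carries the answer
            return arg
        # Grounding: run the external tool and feed the observation back in.
        observation = tools[action](arg)
        context += f"Observation: {observation}\n"
    return "no answer within the step budget"
```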
#### Voyager
- Memory: adds procedural memory (a code-based skill library)
- Actions: dense retrieval over skills plus interaction with the Minecraft environment
- Learning: stores newly acquired skills in procedural memory
- Notable: an advanced example of the procedural learning CoALA emphasizes
#### Generative Agents
- Memory: uses both episodic and semantic memory
- Retrieval: weighted combination of recency (rule-based), importance (reasoning-based), and relevance (embedding-based) scores, as sketched below
- Learning: reflects on episodic memories to produce semantic knowledge (experience -> generalization)
- Notable: the most complete instantiation of the CoALA memory system
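A toy version of that three-part retrieval score might look as follows; the decay rate, the 1-10 importance scale, and the equal weights are illustrative choices, not the exact constants of the paper or of Generative Agents:

```python
import math

def retrieval_score(memory: dict, query_vec: list, now: float,
                    w_rec: float = 1.0, w_imp: float = 1.0, w_rel: float = 1.0) -> float:
    """Weighted recency + importance + relevance, Generative Agents style."""
    # Recency (rule-based): exponential decay over elapsed hours.
    hours = (now - memory["t"]) / 3600.0
    recency = 0.99 ** hours
    # Importance (reasoning-based): a 1-10 score the LLM assigned at storage time.
    importance = memory["importance"] / 10.0
    # Relevance (embedding-based): cosine similarity to the query.
    dot = sum(a * b for a, b in zip(query_vec, memory["embedding"]))
    norm = math.hypot(*query_vec) * math.hypot(*memory["embedding"])
    relevance = dot / norm if norm else 0.0
    return w_rec * recency + w_imp * importance + w_rel * relevance
```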
### Action Space vs. Decision Complexity Trade-off

```
Action space complexity ↑   ◄──────►   Decision procedure complexity ↑
        │                                       │
  Voyager,                            more hand-crafted
  Generative Agents                   customization required
```

Key insight: more capable agents (Voyager, Generative Agents) command larger action spaces, but this makes the decision problem harder and demands more tailored decision procedures.
## Strengths and Contributions
### 1. Theoretical Contributions
- Unifying framework: systematizes previously scattered language-agent research
- Standard terminology: provides a shared vocabulary for the research community
- Historical bridge: connects classic cognitive architectures (Soar, ACT-R) with modern LLMs
- Theoretical grounding: situates the LLM as a "probabilistic production system," giving the field scholarly context
### 2. Practical Contributions
- Analytical tool: enables systematic comparison of existing agents' strengths and weaknesses
- Design guide: spells out the dimensions to consider when building a new agent
- Gap analysis: identifies regions of the design space unexplored by prior work
- Blueprint: specifies which components are needed and how they should interact
### 3. Demonstrated Performance Gains
- GPT-3.5 + cognitive architecture: 48% -> 95% improvement on a coding benchmark
- Demonstrates the payoff of combining tool use with agentic reflection
### 4. Academic Impact
- Influenced numerous follow-up studies
- awesome-language-agents repository: a continuously updated list of related work
## Limitations and Future Work
### Framework Limitations
- Limits of a conceptual framework
  - Provides no direct performance benchmarks or experimental results
  - Oriented toward organizing concepts rather than concrete implementation guidelines
  - Its simplicity is both a strength and a limitation: detailed implementation guidance is lacking
- Open problems in memory implementation
  - Efficient long-term memory retrieval/write mechanisms remain unsolved
  - Managing memory at scale is an open challenge
### Future Research Directions
1. More capable working memory & reasoning
   - Exploring mechanisms for "genuine thinking" beyond simple prompt engineering
   - High-level cognitive design rather than low-level string manipulation
2. Agent safety
   - Internal risks: problems that learning actions (especially modifying or deleting procedures) can cause
   - External risks: dangers of grounding actions (bash `rm`, harmful utterances, physical harm)
   - Safety needs to be analyzed from the action-space perspective
3. Reinforcement learning integration
   - Optimizing the decision policy from experience
   - End-to-end or co-adapted training
4. Unified learning
   - Jointly learning memory, action selection, and skill acquisition
   - Architectures that do not rely solely on pretrained LLM capabilities
5. Symbolic-LLM combination
   - Exploring the best combinations of symbolic reasoning components and LLMs
   - Leveraging the strengths of both paradigms
6. Meta-learning & value alignment
   - Meta-learning over the agent's own code (sketched below)
   - Alignment with human values
```python
# Future direction: an agent that modifies its own code
def meta_learning(self):
    # Analyze past performance
    performance = self.analyze_performance()
    # Ask the LLM to draft a code improvement
    code_improvement = self.llm.generate_code_improvement(performance)
    # Apply the modification to the agent's own code
    self.update_agent_code(code_improvement)
```
## Related Papers
### Foundational Works
| Paper | Year | Relevance |
|---|---|---|
| Soar: An Architecture for General Intelligence | 1987 | CoALA's main inspiration; memory structure |
| ACT-R: A Theory of Higher Level Cognition | 1998 | cognitive-architecture foundation |
| ReAct: Synergizing Reasoning and Acting in Language Models | 2022 | analyzed agent |
| Toolformer: Language Models Can Teach Themselves to Use Tools | 2023 | tool-use extension |
| Reflexion: Language Agents with Verbal Reinforcement Learning | 2023 | self-reflection mechanism |
| Voyager: An Open-Ended Embodied Agent | 2023 | procedural-memory case study |
| Generative Agents: Interactive Simulacra of Human Behavior | 2023 | complete memory system |
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | 2023 | decision-making structure |
### Follow-up Work (2024-2025)
| Paper | Date | Topic |
|---|---|---|
| Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives | 2024-01 | improved reflection mechanism |
| Agent-Pro: Learning to Evolve via Policy-Level Reflection | 2024-02 | policy-level learning |
| AppWorld: Benchmarking Interactive Coding Agents | 2024-07 | agent benchmark |
| Uncertainty-Aware Language Agent (UALA) | 2024 (ACL) | uncertainty-aware agent |
| A Survey on Large Language Model based Autonomous Agents | 2023-2024 | comprehensive survey |
## Practical Application Points
### 1. Agent Design Checklist

```markdown
## CoALA-based agent design checklist

### Memory design
- [ ] Working memory: how will the current context be managed?
- [ ] Episodic memory: does the agent need to store past experiences?
- [ ] Semantic memory: is an external knowledge base needed?
- [ ] Procedural memory: is a skill/code library needed?

### Action space design
- [ ] Internal actions: which of reasoning / retrieval / learning are required?
- [ ] External actions: which environments does the agent touch (physical / digital / dialogue)?
- [ ] Weigh the action-space size vs. decision-complexity trade-off

### Decision-making design
- [ ] Choose planning-stage complexity (single-shot vs. iterative evaluation)
- [ ] Define the action selection strategy

### Safety design
- [ ] Analyze the internal risks of learning actions
- [ ] Analyze the external risks of grounding actions
```
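One hypothetical way to make the checklist actionable is to record each answer in a spec object that the team fills in before building; the names and defaults below are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentDesignSpec:
    """Checklist answers captured as data, reviewable before implementation."""
    working_memory: str = "llm_context"      # how the current context is managed
    episodic_memory: bool = False            # store past experiences?
    semantic_memory: bool = False            # external knowledge base?
    procedural_memory: bool = False          # skill/code library?
    internal_actions: List[str] = field(default_factory=lambda: ["reasoning"])
    external_envs: List[str] = field(default_factory=list)  # physical/digital/dialogue
    iterative_planning: bool = False         # single-shot vs. iterative evaluation
    learning_risks_reviewed: bool = False    # safety design items
    grounding_risks_reviewed: bool = False

spec = AgentDesignSpec(semantic_memory=True, external_envs=["digital"])
print(spec)
```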
### 2. LangChain-based Implementation Example

```python
from langchain.memory import ConversationBufferMemory, VectorStoreRetrieverMemory
from langchain_openai import ChatOpenAI

class CoALAInspiredAgent:
    """An agent structure reflecting CoALA principles.

    Helpers such as setup_retriever(), load_skills(), retrieve_episodes(),
    build_context(), execute_tool(), and update_memories() are assumed to be
    defined elsewhere.
    """

    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4")
        # Working memory: conversation buffer
        self.working_memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )
        # Semantic memory: vector-store backed
        self.semantic_memory = VectorStoreRetrieverMemory(
            retriever=self.setup_retriever()
        )
        # Episodic memory: experience store
        self.episodic_memory = []
        # Procedural memory: skill library
        self.procedural_memory = self.load_skills()

    def planning_stage(self, user_input):
        """Planning stage: reasoning and retrieval."""
        # 1. Retrieval: look up relevant information
        relevant_knowledge = self.semantic_memory.load_memory_variables(
            {"query": user_input}
        )
        relevant_experiences = self.retrieve_episodes(user_input)
        # 2. Reasoning: analyze the situation and plan an action
        context = self.build_context(
            user_input,
            relevant_knowledge,
            relevant_experiences
        )
        action_plan = self.llm.invoke(context)
        return action_plan

    def execution_stage(self, action_plan):
        """Execution stage: learning or grounding."""
        if action_plan.requires_external_action:
            # Grounding: run an external tool
            result = self.execute_tool(action_plan.tool, action_plan.args)
        else:
            # Learning: update memories
            self.update_memories(action_plan)
            result = action_plan.response
        # Record the experience (episodic learning)
        self.episodic_memory.append({
            "input": action_plan.input,
            "action": action_plan.action,
            "result": result
        })
        return result
```
### 3. Key Design Considerations
| Consideration | Recommendation |
|---|---|
| Memory types | implement only the memory types the task requires (to manage complexity) |
| Action space size | the larger the action space, the harder the decision problem; weigh the trade-off |
| Decision loop | keep the main loop in one place; preserve modularity |
| Learning strategy | simple: in-context -> intermediate: RAG -> advanced: fine-tuning / code modification |
| Retrieval method | rule-based vs. sparse vs. dense; combine to fit the situation (see the sketch below) |
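For the retrieval row, a toy combination of the three styles could look like this; the 50/50 weighting and the `stale` flag are arbitrary illustrations (real systems would use BM25 and a proper embedding model):

```python
import math

def hybrid_retrieve(query: str, query_vec: list, docs: list, top_k: int = 3) -> list:
    """Rule-based filter + sparse keyword overlap + dense cosine similarity."""
    def sparse(doc: dict) -> float:
        q = set(query.lower().split())
        d = set(doc["text"].lower().split())
        return len(q & d) / max(len(q), 1)          # keyword overlap (sparse)

    def dense(doc: dict) -> float:
        dot = sum(a * b for a, b in zip(query_vec, doc["vec"]))
        norm = math.hypot(*query_vec) * math.hypot(*doc["vec"])
        return dot / norm if norm else 0.0          # embedding similarity (dense)

    candidates = [d for d in docs if not d.get("stale")]   # rule-based filter
    ranked = sorted(candidates,
                    key=lambda d: 0.5 * sparse(d) + 0.5 * dense(d),
                    reverse=True)
    return ranked[:top_k]
```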
### 4. Safety Checkpoints

```python
def safety_check(action):
    """Screen an action from the CoALA safety perspective."""
    # Internal risk: inspect learning actions
    if action.type == "learning":
        if action.target == "procedural_memory":
            if action.operation in ["delete", "modify"]:
                return require_confirmation(
                    "Modifying procedural memory can change the agent's behavior."
                )
    # External risk: inspect grounding actions
    if action.type == "grounding":
        dangerous_commands = ["rm", "delete", "format", "sudo"]
        if any(cmd in action.command for cmd in dangerous_commands):
            return require_confirmation(
                f"Dangerous command detected: {action.command}"
            )
    return True
```
## Tags
#AIAgent #CognitiveArchitecture #LLM #LanguageAgent #CoALA #Memory #ReAct #Voyager #GenerativeAgents #Reflexion #DecisionMaking #Princeton #TMLR #2024 #Framework #Survey #Soar #ACTR #ProductionSystem #ProceduralMemory #EpisodicMemory #SemanticMemory #WorkingMemory #AgentSafety #MetaLearning #Toolformer #TreeOfThoughts #Grounding #Reasoning #Retrieval #Learning
## References
- Paper: arXiv:2309.02427
- GitHub repository: awesome-language-agents
- OpenReview: TMLR review
- Princeton research page: Princeton Research