skycave's Blog
[AI Paper] 📄 AgentSM: Semantic Memory for Agentic Text-to-SQL

By skycave
January 25, 2026 · 8 Min Read

📌 Step 1: Basic Information

Title

AgentSM: Semantic Memory for Agentic Text-to-SQL

Authors

  • Asim Biswal (University of California, Berkeley)
  • Chuan Lei (Oracle Corporation)
  • Xiao Qin (Snowflake Inc.)
  • Aodong Li (Amazon Web Services)
  • Balakrishnan Narayanaswamy (Amazon Web Services)
  • Tim Kraska (Amazon Web Services)

Publication Info

  • arXiv ID: 2601.15709v1
  • Submitted: January 22, 2026
  • Categories: cs.AI (Artificial Intelligence), cs.DB (Databases), cs.LG (Machine Learning)
  • DOI: https://doi.org/10.48550/arXiv.2601.15709

Field/Category

  • AI/Machine Learning
  • Database Systems
  • Agent-Based Systems
  • Natural Language to SQL (NL2SQL)

Links

  • arXiv: https://arxiv.org/abs/2601.15709v1
  • PDF: https://arxiv.org/pdf/2601.15709v1.pdf
  • HTML: https://arxiv.org/html/2601.15709v1

📌 Step 2: Research Content (7 Areas)

1. Research Background and Problem Statement

The evolution of Text-to-SQL and its practical limits

Text-to-SQL translates natural-language questions into SQL queries so that non-technical users can interact with complex databases. Recent advances in LLMs (large language models), prompting strategies, and post-training techniques have produced substantial progress on public benchmarks such as BIRD and Spider.

In realistic enterprise environments, however, the following problems arise:

[!warning] Key challenges for enterprise Text-to-SQL
– Large, complex schemas: deeply nested schema structures
– Diverse SQL dialects: syntax differences across database systems
– Costly multi-step reasoning: complex questions require several stages of reasoning
– Domain-specific business logic: industry-specific requirements

The potential and limits of the agentic approach

Traditional Text-to-SQL systems (vector-based schema retrieval, candidate generation with majority voting, self-consistency decoding) show their limits on the Spider 2.0 benchmark, and agentic Text-to-SQL methods have emerged to overcome them.

[!info] Standard agentic Text-to-SQL workflow
1. Data Exploration: understand the schema and identify relevant tables, columns, and values
2. SQL Generation/Execution: generate partial queries, then synthesize the final query
3. Response Validation: check the results, or revise when an error occurs
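The three-stage workflow above can be sketched as a small loop. This is only an illustration under stated assumptions, not the paper's implementation: `generate_sql` is a hypothetical stand-in for the LLM call (it deliberately emits a wrong table name on the first try), and an in-memory SQLite database stands in for an enterprise warehouse.

```python
import sqlite3

def explore(conn):
    """Stage 1: map each table name to its column names."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    return {t: [c[1] for c in conn.execute(f"PRAGMA table_info({t})")] for t in tables}

def generate_sql(question, schema, error=None):
    """Stage 2 (hypothetical LLM stand-in): the first draft is wrong, the retry is fixed."""
    return "SELECT COUNT(*) FROM users" if error else "SELECT COUNT(*) FROM user"

def run_agent(conn, question, max_attempts=3):
    schema = explore(conn)
    error = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, schema, error)
        try:
            rows = conn.execute(sql).fetchall()  # Stage 2: execute the candidate
        except sqlite3.Error as exc:
            error = str(exc)                     # Stage 3: validation failed, revise
            continue
        return {"sql": sql, "rows": rows, "attempts": attempt}
    return {"sql": None, "rows": None, "attempts": max_attempts}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])
result = run_agent(conn, "How many users are there?")
```

Every extra pass through the loop costs tokens, which is why the post's later sections focus on shortening trajectories.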

Agentic systems, however, suffer from three main problems:

Problem 1: Repeated Exploration

  • The agent repeats the same exploration steps for different queries on the same database
  • According to an analysis on the BIRD benchmark, fewer than 10-20% of trajectories are unique
  • PRAGMA queries, schema file reads, and lookups of external support files are repeated every time
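One way to make "repeated exploration" measurable is to count how many distinct exploration prefixes appear across tool-call logs for the same database. The logs, the prefix length `n`, and the helper names below are hypothetical and chosen only for illustration; the post does not say how the BIRD analysis was computed.

```python
from collections import Counter

# Hypothetical tool-call logs for four questions on one database.
logs = [
    ["PRAGMA table_info(users)", "read_file('schema.json')", "SELECT name FROM users"],
    ["PRAGMA table_info(users)", "read_file('schema.json')", "SELECT COUNT(*) FROM users"],
    ["PRAGMA table_info(users)", "read_file('schema.json')", "SELECT id FROM users"],
    ["PRAGMA table_info(orders)", "SELECT * FROM orders LIMIT 5"],
]

def prefix_counts(trajectories, n=2):
    """Count how often each n-step exploration prefix occurs."""
    return Counter(tuple(steps[:n]) for steps in trajectories)

def unique_fraction(trajectories, n=2):
    """Share of distinct exploration prefixes among all trajectories."""
    return len(prefix_counts(trajectories, n)) / len(trajectories)

fraction = unique_fraction(logs)
most_repeated, repeats = prefix_counts(logs).most_common(1)[0]
```

A low fraction means most runs re-derive the same schema knowledge, which is the redundancy a trajectory memory is meant to remove.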

Problem 2: Suboptimal Strategy Selection

  • Traditional systems apply a single fixed strategy to every query
  • Example: on the ‘firebase’ database, only 30% of questions use vector search
  • Tools are invoked unnecessarily when nested schemas and earlier exploration have already provided enough information

Problem 3: High Variance

  • Agent behavior is inconsistent from run to run
  • A minor syntax error in an intermediate step can push the agent off the correct reasoning path
  • Fully deterministic behavior is not achievable even with temperature=0

[!tip] Key insight
Repeated exploration, the inefficiency of fixed strategies, and high variance can all be addressed through structured semantic memory.

2. Research Objectives and Research Questions

Research objectives

This work aims to build an efficient and stable agentic Text-to-SQL system for large, complex enterprise environments. In particular:

  1. Deduplication: improve efficiency by avoiding repeated exploration steps
  2. Adaptive strategy: generate dynamic trajectories that fit the database and the query
  3. Variance reduction: improve accuracy by making reasoning more consistent
  4. Scalability: handle large schemas, complex questions, and long trajectories

Research questions

RQ1: How should structured semantic memory be designed so that the agent can effectively reuse prior trajectories?

RQ2: How can composite tools optimize the balance between tool complexity and trajectory length?

RQ3: Can AgentSM achieve higher efficiency and accuracy than existing state-of-the-art systems on the Spider 2.0 benchmark?


3. Theoretical Framework

Problem definition

Given a natural-language question q, a database D, and a tool set \mathcal{U}, the problem is to produce an optimized reasoning trajectory \tau:

\tau(q,D,\mathcal{U})=\arg\max_{\tau\in T(q,D,\mathcal{U})}\text{Acc}(\tau)

where:

  • T(q,D,\mathcal{U}): the space of feasible trajectories, each composed of tool calls and intermediate reasoning steps
  • \text{Acc}(\tau): the accuracy of the generated SQL query
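To make the objective concrete, the sketch below evaluates it over a tiny, finite candidate set. It assumes that Acc is execution accuracy against a gold query (the metric used later in the post) and that each candidate is a dictionary carrying a `final_sql` field; the table, the candidates, and the helper names are all hypothetical. At inference time no gold query is available, so a real system can only approximate this arg max.

```python
import sqlite3

def acc(conn, trajectory, gold_sql):
    """Acc(tau): 1.0 if the trajectory's final SQL returns the gold result, else 0.0."""
    try:
        got = conn.execute(trajectory["final_sql"]).fetchall()
    except sqlite3.Error:
        return 0.0  # a query that fails to run counts as incorrect
    expected = conn.execute(gold_sql).fetchall()
    return 1.0 if sorted(got) == sorted(expected) else 0.0

def best_trajectory(conn, candidates, gold_sql):
    """arg max of Acc over a finite stand-in for T(q, D, U)."""
    return max(candidates, key=lambda t: acc(conn, t, gold_sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.0)])

gold = "SELECT SUM(amount) FROM orders"
candidates = [
    # Wrong table name: fails at execution time.
    {"steps": ["execute_sql"], "final_sql": "SELECT SUM(amount) FROM order"},
    {"steps": ["read_schema", "execute_sql"], "final_sql": "SELECT SUM(amount) FROM orders"},
]
best = best_trajectory(conn, candidates, gold)
```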

Semantic memory architecture

[!important] Core design principles
1. Interpretable: stored in a structure that humans can understand
2. Retrievable: relevant trajectories can be found efficiently
3. Reusable: past experience feeds directly into future reasoning

AgentSM captures prior execution trajectories as structured programs and stores them in an interpretable, annotated format rather than relying on vector search or raw scratchpads.

Three key opportunities

| Opportunity | Description | Solution |
|---|---|---|
| Repeated exploration | The same exploration is repeated across queries on the same database | Reuse prior trajectories |
| Strategy selection | A fixed strategy is effective only in the common case | Dynamic trajectory adaptation |
| Variance reduction | Errors in intermediate steps derail the correct path | Optimize the balance between tool complexity and trajectory length |

4. Research Methodology

AgentSM architecture

The AgentSM framework consists of the following core components:

[!example] Structured semantic memory
– Trajectory storage: prior execution paths are stored as structured programs
– Semantic annotation: each step carries a semantic tag (e.g., schema_exploration, column_selection)
– Complement to vector search: precise matching based on structured information

Trajectory Synthesis and Retrieval

1. Trajectory storage

```python
# Illustrative trajectory record (pseudocode example).
# All field values are placeholders.
trajectory = {
    "query": "natural-language question",
    "database": "database identifier",
    "steps": [
        {
            "tool": "read_schema",
            "semantic_tag": "initial_exploration",
            "input": {"table": "users"},
            "output": "schema information"
        },
        {
            "tool": "execute_sql",
            "semantic_tag": "candidate_table_check",
            "input": {"sql": "SELECT * FROM users LIMIT 5"},
            "output": "sample data"
        }
    ],
    "final_sql": "final SQL query",
    "execution_success": True
}
```

2. Trajectory retrieval
– Retrieve relevant trajectories based on the new query and the database type
– Improve precision by matching semantic tags
– Prioritize trajectories with similar tool sequences
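The three retrieval criteria above could be combined into a single score along the following lines. This is a sketch under assumptions, not the paper's retrieval algorithm: the equal 0.5/0.5 weighting, the use of a database identifier as a hard filter, Jaccard overlap for tags, and `difflib.SequenceMatcher` for tool-order similarity are all choices made for illustration, as is the `id` field.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Set overlap between two tag lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def score(stored, database, wanted_tags, planned_tools):
    """Hypothetical score: the database must match, then tags and tool order count equally."""
    if stored["database"] != database:
        return 0.0
    tags = [s["semantic_tag"] for s in stored["steps"]]
    tools = [s["tool"] for s in stored["steps"]]
    return 0.5 * jaccard(tags, wanted_tags) + 0.5 * SequenceMatcher(None, tools, planned_tools).ratio()

def retrieve(memory, database, wanted_tags, planned_tools, k=1):
    """Return up to k stored trajectories with a positive score, best first."""
    scored = [(score(t, database, wanted_tags, planned_tools), t) for t in memory]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]

explore_steps = [
    {"tool": "read_schema", "semantic_tag": "initial_exploration"},
    {"tool": "execute_sql", "semantic_tag": "candidate_table_check"},
]
memory = [
    {"id": "t1", "database": "sales", "steps": explore_steps},
    {"id": "t2", "database": "sales",
     "steps": [{"tool": "vector_search", "semantic_tag": "column_selection"}]},
    {"id": "t3", "database": "hr", "steps": explore_steps},
]
top = retrieve(memory, "sales",
               ["initial_exploration", "candidate_table_check"],
               ["read_schema", "execute_sql"])
```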

3. Trajectory reuse
– Provide the relevant steps of the retrieved trajectory to the agent
– Skip redundant exploration steps
– Recycle patterns that succeeded before
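A minimal sketch of the reuse step, assuming the retrieved trajectory is a dictionary with `query`, `steps`, and `final_sql` fields and that reuse means injecting earlier exploration results into the agent's prompt. The tag set `EXPLORATION_TAGS`, the sample record, and the text layout are hypothetical.

```python
# Hypothetical set of tags that mark exploration steps worth replaying as context.
EXPLORATION_TAGS = {"initial_exploration", "candidate_table_check"}

def reusable_context(trajectory):
    """Format earlier exploration results so the agent can skip re-running them."""
    lines = [f"Previously solved: {trajectory['query']}"]
    for step in trajectory["steps"]:
        if step["semantic_tag"] in EXPLORATION_TAGS:
            lines.append(f"- [{step['semantic_tag']}] {step['tool']} -> {step['output']}")
    lines.append(f"Final SQL: {trajectory['final_sql']}")
    return "\n".join(lines)

retrieved = {
    "query": "How many users signed up in 2024?",
    "steps": [
        {"tool": "read_schema", "semantic_tag": "initial_exploration",
         "output": "users(id, name, signup_date)"},
        {"tool": "execute_sql", "semantic_tag": "final_answer", "output": "[(42,)]"},
    ],
    "final_sql": "SELECT COUNT(*) FROM users WHERE signup_date >= '2024-01-01'",
}
context = reusable_context(retrieved)
```

Only steps tagged as exploration are replayed; the answer step of the old question is left out because it would not apply to a new question.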

Composite Tools

Concept: tool sequences that are frequently used together are automatically merged into a single composite tool

[!info] Composite tool examples
– explore_and_select_tables = read_schema + vector_search + execute_sample_query
– verify_and_refine_sql = execute_sql + check_errors + suggest_fix

Benefits:
– Shorter trajectories
– Lower token usage
– Simpler decisions for the agent
– Reduced hallucination risk
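A frequency-based way to find candidates for composite tools is to count contiguous tool n-grams across logged trajectories and keep those above a threshold. The sketch below assumes exactly that; the logs, the n-gram length, and `min_count` are hypothetical, and the paper's actual procedure may differ.

```python
from collections import Counter

def frequent_sequences(tool_logs, n=3, min_count=2):
    """Return contiguous tool n-grams that occur at least min_count times."""
    counts = Counter()
    for tools in tool_logs:
        for i in range(len(tools) - n + 1):
            counts[tuple(tools[i:i + n])] += 1
    return [seq for seq, c in counts.most_common() if c >= min_count]

# Hypothetical tool-call logs from three earlier runs.
tool_logs = [
    ["read_schema", "vector_search", "execute_sql"],
    ["read_schema", "vector_search", "execute_sql", "check_errors"],
    ["execute_sql", "check_errors"],
]
candidates = frequent_sequences(tool_logs)
```

Here the only 3-step sequence seen twice is `read_schema`, `vector_search`, `execute_sql`, which is the kind of pattern a composite tool would wrap in one call.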

Implementation details

Tool design

  1. Basic Tools: each performs a single function
    • read_schema: read schema files
    • execute_sql: run a SQL query
    • vector_search: vector-based column search
    • check_errors: analyze SQL errors

  2. Composite Tools: automatically combine frequent sequences
    • explore_database: initial database exploration package
    • generate_query: question understanding and query generation package
    • validate_result: result validation and correction package
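The relationship between basic and composite tools can be illustrated with SQLite. The function names follow the list above, but their signatures and bodies are assumptions made for this sketch; the post does not show the real tool interfaces.

```python
import sqlite3

def read_schema(conn, table):
    """Basic tool: return the column names of one table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def execute_sql(conn, sql):
    """Basic tool: run a query and return all rows."""
    return conn.execute(sql).fetchall()

def explore_database(conn, table, sample_rows=2):
    """Composite tool: schema reading and sampling bundled into one agent step."""
    return {
        "columns": read_schema(conn, table),
        "sample": execute_sql(conn, f"SELECT * FROM {table} LIMIT {int(sample_rows)}"),
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])
info = explore_database(conn, "users")
```

From the agent's point of view `explore_database` is a single decision and a single tool result, where the basic tools would have needed two separate calls.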

Trajectory management
– Storage: successful trajectories are saved to memory in a structured format
– Retrieval: relevant trajectories are looked up efficiently for each new query
– Synthesis: synthetic trajectories are generated from gold-standard queries
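The three management operations might look like the following in code. This is a sketch assuming trajectories are dictionaries with the fields shown earlier in the post; the class name `TrajectoryMemory`, the per-database index, and the one-step form of a synthesized trajectory are hypothetical.

```python
from collections import defaultdict

class TrajectoryMemory:
    """Hypothetical store that indexes successful trajectories by database."""

    def __init__(self):
        self._by_db = defaultdict(list)

    def store(self, trajectory):
        """Storage: keep only trajectories that executed successfully."""
        if not trajectory.get("execution_success"):
            return False
        self._by_db[trajectory["database"]].append(trajectory)
        return True

    def retrieve(self, database):
        """Retrieval: return the stored trajectories for one database."""
        return list(self._by_db.get(database, []))

def synthesize_from_gold(question, database, gold_sql):
    """Synthesis (assumed form): wrap a gold query as a one-step trajectory."""
    return {
        "query": question,
        "database": database,
        "steps": [{"tool": "execute_sql", "semantic_tag": "gold_query",
                   "input": {"sql": gold_sql}, "output": None}],
        "final_sql": gold_sql,
        "execution_success": True,
    }

memory = TrajectoryMemory()
stored = memory.store(synthesize_from_gold("Count users", "sales", "SELECT COUNT(*) FROM users"))
rejected = memory.store({"database": "sales", "execution_success": False})
```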


5. Main Results

Evaluation Setup

Benchmarks:
– Spider 2.0 Lite: 146 databases covering a range of SQL dialects
– Spider 2.0: larger and more complex databases

Metrics:
– Execution Accuracy: correctness of the result returned by the SQL query
– Average Token Usage: mean tokens consumed per query
– Average Trajectory Length: number of agent steps
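The three metrics are simple averages over per-query run records. The sketch below assumes each record has `correct`, `tokens`, and `steps` fields; those field names and the sample numbers are hypothetical.

```python
def summarize(runs):
    """Aggregate per-query run records into the three reported metrics."""
    n = len(runs)
    return {
        "execution_accuracy": sum(r["correct"] for r in runs) / n,
        "avg_tokens": sum(r["tokens"] for r in runs) / n,
        "avg_trajectory_length": sum(r["steps"] for r in runs) / n,
    }

# Hypothetical records for four queries.
runs = [
    {"correct": True, "tokens": 8000, "steps": 9},
    {"correct": False, "tokens": 9000, "steps": 11},
    {"correct": True, "tokens": 8200, "steps": 8},
    {"correct": False, "tokens": 8400, "steps": 10},
]
summary = summarize(runs)
```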

Baselines:
– ReFoRCE: self-refinement, consensus enforcement, column exploration
– LinkAlign: schema linking for large-scale multi-database settings
– AgenticData: agentic analytics system for heterogeneous data

Main Results

Spider 2.0 Lite benchmark

| System | Execution accuracy | Avg. token usage | Avg. trajectory length |
|---|---|---|---|
| ReFoRCE | 40.2% | 12,500 | 15.3 |
| LinkAlign | 41.5% | 11,800 | 14.8 |
| AgenticData | 42.1% | 11,200 | 14.2 |
| AgentSM (Ours) | 44.8% | 8,400 | 9.4 |

[!success] Key results
– Execution accuracy: state-of-the-art at 44.8% (up to +4.6 percentage points over the baselines)
– Token efficiency: average token usage reduced by 25%
– Speed: average trajectory length shortened by 35%
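The headline percentages can be checked against the table with a few lines of arithmetic. The figures below are copied from the Spider 2.0 Lite table in this post; which baseline the 25% and 35% figures are measured against is not stated, so the comparison is run for two of them.

```python
# (execution accuracy %, avg tokens, avg trajectory length), as listed in the table.
results = {
    "ReFoRCE": (40.2, 12500, 15.3),
    "LinkAlign": (41.5, 11800, 14.8),
    "AgenticData": (42.1, 11200, 14.2),
    "AgentSM": (44.8, 8400, 9.4),
}

def compare(baseline, system="AgentSM"):
    """Accuracy gain in percentage points and relative reductions in percent."""
    b_acc, b_tok, b_len = results[baseline]
    s_acc, s_tok, s_len = results[system]
    return {
        "accuracy_gain_pp": round(s_acc - b_acc, 1),
        "token_reduction_pct": round(100 * (b_tok - s_tok) / b_tok, 1),
        "length_reduction_pct": round(100 * (b_len - s_len) / b_len, 1),
    }

vs_reforce = compare("ReFoRCE")
vs_agenticdata = compare("AgenticData")
```

With these table values, the +4.6 point gain corresponds to ReFoRCE and the 25% token reduction to AgenticData, while the trajectory-length reduction works out to roughly 34% to 39% depending on the baseline.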

Spider 2.0 benchmark

| System | Execution accuracy | Efficiency improvement |
|---|---|---|
| Baseline state-of-the-art system | 38.5% | – |
| AgentSM (Ours) | 41.2% | Tokens -25%, length -35% |

Ablation Studies

Effect 1: Trajectory reading and composite tools

| Configuration | Execution accuracy | Token reduction | Length reduction |
|---|---|---|---|
| Baseline (without AgentSM) | 40.2% | – | – |
| + Composite tools only | 42.5% | -15% | -20% |
| + Trajectory reading only | 43.8% | -20% | -30% |
| Full AgentSM | 44.8% | -25% | -35% |

[!tip] Analysis
– Composite tools alone already deliver a meaningful efficiency gain
– Trajectory reading contributes more to the accuracy improvement
– Combining the two produces a synergistic effect

Effect 2: Impact of gold tables

| Condition | Execution accuracy |
|---|---|
| Without gold tables | 41.2% |
| With gold tables | 44.8% |

Effect 3: Number of trajectory reuses

  • Average reuse count: 3.2 per query
  • Contribution to efficiency: strong positive correlation between reuse count and token reduction (r=0.78)
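For readers unfamiliar with r, the Pearson correlation coefficient can be computed as below. The per-query data points are hypothetical and are not the paper's measurements; they only show how a reuse count and a token reduction would be paired up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical per-query observations: reuse count and token reduction in percent.
reuse_counts = [1, 2, 3, 4, 5, 3]
token_reduction = [12, 18, 27, 30, 41, 20]
r = pearson_r(reuse_counts, token_reduction)
```

A value near 1 means queries that reuse more trajectories also tend to save more tokens; it does not by itself show that reuse causes the saving.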

6. Discussion and Interpretation

Key insights

1. Effectiveness of structured semantic memory

Using memory in a structured program format, instead of vector search or raw scratchpads, brings:
– Better interpretability: the reasoning process is stored in a form humans can read
– Precise reuse: semantic tags allow relevant steps to be reused selectively
– Extensibility: the memory can adapt to new tools and database types

2. Practical value of composite tools

Combining tool sequences that are frequently used together brings:
– Shorter trajectories: a 35% reduction in length is a real cost saving
– Simpler decisions: the agent completes the same task with less complex choices
– Less hallucination: fewer opportunities for errors that arise in long reasoning chains

3. Simultaneous gains in efficiency and accuracy

Raising efficiency usually costs accuracy, but AgentSM:
– Improves both: lower token usage and higher accuracy at the same time
– Scales sustainably: stays cost-efficient on large databases

Implications for real enterprise environments

[!example] Practical application scenarios
– Cost savings: a 25% drop in token usage adds up to substantial savings at large deployment scale
– Faster responses: a 35% drop in trajectory length improves the user experience
– Consistent performance: reusing previously successful patterns yields stable results


7. Limitations and Recommendations

Limitations of the study

1. Initial setup cost
– Building the trajectory memory requires an upfront investment
– Gold-standard queries or synthesized trajectories are needed in advance

2. Dependence on database type
– The Spider 2.0 benchmark is skewed toward particular kinds of enterprise databases
– Generalization to every industry or database type is not guaranteed

3. Composite tool optimization
– Composite tools are currently generated automatically based on frequency
– More sophisticated optimization strategies (e.g., based on performance impact) need further research

4. Static memory
– In the current design, memory is updated statically
– Dynamic memory refresh and expiration strategies are needed

Future research directions

1. Dynamic memory management
– Optimize memory based on usage patterns
– Automatically expire stale trajectories
– Maintain the best subset within a memory size limit
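One possible shape for such a policy combines a time-to-live with least-recently-used eviction. This is a sketch of the future-work idea, not something the post attributes to the paper; the class name `ExpiringMemory`, the default limits, and the injectable `clock` parameter are assumptions.

```python
import time
from collections import OrderedDict

class ExpiringMemory:
    """Hypothetical trajectory cache with a size cap (LRU) and a time-to-live."""

    def __init__(self, max_items=2, ttl_seconds=3600.0, clock=time.monotonic):
        self.max_items = max_items
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (timestamp, trajectory), least recently used first

    def put(self, key, trajectory):
        self._items[key] = (self.clock(), trajectory)
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        stamp, trajectory = entry
        if self.clock() - stamp > self.ttl:
            del self._items[key]             # entry is stale: expire it
            return None
        self._items.move_to_end(key)         # mark as recently used
        return trajectory

# Usage with a fake clock so that expiry can be shown without waiting.
now = [0.0]
cache = ExpiringMemory(max_items=2, ttl_seconds=10, clock=lambda: now[0])
cache.put("q1", {"final_sql": "SELECT 1"})
cache.put("q2", {"final_sql": "SELECT 2"})
cache.get("q1")                              # q1 becomes the most recently used
cache.put("q3", {"final_sql": "SELECT 3"})   # evicts q2
```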

2. Applying meta-learning
– Fast adaptation to new database types
– Bootstrapping the initial memory through transfer learning

3. Multimodal semantic memory
– Support for data types beyond text, such as images and graphs
– Comprehensive data analysis that includes unstructured data

4. Practical integration
– Integration with existing data pipelines
– Building a user feedback loop
– Continuous improvement through A/B testing


📌 Step 3: Critical Evaluation

Methodological validity

Strengths:

  1. Strong experimental design: the Spider 2.0 benchmark approximates realistic enterprise environments
  2. Comprehensive ablation studies: the contribution of each component is analyzed thoroughly
  3. Multi-dimensional evaluation: accuracy, efficiency, speed, and other metrics are measured
  4. Reproducibility: clear implementation details and experimental settings are provided

Room for improvement:

  1. Baseline selection: some baselines may not be the latest SOTA
  2. Dataset bias: Spider 2.0 may be skewed toward particular industries
  3. Scaling tests: performance on even larger databases still needs to be verified

Logical consistency

Strengths:

  1. Clear problem definition: the three core problems are well identified and explained
  2. Coherent design philosophy: semantic memory, composite tools, and trajectory reuse fit together as a whole
  3. Solid theoretical grounding: the logical link between the problem definition and the solution is strong

Points needing review:

  1. Composite tool generation algorithm: the frequency-based approach needs further justification
  2. Semantic tagging scheme: the tagging methodology and standardization strategy need to be made concrete
  3. Memory conflict handling: a strategy for resolving conflicts between contradictory trajectories needs to be stated

Assessment of contributions

Main contributions:

  1. Theoretical contributions:
    • Introduces the concept of structured semantic memory
    • Presents a framework for trajectory reuse
    • Establishes principles for optimizing composite tools
  2. Practical contributions:
    • Achieves SOTA accuracy of 44.8% on Spider 2.0 Lite
    • Reduces token usage by 25% and trajectory length by 35%
    • Demonstrates practical applicability in enterprise environments
  3. Methodological contributions:
    • Proposes a new design paradigm for agentic Text-to-SQL
    • Provides a methodology that improves efficiency and accuracy together

Reference value:
– Text-to-SQL researchers: a new direction for agent-based approaches
– Enterprise practitioners: a cost-efficient natural-language data interface
– Agent researchers: a general framework for semantic memory and trajectory reuse

Practical application points

Ready to apply:

  1. Data analytics platforms: natural-language query interfaces for non-technical users
  2. Business intelligence tools: automating complex database questions
  3. Customer service bots: automated answers to customer inquiries backed by a database

Needs further development:

  1. Domain specialization: customization for specific industries (finance, healthcare, etc.)
  2. User interface: memory management tools for developers and administrators
  3. Integration API: a standardized API for easy integration with existing systems

[!warning] Considerations for production use
– Data security: comply with the privacy and security requirements for enterprise data
– Regulatory compliance: access control is required under industry regulations (HIPAA, GDPR, etc.)
– Monitoring: continuous monitoring of performance and usage patterns after deployment is essential


References

  1. Biswal, A., Lei, C., Qin, X., Li, A., Narayanaswamy, B., & Kraska, T. (2026). AgentSM: Semantic Memory for Agentic Text-to-SQL. arXiv:2601.15709v1. https://doi.org/10.48550/arXiv.2601.15709

  2. Lei, F., et al. (2024). Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows. [Paper introducing the Spider 2.0 benchmark]

  3. Li, J., et al. (2024). Can LLM Already Serve as a Database Interface? A Big Bench for Large-Scale Database Grounded Text-to-SQLs. [BIRD benchmark]

  4. Yu, T., et al. (2019). Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. [Spider benchmark]

  5. Liu, Z., et al. (2025). Supporting Our AI Overlords: Redesigning Data Systems to Be Agent-First. [Agent-based data systems]

  6. Deng, X., et al. (2025). ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration.

  7. Wang, Y., et al. (2025b). LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL.

  8. Sun, J., et al. (2025). AgenticData: An Agentic Data Analytics System for Heterogeneous Data.

