skycave's Blog

[AI Paper] 📄 ToolLLM: Facilitating LLMs to Master 16000+ APIs

By skycave
January 25, 2026 · 5 min read

📋 Meta Information

| Item | Details |
|---|---|
| Paper title | ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs |
| Venue | ICLR 2024 (Spotlight) |
| Authors | Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, et al. |
| Affiliations | Tsinghua University, ModelBest Inc., Renmin University of China, Yale University, WeChat AI (Tencent Inc.), Zhihu Inc. |
| arXiv | 2307.16789 |
| GitHub | OpenBMB/ToolBench |
| OpenReview | dHng2O0Jjr |

🎯 One-line Summary

A framework that improves the tool-use ability of open-source LLMs so they can work with 16,000+ real-world APIs. It proposes the ToolBench dataset, the DFSDT reasoning algorithm, and the ToolEval automatic evaluator; the resulting ToolLLaMA reaches tool-use performance comparable to ChatGPT.


🔍 Research Background and Motivation

Problem Statement

  • Limits of open-source LLMs: open-source models such as LLaMA are markedly weak at tool use.
  • Limits of existing instruction tuning: most of it targets basic language tasks and largely ignores the tool-use domain.
  • Gap with closed models: state-of-the-art closed models such as ChatGPT and GPT-4 already show strong tool-use ability.

Problems with Prior Work

  1. Limited APIs: prior datasets either omit real-world APIs (e.g., REST APIs) or cover only a small, narrow set of them.
  2. Limited scenarios: they are restricted to simple, single-tool usage.
  3. Weak planning and reasoning: they lack support for complex, multi-step reasoning.

Research Goals

  • Enable open-source LLMs to master 16,000+ real-world APIs.
  • Provide an end-to-end tool-use framework covering data construction, model training, and evaluation.

💡 Key Ideas

1. ToolBench Dataset

ToolBench is a large-scale instruction-tuning dataset for tool use, built automatically with ChatGPT.

Three-stage Construction Process

ToolBench Dataset Construction
├── Stage 1: API collection
│   └── Collect 16,464 real-world RESTful APIs from RapidAPI Hub
│       └── 49 categories (finance, social media, weather, music, etc.)
├── Stage 2: Instruction generation
│   └── Automatically generate diverse instructions with ChatGPT
│       ├── Single-tool scenarios
│       └── Multi-tool scenarios
│           ├── Intra-category (multiple tools from the same category)
│           └── Intra-collection (multiple tools from the same collection)
└── Stage 3: Solution path annotation
    └── Use DFSDT to produce solution paths for complex instructions
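Stage 2 begins by choosing which APIs the instruction generator is shown. The sketch below is a minimal, hypothetical illustration of that sampling for the three scenarios, not ToolBench's actual code: the `Tool` record, the toy catalog, and the limits (2 to 5 tools per multi-tool sample, at most 3 APIs per tool) are assumptions chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    category: str      # e.g. "Weather"
    collection: str    # a RapidAPI-style collection name (toy value here)
    apis: tuple[str, ...]


def sample_apis(tools, mode, rng, min_tools=2, max_tools=5, max_apis_per_tool=3):
    """Return (tool, api) pairs to show the instruction generator.

    mode: "single" (all APIs of one tool), "intra_category" or
    "intra_collection" (a few APIs from several tools sharing that attribute).
    """
    if mode == "single":
        tool = rng.choice(tools)
        return [(tool.name, api) for api in tool.apis]

    attribute = {"intra_category": "category", "intra_collection": "collection"}[mode]
    groups = {}
    for tool in tools:
        groups.setdefault(getattr(tool, attribute), []).append(tool)
    # Only groups with enough tools can yield a multi-tool sample
    # (rng.choice raises IndexError if none qualify).
    eligible = [group for group in groups.values() if len(group) >= min_tools]
    group = rng.choice(eligible)

    chosen = rng.sample(group, rng.randint(min_tools, min(max_tools, len(group))))
    pairs = []
    for tool in chosen:
        k = min(max_apis_per_tool, len(tool.apis))
        pairs.extend((tool.name, api) for api in rng.sample(tool.apis, k))
    return pairs


catalog = [
    Tool("OpenWeather", "Weather", "daily-life", ("current", "forecast")),
    Tool("WeatherAPI", "Weather", "daily-life", ("history", "alerts", "astronomy", "marine")),
    Tool("Spotify", "Music", "daily-life", ("search", "top_tracks")),
]
print(sample_apis(catalog, "intra_category", random.Random(0)))
```

The sampled (tool, API) pairs would then be placed into a ChatGPT prompt asking for instructions that use those APIs; that prompting step is not shown.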

๋ฐ์ดํ„ฐ์…‹ ๊ทœ๋ชจ

  • API ์ˆ˜: 16,464๊ฐœ
  • ์นดํ…Œ๊ณ ๋ฆฌ: 49๊ฐœ
  • (๋ช…๋ น์–ด, ์†”๋ฃจ์…˜ ๊ฒฝ๋กœ) ์Œ: 126,486๊ฐœ

2. DFSDT (Depth-First Search-based Decision Tree)

A new decision-tree algorithm that strengthens the planning and reasoning abilities of LLMs.

DFSDT vs. ReACT

| Property | ReACT | DFSDT |
|---|---|---|
| Search strategy | Single path | Explores multiple paths |
| Error handling | Errors propagate along the single path | Recovers by backtracking |
| Search space | Limited | Expanded |
| Complex tasks | Lower success rate | Higher success rate |
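For contrast, here is a minimal sketch of a ReACT-style loop: one chain of thought/action/observation steps with no way to revisit an earlier choice. `Step`, `policy` (a stand-in for the LLM), and the `apis` mapping are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    action: str         # API name to call, or "finish"
    argument: str = ""  # API argument, or the final answer when action == "finish"


def react(instruction: str,
          policy: Callable[[str, list], Step],
          apis: dict[str, Callable[[str], str]],
          max_steps: int = 5) -> Optional[str]:
    """Follow a single reasoning path; a bad step cannot be undone."""
    history = []  # (step, observation) pairs fed back to the policy
    for _ in range(max_steps):
        step = policy(instruction, history)
        if step.action == "finish":
            return step.argument
        try:
            observation = apis[step.action](step.argument)
        except Exception as exc:  # unknown API or failed call
            observation = f"error: {exc}"
        # The error is only appended; the same path continues from here.
        history.append((step, observation))
    return None  # step budget exhausted without an answer
```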

How DFSDT Works

                [Root: user instruction]
                            │
           ┌────────────────┼────────────────┐
           ▼                ▼                ▼
       [Node 1]         [Node 2]         [Node 3]
      API call A       API call B       API call C
           │                │                │
      ┌────┴────┐    fail → give up    ┌─────┴─────┐
      ▼         ▼                      ▼           ▼
  [success]  [fail]               [Node 3-1]  [Node 3-2]
      │     backtrack                  │           │
      ▼         │                  [success]    [fail]
   [done] explore new path             │       backtrack
                                    [done]

Core Mechanisms

  1. Assessing multiple reasoning paths: instead of committing to a single trace, the model can weigh alternative branches.
  2. Selective advance/retreat: proceed along a promising path, or retreat from a dead end.
  3. The “Finish by Giving Up” function: abandon the current node and expand a new one.
  4. Why DFS rather than BFS: annotation is complete as soon as one valid path is found, so DFS needs fewer (costly) API calls than BFS.
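The mechanism above can be sketched as a recursive depth-first search. This is a simplified illustration, not the ToolBench implementation: `expand` is a hypothetical stand-in for the LLM proposing the next action (it sees the actions already tried at the same node so it can propose a different one), `execute` stands in for the API environment, and the branching, depth, and budget limits are arbitrary.

```python
GIVE_UP = "give_up"   # stands in for the "Finish by Giving Up" function
FINISH = "finish:"    # prefix of an action that returns a final answer


def dfsdt(instruction, expand, execute, max_children=3, max_depth=4, budget=20):
    """Depth-first search over a tree of actions.

    expand(instruction, path, tried) -> next action for the node reached by
    `path`, given the sibling actions already `tried` at that node.
    execute(action) -> observation string from the (mock) API environment.
    Returns the first successful path as a list of (action, observation)
    pairs, or None if every branch fails or the budget runs out.
    """
    expansions = 0

    def search(path):
        nonlocal expansions
        if len(path) >= max_depth:
            return None                       # too deep: treat as a dead end
        tried = []                            # siblings already expanded here
        for _ in range(max_children):
            if expansions >= budget:
                return None
            action = expand(instruction, path, tried)
            expansions += 1
            tried.append(action)
            if action == GIVE_UP:
                return None                   # abandon this node, backtrack
            if action.startswith(FINISH):
                return path + [(action, "")]  # valid path found
            result = search(path + [(action, execute(action))])
            if result is not None:
                return result                 # DFS: stop at the first valid path
        return None                           # all children failed, backtrack

    return search([])


# Toy run: the first API fails, the model gives up on that branch,
# and the search backtracks to try a different API at the root.
def toy_expand(instruction, path, tried):
    if not path:
        return "bad_api" if not tried else "good_api"
    last_observation = path[-1][1]
    return GIVE_UP if last_observation.startswith("error") else FINISH + last_observation


def toy_execute(action):
    return "error: 500" if action == "bad_api" else "42"


print(dfsdt("toy question", toy_expand, toy_execute))
# [('good_api', '42'), ('finish:42', '')]
```

With the same toy model, a single-path loop would commit to `bad_api` and end without an answer; the backtracking step is what lets the search recover.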

3. ToolEval Evaluation System

An automatic evaluator powered by ChatGPT, following the approach of AlpacaEval.

Evaluation Metrics

| Metric | Description | How it is measured |
|---|---|---|
| Pass Rate | Share of instructions completed successfully | Whether the task is completed within a limited budget of API calls |
| Win Rate | Preference between solution paths | Share of pairwise comparisons in which a solution is preferred (the paper compares against ChatGPT + ReACT) |

Pass Rate Labels

  • Pass: the task was completed successfully
  • Fail: the task was not completed
  • Unsure: the evaluator cannot decide
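Once each solution path has a label, both metrics reduce to simple ratios. A minimal sketch, assuming that `Unsure` counts as not passed and that a tie in a pairwise comparison counts as half a win; both rules are assumptions for illustration, not necessarily ToolEval's exact scoring.

```python
from collections import Counter


def pass_rate(labels):
    """Share of instructions labeled "Pass" (Unsure is treated as not passed)."""
    if not labels:
        return 0.0
    return Counter(labels)["Pass"] / len(labels)


def win_rate(preferences):
    """Share of pairwise comparisons won by the candidate.

    Each preference is "win", "loss", or "tie"; ties count as half a win.
    """
    if not preferences:
        return 0.0
    counts = Counter(preferences)
    return (counts["win"] + 0.5 * counts["tie"]) / len(preferences)


print(pass_rate(["Pass", "Fail", "Unsure", "Pass"]))  # 0.5
print(win_rate(["win", "loss", "tie", "win"]))        # 0.625
```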

์‹ ๋ขฐ๋„ ๊ฒ€์ฆ

  • Pass Rate: ์ธ๊ฐ„ ํ‰๊ฐ€์ž์™€ 87.1% ์ผ์น˜
  • Win Rate: ์ธ๊ฐ„ ํ‰๊ฐ€์ž์™€ 80.3% ์ผ์น˜
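Read as the share of items on which ToolEval's label equals a human annotator's label, an agreement rate can be computed as below; this interpretation, the function name, and the sample labels are assumptions for illustration.

```python
def agreement(auto_labels, human_labels):
    """Fraction of items where the automatic label equals the human label."""
    if len(auto_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not auto_labels:
        return 0.0
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(auto_labels)


auto = ["Pass", "Fail", "Pass", "Pass"]
human = ["Pass", "Fail", "Fail", "Pass"]
print(agreement(auto, human))  # 0.75
```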

๐Ÿ—๏ธ ์•„ํ‚คํ…์ฒ˜ / ๋ฐฉ๋ฒ•๋ก 

์ „์ฒด ์‹œ์Šคํ…œ ์•„ํ‚คํ…์ฒ˜

ToolLLM Framework

[User instruction] ──▶ [API Retriever] ──▶ [ToolLLaMA]
                       (recommends         (DFSDT reasoning)
                        relevant APIs)          │
                                                ▼
                        [API execution environment: 16,464 RapidAPI APIs]
                                                │
                                                ▼
                                        [Final response]

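API Retriever

  • Purpose: automatically select relevant APIs from a very large API pool
  • Training: a neural (dense) API retriever is trained; the paper uses Sentence-BERT
  • Function: given an instruction, recommends a set of relevant APIs
  • Performance: the recommended APIs closely match the ground-truth APIs each instruction actually needs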
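A dense retriever of this kind ranks APIs by the similarity between the embedded instruction and each embedded API document. In the minimal sketch below, a bag-of-words vector stands in for the trained encoder so the example has no dependencies; the API names and descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A trained sentence encoder would replace this."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(instruction, api_docs, k=2):
    """Return the names of the k APIs whose docs are most similar to the instruction."""
    query = embed(instruction)
    scored = [(cosine(query, embed(doc)), name) for name, doc in api_docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


api_docs = {
    "weather_forecast": "Get the weather forecast for a city",
    "currency_convert": "Convert an amount between two currencies",
    "song_search": "Search songs by artist or title",
}
print(retrieve("What is the weather forecast in Seoul?", api_docs, k=1))  # ['weather_forecast']
```

Swapping `embed` for a trained sentence encoder leaves the ranking logic unchanged, which is the appeal of the embed-and-rank design.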

ToolLLaMA Training

  1. Base model: LLaMA (LLaMA-2 7B in the paper)
  2. Training data: the 126,486 (instruction, solution path) pairs in ToolBench
  3. Method: supervised fine-tuning (SFT)
  4. Teacher model used to annotate the data: ChatGPT (gpt-3.5-turbo-16k)

📊 Experiments and Results

Main Performance Comparison

| Model | Pass Rate | Win Rate |
|---|---|---|
| Text-Davinci-003 | 22.6% | 16.5% |
| Claude-2 | 34.4% | 40.8% |
| ToolLLaMA + DFSDT | 66.7% | 67.3% |
| ChatGPT | ~70% | ~70% |
| GPT-4 + DFSDT | Best | Best |

Effect of DFSDT

For every LLM tested, DFSDT improves on ReACT by a clear margin in both Pass Rate and Win Rate.

Generalization

| Scenario | Description | ToolLLaMA + DFSDT Pass Rate |
|---|---|---|
| I1-Inst. | Unseen instructions, single-tool (seen tools) | 77.0% |
| I1-Tool | Unseen tools from seen categories, single-tool | 62.0% |
| I1-Cat. | Unseen categories, single-tool | 58.0% |
| I2-Inst. | Unseen instructions, intra-category multi-tool | 52.0% |
| I2-Cat. | Unseen categories, intra-category multi-tool | 48.0% |
| I3-Inst. | Unseen instructions, intra-collection multi-tool (hardest setting) | 45.0% |

OOD (Out-of-Distribution) Test

  • On the APIBench benchmark, ToolLLaMA performs on par with Gorilla, a model trained specifically for APIBench, even though none of APIBench's APIs or instructions were used to train ToolLLaMA.

💪 Strengths and Contributions

1. Large-scale, High-quality Dataset

  • Covers 16,464 real-world APIs, far more than prior work
  • Automated data-construction pipeline with minimal human supervision
  • Covers diverse scenarios (single-tool and multi-tool)

2. Novel Reasoning Algorithm (DFSDT)

  • Overcomes the limitations of ReACT
  • Enables complex, multi-step reasoning
  • Recovers from errors through backtracking

3. Standardized Evaluation (ToolEval)

  • Automated evaluation system
  • High agreement with human judgments (87.1% for Pass Rate, 80.3% for Win Rate)
  • Reproducible benchmarking

4. Contribution to the Open-source Ecosystem

  • All code, data, and models are released
  • Reaches ChatGPT-level tool use on top of an open-source model (LLaMA)

⚠️ Limitations

1. Reliance on Synthetic Data

  • The data is synthesized with ChatGPT, which limits its ecological validity
  • No data derived from real user logs

2. Complex Multi-tool Workflows

  • Mostly focuses on single-tool or sequential multi-tool use
  • Pipeline-level, multi-round workflows remain underexplored

3. Evaluation Ambiguity

  • Several equally valid tool-use strategies can exist for one instruction
  • Because correctness is hard to define formally, the metrics are hard to standardize

4. Coping with a Changing API Landscape

  • External tool inventories keep evolving and require continuous adaptation
  • API changes may require retraining

5. Tool Retrieval Accuracy

  • Retrieving exactly the right tool from a huge API pool is still challenging
  • LLMs have a limited understanding of what each tool does

🔗 Related Papers

Prior Work

| Paper | Description |
|---|---|
| Toolformer (Schick et al., 2023) | A model that learns when and how to call APIs |
| API-Bank | Evaluation benchmark with 73 API tools |
| Gorilla | LLM specialized for API calls |
| ReACT | Framework that combines reasoning and acting |

Follow-up Work

| Paper | Description |
|---|---|
| AnyTool (2024) | Self-reflective, hierarchical agent |
| PEToolLLM (Xu et al., 2025) | Personalized tool learning |
| TL-Training (Ye et al., 2024) | Tool-learning framework based on task features |
| ToolQA | QA dataset for evaluating tool use |

Related Survey

  • LLM-Based Agents for Tool Learning: A Survey (2025, Springer)

💻 Practical Takeaways

1. Building API-integrated Agents

# ToolLLM-style API calling pattern
1. Analyze the user instruction
2. Select relevant APIs with the API Retriever
3. Run multi-step reasoning with DFSDT
4. Execute the API calls and merge the results
5. Generate the final response
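The five steps above can be wired together as a small pipeline. This is a hypothetical skeleton, not ToolLLM's code: `retrieve`, `plan`, and the `apis` mapping are placeholders you would back with a real retriever, a DFSDT-style planner, and real API clients; here they are toy functions so the example runs.

```python
from typing import Callable


def run_agent(instruction: str,
              retrieve: Callable[[str], list[str]],
              plan: Callable[[str, list[str]], list[tuple[str, str]]],
              apis: dict[str, Callable[[str], str]]) -> str:
    """Hypothetical skeleton of the five-step pattern with pluggable parts."""
    # Steps 1-2: instruction analysis is folded into the retriever here.
    candidates = retrieve(instruction)
    # Step 3: a planner (e.g. a DFSDT search) decides which calls to make.
    calls = plan(instruction, candidates)
    # Step 4: execute each call, turning failures into readable results.
    results = []
    for name, argument in calls:
        if name not in candidates:
            results.append(f"{name}: skipped (not retrieved)")
            continue
        try:
            results.append(f"{name}: {apis[name](argument)}")
        except Exception as exc:
            results.append(f"{name}: failed ({exc})")
    # Step 5: a real agent would ask the LLM to write the answer; this just joins.
    return "; ".join(results)


# Toy components so the skeleton runs end to end.
toy_apis = {
    "weather": lambda city: f"forecast for {city}",
    "fx": lambda pair: f"rate for {pair}",
}
answer = run_agent(
    "Weather in Seoul and the USD/KRW rate?",
    retrieve=lambda text: ["weather", "fx"],
    plan=lambda text, names: [("weather", "Seoul"), ("fx", "USD/KRW")],
    apis=toy_apis,
)
print(answer)  # weather: forecast for Seoul; fx: rate for USD/KRW
```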

2. Building Tool-use Datasets

  • Use API marketplaces such as RapidAPI
  • Generate instructions automatically with ChatGPT/GPT-4
  • Annotate complex solution paths with DFSDT

3. Building an Evaluation System

  • Pass Rate: measures whether the task is completed
  • Win Rate: A/B-test-style pairwise comparison
  • LLM-based automatic evaluation pipeline

4. Production Considerations

  • API retrieval optimization: fast search over a large API pool
  • Error handling: implement DFSDT-style backtracking
  • API version management: keep up with changing APIs

5. Possible Application Domains

  • Customer service: integrating APIs from many external services
  • Work automation: complex multi-step workflows
  • Data collection/analysis: data pipelines that combine multiple APIs

๐Ÿท๏ธ Tags

#ToolLLM #ToolBench #DFSDT #ToolEval #API #Tool-Use #LLM #Agent #ICLR2024 #OpenBMB #LLaMA #ChatGPT #ReACT #Instruction-Tuning #RapidAPI #Neural-Retriever #Decision-Tree #Multi-Tool #AI-Agent #Function-Calling
