Loading "stacks" of models on-demand? Does a tool like this exist?
I'd like to self-host a couple of different LLM models for different use cases, but they don't all fit in VRAM at the same time. So I'm looking for a tool where I can define "profiles" or "stacks" of LLMs that get loaded on demand.
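To make the idea concrete, here's a sketch of the kind of config I have in mind. This is made-up syntax with made-up model names, not any existing tool's format:

```yaml
# Hypothetical profile config (illustrative only).
# Selecting a profile would load its models and
# unload whatever the previous profile had in VRAM.
profiles:
  coding:
    - model: some-coder-14b        # code completion
    - model: some-embedding-model  # embeddings for the same workflow
  general-chat:
    - model: some-instruct-8b
```

Basically I want to say "switch to the coding profile" and have the tool handle the VRAM juggling for me.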