The Origin and Principles of this Mini-Booklet
The current local model space has a problem.
I read a great article from from Armin Ronacher:
It focused on the problem he was having with the current experience available for local inference (think local model hosting and usage) where a lot can be done to reach an optimal setup getting a desired experience is very hard. Currently, the experience of getting inference working can be overwhelming and easily lead to an inoptimal setup. I believe the best setup is one that is built into a desired experienced and not just optimal.
So just what is the optimal setup?
It is easy to say there are a myrdiad of endlessly unique setups out there but it's not true. There is a 3-D array of setups split across three dimensions being: use case, response experience, and hardware. Everything you consider in the local inference space will be driven from these three axis.
All of these axis have their own form of variability.
Use Case: One moment you may be coding up a new feature in a complex long lived application and the next you may be writing documentation for a completely new project or perhaps trying to map out the functionality of an existing project and find something specific in there.
Response Experience: It is all about the flavor of response behavior you want whether that be high throughput, high factual accuracy, or high creativity. While it is easy to see this experience as part of the use case I believe it is helpful to separate them because you can easily have the use case of product development and want completely different things out of it based on whether you're building an mvp to see what sticks to the wall or working on the actual production application.
Hardware: It's easy to think in a very static way but what about when you bring in cloud spot-instances for on the fly requests or you get some new hardware temporarily thanks to a cloud free trial. Especially if you are not just serving your own inference needs but others as well then hardware may get very variable to stay cost-effective.
The most optimal inference server choice for you is going to be the one that min-maxes for these three axes. Often because of the variability just discussed this optimal fit will be a mix of different inference server options for different scenarios with tooling to conveniantly switch.
That said, the most optimal inference server setup is not the best setup for you as alluded to above.
Then what is the best setup?
One where the experience of setting up, mutating, and using feels good and aligns with what the user wants.
The experience is where Armin had a problem and where I believe many are. Armin talked about a lack of critical mass behind a a technical solution in the space as the source and I do believe that is a part of the problem but not the whole of it.
The other glaring issue is the lack of organized educative material in the space for both easily drawing from other's optimized setups and drawing from other's learning. Currently, we have a bunch of spread out technical papers that go deeper than the average user is looking for, project documentation spread over GitHub repositories, PRs, issues, AI doc sites, personal setups hidden away in personal blogs and GitHub gists, and business writeups with no technical help just selling products badly.
Without a good centralized place of learning, what was an advantage of choice and diversity of optimal setups supported becomes a disadvantage to the desired experience. This is because most users when they enter the local model space will face choice paralysis or mess their setup up and get ejected out. Not to mention those who walk away simply because there is nothing to clearly and confidently dive into.
We lose out when critical mass is lacking in the technical solution space and would benefit from improved capability and experience if we get it but we also need that critical mass in the education and learning resources connected to this space. Without either of these the experience to easily becomes folks running around planting their heads into a bunch of walls until finding the door to their best setup ... if you got stayed at it long enough and got lucky.
To expand on that desired experience, it could be heavily exploratory to learn as much as possible or straight forward where you provide what you have and what you want to a system and get a moderately optimal setup out of that.
What am I doing here.
I'm a resource hoarder and have had to scour across the internet to try and get a unified understanding of local inference. Because I have some kind of geek problem I don't really do the shallow or deep end of the pool with research initiatives like this preferring a nice middle ground so for a long time I stayed away from AI because of the time sink I expected (rightly so) and of late I have been reading late into the night and even had an all nighter recently cause this stuff has insane potential.
My goal with this mini-booklet is to do two main things.
- Make decision frameworks and variables to consider for people more clear.
- Bring together information highly useful for making decisions in this space.
Note that although I call this a mini booklet the experience of it more a cross between
- Guides on software exploration, selection, and optimization fundamentals.
- A library with a bunch of links organized topically with relevant context to them provided.
If the local inference community can beat down some of the context paralysis in beginners' minds by giving them organized path to learn what they need with everything easily at hand and no chaff to get through.
I won't be building anything for aggregating and organizing personal setups anytime soon but hope to one day throw some agents at it. 😃
Extra Note on My Philosophy Here
The reason I want to provide decision framework, resource links, ideas, and approaches to using chatbots and search is because
- Students put more into learning when the learning provides a way to engage.
- So much of the answers and knowledge is already out there. The internet is a magical thing and we need less people repeating the same answers and more people broadcasting the questions and search approach people need to find those answers quickly and effeciently.
- When you teach a student to search they can find answer personalized to their setup current to the time they are researching. It's the difference between giving someone a fish and teaching them how to fish.
- This is largely for beginners and speaking as someone constantly learning about things where I am a beginner. The thing I always felt I was missing most was the right questions and thinking methods, not specific technical answers.
My goal is to make it so the reader can learn something from each page even if they already have some idea of what they are using there or won't be messing around so much like with the optimization fundamentals discussions in the Low Level Pieces page.
I don't live up to this yet but I think the education approach and philosophy described here is something to work towards and learn from for sure.
Terraform, Linux, and College with Matthew Sanabria - YouTube