AI Research Mini-Booklet

Links

to use for reference but not include in writing

to include in page

casey and ai expert: https://www.youtube.com/watch?v=Sp1EmFRDquA

llm wiki idea: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

what to do with all this https://www.dbreunig.com/2026/05/04/10-lessons-for-agentic-coding.html https://www.dbreunig.com/2024/02/01/pursuing-quiet-ai.html https://steipete.me/posts/2025/shipping-at-inference-speed

prediction/observation https://www.dbreunig.com/2026/03/26/winchester-mystery-house.html

approach https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down/

tooling https://eugeneyan.com/writing/working-with-ai/ https://www.dbreunig.com/2024/10/18/the-3-ai-use-cases-gods-interns-and-cogs.html#cogs

https://buttondown.com/ultradune/archive/eval-008-nvidia-just-open-sourced-an-inference/

The inference stack is being decomposed. A year ago, you picked one engine and it handled everything. Now we have specialized layers — execution engines (vLLM, SGLang, TGI), orchestration frameworks (Dynamo), structured generation engines (XGrammar), and quantization toolchains (TorchAO). Each layer is independently optimizable.

I need to look into: XGrammar + jump-forward decoding
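
Rough sketch of the jump-forward idea, to check my understanding before reading further. It assumes a toy character-level grammar stored as a trie, with no accepted string being a prefix of another. The names build_trie, decode, and choose are hypothetical, and this is not XGrammar's API. Real engines work on tokens and have to handle re-tokenization, which this ignores. The point it shows: when the grammar allows exactly one next symbol, it can be appended without calling the model.

```python
from typing import Callable, Dict, List, Tuple

Trie = List[Dict[str, int]]  # state id -> {allowed char: next state id}


def build_trie(strings: List[str]) -> Trie:
    """Build a character trie that accepts exactly the given strings."""
    trie: Trie = [{}]
    for s in strings:
        state = 0
        for ch in s:
            if ch not in trie[state]:
                trie.append({})
                trie[state][ch] = len(trie) - 1
            state = trie[state][ch]
    return trie


def decode(trie: Trie, choose: Callable[[List[str]], str]) -> Tuple[str, int]:
    """Generate one accepted string; return it with the number of model calls.

    `choose` stands in for the model: it picks one of the allowed characters.
    """
    state, out, model_calls = 0, [], 0
    while trie[state]:  # a state with no outgoing edges ends the string
        allowed = trie[state]
        if len(allowed) == 1:
            # Only one legal continuation: jump forward, no model call.
            ch = next(iter(allowed))
        else:
            # Real branching point: ask the (stub) model to pick.
            ch = choose(sorted(allowed))
            model_calls += 1
        out.append(ch)
        state = allowed[ch]
    return "".join(out), model_calls


if __name__ == "__main__":
    grammar = build_trie(['{"ok":true}', '{"ok":false}'])
    text, calls = decode(grammar, lambda allowed: allowed[0])
    print(text, calls)
```

With this toy grammar the only real decision is the character after the colon, so a 12-character output needs a single call to the stub model.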

https://read.theaimerge.com/p/the-smartest-ai-engineers-will-bet

Across the industry, only a small number of teams have managed to move beyond pilots and demos. And when systems fail, it’s rarely because of the model itself. It’s the engineering around the model: how systems are designed, monitored, tested, and improved over time. These are the same problems software teams have always faced, made harder this time by the non-deterministic behavior of AI systems.

https://jarvislabs.ai/blog/expert-parallelism-mixed-strategies-vllm

https://read.theaimerge.com/p/understanding-llm-inference

https://www.vectara.com/glossary-of-llm-terms

https://huggingface.co/collections/nityan/mustread-papers

https://blog.ngxson.com/easier-to-understand-what-is-transformer

https://www.mercity.ai/blog-post/guide-to-fine-tuning-llms-with-lora-and-qlora/

https://rajatpandit.com/ai-infrastructure/the-integer-moment/

https://rajatpandit.com/insights/

https://huggingface.co/docs/transformers/en/kv_cache

tools and projects https://github.com/Tiiny-AI/PowerInfer https://github.com/microsoft/LLMLingua

neat summary of MiniMax model insights https://www.linkedin.com/posts/sebastianraschka_the-minimax-m2-series-was-one-of-the-most-share-7465419259985174529-p3ub/

ArXiv In-depth Analysis – Medium