AI / Machine Learning

iLLuMinator 4.9B

A 4.9-billion-parameter transformer language model trained from scratch. The architecture uses Grouped Query Attention, Rotary Position Embeddings, SwiGLU activations, and RMSNorm. The project also includes retrieval-augmented generation (RAG) for question answering, a custom tokenizer, and model configurations at multiple scales (120M to 4.9B parameters). Available on Hugging Face.
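To make the effect of these architecture choices concrete, here is a minimal sketch of how the parameter count of such a decoder could be estimated. The `ModelConfig` fields and the toy values are hypothetical, not the project's actual configs. The sketch assumes bias-free linear layers, two RMSNorm layers per block plus a final one, and tied input/output embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical config; field names and values are illustrative only."""
    vocab_size: int
    d_model: int
    n_layers: int
    n_heads: int        # number of query heads
    n_kv_heads: int     # number of shared key/value heads (GQA)
    d_ff: int           # hidden width of the SwiGLU feed-forward block
    tie_embeddings: bool = True  # assumption: output head reuses the embedding


def count_parameters(cfg: ModelConfig) -> int:
    """Estimate learned parameters for a GQA + SwiGLU + RMSNorm decoder."""
    if cfg.d_model % cfg.n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    if cfg.n_heads % cfg.n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    head_dim = cfg.d_model // cfg.n_heads

    # Attention: Q and output projections are full width. K and V shrink to
    # n_kv_heads * head_dim because groups of query heads share them.
    q_proj = cfg.d_model * cfg.d_model
    kv_proj = 2 * cfg.d_model * (cfg.n_kv_heads * head_dim)
    o_proj = cfg.d_model * cfg.d_model
    attention = q_proj + kv_proj + o_proj

    # SwiGLU uses three weight matrices: gate, up, and down projections.
    feed_forward = 3 * cfg.d_model * cfg.d_ff

    # RMSNorm learns one gain vector; assumed once before attention and
    # once before the feed-forward block.
    norms = 2 * cfg.d_model

    per_layer = attention + feed_forward + norms

    embeddings = cfg.vocab_size * cfg.d_model
    lm_head = 0 if cfg.tie_embeddings else cfg.vocab_size * cfg.d_model
    final_norm = cfg.d_model

    # Rotary position embeddings add no learned parameters.
    return cfg.n_layers * per_layer + embeddings + lm_head + final_norm


if __name__ == "__main__":
    # Toy values chosen for readability, not taken from the project.
    toy = ModelConfig(vocab_size=1000, d_model=64, n_layers=2,
                      n_heads=8, n_kv_heads=2, d_ff=256)
    print(f"{count_parameters(toy):,} parameters")
```

Setting `n_kv_heads` equal to `n_heads` recovers standard multi-head attention, which makes it easy to compare how much the shared key/value heads save at a given scale.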

Tech stack

Python, PyTorch, Transformers, RAG, FAISS

Python, PyTorch, Transformers, LLM, RAG

What I learned

  • Training large models is mostly an optimization and data pipeline problem.
  • Small architecture choices can have huge effects on stability and throughput.
  • Evaluation and observability are essential; loss alone does not tell the full story.
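As one illustration of looking beyond loss, the sketch below derives a few extra signals per training step: perplexity, global gradient norm (a rough stability indicator), and token throughput. The `StepMetrics` and `summarize_step` names are hypothetical and not taken from the project. It assumes the loss is mean cross-entropy in nats per token and that gradients arrive as a flat list of floats.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StepMetrics:
    """Hypothetical per-step record; names are illustrative only."""
    loss: float
    perplexity: float
    grad_norm: float
    tokens_per_second: float


def summarize_step(loss: float, grads: Sequence[float],
                   n_tokens: int, elapsed_seconds: float) -> StepMetrics:
    """Turn raw step outputs into metrics that are easier to monitor."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Global L2 norm over all gradient values; sudden spikes often
    # precede a divergence that the loss curve shows only later.
    grad_norm = math.sqrt(sum(g * g for g in grads))
    return StepMetrics(
        loss=loss,
        # Perplexity is exp(mean cross-entropy) when loss is in nats.
        perplexity=math.exp(loss),
        grad_norm=grad_norm,
        tokens_per_second=n_tokens / elapsed_seconds,
    )
```

In a real training loop these values would typically be logged at every step, alongside periodic evaluation on held-out data.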

Links