AI / Machine Learning
iLLuMinator 4.9B
A transformer-based language model with 4.9 billion parameters, trained from scratch. The architecture uses Grouped Query Attention, Rotary Position Embeddings, SwiGLU activations, and RMSNorm. It includes integrated RAG for question answering, a custom tokenizer, and model configs at multiple scales (120M to 4.9B). Available on Hugging Face.
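To show how these architecture choices drive model size across configs, here is a rough parameter-count sketch. It assumes a standard decoder-only layout with no bias terms, two RMSNorm layers per block, and a three-matrix SwiGLU feed-forward. The `ModelConfig` class, the `count_parameters` function, and every number in the example are hypothetical and are not the published iLLuMinator settings.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical fields for illustration; not the project's real config.
    vocab_size: int
    d_model: int
    n_layers: int
    n_heads: int
    n_kv_heads: int  # Grouped Query Attention: fewer K/V heads than Q heads
    d_ff: int
    tie_embeddings: bool = True


def count_parameters(cfg: ModelConfig) -> int:
    """Estimate trainable parameters, assuming no bias terms anywhere."""
    if cfg.d_model % cfg.n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    if cfg.n_heads % cfg.n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")

    head_dim = cfg.d_model // cfg.n_heads
    kv_dim = cfg.n_kv_heads * head_dim

    # Q and output projections are d_model x d_model.
    # K and V projections shrink to d_model x kv_dim under GQA.
    attention = 2 * cfg.d_model * cfg.d_model + 2 * cfg.d_model * kv_dim

    # SwiGLU uses three matrices: gate, up, and down projections.
    feed_forward = 3 * cfg.d_model * cfg.d_ff

    # Two RMSNorm layers per block, each with a d_model-sized weight.
    norms = 2 * cfg.d_model

    per_layer = attention + feed_forward + norms

    # Rotary Position Embeddings add no learned parameters.
    embedding = cfg.vocab_size * cfg.d_model
    lm_head = 0 if cfg.tie_embeddings else embedding
    final_norm = cfg.d_model

    return cfg.n_layers * per_layer + embedding + lm_head + final_norm


# Tiny made-up config, chosen only so the arithmetic is easy to follow.
tiny = ModelConfig(vocab_size=1000, d_model=64, n_layers=2,
                   n_heads=8, n_kv_heads=2, d_ff=256)
print(count_parameters(tiny))
```

With 2 K/V heads instead of 8, the K and V projections in this sketch are a quarter of their full multi-head size, which is where Grouped Query Attention saves both parameters and KV-cache memory.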
Tech stack
Python, PyTorch, Transformers, RAG, FAISS
Python, PyTorch, Transformers, LLM, RAG
What I learned
- Training large models is mostly an optimization and data pipeline problem.
- Small architecture choices can have huge effects on stability and throughput.
- Evaluation and observability are essential; loss alone does not tell the full story.