@taylanturker
Building Paper Circle
LoRA enables efficient adaptation of large language models by training small rank decomposition matrices instead of fine-tuning all parameters. LoRA democratizes large language model fine-tuning by making adaptation practical for resource-constrained organizations, enabling efficient multi-task deployment and reducing environmental impact. It has become a foundational technique for practical LLM adaptation in industry and research.
VL-JEPA is a vision-language model that predicts continuous text embeddings instead of generating tokens, achieving better performance with 50% fewer parameters than standard VLMs. This work demonstrates that embedding-space prediction is a more efficient alternative to autoregressive token generation for vision-language tasks, enabling smaller models to match larger ones' performance. The approach reduces computational costs at inference through selective decoding and enables flexible task support without architectural changes, making it practical for resource-constrained applications.
Ragas is a reference-free evaluation framework that assesses different dimensions of Retrieval Augmented Generation systems without requiring ground truth annotations. As organizations rapidly adopt RAG systems with LLMs, automated evaluation is critical for development velocity and risk mitigation. This framework enables practitioners to quickly iterate on RAG architectures and identify failure modes related to hallucinations, retrieval quality, or generation fidelity without costly human evaluation campaigns.
Mercury introduces ultra-fast diffusion-based language models that generate multiple tokens in parallel, achieving state-of-the-art speed while maintaining competitive quality for coding tasks. This work addresses a critical bottleneck in LLM deployment by achieving dramatic speedups without quality loss, making advanced AI coding assistants more practical for real-time applications and reducing computational costs for developers and enterprises.
The GPT-3 paper. The few-shot capabilities demonstrated here were mind-blowing at the time and kicked off the current AI wave.
BERT changed NLP forever. The bidirectional pre-training approach was a breakthrough that enabled so many downstream applications.
Understanding these scaling laws is crucial for anyone planning to train large models. The predictability is remarkable.
The paper that started it all. If you work in ML and havent read this, stop everything and read it now.
Interesting improvement over vanilla LoRA. The magnitude/direction decomposition seems to help with learning capacity.
Game-changer for running large models on consumer hardware. The combination of 4-bit quantization with LoRA is brilliant.