Netrunners - Penetration testing and Red Teaming Engagements.

Featured Writeup in: RESEARCH

Feb 25, 2025

Fine-Tuning GLM-4.7-Flash (30B MoE) with LoRA on a Single 48GB GPU

TL;DR

I fine-tuned a 30B-parameter model (GLM-4.7-Flash) on a single 48GB GPU, despite the common claim that you need 60GB+. The trick: 80% of the model's weights are stored in a format that no quantization library can touch, so you offload them to CPU RAM instead. It took four monkey-patches to keep the training stack from crashing, a custom autograd hook to stop PyTorch from eating all the VRAM, and about 9 hours of training per run. Three runs produced broken models before I figured out that on an MoE model you can't train only the attention layers; you need the shared-expert FFN layers too. The working pipeline uses 30GB of VRAM and ~114GB of RAM. If you have a …
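The custom autograd hook itself isn't shown in this teaser. One standard PyTorch mechanism for stopping activations from accumulating in VRAM is `torch.autograd.graph.saved_tensors_hooks`, which lets you park saved tensors on the CPU during the forward pass and bring them back for the backward pass. A minimal sketch under that assumption (tiny tensors stand in for real activations; a real pipeline would use pinned memory and async copies):

```python
import torch

dev = "cuda" if torch.cuda.is_available() else "cpu"

def pack_to_cpu(t):
    # Called when autograd saves a tensor for backward:
    # evict it to CPU RAM instead of keeping it on the GPU.
    return t.to("cpu")

def unpack_from_cpu(t):
    # Called when backward needs the tensor again:
    # move it back to the compute device.
    return t.to(dev)

x = torch.randn(4, 8, device=dev, requires_grad=True)
w = torch.randn(8, 8, device=dev, requires_grad=True)

# Every tensor saved for backward inside this context goes through the hooks.
with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    y = (x @ w).relu().sum()

y.backward()
```

The trade is VRAM for PCIe bandwidth: each offloaded activation costs a host-to-device copy during backward, which is roughly the same bargain the post describes for the weights themselves.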
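The fix for the broken runs, targeting the shared-expert FFN layers alongside attention, would look roughly like this as a `peft` `LoraConfig`. The module names below are assumptions for illustration; GLM-4.7-Flash's actual layer names aren't given in this teaser, so check them with `model.named_modules()` before copying:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Attention projections alone produced broken models on this MoE;
    # the shared-expert FFN has to be trainable too.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        # hypothetical names for the shared-expert FFN projections
        "shared_experts.gate_proj",
        "shared_experts.up_proj",
        "shared_experts.down_proj",
    ],
    task_type="CAUSAL_LM",
)
```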


Netrunners © 2024