sam wang

i'm currently working with Adrians Skapars at AISC, investigating when activation-based probes fail to generalize as monitors and whether invariant risk minimization (IRM) can improve their out-of-distribution (OOD) robustness.
recently, i've been playing around with Bartosz Cywinski's taboo models. i've also been exploring emergent misalignment papers (1, 2).
previously, i worked at Berkeley TAFLab, where we built autonomous ocean drones to help cargo ships plan safer routes.
before that, i did my undergrad at UC Berkeley (graduated in 2025).
other: i'm a big fan of #Nebrasketball (GBR) and Passenger beans.