sam

with Adrians Skapars at AISC, i'm currently investigating when activation-based probes fail to generalize as monitors, and whether invariant risk minimization (IRM) can improve their out-of-distribution (OOD) robustness.

recently, i've been playing around with Bartosz Cywinski's taboo models. i've also been exploring papers on emergent misalignment (1, 2).

previously, i worked at Berkeley TAFLab, where we built autonomous ocean drones to help cargo ships plan safer routes.

before that, i did my undergrad at UC Berkeley (graduated in 2025).

other: i'm a big fan of #Nebrasketball (GBR) and Passenger beans.

sam wang