On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday personal devices. However, today's large ML models must be drastically compressed to run efficiently on-device, a hurdle that requires deep, yet currently niche, expertise. To engage the broader human-centered ML community in on-device ML experiences, we present the results of an interview study with 30 experts at Apple who specialize in producing efficient models. We compile tacit knowledge that these experts have developed through practical experience with model compression across different hardware platforms. Our findings offer pragmatic considerations missing from prior work, covering the design process, trade-offs, and technical strategies that go into creating efficient models. Finally, we distill design recommendations for tooling to help ease the difficulty of this work and bring on-device ML into more widespread practice.