Hardware-based malware detectors (HMDs) are a key emerging technology to build trustworthy systems, especially mobile platforms. Quantifying the efficacy of HMDs against malicious adversaries is thus an important problem. The challenge lies in that real-world malware adapts to defenses, evades being run in experimental settings, and hides behind benign applications. Thus, realizing the potential of HMDs as a small and battery-efficient line of defense requires a rigorous foundation for evaluating HMDs. We introduce Sherlock—a white-box methodology that quantifies an HMD’s ability to detect malware and identify the reason why. Sherlock first deconstructs malware into atomic, orthogonal actions to synthesize a diverse malware suite. Sherlock then drives both malware and benign programs with real user-inputs, and compares their executions to determine an HMD’s operating range, i.e., the smallest malware actions an HMD can detect. We show three case studies using Sherlock to not only quantify HMDs’ operating ranges but design better detectors. First, using information about concrete malware actions, we build a discretewavelet transform based unsupervised HMD that outperforms prior work based on power transforms by 24.7% (AUC metric). Second, training a supervised HMD using Sherlock’s diverse malware dataset yields 12.5% better HMDs than past approaches that train on ad-hoc subsets of malware. Finally, Sherlock shows why a malware instance is detectable. This yields a surprising new result—obfuscation techniques used by malware to evade static analyses makes them more detectable using HMDs.