Controlling the FDR in imperfect matches to an incomplete database

J Am Stat Assoc. 2018;113(523):973-982. doi: 10.1080/01621459.2017.1375931. Epub 2018 Jun 28.


We consider the problem of controlling the false discovery rate (FDR) among discoveries made by searching an incomplete database. This problem differs from the classical multiple testing setting because there are two distinct types of false discoveries: those arising from objects that have no match in the database, and those arising from objects that do have a match but are matched incorrectly. We show that commonly used FDR-controlling procedures are inadequate for this setup, a special case of which is tandem mass spectrum identification. We then derive a novel FDR-controlling approach that extensive simulations suggest is unbiased. We also compare its performance with that of problem-specific and general FDR-controlling procedures, using both simulated and real mass spectrometry data.
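To make the two types of false discoveries concrete, the following minimal sketch splits the false discovery proportion (FDP, whose expectation is the FDR) into its two sources for a toy set of scored matches. It is an illustration only, not the procedure derived in the paper: the `Match` record, its field names, the `fdp_breakdown` helper, the score threshold, and all data values are hypothetical, and in practice the ground-truth labels used here are unknown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Match:
    """Best database match for one query (hypothetical record, for illustration)."""
    score: float       # score of the best match found in the database
    in_database: bool  # True if the query's true counterpart is in the database
    correct: bool      # True if the best match is that true counterpart


def fdp_breakdown(matches: List[Match], threshold: float) -> Tuple[float, float, float]:
    """Return (total FDP, FDP from absent queries, FDP from incorrect matches).

    A discovery is any match scoring at or above the threshold. It is false
    either because the query has no counterpart in the database, or because
    the counterpart exists but a different entry was matched.
    """
    discoveries = [m for m in matches if m.score >= threshold]
    if not discoveries:
        # By convention the FDP is 0 when nothing is reported.
        return 0.0, 0.0, 0.0
    n = len(discoveries)
    absent = sum(1 for m in discoveries if not m.in_database)
    mismatched = sum(1 for m in discoveries if m.in_database and not m.correct)
    return (absent + mismatched) / n, absent / n, mismatched / n


# Made-up toy data: scores and labels are invented for this example.
toy = [
    Match(9.1, True, True),
    Match(8.4, True, True),
    Match(7.9, True, False),   # counterpart is in the database, wrong entry matched
    Match(7.2, False, False),  # counterpart is missing from the database
    Match(6.8, True, True),
    Match(5.5, False, False),
    Match(4.0, True, False),
    Match(3.1, False, False),
]

total, absent, mismatched = fdp_breakdown(toy, threshold=6.0)
print(f"FDP={total:.2f} (absent={absent:.2f}, mismatched={mismatched:.2f})")
```

In this toy set, five matches pass the threshold of 6.0, and one false discovery comes from each source. A procedure that models only one of the two sources would account for only part of the total.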

Keywords: false discovery rate; multiple hypothesis testing; tandem mass spectrometry.