This application generates a decoy library from a given annotated MS library.
Ahrné E., Ohta Y., Nikitin F., Scherl A., Lisacek F., Müller M., An improved method
for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates, Proteomics, 2011, 11 (7), pp 4085-4095.
The method is based on the shuffling of each MS precursor peptide and the
generation of decoy spectra obtained after the theoretical fragmentation.
Each fragment of the generated spectrum then follows one of two fates:
- the mz is recalculated for any annotated fragment
- a non-annotated (NA) fragment is either kept or sampled from the overall NA distributions
The decision to sample an NA peak depends on its deviation from the overall population
(fragments with close mz values sharing the same precursor charge, in the precursor mz neighborhood):
the less it deviates, the more likely it is to be sampled.
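The NA handling described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Deliberator's actual code: the function names, the grouping key (precursor charge plus a precursor mz bin of fixed width) and in particular the `sampling_probability` formula are hypothetical. The only property taken from the text is that the probability decreases as the deviation grows.

```python
import random
from collections import defaultdict


def bin_index(precursor_mz, width=100.0):
    """Index of the precursor mz interval ("neighborhood") a spectrum falls in."""
    return int(precursor_mz // width)


def build_na_distributions(spectra, width=100.0):
    """Pool NA fragment mz values by (precursor charge, precursor mz bin).

    spectra: iterable of (precursor_charge, precursor_mz, na_fragment_mzs).
    """
    pools = defaultdict(list)
    for charge, precursor_mz, na_mzs in spectra:
        pools[(charge, bin_index(precursor_mz, width))].extend(na_mzs)
    return pools


def sampling_probability(deviation):
    """Hypothetical mapping: 1.0 at zero deviation, decreasing as it grows."""
    return 1.0 / (1.0 + abs(deviation))


def sample_or_keep(peak_mz, deviation, pool, rng):
    """Replace the peak mz by one drawn from the pool, or keep it unchanged."""
    if pool and rng.random() < sampling_probability(deviation):
        return rng.choice(pool)
    return peak_mz


if __name__ == "__main__":
    rng = random.Random(42)
    pools = build_na_distributions([
        (3, 450.0, [100.1, 200.2]),
        (3, 480.0, [300.3]),
        (2, 450.0, [150.5]),
    ])
    pool = pools[(3, bin_index(455.0))]
    print(sample_or_keep(123.4, deviation=0.5, pool=pool, rng=rng))
```

How the deviation of a peak is measured is not specified here, so the sketch takes it as an input.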
1. parse the annotated spectra library file
   (with --etd option: precursor neighbors fragments removed)
2. make n distributions of all NA fragments by precursor charge (n charges) for future sampling
   (example at charge +3)
FOR each original_spectrum DO
    3. [make decoy spectrum]
        count <- 0
        3.1 shuffle precursor peptide sequence (do not shuffle N/C terminal amino-acids (peptidase footprint))
        3.2 create the fragmentation spectrum from the shuffled precursor
            /// TODO: (specific kind of fragmentation ??)
            FOR all peaks DO
                IF annotated peak THEN
                    recalculate the new mz
                ELSE IF mz close to the baseline THEN
                    pick a peak mz in the NA sample (see 2)
                ELSE
                    keep the peak
    4. [compute spectrum match (score [0, 1[)]
        score <- original_spectrum versus decoy_spectrum (with only annotated peaks)
        IF score > dot_product_threshold (decoy ~ original) and count < 10 THEN
            count <- count + 1
            retry from 3.1
    5. [write decoy spectrum]
        flush the most different (lowest score) decoy spectrum in output file
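Steps 3 to 5 above can be illustrated with a short Python sketch. It is a simplified reading of the outline, not the tool's implementation. It assumes singly charged b and y ions only (the fragmentation type is left open above), handles annotated peaks only, and uses a crude binned normalized dot product as the score. All function names are hypothetical.

```python
import math
import random

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.00728, 18.01056


def fragment_mz(sequence, ion_type, index):
    """mz of a singly charged b or y ion (assumption: b/y ions only)."""
    if ion_type == "b":
        return sum(RESIDUE_MASS[a] for a in sequence[:index]) + PROTON
    return sum(RESIDUE_MASS[a] for a in sequence[-index:]) + WATER + PROTON


def shuffle_peptide(sequence, rng):
    """Step 3.1: shuffle inner residues, keep N- and C-terminal residues."""
    if len(sequence) <= 3:
        return sequence
    inner = list(sequence[1:-1])
    rng.shuffle(inner)
    return sequence[0] + "".join(inner) + sequence[-1]


def dot_product(peaks_a, peaks_b, tol=0.1):
    """Step 4: normalized dot product of [(mz, intensity)] lists, binned at tol."""
    def binned(peaks):
        out = {}
        for mz, intensity in peaks:
            key = round(mz / tol)
            out[key] = out.get(key, 0.0) + intensity
        return out

    a, b = binned(peaks_a), binned(peaks_b)
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    den = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return num / den if den else 0.0


def make_decoy(sequence, annotated, threshold=0.7, max_retries=10, seed=0):
    """Return (score, shuffled_sequence, decoy_peaks) with the lowest score seen.

    annotated: list of (ion_type, ion_index, intensity) for the original spectrum.
    """
    rng = random.Random(seed)
    original = [(fragment_mz(sequence, t, i), y) for t, i, y in annotated]
    best = None
    for _ in range(max_retries + 1):  # first attempt plus up to 10 retries
        shuffled = shuffle_peptide(sequence, rng)
        decoy = [(fragment_mz(shuffled, t, i), y) for t, i, y in annotated]
        score = dot_product(original, decoy)
        if best is None or score < best[0]:
            best = (score, shuffled, decoy)
        if score <= threshold:  # decoy is different enough, stop retrying
            break
    return best


if __name__ == "__main__":
    peaks = [("b", 2, 100.0), ("y", 3, 50.0), ("b", 4, 80.0)]
    score, shuffled, decoy = make_decoy("PEPTIDEK", peaks)
    print(shuffled, round(score, 3))
```

Keeping the lowest-scoring attempt mirrors step 5: if no shuffle falls under the threshold within the retry budget, the most different decoy is still written.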
usage: Deliberator <mslib> [-a] [-c] [-d <arg>] [--decoy-tag]
       [--default-decoy-tag] [--default-pep-decoy-tag] [-h] [-i <arg>]
       [--log <arg>] [-o <arg>] [-p <arg>] [--pep-decoy-tag <arg>] [-q]
       [-r <arg>] [-s <arg>] [-t <arg>] [-v] [-w <arg>]
 -a,--average                         set the average mass mode for peptide mass calculation
                                      by default: MONOISOTOPIC.
 -c,--concat-libs                     concatenate libs
                                      by default: false.
 -d,--dp-threshold <arg>              define the dot-product threshold ([0-1[) for spectrum shuffling
                                      by default: 0.7.
    --decoy-tag                       set this decoy tag in 'Comments' of decoy spectra
                                      by default: No tag.
    --default-decoy-tag               set this default decoy tag in 'Comments' of decoy spectra
                                      by default: 'DECOY_'.
    --default-pep-decoy-tag           set this default decoy tag in peptide 'Name' of decoy spectra
                                      by default: 'decoy_'.
 -h,--help                            print this message.
 -i,--setting-file <arg>              give a property file with all input
    --log <arg>                       define the log file.
 -o,--output <arg>                    set the output filename (.msp or .sptxt file only).
 -p,--precision <arg>                 define the number of fractional digits for output
                                      by default: 2.
    --pep-decoy-tag <arg>             set this decoy tag in peptide 'Name' of decoy spectra
                                      by default: No tag.
 -q,--quiet                           quiet mode (verbose off)
                                      by default: false.
 -r,--render-dir <arg>                render NA peak histograms (render-dir/hist) and
                                      original + decoy spectra (render-dir/scan).
                                      warning: execution time x10.
 -s,--sampling-prob <arg>             define the probability of sampling non-annotated peaks ([0-1[)
                                      by default: -1.0.
 -t,--tol <arg>                       define the tolerance for mz fragment
                                      by default: 0.1.
 -v,--version                         print the version info.
 -w,--sampling-interval-width <arg>   define the bin width of non-annotated (NA) peaks histograms for sampling
                                      by default: 100.
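As a usage illustration, the following Python snippet assembles a command line from the options listed above. It only builds and prints the argument list. The launcher name `Deliberator` is taken from the usage line, and the actual way to start the application may differ. The file names `library.sptxt` and `decoy.sptxt` are placeholders, and the helper `deliberator_args` is hypothetical.

```python
import shlex


def deliberator_args(mslib, output, dp_threshold=0.7, tol=0.1, precision=2, quiet=False):
    """Build an argument list using only options documented in the usage text."""
    args = [
        "Deliberator", mslib,
        "-o", output,             # output file (.msp or .sptxt)
        "-d", str(dp_threshold),  # dot-product threshold in [0, 1[
        "-t", str(tol),           # mz fragment tolerance
        "-p", str(precision),     # fractional digits in the output
    ]
    if quiet:
        args.append("-q")
    return args


if __name__ == "__main__":
    print(shlex.join(deliberator_args("library.sptxt", "decoy.sptxt")))
```

The values shown are the documented defaults, so passing them explicitly should be equivalent to omitting them.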
The latest version is v0.19.
Download the app with the default properties file.
- Deliberator used to generate the decoy library in sptxt format only. It now produces
  an sptxt or msp file depending on the file name extension.
- The MW is now evaluated as the "exact molar mass of the peptide ion"
  and not as the neutral molar mass.
- While shuffling peak sequence, some peak mz values became
  negative after recomputation given their fragment type. We now
- MS1/MS2 distribution computation sometimes crashed.
- Runtime behavior (change): parsed spectra (needed in the MS1/MS2 distributions process) are no longer kept
  in memory. They are now serialized to a Hadoop file, which reduces the memory overhead.
- Decoy process progression (change): the progression is now visible in a progress bar.