Generic MS Parsers
Overview
Our MS readers (specific and generic) can parse files in many formats, such as mgf, mzxml or mzml.
The main advantage of these parsers is that they handle huge data files through iterators and filters.
It is also possible to process blocks of spectra on the fly, within the limits of available memory.
Specific Parsing
We provide a few specific MS parsers that can be created explicitly, each of which parses one file format (dta, mgf, mzxml, ...):
import java.util.Iterator;
import org.expasy.jpl.io.ms.reader.MZXMLReader;
import org.expasy.jpl.io.ms.MassSpectrum;
// build the specific parser
MZXMLReader reader = MZXMLReader.newInstance();
reader.parse(mzxmlFile);
// get the iterator over the spectra
Iterator<MassSpectrum> it = reader.iterator();
while (it.hasNext()) {
    // get the next spectrum
    it.next();
}
Generic Parsing
We also provide a more generic way to parse MS spectra:
import java.util.Iterator;
import org.expasy.jpl.io.ms.MassSpectrum;
import org.expasy.jpl.io.ms.reader.MSReaderFacade;
// build a new generic MS parser
MSReader reader = MSReaderFacade.newInstance();
reader.parse(file);
// get information about the run
reader.getExperimentInfos();
// get the iterator over the spectra
Iterator<MassSpectrum> it = reader.iterator();
while (it.hasNext()) {
    // get the next spectrum
    it.next();
}
You can also restrict the file formats that this generic parser accepts:
// this reader can only read MS spectral libraries
reader = MSReaderFacade.withExtension(Pattern.compile("msp|sptxt", Pattern.CASE_INSENSITIVE));
// the following call will throw a ParseException because the mgf format
// is not readable by this parser
reader.parse("test.mgf");
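How the extension pattern is applied internally is not shown here. As a minimal sketch, assuming the pattern must match the whole file extension, the check could look like the following. The class `ExtensionCheck` and its methods are hypothetical helpers written for this illustration, not part of the library.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the library: illustrates how a pattern
// such as "msp|sptxt" can be matched against a file name extension.
class ExtensionCheck {

    /** Returns the text after the last dot, or "" if there is no extension. */
    static String extensionOf(String filename) {
        int dot = filename.lastIndexOf('.');
        return (dot < 0 || dot == filename.length() - 1)
                ? ""
                : filename.substring(dot + 1);
    }

    /** True if the whole extension matches the given pattern. */
    static boolean accepts(Pattern extensions, String filename) {
        return extensions.matcher(extensionOf(filename)).matches();
    }

    public static void main(String[] args) {
        Pattern libraries = Pattern.compile("msp|sptxt", Pattern.CASE_INSENSITIVE);
        System.out.println(accepts(libraries, "library.MSP")); // accepted, case is ignored
        System.out.println(accepts(libraries, "test.mgf"));    // rejected
    }
}
```

Because `matches()` requires the entire extension to match, a file such as `test.mgf` is rejected, which corresponds to the situation where the restricted reader throws its ParseException.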
Parsing With Filters
Filters (through JPLICondition) can be defined and attached to any MS parser to select the spectra to keep:
import org.expasy.jpl.core.util.condition.Condition;
import org.expasy.jpl.core.util.condition.ConditionImpl;
import org.expasy.jpl.core.util.condition.operator.OperatorLowerThan;
...
// an MS level filter on spectra
Condition<MassSpectrum> msLevelFilter(int level) {
    // shows how to access the MS level of a spectrum
    Transformer<MassSpectrum, Integer> sp2level =
        new Transformer<MassSpectrum, Integer>() {
            public Integer process(MassSpectrum sp) {
                return sp.getPeakList().getMsLevel();
            }
        };
    return new ConditionImpl.Builder<MassSpectrum, Integer>(level)
        .accessor(sp2level).build();
}
...
reader.setFilter(msLevelFilter(1));
reader.parse(file);
Iterator<MassSpectrum> it = reader.iterator();
while (it.hasNext()) {
    // get the next MS1 spectrum
    it.next();
}
See the conditions in jpl-commons for more information.
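To make the idea of coupling a filter to an iterator concrete, here is a minimal, library-independent sketch. It assumes the filter is applied lazily, one element at a time, so that rejected spectra are never accumulated in memory. `FilteringIterator` and `FilterDemo` are hypothetical names, spectra are stood in for by their MS level (an `Integer`), and the real readers may be implemented differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: wraps any iterator and only yields accepted elements.
class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T pending;          // next accepted element, if already found
    private boolean hasPending; // true when 'pending' holds a valid element

    FilteringIterator(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // advance the source until an element passes the filter
        while (!hasPending && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        T result = pending;
        pending = null;
        return result;
    }
}

class FilterDemo {
    /** Collects every remaining element of the iterator into a list. */
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // each Integer stands for the MS level of one spectrum
        List<Integer> levels = Arrays.asList(1, 2, 2, 1, 2);
        Iterator<Integer> ms1Only =
                new FilteringIterator<Integer>(levels.iterator(), level -> level == 1);
        System.out.println(drain(ms1Only).size()); // number of MS1 entries kept
    }
}
```

The filter is evaluated inside `hasNext()`, so only one accepted element is held at any time, whatever the size of the input.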
Progress Bar Ready
Any progress bar that implements JPLProgressBar can be attached to any MS parser:
// a generic parser
MSReader parser = MSReaderFacade.newInstance();
// new terminal progress bar
TerminalProgressBar pb = TerminalProgressBar.indeterminate();
// set the progress bar length
pb.setBarLength(20);
// set the roaming segment length
pb.setSegmentLength(12);
parser.setProgressBar(pb);
parser.parse(new File(paramManager.getFilename()));
Iterator<MassSpectrum> it = parser.iterator();
while (it.hasNext()) {
    // each entry parsed internally
    // increments the step in the progress bar
    it.next();
}
// 0 [============ ]
// .. [ ============ ]
// 1000 [ ============ ]
// 5949 [ ============ ]
// ..
// 5979 [ ============ ]
// ..
// 9208 [====================]
// task finished
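The sample output above shows a fixed-width segment roaming inside the bar. As a rough illustration only, the drawing of such an indeterminate bar can be sketched as below. `RoamingBar` is a hypothetical class, and the assumption that the segment wraps back to the left edge (rather than bouncing) is ours; TerminalProgressBar may behave differently.

```java
// Hypothetical sketch of an indeterminate ("roaming") terminal bar.
class RoamingBar {

    /**
     * Draws a bar of barLength cells containing a segment of segmentLength
     * '=' characters whose position depends on the current step.
     */
    static String render(int barLength, int segmentLength, int step) {
        // number of cells the segment can shift to the right
        int travel = barLength - segmentLength;
        // assumption: the segment wraps around once it reaches the right edge
        int offset = travel <= 0 ? 0 : Math.floorMod(step, travel + 1);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < barLength; i++) {
            sb.append(i >= offset && i < offset + segmentLength ? '=' : ' ');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // same dimensions as in the example: bar of 20 cells, segment of 12
        for (int step = 0; step <= 8; step += 4) {
            System.out.println(render(20, 12, step));
        }
    }
}
```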
Block Parsing
Another parsing mode is available in the new version. It returns spectra in blocks rather than one at a time:
reader.parse(file);
Iterator<MassSpectrum> it = reader.iterator();
// the maximum number of spectra in a block (depending on
// allocated memory)
reader.getNextToListMaxSize();
// get the next block
List<MassSpectrum> l = it.nextToList(245);
while (it.hasNext()) {
    // get the next bunch of spectra
    l = it.nextToList();
}
Calls to nextToList() may cause memory issues on large files. The MS readers deal with this problem by automatically estimating and capping the size of each returned list.
For those who would like to tune the block parameters, memory management can also be configured easily:
// set the maximum fraction of the total free memory dedicated to the
// nextToList block for all ms-readers (optional)
reader.setMaxMemoryRatio(0.5);
// set the max memory in bytes for a block (100 KB)
reader.setNextToListMemoryLimit(100000);
reader.parse(file);
Iterator<MassSpectrum> it = reader.iterator();
// get the next block
List<MassSpectrum> l = it.nextToList();
// comparing with the max number of spectra in a block equivalent
// to a size of 100 KB
assertTrue(l.size() <= reader.getNextToListMaxSize());
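The way a memory budget translates into a maximum number of spectra per block is not detailed here. The following is a minimal sketch of one plausible scheme, under stated assumptions: the block budget is the smaller of the explicit byte limit and a fraction of the available memory, and each spectrum is assumed to occupy a fixed estimated number of bytes. `BlockSketch`, its methods and the 2000-byte estimate are all hypothetical and do not describe the library's actual estimator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these names belong to the library.
class BlockSketch {

    /**
     * Caps the block budget at a fraction of the available memory, then
     * converts that budget into a number of spectra.
     */
    static int maxBlockSize(long availableBytes, double maxMemoryRatio,
                            long blockLimitBytes, long bytesPerSpectrum) {
        long budget = Math.min(blockLimitBytes,
                (long) (availableBytes * maxMemoryRatio));
        long count = budget / bytesPerSpectrum;
        // always allow at least one spectrum so that iteration can progress
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, count));
    }

    /** Collects up to maxSize elements from the iterator into a new list. */
    static <T> List<T> nextToList(Iterator<T> it, int maxSize) {
        List<T> block = new ArrayList<>();
        while (block.size() < maxSize && it.hasNext()) {
            block.add(it.next());
        }
        return block;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // memory the JVM could still allocate: max heap minus what is in use
        long available = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
        // 100 KB block limit, half of the available memory at most,
        // and an assumed 2000 bytes per spectrum
        int max = maxBlockSize(available, 0.5, 100_000, 2_000);
        System.out.println("max spectra per block: " + max);

        // integers stand in for spectra
        Iterator<Integer> it = Arrays.asList(1, 2, 3, 4, 5).iterator();
        while (it.hasNext()) {
            List<Integer> block = nextToList(it, 2);
            System.out.println("block of " + block.size());
        }
    }
}
```

With a 100 000-byte limit and 2000 bytes per spectrum, the cap is 50 spectra as long as half of the available memory exceeds the limit; when memory is scarce, the ratio becomes the binding constraint instead.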