File Configuration

Overview

We avoid hard-coding constants wherever possible. Instead, constants live in text files so that they can be changed without recompiling the code. The downside is that these files are effectively another API, so we have to keep their format stable across releases. The advantage is that users can alter the behavior of the code themselves, which keeps the library flexible and ready to evolve.

Parsing File

We started exposing conf files with our chemical parsers, in which any part of the format definition for atoms and molecules can be tuned.

A conf file works like a properties object: each entry has a key and a value. Entries define tokens and grammars (patterns), and the format of the key tags which is which. For now the key suffixes are hard-coded :-) as "_TOKEN" and "_PATTERN". A pattern value may refer to tokens by their key.
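To make the key/value idea concrete, here is a minimal sketch of how such definitions could be resolved into plain regular-expression strings. It is not the library's loader: the class name ConfResolverSketch and its resolve method are hypothetical, and the sketch assumes one definition per line, the `KEY := operand + operand` syntax used in the example on this page, and that a key is defined before it is referenced. Quoted literals containing an escaped double quote are not handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the library. */
final class ConfResolverSketch {
    // One definition per line; the key must end in _TOKEN or _PATTERN.
    private static final Pattern DEFINITION =
            Pattern.compile("^\\s*(\\w+_(?:TOKEN|PATTERN))\\s*:=\\s*(.+?)\\s*$");
    // An operand is either a double-quoted literal or the key of another entry.
    private static final Pattern OPERAND = Pattern.compile("\"([^\"]*)\"|(\\w+)");

    /** Resolves each definition to its final string, in file order. */
    static Map<String, String> resolve(List<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher definition = DEFINITION.matcher(line);
            if (!definition.matches()) {
                continue; // blank line or anything that is not a definition
            }
            StringBuilder value = new StringBuilder();
            Matcher operand = OPERAND.matcher(definition.group(2));
            while (operand.find()) { // the "+" signs between operands are skipped
                if (operand.group(1) != null) {
                    value.append(operand.group(1)); // quoted literal, copied as-is
                } else {
                    String referenced = values.get(operand.group(2));
                    if (referenced == null) {
                        throw new IllegalArgumentException(
                                "Undefined key: " + operand.group(2));
                    }
                    value.append(referenced); // substitute the earlier definition
                }
            }
            values.put(definition.group(1), value.toString());
        }
        return values;
    }
}
```

Calling ConfResolverSketch.resolve on the lines of a conf file returns a map from each key to its fully substituted value, which can then be handed to Pattern.compile.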

Configurator

For now, tokens and patterns are retrieved through JPLChemicalFormatConfigurator and its static methods getToken(name) and getPattern(name). The class contains a static loader that parses tokens.conf and patterns.conf; it offers no mechanism to load other files. This will have to be fixed, as more parsers will need to extend the basic language defined in these conf files.

Example

Configuration file content:

	AA_TOKEN := "AminoAcid"
	
	MONOMER_DELIMITOR_TOKEN := "\s*\:\s*"
	MONOMER_PATTERN := AA_TOKEN + "|" + NUC_TOKEN + "|" + DNUC_TOKEN
	MODIF_MONOMER_PATTERN := "(" + MONOMER_PATTERN + ")" + MONOMER_DELIMITOR_TOKEN + "([A-Z])"
      

Java source code:

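	import java.util.regex.Pattern;
	...
	// Compile the pattern once from its configured definition.
	private static final Pattern MODIF_MONOMER_PATTERN =
	    Pattern.compile(JPLChemicalFormatConfigurator
	        .getPattern("MODIF_MONOMER_PATTERN"));
	...
	// Reuse the compiled pattern; String.matches would recompile the regex on every call.
	if (MODIF_MONOMER_PATTERN.matcher(modif).matches()) {
	    builder.addModifAt(getMonomer(modif), nakedPos);
	}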
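As a self-contained illustration of what the assembled pattern accepts, the sketch below builds the same expression by hand, without the configurator. AA_TOKEN comes from the configuration excerpt above; the values of NUC_TOKEN and DNUC_TOKEN are not shown on this page, so the ones used here are invented placeholders, and the class ModifMonomerSketch and its split method are hypothetical as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo class, not part of the library. */
final class ModifMonomerSketch {
    static final String AA_TOKEN = "AminoAcid";          // from the excerpt
    static final String NUC_TOKEN = "Nucleotide";        // assumption: placeholder value
    static final String DNUC_TOKEN = "DeoxyNucleotide";  // assumption: placeholder value

    // Same composition as in the configuration file, written as Java strings.
    static final String MONOMER_DELIMITOR_TOKEN = "\\s*\\:\\s*";
    static final String MONOMER_PATTERN =
            AA_TOKEN + "|" + NUC_TOKEN + "|" + DNUC_TOKEN;
    static final Pattern MODIF_MONOMER_PATTERN = Pattern.compile(
            "(" + MONOMER_PATTERN + ")" + MONOMER_DELIMITOR_TOKEN + "([A-Z])");

    /** Returns {monomer, letter} when the whole input matches, otherwise null. */
    static String[] split(String modif) {
        Matcher matcher = MODIF_MONOMER_PATTERN.matcher(modif);
        return matcher.matches()
                ? new String[] {matcher.group(1), matcher.group(2)}
                : null;
    }
}
```

The parentheses around MONOMER_PATTERN matter: the alternation operator has the lowest precedence in a regular expression, so without them the delimiter and the trailing letter would attach only to the last alternative.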
      

What's next

This token/grammar definition system is quite unsophisticated. We will most likely move to a dedicated Java module to handle it, much as the yacc/bison tokeniser/parser tandem did formerly in C.