Klibrate

From PROTEOMICA

Klibrate v1.14 is a program developed at the Jesus Vazquez Cardiovascular Proteomics Lab at the Centro Nacional de Investigaciones Cardiovasculares. It calibrates experimental data as a first step before those data are integrated into higher levels with the SanXoT program.

Calibration requires two parameters: k (the weight constant) and the variance. Both are calculated iteratively with the Levenberg-Marquardt algorithm, starting from seeds supplied by the user. The -f option skips the iterative calculation and uses both parameters exactly as provided. The variance can be recalculated in the integration that follows.

Klibrate needs two input files:

  • the original data file, containing a unique identifier for each scan (such as "RawFile05.raw-scan19289-charge2" or "File05B_scannumber12877_z3"), the Xi, which corresponds to log2(A/B), and the Vi, which corresponds to the weight of the measurement.
  • the relations file, containing a first column with the higher level identifiers (such as the peptide sequence, for example "CGLAGCGLLK", or the protein, if you wish to integrate scans directly into proteins, such as the UniProt accession number "P01308" or the KEGG gene ID "hsa:3630"), and a second column with the lower level identifiers used in the original data file (such as "RawFile05.raw-scan19289-charge2").
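
As an illustration of how the two input files fit together, the following is a minimal sketch that reads both tables and groups the scans under their higher level identifiers. It is not Klibrate's own code. It assumes tab-separated columns (the separator is not stated above), and the function names read_data, read_relations and group_scans, as well as the numeric values, are made up for the example.

```python
import csv
import io
from collections import defaultdict


def read_data(handle):
    """Return {scan_id: (x, v)} from a three-column table (id, x, v)."""
    data = {}
    for row in csv.reader(handle, delimiter="\t"):  # assumption: tab-separated
        if len(row) < 3:
            continue
        try:
            x, v = float(row[1]), float(row[2])
        except ValueError:
            continue  # skip a header or any row without numeric x and v
        data[row[0]] = (x, v)
    return data


def read_relations(handle):
    """Return {higher_id: [lower_id, ...]} from a two-column table."""
    relations = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            relations[row[0]].append(row[1])
    return dict(relations)


def group_scans(data, relations):
    """Attach (x, v) pairs to each higher level id, ignoring unknown scans."""
    grouped = {}
    for higher, lowers in relations.items():
        values = [data[low] for low in lowers if low in data]
        if values:
            grouped[higher] = values
    return grouped


# Illustrative contents only; the numbers are invented for this example.
DATA_TEXT = (
    "RawFile05.raw-scan19289-charge2\t0.25\t1500.0\n"
    "File05B_scannumber12877_z3\t-0.10\t800.0\n"
)
RELATIONS_TEXT = (
    "CGLAGCGLLK\tRawFile05.raw-scan19289-charge2\n"
    "CGLAGCGLLK\tFile05B_scannumber12877_z3\n"
)

data = read_data(io.StringIO(DATA_TEXT))
relations = read_relations(io.StringIO(RELATIONS_TEXT))
grouped = group_scans(data, relations)
print(grouped)
```

With real files, io.StringIO would be replaced by open(path, newline="").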

Klibrate delivers one output file:

  • the calibrated data file, containing the same information as the original data file, except that the values in the third column (the weights) are replaced by the calibrated weights, which can be used as input to the SanXoT program.

Usage:

klibrate.py [OPTIONS] -r[relations file] -d[original data file] -o[calibrated output file]

Arguments:

  -h, --help          Display this help and exit.
  -a, --analysis=string
                      Use a prefix for the output files. If this is not
                      provided, then the prefix will be taken from the data
                      file.
  -b, --no-verbose    Do not print result summary after executing.
  -d, --datafile      Input data file with text identifiers in the first
                      column, measured values (x) in the second column, and
                      uncalibrated weights (v) in the third column.
  -D, --outgraphdata=filename
                      To use a non-default name for the data used to create
                      calibration graph files.
  -f, --forceparameters
                      Use the parameters (k and variance) as provided, without
                      using the Levenberg-Marquardt algorithm.
  -g, --no-showgraph  Do not show the rank(V) vs 1 / MSD graph after
                      the calculation.
  -G, --outgraphvvalue=filename
                      To use a non-default name for the graph file which shows
                      the value of V (the weight) versus 1 / MSD.
  -k, --kseed         Seed for the weight constant. Default is k = 1.
  -K, --kfile=filename
                      Get the K value from a text file. It must contain a line
                      (not more than once) with the text "K = [float]". This
                      suits the info file from another integration (see -L).
  -L, --infofile=filename
                      To use a non-default name for the info file.
  -m, --maxiterations Maximum number of iterations performed by the Levenberg-
                      Marquardt algorithm to calculate the variance and the k
                      constant. If unused, the default value of the algorithm
                      is taken.
  -o, --outputfile    To use a non-default output calibrated file name (see
                      above for more information on this file).
  -p, --place, --folder=foldername
                      To use a different common folder for the output files.
                      If this is not provided, the folder used will be the same
                      as the input folder.
  -r, --relfile, --relationsfile
                      Relations file, with identifiers of the higher level in
                      the first column, and identifiers of the lower level in
                      the second column.
  -R, --outgraphvrank=filename
                      To use a non-default name for the graph file which shows
                      the rank of V (the weight) versus 1 / MSD.
  -s, --no-showsteps  Do not print result summary and steps of each Levenberg-
                      Marquardt iteration.
  -v, --var, --varianceseed
                      Seed used to start calculating the variance.
                      Default is 0.001.
  -V, --varfile=filename
                      Get the variance value from a text file. It must contain
                      a line (not more than once) with the text
                      "Variance = [double]". This suits the info file from
                      another integration (see -L).
  -w, --window        The number of weight-ordered lower level elements
                      (scans, usually) that are taken at a time to calculate
                      the median of the weight, which is compared to the fit;
                      default is 200.
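
The -K and -V options expect a text file containing a line such as "K = [float]" or "Variance = [double]", appearing no more than once. The following is a minimal sketch of how such a line could be extracted. It is not Klibrate's own parser: the function name read_parameter is hypothetical, and the sketch assumes the label and value sit alone on their line and that the label is matched case-sensitively.

```python
import re

# Plain or scientific-notation decimal number, e.g. 35.28, .5, 1e-3
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"


def read_parameter(text, name):
    """Return the float from the single line 'name = value' found in text.

    Raises ValueError when the line is missing or appears more than once,
    mirroring the "not more than once" requirement of -K and -V.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(name) + r"[ \t]*=[ \t]*(" + NUMBER + r")[ \t]*$",
        re.MULTILINE,
    )
    matches = pattern.findall(text)
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one '%s = ...' line, found %d" % (name, len(matches))
        )
    return float(matches[0])


# Illustrative info-file contents, reusing the values from the examples below.
INFO_TEXT = "Variance = 0.02922\nK = 35.28\n"

k_value = read_parameter(INFO_TEXT, "K")
variance_value = read_parameter(INFO_TEXT, "Variance")
print(k_value, variance_value)
```

Rejecting repeated lines keeps the choice of value unambiguous when an info file from another integration is reused.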


Examples:

  • To calculate the variance and k starting with the seeds v = 0.03 and k = 40, printing the steps of the Levenberg-Marquardt algorithm and the results, and showing the rank(V) vs 1 / MSD graph afterwards (none of -g, -b or -s is passed, since those options suppress this output):
klibrate.py -v0.03 -k40 -dC:\temp\originalDataFile.txt -rC:\temp\relationsFile.txt -oC:\temp\calibratedWeights.xls
  • To get fast results of a calibration, forcing a variance = 0.02922 and a k = 35.28 and skipping the graph:
klibrate.py -gf -v0.02922 -k35.28 -dC:\temp\originalDataFile.txt -rC:\temp\relationsFile.txt -oC:\temp\calibratedWeights.xls
  • To see the graph resulting from a calculation with a forced variance = 0.02922 and k = 35.28:
klibrate.py -f -v0.02922 -k35.28 -dC:\temp\originalDataFile.txt -rC:\temp\relationsFile.txt -oC:\temp\calibratedWeights.xls