Scores

CADET-Match has many scores available. Each score is designed to solve a particular type of problem. The most common ones are covered first. A few additional scores are retained in the code for backwards compatibility; they are not covered here and their use is not recommended.

{
    "scores": [
        {
        },
        {
        }
    ]
}
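As a rough illustration of how the scores list might be filled in, the sketch below builds two entries in Python and serializes them to JSON. The score names and slice times are made-up values, the keys are the ones documented further down this page, and duplicate_names is a hypothetical helper rather than part of CADET-Match.

```python
import json

# Two illustrative score entries; the names and times are made-up values.
scores = [
    {"name": "main_peak", "type": "Shape", "start": 100.0, "stop": 300.0},
    {"name": "peak_front", "type": "ShapeFront", "min_percent": 0.05, "max_percent": 0.95},
]


def duplicate_names(score_list):
    """Return the sorted score names that appear more than once (hypothetical helper)."""
    seen, dupes = set(), set()
    for score in score_list:
        name = score["name"]
        if name in seen:
            dupes.add(name)
        seen.add(name)
    return sorted(dupes)


# Only the "scores" fragment is shown; the rest of the configuration is omitted.
fragment = json.dumps({"scores": scores}, indent=4)
print(fragment)
print("duplicate names:", duplicate_names(scores))
```

Building the list in code makes it easy to check that every name is unique before the file is written.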

Common

A few things are common to nearly all the scores.

Almost all of them support slicing: a start and stop time can be given and the score will only look at that time slice. This is really useful if you know that a certain component corresponds to a particular peak.
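The idea of a time slice can be sketched in a few lines of Python. This is only an illustration of the concept, not CADET-Match's own code: the function name slice_by_time, the inclusive bounds, and the sample data are all assumptions.

```python
def slice_by_time(times, values, start=None, stop=None):
    """Keep only the samples whose time lies in [start, stop].

    A bound left as None is treated as open, mirroring the optional
    start/stop keys. Inclusive bounds are an assumption of this sketch.
    """
    kept = [
        (t, v)
        for t, v in zip(times, values)
        if (start is None or t >= start) and (stop is None or t <= stop)
    ]
    return [t for t, _ in kept], [v for _, v in kept]


# Made-up chromatogram sampled every 50 s.
times = [0, 50, 100, 150, 200, 250, 300, 350]
values = [0.0, 0.1, 0.8, 2.0, 1.2, 0.4, 0.1, 0.0]

peak_times, peak_values = slice_by_time(times, values, start=100.0, stop=300.0)
print(peak_times)
print(peak_values)
```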

The scores inherit csv and output_path from the experiment they are part of, but both can also be specified on an individual score. If you know a particular component is tied to a particular peak, it can be given a separate output_path and csv file to improve matching.

The derivative of the shape function can be very useful in matching data. Most chromatography peaks are not symmetrical and the derivative of the shape improves matching.
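To see why the derivative carries extra information for an asymmetric peak, the sketch below differentiates a made-up peak with a steep front and a long tail. The peak model and the central-difference helper are assumptions for illustration; how CADET-Match computes and compares derivatives is not shown here.

```python
import math


def asymmetric_peak(t, center=100.0, sigma_front=5.0, sigma_back=15.0):
    """Made-up peak: a Gaussian with a steep front and a long tail."""
    sigma = sigma_front if t < center else sigma_back
    return math.exp(-0.5 * ((t - center) / sigma) ** 2)


def central_difference(values, dt):
    """Approximate the time derivative at the interior sample points."""
    return [
        (values[i + 1] - values[i - 1]) / (2.0 * dt)
        for i in range(1, len(values) - 1)
    ]


dt = 1.0
times = [i * dt for i in range(201)]
signal = [asymmetric_peak(t) for t in times]
slope = central_difference(signal, dt)

steepest_rise = max(slope)
steepest_fall = -min(slope)
print(f"steepest rise: {steepest_rise:.4f}, steepest fall: {steepest_fall:.4f}")
```

The front and the back of the peak have clearly different slopes, so a comparison of derivatives can distinguish shapes that look similar in height and width alone.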

Another feature common to most of the scores is decay. In our experience working with experimentalists, pumps often have small delays, so the signal can be offset in time. Decay controls how that time offset is penalized: setting decay to 1 gives an immediate penalty, while setting decay to 0 gives a reduced penalty for small offsets. It is recommended to use decay = 1 when estimating porosity and decay = 0 for most other estimations.

Sum of squared error scores

Sum of squared errors (SSE) is a common metric in optimization, but it causes problems with chromatography. Pump delays and small flow rate variations tend to shift the peaks. A narrow peak that shifts is a particular problem for SSE: the score favors the peak being in the right place even if its shape is far from correct. These scores are not generally recommended.
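The effect can be reproduced with a small numerical experiment. The sketch below uses made-up Gaussian peaks, not real chromatograms: one candidate has exactly the right shape but elutes 10 s late, the other elutes at the right time but is twice as wide.

```python
import math


def gaussian(t, center, sigma):
    """Unit-height Gaussian peak used as a stand-in for a chromatogram."""
    return math.exp(-0.5 * ((t - center) / sigma) ** 2)


def sse(a, b):
    """Sum of squared differences between two equally sampled signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


times = [float(i) for i in range(201)]
experiment = [gaussian(t, 100.0, 5.0) for t in times]
shifted = [gaussian(t, 110.0, 5.0) for t in times]    # right shape, 10 s late
too_wide = [gaussian(t, 100.0, 10.0) for t in times]  # right time, twice as wide

sse_shifted = sse(experiment, shifted)
sse_too_wide = sse(experiment, too_wide)
print(f"SSE, correct shape but shifted: {sse_shifted:.2f}")
print(f"SSE, correct time but too wide: {sse_too_wide:.2f}")
```

In this example the candidate with the wrong shape gets the lower (better) SSE, which is the behavior described above.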

When using multiple scores SSE and non-SSE based scores can’t be mixed. This is due to numerical issues.
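A configuration can be checked for this before a run is started. The helper below is a hypothetical sketch, not part of CADET-Match; it assumes that the SSE based types are exactly the three named on this page (SSE, DextranSSE and fractionationSSE) and treats every other type as non-SSE.

```python
# Score types that this page describes as SSE based (assumed to be the full set).
SSE_TYPES = {"SSE", "DextranSSE", "fractionationSSE"}


def mixes_sse_and_non_sse(scores):
    """Return True if the list contains both SSE and non-SSE score types."""
    kinds = {score["type"] in SSE_TYPES for score in scores}
    return len(kinds) == 2


good = [
    {"name": "main_peak", "type": "Shape"},
    {"name": "peak_front", "type": "ShapeFront"},
]
bad = [
    {"name": "main_peak", "type": "Shape"},
    {"name": "peak", "type": "SSE"},
]

print(mixes_sse_and_non_sse(good))
print(mixes_sse_and_non_sse(bad))
```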

SSE based scores can’t be used for error modeling.

Shape

Shape is the most general purpose score for fitting. It looks at the similarity of the shape between synthetic and experimental data. The score prioritizes the shape being accurate over the shape eluting at exactly the right time. The shape is dictated by the physics of the reaction while the position can shift slightly due to things like pump delays.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| decay | Boolean | False | No | Sets the decay for time offsets |
| derivative | Boolean | True | No | Sets whether the derivative of the shape is used |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "main_peak",
    "type": "Shape",
    "start": 100.0,
    "stop": 300.0
}

Shape Front

Shape front is a modification of the shape score that only looks at the front of a peak. This is especially important with breakthrough curves where the back of the peak may not even exist.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| min_percent | Float | 0.02 | No | Fraction of the peak maximum (0.02 = 2%) that marks the bottom of the front of the peak |
| max_percent | Float | 0.98 | No | Fraction of the peak maximum (0.98 = 98%) that marks the top of the front of the peak |
| decay | Boolean | False | No | Sets the decay for time offsets |
| derivative | Boolean | True | No | Sets whether the derivative of the shape is used |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "peak_front",
    "type": "ShapeFront",
    "min_percent": 0.05,
    "max_percent": 0.95
}
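The role of min_percent and max_percent can be pictured with the sketch below, which finds the part of a sampled peak that rises from the lower threshold to the upper threshold. It is an assumption about what "front of the peak" means, written for illustration only; the function front_indices and the sample data are hypothetical and this is not CADET-Match's implementation.

```python
def front_indices(values, min_percent=0.02, max_percent=0.98):
    """Return (first, last) sample indices bounding the rising front.

    first: first sample at or above min_percent * peak maximum.
    last:  first sample after that at or above max_percent * peak maximum.
    """
    peak_index = max(range(len(values)), key=values.__getitem__)
    peak_max = values[peak_index]
    low, high = min_percent * peak_max, max_percent * peak_max
    first = next(i for i in range(peak_index + 1) if values[i] >= low)
    # The maximum itself always satisfies the upper threshold.
    last = next(i for i in range(first, peak_index + 1) if values[i] >= high)
    return first, last


# Made-up peak with its maximum at index 5.
values = [0.0, 0.01, 0.1, 0.5, 0.9, 1.0, 0.7, 0.3, 0.05, 0.0]

print(front_indices(values))
print(front_indices(values, min_percent=0.2, max_percent=0.8))
```

Narrowing the thresholds trims the noisy foot and the rounded top of the front, leaving the steep middle section.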

Shape Back

Shape back is a modification of the shape score that only looks at the back of a peak. This is often used when a system starts saturated and there is no front side of the peak and only a back side exists.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| min_percent | Float | 0.02 | No | Fraction of the peak maximum (0.02 = 2%) that marks the bottom of the back of the peak |
| max_percent | Float | 0.98 | No | Fraction of the peak maximum (0.98 = 98%) that marks the top of the back of the peak |
| decay | Boolean | False | No | Sets the decay for time offsets |
| derivative | Boolean | True | No | Sets whether the derivative of the shape is used |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "peak_back",
    "type": "ShapeBack",
    "start": 300,
    "stop": 600
}

SSE

SSE is the sum of squared errors, a typical score used in optimization. In chromatography the pulses are narrow and tend to shift position, which makes this a hard score to optimize with. For some problems this score may work, but in general it is not advised.

This score can’t be used with error modeling.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "peak",
    "type": "SSE",
    "start": 300,
    "stop": 600
}

Dextran Shape

Dextran Shape is a special score designed to deal with the non-idealities of dextran in a column. It can be used with any non-ideal molecule where only part of the front of the peak can be used. It automatically isolates the front of the peak and uses as much of it as possible.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "peak",
    "type": "DextranShape"
}

Dextran SSE

Dextran SSE uses the same slicing as Dextran Shape, but it scores with the sum of squared errors instead of similarity metrics.

This score can’t be used with error modeling.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "peak",
    "type": "DextranSSE"
}

Fractionation Slide

This score is used for fractionation. It requires an additional csv file with the fractionation data.

The fractionation file has 3 or more columns and is easiest to explain with the example below. The first column has a header of Start and its entries are the start times of fractionation in seconds. The second column has a header of Stop and its entries are the stop times of fractionation. The times don’t have to be continuous and can have gaps.

Each column after the first 2 starts with a header that is the component number, and the values in the column are the concentration of the sample in mol/m^3 (mM). If there is no data for a sample in a particular fraction, the entry can be left blank and it will be handled.

In the case below, 3 samples are collected from 400-450 s, 450-500 s, and 500-550 s for component 0 and component 1.

| Start | Stop | 0 | 1 |
| --- | --- | --- | --- |
| 400 | 450 | 0.0051 | 0.0054 |
| 450 | 500 | 0.0178 | 0.0190 |
| 500 | 550 | 0.0265 | 0.0287 |
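To make the file layout concrete, the sketch below writes the example data above as comma-separated text and reads it back with Python's standard csv module. The comma delimiter, the helper read_fractions, and the returned structure are assumptions for illustration; CADET-Match reads the file itself and may do so differently.

```python
import csv
import io

# The example data above as CSV text (contents of a hypothetical frac.csv).
FRACTION_CSV = """Start,Stop,0,1
400,450,0.0051,0.0054
450,500,0.0178,0.0190
500,550,0.0265,0.0287
"""


def read_fractions(handle):
    """Parse rows into (start, stop, {component number: concentration})."""
    rows = []
    for record in csv.DictReader(handle):
        start = float(record.pop("Start"))
        stop = float(record.pop("Stop"))
        # A blank cell means there is no data for that component in this fraction.
        concentrations = {
            int(component): float(text)
            for component, text in record.items()
            if text is not None and text.strip()
        }
        rows.append((start, stop, concentrations))
    return rows


fractions = read_fractions(io.StringIO(FRACTION_CSV))
for start, stop, concentrations in fractions:
    print(start, stop, concentrations)
```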

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| unit_name | String | None | Yes | Name of the unit operation that is fractionated (usually the outlet representing the UV detector) |
| fraction_csv | Path | None | Yes | csv file with fractionation data |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "fractionation",
    "type": "fractionationSlide",
    "unit_name": "unit_002",
    "fraction_csv": "frac.csv"
}

Fractionation SSE

Fractionation SSE is a fractionation based score that uses SSE. It was written for testing purposes in a paper and is not generally recommended.

This score can’t be used with error modeling.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| unit_name | String | None | Yes | Name of the unit operation that is fractionated (usually the outlet representing the UV detector) |
| fraction_csv | Path | None | Yes | csv file with fractionation data |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "fractionation",
    "type": "fractionationSSE",
    "unit_name": "unit_002",
    "fraction_csv": "frac.csv"
}

Ceiling

Ceiling is a special case score. It is almost always used with start and stop, and it ensures that nothing is above the defined value in the selected interval. On some experimental systems that run very close to overload, a fit can be obtained that has a large amount of material coming off during loading. Ceiling can be used as a restriction against that effect.

| Key | Values | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | String | None | Yes | Name of the score; it must be unique |
| max_value | Float | None | Yes | Maximum value that is allowed |
| start | Float | None | No | Slice start in seconds |
| stop | Float | None | No | Slice stop in seconds |
| csv | Path | None | No | Path to the csv file to match against |
| output_path | Path or List | None | No | Path or list of paths to the output data to match against |

{
    "name": "limit",
    "type": "Ceiling",
    "max_value": 0.05,
    "start": 0,
    "stop": 100
}
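The condition that Ceiling guards against can be written as a simple check on a simulated signal. The sketch below only tests whether any sample in the interval exceeds max_value; the helper exceeds_ceiling and the data are made up, and how CADET-Match turns a violation into a score value is not covered here.

```python
def exceeds_ceiling(times, values, max_value, start=None, stop=None):
    """Return True if any sample inside [start, stop] is above max_value."""
    return any(
        v > max_value
        for t, v in zip(times, values)
        if (start is None or t >= start) and (stop is None or t <= stop)
    )


# Made-up simulation output: a small bump during loading, then the main peak.
times = [0, 25, 50, 75, 100, 125, 150]
values = [0.00, 0.01, 0.02, 0.08, 0.03, 0.50, 0.90]

# Same limits as the configuration above: at most 0.05 between 0 s and 100 s.
print(exceeds_ceiling(times, values, max_value=0.05, start=0, stop=100))
```

Here the sample at 75 s is above the limit, so this simulated curve would violate the ceiling during loading even though the main peak lies outside the interval.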

Other

There are a few other scores that exist.

AbsoluteTime and AbsoluteHeight are used by error modeling and can’t be used for parameter estimation. When continueMCMC=1 is set these scores are automatically added when needed. It is not advisable to add these by hand.

There are many older variations of Shape, ShapeFront and ShapeBack. These are all obsolete now; the same things can be done with the derivative and decay options.