nnfwtbn package

Submodules

nnfwtbn.cut module

class nnfwtbn.cut.Cut(func=None, label=None)[source]

Bases: object

Representation of an analysis cut. The class can be used to apply event selections based on conditions on columns in a pandas dataframe or derived quantities.

Cuts store the condition to be applied to a dataframe. New cut objects accept all events by default. The selection can be restricted by passing a lambda to the constructor.

>>> sel_all = Cut()
>>> sel_pos = Cut(lambda df: df.value > 0)

The cut object lives independently of the dataframe. Calling the cut with a dataframe returns a new dataframe containing only rows which pass the selection criteria.

>>> df = pd.DataFrame([0, 1, -2, -3, 4], columns=["value"])
>>> sel_all(df)
   value
0      0
1      1
2     -2
3     -3
4      4
>>> sel_pos(df)
   value
1      1
4      4

The index array for a given data set is calculated by calling the idx_array() method with a dataframe.

>>> sel_pos.idx_array(df)
0    False
1     True
2    False
3    False
4     True
Name: value, dtype: bool

Cuts can be combined into logical expressions using the bitwise operators and (&), or (|), xor (^) and not (~).

>>> sel_even = Cut(lambda df: df.value % 2 == 0)
>>> sel_pos_even = sel_pos & sel_even
>>> sel_pos_even(df)
   value
4      4

Equivalently, cuts support logical operations directly using lambdas.

>>> sel_pos_even_lambda = sel_pos & (lambda df: df.value % 2 == 0)
>>> sel_pos_even_lambda(df)
   value
4      4

Cuts can be named by passing the 'label' argument to the constructor. Cut names can be used as labels during plotting to specify the plotted region.

>>> sel_sr = Cut(lambda df: df.is_sr == 1, label="Signal Region")
>>> sel_sr.label
'Signal Region'
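
The composition behavior above can be sketched with a minimal stand-alone implementation (a hypothetical illustration operating on plain Python lists instead of dataframes, not the package's actual code):

```python
class MiniCut:
    """Minimal illustrative stand-in for nnfwtbn.cut.Cut."""

    def __init__(self, func=None, label=None):
        # Without a function, every entry is accepted.
        self.func = func if func is not None else (lambda xs: [True] * len(xs))
        self.label = label

    def idx_array(self, xs):
        # Boolean mask marking which entries pass the selection.
        return self.func(xs)

    def __call__(self, xs):
        # Keep only entries whose mask entry is True.
        return [x for x, keep in zip(xs, self.idx_array(xs)) if keep]

    def __and__(self, other):
        # The other operand can be a MiniCut or any callable.
        other_func = other.func if isinstance(other, MiniCut) else other
        return MiniCut(lambda xs: [a and b for a, b in
                                   zip(self.func(xs), other_func(xs))])

sel_pos = MiniCut(lambda xs: [x > 0 for x in xs])
sel_even = MiniCut(lambda xs: [x % 2 == 0 for x in xs])
print((sel_pos & sel_even)([0, 1, -2, -3, 4]))  # → [4]
```
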
__and__(other)[source]

Returns a new cut implementing the logical AND of this cut and the other cut. The other operand can be a Cut or any callable.

__call__(dataframe)[source]

Applies the internally stored cut to the given dataframe and returns a new dataframe containing only entries passing the event selection.

__init__(func=None, label=None)[source]

Creates a new cut. The optional func argument is called with the dataframe upon evaluation. The function must return an index array. If the optional function is omitted, every row in the dataframe is accepted by this cut.

__invert__()[source]

Returns a new cut implementing the logical NOT of this cut.

__module__ = 'nnfwtbn.cut'
__or__(other)[source]

Returns a new cut implementing the logical OR of this cut and the other cut. The other operand can be a Cut or any callable.

__rand__(other)[source]
__ror__(other)[source]
__rxor__(other)[source]
__xor__(other)[source]

Returns a new cut implementing the logical XOR of this cut and the other cut. The other can be a callable.

idx_array(dataframe)[source]

Applies the internally stored cut to the given dataframe and returns an index array, specifying which events pass the event selection.

nnfwtbn.error module

exception nnfwtbn.error.InvalidBins[source]

Bases: TypeError

__module__ = 'nnfwtbn.error'
exception nnfwtbn.error.InvalidBlinding[source]

Bases: TypeError

__module__ = 'nnfwtbn.error'
exception nnfwtbn.error.InvalidProcessSelection[source]

Bases: ValueError

__module__ = 'nnfwtbn.error'
exception nnfwtbn.error.InvalidProcessType[source]

Bases: ValueError

__module__ = 'nnfwtbn.error'

nnfwtbn.helpers module

nnfwtbn.helpers.python_to_str(obj)[source]

Convert an arbitrary Python object into a string and encode it in base64.

nnfwtbn.helpers.str_to_python(string)[source]

Reverse of the python_to_str() function.
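
A plausible realization of this pair is to pickle the object and base64-encode the resulting bytes (an assumption about the implementation for illustration; the actual module may use a different serializer):

```python
import base64
import pickle

def python_to_str(obj):
    """Serialize an arbitrary Python object and encode it in base64."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def str_to_python(string):
    """Reverse of python_to_str()."""
    return pickle.loads(base64.b64decode(string.encode("ascii")))

payload = {"selection": "signal", "k": 5}
print(str_to_python(python_to_str(payload)) == payload)  # → True
```
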

nnfwtbn.interface module

This module provides classes to interface between classifiers from other frameworks.

class nnfwtbn.interface.Classifier[source]

Bases: abc.ABC

Abstract classifier trained with another framework and loaded into nnfwtbn.

__abstractmethods__ = frozenset({'predict'})
__module__ = 'nnfwtbn.interface'
abstract predict()[source]

Returns an array with the predicted values.

class nnfwtbn.interface.TmvaBdt(filename)[source]

Bases: nnfwtbn.interface.Classifier

Experimental class to use BDTs from TMVA. The class has the following limitations:

  • The XML file must contain exactly one classifier.

  • The boosting method must be AdaBoost.

  • Fisher cuts cannot be used.

__abstractmethods__ = frozenset({})
__init__(filename)[source]

Loads the BDT from an XML file.

__module__ = 'nnfwtbn.interface'
predict(dataframe)[source]

Evaluate the BDT on the given dataframe. The method returns an array with the BDT scores.

nnfwtbn.model module

class nnfwtbn.model.BinaryCV(mod_var=None, frac_var=None, k=None)[source]

Bases: nnfwtbn.model.CrossValidator

Defines a training set and a test set using a binary split. There is no independent validation set in this case. The BinaryCV should not be used for parameter optimization.

fold 0: | Training   | Test & Val |
fold 1: | Test & Val | Training   |

The BinaryCV can be used after parameter optimization with ClassicalCV to retrain the model on the full training half. The validation performance contained in HepNet.history is then the test performance.

__abstractmethods__ = frozenset({})
__init__(mod_var=None, frac_var=None, k=None)[source]

k is always set to 2; the argument k has no effect.
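
The two-fold layout can be sketched with a fractional split variable in [0, 1) (a hypothetical helper for illustration; the real class derives the split from mod_var or frac_var):

```python
def binary_split(fracs, fold_i):
    """Return a boolean training mask for fold 0 or fold 1.

    Fold 0 trains on the first half (frac < 0.5), fold 1 on the second;
    the complement serves as the combined test-and-validation set.
    """
    if fold_i == 0:
        return [f < 0.5 for f in fracs]
    return [f >= 0.5 for f in fracs]

fracs = [0.1, 0.4, 0.6, 0.9]
print(binary_split(fracs, 0))  # → [True, True, False, False]
print(binary_split(fracs, 1))  # → [False, False, True, True]
```
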

__module__ = 'nnfwtbn.model'
select_slice(df, slice_id)[source]

Returns the index array to select all events from the dataset of a given slice.

NB: This method is for internal usage only. There might be more than k slices.

select_test(df, fold_i)[source]

Returns the index array to select all test events from the dataset for the given fold.

select_training(df, fold_i)[source]

Returns the index array to select all training events from the dataset for the given fold.

select_validation(df, fold_i)[source]

Returns the index array to select all validation events from the dataset for the given fold.

class nnfwtbn.model.ClassicalCV(k, mod_var=None, frac_var=None)[source]

Bases: nnfwtbn.model.CrossValidator

Performs the k-fold cross validation on half of the data set. The other half is designated as the test set.

fold 0: | Tr | Tr | Tr | Tr | Va | Test |
fold 1: | Tr | Tr | Tr | Va | Tr | Test |
fold 2: | Tr | Tr | Va | Tr | Tr | Test |
fold 3: | Tr | Va | Tr | Tr | Tr | Test |
fold 4: | Va | Tr | Tr | Tr | Tr | Test |

Va=Validation, Tr=Training
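
One plausible way to realize this layout (an illustrative sketch assuming a fractional split variable in [0, 1); the actual class derives the slices from mod_var or frac_var):

```python
def classical_cv_sets(fracs, k, fold_i):
    """Assign each event to 'train', 'val' or 'test' for a given fold.

    Sketch of the layout above: the half with frac >= 0.5 is the fixed
    test set; the other half is cut into k slices, one of which serves
    as the validation set of the given fold. (Hypothetical helper.)
    """
    sets = []
    for f in fracs:
        if f >= 0.5:
            sets.append("test")
        else:
            slice_id = int(f * 2 * k)  # slice index within the CV half
            sets.append("val" if slice_id == k - 1 - fold_i else "train")
    return sets

print(classical_cv_sets([0.05, 0.45, 0.7], 5, 0))  # → ['train', 'val', 'test']
```
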

__abstractmethods__ = frozenset({})
__module__ = 'nnfwtbn.model'
select_slice(df, slice_id)[source]

Returns the index array to select all events from the dataset of a given slice.

NB: This method is for internal usage only. There might be more than k slices.

select_test(df, fold_i)[source]

Returns the index array to select all test events from the dataset for the given fold.

select_training(df, fold_i)[source]

Returns the index array to select all training events from the dataset for the given fold.

select_validation(df, fold_i)[source]

Returns the index array to select all validation events from the dataset for the given fold.

class nnfwtbn.model.CrossValidator(k, mod_var=None, frac_var=None)[source]

Bases: abc.ABC

Abstract class of a cross validation method.

__abstractmethods__ = frozenset({'select_slice', 'select_test', 'select_training', 'select_validation'})
__eq__(other)[source]

Compare if two cross validators are the same.

__hash__ = None
__init__(k, mod_var=None, frac_var=None)[source]

Creates a new cross validator. The argument k determines the number of folds. The mod_var argument specifies a variable whose 'mod k' value defines the fold. The frac_var argument specifies a variable whose fractional part defines the fold. Only one of the two can be used. Both options can be either a string naming a column in the dataframe or a Variable object.

__module__ = 'nnfwtbn.model'
classmethod load_from_h5(path, key)[source]

Create a new cross validator instance from an hdf5 file. ‘path’ is the file path and ‘key’ is the path inside the hdf5 file.

retrieve_fold_info(df, cv)[source]

Returns an array of integers specifying which events were used for training/validation/testing in which fold.

save_to_h5(path, key, overwrite=False)[source]

Save the cross validator definition to an hdf5 file. 'path' is the file path and 'key' is the path inside the hdf5 file. If overwrite is True, already existing file contents are overwritten.

select_cv_set(df, cv, fold_i)[source]

Returns the index array to select all events from the cross validator set specified with cv (‘train’, ‘val’, ‘test’) for the given fold.

abstract select_slice(df, slice_id)[source]

Returns the index array to select all events from the dataset of a given slice.

NB: This method is for internal usage only. There might be more than k slices.

abstract select_test(df, fold_i)[source]

Returns the index array to select all test events from the dataset for the given fold.

abstract select_training(df, fold_i)[source]

Returns the index array to select all training events from the dataset for the given fold.

abstract select_validation(df, fold_i)[source]

Returns the index array to select all validation events from the dataset for the given fold.

class nnfwtbn.model.EstimatorNormalizer(df, input_list=None, center=None, width=None)[source]

Bases: nnfwtbn.model.Normalizer

Normalizer which uses estimators to compute the normalization moments. This method might lead to sub-optimal results if there are outliers.

__abstractmethods__ = frozenset({})
__call__(df)[source]

See base class.

__eq__(other)[source]

See base class.

__hash__ = None
__init__(df, input_list=None, center=None, width=None)[source]

See base class.

__module__ = 'nnfwtbn.model'
property offsets

Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the offset parameters for all variables.

property scales

Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the scale parameters for all variables.
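
The reduction to (offset + scale * x) follows directly from the estimators: with center c (e.g. the sample mean) and width w (e.g. the standard deviation), the normalization (x - c) / w is equivalent to offset = -c / w and scale = 1 / w (illustrative code, not the package's implementation):

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 5.0]
center = statistics.mean(values)    # estimator for the center
width = statistics.pstdev(values)   # estimator for the width

# (x - center) / width  ==  offset + scale * x
offset = -center / width
scale = 1.0 / width

normalized = [offset + scale * x for x in values]
# The normalized sample has (approximately) zero mean and unit width.
print(abs(statistics.mean(normalized)) < 1e-9)    # → True
print(abs(statistics.pstdev(normalized) - 1.0) < 1e-9)  # → True
```
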

class nnfwtbn.model.HepNet(keras_model, cross_validator, normalizer, input_list, output_list)[source]

Bases: object

Meta model of a concrete neural network around the underlying Keras model. The HEP net handles cross validation, normalization of the input variables, the input weights, and the actual Keras model. A HEP net has no free parameters.

__eq__(other)[source]

Check if two models have the same configuration.

__hash__ = None
__init__(keras_model, cross_validator, normalizer, input_list, output_list)[source]

Creates a new HEP model. The keras model parameter must be a class that returns a new instance of the compiled model (the HEP net needs to be able to create multiple models, one for each cross-validation fold).

The cross_validator must be a CrossValidator object.

The normalizer must be a Normalizer class that returns a normalizer. Each cross_validation fold uses a separate normalizer with independent normalization weights.

The input and output lists are lists of variables or column names used as input and target of the keras model. The input is normalized.
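
The factory requirement can be illustrated without Keras (hypothetical stand-in objects; a real keras_model argument would build and compile a Keras model on each call):

```python
def make_model():
    """Factory: each call returns a brand-new 'model' instance.

    A fresh instance per cross-validation fold keeps the folds
    independent of each other.
    """
    return {"weights": [0.0, 0.0], "trained": False}

k = 3
fold_models = [make_model() for _ in range(k)]

# Training one fold must not affect the others:
fold_models[0]["trained"] = True
print(fold_models[1]["trained"])  # → False
```
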

__module__ = 'nnfwtbn.model'
export(path_base, command='converters/keras2json.py', expression={})[source]

Exports the network such that it can be converted to lwtnn's json format. The method generates a set of files for each cross-validation fold. For every fold, the architecture, the weights, the input variables and their normalization are exported. To simplify the conversion to lwtnn's json format, the method also creates a bash script which converts all folds.

The path_base argument should be a path or a name of the network. The names of the generated files are created by appending to path_base.

The optional expression argument can be used to inject the CAF expression applied when the NN is used. The final json file will contain an entry KEY=VALUE if a variable matches the dict key.

fit(df, weight=None, **kwds)[source]

Calls fit() on all folds. All kwds are passed to fit().

classmethod load(path)[source]

Restore a model from an hdf5 file.

predict(df, cv='val', retrieve_fold_info=False, **kwds)[source]

Calls predict() on the Keras model. The argument cv specifies the cross validation set to select: ‘train’, ‘val’, ‘test’. Default is ‘val’.

All other keywords are passed to predict.

save(path)[source]

Save the model and all associated components to an hdf5 file.

class nnfwtbn.model.MixedCV(k, mod_var=None, frac_var=None)[source]

Bases: nnfwtbn.model.CrossValidator

Performs the k-fold cross validation where validation and test sets are both interleaved.

fold 0: | Tr | Tr | Tr | Te | Va |
fold 1: | Tr | Tr | Te | Va | Tr |
fold 2: | Tr | Te | Va | Tr | Tr |
fold 3: | Te | Va | Tr | Tr | Tr |
fold 4: | Va | Tr | Tr | Tr | Te |

Va=Validation, Tr=Training, Te=Test

__abstractmethods__ = frozenset({})
__module__ = 'nnfwtbn.model'
select_slice(df, slice_id)[source]

Returns the index array to select all events from the dataset of a given slice.

NB: This method is for internal usage only. There might be more than k slices.

select_test(df, fold_i)[source]

Returns the index array to select all test events from the dataset for the given fold.

select_training(df, fold_i)[source]

Returns the index array to select all training events from the dataset for the given fold.

select_validation(df, fold_i)[source]

Returns the index array to select all validation events from the dataset for the given fold.

class nnfwtbn.model.NoTestCV(mod_var=None, frac_var=None, k=10)[source]

Bases: nnfwtbn.model.CrossValidator

Uses the whole dataset for training and validation with a single fold. The test set is empty.

fold 0: | Training | Val |

The NoTestCV can be useful if the test dataset is provided independently from the training and validation, for example if a different generator is used for the training or if real-time (non-hep) data is used as a “test” set.

__abstractmethods__ = frozenset({})
__init__(mod_var=None, frac_var=None, k=10)[source]

The parameter k defines the inverse fraction of the validation set. For example, k=5 will allocate 1/5 = 20% of the dataset for validation.
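
This split can be sketched with a fractional variable in [0, 1): the last 1/k of the range goes into validation, everything else into training, and the test set stays empty (a hypothetical helper, not the class's actual code):

```python
def no_test_sets(fracs, k):
    """Assign 'train' or 'val' to each event; the test set is empty.

    k is the inverse validation fraction: events with
    frac >= (k - 1) / k form the 1/k validation share.
    """
    threshold = (k - 1) / k
    return ["val" if f >= threshold else "train" for f in fracs]

# With k=5, the validation share is 1/5 = 20% (frac >= 0.8).
print(no_test_sets([0.1, 0.5, 0.85, 0.95], 5))  # → ['train', 'train', 'val', 'val']
```
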

__module__ = 'nnfwtbn.model'
select_slice(df, slice_id)[source]

Returns the index array to select all events from the dataset of a given slice.

NB: This method is for internal usage only. There might be more than k slices.

select_test(df, fold_i)[source]

Returns the index array to select all test events from the dataset for the given fold. The test set is empty.

select_training(df, fold_i)[source]

Returns the index array to select all training events from the dataset. The fold_i parameter has no effect.

select_validation(df, fold_i)[source]

Returns the index array to select all validation events from the dataset for the given fold.

class nnfwtbn.model.Normalizer(df, input_list=None)[source]

Bases: abc.ABC

Abstract normalizer which shifts and scales the distribution such that it has zero mean and unit width.

__abstractmethods__ = frozenset({'__call__', '__eq__', '__init__', '_load_from_h5', '_save_to_h5', 'offsets', 'scales'})
abstract __call__(df)[source]

Applies the normalization to the input columns of the given dataframe and returns a normalized copy.

abstract __eq__(other)[source]

Check if two normalizers are the same.

__hash__ = None
abstract __init__(df, input_list=None)[source]

Returns a normalizer object with the normalization moments stored internally. The input_list argument specifies which inputs should be normalized. All other columns are left untouched.

__module__ = 'nnfwtbn.model'
classmethod load_from_h5(path, key)[source]

Create a new normalizer instance from an hdf5 file. ‘path’ is the file path and ‘key’ is the path inside the hdf5 file.

abstract property offsets

Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the offset parameters for all variables.

save_to_h5(path, key, overwrite=False)[source]

Save normalizer definition to a hdf5 file. ‘path’ is the file path and ‘key’ is the path inside the hdf5 file. If overwrite is true then already existing file contents are overwritten.

abstract property scales

Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the scale parameters for all variables.

nnfwtbn.model.normalize_category_weights(df, categories, weight='weight')[source]

The categorical weight normalizer acts on the weight variable only. The returned dataframe will satisfy the following conditions:

  • The sum of weights of all events is equal to the total number of entries.

  • The sum of weights of a category is equal to the total number of entries divided by the number of classes. Therefore the sums of weights of any two categories are equal.

  • The relative weights within a category are unchanged.
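
The three conditions can be satisfied by scaling each category's weights by a common factor so that the category sum becomes N / n_categories (an illustrative implementation on plain lists, not the package's pandas code):

```python
def normalize_category_weights(categories, weights):
    """Rescale weights so each category sums to N / n_categories."""
    n = len(weights)
    labels = sorted(set(categories))
    target = n / len(labels)  # desired sum of weights per category
    sums = {c: 0.0 for c in labels}
    for c, w in zip(categories, weights):
        sums[c] += w
    # A common per-category factor preserves relative weights.
    return [w * target / sums[c] for c, w in zip(categories, weights)]

cats = ["sig", "sig", "bkg", "bkg"]
wts = [1.0, 3.0, 2.0, 2.0]
new = normalize_category_weights(cats, wts)
print(new)       # → [0.5, 1.5, 1.0, 1.0]
print(sum(new))  # → 4.0  (equal to the number of entries)
```
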

nnfwtbn.plot module

class nnfwtbn.plot.HistogramFactory(*args, **kwds)[source]

Bases: object

Short-cut to create multiple histograms with the same set of processes or in the same region.

__call__(*args, **kwds)[source]

Proxy for hist(). The positional arguments passed to hist() are the positional arguments given to the constructor concatenated with the positional arguments given to this method. The keyword arguments for hist() are the union of the keyword arguments passed to the constructor and to this method. Arguments passed to this method take precedence.

The method returns the return value of hist().

__init__(*args, **kwds)[source]

Accepts any number of positional and keyword arguments. The arguments are stored internally and used as default values for hist(). See __call__().

__module__ = 'nnfwtbn.plot'
nnfwtbn.plot.confusion_matrix(df, x_processes, y_processes, x_label, y_label, weight=None, axes=None, figure=None, atlas='Internal', info=None, enlarge=1.3, normalize_rows=False, **kwds)[source]

Creates a confusion matrix.

nnfwtbn.plot.correlation_matrix(df, variables, weight=None, axes=None, figure=None, atlas='Internal', info=None, enlarge=1.3, normalize_rows=False, **kwds)[source]

Plot the Pearson correlation coefficient matrix. The square matrix is returned as a DataFrame.

nnfwtbn.plot.fill_labels(label, info)[source]
nnfwtbn.plot.hist(dataframe, variable, bins, stacks, selection=None, range=None, blind=None, figure_size=None, weight=None, y_log=False, y_min=None, vlines=[], denominator=0, numerator=-1, ratio_label=None, diff=False, ratio_range=None, atlas=None, info=None, enlarge=1.6, density=False, include_outside=False, return_uhepp=False, **kwds)[source]

Creates a histogram of stacked processes. The first argument is the dataframe to operate on. The ‘variable’ argument defines the x-axis. The variable argument can be a Variable object or a string naming a column in the dataframe.

The ‘bins’ argument can be an integer specifying the number of bins or a list with all bin boundaries. If it is an integer, the argument range is mandatory. The range argument must be a tuple with the lowest and highest bin edge. The properties of a Variable object are used for the x- and y-axis labels.

Stacks must be Stack objects. The plotting style is defined via the stack object.

The optional blind argument controls which stack should be blinded. The argument can be a single stack, a list of stacks or None. By default, no stack is blinded.

This method creates a new figure and axes internally (handled by uhepp). The figure size can be changed with the figure_size argument. If this argument is not None, it must be a tuple of (width, height).

The method returns the (figure, axes) pair used during plotting. If a ratio plot is drawn, the axes return value is a list of the main and ratio plot axes.

The weight is used to weight the entries. Entries have unit weight if omitted. The argument can be a string name of a column or a variable object.

If the y_log argument is set to True, the y axis will be logarithmic. The axis is enlarged on a logarithmic scale to make room for the ATLAS labels. The optional y_min argument can be used to set the lower limit of the y axis. The default is 0 for linear scale, and 1 for logarithmic scale.

The option vlines can be used to draw vertical lines onto the histogram, e.g., a cut line. The argument should be an array with one item per line. If an item is a number, a red line is drawn at that x-position. If it is a dict, the item 'x' determines the position; all other keywords are passed to matplotlib's plot method.

The ratio_label option controls the label of the ratio plot.

The ratio_range argument controls the y-range of the ratio plot. If set to None, it will scale automatically to include all points. The default is None.

If diff is set to True, the difference between the numerator and the denominator is drawn instead of their ratio.

The module constants ATLAS and INFO are passed to atlasify. Overwrite them to change the badges.

If the density argument is True, the area of each stack is normalized to unity.

If return_uhepp is True, the method returns a UHepPlot object.

nnfwtbn.plot.human_readable(label)[source]

Convert labels to plain ASCII strings.

nnfwtbn.plot.roc(df, signal_process, background_process, discriminant, steps=100, selection=None, min=None, max=None, axes=None, weight=None, atlas='Internal', info=None, enlarge=1.3, return_auc=False)[source]

Creates a ROC curve.

The method returns a dataframe with the signal efficiency and background rejection columns. The length of the dataframe equals the steps parameter.

If return_auc is True, the method returns a tuple with the area under the curve and an uncertainty estimation on the area.
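
The efficiency/rejection scan can be sketched as follows (an illustrative construction, not the package's actual implementation; integer scores are used here so the example is exact):

```python
def roc_points(signal_scores, background_scores, cuts):
    """Scan cut values and return (signal efficiency, background rejection).

    For each cut, the signal efficiency is the fraction of signal at or
    above the cut; the background rejection is the fraction of background
    below it. The number of cuts plays the role of the steps parameter.
    """
    points = []
    for cut in cuts:
        eff = sum(s >= cut for s in signal_scores) / len(signal_scores)
        rej = sum(b < cut for b in background_scores) / len(background_scores)
        points.append((eff, rej))
    return points

sig = [8, 9]  # discriminant scores of signal events
bkg = [1, 2]  # discriminant scores of background events
print(roc_points(sig, bkg, cuts=[1, 5, 9]))  # → [(1.0, 0.0), (1.0, 1.0), (0.5, 1.0)]
```
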

nnfwtbn.process module

class nnfwtbn.process.Process(label, selection=None, range=None, range_var=None)[source]

Bases: object

This class represents a physics process to be selected during training and plotting. The class stores the cuts to select the process’ events from a dataframe, its style and human-readable name for plotting.

DEFAULT_RANGE_VAR = 'fpid'
__call__(dataframe)[source]

Returns a dataframe containing only the events of this process.

__init__(label, selection=None, range=None, range_var=None)[source]

Returns a new process object. The process has a human-readable name (potentially using LaTeX) and a selection cut. The selection argument can be a Cut object or any callable. Stacking of processes is handled by the plotting method.

>>> Process("Top", lambda d: d.is_top)
<Process 'Top': (func)>
>>> Process("VBF", lambda d: d.is_VBFH)
<Process 'VBF': (func)>

The optional argument range accepts a two-value tuple and is a shortcut to define a selection cut accepting events whose 'range_var' lies between the given values (boundaries included). The range_var can be a string naming a column in the dataframe or a Variable object.

>>> Process("Z\\rightarrow\\ell\\ell", range=(-599, -500))
<Process 'Z\\rightarrow\\ell\\ell': [-599, -500]>

If the range_var argument is omitted, the value of Process.DEFAULT_RANGE_VAR is used, which defaults to 'fpid'.

A process behaves like a cut in many ways. For example, the __call__() and idx_array() methods behave identically.

__module__ = 'nnfwtbn.process'
__repr__()[source]

Returns a string representation of the process.

idx_array(dataframe)[source]

Returns the index array of the given dataframe which selects all events of this process.

nnfwtbn.stack module

class nnfwtbn.stack.DataStack(*args, **kwds)[source]

Bases: nnfwtbn.stack.Stack

Short-hand class for a Stack with only data-like processes.

__init__(*args, **kwds)[source]

Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.

The object is initialized with the processes passed to the method.

__module__ = 'nnfwtbn.stack'
add_process(*args, **kwds)[source]

Adds a new process to the stack. Arguments passed to this method take precedence over the defaults passed to the constructor.

The process argument must be a Process object with information about the selection and the label in the legend. The histtype argument can take the values 'step', 'stepfilled', 'line', 'points' and controls the type of the histogram. If data_uncertainty is set to True, get_total_uncertainty() will return sqrt(get_total()). This is useful when plotting Asimov data. If the option is False, the weights are used to compute the uncertainty.

Additional keyword arguments are stored internally for the plotting method to be forwarded to matplotlib.

class nnfwtbn.stack.McStack(*args, **kwds)[source]

Bases: nnfwtbn.stack.Stack

Short-hand class for a Stack with only Monte-Carlo-like processes.

__init__(*args, **kwds)[source]

Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.

The object is initialized with the processes passed to the method.

__module__ = 'nnfwtbn.stack'
add_process(*args, **kwds)[source]

Adds a new process to the stack. Arguments passed to this method take precedence over the defaults passed to the constructor.

The process argument must be a Process object with information about the selection and the label in the legend. The histtype argument can take the values 'step', 'stepfilled', 'line', 'points' and controls the type of the histogram. If data_uncertainty is set to True, get_total_uncertainty() will return sqrt(get_total()). This is useful when plotting Asimov data. If the option is False, the weights are used to compute the uncertainty.

Additional keyword arguments are stored internally for the plotting method to be forwarded to matplotlib.

class nnfwtbn.stack.Stack(*processes, histtype='stepfilled', data_uncertainty=False, palette=None, **aux)[source]

Bases: object

This class represents a collection of Processes drawn as a stack in histograms created with hist(). The Stack class stores information about the plotting style (e.g. markersize, linestyle), the histogram type (step, stepfilled, points), the color wheel, and the method to compute the total uncertainty of the stack.

A stack is not tied to a specific plot. It can be reused for plots with different binning, different variables or different selections.

__init__(*processes, histtype='stepfilled', data_uncertainty=False, palette=None, **aux)[source]

Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.

The object is initialized with the processes passed to the method.

__len__()[source]

Returns the number of processes in this stack.

__module__ = 'nnfwtbn.stack'
add_process(process, histtype=None, data_uncertainty=None, **aux)[source]

Adds a new process to the stack. Arguments passed to this method take precedence over the defaults passed to the constructor.

The process argument must be a Process object with information about the selection and the label in the legend. The histtype argument controls the type of the histogram and can take the values ‘step’, ‘stepfilled’, ‘line’ or ‘points’. If data_uncertainty is set to True, get_total_uncertainty() returns sqrt(get_total()); this is useful when plotting Asimov data. If the option is False, the weights are used to compute the uncertainty.

Additional keyword arguments are stored internally for the plotting method to be forwarded to matplotlib.
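
The two uncertainty modes described above can be sketched as follows. This is an illustrative re-implementation, not the class's actual code; the sumw2 input (the per-bin sum of squared weights) is a hypothetical name introduced here for clarity:

```python
import math

def total_uncertainty(yields, sumw2, data_uncertainty):
    """Per-bin uncertainty of a stack (sketch).

    With data_uncertainty=True the Poisson approximation sqrt(N) is
    used, as for Asimov-style data; otherwise the uncertainty is the
    square root of the summed squared weights.
    """
    if data_uncertainty:
        return [math.sqrt(y) for y in yields]
    return [math.sqrt(w2) for w2 in sumw2]
```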

get_aux(i)[source]

Returns the auxiliary keyword arguments. The returned dict is a mix of the default keyword arguments updated by the ones used when adding a process.

get_hist(df, i, bins, variable, weight, include_outside=False)[source]

Returns the yields per bin for the i-th process in the stack. The bins argument specifies the bin edges.

get_histtype(i)[source]

Returns the histtype of process i.

get_total(df, bins, variable, weight, include_outside=False)[source]

Returns the sum of yields per bin of all processes. The bins argument specifies the bin edges.
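
The total is the bin-wise sum over the per-process histograms. A minimal sketch of that arithmetic (the real method first computes each histogram from the dataframe):

```python
def stack_total(histograms):
    """Bin-wise sum of per-process yields (sketch).

    histograms is a list of equal-length yield lists, one per process.
    """
    return [sum(bin_yields) for bin_yields in zip(*histograms)]
```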

get_total_uncertainty(df, bins, variable, weight, include_outside=False)[source]

Returns the uncertainty of the total yield per bin. The bins argument specifies the bin edges.

get_uncertainty(df, i, bins, variable, weight, include_outside=False)[source]

Returns the uncertainty of the yield per bin for the i-th process. The bins argument specifies the bin edges.

is_data_uncertainty(i)[source]

Returns True if process i uses data uncertainties.

class nnfwtbn.stack.SystStack(df, *args, **kwds)[source]

Bases: nnfwtbn.stack.Stack

__init__(df, *args, **kwds)[source]

Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.

The object is initialized with the processes passed to the method.

__module__ = 'nnfwtbn.stack'
get_hist(df, *args, **kwds)[source]

Returns the yields per bin for the i-th process in the stack. The bins argument specifies the bin edges.

get_stat_uncertainty(df, *args, **kwds)[source]
get_syst_uncertainty(df, bins, variable, weight, include_outside=False)[source]
get_total(df, *args, **kwds)[source]

Returns the sum of yields per bin of all processes. The bins argument specifies the bin edges.

get_total_uncertainty(*args, **kwds)[source]

Returns the uncertainty of the total yield per bin. The bins argument specifies the bin edges.

class nnfwtbn.stack.TruthStack(*processes, histtype='stepfilled', data_uncertainty=False, palette=None, **aux)[source]

Bases: nnfwtbn.stack.Stack

__module__ = 'nnfwtbn.stack'
get_total_uncertainty(df, bins, *args, **kwds)[source]

Returns the uncertainty of the total yield per bin. The bins argument specifies the bin edges.

nnfwtbn.stack.not_none_or_default(value, default)[source]

Returns the value if it is not None. Otherwise returns the default.
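
The documented behavior amounts to the following one-liner (an equivalent sketch, not necessarily the module's exact code):

```python
def not_none_or_default(value, default):
    """Returns value unless it is None, in which case default is returned."""
    return value if value is not None else default
```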

nnfwtbn.toydata module

This module implements methods to generate a deterministic, physics-inspired toy dataset. The dataset is intended for documentation and examples. The module does not rely on external random number generators (seeding numpy might break user code).

nnfwtbn.toydata.augment(point)[source]
nnfwtbn.toydata.draw(rng, pdf, size=1, lower=0, upper=1, N=100)[source]

Draws a size-shaped random sample from the given PDF. The PDF must be normalized to unity within the given limits.
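
One deterministic way to implement such sampling is numerical inversion of the CDF on an N-point grid. The sketch below maps a single uniform variate u to a sample; the actual draw() implementation may differ:

```python
def draw_one(u, pdf, lower=0.0, upper=1.0, N=100):
    """Invert the CDF of pdf numerically (sketch): accumulate
    probability mass over N grid cells and return the midpoint of the
    cell where the cumulative sum first reaches u."""
    dx = (upper - lower) / N
    cdf = 0.0
    for i in range(N):
        x = lower + (i + 0.5) * dx
        cdf += pdf(x) * dx
        if cdf >= u:
            return x
    return upper
```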

nnfwtbn.toydata.generate(total, vbfh_frac=0.2, shuffle=True)[source]
nnfwtbn.toydata.get()[source]
nnfwtbn.toydata.mcmc(length, pdf)[source]

Generates length samples using Markov chain Monte Carlo.

nnfwtbn.toydata.mcmc_step(x, pdf)[source]

Performs a single step of Markov chain Monte Carlo.
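
A single Metropolis step accepts a proposed point with probability min(1, pdf(candidate) / pdf(x)). The sketch below makes the proposal function and the uniform-number source explicit arguments for illustration; the actual mcmc_step(x, pdf) obtains them internally:

```python
def metropolis_step(x, pdf, propose, uniform):
    """One Metropolis step (sketch): draw a candidate from the
    proposal and accept it with probability
    min(1, pdf(candidate) / pdf(x)); otherwise stay at x."""
    candidate = propose(x)
    accept_prob = min(1.0, pdf(candidate) / pdf(x))
    return candidate if uniform() < accept_prob else x
```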

nnfwtbn.toydata.proposal(point)[source]
nnfwtbn.toydata.vbfh_pdf(point)[source]

Returns the relative probability density at the given point. The function is not properly normalized. The outer dimension of the point contains the following values:

  • jet_1_pt

  • jet_1_eta

  • jet_1_phi

  • jet_2_pt

  • jet_2_eta

  • jet_2_phi

  • met_phi

  • met_pt

  • tau_phi

  • tau_eta

  • tau_pt

  • lep_phi

  • lep_eta

  • lep_pt

  • random value

nnfwtbn.toydata.ztt_pdf(point)[source]

nnfwtbn.variable module

class nnfwtbn.variable.BlindingStrategy[source]

Bases: abc.ABC

The BlindingStrategy class represents a blinding strategy. This is an abstract base class. Sub-classes must implement the __call__ method.

__abstractmethods__ = frozenset({'__call__'})
abstract __call__(dataframe, variable, bins, range=None)[source]

Returns the additional selection in order to blind a process. The first argument is the dataframe to operate on. The second argument is the variable whose histogram should be blinded. The arguments bins and range are identical to the ones for the hist method. They might be used in sub-classes to align the blinding cuts to bin borders.

__module__ = 'nnfwtbn.variable'
class nnfwtbn.variable.RangeBlindingStrategy(start, end)[source]

Bases: nnfwtbn.variable.BlindingStrategy

Concrete blinding strategy which removes all events within a given x-axis range. The range might be extended to match the bin borders.

__abstractmethods__ = frozenset({})
__call__(variable, bins, range=None)[source]

See base class. Returns the additional selection.

__init__(start, end)[source]

Returns a new RangeBlindingStrategy object. When the object is called, it returns a selection removing all events that lie between start and end. The range might be extended to match bin borders.
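
The core of such a selection is a mask that rejects values inside [start, end]. A minimal, dataframe-free sketch of that logic (the real strategy returns a Cut and may widen the window to bin borders):

```python
def range_blinding_mask(values, start, end):
    """True for entries that survive blinding, i.e. lie outside
    the closed interval [start, end] (sketch)."""
    return [v < start or v > end for v in values]
```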

__module__ = 'nnfwtbn.variable'
class nnfwtbn.variable.Variable(name, definition, unit=None, blinding=None)[source]

Bases: object

Representation of a quantity derived from the columns of a dataframe. The variable can also directly represent a column of the dataframe.

The variable object defines a human-readable name for the variable and its physical unit. The name and the unit are used for plotting and labeling of axes.

>>> Variable("MMC", "ditau_mmc_mlm_m", "GeV")
<Variable 'MMC' [GeV]>
__call__(dataframe)[source]

Returns an array or series of the variable's values computed from the given dataframe. This method does not apply the blinding!

__eq__(other)[source]

Compares two variables for equality.

__hash__ = None
__init__(name, definition, unit=None, blinding=None)[source]

Returns a new variable object. The first argument is a human-readable name (potentially using latex). The second argument defines the value of the variable. This can be a string naming the column of the dataframe or a callable that computes the value when a dataframe is passed to it.

>>> Variable("MMC", "ditau_mmc_mlm_m", "GeV")
<Variable 'MMC' [GeV]>
>>> Variable("$\\Delta \\eta$", lambda df: df.jet_0_eta - df.jet_1_eta)
<Variable '$\\Delta \\eta$'>

The optional argument unit defines the unit of the variable. This information is used for plotting, especially for labeling axes.

The optional blinding argument accepts a blinding object implementing the blinding strategy.
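
The two kinds of definitions can be resolved as sketched below, where a plain dict stands in for the dataframe (the actual class operates on pandas objects):

```python
def resolve_definition(definition, df):
    """Resolve a Variable definition (sketch): column-name strings are
    looked up in df; callables are evaluated on df itself."""
    if callable(definition):
        return definition(df)
    return df[definition]
```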

__module__ = 'nnfwtbn.variable'
__repr__()[source]

Returns a string representation.

classmethod load_from_h5(path, key)[source]

Creates a new Variable instance from an HDF5 file. ‘path’ is the file path and ‘key’ is the path inside the HDF5 file.

save_to_h5(path, key, overwrite=False)[source]

Saves the variable definition to an HDF5 file. ‘path’ is the file path and ‘key’ is the path inside the HDF5 file. If overwrite is True, already existing file contents are overwritten.

Module contents