nnfwtbn package¶
Subpackages¶
Submodules¶
nnfwtbn.cut module¶
-
class nnfwtbn.cut.Cut(func=None, label=None)[source]¶ Bases: object
Representation of an analysis cut. The class can be used to apply event selections based on conditions on columns in a pandas dataframe or derived quantities.
Cuts store the condition to be applied to a dataframe. New cut objects accept all events by default. The selection can be restricted by passing a lambda to the constructor.
>>> sel_all = Cut()
>>> sel_pos = Cut(lambda df: df.value > 0)
The cut object lives independently of the dataframe. Calling the cut with a dataframe returns a new dataframe containing only rows which pass the selection criteria.
>>> df = pd.DataFrame([0, 1, -2, -3, 4], columns=["value"])
>>> sel_all(df)
   value
0      0
1      1
2     -2
3     -3
4      4
>>> sel_pos(df)
   value
1      1
4      4
The index array for a given dataset is calculated by calling the idx_array() method with a dataframe.
>>> sel_pos.idx_array(df)
0    False
1     True
2    False
3    False
4     True
Name: value, dtype: bool
Cuts can be used to build logical expressions using the bitwise and (&), or (|), xor (^), and not (~) operators.
>>> sel_even = Cut(lambda df: df.value % 2 == 0)
>>> sel_pos_even = sel_pos & sel_even
>>> sel_pos_even(df)
   value
4      4
Equivalently, cuts support logical operations directly using lambdas.
>>> sel_pos_even_lambda = sel_pos & (lambda df: df.value % 2 == 0)
>>> sel_pos_even_lambda(df)
   value
4      4
Cuts can be named by passing the ‘label’ argument to the constructor. Cut names can be used as labels during plotting to specify the plotted region.
>>> sel_sr = Cut(lambda df: df.is_sr == 1, label="Signal Region")
>>> sel_sr.label
'Signal Region'
-
__and__
(other)[source]¶ Returns a new cut implementing the logical AND of this cut and the other cut. The other operand can be a Cut or any callable.
-
__call__
(dataframe)[source]¶ Applies the internally stored cut to the given dataframe and returns a new dataframe containing only entries passing the event selection.
-
__init__
(func=None, label=None)[source]¶ Creates a new cut. The optional func argument is called with the dataframe upon evaluation. The function must return an index array. If the optional function is omitted, every row in the dataframe is accepted by this cut.
-
__module__
= 'nnfwtbn.cut'¶
-
__or__
(other)[source]¶ Returns a new cut implementing the logical OR of this cut and the other cut. The other operand can be a Cut or any callable.
-
nnfwtbn.error module¶
nnfwtbn.helpers module¶
nnfwtbn.interface module¶
This module provides classes to interface between classifiers from other frameworks.
-
class nnfwtbn.interface.Classifier[source]¶ Bases: abc.ABC
Abstract base class for a classifier trained with another framework and loaded into nnfwtbn.
-
__abstractmethods__
= frozenset({'predict'})¶
-
__module__
= 'nnfwtbn.interface'¶
-
-
class nnfwtbn.interface.TmvaBdt(filename)[source]¶ Bases: nnfwtbn.interface.Classifier
Experimental class to use BDTs from TMVA. The class has the following limitations:
- The XML file must contain exactly one classifier.
- The boosting method must be AdaBoost.
- Fisher cuts cannot be used.
-
__abstractmethods__
= frozenset({})¶
-
__module__
= 'nnfwtbn.interface'¶
nnfwtbn.model module¶
-
class nnfwtbn.model.BinaryCV(mod_var=None, frac_var=None, k=None)[source]¶ Bases: nnfwtbn.model.CrossValidator
Defines a training set and a test set using a binary split. There is no independent validation set in this case. The BinaryCV should not be used for parameter optimization.
fold 0: | Training   | Test & Val |
fold 1: | Test & Val | Training   |
The BinaryCV can be used after parameter optimization with ClassicalCV to retrain the model on the full training half. The validation performance contained in HepNet.history is then the test performance.
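The binary split can be sketched independently of nnfwtbn. The helper below is hypothetical and only illustrates how a fractional per-event variable (as used with frac_var) could assign each event to the training half of one of the two folds:

```python
# Independent sketch, not the nnfwtbn implementation. "event_frac" is a
# hypothetical per-event variable whose fractional part defines the split.

def binary_fold(event_frac, fold):
    """Return True if the event is a training event in the given fold.

    Fold 0 trains on the first half (fractional part < 0.5), fold 1 on
    the second half; the respective other half serves as test and
    validation set.
    """
    in_first_half = (event_frac % 1.0) < 0.5
    return in_first_half if fold == 0 else not in_first_half
```

Every event is a training event in exactly one of the two folds, so the folds together cover the full dataset.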
-
__abstractmethods__
= frozenset({})¶
-
__module__
= 'nnfwtbn.model'¶
-
select_slice
(df, slice_id)[source]¶ Returns the index array to select all events from the dataset of a given slice.
NB: This method is for internal usage only. There might be more than k slices.
-
select_test
(df, fold_i)[source]¶ Returns the index array to select all test events from the dataset for the given fold.
-
-
class nnfwtbn.model.ClassicalCV(k, mod_var=None, frac_var=None)[source]¶ Bases: nnfwtbn.model.CrossValidator
Performs the k-fold cross validation on half of the data set. The other half is designated as the test set.
fold 0: | Tr | Tr | Tr | Tr | Va | Test |
fold 1: | Tr | Tr | Tr | Va | Tr | Test |
fold 2: | Tr | Tr | Va | Tr | Tr | Test |
fold 3: | Tr | Va | Tr | Tr | Tr | Test |
fold 4: | Va | Tr | Tr | Tr | Tr | Test |
Va=Validation, Tr=Training
-
__abstractmethods__
= frozenset({})¶
-
__module__
= 'nnfwtbn.model'¶
-
select_slice
(df, slice_id)[source]¶ Returns the index array to select all events from the dataset of a given slice.
NB: This method is for internal usage only. There might be more than k slices.
-
select_test
(df, fold_i)[source]¶ Returns the index array to select all test events from the dataset for the given fold.
-
-
class nnfwtbn.model.CrossValidator(k, mod_var=None, frac_var=None)[source]¶ Bases: abc.ABC
Abstract class of a cross validation method.
-
__abstractmethods__
= frozenset({'select_slice', 'select_test', 'select_training', 'select_validation'})¶
-
__hash__
= None¶
-
__init__
(k, mod_var=None, frac_var=None)[source]¶ Creates a new cross validator. The argument k determines the number of folds. The mod_var argument specifies a variable whose ‘mod k’ value defines the set. The frac_var argument specifies a variable whose decimal part defines the set. Only one of the two can be used. Both options can be either a string naming the column in the dataframe or a Variable object.
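The two slicing options can be illustrated with an independent sketch (these helpers are not part of nnfwtbn):

```python
# Independent sketch of the two slice-assignment schemes described above.

def slice_by_mod(value, k):
    """Slice index from an integer variable via 'value mod k'."""
    return int(value) % k

def slice_by_frac(value, k):
    """Slice index from the decimal part of a variable."""
    return int((value % 1.0) * k)
```

Both schemes map every event deterministically to one of k slices, so the assignment is reproducible across runs.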
-
__module__
= 'nnfwtbn.model'¶
-
classmethod
load_from_h5
(path, key)[source]¶ Create a new cross validator instance from an hdf5 file. ‘path’ is the file path and ‘key’ is the path inside the hdf5 file.
-
retrieve_fold_info
(df, cv)[source]¶ Returns an array of integers specifying which events were used for training/validation/test in each fold.
-
save_to_h5
(path, key, overwrite=False)[source]¶ Saves the cross validator definition to an hdf5 file. ‘path’ is the file path and ‘key’ is the path inside the hdf5 file. If overwrite is True, existing file contents are overwritten.
-
select_cv_set
(df, cv, fold_i)[source]¶ Returns the index array to select all events from the cross validator set specified with cv (‘train’, ‘val’, ‘test’) for the given fold.
-
abstract
select_slice
(df, slice_id)[source]¶ Returns the index array to select all events from the dataset of a given slice.
NB: This method is for internal usage only. There might be more than k slices.
-
abstract
select_test
(df, fold_i)[source]¶ Returns the index array to select all test events from the dataset for the given fold.
-
-
class nnfwtbn.model.EstimatorNormalizer(df, input_list=None, center=None, width=None)[source]¶ Bases: nnfwtbn.model.Normalizer
Normalizer which uses estimators to compute the normalization moments. This method might lead to sub-optimal results if there are outliers.
-
__abstractmethods__
= frozenset({})¶
-
__hash__
= None¶
-
__module__
= 'nnfwtbn.model'¶
-
property
offsets
¶ Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the offset parameters for all variables.
-
property
scales
¶ Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the scale parameters for all variables.
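The reduction to offset and scale can be sketched with plain estimators (an independent illustration, not the EstimatorNormalizer implementation): for a mean m and width s, choosing offset = -m/s and scale = 1/s makes offset + scale * x have zero mean and unit width.

```python
import math

def moments(values):
    """Sample mean and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

def lwtnn_parameters(values):
    """Offset and scale such that offset + scale * x is standardized."""
    mean, std = moments(values)
    return -mean / std, 1.0 / std
```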
-
-
class nnfwtbn.model.HepNet(keras_model, cross_validator, normalizer, input_list, output_list)[source]¶ Bases: object
Meta model of a concrete neural network around the underlying Keras model. The HEP net handles cross validation, normalization of the input variables, the input weights, and the actual Keras model. A HEP net has no free parameters.
-
__hash__
= None¶
-
__init__
(keras_model, cross_validator, normalizer, input_list, output_list)[source]¶ Creates a new HEP model. The keras_model parameter must be a class that returns a new instance of the compiled model. (The HEP net needs to be able to create multiple models, one for each cross validation fold.)
The cross_validator must be a CrossValidator object.
The normalizer must be a Normalizer class that returns a normalizer. Each cross_validation fold uses a separate normalizer with independent normalization weights.
The input and output lists are lists of variables or column names used as input and target of the keras model. The input is normalized.
-
__module__
= 'nnfwtbn.model'¶
-
export
(path_base, command='converters/keras2json.py', expression={})[source]¶ Exports the network such that it can be converted to lwtnn’s json format. The method generates a set of files for each cross validation fold. For every fold, the architecture, the weights, the input variables and their normalization are exported. To simplify the conversion to lwtnn’s json format, the method also creates a bash script which converts all folds.
The path_base argument should be a path or a name of the network. The names of the generated files are created by appending to path_base.
The optional expression dict can be used to inject the CAF expression applied when the NN is used. The final json file will contain an entry KEY=VALUE if a variable matches the dict key.
-
-
class nnfwtbn.model.MixedCV(k, mod_var=None, frac_var=None)[source]¶ Bases: nnfwtbn.model.CrossValidator
Performs the k-fold cross validation where validation and test sets are both interleaved.
fold 0: | Tr | Tr | Tr | Te | Va |
fold 1: | Tr | Tr | Te | Va | Tr |
fold 2: | Tr | Te | Va | Tr | Tr |
fold 3: | Te | Va | Tr | Tr | Tr |
fold 4: | Va | Tr | Tr | Tr | Te |
Va=Validation, Tr=Training, Te=Test
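One consistent reading of the diagram above is that the validation slice walks backwards through the k slices while the test slice cyclically precedes it. A hypothetical helper (not part of nnfwtbn) expressing this:

```python
# Independent sketch of the slice roles per fold, derived from the
# MixedCV diagram above; all other slices are training slices.

def mixed_cv_roles(k, fold):
    """Return (test_slice, validation_slice) for the given fold."""
    validation = (k - 1 - fold) % k
    test = (validation - 1) % k
    return test, validation
```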
-
__abstractmethods__
= frozenset({})¶
-
__module__
= 'nnfwtbn.model'¶
-
select_slice
(df, slice_id)[source]¶ Returns the index array to select all events from the dataset of a given slice.
NB: This method is for internal usage only. There might be more than k slices.
-
select_test
(df, fold_i)[source]¶ Returns the index array to select all test events from the dataset for the given fold.
-
-
class nnfwtbn.model.NoTestCV(mod_var=None, frac_var=None, k=10)[source]¶ Bases: nnfwtbn.model.CrossValidator
Uses the whole dataset for training and validation with a single fold. The test set is empty.
fold 0: | Training | Val |
The NoTestCV can be useful if the test dataset is provided independently from the training and validation, for example if a different generator is used for the training or if real-time (non-hep) data is used as a “test” set.
-
__abstractmethods__
= frozenset({})¶
-
__init__
(mod_var=None, frac_var=None, k=10)[source]¶ The parameter k defines the inverse fraction of the validation set. For example, k=5 will allocate 1/5 = 20% of the dataset for validation.
-
__module__
= 'nnfwtbn.model'¶
-
select_slice
(df, slice_id)[source]¶ Returns the index array to select all events from the dataset of a given slice.
NB: This method is for internal usage only. There might be more than k slices.
-
select_test
(df, fold_i)[source]¶ Returns the index array to select all test events from the dataset for the given fold. The test set is empty.
-
-
class nnfwtbn.model.Normalizer(df, input_list=None)[source]¶ Bases: abc.ABC
Abstract normalizer which shifts and scales the distribution such that it has zero mean and unit width.
-
__abstractmethods__
= frozenset({'__call__', '__eq__', '__init__', '_load_from_h5', '_save_to_h5', 'offsets', 'scales'})¶
-
abstract
__call__
(df)[source]¶ Applies the normalization of the input columns to the given dataframe and returns a normalized copy.
-
__hash__
= None¶
-
abstract
__init__
(df, input_list=None)[source]¶ Returns a normalizer object with the normalization moments stored internally. The input_list argument specifies which inputs should be normalized. All other columns are left untouched.
-
__module__
= 'nnfwtbn.model'¶
-
classmethod
load_from_h5
(path, key)[source]¶ Create a new normalizer instance from an hdf5 file. ‘path’ is the file path and ‘key’ is the path inside the hdf5 file.
-
abstract property
offsets
¶ Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the offset parameters for all variables.
-
save_to_h5
(path, key, overwrite=False)[source]¶ Saves the normalizer definition to an hdf5 file. ‘path’ is the file path and ‘key’ is the path inside the hdf5 file. If overwrite is True, existing file contents are overwritten.
-
abstract property
scales
¶ Every normalizer must reduce to a simple (offset + scale * x) normalization to be used with lwtnn. This property returns the scale parameters for all variables.
-
-
nnfwtbn.model.normalize_category_weights(df, categories, weight='weight')[source]¶ The categorical weight normalizer acts on the weight variable only. The returned dataframe will satisfy the following conditions:
- The sum of weights of all events is equal to the total number of entries.
- The sum of weights of a category is equal to the total number of entries divided by the number of classes. Therefore, the sums of weights of any two categories are equal.
- The relative weights within a category are unchanged.
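These three conditions can be sketched independently of nnfwtbn (the function below operates on plain lists, not a dataframe):

```python
# Independent sketch of the three normalization conditions above.

def normalize_category_weights(weights_by_category):
    """Rescale weights so every category carries an equal share.

    weights_by_category maps a category name to the list of its event
    weights.
    """
    n_total = sum(len(w) for w in weights_by_category.values())
    n_categories = len(weights_by_category)
    target = n_total / n_categories  # sum of weights per category
    result = {}
    for category, weights in weights_by_category.items():
        factor = target / sum(weights)  # preserves relative weights
        result[category] = [w * factor for w in weights]
    return result
```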
nnfwtbn.plot module¶
-
class nnfwtbn.plot.HistogramFactory(*args, **kwds)[source]¶ Bases: object
Short-cut to create multiple histograms with the same set of processes or in the same region.
-
__call__
(*args, **kwds)[source]¶ Proxy for hist(). The positional arguments passed to hist() are the positional arguments given to the constructor concatenated with the positional arguments given to this method. The keyword arguments for hist() are the union of the keyword arguments passed to the constructor and this method. The arguments passed to this method take precedence.
The method returns the return value of hist().
-
__init__
(*args, **kwds)[source]¶ Accepts any number of positional and keyword arguments. The arguments are stored internally and used as default values for hist(). See __call__().
-
__module__
= 'nnfwtbn.plot'¶
-
-
nnfwtbn.plot.confusion_matrix(df, x_processes, y_processes, x_label, y_label, weight=None, axes=None, figure=None, atlas='Internal', info=None, enlarge=1.3, normalize_rows=False, **kwds)[source]¶ Creates a confusion matrix.
-
nnfwtbn.plot.correlation_matrix(df, variables, weight=None, axes=None, figure=None, atlas='Internal', info=None, enlarge=1.3, normalize_rows=False, **kwds)[source]¶ Plots the Pearson correlation coefficient matrix. The square matrix is returned as a DataFrame.
-
nnfwtbn.plot.hist(dataframe, variable, bins, stacks, selection=None, range=None, blind=None, figure_size=None, weight=None, y_log=False, y_min=None, vlines=[], denominator=0, numerator=-1, ratio_label=None, diff=False, ratio_range=None, atlas=None, info=None, enlarge=1.6, density=False, include_outside=False, return_uhepp=False, **kwds)[source]¶ Creates a histogram of stacked processes. The first argument is the dataframe to operate on. The ‘variable’ argument defines the x-axis. The variable argument can be a Variable object or a string naming a column in the dataframe.
The ‘bins’ argument can be an integer specifying the number of bins or a list with all bin boundaries. If it is an integer, the argument range is mandatory. The range argument must be a tuple with the lowest and highest bin edge. The properties of a Variable object are used for the x- and y-axis labels.
Stacks must be Stack objects. The plotting style is defined via the stack object.
The optional blind argument controls which stack should be blinded. The argument can be a single stack, a list of stacks or None. By default, no stack is blinded.
This method creates a new figure and axes internally (handled by uhepp). The figure size can be changed with the figure_size argument. If this argument is not None, it must be a tuple of (width, height).
The method returns (figure, axes) which were used during plotting. These might be identical to the figure and axes arguments. If a ratio plot is drawn, the axes return value is a list of the main and ratio axes.
The weight is used to weight the entries. Entries have unit weight if omitted. The argument can be a string name of a column or a variable object.
If the y_log argument is set to True, the y axis will be logarithmic. The axis is enlarged on a logarithmic scale to make room for the ATLAS labels. The optional y_min argument can be used to set the lower limit of the y axis. The default is 0 for linear scale, and 1 for logarithmic scale.
The option vlines can be used to draw vertical lines onto the histogram, e.g., a cut line. The argument should be an array with one item per line. If an item is a number, a red line is drawn at that x-position. If it is a dict, the item ‘x’ determines the position and all other keywords are passed to matplotlib’s plot method.
The ratio_label option controls the label of the ratio plot.
The ratio_range argument controls the y-range of the ratio plot. If set to None, it scales automatically to include all points. The default is None.
If diff is set to True, the difference between the ‘numerator’ and the ‘denominator’ is drawn instead of their ratio.
The module constants ATLAS and INFO are passed to atlasify. Overwrite them to change the badges.
If the density argument is True, the area of each stack is normalized to unity.
If return_uhepp is True, the method returns a UHepPlot object.
-
nnfwtbn.plot.roc(df, signal_process, background_process, discriminant, steps=100, selection=None, min=None, max=None, axes=None, weight=None, atlas='Internal', info=None, enlarge=1.3, return_auc=False)[source]¶ Creates a ROC curve.
The method returns a dataframe with the signal efficiency and background rejection columns. The length of the dataframe equals the steps parameter.
If return_auc is True, the method returns a tuple with the area under the curve and an uncertainty estimation on the area.
nnfwtbn.process module¶
-
class nnfwtbn.process.Process(label, selection=None, range=None, range_var=None)[source]¶ Bases: object
This class represents a physics process to be selected during training and plotting. The class stores the cuts to select the process’ events from a dataframe, its style and human-readable name for plotting.
-
DEFAULT_RANGE_VAR
= 'fpid'¶
-
__init__
(label, selection=None, range=None, range_var=None)[source]¶ Returns a new process object. The process has a human-readable name (potentially using LaTeX) and a selection cut. The selection argument can be a Cut object or any callable. Stacking of processes is handled by the plotting method.
>>> Process("Top", lambda d: d.is_top)
<Process 'Top': (func)>
>>> Process("VBF", lambda d: d.is_VBFH)
<Process 'VBF': (func)>
The optional argument range accepts a two-value tuple and is a shortcut to define a selection cut accepting events whose ‘range_var’ lies between the given values (boundaries included). The range_var can be a string naming a column in the dataframe or a Variable object.
>>> Process("Z\\rightarrow\\ell\\ell", range=(-599, -500))
<Process 'Z\\rightarrow\\ell\\ell': [-599, -500]>
If the range_var argument is omitted, the value of Process.DEFAULT_RANGE_VAR is used, which defaults to ‘fpid’.
A process behaves like a cut in many ways. For example, the call() and idx_array methods are identical.
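The range shortcut amounts to an inclusive interval test on range_var. A minimal independent sketch with hypothetical fpid values (not the nnfwtbn implementation):

```python
# Independent sketch of the inclusive range selection described above.

def in_range(value, lower, upper):
    """Inclusive interval test used by the 'range' shortcut sketch."""
    return lower <= value <= upper

# Hypothetical fpid values; the range (-599, -500) selects the middle
# three entries, boundaries included.
fpids = [-600, -599, -550, -500, -499]
selected = [v for v in fpids if in_range(v, -599, -500)]
```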
-
__module__
= 'nnfwtbn.process'¶
-
nnfwtbn.stack module¶
-
class nnfwtbn.stack.DataStack(*args, **kwds)[source]¶ Bases: nnfwtbn.stack.Stack
Short-hand class for a Stack with only data-like processes.
-
__init__
(*args, **kwds)[source]¶ Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.
The object is initialized with the processes passed to the method.
-
__module__
= 'nnfwtbn.stack'¶
-
add_process
(*args, **kwds)[source]¶ Adds a new process to the stack. Arguments passed to this method take precedence over the defaults passed to the constructor.
The process argument must be a Process object with information about the selection and the label in the legend. The histtype argument can take one of the values ‘step’, ‘stepfilled’, ‘line’, or ‘points’ and controls the type of the histogram. If data_uncertainty is set to True, get_total_uncertainty() will return sqrt(get_total()). This is useful when plotting Asimov data. If the option is False, the weights are used to compute the uncertainty.
Additional keyword arguments are stored internally for the plotting method to be forwarded to matplotlib.
-
-
class nnfwtbn.stack.McStack(*args, **kwds)[source]¶ Bases: nnfwtbn.stack.Stack
Short-hand class for a Stack with only Monte-Carlo-like processes.
-
__init__
(*args, **kwds)[source]¶ Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.
The object is initialized with the processes passed to the method.
-
__module__
= 'nnfwtbn.stack'¶
-
add_process
(*args, **kwds)[source]¶ Adds a new process to the stack. Arguments passed to this method take precedence over the defaults passed to the constructor.
The process argument must be a Process object with information about the selection and the label in the legend. The histtype argument can take one of the values ‘step’, ‘stepfilled’, ‘line’, or ‘points’ and controls the type of the histogram. If data_uncertainty is set to True, get_total_uncertainty() will return sqrt(get_total()). This is useful when plotting Asimov data. If the option is False, the weights are used to compute the uncertainty.
Additional keyword arguments are stored internally for the plotting method to be forwarded to matplotlib.
-
-
class nnfwtbn.stack.Stack(*processes, histtype='stepfilled', data_uncertainty=False, palette=None, **aux)[source]¶ Bases: object
This class represents a collection of Processes drawn as a stack in histograms created with hist(). The Stack class stores information about the plotting style (e.g. markersize, linestyle), the histogram type (step, stepfilled, points), the color wheel, and the method to compute the total uncertainty of the stack.
A stack is not tied to a specific plot. It can be reused for plots with different binning, different variables or different selections.
-
__init__
(*processes, histtype='stepfilled', data_uncertainty=False, palette=None, **aux)[source]¶ Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.
The object is initialized with the processes passed to the method.
-
__module__
= 'nnfwtbn.stack'¶
-
add_process
(process, histtype=None, data_uncertainty=None, **aux)[source]¶ Adds a new process to the stack. Arguments passed to this method take precedence over the defaults passed to the constructor.
The process argument must be a Process object with information about the selection and the label in the legend. The histtype argument can take one of the values ‘step’, ‘stepfilled’, ‘line’, or ‘points’ and controls the type of the histogram. If data_uncertainty is set to True, get_total_uncertainty() will return sqrt(get_total()). This is useful when plotting Asimov data. If the option is False, the weights are used to compute the uncertainty.
Additional keyword arguments are stored internally for the plotting method to be forwarded to matplotlib.
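The two uncertainty modes can be sketched for a single bin (an independent illustration, not the Stack implementation):

```python
import math

# Independent sketch of the two per-bin uncertainty estimates described
# above, applied to the list of event weights falling into one bin.

def bin_uncertainty(weights, data_uncertainty):
    """Uncertainty of the bin's total yield."""
    if data_uncertainty:
        # Poisson estimate on the total yield (Asimov data)
        return math.sqrt(sum(weights))
    # usual weighted estimate: sqrt of the sum of squared weights
    return math.sqrt(sum(w * w for w in weights))
```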
-
get_aux
(i)[source]¶ Returns the auxiliary keyword arguments. The returned dict is a mix of the default keyword arguments updated by the ones used when adding a process.
-
get_hist
(df, i, bins, variable, weight, include_outside=False)[source]¶ Returns the yields per bin for the i-th process in the stack. The bins argument specifies the bin edges.
-
get_total
(df, bins, variable, weight, include_outside=False)[source]¶ Returns the sum of yields per bin of all processes. The bins argument specifies the bin edges.
-
get_total_uncertainty
(df, bins, variable, weight, include_outside=False)[source]¶ Returns the uncertainty of the total yield per bin. The bins argument specifies the bin edges.
-
-
class nnfwtbn.stack.SystStack(df, *args, **kwds)[source]¶ Bases: nnfwtbn.stack.Stack
-
__init__
(df, *args, **kwds)[source]¶ Creates a new stack and sets its default properties. If a process is added (via add_process()) to the stack without specifying a custom style, the defaults are used.
The object is initialized with the processes passed to the method.
-
__module__
= 'nnfwtbn.stack'¶
-
get_hist
(df, *args, **kwds)[source]¶ Returns the yields per bin for the i-th process in the stack. The bins argument specifies the bin edges.
-
-
class
nnfwtbn.stack.
TruthStack
(*processes, histtype='stepfilled', data_uncertainty=False, palette=None, **aux)[source]¶ Bases:
nnfwtbn.stack.Stack
-
__module__
= 'nnfwtbn.stack'¶
-
nnfwtbn.toydata module¶
This module implements methods to generate a deterministic, physics-inspired toy dataset. The dataset is intended for documentation and examples. The module does not rely on external random number generators (seeding numpy might break user code).
-
nnfwtbn.toydata.
draw
(rng, pdf, size=1, lower=0, upper=1, N=100)[source]¶ Draws a size-shaped random sample from the given PDF. The PDF must be normalized to unity within the given limits.
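One common way to implement such a draw is inversion sampling on a numerically tabulated CDF. The sketch below is illustrative only (it is not the module's actual code, and it uses a numpy Generator as the rng, which the real module avoids); N plays the same role as in the signature above:

```python
import numpy as np

def draw_sketch(rng, pdf, size=1, lower=0.0, upper=1.0, N=100):
    """Draw samples by numerically inverting the CDF (illustrative)."""
    x = np.linspace(lower, upper, N)
    cdf = np.cumsum(pdf(x))
    cdf /= cdf[-1]                  # force the tabulated CDF to end at 1
    u = rng.uniform(size=size)      # uniform random numbers in [0, 1)
    return np.interp(u, cdf, x)     # invert the CDF by interpolation

# Example: draw from a uniform PDF on [2, 5].
samples = draw_sketch(np.random.default_rng(42),
                      lambda x: np.ones_like(x),
                      size=1000, lower=2.0, upper=5.0)
```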
-
nnfwtbn.toydata.
vbfh_pdf
(point)[source]¶ Returns the relative probability density at the given point. The function is not properly normalized. The outer dimension of the point contains the following values:
jet_1_pt
jet_1_eta
jet_1_phi
jet_2_pt
jet_2_eta
jet_2_phi
met_phi
met_pt
tau_phi
tau_eta
tau_pt
lep_phi
lep_eta
lep_pt
random value
nnfwtbn.variable module¶
-
class
nnfwtbn.variable.
BlindingStrategy
[source]¶ Bases:
abc.ABC
The BlindingStrategy class represents a blinding strategy. This is an abstract base class. Sub-classes must implement the __call__ method.
-
__abstractmethods__
= frozenset({'__call__'})¶
-
abstract
__call__
(dataframe, variable, bins, range=None)[source]¶ Returns the additional selection in order to blind a process. The first argument is the dataframe to operate on. The second argument is the variable whose histogram should be blinded. The arguments bins and range are identical to the ones for the hist method. They might be used in sub-classes to align the blinding cuts to bin borders.
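A sub-class only needs to provide __call__ with this signature. The standalone sketch below mimics the interface without importing the package (the class and threshold are hypothetical; a real sub-class would derive from BlindingStrategy and return a Cut):

```python
import pandas as pd

class TailBlinding:
    """Illustrative blinding strategy: hide all events above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, dataframe, variable, bins, range=None):
        # Return a selection keeping only events below the threshold.
        # 'variable' is assumed to be callable on a dataframe, as a
        # nnfwtbn Variable would be.
        return lambda df: df[variable(df) < self.threshold]

df = pd.DataFrame({"mass": [50, 120, 300]})
selection = TailBlinding(200)(df, lambda d: d.mass, bins=10)
blinded = selection(df)
```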
-
__dict__
= mappingproxy({'__module__': 'nnfwtbn.variable', '__doc__': '\n The BlindingStrategy class represents a blinding strategy. This is an\n abstract base class. Sub-classes must implement the __call__ method.\n ', '__call__': <function BlindingStrategy.__call__>, '__dict__': <attribute '__dict__' of 'BlindingStrategy' objects>, '__weakref__': <attribute '__weakref__' of 'BlindingStrategy' objects>, '__abstractmethods__': frozenset({'__call__'}), '_abc_impl': <_abc_data object>, '__annotations__': {}})¶
-
__module__
= 'nnfwtbn.variable'¶
-
-
class
nnfwtbn.variable.
RangeBlindingStrategy
(start, end)[source]¶ Bases:
nnfwtbn.variable.BlindingStrategy
Concrete blinding strategy which removes all events within a certain x-axis range. The range might be extended to match the bin borders.
-
__abstractmethods__
= frozenset({})¶
-
__init__
(start, end)[source]¶ Returns a new RangeBlindingStrategy object. When the object is called, it returns a selection removing all events that lie between start and end. The range might be extended to match bin borders.
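The bin-border extension mentioned above can be sketched with numpy: snap the start down and the end up to the nearest bin edges. This is a sketch of the idea, not the package's implementation:

```python
import numpy as np

def snap_to_bin_edges(start, end, bins):
    """Widen [start, end] so both ends coincide with bin edges."""
    edges = np.asarray(bins)
    # Largest edge <= start, and smallest edge >= end.
    new_start = edges[np.searchsorted(edges, start, side="right") - 1]
    new_end = edges[np.searchsorted(edges, end, side="left")]
    return new_start, new_end

# Bins of width 10 on [0, 100]; a blinding window of [42, 67]
# grows to the enclosing bin borders [40, 70].
start, end = snap_to_bin_edges(42, 67, np.linspace(0, 100, 11))
```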
-
__module__
= 'nnfwtbn.variable'¶
-
-
class
nnfwtbn.variable.
Variable
(name, definition, unit=None, blinding=None)[source]¶ Bases:
object
Representation of a quantity derived from the columns of a dataframe. The variable can also directly represent a column of the dataframe.
The variable object defines a human-readable name for the variable and its physical unit. The name and the unit are used for plotting and labeling of axes.
>>> Variable("MMC", "ditau_mmc_mlm_m", "GeV") <Variable 'MMC' [GeV]>
-
__call__
(dataframe)[source]¶ Returns an array or series of the variable's values computed from the given dataframe. This method does not apply the blinding!
-
__dict__
= mappingproxy({'__module__': 'nnfwtbn.variable', '__doc__': '\n Representation of a quantity derived from the columns of a dataframe. The\n variable can also directly represent a column of the dataframe. \n\n The variable object defines a human-readable name for the variable and\n it\'s physical unit. The name and the unit are used for plotting and\n labeling of axes.\n\n >>> Variable("MMC", "ditau_mmc_mlm_m", "GeV")\n <Variable \'MMC\' [GeV]>\n ', '__init__': <function Variable.__init__>, '__call__': <function Variable.__call__>, '__repr__': <function Variable.__repr__>, '__eq__': <function Variable.__eq__>, 'save_to_h5': <function Variable.save_to_h5>, 'load_from_h5': <classmethod object>, '__dict__': <attribute '__dict__' of 'Variable' objects>, '__weakref__': <attribute '__weakref__' of 'Variable' objects>, '__hash__': None, '__annotations__': {}})¶
-
__hash__
= None¶
-
__init__
(name, definition, unit=None, blinding=None)[source]¶ Returns a new variable object. The first argument is a human-readable name (potentially using latex). The second argument defines the value of the variable. This can be a string naming the column of the dataframe or a callable that computes the value when a dataframe is passed to it.
>>> Variable("MMC", "ditau_mmc_mlm_m", "GeV") <Variable 'MMC' [GeV]>
>>> Variable("$\\Delta \\eta$", lambda df: df.jet_0_eta - df.jet_1_eta) <Variable '$\\Delta \\eta$'>
The optional argument unit defines the unit of the variable. This information is used for plotting, especially for labeling axes.
The optional blinding argument accepts a blinding object implementing the blinding strategy.
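The two kinds of definitions (a column name or a callable) could be resolved as in the following sketch (illustrative only, not the class's actual code):

```python
import pandas as pd

def resolve_definition(df, definition):
    """Return the named column for a string, else call the definition."""
    if isinstance(definition, str):
        return df[definition]
    return definition(df)

df = pd.DataFrame({"jet_0_eta": [2.0, -1.0], "jet_1_eta": [0.5, 1.0]})
column_var = resolve_definition(df, "jet_0_eta")
delta_eta = resolve_definition(df, lambda d: d.jet_0_eta - d.jet_1_eta)
```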
-
__module__
= 'nnfwtbn.variable'¶
-