metasklearn.utils package
metasklearn.utils.data_handler module
- class metasklearn.utils.data_handler.Data(X=None, y=None, name='Unknown')[source]
Bases:
object
The structure of our supported Data class.
- Parameters:
X (np.ndarray) – The features of your data
y (np.ndarray) – The labels of your data
- SUPPORT = {'scaler': ['standard', 'minmax', 'max-abs', 'log1p', 'loge', 'sqrt', 'sinh-arc-sinh', 'robust', 'box-cox', 'yeo-johnson']}
- class metasklearn.utils.data_handler.DataTransformer(scaling_methods=('standard',), list_dict_paras=None)[source]
Bases:
BaseEstimator, TransformerMixin
A scikit-learn compatible transformer that applies a sequence of scaling techniques to the input data, including standard, min-max, log, robust, and custom transformations.
- SUPPORTED_SCALERS
Dictionary mapping scaler names to their corresponding classes.
- Type:
dict
- SUPPORTED_SCALERS = {'box-cox': <class 'metasklearn.utils.scaler.BoxCoxScaler'>, 'log1p': <class 'metasklearn.utils.scaler.Log1pScaler'>, 'loge': <class 'metasklearn.utils.scaler.LogeScaler'>, 'max-abs': <class 'sklearn.preprocessing._data.MaxAbsScaler'>, 'minmax': <class 'sklearn.preprocessing._data.MinMaxScaler'>, 'robust': <class 'sklearn.preprocessing._data.RobustScaler'>, 'sinh-arc-sinh': <class 'metasklearn.utils.scaler.SinhArcSinhScaler'>, 'sqrt': <class 'metasklearn.utils.scaler.SqrtScaler'>, 'standard': <class 'sklearn.preprocessing._data.StandardScaler'>, 'yeo-johnson': <class 'metasklearn.utils.scaler.YeoJohnsonScaler'>}
- fit(X, y=None)[source]
Fit the sequence of scalers on the data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data.
y (Ignored) – Not used, exists for compatibility with sklearn’s pipeline.
- Returns:
self – Fitted transformer.
- Return type:
object
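To make "a sequence of scaling techniques" concrete, the sketch below chains two scikit-learn scalers by hand, fitting each one on the output of the previous one. Whether DataTransformer chains its scalers in exactly this way is an assumption, and the helper names fit_chain and apply_chain are hypothetical; they are not part of metasklearn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def fit_chain(scalers, X):
    """Fit each scaler on the output of the previous one (assumed behaviour)."""
    Xt = np.asarray(X, dtype=float)
    for scaler in scalers:
        Xt = scaler.fit_transform(Xt)
    return scalers


def apply_chain(scalers, X):
    """Apply already-fitted scalers in the same order they were fitted."""
    Xt = np.asarray(X, dtype=float)
    for scaler in scalers:
        Xt = scaler.transform(Xt)
    return Xt


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
chain = fit_chain([StandardScaler(), MinMaxScaler()], X)
X_scaled = apply_chain(chain, X)  # each column now spans [0, 1]
```

Order matters in such a chain: the last scaler determines the final range, so ending with a min-max step leaves every column in [0, 1] regardless of the earlier steps.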
- class metasklearn.utils.data_handler.FeatureEngineering[source]
Bases:
object
A class for performing custom feature engineering on numeric datasets.
- create_threshold_binary_features(X, threshold)[source]
Add binary indicator columns to mark values below a given threshold. Each original column is followed by a new column indicating whether each value is below the threshold (1 if True, 0 otherwise).
- Parameters:
X (numpy.ndarray) – The input 2D matrix of shape (n_samples, n_features).
threshold (float) – The threshold value used to determine binary flags.
- Returns:
A new 2D matrix of shape (n_samples, 2 * n_features), where each original column is followed by its binary indicator column.
- Return type:
numpy.ndarray
- Raises:
ValueError – If X is not a 2D NumPy array, or if threshold is not numeric.
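The column layout described above (each original column immediately followed by its indicator) can be sketched in plain NumPy. This is an independent illustration, not the library's implementation; the function name add_below_threshold_flags is hypothetical.

```python
import numpy as np


def add_below_threshold_flags(X, threshold):
    """Interleave each column of X with a 0/1 flag marking values below threshold."""
    if not isinstance(X, np.ndarray) or X.ndim != 2:
        raise ValueError("X must be a 2D NumPy array.")
    if not isinstance(threshold, (int, float)):
        raise ValueError("threshold must be numeric.")
    flags = (X < threshold).astype(X.dtype)
    out = np.empty((X.shape[0], 2 * X.shape[1]), dtype=X.dtype)
    out[:, 0::2] = X      # original columns go to even positions
    out[:, 1::2] = flags  # indicator columns go to odd positions
    return out


X = np.array([[0.5, 2.0], [1.5, 0.1]])
result = add_below_threshold_flags(X, threshold=1.0)
# result has shape (2, 4): [[0.5, 1.0, 2.0, 0.0], [1.5, 0.0, 0.1, 1.0]]
```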
- class metasklearn.utils.data_handler.TimeSeriesDifferencer(interval=1)[source]
Bases:
object
A class for applying and reversing differencing on time series data.
Differencing helps remove trends and seasonality from time series for better modeling.
- difference(X)[source]
Apply differencing to the input time series.
- Parameters:
X (array-like) – The original time series data.
- Returns:
The differenced time series of length (len(X) - interval).
- Return type:
np.ndarray
- inverse_difference(diff_data)[source]
Reverse the differencing transformation using the stored original data.
- Parameters:
diff_data (array-like) – The differenced data to invert.
- Returns:
The reconstructed original data (excluding the first interval values).
- Return type:
np.ndarray
- Raises:
ValueError – If the original data is not available.
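The arithmetic behind differencing and its inverse is short enough to sketch directly. The two functions below are a standalone illustration under the assumption that the class computes x[t] - x[t - interval]; they are hypothetical helpers, not the TimeSeriesDifferencer methods themselves.

```python
import numpy as np


def difference(x, interval=1):
    """Return x[t] - x[t - interval] for every t >= interval (interval >= 1)."""
    x = np.asarray(x, dtype=float)
    return x[interval:] - x[:-interval]


def inverse_difference(original, diff):
    """Rebuild x[interval:] by adding each difference back to its lagged original value."""
    original = np.asarray(original, dtype=float)
    diff = np.asarray(diff, dtype=float)
    return diff + original[:len(diff)]


series = np.array([10.0, 12.0, 15.0, 19.0, 24.0])
diff = difference(series, interval=2)         # [5.0, 7.0, 9.0]
restored = inverse_difference(series, diff)   # [15.0, 19.0, 24.0]
```

The restored array matches series[2:], which is why the inverse cannot recover the first interval values: no difference was ever computed for them.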
metasklearn.utils.evaluation module
- metasklearn.utils.evaluation.get_all_classification_metrics()[source]
Gets a dictionary of all supported classification metrics.
This function returns a dictionary where keys are metric names and values are their optimization types (“min” or “max”).
- Returns:
A dictionary containing all supported classification metrics.
- Return type:
dict
- metasklearn.utils.evaluation.get_all_regression_metrics()[source]
Gets a dictionary of all supported regression metrics.
This function returns a dictionary where keys are metric names and values are their optimization types (“min” or “max”).
- Returns:
A dictionary containing all supported regression metrics.
- Return type:
dict
- metasklearn.utils.evaluation.get_metric_sklearn(task='classification', metric_names=None)[source]
Creates a dictionary of scorers for scikit-learn cross-validation.
This function takes the task type (classification or regression) and a list of metric names. It creates an appropriate metrics instance (ClassificationMetric or RegressionMetric) and iterates through the provided metric names. For each metric name, it checks if it exists in the metrics instance and retrieves the corresponding method. Finally, it uses make_scorer to convert the method to a scorer and adds it to a dictionary.
- Parameters:
task (str, optional) – The task type, either “classification” or “regression”. Defaults to “classification”.
metric_names (list, optional) – A list of metric names. Defaults to None.
- Returns:
A dictionary of scorers for scikit-learn cross-validation.
- Return type:
dict
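The lookup-then-wrap procedure described above can be sketched with scikit-learn's make_scorer. ToyRegressionMetric and build_scorers are hypothetical stand-ins written for this example; skipping unknown names silently and passing greater_is_better=False are assumptions, not confirmed behaviour of get_metric_sklearn.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer


class ToyRegressionMetric:
    """Hypothetical stand-in for a metrics class; each method is one metric."""

    def MAE(self, y_true, y_pred):
        return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

    def MSE(self, y_true, y_pred):
        return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def build_scorers(metric_names):
    """Map each known metric name to a scikit-learn scorer."""
    metrics = ToyRegressionMetric()
    scorers = {}
    for name in metric_names or []:
        if hasattr(metrics, name):  # unknown names are skipped (assumption)
            # Both toy metrics are errors, so lower is better.
            scorers[name] = make_scorer(getattr(metrics, name), greater_is_better=False)
    return scorers


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 3.0])
model = DummyRegressor(strategy="mean").fit(X, y)
scorers = build_scorers(["MAE", "MSE", "not_a_metric"])
mae_score = scorers["MAE"](model, X, y)  # negated MAE, since greater_is_better=False
```

A dictionary built this way can be passed as the scoring argument of scikit-learn's cross_validate, which reports one result column per key.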
- metasklearn.utils.evaluation.get_metrics(problem, y_true, y_pred, metrics=None, testcase='test')[source]
Calculates metrics for regression or classification tasks.
This function takes the true labels (y_true), predicted labels (y_pred), problem type (regression or classification), a dictionary or list of metrics to calculate, and an optional test case name. It returns a dictionary containing the calculated metrics with descriptive names.
- Parameters:
problem (str) – The type of problem, either “regression” or “classification”.
y_true (array-like) – The true labels.
y_pred (array-like) – The predicted labels.
metrics (dict or list, optional) – A dictionary or list of metrics to calculate. Defaults to None.
testcase (str, optional) – An optional test case name to prepend to the metric names. Defaults to “test”.
- Returns:
A dictionary containing the calculated metrics with descriptive names.
- Return type:
dict
- Raises:
ValueError – If the metrics parameter is not a list or dictionary.
metasklearn.utils.scaler module
- class metasklearn.utils.scaler.BoxCoxScaler(lmbda=None)[source]
Bases:
BaseEstimator, TransformerMixin
- class metasklearn.utils.scaler.LabelEncoder[source]
Bases:
object
Encode categorical labels as integer indices and decode them back.
This class maps unique categorical labels to integers from 0 to n_classes - 1.
- fit(y)[source]
Fit the encoder by finding unique labels in the input data.
- Parameters:
y (array-like) – Input labels.
- Returns:
self – Fitted LabelEncoder instance.
- Return type:
LabelEncoder
- fit_transform(y)[source]
Fit the encoder and transform labels in one step.
- Parameters:
y (array-like of shape (n_samples,)) – Input labels.
- Returns:
Encoded integer labels.
- Return type:
np.ndarray
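The mapping from labels to integers in the range 0 to n_classes - 1 can be reproduced with np.unique. This sketch assumes classes are ordered by sorted value, which is how np.unique behaves; the ordering used by this LabelEncoder is not stated here.

```python
import numpy as np

labels = np.array(["cat", "dog", "cat", "bird"])

# classes holds the sorted unique labels; encoded holds each label's index into classes.
classes, encoded = np.unique(labels, return_inverse=True)

# Decoding is a plain index lookup back into the class array.
decoded = classes[encoded]
```

Here classes is ['bird', 'cat', 'dog'], so the input encodes to [1, 2, 1, 0] and decodes back to the original labels.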
- class metasklearn.utils.scaler.ObjectiveScaler(obj_name='sigmoid', ohe_scaler=None)[source]
Bases:
object
A label scaler for classification problems (binary and multi-class).
- class metasklearn.utils.scaler.OneHotEncoder[source]
Bases:
object
A simple implementation of one-hot encoding for 1D categorical data.
- categories_
Sorted array of unique categories fitted from the input data.
- Type:
np.ndarray
- fit(X)[source]
Fit the encoder to the unique categories in X.
- Parameters:
X (array-like) – 1D array of categorical values.
- Returns:
Fitted OneHotEncoder instance.
- Return type:
self
- fit_transform(X)[source]
Fit the encoder to X and transform X.
- Parameters:
X (array-like) – 1D array of categorical values.
- Returns:
One-hot encoded array of shape (n_samples, n_categories).
- Return type:
np.ndarray
- inverse_transform(one_hot)[source]
Convert one-hot encoded data back to original categories.
- Parameters:
one_hot (np.ndarray) – 2D array of one-hot encoded data.
- Returns:
1D array of original categorical values.
- Return type:
np.ndarray
- Raises:
ValueError – If the encoder has not been fitted or a shape mismatch occurs.
- transform(X)[source]
Transform input data into one-hot encoded format.
- Parameters:
X (array-like) – 1D array of categorical values.
- Returns:
One-hot encoded array of shape (n_samples, n_categories).
- Return type:
np.ndarray
- Raises:
ValueError – If the encoder has not been fitted or an unknown category is found.
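The fit, transform, and inverse_transform steps documented above can be illustrated with three small NumPy functions. They are hypothetical helpers written for this example, and the integer output dtype is an assumption; the class's own internals may differ.

```python
import numpy as np


def one_hot_fit(values):
    """Return the sorted unique categories found in a 1D input."""
    return np.unique(np.asarray(values))


def one_hot_transform(values, categories):
    """Encode values as rows with a single 1 in the column of their category."""
    values = np.asarray(values)
    if not np.all(np.isin(values, categories)):
        raise ValueError("Unknown category found.")
    idx = np.searchsorted(categories, values)  # valid because categories are sorted
    out = np.zeros((len(values), len(categories)), dtype=int)
    out[np.arange(len(values)), idx] = 1
    return out


def one_hot_inverse(one_hot, categories):
    """Map each one-hot row back to its category via the position of the 1."""
    one_hot = np.asarray(one_hot)
    if one_hot.ndim != 2 or one_hot.shape[1] != len(categories):
        raise ValueError("Shape mismatch.")
    return categories[np.argmax(one_hot, axis=1)]


colors = ["red", "green", "red", "blue"]
categories = one_hot_fit(colors)                  # ['blue', 'green', 'red']
encoded = one_hot_transform(colors, categories)   # shape (4, 3)
decoded = one_hot_inverse(encoded, categories)
```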
- class metasklearn.utils.scaler.SinhArcSinhScaler(epsilon=0.1, delta=1.0)[source]
Bases:
BaseEstimator, TransformerMixin
metasklearn.utils.validation module
- metasklearn.utils.validation.check_bool(name: str, value: bool, bound=(True, False))[source]
Validate a boolean value against allowed values.
- Parameters:
name (str) – Name of the parameter.
value (bool) – Boolean value to validate.
bound (tuple, optional) – Tuple of allowed boolean values.
- Returns:
The validated boolean.
- Return type:
bool
- Raises:
ValueError – If the value is not a valid boolean or not in allowed set.
- metasklearn.utils.validation.check_float(name: str, value: None, bound=None)[source]
Validate and cast a value to a float, with optional bounds.
- Parameters:
name (str) – Name of the parameter (used in error message).
value (int or float) – The value to check.
bound (tuple or list, optional) – Inclusive or exclusive bounds.
- Returns:
The validated float value.
- Return type:
float
- Raises:
ValueError – If the value is not a float or not within bounds.
- metasklearn.utils.validation.check_int(name: str, value: None, bound=None)[source]
Validate and cast a value to an integer, with optional bounds.
- Parameters:
name (str) – Name of the parameter (used in error message).
value (int or float) – The value to check.
bound (tuple or list, optional) – Inclusive or exclusive bounds for validation.
- Returns:
The validated integer value.
- Return type:
int
- Raises:
ValueError – If the value is not an integer or not within bounds.
- metasklearn.utils.validation.check_str(name: str, value: str, bound=None)[source]
Validate a string against a list of allowed values.
- Parameters:
name (str) – Name of the parameter.
value (str) – The string value to check.
bound (list, optional) – List of allowed values.
- Returns:
The validated string.
- Return type:
str
- Raises:
ValueError – If the value is not a string or is not among the allowed values.
- metasklearn.utils.validation.check_tuple_float(name: str, values: tuple, bounds=None)[source]
Validate a sequence of floats with optional individual bounds.
- Parameters:
name (str) – Name of the parameter.
values (list, tuple, or ndarray) – Sequence of numeric values.
bounds (list of tuple/list, optional) – Bounds for each element.
- Returns:
The validated sequence.
- Return type:
list or tuple or ndarray
- Raises:
ValueError – If values are not floats or not within bounds.
- metasklearn.utils.validation.check_tuple_int(name: str, values: None, bounds=None)[source]
Validate a sequence of integers with optional individual bounds.
- Parameters:
name (str) – Name of the parameter.
values (list, tuple, or ndarray) – Sequence of integer values.
bounds (list of tuple/list, optional) – Bounds for each element.
- Returns:
The validated sequence.
- Return type:
list or tuple or ndarray
- Raises:
ValueError – If values are not integers or not within bounds.
- metasklearn.utils.validation.is_in_bound(value, bound)[source]
Check whether a numeric value is within the given bound.
The function accepts bounds as a list (inclusive) or tuple (exclusive). Supports open-ended bounds using float(‘-inf’) or float(‘inf’).
- Parameters:
value (float or int) – The value to check.
bound (list or tuple) – A 2-element sequence defining the lower and upper bounds.
- Returns:
True if the value is within the bounds, False otherwise.
- Return type:
bool
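The list-versus-tuple convention is easy to misread, so here is a minimal sketch of the rule as described: a list means inclusive bounds, a tuple means exclusive bounds. The function in_bound is a hypothetical re-implementation for illustration, not the library's code.

```python
def in_bound(value, bound):
    """Check value against a 2-element bound: list = inclusive, tuple = exclusive."""
    lower, upper = bound
    if isinstance(bound, list):
        return lower <= value <= upper
    return lower < value < upper


on_edge_inclusive = in_bound(1, [1, 5])                  # True: lists include the endpoints
on_edge_exclusive = in_bound(1, (1, 5))                  # False: tuples exclude the endpoints
open_ended = in_bound(1_000_000, (0, float("inf")))      # True: no finite upper limit
```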
- metasklearn.utils.validation.is_str_in_list(value: str, my_list: list)[source]
Check whether a string is present in a given list.
- Parameters:
value (str) – The string to check.
my_list (list) – The list of valid strings.
- Returns:
True if the string is in the list, False otherwise.
- Return type:
bool