Installation

Install the current PyPI release:

$ pip install metasklearn==0.3.0

Install directly from the source code:

$ git clone https://github.com/thieu1995/MetaSklearn.git
$ cd MetaSklearn
$ python setup.py install

To install the development version from GitHub:

$ pip install git+https://github.com/thieu1995/MetaSklearn

After installation, you can check which version of MetaSklearn is installed:

$ python
>>> import metasklearn
>>> metasklearn.__version__
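
The version can also be read from the package metadata without importing the package. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration, and it assumes the distribution is registered under the name `metasklearn` used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed
print("metasklearn:", installed_version("metasklearn"))
```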

Tutorials

This section walks through the main MetaSklearn classes using example datasets. Every preprocessing step shown below can also be done with Scikit-Learn directly; the utility classes are provided for convenience and faster usage.

Provided classes

The package provides classes for data handling and for hyper-parameter search:

from metasklearn import DataTransformer, Data
from metasklearn import MetaSearchCV

DataTransformer class

The DataTransformer class lets you chain several of the provided scalers into a single transformation. For example, the following scales the data with Loge, then Sqrt, then MinMax:

from metasklearn import DataTransformer
import pandas as pd
from sklearn.model_selection import train_test_split

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:5].values
y = dataset.iloc[:, 5].values
# train_test_split returns the splits in the order X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

dt = DataTransformer(scaling_methods=("loge", "sqrt", "minmax"))
X_train_scaled = dt.fit_transform(X_train)
X_test_scaled = dt.transform(X_test)
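
To make the fit/transform split concrete, here is a dependency-free sketch of a loge, sqrt, minmax chain applied to a single feature column. It is not MetaSklearn's implementation: the `ChainScaler` class is hypothetical, and it assumes that "loge" means the natural logarithm, "sqrt" the square root, and "minmax" a rescaling to [0, 1] with the minimum and maximum learned from the training data only.

```python
import math


class ChainScaler:
    """Hypothetical 1-D scaler: natural log, then square root, then min-max."""

    def _pre(self, values):
        # log needs x > 0, and sqrt(log(x)) needs x >= 1
        return [math.sqrt(math.log(v)) for v in values]

    def fit(self, values):
        z = self._pre(values)
        self.min_, self.max_ = min(z), max(z)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        if span == 0:
            # Constant column: map everything to 0.0
            return [0.0 for _ in values]
        return [(z - self.min_) / span for z in self._pre(values)]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


train = [1.0, 10.0, 100.0, 1000.0]
test = [50.0, 2000.0]

scaler = ChainScaler()
train_scaled = scaler.fit_transform(train)  # statistics come from train only
test_scaled = scaler.transform(test)        # reuses the train min/max

print(train_scaled)
print(test_scaled)
```

A test value outside the training range (2000 here) maps outside [0, 1], which is expected when the scaler is fitted on the training split only.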

Data class

You can load your dataset into the Data class.
You can split the dataset into train and test sets.
You can scale the dataset without using the DataTransformer class.
You can encode labels using LabelEncoder.

from metasklearn import Data
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:5].values
y = dataset.iloc[:, 5].values

data = Data(X, y, name="position_salaries")

# Split dataset into train and test sets
data.split_train_test(test_size=0.2, shuffle=True, random_state=100, inplace=True)

# Feature scaling
data.X_train, scaler_X = data.scale(data.X_train, scaling_methods=("standard", "sqrt", "minmax"))
data.X_test = scaler_X.transform(data.X_test)

data.y_train, scaler_y = data.encode_label(data.y_train)  # For classification problems only
data.y_test = scaler_y.transform(data.y_test)
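
For readers unfamiliar with label encoding, the sketch below shows the general idea in plain Python: class names are mapped to integer codes learned from the training labels, and the same mapping is reused on the test labels. `SimpleLabelEncoder` is a hypothetical stand-in and not the object returned by `encode_label`; it mimics the sorted-classes behavior of Scikit-Learn's `LabelEncoder`.

```python
class SimpleLabelEncoder:
    """Hypothetical minimal label encoder (sorted unique classes -> 0..n-1)."""

    def fit(self, labels):
        self.classes_ = sorted(set(labels))
        self._index = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Raises KeyError for a label that was not seen during fit
        return [self._index[label] for label in labels]

    def inverse_transform(self, codes):
        return [self.classes_[c] for c in codes]


y_train = ["setosa", "virginica", "setosa", "versicolor"]
y_test = ["versicolor", "setosa"]

encoder = SimpleLabelEncoder().fit(y_train)
y_train_enc = encoder.transform(y_train)
y_test_enc = encoder.transform(y_test)
print(y_train_enc, y_test_enc)
```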

Searcher class

In this example, we will use MetaSearchCV to search for the best hyper-parameters of the SVM model.

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from metasklearn import MetaSearchCV, FloatVar, StringVar

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter bounds for SVC.
# For comparison, a GridSearchCV-style search space would look like this:
# param_bounds = {
#     "C": [0.1, 100],
#     "gamma": [1e-4, 1],
#     "kernel": ["linear", "rbf", "poly"]
# }
# For MetaSearchCV, the same search space is written as a list of variables:

param_bounds = [
    FloatVar(lb=0.1, ub=100., name="C"),  # SVC requires C to be strictly positive
    FloatVar(lb=1e-4, ub=1., name="gamma"),
    StringVar(valid_sets=("linear", "rbf", "poly"), name="kernel")
]

# Initialize and fit MetaSearchCV
searcher = MetaSearchCV(
    estimator=SVC(),
    param_bounds=param_bounds,
    task_type="classification",
    optim="BaseGA",     # Using Genetic Algorithm for hyper-parameter optimization
    optim_params={"epoch": 20, "pop_size": 30, "name": "GA"},
    cv=3,
    scoring="AS",  # or any custom scoring like "F1_macro"
    seed=42,
    n_jobs=2,
    verbose=True,
    mode="single",
    n_workers=None,
    termination=None
)

searcher.fit(X_train, y_train)
print("Best parameters (Classification):", searcher.best_params)
print("Best model: ", searcher.best_estimator)
print("Best score during searching: ", searcher.best_score)

# Make prediction after re-fit
y_pred = searcher.predict(X_test)
print("Test Accuracy:", searcher.score(X_test, y_test))
print("Test Score: ", searcher.scores(X_test, y_test, list_metrics=("AS", "RS", "PS", "F1S")))
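
Metaheuristic optimizers such as a genetic algorithm typically work on vectors of numbers, so a mixed search space like the one above has to be mapped to and from such vectors. The sketch below illustrates one common way to do this in plain Python. It is not MetaSearchCV's internal code: `decode`, the tuple-based `space` description, and the rounding rule for categorical values are all assumptions made for illustration.

```python
def clip(value, lb, ub):
    """Keep a value inside [lb, ub]."""
    return max(lb, min(ub, value))


def decode(position, space):
    """Turn a numeric vector into a hyper-parameter dict.

    space is a list of (name, kind, spec) tuples, where
      kind == "float":  spec is (lb, ub), the value is clipped to the bounds
      kind == "choice": spec is a tuple of options, the value is rounded
                        to the nearest valid index
    """
    params = {}
    for value, (name, kind, spec) in zip(position, space):
        if kind == "float":
            lb, ub = spec
            params[name] = clip(value, lb, ub)
        elif kind == "choice":
            index = int(clip(round(value), 0, len(spec) - 1))
            params[name] = spec[index]
        else:
            raise ValueError(f"unknown variable kind: {kind}")
    return params


# Hypothetical description of the SVC search space used above
space = [
    ("C", "float", (0.1, 100.0)),
    ("gamma", "float", (1e-4, 1.0)),
    ("kernel", "choice", ("linear", "rbf", "poly")),
]

print(decode([250.0, 0.05, 1.2], space))
```

Each decoded dict could then be passed to `SVC(**params)` and scored with cross-validation to obtain the fitness of that candidate.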

Please check out the examples for more details on how to use the MetaSearchCV class.