heamy.estimator module

Regressor

class heamy.estimator.Regressor(dataset, estimator=None, parameters=None, name=None, use_cache=True)

Bases: heamy.estimator.BaseEstimator

Wrapper for regression problems.

Parameters:

dataset : Dataset object

estimator : callable with a scikit-learn-like interface, or a custom function/class, optional

parameters : dict, optional

Arguments for the estimator object.

name : str, optional

The unique name of the Estimator object.

use_cache : bool, optional

If True, the results of validate/predict/stack/blend are cached.

blend(proportion=0.2, stratify=False, seed=100, indices=None)

Blend a single model. This method is rarely needed directly; use ModelsPipeline.blend instead.

Parameters:

proportion : float, default 0.2

Fraction of the training data held out as the test set.

stratify : bool, default False

seed : int, default 100

indices : list(np.ndarray, np.ndarray), default None

Two NumPy arrays of indices for train/test slicing, given as (train_index, test_index).

Returns:

Dataset
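The following is a minimal, dependency-free sketch of what blending a single model generally involves: fit on one part of the training data, then predict both the held-out part and the test set. It is not heamy's implementation. The names blend_sketch and mean_model are hypothetical, plain lists stand in for NumPy arrays, and the returned tuple is an assumption made for illustration (the real method returns a Dataset).

```python
import random


def mean_model(X_fit, y_fit, X_pred):
    """Toy stand-in estimator: predict the mean of y_fit for every row."""
    mean = sum(y_fit) / len(y_fit)
    return [mean] * len(X_pred)


def blend_sketch(X_train, y_train, X_test, fit_predict,
                 proportion=0.2, seed=100, indices=None):
    """Hypothetical helper: fit on the training part, predict holdout and test."""
    if indices is None:
        # Random holdout: shuffle row positions and cut off `proportion` of them.
        order = list(range(len(X_train)))
        random.Random(seed).shuffle(order)
        n_holdout = int(round(len(order) * proportion))
        holdout_index, train_index = order[:n_holdout], order[n_holdout:]
    else:
        # Explicit (train_index, test_index) pair, as in the `indices` parameter.
        train_index, holdout_index = indices

    X_fit = [X_train[i] for i in train_index]
    y_fit = [y_train[i] for i in train_index]
    X_hold = [X_train[i] for i in holdout_index]
    y_hold = [y_train[i] for i in holdout_index]

    holdout_pred = fit_predict(X_fit, y_fit, X_hold)
    test_pred = fit_predict(X_fit, y_fit, X_test)
    return holdout_pred, y_hold, test_pred


X_train = [[float(i)] for i in range(10)]
y_train = [float(i) for i in range(10)]
X_test = [[10.0], [11.0]]

holdout_pred, y_hold, test_pred = blend_sketch(
    X_train, y_train, X_test, mean_model,
    indices=(list(range(8)), [8, 9]),
)
random_holdout_pred, _, _ = blend_sketch(X_train, y_train, X_test, mean_model)
```

In this sketch the holdout predictions, paired with the holdout targets, would serve as training data for a second-level model, and the test predictions as its test features.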

stack(k=5, stratify=False, shuffle=True, seed=100, full_test=True)

Stack a single model. This method is rarely needed directly; use ModelsPipeline.stack instead.

Parameters:

k : int, default 5

stratify : bool, default False

shuffle : bool, default True

seed : int, default 100

full_test : bool, default True

If True, predict the test dataset with a model fit on the full training data; otherwise take the mean of the test predictions from every fold.

Returns:

Dataset with out-of-fold predictions.
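Out-of-fold stacking and the two full_test modes can be sketched as follows. This is an illustration of the general technique under stated assumptions, not heamy's code: stack_sketch, kfold_indices, and mean_model are hypothetical names, plain lists stand in for NumPy arrays, and the fold assignment is a simple stride over a shuffled order.

```python
import random


def kfold_indices(n_samples, k=5, shuffle=True, seed=100):
    """Yield (train_index, val_index) pairs for k folds."""
    order = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        val_index = folds[i]
        train_index = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_index, val_index


def mean_model(X_fit, y_fit, X_pred):
    """Toy stand-in estimator: predict the mean of y_fit for every row."""
    mean = sum(y_fit) / len(y_fit)
    return [mean] * len(X_pred)


def stack_sketch(X_train, y_train, X_test, fit_predict,
                 k=5, shuffle=True, seed=100, full_test=True):
    """Hypothetical helper: out-of-fold train predictions plus test predictions."""
    oof = [None] * len(X_train)
    fold_test_preds = []
    for train_index, val_index in kfold_indices(len(X_train), k, shuffle, seed):
        X_fit = [X_train[i] for i in train_index]
        y_fit = [y_train[i] for i in train_index]
        X_val = [X_train[i] for i in val_index]
        # Each training row is predicted by a model that never saw it.
        for i, pred in zip(val_index, fit_predict(X_fit, y_fit, X_val)):
            oof[i] = pred
        if not full_test:
            fold_test_preds.append(fit_predict(X_fit, y_fit, X_test))

    if full_test:
        # One model fit on all training rows predicts the test set.
        test_pred = fit_predict(X_train, y_train, X_test)
    else:
        # Average the k per-fold test predictions row by row.
        test_pred = [sum(col) / k for col in zip(*fold_test_preds)]
    return oof, test_pred


X_train = [[float(i)] for i in range(10)]
y_train = [float(i) for i in range(10)]
X_test = [[10.0], [11.0]]

oof, test_pred_full = stack_sketch(X_train, y_train, X_test, mean_model, full_test=True)
_, test_pred_mean = stack_sketch(X_train, y_train, X_test, mean_model, full_test=False)
```

With full_test=False no extra fit on the whole training set is needed, at the cost of each test prediction coming from models trained on only part of the data.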

validate(scorer=None, k=1, test_size=0.1, stratify=False, shuffle=True, seed=100, indices=None)

Evaluate score by cross-validation.

Parameters:

scorer : function(y_true, y_pred), default None

Scikit-learn-like metric that returns a score.

k : int, default 1

The number of folds for validation.

If k=1, randomly split X_train into two parts; otherwise use the K-fold approach.

test_size : float, default 0.1

Size of the test holdout if k=1.

stratify : bool, default False

shuffle : bool, default True

seed : int, default 100

indices : list(np.ndarray, np.ndarray), default None

Two NumPy arrays of indices for train/test slicing, given as (train_index, test_index).

Returns:

y_true : list

Actual target values.

y_pred : list

Predicted target values.

Examples

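>>> # Custom indices (model_rf is assumed to be an existing Regressor)
>>> import numpy as np
>>> from sklearn.metrics import mean_absolute_error
>>> train_index = np.arange(250)
>>> test_index = np.arange(250, 333)
>>> res = model_rf.validate(mean_absolute_error, indices=(train_index, test_index))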
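Any function with the signature function(y_true, y_pred) that returns a number can act as a scorer. The sketch below shows such a function applied to a single explicit (train_index, test_index) split. It is a dependency-free illustration, not heamy's implementation: mae, mean_model, and validate_with_indices are hypothetical names, and returning flat lists is an assumption made for brevity.

```python
def mae(y_true, y_pred):
    """Scorer with the function(y_true, y_pred) signature: mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_model(X_fit, y_fit, X_pred):
    """Toy stand-in estimator: predict the mean of y_fit for every row."""
    mean = sum(y_fit) / len(y_fit)
    return [mean] * len(X_pred)


def validate_with_indices(X, y, fit_predict, indices):
    """Hypothetical helper: fit on train_index rows, predict test_index rows."""
    train_index, test_index = indices
    X_fit = [X[i] for i in train_index]
    y_fit = [y[i] for i in train_index]
    X_val = [X[i] for i in test_index]
    y_true = [y[i] for i in test_index]
    y_pred = fit_predict(X_fit, y_fit, X_val)
    return y_true, y_pred


X = [[float(i)] for i in range(6)]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

y_true, y_pred = validate_with_indices(X, y, mean_model, indices=([0, 1, 2, 3], [4, 5]))
score = mae(y_true, y_pred)
```

Because the scorer only sees y_true and y_pred, it can be swapped for any scikit-learn metric with the same argument order.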

Classifier

class heamy.estimator.Classifier(dataset, estimator=None, parameters=None, name=None, use_cache=True, probability=True)

Bases: heamy.estimator.BaseEstimator

Wrapper for classification problems.

Parameters:

dataset : Dataset object

estimator : callable with a scikit-learn-like interface, or a custom function/class, optional

parameters : dict, optional

Arguments for the estimator object.

name : str, optional

The unique name of the Estimator object.

use_cache : bool, optional

If True, the results of validate/predict/stack/blend are cached.

blend(proportion=0.2, stratify=False, seed=100, indices=None)

Blend a single model. This method is rarely needed directly; use ModelsPipeline.blend instead.

Parameters:

proportion : float, default 0.2

Fraction of the training data held out as the test set.

stratify : bool, default False

seed : int, default 100

indices : list(np.ndarray, np.ndarray), default None

Two NumPy arrays of indices for train/test slicing, given as (train_index, test_index).

Returns:

Dataset
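For classification, a stratified holdout keeps each class represented in the held-out part in roughly the same ratio as in the full training set. The sketch below illustrates that idea in plain Python. How heamy implements stratify is not stated here, so this is an assumption about the general technique, and stratified_holdout is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_holdout(y, proportion=0.2, seed=100):
    """Hypothetical helper: hold out about `proportion` of the rows of each class."""
    rng = random.Random(seed)

    # Group row positions by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_index, test_index = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = int(round(len(members) * proportion))
        test_index.extend(members[:n_test])
        train_index.extend(members[n_test:])
    return sorted(train_index), sorted(test_index)


# Imbalanced toy labels: ten rows of class 0, five rows of class 1.
y = [0] * 10 + [1] * 5
train_index, test_index = stratified_holdout(y, proportion=0.2)
test_labels = [y[i] for i in test_index]
```

A purely random split of the same data could leave the minority class out of the holdout entirely, which is the case stratification guards against.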

stack(k=5, stratify=False, shuffle=True, seed=100, full_test=True)

Stack a single model. This method is rarely needed directly; use ModelsPipeline.stack instead.

Parameters:

k : int, default 5

stratify : bool, default False

shuffle : bool, default True

seed : int, default 100

full_test : bool, default True

If True, predict the test dataset with a model fit on the full training data; otherwise take the mean of the test predictions from every fold.

Returns:

Dataset with out-of-fold predictions.

validate(scorer=None, k=1, test_size=0.1, stratify=False, shuffle=True, seed=100, indices=None)

Evaluate score by cross-validation.

Parameters:

scorer : function(y_true, y_pred), default None

Scikit-learn-like metric that returns a score.

k : int, default 1

The number of folds for validation.

If k=1, randomly split X_train into two parts; otherwise use the K-fold approach.

test_size : float, default 0.1

Size of the test holdout if k=1.

stratify : bool, default False

shuffle : bool, default True

seed : int, default 100

indices : list(np.ndarray, np.ndarray), default None

Two NumPy arrays of indices for train/test slicing, given as (train_index, test_index).

Returns:

y_true : list

Actual labels.

y_pred : list

Predicted labels.

Examples

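>>> # Custom indices (model_rf is assumed to be an existing estimator wrapper)
>>> import numpy as np
>>> from sklearn.metrics import mean_absolute_error
>>> train_index = np.arange(250)
>>> test_index = np.arange(250, 333)
>>> res = model_rf.validate(mean_absolute_error, indices=(train_index, test_index))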