heamy.estimator module

Regressor
class heamy.estimator.Regressor(dataset, estimator=None, parameters=None, name=None, use_cache=True)

Bases: heamy.estimator.BaseEstimator
Wrapper for regression problems.
Parameters: dataset : Dataset object
estimator : callable with a scikit-learn-like interface, or a custom function/class, optional
parameters : dict, optional
Arguments for the estimator object.
name : str, optional
The unique name of the Estimator object.
use_cache : bool, optional
If True, validate/predict/stack/blend results will be cached.
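The estimator argument accepts anything exposing a scikit-learn-like fit/predict interface. A minimal sketch of such an interface (MeanRegressor is a hypothetical illustration, not part of heamy):

```python
# A minimal scikit-learn-like estimator that Regressor could wrap.
# MeanRegressor is an illustrative example, not part of heamy.
class MeanRegressor:
    def __init__(self, **params):
        self.params = params  # mirrors the `parameters` dict passed through
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)  # "learn" the mean of the targets
        return self

    def predict(self, X):
        return [self.mean_] * len(X)  # predict the training mean for every row


model = MeanRegressor()
model.fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(model.predict([[5], [6]]))  # [2.0, 2.0]
```

Any object (or factory callable) with this fit/predict shape can be handed to the wrapper via the estimator argument.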
blend(proportion=0.2, stratify=False, seed=100, indices=None)
Blend a single model. You should rarely need to call this method directly; use ModelsPipeline.blend instead.
Parameters: proportion : float, default 0.2
Size of the test holdout.
stratify : bool, default False
seed : int, default 100
indices : list(np.ndarray, np.ndarray), default None
Two numpy arrays containing indices for train/test slicing: (train_index, test_index).
Returns: Dataset
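The proportion-based holdout that blend relies on can be sketched as follows. The index arithmetic here is illustrative only, not heamy's exact implementation:

```python
import numpy as np


def holdout_indices(n_rows, proportion=0.2, seed=100):
    """Illustrative train/holdout split; not heamy's exact slicing logic."""
    rng = np.random.RandomState(seed)      # reproducible shuffle, like `seed`
    order = rng.permutation(n_rows)        # shuffled row indices
    n_test = int(n_rows * proportion)      # size of the holdout part
    return order[n_test:], order[:n_test]  # (train_index, test_index)


train_idx, test_idx = holdout_indices(100, proportion=0.2)
print(len(train_idx), len(test_idx))  # 80 20
```

Passing an explicit indices pair simply replaces this random split with your own.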
stack(k=5, stratify=False, shuffle=True, seed=100, full_test=True)
Stack a single model. You should rarely need to call this method directly; use ModelsPipeline.stack instead.
Parameters: k : int, default 5
stratify : bool, default False
shuffle : bool, default True
seed : int, default 100
full_test : bool, default True
If True, make test predictions with a model fitted on the full training data; otherwise take the mean of the per-fold test predictions.
Returns: Dataset with out-of-fold predictions.
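The out-of-fold idea behind stack, including the full_test switch, can be sketched with a toy mean predictor standing in for the wrapped estimator (an illustration of the technique, not heamy's code):

```python
import numpy as np


def oof_stack(y_train, n_test, k=5, full_test=True, seed=100):
    """Toy out-of-fold stacking with a mean predictor (illustrative only)."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(y_train))
    folds = np.array_split(order, k)

    oof = np.empty(len(y_train))
    fold_test_preds = []
    for fold in folds:
        train_mask = np.ones(len(y_train), dtype=bool)
        train_mask[fold] = False
        fold_mean = y_train[train_mask].mean()          # "fit" on k-1 folds
        oof[fold] = fold_mean                           # predict held-out fold
        fold_test_preds.append(np.full(n_test, fold_mean))

    if full_test:   # refit on all training data for the test predictions
        test = np.full(n_test, y_train.mean())
    else:           # otherwise average the per-fold test predictions
        test = np.mean(fold_test_preds, axis=0)
    return oof, test


y = np.arange(10, dtype=float)
oof, test = oof_stack(y, n_test=3, k=5)
print(oof.shape, test.shape)  # (10,) (3,)
```

Every training row receives exactly one prediction made by a model that never saw it, which is what makes the returned Dataset safe to train a second-level model on.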
validate(scorer=None, k=1, test_size=0.1, stratify=False, shuffle=True, seed=100, indices=None)
Evaluate score by cross-validation.
Parameters: scorer : function(y_true, y_pred), default None
A scikit-learn-style metric that returns a score.
k : int, default 1
The number of folds for validation.
If k=1, X_train is randomly split into two parts; otherwise a k-fold approach is used.
test_size : float, default 0.1
Size of the test holdout when k=1.
stratify : bool, default False
shuffle : bool, default True
seed : int, default 100
indices : list(np.ndarray, np.ndarray), default None
Two numpy arrays containing indices for train/test slicing: (train_index, test_index).
Returns: y_true: list
Actual labels.
y_pred: list
Predicted labels.
Examples

>>> # Custom indices
>>> train_index = np.array(range(250))
>>> test_index = np.array(range(250, 333))
>>> res = model_rf.validate(mean_absolute_error, indices=(train_index, test_index))
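Any plain function with the signature scorer(y_true, y_pred) returning a number works as a scorer. A hand-rolled mean absolute error, shown only to make the expected signature concrete:

```python
def mean_absolute_error(y_true, y_pred):
    """Scorer with the expected (y_true, y_pred) -> score signature."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.5
```

In practice you would pass sklearn.metrics.mean_absolute_error (or any other metric with this shape) directly.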
Classifier

class heamy.estimator.Classifier(dataset, estimator=None, parameters=None, name=None, use_cache=True, probability=True)

Bases: heamy.estimator.BaseEstimator
Wrapper for classification problems.
Parameters: dataset : Dataset object
estimator : callable with a scikit-learn-like interface, or a custom function/class, optional
parameters : dict, optional
Arguments for the estimator object.
name : str, optional
The unique name of the Estimator object.
use_cache : bool, optional
If True, validate/predict/stack/blend results will be cached.
probability : bool, optional
If True, predict class probabilities (via predict_proba) rather than class labels.
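With probability=True the wrapper is expected to ask the estimator for class probabilities rather than hard labels. A sketch of that routing, where StubClassifier and the classifier_output dispatch function are illustrative assumptions, not heamy internals:

```python
class StubClassifier:
    """Illustrative estimator exposing both predict and predict_proba."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1] * len(X)           # hard class labels

    def predict_proba(self, X):
        return [[0.3, 0.7]] * len(X)  # per-class probabilities


def classifier_output(est, X, probability=True):
    # Hypothetical dispatch mirroring the effect of the `probability` flag.
    return est.predict_proba(X) if probability else est.predict(X)


est = StubClassifier().fit([[0]], [1])
print(classifier_output(est, [[0], [1]], probability=True))   # probabilities
print(classifier_output(est, [[0], [1]], probability=False))  # labels
```

Probabilities are usually what you want when the classifier's output feeds a second-level stacking model.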
blend(proportion=0.2, stratify=False, seed=100, indices=None)
Blend a single model. You should rarely need to call this method directly; use ModelsPipeline.blend instead.
Parameters: proportion : float, default 0.2
Size of the test holdout.
stratify : bool, default False
seed : int, default 100
indices : list(np.ndarray, np.ndarray), default None
Two numpy arrays containing indices for train/test slicing: (train_index, test_index).
Returns: Dataset
stack(k=5, stratify=False, shuffle=True, seed=100, full_test=True)
Stack a single model. You should rarely need to call this method directly; use ModelsPipeline.stack instead.
Parameters: k : int, default 5
stratify : bool, default False
shuffle : bool, default True
seed : int, default 100
full_test : bool, default True
If True, make test predictions with a model fitted on the full training data; otherwise take the mean of the per-fold test predictions.
Returns: Dataset with out-of-fold predictions.
validate(scorer=None, k=1, test_size=0.1, stratify=False, shuffle=True, seed=100, indices=None)
Evaluate score by cross-validation.
Parameters: scorer : function(y_true, y_pred), default None
A scikit-learn-style metric that returns a score.
k : int, default 1
The number of folds for validation.
If k=1, X_train is randomly split into two parts; otherwise a k-fold approach is used.
test_size : float, default 0.1
Size of the test holdout when k=1.
stratify : bool, default False
shuffle : bool, default True
seed : int, default 100
indices : list(np.ndarray, np.ndarray), default None
Two numpy arrays containing indices for train/test slicing: (train_index, test_index).
Returns: y_true: list
Actual labels.
y_pred: list
Predicted labels.
Examples

>>> # Custom indices
>>> train_index = np.array(range(250))
>>> test_index = np.array(range(250, 333))
>>> res = model_rf.validate(mean_absolute_error, indices=(train_index, test_index))