heamy.pipeline module

class heamy.pipeline.ModelsPipeline(*args)

Combines a sequence of models.

add(model)

Adds a single model.

Parameters:	model : Estimator

apply(func)

Applies a function to the models' output.

Parameters:

func : function

Arbitrary function with one argument.

Returns:

PipeApply

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.apply(lambda x: np.max(x,axis=0)).execute()
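
To illustrate what the function passed to apply operates on, here is a minimal NumPy-only sketch. It assumes the models' predictions are stacked with one row per model, which is inferred from the axis=0 reduction in the example above rather than confirmed from heamy's internals. The arrays preds_rf and preds_lr are hypothetical.

```python
import numpy as np

# Hypothetical predictions from two models on the same four samples.
preds_rf = np.array([0.2, 0.7, 0.4, 0.9])
preds_lr = np.array([0.3, 0.5, 0.6, 0.8])

# Assumed layout: one row per model, one column per sample.
stacked = np.vstack([preds_rf, preds_lr])

# Same reduction as the lambda in the example: element-wise maximum across models.
combined = np.max(stacked, axis=0)
print(combined)
```

Any function of one argument that reduces along axis 0 (np.mean, np.median, np.min) would fit the same pattern under this assumption.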

blend(proportion=0.2, stratify=False, seed=100, indices=None, add_diff=False)

Blends a sequence of models.

Parameters:

proportion : float, default 0.2

stratify : bool, default False

seed : int, default 100

indices : list(np.ndarray,np.ndarray), default None

Two numpy arrays that contain indices for train/test slicing.

add_diff : bool, default False

Returns:

DataFrame

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.blend(seed=15)

>>> # Custom indices
>>> train_index = np.array(range(250))
>>> test_index = np.array(range(250,333))
>>> res = pipeline.blend(indices=(train_index,test_index))
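
The general idea behind blending can be sketched without heamy: hold out a fraction of the training rows, fit each first-level model on the rest, and use the holdout predictions as features for a second-level model. The sketch below is not heamy's implementation. The toy models fit_linear and fit_mean are hypothetical helpers, and treating proportion as the holdout fraction is an assumption.

```python
import numpy as np

rng = np.random.RandomState(100)  # fixed seed, playing the role of `seed`
X = rng.rand(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)

# Hold out a fraction of the rows (the role assumed here for `proportion`).
proportion = 0.2
perm = rng.permutation(len(X))
n_holdout = int(len(X) * proportion)
holdout_idx, fit_idx = perm[:n_holdout], perm[n_holdout:]

def fit_linear(X_fit, y_fit):
    """Hypothetical model: least squares with an intercept; returns a predictor."""
    A = np.column_stack([X_fit, np.ones(len(X_fit))])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return lambda X_new: np.column_stack([X_new, np.ones(len(X_new))]) @ coef

def fit_mean(X_fit, y_fit):
    """Hypothetical baseline: always predicts the mean of the fitted targets."""
    mean = y_fit.mean()
    return lambda X_new: np.full(len(X_new), mean)

# Fit each first-level model on the larger part, then predict the holdout part.
models = [fit_linear, fit_mean]
meta_features = np.column_stack(
    [fit(X[fit_idx], y[fit_idx])(X[holdout_idx]) for fit in models]
)
print(meta_features.shape)  # one row per holdout sample, one column per model
```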

find_weights(scorer, test_size=0.2, method='SLSQP')

Finds optimal weights for a weighted average of the models.

Parameters:

scorer : function

Scikit-learn-like metric.

test_size : float, default 0.2

method : str, default 'SLSQP'

Type of solver. Should be one of:

‘Nelder-Mead’

‘Powell’

‘CG’

‘BFGS’

‘Newton-CG’

‘L-BFGS-B’

‘TNC’

‘COBYLA’

‘SLSQP’

‘dogleg’

‘trust-ncg’

Returns:

list
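
The solver names listed above match the method argument of scipy.optimize.minimize, so the weight search can be sketched as a constrained minimization of the metric over the weights. This is an illustration under assumptions, not heamy's code: the data are hypothetical, the metric is assumed to be lower-is-better, and the bounds and sum-to-one constraint are choices made for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical holdout targets and predictions from two models.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
preds = np.array([
    [1.2, 2.1, 2.7, 4.3],  # model A
    [0.5, 1.5, 3.5, 3.0],  # model B
])

def mse(y, y_hat):
    """Scikit-learn-like metric: (y_true, y_pred) -> float, lower is better."""
    return float(np.mean((y - y_hat) ** 2))

def objective(w):
    # Score of the weighted combination of the models' predictions.
    return mse(y_true, np.dot(w, preds))

n_models = len(preds)
result = minimize(
    objective,
    x0=np.full(n_models, 1.0 / n_models),  # start from equal weights
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_models,  # assumption: non-negative weights
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},  # sum to 1
)
weights = result.x.tolist()
print(weights)
```

SLSQP is used here because it accepts both bounds and equality constraints; several of the other listed solvers ignore one or both.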

gmean()

Returns the geometric mean of the models' predictions.

Returns:	PipeApply
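
For reference, a geometric mean across models can be computed as the exponential of the mean of the logs. The sketch below uses hypothetical predictions and assumes the reduction runs across models (axis 0), as in the apply example; it requires strictly positive predictions.

```python
import numpy as np

# Hypothetical positive predictions: one row per model, one column per sample.
preds = np.array([
    [0.2, 0.8],
    [0.8, 0.2],
])

# Geometric mean across models: exp of the arithmetic mean of the logs.
geo = np.exp(np.mean(np.log(preds), axis=0))
arith = np.mean(preds, axis=0)
print(geo, arith)  # the geometric mean never exceeds the arithmetic mean
```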

max()

Returns the max of the models' predictions.

Returns:	PipeApply

mean()

Returns the mean of the models' predictions.

Returns:	PipeApply

Examples

>>> # Execute
>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.mean().execute()

>>> # Validate
>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.mean().validate()

min()

Returns the min of the models' predictions.

Returns:	PipeApply

stack(k=5, stratify=False, shuffle=True, seed=100, full_test=True, add_diff=False)

Stacks a sequence of models.

Parameters:

k : int, default 5

Number of folds.

stratify : bool, default False

shuffle : bool, default True

seed : int, default 100

full_test : bool, default True

If True, the test set is predicted by a model fitted on the full training data; otherwise the test predictions from each fold are averaged.

add_diff : bool, default False

Returns:

DataFrame

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> stack_ds = pipeline.stack(k=10, seed=111)
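
The k-fold scheme behind stacking can be sketched as out-of-fold prediction: each sample is predicted by a model that never saw it during fitting, and those predictions become a feature for the next level. The sketch below is a NumPy-only illustration, not heamy's implementation; fit_linear is a hypothetical stand-in for a first-level model.

```python
import numpy as np

rng = np.random.RandomState(111)  # fixed seed, playing the role of `seed`
X = rng.rand(40, 2)
y = 3.0 * X[:, 0] - X[:, 1] + 0.05 * rng.randn(40)

def fit_linear(X_fit, y_fit):
    """Hypothetical model: least squares with an intercept; returns a predictor."""
    A = np.column_stack([X_fit, np.ones(len(X_fit))])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return lambda X_new: np.column_stack([X_new, np.ones(len(X_new))]) @ coef

k = 5
indices = rng.permutation(len(X))   # corresponds to shuffle=True
folds = np.array_split(indices, k)  # k disjoint validation folds

# Out-of-fold predictions: fit on k-1 folds, predict the remaining fold.
oof = np.full(len(X), np.nan)
for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    predict = fit_linear(X[train_idx], y[train_idx])
    oof[val_idx] = predict(X[val_idx])

print(oof.shape)  # one out-of-fold prediction per training sample
```

With several first-level models, one such column per model would form the second-level training set.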

weight(weights)

Applies a weighted mean to the models' predictions.

Parameters:	weights : list
Returns:	np.ndarray

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.weight([0.8,0.2])
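
As a point of reference, a weighted mean across models corresponds to np.average with a weights argument. The predictions below are hypothetical, and the one-row-per-model layout is an assumption.

```python
import numpy as np

# Hypothetical predictions: one row per model, one column per sample.
preds = np.array([
    [1.0, 2.0],  # first model, weight 0.8
    [3.0, 6.0],  # second model, weight 0.2
])

# Weighted mean across models; np.average normalizes by the sum of the weights.
weighted = np.average(preds, axis=0, weights=[0.8, 0.2])
print(weighted)  # approximately [1.4, 2.8]
```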