heamy.pipeline module

class heamy.pipeline.ModelsPipeline(*args)[source]

Combines a sequence of models.

add(model)[source]

Adds a single model.

Parameters:

model : Estimator
apply(func)[source]

Applies a function to the models' outputs.

Parameters:

func : function

Arbitrary function with one argument.

Returns:

PipeApply

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.apply(lambda x: np.max(x,axis=0)).execute()
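
To illustrate what the lambda in the example above computes, here is a minimal plain-Python sketch. It assumes the function receives the models' predictions stacked with one row per model, so that reducing along axis 0 combines the models sample by sample. The prediction values and the `elementwise_max` helper are hypothetical and are not part of heamy.

```python
# Hypothetical predictions from two models for the same three samples.
preds_rf = [0.2, 0.9, 0.4]
preds_lr = [0.5, 0.1, 0.6]


def elementwise_max(*model_preds):
    """Take the maximum across models for each sample (like np.max(x, axis=0))."""
    return [max(column) for column in zip(*model_preds)]


combined = elementwise_max(preds_rf, preds_lr)
print(combined)  # [0.5, 0.9, 0.6]
```

Any other one-argument reduction (mean, median, and so on) can be swapped in the same way.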
blend(proportion=0.2, stratify=False, seed=100, indices=None, add_diff=False)[source]

Blends a sequence of models.

Parameters:

proportion : float, default 0.2

stratify : bool, default False

seed : int, default 100

indices : list(np.ndarray,np.ndarray), default None

Two numpy arrays that contain indices for train/test slicing.

add_diff : bool, default False

Returns:

DataFrame

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.blend(seed=15)
>>> # Custom indices
>>> train_index = np.array(range(250))
>>> test_index = np.array(range(250,333))
>>> res = pipeline.blend(indices=(train_index,test_index))
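
The custom `indices` pair above can also be generated from a proportion and a seed. The sketch below is only an illustration: it assumes `proportion` is the fraction of training rows set aside as the holdout part, and `holdout_indices` is a hypothetical helper, not heamy's own splitting code.

```python
import random


def holdout_indices(n_samples, proportion=0.2, seed=100):
    """Split range(n_samples) into train and holdout index lists (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_holdout = round(n_samples * proportion)     # size of the holdout part
    return sorted(indices[n_holdout:]), sorted(indices[:n_holdout])


train_index, holdout_index = holdout_indices(10, proportion=0.2, seed=15)
print(len(train_index), len(holdout_index))  # 8 2
```

Fixing the seed keeps the split identical across runs, which matters when several models must be blended on the same holdout rows.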
find_weights(scorer, test_size=0.2, method='SLSQP')[source]

Finds optimal weights for a weighted average of the models.

Parameters:

scorer : function

Scikit-learn-style metric function.

test_size : float, default 0.2

method : str, default 'SLSQP'

Type of solver. Should be one of:

  • ‘Nelder-Mead’
  • ‘Powell’
  • ‘CG’
  • ‘BFGS’
  • ‘Newton-CG’
  • ‘L-BFGS-B’
  • ‘TNC’
  • ‘COBYLA’
  • ‘SLSQP’
  • ‘dogleg’
  • ‘trust-ncg’

Returns:

list
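
As a rough illustration of the idea behind weight search, the sketch below finds the best weights for two models on held-out targets. It deliberately uses a coarse grid search instead of the numerical solvers listed above, and it assumes the scorer is an error metric (lower is better). The `grid_search_weights` helper and all data are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Plain-Python stand-in for a scikit-learn-style error metric."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search_weights(scorer, y_true, preds_a, preds_b, steps=10):
    """Try weights (w, 1 - w) on a grid and keep the pair with the lowest score."""
    best_weights, best_score = None, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        score = scorer(y_true, blended)
        if score < best_score:
            best_weights, best_score = [w, 1 - w], score
    return best_weights


y_true = [1.0, 2.0, 3.0]
preds_a = [1.0, 2.0, 3.0]   # hypothetical model that is exactly right
preds_b = [2.0, 3.0, 4.0]   # hypothetical model that is always off by one
weights = grid_search_weights(mean_absolute_error, y_true, preds_a, preds_b)
print(weights)  # [1.0, 0.0]
```

A grid becomes impractical beyond a few models, which is one reason to use a constrained solver for this problem.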

gmean()[source]

Returns the geometric mean of the models' predictions.

Returns:

PipeApply
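
For reference, the geometric mean of n values is the n-th root of their product. The following sketch computes it per sample across two sets of hypothetical predictions using only the standard library. It shows the arithmetic, not heamy's implementation.

```python
from statistics import geometric_mean

# Hypothetical positive predictions from two models for three samples.
preds_rf = [1.0, 0.5, 4.0]
preds_lr = [4.0, 0.5, 1.0]

# Geometric mean across models, computed separately for each sample.
combined = [geometric_mean(column) for column in zip(preds_rf, preds_lr)]
print([round(value, 6) for value in combined])  # [2.0, 0.5, 2.0]
```

Unlike the arithmetic mean, the geometric mean is pulled toward the smaller value, and it is only defined for positive inputs.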
max()[source]

Returns the max of the models' predictions.

Returns:

PipeApply
mean()[source]

Returns the mean of the models' predictions.

Returns:

PipeApply

Examples

>>> # Execute
>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.mean().execute()
>>> # Validate
>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.mean().validate()
min()[source]

Returns the min of the models' predictions.

Returns:

PipeApply
stack(k=5, stratify=False, shuffle=True, seed=100, full_test=True, add_diff=False)[source]

Stacks a sequence of models.

Parameters:

k : int, default 5

Number of folds.

stratify : bool, default False

shuffle : bool, default True

seed : int, default 100

full_test : bool, default True

If True, predictions for the test dataset come from a model fit on the full training data; otherwise they are the mean of the predictions from every fold.

add_diff : bool, default False

Returns:

DataFrame

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> stack_ds = pipeline.stack(k=10, seed=111)
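
The general k-fold mechanism behind stacking can be sketched as follows: each training row receives a prediction from a model that never saw that row. This is a simplified illustration under stated assumptions. `kfold_indices` is a hypothetical helper, and the "model" is a placeholder that predicts the mean of its training targets.

```python
import random


def kfold_indices(n_samples, k=5, shuffle=True, seed=100):
    """Yield (train, valid) index lists for k folds (hypothetical helper)."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]     # k roughly equal parts
    for i, valid in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid


y = [float(v) for v in range(10)]                 # hypothetical targets
out_of_fold = [None] * len(y)
for train, valid in kfold_indices(len(y), k=5, seed=111):
    fold_prediction = sum(y[i] for i in train) / len(train)  # placeholder model
    for idx in valid:
        out_of_fold[idx] = fold_prediction

print(all(value is not None for value in out_of_fold))  # True
```

The out-of-fold column built this way is what a second-level model would be trained on, one column per first-level model.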
weight(weights)[source]

Applies a weighted mean to the models' predictions.

Parameters:

weights : list

Returns:

np.ndarray

Examples

>>> pipeline = ModelsPipeline(model_rf,model_lr)
>>> pipeline.weight([0.8,0.2])
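
The arithmetic of a weighted mean with weights `[0.8, 0.2]` is shown in the plain-Python sketch below. It assumes one weight per model, applied to that model's prediction for each sample and normalized by the sum of the weights. The `weighted_mean` helper and the prediction values are hypothetical.

```python
def weighted_mean(model_preds, weights):
    """Weighted average across models for each sample, normalized by the weight sum."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*model_preds)
    ]


preds_rf = [1.0, 2.0]   # hypothetical predictions from the first model
preds_lr = [3.0, 4.0]   # hypothetical predictions from the second model
combined = weighted_mean([preds_rf, preds_lr], [0.8, 0.2])
print([round(value, 6) for value in combined])  # [1.4, 2.4]
```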