john_toolbox.preprocessing.pandas_transformers.FunctionTransformer

class john_toolbox.preprocessing.pandas_transformers.FunctionTransformer(column: str, func: Callable, dict_args: Dict, mode: str = 'apply_by_multiprocessing', return_col: Optional[str] = None, drop_input_col: bool = False)

Bases: sklearn.base.BaseEstimator

Transformer that applies an arbitrary function to a DataFrame as a pipeline step.

For an example, refer to: https://github.com/nguyenanht/john-toolbox/blob/develop/notebooks/tutorial1%20-%20PandasPipeline%20%26%20PandasTransformer.ipynb

Based on: https://stackoverflow.com/questions/42844457/scikit-learn-applying-an-arbitary-function-as-part-of-a-pipeline
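The idea behind the linked answer can be sketched as a minimal scikit-learn-compatible transformer that stores a function and applies it to one column in transform. This is an illustrative reimplementation, not the john_toolbox source: the class name ColumnFunction, the addition of TransformerMixin (to get fit_transform), the support for mode='apply' only, and passing dict_args as keyword arguments to pandas Series.apply are all assumptions.

```python
from typing import Any, Callable, Dict, Optional

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnFunction(BaseEstimator, TransformerMixin):
    """Hypothetical, simplified stand-in for FunctionTransformer (mode='apply' only)."""

    def __init__(self, column: str, func: Callable,
                 dict_args: Optional[Dict[str, Any]] = None,
                 return_col: Optional[str] = None):
        # scikit-learn convention: store constructor arguments unchanged.
        self.column = column
        self.func = func
        self.dict_args = dict_args
        self.return_col = return_col

    def fit(self, X: pd.DataFrame, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        X = X.copy()  # do not mutate the caller's DataFrame
        out = self.return_col or self.column
        kwargs = self.dict_args or {}
        # Series.apply forwards extra keyword arguments to func.
        X[out] = X[self.column].apply(self.func, **kwargs)
        return X


def scale(value: float, factor: float) -> float:
    return value * factor


pipe = Pipeline([
    ("scale", ColumnFunction(column="price", func=scale,
                             dict_args={"factor": 2}, return_col="price_x2")),
])
df = pd.DataFrame({"price": [1.0, 2.5, 4.0]})
result = pipe.fit_transform(df)
print(result)
```

Copying X in transform keeps the input DataFrame unchanged, which makes the step safe to rerun inside a pipeline.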

column

Column to which the function is applied.

Type

str

func

Function to apply.

Type

Callable

dict_args

Arguments to pass to the function.

Type

Dict

mode

How the function is applied. Accepted values: apply_by_multiprocessing, apply or vectorized.

apply_by_multiprocessing: apply the function in parallel, using the total number of CPUs minus one.

apply: apply the function in the standard, sequential way.

vectorized: apply a vectorized operation, for example adding two columns.

Type

str, Optional, default=apply_by_multiprocessing
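A sketch contrasting the three modes with plain pandas. It does not call john_toolbox: the helper add_one is hypothetical, and for apply_by_multiprocessing only the partitioning step is shown, with the chunks processed sequentially rather than in a process pool. How the library actually splits and recombines the data is an assumption.

```python
import os

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": [10, 20, 30, 40, 50, 60]})


def add_one(value: int) -> int:
    return value + 1


# mode="apply": the function is called once per element of the column.
applied = df["a"].apply(add_one)

# mode="vectorized": one operation on whole columns, e.g. adding two columns.
vectorized = df["a"] + df["b"]

# mode="apply_by_multiprocessing", partitioning only: one chunk per worker,
# with CPUs minus one workers (clamped to at least 1, an assumption).
n_workers = max(1, (os.cpu_count() or 1) - 1)
chunk_size = -(-len(df) // n_workers)  # ceiling division
chunks = [df["a"].iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]

# Each chunk is processed sequentially here; a real implementation would hand
# the chunks to a process pool. Concatenation restores the original row order.
partitioned = pd.concat([chunk.apply(add_one) for chunk in chunks])

print(applied.tolist())             # [2, 3, 4, 5, 6, 7]
print(vectorized.tolist())          # [11, 22, 33, 44, 55, 66]
print(partitioned.equals(applied))  # True
```

Vectorized operations avoid calling a Python function per row, so they are usually the fastest option when the computation can be expressed on whole columns.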

return_col

Name of the output column. If None, the input column name is used.

Type

str, Optional, default=column

drop_input_col

Whether to drop the input column after the transformation.

Type

bool, default=False
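A minimal sketch of how return_col and drop_input_col are expected to interact, written as a hypothetical pandas helper apply_to_column that is not part of john_toolbox. Skipping the drop when the output overwrites the input column is an assumption, made so the result is not discarded.

```python
from typing import Callable, Optional

import pandas as pd


def apply_to_column(df: pd.DataFrame, column: str, func: Callable,
                    return_col: Optional[str] = None,
                    drop_input_col: bool = False) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented return_col/drop_input_col semantics."""
    out = df.copy()
    # Default: write the result back into the input column.
    target = return_col if return_col is not None else column
    out[target] = out[column].apply(func)
    # Assumption: only drop the input when the result lives in another column.
    if drop_input_col and target != column:
        out = out.drop(columns=[column])
    return out


df = pd.DataFrame({"name": ["ada", "linus"]})
in_place = apply_to_column(df, "name", str.upper)
new_col = apply_to_column(df, "name", str.upper, return_col="name_upper")
replaced = apply_to_column(df, "name", str.upper,
                           return_col="name_upper", drop_input_col=True)
print(replaced)
```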

See also

SelectColumnsTransformer

Keep columns from DataFrame.

DropColumnsTransformer

Drop columns from DataFrame.

EncoderTransformer

Encode columns of DataFrame with an encoder.

DebugTransformer

Keep track of information about DataFrame between steps.

Methods

fit

get_params

Get parameters for this estimator.

set_params

Set the parameters of this estimator.

transform

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict
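get_params is inherited from sklearn.base.BaseEstimator, which reads the parameter names from the constructor signature. A minimal sketch with two hypothetical estimators, Shift and Wrapper, shows what deep changes: with deep=True, parameters of a nested estimator are reported with a <name>__ prefix.

```python
from sklearn.base import BaseEstimator


class Shift(BaseEstimator):
    """Hypothetical estimator used only to illustrate get_params."""

    def __init__(self, column: str = "x", amount: float = 1.0):
        self.column = column
        self.amount = amount


class Wrapper(BaseEstimator):
    """Hypothetical estimator holding another estimator as a parameter."""

    def __init__(self, step=None):
        self.step = step


params = Shift(column="price", amount=2.5).get_params(deep=True)
print(params)  # {'amount': 2.5, 'column': 'price'}

wrapper = Wrapper(step=Shift(amount=3.0))
deep_keys = sorted(wrapper.get_params(deep=True))
shallow_keys = sorted(wrapper.get_params(deep=False))
print(deep_keys)     # ['step', 'step__amount', 'step__column']
print(shallow_keys)  # ['step']
```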

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
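A minimal sketch of the <component>__<parameter> syntax, using a hypothetical Shift transformer inside a scikit-learn Pipeline. Assuming FunctionTransformer stores its constructor arguments under the same names, as scikit-learn requires, a key built from its step name and a parameter such as mode would address it the same way.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class Shift(BaseEstimator, TransformerMixin):
    """Hypothetical transformer used only to illustrate set_params."""

    def __init__(self, amount: float = 1.0):
        self.amount = amount

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [value + self.amount for value in X]


pipe = Pipeline([("shift", Shift(amount=1.0))])

# "shift" is the step name, "amount" the parameter of that step.
returned = pipe.set_params(shift__amount=10.0)

print(pipe.named_steps["shift"].amount)  # 10.0
print(pipe.fit_transform([1, 2, 3]))     # [11.0, 12.0, 13.0]
```

set_params returns the estimator itself, so calls can be chained, for example pipe.set_params(...).fit(...).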