john_toolbox.preprocessing.pandas_transformers.FunctionTransformer
-
class john_toolbox.preprocessing.pandas_transformers.FunctionTransformer(column: str, func: Callable, dict_args: Dict, mode: str = 'apply_by_multiprocessing', return_col: Optional[str] = None, drop_input_col: bool = False)
Bases: sklearn.base.BaseEstimator
Transformer that applies a function to a DataFrame column.
For an example, see: https://github.com/nguyenanht/john-toolbox/blob/develop/notebooks/tutorial1%20-%20PandasPipeline%20%26%20PandasTransformer.ipynb
-
column Column to transform with the function.
- Type
str, Optional
-
func Function to apply.
- Type
Callable
-
dict_args Arguments to pass to the function.
- Type
Dict
-
mode Accepted modes: apply_by_multiprocessing, apply, or vectorized.
apply_by_multiprocessing: apply the function using the total number of CPUs minus one.
apply: apply the function in the standard way.
vectorized: vectorize an operation, for example adding two columns.
- Type
str, Optional, default=apply_by_multiprocessing
-
return_col Name of the output column.
- Type
str, Optional, default=column
-
drop_input_col Whether to drop the input column.
- Type
bool, default=False
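The difference between the apply and vectorized modes can be illustrated with plain pandas. The sketch below does not use FunctionTransformer and is not the library's implementation: the DataFrame, the add_offset helper, and the n_workers variable are hypothetical names chosen for illustration, and how the class itself calls func in each mode is not shown on this page.

```python
import os

import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def add_offset(value, offset):
    """Hypothetical element-wise function; `offset` plays the role of an entry in dict_args."""
    return value + offset


dict_args = {"offset": 5}

# "apply"-style: the function is called once per element,
# with the extra keyword arguments forwarded by pandas.
applied = df["a"].apply(add_offset, **dict_args)

# "vectorized"-style: the operation works on whole columns at once,
# for example adding two columns.
summed = df["a"] + df["b"]

# "total number of CPUs minus one", guarded so that at least one worker remains.
# This only computes the count; it does not start any worker processes.
n_workers = max(1, (os.cpu_count() or 2) - 1)
```

Element-wise application is the more general option because it accepts any Python function, while a vectorized operation is usually faster when the computation can be expressed on whole columns.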
See also
SelectColumnsTransformer: Keep columns from DataFrame.
DropColumnsTransformer: Drop columns from DataFrame.
EncoderTransformer: Drop columns from DataFrame.
DebugTransformer: Keep track of information about DataFrame between steps.
Methods
fit
get_params: Get parameters for this estimator.
set_params: Set the parameters of this estimator.
transform
-
get_params(deep=True) Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
-
set_params(**params) Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
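The <component>__<parameter> form can be seen with a small scikit-learn Pipeline. This is a minimal sketch, assuming scikit-learn is installed; the Scale class and the step name "scale" are hypothetical and stand in for any estimator derived from sklearn.base.BaseEstimator, such as the transformers in this module.

```python
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline


class Scale(BaseEstimator):
    """Hypothetical estimator used only to demonstrate get_params / set_params."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x * self.factor for x in X]


# Simple estimator: parameters are addressed by their own name.
simple = Scale(factor=2.0)
simple.set_params(factor=5.0)
simple_params = simple.get_params()

# Nested object: the step name and the parameter name are joined by a double underscore.
pipe = Pipeline([("scale", Scale(factor=2.0))])
pipe.set_params(scale__factor=3.0)
nested_params = pipe.get_params(deep=True)
```

With deep=True, the dictionary returned for the pipeline contains the step's parameter under the key scale__factor alongside the pipeline's own parameters.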
-