CIMtools.preprocessing package

class CIMtools.preprocessing.CGR(cgr_type='0')

Bases: CIMtools.base.CIMtoolsTransformerMixin

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.
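As a minimal sketch of what a no-op fit looks like, the class below is a hypothetical stateless transformer written in plain Python. It is not part of CIMtools; the name UpperCase and its behaviour are assumptions chosen only to illustrate the fit/transform contract described above.

```python
class UpperCase:
    """Hypothetical stateless transformer; not part of CIMtools."""

    def fit(self, x, y=None):
        # Nothing is learned from the data, so the estimator is returned unchanged.
        return self

    def transform(self, x):
        # All the work happens here and depends only on the input.
        return [item.upper() for item in x]

    def fit_transform(self, x, y=None):
        # Equivalent to calling fit and then transform on the same data.
        return self.fit(x, y).transform(x)


transformer = UpperCase()
same_object = transformer.fit(["c", "cc"])
result = transformer.fit_transform(["c", "cc"])
```

Because fit returns the estimator itself, such a transformer can be placed in a pipeline next to steps that do learn from the data.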

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
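The naming scheme can be illustrated with a small plain-Python imitation. The classes Step and Container below are hypothetical stand-ins, not scikit-learn or CIMtools classes; with a real scikit-learn Pipeline the equivalent call has the form pipeline.set_params(step__param=value).

```python
class Step:
    """Hypothetical simple estimator holding one parameter."""

    def __init__(self, size=1):
        self.size = size

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Container:
    """Hypothetical nested object with named components."""

    def __init__(self, **components):
        self.components = components

    def set_params(self, **params):
        for key, value in params.items():
            # Split "<component>__<parameter>" at the first double underscore.
            component, delimiter, parameter = key.partition("__")
            if not delimiter:
                raise ValueError(f"expected <component>__<parameter>, got {key!r}")
            self.components[component].set_params(**{parameter: value})
        return self


nested = Container(first=Step(), second=Step())
returned = nested.set_params(second__size=5)
```

Only the component named before the double underscore is updated; the other components keep their current values.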

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.CGRToMatrix(charge=True, is_radical=False, isotope=False, hybridization=False, neighbors=False, in_ring=False, adjacent=True)

Bases: CIMtools.preprocessing.graph_to_matrix.GraphToMatrix

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.Conditions(temperature=298.15, pressure=100000, solvents=None)

Bases: object

property pressure
property solvents
property temperature
class CIMtools.preprocessing.ConditionsToDataFrame(max_solvents=1)

Bases: CIMtools.base.CIMtoolsTransformerMixin

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_feature_names()

Get feature names.

Returns

feature_names – Names of the features produced by transform.

Return type

list of strings

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.DictToConditions(temperature=None, pressure=None, solvents=None, amounts=None, default_temperature=298.15, default_pressure=100000, default_first_solvent='water', default_first_amount=1)

Bases: CIMtools.base.CIMtoolsTransformerMixin

Dictionary to Conditions mapper

Parameters
  • temperature – name of temperature key

  • pressure – name of pressure key

  • solvents – names of the solvent keys

  • amounts – names of the solvent amount keys
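The sketch below shows one way such a key-based mapping with defaults could work. It is an assumption-laden illustration, not the CIMtools implementation: the function read_conditions, the returned dictionary layout and the fallback rules are all hypothetical, and only the parameter names and default values are taken from the signature above.

```python
def read_conditions(record, temperature=None, pressure=None,
                    solvents=None, amounts=None,
                    default_temperature=298.15, default_pressure=100000,
                    default_first_solvent="water", default_first_amount=1):
    """Hypothetical reader; the real DictToConditions may behave differently."""

    def lookup(key, default):
        # Fall back to the default when no key name is configured
        # or the record holds no value under that key.
        if key is None or key not in record:
            return default
        return record[key]

    solvent_keys = solvents or []
    amount_keys = amounts or []
    found = []
    for index, solvent_key in enumerate(solvent_keys):
        if solvent_key not in record:
            continue
        amount_key = amount_keys[index] if index < len(amount_keys) else None
        found.append((record[solvent_key], lookup(amount_key, default_first_amount)))
    if not found:
        # Assumed behaviour: use the default solvent when none is listed.
        found = [(default_first_solvent, default_first_amount)]

    return {
        "temperature": float(lookup(temperature, default_temperature)),
        "pressure": float(lookup(pressure, default_pressure)),
        "solvents": found,
    }


record = {"T": "313.15", "solv1": "ethanol", "amount1": 0.7}
conditions = read_conditions(record, temperature="T", pressure="P",
                             solvents=["solv1"], amounts=["amount1"])
empty = read_conditions({})
```

Here the record has no "P" key, so the pressure falls back to default_pressure, while the temperature and solvent are read from the record.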

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.EquationTransformer(equation='x')

Bases: CIMtools.base.CIMtoolsTransformerMixin

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_feature_names()

Get feature names.

Returns

feature_names – Names of the features produced by transform.

Return type

list of strings

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.Fragmentor(fragment_type=3, min_length=2, max_length=10, cgr_dynbonds=0, doallways=False, useformalcharge=False, header=None, workpath='.', version='2017', verbose=False, remove_rare_ratio=0, return_domain=False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

ISIDA Fragmentor wrapper

Parameters
  • fragment_type – fragmentation type. See Fragmentor manual (-t)

  • min_length – minimal length of fragments. See Fragmentor manual (-l)

  • max_length – maximal length of fragments. See Fragmentor manual (-u)

  • cgr_dynbonds – see Fragmentor manual (-d)

  • doallways – see Fragmentor manual (--DoAllWays)

  • useformalcharge – see Fragmentor manual (--UseFormalCharge)

  • header – if None, descriptors are generated on the training set. If False, Fragmentor works in headless mode; in this mode fit is unusable and Fragmentor returns all found descriptors. Otherwise, a path string to an existing header file is accepted.

  • workpath – path for temp files

  • version – fragmentor version

  • verbose – silent Fragmentor output

  • remove_rare_ratio – if a descriptor is found in the training set at less than the given ratio, it is removed from the header. If partial fit is used, be sure to call the finalize method. Unusable in headless mode.

  • return_domain – add a boolean AD (applicability domain) column. False in this column means the molecule/CGR has new features.
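The last two parameters can be pictured with a small plain-Python sketch. It assumes that the ratio is the share of training structures containing a descriptor and that a structure is outside the domain when it has a descriptor missing from the header; both readings are assumptions. The helpers build_header and in_domain are hypothetical and do not call Fragmentor.

```python
def build_header(train_fragments, remove_rare_ratio=0):
    """Hypothetical: keep descriptors seen in at least the given share of structures."""
    total = len(train_fragments)
    counts = {}
    for fragments in train_fragments:
        for name in set(fragments):
            counts[name] = counts.get(name, 0) + 1
    # Descriptors found at less than the given ratio are dropped.
    return sorted(name for name, seen in counts.items()
                  if seen / total >= remove_rare_ratio)


def in_domain(fragments, header):
    # False when the structure has a descriptor that the header does not list.
    return set(fragments) <= set(header)


train = [{"C-C", "C=O"}, {"C-C"}, {"C-C", "C-N"}, {"C-C", "C=O"}]
header = build_header(train, remove_rare_ratio=0.5)
known = in_domain({"C-C"}, header)
novel = in_domain({"C-C", "C-N"}, header)
```

In this toy data "C-N" occurs in one of four structures, so a ratio of 0.5 removes it from the header, and any structure containing it is then flagged as having new features.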

delete_work_path()
finalize()

Finalize the partial fitting procedure.

fit(x, y=None)

Compute the header.

fit_transform(x, y=None)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_feature_names()

Get feature names.

Returns

feature_names – Names of the features produced by transform.

Return type

list of strings

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

partial_fit(x, y=None)
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

set_work_path(workpath)
transform(x)
class CIMtools.preprocessing.FragmentorFingerprint(fingerprint_size=12, bits_count=4, bits_active=2, fragment_type=3, min_length=2, max_length=10, cgr_dynbonds=0, doallways=False, useformalcharge=False, workpath='.', version='2017', verbose=False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

ISIDA Fragmentor fragments to fingerprints

Parameters
  • fingerprint_size – fingerprint length expressed as an exponent of 2 (length is 2**fingerprint_size)

  • bits_count – maximum number of occurrences of a fragment descriptor that is encoded in the fingerprint. For example, with the defaults: if the number is one, only one set of bits is activated; if the number is two, an additional set of bits is activated (up to 2*bits_active in total); if the number is four or greater, up to 4*bits_active bits are activated

  • bits_active – number of bits activated for each fragment (needed to limit information loss from bit collisions)

  • workpath – path for temp files.

  • version – fragmentor version. Used to select the Fragmentor executable, named fragmentor-{version}
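The arithmetic behind the first three parameters can be sketched as follows. The helper names are hypothetical, and the code only restates the sizes described above; it does not reproduce how FragmentorFingerprint actually chooses bit positions.

```python
def fingerprint_length(fingerprint_size=12):
    # fingerprint_size is an exponent of 2, so the default gives 2**12 bits.
    return 2 ** fingerprint_size


def max_bits_for_fragment(count, bits_count=4, bits_active=2):
    # Each occurrence up to bits_count adds one set of bits_active bits;
    # occurrences beyond bits_count add nothing further.
    return min(count, bits_count) * bits_active


length = fingerprint_length()
once = max_bits_for_fragment(1)
twice = max_bits_for_fragment(2)
many = max_bits_for_fragment(7)
```

These are upper bounds: when two bit sets collide on the same position, fewer distinct bits end up active.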

delete_work_path()
fit(x, y=None)
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

set_work_path(workpath)
transform(x)
transform_bitset(x)
class CIMtools.preprocessing.MoleculesToMatrix(charge=True, is_radical=False, isotope=False, hybridization=False, neighbors=False, implicit_hydrogens=False, total_hydrogens=False, in_ring=False, adjacent=True)

Bases: CIMtools.preprocessing.graph_to_matrix.GraphToMatrix

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.RDTool(algorithm='max', verbose=False)

Bases: CIMtools.base.CIMtoolsTransformerMixin

Parameters

algorithm – 'max', 'min' or 'mixture'

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.SolventVectorizer(polarizability_form1=True, polarizability_form2=True, permettivity_form1=True, permettivity_form2=True, permettivity_form3=True, permettivity_form4=True, permettivity_polarizability=True, alpha_kamlet_taft=True, beta_kamlet_taft=True, pi_kamlet_taft=True, spp_katalan=True, sb_katalan=True, sa_katalan=True)

Bases: CIMtools.base.CIMtoolsTransformerMixin

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_feature_names()

Get feature names.

Returns

feature_names – Names of the features produced by transform.

Return type

list of strings

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)
class CIMtools.preprocessing.StandardizeCGR

Bases: CIMtools.base.CIMtoolsTransformerMixin

Reaction and molecule standardization

For molecules, the kekule/thiele and groups standardization procedures are applied.

fit(x, y=None)

Do nothing and return the estimator unchanged

This method is just there to implement the usual API and hence work in pipelines.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(x)