ani1_interface

Created on Tue Jun 8 17:21:16 2021

@author: fhu14

Module Contents

Functions

get_data_type(→ List[str])

Obtains the corresponding ANI dataset keys from the keys specified

get_targets_from_h5file(→ dict)

Obtains the necessary target information from the dataset stored at ani1_path

get_ani1data(→ List[Dict])

Extracts data from the ANI-1 data files

get_ani1data_boosted(→ List[Dict])

Extracts data from the ANI-1 data files, with a separate configuration limit for molecules that contain the target atoms

Attributes

ani1_path

ani1_interface.get_data_type(specs: Union[List[str], str]) → List[str]

Obtains the corresponding ANI dataset keys from the keys specified in specs.

Parameters

specs (Union[List[str], str]) – The abbreviated keys used to refer to specific ANI dataset keys.

Returns

The list of ANI keys corresponding to the given keys in specs.

Return type

res (list[str])
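
The lookup described above can be illustrated with a minimal sketch. The abbreviation table below is purely hypothetical (this page does not list the abbreviations or the dataset keys the module actually uses), and expand_specs is an illustrative stand-in, not the module's implementation.

```python
from typing import Dict, List, Union

# Hypothetical abbreviation table: both the short keys and the long keys are
# placeholders, not values taken from ani1_interface.
HYPOTHETICAL_KEY_MAP: Dict[str, str] = {
    "e1": "method_one.energy",
    "d1": "method_one.dipole",
}


def expand_specs(
    specs: Union[List[str], str],
    key_map: Dict[str, str] = HYPOTHETICAL_KEY_MAP,
) -> List[str]:
    """Return the full dataset key for each abbreviated key in specs."""
    # Accept a single string as shorthand for a one-element list.
    if isinstance(specs, str):
        specs = [specs]
    # A KeyError here signals an abbreviation that the table does not know.
    return [key_map[spec] for spec in specs]
```

Accepting either a string or a list mirrors the Union[List[str], str] annotation on specs; the result is always a list.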

ani1_interface.get_targets_from_h5file(data_specs: Union[List[str], str], ani1_path: str, exclude: dict = None) → dict

Obtains the necessary target information from the dataset stored at ani1_path

Parameters
  • data_specs (Union[List[str], str]) – A string or list of strings encoding the data fields that should be extracted.

  • ani1_path (str) – The string indicating the relative or absolute path to the h5 dataset file.

  • exclude (dict) – Contains keys to exclude. Defaults to None.

Returns

A dictionary mapping each molecule name to the corresponding data for that molecule, as specified in data_specs.

Return type

target_molecs (dict)
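
As a rough illustration of the shape of the returned dictionary, the sketch below collects the requested fields from a plain nested dict that stands in for the opened h5 file. The function name collect_targets, the field names, and the reading of exclude (molecule names that appear as keys are skipped) are assumptions made for this example; the module's actual handling of exclude is not described on this page.

```python
from typing import Any, Dict, List, Mapping, Optional, Union


def collect_targets(
    dataset: Mapping[str, Mapping[str, Any]],
    data_specs: Union[List[str], str],
    exclude: Optional[dict] = None,
) -> Dict[str, Dict[str, Any]]:
    """Map each molecule name to the fields requested in data_specs."""
    if isinstance(data_specs, str):
        data_specs = [data_specs]
    # Assumption: a molecule is skipped when its name is a key of exclude.
    skipped = exclude or {}
    target_molecs: Dict[str, Dict[str, Any]] = {}
    for name, fields in dataset.items():
        if name in skipped:
            continue
        target_molecs[name] = {key: fields[key] for key in data_specs}
    return target_molecs


# A tiny in-memory stand-in for the dataset (placeholder names and values).
fake_dataset = {
    "mol_a": {"energy": [1.0, 2.0], "dipole": [0.1, 0.2]},
    "mol_b": {"energy": [3.0], "dipole": [0.3]},
}
result = collect_targets(fake_dataset, "energy", exclude={"mol_b": None})
```

Using None as the default for exclude, as the documented signature does, avoids sharing one mutable default between calls.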

ani1_interface.ani1_path = data/ANI-1ccx_clean_fullentry.h5

ani1_interface.get_ani1data(allowed_Z: List[int], heavy_atoms: List[int], max_config: int, target: Dict[str, str], ani1_path: str = ani1_path, exclude: List[str] = []) → List[Dict]

Extracts data from the ANI-1 data files

Parameters
  • allowed_Z (List[int]) – Include only molecules whose elements are in this list

  • heavy_atoms (List[int]) – Include only molecules for which the number of heavy atoms is in this list

  • max_config (int) – Maximum number of configurations included for each molecule.

  • target (Dict[str,str]) – Specifies the targets to extract. Each key is the target_name assigned to the target; each value is the key that the ANI-1 file assigns to this target.

  • ani1_path (str) – The relative path to the data file. Defaults to ‘data/ANI-1ccx_clean_fullentry.h5’.

  • exclude (List[str], optional) – Exclude these molecule names from the returned molecules. Defaults to [].

Returns

Each Dict contains the data for a single molecular structure:
{
  ‘name’: str with the name ANI1 assigns to this molecule type
  ‘iconfig’: int with the number ANI1 assigns to this structure
  ‘atomic_numbers’: List of Zs
  ‘coordinates’: numpy array (:,3) with Cartesian coordinates
  ‘targets’: Dict whose keys are the target_names in the target argument and whose values are numpy arrays with the ANI-1 data
}

Return type

molecules (List[Dict])

Notes: The ANI-1 data h5 files are indexed by molecule name. For each molecule, the data is stored in arrays whose first dimension is the configuration number, e.g. coordinates(iconfig,atom_num,3). This function treats each configuration as its own molecular structure. Each returned dictionary includes the ANI1 name and configuration number, along with the data for that individual molecular structure.
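
The per-configuration flattening described in the notes can be sketched as follows. This is a simplified illustration that uses nested lists in place of numpy arrays; the helper names, the entry field names, the zero-based iconfig, and the definition of a heavy atom as any element with Z > 1 are assumptions, not details confirmed by this page.

```python
from typing import Any, Dict, List, Mapping


def passes_filters(
    atomic_numbers: List[int], allowed_Z: List[int], heavy_atoms: List[int]
) -> bool:
    """Check the element filter and the heavy-atom-count filter."""
    # Assumption: heavy atoms are all atoms other than hydrogen (Z > 1).
    n_heavy = sum(1 for z in atomic_numbers if z > 1)
    return all(z in allowed_Z for z in atomic_numbers) and n_heavy in heavy_atoms


def flatten_configurations(
    name: str,
    entry: Mapping[str, Any],
    target: Dict[str, str],
    max_config: int,
) -> List[Dict[str, Any]]:
    """Return one dictionary per configuration, keeping at most max_config."""
    coordinates = entry["coordinates"]  # indexed as [iconfig][atom_num][xyz]
    n_keep = min(len(coordinates), max_config)
    return [
        {
            "name": name,
            "iconfig": iconfig,  # assumption: zero-based index along axis 0
            "atomic_numbers": list(entry["atomic_numbers"]),
            "coordinates": coordinates[iconfig],
            # target maps target_name -> key used in the data file.
            "targets": {
                tname: entry[key][iconfig] for tname, key in target.items()
            },
        }
        for iconfig in range(n_keep)
    ]
```

In this sketch the filters are evaluated once per molecule, since every configuration of a molecule shares the same atomic numbers; only the configuration cap is applied per structure.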

ani1_interface.get_ani1data_boosted(allowed_Z: List[int], heavy_atoms: List[int], target_atoms: List[int], criterion: str, max_config: int, boosted_config: int, target: Dict[str, str], ani1_path: str = ani1_path, exclude: List[str] = []) → List[Dict]

Extracts data from the ANI-1 data files, with a separate configuration limit (boosted_config) for molecules that contain the target atoms

Parameters
  • allowed_Z (List[int]) – Include only molecules whose elements are in this list

  • heavy_atoms (List[int]) – Include only molecules for which the number of heavy atoms is in this list

  • target_atoms (List[int]) – List of atomic numbers for atoms that need more representation. For example, if O needs more representation, target_atoms will include [8…]

  • criterion (str) – The requirement for boosted molecules, one of ‘any’ or ‘all’. If ‘any’, any molecule that contains at least one of the target atoms is boosted. If ‘all’, only molecules that contain all the target atoms are boosted.

  • max_config (int) – Maximum number of configurations included for each molecule by default.

  • boosted_config (int) – Maximum number of configurations included for each boosted molecule, i.e. each molecule that satisfies criterion with respect to target_atoms.

  • target (Dict[str,str]) – Specifies the targets to extract. Each key is the target_name assigned to the target; each value is the key that the ANI-1 file assigns to this target.

  • ani1_path (str) – The relative path to the data file. Defaults to ‘data/ANI-1ccx_clean_fullentry.h5’.

  • exclude (List[str], optional) – Exclude these molecule names from the returned molecules. Defaults to [].

Returns

Each Dict contains the data for a single molecular structure:
{
  ‘name’: str with the name ANI1 assigns to this molecule type
  ‘iconfig’: int with the number ANI1 assigns to this structure
  ‘atomic_numbers’: List of Zs
  ‘coordinates’: numpy array (:,3) with Cartesian coordinates
  ‘targets’: Dict whose keys are the target_names in the target argument and whose values are numpy arrays with the ANI-1 data
}

Return type

molecules (List[Dict])

Notes: The ANI-1 data h5 files are indexed by molecule name. For each molecule, the data is stored in arrays whose first dimension is the configuration number, e.g. coordinates(iconfig,atom_num,3). This function treats each configuration as its own molecular structure. Each returned dictionary includes the ANI1 name and configuration number, along with the data for that individual molecular structure.
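
The ‘any’ / ‘all’ rule for choosing between max_config and boosted_config can be sketched as below. The helper name config_limit is hypothetical, and treating an empty target_atoms list as "never boosted" is a choice made for this example rather than documented behavior.

```python
from typing import List


def config_limit(
    atomic_numbers: List[int],
    target_atoms: List[int],
    criterion: str,
    max_config: int,
    boosted_config: int,
) -> int:
    """Return the configuration cap that applies to one molecule."""
    present = set(atomic_numbers)
    if criterion == "any":
        boosted = any(z in present for z in target_atoms)
    elif criterion == "all":
        # Guard against all([]) being True when target_atoms is empty.
        boosted = bool(target_atoms) and all(z in present for z in target_atoms)
    else:
        raise ValueError("criterion must be 'any' or 'all'")
    return boosted_config if boosted else max_config
```

For a molecule with atomic numbers [8, 1, 1] and target_atoms [7, 8], the ‘any’ criterion selects boosted_config because O is present, while the ‘all’ criterion falls back to max_config because N is absent.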