ani1_interface
Created on Tue Jun 8 17:21:16 2021
@author: fhu14
Module Contents
Functions
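- get_data_type(specs): Obtains the corresponding ANI dataset keys from the keys specified
- get_targets_from_h5file(data_specs, ani1_path, exclude): Obtains the necessary target information from the dataset stored at ani1_path
- get_ani1data(allowed_Z, heavy_atoms, max_config, target, ani1_path, exclude): Extracts data from the ANI-1 data files
- get_ani1data_boosted(allowed_Z, heavy_atoms, target_atoms, criterion, max_config, boosted_config, target, ani1_path, exclude): Extracts data from the ANI-1 data files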
Attributes
- ani1_interface.get_data_type(specs: Union[List[str], str]) -> List[str]
Obtains the corresponding ANI dataset keys from the keys specified in specs.
- Parameters
specs (Union[List[str], str]) – The abbreviated keys used to refer to specific ANI dataset keys.
- Returns
- The list of ANI keys corresponding to the given keys in specs.
- Return type
res (list[str])
- ani1_interface.get_targets_from_h5file(data_specs: Union[List[str], str], ani1_path: str, exclude: dict = None) -> dict
Obtains the necessary target information from the dataset stored at ani1_path
- Parameters
data_specs (Union[List[str], str]) – A string or list of strings encoding the data fields that should be extracted.
ani1_path (str) – The relative or absolute path to the h5 dataset file.
exclude (dict) – Contains keys to exclude. Defaults to None.
- Returns
- A dictionary mapping the molecule name to the corresponding data for that molecule, as specified in data_specs.
- Return type
target_molecs (dict)
- ani1_interface.ani1_path = data/ANI-1ccx_clean_fullentry.h5
- ani1_interface.get_ani1data(allowed_Z: List[int], heavy_atoms: List[int], max_config: int, target: Dict[str, str], ani1_path: str = ani1_path, exclude: List[str] = []) -> List[Dict]
Extracts data from the ANI-1 data files
- Parameters
allowed_Z (List[int]) – Include only molecules whose elements are in this list
heavy_atoms (List[int]) – Include only molecules for which the number of heavy atoms is in this list
max_config (int) – Maximum number of configurations included for each molecule.
target (Dict[str,str]) – Entries specify the targets to extract. key: target_name, the name assigned to the target; value: the key that the ANI-1 file assigns to this target.
ani1_path (str) – The relative path to the data file. Defaults to ‘data/ANI-1ccx_clean_fullentry.h5’
exclude (List[str], optional) – Exclude these molecule names from the returned molecules. Defaults to [].
- Returns
- Each Dict contains the data for a single molecular structure:
{
‘name’: str with the name ANI1 assigns to this molecule type,
‘iconfig’: int with the number ANI1 assigns to this structure,
‘atomic_numbers’: List of Zs,
‘coordinates’: numpy array (:,3) with cartesian coordinates,
‘targets’: Dict whose keys are the target_names in the target argument and whose values are numpy arrays with the ANI-1 data
}
- Return type
molecules (List[Dict])
- Notes: The ANI-1 data h5 files are indexed by molecule name. For each molecule, the data is stored in arrays whose first dimension is the configuration number, e.g. coordinates(iconfig,atom_num,3). This function treats each configuration as its own molecular structure. Each returned dictionary includes the ANI1 name and configuration number, along with the data for that individual molecular structure.
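The selection rules in the parameter list and the per-configuration layout described in the notes can be illustrated with a small sketch. This is not the module's implementation: `passes_filters` and `unroll_configurations` are hypothetical helpers, plain nested lists stand in for the numpy arrays and the h5 file, a heavy atom is assumed to mean any atom with Z > 1, and keeping the first `max_config` configurations is an assumption about ordering.

```python
from typing import Any, Dict, List


def passes_filters(atomic_numbers: List[int], allowed_Z: List[int],
                   heavy_atoms: List[int]) -> bool:
    """Hypothetical filter: every element must be in allowed_Z and the
    heavy-atom count (assumed here to mean Z > 1) must be in heavy_atoms."""
    if any(z not in allowed_Z for z in atomic_numbers):
        return False
    n_heavy = sum(1 for z in atomic_numbers if z > 1)
    return n_heavy in heavy_atoms


def unroll_configurations(name: str, atomic_numbers: List[int],
                          coordinates: List[List[List[float]]],
                          targets: Dict[str, List[Any]],
                          max_config: int) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn one per-molecule entry, indexed as
    coordinates[iconfig][atom][xyz], into one dict per configuration."""
    # Assumption: the first max_config configurations are the ones kept.
    n_keep = min(max_config, len(coordinates))
    structures = []
    for iconfig in range(n_keep):
        structures.append({
            "name": name,
            "iconfig": iconfig,
            "atomic_numbers": list(atomic_numbers),
            "coordinates": coordinates[iconfig],
            "targets": {t: values[iconfig] for t, values in targets.items()},
        })
    return structures


# Toy entry: three configurations of a two-atom molecule (made-up numbers).
toy_coords = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.70]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.80]],
]
toy_targets = {"Etot": [-1.10, -1.12, -1.09]}

structures = unroll_configurations("toy_H2", [1, 1], toy_coords,
                                   toy_targets, max_config=2)
```

With `max_config=2`, the toy entry yields two dictionaries, each carrying the same `name` and `atomic_numbers` but its own `iconfig`, coordinates, and target values.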
- ani1_interface.get_ani1data_boosted(allowed_Z: List[int], heavy_atoms: List[int], target_atoms: List[int], criterion: str, max_config: int, boosted_config: int, target: Dict[str, str], ani1_path: str = ani1_path, exclude: List[str] = []) -> List[Dict]
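Extracts data from the ANI-1 data files, allowing a different maximum number of configurations (boosted_config) for molecules that contain the target atoms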
- Parameters
allowed_Z (List[int]) – Include only molecules whose elements are in this list
heavy_atoms (List[int]) – Include only molecules for which the number of heavy atoms is in this list
target_atoms (List[int]) – List of atomic numbers for atoms that need more representation. For example, if O needs more representation, target_atoms will include [8…]
criterion (str) – The requirement for boosted molecules, one of ‘any’ or ‘all’. If ‘any’, any molecule that contains at least one of the target atoms is boosted. If ‘all’, only molecules that contain all of the target atoms are boosted.
max_config (int) – Maximum number of configurations included for each molecule by default.
boosted_config (int) – Maximum number of configurations included for each boosted molecule, i.e. each molecule that satisfies criterion with respect to target_atoms.
target (Dict[str,str]) – Entries specify the targets to extract. key: target_name, the name assigned to the target; value: the key that the ANI-1 file assigns to this target.
ani1_path (str) – The relative path to the data file. Defaults to ‘data/ANI-1ccx_clean_fullentry.h5’
exclude (List[str], optional) – Exclude these molecule names from the returned molecules. Defaults to [].
- Returns
- Each Dict contains the data for a single molecular structure:
{
‘name’: str with the name ANI1 assigns to this molecule type,
‘iconfig’: int with the number ANI1 assigns to this structure,
‘atomic_numbers’: List of Zs,
‘coordinates’: numpy array (:,3) with cartesian coordinates,
‘targets’: Dict whose keys are the target_names in the target argument and whose values are numpy arrays with the ANI-1 data
}
- Return type
molecules (List[Dict])
- Notes: The ANI-1 data h5 files are indexed by molecule name. For each molecule, the data is stored in arrays whose first dimension is the configuration number, e.g. coordinates(iconfig,atom_num,3). This function treats each configuration as its own molecular structure. Each returned dictionary includes the ANI1 name and configuration number, along with the data for that individual molecular structure.
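The ‘any’ and ‘all’ settings of criterion can be read as a per-molecule choice between max_config and boosted_config. The sketch below illustrates that reading only; `config_limit` is a hypothetical helper, not a function of this module, and rejecting any other criterion value with a ValueError is an assumption.

```python
from typing import List


def config_limit(atomic_numbers: List[int], target_atoms: List[int],
                 criterion: str, max_config: int, boosted_config: int) -> int:
    """Hypothetical helper: return the configuration cap for one molecule."""
    present = set(atomic_numbers)
    wanted = set(target_atoms)
    if criterion == "any":
        # Boost if at least one target atom occurs in the molecule.
        boosted = bool(present & wanted)
    elif criterion == "all":
        # Boost only if every target atom occurs in the molecule.
        boosted = wanted <= present
    else:
        # Assumption: any other value is treated as an error.
        raise ValueError("criterion must be 'any' or 'all'")
    return boosted_config if boosted else max_config


# Toy molecules given as lists of atomic numbers.
water = [8, 1, 1]
formamide = [6, 8, 7, 1, 1, 1]
methane = [6, 1, 1, 1, 1]
molecules = (water, formamide, methane)

# Boost molecules containing N (7) and/or O (8).
limits_any = [config_limit(m, [7, 8], "any", 10, 40) for m in molecules]
limits_all = [config_limit(m, [7, 8], "all", 10, 40) for m in molecules]
```

Under ‘any’, water and formamide both receive the boosted cap, while under ‘all’ only formamide does, because water lacks nitrogen. Methane keeps the default cap in both cases.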