Identifying protein domains using AlphaFoldDB
This module interacts with the AlphaFold Database (AFDB) to obtain and analyze Predicted Aligned Error (PAE) data for identifying protein domains. Domains are determined from PAE values: residue pairs whose PAE falls below a threshold are grouped into the same domain.
- Functions:
read_afdb_json(accession_id): Retrieves PAE data.
find_domain_by_res(domains, res): Helper function to identify the domain a residue is in.
find_domains_from_pae(pae): Groups residues into domains based on PAE.
- Dependencies:
requests: Used for making HTTP requests to the AlphaFold Database.
.classes.Domain: The Domain class used to represent protein domains.
- alphafragment.alphafold_db_domain_identification.find_domain_by_res(domains, res)
Helper function to find the domain that contains a given residue.
- Parameters:
domains (list of Domain): The list of current domain objects.
res (int): The residue index to find within the domains.
- Returns:
Domain object if the residue is found within a domain, None otherwise.
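The lookup described above can be sketched as follows. This is a minimal illustration, not the package's code: the Domain dataclass here is a hypothetical stand-in for .classes.Domain, and its attribute names (num, start, end) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    """Hypothetical stand-in for .classes.Domain (attribute names assumed)."""
    num: int    # unique identifier
    start: int  # first residue index, 0-based, inclusive
    end: int    # last residue index, 0-based, inclusive


def find_domain_by_res(domains: List[Domain], res: int) -> Optional[Domain]:
    """Return the first domain whose residue range contains res, else None."""
    for domain in domains:
        if domain.start <= res <= domain.end:
            return domain
    return None


domains = [Domain(1, 0, 49), Domain(2, 80, 120)]
hit = find_domain_by_res(domains, 85)   # falls inside the second domain
miss = find_domain_by_res(domains, 60)  # falls in the gap between domains
```

Returning None rather than raising lets the caller decide whether an unassigned residue should start a new domain.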
- alphafragment.alphafold_db_domain_identification.find_domains_from_pae(pae, method='definite', custom_params=None)
Analyzes Predicted Aligned Error (PAE) data to group residues into domains. This function iterates through residue pairs, determining their domain membership based on PAE values and residue distances. Domains are represented as Domain objects with unique identifiers, start, and end residues.
- Parameters:
pae (list of lists): A 2D matrix of PAE values between residue pairs, where pae[i][j] is the PAE between residues i and j.
method (str, optional): Strategy for grouping residues into domains. Default is ‘definite’. Options are:
‘cautious’ - Groups residues into domains with moderate PAE thresholds, aiming to balance sensitivity and specificity.
‘definite’ - Only groups residues into domains if there is very high confidence in their relative positions. Likely to produce smaller domains.
‘custom’ - Allows specification of custom PAE thresholds via the custom_params argument.
custom_params (dict, optional): Required if method is ‘custom’, ignored otherwise. Must include the following keys, with integer values for each:
‘res_dist_cutoff’ (int) - The residue distance threshold that separates "close" residue pairs from "further" ones. Has no effect if close_pae_val == further_pae_val, or if set below 4, because residue pairs closer than this are ignored anyway due to high baseline confidence in their relative positions. Set to 10 for the cautious grouping method, and irrelevant for the definite grouping method, where the close and further PAE thresholds are equal.
‘close_pae_val’ (int) - The PAE threshold which residue pairs must fall below to be considered within the same domain, if the distance between them is between 4 and res_dist_cutoff. Set to 4 for cautious grouping method and 4 for definite grouping method.
‘further_pae_val’ (int) - The PAE threshold which residue pairs must fall below to be considered within the same domain, if the distance between them is greater than res_dist_cutoff. Set to 11 for the cautious grouping method and 4 for the definite grouping method.
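To make the three keys concrete, the sketch below writes the two built-in presets as custom_params-style dictionaries and shows how a threshold might be chosen for one residue pair. The helper name pae_threshold and the constant MIN_SEPARATION are hypothetical and only illustrate the description above; the value of res_dist_cutoff in DEFINITE is an arbitrary placeholder, since it has no effect when both thresholds are equal.

```python
# Presets as described above, written as custom_params-style dictionaries.
CAUTIOUS = {"res_dist_cutoff": 10, "close_pae_val": 4, "further_pae_val": 11}
# Both thresholds are equal here, so res_dist_cutoff is a placeholder.
DEFINITE = {"res_dist_cutoff": 10, "close_pae_val": 4, "further_pae_val": 4}

MIN_SEPARATION = 4  # pairs closer than this are never evaluated


def pae_threshold(i, j, params):
    """Return the PAE threshold for residues i and j, or None if too close."""
    distance = abs(i - j)
    if distance < MIN_SEPARATION:
        return None
    if distance <= params["res_dist_cutoff"]:
        return params["close_pae_val"]
    return params["further_pae_val"]


near = pae_threshold(0, 6, CAUTIOUS)   # within res_dist_cutoff
far = pae_threshold(0, 30, CAUTIOUS)   # beyond res_dist_cutoff
```

A dictionary of the same shape can be passed as custom_params together with method='custom'.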
- Returns:
A list of Domain objects, each representing a domain with a unique identifier and the range of residues it encompasses. Domain positions use Pythonic 0-based indexing.
- Function logic:
Residues are grouped into the same domain based on a comparison of their Predicted Aligned Error (PAE) values against thresholds determined by their proximity. If two residues are sufficiently close (based on a predefined or custom distance threshold), their PAE value must fall below a stricter, lower threshold to confirm high confidence in their proximity. For residues further apart, a higher PAE threshold can be used to account for the lower baseline confidence in their relative positions.
If a residue pair is judged to be in the same domain, the function checks whether either residue is already part of an existing domain. If so, the other residue is added to that domain; otherwise a new domain is created that includes both residues. All residues between the two being assessed are automatically included in the domain as well: no gaps are allowed within domains.
Very close residues will always have high confidence in their relative positions, so the 3 neighbouring residues on either side of a residue are not evaluated.
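The logic above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's implementation: domains are plain [start, end] lists rather than Domain objects, only pae[i][j] with i < j is consulted, and a final pass merges any ranges that grew to overlap. The function name group_domains is hypothetical.

```python
def group_domains(pae, res_dist_cutoff=10, close_pae_val=4, further_pae_val=11):
    """Return inclusive, 0-based [start, end] ranges grouped by PAE (sketch)."""
    domains = []  # each entry is a mutable [start, end] pair

    def find(res):
        # Return the first range containing res, or None.
        for dom in domains:
            if dom[0] <= res <= dom[1]:
                return dom
        return None

    n = len(pae)
    for i in range(n):
        # Start at i + 4: the 3 nearest neighbours are not evaluated.
        for j in range(i + 4, n):
            close = (j - i) <= res_dist_cutoff
            threshold = close_pae_val if close else further_pae_val
            if pae[i][j] >= threshold:
                continue  # not confident enough to share a domain
            dom = find(i)
            if dom is None:
                dom = find(j)
            if dom is None:
                domains.append([i, j])  # new domain spanning i..j
            else:
                dom[0] = min(dom[0], i)  # extend so there are no gaps
                dom[1] = max(dom[1], j)

    # Merge ranges that overlap after extension (an assumption of this sketch).
    merged = []
    for start, end in sorted(domains):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


# Toy matrix: residues 0-7 and 12-19 form two confident blocks.
def toy(i, j):
    same_block = (i < 8 and j < 8) or (i >= 12 and j >= 12)
    return 1 if same_block else 20

pae = [[toy(i, j) for j in range(20)] for i in range(20)]
result = group_domains(pae)
```

The defaults shown correspond to the cautious preset described in the parameter list.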
- Raises:
ValueError: If the method is ‘custom’ and custom_params is not provided or missing necessary keys, if an invalid method is specified, or if the pae data is in the wrong format (not a square matrix of numbers).
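The error conditions listed above could be checked along these lines. This is a hedged sketch: the function name validate_inputs, the error messages, and the decision to reject an empty matrix are assumptions, not taken from the package.

```python
from numbers import Real

VALID_METHODS = ("cautious", "definite", "custom")
REQUIRED_KEYS = ("res_dist_cutoff", "close_pae_val", "further_pae_val")


def validate_inputs(pae, method="definite", custom_params=None):
    """Raise ValueError for the cases described above (sketch only)."""
    if method not in VALID_METHODS:
        raise ValueError(f"Invalid method: {method!r}")
    if method == "custom":
        if not isinstance(custom_params, dict):
            raise ValueError("custom_params is required when method='custom'")
        missing = [key for key in REQUIRED_KEYS if key not in custom_params]
        if missing:
            raise ValueError(f"custom_params is missing keys: {missing}")
    # Assumption: an empty matrix is treated as invalid.
    if not isinstance(pae, list) or not pae:
        raise ValueError("pae must be a non-empty list of lists")
    size = len(pae)
    for row in pae:
        if not isinstance(row, list) or len(row) != size:
            raise ValueError("pae must be a square matrix")
        # bool is a subclass of int, so it is excluded explicitly.
        if not all(isinstance(v, Real) and not isinstance(v, bool) for v in row):
            raise ValueError("pae must contain only numbers")


validate_inputs([[0.0, 2.5], [2.5, 0.0]])  # passes silently
```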
- Note:
Domain positions are 0-based, so the start and end values are one less than the conventional 1-based residue numbers.
- alphafragment.alphafold_db_domain_identification.read_afdb_json(accession_id)
Fetches and returns the Predicted Aligned Error (PAE) data from the AlphaFold Database (AFDB) for a given UniProt accession ID.
- Parameters:
accession_id (str): The accession ID for which to fetch the corresponding AlphaFold PAE data. If ‘na’ or a similar placeholder is provided, the function will not attempt a request and will return None immediately.
- Returns:
list of lists or None: The predicted_aligned_error data as a list of lists if the request is successful and the data is present; otherwise, None.
- Prints an error message and returns None if:
The HTTP request to retrieve the file fails (e.g., file not found, network problems).
The ‘predicted_aligned_error’ data cannot be found within the response, indicating either an issue with the response data structure or the absence of PAE data for the provided accession ID.
- Note:
This function requires the requests library to make HTTP requests.
The function assumes that AlphaFold IDs take the form ‘AF-[a UniProt accession]-F1’. If multiple fragments are associated with a UniProt ID, only fragment 1 is used.
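The ID convention and the handling of missing data can be illustrated without any network access. In the sketch below, the helper names alphafold_id and extract_pae, the set of placeholder strings, and the assumption that the parsed response is either a dict or a list wrapping one dict are all hypothetical; the HTTP request itself is deliberately left out.

```python
import json

PLACEHOLDERS = {"", "na", "n/a", "none"}  # assumed placeholder values


def alphafold_id(accession_id):
    """Build the fragment-1 AlphaFold ID, or None for a placeholder accession."""
    if accession_id is None or accession_id.strip().lower() in PLACEHOLDERS:
        return None
    return f"AF-{accession_id.strip()}-F1"


def extract_pae(payload):
    """Return the 'predicted_aligned_error' matrix from parsed JSON, or None."""
    # Assumption: the payload is a dict, or a list wrapping a single dict.
    if isinstance(payload, list) and payload:
        payload = payload[0]
    if not isinstance(payload, dict):
        print("Error: unexpected response structure")
        return None
    pae = payload.get("predicted_aligned_error")
    if pae is None:
        print("Error: 'predicted_aligned_error' not found in response")
    return pae


# Offline usage with a hand-written response body.
body = '[{"predicted_aligned_error": [[0, 3], [3, 0]]}]'
pae = extract_pae(json.loads(body))
af_id = alphafold_id("P12345")
```

Keeping ID construction and payload parsing separate from the request makes both steps testable without contacting the database.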