Utility functions for protein fragmentation
Internal utility functions for the protein fragmentation process.
- Functions:
validate_fragmentation_parameters: Validates the parameters used for protein fragmentation.
merge_overlapping_domains: Merges overlapping domains within a list of domains.
check_valid_cutpoint: Helper function to validate potential fragment boundaries.
recursive_fragmentation: Main function for recursively generating fragments.
- Dependencies:
Domain: A class representing a domain within a protein sequence.
Protein: A class representing a protein sequence.
- alphafragment.fragmentation_methods.check_valid_cutpoint(res, domains, sequence_end)
Checks if a slicing index is a valid cutpoint.
- Parameters:
res (int): The residue position to check; the sequence would be cut immediately before this residue.
domains (list of Domain): The domains within the protein.
sequence_end (int): The last residue position in the protein sequence.
- Returns:
bool: True if the residue position is a valid cutpoint; False otherwise.
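The check described above can be illustrated with a minimal sketch. It is not the library's implementation: `SimpleDomain` is a hypothetical stand-in for Domain, and the sketch assumes inclusive domain bounds, that a cut before `res` is invalid only when `res - 1` and `res` lie in the same domain, and that positions up to one past `sequence_end` are allowed.

```python
from dataclasses import dataclass


@dataclass
class SimpleDomain:
    """Hypothetical stand-in for Domain; only start and end are modelled."""
    start: int
    end: int


def is_valid_cutpoint(res, domains, sequence_end):
    """Return True if cutting immediately before `res` splits no domain."""
    # Assumption: a cut may fall anywhere from 0 up to one past the last residue.
    if res < 0 or res > sequence_end + 1:
        return False
    for dom in domains:
        # Cutting before `res` separates res - 1 from res, so the cut is
        # invalid when both residues belong to the same domain.
        if dom.start <= res - 1 and res <= dom.end:
            return False
    return True


domains = [SimpleDomain(10, 20)]
print(is_valid_cutpoint(10, domains, 100))  # cut just before the domain starts
print(is_valid_cutpoint(15, domains, 100))  # cut inside the domain
print(is_valid_cutpoint(21, domains, 100))  # cut just after the domain ends
```

Under these assumptions, only the cut at position 15 is rejected, because it would separate residues 14 and 15 of the same domain.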
- alphafragment.fragmentation_methods.merge_overlapping_domains(domains)
Merges overlapping domains within a list of domains.
- Parameters:
domains (list of Domain): List of domain objects.
- Returns:
list of Domain: A list of domains where overlapping domains have been merged into single entries.
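One common way to merge overlapping intervals is to sort them by start position and extend the previous interval whenever the next one begins inside it. The sketch below shows that approach; it is an assumption about the general technique, not the library's code, and `SimpleDomain` is a hypothetical stand-in for Domain with inclusive `start` and `end` attributes.

```python
from dataclasses import dataclass


@dataclass
class SimpleDomain:
    """Hypothetical stand-in for Domain; only start and end are modelled."""
    start: int
    end: int


def merge_overlapping(domains):
    """Merge domains whose inclusive [start, end] ranges overlap."""
    merged = []
    for dom in sorted(domains, key=lambda d: d.start):
        if merged and dom.start <= merged[-1].end:
            # Overlaps the previous merged domain: extend it if needed.
            merged[-1].end = max(merged[-1].end, dom.end)
        else:
            # No overlap: start a new merged domain (copy, so inputs stay intact).
            merged.append(SimpleDomain(dom.start, dom.end))
    return merged


example = [SimpleDomain(40, 80), SimpleDomain(10, 50), SimpleDomain(100, 120)]
print(merge_overlapping(example))
# [SimpleDomain(start=10, end=80), SimpleDomain(start=100, end=120)]
```

The sketch copies each domain before extending it, so the input list is left unchanged.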
- alphafragment.fragmentation_methods.recursive_fragmentation(protein, domains, fragment_start, min_len, max_len, overlap, cutpoints=None)
Recursively splits a protein sequence into overlapping fragments, avoiding breaking domains.
- Parameters:
protein (Protein): The protein object to fragment.
domains (list of Domain): The list of domains within the protein. This is passed separately rather than read from protein.domain_list, because overlapping domains should be merged.
fragment_start (int): The starting position for fragmentation.
min_len (int): Minimum allowed fragment length.
max_len (int): Maximum allowed fragment length (may be increased during the fragmentation process).
overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format {'min': min_overlap, 'ideal': ideal_overlap, 'max': max_overlap}, where min_overlap, ideal_overlap and max_overlap are all integers with min_overlap <= ideal_overlap <= max_overlap.
cutpoints (list of tuples, optional): Accumulator for storing fragment cutpoints.
- Returns:
list of tuples or None: The list of fragment cutpoints if successful; otherwise, None.
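The recursive, backtracking idea can be sketched in simplified form. Everything here is an assumption made for illustration rather than the library's algorithm: the hypothetical `fragment` function takes a sequence length `n` instead of a Protein, domains are plain inclusive `(start, end)` tuples, fragments are half-open `(start, end)` tuples, and `max_len` is never increased.

```python
def valid_cut(res, domains):
    """A cut before `res` is invalid if res - 1 and res share a domain."""
    return not any(s <= res - 1 and res <= e for s, e in domains)


def fragment(n, domains, start, min_len, max_len, overlap, cutpoints=None):
    """Return a list of half-open (start, end) fragments, or None on failure."""
    cutpoints = [] if cutpoints is None else cutpoints
    remaining = n - start
    # Base case: the rest of the sequence fits in one final fragment.
    if min_len <= remaining <= max_len:
        return cutpoints + [(start, n)]
    if remaining < min_len:
        return None  # dead end; the caller backtracks
    # Try the ideal overlap first, then values progressively further from it.
    overlaps = sorted(range(overlap["min"], overlap["max"] + 1),
                      key=lambda o: abs(o - overlap["ideal"]))
    # Prefer long fragments, shrinking the end until a cut works.
    for end in range(start + max_len, start + min_len - 1, -1):
        if not valid_cut(end, domains):
            continue
        for ov in overlaps:
            nxt = end - ov  # the next fragment starts `ov` residues earlier
            if nxt <= start or not valid_cut(nxt, domains):
                continue
            result = fragment(n, domains, nxt, min_len, max_len, overlap,
                              cutpoints + [(start, end)])
            if result is not None:
                return result
    return None


overlap = {"min": 5, "ideal": 10, "max": 15}
print(fragment(100, [(45, 60)], 0, 20, 50, overlap))
# [(0, 45), (35, 85), (75, 100)]
```

In this example the first fragment ends at 45 rather than 50, because any later cut would split the domain spanning residues 45 to 60; that domain then sits intact inside the second fragment.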
- alphafragment.fragmentation_methods.validate_fragmentation_parameters(protein, min_len, max_len, overlap)
Validates the parameters used for protein fragmentation.
- Parameters:
protein (Protein): The protein object to be fragmented.
min_len (int): Minimum acceptable length for a protein fragment.
max_len (int): Maximum acceptable length for a protein fragment.
overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format {'min': min_overlap, 'ideal': ideal_overlap, 'max': max_overlap}, where min_overlap, ideal_overlap and max_overlap are all integers with min_overlap <= ideal_overlap <= max_overlap.
- Returns:
None
- Raises:
ValueError: If any of the parameter validations fail.
TypeError: If the protein input is not an instance of the Protein class.
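As a rough illustration of the kind of checks documented above, the sketch below validates the overlap dictionary and the length bounds. The function name `check_fragmentation_inputs` is hypothetical, the exact rules (for example, requiring `0 < min_len <= max_len`) are assumptions, and the Protein type check is omitted because the class is not modelled here.

```python
def check_fragmentation_inputs(min_len, max_len, overlap):
    """Hypothetical checks based on the documented constraints only."""
    for key in ("min", "ideal", "max"):
        if key not in overlap:
            raise ValueError(f"overlap is missing the {key!r} key")
        if not isinstance(overlap[key], int):
            raise ValueError(f"overlap[{key!r}] must be an integer")
    # Documented constraint: min_overlap <= ideal_overlap <= max_overlap.
    if not overlap["min"] <= overlap["ideal"] <= overlap["max"]:
        raise ValueError("expected min <= ideal <= max overlap")
    # Assumption: lengths must be positive and correctly ordered.
    if min_len <= 0 or min_len > max_len:
        raise ValueError("expected 0 < min_len <= max_len")


# A consistent set of parameters passes silently and returns None.
check_fragmentation_inputs(100, 250, {"min": 20, "ideal": 30, "max": 40})

# An overlap dictionary with min > ideal is rejected.
try:
    check_fragmentation_inputs(100, 250, {"min": 50, "ideal": 30, "max": 40})
except ValueError as err:
    print(f"rejected: {err}")
```

Like the documented function, the sketch returns None on success and signals every problem by raising an exception.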