Utility functions for protein fragmentation
Internal utility functions for the protein fragmentation process.
- Functions:
validate_fragmentation_parameters: Validates the parameters used for protein fragmentation.
merge_overlapping_domains: Merges overlapping domains within a list of domains.
check_valid_cutpoint: Helper function to validate potential fragment boundaries.
find_next_start: Finds the next valid fragment start position.
recursive_fragmentation: Main function for recursively generating fragments.
break_in_half: Splits a given Protein or ProteinSubsection object into two, ensuring no domains are broken and that the new subsections overlap.
- Dependencies:
time: Used for setting a time limit for recursive fragmentation.
Domain: A class representing a domain within a protein sequence.
Protein: A class representing a protein sequence.
ProteinSubsection: A class representing a subsection of a protein sequence.
- alphafragment.fragmentation_methods.break_in_half(protein, length, overlap)
Splits a given Protein or ProteinSubsection object into two subsections, ensuring no domains are broken and that the subsections overlap. The split is as close to the center as possible.
- Parameters:
protein (Protein or ProteinSubsection): The protein or protein subsection object to be split.
length (dict): Dictionary containing the ideal, minimum, and maximum length values.
overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values.
- Returns:
tuple or None: A tuple containing two new ProteinSubsection objects if a valid split exists; otherwise, None.
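The idea of splitting near the center without breaking a domain can be sketched as follows. This is a minimal illustration, not the library's implementation: `SimpleDomain`, `breaks_domain`, and `split_near_center` are hypothetical names, positions are assumed to be inclusive integers, and a single integer `overlap_size` stands in for the overlap dictionary. The length limits are ignored here.

```python
from dataclasses import dataclass


@dataclass
class SimpleDomain:
    """Hypothetical stand-in for Domain: inclusive start and end positions."""
    start: int
    end: int


def breaks_domain(cut, domains):
    """True if cutting immediately before `cut` separates residues of one domain."""
    return any(d.start < cut <= d.end for d in domains)


def split_near_center(first, last, domains, overlap_size):
    """Split the inclusive range [first, last] into two overlapping ranges.

    Returns ((first, left_end), (right_start, last)), or None if every
    candidate split would break a domain.
    """
    # Start of the shared region when the overlap is centered on the midpoint.
    center = (first + last + 1) // 2 - overlap_size // 2
    # Walk outward from the center: 0, -1, +1, -2, +2, ...
    for step in range(last - first + 1):
        for right_start in (center - step, center + step):
            left_end = right_start + overlap_size - 1
            # Both halves must be shorter than the original range.
            if not (first < right_start and left_end < last):
                continue
            if breaks_domain(right_start, domains) or breaks_domain(left_end + 1, domains):
                continue
            return (first, left_end), (right_start, last)
    return None


halves = split_near_center(0, 99, [SimpleDomain(40, 60)], overlap_size=10)
print(halves)  # ((0, 39), (30, 99))
```

In the example, the domain at 40 to 60 pushes the split away from the midpoint until both boundaries fall outside it, and the two halves share residues 30 to 39.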
- alphafragment.fragmentation_methods.check_valid_cutpoint(res, domains, sequence_end)
Checks if a slicing index is a valid cutpoint.
- Parameters:
res (int): The residue position to check (the sequence would be cut immediately before this residue).
domains (list of Domain): The domains within the protein.
sequence_end (int): The last residue position in the protein sequence.
- Returns:
bool: True if the residue position is a valid cutpoint; False otherwise.
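A cutpoint check of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the library's code: `SimpleDomain` and `is_valid_cutpoint` are hypothetical names, domain positions are assumed to be inclusive, and a cut is assumed to be allowed anywhere from position 0 to just past `sequence_end`.

```python
from dataclasses import dataclass


@dataclass
class SimpleDomain:
    """Hypothetical stand-in for Domain: inclusive start and end positions."""
    start: int
    end: int


def is_valid_cutpoint(res, domains, sequence_end):
    """Return True if cutting immediately before `res` splits no domain."""
    # Assumption: cuts outside the sequence are never valid.
    if res < 0 or res > sequence_end + 1:
        return False
    for domain in domains:
        # The cut separates residues res - 1 and res; it breaks a domain
        # only when both of those residues belong to that domain.
        if domain.start <= res - 1 and res <= domain.end:
            return False
    return True


domains = [SimpleDomain(10, 20)]
print(is_valid_cutpoint(10, domains, 99))  # True: the cut falls just before the domain
print(is_valid_cutpoint(15, domains, 99))  # False: the cut falls inside the domain
print(is_valid_cutpoint(21, domains, 99))  # True: the cut falls just after the domain
```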
- alphafragment.fragmentation_methods.find_next_start(res, protein, domains, overlap)
Finds the next valid fragment start position.
- Parameters:
res (int): The current residue position.
protein (Protein): The protein object to be fragmented.
domains (list of Domain): The domains within the protein.
overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format: {'min': min_overlap, 'ideal': ideal_overlap, 'max': max_overlap} where min_overlap, ideal_overlap, and max_overlap are all integers, with min_overlap <= ideal_overlap <= max_overlap.
- Returns:
int or None: The next valid fragment start position if found; otherwise, None.
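One plausible way to choose the next start is to try the ideal overlap first and then move outward toward the minimum and maximum. The sketch below illustrates that strategy only; it is not taken from the library. `SimpleDomain` and `next_start` are hypothetical names, `fragment_end` is assumed to be an exclusive slicing index, and the `protein` argument is left out.

```python
from dataclasses import dataclass


@dataclass
class SimpleDomain:
    """Hypothetical stand-in for Domain: inclusive start and end positions."""
    start: int
    end: int


def next_start(fragment_end, domains, overlap):
    """Return a start position for the next fragment, or None if none is valid."""
    # Order candidate overlap sizes by their distance from the ideal value.
    sizes = sorted(range(overlap["min"], overlap["max"] + 1),
                   key=lambda size: abs(size - overlap["ideal"]))
    for size in sizes:
        start = fragment_end - size
        # A start is usable if cutting just before it breaks no domain.
        if start >= 0 and not any(d.start < start <= d.end for d in domains):
            return start
    return None


overlap = {"min": 5, "ideal": 10, "max": 15}
print(next_start(50, [], overlap))                      # 40
print(next_start(50, [SimpleDomain(38, 42)], overlap))  # 38
```

In the second call, overlaps of 10, 9, 11, and 8 all place the start inside the domain, so the first valid choice is an overlap of 12.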
- alphafragment.fragmentation_methods.merge_overlapping_domains(domains)
Merges overlapping domains within a list of domains.
- Parameters:
domains (list of Domain): List of domain objects.
- Returns:
list of Domain: A list of domains where overlapping domains have been merged into single entries.
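Merging overlapping domains is an instance of the standard interval-merging technique: sort by start position, then extend the last merged interval whenever the next one overlaps it. A minimal sketch, assuming inclusive `start` and `end` attributes on a hypothetical `SimpleDomain` class (the real Domain class may carry more fields):

```python
from dataclasses import dataclass


@dataclass
class SimpleDomain:
    """Hypothetical stand-in for Domain: inclusive start and end positions."""
    start: int
    end: int


def merge_overlapping(domains):
    """Return a new list in which overlapping domains are combined."""
    merged = []
    for domain in sorted(domains, key=lambda d: d.start):
        if merged and domain.start <= merged[-1].end:
            # Overlaps the previous merged domain: extend it if needed.
            merged[-1].end = max(merged[-1].end, domain.end)
        else:
            # Copy the domain so the caller's objects are never modified.
            merged.append(SimpleDomain(domain.start, domain.end))
    return merged


domains = [SimpleDomain(5, 15), SimpleDomain(1, 6), SimpleDomain(20, 30)]
print(merge_overlapping(domains))
# [SimpleDomain(start=1, end=15), SimpleDomain(start=20, end=30)]
```

In this sketch, domains that merely touch (for example 1 to 5 and 6 to 10) stay separate; whether the library treats them the same way is not stated here.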
- alphafragment.fragmentation_methods.recursive_fragmentation(protein, domains, fragment_start, length, overlap, original_max_len, cutpoints=None, time_limit=None, start_time=None)
Recursively splits a protein sequence into overlapping fragments, avoiding breaking domains. If the process exceeds the specified time limit, it returns "TIME_LIMIT_EXCEEDED", signaling that the protein should be split further.
- Parameters:
protein (Protein): The protein object to fragment.
domains (list of Domain): The list of domains within the protein. protein.domain_list is not used directly because overlapping domains should be merged first.
fragment_start (int): The starting position for fragmentation.
length (dict): Dictionary containing the ideal, minimum, and maximum length values.
overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format: {'min': min_overlap, 'ideal': ideal_overlap, 'max': max_overlap} where min_overlap, ideal_overlap, and max_overlap are all integers, with min_overlap <= ideal_overlap <= max_overlap.
original_max_len (int): The original maximum fragment length.
cutpoints (list of tuples, optional): Accumulator for storing fragment cutpoints.
time_limit (float, optional): The maximum time allowed for the operation, in seconds.
start_time (float, optional): The start time of the operation.
- Returns:
list of tuples, "TIME_LIMIT_EXCEEDED", or None: The list of fragment cutpoints if successful; "TIME_LIMIT_EXCEEDED" if the time limit is exceeded; None if no valid fragmentation pattern is found.
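The three possible outcomes (a list of cutpoints, the time-limit sentinel, or None) can be illustrated with a toy backtracking search. This is a simplified sketch and not the library's algorithm: `fragment` is a hypothetical function, fragment ends are exclusive, `forbidden` is an assumed set of indices where a cut would break a domain, and a fixed integer overlap replaces the overlap dictionary.

```python
import time

TIME_LIMIT_EXCEEDED = "TIME_LIMIT_EXCEEDED"


def fragment(seq_len, start, min_len, max_len, overlap, forbidden,
             cutpoints=None, time_limit=None, start_time=None):
    """Search for (start, end) fragments that cover positions 0..seq_len - 1."""
    if overlap >= min_len:
        # Each new fragment must start after the previous one started.
        raise ValueError("overlap must be smaller than min_len")
    if cutpoints is None:
        cutpoints = []
    if start_time is None:
        start_time = time.monotonic()
    if time_limit is not None and time.monotonic() - start_time > time_limit:
        return TIME_LIMIT_EXCEEDED

    # Try the longest allowed fragment first, then progressively shorter ones.
    for end in range(min(start + max_len, seq_len), start + min_len - 1, -1):
        if end == seq_len:
            return cutpoints + [(start, end)]  # reached the end of the sequence
        next_start = end - overlap
        if end in forbidden or next_start in forbidden:
            continue  # one of the two boundaries would break a domain
        result = fragment(seq_len, next_start, min_len, max_len, overlap,
                          forbidden, cutpoints + [(start, end)],
                          time_limit, start_time)
        if result is not None:
            return result  # either a solution or the time-limit sentinel
    return None  # no fragment starting here leads to a full solution


print(fragment(100, 0, min_len=30, max_len=50, overlap=10, forbidden=set()))
# [(0, 50), (40, 80), (70, 100)]
```

Passing the sentinel up through every recursion level lets the caller distinguish "ran out of time" from "no valid pattern exists" and react differently to each.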
- alphafragment.fragmentation_methods.validate_fragmentation_parameters(protein, length, overlap)
Validates the parameters used for protein fragmentation.
- Parameters:
protein (Protein): The protein object to be fragmented.
length (dict): Dictionary containing the ideal, minimum, and maximum length values, in the format: {'min': min_len, 'ideal': ideal_len, 'max': max_len} where min_len, ideal_len, and max_len are all integers, with min_len <= ideal_len <= max_len.
overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format: {'min': min_overlap, 'ideal': ideal_overlap, 'max': max_overlap} where min_overlap, ideal_overlap, and max_overlap are all integers, with min_overlap <= ideal_overlap <= max_overlap.
- Returns:
None
- Raises:
ValueError: If any of the parameter validations fail.
TypeError: If the protein input is not an instance of the Protein class.
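The ordering constraints on the length and overlap dictionaries could be checked along these lines. This is a sketch of the dictionary checks only, with a hypothetical helper name `validate_bounds`; the exact checks and error messages the library uses are not specified here, and the Protein type check is omitted.

```python
def validate_bounds(name, bounds):
    """Raise ValueError unless `bounds` holds integers with min <= ideal <= max."""
    for key in ("min", "ideal", "max"):
        if key not in bounds:
            raise ValueError(f"{name} is missing the '{key}' key")
        value = bounds[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name}['{key}'] must be an integer")
    if not bounds["min"] <= bounds["ideal"] <= bounds["max"]:
        raise ValueError(f"{name} must satisfy min <= ideal <= max")


# Example values chosen for illustration only.
validate_bounds("length", {"min": 100, "ideal": 384, "max": 400})  # passes silently
try:
    validate_bounds("overlap", {"min": 20, "ideal": 10, "max": 30})
except ValueError as error:
    print(error)  # overlap must satisfy min <= ideal <= max
```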