Utility functions for protein fragmentation

Internal utility functions for the protein fragmentation process.

Functions:
  • validate_fragmentation_parameters: Validates the parameters used for protein fragmentation.

  • merge_overlapping_domains: Merges overlapping domains within a list of domains.

  • check_valid_cutpoint: Helper function to validate potential fragment boundaries.

  • find_next_start: Finds the next valid fragment start position.

  • recursive_fragmentation: Main function for recursively generating fragments.

  • break_in_half: Splits a given Protein or ProteinSubsection object into two, ensuring no domains are broken and that the new subsections overlap.

Dependencies:
  • time: Used for setting a time limit for recursive fragmentation.

  • Domain: A class representing a domain within a protein sequence.

  • Protein: A class representing a protein sequence.

  • ProteinSubsection: A class representing a subsection of a protein sequence.

alphafragment.fragmentation_methods.break_in_half(protein, length, overlap)

Splits a given Protein or ProteinSubsection object into two subsections, ensuring no domains are broken and that the subsections overlap. The split is as close to the center as possible.

Parameters:
  • protein (Protein or ProteinSubsection): The protein or protein subsection object to be split.

  • length (dict): Dictionary containing the ideal, minimum, and maximum length values.

  • overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values.

Returns:
  • tuple: A tuple containing two new ProteinSubsection objects if valid; otherwise, None.
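The centre-seeking split described above can be sketched as follows. This is a minimal illustration, not the alphafragment implementation. It works on a sequence length instead of a Protein object and uses a hypothetical Domain stand-in with half-open [start, end) slice indices. It returns index pairs rather than ProteinSubsection objects. The search tries every first-half end and every overlap in [overlap['min'], overlap['max']] whose two boundaries avoid cutting a domain. It keeps the pair whose overlap is centred closest to the middle of the sequence, and breaks ties by closeness to overlap['ideal']. Requiring each half to be at least length['min'] long is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in: a domain occupying slice indices [start, end)."""
    start: int
    end: int


def breaks_domain(cut, domains):
    """True if slicing before index `cut` would split some domain."""
    return any(d.start < cut < d.end for d in domains)


def split_in_half(seq_len, domains, length, overlap):
    """Return ((0, first_end), (second_start, seq_len)), or None if no split works."""
    best, best_key = None, None
    for first_end in range(1, seq_len):
        if breaks_domain(first_end, domains):
            continue
        for ov in range(overlap['min'], overlap['max'] + 1):
            second_start = first_end - ov
            if second_start <= 0 or breaks_domain(second_start, domains):
                continue
            # Assumption: each half must be at least length['min'] residues long.
            if first_end < length['min'] or seq_len - second_start < length['min']:
                continue
            # Primary key: distance of the overlap's centre from the midpoint
            # (both sides doubled to stay in integers); secondary key:
            # distance of this overlap from the ideal overlap.
            key = (abs(first_end + second_start - seq_len), abs(ov - overlap['ideal']))
            if best_key is None or key < best_key:
                best, best_key = ((0, first_end), (second_start, seq_len)), key
    return best
```

Consider a 100-residue sequence with a domain at [40, 60). The call split_in_half(100, [Domain(40, 60)], {'min': 30, 'ideal': 50, 'max': 70}, {'min': 5, 'ideal': 10, 'max': 15}) returns ((0, 40), (35, 100)). The overlap cannot straddle the domain, so the split moves to the domain's left edge.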

alphafragment.fragmentation_methods.check_valid_cutpoint(res, domains, sequence_end)

Checks if a slicing index is a valid cutpoint.

Parameters:
  • res (int): The residue position to check; the sequence would be sliced immediately before this residue.

  • domains (list of Domain): The domains within the protein.

  • sequence_end (int): The last residue position in the protein sequence.

Returns:
  • bool: True if the residue position is a valid cutpoint; False otherwise.
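A cutpoint check of this kind can be sketched as below. The sketch assumes 0-based slice indices and a hypothetical Domain stand-in whose residues occupy [start, end). The library's own indexing convention may differ: it describes sequence_end as the last residue position, whereas the hypothetical is_valid_cutpoint takes the sequence length. Under these assumptions, a cut before res splits a domain exactly when start < res < end.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in: a domain occupying slice indices [start, end)."""
    start: int
    end: int


def is_valid_cutpoint(res, domains, sequence_length):
    """Return True if slicing the sequence before index `res` is allowed."""
    # The cut must fall within the sequence (0 and sequence_length are its ends).
    if not 0 <= res <= sequence_length:
        return False
    # Reject cuts that land strictly inside a domain, which would split it.
    return not any(d.start < res < d.end for d in domains)
```

Take a domain at [10, 30) in a 50-residue sequence. Cuts at 10 and 30 fall on the domain's edges and are accepted. A cut at 20 is rejected because it lies inside the domain, and 51 is rejected because it lies outside the sequence.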

alphafragment.fragmentation_methods.find_next_start(res, protein, domains, overlap)

Finds the next valid fragment start position.

Parameters:
  • res (int): The current residue position.

  • protein (Protein): The protein object to be fragmented.

  • domains (list of Domain): The domains within the protein.

  • overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format: {‘min’: min_overlap, ‘ideal’: ideal_overlap, ‘max’: max_overlap} where min_overlap, ideal_overlap, and max_overlap are all integers, with min_overlap <= ideal_overlap <= max_overlap.

Returns:
  • int or None: The next valid fragment start position if found; otherwise, None.
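One plausible way to choose the next start is to step back from the end of the previous fragment by an overlap. The ideal overlap is tried first, then values progressively further from it within [min, max], and the first position that is a valid cutpoint is accepted. The sketch below is hypothetical. find_next_start_sketch takes a sequence length instead of a Protein and interprets res as the exclusive end of the fragment just placed. It also repeats a cutpoint helper so that it runs on its own, and assumes half-open [start, end) domains.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in: a domain occupying slice indices [start, end)."""
    start: int
    end: int


def is_valid_cutpoint(res, domains, sequence_length):
    """True if `res` lies in [0, sequence_length] and splits no domain."""
    return 0 <= res <= sequence_length and not any(d.start < res < d.end for d in domains)


def find_next_start_sketch(res, sequence_length, domains, overlap):
    """Return the next fragment start for a fragment ending at `res`, or None."""
    # Overlap sizes ordered by distance from the ideal (ties go to the smaller overlap).
    candidates = sorted(
        range(overlap['min'], overlap['max'] + 1),
        key=lambda ov: (abs(ov - overlap['ideal']), ov),
    )
    for ov in candidates:
        start = res - ov
        if is_valid_cutpoint(start, domains, sequence_length):
            return start
    return None
```

Suppose there is a domain at [40, 60) and the overlap is {'min': 5, 'ideal': 10, 'max': 15}. A fragment ending at 65 then gives a next start of 60, because every overlap from 6 to 15 would start the next fragment inside the domain.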

alphafragment.fragmentation_methods.merge_overlapping_domains(domains)

Merges overlapping domains within a list of domains.

Parameters:
  • domains (list of Domain): List of domain objects.

Returns:
  • list of Domain: A list of domains where overlapping domains have been merged into single entries.
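Merging overlapping domains is the standard interval-merging pass: sort by start, then fold each domain into the previous one whenever the two overlap. The sketch assumes a hypothetical Domain stand-in with half-open [start, end) ranges and builds new objects rather than mutating the input. The description above does not say whether domains that merely touch are merged, and this sketch does not merge them.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in: a domain occupying slice indices [start, end)."""
    start: int
    end: int


def merge_domains(domains):
    """Return a new list in which overlapping domains are merged."""
    merged = []
    for d in sorted(domains, key=lambda d: d.start):
        if merged and d.start < merged[-1].end:
            # Overlaps the previous merged domain: extend that domain's end.
            merged[-1] = Domain(merged[-1].start, max(merged[-1].end, d.end))
        else:
            merged.append(Domain(d.start, d.end))
    return merged
```

For example, merge_domains([Domain(50, 70), Domain(10, 30), Domain(25, 40), Domain(40, 45)]) returns [Domain(start=10, end=40), Domain(start=40, end=45), Domain(start=50, end=70)].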

alphafragment.fragmentation_methods.recursive_fragmentation(protein, domains, fragment_start, length, overlap, original_max_len, cutpoints=None, time_limit=None, start_time=None)

Recursively splits a protein sequence into overlapping fragments without breaking any domain. If the process exceeds the specified time limit, it returns “TIME_LIMIT_EXCEEDED”, signaling that the protein should be split further.

Parameters:
  • protein (Protein): The protein object to fragment.

  • domains (list of Domain): The domains within the protein. These are passed separately rather than read from protein.domain_list because overlapping domains must be merged first.

  • fragment_start (int): The starting position for fragmentation.

  • length (dict): Dictionary containing the ideal, minimum, and maximum length values.

  • overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format: {‘min’: min_overlap, ‘ideal’: ideal_overlap, ‘max’: max_overlap} where min_overlap, ideal_overlap, and max_overlap are all integers, with min_overlap <= ideal_overlap <= max_overlap.

  • original_max_len (int): The original maximum fragment length.

  • cutpoints (list of tuples, optional): Accumulator for storing fragment cutpoints.

  • time_limit (float, optional): The maximum time allowed for the operation, in seconds.

  • start_time (float, optional): The start time of the operation.

Returns:
  • list of tuples, “TIME_LIMIT_EXCEEDED”, or None: The list of fragment cutpoints if successful; “TIME_LIMIT_EXCEEDED” if the time limit is exceeded; None if no valid fragmentation pattern is found.
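The recursive search described above can be sketched as a depth-first backtracking search. This is a hypothetical illustration, not the alphafragment code. fragment_sketch takes a sequence length instead of a Protein and omits original_max_len, whose role is not described here. It assumes half-open [start, end) domains and requires every fragment, including the last, to be between length['min'] and length['max'] residues long. At each step it tries fragment lengths closest to length['ideal'] first, then overlaps closest to overlap['ideal'], and backtracks on dead ends. The time check runs on every call, so a runaway search returns “TIME_LIMIT_EXCEEDED”.

```python
import time
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in: a domain occupying slice indices [start, end)."""
    start: int
    end: int


def _valid_cut(res, domains, n):
    """True if `res` lies in [0, n] and does not split any domain."""
    return 0 <= res <= n and not any(d.start < res < d.end for d in domains)


def _by_closeness(lo, hi, ideal):
    """Integers in [lo, hi], nearest to `ideal` first (ties: smaller first)."""
    return sorted(range(lo, hi + 1), key=lambda x: (abs(x - ideal), x))


def fragment_sketch(n, domains, fragment_start, length, overlap,
                    cutpoints=None, time_limit=None, start_time=None):
    """Return a list of (start, end) fragments covering [fragment_start, n),
    "TIME_LIMIT_EXCEEDED", or None if this branch has no valid pattern."""
    if cutpoints is None:
        cutpoints = []
    if start_time is None:
        start_time = time.monotonic()
    if time_limit is not None and time.monotonic() - start_time > time_limit:
        return "TIME_LIMIT_EXCEEDED"

    for frag_len in _by_closeness(length['min'], length['max'], length['ideal']):
        end = fragment_start + frag_len
        if end > n or not _valid_cut(end, domains, n):
            continue
        if end == n:
            # This fragment reaches the end of the sequence: pattern complete.
            return cutpoints + [(fragment_start, end)]
        for ov in _by_closeness(overlap['min'], overlap['max'], overlap['ideal']):
            next_start = end - ov
            # The next fragment must move forward and must not split a domain.
            if next_start <= fragment_start or not _valid_cut(next_start, domains, n):
                continue
            result = fragment_sketch(n, domains, next_start, length, overlap,
                                     cutpoints + [(fragment_start, end)],
                                     time_limit, start_time)
            if result is not None:
                # A complete pattern or the time-limit sentinel: pass it up.
                return result
    return None  # dead end; the caller backtracks
```

Take a 90-residue sequence with no domains, length {'min': 40, 'ideal': 50, 'max': 60} and overlap {'min': 5, 'ideal': 10, 'max': 15}. For this input the sketch returns [(0, 50), (40, 90)]. It uses time.monotonic because that clock is unaffected by system clock changes. The real function's start_time may come from a different clock.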

alphafragment.fragmentation_methods.validate_fragmentation_parameters(protein, length, overlap)

Validates the parameters used for protein fragmentation.

Parameters:
  • protein (Protein): The protein object to be fragmented.

  • length (dict): Dictionary containing the ideal, minimum, and maximum length values, in the format: {‘min’: min_len, ‘ideal’: ideal_len, ‘max’: max_len} where min_len, ideal_len, and max_len are all integers, with min_len <= ideal_len <= max_len.

  • overlap (dict): Dictionary containing the ideal, minimum, and maximum overlap values, in the format: {‘min’: min_overlap, ‘ideal’: ideal_overlap, ‘max’: max_overlap} where min_overlap, ideal_overlap, and max_overlap are all integers, with min_overlap <= ideal_overlap <= max_overlap.

Returns:
  • None

Raises:
  • ValueError: If any of the parameter validations fail.

  • TypeError: If the protein input is not an instance of the Protein class.
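The validation described above might look like the following. This sketch rests on several assumptions. Protein is a hypothetical stand-in. The specific rules (exactly the keys 'min', 'ideal' and 'max', non-negative integer values, and min <= ideal <= max) are inferred from the parameter descriptions rather than taken from the library. The final rule, that overlap['max'] must be smaller than length['min'], is purely an assumption. It is included because a fragment no longer than its overlap could fail to advance along the sequence.

```python
class Protein:
    """Hypothetical stand-in for alphafragment's Protein class."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence


def validate_params_sketch(protein, length, overlap):
    """Raise TypeError/ValueError on bad input; return None otherwise."""
    if not isinstance(protein, Protein):
        raise TypeError("protein must be an instance of Protein")
    for label, params in (("length", length), ("overlap", overlap)):
        if not isinstance(params, dict) or set(params) != {"min", "ideal", "max"}:
            raise ValueError(f"{label} must be a dict with keys 'min', 'ideal' and 'max'")
        # bool is a subclass of int, so exclude it explicitly.
        if not all(isinstance(v, int) and not isinstance(v, bool) and v >= 0
                   for v in params.values()):
            raise ValueError(f"{label} values must be non-negative integers")
        if not params["min"] <= params["ideal"] <= params["max"]:
            raise ValueError(f"{label} must satisfy min <= ideal <= max")
    # Assumption: a fragment must be longer than any overlap, otherwise
    # consecutive fragments could fail to advance along the sequence.
    if overlap["max"] >= length["min"]:
        raise ValueError("overlap['max'] must be smaller than length['min']")
```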