cttrajectory module

This is where the cttrajectory shenanigans occur.

cttrajectory.py

CTTrajectory is the main class through which simulation trajectories are read in.

class camparitraj.cttrajectory.CTTrajectory(trajectory_filename=None, pdb_filename=None, TRJ=None, protein_grouping=None, pdblead=False, debug=False)

CTTrajectory class that holds a single simulation trajectory object.

__init__(trajectory_filename=None, pdb_filename=None, TRJ=None, protein_grouping=None, pdblead=False, debug=False)

CAMPARITraj trajectory object initializer.

CAMPARITraj will, by default, extract out the protein component from your trajectory automatically, which lets you ask questions about the protein only (i.e. without salt ions getting in the way).

Note that by default the mechanism by which individual proteins are identified is by cycling through the unique chains and determining if they are protein or not. You can also provide manual grouping via the protein_grouping option, which lets you define which residues should make up an individual protein. This can be useful if you have multiple proteins associated with the same chain, which happens in CAMPARI if you have more than 26 separate chains (i.e. every protein after the 26th is the ‘Z’ chain).

Parameters
  • trajectory_filename (str) – Filename which contains the trajectory file of interest. Normally this is __traj.xtc or __traj.dcd.

  • pdb_filename (str) – Filename which contains the pdb file associated with the trajectory of interest. Normally this is __START.pdb.

  • TRJ (mdtraj.Trajectory) –

    It is sometimes useful to redefine a trajectory and create a new CTTrajectory object from that trajectory. This could be done by writing the new trajectory to file, but this is extremely slow due to the I/O cost of reading from and writing to disk. If an mdtraj trajectory object is passed, it is used as the new trajectory from which the CTTrajectory object is constructed.

    Default = None

  • protein_grouping (list of lists of ints) –

    Lets you manually define protein groups to be considered independently.

    Default = None

  • pdblead (bool) –

    Lets you set the PDB file (which is normally ONLY used as a topology file) to be the first frame of the trajectory. This is useful when the first PDB file holds some specific reference information which you want to use (e.g. RMSD or Q).

    Default = False

  • debug (bool) – Prints warning/help information to help debug unexpected behavior during initial trajectory read-in. Default = False.
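
As a rough illustration of the format expected by protein_grouping, the sketch below builds a list of lists of residue indices from per-protein residue counts. The helper name build_protein_grouping and the example counts are hypothetical and not part of CAMPARITraj; the sketch assumes the proteins appear back to back in the PDB file and that residue indexing starts at 0.

```python
def build_protein_grouping(residue_counts):
    """Return one list of consecutive residue indices per protein.

    Hypothetical helper (not part of CAMPARITraj). Assumes the proteins
    appear back to back in the PDB file and that indexing starts at 0.
    """
    grouping = []
    start = 0
    for count in residue_counts:
        # Each protein gets the next `count` consecutive residue indices.
        grouping.append(list(range(start, start + count)))
        start += count
    return grouping


# Three proteins with 3, 2 and 4 residues respectively.
grouping = build_protein_grouping([3, 2, 4])
print(grouping)
```

A list built this way could then be supplied as protein_grouping=grouping when constructing a CTTrajectory object.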

__get_proteins(trajectory, debug)

Internal function that takes an MDTraj trajectory and returns a list of mdtraj trajectory objects corresponding to each protein in the system, ASSUMING that each protein is in its own chain.

The way this works is to cycle through each chain, identify if that chain contains protein or not, and if it does grab all the atoms in that chain and perform an atomslice using those atoms on the main trajectory.

Parameters

trajectory (mdtraj.Trajectory) – An already parsed trajectory object (i.e. checked for CAMPARI-relevant defects such as unitcell issues etc.)

Returns

Returns a tuple with three lists:

  • proteinTrajectoryList - contains a list of 0 or more CTProtein objects

  • resid_offset_list - contains a list of 0 or more integers which are resid offset values

  • atom_offset_list - contains a list of 0 or more integers which are atom offset values

Note that all three lists must be the same length (by definition).

Return type

tuple
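
The chain-cycling approach described above might look roughly like the following sketch. It is not the CAMPARITraj implementation: the function name is hypothetical, the rule "a chain is protein if any of its residues is protein" is an assumption, and the topology argument is only assumed to behave like an mdtraj.Topology (chains exposing residues and atoms, residues exposing is_protein, atoms exposing index). A small stand-in topology is used so the example runs without mdtraj.

```python
from types import SimpleNamespace


def protein_atom_indices_by_chain(topology):
    """Collect atom indices for every chain that contains protein residues.

    Sketch only: `topology` is assumed to behave like an mdtraj.Topology.
    """
    per_chain = []
    for chain in topology.chains:
        # Assumption: a chain counts as protein if any residue is protein.
        if any(residue.is_protein for residue in chain.residues):
            per_chain.append([atom.index for atom in chain.atoms])
    return per_chain


def make_chain(is_protein_flags, first_atom_index):
    """Build a stand-in chain with one atom per residue (demo only)."""
    residues = [SimpleNamespace(is_protein=flag) for flag in is_protein_flags]
    atoms = [
        SimpleNamespace(index=first_atom_index + i)
        for i in range(len(is_protein_flags))
    ]
    return SimpleNamespace(residues=residues, atoms=atoms)


# Stand-in system: a protein chain, an ion chain, and a second protein chain.
topology = SimpleNamespace(
    chains=[
        make_chain([True, True], 0),
        make_chain([False], 2),
        make_chain([True], 3),
    ]
)
indices = protein_atom_indices_by_chain(topology)
print(indices)
```

With a real mdtraj trajectory, each index list could be passed to the trajectory's atom_slice method to obtain a per-protein sub-trajectory.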

__get_proteins_by_residue(trajectory, residue_grouping, debug)

Internal function which returns a list of mdtraj trajectory objects corresponding to each protein where we explicitly define the residues in each protein.

Unlike the __get_proteins() function, which doesn’t require any manual input in identifying the proteins, here we provide a list of groups, where each group is the set of residues associated with a protein.

The way this works is to cycle through each group, and for each residue in each group grabs all the atoms and uses these to carry out an atomslice on the full trajectory.

Parameters
  • trajectory (mdtraj.Trajectory) – An already parsed trajectory object (i.e. checked for CAMPARI-relevant defects such as unitcell issues etc.)

  • residue_grouping (list of list of integers) – Must be a list containing one or more lists, where each internal list contains a set of monotonically increasing residues (which correspond to the full protein trajectory). In other words, each sub-list defines a single protein. The integer indexing here - importantly - uses the CAMPARITraj internal residue indexing, meaning that indexing begins at 0 from the first residue in the PDB file.
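
The constraints on residue_grouping can be checked before the grouping is used. The validator below is a minimal sketch with a hypothetical name, not a CAMPARITraj function; it assumes that "monotonically increasing" means strictly increasing and that negative indices are never valid.

```python
def validate_residue_grouping(residue_grouping):
    """Check the documented shape of a residue grouping.

    Hypothetical helper: raises ValueError unless the input is a non-empty
    list of non-empty lists of non-negative, strictly increasing integers.
    """
    if not residue_grouping:
        raise ValueError("need at least one group")
    for group in residue_grouping:
        if not group:
            raise ValueError("empty group")
        if group[0] < 0:
            raise ValueError("residue indices start at 0")
        # Compare each index with its predecessor in the same group.
        for previous, current in zip(group, group[1:]):
            if current <= previous:
                raise ValueError(f"group {group} is not strictly increasing")
    return True


# Two proteins of three residues each, indexed from 0.
print(validate_residue_grouping([[0, 1, 2], [3, 4, 5]]))
```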

__readTrajectory(trajectory_filename, pdb_filename, pdblead)

Internal function which parses and reads in a CAMPARI trajectory

Read a trajectory file. This was separated out into its own function in case we want to add additional sanity checks during the file loading.

Notably, older versions of CAMPARI write incorrect unitcell length vectors, which causes problems during read-in. This can be worked around by having GROMACS rebuild the trajectory. If this happens, an error is raised along with instructions on how to use GROMACS to fix it.

Parameters
  • trajectory_filename (str) – Filename which contains the trajectory file of interest. File type is automatically detected and handled by mdtraj’s ‘load’ command (i.e. md.load(filename, top=pdb_filename)).

  • pdb_filename (str) – Filename which contains the pdb file associated with the trajectory of interest. This defines the topology of the system and must match the trajectory in terms of number of atoms.

  • pdblead (bool) – Also extract the coordinates from the PDB file and append them to the front of the trajectory. This is useful if you are starting an analysis where that first structure should be a reference frame but it’s not actually included in the trajectory file.

Returns

Returns an mdtraj trajectory object

Return type

mdtraj.Trajectory

get_interchain_distance(proteinID1, proteinID2, R1, R2, A1='CA', A2='CA', stride=1, mode='atom')

Function which returns the distance between two specific atoms on two residues, or between two residues based on mdtraj’s atom selection mode rules (discussed below). Required inputs are the protein ID selectors and the resids being used. Resids should be used as they would normally be used for the CTProtein objects associated with proteinID1 and proteinID2.

For inter-atomic distances, atoms are selected from the passed residue by their ‘name’ field from the topology selection language (e.g. “CA”, “CB”, “NH” etc.). By default CA atoms are used, but one can define any atom of interest. We do not perform any sanity checking on the atom name - this gets really hard - so there is an explicit try/except block which will warn you that you’ve probably selected an illegal atom name from the residues.

For inter-residue distances the associated rules are defined by the ‘mode’ selector. By default mode is set to ‘atom’, which means the variables A1 and A2 are used (with CA as default) to define inter-residue distance. However, if one of the other modes is used, the A1/A2 parameters are ignored and alternative rules for computing inter-residue distance are used. These modes are detailed below.

Distance is returned in Angstroms.

Parameters
  • R1 (int) – Residue index of first residue

  • R2 (int) – Residue index of second residue

  • A1 (str) – Atom name of the atom in R1 we’re looking at. Default = ‘CA’

  • A2 (str) – Atom name of the atom in R2 we’re looking at. Default = ‘CA’

  • stride (int) – Defines the spacing between the frames used, i.e. take every stride-th frame. Setting stride=1 means every frame is used, which is ideal BUT may be slow. Default = 1

  • mode (str) –

    Mode allows the user to define different modes for computing atomic distance.

    The default is ‘atom’ whereby a pair of atoms (A1 and A2) are provided. Other options are detailed below and are identical to those offered by mdtraj in compute_contacts.

    Note that if modes other than ‘atom’ are used the A1 and A2 options are ignored.

    • ’ca’ - same as setting ‘atom’ and A1=’CA’ and A2=’CA’, this uses the C-alpha atoms.

    • ’closest’ - closest atom associated with each of the residues, i.e. this is the point of closest approach between the two residues.

    • ’closest-heavy’ - same as ‘closest’, except only non-hydrogen atoms are considered.

    • ’sidechain’ - closest atom where that atom is in the sidechain. Note this requires mdtraj version 1.8.0 or higher.

    • ’sidechain-heavy’ - closest atom where that atom is in the sidechain and is heavy. Note this requires mdtraj version 1.8.0 or higher.

Returns

Returns a 1D numpy array with the distance-per-frame between the specified residues

Return type

np.array
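
To make the stride and unit conventions concrete, the sketch below computes a per-frame distance between two atoms in plain Python. It is an illustration only, not the CAMPARITraj code path: the function name and the coordinates are made up, plain lists stand in for numpy arrays, and the input is assumed to be in nanometers (the unit mdtraj uses), hence the conversion to Angstroms.

```python
import math


def distance_per_frame(coords_a, coords_b, stride=1):
    """Distance in Angstroms between two atoms for every stride-th frame.

    Sketch only. `coords_a` and `coords_b` are lists of (x, y, z) tuples,
    one per frame, assumed to be in nanometers.
    """
    nm_to_angstrom = 10.0
    return [
        nm_to_angstrom * math.dist(a, b)
        for a, b in zip(coords_a[::stride], coords_b[::stride])
    ]


# Made-up coordinates for one atom on each chain over four frames.
atom_a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.0, 0.0)]
atom_b = [(0.3, 0.4, 0.0), (1.0, 0.0, 0.0), (0.1, 0.2, 0.0), (0.0, 0.0, 0.5)]

print(distance_per_frame(atom_a, atom_b))            # all four frames
print(distance_per_frame(atom_a, atom_b, stride=2))  # frames 0 and 2 only
```

A larger stride shortens the returned array proportionally, which trades time resolution for speed.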

get_interchain_distance_map(proteinID1, proteinID2, resID1=None, resID2=None)

Function which returns two matrices with the mean and standard deviation of the distances between the residues in resID1 from proteinID1 and resID2 from proteinID2.

This computes the (full) intermolecular distance map, whereas the “distancemap” function computes the intramolecular distance map.

Specifically, this allows the user to define two distinct chains (i.e. an “interchain” distance map).

Obviously this only makes sense if your system has two separate protein objects defined, but in principle the output from:

get_interchain_distance_map(0, 0)

would be the same as:

proteinTrajectoryList[0].get_distance_map()

This is actually a useful sanity check!

Parameters
  • proteinID1 (int) – The ID of the first protein of the two being considered, where the ID is the protein’s position in the self.proteinTrajectoryList list.

  • proteinID2 (int) – The ID of the second protein of the two being considered, where the ID is the protein’s position in the self.proteinTrajectoryList list.

  • resID1 (list of integers, default=None) – The list of residues from protein 1 we’re considering. If this is left as None (default), then it is assumed that all residues in proteinID1 should be used.

  • resID2 (list of integers, default=None) – The list of residues from protein 2 we’re considering. If this is left as None (default), then it is assumed that all residues in proteinID2 should be used.

Returns

  • tuple (tuple containing distanceMap and STDMap)

  • distanceMap (numpy matrix) – Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the mean distance between those two residues over the course of the simulation.

  • stdMap (numpy matrix) – Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the standard deviation associated with the distances between those two residues.
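
The shape and meaning of the two returned matrices can be illustrated with a small pure-Python sketch. This is not how CAMPARITraj computes the maps: the function name and input layout are hypothetical, each residue is reduced to a single position per frame, nested lists stand in for numpy matrices, and the population standard deviation is assumed.

```python
import math
from statistics import fmean, pstdev


def interchain_distance_maps(chain1, chain2):
    """Return (mean_map, std_map) as n x m nested lists.

    Sketch only. chain1[i] and chain2[j] are lists of per-frame (x, y, z)
    positions for residue i of protein 1 and residue j of protein 2.
    """
    mean_map, std_map = [], []
    for res1 in chain1:
        mean_row, std_row = [], []
        for res2 in chain2:
            # One distance per frame for this residue pair.
            distances = [math.dist(p, q) for p, q in zip(res1, res2)]
            mean_row.append(fmean(distances))
            std_row.append(pstdev(distances))
        mean_map.append(mean_row)
        std_map.append(std_row)
    return mean_map, std_map


# Made-up data: 2 residues in protein 1, 3 residues in protein 2, 2 frames.
chain1 = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
chain2 = [
    [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 4.0, 0.0), (0.0, 4.0, 0.0)],
    [(0.0, 0.0, 2.0), (0.0, 0.0, 2.0)],
]
mean_map, std_map = interchain_distance_maps(chain1, chain2)
print(len(mean_map), len(mean_map[0]))  # n rows, m columns
```

Passing the same chain twice produces a symmetric map with a zero diagonal, which mirrors the sanity check described above.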