cttrajectory module

This is where the cttrajectory shenanigans occur

cttrajectory.py

This is where some stuff will be described

camparitraj.cttrajectory.testfunct()

This is a test function for documentation - do we build from source?

class camparitraj.cttrajectory.CTTrajectory(trajectory_filename=None, pdb_filename=None, TRJ=None, protein_grouping=None, pdblead=False)

CTTrajectory class that holds a single simulation trajectory object.

__init__(trajectory_filename=None, pdb_filename=None, TRJ=None, protein_grouping=None, pdblead=False)

CAMPARITraj trajectory object initializer.

CAMPARITraj will, by default, extract out the protein component from your trajectory automatically, which lets you ask questions about the protein only (i.e. without salt ions getting in the way).

Note that by default individual proteins are identified by cycling through the unique chains and determining whether each one is protein or not. You can also provide manual grouping via the protein_grouping option, which lets you define which residues should make up an individual protein. This can be useful if you have multiple proteins associated with the same chain, which happens in CAMPARI if you have more than 26 separate chains (i.e. every protein after the 26th is the ‘Z’ chain).
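To make the chain-label collision concrete, the short sketch below mimics the labelling behaviour described above: chain letters run from ‘A’ to ‘Z’ and every later protein reuses ‘Z’. The function chain_label is a hypothetical illustration and is not taken from CAMPARI or CAMPARITraj.

```python
import string


def chain_label(protein_index):
    """Chain letter for the protein at a zero-based index, capped at 'Z'.

    Hypothetical illustration of the labelling behaviour described in the
    text; not CAMPARI's own code.
    """
    letters = string.ascii_uppercase
    return letters[min(protein_index, len(letters) - 1)]


# With 28 proteins the 26th, 27th and 28th all share chain 'Z', so
# chain-based detection alone cannot tell them apart.
labels = [chain_label(i) for i in range(28)]
print(labels[24:])  # ['Y', 'Z', 'Z', 'Z']
```

In this situation a manual protein_grouping is what separates the proteins that share the ‘Z’ label.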

trajectory_filename [string] Filename which contains the trajectory file of interest. Normally this is __traj.xtc or __traj.dcd

pdb_filename [string] Filename which contains the pdb file associated with the trajectory of interest. Normally this is __START.pdb

protein_grouping [list of lists of ints] {None} Lets you manually define protein groups to be considered independently

pdblead [Bool] {False} Lets you set the PDB file (which is normally ONLY used as a topology file) to be the first frame of the trajectory. This is useful when the first PDB file holds some specific reference information which you want to use (e.g. RMSD or Q).

TRJ [MDTraj trajectory] {None} It is sometimes useful to redefine a trajectory and create a new CTTrajectory object from that trajectory. This could be done by writing the new trajectory to file, but this is extremely slow due to the I/O cost of reading from and writing to disk. If an mdtraj trajectory object is passed, it is used as the trajectory from which the CTTrajectory object is constructed.
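As a rough illustration of the [list of lists of ints] shape expected by protein_grouping, the sketch below builds one list of residue indices per protein from the protein lengths. The helper build_protein_grouping is hypothetical, and the assumption that residues are numbered contiguously from zero is not confirmed by the text above.

```python
def build_protein_grouping(chain_lengths, first_residue=0):
    """Return a list of lists of residue indices, one inner list per protein.

    Hypothetical helper: assumes residues are numbered contiguously,
    starting at `first_residue`, in the order the proteins appear.
    """
    grouping = []
    start = first_residue
    for length in chain_lengths:
        grouping.append(list(range(start, start + length)))
        start += length
    return grouping


# Three proteins of 3, 2 and 4 residues
grouping = build_protein_grouping([3, 2, 4])
print(grouping)  # [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
```

A list like this could then be supplied through the protein_grouping argument of CTTrajectory. Whether the residue indexing is zero-based or one-based should be checked against your own topology.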

export_distanceMap(proteinID, filename)

Function which returns two matrices with the mean and standard deviation of the complete set of inter-residue distances in a single protein molecule.

This explicitly defines the non-redundant map, so you only get a matrix with one half filled in.

(distanceMap, STDMap)

distanceMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the mean distance between those two residues over the course of the simulation

stdMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the standard deviation associated with the distances between those two residues

export_intraChainDistanceMap(proteinID1, proteinID2, filename, resID1=None, resID2=None)

Function which writes an intrachain distance map to a CSV file. Note this distanceMap is not returned, but can be generated by the get_intraChainDistanceMap() function.

This computes the (full) intermolecular distance map, whereas the get_distanceMap function computes the intramolecular distance map.

Obviously this only makes sense if your system has two separate protein objects defined, but in principle the output from

get_intraChainDistanceMap(0,0)

would be the same as

get_distanceMap(0)

This is actually a useful sanity check!

proteinID1 [int] The ID of the first protein of the two being considered, where the ID is the protein's position in the self.proteinTrajectoryList list

proteinID2 [int] The ID of the second protein of the two being considered, where the ID is the protein's position in the self.proteinTrajectoryList list

filename [string] Filename which your file should be saved to (.csv extension is added automatically)

resID1 [list of integers] {None} The list of residues from protein 1 we’re considering. If no option is provided, all of the residues in protein 1 are used

resID2 [list of integers] {None} The list of residues from protein 2 we’re considering. If no option is provided, all of the residues in protein 2 are used

(distanceMap, STDMap)

distanceMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the mean distance between those two residues over the course of the simulation

stdMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the standard deviation associated with the distances between those two residues
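The export step can be pictured with the minimal sketch below, which writes a small matrix to disk and appends the .csv extension automatically. The helper write_matrix_csv and the one-row-per-line layout are assumptions for illustration; the exact file layout CAMPARITraj produces is not specified here.

```python
import csv
import os
import tempfile


def write_matrix_csv(matrix, filename):
    """Write a 2D list to `<filename>.csv`, one matrix row per line.

    Hypothetical helper; the .csv extension is appended automatically,
    mirroring the behaviour described for the `filename` parameter.
    """
    path = filename + ".csv"
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(matrix)
    return path


# Write a toy 2 x 2 distance map into a temporary directory and read it back
out_dir = tempfile.mkdtemp()
csv_path = write_matrix_csv([[0.0, 1.5], [1.5, 0.0]],
                            os.path.join(out_dir, "distance_map"))
with open(csv_path, newline="") as handle:
    rows = [[float(value) for value in row] for row in csv.reader(handle)]
print(rows)  # [[0.0, 1.5], [1.5, 0.0]]
```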

export_localCollapse(proteinID1, filename, windowSize=10, bins=np.arange(0, 10, 0.1))

Write a local collapse vector to file.

localCollapse calculates a vectorial representation of the radius of gyration along a polypeptide chain. This makes it very easy to determine where you see local collapse vs. local expansion.

proteinID1 [int] The ID of the protein to be assessed

filename [string] Filename which your file should be saved to (.csv extension is added automatically)

windowSize [int] {10} Size of the window over which conformations are examined.

bins [np.arange or list] A range of values spanning the histogram bins.
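The idea of a local collapse vector can be sketched as a sliding-window radius of gyration. The version below is a simplified stand-in: it assumes one (x, y, z) point per residue, an unweighted radius of gyration, and a plain mean over frames. The function names are hypothetical and the real implementation may differ (for example in mass weighting or in how the bins are used).

```python
import math


def radius_of_gyration(points):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(points)
    center = [sum(p[k] for p in points) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / n)


def local_collapse(frames, window_size=10):
    """Mean local Rg for each window of `window_size` consecutive residues.

    `frames` is a list of frames, each a list of one position per residue.
    """
    n_res = len(frames[0])
    profile = []
    for start in range(n_res - window_size + 1):
        rg = [radius_of_gyration(f[start:start + window_size]) for f in frames]
        profile.append(sum(rg) / len(rg))
    return profile


# A chain that is extended at the N-terminus and compact at the C-terminus
frame = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
         (3.2, 0, 0), (3.4, 0, 0), (3.6, 0, 0)]
profile = local_collapse([frame], window_size=3)
print(profile)
```

In this toy chain the values fall along the sequence, which is the local expansion vs. local collapse signature the vector is meant to expose.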


export_samplingGoodness(proteinID1, filename, fragmentSize=10, stride=500, bins=None)

Write a samplingGoodness vector to file.

Sampling goodness calculates a trajectory’s Sigma^n vector, where n defines the windowsize. The Sigma vector provides an intuitive way to examine a vectorial profile to describe the sampling associated with a protein. This is well suited for IDPs, where good sampling is expected to correspond to a significant amount of heterogeneity. The Sigma vector helps identify regions along the protein sequence where sampling may be poor.

Some comments on the parameters:

stride: The Sigma vector is calculated as follows:

1. Beginning at the N-terminus, a window of n residues is defined.

2. The RMSD between every kth frame and every other frame is determined for the window. The coarseness of the trajectory (i.e. the value of k) is set manually via stride.

3. The sliding window moves one residue and the analysis is repeated.
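The loop structure of the three steps above can be sketched as follows. Two things here are assumptions rather than the library's method: a distance-based RMSD (dRMSD) is swapped in for the superposition-based RMSD so that no alignment code is needed, and the per-window values are reduced with a plain mean. The function names are hypothetical.

```python
import math


def window_drmsd(frame_a, frame_b, start, size):
    """Distance-based RMSD between two frames for one window of residues.

    Stand-in for a superposition-based RMSD: it compares the internal
    pairwise distances of the window, so no alignment is required.
    Requires size >= 2.
    """
    total, count = 0.0, 0
    for i in range(start, start + size):
        for j in range(i + 1, start + size):
            diff = (math.dist(frame_a[i], frame_a[j])
                    - math.dist(frame_b[i], frame_b[j]))
            total += diff * diff
            count += 1
    return math.sqrt(total / count)


def sigma_vector(frames, fragment_size=10, stride=500):
    """One value per window: mean dRMSD of every stride-th frame vs. all others."""
    n_res = len(frames[0])
    sigma = []
    for start in range(n_res - fragment_size + 1):   # steps 1 and 3: slide the window
        values = []
        for a in range(0, len(frames), stride):      # step 2: every k-th frame ...
            for b in range(len(frames)):             # ... against every other frame
                if a != b:
                    values.append(
                        window_drmsd(frames[a], frames[b], start, fragment_size))
        sigma.append(sum(values) / len(values))
    return sigma


# Two frames that differ only by a bend around residue 2
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 2, 0)]
sigma = sigma_vector([straight, bent], fragment_size=3, stride=1)
print(sigma)
```

Only the window that spans the bend gives a non-zero value, which shows how the vector localises conformational heterogeneity along the sequence. A larger stride reduces the number of reference frames and therefore the cost.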

proteinID1 [int] The ID of the protein to be assessed

filename [string] Filename which your file should be saved to (.csv extension is added automatically)

fragmentSize [int] {10} Size of the window over which conformations are examined. Note that RMSD has unpleasant scaling properties, such that larger windows may be less useful. We are concerned primarily with local conformational behaviour, as global properties do not pertain to an IDP.

stride [int] {500} The Sigma vector is calculated over an all vs. all comparison for each frame of the trajectory. This becomes computationally too expensive when larger trajectories are involved, so the stride defines the granularity used in one of the dimensions. For more information see the main description above.


get_distanceMap(proteinID)

Function which returns two matrices with the mean and standard deviation of the complete set of inter-residue distances in a single protein molecule.

This explicitly defines the non-redundant map, so you only get a matrix with one half filled in.

(distanceMap, STDMap)

distanceMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the mean distance between those two residues over the course of the simulation

stdMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the standard deviation associated with the distances between those two residues
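A non-redundant mean and standard deviation map of this kind can be sketched in plain Python as shown below. The sketch assumes one (x, y, z) position per residue per frame, uses the population standard deviation, and leaves the unfilled half at 0.0. These choices and the function name distance_map are assumptions, not a description of the library internals.

```python
import math
import statistics


def distance_map(frames):
    """Mean and standard deviation of inter-residue distances.

    `frames` is a list of frames; each frame is a list of (x, y, z)
    positions, one per residue. Only the upper triangle (j > i) is
    filled; all other entries are left at 0.0.
    """
    n = len(frames[0])
    mean_map = [[0.0] * n for _ in range(n)]
    std_map = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [math.dist(f[i], f[j]) for f in frames]
            mean_map[i][j] = statistics.fmean(d)
            std_map[i][j] = statistics.pstdev(d)  # population std (assumption)
    return mean_map, std_map


# Two frames of a three-residue chain lying along the x axis
frames = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (3, 0, 0), (4, 0, 0)],
]
mean_map, std_map = distance_map(frames)
print(mean_map[0][1], std_map[0][1])
```

Residues 0 and 1 are 1 and 3 units apart in the two frames, so the corresponding mean is 2 and the standard deviation is 1, while the lower triangle stays empty.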

get_intraChainDistanceMap(proteinID1, proteinID2, resID1=None, resID2=None)

Function which returns two matrices with the mean and standard deviation of the distances between the residues in resID1 from proteinID1 and resID2 from proteinID2.

This computes the (full) intermolecular distance map, whereas the get_distanceMap function computes the intramolecular distance map.

Obviously this only makes sense if your system has two separate protein objects defined, but in principle the output from

get_intraChainDistanceMap(0,0)

would be the same as

get_distanceMap(0)

This is actually a useful sanity check!

proteinID1 [int] The ID of the first protein of the two being considered, where the ID is the protein's position in the self.proteinTrajectoryList list

proteinID2 [int] The ID of the second protein of the two being considered, where the ID is the protein's position in the self.proteinTrajectoryList list

resID1 [list of integers] {None} The list of residues from protein 1 we’re considering. If no option is provided, all of the residues in protein 1 are used

resID2 [list of integers] {None} The list of residues from protein 2 we’re considering. If no option is provided, all of the residues in protein 2 are used

(distanceMap, STDMap)

distanceMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the mean distance between those two residues over the course of the simulation

stdMap [numpy matrix] Is an [n x m] matrix where n and m are the number of proteinID1 residues and proteinID2 residues. Each position in the matrix corresponds to the standard deviation associated with the distances between those two residues
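The sanity check mentioned above can be pictured with the sketch below, which builds a full [n x m] map between two chains, optionally restricted to residue subsets, and then compares a chain against itself. The function interchain_distance_map, its argument names and the use of the population standard deviation are illustrative assumptions.

```python
import math
import statistics


def interchain_distance_map(frames1, frames2, res1=None, res2=None):
    """Full mean/std distance maps between residues of two chains.

    frames1[t][i] and frames2[t][j] are (x, y, z) positions in frame t.
    res1 / res2 optionally restrict the residues used (default: all).
    """
    n_frames = len(frames1)
    res1 = list(range(len(frames1[0]))) if res1 is None else res1
    res2 = list(range(len(frames2[0]))) if res2 is None else res2
    mean_map, std_map = [], []
    for i in res1:
        mean_row, std_row = [], []
        for j in res2:
            d = [math.dist(frames1[t][i], frames2[t][j]) for t in range(n_frames)]
            mean_row.append(statistics.fmean(d))
            std_row.append(statistics.pstdev(d))
        mean_map.append(mean_row)
        std_map.append(std_row)
    return mean_map, std_map


chain = [
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
    [(0, 0, 0), (2, 0, 0), (2, 2, 0)],
]

# Sanity check: a chain compared with itself gives a symmetric map with a
# zero diagonal, whose upper triangle is the single-chain map.
full_mean, full_std = interchain_distance_map(chain, chain)

# Restricting to residue 0 of the first chain and residues 1-2 of the second
sub_mean, sub_std = interchain_distance_map(chain, chain, res1=[0], res2=[1, 2])
print(sub_mean)
```

The restricted call returns a 1 x 2 matrix, matching the [n x m] shape described for the return values.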

get_intrachain_interResidue_atomic_distance(proteinID1, proteinID2, R1, R2, A1='CA', A2='CA', stride=1, mode='atom', correctOffset=True)

Function which returns the distance between two specific atoms on two residues. The atoms selected are based on the ‘name’ field from the topology selection language. This defines a specific atom as defined by the PDB file. By default A1 and A2 are CA (C-alpha) but one can define any atom of interest.

We do not perform any sanity checking on the atom name - this gets really hard - so we have an explicit try/except block which will warn you that you’ve probably selected an illegal atom name from the residues.

Distance is returned in Angstroms.
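Since MDTraj stores coordinates in nanometers, returning Angstroms implies a factor-of-ten conversion somewhere. The sketch below shows one way the per-frame distance, the stride and the atom-name guard could fit together. The frame representation (a dict keyed by a (residue index, atom name) pair) and the function atomic_distance are hypothetical stand-ins, not the CAMPARITraj data model.

```python
import math

NM_TO_ANGSTROM = 10.0


def atomic_distance(frames, atom1, atom2, stride=1):
    """Distance in Angstroms between two atoms for every stride-th frame.

    `frames` is a list of dicts mapping a hypothetical
    (residue_index, atom_name) key to an (x, y, z) position in nanometers.
    """
    distances = []
    for frame in frames[::stride]:
        try:
            p1, p2 = frame[atom1], frame[atom2]
        except KeyError as err:
            # Mirrors the idea of warning about an illegal atom name
            raise ValueError(f"Atom {err} not found; check the atom name") from err
        distances.append(NM_TO_ANGSTROM * math.dist(p1, p2))
    return distances


frames = [
    {(0, "CA"): (0.0, 0.0, 0.0), (5, "CA"): (0.3, 0.4, 0.0)},
    {(0, "CA"): (0.0, 0.0, 0.0), (5, "CA"): (0.6, 0.8, 0.0)},
    {(0, "CA"): (0.0, 0.0, 0.0), (5, "CA"): (0.0, 0.0, 1.0)},
]
every_frame = atomic_distance(frames, (0, "CA"), (5, "CA"))
every_second = atomic_distance(frames, (0, "CA"), (5, "CA"), stride=2)
print(every_frame, every_second)
```

With stride=2 only the first and third frames contribute, so the second list has two entries instead of three.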

R1 [int] Residue index of first residue

R2 [int] Residue index of second residue

A1 [string] {CA} Atom name of the atom in R1 we’re looking at

A2 [string] {CA} Atom name of the atom in R2 we’re looking at

stride [int] {1} Defines the spacing between frames to compare with, i.e. take every stride-th frame. Setting stride=1 means every frame is used, which means you’re doing an all vs. all comparison, which would be ideal BUT may be slow.

mode [string] {‘atom’} Mode allows the user to define different modes for computing atomic distance. The default is ‘atom’ whereby a pair of atoms (A1 and A2) are provided. Other options are detailed below and are identical to those offered by mdtraj in compute_contacts

‘ca’ - same as setting ‘atom’ and A1=’CA’ and A2=’CA’, this uses the C-alpha atoms

‘closest’ - closest atom associated with each of the residues, i.e. this is the point of closest approach between the two residues

‘closest-heavy’ - same as closest, except only non-hydrogen atoms are considered

‘sidechain’ - closest atom where that atom is in the sidechain. Note this requires mdtraj version 1.8.0 or higher.

‘sidechain-heavy’ - closest atom where that atom is in the sidechain and is heavy. Note this requires mdtraj version 1.8.0 or higher.
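The difference between the ‘ca’, ‘closest’ and ‘closest-heavy’ modes can be sketched for a single frame as below. The residue representation (a list of (atom name, element, position) tuples) and the function residue_distance are hypothetical; the two sidechain modes are left out because they need a backbone/sidechain definition that is not given here.

```python
import math

# Atom filters for a subset of the modes listed above (sidechain modes omitted)
FILTERS = {
    "ca": lambda name, element: name == "CA",
    "closest": lambda name, element: True,
    "closest-heavy": lambda name, element: element != "H",
}


def residue_distance(res1, res2, mode="closest"):
    """Smallest distance between the atoms of two residues allowed by `mode`.

    Each residue is a list of (atom_name, element, (x, y, z)) tuples.
    """
    if mode not in FILTERS:
        raise ValueError(f"Unsupported mode: {mode!r}")
    keep = FILTERS[mode]
    atoms1 = [pos for name, element, pos in res1 if keep(name, element)]
    atoms2 = [pos for name, element, pos in res2 if keep(name, element)]
    return min(math.dist(p, q) for p in atoms1 for q in atoms2)


# Toy residues laid out along the z axis (arbitrary length units)
gly = [("CA", "C", (0.0, 0.0, 0.0)), ("HA2", "H", (0.0, 0.0, 1.0))]
ala = [("CA", "C", (0.0, 0.0, 5.0)), ("CB", "C", (0.0, 0.0, 3.5)),
       ("HB1", "H", (0.0, 0.0, 2.0))]

for mode in ("ca", "closest", "closest-heavy"):
    print(mode, residue_distance(gly, ala, mode))
```

For these toy residues the three modes give 5.0, 1.0 and 3.5 respectively: the hydrogens set the closest approach, and excluding them moves the answer to the CA to CB pair.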

correctOffset [Bool] {True} Defines if we perform local protein offset correction or not. By default we do, but some internal functions may have already performed the correction and so don’t need to perform it again.