Tutorial: CellOrganizer in 45 Minutes

Introduction

CellOrganizer is a software package that learns generative models of cell organization from fluorescence micrographs. These models capture the dependencies between compartments of the cell, provide a compact representation of the cell geometries present in a collection of images, and can generate images of cell geometries for spatially realistic biochemical simulations. This tutorial covers two main functions: img2slml, the top-level function to train a generative model of cell morphology, and slml2img, the top-level function to generate an instance from a trained model.

Whom is this tutorial for?

This tutorial was written for people who have experience with fluorescence microscopy but no experience with CellOrganizer, and who may have some experience with MATLAB, generative models, or cell modeling. Readers should be interested in using the automated modeling tools provided by CellOrganizer to explore their own image data.

Resources

Software

CellOrganizer

Other Software

ImageJ - This is great software for viewing your images and those synthesized from CellOrganizer. This tutorial uses ImageJ in some spots.

Prerequisites

Attention

CellOrganizer is only supported on Windows through CellOrganizer for Docker.

  • An OS X, Linux or Unix operating system
  • MATLAB installation (MATLAB R2014a or newer) with the following toolboxes:
    • Bioinformatics Toolbox
    • Computer Vision System Toolbox
    • Control System Toolbox
    • Curve Fitting Toolbox
    • Image Processing Toolbox
    • Mapping Toolbox
    • Optimization Toolbox
    • Robust Control Toolbox
    • Signal Processing Toolbox
    • Simulink
    • Simulink Design Optimization
    • Statistics and Machine Learning Toolbox
    • System Identification Toolbox
    • Wavelet Toolbox
  • Some basic familiarity with writing scripts/programming (preferably in MATLAB).

Requirements for inputs for building models

The main function that builds a generative model is called img2slml. This function has five input arguments:

  • dimensionality (a string, '2D' or '3D')
  • dna_membrane_images
  • cell_membrane_images
  • protein_channel_images
  • options_structure

Each of the three image arguments can be

  • a string containing wildcards, e.g. /path/to/images/*.tiff
  • a cell array of strings that point to each file, e.g. {'/path/to/images/1.tiff', '/path/to/images/2.tiff'};
  • a cell array of function handles where each function returns a 3D array corresponding to one image in the list

The last argument, options_structure, is a MATLAB structure that contains the fields necessary to train the model in question.

In general, the images should

  • be compatible with BioFormats.
  • contain only a single cell OR have a single cell region defined by an additional mask image
  • contain channel(s) for fluorescent marker(s) appropriate for the desired type of model (typically, channels for nuclear shape (e.g., DAPI, Hoechst, tagged histone), cell shape (e.g., a soluble cytoplasmic protein, a plasma membrane protein, or autofluorescence), and a specific organelle)

If your images are valid OME.TIFF files with regions of interest (ROI), then you can use the helper function get_list_of_function_handles_from_ometiff to retrieve a list of function handles. Each function handle should return a 3D matrix when called using feval.

Hence we can use this helper function to generate the image input arguments for the function img2slml.
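To make the function-handle contract concrete, here is a minimal sketch that does not need CellOrganizer: it builds a cell array of zero-argument anonymous function handles that each return a synthetic 3D array, then evaluates one with feval. The array sizes and values are arbitrary, and the commented-out call to get_list_of_function_handles_from_ometiff shows an assumed signature only; check the CellOrganizer documentation for the helper's actual arguments.

```matlab
% Hypothetical stand-in for the list returned by the helper. Assumed usage:
%   list_of_function_handles = get_list_of_function_handles_from_ometiff(filename);
% Here each handle simply returns a synthetic 64 x 64 x 20 volume.
n_cells = 3;
list_of_function_handles = cell(1, n_cells);
for k = 1:n_cells
    % k is captured by value, so each handle returns a different constant volume
    list_of_function_handles{k} = @() k * ones(64, 64, 20);
end

% Nothing is computed until a handle is evaluated with feval
img = feval(list_of_function_handles{2});
disp(size(img));

% A cell array of such handles can be passed as an image argument, e.g.
%   img2slml('3D', nuc_handles, cell_handles, prot_handles, options);
```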

Setup

Step 0: Download the most recent version of CellOrganizer

The software can be downloaded from the CellOrganizer homepage. Make sure to download the version including images, since we will use these images soon. After downloading the CellOrganizer source code, unzip the folder, and copy the resulting folder into the “Documents” → “MATLAB” directory.

Step 1: Add the CellOrganizer directory to the Path

You should see the folder appear in the “Current Folder” in MATLAB on the left side. If it doesn’t, make sure that your file path is “Users” → your user name → “Documents” → “MATLAB”.

To ensure that MATLAB can access the images and files contained within the CellOrganizer folder, right click on “cellorganizer_2.5.2” on the left side of the MATLAB window and select “Add to Path” → “Selected Folders and Subfolders”.

Step 2: Adding images

Images downloaded as part of the CellOrganizer download can be found in “Documents” → “MATLAB” → “cellorganizer_2.5.2” → “images”.

If you don’t have your own images and did not download the full version of CellOrganizer in Step 0, you can download some samples from http://murphylab.web.cmu.edu/data/Hela/3D/multitiff/3DHela_LAM.tgz (note: the whole collection is 2.0 GB). These are 3D HeLa images with a nuclear stain (channel 0), a cell stain (channel 1) and a protein stain (channel 2). The tagged protein is LAMP2 (https://en.wikipedia.org/wiki/LAMP2), a lysosomal protein.

Optionally, to decrease training time, set aside 10 to 15 images that will not be used.

Training Models

img2slml.m, contained in the main cellorganizer folder, is the primary function used for training a model from cellular images. It takes 5 inputs: a flag describing the dimensionality of the data (i.e. 2D or 3D; this tutorial describes only 3D functionality), images for the nuclear channel, images for the cell shape channel, images for the protein channel (optional) and options used to change various model settings. The training portion of this tutorial covers the very basic setup required to get img2slml up and running.

Step 0: Create a “scratch” script

Click “New” → “New Script”, and save your file as tutorial_train.m (making sure that the file is saved to the “Documents” → “MATLAB” path, but not inside the “cellorganizer” folder). Instead of typing the commands that follow into the Command Window, type (or copy and paste) them into the script that you just created. This will keep track of what you have done so far and provide a resource for later use of CellOrganizer.

Step 1: Create variables containing your images

We next need to tell CellOrganizer which cellular images we would like to use. To make life easier in the future, let’s start by defining a variable that contains the path to the directory where the images for the project are stored. Raw 3D images of HeLa cells can be found at the path below, which we will store in the variable img_dir:

img_dir = '/Users/admin/Documents/MATLAB/cellorganizer_2.5.2/images/HeLa/3D/raw';

We would like to select just the “LAM” image files in this folder to train our model. There are three ways to do this, depending on how you have stored your images, each with its own strengths: string wildcards, a cell array of file paths, and a cell array of function handles.

Option 1 (easiest)

String wildcards: If your files are named in a simple pattern (as the LAM files are), then wildcards are the easiest way to get your file information into CellOrganizer. All of the LAM files have the format “LAM_cellX_chY_t1.tif”, where “X” is the number of the image (1-50) and “Y” is the number of the channel (0, 1, or 2 for the nuclear, cell, and protein channels, respectively). Therefore, we can split the images by channel and create one wildcard pattern per channel, where “*” matches any sequence of characters (here, the cell number):

nuc_img_paths = [img_dir '/LAM_cell*_ch0_t1.tif'];
cell_img_paths = [img_dir '/LAM_cell*_ch1_t1.tif'];
prot_img_paths = [img_dir '/LAM_cell*_ch2_t1.tif'];
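Before training, it can be worth confirming that each wildcard pattern actually matches the files you expect. The sketch below expands the patterns with core MATLAB’s dir and compares the counts per channel. So that it runs anywhere, it first creates a temporary folder containing empty placeholder files named like the LAM images; in your own script you would drop those setup lines and keep img_dir from Step 1.

```matlab
% --- Setup for illustration only: empty LAM-style files in a temp folder ---
img_dir = tempname;
mkdir(img_dir);
for i = 1:5
    for ch = 0:2
        fid = fopen(fullfile(img_dir, sprintf('LAM_cell%d_ch%d_t1.tif', i, ch)), 'w');
        fclose(fid);
    end
end

% --- Wildcard patterns, one per channel (as in Option 1) ---
nuc_img_paths  = [img_dir '/LAM_cell*_ch0_t1.tif'];
cell_img_paths = [img_dir '/LAM_cell*_ch1_t1.tif'];
prot_img_paths = [img_dir '/LAM_cell*_ch2_t1.tif'];

% dir expands each pattern; every channel should match the same number of files
n_nuc  = numel(dir(nuc_img_paths));
n_cell = numel(dir(cell_img_paths));
n_prot = numel(dir(prot_img_paths));
fprintf('nuclear: %d, cell: %d, protein: %d\n', n_nuc, n_cell, n_prot);
if ~isequal(n_nuc, n_cell, n_prot)
    warning('Channel patterns match different numbers of files.');
end
```

Note that dir lists matches in character order, so, for example, LAM_cell10 sorts before LAM_cell2.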

Option 2 (advanced)

Cell-array of string paths: Alternatively, you can store the images as individual paths in a cell array. Since there are 50 images, we will loop through the directory and store each name in an element of a cell array. There are more “programmatically correct” ways to do this, but this is the most direct. For the sake of training time, we’ll iterate over only the first 15 images.

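% preallocate cell arrays (this also replaces any strings left over from Option 1)
[nuc_img_paths, cell_img_paths, prot_img_paths] = deal(cell(1, 15));
for i = 1:15
    nuc_img_paths{i} = [img_dir '/LAM_cell' num2str(i) '_ch0_t1.tif'];
    cell_img_paths{i} = [img_dir '/LAM_cell' num2str(i) '_ch1_t1.tif'];
    prot_img_paths{i} = [img_dir '/LAM_cell' num2str(i) '_ch2_t1.tif'];
end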
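One of the more “programmatically correct” alternatives mentioned above is to preallocate each cell array and build the file names with sprintf and fullfile. This is a sketch rather than the tutorial’s required approach; the img_dir value below is only a placeholder so the snippet runs on its own (in your script it is already set in Step 1).

```matlab
% Placeholder; in your script img_dir is already defined in Step 1
img_dir = fullfile('cellorganizer_2.5.2', 'images', 'HeLa', '3D', 'raw');
n_images = 15;

% Preallocate one cell array per channel
nuc_img_paths  = cell(1, n_images);
cell_img_paths = cell(1, n_images);
prot_img_paths = cell(1, n_images);

for i = 1:n_images
    % sprintf builds the file name; fullfile inserts the platform's separator
    nuc_img_paths{i}  = fullfile(img_dir, sprintf('LAM_cell%d_ch0_t1.tif', i));
    cell_img_paths{i} = fullfile(img_dir, sprintf('LAM_cell%d_ch1_t1.tif', i));
    prot_img_paths{i} = fullfile(img_dir, sprintf('LAM_cell%d_ch2_t1.tif', i));
end
```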

Option 3 (even more advanced)

Function handles: If you’re very comfortable with MATLAB, you can pass a cell-array of anonymous function handles as your images into CellOrganizer. If the previous sentence doesn’t make any sense to you, it’s probably best that you skip this part of the tutorial. An example of using function handles would be:

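% preallocate cell arrays (this also replaces any strings left over from Option 1)
[nuc_img_paths, cell_img_paths, prot_img_paths] = deal(cell(1, 15));
for i = 1:15
    % each handle captures i and img_dir now, but reads its image only when called
    nuc_img_paths{i} = @() ml_readimage([img_dir '/LAM_cell' num2str(i) ...
        '_ch0_t1.tif']);
    cell_img_paths{i} = @() ml_readimage([img_dir '/LAM_cell' num2str(i) ...
        '_ch1_t1.tif']);
    prot_img_paths{i} = @() ml_readimage([img_dir '/LAM_cell' num2str(i) ...
        '_ch2_t1.tif']);
end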

Here we’re using the CellOrganizer provided function ml_readimage to read in and return the actual image matrix, but any function that returns the actual image matrix of data will work.
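To illustrate that any function returning the image matrix will work, here is a sketch of a reader built on core MATLAB’s imfinfo and imread instead of ml_readimage. It assumes each file is a multi-page TIFF with one z-slice per page; to stay self-contained, it first writes a small synthetic stack to a temporary file.

```matlab
% --- Illustration only: write a small synthetic 4 x 5 x 3 stack to a temp file ---
stack_file = [tempname '.tif'];
stack = uint8(reshape(1:60, 4, 5, 3));
imwrite(stack(:, :, 1), stack_file);
for z = 2:size(stack, 3)
    imwrite(stack(:, :, z), stack_file, 'WriteMode', 'append');
end

% --- A reader that returns a 3D array, assuming one TIFF page per z-slice ---
stack_from_pages = @(pages) cat(3, pages{:});
read_stack = @(f) stack_from_pages(arrayfun(@(k) imread(f, k), ...
    1:numel(imfinfo(f)), 'UniformOutput', false));

% Wrap the reader in zero-argument handles, just like the ml_readimage example
img_handles = {@() read_stack(stack_file)};
img = feval(img_handles{1});
disp(size(img));
```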

Step 2: Set up the option structure

The option structure tells CellOrganizer how you want to build a model and lets you override its default settings. Most of the options have default values, so we don’t have to set them manually for this tutorial. However, we do need to specify the pixel resolution of the images and a filename to save the resulting model in. To define these options, we create a struct variable called train_options and set its fields accordingly:

% this is the pixel resolution in um of the images
train_options.model.resolution = [0.049, 0.049, 0.2000];

% this tells CellOrganizer what channels to build models for
train_options.train.flag = 'all';

% this is the filename to be used to save the model in
train_options.model.filename = 'model.mat';

The option train_options.model.filename defines where the .mat file containing the resulting model should be saved. Setting train_options.train.flag to 'all' requests a model that includes a nuclear shape model, a cell shape model and a protein distribution model. We can also set the flag to 'framework' to train just the nuclear shape and cell shape models (in which case we no longer need to provide protein images), or to 'nuc' to train only a nuclear shape model (in which case we do not need to provide cell shape images either).
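The correspondence between train.flag and the image arguments can be captured in a small lookup table. The sketch below is a hypothetical check (not a CellOrganizer function) based only on the three flags described above; it can catch a missing image argument before a long training run starts.

```matlab
% Hypothetical lookup: image inputs required by each train.flag value
% (double braces store each cell array as a single field value)
required_inputs = struct( ...
    'all',       {{'nuclear', 'cell', 'protein'}}, ...
    'framework', {{'nuclear', 'cell'}}, ...
    'nuc',       {{'nuclear'}});

flag = 'framework';                 % e.g. train_options.train.flag
needed = required_inputs.(flag);
fprintf('train.flag = ''%s'' needs: %s\n', flag, strjoin(needed, ', '));

% Example check: protein images passed as [] are fine unless the flag needs them
prot_img_paths = [];
if any(strcmp(needed, 'protein')) && isempty(prot_img_paths)
    error('train.flag ''%s'' requires protein images.', flag);
end
```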

So far we have the bare minimum required to set up a model. We will set one more option to speed up the tutorial:

train_options.model.downsampling = [4,4,1];

This downsamples our input images by 4 in the x- and y-dimensions, decreasing the memory used for the tutorial.
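The effect of this setting can be checked with simple arithmetic. Downsampling by [4,4,1] makes each voxel span four original voxels in x and y, so the effective voxel size becomes resolution .* downsampling, and the number of voxels drops by a factor of 4 x 4 x 1 = 16. This is generic downsampling arithmetic, not a description of how CellOrganizer stores these values internally, and the 512 x 512 x 40 image size is an assumed example.

```matlab
resolution   = [0.049, 0.049, 0.2000];   % um per voxel (x, y, z), from Step 2
downsampling = [4, 4, 1];                % factors applied in x, y, z

% Each downsampled voxel spans downsampling(d) original voxels along dimension d
effective_resolution = resolution .* downsampling;   % about [0.196 0.196 0.2] um

% Voxel count (and roughly memory) shrinks by the product of the factors
reduction = prod(downsampling);                      % 16

% Assumed example image size, for illustration only
original_size    = [512, 512, 40];
downsampled_size = ceil(original_size ./ downsampling);
fprintf('Effective resolution: %.3f x %.3f x %.3f um\n', effective_resolution);
fprintf('Voxels: %d -> %d (%dx fewer)\n', prod(original_size), ...
    prod(downsampled_size), reduction);
```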

Step 3: Add a model type

We also need to specify what type of model we would like to train. We do this by adding additional lines to the options structure:

train_options.nucleus.type = 'cylindrical_surface';
train_options.cell.type = 'ratio';
train_options.protein.type = 'vesicle';
train_options.debug = true;

Now that we have everything together, we can train the model:

img2slml('3D', nuc_img_paths, cell_img_paths, prot_img_paths, train_options);

If your model-building options don’t require one of the image types (for example, protein images are not needed when train.flag is 'framework'), you can pass empty brackets instead:

img2slml('3D', nuc_img_paths, cell_img_paths, [], train_options);

(Note: make sure that your inputs to img2slml correspond to your setting of train.flag)

Step 4: Run your script

Press the play button on the top of the MATLAB window or type the name of your script into the Command Window. If you used a lot of images or did not aggressively downsample your images this may take some time.

Step 5: Your model

Now that your run of CellOrganizer has completed without error, you should have a .mat file named model.mat in the directory in which you ran the code. Congratulations, you made it! If you load that file into your workspace, then you’ll see that this is another struct with fields. This is the model of your cell images. You’ll notice that it’s a lot smaller in file size than the collection of source images you used to train it. Take some time to explore these fields.
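One way to explore the model is to walk its nested fields and print each path. The sketch below is a hypothetical helper loop built on fieldnames; so that it runs on its own, it uses a small made-up struct whose field names echo the numStatModel example later in this tutorial, which is not the real content of model.mat. With your own file you would start from s = load('model.mat'), which returns a struct whose fields are the variables saved in the file.

```matlab
% In practice:  s = load('model.mat');
% Here a small made-up struct stands in for it (illustration only)
s = struct();
s.model.name = 'example';
s.model.proteinModel.objectModel.numStatModel = ...
    struct('name', 'gamma', 'alpha', 2.7, 'beta', 19.3);

% Walk the nested fields with an explicit stack of {path, value} pairs
paths = {};
todo = {{'s', s}};
while ~isempty(todo)
    item = todo{end};
    todo(end) = [];
    [prefix, value] = item{:};
    if isstruct(value) && isscalar(value)
        names = fieldnames(value);
        % push in reverse so fields are visited in their stored order
        for k = numel(names):-1:1
            todo{end+1} = {[prefix '.' names{k}], value.(names{k})}; %#ok<AGROW>
        end
    else
        % a leaf (or a struct array): print its path and MATLAB class
        paths{end+1} = prefix; %#ok<AGROW>
        fprintf('%s (%s)\n', prefix, class(value));
    end
end
```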

Synthesizing an Image from a Model

We will next describe how to synthesize a cell shape in CellOrganizer. The main function here is slml2img.m. This function takes two inputs: a cell array of paths to the models from which we want to synthesize images, and an options structure.

Step 0: Create a “scratch” script

Here we create a new script and call it tutorial_synthesis.m.

Step 1: Set up the model and option inputs

Start by defining two variables: a cell array containing the path to the model you created in the Training section, and a new option structure (different from the one used for training). If you followed the instructions in the Training section, you can use the same filename that you set in Step 2 of Training:

model_path = {'model.mat'};

Alternatively you can generate images from one of the models provided in the CellOrganizer distribution, such as the model of the lysosomal protein LAMP2 in HeLa cells:

model_path = {'./cellorganizer_2.5.2/models/3D/lamp2.mat'};

The option structure is set up similarly to that in Training. Here we create a new struct called synth_options and define where we want the images to be saved, a prefix for the saved files and the number of images desired:

%save into the current directory
synth_options.targetDirectory = './';

synth_options.prefix = 'synthesis_tutorial';

%generate two images
synth_options.numberOfSynthesizedImages = 2;

Step 2: Controlling the random seed (optional)

CellOrganizer generates synthetic images by randomly drawing parameter values from the distributions contained in the specific model. The random numbers are provided by the MATLAB rand function, and the specific sequence of random numbers the program will get when it calls rand can be controlled by specifying what is termed a random seed (which can be any number) using the rng function:

rng(666);

If we do this, the specific images that CellOrganizer will generate will be the same each time our script is run. If we don’t, the images will be based upon the current state of the random number generator, and we may get different images each time we run the script.
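The following self-contained snippet illustrates this behavior with rand alone (it does not call CellOrganizer). Reseeding with the same value reproduces the same numbers, and rng can also save and restore the full generator state; the seed 666 simply mirrors the example above.

```matlab
% Seeding the generator makes the sequence of random numbers repeatable
rng(666);
first_draw = rand(1, 3);

rng(666);                      % reset to the same seed
second_draw = rand(1, 3);      % identical to first_draw

% Without reseeding, the generator continues from its current state
third_draw = rand(1, 3);       % different from first_draw

% rng can also save and restore the complete generator state
saved_state = rng;
a = rand;
rng(saved_state);
b = rand;                      % equals a

fprintf('same seed, same numbers: %d\n', isequal(first_draw, second_draw));
```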

Now that we have everything set up, we can generate an image or two!

Step 3: Synthesize

As the last line of our script, we call slml2img.m:

slml2img(model_path, synth_options);

Save your file and run it. This may take a while, especially if you have decided to generate many images.

Step 4: Check out your images

Now that the image generation is completed, you can view the images. In the current directory you should see a folder named “synthesis_tutorial”, and in that should be two directories, “cell1” and “cell2”, each of which contains images corresponding to each channel drawn from the model you trained in the Training section. While these images can be opened in ImageJ, we are going to demonstrate two useful tools in CellOrganizer that we frequently use to explore our synthesized images.

First, we’re going to create an indexed image by combining the output images:

%read in each image to a variable
im_cell = ml_readimage('<path to cell image>');
im_dna = ml_readimage('<path to nucleus image>');
im_prot = ml_readimage('<path to protein image>');

%create an empty image
im_indexed = zeros(size(im_cell));

%Set the cell shape, nuclear shape and protein shape values to 1,2,3 respectively
im_indexed(im_cell>0) = 1;
im_indexed(im_dna>0) = 2;
im_indexed(im_prot>0) = 3;
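Because each assignment overwrites the previous one wherever masks overlap, the order of these three lines matters: protein voxels take precedence over nuclear voxels, which take precedence over cell voxels. The sketch below shows this on tiny made-up volumes (an illustration, not CellOrganizer output) and counts how many voxels carry each label.

```matlab
% Tiny made-up volumes standing in for the synthesized channels
im_cell = ones(4, 4, 2);                          % whole volume is cell
im_dna  = zeros(4, 4, 2);  im_dna(2:3, 2:3, :) = 1;   % nucleus in the middle
im_prot = zeros(4, 4, 2);
im_prot(1, 1, 1) = 1;                             % a vesicle in the cytoplasm
im_prot(2, 2, 1) = 1;                             % a vesicle overlapping the nucleus

im_indexed = zeros(size(im_cell));
im_indexed(im_cell > 0) = 1;
im_indexed(im_dna > 0)  = 2;
im_indexed(im_prot > 0) = 3;   % the last assignment wins where masks overlap

% Count voxels per label (0 = background)
labels = 0:3;
counts = arrayfun(@(v) nnz(im_indexed == v), labels);
fprintf('label %d: %d voxels\n', [labels; counts]);
```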

Now that we have an image, we can view it with the function img2vol (after downsampling it by a factor of two in X, Y and Z):

figure, img2vol(ml_downsize(im_indexed, [2,2,2]));

Congratulations! You have created a synthetic cell geometry!

Visualizing Model Results

Although generating synthetic cell shapes is fun (and useful for doing cell simulations), the real power of CellOrganizer lies in its ability to describe distributions of cell geometries and the organization of components within them. Here we will demonstrate how to use CellOrganizer to generate some interesting analysis results.

Background

Upon exposure to Bafilomycin A1, microtubule-associated protein light chain 3 (LC3) localizes into autophagosomes for degradation and forms punctate structures.


Images of eGFP-LC3 tagged RT112 cells at 40x under normal conditions (left) and in a 50uM Bafilomycin condition (right).

Let’s say we are curious about how the number of autophagosomes changes with Bafilomycin concentration. Given a collection of images under different concentrations, we can segment the cell shapes and train a model for the cells imaged at each concentration. It just so happens that we have already done that, and the models and associated drug concentrations can be found here.

That .mat file has two variables saved in it. One is a list of drug concentrations (in micromolar), and the other is a list of models trained with images of cells at those concentrations (like the two images above). For each model, we’re going to plot the mean number of autophagosomes versus the Bafilomycin concentration.

Step 0: Create another “scratch” script

Let’s call this one plotObjsByModel.m.

Step 1: Load the model data into the workspace

Load the .mat file you downloaded into the Workspace by double-clicking on it. You should see two variables, conc and models. These variables contain the drug concentrations and the trained CellOrganizer models of cells exposed to Bafilomycin at those concentrations. You can access the first model by typing models{1}, the second model by models{2} and so on. You will see that there are a lot of components to these models, but we’re just interested in the number of objects under each condition.

Step 2: Plot the average number of autophagosomes for each model

We must access the component of the model that contains the distribution for the number of objects. We can access that in the first model with:

models{1}.proteinModel.objectModel.numStatModel

The output should be:

ans =

name: 'gamma'
alpha: 2.7464
beta: 19.291

This means that the number of objects in the cells used to train this model is modeled as a gamma distribution with two parameters, alpha (shape) and beta (scale). The mean of a gamma distribution with this parameterization is the product of the two parameters. Let’s write a loop to get the average number of autophagosomes from each model:
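As a worked check, assuming beta is a scale parameter (the shape-scale convention that MATLAB’s gampdf and gamfit also use), the mean for the first model is 2.7464 x 19.291, or about 52.98 objects, and the variance under the same convention is alpha*beta^2. The snippet below simply reproduces that arithmetic from the printed parameters.

```matlab
% Parameters printed for models{1} above
numStatModel = struct('name', 'gamma', 'alpha', 2.7464, 'beta', 19.291);

% Shape-scale parameterization: mean = alpha*beta, variance = alpha*beta^2
mean_objects = numStatModel.alpha * numStatModel.beta;
var_objects  = numStatModel.alpha * numStatModel.beta^2;
fprintf('mean = %.2f objects, std = %.2f objects\n', mean_objects, sqrt(var_objects));

% Optional cross-check with the Statistics and Machine Learning Toolbox:
%   [m, v] = gamstat(numStatModel.alpha, numStatModel.beta);
```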

avgObjects = zeros(1, length(models));   % preallocate one entry per model
for i = 1:length(models)
    numObjsModel = models{i}.proteinModel.objectModel.numStatModel;
    avgObjects(i) = numObjsModel.alpha*numObjsModel.beta;   % gamma mean = alpha*beta
end
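The same computation can be written without an explicit loop using cellfun. This sketch assumes, as in the output above, that every model stores a gamma distribution with alpha and beta fields, and leaves NaN for any model whose name field says otherwise. The two-element models cell array is made up so the snippet runs on its own.

```matlab
% Made-up stand-ins for the loaded models (illustration only)
make_model = @(a, b) struct('proteinModel', struct('objectModel', ...
    struct('numStatModel', struct('name', 'gamma', 'alpha', a, 'beta', b))));
models = {make_model(2, 10), make_model(3, 20)};

% Which models describe the object count with a gamma distribution?
is_gamma = cellfun(@(m) strcmp(m.proteinModel.objectModel.numStatModel.name, ...
    'gamma'), models);

% Mean of each gamma distribution (shape * scale); NaN for any other model
get_mean = @(m) m.proteinModel.objectModel.numStatModel.alpha * ...
                m.proteinModel.objectModel.numStatModel.beta;
avgObjects = NaN(1, numel(models));
avgObjects(is_gamma) = cellfun(get_mean, models(is_gamma));
```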

Now we simply plot the number of objects versus drug concentration. Here we will use a semilog plot to make visualization easier:

semilogx(conc, avgObjects, 'linestyle', 'none', 'marker', '.', 'markersize', 10)
ylabel('Mean Number of Objects Per Model')
xlabel('[Bafilomycin A1] (uM)')
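One caveat with the semilog plot: a logarithmic axis cannot display a concentration of 0, so an untreated control, if conc contains one, would be left off the plot. A hedged sketch of one workaround is to plot the positive concentrations on the log axis and draw the control as a dashed horizontal reference line. The conc and avgObjects values below are arbitrary placeholders, not experimental results.

```matlab
% Arbitrary placeholder values; in the tutorial these come from the .mat file and the loop above
conc       = [0, 0.1, 1, 10, 50];
avgObjects = [12, 15, 30, 48, 53];

is_control = (conc == 0);

figure;
semilogx(conc(~is_control), avgObjects(~is_control), ...
    'linestyle', 'none', 'marker', '.', 'markersize', 10);
hold on;
if any(is_control)
    % Draw the mean of the 0 uM condition(s) across the current x-range
    plot(xlim, [1 1] * mean(avgObjects(is_control)), '--');
end
hold off;
ylabel('Mean Number of Objects Per Model');
xlabel('[Bafilomycin A1] (uM)');
```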