# What Can I Do With CellOrganizer?

CellOrganizer has two basic functions: constructing generative models of the organization of cells, and generating synthetic images or movies of cells using those models. Both can be used to answer many cell biology questions. Since generative models seek to capture a complete description of a cell pattern (at least by some definition of complete), the parameters of such models can provide a better basis for analysis than descriptive features, which may be incomplete, redundant, or both. Three broad categories of tasks that can be done using results from CellOrganizer are described below, illustrated by examples from published work:

- Analysis of the __parameterization__ of individual cells
- Comparison of __models__ of __populations of cells__ between different cell types or different conditions
- Generation of synthetic cell images and __geometries__

## Analysis of the __parameterization__ of individual cells

*Estimating variation in number of objects or structures in a cell population*

CellOrganizer models often treat each organelle or structure as a distinct object. Thus one of the key components of the models is the distribution of the number of objects per cell within a cell population. An example from the paper describing the learning of a model of microtubule distribution (model type “network/microtubule_growth”) is shown in Figure 1.

Figure 1. Histogram of the estimated number of microtubules per cell in HeLa cells. From (Shariff et al., 2010)
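As a minimal sketch of how such a distribution can be tabulated, the snippet below bins per-cell object counts into a normalized histogram. The counts, the bin width, and the function name `count_histogram` are hypothetical and chosen only for illustration; they are not CellOrganizer output or part of its API.

```python
from collections import Counter

def count_histogram(counts_per_cell, bin_width):
    """Bin per-cell object counts into fixed-width bins.

    Returns a dict mapping each bin's lower edge to the fraction of cells
    whose count falls in [edge, edge + bin_width).
    """
    bins = Counter((c // bin_width) * bin_width for c in counts_per_cell)
    n_cells = len(counts_per_cell)
    return {edge: bins[edge] / n_cells for edge in sorted(bins)}

# Hypothetical estimated microtubule counts for eight cells.
example_counts = [180, 220, 240, 260, 260, 310, 330, 410]
bin_width = 50
histogram = count_histogram(example_counts, bin_width)

# Print a crude text histogram, one row per occupied bin.
for edge, fraction in histogram.items():
    bar = "#" * round(fraction * 40)
    print(f"{edge:>4} to {edge + bin_width - 1:<4} {bar}")
```

The fractions sum to one, so histograms from cell populations of different sizes can be compared directly.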

*Comparing individual cells in different cell lines using non-parametric shape models*

Generative models can be either parametric, in which a set of parameters describing important properties are defined and their values are extracted from each cell and modeled (as was the case for microtubules above), or non-parametric, in which cells are compared directly. An example of the latter is diffeomorphic models (model type “framework/diffeomorphic”), in which cells are described implicitly by their similarity in shape to each other. In the simplest case, this involves measuring how much deformation is required to morph each pair of cells to match each other. This amount of deformation is interpreted as a *distance* between each pair and used to create a *distance* matrix for all cells. This matrix can be “projected” or *embedded* into a small number of dimensions by standard methods so that the (approximate) relationships can be visualized. For example, Figure 2 shows a comparison of the cell and nuclear shape distributions of different YFP-tagged clones of H1299 cells generated by the Alon group (Sigal et al., 2007), obtained by creating diffeomorphic models. Note that this scatter plot is implicitly a parametric comparison: what is displayed for each cell is its two-dimensional coordinates in the embedded shape space. The approach is referred to as non-parametric because the things being compared are not defined in advance but are learned directly.

Figure 2. A cell and nuclear shape analysis for H1299 cell clones. Individual cells are shown, colored by the tagged clone they are from. The axes are arbitrary variables chosen to maximize the ability to represent distances between cells. Note how different tagged clones show different shapes (compare the red cells tagged in COX7C to the yellow and orange cells tagged in RAVER1 and RPL39, respectively). From (Johnson et al., 2015a).
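The embedding step can be illustrated with classical multidimensional scaling (MDS), one standard way to turn a distance matrix into low-dimensional coordinates. The sketch below is an assumption about the general technique, not a description of the method CellOrganizer itself uses; the function name `classical_mds` is hypothetical, and plain Euclidean distances between four toy points stand in for pairwise deformation distances between cells.

```python
import math
import random

def classical_mds(dist, n_dims=2, n_iter=500, seed=0):
    """Embed a symmetric distance matrix into n_dims coordinates.

    Double-centers the squared distances, then finds the leading
    eigenvectors by power iteration with deflation and scales each by the
    square root of its eigenvalue.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J, with J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Toy "cells": distances between four points stand in for deformation distances.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
distances = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(distances)
```

Because only distances are used, the recovered coordinates match the original layout up to rotation and reflection, which is why the axes of such a plot are arbitrary.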

## Comparison of __models__ between different markers, cell types or conditions

The generative models of a population of cells produced by CellOrganizer consist of estimated *probability density distributions* for combinations of parameter values (learned from the combination of parameter values seen in the training images). Depending on the assumptions made (or learned) while creating the model, these probability density distributions may consider each parameter independently or consist of the joint probability of parameter pairs or sets.

*Comparing one or more* __parameters__ *of models between cell lines*

In addition to looking at variation in a parameter *within* a cell line, we can compare the distributions of that parameter *across* cell lines. An example is shown in Figure 3.

Figure 3. Comparison of the bivariate distributions of the estimated parameters of a microtubule model for eleven cell lines. The ellipses are centered at the bivariate means of the two parameters and contain about 67% to 80% of the cells for a particular cell line (at most 1.5 standard deviations from the means). From (Li et al., 2012)
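The lower end of the coverage range quoted in the caption is what one would expect if the two parameters followed a bivariate Gaussian: the fraction of such a distribution within Mahalanobis distance r of the mean is 1 − exp(−r²/2), which is about 0.675 for r = 1.5. The sketch below checks this on synthetic data. The Gaussian assumption, the parameter values, and all function names are hypothetical illustrations and are not taken from the cited paper.

```python
import math
import random

def fit_gaussian_2d(points):
    """Return the mean vector and covariance entries (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, mean, cov):
    """Distance of a point from the mean in units of standard deviations."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)

def gaussian_coverage(radius):
    """Fraction of a bivariate Gaussian within a given Mahalanobis radius."""
    return 1.0 - math.exp(-radius ** 2 / 2.0)

rng = random.Random(0)
# Hypothetical values of two correlated model parameters for 5000 synthetic cells.
cells = []
for _ in range(5000):
    a = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, 1.0)
    cells.append((250.0 + 40.0 * a, 30.0 + 3.0 * a + 4.0 * b))

mean, cov = fit_gaussian_2d(cells)
inside = sum(mahalanobis(c, mean, cov) <= 1.5 for c in cells) / len(cells)
print(f"expected {gaussian_coverage(1.5):.3f}, observed {inside:.3f}")
```

Coverage above the Gaussian value for a given cell line, as in the upper end of the reported range, would indicate a distribution more concentrated near its mean than a Gaussian with the same covariance.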

Of course, we can also compare cell lines using the implicit parameters from non-parametric models. Figure 2 above shows a comparison of individual cells from different cell lines. However, it can be difficult to draw conclusions from such scatter plots, especially if they involve large numbers of cells. An alternative is to compare the *distribution* learned when constructing a generative model from all cells of a given cell line. Those distributions can be represented by very simple statistical models (such as a mean vector and covariance matrix) or by more complex empirical probability densities. An example of the latter approach, using kernel density estimates, is shown in Figure 4 for the same cell lines shown in Figure 2.

Figure 4. Contour lines for the 50th (thick lines) and 90th (thin lines) percentiles of probability density obtained via kernel density estimates are shown for the different YFP-tagged H1299 clones shown in Figure 2. From (Johnson et al., 2015a)

*Comparing* __whole models__ *between cell lines*

These examples of comparing specific parameters of a model might lead to incorrect conclusions about the overall similarity of two models. For example, cell lines that appear very similar in the first two dimensions of a shape space (e.g., the yellow and orange cell lines in Figure 4) might be quite different in the third (or higher) dimension. As an alternative to comparing parameters directly, we can compare how likely it is that the distributions of cells in two cell lines are the same. This can be done in at least two ways. We can construct a generative model for one cell line and then ask how likely it is that each cell from the other cell line would have been generated by that model. This approach was used to propose an assignment of an unannotated protein to a specific punctate organelle (Johnson et al., 2015b). An alternative is to measure the *total variation* between two models by summing the difference in probability between the two models for many randomly chosen samples from the models. This approach has been used to obtain an overall measure of the similarity between different punctate organelle patterns (Li et al., 2016).

Table 1. Dissimilarity between punctate subpatterns in the U-2 OS cell line. Each value is the total variation between the models of a pair of proteins: values close to 1 indicate patterns that are fully distinguishable, while values close to 0 indicate patterns that are indistinguishable. From (Li et al., 2016)
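A sampling-based estimate of total variation can be sketched as follows. This is one possible estimator under stated assumptions, not necessarily the procedure used in the cited paper: each model is reduced to a one-dimensional Gaussian given as a hypothetical `(mu, sigma)` pair, samples are drawn from an equal mixture of the two models, and the quantity |p − q| / (p + q) is averaged, which estimates half the integral of |p(x) − q(x)|.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a 1D Gaussian, used here as a stand-in generative model."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def total_variation(model_p, model_q, n_samples=20000, seed=0):
    """Monte Carlo estimate of the total variation between two 1D Gaussians.

    Samples are drawn alternately from the two models (an equal mixture),
    and |p - q| / (p + q) is averaged over them.
    """
    rng = random.Random(seed)
    total = 0.0
    for i in range(n_samples):
        mu, sigma = model_p if i % 2 == 0 else model_q
        x = rng.gauss(mu, sigma)
        p = normal_pdf(x, *model_p)
        q = normal_pdf(x, *model_q)
        total += abs(p - q) / (p + q)
    return total / n_samples

same = total_variation((0.0, 1.0), (0.0, 1.0))
far = total_variation((0.0, 1.0), (10.0, 1.0))
print(f"identical models: {same:.2f}, well-separated models: {far:.2f}")
```

The estimate is bounded between 0 and 1, matching the scale used in Table 1, and it depends on the whole distribution rather than on any single pair of parameters.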

## Generation of synthetic cell __geometries__

Since the models learned by CellOrganizer are *generative*, we can of course use them to create synthetic cells. All current CellOrganizer models attempt to describe the underlying properties of a cell or its structures in a way that is distinct from the way in which those cells or structures appear in an image taken with a particular microscope. Thus, for example, all objects of type “vesicle/gmm” are ellipsoids and all objects of type “network/microtubule_growth” are lines. This is true even though the vesicles (or microtubules) from which the model was learned were composed of pixels or voxels – the idea is to estimate the most likely shape/positions that they would have had in an idealized, continuous reality of a cell. This is a bit like doing optical character recognition from printed text – the numbers and letters in the text are composed of pixels created by the printer, but the “model” created from the text consists of the “idealized” sequence of characters.

*Creation of modal or “average” geometries*

As discussed above, our generative models of a population of cells consist of estimated *probability density distributions* for combinations of parameter values. Thus we can draw a new set of parameter values by sampling randomly from the probability density distribution. These parameter values can then be used to construct a corresponding idealized geometry (note that for some models, the same parameter values can give rise to more than one geometry, i.e., when the geometry construction involves randomly sampling from distributions specified by the parameters).

Instead of randomly sampling the parameters from the distribution, we can choose those parameters that correspond to the highest probability in the distribution. This corresponds to the “mode” of the distribution (the most frequent observation) and we can then construct an idealized (or “cartoon”) geometry from those parameters.
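The difference between drawing random parameter values and taking the mode can be sketched with a toy model. The model below, with a discrete distribution over object count and a Gaussian over object radius treated as independent, is entirely hypothetical; the dictionary layout and function names are illustrative assumptions and do not reflect how CellOrganizer stores its models.

```python
import random

# Hypothetical learned model: a discrete distribution over the number of
# objects per cell and a Gaussian over object radius (mean, std. deviation).
model = {
    "object_count": {8: 0.2, 10: 0.5, 12: 0.3},
    "radius": (0.4, 0.05),
}

def sample_parameters(model, rng):
    """Draw one random set of parameter values from the model."""
    counts = list(model["object_count"])
    weights = list(model["object_count"].values())
    mean, std = model["radius"]
    return {
        "object_count": rng.choices(counts, weights=weights)[0],
        "radius": rng.gauss(mean, std),
    }

def modal_parameters(model):
    """Return the most probable parameter values (the mode of each part)."""
    count_probs = model["object_count"]
    return {
        "object_count": max(count_probs, key=count_probs.get),
        "radius": model["radius"][0],  # a Gaussian's mode equals its mean
    }

rng = random.Random(0)
random_cell = sample_parameters(model, rng)   # differs from draw to draw
typical_cell = modal_parameters(model)        # always the same "cartoon" values
```

Repeated calls to `sample_parameters` reproduce the variation in the population, while `modal_parameters` always returns the same single representative set of values.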

It is also important to note that we can combine generative models learned from different images as long as we make some assumption about how the models are related to each other. A simple but potentially useful assumption is that the things being modeled in two different models are independent of each other. For example, if we have models for different organelles that describe the positions of those organelles relative to some common frame of reference (i.e., the cell and nuclear boundaries), we can fix that frame of reference (e.g., choose an instance of cell and nuclear shape) and then synthesize the different organelles relative to that fixed frame. Figure 5 shows an example of a synthetic cell containing microtubules and eleven different punctate organelles.

Figure 5. Synthetic cell image containing eleven punctate patterns. Generative models of eleven different proteins with punctate subcellular distributions were learned. Using the models, synthetic patterns for all proteins were independently created starting from a real cell geometry for the nuclear membrane, cell membrane, and microtubules; this assumes that positions of puncta do not affect each other (e.g., that peroxisomes are not more or less likely to be near RNP bodies). The nucleus is shown in dark gray and microtubules in light gray. The eleven patterns are CopI vesicles, CopII vesicles, Caveolae, Coated Pits, Early Endosomes, Late Endosomes, Lysosomes, Peroxisomes, RNP bodies, Recycling Endosomes, and Retromer.
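The independence assumption can be sketched in a few lines: fix one frame of reference, then generate each organelle pattern separately inside it. In this toy version the cell and nucleus are concentric circles and puncta are placed uniformly in the cytoplasm. Both choices are simplifying assumptions made here for illustration, since learned models describe positions relative to the real cell and nuclear boundaries; the counts and the function name `place_puncta` are hypothetical.

```python
import math
import random

def place_puncta(n, cell_radius, nucleus_radius, rng):
    """Place n points uniformly in the cytoplasm of an idealized 2D cell.

    The cell and nucleus are concentric circles, a deliberately crude
    stand-in for a real cell and nuclear shape instance.
    """
    points = []
    while len(points) < n:
        x = rng.uniform(-cell_radius, cell_radius)
        y = rng.uniform(-cell_radius, cell_radius)
        r = math.hypot(x, y)
        if nucleus_radius < r < cell_radius:  # rejection sampling
            points.append((x, y))
    return points

# Hypothetical per-organelle counts; each pattern is generated independently
# of the others but within the same fixed cell and nuclear boundaries.
organelle_counts = {"lysosomes": 30, "peroxisomes": 20, "early_endosomes": 25}
rng = random.Random(0)
synthetic_cell = {
    name: place_puncta(n, cell_radius=10.0, nucleus_radius=4.0, rng=rng)
    for name, n in organelle_counts.items()
}
```

Because no pattern looks at the positions of any other, adding a further organelle type requires no change to the ones already placed.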

*Creation of geometries for simulations of cell biochemistry*

Extensive work has been done on simulating cellular processes *in silico*, such as simulating the kinetics of biochemical reactions. While most of that work does not specifically consider the possible influence of cell geometry on the results of simulations, systems have been created for simulating cell processes within specific, realistic cell geometries (Schaff et al., 1997; Stiles et al., 1998). These geometries are typically created by manual segmentation of microscope images to identify specific organelles and structures within the image. However, the use of synthetic, idealized cell geometries has a number of advantages, the most important of which are that they incorporate knowledge of organelle structure and arrangement learned from many images and that they can produce objects represented by meshes with watertight boundaries. The recently developed __“SBML spatial” standard__ provides a format for communicating cell geometries to cell simulation engines (Sullivan et al., 2015). CellOrganizer can produce SBML spatial files containing “meshed” surfaces or mathematically defined objects (constructive solid geometries), or both.

*Creation of simulated cell images*

The development and evaluation of image analysis software is often made difficult by the absence of “ground truth” regarding the results that should be obtained from a given image. For example, evaluation of segmentation of cells or organelles is often done by comparison with results from human segmentation, but the results from different people can be quite different. Generative models can address this challenge by producing simulated images for which the “ground truth” is known (Svoboda et al., 2009). The creation of the simulated images can incorporate the optical properties of a particular microscope, including the level of measurement noise typically seen.
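A minimal sketch of this idea is shown below: point objects at known positions are rendered as blurred spots and then corrupted with noise, so the true object positions are available for scoring any analysis run on the image. The Gaussian spot is a common simplification of a microscope point spread function and the additive Gaussian noise is likewise an assumption; the function name `render_image` and all parameter values are hypothetical.

```python
import math
import random

def render_image(objects, size, psf_sigma, noise_sigma, rng):
    """Render point objects as Gaussian spots plus additive Gaussian noise.

    objects: list of (row, col) ground-truth positions.
    Returns a size x size list of lists of pixel intensities.
    """
    image = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            signal = sum(
                math.exp(-((r - orow) ** 2 + (c - ocol) ** 2)
                         / (2 * psf_sigma ** 2))
                for orow, ocol in objects
            )
            image[r][c] = signal + rng.gauss(0.0, noise_sigma)
    return image

rng = random.Random(0)
ground_truth = [(5, 5), (12, 20), (25, 9)]  # known object positions
noisy = render_image(ground_truth, size=32, psf_sigma=1.5,
                     noise_sigma=0.05, rng=rng)
clean = render_image(ground_truth, size=32, psf_sigma=1.5,
                     noise_sigma=0.0, rng=rng)
```

A detection or segmentation method applied to `noisy` can then be scored against `ground_truth` directly, without relying on human annotation.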

**References**

- Johnson, G.R., T.E. Buck, D.P. Sullivan, G.K. Rohde, and R.F. Murphy. 2015a. Joint Modeling of Cell and Nuclear Shape Variation. *Mol Biol Cell*. 26:4046-4056.
- Johnson, G.R., J. Li, A. Shariff, G.K. Rohde, and R.F. Murphy. 2015b. Automated Learning of Subcellular Variation among Punctate Protein Patterns and a Generative Model of their Relation to Microtubules. *PLoS Comput. Biol.* 11:e1004614.
- Li, J., A. Shariff, M. Wiking, E. Lundberg, G.K. Rohde, and R.F. Murphy. 2012. Estimating microtubule distributions from 2D immunofluorescence microscopy images reveals differences among human cultured cell lines. *PLoS One*. 7:e50292.
- Li, Y., T.D. Majarian, A.W. Naik, G.R. Johnson, and R.F. Murphy. 2016. Point process models for localization and interdependence of punctate cellular structures. *Cytometry Part A*. in press.
- Schaff, J., C.C. Fink, B. Slepchenko, J.H. Carson, and L.M. Loew. 1997. A general computational framework for modeling cellular structure and function. *Biophys J*. 73:1135-1146.
- Shariff, A., R.F. Murphy, and G.K. Rohde. 2010. A generative model of microtubule distributions, and indirect estimation of its parameters from fluorescence microscopy images. *Cytometry Part A*. 77A:457-466.
- Sigal, A., T. Danon, A. Cohen, R. Milo, N. Geva-Zatorsky, G. Lustig, Y. Liron, U. Alon, and N. Perzov. 2007. Generation of a fluorescently labeled endogenous protein library in living human cells. *Nature Protocols*. 2:1515-1527.
- Stiles, J.R., T.M. Bartol Jr., E.E. Salpeter, and M.M. Salpeter. 1998. Monte Carlo simulation of neurotransmitter release using MCell, a general simulator of cellular physiological processes. *Proc Comput Neurosci*:279–284.
- Sullivan, D.P., R. Arepally, R.F. Murphy, J.-J. Tapia, J.R. Faeder, M. Dittrich, and J. Czech. 2015. Design Automation for Biological Models: A Pipeline that Incorporates Spatial and Molecular Complexity. *In* Proceedings of the 25th edition on Great Lakes Symposium on VLSI. ACM, Pittsburgh, Pennsylvania, USA. 321-323.
- Svoboda, D., M. Kozubek, and S. Stejskal. 2009. Generation of digital phantoms of cell nuclei and simulation of image formation in 3D image cytometry. *Cytometry Part A*. 75A:494-509.
