> Computational Biology and Biostatistics

Imputing Missing Data in Medical Questionnaires

Time Line

Ongoing since 2011

Team

Lalit Garg (now at University of Malta, Malta), JD, Khai Pang Leong (Tan Tock Seng Hospital, Singapore), Dr. Arul Earnest (Duke-NUS Graduate Medical School, Singapore).

Problem

Self-report questionnaires are a valuable instrument for assessing a patient's quality of life and its relationship with socioeconomic and environmental factors, disease risk and progression, treatment and disease burden, treatment response, and quality of care. However, a common problem with such questionnaires is missing data. Despite considerable care and effort to prevent it, some level of missing data is practically unavoidable. Such missing data can have a detrimental impact on statistical analyses based on the questionnaire responses. A variety of methods have been suggested for missing data imputation. Nevertheless, more research is needed to assess and improve the reliability of data imputation. In this project, we explore existing statistical procedures and develop novel ones to complete missing data in medical questionnaires.

Contribution

We proposed collaborative filtering techniques to complete missing data in repeated medical questionnaires. The proposed techniques are based on the canonical polyadic (CP) decomposition (a.k.a. PARAFAC). Besides the standard CP decomposition, we also use a normalized decomposition. As an illustration, we consider a systemic lupus erythematosus-specific quality-of-life questionnaire. We assess the performance of the proposed tensor-based methods using normalized root mean square error, bias, and variance, and compare them with widely used approaches such as mean substitution, regression imputation, and k-nearest neighbour estimation. The numerical results show that the proposed methods improve significantly on these popular methods, with the best results obtained for the normalized decomposition.

Figure: Tensor factorization based collaborative filtering for missing data imputation in medical questionnaires.
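
As a rough illustration of the tensor-based idea described above, the following Python sketch imputes missing entries of a three-way array with a standard CP model fitted by alternating least squares, refilling the missing entries from the current reconstruction after every sweep. It is a minimal sketch under stated assumptions, not the method from the publications: the tensor layout (patients x items x visits), the function names `cp_impute` and `nrmse`, the rank, the iteration count, and the normalization of the error by the standard deviation are all choices made here for illustration. The normalized decomposition mentioned above is not covered.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product of U (I x R) and V (J x R), shape (I*J, R)."""
    R = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, R)

def unfold(X, mode):
    """Mode-n unfolding; the remaining axes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_impute(X, mask, rank=2, n_iter=200, seed=0):
    """Fill entries of a 3-way array where mask is False, using a CP model.

    X    : array (e.g. patients x items x visits, an assumed layout);
           values at missing positions are ignored and may be NaN.
    mask : boolean array, True where the entry was observed.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    # Start from mean substitution of the missing entries.
    filled = np.where(mask, X, X[mask].mean())
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            # Least-squares update of one factor matrix, the others held fixed.
            factors[mode] = unfold(filled, mode) @ kr @ np.linalg.pinv(gram)
        recon = np.einsum('ir,jr,kr->ijk', *factors)
        # Keep observed answers, replace missing ones by the model prediction.
        filled = np.where(mask, X, recon)
    return filled

def nrmse(truth, estimate, where):
    """RMSE over the selected entries, divided by the std of the true values there."""
    err = estimate[where] - truth[where]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[where])
```

In a comparison like the one described above, `nrmse` would be evaluated only on entries that were artificially removed, so that the true answers are known.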

Reference

Dauwels J, Garg L, Earnest A, Leong KP, Tensor Factorizations for Missing Data Imputation in Medical Questionnaires, ICASSP 2012, Mar 25-30, 2012, Kyoto, Japan. [ PDF ]

Dauwels J, Garg L, Earnest A, Leong KP, Tensor-Based Methods for Handling Missing Data in Quality-of-Life Questionnaires, IEEE Journal of Biomedical and Health Informatics, vol.18, no.5, pp.1571-1580, Sept. 2014. [ PDF ]

Computational Synchronization of Microarray Data with Application to Plasmodium Falciparum

Time Line

Ongoing since 2011

Team

JD, W. Zhao, J. C. Niles (SMART and MIT), J. Cao (SMART and MIT)

Problem

Microarrays are widely used to investigate the blood stage of Plasmodium falciparum infection. Starting with synchronized cells, gene expression levels are measured repeatedly over the 48-hour intra-erythrocytic developmental cycle (IDC). However, the cell population gradually loses synchrony during the experiment. As a result, the microarray measurements are blurred. In this project, we propose a generalized deconvolution approach to reconstruct the intrinsic expression pattern, and apply it to P. falciparum IDC microarray data.

Contribution

We develop a statistical model for the decay of synchrony among cells, and reconstruct the expression pattern through statistical inference. The proposed method can handle microarray measurements with noise and missing data. The original gene expression patterns become more apparent in the reconstructed profiles, making it easier to analyze and interpret the data. We hypothesize that reconstructed gene expression patterns represent better temporally resolved expression profiles that can be probabilistically modeled to match changes in expression level to IDC transitions.
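
To make the deconvolution idea concrete, the sketch below treats each microarray measurement as a weighted average of the intrinsic expression pattern over the phases of the cell population, and recovers the pattern by regularized least squares using only the time points that were actually measured. It is a toy illustration under assumptions introduced here, not the statistical model of the publications: the Gaussian phase distribution whose width grows linearly with time, the parameters `sigma0` and `growth`, the smoothness penalty, and the weight `lam` are all hypothetical.

```python
import numpy as np

def blur_matrix(times, phases, sigma0=1.0, growth=0.08, period=48.0):
    """Row i holds the assumed distribution of cell phases at measurement time i.

    The spread sigma0 + growth * t widens with time to mimic loss of synchrony.
    """
    K = np.zeros((len(times), len(phases)))
    for i, t in enumerate(times):
        sigma = sigma0 + growth * t
        # Circular distance between each phase and the nominal phase t.
        d = (phases - t + period / 2) % period - period / 2
        w = np.exp(-0.5 * (d / sigma) ** 2)
        K[i] = w / w.sum()
    return K

def deconvolve(y, K, lam=1e-2):
    """Estimate the intrinsic pattern x from y ~ K @ x.

    Missing measurements are marked as NaN in y and simply left out of the fit.
    A circular second-difference penalty keeps the estimate smooth.
    """
    obs = ~np.isnan(y)
    n = K.shape[1]
    eye = np.eye(n)
    D = np.roll(eye, -1, axis=1) - 2 * eye + np.roll(eye, 1, axis=1)
    A = K[obs].T @ K[obs] + lam * D.T @ D
    return np.linalg.solve(A, K[obs].T @ y[obs])
```

With hourly samples over one cycle, `blur_matrix(np.arange(48.0), np.arange(48.0))` gives a 48 x 48 matrix, and `deconvolve` returns a sharpened profile on the same phase grid.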

This study proposes a new methodology for extracting intrinsic expression patterns from microarray data. By applying this method to P. falciparum microarray data, several protein kinases are predicted to play a significant role in the P. falciparum IDC. Earlier experiments have indeed confirmed that several of these kinases are involved in this process. Overall, these results indicate that further functional analysis of these additional putative protein kinases may reveal new insights into how the P. falciparum IDC is regulated.

We investigated the effect of cell asynchrony by conducting numerical experiments. The simulation results suggest that cell asynchrony affects different intrinsic expression patterns to different degrees. Specifically, intrinsic patterns with high expression around the late life stage are more likely to be affected by cell asynchrony. We also investigated how the effect of cell asynchrony depends on the experimental conditions. This analysis identifies the burst rate in the infection period and the standard deviation of the growth rate as having a strong impact on the blurring due to cell asynchrony. Consequently, it is critical to measure these two parameters during biological experiments in order to deblur time-series gene expression data.
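
A numerical experiment of this kind can be sketched with a small Monte Carlo simulation. The Python code below is a toy version built on assumptions made here, not the simulation used in the publications: every cell ages at its own constant rate drawn from a normal distribution with mean 1 and standard deviation `rate_std`, the intrinsic pattern is a Gaussian peak with location `mu` and width `sigma`, and the relative error `distortion` is a hypothetical stand-in for the measure of blurring. The burst rate in the infection period is not modeled.

```python
import numpy as np

PERIOD = 48.0                      # length of one cycle in hours
TIMES = np.arange(0.0, PERIOD)     # hourly measurement times

def intrinsic(phase, mu, sigma):
    """Gaussian-shaped intrinsic expression pattern on a circular phase axis."""
    d = (phase - mu + PERIOD / 2) % PERIOD - PERIOD / 2
    return np.exp(-0.5 * (d / sigma) ** 2)

def observed_profile(mu, sigma, rate_std, n_cells=20000, seed=0):
    """Population-averaged expression when cells age at slightly different rates."""
    rng = np.random.default_rng(seed)
    rates = rng.normal(1.0, rate_std, size=n_cells)
    # Phase of every cell at every measurement time, shape (n_times, n_cells).
    phases = np.outer(TIMES, rates) % PERIOD
    return intrinsic(phases, mu, sigma).mean(axis=1)

def distortion(mu, sigma, rate_std):
    """Relative L2 difference between the observed and the intrinsic profile."""
    x = intrinsic(TIMES, mu, sigma)
    y = observed_profile(mu, sigma, rate_std)
    return np.linalg.norm(y - x) / np.linalg.norm(x)
```

Because the spread in cell phase grows with elapsed time in this toy model, a peak late in the cycle is averaged over a wider range of phases than an early one, which is one simple mechanism that would produce the qualitative trend reported above.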

Reference

Wei Zhao, J. Dauwels, J. Niles, and Jianshu Cao, Computational synchronization of microarray data with application to Plasmodium falciparum, Proteome Science, 2012; 10(Suppl 1): S10 (invited paper). [ PDF ]

Wei Zhao, J. Dauwels, and J. Cao, The Effects of Cell Asynchrony on Time-Series Data: An Analysis on Gene Expression Level of Plasmodium Falciparum, 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2014), 2014, in press. [ PDF ]

Wei Zhao, J. Dauwels and J. Cao, Computational Synchronization Improves the Consistency between Multiple Microarray Experiments, Workshop of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2013), Shanghai, China, December 18-21 2013.

Figure: The proposed method can reliably reconstruct the kinase expression pattern when the microarray data is contaminated by signal noise and has numerous missing data points.

Figure: We demonstrate how the cell asynchrony has varying effects D on different intrinsic expression patterns, characterized by the peak location mu and width sigma.

Bayesian Spatio-Temporal Image Analysis of Cell Sprouting in Angiogenesis

Time Line

Ongoing since 2010

Team

Lee-Ling Sharon Ong (SMART), JD, Marcelo Ang (NUS), H. Harry Asada (MIT and SMART)

Problem

The project goal is to develop automated analysis tools for experiments in angiogenesis. Angiogenesis is the process of generating a vascular network from an existing blood vessel. A population of endothelial cells (ECs) residing in a blood vessel (lumen) can sprout out and create a new vascular network when exposed to growth factors. At BioSyM-SMART, experiments in angiogenesis are performed using in vitro micro-fluidic devices.

These experiments can produce a significant amount of data, particularly 3D confocal images. Manual cell tracking in such data is time-consuming and subject to variability between human annotators. Automated image analysis and tracking is therefore a more efficient and more consistent alternative. Quantitative models can then be developed and validated to explain the emergent behaviors of many interacting cells and molecules in sprout formation.

Contribution

The project goal is to develop software for automated tracking of migrating and proliferating cells and sprouts in microscopy images, in order to provide insight into how a vascular structure is formed by a collection of migratory cells.

We incorporate biological models into our cell and sprout tracking algorithms. In micro-fluidic 3D angiogenic sprouting experiments, two types of images are obtained at discrete time steps using confocal microscopy (see Fig.): a) three-dimensional fluorescent images of stained cell nuclei and b) two-dimensional visible light images of the gel matrix. These two image sources provide complementary information, as the outline of the lumen formed in the matrix by the migrating cells can be seen in the images of the gel. Although the three-dimensional centroids of cell nuclei can be extracted from the fluorescence images, their acquisition is more harmful to the cells and, unlike the 2D visible light images, the data are subject to photo-bleaching. Hence, the latter images may be acquired at a higher sampling rate, even though they are blurred by out-of-focus material.

In sprout formation, tip cells lead the forming sprout, while stalk cells trail behind, supporting the newly formed sprout. Tip cells are characterized by rich filopodia protrusions, which can be seen from the sprout outline. Our algorithms can differentiate between tip and stalk cells and track the transitions between them, as well as filopodia extension/retraction and lumen formation.

We track a joint state representation of tip/stalk cells, lumen, and filopodia, using concepts from Simultaneous Localization and Mapping (SLAM). The Bayesian filtering framework we developed places both the cell and the lumen/filopodia parameters in the same state vector, allowing mathematically consistent simultaneous observation updates from both channels. The outcome is software for tracking and visualizing lumen formation, filopodia extension and retraction, nuclei locations, and tip/stalk phenotype at each time step.

In addition, we have developed software to jointly estimate 3D shapes and poses of cell nuclei from confocal images using statistical prior information. Our algorithms try to emulate how people apply prior knowledge about nuclei shapes when they manually segment a cell.
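
The benefit of keeping cell and sprout parameters in one state vector can be illustrated with a small linear-Gaussian (Kalman) filter. The Python sketch below is a simplified illustration under assumptions made here, not the filter from the publications: the state holds only a 2D nucleus position and a 2D sprout-tip position, the motion model is a random walk with correlated process noise, and all matrices and noise levels (`F`, `Q`, `H_FLUOR`, `H_LIGHT`, `R_FLUOR`, `R_LIGHT`) are hypothetical. Fluorescence observations are passed as `None` at time steps where no such image was acquired.

```python
import numpy as np

# Joint state: [nucleus_x, nucleus_y, tip_x, tip_y] (assumed layout).
F = np.eye(4)                                        # random-walk motion model
# Correlated process noise: nucleus and sprout tip tend to move together.
Q = 0.1 * (np.eye(4) + 0.8 * np.kron(np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(2)))
H_FLUOR = np.hstack([np.eye(2), np.zeros((2, 2))])   # fluorescence sees the nucleus
H_LIGHT = np.hstack([np.zeros((2, 2)), np.eye(2)])   # transmitted light sees the tip
R_FLUOR = 0.05 * np.eye(2)
R_LIGHT = 0.20 * np.eye(2)

def predict(x, P):
    """Propagate mean and covariance one time step."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Standard Kalman measurement update for one observation channel."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

def track(x0, P0, fluor, light):
    """Filter two observation sequences; an entry of None means no image at that step."""
    x, P = x0.copy(), P0.copy()
    history = []
    for z_f, z_l in zip(fluor, light):
        x, P = predict(x, P)
        if z_l is not None:
            x, P = update(x, P, z_l, H_LIGHT, R_LIGHT)
        if z_f is not None:
            x, P = update(x, P, z_f, H_FLUOR, R_FLUOR)
        history.append(x.copy())
    return np.array(history), P
```

Because both channels update the same covariance matrix, a transmitted-light observation of the sprout tip also shifts the nucleus estimate through the cross-covariance terms, which is the kind of consistency a joint state vector is meant to provide.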

Figure: Examples of the images acquired during time-lapse microscopy include (a) fluorescent images (top left) and (b) transmitted light images (bottom left). The transmitted light images provide information about the sprout profile, while 3D cell nuclei centroid locations can be obtained from the fluorescent images. The automatically extracted cells, lumen and filopodia are jointly tracked using Bayesian filtering. We visualize (image on the right) the extracted lumen (shown in blue), extracted cell filopodia (shown in red) and the tip and stalk cells using IMARIS (from Bitplane).

Reference

Ong LL, Dauwels J, Ang MH Jr, Asada HH, A Bayesian filtering approach to incorporate 2D/3D time-lapse confocal images for tracking angiogenic sprouting cells interacting with the gel matrix, Med Image Anal. 2014 Jan;18(1):211-27. [ PDF ]

S.L.L. Ong, M. Wang, J. Dauwels, H. Asada, Segmentation of Densely Populated Cell Nuclei from Confocal Image Stacks Using 3D Non-Parametric Shape Priors, 36th Annual International Conference of the IEEE Engineering In Medicine And Biology Society (EMBC 2014), 2014, in press. [ PDF ]

Mengmeng Wang, Lee-Ling Sharon Ong, Justin Dauwels, and H. Harry Asada. Automatic Detection of Endothelial Cells in 3D Angiogenic Sprouts from Experimental Phase Contrast Images. SPIE Medical Imaging 2015, accepted.

Data-Driven Biophysical Models of Angiogenesis

Time Line

Ongoing since 2012

Team

Wang Mengmeng, Lee-Ling Sharon Ong (SMART), JD, H. Harry Asada (MIT and SMART)

Problem

Angiogenesis is the physiological process through which new blood vessels form from existing vessels. It is a normal and vital process in growth, development and wound healing. However, tumor-induced angiogenesis is a critical step in the invasion of cancer, a disease that causes 13% of all human deaths worldwide. A better understanding of angiogenesis will therefore facilitate the treatment of cancer. Our objective is to develop a cell prediction model for angiogenesis, based on experimentally verified data, that can generate vessel networks visually similar to actual data from time-lapse images (see Fig 1).

Contribution

We have developed a data-driven model that predicts the movement of stalk cells based on tip-stalk cell interaction in angiogenesis. The cell trajectories are obtained from automated image analysis of experimental observations. Since cell-cell interaction and drag force influence the positions and velocities of stalk cells migrating in the extracellular matrix (ECM), both kinds of interaction are incorporated in our model. The unknown parameters in the model are estimated by Maximum Likelihood Estimation (MLE) from time-lapse experimental cell migration data. The numerical results show that our stalk cell prediction model describes stalk cell trajectories well, as shown in Fig 2.

Figure 2: (left) Comparison of predicted and experimental stalk cell trajectories; (right) Prediction error over time.
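
The parameter estimation step can be illustrated with a deliberately simple model. In the Python sketch below, the stalk cell velocity at each time step is a damped copy of its previous velocity plus an attraction towards the tip cell, with Gaussian noise; under that noise assumption the maximum likelihood estimate of the two coefficients reduces to ordinary least squares. The model form, the parameter names `a` (velocity retention, standing in for drag) and `k` (tip-stalk attraction), and the functions `simulate` and `fit_mle` are assumptions made here for illustration and are not taken from the publication.

```python
import numpy as np

def simulate(tip, a, k, noise_std, seed=0):
    """Generate a stalk trajectory following v_t = a*v_{t-1} + k*(tip_t - stalk_t) + noise."""
    rng = np.random.default_rng(seed)
    T = len(tip)
    stalk = np.zeros((T, 2))
    v_prev = np.zeros(2)
    for t in range(T - 1):
        v = a * v_prev + k * (tip[t] - stalk[t]) + noise_std * rng.standard_normal(2)
        stalk[t + 1] = stalk[t] + v          # unit time step
        v_prev = v
    return stalk

def fit_mle(stalk, tip):
    """Estimate (a, k, noise variance) from observed trajectories.

    With Gaussian noise, maximizing the likelihood over a and k is a linear
    least-squares problem; the noise variance is the mean squared residual.
    """
    v = np.diff(stalk, axis=0)               # v[t] = stalk[t+1] - stalk[t]
    target = v[1:].ravel()                   # v_t for t >= 1
    prev_v = v[:-1].ravel()                  # v_{t-1}
    sep = (tip - stalk)[1:-1].ravel()        # tip_t - stalk_t
    X = np.column_stack([prev_v, sep])
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ theta
    return theta[0], theta[1], np.mean(resid ** 2)
```

Given tip and stalk trajectories as arrays of shape (T, 2), `fit_mle(stalk, tip)` returns the two coefficients and the noise variance, which could then be used to predict the stalk position one step ahead.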

Reference

Mengmeng Wang, J. Dauwels, Lee-Ling S. Ong, H. Asada, 2D Data-Driven Stalk Cell Prediction Model Based on Tip-Stalk Cell Interaction in Angiogenesis, 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2013), 3-7 July 2013, pp. 4537-40. [ PDF ]

DAUWELS LAB