Large data volumes, in tandem with increasing computational power and bandwidth, have made it possible to understand the epigenome; think of the epigenome as the layer that pervades the genome and gives every cell its identity. Every cell of the human body has the same genome. How, then, is a brain cell distinct from an immune cell? This is where the epigenome comes in: it plays a distinct “symphony” in each of the diverse contexts in which living cells thrive. With exabytes of sequencing data being generated, there is an increasing need to analyze genomic big data, to decode the computations that take place in living cells, and then to translate the findings into discoveries in precision medicine. The best-studied example of a cellular computation was first considered in the seminal paper by Berg and Purcell, who showed that the information a cell can acquire about its environment is fundamentally limited by stochastic fluctuations in the occupancy of the membrane-bound receptor proteins that detect the ligand. This was way back in 1977! Today, aided by exabytes of genomic data, we know that computations take place within living cells. Overall, my lab’s goal is to understand some of these cellular computations and to reverse engineer them to restore health and vitality.
In the context of my talk today, these computations refer to gene-gene and gene-RNA regulatory networks (GRN variants). A GRN is a set of genes, or parts thereof, that interact to control cellular functions. GRNs are important in development, differentiation, and cellular response to ambient signals. How can this “genomical” big data enable us to decode the computations within cells rapidly and at scale? What kinds of algorithms can deal with the inherent heterogeneity, noise, and high dimensionality of the data pertaining to cellular computations? Can these efforts result in precise, data-driven medicine? I will answer these questions in two parts:
Part 1: I will talk about our Avishkar suite of predictive algorithms, with which we uncover the non-canonical signatures of small regulatory RNAs (e.g., microRNAs, or miRNAs for short) that target genes.
Part 2: I will present our work on federated cyberinfrastructures for genomics. This work is in the context of MG-RAST, the largest metagenomics portal and analysis pipeline, operated by the US Department of Energy, and is funded by an NIH R01 grant.