Author Archives: Neha Suresh


You’ve heard about genome sequencing — but what’s exome sequencing?


Despite our differences, human beings share 99.9% of the genome. In other words, we all differ by a mere 0.1% of our DNA, and that small fraction accounts for the differences in the way we appear, grow, and develop.

Over 80% of rare diseases are caused by genetic mutations in that minuscule difference, and it’s estimated that such undiagnosed diseases affect about 8% of our population. Detecting such diseases is challenging, but researchers are working on promising new techniques.

Potential forms of diagnosis for rare and undiagnosed diseases include:

  • next generation sequencing (NGS), which refers to all large-scale DNA sequencing methods that allow for mapping the entire genome (whole genome sequencing);
  • whole exome sequencing, which focuses on just the exons within all known genes;
  • targeted gene panels, which cover only the exons of selected genes.

To understand whole exome sequencing (WES), we need to dive into the world of our genetic makeup.

Four letters

The nucleus of every cell in the human body contains 23 pairs of chromosomes, which makes 46 chromosomes in every cell. These chromosomes are in turn made of double-stranded DNA.

DNA is made up of genes that are built from nucleotides. The human genome consists of about 20,000 genes and 3 billion nucleotides, or “letters.” The letters are organic molecules, namely guanine (G), thymine (T), adenine (A) and cytosine (C). G, T, A and C are arranged in specific sequences in our genes, and these sequences are translated into proteins.

But not all 3 billion nucleotides are translated into proteins. In fact, only a small percentage (about 1.5%) of these nucleotides is actually translated into proteins. These are the “EXpressed regiONS”, or exons.

This has led to the rise of Whole Exome Sequencing, or WES. While the cost of sequencing the entire genome is still out of reach, the cost of sequencing just the exons (the functional part of the genome) is low enough that it has been used to find the genetic abnormalities that lead to rare diseases. The resulting data is also much easier to sift through.

The complementary “INTragenic regiONS”, or introns, are the parts of genes that are not translated into proteins.

Whole Exome Sequencing

  • Sequencing only the “coding DNA”, or 30 million letters
  • Less intensive analytically and has lower storage requirements
  • Less expensive ($1000 commercially)
  • High sequencing depth of protein coding regions
  • Ability to detect certain types of alterations may be limited
  • Includes newly characterized and novel genes

Whole Genome Sequencing

  • Sequencing “all” DNA, introns and exons: 3 billion letters
  • More variants to analyze and more storage requirements
  • Much more expensive ($20,000 commercially)
  • Extensive and uniform coverage of the genome at a lower sequencing depth, in both protein coding and non-coding regions
  • Can detect more types of alterations than exome sequencing
  • Includes newly characterized and novel genes
  • Can detect up to 10-15% more diagnoses than WES

The National Organization for Rare Disorders (NORD) at NCSU hosted Dr. Vandana Shashi, a pediatric genetics specialist at Duke University, on April 22. Shashi, who served as the co-chair of the NIH’s Rare and Undiagnosed Disease Network, shared her perspectives on exome and genome sequencing.

Given the nature of the method, some alterations are not reliably detected by WES, including defects in deep intronic non-coding regions, pseudogenes and repeat regions. As WGS becomes cheaper and more accessible, Dr. Shashi sees this method eclipsing WES.

“I do see WGS becoming cheaper and more accessible in the future, this will eventually eclipse WES,” Dr. Shashi said. “WGS is a lot better at capturing copy number variants, i.e., deletions and duplications that are larger than 50 base pairs.”

In Whole Exome Sequencing, there are three steps: DNA is prepared, sequenced and analyzed.

Step 1- DNA Library Preparation 

  • Shear DNA: First, genomic DNA is sheared into random short fragments of about 300 base pairs.
  • Blunting: When an enzyme is used to chop the DNA into small parts, it leaves ends of uneven length on the double strand. Depending on which strand is longer, the overhanging bases are either removed or the missing bases are filled in by an enzyme, producing “blunt” ends of equal length.
  • A-tailing: The blunted ends are then modified by adding a single adenine (A) nucleotide that forms an overhanging “A-tail”.
  • Add adapters: Adapters are ligated to both ends of each fragment to allow sequencing.
  • Enrich library for exome capture: Sequences that correspond to exons are captured by hybridization to DNA or RNA baits and then pulled down by coated magnetic beads. The selected fragments form a library enriched for exons.
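The shearing and capture steps above can be pictured with a small in-silico sketch. This is only an illustration under simplifying assumptions: the genome length, the exon coordinates and the function names are hypothetical, fragments are cut at fixed positions instead of randomly, and "capture" simply keeps any fragment that overlaps an exon.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on a toy genome


def shear(genome_length: int, fragment_size: int) -> List[Interval]:
    """Cut the genome into consecutive fragments (real shearing is random)."""
    return [
        (start, min(start + fragment_size, genome_length))
        for start in range(0, genome_length, fragment_size)
    ]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def capture(fragments: List[Interval], exons: List[Interval]) -> List[Interval]:
    """Keep only fragments that a bait for some exon could hybridize to."""
    return [f for f in fragments if any(overlaps(f, e) for e in exons)]


# Hypothetical toy genome of 3,000 letters with two exons.
genome_length = 3000
exons = [(450, 520), (1800, 1950)]

fragments = shear(genome_length, fragment_size=300)
library = capture(fragments, exons)

print(f"{len(library)} of {len(fragments)} fragments kept: {library}")
```

Most fragments are discarded here, which mirrors why an exome library is so much smaller than a whole genome library.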

Step 2- DNA Sequencing 

Exome capture is followed by amplification of the sample and massively parallel sequencing. Massively parallel second-generation sequencing (also known as next generation sequencing) generates billions of base pairs of data. Barcodes that allow sample indexing can be introduced at this step.

“You attach DNA to the flow cells and amplify. Basically, you are doing a number of simultaneous PCR reactions (Multiplex PCR reactions). Then you read the sequence and you get a lot of fragments,” Dr. Shashi said. “These fragments come in short reads of 100-115 base pairs long.”
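To see what sample indexing with barcodes buys us, here is a minimal sketch of sorting pooled reads back into samples. It assumes, purely for illustration, that each read starts with a 4-letter barcode; the barcodes, sample names and reads are made up, and real sequencers often read the index separately from the fragment itself.

```python
from collections import defaultdict
from typing import Dict, List


def demultiplex(
    reads: List[str], barcode_to_sample: Dict[str, str], barcode_length: int
) -> Dict[str, List[str]]:
    """Group reads by the sample their leading barcode belongs to."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        # Reads with an unknown barcode are set aside instead of guessed at.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and pooled reads.
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
pooled_reads = ["ACGTGGATTC", "TGCACCTTAA", "ACGTTTTGGA", "GGGGAAAACC"]

samples = demultiplex(pooled_reads, barcodes, barcode_length=4)
print(samples)
```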

Step 3- DNA Analysis

The next step is DNA alignment and variant calling.

“These fragmented base pairs from the previous step are overlapped with one another and they are compared against a reference genome,” Dr. Shashi said. 
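The idea of lining reads up against a reference and looking for differences can be shown with a toy example. This is a sketch, not how production tools work: it places each read at the position with the fewest mismatches, ignores insertions and deletions, and uses a made-up 16-letter reference and three made-up reads. The function names and the `min_depth` parameter are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import List, Tuple


def best_position(read: str, reference: str) -> int:
    """Return the offset where the read has the fewest mismatches."""
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def call_variants(
    reads: List[str], reference: str, min_depth: int = 2
) -> List[Tuple[int, str, str]]:
    """Pile up aligned reads and report positions that disagree with the reference."""
    pileup = defaultdict(Counter)
    for read in reads:
        offset = best_position(read, reference)
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1

    variants = []
    for position in sorted(pileup):
        base, count = pileup[position].most_common(1)[0]
        # Require several reads to agree before trusting a difference.
        if base != reference[position] and count >= min_depth:
            variants.append((position, reference[position], base))
    return variants


reference = "ACGTTGCAATGCCGTA"
# Overlapping reads from a sample that carries a G instead of an A at position 8.
reads = ["GTTGCAGT", "GCAGTGCC", "AGTGCCGT"]

variants = call_variants(reads, reference)
print(variants)  # [(8, 'A', 'G')]
```

Requiring more than one read to support a change is the toy version of why high sequencing depth matters: a single odd read may be a sequencing error, while several overlapping reads agreeing is much more convincing.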

A reference genome here is a so-called ‘normal genome’: a representative example of the set of genes in one idealized individual of the human species.

Bioinformatics tools are then used for DNA analysis. They usually work with one of these three file formats:

  • FASTQ: Holds the full sequence data and a corresponding quality score. Each sequence is entered as a four-line record. The files are very large and require a lot of storage space.
  • BAM (Binary Alignment Map): Stores the alignment of FASTQ reads to a reference genome. The files are very large and require a lot of storage space.
  • VCF (Variant Call Format): A standardized text file for representing single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs) and other corresponding variations in the genome. It is the most commonly used format.
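To make the four-line FASTQ record concrete, here is a small parsing sketch. The read name and sequence are invented, and the quality line is assumed to use the common Phred+33 encoding, where each character's ASCII code minus 33 gives the quality score. In practice an established library would be used instead of a hand-written parser.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read name, sequence, quality scores) for each four-line record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ at line {i + 1}")
        # Phred+33 (assumed): quality score = ASCII code of the character - 33.
        scores = [ord(character) - 33 for character in quality]
        yield header[1:], sequence, scores


example = """@read_1
GATTACA
+
IIIIH#5
"""

records = list(parse_fastq(example))
for name, sequence, scores in records:
    print(name, sequence, scores)
```

A low score such as the 2 for the sixth letter flags a base call the sequencer was unsure about, which is why the quality line travels with every read.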


Dr. Shashi used this method to diagnose a 20-month-old with Brown–Vialetto–Van Laere Syndrome 2 (BVVLS2) (Shashi et al.). The team then used high-dose riboflavin therapy, that is, large doses of vitamin B2, to stabilize the child’s deteriorating neurological condition.

Whole exome sequencing is shaping up to be one of the most exciting advances in the world of genetics and could be a major stepping stone in the world of undiagnosed and rare disorders. Stay tuned as we keep up with this evolving biotechnology.

How understanding the complex MARCKS expression could help cure rare diseases

The National Organization for Rare Disorders at NC State just celebrated its first virtual Rare Disease Day on February 25, 2021, in collaboration with the University of North Carolina at Chapel Hill and Wake Forest Medical School. Our research mentor, Dr. Kenneth Adler, represented our chapter as our keynote speaker.

Dr. Adler is a professor of Cell Biology at NC State’s College of Veterinary Medicine, and he believes medical research has benefitted greatly from increased interest.

“Advocacy is playing an increasingly important role in accelerating progress in understanding and treating rare diseases,” Dr. Adler said. “Rare diseases of all kinds require dedicated individuals to pursue treatments and cures.”

Both at the University and at his start-up, Biomarck Pharmaceuticals, Dr. Adler’s work focuses on the so-called MARCKS expression and ways to counter it.

The Myristoylated Alanine Rich C-Kinase Substrate (MARCKS) is encoded by the MARCKS gene in vertebrates, a gene that plays an important role in many cellular processes, including cell motility, secretion, regulation of the cell cycle, and neural development.

Significantly heightened levels of MARCKS expression are found in many primary tumor samples, especially in lung and breast cancers. However, this heightened expression is not accompanied by a mutation, which has led to the understanding that the MARCKS pathway is a contributor to the cancer phenotype.

Step 1: Phosphorylation of MARCKS protein

It all starts with ATP, or adenosine triphosphate by its full name. ATP is a molecule that captures chemical energy from the breakdown of food molecules to fuel bodily processes. Some enzymes (called kinases) can take a phosphate group (PO4) from ATP and attach it to another molecule in the body (called a substrate). This process is called phosphorylation.

A first step towards understanding the MARCKS mechanism is breaking down the term protein-kinase phosphorylation, which is depicted in the image below:

A protein kinase catalyzes the transfer of phosphate from ATP to its protein substrates. Protein kinases are divided into many families, but the one relevant to understanding the MARCKS mechanism is the serine/threonine protein kinase family.

This particular family includes the cyclin-dependent kinases (CDKs), which are associated with cell division, as well as Protein Kinase D, which is increasingly recognized as a key regulatory signaling hub.

  • CDKs combine with a cyclin to regulate different phases of the cell cycle, from the G1 phase to the completion of the cell cycle. G1 is the growth phase of the cell cycle, before the DNA is synthesized, and it is when most proteins are made.
  • Protein Kinase D is activated by protein kinase C and is involved in the regulation of cell growth, proliferation, cell migration (metastasis), differentiation, and apoptosis (natural cell death). Protein Kinase C (PKC) is a family of protein kinase enzymes that are involved in controlling the function of other proteins.

MARCKS is a substrate for G1 kinases (CDK 4/6) as well as for Protein Kinase C. Signal transduction pathways are cellular responses to extracellular stimuli. For Protein Kinase C, the signal transduction cascades include cell proliferation and immunological responses. Thus, MARCKS seems to be the link between cell cycle signaling and Protein Kinase C signal transduction pathways. This further indicates an involvement of the MARCKS protein in the cancer phenotype.

In the above schematic of the MARCKS substrate, each region corresponds to a particular function:

  1. N-terminal: the start of a protein. In the above picture, myristic acid (MA) is linked to the N-terminal of the protein.
  2. MH2: the function of the MH2 domain is currently unknown (Sheats et al.).
  3. Phosphorylation site domain (PSD): the site of phosphorylation. In its dephosphorylated state, MARCKS binds to the cell membrane through ionic interactions between the PSD and membrane phospholipids.

Step 2: PIP2 Release from membrane sites

MARCKS is usually bound to a lipid messenger called PIP2. When MARCKS is phosphorylated, the PIP2-MARCKS binding is disrupted, and PIP2 is released from its sequestration sites in the membrane, allowing detachment. This activates other signaling molecules such as Focal Adhesion Kinase (FAK) and other cytoskeletal proteins. In cancer, this results in enhanced cell migration (metastasis) and proliferation.

Step 3: FAK Activation

FAK is a key regulator of the growth factor receptor. Increased FAK expression is observed in many cancer cells, and high expression is associated with poor chances of recovery. The first step in FAK activation is binding with PIP2. FAK promotes malignancy via highly coordinated signaling events that use a diverse range of cellular processes, especially cell migration (metastasis) and invasion.

BIO-11006 (anti-MARCKS peptide)

Biomarck Pharmaceuticals is currently developing its lead compound, BIO-11006, for non-small cell lung cancer and acute respiratory distress syndrome (ARDS). This is a 10-amino-acid sequence from the N-terminal of the MARCKS protein. It is an investigational peptide medication, delivered by a nebulizer, which has shown activity in the laboratory and in animal studies in certain types of cancer.

It has been granted FDA acceptance to begin studies in adult lung cancer. BIO-11006 functions by disrupting the very first step, the phosphorylation of MARCKS, and by selectively dephosphorylating MARCKS, which inhibits all the subsequent steps.

Current Use of BIO-11006 in Rare Pediatric Cancers

Compassionate use encompasses expanded access for a patient with an immediately life-threatening condition. The patient gains access to an investigational drug for treatment outside of clinical trials when no comparable or satisfactory alternative therapy is available.

In 2016, CBS North Carolina reported the first instance of compassionate use of BIO-11006. Philomena Stendardo, 8, was diagnosed with Diffuse Intrinsic Pontine Glioma (DIPG), a rare, aggressive brain tumor. The tumor, wrapped around her brain stem, was inoperable. Biomarck Pharmaceuticals partnered with the Live Like Bella foundation to implement compassionate use in this case. After a few weeks of treatment with BIO-11006, Stendardo’s condition improved: she was able to sit up on her own, walk with assistance, move her right side and think more clearly. Unfortunately, she passed away several months later, but her treatment could be a stepping stone to improving the quality of life for cancer patients.

“The initial results suggest that there is something in this treatment that could be beneficial to patients with rare pediatric cancers,” Dr. Adler said. “Currently we are conducting expanded access clinical trials in Osteosarcoma patients in collaboration with Nicklaus Children’s Hospital in Miami.”


  1. Rare Disease Day 2021: NORD at NC State University, UNC-CH, Wake Forest Med and Tompkins HS. YouTube.
  3. BMK101 MOA Final 180914e. YouTube.
  4. Kinase Classification. CUSABIO.
  5. MOA-Adler.pptx
  6. Child with rare cancer is first to receive NC State scientist’s experimental treatment
  7. Sheats, M. K., Yin, Q., Fang, S., Park, J., Crews, A. L., Parikh, I., Dickson, B., & Adler, K. B. (2019). MARCKS and Lung Disease. American journal of respiratory cell and molecular biology, 60(1), 16–27.
  8. Chen, C. H., Statt, S., Chiu, C. L., Thai, P., Arif, M., Adler, K. B., & Wu, R. (2014). Targeting myristoylated alanine-rich C kinase substrate phosphorylation site domain in lung cancer. Mechanisms and therapeutic implications. American journal of respiratory and critical care medicine, 190(10), 1127–1138.
  9. Yin Q, Fang S, Park J, Crews AL, Parikh I, Adler KB. An Inhaled Inhibitor of Myristoylated Alanine-Rich C Kinase Substrate Reverses LPS-Induced Acute Lung Injury in Mice. Am J Respir Cell Mol Biol. 2016 Nov;55(5):617-622. doi: 10.1165/rcmb.2016-0236RC. PMID: 27556883; PMCID: PMC5105187.
  10. Gambhir A, Hangyás-Mihályné G, Zaitseva I, Cafiso DS, Wang J, Murray D, Pentyala SN, Smith SO, McLaughlin S. Electrostatic sequestration of PIP2 on phospholipid membranes by basic/aromatic regions of proteins. Biophys J. 2004 Apr;86(4):2188-207. doi: 10.1016/S0006-3495(04)74278-2. PMID: 15041659; PMCID: PMC1304070.
  11. Chen, CH., Fong, L., Yu, E. et al. Upregulation of MARCKS in kidney cancer and its potential as a therapeutic target. Oncogene 36, 3588–3598 (2017).
  12. Chen CH, Thai P, Yoneda K, Adler KB, Yang PC, Wu R. A peptide that inhibits function of Myristoylated Alanine-Rich C Kinase Substrate (MARCKS) reduces lung cancer metastasis. Oncogene. 2014 Jul 10;33(28):3696-706. doi: 10.1038/onc.2013.336. Epub 2013 Aug 19. PMID: 23955080; PMCID: PMC4631387.
  13. Yin Q, Fang S, Park J, Crews A, Parikh I, Dickson B, et al. MARCKS-inhibitory peptides synergize with cisplatin to inhibit metastasis and primary tumor growth in mouse orthotopic lung cancer models. Am J Respir Crit Care Med. 2016;193:A3131.

How CRISPR could help us discover and treat rare cancers

Bacteria readily incorporate sequences of other species’ DNA into their own, in specific regions that we now call CRISPR. In the lab, the CRISPR system was adapted by linking two RNA sequences into a single guide that provides the target information, and it can be used to edit multiple genes simultaneously.

Cancer is a genetic disease: it is caused by changes to genes that control the way our cells function, especially how they grow and divide. Some rare cancers, sarcomas in particular, have been treated using CRISPR, which is why the gene-editing tool looks like a promising diagnostic and therapeutic tool for the future of cancer treatment.

Obtaining rare cancerous tumors for research is difficult, but luckily organizations such as and the Rare Cancer Research Foundation (RCRF) come into play. These sister groups perform a matching program that enables patients to directly donate their tumor tissue and medical data to research. All the data generated by the project is freely available to the research community and is dedicated to open science.

“Using, the Broad Institute of MIT and Harvard has created over 40 next-generation de-identified cancer models,” Ms. Barbara Van Hare, Director of Foundation partnerships at RCRF said, “These models and associated data will be shared within the worldwide research community.”

Once these rare disease samples have been procured, Dr. Jesse Boehm of the Eli and Edythe L. Broad Institute might have a way to decipher the genetic landscape of cancer cells and use it to our advantage. Dr. Boehm is the scientific director of the Broad Institute’s Cancer Dependency Map Initiative, where he works on the cancer cell line factory project and the cancer dependency map.

Cancer samples are broken apart into cell models and coaxed into growing in different conditions over a year-long period. The data from these new cell models are then shared broadly with the world. This pipeline is called the cell line factory. It is part of an international effort to create a large reference data set called the cancer dependency map.

The cancer dependency map takes a two-pronged approach: first testing cell lines against drugs, then running pooled CRISPR screens. In the first prong, all cell lines are systematically tested against all drugs developed for any disease. Some known drugs have been shown to be effective against certain cancers, and because these are existing therapies, clinical trials can move swiftly.

“There are 20,000 proteins in the human genome and only 6000 drug therapies. Only five percent of human genes can be targeted with drugs. The cancer dependency Map is completed with the help of CRISPR,” Dr Boehm said.

In pooled CRISPR screening, 100,000 CRISPRs are used to target every gene in the genome. The cells are challenged with all these CRISPRs and, at the end of the experiment, the abundance of each CRISPR is compared to its abundance at the beginning.

CRISPR is used to snip genes, and the cell’s DNA repair then leaves behind a broken gene. Cells in which a gene required for viability has been broken die and drop out of the population. The CRISPRs are barcoded, so if a CRISPR is absent by the end of the experiment, it targeted a gene that the cell needed to survive. The genes that drop out are good drug targets, and most of them make way for drug discovery projects right away.
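The dropout logic described above boils down to comparing barcode counts before and after the experiment. The sketch below is an illustration only: the guide names, gene names, counts and the cutoff of -2 are all hypothetical, and real screens use far more guides and more careful statistics. It normalizes the counts, computes a log2 fold change per CRISPR guide, and flags genes whose guides have largely disappeared.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, List


def log2_fold_changes(
    start: Dict[str, int], end: Dict[str, int], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Compare each guide's share of the pool at the end versus the start."""
    start_total = sum(start.values())
    end_total = sum(end.values())
    changes = {}
    for guide, start_count in start.items():
        # The pseudocount avoids taking the log of zero for vanished guides.
        start_share = (start_count + pseudocount) / start_total
        end_share = (end.get(guide, 0) + pseudocount) / end_total
        changes[guide] = math.log2(end_share / start_share)
    return changes


def dropout_genes(
    changes: Dict[str, float], guide_to_gene: Dict[str, str], threshold: float = -2.0
) -> List[str]:
    """Return genes whose guides were strongly depleted (median at or below threshold)."""
    per_gene = defaultdict(list)
    for guide, change in changes.items():
        per_gene[guide_to_gene[guide]].append(change)
    return sorted(gene for gene, values in per_gene.items() if median(values) <= threshold)


# Hypothetical barcode counts for four guides targeting two genes.
guide_to_gene = {"g1": "GENE_A", "g2": "GENE_A", "g3": "GENE_B", "g4": "GENE_B"}
counts_start = {"g1": 500, "g2": 500, "g3": 500, "g4": 500}
counts_end = {"g1": 10, "g2": 20, "g3": 950, "g4": 1020}

changes = log2_fold_changes(counts_start, counts_end)
candidates = dropout_genes(changes, guide_to_gene)
print(candidates)  # ['GENE_A']
```

Using several guides per gene and taking the median is one simple way to avoid calling a gene essential because a single guide happened to behave oddly.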

“CRISPR is such a sharp tool, it inspires a lot more confidence than its predecessors,” Dr. Boehm said. He uses the analogy of Google Maps for this project: “It needs to tell clinicians what to do and where to go, but for it to be relevant, the data needs to be dense enough in that area.”

An additional therapy for cancers involves making four genetic modifications to T cells (immune cells that can kill cancer). One modification adds a synthetic gene that gives the T cells a protein that helps them identify cancer cells. CRISPR is also used to mute three genes that limit the cells’ cancer-killing abilities (Stadtmauer et al. 2020). With these limiting genes removed, the T cells are less inhibited in fighting cancer.

These therapeutics and the cancer dependency map will take a few decades to develop, but once complete they should prove to be very sharp tools in our arsenal against rare cancers.


  • Boehm JS, Golub TR. An ecosystem of cancer cell line factories to support a cancer dependency map. Nat Rev Genet. 2015 Jul;16(7):373-4.
  • M. Jinek, K. Chylinski, I. Fonfara, M.Hauer, J.A. Doudna, E. Charpentier, A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 337 (2012), pp. 816–821.
  • Stadtmauer, E. A., Fraietta, J. A., Davis, M. M., Cohen, A. D., Weber, K. L., Lancaster, E., … & Tian, L. (2020). CRISPR-engineered T cells in patients with refractory cancer. Science, 367(6481).
  • Tsherniak, A., Vazquez, F., Montgomery, P. G., Weir, B. A., Kryukov, G., Cowley, G. S., … & Meyers, R. M. (2017). Defining a cancer dependency map. Cell, 170(3), 564-576.