[{"authors":["bo-li"],"categories":null,"content":"Dr. Bo Li is an Assistant Professor of Medicine at Harvard Medical School and the director of Bioinformatics and Computational Biology at the Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital. His research focuses on large-scale single-cell genomics data analysis. He received his Ph.D. in computer science from UW-Madison and completed postdoctoral training with Dr. Lior Pachter at UC Berkeley and Dr. Aviv Regev at the Broad Institute. He is best known for developing RSEM, a widely used RNA-seq transcript quantification tool. RSEM has been cited 8,080 times (Google Scholar) and has been adopted by several large consortia, such as TCGA, ENCODE, GTEx, and TOPMed.\n","date":1579132800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":1579132800,"objectID":"c0c89f04b86018bb44e11b90192190ab","permalink":"https://lilab.mgh.harvard.edu/author/bo-li/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/author/bo-li/","section":"authors","summary":"Dr.
Bo Li is an Assistant Professor of Medicine at Harvard Medical School and the director of Bioinformatics and Computational Biology at the Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital.","tags":null,"title":"Bo Li","type":"authors"},{"authors":["yiming-yang"],"categories":null,"content":"Yiming Yang is a computational scientist in Li Lab.\n","date":1579132800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":1579132800,"objectID":"3f00102d4ba2e81a19d474c8163efbb7","permalink":"https://lilab.mgh.harvard.edu/author/yiming-yang/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/author/yiming-yang/","section":"authors","summary":"Yiming Yang is a computational scientist in Li Lab.","tags":null,"title":"Yiming Yang","type":"authors"},{"authors":["admin"],"categories":null,"content":"The Li Lab aims to develop novel computational tools to empower single-cell and single-nucleus multi-omics data analysis. Currently, we have three important research directions:\n Cloud-based analysis of large-scale single-cell and single-nucleus RNA-seq datasets; Cloud-based large-scale single-cell ATAC-seq data analysis; Cloud-based spatial transcriptomics data analysis. In addition, in collaboration with biologists and clinicians, we work on biological questions related to:\n Human Immune Cell Atlas; Human Tumor Cell Atlas; COVID-19 research. ","date":-62135596800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":-62135596800,"objectID":"2525497d367e79493fd32b198b28f040","permalink":"https://lilab.mgh.harvard.edu/author/admin/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/author/admin/","section":"authors","summary":"The Li Lab aims to develop novel computational tools to empower single-cell and single-nucleus multi-omics data analysis.
Currently, we have three important research directions:\n Cloud-based analysis of large-scale single-cell and single-nucleus RNA-seq datasets; Cloud-based large-scale single-cell ATAC-seq data analysis; Cloud-based spatial transcriptomics data analysis.","tags":null,"title":"Bo Li","type":"authors"},{"authors":["dongyu-zhou"],"categories":null,"content":"Dongyu is a co-op student in Li Lab. She has developed an interest in computational biology and full-stack web development during her co-op.\n","date":-62135596800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":-62135596800,"objectID":"564e6fc5899a21e6eba3f53ea3593c85","permalink":"https://lilab.mgh.harvard.edu/author/dongyu-zhou/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/author/dongyu-zhou/","section":"authors","summary":"Dongyu is a co-op student in Li Lab. She has developed an interest in computational biology and full-stack web development during her co-op.","tags":null,"title":"Dongyu Zhou","type":"authors"},{"authors":["hui-ma"],"categories":null,"content":"Hui Ma is a co-op student in Li Lab. She has developed an interest in computational biology during her co-op.\n","date":-62135596800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":-62135596800,"objectID":"7fbfad07141cc116328c049172ab535d","permalink":"https://lilab.mgh.harvard.edu/author/hui-ma/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/author/hui-ma/","section":"authors","summary":"Hui Ma is a co-op student in Li Lab.
She has developed an interest in computational biology during her co-op.","tags":null,"title":"Hui Ma","type":"authors"},{"authors":["kamil-slowikowski"],"categories":null,"content":"Kamil Slowikowski is a postdoctoral fellow working jointly in Li Lab and Villani Lab.\n","date":-62135596800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":-62135596800,"objectID":"2d14be859276d9fcd88a85374bd1ade9","permalink":"https://lilab.mgh.harvard.edu/author/kamil-slowikowski/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/author/kamil-slowikowski/","section":"authors","summary":"Kamil Slowikowski is a postdoctoral fellow working jointly in Li Lab and Villani Lab.","tags":null,"title":"Kamil Slowikowski","type":"authors"},{"authors":["linhui-chen"],"categories":null,"content":"Linhui Chen is a co-op student in Li Lab.\n","date":-62135596800,"expirydate":-62135596800,"kind":"taxonomy","lang":"en","lastmod":-62135596800,"objectID":"48b7b20f937e9b891514061cec7dc69c","permalink":"https://lilab.mgh.harvard.edu/author/linhui-chen/","publishdate":"0001-01-01T00:00:00Z","relpermalink":"/author/linhui-chen/","section":"authors","summary":"Linhui Chen is a co-op student in Li Lab.","tags":null,"title":"Linhui Chen","type":"authors"},{"authors":null,"categories":[],"content":" Linhui Chen and Dongyu Zhou officially join Li Lab today. We hope they have an enjoyable and fruitful time here!\n","date":1609718400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1609718400,"objectID":"49695acc958afeed69df9c3d7731601a","permalink":"https://lilab.mgh.harvard.edu/post/welcome_linhui_and_dongyu/","publishdate":"2021-01-04T00:00:00Z","relpermalink":"/post/welcome_linhui_and_dongyu/","section":"post","summary":"Linhui Chen and Dongyu Zhou officially join Li Lab today. We hope they have an enjoyable and fruitful time here!","tags":[],"title":"Welcome co-op students Linhui Chen and Dongyu Zhou to our lab!","type":"post"},{"authors":["E. L. Bao","S. K. Nandakumar","X. Liao","A.
G. Bick","J. Karjalainen","M. Tabaka","O. I. Gan","A. S. Havulinna","T. T. J. Kiiskinen","C. A. Lareau","A. L. de Lapuente Portilla","**B. Li**","C. Emdin","V. Codd","C. P. Nelson","C. J. Walker","C. Churchhouse","A. de la Chapelle","D. E. Klein","B. Nilsson","P. W. F. Wilson","K. Cho","S. Pyarajan","J. M. Gaziano","N. J. Samani","FinnGen","23andMe Research Team","A. Regev","A. Palotie","B. M. Neale","J. E. Dick","P. Natarajan","C. J. O’Donnell","M. J. Daly","M. Milyavsky","S. Kathiresan","V. G. Sankaran\u0026dagger;"],"categories":null,"content":"","date":1602633600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1602633600,"objectID":"65922798bf14b7d476102f414ccc2e02","permalink":"https://lilab.mgh.harvard.edu/publication/bao-2020/","publishdate":"2020-10-14T00:00:00Z","relpermalink":"/publication/bao-2020/","section":"publication","summary":"Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers1. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P ","tags":null,"title":"Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells","type":"publication"},{"authors":null,"categories":[],"content":"The Cumulus featured workspace on the Terra platform has been updated.\nUsers interested in using cloud computing for single-cell data analysis can use it as an example for learning Cumulus.
Specifically, this featured workspace uses a real-world dataset to demonstrate both gene-count matrix generation and downstream analysis.\n","date":1597708800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1597708800,"objectID":"9896cbb64e0b892c83d7dc08f0e12f45","permalink":"https://lilab.mgh.harvard.edu/post/cumulus_featured_workspace/","publishdate":"2020-08-18T00:00:00Z","relpermalink":"/post/cumulus_featured_workspace/","section":"post","summary":"The Cumulus featured workspace on the Terra platform has been updated.\nUsers interested in using cloud computing for single-cell data analysis can use it as an example for learning Cumulus.","tags":[],"title":"The Cumulus featured workspace on Terra is updated","type":"post"},{"authors":null,"categories":[],"content":" Yiming Yang and Hui Ma from Li Lab led a tutorial workshop, coordinated by Verily Life Sciences, for Pfizer researchers working on single-cell analysis.\nIn this tutorial, the Cumulus workflows, the Pegasus analysis module in Python, and Cirrocumulus, a cloud-based visualizer for single-cell data, were introduced using a real-world dataset.\n","date":1595808000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1595808000,"objectID":"3a1f7efda243dd83856a67c0d851a2a6","permalink":"https://lilab.mgh.harvard.edu/post/pfizer_workshop/","publishdate":"2020-07-27T00:00:00Z","relpermalink":"/post/pfizer_workshop/","section":"post","summary":"Yiming Yang and Hui Ma from Li Lab led a tutorial workshop, coordinated by Verily Life Sciences, for Pfizer researchers working on single-cell analysis.","tags":[],"title":"Cumulus is introduced at a Pfizer workshop","type":"post"},{"authors":["**B. Li**\u0026dagger;","J. Gould","**Y. Yang**","S. Sarkizova","M. Tabaka","O. Ashenberg","Y. Rosen","M. Slyper","M. S. Kowalczyk","A.-C. Villani","T. L. Tickle","N. Hacohen","O. Rozenblatt-Rosen\u0026dagger;","A.
Regev\u0026dagger;"],"categories":null,"content":"","date":1595808000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1595808000,"objectID":"d25966619208c9633885f8986cffd891","permalink":"https://lilab.mgh.harvard.edu/publication/li-2020/","publishdate":"2020-07-27T00:00:00Z","relpermalink":"/publication/li-2020/","section":"publication","summary":"Massively parallel single-cell and single-nucleus RNA-seq (sc/snRNA-seq) have opened the way to systematic tissue atlases in health and disease, but as the scale of data generation is growing, so does the need for computational pipelines for scaled analysis. Here, we developed Cumulus, a cloud-based framework for analyzing large scale sc/snRNA-seq datasets. Cumulus combines the power of cloud computing with improvements in algorithm implementations to achieve high scalability, low cost, user-friendliness, and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.","tags":null,"title":"Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq","type":"publication"},{"authors":null,"categories":[],"content":"The Cumulus paper is published in Nature Methods. You can find the paper here.\n","date":1595808000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1595808000,"objectID":"d8aee01940a8d7973f8898b286094863","permalink":"https://lilab.mgh.harvard.edu/post/cumulus_paper/","publishdate":"2020-07-27T00:00:00Z","relpermalink":"/post/cumulus_paper/","section":"post","summary":"The Cumulus paper is published in Nature Methods. You can find the paper here.","tags":[],"title":"The Cumulus paper is published!","type":"post"},{"authors":["M. Slyper*","C. B. M. Porter*","O. Ashenberg*","J.
Waldman","E. Drokhlyansky","I. Wakiro","C. Smillie","G. Smith-Rosario","J. Wu","D. Dionne","S. Vigneau","J. Jané-Valbuena","T. L. Tickle","S. Napolitano","M. Su","A. G. Patel","A. Karlstrom","S. Gritsch","M. Nomura","A. Waghray","S. H. Gohil","A. M. Tsankov","L. Jerby-Arnon","O. Cohen","J. Klughammer","Y. Rosen","J. Gould","L. Nguyen","M. Hofree","P. J. Tramontozzi","**B. Li**","C. J. Wu","B. Izar","R. Haq","F. S. Hodi","C. H. Yoon","A. N. Hata","S. J. Baker","M. L. Suvà","R. Bueno","E. H. Stover","M. R. Clay","M. A. Dyer","N. B. Collins","U. A. Matulonis","N. Wagle","B. E. Johnson","A. Rotem","O. Rozenblatt-Rosen\u0026dagger;","A. Regev\u0026dagger;"],"categories":null,"content":"","date":1589155200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1589155200,"objectID":"9d06170e474c19c2db3b8ef56d4215dd","permalink":"https://lilab.mgh.harvard.edu/publication/slyper-2020/","publishdate":"2020-05-11T00:00:00Z","relpermalink":"/publication/slyper-2020/","section":"publication","summary":"Single-cell genomics is essential to chart tumor ecosystems. Although single-cell RNA-Seq (scRNA-Seq) profiles RNA from cells dissociated from fresh tumors, single-nucleus RNA-Seq (snRNA-Seq) is needed to profile frozen or hard-to-dissociate tumors. Each requires customization to different tissue and tumor types, posing a barrier to adoption. Here, we have developed a systematic toolbox for profiling fresh and frozen clinical tumor samples using scRNA-Seq and snRNA-Seq, respectively. We analyzed 216,490 cells and nuclei from 40 samples across 23 specimens spanning eight tumor types of varying tissue and sample characteristics. We evaluated protocols by cell and nucleus quality, recovery rate and cellular composition. scRNA-Seq and snRNA-Seq from matched samples recovered the same cell types, but at different proportions.
Our work provides guidance for studies in a broad range of tumors, including criteria for testing and selecting methods from the toolbox for other tumors, thus paving the way for charting tumor atlases.","tags":null,"title":"A Single-Cell and Single-Nucleus RNA-Seq Toolbox for Fresh and Frozen Human Tumors","type":"publication"},{"authors":["T. Ouspenskaia*","T. Law*","K. R. Clauser*","S. Klaeger*","S. Sarkizova","F. Aguet","**B. Li**","E. Christian","B. A. Knisbacher","P. M. Le","C. R. Hartigan","H. Keshishian","A. Apffel","G. Oliveira","W. Zhang","Y. T. Chow","Z. Ji","I. Jungreis","S. A. Shukla","P. Bachireddy","M. Kellis","G. Getz","N. Hacohen","D. B. Keskin#","S. A. Carr#","C. J. Wu#\u0026dagger;","A. Regev#\u0026dagger;"],"categories":null,"content":"","date":1589155200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1589155200,"objectID":"78c922660ef4f30f8d42c52914200a12","permalink":"https://lilab.mgh.harvard.edu/publication/ouspenskaia-2020/","publishdate":"2020-05-11T00:00:00Z","relpermalink":"/publication/ouspenskaia-2020/","section":"publication","summary":"Tumor epitopes – peptides that are presented on surface-bound MHC I proteins - provide targets for cancer immunotherapy and have been identified extensively in the annotated protein-coding regions of the genome. Motivated by the recent discovery of translated novel unannotated open reading frames (nuORFs) using ribosome profiling (Ribo-seq), we hypothesized that cancer-associated processes could generate nuORFs that can serve as a new source of tumor antigens that harbor somatic mutations or show tumor-specific expression. To identify cancer-specific nuORFs, we generated Ribo-seq profiles for 29 malignant and healthy samples, developed a sensitive analytic approach for hierarchical ORF prediction, and constructed a high-confidence database of translated nuORFs across tissues. 
Peptides from 3,555 unique translated nuORFs were presented on MHC I, based on analysis of an extensive dataset of MHC I-bound peptides detected by mass spectrometry, with 20-fold more nuORF peptides detected in the MHC I immunopeptidomes compared to whole proteomes. We further detected somatic mutations in nuORFs of cancer samples and identified nuORFs with tumor-specific translation in melanoma, chronic lymphocytic leukemia and glioblastoma. NuORFs thus expand the pool of MHC I-presented, tumor-specific peptides, targetable by immunotherapies.","tags":null,"title":"Thousands of Novel Unannotated Proteins Expand the MHC I Immunopeptidome in Cancer","type":"publication"},{"authors":null,"categories":[],"content":" Hui Ma officially joins Li Lab today. Although this is a tough time across the country, we hope she learns a lot and has fun in our lab!\n","date":1588896000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1588896000,"objectID":"1f851d7fb225ce85f94048c0d334f3d4","permalink":"https://lilab.mgh.harvard.edu/post/welcome_hui/","publishdate":"2020-05-08T00:00:00Z","relpermalink":"/post/welcome_hui/","section":"post","summary":"Hui Ma officially joins Li Lab today. Although this is a tough time across the country, we hope she learns a lot and has fun in our lab!","tags":[],"title":"Welcome co-op student Hui Ma to our lab!","type":"post"},{"authors":["P. Sen","A. R. Wilkie","F. Ji","**Y. Yang**","I. J. Taylor","M. Velazquez-Palafox","E. A. H. Vanni","J. M. Pesola","R. Fernandez","H. Chen","L. M. Morsett","E. R. Abels","M. Piper","R. J. Lane","S. E. Hickman","T. K. Means","E. S. Rosenberg","R. I. Sadreyev","**B. Li**","D. M. Coen","J. A. Fishman","J.
El Khoury"],"categories":null,"content":"","date":1587513600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1587513600,"objectID":"0db54b91c0e94ce5df3bb514af466467","permalink":"https://lilab.mgh.harvard.edu/publication/sen-2020/","publishdate":"2020-04-22T00:00:00Z","relpermalink":"/publication/sen-2020/","section":"publication","summary":"Cytomegalovirus (CMV) is an important cause of morbidity and mortality in the immunocompromised host. In transplant recipients, a variety of clinically important “indirect effects” are attributed to immune modulation by CMV, including increased mortality from fungal disease, allograft dysfunction and rejection in solid organ transplantation, and graft-versus-host-disease in stem cell transplantation. Monocytes, key cellular targets of CMV, are permissive to primary, latent and reactivated CMV infection. Here, pairing unbiased bulk and single cell transcriptomics with functional analyses we demonstrate that human monocytes infected with CMV do not effectively phagocytose fungal pathogens, a functional deficit which occurs with decreased expression of fungal recognition receptors. Simultaneously, CMV-infected monocytes upregulate antiviral, pro-inflammatory chemokine, and inflammasome responses associated with allograft rejection and graft-versus-host disease. Our study demonstrates that CMV modulates both immunosuppressive and immunostimulatory monocyte phenotypes, explaining in part, its paradoxical “indirect effects” in transplantation. These data could provide innate immune targets for the stratification and treatment of CMV disease.","tags":null,"title":"Linking Indirect Effects of Cytomegalovirus in Transplantation to Modulation of Monocyte Innate Immune Function","type":"publication"},{"authors":["M. Lavaert","K. L. Liang","N. Vandamme","J. E. Park","J. Roels","M. S. Kowalczyk","**B. Li**","O. Ashenberg","M. Tabaka","D. Dionne","T. L. Tickle","M. Slyper","O. Rozenblatt-Rosen","B. Vandekerckhove","G. 
Leclercq","A. Regev","P. Van Vlierberghe","M. Guilliams","S. A. Teichmann","Y. Saeys","T. Taghon"],"categories":null,"content":"","date":1587081600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1587081600,"objectID":"7fce51412f9b163d13565c0b95bc7ff3","permalink":"https://lilab.mgh.harvard.edu/publication/lavaert-2020/","publishdate":"2020-04-17T00:00:00Z","relpermalink":"/publication/lavaert-2020/","section":"publication","summary":"During postnatal life, thymopoiesis depends on the continuous colonization of the thymus by bone-marrow-derived hematopoietic progenitors that migrate through the bloodstream. The current understanding of the nature of thymic immigrants is largely based on data from pre-clinical models. Here, we employed single-cell RNA sequencing (scRNA-seq) to examine the immature postnatal thymocyte population in humans. Integration of bone marrow and peripheral blood precursor datasets identified two putative thymus seeding progenitors that varied in expression of CD7; CD10; and the homing receptors CCR7, CCR9, and ITGB7. Whereas both precursors supported T cell development, only one contributed to intrathymic dendritic cell (DC) differentiation, predominantly of plasmacytoid dendritic cells. Trajectory inference delineated the transcriptional dynamics underlying early human T lineage development, enabling prediction of transcription factor (TF) modules that drive stage-specific steps of human T cell development. 
This comprehensive dataset defines the expression signature of immature human thymocytes and provides a resource for the further study of human thymopoiesis.","tags":null,"title":"Integrated scRNA-Seq Identifies Human Postnatal Thymus Seeding Progenitors and Regulatory Dynamics of Differentiating Immature Thymocytes","type":"publication"},{"authors":["Yiming Yang","Bo Li"],"categories":null,"content":"Harmony-pytorch is an ultrafast PyTorch implementation of the Harmony batch correction method.\nHarmony-pytorch is released on PyPI as a Python package.\n","date":1579132800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1579132800,"objectID":"dcf7ff9e6821aa4ac0e26d23a68a38f6","permalink":"https://lilab.mgh.harvard.edu/software/harmony-pytorch/","publishdate":"2020-01-16T00:00:00Z","relpermalink":"/software/harmony-pytorch/","section":"software","summary":"PyTorch implementation of the Harmony batch correction method","tags":null,"title":"Harmony-Pytorch","type":"software"},{"authors":["Joshua Gould","Yiming Yang","Bo Li"],"categories":null,"content":"Cirrocumulus is an interactive web application for exploring single-cell datasets. It can be hosted as a Google App Engine application for collaborative use or run in standalone mode on a personal computer. Cirrocumulus consists of a client-side component implemented in JavaScript and a server component implemented in Python. The client uses React to manage state and WebGL to render variables efficiently on 2D or 3D embeddings. The server component provides functions to manage datasets, to slice variables from a dataset stored as a folder of JSON and Parquet files, and optionally to generate statistical summaries on an n-dimensional grid, which enables plotting of millions of cells. Cumulus provides options to generate the Cirrocumulus input folder automatically.
In standalone mode, Cirrocumulus also supports datasets in AnnData or Zarr format.\nClick here to read the Cirrocumulus documentation\nCirrocumulus is released on PyPI as a Python package.\nPublications:\n Cumulus preprint ","date":1572393600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1572393600,"objectID":"9d7b77067a0d7f2aebcaddeeee0b4582","permalink":"https://lilab.mgh.harvard.edu/software/cirrocumulus/","publishdate":"2019-10-30T00:00:00Z","relpermalink":"/software/cirrocumulus/","section":"software","summary":"Cloud-based serverless scRNA-seq data visualizer","tags":null,"title":"Cirrocumulus","type":"software"},{"authors":["Bo Li","Joshua Gould","Yiming Yang","et al."],"categories":null,"content":"Cumulus is a cloud-based single-cell and single-nucleus RNA-Seq data analysis framework that is scalable, cost-effective, able to process a variety of data types, and easily accessible to biologists. It is built on Google Cloud Platform and the Broad Institute’s FireCloud and Terra services, and is publicly accessible.\nClick here to read the Cumulus documentation\nPublications:\n Cumulus preprint ","date":1572393600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1572393600,"objectID":"4fb9c290f440320a3358d539e8586231","permalink":"https://lilab.mgh.harvard.edu/software/cumulus/","publishdate":"2019-10-30T00:00:00Z","relpermalink":"/software/cumulus/","section":"software","summary":"Cloud-based single-cell and single-nucleus RNA-seq data analysis framework","tags":null,"title":"Cumulus","type":"software"},{"authors":["Bo Li","Joshua Gould","Yiming Yang","et al."],"categories":null,"content":"Pegasus is a Python package that takes gene-count matrices as input and performs a wide range of analyses, such as \u0026ndash;\n Quality control; Highly variable gene selection; Batch correction; Principal component analysis (PCA); k-nearest neighbor (kNN) graph construction; Diffusion map; Community-graph based clustering;
Visualization on t-SNE, UMAP, and force-directed layout (FLE) embeddings; Differential expression and cluster-specific marker detection; Marker-based cell type annotation. Click here to read the Pegasus documentation.\nPegasus is released on PyPI as a Python package.\nPublications:\n Cumulus preprint ","date":1572393600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1572393600,"objectID":"7019f4d330462404b62a5f4071c55818","permalink":"https://lilab.mgh.harvard.edu/software/pegasus/","publishdate":"2019-10-30T00:00:00Z","relpermalink":"/software/pegasus/","section":"software","summary":"Python analysis module of Cumulus, functionally comparable to Seurat and SCANPY","tags":null,"title":"Pegasus","type":"software"},{"authors":null,"categories":[],"content":"The Cumulus manuscript is posted on bioRxiv. You can find the manuscript here.\n","date":1572393600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1572393600,"objectID":"81b5178a6cf4981f8e11faa011532bb6","permalink":"https://lilab.mgh.harvard.edu/post/cumulus_biorxiv/","publishdate":"2019-10-30T00:00:00Z","relpermalink":"/post/cumulus_biorxiv/","section":"post","summary":"The Cumulus manuscript is posted on bioRxiv. You can find the manuscript here.","tags":[],"title":"The Cumulus manuscript is posted on bioRxiv!","type":"post"},{"authors":["B. J. Haas\u0026dagger;","A. Dobin","**B. Li**","N. Stransky","N. Pochet","A. Regev"],"categories":null,"content":"","date":1571616000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1571616000,"objectID":"fa6fa8275f80d4381c61cfe411b741ef","permalink":"https://lilab.mgh.harvard.edu/publication/haas-2019/","publishdate":"2019-10-21T00:00:00Z","relpermalink":"/publication/haas-2019/","section":"publication","summary":"**Background** Accurate fusion transcript detection is essential for comprehensive characterization of cancer transcriptomes.
Over the last decade, multiple bioinformatic tools have been developed to predict fusions from RNA-seq, based on either read mapping or de novo fusion transcript assembly. **Results** We benchmark 23 different methods including applications we develop, STAR-Fusion and TrinityFusion, leveraging both simulated and real RNA-seq. Overall, STAR-Fusion, Arriba, and STAR-SEQR are the most accurate and fastest for fusion detection on cancer transcriptomes. **Conclusion** The lower accuracy of de novo assembly-based methods notwithstanding, they are useful for reconstructing fusion isoforms and tumor viruses, both of which are important in cancer research.","tags":null,"title":"Accuracy Assessment of Fusion Transcript Detection via Read-Mapping and De Novo Fusion Transcript Assembly-based Methods","type":"publication"},{"authors":["D. M. Popescu*","R. A. Botting*","E. Stephenson*","K. Green","S. Webb","L. Jardine","E. F. Calderbank","K. Polanski","I. Goh","M. Efremova","M. Acres","D. Maunder","P. Vegh","Y. Gitton","J. E. Park","R. Vento-Tormo","Z. Miao","D. Dixon","R. Rowell","D. McDonald","J. Fletcher","E. Poyner","G. Reynolds","M. Mather","C. Moldovan","L. Mamanova","F. Greig","M. D. Young","K. B. Meyer","S. Lisgo","J. Bacardit","A. Fuller","B. Millar","B. Innes","S. Lindsay","M. J. T. Stubbington","M. S. Kowalczyk","**B. Li**","O. Ashenberg","M. Tabaka","D. Dionne","T. L. Tickle","M. Slyper","O. Rozenblatt-Rosen","A. Filby","P. Carey","A. C. Villani","A. Roy","A. Regev","A. Chédotal","I. Roberts","B. Göttgens","S. Behjati","E. Laurenti\u0026dagger;","S. A. Teichmann\u0026dagger;","M.
Haniffa\u0026dagger;"],"categories":null,"content":"","date":1570579200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1570579200,"objectID":"606256f5c31b664f8ae9cf0f61b3d341","permalink":"https://lilab.mgh.harvard.edu/publication/popescu-2019/","publishdate":"2019-10-09T00:00:00Z","relpermalink":"/publication/popescu-2019/","section":"publication","summary":"Definitive haematopoiesis in the fetal liver supports self-renewal and differentiation of haematopoietic stem cells and multipotent progenitors (HSC/MPPs) but remains poorly defined in humans. Here, using single-cell transcriptome profiling of approximately 140,000 liver and 74,000 skin, kidney and yolk sac cells, we identify the repertoire of human blood and immune cells during development. We infer differentiation trajectories from HSC/MPPs and evaluate the influence of the tissue microenvironment on blood and immune cell development. We reveal physiological erythropoiesis in fetal skin and the presence of mast cells, natural killer and innate lymphoid cell precursors in the yolk sac. We demonstrate a shift in the haemopoietic composition of fetal liver during gestation away from being predominantly erythroid, accompanied by a parallel change in differentiation potential of HSC/MPPs, which we functionally validate. Our integrated map of fetal liver haematopoiesis provides a blueprint for the study of paediatric blood and immune disorders, and a reference for harnessing the therapeutic potential of HSC/MPPs.","tags":null,"title":"Decoding Human Fetal Liver Haematopoiesis","type":"publication"},{"authors":["J. T. Gaublomme*\u0026dagger;","**B. Li***","C. McCabe","A. Knecht","**Y. Yang**","E. Drokhlyansky","N. Van Wittenberghe","J. Waldman","D. Dionne","L. Nguyen","P. De Jager","B. Yeung","X. Zhao","N. Habib","O. Rozenblatt-Rosen\u0026dagger;","A. 
Regev\u0026dagger;"],"categories":null,"content":"","date":1562025600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1562025600,"objectID":"1375ddeb35c20c231690f99239cb01cf","permalink":"https://lilab.mgh.harvard.edu/publication/gaublomme-2019/","publishdate":"2019-07-02T00:00:00Z","relpermalink":"/publication/gaublomme-2019/","section":"publication","summary":"Single-nucleus RNA-seq (snRNA-seq) enables the interrogation of cellular states in complex tissues that are challenging to dissociate or are frozen, and opens the way to human genetics studies, clinical trials, and precise cell atlases of large organs. However, such applications are currently limited by batch effects, processing, and costs. Here, we present an approach for multiplexing snRNA-seq, using sample-barcoded antibodies to uniquely label nuclei from distinct samples. Comparing human brain cortex samples profiled with or without hashing antibodies, we demonstrate that nucleus hashing does not significantly alter recovered profiles. We develop DemuxEM, a computational tool that detects inter-sample multiplets and assigns singlets to their sample of origin, and validate its accuracy using sex-specific gene expression, species-mixing and natural genetic variation. 
Our approach will facilitate tissue atlases of isogenic model organisms or from multiple biopsies or longitudinal samples of one donor, and large-scale perturbation screens.","tags":null,"title":"Nuclei Multiplexing with Barcoded Antibodies for Single-Nucleus Genomics","type":"publication"},{"authors":null,"categories":[],"content":"","date":1562025600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1562025600,"objectID":"1784a5eda538eb794a055ba0fdd062a7","permalink":"https://lilab.mgh.harvard.edu/post/paper_published_in_nature_communications/","publishdate":"2019-07-02T00:00:00Z","relpermalink":"/post/paper_published_in_nature_communications/","section":"post","summary":"","tags":[],"title":"Our nucleus hashing and DemuxEM paper is published in Nature Communications today!","type":"post"},{"authors":null,"categories":[],"content":"","date":1556668800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1556668800,"objectID":"699e032f47e0ff848081f2226887634e","permalink":"https://lilab.mgh.harvard.edu/post/li_lab_website_created/","publishdate":"2019-05-01T00:00:00Z","relpermalink":"/post/li_lab_website_created/","section":"post","summary":"","tags":[],"title":"The Li Lab website is created!","type":"post"},{"authors":["Bo Li"],"categories":null,"content":"PROBer is the first unified probabilistic framework for the analysis of a diverse set of sequencing-based ‘toeprinting’ assays. 
These assays are used to probe RNA secondary structure (DMS/SHAPE-Seq), detect epitranscriptomic marks (Pseudo-Seq), or identify RNA-protein interactions (iCLIP/eCLIP), all of which are important for understanding post-transcriptional gene regulation.\nClick here to view PROBer website\nPublications:\n PROBer paper ","date":1495584000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1495584000,"objectID":"d912d6007ceb270d3e462c43060b944b","permalink":"https://lilab.mgh.harvard.edu/software/prober/","publishdate":"2017-05-24T00:00:00Z","relpermalink":"/software/prober/","section":"software","summary":"A principled and unified probabilistic framework for analyzing sequencing-based 'toeprinting' assays","tags":null,"title":"PROBer","type":"software"},{"authors":["**B. Li**","A. Tambe","S. Aviran","L. Pachter"],"categories":null,"content":"","date":1495584000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1495584000,"objectID":"a730c9af85170b5c6891e74ad2f08702","permalink":"https://lilab.mgh.harvard.edu/publication/li-2017/","publishdate":"2017-05-24T00:00:00Z","relpermalink":"/publication/li-2017/","section":"publication","summary":"A number of sequencing-based transcriptase drop-off assays have recently been developed to probe post-transcriptional dynamics of RNA-protein interaction, RNA structure, and RNA modification. Although these assays survey a diverse set of epitranscriptomic marks, we use the term toeprinting assays since they share methodological similarities. Their interpretation is predicated on addressing a similar computational challenge: how to learn isoform-specific chemical modification profiles in the face of complex read multi-mapping. We introduce PROBer, a statistical model and associated software, that addresses this challenge for the analysis of toeprinting assays. PROBer takes sequencing data as input and outputs estimated transcript abundances and isoform-specific modification profiles.
Results on both simulated and biological data demonstrate that PROBer significantly outperforms individual methods tailored for specific toeprinting assays. Since the space of toeprinting assays is ever-expanding and these assays are likely to be performed and analyzed together, we believe PROBer's unified data analysis solution will be valuable to the RNA community.","tags":null,"title":"PROBer Provides a General Toolkit for Analyzing Sequencing-Based Toeprinting Assays","type":"publication"},{"authors":["B. Haas","A. Dobin","N. Stransky","**B. Li**","X. Yang","T. Tickle","A. Bankapur","C. Ganote","T. Doak","N. Pochet","J. Sun","C. Wu","T. Gingeras","A. Regev"],"categories":null,"content":"","date":1490313600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1490313600,"objectID":"8aa2a619151dbb6cf9143028e19038b7","permalink":"https://lilab.mgh.harvard.edu/publication/haas-2017/","publishdate":"2017-03-24T00:00:00Z","relpermalink":"/publication/haas-2017/","section":"publication","summary":"**Motivation:** Fusion genes created by genomic rearrangements can be potent drivers of tumorigenesis. However, accurate identification of functional fusion genes from genomic sequencing requires whole genome sequencing, since exonic sequencing alone is often insufficient. Transcriptome sequencing provides a direct, highly effective alternative for capturing molecular evidence of expressed fusions in the precision medicine pipeline, but current methods tend to be inefficient or insufficiently accurate, lacking in sensitivity or predicting large numbers of false positives.
Here, we describe STAR-Fusion, a method that is both fast and accurate in identifying fusion transcripts from RNA-Seq data. **Results:** We benchmarked STAR-Fusion's fusion detection accuracy using both simulated and genuine Illumina paired-end RNA-Seq data, and show that it has superior performance compared to popular alternative fusion detection methods. **Availability and implementation:** STAR-Fusion is implemented in Perl, freely available as open source software at http://star-fusion.github.io, and supported on Linux.","tags":null,"title":"STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq","type":"publication"},{"authors":["K. Choudhary","N. P. Shih","F. Deng","M. Ledda","**B. Li**","S. Aviran"],"categories":null,"content":"","date":1480550400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1480550400,"objectID":"7712212b49e4f240077d80aec3f542af","permalink":"https://lilab.mgh.harvard.edu/publication/choudhary-2016/","publishdate":"2016-12-01T00:00:00Z","relpermalink":"/publication/choudhary-2016/","section":"publication","summary":"**Motivation:** The diverse functionalities of RNA can be attributed to its capacity to form complex and varied structures. The recent proliferation of new structure probing techniques coupled with high-throughput sequencing has helped RNA studies expand in both scope and depth. Despite differences in techniques, most experiments face similar challenges in reproducibility due to the stochastic nature of chemical probing and sequencing. As these protocols expand to transcriptome-wide studies, quality control becomes a more daunting task.
General and efficient methodologies are needed to quantify variability and quality in the wide range of current and emerging structure probing experiments. **Results:** We develop metrics to rapidly and quantitatively evaluate data quality from structure probing experiments, demonstrating their efficacy on both small synthetic libraries and transcriptome-wide datasets. We use a signal-to-noise ratio concept to evaluate replicate agreement, which has the capacity to identify high-quality data. We also consider and compare two methods to assess variability inherent in probing experiments, which we then utilize to evaluate the coverage adjustments needed to meet desired quality. The developed metrics and tools will be useful in summarizing large-scale datasets and will help standardize quality control in the field. **Availability and Implementation:** The data and methods used in this article are freely available at http://bme.ucdavis.edu/aviranlab/SPEQC_software.","tags":null,"title":"Metrics for Rapid Quality Control in RNA Structure Probing Experiments","type":"publication"},{"authors":["X. Zeng","**B. Li**","R. Welch","C. Rojo","Y. Zheng","C. N. Dewey","S. Keleş"],"categories":null,"content":"","date":1445299200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1445299200,"objectID":"4a1ccc588a30b8d8d9904a8c1f7ef6a2","permalink":"https://lilab.mgh.harvard.edu/publication/zeng-2015/","publishdate":"2015-10-20T00:00:00Z","relpermalink":"/publication/zeng-2015/","section":"publication","summary":"Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells’ regulatory programs. Advancements in next-generation sequencing have enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq).
However, interactions in highly repetitive regions of genomes have proven difficult to map since short reads of 50–100 base pairs (bp) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few methods that can accommodate them are prone to high false positive and false negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments, that can allocate multi-mapping reads in highly repetitive regions of the genome with high accuracy. We comprehensively evaluated Perm-seq, and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach facilitates supervision of multi-read allocation with a variety of data sources including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although the protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of the protein-DNA interactions in non-repetitive regions.","tags":null,"title":"Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping","type":"publication"},{"authors":["Bo Li","Nathanael Fillmore"],"categories":null,"content":"A de novo transcriptome assembly evaluation package, which contains two components, RSEM-EVAL and REF-EVAL.
Bo developed the RSEM-EVAL component.\nPublications:\n DETONATE paper ","date":1419120000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1419120000,"objectID":"8edf8ad323ecac14c5da907719573616","permalink":"https://lilab.mgh.harvard.edu/software/detonate/","publishdate":"2014-12-21T00:00:00Z","relpermalink":"/software/detonate/","section":"software","summary":"A de novo transcriptome assembly evaluation package, which contains two components, RSEM-EVAL and REF-EVAL. Bo developed the RSEM-EVAL component.","tags":null,"title":"DETONATE (DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation)","type":"software"},{"authors":["**B. Li***","N. Fillmore*","Y. Bai","M. Collins","J. A. Thomson","R. Stewart","C. N. Dewey"],"categories":null,"content":"","date":1419120000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1419120000,"objectID":"39d8acae0afff78aa0c8f838f93fc37c","permalink":"https://lilab.mgh.harvard.edu/publication/li-2014/","publishdate":"2014-12-21T00:00:00Z","relpermalink":"/publication/li-2014/","section":"publication","summary":"De novo RNA-Seq assembly facilitates the study of transcriptomes for species without sequenced genomes, but it is challenging to select the most accurate assembly in this context. To address this challenge, we developed a model-based score, RSEM-EVAL, for evaluating assemblies when the ground truth is unknown. We show that RSEM-EVAL correctly reflects assembly accuracy, as measured by REF-EVAL, a refined set of ground-truth-based scores that we also developed. Guided by RSEM-EVAL, we assembled the transcriptome of the regenerating axolotl limb; this assembly compares favorably to a previous assembly. A software package implementing our methods, DETONATE, is freely available at http://deweylab.biostat.wisc.edu/detonate .","tags":null,"title":"Evaluation of de novo Transcriptome Assemblies from RNA-Seq Data","type":"publication"},{"authors":["B. Haas","A. Papanicolaou","M. Yassour","M. 
Grabherr","P. D. Blood","J. Bowden","M. B. Couger","D. Eccles","**B. Li**","M. Lieber","M. D. MacManes","M. Ott","J. Orvis","N. Pochet","F. Strozzi","N. Weeks","R. Westerman","T. William","C. N. Dewey","R. Henschel","R. D. LeDuc","N. Friedman","A. Regev"],"categories":null,"content":"","date":1373500800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1373500800,"objectID":"75ff34c3acc4fba7b396430836b661e9","permalink":"https://lilab.mgh.harvard.edu/publication/haas-2013/","publishdate":"2013-07-11T00:00:00Z","relpermalink":"/publication/haas-2013/","section":"publication","summary":"De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.","tags":null,"title":"De novo Transcript Sequence Reconstruction from RNA-seq using the Trinity Platform for Reference Generation and Analysis","type":"publication"},{"authors":["**B. Li**","C. N. 
Dewey"],"categories":null,"content":"","date":1312416000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1312416000,"objectID":"cce8e41d08969b3335724c648cb9ba5a","permalink":"https://lilab.mgh.harvard.edu/publication/li-2011/","publishdate":"2011-08-04T00:00:00Z","relpermalink":"/publication/li-2011/","section":"publication","summary":"**Background** RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. **Results** We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads.
On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. **Conclusions** RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has provided valuable guidance on the cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.","tags":null,"title":"RSEM: Accurate Transcript Quantification from RNA-Seq Data with or without a Reference Genome","type":"publication"},{"authors":["Bo Li"],"categories":null,"content":"As one of the first ChIP-Seq multi-mapping read allocators, CSEM allows multi-reads to be utilized by peak callers.\nPublications:\n CSEM paper ","date":1310601600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1310601600,"objectID":"d95408ac37bd0cfd3dfc86daaaaed08f","permalink":"https://lilab.mgh.harvard.edu/software/csem/","publishdate":"2011-07-14T00:00:00Z","relpermalink":"/software/csem/","section":"software","summary":"As one of the first ChIP-Seq multi-mapping read allocators, CSEM allows multi-reads to be utilized by peak callers.","tags":null,"title":"CSEM (ChIP-Seq multi-read allocation using Expectation-Maximization)","type":"software"},{"authors":["D. Chung","P. F. Kuan","**B. Li**","R. Sanalkumar","K. Liang","E. H. Bresnick","C. Dewey","S.
Keleş"],"categories":null,"content":"","date":1310601600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1310601600,"objectID":"7d35f4bee8b9b286012b7c3ebda1ff98","permalink":"https://lilab.mgh.harvard.edu/publication/chung-2011/","publishdate":"2011-07-14T00:00:00Z","relpermalink":"/publication/chung-2011/","section":"publication","summary":"Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. 
These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.","tags":null,"title":"Discovering Transcription Factor Binding Sites in Highly Repetitive Regions of Genomes with Multi-Read Analysis of ChIP-Seq Data","type":"publication"},{"authors":["**B. Li**","V. Ruotti","R. M. Stewart","J. A. Thomson","C. N. Dewey"],"categories":null,"content":"","date":1266192000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1266192000,"objectID":"242ee58450c0508f1cd50a2f6452fc44","permalink":"https://lilab.mgh.harvard.edu/publication/li-2010/","publishdate":"2010-02-15T00:00:00Z","relpermalink":"/publication/li-2010/","section":"publication","summary":"**Motivation:** RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than the transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically. **Results:** We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions.
Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed. **Availability:** An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem .","tags":null,"title":"RNA-Seq Gene Expression Estimation with Read Mapping Uncertainty","type":"publication"},{"authors":["Bo Li"],"categories":null,"content":"RSEM is a widely used RNA-Seq transcript quantification tool. RSEM papers have been cited over 5,300 times. It has been used in nationwide consortium projects such as ENCODE (The Encyclopedia of DNA Elements) and TCGA (The Cancer Genome Atlas). It is also recommended by HCA (Human Cell Atlas) for analyzing plate-based SMART-Seq2 single-cell RNA-Seq data.\nPublications:\n RSEM algorithm RSEM software ","date":1266192000,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1266192000,"objectID":"f25d7259776fc2c1a9c14848311f7b75","permalink":"https://lilab.mgh.harvard.edu/software/rsem/","publishdate":"2010-02-15T00:00:00Z","relpermalink":"/software/rsem/","section":"software","summary":"RSEM is a widely used RNA-Seq transcript quantification tool. RSEM papers have been cited over 5,300 times.","tags":null,"title":"RSEM (RNA-Seq by Expectation-Maximization)","type":"software"}]