rstats

Using tidymodels to Predict Health Insurance Cost

By Arta Seyedian. Medical Cost Personal Datasets: Insurance Forecast by using Linear Regression. Link to Kaggle Page. Link to GitHub Source. Around the end of October 2020, I attended the Open Data Science Conference primarily for the workshops and training sessions that were offered. The first workshop I attended was a demonstration by Jared Lander on how to implement machine learning methods in R using a new package named tidymodels.

Using VisiumExperiment at spatialLIBD package

By Brenda Pardo. A month ago, I started an enriching adventure by joining Leonardo Collado-Torres’ team at the Lieber Institute for Brain Development. Since then, I have been working on modifying spatialLIBD, a package to interactively visualize the LIBD human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics data (Maynard, Collado-Torres, Weber, Uytingco, et al., 2020). These modifications allow spatialLIBD to use objects of the VisiumExperiment class, which is designed specifically to store spatial transcriptomics data (Righelli and Risso, 2020).

Using Space Ranger at JHPCE

By Nick Eagles. As part of recent LIBD work with spatial gene expression, I was recommended Space Ranger, a tool that provides software pipelines for walking Visium spatial RNA-seq samples through the steps we ultimately need to explore gene expression coupled with spatial information. In this blog post, I’ll explain how to start using Space Ranger at JHPCE, focusing heavily on the set-up details relevant to this cluster in particular.

R 101

HAPPY HOLIDAYS!!!🎉⛄🎆🍾❄ In the spirit of the coming new year and new beginnings, we created a tutorial for getting started or restarted with R. If you are new to R or have dabbled in R but haven’t used it much recently, then this post is for you. We will focus on data classes and types, as well as data wrangling, and we will provide basic statistics and basic plotting examples using real data.

Quality Surrogate Variable Analysis

By Amy Peterson. Studying genetic differential expression using postmortem human brain tissue requires an understanding of the effect brain tissue degradation has on genetic expression, particularly when brain tissue degradation confounds the differences in gene expression levels between subject groups. This problem of confounding necessitates measures from a control dataset of postmortem tissue from individuals who do not have the outcome of interest. Doing so provides a comparative measure of the impact of tissue degradation on expression that can then be used in a case-control study to examine the impact of the outcome of interest on genetic expression.

Quick overview on the new Bioconductor 3.8 release

Every six months the Bioconductor project releases its new version of packages. This allows developers a time window to try out new methods and test them rigorously before releasing them to the community at large. It also means that this is an exciting time 🎉. With every release there are dozens of new software packages. Bioconductor version 3.8 was just released on Halloween: October 31st, 2018. Thus, this is the perfect time to browse through their descriptions and find out what’s new that can be of use to your research.

“Demystifying Data Science” remote notes

To carry on our momentum from our useR!2018 remote notes blog post a few weeks ago, this time we will be summarizing the Demystifying Data Science 2018 conference, for which you can register for free. We are just following David Robinson’s advice to blog all the time! Conference overview. We got interested in this conference thanks to tweets like these ones that highlight that: data scientists are young! specialists are more in demand!

Hacking our way through UpSetR

“For our club meeting today we were going to summarize the Demystifying Data Science conference but we forgot that the videos are not released yet. Oops, we'll have to postpone our blog post. We didn't read the fine print that talk recordings will be available sometime next week. Sorry about that!” (LIBD rstats club, @LIBDrstats, July 27, 2018) So we adjusted plans and decided to continue our work on the UpSetR (Gehlenborg, 2016) package by Nils Gehlenborg.

LIBD rstats club remote useR!2018 notes

For our July 13th 2018 LIBD rstats club meeting we decided to check out as much as we could of the useR!2018 conference. Here’s what we were able to figure out about it in about an hour. Hopefully our quick notes will help other rstats enthusiasts, users and developers get a glimpse of the conference, although there’s bound to be more videos and material about the conference coming out in the following days.

Introduction to Scraping and Wrangling Tables from Research Articles

By Steve Semick. What do you do when you want to use results from the literature to anchor your own analysis? When these results are in the form of an easily accessible table, such as a .csv file or .xlsx file, then it is simple enough to download them and incorporate them into your research. Oftentimes, however, published findings are not so easy to handle. Today, we’ll go through a practical scenario on scraping an HTML table from a Nature Genetics article into R and wrangling the data into a useful format.