tidyr – organize your data

Today we will introduce you to the basic functions of the tidyr package, which is used to reshape a dataset and change its structure. We’ll start, as always, by loading the library:
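Assuming the package is already installed (for example with install.packages("tidyr")), loading it is a one-liner:

```r
# Load tidyr; assumes it was installed beforehand,
# e.g. with install.packages("tidyr")
library(tidyr)
```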

For our example, we will use data that we generate ourselves: the number of men and women in regions A and B in the years 2020 and 2021. We will use the tibble function to create a table, dt, with this data:

Our data looks like this:
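As a minimal sketch, the table could be built and printed as follows. The counts are made up purely for illustration, and the column names region and gender are our own assumption:

```r
library(tidyr)
library(tibble)  # loaded explicitly so tibble() is certainly available

# Made-up counts, for illustration only
dt <- tibble(
  region = c("A", "A", "B", "B"),
  gender = c("men", "women", "men", "women"),
  `2020` = c(100, 110, 200, 190),
  `2021` = c(105, 112, 195, 198)
)

# One row per region/gender pair, one column per year
print(dt)
```

The year columns need backticks because 2020 and 2021 are not syntactically valid R names.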

The first function we will learn is gather, which converts columns to rows:

This function converts our two columns, 2020 and 2021, into a single column called count, and also adds a column called year that contains the names of the original columns:
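A sketch of the gather step, again with made-up counts and assumed column names. Note that in current versions of tidyr gather is superseded by pivot_longer, although it still works:

```r
library(tidyr)
library(tibble)

# Made-up counts, for illustration only
dt <- tibble(
  region = c("A", "A", "B", "B"),
  gender = c("men", "women", "men", "women"),
  `2020` = c(100, 110, 200, 190),
  `2021` = c(105, 112, 195, 198)
)

# Stack the two year columns: their names go into `year`,
# their values into `count`
g <- gather(dt, key = "year", value = "count", `2020`, `2021`)
print(g)
```

The result has eight rows instead of four, because every region/gender pair now appears once per year.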

The spread function, on the other hand, converts rows to columns. We will split the count column of table g into one column per gender:
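A sketch of this step, where the long table g is written out directly with made-up counts so that the example runs on its own:

```r
library(tidyr)
library(tibble)

# Long-format table with made-up counts, for illustration only
g <- tibble(
  region = rep(c("A", "A", "B", "B"), times = 2),
  gender = rep(c("men", "women"), times = 4),
  year   = rep(c("2020", "2021"), each = 4),
  count  = c(100, 110, 200, 190, 105, 112, 195, 198)
)

# Each distinct value of `gender` becomes a column filled from `count`
s <- spread(g, key = gender, value = count)
print(s)
```

Each row of the result is one region/year pair, with separate men and women columns.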

tidyr also has useful functions for cleaning up missing (NA) values. Let’s add a missing value to our table dt:

The drop_na function removes rows with missing data from our data frame:

The row with missing data is removed:
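A sketch of both steps with made-up counts. Which cell is missing is our own assumption:

```r
library(tidyr)
library(tibble)

# Made-up counts; the 2021 value for women in region A is missing
dt <- tibble(
  region = c("A", "A", "B", "B"),
  gender = c("men", "women", "men", "women"),
  `2020` = c(100, 110, 200, 190),
  `2021` = c(105, NA, 195, 198)
)

# Keep only the rows that have no NA in any column
cleaned <- drop_na(dt)
print(cleaned)
```

drop_na also accepts column names, for example drop_na(dt, `2021`), to check only those columns for missing values.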

The fill function fills missing values with the nearest non-missing value above or below them in the same column:

The missing value was filled in from the bottom row:
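A sketch showing both directions on made-up counts. We assume here that filling "from the bottom" corresponds to .direction = "up", where the value is taken from the next row:

```r
library(tidyr)
library(tibble)

# Made-up counts with one missing value in the 2021 column
dt <- tibble(
  region = c("A", "A", "B", "B"),
  gender = c("men", "women", "men", "women"),
  `2020` = c(100, 110, 200, 190),
  `2021` = c(105, NA, 195, 198)
)

# "up": the NA takes the value of the row below it (195)
filled_up <- fill(dt, `2021`, .direction = "up")

# "down" (the default): the NA takes the value of the row above it (105)
filled_down <- fill(dt, `2021`, .direction = "down")

print(filled_up)
print(filled_down)
```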

The replace_na function replaces missing values with a user-defined value:
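A sketch with made-up counts, using 0 as the replacement value (an arbitrary choice for illustration):

```r
library(tidyr)
library(tibble)

# Made-up counts with one missing value in the 2021 column
dt <- tibble(
  region = c("A", "A", "B", "B"),
  gender = c("men", "women", "men", "women"),
  `2020` = c(100, 110, 200, 190),
  `2021` = c(105, NA, 195, 198)
)

# For a data frame, the replacements are given as a named list:
# one replacement value per column
replaced <- replace_na(dt, list(`2021` = 0))
print(replaced)
```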

What if our data were stored in a single column as a delimited string of numbers?

We use the separate_rows function:

To split the delimited values into individual rows:
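A sketch with a made-up table ds, in which the counts for each region are stored as one comma-separated string (the table name, column names and values are all assumptions):

```r
library(tidyr)
library(tibble)

# Made-up table: one delimited string of counts per region
ds <- tibble(
  region = c("A", "B"),
  counts = c("100,110", "200,190")
)

# Split at each comma and give every piece its own row;
# convert = TRUE turns the resulting strings into numbers
rows <- separate_rows(ds, counts, sep = ",", convert = TRUE)
print(rows)
```

The values in the other columns (here region) are repeated for every new row.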

Or the separate function:

To separate the data into two columns:
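A sketch on the same kind of made-up table, assuming the first number in each string is the count for men and the second the count for women:

```r
library(tidyr)
library(tibble)

# Made-up table: one delimited string of counts per region
ds <- tibble(
  region = c("A", "B"),
  counts = c("100,110", "200,190")
)

# Split `counts` at the comma into two new columns;
# convert = TRUE turns the resulting strings into numbers
cols <- separate(ds, counts, into = c("men", "women"), sep = ",", convert = TRUE)
print(cols)
```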

To combine the columns back into a single column of delimited values, use the unite function:
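A sketch of the reverse step on a made-up two-column table (names and values are assumptions):

```r
library(tidyr)
library(tibble)

# Made-up table with the counts already in separate columns
cols <- tibble(
  region = c("A", "B"),
  men    = c(100L, 200L),
  women  = c(110L, 190L)
)

# Paste `men` and `women` together into one character column `counts`,
# separated by a comma; the two input columns are dropped by default
u <- unite(cols, "counts", men, women, sep = ",")
print(u)
```

Passing remove = FALSE keeps the original columns alongside the new one.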

The tools in the tidyr package are very useful for organising our data so that we can run analyses on it or create plots from it.
