stringr – operations on strings
This time we’ll show you some helpful string functions from the stringr package. Let’s load it first:
library(stringr)
Let’s create a sample vector with company names:
n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")
We can use the function str_detect to check whether a particular phrase or letter occurs in each element of the vector:
str_detect(n, "a")
[1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE
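The pattern passed to str_detect is interpreted as a regular expression by default, which is easy to overlook when it contains only letters. A minimal sketch of what that allows; the variable names are ours, and the negate argument assumes stringr 1.4.0 or newer:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# "^a" is a regular expression: an "a" at the very start of the string
starts_with_a <- str_detect(n, "^a")

# fixed() switches off regex interpretation, so "." matches a literal dot
has_dot <- str_detect(c("a.b", "ab"), fixed("."))

# negate = TRUE flips the result (assumes stringr >= 1.4.0)
no_a <- str_detect(n, "a", negate = TRUE)
```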
str_which, in turn, returns the indices of the elements that contain the phrase:
str_which(n, "a")
[1] 2 3
If we want to count how many times a particular phrase occurs in the elements, we use str_count:
str_count(n, "l")
[1] 0 0 0 0 1 0 2
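Because the pattern is a regular expression, str_count can also count a whole class of characters at once. A small sketch, assuming we want the number of vowels in each name (the variable names are ours):

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# "[aeiou]" matches any single lowercase vowel
vowels <- str_count(n, "[aeiou]")

# Summing the per-element counts gives the total across the vector
total_l <- sum(str_count(n, "l"))
```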
Fragments of text can be extracted by position with str_sub:
str_sub(n, start = 1, end = 2)
[1] "ib" "as" "ac" "mi" "le" "ms" "de"
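str_sub also accepts negative positions, which count from the end of the string, and it has an assignment form for replacing a fragment in place. A short sketch with variable names of our own choosing:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# A negative start counts from the end: take the last two characters
last_two <- str_sub(n, -2)

# The assignment form overwrites the selected fragment
x <- "dell"
str_sub(x, 1, 1) <- "D"
```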
We select the elements that contain the phrase using str_subset:
str_subset(n, "a")
[1] "asus" "acer"
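str_detect, str_which and str_subset are three views of the same match. The sketch below reproduces the last two from the logical vector returned by str_detect, which can help when deciding which one to call (the variable names are ours):

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

hits <- str_detect(n, "a")  # logical vector, one value per element

idx <- which(hits)          # positions of the matches, like str_which(n, "a")
sel <- n[hits]              # the matching elements, like str_subset(n, "a")
```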
And we extract the length of each element with str_length:
str_length(n)
[1] 3 4 4 9 6 3 4
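A typical use of these lengths is filtering. As a small illustration, assuming we only want names longer than three characters (the threshold and variable name are ours):

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# Keep only the elements with more than three characters
long_names <- n[str_length(n) > 3]
```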
Unnecessary spaces at the beginning and end of the string are removed with str_trim:
str_trim(" dell ")
[1] "dell"
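str_trim takes a side argument when only one end should be cleaned, and the related str_squish also collapses repeated spaces inside the string. A brief sketch; the sample strings are made up for illustration:

```r
library(stringr)

# Trim only the leading whitespace; the trailing spaces stay
left_only <- str_trim("  dell  ", side = "left")

# str_squish trims both ends and collapses inner runs of whitespace
squished <- str_squish("  dell   technologies  ")
```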
We replace a phrase in each element with another phrase using str_replace, which changes only the first match in each element:
str_replace(n, "a", "A")
[1] "ibm" "Asus" "Acer" "microsoft" "lenovo" "msi" "dell"
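Note that str_replace changes only the first match in each element; str_replace_all changes every match. With the letter "a" the difference is invisible in our vector, so this sketch uses "s" instead (variable names are ours):

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# Only the first "s" in each element is replaced
first_only <- str_replace(n, "s", "S")

# Every "s" in each element is replaced
all_matches <- str_replace_all(n, "s", "S")
```

The two results differ only for "asus", the one name that contains "s" twice.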
We convert strings to uppercase with str_to_upper:
str_to_upper(n)
[1] "IBM" "ASUS" "ACER" "MICROSOFT" "LENOVO" "MSI" "DELL"
To capitalize the first letter of each word (title case), we use str_to_title:
str_to_title(n)
[1] "Ibm" "Asus" "Acer" "Microsoft" "Lenovo" "Msi" "Dell"
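Title case is applied to every word in a string, not just the first one, and str_to_lower goes in the opposite direction from str_to_upper. A quick sketch; the two-word sample string is ours:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# Each word in the string gets a capital first letter
titled <- str_to_title("hewlett packard")

# Lowercasing the uppercased vector gives back the original names
round_trip <- str_to_lower(str_to_upper(n))
```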
We combine strings element by element with str_c:
str_c(n, str_to_upper(n))
[1] "ibmIBM" "asusASUS" "acerACER" "microsoftMICROSOFT" "lenovoLENOVO" "msiMSI" "dellDELL"
We convert a vector of strings into a single string with the collapse argument of str_c:
str_c(n, collapse = ";")
[1] "ibm;asus;acer;microsoft;lenovo;msi;dell"
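str_c has two separator arguments that are easy to mix up: sep goes between the inputs being joined element by element, while collapse joins the resulting elements into one string. A sketch that uses both; the separators and the "brand: " prefix are arbitrary choices of ours:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# sep is placed between the pieces of each element
pairs <- str_c(n, str_to_upper(n), sep = " -> ")

# A length-one input is recycled against the longer vector
labelled <- str_c("brand: ", n)

# sep and collapse together: join pairwise, then flatten to one string
one_line <- str_c(n, str_to_upper(n), sep = "=", collapse = ", ")
```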
We sort the strings with str_sort:
str_sort(n)
[1] "acer" "asus" "dell" "ibm" "lenovo" "microsoft" "msi"
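str_sort has a few options worth knowing, and its companion str_order returns the sorting permutation instead of the sorted values. A sketch under the assumption of the default "en" locale; the "x10"-style strings are invented to show numeric sorting:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# Reverse alphabetical order
desc <- str_sort(n, decreasing = TRUE)

# str_order gives the positions that would sort the vector
ord <- str_order(n)
sorted_via_order <- n[ord]

# numeric = TRUE compares embedded digits as numbers, so "x10" follows "x2"
natural <- str_sort(c("x10", "x2", "x1"), numeric = TRUE)
```

str_order is handy when a second vector or a data frame has to be rearranged in the same order as the names.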
Of course, you can find more functions in the stringr documentation. We encourage you to explore it.