stringr – operations on strings
This time we’ll show you some helpful functions from the stringr library that work on strings. Let’s load the library first:
library(stringr)
Let’s create a sample vector with company names:
n = c("ibm","asus","acer","microsoft","lenovo","msi","dell")
We can use the function str_detect to check whether a particular phrase or letter occurs in each element of the vector:
str_detect(n,"a")
[1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE
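One thing worth keeping in mind: stringr treats the pattern as a regular expression by default, so characters such as the dot have a special meaning. Here is a minimal sketch of how fixed() forces a literal match. The vector x is made up for this illustration and is not part of the example above.

```r
library(stringr)

# Hypothetical sample vector, used only for this illustration
x <- c("a.b", "ab")

# As a regular expression, "." matches any single character
as_regex <- str_detect(x, ".")

# fixed() makes stringr look for a literal dot instead
as_literal <- str_detect(x, fixed("."))

print(as_regex)    # TRUE TRUE
print(as_literal)  # TRUE FALSE
```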
str_which, in turn, returns the indices of the elements that contain the phrase:
str_which(n,"a")
[1] 2 3
If we want to count how many times a particular phrase occurs in the elements, we use str_count:
str_count(n,"l")
[1] 0 0 0 0 1 0 2
Substrings can be extracted by position with str_sub:
str_sub(n,start = 1,end = 2)
[1] "ib" "as" "ac" "mi" "le" "ms" "de"
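str_sub also accepts negative positions, which count from the end of each string. A short sketch, reusing the same vector of company names:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# Negative positions count backwards: -1 is the last character,
# -2 the second to last, so this takes the last two characters
last_two <- str_sub(n, start = -2, end = -1)

print(last_two)  # "bm" "us" "er" "ft" "vo" "si" "ll"
```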
We select the elements that contain the phrase using str_subset:
str_subset(n,"a")
[1] "asus" "acer"
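str_detect, str_which and str_subset are closely related: the last two can be reproduced from the logical vector that str_detect returns. The sketch below illustrates this. The name hits is our own choice for the example.

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# Logical vector: TRUE where the element contains "a"
hits <- str_detect(n, "a")

# Positions of the TRUE values, the same result as str_which(n, "a")
idx <- which(hits)

# Elements at the TRUE positions, the same result as str_subset(n, "a")
matched <- n[hits]

print(idx)      # 2 3
print(matched)  # "asus" "acer"
```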
And we get the length of each element (its number of characters) with str_length:
str_length(n)
[1] 3 4 4 9 6 3 4
Unnecessary spaces at the beginning and end of the string are removed with str_trim:
str_trim(" dell ")
[1] "dell"
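If only one side should be trimmed, str_trim takes a side argument, and str_squish additionally collapses repeated spaces inside the string. A small sketch with made-up input strings:

```r
library(stringr)

# Remove whitespace only at the beginning of the string
left_only <- str_trim(" dell ", side = "left")

# Trim both ends and reduce inner runs of whitespace to a single space
squished <- str_squish("  dell   inc  ")

print(left_only)  # "dell "
print(squished)   # "dell inc"
```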
We replace a phrase in each element with another phrase using str_replace. Only the first match in each element is replaced:
str_replace(n,"a","A")
[1] "ibm" "Asus" "Acer" "microsoft" "lenovo" "msi" "dell"
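To replace every occurrence rather than just the first one, stringr provides str_replace_all. The sketch below compares the two on the letter "s", which appears twice in "asus":

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# str_replace changes only the first "s" in each element
first_only <- str_replace(n, "s", "S")

# str_replace_all changes every "s" in each element
all_matches <- str_replace_all(n, "s", "S")

print(first_only)   # "ibm" "aSus" "acer" "microSoft" "lenovo" "mSi" "dell"
print(all_matches)  # "ibm" "aSuS" "acer" "microSoft" "lenovo" "mSi" "dell"
```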
We convert strings to uppercase with str_to_upper:
str_to_upper(n)
[1] "IBM" "ASUS" "ACER" "MICROSOFT" "LENOVO" "MSI" "DELL"
To capitalize the first letter of each word (title case), we use str_to_title:
str_to_title(n)
[1] "Ibm" "Asus" "Acer" "Microsoft" "Lenovo" "Msi" "Dell"
We combine strings element by element with str_c:
str_c(n,str_to_upper(n))
[1] "ibmIBM" "asusASUS" "acerACER" "microsoftMICROSOFT" "lenovoLENOVO" "msiMSI" "dellDELL"
We convert a vector of strings into a single string with the collapse argument of str_c:
str_c(n,collapse = ';')
[1] "ibm;asus;acer;microsoft;lenovo;msi;dell"
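The sep and collapse arguments of str_c can also be used together: sep goes between the strings joined element by element, and collapse then joins the resulting elements into one string. A brief sketch, where the separators "-" and "; " are arbitrary choices for the example:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# sep is placed between the two strings joined in each element
pairs <- str_c(n, str_to_upper(n), sep = "-")

# collapse then joins all elements into one string
one_string <- str_c(n, str_to_upper(n), sep = "-", collapse = "; ")

print(pairs[1:2])   # "ibm-IBM" "asus-ASUS"
print(one_string)
```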
We sort the strings with str_sort:
str_sort(n)
[1] "acer" "asus" "dell" "ibm" "lenovo" "microsoft" "msi"
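Sorting in reverse order is possible with the decreasing argument of str_sort. A minimal sketch on the same vector:

```r
library(stringr)

n <- c("ibm", "asus", "acer", "microsoft", "lenovo", "msi", "dell")

# decreasing = TRUE sorts from the end of the alphabet to the beginning
sorted_desc <- str_sort(n, decreasing = TRUE)

print(sorted_desc)  # "msi" "microsoft" "lenovo" "ibm" "dell" "asus" "acer"
```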
Of course, you can find more functions in the documentation. We encourage you to explore it.