ggplot2 – how to create a plot
In previous posts, we learned how to manipulate data. Today, we will learn how to plot data with the ggplot2 library. Let’s load the library:
library(ggplot2)
Scatter plots
Let’s prepare the data for our first visualization. It will be a list of coordinates on a square search grid where we found 1000 objects:
data = data.frame(x = as.integer(runif(1000,1,30)), y = as.integer(runif(1000,1,30)))
Using the ggplot2 library, we will plot the coordinates in a scatter plot with geom_point:
ggplot(data = data,aes(x,y)) + geom_point()
Some objects were found at the same location, so they have identical coordinates and their points are drawn on top of each other. To display such duplicates, it is better to use geom_jitter, which adds a small random offset to each point (“jittering”) so that overlapping points become visible:
ggplot(data = data,aes(x,y)) + geom_jitter()
The plot is more readable. There are other ggplot tools that can be used to plot overlapping data, but we leave it to you to find them.
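Two such options can be sketched as follows. This is a minimal example and not the only way to do it: the data frame name `points`, the fixed seed and the `alpha` value of 0.3 are assumptions chosen for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
points <- data.frame(
  x = as.integer(runif(1000, 1, 30)),
  y = as.integer(runif(1000, 1, 30))
)

# Option 1: semi-transparent points; darker spots mean more objects at that location
p_alpha <- ggplot(points, aes(x, y)) + geom_point(alpha = 0.3)

# Option 2: geom_count() draws one point per location, sized by the number of objects there
p_count <- ggplot(points, aes(x, y)) + geom_count()

print(p_alpha)
print(p_count)
```

Unlike geom_jitter, both variants keep every point at its true coordinates.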
Histograms
To plot histograms, we use the geom_histogram function, where we can specify the width of each bin (binwidth). Let’s see which x coordinates had the most objects:
ggplot(data = data,aes(x)) + geom_histogram(binwidth = 2)
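Instead of the bin width, the number of bins can be set with the bins argument. The sketch below compares both and uses layer_data to read the computed counts back as numbers. The names `points`, `p_width`, `p_bins` and `bin_counts`, and the chosen values, are assumptions for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
points <- data.frame(x = as.integer(runif(1000, 1, 30)))

# binwidth fixes the width of each bin; bins fixes the number of bins instead
p_width <- ggplot(points, aes(x)) + geom_histogram(binwidth = 5)
p_bins  <- ggplot(points, aes(x)) + geom_histogram(bins = 10)

# layer_data() returns the values ggplot2 computed for the layer,
# here the bin limits and the number of objects in each bin
bin_counts <- layer_data(p_width)[, c("xmin", "xmax", "count")]
print(bin_counts)
```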
Bar plots
Another commonly used chart type is the bar chart. Let’s prepare data that includes people’s names and the distances they ran in each of four weeks:
data = data.frame(name = c("John","Barbara","Sophia","Jack","David","Kris","Walt","Bruce"), week1_km = runif(8,1,100),week2_km = runif(8,1,100),week3_km = runif(8,1,100),week4_km = runif(8,1,100))
We plot the distances from a single week on a chart using geom_bar:
ggplot(data = data,aes(name,week1_km)) + geom_bar(stat = "identity")
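ggplot2 also provides geom_col, which is a shorthand for geom_bar(stat = "identity"). A small sketch showing that both produce the same bars; the data frame `runs` and its four rows are made up for this example.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
runs <- data.frame(
  name = c("John", "Barbara", "Sophia", "Jack"),
  week1_km = runif(4, 1, 100)
)

# geom_bar() counts rows by default, so stat = "identity" is needed to use week1_km as the bar height
p_bar <- ggplot(runs, aes(name, week1_km)) + geom_bar(stat = "identity")

# geom_col() uses the y values as bar heights directly
p_col <- ggplot(runs, aes(name, week1_km)) + geom_col()

print(p_col)
```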
We can use the color and fill arguments to change the appearance of the bars: color sets the outline and fill sets the interior:
ggplot(data = data,aes(name,week1_km)) + geom_bar(stat = "identity",color = "red",fill = "yellow")
The xlab and ylab functions change the axis labels, and ggtitle adds a title to the chart:
ggplot(data = data,aes(name,week1_km)) + geom_bar(stat = "identity",color = "red",fill = "yellow") + xlab("NAME") + ylab("WEEK DISTANCE [km]") + ggtitle("RUNNING DISTANCE")
We can quickly change the appearance of the chart by using one of the built-in themes, e.g. theme_grey or theme_light:
ggplot(data = data,aes(name,week1_km)) + geom_bar(stat = "identity") + theme_light()
In bar charts, we can also group the data, for example by name. First we need to reshape the data into long format (one row per person and week) using the gather function from the tidyr library:
library(tidyr)
data_new = gather(data,"week","distance",2:5)
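Since tidyr 1.0.0, gather is superseded by pivot_longer. It still works, but the same reshaping can be sketched with the newer function as below. The data frame `runs` with two people is an assumption to keep the example short.

```r
library(tidyr)

set.seed(42)  # assumption: fixed seed so the example is reproducible
runs <- data.frame(
  name = c("John", "Barbara"),
  week1_km = runif(2, 1, 100),
  week2_km = runif(2, 1, 100),
  week3_km = runif(2, 1, 100),
  week4_km = runif(2, 1, 100)
)

# Original approach: gather columns 2 to 5 into key/value pairs
long_gather <- gather(runs, "week", "distance", 2:5)

# Newer approach: the same reshaping with pivot_longer()
long_pivot <- pivot_longer(
  runs,
  cols = starts_with("week"),  # the four weekly columns
  names_to = "week",           # old column names go here
  values_to = "distance"       # cell values go here
)

print(long_pivot)
```

Both results contain the same name, week and distance columns. Only the row order differs, and pivot_longer returns a tibble.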
We can display the data in a bar plot:
ggplot(data = data_new,aes(name,distance)) + geom_bar(stat = "identity",aes(fill=week))
By default, the bars for each week are stacked on top of each other, so the height of each column is the total distance. If we set the position argument to “dodge”, the bars for each week are displayed side by side:
ggplot(data = data_new,aes(name,distance)) + geom_bar(stat = "identity",position = "dodge",aes(fill=week))
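To make the effect of the position argument concrete, here is a small sketch with hand-picked numbers. The data frame `runs_long` and its values are assumptions. It also shows "fill", a third option that scales each column to the same height so that the weekly shares can be compared.

```r
library(ggplot2)

# Assumption: tiny hand-made data set in long format
runs_long <- data.frame(
  name = rep(c("John", "Barbara"), times = 2),
  week = rep(c("week1_km", "week2_km"), each = 2),
  distance = c(10, 20, 30, 40)
)

base <- ggplot(runs_long, aes(name, distance, fill = week))

# "stack" (the default): the weeks are piled up, the column height is the sum
p_stack <- base + geom_bar(stat = "identity", position = "stack")

# "dodge": one bar per week, side by side
p_dodge <- base + geom_bar(stat = "identity", position = "dodge")

# "fill": stacked, but every column is rescaled to a height of 1
p_fill <- base + geom_bar(stat = "identity", position = "fill")

print(p_stack)
print(p_dodge)
print(p_fill)
```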
Axis labels and the chart title can be added or changed in a single call with labs:
ggplot(data = data_new,aes(name,distance)) +
geom_bar(stat = "identity",position = "dodge",aes(fill=week)) +
labs(title = "Plot",x = "NAME", y="DISTANCE (km)")
We can change the position of the legend with theme:
ggplot(data = data_new,aes(name,distance)) +
geom_bar(stat = "identity",position = "dodge",aes(fill=week)) +
labs(title = "Plot",x = "NAME", y="DISTANCE (km)") +
theme(legend.position = "bottom")
Box plots
Finally, let’s make some boxplots with the new data. Let’s generate the distances run by 8 people in 30 days:
data = data.frame(name = rep(c("John","Barbara","Sophia","Jack","David","Kris","Walt","Bruce"),30), distance = runif(30*8,3,20))
And let’s plot them in a boxplot:
ggplot(data = data,aes(name,distance)) + geom_boxplot()
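Each box spans the first to the third quartile of a person's distances, and the line inside the box marks the median. As a sketch, the values behind the boxes can be read with layer_data and compared with medians computed directly. The data frame `runs` with four people and the other variable names are assumptions.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
runs <- data.frame(
  name = rep(c("John", "Barbara", "Sophia", "Jack"), 30),
  distance = runif(30 * 4, 3, 20)
)

p_box <- ggplot(runs, aes(name, distance)) + geom_boxplot()

# Values ggplot2 computed for each box: lower = 1st quartile, middle = median, upper = 3rd quartile
box_stats <- layer_data(p_box)[, c("lower", "middle", "upper")]

# The same medians computed directly from the data, in alphabetical order of name
medians <- tapply(runs$distance, runs$name, median)

print(box_stats)
print(medians)
```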
Save plots to file
To save our charts to an external file (“eps”, “ps”, “tex”, “pdf”, “jpeg”, “tiff”, “png”, “bmp”, “svg”, “wmf”) we use ggsave. The format is chosen from the file extension:
ggsave("D:/ggplot_boxplot.png", width = 5, height = 5)
The function saves the last displayed plot by default. Saving a particular plot is done by assigning it to a variable and inserting it into the ggsave function:
plot1 = ggplot(data = data,aes(name,distance)) + geom_boxplot()
ggsave("D:/plot.pdf", plot1, width = 5, height = 5)
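The width and height are interpreted in inches unless a units argument is given. A minimal sketch that saves a plot with its size in centimetres follows. The temporary output path and the names `runs`, `p` and `out_file` are assumptions so that the example runs on any system.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
runs <- data.frame(
  name = rep(c("John", "Barbara", "Sophia", "Jack"), 30),
  distance = runif(30 * 4, 3, 20)
)
p <- ggplot(runs, aes(name, distance)) + geom_boxplot()

# Assumption: write to the session's temporary directory instead of a fixed drive
out_file <- file.path(tempdir(), "boxplot_example.pdf")

# Pass the plot explicitly and give the size in centimetres
ggsave(out_file, plot = p, width = 12, height = 12, units = "cm")

print(file.exists(out_file))
```

For raster formats such as png, the dpi argument of ggsave additionally controls the resolution.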
The ggplot2 library has many more functions and arguments than we covered here, and you can combine them to create sophisticated plots. We leave the search for more complex solutions to you. You already have the basics.