Basic Statistical plotting using R

Hello Data Experts,

Let me continue from my last blog, “Statistical Programming using R”, where we discussed basic statistical functions such as Mean, Median, Mode, Variance, Standard Deviation and the Summary of statistical observations.

Let us move to an interesting section of R where we will explore Data Visualization, keeping statistical scope in mind. A picture is easier to recall later than a table of numbers. The focus of this blog is how to draw graphs such as the Box Plot, Scatter Diagram, Pie Chart, Histogram, Line Graph and Bar Chart.

Before we get into the visual representation of statistical data, it is important to understand the different data types and object types. R's basic data types include Numeric, Integer, Character (string) and Logical (TRUE/FALSE). These values are stored in different types of objects: Vector, List, Matrix, Array, Factor and Data Frame.

A vector can hold values of only a single data type. Let us look at a few examples:

This will hold a logical value

W <- TRUE

W

This will hold a numeric value

N <- 5.5

N

This will hold an integer value (the L suffix marks the number as an integer)

G <- 4L

G

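The example below holds a character value. Note that F is also R's built-in shorthand for FALSE, so assigning to it masks that shorthand for the rest of the session.

F <- "FALSE"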

F
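
As a quick check, class() reports the type of each object. This is only a minimal sketch of the examples above; the name CH is made up here so that R's built-in F is left alone.

```r
# Same values as above; CH is a hypothetical name used instead of F
W <- TRUE
N <- 5.5
G <- 4L
CH <- "FALSE"

class(W)   # "logical"
class(N)   # "numeric"
class(G)   # "integer"
class(CH)  # "character"
```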

Let us assign a set of numeric values to an object A. Since all the values are numeric (a homogeneous set), the resulting vector is numeric.

A <- c(1,2,3,4,5,6,7,8,9)

A

If a vector is given a heterogeneous set of values, R coerces all of them to one common type. In the example below one value is a character string, so every value becomes Character and the original data types are lost.

G <- c("QW", 1, FALSE)

G
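
The coercion can be confirmed with class(). The second vector, H, is an extra illustration that is not part of the examples above: when no character value is present, R settles on the next common type, which here is numeric.

```r
G <- c("QW", 1, FALSE)
class(G)   # "character"
G          # "QW" "1" "FALSE"

# Hypothetical extra vector: numeric and logical values only
H <- c(1, FALSE)
class(H)   # "numeric"
H          # 1 0
```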

To retain the data type of each value, we should use a LIST.

GLIST <- list("QW", 1, FALSE)

GLIST
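
To see that the list kept each type, sapply() can apply class() to every element. A small sketch on the same list:

```r
GLIST <- list("QW", 1, FALSE)

# One class per element, in order
sapply(GLIST, class)   # "character" "numeric" "logical"
```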

So far we have seen how a Vector differs from a List. Sorting works on a Vector but not on a List. Let us try sort on both sets now: the first sort runs fine, whereas the second stops with an error, as expected, so GLISTSORT is never created.

GSORT <- sort(G)

GSORT

GLISTSORT <- sort(GLIST)

GLISTSORT
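
If a script should keep running past the failing sort, the call can be wrapped in try(). This is a sketch of one option, not the only way to handle it; RESULT is a made-up name. The unlist() line is a possible workaround that gives up the separate data types, because it flattens the list back into a character vector.

```r
GLIST <- list("QW", 1, FALSE)

# try() captures the error instead of stopping the script
RESULT <- try(sort(GLIST), silent = TRUE)
inherits(RESULT, "try-error")   # TRUE

# Possible workaround: flatten to a character vector, then sort
sort(unlist(GLIST))
```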

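Let us use the merge command on both types. merge() is meant for data frames, so it first converts each argument to a data frame. A vector becomes a single column with one row per value, and because the two columns share no name, merging two vectors gives a Cartesian product: 3*3 = 9 rows. A list becomes a single row with one column per element, so merging two different lists places their elements side by side: 3+3 = 6 columns. This is important to understand.

VECTORMERGE <- merge(G, GSORT)

VECTORMERGE

Because sort(GLIST) failed above, GLISTSORT does not exist. The list example therefore merges GLIST with a second list, GLIST2, made up here for illustration.

GLIST2 <- list("AB", 2, TRUE)

LISTMERGE <- merge(GLIST, GLIST2)

LISTMERGE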
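
The shape of each merge result can be checked with dim(), which returns the number of rows and columns. The sketch below is self-contained; GLIST2 is a made-up second list, used because a list cannot be sorted.

```r
G <- c("QW", 1, FALSE)
GSORT <- sort(G)
dim(merge(G, GSORT))         # 9 rows, 2 columns

GLIST <- list("QW", 1, FALSE)
GLIST2 <- list("AB", 2, TRUE)   # hypothetical second list
dim(merge(GLIST, GLIST2))    # 1 row, 6 columns
```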

To summarize the attributes of Vector and List:

Vector: Converts mixed data to characters, and merging gives a Cartesian product.

List: Retains the data type of each value, and merging places the values side by side.

Let us discuss a new object type: ARRAY.

An array holds data of a single type in rows and columns, and can extend to further dimensions.

A1 <- c(1,2,3,4,5)

A2 <- c(9,8,7,6,5,4)

A1

A2

Array2 <- array(c(A1, A2),dim = c(3,5,2))

Array2

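This forms a 3-dimensional array with 3 rows, 5 columns and 2 layers. It has 3*5*2 = 30 cells but only 11 values were supplied, so R reuses the values from the start until the array is full.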

Array2 <- array(c(A1 ,A2),dim = c(6,2,4))

Array2

This forms a 3-dimensional array with 6 rows, 2 columns and 4 layers.
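
dim() and length() confirm the shape, and indexing shows how the 11 values are reused. A minimal check with the same vectors:

```r
A1 <- c(1,2,3,4,5)
A2 <- c(9,8,7,6,5,4)
Array2 <- array(c(A1, A2), dim = c(6,2,4))

dim(Array2)       # 6 2 4
length(Array2)    # 48 cells, filled from 11 values
Array2[1, 1, 1]   # 1, the first value
Array2[6, 2, 1]   # 1 again: the 12th cell restarts the 11 values
```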

Let us move on to Matrix. It is a simple 2-dimensional rectangular layout. The fourth argument of matrix(), byrow, takes TRUE or FALSE and controls how the data is arranged: TRUE fills the matrix row by row and FALSE fills it column by column. The default value is FALSE.

A1 <- c(1,2,3,4,5)

A2 <- c(9,8,7,6,5,4)

MAT <- matrix(c(A1, A2),11,5)

MAT

MAT <- matrix(c(A1, A2),11,5, TRUE)

MAT

MAT <- matrix(c(A1, A2),5, 11)

MAT

MAT <- matrix(c(A1, A2), 5,11, TRUE)

MAT
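
The effect of byrow is easier to see on a small matrix. The sketch below uses six made-up values rather than the vectors above:

```r
# Hypothetical small example: 6 values, 2 rows, 3 columns
matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```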

There is another object type, FACTOR, which is used for categorical data. A factor stores each value together with a set of LEVELS, which are the unique values in sorted order.

A3 <- c(9,8,8,7,9,4,4,2,8,6,5,4)

FV <- factor(A3)

FV
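
levels() lists the distinct values and table() counts how often each one occurs. A short sketch on the same data:

```r
A3 <- c(9,8,8,7,9,4,4,2,8,6,5,4)
FV <- factor(A3)

levels(FV)    # "2" "4" "5" "6" "7" "8" "9"
nlevels(FV)   # 7
table(FV)     # count per level; for example, 4 occurs 3 times
```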

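Let us now talk about the most important object type, the Data Frame. It is a table structure with rows and columns. Note that IncidentDesc below has only two values; data.frame() repeats them to fill all four rows.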

IncidentNumber <- c(111,114,143,456)

IncidentDesc <- c("AA", "GG")

IncidentPri <- c("Low", "Low", "Medium", "Complex")

IncidentDet <- data.frame(IncidentNumber,IncidentDesc, IncidentPri)

IncidentDet
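
str() gives a compact view of the columns and their types, and the IncidentDesc column shows the two values being reused. A sketch with the same vectors:

```r
IncidentNumber <- c(111,114,143,456)
IncidentDesc <- c("AA", "GG")
IncidentPri <- c("Low", "Low", "Medium", "Complex")
IncidentDet <- data.frame(IncidentNumber, IncidentDesc, IncidentPri)

nrow(IncidentDet)           # 4
IncidentDet$IncidentDesc    # AA GG AA GG (the two values are reused)
str(IncidentDet)            # column names and types
```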

Since we have covered the various types of objects, we will start with graphical visualization now.

For a PIE CHART, let us populate 2 objects: one with the values and one with the labels

PLOTVAL <- c(2,5,9,4,9)

PIEDESC <- c("A","B","C","D","E")

pie(PLOTVAL, PIEDESC, col = rainbow(length(PLOTVAL)))
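
Pie slices are often easier to read with percentages in the labels. The following is a sketch of one way to build them; PCT, LABELS and the chart title are made up for the example.

```r
PLOTVAL <- c(2,5,9,4,9)
PIEDESC <- c("A","B","C","D","E")

# Share of each value in the total, rounded to a whole percent
PCT <- round(100 * PLOTVAL / sum(PLOTVAL))
LABELS <- paste0(PIEDESC, " (", PCT, "%)")

pie(PLOTVAL, labels = LABELS, col = rainbow(length(PLOTVAL)),
    main = "Share of each category")
```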

Let us try to plot a BAR CHART

PLOTVAL <- c(2,5,9,4,9)

barplot(PLOTVAL)
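
barplot() also accepts names.arg for the bar labels, along with the usual title and axis-label arguments. The sketch below reuses the letter labels from the pie chart; the title, axis text, colour and the name MID are made up.

```r
PLOTVAL <- c(2,5,9,4,9)
PIEDESC <- c("A","B","C","D","E")

# barplot() returns the x position of each bar, kept here in MID
MID <- barplot(PLOTVAL, names.arg = PIEDESC, col = "steelblue",
               main = "Values by category",
               xlab = "Category", ylab = "Value")
```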

To draw a HISTOGRAM, we need an object with values

PLOTVAL <- c(2,5,9,4,9)

hist(PLOTVAL, col = "green", border = "red")
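
A histogram groups the values into bins and counts how many fall in each. With plot = FALSE, hist() returns those bins without drawing, which can help in reading the picture. A minimal sketch; H is a made-up name:

```r
PLOTVAL <- c(2,5,9,4,9)

H <- hist(PLOTVAL, plot = FALSE)
H$breaks   # bin boundaries chosen by R
H$counts   # number of values in each bin
sum(H$counts) == length(PLOTVAL)   # TRUE: every value lands in a bin
```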

A LINE GRAPH can be drawn with a single object as well

PLOTVAL <- c(2,5,9,4,9)

plot(PLOTVAL, type = "o", col = "red")

To draw a SCATTER GRAPH, we need two sets of values so that they represent the X and Y axes.

PLOTVAL <- c(2,5,9,4,9)

PLOTVAL1 <- c(2,5,9,4,9)

plot(PLOTVAL, PLOTVAL1, col = "blue")

For the BOX PLOT we use 2 sets of values, which are drawn as two boxes side by side

PLOTVAL <- c(120,50,90,14,9)

PLOTVAL1 <- c(120,25,19,175,29)

boxplot(PLOTVAL, PLOTVAL1, col = "blue")
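
A box plot is built from the five-number summary: minimum, lower hinge, median, upper hinge and maximum. fivenum() returns these values, which ties the picture back to the summary statistics. In the sketch below neither set has outliers, so the whiskers reach the minimum and maximum; the names argument is an addition for labelling the two boxes.

```r
PLOTVAL <- c(120,50,90,14,9)
PLOTVAL1 <- c(120,25,19,175,29)

fivenum(PLOTVAL)    # 9 14 50 90 120
fivenum(PLOTVAL1)   # 19 25 29 120 175

boxplot(PLOTVAL, PLOTVAL1, names = c("PLOTVAL", "PLOTVAL1"), col = "blue")
```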

I hope this first view of how one can generate graphs and visualizations has been a good experience. Now that you have had exposure to key statistical formulas and visualization, you are ready to explore advanced statistical problems. In my next blog, I will cover “Advance statistical formulas using R Studio”.

Thank you for sparing the time to go through this blog. I hope it helped you build a sound foundation of statistics using R. Kindly share your valuable opinion. Please do not forget to suggest what you would like to hear from me in my future blogs.

Thank you…

Outstanding Outliers:: “AG”.

Outstanding Outlier
Data Science from Scratch
