Sunday, January 14, 2018

How to handle NA and NULL values in Vectors of R Programming

NA and NULL values in Vectors of R Programming
In statistical data sets, we often encounter missing data, which we represent in R with the value NA. On the other hand, NULL represents that the value in question simply doesn’t exist, rather than being existent but unknown.
NA values:
In many of R’s statistical functions, we can instruct the function to skip over any missing values, or NAs.
> x <- c(88,NA,12,168,13)
> x
[1] 88 NA 12 168 13

> mean(x)
[1] NA

# na.rm=T (TRUE) is an argument that tells mean() to ignore the NA values in a vector
> mean(x,na.rm=T)
[1] 70.25

> x <- c(88,NULL,12,168,13)
> mean(x)
[1] 70.25

In the first call, mean() returned NA, as one value in x was NA. But by setting the optional argument na.rm (NA remove) to true (T), we calculated the mean of the remaining elements.
In the last call, the NULL had no effect on the result: c() drops NULL when it builds the vector, so x holds only four elements. We'll look at NULL in the next section.
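The na.rm argument is also accepted by other base R summary functions, and is.na() can be used to find the missing entries first. The following is a minimal sketch that reuses the vector x from above; the names n_missing and x_with_null are illustrative and not part of the original example.

```r
x <- c(88, NA, 12, 168, 13)

# Count and locate the missing entries before summarising
n_missing <- sum(is.na(x))   # number of NA entries in x
which(is.na(x))              # position(s) of the NA entries

# na.rm = TRUE works the same way in other summary functions
sum(x, na.rm = TRUE)
max(x, na.rm = TRUE)
median(x, na.rm = TRUE)

# c() drops NULL while building the vector, so mean() never sees it
x_with_null <- c(88, NULL, 12, 168, 13)
length(x_with_null)
```

Writing TRUE in full rather than T is a little safer, since T is an ordinary variable that can be reassigned.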

There are multiple NA values, one for each mode:
> x <- c(5,NA,12)
> mode(x[1])
[1] "numeric"

> mode(x[2])
[1] "numeric"

> y <- c("abc","def",NA)
> mode(y[2])
[1] "character"

> mode(y[3])
[1] "character"
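Base R also provides typed NA constants that make these per-mode missing values explicit. A short sketch, where na_kinds is just an illustrative name:

```r
# A bare NA is logical; inside a vector it is coerced to the vector's mode
mode(NA)                 # "logical"
typeof(NA_integer_)      # "integer"
typeof(NA_real_)         # "double"
typeof(NA_character_)    # "character"

# Whatever the type, is.na() recognises each of them as missing
na_kinds <- list(NA, NA_integer_, NA_real_, NA_character_)
all(sapply(na_kinds, is.na))
```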

NULL values:
One use of NULL is to build up vectors in loops, in which each iteration adds another element to the vector. In this simple example, we build up a vector of even numbers:
# build up a vector of the even numbers in 1:10
> z <- NULL
> for (i in 1:10) if (i %% 2 == 0) z <- c(z,i)
> z
[1] 2 4 6 8 10

The %% is the modulo operator, giving remainders upon division. For example, 13 %% 4 is 1, as the remainder of dividing 13 by 4 is 1. Thus the example loop starts with a NULL vector and then adds the element 2 to it, then 4, and so on.
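Because %% works element by element on a whole vector, the same result can be obtained without a loop. This is a sketch of that alternative, not the approach used in the example above; nums, is_even and evens are illustrative names.

```r
13 %% 4                     # remainder of dividing 13 by 4

nums <- 1:10
is_even <- nums %% 2 == 0   # logical vector: TRUE where the remainder is 0
evens <- nums[is_even]      # keep only the elements flagged TRUE
evens
```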

Here are two more ways to generate the even numbers in 1:10:
> seq(2,10,2)
[1] 2 4 6 8 10
> 2*1:5
[1] 2 4 6 8 10

But the point here is to demonstrate the difference between NA and NULL. If we were to use NA instead of NULL in the preceding example, we would pick up an unwanted NA:
> z <- NA
> for (i in 1:10) if (i %% 2 == 0) z <- c(z,i)
> z
[1] NA 2 4 6 8 10

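NULL is a special R object: mode(NULL) reports "NULL" rather than a data mode such as numeric or character, and it has length zero, so it really is counted as nonexistent. NA, by contrast, is a real element that occupies a position in the vector.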
> u <- NULL
> length(u)
[1] 0

> v <- NA
> length(v)
[1] 1
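The two can also be told apart programmatically with is.null() and is.na(). A small sketch, reusing u and v from above:

```r
u <- NULL
v <- NA

is.null(u)             # TRUE: u is the NULL object
is.null(v)             # FALSE: NA is an actual element
is.na(v)               # TRUE: v holds a missing value

# When combined, NULLs vanish while each NA occupies a slot
length(c(NULL, NULL))  # 0
length(c(NA, NA))      # 2
```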

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
