NA and NULL Values in R Vectors

In statistical data sets, we often encounter missing data, which we represent in R with the value NA. On the other hand, NULL represents that the value in question simply doesn’t exist, rather than being existent but unknown.

In many of R’s statistical functions, we can instruct the function to skip over any missing values, or NAs.

> x <- c(88,NA,12,168,13)

> x

[1] 88 NA 12 168 13

> mean(x)

[1] NA

# The na.rm=T (TRUE) argument tells mean() to skip the NA values in a vector

> mean(x,na.rm=T)

[1] 70.25

> x <- c(88,NULL,12,168,13)

> mean(x)

[1] 70.25

In the first call, mean() refused to calculate, as one value in x was NA. But by setting the optional argument na.rm (NA remove) to true (T), we calculated the mean of the remaining elements.

In the second vector, by contrast, R automatically skipped over the NULL value, which we'll look at later in this post.
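As a small sketch of the same idea, the snippet below uses is.na() to locate the missing entry and passes na.rm to two other base R functions, sum() and max(). The variable names (missing_flags, total, and so on) are only illustrative. It also checks the length of the vector built with NULL, which suggests why mean() had nothing to skip there: c() had already dropped the NULL.

```r
x <- c(88, NA, 12, 168, 13)

# is.na() flags each missing element
missing_flags <- is.na(x)        # FALSE TRUE FALSE FALSE FALSE
n_missing <- sum(missing_flags)  # 1

# na.rm = TRUE behaves the same way in sum() and max()
total <- sum(x, na.rm = TRUE)    # 281
largest <- max(x, na.rm = TRUE)  # 168

# c() discards NULL, so this vector has only four elements
w <- c(88, NULL, 12, 168, 13)
n_w <- length(w)                 # 4
```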

There are multiple NA values, one for each mode:

> x <- c(5,NA,12)

> mode(x[1])

[1] "numeric"

> mode(x[2])

[1] "numeric"

> y <- c("abc","def",NA)

> mode(y[2])

[1] "character"

> mode(y[3])

[1] "character"
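To make the "one NA per mode" point more concrete, here is a minimal sketch using the typed NA constants that base R provides. It uses typeof() rather than mode(), so numeric values are reported as "double". The names x_type and y_type are just placeholders for this example.

```r
# The bare NA is logical; base R also provides typed NA constants
na_plain <- typeof(NA)             # "logical"
na_int   <- typeof(NA_integer_)    # "integer"
na_real  <- typeof(NA_real_)       # "double"
na_chr   <- typeof(NA_character_)  # "character"

# Inside a vector, NA takes on the type of the vector
x <- c(5, NA, 12)
x_type <- typeof(x[2])             # "double"

y <- c("abc", "def", NA)
y_type <- typeof(y[3])             # "character"
```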

One use of NULL is to build up vectors in loops, in which each iteration adds another element to the vector. In this simple example, we build up a vector of even numbers:

# build up a vector of the even numbers in 1:10

> z <- NULL

> for (i in 1:10) if (i %%2 == 0) z <- c(z,i)

> z

[1] 2 4 6 8 10

The %% operator is the modulo operator, giving the remainder upon division. For example, 13 %% 4 is 1, as the remainder of dividing 13 by 4 is 1. Thus the example loop starts with a NULL vector and then adds the element 2 to it, then 4, and so on.

Here are two more ways to find the even numbers in 1:10.

> seq(2,10,2)

[1] 2 4 6 8 10

> 2*1:5

[1] 2 4 6 8 10
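The modulo test from the loop can also be applied to the whole vector at once, since %% is vectorized. The following is only a sketch of that alternative, with nums, remainders, and evens as illustrative names:

```r
nums <- 1:10

# %% works element by element: 1 0 1 0 1 0 1 0 1 0
remainders <- nums %% 2

# Keep only the elements whose remainder is zero
evens <- nums[remainders == 0]   # 2 4 6 8 10
```

No starting value (NULL or otherwise) is needed here, because the result is produced in one step rather than grown inside a loop.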

But the point here is to demonstrate the difference between NA and NULL. If we were to use NA instead of NULL in the preceding example, we would pick up an unwanted NA:

> z <- NA

> for (i in 1:10) if (i %%2 == 0) z <- c(z,i)

> z

[1] NA 2 4 6 8 10

NULL is a special R object with no mode, and it really is counted as nonexistent: it has length 0, whereas NA has length 1.

> u <- NULL

> length(u)

[1] 0

> v <- NA

> length(v)

[1] 1
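The same distinction can be checked with the base R predicates is.null() and is.na(). This is a brief sketch, and the result variable names are only for illustration:

```r
u <- NULL
v <- NA

u_is_null <- is.null(u)   # TRUE
v_is_null <- is.null(v)   # FALSE
v_is_na   <- is.na(v)     # TRUE

# Appending NULL changes nothing; appending NA adds an element
len_with_null <- length(c(1, 2, u))   # 2
len_with_na   <- length(c(1, 2, v))   # 3
```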

--------------------------------------------------------------------------------------------------------

Thanks, TAMATAM ; Business Intelligence & Analytics Professional

--------------------------------------------------------------------------------------------------------
