Filtering Indices from a Vector in R Programming

Filtering is one of the most common operations in R, because statistical analyses often focus on data that satisfy conditions of interest. Filtering, which reflects R's nature as a functional language, lets us extract the elements of a vector that satisfy a given condition.

**Filtering Indices:** Let's look at an example that extracts from z all the elements whose squares are greater than 8 and assigns that subvector to w.

> z <- c(5,2,-3,8)

> w <- z[z*z > 8]

> w

[1] 5 -3 8

Let's examine how this works, step by step, starting from the same vector:

> z <- c(5,2,-3,8)

> z

[1] 5 2 -3 8

Evaluating the expression z*z > 8 gives us a vector of Boolean values:

> z*z > 8

[1] TRUE FALSE TRUE TRUE

Notes:

First, note that in the expression z*z > 8, everything is a vector or a vector operator:

• Since z is a vector, z*z is also a vector (of the same length as z).

• Due to recycling, the number 8 (a vector of length 1) becomes the vector (8,8,8,8) here.

• The operator >, like +, is actually a function.

> ">"(2,1)

[1] TRUE

> ">"(2,5)

[1] FALSE

> ">"(z*z,8)

[1] TRUE FALSE TRUE TRUE

The resulting Boolean values are then used to pick out the desired elements of z:

> z[c(TRUE,FALSE,TRUE,TRUE)]

[1] 5 -3 8
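The steps above can be traced in a short script. This is only a sketch that reuses z from the example; the names squares, threshold, and keep are introduced here for illustration, and the explicit rep() call merely imitates what recycling does implicitly.

```r
z <- c(5, 2, -3, 8)

# Step 1: z * z is computed element by element.
squares <- z * z                      # 25 4 9 64

# Step 2: spell out the recycling of 8 to the length of squares.
threshold <- rep(8, length(squares))  # 8 8 8 8

# Step 3: ">" is an ordinary function, applied element by element.
keep <- `>`(squares, threshold)       # TRUE FALSE TRUE TRUE

# Step 4: a logical index keeps the elements where keep is TRUE.
w <- z[keep]
print(w)                              # 5 -3 8
```

Writing z[z*z > 8] performs all four steps in a single expression.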

The following example will place things into even sharper focus. Here, we will again define our extraction condition in terms of z, but then we will use the results to extract from another vector, y, instead of extracting from z:

> z <- c(5,2,-3,8)

> j <- z*z > 8

> j

[1] TRUE FALSE TRUE TRUE

> y <- c(1,2,30,5)

> y[j]

[1] 1 30 5

Or, more compactly, we could write the following:

> z <- c(5,2,-3,8)

> y <- c(1,2,30,5)

> y[z*z > 8]

[1] 1 30 5

Note:

Here we are using one vector, z, to determine the indices to use in filtering another vector, y. In contrast, our earlier example used z to filter itself.
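Filtering one vector by a condition on another is typical when two vectors hold parallel data. The sketch below assumes two hypothetical vectors of equal length, names_vec and scores, which are not part of the original example.

```r
# Hypothetical parallel vectors: one name and one score per person.
names_vec <- c("ann", "bob", "cy", "dee")
scores    <- c(71, 48, 90, 65)

# The condition is computed on scores but used to index names_vec.
passed <- names_vec[scores >= 65]
print(passed)   # "ann" "cy" "dee"
```

This works as intended only when both vectors have the same length and the same ordering. If the logical index is shorter than the vector being filtered, R recycles it, which is rarely what you want.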

Here’s another example, this one involving assignment. Say we have a vector x in which we wish to replace all elements larger than 3 with 0.

> x <- c(1,3,8,2,20)

> x[x > 3] <- 0

> x

[1] 1 3 0 2 0
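Note that the assignment above overwrites x. If the original values are still needed, one option is to build a modified copy instead. The sketch below uses the base R functions replace() and ifelse(); the names capped and capped2 are made up for illustration.

```r
x <- c(1, 3, 8, 2, 20)

# replace() returns a modified copy and leaves x untouched.
capped <- replace(x, x > 3, 0)

# ifelse() builds the same result element by element.
capped2 <- ifelse(x > 3, 0, x)

print(x)        # 1 3 8 2 20 (unchanged)
print(capped)   # 1 3 0 2 0
```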

**Filtering with the subset() Function:** Filtering can also be done with the subset() function. When applied to vectors, the difference between using this function and ordinary filtering lies in the manner in which NA values are handled.

> x <- c(6,1:3,NA,12)

> x

[1] 6 1 2 3 NA 12

> x[x > 5]

[1] 6 NA 12

> subset(x,x > 5)

[1] 6 12

Note:

When we did ordinary filtering with x[x > 5] above, R basically said, “Well, x[5] is unknown, so it’s also unknown whether it is greater than 5.” But you may not want NAs in your results. When you wish to exclude NA values, using subset() saves you the trouble of removing the NA values yourself.
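To see where the NA comes from, it helps to look at the condition on its own. The sketch below reuses x from the example; the names cond, manual, and via_subset are introduced here for illustration. It shows one way to drop the NA by hand with is.na() and compares the result to subset().

```r
x <- c(6, 1:3, NA, 12)

# The comparison itself is NA wherever x is NA.
cond <- x > 5
print(cond)                          # TRUE FALSE FALSE FALSE NA TRUE

# Manual approach: keep an element only if the condition is known and TRUE.
manual <- x[!is.na(cond) & cond]

# subset() gives the same result without the extra is.na() step.
via_subset <- subset(x, x > 5)

print(manual)                        # 6 12
```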

**The Selection Function which():** As we've seen, filtering consists of extracting the elements of a vector z that satisfy a certain condition. In some cases, though, we may just want to find the positions within z at which the condition occurs. We can do this using which(), as follows:

> z <- c(5,2,-3,8)

> which(z*z > 8)

[1] 1 3 4

The result says that elements 1, 3, and 4 of z have squares greater than 8.

Here, the expression z*z > 8 is evaluated to (TRUE,FALSE,TRUE,TRUE). The which() function then simply reports which elements of the latter expression are TRUE.
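The positions returned by which() can be used as an index in their own right. The sketch below reuses z from the example; the names pos and first are made up for illustration. It shows that indexing by position gives the same elements as logical filtering, and how to get the first position that meets the condition.

```r
z <- c(5, 2, -3, 8)

# Positions at which the condition holds.
pos <- which(z * z > 8)      # 1 3 4

# Indexing by position selects the same elements as logical filtering.
print(z[pos])                # 5 -3 8

# The first position at which the condition holds.
first <- pos[1]
print(first)                 # 1
```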

--------------------------------------------------------------------------------------------------------

Thanks, TAMATAM ; Business Intelligence & Analytics Professional

--------------------------------------------------------------------------------------------------------
