Vectorized Operations in R Programming
One of the most effective ways to achieve speed in R code is to use operations that are vectorized, meaning that a function applied to a vector is actually applied individually to each element.
Suppose we have a function fx() that we wish to apply to all elements of a vector x. In many cases, we can accomplish this by simply calling fx() on x itself.
Vector In, Vector Out Model:
Suppose we apply the > operator to two vectors, u and v. It is applied to u[1] and v[1], resulting in TRUE, then to u[2] and v[2], resulting in FALSE, and so on.
> u <- c(5,2,8)
> v <- c(1,3,9)
> u > v
[1] TRUE FALSE FALSE
A key point is that if an R function uses vectorized operations, it, too, is vectorized, thus enabling a potential speedup. Here is an example:
> w <- function(x) return(x+1)
> w(u)
[1] 6 3 9
Here, w() uses +, which is vectorized, so w() is vectorized as well. As you can see, there is an unlimited number of vectorized functions, as complex ones are built up from simpler ones.
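To make the point concrete, here is a small sketch that compares the vectorized w() with an explicit loop. The name w_loop() is hypothetical and introduced only for this comparison. Both versions should return the same values, but the vectorized one leaves the looping to R's internal code.

```r
# Vectorized version from the text: relies on the vectorized + operator
w <- function(x) return(x + 1)

# Hypothetical loop-based equivalent, written only for comparison
w_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i] + 1
  }
  out
}

u <- c(5, 2, 8)
identical(w(u), w_loop(u))  # both give 6 3 9
```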
Note that even the transcendental functions—square roots, logs, trig functions, and so on—are vectorized.
> sqrt(1:9)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
[9] 3.000000
> y <- c(1.2,3.9,0.4)
> z <- round(y)
> z
[1] 1 4 0
The point is that the round() function is applied individually to each element in the vector y. And remember that scalars are really single-element vectors, so the “ordinary” use of round() on just one number is merely a special case.
> round(1.2)
[1] 1
Note that even operators such as + are really functions, so addition is also applied element-wise.
> y <- c(12,5,13)
> y+4
[1] 16 9 17
> '+'(y,4)
[1] 16 9 17
Note: Recycling plays a key role here, with the 4 recycled into (4,4,4).
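Recycling is not limited to single numbers. The sketch below uses made-up vectors a and b (assumptions for illustration) to show a two-element vector being recycled against a four-element one. When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.

```r
# Recycling: the shorter operand is repeated until it matches the longer one
a <- c(1, 2, 3, 4)
b <- c(10, 20)          # recycled to (10, 20, 10, 20)
r1 <- a + b             # 11 22 13 24

# The case from the text is the same mechanism with a length-1 vector
y <- c(12, 5, 13)
r2 <- y + 4             # 16 9 17
r3 <- y + c(4, 4, 4)    # same result, with the recycling written out by hand
```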
Since we know that R has no scalars, let’s consider vectorized functions that appear to have scalar arguments.
> fx<- function(x,c) return((x+c)^2)
> fx(1:3,0)
[1] 1 4 9
> fx(1:3,1)
[1] 4 9 16
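In fx(), the argument c is meant to be a single number, but it is really a vector of length 1 that gets recycled in x+c. The sketch below shows what happens when a longer vector is passed instead. The guarded variant fx_strict() is a hypothetical helper, not part of the original post.

```r
fx <- function(x, c) return((x + c)^2)

# c is a length-1 vector here and is recycled across x
r_scalar <- fx(1:3, 1)      # 4 9 16

# Nothing stops a caller from passing a longer c; it is paired element-wise
r_vector <- fx(1:3, 1:3)    # (1+1)^2, (2+2)^2, (3+3)^2 = 4 16 36

# Hypothetical stricter variant that insists on a single number for c
fx_strict <- function(x, c) {
  stopifnot(length(c) == 1)
  (x + c)^2
}
r_strict <- fx_strict(1:3, 1)
```

Whether to add such a check is a design choice: it gives up some flexibility in exchange for catching accidental vector arguments early.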
Vector In, Matrix Out Model:
The vectorized functions we’ve been working with so far have scalar return values. Calling sqrt() on a number gives us a number. If we apply this function to an eight-element vector, we get eight numbers, and thus another eight-element vector, as output.
But what if the function itself is vector-valued, as z12() is here?
> z12 <- function(z) return(c(z,z^2))
Applying z12() to 5, say, gives us the two-element vector (5,25). If we apply this function to an eight-element vector, it produces 16 numbers:
> x <- 1:8
> z12(x)
[1] 1 2 3 4 5 6 7 8 1 4 9 16 25 36 49 64
It might be more natural to have these arranged as an 8-by-2 matrix, which we can do with the matrix() function:
> matrix(z12(x),ncol=2)
     [,1] [,2]
[1,]    1    1
[2,]    2    4
[3,]    3    9
[4,]    4   16
[5,]    5   25
[6,]    6   36
[7,]    7   49
[8,]    8   64
But we can streamline things using sapply() (short for simplify apply). The call sapply(x,f) applies the function f() to each element of x and then converts the result to a matrix. We do get a 2-by-8 matrix, not an 8-by-2 one, but it’s just as useful this way.
> sapply(x,z12)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    2    3    4    5    6    7    8
[2,]    1    4    9   16   25   36   49   64
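If the 8-by-2 layout is preferred, one option is to transpose the sapply() result with t(). The sketch below is a minimal illustration; the names m_wide and m_tall are assumptions chosen for this example.

```r
z12 <- function(z) return(c(z, z^2))
x <- 1:8

m_wide <- sapply(x, z12)   # 2-by-8: one column per element of x
m_tall <- t(m_wide)        # 8-by-2: one row per element of x

dim(m_wide)  # 2 8
dim(m_tall)  # 8 2

# Same layout as the matrix() call on the vectorized result
all(m_tall == matrix(z12(x), ncol = 2))
```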
Subsetting of a Vector
R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations.
> x <- c(2.1, 4.2, 3.3, 5.4)
> x
[1] 2.1 4.2 3.3 5.4
#Positive integers return elements at the specified positions:
> x[c(3, 1)]
[1] 3.3 2.1
# Duplicated indices yield duplicated values
> x[c(1, 1)]
[1] 2.1 2.1
# Real numbers are simply truncated to integers
> x[c(2.1, 2.9)]
[1] 4.2 4.2
#Negative integers omit elements at the specified positions:
> x[-c(3, 1)]
[1] 4.2 5.4
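One related detail: positive and negative integers cannot be mixed in a single subset. The sketch below wraps the failing call in tryCatch() only so the error can be captured; the variable name mixed is an assumption for illustration.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

x[-1]          # drops the first element: 4.2 3.3 5.4
x[-c(3, 1)]    # drops positions 3 and 1: 4.2 5.4

# Mixing signs is an error; tryCatch() is used only to capture it here
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
mixed          # "error"
```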
#Logical vectors select elements where the corresponding logical value is TRUE. This is probably the most useful type of subsetting because you write the expression that creates the logical vector:
> x[c(TRUE, TRUE, FALSE, FALSE)]
[1] 2.1 4.2
> x[x > 3]
[1] 4.2 3.3 5.4
#If the logical vector is shorter than the vector being subsetted, it will be recycled to be the same length.
> x[c(TRUE, FALSE)]
[1] 2.1 3.3
# Equivalent to
> x[c(TRUE, FALSE, TRUE, FALSE)]
[1] 2.1 3.3
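Because the logical vector is usually built from an expression, conditions can be combined with the element-wise operators & (and), | (or), and ! (not). The following is a minimal sketch; the name keep is an assumption for illustration.

```r
x <- c(2.1, 4.2, 3.3, 5.4)

keep <- x > 3 & x < 5      # FALSE TRUE TRUE FALSE
x[keep]                    # 4.2 3.3

# Negating the condition selects the remaining elements
x[!keep]                   # 2.1 5.4
```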
#A missing value in the index always yields a missing value in the output:
> x[c(TRUE, TRUE, NA, FALSE)]
[1] 2.1 4.2 NA
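If the NA entries are not wanted in the result, one common approach is to convert the logical index to positions with which(), which keeps only the TRUE entries. The sketch below assumes an index vector named idx, introduced here for illustration.

```r
x <- c(2.1, 4.2, 3.3, 5.4)
idx <- c(TRUE, TRUE, NA, FALSE)

x[idx]                 # 2.1 4.2 NA
x[which(idx)]          # 2.1 4.2  (which() keeps only the TRUE positions)
x[!is.na(idx) & idx]   # 2.1 4.2  (same result without which())
```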
#Nothing returns the original vector. This is not useful for vectors but is very useful for matrices, data frames, and arrays. It can also be useful in conjunction with assignment.
> x[]
[1] 2.1 4.2 3.3 5.4
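The remark about assignment can be illustrated with a short sketch. It uses a named copy of the vector (the names and the variables x1 and x2 are assumptions for this example) to show that assigning through empty brackets replaces the contents in place, while plain assignment replaces the whole object.

```r
x <- c(a = 2.1, b = 4.2, c = 3.3, d = 5.4)

x1 <- x
x1[] <- 0     # replaces every element but keeps length and names
x1            # four zeros, still named a b c d

x2 <- x
x2 <- 0       # rebinds the name to a new length-1 vector
x2            # 0
```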
#Zero returns a zero-length vector. This is not something you usually do on purpose, but it can be helpful for generating test data.
> x[0]
numeric(0)
#Character vectors return elements with matching names (this requires a named vector):
> (y <- setNames(x, letters[1:4]))
  a   b   c   d
2.1 4.2 3.3 5.4
> y[c("d", "c", "a")]
  d   c   a
5.4 3.3 2.1
# Like integer indices, you can repeat indices
> y[c("a", "a", "a")]
  a   a   a
2.1 2.1 2.1
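It is also worth knowing what happens when a name does not match. With [ the match must be exact, and a name that is not found yields NA rather than an error. A minimal sketch, with res as an assumed variable name:

```r
x <- c(2.1, 4.2, 3.3, 5.4)
y <- setNames(x, letters[1:4])

res <- y[c("a", "z")]   # "z" is not a name in y
res                     # second element is NA
is.na(res)              # FALSE TRUE

# Matching is exact: "A" does not match "a"
is.na(y["A"])           # TRUE
```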
Thanks, TAMATAM