Thursday, November 29, 2018

What is the difference between the rnorm(), runif() and sample() Functions in R

How to generate Random numbers using rnorm(), runif() and sample() Functions in R
rnorm() :
The rnorm() function generates n random numbers from a normal distribution with a specified mean and standard deviation.
Normally distributed random numbers follow the bell curve, so values close to the mean are more likely to be drawn than values far from it. They are not restricted to an interval: any real number is possible, although values many standard deviations away from the mean are very rare.
Syntax :
rnorm(n, mean = 0, sd = 1)

Examples:
> set.seed(100)
>rnorm(10,mean=0,sd=1)
[1] -0.9531302 1.1858818 -0.2572629 0.4372134 -0.3650827 0.4966740 0.5557346 0.6712590 -0.9485679 1.1848094
This generates 10 random numbers from a normal distribution with mean 0 and standard deviation 1.
Calling set.seed(int) before generating the numbers makes the result reproducible: the same seed always gives the same sequence of random numbers. The seed can be any integer.
> set.seed(100)
>rnorm(10,mean=0,sd=1)
[1] -0.9531302 1.1858818 -0.2572629 0.4372134 -0.3650827 0.4966740 0.5557346 0.6712590 -0.9485679 1.1848094


If you don't set the seed, a new series of random numbers is generated on every run.
> rnorm(10,mean=0,sd=1)
 [1]  0.08988614  0.09627446 -0.20163395  0.73984050  0.12337950 -0.02931671 -0.38885425  0.51085626 -0.91381419  2.31029682
> set.seed(123) 
> rnorm(10,mean=5,sd=3) 
[1] 3.318573 4.309468 9.676125 5.211525 5.387863 10.145195 6.382749 1.204816 2.939441 3.663014
#Round to nearest integer
> set.seed(123) 
> round(rnorm(10,mean=5,sd=3)) 
[1] 3 4 10 5 5 10 6 1 3 4 
#Round down to nearest integer
> set.seed(123)
> floor(rnorm(10,mean=5,sd=3)) 
[1] 3 4 9 5 5 10 6 1 2 3
#Round up to nearest integer
> set.seed(123) 
> ceiling(rnorm(10,mean=5,sd=3)) 
[1] 4 5 10 6 6 11 7 2 3 4
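
As a rough check of the mean and sd arguments, the sketch below draws a much larger sample and compares its sample mean and standard deviation with the requested values. The sample size of 100000 and the variable names are arbitrary choices for illustration; the estimates should land close to 5 and 3, not exactly on them.

```r
# Draw a large sample so the estimates settle near the requested parameters
set.seed(123)
x <- rnorm(100000, mean = 5, sd = 3)

sample_mean <- mean(x)  # expected to be close to 5
sample_sd   <- sd(x)    # expected to be close to 3

# Share of values within one standard deviation of the mean
# (roughly 68% for a normal distribution)
within_one_sd <- mean(abs(x - 5) < 3)

print(c(mean = sample_mean, sd = sample_sd, within_one_sd = within_one_sd))
```

With only 10 values, as in the examples above, the sample mean and standard deviation can differ noticeably from the requested ones.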

runif() :
The runif() function generates n uniform random numbers that lie in the interval from min to max.
The runif() function draws its values from the uniform distribution, so every value in the interval is equally likely to be selected.
Syntax :
runif(n, min = 0, max = 1)

Examples :
>runif(10, min=5,max=25)
[1] 22.790786 18.856068 17.810136 24.885396 18.114116 19.170609 15.881320 16.882840 10.783195 7.942273
This generates 10 random numbers between 5 and 25.
We can set the seed for this function as well.
> set.seed(1) 
> runif(10, min=5,max=25) 
[1] 10.310173 12.442478 16.457067 23.164156 9.033639 22.967794 23.893505 18.215956 17.582281 6.235725
> set.seed(1) 
> runif(10, min=5,max=25) 
[1] 10.310173 12.442478 16.457067 23.164156 9.033639 22.967794 23.893505 18.215956 17.582281 6.235725
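
A minimal sketch of what min and max do, using the same seed as above: every value stays inside the requested interval, and the same numbers can be obtained by rescaling standard uniform draws from (0, 1). The variable names u, v and all_in_range are only illustrative.

```r
set.seed(1)
u <- runif(10, min = 5, max = 25)

# Every value falls inside the requested interval
all_in_range <- all(u >= 5 & u <= 25)

# The same draws, built by rescaling standard uniform numbers on (0, 1)
set.seed(1)
v <- 5 + (25 - 5) * runif(10)

print(all_in_range)
print(all.equal(u, v))
```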

sample () :
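The sample() function draws a random sample of elements from a population. When sampling without replacement (replace = FALSE, the default), the sample size cannot be larger than the population.

sample() differs from runif(): runif() returns fractional numbers from a continuous interval, whereas sample() returns elements of the vector it is given, so sampling from whole numbers gives whole numbers.
Syntax:
sample(x, size, replace = FALSE)
Here x is the population (a vector, or a single positive integer n, which stands for 1:n) and size is the number of elements to draw. The replace argument controls the sampling technique. With replace = FALSE, an element that has been picked is not put back into the population, so it cannot be picked again in the same sample. With replace = TRUE, it is put back and may be picked more than once.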
Examples :
> sample(x=10, size=9,replace=F) 
[1] 3 2 6 10 5 7 8 4 1
> sample(x=10, size=9,replace=T) 
[1] 8 10 3 7 2 3 4 1 4
> sample(x=c(0, 1), size=10, replace=TRUE) 
[1] 1 0 0 1 0 0 1 1 1 0 
> sample(x=c(0, 1), size=10, replace=FALSE) 
Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'.
> sample(LETTERS, 5) 
[1] "S" "K" "T" "O" "R" 
> sample(letters, 5) 
[1] "o" "n" "s" "a" "k"
> set.seed(007) 
> sample(LETTERS, 5) 
[1] "Z" "J" "C" "B" "F" 
> set.seed(007) 
> sample(letters, 5) 
[1] "z" "j" "c" "b" "f"
> sample(month.name,5)
[1] "September" "August" "May" "November" "April"
> sample(month.abb,5) 
[1] "Mar" "Jan" "Nov" "Dec" "May"
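
The effect of the replace argument can be checked directly. The sketch below is only an illustration (the seed 42 and the variable names are arbitrary): it draws once without and once with replacement, then looks for repeated values.

```r
set.seed(42)

# A single number n is treated as the population 1:n
without_repl <- sample(x = 10, size = 10, replace = FALSE)
with_repl    <- sample(x = 10, size = 20, replace = TRUE)

# Without replacement every element appears at most once,
# so drawing all 10 gives a random ordering of 1:10
no_duplicates <- !any(duplicated(without_repl))

# With replacement the sample may be larger than the population;
# 20 draws from 10 values must contain repeats
has_duplicates <- any(duplicated(with_repl))

print(sort(without_repl))
print(no_duplicates)
print(has_duplicates)
```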

Normal Vs. Uniform Distribution :
Image source : math.stackexchange.com

The green line shows a uniform distribution over the range [−5,5]. Informally, each number in the range is equally ("uniformly") likely to be picked. 
The red line shows a normal distribution with a mean of 0 and a standard deviation of 1. Numbers close to the mean are much more likely to be picked than those far away from it, and the likelihood falls off along the bell-shaped curve: about 68% of the values lie within one standard deviation of the mean.
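
The difference between the two curves can also be seen numerically. The following sketch, with an arbitrary seed and sample size, draws from both distributions and measures how often a value lands within 1 unit of zero. For the uniform distribution on [-5, 5] the theoretical share is 2/10 = 0.2, and for the standard normal distribution it is about 0.683.

```r
set.seed(2018)
n <- 100000

uniform_draws <- runif(n, min = -5, max = 5)
normal_draws  <- rnorm(n, mean = 0, sd = 1)

# Proportion of draws that land within 1 unit of zero
p_uniform <- mean(abs(uniform_draws) < 1)  # theory: 0.2
p_normal  <- mean(abs(normal_draws) < 1)   # theory: about 0.683

print(c(uniform = p_uniform, normal = p_normal))
```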

--------------------------------------------------------------------------------------------------------
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------
