Log-normal distribution

Log-normal distribution for the prices 🤑

Recall that in our last case study, we modeled the first digit of the supermarket prices in our sample with Benford’s law. This time, let us model the prices as realizations of a random variable \(X\) with a continuous distribution.

What do you think the distribution law should be like? Can it be a normal distribution? An exponential distribution? Or do we need some other distribution to describe the associated random variable appropriately?

Let’s look at the histogram of the \(50\) prices:

Prices are strictly positive, and a few expensive items usually stretch the histogram to the right, so a log-normal distribution is a natural candidate for our model. But what is this “log-normal distribution”?

Log-normal distribution

“a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable \(X\) is log-normally distributed, then \(Y = \ln(X)\) has a normal distribution.”

So,

if a random variable \(X\) is log-normally distributed, then \(\ln X\) is normally distributed.
Since the function \(\ln(\cdot)\) is defined only for positive arguments, the log-normal distribution is a suitable model for positive-valued real random variables.
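
A short sketch of where the density below comes from, assuming \(\ln X\sim N(\mu,\sigma^2)\) and writing \(\Phi\) and \(\varphi\) for the distribution function and density of the standard normal distribution (this notation is introduced here only for illustration). For \(x>0\):

\[F_X(x)=\mathbb P(X\leq x)=\mathbb P(\ln X\leq \ln x)=\Phi\left(\frac{\ln x-\mu}{\sigma}\right),\qquad f(x)=F_X'(x)=\frac1{x\sigma}\cdot\varphi\left(\frac{\ln x-\mu}{\sigma}\right)\]

The factor \(\frac1x\) is the derivative of \(\ln x\) and comes from the chain rule.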

The density of the log-normal distribution for \(x \in(0;\infty)\) is:

\[f(x)=\frac1{x\sigma\sqrt{2\pi}}\cdot e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\]

with the resulting \(\mathbb E(X)=e^{\mu+\frac{\sigma^2}2}\) and \(\text{Var}(X)=(e^{\sigma^2}-1)\cdot e^{2\mu +\sigma^2}.\)
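
As a quick numerical sanity check of the two formulas above, the following sketch integrates the density numerically and compares the result with the closed-form expressions. The parameter values mu and sigma are arbitrary assumptions chosen for illustration, not estimates from the price data.

```r
# Illustrative parameter values (assumptions, not estimated from the prices)
mu <- 0.5
sigma <- 0.5

# Log-normal density written out exactly as in the formula above
f <- function(x) 1 / (x * sigma * sqrt(2 * pi)) * exp(-(log(x) - mu)^2 / (2 * sigma^2))

# Mean and variance by numerical integration over (0, Inf)
mean_num <- integrate(function(x) x * f(x), 0, Inf)$value
var_num  <- integrate(function(x) (x - mean_num)^2 * f(x), 0, Inf)$value

# Closed-form expressions for comparison
mean_formula <- exp(mu + sigma^2 / 2)
var_formula  <- (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)

print(c(numerical = mean_num, formula = mean_formula))
print(c(numerical = var_num, formula = var_formula))
```

The hand-written function f should agree with R's built-in dlnorm(x, meanlog = mu, sdlog = sigma).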

Below, you can find the densities of log-normal distributions with different parameter values.
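
If the interactive plot is not available, curves of this kind can be reproduced with a few lines of R. This is a minimal sketch: the parameter pairs are assumptions chosen for display and need not match the ones used on the original page.

```r
# Grid of positive values on which the densities are evaluated
x <- seq(0.01, 6, by = 0.01)

# Illustrative (mu, sigma) pairs, chosen only for display
params <- list(c(0, 0.25), c(0, 0.5), c(0, 1), c(1, 0.5))

# One column of density values per parameter pair
dens <- sapply(params, function(p) dlnorm(x, meanlog = p[1], sdlog = p[2]))

# One line per parameter pair, with a matching legend
matplot(x, dens, type = "l", lty = 1, col = 1:4, xlab = "x", ylab = "f(x)")
legend("topright", lty = 1, col = 1:4,
       legend = sapply(params, function(p) paste0("mu = ", p[1], ", sigma = ", p[2])))
```

A larger sigma gives a more strongly right-skewed curve, while mu shifts the bulk of the distribution along the x-axis.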

😎 So, if our prices follow a log-normal distribution, we can apply the natural logarithm transformation to the prices and model them subsequently using a normal distribution! 😎

Normal distribution for the log-prices 🔬

Now recall our real sample of \(50\) prices (source: https://www.edeka24.de).

Log-normal distribution

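\[\text{prices}\sim \log N(\mu,\sigma^2)\;\Rightarrow\; \ln (\text{prices})\sim N(\mu,\sigma^2)\]

The unknown parameters \(\mu\) and \(\sigma\) are estimated from the log-prices by the sample mean \(m\) and the sample standard deviation \(s\).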

A sample of real prices (no advertisement intended).

The prices are already available in R as the variable prices. We now take their natural logarithm and model the log-prices with a normal distribution.

Proceed as follows:

Compute the natural logarithm of the sample prices and call the new variable lprices. Use the R function log(prices), or =LN() in Excel/Calc.

If you wish, you can use the following chunk to compute the log-prices in R:

lprices = log(prices)

Useful R-functions

m = mean(...) to compute the sample mean,
s = sd(...) to compute the sample standard deviation,
dnorm(x,m,s) to evaluate, at the points \(x\), the density of a normal distribution with mean \(m\) and standard deviation \(s\),
pnorm(x,m,s) to compute the probability \(\mathbb P(X\leq x)\) for \(X\) following a normal distribution with mean \(m\) and standard deviation \(s\).
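
To get a feeling for these four functions, here is a small sketch on a toy vector and on the standard normal distribution. The names v, m0, s0 and the values are made up for illustration and have nothing to do with the price data.

```r
# Toy data, purely illustrative
v <- c(1, 2, 3)
m_v <- mean(v)   # sample mean
s_v <- sd(v)     # sample standard deviation (divides by n - 1)

# Standard normal distribution: mean 0, standard deviation 1
m0 <- 0
s0 <- 1
d0 <- dnorm(0, m0, s0)                          # density at 0, equals 1/sqrt(2*pi)
p0 <- pnorm(0, m0, s0)                          # P(X <= 0), equals 0.5 by symmetry
p_int <- pnorm(1, m0, s0) - pnorm(-1, m0, s0)   # P(-1 < X <= 1)

print(c(mean = m_v, sd = s_v))
print(c(d0 = d0, p0 = p0, p_int = p_int))
```

Interval probabilities are always obtained as a difference of two pnorm values, which is the pattern used for the log-prices later on.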

Compute the sample mean m and the sample standard deviation s of the log-prices:

m = mean(lprices)
s = sd(lprices)
# print out the results
print(paste("the sample mean is",round(m,4),"and the standard deviation is",round(s,4)))

Plot the density of the resulting normal distribution. Use the R command x=seq(from=-5,to=5,by=0.01) to create a sequence of values, and fx=dnorm(x,mean=m,sd=s) to evaluate the density at these values. You can use plot(x,fx,type="l") to show the plot.

x = seq(-5,5,0.01) # sequence of values
fx = dnorm(x,m,s) # density at the values
plot(x,fx, type="l") # plot of type "l"=lines
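
The steps so far can also be tried outside the tutorial environment, where the variable prices is not available. The sketch below uses simulated data as a stand-in: the vector prices_sim and the parameters 0.5 and 0.8 are assumptions made up for illustration, not the real shop prices. It additionally draws the histogram of the log-prices under the fitted density, which gives a rough visual check of the normal model.

```r
# Simulated stand-in for the real data: 50 draws from a log-normal distribution.
# The parameters 0.5 and 0.8 are arbitrary assumptions, not estimates from the shop prices.
set.seed(1)
prices_sim <- rlnorm(50, meanlog = 0.5, sdlog = 0.8)

# Step 1: log-transform the prices
lprices_sim <- log(prices_sim)

# Step 2: estimate the parameters of the normal model
m_sim <- mean(lprices_sim)
s_sim <- sd(lprices_sim)

# Step 3: histogram on the density scale with the fitted normal density on top
hist(lprices_sim, freq = FALSE, main = "Simulated log-prices", xlab = "log-price")
curve(dnorm(x, m_sim, s_sim), add = TRUE)
```

With only 50 observations the histogram will be bumpy, so a perfect match with the curve should not be expected.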

Compute the following probabilities according to the log-normal model:

If you wish, you can use the following chunk to compute the probabilities in R. Note that the function pnorm(number,m,s) computes \(\mathbb P(X\leq \text{number})\) for \(X\sim N(m,s^2)\), so a threshold on the price scale has to be log-transformed before it is passed to pnorm.

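# thresholds taken on the log scale: these are statements about ln(price)
pnorm(3,m,s)                          # P(ln(price) <= 3), i.e. P(price <= exp(3))
pnorm(3,m,s)- pnorm(1,m,s)            # P(1 < ln(price) <= 3)
# thresholds taken on the price scale: log-transform them first
pnorm(log(3),m,s)                     # P(price <= 3)
pnorm(log(3),m,s)- pnorm(log(1),m,s)  # P(1 < price <= 3)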
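
The log-transformed thresholds in the chunk above can be cross-checked against R's built-in log-normal distribution function plnorm, which takes the threshold on the original scale. In this sketch, m_demo and s_demo are placeholder values assumed for illustration; in the tutorial they would be the estimates m and s computed from lprices.

```r
# Placeholder estimates for illustration only (not computed from the real prices)
m_demo <- 0.5
s_demo <- 0.8

# P(price <= 3) via the normal model for the log-price
p_norm <- pnorm(log(3), m_demo, s_demo)
# Same probability directly from the log-normal distribution function
p_lnorm <- plnorm(3, meanlog = m_demo, sdlog = s_demo)

# P(1 < price <= 3), computed both ways
q_norm <- pnorm(log(3), m_demo, s_demo) - pnorm(log(1), m_demo, s_demo)
q_lnorm <- plnorm(3, m_demo, s_demo) - plnorm(1, m_demo, s_demo)

print(c(p_norm = p_norm, p_lnorm = p_lnorm))
print(c(q_norm = q_norm, q_lnorm = q_lnorm))
```

Both routes give the same numbers because \(\mathbb P(X\leq c)=\mathbb P(\ln X\leq \ln c)\) for every \(c>0\).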

Log-normal distribution

Maria Osipenko

07.03.2024
