Python Lognormal Probability Plot


I want to plot the CDF of some data on a lognormal probability graph, like the one shown below:

enter image description here

I want the axis scales on my plot to look like that, only flipped (with probability on the x-axis). Note that the y-axis above is NOT simply a logarithmic scale. Also, I'm not sure why the x-axis above repeats 1-9 instead of going on to 10-99 etc., but ignore that part.

Here is what I have so far. I am using the method to make a CDF as outlined here

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 3., 1.  # mean and standard deviation of the underlying normal distribution
data = np.random.lognormal(mu, sigma, 1000)

# Make CDF
dataSorted = np.sort(data)
dataCdf = np.linspace(0, 1, len(dataSorted))

plt.plot(dataCdf, dataSorted)
plt.gca().set_yscale('log')
plt.xlabel('probability')
plt.ylabel('value')
plt.show()

enter image description here

Now I just need a way to scale my x-axis like the y-axis is on the picture above.


Answers:


One way to tackle this problem is to use a symmetric log scale, called symlog.

Symlog is a logarithmic scale that behaves linearly within some range around 0 (where a normal log plot would show infinitely many decades), so that a logarithmic graph crossing 0 becomes possible.

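Symlog can be set in matplotlib using ax.set_xscale('symlog', linthresh=0.1), where linthresh denotes the linear range around zero. (Before Matplotlib 3.3 the keyword was axis-specific, linthreshx for the x-axis; that spelling was deprecated in 3.3 and later removed.)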
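To see what the linear threshold does in isolation, here is a minimal sketch, assuming Matplotlib 3.3 or newer (where the keyword is spelled linthresh). The function name symlog_demo and the cubic test curve are made up for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np


def symlog_demo(linthresh=0.1):
    """Plot a curve that crosses x = 0 on a symlog x-axis (illustrative helper)."""
    x = np.linspace(-10, 10, 401)
    fig, ax = plt.subplots()
    ax.plot(x, x**3)
    # The axis is linear for |x| < linthresh and logarithmic outside that range.
    ax.set_xscale('symlog', linthresh=linthresh)
    ax.set_xlabel('x (symlog)')
    return fig, ax


fig, ax = symlog_demo()
```

Calling plt.show() afterwards displays the figure; a smaller linthresh shows more decades on either side of zero.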

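Since in this case we want the center of the graph to be at 0.5 instead of 0, we can plot two graphs and join them: the left half uses the probabilities from 0 to 0.5 as they are, and the right half uses the probabilities from 0.5 to 1 shifted by -1, so that values approaching 1 approach 0 from below. To get the desired result, one can then experiment with the tick marks to be shown, as well as with the linear-threshold parameter. Below is an example.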

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker
mu, sigma = 3., 1. # mean and standard deviation
data = np.random.lognormal(mu, sigma, 1000)

#Make CDF
dataSorted = np.sort(data)
dataCdf = np.linspace(0,1,len(dataSorted))

fig, (ax1, ax2) = plt.subplots(ncols=2, sharey=True)
plt.subplots_adjust(wspace=0.00005)
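half = len(dataCdf) // 2  # integer division: slice indices must be ints in Python 3
ax1.plot(dataCdf[:half], dataSorted[:half])
# Shift the upper half by -1 so that probabilities near 1 land near 0 on the symlog axis
ax2.plot(dataCdf[half:] - 1, dataSorted[half:])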

ax1.set_yscale('log')
ax2.set_yscale('log')

# Matplotlib >= 3.3 uses 'linthresh'; older versions used 'linthreshx'
ax1.set_xscale('symlog', linthresh=0.001)
ax2.set_xscale('symlog', linthresh=0.001)

ax1.set_xlim([0.01, 0.5])
ax2.set_xlim([-0.5, -0.01])

ticks = np.array([0.01,0.1,  0.3])
ticks2 = ((1-ticks)[::-1])-1
ax1.set_xticks(ticks)
ax1.xaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax2.set_xticks(ticks2)
ax2.xaxis.set_major_formatter(matplotlib.ticker.ScalarFormatter())
# Label the shifted ticks with the original probabilities; round away float noise
ax2.set_xticklabels(np.round(ticks2 + 1, 2))

ax1.spines["right"].set_visible(False)
ax2.spines["left"].set_visible(False)
ax1.yaxis.set_ticks_position('left')
ax2.yaxis.set_ticks_position('right')

ax1.set_xlabel('probability')
ax1.set_ylabel('value')

plt.savefig(__file__+".png")
plt.show()

enter image description here
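As a point of comparison, the scale on printed probability paper is usually the inverse of the normal CDF (the probit) rather than a symlog scale. The following is a rough sketch of that different technique, not the symlog approach above: it transforms the probabilities with the standard library's statistics.NormalDist (Python 3.8+) and relabels the ticks with the original probabilities. The helper names probit and lognormal_probability_plot, the default tick positions, and the plotting positions (i - 0.5) / n are assumptions chosen for illustration.

```python
from statistics import NormalDist

import matplotlib.pyplot as plt
import numpy as np


def probit(p):
    """Inverse CDF of the standard normal, applied element-wise (illustrative helper)."""
    inv = NormalDist().inv_cdf
    return np.array([inv(float(v)) for v in np.atleast_1d(p)])


def lognormal_probability_plot(data, tick_probs=(0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99)):
    data_sorted = np.sort(data)
    n = len(data_sorted)
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1),
    # where the inverse normal CDF is finite.
    probs = (np.arange(1, n + 1) - 0.5) / n

    fig, ax = plt.subplots()
    ax.plot(probit(probs), data_sorted)
    ax.set_yscale('log')
    # Place ticks at the transformed positions, labeled with the probabilities.
    ax.set_xticks(probit(tick_probs))
    ax.set_xticklabels([str(p) for p in tick_probs])
    ax.set_xlabel('probability')
    ax.set_ylabel('value')
    return fig, ax


rng = np.random.default_rng(0)
fig, ax = lognormal_probability_plot(rng.lognormal(3.0, 1.0, 1000))
```

With these axes, lognormally distributed data should fall close to a straight line. Call plt.show() to display the figure.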