AnimalGAN Formula Breakdown

A generative adversarial network model alternative to animal studies for clinical pathology assessment. Nature Communications. Xi Chen, Ruth Roberts, Zhichao Liu & Weida Tong, 2023.

Breakdown of the equation
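The equation itself is not reproduced in these notes. Reassembled from the term definitions that follow, it reads roughly as below. The labels V(D, G) and R(G) are introduced here for illustration and are not taken from the paper. R(G) stands for the generator regularization term of point 5, whose exact formula these notes do not give.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{(c,x) \sim P(c,x)}\left[\log D(c,x)\right]
  + \mathbb{E}_{z \sim P_z,\; c \sim P_c}\left[\log\left(1 - D(G(c,z))\right)\right]
  + \lambda \, R(G)
```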

1. min_G and max_D

This is the min-max game between the two networks. The Generator (G) tries to minimize the objective, meaning it tries to make its synthetic data so similar to the real experimental data that the Discriminator is fooled. The Discriminator (D) tries to maximize the objective, meaning it tries to tell fake data from real data as accurately as possible.
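To make the min-max game concrete, the sketch below relies on a standard result for this kind of objective: for a fixed generator, the best discriminator is D*(x) = p_real(x) / (p_real(x) + p_gen(x)). Plugging D* in performs the inner max over D; comparing generators then performs the outer min over G. The 3-bin histograms are made-up toy data, conditions c are ignored, and the regularization term is left out.

```python
import math

def value_with_optimal_d(p_real, p_gen):
    """Value of the adversarial objective when D plays its best response.

    For a fixed generator, D*(x) = p_real(x) / (p_real(x) + p_gen(x)) maximizes
    E_real[log D] + E_gen[log(1 - D)]. Inputs are toy discrete histograms
    defined over the same bins.
    """
    total = 0.0
    for pr, pg in zip(p_real, p_gen):
        if pr + pg == 0:
            continue  # bin never seen by either side: contributes nothing
        d_star = pr / (pr + pg)
        if pr > 0:
            total += pr * math.log(d_star)        # first term: real samples
        if pg > 0:
            total += pg * math.log(1.0 - d_star)  # second term: generated samples
    return total

# Hypothetical 3-bin histograms of one clinical pathology value.
p_real = [0.2, 0.5, 0.3]
candidates = {
    "far": [0.6, 0.3, 0.1],
    "close": [0.25, 0.45, 0.3],
    "matched": [0.2, 0.5, 0.3],
}
for name, p_gen in candidates.items():
    print(name, round(value_with_optimal_d(p_real, p_gen), 4))
print("-log 4 =", round(-math.log(4), 4))
```

The generator wants the lowest value, and the "matched" generator reaches the minimum, -log 4 (about -1.386), because there D* equals 0.5 in every bin.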

2. The loss function

This is the score that G and D compete over. Training alternates between updating D to push the score up and updating G to push it down.
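In practice each expectation in the score is estimated as an average over a batch of samples. A minimal sketch, assuming D's outputs on real and generated samples are already available as numbers between 0 and 1 (the regularization term is omitted):

```python
import math

def gan_score(d_real, d_fake, eps=1e-12):
    """Batch estimate of the adversarial part of the objective.

    d_real: discriminator outputs on real samples.
    d_fake: discriminator outputs on generated samples.
    eps only guards against log(0); it is a convenience, not part of the formula.
    """
    first = sum(math.log(max(d, eps)) for d in d_real) / len(d_real)
    second = sum(math.log(max(1.0 - d, eps)) for d in d_fake) / len(d_fake)
    return first + second

# D separates real from fake well: score close to 0 (what D pushes toward).
sharp = gan_score([0.95, 0.9, 0.97], [0.05, 0.1, 0.02])
# D outputs 0.5 everywhere: score is 2 * log(0.5).
confused = gan_score([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
# Fakes fool D (D rates them near 1): score drops, which is what G pushes toward.
fooled = gan_score([0.95, 0.9, 0.97], [0.9, 0.95, 0.99])
print(round(sharp, 3), round(confused, 3), round(fooled, 3))
```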

3. The First Term:

E = expected value, i.e. the average over many samples.
c = condition information: chemical structure, dose level and treatment duration (compound + dose + time), e.g. acetaminophen + high dose + 28 days.
x = real clinical pathology data from rats in experiments, including parameters such as glucose and creatinine.
P(c,x) = joint probability distribution of conditions and real clinical pathology data in the training set, i.e. data from actual experiments.
D(c,x) = the discriminator's output when given a condition and real data. It is a probability between 0 and 1: a value near 1 means D judges the sample real, a value near 0 means D judges it fake.
log D(c,x) = rewards D for assigning a high probability to real samples, so D wants this value to be large. For example, with base-10 logs, log 0.99 ≈ -0.004 while log 0.1 = -1, so the term is largest (close to 0) when D(c,x) is close to 1 on real data. GAN implementations usually use the natural log, which only rescales these numbers.
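The log values above can be checked directly. The sketch also assumes a sigmoid output layer, a common way to keep D(c,x) between 0 and 1 (not confirmed here for AnimalGAN); the raw scores for the real samples are hypothetical.

```python
import math

def discriminator_prob(logit):
    """Squash a raw discriminator score into (0, 1) with the logistic sigmoid.

    A sigmoid output layer is assumed here; it is what keeps D(c, x) in (0, 1).
    """
    return 1.0 / (1.0 + math.exp(-logit))

# Same probabilities in base 10 (as in the notes) and natural log.
for d in (0.99, 0.5, 0.1):
    print(f"D={d}: log10={math.log10(d):.3f}  ln={math.log(d):.3f}")

# First term over a small batch of real samples: D wants this average near 0.
real_logits = [4.0, 3.0, 5.0]  # hypothetical raw scores D gives to real data
first_term = sum(math.log(discriminator_prob(s)) for s in real_logits) / len(real_logits)
print(round(first_term, 4))
```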

4. The Second Term:

This term covers generated (synthetic) data.
z = random noise vector that introduces biological variability, since individual animals always vary.
Pz = probability distribution of the noise, usually Gaussian (normal).
Pc = distribution of experimental conditions, covering all compound-dose-time combinations.
G(c,z) = the generator's output: synthetic clinical pathology data produced from the condition and the random noise.
D(G(c,z)) = the discriminator's evaluation of the generated (fake) sample. When this reaches 0.5, the discriminator can no longer tell real from fake.
log(1 - D(G(c,z))) = rewards the discriminator for detecting a fake. D wants this value to be large (D(G(c,z)) near 0), while the generator wants it to be low (D(G(c,z)) near 1, i.e. the fake passes as real).
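The second term is estimated by sampling a condition c from Pc and noise z from Pz, generating a fake sample and averaging log(1 - D(...)). In this sketch the condition list and `toy_generator` are hypothetical stand-ins, not AnimalGAN's data or network, and the discriminator also receives the condition, matching D(c,x) in the first term.

```python
import math
import random

# Hypothetical condition set: (compound, dose level, duration in days).
CONDITIONS = [
    ("acetaminophen", "high", 28),
    ("acetaminophen", "low", 3),
]

def toy_generator(condition, z):
    """Stand-in for G(c, z): a made-up mapping, not AnimalGAN's network.

    Returns one synthetic value (e.g. a glucose-like reading). The noise z
    shifts it, so two draws under the same condition differ, like two rats.
    """
    _, dose, days = condition
    base = 100.0 + (15.0 if dose == "high" else 5.0) + 0.2 * days
    return base + 4.0 * z

def second_term(discriminator, n_samples=1000, seed=0):
    """Monte Carlo estimate of E_{z~Pz, c~Pc}[log(1 - D(G(c, z)))]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        c = rng.choice(CONDITIONS)  # c ~ Pc (uniform over the toy set)
        z = rng.gauss(0.0, 1.0)     # z ~ Pz, standard Gaussian noise
        x_fake = toy_generator(c, z)
        total += math.log(1.0 - discriminator(c, x_fake))
    return total / n_samples

neutral = second_term(lambda c, x: 0.5)  # D cannot tell: every sample gives log(0.5)
caught = second_term(lambda c, x: 0.1)   # D flags fakes: log(0.9), close to 0
fooled = second_term(lambda c, x: 0.9)   # D is fooled: log(0.1), strongly negative
print(round(neutral, 3), round(caught, 3), round(fooled, 3))
```

D pushes toward the "caught" case and G toward the "fooled" case; the constant-output discriminators just make the direction of each player's preference easy to see.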

5. The Third Term:

This is the generator regularization loss, which forces the generator to behave smoothly: a small change in dose, time or chemical structure should produce only a gradual change in the clinical pathology parameters.
λ (lambda) = the weighting factor that controls how important regularization is. A large λ puts more emphasis on biological smoothness; a small λ puts less.
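These notes do not give the regularizer's formula, so the sketch below uses a hypothetical Lipschitz-style penalty. It nudges the dose by a small delta with the noise held fixed and penalizes output slopes above a cap (`max_slope`), so a gradual dose response costs nothing while an abrupt jump is expensive. The penalty, the cap, the toy generators and the placeholder adversarial loss are all assumptions chosen to show how λ weights the third term.

```python
import random

def smoothness_penalty(generator, doses, delta=0.05, max_slope=20.0, n_samples=200, seed=0):
    """Hypothetical smoothness regularizer (illustration only, not the paper's formula).

    For random (dose, z) pairs, nudge the dose by `delta` with z held fixed and
    measure how fast the output changes. Slopes up to `max_slope` are free;
    steeper changes are penalized quadratically.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        dose = rng.choice(doses)
        z = rng.gauss(0.0, 1.0)
        slope = (generator(dose + delta, z) - generator(dose, z)) / delta
        total += max(0.0, abs(slope) - max_slope) ** 2
    return total / n_samples

def generator_loss(adversarial_loss, penalty, lam):
    """Generator objective: adversarial part plus lambda-weighted regularization."""
    return adversarial_loss + lam * penalty

def smooth_gen(dose, z):
    return 100.0 + 10.0 * dose + 4.0 * z                    # gradual dose response

def jumpy_gen(dose, z):
    return 100.0 + (50.0 if dose > 1.0 else 0.0) + 4.0 * z  # abrupt jump at dose 1.0

doses = [0.5, 0.98, 1.5]  # hypothetical normalized dose levels
adversarial = -0.7        # placeholder for the adversarial part of G's loss
for lam in (0.001, 0.01):
    for name, gen in (("smooth", smooth_gen), ("jumpy", jumpy_gen)):
        loss = generator_loss(adversarial, smoothness_penalty(gen, doses), lam)
        print(f"lambda={lam} {name}: {loss:.2f}")
```

The smooth generator's loss is unaffected by λ, while the jumpy one is punished ten times harder when λ grows tenfold.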

