class: center, middle, inverse, title-slide

# Informed Design of Experiments?
### Martin Modrák
### 2018/06/11

---
background-image:url("enbik_img/matrix.jpg")
background-position:50% 50%
class: center bottom inverse

# Simulations!

.copyright[photo by Maurizio Pesce, CC-BY]

---

# Why & What

--

1. Design of experiments

--

    * No. of replicates, comparison groups, ...

--

1. Understanding the methods you use

--

1. Case Studies

--

    * t-test

--

    * DESeq2

---

# Power Analysis

--

* Simulations:

--

  * Easier

--

  * Test the whole process

--

  * More assumptions

---
background-image: url("enbik_img/dive_in.jpg")
background-position: 50% 0%
class: inverse, center, bottom

.copyright[photo: U.S. government work]

--

# Case Study 1
## Two-sample t-test

---

# A Hypothetical Experiment

--

* Cell culture

--

* Does unoptanium increase midichlorian production?

--

* 5 replicates

--

* Analyze with t-test, significant if `\(p < 0.05\)`

--

* Simulation assumptions
  * Unoptanium helps (`\(+2\mu g\)` on average)

--

  * `\(\mathrm{sd} = 8\mu g\)`

---

# What do we care about?

--

* Observed effect size

--

* How frequently will we claim significance

--

  * a.k.a. power

--

* But there's more!

--

* Let's simulate 10000 datasets

---
background-image: url("enbik_img/what_could_go_wrong.jpg")
background-position: 50% 0%
background-size: cover

.copyright[photo: U.S.
government work]

---

# What We Observe

![](enbik_files/figure-html/t_observed_effects-1.svg)

---

# Filter for Significance

![](enbik_files/figure-html/t_filtered_effects-1.svg)

--

**Power:**

```
## p < 0.05 in 0.0561 cases
```

---

# A Closer Look

![](enbik_files/figure-html/t_filtered_zoomed-1.svg)

--

**Type S Error** (wrong **S**ign)

--

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Type S error </th>
   <th style="text-align:left;"> 95% CI excludes true effect </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 16.9% </td>
   <td style="text-align:left;"> 36.4% </td>
  </tr>
</tbody>
</table>

---

# A Closer Look

![](enbik_files/figure-html/t_filtered_zoomed_2-1.svg)

**Type M Error** (wrong **M**agnitude)

--

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Mean exaggeration </th>
   <th style="text-align:right;"> Min. exaggeration </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 5.5 </td>
   <td style="text-align:right;"> 2.1 </td>
  </tr>
</tbody>
</table>

---
background-image: url("enbik_img/kangaroo.jpg")
background-position: 50% 0%
background-size: 60%
class: center, bottom

# Significance is Not a Savior!

---

# Impact on the Literature

--

* Published effects are exaggerated

--

* Exaggeration depends on the amount of noise

--

* Negligible in high-powered studies

--

* If a result looks too good given the noise

--

  it probably is.

---
background-image: url("enbik_img/challenge.jpg")
background-position: 50% 0%
class: center, bottom

.copyright[photo by Llann Wé, CC-BY]

--

# Case Study 2
## Differential Expression (DESeq2)

---

# Less Hypothetical Experiment

--

* Differential expression upon unoptanium stress

--

* Control, treatment, 3 replicates each

--

* 1000 genes

--

* We use DESeq2 to test for effect = `\(|log_2(fc)| > 1\)`

---

# Simulating DESeq2

--

* Where do the read counts come from?

--

  * From a previous experiment

--

* How to set `\(log_2(fc)\)`?
--

  * 80% of genes have `\(log_2(fc) = 0\)`

--

  * 0, 2, 4 and 6 for the other 20%

--

* 100 simulations each

---

# Some results

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> log_fc </th>
   <th style="text-align:right;"> True Pos. </th>
   <th style="text-align:right;"> False Pos. </th>
   <th style="text-align:right;"> Type S error </th>
   <th style="text-align:right;"> Mean exaggeration </th>
   <th style="text-align:right;"> Mean shrunk exaggeration </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> 1.8 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> NaN </td>
   <td style="text-align:right;"> NaN </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 2.8 </td>
   <td style="text-align:right;"> 2.0 </td>
   <td style="text-align:right;"> 0.1 </td>
   <td style="text-align:right;"> 3.1 </td>
   <td style="text-align:right;"> 1.9 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 76.3 </td>
   <td style="text-align:right;"> 5.0 </td>
   <td style="text-align:right;"> 0.1 </td>
   <td style="text-align:right;"> 1.3 </td>
   <td style="text-align:right;"> 1.0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 161.3 </td>
   <td style="text-align:right;"> 6.4 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> 1.0 </td>
   <td style="text-align:right;"> 0.9 </td>
  </tr>
</tbody>
</table>

We tested for `\(|log_2(fc)| > 1\)`

---

# Replicating DESeq2 results

--

* Exact experiment replication (3 replicates each)

--

* Replicated = significant in both experiments

---

# Replication results

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> log_fc </th>
   <th style="text-align:right;"> Significant 1st experiment </th>
   <th style="text-align:right;"> Replicated </th>
   <th style="text-align:right;"> Smaller effect - significant </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td
style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 4.4 </td>
   <td style="text-align:right;"> 0.3 </td>
   <td style="text-align:right;"> 0.9 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 79.9 </td>
   <td style="text-align:right;"> 38.6 </td>
   <td style="text-align:right;"> 0.7 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 169.2 </td>
   <td style="text-align:right;"> 141.4 </td>
   <td style="text-align:right;"> 0.6 </td>
  </tr>
</tbody>
</table>

---

# DESeq2 Summary

--

* DE experiments have low power

--

* DESeq2 rocks!

--

* DESeq2 avoids false positives at all costs

--

  -> high false negatives

---
class:inverse

# Take Home

--

* Worry about Type S & M errors

--

* Simulate experiments before investing money

--

* Simulate to understand published research

--

* Code available at https://github.com/cas-bioinf/statistical-simulations

--

.thanks[
Thanks for your attention!
]

---

# What about 6 replicates?

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> log_fc </th>
   <th style="text-align:right;"> True Pos. </th>
   <th style="text-align:right;"> False Pos.
</th>
   <th style="text-align:right;"> Type S error </th>
   <th style="text-align:right;"> Mean exaggeration </th>
   <th style="text-align:right;"> Mean shrunk exaggeration </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> 0.7 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> NaN </td>
   <td style="text-align:right;"> NaN </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 8.1 </td>
   <td style="text-align:right;"> 0.8 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> 1.8 </td>
   <td style="text-align:right;"> 1.4 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 150.9 </td>
   <td style="text-align:right;"> 2.4 </td>
   <td style="text-align:right;"> 0.1 </td>
   <td style="text-align:right;"> 1.1 </td>
   <td style="text-align:right;"> 1.0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 184.1 </td>
   <td style="text-align:right;"> 3.4 </td>
   <td style="text-align:right;"> 0.0 </td>
   <td style="text-align:right;"> 1.0 </td>
   <td style="text-align:right;"> 0.9 </td>
  </tr>
</tbody>
</table>

We tested for `\(|log_2(fc)| > 1\)`
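
---

# Sketch: simulating the t-test case study

The first case study (5 replicates per group, true effect of +2 µg, sd of 8 µg, 10000 simulated datasets) can be approximated in a few lines. This is a minimal sketch, not the code behind the slides: it is written in Python with the standard library only, it assumes normally distributed measurements and a pooled-variance (Student) t-test, and it hard-codes the two-sided 5% critical value for 8 degrees of freedom (about 2.306). The names `simulate_once` and `summarize` are hypothetical, and "mean exaggeration" is assumed here to mean the average of |observed effect| / true effect among significant results. The numbers it prints will therefore not match the slides exactly.

```python
import math
import random
import statistics

# Assumptions taken from the "A Hypothetical Experiment" slide.
TRUE_EFFECT = 2.0      # average increase, in micrograms
SD = 8.0               # standard deviation, in micrograms
N_REPLICATES = 5
N_SIMULATIONS = 10000

# Assumption of this sketch: Student t-test with pooled variance.
# Two-sided 5% critical value of the t distribution with 8 degrees of freedom.
T_CRITICAL = 2.306


def simulate_once(rng):
    """Simulate one experiment and return (observed effect, is significant)."""
    control = [rng.gauss(0.0, SD) for _ in range(N_REPLICATES)]
    treated = [rng.gauss(TRUE_EFFECT, SD) for _ in range(N_REPLICATES)]
    effect = statistics.fmean(treated) - statistics.fmean(control)
    # Equal group sizes, so the pooled variance is the mean of the two variances.
    pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
    std_error = math.sqrt(pooled_var * 2 / N_REPLICATES)
    return effect, abs(effect / std_error) > T_CRITICAL


def summarize(n_simulations=N_SIMULATIONS, seed=1):
    """Estimate power, Type S error rate and mean exaggeration by simulation."""
    rng = random.Random(seed)
    results = [simulate_once(rng) for _ in range(n_simulations)]
    significant = [effect for effect, is_sig in results if is_sig]
    summary = {"power": len(significant) / n_simulations}
    if significant:
        # Type S: significant results whose sign is opposite to the true effect.
        summary["type_s"] = sum(e < 0 for e in significant) / len(significant)
        # Type M: how much significant results overstate the true effect.
        summary["mean_exaggeration"] = statistics.fmean(
            abs(e) / TRUE_EFFECT for e in significant
        )
    else:
        summary["type_s"] = float("nan")
        summary["mean_exaggeration"] = float("nan")
    return summary


if __name__ == "__main__":
    print(summarize())
```

Changing `N_REPLICATES` requires updating `T_CRITICAL` to match the new degrees of freedom; a statistics library that provides the t distribution would remove that manual step.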