background-image: url(img/title_background.jpeg)
background-size: cover
class: title-slide, inverse

.pull-left[
# The Irruption Of The AI

##

### Facundo Muñoz<br/>facundo.munoz@cirad.fr<br/><i class="fab fa-twitter"></i> <i class="fab fa-github"></i> famuvie<br /><img src="img/CirBlanc_L230px.png" style="width: 25%" />

<!-- ![](img/CirBlanc_L230px.png) -->
]

.pull-right[
<!-- ![](https://datacarpentry.org/rr-organization1/fig/files_messy_tidy.png) -->
]

SEIO2022 Granada _The role of Biostatistics in Health Data Science_

<!-- ![](img/band_praps.png) -->

???

<!-- --- -->
<!-- layout: true -->
<!-- <a class="footer-link" href="https://umr-astre.pages.mia.inra.fr/training/notions_stats/">Notions de base en statistiques - umr-astre.pages.mia.inra.fr/training/notions_stats/</a> -->

---
class: impact, inverse, middle, center
background-image: url(img/gold-rush.jpeg)
background-size: contain
background-position: center

# The AI rush

---

.pull-left[
![](img/villani-cover.png)
]

--

.pull-right[
![](img/villani-focus.png)
]

---
Veterinary Research (2021)
Research perspectives on animal health in the era of artificial intelligence
Ezanno, Pauline; Picault, Sébastien; Beaunée, Gaël et al.
--

.pull-left[
- Diagnosis and disease case detection
- More accurate predictions and reduced errors
]

--

.pull-right[
- Representing complex biological systems more realistically
- Speeding up decisions and improving risk analyses
]

???

---

.pull-left[
![](img/deep-learning_ensae.png)
]

.pull-right[
Using recurrent neural networks to predict epidemiological dynamics

![](img/prediction-sirs.png)
]

???

Surprising results: impressive accuracy with only 20 % of training data and no model structure whatsoever.

---
class: middle

.pull-left[
# AI vs Biostatistics
# contenders?
]

--

.pull-right[
- ## Budget and funding
- ## Education
- ## (Health) Sciences
]

???

- Funding specifically allocated to applications of AI
- Bachelor's and master's degrees in Statistics are in decline in favour of Data Science and AI
- Increased emphasis on predictive and personalised medicine at the expense of understanding causal mechanisms

---
background-image: url(img/ml-algorithms.jpeg)
background-size: contain
background-position: center

???

Traditional statistical models like linear or logistic regression and Gaussian mixture models, and topics like dimensionality reduction, are regarded as merely dated "predictive algorithms", often outperformed by more flexible and computationally intensive alternatives

---
background-image: url(img/stats-ai.jpeg)
background-size: contain
background-position: center

???

On the other hand, the statistical community often dismisses progress in AI methods

---

.pull-left[
![](img/ai-ifs.png)
]

--

.pull-right[
![](img/ai-algorithm.png)
]

---
class: inverse, middle, impact

# The gap of Biostatistics

---

.pull-left[
## Biostatistics shaped the field of Health Science in the 20th century, with methods that were appropriate for ___finding the big effects___
Gelman and Vehtari (2021)
]

--

.pull-right[
- sampling theory
- experimental design
- classical and Bayesian decision analysis
- confidence intervals and hypothesis testing
- maximum likelihood
- the analysis of variance ...
]

???

The theory of hypothesis testing developed by Neyman, Pearson and Fisher in the 1920s and 1930s became one of the twentieth century’s most influential pieces of applied mathematics.

See Gelman and Vehtari for significant advances in the period 1920-1970.

Fisher introduced randomisation in agricultural experiments. During the late 1940s, two decades later, Bradford Hill introduced it to medical research, inaugurating the modern controlled clinical trial.

---

# The information revolution
???

- Things changed. We have access to loads of data and lots of computing power. We need to find more subtle effects, interactions and weak signals in noisy, observational data.

---

.pull-left[
## Biostatistics has evolved __a lot__ over the last 50 years
Gelman and Vehtari (2021)
]

--

.pull-right[
- counterfactual causal inference
- bootstrapping and simulation-based inference
- overparameterized models and regularization
- Bayesian multilevel models
- generic computation algorithms
- adaptive decision analysis
- robust inference
- exploratory data analysis ...
]

---

# Yet, the t-test remains...

![](img/ttest-diagram.png)
Technology Innovations in Statistics Education (2007)
The Introductory Statistics Course: A Ptolemaic Curriculum?
Cobb, George W
???

In the field, researchers continue publishing studies with t-tests and looking at partial views of the problem which fit into the classical methods, rather than using modern methodologies. (prog. stats UG)

---
class: inverse, middle, center
background-image: url(img/gap.jpeg)
background-size: contain
background-position: center

# Perceived __gap__ between needs and available skills
# some think that AI is bridging it

---
class: middle

.pull-left[
> _Who knows why people do what they do? __The point is they do it__, and we can track and measure it with unprecedented fidelity_

.pull-right[Chris Anderson, 2008]
]

.pull-right[
[![](img/end-theory.png)](https://www.wired.com/2008/06/pb-theory/)
]

---
class: middle

.pull-left[
> _With enough data, the numbers speak for themselves. (...) We can __stop looking for models__. We can analyze the data without hypotheses_

.pull-right[Chris Anderson, 2008]
]

.pull-right[
[![](img/end-theory.png)](https://www.wired.com/2008/06/pb-theory/)
]

---
background-image: url(img/guardian-post-theory.png)
background-size: contain
background-position: center

---
class: middle, center, inverse

# Data-driven predictive methods are
# __not sufficient for the Scientific enterprise__

---

.pull-left[
# Big noise

> _The rise of __Big Data__ has the potential to help us predict the future, yet __much of it is misleading, useless or distracting__._

.pull-right[Nate Silver, 2012]
]

.pull-right[
![](img/silver-signal.jpg)
]

---
background-image: url(img/spurious-correlations.png)
background-size: contain
background-position: center

---

<!-- ```{r message = FALSE} -->
<!-- drop_name(bib_path, "pearl_book_2018", use_xaringan = TRUE) |> -->
<!-- htmltools::includeHTML() -->
<!-- ``` -->

.pull-left[
> ## Two people who believe in two different causal diagrams can analyze the same data and may never come to the same conclusion, __regardless of how “big” the data are__
]

.pull-right[
![](img/pearl-book.jpg)
]

---

.pull-left[
- The __identification problem__
- Confounding variables and incorrect causal relationships can improve prediction
- Prediction of the impact of interventions (counterfactual analysis) requires accurate representation of the causal relationships
]

.pull-right[
![](img/mcelreath-statistical.jpg)
]

---
Ecology Letters (2022)
The forecast trap
Boettiger, Carl
> _the model that makes the better prediction is not necessarily the one that makes the better policy_

---
Journal of the American Statistical Association (2021)
What are the Most Important Statistical Ideas of the Past 50 Years?
Gelman, Andrew; Vehtari, Aki
> _there is no way to avoid statistical issues of generalizing from sample to population, generalizing from treatment to control group, and generalizing from observed data to underlying constructs of interest_

---
class: inverse, center, middle

# Embracing __hybridisation__

???

- Earlier developments in statistics came from within applied fields such as psychology and genetics

---
class: middle

.pull-left[
# AI has undergone phenomenal progress in the last decades
]

.pull-right[
- Large data and complex models
- Unstructured input (e.g. images, video, natural language, ...)
- Scalable training
]

---

# Promising research areas

.pull-left[
- Combinations of existing methods
- Interpretable machine learning
- High-dimensional non-parametric modelling
- Interface between causal inference and decision making
]

.pull-right[
Gelman and Vehtari (2021)
]

---
arXiv:2107.04562 [cs, stat] (2021)
The Bayesian Learning Rule
Khan, Mohammad Emtiyaz; Rue, Håvard
- Machine learning and deep learning algorithms and Bayesian methods are special cases of a general, unifying principle
- Quantify the uncertainty of DL methods using Bayesian principles
- Derive new algorithms

---
class: inverse, middle, center, impact

# Conclusions

---

# Improve __dissemination__ of modern statistical methods

--

# AI methods have __much value__ to offer

--

# AI methods focus on __prediction__

--

# __Data__ (no matter how large) and __prediction__ are not sufficient for scientific progress

--

# Find synergies and embrace __hybridisation__ of methods

---
background-image: url(img/title_background.jpeg)
background-size: cover
class: title-slide, inverse

# Thanks!

Slides created with the R package [**xaringan**](https://github.com/yihui/xaringan).

Behind the scenes: [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr), and [RMarkdown](https://rmarkdown.rstudio.com).

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.