class: title-slide, inverse

.pull-left[

# Usefulness and limitations of descriptive statistics

##

### Facundo Muñoz<br/>facundo.munoz@cirad.fr<br/>

![](img/CirBlanc_L230px.png)

]

.pull-right[

![](img/same-stats-different-graphs-image-1920x1000.gif)

.credit[[Justin Matejka, George Fitzmaurice](https://www.research.autodesk.com/publications/same-stats-different-graphs-generating-datasets-with-varied-appearance-and-identical-statistics-through-simulated-annealing/)]

]

???

---
layout: true

<a class="footer-link" href="https://umr-astre.pages.mia.inra.fr/training/notions_stats/">Basic notions of statistics - umr-astre.pages.mia.inra.fr/training/notions_stats/</a>

---

# Usefulness

Computing summary statistics to summarize this

.small[
56, 31, 56, 8, 32, 14, 36, 56, 19, 1, 3, 104, 43, 44, 72, 9, 28, 25, 27, 55, 20, 16, 16, 7, 23, 40, 48, 64, 22, 55, 95, 15, 49, 52, 50, 10, 65, 12, 39, 36, 3, 26, 23, 20, 43, 108, 53, 38, 4, 8, 3, 13, 66, 67, 50, 61, 36, 38, 29, 9, 81, 3, 26, 12, 36, 37, 70, 1, 35, 12, 50, 35, 9, 54, 47, 8, 47, 2, 29, 61, 38, 41, 23, 24, 1, 9, 11, 10, 29, 47, 71, 38, 49, 65, 18, 0, 16, 9, 19, 36, 60, 24, 25, 44, 55, 3, 57, 83, 84, 35, 4, 35, 26, 22, 2, 14, 19, 30, 19, 68, 11, 75, 48, 32, 36, 39, 50, 11, 0, 63, 82, 26, 3, 82, 73, 19, 33, 48, 8, 10, 53, 20, 71, 75, 76, 54, 44, 5, 22, 94, 29, 8, 98, 9, 89, 1, 101, 7, 21, 52, 42, 21, 116, 3, 44, 29, 27, 16, 6, 44, 3, 28, 38, 29, 10, 10
]

---

# Usefulness

Computing summary statistics to summarize this in about ten numbers:

.pull-left[

.small[
56, 31, 56, 8, 32, 14, 36, 56, 19, 1, 3, 104, 43, 44, 72, 9, 28, 25, 27, 55, 20, 16, 16, 7, 23, 40, 48, 64, 22, 55, 95, 15, 49, 52, 50, 10, 65, 12, 39, 36, 3, 26, 23, 20, 43, 108, 53, 38, 4, 8, 3, 13, 66, 67, 50, 61, 36, 38, 29, 9, 81, 3, 26, 12, 36, 37, 70, 1, 35, 12, 50, 35, 9, 54, 47, 8, 47, 2, 29, 61, 38, 41, 23, 24, 1, 9, 11, 10, 29, 47, 71, 38, 49, 65, 18, 0, 16, 9, 19, 36, 60, 24, 25, 44, 55, 3, 57, 83, 84, 35, 4, 35, 26, 22, 2, 14, 19, 30, 19, 68,
11, 75, 48, 32, 36, 39, 50, 11, 0, 63, 82, 26, 3, 82, 73, 19, 33, 48, 8, 10, 53, 20, 71, 75, 76, 54, 44, 5, 22, 94, 29, 8, 98, 9, 89, 1, 101, 7, 21, 52, 42, 21, 116, 3, 44, 29, 27, 16, 6, 44, 3, 28, 38, 29, 10, 10
]

]

.pull-right[

.small[
<table>
 <thead>
  <tr> <th style="text-align:left;"> Statistic </th> <th style="text-align:right;"> Value </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> N </td> <td style="text-align:right;"> 176 </td> </tr>
  <tr> <td style="text-align:left;"> Mean </td> <td style="text-align:right;"> 35 </td> </tr>
  <tr> <td style="text-align:left;"> Median </td> <td style="text-align:right;"> 30 </td> </tr>
  <tr> <td style="text-align:left;"> Standard deviation </td> <td style="text-align:right;"> 26 </td> </tr>
  <tr> <td style="text-align:left;"> Range </td> <td style="text-align:right;"> 116 </td> </tr>
  <tr> <td style="text-align:left;"> Min </td> <td style="text-align:right;"> 0 </td> </tr>
  <tr> <td style="text-align:left;"> Max </td> <td style="text-align:right;"> 116 </td> </tr>
  <tr> <td style="text-align:left;"> Q1 </td> <td style="text-align:right;"> 13 </td> </tr>
  <tr> <td style="text-align:left;"> Q3 </td> <td style="text-align:right;"> 50 </td> </tr>
 </tbody>
</table>
]

]

???

That is a compression ratio of almost 20x (176 values down to 9 numbers)! And the larger the dataset, the more worthwhile the summary becomes.

---
class: inverse, middle, center

# But beware!

![](https://media.giphy.com/media/Y5wlazC8lSVuU/giphy.gif)

---

# Anscombe's quartet (1973)

.pull-left[

Same:

🗹 __mean__ of `\(x\)` and `\(y\)`

🗹 __standard deviation__ of `\(x\)` and `\(y\)`

🗹 __correlation__ between `\(x\)` and `\(y\)`

🗹 __regression__ line

]

.pull-right[

![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Anscombe%27s_quartet_3.svg/638px-Anscombe%27s_quartet_3.svg.png)

.credit[[Wikipedia](https://en.wikipedia.org/wiki/Anscombe%27s_quartet)]

]

???
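The claim on this slide can be checked numerically. What follows is a minimal sketch in Python, assuming the quartet values as commonly reproduced from Anscombe (1973) and typed in here by hand; the names `ANSCOMBE`, `summarize` and `summaries` are illustrative and not part of the original deck.

```python
from statistics import mean, stdev

# Anscombe's quartet, typed in by hand (assumed to match the 1973 paper).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x is shared by sets I-III
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the descriptive statistics listed on the slide for one dataset."""
    mx, my = mean(x), mean(y)
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "sd_x": stdev(x),  # sample standard deviation (n - 1)
        "sd_y": stdev(y),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, even though the scatter plots look nothing alike, which is the point the slide makes.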
The four (x, y) datasets have the same descriptive statistics.

---
background-image: url(img/same-stats-different-graphs-image-1920x1000.gif)
background-size: contain
background-position: center
class: bottom, inverse, center

# The datasaurus dozen

.credit[[Justin Matejka, George Fitzmaurice](https://www.research.autodesk.com/publications/same-stats-different-graphs-generating-datasets-with-varied-appearance-and-identical-statistics-through-simulated-annealing/) ]

---
class: inverse, middle, center

## Never rely on summary statistics alone

# always __visualize__ your data

.pull-right[
.right[
.quote[» Above all else show the data.]

.credit[Edward Tufte. _The Visual Display of Quantitative Information_, 1986.]
]
]

---
class: middle

# Thank you!

Slides created with the R package [**xaringan**](https://github.com/yihui/xaringan).

Built on [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr), and [R Markdown](https://rmarkdown.rstudio.com).

<a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/deed.fr"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/deed.fr">Creative Commons Attribution-ShareAlike 4.0 International License</a>.