class: center, middle, inverse, title-slide

# Data science: A game changer for science and innovation
### Thiyanga S. Talagala, PhD
### Department of Statistics, University of Sri Jayewardenepura
### 3 February 2022

---
class: middle, center

# **Data science:** A game changer for science and innovation

---
background-image: url(orange.jpeg)
background-position: center
background-size: cover
class: center, middle

---
background-image: url(f1.png)
background-position: center
background-size: contain
class: center, middle

---
background-image: url(f2.png)
background-position: center
background-size: contain
class: center, middle

---
background-image: url(f3.png)
background-position: center
background-size: contain
class: center, middle

---
background-image: url(f4.png)
background-position: center
background-size: contain
class: center, middle

---
## Algorithm

A set of instructions used to solve a problem

--

**MEDIPI** (**MEDI**cinal **P**lant **I**dentification) algorithm

![](m1.png)

---

**MEDIPI** (**MEDI**cinal **P**lant **I**dentification) algorithm

![](m2.png)

---

**MEDIPI** (**MEDI**cinal **P**lant **I**dentification) algorithm

![](imgleaf.png)

---

**MEDIPI** (**MEDI**cinal **P**lant **I**dentification) algorithm

![](m3.png)

---
class: middle, center

# Can you patent an algorithm?
---
background-image: url(p1.png)
background-position: center
background-size: contain

---
### Largest machine learning and artificial intelligence (AI) patent owners - 2020

<img src="innovation_files/figure-html/unnamed-chunk-1-1.png" width="100%" />

Data: https://www.statista.com/statistics/1062360/autonomous-driving-patent-owners-japanese-authority/

---
class: middle, center

# Facebook: Scan photos for brands and see what products you like

---
background-image: url(frenchfries.jpg)
background-position: center
background-size: contain

---
class: middle, center

# Better chance of securing a patent

---
background-image: url(bp.jpg)
background-position: center
background-size: contain

---

- Model building: Given data, predict the likelihood of **Preeclampsia** (a pregnancy complication characterized by high blood pressure and signs of damage to another organ system, most often the liver and kidneys)

`$$Y = f(X)$$`

- Incorporate this into the device to generate an alert when the likelihood of having Preeclampsia is high.

---
background-image: url(sydney.jpg)
background-position: center
background-size: cover

---
background-image: url(s1.png)
background-position: center
background-size: contain

---
class: middle, center

# Role of Statisticians

---
class: inverse

# Prof. Laleen Karunanayake

![](l1.png)

---
class: inverse

# Prof. Upul Subasinghe

![](l2.png)

---
# Statistics: The Science of Data

1. Data collection

    - Design of Experiments

2. Data visualization

3. Data analysis

4. Interpretation of Results

---
class: middle, center

## Open Science

"the movement **to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of society**, amateur or professional"

source: https://en.wikipedia.org/wiki/Open_science

---
class: middle, center

## Reproducibility

"Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them."

source: https://www.coursera.org/learn/reproducible-research

---
## Open source software authored by me

`tea`: R package for tea-exporting countries

`mozzie`: R package for dengue cases in Sri Lanka

`colmozzie`: R package for dengue cases and climate variables in Colombo, Sri Lanka

`m4comp2018`: R package for M4 Competition time series data

`DSjobtracker`: R package containing information related to data science job advertisements. What skills and qualifications are required for data science related jobs?

`MedLEA`: R package providing morphological and structural features of 471 medicinal plant leaves and 1099 leaf images of 31 species, with 29-45 images per species.

---
## Open source software authored by me

`ceylon`: R package to plot maps of Sri Lanka

`covid19srilanka`: R package providing a tidy-format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic in Sri Lanka.

`seer`: R package for feature-based time series forecasting.

`tsfeatures`: R package providing methods for extracting various features from time series data.

`explainer`: Take a peek inside a random forest.

`tsdataleaks`: R package for detecting data leakages in time series forecasting competitions.

`nic`: Nature-inspired colour palette for data visualization.
---
background-image: url(nic.png)
background-position: center
background-size: contain

---
# Impact

## `seer` package downloads

<img src="innovation_files/figure-html/unnamed-chunk-2-1.png" width="100%" />

---
background-image: url(keynote.png)
background-position: center
background-size: contain

---
# Small things matter a lot!

- Give it a catchy name

- Add a logo

![](hex.png)

---
class: center, middle

# Thank You!
<i class="fab fa-twitter fa-3x faa-float animated " style=" color:lightblue;"></i>
<i class="fab fa-github fa-3x faa-float animated " style=" color:black;"></i>
# @thiyangt

### web: https://thiyanga.netlify.app

# email: ttalagala@sjp.ac.lk