**ACM-IKDD Summer School on Data Science**

**July 4th – 16th at IIT Gandhinagar, sponsored by ShareChat**

This school covers the algorithmic, statistical, and engineering challenges that arise at the various stages of data analysis. Each sub-topic will include both theoretical and hands-on components. We will cover how to collect and clean data, probabilistic models for data, and the algorithmic challenges that arise when scaling these models to large datasets. We will also take deep dives into data-driven modeling in three scientific domains: natural language processing, computer vision, and earth and climate sciences. Lastly, we will learn how to deploy a machine learning model in production and keep it up to date. There will be multiple lectures on each sub-topic, taking participants from the basics to some of the cutting-edge questions in these areas.

## Topics

- Introduction to the data collection pipeline; tools and techniques for data processing (e.g., normalization, outlier removal); descriptive statistics; visualization
- Models for supervised learning: MLE, MAP, and fully Bayesian modeling; clustering; matrix factorization; spatio-temporal data modeling
- Algorithms for data and dimension reduction
- Experiment design and model evaluation, including A/B testing
- Modern models for NLP and computer vision; data-driven modeling for earth and climate sciences
- The data-science lifecycle and standard MLOps practices

## Background / prior courses recommended

Participants are expected to have the following background. The curated links point to material that can be used to revise or pick up the necessary topics.

- Programming
  - A course on Python programming, e.g. this course on NPTEL or this one.
- Linear algebra
  - MIT-OCW Introduction to Linear Algebra
  - 3Blue1Brown playlist on linear algebra
- Basics of data structures and algorithms
  - NPTEL course by Prof. Naveen Garg
- Introductory probability
  - MIT-OCW (the first ten lectures should suffice)
  - Khan Academy series on probability

- Software to be used: **Python**

## Speakers

- Keynote speech: Kevin Murphy (Google)
- Ashish Tendulkar (Google)
- Surender Kumar (Flipkart)
- Satyanath Bhat (IIT Goa)
- Sriparna Saha (IIT Patna)
- Shivam Rana (Swiggy)
- Rishabh Mehrotra (ShareChat)
- Nipun Batra (IITGN)
- Mayank Singh (IITGN)
- Shanmuga Raman (IITGN)
- Udit Bhatia (IITGN)
- Anirban Dasgupta (IITGN)
- Lavanya Tekumulla (Founder, AiFonic Labs)