My notes from a lecture by John Samuel Raja

Avyay Kashyap
Sep 18, 2019

The problem with public data:
1. Data resides in silos.
2. Access to this data is expensive, and the data is not directly usable.
3. The legality of using this data is unclear.

For example, Bloomberg makes a great deal of money simply by maintaining a data repository. Access to this data is also extremely hard: the UX is poor, the interface is hard to understand, and the user is flooded with data, little of which makes sense to the untrained eye. So the data is rich, but very few people know how to use it.

The solution? A natural-language search engine that anyone can use without prior knowledge of the existing data. This achieves a few things:
1. It aggregates data into a single source/database.
2. The data becomes easy to search and compare.
3. The data can be visualised, even at a granular level.

Types of Data

  1. Data that is easily available (data.gov.in, indiabudgets.nic.in, censusindia.gov.in).
  2. Data that is available, but takes a lot of effort to extract (real estate regulatory data, guidance values in Lutyens' Delhi).
  3. Data that ought to be free, but is not (Survey of India, which holds a monopoly on maps of India; IMD, which holds weather data).

What can be done with data?

  1. It can answer business questions
    For example, the #2 diaper company in India wanted to understand how it could capture markets and grow its share. What kind of data would it use? A map of births in India over the last two years shows that rural regions of Maharashtra, Bihar, UP, Rajasthan, etc. have the highest numbers. But such a map does not represent the people who can buy diapers. For that, a map of affluence has to be overlaid on the births map. So a map of the highest number of births in private hospitals (a proxy for affluence) is the most representative and the most useful to such a company.
  2. Tell data stories
    - How Gurgaon was built: a data story by HowIndiaLives. It shows the growth of the city of Gurgaon, which was built primarily by DLF, and explores possible connections between the then-ruling Congress government, Robert Vadra and DLF. It is an exploratory data visualisation.
    - IAS Officers and the Notorious Transfer Culture
    - Variance in Education across Dalits
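The overlay reasoning in the diaper example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the region names and birth counts are made-up toy values, not figures from the lecture or from any dataset, and births in private hospitals stand in as the affluence proxy. The point is only that ranking regions by total births and ranking them by private-hospital births can give different answers.

```python
from dataclasses import dataclass


@dataclass
class RegionBirths:
    region: str
    total_births: int
    private_hospital_births: int  # assumed proxy for affluence


# Hypothetical toy figures for illustration only, not real data.
regions = [
    RegionBirths("Region A", 900_000, 90_000),
    RegionBirths("Region B", 600_000, 240_000),
    RegionBirths("Region C", 300_000, 150_000),
]


def rank_by(rows, key):
    """Return region names ordered from highest to lowest value of key."""
    return [r.region for r in sorted(rows, key=key, reverse=True)]


# Births alone: where the most babies are born.
by_total = rank_by(regions, lambda r: r.total_births)
# Births overlaid with affluence: where the most likely buyers are.
by_private = rank_by(regions, lambda r: r.private_hospital_births)

print(by_total)    # ['Region A', 'Region B', 'Region C']
print(by_private)  # ['Region B', 'Region C', 'Region A']
```

With these toy numbers the region with the most births drops to last place once only private-hospital births are counted.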

Guiding principle for all journalistic work: Ask “So What?”

Approaching Data

  1. Start with a question and use the data to find the answers
  2. Mine the data and find insights through visualisation

Graphical excellence, as Tufte put it, is:
- complex ideas communicated with clarity, precision and efficiency
- giving the viewer the greatest number of ideas in the shortest time with the least ink
- nearly always multivariate
- telling the truth

This was part of a talk series held during the DE705 Interactive Data Visualization module by Prof. Venkatesh Rajamanickam, IDC School of Design, IIT Bombay.