Every year, floods displace millions of people in Assam and force them to take refuge in relief camps. In fact, devastating floods have been among the watershed events in the state's history.
The Indian government allocates billions of rupees every year for flood relief in Assam. A major challenge, however, is ensuring that these funds are used effectively and that the right resources reach the right places quickly.
The problem persists because government data is fragmented and stored in isolated silos. In Assam, data related to disaster management is stored across 18 different departments and other central agencies.
Help came from a lab operating at the intersection of data, technology, design, and social sciences. Called Civic Data Labs (CDL), the startup is working closely with the Assam government to consolidate this data and analyze it more effectively.
(Credit: Reuters)
Setting data standards
“The Assam government is pumping a lot of money into building disaster risk reduction measures, but we are still not sure whether the money is going to the right places at the right time and whether it is actually reducing risk,” said Gaurav Godhwani, co-founder of Civic Data Labs.
The Assam finance ministry and disaster management authority approached the startup, which agreed to help. Godhwani said that under a non-disclosure agreement, the startup was given access to all the datasets except one: public procurement data, which the startup had to curate and standardize itself.
“We worked with the ministry to ascertain what procurements were being done in relation to the floods. When was the last time embankment repairs were done after the damage occurred? When was restoration of that particular area done? We soon realised that the Assam government did not have clear standards for data management,” he said.
CDL therefore introduced the Open Contracting Data Standard (OCDS), an international data standard adopted by more than 50 countries. This made Assam the first state in India to adopt it.
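For readers unfamiliar with OCDS, each "release" is a JSON document describing one stage of a contracting process. The minimal sketch below builds a release for an embankment-repair tender and checks for the top-level fields the OCDS 1.1 release schema marks as required (`ocid`, `id`, `date`, `tag`, `initiationType`). The `make_release` helper, the `ocds-abc123` prefix, the buyer name and the amount are all hypothetical placeholders, not Assam's data or CDL's pipeline; a real publisher would validate against the official OCDS JSON schema rather than this simple field check.

```python
import json
from datetime import datetime, timezone

# Top-level fields the OCDS 1.1 release schema marks as required.
REQUIRED_RELEASE_FIELDS = ("ocid", "id", "date", "tag", "initiationType")


def make_release(ocid_prefix, process_id, buyer_name, title, amount_inr, date):
    """Build a minimal OCDS-style tender release (hypothetical helper, illustrative values)."""
    ocid = f"{ocid_prefix}-{process_id}"       # globally unique contracting-process ID
    buyer = {"id": "buyer-1", "name": buyer_name}
    return {
        "ocid": ocid,
        "id": f"{ocid}-tender-1",              # ID of this particular release
        "date": date.isoformat(),
        "tag": ["tender"],                     # which stage of the process this release describes
        "initiationType": "tender",
        "parties": [{**buyer, "roles": ["buyer"]}],
        "buyer": buyer,
        "tender": {
            "id": f"{process_id}-tender",
            "title": title,
            "status": "active",
            "value": {"amount": amount_inr, "currency": "INR"},
        },
    }


def missing_required(release):
    """List the required top-level fields that a release lacks."""
    return [field for field in REQUIRED_RELEASE_FIELDS if field not in release]


release = make_release(
    ocid_prefix="ocds-abc123",                 # hypothetical publisher prefix
    process_id="EMB-0042",                     # hypothetical procurement reference
    buyer_name="Example Water Resources Department",
    title="Embankment repair (illustrative)",
    amount_inr=2_500_000,
    date=datetime(2023, 5, 1, tzinfo=timezone.utc),
)
print(json.dumps(release, indent=2))
print("missing:", missing_required(release))  # missing: []
```

Because every release shares the same `ocid` for a given process, later releases (award, contract, implementation) for the same embankment work can be linked back to this tender, which is what makes questions like "when were repairs last done?" answerable.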
Godhwani and his team have also built a platform that brings all the datasets together, along with a data exchange layer that will enable every publisher, including the 18 departments and other central agencies, to manage disaster-related data.
“Most organisations have their own databases and APIs that don't talk to each other. The challenge was to bring all of this together in one place. So we streamlined the datasets, standardized them on common geographic units such as state, district and revenue area, and gave IDs to all the datasets so that everything is uniquely identified,” Godhwani explained.
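The approach Godhwani describes, normalizing each department's records to shared geographic units and then giving every record a stable identifier, can be sketched as follows. This is an illustration under stated assumptions: the field names, the two sample departments, and the SHA-1-based ID scheme are hypothetical, not CDL's actual schema.

```python
import hashlib
from collections import defaultdict

GEO_FIELDS = ("state", "district", "revenue_area")  # hypothetical shared geographic keys


def normalize_name(name):
    """Canonicalize a place name: trim, collapse internal whitespace, title-case."""
    return " ".join(name.split()).title()


def standardize(source, records):
    """Normalize geographic fields and attach a deterministic record ID.

    Assumes at most one record per source per geographic unit (a hypothetical simplification).
    """
    out = []
    for rec in records:
        geo = tuple(normalize_name(rec[f]) for f in GEO_FIELDS)
        raw = "|".join((source,) + geo)
        record_id = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]
        out.append({**rec, **dict(zip(GEO_FIELDS, geo)), "source": source, "record_id": record_id})
    return out


def merge_by_geography(*datasets):
    """Group standardized records from several sources under one geographic key."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for rec in dataset:
            key = tuple(rec[f] for f in GEO_FIELDS)
            merged[key][rec["source"]] = rec
    return dict(merged)


# Hypothetical extracts from two departments with inconsistent spelling and spacing.
relief = [{"state": "assam", "district": "  barpeta", "revenue_area": "barpeta ", "camps": 12}]
procurement = [{"state": "Assam", "district": "Barpeta", "revenue_area": "Barpeta", "tenders": 3}]

merged = merge_by_geography(standardize("relief", relief), standardize("procurement", procurement))
for key, by_source in merged.items():
    print(key, {src: rec["record_id"] for src, rec in by_source.items()})
```

Once both departments' records resolve to the same geographic key, relief-camp figures and procurement figures for a revenue area can be read side by side, even though the raw sources spelled the place names differently.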
(Stakeholder consultation with ASDMA, Source: Civic Data Labs)
Early success
Once the datasets had been streamlined and standardized, CDL used simple machine learning techniques to extract valuable insights from them.
“We have demonstrated hazard analysis and disaster modelling using GIS technology and simple machine learning techniques and released our initial findings. We have focused on high-risk districts and revenue zones and highlighted areas that require more attention,” Godhwani said.
In response, the Assam government has asked for the disaggregated data to be consolidated for periodic analysis over a five-year period.
CDL has developed a platform to comprehensively assess historical trends and facilitate planning for future monsoon cycles and subsequent restoration efforts.
Godhwani said the Assam government has already seen results using the data model CDL built: Officials used the model to determine allocations and then validated it with the help of field staff.
“We found that the algorithm effectively identified areas that were previously overlooked. As a result, income groups that previously were underfunded were able to receive more support based on our model,” Godhwani noted.
The project has also significantly expanded the department's capabilities. Previously, generating analytical reports during a disaster was time-consuming because officials lacked suitable software and knowledge of effective data visualization techniques.
“Now they're actively taking courses on Coursera about geospatial data management and AI. We're seeing a new enthusiasm for data science and its potential.”
(Capacity building workshop conducted by CDL with state and district officials in Assam, source: Civic Data Labs)
How machine learning helps
Godhwani said this was done using three machine learning algorithms. The first, Random Forest, was used to estimate the likelihood of inundation by analysing satellite imagery from multiple sources to observe the extent of flooding during the monsoon season.
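To show the idea behind a random forest for inundation likelihood, here is a small pure-Python sketch: many decision trees, each trained on a bootstrap sample and restricted to a random subset of features at every split, whose predicted flood probabilities are averaged. The grid-cell features (elevation, rainfall, and a deliberately irrelevant noise column) and the synthetic labels are assumptions for illustration; CDL's real inputs are derived from satellite imagery, and a production system would more likely use a library such as scikit-learn's `RandomForestClassifier` than this from-scratch version.

```python
import random


def gini(pos, count):
    """Gini impurity of a node with `pos` flooded samples out of `count` (labels are 0/1)."""
    p = pos / count
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X, y, feature_ids):
    """Return (weighted_gini, feature, threshold) for the best split, or None if none exists."""
    n, total_pos = len(y), sum(y)
    best = None
    for f in feature_ids:
        pairs = sorted(zip((row[f] for row in X), y))
        left_n = left_pos = 0
        for i in range(n - 1):
            value, label = pairs[i]
            left_n += 1
            left_pos += label
            if value == pairs[i + 1][0]:
                continue  # cannot separate identical feature values
            right_n, right_pos = n - left_n, total_pos - left_pos
            score = (left_n * gini(left_pos, left_n) + right_n * gini(right_pos, right_n)) / n
            if best is None or score < best[0]:
                best = (score, f, value)
    return best


def build_tree(X, y, depth, max_depth, n_features, rng):
    """Grow one tree; internal nodes are (feature, threshold, left, right), leaves are P(flooded)."""
    if depth >= max_depth or len(set(y)) == 1:
        return sum(y) / len(y)
    # Only a random subset of features is considered at each split.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return sum(y) / len(y)
    _, f, t = split

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_features, rng)

    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    """Follow splits down to a leaf and return its flooded fraction."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    """Bootstrap-aggregated randomized trees; probabilities are averaged across trees."""

    def __init__(self, n_trees=25, max_depth=4, n_features=2, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.n_features, rng))
        return self

    def predict_proba(self, row):
        return sum(predict_tree(tree, row) for tree in self.trees) / len(self.trees)


# Synthetic, hypothetical grid cells: [elevation_m, rainfall_mm, unrelated_noise].
# Label 1 ("inundated") when a cell is low-lying AND received heavy rain.
data_rng = random.Random(42)
X, y = [], []
for _ in range(300):
    elevation, rainfall = data_rng.uniform(0, 100), data_rng.uniform(0, 400)
    X.append([elevation, rainfall, data_rng.random()])
    y.append(1 if elevation < 40 and rainfall > 200 else 0)

forest = RandomForest(n_trees=25, max_depth=4, n_features=2, seed=1).fit(X, y)
print(f"low-lying, heavy rain: {forest.predict_proba([10, 350, 0.5]):.2f}")
print(f"high ground, light rain: {forest.predict_proba([90, 50, 0.5]):.2f}")
```

Averaging over many randomized trees is what gives a probability rather than a yes/no answer, which suits a "likelihood of inundation" map better than a single decision tree would.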
The second, data envelopment analysis (DEA), was used to derive variables from information such as maternal and child health indicators and public procurement records.
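Data envelopment analysis scores each "decision-making unit" by how efficiently it turns inputs into outputs compared with the best performers. The minimal sketch below covers only the simplest case, one input and one output under constant returns to scale (the CCR model), where the efficiency score reduces to each unit's output/input ratio divided by the best ratio. Multi-input, multi-output DEA requires solving a linear program per unit, which is omitted here. The districts, the choice of procurement spend as input and completed restoration works as output, and all numbers are hypothetical; the article does not say which variables CDL feeds into its model.

```python
def ccr_efficiency(units):
    """Single-input, single-output DEA efficiency under constant returns to scale (CCR).

    `units` maps a decision-making unit to (input, output). With one input and one
    output, the CCR score is each unit's output/input ratio divided by the best
    ratio, so units on the efficient frontier score 1.0.
    """
    if any(inp <= 0 for inp, _ in units.values()):
        raise ValueError("inputs must be positive")
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    if best == 0:
        raise ValueError("at least one unit needs a positive output")
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical districts: (procurement spend in crore INR, restoration works completed)
districts = {
    "District A": (10.0, 50.0),
    "District B": (8.0, 48.0),
    "District C": (12.0, 30.0),
}
scores = ccr_efficiency(districts)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
# District B: 1.00
# District A: 0.83
# District C: 0.42
```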
“The algorithm helps us understand how many procurements have been done before, during and after the monsoon season. The algorithm creates a model based on all this and gives it a score based on historical data. It also creates a baseline in case historical data is missing and whenever new data comes in, it can measure whether you are exceeding or falling short of that baseline.”
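The baseline behaviour described in the quote can be illustrated with a simple sketch: average historical procurement counts for each phase of the monsoon, fall back to a default value when a phase has no history, and report how far new data sits above or below that baseline. The phase names, the use of a plain mean as the baseline, and the fallback value are assumptions; Godhwani does not specify how the model actually computes its baseline or scores.

```python
from statistics import mean

PHASES = ("before", "during", "after")  # hypothetical phases of the monsoon season


def baselines(history, fallback):
    """Baseline per phase: mean of historical counts, or `fallback` when history is missing."""
    return {p: mean(history[p]) if history.get(p) else fallback for p in PHASES}


def compare(new_counts, base):
    """Relative deviation of new counts from the baseline, with a plain-language status."""
    report = {}
    for phase, count in new_counts.items():
        b = base[phase]
        if b == 0:
            report[phase] = (None, "no usable baseline")
            continue
        deviation = round((count - b) / b, 2)  # status uses the rounded value
        if deviation > 0:
            status = "exceeding"
        elif deviation < 0:
            status = "falling short"
        else:
            status = "on baseline"
        report[phase] = (deviation, status)
    return report


# Hypothetical procurement counts for one district across three past seasons;
# there is no history for the post-monsoon phase, so the fallback is used.
history = {"before": [4, 6, 5], "during": [12, 10, 14], "after": []}
base = baselines(history, fallback=8)
report = compare({"before": 3, "during": 15, "after": 8}, base)
print(base)
print(report)
```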
Finally, CDL uses the TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) scoring method to consolidate all the data into a risk index on a scale of 1 to 5, where 1 indicates low risk and 5 indicates high risk. Because the method combines all the variables in one place, it gives departments timely information, updated monthly, on which areas to prioritize.
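TOPSIS ranks alternatives by how close they are to an "ideal" point and how far they are from an "anti-ideal" point across all criteria. The sketch below applies it to a few revenue circles, treating every indicator as "higher means riskier", and then maps the resulting closeness coefficient onto the 1 to 5 scale using equal-width bins. The indicators, weights, and binning rule are assumptions for illustration; the article does not say how CDL weights its variables or converts scores to the 1 to 5 scale.

```python
import math


def topsis(matrix, weights, benefit):
    """TOPSIS closeness coefficients in [0, 1], one per row of `matrix`.

    benefit[j] is True when a larger value of criterion j moves a row toward the
    ideal point (here, toward "highest risk").
    """
    m = len(weights)
    # 1. Vector-normalize each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    cols = list(zip(*v))
    # 2. Ideal and anti-ideal points, criterion by criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Euclidean distance to each point; closeness = d_anti / (d_ideal + d_anti).
    scores = []
    for row in v:
        d_ideal, d_anti = math.dist(row, ideal), math.dist(row, anti)
        total = d_ideal + d_anti
        scores.append(d_anti / total if total else 0.0)
    return scores


def to_risk_level(closeness):
    """Map closeness in [0, 1] to 1 (low risk) ... 5 (high risk) using equal-width bins."""
    return min(5, 1 + int(closeness * 5))


# Hypothetical revenue circles: [flood hazard 0-1, exposed population (thousands), embankment breaches]
circles = {
    "Circle A": [0.9, 120, 4],
    "Circle B": [0.4, 60, 1],
    "Circle C": [0.1, 20, 0],
}
weights = [0.5, 0.3, 0.2]     # hypothetical weights
benefit = [True, True, True]  # every indicator: higher value = higher risk
closeness = topsis(list(circles.values()), weights, benefit)
levels = {name: to_risk_level(c) for name, c in zip(circles, closeness)}
print(levels)  # {'Circle A': 5, 'Circle B': 2, 'Circle C': 1}
```

Because the whole computation is a few passes over a table, it can simply be rerun whenever a department publishes new monthly figures.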
“Previously, it would take us around two years to calculate one departmental timestamp. This is the kind of automation we are enabling with machine learning algorithms,” Godhwani pointed out.
Leveraging generative AI
While machine learning algorithms have been sufficient so far, Godhwani believes the platform can scale by leveraging large language models (LLMs).
Currently, all the data generated goes into a publishing platform, but Godhwani believes that if the same data needs to be made publicly available, something like a rules-based engine won't be enough.
Information could be disseminated through chatbots and the social media applications that people already use.
“For example, if you're going to this particular area, be aware that there is a possibility of flooding in the next 24 to 48 hours due to heavy rains. Large language models offer these opportunities,” Godhwani said.
CDL has already conducted pilot tests and presented the results to the Assam government, but rollout is still several months away.
“Governments need to get more comfortable with piloting this at scale. The other thing is making sure the ethics and the modelling are right. There can't be hallucinations, yet most of the models today still hallucinate. If they do, the risks are extremely high.”
About Civic Data Labs
Godhwani co-founded Civic Data Labs with Deepthi Chand. Interestingly, the duo registered Civic Data Labs as a startup rather than a non-profit. The 52-strong team includes disaster risk reduction experts, weather scientists, data scientists, data engineers, technologists, designers and user researchers, and technical architects.
Though the company is bootstrapped, it raises funds for individual projects. For example, its project with the Assam government is funded by the Rockefeller Foundation and the Patrick J. McGovern Foundation.
The startup works closely with non-profit organisations, volunteer organisations, government agencies and departments.
