This video explains the rigor involved in operationalizing an open data program.
- [Instructor] When a government agency is convinced of the value of an open data portal, and has at least drafted an initial policy that structures its purpose and overall rules, it's time to get down to the important work of designing its processes and operations. We'll begin by taking a look at the general lifecycle, from the identification of data to the final publishing of the data on the open data portal. In this video, I'm going to briefly introduce a seven-step publishing process.
This process is designed to ensure that the right data gets the right vetting prior to publishing. In step one, the team assigned to evaluate requests for new datasets to be published receives the request. There are many ways that a request could come in. It could simply be an email from a government staff member. It could be a response to a solicitation, say to a subset of the community. It might even be just the first group of datasets that the open data program team believes will be a good first step.
Who might this evaluation team be? Well, that's for each agency to decide. They might consider the core team assigned to deploy and manage the open data program. It could be the agency's data analyst or chief data officer. It might even be some form of a data governance board, assuming the agency employs data governance. The first step, evaluate request, determines basic applicability. This includes whether the data is subject to an obvious data protection requirement. In the US, for example, that would include a lot of data related to personal health records.
Other basic information required could include acknowledging that the organization has possession of the data, that ownership is clear, and that its location is known. Assuming this basic information is collected, and there are no obvious show-stoppers, we move to step two. In step two, the identified data is evaluated for its quality and integrity. It probably comes as no surprise that many organizations store datasets that are not maintained and that contain a lot of errors, or are just incomplete.
Sadly, where data governance has not been effectively applied, poor-quality datasets are all too common. In this step, the evaluation team must be confident in the data. They must have assurance that it is complete and accurate. If not, either it is not a candidate for publishing or the data owner will be tasked with fixing the data. If the data demonstrates quality and integrity, it moves to the next step. Now that we've established the data is a good candidate for open data publishing, it goes through a series of steps that are used to lower risk.
We'll talk in more detail about the risk in the next video, so for now we'll briefly touch on the three steps in the risk management arena. Steps three, four, and five are used to evaluate any legal issues, privacy issues, and security issues. Depending on the organization, these reviews could be handled by one or more people or teams. A legal issue could be related to data that is confidential by law, such as health information in the United States. A privacy issue could be related to personally identifiable information, or PII, as it's known.
PII includes a person's name and address, their age, their gender, and their phone numbers. Finally, security information is often a broad area and could include data that could compromise the security of a physical facility. For example, data on building entry access codes. Steps three through five are essential to protect both the agency and the community. In step six, the business owner of the data has an opportunity to review the state of the dataset and any recommendations or suggestions that were made by prior stakeholders in the process, and to give their blessing to the publishing process.
In the US, where a law has compelled agencies to release data upon request, assuming it's not protected, this is not an opportunity for the business owner to decline publishing. They can work to increase their confidence in the quality and integrity of the data, perhaps even pushing it back in the process for further diligence, but they ultimately need to reach a level of comfort and approve it for publishing.
Once step six is complete, the data is published on the open data portal, which is the seventh and final step. These seven steps might seem overly burdensome, but they will serve an agency well. Once exercised a few times, the process should move reasonably quickly. Of course, the main determinant is the quality of the data.
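For readers who think in code, the seven steps described above can be sketched as a simple ordered pipeline in which a dataset request stops at the first step it fails. This is only an illustrative sketch, not part of the course material: the names `Step`, `DatasetRequest`, `CHECKS`, and `run_publishing_process` are hypothetical, and the boolean flags stand in for what would in practice be human reviews.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    """The seven steps, in the order the transcript describes them."""
    EVALUATE_REQUEST = 1
    QUALITY_AND_INTEGRITY = 2
    LEGAL_REVIEW = 3
    PRIVACY_REVIEW = 4
    SECURITY_REVIEW = 5
    BUSINESS_OWNER_REVIEW = 6
    PUBLISH = 7


@dataclass
class DatasetRequest:
    """Hypothetical record of a request; each flag stands in for a human review."""
    name: str
    in_possession: bool = True
    owner_known: bool = True
    location_known: bool = True
    obviously_protected: bool = False
    quality_ok: bool = True
    legal_ok: bool = True
    privacy_ok: bool = True
    security_ok: bool = True
    owner_approved: bool = True
    history: list = field(default_factory=list)


# One check per gate. Step 7 (publish) has no gate of its own.
CHECKS = {
    Step.EVALUATE_REQUEST: lambda r: (
        r.in_possession and r.owner_known and r.location_known
        and not r.obviously_protected
    ),
    Step.QUALITY_AND_INTEGRITY: lambda r: r.quality_ok,
    Step.LEGAL_REVIEW: lambda r: r.legal_ok,
    Step.PRIVACY_REVIEW: lambda r: r.privacy_ok,
    Step.SECURITY_REVIEW: lambda r: r.security_ok,
    # A False here models the owner pushing the data back for further
    # diligence, not a permanent refusal to publish.
    Step.BUSINESS_OWNER_REVIEW: lambda r: r.owner_approved,
}


def run_publishing_process(request):
    """Return (published, step): the step reached, or the step that failed."""
    for step in Step:  # Enum members iterate in definition order
        if step is Step.PUBLISH:
            request.history.append(step)
            return True, step
        if not CHECKS[step](request):
            return False, step
        request.history.append(step)


if __name__ == "__main__":
    request = DatasetRequest(name="clinic_visits", privacy_ok=False)
    published, step = run_publishing_process(request)
    print(published, step.name)
```

In this sketch a request that fails a gate simply stops there; an agency would more likely route it back to the data owner for remediation, as described for steps two and six.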
Dr. Jonathan Reichental introduces real-world use cases for open data, as well as the steps you need to take to develop and operationalize an open data program. He also explains how data scientists use open data to tell stories and drive data visualizations. Along the way, he provides numerous examples of open data in action: improving government, empowering citizens, creating opportunity, and solving public problems.
- Understanding what open data really is
- Current open data efforts around the globe
- Open data in action
- Designing an open data governance process, including policies
- Monetizing open data
- Storytelling with open data
- Selling the value of open data
- Measuring the value of open data