The Open Data Inventory

Assessing the Coverage and Openness of Official Statistics


Executive Summary: In this article, we describe a new approach to assessing the coverage and accessibility of the datasets most pertinent to managing and monitoring the social, economic, and environmental development of a country. The Open Data Inventory assesses the data provided by national statistical offices (NSOs) through their principal websites using two sets of criteria that evaluate topical coverage and openness. The results are tabulated to allow comparisons within a country across different datasets and between countries. A prototype of the inventory methodology has been applied to nine developing countries and results are described and compared with other assessments of data openness and statistical capacity. Because the method is under development, the identities of the countries have been suppressed to focus on the inventory methodology and its application. Anticipated future refinements include a selectable weighting system and a process for crowd sourcing updates to the inventory.



Open Data and Official Statistics

National statistics offices (NSOs) are the apex of the official statistical systems of their countries. They have the principal responsibility for organizing data collection, setting standards and implementing statistical methods, and publishing and disseminating the results. Where other ministries or branches of government may have responsibility for producing statistics on specific topics – for example the education ministry typically compiles statistics from its administrative records, and in many countries the central bank produces the national accounts as well as financial statistics – the NSO often acts as the central clearing house, bringing together the work of other statistical offices. Only a few countries – the United States is a prominent example – have no central statistical office. In these cases an office of the executive branch typically acts to coordinate the production and dissemination of official statistics. As the custodians of valuable statistics produced at considerable public expense, NSOs or their functional equivalents have a special obligation to maximize their public benefit.


In recent years, many countries have signed commitments (e.g. OGP, IATI) to open data and open government practices.[1] International organizations, such as the World Bank have also embraced the open data movement.[2] Despite these commitments, many governments have not fully realized the potential of providing open access to the full range of data produced and maintained by their national statistical systems. External assessments of government data sources have focused on government budgets or on data with immediate commercial applications such as transportation timetables, map files, and even crime statistics. Often overlooked are the rich datasets of social, economic, and environmental indicators under the purview of the national statistical office or its allied agencies.[3] Furthermore many NSOs seem to have been dilatory in joining the open data movement, despite their absolute and comparative advantage in managing large datasets. Because of their singular responsibility for official statistics, NSOs should be leaders in the open data movement. If there is going to be a data revolution in developing countries,[4] it should be led by NSOs.


Structure of the Open Data Inventory

The Open Data Inventory is designed to evaluate the coverage and openness of data published on NSO websites. While some countries have more than 100 offices and agencies that produce official government statistics,[5] we only consider data that can be found on the NSO website or for which the NSO website provides a direct link. Currently, the most accessible data for many countries are available only on the websites and in the databanks of international organizations. This should not be the case. Governments and their statistical offices are the source of much of the data that appear in international databases and should provide open and timely access to these data.


Traditionally, NSOs have disseminated data through yearbooks, abstracts, and paper publications. However, with the rapid growth of the Internet in every part of the world, all but six countries have established websites for their NSOs.[6] By examining the content of NSO websites, we are able to observe what is available to a typical user of NSO data without placing an administrative burden on government agencies by asking them to respond to questionnaires or other inquiries.

The Open Data Inventory focuses on what we call "macrodata." By this we mean indicators that have been aggregated above the unit record level. Microdata -- survey responses and administrative records -- are the ultimate source for most macrodata. If proper privacy measures are put in place, microdata should also be released by governments as open data. However macrodata are the final products of the national statistical system that are used to monitor development trends and guide public and private decision making. The breadth of topics covered in the official statistics provided by NSOs and their adherence to standards for open access are therefore relevant measures of the functioning of national statistical systems.


Data Categories

Data needs vary considerably from one country to another. However, there are major categories of data that are important regardless of the country-specific context. We have identified twenty data categories that we consider the foundation of a national statistical system. Other categories that could, and perhaps should, be included are data on crime, policing, and judicial administration, and measures of infrastructure facilities and their use. We will consider these in future versions of the Open Data Inventory.

 

Social Statistics

| Data category | Representative indicators | Recommended disaggregation |
| --- | --- | --- |
| 1. Population and vital statistics | Population by 5-year age groups; crude birth rate; crude death rate | Sex; marital status |
| 2. Education: Facilities | Number of schools and classrooms; teaching staff; annual budget | Age group; school stage |
| 3. Education: Outcomes | Enrollment and completion rates; literacy rates and/or competency exam results | Sex; school stage; age groups |
| 4. Health: Facilities | Core operational statistics of health system (budget, clinics, hospital capacity, doctors, nurses, midwives) | Facility type |
| 5. Health: Preventive care and morbidity | Immunization rates; incidence and prevalence of major communicable diseases | Sex; age as applicable |
| 6. Health: Reproductive health | Maternal mortality ratio; infant mortality rate; under-5 mortality rate; fertility rate; contraceptive prevalence rate; adolescent birth rate | Mortality rates disaggregated by sex |
| 7. Gender statistics | Specialized studies of the status and condition of women; violence against women; women in parliament and management | |
| 8. Poverty statistics | Number and percentage of poor at national poverty line; distribution of income | Median income; income shares by deciles |

 

Economic and Financial Statistics

| Data category | Representative indicators | Recommended disaggregation |
| --- | --- | --- |
| 9. National accounts | Production by industry; expenditure by government and households | Production by industrial classification; current and constant prices |
| 10. Labor statistics | Employment; unemployment; child labor | Sex; major age groups; employment by industry and occupation |
| 11. Price indexes | Consumer price index; producer price index; wholesale price index (optional) | By major components |
| 12. Central government finance | Actual revenues; actual expenditures | Revenues by source; expenditures by major categories |
| 13. Money and banking | Money supply | M1; M2; and so forth |
| 14. International trade | Exports and imports | Major categories using international trade classification |
| 15. Balance of payments | Exports and imports of goods and services; foreign investment; foreign exchange rates | Goods and services disaggregated by principal industry groupings |

 

Environment Statistics

| Data category | Representative indicators | Recommended disaggregation |
| --- | --- | --- |
| 16. Land use | Land area | Urban/rural locations; cropping patterns |
| 17. Resource use | Fishery harvests; forest coverage and deforestation; major mining activities including gas/petroleum; water supply & use | Data in physical units and/or value; location as appropriate |
| 18. Energy use | Consumption of electricity, coal, oil, and renewables | Industry; households; in physical units |
| 19. Pollution | Emissions of air and water pollutants; CO2 and other GHGs; toxic substances | By type of pollutant; geographic or ecological zones as appropriate; in physical units |
| 20. Built environment | Access to drinking water; access to sanitation; housing quality (from census) | Urban/rural locations; in appropriate units |

 

For each of the twenty data categories we have listed representative indicators that would be expected to be included in a robust data set. The list is not exhaustive. Most statistical systems will produce many other indicators. To be useful for analysis, indicators should be disaggregated to show differences among important subsets of the population. Geographic disaggregation is also important and is discussed further below.


Scoring system

The Open Data Inventory assesses two important dimensions of each data category: coverage and openness. Data quality in the sense of accuracy or adherence to standards is not considered here, as doing so would require a detailed review of the internal processes of statistical producers. The IMF's Special Data Dissemination Standard (SDDS),[7] General Data Dissemination System and Data Quality Assessment Framework (DQAF),[8] and the occasional Reports on the Observance of Standards and Codes (ROSC)[9] provide considerable information about the quality of statistics produced by member countries, although largely confined to financial and economic statistics. At a more aggregate level, the World Bank's Statistical Capacity Indicator[10] provides a useful measure of the technical capacity of the national statistical system to produce reliable statistics.


We consider five elements of data coverage: two elements of time coverage (data available over the last five and the last ten years), two elements of geographic coverage (first and second subnational administrative levels), and the provision of disaggregated data. To quantify results, each element is worth one point if the criterion is satisfied, one-half point if it is partly satisfied, and zero otherwise. While this radically simplifies many complicated issues, it both reflects the practical limits of observation and provides the sort of information that someone asking "Does the website for country X have complete information on (for example) demographic trends?" would find useful.
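As a rough illustration, the tally for one data category can be sketched as follows. This is a minimal sketch in Python; the element names and example ratings are ours, not part of the official rubric.

```python
# Illustrative tally of the five coverage elements for one data category.
# Element names and the example ratings are hypothetical.

COVERAGE_ELEMENTS = [
    "data_last_5_years",
    "data_last_10_years",
    "first_admin_level",
    "second_admin_level",
    "recommended_disaggregations",
]

def coverage_score(ratings: dict) -> float:
    """Sum the per-element ratings; each must be 0, 0.5, or 1."""
    assert set(ratings) == set(COVERAGE_ELEMENTS)
    assert all(r in (0, 0.5, 1) for r in ratings.values())
    return sum(ratings.values())

example = {
    "data_last_5_years": 1,
    "data_last_10_years": 0.5,
    "first_admin_level": 1,
    "second_admin_level": 0,
    "recommended_disaggregations": 0.5,
}
print(coverage_score(example))  # 3.0 of a possible 5
```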


Coverage criteria

| Criterion | Element | Suggested point allocation |
| --- | --- | --- |
| Time coverage | Data available in last 5 years | Complete: 1; Some: 0.5; None: 0 |
| Time coverage | Data covering last 10 years | Complete: 1; Some: 0.5; None: 0 |
| Geographic | First admin level | Yes: 1; No: 0 |
| Geographic | Second admin level | Yes: 1; No: 0 |
| Disaggregation | Recommended disaggregations as described | All: 1; Some: 0.5; None: 0 |

 

Standards for the periodicity of data vary. Here we assess NSOs against a minimum standard. Most indicators should be produced annually. More advanced statistical systems may produce data at higher frequencies, such as quarterly national accounts. But even poverty estimates at five-year intervals would be a significant improvement in many countries. Disaggregation to subnational units is highly desirable, but for some data categories, such as international trade or monetary statistics, subnational disaggregations are not expected. Recommended disaggregations by functional characteristics are suggested for each data category.


We divide time coverage into two categories: data available in the last five years and data covering the last ten years. When considering coverage for the last five years, we award a full point if data are available for three or more of the last five years, recognizing that there are time lags in producing any indicator. A half point is given if data are available for one or two of the last five years. With regard to the last ten years, a full point is awarded if six or more of the last ten years have data. A half point is awarded if data are available for three to five of the last ten years. Zero points are awarded if data are available for fewer than three of the last ten years.
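The time-coverage rules above can be expressed as two small scoring functions; an illustrative sketch, with function names of our own choosing.

```python
# The time-coverage point rules described above, as a sketch.

def last5_points(years_with_data: int) -> float:
    """Points for data available within the last five years."""
    if years_with_data >= 3:
        return 1.0   # full point: three or more of the last five years
    if years_with_data >= 1:
        return 0.5   # half point: one or two years
    return 0.0

def last10_points(years_with_data: int) -> float:
    """Points for data covering the last ten years."""
    if years_with_data >= 6:
        return 1.0   # full point: six or more of the last ten years
    if years_with_data >= 3:
        return 0.5   # half point: three to five years
    return 0.0       # fewer than three of the last ten years
```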


Most, but not all, data can be reported at the first subnational administrative level and some at the lower second administrative level. The geographic definition of these reporting levels may differ depending on the categories of data. Education and health systems, for example, may define their administrative districts differently. Finally, for each set of indicators we suggest disaggregations appropriate to the specific set of indicators. These include sex, age groupings, school stages, and so forth. Such disaggregations greatly increase the analytical value of the data.


Criteria for data openness

| Criterion | Element | Suggested point allocation |
| --- | --- | --- |
| Download format | Machine readable | Yes: 1; No: 0 |
| Download format | Non-proprietary | Yes: 1; No: 0 |
| Download format | User selection / API or bulk download | User selection: 0.5; plus API option: 0.5 |
| Metadata | Metadata available | Specific to indicator/dataset: 1; Non-specific: 0.5; No: 0 |
| Licensing terms | Terms of use stated / CC BY 4.0 or similar | ToU present: 0.5; CC BY 4.0 or similar terms: 0.5 |

 

There are also five elements of data openness, each worth a single point if fully present and a half point if partially present. These elements represent a condensation of well-known standards for open data such as the Open Definition.[11] For many who are seeking data, the most important issue will be the ability to download data in a file format suitable for their continuing use. The first criterion asks whether the data are available in machine-readable formats. This excludes PDF files and picture formats (such as JPEG). Excel and other proprietary file formats are machine-readable but do not meet the second criterion for openness. Data in non-proprietary formats such as CSV (comma-separated values) are easy to analyze with a wide variety of free and commercial software and may be more easily shared.


The third criterion of data openness is the ability of users to select the data of interest to them. Many NSO websites provide only preselected tables of data. If a user interface is provided that allows user selection of data, a half point is awarded. If, in addition, data are available through an Application Programming Interface (API) that allows for easy access and automated downloads, a full point is awarded.


Metadata are an important element of data openness. They help users assess the quality of the data and contribute to their appropriate use.  One point is awarded if metadata are present that provide specific details about the definition of the indicators or the method of data collection and compilation. Some websites may have general metadata about a large survey or group of data of which an indicator is part. In such cases, a half point is given.


Finally, we consider the licensing of the data. Licensing should be clear and straightforward on the NSO website. Following the definition of open data, we give a half point for a website that clearly states its terms of use. An additional half point is awarded if the licensing terms allow for the free use and reuse of the data. Although a specific form of license is not required, the terms of use should correspond to those of a Creative Commons Attribution International license.[12]


Many websites do not provide any information about licensing of data. A copyright symbol at the bottom of the NSO web page does not provide sufficient explanation of the licensing terms of the data on the website. No data will be included in the assessment for which a license or subscription is required unless those data are accessible on public portions of the website.
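Taken together, the five openness elements can be tallied as below; a sketch under the point allocations just described, with parameter names of our own choosing.

```python
# Sketch of the five openness elements and their point allocations.
# Parameter names and the example call are hypothetical.

def openness_score(machine_readable: bool,
                   non_proprietary: bool,
                   user_selection: bool,
                   api_or_bulk: bool,
                   metadata: str,        # "specific", "general", or "none"
                   terms_stated: bool,
                   open_license: bool) -> float:
    score = 0.0
    score += 1.0 if machine_readable else 0.0   # not PDF or image formats
    score += 1.0 if non_proprietary else 0.0    # e.g. CSV rather than Excel
    score += 0.5 if user_selection else 0.0     # interface for user selection
    score += 0.5 if api_or_bulk else 0.0        # plus an API or bulk download
    score += {"specific": 1.0, "general": 0.5, "none": 0.0}[metadata]
    score += 0.5 if terms_stated else 0.0       # terms of use stated
    score += 0.5 if open_license else 0.0       # CC BY 4.0 or similar terms
    return score

# A site with machine-readable CSV, a selection interface but no API,
# only general metadata, and stated but restrictive terms:
print(openness_score(True, True, True, False, "general", True, False))  # 3.5
```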


Initial Results

To see how the Open Data Inventory applies to actual countries, we applied the assessment rubric to nine low- and middle-income countries. The results are preliminary and meant to be demonstrative. To avoid focusing on specific circumstances in the countries selected, the country names have been replaced with regional identifiers: LA1 and LA2 are from Latin America; SSA1, SSA2, and SSA3 are from Sub-Saharan Africa; MENA is in the World Bank's Middle East and North Africa region; and ASIA1, ASIA2, and ASIA3 are from Asia.




Chart 1 shows the scores of each country as a percentage of total possible points available. Only one country included in this sample received more than 50 percent of the possible points for its total score. All nine countries received higher scores for coverage than for openness.  Indeed, for some countries the coverage score was several times larger than the openness score. Most of these countries are far from embracing openness.  There is a high level of correlation (92 percent) between the openness scores and the coverage scores.
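The 92 percent figure is an ordinary Pearson correlation. It can be reproduced with plain Python using the first-assessment overall scores reported later in Table 6; that these are the same scores underlying Chart 1 is our assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# First-assessment overall scores from Table 6, in country order
# SSA1, SSA2, ASIA1, ASIA2, SSA3, MENA, LA1, ASIA3, LA2:
coverage = [27, 29, 42, 38, 39, 48, 46, 56, 73]
openness = [3, 9, 10, 16, 27, 22, 39, 35, 70]
print(round(pearson(coverage, openness), 2))  # 0.92
```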


The countries in this sample were not randomly selected so country results may not be representative of their region. Still, it is notable that Sub-Saharan African countries tend to have lower scores than the other countries in the sample.


Comparison to Other Indexes

The Open Data Inventory is the first index to assess both the coverage and openness of national statistical systems. The World Bank Statistical Capacity Indicator (SCI), the Open Data Institute and World Wide Web Foundation's Open Data Barometer (ODB), and the Open Knowledge Foundation's Open Data Index (ODI) all provide indexes evaluating statistics. The ODB and Open Data Index both rate the openness of government data. The SCI rates the capacity of national statistical systems to produce good quality statistics but does not specifically consider openness.


The SCI has been applied to 149 countries classified by the World Bank as low- or middle-income. The ODB and Open Data Index include primarily high-income countries in their assessments. Only 18 developing countries are included in all three indexes. In a previous analysis, we found that among these countries, the correlation of overall scores between ODB and Open Data Index is 30 percent. The correlation between the ODB and the SCI is 47 percent. The correlation between the ODI and SCI is 74 percent.[13]

The final scores for the three indexes and the Open Data Inventory (adjusted to a 100-point scale) are shown in Chart 2.


 

 

To better visualize the comparison between the different measures of statistics in these countries, we rescaled the indexes to show the final score for each country as standard deviations from the mean in chart 3. Note that only seven of the nine countries included in this analysis were included in the Open Data Index and that a different set of seven countries overlaps with the Open Data Inventory and the Open Data Barometer. This decreases the number of data points available to compare the four indexes. It also highlights the need for indexes like this to evaluate the data of low- and middle-income countries.
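Rescaling scores as "standard deviations from the mean" is a standard z-score transformation; a minimal sketch, with hypothetical input values.

```python
# Z-score rescaling used for the cross-index comparison: each score is
# re-expressed as standard deviations from the mean of its index.

def standardize(scores):
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5  # population SD
    return [(s - mean) / sd for s in scores]

# Hypothetical final scores for four countries on one index:
print([round(z, 2) for z in standardize([48, 64, 27, 73])])
```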


 

Table 1: Correlations of Scores in 2013 Indexes (Countries in Common)

| | Open Data Inventory | Open Data Barometer | Open Data Index | Statistical Capacity Indicator |
| --- | --- | --- | --- | --- |
| Open Data Inventory | - | 25% | 71% | 49% |
| Open Data Barometer | 25% | - | -37% | -16% |
| Open Data Index | 71% | -37% | - | 71% |
| Statistical Capacity Indicator | 49% | -16% | 71% | - |


Table 1 shows the correlations between the indexes for the nine countries. The strongest correlation is between the Open Data Index and the Open Data Inventory. One similarity between the two most highly correlated indexes is their similar method of data collection. The Open Data Index is crowd-sourced. The Open Data Inventory is designed so that assessments can be made by interested data users relying on publicly accessible data. In other words, it is not crowd-sourced but could be. In contrast, the ODB and SCI rely on the knowledge of experts. The difference between data that are easily accessible to the public and data that are technically present is one possible reason for the different scores between the OD Inventory and the ODB and SCI. Another is that the indexes evaluate different sets of data. The Open Data Inventory also differs from the others in separately assessing data coverage and openness.

 

Table 2: Correlations of Open Data Inventory Sub-scores and Other Indexes

| | Coverage sub-score of Open Data Inventory | Openness sub-score of Open Data Inventory |
| --- | --- | --- |
| Open Data Barometer | 18% | 29% |
| Open Data Index | 70% | 70% |
| Statistical Capacity Indicator | 62% | 38% |


In table 2 we look at the correlations between the scores of the other indexes and the two subcategories of the Open Data Inventory: Openness and Coverage. We note a much higher correlation between the SCI and Open Data Inventory sub-score for Coverage than with the SCI and the Open Data Inventory Openness sub-score. The SCI does not assess openness itself; hence, it is not surprising that it shows such a low correlation with the openness sub-score. The SCI's assessment of capacity over a wide range of statistical topics seems to be more closely linked to coverage in the Open Data Inventory. As previously seen, the ODB has a relatively low correlation with the Open Data Inventory. Interestingly, the Open Data Index shows a high level of correlation with both sub-scores of the Open Data Inventory.


Common Subject Scores

One of the biggest differences between the four indexes is the type of datasets evaluated to develop the indexes. For instance, the Open Data Index and Open Data Barometer both look at public transportation timetables. The Open Data Inventory does not evaluate data on transportation timetables. Additionally, one third of the points in the SCI are based on evaluations of statistical methodology. None of the other three indexes include this in their evaluations, although the Open Data Inventory awards a point if methodological information is provided. The differences in the subjects considered by the indexes may explain much of the variation between the indexes.


To refine our comparison of the four indexes, we recalculated the scores for each index using only datasets that are similar to those included in the Open Data Inventory. Table 3 lists the datasets used to develop the scores for the four indexes; the datasets that seem similar to those in the Open Data Inventory are highlighted in light grey. Based on the sub-scores for these topics, we developed new scores for each index that we have termed "common subject scores."
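The recalculation amounts to filtering each index's sub-scores down to the shared topics and re-scoring from those alone. A sketch with hypothetical sub-scores and an assumed simple average, since the actual re-weighting is not specified here:

```python
# Illustrative recomputation of an index using only the datasets it
# shares with the Open Data Inventory. Dataset names and scores are
# hypothetical, and a simple average is assumed.

def common_subject_score(dataset_scores: dict, shared: set) -> float:
    """Average of the sub-scores for datasets in the shared set."""
    kept = [score for name, score in dataset_scores.items() if name in shared]
    return sum(kept) / len(kept)

odi_scores = {  # hypothetical sub-scores for one country
    "Government budget": 80, "Government spending": 70,
    "Emissions of pollutants": 40, "National statistics": 60,
    "Transport timetables": 90, "Election results": 50,
}
shared_with_inventory = {"Government budget", "Government spending",
                         "Emissions of pollutants", "National statistics"}
print(common_subject_score(odi_scores, shared_with_inventory))  # 62.5
```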

 

Table 3: Datasets Considered in Indexes

| Open Data Inventory | Open Data Index | Open Data Barometer | Statistical Capacity Indicator |
| --- | --- | --- | --- |
| Population & vital statistics | Government budget | Government budget | Poverty survey |
| Education: Facilities | Government spending | Government spending | Health survey |
| Education: Outcomes | Emissions of pollutants | Environment statistics | Population census |
| Health: Facilities | National statistics | International trade | Vital registration system |
| Health: Preventive care and morbidity | Transport timetables | Health | Poverty (below $1/day) |
| Health: Reproductive health | Election results | Census | Mortality (under 5) |
| Gender statistics | Company register | Education | Measles immunization (under 1) |
| Poverty statistics | National map | Crime statistics | HIV (adults aged 15-49) |
| National accounts | Legislation | Public transport timetables | Ratio of girls to boys (prim & sec) |
| Labor statistics | Postcodes/Zip codes | Map | Primary completion |
| Price indexes | | Land ownership | Improved water source |
| Central government finance | | Company register | GDP per capita growth |
| Money & banking | | Legislation | Attended births |
| International trade | | National election results | Malnutrition (under 5) |
| Balance of payments | | | Agricultural census |
| Land use | | | Statistical methodology |
| Resource use | | | |
| Energy use | | | |
| Pollution | | | |
| Built environment | | | |


The common subject scores calculated for each index reflect many changes. For instance, only four of the original ten datasets used to develop the Open Data Index have subjects in common with the Open Data Inventory. Therefore, the common subject scores for the Open Data Index include only the scores for those four subjects.

 

Table 4: Correlations of Common Subject Scores in 2013 Indexes (Countries in Common)

| | Open Data Inventory | Open Data Barometer | Open Data Index | Statistical Capacity Indicator |
| --- | --- | --- | --- | --- |
| Open Data Inventory | - | 36% | 84% | 25% |
| Open Data Barometer | 36% | - | 8% | -27% |
| Open Data Index | 84% | 8% | - | 39% |
| Statistical Capacity Indicator | 25% | -27% | 39% | - |


Table 4 indicates that the "common subject" versions of the indexes show different patterns from the original indexes. The correlation between the Open Data Inventory and the Open Data Index jumps to 84 percent. By contrast, the correlation between the Open Data Inventory and the Statistical Capacity Indicator decreases to 25 percent. Chart 4 displays the standardized scores for the common subject analysis.


 

 

Note that LA2 is unique among countries in this chart. In two indexes it receives the highest score and it is above average for the other two. Such agreement between the indexes is not seen for the other countries. Hence, if we measure the correlations between the scores of the indexes, LA2 may bias the correlations upward due to the limited number of countries analyzed here.
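The influence of a single country such as LA2 on a small-sample correlation can be checked with a leave-one-out recomputation; the data below are illustrative, not the actual index scores.

```python
import math

# Leave-one-out check: recompute a correlation without a suspected outlier.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pearson_without(xs, ys, drop):
    """Correlation with observation `drop` removed from both series."""
    xs2 = [x for i, x in enumerate(xs) if i != drop]
    ys2 = [y for i, y in enumerate(ys) if i != drop]
    return pearson(xs2, ys2)

a = [1, 2, 3, 4, 100]   # one high-scoring country dominates both series
b = [2, 1, 4, 3, 100]
# The full-sample correlation is near 1; it drops sharply once the
# dominant observation is removed.
print(round(pearson(a, b), 2), round(pearson_without(a, b, 4), 2))
```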

 

Table 5: Correlations of Common Subject Scores in 2013 Indexes (Countries in Common Except LA2)

| | Open Data Inventory | Open Data Barometer | Open Data Index | Statistical Capacity Indicator |
| --- | --- | --- | --- | --- |
| Open Data Inventory | - | 1% | 80% | 26% |
| Open Data Barometer | 1% | - | -20% | -54% |
| Open Data Index | 80% | -20% | - | 38% |
| Statistical Capacity Indicator | 26% | -54% | 38% | - |


Table 5 shows the correlations between the common subject scores without LA2. The correlation between the Open Data Inventory and the Open Data Barometer now drops to one percent. However, the correlation between the Open Data Index and Open Data Inventory remains strong at 80 percent. 


The Open Data Inventory shares some similarities with existing indexes of statistics around the world, most notably with the Open Data Index. Still, there are important differences, particularly compared to the Open Data Barometer, with which the Open Data Inventory has a low correlation. The strong correlation with the Open Data Index deserves further analysis, for which it would be helpful to have a larger sample of low- and middle-income countries.


Second Review of Inventory Score

To this point in the analysis, Open Data Watch (ODW) staff have conducted the Open Data Inventory assessments. To assess the precision of our inventory, we sent the Open Data Inventory rubric to three people who had not previously seen the assessments and asked them to conduct an inventory of the same nine countries. They were given only the website of the NSO for each country and the criteria for scoring.


Table 6 shows the scores for the first and second assessments of each website; the second assessments were conducted by non-ODW staff.

 

Table 6: Comparison of different evaluators

Proportionate scores (%), first assessment / second assessment

| | SSA1 | SSA2 | ASIA1 | ASIA2 | SSA3 | MENA | LA1 | ASIA3 | LA2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Data Coverage | | | | | | | | | |
| Social Stats Ave. | 5 / 13 | 3 / 7 | 8 / 19 | 7 / 4 | 11 / 11 | 9 / 13 | 8 / 11 | 9 / 10 | 16 / 18 |
| Economic & Financial Stats Ave. | 29 / 60 | 44 / 59 | 54 / 100 | 45 / 36 | 46 / 46 | 58 / 65 | 79 / 76 | 79 / 74 | 70 / 81 |
| Environmental Stats Ave. | 30 / 48 | 26 / 10 | 34 / 80 | 32 / 18 | 14 / 38 | 38 / 56 | 22 / 42 | 42 / 50 | 60 / 52 |
| Overall Coverage Score | 27 / 56 | 29 / 38 | 42 / 90 | 38 / 24 | 39 / 48 | 48 / 64 | 46 / 58 | 56 / 58 | 73 / 78 |
| Data Openness | | | | | | | | | |
| Social Stats Ave. | 1 / 9 | 2 / 4 | 2 / 8 | 2 / 1 | 8 / 7 | 7 / 10 | 6 / 6 | 6 / 8 | 14 / 16 |
| Economic & Financial Stats Ave. | 3 / 41 | 10 / 19 | 10 / 40 | 20 / 19 | 28 / 30 | 3 / 7 | 9 / 8 | 44 / 39 | 81 / 83 |
| Environmental Stats Ave. | 4 / 32 | 8 / 2 | 10 / 40 | 16 / 2 | 10 / 26 | 14 / 46 | 42 / 24 | 26 / 30 | 52 / 72 |
| Overall Openness Score | 3 / 40 | 9 / 15 | 10 / 40 | 16 / 8 | 27 / 31 | 22 / 43 | 39 / 31 | 35 / 37 | 70 / 79 |



A summary of the differences between the first and second evaluators is provided in table 7.  The table indicates the differences in overall score (relative to the score provided by the first assessment).
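The "difference in overall score" statistic in Table 7 is the second assessment's score expressed relative to the first; as a formula sketch, with hypothetical example values:

```python
# Table 7's "difference in overall score": the change from the first to
# the second assessment, relative to the first. Example values hypothetical.

def overall_difference_pct(first: float, second: float) -> float:
    """Percentage difference of the second score relative to the first."""
    return 100.0 * (second - first) / first

print(overall_difference_pct(30, 39))  # 30.0 percent
```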

 

Table 7: Summary Statistics on Comparing Evaluators

| | SSA1 | SSA2 | ASIA1 | ASIA2 | SSA3 | MENA | LA1 | ASIA3 | LA2 | Median |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Difference in overall score | 216% | 40% | 150% | -40% | 20% | 54% | 5% | 5% | 10% | 20% |
| Total point difference | 32.8 | 7.7 | 39.0 | -10.6 | 6.4 | 18.8 | 2.3 | 2.4 | 7.0 | 7.04 |
| Agreement on each cell | 53% | 75% | 45% | 69% | 69% | 51% | 56% | 64% | 62% | 62% |
| Time spent on second assessment (hours) | 5 | 2 | 4.5 | 2 | 4.5 | 6.5 | 4 | 2.25 | 6.5 | 4.25 |
| Same language for assessments | Y | Y | N | Y | Y | Y | Y | Y | N | |

 


One country, SSA1, showed a significant amount of disagreement from one assessment to the next. One of the major discrepancies was the interpretation of the terms of use. While one evaluator felt that the terms of use were not specifically about the data and should receive no points, the other evaluator felt that the terms of use deserved points for each data category. Since the country was already receiving such a small score, the different interpretation of the criteria for one point created significant differences between the two assessments.


Another country deserves special note. The second assessment of ASIA1 was performed in the non-English version of the website and was done after several software downloads required by the website. The first assessment was only done in English and was done without any software downloads. The first evaluator was not able to successfully download the multiple sets of software to query a certain database on the website. As a standard practice, we decided that, going forward, assessments will only be conducted of data for which no unique software is required. The two assessments of ASIA1 would likely be much more similar if both assessors had evaluated the country without the software downloads.


It is also interesting to note that the second assessment of LA2 was done in the local language while the first was done in English. The differences between the two assessments were similar to the differences observed in countries for which both assessments were done in the same language. This increases our confidence that assessments do not necessarily need to be done in the predominant language of the country being assessed.

 

Table 8: Percentage of cells in agreement with each other

| | Data for last 5 years | Data for last 10 years | First admin level | Second admin level | Functional characteristics | Machine readable | Non-proprietary | User selection / API or bulk download | Metadata available | Terms of use stated / CC BY 4.0 or similar |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Min | 5% | 10% | 24% | 47% | 32% | 55% | 45% | 0% | 0% | 0% |
| Max | 75% | 75% | 88% | 93% | 84% | 100% | 100% | 100% | 95% | 100% |
| Median | 60% | 50% | 59% | 87% | 47% | 85% | 85% | 65% | 40% | 20% |
| Mean | 54% | 46% | 59% | 74% | 52% | 81% | 81% | 56% | 44% | 39% |


Table 8 shows the percentage of cells in agreement between the two evaluators, aggregated at the column level. The highest levels of agreement were on the points awarded for machine readability and for non-proprietary download formats; agreement was lowest on terms of use and Creative Commons licenses. The differences between evaluators have prompted us to clarify these assessment points. Additionally, we foresee the need for spot-checking or other quality-control measures as other evaluators are engaged. Whether the inventory can be maintained through a contributory, crowd-sourced process remains an open question.
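Cell-level agreement of the kind reported in Table 8 is simply the share of identical ratings between the two evaluators; a sketch with hypothetical evaluator ratings for one criterion:

```python
# Share of cells where two evaluators awarded identical points.
# The example ratings below are hypothetical.

def agreement_rate(first: list, second: list) -> float:
    """Fraction of positions where both evaluators gave the same rating."""
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / len(first)

# Hypothetical points for one criterion across the twenty data categories:
eval1 = [1, 0.5, 0, 1, 1, 0.5, 0, 0, 1, 1, 0.5, 0.5, 1, 0, 0, 1, 1, 0.5, 0, 1]
eval2 = [1, 0.5, 0.5, 1, 0, 0.5, 0, 0.5, 1, 1, 0, 0.5, 1, 0, 0, 1, 1, 0.5, 0, 1]
print(agreement_rate(eval1, eval2))  # 0.8
```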


Future Development

The results of the preliminary application of the Open Data Inventory demonstrate the value of an approach to assessing national statistical systems that is both broader in coverage and more intensively focused on the openness of the data sets. After further testing and refinement of the scoring rubric, we plan to engage volunteers from around the world to provide an assessment of at least 70 developing countries for the initial Inventory. Open Data Watch staff will provide training and quality control. With experience gained in the first full round of assessments, a process for crowd-sourced reports and assessments will be considered.

The Open Data Inventory will be produced annually. We will encourage feedback from governments and hope that the OD Inventory will become a basis for a dialog with national statistical offices. Although the OD Inventory has been developed with a primary focus on the statistical systems in developing countries, it could also be applied to those of high income countries. As results are released, the spreadsheets containing the source data will also be released showing notes and links to sources used for scoring. With the transparent spreadsheets it will be possible for users to develop their own weighting schemes and apply them to our data.
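A user-defined weighting scheme of the kind envisioned above could be applied to the released category scores along these lines; the category names, scores, and weights are hypothetical.

```python
# Applying a user-chosen weighting scheme to per-category scores.
# Category names, scores, and weights below are hypothetical.

def weighted_total(category_scores: dict, weights: dict) -> float:
    """Weighted average of category scores, normalized by total weight."""
    total_weight = sum(weights[c] for c in category_scores)
    return sum(weights[c] * s for c, s in category_scores.items()) / total_weight

scores = {"Population": 4.0, "National accounts": 5.0, "Pollution": 2.0}
weights = {"Population": 2.0, "National accounts": 1.0, "Pollution": 1.0}
print(weighted_total(scores, weights))  # (8 + 5 + 2) / 4 = 3.75
```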


 


Footnotes

  1. See, for instance, the Open Government Partnership: http://www.opengovpartnership.org/countries and the International Aid Transparency Initiative: http://www.aidtransparency.net/about/partner-country-perspectives
  2. See http://data.worldbank.org/
  3. Open Data Watch. 2014. Indexes of Data Quality and Openness: http://www.opendatawatch.com/Pages/MR-Indices.aspx
  4. United Nations, 2013. A New Global Partnership: Eradicate Poverty and Transform Economies Through Sustainable Development. http://www.un.org/sg/management/pdf/HLP_P2015_Report.pdf​​
  5. Laux, Richard and Richard Alldritt: UK Statistics Authority, 2009, Boundary issues in relation to UK official statistics. http://www.statisticsauthority.gov.uk/reports---correspondence/reports/conference-papers/boundary-issues-in-relation-to-uk-official-statistics.pdf
  6. See Open Data Watch list of National Statistical Offices (NSO) websites: http://www.opendatawatch.com/pages/NSO-list.aspx
  7. See http://dsbb.imf.org/pages/sdds/home.aspx
  8. See http://dsbb.imf.org/Pages/DQRS/DQAF.aspx
  9. See http://www.imf.org/external/NP/rosc/rosc.aspx
  10. See http://bbsc.worldbank.org​
  11. See http://opendefinition.org/od/
  12. The most recent version is CC BY 4.0. See https://creativecommons.org/licenses/by/4.0/
  13. Read more at http://www.opendatawatch.com/Pages/MR-Indices.aspx