In this (somewhat delayed) installment of graphic Monday, I compare population and sex-ratio statistics from three different sources – the United Nations Population Division’s World Population Prospects, the World Bank’s World Development Indicators, and the U.S. Census International Data Base.
First, I compare statistics from all three sources regarding total world population. These are shown in the graphic below. All three sources have relatively similar predictions for the overall population of the world over time.
Then I look at sex-ratio estimates for the world population using data from the UN and World Bank (note: the US Census does not provide this data as an aggregate for the world). Sex-ratio is measured as the number of males divided by the number of females. To ease interpretation, this is reported here as a percentage, that is, the number of males per 100 females.
As is quite clear, the UN estimates for this figure are much smoother than the World Bank estimate. This is because the World Bank estimate here is derived from their measure “% total population female”. To obtain the World Bank sex-ratio from this, I multiplied the female percent statistic by the total population to obtain the estimated number of females. I then subtracted the estimated number of females from the estimated total population to derive the estimated number of males.
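The derivation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Stata code used for the graphics; the function name and the input values are hypothetical, and the female share is assumed to be expressed on a 0-100 scale.

```python
def sex_ratio_from_female_share(total_population, pct_female):
    """Return males per 100 females, derived from a '% female' figure.

    total_population: estimated total population (hypothetical input)
    pct_female: share of the population that is female, on a 0-100 scale
    """
    # Estimated number of females = female share x total population.
    females = total_population * pct_female / 100.0
    # Estimated number of males = total population minus estimated females.
    males = total_population - females
    # Sex-ratio reported as males per 100 females.
    return 100.0 * males / females


# Illustrative values only, not taken from any of the three databases.
print(sex_ratio_from_female_share(1000, 50))
```

Because the male count is obtained by subtraction, the resulting ratio depends only on the female percentage, so any rounding in that percentage carries straight through to the sex-ratio.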
When comparing the total population estimates by gender, the two datasets largely mirror one another, but again, the World Bank data is not as smooth due to the coarse method of derivation.
Rwanda as a Specific Case
This episode of graphic Monday was inspired by a conversation I had with a friend last week. His concern was focused on the gender gap in Rwanda’s population since the genocide. While exploring the data on this in more detail, I found that there are large discrepancies in how the US Census Bureau estimates Rwanda’s population when compared to the UN and World Bank. Let’s start with a graphic of the total population.
The World Bank and UN estimates are fairly consistent with one another until we reach the recent period. Starting in the late 2000s, the UN data remains fairly smooth while the World Bank data fluctuates rapidly. Meanwhile, the US data rarely conforms to the World Bank and UN estimates. Also note, all three datasets estimate a higher population when compared to the official Rwanda census figures for 2002 and 2012.
When looking at the data on gender, the differences between the US and international organization estimates are exacerbated. Here’s a graphic of the estimated sex-ratio for Rwanda over time for the three databases.
While again we see that estimates by the World Bank and UN are fairly close, the US Census estimate for the sex-ratio in Rwanda is much, much higher after the 1994 genocide. A t-test reveals that prior to the genocide period, the difference between the US and the UN and World Bank estimates is not statistically significant. After the genocide period, however, the US statistics are significantly different from the international organizations’ estimates at better than the 99% confidence level. Meanwhile, it is important to note, again, that all three of the databases estimate a higher sex-ratio when compared to the official Rwanda census (note: data for this statistic are only available for the most recent census).
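For readers who want to run a similar comparison, here is a minimal sketch of a two-sample t-test in Python. The post does not say which variant of the t-test was used, so the unequal-variance (Welch) form below is an assumption, the function name is hypothetical, and the numbers are made-up placeholders rather than values from any of the databases.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Uses Welch's unequal-variance form; this choice is an assumption,
    since the post does not specify the test variant.
    """
    # Squared standard error of each sample mean.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    # Difference in means scaled by the combined standard error.
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df


# Placeholder series standing in for yearly estimates from two sources.
source_a = [1, 2, 3, 4, 5]
source_b = [2, 3, 4, 5, 6]
t_stat, df = welch_t(source_a, source_b)
print(t_stat, df)
```

The t statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value. Splitting each series into pre-genocide and post-genocide years and running the test on each half mirrors the comparison described above.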
Analysis by gender reveals that the US data appears to estimate a higher male population of Rwanda than the other two databases and the Rwandan census.
Exploring two other cases: the United States and Uganda
This finding, that Rwanda population estimates from the US Census International Data Base are significantly different from UN and World Bank estimates, calls into question the validity of population measures. Which data are most accurate? It is difficult to tell. All three databases draw on similar sources – such as official census figures and the UN population estimates. Yet all three produce different results, possibly due to variation in estimation procedures.
Do these differences hold for other cases? To quickly test for this, I chose to look at data for the United States and Uganda. The US serves as a test to see if US data is closer to international databases when it comes to its own country. It also tests whether data are more aligned when it comes to developed countries. The second case, Uganda, is used to test whether a neighboring country to Rwanda experiences similar divergences in estimates, despite its different historical trajectory. In particular, I wondered if the differences were due to divergent estimates regarding the impact of the Rwandan genocide on population demographics.
Starting with the US data, the trend found in Rwanda seems to disappear. All three data sources predict similar total population, populations by gender, and sex-ratios. The US data on sex-ratio does seem to diverge again, but only in the ‘projection’ years.
Data for Uganda, on the other hand, show some of the same problems as Rwanda. While the total population projections are quite similar, since the 1980s, US data predicts a lower population for Uganda than international organizations. Note that all three databases are more in line with Uganda’s official census data than they were for Rwanda. Of the three, the US prediction remains closest to the official Uganda census. Unfortunately, data from Uganda’s 2012 census is not yet available to make the same comparisons.
When it comes to the sex-ratio of the population of Uganda, the data become quite divergent. While the official Uganda census data reflects an overall decline in the sex ratio of the population since the 1960s, the three datasets largely do not follow this trend. The US Census Bureau data appears to once again follow the Uganda census data most closely, but starting in the mid-1980s, the two diverge. Meanwhile, the UN and World Bank data are nearly identical and suggest a flat or upward trend in the sex ratio for Uganda’s population.
To better understand these differences, let’s look at the population breakdown by gender for Uganda during this period. As expected, of the three databases, the US Census data approximates the Uganda census data most closely. Starting with the 2002 census, however, it appears to estimate a lower female population than the Uganda census figures.
In sum, this exploration of population and gender demographic data from the UN, World Bank, and the US Census Bureau demonstrates that there are quite different ways to go about estimating population, even when relying on the same primary data sources. When looking at overall trends for the globe and specific trends for three cases, there appear to be stark differences generally and across cases. After such a superficial analysis, I am not left with enough confidence to suggest which dataset would be most useful for analysts. Ultimately, that decision will rest with the researcher.
Replication Information and References
If you would like to replicate the graphics in this post, the data are available in this Excel File. Note, you will need to import these data into Stata prior to creating the graphics. The Stata code for replicating the graphics is available in this PDF File.
Data for this post were obtained from the following sources:
- Rwanda National Institute of Statistics. 2014. 2012 Population and Housing Census (Provisional Results). Raw data available for download here.
- Uganda Bureau of Statistics. 2002. 2002 Uganda Population and Housing Census: Main Report. Report and data available here.
- United Nations Population Division. 2014. World Population Prospects 2012. Raw data available for download here.
- United States Census Bureau. 2014. International Data Base. Raw data available for download here.
- World Bank. 2014. World Development Indicators. Raw data available for download here.