At The Outlier, we turn data into compelling charts and stories covering everything from health and climate change to food prices and loadshedding. One of the most common questions we get is: Where do we find our data?
A data story usually starts in one of two ways:
- You have data that you want to analyse to uncover insights and create charts.
- You have a topic in mind but need to find data to support your story.
If you’re in the second camp, this guide is for you. We’ll focus on how to find reliable data online.
Many organisations, research institutes and international agencies make data publicly available – you just need to know where to look.
Evaluating data
To ensure you source reliable data on a particular subject, consider the following approaches:
- Look for reputable sources: Prioritise data from reputable, authoritative and peer-reviewed sources such as academic journals, research institutes, government agencies and professional organisations.
- Assess the source: Check who collected, analysed and published the data. Evaluate their reputation, expertise and credibility. Transparency regarding their methodology, assumptions and limitations is also key.
- Avoid unreliable sources: Avoid biased, outdated or unverified sources such as personal blogs, social media posts or commercial websites, especially those of companies that may have vested interests.
For example, mining employment data from the Department of Mineral Resources and Energy is published in its Mineral Economics Bulletin, so the figures can be traced back to an official source.
Types of data
Who shares data? Here are links to the data collected and published by some useful and reputable websites:
Governments and their agencies
South Africa:
- Statistics South Africa: The national statistical service with comprehensive data on various aspects of SA society, economy and environment. Reliable and regularly updated.
- Municipal Money: This platform offers detailed financial information about municipalities in South Africa, including budgets and expenditures – essential data for assessing municipal performance.
Elsewhere:
- Open data portals with extensive datasets: Data.gov (US), Data.gov.uk (UK), Data.gov.in (India), data.europa.eu (European Union)
- NASA: Scientifically validated datasets related to space and Earth sciences.
International organisations
- World Bank: Global economic data.
- United Nations: Datasets on global issues such as health, education and human rights.
- International Monetary Fund: Economic data helpful for understanding financial stability.
- World Health Organization: Current global health-related statistics, particularly relating to public health issues.
- International Energy Agency: Statistics on energy production, consumption and sustainability practices.
Academic and research organisations
- Google Scholar: Provides access to a vast range of scholarly articles across disciplines. It’s an excellent starting point for finding peer-reviewed research.
- ResearchGate: Repository for academic papers across various fields.
- Our World in Data: Oxford University’s excellent project, which compiles research from various sources and focuses on understanding trends in development issues.
Non-profits
- Open Data Institute: Based in the UK, it promotes the use of open data to drive innovation.
- OpenAfrica: A Code for Africa project that collects and shares open data relating to African development issues.
- World Resources Institute: Focused on environmental issues, its data relates to global sustainability practices and policies.
Industry and private-sector data
- Statista: Aggregates data from a wide range of industries. Some content is behind a paywall.
- Bloomberg Terminal: Leading source of financial markets data. Subscription-based.
- S&P Global: Extensive market intelligence. Subscription-based.
- Moody’s Analytics: Economic forecasts and risk analysis. Subscription-based.
- Nielsen: Consumer behaviour analytics and market trends. Paid (business agreement).
- Quandl (now Nasdaq Data Link): Financial, economic and alternative datasets; particularly useful for quantitative analysis in finance. Free and paid.
- Amazon Web Services Open Data: Large-scale datasets. Free.
- Google Trends: Search trends over time. Free.
- BP Statistical Review of World Energy: Global energy statistics, useful for analysing trends. Free.
Community and crowdsourced data
- Wikipedia: Valuable summaries of almost every topic, with references to primary sources that can be further investigated.
- Ushahidi: Citizen-generated data relating to governance and human rights issues.
Tools to find data
The key is knowing which organisation is likely to have the data you need. There are two ways to find it: using AI tools or smart keyword searches.
1. Using AI
AI tools such as Perplexity, Gemini, ChatGPT and Claude can help you quickly identify organisations that share data – often more efficiently than a traditional Google search. We like starting with Perplexity, as it provides multiple sources as well as descriptions of the data provided.
Example: Finding dam level data
Type this prompt into Perplexity: ‘List reliable and authoritative sources of data about South Africa’s dams’
It will likely give you an answer much like this one:
Here are some reliable and authoritative sources of data about South Africa’s dams:
- FAO Database on South African Surface Water Bodies: This database contains 3,843 records of dams in South Africa, including details on surface areas, capacities, and coordinates. More than 93% of the records come from the Dam Safety Record dataset.
- National Integrated Water Information System: Managed by the Department of Water and Sanitation, NIWIS provides access to various datasets related to water supply reliability and surface water storage.
- Resource Quality Information Services: This service offers data on water quality in rivers and dams across South Africa. They provide long-term trend reports and visualizations of water chemistry. You can also download an inorganic water chemistry dataset of rivers, dams and lakes in South Africa, prepared from DWS data from 1972-2011.
- OpenUp Data Resource: This resource provides historical dam level data sourced from government websites.
- DWS Dams Data: The Department of Water and Sanitation provides a map viewer with downloadable datasets related to dams across South Africa. The dams data is based on the 1:50 000 topographic series of National Geo-Spatial Information.
At The Outlier, we also look at other well-known local institutions which may have relevant water-related data, such as:
- DFFE, the department of forestry, fisheries and the environment
- The Water Research Commission
- Amatola Water
- AfriWX
- Water Wise
2. Using keywords
Another simple method is to use a search engine and include terms like data, dashboard or statistics in your searches.
Example: Searching for cholera data
Googling ‘cholera dashboard’ will give you the World Health Organization’s cholera dashboard as a top result. This provides authoritative and freely downloadable data.
When sourcing data on the web, always note the date of the information and how regularly it is updated. The WHO’s cholera dashboard, for example, is refreshed every two weeks.
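One way to make this habit concrete is to record when a source was last updated and how often it is meant to refresh, then flag anything that is overdue. The Python sketch below is only an illustration: the function name `is_stale`, the `grace` tolerance and the example dates are all assumptions made up for this guide, not values taken from any real source.

```python
from datetime import date, timedelta


def is_stale(last_updated: date, refresh_days: int, today: date, grace: float = 1.5) -> bool:
    """Return True if the data is older than its refresh schedule allows.

    `grace` is an arbitrary tolerance (an assumption): 1.5 means a source
    is only flagged once it is 50% overdue.
    """
    allowed_age = timedelta(days=refresh_days * grace)
    return (today - last_updated) > allowed_age


# Hypothetical dates, for illustration only.
today = date(2025, 3, 1)
print(is_stale(date(2025, 2, 20), refresh_days=14, today=today))  # 9 days old -> False
print(is_stale(date(2025, 1, 1), refresh_days=14, today=today))   # 59 days old -> True
```

For a source that refreshes every two weeks, this flags anything older than about three weeks. Adjust the tolerance to suit how time-sensitive your story is.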
Search shortcuts
If you’re looking for something specific, refine your search using advanced Google techniques:
- site: Limits results to a specific website.
- filetype: Finds specific document types like Excel spreadsheets.
- intitle: Searches for pages with specific keywords in the title.
- inurl: Finds pages with specific keywords in the URL.
- AND Include all search terms
- OR Include at least one search term
- " " Search for an exact phrase
- * Include variations of the search term, or use it as a wildcard
- ~ Include synonyms of the search term
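These operators can be combined in a single query. As a rough illustration, here is a small Python helper that assembles a search string from some of the shortcuts above. The function `build_query` and its parameter names are hypothetical, invented for this sketch. It only builds text to paste into a search box and does not call any search service.

```python
def build_query(keywords, site=None, filetype=None, intitle=None, exact=None):
    """Assemble a search string from plain keywords and optional operators."""
    parts = list(keywords)
    if exact:
        parts.append(f'"{exact}"')            # exact phrase
    if site:
        parts.append(f"site:{site}")          # limit results to one website
    if filetype:
        parts.append(f"filetype:{filetype}")  # limit results to one document type
    if intitle:
        parts.append(f"intitle:{intitle}")    # keyword must appear in the page title
    return " ".join(parts)


# The domain and search terms here are placeholders.
print(build_query(["dam", "levels"], site="example.org", filetype="xlsx"))
# dam levels site:example.org filetype:xlsx
```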
Example: Finding mining employment data
Searching ‘mining employment South Africa data’ brings up results from Statista, CEIC Data and Statistics South Africa. But what if you specifically need data from the department of mineral resources and energy?
Try adding:
- site:dmre.gov.za → Limits results to the department of mineral resources and energy’s website.
- filetype:xlsx → Finds Excel files containing relevant data.
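If you run the same kind of search often, you can turn the combined query into a link you can bookmark or share. The sketch below is a minimal Python illustration using only the standard library. The function name `search_url` is hypothetical, and the URL pattern is the familiar public Google search address, used here as an assumption rather than an official API.

```python
from urllib.parse import urlencode


def search_url(query: str) -> str:
    """Return a Google search link for the given query string."""
    # urlencode turns spaces into "+" and escapes characters such as ":".
    return "https://www.google.com/search?" + urlencode({"q": query})


# Combines the keywords with both operators from the example above.
print(search_url("mining employment site:dmre.gov.za filetype:xlsx"))
# https://www.google.com/search?q=mining+employment+site%3Admre.gov.za+filetype%3Axlsx
```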
Try it yourself!
Now that you have these search strategies, put them to the test. Finding the right data is often just a few smart searches away.