Tag: working with data
-
OpenRefine Part 2: Removing duplicates and using version history
OpenRefine is one of The Outlier’s favourite tools for working with large datasets. This powerful open-source program is ideal for cleaning messy data. In this post, we focus on two essential features: removing duplicates and using version history to keep track of your changes.
-
OpenRefine Part 1: Installing and merging datasets
If you find yourself getting bogged down when you’re working with large datasets in Excel or Google Sheets, OpenRefine might be the solution you’ve been looking for.
-
5 reasons to switch to OpenRefine to clean data
OpenRefine has a pretty steep learning curve, but it’s definitely worth the effort if you need to clean large amounts of very messy data.
-
Simple guide to scraping data from PDFs
Papers, PDFs and poorly scanned documents are where most data journalism projects begin. Instead of typing everything into a new spreadsheet, here’s how to use Adobe Acrobat DC to do the hard work for you.