{"id":89798,"date":"2024-11-27T07:50:41","date_gmt":"2024-11-27T07:50:41","guid":{"rendered":"https:\/\/outliereditor.co.za\/?p=89798"},"modified":"2025-11-17T17:38:23","modified_gmt":"2025-11-17T17:38:23","slug":"openrefine-part-1-installing-and-merging-datasets","status":"publish","type":"post","link":"https:\/\/outliereditor.co.za\/index.php\/2024\/11\/27\/openrefine-part-1-installing-and-merging-datasets\/","title":{"rendered":"OpenRefine Part 1: Installing and merging datasets"},"content":{"rendered":"\n<p>If you find yourself getting bogged down when you\u2019re working with large datasets in Excel or Google Sheets, OpenRefine might be the solution you&#8217;ve been looking for.<\/p>\n\n\n\n<p>In this tutorial, we&#8217;ll walk you through downloading OpenRefine and how to merge multiple datasets.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Watch the video<\/h2>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"OpenRefine Part 1: Downloading OpenRefine and merging datasets\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/zjNE6kzaKBw?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What is OpenRefine?<\/h2>\n\n\n\n<p>OpenRefine is a powerful, free tool designed to help data professionals clean, transform and merge complex datasets that traditional spreadsheet software cannot handle. Whether you&#8217;re working with research data, performance reports or any other large dataset, OpenRefine can simplify your data management process.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Download OpenRefine<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open your web browser. Search for \u2018OpenRefine\u2019 and navigate to the official download page.<\/li>\n\n\n\n<li>Select the version that matches your operating system (Windows, Mac, or Linux) and download the software.<\/li>\n\n\n\n<li>Once installed, clicking OpenRefine will open it in a new tab in your browser.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"537\" src=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-20.31.31-1024x537.png\" alt=\"\" class=\"wp-image-89800\" srcset=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-20.31.31-1024x537.png 1024w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-20.31.31-300x157.png 300w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-20.31.31-768x403.png 768w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-20.31.31-1536x806.png 1536w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-20.31.31-2048x1074.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Prepare your datasets for merging<\/h2>\n\n\n\n<p>You need to do a quick recon of your original datasets. Open them in a spreadsheet program like Excel or GoogleSheets and make sure that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The column headers are consistent<\/strong>: Your spreadsheets must have exactly the same column headings.<\/li>\n\n\n\n<li><strong>Matching column order<\/strong>: The columns must be in the exact same sequence for OpenRefine to merge the datasets correctly.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"492\" src=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.25.59-1024x492.png\" alt=\"\" class=\"wp-image-89801\" srcset=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.25.59-1024x492.png 1024w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.25.59-300x144.png 300w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.25.59-768x369.png 768w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.25.59-1536x738.png 1536w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.25.59-2048x984.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Pro tip: Preparing your files<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Download each dataset as a CSV (comma separated values) file. If working with multiple files, compress them into a single zip file.<\/li>\n\n\n\n<li>Select all the downloaded CSV files. Right-click and choose <strong>Compress<\/strong> to create a zip archive.<\/li>\n\n\n\n<li>Rename the zip file to something appropriately descriptive.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"645\" src=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.28.11-1024x645.png\" alt=\"\" class=\"wp-image-89802\" srcset=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.28.11-1024x645.png 1024w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.28.11-300x189.png 300w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.28.11-768x484.png 768w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.28.11-1536x967.png 1536w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-25-at-21.28.11-2048x1290.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Importing the data<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go back to OpenRefine in your browser.<\/li>\n\n\n\n<li>Click <strong>Choose File<\/strong> and upload your zipped dataset. Click the <strong>Next<\/strong> button.<\/li>\n\n\n\n<li>Review the file recognition settings: You\u2019ll need to make sure that OpenRefine correctly identifies the file format as CSV. If not, you can manually select the correct format from the dropdown menu.<\/li>\n\n\n\n<li>Ensure that \u2018Ignore first row\u2019 option is unchecked if your column headings are in the first row. Otherwise, selected the appropriate number of rows to ignore.<\/li>\n\n\n\n<li>Click <strong>Create Project.<\/strong><\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"569\" src=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.28.30-1024x569.png\" alt=\"\" class=\"wp-image-89803\" srcset=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.28.30-1024x569.png 1024w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.28.30-300x167.png 300w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.28.30-768x426.png 768w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.28.30-1536x853.png 1536w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.28.30-2048x1137.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Viewing and verifying the merged data<\/h2>\n\n\n\n<p>OpenRefine initially displays only 10 rows. You can adjust this in the dropdown menu \u2013 choosing up to 1,000 rows to see a substantial portion of your data.<\/p>\n\n\n\n<p>Scroll through the data to verify that the files have been merged correctly and that all the original files are represented in the merged dataset.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"524\" src=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.38.35-1024x524.png\" alt=\"\" class=\"wp-image-89804\" srcset=\"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.38.35-1024x524.png 1024w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.38.35-300x153.png 300w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.38.35-768x393.png 768w, https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/Screenshot-2024-11-26-at-13.38.35.png 1420w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">The full series<\/h2>\n\n\n\n<p><strong>Part 1: <\/strong>Installing OpenRefine and merging datasets | <a href=\"https:\/\/youtu.be\/zjNE6kzaKBw?si=evt5V83_wC2tjnru\">WATCH<\/a><\/p>\n\n\n\n<p><strong>Part 2:<\/strong> <a href=\"https:\/\/theoutlier.co.za\/how-to\/2024-11-28\/89816\/openrefine-part-2-removing-duplicates-and-using-version-history\/\">Removing duplicates and tracking changes<\/a> | <a href=\"https:\/\/youtu.be\/nPr6Puvdh5M?si=I3PgHj5ibm57x1-Z\">WATCH<\/a><\/p>\n\n\n\n<p><strong>Part 3:<\/strong> <a href=\"https:\/\/theoutlier.co.za\/how-to\/2024-12-05\/89888\/openrefine-part-3-fixing-inconsistencies-with-the-cluster-and-edit-tool\">Fixing inconsistencies with the cluster and edit tool<\/a> | <a href=\"https:\/\/youtu.be\/bnrPUr1lb-8?si=jYaU2Z1IFKOFRyAH\">WATCH<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Notebook<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/openrefine.org\/\">OpenRefine\u2019s official website<\/a> and <a href=\"https:\/\/docs.openrefine.org\/\">documentation<\/a><\/li>\n\n\n\n<li>Read more: <a href=\"https:\/\/theoutlier.co.za\/databites\/2023-09-07\/178\/hands-on-5-reasons-to-switch-to-openrefine-to-clean-data\/\">5 reasons to switch to OpenRefine to clean data<\/a><\/li>\n\n\n\n<li>Subscribe to <a href=\"https:\/\/www.youtube.com\/@OutlierAfrica\">The Outlier\u2019s YouTube channel<\/a> to be notified of new updates<\/li>\n\n\n\n<li>Sign up to <a href=\"https:\/\/theoutlier.co.za\/newsletter\">The Outlier\u2019s newsletter<\/a> for weekly tools, tips and data-driven insights<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you find yourself getting bogged down when you\u2019re working with large datasets in Excel or Google Sheets, OpenRefine might be the solution you&#8217;ve been looking for.<\/p>\n","protected":false},"author":7,"featured_media":89805,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[448,1387],"tags":[482,449,477,1294,480,479,462,485],"newsletter-post":[],"site":[],"class_list":["post-89798","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-how-to","category-the-outlier","tag-data-cleaning","tag-how-to","tag-messy-data","tag-open-source","tag-openrefine","tag-tips","tag-tools","tag-working-with-data"],"acf":{"big_number":"","big_number_caption":"","big_number_link":"","big_number_background":"","big_number_text_colour":"#000000","big_number_icon":false,"big_number_wide":"yes","featured_chart":{"ID":89805,"id":89805,"title":"OpenRefine1-DallE","filename":"OpenRefine1.webp","filesize":82618,"url":"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/OpenRefine1.webp","link":"https:\/\/outliereditor.co.za\/index.php\/2024\/11\/27\/openrefine-part-1-installing-and-merging-datasets\/openrefine1\/","alt":"Image created with Dall-E","author":"7","description":"","caption":"","name":"openrefine1","status":"inherit","uploaded_to":89798,"date":"2024-11-27 07:49:53","modified":"2024-11-27 07:50:22","menu_order":0,"mime_type":"image\/webp","type":"image","subtype":"webp","icon":"https:\/\/outliereditor.co.za\/wp-includes\/images\/media\/default.png","width":1024,"height":1024,"sizes":{"thumbnail":"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/OpenRefine1-150x150.webp","thumbnail-width":150,"thumbnail-height":150,"medium":"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/OpenRefine1-300x300.webp","medium-width":300,"medium-height":300,"medium_large":"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/OpenRefine1-768x768.webp","medium_large-width":768,"medium_large-height":768,"large":"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/OpenRefine1.webp","large-width":1024,"large-height":1024,"1536x1536":"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/OpenRefine1.webp","1536x1536-width":1024,"1536x1536-height":1024,"2048x2048":"https:\/\/outliereditor.co.za\/wp-content\/uploads\/2024\/11\/OpenRefine1.webp","2048x2048-width":1024,"2048x2048-height":1024}},"flourish_chart_id":"","flourish_sub_title":"","flourish_chart_width":"medium","is_newsletter_post":"No","post_style":"ch","show_on_front":"Yes","link_through":"Yes","chart_url":"","background_colour":"#0089AA","text_colour":"#FFFFFF"},"_links":{"self":[{"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/posts\/89798","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/comments?post=89798"}],"version-history":[{"count":9,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/posts\/89798\/revisions"}],"predecessor-version":[{"id":89910,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/posts\/89798\/revisions\/89910"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/media\/89805"}],"wp:attachment":[{"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/media?parent=89798"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/categories?post=89798"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/tags?post=89798"},{"taxonomy":"newsletter-post","embeddable":true,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/newsletter-post?post=89798"},{"taxonomy":"site","embeddable":true,"href":"https:\/\/outliereditor.co.za\/index.php\/wp-json\/wp\/v2\/site?post=89798"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}