If your science involves working with large data sets, you’ve no doubt cursed your computer, your life, your job, etc., while spending endless hours cleaning messy data. If a human has ever touched your data, it’s likely to have outliers, text entry inconsistencies (like “University of California Berkeley” vs. “Univ. of CA Berkeley”), style errors, and more. And it’s also likely that you’ve either written scripts to clean up your data or risked messing everything up by using Excel and its limited undo button.
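For a sense of what one of those hand-written cleanup scripts tends to look like, here is a minimal Python sketch that groups the kind of text entry variants mentioned above. The abbreviation table and the function names are made up for illustration, and a real data set would need its own rules; this is not a description of how any particular tool works.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table (an assumption for this example only);
# a real data set would need its own, probably much longer, list.
ABBREVIATIONS = {"univ": "university", "ca": "california"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def group_variants(names):
    """Group raw entries that normalize to the same key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    return dict(groups)


entries = [
    "University of California Berkeley",
    "Univ. of CA Berkeley",
    "univ of california, berkeley",
    "Stanford University",
]
groups = group_variants(entries)

# Print each normalized key with the raw spellings that map to it.
for key, variants in groups.items():
    print(key, "<-", variants)
```

Even this toy version shows why such scripts get tedious: every new abbreviation or spelling quirk means another rule to write and test by hand.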
Google Refine, a tool recently released as open source by Google, promises to make the small group of people who deal with these sorts of problems (scientists and doctors among them) very happy. This completely free, open-source software equips users with tools to quickly, easily, and safely identify and fix problems in data sets.
Watch the video above and you’ll see how it might integrate into your data management routine. Also note that all data is stored locally on the user’s own hard drive, so the confidentiality risk of posting sensitive information to the cloud is not an issue.
Product: Google Refine…