Subscribe To Our Newsletter

Get tips and tools to tell your data story better.

No, thanks

 In Articles, DataBlog, How To, Link, Services

Last week we talked about using Tabula to free data from PDFs and get it into a workable state. Sometimes when extracting data using tools such as Adobe Acrobat, Tabula or simply cutting and pasting from documents and websites, you get the data, but it’s not quite in the format you need to work with. Often it comes in a format that isn’t quite recognized by Excel as a well-formatted spreadsheet, because Excel may condense several columns of data into a single column of text.

Excel includes an option called “Text to Columns” which is one of the secret weapons of data productivity in Excel. It’s flexible, easy to use and has saved me hundreds of hours in my years of working with data. For this demonstration, we’ll use the data we extracted from the PDF last week. Only, instead of choosing the “Download CSV” option in Tabula, we’ll choose the “Copy to Clipboard” option. (However, any copied data will work.)

The “Text to Columns” command puts each record in a separate column, dividing the data into columns, as well as getting rid of unwanted delimiters that often come with data that’s cut and pasted from the internet.

Step 1: Paste the unformatted data into an Excel workbook

2015-02-16 Text to Columns in Excel 1

Step 2: Select the column of unformatted data and choose “Text to Columns” from the Data menu.

2015-02-16 Text to Columns in Excel 2  Step 3: Choose the “Delimited” option in the Text to Columns Wizard 

2015-02-16 Text to Columns in Excel 3Step 4: Select each of the delimiters you’d like to use to separate your data

You can choose from the preset delimiters such as Tab, Semicolon, Comma, and you can also enter any special character or letter that your data contains in the “Other” box. This trick is quite handy if you have nuisance material in your copied data, such as extra dashes or colons. Simply enter them in the “Other” delimiter box and they will be separated from your data by the tool. 

2015-02-16 Text to Columns in Excel 4Step 5: Look at how Excel is suggesting your reformatted data will be set up.

If this is how you want your data to appear when it is separated into columns, proceed. If it’s not, play a bit with the selection of delimiters in the menu above. Most times, with a little experimenting, you can get your data to be reformatted exactly how you’d like it.

2015-02-16 Text to Columns in Excel 5
Step 6 Click Finish

Your unusable data is now usable! 

2015-02-16 Text to Columns in Excel 6

Datassist has available a number of “How to’s”, resources, and articles sharing what we find useful in the field of data science for data journalism organizations, non-profits, and business organizations. As you have time, you might look through previous Datassist blogs, such as our data series and the blogs about data-driven journalism last year, including Smart Data-Driven Journalism Tools.

[dismissable_alert class=”alert-info” ]…and take time right now to:

  • Sign up for monthly data science and visualization resources.
  • Take a look at some of the work we have done.
  • Friend us on Facebook for a personal connection with what we do.
  • Follow us on Twitter for the latest news.
  • Get inspired with us on Pinterest.[/dismissable_alert]
Recommended Posts

Start typing and press Enter to search