A starter guide to importing CSV files into databases

Takeaways

CSVs are commonly used to share data between programs that can't otherwise communicate with one another.
CSVs resemble spreadsheets, but unlike Excel files do not retain any metadata about formatting or formulas.
Many companies struggle to get CSV data into a database, because of the mismatch in schemas and the lack of data validation.
You can write DIY scripts to import data on an ad hoc basis, or use an off-the-shelf tool.

Understanding file formats is a vital part of handling data. CSV files, in particular, are a common sight if you routinely deal with databases or spreadsheets. In this starter guide, we take a closer look at CSV files and how you can import them into a database effectively.

Getting to Know CSV Files

CSV, or Comma-Separated Values, is a simple file format that uses commas to separate values. Despite the name, other delimiters, such as tabs or semicolons, can be used too. CSV files excel at representing structured collections of records, making them a favorite choice for data exchange.
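To illustrate the point about alternative delimiters, here is a minimal sketch using Python's standard csv module. The semicolon-delimited sample text and the read_rows helper are made up for this example; they are not part of any particular tool.

```python
import csv
import io

# Hypothetical sample: records separated by semicolons instead of commas.
semicolon_text = "name;country\nfriend;Narnia\nbuddy;Hyrule\n"


def read_rows(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = read_rows(semicolon_text, delimiter=";")
print(rows)
# [['name', 'country'], ['friend', 'Narnia'], ['buddy', 'Hyrule']]
```

The delimiter is passed explicitly here. Guessing it from the file contents is possible, but it is less predictable than asking whoever produced the file.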

Here's a snapshot of what a CSV file might look like:

contacts.csv

name,country,"phone number"
friend,Narnia,0-000-000
buddy,Hyrule,1-111-1111
companion,Westeros,2-222-2222

The CSV format is useful because it can represent any structured collection of records in a standardized way.

The CSV file format promises any consuming application that each record will be on a separate line, with commas signifying different fields. Since nearly every application can generate and interpret them, CSV files are considered the lowest common denominator of data exchange.
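As a rough sketch of what "one record per line, commas between fields" means for a consuming application, the contacts.csv sample from earlier can be read with Python's standard csv module. The file contents are inlined as a string so the example is self-contained.

```python
import csv
import io

# The contacts.csv sample, inlined as text so the example runs on its own.
contacts_csv = '''name,country,"phone number"
friend,Narnia,0-000-000
buddy,Hyrule,1-111-1111
companion,Westeros,2-222-2222
'''

# DictReader treats the first line as the header and yields one dict per record.
reader = csv.DictReader(io.StringIO(contacts_csv))
records = list(reader)

print(reader.fieldnames)  # ['name', 'country', 'phone number']
print(len(records))       # 3
print(records[0]["phone number"])  # 0-000-000
```

Note that the quotes around "phone number" in the header are part of the CSV syntax, not part of the field name, so the parser strips them.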

Need to get a lot of data out of Salesforce but the API is broken? You can always export a report as a CSV file. Need to import bulk data into QuickBooks? Upload a CSV.

CSV is the reliable workhorse of the data world. When there is no API connection between two applications, or when the API fails, you can almost always use CSV files to move data between the two.

CSV is the most important file format Dromo supports, because it is the lowest common denominator of data exchange on the web.
– Dave Fort, CEO at Dromo

How is a CSV file different from a spreadsheet?

Even technical folks frequently mix up CSVs and spreadsheets, because they are both tabular data formats representing a collection of records. However, there are several key differences between the two.

CSV is an open file format based on a common convention (documented in RFC 4180) that can be interpreted by almost every data application. Spreadsheet files, such as those created in Excel or Google Sheets, use application-specific formats that can also contain information about data display, manipulation, and visualization.

The difference between CSV and XLSX

The most common spreadsheet file format is XLSX. Microsoft developed this format and initially designed it to work only with Microsoft Excel, although other applications can now read and write it.

XLSX is versatile and powerful; if you need to save formulas, multiple worksheets, graphs, or formatted data, XLSX can handle it. On the flip side, most data applications do not support producing or consuming XLSX files directly because of the complexity of the format and challenges with interoperability.

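CSVs do one thing really well: simple data import and export. Because a CSV is plain text with no formatting, formula, or layout information, it is typically much cheaper to parse and process than a comparable XLSX file, which is one reason you are more likely to see CSV files stored in data archives than XLSX files.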

However, CSVs cannot retain any metadata about a collection. When users edit a spreadsheet in Excel or Google Sheets and then export it to CSV, they often experience a common frustration: all of the formulas and formatting have disappeared!
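A small sketch of what that loss looks like to a program reading the exported file. The exported text below is a hypothetical example in which the total column was a formula in the spreadsheet. After export, it is just literal text, and every value arrives as a string.

```python
import csv
import io

# Hypothetical export: "total" was a formula in the spreadsheet,
# but the CSV only keeps the value it evaluated to.
exported = "item,price,quantity,total\nwidget,2.50,4,10.00\n"

row = next(csv.DictReader(io.StringIO(exported)))

# Every field comes back as a plain string; no types or formulas survive.
types = {key: type(value).__name__ for key, value in row.items()}
print(types)

# Any typing or recalculation has to be redone explicitly by the consumer.
total = float(row["price"]) * int(row["quantity"])
print(total)  # 10.0
```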

When you need to retain robust metadata about tabular data, use a spreadsheet. When you need efficient storage or data transfer between applications, CSV is the better choice.

The problem with importing CSV to a database

Importing a CSV to a database presents a challenge as the CSV format lacks the database's structure and constraints. The process requires matching the implicit schema of the CSV with the explicit schema of your database, validating the data, transforming it as needed, then isolating and resolving errors.
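The mismatch can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the contacts table, its constraints, and the sample rows are invented for the example, and a production importer would need far more than this.

```python
import csv
import io
import sqlite3

# Hypothetical input: the second data row is missing a required country.
csv_text = "name,country\nfriend,Narnia\nbuddy,\n"

# The database side has an explicit schema with constraints (assumed here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    " name TEXT NOT NULL,"
    " country TEXT NOT NULL CHECK (country <> '')"
    ")"
)

rejected = []  # (line number in the CSV, reason) for rows the database refused
# Data rows start on line 2 because line 1 is the header.
for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
    try:
        conn.execute(
            "INSERT INTO contacts (name, country) VALUES (:name, :country)", row
        )
    except sqlite3.IntegrityError as exc:
        rejected.append((line_no, str(exc)))
conn.commit()

imported = conn.execute("SELECT name, country FROM contacts").fetchall()
print(imported)                         # [('friend', 'Narnia')]
print([line for line, _ in rejected])   # [3]
```

The CSV itself is perfectly valid; it is the database constraint that rejects the row. Isolating such rows, rather than failing the whole import, is what makes the errors fixable afterwards.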

Ultimately, there are only two ways to solve this problem.

1. Build an importer

Your database might have a strict dress code, but spreadsheets can always show up wearing any old t-shirt.
– Jeff Fiddler, Software Engineer at Dromo

You can always take the initiative and code an importer yourself. The first hard part is connecting the dots between the implicit schema of the CSV (e.g., the column headers) and the explicit schema of your database (e.g., the field names).
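One way this header-to-field matching might look is sketched below. The HEADER_TO_FIELD table, the database field names, and the normalize and map_headers helpers are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical mapping from normalized CSV headers to database field names.
HEADER_TO_FIELD = {
    "name": "full_name",
    "country": "country",
    "phone number": "phone",
}


def normalize(header):
    """Lowercase a header and collapse whitespace and underscores."""
    return " ".join(header.replace("_", " ").lower().split())


def map_headers(headers):
    """Return (CSV header -> database field mapping, list of unmatched headers)."""
    mapping, unmatched = {}, []
    for header in headers:
        field = HEADER_TO_FIELD.get(normalize(header))
        if field is None:
            unmatched.append(header)
        else:
            mapping[header] = field
    return mapping, unmatched


# A header row with inconsistent casing, extra spaces, and an unknown column.
headers = next(csv.reader(io.StringIO('Name,Country,"Phone  Number",Fax\n')))
mapping, unmatched = map_headers(headers)
print(mapping)
print(unmatched)  # ['Fax']
```

Normalizing before lookup absorbs small differences in casing and spacing, but any column outside the table still has to be surfaced to someone who can decide what to do with it.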

Then come more hard parts: enforcing validation logic, applying arbitrary transformations, and cleaning up errors efficiently.
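A minimal sketch of validation and transformation applied per row, with errors collected instead of raised. The phone pattern, the title-casing of names, and the clean_row helper are assumptions made for this example, not rules from any real schema.

```python
import re

# Assumed phone format for illustration: one digit, three digits, four digits.
PHONE_PATTERN = re.compile(r"^\d-\d{3}-\d{4}$")


def clean_row(row):
    """Return (cleaned_row, errors) for one record parsed from the CSV."""
    errors = []
    # Transformations: trim whitespace and title-case the name.
    cleaned = {
        "name": row.get("name", "").strip().title(),
        "country": row.get("country", "").strip(),
        "phone": row.get("phone number", "").strip(),
    }
    # Validations: collect every problem rather than stopping at the first.
    if not cleaned["name"]:
        errors.append("name is required")
    if not PHONE_PATTERN.match(cleaned["phone"]):
        errors.append(f"phone {cleaned['phone']!r} does not match D-DDD-DDDD")
    return cleaned, errors


rows = [
    {"name": " friend ", "country": "Narnia", "phone number": "0-000-000"},
    {"name": "buddy", "country": "Hyrule", "phone number": "1-111-1111"},
]
results = [clean_row(r) for r in rows]
valid = [cleaned for cleaned, errs in results if not errs]
invalid = [(cleaned, errs) for cleaned, errs in results if errs]

print(valid)         # [{'name': 'Buddy', 'country': 'Hyrule', 'phone': '1-111-1111'}]
print(len(invalid))  # 1
```

Keeping valid and invalid rows separate lets the good records proceed while the bad ones are sent back with a specific reason attached.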

And then what happens if a future user modifies the CSV template, changing the implicit schema ever so slightly? It's easy to write a workaround or ad hoc solution the first time, but the more hotfixes you hardcode, the more brittle your scripts become.

DIY importers come under pressure as you grow, which is the exact worst time to be stuck debugging legacy code.

2. Buy an importer

I've had to write more custom importers than I can count, and it's always a huge pain. Why should developers have to keep reinventing the wheel?
– Micah Buckley-Farlee, Head of Engineering at Dromo

Alternatively, you can use an off-the-shelf importer like Dromo. Dromo helps you quickly connect the dots between a CSV file and a database schema, and apply all the database constraints to the CSV itself.

With Dromo, you can build self-service pipelines that feed CSV data directly from customers into your application database with confidence.

Best of all, you can implement it in minutes without tying up precious engineering time.
