4 Reasons You Should Use JSON Instead of CSV

Datafiniti
Knowledge from Data: The Datafiniti Blog
3 min readMar 16, 2014

--

Do you deal with large volumes of data? Does your data contain hierarchical information (e.g., multiple reviews for a single product)? Then you need to be using JSON as your go-to data format instead of CSV.

We offer CSV views when downloading data from Datafiniti for the sake of convenience, but we always encourage users to use the JSON views. Check out these reasons to see how your data pipeline can benefit from making the switch.

1. JSON is better at showing hierarchical / relational data

Consider a single business record in Datafiniti. Here’s a breakdown of the fields you might see

  • Business name
  • Business address
  • A list of categories
  • A list of reviews (each with a date, user, rating, title, text, and source)

Now consider a list of these product records. Each product will have a different number of prices and reviews.

Here’s how some sample data would look like in CSV (Datafiniti link):

And here’s that same data in JSON (Datafiniti link):

The JSON view looks so much better, right?

2. CSV will lose data

If you look closely at the CSV data above, you’ll notice that we have a set number of prices and reviews for each product. This is because we’re forced to make some cut-off for how many prices and reviews we show. If we didn’t, each row would have a different number of columns, which would make parsing the data next to impossible. Unfortunately, many products have dozens or even hundreds of prices and reviews. This means you end up losing a lot of valuable data by using the CSV view.

3. The standard CSV reader application (Excel) is terrible

Excel is great for loading small, highly-structured spreadsheet files. It’s terrible at loading files that may have 10,000 rows, 100+ columns, with some of these columns populated by unstructured text like reviews or descriptions. It turns out that Excel does not follow CSV-formatting standards, so even though we properly encode all the characters, Excel doesn’t know how to read that. This results in some fields spilling over into adjacent columns, which makes the data unreadable.

4. JSON is easier to work with at scale

Without question, JSON is the de-facto choice when working with data at scale. Most modern APIs are RESTful, and therefore natively support JSON input and output. Several database technologies (including most NoSQL variations) support it. It’s significantly easier to work with within most programming languages as well. Just take a look at this simple PHP code for working with some JSON from Datafiniti:

Further Reading

Check out these helpful links to get more familiar with JSON:

You can connect with us and learn more about our business, people, product, and property APIs and datasets by selecting one of the options below.

--

--

We provide instant access to business, people, product, and property data sourced from thousands of websites.