Turning Obstacles Into Opportunities • Q&A With Datafiniti Customer, Datlinq

Datafiniti
Knowledge from Data: The Datafiniti Blog
Jun 26, 2018


Every obstacle with a product is an opportunity to improve and meet customer needs. Or, in the case of one Datafiniti customer, a challenge creates a path to an in-house solution.

Back in December, our team announced the arrival of our v4 API in the hope of providing customers with more intuitive functionality. While working with v3, Tom Lous, a data engineer at Datlinq, took matters into his own hands.

Datlinq is a foodservice and location database that helps brands gain market insight and connect with potential buyers using Datafiniti’s Business Data. To better integrate the Datlinq platform with the Datafiniti API, Lous built an open-source Scala wrapper.

Today, we are talking about feedback and the project’s process in a Q&A with Lous and Shion Deysarkar, CEO and Founder at Datafiniti.

Q: Tell me a little bit more about yourself and your role at Datlinq.

TL: I’m Tom Lous, I’m a Data Engineer at Datlinq. I’m responsible for creating the ingestion processing pipeline from all data sources to one huge data storage and applying machine learning tools. I also build APIs on top of that.

SD: That sounds like quite a lot of projects you have going on.

TL: The main thing is the big data pipeline where we collect a lot of data from Datafiniti, Facebook, Google, and other sources. We combine it all in the pipeline by matching it, sanitizing it, and putting it in a final data store to be accessed by our internal systems.

Q: When you started using the Datafiniti API, what were your initial impressions?

TL: It was pretty straightforward, the v3 API. I know you guys were still in development. There were some things I had to communicate and you got them fixed quite early on. I think some of the little hiccups, for all intents and purposes, were pretty useful. You improved a lot with the v4 API. It’s more standardized in the sense of security and flow.

Q: Are there any outstanding features of the API you wish were there?

TL: No. One of the main things we were looking for, Search by ID, is now implemented in the v4 API. What would be neat is if we could limit the size of some fields, for example when returning reviews for search and locations. For the Tower of London, it gives me 50,000 reviews, which is not very useful to us. It would be great if you could create views where you say, ‘Limit these fields to a max of and only use the latest versions’. That would be a nice feature.

Q: What motivated you to build a Scala wrapper with the Datafiniti API?

TL: That’s a good question. We were going to use this API in different parts of our pipeline: first of all in the ingestion part, but also for our data scientists to get little batches of samples, and maybe in some other places as well. The main reason is that, while the standard API is pretty straightforward, the download flow, especially authentication, was pretty complicated. It involves a lot of steps that you have to remember. I think putting it all in one single location is more advantageous, and it means our developers don’t have to think about how to implement the API authentication and flow. That being said, I talked to my CEO about it: ‘If we’re building this IP, it would be great if we just open source it’. It’s a good way to showcase that we’re part of the open source community and maybe help some people who also use Java or Scala in that fashion.

Q: Were there any other specific challenges that you faced when writing the wrapper?

TL: As I said, the download flow, especially in v3, was challenging. I had to solve it with polling constructs to see if it was ready, when it was ready, and when to start downloading individual files. I ran into some interesting issues. For example, the way you generated downloads depended on how short the information was and on the underlying Elasticsearch database. Sometimes it resulted in three separate download links and sometimes it resulted in 200 separate download links, especially if the dataset was pretty big. Since I did everything multithreaded, I spawned a different thread for each download. It jammed up my system because I spawned 200 threads and downloaded from AWS. My system didn’t like it, and AWS didn’t like it either. I had to provision some stuff: ‘If there’s more than X downloads, just use, at most, 20 threads’.
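
The thread cap Lous describes can be sketched with a fixed-size thread pool. The example below is only an illustration in Java and is not taken from the Datlinq wrapper, which is written in Scala. The class name `BoundedDownloader`, the methods `poolSize` and `downloadAll`, and the `fetch` function standing in for a real HTTP download are all hypothetical; the limit of 20 comes from the figure quoted above. It does not cover the polling step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class BoundedDownloader {
    // Assumed cap, based on the "at most 20 threads" figure in the interview.
    static final int MAX_THREADS = 20;

    // One thread per link, but never more than MAX_THREADS (and at least one).
    static int poolSize(int linkCount) {
        return Math.max(1, Math.min(linkCount, MAX_THREADS));
    }

    // Runs `fetch` for every link on a bounded pool; results keep the order of `links`.
    static <T> List<T> downloadAll(List<String> links, Function<String, T> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize(links.size()));
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String link : links) {
                Callable<T> task = () -> fetch.apply(link);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // blocks until that download has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a download failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> links = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            links.add("part-" + i); // placeholder names, not real download URLs
        }
        // Stand-in for a real download: returns the length of the link string.
        List<Integer> sizes = downloadAll(links, String::length);
        System.out.println(sizes.size() + " downloads on " + poolSize(links.size()) + " threads");
    }
}
```

With a pool like this, 200 download links are queued and worked through by 20 threads instead of each link getting a thread of its own.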

Q: Do you think handling the download process is the main benefit of using the wrapper over the API directly?

TL: If I have the library in my Scala […] I don’t have to think about the authentication flow. V4, actually, I think involves an extra step in generating the token, which is a good practice in my opinion, but if you want to deploy that in your software, you have to remember to first do this, then that, and where to store information, and this gives us a general approach. I think that’s the main benefit.

Q: What additional features would you like to add to the wrapper?

TL: Right now, what I return is pure JSON because I haven’t bothered with creating data constructs out of the result sets. From a purely functional approach, I would rather define models that can be used down the line, so I don’t have to parse or check the JSON. It’s more pure. The difficulty is that, depending on which view you’re using, the model may change. I’m not sure if on your end you can change the data type or the data structure as well, but if you do, the model will break. I’m a bit apprehensive about implementing that, but it’s something that’s on my wish list.

Q: How can other people or developers help with the project?

TL: Oh, that’s great. Just clone the repository and check it out. If you change stuff, make a pull request. I’m not promising that I will get back to you the next day, but I take every contribution seriously. Feel free to help.

At Datafiniti, our team strives to improve our products by actively engaging with our customers. We are excited about the initial feedback we have received and intend to incorporate this information into our development process for future designs. Stay tuned for more updates coming soon on additional 80legs products.

You can learn more about and contribute to Datlinq’s open source Scala wrapper here.

You can connect with us and learn more about our business, people, product, and property APIs and datasets by selecting one of the options below.

If you have any questions, you can contact your customer success representative or email our team at support@datafiniti.co.

Written by Nicholle Shaver, content marketer.


We provide instant access to business, people, product, and property data sourced from thousands of websites.