3 Reasons Why Businesses Should (Re)Consider Using Web Data

Businesses, large and small, are skeptical of using web data.

And I don’t blame them.

If a business needs useful web data, the process to acquire it is tedious, messy and expensive. Current business processes are super-efficient and have limited slack time, so businesses need to get to the right web data as quickly as possible. Once they find this web data, they need to ingest and integrate it quickly into their BI systems. And the web data itself needs to be accurate, comprehensive, and up-to-date.

In most situations where a company uses web data, one of these requirements is not met, thus limiting the impact of web data on the business.

It doesn’t need to be like this.


There are at least 3 reasons why data-driven businesses should consider using web data in a more diligent and pervasive manner:

Reason #1: Monitor Dynamic Market Trends Continuously

Traditional methods of monitoring market trends (quarterly or monthly updates, analyst reports, sales channels, etc.) are too slow to provide the data needed for emerging business models. Companies need to access key market events as and when they happen. The web can become the primary source for this key data.

Reason #2: Gather Comprehensive Competitive Intelligence Swiftly

As markets and consumers become ultra-segmented, a comprehensive approach to competitive intelligence is critical to a functional BI process. By understanding competitors’ pricing, consumer sentiment, or inventory levels, a business can consider an appropriate range of responses. There are few sources that are more comprehensive than the web.

Reason #3: Respond Rapidly, Intelligently and Profitably

Businesses can react or respond rapidly when they have a deeper understanding of a given situation, be it a competitive product price, uncertainty in the supply chain, or an impending consumer crisis. By incorporating web data into their critical information flows, businesses can now have more robust BI systems and better data to make informed management decisions.

These are but some of the many reasons why enterprises, large and small, should use web data. This is a fascinating topic for us at Datafiniti. I will be presenting some interesting examples at the upcoming Wolfram Data Summit on how businesses can leverage web data and have a positive impact on themselves and their ecosystems.

Look me up at the conference or watch this space for another update.

How I Stopped Being Afraid of AI

Reducing the burden of human labor using mechanical devices such as the wheel, the lever, the sail, the steam engine, etc. is not a new idea. In most cases, these devices helped societies vastly improve their quality of life and increase opportunities for their citizenry. However, using intelligent, independently thinking machines to help, enhance or substitute for human labor and, more importantly, human thought is a new phenomenon.

The 2014 short documentary film by C.G.P. Grey called Humans Need Not Apply thoughtfully discusses this impact of automation on humans and paints a rather bleak future of work.

There is an inherent unease about the kind of tasks intelligent machines are now performing while replacing human workers. This view is also shared by some rather influential figures in technology and science such as Ray Kurzweil and Elon Musk.

But, then, a half-empty glass is also half-full.

There is a different, more optimistic perspective on Artificial Intelligence – that there is a vast, untapped, positive impact it can have on humans and on the nature of work. That AI is another tool, albeit a more powerful and more impactful one, but a tool nevertheless, whose power is waiting to be harnessed by businesses. In fact, at Datafiniti, we leverage AI to perform a significant number of tasks that help create a better and more robust product.

There are also other perspectives on AI. Geoff Colvin of Fortune recently argued in his book, Humans Are Underrated, that people fearful that their jobs are at risk may be asking the wrong question when they ask what kind of work a computer will never be able to do. Instead, Mr. Colvin proposes that we ask which activities we humans, driven by our deepest nature or by the realities of daily life, will simply insist be performed by other humans, even if computers could do them.

We think the subject of AI and its application areas is so exciting that we have even proposed a panel at the upcoming 2016 SXSW Interactive titled How I Stopped Being Afraid of AI. We think this will bring a fresh perspective to this raging debate. Please click here to learn more and vote for us.

We are more likely to fear what we do not understand. Get to know AI and what it can do for your business.

Introducing Property Data from Datafiniti


Today is a BIG day at Datafiniti! We are very excited to announce the release of Datafiniti Property Data.

Datafiniti Property Data will give businesses instant access to the cleanest, most comprehensive, up-to-date web data, such as rental prices and rental inventory in the sharing economy, values of property investments, and overall real-estate market trends. At launch, Datafiniti Property Data will have over 1,000,000 listings and will cover the entire market of about 10,000,000 online listings by Q4 2015. You can learn more about the product here.

Datafiniti continues to rapidly increase its data coverage and currently has over 72,000,000 unique web data records, including:

  • 40,000,000+ unique product records, including product pricing and product reviews
  • 32,000,000+ unique business records, including location and reviews

Our Property Data helps our current customers in several ways, including strategic planning, marketing initiatives, sales, and business development. Datafiniti is making a significant impact on the businesses of our current customers. If your business deals with the Real Estate ecosystem, then you should let us know how we can help you.

Need for Speed: Solving the Real Estate Sector’s Need for Fresh Data

“There is gold in them thar hills” said Mulberry Sellers, a Mark Twain character lured by the California gold rush. Today, Mulberry would say “There is gold in them thar web.”

There is indeed a huge amount of publicly available data filled with critical information for your business. A recent TechCrunch article highlights the use of such data in the Real Estate sector:

.. up-to-date information is crucial for ensuring consumer trust. Sites like Trulia need data to be in near real-time. If, for instance, a consumer wants to view the value of homes in his or her neighborhood, Trulia can display the recent sale prices of homes rather than values from two or three years ago.

It is critical for the Real Estate sector to have the latest data, especially so in the accommodation market. The sharing economy has disrupted the hospitality industry. With companies like HomeAway, Booking.com, and Airbnb offering attractive rentals at affordable prices, incumbent players like Marriott and Hilton need to know the latest pricing and inventory levels of these rental spaces to stay competitive. Real estate portals like MyRentToOwn.com and other emerging businesses in the Real Estate sector ecosystem need access to the latest listing data to service their customers effectively.

As the supply of homes dwindles (see chart below) and consumers’ lifestyles get more connected, the need for the latest information is all the more critical. The data needs of these businesses are varied, but the data needs to be current and comprehensive.

Having the freshest online data remains a challenge for the Real Estate sector, with no satisfactory solution. Until now.

Using publicly available data, Datafiniti will soon introduce a solution that solves the need for clean, current and comprehensive real estate data. Providing instant access to this data is going to start a quiet revolution in the way Real Estate uses web data.

There is surely gold in them thar web.

How Do We Move Past Data Wrangling?

“Data Wrangling” needs to be a thing of the past. The business world has recognized the value of data (and big data) for several years now. Unfortunately, it’s still stuck in the quagmire of messy data. The New York Times recently published an article “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”. This is a depressing title, but it’s dead-on. Some key points from the article:

Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.

But if the value comes from combining different data sets, so does the headache. Data from sensors, documents, the web and conventional databases all come in different formats.

The second quote really stands out for me. It specifically mentions the frustration that comes with combining data sets sourced from the web. Of course, at Datafiniti, we’re working to remove this burden from the business analyst or data scientist. Beyond our company though, data providers need to embrace the challenge of providing clean data. With the exception of our own company, we haven’t seen this happen within the field of web data. Traditional web scraping companies have been content to license complicated software to their customers and let them or a partner company handle the sanitization that inevitably comes next. As the table stakes rise in the data world, data wrangling will be absorbed up the data value chain, and businesses will expect clean data to be provided upfront.

At Datafiniti, we’re excited to not only see this evolution, but be a driving force for it. Let’s go from 80% data wrangling / 20% insights to 0% data wrangling / 100% insights. Let’s make data wrangling a thing of the past.

How Web Data Can Help Your Job Search

Last week some friends of ours visited Austin. Like so many people, they were moving to this amazing city for new jobs. One of them was on the verge of accepting a job at Whole Foods, and the other was beginning his search. Since his wife’s office would be downtown, he wanted to find other companies that were located downtown. Since his background was in software development, he needed to find a company that was in technology.

Surprisingly, getting a list of technology companies in a specific geographic area is not an easy thing to do, at least with traditional tools like Google, LinkedIn, etc. My friend had already exhausted all the normal options for this information and he knew his list of companies was woefully small.

As he described his problem to me, I knew I could help. After all, Datafiniti customers face the same problems before they come to us, but on a larger scale. They know the information is on the web, but they don’t know how to compile it quickly and easily. With a smile on my face, I told him I could get his list right away.

I opened up my laptop and issued a request to Datafiniti:

Screen capture from my API client


Within a few minutes, I had over 1,000 businesses in a file for him, ready to sort through. Since the data included zip codes, he could easily narrow the list down to the companies located in downtown Austin. Armed with the websites, he could start going through the list and finding career pages, job openings, and more. Here’s a screenshot of that file:


It was incredibly gratifying to use Datafiniti to help a friend with his job search. It just goes to show how many potential applications there are for clean, organized web data.
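
For readers who want to try the zip-code filtering step described above, here is a minimal sketch in Python. It is not the actual Datafiniti export format: the column names (`name`, `postalCode`, `website`), the sample rows, and the choice of 78701 as the downtown Austin zip code are all assumptions made for illustration.

```python
import csv
import io

# Assumption: 78701 is treated as the downtown Austin zip code for this sketch.
DOWNTOWN_ZIPS = {"78701"}


def downtown_businesses(csv_text, zips=DOWNTOWN_ZIPS):
    """Return the rows whose postal code falls in the given set of zip codes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Compare only the first five characters so ZIP+4 values still match.
    return [row for row in reader if row["postalCode"].strip()[:5] in zips]


# Made-up sample rows standing in for the exported file (hypothetical columns).
SAMPLE = """name,postalCode,website
Example Software Co,78701,https://example.com
Sample Analytics LLC,78758,https://example.org
Demo Labs Inc,78701-1234,https://example.net
"""

for business in downtown_businesses(SAMPLE):
    print(business["name"], business["website"])
```

With a real file, the same function could be fed the file's contents, and the resulting rows give a short list of websites to check for career pages.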

Fighting Human Trafficking with Web Data

One of the central tenets of our work at Datafiniti is that web data has the potential for tremendous positive change in our world. We recently came across an article that demonstrates this perfectly, entitled “The escort database that combats human trafficking”.

The article talks about DIG (Domain-specific Insight Graphs), which crawls the entire web and converts content into data (sound familiar?). In this case, the data is a collection of markers to help identify human trafficking activity on the web and track down missing persons.

The creators of DIG highlight many of the challenges with and benefits of web data. Specifically:

The internet contains seemingly limitless information, but we’re constrained by our ability to search that information and come up with meaningful results.

The UK’s Human Trafficking Centre identified 2,255 potential victims of human trafficking in 2012, and the Missing Persons Advocacy Network estimated 200,000 US children are at high risk for trafficking into the sex industry. Better tools to address the unwieldy problem of police scouring the entire web for clues are an obvious priority.

These observations mirror the issues we tackle every day at Datafiniti. Although the data vertical is different, the challenges and approaches are incredibly similar.

It’s very exciting to see others recognize the potential of web data and use it for such tremendous social good as this. We’re sure to see many other applications benefiting society as web data becomes more and more accessible.

How Web Data Will Make Business Intelligence Smarter Than Anything We’ve Seen So Far

In my past two posts, I touched on the nature of web data and the immense potential it can unleash if businesses leverage it appropriately. In this post, I’ll try to offer a glimpse into the future of the business intelligence applications that become possible with quality web data: the applications beginning to take flight, as well as their evolution in the near future.

Always-On Pricing Intelligence

What’s Happening Now

Visibility into online prices has always been murky. With its quality web data, Datafiniti is finally shedding some light on the vast universe of online product listings. Armed with our quality web data, customers have been able to audit their brand merchandising, get instant access to online product assortment, and compare their online reviews to those of their competitors.

What We’ll See Next

As the window opens on online product data, retailers are going to discover unseen opportunities. By accessing online pricing and reviews from multiple websites almost instantly, a retailer will find gaps in their competitors’ offerings or find where new markets are surfacing. Brands will also instantly react to positive or negative reviews, whenever and wherever they show up, resulting in a better brand experience and effective customer service for consumers.
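
To make the idea of finding gaps concrete, here is a rough sketch in Python. It assumes pricing data has already been collected into simple product-to-price mappings; the product IDs, prices, and the `compare_listings` helper are all hypothetical and only illustrate the comparison, not any particular product feature.

```python
def compare_listings(ours, theirs):
    """Compare two {product_id: price} mappings.

    Returns (gaps, cheaper_elsewhere):
      gaps: product ids we list that the competitor does not.
      cheaper_elsewhere: {product_id: price difference} where the
        competitor's price is lower than ours.
    """
    gaps = sorted(set(ours) - set(theirs))
    cheaper_elsewhere = {
        pid: round(ours[pid] - theirs[pid], 2)
        for pid in ours
        if pid in theirs and theirs[pid] < ours[pid]
    }
    return gaps, cheaper_elsewhere


# Made-up prices for illustration only.
OUR_PRICES = {"sku-1": 19.99, "sku-2": 5.49, "sku-3": 42.00}
COMPETITOR_PRICES = {"sku-1": 17.99, "sku-3": 45.00}

gaps, cheaper = compare_listings(OUR_PRICES, COMPETITOR_PRICES)
print("Not offered by competitor:", gaps)
print("Competitor undercuts us on:", cheaper)
```

Run regularly against fresh data, a comparison like this is one simple way the "always-on" part of pricing intelligence could work.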

Reputation Management Analytics

What’s Happening Now

One of the early use cases we’re seeing is reputation management for businesses. Franchises like Starbucks, Subway, and others have thousands of locations around the world. Every day reviews are posted online for these locations, and analysts at these firms need visibility into that activity. Traditionally this visibility has been anything but real-time. Now with Datafiniti providing regular updates to reviews across multiple sites, our customers are approaching instant visibility.

What We’ll See Next

As the review data provided by Datafiniti becomes layered with sentiment analysis, businesses will get a real-time pulse on customer moods. We’ve seen this happen with access to Twitter data, but we haven’t seen it across multiple websites or even online data sources. In other words, we’ll go beyond tunnel-vision and move to a wide-angle lens of customer sentiment across the Internet.
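
As a rough illustration of layering sentiment on top of review data from several sites, here is a toy sketch in Python. The word lists, review records, and field names are invented for the example, and counting keywords is a deliberately simple stand-in for a real sentiment model.

```python
from collections import defaultdict

# Tiny illustrative word lists; a real system would use a trained model.
POSITIVE = {"great", "friendly", "fast", "clean"}
NEGATIVE = {"slow", "rude", "dirty", "cold"}


def score(text):
    """Score one review: count of positive words minus count of negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_location(reviews):
    """Average the review scores per location, pooling every source site."""
    scores = defaultdict(list)
    for review in reviews:
        scores[review["location"]].append(score(review["text"]))
    return {loc: sum(vals) / len(vals) for loc, vals in scores.items()}


# Made-up reviews from two hypothetical sites.
REVIEWS = [
    {"source": "site-a", "location": "Store 1", "text": "Great coffee, friendly staff!"},
    {"source": "site-b", "location": "Store 1", "text": "Service was slow."},
    {"source": "site-b", "location": "Store 2", "text": "Dirty tables and rude staff."},
]

print(sentiment_by_location(REVIEWS))
```

The point of the sketch is the pooling step: because every review shares the same fields regardless of where it was posted, one pass over the data gives a per-location view across all sites.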

Complete Snapshots of Sales Prospects

What’s Happening Now

Supplementing sales leads with external data has been around for a while now. Instant access to web data takes it to another level. Datafiniti customers are getting access to investment activity, news articles, and more for companies ranging from Fortune 500s to startups. By getting timely information like this, sales personnel can obviously make more informed decisions and better segment their sales targets.

What We’ll See Next

With each data point added to a sales lead, the sales process becomes more efficient. Web data will make hyper-targeted sales leads a reality, with sales personnel knowing everything from which person to contact to what products the company already uses. As the sales process reaches this kind of hyper-efficiency, the business sales cycle will become incredibly tight.

Bounding the infinite web is a challenging and inspiring task. Using the latest technology to help businesses leverage quality web data will change the nature and the future of business. The above examples of business intelligence applications are just a small glimpse into what becomes possible with quality web data. What new applications will arise out of instant access to web data? What impact will it have on business and society? As providers of the data that will power this future, we are extremely excited to be part of this progress and eager to see what others will develop.

If you haven’t already, register for our presentation during NewCo ATX. I’ll be showing some examples of the above applications, along with other great illustrations of the power of web data.

The Potential of Instant Access to Web Data

In my last post, I asked the question “Is Web Data Possible?” At first, this question may seem obvious, but closer inspection of the challenges in making web data consumable makes it apparent how difficult the problem really is. It also highlights why it hasn’t happened yet, despite many attempts to do so.

At Datafiniti, we’re making web data truly available for the first time, and we’re fascinated by the possibilities that opening up such a data source represents. What would you do if you could get instant access to all web data? It’s a question that touches on the possibility of accessing almost all human experience and knowledge instantly. It’s incredibly exciting, but also difficult, to think about its impact.

Where We Are Now

The concept of instant access to all web data is still in its infancy. Businesses are already realizing the fruits of data-driven processes and decision-making. Most of this has occurred by using information that’s already available from internal systems – CRMs, SCMs, ERPs, etc. But as more phases of the customer’s journey go online, more of that customer’s data is native to the web. This has rapidly resulted in the web becoming the largest repository of customer preferences, interactions, and comments, causing Doug Laney of Gartner to comment that the web is the largest database for any company.

“Web scraping” is how most people refer to accessing web data, but this method is incredibly incomplete and error-prone. It doesn’t produce web data in any usable sense. It just produces a simple copy-and-paste log file. Without any refinement through sanitization, aggregation, and other data enrichment techniques, it provides a poor representation of the web data needed by most organizations. Yet despite the poor data quality it produces, it does provide some value and is a popular choice for acquiring web content.

The Next Phase

So, if current approaches of using a small sliver of web data are already providing some utility, what could the potential of instantly accessing ALL web data hold? Right now, it’s difficult to forecast its impact, but we know it will be huge.

The most immediate effect is obvious: web data will significantly improve any business’ ability to react to market changes.

Businesses that thrive are those that are nimble, efficient and responsive to the market. However, all of that is only possible if businesses can access comprehensive information on their customers’ motivations, competitors’ offerings, and overall market ecosystem. Unfortunately, this data, when available, is often incomplete and not current. One way to supplement this critical data set is to leverage web data. The large aggregation of consumer and competitor web data will provide insights that internal company data collection methods would be hard-pressed to deliver. Web data fills the large data gap that exists today for almost every business. Filling that gap means better insight into customers, competitors, and the market as a whole.

Like I said, all of the above is the immediate effect of web data. What comes next has the potential to change how our society as a whole behaves.

We’ve already seen how enabling instant access to single points of web content has revolutionized our society. Google has effectively made the web an extension of every person’s own knowledge. Now apply this same concept to businesses. What happens when the web is an extension of every business’ own database? There is a next generation of applications and analytics waiting to be imagined and released once web data is a reality.

How You Can Learn More

I’ll be sharing some possible ideas and prototypes for this analysis during the upcoming NewCo Austin event. Register here to attend our presentation on May 29th, 2015 at 9:30 am at our downtown offices. We’d love to have you over!

Bounding the Infinite: Is Web Data Possible?

The infinite.

We can understand the concept but never truly appreciate the scope. Yet we as a people have created something infinite: the Internet. At over 45 billion known web pages, with an exponential growth rate, the Internet is effectively infinite. It contains almost every conceivable piece of information, an endless supply of content, and – potentially – an infinite source of data. Data on businesses, products, real estate, people, and much more all exist on the Internet. Applications that could leverage this data would provide an enormous amount of value to individuals, businesses, and society.

Unfortunately, leveraging web data has so far been unsuccessful. Although a tremendous amount of value lies within web data, it can’t be used because it needs a consistent structure to make it consumable. The source code representation of a product listing on Amazon has almost no structural overlap with the same product listing on Walmart. The value of web data will manifest once you can tap into both listings, and millions of others, without requiring any additional translation from raw source to consumable information.
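
To show what "consistent structure" means in practice, here is a small sketch in Python that maps the same product, as it might appear on two different sites, onto one shared schema. The raw record shapes, field names, and mapping functions are all hypothetical; they are not taken from any real retailer's pages or from Datafiniti's own pipeline.

```python
def from_site_a(raw):
    """Map a hypothetical site-A listing onto the shared schema."""
    return {
        "name": raw["title"].strip(),
        "price": float(raw["price"]["amount"]),
        "source": "site-a",
    }


def from_site_b(raw):
    """Map a hypothetical site-B listing onto the shared schema."""
    return {
        "name": raw["productName"].strip(),
        # Site B stores the price as a display string such as "$1,299.00".
        "price": float(raw["displayPrice"].replace("$", "").replace(",", "")),
        "source": "site-b",
    }


# The same made-up product, shaped differently by each site.
SITE_A_RAW = {"title": " Example Laptop ", "price": {"amount": "1299.00", "currency": "USD"}}
SITE_B_RAW = {"productName": "Example Laptop", "displayPrice": "$1,249.50"}

listings = [from_site_a(SITE_A_RAW), from_site_b(SITE_B_RAW)]

# Once both records share one schema, comparing them is a one-liner.
cheapest = min(listings, key=lambda item: item["price"])
print(cheapest["source"], cheapest["price"])
```

Two mapping functions are easy to write by hand. The difficulty described above comes from needing one for every site, and keeping each of them working as those sites change.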


This is exactly what we’ve done at Datafiniti. By providing a single database of web data, we’ve enabled businesses to leverage information from across the Internet in a standardized, easy way. With a single API call, you can access over 50 million records on businesses, products, and properties sourced from hundreds of websites. We continue to increase the size of our data, the variety of sources, and the types of available data on a daily basis.

During the upcoming NewCo Austin event, I’ll be speaking about how we make this possible and what applications we’re enabling by making web data easily consumable. If you’d like to attend our presentation, please register here or contact us. We’d love to share our vision for the possibilities of web data with you!