The Effect of Big Data on Recommendation Quality: The Example of Internet Search

By Maximilian Schaefer, Geza Sapi, and Szabolcs Lorincz

Machine learning applications are becoming increasingly important in many industries. Logistics companies rely on them for supply management, banks for evaluating credit default risk, and search engines for matching search results to search requests. All these applications rely on data to “train” their underlying algorithms.

Advances in the field of artificial intelligence combined with falling costs for computing, data storage, and, most notably, data collection have consequences for competition policy. Many observers fear that industries in which data and machine learning play an important role may tip toward monopolies. This conjecture is often rationalized by the “positive feedback loop” hypothesis, which states that more data leads to improved quality, which attracts more customers, who provide more data, which increases quality, and so on.

Despite the ongoing policy debate on how to assess the role of data in competition policy, the empirical literature on the topic is surprisingly scarce. In this paper, we contribute to a better understanding of the role of data in service quality by looking at the example of search engines. Most notably, we elaborate on an identification strategy to disentangle the impact of data from the impact of other factors, such as the quality of the algorithm.

We use query logs from the Yahoo! search engine to analyze how data accumulation on specific search terms affects the quality of the search results for those terms. Our analysis shows that economies of scale from data increase with the average amount of personalized information the search engine has on the users who search for a given term. This insight lends support to initiatives that enable users of IT services to easily carry their personal data to other service providers as a way to mitigate potential market power.

The full research paper is available as DIW Discussion Paper No. 1730 (open-access PDF download).