Small Data Research Challenges


Increasingly, the bottleneck in data processing is not the rate at which we can produce results, but the rate at which those results can be communicated to a human who must act on them. We need better ways to visually summarize data, communicating the interesting features of large, diverse datasets. We need better ways for non-programmers to interact with data, and we need to scale up existing tools for visual analytics. We need to help people identify interesting features in complex datasets, and then guide them through validating those features.

Analytics at the Edge

Data analytics is moving to the edge. The average smartphone processes almost 180,000 queries per day. IoT devices preprocess data, summarizing and filtering it before sending it to the cloud for deeper analysis. Even big data challenges are increasingly being tackled by small, battery-operated devices rather than big, beefy centralized servers. As a result, long-standing assumptions about optimization goals, system capabilities, and workload characteristics are changing. Latency and power use become more critical, since each user has their own personal database instances. Usage patterns become more chaotic and variable. Power and performance constraints shift over time, for example when a device is plugged in. In short, we need to better understand the limitations, capabilities, and constraints of data management at pocket scale.
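The summarize-then-ship pattern described above can be sketched as follows. This is a minimal illustration under assumed parameters, not a reference design: the function name `summarize_on_device`, the fixed 60-reading window, and the `alert_above` threshold are all hypothetical, chosen only to show how a device might replace raw readings with compact per-window aggregates while still flagging the few readings that deserve immediate attention.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Tuple


@dataclass
class WindowSummary:
    """Compact aggregate that stands in for one window of raw readings."""
    start: int       # index of the first reading in the window
    count: int
    minimum: float
    maximum: float
    average: float


def summarize_on_device(readings: Iterable[float], window: int = 60,
                        alert_above: float = 100.0
                        ) -> Tuple[List[WindowSummary], List[int]]:
    """Hypothetical edge step: split readings into fixed-size windows,
    keep only per-window aggregates, and separately return the indices
    of individual readings above `alert_above`."""
    values = list(readings)
    summaries: List[WindowSummary] = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        summaries.append(WindowSummary(start, len(chunk), min(chunk),
                                       max(chunk), mean(chunk)))
    # Filtering: only out-of-range readings would be forwarded individually.
    alerts = [i for i, v in enumerate(values) if v > alert_above]
    return summaries, alerts


if __name__ == "__main__":
    readings = [20.0] * 119 + [150.0] + [21.0] * 60
    summaries, alerts = summarize_on_device(readings, window=60)
    # prints: 180 readings -> 3 summaries; alerts at [119]
    print(len(readings), "readings ->", len(summaries),
          "summaries; alerts at", alerts)
```

In this sketch, 180 raw readings collapse to three summaries plus one flagged index; the cloud would receive the aggregates and only the flagged readings individually, trading detail for bandwidth and battery life.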

The Curse of Small Data

The big data mentality says that, given enough data, it’s always possible to dig more signal out of the noise. Unfortunately, this is not always true. Tools designed for big data can fail spectacularly when data gets sparse. The consequences are significant: even well-intentioned data scientists can easily be led into p-hacking their data, trying analysis after analysis until one crosses a significance threshold by chance. A major challenge of the small data era is coping with the scale of what we don’t know and can’t prove conclusively given the data we have available.
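One common route into p-hacking is running many tests against the same small dataset. The sketch below assumes k independent tests, every null hypothesis actually true, and the conventional 0.05 threshold. Under those assumptions the chance of at least one spurious "significant" result is 1 − (1 − 0.05)^k, which is already about 0.64 for k = 20. The simulation cross-checks the formula by drawing null p-values directly, relying on the fact that they are uniform on [0, 1] for a continuous test statistic. The function names are illustrative only.

```python
import random


def chance_of_false_discovery(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests of
    true null hypotheses lands below `alpha` purely by chance."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate(num_tests: int, alpha: float = 0.05, trials: int = 100_000,
             seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability. Under a true null
    with a continuous test statistic, p-values are uniform on [0, 1],
    so we draw them directly instead of generating data and testing it."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A "discovery" happens if the smallest p-value clears the threshold.
        if min(rng.random() for _ in range(num_tests)) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(f"{k:>3} tests: exact {chance_of_false_discovery(k):.2f}, "
              f"simulated {simulate(k, trials=20_000):.2f}")
```

The point is not the exact numbers but the trend: each additional look at the same sparse data raises the odds that some pattern appears "significant" when nothing is there.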

Putting the User in Control

Issues of privacy, reproducibility, transparency, etc.

Personal Data Interactions

Data is becoming personal. I can wear a device that tracks my vital statistics. I can have devices in my home that track temperature and humidity and warn me when something goes wrong, from plants that need watering to a broken sump pump. My pocket holds an extensive database of the people I’ve interacted with over the past two decades, along with all of my appointments and communications. Top this off with information that I can download from Google, Facebook, and other online sources, and I have a veritable gold mine of information about me. How can I leverage all of this information?