Real-Time Analysis of High-Volume Structured and Unstructured Data

Wednesday 25 - 17:20

Wing-Hong Andrew Ko and Thorvald Natvig, Medallia

Summary: Medallia's OLAP technology allows us to dynamically aggregate billion-record data sets with very high dimensionality in seconds. At a high level, our in-house OLAP engine uses just-in-time bytecode generation to transform our dataset into an optimized OLAP cube for each query it runs. Combined with our dynamic visualization framework, this lets our clients easily analyze different slices of customer feedback data across demographics, regions, time periods, and other attributes on very large, constantly changing data sets. Our text analytics capabilities allow us to categorize comments using a variety of rule-based and statistical approaches. Our main areas of focus are hierarchical, customizable topic classification combined with sentiment analysis. We are also working on extracting suggestions from customer responses using machine learning and clustering algorithms. Our talk will focus on interesting obstacles encountered when working with real client data and how we provide value using these technologies.

Bios: Wing-Hong Andrew Ko obtained undergraduate and master's degrees in Computer Science from Carnegie Mellon University, with a focus on machine learning. He recently worked on a social media project at Medallia, automatically finding and accumulating feedback data from public online review sites for every hotel in the world. He is currently working on applying natural language processing and machine learning techniques to analyze free-text responses.

Dr. Thorvald Natvig obtained his PhD from the Norwegian University of Science and Technology, with a focus on high-performance computing. He is an active open-source developer and the main developer of Mumble, a VoIP platform. At Medallia, he is currently working on architectural performance improvements, I/O optimizations, and our next-generation OLAP engine.