Big Data comes in many shapes and sizes. Varieties of Big Data include Big Tabular Data (e.g., large enterprise-style relational data sets), Big Graph Data (e.g., large social networks), Big Textual Data (e.g., large collections of blogs or messages), and, of course, Big Semistructured Data (e.g., large collections of JSON objects), a.k.a. Big NoSQL Data. This presentation will provide an introduction to the NoSQL faction of the Big Data movement. It will describe the nature of the data and then briefly survey some of the platforms available today for storing and querying such data (a.k.a. document-oriented NoSQL database systems). To make the presentation technically concrete, it will include a live look at Apache AsterixDB, an open-source Big Data Management System that originated on several of the University of California's southern campuses. AsterixDB provides an excellent basis for discussing NoSQL data, its underlying storage technologies, and the kinds of schema-related and query-related features that a growing number of such systems are beginning to offer.
Michael J. Carey received his B.S. and M.S. degrees from Carnegie-Mellon University and his Ph.D. from the University of California, Berkeley, in 1979, 1981, and 1983, respectively. He is currently a Bren Professor of Information and Computer Sciences at the University of California, Irvine (UCI) and a Consulting Architect at Couchbase, Inc. Before joining UCI in 2008, Dr. Carey worked at BEA Systems for seven years and led the development of BEA's AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years teaching at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at the e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. Dr. Carey is an ACM Fellow, an IEEE Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests all center on data-intensive computing and scalable data management (a.k.a. Big Data).