Big Data Systems 2017

Related courses:

Content:

  • Overview of the course
  • Origin of Big Data Systems
    • Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung: The Google file system. SOSP 2003: 29-43
    • Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004: 137-150
    • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber: Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26(2): 4:1-4:26 (2008)
  • Storage (ZHU Renyu, DING Guohao)
    • Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen S. Hofmann, Jon Howell, Yutaka Suzue: Flat Datacenter Storage. OSDI 2012: 1-15
    • Muralidhar Subramanian, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Sivakumar Viswanathan, Linpeng Tang, Sanjeev Kumar: f4: Facebook’s Warm BLOB Storage System. OSDI 2014: 383-398
    • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels: Dynamo: Amazon’s highly available key-value store. SOSP 2007: 205-220
    • Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry C. Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkateshwaran Venkataramani: TAO: Facebook’s Distributed Data Store for the Social Graph. USENIX Annual Technical Conference 2013: 49-60
  • In-Memory Database and Concurrency Control (ZHU Chaofan, WANG Yanzhao)
    • Hector Garcia-Molina, Kenneth Salem: Main Memory Database Systems: An Overview. IEEE Trans. Knowl. Data Eng. 4(6): 509-516 (1992)
    • Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, Michael Stonebraker: OLTP through the looking glass, and what we found there. SIGMOD Conference 2008: 981-992
    • Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, Samuel Madden: Speedy transactions in multicore in-memory databases. SOSP 2013: 18-32
    • Xiangyao Yu, Andrew Pavlo, Daniel Sánchez, Srinivas Devadas: TicToc: Time Traveling Optimistic Concurrency Control. SIGMOD Conference 2016: 1629-1642
  • Parallel Processing (YU Ruonan, HUANG Libo)
    • Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI 2012: 15-28
    • Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica: Discretized streams: fault-tolerant streaming computation at scale. SOSP 2013: 423-438
  • Parallel Join (LI Na, KUANG Jun)
    • Spyros Blanas, Yinan Li, Jignesh M. Patel: Design and evaluation of main memory hash join algorithms for multi-core CPUs. SIGMOD Conference 2011: 37-48
    • Cagri Balkesen, Jens Teubner, Gustavo Alonso, M. Tamer Özsu: Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. ICDE 2013: 362-373
    • Cagri Balkesen, Gustavo Alonso, Jens Teubner, M. Tamer Özsu: Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited. PVLDB 7(1): 85-96 (2013)
    • Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann: Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems. PVLDB 5(10): 1064-1075 (2012)
  • Structured Data (WANG Jialun, GU Jianan)
    • Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O’Neil, Patrick E. O’Neil, Alex Rasin, Nga Tran, Stanley B. Zdonik: C-Store: A Column-oriented DBMS. VLDB 2005: 553-564
    • Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandier, Lyric Doshi, Chuck Bear: The Vertica Analytic Database: C-Store 7 Years Later. PVLDB 5(12): 1790-1801 (2012)
    • Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis: Dremel: Interactive Analysis of Web-Scale Datasets. PVLDB 3(1): 330-339 (2010)
  • MVCC (ZHANG Yao, ZHENG Beilei)
    • Yingjun Wu, Joy Arulraj, Jiexi Lin, Ran Xian, Andrew Pavlo: An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. PVLDB 10(7): 781-792 (2017)
    • Dan R. K. Ports, Kevin Grittner: Serializable Snapshot Isolation in PostgreSQL. PVLDB 5(12): 1850-1861 (2012)
    • Per-Åke Larson, Spyros Blanas, Cristian Diaconu, Craig Freedman, Jignesh M. Patel, Mike Zwilling: High-Performance Concurrency Control Mechanisms for Main-Memory Databases. PVLDB 5(4): 298-309 (2011)
    • Thomas Neumann, Tobias Mühlbauer, Alfons Kemper: Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems. SIGMOD Conference 2015: 677-689
  • (QI Xuecheng, PANG Shuaifeng)

Coordination:
