Thursday, January 11, 2018.
Title: Providing fast and reliable transactions in distributed database systems
Distributed database systems run transactions across machines to ensure serializability. Traditional approaches for distributed transactions are based on two-phase locking or optimistic concurrency control. However, these protocols suffer from performance degradation because of aborting and/or blocking. In addition, to provide fault tolerance, traditional approaches replicate data relying on an extra layer of consensus protocols such as Paxos, which incurs extra cost. This talk focuses on one question: how can we improve the system performance without giving up the serializability guarantee? It will cover a new concurrency control protocol based on dependency tracking, how to combine it with conflict analysis to support a more diversified workload, and how to extend it to support geo-replication in a merged layer style with lower overhead.
Shuai Mu is a post-doctoral researcher at New York University, working with Mike Walfish and Jinyang Li on distributed systems. He earned his PhD from Tsinghua University (Beijing, China) in 2015.
Faculty Host: Dr. David Lowenthal
Thursday, November 30, 2017.
Title: Privacy in a World of Mass Surveillance
In this talk I will discuss Pung, a private communication system that allows users to exchange messages over the Internet without revealing any information to network providers (ISPs, E-mail servers, etc.). In particular, providers do not learn with whom users communicate, how often, or the content of any communication. We show that this strong privacy property is achievable even when providers are arbitrarily malicious.
To make Pung efficient in practice, we build a new private information retrieval (PIR) library called SealPIR. This library allows a user to retrieve an element from an untrusted server without revealing to the server which element was retrieved. SealPIR is orders of magnitude more network efficient than existing PIR constructions, is concretely efficient, and can be used in other applications.
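As a rough illustration of the PIR property described above (the server cannot tell which element was retrieved), here is a minimal sketch of the classic two-server XOR scheme. This is not the construction used by SealPIR, which the abstract describes as working against a single untrusted server; the scheme below is a simpler, swapped-in technique that assumes two servers that hold the same database and do not collude. All function names are hypothetical.

```python
import secrets

def make_queries(db_size, index):
    """Client: build two bit-vector queries that differ only at `index`.

    Each query on its own is a uniformly random bit vector, so a single
    (non-colluding) server learns nothing about `index`.
    """
    q1 = [secrets.randbelow(2) for _ in range(db_size)]
    q2 = list(q1)
    q2[index] ^= 1  # flip exactly the bit for the wanted record
    return q1, q2

def answer(database, query):
    """Server: XOR together the records selected by the query bits."""
    acc = 0
    for record, bit in zip(database, query):
        if bit:
            acc ^= record
    return acc

def retrieve(database, index):
    """Client: combine both answers; every record except `index` cancels out."""
    q1, q2 = make_queries(len(database), index)
    # In a real deployment each query would go to a different server.
    return answer(database, q1) ^ answer(database, q2)
```

The sketch sends one bit per database record in each query, which is exactly the kind of network cost that more efficient PIR constructions aim to reduce.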
Sebastian Angel is a Ph.D. candidate at The University of Texas at Austin and visiting academic at New York University's Courant Institute of Mathematical Sciences. He is interested in topics at the intersection of security, systems, and networking. Beyond private communication, his work includes adding verifiability to large scale auction systems, architecting OS defenses against malicious peripheral devices (USB flash drives, keyboards, etc.), and ensuring that applications running on public data centers achieve predictable performance.
Faculty Host: Dr. Katherine Isaacs
Tuesday, November 21, 2017.
Title: 2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection
Abstract: We introduce new dynamic set intersection data structures, which we call 2-3 cuckoo filters and hash tables. These structures differ from the standard cuckoo hash tables and cuckoo filters in that they choose two out of three locations to store each item, instead of one out of two, ensuring that any item in an intersection of two structures will have at least one common location in both structures. We demonstrate the utility of these structures by using them in improved algorithms for listing triangles and answering set intersection queries.
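The "two out of three" idea in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the speaker's data structure: it uses three sub-tables with one hash function each, unbounded buckets, and no cuckoo eviction, and the names (TwoThreeTable, bucket, intersect) are hypothetical. The point it shows is the pigeonhole argument: any two 2-element subsets of three locations overlap, so an item stored in both structures always shares at least one location.

```python
import hashlib

NUM_BUCKETS = 64  # hypothetical number of buckets per sub-table

def bucket(table_index, item):
    """Bucket of `item` in sub-table `table_index` (one hash function per sub-table)."""
    digest = hashlib.sha256(f"{table_index}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

class TwoThreeTable:
    """Stores each item in two of its three candidate locations."""

    def __init__(self):
        # One dict per sub-table, mapping bucket index -> set of items.
        self.tables = [dict() for _ in range(3)]

    def insert(self, item, chosen=(0, 1)):
        """Store `item` in the two sub-tables named by `chosen`."""
        assert len(set(chosen)) == 2 and all(i in (0, 1, 2) for i in chosen)
        for i in chosen:
            self.tables[i].setdefault(bucket(i, item), set()).add(item)

def intersect(a, b):
    """Intersect two tables by comparing only matching locations.

    Because any two 2-subsets of {0, 1, 2} share an element, every common
    item appears in the same bucket of at least one shared sub-table.
    """
    result = set()
    for i in range(3):
        for bkt, items in a.tables[i].items():
            result |= items & b.tables[i].get(bkt, set())
    return result
```

With a one-out-of-two scheme, two structures could place the same item in different locations, so a location-by-location comparison like `intersect` could miss it.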
Prof. Goodrich received his B.A. in Mathematics and Computer Science from Calvin College in 1983 and his PhD in Computer Sciences from Purdue University in 1987. He is a Chancellor's Professor at the University of California, Irvine, where he has been a faculty member in the Department of Computer Science since 2001. He was a professor in the Department of Computer Science at Johns Hopkins University from 1987 to 2001. Dr. Goodrich's research is directed at the design of high performance algorithms and data structures with applications to information assurance and security, the Internet, machine learning, and geometric computing. He is an ACM Distinguished Scientist, a Fellow of the American Association for the Advancement of Science (AAAS), a Fulbright Scholar, a Fellow of the IEEE, and a Fellow of the ACM.
Faculty Host: Dr. Stephen Kobourov
Thursday, November 9, 2017.
Title: Scalable Learning Over Distributions
Abstract: A great deal of attention has been applied to studying new and better ways to perform learning tasks involving static finite vectors. Indeed, over the past century the fields of statistics and machine learning have amassed a vast understanding of various learning tasks like clustering, classification, and regression using simple real valued vectors. However, we do not live in a world of simple objects. From the contact lists we keep, the sound waves we hear, and the distribution of cells we have, complex objects such as sets, distributions, sequences, and functions are all around us. Furthermore, with ever-increasing data collection capacities at our disposal, not only are we collecting more data, but richer and more bountiful complex data are becoming the norm.
In this presentation we analyze regression problems where input covariates, and possibly output responses, are probability distribution functions from a nonparametric function class. Such problems cover a large range of interesting applications including learning the dynamics of cosmological particles and general tasks like parameter estimation.
However, previous nonparametric estimators for functional regression problems scale badly computationally with the number of input/output pairs in a data-set. Yet, given the complexity of distributional data it may be necessary to consider large data-sets in order to achieve a low estimation risk.
To address this issue, we present two novel scalable nonparametric estimators: the Double-Basis Estimator (2BE) for distribution-to-real regression problems; and the Triple-Basis Estimator (3BE) for distribution-to-distribution regression problems. Both the 2BE and 3BE can scale to massive data-sets. We show an improvement of several orders of magnitude in terms of prediction speed and a reduction in error over previous estimators in various synthetic and real-world data-sets.
Junier Oliva is a Ph.D. candidate in the Machine Learning Department at the School of Computer Science, Carnegie Mellon University. His main research interest is to build algorithms that understand data at an aggregate, holistic level. Currently, he is working to push machine learning past the realm of operating over static finite vectors, and start reasoning ubiquitously with complex, dynamic collections like sets and sequences. Moreover, he is interested in exporting concepts from learning on distributional and functional inputs to modern techniques in deep learning, and vice-versa. He is also developing methods for analyzing massive datasets, both in terms of instances and covariates. Prior to beginning his Ph.D. program, he received his B.S. and M.S. in Computer Science from Carnegie Mellon University. He also spent a year as a software engineer for Yahoo!, and a summer as a machine learning intern at Uber ATG.
Faculty Host: Dr. Mihai Surdeanu
Tuesday, November 7, 2017.
Title: Uncovering and Addressing Security Assumptions About Hardware
Abstract: Due to manufacturing error, reliability failure modes, or just complex feature design, hardware occasionally exhibits surprising behaviors. Unknowingly, software security can rest on incorrect assumptions about hardware minutiae. In my research I expose how previously unknown or under-appreciated hardware behaviors can result in side-channels that have high-level privacy and security impact in software like web browsers. Motivated by these attacks, I also work to build architectures for both software and hardware that are inherently resistant to side-channels for mitigation against both known and unknown attacks.
In this talk I highlight attacks we have developed using details of hardware behavior, as well as a defensive browser scheme to mitigate such attacks. I first describe how we use floating-point timing side-channels to break web privacy in all major desktop web browsers. I then use these attacks and others as a motivation for our defensive browser proposal: Fermata. I discuss the complete vision of Fermata and then dive into the details of our prototype implementation: Fuzzyfox. Fuzzyfox is an incomplete Fermata implementation designed to field-test the ideas of Fermata and their impact on security and usability.
David is a PhD candidate in Computer Science at UC San Diego working in security, systems, and hardware. His research interests focus on the collision between software security theory and hardware reality. Previously, David received his B.S. in Computer Science from Carnegie Mellon University in 2011 and co-founded the San Diego-based security company Somerset Recon in 2012. He expects to defend his thesis in 2018.
Faculty Host: Dr. Christian Collberg
Thursday, October 26, 2017.
Title: Machine Learning By the People, for the People
Machine learning is concerned with the design and analysis of algorithms that compute general facts about an underlying data-generating process by observing limited amounts of that data. Classically, the outcome of a learning algorithm is considered in isolation from the effects that it may have on the process that generates the data or computes the outcome. With data science and the applications of machine learning revolutionizing day-to-day life, however, people and organizations increasingly interact with learning systems. It is essential to account for the wide variety of social and economic limitations, aspirations, and behaviors demonstrated by these people and organizations, which fundamentally change the nature of learning tasks and the challenges involved. I will describe three examples from my work on the theoretical aspects of machine learning and economics that account for these interactions: learning optimal policies in game-theoretic settings, without an accurate behavioral model, by interacting with people; learning the parameters of an optimal economic mechanism when the behavior and preferences of people can change over time and as the result of their interactions with the learning system; and collaborative learning in a setting where multiple learners attempt to discover the same underlying concept.
Nika Haghtalab is a Ph.D. candidate at the Computer Science department of Carnegie Mellon University, co-advised by Avrim Blum and Ariel Procaccia. She is a recipient of the IBM and Microsoft Research Ph.D. fellowships, and the Siebel Scholarship.
Faculty Host: Dr. John Kececioglu
Tuesday, October 10, 2017.
Title: Visual Analytics of Stance in Social Media
This talk will give an overview of the StaViCTA framework project that aims to tackle the challenge of investigating stance (such as attitudes, feelings, perspectives, or judgements) in written human communication. After introducing our definition of stance and providing visualization showcases on how stance analysis might be used to better understand social media, I will discuss several visual analytics tools that were especially designed to support the development of stance classification. They range from approaches providing fundamental insights into text data that are necessary for building an appropriate linguistic stance theory to approaches for text data annotation and visualization that facilitate the entire process of training a stance classifier.
Andreas Kerren received the B.S. and M.S. degrees as well as his PhD degree in Computer Science from Saarland University, Saarbrücken (Germany). In 2008, he achieved his habilitation (docent competence) from Växjö University (Sweden). Dr. Kerren is currently a Full Professor in Computer Science at the Department of Computer Science, Linnaeus University (Sweden), where he is heading the research group for Information and Software Visualization, called ISOVIS. His main research interests include the areas of Information Visualization, Visual Analytics, and Human-Computer Interaction. Among other roles, he is an editorial board member of the Information Visualization journal, has served as organizer/program chair at various conferences, such as IEEE VISSOFT 2013/2018, IVAPP 2013-15/2018 or GD 2018, and has edited a number of successful books on human-centered visualization.
Faculty Host: Dr. Helen Purchase
Tuesday, October 2, 2017.
Title: Depth Based Visualizations for Ensemble Data and Graphs
Ensemble datasets are being increasingly seen in a range of domains. Such datasets often appear as a result of a collection of solutions recorded from simulation runs with different parameters/initial conditions, as well as precision uncertainty associated with repeated measurements of a natural phenomenon. Studying ensembles in terms of the variability between members can provide valuable insight into the generating process, particularly when mathematically modeling the process is complex or infeasible. Ensemble visualization can be a powerful way to study the generating process by analyzing ensembles of solutions or possible outcomes. In ensemble visualization, key interests include understanding the typical/atypical members as well as variability in the ensemble. In the absence of any information about the underlying generative model, a family of nonparametric methods known as data depth is able to quantify the notion of centrality and provide center-outward order statistics for ensembles. In this talk I will explore novel applications of existing depth based methods, and describe my research on new advantageous visualizations, and associated methods to compute depth, for ensembles of various data types: 3D isocontours, paths on a graph, nodes on a graph, graphs, and data in inner product spaces.
Mukund Raj graduated with a B.S. degree in Electronics and Telecommunications Engineering from the University of Pune in 2008. From 2008 to 2011 he worked as a software engineer at Infosys Labs, where he worked on developing web based accessibility tools. From 2011 to 2013 he was a member of the Visual Perception and Spatial Cognition lab at the University of Utah. In 2013 he graduated with an M.S. degree in computing from the University of Utah, where he is also currently working toward a PhD degree in computing.
Faculty Host: Dr. Alon Efrat
Thursday, September 28, 2017.
Title: Designing Secure Systems for Censorship Resistance
Tools to circumvent censorship aim to hide the websites that users access from a government censor. Some even disguise traffic patterns by mimicking allowed protocols or using services such as Skype to tunnel censored content. These systems have evolved as a result of a cat-and-mouse game between nation-state censors and censorship resistors: as new techniques for evading censorship arise, censors tweak their filtering systems to identify the weaknesses in existing tools that signal their usage. In this talk, I will describe key events in the censorship arms race and how to design and implement censorship circumvention tools that tilt the arms race in the favour of the censorship resistor.
Faculty Host: Dr. David Lowenthal
Thursday, September 14, 2017.
Title: Visual Analytics Methods for Spatiotemporal Analysis
From smart phones to fitness trackers to sensor enabled buildings, data is currently being collected at an unprecedented rate. Now, more than ever, data exists that can be used to gain insight into how policy decisions can impact our daily lives. For example, one can imagine using data to help predict where crime may occur next or to inform decisions on police resource allocations; likewise, diet and activity patterns could be used to provide recommendations for improving an individual's overall health and well-being. Underlying all of this data are measurements with respect to space and time. However, finding relationships within datasets and accurately representing these relationships to inform policy changes is a challenging problem. This research talk will address fundamental questions of how we can effectively explore such space-time data in order to enhance knowledge discovery and dissemination. Examples in this talk will focus on my lab group's recent research efforts in criminal analysis looking at methods of extending kernel density estimation, a theoretical analysis of cluster projections in choropleth maps, and novel visualization methods for tracking geographical hotspots with an emphasis on disease surveillance.
Ross Maciejewski is an Associate Professor of Computer Science at Arizona State University whose primary research interests are in the areas of geographical visualization and visual analytics focusing on public health, social media, sustainability, criminal incident reports and dietary analysis. He has served on the organizing committee for the IEEE Conference on Visual Analytics Science and Technology and the IEEE/VGTC EuroVis Conference and is serving as the Vice Chair for IEEE VIS 2017 in Phoenix, AZ. His work has been recognized through award winning submissions to the IEEE Visual Analytics Contest (2010, 2013 and 2015), and a best paper award in EuroVis (2017). He is a Fellow of the Global Security Initiative at ASU and the recipient of an NSF CAREER Award (2014).
Tuesday, September 12, 2017.
Speaker: Kyle Fox, Ph.D.
Title: Maps Between Geometric Data Sets
We will discuss two variants of the problem of computing maps between data sets. First, we will describe a near-linear time approximation algorithm for computing dynamic time warping maps between point sequences, a central problem in the analysis of trajectories and other curves. Next, we will describe fast approximation algorithms for computing transportation maps, a widely used method for comparing and relating two distributions. In both cases our goal is to develop simple, fast, hopefully near-linear-time approximation algorithms.
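For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the textbook exact dynamic program, which takes time proportional to the product of the two sequence lengths. It is not the near-linear time approximation algorithm from the talk; it only shows the quantity such an algorithm approximates. The function name dtw_cost and the choice of Euclidean distance between points are assumptions made for illustration.

```python
import math

def dtw_cost(p, q):
    """Exact DTW cost between point sequences p and q (quadratic-time DP).

    cost[i][j] is the cheapest way to align the first i points of p with
    the first j points of q, where each step advances in p, in q, or in both.
    """
    n, m = len(p), len(q)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])  # Euclidean distance (assumed)
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in p only
                cost[i][j - 1],      # advance in q only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

For example, a trajectory that pauses at one point still aligns at zero cost with the same trajectory without the pause, because one point of the shorter sequence may be matched to several consecutive points of the longer one.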
Kyle Fox recently joined the University of Texas at Dallas as an Assistant Professor after completing a postdoc at Duke University. He obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 2013. His research interests lie primarily in algorithms, including geometric algorithms, computational topology, combinatorial optimization, and their applications to data analysis and graph algorithms. He was a recipient of the Department of Energy Office of Science Graduate Fellowship and a winner of the C. W. Gear Outstanding Graduate Student award while at the University of Illinois.
Thursday, August 24, 2017.
Title: Graph Drawings: as created by users (or 'Doing the Future Work')
Much effort has been spent on designing algorithms for the automatic layout of graphs. Typically, the worth of these algorithms has been determined by their computational efficiency and by the extent to which the graph drawings they produce conform to pre-defined "aesthetics" (for example, minimising the number of edge crosses and edge bends, or maximising symmetry).
Prior experimental work has focussed on the extent to which the layout of a graph drawing assists with the comprehension of the embodied relational information. This seminar presents an alternate approach to determining the relative worth of graph layout aesthetics, based on how users create their own graph drawings. The seminar will present the results of the published research experiments as well as two follow-up studies.
Dr Helen Purchase is Senior Lecturer in the School of Computing Science at the University of Glasgow. She has worked in the area of empirical studies of graph layout for several years, and also has research interests in visual aesthetics, task-based empirical design, collaborative learning in higher education, and sketch tools for design. She has recently written a book on Empirical methods for HCI research.