Tuesday, March 21, 2017.
Title: Breaking the Monolith: Rethinking Storage System Design
Distributed object-based storage systems serve the growing and disparate needs of almost all large-scale web services in use today. This is driving the proliferation of a wide variety of distributed object-based stores. However, extant monolithic, scale-out storage system designs present unique challenges in adapting to ever-changing storage requirements in both efficiency (for example, performance and resource efficiency) and flexibility (for example, ease of use and programmability).
My research takes two crucial steps on this difficult road to optimize and design better object-based stores. In this talk, I will first show that an approach to storage system design based on a simple core principle, resource partitioning, can yield systems with significantly improved performance and resource efficiency under dynamic, skewed, and multi-tenant workloads. I will show how to effectively exploit fine- and coarse-grained resource partitioning in MBal, a distributed in-memory object caching system, which offers a holistic solution wherein the load balancing model tracks hotspots and applies different strategies based on the severity of the imbalance. Then, I will discuss a fundamental challenge faced by all practitioners and developers working on scalable storage: how to implement a fast and reliable scale-out storage system with minimal engineering effort. I will present how modular design, rather than the extant monolithic approaches, can ease the burden of designing new storage systems, especially by enabling an innovative decoupling of the control and data planes in distributed storage. I will conclude with a brief discussion of my vision for future storage and data-intensive systems.
Yue Cheng is a Ph.D. candidate in the Department of Computer Science at Virginia Tech. His research interests include distributed systems, cloud storage, and Internet of Things. His work has been published in premier venues in computer systems and high-performance computing, including USENIX ATC, ACM EuroSys, and ACM HPDC. He has worked and collaborated with leading storage researchers at IBM Research and Dell EMC. He received his B.Eng. in Computer Science from Beijing University of Posts and Telecommunications. Yue loves traveling and all kinds of outdoor activities.
Tuesday, March 28, 2017.
Title: Building Next-Generation Flash Storage Systems: From Data Centers to Wearables
Flash-based storage (e.g., SSD) has been widely adopted in almost every kind of platform, spanning all the way from wearables, to mobiles, to data center servers. Since its arrival more than a decade ago, Flash has improved significantly in terms of latency, throughput, and capacity. While storage hardware has evolved, the corresponding software stack is the main bottleneck not only for Flash but also for imminent storage media deemed to be significantly faster than Flash.
In this talk, I will demonstrate new approaches to designing flash-based storage systems to unleash the power of hardware devices while preserving the simplicity and guarantees of system abstractions. Specifically, I will present three systems (FlashBlox, FlashMap, and WearDrive) that enable applications to leverage the power of Flash with little software overhead by challenging conventional wisdom on storage system design. They significantly improve performance and energy efficiency for applications in large-scale data centers as well as in small smart devices such as wearables.
Jian Huang is a Ph.D. candidate in the School of Computer Science at Georgia Institute of Technology. His research interests lie in the areas of computer systems, including operating systems, systems architecture, systems reliability and security, and distributed systems. He enjoys building practical, reliable, secure, and high-performance systems. Most of his recent work focuses on building memory and storage systems with new and emerging memory technologies such as flash memory, battery-backed DRAM, and phase-change memory. His research contributions have been published at ASPLOS, FAST, USENIX ATC, ISCA, VLDB, SoCC, and IPDPS. His work WearDrive won the Best Paper Award at USENIX ATC in 2015 and attracted popular press coverage in more than eight countries. His work FlashMap won the IEEE Micro Top Picks Honorable Mention in 2016. Most of the technologies he has developed have had an impact on industrial and real-world systems; some of them are being transferred into products, including those at Microsoft data centers.
Thursday, January 19, 2017.
Title: Privacy as a Service
Current cloud services are vulnerable to hacks and surveillance programs that undermine user privacy and personal liberties. In this talk, I present how we can build practical systems for protecting user privacy from powerful persistent threats. I will discuss Talek, a private publish-subscribe protocol. With Talek, application developers can send and synchronize data through the cloud without revealing any information about data contents or communication patterns of application users. The protocol is designed with provable security guarantees and practical performance: 3-4 orders of magnitude better throughput than other systems with comparable security goals. I will also discuss Radiatus, a security-focused web framework to protect web apps against external intrusions, and uProxy, an Internet censorship circumvention tool in deployment today.
Thursday, January 19, 2017.
Oftentimes, data analysts or application scientists benefit from a holistic view of the data. These views could be concise summaries that allow people to develop intuitions about overall patterns or to confirm very general expectations, e.g., regarding data quality. Alternatively, these views might provide opportunities for exploratory investigations, where users or scientists might look for new patterns or trends. This talk describes some aspects of the general problem of summarizing ensembles of data and describes some recent work that spans the space of approaches from lightweight summaries to more sophisticated analyses that allow for deeper investigations.
Faculty Host: Dr. Joshua Levine
Tuesday, January 24, 2017.
Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are safe—that they do not, for example, cause harm to humans or act in a racist or sexist way—is therefore not a hypothetical problem to be dealt with in the future, but a pressing one that we can and should address now.
In this talk I will discuss some of my recent efforts to develop safe machine learning algorithms, and particularly safe reinforcement learning algorithms, which can be responsibly applied to high-risk applications. I will focus on a specific research problem that is central to the design of safe reinforcement learning algorithms: accurately predicting how well a policy would perform if it were to be used, given data collected from the deployment of a different policy. Solutions to this problem provide a way to determine that a newly proposed policy would be dangerous to use without requiring the dangerous policy to ever actually be used.
Philip Thomas is a postdoctoral research fellow in the Computer Science Department at Carnegie Mellon University, advised by Emma Brunskill. He received his Ph.D. from the College of Information and Computer Sciences at the University of Massachusetts Amherst in 2015, where he was advised by Andrew Barto. Prior to that, Philip received his B.S. and M.S. in computer science from Case Western Reserve University in 2008 and 2009, respectively, where Michael Branicky was his adviser. Philip's research interests are in machine learning with emphases on reinforcement learning, safety, and designing algorithms that have practical theoretical guarantees.
Faculty Host: Dr. Mihai Surdeanu
Thursday, February 2, 2017.
Title: Fixing the Phone: Designing Strong Authentication for Telephony
In this talk, we will examine how phone networks are being misused for authentication and secure communications. We will then see how new systems leveraging techniques from networking, cryptography, and signal processing can provide strong authentication of phone calls. These techniques will pave the way for a more trustworthy telephone infrastructure.
He holds an MS in Computer Science from Georgia Tech as well as a BS and MS in Computer Engineering from Mississippi State University. His work has been recognized with two best paper awards, and he was named an NSF Graduate Research Fellow in 2010.
Faculty Host: Dr. Saumya Debray
Tuesday, February 7, 2017.
Advances in machine learning have led to rapid and widespread deployment of software-based inference and decision making, resulting in various applications such as data analytics, autonomous systems, and security diagnostics. Current machine learning systems, however, assume that training and test data follow the same, or similar, distributions, and do not consider active adversaries manipulating either distribution. Recent work has demonstrated that motivated adversaries can circumvent anomaly detection or classification models at test time through evasion attacks, or can inject well-crafted malicious instances into training data to induce errors in classification through poisoning attacks. In addition, by undermining the integrity of learning systems, the privacy of users' data can also be compromised.
In this talk, I describe my recent research addressing evasion attacks, poisoning attacks, and privacy problems in adversarial environments. The key approach is to utilize game-theoretic analysis and model the interactions between an intelligent adversary and a machine learning system as a Stackelberg game, allowing us to design robust learning strategies which explicitly account for an adversary’s optimal response.
Dr. Bo Li is a postdoctoral research fellow in the Department of Electrical Engineering and Computer Science at the University of Michigan. She is a member of IEEE, AAAI, and ACM. She received the Symantec Research Labs Graduate Fellowship in 2015 as one of three recipients nationwide of the prestigious fellowship. Her research focuses on machine learning, security, privacy, game theory, social networks, and adversarial deep learning. She has designed several robust learning algorithms, a scalable framework for achieving robustness for a range of learning methods, and a privacy-preserving data publishing system. Dr. Li is interested in both the theoretical analysis of general threat models and the development of practical systems. She has evaluated the vulnerabilities of real-world machine learning models and developed resilient learning systems to not only preserve robustness, but also optimize resource allocation based on practical constraints. Another focus of her research is on developing scalable robust algorithms that can process the massive amounts of data available for Internet-scale problems on specific cloud computing infrastructure to achieve secure learning for big data. She is also active in adversarial deep learning research for training generative adversarial networks (GANs) and designing deep neural networks that are robust against adversarial examples.
Faculty Host: Dr. Saumya Debray
Thursday, February 9, 2017.
Title: Making the World Safer through "Cyber"-Autonomy
Our world is driven by interconnected software. While this connectivity provides functionality and convenience, it is not without risks: vulnerabilities are still rampant in modern programs, and the exploitation of these vulnerabilities turns our connectivity into a liability. With the recent proliferation of "smart" devices, more vulnerable software than ever is connected to the internet, and open to attackers.
We must find and fix these vulnerabilities before they can be exploited. In this talk, I will describe my research into an analysis pipeline that is flexible and extensible enough to target the identification of different types of vulnerabilities in binary code. I will discuss angr, the analysis framework powering this pipeline, and detail how angr can be applied not only to vulnerability identification in binary code, but to vulnerability remediation as well. Finally, I will show the culmination of these techniques in the form of the Mechanical Phish, one of the world's first fully autonomous hacking systems. Last year, the Mechanical Phish won third place in the DARPA Cyber Grand Challenge by automatically finding, exploiting, and patching vulnerabilities in a live competition, at a scale that could not be achieved by human hackers.
I have open-sourced the Mechanical Phish and angr to give researchers a platform on which to build next-generation binary analysis techniques. The growing community around these projects, including research labs, companies, and enthusiasts around the world, is actively pushing forward the frontier of the field. With ever-improving vulnerability detection and remediation techniques, we hope to introduce automated binary analysis into the toolbox of the "good guys", making our world more secure in the process.
Faculty Host: Dr. Christian Collberg
Monday, February 13, 2017.
Title: Efficient Coordination for Global-Scale Data Management
Replicating data across datacenters (geo-replication) provides higher levels of fault-tolerance and data availability. The Wide-Area Network (WAN) latency separating datacenters is orders of magnitude larger than traditional network latency within a datacenter. This makes it expensive to preserve the consistency of data copies. However, consistency and high-level access abstractions like database transactions are favored by developers because they hide the complexity of the underlying replica and concurrency control. This has led to the adoption of consistent transactions in large-scale geo-replicated systems.
In this talk, I will present the fundamental challenges in designing geo-replicated data management systems. Specifically, transaction latency is high due to the need to coordinate between datacenters spread across the world. Traditionally, coordination is performed by polling other datacenters for permission to execute. This makes a Round-Trip Time (RTT) latency inevitable. In geo-replication, this cost is high and thus leads to the question: Is it possible to avoid the polling paradigm of coordination? Message Futures is a protocol that demonstrates a new paradigm of continuous, proactive coordination. In this paradigm, transactions can coordinate in sub-RTT latency. Breaking the RTT latency barrier leads to the next part of the talk, where I derive a lower bound for coordination latency. The proposed lower-bound model inspires the design of a coordination protocol called Helios that targets achieving the lower-bound latency. The talk will also discuss many of the practical aspects of building scalable data management and communication platforms for geo-replicated systems. I conclude the talk with future opportunities for global-scale data management in the context of edge computing, Internet of Things, and data science.
Faculty Host: Dr. Michelle Strout
Tuesday, February 14, 2017.
Title: Learning and Incentives in Systems with Humans in the Loop
There is an increasing amount of human-generated data available on the internet -- including online reviews, user search histories, datasets labeled using crowdsourcing, and beyond. This has created an unprecedented opportunity for researchers in machine learning and data science to address a wide range of problems. On the other hand, human-generated data also creates unique challenges. Humans might be strategic or careless, possess diverse skills, or have behavioral biases. What is the right way to understand and utilize human-generated data? Furthermore, can we better design the systems with humans in the loop to generate more useful data in the first place?
In this talk, I will present my research, which addresses the challenges in utilizing and eliciting data from humans. In particular, I will first introduce the problem of actively purchasing data from humans for solving machine learning tasks, and demonstrate how to convert a large class of machine learning algorithms into pricing and learning mechanisms. In the second part of the talk, I will discuss how to obtain high-quality data from humans using financial incentives and present our findings from a comprehensive set of behavioral experiments conducted on Amazon Mechanical Turk.
Faculty Host: Dr. Carlos Scheidegger
Thursday, February 16, 2017.
Manifold-valued data naturally occur in many disciplines. For example, directional data can be represented as points on a unit sphere. Diffusion tensors in magnetic resonance images form a quotient manifold GL(n)/O(n), which is a space of symmetric positive definite (SPD) matrices. Another example is the Hilbert unit sphere, which arises from the square-root representation of orientation distribution functions (ODFs) or probability density functions (PDFs). These data spaces are known, a priori, to have a nice mathematical structure with well-studied properties. It makes sense that if algorithms make use of this additional information, even more efficient inference procedures can be developed. Motivated by this intuition, in this talk we study the relationship between statistical learning algorithms and the geometric structures of data spaces encountered in machine learning, computer vision, and neuroimaging using mathematical tools (e.g., Riemannian geometry). As a result, this framework will give new insights into statistical inference methods for image analysis and enable the development of new models for manifold-valued data (and potentially manifold-valued parameters) to improve statistical power.
Topics featured in this talk include: manifold statistics on Riemannian manifolds, manifold-valued multivariate general linear models, canonical correlation analysis on manifolds, Dirichlet Process for manifold-valued variables, interpolation of k-GMMs, latent variable graphical model selection, and abundant inverse regression.
• Ph.D. student, Computer sciences, University of Wisconsin-Madison, advisor: Vikas Singh, 2011 - present (Ph.D. minor: Statistics)
• Internship at Amazon in Customer Service Machine Learning Analytics Team, 2013
• M.S., Computer sciences, University of Wisconsin-Madison, 2013
• Scientist, Music and Audio Research Group (MARG) in GSCST at SNU, 2010 - 2011
• M.S., Computer science and engineering, Seoul National University, 2010
• B.S., Computer science, Korea University, 2008
• Research Interests: Machine learning and computer vision, manifold statistics, medical imaging, and large scale numerical optimization
Tuesday, February 21, 2017.
Today's common online services (social networks, media streaming, messaging, email, etc.) undermine privacy. Indeed, there have been numerous incidents (hacks, accidental disclosures, etc.) where private information has leaked. With a vision of reducing these risks, my research aims to build systems that provide strong privacy guarantees and are practical (i.e., have functionality and costs comparable to those of the status quo).
In this talk, I will describe the challenges faced in building such systems and how I address them. As an example of one of the systems I have built, I will describe Popcorn, a Netflix-like media delivery system that provably hides, even from the content distributor (e.g., Netflix), which movie a user is watching; is otherwise consistent with the prevailing commercial regime (copyrights, etc.); and achieves plausibly deployable performance (per-request dollar cost is 3.87 times that of a non-private system).
Trinabh Gupta is a PhD candidate at The University of Texas at Austin. He is also a visiting academic at NYU's systems group. His research interests are in systems, security, and privacy, and he has worked on privacy-preserving online services and failure detection in distributed systems. His advisors are Lorenzo Alvisi and Michael Walfish. Prior to being a PhD student, he was a computer science undergraduate student at Indian Institute of Technology Delhi (IITD).