Tuesday, April 10, 2018.
Title: Unleashing The Power of Software-Architecture Co-Design in Exascale Era and Beyond
Within a decade, the technological underpinnings of the process Gordon Moore described will likely come to an end as silicon photolithography approaches the atomic scale. To continue the rapid, predictable, and affordable scaling of computing performance into the exascale era and beyond, major vendors have shown increased reliance on heterogeneous hardware acceleration, in which multiple novel technologies are integrated into a single heterogeneous system, including different types of accelerators, new memory technologies, and hybrid interconnects. In such an extremely heterogeneous environment, traditional software designs that do not consider these complex architectural features will no longer provide optimal efficiency. In this talk, I will use current commercial GPU-based heterogeneous architectures as examples to showcase my recent research on co-design, specifically runtime-architecture co-design (e.g., circumventing architectural design constraints or leveraging architectural features to improve overall system efficiency) and application-architecture co-design (e.g., applying unique application features to steer design optimization). These two high-level topics cover a range of interesting co-design research directions, such as approximate computing for scientific applications, big data analytics, 3D gaming/virtual reality, and emerging memory technology integration. I will also briefly discuss my ongoing projects in collaboration with academia and industry labs, as well as future research ideas.
Shuaiwen Leon Song is currently a senior scientist and technical lead in the High Performance Computing Group at Pacific Northwest National Laboratory (PNNL). He is an adjunct scholar with the CS department at the College of William & Mary. His research interests are in the area of system efficiency, with a strong focus on software-architecture co-design in high performance computing. He is a Lawrence Livermore ISCR scholar and a recipient of the 2011 Paul E. Torgersen excellent research award, the 2016 PNNL PCSD outstanding performance award, and the 2017 IEEE TCHPC early career award in high performance computing. He has published in major HPC-related conferences including ASPLOS, SC, MICRO, HPCA, and PPoPP. His past work has received two best paper runner-up nominations at the Supercomputing conference and a HiPEAC paper award. His research has been supported by several U.S. government agencies and industry labs. He is currently leading a big data analytics project and a deep-learning-directed HPC design LDRD project at PNNL.
Host: Michelle Strout
Monday, April 09, 2018.
Title: Software Security Today: Understanding and Containing Sophisticated Attacks
As our society is becoming increasingly reliant on software, software security is also becoming of critical importance. Through the years, defenses, such as address space layout randomization, data-execution prevention, and stack and heap protections have significantly raised the bar for attackers, making software exploitation harder. However, attacks have also evolved to a new level of sophistication. Modern attacks combine multiple vulnerabilities to launch code-reuse attacks that “repurpose” existing code to execute arbitrary computations. Such attacks have reignited the interest in various instantiations of defenses based on control-flow integrity. In this talk, I will be presenting our work on evaluating the effectiveness of such defenses by modeling them and attempting to produce a proof-of-concept attack that bypasses them. I will proceed to talk about reducing the attack surface of applications by removing unused code, which aims both to hinder attacks and enhance defenses. I will conclude with discussing new research directions.
Georgios Portokalidis is an Assistant Professor in the Department of Computer Science at Stevens Institute of Technology. He obtained his doctorate in Computer Science from Vrije Universiteit in Amsterdam, and also holds an MS from Leiden University and a BS from the University of Crete. His research interests are mainly in the area of systems and security. Some of the subjects he is actively working on include the detection and prevention of state-of-the-art attacks against software systems, efficient information-flow tracking systems, user authentication, and IoT security. He has authored numerous papers published in venues such as ACM CCS, ACM EuroSys, USENIX Security, and IEEE Security and Privacy. He has received funding through ONR, DARPA, and IARPA, and has also been involved in several projects funded by the EU and NSF. He has served on the program committees of various conferences, including USENIX Security, NDSS, ACSAC, RAID, and others.
Host: Saumya Debray
Tuesday, April 03, 2018.
Title: Transfer Learning Towards Intelligent Systems in the Wild
Developing intelligent systems for vision and language understanding has long been a central part of how people envision the future. In the past few years, with access to large-scale data and advances in machine learning algorithms, vision and language understanding has made significant progress in constrained environments. However, it remains challenging in unconstrained environments in the wild, where an intelligent system must tackle unseen objects and unfamiliar language usage that it has not been trained on. Transfer learning, which aims to transfer and adapt knowledge learned in the training environment to a different but related test environment, has thus emerged as a promising paradigm to remedy this difficulty. In this talk, I will present my recent work on transfer learning towards intelligent systems in the wild. I will begin with zero-shot learning, which aims to expand knowledge learned from seen objects, for which we have training data, to unseen objects, for which we have no training data. I will present SynC, an algorithm that can construct a classifier for any object class given its semantic description, even without training data, followed by a comprehensive study of how to apply it in different environments. I will then describe an adaptive visual question answering framework that builds upon the insights of zero-shot learning and can further adapt its knowledge to a new environment given limited information. I will finish my talk with directions for future research.
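For readers unfamiliar with the zero-shot setting, the core idea of synthesizing a classifier for an unseen class can be illustrated with a toy sketch. This is not the speaker's actual SynC algorithm; all data, dimensions, and the similarity kernel below are invented for illustration. A classifier for a class with no training data is built as a convex combination of base classifiers, weighted by semantic similarity:

```python
import numpy as np

# Hypothetical toy setup: 5 base classifiers over 16-dim features,
# each base class described by an 8-dim semantic embedding.
rng = np.random.default_rng(0)
n_base, n_feat = 5, 16
V = rng.normal(size=(n_base, n_feat))   # base classifier weight vectors
B = rng.normal(size=(n_base, 8))        # semantic embeddings of base classes

def synthesize_classifier(a_unseen, B, V, gamma=1.0):
    """Build a classifier for an unseen class from its semantic description
    by convexly combining base classifiers, weighted by semantic similarity."""
    sim = np.exp(-gamma * np.sum((B - a_unseen) ** 2, axis=1))  # RBF similarity
    s = sim / sim.sum()                                         # convex weights
    return s @ V                                                # weight vector

a_new = rng.normal(size=8)              # semantic description of an unseen class
w_new = synthesize_classifier(a_new, B, V)
score = w_new @ rng.normal(size=n_feat) # score an input without any training data
```

Classes semantically close to the unseen class contribute more to its synthesized classifier, which is what lets knowledge transfer from seen to unseen objects.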
Wei-Lun (Harry) Chao is a Computer Science PhD candidate at the University of Southern California, working with Fei Sha. His research interests are in machine learning and its applications to computer vision, artificial intelligence, and health care. His recent work has focused on transfer learning towards vision and language understanding in the wild. His earlier research includes work on probabilistic inference, structured prediction for video summarization, and face understanding.
Host: Kobus Barnard
Thursday, March 29, 2018.
Title: Enabling Performance Optimization and Reproducibility by Elimination of Inter-job Interference
On most supercomputers, resource managers allocate nodes to jobs without considering the sharing of network resources by different jobs. Such network-oblivious resource allocations result in link sharing among multiple jobs, which causes significant performance variability and performance degradation for individual jobs. In this talk, I will explore low-diameter networks and associated node allocation policies for eliminating inter-job interference. I will present a new low-diameter topology, called express mesh, constructed by extending n-dimensional mesh networks. An express mesh is denser than the corresponding mesh network, has a lower diameter independent of the number of routers, and is easily partitionable. I will also present a practical node allocation policy for fat-tree networks, currently the most commonly used low-diameter networks. This policy eliminates inter-job interference and performance variability and also improves overall performance.
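The diameter claim can be made concrete with a small experiment. The sketch below is not the talk's actual express mesh construction; it compares an ordinary 2-D mesh against the densest possible instantiation of the idea, in which express links connect every pair of routers within a dimension, so the diameter equals the number of dimensions regardless of how many routers each dimension holds:

```python
from collections import deque
from itertools import product

def diameter(nodes, neighbors):
    """Largest shortest-path distance over all node pairs (BFS from each node)."""
    best = 0
    for s in nodes:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

k, n = 6, 2                          # k routers per dimension, n dimensions
nodes = list(product(range(k), repeat=n))

def mesh_nbrs(u):                    # ordinary mesh: step +/-1 in one dimension
    for d in range(n):
        for step in (-1, 1):
            c = u[d] + step
            if 0 <= c < k:
                yield u[:d] + (c,) + u[d + 1:]

def express_nbrs(u):                 # express links: reach any offset in one dimension
    for d in range(n):
        for c in range(k):
            if c != u[d]:
                yield u[:d] + (c,) + u[d + 1:]

print(diameter(nodes, mesh_nbrs))     # 2*(k-1) = 10 for the plain 6x6 mesh
print(diameter(nodes, express_nbrs))  # n = 2, independent of k
```

Growing k leaves the second number unchanged, which is the sense in which a sufficiently dense express topology has a diameter independent of router count.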
Nikhil Jain is a Computer Scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. He works on topics related to parallel computing including networks, performance optimization, scalable application development, and interoperation of languages. Nikhil received a Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2016, and B.Tech. and M.Tech. degrees in Computer Science and Engineering from I.I.T. Kanpur, India in May 2009. He was awarded the Sidney Fernbach postdoctoral fellowship for two years in 2016, the IBM PhD fellowship in 2014, and the Andrew and Shana Laursen fellowship in 2011.
Faculty Host: David Lowenthal
Tuesday, March 27, 2018.
Title: Enhancing the Discovery and Mitigation of Vulnerabilities in Binary Programs
In the computing landscape of the modern world, our devices and systems, including PCs, servers, industrial control systems, and smart/embedded devices, increasingly rely on programs for which the source code is unavailable to end users, security analysts, and even manufacturers – termed "binary programs". Oftentimes, binary programs are not fully secure, and through these devices and systems, vulnerabilities in binaries can have a broad impact on society. Because of the intrinsic complexity of programs, discovering and mitigating vulnerabilities in binaries is generally viewed as a difficult task. It is made only more difficult by the loss of information, especially semantics, through compilation and optimization.
In this talk, I will present my research on improving the discovery and mitigation of vulnerabilities in binaries without requiring source code. I approach this goal from different angles. I will first discuss improvements to traditional vulnerability discovery techniques, such as fuzz testing, by complementing them with assistance from either symbolic execution engines or intelligence from non-expert humans. I will then showcase a novel technique for static binary rewriting with extremely low overhead, which greatly reduces the performance impact of vulnerability mitigation and program hardening on binaries. These techniques are built upon the angr binary analysis platform, which I co-founded and maintain to help foster the future of binary analysis.
Ruoyu (Fish) Wang is a Ph.D. candidate in the SecLab of the Department of Computer Science at the University of California, Santa Barbara, advised by Prof. Giovanni Vigna and Prof. Christopher Kruegel. His research focuses on system security, especially automated binary program analysis and reverse engineering of software. He is the co-founder and a core developer of the binary analysis platform angr. He is a core member of the CTF team Shellphish and the CGC team Shellphish CGC, with whom he won third place in the Final Event of the DARPA Cyber Grand Challenge in 2016.
Faculty Host: Dr. Saumya Debray
Thursday, March 22, 2018.
Title: Securing Smart, Connected Systems through Systematic Problem Analysis and Mitigation
The world is increasingly connected through a series of smart, connected systems such as smartphone systems, smart home systems, and the emerging smart transportation and autonomous vehicle systems. While leading to improved services, such transformation also introduces new security challenges. To address these challenges, in contrast to existing defense mechanisms that are mostly ad hoc and reactive, my research aims at developing proactive defense approaches that can systematically discover, analyze, and mitigate new security problems in smart, connected systems.
In this talk, I will focus on my research efforts in securing the two most basic components of any smart, connected system: the network stack and smart control. For network stack security, I will describe our discovery of a new attack vector (US-CERT alert TA16-144A) that was unexpectedly introduced by the recent expansion of DNS, and our subsequent systematic analysis at both the network and software levels for its defense. For smart control security, I will describe my most recent work, the first security analysis of next-generation Connected Vehicle (CV) based traffic signal control, which discovers new vulnerabilities at the traffic signal control algorithm level. I will conclude by discussing my future research plans for securing existing and future smart, connected systems, especially those in critical domains such as transportation and automobiles.
Qi Alfred Chen is a PhD candidate in the EECS department at the University of Michigan, advised by Professor Z. Morley Mao. His research interest is network and systems security, and the major theme of his research is to address security challenges through systematic problem analysis and mitigation. His research has discovered and mitigated security problems in various systems such as next-generation transportation systems, smartphone OSes, network protocols, DNS, GUI systems, and access control systems. His work has had impact in both academia and industry, with over 10 top-tier conference papers, news coverage and interviews, vulnerability disclosures, and industry discussions and responses. His current research focuses on smart systems and IoT, e.g., smart home, smart transportation, and autonomous vehicle systems.
Faculty Host: Dr. Beichuan Zhang
Tuesday, March 20, 2018.
Title: Efficient Recording and Analysis of Software Systems
Failures in medical devices, banking software, and transportation systems have led to significant fiscal costs and even loss of life. Researchers have developed sophisticated methods to monitor and understand many of the complex system misbehaviors behind these bugs, but their computational costs (often an order of magnitude or more) prohibit their use in production, leaving an ecosystem of critical software with little guaranteed protection and no method of diagnosing misbehaviors.
In this talk I present systems and techniques that reduce the run-time burden of the tools required to understand and monitor the complex behaviors of today's critical systems. First, I present Optimistic Hybrid Analysis (OHA). OHA observes that when static analysis is applied to optimize a dynamic analysis, the static analysis need not be correct in all cases, so long as any analysis errors can be caught at runtime. This observation enables the use of much more efficient and accurate static analyses than historically used, yielding dynamic run-time overheads dramatically lower than prior techniques. Second, I argue that computer systems should be capable not only of recalling any prior state, but also of providing the provenance of any byte within the history of the computation. I call such a system an "eidetic system", and I present Arnold, the first practical eidetic system, capable of recording and recalling years of computation on a single disk. I show that Arnold can practically answer critical questions about serious information leakages, such as exactly what information (if any) was leaked by the Heartbleed vulnerability or the Equifax breach.
David Devecsery is currently a postdoctoral researcher at the University of Michigan, after completing his Ph.D. there in January 2018. His interests broadly span the areas of software systems, program analysis, and system security. David is particularly interested in creating practical tools that enable developers, users, and system administrators to observe and understand complex and unexpected behaviors of software systems.
Faculty Host: Dr. Michelle Strout
Tuesday, March 13, 2018.
Title: Clustering for Large Data Domains
We live in an era in which a massive amount of data is being generated in a very short period of time, and there is high demand to process it. Clustering is a crucial tool that is used in a wide array of applications to analyze such large datasets. The objective is to place similar data objects in the same group and dissimilar objects in different groups. Though clustering problems have received considerable attention over the past several decades, brute-force algorithms stubbornly stand as the most efficient solution in many settings.
My research provides insight into why more efficient algorithms are unlikely to exist. I will describe these new developments and propose ways of overcoming this barrier in some restricted settings. In particular, my research also gives efficient and near-optimal solutions for processing vast amounts of data that are well-clusterable.
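One way to see why well-clusterable inputs sidestep the barrier is that, on well-separated data, even a simple greedy heuristic recovers the clusters quickly. The sketch below is not the speaker's algorithm; it is a standard illustration (Gonzalez's farthest-point heuristic for k-center, an O(nk) 2-approximation) on invented, well-separated data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical well-clusterable data: 3 well-separated Gaussian blobs.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])

def greedy_k_center(X, k):
    """Gonzalez's farthest-point heuristic: repeatedly pick the point farthest
    from the chosen centers; a 2-approximation for k-center in O(nk) time,
    versus brute force over exponentially many partitions."""
    chosen = [0]                               # start from an arbitrary point
    d = np.linalg.norm(X - X[0], axis=1)       # distance to nearest chosen center
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                # farthest point becomes a center
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen], d.max()                  # centers and covering radius

C, radius = greedy_k_center(X, 3)
labels = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=2), axis=1)
```

Because the blobs are far apart relative to their spread, each blob receives exactly one center and the covering radius stays on the order of the within-blob spread.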
Alan Roytman is currently a postdoctoral researcher at the University of Copenhagen. His research interests are in the design and analysis of algorithms, and more broadly in theoretical computer science. Previously, he was a postdoctoral researcher at Tel Aviv University, where he was supported by the I-CORE in Algorithms Postdoctoral Fellowship. Alan obtained his Ph.D. from the University of California, Los Angeles under the guidance of Professor Rafail Ostrovsky, and his undergraduate degree from the University of California, Berkeley.
Faculty Host: Dr. Stephen Kobourov
Thursday, March 1, 2018.
Title: Create a Fully Autonomous World for Software Security
To protect the billions of computers running countless programs, security researchers have pursued automated vulnerability detection and remediation techniques, attempting to scale such analyses beyond the limitations of human hackers. However, although these techniques mitigate, or even eliminate, the bottleneck of human effort in the analysis itself, the human bottleneck (and human fallibility) remains in the higher-level strategy of what to do with automatically identified vulnerabilities, automatically created exploits, and automatically generated patches. There are many choices to make regarding the specifics of such a strategy, and these choices have real implications beyond cyber-security exercises. For example, individuals must decide whether to patch the Spectre vulnerability given that the patch degrades performance on some workloads, and nations must decide whether to disclose new software vulnerabilities (zero-day vulnerabilities) or to exploit them for gain.
In this talk, I will introduce my work on cyber autonomy. Cyber autonomy is a new computer security research area that aims to secure programs without human intervention, from discovering vulnerabilities to making and executing decisions. While the first generation of implemented systems (autonomous cyber reasoning systems) has shown the potential of cyber autonomy, these systems are still too simplistic for practical use. I will delve into the challenges in cyber autonomy and the issue of the strategy-techniques gap, explore possible solutions, and discuss the steps needed to mature cyber autonomy into everyday practice.
Tiffany Bao is a PhD candidate in Electrical and Computer Engineering at Carnegie Mellon University, advised by Professor David Brumley. She is also a member of the CyLab Security and Privacy Institute at Carnegie Mellon University. Her research focuses on cyber autonomy, and her work spans the areas of binary analysis techniques and game-theoretic strategy. She earned her Bachelor of Science from Peking University in 2012, and she has worked as a security specialist at the University of California, Santa Barbara, Peking University, and Tsinghua University.
Faculty Host: Dr. Christian Collberg
Tuesday, February 27, 2018.
Title: Adaptive Machine Learning with Multi-Armed Bandits
We are in the middle of an AI revolution, with the successes of AlphaGo, image classification, and speech recognition. However, these successes rely on large amounts of data, which raises numerous challenges for novel tasks, since the data are usually not readily available and take money and time to collect. How can we minimize data collection costs and train models efficiently with insufficient data? In this talk, I will present novel adaptive data collection and learning algorithms arising from the so-called multi-armed bandit framework and show their theoretical guarantees and their effectiveness in real-world applications. Specifically, I will show that my algorithms can quickly recommend personalized products to a new user in a scalable way via a novel extension of online optimization algorithms. I will also discuss how biological experiments can be performed with a reduced budget by adaptively selecting which experiments to run next.
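For readers new to multi-armed bandits, the classic UCB1 algorithm illustrates the adaptive data collection principle the talk builds on: spend samples mostly on promising options while still exploring uncertain ones. This is a textbook sketch, not the speaker's algorithm; the arm means below are invented:

```python
import math
import random

random.seed(0)
true_means = [0.2, 0.5, 0.8]       # hypothetical Bernoulli arm reward rates
counts = [0] * 3                   # pulls per arm
sums = [0.0] * 3                   # accumulated reward per arm

def ucb1_pick(t):
    """UCB1: play each arm once, then pick the arm maximizing
    empirical mean + an exploration bonus that shrinks with more pulls."""
    for a in range(3):
        if counts[a] == 0:
            return a
    return max(range(3),
               key=lambda a: sums[a] / counts[a]
                             + math.sqrt(2 * math.log(t) / counts[a]))

for t in range(1, 5001):
    a = ucb1_pick(t)
    r = 1.0 if random.random() < true_means[a] else 0.0
    counts[a] += 1
    sums[a] += r

# After 5000 rounds the best arm (index 2) receives the bulk of the pulls,
# so most of the "data collection budget" goes where it is most useful.
```

The same explore/exploit trade-off underlies both personalized recommendation (arms are products) and adaptive experiment selection (arms are candidate experiments).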
Kwang-Sung Jun is a postdoctoral researcher at the University of Wisconsin-Madison Wisconsin Institute for Discovery, advised by Profs. Robert Nowak, Rebecca Willett, and Stephen Wright. His research focuses on adaptive and interactive machine learning that arises in real-world and interdisciplinary applications. Specifically, he works on multi-armed bandits, online optimization, and cognitive modeling, which has applications in personalized recommendation, adaptive biological experiments, and psychology. He received a Ph.D. in Computer Science from the University of Wisconsin-Madison under the supervision of Prof. Xiaojin (Jerry) Zhu.
Faculty Host: Dr. John Kececioglu
Thursday, February 22, 2018.
Title: Abstractions, Mechanisms, and Policies for Intra-Kernel Privilege Separation
Many layers of our computing stacks are implicitly trusted, but are themselves no more secure than the applications they seek to protect. Securing them is challenging because they operate with full authority.
In this talk I describe one of my explorations into making *trusted* software more trustworthy. The Nested Kernel replaces monolithic operating system design with a new organization that directly integrates memory protection services into the operating system itself. I describe how a common memory protection mechanism (memory management unit) is powerful enough to isolate itself from most of the operating system without requiring a higher hardware privilege level.
The result is that the Nested Kernel is efficient and portable to diverse system software (Xen, FreeBSD, Linux, Android) and hardware (Arm, x86-64, VT-x), while reducing the code allowed to modify protection policies by two orders of magnitude. Nested Kernel prototypes (FreeBSD and Xen) demonstrate that it is possible to retrofit security into existing and popular systems with explicit and powerful intra-kernel protection services, providing the foundation for future research in securing our systems. In the talk I describe the Nested Kernel and sketch a path forward for a "micro-evolution" of monolithic systems, which I intend to exploit for operating system hardening and verification: a must for gaining any assurance in our computing stacks.
Nathan Dautenhahn is a postdoctoral researcher in the Department of Computer and Information Science at the University of Pennsylvania. He earned his doctorate in Computer Science from the University of Illinois at Urbana-Champaign in August 2016. His research investigates trustworthy system design by developing experimental operating systems, compilers, and hardware components, and has led to publications in key security and systems venues, including IEEE S&P, CCS, NDSS, ASPLOS, and ISCA. His dissertation, on the Nested Kernel, identifies solutions for defending against insecure and malicious operating systems. The Nested Kernel is under consideration for inclusion in HardenedBSD (a variant of FreeBSD) and is being integrated into Linux by others. Dautenhahn actively contributes to graduate education and service through many activities, such as creating the Doctoral Education Perspectives seminar, formally mentoring undergraduate and graduate students, and serving on the Computer Science Graduate Academic Council and the Engineering Graduate Student Advisory Committee.
Faculty Host: Dr. Rick Snodgrass
Tuesday, February 20, 2018.
Title: Incentivizing Societal Contributions for and via Machine Learning
Machine learning (ML) and automatic algorithmic decision making have started to play central and crucial roles in our daily lives. At the same time, more and more data used to train ML algorithms are now collected through crowdsourcing or other forms of participatory computation involving human agents. With humans being both the source and the ultimate target of these algorithms, which are increasingly being used to assist in making important and sometimes life-changing decisions, new and interesting challenges arise.
I will present our studies addressing the challenges of incentivizing societal contributions to build better and more robust ML algorithms. In the first part of the talk, I will demonstrate how ML techniques can be leveraged to quantify the value of human-reported information when no ground-truth verification is available. I will show how these results help design better incentive mechanisms that encourage user input and make high-quality data collection more efficient than existing, non-ML-based methods. In the second part, I will show how multi-armed bandit style online learning techniques can help resolve the above incentive challenge in a sequential data acquisition setting. I will conclude my talk with future work.
Yang Liu is currently a postdoctoral fellow at Harvard University. He obtained his PhD from the Department of EECS at the University of Michigan, Ann Arbor, in 2015. He also obtained Master of Science degrees in EE:Systems and in Mathematics in 2012 and 2014, respectively, both from the University of Michigan, and holds a Bachelor's degree from Shanghai Jiao Tong University, China. His research interests broadly focus on the interactions between society and machine learning, and in particular algorithmic decision-making. He was a finalist for the Towner Prize for Outstanding Ph.D. Research in 2015, and the winner of the best poster award at the Michigan Engineering Symposium in 2011.
Faculty Host: Dr. Joshua Levine
Thursday, February 15, 2018.
Title: On-Device Machine Learning: Small Models and Fast Prediction
Many complex machine learning models have demonstrated tremendous success on massive data. However, these advances are not necessarily feasible when deploying such models to devices, due to large model sizes and evaluation costs. In many real-world applications such as robotics, self-driving cars, and smartphone apps, learning tasks need to be carried out in a timely fashion on computation- and memory-limited platforms. Therefore, it is extremely important to study how to build "small" models from "big" machine learning models. The main topic of my talk is how to reduce model size and speed up evaluation for complex machine learning models while maintaining similar accuracy. Specifically, I will discuss how to compress models and achieve fast prediction in different real-world machine learning applications, including matrix approximation and extreme classification.
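One concrete instance of the matrix approximation theme is low-rank compression of a dense weight matrix. The sketch below is a generic illustration, not the speaker's method; the matrix sizes and rank are invented. Storing two thin factors instead of the full matrix reduces both memory and per-prediction multiply cost:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical dense layer: a 512x512 weight matrix that is (exactly) rank 32.
W = rng.normal(size=(512, 32)) @ rng.normal(size=(32, 512))

def compress(W, r):
    """Truncated SVD compression: keep the top-r singular directions,
    returning thin factors A (512 x r) and B (r x 512) with W ~= A @ B."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]
    B = Vt[:r]
    return A, B

A, B = compress(W, 32)
x = rng.normal(size=512)
y_full = W @ x                    # original prediction: 512*512 multiplies
y_fast = A @ (B @ x)              # compressed: 2*512*32 multiplies (8x fewer)
err = np.linalg.norm(y_full - y_fast) / np.linalg.norm(y_full)
```

Here the matrix is exactly low-rank so the compressed prediction matches to floating-point precision; on real models the rank is chosen to trade a small accuracy loss for the size and speed gains.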
Si Si is a researcher and software engineer at Google Research. Her research focus is developing scalable machine learning models. Si obtained her bachelor's degree from the University of Science and Technology of China in 2008, an M.Phil. degree from the University of Hong Kong in 2010, and a Ph.D. from the University of Texas at Austin in 2016. She is the recipient of an MCD fellowship (2010-2013) and the best paper award at ICDM 2012. Si was selected as one of the Rising Stars in EECS in 2017.
Faculty Host: Dr. Mihai Surdeanu
Tuesday, February 13, 2018.
Title: Analyzing and Mitigating Congestion on High Performance Networks
High performance networks are a critical component of clusters and supercomputers that enable fast communication between compute nodes. On many platforms, the performance of parallel codes is increasingly communication-bound due to a disproportionate increase in the compute capacity per node but only modest increases in network bandwidths. Hence, it is extremely important to optimize communication on the network. On most architectures, communication performance may be degraded due to network congestion arising from message flows of one or multiple jobs sharing the same network resources. For the past several years, I have been studying network congestion on high performance networks, and developing different strategies to mitigate it. In this talk, I will present studies on analyzing network congestion on two different network topologies, a dragonfly and a five-dimensional torus network, using analytical modeling, visualization, and machine learning.
Abhinav Bhatele is a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. His research interests include performance optimizations through analysis and visualization, task mapping and load balancing, network design and simulation, parallel runtimes and interoperation, and HPC data analytics. Abhinav received a B.Tech. degree in Computer Science and Engineering from I.I.T. Kanpur, India in May 2005 and M.S. and Ph.D. degrees in Computer Science from the University of Illinois at Urbana-Champaign in 2007 and 2010 respectively. Abhinav was a recipient of the ACM/IEEE-CS George Michael Memorial HPC Fellowship in 2009 and the IEEE TCSC Young Achievers in Scalable Computing award in 2014. He has received best paper awards at Euro-Par 2009, IPDPS 2013 and IPDPS 2016.
Faculty Host: Dr. Kate Isaacs
Thursday, February 8, 2018.
Title: Designing Operating Systems for Data-Intensive Heterogeneous Systems
The dramatic growth in the volume of data and the disproportionately slower advancement of memory scalability and storage performance have plagued application performance over the last decade. Emerging heterogeneous memory technologies such as nonvolatile memory (NVM) promise to alleviate both memory capacity and storage problems; however, realizing the true potential of these technologies requires rethinking software systems in ways that have not been done before. My research has developed fundamental principles and redesigned operating systems (OSes), runtimes, file systems, and applications to address both main memory capacity scaling and storage performance challenges. In my talk, I will first present our approach to scaling main memory capacity across heterogeneous memory by redesigning the OS virtual memory system, as opposed to the file system used by current systems. Our design makes OS virtual memory data structures and abstractions heterogeneity-aware and intelligently captures an application's use of memory for efficient data placement. I will then briefly discuss our approach to reducing software bottlenecks in storage by moving the file system into the storage hardware. I will conclude my talk with a future vision of unifying converging memory and storage technologies into an application-transparent data tier fully managed by the OS and user-level runtimes.
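The heterogeneity-aware data placement the abstract mentions can be illustrated, very loosely, as a hotness-driven tiering policy. This is an invented toy model, not the speaker's OS design; the page names, capacities, and access trace are all hypothetical:

```python
from collections import Counter

# Hypothetical tiering: a small fast tier (e.g., DRAM) and a large slow
# tier (e.g., NVM); place the most frequently accessed pages in the fast tier.
FAST_CAPACITY = 2
accesses = ["p1", "p2", "p1", "p3", "p1", "p2", "p4", "p1"]

heat = Counter(accesses)                         # per-page access counts
ranked = [p for p, _ in heat.most_common()]      # hottest pages first
fast_tier = set(ranked[:FAST_CAPACITY])          # keep hot pages in fast memory
slow_tier = set(ranked[FAST_CAPACITY:])          # spill cold pages to slow memory
```

A real OS-level design would gather such access information transparently (e.g., from page-table metadata) and migrate pages continuously; the point here is only that placement decisions follow observed memory use.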
Sudarsun is a postdoctoral research associate at the University of Wisconsin-Madison, where he works on operating systems and storage research. His postdoctoral advisors are Prof. Andrea Arpaci-Dusseau and Prof. Remzi Arpaci-Dusseau. Sudarsun received a Ph.D. in Computer Science from Georgia Tech in 2016 under the guidance of the late Prof. Karsten Schwan and Prof. Ada Gavrilovska. Sudarsun's research focus is at the intersection of hardware and software, building operating systems and system software for next-generation memory and storage technologies. Results from his work have appeared at premier operating systems and architecture venues, including EuroSys, FAST, ISCA, HPCA, PACT, IPDPS, and others. In addition, his work during summer internships at HP Labs, Intel Labs, and Adobe Labs resulted in 3 patents related to nonvolatile memory and resource management. Sudarsun has taught several graduate and undergraduate-level courses and was nominated for the Georgia Tech-wide Outstanding Teaching Assistant Award.
Faculty Host: Dr. John Hartman
Tuesday, February 6, 2018.
Title: Machine Learning for Understanding the Dynamics of Cell Populations
New technologies allow us to understand many biological processes at the molecular level, but they require principled machine learning methods to capture the underlying dynamics of cell populations. In this talk, I present two projects. In the first project, we design a dynamic graphical model to jointly analyze different types of genomic aberrations from multi-location/multi-time biopsies of metastatic breast cancer. The model allows us to accurately characterize genomic aberrations and understand oncogenic processes from next-generation sequencing data at a significantly larger scale. In the second project, we propose a dimensionality reduction approach to recover intrinsic biological structure from single cell Hi-C contact maps. With mouse ES cells, our dimensionality reduction approach successfully recovers the intrinsic cell-cycle manifold and is robust to the number of contacts.
Dr. Jie Liu received his Ph.D. in computer science from the University of Wisconsin-Madison. He is currently a Moore/Sloan Data Science Postdoctoral Fellow in the Genome Sciences Department and eScience Institute at the University of Washington. His research interests are machine learning and its applications in biomedical informatics.
Faculty Host: Dr. John Kececioglu
Thursday, January 11, 2018.
Title: Providing fast and reliable transactions in distributed database systems
Distributed database systems run transactions across machines while ensuring serializability. Traditional approaches to distributed transactions are based on two-phase locking or optimistic concurrency control; however, these protocols suffer performance degradation due to aborts and/or blocking. In addition, to provide fault tolerance, traditional approaches replicate data through an extra layer of consensus protocols such as Paxos, which incurs additional cost. This talk focuses on one question: how can we improve system performance without giving up the serializability guarantee? It will cover a new concurrency control protocol based on dependency tracking, how to combine it with conflict analysis to support more diverse workloads, and how to extend it to support geo-replication in a merged-layer design with lower overhead.
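As a point of reference for the dependency-tracking idea, here is a textbook sketch (not the speaker's actual protocol): serializability of an interleaved schedule can be decided by building a conflict-dependency graph over transactions and checking it for cycles; a cycle means the schedule cannot be serialized.

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: set(successors)}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, ()):
            if color.get(m, WHITE) == GRAY:
                return True          # back edge -> cycle
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

def dependency_graph(schedule):
    """Add an edge T_i -> T_j for each pair of conflicting operations:
    same data item, at least one write, T_i's operation first."""
    graph = {}
    for i, (t1, op1, x1) in enumerate(schedule):
        graph.setdefault(t1, set())
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (op1, op2):
                graph[t1].add(t2)
    return graph

# A non-serializable interleaving: T1 and T2 each read then write x.
schedule = [('T1', 'r', 'x'), ('T2', 'r', 'x'),
            ('T1', 'w', 'x'), ('T2', 'w', 'x')]
print(has_cycle(dependency_graph(schedule)))  # True: must abort or block
```

A real protocol tracks these dependencies online and avoids materializing the full schedule, but the cycle criterion is the underlying invariant.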
Shuai Mu is a post-doctoral researcher at New York University, working with Mike Walfish and Jinyang Li on distributed systems. He earned his PhD from Tsinghua University (Beijing, China) in 2015.
Faculty Host: Dr. David Lowenthal
Thursday, November 30, 2017.
Title: Privacy in a World of Mass Surveillance
In this talk I will discuss Pung, a private communication system that allows users to exchange messages over the Internet without revealing any information to network providers (ISPs, E-mail servers, etc.). In particular, providers do not learn with whom users communicate, how often, or the content of any communication. We show that this strong privacy property is achievable even when providers are arbitrarily malicious.
To make Pung efficient in practice, we build a new private information retrieval (PIR) library called SealPIR. This library allows a user to retrieve an element from an untrusted server without revealing to the server which element was retrieved. SealPIR is orders of magnitude more network efficient than existing PIR constructions, is concretely efficient, and can be used in other applications.
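SealPIR itself is a single-server construction built on homomorphic encryption, but the PIR guarantee can be illustrated with a much simpler (and deliberately different) two-server, information-theoretic scheme. Everything below is a toy sketch, not SealPIR's design: each server sees only a uniformly random selection vector, yet the XOR of the two answers recovers exactly the requested record.

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, selection):
    """Each server XORs together the records its selection bits pick out."""
    acc = bytes(len(db[0]))
    for j, bit in enumerate(selection):
        if bit:
            acc = xor_bytes(acc, db[j])
    return acc

def pir_query(db, i):
    """Client: send a random vector to server A and the same vector with
    bit i flipped to server B. Neither vector alone reveals i; their XOR
    is the unit vector e_i, so the answers XOR to record i."""
    n = len(db)
    sel_a = [secrets.randbelow(2) for _ in range(n)]
    sel_b = list(sel_a)
    sel_b[i] ^= 1
    return xor_bytes(server_answer(db, sel_a), server_answer(db, sel_b))

db = [b'msg0', b'msg1', b'msg2', b'msg3']
print(pir_query(db, 2))  # b'msg2'
```

The trade-off SealPIR targets is doing this with a single untrusted server and far less network traffic, which requires cryptographic rather than information-theoretic hiding.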
Sebastian Angel is a Ph.D. candidate at The University of Texas at Austin and visiting academic at New York University's Courant Institute of Mathematical Sciences. He is interested in topics at the intersection of security, systems, and networking. Beyond private communication, his work includes adding verifiability to large scale auction systems, architecting OS defenses against malicious peripheral devices (USB flash drives, keyboards, etc.), and ensuring that applications running on public data centers achieve predictable performance.
Faculty Host: Dr. Katherine Isaacs
Tuesday, November 21, 2017.
Title: 2-3 Cuckoo Filters for Faster Triangle Listing and Set Intersection
We introduce new dynamic set intersection data structures, which we call 2-3 cuckoo filters and hash tables. These structures differ from the standard cuckoo hash tables and cuckoo filters in that they choose two out of three locations to store each item, instead of one out of two, ensuring that any item in an intersection of two structures will have at least one common location in both structures. We demonstrate the utility of these structures by using them in improved algorithms for listing triangles and answering set intersection queries.
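A minimal sketch of the two-out-of-three placement idea follows. The bucket count, hash derivation, and load-based placement policy are my own simplifications (real cuckoo structures store fingerprints and rebalance with evictions); the point is the pigeonhole guarantee: if both tables hold an item in 2 of the same 3 candidate buckets, they must share at least one bucket for it, since 2 + 2 > 3.

```python
import hashlib

NUM_BUCKETS = 64

def candidates(item):
    """Derive three candidate buckets per item from one hash digest."""
    h = hashlib.sha256(item.encode()).digest()
    return [int.from_bytes(h[k:k + 4], 'big') % NUM_BUCKETS for k in (0, 4, 8)]

class TwoOfThreeTable:
    """Store each item in 2 of its 3 candidate buckets."""
    def __init__(self):
        self.buckets = [set() for _ in range(NUM_BUCKETS)]

    def insert(self, item):
        locs = candidates(item)
        # Illustrative policy: pick the two least-loaded candidates.
        for loc in sorted(locs, key=lambda l: len(self.buckets[l]))[:2]:
            self.buckets[loc].add(item)

    def contains(self, item):
        return any(item in self.buckets[l] for l in candidates(item))

def intersects_on(a, b, item):
    """If both tables hold the item, some candidate bucket holds it in both,
    so intersection queries only need to probe shared candidate buckets."""
    return any(item in a.buckets[l] and item in b.buckets[l]
               for l in candidates(item))
```

In the real structures this co-location property is what lets set intersection and triangle listing probe a constant number of shared locations per item.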
Prof. Goodrich received his B.A. in Mathematics and Computer Science from Calvin College in 1983 and his PhD in Computer Sciences from Purdue University in 1987. He is a Chancellor's Professor at the University of California, Irvine, where he has been a faculty member in the Department of Computer Science since 2001. He was a professor in the Department of Computer Science at Johns Hopkins University from 1987-2001. Dr. Goodrich's research is directed at the design of high performance algorithms and data structures with applications to information assurance and security, the Internet, machine learning, and geometric computing. He is an ACM Distinguished Scientist, a Fellow of the American Association for the Advancement of Science (AAAS), a Fulbright Scholar, a Fellow of the IEEE, and a Fellow of the ACM.
Faculty Host: Dr. Stephen Kobourov
Thursday, November 9, 2017.
Title: Scalable Learning Over Distributions
A great deal of attention has been devoted to studying new and better ways to perform learning tasks involving static finite vectors. Indeed, over the past century the fields of statistics and machine learning have amassed a vast understanding of various learning tasks like clustering, classification, and regression using simple real valued vectors. However, we do not live in a world of simple objects. From the contact lists we keep, the sound waves we hear, and the distribution of cells we have, complex objects such as sets, distributions, sequences, and functions are all around us. Furthermore, with ever-increasing data collection capacities at our disposal, not only are we collecting more data, but richer and more bountiful complex data are becoming the norm.
In this presentation we analyze regression problems where input covariates, and possibly output responses, are probability distribution functions from a nonparametric function class. Such problems cover a large range of interesting applications including learning the dynamics of cosmological particles and general tasks like parameter estimation.
However, previous nonparametric estimators for functional regression problems scale poorly with the number of input/output pairs in a data-set. Yet, given the complexity of distributional data, large data-sets may be necessary to achieve a low estimation risk.
To address this issue, we present two novel scalable nonparametric estimators: the Double-Basis Estimator (2BE) for distribution-to-real regression problems; and the Triple-Basis Estimator (3BE) for distribution-to-distribution regression problems. Both the 2BE and 3BE can scale to massive data-sets. We show an improvement of several orders of magnitude in terms of prediction speed and a reduction in error over previous estimators in various synthetic and real-world data-sets.
Junier Oliva is a Ph.D. candidate in the Machine Learning Department at the School of Computer Science, Carnegie Mellon University. His main research interest is to build algorithms that understand data at an aggregate, holistic level. Currently, he is working to push machine learning past the realm of operating over static finite vectors, and start reasoning ubiquitously with complex, dynamic collections like sets and sequences. Moreover, he is interested in exporting concepts from learning on distributional and functional inputs to modern techniques in deep learning, and vice-versa. He is also developing methods for analyzing massive datasets, both in terms of instances and covariates. Prior to beginning his Ph.D. program, he received his B.S. and M.S. in Computer Science from Carnegie Mellon University. He also spent a year as a software engineer for Yahoo!, and a summer as a machine learning intern at Uber ATG.
Faculty Host: Dr. Mihai Surdeanu
Tuesday, November 7, 2017.
Title: Uncovering and Addressing Security Assumptions About Hardware
Due to manufacturing error, reliability failure modes, or simply complex feature design, hardware occasionally exhibits surprising behaviors. Unknowingly, software security can rest on incorrect assumptions about hardware minutiae. In my research I expose how previously unknown or under-appreciated hardware behaviors can produce side-channels with high-level privacy and security impact in software such as web browsers. Motivated by these attacks, I also work to build architectures, for both software and hardware, that are inherently resistant to side-channels, mitigating both known and unknown attacks.
In this talk I highlight attacks we have developed using details of hardware behavior, as well as a defensive browser scheme to mitigate such attacks. I first describe how we use floating-point timing side-channels to break web privacy in all major desktop web browsers. I then use these attacks and others as a motivation for our defensive browser proposal: Fermata. I discuss both the complete vision of Fermata as well as diving into the details of our prototype implementation: Fuzzyfox. Fuzzyfox is an incomplete Fermata implementation designed to field-test the ideas of Fermata and their impact on security and usability.
David is a PhD candidate in Computer Science at UC San Diego working in security, systems, and hardware. His research interests focus on the collision between software security theory and hardware reality. Previously, David received his B.S. in Computer Science from Carnegie Mellon University in 2011 and co-founded the San Diego-based security company Somerset Recon in 2012. He expects to defend his thesis in 2018.
Faculty Host: Dr. Christian Collberg
Thursday, October 26, 2017.
Title: Machine Learning By the People, for the People
Machine learning is concerned with the design and analysis of algorithms that compute general facts about an underlying data-generating process by observing limited amounts of that data. Classically, the outcome of a learning algorithm is considered in isolation from the effects that it may have on the process that generates the data or computes the outcome. With data science and the applications of machine learning revolutionizing day-to-day life, however, people and organizations increasingly interact with learning systems. It is essential to account for the wide variety of social and economic limitations, aspirations, and behaviors demonstrated by these people and organizations, which fundamentally change the nature of learning tasks and the challenges involved. I will describe three examples from my work on the theoretical aspects of machine learning and economics that account for these interactions: learning optimal policies in game-theoretic settings, without an accurate behavioral model, by interacting with people; learning the parameters of an optimal economic mechanism when the behavior and preferences of people can change over time and as the result of their interactions with the learning system; and collaborative learning in a setting where multiple learners attempt to discover the same underlying concept.
Nika Haghtalab is a Ph.D. candidate at the Computer Science department of Carnegie Mellon University, co-advised by Avrim Blum and Ariel Procaccia. She is a recipient of the IBM and Microsoft Research Ph.D. fellowships, and the Siebel Scholarship.
Faculty Host: Dr. John Kececioglu
Tuesday, October 10, 2017.
Title: Visual Analytics of Stance in Social Media
This talk will give an overview of the StaViCTA framework project, which aims to tackle the challenge of investigating stance (such as attitudes, feelings, perspectives, or judgements) in written human communication. After introducing our definition of stance and providing visualization showcases of how stance analysis might be used to better understand social media, I will discuss several visual analytics tools that were specifically designed to support the development of stance classification. They range from approaches providing fundamental insights into text data that are necessary for building an appropriate linguistic stance theory, to approaches for text data annotation and visualization that facilitate the entire process of training a stance classifier.
Andreas Kerren received the B.S. and M.S. degrees as well as his PhD degree in Computer Science from Saarland University, Saarbrücken (Germany). In 2008, he achieved his habilitation (docent competence) from Växjö University (Sweden). Dr. Kerren is currently a Full Professor in Computer Science at the Department of Computer Science, Linnaeus University (Sweden), where he is heading the research group for Information and Software Visualization, called ISOVIS. His main research interests include the areas of Information Visualization, Visual Analytics, and Human-Computer Interaction. He is, among others, editorial board member of the Information Visualization journal, has served as organizer/program chair at various conferences, such as IEEE VISSOFT 2013/2018, IVAPP 2013-15/2018 or GD 2018, and has edited a number of successful books on human-centered visualization.
Faculty Host: Dr. Helen Purchase
Tuesday, October 2, 2017.
Title: Depth Based Visualizations for Ensemble Data and Graphs
Ensemble datasets are increasingly common across a range of domains. Such datasets often arise as a collection of solutions recorded from simulation runs with different parameters or initial conditions, or from the precision uncertainty associated with repeated measurements of a natural phenomenon. Studying ensembles in terms of the variability between members can provide valuable insight into the generating process, particularly when mathematically modeling the process is complex or infeasible. Ensemble visualization can be a powerful way to study the generating process by analyzing ensembles of solutions or possible outcomes. In ensemble visualization, key interests include understanding the typical and atypical members as well as the variability in the ensemble. In the absence of any information about the underlying generative model, a family of nonparametric methods known as data depth can quantify the notion of centrality and provide center-outward order statistics for ensembles. In this talk I will explore novel applications of existing depth-based methods, and describe my research on new visualizations, and associated methods to compute depth, for ensembles of various data types: 3D isocontours, paths on a graph, nodes on a graph, graphs, and data in inner product spaces.
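One simple member of the data-depth family is band depth for functional ensembles. The sketch below is a simplified variant (with J=2 bands and pairs excluding the member itself), chosen for brevity rather than drawn from the speaker's methods: a member's depth is the fraction of member pairs whose pointwise envelope fully contains it, so central members score high and outliers score low.

```python
from itertools import combinations

def band_depth(ensemble):
    """Simplified band depth (J=2) for an ensemble of sampled curves,
    each given as a list of values on a common grid."""
    depths = []
    for i, f in enumerate(ensemble):
        others = [g for j, g in enumerate(ensemble) if j != i]
        pairs = list(combinations(others, 2))
        count = sum(
            1 for g, h in pairs
            if all(min(a, b) <= v <= max(a, b) for v, a, b in zip(f, g, h))
        )
        depths.append(count / len(pairs))
    return depths

# Five flat "curves": the middle one should be the deepest member.
ensemble = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [2, 2, 2, 2],  # most central
    [3, 3, 3, 3],
    [4, 4, 4, 4],
]
d = band_depth(ensemble)
print(d.index(max(d)))  # 2: the middle curve has maximal depth
```

This center-outward ordering is exactly what lets depth-based visualizations draw "median" members and outlier envelopes without assuming a generative model.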
Mukund Raj graduated with a B.S. degree in Electronics and Telecommunications Engineering from the University of Pune in 2008. From 2008 to 2011 he worked as a software engineer at Infosys Labs, where he developed web-based accessibility tools. From 2011 to 2013 he was a member of the Visual Perception and Spatial Cognition lab at the University of Utah. In 2013 he graduated with an M.S. degree in computing from the University of Utah, where he is also currently working toward a PhD degree in computing.
Faculty Host: Dr. Alon Efrat
Thursday, September 28, 2017.
Title: Designing Secure Systems for Censorship Resistance
Tools to circumvent censorship aim to hide the websites that users access from a government censor. Some even disguise traffic patterns by mimicking allowed protocols or using services such as Skype to tunnel censored content. These systems have evolved as a result of a cat-and-mouse game between nation-state censors and censorship resistors: as new techniques for evading censorship arise, censors tweak their filtering systems to identify the weaknesses in existing tools that signal their usage. In this talk, I will describe key events in the censorship arms race and how to design and implement censorship circumvention tools that tilt the arms race in favor of the censorship resistor.
Faculty Host: Dr. David Lowenthal
Thursday, September 14, 2017.
Title: Visual Analytics Methods for Spatiotemporal Analysis
From smart phones to fitness trackers to sensor-enabled buildings, data is currently being collected at an unprecedented rate. Now, more than ever, data exists that can be used to gain insight into how policy decisions impact our daily lives. For example, one can imagine using data to help predict where crime may occur next, to inform police resource allocation decisions, or to provide recommendations for improving an individual's overall health and well-being based on diet and activity patterns. Underlying all of this data are measurements with respect to space and time. However, finding relationships within datasets and accurately representing these relationships to inform policy changes is a challenging problem. This research talk will address fundamental questions of how we can effectively explore such space-time data in order to enhance knowledge discovery and dissemination. Examples in this talk will focus on my lab group's recent research efforts in crime analysis, looking at methods for extending kernel density estimation, a theoretical analysis of cluster projections in choropleth maps, and novel visualization methods for tracking geographical hotspots with an emphasis on disease surveillance.
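For readers unfamiliar with kernel density estimation, the baseline the talk builds on is simple: each observation contributes a smooth bump, and the bumps are summed into a density surface. A minimal one-dimensional Gaussian sketch with toy data (the lab's spatial extensions are not reproduced here):

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density estimate f(x) = (1/nh) * sum_i K((x - x_i)/h)
    with a standard Gaussian kernel K and bandwidth h."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in samples)
    return density

# Toy "incident locations" along one street; density peaks near the cluster.
incidents = [1.0, 1.2, 1.1, 5.0]
f = gaussian_kde(incidents, bandwidth=0.5)
print(f(1.1) > f(3.0))  # True: more mass near the cluster around 1.1
```

In the spatial setting the same idea runs in two dimensions over incident coordinates, and the research questions concern how to extend the kernel and its support (for example, to road networks) rather than the summation itself.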
Ross Maciejewski is an Associate Professor of Computer Science at Arizona State University whose primary research interests are in the areas of geographical visualization and visual analytics focusing on public health, social media, sustainability, criminal incident reports and dietary analysis. He has served on the organizing committee for the IEEE Conference on Visual Analytics Science and Technology and the IEEE/VGTC EuroVis Conference and is serving as the Vice Chair for IEEE VIS 2017 in Phoenix, AZ. His work has been recognized through award winning submissions to the IEEE Visual Analytics Contest (2010, 2013 and 2015), and a best paper award in EuroVis (2017). He is a Fellow of the Global Security Initiative at ASU and the recipient of an NSF CAREER Award (2014).
Tuesday, September 12, 2017.
Speaker: Kyle Fox, Ph.D.
Title: Maps Between Geometric Data Sets
We will discuss two variants of the problem of computing maps between data sets. First, we will describe a near-linear time approximation algorithm for computing dynamic time warping maps between point sequences, a central problem in the analysis of trajectories and other curves. Next, we will describe fast approximation algorithms for computing transportation maps, a widely used method for comparing and relating two distributions. In both cases our goal is to develop simple, fast, hopefully near-linear-time approximation algorithms.
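For context, the exact dynamic time warping distance between two sequences is computed by a classic quadratic dynamic program; the talk's contribution is approximating this in near-linear time, so the recurrence below is the baseline being approximated, not the speakers' algorithm.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Exact O(len(a) * len(b)) dynamic time warping distance.
    D[i][j] = cost of the best monotone alignment of a[:i] with b[:j]."""
    INF = float('inf')
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # advance in a only
                D[i][j - 1],      # advance in b only
                D[i - 1][j - 1],  # advance in both
            )
    return D[n][m]

# Same shape with shifted timing aligns at zero cost.
print(dtw([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))  # 0.0
```

The quadratic table is exactly what makes exact DTW impractical for long trajectories, which motivates near-linear-time approximation.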
Kyle Fox recently joined the University of Texas at Dallas as an Assistant Professor after completing a postdoc at Duke University. He obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 2013. His research interests lie primarily in algorithms, including geometric algorithms, computational topology, combinatorial optimization, and their applications to data analysis and graph algorithms. He was a recipient of the Department of Energy Office of Science Graduate Fellowship and a winner of the C. W. Gear Outstanding Graduate Student award while at the University of Illinois.
Title: Graph Drawings: as created by users (or 'Doing the Future Work')
Prior experimental work has focused on the extent to which the layout of a graph drawing assists comprehension of the embodied relational information. This seminar presents an alternative approach to determining the relative worth of graph layout aesthetics, based on how users create their own graph drawings. The seminar will present the results of the published research experiments as well as two follow-up studies.