Thursday, March 19, 2020 - 11:00am - Virtual
Speaker: Vivek Kulkarni, Ph.D.
Title: "Human-Centric Natural Language Processing"
Abstract: Despite remarkable progress, modern natural language processing (NLP) systems are brittle and biased because they ignore the human and social context that language is situated in, and are insensitive to differences in language use across contexts like time and geography. To overcome these limitations, I advocate for NLP that is human-centric and robust to language variation. First, I will demonstrate that one can model human context by learning a small set of latent human factors (traits) from background language. I show that these traits broadly capture differences among people, are generally predictive of a variety of outcomes and improve the performance of NLP models on tasks such as stance and sarcasm detection. In the second part of the talk, I will propose methods to learn socially primed word embeddings that reliably model semantic variation across contexts and briefly outline their effectiveness on downstream tasks. Finally, I conclude by mapping out open problems and challenges that naturally guide future research towards the larger goal of robust, human-centric and fair natural language processing.
Bio: Vivek Kulkarni is a Postdoctoral Research Scholar in the Stanford NLP group, advised by Prof. Dan Jurafsky. His research broadly focuses on making NLP human-centric and robust to language variation. He received his Ph.D. in Computer Science from Stony Brook University in 2017. His work has been featured in MIT Technology Review, VICE, and The Guardian.
Faculty Host: Dr. Josh Levine
Tuesday, March 17, 2020 - 11:00am - Virtual
Speaker: Sazzadur Rahaman, Ph.D. Candidate
Title: "From Theory to Practice: Deployment-grade Tools and Methodologies for Software Security"
Abstract: Automated software checking for security is a challenging problem with a remarkable impact. Most of the solutions are hindered by the practical difficulty of reducing false positives without compromising analysis quality. In this talk, I will share my experiences with building high precision tools and methodologies for software security checking (i.e., detecting software non-compliance and vulnerabilities).
In the first part, I will present my work on building robust methodologies to evaluate the payment card industry (PCI) data security standard (DSS) certification process for e-commerce websites. Our study confirms that 86% of the websites have at least one PCI DSS violation that should have disqualified them as non-compliant. In the second part, I will talk about our solution for high-precision (98.61%) detection of cryptographic API misuse vulnerabilities in massive (e.g., millions of LoC) programs. Oracle has implemented this in its internal code analysis platform, Parfait, and found previously unknown issues. I will also share my insights on secure coding in light of our findings in several high-profile open source projects.
Bio: Sazzadur Rahaman is a Ph.D. candidate in the Department of Computer Science at Virginia Tech. His research focuses on minimizing the gap between the theory and practice of software security.
Sazzadur's work has been published in top-tier security conferences (e.g., ACM CCS, PETS) and journals (e.g., TDSC). In recognition of his work, he received several fellowships (the Bitshare fellowship and the Pratt fellowship) at Virginia Tech.
Prior to joining Virginia Tech, he worked as a software engineer. He has 3.5+ years of industry experience in building health care, payment, and financial technology solutions. He received his B.Sc. in computer science from Bangladesh University of Engineering and Technology (BUET). Sazzadur is also among the top 7% of users on StackOverflow, with a 6000+ reputation from 200+ posts.
Faculty Host: Dr. Beichuan Zhang
Thursday, March 05, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Jianguo Wang, Ph.D.
Title: "Data Search Systems for Emerging Trends"
Abstract: Building scalable data systems is important not only for computer science, but also for modern society. Historically, data was primarily managed by databases. Now, we see a proliferation of modern data systems such as search systems, key-value stores, and cloud systems. This is largely due to emerging trends that make people rethink whether existing designs are still optimal. Among these, three driving forces stand out: modern hardware, emerging applications, and cloud computing.
In this talk, I will focus on a widespread class of data systems, namely data search systems, and show how modern hardware and emerging applications alter the design tradeoffs. First, I will introduce a memory-centric search index compression that enables fast query processing over compressed data by leveraging the properties of modern hardware. Second, I will present a protein data search system that supports billion-scale interactive data analytics by fully exploiting domain knowledge and application-specific data characteristics. Finally, I will outline a future research agenda on developing scalable data systems for emerging trends.
Bio: Jianguo Wang is currently working at Amazon AWS on cloud-native databases. He received his Ph.D. in Computer Science from the University of California San Diego in December 2018, under the supervision of Professors Yannis Papakonstantinou and Steven Swanson. His research interests span database systems, data search systems, modern hardware, emerging applications, and cloud computing, with a focus on building efficient data management systems for emerging trends. He interned at Microsoft Research, Oracle, and Samsung on data-intensive systems.
Faculty Host: Dr. Rick Snodgrass
Tuesday, March 03, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Lifu Huang, Ph.D. Candidate
Title: "Cold-Start Universal Information Extraction"
Abstract: Who? What? When? Where? Why? are fundamental questions asked when gathering knowledge about and understanding a concept, topic, or event. The answers to these questions underpin the key information conveyed in the overwhelming majority, if not all, of language-based communication. Unfortunately, typical machine learning models and Information Extraction (IE) techniques rely heavily on human-annotated data, which is usually very expensive and only available and compiled for very limited types or languages, rendering them incapable of dealing with information across various domains, languages, or other settings.
In this talk, I will introduce a new information extraction paradigm - Cold-Start Universal Information Extraction - which aims to create the next generation of information access, where machines can automatically discover accurate, concise, and trustworthy information embedded in data of any form without requiring any human effort. Principally, my efforts along this line address three questions: (1) How can machines automatically discover the key information from texts without any pre-defined types or any human-annotated data? (2) How can machines benefit from available resources, e.g., large-scale ontologies or existing human annotations? (3) How can information extraction approaches be extended to low-resource languages without any extra human effort? My research answers these questions with three key innovations: a Liberal Information Extraction framework, which discovers structured information bottom-up and automatically induces a type schema; a Zero-shot IE approach, which reframes IE as a grounding problem instead of a classification problem; and a multilingual common semantic space framework, which retains clustering structures in each language and makes IE feasible for thousands of languages. I will conclude my talk by outlining the remaining challenges and discussing several future research directions.
Bio: Lifu Huang is a Ph.D. candidate in the Computer Science Department at the University of Illinois at Urbana-Champaign. He has a wide range of research interests in natural language processing and understanding. Specifically, his current research focuses on developing efficient information extraction approaches to automatically extract structured knowledge from any form of data at little to no cost. He received his M.S. from Peking University in 2014 with the highest university honor and a National Scholarship. He has served as a Program Committee member for many top NLP and AI venues, including ACL, EMNLP, NAACL, and AAAI. He also received a fellowship from the Allen Institute for Artificial Intelligence (AI2) in 2019.
Faculty Host: Dr. Kwang-Sung Jun
Thursday, February 27, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Hui Guan, Ph.D. Candidate
Title: "Reuse-Centric Programming System Support of Machine Learning"
Abstract: Modern machine learning, especially deep learning, faces a fundamental question: how to create models that efficiently deliver reliable predictions to meet the requirements of diverse applications running on various systems. This talk will introduce reuse-centric optimization, a novel direction for addressing this fundamental question. Reuse-centric optimization centers around harnessing reuse opportunities to enhance computing efficiency. It generalizes the principle to a higher level and a larger scope through a synergy between programming systems and machine learning algorithms. Its exploitation of computation reuse spans the boundaries of machine learning algorithms, implementations, and infrastructures; the types of reuse it covers range from pre-trained Neural Network building blocks to preprocessed results and even memory bits; the scopes of reuse it leverages go from training pipelines of deep learning to variants of Neural Networks in ensembles; and the benefits it generates extend from orders-of-magnitude faster search for a good smaller Convolutional Neural Network (CNN) to the elimination of all space cost in protecting the parameters of CNNs.
Bio: Hui Guan is a Ph.D. candidate in the Department of Electrical and Computer Engineering at North Carolina State University, working with Dr. Xipeng Shen and Dr. Hamid Krim. Her research lies at the intersection of Machine Learning and Programming Systems, with a focus on improving Machine Learning (e.g., speed, scalability, reliability) through innovations in algorithms and programming systems (e.g., compilers, runtimes), as well as leveraging Machine Learning to improve High-Performance Computing.
Faculty Host: Dr. Jason Pacheco
Tuesday, February 25, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Peng Qi, Ph.D. Candidate
Title: "Explainable and Efficient Knowledge Acquisition from Text"
Abstract: Human languages have served as the media for our knowledge over generations. With the rise of the digital world, making use of the knowledge that is encoded in text has become unprecedentedly important yet challenging. In recent years, the NLP community has made great progress towards operationalizing textual knowledge by building accurate systems that answer factoid questions. However, because they largely rely on matching local text patterns, these systems fall short in their ability to perform complex reasoning, which limits our effective use of textual knowledge. To address this problem, I will first talk about two distinct approaches to enable NLP systems to perform multi-step reasoning that is explainable to humans, through extracting facts from natural language and answering multi-step questions directly from text. I will then demonstrate that, beyond static question answering with factoids, the true informativeness of answers stems from communication. To this end, I will show how we lay the foundation for reasoning about latent information needs in conversations to effectively exchange information beyond providing factoid answers.
Bio: Peng Qi is a Computer Science PhD student at Stanford University. His research interests revolve around building natural language processing systems that better bridge between humans and the large amount of textual information we are engulfed in. He is excited about building scalable and explainable AI systems, and has worked on extracting knowledge representations from text, question answering involving complex reasoning, and multi-lingual NLP.
Faculty Host: Dr. Kobus Barnard
Tuesday, February 18, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Yue Duan, Ph.D.
Title: "Discerning Code Changes From a Security Perspective"
Abstract: Programs are not immutable. In fact, most programs are under constant change for security (e.g., vulnerability fixes) and non-security (e.g., new features) reasons. These code changes pose significant security challenges.
In this talk, I will present my approach, which combines static/dynamic program analysis with other techniques, including deep learning and virtual machine introspection (VMI), to understand code changes from a security perspective in the mobile and PC software domains, and to solve real-world security issues. First, Android packers, a set of code transformation techniques, are gaining increasing popularity among Android malware, rendering existing malware detection techniques obsolete. We propose DroidUnpack, a VMI-based Android packing analysis framework, to perform the first large-scale systematic study of Android packing techniques, and report some surprising findings. Second, Android third-party libraries (TPLs), which provide complementary functionality and ease app development, have become one of the major sources of Android security issues due to the pervasive outdatedness problem. Prior efforts have been made to understand and mitigate specific types of security issues in TPLs, but there exists no generic solution that resolves these issues and keeps libraries up to date. We propose LibBandAid, which automatically generates updates for TPLs in Android apps in a non-intrusive fashion, without the need for source code. Third, binary code differential analysis, a.k.a. binary diffing, is a fundamental analysis capability that aims to quantitatively measure the similarity between two given binaries and produce fine-grained block-level matching. It has enabled many critical security applications, including patch analysis and malware analysis. Existing binary diffing techniques suffer from low accuracy, poor scalability, or coarse granularity, or require extensive labeled training data to function.
I present a novel technique named DeepBinDiff, an unsupervised deep neural network based program-wide code representation learning technique for binary diffing. It relies on both the code semantic information as well as the program-wide control flow information to generate basic block embeddings, and further performs a K-hop greedy matching to find the optimal diffing results using the generated embeddings.
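The k-hop greedy matching step can be illustrated with a toy sketch: pair blocks across two binaries by embedding similarity, and only consider new candidate pairs within k hops (in each control-flow graph) of an already-matched pair. This is an illustrative sketch only; the function names, cosine scoring, and data structures are assumptions, not the DeepBinDiff implementation.

```python
# Toy sketch of k-hop greedy matching over basic-block embeddings
# (illustrative only; not the authors' DeepBinDiff code).
import math
from itertools import product

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.hypot(*u), math.hypot(*v)
    return dot / (nu * nv) if nu and nv else 0.0

def k_hop(cfg, node, k):
    """Blocks reachable from `node` within k hops in a CFG given as
    an adjacency dict {block: iterable of neighbor blocks}."""
    frontier, seen = {node}, {node}
    for _ in range(k):
        frontier = {m for n in frontier for m in cfg.get(n, ())} - seen
        seen |= frontier
    return seen - {node}

def greedy_match(emb1, emb2, cfg1, cfg2, k=2, threshold=0.5):
    """Greedily pair blocks of two binaries by embedding similarity,
    expanding candidates outward from matched pairs via k-hop neighbors."""
    matched, used1, used2 = [], set(), set()
    # Seed with the globally most similar pair of blocks.
    candidates = {max(product(emb1, emb2),
                      key=lambda p: cosine(emb1[p[0]], emb2[p[1]]))}
    while candidates:
        b1, b2 = max(candidates, key=lambda p: cosine(emb1[p[0]], emb2[p[1]]))
        candidates.discard((b1, b2))
        if b1 in used1 or b2 in used2 or cosine(emb1[b1], emb2[b2]) < threshold:
            continue
        matched.append((b1, b2))
        used1.add(b1); used2.add(b2)
        # Only k-hop neighborhoods of a matched pair become new candidates.
        for pair in product(k_hop(cfg1, b1, k), k_hop(cfg2, b2, k)):
            candidates.add(pair)
    return matched
```

Restricting candidates to k-hop neighborhoods is what lets the greedy search use program-wide control-flow structure instead of comparing all block pairs globally.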
Bio: Yue Duan is currently a Postdoctoral Researcher at Cornell University. He received his Ph.D. in Computer Science from UC Riverside in 2019. He earned his M.S. and B.S. from Syracuse University and Xi'an Jiaotong University, respectively. His research interests mainly lie in System Security, Mobile Security, Deep Learning, and Blockchain. His work has been extensively published in leading security conferences, including ACM CCS, NDSS, and RAID.
Faculty Host: Dr. Carlos Scheidegger
Tuesday, February 04, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Hong Hu, Ph.D.
Title: "Space Wars: Exploiting Program (in)Variants for Software Security"
Abstract: The ever-increasing code base of modern software inevitably introduces vulnerabilities which enable attackers to construct sophisticated exploits and compromise our computer systems. Control-flow hijacking is the state-of-the-art exploit method, where attackers aim to take over the execution of the vulnerable program. Accordingly, defenders strive to protect the control-flow integrity to mitigate attacks. As these protections gradually get deployed, it is getting harder for attackers to hijack the control-flow and they may switch to other exploit methods to achieve malicious goals. It is urgent for defenders to understand the remaining attack vectors and develop defenses in advance.
In this talk, I will present two works that explore the program data space to provide comprehensive protections and to find new, devastating attacks. First, I will demonstrate that program data space provides the auxiliary information necessary for achieving complete protection against control-flow attacks. Specifically, only with extra context information can we get the unique code target for indirect calls and jumps. Second, I will demonstrate that data-oriented attacks, which conform to all control-flow protections, are practical, expressive, and can be generated automatically. Attackers can systematically search the program data space to construct arbitrary, even Turing-complete, computations in real-world programs, like browsers. In the end, I will talk about my plan for extending data-oriented attacks to other platforms and languages, and potential directions for preventing this new type of attack.
Bio: Dr. Hong Hu is a research scientist in computer science at the Georgia Institute of Technology. His main research area is system and software security, focusing on exploring new attack vectors of memory errors and developing effective defense mechanisms. His work has appeared in top venues of system security, including IEEE S&P, USENIX Security, CCS, and NDSS. He received the Best Paper Award at CCS 2019 and ICECCS 2014. Dr. Hu obtained his Ph.D. degree from the National University of Singapore in 2016, and was a Postdoctoral Fellow at Georgia Tech from 2017 to 2019.
Faculty Host: Dr. Saumya Debray
Thursday, January 30, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Cheng Tan, Ph.D. Candidate
Title: "How to Audit Outsourced Services"
Abstract: How can users of a cloud service verify that the service truly performs as promised? This question is vital today because clouds are complicated black boxes, running in different administrative domains from users. Their correctness can be undermined by internal corruptions---misconfigurations, operational mistakes, insider attacks, unexpected failures, or adversarial control at any layer of the execution stack.
This talk will present verifiable infrastructure, a framework that lets users audit outsourced applications and services. I will introduce two systems, Orochi and Cobra, which verify the execution of, respectively, untrusted servers and black-box databases. Orochi and Cobra introduce several techniques, including deduplicated re-execution, consistent ordering verification, and GPU-accelerated pruning. Beyond these two systems, I will also discuss verifiable infrastructure more generally.
Bio: Cheng Tan is a computer science Ph.D. candidate in the Courant Institute at New York University. His interests are in operating systems, networked systems, and security. His work on the Efficient Server Audit Problem was awarded best paper at SOSP 2017. His work on data center network troubleshooting at Microsoft Research has been deployed globally in more than 30 data centers in Microsoft Azure.
Faculty Host: Dr. Michelle Strout
Tuesday, January 28, 2020 - 11:00am - Gould-Simpson (GS) 906
Speaker: Aravind Machiry, Ph.D. Candidate
Title: "Securing Modern Systems"
Abstract: Modern systems are mainly composed of IoT devices and smartphones. Most of these devices use ARM processors, which, along with flexible licensing, offer new security architecture features, such as ARM TrustZone, that enable the execution of secure applications in an untrusted environment. Furthermore, well-supported, extensible, open-source embedded operating systems like Android allow manufacturers to quickly customize their operating systems with device drivers, thus reducing time-to-market.
Unfortunately, the proliferation of device vendors and the race to market have resulted in poor-quality low-level system software containing critical security vulnerabilities. Furthermore, the patches for these vulnerabilities are merged into end products with significant delay, resulting in the Patch Gap, which puts the privacy and security of billions of users at risk.
In this talk, I will first show how the new architecture features in ARM processors can lead to security issues by introducing new attack vectors. Second, I will show that existing techniques are inadequate for finding these security issues, and how, with certain well-defined optimizations, we can find them precisely. Third, I will present my solution to the Patch Gap problem by showing a principled approach to automatically port patches to vendor product repositories. Finally, I will present my ongoing work to automatically port C to Checked C, which provides a low-overhead, backward-compatible, and memory-safe C alternative that could be used on modern systems to prevent security vulnerabilities.
Bio: Aravind Machiry is a Ph.D. candidate in Computer Science at the University of California, Santa Barbara. He is a recipient of various awards, such as the Symantec Research Labs Fellowship and the UCSB Graduate Division Dissertation Fellowship. His work spans various aspects of system security and program analysis. Specifically, he works on applying static/dynamic program analysis and fuzzing to solve various system security problems. He is also interested in identifying and improving the weaknesses of static program analysis. His research has resulted in various open-source security tools and several Common Vulnerabilities and Exposures (CVEs) in critical system software such as kernel drivers, Trusted Execution Environments, and boot loaders. His research has also been academically recognized with awards such as a Distinguished Paper Award, the Internet Defense Prize, and an invitation to present at the CSAW Applied Research Competition. Previously, Aravind received his Master's degree in Information Security from the Georgia Institute of Technology.
Faculty Host: Dr. Christian Collberg
Tuesday, December 10, 2019 - 11:00am - Gould-Simpson (GS) 906
Speaker: Babak Salimi
Title: "Soundness and Fairness in Data-Driven Decision Making"
Abstract: Scaling and democratizing access to big data offers the alluring prospect of providing meaningful, actionable information that supports decision-making. Today, data-driven decisions profoundly affect the course of our lives: whether to admit us to a particular school, offer us a job, grant us a mortgage, etc. Therefore, unfair, inconsistent, or faulty decision-making raises serious concerns about ethics and responsibility. This talk delves into two issues in decision-making systems: soundness and fairness. We show how causal inference techniques can help with both.
We will show that analytical SQL queries supported by mainstream business intelligence and analytics environments can lead to perplexing results and wrong business decisions. We demonstrate a system that brings together techniques from data management and causal inference to automatically rewrite analytical SQL queries into complex causal queries that support sound decision making.
We then show that sound decision making using causal inference is essential for reasoning about fairness and discrimination. We will see that existing popular notions of fairness in ML fail to distinguish between discriminatory, non-discriminatory, and spurious correlations between sensitive attributes and the outcomes of learning algorithms. We will discuss a new notion of fairness that subsumes and improves on several previous definitions and can correctly distinguish fairness violations from non-violations. We will then present an approach to removing discrimination by repairing the training data to remove the effect of any inappropriate and discriminatory causal relationship between the protected attribute and classifier predictions.
Bio: Babak Salimi is a postdoctoral research associate in Computer Science & Engineering at the University of Washington, Seattle, where he works with Dan Suciu and the Database Group. He received his Ph.D. from the School of Computer Science at Carleton University in Ottawa, Canada, and his M.Sc. in Computation Theory (2009) and B.Sc. in Computer Engineering (2006) from Sharif University of Technology and Azad University of Mashhad, respectively. Babak's research interests span data management, causal inference, decision-making systems, algorithmic fairness and responsible data science.
Faculty Host: Dr. Carlos Scheidegger
Thursday, December 05, 2019 - 11:00am - Gould-Simpson (GS) 906
Speaker: Ken Shirriff, Ph.D.
Title: "Restoring the Apollo Guidance Computer: Lessons from a 50-year-old system"
Abstract: The Apollo Guidance Computer (AGC) played a critical role in the Moon landings. One of the first computers to use integrated circuits, the compact AGC provided guidance, navigation, and control onboard the spacecraft. This talk explains how we repaired an AGC (including its ferrite core memory), got it running, and ran the original Moon landing software on it. I'll also discuss the AGC's innovations in software engineering, user interfaces, interpreters, real-time computing, and multi-tasking, along with its performance mining Bitcoins.
Bio: Ken Shirriff restores old computers, including a Xerox Alto and an IBM 1401 punch card computer. His blog (righto.com) discusses reverse engineering everything from chargers to microprocessors. He wrote the Arduino IRremote library, tried mining Bitcoin with pencil and paper, and added seven characters to Unicode. Ken was formerly a programmer at Google and holds a Ph.D. in computer science from UC Berkeley. He has published papers on operating systems and fractals and has received 20 patents.
Faculty Host: Dr. John Hartman
Tuesday, December 03, 2019 - 11:00am - Gould-Simpson (GS) 906
Speaker: Santiago Torres-Arias, Ph.D.
Title: "in-toto: Providing farm-to-table guarantees for bits and bytes"
Abstract: The software development process is quite complex and involves a number of independent actors. Developers check source code into a version control system, the code is compiled into software at a build farm, and CI/CD systems run multiple tests to ensure the software’s quality among a myriad of other operations. Finally, the software is packaged for distribution into a delivered product, to be consumed by end users. An attacker that is able to compromise any single step in the process can maliciously modify the software and harm any of the software’s users.
To address these issues, we designed in-toto, a framework that cryptographically ensures the integrity of the software supply chain. in-toto grants the end user the ability to verify the software's supply chain from the project's inception to its deployment. We demonstrate in-toto's effectiveness on 30 software supply chain compromises that affected hundreds of millions of users and showcase in-toto's usage in cloud-native, hybrid-cloud, and cloud-agnostic applications. in-toto is integrated into products and open source projects that are used by millions of people daily.
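The core chaining idea behind this kind of supply-chain verification can be sketched in a few lines: each step records cryptographic digests of its inputs ("materials") and outputs ("products"), and a verifier checks that each step consumed exactly what the previous step produced. This is a minimal illustrative sketch under those assumptions, not the real in-toto API or its signed-layout format.

```python
# Minimal sketch of a supply-chain chaining check (illustrative only;
# not the in-toto implementation, which also verifies signatures and
# a layout authorizing who may perform each step).
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest of a file's contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_chain(links):
    """links: list of {'materials': {name: digest}, 'products': {name: digest}}
    in pipeline order. Returns True iff every material of step i+1
    matches the corresponding product of step i."""
    for prev, curr in zip(links, links[1:]):
        for name, d in curr["materials"].items():
            if prev["products"].get(name) != d:
                return False  # tampering between steps detected
    return True
```

A real deployment additionally signs each link with the key of the party that performed the step, so an attacker cannot simply rewrite the recorded digests.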
Faculty Host: Dr. Christian Collberg
Tuesday, November 26, 2019 - 11:00am - Gould-Simpson (GS) 906
Speaker: Greg Bodwin
Title: "Sketching Graphs and Matrices"
Abstract: Modern algorithms commonly have to process and store enormous networks, which are usually represented as graphs or matrices. Our classic methods are often way too slow to be effective on these huge inputs. This leaves us with two options: replace our classic algorithms with faster ones, or replace our huge input networks with smaller ones. This talk is all about the second option. We will survey the "sketching method," a broad and active area of research in computer science and mathematics where one tries to approximate complicated networks with simple ones (or prove that it can't generally be done). We will overview the sketching method in graph theory, linear algebra, analysis, and algorithms research, and we will discuss some recent successful applications of the sketching method in practice.
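One classic, self-contained instance of the sketching method in graph theory is the greedy spanner: keep an edge only if the sketch built so far cannot already connect its endpoints within a small number of hops, yielding a sparser graph whose distances are approximately preserved. The sketch below is an illustrative example of the general idea, not code from the talk.

```python
# Greedy (2k-1)-spanner for an unweighted graph: the kept subgraph
# stretches any original distance by at most a factor of 2k-1.
from collections import deque

def bfs_dist(adj, s, t, limit):
    """Shortest s-t path length in adj, or None if it exceeds `limit`."""
    if s == t:
        return 0
    dist, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        if dist[u] == limit:
            continue  # don't expand beyond the hop budget
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == t:
                    return dist[v]
                q.append(v)
    return None

def greedy_spanner(nodes, edges, k):
    """Keep edge (u, v) only if the spanner so far has no u-v path
    of length <= 2k-1; the kept edges form a (2k-1)-spanner."""
    adj = {n: set() for n in nodes}
    kept = []
    for u, v in edges:
        if bfs_dist(adj, u, v, 2 * k - 1) is None:
            adj[u].add(v); adj[v].add(u)
            kept.append((u, v))
    return kept
```

For k = 2, a triangle loses its third edge: the two-hop path already approximates the dropped edge within the allowed stretch of 3.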
Bio: Greg is a researcher in theoretical computer science and combinatorics. His main academic interest is in how best to "sketch" mathematical objects like graphs, matrices, or metrics by finding smaller or simpler versions with roughly the same properties. Greg completed his PhD at MIT in 2018 and won the George M. Sprowls award for it (best MIT EECS thesis). He is currently a postdoc at Georgia Tech. Greg will be spending two weeks in our department, so stop by GS 721 to say hi.
Faculty Host: Dr. Stephen Kobourov
Thursday, October 17, 2019 - 11:00am - Gould-Simpson (GS) 906
Speakers: Kate Isaacs, Alex Bigelow, Katy Williams
Title: "Three Short Visualization Talks: Design Methodology, Network Data Wrangling, and ASCII-only Graph Drawing"
Speaker 1: Katy Williams, Ph.D. Candidate
Title: "Visualizing a Moving Target: A Design Study on Task Parallel Programs in the Presence of Evolving Data and Concerns"
Abstract: Common pitfalls in visualization projects include lack of data availability and the domain users' needs and focus changing too rapidly for the design process to complete. While it is often prudent to avoid such projects, we argue it can be beneficial to engage them in some cases as the visualization process can help refine data collection, solving a "chicken and egg" problem of having the data and tools to analyze it. We found this to be the case in the domain of task parallel computing where such data and tooling is an open area of research. Despite these hurdles, we conducted a design study. Through a tightly-coupled iterative design process, we built Atria, a multi-view execution graph visualization to support performance analysis. Atria simplifies the initial representation of the execution graph by aggregating nodes as related to their line of code. We deployed Atria on multiple platforms, some requiring design alteration. We describe how we adapted the design study methodology to the "moving target" of both the data and the domain experts' concerns and how this movement kept both the visualization and programming project healthy.
Speaker 2: Alex Bigelow, Ph.D. Candidate
Title: "Origraph: Interactive Network Wrangling"
Abstract: Networks are a natural way of thinking about many datasets. The data on which a network is based, however, is rarely collected in a form that suits the analysis process, making it necessary to create and reshape networks. Data wrangling is widely acknowledged to be a critical part of the data analysis pipeline, yet interactive network wrangling has received little attention in the visualization research community. In this talk, I discuss a set of operations that are important for wrangling network datasets and introduce a visual data wrangling tool, Origraph, that enables analysts to apply these operations to their datasets. Key operations include creating a network from source data such as tables, reshaping a network by introducing new node or edge classes, filtering nodes or edges, and deriving new node or edge attributes. Origraph enables analysts to execute these operations with little to no programming, and to immediately visualize the results. In addition, we introduce interfaces designed to aid analysts in specifying arguments for sensible network wrangling operations. We demonstrate the usefulness of Origraph through a use case: exploring the influence of money on political support for the war in Yemen.
Speaker 3: Kate Isaacs, Ph.D.
Title: "Preserving Command Line Workflow for a Package Management System using ASCII DAG Visualization"
Abstract: Package managers provide ease of access to applications by removing the time-consuming and sometimes completely prohibitive barrier of successfully building, installing, and maintaining the software for a system. Package management system developers, package maintainers, and users may consult the dependency graph of a package when a simple listing is insufficient for their analyses. However, users working in a remote command line environment must disrupt their workflow to visualize dependency graphs in graphical programs, possibly needing to move files between devices or incur forwarding lag. To preserve the command line workflow, we develop an interactive ASCII visualization for its dependency graphs. We evaluate the use of our visualization through a command line-centered study, comparing it to two existing approaches. We observe that despite the limitations of the ASCII representation, our visualization is preferred by participants when approached from a command line interface workflow.
Tuesday, August 27, 2019 - 11:00am - Gould-Simpson (GS) 906
Speaker: Michael Chertkov, Ph.D.
Title: "Gauges, Loops, and Polynomials for Partition Functions of Graphical Models"
Abstract: Graphical models (GMs) represent multivariate and generally unnormalized probability distributions. Computing the normalization factor, called the partition function (PF), is the main inference challenge relevant to multiple statistical and optimization applications. The problem is of exponential complexity with respect to the number of variables. In this talk, aimed at approximating the PF, we consider Multi-Graph Models (MGMs), where binary variables and multivariable factors are associated with edges and nodes, respectively, of an undirected multi-graph. We suggest a new methodology for analysis and computation that combines the Gauge Function (GF) technique with techniques from the field of real stable polynomials. We show that the GF, representing a singled-out term in a finite-sum expression for the PF which achieves its extremum at the so-called Belief Propagation (BP) gauge, has a natural polynomial representation in terms of gauges/variables associated with edges of the multi-graph. Moreover, the GF can be used to recover the PF through a sequence of transformations allowing appealing algebraic and graphical interpretations. Algebraically, one step in the sequence consists in applying a differential operator over gauges associated with an edge. Graphically, the sequence is interpreted as a repeated elimination/contraction of edges, resulting in MGMs on graphs of decreasing size (number of edges) with the same PF as the original MGM.
Even though the complexity of computing factors in the sequence of derived MGMs and respective GFs grows exponentially with the number of eliminated edges, the polynomials associated with the new factors remain bi-stable if the original factors have this property. Moreover, we show that BP estimations in the sequence do not decrease, each lower-bounding the PF.
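For concreteness, with binary variables on edges and factors on nodes as described in the abstract, the partition function of such an MGM can be written as (notation assumed for illustration, not taken from the talk):

```latex
Z \;=\; \sum_{\sigma \in \{0,1\}^{E}} \;\prod_{v \in V} f_v\bigl(\sigma_{\partial v}\bigr),
```

where $E$ and $V$ are the edges and nodes of the multi-graph, $\sigma_{\partial v}$ collects the binary variables on the edges incident to node $v$, and $f_v$ is the multivariable factor at $v$. The sum over all $2^{|E|}$ configurations is what makes exact computation exponential in the number of variables.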
Bio: Michael Chertkov is the Director of the Applied Mathematics program at the University of Arizona. He received a PhD in physics from the Weizmann Institute of Science in 1996 and spent three years at Princeton University as a R. H. Dicke Fellow in the Department of Physics. He joined Los Alamos National Lab in 1999, initially as a J. R. Oppenheimer Fellow in the Theoretical Division, where he led projects in the physics of algorithms, energy grid systems, physics- and engineering-informed data science, and machine learning for turbulence. He is a fellow of the American Physical Society (APS) and a senior member of IEEE.
Faculty Host: Dr. Stephen Kobourov