DashO Java Obfuscator

Reviewed by: Tapasya Patki

Department of Computer Science
University of Arizona
September 10, 2008

Abstract

This website is an academic report on the tool DashO Pro, a commercial Java obfuscator, compactor, optimizer and watermarker developed by PreEmptive Solutions that prevents reverse engineering of the generated byte-code. DashO is a post-development recompilation system that can be used to secure the source code prior to the testing, integration and release phases in the software development life-cycle.

Introduction

The Java programming language is a general-purpose, object oriented language that offers important advantages in terms of platform-independence, better productivity and configurability. Software written in Java is distributed in the form of byte-code, which preserves the semantics involved in the original source code to a large extent. This makes Java code easy to reverse engineer, and hence susceptible to misuse. Open source Java decompilers like JAD and commercial decompilers like SourceAgain are available in the public domain. These tools facilitate reverse engineering and can extract logical and structural details from the distributed byte code.

In order to protect software from attackers, practices like obfuscation, tamperproofing and watermarking can be deployed [9]. Several commercial tools based on these techniques for securing software and protecting intellectual property are available. DashO is one such popular tool. J-shrink, JOBE and KlassMaster are some other widely used security tools. The DashO Java obfuscator was developed by PreEmptive Solutions. It uses control flow obfuscation and a patented overload-induction technique for renaming identifiers to obfuscate the software code. This methodology indirectly reduces the size of the code and, unlike other obfuscators, makes the code smaller and faster. Another patented mechanism, Transient Variable Caching, has been incorporated in recent versions of DashO.

Installation

DashO is an expensive, licensed tool. The latest version costs about $2000. However, a free 14-day trial version is available for evaluation purposes. This version, unlike the licensed version, requires internet connectivity. DashO can be used on various platforms including Windows and Linux.

A form has to be completed to request the evaluation version, and a copy is made available about a week after approval. After the initial authorization, a serial number is provided and the software can be downloaded from a password-protected link (the size is about 7 MB). Prior to use, the software has to be registered. Running the software after registration is trivial, as executable files are provided.

Features

The key features of DashO are obfuscation, optimization and watermarking. These are briefly highlighted below, and are later discussed in greater detail in the 'Internals' section.

Obfuscation

Obfuscation is a technique that makes reverse-engineering of intermediate code difficult. It adds confusion to the code. DashO uses control flow obfuscation and identifier renaming to complicate the byte code and make it incomprehensible. DashO also provides string encryption, with the encrypted strings decrypted at run time. Incremental obfuscation is also supported.

Compaction and Optimization

Generally, obfuscated code tends to be larger and slower. However, DashO has a unique obfuscation algorithm which optimizes the code, and makes it smaller and faster. DashO makes use of the Transient Variable Caching technique to further improve performance. These optimizations help to distribute the code more effectively across the Internet.

Watermarking

To trace licensed copies of the distributed software, DashO provides the watermarking feature. Copyright information as well as registration information can be embedded in the executable jar files using watermarking. The PreMark tool in DashO serves this purpose.

Usage

There are two modes for operating DashO. The Advanced (Entry Point) Mode works well for complex applications where fine grained control is desired. This mode supports compaction and pruning and allows for deployment of third party packages. The Quick Jar Mode, on the other hand, is ideal for simple applications where coarse grained control is sufficient. Third party packages are not supported and pruning is not required. In addition to these two modes, DashO also has a Command Line Mode that can be used to integrate DashO with build scripts. The command line mode also supports the watermarking tool.

A user-friendly GUI has been provided to facilitate the interactions. As shown in the screen-shots, there are four main navigation trees in the user interface. The Input navigation tree is used to specify the location of the class paths, entry points, libraries and other properties. The Obfuscation navigation tree provides various security options including control flow obfuscation, overload-induction renaming and string encryption. The Optimization navigation tree provides size reduction options. The PreMark tool is used for watermarking. The Output navigation tree allows us to specify the location where the obfuscated classes and generated mappings and reports would be stored. A wizard is also provided to help the user in the process.

Figure: DashO user interface (screenshot)

Generated Results

Files with the extension .dox are generated for project files. These project files contain information about how a given application is to be obfuscated. The dox-files are arranged in a typical name-value format.

Obfuscated Output (decompiled using javap)

The obfuscated outline of the program after identifier renaming and control flow obfuscation is shown in the figure. It can be seen that the renaming process takes away the context from the code, thus making it difficult to analyze. It can also be seen that DashO tries to give the same name to as many identifiers as possible.

Javap Output

Obfuscated and Encrypted Output (decompiled using SourceAgain)

The string encryption feature can be used to encrypt and protect sensitive data in the source code. The code snippet in the figure shows what the obfuscated, encrypted code looks like. DashO decrypts the strings on the fly at run time.

Report Generation

DashO provides a reporting mechanism as well. Typical reports generated have been illustrated in the figures. The project report summarizes basic information about the obfuscated classes and also mentions the time taken for this processing. The renaming report, on the other hand, indicates the mapping between the original and the obfuscated files.

Figure: Sample DashO report

Internals

DashO uses identifier renaming, control flow obfuscation, string encryption, watermarking and transient variable caching for securing the code. The algorithmic basics for these technologies have been discussed in the following sections.

Renaming Identifiers

DashO uses a patented method for renaming identifiers of a Java program optimally [3]. New names are assigned to the Java classes, fields and methods while maintaining the constraints imposed by the programming system. These names are meaningless, shorter in length, and are selected from an ordered list of plausible identifiers, such as 'a, b, c, ...' or '1, 2, 3, ...'.

We can define an identifier to be the name of a variable, method or class. Java variables are classified into two high-level types: local variables and fields. A field is a variable that represents the data in the class, while a local variable is limited to method scope. All identifiers, except for local variables, are stored in the object files (.class files) after compilation. Optimality is defined as the ability to rename many identifiers to the same string, continually reusing the shortest identifiers. This introduces redundancy, letting one memory location store the names of many identifiers, which makes the programs smaller and faster.
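The ordered list of short names mentioned above can be sketched in a few lines of Java. This is only an illustration: the class name ShortNames and the method nameAt are hypothetical, and the sequence a, b, ..., z, aa, ab, ... is one plausible ordering, not necessarily the one DashO uses.

```java
// Hypothetical helper (not part of DashO): yields the n-th entry of the
// ordered name list a, b, ..., z, aa, ab, ..., az, ba, ...
final class ShortNames {
    static String nameAt(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index;
        do {
            // Prepend the letter for the current least significant position.
            sb.insert(0, (char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the first few names and the point where two letters begin.
        for (int i : new int[] {0, 1, 2, 25, 26, 27}) {
            System.out.println(i + " -> " + nameAt(i));
        }
    }
}
```

Because the shortest names come first, restarting the sequence for every class or scope means that the same one-letter strings are reused over and over, which is the redundancy described above.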

Renaming Classes

Classes can be identified in Java with the help of the package name and the actual class name. A package groups together logically related classes, and is a separate entity prior to compilation. The package name becomes a part of the class name after compilation. Hence, a package can be treated along with the class name as a single unit, allowing us to disregard the organization aspects thus facilitating the renaming. Following this integration, the original class names can be replaced with new class names selected from an ordered list by a sequential traversal operation.

Renaming Fields

Fields can be uniquely identified with their name and class membership. It is important to be able to distinguish between the fields within a class. Since fields are not affected by inheritance hierarchies, the renaming process is trivial. For each class, the renaming starts at a predetermined value in the ordered list, for example at 'a'. For fields that belong to different classes, redundancy can be introduced easily. Local variables, as discussed earlier, are not renamed as they do not exist after compilation.

Renaming Methods (Overload Induction)

Methods can be distinguished on the basis of their signature, which includes the name and the list of arguments. Since overloading is supported in Java, the compiler treats methods with the same name but different signatures separately. Thus, methods within a class that have different lists of arguments can all be safely replaced with the same name, thus supporting maximal overloading. This is known as Overload Induction.
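The effect of Overload Induction can be illustrated with a small before-and-after pair. The classes below are hypothetical and written by hand; the second one shows the kind of result the technique aims for, not actual DashO output. Class renaming is left out so that both versions can be compiled side by side.

```java
// Hypothetical original class with three differently named methods.
class Account {
    private int balance;

    int getBalance() { return balance; }

    void deposit(int amount) { balance += amount; }

    void transfer(Account target, int amount) {
        balance -= amount;
        target.deposit(amount);
    }
}

// Hand-written illustration of the renamed result: all three methods
// are called "a" because their parameter lists differ, and the field
// can also be called "a" since fields and methods do not share a namespace.
class ObfuscatedAccount {
    private int a;

    int a() { return a; }

    void a(int b) { a += b; }

    void a(ObfuscatedAccount b, int c) {
        a -= c;
        b.a(c);
    }
}
```

Both classes behave identically, but a reader of the second one can no longer tell a getter from a deposit or a transfer by name alone.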

The most important aspect of method renaming is the handling of inheritance hierarchies. For a method to be successfully overridden, the names and the parameter lists should match exactly. Thus, renaming has to ensure a consistent name replacement mechanism across the inheritance hierarchy. To achieve this, a class inheritance hierarchy tree is built to determine the naming dependencies among the methods. Then all the leaf classes of the hierarchy trees are identified and method lists are generated for these by walking up the tree. These method lists include the inherited methods as well as the methods local to the class.

Once the method lists have been generated, the renaming process can begin. This is done in two phases. In the first phase, methods with access modifiers as default, protected or public are renamed. A dependency routine (shown in figure) is executed to obtain the mapping from original names to the new names. In the second phase, the private and the static methods of the class are renamed. Static methods are treated like private methods.
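The consistency requirement across an inheritance hierarchy can be sketched as follows. This is a deliberately simplified, top-down scheme written for illustration; it is not the dependency routine of the patent [3], it ignores interfaces and access modifiers, and the names HierarchyRenamer and rename are hypothetical. Method signatures are given as strings such as "area()" or "scale(int)".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch: overriding methods keep the name chosen for the
// ancestor, and new methods take the first short name that is still free
// for their parameter list among the methods visible in the class.
final class HierarchyRenamer {
    // topDown lists superclasses before subclasses; parentOf maps a class
    // to its superclass; declared maps a class to its declared signatures.
    // The result maps each class to (original signature -> new signature).
    static Map<String, Map<String, String>> rename(
            List<String> topDown,
            Map<String, String> parentOf,
            Map<String, List<String>> declared) {
        Map<String, Map<String, String>> visible = new LinkedHashMap<>();
        for (String cls : topDown) {
            Map<String, String> inherited =
                    visible.getOrDefault(parentOf.get(cls), Map.of());
            Map<String, String> mine = new LinkedHashMap<>(inherited);
            for (String sig : declared.getOrDefault(cls, List.of())) {
                if (mine.containsKey(sig)) {
                    continue; // an override: reuse the ancestor's new name
                }
                String params = sig.substring(sig.indexOf('('));
                int i = 0;
                String candidate;
                do {
                    // Only the 26 one-letter names are tried in this sketch.
                    candidate = (char) ('a' + i++) + params;
                } while (mine.containsValue(candidate));
                mine.put(sig, candidate);
            }
            visible.put(cls, mine);
        }
        return visible;
    }
}
```

For a class Shape declaring area() and scale(int) and a subclass Circle declaring area() and radius(), the sketch names both area() methods a(), names scale(int) a(int) through overloading, and gives radius() the next free name b().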

Renaming Interfaces

Interfaces in Java are generally used to support abstract definitions and multiple inheritance. The renaming of interfaces is treated in a manner similar to that of classes.

Dependency Routine

(Image Src: [3])

Control Flow Obfuscation

Obfuscation is a popular technique that is used to complicate the source code of a program so as to render reverse engineering attacks impractical. Obfuscation makes the code difficult to understand when it is de-compiled while maintaining the same functional behavior.

The software is deliberately transformed into an identically functioning but purposefully unreadable form. There are two broad categories of obfuscation- surface obfuscation and deep obfuscation. Surface obfuscation deals with complicating of the concrete syntax of the source code. Deep obfuscation, on the other hand, adds confusion to the actual structure of the program by changing its control flow or the data reference behavior. Various obfuscation methodologies like data obfuscation, layout obfuscation, design obfuscation etc. have been discussed in the literature.

Control flow obfuscation concentrates on obscuring the flow of control and the purpose of the variables in individual procedures. Opaque predicates or misleading constructs, such as an if statement whose condition always evaluates to true on program execution, may be introduced. Control Aggregation, Control Ordering and Control Computation are the fundamental methodologies that obfuscate the control flow of the program. These have been discussed in [8,9].
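A minimal example of an opaque predicate is given below. It is a generic textbook-style construction chosen for illustration (the class and method names are hypothetical), and it is not taken from DashO, which, as discussed next, follows a different approach.

```java
// Generic illustration of an opaque predicate; not DashO output.
final class OpaquePredicateDemo {
    // The original, straightforward method.
    static int original(int x) {
        return x + 1;
    }

    // Same behavior, but guarded by a condition that is always true:
    // x * (x + 1) is a product of two consecutive integers and is
    // therefore even. This also holds under int overflow, because
    // wrap-around modulo 2^32 preserves the lowest bit.
    static int obfuscated(int x) {
        if ((x * (x + 1)) % 2 == 0) {
            return x + 1;
        } else {
            return x * 31 - 7; // never executed, only there to mislead
        }
    }
}
```

A human reader or a simple static analysis has to prove the arithmetic fact before the second branch can be recognized as dead code.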

DashO uses deep and advanced control flow obfuscation to add confusion to the source code. Instead of adding misleading code constructs, DashO destroys the code patterns that the decompilers may use to obtain the original code. This involves the generation of spaghetti code, which produces valid executable behavior, but results in semantically non-determinant decompiled output.

String Encryption

A typical attack involves searching for specific strings or numbers in the software. For example, an attacker may look for the serial number or a software-generated error message in order to bypass a license or a registration check. To thwart such attempts, DashO provides string encryption with decryption at run time. This makes it possible to encrypt strings in sensitive parts of the application. The configuration rules are inclusion rules, as on-the-fly decryption is used. This means that strings are not encrypted unless the methods that use them are included.
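The general idea of storing strings in encrypted form and decrypting them on the fly can be sketched as follows. The XOR scheme, the key value and the names StringVault, encrypt and decrypt are assumptions made purely for illustration; they say nothing about the algorithm DashO actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: a trivial XOR scheme standing in for whatever
// cipher a real obfuscator would apply.
final class StringVault {
    private static final byte KEY = 0x5A; // hypothetical key

    // Would run at obfuscation time: the result replaces the literal.
    static String encrypt(String plain) {
        byte[] data = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < data.length; i++) {
            data[i] ^= KEY;
        }
        return Base64.getEncoder().encodeToString(data);
    }

    // Would run inside the application whenever the string is needed.
    static String decrypt(String stored) {
        byte[] data = Base64.getDecoder().decode(stored);
        for (int i = 0; i < data.length; i++) {
            data[i] ^= KEY;
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = encrypt("Invalid serial number");
        System.out.println(stored);          // what a string search would see
        System.out.println(decrypt(stored)); // what the program displays
    }
}
```

A search of the class files for the error message no longer succeeds, although anyone who locates and understands the decryption routine can still recover the strings.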

An interesting discovery

The encryption algorithm that is used by DashO has not been discussed in published literature. However, when the class files generated by DashO after obfuscation and string encryption were examined, it was observed that DashO adds a decryption routine to one of the classes in the software. This class is generally one that does not contain any encrypted information. The decompilation of this decryption routine failed when the JAD decompiler was used. However, when the web version of SourceAgain was used, the decryption method was successfully decompiled, though it was in an obfuscated form. This routine is shown in the figure.

Decrypt

Incremental Obfuscation

This feature of DashO facilitates iterative software development, which is of interest to enterprise development teams. Since DashO generates name mapping records during the obfuscation run, the obfuscated names can be reapplied and preserved in successive runs. This makes it possible to do a partial build with the expectation that the access points will get mapped to exactly the same names as in a prior build. This makes updating of previously released obfuscated software possible.
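The reuse of a name mapping across builds can be sketched as follows. The class IncrementalRenamer, the method rename and the generated names n0, n1, ... are hypothetical; the sketch only shows the principle of consulting an earlier mapping before inventing new names, not DashO's mapping file format or naming scheme.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of incremental renaming: identifiers seen in a previous run
// keep their obfuscated names, new identifiers get unused names.
final class IncrementalRenamer {
    static Map<String, String> rename(List<String> identifiers,
                                      Map<String, String> previous) {
        Map<String, String> result = new LinkedHashMap<>();
        Set<String> used = new HashSet<>(previous.values());
        int next = 0;
        for (String id : identifiers) {
            String name = previous.get(id);
            if (name == null) {
                // Skip names that an earlier build already handed out.
                do {
                    name = "n" + next++;
                } while (used.contains(name));
                used.add(name);
            }
            result.put(id, name);
        }
        return result;
    }
}
```

Feeding the result of one run into the next keeps every previously published access point stable, so that a patched class can be shipped without re-releasing the classes that call it.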

Watermarking

Watermarking refers to the embedding of a message into the software unobtrusively, without affecting its runtime behavior. This can be used to verify license information, and to track the unauthorized copies of the software back to the source. Customer identification or copyright information can be hidden in a watermark.

DashO provides the PreMark tool to add and read watermarks. This is a command line tool which does not require the DashO GUI to run. PreMark can be used to watermark any jar file, even if it has not been obfuscated by DashO. The usage is shown in the figure. DashO also provides a GUI-based version of PreMark.

PreMark

Note: The PreMark tool failed to load in the evaluation version of DashO and resulted in a registration error (even after re-registering a number of times).

Transient Variable Caching

Serialization refers to the process of converting the current state of an object to a form that can be used for storing it in a file or transmitting it over a network. A transient variable is a variable that cannot be serialized. Transient variables are generally sensitive variables, like passwords, which we do not want to save as a part of the persistent state of the object. They may also be temporary variables whose storage may be unimportant.
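The behavior of a transient field under serialization can be demonstrated with the standard Java serialization classes. The class Credentials and its fields are hypothetical examples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class with one ordinary and one transient field.
final class Credentials implements Serializable {
    private static final long serialVersionUID = 1L;

    String user;
    transient String password; // excluded from the serialized form

    Credentials(String user, String password) {
        this.user = user;
        this.password = password;
    }

    // Serializes the object into a byte array and reads it back.
    static Credentials roundTrip(Credentials in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream src = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Credentials) src.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After the round trip the user field is restored, while the transient password field comes back as null because it was never written to the stream.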

On register-based machines, the transient variables can be cached in the available registers, thus saving the overhead of memory access. Removing transient variables in this way makes programs faster on register-based architectures. Java programs, on the other hand, rely on a stack-based architecture. If Java's stack-based code could be forced to work like a register-based system for transient variables, then a substantial performance improvement could be achieved. Tyma [10] discusses the Transient Variable Caching (TVC) algorithm to achieve this. Through aggressive instruction reordering of the byte code and with the help of peep-hole optimization, the effect of variable caching can be created on the Java run-time stack. This makes the program smaller and faster. A side-effect of the mechanism is that the instruction reordering makes decompilation of byte code difficult. Thus, both code obfuscation and optimization can be achieved.

The TVC algorithm strives to remove definition-use sequences (which may or may not be adjacent) from the intermediate code. The intermediate code is divided into basic blocks, and a Directed Acyclic Graph (DAG) is created to represent the instruction sequences at each level. As long as the dependency relationships in the graph are maintained and the stack integrity is preserved (i.e. the transformed code leaves the stack in the same state in which it found it), these definition-use sequences can be removed. In the byte code, these sequences occur within store-load instruction blocks. Hence, the algorithm begins by scanning through each block of code to gather the store-load sets. After this, liveness testing on the variables associated with these instructions is done. The next step is to remove the simple, adjacent pairs. The main part of the algorithm deals with how violating pairs can be removed. Violating pairs are pairs that cannot be removed directly as they may result in stack imbalance or dependency alterations. To resolve such pairs, aggressive instruction reordering is carried out. This revolves around the Push Migration algorithm, in which a push (or load) instruction is moved to an earlier location in the block. Commutative properties of operations need to be maintained.
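The simplest step of the algorithm, the removal of adjacent store-load pairs, can be sketched on a textual instruction list. This is a rough illustration under strong assumptions and not the algorithm of [10]: instructions are plain strings, only istore and iload are modelled, liveness is checked only inside the block, the temporary is assumed to be dead at the end of the block, and violating pairs and Push Migration are not handled at all. The names StoreLoadPeephole and removeAdjacentPairs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Peephole sketch: "istore n" immediately followed by "iload n" is
// dropped when n is not read again later in the same basic block,
// so the value simply stays on the operand stack.
final class StoreLoadPeephole {
    static List<String> removeAdjacentPairs(List<String> block) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < block.size(); i++) {
            String ins = block.get(i);
            if (ins.startsWith("istore ") && i + 1 < block.size()) {
                String var = ins.substring("istore ".length());
                boolean loadFollows = block.get(i + 1).equals("iload " + var);
                if (loadFollows && !readLater(block, i + 2, var)) {
                    i++; // skip the store and the load that follows it
                    continue;
                }
            }
            out.add(ins);
        }
        return out;
    }

    // Crude liveness test restricted to the rest of the block.
    private static boolean readLater(List<String> block, int from, String var) {
        for (int j = from; j < block.size(); j++) {
            if (block.get(j).equals("iload " + var)) {
                return true;
            }
            if (block.get(j).equals("istore " + var)) {
                return false; // overwritten before any further read
            }
        }
        return false; // assumption: the temporary is dead at block exit
    }
}
```

For the block iload 1, iload 2, iadd, istore 3, iload 3, iload 4, imul, istore 5, the pair istore 3 / iload 3 is removed, so the sum stays on the stack and is consumed directly by imul.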

TVC Example

The figure shows an example of how TVC re-orders instructions. Here, the stack acts as a cache for the temporary variable, thus reducing the extra memory overhead.

Empirical results show that with TVC, a significant reduction in code is obtained when compared to traditional pruning and grafting methods. Also, TVC provides a modest performance increase.

Concerns with Dynamic Loading and Reflection

Java has some powerful features like reflection and dynamic class loading. Dynamic class loading is a utility that lets the Java platform install components at run time. It provides support for important concepts like lazy loading, type-safe linkage and user-defined extensibility. Reflection is another useful idea that lets the program 'reflect' or 'introspect' at runtime. This also equips the program with the ability to modify or manipulate itself. It is important from the point of view of extensibility and debugging. Reflection generally comes with a significant performance overhead.

DashO supports dynamic loading by excluding all the potential dynamically loadable classes from its obfuscation and encryption processes. This requires a careful manual configuration of the system. There are situations when dynamic loading is predictable. In such situations, it is expected that DashO would be provided with complete information about the classes which may be loaded at run-time. However, there may be user-customizable applications where dynamic loading is unpredictable. In such situations, it is very important to configure DashO correctly, as the obfuscation process is irreversible and may result in code which does not function as expected.

DashO follows a conservative approach by unconditionally including all classes that are viewed via reflection. DashO cannot always identify the class which makes the reflection call. Therefore, it is the responsibility of the developer to explicitly provide this information to DashO. There may also be situations where the conservative approach fails. DashO generates reports and indicates the irresolvable situations encountered in both these cases.
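The reason why dynamically loaded and reflectively accessed classes need this special treatment can be shown with a small example. The classes ReportPlugin and ReflectiveLoader are hypothetical. The lookup works only as long as the string passed to Class.forName matches the name the class actually carries in the distributed code.

```java
// Hypothetical class that the application loads by name at run time.
final class ReportPlugin {
    public String describe() {
        return "report";
    }
}

final class ReflectiveLoader {
    // Returns true if a class with the given name can be found.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The name is an ordinary string, for example read from a
        // configuration file, so a renaming tool cannot see that it
        // refers to ReportPlugin.
        String configured = ReportPlugin.class.getName();
        System.out.println(canLoad(configured));
    }
}
```

If an obfuscator renamed ReportPlugin while the configured string kept the original name, the lookup would end in a ClassNotFoundException at run time, which is why such classes have to be excluded from renaming or declared explicitly.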

Evaluation

There is no standard mechanism for analyzing the performance of the available commercial obfuscators. Karnick et al. [6] discuss a qualitative analysis mechanism for Java obfuscators. The overall quality (S_quality) of an obfuscator is determined on the basis of three factors: potency (S_pot), resilience (S_res) and cost (S_cost). Potency refers to the amount of obscurity that has been added to the source code to render it incomprehensible. This depends on the confusion introduced in nesting, control flow, variables and length of the program. Resilience is a measure of how strongly the obfuscated source code can resist an automated attack by a de-obfuscator or de-compiler. It is difficult to measure resilience since no commercial de-obfuscators exist in the industry. Hence, the paper bases its analysis on available de-compilers like JAD, DJ Java and Cavaj. The cost of obfuscation refers to the computational overhead involved in transforming the source code. This can be calculated based on the resource utilization factors of memory, storage space and the application's run time.

Typically, the overall quality of an obfuscator can be rated as:

S_quality = 0.4 S_pot + 0.6 S_res - S_cost

Based on these factors, Karnick et al. [6] conclude that DashO is more of an optimizer than an obfuscator, since it has the lowest scores for potency (refer to the figure) and resilience, and a good score for cost. DashO has been compared with peer obfuscators like KlassMaster, Smokescreen and Allatori. This is a recent evaluation conducted in 2006.
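The weighted quality score given above is simple enough to be written as a one-line function. The class and method names are hypothetical, and the input values in the usage example are made up to show the arithmetic; they are not measurements from [6].

```java
// Direct transcription of S_quality = 0.4 S_pot + 0.6 S_res - S_cost.
final class ObfuscatorQuality {
    static double quality(double potency, double resilience, double cost) {
        return 0.4 * potency + 0.6 * resilience - cost;
    }

    public static void main(String[] args) {
        // Made-up scores: 0.4 * 0.5 + 0.6 * 0.5 - 0.1
        System.out.println(quality(0.5, 0.5, 0.1));
    }
}
```

The weights show that resilience counts for more than potency, and that cost is subtracted at full weight, so an expensive transformation needs correspondingly high potency and resilience to score well.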

Potency

(Image Src: [6])

In another report from 2002, Hamilton et al. [5] mention that though DashO proudly proclaims their use of a renaming system, it does not seem to do any intelligent obfuscation. A technical report by Hongying Lai [11] at the University of Auckland reaches a conclusion that DashO is an average obfuscator. While KlassMaster scores a 9 on a 10-point metric (based on some benchmark tests) described in the report, DashO scores a 5, and other obfuscators like JShrink score a mere 2.

Summary

DashO is a commercial Java obfuscating, optimizing and watermarking tool. The features and usage of DashO were discussed in this paper. The internal algorithms for identifier renaming and transient variable caching were also reviewed. Though there are no good measures for evaluating obfuscators, an attempt to analyze the capabilities of DashO was made. It was observed that DashO was good at optimizing code; however, its obfuscation mechanisms were not very strong when compared to those of its peer obfuscators.

References

[1] DashO Download Link
[2] DashO: Making Java Code Smaller, Faster, Better, Technical White Paper. Retrieved on Sept 6, 2008 from http://www.kessler.de/prd/preemptive/dasho_whitepaper.pdf
[3] Method for Renaming Identifiers of a Computer Program, United States Patent Number 6,102,966, Aug 15, 2000. Retrieved on Sept 3, 2008 from http://www.freepatentsonline.com/6102966.html
[4] FAQ DashO, http://www.preemptive.com/dasho-faq.html and http://www.soleacom.com/UK/Products/FAQDasho.htm
[5] Hamilton J.A., Chatham W., Eoff B., Imsand E., and Sachitano A., Security Issues Resulting from Interoperability. Retrieved on Sept 6, 2008 from http://www.eng.auburn.edu/users/hamilton/security/spawar/9_Security_Issues_Resulting_from_Interoperability.pdf
[6] Karnick M., MacBride J., McGinnis S., Tang Y., and Ramchandran R., A Qualitative Analysis of Java Obfuscation, Proceedings of the 10th International Conference on Software Engineering and Applications, Nov. 13-15, 2006, USA. Retrieved on Sept 6, 2008 from http://users.rowan.edu/~ravi/conference/conf_2006_06.pdf
[7] Software Code Protection by Obscuring its Data Driven Form, http://www.wipo.int/pctdb/en/wo.jsp?IA=CA2000000943&wo=2001014953&DISPLAY=DESC
[8] Collberg C., Thomborson C., and Low D. (1997). A Taxonomy of Obfuscating Transformations. Retrieved on Sept 6, 2008 from http://www.cs.arizona.edu/~collberg/Research/Publications/CollbergThomborsonLow97a/
[9] Collberg C., and Thomborson C. (2002). Watermarking, Tamper-proofing and Obfuscation: Tools for Software Protection. IEEE Transactions on Software Engineering, Vol. 28, No. 8, pp. 735-746
[10] Tyma, P. (1999). Transient variable caching in Java's stack-based intermediate representation, Scientific Programming, Vol. 7, Iss. 2, pp. 157-166
[11] Hongying Lai (2001). A comparative survey of Java obfuscators available on the Internet, Technical report submitted to the University of Auckland. Retrieved on Sept 8, 2008 from http://www.cs.auckland.ac.nz/~cthombor/Students/hlai/hongying.pdf