In this post, I’ll discuss some of the design issues I encountered, and the approaches and solutions I took, while designing and implementing my prototype Antivirus scanner. Specifically, this post looks at inter-process communication, client-server Antivirus architecture, and middleware such as web services.
In the beginning, my Antivirus scanner was a single standalone application. It took a command-line argument specifying which binary to scan. This is the simplest approach, but not the most efficient. During evaluation I discovered that when the signature database was of moderate size, loading it into main memory took a significant amount of time, and that cost was paid again on every invocation. In addition, simply loading the application into memory took longer than expected because of the large size of the binary and the number of libraries it linked against. On some Windows systems caching made program loading faster, but the database load time was a showstopper if I was going to scan more than a trivial number of binaries.
The solution I chose was to give the Antivirus scanner a client-server architecture. The scanner would start as a server and take as long as necessary to load the database, but only once. The client would be a very compact and simple application that submitted jobs to the server and received a response. At this point the issue of IPC and middleware reared its head: how do the client and server communicate?
Communication could have been done using purely local IPC, but I chose sockets. This let me create separate clients for Linux and Windows and submit jobs across the network. That was useful at the time because I wanted my malware collection stored on a fairly isolated Linux box to reduce the chance of unwanted malware infections.
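To make the split concrete, here is a minimal sketch of the idea in Python, not the scanner's actual code. The server loads a signature table once and then answers any number of scan requests over TCP; the thin client just connects, sends one request and reads one reply. The `SCAN <path>` / `RESULT <name>` line format, the `SIGNATURES` table and the function names are all hypothetical, and the path refers to a file on the server's machine, matching the setup where the malware collection lives on the isolated box.

```python
import socket
import socketserver

# Hypothetical signature table: byte pattern -> malware name.
# A real engine would load its database from disk here, once, at startup;
# that one-off cost is what the client-server split amortizes.
SIGNATURES = {b"EVIL": "Test.Evil.A"}


def scan_bytes(data: bytes) -> str:
    """Return the first matching signature name, or 'CLEAN'."""
    for pattern, name in SIGNATURES.items():
        if pattern in data:
            return name
    return "CLEAN"


class ScanHandler(socketserver.StreamRequestHandler):
    """Handles one connection: reads one request line, writes one reply line."""

    def handle(self) -> None:
        line = self.rfile.readline().decode("utf-8").rstrip("\r\n")
        command, _, path = line.partition(" ")
        if command != "SCAN" or not path:
            self.wfile.write(b"ERROR bad request\n")
            return
        try:
            with open(path, "rb") as f:
                result = scan_bytes(f.read())
        except OSError as exc:
            self.wfile.write(f"ERROR {exc}\n".encode("utf-8"))
            return
        self.wfile.write(f"RESULT {result}\n".encode("utf-8"))


def submit_job(host: str, port: int, path: str) -> str:
    """The thin client: connect, send one SCAN request, return the reply line."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"SCAN {path}\n".encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reply:
            return reply.readline().rstrip("\n")
```

The server side would be started with something like `socketserver.ThreadingTCPServer(("0.0.0.0", 9999), ScanHandler).serve_forever()` (the port is arbitrary). Because the handler opens any path it is given, a real deployment would need to restrict which directories a client may ask it to scan.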
Sockets are great, but how should the client-server model be implemented on top of them? One option is a custom protocol. Another is a middleware solution such as RPC, or even something like SOAP or CORBA. I chose to go with a custom protocol.
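The main job of any custom protocol over TCP is framing, because a stream socket delivers bytes, not messages. One common scheme, sketched here under the assumption that it resembles what a custom scanner protocol needs (the actual wire format isn't described above), is a fixed header carrying a message type and payload length. The type codes and names below are hypothetical.

```python
import struct

# Hypothetical wire format: 1-byte message type, 4-byte big-endian payload
# length, then the payload itself. An explicit length means the receiver never
# has to guess where a message ends, which matters when payloads are raw binaries.
HEADER = struct.Struct("!BI")
MSG_SCAN_REQUEST = 1
MSG_SCAN_RESULT = 2


def encode(msg_type: int, payload: bytes) -> bytes:
    """Prefix a payload with its type and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(buffer: bytes):
    """Split complete messages off the front of buffer.

    Returns (messages, remainder): messages is a list of (type, payload)
    tuples, and remainder holds any trailing partial message that must wait
    for more bytes from the socket.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

A receive loop would append each `recv()` chunk to the remainder and call `decode` again. Sending the binary itself as the payload, rather than a path, is what lets a client on another machine submit files it holds locally.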
How does the server display the results of each scan, you might ask? Originally I used a very ad hoc text-based output: very hacky, very simple. I had a set of shell scripts that generated statistics on scan results from this output. The problem with this naive approach is that every time I changed the result data even slightly, the scripts would break. My solution was to change the output to XML, and to rewrite those scripts in Python, which has reasonable XML processing capabilities.
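Here is a small sketch of why XML output makes the statistics scripts less brittle, using Python's standard `xml.etree.ElementTree`. The `scanResults`/`result` schema and its attributes are hypothetical, not the scanner's real format. The point is that the script looks fields up by name, so the extra `packer` attribute on the last entry, standing in for a later change to the result data, doesn't break anything.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical result schema; element and attribute names are illustrative.
SAMPLE = """\
<scanResults>
  <result file="a.exe" status="infected" signature="Test.Evil.A" timeMs="12"/>
  <result file="b.exe" status="clean" timeMs="8"/>
  <result file="c.exe" status="infected" signature="Test.Evil.A" timeMs="20"/>
  <result file="d.dll" status="clean" timeMs="4" packer="UPX"/>
</scanResults>
"""


def summarize(xml_text: str) -> dict:
    """Compute simple statistics over a batch of scan results."""
    root = ET.fromstring(xml_text)
    results = root.findall("result")
    statuses = Counter(r.get("status") for r in results)
    signatures = Counter(
        r.get("signature") for r in results if r.get("status") == "infected"
    )
    total_ms = sum(int(r.get("timeMs", "0")) for r in results)
    return {
        "total": len(results),
        "statuses": dict(statuses),
        "top_signatures": signatures.most_common(3),
        "mean_ms": total_ms / len(results) if results else 0.0,
    }
```

Unknown attributes and elements are simply ignored, whereas a text format parsed by position or by `cut`/`awk` columns tends to break as soon as a field is added or reordered.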
Now I’m getting to the point where I want to expose the Antivirus system to the internet. How do I go about this? Web access seems important. My first approach was a web interface written in PHP that connected to the server using the custom protocol I described earlier. The PHP also managed some SQL databases; there is more to an Antivirus system than just the engine, and the PHP handled that part. I wrote a fairly simple interface and design for this, and it works reasonably well.
I am currently ditching the PHP web interface and going with a Java desktop client. I had already written a Java desktop client for binary navigation and analysis that communicates with the Antivirus and analysis engine using my custom protocol. But because the protocol is custom, it’s hard to expose publicly through firewalls. I considered an HTTP tunnel using the CONNECT method to forward traffic from the client through the gateway to my Antivirus server, but I don’t really trust my own implementation of a middleware layer. These things are hard to get right, and vulnerabilities are common in this type of application. CORBA as a middleware layer is out of the question because it doesn’t traverse firewalls nicely. Likewise, the Windows middleware layer (WCF) is very .NET-centric, and I don’t want vendor lock-in at this point, since I don’t yet see Mono as a complete solution for my needs. Web services seem the way to go. The choices are SOAP and REST-based services, and I have chosen SOAP for the initial implementation, using Apache Axis2. I plan to reimplement the PHP web interface as a server-side, component-based architecture with a Java client on the desktop. The server-side components will communicate with the Antivirus scanner using the custom protocol described earlier.
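Part of what makes SOAP firewall-friendly is that a request is just an XML document sent as an ordinary HTTP POST. The sketch below builds and reads such a message with Python's standard library purely to show what goes over the wire; in the setup described above, Axis2 and the Java client would generate and consume these envelopes. The `scanBinary` operation, its child elements and the `urn:example:antivirus` namespace are hypothetical; only the SOAP 1.1 envelope namespace is standard.

```python
import base64
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:antivirus"  # hypothetical service namespace

ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("av", SVC_NS)


def build_scan_request(filename: str, data: bytes) -> bytes:
    """Wrap a binary in a SOAP 1.1 envelope for a hypothetical scanBinary operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}scanBinary")
    ET.SubElement(op, f"{{{SVC_NS}}}fileName").text = filename
    # Binary content is base64-encoded so it can travel as XML text.
    ET.SubElement(op, f"{{{SVC_NS}}}content").text = base64.b64encode(data).decode("ascii")
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def read_scan_request(message: bytes):
    """Server-side view: pull the file name and binary back out of the envelope."""
    root = ET.fromstring(message)
    op = root.find(f"{{{SOAP_ENV}}}Body/{{{SVC_NS}}}scanBinary")
    name = op.findtext(f"{{{SVC_NS}}}fileName")
    data = base64.b64decode(op.findtext(f"{{{SVC_NS}}}content"))
    return name, data
```

The cost of this portability is size: base64 inflates binaries by about a third, which is worth keeping in mind when submitting large samples (SOAP attachment mechanisms such as MTOM exist to avoid it).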
There is a whole slew of other questions to address, such as how tightly coupled unpacking, static analysis and the other components should be, and whether middleware should be used to integrate them. The short answer to the latter is yes, and I’m using Java and SOAP to make the system more component-based, but I won’t address these issues in this post. I think this post has reached its word limit.