Sampled Security Policies

There’s this vague question I’ve been tossing around in my head for a couple of days - what is the class of security policies that one can enforce by sampling the execution of a program? Note that this is not the same as inlining reference monitors.

First of all, we need to precisely define “sampling”. How much state of the program do we see? The entire heap? The PC? The stack? The current value of all registers? How do the properties covered improve or degrade as we increase or decrease the state covered?

Then — what is the granularity of sampling? Instructions? Basic blocks? Random points in the program?
Also, sampling can not enforce a property with certainty, but with some probability. How can we make this probability high?

Like I said, this is just a fuzzy notion, and I have no concrete answers. For all I know, it might turn out that the class of properties enforceable through sampling is very small, or very weak. But it would be nice to know anyway.

Riya and privacy

Srijith commented on my earlier post about trying out Riya, and pointed out that people had been raising privacy concerns about the product.

Ben says: what if Riya recognizes and tags pictures of me in public that other people have taken, that I don’t necessarily want to be associated with? He uses the example of visiting an erotica event.

To which Munjal, one of Riya’s founders, replies:

Riya doesn’t look for you or tag you in another’s photo unless they train us and have you in their friends list. Even if they do, it is not public unless they make it public.

I think that’s a fair enough reply.

I am not a  lawyer, but my perspective on this: when you’re in public (on the street, at an erotica expo etc) you don’t really have any reasonable expectation of privacy, so you can’t say “my privacy was violated” if someone recognizes you.

My only concern about Riya doesn’t even have anything to do with facial recognition — it’s just that (to me, at least) my photos are rather personal things, and I do feel a little queasy about uploading all of them to a server that I don’t control. But this is pattern that comes up again and again, with any hosted service — Gmail, online calendaring, online file storage etc. If only there was a way for me to encrypt everything with a key that I own, and then upload everything, that would be great. But in general, this is a hard problem — how is the remote service going to work on your data if its encrypted?

My dissertation

I finally submitted my dissertation to the UC Irvine library on Friday. After my final defense, this was the last hurdle to jump, and I have now officially completed my doctorate.

Here’s my dissertation. (PDF, 3.5 MB)

page1.jpg

To cut a long story short, here is my thesis:

Remote attestation, one of the core mechanisms of Trusted Computing, can be
performed in a way that:

  • reasons expressively about program behavior and dynamic properties
  • enables a flexible, graded notion of trust
  • avoids intractable management problems at both the client and server end
  • does not tie attestation to a specific executable binary

In short, remote attestation can attest program properties, rather than program
binaries. I call this semantic remote attestation.

RFIDs and command injection attacks

There’s two phrases you never thought would go together! What do RFIDs, those tiny, powerless, passive things, have to do with havoc-wreaking command injection attacks?

Andrew Tannenbaum’s group has just published a paper that shows how seemingly innocent RFIDs can be used to inject malicious code into the backends that process their data:

One day a container arrives in the supermarket distribution center that is carrying a surprising payload. The container’s RFID tag is infected with a computer virus. This particular RFID virus uses SQL injection to attack the backend RFID middleware systems.

There you go — another channel for tainted input.
This so-called “taint problem” comes up everywhere. Most recently, its been brought to light because of its implications for web application security, but if you dig deeper, its a much more fundamental problem that comes up whenever your program deals with untrusted input.

I think the long-term solution to this is to have fine-grained labeling of data, along with a way to propagate those labels. We’ve made an initial jab at the problem, but that’s only scratching the surface. How long before labeling becomes a first-class abstraction in a language? In a virtual machine runtime? Or even at the architecture level in hardware?

Talking about semantic remote attestation

Problem with regular remote attestation

I gave a talk on semantic remote attestation in the language-based security seminar class today. Here are the slides for the talk.

There were many questions from the audience.

One of the questions — “Isn’t all this just about DRM?”

Unfortunately, while DRM is one of the possible applications of having a TPM in your machine, that is not the only thing about Trusted Computing. By itself, the TPM is a completely passive device. All it does is measure software, and then reliably and accurately report those measurements. It is policy-neutral. Policy is what is done in higher levels of software, like the OS, or a media player, based on the measurements reported by the TPM.

The central problem, in my opinion, is to have a trusted entity to vouch for software that is outside software, since software cannot vouch for itself.

There was also much discussion about the attack model. The TPM’s attack model only aims to protect against software attacks — with physical attacks all bets are off.

An interesting point, raised by Christian Stork: why can’t I load my own root keys into the TPM (as opposed to the manufacturer-certified keys that are embedded in every TPM), self-certify them, and then use them within my own PGP-like web of trust? Honestly, I don’t know. Maybe because the TPM designers think that web-of-trust models won’t work with the mass-deployment scenario they have in mind.

There was aso a lot of discussion about security policies. This is a recurring theme in almost all security research. The techniques are, relatively speaking, easy. The hard part is to specify a policy.

Oh — and going off topic — since I hand-wrote these slides on my Tablet PC, people are always surprised by the novelty of seeing hand-written slides on a PowerPoint presentation. Like my officemate, Andreas Gal, said: “We started with handwritten transparencies, then PowerPoint decks, and now we’re back to handwriting”. I said, “Yeah — but this time, it’s been done right!”

Techniques for remotely stored, yet private, indices

In the wake of the debate over storing all our documents on Google’s servers to enable their “search across computers” feature, I went around digging for cryptographic techniques that enable indexing and searching data on a remote server, without the server learning anything about the documents or the queries.

Here are some links:

A comprehensive survey of secure indexing techniques.

A secure index is a data structure that allows a querier with a “trapdoor” for a word x
to test in O(1) time only if the index contains x; The index reveals no information about its
contents without valid trapdoors, and trapdoors can only be generated with a secret key. Secure
indexes allow a querier to check if a document contains a keyword without having to decrypt the
entire document

Searching Encrypted Data, by Song, Wagner and Perrig

In this paper, we describe our cryptographic schemes
for the problem of searching on encrypted data and provide
proofs of security for the resulting crypto systems. Our
techniques have a number of crucial advantages. They are
provably secure: they provide provable secrecy for encryption,
in the sense that the untrusted server cannot learn
anything about the plaintext when only given the ciphertext;
they provide query isolation for searches, meaning
that the untrusted server cannot learn anything more about
the plaintext than the search result; they provide controlled
searching, so that the untrusted server cannot search for an
arbitrary word without the user’s authorization; they also
support hidden queries, so that the user may ask the untrusted
server to search for a secret word without revealing
the word to the server.

So what’s holding back the application of these techniques in the real world? The scenarios for their use are extremely compelling, more today than ever.

Thanks to our resident database and security expert, Einar Mykletun, for the pointers.

Higher-order injection attacks on web applications

A higher-order injection attack happens when an attackers stores a malicious string into persistent storage, using a context where it is not considered malicious, to be executed at a later time in a different context where the string can actually do damage.

Borrowing an example from F. Valeur, D. Mutz, and G. Vigna: consider a script that deletes old users who haven’t logged in for a while:
delete_old.jpg

Here the username string is where the malicious payload is stored. Consider what happens if the username is ‘OR ‘1′=’1. When the username is read into this script, the last line has a WHERE clause that evaluates to true, and thus deletes all users.

As pointed out by the CSSE (context-sensitive string evaluation) paper, such higher-order attacks are particularly hard to guard against. Data coming from persistent storage such as a database is considered trusted to start with, and rarely checked. Runtime approaches that detect injection attacks by adding metadata to strings (such as CSSE, or our taint propagation approach) would actually need to also make their metadata persistent, and then restore it when persistent data is read back into memory. In implementation terms this is much harder because it touches parts other than the VM or interpreter — such as the database, or filesystem. Also, if metadata is written out to persistent storage, it might interfere with the operation of other programs that work off the same data.

Peer-to-peer and Trusted Computing

Here’s another example of how Trusted Computing can be used to provide application-level security guarantees.
Zhang, Chen and Sandhu investigate how Trusted Computing can be used to improve the integrity and authenticity properties of a distributed P2P network.

…we focus on the specific problems of data authenticity and integrity instead of discussing P2P security in general. We propose a general architecture that enhances the authenticity and integrity of data shared in these systems by using trusted computing (TC) technologies… Specifically, we propose a trusted reference monitor (TRM) in the platform of each peer beyond necessary trusted hardware and supporting functions. A TRM can monitor and verify the information a peer provides to ensure data authenticity.

They present an abstract architecture where every peer is endowed with a trusted reference monitor (TRM) that can gather and then transmit configuration data about its node to other peers (in the paper, they use Windows registry keys as an example). The TRM also acts as a mediator for access control decisions.

The basic trusted component is the trusted hardware including a TPM. A TRM is an application or service component running in the operating system’s user space, enforcing access control policies in general client-side platforms. The hardware, cooperating with the security kernel, provides necessary functions to the TRM, from basic cryptographic functions to platform and program attestation, and sealed storage for sensitive data.

p2ptrm.jpg
(diagram taken from their paper)

Currently, most P2P systems are ridiculously easy to game. There are simply no checks on what a node says, and it could be saying anything at all. This is precisely the reason that we picked Gnutella as one of the example applications in our original paper on semantic remote attestation. As they point out in their paper, semantic remote attestation is one of the underlying techniques that could be used as a concrete implementation of their trusted reference monitor.
Complete citation for the paper:

Zhang, X.; Chen, S.; Ravi Sandhu, “Enhancing data authenticity and integrity in P2P systems,” Internet Computing, IEEE , vol.9, no.6pp. 42- 49, Nov.-Dec. 2005

Talk on web application security

First slide of the talk

I’m going to give a talk about web application security in a seminar class held by my advisor Prof. Michael Franz later this afternoon. This blog post is supposed to be the accompanying “see here for more” link for the talk. Here are a few resources and pointers to go look at if you want to dive deeper into some of the topics I’m going to talk about.

The OWASP page is a great resource for web app security in general. It’s the home of the top ten web vulnerabilities, as well as WebGoat and WebScarab.

I maintain a list of research papers on the topic of web application security, with a strong tilt towards beating command injection attacks. There’s also a related doodle of the various proposed solutions. This area has gotten a lot of attention from CS researchers lately.

Here at UCIrvine, we’ve done some work on hardening the JVM against attacks on web applications. I presented a paper on this at the last ACSAC. Here’s the paper (Taint Propagation for Java - PDF), and here are the slides for that talk (PDF).

Finally, here are the slides of the talk. (PDF)

Various solutions to web application security

Was doodling around on my Tablet, and drew a map of the space of solutions for web application security. (Also, see my list of research papers on the topic).

Various solutions for web application security

Why Owner Override is not a solution for the problems of Remote Attestation

Seth Schoen has proposed owner override as a mechanism to mitigate some of the treacherous aspects of remote attestation:

TCG should empower computer owners to override attestations deliberately to defeat policies of which they disapprove. Giving the owner this choice preserves an essential part of the status quo: third parties can never know for sure what’s running on your PC. TCG already defines a platform owner concept. The TCG specification also should provide for a facility by which the platform owner, when physically present, can force the TPM chip to generate an attestation as if the Platform Configuration Registers (PCRs) contained values of the owner’s choice instead of their actual values.

Stefan Bechtold has a critique of owner override. Besides that, I believe owner override suffers from another fundamental shortcoming.

The reason trusted hardware (in the form of a TPM, in this case) is needed is that software cannot vouch for its own integrity. Its easy to simply go one level of abstraction below any piece of software, subvert it at the lower level, and make it believe that everything is OK. This is the fundamental principle that rootkits use. Secret keys cannot be stored in software because other software can easily snoop it. So we need something outside the software stack — trusted hardware — that can reliably and accurately measure software integrity, and then also securely report it (to either the local user, or remote entities).

The hardware must be designed in such a way that its measurements and reports cannot be forged and tampered with. That is the central guarantee that trusted hardware buys us. And if the user were able to “forge” PCR entries in the TPM, that would simply invalidate the entire design of the TPM.

Besides, owner override will probably be performed on behalf of the owner by a piece of software. What if the owner-override software is taken over by malware? The possibility certainly exists. Then we are back to square one, and we might as well have had systems without a TPM in the first place.

Owner-override also has implications for sealed storage. The sealed storage functionality in the TPM binds encrypted data to a particular PCR value (this has its own problems and people have been investigating alternatives). The idea is that deviations from that PCR value indicate that your system has been compromised, and so the data will not be exposed in the clear. If I could put in my own PCR values, sealed storage would be useless.

In effect, owner override completely demolishes the very technical end for which trusted hardware was put into systems in the first place.

I don’t think that we need to throw the baby out with the bath water. I agree with Seth when he says that remote attestation has a number of flaws (I’ve pointed out a few in our paper). However, I also think that Trusted Computing can significantly help with security in general, and rather than severely crippling it, we should look for technical means to overcome its shortcomings. There are a number of proposals out there that recognize the problems with remote attestation, and present alternatives:

Philip Wadler’s blog.

Trusted Computing, P2P networks and (semantic) remote attestation

Quoting from a paper titled “Remote Attestation and Peer-to-peer Networks” by Ville Likitalo:

Considering remote attestation in a peer-to-peer network
context, we conclude that semantic attestation offers several
advantages over the other models
, provided that a trusted
platform is available. However, a combination of crypto-
graphic attestation and proof-carrying would be a very suit-
able solution for a mobile operator. A reputation based
model does not require a trusted platform if a trusted 3rd
party is available and therefore is the only viable alternative
for current solutions so far.

(emphasis mine)

Feb 3 Update: Link to paper was broken - fixed.

Semantic Remote Attestation: Open Questions…

Stefan Bechtold talks about our paper on semantic remote attestation (local copy - slides of my USENIX VM talk) over on this TC Blog. In short, we think that remote attestation as it is now is fundamentally broken (see the paper for a long explanation of why), and semantic remote attestation tries to fix all that and attest program behavior rather than program binaries.

Stefan raises a couple of questions about our technique:

However, it seems to me that the proposal suffers from at least two limitations: First, although the authors give some examples of what kind of behavior their trusted virtual machine could analyze, it is an open question whether this mechanism can provide all the different kinds of information a remote challenger needs when he does a remote attestation. More work and explanation is probably needed in this area. Secondly, and more importantly, this proposal only works with applications written in Java and similar languages - as the language-based trusted virtual machine needs the high-level information that is included in Java bytecode.

(emphasis mine)

To address the first point: I think right now there is no general consensus on what kinds of information a remote challenger needs. There are some probable scenarios though: DRM comes to mind (and not just for corporate interests, for regular people too), as well as high-level application-specific properties that vary from application to application and are hard to generalize. In the case of file-sharing applications, one might want to enforce polices of the form “share file X only within group G”. Currently, we simply trust a client to respect such policies, with no way to either enforce or check them ourselves. For example, it turned out that a popular BitTorrent client was not properly respecting the “private” flag on torrents. Another related property is that of controlling information flow within a program. This comes up whenever a program manipulates sensitive information (financial records, confidential documents) on our behalf. We are currently working on ways to enhance the JVM with dynamic information-flow control capabilities, with the final goal of being able to remotely attest such properties.

Ultimately, semantic remote attestation itself is a policy-agnostic technique (as it should be!) and, in my view, has the capability to remotely attest a broad variety of high-level policies.

Now, the second point — this indeed is the more important, and more interesting question. Imagine that what we call the TrustedVM isn’t the JVM, but something lower on the software stack. Say, the operating system. Or even lower — the x86 instruction set. As we go lower in the abstraction tower, our “funnel” gets wider — we’re able to capture the behavior of more and more software. At the hardware instruction set level we can capture the behavior of the entire system (indeed, projects such as ReVirt have done this). There is a tradeoff though: lower levels of abstraction have lesser high-level information, and consequently, it is hard to make high-level policy decisions at a low level. For example, the OS is not aware of application-level policies such as “the reply to such-and-such network request must be of such-and-such form and possess such-and-such properties”. And don’t even think about doing such things at the ISA level.

So where does that leave us? We’re stuck with the age-old dilemma of choosing a sweet-spot along a tradeoff curve. Where is that sweet-spot? I honestly don’t know. But with more software being targeted to managed runtimes, my bet would be that the sweet-spot is at the level of the language runtime. And in the meanwhile, it would really interesting to see how the ideas of semantic remote attestation can be applied at the level of the operating system by (the age-old) technique of syscall interception.

Added a couple of new papers to the list of web application security research. Thanks to Benjamin Livshits for pointing out the paper by Yichen Xie and Alex Aiken.

Emerging Trends in Information and Computer Security - Jan 22, 2006.

Research on Web Application Security

This is my list of research in web application security. It is
incomplete and ever-growing.
. Send me mail if you think
something should be added here.

Web Application Security Research
Paper Authors Language/platform Links

Finding Security Vulnerabilities in Java Applications with Static Analysis, USENIX Security 2005. Benjamin Livshits and Monica S. Lam Java Virtual Machine
Finding Application Errors and Security Flaws Using PQL: a Program Query Language, OOPSLA 2005. Michael Martin, Benjamin Livshits, and Monica S. Lam Java Virtual Machine
WebSSARI — Web application Security via Static Analysis and Runtime Inspection Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, Der-Tsai Lee, Sy-Yen Kuo PHP WWW ‘04 paper

AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks, ASE 2005. W. Halfond and A. Orso Java Virtual Machine
The Essence of Command Injection Attacks in Web Applications, POPL 2006. Zhendong Su and Gary Wassermann Language agnostic, but evaluated with JSP and PHP.
Static analysis of role-based access control in J2EE applications, TAVWEB 2004 Gleb Naumovich and Paolina Centonze Java Virtual Machine
Automatically Hardening Web Applications using Precise Tainting. Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Green, Jeffrey Shirley, David Evans. PHP
How safe is it out there? Moran Surf and Amichai Shulman. Study of web app security

OWASP Top Ten Vulnerabilities in Web Applications Open Web Application Security Project
Java String Analyzer Aske Simon Christensen, Anders Mller and Michael I. Schwartzbach SAS’03 paper

Static Detection of Security Vulnerabilities in Scripting Languages Yichen Xie and Alex Aiken PHP
A Learning-Based Approach to the Detection of SQL Attacks F. Valeur, D. Mutz, and G. Vigna. Languge-agnostic — works at DB level. In Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), Vienna, Austria, July 2005.

Using Generalization and Characterization Techniques in the Anomaly-based Detection of Web Attacks W. Robertson, G. Vigna, C. Kruegel, R. Kemmerer. Languge-agnostic — works at DB level. In the Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS)

Defending against Injection Attacks through Context-Sensitive String Evaluation Tadeusz Pietraszek, Chris Vanden Berghe At the VM level (PHP and JVM) Project webpage

Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. James Newsome and Dawn Song. x86 ISA In Proceedings of the 12th Annual Network and Distributed System Security Symposium (NDSS ’05), February 2005.

Taint Propagation for Java, ACSAC 2005. Vivek Haldar, Deepak Chandra and Michael Franz Java Virtual Machine Slides of talk

Demo of our taint propagation scheme.

Talking about taint propagation for Java at Acsac

I presented our paper on taint
propagation for Java at Acsac this
past week. My co-authors were
fellow grad student Deepak Chandra
and our advisor Prof. Michael Franz.

The questions and feedback I
got sometimes caught me unawares
and made me think about
things I hadn’t considered so far.
Many thanks to Scott Stoller ,
David A. Wheeler and
Dave Wichers for their questions
and comments. And, of course, to
the anonymous reviewers too, of
which there were no less than
four.

Quick summary: our goal is
to combat vulnerabilities in web
applications arising from improperly
validated input
. This, by the
way, is the largest single source
of security holes in web applications.

We have built a runtime
mechanism for the Java virtual
machine that marks untrusted
input as “tainted” and,
until it is cleaned
by some validation routine,
prohibits its use in security
sensitive methods, such as those
for executing SQL queries.
This is very much like
the taint mode in Perl, and
in fact, that was the original
inspiration for our work (though
we came at it from a totally
different perspective—see
this paper for background).
But our technique is more
flexible because the sources of
tainted data and the methods
where tainted data must not
be used (sinks) , are not
hard-coded into the interpreter
(like Perl), but are
independently specified.

This is a problem that is the
target of much research right now.
Proposed solutions include purely
static analysis techniques, purely
runtime techniques (like ours)
and every thing in between,
each with its own unique
trade offs.

The paper:

Taint Propagation for Java
by Vivek Haldar, Deepak.
Chandra and Michael Franz

Sides from the talk I gave.

DRM is for regular people too

Slashdot snip of BitComet article

Mention the word (or acronym) DRM and some synonyms immediately come to mind: draconian, Big Brother, evil. Or so the mainstream press and many other people would have us think. It seems as if DRM, and its vehicle, Trusted Computing, is just a plot for big media and big computing corporations to enslave us all, and make us pay for every pixel and microsecond of content, and in general, own our souls and condemn our progeny to bonded labor.

This story is the perfect example of how DRM is not just for the Disneys and Sonys of the world, but for regular people too. In short, BitComet is a bittorrent client that does not respect the “private” flag in torrent files (under certain conditions). This is useful for closed or private communities that want to share torrents only within the community. BitComet allows sharing of private torrents.

DRM is for regular people too. I want a way to share large files (pictures, videos) with only select people. I want a way to ensure that the programs which I use to do that will actually respect my rules, i.e., my digital rights. I want a way to ensure that marking content “private” is not just an informational attribute that can be overridden by a buggy or malicious program, but that the rule is actually enforced, and moreover, I know that it is being enforced.

Much of what is written about DRM is driven by fear and paranoia. If DRM can help the MPAA protect their content, it can help me protect mine too. Imagine how much easier it would be to share content online for individuals if DRM actually worked. While we’re cooking up doomsday scenarios where the MPAA owns my machine, why don’t we also think about some interesting ways in which individuals can make use of DRM?

ACSAC 2005 at Tucson, AZ

I attended the Annual Computer Security Applications Conference 2005 (ACSAC) at Tucson, AZ, last week. I was presenting a paper in the tech sessions (more about that in another post). Some of the highlights:

* Dave Wichers of Aspect Security taught a full-day tutorial on web application security. This was the most fun part of the conference for me, because Dave sprinkled the class with hands-on sessions on WebGoat, a set of web applications for courses just like this, with a number of vulnerabilities built in, which students must figure out and “exploit”. At the end, after the end of the class, Dave and a few of us stayed back to take a stab at the WebGoat challenge, a tough series of vulnerabilities in WebGoat, outside of the usual “teaching” vulnerabilities.
* David A. Wheeler gave an invigorating talk on beating the famous “trusting trust” attack first proposed (and seemingly even carried out) by Ken Thompson in his Turing Award lecture. This is particularly pernicious attack, and as David pointed out, by planting backdoors in two compilers (GCC, and the Microsoft C compiler), an attacker could control more than 90% of the world’s computers.
* Lorenzo Cavallaro presented a paper on finding (and fixing) a replay attack vulnerability in one of the TPM’s core authentication protocols. The lack of formal proofs (or even modeling) of security properties of the TPM makes this a wide open field for research.
* The New Security Paradigms Workshop panel: Four authors from this year’s workshop were selected to be on the panel. Julie Thorpe talked about their authentication scheme called passthoughts. It is exactly what it sounds like — authenticating users by their thoughts. Users are rigged with a brainwave recorder (EEG?) and shown a series of pictures at random, one of which is their “chosen” picture, and when they see that, there’s a noticeable change in their brainwaves. The major bottleneck in the approach right now is the technology for recording brainwaves. This could be the “perfect” authentication scheme — no shoulder surfing, and no forgetting passwords. It’s a little sci-fi too!
* Also on the NSPW panel: Richard Ford lamented the dire situation of proliferating internet worms and viruses, and proposed a provocative scheme to battle it. He noted that after the Slammer outbreak, there was a marked drop in worm infections for a while because everyone was scared into patching their systems. So why not purposely release a (not too harmful) worm from time to time to whip people into hardening their systems? Looking to natural systems for an analogy, he compared this to a controlled forest fire. This sparked much discussion — what are the ethics behind this? How do you construct such worms? What gives any one agency/person the right to do this? But then one of the goals of this was to get people talking about the problem in creative ways.

Overall, ACSAC is a unique conference because it serves as a great middle ground where both academia and industry come together to talk about security. That makes it forward-looking, and yet grounded in very real, current problems.

Symantec Internet Security Threat Report

Security Statistics