Sampled Security Policies

There’s this vague question I’ve been tossing around in my head for a couple of days - what is the class of security policies that one can enforce by sampling the execution of a program? Note that this is not the same as inlining reference monitors.

First of all, we need to precisely define “sampling”. How much state of the program do we see? The entire heap? The PC? The stack? The current value of all registers? How do the properties covered improve or degrade as we increase or decrease the state covered?

Then — what is the granularity of sampling? Instructions? Basic blocks? Random points in the program?
Also, sampling can not enforce a property with certainty, but with some probability. How can we make this probability high?

Like I said, this is just a fuzzy notion, and I have no concrete answers. For all I know, it might turn out that the class of properties enforceable through sampling is very small, or very weak. But it would be nice to know anyway.

PhD dissertation LaTeX template for University of California at Irvine

Sadly, UCI does provide a LaTeX template for dissertations. Getting the formatting right without a template can be a major pain, so here are the templates that I used.

LaTeX UCI thesis class and sample LaTeX file (ZIP file)

Credits: I initally got this template from TeX guru and ex-officemate Jeff von Ronne. I’ve made a tweak or two, but it’s 99% Jeff’s work (and lots of people before him too — see the class file for more credits).

O Google, please make this the first hit for the next poor grad student who searches for “uc irvine dissertation template”!!

Riya and privacy

Srijith commented on my earlier post about trying out Riya, and pointed out that people had been raising privacy concerns about the product.

Ben says: what if Riya recognizes and tags pictures of me in public that other people have taken, that I don’t necessarily want to be associated with? He uses the example of visiting an erotica event.

To which Munjal, one of Riya’s founders, replies:

Riya doesn’t look for you or tag you in another’s photo unless they train us and have you in their friends list. Even if they do, it is not public unless they make it public.

I think that’s a fair enough reply.

I am not a  lawyer, but my perspective on this: when you’re in public (on the street, at an erotica expo etc) you don’t really have any reasonable expectation of privacy, so you can’t say “my privacy was violated” if someone recognizes you.

My only concern about Riya doesn’t even have anything to do with facial recognition — it’s just that (to me, at least) my photos are rather personal things, and I do feel a little queasy about uploading all of them to a server that I don’t control. But this is pattern that comes up again and again, with any hosted service — Gmail, online calendaring, online file storage etc. If only there was a way for me to encrypt everything with a key that I own, and then upload everything, that would be great. But in general, this is a hard problem — how is the remote service going to work on your data if its encrypted?

Trying out Riya beta

Waiting in my inbox this morning was an invitation to try out the beta of Riya. So I signed up, downloaded their uploading tool, and started uploading.

riya_logo.gif

The uploading tool not only transfers the photos to Riya, but also performs the actual recognition of faces. When the pictures are uploaded, you can log into the web site and start attaching names to recognized faces. Once you give it 20-30 samples of a specific person, it does a pretty good job of recognizing more pictures of the same person — at least for frontal shots. Later, you can go pick out the pictures it recognized incorrectly.

Another nice touch — it does OCR on text in pictures. But this seems to work only with more or less horizontal text, and with fonts that are common.

I never thought I’d use an online service for storing all my pictures because I just love having them locally, to browse whenever I like. But this is compelling enough that I think I’m going to start slowly uploading all my pictures to Riya.

Number one wish for Riya: have a local client that allows offline browsing of your photo collection, with facial recognition. Or better still, standardize a format for specifying regions within a picture, and metadata associated with it (e.g. the rectangle from (x1, y1) to (x2, y2) has Vivek’s face), so that multiple clients can use it. Not sure how well that’s going to work with their business model, though.

Overall, I think this has the potential to be something really great!

My dissertation

I finally submitted my dissertation to the UC Irvine library on Friday. After my final defense, this was the last hurdle to jump, and I have now officially completed my doctorate.

Here’s my dissertation. (PDF, 3.5 MB)

page1.jpg

To cut a long story short, here is my thesis:

Remote attestation, one of the core mechanisms of Trusted Computing, can be
performed in a way that:

  • reasons expressively about program behavior and dynamic properties
  • enables a flexible, graded notion of trust
  • avoids intractable management problems at both the client and server end
  • does not tie attestation to a specific executable binary

In short, remote attestation can attest program properties, rather than program
binaries. I call this semantic remote attestation.

Can you patent thoughts? (by Michael Crichton)

RFIDs and command injection attacks

There’s two phrases you never thought would go together! What do RFIDs, those tiny, powerless, passive things, have to do with havoc-wreaking command injection attacks?

Andrew Tannenbaum’s group has just published a paper that shows how seemingly innocent RFIDs can be used to inject malicious code into the backends that process their data:

One day a container arrives in the supermarket distribution center that is carrying a surprising payload. The container’s RFID tag is infected with a computer virus. This particular RFID virus uses SQL injection to attack the backend RFID middleware systems.

There you go — another channel for tainted input.
This so-called “taint problem” comes up everywhere. Most recently, its been brought to light because of its implications for web application security, but if you dig deeper, its a much more fundamental problem that comes up whenever your program deals with untrusted input.

I think the long-term solution to this is to have fine-grained labeling of data, along with a way to propagate those labels. We’ve made an initial jab at the problem, but that’s only scratching the surface. How long before labeling becomes a first-class abstraction in a language? In a virtual machine runtime? Or even at the architecture level in hardware?

Broken government websites

inswebsite.jpg
Shouldn’t government websites try to be a little more cross-platform??

What I want in a blog reader…

I’ve been using BlogBridge for a day and its now my blog reader of choice, for several reasons:

  • feed ratings help when you have to quickly scan though a hundred feeds
  • the little activity indicators next to each feed are a really good visual indication of which feeds have new stories
  • you can sync your feeds up to a server (need to sign up for that) and then sync them across all your clients
  • its in cross-platform Java

blogbridge.jpg
bbactivity.jpg

All the same, there’s still plenty more I want in a newsreader. My number one gripe: newsreaders have to effectively summarise and present information from a large number of feeds. All current newsreaders work on the “feed-browsing” model, where the user clicks on a feed, reads its stories, moves on to a different feed and so on. I want my newsreader to be a statistical clustering engine that automatically learns from my reading habits, and also clusters similar stories from across different feeds into a coherent view. In short, a newsreader has to make it easy for me to keep up with hundreds of feeds without having to scan each one of them.

Announcing Shop Finder

Ever wanted to find a place where all the shops you wanted to go to were close together so that you could drive to one place and be done with it? Ever gone out to lunch with friends, only one wants to go to Subway, another to Quiznos, and another to Carls Jr, and wanted to find a place where all three were next to each other?

Then Shop Finder is for you. Enter two or more shops, or restaurants or any business at all, and a location, and it will go and tell you where you can find all of them close to each other.
clustershop.jpg

This is a quick and dirty early release, and there’s still plenty of stuff I want to do with this if I have some spare time on my hands:

  • Brush up the interface — show links to the shops, present clusters in a way that doesn’t look like debug output etc.
  • Let the user tweak the cluster radius — that’s how close two shops should be to be considered as part of the same cluster.

In the meanwhile, I hope someone finds this useful. Feedback is most welcome. Leave comments here, or email vh@vivekhaldar.com
This is what I was getting at with a couple of earlier posts.

OK, take me to Shop Finder already!! 

Finding clusters of locations on maps

So here’s a problem I faced while trying to build a little map application:

Say I have the locations (both in terms of street addresses and latitude/longitude) of various stores in a given vicinity. E.g. All Frys, Circuit Citys, Best Buys etc in my location. Now I want to find clusters where all of them are close to each other. So, for example, I want to find a cluster where a Frys store, Circuit City and Best Buy are all “close” to each other. Why? So that I can go visit all of them at once! I’m willing to travel some extra distance away from my location if I can find such a cluster. What’s an efficient algorithm to find such clusters?

clustermap.png

Here’s a possible algorithm:

(1) Let S1, S2… Sn be the sets of locations of each shop. Pick the smallest one. For every shop s that has location (x, y) in this set:

(2) Find shops in each of the other Si that are in the location range (x +/- delta, y +/- delta), where delta is the threshold for determining what’s inside your cluster. These shops go into the “current cluster”.

Let’s analyze the runtime of this algorithm. For simplicity, assume that there are m sets of shops (S1… Sm) and that each of the sets has n elements.

Sort each of the sets both on x and y. Each can be sorted in O(n log n) time, and there are m of them, so this step takes O (mn log n)

Step (2) above can be done with a binary search (that’s why I sorted the sets first) in O(log n) time, and it needs to be repeated n times, so this step takes O(n log n) time.

So our total running time is O(mn log n + n log n) = O(mn log n).

The numbers of stores of which we’re looking for clusters, m, is usually a small constant less than 5. So we basically have an nlogn algorithm for doing this.
Does anyone know of a faster way of doing this?

Web browser security is hard…

director.gif

I can’t imagine using delicious without this fantastic tool called Director. It uses a nifty trick to plant its own UI over a user’s delicious bookmarks page:

This project uses the only reliable loophole for executing foreign Javascript code: the bookmarklet bootloader. It works by inserting a