Understanding Google's Bug Bounty Program

Some people have taken Google’s idea of offering security bug bounties and pushed it to its logical conclusion: why stop at security bugs? Why not incentivize the reporting of ALL software bugs with bounties? Aren’t other companies just being cheap by not offering bug bounties?

Questions along these lines misunderstand how software development works. Engineers don’t sit on their hands and surf Reddit after shipping a product; they’re already working on bugs. All sufficiently complex software ships with known bugs, and more reporting isn’t likely to change whether they get fixed.

So the premise that “reporting more bugs will improve software quality” is speculative at best. Software quality is determined by what the market will bear, and the market usually rewards buggy-but-good-enough software that solves a problem now over perfect solutions that are late to the party. This is partly because of the time value of software, and partly because chasing defects offers diminishing returns.

But more to the point: everyday users are not equipped to report bugs. They don’t have the training, tools, or motivation to do it properly.

At Microsoft (and other software companies), crash dumps already include as much information as can legally be collected, based on the user’s consent. Bug information for crashes and many other issues (uncaught exceptions, for example) is already being collected in an automated, accurate way. So really, any well-supported software already has built-in reporting for most high-impact bugs.

In addition, you run into other problems with bug bounties:

  • Common bugs will be over-reported, wasting everyone’s time. Customers are already motivated to report these since they want them fixed.
  • Unexpected but by-design behavior will be incorrectly reported as bugs.
  • These problems will be exacerbated by the bounties, increasing volume and decreasing quality. This creates noise around what is really important to fix (i.e., the defects users would report even if there were no bounty).
  • A bug bounty program isn’t free. Someone has to triage the input, and it’s a zero-sum game: either a developer does less productive work sorting through bounty submissions, or headcount grows. It’s not like you can fire the testing team.

The only exception to this rule is security bugs, which operate a little differently from run-of-the-mill defects:

  • There is already a market for security bugs, which can be sold to hackers. The developer is simply trying to outbid them to keep the product secure.
  • This means there’s already a set of professionals hunting for such bugs, and professionals are much more likely to find them because they understand how software is designed and implemented.
  • Users are unlikely to notice or report security bugs since they generally don’t obstruct functionality, meaning there are fewer dupes to wade through, and bug reports will be of higher quality on average.

So in my opinion, paying bounties for security bugs can be effective, but it’s unlikely that bounties for functionality bugs would be particularly productive in the general case.

OAuth Hell

It’s a pretty sad fact that OAuth has become the de facto industry standard for API authorization, because OAuth is so broken.

Before OAuth, creating and consuming APIs across services was hell. We mostly just did stupid stuff like asking users for their passwords so we could log in on their behalf and maybe do some page scraping. If a proper API actually existed, it probably implemented a custom authentication protocol that required you to read the provider’s exact implementation, permissions, and handshake procedure, making the interface non-reusable.

Ideally, OAuth would come along and solve these problems for us by:

  • Allowing untrusted applications to perform actions on behalf of a user at the API provider.
  • Verifying the user’s permission to perform said actions, without divulging the user’s password.
  • Selectively granting permissions to an untrusted client, to prevent hijacking of account & login details.
  • Revoking the client application’s privileges at the user’s command, without requiring a password change.
  • Promoting code reuse through a standard protocol for negotiating access to an API provider.

OAuth takes every single one of these requirements…and partially solves all of them.

While OAuth is conceptually great, and is much clearer in the 2.0 spec, it still contains a number of warts that make it a complete pain to integrate. Consider this:

  • There’s basically no standard for how to implement an OAuth provider. Point an OAuth client at a different provider and count the number of changes you have to make to get it working. The number of unique tweaks and quirks that providers come up with is mind-boggling. The whole concept of interoperability is thrown out the window, and you’re back to “well, it generally works this way, but you have to read all their developer docs and spend an afternoon conforming to the custom design.” And this is all by design! Read it straight from the spec:

    …this specification is likely to produce a wide range of non-interoperable implementations…
    …with the clear expectation that future work will define prescriptive profiles and extensions necessary to achieve full web-scale interoperability.

  • Scopes are pretty much a crapshoot. Take a look at this passage from the spec:

    The value of the scope parameter is expressed as a list of space-delimited, case-sensitive strings. The strings are defined by the authorization server.

    So…how do you find out what scopes are supported, allowed, or required? Surprise! You don’t. You have to read the developer docs, assuming they’re posted. More generally, it’s impossible to programmatically register a client, learn about server capabilities, discover endpoints, or do much of anything else. That means hours of slogging through documentation to hand-code these details into the OAuth client, all so you can do it again for the next provider (there’s a code sketch of this after the list). Did Amazon’s success in services teach us nothing about the value of being able to programmatically discover, query, register with, and use a service?

  • The OAuth web flow requires you to visit the provider’s website in your browser. This makes sense, of course, since you need to authenticate with the provider before you can authorize the client app. But this flow doesn’t work on mobile, where you need a different flow, one that requires you to enter your password into the untrusted app. Ugh. To date, there hasn’t been a really good mobile story here from anyone, and we’re still in the dark ages as far as mobile apps are concerned. Which is a shame, because back in my day, we used these things called web browsers.
  • There are security issues you can drive a truck through. Consider this cogent explanation by John Bradley of how granting access to an OAuth client application also gives it the ability to impersonate you at *any other* OAuth client for that provider:

    The problem is that in the authentication case, websites do have a motivation to inappropriately reuse the access token.  The token is no longer just for accessing the protected resource, it now carries with it the implicit notion that the possessor is the resource owner.

    So we wind up in the situation where any site the user logs into with their Facebook account can impersonate that user at any other site that accepts Facebook logins through the Client-side Flow (the default for Facebook). The less common Server-side Flow is more secure, but more complicated.

    The attack really is quite trivial: once you have an access token for the given user, you can cut and paste it into an authorization response in the browser.
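
To make Bradley’s attack concrete, here’s a minimal sketch of the naive relying-party check that enables it. The host, endpoint, and parameter names are illustrative stand-ins, not any provider’s real API:

require 'net/http'
require 'json'

# The naive "log in with Provider" check many relying parties implement.
def user_id_for(access_token)
    uri = URI("https://graph.example.com/me?access_token=#{access_token}")
    JSON.parse(Net::HTTP.get(uri))["id"]    # "the token is valid, so this must be you"
end

# Nothing above proves the token was issued to THIS application. Any site that
# legitimately obtained a token for the same user can replay it here and be
# logged in as that user. The fix is to also verify the token's audience
# (e.g., via a token-inspection endpoint, if the provider offers one) before
# trusting it.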
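
And here’s the scopes problem from earlier, in code form: two hypothetical providers, one conceptual permission, and zero protocol-level agreement on how to ask for it (all the names here are invented, which is rather the point):

# There is no way to ask either server what scopes it supports, so every new
# provider means another hand-maintained entry in a table like this one.
AUTHORIZE_URLS = {
    providerA: "https://a.example.com/oauth/authorize?scope=read_contacts",
    providerB: "https://b.example.com/oauth2/auth?scope=contacts.readonly%20profile",
}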

OAuth is an ambitious project that has given us a glimpse of how awesome an interactive web can be. It’s just a shame that this is what we’ll have to settle for, given the slow pace of improvement in such a widely used authorization framework.

What do I need to get into Software Engineering?

I’m often asked a question along the lines of “how do I get into software if I studied x instead of Computer Science?” Often x is a closely related field, such as electrical engineering, but I sometimes get this question from, say, English majors as well. So my answers will depend on how much catch-up there is to do.

One of the things I often skip is trying to determine whether someone should make a lateral change, and just assume it as a given. Software pays very well, and yet there’s a big shortage of engineers right now, even in a deep jobs crisis. There’s a reason for that — it’s not everyone’s cup of tea. So before we get to the how, it’s sometimes necessary to get to the why. If you’re looking to get in because of the money, but aren’t really all that interested in computing or math, then that raises the bar a lot in terms of what you have to be willing to put up with. I’m not going to be an idealist and say that you should only have a career doing something you love – the reality is that we mostly work so we can do the things we like; it’s not very often you truly love your career. So it’s a good idea to ask yourself if you really, really want to spend your entire life staring at and discussing what appears on a glowing rectangle, and being misunderstood by people who think you “fix computers” for a living.

But let’s assume you’ve gone through that discussion with yourself rationally, and you’ve come to the determination that you want to be a software engineer. I’m emphasizing the exact terminology here because this is my only area of expertise, and I can’t offer advice on how to become a repair technician, sysadmin, or database administrator. In most software companies, software engineering doesn’t even fall under “IT.” So you know what a software engineer is, and you want to become one. Let’s look at your options:

1. Go back to school for Computer Science: This is an obvious one, and yet it’s generally not what people have in mind when they come to me asking for advice. This should be the first option you consider, both because it’s low-risk and because it’s your best shot. If you have to rule this out, then you probably don’t want it badly enough and should stop considering a career in software. Sorry.

People seem to think that rubbing shoulders with engineers gives them an insider who can pull strings and show them a painless backdoor to becoming a well-paid engineer at a top-5 firm. Either that, or they are waiting for me to say “well, normally, I’d recommend you get a degree, but since you’re so exceptionally smart…” Folks, it doesn’t work that way. I have to say this because there are a lot of misconceptions out there about qualifications; I wrote a whole essay on this – see why the degree matters. The short story is that proven ability and potential matter far, far more than “who you know” or “what you think you know.” You’re not as good as you probably think you are.

2. Fill in the gaps and build experience: This is for people in a very closely related field, such as mathematics degree holders or mechanical engineers. To pull this off successfully you need to be very disciplined, self-motivated, and perhaps a bit lucky. You might need additional instruction in things like data structures, algorithms, or operating systems principles, but perhaps not a full-on second degree. You’re going to need to build up a portfolio of experience to show prospective employers. And you’ll need good grades from school — I’d venture at least a 3.2; scale up if your school isn’t highly ranked (if you don’t know whether it is, it isn’t).

Regarding your experience, you should start or contribute to an open-source project or publicly available software release, or join a startup or other smaller company willing to take a risk on you. Your title should be “software engineer” or very similar. You need to be writing code and learning at least 2 modern languages, preferably 3.

Your goal is to make yourself appear “serious” about software engineering, and simultaneously to build up the programming experience that lets you keep up at a top firm that will expect it. Your degree serves as an indication that you’re smart enough to do the job, and your resume is there to convince them you have the right skill set. A computer science degree is supposed to do a little bit of both (in a “necessary but not sufficient” sense), but you’re piecing those two together without one.

3. Charge head-on: This is the step most people want me to recommend, because it’s the quickest: all you have to do is update your resume, study a few interview questions, and apply! The reality is that most people who didn’t do either #1 or #2 will not be ready and will not succeed. I say this because this is an appealing, and somewhat dangerous, option. The danger is that you will fail, but you won’t know it until much later. You see, HR departments and recruiters mostly just scan for keywords when looking at resumes, and because the employment field is so red-hot, you’ll start getting phone calls and emails just by putting yourself out there.

“Wow,” you’ll say to yourself. “Everybody wants me. I must be pretty good.” Except this is an illusion. Recruiters are practically BEGGING anyone with “Java” or “Python” on their resumes to go interview. It doesn’t mean you’re prepared to be a top engineer.

Next, you might even pass a first round interview just by virtue of knowing what a while-loop is and being able to explain class inheritance (the standards can be low, believe me). Then, odds are better than not that you’ll hit a brick wall — or worse, you’ll get a lucky dice roll and get a job offer.

Here’s the problem: the variance in impact among engineers, and the corresponding amount of pay, is massive. What I mean is that a GREAT engineer is easily worth 3-4 good engineers, and the best are going to be 10 times better — even at the same experience level.

Because of this, software is a winner-take-most kind of economy. All the best firms are wildly successful — and don’t have to deal with unproven candidates; they just offer 50-100% more salary and options than the median and get more applications than they can handle anyway. In addition, working at anything but the very best firms means you won’t be developing to your full potential as quickly as you might. If you settle, you will be making significantly less now, and you’ll have a slower trajectory on top of that. So the opportunity cost of not having the degree can be very high.

In this context, even if you can get a software job now, unless you are already getting offers (not just interviews) from top firms, investing in a computer science degree can pay for itself in as little as 2-3 years, and from then on it’s pure icing on the cake. In the worst case, you’ll find that computer science isn’t for you anyway, and you’ll save yourself a lot of pain.

You have to pay it all back eventually

“You have to pay it all back eventually,” he admonished me. There was a long silence; I hesitated to openly disagree. This stranger, after all, was many decades my elder, and the commander of the vehicle I rode in, at that.

It was a chilly October, and the discussion had turned to the fiscal stimulus. I seem to have unwittingly broken the unwritten rule entitled “acceptable topics for small talk.”

He continued passionately about the evils of borrowing and the importance of fiscal responsibility, and I concurred unconvincingly. He sensed something was wrong and fell silent, an open offer to change the topic.

You see, he hadn’t paid off his mortgage yet.

Shut up with this "You Are the Product"

Idiots repeat catchphrases to show off faux intelligence, but catchphrases really only serve to remind everyone that the Real World does not operate on them. The latest example that has been bothering me is this “you are the product” revelation about Facebook, as if someone had finally, just now (in 2012!), realized that we have all been duped. Here is Douglas Rushkoff – who calls himself a “media theorist”:

“Ask a kid what Facebook is for and they’ll answer ‘it’s there to help me make friends’. Facebook’s boardroom isn’t talking about how to make Johnny more friends. It’s talking about how to monetise Johnny’s social graph.”

He added: “Ask yourself who is paying for Facebook. Usually the people who are paying are the customers. Advertisers are the ones who are paying. If you don’t know who the customer of the product you are using is, you don’t know what the product is for. We are not the customers of Facebook, we are the product. Facebook is selling us to advertisers.”

Aside from the fact that “media theorist” isn’t a real title or profession, this analysis is wrong on multiple counts. It’s a narrow-minded half-truth repeated to sound all new-age and post-modern, but doesn’t hold up to even light analysis.

Firstly, “you are the product” is not a novel insight, nor does it capture the essence of the relationship between a user and the company. Is Facebook somehow different from free TV, free newspapers, specialized magazines, or the millions of other websites out there? Trading attention and information for a service or product is not new, nor is it cynical and evil. Rushkoff is right that “usually the people who are paying are the customers,” but he fails to make the obvious connection: that money isn’t the only form of payment – time and information work just as well. It’s also insulting to the audience to presume they don’t understand that free products may be sponsored by ads, which is basically what “you are the product” boils down to. Doesn’t sound so smart now, does it?

Facebook’s boardroom very much does care about how to help people make more friends. A social graph isn’t very valuable if it, you know, isn’t a graph. The fact that Facebook is trying to monetize its platform shouldn’t be surprising, and it’s a false dichotomy to conclude that because they want to make money, they therefore don’t care to serve their users or their interests. Facebook, the comic-book villain, cackling as it exploits its userbase without regard to the long-term consequences. Please.

The Facebook platform is a product. The New York Times website is a product. To argue otherwise ignores the reality.

Internet users are notoriously demanding: they insist that software be high quality, responsive, and free of advertising and intrusive privacy policies. But most of all, they want it to be fucking free. All else is secondary, and easily traded away. People are not doing this subconsciously. They are not being manipulated like sheep, or subversively being tricked into signing up for Facebook, Gmail, Dropbox, Pandora, and Twitter.

This is important because it brings me to my next point: the market has spoken, and it wants free. Everything on the internet must be free. It must be free of ads too, but for literally 99.9% of people, that is a small concession to make if it means Free. Anyone who speaks to the contrary is in such a small minority that they more or less don’t exist. The internet, though, has the ability to amplify the voices of a small and vocal minority, making this seem like a real issue. Most of these people don’t stop to think that it can’t be both free and ad-free; and when the chips are down they’ll keep on using Facebook just like the next guy, because the average internet user will spend 4 hours researching prices to save 6 bucks.

Douglas Rushkoff on Facebook: https://www.facebook.com/rushkoff

I guess he likes being a product too.

Computer Science: Why the Degree Matters

It’s in vogue these days to confidently declare that college degrees aren’t useful. Such unfounded assertions might even be backed by lonely data points (“Gates/Zuckerberg/person du jour didn’t graduate from college,” “the smartest coworker at my company has no formal training,” etc.). This trend has accelerated lately with the proliferation of the rumor that Google and Facebook occasionally hire people right out of high school, while conveniently ignoring the fact that it wouldn’t be unusual for entire teams at these companies to be composed of PhDs. Maybe you should just play the lottery instead?

A lot of lies start out with a grain of truth, and this particular one is no exception. The fact is that it’s becoming easier to be an innovator and coder – barriers to entry are decreasing, costs are falling through the floor, and accessibility is on the rise. But software engineering as a profession suffers from a credibility problem, since there’s no standard for becoming a software engineer. Sure, it’s also true that a degree doesn’t make one. True skill isn’t denoted by a diploma, but there’s certainly a correlation. And you know what they say about hiring: it’s better to say no to a good candidate than yes to a bad one.

(Mis)Adventures in Ruby

Think fast! Why is this code wrong?

class User < ActiveRecord::Base

    before_save :update_precheck

    def update_precheck
        check_email_changed
    end

    # if the user changed their email then make sure we ask them to verify email again
    def check_email_changed
        if self.email_changed?
            self.is_verified = false
        end
    end

end

If you said "the model will never save any changes if you edit your email address because update_precheck will return false, which cancels the ActiveRecord save" - well congrats! You're an awesome and experienced Ruby programmer!

For the rest of us, this sucks and is completely unintended behavior, thanks to Ruby’s “I’m so cool I’m going to take the last statement in your function and make it an implicit return value” stunt. There are a few things at play here: unfortunately this code will work some of the time, because a before_save callback lets the save proceed if you return nil OR true, and cancels it only if you return false. So the lack of strictness in this regard actually hides how the function operates. Just for reference, here is the correct code:

class User < ActiveRecord::Base

    before_save :update_precheck

    def update_precheck
        check_email_changed
    end

    # if the user changed their email then make sure we ask them to verify email again
    def check_email_changed
        if self.email_changed?
            self.is_verified = false
        end
        # make sure we don't accidentally cancel the save if this is the last function called by update_precheck
        return true
    end

end

I try to avoid writing code that accidentally works, but unfortunately Ruby sometimes makes it too easy, apparently.
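
If the mechanism still feels slippery, here’s the same trap in a few lines of plain Ruby, no Rails involved:

# Every Ruby method returns the value of its last expression.
def precheck(email_changed)
    if email_changed
        is_verified = false    # assignment evaluates to the assigned value: false
    end
end

precheck(true)     # => false -- before_save would see false and cancel the save
precheck(false)    # => nil   -- an if that doesn't fire evaluates to nil,
                   #            which before_save treats as "carry on"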

Edit:

This feature reminds me of a good line by one of my heroes, Raymond Chen: Cleaner, More Elegant, and Wrong

How to Write a Software Resume

One of the fun things I did in college was screen resumes and interview candidates in various capacities: first as a student programming lead, then as a technical adviser to nonprofits at my university (the trust conferred on me was rather novel at the time). The good candidates have stayed in my mind as vividly as the terrible ones. I later got to see hundreds of resumes while recruiting for Microsoft, and there are patterns to the best and worst resumes. Here are some of them:

  • Use keywords: C++, Java, Ajax, ATL, pthreads, MongoDB. Preferably in the context of stuff you worked on. I’m trying to judge how hard the projects were, and it doesn’t help if you just say “built an e-commerce system.”
  • Show some expertise. Don’t just spam keywords; list clusters of related ones to show you’ve actually explored something in depth rather than dabbled in 30 languages. “Java 6, Ant, JBoss, Spring & Hibernate” says a lot more than “Ruby on Rails, Python, PHP, Java, Lisp.” As a corollary, make sure you list where you worked with these technologies under your experience. If you list “Java” a single time on your resume, I can pretty much guarantee you do not know Java.
  • Don’t give mundane details. If it takes you more than about a sentence to describe your role, you’re probably doing it wrong. I want to see accomplishments and specific projects and technologies you worked with. As a general rule, ruthlessly remove every word that can’t conceivably change my mind about you. “Filed reports and assisted customers with inquiries?” Shut up.
  • Don’t lie or exaggerate. Don’t even stretch the truth. “Assistant to the Manager” is not an “Assistant Manager.” A “4.0/5.0” GPA is not “a 4.0”. Reviewers pick up on these tricks.
  • DO NOT PAD your resume. If you’re a college hire or looking for an internship, we understand your experience is light. This doesn’t mean you need to go on endlessly about your volunteer work or have 5 bullet points about your cashier position at CVS. I don’t need 4 lines of crap about how you worked at a dog shelter. Adding drivel de-emphasizes your relevant experience and drowns out the signal with noise.

This last point is particularly sensitive. Many people don’t feel comfortable filling only 3/5ths of a page, or even 4/5ths. They must fill the whole page, with anything. I have seen college sophomores who insisted on 2 full pages, size 10. The perpetual fear among this population appears to be: “I am so young, how do I prove I am mature, responsible, and capable without bucketloads of meaningless accomplishments?” To that I answer:

  • Your grades. Especially in computer science, but also peripheral classes. It’s important that you do well even in courses you don’t like, because that’s the sign of a responsible adult (doing things you don’t want to do: story of a grownup’s life).
  • Length of employment and/or extracurricular involvement, rather than quantity (or needless details). Being able to stick through things for medium and long term commitments is the sign of a mature person.
  • Projects: open source contributions, bootstrapped web-apps, whatever. This is the next best thing to work experience, displays initiative, and is the best time for you to experiment before you get beaten down by the everyday stresses of life after college. True story.

Thoughts on Process

Process sucks.

One of the things I’ve learned about process is that it tends to grow exponentially with the size of an organization. I’ve also learned that beyond its optimal point, process directly hinders productivity. Because of this, productivity tends to increase only logarithmically as a function of group size.

This is not a new observation; it is probably most famously studied in The Mythical Man-Month. Increasing manpower produces diminishing returns because work cannot be perfectly parallelized; and even when it can, communication lines increase and introduce overhead. What really drove it home for me was working the full gamut of companies – from the three-man startup to the largest software company in the world.
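
The arithmetic behind that overhead is worth spelling out: in a team where everyone coordinates with everyone, communication lines grow as n(n-1)/2, i.e., quadratically. A quick sketch:

# Communication lines in a fully connected team of n people: n(n-1)/2.
def channels(n)
    n * (n - 1) / 2
end

[3, 10, 50].each { |n| puts "#{n} people: #{channels(n)} lines" }
# 3 people: 3 lines
# 10 people: 45 lines
# 50 people: 1225 lines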

The organizations that do the worst tend to be the ones that have far too much process given their size. Sometimes this is the result of a smaller group operating within the context of a larger company (and often adopting process as a top-down mandate). Sometimes it is a smaller company pretending to be a larger company (and thus gaining all the disadvantages but none of the advantages of a big company). However it happened, process that was intended to increase productivity or decrease risk often has the opposite effect – hindering productivity, or increasing schedule risk.

There are also the rare organizations that have too little process given their size. These organizations increase the likelihood of catastrophic failure (and leadership either massively magnifies or mitigates this risk), and actually see decreases in productivity as a result of constant error-correcting. My conclusion is that this is a real threat, but it is mostly outweighed by the threat of too much process: death by stagnation and suppression of productivity.

Process, then, is really about tradeoffs: do you lean towards safety with more of it, or accept risk with less? Introducing much-needed process almost always results in better productivity. But it would appear that leaning towards less process, rather than over-prescribing it, reaps the largest rewards. This is, of course, domain-specific: we do not want medical or air traffic control systems built “agile.” But why exactly does process hinder us so much?

When code is free and software is paid for in human hours, time is the most expensive luxury. Every moment spent on formalities and overhead is time not spent executing. And if an engineer spends 1/4 of his time doing stuff other than coding, he doesn’t operate at 3/4 capacity – he operates at far less. This is not only because process inevitably spills over into time it was never meant to occupy, but because it also distracts from real work by taking up mindshare. Adding tasks generates context switches for programmers, which is basically the most terrible thing you can do for productivity. In addition, process is often introduced top-down to benefit managers, who don’t fully appreciate the costs borne by individual contributors.
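
A toy model shows why the naive subtraction undercounts (the refocus cost is an invented number, purely for illustration):

# Toy model of a 40-hour week: overhead hours are lost outright, and each
# interruption burns additional focus time on the way back in.
HOURS_PER_WEEK = 40.0
REFOCUS_HOURS  = 0.5    # assumed cost of regaining focus after an interruption

def effective_hours(overhead_hours, interruptions)
    HOURS_PER_WEEK - overhead_hours - (interruptions * REFOCUS_HOURS)
end

# 10 hours of process (a quarter of the week), arriving as 12 interruptions:
puts effective_hours(10, 12)                     # => 24.0
puts effective_hours(10, 12) / HOURS_PER_WEEK    # => 0.6, not the naive 0.75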

This is why naively adding up the time spent on overhead instead of coding makes it easy to underestimate the effect that process has on productivity. Whenever we add new process and observe the amazing new productivity it has enabled and the risk it has lowered, we often fail to account for its real total cost. Routinely underestimating the costs of process and overestimating its benefits causes most organizations to drift towards performing suboptimally with too much process overhead. It’s a foregone conclusion.

These effects need to be consciously resisted: when you introduce process, account for its inevitable hidden time cost. Perhaps use the same principle that experienced schedulers apply to software scheduling: just add 20 or 40 percent to your best guess (depending on team experience and product maturity), and that’s roughly where you’ll land. Resist process, and make what remains come from the bottom up, where the people implementing it have the best information for its cost-benefit analysis.