Changes to constant unloading in Ruby 2.4

We noticed some interesting behavior recently while upgrading to Rails 5.0.1 and Ruby 2.4. Between the MRI 2.3.x branch and 2.4.0, constant unloading no longer affects variable references to the unloaded class constant:

ref = Object.const_set("ASD", Class.new)
=> ASD
2.4.0 :014 > Object.send(:remove_const, "ASD")
=> ASD
2.4.0 :015 > ref.name
=> "ASD"

In previous versions of Ruby, ref.name would return nil. In fact, in 2.4 the reference doesn’t appear to be unloaded in nearly any sense of the word – I can do ref.new and it will create a new instance of the ASD class, and calling instance.class on it will return ASD. The only change appears to be that directly trying to reference ASD now results in a NameError.
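Putting those observations into a plain script makes the asymmetry easier to see (the comments record what 2.4 returns; on 2.3 and earlier, ref.name came back nil after the remove):

```ruby
# Reproducing the behavior described above on Ruby 2.4.
ref = Object.const_set(:ASD, Class.new)
Object.send(:remove_const, :ASD)

ref.name              # "ASD" on 2.4 -- nil on 2.3 and earlier
instance = ref.new    # the class object is still alive and usable
instance.class == ref # true

begin
  Object.const_get(:ASD)  # only the constant lookup itself is gone
rescue NameError => e
  e.message               # "uninitialized constant ASD"
end
```

Every path that goes through the existing object still works; only fresh constant lookups fail.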

At first I thought this had something to do with threadsafety, but now I’m not so sure; Ruby lets you do all sorts of things that can silently affect behavior in other threads (like redefining methods and classes), so this reasoning would be inconsistent with that design.

Anyways, there are 3,392 commits between 2.3.0 and 2.4.0. Should be fun to track down.

Resolutions for the new year

2016 was a big year for me – both personally and professionally. I owe a great deal to a supportive network of friends, family, and professional contacts. It was also due to a lot of luck.

Today is MLK Day 2017, the world is still a really unfair place, and the smart money is betting it’s about to get worse. My goal is to do what I can to counteract this.

Economic Security

Economic security is still out of reach for many people. I see this firsthand when I go to the food bank. To many others, though, this is an abstract concept. I’ve heard people I respect tell me to my face, and I paraphrase: “voting to raise the price of eggs in MA won’t cause hardship, because most people already have basic food security.” I have enough experience to know this is laughably untrue, but I held my tongue. Even for someone raised in a low-income household, it is not always easy to appreciate how many problems are solved by money, when those problems stopped being problems long ago or were never problems in the first place.

I resolve to stop by the food bank more regularly. I’ve learned over several years that the Red Cross in particular does good work, and helps people very efficiently with the resources they have. My individual contributions will probably amount to essentially nothing in the long run, but this is OK.

I resolve to talk more about poverty and be more mindful of poverty. The self-sorting of Americans along socioeconomic and racial lines has been damaging to our ability to even acknowledge the massive difference in lived experiences of people from different socioeconomic classes in this country. It is difficult, a priori, to imagine an order of magnitude difference in income, let alone two or three.

Luck and discrimination in the workforce

An uncomfortably large component of career success is just luck. This includes interviewing. To the untrained eye, it looks like software interviewing (vs., say, interviewing for police officers or teachers) has this veneer of objectivity – there are code challenges and whiteboarding sessions and everything. But this is bullshit, and it starts right at the resume screening process.

We computer scientists actively chase minorities out of our classrooms and workforce. This is partly through inertia (it takes a certain kind of woman to be comfortable in a room with 19 peers who all happen to be male). It is partly through a continuous stream of microaggressions, such as assuming South Asians are on a visa, or that women are probably in QA or UX design. It is partly because of the well-documented subconscious bias that puts extra hurdles and higher standards in their way, whether they are applying to the major, interviewing for jobs, or up for promotion. And it is perpetuated by stereotypical male posturing that makes women feel like they can’t keep up in CS courses, even when they score similarly or better.

I resolve to spend time fixing the supply side. Yesterday I finished volunteering at RailsBridge Boston for the first time. I hope to do it again in the future. Starting today, I will also speak to anyone about getting into computer science, with essentially no preconditions. In the past, I’ve argued that prospective engineers should really make an effort to formally learn computer science, and simultaneously argued that our community should be more respectful and inclusive of other fields and backgrounds. My position today is that we should unambiguously strive to create a bigger tent that welcomes everyone.

I resolve to spend time fixing the demand side. At Privy, we recently published our commitment to candidates – our goal is to make it easier for candidates to be evaluated fairly regardless of their background or skills. I’m still learning how I can personally make more of a difference here. I expect things will take time.

We just raised a round led by Mike Volpe + Yoav Shapira

We just raised a round of funding led by the amazing guys at Operator.vc. Yoav Shapira (1st VPE, HubSpot) and Mike Volpe (founding CMO, HubSpot) are swell guys in addition to being towering figures in Boston, and I look up to them a lot more than they probably know.

The new financing also includes participation from Bill Cohen (Managing Partner) and Todd Breeden (Principal) at KiwiVenture Partners II. With this new round, Operator.vc and KiwiVenture Partners II join our existing investors (HubSpot, Accomplice, 500 Startups, and more).

On a related note, check out our job openings.

Good and bad questions for candidates to ask

As a candidate, you should definitely ask questions during an interview. But it’s a complete myth that you need to heavily research a company or startup beforehand in order to come up with good questions. Here are a few good questions I’ve heard recently, in no particular order.

What are the working hours like?

I get the sense some people avoid this question even when they should ask it, because they fear coming off as too concerned about work/life balance. I think this is overblown; most places will at least pay lip service to caring about balance and offer a reasonable description of the hours and flexibility you would be subject to. A great way to frame it is to ask about the interviewers’ hours, or the founding or executive team’s.

What are the team members’ backgrounds?

This seems like a good way to demonstrate curiosity about cultural fit and working environment. I think cultural fit is overused and misused as a hiring criterion in a way that has the unintended outcome of being racist or sexist, but it can still be a useful signal, or at least a way to suss out red flags about the company culture.

Also, at smaller companies, this might be the single most important detail, because the experience and character of the early leadership team is overwhelmingly going to determine workplace culture, not to mention the success of the venture.

Do you hang out together outside of work?

The answer to this question could help set and communicate expectations around friendships in and out of the office. I think it’s a tall order to assume people “need” to be friends outside of work to be successful, so a mismatch in expectations here is illuminating.

For me personally, it would also be weird if the team didn’t occasionally celebrate milestones with team offsites or dinners.

What will my first few weeks be like?

This is a good way to check whether there is a defined onboarding plan and a specific set of projects in mind for the candidate. It speaks to the firm’s level of organization and preparation for the new employee’s arrival. Note that this question may not be appropriate for interviewers who would be peers.

What is your technology stack?

Not a great question on its own, but sometimes leads to interesting origin stories regarding the firm or product[1]. Also, it can reveal how the stack has changed and hint at how the company approaches technical decisions. As a bonus, it can also reveal naive aphorisms like “use the best tool for the job” which in my opinion always betrays a lack of technology planning (who is going to maintain your polyglot systems at scale?).

What are your hiring plans?

This can reveal organizational priorities and shine light on projections for growth, cashflow, etc. In rare cases it can immediately disqualify companies that clearly don’t know what they are doing (e.g., trying to get sales and biz dev #10 before a prototype has even been built). But this is rare for companies that have raised outside money; in my experience, it’s mostly bootstrappers and people spending a financial windfall who fall into these traps, because they often have no one to tell them their hiring plan won’t work.

Other good questions to ask

There are some situational or firm-specific questions that should be asked if you’ve done some research and can’t find a satisfactory answer. Sometimes it’s good to ask in general about the product roadmap (to test if they have a plan, and/or whether their plan is delusional). Sometimes it’s good to ask about monetization plans (if they are pre-revenue) or fundraising (if they are pre-revenue and haven’t raised in, say, a year or more).

Bad questions to ask

A lot of bad advice on this topic takes a cargo cult approach to interview questions, or treats it like a clever game of turning the tables on the interviewer (“act like an owner” or “act like an investor”). This is not a winning approach.

Investors are either putting in someone else’s money, or investing their own money that they can afford to lose. In either case, there’s no obvious parallel to how you as an employee should vet a company you plan to join. You should do your diligence, but there’s nothing specific to diligence as a potential investor that you should emulate. Investors hope 9 out of 10 investments utterly fail and 1 in 10 becomes the next Facebook. This isn’t some special corner case – it’s the ideal scenario for the investor, and an absolute disaster for most employees.

And “act like an owner” sounds like smart advice, but doesn’t actually mean anything in practice[2]. You will, by definition, be essentially the opposite of an owner. So even if “act like an owner” could be translated into insightful questions or actions, they aren’t likely to be good moves in your situation. So with all that said, here are some things I would personally avoid:

Coming off as skeptical of the product or business model.

You should definitely ask about the product or business model if things aren’t clear, but wrap them in a veneer of curiosity, rather than the arrogance of an “investor”/”owner” diligencing a deal.

Asking about competitors

This one’s a bit nuanced – if there’s an obvious competitor that is further along and the firm doesn’t have clear differentiation, then this is a fair question to have them elaborate on. But I think that questions about the broader market are more illuminating than asking about any one particular competitor in the general case, because you might otherwise come off as skeptical of the product or business model[3], or miss the bigger picture.

Do you do X?

Just as it is on the other side of the table, close-ended questions are bad, and give you very few bits of information. For example, you want open-ended ones like “how do you do code reviews?” instead of “do you do code reviews?” And even that’s not a great question – it’s usually better to ask meta questions about process and how changes are expected and managed, rather than about the current process.

Are you planning to do X?

Aside from falling into the one-bit-of-information pitfall above, this question asks the interviewer to verbally commit to something that you have no way of verifying. “Yes, we’ll definitely do X when we have the funding for it” gives you less than one bit of information, because you haven’t established enough trust to rely on their word, and it’s too early to be making demands that will get you written assurances.

 

[1] I’ve had multiple people complain, upon hearing about our hosting stack (Heroku), that it was too expensive, with a subset of them obviously implying that we made a poor choice.

[2] I’ve found most career advice that can be summed up in one sentence to fit this description.

[3] This is one thing investors get to do that you don’t: you don’t get to be skeptical of the plan or business model, because nobody wants to hire someone like that. On the other hand, I’m sure plenty of founders could live with skeptical investors as long as they forked over the money, because as much as both founders and investors like to pretend it’s about more than the money, it’s mostly about the money.

The two-man rule in engineering

In nuclear weapons design, there is a two-man rule that prevents any single individual from accidentally — or maliciously — launching nuclear weapons. Each step requires knowledge and consent from two individuals to proceed. Even when the President initiates a launch order, he must jointly authenticate with the Secretary of Defense (they’re given separate codes, even though the President has sole authority).

When the order reaches the launch control center, two people are required to authenticate and initiate the launch, for example by (vastly simplifying…) turning two keys simultaneously.

The benefits are at least twofold. First, it’s much harder to compromise or impersonate two people simultaneously than it is to compromise one. Second, it also provides error correction. When two people are involved in a process, it’s much more likely that if someone is about to make an oversight or error, it will be caught. This works better when the roles are asymmetric, because then they won’t both be on the same “wavelength.” Most good processes of this type seem to be asymmetric in some way.

There are many contexts where we want error correction and extra security: executing large financial transfers, preparing patients for surgery, performing space shuttle launch checks, or running nuclear reactors. It also comes up a lot in software development, which is what got me thinking about this. Let’s count the ways we implement the two-man rule:

Code review: Everyone is either doing this or making bad excuses for why they shouldn’t. But it’s the clearest and most accessible example of a two-man rule in software engineering.

Spec review: An essential part of any sizable project is a review of the specification to make sure, in particular, that 1) the right thing is being built in the right way, and 2) the right people and teams are aware of any impact the work might have on them.

Continuous integration: The branch built on your machine, but does it build on another one? This turns up countless “oh right I added this config variable/package and forgot to propagate the change” incidents before they become blocking.

Pair programming: I think of this as just real-time code review. It has all the same benefits and more, with the downside that it can’t be done asynchronously.

Deployments: I wish we did this closer to 100% of the time, but it has definitely been helpful to have a second person on hand for deployments in addition to the primary engineer. This is especially critical during complex deployments that happen in phases or involve many moving parts. Ideally the second person’s role is limited to going through the checklist one last time (“says there are database migrations, are we expecting downtime or can we keep pre-boot on, and if so is the config correct?”), and in the event of an issue, helping to investigate or doing the checklist in reverse to roll back.

Mind the Gap

As we continue to grow, there are a few areas where I think a more consistent two-man rule will lead to high return on effort in the future:

  • manually rebooting servers, changing server counts or container types
  • adding/scaling services
  • running one-off commands against the production database
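As a sketch of what a lightweight two-man rule could look like for that last item, here is a hypothetical wrapper that refuses to run a one-off command until a second engineer signs off. Nothing here is real infrastructure; all the names are illustrative only:

```ruby
# Hypothetical sketch of a two-man rule for one-off production commands.
# None of these class or method names are real code we run.
class TwoManRule
  def initialize(author:, reviewer:)
    raise ArgumentError, "reviewer must be a different person" if author == reviewer
    @author = author
    @reviewer = reviewer
    @approved = false
  end

  # Only the designated reviewer's sign-off counts.
  def approve!(by:)
    @approved = true if by == @reviewer
    @approved
  end

  # Refuse to execute until the reviewer has approved.
  def run(description)
    raise "#{description}: missing sign-off from #{@reviewer}" unless @approved
    yield
  end
end

task = TwoManRule.new(author: "alice", reviewer: "bob")
task.approve!(by: "alice")                  # no effect: the author can't self-approve
task.approve!(by: "bob")                    # the reviewer signs off
task.run("scale web dynos") { :executed }   # => :executed
```

Note the asymmetry: the reviewer is confirming someone else’s intent rather than re-deriving it, which is exactly the property that makes the error correction work.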

And yes, every once in a blue moon we deploy tiny changes to production without full code review, or force a failing build onto staging — something that is intentionally difficult and unwieldy to do. This has gone from rare to extremely rare, and I expect this trend to continue. But I like processes to be developed and enforced bottom-up if possible, and prefer values over inflexible rules. So far this tenet hasn’t failed us, and we still trust each other with good judgment above all else.

However, as the stakes get higher every day, the cost/benefit equation will eventually tip towards a standard operating procedure that can be summarized as “trust, but verify.” If that doesn’t sound like a good proverb to live by, maybe a second opinion is in order?

 

Don't tweak all the variables at once

I have been at Privy for a year. I’m proud of the team and product we’ve built, and I was excited to sit down and make a list of some of the new things I learned during my time here. Then I realized that most of these “lessons” would’ve been covered if I had just re-read everything ever written by Fred Brooks, Martin Fowler and Eric Ries…but that doesn’t make a good blog post.

So that got me thinking about the things I already sorta-knew that had been validated. Perhaps there was some pattern there. And so I made my first-order list, which I present below.

I have learned virtually nothing about…

  1. Using a stack in the middle of the adoption curve: Ruby on Rails.
    • Ruby/MRI is between 2 and 50x slower than running a statically typed language on the JVM, but even a slight increase in developer productivity more than makes up for the operations cost.
    • The advantage of using a really fancy stack (more cool factor for recruiting, etc) really doesn’t seem to compare favorably to the disadvantages (more uncertainty, smaller pool of technical talent).
    • The evidence that startups regularly die because of their technology stack is vanishingly thin, so no need to dwell here.
  2. Building a local team.
    • Geographically distributed teams and getting on the bandwagon of “work anywhere cuz we have Slack lol” seems all the rage today, but the early team is more important than the early product, and the best teams are in the same place every day.
    • Resisting the urge to go remote has been something of a useful filtering mechanism: does this individual believe enough in our vision to consider moving here for the job?[1]
  3. Having some really solid cultural values (or aspirations, as they may be) that aren’t totally groundbreaking.
    • It’s more important that we live up to great values than come up with amazing ones. I’ll leave the latter to the management consultants.
  4. Using traditional engineering management.
    • We basically do agile: there are weekly-ish sprints; we do higher level planning on a monthly basis; a couple times a year we work on a strategic roadmap. We write software specifications before we code, and we ship daily with continuous integration and lower test coverage than I’d like to admit. Yawn.
    • We don’t use “flat” organizations or Holacracy or whatever trendy hipster management structure is in vogue. What the hell kind of problem is this trying to solve anyway? My theory is it’s got something to do with cool factor for recruiting, but I have a feeling the people trying this are no more certain than I am.

What’s the big meta lesson here?

If anything, it probably goes a little bit like this: the available levers to pull in a startup are numerous, but there are only a few that make a measurable difference. The things that are most likely to kill us are the things that kill most startups: having a subpar team, building a product that nobody wants, executing poorly on feedback loops, that kind of thing.

These are the things that, in Paul Graham terminology, make you “default dead” until you figure out how to get them right. And it’s critically important to realize that things like “what do we build?” and “who do we sell it to?” are the things that startups are doing “wrong by default” and need to diagnose and fix as quickly as possible.

But then there are the other things, like “how do we write a scalable system to respond to HTTP requests?” or “how should we manage engineering teams?” in which there are essentially no forced errors, and where (barring a well-articulated exception[2]) the correct answer is the default one. So almost all of the risks here seem to be to the downside, and any upside is probably insignificant compared to the scale and difficulty of the hard problem: building a novel product under uncertainty.

There are certainly going to be exceptions to this. There are going to be teams that have figured out how to deviate from orthodoxy and are reaping benefits from it. I’m OK with this, and my theory is that it either doesn’t matter (e.g., they were going to be a success anyway) or it won’t rescue them (they’re doomed and they didn’t differentiate in a way that mattered).

And so it must follow that the majority of our iterating and tweaking is on the thing that will make us a great company: what do we build? Who do we sell it to? There are enough variables in there that I don’t really have any brainpower left over to do anything except reach for a generic Ruby/Python/JavaScript framework and use engineering/recruiting/management techniques that were old 30 years ago.

 

[1] This isn’t all roses, since it biases us significantly towards younger folks who don’t have as many attachments, the net effect of which is…debatable, but obviously not lethal in a vibrant tech city like Boston.

[2] Example: One excuse I’ve used to provision real hardware in a real datacenter as opposed to just spinning up an EC2 instance is “I’ve done the math and TCO in AWS is literally 25X more expensive.”

How to uninstall the default Windows 10 apps and disable web search

If you’re like me, you’ve been enjoying Windows 10 for quite some time now. A couple of things annoy me:

1. I accidentally changed all my file associations to the new default Windows apps, because the (intentionally) misleading first-run experience presented fine print I glossed over.
2. I don’t like searching the web from the Windows Start menu, because I’d rather not transmit everything I type there over the network. Call me old fashioned.

Remove default apps

Open up a powershell prompt and run this to remove most of the default apps:

Get-AppxPackage *onenote* | Remove-AppxPackage
Get-AppxPackage *zunevideo* | Remove-AppxPackage
Get-AppxPackage *bingsports* | Remove-AppxPackage
Get-AppxPackage *windowsalarms* | Remove-AppxPackage
Get-AppxPackage *windowscommunicationsapps* | Remove-AppxPackage
Get-AppxPackage *windowscamera* | Remove-AppxPackage
Get-AppxPackage *skypeapp* | Remove-AppxPackage
Get-AppxPackage *getstarted* | Remove-AppxPackage
Get-AppxPackage *zunemusic* | Remove-AppxPackage
Get-AppxPackage *windowsmaps* | Remove-AppxPackage
Get-AppxPackage *soundrecorder* | Remove-AppxPackage

Turn off Web Search

Next, open up Group Policy Editor (gpedit.msc) and navigate to:

Computer Configuration -> Administrative Templates -> Windows Components -> Search. Enable the policies:

  • Do not allow web search
  • Don’t search the web or display web results in Search
  • Don’t search the web or display web results in Search over metered connections

Finally, open up “Cortana and Search Settings” and disable “Search online and enable web results”.

Heroku Pricing Changes

A couple of quick points on Heroku’s pricing changes, which I’ve been meaning to get out:

  • It’s not an across-the-board price cut. While dyno pricing has decreased, they also got rid of the roughly $36/month in free dyno credits.
  • New free tier replaces the free dyno credit. Minimum 6 hours of sleep per day means no more abusing the free tier by pinging your app every few minutes to keep it from sleeping. Seems a lot of people were doing this to run production apps for free; good riddance.
  • New $7/month hobby tier is a great new option for people who were previously hosting production apps for free and need them live 24/7. This is a great deal since you can even have worker/background dynos for the same price. Makes sense for Heroku too – they’ll derive a good deal of long tail revenue from folks who would’ve previously just stuck with the free tier (maybe using the ping hack to prevent idling). Honestly I think the revenue is not the point – it’s more just preventing people from abusing the free tier while giving enough folks a no-excuses carrot to use the platform so it’ll be a no-brainer when they “go pro.”
  • The professional dyno pricing drop is great, but it’s going to be a wash for the majority of paying users because the free credit is going away. Basically, there’s no longer a big cliff going from free to paid, but the pricing curve gets somewhat less steep as you scale. My intuition is that the winners are the 4-5 figure/month customers, which makes sense, since that’s around the time they start thinking about moving to AWS directly for cost savings. More of them will just consider staying.

Why Work at a Startup?

Because I’m tired of explaining to everyone, I’m going to make this list to refer to anyone who asks. While I don’t think any of these are particularly original, it makes a handy checklist for anyone considering a similar jump[1].

  • Faster time to market. At Privy, we routinely ship code that was written earlier in the day or week. Seems petty, but as an engineer, it’s frustrating to improve something and then not have it in the hands of customers for weeks or months.
  • More hats to wear. The diversity of work at a startup appeals to me. I can work on product, recruiting, and engineering. Before lunch. The pace of work is faster and its scope is longer term, and I like being involved in multiple parts of the business.
  • Be judged by customers, not managers. A startup makes each person less insulated from the market. Therefore the correlation between performance and rewards tends to be much closer.
  • Less politics. As a consequence of the last point, politics becomes less important. It’s much harder to bullshit accomplishments in a startup when the entire company fits into a small room or two. Tired of carrying teammates who aren’t pulling their own weight? Join a startup.
  • Incredible learning. As another corollary of being closer to market forces, I’ve learned a lot about how to run a business that provides value to customers in exchange for money. I’ve in turn been able to apply experience gained elsewhere that I never could have used at a larger company, because my job title would’ve prevented me from doing anything other than engineering.
  • Challenging the status quo, not defending it. Name recognition is cool, but I never got the sense that my role at Office was about reshaping how people work – probably because our market share had nowhere to go but down. But I’ve found I don’t mind playing the underdog as long as I have a thesis about how the future should change for the better.

 

[1] In a necessary but not sufficient way (i.e., if these don’t apply to you, a startup is probably a bad idea; but if they do apply to you, a startup could still be a bad idea).

How we sped up our background processing 150x

Performance has always been an obsession of mine. I enjoy the challenge of understanding why things take as long as they do. In the process, I often discover that there’s a way to make things faster by removing bottlenecks. Today I will go over some changes we recently made to Privy that resulted in our production application sending emails 150x faster per node!

Understanding the problem

When we started exploring performance in our email queueing system, all our nodes were near their maximum memory limit. It was clear that we were running as many workers as we could per machine, but CPU utilization was extremely low, even when all workers were busy.

Anyone with experience will immediately recognize that this means these systems were almost certainly I/O bound. There are a couple of obvious ways to fix this. One is to perform I/O asynchronously; but since these were already supposed to be asynchronous workers, this didn’t intuitively seem like the right answer.

The other option is to run more workers. But how do you run more workers on a machine already running as many workers as can fit in memory?

Adding more workers

We added more workers per node by moving from Resque to Sidekiq. For those who don’t know, Resque is a process-based background queuing system. Sidekiq, on the other hand, is thread-based. This is important, because Resque’s design means a copy of the application code is duplicated across every one of its worker processes. If we wanted two Resque workers, we would use double the memory of a single worker (because of the copy-on-write nature of forked process memory in Linux, this isn’t strictly true, but it was quite close in our production systems due to the memory access patterns of our application and the Ruby runtime).

Making this switch to Sidekiq allowed us to immediately increase the number of workers per node by a factor of roughly 6x. All the Sidekiq workers are able to more tightly share operating system resources like memory, network connections, and database access handles.

How did we do?

This one change resulted in a performance change of nearly 30x (as in, 3000% as fast).

Wait, what?

Plot twist!

How did running more workers also result in a performance increase of 500% per worker? I had to do some digging. As it turns out, there are a number of things that make Resque workers slower:

  • Each worker process forks a child process before starting each job. This takes time, even on a copy-on-write system like Linux.
  • Then, since there are now two processes sharing the same connection to Redis, the child has to reopen the connection.
  • Now, the parent has to wait for the child process to exit before it can check the queue for the next job.
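Taken together, that per-job lifecycle looks roughly like this (a simplified model of Resque’s design, not its actual source):

```ruby
# Simplified model of Resque's per-job lifecycle (not actual Resque source):
# the worker forks a child for every job and blocks until the child exits
# before it can poll for the next one. Sidekiq's threads skip all of this.
jobs = ["job-1", "job-2"]

exit_statuses = jobs.map do |_job|
  pid = fork do
    # A real Resque child also reopens its Redis connection here, since it
    # can't safely share the parent's socket. Then it runs job.perform;
    # we just simulate a unit of work:
    sleep 0.01
  end
  Process.wait(pid)   # the parent sits idle until the child finishes
  $?.exitstatus
end

exit_statuses  # [0, 0] -- every job paid the fork/wait overhead
```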

When we compounded all of these across every worker, it turns out these were, on average, adding a multiple-seconds-long penalty to every job. There is almost certainly something wrong here (and no, it wasn’t paging). I’m sure this could’ve been tuned and improved, but I didn’t explore since it was moot at this point anyway.

Let’s do better – with Computer Science™

In the course of rewriting this system, we noticed some operations were just taking longer than felt right. One of these was the scheduling system: we schedule reminder emails in Redis itself, inserting jobs into a set sorted by time. Sometimes things happen that require removing scheduled emails (for example, if the user performs the action we were trying to nudge them toward).

While profiling the performance of these email reminders, I noticed an odd design: whenever the state of a claimed offer changes (including an email being sent), all related scheduled emails are removed and re-inserted (based on what makes sense for this new state). Obviously, this is a good way to make sure that anything unnecessary is removed without having to know what those things are. I had a hunch: If the scheduled jobs are sorted by time, how long would it take to find jobs that aren’t keyed on time?

O(n). Whoops!

It turns out that the time it took to send an email depended linearly on how many emails were waiting to be sent. This is not a recipe for high scalability.

We did some work to never remove scheduled jobs out of order – instead, scheduled jobs check their validity at runtime and no-op if there is nothing to do. Since no operation depends linearly on the size of the queue any more, it’s a much more scalable design.
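The shape of that fix can be sketched as follows (the class and state names are made up for illustration; our real jobs and state model are more involved):

```ruby
# Before: canceling a claimed offer meant scanning the entire time-sorted
# set to find and delete its jobs -- O(n) in the number of pending emails.
# After: jobs stay put; each one re-checks state when it runs and no-ops
# if it is stale. All names below are illustrative, not our real code.
class ReminderJob
  def initialize(offer_id, scheduled_state)
    @offer_id = offer_id
    @scheduled_state = scheduled_state
  end

  # state_lookup stands in for a database read of the offer's current state.
  def perform(state_lookup)
    current = state_lookup.call(@offer_id)
    return :noop unless current == @scheduled_state  # stale: state moved on
    :send_email
  end
end

lookup = ->(offer_id) { { 42 => :claimed }[offer_id] }
ReminderJob.new(42, :claimed).perform(lookup)   # => :send_email
ReminderJob.new(42, :redeemed).perform(lookup)  # => :noop (user already acted)
```

Deleting nothing from the sorted set means the dequeue path stays O(log n), and stale jobs cost only a cheap state check at run time.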

By making this change, we saw an increase in performance of more than 5x in production.

Summing up

  • Moving from process-based to thread-based workers: ~6x more workers per node.
  • Moving from forking workers to non-forking workers: 5x faster.
  • Removing O(n) operations from the actual email send job: 5x faster.
  • Total speedup: roughly 150x (6 × 5 × 5) performance improvement.