A. Jesse Jiryu Davis


Investing in CS4All: One Year Later

When a couple of New York City high school teachers partnered with MongoDB to teach computer science, did they succeed? Their curriculum was untested, and they were teaching in difficult districts where most students are from poor and minority families. I talked with these two teachers, Jeremy Mellema and Timothy Chen, back in September, when they had completed a summer fellowship at MongoDB and had just started teaching their curriculum; at the end of the academic year this spring, I visited Jeremy and Tim again to find out the result.

Their successes were sparse and partial. They discovered that their students' poor reading skills were a barrier to learning to code, and that teaching new coders how to solve problems is, itself, an unsolved problem. With a coarse unit of iteration (a school semester), it is painfully slow to experiment and find teaching methods that work. But even partial wins make a difference for individual kids, and the support of professional engineers at companies like MongoDB can be a powerful accelerant.

What engages students

Jeremy's main struggle was to get his students excited about code. He was assigned to teach a computer science class at Bronx Compass High School in the fall, using the curriculum he wrote during his fellowship at MongoDB last summer. In the beginning he spent too much time lecturing. “It felt weird,” he said. “It should be more like, ‘Let’s get down and dirty,’ and not, ‘Let’s have me talk to you.’” Even when his students did get their hands on computers, the first exercises were simply retyping Python scripts from a textbook. The payoff, watching a script run without throwing an exception, was hardly satisfying to them.

Things started to click when he introduced Python turtle graphics, which gave the class more obvious evidence of their accomplishments. It also allowed Jeremy better opportunities to motivate and engage his students directly.
“Some days I would challenge them to see who could make the craziest drawing,” Jeremy says. He would tell his students, “That’s so cool. I only made a star. You definitely beat me today.”

Jeremy teaches both history and computer science, and he finds that some of his lowest-performing history students are his best CS students. “It’s satisfying to see them in their element,” he says. In Jeremy's view a computer science class can touch a student's intellect just as deeply as history. “People are multifaceted. You’re not only who you are when you’re in my history class.”

Jeremy's computer science class was cancelled this spring; the students at Bronx Compass High School are behind on history credits and there are only three history teachers on staff. For now, computer science is merely an elective, so Jeremy is back teaching history full-time. “I really miss teaching CS,” he says.

If he resumes the course, Jeremy thinks it must be livelier. He is reconsidering his use of the videos from How To Think Like a Computer Scientist, which he studied last summer on the recommendation of his mentor Shannon Bradshaw, MongoDB’s vice president of education. The content helped Jeremy train to teach CS, but when he showed the videos to his kids they were bored. Jeremy hopes to make new videos that will draw them in. His students from the fall semester say he should get their advice. Otherwise, they warned, “you might do something that you think is cool but it’s actually super corny.”

A head start

Although there is no computer science elective this semester, some students are pursuing the topic in other ways. A young woman from his class in the fall, Tatyana Camacho, now interns for the high school’s IT department. I had quoted her in my previous article, and Jeremy tells me she loved it. She commanded him to show her father in the next parent-teacher conference: “You need to show my dad that I’m one of the advanced students.”

Jeremy still runs the afternoon Computer Club.
I visited the club to meet a student, Daniel Rodriguez, who was tinkering with an Arduino and a circuit board that the school provided. “I don't have the ability to get this equipment otherwise, in my predicament,” says Daniel. He starts his Arduino projects by copying examples. The wiring is easier than the coding for him, he told me, "especially because I'm not the best speller in the world." Once he has the example working he modifies it to his own taste. Most recently, he wanted to show a message but, with only LEDs, he can’t display much. He researched Morse Code and made a light flash the code for “HELLO”, like any programmer demonstrating a system for the first time.

“Most people think that once you plug something in, that’s it, it works,” says Daniel. “But I’m the person that makes the circuit run. I tell people, ‘I made it do that.’ And seeing them fascinated by what I did, it makes me, in turn, fascinated by what I'm doing.”

Daniel has to return the Arduino at the end of the year. Next year he’ll go to a trade school for electricians. Working with the Arduino will give him an advantage, he hopes, and it seems plausible to me. As he finishes school and starts work as an electrician the world will be changing around him: smart appliances and programmable components will be everywhere. An electrician who loves to code will have a big head start.

Making anything they want

Timothy Chen teaches in Hell’s Kitchen, at Urban Assembly Gateway School for Technology. I visited his class in May to see how his students had progressed since I last saw them in September. They were involved in a multi-week project called the AP Create Task, part of a national Advanced Placement exam. “They are allowed to make literally anything they want, in any language,” says Tim. Students submit their code and a one-minute video of the program in action, and they may describe their project either in writing or in audio narration.
I was surprised by how 21st Century the test is, and how accommodating it could be to students with a deficit in reading and writing. It must be remarkably difficult, however, to score fairly. The Create Task is many students’ first time scoping and integrating a sizable project, and there were flameouts. One young man tried to make a maze game drawn with ASCII characters; it proved too ambitious and he ran out of time. Tim isn’t supposed to help students define the scope of their projects, but if they announce they’re going to tackle something difficult he will push them to list all the components. In the best case, they realize they don’t know how to do most of the project and choose something simpler. One of Tim’s students, Jahseem Maxwell, was building a Go Fish card game in Python, and she was having trouble integrating the pieces. “It has to be a certain order and it's hard to make that order when you don't know, really, what you’re doing. I’m struggling, putting it all together.” Another student, Cecilia Gonzalez, was writing a Choose Your Own Adventure game. She says the AP Create Task encourages students to work in pairs. “We work sort of together but not exactly.” Each must create at least one significant part of the program independently. Cecilia’s game is based on a monster of urban legend called The Rake, which comes closer when you think about it. The game begins by asking questions such as the player’s name and height. She told me the player’s answers will determine “some things that are going to happen,” but she didn’t give away any spoilers. When Tim began teaching the class in September, he hadn’t written the ending yet, either. His greatest fear was his students would learn the curriculum faster than he could write it. By May it was clear that wasn’t a problem. “Some of the students can’t read very well, and that was a big barrier because all the things I made were text,” he says. 
“Everything just took longer than expected.”

Problem solving

How do you teach problem solving? This is Tim’s great unanswered question from the year. Perhaps if high school computer science were taught like math, as a series of small problems with only one right answer each, then how to solve those problems wouldn't be such a mystery. But high school CS is taught like art class. Tim’s students invent new projects and somehow solve the unpredictable problems that arise in them.

Tim speculates that he would learn problem solving himself by watching an experienced programmer solve a new problem, hit roadblocks, and overcome them. Indeed, that is how I have taught problem solving to MongoDB interns. Together, we attack problems without knowing the answers beforehand. It requires an entire summer of one-on-one collaboration. “I don't think that model works very well with the kids,” says Tim, “especially if they are not very good with sitting still for an extended period. I'm not sure how to reach them.”

Falling in love

Tim, like Jeremy, wants to make more multimedia to reach students despite their poor reading skills. “I want to rethink how it should be done before I start this time. I kind of jumped into it too quickly.”

Tim’s main goal is to give kids the chance to fall in love with programming and continue on their own. Many other goals are still out of reach: students at his school score low on the AP test, and few of them are likely to get a college degree in CS or be professional coders. Still, Tim hopes that a more varied course, with audio and video, could bring students farther. “The big hurdle for everyone is teaching problem solving. If I can get that, everything else is easy. I'm still trying to figure out how to do that.”

October 3, 2017

When Switching Projects, Check your Assumptions or Risk Disaster

On January 10, I released a badly broken version of the MongoDB C Driver, libmongoc 1.5.2. For most users, that version could not connect to a server at all! Luckily, in under 24 hours a developer reported the bug, and I reverted the mistake and released a fix. Although it was resolved before it did any damage, this is among the most dramatic mistakes I've made since I switched from the PyMongo team to libmongoc almost two years ago. My error stemmed from three mistaken assumptions I've had ever since I changed projects. What were they?

Inception

Here's how the story began. In December, a libmongoc user named Alexey pointed out a longstanding limitation: it would only resolve hostnames to IPv4 addresses. Even if IPv6 address records existed for a hostname, the driver would not look them up: when it called getaddrinfo on the hostname to do the DNS resolution, it passed AF_INET as the address family, precluding anything but IPv4.

So if you passed the URI mongodb://example.com, libmongoc resolved "example.com" to an IPv4 address like 93.184.216.34 and tried to connect to it. If the connection timed out, the driver gave up. The driver did support IPv6 connections, but the only way to use them was to pass in an IPv6 address like this one:

    mongodb://[2606:2800:220:1:248:1893:25c8:1946]

...which makes the driver call getaddrinfo with the raw address string and AF_INET6, resulting in an IPv6 address.

To fully support IPv6, libmongoc should call getaddrinfo with AF_UNSPEC instead. Then getaddrinfo would return a list of both IPv6 and IPv4 addresses for "example.com", to which the driver should try to connect, each in turn, until a connection succeeds. Let's look at some code and see why the driver didn't yet achieve this standard.
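Before turning to the driver's own code, the effect of the address-family hint can be illustrated with a short Python sketch. This is not libmongoc code: the helper name `resolved_families` is made up for this example, and it resolves a numeric address with `AI_NUMERICHOST` so that the outcome does not depend on real DNS records.

```python
import socket


def resolved_families(host, family, port=27017):
    """Return the set of address families getaddrinfo yields for host.

    `family` is the hint: AF_INET (IPv4 only), AF_INET6 (IPv6 only),
    or AF_UNSPEC (any family). An empty set means the hint ruled out
    every address.
    """
    try:
        results = socket.getaddrinfo(
            host, port, family, socket.SOCK_STREAM,
            flags=socket.AI_NUMERICHOST,  # parse literals only, no DNS lookup
        )
    except socket.gaierror:
        return set()
    # Each result is (family, type, proto, canonname, sockaddr).
    return {result[0] for result in results}


if __name__ == "__main__":
    for hint in (socket.AF_INET, socket.AF_INET6, socket.AF_UNSPEC):
        print(hint.name, resolved_families("127.0.0.1", hint))
```

An IPv4 literal resolves under the AF_INET and AF_UNSPEC hints but yields nothing under AF_INET6, which is the same kind of restriction the AF_INET hint placed on hostnames.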
Here is where the address family, either IPv4 or IPv6, is determined in the code that parses the MongoDB URI:

    const char *host;

    /* extract the host substring of a URI */
    host = mongoc_uri_parse_hostname (uri);
    if (strchr (host, ':')) {
       family = AF_INET6; /* means "IPv6 only" */
    } else {
       family = AF_INET; /* means "IPv4 only" */
    }

You can see that if the driver finds any ":" characters in the host string it sets the family to IPv6, for which ":" is a required component of an address. Otherwise it uses IPv4, where the “:” is prohibited in both hostnames and addresses.

Later, in a different function, in a different file, the address family is used for DNS resolution:

    /* host is "example.com", port is 27017,
     * address_family came from the code above */
    int
    mongoc_connect (const char *host, uint16_t port, int address_family)
    {
       int sock;
       struct addrinfo hints = { 0 };
       struct addrinfo *result;

       hints.ai_family = address_family;

       /* hostname resolution:
        * get a list of IP addresses for "host" */
       getaddrinfo (host, port, &hints, &result);

       /* connect to first IP address */
       sock = socket (result[0].ai_family,
                      result[0].ai_socktype,
                      result[0].ai_protocol);

       if (mongoc_socket_connect (sock) == -1) {
          /* give up */
          return -1;
       }

       return sock;
    }

As Alexey pointed out, this logic prevented us from translating a hostname like "example.com" into an IPv6 address. He submitted a one-line patch for the URI-parsing code:

     if (strchr (host, ':')) {
        family = AF_INET6;
     } else {
    -   family = AF_INET; /* means "IPv4 only" */
    +   family = AF_UNSPEC; /* means "any family" */
     }

This change looked good to me: given a hostname, libmongoc should try to connect over IPv6, IPv4, or whatever getaddrinfo gives us. Hannes, libmongoc's other maintainer, also reviewed the patch. It was simple and it passed our tests, so we accepted the change and decided to include it in our upcoming release, version 1.5.2.

This one-line change broke the driver.
Assumption one: Reality matches imagination

Just from looking at Alexey's patch, I might have realized that it had only accomplished half an improvement. It switched the address family from AF_INET to AF_UNSPEC, so now the driver would resolve a hostname to a list of IPv6 and IPv4 addresses. But it did not try connecting to each address until one succeeded; it only tried the first. If the first address was IPv6 and the MongoDB server was only listening on IPv4, the whole procedure failed.

I assumed that libmongoc already implemented a loop to try each address in turn, because my understanding of libmongoc was biased from time spent developing PyMongo's code. I knew that, from its early days, PyMongo implemented the following loop:

    # Resolve IPv6 and IPv4 addresses, try each until one succeeds.
    for result in getaddrinfo(host, port, AF_UNSPEC, SOCK_STREAM):
        family, socktype, proto, canonname, sockaddr = result
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(connect_timeout)
            sock.connect(sockaddr)
            # No exception was thrown, success.
            return sock
        except socket.error as e:
            pass

    raise Exception("couldn't connect")

(I am omitting many details, of course.) PyMongo had "always" worked this way, as far as I knew. You passed it a hostname and PyMongo resolved it to a list of addresses, then tried to connect to each address until one succeeded. I imagined that libmongoc must, too.

Imagining code is a necessary step when you take over a large project: you will never read all the code. But if you do not distinguish code you imagined from the code you've seen with your own eyes, then you become prone to making decisions without evidence. When I reviewed Alexey's patch, I only saw a few lines surrounding the change. I imagined that a loop similar to PyMongo's was in libmongoc, in a different function in another file, and in my complacency, I never verified that what I imagined was truly so.
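The gap between the behavior I imagined and the behavior that existed can be made concrete with a small Python simulation. This is a sketch, not the code of either driver: `connect_first_only` and `connect_any` are hypothetical names, the address list is hard-coded rather than resolved, and `ipv4_only_server` is a stand-in for a MongoDB server that listens only on IPv4.

```python
import socket

# Stand-in for what an AF_UNSPEC lookup of "localhost" might return,
# IPv6 first. Hard-coded so the example needs no network.
ADDRESSES = [
    (socket.AF_INET6, ("::1", 27017)),
    (socket.AF_INET, ("127.0.0.1", 27017)),
]


def ipv4_only_server(family, sockaddr):
    """Hypothetical connect function: succeeds only over IPv4."""
    if family != socket.AF_INET:
        raise ConnectionRefusedError("nothing listening on %s" % sockaddr[0])
    return "connected to %s" % sockaddr[0]


def connect_first_only(addresses, connect):
    """Mimic the broken behavior: try the first address, then give up."""
    family, sockaddr = addresses[0]
    return connect(family, sockaddr)


def connect_any(addresses, connect):
    """Try each address in turn until one connection succeeds."""
    last_error = OSError("no addresses to try")
    for family, sockaddr in addresses:
        try:
            return connect(family, sockaddr)
        except OSError as exc:
            last_error = exc  # remember the failure, move to the next address
    raise last_error
```

Against the IPv4-only stand-in, `connect_first_only(ADDRESSES, ipv4_only_server)` raises `ConnectionRefusedError`, while `connect_any(ADDRESSES, ipv4_only_server)` falls back to 127.0.0.1 and succeeds.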
Maybe it's because we love technology so much that, when many of us imagine a machine, we assume each part of it has been fully and beautifully developed. William Gibson, a founding cyberpunk writer, said:

    It wasn't until I could finally afford a computer of my own that I found out there's a drive mechanism inside — this little thing that spins around. I'd been expecting an exotic crystalline thing, a cyberspace deck or something, and what I got was a little piece of a Victorian engine that made noises like a scratchy old record player.

I notice this kind of imagination among programmers all the time. An intern asked me last year, "Where are all the MongoDB error codes documented?" It didn't occur to her that we haven't yet documented them all. Or, a colleague asked me the other day, "What philosophy governs which MongoDB drivers implement which features?" He assumed a profound principle where, in many cases, the answer was that not all drivers are completed. When we sketch in our minds how something unfamiliar must have been done, we don't sketch some parts as unfinished; we imagine a completed whole.

I had assumed so confidently that libmongoc worked the way I imagined, that I never verified it. The best defense against this assumption is to remember which code you've imagined and which you've actually read.

Assumption two: Ordinary features don't need special tests

I consider IPv4 very ordinary, and in contrast I think IPv6 is exotic. I've been obsessed with ensuring the latter is tested thoroughly. Last year, before the events of this story, I had said, "Let's make sure we run MongoDB with IPv6 enabled in Evergreen so our libmongoc tests always exercise the IPv6 code." On every one of our menagerie of supported platforms, I made sure we started the MongoDB server with --ipv6: Mac, Windows, Linux, ARM processors, MinGW, Solaris, IBM s390x mainframes, and on and on. Then, for each platform, I added a test that connects to localhost by its IPv6 address, mongodb://[::1].

"That takes care of that," I thought. Unfortunately I'd just punched a different hole in the test matrix. We could introduce a bug in IPv4 support and never know it. I'd assumed that since IPv4 is so ordinary, it needed no special test. It would be immediately obvious if it stopped working.

Do you remember when Richard Feynman demonstrated, with a piece of rubber and a cup of ice water, that the Challenger shuttle had been destroyed because its O-rings were too stiff to ensure a seal when they got cold? (Did you know that it was astronaut Sally Ride who first uncovered this flaw? She slipped the evidence in secret to another scientist, to avoid reprisal from NASA.) There are many painful lessons from that accident, but a big one was this: NASA engineers were so obsessed with the extreme heat their O-rings must withstand, they ignored the chill of a typical winter morning on Earth. Like me, they had guarded against the exotic threat, but it was the ordinary one that led to destruction.

When I merged the IPv6 patch I was oblivious to the threat of breaking IPv4 support. Everything seemed fine: On my machine and Hannes's, the driver kept working. All the tests passed. We released libmongoc 1.5.2 and celebrated with some whiskey. (I'm in New York and Hannes is in Palo Alto, so we had to toast each other virtually.)

But all was not well. Remi Collet, a developer at Red Hat who maintains the libmongoc package in Fedora, tested our release a few hours after we announced it. On his test rig, Remi does not run MongoDB with IPv6 enabled. He immediately found that libmongoc 1.5.2 cannot connect to mongodb://localhost if MongoDB is only listening on IPv4. He filed CDRIVER-1988 and I diagnosed the problem soon after. Although 1.5.2 tries IPv6 first, if it can't connect, it doesn't fall back to IPv4. It was obvious, but it required someone outside our team to notice it.

I had assumed that if I broke such an ordinary feature the bug would not go undetected.
The best defenses against this assumption are working with an outsider with an independent perspective, and thorough testing for even the most ordinary features.

Assumption three: Any bug that can be fixed, should be fixed

When I took over libmongoc more than a year ago, I was accustomed to my previous, luxurious position on the PyMongo team. PyMongo is mature: it's been well-staffed for years and it has few open bugs, if any. Therefore, whenever a bug is reported, it is usually a recent bug. If someone finds a bug in PyMongo 3.2, we probably introduced the bug in 3.2 when we implemented some new feature. We can diagnose it easily and release a fix in PyMongo 3.2.1, returning our bug count to zero. Since 2015, we have typically resolved PyMongo bugs within weeks:

But libmongoc is not like PyMongo. It's less mature and, until recently, had only a fraction of the programmer-hours devoted to it. Besides, as everyone knows, C is harder than Python. Therefore, there were periods when the average age of the issues we resolved was more than two months.

Hannes and I have made steady headway reducing the libmongoc bug count and resolving its urgent bugs. Some old bugs still linger, however, and I have to be disciplined about when to fix them. We cannot squash every known bug with every release. The benefit of fixing each bug must be assessed against the risk of creating a worse one.

The other adjustment I had to make was in my understanding of Semantic Versioning. When I worked on PyMongo, I understood it like this:

This is too ambitious for libmongoc. Rather than fixing all the bugs in every patch release, my new policy is: "Fix new bugs. Leave other bug fixes for the next minor release." Here's my adjusted understanding:

Patch releases should be minimally risky. With few exceptions, they should repair some part of the code that we unwittingly broke while adding features in the last minor version.
To put it more bureaucratically, a patch release should fix only "regressions."

When Alexey's pull request came in, I forgot my own policy: I included his change in a patch release. The bug it fixed wasn't new; we should have waited for 1.6.0. We might still have released a broken driver then, but the risk was more appropriate for a minor version bump. Users have the right to expect we'd minimize risk in a patch release.

It's as if I'd made a beautiful stack of pebbles and, just before I finished, my ambition got the better of me and I tried to balance just one more tiny pebble at the apex. This was the moment of failure. This is when I turned my back on my precariously balanced tower and, seconds later, it collapsed. This is when I screwed up the libmongoc 1.5.2 release.

I assumed that we should fix every known bug as soon as possible. The best defense against this assumption is to be disciplined: put off most changes for the next minor version.

Redemption

I made a significant mistake in releasing libmongoc 1.5.2, but Remi Collet's prompt bug report saved us. It helped that Remi had an independent perspective. It helped, too, that I've invested time improving our release automation over the last few months, so it only took minutes to revert the buggy patch and publish libmongoc 1.5.3. With the wisdom gained from this painful episode, I'm less likely to fall prey to my mistaken assumptions again.
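The policy from assumption three, "Fix new bugs. Leave other bug fixes for the next minor release," can be written down as a small rule. The Python sketch below is only an illustration of that rule, not a tool the libmongoc team uses; `parse_version` and `target_release` are hypothetical names, and it assumes plain MAJOR.MINOR.PATCH version strings.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def target_release(latest_release, bug_introduced_in):
    """Pick the release that should carry a bug fix.

    A regression (a bug introduced within the latest minor series)
    goes into the next patch release. Anything older waits for the
    next minor release.
    """
    major, minor, patch = parse_version(latest_release)
    bug_major, bug_minor, _ = parse_version(bug_introduced_in)
    if (bug_major, bug_minor) == (major, minor):
        return "%d.%d.%d" % (major, minor, patch + 1)
    return "%d.%d.0" % (major, minor + 1)


if __name__ == "__main__":
    # "1.0.0" is a placeholder for "some release before the 1.5 series".
    print(target_release("1.5.1", "1.0.0"))
    print(target_release("1.5.1", "1.5.0"))
```

Under this rule, a fix for a limitation that predates the 1.5 series is routed to 1.6.0, while a bug introduced in 1.5.0 is fixed in 1.5.2.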

March 30, 2017

Investing In CS4All: Training Teachers and Helping Them Build Curricula

Until last year, Jeremy Mellema was a history teacher. Now, he's teaching computer programming. When I visited his class in the Bronx this month, he had 30 students with 30 MacBooks, completing exercises in Python. They had just finished a lesson on data types, and now they were tackling variables. In Jeremy's class, the first variable assignment is:

    tupac = "Greatest of All Time!!"

Computer science for all

A year ago, New York City mayor Bill de Blasio announced Computer Science for All, an $80 million public-private partnership. The goal is to teach computer science to every student at every public school. But first, the schools need curricula and 5000 teachers need training.

Here at MongoDB, our VP of Education Shannon Bradshaw oversees MongoDB University, which trains IT professionals to use MongoDB. When he heard about CS4All, he wanted us to contribute. He proposed that we set aside budget for two paid fellowships, and recruit public school teachers to spend the summer with us. We would develop them as teachers, and help build curricula they could take back into schools this fall. MongoDB staff would share our expertise, our office space, our equipment, and the MongoDB software itself.

Shannon pitched his proposal to the company like this: "As many of us know, it’s still unusual for students to encounter computer science, let alone databases, in their classrooms before entering college. I believe this absence directly contributes to the gender and racial disparity we see today across our industry." The CS4All project improves access to these subjects for many more students in our city, and MongoDB could be part of it from the beginning.

The teachers we hired for the summer are at opposite ends of a spectrum. At the more technical end is Tim Chen. He majored in Math and took some CS classes in college. When he applied to work for the NYC Department of Education, there was a job opening for a Tech Teacher. “I didn’t know what 'tech' meant at the time,” he said, “but I lucked out because it was teaching really basic software engineering.” His first year teaching, he taught the 9th grade curriculum for the Software Engineering Program. Now, he's an experienced CS teacher, and his students this fall are comparatively advanced programmers. He spent his summer at MongoDB outlining an ambitious two-year curriculum in Python and JavaScript.

Jeremy Mellema, on the other hand, was teaching 10th grade world history when he was tapped to teach a CS class last year. He took an intensive training with the NYC Department of Education's Software Engineering Program, where he spent a day apiece working in Python, HTML, CSS, JavaScript, Scratch, and Arduino. That fall was a struggle. "It was pretty overwhelming," he said. Jeremy is a skilled teacher, but on the topic of software, he "just didn’t know a whole lot."

Jeremy joined us this summer to meet software engineers and learn to code more like a professional. His presence is particularly valuable because he's not a specialist: he'll ensure the curriculum he builds can be taught by other teachers who aren't already coders. Shannon assigned him the book How to Think Like a Computer Scientist, and he spent the summer turning its contents into something high schoolers can learn from. He told me, "I was excited to actually get my hands dirty with what it means to be a programmer and learn how to do things besides basic Scratch programming, drag and drop, and to see how you actually use it in the real world."

Back to school

Now that school is back in session, I'm watching Tim and Jeremy's progress with fascination. Their students come to CS from a different angle than I did: I have the typical background of a computer programmer in this country, with college-educated parents and all the privileges that typically pave the way to a career in software. I taught myself to code in high school and got a Bachelor's in CS from Oberlin College.
But the students in Tim and Jeremy's classes nearly all come from low-income families and qualify for federal meal assistance. These students have self-selected into the computer track, and they are mostly boys; but there are a half-dozen girls in both classes. The majority are Black or Hispanic, and Jeremy's class also includes many recent immigrants from the Caribbean and Bangladesh. If young people like them succeed as software engineers, it will go a long way to addressing the inequalities of our industry and our society as a whole.

But the goal for Tim and Jeremy this year is more modest. They will prove and refine their curricula in their classrooms; then the materials can be used by any teacher in the New York City public schools. I visited their classes to watch their plans being put to the test.

The Bronx: Jeremy's class

When I saw Jeremy at Bronx Compass High School, he'd had the students for one week. He was nervous the month before about how class would go: "I’m always afraid it’s going to bomb, or the students won’t find it interesting." But the kids were hacking enthusiastically. Watching them reminded me of my own joy when I first learned to code.

Class time is spent working independently while Jeremy checks in with each student. The kids present a huge range of skills: some have taken computer science classes for several years, and others have just moved to New York and are getting their first exposure to CS. Jeremy worries about challenging all his students according to their level.

An advanced student, Tatyana, is able to use this wide skill range to make the class more effective for the novice students. I noticed her sitting sideways in her chair so she could coach the students to her side and behind her. “It's nice to help people who struggle with coding because it's like a mixture of math and grammar,” she said. “When you write 'print', if you write it with a capital P your code won't work. It's like teaching them a new language.”
Tatyana plans to keep coding after high school: "I like making things work the way I want them to work."

Since my formal CS education began in college, I've never sat in a high school programming class. To my surprise, 30 students can be trusted to stay on task, even when they're on laptops with the whole Internet in reach. Jeremy doesn't mind when they take detours to the Web. He said, "If they just need YouTube to play music to drown out their classmates that's fine." I saw his student Amaury with headphones on and YouTube playing in the background in order to concentrate, the same way I use music at the MongoDB office. Jeremy gives him space to work in his own style: "It's fine. As soon as I ask Amaury a question, his headphones are off and he answers."

I asked Amaury to take off his headphones for a minute and tell me how the class was going. "It's cool learning how a computer works using just a bunch of freaking inputs. Put in a couple lines of code and you can have the computer do crazy stuff." Then his headphones went back on and he got back to hacking.

By the end of this year, Jeremy hopes his kids will advance from sandboxed programming environments to real-world tools, and that they'll install Python on their computers and code on their own. “The kids I’ve seen really do well,” he said, “this will open the door for them to take the world by storm. There’s a lot of really smart and talented kids who are not in an advantaged place. This will put them at an advantage.”

Hell's Kitchen: Tim's class

Tim Chen teaches a few blocks from the MongoDB office, at Urban Assembly Gateway School for Technology on the west side of Manhattan. His students have specialized in CS more than Jeremy's, and he has more time with them: 45 minutes a day, 5 days a week. His curriculum is accordingly more ambitious. This year he's teaching introductory programming in Python, aligned with the upcoming version of the Advanced Placement test in Computer Science Principles.
Beginning with Python makes entry to the subject easy for Tim's students, but by the end of the year they'll cover nearly every topic of a first semester college class. Next year they’ll switch to JavaScript. They'll build web applications with MongoDB, Express, and Node. These three technologies (along with Angular.js) make up the famous MEAN Stack, which Tim learned this summer from MongoDB University.

When I asked Tim how his curriculum performed in its first week, he said there'd been no surprises yet, but he's still in the process of writing it. Only the first semester's material is complete. "My greatest fear is that they’ll move faster than I can create it," he said. Still, he expects his students to be forgiving if they hit any bumps. "I told them this is my first year trying this out, so we're going to try things, and if it doesn't work out we're going to try different things. They're cool with it."

Time to scale up

Teaching one class in Hell's Kitchen and one in the Bronx is only the start. Tim wants to create a curriculum that any teacher can pick up and deliver to their students. Jeremy has the same goal, and because of his background as a history teacher, rather than a computer scientist, he is focused on making his course effective for teachers like him.

For Shannon, this teacher-training is the most unexpected, and the most inspiring, aspect of our involvement. “Most of the people who will teach computer science in the New York City public schools are transitioning from another discipline,” he said. “They’re never going to be hardcore software engineers. They’re professional teachers; that’s what they want to do.” But in nine years, computer science will be a core class in New York City, and the existing staff will have to teach it. "If we can contribute to figuring that out, how to make this transition as effective as possible for computing education in New York City, then we’ve gone miles beyond where I imagined us going."

October 6, 2016

Cat-Herd's Crook: How YAML test specs improve driver conformance

At MongoDB we write open source database drivers in ten programming languages. We also help developers in our community replicate our drivers' behavior in even more (and more exotic) languages. Ideally, all drivers behave alike; or, where they differ, the differences are written down and justified. How can we herd all these cats along the same path?

For years we failed. Each false start at standardization left us more discouraged. But we've recently gained momentum on standardizing our drivers. Human-readable, machine-testable specs, coded in YAML, prove which code conforms and which does not. These YAML tests are the Cat-Herd's Crook: a tool to guide us all in the same direction.

Dozens of cats

Our drivers for the ten programming languages we support are, for the most part, independent rewrites. We implement the same API, the same algorithms, the same wire protocol from scratch each time. Once each driver is released, we maintain its separate code as MongoDB evolves and adds features.

All these official drivers don't even encompass the whole effort. Programmers in our community tackle driver development in other languages that are newer or more exotic. Sometimes we come across some driver already fully-formed, published by someone we don't even know. Of course, these third-party drivers reuse code as rarely as we do. They're rewrites of the same ideas, over and over again.

I hear you thinking: "How wasteful! All those drivers should just be thin wrappers of the C Driver." But the effort pays off: any programming language you might reasonably use has a language-native, idiomatic driver for MongoDB.

There is, however, a problem: unintentional differences among the drivers. Let us set aside the topic of bugs. All drivers have them, but this article is not about bugs. It's about driver authors making reasonable choices that differ. Without intending to, we vary. Sometimes we don't even know what the variations are.
"Local threshold" A couple years ago, Jesse and Samantha specified a simple way for MongoDB drivers to load-balance queries across servers. It was easy enough to build, but, to our exasperation, not all drivers implemented it the same. Say you want to load-balance your queries between two MongoDB servers in a replica set, but not if the network latency to one of them is much greater than the other's. We consider the scenario in which one of them is a 10 ms round trip from your application, and the other is 20 ms: Jesse specified how drivers should do this balancing act: they should respect a "local threshold." The closest server is always eligible for queries, and so are all servers not more than 15 ms farther than it. (This logic applies when you configure the driver for secondary reads or "nearest" reads, see the manual for a specific driver like PyMongo for details.) Some driver authors understood local threshold as Jesse intended. In the example here, the driver should spread load equally between the 10 ms and the 20 ms server, because the distant server is less than 15 ms farther than the near server: But several driver authors misinterpreted the spec to mean, "query the nearest server, or any server within a 15 ms round-trip from the application". So in this case, the server 20 ms away is completely excused from load-balancing. The driver only uses the nearest server: This misinterpretation had two consequences: Consequence Number One: Some drivers didn't offer the trade-off we'd decided upon, between low-latency and load-balancing. Instead, in situations like this example, all you got was low-latency. But maybe that's not so bad. You might argue that's a reasonable way to implement a driver. Which leads us to the real issue: Consequence Number Two: You install a driver and you don't know which behavior it implements. Unintentional differences like this cost our customer support team, make our drivers hard to document, and make them hard to learn. 
The inconsistency wasn't due to a lack of specification. We wrote it down clearly!

"Determine the 'nearest' member as the one with the quickest response to a ping. Choose a member randomly from those at most 15ms 'farther' than the nearest member."

This is why it's so annoying. Everyone read the first sentence, but not everyone carefully read the second. After all, reading English is boring. As a result, some drivers shipped with this hidden variation that lasted for months or, in one case, more than a year before we knew that they weren't up to spec. Did we have to let it linger?

Why?

Why did these hidden, unintentional differences last for so long? There are two causes:

Too hard to verify

Jesse and Samantha were unable or unwilling to read dozens of implementations of "local threshold." We don't know dozens of programming languages, and we didn't want to make a career of ensuring this one idea was coded consistently. And "local threshold" is just one of hundreds of features that MongoDB drivers must all implement the same.

Non-conformism

We also balked at the social discomfort of forcing our colleagues to conform to our idea. Perhaps at a more starchy, proprietary company, this isn't a problem. The boss says, "Do this!" and everyone steps into line. But at MongoDB there's a collegial, open-source vibe. Although there eventually is a final outcome to any debate, enforcing that is uncomfortable. Samantha and Jesse weren't eager to be the enforcers. The authors of other specs besides "local threshold" weren't eager to be enforcers, either.

But the drivers team never gave up. We tried to unify our drivers by a variety of methods. The first three failed.

False starts

Reference implementation

Since prose wasn't rigorous enough, our next idea was to describe the "local threshold" algorithm in code.
We used Python, since it is famously legible:

    import random

    def choose_member(servers):
        # Round trip time of the nearest server.
        best_ping_time = min(
            server.ping_time for server in servers)
        # Keep servers at most 15 ms farther than the nearest.
        filtered_servers = [
            server for server in servers
            if server.ping_time <= best_ping_time + 15]
        return random.choice(filtered_servers)

This is certainly simple. It can even be unit-tested! But despite this reference implementation, unintentional differences persisted in our drivers. Why? Having a reference implementation doesn't prove that every other language implementation matches it. Besides, it is just as hard to read code as English. A reference implementation does have the advantage that it is less ambiguous, but that is the only advantage. Varying implementations lingered.

Tests in prose

We had a better idea: We could write tests!

"Given servers with round trip times of 10ms and 20ms, the driver reads from both equally."

But again, we could not prove that everyone implemented the tests, or implemented them the same. A test plan in prose is a substantial step in the right direction. Its main weakness is maintainability: over time, the specification evolves. We find problems with it, or improve it. We can broadcast those changes by updating tests or adding them, but we can't prove that a driver has updated its tests in Java and Python and Haskell to match the new English tests. There isn't enough time in the world for one person to read dozens of test suites in dozens of programming languages and verify all are up to date.

Cucumber

We needed automated, cross-language tests. After evaluating some tools we tried Cucumber. Cucumber is a "behavior-driven development" tool. It was designed for coders to show to clients and say, "Do we agree that this is what I'm going to implement?" The client can read the test because it resembles English.
It looks something like this:

    Feature: Local Threshold

      Scenario: Nearest
        Given a replica set with 2 nodes
        And node 0 has latency 10
        And node 1 has latency 20
        When I track server latency on all nodes
        And I query with localThresholdMS 15
        Then the query occurs on either node

The syntax is funky. And there is still ambiguity here: what does "when I track server latency on all nodes" mean?

But Cucumber does have an advantage: At least the tests are automated! To automate a test, you write a pattern-matcher that matches each kind of statement in it, like "node 0 has latency 10". Then you implement the statement in terms of driver operations in your programming language. We planned to do that work in C, C++, Python, JavaScript, and so on, and then verify all the drivers matched the expected outcome: "the query occurs on either node." If this worked, we would have proven all drivers implemented "local threshold" the same.

Writing these pattern-matchers in a dozen languages sure seemed like hard work. Even worse, some languages like C didn't have a Cucumber framework at all, so we'd have had to write one. The work was daunting, and we were certain there must be a better way.

Besides, there was a cultural rebellion against Cucumber. For Ruby or Python programmers, Cucumber looked reasonable, but less so to lower-level coders. Cucumber looks absurd to a C programmer. When engineers think a tool is aesthetically offensive, no amount of debating or threatening them will lead to an enthusiastic implementation.

So we relented. But now we were without a paddle. For more than a year, we kept writing specs, and we made a solid effort to manually verify that everyone implemented them correctly. But we didn't trust that all the drivers were the same, because we couldn't mechanically test them.

But we did find a way out.
Our solution was to write tests for our specs in YAML.

Tests in YAML

What's a YAML?

YAML is Yet Another Markup Language. Actually, its inventors changed their minds: YAML Ain't Markup Language. It's a data language. The distinction is revealing: unlike HTML, say, which marks up plain text, YAML code is pure data. It borrows syntax from several programming languages such as C, Python, and Perl. Therefore, unlike Cucumber, YAML feels like neutral ground for programmers in any language. It doesn't provoke an aesthetic revolt.

Also unlike Cucumber, most languages have YAML parsers. For those that do not, we convert our YAML tests into JSON.

Apart from its universality, YAML is appealing for other reasons. Large data structures are more readable in YAML than in JSON. YAML has comments, unlike JSON. And, since YAML is a superset of JSON, we can embed JSONified MongoDB data directly into a YAML test.

YAML is also factorable! For example, here are the descriptions Samantha wrote in YAML for servers with round trip times of 10 and 20 ms. We will use them to write a standard test of "local threshold":

    server_definitions:
      # A near server.
      - &server_one
        address: "a"
        avg_rtt_ms: 10
      # A farther server.
      - &server_two
        address: "b"
        avg_rtt_ms: 20

YAML's killer feature is aliasing: you can name an object using the ampersand, like "&server_one" above, then reference it repeatedly:

    # A two-node replica set.
    servers:
      - *server_one
      - *server_two

Think of &server_one as "address-of" and *server_one as "dereference", like pointers in C. This lets us DRY our test specs.

Testing "local threshold" in YAML

With our two servers declared, we can define a test of "local threshold". Both servers should be eligible for a query:

    operation: query
    # Expected output: both servers are eligible for a query.
    eligible_servers:
      - *server_one
      - *server_two

There are about 40 YAML tests in this format that test different aspects of the drivers' load-balancing behavior.
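For languages without a YAML parser, the tests are converted to JSON, where each alias is simply written out in full. As a rough sketch of what consuming such a converted test might look like, the Python below loads a JSON rendering of the test above and checks it against a toy selection function. The JSON layout is an assumption based on the YAML shown here, and `select_eligible` and `run_test` are hypothetical stand-ins for a driver's real server selection code and harness.

```python
import json

# Assumed JSON rendering of the YAML test, with aliases expanded.
TEST_JSON = """
{
  "servers": [
    {"address": "a", "avg_rtt_ms": 10},
    {"address": "b", "avg_rtt_ms": 20}
  ],
  "operation": "query",
  "eligible_servers": [
    {"address": "a", "avg_rtt_ms": 10},
    {"address": "b", "avg_rtt_ms": 20}
  ]
}
"""


def select_eligible(servers, local_threshold_ms=15):
    """Hypothetical stand-in for a driver's selection logic: keep servers
    at most local_threshold_ms farther away than the nearest one."""
    nearest = min(server["avg_rtt_ms"] for server in servers)
    return [server["address"] for server in servers
            if server["avg_rtt_ms"] <= nearest + local_threshold_ms]


def run_test(test_json):
    """Return True if the selected servers match the expected ones."""
    scenario = json.loads(test_json)
    expected = sorted(s["address"] for s in scenario["eligible_servers"])
    actual = sorted(select_eligible(scenario["servers"]))
    return expected == actual


print(run_test(TEST_JSON))  # True
```

The test data carries both the inputs and the expected output, so the only language-specific part is the small function that feeds one into the code under test and compares the result with the other.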
We ship a README with the tests that describes the data structure, how driver tests should interpret it, and how to validate a driver's output against the expected output.

Here's an edited version of the Python driver's "local threshold" test harness. It uses the driver's real Topology class to represent the set of servers and their round trip times, and the driver's real function for selecting a server. But the test doesn't use the network: the inputs it needs are supplied by YAML data.

    from pymongo.topology import Topology
    from pymongo.server_selectors import (any_server_selector,
                                          apply_local_threshold)

    def run_scenario(scenario_def):
        servers = scenario_def["servers"]
        addresses = [server["address"] for server in servers]
        topology = Topology(addresses)

        # Simulate connecting to servers and measuring
        # round trip time.
        for server in servers:
            topology.on_change(server)

        # Create server selector.
        if scenario_def["operation"] == "query":
            selector = query_server_selector
        else:
            # ... other operations we can test ...
            pass

        eligible = topology.select_servers(selector)
        expected_eligible = [
            server["address"]
            for server in scenario_def["eligible_servers"]]
        assert expected_eligible == eligible

We run this harness with our 40 YAML files to test that "local threshold" and related features are all up to spec in the Python driver. Besides these 40 tests, there are other suites, in slightly different structures, that test other specifications of other aspects of the drivers.

With this method there's still some work that has to be done for each driver. All the driver authors write "harnesses" to run each spec's YAML tests. This duplicated work is unavoidable: the drivers are separate code bases in different programming languages, and they have to be tested in different ways. But the harnesses are concise little bits of code, and we are significantly deduplicating: we design tests once, and we maintain one set of tests for all drivers.
The initial effort of reimplementing the YAML test harness in each driver gives that driver access to the shared suite, and keeps paying off forever after.

The payoff

On occasion, the spec changes. Jesse or Samantha or another driver engineer publishes an update that improves the spec and its tests. Most of the time, new tests broadcast spec changes very precisely: the drivers update their copies of the YAML files, they fail the new tests, and then they bring their implementations back up to spec.

Sometimes a detail can fall between the cracks. For example, we might specify that drivers should track an extra piece of information about the database servers' state, and add tests that show the expected value of this variable, but forget to update a few drivers' test harnesses to actually assert the driver's value matches the expected value. Still, such lapses are far better contained now than before we started using YAML tests. We're closing the gaps that allow for misinterpretation, and we're making the duplicated effort as small as possible.

Brave new world

Ever since we began using YAML tests, our specs and our drivers have been improving rapidly.

Better implementations

The usual arguments in favor of Test Driven Development apply to our YAML tests: writing a driver (or adapting one) so it can be tested this way leads toward neatly factored interfaces, cleaved along the same conceptual borders as our specifications themselves.

Accountability

If your driver passes the tests, it is up to spec; otherwise you have to fix your code. It's no longer the spec author's responsibility to review all the drivers to catch mistakes.

Additionally, YAML tests have a way of ending debate. In the past, driver authors might ignore a spec, or remain attached to the different choices that remain in their own favorite code. These alternative ideas are usually excellent, but they are not the spec.
Jesse, Samantha, and other spec authors hesitated to enforce their decisions, but a YAML test doesn't care. The layer of indirection that Samantha introduced into the verification process, by publishing YAML tests for our feature, reduced the emotional friction caused when a spec made our colleagues change their favorite code.

Encourages more specs

Writing a spec is time-consuming and often frustrating. When Samantha, Jesse, or any of the other driver engineers at MongoDB writes a spec, we have to learn the problem deeply, anticipate our users' future needs, then try to get dozens of eccentric coders to agree on one way to code a solution. All this hard work deserves a satisfying outcome, but very often the outcome was discouraging. Some specs were never implemented by all drivers, or implemented inconsistently. "Leading programmers is like herding cats" is a cliché because it's true.

But now, with our YAML testing system in place, we know that our hard work will pay off. The specs we write will be implemented correctly, so we're motivated to write more specs and our standardization process is accelerating. Our specs actually work now.

Cats cooperating

Our success with the new spec process lets us dream of a greater ambition: organizing the open source community to build standard MongoDB drivers. The YAML tests are a pathway for outside contributors to write high-quality drivers that are consistent with ones we publish. We could imagine the day when outside code is proven as trustworthy as our own.

We're not there yet. A lot of the knowledge and discussion about how drivers interact with the MongoDB server is still internal, and it's hard for an outside developer to catch up on the debates and access the institutional knowledge of our engineering team. But we can at least see the way forward now; the spec tests are a powerful accelerant in that direction.

March 2, 2016