njs blog (https://vorpus.org/blog/)

Why I'm not collaborating with Kenneth Reitz
2019-05-04T10:00:00-07:00, by Nathaniel J. Smith
tag:vorpus.org,2019-05-04:/blog/why-im-not-collaborating-with-kenneth-reitz/

<p>[Content Warning: parts of this post could be triggering for those who have experienced gaslighting or other forms of abuse.]</p>
<p>Kenneth Reitz is a famous Python developer, best-known for founding the <a href="https://pypi.org/project/requests/">Requests</a> project. Until a few years ago, I'd never interacted with him in any serious way, but I thought highly of him.</p>
<p>I appreciated (and still appreciate) his design taste, and emphasis on usability and beauty. Requests is a piece of critical infrastructure that holds up large parts of the software world; his puckish insistence that it's actually an "art project" appealed to my anarchist sympathies. I admired (and still admire) his <a href="https://www.kennethreitz.org/essays/mentalhealtherror-an-exception-occurred">openness with his mental health struggles</a>. When I was starting my project Trio, I wanted to emphasize its friendliness and accessibility, so I borrowed his "for humans" tagline, and started the documentation by <a href="https://github.com/python-trio/trio/blob/37de153f858e29df3a19db9fffcd0fb3f2308951/docs/source/index.rst">quoting him</a>.</p>
<p>Then I started working on adding async support to requests.</p>
<h1>A timeline of async in Requests, and the Requests 3 fundraiser</h1>
<p>In August 2017, I <a href="https://github.com/python-trio/urllib3/issues/1">started working</a> on adding async support to urllib3, which is the underlying HTTP library that Requests uses. This was <em>highly</em> experimental. Historically, the state of the art was that you had one HTTP client library for synchronous code, one <a href="https://github.com/twisted/treq">for Twisted</a>, one <a href="https://www.tornadoweb.org/en/stable/httpclient.html#tornado.httpclient.AsyncHTTPClient">for Tornado</a>, one <a href="https://aiohttp.readthedocs.io/en/stable/client.html">for asyncio</a>, and so on – each maintained as independent projects that didn't share code. Everyone knows this is silly, but it's very challenging to fix: you need deep expertise in HTTP, <em>and</em> in all these different approaches to networking, <em>and</em> some clever idea for how to reconcile their seemingly irreconcilable APIs. So every previous attempt had failed. Now I thought I had a clever idea, so I gave it a try.</p>
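<p>To give a flavor of what "sharing code" between sync and async clients can look like, here's a toy sketch of one well-known technique: maintain a single async codebase and mechanically derive the synchronous version from it. This is purely illustrative; the function names are made up, and it is not urllib3's actual code, nor necessarily the idea referred to above.</p>

```python
# Toy "unasync" sketch (illustrative only, not urllib3's code): write
# the client logic once with async/await, then mechanically strip the
# async keywords to produce a synchronous copy of the same code.
import re

# Hypothetical async client code, as a string for demonstration.
ASYNC_SOURCE = '''
async def fetch(conn, request):
    await conn.send(request)
    return await conn.receive_response()
'''

def unasync(source: str) -> str:
    # Token-level substitutions that turn the async code into sync code.
    source = re.sub(r"\basync def\b", "def", source)
    source = re.sub(r"\bawait ", "", source)
    return source

print(unasync(ASYNC_SOURCE))
```

<p>In a real project the generated sources would be written out as a parallel package at build time, so both flavors stay in lockstep from one codebase.</p>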
<p>Reitz was very interested in this work, because he very much wanted async support in Requests, but – as he told me – didn't know how to solve these problems himself. We had some video calls and IRC discussions, and he tried to <a href="https://twitter.com/kennethreitz/status/904399257804922881">leverage his notoriety to recruit volunteers and send them my way</a>. Nothing much came of this, but I kept plugging away, along with some other Trio contributors.</p>
<p><img src="https://vorpus.org/blog/why-im-not-collaborating-with-kenneth-reitz/requests3-features.png" alt="Screenshot of Reitz's fundraising page, showing the promised features in Requests 3" width="50%" align="right"></p>
<p>Then on March 7 2018, he <a href="https://twitter.com/kennethreitz/status/971523729632178176">announced</a> that work had begun on "Requests 3", that its headline feature would be the native async/await support I was working on, and that he was seeking donations to make this happen.</p>
<p>Most open-source projects struggle to raise a few thousand dollars to hold a meeting, but this got people excited. He was deluged with donations from both individuals and large companies like Microsoft, Google, Slack, etc., and the fundraiser total quickly reached ~$30k.</p>
<p>On March 15 2018, he contacted me to talk about the fundraiser. He told me he was uncertain what to do with this amount of money – he said his original goal was just to raise $5k to buy a computer. Privately, I was skeptical that the $5k computer had anything to do with Requests. Requests is a small pure-Python library; if you want to work on it, then any cheap laptop is more than sufficient. $5k is the price of a beefy server or <a href="https://techbuyersguru.com/best-5000-gaming-pc-build-ultra-extreme-4k-threadripper-may-2019">top-end gaming rig</a>. But I figured that even if he spent $5k of the money on some unrelated computer, we could call that compensation for his past work, and it would still leave ~$25k to fulfill the promises he'd made in the fundraiser. And this was clearly a great opportunity to build some amazing new stuff. So I didn't say anything about the computer.</p>
<p>Instead, I gave some general tips from my experience with fund-raising and grants, emphasizing the importance of transparency to maintain trust, and recommending he set up a <a href="https://en.wikipedia.org/wiki/Fiscal_sponsorship">fiscal sponsorship</a> relationship with the Python Software Foundation (PSF) or a similar non-profit. And I tried to help with finding ways to spend the money effectively – for example, I was already working full-time, but I contacted one of the volunteers who'd been helping me to see if they were available for a contracting gig.</p>
<p>Around this time, he also did some experiments with our work-in-progress on urllib3, which led to a <a href="https://twitter.com/kennethreitz/status/974963822682427393">tweet</a> demonstrating "Requests Core" issuing multiple HTTP requests in parallel. "Requests Core" here was a snapshot of our work that he forked and renamed. As far as I know, the only thing added was some basic HTTP/2 support, but unfortunately (and despite our warnings beforehand) this used a dead-end approach, so the code wasn't useful.</p>
<p>Up to this point, there were definitely some odd features in our interactions, but, you know, people are odd sometimes. I personally wouldn't have announced a fundraiser without first talking to the people actually working on the features I was promising, but I was confident we could find some way to spend the money effectively. Maybe his HTTP/2 code wasn't useful, but at least he was getting some experience with async/await. I thought it would work out OK.</p>
<p>Over the next few months, there were some more odd things – different members of the Requests maintainers team reported hearing very different stories about what was happening to the money. But the big change came in late May 2018, when I <a href="https://vorpus.org/blog/a-farewell-to-the-berkeley-institute-for-data-science/">left UC Berkeley</a> and started consulting. This seemed like a potential win-win – I was looking for work and excited about the project, and he was stuck with money he had no way to spend. So I sent him an email to explore further.</p>
<p>After a month and several follow-up pings, he finally responded. His main points were:</p>
<ul>
<li>He actually only raised $28k.</li>
<li>"Most of it" went to taxes.</li>
<li>He expected me to do the work of fulfilling the commitments he'd made for new features in Requests 3.</li>
<li>But none of the money was available to fulfill those commitments; instead, he was going to wait for me to implement the new features for him, and then he needed the entire $28k to pay for writing documentation for my features.</li>
<li>If I couldn't fulfill his commitments on a volunteer basis, he encouraged me to hold my own fundraiser.</li>
</ul>
<p>He ended by suggesting we do a call that week to discuss details.</p>
<p>I was bewildered. That's not how taxes work. It's not how commitments work. The idea that novel technology stacks are free but a few pages of docs cost $28k is bizarre. The idea that you can't afford to implement new features because you're going to spend the money on documenting the new features you can't afford to build... it doesn't make any sense at all.</p>
<p>If he'd found another way to use the money on Requests, then I would have been totally happy. I didn't have any claim on the money. But this was something else entirely. I was extremely concerned. But I still wanted to get the best outcome we could for the project and the community, so I tried to keep the lines of communication open. I agreed that a call would be a good idea, and suggested some times. I also expressed my worry that he was risking his reputation – more strongly this time – and reiterated my offer to help, writing: "I think right now there is a real risk that requests 3 never materializes and the public impression becomes "oh yeah Kenneth Reitz stole that money". I really hope neither of these things happens. But hope isn't a plan. I think we need a plan."</p>
<p>At this point he stopped answering my emails, and deleted the fundraising page – the one with the record of donations received, and what he was promising in return – from his website (<a href="https://web.archive.org/web/20180701201816/https://www.kennethreitz.org/requests3">before</a> / <a href="https://web.archive.org/web/20180715115213/https://www.kennethreitz.org/requests3">after</a>). He also updated the <a href="https://github.com/kennethreitz/requests/commit/f4818b2010ce0846e9ddc85f439b80303a395494#diff-caf2a6b8f4947d018f68893c695b5202L86">Requests</a> <a href="https://github.com/kennethreitz/requests/commit/704abf13325355497fc2df73259fb24615c237b9">documentation</a> and his blog (<a href="https://web.archive.org/web/20180405031054/https://www.kennethreitz.org/essays/call-for-sponsors-requests-30-development">before</a> / <a href="https://web.archive.org/web/20180810042517/https://www.kennethreitz.org/essays/call-for-sponsors-requests-30-development">after</a>) to remove references to the deleted page. Some months later, he put up a <a href="https://web.archive.org/web/20180916141516/https://www.kennethreitz.org/requests3/">new page at the original URL</a>, requesting that anyone who had questions about the fundraiser should contact him privately.</p>
<p>Our only contact since then was an email he sent me out of the blue on February 9 this year. Instead of responding to anything I'd said before, he suggested that he and I write a joint grant proposal to the PSF, to pay me to do the same work that his fundraiser was allegedly funding. Of course this was a non-starter. I'm pretty sure the PSF is too savvy to fund something like this without asking some tough questions about where the other money went. And even if they didn't, and even if we somehow ignored the ethical issues, he was effectively asking me to link our reputations together, so that if his handling of the fundraiser blew up, it would implicate me as well. I didn't reply.</p>
<h1>Was it an honest mistake?</h1>
<p>Not everyone is familiar with standard practices for handling fundraising in open-source projects. So as a comparison, let me explain how the Python Software Foundation's <a href="https://wiki.python.org/psf/PackagingWG">Packaging Working Group</a> handled the <a href="https://pyfound.blogspot.com/2017/11/the-psf-awarded-moss-grant-pypi.html">funding for the new PyPI</a>.</p>
<p>Since this was our first time getting an external grant like this, we started by making a plan for what to do and who would do it, including identifying existing contributors who were available to work as contractors. Only after that was in place did we apply for the money.</p>
<p>Then after the money arrived, we didn't just hand it over. Each of the contractors wrote up a few paragraphs to formally state their rates and what they were committing to, the group reviewed them, and then we held a quick vote over email to approve them. The contractors who were members of the Working Group didn't vote on their own proposals. Everyone provided regular invoices. And the whole process was ultimately overseen by the PSF's Board of Directors, who are elected by the community.</p>
<p>This is a pretty lightweight process, and it isn't infallible, but it provides a baseline level of transparency and accountability. And the PSF is happy to provide this service for any Python-related project; for example, they handle donations for <a href="https://palletsprojects.com/blog/donate/">Flask and related projects</a>.</p>
<p>Perhaps Reitz simply didn't know how these things are normally done, and this is all an unfortunate but understandable mistake. However, I find this unlikely. At the time Reitz ran his fundraiser, he was sitting on the PSF Board of Directors. And as a member of the Packaging Working Group, he participated in the voting for the PyPI funding, which happened a few months before he started his fundraiser. And yet, none of the PSF staff I've talked to knew about his fundraiser until I told them about it.</p>
<p>In short: He chose a fundraiser structure that avoids standard accountability mechanisms he was familiar with. He never had any plan or capability to deliver what he promised. And when I offered a way for him to do it anyway, he gave me some bafflegab about how expensive it is to write docs. Effectively, his public promises about how he would use the Requests 3 money were lies from start to finish, and he hasn't shown any remorse or even understanding that this is a problem.</p>
<p>A betrayal of trust like this damages the entire community. It's hard enough raising money for open-source as it is; this kind of thing <em>really</em> doesn't help.</p>
<p>And on a more personal level, I felt his interactions with me were extremely manipulative. I felt like he tried to exploit me, and that he tried to make me complicit in covering up his lies to protect his reputation. I was extremely uncomfortable with the idea of going along with this, but he created a situation where my only other options were to either give up on working on async entirely, or else to go public with the whole story, at potentially serious cost to myself.</p>
<h1>Was this a one-off mistake, or part of a larger pattern?</h1>
<p>I wasn't sure what to do, so I started quietly contacting other community members to get more context. I quickly discovered that contrary to Reitz's public reputation, every time I talked to anyone who had worked with him directly, they expressed serious discomfort with him, and many had their own disturbing stories – mine was nowhere near the worst. For example, <a href="https://github.com/sigmavirus24">Ian Stapleton Cordasco</a> volunteered to go on the record publicly, stating: "Having to deal with Kenneth all these years has made it such that I barely work on python open source software anymore and have largely, quietly left the community".</p>
<p>Something I found especially disturbing: whenever I talked to any of his long-term collaborators about my experience, they immediately jumped to reassure me that I wasn't going crazy. Which... I mean, I appreciated the support. But it was clear this wasn't the first time they'd had to do this. Apparently after people start working with Reitz, they always need to be reassured that they can trust their own perceptions. These collaborators have been doing this for so long that this seems normal to them. But it's not normal.</p>
<p>This is the classic "missing stair" problem. Those in the inner circle quietly work around the toxic person. Outsiders come in blind. I'm pretty well-connected in the Python world, and I came in blind. In retrospect, I can see some warning signs. The insistence on <a href="https://en.wikipedia.org/wiki/Auteur">auteur</a> status now seems less like a charming quirk, and more like a calculated bluff to claim credit and power while denying responsibility. An <a href="https://twitter.com/kennethreitz/status/997485342579220482">insistence on "positivity"</a> is a common tactic among those who want to avoid accountability. But they fooled me.</p>
<p>Something I keep thinking about: the first time I talked to him about async in Requests, months before the fundraiser, he made a strange comment: he pointed out that he was totally dependent on me to implement this, and therefore, if I were to demand that he make Requests use Trio (my library) by default instead of AsyncIO (the better-known competitor), then he'd have no choice but to acquiesce. It struck me as an incredibly strange thing to bring up – it was almost like he was asking me to manipulate him. At the time, I mumbled something about wanting to succeed on the merits, not by blackmail, and recommended that he not set a default at all. In retrospect, I'm reminded of how con artists often start by tempting their victims into some minor unethical act, so that as the con escalates they feel trapped.</p>
<p>His collaborators also consistently cited his bipolar disorder as an excuse for whatever he did. I think this is deeply unfair to Reitz, and to everyone struggling with mental health issues. <a href="https://medium.com/@tmcolon/its-a-mental-illness-not-an-excuse-to-be-an-a-hole-db0c909d14f9">Illness does not erase the harm someone does to others, or their responsibility for their actions.</a> Many people manage their conditions without causing this kind of harm, and when they mess up, they make amends, just like the rest of us. If someone can't do that, then as a community, we can have compassion but shouldn't give them power and influence.</p>
<p>I think a lot of people don't realize how little Reitz actually has to do with Requests development. For many years now, actual maintenance has been done almost exclusively by other volunteers. If you look at the <a href="https://pypi.org/project/requests/">maintainers list on PyPI</a>, you'll see he doesn't have PyPI rights to his own project, because he kept breaking stuff, so the real maintainers insisted on revoking his access. If you clone the Requests git repo, you can run <code>git log requests/</code> to see a list of <a href="https://github.com/kennethreitz/requests/commits/master/requests">every time someone changed the library's source code</a>, either directly or by merging someone else's pull request. The last time Reitz did either was in May 2017, when he made some whitespace cleanups.</p>
<p>At least as far as commits go, his main contributions since then appear to consist of merging some small doc fixes, and monetizing the project by adding <a href="https://github.com/kennethreitz/requests/commit/86914e2ddab3da351e484a3211406962ce1922c3">donation links</a>, <a href="https://github.com/kennethreitz/requests/pull/4779">ads</a>, <a href="https://github.com/kennethreitz/requests/commit/63e7748fe502f44d112486d81a0da8cf38f36455">intrusive sponsored links</a>, etc. All of this money goes directly into his pocket, not to the project's maintainers.</p>
<p>I also learned that he has a history of selling premium support contracts for Requests, where he took the money and then delegated the actual work to unpaid volunteers.</p>
<p>I don't have any objection to trying to make money from open-source. I've written before about how <a href="https://vorpus.org/blog/the-unreasonable-effectiveness-of-investment-in-open-source-infrastructure/">open-source doesn't get nearly enough investment</a>. I do object to exploiting volunteers, driving out community members, and lying to funders and the broader community. Reitz has a consistent history of doing all these things.</p>
<h1>Why am I writing this?</h1>
<p>I've struggled to decide what to do here. Since last year, I've tried to be very cautious when speaking to people about this, because I don't want to start false rumors or feed an internet mob. (This has also meant keeping quiet about the work we've been doing on async in urllib3, and made it difficult for me to work on it at all.) And I'm scared of how making this public might affect my own reputation and mental health.</p>
<p>Ultimately, I decided to speak out because I care deeply about the Python community and its members. If one of our community's most prominent members freely lies to donors and harms volunteers, and if we all let that go without saying anything, then that puts everything we've built together at risk. And I'm in a better position than many to speak up.</p>
<p>So what happens now?</p>
<p>Since this is the internet, I have to say explicitly: Please do not harass or abuse Reitz. That's never appropriate. (And if you're the kind of person who doesn't find moral arguments convincing, consider: he clearly wants attention.)</p>
<p>I call on Reitz to make a public accounting of the money he raised and how it was spent.</p>
<p>I urge the Requests project maintainers to transition their project to a more normal, less dysfunctional governance model. You can acknowledge his contributions without buying into his personal mythology. His insights are not irreplaceable. You know this situation is harming you and your users. You and your users are more important than his ego.</p>
<p>Beyond that, I'm going to focus on my own work. I'm done keeping secrets to protect Reitz from the consequences of his actions; what happens next is up to him and the larger Python community.</p>
<p>If anyone needs a listening ear, I can be reached at <a href="mailto:njs@pobox.com">njs@pobox.com</a>. I'm also around at PyCon this weekend.</p>
<h2>Edit history</h2>
<ul>
<li><strong>2019-05-04</strong>: Initial post.</li>
<li><strong>2019-05-06</strong>: The original post quoted several anonymous community members, with the goal of further illustrating the climate created by Reitz's behavior. I received feedback that the anonymous quotes weren't adding to constructive discussion. On consideration, I agree, so I removed them. I kept all the text describing things I experienced personally, as well as the one credited quote.</li>
</ul>

Beautiful tracebacks in Trio v0.7.0
2018-09-10T12:00:00-07:00, by Nathaniel J. Smith
tag:vorpus.org,2018-09-10:/blog/beautiful-tracebacks-in-trio-v070/

<p><a href="https://trio.readthedocs.io/">Trio</a> is a new async concurrency
library for Python that's obsessed with correctness and usability.</p>
<p>On the correctness side, one of Trio's unique features is that it
never discards exceptions: if you don't catch an exception, then
eventually it will propagate out the top of your program and print a
traceback to help you debug, just like in regular Python. Errors
should never pass silently!</p>
<p>But... in Trio v0.6.0 and earlier, these tracebacks also contained a
lot of clutter showing how the exception moved through Trio's internal
plumbing, which made it difficult to see the parts that were relevant
to your code. It's a small thing, but when you're debugging some nasty
concurrency bug, it can make a big difference to have exactly the
information you need, clearly laid out, without distractions.</p>
<p>And thanks to some
<a href="https://github.com/python-trio/trio/pull/612">hard</a>
<a href="https://github.com/python-trio/trio/pull/631">work</a>
<a href="https://github.com/python-trio/trio/pull/640">by</a> <a href="https://github.com/belm0">John
Belmonte</a>, the just-released Trio v0.7.0
gives you exactly that: clean tracebacks, focused on your code,
without the clutter. See below for some before/after comparisons.</p>
<p>Before Trio, I never really thought about where tracebacks came from,
and I certainly never changed how I wrote code because I wanted it to
produce a different traceback. Making useful tracebacks is the
interpreter's job, right? In the process, we had to study how the
interpreter manages tracebacks, <a href="https://github.com/python-trio/trio/pull/612#issuecomment-414201886">how they interact with context
managers</a>,
<a href="https://github.com/python-trio/trio/pull/634/files">how to introspect stack usage in third-party
libraries</a>, and
other arcane details ... but the results are totally worth it.</p>
<p>To me, this is what makes Trio so fun to work on: our goal is to make
Python concurrency an order of magnitude friendlier and more
accessible than it's ever been before, and that means we're constantly
exploring new design spaces, discovering new things, and figuring out
new ways to push the limits of the language.</p>
<p>If that sounds like fun to you too, then we're always <a href="https://trio.readthedocs.io/en/latest/contributing.html">looking for
contributors</a>.
And don't worry, you don't need to be an expert on tracebacks or
concurrency – the great thing about inventing something new is that we
get to figure it out together!</p>
<p>Or, just scroll down to check out our new tracebacks. They're so
pretty! 🤩</p>
<h1>Simple example</h1>
<p>Here's the simplest possible crashing Trio program:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">trio</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s2">"whoops"</span><span class="p">)</span>
<span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</code></pre></div>
<p>With previous Trio versions, this code gave us a traceback like:</p>
<div class="highlight"><pre><span></span><code><span class="gt">Traceback (most recent call last):</span>
File <span class="nb">"error-example.py"</span>, line <span class="m">6</span>, in <span class="n"><module></span>
<span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">1277</span>, in <span class="n">run</span>
<span class="k">return</span> <span class="n">result</span><span class="o">.</span><span class="n">unwrap</span><span class="p">()</span>
File <span class="nb">".../site-packages/outcome/_sync.py"</span>, line <span class="m">107</span>, in <span class="n">unwrap</span>
<span class="k">raise</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span>
File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">1387</span>, in <span class="n">run_impl</span>
<span class="n">msg</span> <span class="o">=</span> <span class="n">task</span><span class="o">.</span><span class="n">context</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">task</span><span class="o">.</span><span class="n">coro</span><span class="o">.</span><span class="n">send</span><span class="p">,</span> <span class="n">next_send</span><span class="p">)</span>
File <span class="nb">".../site-packages/contextvars/__init__.py"</span>, line <span class="m">38</span>, in <span class="n">run</span>
<span class="k">return</span> <span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">970</span>, in <span class="n">init</span>
<span class="bp">self</span><span class="o">.</span><span class="n">entry_queue</span><span class="o">.</span><span class="n">spawn</span><span class="p">()</span>
File <span class="nb">".../site-packages/async_generator/_util.py"</span>, line <span class="m">42</span>, in <span class="n">__aexit__</span>
<span class="k">await</span> <span class="bp">self</span><span class="o">.</span><span class="n">_agen</span><span class="o">.</span><span class="n">asend</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span>
File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">366</span>, in <span class="n">step</span>
<span class="k">return</span> <span class="k">await</span> <span class="n">ANextIter</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_it</span><span class="p">,</span> <span class="n">start_fn</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">202</span>, in <span class="n">send</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_invoke</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_it</span><span class="o">.</span><span class="n">send</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">209</span>, in <span class="n">_invoke</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">fn</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">317</span>, in <span class="n">open_nursery</span>
<span class="k">await</span> <span class="n">nursery</span><span class="o">.</span><span class="n">_nested_child_finished</span><span class="p">(</span><span class="n">nested_child_exc</span><span class="p">)</span>
File <span class="nb">"/usr/lib/python3.6/contextlib.py"</span>, line <span class="m">99</span>, in <span class="n">__exit__</span>
<span class="bp">self</span><span class="o">.</span><span class="n">gen</span><span class="o">.</span><span class="n">throw</span><span class="p">(</span><span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">traceback</span><span class="p">)</span>
File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">202</span>, in <span class="n">open_cancel_scope</span>
<span class="k">yield</span> <span class="n">scope</span>
  File <span class="nb">".../site-packages/trio/_core/_multierror.py"</span>, line <span class="m">144</span>, in <span class="n">__exit__</span>
    <span class="k">raise</span> <span class="n">filtered_exc</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">202</span>, in <span class="n">open_cancel_scope</span>
    <span class="k">yield</span> <span class="n">scope</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">317</span>, in <span class="n">open_nursery</span>
    <span class="k">await</span> <span class="n">nursery</span><span class="o">.</span><span class="n">_nested_child_finished</span><span class="p">(</span><span class="n">nested_child_exc</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">428</span>, in <span class="n">_nested_child_finished</span>
    <span class="k">raise</span> <span class="n">MultiError</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_pending_excs</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">1387</span>, in <span class="n">run_impl</span>
    <span class="n">msg</span> <span class="o">=</span> <span class="n">task</span><span class="o">.</span><span class="n">context</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">task</span><span class="o">.</span><span class="n">coro</span><span class="o">.</span><span class="n">send</span><span class="p">,</span> <span class="n">next_send</span><span class="p">)</span>
  File <span class="nb">".../site-packages/contextvars/__init__.py"</span>, line <span class="m">38</span>, in <span class="n">run</span>
    <span class="k">return</span> <span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
  File <span class="nb">"error-example.py"</span>, line <span class="m">4</span>, in <span class="n">main</span>
    <span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s2">"whoops"</span><span class="p">)</span>
<span class="gr">RuntimeError</span>: <span class="n">whoops</span>
</code></pre></div>
<p>It's accurate, and I guess it shows off how hard Trio is working on
your behalf, but that's about all I can say for it – all the stuff our
users care about is drowned in the noise.</p>
<p>But thanks to John's fixes, Trio v0.7.0 instead prints:</p>
<div class="highlight"><pre><span></span><code><span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"error-example.py"</span>, line <span class="m">6</span>, in <span class="n">&lt;module&gt;</span>
    <span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/trio/_core/_run.py"</span>, line <span class="m">1328</span>, in <span class="n">run</span>
    <span class="k">raise</span> <span class="n">runner</span><span class="o">.</span><span class="n">main_task_outcome</span><span class="o">.</span><span class="n">error</span>
  File <span class="nb">"error-example.py"</span>, line <span class="m">4</span>, in <span class="n">main</span>
    <span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s2">"whoops"</span><span class="p">)</span>
<span class="gr">RuntimeError</span>: <span class="n">whoops</span>
</code></pre></div>
<p>Three frames, straight to the point. We've removed almost all of
Trio's internals from the traceback. And, for the one line that we
can't remove (due to Python interpreter limitations), we've rewritten
it so you can get a rough idea of what it's doing even when it's
presented out of context like this. (<code>run</code> re-raises the main task's
error.)</p>
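<p>The underlying technique is general: given any exception, you can filter its traceback down to the frames your users actually care about. Here's a minimal sketch using only the stdlib <code>traceback</code> module. (This illustrates the general idea, not Trio's actual implementation; <code>user_visible_frames</code> and the <code>is_internal</code> predicate are hypothetical names for the example.)</p>

```python
# Sketch of the general technique (not Trio's real code): filter an
# exception's traceback down to the frames users care about.
import traceback

def user_visible_frames(exc, is_internal):
    # extract_tb() returns a list of FrameSummary objects; keep only
    # frames whose filename the caller-supplied predicate rejects as
    # library-internal.
    return [
        frame for frame in traceback.extract_tb(exc.__traceback__)
        if not is_internal(frame.filename)
    ]
```

<p>For Trio-style cleanup, <code>is_internal</code> might be something like <code>lambda fn: "site-packages/trio" in fn</code>, and the surviving frames can be re-rendered with <code>traceback.StackSummary.from_list(...).format()</code>.</p>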
<h1>A more complex example</h1>
<p>Here's a program that starts two concurrent tasks, which both raise
exceptions simultaneously. (If you're wondering what this "nursery"
thing is, <a href="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/">see this earlier
post</a>.)</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">trio</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">crasher1</span><span class="p">():</span>
    <span class="k">raise</span> <span class="ne">KeyError</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">crasher2</span><span class="p">():</span>
    <span class="k">raise</span> <span class="ne">ValueError</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_nursery</span><span class="p">()</span> <span class="k">as</span> <span class="n">nursery</span><span class="p">:</span>
        <span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">crasher1</span><span class="p">)</span>
        <span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">crasher2</span><span class="p">)</span>

<span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</code></pre></div>
<p>Hope your scroll wheel is ready, because here's what old versions of
Trio printed for this:</p>
<div class="highlight"><pre><span></span><code><span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"error-example.py"</span>, line <span class="m">14</span>, in <span class="n">&lt;module&gt;</span>
    <span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">1277</span>, in <span class="n">run</span>
    <span class="k">return</span> <span class="n">result</span><span class="o">.</span><span class="n">unwrap</span><span class="p">()</span>
  File <span class="nb">".../site-packages/outcome/_sync.py"</span>, line <span class="m">107</span>, in <span class="n">unwrap</span>
    <span class="k">raise</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">1387</span>, in <span class="n">run_impl</span>
    <span class="n">msg</span> <span class="o">=</span> <span class="n">task</span><span class="o">.</span><span class="n">context</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">task</span><span class="o">.</span><span class="n">coro</span><span class="o">.</span><span class="n">send</span><span class="p">,</span> <span class="n">next_send</span><span class="p">)</span>
  File <span class="nb">".../site-packages/contextvars/__init__.py"</span>, line <span class="m">38</span>, in <span class="n">run</span>
    <span class="k">return</span> <span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">970</span>, in <span class="n">init</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">entry_queue</span><span class="o">.</span><span class="n">spawn</span><span class="p">()</span>
  File <span class="nb">".../site-packages/async_generator/_util.py"</span>, line <span class="m">42</span>, in <span class="n">__aexit__</span>
    <span class="k">await</span> <span class="bp">self</span><span class="o">.</span><span class="n">_agen</span><span class="o">.</span><span class="n">asend</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span>
  File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">366</span>, in <span class="n">step</span>
    <span class="k">return</span> <span class="k">await</span> <span class="n">ANextIter</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_it</span><span class="p">,</span> <span class="n">start_fn</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
  File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">202</span>, in <span class="n">send</span>
    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_invoke</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_it</span><span class="o">.</span><span class="n">send</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
  File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">209</span>, in <span class="n">_invoke</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">fn</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">317</span>, in <span class="n">open_nursery</span>
    <span class="k">await</span> <span class="n">nursery</span><span class="o">.</span><span class="n">_nested_child_finished</span><span class="p">(</span><span class="n">nested_child_exc</span><span class="p">)</span>
  File <span class="nb">"/usr/lib/python3.6/contextlib.py"</span>, line <span class="m">99</span>, in <span class="n">__exit__</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">gen</span><span class="o">.</span><span class="n">throw</span><span class="p">(</span><span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">traceback</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">202</span>, in <span class="n">open_cancel_scope</span>
    <span class="k">yield</span> <span class="n">scope</span>
  File <span class="nb">".../site-packages/trio/_core/_multierror.py"</span>, line <span class="m">144</span>, in <span class="n">__exit__</span>
    <span class="k">raise</span> <span class="n">filtered_exc</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">1387</span>, in <span class="n">run_impl</span>
    <span class="n">msg</span> <span class="o">=</span> <span class="n">task</span><span class="o">.</span><span class="n">context</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">task</span><span class="o">.</span><span class="n">coro</span><span class="o">.</span><span class="n">send</span><span class="p">,</span> <span class="n">next_send</span><span class="p">)</span>
  File <span class="nb">".../site-packages/contextvars/__init__.py"</span>, line <span class="m">38</span>, in <span class="n">run</span>
    <span class="k">return</span> <span class="n">callable</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
  File <span class="nb">"error-example.py"</span>, line <span class="m">12</span>, in <span class="n">main</span>
    <span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">crasher2</span><span class="p">)</span>
  File <span class="nb">".../site-packages/async_generator/_util.py"</span>, line <span class="m">42</span>, in <span class="n">__aexit__</span>
    <span class="k">await</span> <span class="bp">self</span><span class="o">.</span><span class="n">_agen</span><span class="o">.</span><span class="n">asend</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span>
  File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">366</span>, in <span class="n">step</span>
    <span class="k">return</span> <span class="k">await</span> <span class="n">ANextIter</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_it</span><span class="p">,</span> <span class="n">start_fn</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span>
  File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">202</span>, in <span class="n">send</span>
    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_invoke</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_it</span><span class="o">.</span><span class="n">send</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
  File <span class="nb">".../site-packages/async_generator/_impl.py"</span>, line <span class="m">209</span>, in <span class="n">_invoke</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">fn</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">317</span>, in <span class="n">open_nursery</span>
    <span class="k">await</span> <span class="n">nursery</span><span class="o">.</span><span class="n">_nested_child_finished</span><span class="p">(</span><span class="n">nested_child_exc</span><span class="p">)</span>
  File <span class="nb">"/usr/lib/python3.6/contextlib.py"</span>, line <span class="m">99</span>, in <span class="n">__exit__</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">gen</span><span class="o">.</span><span class="n">throw</span><span class="p">(</span><span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">traceback</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/_core/_run.py"</span>, line <span class="m">202</span>, in <span class="n">open_cancel_scope</span>
    <span class="k">yield</span> <span class="n">scope</span>
  File <span class="nb">".../site-packages/trio/_core/_multierror.py"</span>, line <span class="m">144</span>, in <span class="n">__exit__</span>
    <span class="k">raise</span> <span class="n">filtered_exc</span>
<span class="gr">trio.MultiError</span>: <span class="n">KeyError(), ValueError()</span>
<span class="x">Details of embedded exception 1:</span>
<span class="x">  Traceback (most recent call last):</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 202, in open_cancel_scope</span>
<span class="x">      yield scope</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 317, in open_nursery</span>
<span class="x">      await nursery._nested_child_finished(nested_child_exc)</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 428, in _nested_child_finished</span>
<span class="x">      raise MultiError(self._pending_excs)</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 1387, in run_impl</span>
<span class="x">      msg = task.context.run(task.coro.send, next_send)</span>
<span class="x">    File ".../site-packages/contextvars/__init__.py", line 38, in run</span>
<span class="x">      return callable(*args, **kwargs)</span>
<span class="x">    File "error-example.py", line 4, in crasher1</span>
<span class="x">      raise KeyError</span>
<span class="x">  KeyError</span>
<span class="x">Details of embedded exception 2:</span>
<span class="x">  Traceback (most recent call last):</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 202, in open_cancel_scope</span>
<span class="x">      yield scope</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 317, in open_nursery</span>
<span class="x">      await nursery._nested_child_finished(nested_child_exc)</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 428, in _nested_child_finished</span>
<span class="x">      raise MultiError(self._pending_excs)</span>
<span class="x">    File ".../site-packages/trio/_core/_run.py", line 1387, in run_impl</span>
<span class="x">      msg = task.context.run(task.coro.send, next_send)</span>
<span class="x">    File ".../site-packages/contextvars/__init__.py", line 38, in run</span>
<span class="x">      return callable(*args, **kwargs)</span>
<span class="x">    File "error-example.py", line 7, in crasher2</span>
<span class="x">      raise ValueError</span>
<span class="x">  ValueError</span>
</code></pre></div>
<p>Accurate, but unreadable. But now, after rewriting substantial
portions of Trio's core task management code, we get:</p>
<div class="highlight"><pre><span></span><code><span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"error-example.py"</span>, line <span class="m">14</span>, in <span class="n">&lt;module&gt;</span>
    <span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/trio/_core/_run.py"</span>, line <span class="m">1328</span>, in <span class="n">run</span>
    <span class="k">raise</span> <span class="n">runner</span><span class="o">.</span><span class="n">main_task_outcome</span><span class="o">.</span><span class="n">error</span>
  File <span class="nb">"error-example.py"</span>, line <span class="m">12</span>, in <span class="n">main</span>
    <span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">crasher2</span><span class="p">)</span>
  File <span class="nb">".../site-packages/trio/trio/_core/_run.py"</span>, line <span class="m">395</span>, in <span class="n">__aexit__</span>
    <span class="k">raise</span> <span class="n">combined_error_from_nursery</span>
<span class="gr">trio.MultiError</span>: <span class="n">KeyError(), ValueError()</span>
<span class="x">Details of embedded exception 1:</span>
<span class="x">  Traceback (most recent call last):</span>
<span class="x">    File "error-example.py", line 4, in crasher1</span>
<span class="x">      raise KeyError</span>
<span class="x">  KeyError</span>
<span class="x">Details of embedded exception 2:</span>
<span class="x">  Traceback (most recent call last):</span>
<span class="x">    File "error-example.py", line 7, in crasher2</span>
<span class="x">      raise ValueError</span>
<span class="x">  ValueError</span>
</code></pre></div>
<p>Reading from the bottom up, the two exceptions each started in their
respective tasks, then met and got bundled together into a
<code>MultiError</code>, which propagated into the main task's nursery block, and
then eventually up out of the call to <code>trio.run</code>.</p>
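<p>The bundling step itself is easy to sketch. Here's a toy, sequential version of what a nursery's exit does. (Illustration only: <code>MultiErrorSketch</code> and <code>run_children</code> are made-up names, not Trio's API, and a real nursery runs its children concurrently rather than in a loop.)</p>

```python
# Toy, sequential sketch of how a nursery-style scope collects its
# children's exceptions and re-raises them as one combined error.
class MultiErrorSketch(Exception):
    """Made-up stand-in for trio.MultiError: wraps several exceptions."""
    def __init__(self, exceptions):
        self.exceptions = exceptions
        super().__init__(", ".join(type(e).__name__ + "()" for e in exceptions))

def run_children(children):
    pending = []
    for child in children:
        try:
            child()
        except Exception as exc:
            # Don't propagate yet; remember it and keep running siblings.
            pending.append(exc)
    if pending:
        # Like the nursery's __aexit__: everything comes out together.
        raise MultiErrorSketch(pending)
```

<p>Running the two crashers from the example above through this sketch raises a combined error whose message reads <code>KeyError(), ValueError()</code>, just like the real traceback.</p>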
<p>Now when things go wrong, Trio shows you what you need to reconstruct
what happened, and nothing else.</p>
<h1>Comments</h1>
<p>You can <a href="https://trio.discourse.group/t/discussion-thread-beautiful-tracebacks-in-trio-v0-7-0/29">discuss this post on the Trio forum</a>.</p>
<h1>The unreasonable effectiveness of investment in open-source infrastructure</h1>
<p>2018-05-25 · Nathaniel J. Smith</p>
<p>In
my
<a href="/blog/a-farewell-to-the-berkeley-institute-for-data-science">last post</a>,
I gave a retrospective of my time at the UC Berkeley Institute for
Data Science (BIDS), where I've had an unusual, almost unique,
position that allowed me to focus full-time on making the open Python
ecosystem work better for scientists, and in particular I described my
work in four areas: revitalizing NumPy development, improving Python
packaging, the viridis colormap, and the Trio project to make
concurrent programming more accessible. Of course you should read and
judge for yourself, but personally I feel like this was an
extraordinary return-on-investment for BIDS and its funders: 1
headcount × 2 years = 4 different projects that wouldn't have happened
otherwise (plus two more candidate projects identified), all with
enormously broad impact across research and industry.</p>
<p>Yet curiously, all the problems that I worked on are ones that have
been well-known and widely-discussed for years. So why didn't they get
addressed before? How can there be so much low-hanging fruit? Why was
funding me so unreasonably effective?</p>
<p>I wish I could say that it's because I'm, y'know, <em>just that good</em>...
but it's not true. Instead, I'd argue that these successes followed
from some specific aspects of the position, and are replicable at
other institutions and in other communities. Specifically, I believe
that these projects all fell into a category that's <strong>mostly
inaccessible to current funding models for open (scientific)
software</strong>. Projects like this accumulate, gathering dust, because
there's no-one in a position to tackle them. This is a tragedy, but if
we can understand the reason and find a way to fix it, then we'll
unlock a tremendous opportunity for high-ROI investments.</p>
<p>The category I'm thinking of is defined by two features: it contains
projects that (1) require a modest but non-trivial amount of
sustained, focused attention, and (2) have an impact that is large,
but broad and diffuse. That combination is currently kryptonite for
open source and particularly open science. Consider the <strong>types of
labor</strong> we have available:</p>
<p>Famously, a lot of open-source development is done by <strong>volunteers
working nights and weekends, grad students playing hooky, industry
developers using "20% time" to contribute back</strong>: these are similar in
that they're all ways of scavenging small bits of time out of people's
lives. I think it's a testament to the power of the open-source model
that it can take effective advantage of scattershot contributions like
this, and these kinds of contributions can add up to make amazing
things – which makes it tempting to conclude that this kind of labor
is sufficient to solve any problem. But it's not true! There are many
problems where forty people each putting in one hour a week are
helpless, but that one person working forty hours can easily
solve. That's why none of NumPy's many volunteers built consensus on
governance or wrote a grant, why dozens of people have tried to get
rid of "jet" without success, why Python packaging remains painful
despite being used daily by millions of people, and so forth – the
inability of any individual contributor to devote enough focused,
sustained attention to get any traction.</p>
<p>Another way people contribute to OSS is <strong>as a side-effect of some
other funded work</strong>. For example, work on conda, the open-source
package management tool, is subsidized by Anaconda, the commercially
supported software distribution. Or in an academic context, an
astronomy grad student's thesis work is funded by a grant, and they
might contribute the resulting algorithms back to AstroPy. But
paradoxically, the projects I described above all have "too much"
impact to be funded this way – and in particular, their impact is too
broad and diffuse.</p>
<p>Everyone already uses NumPy and nobody owns it, so from a company's
point of view, it's very difficult to make a business case for
supporting its development. You can make a moral case, and sometimes
that can work, but I've had many conversations with companies that
ended with "You're right, we should be helping, and I really wish we
could, but..." Or for another example, before viridis, probably the
most impactful work on the colormap problem was done by engineers at
Mathworks, who created
the
<a href="https://blogs.mathworks.com/steve/2014/10/20/a-new-colormap-for-matlab-part-2-troubles-with-rainbows/">parula</a> colormap
and made it the default in MATLAB – but they had to make
it
<a href="https://blogs.mathworks.com/steve/2014/10/20/a-new-colormap-for-matlab-part-2-troubles-with-rainbows/#comment-27702">proprietary</a> to
justify their investment, which sharply limited its impact.</p>
<p>This isn't unique to industry; essentially the same dynamics apply in
academia as well. If an astronomer contributes to AstroPy, then other
astronomers can appreciate that; it might not be worth as much as
writing a proper journal article, but it's worth some disciplinary
credit, and anyway most of the work can be justified as a side-effect
of publishing a paper, thesis, etc. But NumPy is different: most
advisors will look askance at someone who spends a lot of time trying
to contribute to NumPy, because that's "not astronomy", and while it
produces value, it's not the kind of value that can be captured in
discrete papers and reputation within the field. Similarly, Python's
community-maintained packaging stack is everyone's problem, so it's
no-one's problem. You get the idea.</p>
<p>This raises a natural question: if we can't piggyback on some other
funding, why not <strong>get a dedicated grant</strong>? This is an excellent
solution for projects that require a <em>lot</em> of focused attention, but
there are two problems. First, many projects only require a <em>modest</em>
amount of focused attention – too much for volunteers, but too little
to justify a grant – and thus fall through the cracks. It would have
taken more effort to get a grant for viridis, or for the packaging
improvements described above, than it did to actually do the work. In
other cases, like NumPy or (perhaps) my concurrency project, a grant
makes sense. But there's a catch-22: the planning and writing required
to get a grant is itself a project that requires sustained
attention... and without the grant, this attention isn't available!</p>
<p>So how do grants ever work, then? Well, academia has a solution to
this, that's imperfect in many ways but nonetheless may serve as an
inspiration: they have faculty positions. Faculty have a broad mandate
to identify problems where applying their skills will produce impact,
the autonomy to follow up on these problems, the stability to spend at
least some time on risky projects (especially post-tenure), and an
environment that supports this kind of work (e.g., with startup funds,
grad students, administrative support, etc.). But unfortunately,
universities currently don't like to fund faculty positions outside of
specific fields, or where the outcomes are tools rather than papers –
regardless of how impactful those tools might be.</p>
<p>Of course we should fund more grants for open scientific software.
More and more, scientific research and software development are
interwoven and inseparable – from a single cell in a single Jupyter
notebook, to a custom data processing pipeline, to whole new packages
to disseminate new techniques. And this means that scientific research
is increasingly dependent on the ongoing maintenance of the rich,
shared ecosystem of open infrastructure software that field-specific
and project-specific software builds on.</p>
<p>But grant calls alone will be ineffective unless we also have leaders
who can think strategically about issues that cut across the whole
software ecosystem, identify the way forward, and write those grants –
and those leaders need jobs that let them do this work. Our ecosystem
needs gardeners. That's what made my position at BIDS unique, and why
I was able to tackle these problems: I was one of the few people in
all of science with the mandate, autonomy, stability, and support to
do so. Any solution to the sustainability problem needs to find a way
to create positions with these properties.</p>
<h1>A farewell to the Berkeley Institute for Data Science</h1>
<p>2018-05-25 · Nathaniel J. Smith</p>
<p>In February 2015, I joined
the
<a href="https://bids.berkeley.edu/">UC Berkeley Institute for Data Science</a>
(BIDS) in a very unusual position: I got to focus full-time on making
the open Python ecosystem work better for scientists. My contract is
ending in a bit over a month, so I'm currently thinking about what's
next. But in this post I want to instead look back on what this unique
opportunity allowed me to do, both as a kind of personal post-mortem
and in the hopes that it might be of interest to people and
institutions who are thinking about different models for funding open
source and open science. In particular, there's also
a
<a href="/blog/the-unreasonable-effectiveness-of-investment-in-open-source-infrastructure">follow-up post</a> discussing
some implications for software sustainability efforts.</p>
<h1>A BIDS retrospective</h1>
<p>Might as well start with the worst part: in late 2016 I came down with
some
serious
<a href="https://vorpus.org/blog/emerging-from-the-underworld/">health issues</a>,
and have been on partial disability leave since then. This has been
gradually getting better – cross your fingers for me. But it does mean
that despite the calendar dates, in terms of hours worked I've only
been at BIDS for 2 years and change.</p>
<p>But I'm pretty proud of what I accomplished in that time. There were
four main projects I led while at BIDS, which I'll discuss in
individual sections below. And to be clear, I'm certainly not claiming
exclusive credit for any of these – they all involved lots of other
people, who together did way more than I did! But I think it's fair to
say that these are all projects where I played a critical role in
identifying the issues and finding a way to push the community towards
solving them, and that if BIDS hadn't funded my position then none of
these things would have happened.</p>
<h2>Revitalizing NumPy development</h2>
<p>NumPy is so central to numerical work in Python, and so widely used in both academia and industry, that many people assume that it must receive substantial funding and support. But it doesn't; in fact for most of its history it's been maintained by a small group of loosely-organized, unpaid volunteers. When I started at BIDS one of my major goals was to change that, ultimately by getting funding – but simply airdropping money into a community-run OSS project doesn't always produce good results.</p>
<p>So the first priority was to get the existing maintainers on the same page about where we wanted to take the project and how funding could be effectively used – basically paying down "social debt" that had accumulated during the years of under-investment. I organized a <a href="https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting">developer meeting</a>, and based on the discussions there (and with many other stakeholders) we were ultimately able to get consensus around a <a href="https://github.com/numpy/numpy/pull/6352">governance document</a> (<a href="https://www.numpy.org/devdocs/dev/governance/index.html">latest version</a>) and <a href="https://www.youtube.com/watch?v=fowHwlpGb34">technical roadmap</a>. Based on this, I was able to secure two grants totaling $1.3 million from the <a href="https://www.numfocus.org/blog/numpy-receives-first-ever-funding-thanks-to-moore-foundation/">Moore</a> and <a href="https://bids.berkeley.edu/news/bids-receives-sloan-foundation-grant-contribute-numpy-development">Sloan</a> foundations, and we've just finished hiring <a href="https://mail.python.org/pipermail/numpy-discussion/2018-April/077903.html">two full-time NumPy developers at BIDS</a>.</p>
<p>I have to pause here to offer special thanks to the rest of the NumPy grant team at BIDS: Jonathan Dugan, Jarrod Millman, Fernando Pérez, Nelle Varoquaux, and Stéfan van der Walt. I didn't actually have any prior experience with writing grant proposals or hiring people, and initially I was on my own figuring this out, which turned out to be, let's say, <em>challenging</em>... especially since I was trying to do this at the same time as navigating my initial diagnosis and treatment. (It turns out <a href="https://en.wikipedia.org/wiki/Bus_factor">not all buses have wheels</a>.) They deserve major credit for stepping in and generously contributing their time and expertise to keep things going.</p>
<h2>Improving Python packaging (especially for science)</h2>
<p>Software development, like science in general, is an inherently
collaborative activity: we all build on the work of others, and
hopefully contribute back our own work for others to build on in turn.
One of the main mechanisms for this is the use and publication of
software packages. Unfortunately, Python packaging tools have
traditionally been notoriously unfriendly and difficult to work with –
especially for scientific projects that often require complex native
code in C/C++/Fortran – and this has added substantial friction to
this kind of collaboration. While at BIDS, I worked on reducing this
in two ways: one for users, and one for publishers.</p>
<!--
Total manylinux downloads:
SELECT
COUNT(*)
FROM
TABLE_DATE_RANGE( [the-psf:pypi.downloads], TIMESTAMP("20160101"), TIMESTAMP("20180507") )
WHERE
file.filename CONTAINS 'manylinux'
result: 387,546,749
Daily:
Same query for 2018-04-01 through 2018-04-28 (inclusive) says: 28,664,594
By project:
SELECT
COUNT(*), file.project
FROM
TABLE_DATE_RANGE( [the-psf:pypi.downloads], TIMESTAMP("20160101"), TIMESTAMP("20180507") )
WHERE
file.filename CONTAINS 'manylinux'
GROUP BY
file.project
LIMIT
1000
(then I sorted the resulting spreadsheet, because I forgot to sort in the query)
-->
<p>On the package user side, conda has done a great deal to relieve the
pain... but only for conda users. For a variety of reasons, many
people still need or prefer to use the official community-maintained
pip/PyPI/wheel stack. And one major limitation of that stack was that
you could distribute pre-compiled packages on Windows and MacOS, but
not on the other major OS: Linux. To solve this, I led the creation
of <a href="https://github.com/pypa/manylinux">the "manylinux" project</a>. This
has dramatically improved the user experience around installing Python
packages on Linux servers, especially the core scientific stack. When
I ran the numbers a few weeks ago (2018-05-07), ~388 million manylinux
packages had been downloaded from <a href="https://pypi.org/">PyPI</a>, and that
number was growing by ~1 million downloads every day, so we're almost
certainly past 400 million now. And if
you
<a href="https://docs.google.com/spreadsheets/d/1lOLvSF0up4eZyv2ugZi-TM_GIs3gPCkGNIDKXS1Y3w4">look at those downloads</a>,
scientific software is heavily represented: ~30 million downloads of
NumPy, ~15 million SciPy, ~15 million pandas, ~12 million
scikit-learn, ~8 million matplotlib, ~4 million tensorflow, ... (Fun
fact: a back of the envelope calculation<sup id="fnref:scipy-carbon"><a class="footnote-ref" href="#fn:scipy-carbon">1</a></sup> suggests that the
manylinux wheels for SciPy alone have so far prevented ~90 metric tons
of CO<sub>2</sub> emissions, equivalent to planting ~2,400 trees.)</p>
<!-- I tried compiling SciPy on my laptop to make this measurement, and /usr/bin/time said:
766.18user 41.98system 11:56.81elapsed 112%CPU (0avgtext+0avgdata 1113152maxresident)k
1085192inputs+2848992outputs (2026major+11620433minor)pagefaults 0swaps
-->
<p>So manylinux makes things easier for users. Eventually, users become
developers in their own right, and want to publish their work. And
then they have to learn to use distutils/setuptools, which is...
painful. Distutils/setuptools can work well, especially in simple
cases, but their design has some fundamental limitations that make
them confusing and difficult to extend, and this is especially
problematic for any projects with complex native code dependencies or
that use NumPy's C API, i.e. scientific packages. This isn't exactly
distutils's fault – its design dates back to the last millennium, and
no-one could have anticipated all the ways Python would be used over
the coming decades. And Python's packaging maintainers have done a
heroic job of keeping things working and incrementally improving
on
<a href="https://caremad.io/posts/2016/05/powering-pypi/">extremely minimal resources</a>.
But often this has meant piling expedient hacks on top of each other;
it's very difficult to revisit fundamental decisions when you're an
all-volunteer project struggling to maintain critical infrastructure
with millions of stakeholders. And so fighting with
distutils/setuptools has remained a rite of passage for Python
developers. (And conda can't help you here either: for builds, conda
packages rely on distutils/setuptools, just like the rest of us.)</p>
<p>Another of my goals while at BIDS was to chart a path forward out of
this tangle – and, with the help of lots of folks
at
<a href="https://mail.python.org/mm3/archives/list/distutils-sig@python.org/">distutils-sig</a> (especially
Thomas Kluyver, whose efforts were truly heroic!), we now have
one. <a href="https://www.python.org/dev/peps/pep-0518/">PEP 518</a> defines the
<code>pyproject.toml</code> file and for the first time makes it possible to
extend distutils/setuptools in a reasonable way (for those who know
<code>setup.py</code>: this is basically <code>setup_requires</code>, except it works). This
recently shipped in pip 10.
And <a href="https://www.python.org/dev/peps/pep-0517/">PEP 517</a> isn't quite
implemented yet, but soon it will make it easy for projects to abandon
distutils/setuptools entirely in favor of tools that
are <a href="https://flit.readthedocs.io/en/latest/">easier to use</a>
or
<a href="https://scikit-build.readthedocs.io/en/latest/">better prepared to handle demanding scientific users</a>,
making software publication easier and more accessible to ordinary
scientists.</p>
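<p>To make this concrete, here's roughly what a minimal <code>pyproject.toml</code> looks like under PEP 518 (the <code>requires</code> list is illustrative – a real project would declare whatever its <code>setup.py</code> actually needs at build time):</p>

```toml
# PEP 518: declare build dependencies up front; pip installs these into
# an isolated environment *before* running the project's setup.py.
[build-system]
requires = ["setuptools", "wheel", "numpy"]  # e.g. numpy, for packages using its C API
```

<p>This is what fixes the old <code>setup_requires</code> problem: the build dependencies are known <em>before</em> <code>setup.py</code> runs, instead of being declared inside the very file that needs them.</p>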
<h2>The Viridis colormap</h2>
<p>When I started at BIDS, matplotlib still used
<a href="http://www.climate-lab-book.ac.uk/2014/end-of-the-rainbow/">the awful "jet" colormap</a> by
default, despite probably dozens of peer-reviewed articles pointing
out how rainbow colormaps like "jet" distort users' understanding of
their data, create barriers to accessibility, and lead to bad
decisions, including (for
example)
<a href="http://gvi.seas.harvard.edu/sites/all/files/borkin-InfoVis2011_camera-ready.pdf">unnecessary medical diagnostic errors</a>.
So I suggested to <a href="http://mentat.za.net/">Stéfan</a> that we fix this.
This was an interesting challenge, with two parts: first, the
computational challenge of building a set
of <a href="https://github.com/matplotlib/viscm">tools</a>
to
<a href="https://bids.github.io/colormap/">visualize and design better colormaps</a>,
and second and more importantly, the social challenge of convincing
people to actually use them. After all, there have been many proposals
for better colormaps over the years. Most of them sank without a
trace, and it was entirely possible that our colormap "viridis" would
do the same.</p>
<p>This required working with the matplotlib community to first find a
socially acceptable way to make any changes <em>at all</em> in their default
styles –
here
<a href="https://github.com/matplotlib/matplotlib/issues/875#issuecomment-59958483">my suggestion</a> of
a style-change-only 2.0 release proved successful (and ultimately led
to a
much-needed
<a href="https://matplotlib.org/2.0.0/users/dflt_style_changes.html">broader style overhaul</a>).
Then we had the problem that there are many perfectly reasonable
colormaps, and we needed to build consensus around a single proposal
without getting derailed by endless discussion – avoiding this was the
goal of
a
<a href="https://www.youtube.com/watch?v=xAoljeRJ3lU">talk I gave at SciPy 2015</a>.</p>
<!-- R package rankings: according to cranlogs::cran_top_downloads(when="last-month", count=100) (covering the period 2018-04-06 through 2018-05-05), viridisLite is the 32nd most-downloaded R package. -->
<p>In the end, we succeeded beyond our wildest expectations. As of today,
my talk's been watched >85,000 times, making it the most popular talk
in the history of the SciPy conference. Viridis is now the default
colormap in matplotlib, Octave, and parts of ggplot2. Its R package
receives
<a href="https://cranlogs.r-pkg.org/badges/viridisLite">hundreds of thousands of downloads every month</a> which
puts it comfortably in the top 50 most popular R packages. Its fans
have ported it to essentially every visualization framework known to
humankind. It's been showcased
in
<a href="https://physics.aps.org/featured-article-pdf/10.1103/PhysRevLett.116.061102">Nobel-prize winning research</a> and
<a href="https://www.nasa.gov/feature/new-horizons-captures-record-breaking-images-in-the-kuiper-belt">NASA press releases</a>,
and
inspired
<a href="https://twitter.com/sjmgarnier/status/886348228572139520">stickers</a>
and <a href="https://twitter.com/colormap_bot">twitter bots</a>
and
<a href="https://tos.org/oceanography/article/true-colors-of-oceanography-guidelines-for-effective-and-accurate-colormap">follow-ups</a> <a href="https://arxiv.org/abs/1712.01662">from</a> <a href="https://twitter.com/Cyclogenesis_au/status/998652450067439616">other</a> <a href="https://idl.cs.washington.edu/papers/quantitative-color/">researchers</a>.</p>
<p>On the one hand, it's "just" a colormap. But it feels pretty good to
know that every day millions of people are gaining a little more
understanding, more insight, and making better decisions thanks to our
work, and that we've permanently raised the bar on good data
visualization practice.</p>
<h2>Making concurrent programming more accessible</h2>
<p>Here's a common problem: writing a program that does multiple things
concurrently, either for performance or as an intrinsic part of its
functionality – from web servers handling simultaneous users and web
spiders that want to fetch lots of pages in parallel, to Jupyter
notebooks juggling multiple backend kernels and a UI, to complex
simulations running on HPC clusters. But writing correct concurrent
programs is notoriously challenging, even for experts. This is a
challenge across the industry, but felt particularly acutely by
scientists, who generally receive minimal training as software
developers, yet often need to write novel high-performance parallel
code – since by definition, their work involves pushing the boundary
of what's possible. (In
fact <a href="https://software-carpentry.org/">Software Carpentry</a> originally
<a href="https://software-carpentry.org/scf/history/">"grew out of [Greg Wilson's] frustration working with scientists who
wanted to parallelize complex programs but didn't know what version
control was..."</a>.)</p>
<p>Over the last year I've been developing a new paradigm for making
practical concurrent programming more accessible to ordinary
developers, based on a novel analysis of where some of the
difficulties come from, and repurposing some old ideas in language
design. In the course of this work I've produced a practical
implementation in the Python
library <a href="https://trio.readthedocs.io/">Trio</a>, together with a series
of articles, including two discussing the theory behind the core new
language constructs:</p>
<ul>
<li>
<p><a href="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/">Notes on structured concurrency, or: Go statement considered harmful</a></p>
</li>
<li>
<p><a href="https://vorpus.org/blog/timeouts-and-cancellation-for-humans/">Timeouts and cancellation for humans</a></p>
</li>
</ul>
<p>This last project is a bit different from the others – it's more in
the way of basic research, so it will be some time before we know the
full impact. But so far it's attracting quite a bit of interest across
the industry and from language designers
(<a href="https://github.com/JuliaLang/julia/issues/6283#issuecomment-387415648">for example</a>)
and I suspect that either Trio or something very like it will become
the de facto standard library for networking and concurrency in
Python.</p>
<h2>Other work</h2>
<p>Some other smaller things I did at BIDS, besides the four major
projects discussed above:</p>
<ul>
<li>
<p>Was elected as an
honorary <a href="https://www.python.org/psf/fellows/">PSF Fellow</a>, and to
the
<a href="https://mail.python.org/pipermail/python-committers/2018-January/005147.html">Python core developer team</a>.</p>
</li>
<li>
<p>Wrote
up
<a href="https://docs.google.com/document/d/1lByVCEeoJhqtYjUUGtL7pd8sA-MkPQs-swv6E_YEzmg/">feedback</a> for
the BLAS working group on their proposal for
a
<a href="https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/">next generation BLAS API</a>.
The
<a href="https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms">BLAS</a> is
the set of core linear algebra routines that essentially all
number-crunching software is built on, and the BLAS working group is
currently developing the first update in almost two decades. In
the past, BLAS has been designed mostly with input from traditional
HPC users running Fortran on dedicated clusters; this is the first
time NumPy/SciPy have been involved in this process.</p>
</li>
<li>
<p>Provided some assistance with organizing
the
<a href="https://pyfound.blogspot.com/2017/11/the-psf-awarded-moss-grant-pypi.html">MOSS grant</a> that
<a href="https://lwn.net/Articles/751458/">funded the new PyPI</a>.</p>
</li>
<li>
<p>Created the <a href="https://h11.readthedocs.io/">h11</a> HTTP library, and
came up with
a <a href="https://github.com/urllib3/urllib3/issues/1323">plan</a> for using
it to let urllib3/requests
and
<a href="https://github.com/urllib3/urllib3/issues/1323#issuecomment-379237931">downstream</a> packages
join the new world of Python async concurrency.</p>
</li>
<li>
<p>Had a number of discussions with the conda team about how the conda
and pip worlds could cooperate better.</p>
</li>
<li>
<p>And of course lots of general answering of questions, giving of
advice, fixing of bugs, triaging of bugs, making of connections,
etc.</p>
</li>
</ul>
<h2>...and the ones that got away</h2>
<p>And finally, there are the ones that got away: projects where I've
been working on laying the groundwork, but ran out of time before
producing results. I think these are entirely feasible and have
transformative potential – I'm mentioning them here partly in hopes
that someone picks them up:</p>
<p><strong>PyIR</strong>: Here's the problem. Libraries like NumPy and pandas are
written in C, which makes them reasonably fast on CPython, but
prevents JIT optimizers like PyPy or Numba from being able to speed
them up further. If we rewrote them in Python, they'd be fast on PyPy
or Numba, but unusably slow on regular CPython. Is there any way to
have our cake and eat it too? Right now, our only solution is to
maintain multiple copies of NumPy and other key libraries (e.g. Numba
and PyPy have both spent significant resources on this), which isn't
scalable or sustainable.</p>
<p>So I
organized <a href="https://python-compilers-workshop.github.io/">a workshop</a>
and invited all the JIT developers I could find. I think we came up
with a viable way forward, based around the idea of a Cython-like
language that generates C code for CPython, and a common higher-level
IR for the JITs, and multiple projects were excited about
collaborating on this – but this happened literally the week before I
got sick, and I wasn't able to follow up and get things organized.
It's still doable though, and could unlock a new level of performance
for Python – and as a bonus, in the long run it might provide a way to
escape the "C API trap" that currently blocks many improvements to
CPython (e.g., removing the GIL).</p>
<p><strong>Telemetry</strong>: One reason why developing software like NumPy is
challenging is that we actually have very little idea how people use
it. If we remove a deprecated API, how disruptive will that be? Is
anyone actually using that cool new feature we added? Should we put
more resources into optimizing module X or module Y? And what about at
the ecosystem level – how many users do different packages have? Which
ones are used together? Answering these kinds of questions is crucial
to providing responsible stewardship, but right now there's simply no
way to do it.</p>
<p>Of course there are many pitfalls to gathering this sort of data; if
you're going to do it at all, you have to do it right, with
affirmative user consent, clear guidelines for what can be collected
and how it can be used, a neutral non-profit to provide oversight,
shared infrastructure so we can share the effort across many projects,
and so on. But these are all problems that can be solved with the
right investment (about which, see below), and doing so could
radically change the conversations around maintaining and sustaining
open scientific software.</p>
<h1>What next?</h1>
<p>So there you have it: that's what I've been up to for the last few
years. Not everything worked out the way I hoped, but overall I'm
extremely proud of what I was able to accomplish, and grateful to BIDS
and its funders for providing this opportunity.</p>
<p>As mentioned above, I'm currently considering options for what to do
next – if you're interested in discussing
possibilities, <a href="mailto:njs@pobox.com">get in touch</a>!</p>
<p>Or, if you're interested in the broader question of sustainability for
open scientific software, I wrote
a
<a href="/blog/the-unreasonable-effectiveness-of-investment-in-open-source-infrastructure">follow-up post</a> trying
to analyze what it was about this position that allowed it to be so
successful.</p>
<!-- LocalWords: UC mortem Dugan Millman Pérez Nelle Varoquaux der
-->
<!-- LocalWords: Stéfan Fortran manylinux pre scipy avgtext avgdata
-->
<!-- LocalWords: maxresident pagefaults Wh distutils setuptools sig
-->
<!-- LocalWords: distutils's Kluyver pyproject toml py Viridis nd
-->
<!-- LocalWords: colormap colormaps viridis issuecomment ggplot de
-->
<!-- LocalWords: viridisLite facto PSF BLAS urllib async PyIR JIT
-->
<!-- LocalWords: CPython scalable Cython JITs
-->
<div class="footnote">
<hr>
<ol>
<li id="fn:scipy-carbon">
<p>Assuming that the people installing SciPy manylinux wheels would
instead have built SciPy from source (which is what <code>pip install
scipy</code> used to do), that building SciPy takes 10 minutes, and that
during that time the computer consumes an
extra
<a href="https://en.wikipedia.org/wiki/List_of_CPU_power_dissipation_figures">50 W of power</a>,
then we can calculate 10 minutes * 50 W / 60 minutes/hour / 1000
Wh/kWh * 15,000,000 builds = 125,000 kWh of reduced electricity
usage, which I then plugged
into
<a href="https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator">this EPA calculator</a>. <a class="footnote-backref" href="#fnref:scipy-carbon" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Companion post for my PyCon 2018 talk on async concurrency using Trio2018-05-11T00:00:00-07:002018-05-11T00:00:00-07:00Nathaniel J. Smithtag:vorpus.org,2018-05-11:/blog/companion-post-for-my-pycon-2018-talk-on-async-concurrency-using-trio/<p><strong>The talk itself:</strong></p>
<ul>
<li>
<p><a href="https://www.youtube.com/watch?v=oLkfnc_UMcE">Video</a></p>
</li>
<li>
<p><a href="https://github.com/python-trio/trio-talks/blob/master/njsmith-async-concurrency-for-mere-mortals/2018-05-11-pycon.odp?raw=true">Slides</a> (7
MiB odp file)</p>
</li>
<li>
<p><a href="https://github.com/python-trio/trio-talks/blob/master/njsmith-async-concurrency-for-mere-mortals/2018-05-11-pycon-notebook.ipynb">Code</a></p>
</li>
</ul>
<p>(I'm afraid you probably need LibreOffice, and ideally
the <a href="https://fonts.google.com/specimen/Montserrat">Montserrat</a>
and <a href="https://dejavu-fonts.github.io/">Deja Vu Sans Mono</a>, to view the
slides properly. I haven't posted a PDF because LibreOffice's PDF
export makes a mess of slides containing animations.)</p>
<p><strong>Chat:</strong> Questions? You're watching the talk months later on youtube?
That's cool, you can <a href="https://trio.discourse.group/t/discussion-trio-async-concurrency-for-mere-mortals-talk-at-pycon-2018/31">discuss it on the Trio
forum</a>,
or you can come
<a href="https://gitter.im/python-trio/general">hang out with us on Gitter chat</a>.</p>
<p><strong>Sprint info:</strong> I'll be here at PyCon for the first two days of the
sprint – if you want to contribute to Trio, or just play around with
it while sitting next to me, then that'd be awesome! Note that we give
out commit rights to
everyone
<a href="https://trio.readthedocs.io/en/latest/contributing.html#joining-the-team">as soon as their first PR is merged</a>.</p>
<p><strong>Trio's tutorial and reference manual:</strong> <a href="https://trio.readthedocs.io">https://trio.readthedocs.io</a></p>
<p><strong>Code and issues:</strong> <a href="https://github.com/python-trio/trio">https://github.com/python-trio/trio</a></p>
<p><strong>Articles:</strong> For more background on the ideas in Trio:</p>
<ul>
<li>
<p>The theory behind nurseries:
<a href="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/">Notes on structured concurrency; or, Go statement considered harmful</a></p>
</li>
<li>
<p>The theory behind cancel scopes, and why you might prefer them to
all the other kinds of cancellation
APIs:
<a href="https://vorpus.org/blog/timeouts-and-cancellation-for-humans/">Timeouts and cancellation for humans</a></p>
</li>
<li>
<p>How Trio implements
control-C:
<a href="https://vorpus.org/blog/control-c-handling-in-python-and-trio/">Control-C handling in Python and Trio</a> (this
isn't important for using Trio, but might be interesting if you like
reading about complicated technical tricks)</p>
</li>
</ul>
<p><strong>I really want to follow you on Twitter!</strong> I don't really tweet much,
but, <a href="https://twitter.com/vorpalsmith">here you go...</a>.</p>
<p><strong>Do you have a blog?</strong> Yep. This is it :-).</p>Notes on structured concurrency, or: Go statement considered harmful2018-04-25T00:00:00-07:002018-04-25T00:00:00-07:00Nathaniel J. Smithtag:vorpus.org,2018-04-25:/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/<p class="first last">How ideas from the late 1960s can help us build better
concurrency APIs today.</p>
<!-- gross hack to trick pelican into including the .woff file in the
output dir
it's referenced from the .svg files, thanks to running
python ~/bin/svg-add-font-face.py "DejaVu Sans Mono" deja-vu-sans-mono.woff *.svg
I also did:
python ~/bin/svg-add-style.py "@import url('https://fonts.googleapis.com/css?family=Montserrat');" *.svg
Next time, it might be better to use - -export-text-to-path to batch
convert everything instead -->
<div style="display: none;"><p><a class="reference external" href="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/deja-vu-sans-mono.woff">fake link</a></p>
</div><!-- gross hack to make the SVGs shrink down on mobile to be visible
within the viewport
Apparently overflow-x: auto would help if placed on a <div> wrapped
around the <object>, but I don't have one of those -->
<style>
object {
max-width: 100%;
/* overflow-x: auto; */
}
</style><p>Every concurrency API needs a way to run code concurrently. Here's
some examples of what that looks like using different APIs:</p>
<div class="highlight"><pre><span></span>go myfunc(); // Golang
pthread_create(&thread_id, NULL, &myfunc); /* C with POSIX threads */
spawn(modulename, myfuncname, []) % Erlang
threading.Thread(target=myfunc).start() # Python with threads
asyncio.create_task(myfunc()) # Python with asyncio
</pre></div>
<p>There are lots of variations in the notation and terminology, but the
semantics are the same: these all arrange for <tt class="docutils literal">myfunc</tt> to start
running concurrently to the rest of the program, and then return
immediately so that the parent can do other things.</p>
<p>Another option is to use callbacks:</p>
<div class="highlight"><pre><span></span>QObject::connect(&emitter, SIGNAL(event()), // C++ with Qt
&receiver, SLOT(myfunc()))
g_signal_connect(emitter, "event", myfunc, NULL) /* C with GObject */
document.getElementById("myid").onclick = myfunc; // Javascript
promise.then(myfunc, errorhandler) // Javascript with Promises
deferred.addCallback(myfunc) # Python with Twisted
future.add_done_callback(myfunc) # Python with asyncio
</pre></div>
<p>Again, the notation varies, but these all accomplish the same thing:
they arrange that from now on, if and when a certain event occurs,
then <tt class="docutils literal">myfunc</tt> will run. Then once they've set that up, they
immediately return so the caller can do other things. (Sometimes
callbacks get dressed up with fancy helpers like <a class="reference external" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all">promise</a>
<a class="reference external" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/race">combinators</a>,
or <a class="reference external" href="https://twistedmatrix.com/documents/current/core/howto/servers.html">Twisted-style protocols/transports</a>,
but the core idea is the same.)</p>
<p>And... that's it. Take any real-world, general-purpose concurrency
API, and you'll probably find that it falls into one or the other of
those buckets (or sometimes both, like asyncio).</p>
<p>But my new library <a class="reference external" href="https://trio.readthedocs.io">Trio</a> is weird. It
doesn't use either approach. Instead, if we want to run <tt class="docutils literal">myfunc</tt> and
<tt class="docutils literal">anotherfunc</tt> concurrently, we write something like:</p>
<div class="highlight"><pre><span></span><span class="k">async</span> <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_nursery</span><span class="p">()</span> <span class="k">as</span> <span class="n">nursery</span><span class="p">:</span>
<span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">myfunc</span><span class="p">)</span>
<span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">anotherfunc</span><span class="p">)</span>
</pre></div>
<!-- This runs ``myfunc`` and ``anotherfunc`` concurrently, and waits for
them both to finish; if one of them raises an exception, then the
other one is cancelled and the exception propagates into the caller. -->
<p>When people first encounter this "nursery" construct, they tend to
find it confusing. Why is there an indented block? What's this
<tt class="docutils literal">nursery</tt> object, and why do I need one before I can spawn a task?
Then they realize that it prevents them from using patterns they've
gotten used to in other frameworks, and they get really annoyed. It
feels quirky and idiosyncratic and too high-level to be a basic
primitive. These are understandable reactions! But bear with me.</p>
<p><strong>In this post, I want to convince you that nurseries aren't quirky or
idiosyncratic at all, but rather a new control flow primitive that's
just as fundamental as for loops or function calls. And furthermore,
the other approaches we saw above – thread spawning and callback
registration – should be removed entirely and replaced with
nurseries.</strong></p>
<p>Sound unlikely? Something similar has actually happened before: the
<tt class="docutils literal">goto</tt> statement was once the king of control flow. Now it's a
<a class="reference external" href="https://xkcd.com/292/">punchline</a>. A few languages still have
something they call <tt class="docutils literal">goto</tt>, but it's different and far weaker than
the original <tt class="docutils literal">goto</tt>. And most languages don't even have that. What
happened? This was so long ago that most people aren't familiar with
the story anymore, but it turns out to be surprisingly relevant. So
we'll start by reminding ourselves what a <tt class="docutils literal">goto</tt> was, exactly, and
then see what it can teach us about concurrency APIs.</p>
<div class="contents topic" id="contents">
<p class="topic-title"><strong>Contents:</strong></p>
<ul class="simple">
<li><a class="reference internal" href="#what-is-a-goto-statement-anyway" id="id9">What is a <tt class="docutils literal">goto</tt> statement anyway?</a></li>
<li><a class="reference internal" href="#what-is-a-go-statement-anyway" id="id10">What is a <tt class="docutils literal">go</tt> statement anyway?</a></li>
<li><a class="reference internal" href="#what-happened-to-goto" id="id11">What happened to <tt class="docutils literal">goto</tt>?</a><ul>
<li><a class="reference internal" href="#goto-the-destroyer-of-abstraction" id="id12"><tt class="docutils literal">goto</tt>: the destroyer of abstraction</a></li>
<li><a class="reference internal" href="#a-surprise-benefit-removing-goto-statements-enables-new-features" id="id13">A surprise benefit: removing <tt class="docutils literal">goto</tt> statements enables new features</a></li>
<li><a class="reference internal" href="#goto-statements-not-even-once" id="id14"><tt class="docutils literal">goto</tt> statements: not even once</a></li>
</ul>
</li>
<li><a class="reference internal" href="#go-statement-considered-harmful" id="id15"><tt class="docutils literal">go</tt> statement considered harmful</a><ul>
<li><a class="reference internal" href="#go-statements-not-even-once" id="id16"><tt class="docutils literal">go</tt> statements: not even once</a></li>
</ul>
</li>
<li><a class="reference internal" href="#nurseries-a-structured-replacement-for-go-statements" id="id17">Nurseries: a structured replacement for <tt class="docutils literal">go</tt> statements</a><ul>
<li><a class="reference internal" href="#nurseries-preserve-the-function-abstraction" id="id18">Nurseries preserve the function abstraction.</a></li>
<li><a class="reference internal" href="#nurseries-support-dynamic-task-spawning" id="id19">Nurseries support dynamic task spawning.</a></li>
<li><a class="reference internal" href="#there-is-an-escape" id="id20">There is an escape.</a></li>
<li><a class="reference internal" href="#you-can-define-new-types-that-quack-like-a-nursery" id="id21">You can define new types that quack like a nursery.</a></li>
<li><a class="reference internal" href="#no-really-nurseries-always-wait-for-the-tasks-inside-to-exit" id="id22">No, really, nurseries <em>always</em> wait for the tasks inside to exit.</a></li>
<li><a class="reference internal" href="#automatic-resource-cleanup-works" id="id23">Automatic resource cleanup works.</a></li>
<li><a class="reference internal" href="#automated-error-propagation-works" id="id24">Automated error propagation works.</a></li>
<li><a class="reference internal" href="#a-surprise-benefit-removing-go-statements-enables-new-features" id="id25">A surprise benefit: removing <tt class="docutils literal">go</tt> statements enables new features</a></li>
</ul>
</li>
<li><a class="reference internal" href="#nurseries-in-practice" id="id26">Nurseries in practice</a></li>
<li><a class="reference internal" href="#conclusion" id="id27">Conclusion</a></li>
<li><a class="reference internal" href="#comments" id="id28">Comments</a></li>
<li><a class="reference internal" href="#acknowledgments" id="id29">Acknowledgments</a></li>
<li><a class="reference internal" href="#footnotes" id="id30">Footnotes</a></li>
</ul>
</div>
<div class="section" id="what-is-a-goto-statement-anyway">
<h2><a class="toc-backref" href="#id9">What is a <tt class="docutils literal">goto</tt> statement anyway?</a></h2>
<p>Let's review some history: Early computers were programmed using
<a class="reference external" href="https://en.wikipedia.org/wiki/Assembly_language">assembly language</a>, or other even
more primitive mechanisms. This kinda sucked. So in the 1950s, people
like <a class="reference external" href="https://en.wikipedia.org/wiki/John_Backus">John Backus</a> at
IBM and <a class="reference external" href="https://en.wikipedia.org/wiki/Grace_Hopper">Grace Hopper</a>
at Remington Rand started to develop languages like <a class="reference external" href="https://en.wikipedia.org/wiki/Fortran">FORTRAN</a> and <a class="reference external" href="https://en.wikipedia.org/wiki/FLOW-MATIC">FLOW-MATIC</a> (better known for its
direct successor <a class="reference external" href="https://en.wikipedia.org/wiki/COBOL">COBOL</a>).</p>
<p>FLOW-MATIC was very ambitious for its time. You can think of it as
Python's great-great-great-...-grandparent: the first language that
was designed for humans first, and computers second. Here's some
FLOW-MATIC code to give you a taste of what it looked like:</p>
<object data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/flow-matic-1.svg" style="width: 440px;" type="image/svg+xml"></object>
<p>You'll notice that unlike modern languages, there's no <tt class="docutils literal">if</tt> blocks,
loop blocks, or function calls here – in fact there's no block
delimiters or indentation at all. It's just a flat list of statements.
That's not because this program happens to be too short to use fancier
control syntax – it's because block syntax wasn't invented yet!</p>
<object data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/sequential-and-go-to-schematic.svg" style="width: 400px;" type="image/svg+xml">Sequential flow represented as a vertical arrow pointing
down, and goto flow represented as an arrow that starts
pointing down and then leaps off to the side.</object>
<p>Instead, FLOW-MATIC had two options for flow control. Normally, it was
sequential, just like you'd expect: start at the top and move
downwards, one statement at a time. But if you execute a special
statement like <tt class="docutils literal">JUMP TO</tt>, then it could directly transfer control
somewhere else. For example, statement (13) jumps back to statement
(2):</p>
<object data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/flow-matic-2.svg" style="width: 440px;" type="image/svg+xml"></object>
<p>Just like for our concurrency primitives at the beginning, there was
some disagreement about what to call this "do a one-way jump"
operation. Here it's <tt class="docutils literal">JUMP TO</tt>, but the name that stuck was <tt class="docutils literal">goto</tt>
(like "go to", get it?), so that's what I'll use here.</p>
<p>Here's the complete set of <tt class="docutils literal">goto</tt> jumps in this little program:</p>
<object data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/flow-matic-4.svg" style="width: 440px;" type="image/svg+xml"></object>
<p>If you think this looks confusing, you're not alone! This style of
jump-based programming is something that FLOW-MATIC inherited pretty
much directly from assembly language. It's powerful, and a good fit to
how computer hardware actually works, but it's super confusing to work
with directly. That tangle of arrows is why the term "spaghetti code"
was invented. Clearly, we needed something better.</p>
<p>But... what is it about <tt class="docutils literal">goto</tt> that causes all these problems? Why
are some control structures OK, and some not? How do we pick the good
ones? At the time, this was really unclear, and it's hard to fix a
problem if you don't understand it.</p>
</div>
<div class="section" id="what-is-a-go-statement-anyway">
<h2><a class="toc-backref" href="#id10">What is a <tt class="docutils literal">go</tt> statement anyway?</a></h2>
<p>But let's hit pause on the history for a moment – everyone knows
<tt class="docutils literal">goto</tt> was bad. What does this have to do with concurrency? Well,
consider Golang's famous <tt class="docutils literal">go</tt> statement, used to spawn a new
"goroutine" (lightweight thread):</p>
<div class="highlight"><pre><span></span><span class="c1">// Golang</span>
<span class="k">go</span> <span class="nx">myfunc</span><span class="p">();</span>
</pre></div>
<p>Can we draw a diagram of its control flow? Well, it's a little
different from either of the ones we saw above, because control
actually splits. We might draw it like:</p>
<object class="align-center" data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/go-schematic-unlabeled.svg" style="width: 395px;" type="image/svg+xml">"Go" flow represented as two arrows: a green arrow pointing
down, and a lavender arrow that starts pointing down and then
leaps off to the side.</object>
<p>Here the colors are intended to indicate that <em>both</em> paths are taken.
From the perspective of the parent goroutine (green line), control
flows sequentially: it comes in the top, and then immediately comes
out the bottom. Meanwhile, from the perspective of the child (lavender
line), control comes in the top, and then jumps over to the body of
<tt class="docutils literal">myfunc</tt>. Unlike a regular function call, this jump is one-way: when
running <tt class="docutils literal">myfunc</tt> we switch to a whole new stack, and the runtime
immediately forgets where we came from.</p>
<p>But this doesn't just apply to Golang. This is the flow control
diagram for <em>all</em> of the primitives we listed at the beginning of this
post:</p>
<ul class="simple">
<li>Threading libraries usually provide some sort of handle object that
lets you <tt class="docutils literal">join</tt> the thread later – but this is an independent
operation that the language doesn't know anything about. The actual
thread spawning primitive has the control flow shown above.</li>
<li>Registering a callback is semantically equivalent to starting a
background thread that (a) blocks until some event occurs, and
then (b) runs the callback. (Though obviously the implementation is
different.) So in terms of high-level control flow, registering a
callback is essentially a <tt class="docutils literal">go</tt> statement.</li>
<li>Futures and promises are the same too: when you call a function and
it returns a promise, that means it's scheduled the work to happen
in the background, and then given you a handle object to join the
work later (if you want). In terms of control flow semantics, this
is just like spawning a thread. Then you register callbacks on the
promise, so see the previous bullet point.</li>
</ul>
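<p>For example, here's a minimal Python sketch of the first two bullet
points. Spawning a thread and registering a callback both have the same
"control splits, parent returns immediately" shape as Golang's
<tt class="docutils literal">go</tt> statement (the function and variable names
here are just illustrative):</p>
<div class="highlight"><pre>import threading

done = threading.Event()

def myfunc():
    done.set()

# Thread spawning: control splits, and the parent continues
# immediately -- the same shape as Golang's `go myfunc()`.
threading.Thread(target=myfunc).start()

# Callback registration: semantically like starting a background
# task that waits for the timer and then runs the callback.
timer = threading.Timer(0.01, myfunc)
timer.start()

done.wait()  # only possible because we kept a handle around
</pre></div>
<p>In both cases, the only way to find out what happened later is to
hold onto a handle object – the language itself has already forgotten
that the split occurred.</p>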
<p>This same exact pattern shows up in many, many forms: the key
similarity is that in all these cases, control flow splits, with one
side doing a one-way jump and the other side returning to the caller.
Once you know what to look for, you'll start seeing it all over the
place – it's a fun game! <a class="footnote-reference" href="#id5" id="id1">[1]</a></p>
<p>Annoyingly, though, there is no standard name for this category of
control flow constructs. So just like "<tt class="docutils literal">goto</tt> statement" became the
umbrella term for all the different <tt class="docutils literal">goto</tt>-like constructs, I'm
going to use "<tt class="docutils literal">go</tt> statement" as an umbrella term for these. Why
<tt class="docutils literal">go</tt>? One reason is that Golang gives us a particularly pure example
of the form. And the other is... well, you've probably guessed where
I'm going with all this. Look at these two diagrams. Notice any
similarities?</p>
<object class="align-center" data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/go-schematic-and-go-to-schematic.svg" style="width: 400px;" type="image/svg+xml">Repeat of earlier diagrams: goto flow represented as an arrow
that starts pointing down and then leaps off to the side, and
"go" flow represented as two arrows: a green arrow pointing
down, and a lavender arrow that starts pointing down and then
leaps off to the side.</object>
<p>That's right: <strong>go statements are a form of goto statement.</strong></p>
<p>Concurrent programs are notoriously difficult to write and reason
about. So are <tt class="docutils literal">goto</tt>-based programs. Is it possible that this might
be for some of the same reasons? In modern languages, the problems
caused by <tt class="docutils literal">goto</tt> are largely solved. If we study how they fixed
<tt class="docutils literal">goto</tt>, will it teach us how to make more usable concurrency APIs?
Let's find out.</p>
</div>
<div class="section" id="what-happened-to-goto">
<h2><a class="toc-backref" href="#id11">What happened to <tt class="docutils literal">goto</tt>?</a></h2>
<p>So what is it about <tt class="docutils literal">goto</tt> that makes it cause so many problems? In
the late 1960s, <a class="reference external" href="https://en.wikipedia.org/wiki/Edsger_W._Dijkstra">Edsger W. Dijkstra</a> wrote a pair of
now-famous papers that helped make this much clearer: <a class="reference external" href="https://scholar.google.com/scholar?cluster=15335993203437612903&hl=en&as_sdt=0,5">Go to statement
considered harmful</a>,
and <a class="reference external" href="https://www.cs.utexas.edu/~EWD/ewd02xx/EWD249.PDF">Notes on structured programming</a> (PDF).</p>
<div class="section" id="goto-the-destroyer-of-abstraction">
<h3><a class="toc-backref" href="#id12"><tt class="docutils literal">goto</tt>: the destroyer of abstraction</a></h3>
<p>In these papers, Dijkstra was worried about the problem of how you
write non-trivial software and get it correct. I can't give them due
justice here; there's all kinds of fascinating insights. For example,
you may have heard this quote:</p>
<img alt="Testing can be used to show the presence of bugs, but never to show their absence!" src="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/testing.png" />
<p>Yep, that's from <em>Notes on structured programming</em>. But his major
concern was <em>abstraction</em>. He wanted to write programs that are too
big to hold in your head all at once. To do this, you need to treat
parts of the program like a black box – like when you see a Python
program do:</p>
<div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s2">"Hello world!"</span><span class="p">)</span>
</pre></div>
<p>then you don't need to know all the details of how <tt class="docutils literal">print</tt> is
implemented (string formatting, buffering, cross-platform differences,
...). You just need to know that it will somehow print the text you
give it, and then you can spend your energy thinking about whether
that's what you want to have happen at this point in your code.
Dijkstra wanted languages to support this kind of abstraction.</p>
<p>By this point, block syntax had been invented, and languages like
ALGOL had accumulated ~5 distinct types of control structure: they
still had sequential flow and <tt class="docutils literal">goto</tt>:</p>
<object data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/sequential-and-go-to-schematic.svg" style="width: 400px;" type="image/svg+xml">Same picture of sequential flow and goto flow as before.</object>
<p>And had also acquired variants on if/else, loops, and function calls:</p>
<object class="align-center" data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/control-schematics.svg" style="width: 500px;" type="image/svg+xml">Diagrams with arrows showing the flow control for if
statements, loops, and function calls.</object>
<p>You can implement these higher-level constructs using <tt class="docutils literal">goto</tt>, and
early on, that's how people thought of them: as a convenient
shorthand. But what Dijkstra pointed out is that if you look at these
diagrams, there's a big difference between <tt class="docutils literal">goto</tt> and the rest. For
everything except <tt class="docutils literal">goto</tt>, flow control comes in the top → [stuff
happens] → flow control comes out the bottom. We might call this the
"black box rule": if a control structure has this shape, then in
contexts where you don't care about the details of what happens
internally, you can ignore the [stuff happens] part, and treat the
whole thing as regular sequential flow. And even better, this is also
true of any code that's <em>composed</em> out of those pieces. When I look at
this code:</p>
<div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s2">"Hello world!"</span><span class="p">)</span>
</pre></div>
<p>I don't have to go read the definition of <tt class="docutils literal">print</tt> and all its
transitive dependencies just to figure out how the control flow works.
Maybe inside <tt class="docutils literal">print</tt> there's a loop, and inside the loop there's an
if/else, and inside the if/else there's another function call... or
maybe it's something else. It doesn't really matter: I know control
will flow into <tt class="docutils literal">print</tt>, the function will do its thing, and then
eventually control will come back to the code I'm reading.</p>
<p>It may seem like this is obvious, but if you have a language with
<tt class="docutils literal">goto</tt> – a language where functions and everything else are built on
top of <tt class="docutils literal">goto</tt>, and <tt class="docutils literal">goto</tt> can jump anywhere, at any time – then
these control structures aren't black boxes at all! If you have a
function, and inside the function there's a loop, and inside the loop
there's an if/else, and inside the if/else there's a <tt class="docutils literal">goto</tt>... then
that <tt class="docutils literal">goto</tt> could send the control anywhere it wants. Maybe control
will suddenly return from another function entirely, one you haven't
even called yet. You don't know!</p>
<p>And this breaks abstraction: it means that <em>every function call is
potentially a</em> <tt class="docutils literal">goto</tt> <em>statement in disguise, and the only way to
know is to keep the entire source code of your system in your head at
once.</em> As soon as <tt class="docutils literal">goto</tt> is in your language, you stop being able to do
local reasoning about flow control. That's <em>why</em> <tt class="docutils literal">goto</tt> leads to
spaghetti code.</p>
<p>And now that Dijkstra understood the problem, he was able to solve it.
Here's his revolutionary proposal: we should stop thinking of
if/loops/function calls as shorthands for <tt class="docutils literal">goto</tt>, but rather as
fundamental primitives in their own rights – and we should remove
<tt class="docutils literal">goto</tt> entirely from our languages.</p>
<p>From here in 2018, this seems obvious enough. But have you seen how
programmers react when you try to take away their toys because they're
not smart enough to use them safely? Yeah, some things never change.
In 1969, this proposal was <em>incredibly controversial</em>. <a class="reference external" href="https://en.wikipedia.org/wiki/Donald_Knuth">Donald Knuth</a> <a class="reference external" href="https://scholar.google.com/scholar?cluster=17147143327681396418&hl=en&as_sdt=0,5">defended</a>
<tt class="docutils literal">goto</tt>. People who had become experts on writing code with <tt class="docutils literal">goto</tt>
quite reasonably resented having to basically learn how to program
again in order to express their ideas using the newer, more
constraining constructs. And of course it required building a whole
new set of languages.</p>
<div class="figure align-right">
<img alt="On the left, a photo of a snarling wolf. On the right, a photo of a grumpy bulldog." src="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/wolf-and-bulldog.jpg" style="width: 400px;" />
<p class="caption">Left: A traditional <tt class="docutils literal">goto</tt>. Right: A domesticated <tt class="docutils literal">goto</tt>, as
seen in C, C#, Golang, etc. The inability to cross function
boundaries means it can still pee on your shoes, but it probably
won't rip your face off.</p>
</div>
<p>In the end, modern languages are a bit less strict about this than
Dijkstra's original formulation. They'll let you break out of multiple
nested structures at once using constructs like <tt class="docutils literal">break</tt>,
<tt class="docutils literal">continue</tt>, or <tt class="docutils literal">return</tt>. But fundamentally, they're all designed
around Dijkstra's idea; even these constructs that push the boundaries
do so only in strictly limited ways. In particular, functions – which
are the fundamental tool for wrapping up control flow inside a black
box – are considered inviolate. You can't <tt class="docutils literal">break</tt> out of one
function and into another, and a <tt class="docutils literal">return</tt> can take you out of the
current function, but no further. Whatever control flow shenanigans a
function gets up to internally, other functions don't have to care.</p>
<p>This even extends to <tt class="docutils literal">goto</tt> itself. You'll find a few languages that
still have something they call <tt class="docutils literal">goto</tt>, like C, C#, Golang, ... but
they've added heavy restrictions. At the very least, they won't let
you jump out of one function body and into another. Unless you're
working in assembly <a class="footnote-reference" href="#id6" id="id2">[2]</a>, the classic, unrestricted <tt class="docutils literal">goto</tt> is gone.
Dijkstra won.</p>
</div>
<div class="section" id="a-surprise-benefit-removing-goto-statements-enables-new-features">
<h3><a class="toc-backref" href="#id13">A surprise benefit: removing <tt class="docutils literal">goto</tt> statements enables new features</a></h3>
<p>And once <tt class="docutils literal">goto</tt> disappeared, something interesting happened:
language designers were able to start adding features that depend on
control flow being structured.</p>
<p>For example, Python has some nice syntax for resource cleanup: the
<tt class="docutils literal">with</tt> statement. You can write things like:</p>
<div class="highlight"><pre><span></span><span class="c1"># Python</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"my-file"</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_handle</span><span class="p">:</span>
<span class="o">...</span>
</pre></div>
<p>and it guarantees that the file will be open during the <tt class="docutils literal">...</tt> code,
but then closed immediately afterward. Most modern languages have some
equivalent (RAII, <tt class="docutils literal">using</tt>, try-with-resource, <tt class="docutils literal">defer</tt>, ...). And
they all assume that control flows in an orderly, structured way. If
we used <tt class="docutils literal">goto</tt> to jump into the middle of our <tt class="docutils literal">with</tt> block... what
would that even do? Is the file open or not? What if we jumped out
again, instead of exiting normally? Would the file get closed? This
feature just doesn't work in any coherent way if your language has
<tt class="docutils literal">goto</tt> in it.</p>
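<p>By contrast, the <em>limited</em> jumps that modern languages do allow
play nicely with this feature. Here's a small demonstration: even if we
jump out of a <tt class="docutils literal">with</tt> block early by raising an
exception, the file still gets closed on the way out:</p>
<div class="highlight"><pre>import tempfile

f_ref = None
try:
    with tempfile.TemporaryFile(mode="w") as f:
        f_ref = f
        raise RuntimeError("jumping out of the block early")
except RuntimeError:
    pass

assert f_ref.closed  # the with block closed the file anyway
</pre></div>
<p>Because the exception unwinds the stack in an orderly way, the
<tt class="docutils literal">with</tt> block always knows when control is leaving
it. A <tt class="docutils literal">goto</tt> gives it no such chance.</p>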
<p>Error handling has a similar problem: when something goes wrong, what
should your code do? Often the answer is to pass the buck up the stack
to your code's caller, let them figure out how to deal with it. Modern
languages have constructs specifically to make this easier, like
exceptions, or other forms of <a class="reference external" href="https://doc.rust-lang.org/std/result/index.html#the-question-mark-operator-">automatic error propagation</a>.
But your language can only provide this help if it <em>has</em> a stack, and
a reliable concept of "caller". Look again at the control-flow
spaghetti in our FLOW-MATIC program and imagine that in the middle of
that it tried to raise an exception. Where would it even go?</p>
</div>
<div class="section" id="goto-statements-not-even-once">
<h3><a class="toc-backref" href="#id14"><tt class="docutils literal">goto</tt> statements: not even once</a></h3>
<p>So <tt class="docutils literal">goto</tt> – the traditional kind that ignores function boundaries –
isn't just the regular kind of bad feature, the kind that's hard to
use correctly. If it were, it might have survived – lots of bad
features have. But it's much worse.</p>
<p>Even if you don't use <tt class="docutils literal">goto</tt> yourself, merely having it as an option
in your language makes <em>everything</em> harder to use. Whenever you start
using a third-party library, you can't treat it as a black box – you
have to go read through it all to find out which functions are regular
functions, and which ones are idiosyncratic flow control constructs in
disguise. This is a serious obstacle to local reasoning. And you lose
powerful language features like reliable resource cleanup and
automatic error propagation. Better to remove <tt class="docutils literal">goto</tt> entirely, in
favor of control flow constructs that follow the "black box" rule.</p>
</div>
</div>
<div class="section" id="go-statement-considered-harmful">
<h2><a class="toc-backref" href="#id15"><tt class="docutils literal">go</tt> statement considered harmful</a></h2>
<p>So that's the history of <tt class="docutils literal">goto</tt>. Now, how much of this applies to
<tt class="docutils literal">go</tt> statements? Well... basically, all of it! The analogy turns out
to be shockingly exact.</p>
<p><strong>Go statements break abstraction.</strong> Remember how we said that if our
language allows <tt class="docutils literal">goto</tt>, then any function might be a <tt class="docutils literal">goto</tt> in
disguise? In most concurrency frameworks, <tt class="docutils literal">go</tt> statements cause the
exact same problem: whenever you call a function, it might or might
not spawn some background task. The function seemed to return, but is
it still running in the background? There's no way to know without
reading all its source code, transitively. When will it finish? Hard
to say. If you have <tt class="docutils literal">go</tt> statements, then functions are no longer
black boxes with respect to control flow. In my <a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/">first post on
concurrency APIs</a>,
I called this "violating causality", and found that it was the root
cause of many common, real-world issues in programs using asyncio and
Twisted, like problems with backpressure, problems with shutting down
properly, and so forth.</p>
<p><strong>Go statements break automatic resource cleanup.</strong> Let's look again
at that <tt class="docutils literal">with</tt> statement example:</p>
<div class="highlight"><pre><span></span><span class="c1"># Python</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"my-file"</span><span class="p">)</span> <span class="k">as</span> <span class="n">file_handle</span><span class="p">:</span>
<span class="o">...</span>
</pre></div>
<p>Before, we said that we were "guaranteed" that the file will be open
while the <tt class="docutils literal">...</tt> code is running, and then closed afterwards. But
what if the <tt class="docutils literal">...</tt> code spawns a background task? Then our guarantee
is lost: the operations that <em>look</em> like they're inside the <tt class="docutils literal">with</tt>
block might actually keep running <em>after</em> the <tt class="docutils literal">with</tt> block ends, and
then crash because the file gets closed while they're still using it.
And again, you can't tell from local inspection; to know if this is
happening you have to go read the source code to all the functions
called inside the <tt class="docutils literal">...</tt> code.</p>
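<p>To make this concrete, here's a sketch using a plain thread as our
"go statement" (the 0.2-second sleep just simulates background work
that outlives the block):</p>
<div class="highlight"><pre>import tempfile
import threading
import time

result = {}

def background_writer(f):
    time.sleep(0.2)  # simulate work that outlives the with block
    try:
        f.write("too late")
        result["ok"] = True
    except ValueError:  # "I/O operation on closed file"
        result["ok"] = False

with tempfile.TemporaryFile(mode="w") as f:
    t = threading.Thread(target=background_writer, args=(f,))
    t.start()  # a "go statement": control returns immediately
# the with block has already closed the file, but the thread still has it
t.join()
print(result)  # {'ok': False}
</pre></div>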
<p>If we want this code to work properly, we need to somehow keep track
of any background tasks, and manually arrange for the file to be
closed only when they're finished. It's doable – unless we're using
some library that doesn't provide any way to get notified when the
task is finished, which is distressingly common (e.g. because it
doesn't expose any task handle that you can join on). But even in the
best case, the unstructured control flow means the language can't help
us. We're back to implementing resource cleanup by hand, like in the
bad old days.</p>
<p><strong>Go statements break error handling.</strong> Like we discussed above,
modern languages provide powerful tools like exceptions to help us
make sure that errors are detected and propagated to the right place.
But these tools depend on having a reliable concept of "the current
code's caller". As soon as you spawn a task or register a callback,
that concept is broken. As a result, every mainstream concurrency
framework I know of simply gives up. If an error occurs in a
background task, and you don't handle it manually, then the runtime
just... drops it on the floor and crosses its fingers that it wasn't
too important. If you're lucky it might print something on the
console. (The only other software I've used that thinks "print
something and keep going" is a good error handling strategy is grotty
old Fortran libraries, but here we are.) Even Rust – the language
voted Most Obsessed With Threading Correctness by its high school
class – is guilty of this. If a background thread panics, Rust
<a class="reference external" href="https://doc.rust-lang.org/std/thread/">discards the error and hopes for the best</a>.</p>
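<p>Python's threads behave the same way. Here's a sketch that records
what the runtime does with a background error, instead of letting it
print to the console:</p>
<div class="highlight"><pre>import threading

caught = []
# By default Python just prints the traceback to stderr and moves on;
# override the hook so we can record what happens instead.
threading.excepthook = lambda args: caught.append(args.exc_type)

def background_task():
    raise RuntimeError("oops")

t = threading.Thread(target=background_task)
t.start()
t.join()  # returns normally -- the exception is NOT re-raised here

print(caught == [RuntimeError])  # True: the error went to a hook,
                                 # not to any caller
</pre></div>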
<p>Of course you <em>can</em> handle errors properly in these systems, by
carefully making sure to join every thread, or by building your own
error propagation mechanism like <a class="reference external" href="https://twistedmatrix.com/documents/current/core/howto/defer.html#visual-explanation">errbacks in Twisted</a>
or <a class="reference external" href="https://hackernoon.com/promises-and-error-handling-4a11af37cb0e">Promise.catch in Javascript</a>.
But now you're writing an ad-hoc, fragile reimplementation of the
features your language already has. You've lost useful stuff like
"tracebacks" and "debuggers". All it takes is forgetting to call
<tt class="docutils literal">Promise.catch</tt> once and suddenly you're dropping serious errors on
the floor without even realizing. And even if you do somehow solve all
these problems, you'll still end up with two redundant systems for
doing the same thing.</p>
<div class="section" id="go-statements-not-even-once">
<h3><a class="toc-backref" href="#id16"><tt class="docutils literal">go</tt> statements: not even once</a></h3>
<p>Just like <tt class="docutils literal">goto</tt> was the obvious primitive for the first practical
high-level languages, <tt class="docutils literal">go</tt> was the obvious primitive for the first
practical concurrency frameworks: it matches how the underlying
schedulers actually work, and it's powerful enough to implement any
other concurrent flow pattern. But again like <tt class="docutils literal">goto</tt>, it breaks
control flow abstractions, so that merely having it as an option in
your language makes everything harder to use.</p>
<p>The good news, though, is that these problems can all be solved:
Dijkstra showed us how! We need to:</p>
<ul class="simple">
<li>Find a replacement for <tt class="docutils literal">go</tt> statements that has similar power, but
follows the "black box rule",</li>
<li>Build that new construct into our concurrency framework as a
primitive, and don't include any form of <tt class="docutils literal">go</tt> statement.</li>
</ul>
<p>And that's what Trio did.</p>
</div>
</div>
<div class="section" id="nurseries-a-structured-replacement-for-go-statements">
<h2><a class="toc-backref" href="#id17">Nurseries: a structured replacement for <tt class="docutils literal">go</tt> statements</a></h2>
<p>Here's the core idea: every time our control splits into multiple
concurrent paths, we want to make sure that they join up again. So for
example, if we want to do three things at the same time, our control
flow should look something like this:</p>
<object class="align-center" data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/nursery-schematic-unlabeled.svg" style="width: 250px;" type="image/svg+xml"></object>
<p>Notice that this has just one arrow going in the top and one coming
out the bottom, so it follows Dijkstra's black box rule. Now, how can
we turn this sketch into a concrete language construct? There are some
existing constructs that meet this constraint, but (a) my proposal is
slightly different than all the ones I'm aware of and has advantages
over them (especially in the context of wanting to make this a
standalone primitive), and (b) the concurrency literature is vast and
complicated, and trying to pick apart all the history and tradeoffs
would totally derail the argument, so I'm going to defer that to a
separate post. Here, I'll just focus on explaining my solution. But
please be aware that I'm not claiming to have like, invented the idea
of concurrency or something, this draws inspiration from many sources,
I'm standing on the shoulders of giants, etc. <a class="footnote-reference" href="#id7" id="id3">[3]</a></p>
<p>Anyway, here's how we're going to do it: first, we declare that a
parent task cannot start any child tasks unless it first creates a
place for the children to live: a <em>nursery</em>. It does this by opening a
<em>nursery block</em>; in Trio, we do this using Python's <tt class="docutils literal">async with</tt>
syntax:</p>
<object class="align-center" data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/nursery-1-pathified.svg" style="width: 350px;" type="image/svg+xml"></object>
<p>Opening a nursery block automatically creates an object representing
this nursery, and the <tt class="docutils literal">as nursery</tt> syntax assigns this object to the
variable named <tt class="docutils literal">nursery</tt>. Then we can use the nursery object's
<tt class="docutils literal">start_soon</tt> method to start concurrent tasks: in this case, one
task calling the function <tt class="docutils literal">myfunc</tt>, and another calling the function
<tt class="docutils literal">anotherfunc</tt>. Conceptually, these tasks execute <em>inside</em> the
nursery block. In fact, it's often convenient to think of the code
written inside the nursery block as being an initial task that's
automatically started when the block is created.</p>
<object class="align-center" data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/nursery-2-pathified.svg" style="width: 500px;" type="image/svg+xml"></object>
<p>Crucially, the nursery block doesn't exit until all the tasks inside
it have exited – if the parent task reaches the end of the block
before all the children are finished, then it pauses there and waits
for them. The nursery automatically expands to hold the children.</p>
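<p>In Trio this looks like the code shown in the diagrams; a similar
construct later appeared in Python's standard library as
<tt class="docutils literal">asyncio.TaskGroup</tt> (Python 3.11+). Here's a
minimal runnable sketch of the "block waits for its children" behavior,
using that stdlib version:</p>
<div class="highlight"><pre>import asyncio

events = []

async def myfunc():
    await asyncio.sleep(0.02)
    events.append("myfunc done")

async def anotherfunc():
    await asyncio.sleep(0.01)
    events.append("anotherfunc done")

async def main():
    async with asyncio.TaskGroup() as tg:  # open the "nursery"
        tg.create_task(myfunc())
        tg.create_task(anotherfunc())
    # we can only get here once BOTH children have finished
    events.append("block exited")

asyncio.run(main())
print(events)
# ['anotherfunc done', 'myfunc done', 'block exited']
</pre></div>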
<p>Here's the control flow: you can see how it matches the basic pattern
we showed at the beginning of this section:</p>
<object class="align-center" data="https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/nursery-3-pathified.svg" style="width: 600px;" type="image/svg+xml"></object>
<p>This design has a number of consequences, not all of which are
obvious. Let's think through some of them.</p>
<div class="section" id="nurseries-preserve-the-function-abstraction">
<h3><a class="toc-backref" href="#id18">Nurseries preserve the function abstraction.</a></h3>
<p>The fundamental problem with <tt class="docutils literal">go</tt> statements is that when you call a
function, you don't know whether it's going to spawn some background
task that keeps running after it's finished. With nurseries, you don't
have to worry about this: any function can open a nursery and run
multiple concurrent tasks, but the function can't return until they've
all finished. So when a function does return, you know it's really
done.</p>
</div>
<div class="section" id="nurseries-support-dynamic-task-spawning">
<h3><a class="toc-backref" href="#id19">Nurseries support dynamic task spawning.</a></h3>
<p>Here's a simpler primitive that would also satisfy our flow control
diagram above. It takes a list of thunks, and runs them all
concurrently:</p>
<div class="highlight"><pre><span></span><span class="n">run_concurrently</span><span class="p">([</span><span class="n">myfunc</span><span class="p">,</span> <span class="n">anotherfunc</span><span class="p">])</span>
</pre></div>
<p>But the problem with this is that you have to know up front the
complete list of tasks you're going to run, which isn't always true.
For example, server programs generally have <tt class="docutils literal">accept</tt> loops that
take incoming connections and start a new task to handle each of them.
Here's a minimal <tt class="docutils literal">accept</tt> loop in Trio:</p>
<div class="highlight"><pre><span></span><span class="k">async</span> <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_nursery</span><span class="p">()</span> <span class="k">as</span> <span class="n">nursery</span><span class="p">:</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="n">incoming_connection</span> <span class="o">=</span> <span class="k">await</span> <span class="n">server_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
<span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">connection_handler</span><span class="p">,</span> <span class="n">incoming_connection</span><span class="p">)</span>
</pre></div>
<p>With nurseries, this is trivial, but implementing it using
<tt class="docutils literal">run_concurrently</tt> would be <em>much</em> more awkward. And if you wanted
to, it would be easy to implement <tt class="docutils literal">run_concurrently</tt> on top of
nurseries – but it's not really necessary, since in the simple cases
<tt class="docutils literal">run_concurrently</tt> can handle, the nursery notation is just as
readable.</p>
</div>
<div class="section" id="there-is-an-escape">
<h3><a class="toc-backref" href="#id20">There is an escape.</a></h3>
<p>The nursery object also gives us an escape hatch. What if you really
do need to write a function that spawns a background task, where the
background task outlives the function itself? Easy: pass the function
a nursery object. There's no rule that only the code directly inside
the <tt class="docutils literal">async with open_nursery()</tt> block can call
<tt class="docutils literal">nursery.start_soon</tt> – so long as the nursery block remains open
<a class="footnote-reference" href="#id8" id="id4">[4]</a>, then anyone who acquires a reference to the nursery object gets
the capability of spawning tasks into that nursery. You can pass it in
as a function argument, send it through a queue, whatever.</p>
<p>In practice, this means that you can write functions that "break the
rules", but within limits:</p>
<ul class="simple">
<li>Since nursery objects have to be passed around explicitly, you can
immediately identify which functions violate normal flow control by
looking at their call sites, so local reasoning is still possible.</li>
<li>Any tasks the function spawns are still bound by the lifetime of the
nursery that was passed in.</li>
<li>And the calling code can only pass in nursery objects that it itself
has access to.</li>
</ul>
<p>So this is still very different from the traditional model where any
code can at any moment spawn a background task with unbounded
lifetime.</p>
<p>One place this is useful is in the proof that nurseries have
equivalent expressive power to <tt class="docutils literal">go</tt> statements, but this post is
already long enough so I'll leave that for another day.</p>
</div>
<div class="section" id="you-can-define-new-types-that-quack-like-a-nursery">
<h3><a class="toc-backref" href="#id21">You can define new types that quack like a nursery.</a></h3>
<p>The standard nursery semantics provide a solid foundation, but
sometimes you want something different. Perhaps you're envious of
Erlang and its supervisors, and want to define a nursery-like class
that handles exceptions by restarting the child task. That's totally
possible, and to your users, it'll look just like a regular nursery:</p>
<div class="highlight"><pre><span></span><span class="k">async</span> <span class="k">with</span> <span class="n">my_supervisor_library</span><span class="o">.</span><span class="n">open_supervisor</span><span class="p">()</span> <span class="k">as</span> <span class="n">nursery_alike</span><span class="p">:</span>
<span class="n">nursery_alike</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
</pre></div>
<p>If you have a function that takes a nursery as an argument, then you
can pass it one of these instead to control the error-handling policy
for the tasks it spawns. Pretty nifty. But there is one subtlety here
that pushes Trio towards different conventions than asyncio or some
other libraries: it means that <tt class="docutils literal">start_soon</tt> has to take a function,
not a coroutine object or a <tt class="docutils literal">Future</tt>. (You can call a function
multiple times, but there's no way to restart a coroutine object or a
<tt class="docutils literal">Future</tt>.) I think this is the better convention anyway for a number
of reasons (especially since Trio doesn't even have <tt class="docutils literal">Future</tt>s!),
but still, worth mentioning.</p>
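<p>To illustrate, here's a toy supervisor with a nursery-like interface: the same <tt class="docutils literal">start_soon</tt>-style spawning and the same wait-for-children-on-exit guarantee, but crashed children get retried instead of taking down the whole scope. This is an invented sketch built on plain <tt class="docutils literal">asyncio</tt>, not real Erlang-style supervision:</p>

```python
import asyncio

class ToySupervisor:
    # A nursery-alike: the async-with block can't exit until all
    # children are done, but children that crash are retried.
    def __init__(self, max_restarts=3):
        self.max_restarts = max_restarts
        self._tasks = []

    def start_soon(self, fn, *args):
        # Takes a *function*, not a coroutine object -- that's what
        # makes restarting possible.
        self._tasks.append(asyncio.ensure_future(self._supervise(fn, *args)))

    async def _supervise(self, fn, *args):
        for attempt in range(self.max_restarts + 1):
            try:
                return await fn(*args)
            except Exception:
                if attempt == self.max_restarts:
                    raise  # out of retries; propagate

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        # like a nursery: never leave the block before the children do
        await asyncio.gather(*self._tasks)

async def main():
    failures = []

    async def flaky():
        if len(failures) < 2:
            failures.append("crash")
            raise RuntimeError("transient failure")
        return "ok"

    async with ToySupervisor() as nursery_alike:
        nursery_alike.start_soon(flaky)
    return failures

print(asyncio.run(main()))
```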
</div>
<div class="section" id="no-really-nurseries-always-wait-for-the-tasks-inside-to-exit">
<h3><a class="toc-backref" href="#id22">No, really, nurseries <em>always</em> wait for the tasks inside to exit.</a></h3>
<p>It's also worth talking about how task cancellation and task joining
interact, since there are some subtleties here that could – if handled
incorrectly – break the nursery invariants.</p>
<p>In Trio, it's possible for code to receive a cancellation request at
any time. After a cancellation is requested, then the next time the
code executes a "checkpoint" operation (<a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints">details</a>),
a <tt class="docutils literal">Cancelled</tt> exception is raised. This means that there's a gap
between when a cancellation is <em>requested</em> and when it actually
<em>happens</em> – it might be a while before the task executes a checkpoint,
and then after that the exception has to unwind the stack, run cleanup
handlers, etc. When this happens, the nursery always waits for the
full cleanup to happen. We <em>never</em> terminate a task without giving it
a chance to run cleanup handlers, and we <em>never</em> leave a task to run
unsupervised outside of the nursery, even if it's in the process of
being cancelled.</p>
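<p>The gap between cancellation being <em>requested</em> and actually <em>happening</em> fits in a few lines of code. This sketch uses stdlib <tt class="docutils literal">asyncio</tt>, where cancellation is likewise delivered as an exception at an await point, to illustrate the semantics described above:</p>

```python
import asyncio

async def worker(log):
    try:
        await asyncio.sleep(10)     # cancellation lands at this checkpoint
    finally:
        log.append("cleanup ran")   # always gets a chance to run

async def main():
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0.01)
    task.cancel()                   # *request* cancellation...
    try:
        await task                  # ...then wait for unwinding and cleanup
    except asyncio.CancelledError:
        pass
    log.append("joined after cleanup")
    return log

print(asyncio.run(main()))
```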
</div>
<div class="section" id="automatic-resource-cleanup-works">
<h3><a class="toc-backref" href="#id23">Automatic resource cleanup works.</a></h3>
<p>Because nurseries follow the black box rule, they make <tt class="docutils literal">with</tt> blocks
work again. There's no chance that, say, closing a file at the end of
a <tt class="docutils literal">with</tt> block will accidentally break a background task that's
still using that file.</p>
</div>
<div class="section" id="automated-error-propagation-works">
<h3><a class="toc-backref" href="#id24">Automated error propagation works.</a></h3>
<p>As noted above, in most concurrency systems, unhandled errors in
background tasks are simply discarded. There's literally nothing else
to do with them.</p>
<p>In Trio, since every task lives inside a nursery, and every nursery is
part of a parent task, and parent tasks are required to wait for the
tasks inside the nursery... we <em>do</em> have something we can do with
unhandled errors. If a background task terminates with an exception,
we can rethrow it in the parent task. The intuition here is that a
nursery is something like a "concurrent call" primitive: we can think
of our example above as calling <tt class="docutils literal">myfunc</tt> and <tt class="docutils literal">anotherfunc</tt> at the
same time, so our call stack has become a tree. And exceptions
propagate up this call tree towards the root, just like they propagate
up a regular call stack.</p>
<p>There is one subtlety here though: when we re-raise an exception in
the parent task, it will start propagating in the parent task.
Generally, that means that the parent task will exit the nursery
block. But we've already said that the parent task cannot leave the
nursery block while there are still child tasks running. So what do we
do?</p>
<p>The answer is that when an unhandled exception occurs in a child, Trio
immediately cancels all the other tasks in the same nursery, and then
waits for them to finish before re-raising the exception. The
intuition here is that exceptions cause the stack to unwind, and if we
want to unwind past a branch point in our stack tree, we need to
unwind the other branches, by cancelling them.</p>
<p>This does mean though that if you want to implement nurseries in your
language, you may need some kind of integration between the nursery
code and your cancellation system. This might be tricky if you're
using a language like C# or Golang where cancellation is usually
managed through manual object passing and convention, or (even worse)
one that doesn't have a generic cancellation mechanism.</p>
</div>
<div class="section" id="a-surprise-benefit-removing-go-statements-enables-new-features">
<h3><a class="toc-backref" href="#id25">A surprise benefit: removing <tt class="docutils literal">go</tt> statements enables new features</a></h3>
<p>Eliminating <tt class="docutils literal">goto</tt> allowed previous language designers to make
stronger assumptions about the structure of programs, which enabled
new features like <tt class="docutils literal">with</tt> blocks and exceptions; eliminating <tt class="docutils literal">go</tt>
statements has a similar effect. For example:</p>
<ul class="simple">
<li>Trio's cancellation system is easier to use and more reliable than
competitors, because it can assume that tasks are nested in a
regular tree structure; see <a class="reference external" href="https://vorpus.org/blog/timeouts-and-cancellation-for-humans/">Timeouts and cancellation for humans</a>
for a full discussion.</li>
<li>Trio is the only Python concurrency library where control-C works
the way Python developers expect (<a class="reference external" href="https://vorpus.org/blog/control-c-handling-in-python-and-trio/">details</a>).
This would be impossible without nurseries providing a reliable
mechanism for propagating exceptions.</li>
</ul>
</div>
</div>
<div class="section" id="nurseries-in-practice">
<h2><a class="toc-backref" href="#id26">Nurseries in practice</a></h2>
<p>So that's the theory. How's it work in practice?</p>
<p>Well... that's an empirical question: you should try it and find out!
But seriously, we just won't know for sure until lots of people have
pounded on it. At this point I'm pretty confident that the foundation
is sound, but maybe we'll realize we need to make some tweaks, like
how the early structured programming advocates eventually backed off
from eliminating <tt class="docutils literal">break</tt> and <tt class="docutils literal">continue</tt>.</p>
<p>And if you're an experienced concurrent programmer who's just learning
Trio, then you should expect to find it a bit rocky at times. You'll
have to <a class="reference external" href="https://stackoverflow.com/questions/48282841/in-trio-how-can-i-have-a-background-task-that-lives-as-long-as-my-object-does">learn new ways to do things</a>
– just like programmers in the 1970s found it challenging to learn how
to write code without <tt class="docutils literal">goto</tt>.</p>
<p>But of course, that's the point. As Knuth wrote (<a class="reference external" href="https://scholar.google.com/scholar?cluster=17147143327681396418&hl=en&as_sdt=0,5">Knuth,
1974</a>,
p. 275):</p>
<blockquote>
Probably the worst mistake any one can make with respect to the
subject of <strong>go to</strong> statements is to assume that "structured
programming" is achieved by writing programs as we always have and
then eliminating the <strong>go to</strong>'s. Most <strong>go to</strong>'s shouldn't be
there in the first place! What we really want is to conceive of our
program in such a way that we rarely even <em>think</em> about <strong>go to</strong>
statements, because the real need for them hardly ever arises. The
language in which we express our ideas has a strong influence on
our thought processes. Therefore, Dijkstra asks for more new
language features – structures which encourage clear thinking – in
order to avoid the <strong>go to</strong>'s temptations towards complications.</blockquote>
<p>And so far, that's been my experience with using nurseries: they
encourage clear thinking. They lead to designs that are more robust,
easier to use, and just better all around. And the limitations
actually make it easier to solve problems, because you spend less time
being tempted towards unnecessary complications. Using Trio has, in a
very real sense, taught me to be a better programmer.</p>
<p>For example, consider the Happy Eyeballs algorithm (<a class="reference external" href="https://tools.ietf.org/html/rfc8305">RFC 8305</a>), which is a simple
concurrent algorithm for speeding up the establishment of TCP
connections. Conceptually, the algorithm isn't complicated – you race
several connection attempts against each other, with a staggered start
to avoid overloading the network. But if you look at <a class="reference external" href="https://github.com/twisted/twisted/compare/trunk...glyph:statemachine-hostnameendpoint">Twisted's best
implementation</a>,
it's almost 600 lines of Python, and still has <a class="reference external" href="https://twistedmatrix.com/trac/ticket/9345">at least one logic
bug</a>. The equivalent in
Trio is more than <strong>15x</strong> shorter. More importantly, using Trio I was
able to write it in minutes instead of months, and I got the logic
correct on my first try. I never could have done this in any other
framework, even ones where I have much more experience. For more
details, you can <a class="reference external" href="https://www.youtube.com/watch?v=i-R704I8ySE">watch my talk at Pyninsula last month</a>. Is this typical?
Time will tell. But it's certainly promising.</p>
</div>
<div class="section" id="conclusion">
<h2><a class="toc-backref" href="#id27">Conclusion</a></h2>
<p>The popular concurrency primitives – <tt class="docutils literal">go</tt> statements, thread
spawning functions, callbacks, futures, promises, ... they're all
variants on <tt class="docutils literal">goto</tt>, in theory and in practice. And not even the
modern domesticated <tt class="docutils literal">goto</tt>, but the old-testament fire-and-brimstone
<tt class="docutils literal">goto</tt>, that could leap across function boundaries. These primitives
are dangerous even if we don't use them directly, because they
undermine our ability to reason about control flow and compose complex
systems out of abstract modular parts, and they interfere with useful
language features like automatic resource cleanup and error
propagation. Therefore, like <tt class="docutils literal">goto</tt>, they have no place in a modern
high-level language.</p>
<p>Nurseries provide a safe and convenient alternative that preserves the
full power of your language, enables powerful new features (as
demonstrated by Trio's cancellation scopes and control-C handling),
and can produce dramatic improvements in readability, productivity,
and correctness.</p>
<p>Unfortunately, to fully capture these benefits, we do need to remove
the old primitives entirely, and this probably requires building new
concurrency frameworks from scratch – just like eliminating <tt class="docutils literal">goto</tt>
required designing new languages. But as impressive as FLOW-MATIC was
for its time, most of us are glad that we've upgraded to something
better. I don't think we'll regret switching to nurseries either, and
Trio demonstrates that this is a viable design for practical,
general-purpose concurrency frameworks.</p>
</div>
<div class="section" id="comments">
<h2><a class="toc-backref" href="#id28">Comments</a></h2>
<p>You can <a class="reference external" href="https://trio.discourse.group/t/discussion-thread-notes-on-structured-concurrency-or-go-statement-considered-harmful/25">discuss this post on the Trio forum</a>.</p>
<!-- XX: Add this back once there are articles to link to that aren't
already linked multiple times above
Further reading
===============
This post has the heart of what I'm trying to do with Trio, but it
hardly covers everything, so here are some articles that explore other
facets of its core design. I'll try to keep this list updated as
* `Timeouts and cancellation for humans
<https://vorpus.org/blog/timeouts-and-cancellation-for-humans/>`__:
Discussion of Trio's cancellation system, whose design is deeply
entwined with nurseries.
* `Control-C handling in Python and Trio
<https://vorpus.org/blog/control-c-handling-in-python-and-trio/>`__:
A deep dive into signal handling in Python, and all the magic
that lets most users forget about it entirely.
* `Some thoughts on asynchronous API design in a post-async/await
world
<https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/>`__:
My first post on concurrent API design. Probably mostly of
historical interest at this point? Some nice concrete discussion of
the problems caused by unrestricted ``go`` statements, and you can
see me groping towards the ideas in this essay, in case you're into
that kind of thing. You can even see me using ``goto`` as an
analogy, without realizing how relevant it actually was... -->
</div>
<div class="section" id="acknowledgments">
<h2><a class="toc-backref" href="#id29">Acknowledgments</a></h2>
<p>Many thanks to Graydon Hoare, Quentin Pradet, and Hynek Schlawack for
comments on drafts of this post. Any remaining errors, of course, are
all my fault.</p>
<p>Credits: Sample FLOW-MATIC code from <a class="reference external" href="http://archive.computerhistory.org/resources/text/Remington_Rand/Univac.Flowmatic.1957.102646140.pdf">this brochure</a>
(PDF), as <a class="reference external" href="http://www.computerhistory.org/collections/catalog/102646140">preserved by the Computer History Museum</a>.
<a class="reference external" href="https://www.flickr.com/photos/iam_photo/478178221">Wolves in Action</a>, by i:am.
photography / Martin Pannier, licensed under <a class="reference external" href="https://creativecommons.org/licenses/by-nc-sa/2.0/">CC-BY-SA 2.0</a>, cropped.
<a class="reference external" href="https://pixabay.com/en/french-bulldog-pet-dog-funny-2427629/">French Bulldog Pet Dog</a> by
Daniel Borker, released under the <a class="reference external" href="https://creativecommons.org/publicdomain/zero/1.0/">CC0 public domain dedication</a>.</p>
</div>
<div class="section" id="footnotes">
<h2><a class="toc-backref" href="#id30">Footnotes</a></h2>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td>At least for a certain kind of person.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id2">[2]</a></td><td>And WebAssembly even demonstrates that it's possible and at
least somewhat desirable to have a low-level assembly language
without <tt class="docutils literal">goto</tt>: <a class="reference external" href="https://www.w3.org/TR/wasm-core-1/#control-instructions%E2%91%A0">reference</a>,
<a class="reference external" href="https://github.com/WebAssembly/design/blob/master/Rationale.md#control-flow">rationale</a></td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id3">[3]</a></td><td>For those who can't possibly pay attention to the text without
first knowing whether I'm aware of their favorite paper, my
current list of topics to include in my review is: the
"parallel composition" operator in Cooperating/Communicating
Sequential Processes and Occam, the fork/join model, Erlang
supervisors, Martin Sústrik's article on <a class="reference external" href="http://250bpm.com/blog:71">Structured
concurrency</a> and work on <a class="reference external" href="https://github.com/sustrik/libdill">libdill</a>, and <a class="reference external" href="https://docs.rs/crossbeam/0.3.2/crossbeam/struct.Scope.html">crossbeam::scope</a>
/ <a class="reference external" href="https://docs.rs/rayon/1.0.1/rayon/fn.scope.html">rayon::scope</a> in Rust.
[Edit: I've also been pointed to the highly relevant
<a class="reference external" href="https://godoc.org/golang.org/x/sync/errgroup">golang.org/x/sync/errgroup</a> and
<a class="reference external" href="https://godoc.org/github.com/oklog/run">github.com/oklog/run</a> in Golang.] If I'm
missing anything important, <a class="reference external" href="mailto:njs@pobox.com">let me know</a>.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id4">[4]</a></td><td>If you call <tt class="docutils literal">start_soon</tt> <em>after</em> the nursery block has
exited, then <tt class="docutils literal">start_soon</tt> raises an error, and conversely, if
it doesn't raise an error, then the nursery block is guaranteed
to remain open until the task finishes. If you're implementing
your own nursery system then you'll want to handle
synchronization carefully here.</td></tr>
</tbody>
</table>
</div>
Timeouts and cancellation for humans (2018-01-11, Nathaniel J. Smith) tag:vorpus.org,2018-01-11:/blog/timeouts-and-cancellation-for-humans/<p class="first last">Timeouts are so hard to use that even <tt class="docutils literal">requests</tt> makes
them confusing. Let's talk about why, and how to fix it.</p>
<p><em>Your</em> code might be perfect and never fail, but unfortunately the
outside world is less reliable. Sometimes, other people's programs
crash or freeze. Networks go down; printers <a class="reference external" href="https://en.wikipedia.org/wiki/Lp0_on_fire">catch on fire</a>. Your code needs to be
prepared for this: every time you read from the network, attempt to
acquire an inter-process lock, or send an HTTP request, there are at
least three possibilities you need to think about:</p>
<ul class="simple">
<li>It might succeed.</li>
<li>It might fail.</li>
<li>It might hang forever, never succeeding or failing: days pass,
leaves fall, winter comes, yet still our request waits, yearning for
a response that will never come.</li>
</ul>
<!-- the 30 years quote is Pat Helland, Idempotence is not a Medical
condition, which I have in zotero -->
<!-- https://en.wikipedia.org/wiki/List_of_individual_dogs#Faithful_after_master.27s_death -->
<!-- "Jurassic Bark" is the futurama episode with the horrible horrible
ending. -->
<p>The first two are straightforward enough. To handle that last case,
though, you need timeouts. Pretty much every place your program
interacts with another program or person or system, it needs a
timeout, and if you don't have one, that's a latent bug.</p>
<p>Let's be honest: if you're like most developers, your code probably
has <em>tons</em> of bugs caused by missing timeouts. Mine certainly does.
And it's weird – since this need is so ubiquitous, and so fundamental
to doing I/O correctly, you'd think that every programming environment
would provide easy and robust ways to apply timeouts to arbitrary
operations. But... they don't. In fact, most timeout APIs are so
tedious and error-prone that it's just not practical for developers to
reliably get this right. So don't feel bad – it's not your fault your
code has all those timeout bugs, it's the fault of those I/O
libraries!</p>
<p>But now I'm, uh, <a class="reference external" href="https://trio.readthedocs.io">writing an I/O library</a>. And not just any I/O library, but
one whose whole selling point is that it's obsessed with being easy to
use. So I wanted to make sure that in my library – Trio – you can
easily and reliably apply timeouts to arbitrary I/O operations. But
designing a user-friendly timeout API is a surprisingly tricky task,
so in this blog post I'm going to do a deep dive into the landscape of
possible designs – and in particular the many precursors that inspired
me – and then explain what I came up with, and why I think it's a real
improvement on the old state-of-the-art. And finally, I'll discuss how
Trio's ideas could be applied more broadly, and in particular, I'll
demonstrate a prototype implementation for good old synchronous
Python.</p>
<p>So – what's so hard about timeout handling?</p>
<div class="contents topic" id="contents">
<p class="topic-title"><strong>Contents:</strong></p>
<ul class="simple">
<li><a class="reference internal" href="#simple-timeouts-don-t-support-abstraction" id="id11">Simple timeouts don't support abstraction</a></li>
<li><a class="reference internal" href="#absolute-deadlines-are-composable-but-kinda-annoying-to-use" id="id12">Absolute deadlines are composable (but kinda annoying to use)</a></li>
<li><a class="reference internal" href="#cancel-tokens" id="id13">Cancel tokens</a><ul>
<li><a class="reference internal" href="#cancel-tokens-encapsulate-cancellation-state" id="id14">Cancel tokens encapsulate cancellation state</a></li>
<li><a class="reference internal" href="#cancel-tokens-are-level-triggered-and-can-be-scoped-to-match-your-program-s-needs" id="id15">Cancel tokens are level-triggered and can be scoped to match your program's needs</a></li>
<li><a class="reference internal" href="#cancel-tokens-are-unreliable-in-practice-because-humans-are-lazy" id="id16">Cancel tokens are unreliable in practice because humans are lazy</a></li>
</ul>
</li>
<li><a class="reference internal" href="#cancel-scopes-trio-s-human-friendly-solution-for-timeouts-and-cancellation" id="id17">Cancel scopes: Trio's human-friendly solution for timeouts and cancellation</a><ul>
<li><a class="reference internal" href="#how-cancel-scopes-work" id="id18">How cancel scopes work</a></li>
<li><a class="reference internal" href="#where-do-we-check-for-cancellation" id="id19">Where do we check for cancellation?</a></li>
<li><a class="reference internal" href="#an-escape-hatch" id="id20">An escape hatch</a></li>
<li><a class="reference internal" href="#cancel-scopes-and-concurrency" id="id21">Cancel scopes and concurrency</a></li>
<li><a class="reference internal" href="#summary" id="id22">Summary</a></li>
</ul>
</li>
<li><a class="reference internal" href="#who-else-can-benefit-from-cancel-scopes" id="id23">Who else can benefit from cancel scopes?</a><ul>
<li><a class="reference internal" href="#synchronous-single-threaded-python" id="id24">Synchronous, single-threaded Python</a></li>
<li><a class="reference internal" href="#asyncio" id="id25">asyncio</a></li>
<li><a class="reference internal" href="#other-languages" id="id26">Other languages</a></li>
</ul>
</li>
<li><a class="reference internal" href="#now-go-forth-and-fix-your-timeout-bugs" id="id27">Now go forth and fix your timeout bugs!</a></li>
<li><a class="reference internal" href="#comments" id="id28">Comments</a></li>
</ul>
</div>
<div class="section" id="simple-timeouts-don-t-support-abstraction">
<h2><a class="toc-backref" href="#id11">Simple timeouts don't support abstraction</a></h2>
<p>The simplest and most obvious way to handle timeouts is to go through
each potentially-blocking function in your API, and give it a
<tt class="docutils literal">timeout</tt> argument. In the Python standard library you'll see this
in APIs like <tt class="docutils literal">threading.Lock.acquire</tt>:</p>
<pre class="literal-block">
lock = threading.Lock()
# Wait at most 10 seconds for the lock to become available
lock.acquire(timeout=10)
</pre>
<p>If you use the <tt class="docutils literal">socket</tt> module for networking, it works the same
way, except that the timeout is set on the socket object instead of
passed to every call:</p>
<pre class="literal-block">
sock = socket.socket()
# Set the timeout once
sock.settimeout(10)
# Wait at most 10 seconds to establish a connection to the remote host
sock.connect(...)
# Wait at most 10 seconds for data to arrive from the remote host
sock.recv(...)
</pre>
<p>This is a little more convenient than having to remember to pass in
explicit timeouts every time (and we'll discuss the convenience issue
more below) but it's important to understand that this is a purely
cosmetic change. The semantics are the same as we saw with
<tt class="docutils literal">threading.Lock</tt>: each method call gets its own separate 10 second
timeout.</p>
<p>So what's wrong with this? It seems straightforward enough. And if we
always wrote code directly against these low level APIs, then it would
probably be sufficient. But – programming is about abstraction. Say we
want to fetch a file from <a class="reference external" href="https://en.wikipedia.org/wiki/Amazon_S3">S3</a>. We might do that with
boto3, using <a class="reference external" href="https://botocore.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object">S3.Client.get_object</a>.
What does <tt class="docutils literal">S3.Client.get_object</tt> do? It makes a series of HTTP
requests to the S3 servers, by calling into the <a class="reference external" href="http://python-requests.org/">requests</a> library for each one. And then each
call to <tt class="docutils literal">requests</tt> internally makes a series of calls to the
<tt class="docutils literal">socket</tt> module to do the actual network communication <a class="footnote-reference" href="#id6" id="id1">[1]</a>.</p>
<p>From the user's point of view, these are three different APIs that
fetch data from a remote service:</p>
<div class="highlight"><pre><span></span><span class="n">s3client</span><span class="o">.</span><span class="n">get_object</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">)</span>
<span class="n">sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
</pre></div>
<p>Sure, they're at different levels of abstraction, but the whole idea
of abstracting away such details is that the user doesn't have to
care. So if our plan is to use <tt class="docutils literal">timeout=</tt> arguments everywhere, then
we should expect these each to take a <tt class="docutils literal">timeout=</tt> argument:</p>
<div class="highlight"><pre><span></span><span class="n">s3client</span><span class="o">.</span><span class="n">get_object</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
</pre></div>
<p>Now here's the problem: if this is how we're doing things, then
actually implementing these functions is a pain in the butt. Why?
Well, let's take a simplified example. When processing an HTTP response,
there comes a point when we've seen the <tt class="docutils literal"><span class="pre">Content-Length</span></tt> header, and
now we need to read that many bytes to fetch the actual response body.
So somewhere inside <tt class="docutils literal">requests</tt> there's a loop like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">read_body</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">content_length</span><span class="p">):</span>
    <span class="n">body</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">()</span>
    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span> <span class="o"><</span> <span class="n">content_length</span><span class="p">:</span>
        <span class="n">max_to_receive</span> <span class="o">=</span> <span class="n">content_length</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span>
        <span class="n">body</span> <span class="o">+=</span> <span class="n">sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">max_to_receive</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span> <span class="o">==</span> <span class="n">content_length</span>
    <span class="k">return</span> <span class="n">body</span>
</pre></div>
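<p>(This loop can be exercised with a toy in-memory "socket" that dribbles data out in small chunks, the way a real network peer might. <tt class="docutils literal">FakeSock</tt> is my invention, purely for illustration:)</p>

```python
class FakeSock:
    # Toy stand-in for a socket that returns data in small chunks,
    # the way a real network peer might.
    def __init__(self, data, chunk_size=3):
        self._data = data
        self._chunk_size = chunk_size

    def recv(self, max_bytes):
        n = min(max_bytes, self._chunk_size)
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk

def read_body(sock, content_length):
    # Same loop as above: keep calling recv until we've accumulated
    # content_length bytes.
    body = bytearray()
    while len(body) < content_length:
        max_to_receive = content_length - len(body)
        body += sock.recv(max_to_receive)
    assert len(body) == content_length
    return body
```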
<p>Now we'll modify this loop to add timeout support. We want to be able
to say "I'm willing to wait at most 10 seconds to read the response
body". But we can't just pass the timeout argument through to
<tt class="docutils literal">recv</tt>, because imagine the first call to <tt class="docutils literal">recv</tt> takes 6 seconds –
now for our overall operation to complete in 10 seconds, our second
<tt class="docutils literal">recv</tt> call has to be given a timeout of 4 seconds. With the
<tt class="docutils literal">timeout=</tt> approach, every time we pass between levels of
abstraction we need to write some annoying gunk to recalculate
timeouts:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">read_body</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">content_length</span><span class="p">,</span> <span class="n">timeout</span><span class="p">):</span>
<span class="hll">    <span class="n">read_body_deadline</span> <span class="o">=</span> <span class="n">timeout</span> <span class="o">+</span> <span class="n">time</span><span class="o">.</span><span class="n">monotonic</span><span class="p">()</span>
</span>    <span class="n">body</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">()</span>
    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span> <span class="o"><</span> <span class="n">content_length</span><span class="p">:</span>
        <span class="n">max_to_receive</span> <span class="o">=</span> <span class="n">content_length</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span>
<span class="hll">        <span class="n">recv_timeout</span> <span class="o">=</span> <span class="n">read_body_deadline</span> <span class="o">-</span> <span class="n">time</span><span class="o">.</span><span class="n">monotonic</span><span class="p">()</span>
</span><span class="hll">        <span class="n">body</span> <span class="o">+=</span> <span class="n">sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">max_to_receive</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="n">recv_timeout</span><span class="p">)</span>
</span>    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span> <span class="o">==</span> <span class="n">content_length</span>
    <span class="k">return</span> <span class="n">body</span>
</pre></div>
<p>(And even this is actually simplified because we're pretending that
<tt class="docutils literal">sock.recv</tt> takes a <tt class="docutils literal">timeout</tt> argument – if you wanted to do this for
real you'd have to call <tt class="docutils literal">settimeout</tt> before every socket method, and
then probably use some <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> thing to set it back or
else risk confusing some other part of your program.)</p>
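<p>(To make that caveat concrete, here's a sketch of the kind of helper you'd need with the stdlib <tt class="docutils literal">socket</tt> module – the helper name is my invention, not a real API:)</p>

```python
import socket

def recv_with_timeout(sock, max_bytes, timeout):
    # Hypothetical helper: the stdlib socket module has no per-call
    # timeout argument, so we temporarily override the socket-wide
    # setting and restore it afterwards.
    old_timeout = sock.gettimeout()
    sock.settimeout(timeout)
    try:
        return sock.recv(max_bytes)
    finally:
        # Put the old timeout back so other code sharing this socket
        # isn't confused by our change.
        sock.settimeout(old_timeout)
```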
<p>In practice, nobody does this – all the higher-level Python libraries
I know of that take <tt class="docutils literal">timeout=</tt> arguments just pass them through
unchanged to the lower layers. And this breaks abstraction. For
example, here are two popular Python APIs you might use today, and
they look like they take similar <tt class="docutils literal">timeout=</tt> arguments:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">threading</span>
<span class="n">lock</span> <span class="o">=</span> <span class="n">threading</span><span class="o">.</span><span class="n">Lock</span><span class="p">()</span>
<span class="n">lock</span><span class="o">.</span><span class="n">acquire</span><span class="p">(</span><span class="n">timeout</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
</pre></div>
<p>But in fact these two <tt class="docutils literal">timeout=</tt> arguments mean totally different
things. The first one means "try to acquire the lock, but give up
after 10 seconds". The second one means "try to fetch the given URL,
but give up if at any point any individual low-level socket operation
takes more than 10 seconds". Probably the whole reason you're using
<tt class="docutils literal">requests</tt> is that you don't want to think about low-level sockets,
but sorry, you have to anyway. In fact it is currently <strong>not
possible</strong> to guarantee that <tt class="docutils literal">requests.get</tt> will return in <strong>any</strong>
finite time: if a malicious or misbehaving server sends at least 1
byte every 10 seconds, then our <tt class="docutils literal">requests</tt> call above will keep
resetting its timeout over and over and never return.</p>
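<p>(A quick back-of-the-envelope model makes the problem vivid – the numbers below are illustrative, not measurements of requests itself:)</p>

```python
def worst_case_wait(per_op_timeout, seconds_per_byte, body_bytes):
    # Model a trickling server: it sends 1 byte every
    # `seconds_per_byte` seconds. As long as that interval stays under
    # `per_op_timeout`, no individual recv ever times out, so the only
    # bound on the total wait is proportional to the body size.
    assert seconds_per_byte < per_op_timeout
    return seconds_per_byte * body_bytes

# With a 10 second per-operation timeout, a server trickling 1 byte
# every 9 seconds, and a 1 MB body, we'd wait 9,000,000 seconds
# (over 100 days) without a single timeout ever firing.
total_seconds = worst_case_wait(10, 9, 1_000_000)
```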
<p>I don't mean to pick on <tt class="docutils literal">requests</tt> here – this problem is everywhere
in Python APIs. I'm using <tt class="docutils literal">requests</tt> as the example because Kenneth
Reitz is famous for his obsession with making its API as obvious and
intuitive as possible, and this is one of the rare places where he's
failed. I think this is the only part of the requests API that gets a
<a class="reference external" href="http://docs.python-requests.org/en/master/user/quickstart/#timeouts">big box in the documentation warning you that it's counterintuitive</a>.
So like... if even Kenneth Reitz can't get this right, I think we can
conclude that "just slap a <tt class="docutils literal">timeout=</tt> argument on it" does not lead
to APIs fit for human consumption.</p>
</div>
<div class="section" id="absolute-deadlines-are-composable-but-kinda-annoying-to-use">
<h2><a class="toc-backref" href="#id12">Absolute deadlines are composable (but kinda annoying to use)</a></h2>
<p>If <tt class="docutils literal">timeout=</tt> arguments don't work, what can we do instead? Well,
here's one option that some people advocate. Notice how in our
<tt class="docutils literal">read_body</tt> example above, we converted the incoming relative
timeout ("10 seconds from the moment I called this function") into an
absolute deadline ("when the clock reads 12:01:34.851"), and then
converted back before each socket call. This code would get simpler if
we wrote the whole API in terms of <tt class="docutils literal">deadline=</tt> arguments, instead of
<tt class="docutils literal">timeout=</tt> arguments. This makes things simple for library
implementors, because you can just pass the deadline down your
abstraction stack:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">read_body</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">content_length</span><span class="p">,</span> <span class="n">deadline</span><span class="p">):</span>
    <span class="n">body</span> <span class="o">=</span> <span class="nb">bytearray</span><span class="p">()</span>
    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span> <span class="o"><</span> <span class="n">content_length</span><span class="p">:</span>
        <span class="n">max_to_receive</span> <span class="o">=</span> <span class="n">content_length</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span>
        <span class="n">body</span> <span class="o">+=</span> <span class="n">sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">max_to_receive</span><span class="p">,</span> <span class="n">deadline</span><span class="o">=</span><span class="n">deadline</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">body</span><span class="p">)</span> <span class="o">==</span> <span class="n">content_length</span>
    <span class="k">return</span> <span class="n">body</span>

<span class="c1"># Wait 10 seconds total for the response body to be downloaded</span>
<span class="n">deadline</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">monotonic</span><span class="p">()</span> <span class="o">+</span> <span class="mi">10</span>
<span class="n">read_body</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">content_length</span><span class="p">,</span> <span class="n">deadline</span><span class="p">)</span>
</pre></div>
<p>(A well-known API that works like this is <a class="reference external" href="https://golang.org/pkg/net/#Conn">Go's socket layer</a>.)</p>
<p>But this approach also has a downside: it succeeds in moving the
annoying bit out of the library internals, and instead puts it on
the person using the API. At the outermost level where timeout policy
is being set, your library's users probably want to say something like
"give up after 10 seconds", and if all you take is a <tt class="docutils literal">deadline=</tt>
argument then they have to do the conversion by hand every time. Or
you could have every function take both <tt class="docutils literal">timeout=</tt> and <tt class="docutils literal">deadline=</tt>
arguments, but then you need some boilerplate in every function to
normalize them, raise an error if both are specified, and so forth.
Deadlines are an improvement over raw timeouts, but it feels like
there's still some missing abstraction here.</p>
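<p>(For concreteness, here's roughly what that normalization boilerplate looks like – the function name is hypothetical, but every function in such an API would need something like it:)</p>

```python
import time

def normalize_deadline(deadline=None, timeout=None):
    # Hypothetical boilerplate: accept a relative timeout or an
    # absolute deadline, but complain if the caller passes both.
    if deadline is not None and timeout is not None:
        raise ValueError("pass either deadline= or timeout=, not both")
    if timeout is not None:
        # Convert relative timeout to an absolute deadline.
        return time.monotonic() + timeout
    return deadline  # may be None, meaning "no limit"
```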
</div>
<div class="section" id="cancel-tokens">
<h2><a class="toc-backref" href="#id13">Cancel tokens</a></h2>
<div class="section" id="cancel-tokens-encapsulate-cancellation-state">
<h3><a class="toc-backref" href="#id14">Cancel tokens encapsulate cancellation state</a></h3>
<p>Here's the missing abstraction: instead of supporting two different
arguments:</p>
<div class="highlight"><pre><span></span><span class="c1"># What users do:</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=...</span><span class="p">)</span>

<span class="c1"># What libraries do:</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">deadline</span><span class="o">=...</span><span class="p">)</span>

<span class="c1"># How we implement it:</span>
<span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">deadline</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="n">deadline</span> <span class="o">=</span> <span class="n">normalize_deadline</span><span class="p">(</span><span class="n">deadline</span><span class="p">,</span> <span class="n">timeout</span><span class="p">)</span>
    <span class="o">...</span>
</pre></div>
<p>we can encapsulate the timeout expiration information into an object
with a convenience constructor:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Deadline</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deadline</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">deadline</span> <span class="o">=</span> <span class="n">deadline</span>

<span class="k">def</span> <span class="nf">after</span><span class="p">(</span><span class="n">timeout</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">Deadline</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">monotonic</span><span class="p">()</span> <span class="o">+</span> <span class="n">timeout</span><span class="p">)</span>

<span class="c1"># Wait 10 seconds total for the URL to be fetched</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">,</span> <span class="n">deadline</span><span class="o">=</span><span class="n">after</span><span class="p">(</span><span class="mi">10</span><span class="p">))</span>
</pre></div>
<p>That looks nice and natural for users, but since it uses an absolute
deadline internally, it's easy for library implementors too.</p>
<p>And once we've gone this far, we might as well make things a bit more
abstract. After all, a timeout isn't the only reason you might want to
give up on some blocking operation; "give up after 10 seconds have
passed" is a special case of "give up after <some arbitrary condition
becomes true>". If you were using <tt class="docutils literal">requests</tt> to implement a web
browser, you'd want to be able to say "start fetching this URL, but
give up when the 'stop' button gets pressed". And libraries mostly
treat this <tt class="docutils literal">Deadline</tt> object as totally opaque in any case – they
just pass it through to lower-level calls, and trust that eventually
some low-level primitives will interpret it appropriately. So instead
of thinking of this object as encapsulating a deadline, we can start
thinking of it as encapsulating an arbitrary "should we give up now"
check. And in honor of its more abstract nature, instead of calling it
a <tt class="docutils literal">Deadline</tt> let's call this new thing a <tt class="docutils literal">CancelToken</tt>:</p>
<div class="highlight"><pre><span></span><span class="c1"># This library is only hypothetical, sorry</span>
<span class="kn">from</span> <span class="nn">cancel_tokens</span> <span class="kn">import</span> <span class="n">cancel_after</span><span class="p">,</span> <span class="n">cancel_on_callback</span>
<span class="c1"># Returns an opaque CancelToken object that enters the "cancelled"</span>
<span class="c1"># state after 10 seconds.</span>
<span class="n">cancel_token</span> <span class="o">=</span> <span class="n">cancel_after</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="c1"># So this request gives up after 10 seconds</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">,</span> <span class="n">cancel_token</span><span class="o">=</span><span class="n">cancel_token</span><span class="p">)</span>
<span class="c1"># Returns an opaque CancelToken object that enters the "cancelled"</span>
<span class="c1"># state when the given callback is called.</span>
<span class="n">cancel_callback</span><span class="p">,</span> <span class="n">cancel_token</span> <span class="o">=</span> <span class="n">cancel_on_callback</span><span class="p">()</span>
<span class="c1"># Arrange for the callback to be called if someone clicks "stop"</span>
<span class="n">stop_button</span><span class="o">.</span><span class="n">on_press</span> <span class="o">=</span> <span class="n">cancel_callback</span>
<span class="c1"># So this request gives up if someone clicks 'stop'</span>
<span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">,</span> <span class="n">cancel_token</span><span class="o">=</span><span class="n">cancel_token</span><span class="p">)</span>
</pre></div>
<p>So promoting the cancellation condition to a first-class object makes
our timeout API easier to use, and <em>at the same time</em> makes it
dramatically more powerful: now we can handle not just timeouts, but
also arbitrary cancellations, which is a very common requirement when
writing concurrent code. (For example, it lets us express things like:
"run these two redundant requests in parallel, and as soon as one of
them finishes then cancel the other one".) This is a <em>great</em> idea. As
far as I know, it originally comes from Joe Duffy's <a class="reference external" href="https://blogs.msdn.microsoft.com/pfxteam/2009/05/22/net-4-cancellation-framework/">cancellation
tokens</a>
work in C#, and Go <a class="reference external" href="https://golang.org/pkg/context/">context objects</a> are essentially the same idea.
Those folks are pretty smart! In fact, cancel tokens also solve some
other problems that show up in traditional cancellation systems.</p>
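<p>(To make the idea concrete, here's a minimal sketch of what such a token could look like in Python – this is my own toy version, not the hypothetical <tt class="docutils literal">cancel_tokens</tt> library above or any real API:)</p>

```python
import threading

class Cancelled(Exception):
    """Raised by operations that were abandoned via a cancel token."""

class CancelToken:
    # Toy cancel token: a thread-safe flag that can only go from
    # "not cancelled" to "cancelled", never back.
    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    @property
    def cancelled(self):
        return self._event.is_set()

    def check(self):
        # Blocking operations would call this at each step, so a token
        # cancelled in the past still takes effect (state, not event).
        if self.cancelled:
            raise Cancelled

def cancel_after(timeout):
    # Token that flips to "cancelled" after `timeout` seconds.
    token = CancelToken()
    threading.Timer(timeout, token.cancel).start()
    return token
```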
</div>
<div class="section" id="cancel-tokens-are-level-triggered-and-can-be-scoped-to-match-your-program-s-needs">
<h3><a class="toc-backref" href="#id15">Cancel tokens are level-triggered and can be scoped to match your program's needs</a></h3>
<p>In our little tour of timeout and cancellation APIs, we started with
timeouts. If you start with cancellation instead, then there's another
common pattern you'll see in lots of systems: a method that lets you
cancel a single thread (or task, or whatever your framework uses as a
thread-equivalent), by waking it up and throwing in some kind of
exception. Examples include asyncio's <a class="reference external" href="https://docs.python.org/3/library/asyncio-task.html#asyncio.Task.cancel">Task.cancel</a>,
Curio's <a class="reference external" href="https://curio.readthedocs.io/en/latest/reference.html#Task.cancel">Task.cancel</a>,
pthread cancellation, Java's <a class="reference external" href="https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html#interrupt--">Thread.interrupt</a>,
C#'s <a class="reference external" href="https://msdn.microsoft.com/en-us/library/system.threading.thread.interrupt(v=vs.110).aspx">Thread.Interrupt</a>,
and so forth. In their honor, I'll call this the "thread interrupt"
approach to cancellation.</p>
<p>In the thread-interrupt approach, cancellation is a point-in-time
<em>event</em> that's directed at a <em>fixed-size entity</em>: one call → one
exception in one thread/task. There are two issues here.</p>
<p>The scoping problem is fairly obvious: if you have a single
function you'd like to call normally <em>but</em> you might need to cancel
it, then you have to spawn a new thread/task/whatever just for that:</p>
<pre class="literal-block">
http_thread = spawn_new_thread(requests.get, "https://...")
# Arrange that http_thread.interrupt() will be called if someone
# clicks the stop button
stop_button.on_click = http_thread.interrupt
try:
    http_response = http_thread.wait_for_result()
except Interrupted:
    ...
</pre>
<p>Here the thread isn't being used for concurrency; it's just an awkward
way of letting you delimit the scope of the cancellation.</p>
<p>Or, what if you have a big complicated piece of work that you want to
cancel – for example, something that internally spawns multiple worker
threads? In our example above, if <tt class="docutils literal">requests.get</tt> spawned some
additional background threads, they might be left hanging when we
cancel the first thread. Handling this correctly would require some
complex and delicate bookkeeping.</p>
<p>Cancel tokens solve this problem: the work they cancel is "whatever
the token was passed into", which could be a single function, or a
complex multi-tiered set of thread pools, or anything in between.</p>
<p>The other problem with the thread-interrupt approach is more subtle:
it treats cancellation as an <em>event</em>. Cancel tokens, on the other
hand, model cancellation as a <em>state</em>: they start out in the
uncancelled state, and eventually transition into the cancelled state.</p>
<p>This is subtle, but it makes cancel tokens less error-prone. One way
to think of this is the <a class="reference external" href="https://lwn.net/Articles/25137/">edge-triggered/level-triggered distinction</a>: thread-interrupt APIs provide
edge-triggered notification of cancellations, as compared to
level-triggered for cancel tokens. Edge-triggered APIs are notoriously
tricky to use. You can see an example of this in Python's
<a class="reference external" href="https://docs.python.org/3/library/threading.html#threading.Event">threading.Event</a>:
even though it's called "event", it actually has an internal boolean
state; cancelling a cancel token is like setting an Event.</p>
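<p>(The <tt class="docutils literal">Event</tt> analogy is worth seeing directly – a waiter that shows up <em>after</em> the flag was set still notices immediately, which is exactly the behaviour we want from a cancelled token:)</p>

```python
import threading

evt = threading.Event()
evt.set()  # "cancel" before anyone is waiting

# Level-triggered: wait() returns True immediately because the state
# is still set, instead of blocking for a notification that already
# happened and will never repeat.
assert evt.wait(timeout=1) is True
assert evt.is_set()
```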
<p>That's all pretty abstract. Let's make it more concrete. Consider the
common pattern of using a <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> to make sure that a
connection is shut down properly. Here's a rather artificial example
of a function that makes a Websocket connection, sends a message, and
then makes sure to close it, regardless of whether <tt class="docutils literal">send_message</tt>
raises an exception: <a class="footnote-reference" href="#id7" id="id2">[2]</a></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">send_websocket_messages</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">messages</span><span class="p">):</span>
    <span class="n">ws</span> <span class="o">=</span> <span class="n">open_websocket_connection</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">message</span> <span class="ow">in</span> <span class="n">messages</span><span class="p">:</span>
            <span class="n">ws</span><span class="o">.</span><span class="n">send_message</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="n">ws</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
<p>Now suppose we start this function running, but at some point the
other side drops off the network and our <tt class="docutils literal">send_message</tt> call hangs
forever. Eventually, we get tired of waiting, and cancel it.</p>
<p>With a thread-interrupt style edge-triggered API, this causes the
<tt class="docutils literal">send_message</tt> call to immediately raise an exception, and then our
connection cleanup code automatically runs. So far so good. But here's
an interesting fact about the websocket protocol: it has <a class="reference external" href="https://tools.ietf.org/html/rfc6455#section-5.5.1">a "close"
message</a> you're
supposed to send before closing the connection. In general this is a
good thing; it allows for cleaner shutdowns. So when we call
<tt class="docutils literal">ws.close()</tt>, it'll try to send this message. But... in this case,
the reason we're trying to close the connection is because we've given
up on the other side accepting any new messages. So now <tt class="docutils literal">ws.close()</tt>
also hangs forever.</p>
<p>If we used a cancel token, this doesn't happen:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">send_websocket_messages</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">messages</span><span class="p">,</span> <span class="n">cancel_token</span><span class="p">):</span>
    <span class="n">ws</span> <span class="o">=</span> <span class="n">open_websocket_connection</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">cancel_token</span><span class="o">=</span><span class="n">cancel_token</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">message</span> <span class="ow">in</span> <span class="n">messages</span><span class="p">:</span>
            <span class="n">ws</span><span class="o">.</span><span class="n">send_message</span><span class="p">(</span><span class="n">message</span><span class="p">,</span> <span class="n">cancel_token</span><span class="o">=</span><span class="n">cancel_token</span><span class="p">)</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="n">ws</span><span class="o">.</span><span class="n">close</span><span class="p">(</span><span class="n">cancel_token</span><span class="o">=</span><span class="n">cancel_token</span><span class="p">)</span>
</pre></div>
<p>Once the cancel token is triggered, then <em>all</em> future operations on
that token are cancelled, so the call to <tt class="docutils literal">ws.close</tt> doesn't get
stuck. It's a less error-prone paradigm.</p>
<p>It's kind of interesting how so many older APIs could get this wrong.
If you follow the path we did in this blog post, and start by thinking
about applying a timeout to a complex operation composed out of
multiple blocking calls, then it's obvious that if the first call uses
up the whole timeout budget, then any future calls should fail
immediately. Timeouts are naturally level-triggered. And then when we
generalize from timeouts to arbitrary cancellations, the insight
carries over. But if you only think about timeouts for primitive
operations then this never arises; or if you start with a generic
cancellation API and then use it to implement timeouts (like e.g.
Twisted and asyncio do), then the advantages of level-triggered
cancellation are easy to miss.</p>
</div>
<div class="section" id="cancel-tokens-are-unreliable-in-practice-because-humans-are-lazy">
<h3><a class="toc-backref" href="#id16">Cancel tokens are unreliable in practice because humans are lazy</a></h3>
<p>So cancel tokens have really great semantics, and are certainly better
than raw timeouts or deadlines, but they still have a usability
problem: to write a function that supports cancellation, you have to
accept this boilerplate argument and then make sure to pass it on to
every subroutine you call. And remember, a correct and robust program
has to support cancellation in <em>every function that ever does I/O,
anywhere in your stack</em>. If you ever get lazy and leave it out, or
just forget to pass it through to any particular subroutine call, then
you have a latent bug.</p>
<p>Humans suck at this kind of boilerplate. I mean, not you, I'm sure
you're a very diligent programmer who makes sure to implement correct
cancellation support in every function and also flosses every day.
But... perhaps some of your co-workers are not so diligent? Or maybe
you depend on some library that someone else wrote – how much do you
trust your third-party vendors to get this right? As the size of your
stack grows then the chance that everyone everywhere always gets this
right approaches zero.</p>
<p>Can I back that up with any real examples? Well, consider this: in
both C# and Go, the most prominent languages that use this approach
and have been advocating it for a number of years, the underlying
networking primitives <em>still do not have cancel token support</em> <a class="footnote-reference" href="#id8" id="id3">[3]</a>.
These are like... THE fundamental operations that might hang for
reasons outside your control and that you need to be prepared to time
out or cancel, but... I guess they just haven't gotten around to
implementing it yet? Instead their socket layers support an older
mechanism for setting <a class="reference external" href="https://msdn.microsoft.com/en-us/library/system.net.sockets.socket.receivetimeout(v=vs.110).aspx">timeouts</a>
or <a class="reference external" href="https://golang.org/pkg/net/#IPConn.SetDeadline">deadlines</a> on
their socket objects, and if you want to use cancel tokens you have to
figure out how to bridge between the two different systems yourself.</p>
<p>The Go standard library does provide one example of how to do this:
their function for establishing a network connection (basically the
equivalent of Python's <tt class="docutils literal">socket.connect</tt>) does accept a cancel token.
Implementing this requires <a class="reference external" href="https://github.com/golang/go/blob/bf0f69220255941196c684f235727fd6dc747b5c/src/net/fd_unix.go#L99-L141">40 lines of source code</a>,
a background task, and the first try <a class="reference external" href="https://github.com/golang/go/issues/16523">had a race condition that took a
year to be discovered in production</a>. So... in Go if you
want to use cancel tokens (or <tt class="docutils literal">Context</tt>s, in Go parlance), then I
guess that's what you need to implement every time you use any socket
operation? Good luck?</p>
<p>I don't mean to make fun. This stuff is hard. But C# and Go are huge
projects maintained by teams of highly-skilled full-time developers
and backed by Fortune 50 companies. If they can't get it right, who
can? Not me. I'm one human trying to reinvent I/O in Python. I can't
afford to make things that complicated.</p>
</div>
</div>
<div class="section" id="cancel-scopes-trio-s-human-friendly-solution-for-timeouts-and-cancellation">
<h2><a class="toc-backref" href="#id17">Cancel scopes: Trio's human-friendly solution for timeouts and cancellation</a></h2>
<p>Remember way back at the beginning of this post, we noted that Python
socket methods don't take individual timeout arguments, but instead
let you set the timeout once on the socket so it's implicitly passed
to every method you call? And in the section just above, we noticed
that C# and Go do pretty much the same thing? I think they're on to
something. Maybe we should accept that when you have some data that
has to be passed through to every function you call, that's something
the computer should handle, rather than making flaky humans do the
work – but in a general way that supports complex abstractions, not
just sockets.</p>
<div class="section" id="how-cancel-scopes-work">
<h3><a class="toc-backref" href="#id18">How cancel scopes work</a></h3>
<p>Here's how you impose a 10 second timeout on an HTTP request in Trio:</p>
<div class="highlight"><pre><span></span><span class="c1"># The primitive API:</span>
<span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_cancel_scope</span><span class="p">()</span> <span class="k">as</span> <span class="n">cancel_scope</span><span class="p">:</span>
    <span class="n">cancel_scope</span><span class="o">.</span><span class="n">deadline</span> <span class="o">=</span> <span class="n">trio</span><span class="o">.</span><span class="n">current_time</span><span class="p">()</span> <span class="o">+</span> <span class="mi">10</span>
    <span class="k">await</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">)</span>
</pre></div>
<p>Of course normally you'd use a <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#trio.move_on_after">convenience wrapper</a>,
like:</p>
<div class="highlight"><pre><span></span><span class="c1"># An equivalent but more idiomatic formulation:</span>
<span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">move_on_after</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="k">await</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"https://..."</span><span class="p">)</span>
</pre></div>
<p>But since this post is about the underlying design, we'll focus on the
primitive version. (Credit: the idea of using <tt class="docutils literal">with</tt> blocks for
timeouts is something I first saw in Dave Beazley's Curio, though I
changed a bunch. I'll hide the details in a footnote: <a class="footnote-reference" href="#id9" id="id4">[4]</a>.)</p>
<p>You should think of <tt class="docutils literal">with open_cancel_scope()</tt> as creating a cancel
token, but it doesn't actually expose any <tt class="docutils literal">CancelToken</tt> object
publicly. Instead, the cancel token is pushed onto an invisible
internal stack, and automatically applied to any blocking operations
called inside the <tt class="docutils literal">with</tt> block. So <tt class="docutils literal">requests</tt> doesn't have to do
anything to pass this through – when it eventually sends and receives
data over the network, those primitive calls will automatically have
the deadline applied.</p>
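<p>To make the "invisible internal stack" concrete, here is a minimal model of the mechanism – not Trio's actual implementation, and all names are illustrative: scopes push themselves onto a context-local stack, and blocking primitives consult that stack to compute the deadline that applies to them.</p>

```python
import contextvars
import math
import time
from contextlib import contextmanager

# Context-local stack of active scopes (outermost first).
_scope_stack = contextvars.ContextVar("scope_stack", default=())

class Cancelled(Exception):
    pass

class CancelScope:
    def __init__(self):
        self.deadline = math.inf  # absolute deadline; +inf means "no timeout"

@contextmanager
def open_cancel_scope():
    scope = CancelScope()
    token = _scope_stack.set(_scope_stack.get() + (scope,))
    try:
        yield scope
    finally:
        _scope_stack.reset(token)

def effective_deadline():
    # The deadline that actually applies is the *earliest* deadline of
    # any enclosing scope -- this is how nesting composes for free.
    return min((s.deadline for s in _scope_stack.get()), default=math.inf)

def checkpoint():
    # Every blocking primitive would call something like this, so user
    # code never has to pass deadlines around explicitly.
    if time.monotonic() >= effective_deadline():
        raise Cancelled
```

<p>In a model like this, a library such as <tt class="docutils literal">requests</tt> needs zero changes: whatever primitive it eventually calls sees the deadline through the context-local stack.</p>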
<p>The <tt class="docutils literal">cancel_scope</tt> object lets us control cancellation status: you
can change the deadline, issue an explicit cancellation by calling
<tt class="docutils literal">cancel_scope.cancel()</tt>, and <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#trio.The%20cancel%20scope%20interface">so forth</a>.
If you know C#, it's analogous to a <a class="reference external" href="https://msdn.microsoft.com/en-us/library/system.threading.cancellationtokensource(v=vs.110).aspx">CancellationTokenSource</a>.
One useful trick it allows is implementing the kind of
<a class="reference external" href="https://github.com/python-trio/trio/blob/07d144e701ae8ad46d393f6ca1d1294ea8fc2012/trio/_timeouts.py#L96-L118">raise-an-error-if-the-timeout-fires API that people are used to</a>,
on top of the more primitive cancel scope unwinding semantics.</p>
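<p>That trick fits in a few lines. The following toy model (illustrative names, not Trio's internals; only <tt class="docutils literal">sleep</tt> is cancellable here) sketches a <tt class="docutils literal">move_on_after</tt> that quietly absorbs its own cancellation, and a <tt class="docutils literal">fail_after</tt> built on top that converts the quiet version into an error:</p>

```python
import math
import time
from contextlib import contextmanager

class Cancelled(Exception):
    pass

class TooSlowError(Exception):
    pass

class _Scope:
    def __init__(self, deadline):
        self.deadline = deadline
        self.cancelled_caught = False

_stack = []  # active scopes, outermost first (single-threaded toy)

@contextmanager
def move_on_after(seconds):
    scope = _Scope(time.monotonic() + seconds)
    _stack.append(scope)
    try:
        yield scope
    except Cancelled:
        # Quiet semantics: the scope absorbs the exception and
        # execution simply resumes after the with block.
        scope.cancelled_caught = True
    finally:
        _stack.remove(scope)

def sleep(seconds):
    # Toy blocking primitive: wake when `seconds` elapse or when the
    # nearest enclosing deadline expires, whichever comes first.
    deadline = min((s.deadline for s in _stack), default=math.inf)
    wake = min(time.monotonic() + seconds, deadline)
    time.sleep(max(0.0, wake - time.monotonic()))
    if time.monotonic() >= deadline:
        raise Cancelled

@contextmanager
def fail_after(seconds):
    # The trick from the text: noisy raise-an-error semantics built on
    # top of the quiet move-on semantics.
    with move_on_after(seconds) as scope:
        yield scope
    if scope.cancelled_caught:
        raise TooSlowError
```

<p>The point is that the noisy version needs no extra support from the runtime: it's a thin wrapper that inspects <tt class="docutils literal">cancelled_caught</tt> after the quiet scope exits.</p>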
<p>When an operation is cancelled, it raises a <tt class="docutils literal">Cancelled</tt> exception,
which is used to unwind the stack back out to the appropriate <tt class="docutils literal">with
open_cancel_scope</tt> block. Cancel scopes can be nested; <tt class="docutils literal">Cancelled</tt>
exceptions know which scope triggered them, and will keep propagating
until they reach the corresponding <tt class="docutils literal">with</tt> block. (As a consequence,
you should always let the Trio runtime take care of raising and
catching <tt class="docutils literal">Cancelled</tt> exceptions, so that it can properly keep track
of these relationships.)</p>
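<p>The nesting rule can be sketched like this (illustrative, not Trio's real internals): each <tt class="docutils literal">Cancelled</tt> exception remembers which scope's cancellation it represents, and only that scope's <tt class="docutils literal">with</tt> block absorbs it – inner scopes let it keep propagating.</p>

```python
from contextlib import contextmanager

class Cancelled(Exception):
    def __init__(self, scope):
        self.scope = scope  # the scope whose cancellation this represents

class CancelScope:
    def __init__(self, name):
        self.name = name
        self.cancelled_caught = False

@contextmanager
def open_cancel_scope(name):
    scope = CancelScope(name)
    try:
        yield scope
    except Cancelled as exc:
        if exc.scope is not scope:
            raise  # not ours: keep unwinding to the matching block
        scope.cancelled_caught = True

# Nested use: pretend the *outer* scope's deadline expired while we
# were inside the inner one. The inner block lets the exception pass.
with open_cancel_scope("outer") as outer:
    with open_cancel_scope("inner") as inner:
        raise Cancelled(outer)
```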
<p>Supporting nesting is important because some operations may want to
use timeouts internally as an implementation detail. For example, when
you ask Trio to make a TCP connection to a hostname that has multiple
IP addresses associated with it, it uses a "happy eyeballs" algorithm
to <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-io.html#trio.open_tcp_stream">run multiple connection attempts in parallel with a staggered
start</a>.
This requires an <a class="reference external" href="https://github.com/python-trio/trio/blob/d063d672de15edc231b14c0a9bc3673e5275a9dc/trio/_highlevel_open_tcp_stream.py#L260-L265">internal timeout</a>
to decide when it's time to initiate the next connection attempt. But
users shouldn't have to care about that! If you want to say "try to
connect to <tt class="docutils literal">example.com:443</tt>, but give up after 10 seconds", then
that's just:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">move_on_after</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="n">tcp_stream</span> <span class="o">=</span> <span class="k">await</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_tcp_stream</span><span class="p">(</span><span class="s2">"example.com"</span><span class="p">,</span> <span class="mi">443</span><span class="p">)</span>
</pre></div>
<p>And everything works; thanks to the cancel scope nesting rules, it
turns out <tt class="docutils literal">open_tcp_stream</tt> handles this correctly with no
additional code.</p>
</div>
<div class="section" id="where-do-we-check-for-cancellation">
<h3><a class="toc-backref" href="#id19">Where do we check for cancellation?</a></h3>
<p>Writing code that's correct in the face of cancellation can be tricky.
If a <tt class="docutils literal">Cancelled</tt> exception were to suddenly materialize in a place
the user wasn't prepared for it – perhaps when their code was half-way
through manipulating some delicate data structure – it could corrupt
internal state and cause hard-to-track-down bugs. On the other hand, a
timeout and cancellation system doesn't do much good if you don't
notice cancellations relatively promptly. So an important challenge
for any system is to first pick a "goldilocks rule" that checks often
enough, but not too often, and then somehow communicate this rule to
users so that they can make sure their code is prepared.</p>
<p>In Trio's case, this is pretty straightforward. We already, for other
reasons, use Python's async/await syntax to annotate blocking
functions. The main thing this does is let you look at the text of any
function and immediately see which points might block waiting for
something to happen. Example:</p>
<div class="highlight"><pre><span></span><span class="k">async</span> <span class="k">def</span> <span class="nf">user_defined_function</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Hello!"</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">trio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Goodbye!"</span><span class="p">)</span>
</pre></div>
<p>Here we can see that the call to <tt class="docutils literal">trio.sleep</tt> blocks, because it has
the special <tt class="docutils literal">await</tt> keyword. You can't call <tt class="docutils literal">trio.sleep</tt> – or any
other of Trio's built-in blocking primitives – without using this
keyword, because they're marked as async functions. And then Python
enforces that if you want to use the <tt class="docutils literal">await</tt> keyword, then you have
to mark the calling function as async as well, which means that all
<em>callers</em> of <tt class="docutils literal">user_defined_function</tt> will also use the <tt class="docutils literal">await</tt>
keyword. This makes sense, since if <tt class="docutils literal">user_defined_function</tt> calls a
blocking function, that makes it a blocking function too. In many
other systems, whether a function might block is something you can
only determine by examining all of its potential callees, and all
their callees, etc.; async/await takes this global runtime property
and makes it visible at a glance in the source code.</p>
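<p>This enforcement chain can be seen with nothing but the standard library (asyncio here, but the syntax rule is identical for Trio):</p>

```python
import asyncio

async def might_block():
    # The `await` marks a point where this function can suspend.
    await asyncio.sleep(0)
    return "done"

async def caller():
    # Python forces caller() to be `async def` because it uses
    # `await`, so *its* callers see the same marker -- "might block"
    # propagates up the call stack in the source text itself.
    return await might_block()

result = asyncio.run(caller())
```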
<p>Trio's cancel scopes then piggy-back on this system: we declare that
whenever you see an <tt class="docutils literal">await</tt>, that's a place where you might have to
handle a <tt class="docutils literal">Cancelled</tt> exception – either because it's a call to one
of Trio's primitives which directly check for cancellation, or because
it's a call to a function that indirectly calls one of those
primitives, and thus might see a <tt class="docutils literal">Cancelled</tt> exception come bubbling
out. This has several nice properties. It's extremely easy to explain
to users. It covers all the functions where you absolutely need
timeout/cancellation support to avoid infinite hangs – only functions
that block can get stuck blocking forever. It means that any function
that does I/O on a regular basis also automatically checks for
cancellation on a regular basis, so most of the time you don't need to
worry about this (though for the occasional long-running pure
computation, you may want to add some explicit cancellation checks by
calling <tt class="docutils literal">await trio.sleep(0)</tt> – which you have to do anyway to let
the scheduler work!). Blocking functions tend to have a <a class="reference external" href="https://docs.python.org/3/library/exceptions.html#os-exceptions">large variety
of failure modes</a>,
so in many cases any cleanup required to handle <tt class="docutils literal">Cancelled</tt>
exceptions will be shared with that needed to handle, for example, a
misbehaving network peer. And Trio's cooperative multi-tasking system
also uses the <tt class="docutils literal">await</tt> points to mark places where the scheduler
might switch to another task, so you already have to be careful about
leaving data structures in inconsistent states across an <tt class="docutils literal">await</tt>.
Cancellation and async/await go together like peanut butter and
chocolate.</p>
</div>
<div class="section" id="an-escape-hatch">
<h3><a class="toc-backref" href="#id20">An escape hatch</a></h3>
<p>While checking for cancellation at all blocking primitive calls makes
a great default, there are some very rare cases where you want to
disable this and take explicit control over cancellation. They're so
rare that I don't have a simple example to use here (though there are
a few arcane examples in the Trio source that you can grep for if
you're really curious). To provide this escape hatch, you can set a
cancel scope to "shield" its contents from outside cancellations. It
looks like this:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">move_on_after</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_cancel_scope</span><span class="p">()</span> <span class="k">as</span> <span class="n">inner_scope</span><span class="p">:</span>
        <span class="n">inner_scope</span><span class="o">.</span><span class="n">shield</span> <span class="o">=</span> <span class="kc">True</span>
        <span class="c1"># Sleeps for 20 seconds, ignoring the overall 10 second</span>
        <span class="c1"># timeout</span>
        <span class="k">await</span> <span class="n">trio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span>
</pre></div>
<p>To support composition, shielding is sensitive to the cancel scope
stack: it only blocks outer cancel scopes from applying, and has no
effect on inner scopes. In our example above, our shield doesn't have
any effect on any cancel scopes that might be used <em>inside</em>
<tt class="docutils literal">trio.sleep</tt> – those still behave normally. Which is good, because
whatever <tt class="docutils literal">trio.sleep</tt> does internally is its own private
implementation detail. And in fact, <tt class="docutils literal">trio.sleep</tt> <em>does</em> <a class="reference external" href="https://github.com/python-trio/trio/blob/07d144e701ae8ad46d393f6ca1d1294ea8fc2012/trio/_timeouts.py#L65-L66">use a
cancel scope internally</a>!
<a class="footnote-reference" href="#id10" id="id5">[5]</a></p>
<p>One reason that <tt class="docutils literal">shield</tt> is an attribute on cancel scopes instead of
having a special "shield scope" is that it makes it convenient to
implement this kind of nesting, because we can re-use cancel scope's
existing stack structure. The other reason is that anywhere you're
disabling external timeouts, you need to think about what you're going
to do instead to make sure things can't hang forever, and having a
cancel scope right there makes it easy to apply a new timeout that's
under the local code's control:</p>
<div class="highlight"><pre><span></span><span class="c1"># Demonstrating that the shielding scope can be used to avoid hangs</span>
<span class="c1"># after disabling outside timeouts:</span>
<span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">move_on_after</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="k">as</span> <span class="n">outer_scope</span><span class="p">:</span>
    <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">move_on_after</span><span class="p">(</span><span class="mi">15</span><span class="p">)</span> <span class="k">as</span> <span class="n">inner_scope</span><span class="p">:</span>
        <span class="n">inner_scope</span><span class="o">.</span><span class="n">shield</span> <span class="o">=</span> <span class="kc">True</span>
        <span class="c1"># Returns after 15 seconds, when the shielding scope expires:</span>
        <span class="k">await</span> <span class="n">trio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1000000</span><span class="p">)</span>
</pre></div>
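<p>One way to picture how shielding composes with the scope stack (this is a model of the semantics, not Trio's implementation): when computing the effective deadline, walk from the outermost scope inward, and discard everything accumulated so far whenever you hit a shield – so only <em>outer</em> scopes are blocked.</p>

```python
import math

class Scope:
    def __init__(self, deadline=math.inf, shield=False):
        self.deadline = deadline
        self.shield = shield

def effective_deadline(stack):
    # `stack` is ordered outermost -> innermost.
    deadline = math.inf
    for scope in stack:
        if scope.shield:
            deadline = math.inf  # outer deadlines stop applying here...
        deadline = min(deadline, scope.deadline)  # ...but this scope's
        # own deadline, and any inner ones, still do.
    return deadline
```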
<p>Now if you're a Trio user please forget you read this section; if you
think you need to use shielding then you almost certainly should
rethink what you're trying to do. But if you're an I/O runtime
implementer looking to add cancel scope support, then this is an
important feature.</p>
</div>
<div class="section" id="cancel-scopes-and-concurrency">
<h3><a class="toc-backref" href="#id21">Cancel scopes and concurrency</a></h3>
<p>Finally, there's one more feature of Trio that should be mentioned
here. So far in this essay, I haven't discussed concurrency much at
all; timeouts and cancellation are largely independent, and everything
above applies even to straightforward single-threaded synchronous
code. But we did make some assumptions that might seem trivial: that
if you call a function inside a <tt class="docutils literal">with</tt> block, then (a) the execution
will actually happen inside the <tt class="docutils literal">with</tt> block, and (b) any exceptions
it throws will propagate back to the <tt class="docutils literal">with</tt> block so it can catch
them. Unfortunately, many threading and concurrency libraries violate
this, specifically in the case where some work is spawned or
scheduled:</p>
<div class="highlight"><pre><span></span><span class="c1"># This looks innocent enough:</span>
<span class="k">with</span> <span class="n">move_on_after</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="n">do_the_thing</span><span class="p">()</span>
<span class="c1"># But it isn't:</span>
<span class="k">def</span> <span class="nf">do_the_thing</span><span class="p">():</span>
    <span class="c1"># Using some made-up API similar to what most systems use:</span>
    <span class="n">start_task_in_background</span><span class="p">(</span><span class="n">some_worker_that_will_actually_do_the_thing</span><span class="p">)</span>
</pre></div>
<p>If we were only looking at the <tt class="docutils literal">with</tt> block alone, this would seem
perfectly innocent. But when we look at how <tt class="docutils literal">do_the_thing</tt> is
implemented, we realize that it's likely that we'll exit the <tt class="docutils literal">with</tt>
block before the background task finishes, so there's some ambiguity:
should the timeout apply to the background task or not? And then if it
does apply, then how should we handle the <tt class="docutils literal">Cancelled</tt> exception? For
most systems, unhandled exceptions in background threads/tasks are
simply discarded.</p>
<p>However, these problems don't arise in Trio, because of its unique
approach to concurrency. Trio's <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#tasks-let-you-do-multiple-things-at-once">nursery system</a>
means that child tasks are always integrated into the call stack,
which effectively becomes a call tree. Concretely, the way this is
enforced is that Trio has no global <tt class="docutils literal">start_task_in_background</tt>
primitive; instead, if you want to spawn a child task, you have to
first open a "nursery" block (for the <a class="reference external" href="http://www.dictionary.com/browse/nursery">child to live in</a>, get it?), and then the
lifetime of that child is tied to the <tt class="docutils literal">with</tt> block that created the
nursery:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">move_on_after</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="k">await</span> <span class="n">do_the_thing</span><span class="p">()</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">do_the_thing</span><span class="p">():</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_nursery</span><span class="p">()</span> <span class="k">as</span> <span class="n">nursery</span><span class="p">:</span>
        <span class="n">nursery</span><span class="o">.</span><span class="n">start_soon</span><span class="p">(</span><span class="n">some_worker_that_will_actually_do_the_thing</span><span class="p">)</span>
        <span class="c1"># Now the 'async with' block won't complete until the</span>
        <span class="c1"># child task has finished, and if the child has an unhandled</span>
        <span class="c1"># exception then it will be re-raised here in the parent.</span>
        <span class="c1"># Which makes this example pretty silly -- the "background</span>
        <span class="c1"># task" acts just like a function call. Which is the point :-)</span>
</pre></div>
<p>This system has many advantages, but the relevant one here is that it
preserves the key assumptions that cancel scopes rely on. Any given
nursery is either inside or outside the cancel scope – we can tell by
checking whether the <tt class="docutils literal">with open_cancel_scope</tt> block encloses the
<tt class="docutils literal">async with open_nursery</tt> block. And then it's straightforward to
say that if a nursery is inside a cancel scope, then that scope should
apply to all children in that nursery. This means that if we apply a
timeout to a function, it can't "escape" by spawning a child task –
the timeout applies to the child task too. (The exception is if you
pass an outside nursery into the function, then it can spawn tasks
into that nursery, which can escape the timeout. But then this is
obvious to the caller, because they have to provide the nursery – the
point is to make it clear what's going on, not to make it impossible
to spawn background tasks.)</p>
</div>
<div class="section" id="summary">
<h3><a class="toc-backref" href="#id22">Summary</a></h3>
<p>Returning to our initial example: I've been doing some initial work on
porting <tt class="docutils literal">requests</tt> to run on Trio (<a class="reference external" href="https://github.com/python-trio/urllib3/issues/1">you can help!</a>), and so far it
looks like the Trio version will not only handle timeouts better than
the traditional synchronous version, but that it will be able to do
this using <em>zero lines of code</em> – all the places where you'd want to
check for cancellation are the ones where Trio does so automatically,
and all the places where you need special care to handle the resulting
exceptions are places where <tt class="docutils literal">requests</tt> is prepared to handle
arbitrary exceptions for other reasons.</p>
<p>There are no free lunches; cancellation handling can still be a source
of bugs, and requires care when writing code. But Trio's cancel scopes
are dramatically easier to use – and therefore more reliable – than
any other system I've found. Hopefully we can make timeout bugs the
exception rather than the rule.</p>
</div>
</div>
<div class="section" id="who-else-can-benefit-from-cancel-scopes">
<h2><a class="toc-backref" href="#id23">Who else can benefit from cancel scopes?</a></h2>
<p>So... that's great if you're using Trio. Is this something that only
works in Trio's context, or is it more general? What kind of
adaptations would need to be made to use this in other environments?</p>
<p>If you want to implement cancel scopes, then you'll need:</p>
<ul class="simple">
<li>Some kind of implicit context-local storage to track the cancel
scope stack. If you're using threads, then thread-local storage
works; if you're using something more exotic, then you'll need to
figure out the equivalent in your system. (So for example, in Go
you'd need goroutine-local storage, which famously <a class="reference external" href="https://stackoverflow.com/questions/31932945/does-go-have-something-like-threadlocal-from-java">doesn't exist</a>.)
This can be a bit tricky; for example in Python, we need something
like <a class="reference external" href="https://www.python.org/dev/peps/pep-0568/">PEP 568</a> to iron
out some bad interactions <a class="reference external" href="https://github.com/python-trio/trio/issues/264">between cancel scopes and generators</a>.</li>
<li>A way to delimit the boundaries of a cancel scope. Python's <tt class="docutils literal">with</tt>
blocks work great; other options would include dedicated syntax, or
restricting cancel scopes to individual function calls like
<tt class="docutils literal">with_timeout(10, some_fn, arg1, arg2)</tt> (though this could force
awkward factorings, and you'd need to figure out some way to expose
the cancel scope object).</li>
<li>A strategy for unwinding the stack back to the appropriate cancel
scope after a timeout/cancellation occurs. Exceptions work great, so
long as you have a way to catch them at cancel scope boundaries –
this is another reason that Python's <tt class="docutils literal">with</tt> blocks work so well
for this. But if your language uses, say, error code returns instead
of exceptions, then I'm sure you could build some stack unwinding
convention out of those.</li>
<li>A story for how cancel scopes integrate with your concurrency API
(if any). Of course the ideal is something like Trio's nursery
system (which also has many other advantages, but that's a whole
'nother blog post). But even without that, you could for example
deem that any new tasks spawned inside a cancel scope inherit that
cancel scope, regardless of when they finish. (Unless they opt out
using something like the shielding feature.)</li>
<li>Some rule to determine which operations are cancellable and
communicate that to the user. As noted above, async/await works
perfectly for this, but if you aren't using async/await then other
conventions are certainly possible. Languages with rich static type
systems might be able to exploit them somehow. Worst case you could
just be careful to document it on each function.</li>
<li>Cancel scope integration for all of the blocking I/O primitives you
care about. This is reasonably straightforward if you're building a
system from scratch. Async systems have an advantage here because
integrating everything into an event loop already forces you to
reimplement all your I/O primitives in some uniform way, which gives
you an excellent opportunity to add uniform cancellation handling at
the same time.</li>
</ul>
<div class="section" id="synchronous-single-threaded-python">
<h3><a class="toc-backref" href="#id24">Synchronous, single-threaded Python</a></h3>
<p>Our original motivating examples involved <tt class="docutils literal">requests</tt>, an ordinary
synchronous library. And pretty much everything above applies equally
to synchronous or concurrent code. So I think it's interesting to
explore the idea of using these in classic synchronous Python. Maybe
we can fix <tt class="docutils literal">requests</tt> so it doesn't have to apologize for its
<tt class="docutils literal">timeout</tt> argument!</p>
<p>There are a few limitations we'll have to accept:</p>
<ul class="simple">
<li>It won't be ubiquitous – libraries will have to make sure that they
only use "scope-enabled" blocking operations. Perhaps in the long
run we could imagine this becoming part of the standard library and
integrated into all the standard primitives, but even then there
will still be third-party extension libraries that do their own I/O
without going through the standard library. On the other hand, a
library like <tt class="docutils literal">requests</tt> can be careful to only use scope-enabled
libraries, and then document that it itself is scope-enabled. (This
is perhaps the biggest advantage an async library like Trio has when
it comes to timeouts and cancellation: being async doesn't make a
difference per se, but an async library is forced to reimplement all
the basic I/O primitives to integrate them into its I/O loop; and if
you're reimplementing everything <em>anyway</em>, it's easy to make
cancellation support consistent.)</li>
<li>There's no marker like <tt class="docutils literal">await</tt> to show which operations are
cancellable. This means that users will have to take somewhat more
care and check the documentation for individual functions – but
that's still less work than what it currently takes to make timeouts
work right.</li>
<li>Python's underlying synchronous primitives generally only support
cancellation due to timeouts, not arbitrary events, so we probably
can't provide a <tt class="docutils literal">cancel_scope.cancel()</tt> operation. But this
limitation doesn't seem too onerous, because if you have a
single-threaded synchronous program and the single thread is stuck
in some blocking operation, then who's going to call <tt class="docutils literal">cancel()</tt>
anyway?</li>
</ul>
<p>Summing up: it can't be quite as nice as what Trio provides, but it'd
still be pretty darn useful, and certainly nicer than what we have
now.</p>
<p>If this sounds interesting to you, <a class="reference external" href="https://github.com/njsmith/deadline-scopes">check out the proof-of-concept
that I implemented</a>.</p>
</div>
<div class="section" id="asyncio">
<h3><a class="toc-backref" href="#id25">asyncio</a></h3>
<p>One of the original motivations for this blog post was talking to
<a class="reference external" href="https://github.com/1st1">Yury</a> about whether we could retrofit any
of Trio's improvements back into asyncio. Looking at asyncio through
the lens of the above analysis, a few things jump out at us:</p>
<ul class="simple">
<li>There's some impedance mismatch between the cancel scope model of
implicit stateful level-triggered cancel tokens, and asyncio's
current task-oriented, edge-triggered cancellation (and then the
<tt class="docutils literal">Future</tt>s layer has a slightly different cancellation model
again), so we'd need some story for how to meld those together. Or
maybe it would be possible to migrate <tt class="docutils literal">Task</tt>s to a stateful
cancellation model?</li>
<li>Without nurseries, there's no reliable way to propagate cancellation
across tasks, and there are a lot of different operations that are
sort of like spawning a task but at a different level of abstraction
(e.g. <tt class="docutils literal">loop.call_soon</tt>). You could have a rule that any new tasks
always inherit their spawner's cancel scopes, but I'm not sure
whether this would be a good idea or not – it needs some thought.</li>
<li>Without a generic mechanism for propagating exceptions back up the
stack, there's no way to reliably route <tt class="docutils literal">Cancelled</tt> exceptions
back to the original scope; generally asyncio simply prints and
discards unhandled exceptions from <tt class="docutils literal">Task</tt>s. Maybe that's fine?</li>
</ul>
<p>Unfortunately asyncio's in a bit of a tricky position, because it's
built on an architecture derived from the previous decade of
experience with async I/O in Python... and then after that
architecture was locked in, it added new syntax to Python that
invalidated all that experience. But hopefully it's still possible to
adapt some of these lessons – at least with some compromises.</p>
</div>
<div class="section" id="other-languages">
<h3><a class="toc-backref" href="#id26">Other languages</a></h3>
<p>If you're working in another language, I'd love to hear how the cancel
scope idea adapts – if at all. For example, it'll definitely need some
adjustment for languages that don't use exceptions, or that are
missing the kind of user-extensible syntax that Python's <tt class="docutils literal">with</tt>
blocks provide.</p>
</div>
</div>
<div class="section" id="now-go-forth-and-fix-your-timeout-bugs">
<h2><a class="toc-backref" href="#id27">Now go forth and fix your timeout bugs!</a></h2>
<p>Or if you want to read more about Trio, <a class="reference external" href="https://trio.readthedocs.io/">we have a friendly tutorial
that people seem to like</a>.</p>
</div>
<div class="section" id="comments">
<h2><a class="toc-backref" href="#id28">Comments</a></h2>
<p>You can <a class="reference external" href="https://trio.discourse.group/t/discussion-thread-timeouts-and-cancellation-for-humans/26">discuss this post on the Trio forum</a>.</p>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td>In fact I'm glossing over several layers of abstraction here:
it's really more like boto3 → botocore → requests → urllib3 →
http.client → socket.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id2">[2]</a></td><td><p class="first">In real life we'd probably use a <tt class="docutils literal">with</tt> statement here, like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">send_websocket_messages</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">messages</span><span class="p">):</span>
    <span class="k">with</span> <span class="n">open_websocket_connection</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">ws</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">message</span> <span class="ow">in</span> <span class="n">messages</span><span class="p">:</span>
            <span class="n">ws</span><span class="o">.</span><span class="n">send_message</span><span class="p">(</span><span class="n">message</span><span class="p">)</span>
</pre></div>
<p class="last">This makes the problem even <em>harder</em> to see, because now the nasty
<tt class="docutils literal">ws.close</tt> call that makes our program hang is entirely
invisible.</p>
</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id3">[3]</a></td><td>Incredibly, C#'s high-level async networking functions actually
<a class="reference external" href="https://stackoverflow.com/questions/12421989/networkstream-readasync-with-a-cancellation-token-never-cancels">accept cancel token arguments and then ignore them</a>.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id9" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id4">[4]</a></td><td>Curio's timeouts are derived from a thread-interrupt style
cancellation model (similar to Java/C#'s <tt class="docutils literal">Thread.interrupt</tt>), so
timeout expiration is edge-triggered, didn't handle nesting at all
<a class="reference external" href="https://github.com/dabeaz/curio/issues/82">until I complained about it to Dave</a>, and only applies to
the current task, not any child tasks that it might have spawned.
Trio's cancel scopes are basically Curio's timeout blocks + C#'s
cancel tokens + a more straightforward nesting model + shielding +
nursery-based concurrency to make child tasks respect stack
discipline. Keep reading to learn what all of these things are :-).</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id10" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id5">[5]</a></td><td>Possibly interesting context: in other systems, it's common to
have some kind of "<a class="reference external" href="https://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IReactorTime.callLater.html">call</a>
<a class="reference external" href="https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.AbstractEventLoop.call_later">later</a>"
primitive, that schedules some code to run at a particular time.
(This is a special case of the general "callback pattern" of
registering some arbitrary code to run when a certain event
occurs.) In those systems, you might expect cancel scope deadlines
to be implemented with something like <tt class="docutils literal">call_later(deadline,
<span class="pre">scope.cancel())</span></tt>. But one of Trio's core design principles is to
reject the whole callback paradigm, on the grounds that it's a
disguised way of spawning background tasks, and we think
concurrency is hard enough without disguised background tasks. So
in Trio, the way you implement <tt class="docutils literal">call_later</tt>-like functionality
is to spawn a task, and then have it sleep until the given time.
Event notification is always done by waking up a task, not spawning
a new one. And what <em>this</em> means is that cancel scope deadlines are
actually Trio's core primitive for timekeeping! All other
time-based operations like <tt class="docutils literal">sleep</tt> are implemented on top of
cancel scope deadlines.</td></tr>
</tbody>
</table>
</div>
<h1>Control-C handling in Python and Trio</h1>
<p>2017-04-19 · Nathaniel J. Smith</p>
<p class="first"><a class="reference external" href="https://github.com/python-trio/trio">Trio</a> is a new
asynchronous I/O library for Python, with a focus on usability and
correctness – the goal is to make it easy to get things right.</p>
<p>One thing well-behaved programs should do is exit cleanly when the
user hits control-C. In Python this mostly Just Works without
developers having to think about it too much, and as part of trio's
focus on <em>usability</em>, we'd like to carry that over: there are few
things more annoying than a program that refuses to quit when you hit
control-C! But preserving <em>correctness</em> in the face of an interrupt
that can happen at literally any moment is not at all trivial. This is
a place where trio's two goals interact in a surprisingly interesting
way! In this post I'll explore some different options for handling
control-C, and explain Trio's solution – with a bonus deep dive into
signal handling and some rather obscure corners of CPython's guts.</p>
<p class="last">The tl;dr is: if you're writing a program using trio, then control-C
should generally Just Work the way you expect from regular Python,
i.e., it will raise <tt class="docutils literal">KeyboardInterrupt</tt> somewhere in your code, and
this exception then propagates out to unwind your stack, run cleanup
handlers, and eventually exit the program. You don't need this article
to use trio; you can start with our <a class="reference external" href="https://trio.readthedocs.io/en/latest/tutorial.html">tutorial</a> and be happy
and productive without thinking about control-C ever again. In fact,
most developers probably won't even realize that there's anything
special happening at all. But if you're curious about how we make the
magic go, then read on...</p>
<div class="contents topic" id="contents">
<p class="topic-title"><strong>Contents:</strong></p>
<ul class="simple">
<li><a class="reference internal" href="#the-precedent-control-c-in-regular-python" id="id8">The precedent: control-C in regular Python</a><ul>
<li><a class="reference internal" href="#option-1-keyboardinterrupt" id="id9">Option 1: KeyboardInterrupt</a></li>
<li><a class="reference internal" href="#option-2-a-custom-signal-handler" id="id10">Option 2: a custom signal handler</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-dream" id="id11">The dream</a></li>
<li><a class="reference internal" href="#prior-art" id="id12">Prior art</a><ul>
<li><a class="reference internal" href="#twisted" id="id13">Twisted</a></li>
<li><a class="reference internal" href="#other-async-libraries" id="id14">Other async libraries</a></li>
<li><a class="reference internal" href="#how-does-the-python-interpreter-pull-it-off" id="id15">How does the Python interpreter pull it off?</a></li>
</ul>
</li>
<li><a class="reference internal" href="#how-trio-handles-control-c" id="id16">How Trio handles control-C</a><ul>
<li><a class="reference internal" href="#how-do-we-know-which-code-should-be-protected" id="id17">How do we know which code should be protected?</a></li>
<li><a class="reference internal" href="#how-do-we-deliver-a-keyboardinterrupt-if-we-can-t-raise-it" id="id18">How do we deliver a KeyboardInterrupt if we can't raise it?</a></li>
<li><a class="reference internal" href="#what-if-you-want-a-manual-control-c-handler" id="id19">What if you want a manual control-C handler?</a></li>
</ul>
</li>
<li><a class="reference internal" href="#limitations-and-potential-improvements" id="id20">Limitations and potential improvements</a><ul>
<li><a class="reference internal" href="#issues-with-handing-off-from-the-c-level-handler-to-the-python-level-handler" id="id21">Issues with handing off from the C-level handler to the Python-level handler</a></li>
<li><a class="reference internal" href="#issues-with-the-interaction-between-keyboardinterrupt-and-with-blocks" id="id22">Issues with the interaction between KeyboardInterrupt and <tt class="docutils literal">with</tt> blocks</a></li>
<li><a class="reference internal" href="#yield-from-and-await-aren-t-signal-safe" id="id23">yield from and await aren't signal-safe</a></li>
<li><a class="reference internal" href="#what-about-pypy" id="id24">What about PyPy?</a></li>
</ul>
</li>
<li><a class="reference internal" href="#conclusion" id="id25">Conclusion</a></li>
<li><a class="reference internal" href="#comments" id="id26">Comments</a></li>
</ul>
</div>
<div class="section" id="the-precedent-control-c-in-regular-python">
<h2><a class="toc-backref" href="#id8">The precedent: control-C in regular Python</a></h2>
<p>Before we get into event loops and all that, let's review how things
work in regular Python. When you're writing Python code, you have two
basic options for handling control-C.</p>
<div class="section" id="option-1-keyboardinterrupt">
<h3><a class="toc-backref" href="#id9">Option 1: KeyboardInterrupt</a></h3>
<p>The first option is to ignore the issue entirely. By default, the
Python interpreter sets things up so that control-C will cause a
<tt class="docutils literal">KeyboardInterrupt</tt> exception to materialize at some point in your
code, which then propagates out like any other regular exception. This
is pretty nice! If your code was accidentally caught in an infinite
loop, then it breaks out of that. If you have cleanup code in
<tt class="docutils literal">finally</tt> blocks, it gets run. It shows a traceback so you can find
that infinite loop. That's the advantage of the <tt class="docutils literal">KeyboardInterrupt</tt>
approach: even if you didn't think about control-C at all while you
were writing the program, then it still does something that's pretty
darn reasonable – say, 99% of the time.</p>
<p>The problem is that the other 1% of the time, things break in weird
ways. It's extremely difficult to write code that can correctly handle
a <tt class="docutils literal">KeyboardInterrupt</tt> <em>anywhere</em> and still guarantee
correctness. No-one audits or tests their code for this. And the edge
cases are very tricky. For example, suppose you have some code that
takes and then releases a lock:</p>
<div class="highlight"><pre><span></span><span class="n">lock</span><span class="o">.</span><span class="n">acquire</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">do_stuff</span><span class="p">()</span> <span class="c1"># <-</span>
<span class="n">do_something_else</span><span class="p">()</span> <span class="c1"># <- control-C anywhere here is safe</span>
<span class="n">and_some_more</span><span class="p">()</span> <span class="c1"># <-</span>
<span class="k">finally</span><span class="p">:</span>
<span class="n">lock</span><span class="o">.</span><span class="n">release</span><span class="p">()</span>
</pre></div>
<p>If the user hits control-C anywhere inside the <tt class="docutils literal">try</tt> block, then the
resulting <tt class="docutils literal">KeyboardInterrupt</tt> will cause the <tt class="docutils literal">finally</tt> block to
run, the lock will be released, and all will be well. But what if
we're unlucky?</p>
<div class="highlight"><pre><span></span><span class="n">lock</span><span class="o">.</span><span class="n">acquire</span><span class="p">()</span>
<span class="c1"># <- control-C could happen here</span>
<span class="k">try</span><span class="p">:</span>
<span class="o">...</span>
<span class="k">finally</span><span class="p">:</span>
<span class="c1"># <- or here</span>
<span class="n">lock</span><span class="o">.</span><span class="n">release</span><span class="p">()</span>
</pre></div>
<p>If a <tt class="docutils literal">KeyboardInterrupt</tt> happens at one of the two points marked
above, then sucks to be us: the exception will propagate but our lock
will <em>never</em> be released, which means that instead of exiting cleanly
we might well get stuck in a deadlock. By moving the <tt class="docutils literal">acquire</tt>
inside the <tt class="docutils literal">try</tt> block we could convert the first point into a
<tt class="docutils literal">RuntimeError</tt> ("attempt to release an unlocked lock") instead of a
deadlock, but this isn't entirely satisfying, and doesn't help with
the second point. And there's another possibility:
<tt class="docutils literal">KeyboardInterrupt</tt> could be raised <em>inside</em> <tt class="docutils literal">lock.acquire</tt> or
<tt class="docutils literal">lock.release</tt> – meaning that we could end up with a lock that was
"half-acquired". I'm not sure what that means but it's probably bad.</p>
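<p>A minimal sketch of that rearrangement, using
<tt class="docutils literal">threading.Lock</tt> (the
<tt class="docutils literal">do_stuff</tt> helper is a stand-in for the real
work):</p>

```python
import threading

lock = threading.Lock()

def do_stuff():
    pass  # stand-in for the real work

def careful_version():
    try:
        # acquire() now lives *inside* the try block: if an interrupt
        # lands just before it runs, the finally clause calls release()
        # on an unlocked lock and raises RuntimeError, instead of
        # leaving the lock held forever and deadlocking us later.
        lock.acquire()
        do_stuff()
    finally:
        lock.release()
```

<p>This narrows the window but doesn't close it: an interrupt can still
land inside <tt class="docutils literal">acquire</tt> or
<tt class="docutils literal">release</tt> themselves.</p>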
<p>In any case, the point here is to illustrate a more general principle:
most Python code has dozens of these kinds of dangerous moments when a
<tt class="docutils literal">KeyboardInterrupt</tt> will violate invariants. Our running example
uses a lock because trio is a concurrency library, but the same thing
applies to open files, database transactions, any kind of multi-step
operation that mutates external state... usually you're lucky enough
to get away with it, especially since the program usually exits
afterwards anyway, but it's basically impossible to know for certain,
so if you need 100% reliability then you need a different
approach. <a class="footnote-reference" href="#async-exc-literature" id="id1">[1]</a></p>
</div>
<div class="section" id="option-2-a-custom-signal-handler">
<h3><a class="toc-backref" href="#id10">Option 2: a custom signal handler</a></h3>
<p>The problem with <tt class="docutils literal">KeyboardInterrupt</tt> is that it can happen
<em>anywhere</em>. If we want to make this manageable, then we need to
somehow trim down the number of places where we need to think about
control-C. The general strategy here is to register a <a class="reference external" href="https://docs.python.org/3/library/signal.html#signal.signal">custom handler</a> for
SIGINT that does nothing except set some kind of flag to record that
the signal happened. This way we can be pretty confident that the
signal handler itself won't interfere with whatever the program was
doing when the signal handler ran. And then we have to make sure that
our program checks this flag on a regular basis at places where we
know how to safely clean up and exit. The best way to think about this
is that we set up a "chain of custody" where responsibility for
handling the signal gets handed along from tricky low-level code up to
higher-level code whose execution context is better-defined:</p>
<pre class="literal-block">
custom signal handler   ->   our program's main loop
     sets flag                    checks flag
</pre>
<p>It's hard to say more than this, though, because the implementation is
going to depend a lot on the way each particular program is put
together. That's the downside to this approach: making it work at all
requires insight into our program's structure and careful attention to
detail. If we mess up and don't check the flag for a few seconds
(perhaps because we're busy doing something else, or the program is
sleeping while waiting for I/O to arrive, or ...), then oops, it takes
a few seconds to respond to control-C. To avoid this we may need to
invent some kind of mechanism to not just set the flag, but also prod
the main loop into checking it in a timely fashion:</p>
<pre class="literal-block">
custom signal handler            ->   our program's main loop
  sets flag                             gets woken up by being poked with stick
  & pokes main loop with a stick        & checks flag
</pre>
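<p>For the "poke with a stick" part, the standard library has a
ready-made mechanism: <tt class="docutils literal">signal.set_wakeup_fd</tt>
writes the signal number to a file descriptor from inside the C-level
handler, so a main loop blocked in <tt class="docutils literal">select</tt>
wakes up immediately. A rough sketch:</p>

```python
import signal
import socket

# A connected socket pair: the signal machinery writes to one end, and
# the main loop watches the other end with select() alongside its real
# I/O sources.
wakeup_recv, wakeup_send = socket.socketpair()
wakeup_send.setblocking(False)  # set_wakeup_fd requires a non-blocking fd
signal.set_wakeup_fd(wakeup_send.fileno())

def handler(signum, frame):
    pass  # the flag-setting logic from above would go here

signal.signal(signal.SIGINT, handler)

# When SIGINT arrives, a byte (the signal number) appears on
# wakeup_recv, so a select() that includes wakeup_recv returns right
# away, and the loop can drain the byte and check the flag.
```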
<p>Another possibility is that we <em>really</em> mess up and accidentally get
stuck in an infinite loop that doesn't check for the flag, and then
oops, now control-C just doesn't work at all, which really adds
insult to injury – we've got a buggy program that's locked up and chewing
up our CPU, and now we can't even kill it? This is <em>exactly</em> the
situation that control-C is supposed to handle! Argh! Super annoying.</p>
<p>Bottom line: this is the only viable way to handle interrupts 100%
correctly, but getting there requires a lot of work, and if you mess
up then you'll actually make things worse. For many programs it's not
worth it – we may be better off letting Python do its default thing of
raising <tt class="docutils literal">KeyboardInterrupt</tt> and crossing our fingers.</p>
<p>The nice thing about Python's approach is that it gives us both
options, and lets us pick the trade-offs that work best for each
situation.</p>
</div>
</div>
<div class="section" id="the-dream">
<h2><a class="toc-backref" href="#id11">The dream</a></h2>
<p>So those are your options in regular Python; what if you're using
Trio?</p>
<p>In general, Trio tries to make async programming feel similar to
regular Python programming, with some minimal extensions added. For
example, if we want to call A and then call B we don't write some
complicated thing like <tt class="docutils literal">fut = <span class="pre">A();</span> fut.add_callback(B)</tt>, we just
write <tt class="docutils literal"><span class="pre">A();</span> B()</tt> (maybe with some <tt class="docutils literal">await</tt>s thrown in). Our model
for running concurrent tasks is that spawning a task is similar to
calling a function, except that now you can call several functions
<em>at the same time</em>. And – important for our current discussion – this
means we can report errors using ordinary exceptions and the usual
stack unwinding logic, even when those errors have to cross between
different concurrent tasks.</p>
<p>For example, a simple web server might have a task tree that looks
like:</p>
<pre class="literal-block">
parent task supervising the other tasks
│
├─ task listening for new connections on port 80
│
├─ task talking to client 1
│
├─ task talking to client 2
│
├─ task talking to client 3
┊
</pre>
<p>Now suppose we haven't defined any special control-C handling, the
user hits control-C, and the second client task receives a
<tt class="docutils literal">KeyboardInterrupt</tt>. Then this exception will propagate up the stack
inside the "client 2" task – running any cleanup code as it goes.
Generally in this kind of server you'd have some sort of catch-all
block near the top of the task that catches, logs, and discards most
exceptions, because we don't want a typo in some HTML template to take
down the whole server. But if our server is well written, this
catch-all handler will only catch <tt class="docutils literal">Exception</tt> and not
<tt class="docutils literal">BaseException</tt> – this is just a <a class="reference external" href="https://stackoverflow.com/questions/7160983/catching-all-exceptions-in-python">standard Python thing</a>,
nothing to do with trio – so it won't catch the <tt class="docutils literal">KeyboardInterrupt</tt>
exception, which will eventually hit the top of that task's stack.</p>
<p>At this point, it continues to propagate up the task tree, into the
supervisor task. When a <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#nurseries-and-spawning">supervisor</a>
sees a child crashing with an exception like this, the <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#custom-supervisors">default</a>
response is to <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-timeouts">cancel</a>
all the other tasks and then re-raise the exception. So now all the
other tasks will receive <tt class="docutils literal">trio.Cancelled</tt> exceptions, clean
themselves up, and then the whole thing exits with
<tt class="docutils literal">KeyboardInterrupt</tt>. Nice! That's just what we wanted, and we didn't
have to think about control-C handling at all when we were writing the
code – it just worked.</p>
<p>So what this suggests is that trio should provide exactly the same
semantics as regular Python: by default control-C triggers a
<tt class="docutils literal">KeyboardInterrupt</tt> in your code and then trio's normal exception
propagation logic will take care of things, or else you can define a
custom handler with some custom cleanup logic if you want to be really
careful.</p>
<p>Now all we need to do is implement it... but this turns out to be
non-trivial, because trio is itself implemented in Python. In our
little scenario above, we imagined that <tt class="docutils literal">KeyboardInterrupt</tt> was
raised inside the user's code. But if we're unlucky, we might get a
<tt class="docutils literal">KeyboardInterrupt</tt> inside trio itself. For example, in trio's core
scheduling loop there's a bit of code that picks the next task to run
by doing something like:</p>
<div class="highlight"><pre><span></span><span class="n">next_task</span> <span class="o">=</span> <span class="n">run_queue</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
</pre></div>
<p>Imagine a <tt class="docutils literal">KeyboardInterrupt</tt> arriving after the call to <tt class="docutils literal">pop()</tt>
but before the assignment! Even if we catch the error, we just lost
track of this task. That's no good.</p>
<p>This is a bit of a theme in trio: a genuinely wonderful thing about
Python's async/await design is that it's not bound to any particular
event loop or execution model: it's basically just a minimal
stack-switching primitive that lets us build our own cooperative
threading semantics on top as an ordinary Python library. If Python's
async/await looked like C#'s async/await or Javascript's async/await,
then libraries like trio and curio couldn't exist, because asyncio
would be baked into the language. But... it turns out that trying to
extend the Python runtime's core semantics, in Python, is a great way
to discover <a class="reference external" href="https://github.com/python-trio/trio/issues/103">all kinds of interesting edge cases</a>!</p>
<p>Can we do better?</p>
</div>
<div class="section" id="prior-art">
<h2><a class="toc-backref" href="#id12">Prior art</a></h2>
<p>Do other async libraries give any useful hints on what to do? Not
really, unfortunately.</p>
<div class="section" id="twisted">
<h3><a class="toc-backref" href="#id13">Twisted</a></h3>
<p>Twisted by default <a class="reference external" href="https://github.com/twisted/twisted/blob/8f07a5afbbdcd00e387b2d91344f1ac7a1d27354/src/twisted/internet/base.py#L1203">registers</a>
a <a class="reference external" href="https://github.com/twisted/twisted/blob/8f07a5afbbdcd00e387b2d91344f1ac7a1d27354/src/twisted/internet/base.py#L646">signal handler</a>
for control-C that triggers a clean shutdown of their event loop. This
means that control-C won't work if your Twisted program runs away in
an infinite loop that never yields to the event loop, and even if it does
work then any callback chains or coroutines that are in progress will
get abruptly abandoned, but it will at least run any registered
<a class="reference external" href="https://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IReactorCore.html#addSystemEventTrigger">shutdown callbacks</a>. It's
not bad, it can be made to work, but doing so is tricky and there are
limitations. Trio's motto is "make it easy to get things right", so
we'd like to do better.</p>
</div>
<div class="section" id="other-async-libraries">
<h3><a class="toc-backref" href="#id14">Other async libraries</a></h3>
<p>I also looked at tornado, asyncio, curio, and gevent, but (as of April
2017) they're even less sophisticated than twisted: by default they
don't do any special handling for keyboard interrupts at all, so
hitting control-C may or may not blow up their event loop internals in
a graceless fashion; in particular, any callback chains or coroutines
you have running are likely to be abruptly abandoned, with no chance
to even run their <tt class="docutils literal">finally</tt> blocks, and it's entirely possible that
you'll hit a deadlock or something, who knows. And as an additional
wrinkle, at least asyncio has some problems handling control-C on
Windows. (Checked with asyncio in CPython 3.6.1; I didn't check the
other projects at all.) For example, if you run this program then be
prepared to kill it with the task manager or something, because your
control-C has no power here:</p>
<div class="highlight"><pre><span></span><span class="c1"># On Windows this ignores control-C, so be prepared to kill it somehow...</span>
<span class="kn">import</span> <span class="nn">asyncio</span>
<span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
<span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">99999</span><span class="p">))</span>
</pre></div>
<p>You can implement the Twisted-style behavior on these systems by
manually registering your own signal handler that triggers some
graceful shutdown logic, but all in all it's not very user friendly,
and has the same limitations. (The asyncio developers have even
<a class="reference external" href="https://github.com/python/asyncio/pull/305#issuecomment-168486131">considered</a>
making the Twisted-style behavior the default, but <a class="reference external" href="https://github.com/python/asyncio/issues/341#issuecomment-236443331">are unhappy about
the side-effects</a>
and haven't reached consensus on a solution.)</p>
</div>
<div class="section" id="how-does-the-python-interpreter-pull-it-off">
<h3><a class="toc-backref" href="#id15">How does the Python interpreter pull it off?</a></h3>
<p>We do have one example of a program that implements the semantics we
want: the Python interpreter itself. How does it work? Let's walk
through it.</p>
<p>Control-C handling starts when the operating system detects a
control-C and informs the interpreter. The way it does this is by
running whatever signal handler was previously registered to handle
the <tt class="docutils literal">SIGINT</tt> signal. Conceptually, this is similar to how
<a class="reference external" href="https://docs.python.org/3/library/signal.html#signal.signal">signal.signal</a>
works, but technically it's very different because <tt class="docutils literal">signal.signal</tt>
takes a <em>Python</em> function to be run when a signal arrives, and the
operating system APIs only let you register a <em>C</em> function to be run
when a signal arrives. (Note that here we're talking about "C" the
language – that it uses the same letter as control-C is just a
coincidence.) So if you're implementing a Python interpreter, that's
your challenge: write a function in C that causes the Python signal
handler function to be run. Once you've done that, you're basically
done; to get Python's default behavior you just have to install a
default handler that looks like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">default_sigint_handler</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">KeyboardInterrupt</span>
</pre></div>
<p>and then if the user wants to override that with something fancier,
they can.</p>
<p>But implementing the C-level handler turns out to be trickier than you
might think, for the same basic reason we keep running into: control-C
can happen at <em>any</em> moment. On Unix, signal delivery is done by
hijacking a thread, essentially pausing it in between two assembly
instructions and inserting a call to a C function that was registered
as a signal handler. (What if the thread isn't running any assembly
instructions, because it's blocked in a syscall inside the kernel?
Then the kernel unceremoniously cancels that syscall – making it
return the special error code <tt class="docutils literal">EINTR</tt> – and this forces the thread
back into userspace so it can be hijacked. Remember that stick we
mentioned above? The kernel has a very big stick. This design is
historically somewhat controversial <a class="footnote-reference" href="#pc-losering" id="id2">[2]</a>.) On Windows,
things are a bit more civilized and also more annoying: when the user
hits control-C, a new thread spontaneously materializes inside our
process and runs the C signal handler. On the one hand, this is an
elegant re-use of an existing concept and avoids the whole weird
hijacking thing. On the other hand, if you want to somehow poke the
main thread to wake it up, then you're on your own – you have to build
your own stick from scratch.</p>
<p>In any case, the end result of all this is that the C-level signal
handler will get run, <em>but</em> this might happen at a time when the
interpreter is in some messy and inconsistent state. And in
particular, this means that you can't simply have the C-level signal
handler run the Python-level signal handler, because the interpreter
might not be in a state where it can safely run Python code.</p>
<p>To see why this is a problem, let's look at an example from inside
CPython. When raising an exception, Python keeps track of three
things: <a class="reference external" href="https://docs.python.org/3/library/sys.html#sys.exc_info">the exception's type, value, and traceback</a>. Here's
the code from <tt class="docutils literal">PyErr_SetExcInfo</tt> that CPython uses to record these
(comments are mine; <a class="reference external" href="https://github.com/python/cpython/blob/cd815edf012dc6dd20dfeef91951270e96607616/Python/errors.c#L359">original is here</a>):</p>
<div class="highlight"><pre><span></span><span class="cm">/* Save the old exc_info values in temporary variables */</span>
<span class="n">oldtype</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_type</span><span class="p">;</span>
<span class="n">oldvalue</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_value</span><span class="p">;</span>
<span class="n">oldtraceback</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_traceback</span><span class="p">;</span>
<span class="cm">/* Assign the new exc_info values */</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_type</span> <span class="o">=</span> <span class="n">p_type</span><span class="p">;</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_value</span> <span class="o">=</span> <span class="n">p_value</span><span class="p">;</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_traceback</span> <span class="o">=</span> <span class="n">p_traceback</span><span class="p">;</span>
<span class="cm">/* Drop the references to the old values */</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldtype</span><span class="p">);</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldvalue</span><span class="p">);</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldtraceback</span><span class="p">);</span>
</pre></div>
<p>You'll notice this is written in a slightly complicated way, where
instead of simply overwriting the old values, they get saved in
temporaries etc. There are two reasons for this. First, we can't just
overwrite the old values because we need to decrement their <a class="reference external" href="https://docs.python.org/3/c-api/intro.html#reference-counts">reference
counts</a>, or
else we'll cause a memory leak. And second, we can't decrement them one by one
as we assign each field, because <tt class="docutils literal">Py_XDECREF</tt> can potentially end up
causing an object to be deallocated, at which point its <tt class="docutils literal">__del__</tt>
method might run, which is arbitrary Python code, and as you can
imagine you don't want to start running Python code at a moment when
an exception is only <em>half</em> raised. Before it's raised is okay, after
it's raised is okay, but half-way raised, with <tt class="docutils literal">sys.exc_info()</tt> only
partially filled in? That's not going to end well. The CPython
developers of course are aware of this, so they carefully wrote this
function so that it assigns all of the values and puts the interpreter
back into a sensible state before it decrements any of the reference
counts.</p>
<p>But now imagine that a user is annoying (as users sometimes are) and
hits control-C right in the middle of this, so that just as we're
half-way through assigning the new values, the operating system pauses
our code and runs the C signal handler. What happens? If the C-level
signal handler runs the Python-level signal handler directly, then we
have the same problem that we just so carefully avoided: we're running
arbitrary Python code with an exception only half-raised. Even worse,
this Python function probably wants to raise <tt class="docutils literal">KeyboardInterrupt</tt>,
which means that we end up calling <tt class="docutils literal">PyErr_SetExcInfo</tt> to raise a
second exception while we're half-way through raising the
first. Effectively the code would end up looking something like:</p>
<div class="highlight"><pre><span></span><span class="cm">/******************************************************************/</span>
<span class="cm">/* Raising the first exception, like a RuntimeError or whatever */</span>
<span class="cm">/* Save the old exc_info values in temporary variables */</span>
<span class="n">oldtype1</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_type</span><span class="p">;</span>
<span class="n">oldvalue1</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_value</span><span class="p">;</span>
<span class="n">oldtraceback1</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_traceback</span><span class="p">;</span>
<span class="cm">/* Assign the new exc_info values */</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_type</span> <span class="o">=</span> <span class="n">p_type1</span><span class="p">;</span>
<span class="cm">/******************************************************************/</span>
<span class="cm">/* Surprise! Signal handler suddenly runs here, and calls this */</span>
<span class="cm">/* code again to raise a KeyboardInterrupt or something */</span>
<span class="cm">/* Save the old exc_info values in temporary variables */</span>
<span class="n">oldtype2</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_type</span><span class="p">;</span>
<span class="n">oldvalue2</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_value</span><span class="p">;</span>
<span class="n">oldtraceback2</span> <span class="o">=</span> <span class="n">tstate</span><span class="o">-></span><span class="n">exc_traceback</span><span class="p">;</span>
<span class="cm">/* Assign the new exc_info values */</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_type</span> <span class="o">=</span> <span class="n">p_type2</span><span class="p">;</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_value</span> <span class="o">=</span> <span class="n">p_value2</span><span class="p">;</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_traceback</span> <span class="o">=</span> <span class="n">p_traceback2</span><span class="p">;</span>
<span class="cm">/* Drop the references to the old values */</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldtype2</span><span class="p">);</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldvalue2</span><span class="p">);</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldtraceback2</span><span class="p">);</span>
<span class="cm">/******************************************************************/</span>
<span class="cm">/* Back to the original call */</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_value</span> <span class="o">=</span> <span class="n">p_value1</span><span class="p">;</span>
<span class="n">tstate</span><span class="o">-></span><span class="n">exc_traceback</span> <span class="o">=</span> <span class="n">p_traceback1</span><span class="p">;</span>
<span class="cm">/* Drop the references to the old values */</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldtype1</span><span class="p">);</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldvalue1</span><span class="p">);</span>
<span class="n">Py_XDECREF</span><span class="p">(</span><span class="n">oldtraceback1</span><span class="p">);</span>
</pre></div>
<p>This would cause all kinds of chaos: notice that <tt class="docutils literal">p_type2</tt>
overwrites <tt class="docutils literal">p_type1</tt>, but <tt class="docutils literal">p_value1</tt> overwrites <tt class="docutils literal">p_value2</tt>, so
we might end up with a <tt class="docutils literal">sys.exc_info()</tt> where the type is
<tt class="docutils literal">KeyboardInterrupt</tt> but the exception object is an instance of
<tt class="docutils literal">RuntimeError</tt>. The <tt class="docutils literal">oldvalue1</tt> and <tt class="docutils literal">oldvalue2</tt> temporaries end
up referring to the same object, so we end up decrementing its
reference count <em>twice</em>, even though we only had one reference; this
probably leads to some kind of <a class="reference external" href="https://en.wikipedia.org/wiki/Dangling_pointer">nasty memory corruption</a>.</p>
<p>Clearly this isn't gonna work. The C-level signal handler cannot call
the Python-level signal handler directly. Instead, it needs to use the
same trick we discussed above: the C-level handler sets a flag, and
the interpreter makes sure to check this flag regularly at moments
when it knows that it can safely run arbitrary Python code.</p>
<p>Specifically, the way CPython does this is that in its core bytecode
evaluation loop, just before executing each bytecode instruction, it
checks to see if the C-level handler's flag was set, and if so then it
pauses and invokes the appropriate Python handler. (After all, the
moment when you're about to run an arbitrary opcode is by definition a
moment when you can run some arbitrary Python code.) And then, if the
Python-level handler raises an exception, the evaluation loop <a class="reference external" href="https://github.com/python/cpython/blob/e82cf8675bacd7a03de508ed11865fc2701dcef5/Python/ceval.c#L1074">lets
this exception propagate</a>
instead of running the next instruction. So a more complete picture of
our chain of custody looks like this, with two branches depending on
which kind of Python-level handler is currently set. (These correspond
to the two strategies we described at the beginning):</p>
<pre class="literal-block">
C-level handler  -->  bytecode eval loop
   sets flag          checks flag & runs Python-level handler
                        |      \
                        |   default Python-level handler
                        |      raises KeyboardInterrupt
                         \
                          \
                    custom Python-level handler  -->  main loop
                        sets another flag             checks flag
</pre>
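<p>The lower branch of the diagram – a custom Python-level handler that just sets yet another flag, which the program's own main loop polls at moments it knows are safe – can be sketched in a few lines. (This is an illustration, assuming Python 3.8+ for <tt class="docutils literal">signal.raise_signal</tt>, which we use to simulate the user hitting control-C; the loop body is a stand-in for real work.)</p>

```python
import signal

interrupted = False

def handler(signum, frame):
    # By the time this runs, the eval loop has already noticed the
    # C-level handler's flag and decided it's safe to run Python code.
    global interrupted
    interrupted = True

old_handler = signal.signal(signal.SIGINT, handler)

work_done = 0
while not interrupted:
    work_done += 1          # stand-in for one unit of real work
    if work_done == 3:
        # simulate the user hitting control-C mid-loop
        signal.raise_signal(signal.SIGINT)

print("exited cleanly after", work_done, "units of work")
signal.signal(signal.SIGINT, old_handler)   # restore the previous handler
```

<p>The point of the pattern is that the main loop only ever observes the interrupt at its own checkpoint, so it can never be wedged half-way through a unit of work.</p>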
<p>But what if the eval loop isn't actually... looping? What if it's
sitting inside a call to <tt class="docutils literal">time.sleep</tt> or <tt class="docutils literal">select.select</tt> or
something? On Unix this is <a class="reference external" href="https://github.com/python-trio/trio/issues/109">mostly</a> taken care of
automatically by the kernel – though at the cost of the interpreter
needing <a class="reference external" href="https://www.python.org/dev/peps/pep-0475/">annoying boilerplate</a> every time it does an
operating system call. On Windows, we're on our own. And
unfortunately, there is no general solution, because, well, it's
Windows, and the Windows low-level APIs wouldn't recognize "general"
if it showed up in a uniform with stars on the shoulder. Windows has
at least 4 qualitatively different methods for interrupting a blocking
call, and any given API might respond to one, several, or none of them
<a class="footnote-reference" href="#windows" id="id3">[3]</a>.</p>
<p>In practice CPython compromises and uses two mechanisms: the C-level
handler can be configured to <a class="reference external" href="https://docs.python.org/3/library/signal.html#signal.set_wakeup_fd">write to a file descriptor</a>
(which is useful for waking up calls that wait for a file descriptor
to have data, like <a class="reference external" href="https://docs.python.org/3/library/select.html#select.select">select</a>), and
on Windows it unconditionally <a class="reference external" href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms686211(v=vs.85).aspx">fires an "event" object</a>,
which is a Windows-specific synchronization primitive. And some parts
of CPython are written to check for this – for example the Windows
implementation of <tt class="docutils literal">time.sleep</tt> is written to <a class="reference external" href="https://github.com/python/cpython/blob/2c134c31252612ed4729fd05df6ab0e96de8d0b1/Modules/timemodule.c#L1471-L1484">wake up early if the
event gets fired and check for signals</a>. And
that's why on Windows you can do <tt class="docutils literal">time.sleep(99999)</tt> and then hit
control-C to cancel it. But this is a bit hit-and-miss: for example,
Python's implementation of <tt class="docutils literal">select.select</tt> doesn't have any similar
early-exit code, so if you run this code on Windows and hit control-C,
then it will raise <tt class="docutils literal">KeyboardInterrupt</tt>... a month from now, give or
take:</p>
<div class="highlight"><pre><span></span><span class="c1"># If you run this on Windows, have the task manager ready</span>
<span class="n">sock</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">()</span>
<span class="n">select</span><span class="o">.</span><span class="n">select</span><span class="p">([</span><span class="n">sock</span><span class="p">],</span> <span class="p">[],</span> <span class="p">[],</span> <span class="mi">2500000</span><span class="p">)</span>
</pre></div>
<p>The C-level signal handler runs and sets its flag, but the interpreter
doesn't notice until the <tt class="docutils literal">select</tt> call has finished. This explains
why asyncio has problems – it blocks in <tt class="docutils literal">select.select</tt>, not
<tt class="docutils literal">time.sleep</tt>. Which, I mean, that's what you <em>want</em> in an event
loop, I'm not saying it should block in <tt class="docutils literal">time.sleep</tt> instead, but if
you're using <tt class="docutils literal">select.select</tt> then Python's normal guarantees break
down and asyncio isn't compensating for that.</p>
<p>So here's the <em>final</em> version of our chain-of-custody diagram for
control-C in a generic Python program:</p>
<pre class="literal-block">
C-level handler  -->  bytecode eval loop
   sets flag          checks flag & runs Python-level handler
   & writes to fd       |      \
     (if enabled)       |   default Python-level handler
   & fires an event     |      raises KeyboardInterrupt
     (if on windows)     \
                          \
                    custom Python-level handler  -->  main loop
                        sets another flag             checks flag
</pre>
<p>And now you know how the Python runtime handles control-C (usually)
promptly and reliably, while protecting itself from getting into a
broken state.</p>
<p>Of course, this doesn't really help the code that's running on top –
if your Python code wants to avoid getting wedged in a broken state,
it's on its own.</p>
<p>...Mostly. It turns out that there are some details that can
sometimes make our Python code a little more robust to
<tt class="docutils literal">KeyboardInterrupt</tt>s. There's no guarantee – remember, this is the
99% solution we're trying to implement – but if the interpreter can
make it 99.9% instead of 99.0% without any extra work for users, then
it's a nice thing to do (and we probably want to do the same thing in
trio, if we can). So let's look at how these work.</p>
<p>Let's start with our example from above, of some code that isn't quite
<tt class="docutils literal">KeyboardInterrupt</tt> safe:</p>
<div class="highlight"><pre><span></span><span class="n">lock</span><span class="o">.</span><span class="n">acquire</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="o">...</span>
<span class="k">finally</span><span class="p">:</span>
<span class="n">lock</span><span class="o">.</span><span class="n">release</span><span class="p">()</span>
</pre></div>
<p>First, what happens if <tt class="docutils literal">KeyboardInterrupt</tt> is raised when we're
half-way through running <tt class="docutils literal">lock.acquire</tt> or <tt class="docutils literal">lock.release</tt>? Can we
end up with our lock object in an inconsistent state where it's only
"half-locked" (whatever that would even mean)?</p>
<p>Well, <em>if</em> our lock is an instance of the standard library's
<tt class="docutils literal">threading.Lock</tt> class, then it turns out we're safe!
<tt class="docutils literal">threading.Lock</tt> is implemented in C code, so its methods get the
same kind of protection that <tt class="docutils literal">PyErr_SetExcInfo</tt> does: you can get a
<tt class="docutils literal">KeyboardInterrupt</tt> before or after the call, but not during the
call <a class="footnote-reference" href="#acquire-blocks" id="id4">[4]</a>. Sweet.</p>
<p>What about a <tt class="docutils literal">KeyboardInterrupt</tt> that happens between calling
<tt class="docutils literal">acquire</tt> and entering the <tt class="docutils literal">try</tt> block, or between entering the
<tt class="docutils literal">finally</tt> block and calling <tt class="docutils literal">release</tt>? Well, in current CPython
there's no way to eliminate this entirely, but it turns out that the
bytecode eval loop has some tricks up its sleeve to make things less
risky.</p>
<p>The first trick we'll examine is also the oldest, and probably the
least useful. To see how this works, we need to look at how our
example gets compiled down to bytecode instructions that run on
CPython's virtual machine. (If you aren't familiar with CPython's
bytecode, <a class="reference external" href="https://www.youtube.com/watch?v=mxjv9KqzwjI">this is a great talk and will give you a good introduction</a>.) Running this code:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">dis</span>
<span class="k">def</span> <span class="nf">f</span><span class="p">():</span>
<span class="n">lock</span><span class="o">.</span><span class="n">acquire</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">finally</span><span class="p">:</span>
<span class="n">lock</span><span class="o">.</span><span class="n">release</span><span class="p">()</span>
<span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</pre></div>
<p>prints a chunk of disassembled bytecode. I won't paste the whole
thing, but it starts like:</p>
<pre class="literal-block">
  2           0 LOAD_GLOBAL              0 (lock)
              3 LOAD_ATTR                1 (acquire)
              6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              9 POP_TOP

  3          10 SETUP_FINALLY            4 (to 17)
</pre>
<p>The first four lines of bytecode correspond to the first line of our
Python code, the call to <tt class="docutils literal">lock.acquire()</tt>. Then SETUP_FINALLY marks
the beginning of the <tt class="docutils literal">try</tt> block. So danger here would be if a
<tt class="docutils literal">KeyboardInterrupt</tt> arrives in between the CALL_FUNCTION (where we
actually acquire the lock) and the SETUP_FINALLY. Since signal
handlers run in between opcodes, there are two places this could
happen: between CALL_FUNCTION and POP_TOP, and between POP_TOP and
SETUP_FINALLY.</p>
<p>Well, it turns out that <a class="reference external" href="https://github.com/python/cpython/commit/b8b6d0c2c63">way back in 2003</a>, Guido added
a bit of code to the bytecode eval loop to skip running signal
handlers if the next opcode is SETUP_FINALLY, and <a class="reference external" href="https://github.com/python/cpython/blob/e82cf8675bacd7a03de508ed11865fc2701dcef5/Python/ceval.c#L1067-L1071">it's still there
today</a>. This
means that we can't get a <tt class="docutils literal">KeyboardInterrupt</tt> in between POP_TOP and
SETUP_FINALLY. It's... mostly useless? We can still get a
<tt class="docutils literal">KeyboardInterrupt</tt> in between CALL_FUNCTION and POP_TOP, and in
fact the CALL_FUNCTION → POP_TOP case is much more likely to cause
problems than the POP_TOP → SETUP_FINALLY case. The check after
CALL_FUNCTION notices any signals that arrived during CALL_FUNCTION,
which can take an arbitrarily long time; the check after POP_TOP only
notices signals that arrived during POP_TOP, and POP_TOP is an
extremely fast opcode – basically just a few machine instructions. In
fact it's so fast that the interpreter usually doesn't bother to check
for signals after it <em>anyway</em> because the check would add substantial
overhead <a class="footnote-reference" href="#fast-dispatch" id="id5">[5]</a>, so in our example this special case
doesn't really accomplish anything at all.</p>
<p>The one case I can think of where the SETUP_FINALLY special case might
be useful is in code like:</p>
<div class="highlight"><pre><span></span><span class="n">SOME_VAR</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">try</span><span class="p">:</span>
<span class="o">...</span>
<span class="k">finally</span><span class="p">:</span>
<span class="n">SOME_VAR</span> <span class="o">=</span> <span class="kc">False</span>
</pre></div>
<p>because if you look at how this compiles to bytecode, the assignment
ends up being a single opcode that comes right before the
SETUP_FINALLY. But fundamentally, this strategy can't really work:
there's generally going to be <em>some</em> sort of logically atomic
operation before each <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> pair that shouldn't be
interrupted by signals, but there's no way for the interpreter to
figure out where the start of that logical operation is. That
information just isn't recorded in the source code.</p>
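<p>(As an aside, you can check the claim about the assignment compiling to a single opcode right before SETUP_FINALLY with <tt class="docutils literal">dis</tt> – though note this relies on bytecode details of interpreters that still compile <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> to a SETUP_FINALLY opcode, i.e. CPython through 3.10; later versions restructured the bytecode.)</p>

```python
import dis

SOME_VAR = False

def f():
    global SOME_VAR
    SOME_VAR = True      # compiles to a single STORE_GLOBAL opcode
    try:
        pass
    finally:
        SOME_VAR = False

# On CPython 3.10 and earlier, the STORE_GLOBAL appears immediately
# before SETUP_FINALLY, so the eval loop's special case protects it.
dis.dis(f)
```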
<p>Except... sometimes it is, which leads to another trick the
interpreter pulls. Back in 2003 <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> was all we had,
but in modern Python, a nicer way to write our example would be:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">lock</span><span class="p">:</span>
<span class="o">...</span>
</pre></div>
<p>Of course it's <a class="reference external" href="https://www.python.org/dev/peps/pep-0343/">well documented</a> that this is just
<a class="reference external" href="https://en.wikipedia.org/wiki/Syntactic_sugar">syntactic sugar</a>
for something like:</p>
<div class="highlight"><pre><span></span><span class="c1"># simplified but gives the idea, see PEP 343 for the full details</span>
<span class="n">lock</span><span class="o">.</span><span class="fm">__enter__</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="o">...</span>
<span class="k">finally</span><span class="p">:</span>
<span class="n">lock</span><span class="o">.</span><span class="fm">__exit__</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
</pre></div>
<p>This looks pretty similar to our problematic code above, so one would
think that the <tt class="docutils literal">with</tt> version has the same problems. But it turns
out this is not quite true – not only is the <tt class="docutils literal">with</tt> version nicer to
look at than the <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> version, it actually makes
stronger guarantees about <tt class="docutils literal">KeyboardInterrupt</tt> safety!</p>
<p>Again, let's look at the bytecode:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">dis</span>
<span class="k">def</span> <span class="nf">f</span><span class="p">():</span>
<span class="k">with</span> <span class="n">lock</span><span class="p">:</span>
<span class="k">pass</span>
<span class="n">dis</span><span class="o">.</span><span class="n">dis</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</pre></div>
<pre class="literal-block">
  2           0 LOAD_GLOBAL              0 (lock)
              3 SETUP_WITH               5 (to 11)
              6 POP_TOP

  3           7 POP_BLOCK
              8 LOAD_CONST               0 (None)
        >>   11 WITH_CLEANUP_START
             12 WITH_CLEANUP_FINISH
             13 END_FINALLY
</pre>
<p>The key thing we learn here is that entering a <tt class="docutils literal">with</tt> block is done
via SETUP_WITH and exiting is done via WITH_CLEANUP_START. If we
consult <tt class="docutils literal">Python/ceval.c</tt> in the CPython source, it turns out that
<a class="reference external" href="https://github.com/python/cpython/blob/e82cf8675bacd7a03de508ed11865fc2701dcef5/Python/ceval.c#L3095-L3121">SETUP_WITH</a>
is a single opcode that both calls <tt class="docutils literal">lock.__enter__</tt> and also sets up
the invisible <tt class="docutils literal">try</tt> block, and <a class="reference external" href="https://github.com/python/cpython/blob/e82cf8675bacd7a03de508ed11865fc2701dcef5/Python/ceval.c#L3123-L3212">WITH_CLEANUP_START</a>
is a single opcode that both marks the beginning of the invisible
<tt class="docutils literal">finally</tt> block and also calls <tt class="docutils literal">lock.__exit__</tt>. And the crucial
thing for us is that since the interpreter only runs Python-level
signal handlers <em>in between</em> opcodes, this means it's now impossible
for a <tt class="docutils literal">KeyboardInterrupt</tt> to arrive in between calling
<tt class="docutils literal">lock.__enter__</tt> and entering the <tt class="docutils literal">try</tt> block, or in between
entering the <tt class="docutils literal">finally</tt> block and calling <tt class="docutils literal">lock.__exit__</tt>.</p>
<p>Basically, the key thing about <tt class="docutils literal">with</tt> blocks is that they tell the
interpreter where the boundary of the critical operations are (they're
whatever <tt class="docutils literal">__enter__</tt> and <tt class="docutils literal">__exit__</tt> do) so a solution becomes
possible in principle; then <tt class="docutils literal">threading.Lock.__enter__</tt> is
implemented in C so it's atomic itself, and the design of the <tt class="docutils literal">with</tt>
opcodes rules out the two remaining problematic cases:
<tt class="docutils literal">KeyboardInterrupt</tt> after acquiring the lock but entering the
<tt class="docutils literal">try</tt>, and <tt class="docutils literal">KeyboardInterrupt</tt> after entering the <tt class="docutils literal">finally</tt> but
before releasing the lock. Hooray, we're safe!</p>
<p>...almost. Now we can't have a <tt class="docutils literal">KeyboardInterrupt</tt> between entering
the <tt class="docutils literal">finally</tt> block and releasing the lock. But that's not really
what we want. We want to make sure we can't have a
<tt class="docutils literal">KeyboardInterrupt</tt> between exiting the <tt class="docutils literal">try</tt> block and releasing
the lock. But wait, you might think. This is really splitting hairs –
just look at the source code, the end of the <tt class="docutils literal">try</tt> block and the
start of the <tt class="docutils literal">finally</tt> block are the same thing!</p>
<p>Well, yeah, that would make sense... but if we look at the bytecode,
we can see that this isn't quite true: the POP_BLOCK instruction at
offset 7 is the end of the <tt class="docutils literal">try</tt> block, and then we do a LOAD_CONST
before we reach the WITH_CLEANUP_START at offset 11, which is where
the <tt class="docutils literal">finally</tt> block starts.</p>
<p>The reason the bytecode is written like this is that when the
interpreter gets to the <tt class="docutils literal">finally</tt> block hidden inside
WITH_CLEANUP_START, it needs to know whether it arrived there because
an exception was thrown or because the <tt class="docutils literal">try</tt> block finished
normally. The LOAD_CONST leaves a special value on the stack that
tells WITH_CLEANUP_START that we're in the latter case. But for
present purposes the reason doesn't really matter... the end result is
that there's this gap, where if we get a <tt class="docutils literal">KeyboardInterrupt</tt> raised
in between the POP_BLOCK and LOAD_CONST, or in between the LOAD_CONST
and WITH_CLEANUP_START, then it will propagate out of the <tt class="docutils literal">with</tt>
block without calling <tt class="docutils literal">__exit__</tt> at all. Oops!</p>
<p>Bottom line: even if you use a <tt class="docutils literal">with</tt> block AND use a lock that's
implemented in C, it's still possible for a control-C to happen at
<em>just</em> the wrong moment and leave you with a dangling lock. And of
course, there are many other ways that a poorly timed
<tt class="docutils literal">KeyboardInterrupt</tt> can trip you up; even if this particular case
were fixed (which would be nice!), then this doesn't provide a general
solution to those problems. But if we accept that the default
<tt class="docutils literal">KeyboardInterrupt</tt> handling is a best-effort kind of thing, then
this kind of extra safety is still a nice bonus when we can get it –
and in particular using <tt class="docutils literal">with</tt> and a lock implemented in C is much
less likely to break than using <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> with a lock
implemented in Python, so we should appreciate the CPython developers
for taking the effort.</p>
</div>
</div>
<div class="section" id="how-trio-handles-control-c">
<h2><a class="toc-backref" href="#id16">How Trio handles control-C</a></h2>
<p>Ok, that's enough about regular Python – this is a blog post about
trio! And as we discussed above, trio has the same basic problem that
CPython does: we want to provide <tt class="docutils literal">KeyboardInterrupt</tt> semantics to
code running on top of us, but our internal code that implements
low-level runtime services like scheduling, exception propagation, and
<a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#trio.Lock">locking</a>,
is too delicate to survive random <tt class="docutils literal">KeyboardInterrupt</tt>s without
breaking the whole system. If trio were built into the interpreter,
then this would be no problem, because like we saw above, the
interpreter can (and must!) cheat to make operations like
<tt class="docutils literal">PyErr_SetExcInfo</tt> and <tt class="docutils literal">threading.Lock.__enter__</tt> atomic with
respect to signal delivery. But trio is an ordinary library written in
pure Python, so we don't have this option. What to do?</p>
<p>Okay, enough buildup. Here's how trio handles control-C:</p>
<p>First, we jump through some hoops to make sure that the hand-off from
the C-level signal handler to the Python-level signal handler happens
promptly, even on Windows. Basically this just means that whenever we
stop executing Python bytecode because we're waiting for I/O, we make
sure to hook up to one of the wakeup signals that C-level signal
handler sends. You can read the gory details <a class="reference external" href="https://github.com/python-trio/trio/issues/42">on the trio bug tracker</a>. This is just a
baseline requirement to get any kind of reliable signal handling in
Python.</p>
<p>Next, trio checks at startup to see if the user has configured their
own custom SIGINT handler. If not, then we figure they're expecting
the Python style semantics, and we automatically replace the default
handler with trio's custom handler.</p>
<p>Conceptually, trio's handler is similar to the interpreter's default
handler: its goal is to respond to control-C by raising
<tt class="docutils literal">KeyboardInterrupt</tt> inside user code, but not inside the delicate
parts of the runtime – just now it's trio's runtime we're worried
about protecting, not the underlying language runtime. But
unfortunately, we can't copy CPython's trick of waiting until user
code is running before calling the signal handler – that's just not
functionality that Python offers. Python will call our signal handler
whenever it wants, and we can't stop it. So when our signal handler
gets called, its first job is to figure out whether it's user code
that got interrupted. If so, then it raises <tt class="docutils literal">KeyboardInterrupt</tt>
directly – which lets us break out of that accidental infinite loop I
keep talking about – and we're done. Otherwise, it sets a flag
and wakes up the run loop to deliver a <tt class="docutils literal">KeyboardInterrupt</tt> as soon
as possible.</p>
<p>So two questions: how does it know whether it's being called from
"user code"? and if we can't deliver a <tt class="docutils literal">KeyboardInterrupt</tt>
immediately, then how do we deliver it "as soon as possible"?</p>
<div class="section" id="how-do-we-know-which-code-should-be-protected">
<h3><a class="toc-backref" href="#id17">How do we know which code should be protected?</a></h3>
<p>The most important code we need to protect from <tt class="docutils literal">KeyboardInterrupt</tt>
is the core scheduling code that runs to switch between user tasks. So
my first thought was that we could have a global flag that keeps track
of whether protection is "enabled" or "disabled", and toggle it back
and forth when scheduling a user task. Something like:</p>
<div class="highlight"><pre><span></span><span class="n">KEYBOARD_INTERRUPT_PROTECTION_ENABLED</span> <span class="o">=</span> <span class="kc">True</span>
<span class="k">def</span> <span class="nf">run_next_task_step</span><span class="p">(</span><span class="n">task</span><span class="p">):</span>
<span class="c1"># Disable protection just while we're running the user task,</span>
<span class="c1"># and re-enable immediately afterwards:</span>
<span class="k">global</span> <span class="n">KEYBOARD_INTERRUPT_PROTECTION_ENABLED</span>
<span class="n">KEYBOARD_INTERRUPT_PROTECTION_ENABLED</span> <span class="o">=</span> <span class="kc">False</span>
<span class="k">try</span><span class="p">:</span>
<span class="c1"># <- danger zone!</span>
<span class="c1"># Run this task for one step:</span>
<span class="k">return</span> <span class="n">task</span><span class="o">.</span><span class="n">coro</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">task</span><span class="o">.</span><span class="n">next_value_to_send</span><span class="p">)</span>
<span class="k">finally</span><span class="p">:</span>
<span class="c1"># <- danger zone!</span>
<span class="n">KEYBOARD_INTERRUPT_PROTECTION_ENABLED</span> <span class="o">=</span> <span class="kc">True</span>
</pre></div>
<p>But if you've read this far, then this <tt class="docutils literal">try</tt>/<tt class="docutils literal">finally</tt> block
should look pretty familiar, and you can probably guess the problem!
What if our signal handler runs at one of the places labeled "danger
zone"? In both places the protection is disabled, so a
<tt class="docutils literal">KeyboardInterrupt</tt> can be raised... but if it is, then we're in
trouble. If we call <tt class="docutils literal">run_next_task_step</tt> and get back a
<tt class="docutils literal">KeyboardInterrupt</tt>, then that might mean that we didn't run the
task at all, or it might mean that the task itself raised
<tt class="docutils literal">KeyboardInterrupt</tt>, or it might mean that we did run the task step
and then lost the return value... and we can't tell the difference or
recover the return value. So this doesn't work at all! What we need is
some way to combine task switching and the enabling/disabling of
<tt class="docutils literal">KeyboardInterrupt</tt> protection into a <em>single atomic operation</em>,
without cheating and using C code.</p>
<p>This stumped me for a while, but it turns out that this is actually
possible. Here's the trick: Python signal handlers receive an <a class="reference external" href="https://docs.python.org/3/library/signal.html#signal.signal">obscure
second argument</a>,
which is the <a class="reference external" href="https://docs.python.org/3/reference/datamodel.html#frame-objects">stack frame</a>
of the function whose execution was paused in order to run the signal
handler. This frame is either inside the user task, or inside trio's
scheduler. If our signal handler can somehow examine this frame and
figure out which type it is, then the handler will know whether it's
safe to raise <tt class="docutils literal">KeyboardInterrupt</tt>. And crucially, by tying this
decision to the frame object, we make it so that the actual act of
switching in or out of the user task is what toggles the protection,
so there's no moment where our protection is disabled inside the
scheduler.</p>
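<p>You can see this second argument in action with a plain, trio-free signal handler (<tt class="docutils literal">signal.raise_signal</tt> requires Python 3.8+):</p>

```python
import signal
import types

captured = []

def handler(signum, frame):
    # The second argument is the stack frame that was paused in order
    # to run this handler -- the same object trio's handler inspects.
    captured.append(frame)

old = signal.signal(signal.SIGINT, handler)
try:
    signal.raise_signal(signal.SIGINT)  # runs our handler synchronously
finally:
    signal.signal(signal.SIGINT, old)   # restore the default handler

assert isinstance(captured[0], types.FrameType)
```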
<p>So now our problem becomes: how do we "mark" a stack frame as
protected or unprotected? My first thought was to stick a special
attribute on functions that transition between the two modes, and then
the signal handler could walk up the stack looking for this special
attribute. But unfortunately, it turns out that there's no way to get
from a stack frame object back to a function object to look at its
attributes. And there's no way to attach generic metadata to frame
objects. (They don't have a <tt class="docutils literal">__dict__</tt>, and while code objects do
have a <tt class="docutils literal">flags</tt> attribute, it's read-only from Python. Of course
<a class="reference external" href="https://docs.python.org/3/library/ctypes.html">nothing is ever REALLY read-only</a> in Python, but
stealing one of CPython's code flags to use in a third-party library
might be considered rude...) In fact, it turns out that there's only
one place where we can attach arbitrary user-defined data to a frame
object, and that's in the local variables!</p>
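<p>A quick demonstration of both points – locals are visible through the frame object, and there's nowhere else on a frame to hang metadata:</p>

```python
import sys

def leaf():
    marker = "some per-frame metadata"   # an ordinary local variable...
    frame = sys._getframe()
    # ...is visible to anyone holding the frame object:
    assert frame.f_locals["marker"] == "some per-frame metadata"
    # Frames have no __dict__, so arbitrary attributes can't be attached:
    assert not hasattr(frame, "__dict__")
    return True

assert leaf()
```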
<p>(In case you ever wondered why pytest uses a magic variable
<tt class="docutils literal">__tracebackhide__</tt> as its mechanism to <a class="reference external" href="https://docs.pytest.org/en/latest/example/simple.html#writing-well-integrated-assertion-helpers">mark functions that
shouldn't show up in tracebacks</a>,
this is why. This is also why tracebacks <a class="reference external" href="https://stackoverflow.com/questions/14817788/python-traceback-with-module-names">don't show class names on
methods</a>
– that information is stored in the method object's <tt class="docutils literal">__qualname__</tt>
attribute, but there's no reasonable way to get from a traceback back
to the method object.)</p>
<p>Anyway, since it's our only option, that's what trio does: we define a
special <a class="reference external" href="https://github.com/python-trio/trio/blob/2ef51af1efe442c277d0e2edcffdb5ab6003c0f2/trio/_core/_ki.py#L80">sentinel value</a>
as the "name" of our local variable. (It's not a string, to make sure
we don't accidentally clash with real user variables – it turns out
Python is fine with this, because the locals namespace is just a dict,
and like all dicts it accepts any random hashable object as a key.)
Whenever we start a user task, we <a class="reference external" href="https://github.com/python-trio/trio/blob/2ef51af1efe442c277d0e2edcffdb5ab6003c0f2/trio/_core/_run.py#L657-L658">stash</a>
a setting for this variable into the task's top stack frame. Then when
our signal handler runs, it can walk up the stack and when it sees the
magic variable, that tells it whether or not to raise
<tt class="docutils literal">KeyboardInterrupt</tt>. The details here aren't public APIs and are
subject to change, but that's how it works.</p>
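<p>Here's a trio-free sketch of that stack walk. Trio's real marker is a private non-string sentinel stashed into the frame's locals dict; to keep this sketch runnable it uses an ordinary variable name instead:</p>

```python
import sys

MARKER = "_ki_protection_enabled"  # stand-in for trio's private sentinel

def ki_protection_enabled(frame):
    # Walk outward from the interrupted frame; the innermost frame
    # carrying the marker wins.
    while frame is not None:
        if MARKER in frame.f_locals:
            return frame.f_locals[MARKER]
        frame = frame.f_back
    return True  # no marker found: stay protected (e.g. during startup)

def runtime_internals():
    _ki_protection_enabled = True   # this frame is protected...
    return user_task()

def user_task():
    _ki_protection_enabled = False  # ...but this one is interruptible
    return ki_protection_enabled(sys._getframe())

assert runtime_internals() is False
```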
<p>Then to handle cases like <tt class="docutils literal">trio.Lock.__enter__</tt> we also have <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-hazmat.html#safe-keyboardinterrupt-handling">a
decorator</a>
that can be used to mark a function as needing protection against
<tt class="docutils literal">KeyboardInterrupt</tt>. (And under the hood, of course, it also works
by setting up a magic local variable where our stack introspection
logic can find it.) It's not recommended for use in end-user code,
because if you care enough about control-C to take these kinds of
special measures, then you're almost certainly better off with a
generic solution (see below) than playing whack-a-mole with individual
functions. But internally trio uses this on all of its functions that
manipulate inter-task state to minimize the chances that an untimely
control-C will wedge the whole system, and it's a public API in case
you want to <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-hazmat.html#wait-queue-abstraction">implement your own synchronization primitives</a>.</p>
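<p>A simplified version of such a decorator (again using a plain variable name in place of trio's private sentinel) just plants the marker in a wrapper frame, where the signal handler's stack walk will find it before reaching any unmarked frames:</p>

```python
import sys
from functools import wraps

MARKER = "_ki_protection_enabled"  # stand-in for trio's private sentinel

def enable_ki_protection(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        _ki_protection_enabled = True  # read via f_locals, not directly
        return fn(*args, **kwargs)
    return wrapper

def ki_protection_enabled(frame):
    # The same innermost-marker-wins walk the signal handler performs:
    while frame is not None:
        if MARKER in frame.f_locals:
            return frame.f_locals[MARKER]
        frame = frame.f_back
    return True

@enable_ki_protection
def manipulate_shared_state():
    # A KeyboardInterrupt arriving here would be deferred, not raised.
    return ki_protection_enabled(sys._getframe())

assert manipulate_shared_state() is True
```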
</div>
<div class="section" id="how-do-we-deliver-a-keyboardinterrupt-if-we-can-t-raise-it">
<h3><a class="toc-backref" href="#id18">How do we deliver a KeyboardInterrupt if we can't raise it?</a></h3>
<p>So now we know how trio's signal handler decides whether it's OK to
throw a <tt class="docutils literal">KeyboardInterrupt</tt> directly into the code that's currently
running. But what if it decides that it's not safe? What do we do
then? Really the only thing we can do is to set some sort of flag and
arrange for it to be delivered later.</p>
<p>Fortunately, trio has a generic <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-timeouts">cancellation system</a>
that's designed to do things like raise an exception if some code
exceeds its timeout. So we've already solved the problem of finding a
good place to deliver the exception (we call these <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints">"checkpoints"</a>),
implemented a mechanism for waking up a sleeping task if necessary to
make the delivery, and provided <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-primitive-operations">well-defined semantics</a>
for these exceptions. All trio code already has to be prepared to
handle <tt class="docutils literal">Cancelled</tt> exceptions at checkpoints, and <tt class="docutils literal">Cancelled</tt> and
<tt class="docutils literal">KeyboardInterrupt</tt> are very similar. (They even both inherit from
<tt class="docutils literal">BaseException</tt> instead of <tt class="docutils literal">Exception</tt>, because in both cases the
only legal way to handle them is to clean up and let them propagate.)</p>
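<p>This shared <tt class="docutils literal">BaseException</tt> lineage is easy to check (shown here for <tt class="docutils literal">KeyboardInterrupt</tt>; <tt class="docutils literal">trio.Cancelled</tt> behaves the same way), and it's why a broad <tt class="docutils literal">except Exception:</tt> can't accidentally swallow either one:</p>

```python
assert issubclass(KeyboardInterrupt, BaseException)
assert not issubclass(KeyboardInterrupt, Exception)

# A broad `except Exception:` lets KeyboardInterrupt sail through,
# so cleanup-and-propagate is the only way to handle it:
escaped = False
try:
    try:
        raise KeyboardInterrupt
    except Exception:
        pass  # never runs
except KeyboardInterrupt:
    escaped = True

assert escaped
```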
<p>So if a <tt class="docutils literal">KeyboardInterrupt</tt> happens at a moment when we can't
deliver it directly, we instead hand it off to the cancellation system
to deliver for us:</p>
<pre class="literal-block">
C-level handler  ->  bytecode eval loop
  sets flag            checks flag
  & wakes loop         & runs Python-level handler
                                \
                         trio's Python-level handler
                           raises KeyboardInterrupt
                           -- or --
                           sets flag and wakes task ---> trio's cancellation machinery
                                                           raises KeyboardInterrupt
                                                           at next checkpoint
</pre>
<p>This picture is still somewhat simplified, and omits several of the
trickier variations. For example, during startup and shutdown there
are brief periods where trio can receive a control-C signal but there
aren't any tasks running to deliver it to. <a class="footnote-reference" href="#edge-vs-level-triggered" id="id6">[6]</a>
But that's all solved, and we have an <a class="reference external" href="https://github.com/python-trio/trio/blob/2ef51af1efe442c277d0e2edcffdb5ab6003c0f2/trio/_core/tests/test_ki.py#L222-L339">exhaustive set of tests</a>
to make sure that the handoff chain is never broken and that no
<tt class="docutils literal">KeyboardInterrupt</tt> is accidentally lost.</p>
</div>
<div class="section" id="what-if-you-want-a-manual-control-c-handler">
<h3><a class="toc-backref" href="#id19">What if you want a manual control-C handler?</a></h3>
<p>Let's pause a moment and recap. We started out discussing the two
basic strategies Python programs can use to handle control-C: the easy
and mostly effective default of getting a <tt class="docutils literal">KeyboardInterrupt</tt> raised
at some arbitrary location, and the more difficult and fragile but
also potentially safer option of installing a custom handler and then
implementing some sort of hand-off chain to make sure that it promptly
triggers some kind of clean shutdown logic. Now you've heard how
CPython implements these two strategies internally, and how trio
implements the first strategy. That's good enough for most trio users
– ignore the problem and everything will mostly work out :-). But what
if you're using trio, and you're paranoid enough that you want
the second strategy? How can trio help you?</p>
<p>It turns out that implementing this kind of safe control-C handling is
actually much easier for trio programs than for generic Python
programs, because you get a lot of the necessary infrastructure for
free. Trio's <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-io.html#signals">API for catching signals</a>
gives you a simple and reliable way to get signal notifications <em>in a
regular task context</em>, letting you skip all the tricky bits required
when writing your own custom signal handler. Then after you're
notified, you can respond however you want – but if what you want is
to just shut everything down in a clean fashion, then again, trio's
infrastructure can do most of the work for you. In a typical trio
program it might look like:</p>
<div class="highlight"><pre><span></span><span class="c1"># This works, but we'll discuss an even easier way below:</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">open_nursery</span><span class="p">()</span> <span class="k">as</span> <span class="n">nursery</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">threading</span><span class="o">.</span><span class="n">current_thread</span><span class="p">()</span> <span class="ow">is</span> <span class="n">threading</span><span class="o">.</span><span class="n">main_thread</span><span class="p">():</span>
            <span class="c1"># Spawn a child to watch for control-C</span>
            <span class="n">nursery</span><span class="o">.</span><span class="n">spawn</span><span class="p">(</span><span class="n">control_c_watcher</span><span class="p">)</span>
        <span class="c1"># ... spawn other children to do the real work ...</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">control_c_watcher</span><span class="p">():</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">trio</span><span class="o">.</span><span class="n">catch_signals</span><span class="p">({</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGINT</span><span class="p">})</span> <span class="k">as</span> <span class="n">sigset_aiter</span><span class="p">:</span>
        <span class="k">async</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">sigset_aiter</span><span class="p">:</span>
            <span class="c1"># the user hit control-C</span>
            <span class="k">raise</span> <span class="ne">KeyboardInterrupt</span>

<span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</pre></div>
<p>The idea here is that we spawn a child task that does nothing but sit
and wait for a control-C to happen, at which point it raises
<tt class="docutils literal">KeyboardInterrupt</tt>. So if this is our example HTTP server from
above, we'd end up with a task tree like:</p>
<pre class="literal-block">
task supervising the other tasks
│
├─ task waiting for control-C   # &lt;-- this is new
│
├─ task listening for new connections on port 80
│
├─ task talking to client 1
│
┊
</pre>
<p>And as we discussed, trio implements sensible default behavior for
exception propagation, so if the <tt class="docutils literal">control_c_watcher</tt> task raises
<tt class="docutils literal">KeyboardInterrupt</tt>, then the <tt class="docutils literal">main</tt> task supervisor will notice
this and cleanly cancel the other tasks. This is the same behavior
that makes the default <tt class="docutils literal">KeyboardInterrupt</tt> handling useful; the
difference here is that now the <em>only</em> place that
<tt class="docutils literal">KeyboardInterrupt</tt> can be raised is inside <tt class="docutils literal">control_c_watcher</tt>,
so we don't have to worry about it interrupting some delicate state
manipulation inside our real logic.</p>
<p>That said, this is still a bit of extra work to set up, and has some
potential pitfalls – for example, <tt class="docutils literal">catch_signals</tt> can only be used
if we're running inside the main Python thread (this is a general
restriction on Python's signal handling functions), so we had to
remember to check that before spawning the watcher task. This is
already so, so, so much easier than the equivalent in most other
frameworks, and it has the advantage that it allows total flexibility
in how you respond to the signal – for example, you could use pretty
much the same code to watch for SIGHUP and then <a class="reference external" href="https://stackoverflow.com/questions/19052354/sighup-for-reloading-configuration">reload the server's
configuration</a>,
instead of shutting down. But in the normal control-C case where we
want to raise <tt class="docutils literal">KeyboardInterrupt</tt> and shut everything down... can
trio help you <em>even more</em>?</p>
<p>Well, while I was writing this section I realized that yeah, actually,
it could :-). Remember how earlier we learned that when trio's custom
signal handler can't deliver a <tt class="docutils literal">KeyboardInterrupt</tt> directly, then as
a fallback it routes it through trio's cancellation system? That
system that's carefully designed to allow arbitrary code execution to
be cancelled in a safe and controlled way? What if we, just... always
did that?</p>
<p>Every trio program starts with a line like:</p>
<div class="highlight"><pre><span></span><span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">)</span>
</pre></div>
<p>Starting in the next release (v0.2.0), you can instead write:</p>
<div class="highlight"><pre><span></span><span class="n">trio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">,</span> <span class="n">restrict_keyboard_interrupt_to_checkpoints</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>and this toggles the behavior of trio's control-C handler so that it
<em>always</em> routes <tt class="docutils literal">KeyboardInterrupt</tt> through the cancellation
system. Basically this is just taking the protection that trio uses
for its own internals, and extending it over your whole program; the
implementation is <a class="reference external" href="https://github.com/python-trio/trio/commit/8fee2bc4f517ed95282e059a473881855e18bf95#diff-45dadd505953d0afb98daf0f76db9c57L170">one line long</a>.</p>
<p>The end result is that if you turn this on, then your program only
needs to handle <tt class="docutils literal">KeyboardInterrupt</tt> at certain well-defined places
called <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints">checkpoints</a>
– and these are exactly the same places where your program needs to be
prepared to receive a <tt class="docutils literal">Cancelled</tt> exception anyway, e.g. because a
timeout expired, so the extra work is essentially zero. It's still not
enabled by default, because if you turn it on then runaway loops like</p>
<div class="highlight"><pre><span></span><span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="k">pass</span>
</pre></div>
<p>can't be interrupted (there's no checkpoint inside the loop), and
because it's not what users expect when coming from Python. But if you
want the safety of a custom signal handler, this lets you have the
safety without the complexity. Pretty sweet.</p>
</div>
</div>
<div class="section" id="limitations-and-potential-improvements">
<h2><a class="toc-backref" href="#id20">Limitations and potential improvements</a></h2>
<p>Unfortunately, even trio's control-C handling is not (yet) perfect –
mostly due to bugs and limitations in the Python interpreter. Here are
some notes on the issues I've run into so far. For reference, here's
the handoff chain diagram again – I find it useful to look at while
thinking about these things, because the bugs here all involve
something going wrong along that path:</p>
<pre class="literal-block">
C-level handler  ->  bytecode eval loop
  sets flag            checks flag
  & wakes loop         & runs Python-level handler
                                \
                         trio's Python-level handler
                           raises KeyboardInterrupt
                           -- or --
                           sets flag and wakes task ---> trio's cancellation machinery
                                                           raises KeyboardInterrupt
                                                           at next checkpoint
</pre>
<div class="section" id="issues-with-handing-off-from-the-c-level-handler-to-the-python-level-handler">
<h3><a class="toc-backref" href="#id21">Issues with handing off from the C-level handler to the Python-level handler</a></h3>
<p><a class="reference external" href="https://bugs.python.org/issue30038">bpo-30038</a>: This is a bug in
the logic used to hand-off from the C-level signal handler to the
Python-level signal handler: it turns out that the C-level signal
handler pokes the main thread to wake it up <em>before</em> it sets the flag
to tell it that there's a signal pending. So on Windows where the
C-level handler runs in its own thread, then depending on how the
kernel schedules things, sometimes the main thread gets woken up,
checks for signals, sees that the flag is <em>not</em> set, goes back to
sleep... and then the flag gets set, but it's already too late. The
main effect is that on Windows you might sometimes have to hit
control-C twice before trio will notice. No workaround seems to be
possible inside trio; I've <a class="reference external" href="https://github.com/python/cpython/pull/1082">submitted a fix for CPython</a>.</p>
<p><a class="reference external" href="https://bugs.python.org/issue30050">bpo-30050</a>: It turns out that
Python's "wakeup fd" logic wasn't quite designed to be used in the way
that trio's using it; this bug reflects a bit of behavior that made
sense for the original use case, but is annoying for trio. Because of
this, trio currently only uses the wakeup fd on Windows, not on
Unix. This is mostly fine, because on Unix, we mostly don't need it –
signals usually interrupt syscalls and wake up the main thread all by
themselves. But there are some <a class="reference external" href="https://github.com/python-trio/trio/issues/109">rare cases</a> where it'd be
useful even on Unix. The impact here is pretty low, and there are
workarounds possible, though they have their own mildly annoying
side-effects. So it's not a huge deal, but shouldn't be hard to fix
either; hopefully this will get fixed for Python 3.7 so we won't have
to make these compromises.</p>
<p>As far as I can tell, fixing these two issues should make the hand-off
from the C-level handler to the Python-level handler rock-solid on all
platforms.</p>
</div>
<div class="section" id="issues-with-the-interaction-between-keyboardinterrupt-and-with-blocks">
<h3><a class="toc-backref" href="#id22">Issues with the interaction between KeyboardInterrupt and <tt class="docutils literal">with</tt> blocks</a></h3>
<p>It'd be nice if code like</p>
<div class="highlight"><pre><span></span><span class="n">lock</span> <span class="o">=</span> <span class="n">trio</span><span class="o">.</span><span class="n">Lock</span><span class="p">()</span>
<span class="k">async</span> <span class="k">with</span> <span class="n">lock</span><span class="p">:</span>
<span class="o">...</span>
</pre></div>
<p>could be guaranteed to release the lock even in the face of arbitrary
<tt class="docutils literal">KeyboardInterrupt</tt>s. Unfortunately, there are currently two issues
preventing this.</p>
<p>The first is <a class="reference external" href="https://bugs.python.org/issue29988">bpo-29988</a>:
remember how in our examination of CPython's bytecode, we discovered
that it's possible for a <tt class="docutils literal">KeyboardInterrupt</tt> at the wrong moment to
cause a <tt class="docutils literal">with</tt> block to be exited without running <tt class="docutils literal">__exit__</tt>? I
think this is a pretty surprising violation of <tt class="docutils literal">with</tt>'s semantics –
and it turns out that for <tt class="docutils literal">async with</tt> the race condition is
actually a little worse, because its bytecode has more unprotected
machinery at entry and exit to handle <tt class="docutils literal">await</tt>ing the <tt class="docutils literal">__aenter__</tt>
and <tt class="docutils literal">__aexit__</tt> methods. This is something that can only be fixed
inside the interpreter, and this is the bug to track that.</p>
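<p>You can see the vulnerable window for yourself by disassembling a <tt class="docutils literal">with</tt> block – the instructions before and after the context-manager setup opcode (named <tt class="docutils literal">SETUP_WITH</tt> or <tt class="docutils literal">BEFORE_WITH</tt>, depending on the CPython version) run without any special protection:</p>

```python
import dis

def f():
    with open("some-file") as fh:
        pass

# The function is only disassembled, never called:
opnames = [ins.opname for ins in dis.get_instructions(f)]

# Every CPython version spells the setup/cleanup opcodes differently,
# but they all contain "WITH":
assert any("WITH" in name for name in opnames)
```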
<p>The second problem doesn't have a bug number yet, because the solution
isn't as obvious. Here's the problem: remember how pleased I was to
realize that by using a magic local variable to mark which stack
frames are "user code" versus "internal code", we could make it so
that switching stack frames also atomically toggles control-C
protection on and off? That's fine for when we want to toggle
protection upon switching to an <em>existing</em> stack frame, but has a
problem when creating a new stack frame, like <tt class="docutils literal">__exit__</tt> methods
do. <tt class="docutils literal">trio.Lock.__aexit__</tt> effectively looks like:
<div class="highlight"><pre><span></span><span class="k">async</span> <span class="k">def</span> <span class="fm">__aexit__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="c1"># &lt;-- danger zone!</span>
    <span class="nb">locals</span><span class="p">()[</span><span class="n">protection_enabled_marker_object</span><span class="p">]</span> <span class="o">=</span> <span class="kc">True</span>
    <span class="c1"># .. actual logic here ...</span>
</pre></div>
<p>This is enough to make sure that the lock never gets left in an
<em>inconsistent</em> state where it's only "half-locked" – either
<tt class="docutils literal">__aexit__</tt> runs or it doesn't, and it can't get a
<tt class="docutils literal">KeyboardInterrupt</tt> in the middle. But if a control-C arrives at the
point marked "danger zone!" then our unlock might get cancelled before
it starts. The problem is that Python doesn't really provide any
decent way to attach a special value to a stack frame <em>at the moment
it's created</em>. Potential workarounds would be to have the signal
handler introspect the current bytecode instruction pointer and treat
the stack frame as protected if it looks like it's about to execute
the protection code, or to have our <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-hazmat.html#trio.hazmat.disable_ki_protection">magic decorators</a>
rewrite code objects to set the magic local as a default kwarg, since
argument processing does seem to be atomic with respect to frame
setup. So far I haven't attempted this because both options are pretty
awkward, and at the moment it hardly seems worth the effort given that
<tt class="docutils literal">with</tt> and <tt class="docutils literal">async with</tt> blocks always have interpreter-level race
conditions.</p>
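<p>The default-kwarg idea rests on a real property of CPython: default arguments are bound as part of frame setup, so a marker smuggled in that way is visible in <tt class="docutils literal">f_locals</tt> from the very first bytecode. A minimal demonstration (the parameter name here is made up for illustration):</p>

```python
import sys

def probe(_ki_protection_enabled=True):
    # No assignment statement has run yet when this frame starts, but
    # the default is already in place -- there's no "danger zone"
    # between frame creation and the marker becoming visible:
    return sys._getframe().f_locals["_ki_protection_enabled"]

assert probe() is True
```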
<p>What I'd really like to see would be for frame objects to retain a
pointer to the function object that was called to create them (if
any). That would:</p>
<ul>
<li><p class="first">Fix the signal atomicity problem.</p>
</li>
<li><p class="first">Let me throw away the <a class="reference external" href="https://github.com/python-trio/trio/blob/64119b12309ffeaf3a35622ef08d3b03e438006e/trio/_core/_ki.py#L108-L150">awful awful code</a>
currently required to implement the <tt class="docutils literal">KeyboardInterrupt</tt> protection
decorators <a class="footnote-reference" href="#throw-is-broken" id="id7">[7]</a> and replace it with something like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">enable_ki_protection</span><span class="p">(</span><span class="n">fn</span><span class="p">):</span>
    <span class="n">fn</span><span class="o">.</span><span class="n">_ki_protection_enabled</span> <span class="o">=</span> <span class="kc">True</span>
    <span class="k">return</span> <span class="n">fn</span>
</pre></div>
</li>
<li><p class="first">Bonus: pytest could potentially do something similar, instead of the
odd <tt class="docutils literal">__tracebackhide__</tt> thing they do now.</p>
</li>
<li><p class="first">Bonus: tracebacks could start including class names, so instead of:</p>
<pre class="literal-block">
File "/home/njs/trio/trio/_sync.py", line 374, in acquire_nowait
return self._lock.acquire_nowait()
File "/home/njs/trio/trio/_sync.py", line 277, in acquire_nowait
raise WouldBlock
</pre>
<p>we could have (notice the method names on the right):</p>
<pre class="literal-block">
File "/home/njs/trio/trio/_sync.py", line 374, in Condition.acquire_nowait
return self._lock.acquire_nowait()
File "/home/njs/trio/trio/_sync.py", line 277, in Lock.acquire_nowait
raise WouldBlock
</pre>
</li>
</ul>
<p>Or maybe there's a better option, I dunno – it's just an idea. But
something like this sure would be nice.</p>
<p>Anyway. If these two issues were fixed, then we could guarantee that
<tt class="docutils literal">async with</tt> was signal-safe for trio objects (and also built-in
objects like <tt class="docutils literal">threading.Lock</tt>, for that matter!).</p>
</div>
<div class="section" id="yield-from-and-await-aren-t-signal-safe">
<h3><a class="toc-backref" href="#id23">yield from and await aren't signal-safe</a></h3>
<p><a class="reference external" href="https://bugs.python.org/issue30039">bpo-30039</a>: Remember up above
how I said that our local variable trick works for switching
stack frames, because that's an atomic operation? Actually I
lied... currently in CPython, resuming a coroutine stack is <em>not</em>
atomic.</p>
<p>If we have coroutines calling each other, A → await B → await C, then
when we do <tt class="docutils literal"><span class="pre">A.send(...)</span></tt>, that resumes A's frame, and then A does
<tt class="docutils literal"><span class="pre">B.send(...)</span></tt>, which resumes B's frame, and then B does
<tt class="docutils literal"><span class="pre">C.send(...)</span></tt>, which resumes C's frame, and then C continues
executing.</p>
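<p>That resumption chain is easy to reproduce with plain coroutines and a minimal awaitable:</p>

```python
class Suspend:
    # A minimal awaitable that suspends exactly once.
    def __await__(self):
        yield

async def c():
    await Suspend()
    return "done"

async def b():
    return await c()

async def a():
    return await b()

coro = a()
coro.send(None)       # first resume: runs a -> b -> c until c suspends
try:
    # Second resume: re-enters a's frame, then b's, then c's.
    coro.send(None)
except StopIteration as exc:
    result = exc.value

assert result == "done"
```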
<p>The problem is that the interpreter <em>checks for signals in between
each of these steps</em>, so if the user hits control-C in the middle of
that sequence, then Python will raise <tt class="docutils literal">KeyboardInterrupt</tt> inside A
or B, and just completely forget about the rest of the call stack that
it's supposed to be executing. And this affects every use of <tt class="docutils literal">await</tt>
or <tt class="docutils literal">yield from</tt>, not just trio.</p>
<p>But the good news is that this is easy to fix. Remember up above how
we found that CPython has a special hack where it doesn't run signal
handlers if it's about to execute a SETUP_FINALLY instruction? For
SETUP_FINALLY we concluded that this mostly doesn't accomplish
anything, but it turns out that this is <em>exactly</em> what we need here:
if we extend that check so it also skips running signal handlers
before a YIELD_FROM instruction, then it fixes this bug. I've
submitted this fix as a <a class="reference external" href="https://github.com/python/cpython/pull/1081">pull request</a>.</p>
</div>
<div class="section" id="what-about-pypy">
<h3><a class="toc-backref" href="#id24">What about PyPy?</a></h3>
<p>We spent an awful lot of time above grovelling around in internal
implementation details of CPython. Trio also works on PyPy: what
happens there?</p>
<p>Answer: ¯\_(ツ)_/¯ ...PyPy's, like, really complicated, ok?</p>
<p>Or in a little more detail: the trio testsuite passes on PyPy, and
overall I've run into fewer bugs than I have on CPython (e.g. PyPy
writes to the wakeup fd at the proper time, and their <tt class="docutils literal">throw</tt> seems
to work properly). But when it comes to the fiddly details about when
exactly they check for signals, and how that's affected by JIT
inlining and other transformations they apply, I currently have no
idea. Maybe they'll read this blog post and help me out.</p>
</div>
</div>
<div class="section" id="conclusion">
<h2><a class="toc-backref" href="#id25">Conclusion</a></h2>
<p>Now you know basically everything there is to know about signal
handling in Python and Trio! You don't actually need to know any of
this to use either of them, but I think it's pretty neat.</p>
<p>And the end result is my absolute favorite kind of feature, because
it's totally invisible: it takes thousands of words to explain, but
most users don't need to know about it <em>at all</em>. Trio's goal is to
make it easy to get things right, and this is the ultimate example of
that philosophy: do nothing, and it Just Works.</p>
<p>Interested in trying Trio? You can start with <a class="reference external" href="https://github.com/python-trio/trio/blob/master/README.rst">the README</a>, or
jump straight to <a class="reference external" href="https://trio.readthedocs.io/en/latest/tutorial.html">the tutorial</a>. Have fun!</p>
</div>
<div class="section" id="comments">
<h2><a class="toc-backref" href="#id26">Comments</a></h2>
<p>You can <a class="reference external" href="https://trio.discourse.group/t/discussion-control-c-handling-in-python-and-trio/30">discuss this post on the Trio forum</a>.</p>
<div class="line-block">
<div class="line"><br /></div>
<div class="line"><br /></div>
<div class="line"><br /></div>
<div class="line"><br /></div>
<div class="line"><br /></div>
</div>
<hr class="docutils" />
<div class="line-block">
<div class="line"><br /></div>
</div>
<table class="docutils footnote" frame="void" id="async-exc-literature" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td><tt class="docutils literal">KeyboardInterrupt</tt> is an example of the
general category of <a class="reference external" href="https://en.wikipedia.org/wiki/Exception_handling#Exception_synchronicity">"asynchronous exception"</a>. (This
is a totally different use of "asynchronous" than the one in
"async/await".) If you want to read more about the problems
asynchronous exceptions cause, Java made the mistake of including
these as a feature in an early release and got stuck with them
before realizing how impossible they are to use safely, so they
have lots of documentation about <a class="reference external" href="https://docs.oracle.com/javase/8/docs/technotes/guides/concurrency/threadPrimitiveDeprecation.html">their challenges</a>
and why they <a class="reference external" href="https://www.securecoding.cert.org/confluence/display/java/THI05-J.+Do+not+use+Thread.stop%28%29+to+terminate+threads">should be avoided</a>.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="pc-losering" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id2">[2]</a></td><td><p class="first">In fact, this aspect of Unix design is so
controversial that it served as the <a class="reference external" href="https://www.dreamsongs.com/RiseOfWorseIsBetter.html">central example</a> of a
<a class="reference external" href="https://www.dreamsongs.com/WorseIsBetter.html">rather famous essay</a>.</p>
<p>One might also wonder how the kernel actually goes about cancelling
a system call. The <a class="reference external" href="http://www.makelinux.net/ldd3/chp-6-sect-2">full answer is a bit complicated</a>, but basically what
it comes down to is that when a signal arrives it sets an internal
flag, and then when you're implementing a system call inside the
kernel you have to remember to <a class="reference external" href="https://stackoverflow.com/questions/9576604/what-does-erestartsys-used-while-writing-linux-driver">check for that flag at appropriate
places...</a>
Conceptually it's extremely similar to what we end up doing to
deliver <tt class="docutils literal">KeyboardInterrupt</tt> via trio's cancellation system!</p>
<p class="last">Basically there are only a few different concepts here, and they
just get remixed over and over at different parts of the stack
:-). If you squint there's even a lot of commonality between trio's
extremely-high-level manipulation of Python's stack data to
enable/disable <tt class="docutils literal">KeyboardInterrupt</tt> for particular stretches of
code, and the extremely-low-level concept of <a class="reference external" href="http://www.makelinux.net/books/lkd2/ch06lev1sec7">enabling or disabling
interrupts</a> on
a CPU or <a class="reference external" href="https://www.arduino.cc/en/Reference/NoInterrupts">microcontroller</a>. Ancient x86
CPUs even <a class="reference external" href="https://pdos.csail.mit.edu/6.828/2011/readings/i386/s09_02.htm">added hacks to skip processing interrupts during certain
instruction sequences used for stack switching</a>,
which is strikingly similar to the way we'll see we need to
modify CPython's bytecode loop to skip processing signals during the
opcodes used when switching between coroutine stacks.</p>
</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="windows" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id3">[3]</a></td><td>For those keeping score: <tt class="docutils literal">WSASend</tt> wakes up <tt class="docutils literal">select</tt>
and its variants, <tt class="docutils literal">PostQueuedCompletionStatus</tt> wakes up
<tt class="docutils literal">GetQueuedCompletionStatus</tt> and its variants, <tt class="docutils literal">SetEvent</tt> wakes
up <tt class="docutils literal">WaitForSingleObject</tt> and its variants, and there are a few
other calls that can only be interrupted using <tt class="docutils literal">QueueUserAPC</tt>.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="acquire-blocks" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id4">[4]</a></td><td>There is one exception (no pun intended): if
<tt class="docutils literal">acquire</tt> blocks and then gets interrupted by a signal, then it
has some code to <a class="reference external" href="https://github.com/python/cpython/blob/fd0cd07a5a3c964c084f4efc5bbcb89dd2193ee6/Modules/_threadmodule.c#L73-L78">explicitly run signal handlers</a>. This
is still OK though, because it only does this <em>before</em> the lock is
acquired. And of course it only works on Unix...</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="fast-dispatch" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id5">[5]</a></td><td>It uses FAST_DISPATCH instead of <a class="reference external" href="https://github.com/python/cpython/blob/master/Python/ceval.c#L763-L769">DISPATCH</a>.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="edge-vs-level-triggered" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id6">[6]</a></td><td><p class="first">Another interesting design challenge is
that <tt class="docutils literal">KeyboardInterrupt</tt> is edge-triggered – we deliver it once
and then we're done until the user hits control-C again – while
cancellation in trio is normally level-triggered – once we start
delivering <tt class="docutils literal">Cancelled</tt> exceptions, we keep going until the
offending code exits the cancelled region. And in trio, it's
possible that we attempt to cancel an operation, then we find out
later that our cancellation failed, i.e., the operation succeeded
anyway. (This is how Windows' IOCP cancellation semantics work, so
we're kinda stuck with it.) Together these two things make life a
bit difficult, because we need to keep track of whether
<tt class="docutils literal">KeyboardInterrupt</tt> was delivered.</p>
<p>One part of this was tweaking the internal cancellation APIs a bit
so that they could keep track of whether an exception had actually
been delivered or not – previously that wasn't needed. This is why
in trio's <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-hazmat.html#trio.hazmat.yield_indefinitely">lowest level API for sleeping</a>
the callback used to cancel a sleep gets passed a function that
raises an exception, instead of an exception to raise directly – it
gives us a hook to <a class="reference external" href="https://github.com/python-trio/trio/blob/64119b12309ffeaf3a35622ef08d3b03e438006e/trio/_core/_run.py#L498-L500">record whether the exception was actually
delivered</a>.</p>
<p>The other tricky thing is – suppose we pick a task to receive the
<tt class="docutils literal">KeyboardInterrupt</tt> cancellation, and don't find out immediately
whether the delivery was successful. This leaves us in a delicate
state; basically it's Schrödinger's interrupt. We can't deliver it
to anyone else while the first attempt is pending, because if the
first attempt then succeeds we'll have delivered the same interrupt
twice. But we can't forget about it either, because the attempt
might fail. It might even happen that it fails and then the task we
picked exits without passing through another cancellation point,
and then we might need to pick another task to deliver it to.</p>
<p class="last">We solve this through a sneaky hack: we always pick the "main" task
to receive the <tt class="docutils literal">KeyboardInterrupt</tt>. (The main task is the first
task started, which is the ultimate ancestor of all other user
tasks.) This means we don't have to keep track of delivery failures
explicitly, because if the main task hits a second checkpoint
without the first delivery having succeeded, then it must have
failed. And we don't have to worry about switching to a different
victim task, because the main task is always the last user task to
exit, so if it exits then we can fall back on our logic for a
control-C that arrives during shutdown. So this simplification
actually solves a rather difficult problem!</p>
</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="throw-is-broken" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id7">[7]</a></td><td><p class="first">A particularly fun issue we have to work around
in the <tt class="docutils literal">KeyboardInterrupt</tt> protection decorators is <a class="reference external" href="https://bugs.python.org/issue29590">bpo-29590</a>: <tt class="docutils literal">throw</tt>ing into a
generator or coroutine breaks stack introspection. Obviously this
is a problem when the whole idea is for the signal handler to
introspect the stack. Most of the time trio works around this
by... never ever using the <tt class="docutils literal">throw</tt> method. (This is also
necessary to avoid hitting <a class="reference external" href="https://bugs.python.org/issue29587">bpo-29587</a>. <tt class="docutils literal">throw</tt> is really
buggy.) But a major use case for <tt class="docutils literal">enable_ki_protection</tt> is on
context managers, and <tt class="docutils literal">contextlib.contextmanager</tt> uses <tt class="docutils literal">throw</tt>,
so...</p>
<p class="last">Perhaps you can imagine how much fun I had debugging this the first
time I ran into it.</p>
</td></tr>
</tbody>
</table>
</div>
Announcing Trio2017-03-10T00:00:00-08:002017-03-10T00:00:00-08:00Nathaniel J. Smithtag:vorpus.org,2017-03-10:/blog/announcing-trio/<p>As <a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/">you may recall</a>,
I have strong feelings about the design of usable APIs for
asynchronous libraries.</p>
<p>So, I wrote my own.</p>
<p>I'm happy to announce the first release of <strong>Trio</strong>, a new
permissively-licensed async library for Python:</p>
<ul class="simple">
<li>Repository: <a class="reference external" href="https://github.com/python-trio/trio">https://github.com/python-trio/trio</a></li>
<li>Tutorial + manual: <a class="reference external" href="https://trio.readthedocs.io">https://trio.readthedocs.io</a></li>
<li>Download: <a class="reference external" href="https://pypi.python.org/pypi/trio">https://pypi.python.org/pypi/trio</a> (or just <tt class="docutils literal">pip
install <span class="pre">-U</span> trio</tt>)</li>
</ul>
<p>Trio is very much inspired by my work with and on <a class="reference external" href="http://curio.readthedocs.io">Curio</a>, so much credit to Dave Beazley. They
don't share any actual code, and at this point there are many small
and large divergences all over the stack, but if you're curious the
tipping point where I decided I wanted to explore an incompatible
approach <a class="reference external" href="https://trio.readthedocs.io/en/latest/design.html#cancel-points-and-schedule-points">was here</a>.</p>
<p>Some noteworthy features:</p>
<ul class="simple">
<li>Aspires to become production-quality</li>
<li>Full support for Windows, Linux, MacOS</li>
<li>Full support for both CPython 3.5+ and PyPy 3.5 pre-releases</li>
<li>Flow control is fully <a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/#review-and-summing-up-what-is-async-await-native-anyway">async/await-native</a>
and easy to reason about: no callbacks, no futures, no implicit
concurrency</li>
<li>Powerful and composable framework for handling <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-timeouts">cancellation and
timeouts</a></li>
<li><a class="reference external" href="https://trio.readthedocs.io/en/latest/design.html#cancel-points-and-schedule-points">Strong user-centered guarantees</a>
around cancel and schedule points make it easier to manage and
reason about cooperative concurrency</li>
<li><a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#tasks-let-you-do-multiple-things-at-once">Erlang-influenced interface for task spawning</a>
provides a structured system for managing child tasks. If user code
raises an exception then it's always propagated until handled, never
logged-and-discarded.</li>
<li>First-class support for <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-core.html#debugging-and-instrumentation">introspection and debugging</a>
(<a class="reference external" href="https://trio.readthedocs.io/en/latest/tutorial.html#task-switching-illustrated">example</a>)</li>
<li>Powerful built-in <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-testing.html">testing helpers</a>. For
example, you can speed up tests that involve timeouts by using a
clock that <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-testing.html#time-and-timeouts">automatically skips over boring periods</a><ul>
<li>As a demonstration of the power of good testing tools, trio's own
test suite achieves <a class="reference external" href="https://codecov.io/gh/python-trio/trio">>98% coverage</a> and runs in ~5 seconds in
"slow" mode (~2 seconds in default mode).</li>
</ul>
</li>
<li>Interrupting programs with control-C magically just works.</li>
<li><a class="reference external" href="https://trio.readthedocs.io/en/latest/tutorial.html">A mostly-written tutorial</a> that
doesn't assume any familiarity with async/await.</li>
<li>A low-level <a class="reference external" href="https://trio.readthedocs.io/en/latest/reference-hazmat.html">"hazmat" API</a> for
when you need to go under the hood. To make sure it's powerful
enough, Trio's main synchronization, networking, and threading APIs
are implemented using only public interfaces.</li>
<li>Exposes a whole <a class="reference external" href="https://github.com/python-trio/trio/issues/79">laundry list</a> of Python
limitations.</li>
<li>Lots of missing pieces left for you to help fill in! :-)</li>
</ul>
<p>I hope you'll check it out!</p>
Why does calloc exist?2016-12-05T00:00:00-08:002016-12-07T01:45:00-08:00Nathaniel J. Smithtag:vorpus.org,2016-12-05:/blog/why-does-calloc-exist/<p>[Edit: Welcome Hacker News readers! Before we dive into the neat
memory management esoterica, I want to briefly note that as engineers
we have an <a class="reference external" href="https://en.wikipedia.org/wiki/Engineering_ethics#Obligation_to_society">ethical obligation</a>
in our work to consider the <a class="reference external" href="http://www.ieee.org/about/corporate/governance/p7-8.html">"safety, health, and welfare of the
public"</a>,
because if we don't, <a class="reference external" href="https://en.wikipedia.org/wiki/Engineering_ethics#Case_studies_and_key_individuals">terrible things</a>
<a class="reference external" href="http://www.jewishvirtuallibrary.org/jsource/Holocaust/IBM.html">happen</a>. This
is a challenging responsibility that requires we all stay thoughtful
and informed – but that's difficult if popular technical news
aggregators choose to <a class="reference external" href="https://news.ycombinator.com/item?id=13108404">censor links and discussions about the societal
implications of technology</a>. I sympathize with
their moderation challenges, but this idea of creating a politics-free
safe space is the cowards' way out, quite literally choosing the
<a class="reference external" href="https://www.africa.upenn.edu/Articles_Gen/Letter_Birmingham.html">"absence of tension" over "the presence of justice"</a>. I
hope the HN moderators find a way to step up to the responsibility
their position entails; in the meantime, you might consider also
subscribing to <a class="reference external" href="https://recompilermag.com/">The Recompiler</a> and
<a class="reference external" href="https://modelviewculture.com/">Model View Culture</a>, and checking
out <a class="reference external" href="https://www.safetypinbox.com/">Safety Pin Box</a>,
<a class="reference external" href="https://sfbay.techsolidarity.org/">techsolidarity.org</a>, or <a class="reference external" href="http://joinfundclub.com/">Fund
Club</a>. Anyway, thanks for listening! We
now return to our regularly scheduled <tt class="docutils literal">calloc</tt>-related programming,
and I hope you enjoy my essay. And if you like this, you might also
enjoy <a class="reference external" href="https://lukasa.co.uk/2016/12/Debugging_Your_Operating_System/">Cory Benfield's related post</a>.]</p>
<hr class="docutils" />
<p>When programming in C, there are two standard ways to allocate some
new memory on the heap:</p>
<div class="highlight"><pre><span></span><span class="kt">void</span><span class="o">*</span> <span class="n">buffer1</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">size</span><span class="p">);</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">buffer2</span> <span class="o">=</span> <span class="n">calloc</span><span class="p">(</span><span class="n">count</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span>
</pre></div>
<p><tt class="docutils literal">malloc</tt> allocates an <em>uninitialized</em> array with the given number of
bytes, i.e., <tt class="docutils literal">buffer1</tt> could contain anything. In terms of its
public API, <tt class="docutils literal">calloc</tt> is different in two ways: first, it takes two
arguments instead of one, and second, it returns memory that is
pre-initialized to be all-zeros. So there are lots of books and
webpages out there that will claim that the <tt class="docutils literal">calloc</tt> call above is
equivalent to calling <tt class="docutils literal">malloc</tt> and then calling <tt class="docutils literal">memset</tt> to fill
the memory with zeros:</p>
<div class="highlight"><pre><span></span><span class="cm">/* Equivalent to the calloc() call above -- OR IS IT?? */</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">buffer3</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">count</span> <span class="o">*</span> <span class="n">size</span><span class="p">);</span>
<span class="n">memset</span><span class="p">(</span><span class="n">buffer3</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">count</span> <span class="o">*</span> <span class="n">size</span><span class="p">);</span>
</pre></div>
<p>So... why does <tt class="docutils literal">calloc</tt> exist, if it's equivalent to these 2 lines?
The C library is not known for its excessive focus on providing
convenient shorthands!</p>
<p>It turns out the answer is less widely known than I had realized! If I
were <a class="reference external" href="https://drawings.jvns.ca/">Julia Evans</a> at this point I'd
make a neat little comic 😊. But I'm not, so... here's a wall of text.</p>
<p>It turns out there are actually two differences between calling
<tt class="docutils literal">calloc</tt>, versus calling <tt class="docutils literal">malloc</tt> + <tt class="docutils literal">memset</tt>.</p>
<div class="section" id="difference-1-computers-are-bad-at-arithmetic">
<h2>Difference #1: computers are bad at arithmetic</h2>
<p>When <tt class="docutils literal">calloc</tt> multiplies <tt class="docutils literal">count * size</tt>, it checks for overflow,
and errors out if the multiplication returns a value that can't fit
into a 32- or 64-bit integer (whichever one is relevant for your
platform). This is good. If you do the multiplication the naive way I
did it above by just writing <tt class="docutils literal">count * size</tt>, then if the values are
too large then the multiplication will silently wrap around, and
<tt class="docutils literal">malloc</tt> will happily allocate a smaller buffer than we
expected. That's bad. "<em>This</em> part of the code thought the buffer was
<em>this</em> long but <em>that</em> part of the code thought it was <em>that</em> long" is
the beginning of, like, eleventy-billion security advisories every
year. (<a class="reference external" href="http://undeadly.org/cgi?action=article&sid=20060330071917">Example</a>)</p>
<p>I wrote a little program to demonstrate. It tries to allocate a
buffer containing <span class="formula">2<sup>63</sup> × 2<sup>63</sup> = 2<sup>126</sup></span> bytes, first using
<tt class="docutils literal">malloc</tt> and then using <tt class="docutils literal">calloc</tt>:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/why-does-calloc-exist/calloc-overflow-demo.c" class="reference external">calloc-overflow-demo.c</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos"> 6</span><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="linenos"> 7</span><span class="p">{</span>
<span class="linenos"> 8</span> <span class="kt">size_t</span> <span class="n">huge</span> <span class="o">=</span> <span class="n">INTPTR_MAX</span><span class="p">;</span>
<span class="linenos"> 9</span>
<span class="linenos">10</span> <span class="kt">void</span><span class="o">*</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">huge</span> <span class="o">*</span> <span class="n">huge</span><span class="p">);</span>
<span class="linenos">11</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">buf</span><span class="p">)</span> <span class="n">perror</span><span class="p">(</span><span class="s">"malloc failed"</span><span class="p">);</span>
<span class="linenos">12</span> <span class="n">printf</span><span class="p">(</span><span class="s">"malloc(huge * huge) returned: %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">buf</span><span class="p">);</span>
<span class="linenos">13</span> <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
<span class="linenos">14</span>
<span class="linenos">15</span> <span class="n">buf</span> <span class="o">=</span> <span class="n">calloc</span><span class="p">(</span><span class="n">huge</span><span class="p">,</span> <span class="n">huge</span><span class="p">);</span>
<span class="linenos">16</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">buf</span><span class="p">)</span> <span class="n">perror</span><span class="p">(</span><span class="s">"calloc failed"</span><span class="p">);</span>
<span class="linenos">17</span> <span class="n">printf</span><span class="p">(</span><span class="s">"calloc(huge, huge) returned: %p</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">buf</span><span class="p">);</span>
<span class="linenos">18</span> <span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
<span class="linenos">19</span><span class="p">}</span>
</pre></div>
<p>On my computer, I get:</p>
<div class="highlight"><pre><span></span><span class="go">~$ gcc calloc-overflow-demo.c -o calloc-overflow-demo</span>
<span class="go">~$ ./calloc-overflow-demo</span>
<span class="go">malloc(huge * huge) returned: 0x55c389d94010</span>
<span class="go">calloc failed: Cannot allocate memory</span>
<span class="go">calloc(huge, huge) returned: (nil)</span>
</pre></div>
<p>So yeah, apparently <tt class="docutils literal">malloc</tt> successfully allocated a
73786976294838206464 exbibyte array? I'm sure that will work out
well. This is a nice thing about <tt class="docutils literal">calloc</tt>: it helps avoid terrible
security flaws.</p>
<p>But, it's not <em>that</em> exciting. (I mean, let's be honest: if we really
cared about security we wouldn't be writing in C.) It only helps in
the particular case where you're deciding how much memory to allocate
by multiplying two numbers together. This happens, it's an important
case, but there are lots of other cases where we either aren't doing
any arithmetic at all, or where we're doing some more complex
arithmetic and need a more general solution. Plus, if we wanted to, we
could certainly write our own wrapper for <tt class="docutils literal">malloc</tt> that took two
arguments and multiplied them together with overflow checking. And in
fact if we want an overflow-safe version of <tt class="docutils literal">realloc</tt>, or if we
don't want the memory to be zero-initialized, then... we still have to
do that. So, it's... nice? But it doesn't really justify
<tt class="docutils literal">calloc</tt>'s existence.</p>
<p>The other difference, though? Is super, super important.</p>
</div>
<div class="section" id="difference-2-lies-damned-lies-and-virtual-memory">
<h2>Difference #2: lies, damned lies, and virtual memory</h2>
<p>Here's <a class="reference external" href="https://vorpus.org/blog/why-does-calloc-exist/calloc-1GiB-demo.c">a little benchmark program</a>
that measures how long it takes to <tt class="docutils literal">calloc</tt> a 1 gibibyte buffer
versus <tt class="docutils literal">malloc+memset</tt> a 1 gibibyte buffer. (Make sure you compile
without optimization, because modern compilers are clever enough to
know that <tt class="docutils literal"><span class="pre">free(calloc(...))</span></tt> is a no-op and optimize it out!) On my
laptop I get:</p>
<div class="highlight"><pre><span></span><span class="go">~$ gcc calloc-1GiB-demo.c -o calloc-1GiB-demo</span>
<span class="go">~$ ./calloc-1GiB-demo</span>
<span class="go">calloc+free 1 GiB: 3.44 ms</span>
<span class="go">malloc+memset+free 1 GiB: 365.00 ms</span>
</pre></div>
<p>i.e., <tt class="docutils literal">calloc</tt> is more than 100x faster. Our textbooks and manual
pages say they're equivalent. What the heck is going on?</p>
<p>The answer, of course, is that <tt class="docutils literal">calloc</tt> is cheating.</p>
<p>For small allocations, <tt class="docutils literal">calloc</tt> literally will just call
<tt class="docutils literal">malloc+memset</tt>, so it'll be the same speed. But for larger
allocations, most memory allocators will for various reasons make a
special request to the operating system to fetch more memory just for
this allocation. ("Small" and "large" here are determined by some
heuristics inside your memory allocator; for glibc "large" is anything
>128 KiB, <a class="reference external" href="https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html">at least in its default configuration</a>).</p>
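<p>On glibc you can even move this threshold yourself, using the (glibc-specific) <tt class="docutils literal">mallopt</tt> interface – a quick sketch, assuming a glibc system:</p>

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_MMAP_THRESHOLD */

/* Move glibc's "large allocation" cutoff up to 1 MiB. Allocations at or
 * above the threshold are serviced by a fresh mmap from the kernel (and
 * therefore arrive pre-zeroed -- the fact calloc exploits), while smaller
 * ones are carved out of the recycled heap. Returns 1 on success. */
int raise_mmap_threshold(void)
{
    return mallopt(M_MMAP_THRESHOLD, 1024 * 1024);
}
```
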
<p>When the operating system hands out memory to a process, it always
zeros it out first, because otherwise our process would be able to
peek at whatever detritus was left in that memory by the last process
to use it, which might include, like, crypto keys, or embarrassing
fanfiction. So that's the first way that <tt class="docutils literal">calloc</tt> cheats: when you
call <tt class="docutils literal">malloc</tt> to allocate a large buffer, then <em>probably</em> the memory
will come from the operating system and already be zeroed, so there's
no need to call <tt class="docutils literal">memset</tt>. But you don't know that for sure! Memory
allocators are pretty inscrutable. So <em>you</em> have to call <tt class="docutils literal">memset</tt>
every time just in case. But <tt class="docutils literal">calloc</tt> lives inside the memory
allocator, so <em>it</em> knows whether the memory it's returning is fresh
from the operating system, and if it is then it skips calling
<tt class="docutils literal">memset</tt>. And this is why <tt class="docutils literal">calloc</tt> has to be built into the
standard library, and you can't efficiently fake it yourself as a
layer on top of <tt class="docutils literal">malloc</tt>.</p>
<p>But this only explains part of the speedup: <tt class="docutils literal">malloc+memset</tt> is
actually clearing the memory twice, and <tt class="docutils literal">calloc</tt> is clearing it
once, so we might expect <tt class="docutils literal">calloc</tt> to be 2x faster at
best. Instead... it's 100x faster. What the heck?</p>
<p>It turns out that the kernel is also cheating! When we ask it for 1
GiB of memory, it doesn't actually go out and find that much RAM and
write zeros to it and then hand it to our process. Instead, it fakes
it, using virtual memory: it takes a single 4 KiB <a class="reference external" href="https://drawings.jvns.ca/pagetable/">page</a> of memory that is already
full of zeros (which it keeps around for just this purpose), and maps
1 GiB / 4 KiB = 262144 <a class="reference external" href="https://drawings.jvns.ca/copyonwrite/">copy-on-write</a> copies of it into our
process's address space. So the first time we actually <em>write</em> to each
of those 262144 pages, then at that point the kernel has to go and
find a real page of RAM, write zeros to it, and then quickly swap it
in place of the "virtual" page that was there before. But this happens
lazily, on a page-by-page basis.</p>
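<p>You can actually watch this laziness from userspace. Here's a rough sketch (Linux assumed, where <tt class="docutils literal">ru_maxrss</tt> is reported in KiB): the process's resident set barely grows when we <tt class="docutils literal">calloc</tt> a big buffer, and only balloons once we start writing to the pages:</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Peak resident set size so far, in KiB (Linux semantics of ru_maxrss). */
static long peak_rss_kib(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss;
}

/* Measure the growth in resident memory (KiB) caused by calloc'ing
 * `len` bytes, versus actually writing one byte per 4 KiB page of it. */
void measure(size_t len, long* growth_alloc, long* growth_touch)
{
    long before = peak_rss_kib();

    char* buf = calloc(len, 1);
    if (!buf) { perror("calloc"); exit(1); }
    long after_alloc = peak_rss_kib();

    for (size_t i = 0; i < len; i += 4096)
        buf[i] = 1;  /* first write: kernel must now supply a real page */
    long after_touch = peak_rss_kib();

    *growth_alloc = after_alloc - before;       /* tiny: pages still COW */
    *growth_touch = after_touch - after_alloc;  /* ~len of real RAM */
    free(buf);
}
```

<p>On my understanding of Linux's accounting, the first number stays near zero while the second is roughly the full buffer size – the zeroing really is deferred until first touch.</p>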
<p>So in real life, the difference won't be as stark as it looks in our
benchmark up above – part of the trick is that <tt class="docutils literal">calloc</tt> is shifting
some of the cost of zero'ing out pages until later, while
<tt class="docutils literal">malloc+memset</tt> is paying the full price up front. BUT, at least we
aren't zero'ing them out twice. And at least we aren't trashing the
cache hierarchy up front – if we delay the zero'ing until we're going
to write to the pages anyway, then that means both writes happen
at the same time, so we only have to pay one set of TLB / L2 cache /
etc. misses. And, most importantly, it's possible we might never get
around to writing to all of those pages at all, in which case
<tt class="docutils literal">calloc</tt> + the kernel's sneaky trickery is a huge win!</p>
<p>Of course, the exact set of optimizations <tt class="docutils literal">calloc</tt> uses will vary
depending on your environment. A neat trick that used to be popular
was that the kernel would go around and speculatively zero out pages
when the system was idle, so that they'd be fresh and ready when
needed – but this is <a class="reference external" href="http://lists.dragonflybsd.org/pipermail/commits/2016-August/624202.html">out of fashion</a>
on current systems. Tiny embedded systems without virtual memory
obviously won't use virtual memory trickery. But in general,
<tt class="docutils literal">calloc</tt> is never worse than <tt class="docutils literal">malloc+memset</tt>, and on mainstream
systems it can do much better.</p>
<p>One real life example is a recent <a class="reference external" href="https://github.com/kennethreitz/requests/issues/3729">bug in requests</a>, where doing
streaming downloads over HTTPS with a large receive block size was
chewing up 100% CPU. It turns out that the problem was that when the
user said they were willing to handle up to 100 MiB chunks at a time,
then requests passed that on <a class="reference external" href="https://github.com/pyca/pyopenssl/issues/577">to pyopenssl</a>, and then pyopenssl
used <tt class="docutils literal">cffi.new</tt> to allocate a 100 MiB buffer to hold the incoming
data. But most of the time, there wasn't actually 100 MiB ready to
read on the connection; so pyopenssl would allocate this large buffer,
but then would only use a small part of it. Except... it turns out
that <tt class="docutils literal">cffi.new</tt> <a class="reference external" href="https://bitbucket.org/cffi/cffi/issues/295/cffinew-is-way-slower-than-it-should-be-it">emulates calloc by doing malloc+memset</a>,
so they were paying to allocate and zero the whole buffer anyway. If
<tt class="docutils literal">cffi.new</tt> had used <tt class="docutils literal">calloc</tt> instead, then the bug never would
have happened! Hopefully they'll fix that soon.</p>
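<p>To get a rough feel for how much this costs, here's a timing sketch that uses <tt class="docutils literal">ctypes</tt> to call the C library's allocator directly. It assumes a Linux system with glibc, and the exact numbers will vary with your environment:</p>

```python
import ctypes
import ctypes.util
import time

# Load the C library (assumes Linux; on glibc find_library("c") works,
# and CDLL(None) falls back to the symbols already in our process)
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

N = 100 * 1024 * 1024  # 100 MiB, like the pyopenssl receive buffer

start = time.perf_counter()
p = libc.malloc(N)
ctypes.memset(p, 0, N)   # pays to touch and zero every page up front
libc.free(p)
t_malloc_memset = time.perf_counter() - start

start = time.perf_counter()
q = libc.calloc(N, 1)    # kernel can hand back lazily-zeroed CoW pages
libc.free(q)
t_calloc = time.perf_counter() - start

print(f"malloc+memset: {t_malloc_memset * 1000:.1f} ms")
print(f"calloc:        {t_calloc * 1000:.1f} ms")
```

On a typical Linux box the <tt class="docutils literal">calloc</tt> version is dramatically faster, because the buffer is never actually touched before being freed – exactly the situation pyopenssl was in on most reads.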
<p>Or here's another example that comes up in <a class="reference external" href="http://www.numpy.org/">numpy</a>: suppose you want to make a big <a class="reference external" href="https://en.wikipedia.org/wiki/Identity_matrix">identity
matrix</a>, one with
16384 rows and 16384 columns. That requires allocating a buffer to
hold 16384 * 16384 floating point numbers, and each float is 8 bytes,
so that comes to 2 GiB of memory total.</p>
<p>Before we create the matrix, our process is using 24 MiB of memory:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">resource</span>
<span class="gp">>>> </span><span class="c1"># this way of fetching memory usage probably only works right on Linux:</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">mebibytes_used</span><span class="p">():</span>
<span class="gp">... </span> <span class="k">return</span> <span class="n">resource</span><span class="o">.</span><span class="n">getrusage</span><span class="p">(</span><span class="n">resource</span><span class="o">.</span><span class="n">RUSAGE_SELF</span><span class="p">)</span><span class="o">.</span><span class="n">ru_maxrss</span> <span class="o">/</span> <span class="mi">1024</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">mebibytes_used</span><span class="p">()</span>
<span class="go">24.35546875</span>
</pre></div>
<p>Then we allocate a 2 GiB dense matrix:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">big_identity_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">16384</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">big_identity_matrix</span>
<span class="go">array([[ 1., 0., 0., ..., 0., 0., 0.],</span>
<span class="go"> [ 0., 1., 0., ..., 0., 0., 0.],</span>
<span class="go"> [ 0., 0., 1., ..., 0., 0., 0.],</span>
<span class="go"> ...,</span>
<span class="go"> [ 0., 0., 0., ..., 1., 0., 0.],</span>
<span class="go"> [ 0., 0., 0., ..., 0., 1., 0.],</span>
<span class="go"> [ 0., 0., 0., ..., 0., 0., 1.]])</span>
<span class="gp">>>> </span><span class="n">big_identity_matrix</span><span class="o">.</span><span class="n">shape</span>
<span class="go">(16384, 16384)</span>
</pre></div>
<p>How much memory is our process using now? The Answer May Surprise You
(Learn This One Weird Trick To Slim Down Your Processes Now):</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">mebibytes_used</span><span class="p">()</span>
<span class="go">88.3515625</span>
</pre></div>
<p>Numpy allocated the array using <tt class="docutils literal">calloc</tt>, and then it wrote 1s in
the diagonal... but most of the array is still zeros, so it isn't
actually taking up any memory, and our 2 GiB matrix fits into ~60 MiB
of actual RAM. Of course there are other ways to accomplish the same
thing, like using a real <a class="reference external" href="https://en.wikipedia.org/wiki/Sparse_matrix">sparse matrix library</a>, but that's not the
point. The point is that <em>if</em> you do something like this, <tt class="docutils literal">calloc</tt>
will magically make everything more efficient – and it's always at
least as fast as the alternative.</p>
<p>So basically, <tt class="docutils literal">calloc</tt> exists because it lets the memory allocator
and kernel engage in a sneaky conspiracy to make your code faster and
use less memory. You should let it! Don't use <tt class="docutils literal">malloc+memset</tt>!</p>
<hr class="docutils" />
<p><em>Changes history:</em></p>
<ul class="simple">
<li>2016-12-05 14:00 PST: Fix typo: HTTP where I meant HTTPS.</li>
<li>2016-12-05 17:00 PST: Add the HN note.</li>
<li>2016-12-07 01:45 PST: Several changes:<ul>
<li><a class="reference external" href="https://www.reddit.com/r/C_Programming/comments/5grwep/why_does_calloc_exist/daur0r9/">Use better error checking style</a>
in <tt class="docutils literal"><span class="pre">calloc-overflow-demo.c</span></tt></li>
<li>Clarify that I'm not saying you <em>can't</em> reimplement <tt class="docutils literal">calloc</tt>
yourself and get good performance, just that if you want to do
that you have to reimplement <tt class="docutils literal">malloc</tt> too – the point is that
<tt class="docutils literal">calloc</tt> might look like it can be implemented in terms of
<tt class="docutils literal">malloc</tt>, but this is misleading.</li>
<li>Add a paragraph noting more explicitly that <tt class="docutils literal">calloc</tt>
optimizations vary across systems.</li>
</ul>
</li>
</ul>
</div>
Some thoughts on asynchronous API design in a post-async/await world2016-11-05T00:00:00-07:002016-11-05T00:00:00-07:00Nathaniel J. Smithtag:vorpus.org,2016-11-05:/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/<p class="first last">This is a very long essay in which I try to think through
the implications of Python 3.5's new async/await support for
how we write asynchronous code. (Spoiler: I think curio has
a lot to teach us.)</p>
<p>I've recently been exploring the exciting new world of asynchronous
I/O libraries in Python 3 – specifically <a class="reference external" href="https://docs.python.org/3/library/asyncio.html">asyncio</a> and <a class="reference external" href="http://curio.readthedocs.org/">curio</a>. These two libraries make some
different design choices. This is an essay that I wrote to try to
explain to myself what those differences are and why I think they
matter, and distill some principles for designing event loop APIs and
asynchronous libraries in Python. This is a quickly changing area and
the ideas here are very much still under development, so this text
probably assumes all kinds of background knowledge and possibly that
you live inside my head – but maybe you'll find it interesting
anyway. I'd love to <a class="reference external" href="mailto:njs@pobox.com">hear what you think</a> or
<a class="reference external" href="https://mail.python.org/mailman/listinfo/async-sig">discuss further</a>.</p>
<p>[<strong>Update, 2017-03-10:</strong> While the text below focuses on Curio, most
of the commentary also applies to <a class="reference external" href="https://github.com/python-trio/trio">Trio</a>, which is my new Curio-like
library that came out of this blog post.]</p>
<div class="contents topic" id="contents">
<p class="topic-title"><strong>Contents:</strong></p>
<ul class="simple">
<li><a class="reference internal" href="#the-curious-effectiveness-of-curio" id="id2">The curious effectiveness of curio</a></li>
<li><a class="reference internal" href="#callback-soup-considered-harmful" id="id3">Callback soup considered harmful</a></li>
<li><a class="reference internal" href="#example-a-simple-proxy-server" id="id4">Example: a simple proxy server</a><ul>
<li><a class="reference internal" href="#three-examples" id="id5">Three examples</a><ul>
<li><a class="reference internal" href="#example-1-asyncio-with-callbacks" id="id6">Example #1: asyncio, with callbacks</a></li>
<li><a class="reference internal" href="#example-2-curio-with-async-await" id="id7">Example #2: curio, with async/await</a></li>
<li><a class="reference internal" href="#example-3-asyncio-with-async-await" id="id8">Example #3: asyncio, with async/await</a></li>
</ul>
</li>
<li><a class="reference internal" href="#three-bugs" id="id9">Three bugs</a><ul>
<li><a class="reference internal" href="#bug-1-backpressure" id="id10">Bug #1: backpressure</a></li>
<li><a class="reference internal" href="#bug-2-read-side-buffering" id="id11">Bug #2: read-side buffering</a></li>
<li><a class="reference internal" href="#bug-3-closing-time" id="id12">Bug #3: closing time</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#c-c-c-c-causality-breaker" id="id13">C-c-c-c-causality breaker</a><ul>
<li><a class="reference internal" href="#who-needs-causality-really" id="id14">Who needs causality, really?</a><ul>
<li><a class="reference internal" href="#http-servers" id="id15">HTTP servers</a></li>
<li><a class="reference internal" href="#websocket-servers" id="id16">Websocket servers</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#other-challenges-for-hybrid-apis" id="id17">Other challenges for hybrid APIs</a><ul>
<li><a class="reference internal" href="#timeouts-and-cancellation" id="id18">Timeouts and cancellation</a></li>
<li><a class="reference internal" href="#event-loop-lifecycle-management" id="id19">Event loop lifecycle management</a></li>
<li><a class="reference internal" href="#getting-in-touch-with-your-event-loop" id="id20">Getting in touch with your event loop</a></li>
<li><a class="reference internal" href="#context-passing-task-local-storage" id="id21">Context passing / task-local storage</a></li>
<li><a class="reference internal" href="#implementation-complexity" id="id22">Implementation complexity</a></li>
</ul>
</li>
<li><a class="reference internal" href="#review-and-summing-up-what-is-async-await-native-anyway" id="id23">Review and summing up: what is "async/await-native" anyway?</a></li>
<li><a class="reference internal" href="#open-questions" id="id24">Open questions</a><ul>
<li><a class="reference internal" href="#for-async-await-native-apis" id="id25">...for async/await-native APIs</a><ul>
<li><a class="reference internal" href="#orphan-tasks" id="id26">Orphan tasks</a></li>
<li><a class="reference internal" href="#cleanup-in-generators-and-async-generators" id="id27">Cleanup in generators and async generators</a></li>
</ul>
</li>
<li><a class="reference internal" href="#for-the-python-asynchronous-i-o-ecosystem" id="id28">...for the Python asynchronous I/O ecosystem</a><ul>
<li><a class="reference internal" href="#do-you-really-think-everyone-s-going-to-abandon-callbacks" id="id29">Do you really think everyone's going to abandon callbacks?</a></li>
<li><a class="reference internal" href="#so-should-i-drop-asyncio-twisted-etc-and-rewrite-everything-using-curio-tomorrow" id="id30">So should I drop asyncio/twisted/etc. and rewrite everything using curio tomorrow?</a></li>
<li><a class="reference internal" href="#should-asyncio-be-fixed-to-have-a-curio-style-async-await-native-api" id="id31">Should asyncio be "fixed" to have a curio-style async/await-native API?</a></li>
<li><a class="reference internal" href="#okay-then-should-curio-switch-to-using-asyncio-as-a-backend-or-what-will-the-story-be-on-cross-event-loop-compatibility-i-thought-asyncio-was-supposed-to-be-the-event-loop-to-end-all-event-loops" id="id32">Okay, then should curio switch to using asyncio as a backend? Or what will the story be on cross-event-loop compatibility? I thought asyncio was supposed to be the event loop to end all event loops!</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#where-next" id="id33">Where next?</a></li>
<li><a class="reference internal" href="#acknowledgements" id="id34">Acknowledgements</a></li>
</ul>
</div>
<div class="section" id="the-curious-effectiveness-of-curio">
<h2><a class="toc-backref" href="#id2">The curious effectiveness of curio</a></h2>
<p>So here's the tentative conclusion that spurred this essay, and which
surprised the heck out of me: the more I work with curio, the more
plausible it seems that in a few years, asyncio might find itself
relegated to becoming one of those stdlib libraries that savvy
developers avoid, like urllib2.</p>
<p>I'm not saying that the library we'll all be using instead will
necessarily <em>be</em> curio, or that asyncio can't possibly find some way
to adapt and avoid this fate, or that you should go switch to curio
right now – the practicalities of choosing a library are
complicated. Let me put it in bold: <strong>This is not an essay about curio
versus asyncio and which one is "the best".</strong> I'll talk a lot about
those two libraries, but for present purposes I'm profoundly
uninterested in things like which one wins at such-and-such
microbenchmark as of whichever latest release, and I don't have any
personal investment in either. The reason I talk about them is because
they make good <em>illustrative examples</em> of two very different design
strategies.</p>
<p>The goal of this essay is to understand the trade-offs between the
"curio-style" design strategy versus the "asyncio-style" design
strategy. So first, I'll try to articulate a conceptual framework for
understanding what these two strategies actually are, and how they
differ – this is something I haven't seen discussed elsewhere. Then to
make that more concrete, I'll walk through some concrete examples
using the two libraries, and see how these underlying design decisions
play out in specific real world use cases. It turns out that in these
examples, the "curio-style" produces better results; I'll try to pull
out the general principles that explain why this happens, and that
might give us hints on how to design or improve new APIs for both
event loops and for the libraries that use them. Unfortunately, one of
the conclusions I come to is that it's hard to see how these
advantages could be "retrofitted" to asyncio – but I could be wrong,
and at least once we understand them we can have a conversation about
how to make Python's async I/O ecosystem as awesome as possible,
whatever that ends up looking like; I'll conclude by sketching out
some possible directions this could go.</p>
</div>
<div class="section" id="callback-soup-considered-harmful">
<h2><a class="toc-backref" href="#id3">Callback soup considered harmful</a></h2>
<p>The basic difference between asyncio and curio comes down to their
attitude towards Python 3.5's new async/await syntax. But before we
talk about the best way to use async/await, let's digress to talk about
why async/await even matters. ...Actually I'm going to digress even
more than that. Let's start by talking about what programming
languages are for.</p>
<p>It's easy to forget sometimes just how much work a modern language
like Python does to guide and constrain how you put together a
program. Like, just for the most basic example, consider how simply
juxtaposing two statements <tt class="docutils literal"><span class="pre">f();</span> g()</tt> expresses ordering: you know
that <tt class="docutils literal">g</tt> won't start executing until <tt class="docutils literal">f</tt> has finished. Another
example – the call stack tracks relationships between callers and
callees, allowing us to decompose our program into loosely-coupled
subroutines: a function doesn't need to keep track of who called it,
it can just say <tt class="docutils literal">return</tt> to fire a value into the void, and the
language runtime makes some magic happen so the value and the control
flow are delivered simultaneously to the right place. Exception
handling defines a systematic, structured way to unwind these
decoupled subroutines when something goes wrong; if you think about it,
this trick of taking the call stack and <em>reusing</em> it as the unwind
stack is really quite clever. <tt class="docutils literal">with</tt> blocks build on that by giving
us an ergonomic way to pin the lifetime of resources – file handles
and so forth – to the dynamic lifetime of these call
stacks. Iterators track the state needed to bind control flow to the
shape of data.</p>
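<p>Just to make that interplay concrete, here's a tiny toy example of how two of those tools – <tt class="docutils literal">with</tt> blocks and exception handling – compose: the resource is released while the exception is unwinding, before the caller's <tt class="docutils literal">except</tt> clause ever runs. (The class and names here are made up for illustration.)</p>

```python
# A toy context manager that records its lifecycle, so we can watch
# how 'with' and exception unwinding interleave.
events = []

class Resource:
    def __enter__(self):
        events.append("acquired")
        return self

    def __exit__(self, exc_type, exc_value, tb):
        events.append("released")
        return False  # don't swallow the exception; let it keep unwinding

def f():
    with Resource():
        raise RuntimeError("boom")

try:
    f()
except RuntimeError:
    events.append("caught by caller")

print(events)  # → ['acquired', 'released', 'caught by caller']
```

The language runtime guarantees this ordering for us; in callback-land, as we'll see, we'd have to arrange for it by hand.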
<p>These tools are so fundamental that they disappear into the background
– when we're typing out code we don't usually stop to think about all
the wacky ways that things could be done differently. Yet these all
had to be invented. In functional languages, <tt class="docutils literal"><span class="pre">f();</span> g()</tt> doesn't
express ordering. Back in the 1970s, the idea of <em>limiting</em> yourself
to using just function calls, loops, and if statements – <tt class="docutils literal">goto</tt>'s
hamstrung siblings – was <a class="reference external" href="https://en.wikipedia.org/wiki/Structured_programming#History"><em>incredibly controversial</em></a>. There
are great living languages that disagree with Python about lots of the
points above. But Python's particular toolkit has been refined over
decades, and fits together to provide a powerful scaffolding for
structuring our code.</p>
<p>...until you want to do some asynchronous I/O. The traditional way to
do this is to have an <em>event loop</em> and use <em>callback-based
programming</em>: the event loop keeps a big table of future events that
you want to respond to when they happen (e.g., "this socket became
readable", "that timer expired"), and for each event you have a
corresponding callback. The event loop takes care of checking for
events and invoking callbacks, but if you want structure beyond that –
like the kind of things we just discussed above: causal sequencing,
delegation to and return from subroutines, error unwinding, resource
cleanup, iteration – then you get to build that yourself. You can do
it, just like you can use <tt class="docutils literal">goto</tt> to build loops and function
calls. Frameworks like Twisted and its descendants have invented all
kinds of useful strategies for keeping these callbacks organized, like
<a class="reference external" href="https://twistedmatrix.com/documents/current/core/howto/servers.html#protocols">protocols</a>
and <a class="reference external" href="https://twistedmatrix.com/documents/current/core/howto/defer.html">deferreds</a>
/ <a class="reference external" href="https://docs.python.org/3/library/asyncio-task.html#asyncio.Future">futures</a>
and even some kind of <a class="reference external" href="http://www.tornadoweb.org/en/stable/stack_context.html">exception handling</a> – but these
are still a pretty thin layer of structure on top of the underlying
unstructured callback soup, and from the perspective of regular Python
they're like some other mutant alternative programming language.</p>
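<p>Here's a minimal sketch of what that structure-by-hand looks like – a toy event loop, not any real framework's API. Even expressing something as basic as "read, then write" means manually wiring up one callback from inside another:</p>

```python
# A toy event loop: a table of (event -> callbacks), dispatched when fired.
# All names here are hypothetical, for illustration only.
import collections

class ToyLoop:
    def __init__(self):
        self.callbacks = collections.defaultdict(list)

    def on(self, event, callback):
        self.callbacks[event].append(callback)

    def fire(self, event, payload=None):
        for cb in self.callbacks[event]:
            cb(payload)

loop = ToyLoop()
log = []

# To express "handle the read, THEN handle the write", we can't just put
# two statements in a row -- we have to register the second callback from
# inside the first one. Sequencing becomes manual bookkeeping.
def on_readable(data):
    log.append(f"read {data!r}")
    loop.on("writable", on_writable)

def on_writable(_):
    log.append("wrote response")

loop.on("readable", on_readable)
loop.fire("readable", b"hi")
loop.fire("writable")
print(log)  # → ["read b'hi'", 'wrote response']
```

And this is the easy part – error propagation, cancellation, and resource cleanup all need similar hand-rolled plumbing, which is exactly what protocols and deferreds exist to paper over.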
<p>That's why <a class="reference external" href="https://www.python.org/dev/peps/pep-0492/">PEP 492</a>
and async/await are so exciting: they let us take Python's regular
toolkit for solving these problems, and start using it in asynchronous
code. Which is awesome, because frankly, Twisted, I love you and
deferreds are pretty cool, but as abstract languages for describing
computation go, real-actual-Python is wayyy better.</p>
<p>And with that background, then, I think I can articulate the key
difference between asyncio-style event-loop APIs and curio-style
event-loop APIs:</p>
<p>Asyncio is first and foremost a traditional callback-based API, with
async/await layered on top as a supplementary tool. And if you're
starting from a callback-oriented base, then this is a great addition:
async/await provide a major boost in usability without disrupting the
basic framework. Asyncio is what we might call a "hybrid" system:
callbacks <em>plus</em> async/await.</p>
<p>Curio takes this a step further, and throws out the callback API
altogether; it's async/await all the way down. Specifically, it still
has an event loop, but instead of managing arbitrary callbacks, it
manages async functions; there's exactly one way it can respond when
an event fires, and that's by resuming an async call-stack. I'll call
this the "async/await-native" approach.</p>
<p>The main point I want to argue in this essay – the point of all the
examples below – is that if you're using a hybrid API like asyncio,
then you <em>can</em> ignore the callback API and write structured
async/await code. But, even if you stick to async/await everywhere,
the underlying abstractions are leaky, so you don't get the full
advantages. Your async/await functions are dumplings of local
structure floating on top of callback soup, and this has far-reaching
implications for the simplicity and correctness of your code. Python's
structuring tools were designed to fit together as a system – e.g.,
exception handling relies on the call stack, and <tt class="docutils literal">with</tt> blocks rely
on exception handling – and if you have a mix of structured and
unstructured parts, then this creates lots of unnecessary problems,
<em>even if</em> you stick to the structured async/await layer of the
library. In a curio-style async/await-native API, by contrast, your
whole program uses this one consistent set of structuring principles,
and this consistency – it turns out – has pervasive benefits.</p>
<p>What I'm arguing, in effect, is that asyncio is a victim of its own
success: when it was designed, it used the best approach possible; but
since then, work inspired by asyncio – like the addition of
async/await – has shifted the landscape so that we can do even better,
and now asyncio is hamstrung by its earlier commitments.</p>
<p>To make that more specific, let's look at some concrete examples.</p>
</div>
<div class="section" id="example-a-simple-proxy-server">
<h2><a class="toc-backref" href="#id4">Example: a simple proxy server</a></h2>
<p>For our main example, I'll take a simple proxy server, equivalent to
<tt class="docutils literal">socat <span class="pre">-u</span> <span class="pre">TCP-LISTEN:$LOCAL_PORT</span>
<span class="pre">TCP:$REMOTE_HOST:$REMOTE_PORT</span></tt>. Specifically, given a local port, a
remote host, and a remote port, we want to:</p>
<ol class="arabic simple">
<li>Listen for connections on the local port.</li>
<li>Accept a single connection.</li>
<li>Make a connection to the remote host + port.</li>
<li>Copy data from the local port to the remote port. (One way only, to
keep things simple.)</li>
<li>Exit after all the data has been copied.</li>
</ol>
<p>In addition, I'll follow these rules, to the best of my ability:</p>
<ul class="simple">
<li>Readability counts: I'll write each version in as elegant a manner
as I can.</li>
<li>No cheating: Since this is a toy one-off program, there are things
we could get away with that wouldn't fly if this were real, reusable
library code – like using global variables, or leaving open sockets
dangling to be cleaned up by the kernel when our program exits. To
make this more representative of real code, I'll hold myself to
those higher standards.</li>
</ul>
<div class="section" id="three-examples">
<h3><a class="toc-backref" href="#id5">Three examples</a></h3>
<div class="section" id="example-1-asyncio-with-callbacks">
<h4><a class="toc-backref" href="#id6">Example #1: asyncio, with callbacks</a></h4>
<p>Let's start by showing what this looks like using a traditional
callback approach. (The two examples after this will demonstrate curio
and asyncio's version of async/await-based APIs, which is what most
people will want to use – this first example is to provide context for
those.) I'll demonstrate with <a class="reference external" href="https://docs.python.org/3/library/asyncio-protocol.html">asyncio's "protocol" API</a>, though
the basic design here is inherited almost directly from the immensely
influential Twisted (a <a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/twisted-proxy.py">Twisted version</a>
is also available for the curious). Here's the complete code, and then
I'll give some commentary on how it works:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/asyncio-proxy-protocols.py" class="reference external">asyncio-proxy-protocols.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos"> 1</span><span class="kn">import</span> <span class="nn">sys</span>
<span class="linenos"> 2</span><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>
<span class="linenos"> 3</span><span class="kn">import</span> <span class="nn">traceback</span>
<span class="linenos"> 4</span><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="linenos"> 5</span>
<span class="linenos"> 6</span><span class="k">class</span> <span class="nc">OneWayProxySource</span><span class="p">(</span><span class="n">asyncio</span><span class="o">.</span><span class="n">Protocol</span><span class="p">):</span>
<span class="linenos"> 7</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">loop</span><span class="p">,</span> <span class="n">server_task_container</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">):</span>
<span class="linenos"> 8</span> <span class="bp">self</span><span class="o">.</span><span class="n">loop</span> <span class="o">=</span> <span class="n">loop</span>
<span class="linenos"> 9</span> <span class="bp">self</span><span class="o">.</span><span class="n">server_task</span> <span class="o">=</span> <span class="n">server_task_container</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="linenos">10</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_host</span> <span class="o">=</span> <span class="n">dest_host</span>
<span class="linenos">11</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_port</span> <span class="o">=</span> <span class="n">dest_port</span>
<span class="linenos">12</span>
<span class="linenos">13</span> <span class="k">def</span> <span class="nf">connection_made</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">transport</span><span class="p">):</span>
<span class="linenos">14</span> <span class="c1"># Stop listening for new connections</span>
<span class="linenos">15</span> <span class="bp">self</span><span class="o">.</span><span class="n">server_task</span><span class="o">.</span><span class="n">cancel</span><span class="p">()</span>
<span class="linenos">16</span>
<span class="linenos">17</span> <span class="c1"># Save our transport</span>
<span class="linenos">18</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span> <span class="o">=</span> <span class="n">transport</span>
<span class="linenos">19</span>
<span class="linenos">20</span> <span class="c1"># Disable reading until the destination is ready.</span>
<span class="linenos">21</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">pause_reading</span><span class="p">()</span>
<span class="linenos">22</span>
<span class="linenos">23</span> <span class="c1"># Connect to the destination</span>
<span class="linenos">24</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_protocol</span> <span class="o">=</span> <span class="n">OneWayProxyDest</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">loop</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="p">)</span>
<span class="linenos">25</span> <span class="n">coro</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">loop</span><span class="o">.</span><span class="n">create_connection</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_protocol</span><span class="p">,</span>
<span class="linenos">26</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_host</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_port</span><span class="p">)</span>
<span class="linenos">27</span> <span class="n">task</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">loop</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">coro</span><span class="p">)</span>
<span class="linenos">28</span> <span class="k">def</span> <span class="nf">connection_check_for_failure</span><span class="p">(</span><span class="n">fut</span><span class="p">):</span>
<span class="linenos">29</span> <span class="n">exc</span> <span class="o">=</span> <span class="n">fut</span><span class="o">.</span><span class="n">exception</span><span class="p">()</span>
<span class="linenos">30</span> <span class="k">if</span> <span class="n">exc</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="linenos">31</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Failed to connect:"</span><span class="p">)</span>
<span class="linenos">32</span> <span class="c1"># This isn't really right -- it doesn't handle exception</span>
<span class="linenos">33</span> <span class="c1"># chaining etc. I lack the will to worry about it.</span>
<span class="linenos">34</span> <span class="n">traceback</span><span class="o">.</span><span class="n">print_tb</span><span class="p">(</span><span class="n">exc</span><span class="o">.</span><span class="n">__traceback__</span><span class="p">)</span>
<span class="linenos">35</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">abort</span><span class="p">()</span>
<span class="linenos">36</span> <span class="bp">self</span><span class="o">.</span><span class="n">loop</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
<span class="linenos">37</span> <span class="n">task</span><span class="o">.</span><span class="n">add_done_callback</span><span class="p">(</span><span class="n">connection_check_for_failure</span><span class="p">)</span>
<span class="linenos">38</span>
<span class="linenos">39</span> <span class="k">def</span> <span class="nf">data_received</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="linenos">40</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_protocol</span><span class="o">.</span><span class="n">send_data</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="linenos">41</span>
<span class="linenos">42</span> <span class="k">def</span> <span class="nf">connection_lost</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exc</span><span class="p">):</span>
<span class="linenos">43</span> <span class="bp">self</span><span class="o">.</span><span class="n">dest_protocol</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="linenos">44</span>
<span class="linenos">45</span><span class="k">class</span> <span class="nc">OneWayProxyDest</span><span class="p">(</span><span class="n">asyncio</span><span class="o">.</span><span class="n">Protocol</span><span class="p">):</span>
<span class="linenos">46</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">loop</span><span class="p">,</span> <span class="n">source_transport</span><span class="p">):</span>
<span class="linenos">47</span> <span class="bp">self</span><span class="o">.</span><span class="n">loop</span> <span class="o">=</span> <span class="n">loop</span>
<span class="linenos">48</span> <span class="bp">self</span><span class="o">.</span><span class="n">source_transport</span> <span class="o">=</span> <span class="n">source_transport</span>
<span class="linenos">49</span>
<span class="linenos">50</span> <span class="k">def</span> <span class="nf">connection_made</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">transport</span><span class="p">):</span>
<span class="linenos">51</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span> <span class="o">=</span> <span class="n">transport</span>
<span class="linenos">52</span> <span class="c1"># Okay, now we're ready for data to start flowing</span>
<span class="linenos">53</span> <span class="bp">self</span><span class="o">.</span><span class="n">source_transport</span><span class="o">.</span><span class="n">resume_reading</span><span class="p">()</span>
<span class="linenos">54</span>
<span class="linenos">55</span> <span class="k">def</span> <span class="nf">send_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="linenos">56</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="linenos">57</span>
<span class="linenos">58</span> <span class="k">def</span> <span class="nf">close</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="linenos">59</span> <span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">write_eof</span><span class="p">()</span>
<span class="linenos">60</span>
<span class="linenos">61</span> <span class="k">def</span> <span class="nf">connection_lost</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exc</span><span class="p">):</span>
<span class="linenos">62</span> <span class="bp">self</span><span class="o">.</span><span class="n">source_transport</span><span class="o">.</span><span class="n">abort</span><span class="p">()</span>
<span class="linenos">63</span> <span class="bp">self</span><span class="o">.</span><span class="n">loop</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
<span class="linenos">64</span>
<span class="linenos">65</span><span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">source_port</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">):</span>
<span class="linenos">66</span> <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
<span class="linenos">67</span> <span class="n">server_task_container</span> <span class="o">=</span> <span class="p">[]</span>
<span class="linenos">68</span> <span class="n">coro</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_server</span><span class="p">(</span>
<span class="linenos">69</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">OneWayProxySource</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">server_task_container</span><span class="p">,</span>
<span class="linenos">70</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">),</span>
<span class="linenos">71</span> <span class="s2">"localhost"</span><span class="p">,</span> <span class="n">source_port</span><span class="p">)</span>
<span class="linenos">72</span> <span class="n">server_task</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">coro</span><span class="p">)</span>
<span class="linenos">73</span> <span class="n">server_task_container</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">server_task</span><span class="p">)</span>
<span class="linenos">74</span> <span class="n">loop</span><span class="o">.</span><span class="n">run_forever</span><span class="p">()</span>
<span class="linenos">75</span> <span class="n">loop</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="linenos">76</span>
<span class="linenos">77</span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="linenos">78</span> <span class="k">try</span><span class="p">:</span>
<span class="linenos">79</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">3</span><span class="p">])]</span>
<span class="linenos">80</span> <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="linenos">81</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Usage: </span><span class="si">{}</span><span class="s2"> SOURCE_PORT DEST_HOST DEST_PORT"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="vm">__file__</span><span class="p">))</span>
<span class="linenos">82</span> <span class="k">else</span><span class="p">:</span>
<span class="linenos">83</span> <span class="n">main</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
</pre></div>
<p>There's a lot going on here, and the details aren't that important;
like I said, this is mostly here to provide context for the next two
examples. As a rough outline to get the idea, though:</p>
<ul class="simple">
<li><tt class="docutils literal">OneWayProxySource</tt> manages the incoming connection. When a
connection is made (line 13) it first does some bookkeeping, then
starts the outgoing connection (lines 24-36). Then when incoming
data is received, or the incoming connection is closed, it forwards
that on to the outgoing connection (lines 38-42).</li>
<li><tt class="docutils literal">OneWayProxyDest</tt> manages the outgoing connection. In particular,
it's responsible for actually sending data (lines 54-58), and
shutting down the program once all the data has been sent (lines
60-61).</li>
<li><tt class="docutils literal">main</tt> has the job of setting up the listening socket and
arranging for incoming connections to be allocated a
<tt class="docutils literal">OneWayProxySource</tt> object (lines 66-69).</li>
<li>And then there's the usual <tt class="docutils literal">if __name__ == "__main__"</tt> boilerplate
at the bottom.</li>
</ul>
<p>There are two main things I want you to take away here:</p>
<ol class="arabic simple">
<li>The control flow is not at all straightforward or easy to follow.</li>
<li>All the actual reading and writing takes place "off-screen". By the
time our <tt class="docutils literal">data_received</tt> callback is run, someone else has
already done the work of reading data off the network and into
memory, and when we send data using <tt class="docutils literal">self.transport.write</tt>, that
doesn't actually do any sending. (How could it? Writing data takes
time, and we aren't allowed to block.) Instead, what it does is
<em>queue the data to be sent later</em>. This is also why we can't just
shut down after calling <tt class="docutils literal">self.transport.write_eof()</tt> – that just
schedules the socket to be closed later, and we have to wait until
<tt class="docutils literal">OneWayProxyDest.connection_lost</tt> is called to let us know that
the closure has actually happened.</li>
</ol>
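<p>To make point 2 concrete, here's a tiny self-contained sketch (it uses a
socketpair, so it needs no network; the <tt class="docutils literal">QueueingDemo</tt>
name is mine, not part of any example above) showing that
<tt class="docutils literal">transport.write()</tt> returns immediately, and the
event loop performs the actual send behind the scenes:</p>

```python
import asyncio
import socket

class QueueingDemo(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

async def demo():
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    b.setblocking(False)
    transport, _ = await loop.create_connection(QueueingDemo, sock=a)
    # .write() returns immediately: it only queues the bytes for
    # sending.  The event loop does the real I/O "off-screen".
    transport.write(b"hello")
    data = await loop.sock_recv(b, 1024)  # the loop flushed it for us
    transport.close()
    b.close()
    return data

print(asyncio.run(demo()))  # b'hello'
```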
</div>
<div class="section" id="example-2-curio-with-async-await">
<h4><a class="toc-backref" href="#id7">Example #2: curio, with async/await</a></h4>
<p>Now here's the equivalent program, but with curio. It looks very
different, and I'll go through it in more detail:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/curio-proxy.py" class="reference external">curio-proxy.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos"> 1</span><span class="kn">import</span> <span class="nn">sys</span>
<span class="linenos"> 2</span><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>
<span class="linenos"> 3</span><span class="kn">import</span> <span class="nn">curio</span>
<span class="linenos"> 4</span>
<span class="linenos"> 5</span><span class="n">READ_SIZE</span> <span class="o">=</span> <span class="mi">20000</span>
<span class="linenos"> 6</span>
<span class="linenos"> 7</span><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">source_port</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">):</span>
<span class="linenos"> 8</span> <span class="n">main_task</span> <span class="o">=</span> <span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">current_task</span><span class="p">()</span>
<span class="linenos"> 9</span> <span class="n">bound_cb</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">proxy</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">main_task</span><span class="p">)</span>
<span class="linenos">10</span> <span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">tcp_server</span><span class="p">(</span><span class="s2">"localhost"</span><span class="p">,</span> <span class="n">source_port</span><span class="p">,</span> <span class="n">bound_cb</span><span class="p">)</span>
<span class="linenos">11</span>
<span class="linenos">12</span><span class="k">async</span> <span class="k">def</span> <span class="nf">proxy</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">main_task</span><span class="p">,</span> <span class="n">source_sock</span><span class="p">,</span> <span class="n">addr</span><span class="p">):</span>
<span class="linenos">13</span> <span class="k">await</span> <span class="n">main_task</span><span class="o">.</span><span class="n">cancel</span><span class="p">()</span>
<span class="linenos">14</span> <span class="n">dest_sock</span> <span class="o">=</span> <span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">open_connection</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">)</span>
<span class="linenos">15</span> <span class="k">async</span> <span class="k">with</span> <span class="n">dest_sock</span><span class="p">:</span>
<span class="linenos">16</span> <span class="k">await</span> <span class="n">copy_all</span><span class="p">(</span><span class="n">source_sock</span><span class="p">,</span> <span class="n">dest_sock</span><span class="p">)</span>
<span class="linenos">17</span>
<span class="linenos">18</span><span class="k">async</span> <span class="k">def</span> <span class="nf">copy_all</span><span class="p">(</span><span class="n">source_sock</span><span class="p">,</span> <span class="n">dest_sock</span><span class="p">):</span>
<span class="linenos">19</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="linenos">20</span> <span class="n">data</span> <span class="o">=</span> <span class="k">await</span> <span class="n">source_sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">READ_SIZE</span><span class="p">)</span>
<span class="linenos">21</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span> <span class="c1"># EOF</span>
<span class="linenos">22</span> <span class="k">return</span>
<span class="linenos">23</span> <span class="k">await</span> <span class="n">dest_sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="linenos">24</span>
<span class="linenos">25</span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="linenos">26</span> <span class="k">try</span><span class="p">:</span>
<span class="linenos">27</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">3</span><span class="p">])]</span>
<span class="linenos">28</span> <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="linenos">29</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Usage: </span><span class="si">{}</span><span class="s2"> SOURCE_PORT DEST_HOST DEST_PORT"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="vm">__file__</span><span class="p">))</span>
<span class="linenos">30</span> <span class="k">else</span><span class="p">:</span>
<span class="linenos">31</span> <span class="n">curio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">))</span>
</pre></div>
<p>First our <tt class="docutils literal">main</tt> function sets up a listening socket, and arranges
for the <tt class="docutils literal">proxy</tt> function to be invoked on each incoming connection:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/curio-proxy.py" class="reference external">curio-proxy.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos"> 7</span><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">source_port</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">):</span>
<span class="linenos"> 8</span> <span class="n">main_task</span> <span class="o">=</span> <span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">current_task</span><span class="p">()</span>
<span class="linenos"> 9</span> <span class="n">bound_cb</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">proxy</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">main_task</span><span class="p">)</span>
<span class="linenos">10</span> <span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">tcp_server</span><span class="p">(</span><span class="s2">"localhost"</span><span class="p">,</span> <span class="n">source_port</span><span class="p">,</span> <span class="n">bound_cb</span><span class="p">)</span>
</pre></div>
<p>When a connection arrives, <tt class="docutils literal">proxy</tt> first tells <tt class="docutils literal">main</tt> to stop
listening (line 13), since we only want to handle one connection. Then
it sets up the outgoing connection (line 14), and invokes <tt class="docutils literal">copy_all</tt>:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/curio-proxy.py" class="reference external">curio-proxy.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos">12</span><span class="k">async</span> <span class="k">def</span> <span class="nf">proxy</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">main_task</span><span class="p">,</span> <span class="n">source_sock</span><span class="p">,</span> <span class="n">addr</span><span class="p">):</span>
<span class="linenos">13</span> <span class="k">await</span> <span class="n">main_task</span><span class="o">.</span><span class="n">cancel</span><span class="p">()</span>
<span class="linenos">14</span> <span class="n">dest_sock</span> <span class="o">=</span> <span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">open_connection</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">)</span>
<span class="linenos">15</span> <span class="k">async</span> <span class="k">with</span> <span class="n">dest_sock</span><span class="p">:</span>
<span class="linenos">16</span> <span class="k">await</span> <span class="n">copy_all</span><span class="p">(</span><span class="n">source_sock</span><span class="p">,</span> <span class="n">dest_sock</span><span class="p">)</span>
</pre></div>
<p>And <tt class="docutils literal">copy_all</tt>, finally, implements the core logic of a proxy:
copying data from one socket to another in a loop:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/curio-proxy.py" class="reference external">curio-proxy.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos">18</span><span class="k">async</span> <span class="k">def</span> <span class="nf">copy_all</span><span class="p">(</span><span class="n">source_sock</span><span class="p">,</span> <span class="n">dest_sock</span><span class="p">):</span>
<span class="linenos">19</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="linenos">20</span> <span class="n">data</span> <span class="o">=</span> <span class="k">await</span> <span class="n">source_sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">READ_SIZE</span><span class="p">)</span>
<span class="linenos">21</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span> <span class="c1"># EOF</span>
<span class="linenos">22</span> <span class="k">return</span>
<span class="linenos">23</span> <span class="k">await</span> <span class="n">dest_sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
<p>I hope we can all agree that in terms of readability, this is a huge
improvement over the callback-based version. Ignoring imports and the
<tt class="docutils literal">__main__</tt> boilerplate, we've gone from 67 lines of code down to 17
(four times shorter!), and the logic is now straightforward and
procedural. Instead of having to manually check for everything that
could go wrong and abort connections, we can just use <tt class="docutils literal">with</tt> blocks
and let exceptions propagate. That's the power of async/await.</p>
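<p>To illustrate what "just use <tt class="docutils literal">with</tt> blocks and
let exceptions propagate" buys us, here's a toy sketch using a made-up
stand-in socket class (<tt class="docutils literal">FakeSock</tt> is hypothetical,
not curio's API): even when the body raises, the async context manager still
runs its cleanup, and the exception still reaches the caller:</p>

```python
import asyncio

class FakeSock:
    # Hypothetical stand-in for a socket object; a real curio socket
    # closes itself in __aexit__ in the same spirit.
    def __init__(self):
        self.closed = False
    async def __aenter__(self):
        return self
    async def __aexit__(self, *exc_info):
        self.closed = True

async def demo():
    sock = FakeSock()
    try:
        async with sock:
            raise OSError("simulated connection failure")
    except OSError:
        pass  # the exception propagated to us, but...
    return sock.closed  # ...the socket still got closed on the way out

print(asyncio.run(demo()))  # True
```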
</div>
<div class="section" id="example-3-asyncio-with-async-await">
<h4><a class="toc-backref" href="#id8">Example #3: asyncio, with async/await</a></h4>
<p>We can also write this example using asyncio's async/await-based
"streams" API layer, and it looks very similar to the curio version:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/asyncio-proxy-streams.py" class="reference external">asyncio-proxy-streams.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos"> 1</span><span class="kn">import</span> <span class="nn">sys</span>
<span class="linenos"> 2</span><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>
<span class="linenos"> 3</span><span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">closing</span>
<span class="linenos"> 4</span><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="linenos"> 5</span>
<span class="linenos"> 6</span><span class="n">READ_SIZE</span> <span class="o">=</span> <span class="mi">20000</span>
<span class="linenos"> 7</span>
<span class="linenos"> 8</span><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">source_port</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">):</span>
<span class="linenos"> 9</span> <span class="n">connect_event</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Event</span><span class="p">()</span>
<span class="linenos">10</span> <span class="n">server_closed_event</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Event</span><span class="p">()</span>
<span class="linenos">11</span> <span class="n">bound_cb</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">proxy</span><span class="p">,</span>
<span class="linenos">12</span> <span class="n">loop</span><span class="p">,</span> <span class="n">connect_event</span><span class="p">,</span> <span class="n">server_closed_event</span><span class="p">,</span>
<span class="linenos">13</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">)</span>
<span class="linenos">14</span> <span class="n">server</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">start_server</span><span class="p">(</span><span class="n">bound_cb</span><span class="p">,</span> <span class="s2">"localhost"</span><span class="p">,</span> <span class="n">source_port</span><span class="p">,</span>
<span class="linenos">15</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
<span class="linenos">16</span> <span class="k">await</span> <span class="n">connect_event</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="linenos">17</span> <span class="n">server</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="linenos">18</span> <span class="k">await</span> <span class="n">server</span><span class="o">.</span><span class="n">wait_closed</span><span class="p">()</span>
<span class="linenos">19</span> <span class="n">server_closed_event</span><span class="o">.</span><span class="n">set</span><span class="p">()</span>
<span class="linenos">20</span>
<span class="linenos">21</span><span class="k">async</span> <span class="k">def</span> <span class="nf">proxy</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">connect_event</span><span class="p">,</span> <span class="n">server_closed_event</span><span class="p">,</span>
<span class="linenos">22</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span>
<span class="linenos">23</span> <span class="n">source_reader</span><span class="p">,</span> <span class="n">source_writer</span><span class="p">):</span>
<span class="linenos">24</span> <span class="n">connect_event</span><span class="o">.</span><span class="n">set</span><span class="p">()</span>
<span class="linenos">25</span> <span class="k">try</span><span class="p">:</span>
<span class="linenos">26</span> <span class="k">with</span> <span class="n">closing</span><span class="p">(</span><span class="n">source_writer</span><span class="p">):</span>
<span class="linenos">27</span> <span class="n">tmp</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">open_connection</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
<span class="linenos">28</span> <span class="n">dest_reader</span><span class="p">,</span> <span class="n">dest_writer</span> <span class="o">=</span> <span class="n">tmp</span>
<span class="linenos">29</span> <span class="k">with</span> <span class="n">closing</span><span class="p">(</span><span class="n">dest_writer</span><span class="p">):</span>
<span class="linenos">30</span> <span class="k">await</span> <span class="n">copy_all</span><span class="p">(</span><span class="n">source_reader</span><span class="p">,</span> <span class="n">dest_writer</span><span class="p">)</span>
<span class="linenos">31</span> <span class="k">finally</span><span class="p">:</span>
<span class="linenos">32</span> <span class="k">await</span> <span class="n">server_closed_event</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="linenos">33</span> <span class="n">loop</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
<span class="linenos">34</span>
<span class="linenos">35</span><span class="k">async</span> <span class="k">def</span> <span class="nf">copy_all</span><span class="p">(</span><span class="n">source_reader</span><span class="p">,</span> <span class="n">dest_writer</span><span class="p">):</span>
<span class="linenos">36</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="linenos">37</span> <span class="n">data</span> <span class="o">=</span> <span class="k">await</span> <span class="n">source_reader</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">READ_SIZE</span><span class="p">)</span>
<span class="linenos">38</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span> <span class="c1"># EOF</span>
<span class="linenos">39</span> <span class="k">return</span>
<span class="linenos">40</span> <span class="n">dest_writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="linenos">41</span>
<span class="linenos">42</span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="linenos">43</span> <span class="k">try</span><span class="p">:</span>
<span class="linenos">44</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">3</span><span class="p">])]</span>
<span class="linenos">45</span> <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="linenos">46</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Usage: </span><span class="si">{}</span><span class="s2"> SOURCE_PORT DEST_HOST DEST_PORT"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="vm">__file__</span><span class="p">))</span>
<span class="linenos">47</span> <span class="k">else</span><span class="p">:</span>
<span class="linenos">48</span> <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
<span class="linenos">49</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">main</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">))</span>
<span class="linenos">50</span> <span class="n">loop</span><span class="o">.</span><span class="n">run_forever</span><span class="p">()</span>
</pre></div>
<p>In particular, notice how the core <tt class="docutils literal">copy_all</tt> function here is
almost identical to the curio version, modulo some spelling
adjustments like <tt class="docutils literal">read</tt> versus <tt class="docutils literal">recv</tt>.</p>
<p id="curio-shutdown">There is one source of extra complexity that ends up making the core
logic here almost twice as long as in the curio version: the need to
figure out when everything has completed so that the event loop can be
safely shut down. In curio, the general rule is straightforward: the
event loop exits when all (<a class="reference external" href="https://curio.readthedocs.io/en/latest/reference.html#spawn">non-daemonic</a>) tasks
have finished. Here we have two tasks (<tt class="docutils literal">main</tt> and <tt class="docutils literal">proxy</tt>), so
when they're both done, the loop exits. Asyncio doesn't provide any
equivalent – we can use <tt class="docutils literal">run_until_complete</tt> to run the loop until
one particular task finishes, but this may leave arbitrary other tasks and
callbacks unfinished. Instead, our two tasks have to manually
coordinate to make sure that both have finished their work and cleaned
up before we call <tt class="docutils literal">loop.stop()</tt>, and this takes a bit of doing
(lines 9-10, 16-19, 24, 32-33).</p>
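<p>This kind of coordination can also be factored out. Here's a hypothetical sketch (not from the original examples; <tt class="docutils literal">AllDone</tt> is an invented name) that approximates curio's all-tasks-finished rule on top of asyncio, by counting task completions via done-callbacks and stopping the loop when the count reaches zero:</p>

```python
import asyncio

class AllDone:
    # Hypothetical helper: stop the event loop once every registered
    # task has completed, approximating curio's shutdown rule.
    def __init__(self, loop, ntasks):
        self.loop = loop
        self.remaining = ntasks

    def task_finished(self, task):
        # Runs as a done-callback; ``task`` is the completed Task.
        self.remaining -= 1
        if self.remaining == 0:
            self.loop.stop()

loop = asyncio.new_event_loop()
results = []

async def work(value):
    results.append(value)

tracker = AllDone(loop, 2)
for name in ("main", "proxy"):
    loop.create_task(work(name)).add_done_callback(tracker.task_finished)
loop.run_forever()
loop.close()
```

<p>With something like this, neither task needs to know about the other; but it's still manual bookkeeping that curio gives you for free.</p>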
<p>There's also another difference that doesn't show up on the page: the
curio code pretty much directly does what it says, e.g. <tt class="docutils literal">await
sock.recv</tt> initiates a read from the socket and suspends until it
completes. The asyncio code, on the other hand, is written as a <em>layer
on top of the protocol/transport system we saw in the first
example</em>. Now you see why we needed that first example! In the asyncio
code, even though it doesn't look like we're using protocols, there's
still a <tt class="docutils literal">data_received</tt> callback somewhere that's stashing data in
an internal buffer where <tt class="docutils literal">source_reader.read</tt> can find it, and when
we call <tt class="docutils literal">dest_writer.write</tt> that ultimately turns into a call to a
<tt class="docutils literal">transport.write</tt> method, which schedules the writing to happen
off-screen. The idea is that this is something the end-user doesn't
have to think about; unfortunately, as we'll see, it doesn't quite
work out that way.</p>
<p>And finally, there's one more extremely important difference between
these examples: the asyncio protocols code has a showstopper bug. The
asyncio async/await code has the same showstopper bug, plus a less
important second bug... and a third, different, showstopper bug. Yet,
remarkably, the curio code – despite being shorter and easier to
understand – is correct as originally written.</p>
</div>
</div>
<div class="section" id="three-bugs">
<h3><a class="toc-backref" href="#id9">Three bugs</a></h3>
<div class="section" id="bug-1-backpressure">
<h4><a class="toc-backref" href="#id10">Bug #1: backpressure</a></h4>
<p>This bug affects both of the asyncio examples, but not the curio
example.</p>
<p>Imagine that our example code is being used to proxy between two
different networks that run at different speeds: data is arriving on
the incoming socket at 3 MB/s, but the outgoing socket is only
transmitting at 1 MB/s. What happens?</p>
<!-- No idea where the original source for this is – reverse image
source finds copies of it dozens of different sites -->
<div class="figure align-right">
<a class="reference external image-reference" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/lord-oom-huge.jpg"><img alt="Illustration of Lord OOM – looking much like Cthulhu – rising from the deep, while our program – looking much like a small elf in a boat – looks on in horror." src="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/lord-oom-small.jpg" style="width: 400px;" /></a>
<p class="caption">Artist's impression: Lord OOM rising from the deep and turning
its baleful gaze upon a containerized app.</p>
</div>
<p>Remember what we said about how <tt class="docutils literal">transport.write</tt> doesn't actually
send any data, but rather just adds it to a buffer to be sent later?
With the two asyncio examples shown above, that's exactly what will
happen: each second we'll add 3 MB of data to the buffer, while 1 MB
are removed, so on net our buffer will grow by 2 MB per second until we
eventually wake the <a class="reference external" href="https://linux-mm.org/OOM_Killer">out-of-memory killer</a>. Of course, we probably won't
catch this in testing, so it'll make a nice 2 am surprise someday.</p>
<p>And even if we don't actually run out of memory, we'll introduce
potentially epic amounts of latency: after 10 seconds, we'll have
accumulated 20 MB in our buffer, which means that a byte that arrives
from the sender will sit in our process for 20 seconds before being
forwarded on to the receiver; after 10 minutes, well... you get the
idea. We are failing to apply <em>backpressure</em>, one of the cardinal sins
of network programming. The elements of a distributed system can't
function well if they aren't getting accurate feedback from their
peers – and even just a single TCP connection is already a complex
distributed system involving two userspaces, two kernel socket layers,
two kernel packet layers, and who knows how many routers and competing
flows. Backpressure is important.</p>
<p>For the asyncio protocols-based implementation, this can be fixed
through some judicious use of the <a class="reference external" href="https://docs.python.org/3/library/asyncio-protocol.html#flow-control-callbacks">flow control callbacks</a>
and <a class="reference external" href="https://docs.python.org/3/library/asyncio-protocol.html#asyncio.ReadTransport.pause_reading">flow control commands</a>. I'll
leave the details as an exercise for the reader, but basically we want
to call <tt class="docutils literal">pause_reading</tt> after each chunk is received, and then define
a <tt class="docutils literal">resume_writing</tt> callback that calls <tt class="docutils literal">resume_reading</tt>. (The
<a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/twisted-proxy.py">Twisted protocol implementation</a> also has
this bug, and can be fixed in a similar way, albeit using some
<a class="reference external" href="https://twistedmatrix.com/trac/ticket/8867">essentially undocumented APIs</a>; issues around
backpressure seem to be a <a class="reference external" href="https://twistedmatrix.com/pipermail/twisted-python/2016-August/030739.html">perennial challenge for Twisted</a>.)</p>
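<p>For concreteness, here's a hedged sketch of that exercise. The names (<tt class="docutils literal">ProxyEnd</tt>, <tt class="docutils literal">peer</tt>) are invented, real connection setup is omitted in favor of fake transports, and it assumes the write-buffer water marks are set low enough – e.g. via <tt class="docutils literal">set_write_buffer_limits(0)</tt> – that <tt class="docutils literal">resume_writing</tt> actually fires after each chunk:</p>

```python
import asyncio

class ProxyEnd(asyncio.Protocol):
    # Hypothetical sketch of the flow-control fix described above.
    # Each instance handles one socket; ``peer`` is the ProxyEnd for
    # the other socket (wiring and connection setup are faked below).
    def __init__(self):
        self.transport = None
        self.peer = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Forward the chunk into the other socket's send buffer...
        self.peer.transport.write(data)
        # ...then pause; resume_writing will un-pause us once that
        # buffer has drained.
        self.transport.pause_reading()

    def resume_writing(self):
        # Our send buffer has drained below the low-water mark, so the
        # other side may safely read (and hence write to us) again:
        self.peer.transport.resume_reading()

# Minimal fake transports, just to exercise the flow-control logic:
class FakeTransport:
    def __init__(self):
        self.reading = True
        self.writes = []
    def pause_reading(self):
        self.reading = False
    def resume_reading(self):
        self.reading = True
    def write(self, data):
        self.writes.append(data)

a, b = ProxyEnd(), ProxyEnd()
a.peer, b.peer = b, a
ta, tb = FakeTransport(), FakeTransport()
a.connection_made(ta)
b.connection_made(tb)

a.data_received(b"chunk")          # forwarded to b; a pauses itself
paused_after_chunk = not ta.reading
b.resume_writing()                 # b's buffer "drained": a may read again
```
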
<p>For the asyncio streams-based implementation, <tt class="docutils literal">StreamReader</tt>
automatically uses the <tt class="docutils literal">{pause,resume}_reading</tt> methods to transmit
backpressure upstream, and <tt class="docutils literal">StreamWriter</tt> provides a friendly
wrapper around <tt class="docutils literal">{pause,resume}_writing</tt> to help us accept
backpressure from downstream: <a class="reference external" href="https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamWriter.drain">the drain method</a>
– we just have to remember to use it. So in order to fix our proxy to
transmit backpressure, all we need to do is to add one line of code to
<tt class="docutils literal">copy_all</tt>. Specifically, this line:</p>
<!-- .. [#] (The reason we ``drain`` before calling ``read`` is that we
want to be sure that there's somewhere to put the data before we
accept it from upstream. We could also call ``drain`` just before
calling ``write``, but that would slightly increase our effective
buffering, and thus our latency, for no benefit.) -->
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/asyncio-proxy-streams-2.py" class="reference external">asyncio-proxy-streams-2.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos">35</span><span class="k">async</span> <span class="k">def</span> <span class="nf">copy_all</span><span class="p">(</span><span class="n">source_reader</span><span class="p">,</span> <span class="n">dest_writer</span><span class="p">):</span>
<span class="linenos">36</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="hll"><span class="linenos">37</span> <span class="k">await</span> <span class="n">dest_writer</span><span class="o">.</span><span class="n">drain</span><span class="p">()</span>
</span><span class="linenos">38</span> <span class="n">data</span> <span class="o">=</span> <span class="k">await</span> <span class="n">source_reader</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">READ_SIZE</span><span class="p">)</span>
<span class="linenos">39</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span> <span class="c1"># EOF</span>
<span class="linenos">40</span> <span class="k">return</span>
</pre></div>
<p>In curio, things are different, because of the critical <tt class="docutils literal">await</tt> that
was already present:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/curio-proxy.py" class="reference external">curio-proxy.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos">18</span><span class="k">async</span> <span class="k">def</span> <span class="nf">copy_all</span><span class="p">(</span><span class="n">source_sock</span><span class="p">,</span> <span class="n">dest_sock</span><span class="p">):</span>
<span class="linenos">19</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="linenos">20</span> <span class="n">data</span> <span class="o">=</span> <span class="k">await</span> <span class="n">source_sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">READ_SIZE</span><span class="p">)</span>
<span class="linenos">21</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span> <span class="c1"># EOF</span>
<span class="linenos">22</span> <span class="k">return</span>
<span class="hll"><span class="linenos">23</span> <span class="k">await</span> <span class="n">dest_sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span></pre></div>
<p>In curio, there are no hidden buffers – <tt class="docutils literal">sendall</tt> doesn't return
until the OS has accepted the data to be sent. This has the effect of
automatically propagating backpressure, without our having to remember
to do anything. Our original example worked because in curio, it would
take actual effort to get this wrong.</p>
</div>
<div class="section" id="bug-2-read-side-buffering">
<h4><a class="toc-backref" href="#id11">Bug #2: read-side buffering</a></h4>
<p>This bug mostly affects the asyncio + async/await example, somewhat
affects the asyncio + callbacks example, and doesn't affect the curio
example.</p>
<p>We mentioned above how we should try to minimize buffering in order to
keep latency down. Unfortunately, there's another source of extraneous
buffering in the streams-based asyncio code, on the <tt class="docutils literal">StreamReader</tt>
side.</p>
<p>The asyncio internals can be difficult to follow, but I believe that
at a steady state with data coming in faster than it can be processed,
on a Unix-like system, the user-space buffer in <tt class="docutils literal">StreamReader</tt> will
oscillate between 128 KiB and 384 KiB. (I calculated this as 128 KiB =
<a class="reference external" href="https://github.com/python/asyncio/blob/1d6f0ed1381537490cd91ffd5122c18526ac5bed/asyncio/streams.py#L423">twice</a>
the <a class="reference external" href="https://github.com/python/asyncio/blob/1d6f0ed1381537490cd91ffd5122c18526ac5bed/asyncio/streams.py#L22">_DEFAULT_LIMIT</a>;
384 KiB = 128 KiB + <a class="reference external" href="https://github.com/python/asyncio/blob/1d6f0ed1381537490cd91ffd5122c18526ac5bed/asyncio/unix_events.py#L318">256 KiB</a>.)
For our proxy example, this is pretty bad – it's pure "bufferbloat",
adding latency without adding any value. In our example with 1 MB/s
outgoing bandwidth, this buffer adds an extra ~130-400 ms of
steady-state latency for no good reason; on a slower connection
(e.g. mobile) this could easily become multiple seconds.</p>
<p>In the curio version, this problem doesn't arise, again because of
curio's lack of userspace buffering: curio doesn't issue any <tt class="docutils literal">recv</tt>
syscalls until you actually call <tt class="docutils literal">await sock.recv</tt>. This is really
the right thing to do, because reading from a socket is
(paradoxically!) a data <em>transmitting</em> action: after you call the
<tt class="docutils literal">recv</tt> syscall, the kernel will literally send out a packet to the
remote peer asking them to send more data ("opening the receive
window"). So you shouldn't call <tt class="docutils literal">recv</tt> until you really are ready
for more data, and you should always request exactly as much data as
you're ready to process, no more. Curio makes this natural.</p>
<p>As bugs go, <tt class="docutils literal">StreamReader</tt>'s buffering is not as severe as the
other two we discuss here – it won't cause crashes or data corruption,
and receive-side backpressure isn't as universal a concern as
send-side backpressure. (Everyone has to be prepared to handle a slow
client, but perhaps some programs can process incoming data so quickly
that they don't need to worry about fast clients.) Still, proxies are
one place where this matters, and if we were seriously implementing
some kind of proxy – like a SOCKS or HTTP proxy, or ssh
port-forwarding – then asyncio's streams layer is probably not a good
choice as it currently stands. (Working directly at the protocol layer
would be somewhat better, because the buffer is reduced, but it's
still not ideal in this respect – the curio style still gives us more
control over the receive window while being simpler to use.)</p>
<p>I don't have a fix to show here. This seems to be an intrinsic
limitation of the hybrid design strategy – because of the way the
async/await-centric <tt class="docutils literal">StreamReader</tt> is built on top of asyncio's
callback-centric protocol layer, it's not obvious how or whether
asyncio could switch to the just-in-time-<tt class="docutils literal">recv</tt> model.</p>
</div>
<div class="section" id="bug-3-closing-time">
<h4><a class="toc-backref" href="#id12">Bug #3: closing time</a></h4>
<p>There's a final showstopper bug in the asyncio streams example (only),
which is pretty obvious once you see it, but not so obvious how to
fix. Here's the core of the <tt class="docutils literal">proxy</tt> function again for
reference. After the incoming connection has been closed, <tt class="docutils literal">copy_all</tt>
returns and the <tt class="docutils literal">with closing(dest_writer)</tt> block calls
<tt class="docutils literal">dest_writer.close()</tt>, and then we stop the event loop:</p>
<div class="highlight"><pre><span></span><span class="k">try</span><span class="p">:</span>
<span class="k">with</span> <span class="n">closing</span><span class="p">(</span><span class="n">source_writer</span><span class="p">):</span>
<span class="n">tmp</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">open_connection</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
<span class="n">dest_reader</span><span class="p">,</span> <span class="n">dest_writer</span> <span class="o">=</span> <span class="n">tmp</span>
<span class="hll"> <span class="k">with</span> <span class="n">closing</span><span class="p">(</span><span class="n">dest_writer</span><span class="p">):</span>
</span> <span class="k">await</span> <span class="n">copy_all</span><span class="p">(</span><span class="n">source_reader</span><span class="p">,</span> <span class="n">dest_writer</span><span class="p">)</span>
<span class="k">finally</span><span class="p">:</span>
<span class="k">await</span> <span class="n">server_done_event</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="hll"> <span class="n">loop</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</span></pre></div>
<p>The problem is that again, it's <tt class="docutils literal">dest_writer.close()</tt>, not <tt class="docutils literal">await
dest_writer.close()</tt> – we don't wait for the socket to actually
close, we just make a note to close the socket later, once the send
buffer has finished emptying out. But then we immediately stop the
event loop before that can happen (maybe – it's a race condition), so
some data will get dropped on the floor and lost. We need to wait for
the <tt class="docutils literal">close</tt> to complete before stopping the loop.</p>
<p>But how? Unless I've missed something, the <tt class="docutils literal">StreamWriter</tt> API
actually does not provide any mechanism for detecting when the stream
has been closed (!). But we might reason that since the <tt class="docutils literal">close</tt> is
delayed until all data has been written, we can trick the close into
happening promptly by draining the send buffer first:</p>
<div class="highlight"><pre><span></span><span class="k">try</span><span class="p">:</span>
<span class="k">with</span> <span class="n">closing</span><span class="p">(</span><span class="n">source_writer</span><span class="p">):</span>
<span class="n">tmp</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">open_connection</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
<span class="n">dest_reader</span><span class="p">,</span> <span class="n">dest_writer</span> <span class="o">=</span> <span class="n">tmp</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">await</span> <span class="n">copy_all</span><span class="p">(</span><span class="n">source_reader</span><span class="p">,</span> <span class="n">dest_writer</span><span class="p">)</span>
<span class="k">finally</span><span class="p">:</span>
<span class="hll"> <span class="k">await</span> <span class="n">dest_writer</span><span class="o">.</span><span class="n">drain</span><span class="p">()</span>
</span><span class="hll"> <span class="n">dest_writer</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span><span class="k">finally</span><span class="p">:</span>
<span class="k">await</span> <span class="n">server_done_event</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="n">loop</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
</pre></div>
<p>Unfortunately, this isn't enough. As the <a class="reference external" href="https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamWriter.drain">docs warn us</a>,
<tt class="docutils literal">drain</tt> doesn't actually block until <em>all</em> data is written; it only
guarantees that the unwritten data is less than the "high water mark",
whose default is undocumented but currently appears to be 64 KiB, and
specifically tries to make sure that there's at <em>least</em> a "low water
mark" worth of unsent data (default 16 KiB). So adding the <tt class="docutils literal">drain</tt>
call makes this bug harder to hit, and it might seem to work in
testing (especially since you need a slow network connection to really
increase the odds, and who runs their tests over a slow network
connection?), but sooner or later we're going to randomly lose data.</p>
<p>To really fix the problem, we need to get rid of this high-water mark
thing. This can be done by calling
<tt class="docutils literal">transport.set_write_buffer_limits(0)</tt> on the underlying transport
object; then <tt class="docutils literal">drain</tt> will only return once the send buffer is
completely empty. Unfortunately, the only supported way to get access
to the transport object is to <a class="reference external" href="https://github.com/python/asyncio/blob/1d6f0ed1381537490cd91ffd5122c18526ac5bed/asyncio/streams.py#L66-L68">copy-paste</a>
the implementation of <tt class="docutils literal">asyncio.open_connection</tt> into our code, and
add the call there. (It's lucky we aren't implementing a
bi-directional proxy; if we were, then we'd also have to do this to
the implementation of <tt class="docutils literal">asyncio.start_server</tt> – in general, all of
the stream helper functions seem to currently be unusable if you
need to be able to cleanly shut down a write stream in a protocol
where you have the last word.)</p>
<p>Is this safe, though? The documentation specifically <a class="reference external" href="https://docs.python.org/3/library/asyncio-protocol.html#asyncio.WriteTransport.set_write_buffer_limits">warns us not to do this</a>:
<em>"Use of zero for either limit is generally sub-optimal as it reduces
opportunities for doing I/O and computation concurrently."</em></p>
<p>Superficially, this warning seems to make a lot of sense. Our network
card can only send data at certain moments, and in a modern operating
system with pre-emptive multi-tasking, there's no way to guarantee
that our application will be ready to hand off a packet at the exact
moment when the network card wants it. So we need some sort of buffer
that we load up with data ahead of time, and that the network card can
read at its leisure. And if this buffer ever runs completely empty,
the network card will go idle, which would be bad and waste bandwidth
– so our application needs to wake up to refill the buffer <em>before</em>
it goes empty, to give us a bit of a cushion to keep things going
while we're getting the new data ready. That's the concurrent
computation and I/O that the docs are referring to, and
low-water/high-water logic provides this cushion.</p>
<p>But... while I'm not an expert on networking (certainly not like
<a class="reference external" href="https://www.python.org/dev/peps/pep-3156/#acknowledgments">these folks</a>!), and
this stuff can be hella subtle... I'm pretty sure this reasoning is
all completely wrong. Because the thing is... we already have a buffer
that does all that stuff: the kernel's socket send buffer. The
<tt class="docutils literal">send</tt> syscall doesn't actually write packets to the network – it
just enqueues our data into a buffer inside the kernel, where the
lower-level networking stack knows to look for them when it wants
them. <tt class="docutils literal">select</tt> and friends don't wait for the send buffer to be
empty before marking the socket writeable – they implement some
low-water/high-water logic. And on modern systems, the kernel will
even do fancy stuff like automatically tuning the buffer size
depending on the speed of the connection and the amount of memory
pressure the system is under. Plus, for various reasons the kernel
buffer is usually <a class="reference external" href="https://github.com/dabeaz/curio/issues/83#issuecomment-254073052">too big</a>. Adding
a second user-space send buffer on top of this seems entirely
superfluous. (Especially since the way asynchronous I/O works,
whenever our application is blocked on the CPU then it means we aren't
running through the event loop and letting it hand off data from the
userspace buffer to the kernel. So how would this concurrent I/O and
computation even work?)</p>
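<p>That kernel-side buffer isn't hypothetical – you can inspect it on any socket. A quick illustration (the exact number varies by OS and auto-tuning):</p>

```python
import socket

# Every socket comes with a kernel-managed send buffer; SO_SNDBUF
# reports its current size (platform- and tuning-dependent).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
s.close()
print("kernel send buffer:", sndbuf, "bytes")
```
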
<p>As far as I can tell, the historical reason the asyncio userspace send
buffer exists has nothing to do with performance. It's a necessary
evil motivated purely by the need to let <tt class="docutils literal">transport.write</tt> be
non-blocking, so that callback-based programming becomes less
painful. The asyncio low-watermark/high-watermark logic seems to have
arisen from that buffer then being mistaken for a different kind of buffer. So the
docs are wrong: for optimal performance the watermarks should <em>always</em>
be set to zero, to reduce bufferbloat and let the kernel buffer do its
job.</p>
<p>Okay, phew. Having dealt with that, are we finally finished? Not
quite.</p>
<p>We can now be confident that all our data will be transmitted to the
outgoing socket, but it's still possible that the socket itself will
remain open at the time our event loop exits. This is a bit of a
nit-picky point. In our example, it actually doesn't matter, because
as soon as the event loop exits then the program exits as well, and
that will close down the socket. But in general this can be
important. For example, in your test suite you should probably be
running each test with its own isolated event loop, to make sure that
different tests can't interfere with each other – but if each event
loop you shut down leaves behind some dangling resources, multiply
that by the number of tests you run and you might have a
problem. Twisted's unit test framework goes to great lengths to <a class="reference external" href="https://jml.io/pages/how-to-disconnect-in-twisted-really.html">annoy
people into getting this right</a>.</p>
<p>So, let's assume we'd like to actually, finally, for real, close our
socket before we stop the event loop. We're calling <tt class="docutils literal">close</tt>. What
else do we need? Well, if we grovel around in the asyncio source for
long enough, we discover that <tt class="docutils literal">StreamWriter.close</tt> calls
<tt class="docutils literal">transport.close</tt>, but <tt class="docutils literal">transport.close</tt> doesn't actually close
the socket. Instead, it <a class="reference external" href="https://github.com/python/asyncio/blob/1d6f0ed1381537490cd91ffd5122c18526ac5bed/asyncio/selector_events.py#L622">schedules a call</a>
to <tt class="docutils literal">_call_connection_lost</tt> <em>on the next event loop iteration</em>, and
<a class="reference external" href="https://github.com/python/asyncio/blob/1d6f0ed1381537490cd91ffd5122c18526ac5bed/asyncio/selector_events.py#L664">that's</a>
what actually closes the socket. So if we want to actually close the
socket, we have to yield to the event loop between our call to
<tt class="docutils literal">dest_stream.close()</tt> and our call to <tt class="docutils literal">loop.stop()</tt>. I believe
that yielding once should be enough to make this happen
deterministically, since even if on the next iteration we get resumed
first and call <tt class="docutils literal">loop.stop()</tt> before the socket has closed, the event
loop should still finish calling all currently-scheduled callbacks
before it actually stops. I think. So long as those internal details
don't change.</p>
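<p>One way to spell "yield to the event loop once" is <tt class="docutils literal">await asyncio.sleep(0)</tt>. Here's a standalone sketch of the idea, using a plain <tt class="docutils literal">call_soon</tt> callback as a stand-in for the transport's internal <tt class="docutils literal">_call_connection_lost</tt>; the already-scheduled callback gets to run before the loop stops:</p>

```python
import asyncio

loop = asyncio.new_event_loop()
events = []

async def shutdown():
    # Stand-in for transport.close() scheduling _call_connection_lost
    # on the next loop iteration:
    loop.call_soon(events.append, "connection_lost")
    # Yield once, so that callback gets a loop iteration to run in:
    await asyncio.sleep(0)
    loop.stop()

loop.create_task(shutdown())
loop.run_forever()
loop.close()
```
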
<p>I also wonder about what happens if some error is detected during
socket shutdown, and how that's supposed to propagate out through this
API.</p>
<p>Anyway.</p>
<p>In conclusion, here's a final version of our simple asyncio
streams-based proxy server. This version still has extraneous
buffering on the receive side, but it now does transmit back-pressure
and (I think) reliably and correctly cleans up after itself. The lines
that had to be added/changed are highlighted:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/asyncio-proxy-streams-3.py" class="reference external">asyncio-proxy-streams-3.py</a>
</div>
<div class="highlight"><pre><span></span><span class="linenos"> 1</span><span class="kn">import</span> <span class="nn">sys</span>
<span class="linenos"> 2</span><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>
<span class="linenos"> 3</span><span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">closing</span>
<span class="linenos"> 4</span><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="linenos"> 5</span>
<span class="linenos"> 6</span><span class="n">READ_SIZE</span> <span class="o">=</span> <span class="mi">20000</span>
<span class="linenos"> 7</span>
<span class="hll"><span class="linenos"> 8</span><span class="c1"># Contains code derived (and modified) from the asyncio library, which is</span>
</span><span class="hll"><span class="linenos"> 9</span><span class="c1"># distributed under the Apache 2 license:</span>
</span><span class="hll"><span class="linenos">10</span><span class="c1"># https://github.com/python/asyncio/blob/master/COPYING</span>
</span><span class="hll"><span class="linenos">11</span><span class="c1"># asyncio is copyright its authors:</span>
</span><span class="hll"><span class="linenos">12</span><span class="c1"># https://github.com/python/asyncio/blob/master/AUTHORS</span>
</span><span class="hll"><span class="linenos">13</span><span class="nd">@asyncio</span><span class="o">.</span><span class="n">coroutine</span>
</span><span class="hll"><span class="linenos">14</span><span class="k">def</span> <span class="nf">fixed_open_connection</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
</span><span class="hll"><span class="linenos">15</span> <span class="n">loop</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">limit</span><span class="o">=</span><span class="mi">65536</span><span class="p">,</span> <span class="o">**</span><span class="n">kwds</span><span class="p">):</span>
</span><span class="hll"><span class="linenos">16</span> <span class="k">if</span> <span class="n">loop</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
</span><span class="hll"><span class="linenos">17</span> <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
</span><span class="hll"><span class="linenos">18</span> <span class="n">reader</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">StreamReader</span><span class="p">(</span><span class="n">limit</span><span class="o">=</span><span class="n">limit</span><span class="p">,</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
</span><span class="hll"><span class="linenos">19</span> <span class="n">protocol</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">StreamReaderProtocol</span><span class="p">(</span><span class="n">reader</span><span class="p">,</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
</span><span class="hll"><span class="linenos">20</span> <span class="n">transport</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="k">yield from</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_connection</span><span class="p">(</span>
</span><span class="hll"><span class="linenos">21</span> <span class="k">lambda</span><span class="p">:</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">host</span><span class="p">,</span> <span class="n">port</span><span class="p">,</span> <span class="o">**</span><span class="n">kwds</span><span class="p">)</span>
</span><span class="hll"><span class="linenos">22</span> <span class="c1">###### Following line added to fix buffering issues:</span>
</span><span class="hll"><span class="linenos">23</span> <span class="n">transport</span><span class="o">.</span><span class="n">set_write_buffer_limits</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span><span class="hll"><span class="linenos">24</span> <span class="c1">######</span>
</span><span class="hll"><span class="linenos">25</span> <span class="n">writer</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">StreamWriter</span><span class="p">(</span><span class="n">transport</span><span class="p">,</span> <span class="n">protocol</span><span class="p">,</span> <span class="n">reader</span><span class="p">,</span> <span class="n">loop</span><span class="p">)</span>
</span><span class="hll"><span class="linenos">26</span> <span class="k">return</span> <span class="n">reader</span><span class="p">,</span> <span class="n">writer</span>
</span><span class="linenos">27</span>
<span class="linenos">28</span><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">source_port</span><span class="p">,</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">):</span>
<span class="linenos">29</span> <span class="n">connect_event</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Event</span><span class="p">()</span>
<span class="linenos">30</span> <span class="n">server_closed_event</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Event</span><span class="p">()</span>
<span class="linenos">31</span> <span class="n">bound_cb</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">proxy</span><span class="p">,</span>
<span class="linenos">32</span> <span class="n">loop</span><span class="p">,</span> <span class="n">connect_event</span><span class="p">,</span> <span class="n">server_closed_event</span><span class="p">,</span>
<span class="linenos">33</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">)</span>
<span class="linenos">34</span> <span class="n">server</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">start_server</span><span class="p">(</span><span class="n">bound_cb</span><span class="p">,</span> <span class="s2">"localhost"</span><span class="p">,</span> <span class="n">source_port</span><span class="p">,</span>
<span class="linenos">35</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
<span class="linenos">36</span> <span class="k">await</span> <span class="n">connect_event</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="linenos">37</span> <span class="n">server</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="linenos">38</span> <span class="k">await</span> <span class="n">server</span><span class="o">.</span><span class="n">wait_closed</span><span class="p">()</span>
<span class="linenos">39</span> <span class="n">server_closed_event</span><span class="o">.</span><span class="n">set</span><span class="p">()</span>
<span class="linenos">40</span>
<span class="linenos">41</span><span class="k">async</span> <span class="k">def</span> <span class="nf">proxy</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">connect_event</span><span class="p">,</span> <span class="n">server_closed_event</span><span class="p">,</span>
<span class="linenos">42</span> <span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span>
<span class="linenos">43</span> <span class="n">source_reader</span><span class="p">,</span> <span class="n">source_writer</span><span class="p">):</span>
<span class="linenos">44</span> <span class="n">connect_event</span><span class="o">.</span><span class="n">set</span><span class="p">()</span>
<span class="linenos">45</span> <span class="k">try</span><span class="p">:</span>
<span class="linenos">46</span> <span class="k">with</span> <span class="n">closing</span><span class="p">(</span><span class="n">source_writer</span><span class="p">):</span>
<span class="linenos">47</span> <span class="n">tmp</span> <span class="o">=</span> <span class="k">await</span> <span class="n">fixed_open_connection</span><span class="p">(</span><span class="n">dest_host</span><span class="p">,</span> <span class="n">dest_port</span><span class="p">,</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
<span class="linenos">48</span> <span class="n">dest_reader</span><span class="p">,</span> <span class="n">dest_writer</span> <span class="o">=</span> <span class="n">tmp</span>
<span class="hll"><span class="linenos">49</span> <span class="k">try</span><span class="p">:</span>
</span><span class="linenos">50</span> <span class="k">await</span> <span class="n">copy_all</span><span class="p">(</span><span class="n">source_reader</span><span class="p">,</span> <span class="n">dest_writer</span><span class="p">)</span>
<span class="hll"><span class="linenos">51</span> <span class="k">finally</span><span class="p">:</span>
</span><span class="hll"><span class="linenos">52</span> <span class="k">await</span> <span class="n">dest_writer</span><span class="o">.</span><span class="n">drain</span><span class="p">()</span>
</span><span class="hll"><span class="linenos">53</span> <span class="n">dest_writer</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span><span class="hll"><span class="linenos">54</span> <span class="c1"># To let the socket actually close</span>
</span><span class="hll"><span class="linenos">55</span> <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">loop</span><span class="o">=</span><span class="n">loop</span><span class="p">)</span>
</span><span class="linenos">56</span> <span class="k">finally</span><span class="p">:</span>
<span class="linenos">57</span> <span class="k">await</span> <span class="n">server_closed_event</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="linenos">58</span> <span class="n">loop</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span>
<span class="linenos">59</span>
<span class="linenos">60</span><span class="k">async</span> <span class="k">def</span> <span class="nf">copy_all</span><span class="p">(</span><span class="n">source_reader</span><span class="p">,</span> <span class="n">dest_writer</span><span class="p">):</span>
<span class="linenos">61</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="hll"><span class="linenos">62</span> <span class="k">await</span> <span class="n">dest_writer</span><span class="o">.</span><span class="n">drain</span><span class="p">()</span>
</span><span class="linenos">63</span> <span class="n">data</span> <span class="o">=</span> <span class="k">await</span> <span class="n">source_reader</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">READ_SIZE</span><span class="p">)</span>
<span class="linenos">64</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span> <span class="c1"># EOF</span>
<span class="linenos">65</span> <span class="k">return</span>
<span class="linenos">66</span> <span class="n">dest_writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="linenos">67</span>
<span class="linenos">68</span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="linenos">69</span> <span class="k">try</span><span class="p">:</span>
<span class="linenos">70</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">3</span><span class="p">])]</span>
<span class="linenos">71</span> <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
<span class="linenos">72</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Usage: </span><span class="si">{}</span><span class="s2"> SOURCE_PORT DEST_HOST DEST_PORT"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="vm">__file__</span><span class="p">))</span>
<span class="linenos">73</span> <span class="k">else</span><span class="p">:</span>
<span class="linenos">74</span> <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
<span class="linenos">75</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">main</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">))</span>
<span class="linenos">76</span> <span class="n">loop</span><span class="o">.</span><span class="n">run_forever</span><span class="p">()</span>
</pre></div>
<!-- buffering system to let callback-based abstraction work without
requiring too many cumbersome callbacks
flow control system to manage the buffer
invoking the flow control system
but disable part of it -->
</div>
</div>
</div>
<div class="section" id="c-c-c-c-causality-breaker">
<span id="causality"></span><h2><a class="toc-backref" href="#id13">C-c-c-c-causality breaker</a></h2>
<p>Clearly, something, somewhere, has gone wrong. What happened? Some of
these issues with the asyncio streams-based API seem like
easily-fixable oversights (but why are there so many?); others seem to
point to something deeper. It's obviously not that the asyncio
developers are just "bad at their jobs" or something – compared to
curio, asyncio has more developers, with (as far as I can tell) a
higher average degree of network programming experience, and has been
put through considerably more real-world scrutiny and usage. So why
does curio perform so much better in this comparison? Is there a
generalization behind all these weird little problems?</p>
<p>I think so. I propose that the difference is that curio follows one of
the core principles of an async/await-native API, which is that it
should <em>respect causality</em>. Which is a term I just made up. But what I
mean by this is pretty basic: in Python, normally, if we write <tt class="docutils literal"><span class="pre">f();</span>
g()</tt>, then we know that <tt class="docutils literal">g</tt> won't start executing until after <tt class="docutils literal">f</tt>
has finished. If this is true then we say that <tt class="docutils literal">f</tt> "respects
causality". Causality is the fundamental property you rely on when
writing imperative code, and Python is an imperative language.</p>
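<p>To make "causality" concrete, here is a sketch of my own (not from any real library) showing how a callback-scheduling function breaks the <tt class="docutils literal"><span class="pre">f();</span> g()</tt> guarantee: <tt class="docutils literal">f</tt> returns immediately, but its actual work runs after <tt class="docutils literal">g</tt> has already finished:</p>

```python
import asyncio

log = []

def f(loop):
    # f() returns right away, but its real work happens later, in a
    # callback -- implicitly spawning a new logical thread of execution
    loop.call_soon(lambda: log.append("f's work"))

def g():
    log.append("g's work")

async def main():
    loop = asyncio.get_running_loop()
    f(loop)   # looks finished, but its work hasn't run yet
    g()       # runs before f's scheduled callback
    await asyncio.sleep(0)  # yield to the loop so pending callbacks run

asyncio.run(main())
print(log)  # g's work is recorded before f's: f did not respect causality
```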
<p id="glyph-unyielding">In Glyph's famous blog post <a class="reference external" href="https://glyph.twistedmatrix.com/2014/02/unyielding.html">Unyielding</a>, he makes
the point that if you have N logical threads concurrently executing a
routine with Y yield points, then there are N**Y possible execution
orders that you have to hold in your head. His point is that you can
reduce this complexity by using cooperative threading like callbacks
or async/await (= small Y) instead of pre-emptive threading (= large
Y).</p>
<p>Taking this further: Every time we schedule a callback – every time we
call <tt class="docutils literal">Future.add_done_callback</tt> or <tt class="docutils literal">transport.write</tt> or
<tt class="docutils literal">loop.call_later</tt> or <tt class="docutils literal">loop.add_reader</tt> or ... – then what we're
implicitly doing is spawning a <em>new logical thread of
execution</em>. Callback-based code has small Y, but large N. Which is
another way of saying: traditional callback APIs show a flagrant
disregard for causality. And this has infested even the async/await
parts of asyncio. Most of our problems above happened because we
started doing <tt class="docutils literal">g</tt> (reading the next chunk of data, shutting down the
event loop, ...) while <tt class="docutils literal">f</tt> (writing the previous chunk of data,
closing the socket, ...) looked like it had finished but was actually
still going.</p>
<p>Curio is different: <em>every</em> operation in curio is causal, except for
the explicit concurrency-spawning primitives <tt class="docutils literal">curio.spawn</tt> and
<tt class="docutils literal">curio.run_in_{thread,process,executor}</tt> [<strong>Edit</strong>: actually, I had
this wrong – <tt class="docutils literal">run_in_{thread,process,executor}</tt> are actually causal
as well!] Curio-style code has small Y <em>and</em> small N.</p>
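<p>Curio itself isn't needed to see the shape of this rule. Here's a rough analogue of my own, using asyncio's <tt class="docutils literal">Task</tt> API to stand in for <tt class="docutils literal">curio.spawn</tt>: every line is causal except the one that explicitly spawns:</p>

```python
import asyncio

events = []

async def work(label):
    events.append(label)

async def main():
    await work("first")       # causal: finished before the next line runs
    # The only non-causal step is an explicit spawn (curio.spawn's analogue):
    task = asyncio.create_task(work("second"))
    events.append("spawned")  # runs before "second" -- concurrency is visible
    await task                # explicit join point

asyncio.run(main())
print(events)
```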
<p>When it comes to reasoning about code, implicit concurrency
is... unhelpful. And callbacks by their nature are all about implicit
concurrency.</p>
<!-- the still image:
https://frinkiac.com/meme/S06E21/805420/m/IElOIFRISVMgW0FQSV0sIFdFIE9CRVkgVEhFCiBMQVdTIE9GIFtDQVVTQUxJVFldIQoKCgo= -->
<div class="figure align-right">
<a class="reference external image-reference" href="https://frinkiac.com/gif/S06E21/804369/805420/IElOIFRISVMgW0FQSV0sIFdFIE9CRVkgVEhFCiBMQVdTIE9GIFtDQVVTQUxJVFldIQoKCg=="><img alt="Still image from The Simpsons, with Homer saying: &quot;In this API, we obey the laws of causality!&quot;" src="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/causality-still.jpg" style="width: 350px;" /></a>
<p class="caption">Homer lays down the law.</p>
</div>
<p>Of course it's possible to write causal APIs in asyncio, and
non-causal APIs in curio. But the underlying platform defaults have a
major influence on what kind of APIs you'll tend to write, because
causality is compositional: if some function is implemented using only
causal subroutines, then it will necessarily also be causal. But if an
API calls non-causal subroutines internally, then it will also be
non-causal, unless it takes some explicit action to recover causality
– which can be quite difficult.</p>
<p>For example, here's the current implementation of
<tt class="docutils literal">asyncio.StreamWriter.write</tt>, which lives on the async/await layer
but inherits its non-causality from the callback-layer
<tt class="docutils literal">transport.write</tt>:</p>
<div class="highlight"><pre><span></span><span class="c1"># Actual asyncio.StreamWriter.write</span>
<span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
<p>We could (try to) make it causal by explicitly waiting for the write
to complete before returning:</p>
<div class="highlight"><pre><span></span><span class="c1"># Hypothetical improved StreamWriter.write</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_transport</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="hll"> <span class="k">await</span> <span class="bp">self</span><span class="o">.</span><span class="n">drain</span><span class="p">()</span>
</span></pre></div>
<p>(But note that this version is still non-causal in the presence of
cancellation – see <a class="reference internal" href="#timeouts-and-cancellation">Timeouts and cancellation</a> below.)</p>
<p>Similarly in curio we <em>could</em> define a non-causal write function,
equivalent to the asyncio version above:</p>
<div class="highlight"><pre><span></span><span class="c1"># Curio equivalent to StreamWriter.write</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">sendall_in_child_task</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">spawn</span><span class="p">(</span><span class="n">sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
</pre></div>
<p>We can even define a version that spawns a child task and then waits
for it, more-or-less equivalent to our hypothetical improved
<tt class="docutils literal">StreamWriter.write</tt>:</p>
<div class="highlight"><pre><span></span><span class="c1"># Curio equivalent to the improved StreamWriter.write</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">sendall_in_child_task</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="n">task</span> <span class="o">=</span> <span class="k">await</span> <span class="n">curio</span><span class="o">.</span><span class="n">spawn</span><span class="p">(</span><span class="n">sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
<span class="k">await</span> <span class="n">task</span><span class="o">.</span><span class="n">join</span><span class="p">()</span>
</pre></div>
<p>But this would be extremely weird. Curio's decision to eliminate
implicit concurrency and force all concurrency to start from an
explicit <tt class="docutils literal">await <span class="pre">curio.spawn(...)</span></tt> makes concurrency a rare thing
that you think about before using. It also has some more subtle
consequences that together mean that when you do need concurrency,
it's easier to manage:</p>
<ul class="simple">
<li><tt class="docutils literal">curio.spawn</tt> is an async function, which means that
<a class="reference external" href="https://lukasa.co.uk/2016/07/The_Function_Colour_Myth/">synchronous-colored</a> functions
cannot call it. Therefore, synchronous-colored functions are
<em>always</em> causal – when a synchronous function call returns, we know
its work has finished. Compare this to traditional callback-based
APIs, where it's common to write innocent-looking code like <tt class="docutils literal"><span class="pre">f();</span>
g()</tt> but where the actual execution of <tt class="docutils literal">f</tt> and <tt class="docutils literal">g</tt> ends up
overlapping. In curio you'd have to at least write <tt class="docutils literal">await <span class="pre">f();</span>
g()</tt>.</li>
<li>In asyncio, there are many different representations of logically
concurrent threads of execution – <tt class="docutils literal">loop.add_reader</tt> callbacks,
<tt class="docutils literal">asyncio.Task</tt> callstacks, <tt class="docutils literal">Future</tt> callbacks, etc. In curio,
there is only one kind of object that can represent a logical thread
– <tt class="docutils literal">curio.Task</tt> – and this allows us to handle them in a uniform
way. We'll see below that this greatly simplifies <a class="reference internal" href="#event-loop-lifecycle-management">Event loop
lifecycle management</a> and <a class="reference internal" href="#context-passing-task-local-storage">Context passing / task-local storage</a>.</li>
<li>Because the spawn code is routed through <tt class="docutils literal">await</tt>, the event loop
always knows not just what child task is being spawned, but which
parent task is doing the spawning (i.e., it's whichever one emitted
the magic <tt class="docutils literal">yield</tt>). Currently curio uses this to propagate
task-local context from parent tasks to child tasks; in the future
it could potentially track and expose these relationships to allow
for powerful operations like "cancel this task <em>and all of its child
tasks, recursively</em>". I'm not sure if being able to explicitly
reason about and manipulate trees of worker tasks like this will
ultimately turn out to be useful, but <a class="reference external" href="http://erlang.org/documentation/doc-4.9.1/doc/design_principles/sup_princ.html">it opens up interesting
possibilities</a>.</li>
</ul>
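<p>The first point above is enforced by the language itself, not just by convention; a quick way to check (my own illustration):</p>

```python
def is_valid_python(source):
    # compile() raises SyntaxError for 'await' outside an 'async def'
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

sync_colored = "def f():\n    await spawn_something()\n"
async_colored = "async def f():\n    await spawn_something()\n"

print(is_valid_python(sync_colored))   # False: a sync function can't await,
                                       # so it can never call spawn
print(is_valid_python(async_colored))  # True
```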
<p>But of course the main advantage of curio's causal-by-default
approach is that we can write straightforward imperative code like our
example proxy server and get something that works. These are extra
benefits on top of that.</p>
<div class="section" id="who-needs-causality-really">
<h3><a class="toc-backref" href="#id14">Who needs causality, really?</a></h3>
<p>I'm throwing around these fancy terms like "causality", but does it
actually matter in the real world? Here I'll take what we learned from
studying the toy proxy example, and see how some production codebases
handle these issues.</p>
<div class="section" id="http-servers">
<h4><a class="toc-backref" href="#id15">HTTP servers</a></h4>
<p>Here's a simple stress test for an HTTP server: send it lots of GET
requests, without ever reading the responses:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/constipation-attacks/get-flood.py" class="reference external">constipation-attacks/get-flood.py</a>
</div>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span><span class="o">,</span> <span class="nn">socket</span>
<span class="n">host</span><span class="p">,</span> <span class="n">port</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="k">with</span> <span class="n">socket</span><span class="o">.</span><span class="n">create_connection</span><span class="p">((</span><span class="n">host</span><span class="p">,</span> <span class="n">port</span><span class="p">))</span> <span class="k">as</span> <span class="n">sock</span><span class="p">:</span>
<span class="n">get</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">"GET / HTTP/1.1</span><span class="se">\r\n</span><span class="s2">Host: "</span> <span class="o">+</span> <span class="n">host</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"ascii"</span><span class="p">)</span> <span class="o">+</span> <span class="sa">b</span><span class="s2">"</span><span class="se">\r\n\r\n</span><span class="s2">"</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="n">sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">get</span><span class="p">)</span>
</pre></div>
<p>It turns out that if you point this at a server running twisted.web,
then our client will upload a few megabytes of data and <a class="reference external" href="https://twistedmatrix.com/trac/ticket/8868">then the
server will crash</a>. (Here's an <a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/constipation-attacks/twisted-server.py">example
server</a> in case you
want to try at home; make sure to hit control-C before your laptop
starts Death Swapping.) This is a backpressure bug: the server reads
the first GET, generates the response, and writes that to its send
buffer. Where it sits, because the client isn't reading. But the
server isn't paying attention to the buffer, so as far as it's
concerned the data has been sent, and it goes on to process the next
request, generates another response, and appends that to the send
buffer. Repeat until the send buffer swallows all available memory and
the server falls over. We shouldn't exactly panic over this – this is
just a denial-of-service attack, and it's impossible to fully
defend against DoS. What makes this a bit embarrassing is the degree
of amplification: if there's some URL on a server that returns, say, a
100 KB response body, then each ~1 MB of upload from a single client
will permanently swallow ~2 GB of memory.</p>
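<p>The back-of-the-envelope arithmetic behind that amplification figure, with my own assumed size for the request line:</p>

```python
# Assumed sizes -- roughly what the flooding client sends and what the
# hypothetical URL returns:
request_size = 40            # bytes per pipelined GET request
response_size = 100 * 1024   # the hypothetical 100 KB response body

requests_per_mb = 10**6 // request_size        # ~25,000 requests per MB uploaded
buffered_bytes = requests_per_mb * response_size

print(f"{buffered_bytes / 10**9:.1f} GB buffered per MB uploaded")
```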
<p>Aiohttp (<a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/constipation-attacks/aiohttp-server.py">example server</a>) is more robust
against this particular attack – it will also crash eventually, but
doesn't show the severe amplification. The reason is that aiohttp
calls <tt class="docutils literal">StreamWriter.drain</tt> after processing each request, so the
send buffer can't grow to unbounded sizes. The reason it still crashes
eventually is a bit weird: it <a class="reference external" href="https://github.com/KeepSafe/aiohttp/issues/1368">never applies back-pressure on the
receive side</a>,
even though that's the case that <tt class="docutils literal">StreamReader</tt> will actually handle
for you automatically... but aiohttp turns out to use the protocol API
and its own ad hoc buffering code instead, which happily accepts and
queues infinitely many request bodies without ever pushing back. So
the amplification factor here is just 1x – this is a bug, but a
relatively minor one. If an attacker is willing to upload a few
gigabytes of data to an aiohttp server, they can crash it – but there
are probably other things an attacker can do with a few gigabytes of
data that are even worse (e.g. opening tens of thousands of
connections), so meh.</p>
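<p>The drain-after-each-response pattern that keeps aiohttp's send buffer bounded can be sketched as a toy handler (my own code, not aiohttp's actual implementation):</p>

```python
import asyncio

async def handle(reader, writer):
    # Toy request loop: treat each received line as one "request". The
    # drain() after each response blocks whenever the send buffer is over
    # its high-water mark, so a non-reading client can't grow it unboundedly.
    while True:
        line = await reader.readline()
        if not line:
            break  # EOF: client went away
        writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        await writer.drain()   # <-- the backpressure point
    writer.close()

async def demo():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET / HTTP/1.1\r\n")
    await writer.drain()
    response = await reader.readexactly(40)  # the 40-byte toy response
    writer.close()
    server.close()
    await server.wait_closed()
    return response

resp = asyncio.run(demo())
print(resp)
```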
<p>On the other hand, remember the problems we had with data being lost
at shutdown? It turns out that aiohttp has a "graceful shutdown" mode,
which gives all current connections time to finish before exiting –
but <a class="reference external" href="https://github.com/KeepSafe/aiohttp/issues/1369">as far as I can tell</a> this uses
<tt class="docutils literal">StreamWriter.close</tt> and doesn't disable the low-water mark stuff,
so it actually has no way to know when the connections have finished
closing. I haven't verified this experimentally, but I strongly
suspect that the "graceful" shutdown is randomly chopping off some of
those connections before they're finished.</p>
<p>If we turn to look at the <a class="reference external" href="https://github.com/njsmith/h11/blob/7f8f57f2b9c91475e2403d784dd2721e53fdd2fa/examples/curio-server.py">toy curio-based HTTP server that I
wrote for some docs</a>
(yes, this says something about the relative maturity of the
twisted/asyncio/curio ecosystems), then it avoids all of these
problems. (Well, technically, it doesn't implement graceful shutdown,
but if it did then it wouldn't run into the <tt class="docutils literal">StreamWriter.close</tt>
bugs.) Now, you've probably figured out by now that I'm the kind of
paranoid person who worries about these things, so I won't pretend
that I didn't think about this at all while writing that code. But
when I thought about it, I realized that with curio, the most obvious
naive implementation actually was correct, so I didn't need to do
anything special.</p>
<p>With twisted and asyncio, you have to do something special. And
everyone makes mistakes.</p>
</div>
<div class="section" id="websocket-servers">
<h4><a class="toc-backref" href="#id16">Websocket servers</a></h4>
<p>Consider a websocket server that accepts connections and then sends
the client an ongoing, infinite stream of messages. This is a pretty
common configuration – examples would include IRC-style chat apps, or
a live twitter feed viewer. Now imagine what happens if we connect to
such a websocket and then go idle, never reading:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/constipation-attacks/constipate-websocket.py" class="reference external">constipation-attacks/constipate-websocket.py</a>
</div>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span><span class="o">,</span> <span class="nn">asyncio</span><span class="o">,</span> <span class="nn">aiohttp</span>
<span class="n">URL</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">constipate</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
<span class="n">session</span> <span class="o">=</span> <span class="n">aiohttp</span><span class="o">.</span><span class="n">ClientSession</span><span class="p">()</span>
<span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="o">.</span><span class="n">ws_connect</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">ws</span><span class="p">:</span>
<span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">100000000</span><span class="p">)</span>
<span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">constipate</span><span class="p">(</span><span class="n">URL</span><span class="p">))</span>
</pre></div>
<p>A naive server that doesn't respond to backpressure will keep trying
to send us messages, buffer them in memory, and eventually crash. This
attack effectively has an amplification factor of infinity: all we
have to do is send two packets to set up the connection, and then we
can walk away while the server slowly leaks to death. In fact, this
can easily happen by accident when a client establishes a connection
and then crashes or otherwise goes offline.</p>
<p>Do any Python websocket servers implement this naive buffering
algorithm? As far as I can tell, yes, by default, they all do. In
fact, they all use a very similar API for sending messages that makes
it difficult to avoid: they provide some kind of synchronous-colored
<tt class="docutils literal">write_message</tt> method that queues a message and then returns
immediately, and it causes the same kind of troubles that we had with
asyncio's synchronous-colored <tt class="docutils literal">write</tt> method above. [<strong>Edit:</strong> it
turns out there's an exception that I missed – the <a class="reference external" href="https://pypi.org/pypi/websockets">websockets</a> package provides exactly the API
that I advocate. Awesome!]</p>
<p>More specifically:</p>
<ul class="simple">
<li>aiohttp (<a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/constipation-attacks/aiohttp-websocket.py">example server</a>): Doesn't
respond to backpressure from slow clients by default. AFAICT <a class="reference external" href="https://github.com/KeepSafe/aiohttp/issues/1367">has no
API for doing so</a>, so handling
the disappearing-client case is currently not possible without
somehow changing how the protocol is used. The API for receiving
messages does use <tt class="docutils literal">await</tt> and thus should be able to transmit
backpressure upstream to clients who send too fast (though I haven't
tested whether this works myself).</li>
<li>autobahn (<a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/constipation-attacks/autobahn-websocket.py">example server</a>): Doesn't
respond to or transmit backpressure by default. When running on
twisted, it is <a class="reference external" href="https://github.com/crossbario/autobahn-python/blob/224370cd9dda312fc0583b61ed416b3f4d0e00d0/autobahn/twisted/websocket.py#L174-L183">possible to get backpressure notifications for
slow clients</a>
using the twisted API I mentioned earlier. When running on asyncio,
<a class="reference external" href="https://github.com/crossbario/autobahn-python/blob/224370cd9dda312fc0583b61ed416b3f4d0e00d0/autobahn/asyncio/websocket.py#L185-L186">this seems to be unimplemented</a>,
so responding to backpressure is impossible. Doesn't have any API
for transmitting backpressure upstream to clients who are sending
data too quickly.</li>
<li>tornado (<a class="reference external" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/constipation-attacks/tornado-websocket.py">example server</a>): Doesn't respond to
or transmit backpressure by default. Has APIs available that allow a
<a class="reference external" href="http://python-tornado.narkive.com/rO4TvIxL/websockets-and-buffering">sufficiently dedicated</a>
programmer to implement both upstream and downstream
backpressure. (Tornado FWIW is also immune to the GET flood attack
described above – they've clearly put a lot of thought into these
kinds of issues.)</li>
</ul>
<p>Even for the libraries where handling websocket backpressure is
possible in theory, I couldn't find any mention of this in their
examples, so it's doubtful that there are many deployments that
actually take advantage of this.</p>
<p>My goal isn't to shame the authors of these packages – obviously
they're helping people solve real problems right now, and all I have
to offer is some hypothetically-improved vaporware! Rather, I want to
point out that there's a whole set of nasty edge-cases that are very
difficult to handle when using the conventions of traditional
callback-based APIs. But if these servers had exposed a
causality-respecting API for websockets – one with methods like
<tt class="docutils literal">async def receive_message</tt> and <tt class="docutils literal">async def write_message</tt> – then
these issues would simply go away. It's very rare that we can solve
genuinely hard problems like this just by changing some API
conventions – we should be excited!</p>
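<p>To make the contrast concrete, here's a toy sketch of what a causality-respecting message API buys you. This is not a real websocket implementation – just a bounded buffer with <tt class="docutils literal">async def</tt> methods on both sides – but it shows why backpressure comes for free with this style: a fast sender simply blocks in <tt class="docutils literal">await send_message</tt> until the receiver catches up.</p>

```python
import asyncio

class MessageChannel:
    """Toy stand-in for a websocket connection (NOT a real one): both
    directions are async methods, so backpressure propagates through
    the normal await machinery."""

    def __init__(self, maxsize=2):
        # Bounded buffer: when the receiver is slow, senders wait here
        # instead of queueing messages without limit.
        self._queue = asyncio.Queue(maxsize=maxsize)

    async def send_message(self, msg):
        await self._queue.put(msg)  # blocks while the buffer is full

    async def receive_message(self):
        return await self._queue.get()

async def demo():
    chan = MessageChannel(maxsize=2)
    sent = []

    async def fast_sender():
        for i in range(5):
            await chan.send_message(i)
            sent.append(i)

    task = asyncio.ensure_future(fast_sender())
    await asyncio.sleep(0)  # let the sender run until it blocks
    stalled_at = len(sent)  # it stalled as soon as the buffer filled
    received = [await chan.receive_message() for _ in range(5)]
    await task
    return stalled_at, received

loop = asyncio.new_event_loop()
print(loop.run_until_complete(demo()))
loop.close()
```

<p>With a two-message buffer, the sender gets two messages ahead and then stalls until the receiver drains the queue – no unbounded memory growth, and no special backpressure API you have to remember to call.</p>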
<p>The tornado API is particularly instructive, because it's <em>so close</em>
to what I recommend: their <tt class="docutils literal">write_message</tt> method <a class="reference external" href="http://www.tornadoweb.org/en/stable/websocket.html#tornado.websocket.WebSocketHandler.write_message">returns a future</a>.
This means that in the asyncio/tornado async/await integration that
allows <tt class="docutils literal">await</tt> to be applied to <tt class="docutils literal">Future</tt>s, one <em>can</em> write code
like <tt class="docutils literal">await <span class="pre">websock.write_message(...)</span></tt> and get correct backpressure
handling. But, if we forget, and write <tt class="docutils literal"><span class="pre">websock.write_message(...)</span></tt>
instead... then the code will still seem to work! So despite the
similarity, this isn't really a causality-preserving method, the way
it would be if it were implemented as an async method that <em>required</em>
<tt class="docutils literal">await</tt> to run. It's a method that spawns a concurrent thread of
execution to do the actual work, while also providing an option to
join that thread. Much better than nothing! But the end result is that
if you look at their official examples, <a class="reference external" href="https://github.com/tornadoweb/tornado/blob/4d783c641b815d7a20c41c1f0d3511bebb15bc97/demos/websocket/chatdemo.py#L80">they don't actually check the
return value</a>.</p>
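<p>To see why a future-returning method is so easy to misuse, here's a minimal asyncio mock-up of the same API <em>shape</em> (hypothetical names – this is not tornado's actual code): forgetting the <tt class="docutils literal">await</tt> doesn't raise anything; it just silently discards the backpressure information.</p>

```python
import asyncio

flushed = []

def write_message(data, *, loop):
    """Tornado-style shape: start the send in the background, and
    return a Future that fires once the data has actually been flushed."""
    fut = loop.create_future()

    def _flush():
        flushed.append(data)
        fut.set_result(None)

    loop.call_later(0.01, _flush)  # pretend the flush takes a moment
    return fut

async def demo(loop):
    await write_message(b"careful", loop=loop)  # correct: wait for the flush
    write_message(b"oops", loop=loop)           # forgotten await: no error!

loop = asyncio.new_event_loop()
loop.run_until_complete(demo(loop))
print(flushed)  # only b"careful" has actually been flushed; the b"oops"
                # send is still "in flight" when we move on
loop.close()
```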
<p>Of course tornado itself can't easily fix this, because they have to
worry about backcompat and Python 2 support. But looking to the
future, if we want to habitually write reliable software without
breaking our brains, then causality needs to be opt-out, not opt-in,
and the great thing about async/await is that it makes causal APIs
just as easy to use as non-causal ones.</p>
</div>
</div>
</div>
<div class="section" id="other-challenges-for-hybrid-apis">
<h2><a class="toc-backref" href="#id17">Other challenges for hybrid APIs</a></h2>
<div class="section" id="timeouts-and-cancellation">
<h3><a class="toc-backref" href="#id18">Timeouts and cancellation</a></h3>
<p>Timeouts are important, because they're ubiquitous – any code that
does I/O and might be run unattended had better make sure that there
are timeouts covering every single I/O operation. Yet, in many I/O
frameworks, handling timeouts correctly is extremely difficult. For
example, if you're doing synchronous I/O using the stdlib <tt class="docutils literal">socket</tt>
module, then you get the ability to set a timeout on each individual
send and receive operation – but this is far too low-level.</p>
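<p>For example (a minimal stdlib sketch): each <tt class="docutils literal">recv</tt> gets its own timeout budget, so an operation made of many small reads has no overall time bound at all.</p>

```python
import socket

# A connected pair of sockets, so we don't need the network:
a, b = socket.socketpair()
a.settimeout(0.1)  # applies to *each* send/recv call separately

try:
    a.recv(1)  # nothing to read, so this one call times out after 0.1s
except socket.timeout:
    print("recv timed out")

# But a peer that dribbles one byte every 0.09 seconds resets the clock
# on every call, so a "read 1000 bytes" loop could run for ~90 seconds
# without any single recv ever timing out.
a.close()
b.close()
```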
<p>Curio instead offers a context manager that imposes a timeout on
everything inside it. This means that we can straightforwardly take a
function that performs some complex operation with multiple types of
I/O, and impose a timeout on it as a whole:</p>
<div class="highlight"><pre><span></span><span class="c1"># A function that does some complex I/O</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">upload_big_file_over_http</span><span class="p">(</span><span class="n">sock</span><span class="p">):</span>
    <span class="k">await</span> <span class="n">sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="sa">b</span><span class="s2">"POST /upload HTTP/1.1</span><span class="se">\r\n</span><span class="s2">"</span>
                       <span class="sa">b</span><span class="s2">"Host: example.com</span><span class="se">\r\n</span><span class="s2">"</span>
                       <span class="sa">b</span><span class="s2">"Expect: 100-continue</span><span class="se">\r\n</span><span class="s2">"</span>
                       <span class="sa">b</span><span class="s2">"Content-Length: 10000000</span><span class="se">\r\n\r\n</span><span class="s2">"</span><span class="p">)</span>
    <span class="c1"># Read the server's interim response -- either telling us</span>
    <span class="c1"># to go ahead, or giving a final rejection:</span>
    <span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">read_http_response</span><span class="p">(</span><span class="n">sock</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span> <span class="o">==</span> <span class="mi">100</span><span class="p">:</span> <span class="c1"># 100 Continue</span>
        <span class="k">await</span> <span class="n">sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="sa">b</span><span class="s2">"x"</span> <span class="o">*</span> <span class="mi">10000000</span><span class="p">)</span>
        <span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">read_http_response</span><span class="p">(</span><span class="n">sock</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">response</span>

<span class="c1"># Imposing a timeout on it from outside</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">sock</span> <span class="o">=</span> <span class="o">...</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">curio</span><span class="o">.</span><span class="n">timeout_after</span><span class="p">(</span><span class="mi">60</span><span class="p">):</span> <span class="c1"># 1 minute</span>
        <span class="k">await</span> <span class="n">upload_big_file_over_http</span><span class="p">(</span><span class="n">sock</span><span class="p">)</span>
</pre></div>
<p>This approach is really brilliant. In the traditional system, every
function had to manually implement complex timeout logic and expose it
as part of its API. Here, the code living inside the context manager
does need to handle cancellation correctly, but otherwise can be
completely oblivious to timeouts. We use exactly the same API to
impose a timeout on a primitive <tt class="docutils literal">send</tt> call and on a complex RPC
operation. Timeouts can be nested. It's great.</p>
<!-- twisted has
something `conceptually identical
<https://github.com/twisted/twisted/pull/145/files>`__ (though with a
typically twisted API) and -->
<p>Of course this idea isn't unique to curio. You can implement the
<a class="reference external" href="https://github.com/aio-libs/async-timeout/blob/master/async_timeout/__init__.py">context manager style API in asyncio</a>
too. But – you can probably guess where I'm going to go with this –
handling timeouts and cancellations in a hybrid callbacks+async/await
system creates a number of unique and unnecessary challenges.</p>
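<p>The core of such a context manager is only a few lines. Here's a simplified sketch built on task cancellation – an illustration of the idea, not async-timeout's actual code (it uses the Python 3.7+ asyncio spellings and ignores subtleties like nested cancellation):</p>

```python
import asyncio

class timeout_after:
    """Sketch of a scoped timeout: cancel the enclosing task at the
    deadline, then convert that cancellation into a TimeoutError."""

    def __init__(self, seconds):
        self._seconds = seconds
        self._timed_out = False

    async def __aenter__(self):
        self._task = asyncio.current_task()
        loop = asyncio.get_running_loop()
        # Arrange for the enclosing task to be cancelled at the deadline:
        self._handle = loop.call_later(self._seconds, self._fire)
        return self

    def _fire(self):
        self._timed_out = True
        self._task.cancel()

    async def __aexit__(self, exc_type, exc, tb):
        self._handle.cancel()
        if self._timed_out and exc_type is asyncio.CancelledError:
            raise TimeoutError from None  # our cancellation: report a timeout

async def demo():
    try:
        async with timeout_after(0.05):
            await asyncio.sleep(10)  # blows way past the deadline
    except TimeoutError:
        return "timed out"
    return "finished"

print(asyncio.run(demo()))  # prints: timed out
```

<p>Everything inside the <tt class="docutils literal">async with</tt> block – however many I/O operations it performs – is covered by the one deadline, which is exactly the property the per-operation <tt class="docutils literal">timeout=</tt> style can't give you.</p>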
<p>First, since we can't assume that everyone is using async/await, our
hybrid system needs to have some alternative, redundant system for
handling timeouts and cancellations in callback-using code – in
asyncio this is the <tt class="docutils literal">Future</tt> cancellation system, and there isn't
really a callback-level timeout system so you have to roll your
own. In curio, there are no callbacks, so there's no need for a second
system. In fact, in curio there's <em>only</em> the one way to express
timeouts – <tt class="docutils literal">timeout=</tt> kwargs simply don't exist. So we can focus
our energies on making this one system <a class="reference external" href="https://github.com/dabeaz/curio/issues/82#issuecomment-257078638">as awesome as possible</a>.</p>
<p>Then, once you have two systems, you have to figure out how they
interact. This is not trivial. For example, I suspect most people
would find the behavior of this asyncio program surprising:</p>
<!-- silly hard-coded styles from njs_code_include, will need changing
if theme changes -->
<div style="text-align: right; margin-top: -1em; font-size: 80%">
Download source: <a href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/spooky-cancellation-at-a-distance.py" class="reference external">spooky-cancellation-at-a-distance.py</a>
</div>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">asyncio</span>
<span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">child</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">fut</span><span class="p">,</span> <span class="n">event</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="si">{}</span><span class="s2"> started"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="p">))</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">event</span><span class="o">.</span><span class="n">set</span><span class="p">()</span>
        <span class="k">await</span> <span class="n">fut</span>
    <span class="k">except</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">CancelledError</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="si">{}</span><span class="s2"> cancelled"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="p">))</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">fut</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Future</span><span class="p">()</span>
    <span class="c1"># Start two tasks, and give them a chance to block on the same future.</span>
    <span class="n">event1</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Event</span><span class="p">()</span>
    <span class="n">task1</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">child</span><span class="p">(</span><span class="s2">"task 1"</span><span class="p">,</span> <span class="n">fut</span><span class="p">,</span> <span class="n">event1</span><span class="p">))</span>
    <span class="k">await</span> <span class="n">event1</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
    <span class="n">event2</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Event</span><span class="p">()</span>
    <span class="n">task2</span> <span class="o">=</span> <span class="n">loop</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">child</span><span class="p">(</span><span class="s2">"task 2"</span><span class="p">,</span> <span class="n">fut</span><span class="p">,</span> <span class="n">event2</span><span class="p">))</span>
    <span class="k">await</span> <span class="n">event2</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
    <span class="c1"># Cancel task1...</span>
    <span class="n">task1</span><span class="o">.</span><span class="n">cancel</span><span class="p">()</span>
    <span class="c1"># ...then block on task2.</span>
    <span class="k">await</span> <span class="n">task2</span>
<span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</pre></div>
<p>(It prints: task 1 started / task 2 started / task 1 cancelled / task
2 cancelled. Note that task 2 was cancelled too, even though we only
cancelled task 1. Note also that if we comment out the calls to
<tt class="docutils literal">event.wait()</tt>, then the program hangs
instead (which is probably the outcome we expected in the first place
– but we might not have expected those lines to affect the result).
Note, finally, that if we move the cancellation to just after the call
to <tt class="docutils literal">event1.wait()</tt>, before spawning <tt class="docutils literal">task2</tt>, then the program does
<em>not</em> hang – so we can't avoid this by checking for multiple waiters
when propagating cancellations from tasks-&gt;futures.)</p>
<p>The fundamental problem here is that <tt class="docutils literal">Future</tt>s often have a unique
consumer but might have arbitrarily many, and that <tt class="docutils literal">Future</tt>s are
stuck half-way between being an abstraction representing
<em>communication</em> and being an abstraction representing
<em>computation</em>. The end result is that when a task is blocked on a
<tt class="docutils literal">Future</tt>, <tt class="docutils literal">Task.cancel</tt> simply has no way to know whether that
future should be considered to be "part of" the task. So it has to
guess, and inevitably its guess will sometimes be wrong. (An
interesting case where this could arise in real code would be two
<tt class="docutils literal">asyncio.Task</tt>s that both call <tt class="docutils literal">await writer.drain()</tt> on the same
<tt class="docutils literal">StreamWriter</tt>; under the covers, they end up blocked on the same
<tt class="docutils literal">Future</tt>.) In curio, there are no <tt class="docutils literal">Future</tt>s or callback chains,
so this ambiguity never arises in the first place.</p>
<p>Next, there's the problem of actually implementing cancellation. For
callback-based operations, this is certainly possible. It's just
really difficult to do, and every cancellable operation has to
carefully implement it from scratch. And in practice, basic primitives
like <tt class="docutils literal">transport.write</tt> don't support cancellation, which makes it
very difficult to write cancellation-safe code on top of them. For
example, here's the asyncio version of our HTTP upload example:</p>
<div class="highlight"><pre><span></span><span class="c1"># asyncio version</span>
<span class="c1"># Pretend that StreamWriter has been fixed to attempt to expose a</span>
<span class="c1"># causal API -- this example shows that that still isn't enough :-(</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">almost_causal_write</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
    <span class="n">writer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">writer</span><span class="o">.</span><span class="n">drain</span><span class="p">()</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">upload_big_file_over_http</span><span class="p">(</span><span class="n">reader</span><span class="p">,</span> <span class="n">writer</span><span class="p">):</span>
    <span class="k">await</span> <span class="n">almost_causal_write</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span>
                              <span class="sa">b</span><span class="s2">"POST /upload HTTP/1.1</span><span class="se">\r\n</span><span class="s2">"</span>
                              <span class="sa">b</span><span class="s2">"Host: example.com</span><span class="se">\r\n</span><span class="s2">"</span>
                              <span class="sa">b</span><span class="s2">"Expect: 100-continue</span><span class="se">\r\n</span><span class="s2">"</span>
                              <span class="sa">b</span><span class="s2">"Content-Length: 10000000</span><span class="se">\r\n\r\n</span><span class="s2">"</span><span class="p">)</span>
    <span class="c1"># Read the server's interim response telling us whether</span>
    <span class="c1"># to go ahead:</span>
    <span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">read_http_response</span><span class="p">(</span><span class="n">reader</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">response</span><span class="o">.</span><span class="n">status</span> <span class="o">==</span> <span class="mi">100</span><span class="p">:</span> <span class="c1"># 100 Continue</span>
        <span class="k">await</span> <span class="n">almost_causal_write</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span> <span class="sa">b</span><span class="s2">"x"</span> <span class="o">*</span> <span class="mi">10000000</span><span class="p">)</span>
        <span class="n">response</span> <span class="o">=</span> <span class="k">await</span> <span class="n">read_http_response</span><span class="p">(</span><span class="n">reader</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">response</span>

<span class="c1"># Imposing a timeout on it from outside</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">reader</span><span class="p">,</span> <span class="n">writer</span> <span class="o">=</span> <span class="o">...</span>
    <span class="kn">from</span> <span class="nn">async_timeout</span> <span class="kn">import</span> <span class="n">timeout</span>
    <span class="k">with</span> <span class="n">timeout</span><span class="p">(</span><span class="mi">60</span><span class="p">):</span> <span class="c1"># 1 minute</span>
        <span class="k">await</span> <span class="n">upload_big_file_over_http</span><span class="p">(</span><span class="n">reader</span><span class="p">,</span> <span class="n">writer</span><span class="p">)</span>
</pre></div>
<p>There's an excellent chance that the timeout will fire after we've
started the 10 MB upload, and are blocked in the <tt class="docutils literal">drain</tt> call inside
<tt class="docutils literal">almost_causal_write</tt>. If this happens, then
<tt class="docutils literal">upload_big_file_over_http</tt> will return early but <em>the upload will
continue</em>, because it's happening in a logically concurrent thread!
And note that this is really just another special case of
non-causality. Our <tt class="docutils literal">almost_causal_write</tt> function does manage to be causal
so long as it completes normally. But if it gets cancelled, then from
the perspective of the caller it has returned (by raising an
exception) – yet the underlying operation is still going, which is
the definition of non-causal behavior.</p>
<p>There are ways to work around this, but in general, any
function that calls any cancellation-unsafe functions is also going to
be cancellation-unsafe by default, and it's hard to write much code in
asyncio without calling unsafe functions like <tt class="docutils literal">transport.write</tt>. I'm
not even sure what a cancellation-safe version of <tt class="docutils literal">transport.write</tt>
would look like :-(.</p>
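<p>One partial workaround – a hypothetical helper, not part of asyncio – is to treat cancellation during the flush as fatal to the connection: we can't un-send the buffered bytes, but we can at least stop the concurrent send so the caller's view stays causal.</p>

```python
import asyncio

async def cancel_safe_write(writer, data):
    """Hypothetical wrapper around an asyncio StreamWriter: if we're
    cancelled while waiting for the flush, close the transport rather
    than leave the send running in the background."""
    writer.write(data)  # hands the bytes to a concurrent callback chain
    try:
        await writer.drain()
    except asyncio.CancelledError:
        # Closing the transport stops the background send -- at the
        # cost of killing the connection entirely.
        writer.close()
        raise
```

<p>This restores causality in the sense that once the call raises, nothing keeps happening behind the caller's back – but only by destroying the connection, which is a measure of how awkward the problem is.</p>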
<p>In curio, supporting cancellation isn't free, but it's much much
easier: all primitive operations are cancellation-safe, so we start
from a solid foundation, and then beyond that it basically comes down
to writing code that properly cleans up in the face of exceptions. And
this kind of exception-safety is a local property we can check on a
function-by-function basis, is something we have to worry about
anyway, and is often easy to handle because we can let Python's tools
like context managers do the heavy lifting.</p>
</div>
<div class="section" id="event-loop-lifecycle-management">
<h3><a class="toc-backref" href="#id19">Event loop lifecycle management</a></h3>
<p>Remember how up above in our initial discussion of the different
example programs, even before we got into the bugs in the asyncio
versions, we noted that the <a class="reference internal" href="#curio-shutdown">curio version was simpler</a> because it was able to take advantage of curio's
system for shutting down the event loop when all non-daemonic tasks
were complete? This system is possible because curio restricts itself
to managing only full-fledged async function callstacks, not arbitrary
callback chains. This means that curio has a complete high-level
representation of what tasks are running, and a standard place to
store metadata like which ones should be considered daemonic. And this
isn't just a convenience: it also means that curio can guarantee that
every task's cleanup code (e.g. <tt class="docutils literal">finally</tt> blocks) is executed
before shutting down.</p>
<p>Asyncio doesn't really have any way to do the equivalent, even in
principle. It can't tell whether a <tt class="docutils literal">loop.add_reader</tt> callback is
"daemonic", i.e., whether it's associated with providing some
background service like logging or whether it's some specific ongoing
callback chain that's on the application's critical path; if it is a
background service it has no way to tell whether it needs to be
cleaned up somehow when the loop shuts down; and even if it does need
some kind of cleanup, there's no way for the loop to "cancel" that
callback chain and tell it to run its cleanup. Because callback-based
execution threads are implicit, not reified, the user is left to keep
track of these kinds of things manually. And any code that uses
asyncio's protocol/transport layer is implicitly creating these kinds
of anonymous callback chains. Obviously it's possible despite all this
to write asyncio programs that correctly handle the event loop
lifetime; the point is that because of asyncio's choice to use a
hybrid design, it can't provide much help in solving these issues, so
you're stuck doing it manually each time instead.</p>
</div>
<div class="section" id="getting-in-touch-with-your-event-loop">
<h3><a class="toc-backref" href="#id20">Getting in touch with your event loop</a></h3>
<p>It's often useful to have multiple event loops in the same
process. For example, we might want to spawn several OS-level threads
and run an event loop in each, or we might want to give each of our
tests its own event loop, so that we don't have stray callbacks left
behind by test A firing while we're running test B. And this means
that any code that does I/O has to have some mechanism to figure out
which event loop it should be using.</p>
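<p>For example, the per-test bookkeeping in asyncio looks something like this (a sketch of the ritual, not any particular test framework's actual code):</p>

```python
import asyncio

def run_isolated_test(async_test_fn):
    """Run one async test on its own event loop, so callbacks it
    leaves behind can't fire during some later test."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)  # so code calling get_event_loop() finds it
    try:
        return loop.run_until_complete(async_test_fn())
    finally:
        asyncio.set_event_loop(None)  # don't leak the loop to the next test
        loop.close()

async def my_test():
    await asyncio.sleep(0)
    return "ok"

print(run_isolated_test(my_test))  # prints: ok
```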
<img alt="Modified scan of a page from the children's book &quot;Are you my mother?&quot;. Text reads: &quot;Are you my event loop?&quot; the baby bird asked a cow. &quot;How could I be your event loop?&quot; said the cow. &quot;I am a cow.&quot;" class="align-left" src="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/are-you-my-event-loop.png" style="width: 250px;" />
<p>In principle, if you're using async/await, this should be trivial:
async functions by definition have to be supervised by some sort of
coroutine runner, and <tt class="docutils literal">await</tt> provides dedicated syntax for talking
to this supervisor. In curio, the supervisor is the event loop itself;
in asyncio, technically the supervisor isn't the event loop itself but
rather an <tt class="docutils literal">asyncio.Task</tt> object, but <tt class="docutils literal">asyncio.Task</tt> objects hold
a reference to the correct event loop. So if you're using async/await,
this ambient event loop reference is always present and accessible in
principle, and the language runtime makes sure that it's passed
implicitly to callees without any effort on your part.</p>
<p>But asyncio assumes that you might <em>not</em> be using async/await, so
almost none of its APIs take advantage of this. Instead, it gives you
two options. Option 1 is that you can manually pass along a reference
to the correct event loop every time you call a function that might
(recursively) do I/O. This is unpopular because it takes work and
clutters up your code. (You can see a bit of this in the proxy
examples above.) Plus it's frustrating since, as we said above, if
you're a sensible person who uses async/await then this is forcing you
to do redundant work that the runtime is already doing for you. Option
2 is to be lazy, and to grab the global event loop whenever you need
it (which asyncio makes very easy to do – all asyncio API functions
will default to using the global event loop unless you explicitly tell
them otherwise). Of course the problem with this is that as we saw
above, you can't assume that there is just one global event loop,
which is why instead of having a single global event loop, the asyncio
API instead allows you to specify a global object implementing the
<tt class="docutils literal">AbstractEventLoopPolicy</tt> <a class="reference external" href="https://docs.python.org/3/library/asyncio-eventloops.html#asyncio.AbstractEventLoopPolicy">interface</a>
which encapsulates a strategy for introspecting the current context to
determine which global event loop should be returned from each call to
<tt class="docutils literal">get_event_loop</tt>. So, you know, implement that and you're good to
go. Just make sure the three different mechanisms for getting the
event loop always give identical results and it'll work fine, because
configuring redundant systems is always fun and certainly not error
prone.</p>
<p>I tease, of course. In practice this mostly works out well enough, and
none of this is actually going to stop anyone from writing working
programs with asyncio – this is probably the most minor issue
discussed in this essay. But it's a source of <a class="reference external" href="https://bugs.python.org/issue26969">ongoing friction</a>, and <a class="reference external" href="https://aiohttp.readthedocs.io/en/stable/faq.html#how-to-use-aiohttp-test-features-with-code-which-works-with-implicit-loop">causes real problems</a>.</p>
<p>OTOH, because curio is async/await all the way down, this friction
just... goes away. Or rather, is never there in the first place. There
are no global variables, no policy objects with LotsOfCapitalLetters,
and nothing to pass around. If you need to issue some I/O, you call
<tt class="docutils literal">await whatever()</tt> and the Python interpreter automatically directs
your request to the right event loop. (Sorta like how <tt class="docutils literal">return</tt>
values magically go to the right place.) Of course your test suite
creates a fresh new event loop for each independent test and different
tests never pollute each other's event loop by accident, that's just
the easiest way to do it. Normally we don't even explicitly
instantiate the event loop object; it's an internal implementation
detail of <tt class="docutils literal">curio.run</tt>.</p>
<p>I don't see any way to fix this in asyncio, because it's – again – a
sort of inevitable penalty for wanting to mix async/await and callback
APIs in the same library. A small improvement would be to add a
function like <tt class="docutils literal">await asyncio.get_ambient_event_loop()</tt>, so that leaf
async functions could at least summon the correct event loop reference
on demand whenever they were about to transition to the callback API,
without it having to be manually passed down the call stack. But this
still requires an error-prone manual step at that transition point,
and that seems unavoidable as long as we have a callback-friendly API
at all. I guess for the asyncio entry points that already are async
functions (e.g. <tt class="docutils literal">asyncio.open_connection</tt>) we could change the
default so that <tt class="docutils literal">loop=None</tt> does a call to <tt class="docutils literal">await
asyncio.get_ambient_event_loop()</tt> rather than
<tt class="docutils literal">asyncio.get_event_loop()</tt>. But then, it could get pretty confusing
if some API calls default to the right event loop, while others you
have to make sure to pass it in explicitly because the defaults are
different.</p>
<p>[<strong>Edit:</strong> I may have been overly pessimistic! I'm told that asyncio's
global event loop fetching API is going to be reworked in 3.6 and
backported to 3.5.3. If I understand correctly (which is not 100%
certain, and I don't think the actual code has been written yet
[edit²: <a class="reference external" href="https://github.com/python/asyncio/pull/452">here it is</a>]), the new system will
be: <tt class="docutils literal">asyncio.get_event_loop()</tt>, instead of directly calling the
currently-registered <tt class="docutils literal">AbstractEventLoopPolicy</tt>'s
<tt class="docutils literal">get_event_loop()</tt> method, will first check some thread-local global
to see if a <tt class="docutils literal">Task</tt> is currently executing, and if so it will
immediately return the event loop associated with that <tt class="docutils literal">Task</tt> (and
otherwise it will continue to fall back on the
<tt class="docutils literal">AbstractEventLoopPolicy</tt>). This means that inside async functions it
should now be guaranteed (via somewhat indirect means) that
<tt class="docutils literal">asyncio.get_event_loop()</tt> gives you the same event loop that you'd
get by doing an <tt class="docutils literal">await</tt>. And, more importantly, since
<tt class="docutils literal">asyncio.get_event_loop()</tt> is what the callback-level APIs use to
pick a default event loop when one isn't specified, this also means
that async/await code should be able to safely use callback-layer
functions without explicitly specifying an event loop, which is a neat
improvement over my suggestion above.</p>
<p>I think it's still illustrative of my general point here that asyncio
required three Python releases in order to settle on a system that
uses multiple layers of complex logic just to get back to the place
where curio started. But as long as end-users aren't peeking under the
covers, they shouldn't notice much difference anymore, at least in
this regard.]</p>
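<p>The settled behavior is easy to check directly on Python 3.7+, where <tt class="docutils literal">asyncio.run</tt> and <tt class="docutils literal">asyncio.get_running_loop</tt> exist:</p>

```python
import asyncio

async def main():
    # With the reworked lookup, calling get_event_loop() from inside a
    # running coroutine returns the loop that is actually driving it.
    return asyncio.get_event_loop() is asyncio.get_running_loop()

same_loop = asyncio.run(main())
assert same_loop
```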
</div>
<div class="section" id="context-passing-task-local-storage">
<h3><a class="toc-backref" href="#id21">Context passing / task-local storage</a></h3>
<p>Suppose we have a server handling various requests, where each request
triggers a complex set of events – RPCs, database calls,
whatever. When monitoring and debugging such a server, it's very
useful if we can arrange for each incoming request to be assigned a
unique id, and then make sure that all the logs generated deep inside
(for example) the database library are tagged with this unique id, so
that we can later aggregate the logs to get a complete picture of an
individual misbehaving request. But how does the logging code inside
the database library find this id? Ideally we could pass it down as
part of an explicit "context" object, but this isn't always practical,
especially given that the Python stdlib <tt class="docutils literal">logging</tt> module doesn't
provide any way to do this. What we need is some sort of async
equivalent to "thread-local storage", where we can stash data and make
it accessible to a complete logical execution flow.</p>
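<p>For comparison, here is what the synchronous version of this pattern looks like with <tt class="docutils literal">threading.local</tt> (the names <tt class="docutils literal">db_query</tt> and <tt class="docutils literal">handle_request</tt> are hypothetical stand-ins for the server and database-library code):</p>

```python
import threading

request_ctx = threading.local()  # per-thread "ambient" storage
logs = []

def db_query(sql):
    # Deep inside the hypothetical database library: tag every log
    # entry with whatever request id the ambient context carries.
    logs.append((request_ctx.request_id, sql))

def handle_request(request_id):
    request_ctx.request_id = request_id  # set once, at the entry point
    db_query("SELECT 1")                 # ...visible arbitrarily deep

threads = [threading.Thread(target=handle_request, args=(i,))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(logs) == [(0, "SELECT 1"), (1, "SELECT 1"), (2, "SELECT 1")]
```

What we want for async code is the same ergonomics, but with the data scoped to a logical task rather than an OS thread.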
<!-- (though there are also `cogent arguments against it
<https://groups.google.com/d/msg/golang-nuts/Iyg3lKHV_lQ/ataabh6rBV8J>`__). -->
<p>In callback-based frameworks, this kind of context propagation
requires modifying every callback-scheduling operation to capture the
context when the callback is scheduled, store it, and then restore it
before the callback is executed. This is challenging, because there
are lots of callback-scheduling operations that need to implement this
logic, and some of them are in third-party libraries.</p>
<p>In a curio-style framework, the problem is almost trivial, because all
code runs in the context of a <tt class="docutils literal">Task</tt>, so we can store our task-local
data there and immediately cover all use cases. And if we want to
propagate context to sub-tasks, then as described above, sub-task
spawning goes through a single bottleneck inside the curio library, so
this is also easy. I actually started writing a simple example here of
how to implement this on curio to show how easy it was... but then I
decided that probably it made more sense as a <a class="reference external" href="https://github.com/dabeaz/curio/pull/85">pull request</a>, so now I don't have to
argue that curio <em>could</em> easily support task-local storage, because it
actually does! It took ~15 lines of code for the core functionality,
and the rest is tests, comments, and glue to present a convenient
<tt class="docutils literal">threading.Local</tt>-style API on top; there's a <a class="reference external" href="https://curio.readthedocs.io/en/latest/tutorial.html#task-local-storage">concrete example</a>
to give a sense of what it looks like in action.</p>
<p>I also recommend this interesting <a class="reference external" href="https://docs.google.com/document/d/1tlQ0R6wQFGqCS5KeIw0ddoLbaSYx6aU7vyXOkv-wvlM/edit">review of async context propagation
mechanisms</a>
written by two developers at Google. A somewhat irreverent but (I
think) fair summary would be (a) Dart baked a solution into the
language, so that works great, (b) in Go, Google just forces everyone
to pass around explicit context objects <em>everywhere</em> as part of their
style guide, and they have enough leverage that everyone <a class="reference external" href="https://github.com/jtolds/gls">mostly</a> goes along with it, (c) in C# they
have the same system I implemented in curio (as I learned after
implementing it!) and it works great because no-one uses callbacks,
but (d) context propagation in Javascript is an ongoing disaster
because Javascript uses callbacks, and no-one can get all the
third-party libraries to agree on a single context-passing
solution... partly because even the core packages like node.js can't
decide on one.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Incidentally, looking at Javascript makes me grateful again
for the care and deliberateness of Python's maintainers. It's
possible that asyncio and its hybrid approach might turn out to be
a dead-end, but – if so, then, oh well, at the end of the day it's
just a library. We can recover. Javascript OTOH <a class="reference external" href="https://tc39.github.io/ecmascript-asyncawait/">is getting
async/await any day now</a>, but the
language is all-in on callbacks, so Javascript's async/await will
always be the hybrid kind, and AFAICT many of this essay's
critiques apply. A libuv-backed curio running on <a class="reference external" href="https://morepypy.blogspot.com/2016/08/pypy-gets-funding-from-mozilla-for.html">PyPy 3</a>
might someday make an <em>extraordinarily</em> compelling competitor to
node.js.</p>
</div>
<p>So this is another domain where asyncio's hybrid nature creates a
challenge: of course it would be easy enough to implement a task-local
storage system that works only for asyncio <tt class="docutils literal">Task</tt> objects, but then
we'd still have the question of what to do for callback-based code,
and how to handle the hand-off point between them. Probably there is
some solution, but finding it seems like it would take a lot of work,
and the benefit isn't clear given all the other issues that callbacks
create.</p>
</div>
<div class="section" id="implementation-complexity">
<h3><a class="toc-backref" href="#id22">Implementation complexity</a></h3>
<p>Finally, I want to say a few words about internal implementation
complexity. Just like you <em>can</em> implement anything with <tt class="docutils literal">goto</tt>
instead of structured programming, with enough work, it's almost
certainly possible for asyncio to eventually fix these bugs, implement
reliable cancellation for all of its callback-based primitives, and so
forth, and eventually expose a curio-style API on top of some sort of
callback-based infrastructure. And from the user's point of view, this
would be fine – it doesn't necessarily matter to them what's happening
under the hood, so long as the public semantics are right, and users
could just ignore all that callback-based stuff.</p>
<p>But... asyncio's internals are <em>really</em> hard to follow. This isn't a
criticism of the authors. I've highlighted a number of problems here,
but what you <em>can't</em> see is all the times that I was convinced that
I'd found another nasty edge case bug, only to eventually realize that
someone else had already noticed the problem and carefully arranged to
handle it. Asyncio is very thoughtfully written callback soup; the
problem is that callback soup is just too complicated for human minds
– at least, my human mind – to understand.</p>
<p>The end result is that I'd estimate it took me ~3x longer to
understand the actual semantics of <tt class="docutils literal">asyncio.StreamWriter</tt>'s and
<tt class="docutils literal">asyncio.StreamReader</tt>'s public APIs than it did to read all of
curio's source code, <a class="reference external" href="https://github.com/dabeaz/curio/pull/85">implement task-local storage</a>, and diagnose a
<a class="reference external" href="https://github.com/dabeaz/curio/issues/86#issuecomment-254123923">concurrency bug</a>
in <tt class="docutils literal">curio.Event</tt>. (By the way, I recommend reading through curio as
an exercise; it's not perfect – as that <tt class="docutils literal">Event</tt> bug indicates :-)
– but overall it's remarkably small and straightforward.)</p>
<p>And if that's what it takes me just to understand asyncio, then I
wince to think how much energy was spent on implementing it and fixing
all those edge cases in the first place. If you're an event-loop
developer, or the author of a protocol library, then is this really
how you want to be spending your time? On implementing complicated
callback-based APIs, and then on implementing another complicated
layer on top of that to cancel out all the problems introduced by the
callback-based APIs, and then spending a lot of time trying to squash
all the weird edge case bugs introduced by the impedance mismatch
between the layers? It's all so unnecessary! It's 2016, and you don't
have to live like that anymore! AFAICT going async/await-native is a
direct path to fewer bugs and more happiness.</p>
</div>
</div>
<div class="section" id="review-and-summing-up-what-is-async-await-native-anyway">
<h2><a class="toc-backref" href="#id23">Review and summing up: what is "async/await-native" anyway?</a></h2>
<p>In previous asynchronous APIs for Python, the use of callback-oriented
programming led to the invention of a whole set of conventions that
effectively make up an entire ad hoc programming language, in the
sense that they provide their own methods for expressing basic
computational constructs like sequencing, error handling, resource
cleanup, and so forth. The result is somewhat analogous to the bad old
days before structured programming, where basic constructs like
function calls and loops had to be constructed on the fly out of
primitive tools like <tt class="docutils literal">goto</tt>. In theory, one can do anything. In
practice, it's extraordinarily difficult to write correct code in this
style, especially when one starts to think about edge conditions.</p>
<p>Now that Python has async/await, it's possible to start using Python's
native mechanisms to solve these problems. Python's tools are,
unsurprisingly, a huge improvement over the old system; but, when used
in a hybrid system layered on top of the older callback approach, many
of these advantages are blunted or lost. We've seen above that going
to a fully async/await-native approach let us easily solve a number of
problems that arise in the hybrid approach, like <a class="reference internal" href="#bug-1-backpressure">handling
backpressure</a>, <a class="reference internal" href="#bug-2-read-side-buffering">avoiding bufferbloat</a> and <a class="reference internal" href="#bug-3-closing-time">race conditions</a>, handling <a class="reference internal" href="#timeouts-and-cancellation">timeouts and cancellation</a>, <a class="reference internal" href="#event-loop-lifecycle-management">figuring out when our program was finished running</a>, and <a class="reference internal" href="#context-passing-task-local-storage">context passing</a>, while reducing our <a class="reference internal" href="#getting-in-touch-with-your-event-loop">need for global
state</a> – and the code is
simpler too, both for the library <a class="reference internal" href="#implementation-complexity">implementors</a> and the library users!</p>
<p>These advantages come from consistently following a set of
<em>structuring principles</em>. What are these principles? What makes a
Python app "async/await-native"? Here's a first attempt at codifying
them:</p>
<ol class="arabic simple">
<li>An async/await-native application consists of a set of cooperative
threads (a.k.a. <tt class="docutils literal">Task</tt>s), each of which consists of some
<em>metadata</em> plus an <em>async callstack</em>. Furthermore, this set is
<em>complete</em>: all code must run on one of these threads.</li>
<li>These threads are <em>supervised</em>: it's guaranteed that every
callstack will run to completion – either organically, or after the
injection of a cancellation exception.</li>
<li>Thread spawning is always explicit, not implicit.</li>
<li>Each frame in our callstacks is a regular sync- or async-colored
Python function, executing regular imperative code from top to
bottom. This requires that both API primitives and higher-level
functions <a class="reference internal" href="#causality">*respect causality*</a> whenever possible.</li>
<li>Errors, including cancellation and timeouts, are signaled via
<em>exceptions</em>, which propagate through Python's regular callstack
unwinding.</li>
<li>Resource <em>cleanup</em> and <em>error-handling</em> is managed via exception
handlers (<tt class="docutils literal">with</tt> or <tt class="docutils literal">try</tt>).</li>
</ol>
<p>These work together so that if each piece of our program follows the
rules, we end up with strong global guarantees. For example, if an
error occurs: (5) tells us that this raises an exception, (1) tells us
that this exception will be on one of our thread's callstacks, (4)
implies that the exception interrupts execution at a well-defined
point in time (for example, we know that code that comes before the
exception-raising point is no longer running once we start unwinding),
and (6) implies that resources will be cleaned up appropriately as the
exception unwinds. Or for any given resource, that analysis + rule (2)
gives us confidence that the resource will eventually be cleaned up at
an appropriate time. Rules (4) + (5) + (6) together justify the use of
a <tt class="docutils literal">with</tt>-style composable timeout API.</p>
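<p>Rules (4)–(6) in miniature, in ordinary synchronous Python (a toy sketch, with the "cancellation" simulated by an ordinary exception):</p>

```python
cleaned_up = []

def task():
    try:
        # rule (5): errors, including simulated cancellation, are exceptions
        raise TimeoutError("simulated cancellation")
    finally:
        # rule (6): cleanup runs as the exception unwinds the stack
        cleaned_up.append("resource released")

try:
    task()                 # rule (4): plain imperative code, top to bottom
except TimeoutError:
    pass
assert cleaned_up == ["resource released"]
```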
<p>These might seem almost too trivial to write down, and indeed, if you
delete the word "async" then regular synchronous Python code
generally follows all of these rules without anyone bothering to
mention them. Yet writing them down seems useful – until curio, every
asynchronous I/O library for Python violated all of them!</p>
<!-- Each async function owns
a call stack. 'async with' binds the lifetime of resources like
sockets to these call stacks. Errors propagate deterministically
through them. Timeouts can be applied directly to their dynamic
extent. The kernel ensures that every call stack will be
eventually unwound. And each kernel makes its own local self-contained
universe, with no need for any global state. -->
</div>
<div class="section" id="open-questions">
<h2><a class="toc-backref" href="#id24">Open questions</a></h2>
<div class="section" id="for-async-await-native-apis">
<h3><a class="toc-backref" href="#id25">...for async/await-native APIs</a></h3>
<p>One of the nice things about this analysis is that it helps suggest
ways in which curio or other libraries like it could be improved. In
particular, it suggests we should focus on the two places where the
above rules currently break down; AFAICT these are the only two
places where errors can pass silently.</p>
<div class="section" id="orphan-tasks">
<h4><a class="toc-backref" href="#id26">Orphan tasks</a></h4>
<p>Curio obviously supports spawning new <tt class="docutils literal">Task</tt>s, but once spawned it
can be rather tricky to manage them properly. In particular, the event
loop's supervision guarantees that they will eventually clean up and
exit. But unless you're very careful it's easy to get into situations
where a task has crashed and no-one notices, or where a parent task
has been cancelled but the child task continues on oblivious. For
example, a common pattern I've run into is where I want to spawn
several worker tasks that act like "part of" the parent task: if any
of them raises an exception then all of them should be cancelled + the
parent raise an exception; if the parent is cancelled then they should
be cancelled too. We need ergonomic tools for handling these kinds of
patterns robustly.</p>
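<p>To sketch the pattern (shown here against asyncio, since it's in the stdlib; <tt class="docutils literal">supervise</tt> is a hypothetical helper, not an API provided by curio or asyncio):</p>

```python
import asyncio

async def supervise(*coros):
    # Hypothetical helper: run workers as "part of" the parent task.
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        done, pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_EXCEPTION)
        # If any worker failed, cancel its siblings...
        for t in pending:
            t.cancel()
        if pending:
            await asyncio.wait(pending)
        # ...then re-raise the first failure in the parent.
        for t in done:
            t.result()
    except asyncio.CancelledError:
        # If the parent is cancelled, the workers go down with it.
        for t in tasks:
            t.cancel()
        raise
```

Making this kind of helper built-in and ergonomic, rather than something every application reinvents, is exactly the open design question.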
<p>Fortunately, this is something that's easy to experiment with, and
there's lots of inspiration we can draw from existing systems: <a class="reference external" href="http://erlang.org/documentation/doc-4.9.1/doc/design_principles/sup_princ.html">Erlang</a>
certainly has some good ideas here. Or, curio makes much of the
analogy between its event loop and an OS kernel; maybe there should be
a way to let certain tasks sign up to act as PID 1 and catch failures
in orphan tasks? I think we'll see rapid progress here.</p>
</div>
<div class="section" id="cleanup-in-generators-and-async-generators">
<h4><a class="toc-backref" href="#id27">Cleanup in generators and async generators</a></h4>
<p>Generators and async generators present a somewhat stickier
problem. If you think about it, a generator is something very like an
independent thread of execution that runs arbitrary code. Given our
analysis above, this should make us nervous!</p>
<p>Fortunately, stepping through a generator with <tt class="docutils literal">__iter__</tt> and
<tt class="docutils literal">__next__</tt> turns out to be compatible with our rules, because while
the generator acts somewhat like an independent thread semantically,
each step gets executed as a regular function call that's sort of
grafted onto a regular callstack – so the context is correct,
exceptions will propagate, etc.</p>
<p>The problem is that there's another piece to the generator API, that
not everyone realizes is even there. It's the <tt class="docutils literal">__del__</tt> method. If
we have a generator with some sort of cleanup code, like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">some_generator</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">handle</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
<span class="k">yield</span> <span class="o">...</span>
<span class="k">finally</span><span class="p">:</span>
<span class="n">handle</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">While this essay focuses on async code, everything in this
section actually applies equally to the use of regular generators
in regular Python code. All the "async/await-native" principles we
formulated above like "functions should respect causality" and "no
implicit spawning of logical threads of execution" apply just as
much to non-async code – it's just that in the non-async case
they're so obvious that no-one needed to write them
down. async/await forces us to go back and re-examine the
foundations of how Python is put together, and take these implicit
principles and make them explicit. An interesting side-effect of
this is that once we've written them down, suddenly this hidden gap
in Python's existing design jumps out at us!</p>
</div>
<p>and we iterate it, but stop before reaching the end:</p>
<div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">obj</span> <span class="ow">in</span> <span class="n">some_generator</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="k">break</span>
<span class="c1"># or</span>
<span class="k">for</span> <span class="n">obj</span> <span class="ow">in</span> <span class="n">some_generator</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="k">raise</span> <span class="o">...</span>
</pre></div>
<p>then eventually that <tt class="docutils literal">finally</tt> block will be executed by the
generator's <tt class="docutils literal">__del__</tt> method (see <a class="reference external" href="https://www.python.org/dev/peps/pep-0342/">PEP 342</a> for details).</p>
<p>And if we think about how <tt class="docutils literal">__del__</tt> works, we realize: it's another
sneaky, non-causal implicit-threading API! <tt class="docutils literal">__del__</tt> does <em>not</em> get
executed in the context of the callstack that's using the generator –
it happens at some arbitrary time and place.</p>
<p>In the special case where you're using CPython, <em>and</em> there are no
reference loops involving your generator object, then CPython's use of
reference counting does at least guarantee that <tt class="docutils literal">__del__</tt> will be
called at the right time. I.e., as soon as the last reference is
dropped, CPython will immediately pause the thread that dropped
that reference and execute <tt class="docutils literal">__del__</tt> right there. However, this
still takes place in a special context where <em>exceptions are
discarded</em>. Besides which, in most general purpose code you probably
shouldn't assume that you're on CPython and that there are no
reference loops, in which case all bets are off: generator <tt class="docutils literal">__del__</tt>
methods can easily end up executing arbitrary code without respecting
causality, exception propagation, access to the correct task-local
storage, timeout restrictions, ... basically all of our rules and the
guarantees they're trying to provide just go out the window.</p>
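<p>Here's the CPython happy path in action – prompt cleanup, but only by the grace of reference counting (a minimal demonstration; other implementations make no such promptness guarantee):</p>

```python
log = []

def some_generator():
    try:
        yield 1
        yield 2
    finally:
        log.append("finally ran")

for obj in some_generator():
    break  # abandon the generator after one item

# On CPython the abandoned generator's refcount hits zero as soon as
# the loop exits, so __del__ (and hence the finally block) has already
# run by this point -- and any exception it raised was discarded.
assert log == ["finally ran"]
```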
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">What about <tt class="docutils literal">__del__</tt> methods on other objects, besides
generators? In theory they have the same problems, but (a) for most
objects, like <tt class="docutils literal">int</tt>s or whatever, we don't care when the object
is collected, and (b) objects that do have non-trivial cleanup
associated with them are mostly obvious "resources" like files or
sockets or thread-pools, so it's easy to remember to stick them in
a <tt class="docutils literal">with</tt> block. Plus, when we write a class with a <tt class="docutils literal">__del__</tt>
method we're usually very aware of what we're doing. Generators are
special because they're just as easy to write as regular functions,
and in some programming styles just as common. It's very very easy
to throw a <tt class="docutils literal">with</tt> or <tt class="docutils literal">try</tt> inside some generator code and
suddenly you've defined a <tt class="docutils literal">__del__</tt> method without even realizing
it, and it feels like a function call, not the creation of a new
resource type that needs managing.</p>
</div>
<p>That's for regular, synchronous generators. Async generators are
slightly different, because the reference counting GC part doesn't
apply. Even if we're in the happy case on CPython where <tt class="docutils literal">__del__</tt>
gets called synchronously on our callstack, then it still can't
actually run async cleanup code directly, because <tt class="docutils literal">__del__</tt> is
sync-colored. (This is a consequence of the weird environment where
<tt class="docutils literal">__del__</tt> methods run, similar to the reason they have to discard
exceptions.) PEP 525 provides an API for async generator <tt class="docutils literal">__del__</tt>
methods to hand off to an event loop to spawn a full-fledged <tt class="docutils literal">Task</tt>
to run the actual cleanup. Compared to synchronous generators this is
kind of an improvement, since regular <tt class="docutils literal">__del__</tt> methods can run at
<em>any</em> moment, like pre-emptively scheduled threads – which <a class="reference internal" href="#glyph-unyielding">Glyph told
us is bad</a> – while this async generator cleanup
code at least gets scheduled on a cooperative thread and thus respects
the use of yield points as a synchronization mechanism. But on the
other hand, an implicitly spawned thread is still an implicitly
spawned thread, and the fact that this is the only way to run async
generator <tt class="docutils literal">__del__</tt> methods means that we lose even CPython's weak
guarantees about when they'll run: so they will <em>never</em> respect
causality, exception propagation, access to the correct task-local
storage, timeout restrictions, etc.</p>
<p>This one worries me, because it's basically the one remaining hole in
the lovely interlocking set of rules described above – and here it's
the Python language itself that's fighting us.</p>
<p>For now, the only solution seems to be to make sure that you never,
ever call a generator without explicitly pinning its lifetime with a
<tt class="docutils literal">with</tt> block. For synchronous generators, this looks like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">some_sync_generator</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="k">as</span> <span class="o">...</span><span class="p">:</span>
<span class="k">yield</span> <span class="o">...</span>
<span class="c1"># DON'T do this</span>
<span class="k">for</span> <span class="n">obj</span> <span class="ow">in</span> <span class="n">some_sync_generator</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="o">...</span>
<span class="c1"># DO do this</span>
<span class="kn">from</span> <span class="nn">contextlib</span> <span class="kn">import</span> <span class="n">closing</span>
<span class="k">with</span> <span class="n">closing</span><span class="p">(</span><span class="n">some_sync_generator</span><span class="p">(</span><span class="n">path</span><span class="p">))</span> <span class="k">as</span> <span class="n">tmp</span><span class="p">:</span>
<span class="k">for</span> <span class="n">obj</span> <span class="ow">in</span> <span class="n">tmp</span><span class="p">:</span>
<span class="o">...</span>
</pre></div>
<p>And for async generators, this looks like:</p>
<div class="highlight"><pre><span></span><span class="k">async</span> <span class="k">def</span> <span class="nf">some_async_generator</span><span class="p">(</span><span class="n">hostname</span><span class="p">,</span> <span class="n">port</span><span class="p">):</span>
<span class="k">async</span> <span class="k">with</span> <span class="n">open_connection</span><span class="p">(</span><span class="n">hostname</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span> <span class="k">as</span> <span class="o">...</span><span class="p">:</span>
<span class="k">yield</span> <span class="o">...</span>
<span class="c1"># DON'T do this</span>
<span class="k">async</span> <span class="k">for</span> <span class="n">obj</span> <span class="ow">in</span> <span class="n">some_async_generator</span><span class="p">(</span><span class="n">hostname</span><span class="p">,</span> <span class="n">port</span><span class="p">):</span>
<span class="o">...</span>
<span class="c1"># DO do this</span>
<span class="k">class</span> <span class="nc">aclosing</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">agen</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_agen</span> <span class="o">=</span> <span class="n">agen</span>
<span class="k">async</span> <span class="k">def</span> <span class="fm">__aenter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_agen</span>
<span class="k">async</span> <span class="k">def</span> <span class="fm">__aexit__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="k">await</span> <span class="bp">self</span><span class="o">.</span><span class="n">_agen</span><span class="o">.</span><span class="n">aclose</span><span class="p">()</span>
<span class="k">async</span> <span class="k">with</span> <span class="n">aclosing</span><span class="p">(</span><span class="n">some_async_generator</span><span class="p">(</span><span class="n">hostname</span><span class="p">,</span> <span class="n">port</span><span class="p">))</span> <span class="k">as</span> <span class="n">tmp</span><span class="p">:</span>
<span class="k">async</span> <span class="k">for</span> <span class="n">obj</span> <span class="ow">in</span> <span class="n">tmp</span><span class="p">:</span>
<span class="o">...</span>
</pre></div>
<p>It might be possible for curio to subvert the PEP 525 <tt class="docutils literal">__del__</tt>
hooks to at least catch cases where async generators are accidentally
used without <tt class="docutils literal">with</tt> blocks and signal some kind of error.</p>
<p><a class="reference external" href="https://www.python.org/dev/peps/pep-0533/">PEP 533</a> is one
possible proposal for fixing this at the language level, by adding an
explicit <tt class="docutils literal">__iterclose__</tt> method to the iterator protocol and
adapting Python's iteration constructs like <tt class="docutils literal">for</tt> accordingly.</p>
</div>
</div>
<div class="section" id="for-the-python-asynchronous-i-o-ecosystem">
<h3><a class="toc-backref" href="#id28">...for the Python asynchronous I/O ecosystem</a></h3>
<p>What does all this mean for the broader ecosystem? I don't have the
answers, but I can try to make the questions more specific!</p>
<div class="section" id="do-you-really-think-everyone-s-going-to-abandon-callbacks">
<h4><a class="toc-backref" href="#id29">Do you really think everyone's going to abandon callbacks?</a></h4>
<p>I hope so! Apparently the C# world has done this and they seem to be
doing OK. Arguably it's how golang works too, if you squint. My guess
is that Python will get there eventually. But... it certainly won't
happen immediately.</p>
<p>There's a lot of code out there using callbacks, and for all its
flaws, it's a very well-understood paradigm that has been used for
lots of large, successful systems. There are also, I think, some very
compelling arguments for getting rid of them – hence this essay – but
the async/await-native paradigm is still fairly immature and will take
some time to settle down and prove itself. My suspicion is that
anything you can do with callbacks can be done better without them,
but I can't prove it until someone tries.</p>
<p>This is another place where the <tt class="docutils literal">goto</tt>/structured programming
analogy is surprisingly apt. By 1968, the structured programming folks
<a class="reference external" href="https://en.wikipedia.org/wiki/Structured_program_theorem">could show</a> that all
programs <em>could</em> be written without <tt class="docutils literal">goto</tt> – but the construction
was pretty ugly, and it left open the question of whether programs
could be written <em>elegantly</em> without <tt class="docutils literal">goto</tt>. And the answer was not
obvious, especially considering that structured programming advocates
of the time also liked to forbid the use of <tt class="docutils literal">break</tt>, <tt class="docutils literal">continue</tt>,
and mid-function <tt class="docutils literal">return</tt> on the grounds that they were too
<tt class="docutils literal">goto</tt>-like. (Exceptions were right out.)</p>
<p>Similarly, one could implement an asyncio event loop on top of curio
to prove that the curio paradigm is at least as powerful in principle
– but this wouldn't really answer the question, and it's entirely
possible that current implementations like curio are missing some
harmless quality-of-life measures analogous to <tt class="docutils literal">break</tt>. It's pretty
exciting: we're on the cusp of learning things! But there are a lot of
open questions about what exactly this future looks like.</p>
</div>
<div class="section" id="so-should-i-drop-asyncio-twisted-etc-and-rewrite-everything-using-curio-tomorrow">
<h4><a class="toc-backref" href="#id30">So should I drop asyncio/twisted/etc. and rewrite everything using curio tomorrow?</a></h4>
<p>Well... that's a complicated question.</p>
<p>If you want to start using the async/await-native approach today, then
curio is currently the only game in town.</p>
<p>But even if you agree that this is where we want to end up eventually,
there are still very good reasons why you might decide not to switch
<em>yet</em>. Twisted and tornado are extremely mature, asyncio is in the
standard library, and curio is neither of those things. All three have seen
years of intensive development by lots of very smart people and are in
production use at companies you've heard of; curio is currently
experimental alpha-status software by <a class="reference external" href="https://github.com/dabeaz/curio/graphs/contributors">basically one guy</a>. There's a
much larger ecosystem of supporting libraries around
twisted/tornado/asyncio than curio. And while the callback-based
paradigm has its faults, those faults and their magnitude are
well-understood with known workarounds, while the "curio paradigm" is
still under heavy development, and curio-the-software doesn't yet make
any promises about API stability.</p>
<!-- Original Russian at the bottom said, of course: "Glory to the
laborers of Soviet science and technology!" (trans.: Rose Lemberg) -->
<a class="reference external image-reference" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/curio-propaganda.jpg"><img alt="Modified Soviet space-race propaganda poster, showing a rocket shooting off into the distance while an ethnically diverse old and young man embrace and gaze into the glorious future. The rocket is labeled &quot;curio&quot;, and large text says: &quot;Looking forward to a future without Futures!&quot;. Small text says: &quot;Glory to the laborers of Pythonic science and technology!&quot;" class="align-right" src="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/curio-propaganda.jpg" style="width: 300px;" /></a>
<p>On the other hand, if you find the async/await-native programming
model compelling, and want to help flesh out a new paradigm, aren't
reliant on the existing ecosystems (or are excited to help build a new
one), and are comfortable with the risks, then you should totally go
for it. Help us stride forward into a glorious <tt class="docutils literal">Future</tt>-free
future! Even if curio doesn't end up being the async/await-native API
to end all APIs, we'll still learn something from the attempt.</p>
<p>For me personally, it helps that (a) the programs I'm working with are
on the smaller side; there are no 20-person teams and big budgets
depending on my technology choices; I could always port to something
else if I had to. And (b) based on my <a class="reference internal" href="#implementation-complexity">experience so far</a> in submitting patches to curio versus
just trying to understand asyncio's edge case semantics, I'm honestly
uncertain whether – if worst came to worst – it would be more work to
maintain a personal fork of curio or to use upstream asyncio. Asyncio
isn't bad; async/await is just so good that it radically changes the
usual maturity calculus.</p>
</div>
<div class="section" id="should-asyncio-be-fixed-to-have-a-curio-style-async-await-native-api">
<h4><a class="toc-backref" href="#id31">Should asyncio be "fixed" to have a curio-style async/await-native API?</a></h4>
<p>In many ways this would be the ideal solution, since it would let us
keep a single standard-bearer library with its interoperability story
and more-developed ecosystem. And there are APIs in asyncio, like
<tt class="docutils literal">loop.sock_sendall</tt>, that seem like they could be first steps in
this direction. But there are also some significant challenges:</p>
<ul class="simple">
<li>I can't see how this could be done without substantially throwing
out and rewriting most of asyncio. Currently, transports are
asyncio's fundamental abstraction layer: this is the layer that
abstracts across different kinds of communication channels
(e.g. sockets versus processes) and it also does a lot of the heavy
lifting in abstracting across different underlying APIs
(e.g. Unix-style polling versus Windows IOCP). The transport layer
is also, as we saw above, the layer whose callback-centric
abstractions cause so many problems for async/await code. And below
the transport layer we have the event loop itself, where the
mismatch isn't quite as great, but it still isn't the most natural
fit. I'm not an asyncio developer, so I could be missing
something... but the callback chaining parts are pretty deeply baked
into asyncio as it currently exists.</li>
<li>As mentioned above, the whole concept of a "curio-style API" is
still undergoing heavy development, while asyncio's position in the
stdlib makes it a poor place to experiment with new paradigms. This
seems like the kind of thing where the ecosystem may be better off
letting it stew in a 3rd-party lib for a while.</li>
<li>Presumably the point of giving asyncio a curio-style API is that it
would make it easier to mix together curio-style code and
callback-style code in the same program (since I assume asyncio's
current APIs wouldn't be going away – this would be
supplemental). But – my whole argument in this essay is that for the
most part, you <em>don't want to mix these</em>. So it's not clear what the
point would be, really. If we're going to end up with two
separate-but-equal APIs that only communicate at arms-length, then
sticking them into two separate namespaces seems like it would be a
lot less confusing.</li>
</ul>
<p>So maybe this wouldn't be the best idea? I'll be very interested to
see what the asyncio developers think.</p>
<p>One possible future would be: asyncio remains as the standard-bearer
for the callback/hybrid approach – which is obviously going to remain
viable and in use indefinitely – while eventually fading into a
legacy library as the async/await-native approach matures and takes over
for new code. (This is a not-unfamiliar trajectory for stdlib
libraries – see urllib2.)</p>
</div>
<div class="section" id="okay-then-should-curio-switch-to-using-asyncio-as-a-backend-or-what-will-the-story-be-on-cross-event-loop-compatibility-i-thought-asyncio-was-supposed-to-be-the-event-loop-to-end-all-event-loops">
<h4><a class="toc-backref" href="#id32">Okay, then should curio switch to using asyncio as a backend? Or what will the story be on cross-event-loop compatibility? I thought asyncio was supposed to be the event loop to end all event loops!</a></h4>
<p>Indeed, one of the <a class="reference external" href="https://www.python.org/dev/peps/pep-3156/#interoperability">original, compelling motivations</a> for
adding asyncio to the stdlib is that it could become a standard
foundation layer, so we could start up one event loop in one thread
and then use it to simultaneously run libraries written for twisted,
tornado, ... well, mostly twisted and tornado. And as a secondary
benefit, we could swap in different backends to work on Unix
vs. Windows, or on a headless server vs. embedded in a GUI app where
the GUI framework imposes a particular event loop. Plus it's nice if
different libraries can share code, so not everyone has to implement,
say, IOCP handling from scratch.</p>
<p>Of course, this vision was predicated on two assumptions: that all
these different event loops basically worked the same way (true until
async/await came along), and that there are lots of existing
libraries that we want to use together.</p>
<p>It's not clear that curio could run nicely on top of asyncio – in
particular, it seems difficult to reconcile their different ideas of
how to manage the lifecycle of the event loop itself, and as we've
seen, asyncio's higher-level abstractions are not useful to
curio. They do already share code in the form of the <tt class="docutils literal">selectors</tt>
module (which is a great addition that asyncio brought to the
stdlib!). Unfortunately, <tt class="docutils literal">selectors</tt> isn't high-level enough to
abstract over the differences between Unix and Windows, and as a
result curio doesn't currently have great Windows support. A
higher-level abstraction layer that could be shared between curio and
asyncio would help, but none exists: as mentioned above, asyncio's
IOCP abstractions rely on the transport interfaces, and those are not
useful to curio. It would be great if there were a
library that abstracted over these platform differences at a lower
level than asyncio does – maybe libuv could serve as an inspiration.</p>
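<p>For concreteness, the shared layer that does exist today is quite thin: <tt class="docutils literal">selectors</tt> gives you portable readiness notification for file descriptors, and nothing more. A sketch of that core, using a socketpair so it runs anywhere:</p>

```python
import selectors
import socket

# The polling core that curio and asyncio both build on: register file
# objects, then ask "which of these are ready?"
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ)

a.send(b"x")  # makes b readable
ready = [key.fileobj for key, events in sel.select(timeout=1)]
print(b in ready)  # True

sel.unregister(b)
a.close()
b.close()
```

Everything above this layer – transports in asyncio, tasks and traps in curio – the two libraries build independently, which is exactly the gap the paragraph above describes.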
<p>In addition, the main theme of this essay is that libraries written
using async/await natively will be simpler and higher-quality than
libraries written using callbacks, with or without an async/await
layer added on top. So ideally we want to throw out those old
libraries using the old APIs and replace them! This might be
particularly true given the current interest in migrating to <a class="reference external" href="https://sans-io.readthedocs.io/">Sans I/O</a>-style protocol libraries –
which, in addition to their advantages in terms of design and
maintainability, also make it much easier to migrate between
incompatible I/O APIs, which makes direct interoperability less
urgent.</p>
<p>...but of course, while the "throw out all the legacy code" strategy
might work okay for green-field projects using popular protocols like
HTTP, it doesn't help with that legacy tornado app, and it's probably
going to be a while until we see an async/await-native client for
<a class="reference external" href="https://twistedmatrix.com/documents/current/api/twisted.news.nntp.html">NNTP</a>
or <a class="reference external" href="https://twistedmatrix.com/documents/current/mail/examples/">IMAP4</a>. So it
would be nice to have some kind of interoperability story. One approach
would be: start two OS threads. In one thread, run your
async/await-native event loop; in the other thread, run your twisted
reactor. Communicate by message passing. This approach is really
crude... but, the programming models are different enough that
message-passing might be what you want to use <em>anyway</em>. (I mean, curio
doesn't even have a <tt class="docutils literal">Deferred</tt> / <tt class="docutils literal">Future</tt> concept, and for good
reasons.) So even this crude approach might give you 90% of what you'd
get by merging the underlying event loops, and with much less fuss?</p>
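<p>A minimal sketch of this two-thread, message-passing arrangement, using only the stdlib (plain synchronous code stands in here for the second event loop, and the queue names and <tt class="docutils literal">async_worker</tt> function are made up for illustration):</p>

```python
import asyncio
import queue
import threading

# Two worlds, two threads, communicating only by message passing.
to_async = queue.Queue()   # sync world -> async world
to_sync = queue.Queue()    # async world -> sync world

async def async_worker(loop):
    while True:
        # Hand the blocking Queue.get off to a worker thread, so the
        # event loop itself stays responsive.
        msg = await loop.run_in_executor(None, to_async.get)
        if msg is None:  # shutdown sentinel
            break
        to_sync.put(msg.upper())  # stand-in for real async work

def run_async_world():
    loop = asyncio.new_event_loop()
    loop.run_until_complete(async_worker(loop))
    loop.close()

t = threading.Thread(target=run_async_world)
t.start()

# The other world sends a request and waits for the reply.
to_async.put("hello")
reply = to_sync.get()
to_async.put(None)  # tell the async thread to shut down
t.join()
print(reply)  # HELLO
```

Crude, as the paragraph says – but notice that neither side ever touches the other's event loop, futures, or callbacks.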
<p>This suggests that a library for cross-event-loop message passing
might be an interesting short-term target for those who are worried
about interoperability.</p>
<p>As for plugging in different backends, like for GUI framework
interoperability: I'm not sure how that might work, and am not enough
of a GUI programmer to have any useful insight into how async/await
affects GUI programming. Definitely an interesting open question.</p>
</div>
</div>
</div>
<div class="section" id="where-next">
<h2><a class="toc-backref" href="#id33">Where next?</a></h2>
<p><strong>Update: 2019-02-06:</strong> There's also now a <a class="reference external" href="https://trio.discourse.group/t/discussion-some-thoughts-on-asynchronous-api-design-in-a-post-async-await-world/32">discussion thread for this
post on the Trio forum</a></p>
<p>If you want to read more or talk about curio: there's the <a class="reference external" href="https://curio.readthedocs.io/">fine manual</a>, the <a class="reference external" href="https://github.com/dabeaz/curio">github page</a>, and the <a class="reference external" href="https://forum.dabeaz.com/c/curio">discourse-based
discussion forum</a>.</p>
<p>If you want to talk more about async API design in general, then the
<a class="reference external" href="https://mail.python.org/mailman/listinfo/async-sig">async-sig@python.org mailing list</a> might be a
good venue.</p>
<p>You can also, of course, contact me <a class="reference external" href="mailto:njs@pobox.com">in person</a> –
though for general discussion, I'd rather stick to a public forum
where others can benefit and join in. I guess you can also <a class="reference external" href="https://twitter.com/vorpalsmith">"tweet at
me"</a>? I've only been on twitter
for 2 days so I'm still figuring out the lingo.</p>
<p><strong>Edit:</strong> Some interesting followup discussions:</p>
<ul class="simple">
<li><a class="reference external" href="https://www.reddit.com/r/Python/comments/5bdf48/some_thoughts_on_asynchronous_api_design_in_a/">Thread on /r/Python</a></li>
<li><a class="reference external" href="https://mail.python.org/pipermail/async-sig/2016-November/000162.html">Thread on async-sig</a></li>
</ul>
</div>
<div class="section" id="acknowledgements">
<h2><a class="toc-backref" href="#id34">Acknowledgements</a></h2>
<p>Without twisted and tornado, there'd be no asyncio; without asyncio
and <a class="reference external" href="https://www.python.org/dev/peps/pep-0342/">PEP 342</a>, there'd be
no async/await; without asyncio and async/await, there'd be no curio;
and without twisted, tornado, asyncio, curio, and others, then this
essay wouldn't exist. So many thanks to all the folks who've spent the
last 15+ years pushing forward on these hard and exciting problems.</p>
<p>Thanks to Rose Lemberg for help with Russian.</p>
<p>Thanks to Yury Selivanov, Andrew Svetlov, and David Beazley for
providing feedback on draft versions of this essay. Any remaining
errors and infelicities are, of course, entirely my fault.</p>
</div>
Emerging from the underworld (Nathaniel J. Smith, 2016-10-24)<img alt="Github contributions graph showing an empty void from mid-July through mid-October." class="align-right" src="https://vorpus.org/blog/emerging-from-the-underworld/github-contributions.png" style="width: 200px;" />
<p><a class="reference external" href="http://scipy2016.scipy.org/">SciPy</a> this year was awesome – there's
so much wonderful stuff happening in the community, and I had lots of
great, productive conversations about how to move forward on different
projects that I'm really excited about. ...and then immediately
afterwards, I pretty much dropped everything on the floor and
disappeared for 3 months, including cancelling <a class="reference external" href="https://www.euroscipy.org/2016/">several</a> <a class="reference external" href="https://plotcon.plot.ly/">talks</a> <a class="reference external" href="http://www.numfocus.org/blog/numfocus-summit-2016">and</a> <a class="reference external" href="http://cds.nyu.edu/msdse-summit2016/">trips</a>. Some people know part of
why, but I definitely haven't been as good about keeping my friends
and collaborators updated as I'd like to be. So here's an update.</p>
<p>The short version: the week after SciPy I had a minor surgery
scheduled, and that went fine – but then afterwards, instead of
recovering, I just got sicker and sicker. I couldn't eat – I lost 30
lbs in 60 days – I was feverish and in pain and sleeping 12-16 hours a
day, anemic... among other things. If you imagine having food
poisoning continuously for 2+ months, then you won't be too far off.</p>
<p>The good news is that after much medical wrangling (so much medical
wrangling), I finally have a diagnosis. The not-so-great news is that
it's <a class="reference external" href="https://en.wikipedia.org/wiki/Ulcerative_colitis">ulcerative colitis</a>, which is a
fairly serious, chronic, auto-immune disorder in which my immune
system periodically tries to kill my large intestine. There's a lot of
variation in the long-term effects: for some people it's a minor
annoyance, and for others it's a permanent severe disability. There's
every reason to hope that once I'm out of this massive flare triggered
by the surgery etc. then I'll turn out to be on the milder end of the
scale (and in retrospect, I'm suspiciously eyeing some past events as
signs that I might have been having mild symptoms that evaded diagnosis for
quite a long time). But realistically, it'll take months or years before
I know for sure where I fall.</p>
<p>For now at least, it's good to know what's going on, and I'm
responding well to treatment. I'm feeling much, <em>much</em> better. Still
not 100%, and I'm trying to ease back into things slowly to avoid
over-extending myself and triggering a relapse (hello, <a class="reference external" href="https://butyoudontlooksick.com/articles/written-by-christine/the-spoon-theory/">spoon theory</a>).</p>
<p>To everyone who's been waiting for a response or followup from me on
something: I'm very sorry for disappearing like this. I'm starting to
work through my disaster zone of an inbox, and realistically it's
going to take some time to get on top of things. I may have to make
some tough choices about dropping previous commitments, or asking
folks to cover for me – but at least I can do a better job
communicating that. I also don't mind if you send me more email or
pings or whatever – if anything, it helps to triage. Just FYI though
that even though I'm doing somewhat better, I'm still going to have
limited bandwidth for a while. Thanks for your understanding and
patience! I'm looking forward to getting through this and back to
working on more awesome stuff.</p>
Stochastic descent (Nathaniel J. Smith, 2016-10-22; updated 2016-10-29)<p>Emacs has a lot of stuff in it. Like, really, a lot. Most of the time
it doesn't matter, you can just use whatever bits you know and be
happy, but there's always more to learn.</p>
<p>For example, here's a thing I often want to do: starting from whatever
file I have open, open the directory containing that file.</p>
<p>So my fingers have memorized the sequence: <tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-f</span> <span class="pre">C-j</span></tt></p>
<p>(For the uninitiated, that's emacs-ese for "while holding down the
control key, press X F J in that order".)</p>
<p>This breaks down as:</p>
<ul>
<li><p class="first"><tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-f</span></tt>: this runs the <tt class="docutils literal"><span class="pre">find-file</span></tt> command (or actually
<a class="reference external" href="https://www.masteringemacs.org/article/introduction-to-ido-mode">ido-find-file</a>
in my case), and causes emacs to ask for the name of a file to open.</p>
</li>
<li><p class="first"><tt class="docutils literal"><span class="pre">C-j</span></tt>: The prompt opens with the current directory filled in as
the initial value, so this says "yes, I want that". It's like
hitting enter, except that if you just hit enter without having
typed anything then emacs figures you made a mistake and want to
cancel out; <tt class="docutils literal"><span class="pre">C-j</span></tt> skips that logic and accepts the default.</p>
<p>(Allegedly, this is intuitive: <tt class="docutils literal"><span class="pre">C-j</span></tt> often does something
similar to hitting enter in emacs, and in other classic terminal
tools. This is because in ASCII, the <a class="reference external" href="https://en.wikipedia.org/wiki/Newline">newline</a> character has character
code 10, J is the tenth letter of the alphabet, and on old-school
terminals the "control" key was a modifier where hitting the Nth
letter of the alphabet would send the raw ascii control code N. You
see: intuitive. Looking at the <a class="reference external" href="https://en.wikipedia.org/wiki/ASCII#cite_ref-40">ASCII table</a>, this also
explains why your terminal sometimes freaks out when trying to
distinguish between <tt class="docutils literal"><span class="pre">C-h</span></tt> and <a class="reference external" href="https://en.wikipedia.org/wiki/Backspace">backspace</a> <a class="footnote-reference" href="#id2" id="id1">[1]</a>, why <tt class="docutils literal"><span class="pre">C-d</span></tt> in
unix command-line tools means <a class="reference external" href="https://en.wikipedia.org/wiki/End-of-Transmission_character">end-of-file</a>, why
<tt class="docutils literal"><span class="pre">C-g</span></tt> is traditionally used for <a class="reference external" href="https://en.wikipedia.org/wiki/Bell_character">interruptions</a>, and why Windows
files on Unix sometimes end up displayed with <a class="reference external" href="https://en.wikipedia.org/wiki/Carriage_return">^M crud</a>. The more you know
🌠)</p>
</li>
</ul>
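<p>The arithmetic is simple enough to check for yourself (the <tt class="docutils literal">control</tt> helper below is purely illustrative): the control modifier masks the letter's ASCII code down to its low 5 bits, leaving the letter's position in the alphabet as the raw control code.</p>

```python
# The control modifier keeps only the low 5 bits of the ASCII code,
# i.e. the letter's position in the alphabet becomes the control code.
def control(letter):
    return ord(letter.upper()) & 0x1F

print(control("J"))  # 10 -> newline (LF)
print(control("H"))  # 8  -> backspace (BS)
print(control("D"))  # 4  -> end-of-transmission (EOT)
print(control("G"))  # 7  -> bell (BEL)
print(control("M"))  # 13 -> carriage return (CR), the ^M in Windows files
```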
<p>Anyway, I've been typing <tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-f</span> <span class="pre">C-j</span></tt> multiple times a day for,
uh. 15 years now? Probably more?</p>
<p>Last night I missed a keystroke, and accidentally typed: <tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-j</span></tt></p>
<p>And it worked!</p>
<p>It turns out that <tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-j</span></tt> is <tt class="docutils literal"><span class="pre">dired-jump</span></tt>, which opens a dired
buffer for the directory containing the current file, and then puts
the cursor on top of that file. So actually it works even better than
<tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-f</span> <span class="pre">C-j</span></tt>, which doesn't do that last part. (Also, if you're
already in a dired-buffer, <tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-j</span></tt> takes you up to the parent
directory.)</p>
<p>I never knew this existed, even though it's been lurking there
forever, just 1 <a class="reference external" href="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance</a> away. As far as
I know, the similarity is a complete and utter coincidence –
one "j" is short for "jump", and the other is short for "the tenth
letter of the alphabet".</p>
<p>I've probably used it 20 times since last night.</p>
<p>[Update, 2016-10-29: I'm informed that the <tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-j</span></tt> key binding
isn't present by default, but requires that you have loaded the
<tt class="docutils literal"><span class="pre">dired-x</span></tt> package. You can do this by adding some <a class="reference external" href="https://www.gnu.org/software/emacs/manual/html_node/dired-x/Optional-Installation-Dired-Jump.html#Optional-Installation-Dired-Jump">autoload nonsense</a>
to your .emacs, or just <tt class="docutils literal">(require <span class="pre">'dired-x)</span></tt>. While you're at it you
should turn on <a class="reference external" href="https://www.gnu.org/software/emacs/manual/html_node/dired-x/Omitting-Files-in-Dired.html#Omitting-Files-in-Dired">dired-omit-mode</a>,
which is apparently why I had <tt class="docutils literal"><span class="pre">dired-x</span></tt> in the first place.]</p>
<p>(While we're here, a random bonus emacs tip that also took me wayyyy
too long to discover: in dired-mode, try hitting <tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-q</span></tt>. It makes
the buffer editable, so you can go around and change the text so that
it looks like it's describing the directory that <em>you wish you had</em> –
you can rename files, use search-replace, whatever – and when you hit
<tt class="docutils literal"><span class="pre">C-x</span> <span class="pre">C-s</span></tt>, emacs will go and rearrange the real files on disk to
match your changes. Plus, if any of those files are open in emacs, then
the buffers will be automagically redirected to point to the new name,
so you avoid the annoying thing where you rename the file in the
terminal and then the next time you save emacs puts it back where it
was before.)</p>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td>Do please appreciate the Wikipedian who labored over this page's
illustrative <a class="reference external" href="https://en.wikipedia.org/w/index.php?title=Backspace&oldid=735677289#/media/File:Backspace.jpg">figure + caption</a>.</td></tr>
</tbody>
</table>