Integrates Backend Multitasking | Python Async I/O Guide | Fluid Attacks

Multitasking

Introduction

It’s all about doing multiple things at the same time.

Multitasking involves handling multiple tasks simultaneously. By understanding multitasking, you can optimize software to leverage available hardware efficiently.

In web servers

Traditionally, a server handles one request at a time, leaving others waiting. To scale, servers spawn multiple copies, each still handling only one request at a time.

In the early 2000s, as web traffic surged, engineers encountered the C10K problem: handling 10,000 concurrent connections efficiently. They explored two options:
  1. Asynchronous I/O: use single-threaded asynchronous I/O, relying on Operating System support to trigger I/O operations and be notified once they complete, allowing the server to serve other clients in the meantime.
  2. Multi-threading: serve one client per thread, at the cost of increased resource consumption, as each thread allocates a portion of memory for its stack and spends CPU cycles on context switching, which was a major concern on the hardware of that era.
Projects like NGINX, Node.js, and Twisted emerged, implementing the asynchronous I/O approach.

Options

Let’s explore multitasking options in Python.
Tip
⚙️ CPU-bound functions: Involve mathematical operations or iterating over large data sets.

def cpu_bound_function():
    i = 0
    while i < 999_999_999:
        i += 1
    return i

🌐 I/O-bound functions: Involve reading from or writing to a file, network, or database.

import requests

def io_bound_function():
    response = requests.get("https://veryslowsite.com/")
    return response

Threads

Good for 🌐 I/O-bound functions

Threads are akin to multiple lanes on a highway, allowing independent paths for tasks to proceed simultaneously. They are commonly used for multitasking, and are particularly suitable for handling I/O-bound functions.

Threads are efficient for tasks involving I/O operations but may not fully utilize multi-core processors for ⚙️ CPU-bound functions due to Python’s design limitations.

from threading import Thread

def get_data():
    result = database.query()
    return result

def send_mail():
    result = mailer.send()
    return result

t1 = Thread(target=get_data)
t2 = Thread(target=send_mail)
t1.start()
t2.start()
t1.join()
t2.join()
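The "design limitations" mentioned above refer to CPython's Global Interpreter Lock (GIL), which can be observed directly. Below is a minimal sketch (timings are hardware-dependent and the loop bound is arbitrary): two CPU-bound threads take roughly as long as running the loop twice sequentially, because only one thread executes Python bytecode at a time.

```python
import time
from threading import Thread

def count(n):
    # Pure-Python CPU-bound loop: it holds the GIL while running
    i = 0
    while i < n:
        i += 1

N = 5_000_000

start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

threads = [Thread(target=count, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython, the threaded version is not ~2x faster: the GIL lets
# only one thread execute Python bytecode at any given moment
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```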

Processes

Good for ⚙️ CPU-bound functions

from multiprocessing import Process

def calculate_fibonacci():
    result = fibonacci(100)
    return result

def calculate_pi():
    result = digits_of_pi(100)
    return result

p1 = Process(target=calculate_fibonacci)
p2 = Process(target=calculate_pi)
p1.start()
p2.start()
p1.join()
p2.join()

Async I/O

Good for 🌐 I/O-bound functions

Web applications commonly involve reading/writing to files, databases, and calling external services via HTTP requests. All of that time spent waiting for each call to complete is time wasted not processing other stuff, so here is where async I/O comes in handy to improve the throughput of an application.

Unlike threads, where the Operating System's scheduler preemptively decides when to execute and interrupt functions, in this model functions cooperate: they are executed one at a time, but each explicitly yields control when it has completed its work or is waiting for some event to occur, such as I/O completion.

import aioextensions
# ^ Fluid Attacks library with asyncio utils to simplify its usage

async def get_data():
    result = await database.query()
    # ^ This will take a while. Keep going and we'll talk later
    return result

async def send_mail():
    result = await mailer.send()
    return result

# While get_data waits for its query, send_mail will be executed
# It's doing multiple things at the same time 🙌
await aioextensions.collect([
    get_data(),
    send_mail(),
])

So, in this way of doing things, you get the benefits of multitasking without worrying about issues such as thread safety, but it also comes with its challenges.
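If you only have the standard library at hand, the same pattern can be sketched with asyncio.gather, which also schedules coroutines concurrently on one event loop. The coroutine bodies here are stand-ins, using asyncio.sleep to simulate I/O waits:

```python
import asyncio

async def get_data():
    await asyncio.sleep(0.1)  # stand-in for a database query
    return "rows"

async def send_mail():
    await asyncio.sleep(0.1)  # stand-in for a mailer call
    return "sent"

async def main():
    # While one coroutine awaits its sleep, the other one runs
    return await asyncio.gather(get_data(), send_mail())

results = asyncio.run(main())
print(results)
```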

Challenges

The main challenge of cooperative multitasking is its reliance on cooperation from all functions within the application.

In this model, each function must voluntarily yield control to other functions when it’s not actively processing work. However, if a function fails to yield control when necessary, it can cause delays in processing other requests or even lead to the entire application server becoming unresponsive.

While these issues can be identified and mitigated, they represent inherent risks in this design. In cases where reliability takes precedence over performance requirements, this model may not be the most suitable choice.

Tip
Try running the following examples in your Python console and spot the difference.

import asyncio
import aioextensions

async def get_data():
    print("get_data started")
    await asyncio.sleep(5)
    print("get_data finished")

async def send_mail():
    print("send_mail started")
    await asyncio.sleep(3)
    print("send_mail finished")

# Both sleeps overlap, so this takes ~5 seconds in total
aioextensions.run(
    aioextensions.collect([
        get_data(),
        send_mail(),
    ])
)

import asyncio
import aioextensions
import time

async def get_data():
    print("get_data started")
    time.sleep(5)
    # ^ From the good old standard library, what could go wrong?
    print("get_data finished")

async def send_mail():
    print("send_mail started")
    await asyncio.sleep(3)
    print("send_mail finished")

# 😰 Oh no, get_data calls a ⌛️ blocking function
# send_mail will not even be triggered until it finishes!
# Total runtime: ~8 seconds instead of ~5
aioextensions.run(
    aioextensions.collect([
        get_data(),
        send_mail(),
    ])
)
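One way out of this trap, assuming Python 3.9+, is to push the blocking call onto a worker thread with the standard library's asyncio.to_thread, similar in spirit to aioextensions' in_thread. A minimal sketch with a shortened sleep:

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.2)  # a blocking call from the standard library
    return "done"

async def main():
    # asyncio.to_thread runs the blocking function in a worker
    # thread, so the event loop stays free to serve other tasks
    return await asyncio.to_thread(blocking_io)

print(asyncio.run(main()))
```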

FAQ

  1. Why was asyncio chosen for usage in our components?

     Asyncio is considered a good approach for applications with numerous 🌐 I/O-bound functions.

     Before 2019, our components used synchronous Python, but the decision was made to embrace asyncio to enable performance improvements.

     While some components still find threads and processes more suitable for their use case, asyncio offers advantages for I/O-bound tasks.

  2. What should I keep in mind when working on asyncio applications?

     1. Do not use ⌛️ blocking functions
     2. You do not use ⌛️ blocking functions
     3. Avoid using ⌛️ blocking functions

  3. For real, what are some tips to avoid breaking stuff?

     1. Be aware of ⌛️ blocking functions and either look for asyncio-compatible alternatives or wrap calls using in_thread to make them non-blocking.
     2. Many functions in Python's standard library are ⌛️ blocking, as it pre-dates the asyncio way of doing things.
     3. If you're using a third-party library, look for asyncio support in the docs, and if it doesn't have it, consider opening an issue to let the maintainers know.

  4. So, is in_thread as good as native asyncio? Why don't we just use it everywhere?

     Using threads introduces overhead, so it's advisable to use them only when necessary for specific 🌐 I/O-bound functions known to be ⌛️ blocking.

  5. But what exactly is a ⌛️ blocking function?

     A blocking function is any operation that takes too long before returning or yielding control (using await).

     Some commonly used examples include:
     1. requests
     2. urllib.request.urlopen
     3. time.sleep
     4. subprocess.run
     5. open (including file.read, file.write, and file.seek)

  6. Couldn't we just lint it in the CI pipeline?

     Linting for blocking functions can be challenging, since any function can be considered ⌛️ blocking if it takes long enough. One approach would be to keep a list of functions that are known to be ⌛️ blocking and break the build if one of them is used in the code. At the time of writing, the closest tool to a linter for this case would be flake8-async, which is likely better than nothing but falls short in detecting some cases.

  7. What happens if I use in_process to run 🌐 I/O-bound functions?

     Using multiple processes for I/O-bound functions incurs unnecessary overhead, as multiple processes only favor ⚙️ CPU-bound functions. Threads are more suitable for I/O-bound tasks and have less overhead.

  8. What happens if I declare a function as async def but never use await inside?

     async def do_something():
         return "Hello world"

     The function will still run like a normal function, but will have some (usually trivial) overhead, as Python generates additional code and treats it as a coroutine.
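To see that last point in action, here is a quick sketch: calling such a function does not execute its body immediately; it returns a coroutine object that must be driven by an event loop.

```python
import asyncio

async def do_something():
    return "Hello world"

coro = do_something()
# Even with no await inside, calling it yields a coroutine object,
# not the return value
print(type(coro).__name__)  # coroutine
coro.close()  # silence the "never awaited" warning
# Driving it on an event loop produces the value as usual
print(asyncio.run(do_something()))  # Hello world
```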

Further reading

Tip
Have an idea to simplify our architecture or noticed docs that could use some love? Don't hesitate to open an issue or submit improvements.