I think non-blocking I/O is largely a means to an end. It lets you implement a single-threaded event loop, which skips most of the overhead of a multithreaded design. It's not the non-blocking I/O itself that makes the system more scalable; it's that you avoid the heavy cost of context switching between many threads and the memory and CPU cost of creating them.
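To make that concrete, here is a minimal sketch of a single-threaded event loop in Python, built on the standard `selectors` module. The names `run_echo_loop` and `demo` are made up for illustration, and the "connections" are just local `socket.socketpair()` ends rather than real network clients. It also assumes each message is small enough to arrive in one `recv` and go out in one `send`, which a real server could not rely on.

```python
import selectors
import socket


def run_echo_loop(conns, expected):
    """Serve every connection from one thread until `expected` messages are handled."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        conn.setblocking(False)  # recv/send return immediately instead of blocking
        sel.register(conn, selectors.EVENT_READ)

    handled = 0
    while handled < expected:
        # One call waits on all sockets at once; no thread per connection.
        for key, _events in sel.select():
            conn = key.fileobj
            data = conn.recv(1024)
            if not data:  # peer closed its end
                sel.unregister(conn)
                continue
            # Assumption: the reply is tiny, so a single send() writes all of it.
            conn.send(data.upper())
            handled += 1

    sel.close()
    return handled


def demo():
    """Hypothetical driver: three local socket pairs stand in for three clients."""
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"hello {i}".encode())

    handled = run_echo_loop([server for _client, server in pairs], expected=len(pairs))

    replies = [client.recv(1024).decode() for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    return handled, replies


if __name__ == "__main__":
    print(demo())
```

All three connections are serviced by the same thread, so there are no per-connection stacks to allocate and no context switches between worker threads. The trade-off is that any handler that blocks or does heavy CPU work stalls every other connection.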