Hello. This year, I have been working on fixing various bugs reported by syzbot. There are three problems on which I am spending a lot of time: bugs in the loop module, bugs under OOM conditions, and making printk() messages readable.

The loop module, which syzbot uses as infrastructure for testing various filesystems, has bugs such as "crash due to not being thread-safe" and "silent deadlock due to hiding from lockdep inspection". A series of patches which should fix 2 bugs reported more than one year ago has just arriv****@linux***** ( https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/drivers/block/loop.c?h=next-20181109 ), and we are now monitoring whether these patches also fix other bugs where the loop module might be the culprit.

Regarding OOM, a desperate fight to fix problems has been in progress for many years. There is a desperate lack of participants, and a very bad situation where people do not consider the worst case or do not test patches at all. MM stands for Memory Management, but regarding Out Of Memory behavior it is far from "Management". Since patches are merged without their correctness being reviewed, I am developing reproducers one by one and fixing bugs in bug-fix patches. I finally managed to get a patch merged into 4.20-rc1 which fixes a problem where the system silently hangs because a workqueue does not sleep upon OOM ( https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/page_alloc.c?h=v4.20-rc1&id=15f570bf3d13aa94a97234538a5110d18df03aa3 ).

As a separate problem, since the MMF_OOM_SKIP flag is set too quickly, there remains a problem that the OOM killer needlessly kills additional processes. To mitigate this, an approach which hands over setting the MMF_OOM_SKIP flag to the exiting task was proposed. But since nobody is participating, that approach is stalled because the correctness of the patch cannot be proven.
Instead, I proposed a timeout-based approach whose correctness can be proven ( https://lkml.kernel.org/r/15400****@I-lov***** ), and the conflict between these two approaches remains unresolved.

Then, yet other problems such as "lockup caused by in-kernel memory leak" and "flooding bug caused by memcg OOM handling" were discovered because syzbot started testing unusual cases. In the former, the system can lock up when printk() is called massively to report an Out Of Memory situation, because printk() is a slow operation. In the latter, the console becomes unusable when the memcg OOM killer cannot find a process to terminate, because printk() is called forever to report that there is no killable process. Regarding this problem, a disagreement about how to reduce the frequency of printk() calls remains.

printk() is a kernel function which corresponds to printf() for userspace programs. printk() works fine if one line of message (a text string which ends with '\n') is printed by one printk() call. But when multiple printk() calls are used to print one line, and printk() is called concurrently from multiple threads, the printed messages become difficult to parse because multiple messages get mixed or more '\n' characters than needed are emitted. Since fuzz testing repeatedly attempts unusual behavior and/or intentionally passes unusual arguments, a lot of messages are printed. It is important that we can pick out the messages related to unexpected results so that we can tell that a problem occurred.

For a userspace program, the global variable "stdout" is shared only within that process. But in kernel space, all threads on the system share one "stdout"-equivalent global variable. Therefore, in order to prevent messages from being mixed, we need to pass a "FILE *fp"-equivalent variable to all functions which might call printk(). https://lkml.kernel.org/r/3786f****@i-lov***** is an example attempt at doing this.
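
As a rough userspace illustration of the "FILE *fp"-equivalent idea, the sketch below collects fragments in a caller-owned buffer and hands a line to the output function only once it is complete. All names here (struct line_buf, buf_printf(), emit_line()) are hypothetical and are not taken from the kernel or from the patch linked above; emit_line() merely stands in for a single printk() call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LINE_BUF_SIZE 256

/* Hypothetical caller-owned buffer; plays the role of "FILE *fp". */
struct line_buf {
	char data[LINE_BUF_SIZE];
	size_t len;
};

/* Bookkeeping so the behavior can be observed from outside. */
static int emit_count;
static char last_line[LINE_BUF_SIZE];

/* Stand-in for one printk() call: receives a whole line at once. */
static void emit_line(const char *line)
{
	emit_count++;
	memcpy(last_line, line, strlen(line) + 1);
	fputs(line, stdout);
}

/*
 * Append a formatted fragment to the buffer. The buffer is flushed
 * through a single emit_line() call when it ends with '\n' or when
 * it is full. Simplification: a '\n' in the middle of a fragment
 * does not trigger a flush by itself.
 */
static int buf_printf(struct line_buf *b, const char *fmt, ...)
{
	size_t room;
	va_list ap;
	int n;

	assert(b->len < sizeof(b->data));
	room = sizeof(b->data) - b->len;

	va_start(ap, fmt);
	n = vsnprintf(b->data + b->len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return n;

	if ((size_t)n >= room)
		b->len = sizeof(b->data) - 1;	/* fragment was truncated */
	else
		b->len += (size_t)n;

	if (b->len > 0 && (b->data[b->len - 1] == '\n' ||
			   b->len == sizeof(b->data) - 1)) {
		emit_line(b->data);
		b->len = 0;
		b->data[0] = '\0';
	}
	return n;
}
```

A caller would keep one struct line_buf per message and pass it down to every helper that contributes to the line, for example buf_printf(&b, "CPU: %d ", cpu) followed by buf_printf(&b, "PID: %d\n", pid). Only the second call reaches emit_line(), so fragments from other threads cannot land in the middle of the line.
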
(Or, we need to do "snprintf()"-equivalent processing before calling printk() so that one line of message is printed by one printk() call.) Since the kernel is a huge program, you can easily imagine how difficult it is to replace printf() with fprintf(fp) tree-wide. Since the benefit is small relative to the huge modification, I think it is unlikely that printk() users would update their code to use a new API even if the printk() subsystem offered one.

Therefore, I have just proposed a different approach ( https://lkml.kernel.org/r/07dcb****@i-lov***** ) that behaves as if there were multiple "stdout"-equivalent variables, by distinguishing printk() callers from within printk().

Well, where will these discussions end up? :-)