Managing Multithreaded File Writing in Python: Strategies for Concurrent Access

How can multiple Python threads write to the same file?

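The simplest solution for multithreaded file writing in Python is to let only one thread write to the file. The worker threads put their results on a queue, and a single writer thread takes them off the queue and writes them out: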

import codecs
import os
import queue as Queue  # Python 3 name; the module is called Queue in Python 2
import threading

class PrintThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def printfiles(self, p):
        for path, dirs, files in os.walk(p):
            for f in files:
                print(f, file=output)

    def run(self):
        while True:
            result = self.queue.get()
            self.printfiles(result)
            self.queue.task_done()

class ProcessThread(threading.Thread):
    def __init__(self, in_queue, out_queue):
        threading.Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            path = self.in_queue.get()
            result = self.process(path)
            self.out_queue.put(result)
            self.in_queue.task_done()

    def process(self, path):
        # Do the processing job here. As a placeholder, pass the path
        # through unchanged so the writer thread walks it.
        return path

pathqueue = Queue.Queue()
resultqueue = Queue.Queue()
paths = getThisFromSomeWhere()

output = codecs.open('file', 'a')

# spawn threads to process
for i in range(5):
    t = ProcessThread(pathqueue, resultqueue)
    t.daemon = True  # setDaemon() is deprecated since Python 3.10
    t.start()

# spawn the single thread that writes to the file
t = PrintThread(resultqueue)
t.daemon = True
t.start()

# add paths to queue
for path in paths:
    pathqueue.put(path)

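# wait until every path has been processed and every result written
pathqueue.join()
resultqueue.join()
output.close()  # flush buffered output before the program exits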
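The example above uses daemon threads, so the workers only stop when the interpreter exits. As a minimal, self-contained sketch of the same single-writer idea, the variant below shuts the writer down cleanly with a sentinel value. The names `writer`, `worker`, `run_demo`, and the `STOP` sentinel are hypothetical and chosen only for illustration.

```python
import os
import queue
import tempfile
import threading

STOP = object()  # hypothetical sentinel telling the writer to finish


def writer(result_queue, file_name):
    """Only this thread ever touches the file."""
    with open(file_name, "a") as out:
        while True:
            item = result_queue.get()
            if item is STOP:
                break
            out.write(item + "\n")


def worker(worker_id, count, result_queue):
    """Produce lines and hand them to the writer instead of writing them."""
    for n in range(count):
        result_queue.put("worker %d line %d" % (worker_id, n))


def run_demo(file_name, workers=4, count=100):
    results = queue.Queue()
    writer_thread = threading.Thread(target=writer, args=(results, file_name))
    writer_thread.start()
    producers = [threading.Thread(target=worker, args=(i, count, results))
                 for i in range(workers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    results.put(STOP)  # all producers are done, so nothing follows the sentinel
    writer_thread.join()
    with open(file_name) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    lines = run_demo(path)
    print(len(lines))
```

Because the sentinel is queued only after every producer has been joined, the writer is guaranteed to have seen all results before it exits.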

Writing to the same file from multiple threads with print:

The fact that you never see jumbled text on the same line, or a new line in the middle of a line, is a clue that you don't actually need to synchronize appending to the file. The problem is that you use print to write to a single file handle. I suspect that print actually performs two operations on the file handle in one call, and that those operations race between threads. Basically, print does something like this:

file_handle.write('whatever_text_you_pass_it')
file_handle.write(os.linesep)

Because different threads do this simultaneously on the same file handle, sometimes one thread gets its first write in, then another thread gets its first write in, and you end up with two line breaks in a row. Or really any permutation of these.

The easiest way around this is to stop using print and call write directly. Try something like this:

output.write(f + os.linesep)
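To make the difference concrete, here is a small sketch that builds the whole line first and emits it with a single `write` call. The `threading.Lock` is an extra precaution that goes beyond what the text suggests, and the helper names `write_line` and `demo`, as well as the in-memory `io.StringIO` stand-in for the file, are hypothetical.

```python
import io
import threading

_write_lock = threading.Lock()  # assumption: one lock shared by all writers


def write_line(handle, text):
    """Write text plus its line terminator as one call, under a lock."""
    line = text + "\n"  # build the complete line before touching the handle
    with _write_lock:
        handle.write(line)


def demo(thread_count=5, lines_per_thread=200):
    buffer = io.StringIO()  # in-memory stand-in for the shared file handle

    def work(i):
        for _ in range(lines_per_thread):
            write_line(buffer, "thread %d" % i)

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffer.getvalue().splitlines()


if __name__ == "__main__":
    lines = demo()
    # Every entry should be a whole "thread N" line, with no blank lines.
    print(all(line.startswith("thread ") for line in lines), len(lines))
```

With the text and its terminator joined into one string, no other thread can slip a write in between them.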

This still seems dangerous to me. I'm not sure what guarantees you can expect when all threads share the same file handle object and contend for its internal buffer. Personally, I'd sidestep the whole issue and have each thread open its own file handle. Also note that this works because the writes are line-buffered, so every flush to the file ends on a line terminator. To force line buffering, pass 1 as the third argument to open. You can test it like this:

#!/usr/bin/env python
import sys
import threading

def hello(file_name, message, count):
    # each thread opens its own handle; the third argument (1) forces line buffering
    with open(file_name, 'a', 1) as f:
        for i in range(count):
            # text mode translates '\n' to the platform line ending
            f.write(message + '\n')

if __name__ == '__main__':
    # start a file
    with open('some.txt', 'w') as f:
        f.write('this is the beginning\n')
    # make 10 threads write a million lines to the same file at the same time
    threads = []
    for i in range(10):
        threads.append(threading.Thread(target=hello, args=('some.txt', 'hey im thread %d' % i, 1000000)))
        threads[-1].start()
    for t in threads:
        t.join()
    # collect the distinct lines that ended up in the file
    uniq_lines = set()
    with open('some.txt', 'r') as f:
        for l in f:
            uniq_lines.add(l)
    # any garbled or blank line would show up as an extra entry here
    for u in uniq_lines:
        sys.stdout.write(u)

The output looks like this (the order varies, because the lines come from iterating over a set):

hey im thread 6
hey im thread 7
hey im thread 9
hey im thread 8
hey im thread 3
this is the beginning
hey im thread 5
hey im thread 4
hey im thread 1
hey im thread 0
hey im thread 2
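The set in the test script only shows which distinct lines appear; it would not reveal a line that went missing or was written twice. As a stricter check, one could count how often each line occurs. The sketch below assumes a `some.txt` laid out as in the test script; the function names `count_lines` and `find_mismatches` are hypothetical.

```python
import collections
import os


def count_lines(file_name):
    """Map each distinct line (without its newline) to how often it occurs."""
    with open(file_name) as f:
        return collections.Counter(line.rstrip("\n") for line in f)


def find_mismatches(counts, messages, expected_count):
    """Return (message, actual_count) pairs that differ from expected_count."""
    return [(m, counts[m]) for m in messages if counts[m] != expected_count]


if __name__ == "__main__":
    # Assumes some.txt was produced by the test script:
    # 10 threads writing one million lines each.
    if os.path.exists("some.txt"):
        counts = count_lines("some.txt")
        messages = ["hey im thread %d" % i for i in range(10)]
        # An empty list means every thread's line appears exactly as often as expected.
        print(find_mismatches(counts, messages, 1000000))
```

An empty result means no line was lost or duplicated; any pair in the list names the message and the count actually found.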

That's all for writing to the same file from multiple Python threads. I hope it helps; if you have any questions, please leave a comment below.