How to read a file line by line in Python

Introduction

A common task in Python programming is to open a file and parse its contents. What do you do when the file is very large, say a few gigabytes or more? The answer is to read the file in chunks: read one chunk, process it, and then free it from memory so that you can move on to the next chunk, until the entire file has been processed. While it's up to you to determine a suitable chunk size, for many applications it's appropriate to process one line at a time. In this article, we'll look at code examples that demonstrate reading a file line by line in Python. If you'd like to try these examples yourself, the code used in this article is available in the accompanying GitHub repository.

  • Basic file IO in Python
  • Read a file line by line with readline()
  • Read a file line by line with readlines()
  • Read a file line by line with a for loop (the best way!)
  • An application that reads a file line by line

Basic file IO in Python

Python is an excellent general-purpose programming language, and its standard library of built-in functions and modules offers a number of very useful file IO features. The built-in open() function is used to open a file object for reading or writing. Here's how to open a file with it:

fp = open('path/to/file.txt', 'r')

As shown above, open() accepts multiple parameters. We'll focus on two of them. The first is a positional string parameter that represents the path to the file to be opened. The second (optional) parameter is also a string; it specifies the mode of interaction you intend to use on the file object returned by the function call. The following table lists the most common modes, with 'r' (read) being the default:

mode  description
r     Open for reading plain text
w     Open for writing plain text (an existing file is truncated)
a     Open for appending plain text (the file is created if it does not exist)
rb    Open for reading binary data
wb    Open for writing binary data

Once you have written or read all the required data in a file object, you need to close the file so that its resources can be released back to the operating system the code is running on:

fp.close()

Note: It's always good practice to close a file object, but it's a task that's easy to forget. While you can always remember to call close() on the file object, there's a more elegant alternative that opens the file object and makes sure the Python interpreter cleans up after use:

with open('path/to/file.txt') as fp:
    # Do stuff with fp
    pass

By simply using the with keyword (introduced in Python 2.5) in the code that opens the file object, Python will do something similar to the following code, which ensures that the file object is closed after use no matter what happens:

fp = open('path/to/file.txt')
try:
    # Do stuff with fp
    pass
finally:
    fp.close()

Either of these two approaches is suitable, with the first being more Pythonic.

The file object returned by open() has three common explicit methods for reading data: read(), readline(), and readlines(). The read() method reads all the data into a single string, which is useful for smaller files where you want to manipulate the text of the entire file. Then there's readline(), a useful way to read a single line at a time and return it as a string. The last explicit method, readlines(), reads all the lines of the file and returns them as a list of strings.

Note: For the rest of this article, we will work with the text of Homer's Iliad, which can be found at Gutenberg.org and in the GitHub repository where the code for this article is located.
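To make the difference between read(), readline(), and readlines() concrete, here is a minimal sketch. It assumes a small throwaway file (sample.txt, a hypothetical name) created in a temporary directory so that the snippet runs on its own; neither the file name nor its contents come from the article's repository.

```python
import os
import tempfile

# Hypothetical three-line sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fp:
    fp.write('alpha\nbeta\ngamma\n')

with open(path) as fp:
    whole = fp.read()            # the entire file as one string

with open(path) as fp:
    first = fp.readline()        # one line per call, newline included
    second = fp.readline()

with open(path) as fp:
    all_lines = fp.readlines()   # every line, as a list of strings

print(repr(whole))      # 'alpha\nbeta\ngamma\n'
print(repr(first))      # 'alpha\n'
print(all_lines)        # ['alpha\n', 'beta\n', 'gamma\n']
```

Note that readline() and readlines() keep the trailing newline on each line, which is why the later examples call strip() before printing.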

Use readline() to read files line by line in Python

Let's start with the readline() method, which reads a single line at a time and requires us to keep a counter and increment it ourselves:

filepath = 'Iliad.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        print("Line {}: {}".format(cnt, line.strip()))
        line = fp.readline()
        cnt += 1

This code snippet opens a file object whose reference is stored in fp, then reads the file one line at a time by calling readline() on that file object in a while loop. It simply prints each line to the console. Run this code and you should see something like this:

...
Line 567: exceedingly trifling. We have no remaining inscription earlier than the
Line 568: fortieth Olympiad, and the early inscriptions are rude and unskilfully
Line 569: executed; nor can we even assure ourselves whether Archilochus, Simonides
Line 570: of Amorgus, Kallinus, Tyrtaeus, Xanthus, and the other early elegiac and
Line 571: lyric poets, committed their compositions to writing, or at what time the
Line 572: practice of doing so became familiar. The first positive ground which
Line 573: authorizes us to presume the existence of a manuscript of Homer, is in the
Line 574: famous ordinance of Solon, with regard to the rhapsodies at the
Line 575: Panathenaea: but for what length of time previously manuscripts had
Line 576: existed, we are unable to say.
...

Still, this approach is crude and overly explicit, and certainly not very Pythonic. We can use the readlines() method to make the code more concise.
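As a side note on the readline() loop, one way to avoid calling readline() in two places is an assignment expression. This is only a sketch and assumes Python 3.8 or later; the sample file and its contents are hypothetical and exist solely so the snippet is self-contained. It also shows why the loop does not stop at blank lines: a blank line is read as '\n', which is truthy, while only the end of the file yields an empty string.

```python
import os
import tempfile

# Hypothetical sample file containing a blank line in the middle.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fp:
    fp.write('one\n\nthree\n')

numbered = []
with open(path) as fp:
    cnt = 1
    # readline() returns '' only at end of file, which ends the loop;
    # a blank line comes back as '\n' and is still processed.
    while (line := fp.readline()):
        numbered.append("Line {}: {}".format(cnt, line.strip()))
        cnt += 1

print('\n'.join(numbered))
```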

Use readlines() to read files line by line in Python

The readlines() method reads all the lines and stores them in a list. We can then iterate over that list and, using enumerate(), create an index for each line for our convenience:

with open('Iliad.txt', 'r') as file:
    lines = file.readlines()

for index, line in enumerate(lines):
    print("Line {}: {}".format(index, line.strip()))

This results in:

...
Line 160: INTRODUCTION.
Line 161:
Line 162:
Line 163: Scepticism is as much the result of knowledge, as knowledge is of
Line 164: scepticism. To be content with what we at present know, is, for the most
Line 165: part, to shut our ears against conviction; since, from the very gradual
Line 166: character of our education, we must continually forget, and emancipate
Line 167: ourselves from, knowledge previously acquired; we must set aside old
Line 168: notions and embrace fresh ones; and, as we learn, we must be daily
Line 169: unlearning something which it has cost us no small labour and anxiety to
Line 170: acquire.
...

Now, while this is much better, we don't even need to call readlines() to achieve the same functionality. This is the traditional way to read a file line by line, but there is also a more modern and shorter way.
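Two details of the readlines() approach are worth a quick sketch. First, enumerate() counts from 0 by default, so the first line of the file is labeled "Line 0"; passing start=1 gives conventional line numbers. Second, readlines() holds the whole file in memory as a list, which matters for very large files. The sample file below is hypothetical and only there to keep the snippet runnable on its own.

```python
import os
import tempfile

# Hypothetical three-line sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fp:
    fp.write('a\nb\nc\n')

with open(path) as fp:
    lines = fp.readlines()   # the entire file is now in memory as a list

labels = []
# start=1 makes the numbering begin at 1 instead of the default 0.
for number, line in enumerate(lines, start=1):
    labels.append("Line {}: {}".format(number, line.strip()))

print(labels)   # ['Line 1: a', 'Line 2: b', 'Line 3: c']
```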

Use a for loop to read files line by line: the most Pythonic approach

The File object returned by open() is itself iterable. We don't need to extract the lines with readlines() at all; we can iterate over the returned object directly. This also makes it easy to enumerate() it, so that we can write the line number in each print() statement. This is the shortest and most Pythonic way to solve the problem, and the approach most people favor:

with open('Iliad.txt') as f:
    for index, line in enumerate(f):
        print("Line {}: {}".format(index, line.strip()))

This results in:

...
Line 277: Mentes, from Leucadia, the modern Santa Maura, who evinced a knowledge and
Line 278: intelligence rarely found in those times, persuaded Melesigenes to close
Line 279: his school, and accompany him on his travels. He promised not only to pay
Line 280: his expenses, but to furnish him with a further stipend, urging, that,
Line 281: "While he was yet young, it was fitting that he should see with his own
Line 282: eyes the countries and cities which might hereafter be the subjects of his
Line 283: discourses." Melesigenes consented, and set out with his patron,
Line 284: "examining all the curiosities of the countries they visited, and
...

Here, we take advantage of Python's built-in ability to iterate over iterable objects simply by using a for loop. If you'd like to learn more about Python's built-in tools for working with iterables, we've got you covered:

  • Python's itertools: count(), cycle() and chain()
  • Python's itertools: filter(), islice(), map() and zip()
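Because a file object is its own iterator, it also combines with itertools. As a sketch of the chunked processing mentioned in the introduction, the helper below reads a file a few lines at a time using islice(). The name read_in_batches(), the batch size, and the sample file are all hypothetical and chosen only for illustration.

```python
import os
import tempfile
from itertools import islice

def read_in_batches(fp, batch_size):
    """Yield lists of up to batch_size lines from an open file object."""
    while True:
        # islice() pulls at most batch_size lines and leaves the file
        # positioned at the next unread line.
        batch = list(islice(fp, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical five-line sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fp:
    fp.write(''.join('line {}\n'.format(i) for i in range(1, 6)))

with open(path) as fp:
    batches = [[line.strip() for line in batch]
               for batch in read_in_batches(fp, 2)]

print(batches)
# [['line 1', 'line 2'], ['line 3', 'line 4'], ['line 5']]
```

Only one batch is held in memory at a time while iterating, so the same pattern should scale to files that are too large to load at once.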

An application that reads files line by line

How can you use this in practice? Most NLP applications process large amounts of data, and most of the time it isn't wise to read the entire corpus into memory. While rudimentary, you can write a solution from scratch that counts the frequency of words without using any external libraries. Let's write a simple script that loads a file, reads it line by line, and counts the frequency of word occurrences, printing the 10 most frequent words and how many times they occur:

import sys
import os

def main():
    # The file to analyze is passed as the first command-line argument
    filepath = sys.argv[1]
    if not os.path.isfile(filepath):
        print("File path {} does not exist. Exiting...".format(filepath))
        sys.exit(1)

    bag_of_words = {}
    with open(filepath) as fp:
        # Iterate over the file one line at a time
        for line in fp:
            record_word_cnt(line.strip().split(' '), bag_of_words)
    sorted_words = order_bag_of_words(bag_of_words, desc=True)
    print("Most frequent 10 words {}".format(sorted_words[:10]))

def order_bag_of_words(bag_of_words, desc=False):
    # Build (word, count) tuples and sort them by count
    words = [(word, cnt) for word, cnt in bag_of_words.items()]
    return sorted(words, key=lambda x: x[1], reverse=desc)

def record_word_cnt(words, bag_of_words):
    # Count each non-empty word, ignoring case
    for word in words:
        if word != '':
            if word.lower() in bag_of_words:
                bag_of_words[word.lower()] += 1
            else:
                bag_of_words[word.lower()] = 1

if __name__ == '__main__':
    main()

The script uses the os module to make sure that the file we are trying to read actually exists. If it does, the file is read line by line; each line is split on spaces and the resulting words are passed to the record_word_cnt() function, which adds them to the bag_of_words dictionary. Once all the lines have been recorded in the dictionary, we order it with order_bag_of_words(), which returns a list of tuples in the (word, word_count) format, sorted by word count. Finally, we print the ten most frequent words.

Typically, you would create a bag-of-words model for this using a library like NLTK, but this implementation is sufficient. Let's run the script and pass it our Iliad.txt:

$ python app.py Iliad.txt

This results in:

Most frequent 10 words [('the', 15633), ('and', 6959), ('of', 5237), ('to', 4449), ('his', 3440), ('in', 3158), ('with', 2445), ('a', 2297), ('he', 1635), ('from', 1418)]

If you'd like to read more about NLP, we've got a series of guides on a variety of tasks: Natural Language Processing in Python.
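For comparison, the standard library's collections.Counter can do the same bookkeeping with less code. This is an alternative sketch, not the article's script: the sample file and its contents are hypothetical, and split() with no argument splits on any run of whitespace, so results can differ slightly from split(' ') on text containing tabs or repeated spaces.

```python
import os
import tempfile
from collections import Counter

# Hypothetical two-line sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fp:
    fp.write('the cat and the dog\nThe cat sat\n')

word_counts = Counter()
with open(path) as fp:
    # Still one line at a time, so the whole file is never in memory.
    for line in fp:
        word_counts.update(word.lower() for word in line.split())

top = word_counts.most_common(2)
print(top)   # [('the', 3), ('cat', 2)]
```

most_common(n) returns the n highest counts as (word, count) tuples, which matches the shape of the list returned by order_bag_of_words().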

Summary

In this article, we explored several ways to read a file line by line in Python and built a rudimentary bag-of-words model to count the frequency of words in a given file.