Read, Write, and Parse CSV Files in Python Easily

Learn how to read, process, and parse CSV data using Python. This guide covers the essential csv library and demonstrates advanced CSV parsing with the pandas library.

The csv module provides classes for reading and writing tabular data in CSV format. Programmers can utilise this feature to provide the desired format for data to be written in Excel, or to extract data from a file created by Excel, without requiring knowledge of the specific specifics of the CSV format employed by Excel. Programmers have the ability to design their own specialised CSV formats or specify the CSV formats that are understood by other programs.

The reader and writer objects of the csv module are used to read and write sequences. Programmers have the ability to both read and write data in dictionary format by utilising the DictReader and DictWriter classes.

The Importance of CSV Handling Skills

Python developers must possess the ability to read and write CSV files. CSV files are ubiquitous in the data world, frequently employed for the storage and exchange of information between various applications and systems.

The capacity to manipulate CSV files enables the efficient management of data imports and exports, automation of data processing tasks, and integration with a variety of data sources, regardless of whether one is employed in data science, web development, or software engineering.

Developers can achieve more robust and efficient data management capabilities within their applications by mastering the csv module and comprehending its nuances, which enables them to interact with CSV files in a seamless manner. Additionally, these abilities are indispensable for numerous Python development positions, including data cleansing, data analysis, and reporting.

Hiring skilled Python developers might be difficult due to the increasing demand for their services across a variety of sectors. Skilled developers are in high demand due to Python's extensive use in automation, data analysis, machine learning, and web development, as well as its versatility.

What Is a CSV File?

A CSV file, often known as a Comma Separated Values file, is a plain text file that organises tabular data using a specified structure. As a plain text file, it can only consist of real text data, namely readable ASCII or Unicode characters.

The nomenclature of a CSV file reveals its structure. Typically, CSV files include a comma as a delimiter to separate individual data values. This is the visual representation of the structure:

column 1 name,column 2 name, column 3 name
first row data 1,first row data 2,first row data 3
second row data 1,second row data 2,second row data 3
...

Observe the clear distinction between each individual data element, which is denoted by a comma. Typically, the initial line specifies each data element, which is essentially the title of a data column. Each consecutive line following that contains actual data and is restricted only by the limitations of file capacity.

Typically, the letter that separates elements is referred to as a delimiter, and the comma is not the sole delimiter employed. Additional often used delimiters include the tab (\t), colon (:) and semicolon (;) characters. In order to accurately parse a CSV file, it is necessary to determine the specific delimiter being used.

csv.reader (csvfile, dialect='excel', **fmtparams)

Provide a reader object that will handle lines from the specified csvfile. A csvfile should be a collection of strings that can be iterated over, with each string conforming to the specified csv format of the reader. A csvfile often refers to an object or list that resembles a file. If the variable "csvfile" is an object representing a file, it should be opened with the parameter "newline=''' to handle line endings correctly.

[1] A dialect parameter can be provided as an option to specify a collection of parameters that are specific to a particular CSV dialect. It might be an occurrence of a subclass of the Dialect class or one of the strings produced by the list_dialects() method. The user can provide additional fmtparams keyword arguments to modify specific formatting parameters in the current dialect. To obtain comprehensive information on the dialect and formatting factors, please refer to the section under "Dialects and Formatting Parameters''.

The csv file returns each row as a list of strings. Data type conversion is not done automatically unless the QUOTE_NONNUMERIC format option is provided. If this option is used, unquoted fields are converted into floating-point numbers.

An illustrative instance of usage:

import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))

csv.writer(csv file, dialect='excel', **fmtparams)

Retrieve a writer object that is responsible for transforming the user's data into delimited strings on the specified file-like object. The csv file parameter can accept any object that implements a write() function. If the variable "csvfile" is an object of the "file" class, it should be opened using the parameter "newline=''" [1]. A dialect parameter can be provided optionally to specify a collection of parameters that are specific to a particular CSV dialect. It might be an example of a subclass of the Dialect class or one of the strings produced by the list_dialects() method.

The remaining optional fmtparams keyword arguments might be used to supersede specific formatting parameters in the existing dialect. To obtain comprehensive information on dialects and formatting parameters, please refer to the Dialects and Formatting Parameters section. In order to facilitate the interaction with modules that implement the DB API, the value None is represented as an empty string. Although this transformation cannot be reversed, it simplifies the process of exporting SQL NULL data values to CSV files without the need to preprocess the data obtained from a cursor.Retrieve the call. Before being written, any non-string data is converted into strings using the str() function.

Here is a little example of how to use it:

import csv
with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

Where Do CSV Files Come From?

CSV files are typically generated by software applications that manage substantial volumes of data. They serve as a simple method for exporting data from spreadsheets and databases, as well as importing or utilising it in other programs. For instance, you may export the outcomes of a data mining tool to a CSV file and subsequently import it into a spreadsheet for the purpose of analysing the data, creating graphs for a presentation, or preparing a report for publishing.

Working with CSV files programmatically is quite straightforward. Any programming language that has built-in capability for reading text files and manipulating strings, such as Python, may natively handle CSV files.

Parsing CSV Files With Python’s Built-in CSV Library

The csv library offers features for both reading from and writing to CSV files. It is specifically designed to be compatible with CSV files created by Excel, and can be simply adjusted to operate with different types of CSV formats. The csv library comprises many objects and scripts that facilitate the reading, writing, and manipulation of data from and to CSV files.

Reading CSV Files with csv

The process of extracting data from a CSV file is accomplished by utilising the reader object. The CSV file is accessed as a text file using Python's built-in open() method, which returns a file object. The data is subsequently sent to the reader, which performs the demanding tasks.

Below is the file named "employee_birthday.txt":

name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March

Here’s code to read it:

import csv

with open('employee_birthday.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')

This results in the following output:

Column names are name, department, birthday month
John Smith works in the Accounting department, and was born in November.
Erica Meyers works in the IT department, and was born in March.
Processed 3 lines.

Every row retrieved by the reader consists of a collection of String components that contain the data obtained after eliminating the delimiters. The initial row retrieved includes the names of the columns, and this is managed in a unique manner.

Reading CSV Files Into a Dictionary with csv

Instead of handling a collection of separate String items, you have the option to simply import CSV data into a dictionary, specifically an Ordered Dictionary.

The input file, employee_birthday.txt, remains the same.

name,department,birthday month
John Smith,Accounting,November
Erica Meyers,IT,March

Here’s the code to read it in as a dictionary this time:

import csv

with open('employee_birthday.txt', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')

This results in the same output as before:

What is the origin of the dictionary keys? The initial row of the CSV file is presumed to consist of the primary identifiers that will be used to construct the dictionary. If the CSV file does not contain these keys, you can define your own keys by specifying the optional parameter "fieldnames" to a list that includes them.

High-paying jobs, flexible work, and top companies await! Become part of Index.dev’s elite talent network and secure remote projects with excellent pay!

Optional Python CSV reader Parameters

The reader object may accommodate many formats of CSV files by supplying extra arguments, a few of which are seen below:

A delimiter is a character that is used to separate each field. The default delimiter is the comma (',').

The parameter "quotechar" determines the character that is used to enclose fields that include the delimiter character. The default value is a double quotation mark (' " ').

The escape character is the character used to escape the delimiter character when quotes are not utilised. The default setting does not include an escape character.

Further clarification is necessary for these characteristics. Assume that you are now dealing with the employee_addresses.txt file.

name,address,date joined|
john smith,1132 Anywhere Lane Hoboken NJ, 07030,Jan 4
erica meyers,1234 Smith Lane Hoboken NJ, 07030,March 2

The CSV file consists of three fields: name, address, and date joined, separated by commas. The issue lies in the fact that the data in the address field includes a comma, which is used to indicate the presence of the zip code.

There exist three distinct approaches to manage this particular circumstance:

Utilise an alternative separator: In this manner, the comma can be employed without any risk inside the data itself. The delimiter optional parameter is used to indicate the new delimiter.
Enclose the data within quotation marks: The distinctive characteristics of your selected delimiter are disregarded within quoted strings. Hence, you have the ability to designate the character utilised for quotation marks by utilising the optional parameter called quotechar. Provided that the character is not included in the data, you will be unaffected.
Ensure that the delimiter characters in the data are properly escaped: Escape characters function in the same way that they do in format strings, rendering the interpretation of the escaped character (in this instance, the delimiter) void. The specification of the escape character, if needed, is required through the optional parameter escapechar.

Reading CSV Files Using Python

The csv module provides the csv.reader() method specifically designed for reading CSV files. It is utilised in conjunction with objects, including file objects, that are generated using Python's built-in open() method.

When a file object is obtained via a call to open(), csv.reader() will provide a reader object. The reader object facilitates the iteration over each line of CSV data, with each row being delivered as a list of strings.

For illustrative purposes, let's consider an example:

import csv

with open('employees.csv', newline='') as file_obj:
    reader_obj = csv.reader(file_obj)
    for row in reader_obj:
        print(row)

Here’s the output of the code above:

['Employee Id', 'First Name', 'Gender', 'Start Date', 'Last Login Time', 'Salary', 'Bonus %', 'Senior Management', 'Team']
['1', 'Douglas', 'Male', '8/6/1993', '12:42 PM', '', '6.945', 'TRUE', 'Marketing']
['2', 'Thomas', 'Male', '3/31/1996', '6:53 AM', '61933', '4.17', '', '']
['3', 'Maria', 'Female', '4/23/1993', '11:17 AM', '', '11.858', 'FALSE', 'Finance']
['4', 'Jerry', 'Male', '3/4/2005', '1:00 PM', '138705', '9.34', '', 'Finance']
['5', 'Larry', 'Male', '1/24/1998', '4:47 PM', '101004', '1.389', 'TRUE', 'Client Services']

The initial code line opens the employees.csv file and then utilises the csv.reader() function to parse it, resulting in the creation of a reader object. The reader object is iterated over using a basic for loop, retrieving a list of data from each row of the employees.csv file, beginning at the top.

Writing CSV Files With csv

To write to a CSV file, you can utilise a writer object and the .write_row() function.

import csv

with open('employee_file.csv', mode='w') as employee_file:
    employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    employee_writer.writerow(['John Smith', 'Accounting', 'November'])
    employee_writer.writerow(['Erica Meyers', 'IT', 'March'])

The optional argument "quotechar" specifies the character that the writer should use to enclose fields while writing. The usage of quotation is specified by the optional argument called "quoting".

If the quoting parameter is set to csv.QUOTE_MINIMAL, the .writerow() function will enclose fields in quotes only if they include the delimiter or the quotation character. This is the standard or typical situation.
If the quoting parameter is set to csv.QUOTE_ALL, then the .writerow() function will enclose all fields in quotation marks.
If the quoting parameter is set to csv.QUOTE_NONNUMERIC, the .writerow() function will enclose all fields that contain text data in quotation marks and convert all numeric fields to the float data type.
If the quoting parameter is set to csv.QUOTE_NONE, the .writerow() function will escape delimiters instead of enclosing them in quotation marks. In this scenario, it is necessary to specify a value for the escapechar optional argument.

Examining the file in its original format reveals that the file is structured in the following manner:

John Smith,Accounting,November
Erica Meyers,IT,March

Writing CSV File From a Dictionary With csv

Given that you have the capability to input our data into a dictionary, it is reasonable to expect that you should also have the ability to output it from a dictionary:

import csv

with open('employee_file2.csv', mode='w') as csv_file:
    fieldnames = ['emp_name', 'dept', 'birth_month']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'emp_name': 'John Smith', 'dept': 'Accounting', 'birth_month': 'November'})
    writer.writerow({'emp_name': 'Erica Meyers', 'dept': 'IT', 'birth_month': 'March'})

Contrary to DictReader, the fieldnames option is obligatory when creating a dictionary. Without a specified list of fieldnames, the DictWriter is unable to determine which keys to use for retrieving values from the dictionaries. Additionally, it utilises the keys in fieldnames to generate the initial row as column names.

The provided code produces the corresponding output file:

emp_name,dept,birth_month
John Smith,Accounting,November
Erica Meyers,IT,March

How to Write CSV Files Using Python

In addition to importing data from CSV files, Python also allows us to export data to these files. The csv.writer() method allows us to write data in CSV format. Once the file is opened in write mode, the csv.writer() function yields a writer object that transforms the given data into delimited strings on the specified file object.

The writer object provides the writerow() method for writing a single row, which consists of an iterable of strings or integers representing comma-separated data. On the other hand, the writerows() function is used to write many rows simultaneously. The writerow() and writerows() methods are the sole alternatives for writing data to a CSV file.

To accomplish the same outcome, the list objects utilised in the aforementioned code snippet may be organised into a 2D list and sent as an input to the writerows() function of the writer object.

Upon execution of the with statement, a CSV file named "products.csv" is generated in the present working directory, containing the data separated by commas.

Allow me to provide you with an illustration:

import csv

with open('products.csv', 'w', newline='') as file_obj:
    writer_obj = csv.writer(file_obj)
    writer_obj.writerow(['Product Name', 'Price', 'Quantity', 'SKU Number' ])
    writer_obj.writerow(['Rice', 80, 35, 'RI59023'])
    writer_obj.writerow(['Curry', 2, 200, 'CY13890'])
    writer_obj.writerow(['Milk', 9.5, 315, 'MK10204'])

Here’s the output of the code above:

Product Name,Price,Quantity,SKU Number
Rice,80,35,RI59023
Curry,2,200,CY13890
Milk,9.5,315,MK10204

Conclusion

The Comma Separated Values (CSV) format is an important way to share data and is often used to import and export data between spreadsheets and databases. The built-in csv function in Python makes it easy to work with CSV files without having to make your own processor. No matter what you need, the csv.reader and csv.DictReader classes let you read data from CSV files as either lists or dictionaries. When writing data to CSV files in an organised way, the csv.writer and csv.DictWriter classes come in handy. This tool is great for most everyday situations where you need to change a CSV file because it is flexible and easy to use.

The pandas library has advanced CSV parsing features that let you handle and analyse more complicated data. This makes it a good choice for large-scale data manipulation jobs. But Python's csv package is more than enough for most situations. You can easily work with CSV files from different sources by using the csv module's many features and choices.

For example, you can manage different delimiters, quote characters, and escape characters. This makes it possible for data to be easily shared between different software programs, so you can easily import, export, and change your data. After reading this piece, you should be able to handle CSV files in Python without any problems.

Ready to build amazing things with Python? Join Index.dev's developer network and access high-paying, long-term remote projects.

Got a project? Let’s talk! Index.dev can assist you in selecting and hiring highly-skilled Python developers who understand how to manage data structures and functions in the construction of high-performance applications.

Index developers can help you:

Use the best performing algorithms and data structures of processing.
Take advantage of Python and its attributes and libraries for efficient data manipulation.
Use clean, readable, and scalable code in order to solve any challenge.

With a vast database of resources and stringent vetting techniques, you get Python specialists with good command of data manipulation, data structures, and functions.

Blog