{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "All the IPython Notebooks in this lecture series by Dr. Milan Parmar are available @ **[GitHub](https://github.com/milaan9/05_Python_Files)**\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Python File I/O\n", "\n", "In this class, you'll learn about Python file operations. More specifically, you'll learn about opening a file, reading from it, writing to it, closing it, and the file methods that you should be aware of." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Files\n", "\n", "Files are named locations on disk to store related information. They are used to permanently store data in non-volatile memory (e.g. a hard disk).\n", "\n", "Since Random Access Memory (RAM) is volatile and loses its data when the computer is turned off, we use files to store data permanently for future use.\n", "\n", "When we want to read from or write to a file, we need to open it first. When we are done, it needs to be closed so that the resources tied to the file are freed.\n", "\n", "Hence, in Python, a file operation takes place in the following order:\n", "\n", "1. Open a file\n", "2. Read or write (perform operation)\n", "3. Close the file\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Opening Files in Python\n", "\n", "Python has a built-in **`open()`** function to open a file. This function returns a file object, also called a handle, which is used to read or modify the file.\n", "\n", "```python\n", ">>> f = open(\"test.txt\") # open file in current directory\n", ">>> f = open(\"C:/Python99/README.txt\") # specifying full path\n", "```\n", "\n", "We can specify the mode while opening a file. With the mode, we specify whether we want to read **`r`**, write **`w`** or append **`a`** to the file. We can also specify whether we want to open the file in text mode or binary mode.\n", "\n", "The default is reading in text mode. 
In this mode, we get strings when reading from the file.\n", "\n", "On the other hand, binary mode returns bytes; this is the mode to use when dealing with non-text files such as images or executable files.\n", "\n", "| Mode | Description |\n", "|:----:| :--- |\n", "| **`r`** | Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode. | \n", "| **`t`** | Opens in text mode (default). | \n", "| **`b`** | Opens in binary mode. | \n", "| **`x`** | Opens a file for exclusive creation. If the file already exists, the operation fails. | \n", "| **`rb`** | Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file. | \n", "| **`r+`** | Opens a file for both reading and writing. The file pointer is placed at the beginning of the file. | \n", "| **`rb+`** | Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file. | \n", "| **`w`** | Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing. | \n", "| **`wb`** | Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing. | \n", "| **`w+`** | Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing. | \n", "| **`wb+`** | Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing. | \n", "| **`a`** | Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing. | \n", "| **`ab`** | Opens a file for appending in binary format. 
The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing. | \n", "| **`a+`** | Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing. |\n", "| **`ab+`** | Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing. | " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:33:59.892237Z", "start_time": "2021-06-18T15:33:59.884426Z" } }, "outputs": [], "source": [ "f = open(\"test.txt\") # equivalent to 'r' or 'rt'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:00.640278Z", "start_time": "2021-06-18T15:34:00.635397Z" } }, "outputs": [], "source": [ "f = open(\"test.txt\",'w') # write in text mode" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:01.153947Z", "start_time": "2021-06-18T15:34:01.134419Z" } }, "outputs": [], "source": [ "f = open(\"logo.png\",'r+b') # read and write in binary mode" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike in some other languages, the character **`a`** does not imply the number 97 until it is encoded using **`ASCII`** (or another compatible encoding).\n", "\n", "Moreover, the default encoding is platform dependent. On Windows, it is commonly **`cp1252`**, while on Linux it is usually **`utf-8`**.\n", "\n", "So, we must not rely on the default encoding, or else our code will behave differently on different platforms.\n", "\n", "Hence, when working with files in text mode, it is highly recommended to specify the encoding type." 
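, "\n",
"\n",
"As a minimal sketch of why the encoding matters, the example below writes a string containing a non-ASCII character with an explicit **`utf-8`** encoding and reads it back in text mode and in binary mode. The file name **`encoding_demo.txt`** and the temporary directory are assumptions made for illustration; they are not part of this lecture's files.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical file name, created in a temporary directory so nothing is overwritten\n",
"path = os.path.join(tempfile.mkdtemp(), 'encoding_demo.txt')\n",
"\n",
"text = 'café'\n",
"\n",
"# Write in text mode with an explicit encoding\n",
"with open(path, mode='w', encoding='utf-8') as f:\n",
"    f.write(text)\n",
"\n",
"# Text mode decodes the stored bytes back into a str\n",
"with open(path, mode='r', encoding='utf-8') as f:\n",
"    as_text = f.read()\n",
"\n",
"# Binary mode returns the raw bytes stored on disk\n",
"with open(path, mode='rb') as f:\n",
"    as_bytes = f.read()\n",
"\n",
"print(as_text == text)              # True\n",
"print(len(as_text), len(as_bytes))  # 4 5\n",
"```\n",
"\n",
"In text mode we get the four characters back, while binary mode shows the five bytes stored on disk, because **`é`** takes two bytes in **`utf-8`**."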
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:05.598733Z", "start_time": "2021-06-18T15:34:05.587019Z" } }, "outputs": [], "source": [ "f = open(\"test.txt\", mode='r', encoding='utf-8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Closing Files in Python\n", "\n", "When we are done performing operations on the file, we need to close it properly.\n", "\n", "Closing a file frees up the resources that were tied to the file. This is done using the file object's **`close()`** method.\n", "\n", "Python has a garbage collector to clean up unreferenced objects, but we must not rely on it to close the file." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:06.985930Z", "start_time": "2021-06-18T15:34:06.979096Z" } }, "outputs": [], "source": [ "f = open(\"test.txt\", encoding='utf-8')\n", "# perform file operations\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This method is not entirely safe. If an exception occurs when we are performing some operation with the file, the code exits without closing the file.\n", "\n", "A safer way is to use a **[try-finally](https://github.com/milaan9/05_Python_Files/blob/main/004_Python_Exceptions_Handling.ipynb)** block." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:08.405351Z", "start_time": "2021-06-18T15:34:08.387777Z" } }, "outputs": [], "source": [ "# open before the try block: if open() itself fails, there is no file to close\n", "f = open(\"test.txt\", encoding='utf-8')\n", "try:\n", "    # perform file operations\n", "    pass\n", "finally:\n", "    f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This way, we guarantee that the file is properly closed even if an exception is raised that causes program flow to stop.\n", "\n", "The best way to close a file is by using the **`with`** statement. 
This ensures that the file is closed when the block inside the **`with`** statement is exited.\n", "\n", "We don't need to explicitly call the **`close()`** method. It is done internally.\n", "\n", "```python\n", "with open(\"test.txt\", encoding='utf-8') as f:\n", "    # perform file operations\n", "    pass\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The file Object Attributes\n", "\n", "* **file.closed** - `True` if the file is closed, `False` otherwise.\n", "* **file.mode** - The access mode with which the file was opened.\n", "* **file.name** - The name of the file." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:11.108946Z", "start_time": "2021-06-18T15:34:11.098205Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name of the file: data.txt\n", "Closed or not : False\n", "Opening mode : wb\n" ] } ], "source": [ "# Open a file\n", "data = open(\"data.txt\", \"wb\")\n", "print(\"Name of the file: \", data.name)\n", "print(\"Closed or not : \", data.closed)\n", "print(\"Opening mode : \", data.mode)\n", "data.close()  # close the data.txt file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Writing to Files in Python\n", "\n", "In order to write into a file in Python, we need to open it in write **`w`**, append **`a`** or exclusive creation **`x`** mode.\n", "\n", "We need to be careful with the **`w`** mode, as it overwrites the file if it already exists, erasing all of the previous data.\n", "\n", "Writing a string or sequence of bytes (for binary files) is done using the **`write()`** method. This method returns the number of characters written to the file (or the number of bytes, in binary mode)." 
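, "\n",
"\n",
"The three writing modes and the return value of **`write()`** can be sketched as follows. The file name **`write_demo.txt`** is a hypothetical example, created in a temporary directory so that no existing file is touched.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"# Hypothetical file name in a temporary directory\n",
"path = os.path.join(tempfile.mkdtemp(), 'write_demo.txt')\n",
"\n",
"# 'x' creates the file and fails if it already exists\n",
"with open(path, 'x', encoding='utf-8') as f:\n",
"    count = f.write('first line\\n')  # write() returns the number of characters written\n",
"print(count)  # 11\n",
"\n",
"# Opening the same path with 'x' again raises FileExistsError\n",
"try:\n",
"    open(path, 'x', encoding='utf-8')\n",
"except FileExistsError:\n",
"    print('file already exists')\n",
"\n",
"# 'a' keeps the existing content and adds to the end\n",
"with open(path, 'a', encoding='utf-8') as f:\n",
"    f.write('second line\\n')\n",
"\n",
"# 'w' truncates the file before writing, so both earlier lines are lost\n",
"with open(path, 'w', encoding='utf-8') as f:\n",
"    f.write('replaced\\n')\n",
"\n",
"with open(path, encoding='utf-8') as f:\n",
"    content = f.read()\n",
"print(repr(content))  # 'replaced\\n'\n",
"```"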
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:17.998538Z", "start_time": "2021-06-18T15:34:17.979009Z" } }, "outputs": [], "source": [ "with open(\"test_1.txt\", 'w', encoding='utf-8') as f:\n", "    f.write(\"my first file\\n\")\n", "    f.write(\"This file\\n\\n\")\n", "    f.write(\"contains three lines\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This program will create a new file named **`test_1.txt`** in the current directory if it does not exist. If it does exist, it is overwritten.\n", "\n", "We must include the newline characters ourselves to separate the lines." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:21.520483Z", "start_time": "2021-06-18T15:34:21.515603Z" } }, "outputs": [], "source": [ "with open(\"test_2.txt\", 'w', encoding='utf-8') as f:\n", "    f.write(\"This is file\\n\")\n", "    f.write(\"my\\n\")\n", "    f.write(\"first file\\n\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2021-06-18T15:34:22.612276Z", "start_time": "2021-06-18T15:34:22.595674Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "done\n" ] } ], "source": [ "# open a file in current directory\n", "data = open(\"data_1.txt\", \"w\")  # \"w\": write in text mode\n", "data.write(\"Welcome to Dr. Milan Parmar's Python Tutorial\")\n", "print(\"done\")\n", "data.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n",
"