Understanding File I/O in Programming Languages

Some Theory

Types of Data Used for I/O:
  • Text: Represented as a sequence of Unicode characters (e.g., the string ‘12345’ is stored as five characters).
  • Binary: Represented as a sequence of raw bytes (e.g., the number 12345 is stored as its binary encoding rather than as digits).

File Types:
  • Text Files: Store data in a human-readable format, such as program source code.
  • Binary Files: Store non-textual data such as images, music, videos, and executable files.

How File I/O is Done in Most Programming Languages

  1. Open a File: Establish a connection to the file.
  2. Read/Write Data: Perform operations like reading from or writing to the file.
  3. Close the File: Terminate the connection to release resources.
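
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the only way to do it: the scratch path and the file name steps.txt are assumptions made for the example, and try/finally is used so the file is closed even if the write fails.

```python
import os
import tempfile

# Hypothetical scratch path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'steps.txt')

f = open(path, 'w')            # 1. open: establish the connection
try:
    f.write('Hello world')     # 2. write data through the file object
finally:
    f.close()                  # 3. close: release the resource, even on error

f = open(path, 'r')            # the same three steps, this time for reading
try:
    content = f.read()
finally:
    f.close()

print(content)
print(f.closed)
```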

Writing to a File

Case 1 – File Not Present:

f = open('sample.txt','w')
f.write('Hello world')
f.close()
# The file is closed, so the next call raises
# ValueError: I/O operation on closed file.
f.write('hello')

Write Multiline Strings:

f = open('sample1.txt','w')
f.write('hello world')
f.write('\nhow are you?')
f.close()

Case 2 – File Already Present:

# Opening an existing file in 'w' mode truncates it: the old content is replaced
f = open('sample.txt','w')
f.write('I am Nabin Adhikari')
f.close()

Introducing Append Mode:

f = open('/content/sample1.txt','a')
f.write('\nI am fine')
f.close()
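
To see the difference between the two modes side by side, here is a small sketch that writes to the same file three times. The scratch path and the file name modes.txt are assumptions for the example.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

f = open(path, 'w')
f.write('first')
f.close()

f = open(path, 'w')      # 'w' truncates: 'first' is discarded
f.write('second')
f.close()

f = open(path, 'a')      # 'a' keeps the content and writes at the end
f.write('\nthird')
f.close()

f = open(path, 'r')
result = f.read()
f.close()

print(result)
```

Only the text from the last 'w' write and the 'a' write survives.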

Write Lines:

L = ['hello\n','hi\n','how are you\n','I am fine']

f = open('/content/temp/sample.txt','w')
f.writelines(L)
f.close()

Reading from Files

Using read():

f = open('/content/sample.txt','r')
s = f.read()
print(s)
f.close()

Reading Up To n Chars:

f = open('/content/sample.txt','r')
s = f.read(10)
print(s)
f.close()

Using readline():

f = open('/content/sample.txt','r')
print(f.readline(), end='')
print(f.readline(), end='')
f.close()

Reading Entire File Using readline():

f = open('/content/sample.txt','r')

while True:
  data = f.readline()
  if data == '':
    break
  else:
    print(data, end='')

f.close()
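
A file object is also iterable, so the same loop can be written without calling readline() by hand. The sketch below assumes a scratch file (lines.txt is a made-up name) filled with the same lines used in the writelines example.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

f = open(path, 'w')
f.writelines(['hello\n', 'hi\n', 'how are you\n', 'I am fine'])
f.close()

lines = []
f = open(path, 'r')
for line in f:                       # yields one line at a time, newline included
    lines.append(line.rstrip('\n'))  # strip the trailing newline for display
f.close()

print(lines)
```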

Using Context Manager (With)

  • It’s a good practice to close files after usage to free up resources.
  • The with keyword automatically closes the file after usage.

Example:

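# The with block closes the file automatically when it ends
with open('/content/sample1.txt','w') as f:
  f.write('I am writing something')

# f is closed at this point, so calling f.read() here would raise a ValueError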
with open('/content/sample.txt','r') as f:
  print(f.readline())
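
The closing behaviour can be checked through the closed attribute of the file object. The sketch below uses an assumed scratch file (with_demo.txt) and a simulated error to show that the file is closed both after a normal exit and after an exception.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(path, 'w') as f:
    f.write('I am writing something')

closed_after_block = f.closed        # True: with closed the file for us

try:
    f.read()                         # using the closed file object fails
    read_failed = False
except ValueError:
    read_failed = True

try:
    with open(path, 'r') as g:
        raise RuntimeError('simulated failure inside the block')
except RuntimeError:
    pass

closed_after_error = g.closed        # True: closed even though the block raised

print(closed_after_block, read_failed, closed_after_error)
```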

Moving Within a File (Reading in Chunks):

with open('sample.txt','r') as f:
  print(f.read(10))
  print(f.read(10))
  print(f.read(10))
  print(f.read(10))

Benefit of Chunk Reading:

Reading a file in fixed-size chunks keeps memory usage bounded by the chunk size rather than by the file size. A program can therefore process files larger than the available memory, and it can start working on the first chunk before the rest of the file has been read. This makes chunk reading a valuable approach when working with large datasets or files.

big_L = ['hello world ' for i in range(1000)]

with open('big.txt','w') as f:
  f.writelines(big_L)

with open('big.txt','r') as f:
  chunk_size = 10

  # Read each chunk exactly once; read() returns '' at the end of the file
  while True:
    chunk = f.read(chunk_size)
    if chunk == '':
      break
    print(chunk, end='***')

Seek and Tell Functions:

with open('sample.txt','r') as f:
  f.seek(15)
  print(f.read(10))
  print(f.tell())

  print(f.read(10))
  print(f.tell())
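
Because the output above depends on what sample.txt happens to contain, here is a sketch with known content. The scratch file seek_demo.txt is an assumption. In text mode the value returned by tell() is best treated as a bookmark to hand back to seek(), which is how it is used here.

```python
import os
import tempfile

# Hypothetical scratch file with known ASCII content.
path = os.path.join(tempfile.mkdtemp(), 'seek_demo.txt')

with open(path, 'w') as f:
    f.write('0123456789abcdefghij')

with open(path, 'r') as f:
    f.seek(5)                  # jump past the first five characters
    first = f.read(3)          # reads '567'
    bookmark = f.tell()        # remember the current position
    rest = f.read()            # read to the end of the file
    f.seek(bookmark)           # return to the remembered position
    rest_again = f.read()      # the same text is read a second time

print(first, rest, rest_again)
```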

Seek During Write:

with open('sample.txt','w') as f:
  f.write('Hello')
  f.seek(0)
  f.write('Xa')
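
Reading the file back shows what seeking during a write does: the second write overwrites the existing characters in place rather than inserting before them. The scratch file name below is an assumption.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'seek_write.txt')

with open(path, 'w') as f:
    f.write('Hello')
    f.seek(0)          # move back to the start of the file
    f.write('Xa')      # overwrites 'He'; 'llo' is left untouched

with open(path, 'r') as f:
    result = f.read()

print(result)          # Xallo
```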

Understanding file I/O operations is fundamental for working with data persistently in programming. The methods mentioned provide flexibility for reading and writing data, and the use of context managers enhances code readability and ensures proper resource management.

Challenges with Working in Text Mode

Working in text mode has limitations, primarily when dealing with binary files or diverse data types. Let’s explore some of the issues:

  1. Inability to Handle Binary Files:
  • Text mode is unsuitable for working with binary files such as images, audio, or video files.
  • Attempting to read or write binary data in text mode may fail with a decoding error or silently alter the data, for example through newline translation.

  2. Limited Support for Non-Text Data Types:
  • Text mode handles only strings, so other data types such as integers, floats, lists, or tuples must be converted to text before writing and parsed again after reading.
  • Directly writing non-text data types in text mode raises an error, and converting them to strings loses their type information.

Working with Binary Files

Reading Binary File:

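# Text mode ('r') tries to decode the image bytes as text,
# so this read typically fails with a UnicodeDecodeError
with open('screenshot1.png','r') as f: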
  f.read()
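
The failure can be reproduced without an actual screenshot. The sketch below writes the eight-byte PNG signature to an assumed scratch file (fake.png) and reads it back in both modes. The encoding is set to UTF-8 explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

# Hypothetical scratch file holding only the PNG signature bytes.
path = os.path.join(tempfile.mkdtemp(), 'fake.png')
png_signature = b'\x89PNG\r\n\x1a\n'

with open(path, 'wb') as f:
    f.write(png_signature)

try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()                       # 0x89 is not a valid UTF-8 start byte
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

with open(path, 'rb') as f:            # binary mode returns the raw bytes
    data = f.read()

print(text_mode_failed, data == png_signature)
```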

Copying Binary File:

with open('screenshot1.png','rb') as f:
  with open('screenshot_copy.png','wb') as wf:
    wf.write(f.read())

Working with a Large Binary File:

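  • Working with large binary files in text mode may lead to performance issues and potential data corruption. Binary mode ('rb' and 'wb') is preferred for such scenarios, and because read() with no argument loads the whole file into memory, large files are better processed in fixed-size chunks.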
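
One possible way to copy a large binary file without loading it all at once is sketched below. The helper name copy_in_chunks, the 64 KiB chunk size, and the scratch file names are assumptions chosen for the example, not something prescribed by Python.

```python
import os
import tempfile

def copy_in_chunks(src_path, dst_path, chunk_size=64 * 1024):
    """Copy a file in fixed-size binary chunks and return the bytes copied."""
    copied = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:              # b'' means end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied

# Hypothetical scratch files standing in for a large image or video.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'big.bin')
dst_path = os.path.join(workdir, 'big_copy.bin')

payload = os.urandom(200_000)
with open(src_path, 'wb') as f:
    f.write(payload)

copied = copy_in_chunks(src_path, dst_path)

with open(dst_path, 'rb') as f:
    copy_ok = f.read() == payload

print(copied, copy_ok)
```

At no point does the copy hold more than one chunk in memory, whatever the size of the source file.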

Handling Other Data Types

Writing Integer as Text:

with open('sample.txt','w') as f:
  f.write(5)  # raises TypeError: write() argument must be str, not int

Writing Integer as String:

with open('sample.txt','w') as f:
  f.write('5')

Reading and Manipulating Integer Data:

with open('sample.txt','r') as f:
  print(int(f.read()) + 5)

Working with More Complex Data Types

Writing Dictionary as String:

d = {
    'name': 'nabin',
    'age': 25,
    'gender': 'male'
}

with open('sample.txt','w') as f:
  f.write(str(d))

Reading and Modifying Dictionary Data:

with open('sample.txt','r') as f:
  content = f.read()
  updated_content = content.replace('nabin', 'udus')
  print(dict(updated_content))  # raises ValueError: dict() cannot parse a string like this

Working in text mode is restrictive when dealing with non-textual data or complex data structures. Binary files need the appropriate mode ('rb', 'wb', etc.) to ensure data integrity and prevent unintended transformations, while structured data such as lists and dictionaries needs a proper serialization format so that it can be read back with its types intact.
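
The dictionary problem can be shown without a file at all. In the sketch below, ast.literal_eval from the standard library is brought in only to show that the text written by str(d) can be parsed back. It is an alternative introduced for this illustration, not part of the original examples, and it only understands Python literals.

```python
import ast

d = {'name': 'nabin', 'age': 25, 'gender': 'male'}
text = str(d)                      # what f.write(str(d)) stores in the file

try:
    dict(text)                     # treats the string as a sequence of characters
    dict_failed = False
except ValueError:
    dict_failed = True

restored = ast.literal_eval(text)  # parses the literal back into a dict

print(dict_failed, restored == d)
```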

Serialization and Deserialization

Serialization is the process of converting Python data types into a format that can be easily stored or transmitted, commonly in JSON (JavaScript Object Notation) format. Deserialization, on the other hand, involves converting data from JSON back into Python data types.

What is JSON?

JSON (JavaScript Object Notation) is a lightweight data interchange format. It is easy for humans to read and write and easy for machines to parse and generate.
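
To get a feel for what JSON text looks like, the sketch below converts a small dictionary to a string with json.dumps and back with json.loads. The record itself is made up for the example. Note how True becomes true and None becomes null in the JSON text.

```python
import json

# Made-up record covering the basic JSON types.
record = {'name': 'nabin', 'age': 25, 'active': True,
          'nickname': None, 'marks': [23, 14.5]}

text = json.dumps(record)      # Python object -> JSON text
print(text)

restored = json.loads(text)    # JSON text -> Python object
print(restored == record)
```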

Serialization using JSON module

Serializing a List:

import json

L = [1, 2, 3, 4]

with open('demo.json', 'w') as f:
    json.dump(L, f)

Serializing a Dictionary:

The provided code demonstrates the serialization of a Python dictionary (d) to a JSON file using the json.dump method with indentation. Let’s break down the code step by step:

import json

This line imports the json module, which provides methods for working with JSON data.

d = {'name': 'nabin', 'age': 25, 'gender': 'male'}

A Python dictionary (d) is defined with key-value pairs representing information such as name, age, and gender.

with open('demo.json', 'w') as f:
    json.dump(d, f, indent=4)

In this block of code:

  • with open('demo.json', 'w') as f:: This line opens the file named ‘demo.json’ in write mode. The with statement ensures that the file is properly closed after the operations inside the block are completed.
  • json.dump(d, f, indent=4): The json.dump method is used to serialize the dictionary (d) and write it to the opened file (f). The indent=4 parameter adds indentation to the JSON output, making it more readable. The resulting JSON file will have a structured format with each level of nesting indented by four spaces.
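
The effect of indent is easiest to see by comparing the two outputs as strings. This sketch uses json.dumps, the string-returning counterpart of json.dump, so that no file is needed.

```python
import json

d = {'name': 'nabin', 'age': 25, 'gender': 'male'}

compact = json.dumps(d)            # everything on a single line
pretty = json.dumps(d, indent=4)   # one key per line, four spaces per level

print(compact)
print(pretty)

# Both strings describe the same data; only the whitespace differs.
same_data = json.loads(compact) == json.loads(pretty)
```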

Deserialization

Deserializing JSON to a Python Dictionary:

import json

with open('demo.json', 'r') as f:
    d = json.load(f)
    print(d)
    print(type(d))

Serializing and Deserializing a Tuple:

import json

t = (1, 2, 3, 4, 5)

with open('demo.json', 'w') as f:
    json.dump(t, f)

Serializing and Deserializing a Nested Dictionary:

import json

d = {'student': 'nabin', 'marks': [23, 14, 34, 45, 56]}

with open('demo.json', 'w') as f:
    json.dump(d, f)
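
The tuple and nested-dictionary snippets above only write the file. Reading the data back, as in the sketch below, shows an important detail: JSON has no tuple type, so the tuple returns as a list, while the nested dictionary comes back unchanged. The round_trip helper and the scratch path are assumptions for the example.

```python
import json
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.json')

def round_trip(obj):
    """Write obj to the scratch JSON file, then load it back."""
    with open(path, 'w') as f:
        json.dump(obj, f)
    with open(path, 'r') as f:
        return json.load(f)

t_loaded = round_trip((1, 2, 3, 4, 5))
print(t_loaded, type(t_loaded))    # a list, not a tuple

d = {'student': 'nabin', 'marks': [23, 14, 34, 45, 56]}
d_loaded = round_trip(d)
print(d_loaded == d)
```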

Serializing and Deserializing Custom Objects

Defining a Custom Object (Person):

class Person:
    def __init__(self, fname, lname, age, gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

Serialization of Custom Object:

The provided code demonstrates the serialization of a custom object (Person) to a JSON file using the json.dump method. Let’s go through the code step by step:

import json

This line imports the json module, which provides methods for working with JSON data.

def show_object(person):
    if isinstance(person, Person):
        return "{} {} age -> {} gender -> {}".format(person.fname, person.lname, person.age, person.gender)

Here, a custom function show_object is defined. This function takes a Person object as an argument and returns a formatted string representation of the object. The string includes information such as the first name, last name, age, and gender of the Person object.

person = Person('Nabin', 'Adhikari', 25, 'male')

An instance of the Person class named person is created with specific attribute values (first name ‘Nabin’, last name ‘Adhikari’, age 25, and gender ‘male’).

with open('demo.json', 'w') as f:
    json.dump(person, f, default=show_object)

In this block of code:

  • with open('demo.json', 'w') as f:: This line opens the file named ‘demo.json’ in write mode. The with statement ensures that the file is properly closed after the operations inside the block are completed.
  • json.dump(person, f, default=show_object): The json.dump method is used to serialize the person object and write it to the opened file (f). The default parameter is set to the custom function show_object, which specifies how to convert non-serializable objects (in this case, instances of the Person class) into a serializable format.

The resulting ‘demo.json’ file will contain a single JSON string in the format produced by the show_object function, not a JSON object with separate fields. The custom function decides how the serialized data is represented: returning a string keeps the output human-readable, but the individual attributes can no longer be recovered without parsing that string.
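
What the default parameter changes can be checked with json.dumps. In the sketch below the Person class and show_object follow the code above, with one assumed addition: show_object raises TypeError for any other type instead of silently returning None.

```python
import json

class Person:
    def __init__(self, fname, lname, age, gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

def show_object(person):
    if isinstance(person, Person):
        return "{} {} age -> {} gender -> {}".format(
            person.fname, person.lname, person.age, person.gender)
    # Assumed addition: reject anything that is not a Person.
    raise TypeError('cannot serialize ' + type(person).__name__)

person = Person('Nabin', 'Adhikari', 25, 'male')

try:
    json.dumps(person)             # no default: json cannot handle Person
    plain_failed = False
except TypeError:
    plain_failed = True

encoded = json.dumps(person, default=show_object)
print(plain_failed)
print(encoded)                     # a JSON string, quotes included
```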

Deserialization of Custom Object:

The code below redefines show_object so that it returns a dictionary, and then reads ‘demo.json’ back into a Python object using the json.load method. Let’s break down the code step by step:

import json

This line imports the json module, which provides methods for working with JSON data.

def show_object(person):
    if isinstance(person, Person):
        return {'name': person.fname + ' ' + person.lname, 'age': person.age, 'gender': person.gender}

Here, a custom function show_object is defined. This function takes a Person object as an argument and returns a dictionary representation of the object. The dictionary includes keys such as ‘name’, ‘age’, and ‘gender’, with corresponding values extracted from the attributes of the Person object. For ‘demo.json’ to contain this structure, the person object must be written again with json.dump using this version of show_object.

with open('demo.json', 'r') as f:
    d = json.load(f)
    print(d)
    print(type(d))

In this block of code:

  • with open('demo.json', 'r') as f:: This line opens the file named ‘demo.json’ in read mode. The with statement ensures that the file is properly closed after the operations inside the block are completed.
  • json.load(f): The json.load method is used to deserialize the content of the opened JSON file (f). It reads the JSON data from the file and converts it back into a Python object.
  • print(d): This prints the deserialized Python object to the console.
  • print(type(d)): This prints the type of the deserialized object. It is a dictionary if the file was written with the dictionary-returning show_object, or a string if it was written with the earlier string-returning version.

The resulting output will display the contents of the deserialized object and its type. Note that json.load does not rebuild a Person instance: it returns only built-in types (dictionaries, lists, strings, numbers, booleans, and None), arranged in the structure that show_object produced during serialization.

Unlike pickle, JSON deserialization does not need the Person class to be defined. Turning the loaded dictionary back into a Person object is an extra step that the program must perform itself, for example by passing the dictionary’s values to the Person constructor.
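
One way to get a real Person back is the object_hook parameter of json.load and json.loads, which is called for every JSON object that is decoded. The sketch below is only one possible approach. The helper names person_to_dict and dict_to_person are hypothetical, and the encoder stores one key per constructor argument (rather than the combined ‘name’ key used above) so that the object can be rebuilt.

```python
import json

class Person:
    def __init__(self, fname, lname, age, gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

PERSON_FIELDS = {'fname', 'lname', 'age', 'gender'}

def person_to_dict(obj):
    """Hypothetical encoder: one key per constructor argument."""
    if isinstance(obj, Person):
        return {name: getattr(obj, name) for name in PERSON_FIELDS}
    raise TypeError('cannot serialize ' + type(obj).__name__)

def dict_to_person(d):
    """Hypothetical object_hook: rebuild a Person when the keys match."""
    if set(d) == PERSON_FIELDS:
        return Person(**d)
    return d                       # leave any other dictionary untouched

text = json.dumps(Person('Nabin', 'Adhikari', 25, 'male'),
                  default=person_to_dict)
restored = json.loads(text, object_hook=dict_to_person)

print(type(restored).__name__, restored.fname, restored.age)
```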

Indent Attribute in JSON Dump:

The code below serializes the custom object (Person) to a JSON file again, this time passing the indent parameter to the json.dump method. Let’s break down the code step by step:

import json

This line imports the json module, which provides methods for working with JSON data.

def show_object(person):
    if isinstance(person, Person):
        return {'name': person.fname + ' ' + person.lname, 'age': person.age, 'gender': person.gender}

Here, a custom function show_object is defined. This function takes a Person object as an argument and returns a dictionary representation of the object. The dictionary includes keys such as ‘name’, ‘age’, and ‘gender’, with corresponding values extracted from the attributes of the Person object.

with open('demo.json', 'w') as f:
    json.dump(person, f, default=show_object, indent=4)

In this block of code:

  • with open('demo.json', 'w') as f:: This line opens the file named ‘demo.json’ in write mode. The with statement ensures that the file is properly closed after the operations inside the block are completed.
  • json.dump(person, f, default=show_object, indent=4): The json.dump method is used to serialize the person object and write it to the opened file (f). The default parameter is set to the custom function show_object, which specifies how to convert non-serializable objects (in this case, instances of the Person class) into a serializable format. The indent=4 parameter adds indentation to the JSON output, making it more readable.

In summary, this code snippet serializes a Person object (person) to a JSON file named ‘demo.json’. The custom function show_object defines how the Person object should be represented in the JSON format. The resulting JSON file will have a structured and human-readable format due to the specified indentation.

Pickling: Storing and Retrieving Python Objects

Pickling involves converting a Python object hierarchy into a byte stream, while unpickling is the process of converting a byte stream back into a Python object hierarchy.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def display_info(self):
        print('Hi, my name is', self.name, 'and I am', self.age, 'years old')

p = Person('nabin', 26)

Pickling – Saving the Object to a File

import pickle

with open('person.pkl', 'wb') as f:
    pickle.dump(p, f)

In this example, the Person object p is pickled and saved to a file named person.pkl in binary mode ('wb').

Unpickling – Retrieving the Object from the File

import pickle

with open('person.pkl', 'rb') as f:
    p = pickle.load(f)

The Person object is unpickled, retrieving the data stored in the file person.pkl.

Displaying Information from Unpickled Object

p.display_info()

The display_info method is called on the unpickled Person object to showcase the information.

Pickle vs JSON

  • Pickle:
      • Allows storing data in a binary format.
      • Suitable for complex Python objects.
      • Not human-readable.
      • Can store a broader range of Python-specific data types.

  • JSON:
      • Stores data in a human-readable text format.
      • Suitable for simple data structures.
      • Human-readable, making it useful for configuration files or data interchange between different languages.
      • Limited to basic data types and structures.

In summary, pickle and JSON serve different purposes. Pickle is suitable for preserving the integrity of complex Python objects, while JSON is more appropriate for human-readable, lightweight data interchange. The choice between them depends on the specific use case and the nature of the data being stored or transmitted.
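
The difference in supported types can be seen with a small in-memory comparison. The sample data below is made up for the example. It contains a set, a tuple, and a bytes value, none of which JSON can represent exactly.

```python
import json
import pickle

# Made-up data using Python-specific types.
data = {'ids': {1, 2, 3}, 'point': (4, 5), 'raw': b'\x00\x01'}

try:
    json.dumps(data)               # sets and bytes are not JSON serializable
    json_failed = False
except TypeError:
    json_failed = True

# pickle round-trips the object with every type preserved.
restored = pickle.loads(pickle.dumps(data))

# For JSON, the data has to be converted to basic types first.
json_safe = {'ids': sorted(data['ids']), 'point': list(data['point'])}
json_text = json.dumps(json_safe)

print(json_failed, restored == data, json_text)
```

The price of that flexibility is that a pickle file can only be read by Python, and unpickling data from an untrusted source is unsafe because it can execute arbitrary code.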
