While trying out the reader project code presented in Chapter 1 of The Python Journeyman, I worked through a couple of issues. I'm certain that my issues are related to being a Java defector and a Python newbie.
I'm running Python 3.8.1 on macOS 10.15.
- UnicodeDecodeError: 'utf-8' codec can't decode byte ...: invalid start byte
  The invalid start byte, I assume, is related to test.bz2 and the authors' bz2.open(..., mode='wt'). I used mode='wb' instead.
- bound method MultiReader.read of <reader.multireader.MultiReader object at 0x10f4a8a90>
  I remember <object>.toString() from my Java days. I googled, found, and added a __str__ method to the MultiReader class definition.
- See my results below...
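On the first item: the traceback itself isn't shown, so the exact cause can't be confirmed here. As a minimal sketch, assuming only the standard library and a throwaway temp file (the path and message below are made up for illustration), bz2.open round-trips text in either mode, as long as the reader decodes whatever the writer encoded:

```python
import bz2
import os
import tempfile

# Hypothetical scratch file; the name is only for illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.bz2")
message = "the rain in spain"

# Text mode ('wt'): bz2.open encodes the str to bytes before compressing.
with bz2.open(path, mode="wt", encoding="utf-8") as f:
    f.write(message)

# Binary mode ('rb'): we get the decompressed bytes and decode them ourselves.
with bz2.open(path, mode="rb") as f:
    raw = f.read()
text_from_binary = raw.decode("utf-8")

# Text mode ('rt'): bz2.open decompresses and decodes in one step.
with bz2.open(path, mode="rt", encoding="utf-8") as f:
    text_from_text = f.read()
```

Both reads give back the same string, so the choice between 'wt' and 'wb' mostly decides who does the encoding: the file object or the caller.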
# reader/compressed/bzipped.py
import bz2
import sys

opener = bz2.open

if __name__ == '__main__':
    # capture raw text from the command line
    text = ' '.join(sys.argv[2:])
    # encode the string (text) to bytes, since the file is opened in binary mode
    encoded_text = text.encode(encoding="utf-8", errors="backslashreplace")
    # the with block closes the file even if the write fails
    with bz2.open(sys.argv[1], mode='wb') as f:
        f.write(encoded_text)

# reader/multireader.py
import os
import re

from reader.compressed import bzipped, gzipped

# This maps file extensions to the corresponding open functions.
extension_map = {
    '.bz2': bzipped.opener,
    '.gz': gzipped.opener,
}


class MultiReader:
    """This class reads the contents of a compressed file."""

    def __init__(self, filename):
        """Opens a compressed file for reading."""
        self.extension = os.path.splitext(filename)[1]
        opener = extension_map.get(self.extension, open)
        # determine the reader's mode: binary if the extension contains a "b"
        # (e.g. '.bz2'), text otherwise
        read_mode = 'rb' if re.search("b", self.extension) else 'rt'
        self.f = opener(filename, read_mode)
        # start empty so __str__ works even before read() is called
        self.text = ''

    def __str__(self):
        """Returns the content read so far."""
        return self.text

    def close(self):
        self.f.close()

    def read(self):
        """Reads the file, decoding the content if it was opened in binary mode."""
        if re.search("b", self.extension):
            self.text = self.f.read().decode(encoding="utf-8", errors="ignore")
        else:
            self.text = self.f.read()

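A possible simplification, offered as a sketch rather than the book's approach: bz2.open, gzip.open and the built-in open all accept mode='rt', so the extension check with re.search may not be needed at all. The helper name read_text, the OPENERS dict, and the demo files below are hypothetical and only stand in for extension_map and the test files.

```python
import bz2
import gzip
import os
import tempfile

# Same idea as extension_map, but pointing straight at the stdlib openers.
OPENERS = {".bz2": bz2.open, ".gz": gzip.open}


def read_text(filename):
    """Hypothetical helper: return the file's contents as a str."""
    extension = os.path.splitext(filename)[1]
    opener = OPENERS.get(extension, open)
    # All three openers accept mode='rt' and an encoding argument.
    with opener(filename, mode="rt", encoding="utf-8") as f:
        return f.read()


# Build three small demo files in a temp directory, then read them back.
workdir = tempfile.mkdtemp()
sample = "the rain in spain"
paths = {}
for ext, opener in [(".bz2", bz2.open), (".gz", gzip.open), (".txt", open)]:
    paths[ext] = os.path.join(workdir, "demo" + ext)
    with opener(paths[ext], mode="wt", encoding="utf-8") as f:
        f.write(sample)

results = {ext: read_text(p) for ext, p in paths.items()}
```

Every entry in results holds the same string, whichever opener was picked.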
lessons/pyjourney/chap1 took 2m 54s
➜ python3
Python 3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from reader.multireader import MultiReader
>>> r = MultiReader('test.bz2')
>>> r.read()
>>> r.__str__()
'the rain in spain rains mainly on the plane'
>>>
>>> q = MultiReader('test.gz')
>>> q.read()
>>> q.__str__()
'the rain in spain rains mainly on the plane'
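
The session above calls r.__str__() directly, which works, but str(r) and print(r) call the same method. The sketch below uses a hypothetical Demo class as a stand-in for MultiReader (it holds its text directly instead of reading a file). It also shows one common way to get a "bound method ..." message: naming a method without parentheses. Whether that is what happened in my case is an assumption.

```python
class Demo:
    """Hypothetical stand-in for MultiReader; no file involved."""

    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text

    def __str__(self):
        return self.text


d = Demo("the rain in spain")

bound = d.read       # no parentheses: this is the method object itself
called = d.read()    # parentheses: the method actually runs
as_string = str(d)   # str() and print() call __str__ for us

print(repr(bound))   # shows a "<bound method Demo.read of ...>" style repr
print(d)             # prints the text via __str__
```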