@amitsaha /tail.py
Last active Jun 11, 2018

Simple implementation of the tail command in Python
'''
Basic tail command implementation
Usage:
tail.py filename numlines
'''
import sys
import linecache

if len(sys.argv) != 3:
    print 'Usage: tail.py <file> <nlines>'
    sys.exit(1)

# filename and number of lines requested
fname, nlines = sys.argv[1:]
nlines = int(nlines)

# count the total number of lines
tot_lines = len(open(fname).readlines())

# use the linecache module to read the lines
for i in range(tot_lines - nlines + 1, tot_lines + 1):
    print linecache.getline(fname, i),
""" This is a more efficient version, since it does not read the entire
file
"""
import sys
import os

bufsize = 8192
lines = int(sys.argv[1])
fname = sys.argv[2]
fsize = os.stat(fname).st_size
iter = 0

with open(sys.argv[2]) as f:
    if bufsize > fsize:
        bufsize = fsize-1
    data = []
    while True:
        iter +=1
        f.seek(fsize-bufsize*iter)
        data.extend(f.readlines())
        if len(data) >= lines or f.tell() == 0:
            print(''.join(data[-lines:]))
            break

HyperManTT Nov 20, 2013

Excellent implementation! Simple and elegant. Well done, man!

sherbang Jun 29, 2015

There are a couple of problems in the second implementation when the file exceeds the buffer size. f.tell() will never be 0, since it is called after f.readlines() has moved the position to the end of the file, and data isn't cleared before f.readlines(), so you end up with duplicated data.
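
One way to address both points is sketched below. It is not the gist author's code: it assumes Python 3, opens the file in binary mode so that seeking to arbitrary byte offsets is well defined, and uses a hypothetical function name, tail_blocks. Each pass reads only the block just before the previous one, so nothing is read twice, and the loop stops on the tracked position rather than on f.tell().

```python
import os


def tail_blocks(fname, nlines, bufsize=8192):
    """Return the last `nlines` lines of `fname` as a list of bytes objects."""
    if nlines <= 0:
        return []
    with open(fname, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()   # start of the region read so far
        chunks = []      # blocks collected from the end of the file backwards
        newlines = 0
        # nlines + 1 newlines guarantee that the first wanted line is complete.
        while pos > 0 and newlines <= nlines:
            step = min(bufsize, pos)
            pos -= step
            f.seek(pos)
            block = f.read(step)   # read exactly one block, never to EOF
            newlines += block.count(b"\n")
            chunks.append(block)
    # Blocks were gathered back to front, so reverse them before splitting.
    data = b"".join(reversed(chunks))
    return data.splitlines(keepends=True)[-nlines:]
```

The result is a list of bytes; something like sys.stdout.buffer.write(b"".join(tail_blocks(path, 10))) would print it unchanged.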

ksingh7 Jan 23, 2016

Another way of doing it

#!/usr/bin/python

import sys

if len(sys.argv) !=3:
    print 'Usage: tail.py <file> <nlines>'
    sys.exit(1)

fname, nlines = sys.argv[1:]
num_lines = int(nlines)

with open(fname) as f:
    content = f.read().splitlines()

count = len(content)
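# clamp the start at 0 so a request for more lines than the file has
# does not wrap around to negative indexes
for i in range(max(0, count - num_lines), count):
    print content[i]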

Kentzo Feb 23, 2016

And yet another way of doing this: tailhead and pytailer.

mikewen Feb 29, 2016

Use deque:
from collections import deque
print ''.join(deque(open(filename), nLines)),
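
The same idea as a small Python 3 function, offered as a sketch; the name tail_deque is made up for illustration. A deque with maxlen drops its oldest item whenever a new one is appended, so only the last nlines lines are ever held in memory, although the file is still read from start to finish.

```python
from collections import deque


def tail_deque(filename, nlines):
    """Return the last `nlines` lines of `filename` as a list of strings."""
    with open(filename) as f:
        # Iterating over the file yields one line at a time; the bounded
        # deque keeps only the most recent `nlines` of them.
        return list(deque(f, maxlen=nlines))
```

Calling print(''.join(tail_deque(path, 10)), end='') would print the lines as they appear in the file.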

rodmur Apr 7, 2017

For what it's worth, I adapted some code from here. It basically just uses seek() and read() one character at a time to read backwards and count newlines, so you don't need a buffer.

#!/usr/bin/python3

import os,sys

def tail_file(filename, nlines):
    with open(filename) as qfile:
        qfile.seek(0, os.SEEK_END)
        endf = position = qfile.tell()
        linecnt = 0
        while position >= 0:
            qfile.seek(position)
            next_char = qfile.read(1)
            if next_char == "\n" and position != endf-1:
                linecnt += 1

            if linecnt == nlines:
                break
            position -= 1

        if position < 0:
            qfile.seek(0)

        print(qfile.read(),end='')


if __name__ == '__main__':
    filename = sys.argv[1]
    nlines = int(sys.argv[2])
    tail_file(filename, nlines)

Sairamakrishna-Bhalla Dec 27, 2017

I tried the second implementation with a ~1.7 GB file and it works like magic, whereas the first one fails.

GREAT

RufusVS Mar 29, 2018

I'm an oldie, and the thought of reading a sequential file, one character at a time, backwards, from the end, just makes me shudder.

RufusVS Mar 29, 2018

I've read more of this thread, and it appears to me there are problems with all the implementations. The very first one ends up reading the file twice, to no purpose. It reads the file to count the lines, then throws that data away and fetches the lines again through "linecache", whatever that is, when it could just print from the lines it already read. You could simply use:


file_lines = open(fname).readlines()
# count the total number of lines
tot_lines = len(file_lines)

# the lines keep their trailing newlines, so join with '' rather than '\n'
print ''.join(file_lines[-(tot_lines if tot_lines < nlines else nlines):]),

RufusVS Mar 29, 2018

Also, the one with "bufsize" moves back in the file in "bufsize" increments, but on each jump back it extends the data with everything from the new position to the end of the file! To see the problems, use a line-numbered file and set a small bufsize.
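
That experiment can be reproduced with the sketch below. It wraps the gist's second implementation in a hypothetical function, buggy_tail, that returns the lines instead of printing them. Two things are my assumptions rather than the original's: Python 3, and binary mode to mirror Python 2's byte-oriented open(). The loop logic is otherwise left as posted, bugs included.

```python
import os


def buggy_tail(fname, lines, bufsize):
    """Reproduce the gist's second implementation, bugs and all."""
    fsize = os.stat(fname).st_size
    jumps = 0  # called `iter` in the original
    with open(fname, "rb") as f:
        if bufsize > fsize:
            bufsize = fsize - 1
        data = []
        while True:
            jumps += 1
            f.seek(fsize - bufsize * jumps)
            # readlines() reads to the end of the file every time, so lines
            # already collected on earlier jumps are appended again.
            data.extend(f.readlines())
            if len(data) >= lines or f.tell() == 0:
                return data[-lines:]
```

For a file holding the nine lines 1 through 9, one number per line, buggy_tail(path, 5, 4) returns the lines 9, 6, 7, 8, 9 instead of 5, 6, 7, 8, 9: the first jump collects 8 and 9, and the second appends 6 through 9 after them.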
