Friday, 6 October 2017

A Script for Sorting a File

This is based on a short script I have used a number of times, but I've modified it to allow for files outside the current directory. The simple version does not actually split lines up - it simply treats each line as a string and the contents of the whole file as a list of strings (thus if the file is full of numbers, they will be sorted ASCIIbetically, not numerically).
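As a quick illustration of that difference (this is not part of the script itself), here is a small sketch using made-up sample lines. It assumes every line holds a valid whole number; the `key=int` version would raise an error on anything else.

```python
# Lines as they might come out of a file of numbers (hypothetical sample data)
lines = ["10\n", "9\n", "100\n", "2\n"]

# The default sort compares the strings character by character
as_text = sorted(lines)

# Comparing by int() gives numeric order; int() ignores the trailing newline
as_numbers = sorted(lines, key=int)

print(as_text)     # ['10\n', '100\n', '2\n', '9\n']
print(as_numbers)  # ['2\n', '9\n', '10\n', '100\n']
```

The lines themselves are left untouched in both cases; only the comparison changes.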

A word of warning - this program changes files in a way that is not easily reversible, so be careful which files you use it on. 

It includes a number of imported modules I have not discussed yet - sys and os.path (part of the os module). Here is the actual code:
#!/usr/bin/python3.5
import os.path
import os
import sys
print ("Program to sort the contents of a simple file")
print ("Current Working Directory is: ")
cwd = os.getcwd()
print (cwd)
pathchoice = input("Keep current directory (Y or N)?")
pathchoice = pathchoice.lower()
if pathchoice == "y":
    print ("Okay, looking for file in", cwd)
    print ("Please enter the name of the file to be sorted including extension: ")
    filename = input("? ")
    if os.path.isfile(filename): # if file exists
        fhandle = open(filename, 'r') # then open it up for reading
    else:
        print ("File not found. ")
        sys.exit() # Quits Python
else:
    print ("Please enter new absolute filepath (a folder, or the full path to the file). Type it exactly as it is; input() needs no extra backslashes")
    newpath = input("? ")
    if os.path.isdir(newpath): # checks if newpath is a directory/folder
        os.chdir(newpath) # changes to newpath
        print ("Please enter the name of the file to be sorted including extension: ")
        filename = input("? ")
        if os.path.isfile(filename): # if file exists
            fhandle = open(filename, 'r') # then open it up for reading
        else:
            print ("File not found. ")
            sys.exit() # Quits Python
    elif os.path.isfile(newpath):
        print ("File chosen")
        filename = newpath
        fhandle = open(filename, 'r')
    else:
        print ("Sorry, neither file nor folder found")
        sys.exit()
       
linelist = []
linecount = 0
for line in fhandle:
    linecount +=1
    linelist.append(line)
linelist.sort()
fhandle.close()
for line in linelist:
    print (line, end="") # prints line but doesn't add another newline
print ("number of lines: " + str(linecount))
fhandle = open(filename, 'w') # opens up same file as before but for writing
for line in linelist:
    fhandle.write(line)
fhandle.close()

So there are a few things in here I feel I ought to explain:
cwd = os.getcwd()
I encountered this in a previous post. It uses the getcwd() function in the os module to return the current working directory. Here it is then assigned to the variable cwd.

os.path.isfile(filename)
os.path.isdir(newpath)
These two check whether a given path is an existing file or an existing directory. Each returns a Boolean (True/False) answer which can be used in the if...elif...else structure.
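To see the two checks side by side, here is a minimal sketch that builds a throwaway folder with one file in it. The names example.txt and nothing.txt are invented for the example.

```python
import os
import tempfile

# Create a temporary folder containing one file (hypothetical names)
with tempfile.TemporaryDirectory() as folder:
    filepath = os.path.join(folder, "example.txt")
    with open(filepath, "w") as f:
        f.write("hello\n")

    file_is_file = os.path.isfile(filepath)    # True: it exists and is a file
    file_is_dir = os.path.isdir(filepath)      # False: a file is not a folder
    folder_is_dir = os.path.isdir(folder)      # True
    # A path that does not exist is neither a file nor a folder
    missing_is_file = os.path.isfile(os.path.join(folder, "nothing.txt"))

print(file_is_file, file_is_dir, folder_is_dir, missing_is_file)
```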

sys.exit() # Quits Python
Perhaps a bit drastic but it works. sys.exit() raises a SystemExit exception, which ends the script. From what I can tell, it quits the program but not the Python shell, so it is easy enough to run the program again.
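Under the hood, sys.exit() works by raising a SystemExit exception. The sketch below catches that exception purely to show it happening; the function name quit_early is made up for the example, and the script above simply lets the exception end the program.

```python
import sys

def quit_early():
    # Hypothetical helper that gives up straight away
    print("about to exit")
    sys.exit("File not found.")   # raises SystemExit carrying this message
    print("never reached")

try:
    quit_early()
except SystemExit as stop:
    caught = str(stop)
    print("caught SystemExit:", caught)
```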
linelist = []
linecount = 0
for line in fhandle:
    linecount +=1
    linelist.append(line)

linelist.sort()
This is the heart of the script. It starts with an empty list (linelist), reads each line from the file object fhandle and appends it to the end of the list, keeping count of the number of lines read as it goes.
It then sorts the list in place.
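For comparison, here is a shorter way to do the same read-and-sort step. It is a sketch rather than what the script above does: it uses a with block and readlines() in place of the explicit loop, and it writes its own small sample file (made-up contents) so that it can run on its own.

```python
import os
import tempfile

# Write a small unsorted sample file (hypothetical contents)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("pear\napple\nbanana\n")
    sample_path = tmp.name

# "with" closes the file automatically, even if an error occurs
with open(sample_path, "r") as fhandle:
    linelist = fhandle.readlines()   # every line, newline included

linelist.sort()
linecount = len(linelist)            # no separate counter needed

print(linelist, linecount)
os.remove(sample_path)               # tidy up the sample file
```

Both versions hold the whole file in memory, which is fine for the small files this script is meant for.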

print (line, end="") # prints line but doesn't add another newline
This gets around the quirk that the lines being read from the file already have a newline \n at the end, but print usually adds a newline after printing a string. This end="" tells print() to not add anything after the string, so when it prints, there is only one newline per line. 
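The effect is easier to see if the output is captured rather than shown on screen. This sketch sends print() output into io.StringIO objects, which is only a convenience for the demonstration.

```python
import io

line = "apple\n"   # a line as read from a file, newline included

default_out = io.StringIO()
print(line, file=default_out)            # print adds its own "\n"

trimmed_out = io.StringIO()
print(line, end="", file=trimmed_out)    # nothing is added after the line

print(repr(default_out.getvalue()))      # 'apple\n\n'
print(repr(trimmed_out.getvalue()))      # 'apple\n'
```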

fhandle = open(filename, 'w') # opens up same file as before but for writing
for line in linelist:
    fhandle.write(line)
fhandle.close()
It might seem unnecessary to open the file to read it, close it, then open it again for writing. There are two reasons why I have done this.
Firstly, if something goes wrong in the earlier part of the program and it quits early or throws an error, the file is not changed.
Secondly, as a Python program reads through a file it keeps an imaginary cursor that generally keeps moving forward. Closing then reopening the file resets this cursor to the start of the file. Reopening in 'w' mode also empties the file, so the sorted lines replace the old contents rather than being added after them.
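The cursor can be seen at work in a small sketch. It uses seek(0), which moves the cursor back to the start without closing the file; that is an alternative to the close-and-reopen approach, not what the script above does. The sample file contents are made up.

```python
import os
import tempfile

# Write a two-line sample file (hypothetical contents)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("b\na\n")
    sample_path = tmp.name

with open(sample_path, "r") as fhandle:
    first_pass = fhandle.readlines()    # cursor ends up at the end of the file
    second_pass = fhandle.readlines()   # nothing left to read, so an empty list
    fhandle.seek(0)                     # move the cursor back to the start
    third_pass = fhandle.readlines()    # the same lines again

print(first_pass, second_pass, third_pass)
os.remove(sample_path)                  # tidy up the sample file
```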
