OggCopy utility

I know I want it, give it to me now

OK then, here you go: Zipped oggcopy.py Python script
See below for the contents of the script file.

OggCopy: Recursively copy/convert flac to ogg/vorbis

For everyone else who doesn't yet realize that they want it now, a little background:
This program (written in the Python programming language) will make a copy of a directory of audio files in the FLAC format, converting each FLAC file to a smaller file in the Vorbis format. Subdirectory names are created as needed so that the tree structure of the directory remains the same in the destination. If you know what that means, skip the next couple of paragraphs and see below for how to use the program.

I keep my music files in the FLAC format (Free Lossless Audio Codec). It is "lossless" in the sense that the full CD quality is retained, to the extent that the original audio file can be recreated exactly, yet the file sizes are smaller. It is "free" in the sense that not only can you (usually) find a copy which does not require you to pay (cost), but you may also give copies to others and modify the software if you see the need (liberty).

The other common methods of reducing audio file size, such as converting to MP3 or AAC or Vorbis format, reduce the size by removing some of the audio information in a (mostly) unobjectionable way, and are known as lossy coding. The size of the file can be scaled by dropping increasing amounts of audio information, decreasing the audio quality as the size gets progressively smaller. A lossy file which takes up only 25% of the space of the losslessly encoded file can still be very good quality, but once converted, you can never recreate the original audio file exactly.

Lossless coding saves about 35%-40% of the space compared to the unencoded file size, which doesn't allow for many songs on my portable player. The Vorbis format, like MP3 or the AAC format that iPods use, discards some of the data in a tradeoff of audio quality for smaller file size. As an example, a pop song which has several quiet sections (which make it easier to reduce the size) is 63MB in WAV format, 33MB in FLAC format, and 7.6MB in Vorbis format at a reasonably high quality level. A hard rock song which is mostly distorted guitar from beginning to end is 45MB in WAV, 33MB in FLAC, and 5.9MB in Vorbis format. You can see from those numbers that the amount of space savings you get can vary quite a bit, especially with the lossless format, but the lossy format always gives a significantly smaller file size.
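
To make the comparison concrete, here is a small sketch that works out the percentage saved for the two example songs. The file sizes are the ones quoted above; the function name percent_saved is just something I made up for the illustration.

```python
# File sizes in MB quoted in the text above: (WAV, FLAC, Vorbis)
songs = {
    "pop song": (63.0, 33.0, 7.6),
    "hard rock song": (45.0, 33.0, 5.9),
}


def percent_saved(original_mb, encoded_mb):
    """Return the space saved relative to the original size, in percent."""
    return 100.0 * (original_mb - encoded_mb) / original_mb


for title, (wav, flac, vorbis) in sorted(songs.items()):
    print("%s: FLAC saves %.0f%%, Vorbis saves %.0f%% of the WAV size"
          % (title, percent_saved(wav, flac), percent_saved(wav, vorbis)))
```

For these two songs FLAC saves roughly 48% and 27% respectively, while Vorbis saves roughly 88% and 87%, which shows how much more the lossless result depends on the material.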

My portable player (a Sansa Fuze) plays Vorbis encoded music (contained in an Ogg file, hence the description of Ogg/Vorbis), so I used to convert my files from FLAC to ogg/vorbis one directory at a time to put them on my player. That got tedious, so I wrote this utility to convert an entire directory tree of files at one time.

How to use

If you type "oggcopy.py --help" this is what you will see:

Usage: oggcopy [options] fromdir todir
Walk <fromdir> directory tree and every flac file encountered is
converted to an Ogg Vorbis file in the same directory structure under
<todir>.  Optionally MP3 files are copied without processing.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m, --mp3             copy over MP3 files also
  -q QUALITY, --quality=QUALITY
                        encoding quality level for oggenc

Help and version should be relatively self-explanatory. The -m or --mp3 option tells oggcopy that any MP3 files encountered should be copied to the destination directory structure unchanged. That is useful in the event that the source directory contains MP3 files in addition to FLAC files, and you want the MP3 files to end up in the destination directory. The default behavior is that the tree of directories from the source argument is created identically in the destination argument, but only FLAC files found in the source directories are converted to ogg/vorbis files in the destination directories; all other file types are ignored. The -q or --quality argument is the encoding quality level used by oggenc, from -1 (lowest) to 10 (highest), and is passed unchanged when oggcopy calls oggenc to perform the conversion. If you do not give a quality level, oggcopy uses 6. Which brings me to the next topic:
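
The rules for which files are converted, copied, or ignored can be sketched in a few lines. This is only an illustration of the behavior described above: plan_action is a hypothetical helper and does not appear in the script, and the file names are made up.

```python
import os


def plan_action(filename, copy_mp3=False):
    """Return "convert", "copy", or "ignore" for one file name.

    Hypothetical helper for illustration only; the decision is based
    on the file extension, compared case-insensitively.
    """
    extension = os.path.splitext(filename)[1].lower()
    if extension == ".flac":
        return "convert"
    if extension == ".mp3" and copy_mp3:
        return "copy"
    return "ignore"


# First result is the default behavior, second is with the --mp3 option
for name in ["01 Intro.FLAC", "02 Live.mp3", "cover.jpg"]:
    print(name, "->", plan_action(name), "/", plan_action(name, copy_mp3=True))
```

Without the --mp3 option an MP3 file falls into the "ignore" case, the same as a cover image or a text file.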

System requirements and how it works

Python program interpreter

Oggcopy is written in the Python programming language, so the Python interpreter must be installed on your system. Python does not produce programs which run stand-alone; instead, you either run the script through the interpreter this way:

python oggcopy.py

or if you are using a system with Unix heritage, including Apple Macintosh, you can mark the file as executable, and just type the program name, and the command shell will check the beginning of the file to see which program should be used to run the file:

oggcopy.py

Either way, you will need a copy of the Python interpreter installed on your system. If you do not have a copy installed already, you can get one at the Python download page.

One of the great features of Python is the breadth of libraries available, so that you don't have to re-invent common actions, and the os.walk() library function does almost all of the work in this program. The main section of the script parses the command line arguments and makes sure that the source directory really exists, then calls a function which calls os.walk() to walk through the given source directory. If any files are found, that function calls process_files() to check whether those files are FLAC files, and if they are, calls oggenc to convert them to ogg in the destination directory.
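
A stripped-down sketch of that walk may make the idea clearer. It only lists which destination file each FLAC file would map to and does not encode anything; planned_conversions and the two directory names at the bottom are hypothetical names chosen for the example.

```python
import os


def planned_conversions(src, dest):
    """Yield (source_file, destination_file) pairs for every FLAC file.

    Illustration only: nothing is encoded or written.
    """
    for dirname, subdirs, files in os.walk(src):
        # Path of the current directory relative to the top of the source tree
        subdir = os.path.relpath(dirname, src)
        for name in sorted(files):
            root, ext = os.path.splitext(name)
            if ext.lower() == ".flac":
                target = os.path.join(dest, subdir, root + ".ogg")
                yield (os.path.join(dirname, name), os.path.normpath(target))


# Hypothetical directory names; prints nothing if flac_library does not exist
for source, target in planned_conversions("flac_library", "ogg_library"):
    print(source, "->", target)
```

The call to os.path.relpath() is what keeps the tree structure the same: it strips the top source directory from the current path so that the remainder can be joined onto the destination directory.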

Flac and Ogg/Vorbis tools

The oggenc program needs to be installed. Oggenc is the program which actually performs the conversion to the ogg/vorbis format. The oggcopy program just calls oggenc on each file after it has figured out which files need to be converted. On my Fedora Linux system, oggenc is in the search path, so the script just looks for "oggenc," without a full path specified. If you do not have oggenc in the path, which may be the case on a Windows system, the script might have to be modified with the full path to the encoder program.
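
One possible way to handle a system where oggenc is not in the search path is sketched below. This is not what the script currently does: find_encoder and its configured_path argument are hypothetical, and shutil.which() requires Python 3.3 or later, so this sketch would not run on Python 2.

```python
import os
import shutil


def find_encoder(configured_path=None, name="oggenc"):
    """Return a usable path to the encoder, or None if it cannot be found.

    configured_path is a hypothetical override, e.g. the full path to
    the encoder on a Windows system.  Requires Python 3.3+ for
    shutil.which().
    """
    if configured_path:
        return configured_path if os.path.isfile(configured_path) else None
    # Search the directories listed in the PATH environment variable
    return shutil.which(name)


encoder = find_encoder()
if encoder is None:
    print("oggenc was not found; pass its full path to find_encoder()")
else:
    print("Using encoder at", encoder)
```

The returned path could then be used in place of the bare "oggenc" string when the command for each file is built.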

Portability notes and use for other file types

This program should work in any environment which has Python and oggenc installed, but at the moment it has only been used on a GNU/Linux system. There are a couple of suspect places in the code which may need to be changed for Windows use, such as the previously mentioned path to oggenc, but it will have to be tried on a Windows machine to be sure.
I have tested with Python 2.7 and 3.1.2, and as of version 1.1 the script works as expected with both 2.x and 3.x versions of Python. Anyone who grabbed oggcopy version 1.0 should get the latest version to use it with Python 3; I just had to change one except clause to account for a syntax change between Python 2 and Python 3. I have not figured out the best way to verify exception handling, so there is a possibility that certain corner case errors won't be handled as expected. As far as I can tell, though, it should work with either Python 2.6 or later, or Python 3.x.
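
One way the exception handling could be exercised without a broken FLAC file is to run a stand-in command that is known to fail. The sketch below is an assumption about how such a check might look, not something taken from the script: run_encoder is a hypothetical wrapper, and a child Python process stands in for oggenc. It uses the "except ... as name" form of the except clause, which is accepted by Python 2.6 and later as well as Python 3.x (the older comma form is accepted only by Python 2).

```python
import subprocess
import sys


def run_encoder(command):
    """Run command; return 0 on success or the failing return code."""
    try:
        subprocess.check_call(command)
    except subprocess.CalledProcessError as error_object:
        sys.stderr.write("Return code was: %d\n" % error_object.returncode)
        return error_object.returncode
    return 0


# Stand-in for a failing oggenc: a child Python process that exits with 3
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
print("failing command returned", run_encoder(failing))

# Stand-in for a successful run
print("passing command returned", run_encoder([sys.executable, "-c", "pass"]))
```
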
Since I only needed FLAC to ogg conversion, I did not write this program in a way that makes it easy to change out the programs used, e.g. to convert FLAC to MP3, or Apple lossless to AAC, or RAW photo files to JPG photo files, or whatever other conversions you might find handy. It would not be very difficult to modify it to convert different file types, or even to make it more generic so that you tell it at run time what file type you would like converted. That is very low on my list of things to do; however, feel free to modify this for your own use. I would appreciate hearing if you found this useful; you can mail me at chris at this domain.
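
For anyone who wants to try the more generic approach, one possible shape is a table that maps each source extension to a destination extension and a function that builds the encoder command. This is only a sketch under my own assumptions and none of these names exist in the script. The oggenc argument list matches what the script uses, while "my-encoder" and the .xyz and .abc extensions are placeholders, not real tools or formats.

```python
import os


def oggenc_command(src_file, dest_file, quality="6"):
    """Build the oggenc argument list in the same form the script uses."""
    return ["oggenc", "-q", quality, src_file, "-o", dest_file]


def example_command(src_file, dest_file, quality="6"):
    """Placeholder for some other encoder; "my-encoder" is not a real tool."""
    return ["my-encoder", src_file, dest_file]


# Source extension -> (destination extension, command builder)
CONVERTERS = {
    ".flac": (".ogg", oggenc_command),
    ".xyz": (".abc", example_command),   # hypothetical file types
}


def plan_conversion(src_file, dest_dir, quality="6"):
    """Return (dest_file, command) for src_file, or None if not convertible."""
    root, ext = os.path.splitext(os.path.basename(src_file))
    entry = CONVERTERS.get(ext.lower())
    if entry is None:
        return None
    dest_ext, build_command = entry
    dest_file = os.path.join(dest_dir, root + dest_ext)
    return dest_file, build_command(src_file, dest_file, quality)


print(plan_conversion("music/song.flac", "portable"))
```

Adding a new conversion would then mean adding one entry to the table rather than another branch in the file processing loop.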

The Code

Here is the Python code (what you will get if you download the zip file above):

#!/usr/bin/env python

"""
Process directory of flac files to destination as Ogg Vorbis files.

Usage: oggcopy.py [options] <flac_directory> <vorbis_directory>
Options:
        --mp3, -m Copy any MP3 files found in directory tree
        --quality=n, -q n Use quality level n for encoding

Used for example to produce a size-reduced version of a FLAC
music library for use on a portable player.
Optionally copy over MP3 files unchanged to new file tree.

Does not copy or encode files if the destination file already exists.
This allows either recovery if an error causes a halt to processing
part way through a directory tree, or picking up only new additions 
since the last time the program was executed.

"""

#  Copyright 2009 Chris Caudle
#
#     This program is free software: you can redistribute it and/or modify
#     it under the terms of the GNU General Public License as published by
#     the Free Software Foundation, either version 3 of the License, or
#     (at your option) any later version.
#
#     This program is distributed in the hope that it will be useful,
#     but WITHOUT ANY WARRANTY; without even the implied warranty of
#     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#     GNU General Public License for more details.
#
#     You should have received a copy of the GNU General Public License
#     along with this program.  If not, see <http://www.gnu.org/licenses/>.
#

# This imports Python 3.x style print functions so that this script
# is ready to go in python ver. 3.1; only works in python 2.6 and later
# To use this file with Python 2.5 and earlier, the calls to print()
# will have to be modified
from __future__ import print_function

# See PEP8 "Python Style Guide" for way to set this version string
# based on source control tag if this is ever placed in repository
__version__ = "%prog 1.1"

import os
import sys
import subprocess
import shutil

# File organization:
# For those not very familiar with Python scripts, the convention is that
# helper functions are defined at the top of the file, so that the parser 
# can finish parsing those functions and has references in memory, and the
# bottom of the file contains a check to see if the entry point is named main,
# which is the case if the script is run stand alone, as opposed to imported
# into another script.  If running stand alone, the necessary start up 
# housekeeping and command line parsing is done, and the helper functions are 
# called to do the actual work.


# Function: process_files
# Take from-dir, to-dir, and list of files.  Build up full path name
# of source file; determine if file should be converted, copied, ignored.
# Build up full path of destination file; copy or convert file as appropriate.
# Note that currently this function bases file match on filename extension
# only, which is somewhat brittle. Checking file header would be better,
# but for practical purposes checking extension should work almost all
# of the time for the files this was designed for, i.e. flac, mp3, ogg, wav.
def process_files(filelist, srcdir, destdir, quality="6", mpcp=False):
    """
    Encode FLAC files in srcdir to Ogg Vorbis files in destdir.
    Pass quality argument to oggenc to set encoding bitrate.
    Boolean variable mpcp controls whether MP3 files are copied (unchanged)
    or not.

    """

    # Could use header tags instead of file names to match,  (like file 
    # program does), but for now file name match is probably acceptable

    for name in filelist:
        # Check for errors? Think about where to check that file exists, dest
        # is writeable, catch errors coming back from oggenc or copy, etc.
        src_file = os.path.join(srcdir, name)
        #have to change dest filename to have different extension
        (dest_file_root, dest_file_ext) = \
            os.path.splitext(os.path.join(destdir, name))

        # Could also use fnmatch.*() functions, but straight text compare
        # seems easiest; change string to lower so that name 
        # matches flac, FLAC, Flac, etc.
        if (dest_file_ext.lower() == ".flac"):
            dest_file = dest_file_root + ".ogg"
            
            # Check to see if file already exists; skip if file exists
            # This allows restarting if first run fails, or running
            # periodically and only picking up new unconverted files.
            # Would be better to do this in one place, but the way the code
            # is currently structured, the destination name is created in
            # different places depending on the extension of the source name
            # so has to be done in each processing section (flac and mp3)

            # only process if destination doesn't already exist 
            if (not os.path.exists(dest_file)): 
                # Place arguments for oggenc into sequence
                enc_cmd = ["oggenc", "-q", quality, src_file, "-o", dest_file]

                # At least on Linux, oggenc seems to create intermediate
                # directories if needed; will need to check if that holds
                # on Windows; if not, or if some tool other than
                # oggenc is used in some future variant of this function,
                # will need to create intermediate directories explicitly
                # like for MP3 files below
                try:
                    subprocess.check_call(enc_cmd)
                except subprocess.CalledProcessError as error_object:
                    print("Prior error message was returned from oggenc", 
                          file=sys.stderr)
                    print("Return code was: ", error_object.returncode,
                          file=sys.stderr)

        # End of FLAC processing

        elif (dest_file_ext.lower() == ".mp3" and mpcp):
            # This just puts the file name back together, since there is
            # no variable holding the complete name at this point
            dest_file = dest_file_root + dest_file_ext

            # Need to create destination subdirectory before copying
            try:
                os.makedirs(destdir)
            except OSError:
                # Ignore the error only if the directory already exists;
                # any other OSError (e.g. permission denied) is raised
                # up to the higher level
                if not os.path.isdir(destdir):
                    raise
            
            # Check to see if file already exists
            # This allows restarting if first run fails, or running
            # periodically and only picking up new unconverted files
            # Would be better to do this in one place, but the way the code
            # is currently structured, the destination name is created in
            # different places depending on the extension of the source name

            if (not os.path.exists(dest_file)):
                # copy2() cannot create intermediate directories, so make
                # sure destination directory exists before trying to copy
                # Could wrap this in try/except block so that one error would
                # not stop entire process; currently one bad file name will
                # cause entire run to halt
                shutil.copy2(src_file, dest_file)
                print("Copied " + src_file + " to " + dest_file)

        #end of MP3 processing

        else: # Neither flac nor MP3 file, so no processing
            pass
# End of process_files()

# Function: ogg_dir_cp()
# Take a source directory, walk the tree finding 
# flac and optionally MP3 files, and in destination
# directory create the same directory structure as source
# but with flac files encoded to ogg, and MP3 files copied
# unchanged.  This is for taking the home flac music collection
# and making an ogg library for transfer to a portable device
# The quality level and the MP3 copy flag are supplied by the calling
# routine (the command line defaults are quality 6 and no MP3 copy)
def ogg_dir_cp(src, dest, encqual, mpcp):
    """Recursively process FLAC files in src to Ogg Vorbis files in dest.

    Encoding quality of oggenc is determined by encqual, which is a
    string holding the quality level to pass to oggenc.  Boolean
    argument mpcp determines whether MP3 files encountered in
    directory traversal are copied unmodified or ignored.

    """

    # src is source directory
    # dest is top of destination directory
    # encqual is encoding quality argument (string holding a number)
    # mpcp is MP3 copy requested boolean

    for (dirname, subdirs, files) in os.walk(src):

        # Call function to determine if file needs to be copied, or
        # converted from flac to ogg

        # Each time through the loop, I get the
        # subdir name already concatenated onto the top source
        # name, so to create the destination name, I have to pop
        # off the original source dir name, take what is left and
        # concatenate onto the original dest dir name

        if (files): # only process if there are files in this sub-directory
            # get path without the top directory name which was passed in
            subdir = os.path.relpath(dirname, src)
            # create destination directory by concatenating the destination
            # which was passed in with current subdir name
            fulldest = os.path.join(dest, subdir)
            process_files(files, dirname, fulldest, encqual, mpcp)

# End of ogg_dir_cp()

# Main entry when called from command line (as opposed to imported into
# another script which wants to use the functions above)
if __name__ == "__main__":

    from optparse import OptionParser


    help_str = "%prog [options] fromdir todir\n\
Walk <fromdir> directory tree and every flac file encountered is \
converted to an Ogg Vorbis file in the same directory structure \
under <todir>.  Optionally MP3 files are copied without processing."

    parser = OptionParser(usage=help_str, version=__version__)
    parser.add_option("-m", "--mp3", action="store_true", dest="mp3also", \
                          default=False, help="copy over MP3 files also")
    parser.add_option("-q", "--quality", dest="quality", default="6", \
                          help="encoding quality level for oggenc")

    (options, args) = parser.parse_args()
    if len(args) != 2:
        parser.error("expected exactly two arguments: fromdir and todir")
    
    oggqual = options.quality
    cpmp3 = options.mp3also

    source_dir = args[0]
    dest_dir = args[1]

    #Do some simple verification before kicking off processing
    #Perhaps a more sophisticated check can be added later, e.g.
    #expand to absolute path, check for symlinks, etc.
    #Tested to work with relative paths, so not sure there would be
    #much benefit to expanding path name to absolute

    if not os.path.exists(source_dir):
        parser.error("Source path does not exist")
    if not os.path.isdir(source_dir):
        parser.error("Source argument is not a directory")

    # Would need this if the function that creates a new file didn't also
    # create any needed directories.
    # Uncomment this section if behavior on Windows is different,
    # or if using a utility other than oggenc which behaves differently
    # if not os.path.exists(dest_dir):
    #     parser.error("Destination path does not exist")
    # if not os.path.isdir(dest_dir):
    #     parser.error("Destination argument is not a directory")

    # This program will need some helper utilities
    # Make sure that oggenc is available
    print("Checking availability of needed utilities...")

    # This is currently not really OS agnostic.  On Windows the oggenc
    # program will probably not end up in the path, so either you have
    # to put oggenc in the path, or this hard coded string will have to 
    # change to a global variable which can be modified on a script running
    # on Windows
    try:
        subprocess.check_call(["oggenc", "--version"])
    except OSError:
        sys.exit("Required oggenc utility not found")

    #OK, looks like everything is OK to start making paths and
    # encoding files
    ogg_dir_cp(source_dir, dest_dir, oggqual, cpmp3)

# End of __main__ block


Page last updated: 15 November 2010

