

Interpreter startup time on my machine is about half the total wall clock time, 30% of the user time, and most of the system time.
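One rough way to see how large that startup share is on a given machine is to time a do-nothing interpreter invocation against the full script. The harness below is only an illustrative sketch: read_million.py is the script name used in the question, while the timing wrapper itself is not from the original post.

```python
# Sketch: compare bare interpreter startup with the full script's wall time.
# Assumes read_million.py (from the question) is in the current directory.
import subprocess
import sys
import time

def wall_seconds(cmd):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

startup = wall_seconds([sys.executable, "-c", "pass"])       # startup only
total = wall_seconds([sys.executable, "read_million.py"])    # startup + work
print(f"startup: {startup:.3f}s  total: {total:.3f}s  share: {startup/total:.0%}")
```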
#Python vs Perl serial#
Now, I know that Perl and Python have some expected differences. For instance, in Perl I open up the filehandle using a subprocess running the UNIX gzip command, while in Python I use the gzip library. What can I do to speed up the Python implementation of this script (even if I never reach the Perl performance)? Perhaps the gzip module in Python is slow (or perhaps I'm using it in a bad way); is there a better solution?

EDIT #1: Here's what the read_million.py line-by-line profiling looks like. I have now also tried the subprocess Python module, as per Strauser and others; that "subproc" solution, read_million_subproc.py, spawns gzip with subprocess.Popen(..., stdout=subprocess.PIPE). Here is a comparative table of all the things I've tried so far, with each method and its average running time in seconds.

Comparing apples to oranges: in your original test case, Perl wasn't doing the file I/O or decompression work; the gzip program was doing that (and gzip is written in C, so it runs pretty fast). In that version of the code, you're comparing parallel computation to serial computation. Having tested a number of possibilities, it looks like the big culprits here are interpreter startup time and how the decompression work is split between the script and gzip.
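The listing of read_million_subproc.py is not reproduced above, but a minimal sketch of that subprocess-based approach might look like the following; the input file name million_lines.txt.gz is assumed, since the question only shows a truncated name.

```python
# read_million_subproc.py -- a sketch of the subprocess-based variant.
# Assumed file name: million_lines.txt.gz.
import subprocess

# Let the external gzip binary do the decompression in its own process,
# so it runs in parallel with this Python loop (as it does for the Perl script).
gzip_proc = subprocess.Popen(["gzip", "-cd", "million_lines.txt.gz"],
                             stdout=subprocess.PIPE)

for line in gzip_proc.stdout:          # decompressed lines arrive as bytes
    if line.strip() == b"1000000":
        print("This is the millionth line: Python")
        break

gzip_proc.stdout.close()   # gzip gets a broken pipe and exits once we stop reading
gzip_proc.wait()
```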

The Python script spends most of its time on for line in fh; the Perl script spends most of its time in if($_ eq "1000000").
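The body of read_million.py is not shown either, but from the details given (it uses the gzip library, its hot spot is the for line in fh loop, and it prints a message at the millionth line) a minimal sketch would be roughly this, again with the file name assumed:

```python
# read_million.py -- sketch of the gzip-module variant described in the question.
# Assumed file name: million_lines.txt.gz; lines are read and compared as text.
import gzip

with gzip.open("million_lines.txt.gz", "rt") as fh:
    for line in fh:                 # the loop the profiler flags as the hot spot
        if line.strip() == "1000000":
            print("This is the millionth line: Python")
            break
```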
#Python vs Perl code#
The Perl script ends with print "This is the millionth line: Perl\n" and the Python script with print "This is the millionth line: Python". For whatever reason, the Python script takes almost 8x longer: $ time perl read_million.pl versus $ time python read_million.py. I tried profiling both scripts, but there really isn't much code to profile.
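The question does not say which profiler produced the line-by-line numbers (line_profiler is a common choice for that); as a simpler stand-in, a function-level profile can be taken with nothing but the standard library, as in this sketch:

```python
# Sketch: function-level profile of read_million.py using only the standard library.
import cProfile
import pstats

# Run the script under the profiler and write the stats to a file.
cProfile.run("import runpy; runpy.run_path('read_million.py', run_name='__main__')",
             "read_million.prof")

# Print the ten entries with the largest cumulative time.
pstats.Stats("read_million.prof").sort_stats("cumulative").print_stats(10)
```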

I have a gzipped data file containing a million lines, which I can inspect with $ zcat million_ | head. My Perl script which processes this file, read_million.pl, reads the decompressed stream line by line and prints a message when it reaches the millionth line.
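For anyone reproducing the comparison: the check against "1000000" in both scripts suggests the file simply contains the numbers 1 through 1,000,000, one per line, though the question never shows the data itself. A small sketch for generating such a file under an assumed name:

```python
# make_million_lines.py -- sketch: build a test file like the one described.
# Assumption: one number per line, 1 through 1000000, gzip-compressed.
import gzip

with gzip.open("million_lines.txt.gz", "wt") as out:
    for i in range(1, 1_000_001):
        out.write(f"{i}\n")
```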
