Gmail Calendar Documents Web Reader more »
Recently Visited Groups | Help | Sign in
Google Groups Home
pimp my code
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  4 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Sean McIlroy  
View profile  
 More options Mar 4, 1:27 am
Newsgroups: comp.lang.haskell
From: Sean McIlroy <namenobodywa...@gmail.com>
Date: Wed, 3 Mar 2010 22:27:10 -0800 (PST)
Local: Thurs, Mar 4 2010 1:27 am
Subject: pimp my code
hi all

here is a little module for parsing midi files. it seems to work, but
it's much slower than its python counterpart. maybe somebody could
look through it and see if i'm making any obvious mistakes?

peace
stm

{-------------------------------------------------------------------------- ------------------

A sequence is a tuple (formattype, timedivision, tracks) where

*) formattype is in [0..2]
*) timedivision is either [ticksperbeat] where ticksperbeat is a
nonnegative integer < (2^15)
   or [framespersecond, ticksperframe] where framespersecond,
ticksperframe are nonnegative
   integers with framespersecond < (2^7) and ticksperframe < (2^8)
*) tracks is a list of Events

An Event is either (ChannelEvent deltatime eventtype channel
parameters)
or (MetaEvent deltatime metatype message) or (SysexEvent deltatime
channel msg) where

*) deltatime  is a nonnegative integer
*) eventtype  is in [0..7]
*) channel    is a nonnegative integer < (2^4)
*) parameters is a list whose elements are nonnegative integers <
(2^7)
*) metatype   is a nonnegative integer < (2^7)
*) message    is a string
*) channel    is in [0..15]
*) msg        is a list whose elements are nonnegative integers <
(2^8)

The eventtypes and parameters of ChannelEvents have the following
verbal handles:

eventtype                        parameters

0 = noteoff                      [notenumber, velocity]
1 = noteon                       [notenumber, velocity]
2 = noteaftertouch               [notenumber, amount]
3 = controller                   [controllertype, value]
4 = programchange                [programnumber]
5 = channelaftertouch            [amount]
6 = pitchbend                    [valueLSB, valueMSB]

--------------------------------------------------------------------------- ------------------}

module Hugs where

import System.IO

readmidi filepath = (openBinaryFile filepath ReadMode) >>=
hGetContents

writemidi filepath text = (openBinaryFile filepath WriteMode) >>=
(\handle -> hPutStr handle text)

test filepath1 filepath2 = (readmidi filepath1)  >>=
                           (return . parse)      >>=
                           (return . unparse)    >>=
                           (writemidi filepath2)

chr = toEnum   :: Int  -> Char
ord = fromEnum :: Char -> Int

zeropadded digits minlength = (replicate (minlength - length digits)
0) ++ digits

noleadingzeros digits = until done loop digits
        where
        done (d:ds) = (d /= 0) || (null ds)
        loop (d:ds) = ds

number2digits number base = until done loop [number]
        where
        done (d:ds) = (d < base)
        loop (d:ds) = [div d base, mod d base]  ++ ds

digits2number digits base = sum (zipWith (*) (reverse (noleadingzeros
digits)) basepowers)
        where
        basepowers = 1 : (map (base *) basepowers)

number2fixedlength number minlength = zeropadded (number2digits number
(2^8)) minlength

fixedlength2number digits = digits2number digits (2^8)

number2variablelength number = zipWith (+) digits padding
        where
        digits  = number2digits number (2^7)
        padding = (replicate (length digits - 1) (2^7)) ++ [0]

variablelength2number variablelength = digits2number digits (2^7)
        where
        padding = (replicate (length variablelength - 1) (2^7)) ++ [0]
        digits  = zipWith (-) variablelength padding

largebyte number = (number >= (2^7))

smallbyte number = (number < (2^7))

getfixedlength numbers numbytes = splitAt numbytes numbers

getvariablelength numbers = (variablelength, remainder)
        where
        (v, small : remainder) = span largebyte numbers
        variablelength = v ++ [small]

analyzetimedivision [byte1, byte2] = if indicator==0 then
[ticksperbeat]                   else
                                     if indicator==1 then
[framespersecond, ticksperframe] else undefined
        where
        indicator       = div byte1 (2^7)
        framespersecond = mod byte1 (2^7)
        ticksperframe   = byte2
        ticksperbeat    = (2^8) * byte1 + byte2

synthesizetimedivision [ticksperbeat] = [byte1, byte2]
        where
        byte1 = div ticksperbeat (2^8)
        byte2 = mod ticksperbeat (2^8)

synthesizetimedivision [framespersecond, ticksperframe] = [byte1,
byte2]
        where
        byte1 = (2^7) + framespersecond
        byte2 = ticksperframe

analyzeheaderdata [a,b,c,d,e,f] = (formattype, numtracks,
timedivision)
        where
        formattype   = fixedlength2number  [a,b]
        numtracks    = fixedlength2number  [c,d]
        timedivision = analyzetimedivision [e,f]

synthesizeheaderdata (formattype, numtracks, timedivision) = f ++ n ++
t
        where
        f = number2fixedlength formattype 2
        n = number2fixedlength numtracks  2
        t = synthesizetimedivision timedivision

analyzestatus statusbyte = (eventtype, channel)
        where
        number    = statusbyte - (2^7)
        eventtype = div number (2^4)
        channel   = mod number (2^4)

synthesizestatus (eventtype, channel) = [statusbyte]
        where
        statusbyte = (2^7) + (2^4) * eventtype + channel

data Event = ChannelEvent Int Int Int [Int] | MetaEvent Int Int String
| SysexEvent Int Int [Int]
        deriving (Eq, Show)

synthesizeevent (ChannelEvent deltatime eventtype channel parameters)
= d ++ s ++ parameters
        where
        d = number2variablelength deltatime
        s = synthesizestatus (eventtype, channel)

synthesizeevent (MetaEvent deltatime metatype message) = d ++ s ++
[metatype] ++ m
        where
        d = number2variablelength deltatime
        s = synthesizestatus (7, 15)
        m = number2variablelength (length message) ++ (map ord
message)

synthesizeevent (SysexEvent deltatime channel msg) = d ++ s ++ m
        where
        d = number2variablelength deltatime
        s = synthesizestatus (7, channel)
        m = number2variablelength (length msg) ++ msg

makechunk identifier numbers = identifier ++ (number2fixedlength
(length numbers) 4) ++ numbers

makeheader (formattype, numtracks, timedivision) = makechunk
headeridentifier headerdata
        where
        headeridentifier = [77, 84, 104, 100]
        headerdata = synthesizeheaderdata (formattype, numtracks,
timedivision)

maketrack events = makechunk trackidentifier (concatMap
synthesizeevent events)
        where
        trackidentifier  = [77, 84, 114, 107]

getchunks numbers = fst loopoutput
        where
        done (chunks, remainder) = (null remainder)
        loop (chunks, remainder) = (chunks ++ [chunk], nextremainder)
                where
                pastChunkID                = drop 4 remainder
                (chunksize, pastChunkSize) = getfixedlength pastChunkID 4
                (chunk, nextremainder)     = getfixedlength pastChunkSize
(fixedlength2number chunksize)
        loopoutput = until done loop ([], numbers)

getevent (chunk, runningstatus) = (event, remainder,
nextrunningstatus)
        where
        (timestamp, pastTimeStamp) = getvariablelength chunk
        deltatime                  = variablelength2number timestamp
        explicitStatus             = if largebyte (head pastTimeStamp) then 1
else 0
        (status, pastStatus)       = splitAt explicitStatus pastTimeStamp
        nextrunningstatus          = if null status then runningstatus else
status
        (eventtype, channel)       = analyzestatus (head nextrunningstatus)
        (event, remainder)         = case (eventtype, channel) of
                (7, 15) -> (MetaEvent deltatime metatype message, pastEvent)
                        where
                        metatype : pastMetaType            = pastStatus
                        (messagelength, pastMessageLength) = getvariablelength pastMetaType
                        mlength                            = variablelength2number
messagelength
                        (msg, pastEvent)                   = getfixedlength
pastMessageLength mlength
                        message                            = map chr msg
                (7, _)  -> (SysexEvent deltatime channel message, pastEvent)
                        where
                        (messagelength, pastMessageLength) = getvariablelength pastStatus
                        mlength                            = variablelength2number
messagelength
                        (message, pastEvent)               = getfixedlength
pastMessageLength mlength
                (_, _)  -> (ChannelEvent deltatime eventtype channel parameters,
pastEvent)
                        where
                        numparameters           = if elem eventtype [4,5] then 1 else 2
                        (parameters, pastEvent) = getfixedlength pastStatus numparameters

getevents chunk = fst_loopoutput
        where
        done (events, runningstatus, remainder) = (null remainder)
        loop (events, runningstatus, remainder) = (nextevents,
nextrunningstatus, nextremainder)
                where
                (event, nextremainder, nextrunningstatus) = getevent (remainder,
runningstatus)
                nextevents                                = events ++ [event]
        (fst_loopoutput, _, _) = until done loop ([], [], chunk)

parse filecontents = (formattype, timedivision, tracks)
        where
        headerchunk : trackchunks             = getchunks (map ord
filecontents)
        (formattype, numtracks, timedivision) = analyzeheaderdata
headerchunk
        tracks                                = map getevents trackchunks

unparse (formattype, timedivision, tracks) = filecontents
        where
        numtracks    = length tracks
        header       = makeheader (formattype, numtracks, timedivision)
        filecontents = map chr (header ++ concatMap maketrack tracks)


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dirk Thierbach  
View profile  
 More options Mar 4, 8:27 am
Newsgroups: comp.lang.haskell
From: Dirk Thierbach <dthierb...@usenet.arcornews.de>
Date: Thu, 4 Mar 2010 14:27:45 +0100
Local: Thurs, Mar 4 2010 8:27 am
Subject: Re: pimp my code

Sean McIlroy <namenobodywa...@gmail.com> wrote:
> here is a little module for parsing midi files. it seems to work, but
> it's much slower than its python counterpart. maybe somebody could
> look through it and see if i'm making any obvious mistakes?

There's a number of inefficiencies in your code. I could go through the
code in detail, but that would probably end with a complete rewrite :-),
so I'll just do a summary (unless you still get stuck):

* Haskell has to read a sequence of bytes, convert them into a list of
Char, then another time into a list of Int, and then you process them.
Since there's not a lot of processing done, these conversions alone
will probably make up a significant part of the running time.

* You "analyze" this stream by repeatedly appending to a list, as in

  loop (chunks, remainder) = (chunks ++ [chunk], nextremainder)

The append operator (++) is *linear* in the size of the first
argument. That means here you are turning something that is linear
(go through a list of Ints) into something that is quadratic. That's
an antipattern that's easy to remember: Whenever you write something like
"xs_so_far ++ [x]", it's very likely a mistake (in terms of efficiency).

* The same mistake when you "synthesize".

* Some minor things, like using both "div" and "mod" instead of "divMod",
  and using those instead of the shift and logical and operations in
  Data.Bits in the first place.

* Adding (at least some) type signatures will also help, because it
  will allow the compiler to specialize functions, if possible.

You can fix the first point by using Data.ByteString (or the lazy variant,
if you don't want to read the whole file into memory at once). That will
also help a lot with your way of appending things at the end.

The next step would be to use the correct techniques for "analyzing" a
list (consume some prefix, return the rest or call a continuation on it)
and for "synthesizing" it (prepend some prefix, compute the tail
lazily, possibly in a continuation). Look at "reads" and "shows" in the
Prelude to get an idea how this works.

For your number conversions it's also helpful to remember that if you
have to process something twice in the first place, you can use an
accumulator each time. That will reverse the intermediate value,
but since you're reversing twice, it's ok.

Finally, some minor nitpicks:

> chr = toEnum   :: Int  -> Char
> ord = fromEnum :: Char -> Int

These already exist in Data.Char.

> noleadingzeros digits = until done loop digits
>        where
>        done (d:ds) = (d /= 0) || (null ds)
>        loop (d:ds) = ds

That could be done with

  noleadingzeros digits = dropWhile (== 0) digits

but you'll probably need to do it in a different way if you rearrange
the numeric conversions.

Familiarize yourself with the various functions in Data.List. You'll
need them. Frequently :-)

Also, stay away from the "until"-loops -- it looks like you're just
converting imperative code here, the idiomatic (and more efficient)
variant will in 9 out of 10 cases look different.

> analyzetimedivision [byte1, byte2] =
>   if indicator==0 then [ticksperbeat] else
>   if indicator==1 then [framespersecond, ticksperframe] else undefined
>   where ...

Multiple cases like these can be written with guards:

  analyzetimedivision [byte1, byte2]
    | indicator==0  = [ticksperbeat]
    | indicator==1  = [framespersecond, ticksperframe]
    | otherwise     = undefined
    where
      indicator = ...

That avoids the nested if's. Note the indentation, "where" must be aligned
with the |'s.

HTH,

- Dirk


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Sean McIlroy  
View profile  
 More options Mar 5, 3:25 pm
Newsgroups: comp.lang.haskell
From: Sean McIlroy <namenobodywa...@gmail.com>
Date: Fri, 5 Mar 2010 12:25:13 -0800 (PST)
Local: Fri, Mar 5 2010 3:25 pm
Subject: Re: pimp my code
On Mar 4, 5:27 am, Dirk Thierbach <dthierb...@usenet.arcornews.de>
wrote:

thanks for the tips.

peace
stm


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dirk Thierbach  
View profile  
 More options Mar 6, 12:46 am
Newsgroups: comp.lang.haskell
From: Dirk Thierbach <dthierb...@usenet.arcornews.de>
Date: Sat, 6 Mar 2010 06:46:36 +0100
Local: Sat, Mar 6 2010 12:46 am
Subject: Re: pimp my code

Sean McIlroy <namenobodywa...@gmail.com> wrote:
> thanks for the tips.

You're welcome. Getting used to Haskell takes some time, and if you
need more details, feel free to ask.

- Dirk


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2010 Google