Before talking about the spelling of a note (a note has pitch and time properties), I feel it's useful to first guess the tonal center and key that its pitch resides in.
To figure out the tonal center of a note we use the other notes in the piece. We make a few observations about tonal music that will help us:
- Different tonal centers produce different distributions of total duration over the pitches.
- Pitches closer to a given moment in time are more relevant for the tonal center at that time than ones further away.
- Lower pitches are more important to the tonal center than higher pitches.
- A single long note is less important than the same total duration spread over multiple notes.
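The first observation can be made concrete with a plain duration histogram over the 12 pitch classes, ignoring the other three observations for now. This is a minimal sketch; the (midi_pitch, start, end) tuple format and the name pitch_class_durations are assumptions for illustration.

```python
def pitch_class_durations(notes):
    """Total sounding duration per pitch class (0 = C, 1 = C#/Db, ..., 11 = B).

    notes: list of (midi_pitch, start, end) tuples (hypothetical format).
    """
    totals = [0.0] * 12
    for pitch, start, end in notes:
        totals[pitch % 12] += end - start
    return totals

# A long low C under a short E, G and a high C
notes = [(48, 0.0, 4.0), (64, 0.0, 1.0), (67, 1.0, 2.0), (72, 2.0, 4.0)]
print(pitch_class_durations(notes))
```

Octaves collapse into one bin, so the low C and the high C both count toward pitch class 0.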
Sidenote on rejected properties: I rejected some properties as not very relevant for finding a tonal center, such as volume, timbre and perceived tempo. Other properties are relevant but either hard to obtain without knowing the tonal center in the first place or rarely available, such as cadences, rhythmic stress, phrasing, stylistic period, the meaning of the text and much more. To use those for finding the tonal center, we would need a model that infers them in conjunction with the tonal center and settles on a stable solution that satisfies all constraints well. We may come back to this if we are unhappy with our accuracy.
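First, let's define a configurable function f(x, rate) = x/(|x| + rate). For x ≥ 0 it rises from 0 toward 1, so 1 - f(x, rate) = rate/(x + rate) is a falloff from 1 toward 0 that we can use to punish distance and/or reward low pitches.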
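A quick numeric look at the falloff, as a sketch: for non-negative x, 1 - f(x, rate) simplifies to rate/(x + rate), so rate is exactly the distance (or pitch) at which the falloff has dropped to one half. The names f and falloff are illustrative, and the code is only meant for x ≥ 0, since for negative x the expression 1 - f exceeds 1.

```python
def f(x, rate):
    """Saturating function x / (|x| + rate): 0 at x = 0, approaching 1 as x grows."""
    return x / (abs(x) + rate)

def falloff(x, rate):
    """1 - f(x, rate) for x >= 0: 1 at x = 0, 0.5 at x = rate, decaying toward 0."""
    return 1 - f(x, rate)

for x in [0, 1, 2, 4, 8]:
    print(x, round(falloff(x, 2.0), 3))
```

A larger rate makes the falloff wider, so the same distance or pitch is punished less.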
Now we can define a weight that decreases with both distance and pitch: weight(distance, pitch, duration) = c (1 - f(distance, a)) (1 - f(pitch, b))
Note that duration is the duration of the note the pitch belongs to, not the duration of that pitch. This function is applied at every moment in time, and all weights for each pitch are summed. a, b and c are constants to be determined.
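Applying the weight at every moment and summing can be approximated numerically by slicing each note into small time steps (a Riemann sum). This sketch writes both factors as the falloff 1 - f, so the weight shrinks with distance and with pitch as the observations require. Assumptions not in the text: pitch is a MIDI note number, notes are (midi_pitch, start, end) tuples, and chroma_at and step are hypothetical names.

```python
def f(x, rate):
    """Saturating function x / (|x| + rate) from the text."""
    return x / (abs(x) + rate)

def weight(distance, pitch, duration, a, b, c):
    """Instantaneous weight of one pitch at a given time distance from t.

    `duration` is kept to match the signature in the text; the formula
    itself does not use it.
    """
    return c * (1 - f(distance, a)) * (1 - f(pitch, b))

def chroma_at(t, notes, a, b, c, step=0.01):
    """Approximate the chroma vector at time t with a Riemann sum over time.

    Each note is cut into slices of about `step` time units; each slice adds
    weight * slice_length to the note's pitch class (0 = C ... 11 = B).
    """
    chroma = [0.0] * 12
    for pitch, start, end in notes:
        n = max(1, round((end - start) / step))
        dt = (end - start) / n
        for i in range(n):
            moment = start + (i + 0.5) * dt      # midpoint of the i-th slice
            distance = abs(moment - t)
            chroma[pitch % 12] += weight(distance, pitch, end - start, a, b, c) * dt
    return chroma

# A low C near t = 0 and a higher G further away
print(chroma_at(0.0, [(48, 0.0, 1.0), (67, 2.0, 3.0)], a=1.0, b=60.0, c=1.0))
```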
To actually use this on a collection of notes, we need to integrate it over the duration of each note relative to a given point in time t. Since only the distance changes over time, we just need to integrate 1 - f(x, rate) over a time span [v, w]:
∫_v^w (1 - f(x, rate)) dx = ∫_v^w rate/(x + rate) dx = rate log(w + rate) - rate log(v + rate)   (for 0 ≤ v ≤ w)
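A sketch that checks the closed form against a plain midpoint-rule integration, assuming 0 ≤ v ≤ w as in the text; falloff_integral and midpoint_rule are hypothetical helper names.

```python
import math

def falloff(x, rate):
    """1 - f(x, rate), which equals rate / (x + rate) for x >= 0."""
    return 1 - x / (abs(x) + rate)

def falloff_integral(v, w, rate):
    """Closed form of the integral of 1 - f(x, rate) over [v, w], for 0 <= v <= w."""
    return rate * math.log((w + rate) / (v + rate))

def midpoint_rule(g, v, w, n=100_000):
    """Numerical integral of g over [v, w] using n equal slices."""
    h = (w - v) / n
    return sum(g(v + (i + 0.5) * h) for i in range(n)) * h

v, w, rate = 0.5, 3.0, 2.0
print(falloff_integral(v, w, rate), midpoint_rule(lambda x: falloff(x, rate), v, w))
```

The two numbers agree closely, and shifting the same span further from t gives a smaller integral, which is the punishment for distance we want.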
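So the full weight equation for a note becomes the following (assuming note.start and note.end are measured relative to t and are both non-negative; a note that spans t has to be split at t, and a part lying before t can be mirrored, since f depends only on |x|):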
weight(note) = c (a log(note.end + a) - a log(note.start + a)) (1 - f(note.pitch, b)) = c a log((note.end + a) / (note.start + a)) b / (note.pitch + b)
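A minimal sketch of the per-note closed form, including the split around t, with the pitch factor taken as the low-pitch reward 1 - f(pitch, b) = b/(pitch + b). Assumptions not in the text: notes are (midi_pitch, start, end) tuples in absolute time with pitch as a MIDI note number, a part of a note lying before t is mirrored (valid because f depends only on |x|), and note_weight and chroma are hypothetical names.

```python
import math

def falloff_integral(v, w, a):
    """Integral of 1 - f(x, a) = a / (x + a) over [v, w], for 0 <= v <= w."""
    return a * math.log((w + a) / (v + a))

def note_weight(pitch, start, end, t, a, b, c):
    """Weight of one note relative to time t.

    The part of the note at or after t uses distances end..start shifted by t;
    the part before t is mirrored, using distances t - end .. t - start.
    """
    pitch_factor = b / (pitch + b)           # 1 - f(pitch, b) for pitch >= 0
    total = 0.0
    if end > t:                               # part at or after t
        total += falloff_integral(max(start, t) - t, end - t, a)
    if start < t:                             # part before t, mirrored
        total += falloff_integral(t - min(end, t), t - start, a)
    return c * total * pitch_factor

def chroma(notes, t, a, b, c):
    """Sum note weights per pitch class (0 = C ... 11 = B)."""
    vec = [0.0] * 12
    for pitch, start, end in notes:
        vec[pitch % 12] += note_weight(pitch, start, end, t, a, b, c)
    return vec

# A middle C sounding across t = 0 and a G shortly after it
print(chroma([(60, -1.0, 1.0), (67, 2.0, 3.0)], t=0.0, a=1.0, b=60.0, c=1.0))
```

Unlike the Riemann-sum approach, this costs one logarithm per note part, so it stays cheap even when evaluated at many moments.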
Now, for each moment in time, we weigh all of our notes' pitches and sum the weights per pitch class, giving a chroma vector that describes the tonal landscape at that moment. To turn this into a key, extract chroma vectors from pieces whose key you already know, transpose them to a common tonic and average them, then pick the key whose (transposed) average has the least sum of squared differences to the vector you found. You can even find values for a, b and c that maximize the accuracy using simulated annealing or similar optimization methods.
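A minimal sketch of the key-matching step, reading "least difference" as the smallest sum of squared differences and assuming chroma vectors are normalized to sum to 1 before comparison. Averaging only makes sense after transposing each labeled vector to a common tonic, so the sketch keeps one profile per mode with its tonic on C and rotates it to each of the 12 tonics when matching. The names normalize, rotate, build_profiles and best_key, and the toy vectors, are hypothetical stand-ins rather than real data; fitting a, b and c is not shown.

```python
def normalize(vec):
    """Scale a chroma vector to sum to 1 (assumption: only relative weights matter)."""
    s = sum(vec)
    return [x / s for x in vec] if s > 0 else list(vec)

def rotate(vec, k):
    """Shift pitch classes: result[i] = vec[(i + k) % 12].

    rotate(v, tonic) moves a vector in a key on `tonic` down to tonic C;
    rotate(v, -tonic) moves a C-based profile up to `tonic`.
    """
    return [vec[(i + k) % 12] for i in range(12)]

def build_profiles(labeled):
    """Average chroma vectors of known key per mode, after transposing to C.

    labeled: list of (chroma, tonic, mode) with tonic 0..11 (0 = C).
    """
    sums, counts = {}, {}
    for vec, tonic, mode in labeled:
        v = rotate(normalize(vec), tonic)
        acc = sums.setdefault(mode, [0.0] * 12)
        for i in range(12):
            acc[i] += v[i]
        counts[mode] = counts.get(mode, 0) + 1
    return {m: [x / counts[m] for x in acc] for m, acc in sums.items()}

def best_key(vec, profiles):
    """Return the (tonic, mode) whose transposed profile has the least
    sum of squared differences to the normalized chroma vector."""
    v = normalize(vec)
    best = None
    for mode, profile in profiles.items():
        for tonic in range(12):
            template = rotate(profile, -tonic)
            err = sum((x - y) ** 2 for x, y in zip(v, template))
            if best is None or err < best[0]:
                best = (err, tonic, mode)
    return best[1], best[2]

# Made-up stand-ins for chroma vectors of pieces with known keys
c_major = [6, 0, 3, 0, 4, 3, 0, 5, 0, 3, 0, 1]   # tonic C (0)
a_minor = [3, 0, 2, 0, 4, 1, 0, 1, 1, 6, 0, 3]   # tonic A (9)
profiles = build_profiles([(c_major, 0, "major"), (a_minor, 9, "minor")])

g_major = rotate(c_major, -7)                     # the C-major vector moved up to G
print(best_key(g_major, profiles))                # prints (7, 'major')
```

An optimizer such as simulated annealing would then wrap this whole pipeline, scoring each candidate (a, b, c) by how many labeled moments best_key gets right.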
What use is this for polyphonic music, where individual voices may center on different pitches (the tenor may center on the dominant, etc.)? Try this on a chorale.