Skip to content

Instantly share code, notes, and snippets.

@sonney2k
Forked from nicococo/Sketch.cpp
Created May 4, 2012 15:48
Show Gist options
  • Save sonney2k/2595648 to your computer and use it in GitHub Desktop.
Save sonney2k/2595648 to your computer and use it in GitHub Desktop.
Shogun Structured Output Toolbox Interface
class CVanillaStructuredOutputMachine : public CMachine {
// Constructor, Destructor
CStructuredOutputMachine(
CStructuredData* trainset,CLoss* loss,CModel* model)
virtual ~CStructuredOutputMachine
// heritable data
CStructuredData* trainset
CLoss* loss
CModel* model
// nice to have \dots
// simple solution: deliver zeros or rand
virtual vector presolver() {
return zeros(trainset->get_dimensionality(),1)
}
// application specific methods
virtual void init_op()
virtual CResultSet compute_argmax(int index,vector w);
// vanilla SO-SVM training
void train() {
init_op
// assume diagonal regularization matrix with just one value
lambda = C(1,1)
w = presolver
List results = empty
repeat
// amount of constraints generated so far
lens = length(results)
for (i=0;i<trainset->get_size);i++) {
res = compute_argmax(i,w)
if (res->delta+ w*res->phi_pred >
max_j(results_i(j)->delta+w*results_i(j)->phi_pred) {
results_i->add(res)
}
}
// Solve QP
until lens&=&length(results)
}
};
struct CResultSet {
vector phi_example
vector phi_pred
double delta
}
class CStructuredData {
// class containing the structured data, e.g. sequences,
// trees, \dots
int get_size()
int get_dimensionality()
virtual generic get_example(int i)
virtual generic get_label(int i)
}
class CModel {
// Containing the application specific model.
// e.g. State model for HMM-SVM.
CDeltaLoss* get_delta_func();
}
class CLoss {
bool is_smooth
bool is_convex
// loss: Re -> Re^+ ?
bool is_positive
double calc(double x)
// derivatives missing
}
class CDeltaLoss {
// Application specific loss.
double calc(generic y_sol,y_pred)
}
@nicococo
Copy link

nicococo commented May 8, 2012

Thanks for your comments :)

In the current sketch the data is already abstracted
since there is a CStructuredData class which encapsulates it.
The CVanillaStructuredOutputMachine only works on
joint feature maps which are always vectors. Hence,
the current sketch is absolutely sufficient to work on data
without knowing its underlying 'nature'.

The same holds for the delta loss. The corresponding
function is indeed also generic in the current version,
see CDeltaLoss (function calc(y1,y2)). E.g. it could be
a Hamming-loss or 0-1-loss or whatever.

What is missing is:
a) it is necessary for some solvers (bundle methods, subgradient, BFGS, ..)
to get have a first (and possibly second) derivative (or some subgradient)
b) see first post: literally, how to get the data in? NOT, how to work with the
data once it is loaded into CStructuredData.
c) The current data representation should work fine for primal approaches,
but what about duals (e.g. kernels) or should we skip the duals in the first
version of the toolbox?

ps: The joint feature map (always a double vector) in the sketch
above is 'phi', it should be 'psi', sorry :)

@vigsterkr
Copy link

As for now i have only one request: since for latent svm i need to be able to calculate PSI for (x,y,h) (h is the latent variable for each iteration i would need a way to be able to define PSI by the user and call that with (x,y) or (x,y,h) arguments.

@nicococo
Copy link

nicococo commented May 9, 2012 via email

@iglesias
Copy link

--- Log opened Fri May 11 16:29:25 2012
Lastlog:
15:02 ||| Irssi: Starting query in freenode with nicococo
15:02 nicococo
Buenos
15:03 n4nd0
hallo :)
15:03 nicococo
how are you? already finished coding :)
15:03 nicococo
can i use it in my experiments now ;)
15:03 n4nd0
I am fine
15:04 n4nd0
wow not, not yet! I manage now to have a more or less trustful interface
15:05 nicococo
cool, i changed some parts a few hours ago in the gist
15:05 nicococo
and also added some more comments
15:06 n4nd0
ok
15:06 n4nd0
let's see
15:07 n4nd0
ok, so one thing I wanted to discuss with you about the delta loss
15:08 nicococo
okay
15:08 n4nd0
http://dl.dropbox.com/u/11020840/shogun/Test.pdf
15:09 n4nd0
what you can see in the diagram there is more or less what I have implemented this far
15:10 n4nd0
so I have made this class StructuredLoss
15:10 n4nd0
https://github.com/iglesias/shogun/blob/so/src/shogun/so/StructuredLoss.h
15:10 nicococo
wait.. lets start at the leafs with CStructuredLoss
15:10 n4nd0
ok
15:11 nicococo
do you had a look at https://gist.github.com/2634487 ?? or the old version?
15:11 n4nd0
I started to check the new one when we started the conversation
15:11 nicococo
i am asking because, there is no structured loss class anymore
15:12 n4nd0
I think it is the class CLoss
15:12 n4nd0
I gave it that name since I think that CLoss but might a bit too general
15:12 nicococo
the structured loss (=the delta loss) is now in StructuredApplication
15:13 n4nd0
ok, there's a misunderstanding with the naming here
15:13 n4nd0
probably my fault
15:13 nicococo
the CLoss is a very simple class that is for hinge-loss and linear loss aso
15:13 n4nd0
what you call the CLoss in gist is what I have called CStructuredLoss in the code
15:14 nicococo
okay but then you don't need any reference to labels and data
15:14 n4nd0
how to compute the loss without them then?
15:14 nicococo
e.g. for hingeloss:
15:15 nicococo
CHingeLoss.calc(double x) = max(0,x)
15:15 n4nd0
ahm
15:16 nicococo
its nothing more. it is the \ell in the pdf i sent around. the argument is always a scalar
15:16 n4nd0
oh yes
15:17 n4nd0
I see that the function l takes just as argument the maximum of an expression
15:17 n4nd0
of a scalar product
15:17 nicococo
for instance if you have a linear model (SVM) with w'x (x is a datapoint)
15:18 nicococo
then CHingeLoss.calc(w'x) = max(0, w'x) (is it clear??)
15:18 n4nd0
yes, I think so
15:19 nicococo
it is not really important anyway ;)
15:19 n4nd0
but it confuses me a bit
15:19 n4nd0
since in the references they use this function as
15:19 n4nd0
l(x, y, w)
15:19 nicococo
yes. thats not your fault
15:20 nicococo
yes indeed, then CLoss has to know the everything about the problem.. this would be a mess
15:21 n4nd0
then the point is that it is not necessary to have neither features nor labels in the loss, isn't it?
15:21 nicococo
(upss.. i think i forget the y in the example)
15:21 nicococo
yes!
15:21 n4nd0
?
15:21 n4nd0
:D
15:21 nicococo
e.g. l(x,y,w) for svm =
15:22 nicococo
= max(0, 1 - y w'x) right?
15:22 n4nd0
ok
15:22 n4nd0
but wait
15:22 n4nd0
how do you define y w'x?
15:22 nicococo
so for our loss function we set z = 1- y w'x and l(z) = max(0,z)
15:23 n4nd0
I see the part w'x
15:23 nicococo
it is defined in the strcutedoutputmachine .. because this is a single implementation a specific model
15:23 n4nd0
but I mean that if y is something with structure, e.g. sequence
15:24 n4nd0
how can you define y w'x?
15:24 n4nd0
I understand that w'x is a vector product
15:24 n4nd0
isn't it?
15:24 nicococo
sorry.. the example was for a plain binary svm.. i thought you mioght be more familiar with
15:26 nicococo
for so-svm with margin rescaling it is: z = max_y(w'psi(x_i,y) + delta(y_i,y)) - w'psi(x_i,y_i)
15:26 nicococo
AND l(z) = max(0,z)
15:26 nicococo
have a look at gist line 31 & 32
15:26 n4nd0
all right, that expression is something I recognize
15:27 nicococo
in line 31 slack is a scalar in any case
15:27 n4nd0
ok
15:28 nicococo
okay, shall we move on?
15:28 n4nd0
one moment
15:29 n4nd0
any function l (a CLoss in gist) is going to be simply a function that takes an scalar?
15:29 n4nd0
an outputs another scalar?
15:29 nicococo
yes
15:30 n4nd0
aham, good
15:31 n4nd0
let's move to the delta loss now?
15:31 nicococo
ok
15:32 n4nd0
a parenthesis first, what do you prefer to use as naming CStructuredApplication or CStructuredModel?
15:32 nicococo
it is in the CStructuredApplication now
15:32 n4nd0
I like more the second
15:32 nicococo
mmhh.. yes, i know what you mean..
15:32 nicococo
well, lets call it model
15:33 n4nd0
ok
15:33 nicococo
alright
15:33 nicococo
so, one question: our data is now in CFeatures ?
15:34 n4nd0
we have to sync in this aspect yes
15:34 nicococo
thats okay
15:35 n4nd0
so what you call CStructuredData in gist is the output space, isn't it?
15:35 n4nd0
I mean the labels, that belong to Y
15:35 nicococo
it is our data, x and y
15:36 n4nd0
ok, there is something that we should think about here though
15:36 n4nd0
since in shogun this is separated
15:36 n4nd0
normally one has the data of the input space, the x, in a CFeatures
15:36 n4nd0
and the data of the output space, the y, in a CLabels
15:37 nicococo
well, okay then we do it the shogun way :)
15:37 n4nd0
but there's something I need to ask you about this
15:37 n4nd0
so in the problems that we are concerned to solve
15:38 n4nd0
yi, each instance of Y, may be something with structure, e.g. a sequence
15:38 nicococo
yes
15:38 n4nd0
but what about the xi?
15:38 nicococo
same
15:38 n4nd0
ok
15:39 nicococo
e.g. for sequences you have a single y_i that is, say 1xL vector okay
15:39 n4nd0
ok
15:39 nicococo
and the corresponding x_i is FxL matrix (features x length)
15:39 n4nd0
in that case, there would be no problem
15:40 n4nd0
but the thing is
15:40 n4nd0
let's go to an example where we have another structure, I think sequences simplify the things and may head to confussion
15:40 n4nd0
so imagine we have parse trees in the output space
15:41 n4nd0
like the NLP the use as running example in the paper
15:41 nicococo
yes
15:41 n4nd0
does SO cover as well problems where the features, each xi, may be trees as well?
15:42 n4nd0
I thought not
15:42 nicococo
you can represent trees as vectors as well.. structured input. thats no problem
15:42 nicococo
also the sizes of the x_i and y_i might differ from example to example and of course
15:43 nicococo
y_i can be longer than x_i and the other way around
15:43 nicococo
just a matter of representation
15:43 n4nd0
ok
15:43 n4nd0
I see the idea but I have to think if CFeatures is ok to hold this, I think it is
15:44 n4nd0
but I'll ask shogun experts ;)
15:44 nicococo
if it is not sufficient we can also switch to CStructuredOutput.. but i think it is
15:45 nicococo
much better to use shogun structure if possible
15:45 n4nd0
if not Soeren will kick my ass :D
15:45 nicococo
mine too ;)
15:46 nicococo
..and you are in sweden i am here in berlin with soeren..
15:46 n4nd0
haha you're in more dangerous situation in that case then
15:47 n4nd0
let's come back to the topic
15:47 nicococo
yes.. to be more secure i decide that we use CLabels and CFeatures :)
15:47 n4nd0
I think we were going to talk about the delta loss
15:47 nicococo
okay
15:49 nicococo
?
15:50 n4nd0
I placed it inside the other loss function instead of the model
15:50 n4nd0
because the loss function will need it
15:51 nicococo
they are fundamentally different
15:51 nicococo
and also not related to each other
15:52 nicococo
so it belongs more to the model not the CLoss
15:52 n4nd0
mmmm
15:52 n4nd0
aren't they related to the extent that l uses Delta to be computed?
15:52 n4nd0
aham oh I think not
15:53 n4nd0
since you told me before that the input to l is a scalar, then l doesn't need to know about how does Delta look like
15:53 nicococo
yes, CLoss does not know how the x in calc(x) is computed
15:53 n4nd0
I see your point
15:54 n4nd0
the idea is to use this delta loss from the CSOMachine
15:54 n4nd0
and use the result (along with other things you wrote above) as input for l
15:54 n4nd0
is that right?
15:55 nicococo
right
15:55 n4nd0
:)
15:55 nicococo
but the calculation of the delta is done in the CSOModel not in CSOMachine
15:56 n4nd0
yeah
15:56 nicococo
alrighty ..
15:56 n4nd0
but it will be the CSOMachine who calls this function
15:56 n4nd0
it calls it using the CSOModel of course
15:56 nicococo
yes.. i have the feeling we are getting somewhere close :)
15:57 n4nd0
nice :)
15:57 nicococo
okay.. what is missing??
15:57 nicococo
resultset?
15:57 n4nd0
yeah, I didn't do anything related to that yet
15:57 n4nd0
I had a doubt about it, let me read its part in the last gist
15:59 n4nd0
ok
15:59 n4nd0
so the idea is to have a ResultSet instance for every call to argmax
15:59 n4nd0
is that right?
15:59 nicococo
yes
16:00 n4nd0
ok
16:00 nicococo
containing the psi's the delta (and some more) that is necessary for the loss calculation
16:01 n4nd0
ok
16:01 nicococo
and.. for one of the teams we have to take care about the latent variable h so, instead of delta(y1,y2) we need a delta(y1,y2,h)
16:02 n4nd0
ok, here we get to a point I want to discuss with you
16:02 n4nd0
I talked yesterday with wiking, the latent svm guy
16:03 n4nd0
he encouraged to avoid in the design the need of extending CStructuredModel each time a new SO application is to be done
16:03 n4nd0
and suggested to use pointer functions to do that
16:03 n4nd0
https://github.com/iglesias/shogun/blob/so/src/shogun/so/StructuredModel.h
16:03 n4nd0
something like that ^
16:04 nicococo
like that you can pluggin another delta but using the same argmax?
16:04 n4nd0
mmm not exactly
16:04 n4nd0
like you make an instance of CStructuredModel passing the functions to use
16:04 n4nd0
we implement three functions, argmax, loss and psi
16:05 n4nd0
for the application we do
16:05 nicococo
i think it is what i mean: you pass to CSModel the argmax,delta loss ..
16:05 n4nd0
new Model(argmax, loss, psi)
16:05 n4nd0
yes
16:05 n4nd0
sorry, I didn't understand :S
16:05 nicococo
yes yes.. very good idea
16:06 n4nd0
I think that in that case there will be no need of extending CStructuredModel, right?
16:06 nicococo
mmhh.. one has to see that each applicaation has its very specific needs
16:06 nicococo
which means, that even if two applications use hmm they can be use the viterbi in totally different ways
16:07 nicococo
mmhh.. but anyway.. it seems to be a nice idea
16:07 n4nd0
like two viterbi you mean?
16:08 nicococo
there are several possibilities to use viterbi, but they still do the same: predicting the most likely seq
16:08 nicococo
there will be still many CSModels
16:08 n4nd0
mmm
16:09 n4nd0
right now I don't see why, providing that this instantiation giving the functions is used
16:09 nicococo
but it is a good idea .. we should do that for at least some functions
16:10 nicococo
cool ;)
16:10 n4nd0
ok
16:10 n4nd0
so one last thing I would like to ask you
16:10 n4nd0
the idea I have in mind right now is that the class that appears in the diagram
16:10 nicococo
and you are right for standart models it is a big gain if you can plug in the functions :)
16:10 n4nd0
CSOMachine
16:11 n4nd0
would be the ancestor for any so classifier
16:11 n4nd0
e.g. so-svm, crfs or any other
16:11 n4nd0
what do you think?
16:12 nicococo
yes.. but should we differ between CLinearSOMachine and CKernelSOMachine ??
16:12 n4nd0
yes
16:12 n4nd0
but, did you read what Soeren said at IRC yesterday?
16:12 nicococo
no?
16:13 n4nd0
wait, I will show you
16:13 n4nd0
http://www.shogun-toolbox.org/irclogs/%23shogun.2012-05-10.log.html
16:13 n4nd0
look for slow
16:13 n4nd0
is the first hit
16:13 n4nd0
10:33
16:14 n4nd0
at 10:18 he said to you
16:14 n4nd0
nicococo: btw I don't think we should even consider SO with kernels
16:14 nicococo
i see, correction: it is assumed to be very very slow ;)
16:14 n4nd0
haha
16:15 nicococo
there are good reasons against it and we should not take much effort in that direction but
16:15 n4nd0
in any case, I agree with you, let's take into account that we may want to do it in any case
16:15 nicococo
we should at least think about it..
16:15 n4nd0
yes
16:15 n4nd0
I have to think how to fit it into shogun since right we have
16:15 n4nd0
CMachine ---- CLinearMachine
16:15 n4nd0
but also
16:15 n4nd0
CMachine ---- CKernelMachine
16:16 n4nd0
so to do
16:16 n4nd0
CMachine ---- CSOMachine
16:16 n4nd0
and later the division again in linear and kernel feels kind of stupid
16:16 nicococo
hehe.. okay
16:16 n4nd0
what do you think?
16:17 nicococo
i think an empty body CKernelSOMachine can appear just to show how to do that extension
16:17 n4nd0
ok
16:18 n4nd0
but I meant about how to put it into shogun
16:18 nicococo
5min work not more ;)
16:18 n4nd0
what do you think it makes more sense
16:18 nicococo
not sure, what do you think
16:18 n4nd0
if to have one CKernelSOMachine from CKernelMachine
16:18 n4nd0
and one CLinearSOMachine from CLinearMachine
16:19 n4nd0
ooor
16:19 n4nd0
one CSOMachine from CMachine
16:19 n4nd0
and later the division in linear in kernel
16:19 n4nd0
I like more the second to tell the truth
16:19 nicococo
then we do it your way
16:19 n4nd0
I will ask Soeren about this in any case
16:21 nicococo
okay, are there more questions or should we talk monday again??
16:21 n4nd0
I think I am ok right now
16:21 n4nd0
let's see the progress done until Monday
16:21 nicococo
well.. then thank you for your work and patience ;)
16:22 n4nd0
thanks to you!
16:22 nicococo
have a nice weekend and see you on monday
16:22 n4nd0
enjoy your weekend too, talk you on monday!
16:22 nicococo
cya
16:22 n4nd0
do you mind posting the conversation in gist again?
16:23 nicococo
aehm sorry this time i only have the last 5 lines (i dont know why)
16:24 n4nd0
ok, I will do that then, I have some problem with the scrolling in my IRC client but I'll try :P
16:24 nicococo
good luck :))
16:25 n4nd0
thanks, bye
End of Lastlog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment