Skip to content

Instantly share code, notes, and snippets.

@sonney2k
Forked from nicococo/Sketch.cpp
Created May 4, 2012 15:48
Show Gist options
  • Save sonney2k/2595648 to your computer and use it in GitHub Desktop.
Save sonney2k/2595648 to your computer and use it in GitHub Desktop.
Shogun Structured Output Toolbox Interface
class CVanillaStructuredOutputMachine : public CMachine {
// Constructor, Destructor
CStructuredOutputMachine(
CStructuredData* trainset,CLoss* loss,CModel* model)
virtual ~CStructuredOutputMachine
// heritable data
CStructuredData* trainset
CLoss* loss
CModel* model
// nice to have \dots
// simple solution: deliver zeros or rand
virtual vector presolver() {
return zeros(trainset->get_dimensionality(),1)
}
// application specific methods
virtual void init_op()
virtual CResultSet compute_argmax(int index,vector w);
// vanilla SO-SVM training
void train() {
init_op
// assume diagonal regularization matrix with just one value
lambda = C(1,1)
w = presolver
List results = empty
repeat
// amount of constraints generated so far
lens = length(results)
for (i=0;i<trainset->get_size);i++) {
res = compute_argmax(i,w)
if (res->delta+ w*res->phi_pred >
max_j(results_i(j)->delta+w*results_i(j)->phi_pred) {
results_i->add(res)
}
}
// Solve QP
until lens&=&length(results)
}
};
struct CResultSet {
vector phi_example
vector phi_pred
double delta
}
class CStructuredData {
// class containing the structured data, e.g. sequences,
// trees, \dots
int get_size()
int get_dimensionality()
virtual generic get_example(int i)
virtual generic get_label(int i)
}
class CModel {
// Containing the application specific model.
// e.g. State model for HMM-SVM.
CDeltaLoss* get_delta_func();
}
class CLoss {
bool is_smooth
bool is_convex
// loss: Re -> Re^+ ?
bool is_positive
double calc(double x)
// derivatives missing
}
class CDeltaLoss {
// Application specific loss.
double calc(generic y_sol,y_pred)
}
@iglesias
Copy link

--- Log opened Fri May 11 16:29:25 2012
Lastlog:
15:02 ||| Irssi: Starting query in freenode with nicococo
15:02 nicococo
Buenos
15:03 n4nd0
hallo :)
15:03 nicococo
how are you? already finished coding :)
15:03 nicococo
can i use it in my experiments now ;)
15:03 n4nd0
I am fine
15:04 n4nd0
wow not, not yet! I manage now to have a more or less trustful interface
15:05 nicococo
cool, i changed some parts a few hours ago in the gist
15:05 nicococo
and also added some more comments
15:06 n4nd0
ok
15:06 n4nd0
let's see
15:07 n4nd0
ok, so one thing I wanted to discuss with you about the delta loss
15:08 nicococo
okay
15:08 n4nd0
http://dl.dropbox.com/u/11020840/shogun/Test.pdf
15:09 n4nd0
what you can see in the diagram there is more or less what I have implemented this far
15:10 n4nd0
so I have made this class StructuredLoss
15:10 n4nd0
https://github.com/iglesias/shogun/blob/so/src/shogun/so/StructuredLoss.h
15:10 nicococo
wait.. lets start at the leafs with CStructuredLoss
15:10 n4nd0
ok
15:11 nicococo
do you had a look at https://gist.github.com/2634487 ?? or the old version?
15:11 n4nd0
I started to check the new one when we started the conversation
15:11 nicococo
i am asking because, there is no structured loss class anymore
15:12 n4nd0
I think it is the class CLoss
15:12 n4nd0
I gave it that name since I think that CLoss but might a bit too general
15:12 nicococo
the structured loss (=the delta loss) is now in StructuredApplication
15:13 n4nd0
ok, there's a misunderstanding with the naming here
15:13 n4nd0
probably my fault
15:13 nicococo
the CLoss is a very simple class that is for hinge-loss and linear loss aso
15:13 n4nd0
what you call the CLoss in gist is what I have called CStructuredLoss in the code
15:14 nicococo
okay but then you don't need any reference to labels and data
15:14 n4nd0
how to compute the loss without them then?
15:14 nicococo
e.g. for hingeloss:
15:15 nicococo
CHingeLoss.calc(double x) = max(0,x)
15:15 n4nd0
ahm
15:16 nicococo
its nothing more. it is the \ell in the pdf i sent around. the argument is always a scalar
15:16 n4nd0
oh yes
15:17 n4nd0
I see that the function l takes just as argument the maximum of an expression
15:17 n4nd0
of a scalar product
15:17 nicococo
for instance if you have a linear model (SVM) with w'x (x is a datapoint)
15:18 nicococo
then CHingeLoss.calc(w'x) = max(0, w'x) (is it clear??)
15:18 n4nd0
yes, I think so
15:19 nicococo
it is not really important anyway ;)
15:19 n4nd0
but it confuses me a bit
15:19 n4nd0
since in the references they use this function as
15:19 n4nd0
l(x, y, w)
15:19 nicococo
yes. thats not your fault
15:20 nicococo
yes indeed, then CLoss has to know the everything about the problem.. this would be a mess
15:21 n4nd0
then the point is that it is not necessary to have neither features nor labels in the loss, isn't it?
15:21 nicococo
(upss.. i think i forget the y in the example)
15:21 nicococo
yes!
15:21 n4nd0
?
15:21 n4nd0
:D
15:21 nicococo
e.g. l(x,y,w) for svm =
15:22 nicococo
= max(0, 1 - y w'x) right?
15:22 n4nd0
ok
15:22 n4nd0
but wait
15:22 n4nd0
how do you define y w'x?
15:22 nicococo
so for our loss function we set z = 1- y w'x and l(z) = max(0,z)
15:23 n4nd0
I see the part w'x
15:23 nicococo
it is defined in the strcutedoutputmachine .. because this is a single implementation a specific model
15:23 n4nd0
but I mean that if y is something with structure, e.g. sequence
15:24 n4nd0
how can you define y w'x?
15:24 n4nd0
I understand that w'x is a vector product
15:24 n4nd0
isn't it?
15:24 nicococo
sorry.. the example was for a plain binary svm.. i thought you mioght be more familiar with
15:26 nicococo
for so-svm with margin rescaling it is: z = max_y(w'psi(x_i,y) + delta(y_i,y)) - w'psi(x_i,y_i)
15:26 nicococo
AND l(z) = max(0,z)
15:26 nicococo
have a look at gist line 31 & 32
15:26 n4nd0
all right, that expression is something I recognize
15:27 nicococo
in line 31 slack is a scalar in any case
15:27 n4nd0
ok
15:28 nicococo
okay, shall we move on?
15:28 n4nd0
one moment
15:29 n4nd0
any function l (a CLoss in gist) is going to be simply a function that takes an scalar?
15:29 n4nd0
an outputs another scalar?
15:29 nicococo
yes
15:30 n4nd0
aham, good
15:31 n4nd0
let's move to the delta loss now?
15:31 nicococo
ok
15:32 n4nd0
a parenthesis first, what do you prefer to use as naming CStructuredApplication or CStructuredModel?
15:32 nicococo
it is in the CStructuredApplication now
15:32 n4nd0
I like more the second
15:32 nicococo
mmhh.. yes, i know what you mean..
15:32 nicococo
well, lets call it model
15:33 n4nd0
ok
15:33 nicococo
alright
15:33 nicococo
so, one question: our data is now in CFeatures ?
15:34 n4nd0
we have to sync in this aspect yes
15:34 nicococo
thats okay
15:35 n4nd0
so what you call CStructuredData in gist is the output space, isn't it?
15:35 n4nd0
I mean the labels, that belong to Y
15:35 nicococo
it is our data, x and y
15:36 n4nd0
ok, there is something that we should think about here though
15:36 n4nd0
since in shogun this is separated
15:36 n4nd0
normally one has the data of the input space, the x, in a CFeatures
15:36 n4nd0
and the data of the output space, the y, in a CLabels
15:37 nicococo
well, okay then we do it the shogun way :)
15:37 n4nd0
but there's something I need to ask you about this
15:37 n4nd0
so in the problems that we are concerned to solve
15:38 n4nd0
yi, each instance of Y, may be something with structure, e.g. a sequence
15:38 nicococo
yes
15:38 n4nd0
but what about the xi?
15:38 nicococo
same
15:38 n4nd0
ok
15:39 nicococo
e.g. for sequences you have a single y_i that is, say 1xL vector okay
15:39 n4nd0
ok
15:39 nicococo
and the corresponding x_i is FxL matrix (features x length)
15:39 n4nd0
in that case, there would be no problem
15:40 n4nd0
but the thing is
15:40 n4nd0
let's go to an example where we have another structure, I think sequences simplify the things and may head to confussion
15:40 n4nd0
so imagine we have parse trees in the output space
15:41 n4nd0
like the NLP the use as running example in the paper
15:41 nicococo
yes
15:41 n4nd0
does SO cover as well problems where the features, each xi, may be trees as well?
15:42 n4nd0
I thought not
15:42 nicococo
you can represent trees as vectors as well.. structured input. thats no problem
15:42 nicococo
also the sizes of the x_i and y_i might differ from example to example and of course
15:43 nicococo
y_i can be longer than x_i and the other way around
15:43 nicococo
just a matter of representation
15:43 n4nd0
ok
15:43 n4nd0
I see the idea but I have to think if CFeatures is ok to hold this, I think it is
15:44 n4nd0
but I'll ask shogun experts ;)
15:44 nicococo
if it is not sufficient we can also switch to CStructuredOutput.. but i think it is
15:45 nicococo
much better to use shogun structure if possible
15:45 n4nd0
if not Soeren will kick my ass :D
15:45 nicococo
mine too ;)
15:46 nicococo
..and you are in sweden i am here in berlin with soeren..
15:46 n4nd0
haha you're in more dangerous situation in that case then
15:47 n4nd0
let's come back to the topic
15:47 nicococo
yes.. to be more secure i decide that we use CLabels and CFeatures :)
15:47 n4nd0
I think we were going to talk about the delta loss
15:47 nicococo
okay
15:49 nicococo
?
15:50 n4nd0
I placed it inside the other loss function instead of the model
15:50 n4nd0
because the loss function will need it
15:51 nicococo
they are fundamentally different
15:51 nicococo
and also not related to each other
15:52 nicococo
so it belongs more to the model not the CLoss
15:52 n4nd0
mmmm
15:52 n4nd0
aren't they related to the extent that l uses Delta to be computed?
15:52 n4nd0
aham oh I think not
15:53 n4nd0
since you told me before that the input to l is a scalar, then l doesn't need to know about how does Delta look like
15:53 nicococo
yes, CLoss does not know how the x in calc(x) is computed
15:53 n4nd0
I see your point
15:54 n4nd0
the idea is to use this delta loss from the CSOMachine
15:54 n4nd0
and use the result (along with other things you wrote above) as input for l
15:54 n4nd0
is that right?
15:55 nicococo
right
15:55 n4nd0
:)
15:55 nicococo
but the calculation of the delta is done in the CSOModel not in CSOMachine
15:56 n4nd0
yeah
15:56 nicococo
alrighty ..
15:56 n4nd0
but it will be the CSOMachine who calls this function
15:56 n4nd0
it calls it using the CSOModel of course
15:56 nicococo
yes.. i have the feeling we are getting somewhere close :)
15:57 n4nd0
nice :)
15:57 nicococo
okay.. what is missing??
15:57 nicococo
resultset?
15:57 n4nd0
yeah, I didn't do anything related to that yet
15:57 n4nd0
I had a doubt about it, let me read its part in the last gist
15:59 n4nd0
ok
15:59 n4nd0
so the idea is to have a ResultSet instance for every call to argmax
15:59 n4nd0
is that right?
15:59 nicococo
yes
16:00 n4nd0
ok
16:00 nicococo
containing the psi's the delta (and some more) that is necessary for the loss calculation
16:01 n4nd0
ok
16:01 nicococo
and.. for one of the teams we have to take care about the latent variable h so, instead of delta(y1,y2) we need a delta(y1,y2,h)
16:02 n4nd0
ok, here we get to a point I want to discuss with you
16:02 n4nd0
I talked yesterday with wiking, the latent svm guy
16:03 n4nd0
he encouraged to avoid in the design the need of extending CStructuredModel each time a new SO application is to be done
16:03 n4nd0
and suggested to use pointer functions to do that
16:03 n4nd0
https://github.com/iglesias/shogun/blob/so/src/shogun/so/StructuredModel.h
16:03 n4nd0
something like that ^
16:04 nicococo
like that you can pluggin another delta but using the same argmax?
16:04 n4nd0
mmm not exactly
16:04 n4nd0
like you make an instance of CStructuredModel passing the functions to use
16:04 n4nd0
we implement three functions, argmax, loss and psi
16:05 n4nd0
for the application we do
16:05 nicococo
i think it is what i mean: you pass to CSModel the argmax,delta loss ..
16:05 n4nd0
new Model(argmax, loss, psi)
16:05 n4nd0
yes
16:05 n4nd0
sorry, I didn't understand :S
16:05 nicococo
yes yes.. very good idea
16:06 n4nd0
I think that in that case there will be no need of extending CStructuredModel, right?
16:06 nicococo
mmhh.. one has to see that each applicaation has its very specific needs
16:06 nicococo
which means, that even if two applications use hmm they can be use the viterbi in totally different ways
16:07 nicococo
mmhh.. but anyway.. it seems to be a nice idea
16:07 n4nd0
like two viterbi you mean?
16:08 nicococo
there are several possibilities to use viterbi, but they still do the same: predicting the most likely seq
16:08 nicococo
there will be still many CSModels
16:08 n4nd0
mmm
16:09 n4nd0
right now I don't see why, providing that this instantiation giving the functions is used
16:09 nicococo
but it is a good idea .. we should do that for at least some functions
16:10 nicococo
cool ;)
16:10 n4nd0
ok
16:10 n4nd0
so one last thing I would like to ask you
16:10 n4nd0
the idea I have in mind right now is that the class that appears in the diagram
16:10 nicococo
and you are right for standart models it is a big gain if you can plug in the functions :)
16:10 n4nd0
CSOMachine
16:11 n4nd0
would be the ancestor for any so classifier
16:11 n4nd0
e.g. so-svm, crfs or any other
16:11 n4nd0
what do you think?
16:12 nicococo
yes.. but should we differ between CLinearSOMachine and CKernelSOMachine ??
16:12 n4nd0
yes
16:12 n4nd0
but, did you read what Soeren said at IRC yesterday?
16:12 nicococo
no?
16:13 n4nd0
wait, I will show you
16:13 n4nd0
http://www.shogun-toolbox.org/irclogs/%23shogun.2012-05-10.log.html
16:13 n4nd0
look for slow
16:13 n4nd0
is the first hit
16:13 n4nd0
10:33
16:14 n4nd0
at 10:18 he said to you
16:14 n4nd0
nicococo: btw I don't think we should even consider SO with kernels
16:14 nicococo
i see, correction: it is assumed to be very very slow ;)
16:14 n4nd0
haha
16:15 nicococo
there are good reasons against it and we should not take much effort in that direction but
16:15 n4nd0
in any case, I agree with you, let's take into account that we may want to do it in any case
16:15 nicococo
we should at least think about it..
16:15 n4nd0
yes
16:15 n4nd0
I have to think how to fit it into shogun since right we have
16:15 n4nd0
CMachine ---- CLinearMachine
16:15 n4nd0
but also
16:15 n4nd0
CMachine ---- CKernelMachine
16:16 n4nd0
so to do
16:16 n4nd0
CMachine ---- CSOMachine
16:16 n4nd0
and later the division again in linear and kernel feels kind of stupid
16:16 nicococo
hehe.. okay
16:16 n4nd0
what do you think?
16:17 nicococo
i think an empty body CKernelSOMachine can appear just to show how to do that extension
16:17 n4nd0
ok
16:18 n4nd0
but I meant about how to put it into shogun
16:18 nicococo
5min work not more ;)
16:18 n4nd0
what do you think it makes more sense
16:18 nicococo
not sure, what do you think
16:18 n4nd0
if to have one CKernelSOMachine from CKernelMachine
16:18 n4nd0
and one CLinearSOMachine from CLinearMachine
16:19 n4nd0
ooor
16:19 n4nd0
one CSOMachine from CMachine
16:19 n4nd0
and later the division in linear in kernel
16:19 n4nd0
I like more the second to tell the truth
16:19 nicococo
then we do it your way
16:19 n4nd0
I will ask Soeren about this in any case
16:21 nicococo
okay, are there more questions or should we talk monday again??
16:21 n4nd0
I think I am ok right now
16:21 n4nd0
let's see the progress done until Monday
16:21 nicococo
well.. then thank you for your work and patience ;)
16:22 n4nd0
thanks to you!
16:22 nicococo
have a nice weekend and see you on monday
16:22 n4nd0
enjoy your weekend too, talk you on monday!
16:22 nicococo
cya
16:22 n4nd0
do you mind posting the conversation in gist again?
16:23 nicococo
aehm sorry this time i only have the last 5 lines (i dont know why)
16:24 n4nd0
ok, I will do that then, I have some problem with the scrolling in my IRC client but I'll try :P
16:24 nicococo
good luck :))
16:25 n4nd0
thanks, bye
End of Lastlog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment