This gist holds the Caffe-style model spec for the CVPR'15 paper
"Recognize Complex Events from Static Images by Fusing Deep Channels".
The model has two channels: one for appearance analysis and one for analyzing detection bounding boxes.
The appearance analysis channel has a structure similar to AlexNet and is therefore initialized from a model pretrained on ImageNet.
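
Caffe initializes a network from a pretrained model by copying parameters into layers whose names match; layers with names absent from the pretrained model keep their own initialization. The sketch below mimics that rule in plain Python so it runs without Caffe. The function `init_from_pretrained`, the layer names, and the tiny shapes are all hypothetical illustrations, not taken from this gist's prototxt.

```python
# Illustrative sketch only: a plain-Python stand-in for name-matched weight copying.
# Layer names and shapes are hypothetical, not read from the gist's model spec.

def init_from_pretrained(target, pretrained):
    """Copy parameters into `target` for every layer name also found in `pretrained`.

    Both arguments map layer name -> (shape tuple, flat list of values).
    Returns the sorted names of the layers that were initialized.
    """
    copied = []
    for name, (shape, _) in list(target.items()):
        if name not in pretrained:
            continue  # layer keeps its own initialization
        src_shape, src_values = pretrained[name]
        if src_shape != shape:
            # A same-named layer with a different shape cannot reuse the weights.
            raise ValueError(f"shape mismatch for layer {name!r}: {src_shape} vs {shape}")
        target[name] = (shape, list(src_values))
        copied.append(name)
    return sorted(copied)


# Hypothetical ImageNet-pretrained parameters (tiny shapes for readability).
pretrained = {
    "conv1": ((2, 2), [0.1, 0.2, 0.3, 0.4]),
    "fc8": ((3,), [1.0, 2.0, 3.0]),
}
# Hypothetical appearance channel: shares "conv1", has a new output layer.
appearance = {
    "conv1": ((2, 2), [0.0, 0.0, 0.0, 0.0]),
    "fc8_events": ((2,), [0.0, 0.0]),
}

copied = init_from_pretrained(appearance, pretrained)
print(copied)  # ['conv1']
```

Giving the final classification layer a new name (here `fc8_events`) is the usual way to keep the pretrained ImageNet classifier from being copied into a layer with a different number of outputs.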