# To use this, first install Python and exllamav2 (https://github.com/turboderp/exllamav2).
# Load a model, rearrange the layers as you like, set generation parameters, and run it.
# Duplicate layers share tensors, but they still need extra memory for the cache.
# Thanks to @dnhkng for showing that the cache needs to be re-created after rearranging.
# Licensed under the WTFPL (http://www.wtfpl.net/about/) - Silphendio
#
# Additional updates to use LoRA with duplicate layers:
# update model.modules_dict to include the new layers.
# The LoRA must be created against the static frankenmerge model first;
# it can then be applied on top of the dynamically rearranged layers.
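# The rearrangement described above can be sketched as plain list surgery. This is an
# illustrative, model-free sketch, not exllamav2's actual API: it assumes (hypothetically)
# a module layout like [embedding] + one (attention, MLP) pair per decoder layer +
# [final norm, output head], with strings standing in for the real module objects.
# Duplicated entries reference the same objects, which is why weights are shared
# while the KV cache still has to be rebuilt to match the new layer count.

```python
def rearrange_layers(modules, arrangement):
    """Build a new module list that repeats/reorders decoder layers.

    `modules` is assumed to be laid out as:
        [embed, attn_0, mlp_0, attn_1, mlp_1, ..., norm, head]
    `arrangement` lists original layer indices in the desired order;
    repeating an index duplicates that layer (sharing its tensors).
    """
    new_modules = modules[:1]                           # keep the embedding
    for idx in arrangement:
        new_modules += modules[1 + idx * 2 : 3 + idx * 2]  # attn + mlp of layer idx
    new_modules += modules[-2:]                         # final norm + output head
    return new_modules


# Toy demonstration: duplicate layer 1 of a 3-layer "model".
mods = ["embed", "attn0", "mlp0", "attn1", "mlp1", "attn2", "mlp2", "norm", "head"]
out = rearrange_layers(mods, [0, 1, 1, 2])
print(out)
```

After a rearrangement like this, the cache must be re-created (e.g. a fresh `ExLlamaV2Cache(model)` in exllamav2), since its shape depends on the new number of layers.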