AlvaroJoseLopes/gsoc2023.md

## gsoc2023.md

      
    Raw
  

              gsoc2023.md
            
          
    Knowledge Graph aware Recommendation System with DBpedia

This project took place over the summer of 2023 as part of the Google Summer of Code, working closely with DBpedia. For some background, DBpedia is a community-driven project that extracts structured information from Wikipedia and makes it available as an Open Knowledge Graph (OKG), for use in various applications and research.
The main goal of this project was to explore DBpedia’s potential to enrich the data
provided to Recommender Systems (RS) on different standard datasets. The first step of the project was to implement a data integration pipeline to enrich the MovieLens and LastFM datasets.
Also, a framework was implemented for running reproducible experiments
with only the model implementation and a simple .yaml configuration file. Through the
framework, this project allows practitioners to easily evaluate and compare their proposed
RS algorithms with existing approaches, enabling future benchmarks on enriched and non-enriched datasets.

Project results summary

My main contributions were the implementation of the Data Integration and Framework modules. I implemented the backbone for both, with some supported datasets and methods, to allow the configuration of a simple experiment pipeline.
Currently, the supported datasets for Data Integration are:


Dataset
#items matched
#items


MovieLens-100k
1462
1681


MovieLens-1M
3356
3883


LastFM-hetrec-2011
11815
17632


Each Framework submodule currently supports:

Pre-processing Methods:

Binarize ratings.
Filtering by k-core


Splitting Methods

Random by Ratio
Timestamp by Ratio
Fixed Timestamp
K-Fold


Recommender System Models:

deepwalk_based: Node embedding based model (Node2Vec) + cosine similarity.


Evaluation Metrics

MAP@k
nDCG@k


Future Work

Although I did complete all of the goals set for this project proposal, there is a lot of room for improvement. Currently, we have the backbone for running simple benchmarks, but we need to add more datasets and methods on both, Data Integration and Framework modules, for running more complete ones.
With this in mind, this project was also designed to support further contributions, whether adding new supported datasets, new baseline models, new evaluation metrics, or bug fixing.
I plan to continue contributing to this project, by adding new datasets, RS models, and metrics. I and my mentor also plan to publish a benchmarking paper using this resource as a way to share with the RS and open-source communities that this project is available for running reproducible experiments.
I also plan to contribute to different projects at DBpedia in the future.
Resources


DBpedia
Project repository
Project documentation
Project proposal

Acknowledgement

Participating in the Google Summer of Code 2023 program, and dedicating my efforts to the "Knowledge Graph aware Recommendation System with DBpedia" project, has been an enriching experience. I'm grateful for the opportunity to contribute to DBpedia and Recommender System communities.
I also would like to thank the guidance of my mentors Paulo do Carmo and Edgard Marx, and for always helping me throughout this program. I have gained a lot of knowledge while working with them and the open-source community. I'm looking forward to contributing more with you in the near future.
Thanks for your support ❤️
Dataset	#items matched	#items
MovieLens-100k	1462	1681
MovieLens-1M	3356	3883
LastFM-hetrec-2011	11815	17632