@yuvalif
Last active August 23, 2023 01:25

Gotta Catch 'Em All - GSoC 2023 Ceph Project

Below are detailed instructions regarding the Gotta Catch 'Em All - GSoC 2023 Ceph Project

Introduction

Coverity is a tool used by the Ceph project to find issues in the code. Even though Coverity is a commercial product, it performs regular scans for many open source projects, including Ceph.

The Ceph storage system has an S3-compatible Object Store interface, implemented by the RADOS Gateway (RGW) component of Ceph. The RGW is written in C++ and is the focus of this project.

To access the list of issues found by Coverity, you first need to open a Coverity account. I would recommend using your GitHub account, since you will need one anyway to contribute to Ceph. Once you have your Coverity account, you can request access from the Ceph project page. The pending Coverity issues for the RGW can be found here: https://scan5.scan.coverity.com/reports.htm#v58144/p10114

As with any static analysis tool, these are just suggestions that require further analysis and classification by a developer:

  • an issue may be real and worth fixing
  • or a minor issue that does not need to be fixed
  • some are "false positive" issues
  • and some are code smells indicating an issue that needs to be fixed, but not necessarily the issue pointed out by the tool
  • something else?

The goal of the project is to go through as many of these issues as possible, classify them, and fix them.
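
To make the classification concrete, here is a minimal, hypothetical C++ sketch (not taken from the RGW code) of one defect class Coverity commonly reports, an ignored return value of a fallible call (its CHECKED_RETURN checker), together with a fixed variant:

```cpp
#include <cassert>
#include <string>

// Hypothetical fallible call: returns 0 on success, -1 on failure,
// and only writes `port` on success.
int parse_port(const std::string& s, int& port) {
    try {
        port = std::stoi(s);
        return 0;
    } catch (...) {
        return -1;
    }
}

// Buggy pattern a scanner would flag: the return value is ignored,
// so `port` keeps its initial value when parsing fails.
int get_port_unchecked(const std::string& s) {
    int port = 0;
    parse_port(s, port);  // CHECKED_RETURN: result ignored
    return port;
}

// Fixed pattern: the return value is checked and a default applied.
int get_port_checked(const std::string& s) {
    int port = 0;
    if (parse_port(s, port) != 0) {
        port = 8000;  // fall back to a known-good default
    }
    return port;
}
```

Here the unchecked variant silently returns 0 on bad input; whether such a finding is a real bug, a minor issue, or a false positive depends entirely on how the caller uses the value — which is exactly the judgment call the classification step requires.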

More frequent scans of the latest code base are performed by the Ceph team, and the results are posted here: http://folio07.sepia.ceph.com/main/

Notes:

  • you need Sepia Lab VPN access to get to that list, which requires Ceph tracker registration. I would recommend performing these steps only after the project starts.
  • these are just tabular outputs covering the entire Ceph codebase (not just RGW), and they do not allow updates or queries

End Goal

Static analysis should be part of a process (ideally automated) that prevents bugs from sneaking into the system, even when reviewers missed them and testing did not cover them. However, given the number of issues that currently exist, both real and false positives, it would be difficult to deploy such a process. Once the issues are cleaned up, false positives are marked as such in the code, and real issues are either fixed or have trackers opened against them, it would be easy to add a process (not in scope for this GSoC project) where newly found issues are reported and must be addressed by the developer who introduced them.
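
One way marking a false positive in the code could look — assuming the scan honors Coverity's comment-based suppressions (the team may instead triage findings in the Scan web UI), and noting that the bracketed name must match the checker from the report — is an annotation comment placed directly above the flagged line:

```cpp
#include <cstddef>

// Hypothetical example: a factory that intentionally transfers
// ownership of a heap allocation to its caller. A RESOURCE_LEAK
// report on the `new` below would be a false positive, and the
// annotation tells the next scan to skip it.
int* make_counters(std::size_t n) {
    // coverity[resource_leak] -- ownership passes to the caller
    int* counters = new int[n]();  // zero-initialized
    return counters;
}
```

The caller is then responsible for `delete[]`-ing the returned buffer; the annotation documents that contract for both the tool and future readers.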

Step 0

In this step we will build Ceph and test its Object Store interface.

Linux

First, set up a Linux-based development environment; as a minimum you will need a machine with 4 CPUs, 8 GB RAM, and a 50 GB disk. Unless you already have a Linux distro you like, I would recommend choosing from:

  • Fedora (37 or rawhide) - my favorite!
  • Ubuntu (20.04 LTS)
  • OpenSuse (Leap 15.3/4 or tumbleweed)
  • WSL (Windows Subsystem for Linux)

Git

Once you have that up and running, you should clone the Ceph repo from GitHub (https://github.com/ceph/ceph). If you don’t know what GitHub and git are, this is the right time to close these gaps :-) And yes, you should have a GitHub account, so you can later share your work on the project.

Build

First, install any missing system dependencies:

./install-deps.sh

Note that the first build may take a long time, so the following cmake parameters can be used to minimize the build time. With a fresh Ceph clone, use:

./do_cmake.sh -DBOOST_J=$(nproc) -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF \
  -DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_SEASTAR=OFF -DWITH_CEPHFS=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_CCACHE=OFF

If the build directory already exists, you can regenerate the ninja files by running (from within build):

cmake -DBOOST_J=$(nproc) -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF \
  -DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_SEASTAR=OFF -DWITH_CEPHFS=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_CCACHE=OFF ..

Then invoke the build process by running ninja from within the build directory (created by do_cmake.sh). Assuming the build completed successfully, you can run the unit tests (see: https://github.com/ceph/ceph#running-unit-tests).

Test

Now you are ready to run the Ceph processes, as explained here: https://github.com/ceph/ceph#running-a-test-cluster. You will probably also want to check the developer guide (https://docs.ceph.com/docs/master/dev/developer_guide/) and learn more about how to build Ceph and run it locally (https://docs.ceph.com/docs/master/dev/quick_guide/). I would recommend using the following command for starting the cluster:

MON=1 OSD=1 MDS=0 MGR=1 RGW=1 ../src/vstart.sh -n -d

Assuming you have everything up and running, you can create a bucket in Ceph and upload an object to it. The easiest way to do that is with the s3cmd Python command line tool: https://github.com/s3tools/s3cmd Note that the tool is mainly geared towards AWS S3, so make sure to specify the location of the RGW as the endpoint, together with the RGW credentials (as printed to the screen after running vstart.sh).

For example:

$ s3cmd --no-ssl --host=localhost:8000 --host-bucket="localhost:8000/%(bucket)" \
--access_key=0555b35654ad1656d804 \
--secret_key=h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q== \
mb s3://mybucket

This creates a bucket called mybucket in Ceph. And:

$ s3cmd --no-ssl --host=localhost:8000 --host-bucket="localhost:8000/%(bucket)" \
--access_key=0555b35654ad1656d804 \
--secret_key=h7GhxuBLTrlhVUyxSPUKUV8r/2EI4ngqJxD7iBdBYLhwluN30JaT3Q== \
put myimage.jpg s3://mybucket

This uploads myimage.jpg into that bucket.

Step 1

In this step we will analyze an issue from the list above (with "high" or "medium" impact) and see which category it falls into.

Step 2

Pick one of the issues that were already classified as bugs and try to fix it. Note that these issues have a "tracker", e.g. https://tracker.ceph.com/issues/57516

Note that no registration is needed to read tracker issues. However, to update an issue you must register with the Ceph tracker.

@yuvalif
Author

yuvalif commented Feb 27, 2023

@yuvalif After making code changes that solve the Coverity issue and before making the pull request should I run the tests locally found in the following link to ensure everything works as expected and nothing's broken?

Link: https://docs.ceph.com/en/quincy/dev/developer_guide/running-tests-locally/

Primarily S3 tests command:

../qa/workunits/rgw/run-s3tests.sh

when running S3 tests you need to run that against a cluster with an RGW.
best option is to start that cluster using vstart (as you already did).

you can also run the unit tests:

ninja check

Also is there any way to test that I solved the Coverity issue locally before making a pull request?

this is more problematic. after code is merged i will check the internal results in sepia

@mohamedawnallah


Got It

@vedanshbhartia

vedanshbhartia commented Mar 1, 2023

Hi yuvalif,
I built ceph using the cmake flags mentioned above in the gist. To follow the guide for running a test cluster, I ran ninja vstart, but it seems that ceph-mds was not built. Might any of those flags cause it to not be built?

(venv) [root@ceph-test build]# ls /root/ceph/build/bin/ | grep mds
(venv) [root@ceph-test build]#

I also saw the following logs when running the vstart.sh script, are these logs expected?

2023-03-01T08:24:14.303+0000 7f0c0b241580 -1 bdev(0x55ed2616e800 /root/ceph/build/dev/osd2/block) unable to get device name for /root/ceph/build/dev/osd2/block: (22) Invalid argument
2023-03-01T08:24:14.304+0000 7f0c0b241580 -1 bdev(0x55ed2616ed00 /root/ceph/build/dev/osd2/block.wal) unable to get device name for /root/ceph/build/dev/osd2/block.wal: (22) Invalid argument
2023-03-01T08:24:15.072+0000 7f0c0b241580 -1 bdev(0x55ed2616e800 /root/ceph/build/dev/osd2/block.db) unable to get device name for /root/ceph/build/dev/osd2/block.db: (22) Invalid argument

@yuvalif
Author

yuvalif commented Mar 1, 2023

Hi yuvalif, I built ceph using the cmake flags mentioned above in the gist. To follow the guide for running a test cluster, I ran ninja vstart, but it seems that ceph-mds was not built. Might any of those flags cause it to not be built?

(venv) [root@ceph-test build]# ls /root/ceph/build/bin/ | grep mds
(venv) [root@ceph-test build]#

you are right, it is not built. however the MDS is not needed for object storage (it is used by CephFS - file storage). so, you should use the following for vstart:

MON=1 OSD=1 MDS=0 MGR=1 RGW=1 ../src/vstart.sh -n -d

(will update in the doc)

I also saw the following logs when running the vstart.sh script, are these logs expected?

2023-03-01T08:24:14.303+0000 7f0c0b241580 -1 bdev(0x55ed2616e800 /root/ceph/build/dev/osd2/block) unable to get device name for /root/ceph/build/dev/osd2/block: (22) Invalid argument
2023-03-01T08:24:14.304+0000 7f0c0b241580 -1 bdev(0x55ed2616ed00 /root/ceph/build/dev/osd2/block.wal) unable to get device name for /root/ceph/build/dev/osd2/block.wal: (22) Invalid argument
2023-03-01T08:24:15.072+0000 7f0c0b241580 -1 bdev(0x55ed2616e800 /root/ceph/build/dev/osd2/block.db) unable to get device name for /root/ceph/build/dev/osd2/block.db: (22) Invalid argument

not sure that these errors are causing real issues. but anyway, try again with the vstart command above (that uses only one OSD) - i think it should be ok

@vedanshbhartia

MON=1 OSD=1 MDS=0 MGR=1 RGW=1 ../src/vstart.sh -n -d

Thanks a lot, this works! You might also want to update the s3cmd command with the --no-ssl flag; rgw does not seem to be configured to use SSL by default and kept returning error 400 without this flag.

@yuvalif
Author

yuvalif commented Mar 1, 2023

thanks @vedanshbhartia ! doc updated

@mohamedawnallah

@yuvalif I've just made a pull request for this issue. Here is the pull request link:
ceph/ceph#50330

@mohamedawnallah

mohamedawnallah commented Mar 1, 2023

@yuvalif Now I'm working on my proposal. I'd like to know how many hours (175 or 350) are allocated for this Google Summer of Code project so that I can plan accordingly.

Also, what are the expected outcomes?

Thank you for your help and support!

@yuvalif
Author

yuvalif commented Mar 2, 2023

this will be a 175 hours project (I don't think there are enough coverity issues for 350 hours).
also added an "End Goal" section.

@mohamedawnallah

mohamedawnallah commented Mar 5, 2023

this will be a 175 hours project (I don't think there are enough coverity issues for 350 hours). also added an "End Goal" section.

Thanks that was helpful. Now I have made good progress on the proposal and have completed the basic information sections. However, I am still unsure about how to quantify the timeline for the project and divide the work across the weeks during the Google Summer of Code period.

In particular, I would like to know more about the number of Coverity issues available to solve during this period. I believe this information will be helpful in determining the amount of work to be done each week and creating a more accurate timeline for the project.

Could you please provide me with an estimate of the number of Coverity issues available to solve during the Google Summer of Code period? Any information you can share on this would be greatly appreciated.

Thanks for your help and support!

@xinxin22194

Hi @yuvalif, right now I'm at step 0 and trying to run ./install-deps.sh, but I got an error message saying ./install-deps.sh: line 329: /etc/os-release: No such file or directory. I'm on a mac and there's no os-release in my etc. Could you help me with this?

@yuvalif
Author

yuvalif commented Mar 6, 2023

@xinxin22194 sorry, but i never used a mac :-)
you can try the mailing list or slack channel for help: https://ceph.io/en/community/connect/

@yuvalif
Author

yuvalif commented Mar 6, 2023


currently there are about 440 issues, out of which 40 are "high impact" (note that this is changing as development continues).
my guess is that about 5-10 of them are real issues that need to be fixed, from which less than 5 are critical fixes, and the rest are "possible future issues".
this does not sound like a lot, but to get there we should analyze all 440 issues, and mark them so that the system ignores them.
the goal is that at the end of the project, the amount of noise goes down to such a level that we can continuously analyze the coverity results and keep the number of issues small.
so, i would say that the first couple of weeks should be dedicated to analysis. once we have a clear picture of the real issues we have, we can prioritize them and fix at least some of them.
note that sometimes the fix will not be trivial and may require major refactoring, so it is difficult to estimate at this point how many of the issues we will end up fixing as part of the project.

@mohamedawnallah

mohamedawnallah commented Mar 6, 2023

Hi @yuvalif, right now I'm at step 0 and trying to run ./install-deps.sh, but I got an error message saying ./install-deps.sh: line 329: /etc/os-release: No such file or directory. I'm on a mac and there's no os-release in my etc. Could you help me with this?

if you have an M1/M2 (Arm) Mac, you can use Parallels Desktop (paid after a 14-day trial) or UTM (free) to run a virtual machine of another operating system such as Fedora Linux for Arm, or you could run Asahi Linux bare-metal (I don't know if Ceph runs on Asahi Linux). If you have an Intel-based Mac, you can use VirtualBox, a popular virtualization tool that allows you to run multiple operating systems on your Intel Mac, or you could dual boot to make the most of your machine's computing resources and RAM.

I think these are most of the options available to you, since I spent a lot of time configuring Ceph on my Mac. I used the paid version of Parallels Desktop and it works like a charm, but the other options work as well.

Please let me know for any help!

@xinxin22194

@xinxin22194 sorry, but i never used a mac :-) you can try the mailing list or slack channel for help: https://ceph.io/en/community/connect/

Thanks! I'll try @mohamedawnallah's method, but I'll also post a message there and see if there are other ways around it.

@xinxin22194


Thank you so much for the advice!!!!

@pranjalXi

pranjalXi commented Mar 28, 2023

Hi @yuvalif,
right now I'm at step 0 and trying to run ./install-deps.sh. I have installed Docker with an Ubuntu container on Windows.
Could you please tell me where to run ./install-deps.sh (in which terminal)? Is this something I have to learn about from git?
