@davdunc
Last active November 2, 2021 00:44
Cloud Working Group COE 1

Cloud-Working-Group-COE-1

1 What Happened?

Link to Bugzilla Entry

In what was, at least to the active members of the Fedora Cloud Working Group, a flash decision on October 29th, 2021, the Fedora cloud images were eliminated from consideration as an edition for Fedora 35. There was some historical foundation for making this decision, but it did not, in the opinion of the working group members, disqualify the actions and delivery requirements for the group as they had been actively working towards a successful edition release with modifications and updates. With the popularity of the cloud offering today, it still deserves to be available on the Fedora Website and as an edition.

This was an unexpected setback for the cloud working group. As a working group, we had appointed a member to work on including the edition in the new website revamp, and it was almost two years ago that we were told to wait for the website revamp before the edition would change locations. The effort put forth to make the cloud images possible for public and private hyperscale environments is a distinctly complex action, but like every other edition, it is wrapped up in the common concerns of delivering the other server, workstation, and immutable OS editions.

2 What was the Impact?

The Fedora Cloud base image was removed as an edition from the F35 release of Fedora. The cloud working group’s very existence has been challenged and there was a move from community members to force the Working Group to reassemble by updating the PRD and activities. As the Fedora Cloud base image is no longer considered an edition, it will not be included in the website revamp, which is an ongoing project.

3 What was the Root Cause?

Link to Reference Discussion

Matthew Miller was kind enough to collect a series of discussions that demonstrated a thread of shifting the focus of the cloud working group from the standard Fedora Cloud base images to work related to the Atomic image. The shift of the Cloud WG to the Atomic WG was formalized 4 years ago and there was an effort to stabilize the Atomic experience and fulfill the needs of the Cloud image user base with the Atomic Project. Before that was completed, Red Hat purchased CoreOS, a lightweight operating system purpose-built to support lightweight clustering systems. This purchase eclipsed the work being done on Project Atomic and brought with it a large code base and distribution model that was not yet integrated with any of the Cloud WG’s programs. Meanwhile, the Cloud SIG continued to provide a distinctly minimal Fedora base without the integration of OSTree. Fedora CoreOS superseded the Project Atomic mechanics, but did not align with the Cloud WG goals. It is clear that there was a great deal of discussion when the Atomic WG was being replaced with CoreOS, but at the same time, the goals of the Cloud WG and the Atomic WG were not yet unified, so the Fedora Cloud WG remained active independently. The Fedora CoreOS mission and the Fedora Cloud mission have not aligned either and continue to represent distinct user bases.

For both the cloud working group and the FCOS working group, a significant amount of work continued, so there has been an operational split in what is delivered by the Cloud and FCOS working groups and updates to the goals related to the cloud images and the FCOS in separated programs. The cloud images maintain a focus on supporting distinctly different workloads than those of the lightweight container workloads targeted by the FCOS working group. The Cloud Working Group continued improving the cloud image base for its users, under the impression that the image was still included in the editions.

4 What lessons did we learn?

From the discussion it was clear that there was a difference of opinion over the value of the cloud base edition delivered independent of other editions. Opinions were voiced identifying that there was an argument to be made for the cloud base image to be a secondary form of one of the other editions, such as Server or Workstation, with no requirement to consider it specialized. There will need to be a new charter for the cloud images since there are a number of council members who have called for clarification and an updated review.

  • The Cloud Working Group learned that the Fedora Cloud image is not an edition and that they cannot reestablish without a new PRD and council review
  • The cloud image base will not be included in the website revamp and will remain in alternate downloads

4.1 Is there a clear value to the community?

According to the statistics presented by Matthew Miller, Fedora Cloud base use makes up more than 15% of the total persistent[1] installations and more than 30% of the total ephemeral[2] installations today, making it the second most popular effort in the portfolio after Desktop.
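The two installation categories used in these statistics are defined in the footnotes purely by how long a system has been installed. A minimal sketch of that split follows, assuming installation age is the only input. The function name, the sample ages, and the handling of an age of exactly one week are illustrative assumptions, not part of the statistics Matthew presented.

```python
from datetime import timedelta

# Threshold taken from the footnotes: one week separates the two categories.
ONE_WEEK = timedelta(weeks=1)


def classify_install(age: timedelta) -> str:
    """Label an installation by how long it has been installed.

    Assumption: an age of exactly one week is treated as "ephemeral",
    because the footnotes only define "more than" and "less than" one week.
    """
    return "persistent" if age > ONE_WEEK else "ephemeral"


# Hypothetical ages, for illustration only (not real Fedora statistics).
sample_ages = [timedelta(days=d) for d in (1, 3, 10, 30)]
sample_labels = [classify_install(age) for age in sample_ages]
```

A short-lived cloud instance that is launched and torn down within a few days falls into the ephemeral category under this split, while a long-running instance counts as persistent.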

The first issue raised was one of value to users and developers. There was an opinion expressed that the goals of the cloud edition are not sufficiently divergent from those of the Fedora Server edition to warrant a second series of goals and responsibilities. Additionally, another opinion was stated that at times, the goals of Cloud overlap with those of Workstation Edition. This was stated as a reason to consider delivering cloud-based distributions of both of these editions rather than keeping this single edition that matches this overlap with additional specialization.

The argument that there is insufficient value in the Cloud edition because it has overlapping value in other projects is not, in our evaluation, sufficient to support giving up on highly specialized development with a sensitivity to the modifications based on functional requirements in the individual environments supported (more every day).

4.2 Why isn’t the Fedora cloud image just a derivative of Fedora {Server,Workstation} edition?

Matthew also identified himself as one of those people who no longer sees the variations in the versions. He was once convinced of the distinction, but is no longer sure what kind of controls separate the Server and Cloud editions. A couple of current considerations that determine those differences came to mind immediately. Thanks to Matthew for helping us to tailor our discussion.

David Duncan mentioned in a recent Fedora Podcast with Grayson that users who work with Cloud instances in public cloud have access to the latest hardware and platform specialties. Just like the flexibility users have in instance types, they require flexibility in what they do on their instances. That means advanced specialized functionality for the platform infrastructure with little to no additional overhead. The mission of Server includes preserving support for legacy hardware and that’s just not a critical focus for Cloud images.

Fedora Server has a strong requirement to remain stable, which is why, when the cloud working group found the move to btrfs as a file system compelling, Server edition working group members voiced concerns that users would not have access to advanced RAID types, signaling that this was not the decision they would make. For users adept in cloud configurations, it is well known that the advanced RAID types are not generally useful. The virtual volumes presented as hardware and used by cloud consumers are typically already striped and redundant behind the hyperscaler storage. Virtual volumes are elastic as a result and extend to match larger sizes without requiring advanced RAID to achieve that outcome. What is useful to a typical cloud user is advanced partitioning with subvolumes and additional methods for handling dynamic snapshots. Where there are adjustments or updates to deliver for new images, cloud is focused on using techniques for fast API transfer, like the ones found in coldsnap.

Cloud, not Server, deals with hyperscaler platform variations, like the use of swap files and power management in opportunistic compute power. That isn’t an issue for Fedora Server since the partitioning is expected to follow disk management practices. It might be something that you want to address on Workstation, but even then, using a swap file would be a strong deviation from the best practices for standard installations and deployment. Workstation WG and Server WG are keenly focused on different support goals. The FCOS WG is focused on building a foundation for lightweight clusters and basic batch efforts. The difference in focus differentiates them. The Cloud WG is valuable to both persistent and ephemeral users who are taking advantage of platform features in their modernization.

This is by no means an exhaustive list of the distinctions in mission or functionality.

5 The Five Why’s

5.1 Why is the Cloud Group being asked to reiterate their value when others are not as influential?

The cloud working group, while still active, was intended to fold into the Atomic Working Group when it was stable. That never really happened. The Atomic Working Group was dissolved in favor of the Fedora CoreOS alignment with Red Hat CoreOS when Red Hat acquired CoreOS. The Cloud Working Group remained active throughout all of these changes and so while there was a plan of action in the council to make the FCOS working group replace the Atomic WG, there was no direct dissolution of the cloud working group because it had remained active throughout. Now, there is a BZ that states that Cloud is no longer an edition and that means the team will have to reestablish the status as an edition.

5.2 Why are there people of the opinion that the Cloud Working Group could fold into another edition?

The Server Edition has a direct affiliation with the RHEL product line. It is a foregone conclusion that it exists as an asset of continued significance with planned use in the future. Therefore, there is no way to remove it without creating a gap in the current sponsorship. FCOS also has that same direct relationship with Red Hat CoreOS and cannot deviate from the downstream alignment without creating uncertainty. The cloud image is not currently connected to a release by a direct line. It merely informs the builds for the cloud images for Red Hat Enterprise Linux. With the move to use btrfs, there was a strong divergence from Server. Moving the project under Server would be a forcing function for the group to revert changes that are beneficial to public and private cloud initiatives in favor of legacy expectations.

With Workstation, there are similar concerns regarding the alignment. For developers and site reliability engineers, there is no association with Workstation as a directive for a minimal installation. It would be a considerable change for this group, which has historically been focused on laptops and PC hardware. The target audience would use easy-to-configure virtual machines for development that isn’t container specific. It is more likely that those developers would benefit from the Fedora Cloud base images to achieve a lightweight development environment that meets their requirements for a minimal install environment and dynamic configuration through userdata and automation. Vagrant images are also the product of the Cloud WG.

5.3 Why doesn’t the cloud image base align with the rest of Fedora on a file system?

Fedora Workstation was the first to make the move to btrfs and it was alignment with that existing direction that helped the Cloud WG to move the cloud images to deviate from what the Server working group was doing. It was not expected that Fedora Server would be able to be consistent with these directives as there is not yet full support for RAID5 and RAID6 in btrfs. Fedora CoreOS moved to XFS before Workstation moved to btrfs for reasons of alignment, but originally, they were focused on the use of btrfs and that was simply discouraged based on concerns related to issues of consistency and negative perceptions of btrfs. With Fedora Cloud Base image, the advantages outweighed the alternatives in moving away from ext4.

5.4 Why isn’t the cloud working group just producing raw images of Server and Workstation instead?

The Cloud base image goals are not the same as Workstation or Server, but the mechanics are similar. Hyperscalers and public clouds don’t need another server and do not typically require all the components for Workstation. The emphasis in cloud is on boot times and complex snapshot management or platform special requirements. In many cases, there are platform services that cannot be fully utilized using other editions without significant modifications and the Cloud Edition will continue to focus on these requirements to ensure that it remains popular with users.

5.5 Why is the Fedora cloud image popular with users?

This is mostly answered by the environments where customers choose to run their workloads. A considerable number of workloads are moving to cloud and hyperscaler environments without changing their fundamental runbooks. Fedora Cloud images are popular because they are available where customers require them. This will further increase as the cloud working group continues to extend the availability of the images and tailors them to various environments’ specific needs as a principal goal for their target users.

6 Conclusion

The focus of the working group is the most important aspect of the published editions, but a lack of clarity on the status of the working group can slow the progress of even a strong working plan. The current cloud working group members believed that the dissolution of the Atomic WG returned the focus to the original charter when FCOS formed independently, with different goals and focus. Getting back to the expected status requires the cloud group to retrace their steps and lose ground gained over years of action.

As of this midnight hour change for the release of F35, effort will be required to reestablish the working cloud release as an edition. The Cloud Base image will not be included in the F35 releases or the website revamp. The cloud working group learned that de facto working programs, even when all processes align with the target goals previously agreed upon, cannot undo a procedural confusion that aligns with a well-intentioned, but inconsistent chain of decisions. The cloud working group intends to remedy this as soon as possible with an expectation that we can be back on track by the release of F36.

7 Footnotes

[1] persistent - installed for more than one week.

[2] ephemeral - installed for less than one week.

___________________________
CLOUD-WORKING-GROUP-COE-1
___________________________
<2021-10-31 Sun>
1 What Happened?
================
[Link to Bugzilla Entry]
In what was, at least to the active members of the *Fedora Cloud
Working Group*, a flash decision on October 29th, 2021, the Fedora
cloud images were eliminated from consideration as an /edition/ for
Fedora 35. There was some historical foundation for making this
decision, but it did not, in the opinion of the working group members,
disqualify the actions and delivery requirements for the group as they
had been actively working towards a successful release. With the
popularity of the cloud offering today, it deserves to be available on
the Fedora Website as an edition.
This was an unexpected setback for the cloud working group. As a
working group, we had an appointed member to work on including the
edition in the new website revamp. The effort put forth to make cloud
images possible for public and private hyperscaler environments is a
distinctly complex action, but like every other edition, it is wrapped
up in the common concerns of delivering the other server, workstation,
and immutable OS editions.
[Link to Bugzilla Entry]
<https://bugzilla.redhat.com/show_bug.cgi?id=2018271>
2 What was the Impact?
======================
The Fedora Cloud base image was removed as an edition from the F35
release of Fedora. The cloud working group's very existence has been
challenged and there was a move from community members to force the
Working Group to reassemble by updating the PRD and activities. As the
Fedora Cloud base image is no longer considered an edition, it will
not be included in the Website revamp which is a currently ongoing
project.
3 What was the Root Cause?
==========================
[Link to Reference Discussion]
Matthew Miller was kind enough to collect a series of discussions that
demonstrated a thread of shifting the focus of the cloud working group
from the standard Fedora Cloud base images to work related to the
Atomic image. Shifting of the Cloud WG to Atomic WG was formalized and
there was an effort to stabilize the Atomic experience and fulfill the
needs of the Cloud image user base with the Atomic Project. Before
that was completed, Red Hat purchased CoreOS, a lightweight operating
system purpose-built to support lightweight clustering systems. This
purchase eclipsed the work being done on Project Atomic and brought
with it a significantly large code base and distribution model that
was not yet integrated with any of the Cloud WG's program. Meanwhile
the Cloud SIG continued to provide a distinctly minimal Fedora base
without the integration of OSTree. Fedora CoreOS superseded the
Project Atomic mechanics, but did not align with the Cloud WG
goals. It is clear that there was some discussion at a point that
Atomic was being replaced with CoreOS, but at the same time, the goals
of the Cloud WG were fulfilled using the Fedora-Cloud kickstarts and
process outside of the functionality of the Fedora CoreOS
image. Fedora CoreOS operated, and continues to operate, in a way that is
inconsistent with the expectations of cloud image users.
For both the cloud working group and the FCOS working group, a
significant amount of work continued, so there has been an operational
split in what is delivered by the Cloud and FCOS working groups and
updates to the goals related to the cloud images and the FCOS in
separated programs. The cloud images maintain a focus on supporting
distinctly different workloads than those of the lightweight
container workloads targeted by the FCOS working group. The Cloud
Working Group continued improving the cloud image base for its users,
under the impression that the image was still included in the
editions.
[Link to Reference Discussion]
<https://discussion.fedoraproject.org/t/fedora-cloud-edition-not-an-edition-and-the-future/34064/31>
4 What lessons did we learn?
============================
From the discussion it was clear that there was a difference of
opinion over the value of the cloud base edition delivered independent
of other editions. Opinions were voiced identifying that there was an
argument to be made for the cloud base to be a secondary form of one
of the other editions, such as Server or Workstation with no
requirement to consider it specialized. There will need to be a new
charter for the cloud images since there are a number of council members
who have called for clarification and an updated review.
* The Cloud Working Group learned that the Fedora Cloud image is /not
an edition/ and that they cannot reestablish without a new PRD and
council review.
* The cloud image base will not be included in the website revamp and
will remain in alternate downloads
4.1 Is there a clear value to the community?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
According to the statistics presented by Matthew Miller, Fedora Cloud
base use makes up more than 15% of the total persistent[1]
installations and more than 30% of the total ephemeral[2]
installations today making it the second most popular effort in the
portfolio after Desktop.
The first issue raised was one of value to users and developers. There
was an opinion expressed that the goals of the cloud edition are
not sufficiently divergent from those of the Fedora Server edition to
warrant a second series of goals and responsibilities. Additionally,
another opinion was stated that at times, the goals of Cloud overlap
with those of Workstation Edition. This was stated as a reason to
consider delivering cloud-based distributions of both of these
editions rather than keeping this single edition that matches this
overlap with additional specialization.
The argument that there is insufficient value in the Cloud edition
because it has overlapping value in other projects is not, in our
evaluation, sufficient to support giving up on highly specialized
development with a sensitivity to the modifications based on
functional requirements in the individual environments supported (more
every day).
4.2 Why isn't the Fedora cloud image just a derivative of Fedora {Server,Workstation} edition?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthew also identified himself as /one of/ *those* /people/ who no
longer sees the variations in the versions. He was once convinced of
the distinction, but is no longer sure what kind of controls separate
the Server and Cloud editions. A couple of current considerations that
determine those differences came to mind immediately. Thanks to Matthew
for helping us to tailor our discussion.
David Duncan mentioned in a recent Fedora Podcast with Grayson that
users who work with Cloud instances in public cloud have access to the
latest hardware and platform specialties. Just like the flexibility
users have in instance types, they require flexibility in what they do
on their instances. That means advanced specialized functionality for
the platform infrastructure with little to no additional overhead. The
mission of server includes preserving support for legacy hardware and
that's just not a critical focus for Cloud images.
Fedora Server has a strong requirement to remain stable, which is why,
when the cloud working group found the move to btrfs as a file system
compelling, Server edition working group members voiced concerns that
users would not have access to advanced RAID types, signaling that this
was not the decision they would make. For users adept in cloud
configurations, it is well known that the advanced RAID types are not
generally useful. The virtual volumes presented as hardware and used by
cloud consumers are typically already striped and redundant behind the
hyperscaler storage. Virtual volumes are elastic as a result and extend
to match larger sizes /without/ requiring advanced RAID to achieve that
outcome. What /is/ useful to a typical cloud user is advanced
partitioning with subvolumes and additional methods for handling
dynamic snapshots. Where there are adjustments or updates to deliver
for new images, cloud is focused on using techniques for fast API
transfer, like the ones found in [coldsnap].
Cloud, not Server, deals with hyperscaler platform variations, like the
use of swap files and power management in opportunistic compute
power. That isn't an issue for Fedora Server since the partitioning is
expected to follow disk management practices. It might be something
that you want to address on workstation, but even then, using a
swap file would be a strong deviation from the best practices for
standard installations and deployment. Workstation WG and Server WG
are keenly focused on different support goals. The FCOS WG is focused
on building a foundation for lightweight clusters and basic batch
efforts. The difference in focus differentiates them. The Cloud WG is
valuable to both persistent and ephemeral users who are taking
advantage of platform features in their modernization.
[coldsnap] <https://github.com/awslabs/coldsnap>
5 The Five Why's
================
5.1 Why is the Cloud Group being asked to reiterate their value when others are not as influential?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The cloud working group, while still active, was intended to fold into
the Atomic Working Group when it was stable. That never happened. The
Atomic Working Group was dissolved in favor of the Fedora CoreOS
alignment with Red Hat CoreOS when Red Hat acquired CoreOS. The Cloud
Working Group remained active throughout all of these changes and so
while there was a plan of action in the council to make the FCOS
working group replace the Atomic WG, there was no direct dissolution
of the active cloud working group because it had remained active
throughout. Now, there is a new BZ that states that Cloud is no longer an
edition and that means the team will have to reestablish the status as an /edition/.
5.2 Why are there people of the opinion that the Cloud Working Group could fold into another edition?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Server Edition has a direct affiliation with the RHEL product
line. It is a foregone conclusion that it exists as an asset of
continued significance with planned use in the future. Therefore there
is no way to remove it without creating a gap in the current
sponsorship. FCOS also has that same direct relationship with Red Hat
CoreOS and cannot deviate from the downstream alignment without
creating uncertainty. The cloud image is not currently connected to a
release by a direct line. It merely informs the build for the cloud
images for Red Hat Enterprise Linux. With the move to use BTRFS, there
was a strong divergence from Server. Moving the project under server
would be a forcing function for the group to revert changes that are
beneficial to public and private cloud initiatives in favor of necessary legacy
expectations.
5.3 Why doesn't the cloud image base align with the rest of Fedora on a file system?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fedora Workstation was the first to make the move to btrfs and it was
alignment with that existing direction that helped the Cloud WG to
move the cloud images to deviate from what the Server working group
was doing. It was not expected that the Fedora Server would
be consistent with these directives as there is not yet full support
for RAID5 and RAID6 in btrfs[?Link]. Fedora CoreOS moved to XFS before Workstation
moved to btrfs for [reasons of alignment], but originally, they were
focused on the [use of btrfs] and that was simply discouraged based on
concerns related to issues of consistency and negative perceptions of
btrfs. With Fedora Cloud Base image, the advantages outweighed the
alternatives in moving from ext4.
[reasons of alignment]
<https://github.com/coreos/fedora-coreos-tracker/issues/33#issuecomment-415186828>
[use of btrfs]
<https://github.com/coreos/fedora-coreos-tracker/issues/33#issuecomment-421193997>
5.4 Why isn't the cloud working group just producing raw images of Server and Workstation instead?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Cloud base image goals are not the same as Workstation or Server,
but the mechanics are similar. Hyperscalers and public clouds don't
need another server and do not typically require all the components
for Workstation. The emphasis in cloud is on boot times and complex
snapshot management or platform special requirements. In many cases,
there are platform services that cannot be fully utilized using other
editions without significant modifications and the Cloud Edition will
continue to focus on these requirements to ensure that it remains
popular with users.
5.5 Why is the Fedora cloud image popular with users
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is mostly answered by the environments where customers choose to
run their workloads. A considerable number of workloads are moving to
cloud and hyperscaler environments without changing their fundamental
runbooks. Fedora Cloud images are popular because they are available
where customers require them. This will further increase as the cloud
working group continues to extend the availability of the images and
tailors them to various environments' specific needs.
6 Conclusion
============
The focus of the working group is the most important aspect of the
published editions, but a lack of clarity on the status of the working
group can slow the progress of even a strong working plan. The current cloud
working group members believed that the dissolution of the Atomic WG
returned the focus to the original charter when FCOS formed
independently, with different goals and focus. Getting back to the
expected status requires the cloud group to retrace their steps and
lose ground gained over years of action.
As of this midnight hour change for the release of F35, effort will be
required to reestablish the working release as an edition. The Cloud
Base image will not be included in the F35 releases or the website
revamp. The cloud working group learned that de facto working
programs, even when all processes align with the target goals
previously agreed upon, cannot undo a procedural confusion that aligns
with a well-intentioned, but inconsistent chain of decisions. The
cloud working group intends to remedy this as soon as possible with an
expectation that we can be back on track by the release of F36.
Footnotes
_________
[1] /persistent/ - installed for more than one week.
[2] /ephemeral/ - installed for less than one week.