:experimental:
:toc:
= Service Level Expectations

The infrastructure team does not have any formal agreement or contract regarding
the availability of its services. However, we do our best to keep
services running, and as a result, you can have some expectations as to what
we will do to that end.

== Primary Business Hours

Fedora Infrastructure is a community team, involving volunteers as well as
people employed by Red Hat to work on Fedora.
However, despite the help of volunteers, primary business hours are mostly
aligned with the work schedule of Red Hat. Normal hours are
Monday through Friday from 1000 UTC to 2300 UTC, excluding US/EU national holidays
and a two-week end-of-year closure, both of which affect staffing and response times.

Support outside of primary business hours is provided on an on-call basis and
depends on the availability of staff.
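
As an illustration only, the hours above can be expressed as a small check.
This is a minimal sketch, not an official tool: the function name
`in_primary_hours` is hypothetical, the end of the window is assumed to be
exclusive, and holidays and the end-of-year closure are deliberately not modeled.

```python
from datetime import datetime, time, timezone

# Primary business hours as stated above: Monday-Friday, 1000-2300 UTC.
START = time(10, 0)
END = time(23, 0)  # assumption: 2300 UTC itself is treated as outside the window


def in_primary_hours(moment: datetime) -> bool:
    """Return True if `moment` falls within primary business hours.

    Holidays and the end-of-year closure are NOT taken into account.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and START <= utc.time() < END
```

For example, `in_primary_hours(datetime.now(timezone.utc))` indicates whether a
report filed right now would arrive during primary hours, holidays aside.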

== Roles and Responsibilities

=== Fedora Infrastructure to Community

* Have staff present and available in appropriate communication channels to answer
  questions during primary hours.
* Interact with community members with respect and courtesy.
* Work with community members to get accurate and thorough documentation of
  incidents, problems, or feature requests.
* Resolve reported problems as soon as acknowledged if possible.
* Clearly communicate estimated resolution times.
* Move items which cannot be resolved within a reasonable time to future
  feature requests, or close them out.

=== Community Members to Fedora Infrastructure

* Provide full and detailed reports of the problem or requested service.
* Provide clear and complete contact information and times when available.
* Leave alternative contacts who can also be available in case of vacation
  or other emergencies.
* When contacted by Fedora Infrastructure, respond within 5 business days.

=== Fedora Infrastructure to Fedora Infrastructure

* Have a clear schedule of reachable hours.
* Set and take regular vacation time to be rested.
* Rotate through on-call days in Matrix and tickets.
* If adding a new service, be available outside of normal business hours to
  help debug problems.
* Follow procedures and checklists when adding or updating services.
* Help with regular audits of the documentation.

== Definition of Service Priorities

The general design of service priorities is that of concentric circles, where
items rely on services in their own circle or a circle below them.

. *Critical* services are ones which Fedora Infrastructure will work to keep available
  24x7 with 52-week coverage if an unplanned outage occurs.
  Services will be configured to be highly available with an estimated
  planned/unplanned uptime of 95%. Response time should be within 1 hour during business
  hours. Outside business hours, problems will be addressed when Fedora Infrastructure
  staff are available.
. *Important* services are ones which Fedora Infrastructure will work to keep
  available 24x7 with 50-week coverage. Response time should be within a day
  during business days. Outside business days, problems will be addressed when
  Fedora Infrastructure staff are available.
. *Normal* services are ones which Fedora Infrastructure will work to be
  available during primary work hours. Problems outside of these hours will
  be looked at as people are available. The services may be available
  outside of these hours but are of a lower priority than important services.
. *Low priority* services are ones which are not critical or important for
  the primary function of Fedora Infrastructure. They will be worked on and
  looked at during primary business hours.
. *Third Party* services are ones which Fedora Infrastructure has outsourced
  tools and services to. Uptimes, service hours, and coverage are dictated
  by the third party. Depending on the type of problem, Fedora Infrastructure
  will act as an intermediary, or in the case of tools like retrace and COPR,
  direct the user to talk with the service owners.
. *Deprecated* services are ones which Fedora Infrastructure is no longer
  putting resources into. This may be because the project has completed its
  mission, the upstream software is dead, or the original reasons for the
  service no longer exist. Problems with these services will be looked at
  during primary business hours. Responses may be mostly "Will Not Fix".

== Limitations on Support

* Some services that are associated with Fedora are provided by third
  parties. Changes and outages which affect them are outside the control
  of Fedora Infrastructure.
* Fedora Infrastructure will prioritize issues and requests that affect
  multiple people or teams over a smaller group or individual.
* Fedora Infrastructure has limited budget and hours. Requests and features
  will be prioritized to fit within those.
* Fedora Infrastructure is bound by the laws and regulations of the United
  States of America. This means that certain requests, changes, and problems
  are outside the ability of team members to address.

== Glossary

* **Planned outage**: An outage that is announced sufficiently
  ahead of time to allow most users to plan around it.

* **Unplanned outage**: An outage that occurs suddenly without proper
  allowance for users to plan around it.

* **Scheduled outage**: An outage which has been scheduled to occur, but may
  not have been announced with enough time for users to plan around it.

* **High Availability**: Systems are available during specified operating
  hours with any unplanned outages 'masked' by other tools.

* **Continuous Operations**: Systems are available 24 hours a day, 7 days
  a week, with no scheduled outages. Unplanned outages are possible during
  this time.

* **Continuous Availability**: Systems or applications are available 24x7
  with no planned or unplanned outages. This is a combination of high
  availability and continuous operations.

* **Level of availability**:

[options=header]
|===
|Percentage | Max outage time per day
| 90%       | 144.0 minutes
| 95%       | 72.0 minutes
| 99%       | 14.4 minutes
| 99.9%     | 1.4 minutes
|===
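
The figures in this table follow from simple arithmetic: the maximum outage is
the unavailable fraction of a 1440-minute day. The following is a minimal sketch
of that calculation; the function name `max_outage_minutes_per_day` is
illustrative and not part of any Fedora tooling.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def max_outage_minutes_per_day(availability_percent: float) -> float:
    """Return the daily outage, in minutes, allowed by an availability level."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_DAY * (100 - availability_percent) / 100


# Print the same levels as the table above, rounded to one decimal place.
for pct in (90, 95, 99, 99.9):
    print(f"{pct}% -> {max_outage_minutes_per_day(pct):.1f} minutes")
```

Note that 99.9% works out to 1.44 minutes per day, which the table rounds to 1.4.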

* **Committed Hours of Availability**: Hours that an organization will have
  staff available to help deal with issues with systems, services, and
  applications. Also known as "Regular Business Hours".

* **Outage Hours**: Total number of hours of outage considered normal for
  calculating achieved availability.

* **Response Time**: The time between the user's notification of the problem
  and when the help desk will begin to work on that problem.

* **Resolution Update**: The frequency of updates to tickets.

== Estimated Time of Resolution

By priority level:

* **Emergency**: Problems which are site wide and affect the core functions
  of the project. These problems take top priority and should be solved as soon as possible.
  Estimated time of resolution is within hours.

* **Urgent**: Problems which affect multiple functions and groups in the
  project. These problems will be solved when there is no emergency going on.
  Estimated time of resolution is within a day.

* **Normal**: Problems which prevent a single user from performing needed
  duties. These problems will be looked at when staff is available.
  Estimated time of resolution is within a week.

* **Low**: A request for service, instruction, or information that has no
  immediate impact on services. These requests are lowest priority.
  Estimated time of resolution is within a month.