Since we are moving to matrix, lets drop reference to irc. I may have missed a few of these and I left the Zodbot SOP alone for now until we replace it with the new matrix one. Signed-off-by: Kevin Fenzi <kevin@scrye.com>
:experimental:
:toc:
= Service Level Expectations

The infrastructure team does not have any formal agreement or contract regarding
the availability of its different services. However, we do try our best to keep
services running, and as a result, you can have some expectations as to what
we will do to that end.

== Primary Business Hours

Fedora Infrastructure is a community team, involving volunteers as well as
people employed by Red Hat to work on Fedora.
However, despite the help of volunteers, primary business hours are mostly
aligned with the work schedule of Red Hat. Normal hours should be seen as
Monday through Friday from 1000 UTC to 2300 UTC, excluding US/EU national holidays
and a two-week end-of-year closure, both of which affect staffing and response times.

Support outside of primary business hours is provided on call and depends on
the availability of staff.

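As an illustration only, the primary business hours described above can be expressed as a small check. This is a minimal sketch, not a tool used by Fedora Infrastructure: the function name `in_primary_hours` and the constants are hypothetical, the 2300 UTC boundary is assumed to be exclusive, and holidays and the end-of-year closure are not modelled.

```python
from datetime import datetime, timezone

# Primary business hours as stated above: Monday to Friday, 1000 to 2300 UTC.
# Treating 2300 as an exclusive upper bound is an assumption of this sketch.
START_HOUR_UTC = 10
END_HOUR_UTC = 23


def in_primary_hours(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls in primary business hours.

    National holidays and the end-of-year closure are deliberately ignored.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    is_weekday = utc.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and START_HOUR_UTC <= utc.hour < END_HOUR_UTC
```

A reporter could call `in_primary_hours(datetime.now(timezone.utc))` to get a rough idea of whether staff are likely to be around before filing a ticket.
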
== Roles and Responsibilities

=== Fedora Infrastructure to Community

* Have staff present and available in appropriate communication channels to answer
questions during primary hours.
* Interact with community members with respect and courtesy.
* Work with community members to get accurate and thorough documentation of
incidents, problems, or feature requests.
* Resolve reported problems as soon as they are acknowledged, if possible.
* Clearly communicate estimated resolution times.
* Move items which cannot be resolved within a reasonable time to future
feature requests, or close them out.

=== Community Members to Fedora Infrastructure

* Provide full and detailed reports of the problem or requested service.
* Provide clear and complete contact information and times when available.
* Leave alternative contacts who can also be available in case of vacation
or other emergencies.
* When contacted by Fedora Infrastructure, respond within 5 business days.

=== Fedora Infrastructure to Fedora Infrastructure

* Have a clear schedule of reachable hours.
* Set and take regular vacation time to be rested.
* Rotate through on-call days covering Matrix and tickets.
* If adding a new service, be available outside of normal business hours to
help debug problems.
* Follow procedures and checklists when adding or updating services.
* Help with regular audits of the documentation.

== Definition of Service Priorities

The general design of service priorities is that of concentric circles, where
items rely on services in their own circle or a circle below them.

. *Critical* services are ones which Fedora Infrastructure will work to be available
24x7 with 52-week coverage if an unplanned outage occurs.
Services will be configured to be highly available with an estimated
planned/unplanned uptime of 95%. Response time should be within 1 hour during business
hours. Outside business hours, problems will be addressed when Fedora Infrastructure
staff are available.
. *Important* services are ones which Fedora Infrastructure will work to be
available 24x7 with 50-week coverage. Response time should be within a day
during business days. Outside business days, problems will be addressed when
Fedora Infrastructure staff are available.
. *Normal* services are ones which Fedora Infrastructure will work to be
available during primary work hours. Problems outside of these hours will
be looked at as people are available. The services may be available
outside of these hours but are of a lower priority than important services.
. *Low priority* services are ones which are not critical or important for
the primary function of Fedora Infrastructure. They will be worked on and
looked at during primary business hours.
. *Third Party* services are ones for which Fedora Infrastructure has outsourced
tools and services. Uptimes, service hours, and coverage are dictated
by the third party. Depending on the type of problem, Fedora Infrastructure
will act as an intermediary, or in the case of tools like retrace and COPR,
direct the user to talk with the service owners.
. *Deprecated* services are ones which Fedora Infrastructure is no longer
putting resources into. This may be because the project has completed its
mission, the upstream software is dead, or the original reasons for the
service no longer exist. Problems with these services will be looked at
during primary business hours. Most responses may be "Will Not Fix".

== Limitations on Support

* Some services that are associated with Fedora are provided by third
parties. Changes and outages which affect them are outside the control
of Fedora Infrastructure.
* Fedora Infrastructure will prioritize issues and requests that affect
multiple people or teams over a smaller group or individual.
* Fedora Infrastructure has limited budget and hours. Requests and features
will be prioritized to fit within those.
* Fedora Infrastructure is bound by the laws and regulations of the United
States of America. This means that certain requests, changes and problems
are outside the ability of members to deal with.

== Glossary

* **Planned outage**: A planned outage is one that is announced sufficiently
ahead of time to allow most users to plan around it.

* **Unplanned outage**: An outage that occurs suddenly without proper
allowance for users to plan around it.

* **Scheduled outage**: An outage which has been scheduled to occur, but may
not have been announced with enough time for users to plan around it.

* **High Availability**: Systems are available during specified operating
hours with any unplanned outages 'masked' by other tools.

* **Continuous Operations**: Systems are available 24 hours a day, 7 days
a week, with no scheduled outages. Unplanned outages are possible during
this time.

* **Continuous Availability**: Systems or applications are available 24x7
with no planned or unplanned outages. This is a combination of high
availability and continuous operations.

* **Level of availability**:

[options=header]
|===
|Percentage | Max outage time per day
| 90% | 144.0 minutes
| 95% | 72.0 minutes
| 99% | 14.4 minutes
| 99.9% | 1.4 minutes
|===

* **Committed Hours of Availability**: Hours that an organization will have
staff available to help deal with issues with systems, services, and
applications. Also known as "Regular Business Hours".

* **Outage Hours**: Total number of hours of outage considered normal for
calculating achieved availability.

* **Response Time**: The time between the user's notification of the problem
and when the help desk will begin to work on that problem.

* **Resolution Update**: The frequency of updates to tickets.

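The figures in the level of availability table follow from simple arithmetic: a day has 1440 minutes, and the allowed outage is the fraction of the day not covered by the availability percentage. The sketch below shows that calculation; the function name `max_outage_minutes_per_day` is hypothetical and chosen only for this illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes


def max_outage_minutes_per_day(availability_percent: float) -> float:
    """Return the daily outage budget, in minutes, for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_DAY * (100 - availability_percent) / 100


# Recompute the rows of the table, rounded to one decimal place.
for percent in (90, 95, 99, 99.9):
    minutes = max_outage_minutes_per_day(percent)
    print(f"{percent}% -> {minutes:.1f} minutes")
```

The same function can be used for targets not listed in the table, for example 99.5%.
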
== Estimated Time of Resolution

By priority level:

* **Emergency**: Problems which are site wide and affect the core functions
of the project. These problems take top priority and should be solved as soon as possible.
Estimated time of resolution is within hours.

* **Urgent**: Problems which affect multiple functions and groups in the
project. These problems will be solved when there is no emergency going on.
Estimated time of resolution is within a day.

* **Normal**: Problems which prevent a single user from performing needed
duties. These problems will be looked at when staff is available.
Estimated time of resolution is within a week.

* **Low**: A request for service, instruction, or information that has no
immediate impact on services. These requests are the lowest priority.
Estimated time of resolution is within a month.