= Replacing a Failed Hard Drive
:page-description: Steps for replacing a failed drive on a Fedora infrastructure server.
:page-aliases: replacing-failed-drive.adoc

== Overview

This document provides a step-by-step procedure for replacing a failed hard drive on a Fedora infrastructure server. It covers access requirements, necessary tools, and the process for initiating and completing the drive replacement.

== Contact Information

Owner:: Fedora Infrastructure Team
Contact:: #fedora-admin, sysadmin-main
Purpose:: Provide basic orientation and introduction to the sysadmin group

== Access Level

To perform this procedure, you may need sysadmin-main access. In the future, access details might be shared with a dedicated assignee or stored in a smaller vault. For now, contact the sysadmin-main team for the necessary information.

== Requirements

* Red Hat VPN Access - Needed for SSH access to the machine.
* Bitwarden Vault Access - Access to the vault is under discussion. For now, consult the sysadmin-main team for the login credentials.

== Process

.First, access the management console:
. Ensure you are connected to the official Red Hat VPN.
. Identify the server in question. For this SOP, we will use `bvmhost-x86-01.stg.iad2.fedoraproject.org` as an example.
. To access the management console, use the management (`.mgmt`) form of the hostname: `bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org`.
. Obtain the IP address by pinging the management hostname from `batcave01`:
+
[source,bash]
----
# Log in to batcave01 first.
ssh batcave01.iad2.fedoraproject.org
# From batcave01, ping the management hostname; the output shows its IP address.
ping bvmhost-x86-01-stg.mgmt.iad2.fedoraproject.org
----
. Visit the IP address in a web browser. The management console uses HTTPS, so accept the self-signed certificate:
+
[source]
----
https://<IP address>
----
. Log in using the credentials found in the `admin-stg` entry in Bitwarden.
. Navigate to the overview page to find the serial number/service tag of the machine.

=== Identify the Failed Drive

. Navigate to the storage menu to identify the failed drive.
Warnings about failing or failed drives are shown here.
. Note the failed drive's details (e.g., drive 4).
. Create a failed drive report by exporting the failed drive's information.

=== Create a Support Ticket

. In the management console, click on the support link in the top right corner.
. Follow these steps to contact technical support:
.. Go to the top left search bar and select "Support > Contact Technical Support".
.. Search for the device using the service tag from the overview page.
.. Select "HardDrive and RAID Controller" from the drop-down menu.
.. Choose one of the support options:
... Call: 24/7
... Live Chat: 7 am - 9 pm CDT, Monday - Friday
... Social Connect
. In the live chat, provide the failed drive report. Once support verifies and confirms the failure, they will send an email with the replacement details.
. If live chat is unsuccessful, call support at 1-866-362-5350 (available 24/7).

=== Follow-Up with the Support Ticket

. Once the support ticket is created, the assignee will receive a form via email.
. Forward this form to Patrick Cole (pcole@redhat.com) along with the machine's serial number and location.
+
[NOTE]
====
At this point, Patrick Cole will handle the coordination with Dell for the drive replacement. This avoids adding unnecessary intermediaries.
====

Patrick will then coordinate the replacement with Dell, including arranging access for the technician if needed.

== Conclusion

Following this SOP ensures a systematic approach to replacing failed drives, minimizing downtime and maintaining system integrity. Reach out to the sysadmin-main team for any clarifications or additional support.
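
As a rough illustration of the hostname step in the Process section, the sketch below derives the management-console hostname from a server hostname. The function name `mgmt_hostname` is hypothetical, and the naming rule (join the host name and environment with a hyphen, then add `.mgmt` before the datacenter domain) is an assumption inferred only from the single example in this SOP. It may not hold for other hosts, so confirm the result with the sysadmin-main team before relying on it.

[source,bash]
----
#!/usr/bin/env bash
# Hypothetical helper, not part of any Fedora infrastructure tooling.
# Assumed rule, inferred from the one example in this SOP:
#   <name>.<env>.<domain>  ->  <name>-<env>.mgmt.<domain>
mgmt_hostname() {
    local host="$1"
    local name="${host%%.*}"     # first label, e.g. bvmhost-x86-01
    local rest="${host#*.}"      # everything after the first dot
    local envname="${rest%%.*}"  # second label, e.g. stg
    local domain="${rest#*.}"    # remaining labels, e.g. iad2.fedoraproject.org
    printf '%s-%s.mgmt.%s\n' "$name" "$envname" "$domain"
}

# Example with the host used in this SOP.
mgmt_hostname "bvmhost-x86-01.stg.iad2.fedoraproject.org"
----

The printed name can then be pinged from `batcave01` as described in the Process section to obtain the management console's IP address.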