System Development Engineer – AWS Monitoring

Location: Rockville, Maryland - Remote
Category: DevOps/Cloud
Employment Type: Contract
Job ID: 16670
Date Added: 06/12/2024

Job Summary:

We are seeking a highly skilled System Development Engineer specializing in implementing monitoring solutions to join our dynamic IT team. The candidate will be responsible for developing and maintaining comprehensive monitoring systems that ensure the smooth operation of our IT infrastructure. This role involves working closely with cross-functional teams to identify monitoring needs, implement monitoring tools, and provide insights to optimize system performance and reliability.

The System Development Engineer is responsible for designing, implementing, and maintaining monitoring solutions that ensure the reliability, performance, and scalability of IT infrastructure and applications. This role typically involves working closely with various teams, such as DevOps, IT, and software development, to identify monitoring requirements and deliver solutions that enhance system observability, detect issues, and optimize system operations.
Key Responsibilities:
  • Design and Implementation:
    • Design, develop, and implement monitoring solutions for various IT systems and applications in the cloud.
    • Integrate monitoring tools with existing systems and applications to provide comprehensive observability.
    • Develop custom monitoring scripts and plugins as needed.
    • Design and develop in-house monitoring tools, and support a small footprint of on-premises monitoring.
  • Monitoring, Alerting, and Reporting:
    • Set up and configure monitoring tools such as AWS Prometheus, Nagios, Grafana, Datadog, or similar; prior experience with any of these is a plus.
    • Create and manage dashboards, alerts, and notifications to ensure timely detection of issues.
    • Establish and maintain SLAs and SLOs for various services and applications.
    • Prepare reports and dashboards using Splunk, AWS Grafana, or equivalent tools to provide visibility into system performance and health.
  • Performance Optimization:
    • Analyze system performance data to identify areas for automation, optimization, and improvement.
    • Conduct capacity planning and scaling exercises to support growth and demand.
    • Collaborate with teams to implement automation solutions that improve system efficiency and reduce downtime.
  • Collaboration, Communication, and Documentation:
    • Work closely with application and operations teams to understand their monitoring needs and challenges.
    • Provide training and support to team members on monitoring tools and best practices.
    • Work Monday through Friday, 9 AM to 5 PM, and participate in occasional on-call rotations, responding to monitoring issues as needed. On-call support is infrequent, but you must be available when called.
    • Create and maintain documentation for automation processes, tools, and frameworks.

Qualifications

  • Education:
    • Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
  • Experience:
    • Proficiency in scripting languages such as Python, Bash, or PowerShell.
    • Proven experience in Database Development.
    • AWS experience is a must. Experience supporting applications in AWS at an enterprise level is required, specifically working with EC2, S3, RDS, EBS volumes, ELB, and security groups.
    • Knowledge of and experience with other cloud platforms (Azure, GCP) is an added plus.
  • Skills:
    • Excellent understanding of data structures and algorithms.
    • Strong understanding of IT infrastructure, including servers, networks, databases, and cloud environments.
    • Excellent problem-solving and analytical skills.
    • Ability to work independently and as part of a team.
    • Strong communication skills and the ability to convey complex technical concepts to non-technical stakeholders.
  • Certifications:
    • Relevant certifications in IT infrastructure (e.g., AWS Certified Solutions Architect) are a plus.

Preferred Qualifications

    • Hands-on experience with monitoring tools such as Nagios, Prometheus, Grafana, ELK Stack, or similar.
    • Proven experience in system development, with a focus on monitoring/observability.
    • Experience with containerization and orchestration tools like Docker and Kubernetes.
    • Knowledge of automation and configuration management tools such as Ansible, Puppet, or Chef.
