Description
We are currently looking for a Network Engineer for an IT service provider based in California.
Because the company is based in the USA, candidates should be prepared to occasionally work outside their usual hours when the project requires it.
Nevertheless, working hours can be arranged relatively flexibly, and the work can be performed 100% remotely. The project language is English.
Project Overview
Lead a mission-critical network infrastructure transformation for our AI development environment, eliminating single points of failure and implementing enterprise-grade redundancy for 75-100 Linux servers supporting intensive AI computational workloads. This high-visibility project directly impacts our organization's AI development capacity and operational resilience.
Technical Challenge
Current Environment Issues:
• Single point of failure on Mellanox switch creating operational risk
• Single WAN link vulnerability affecting AI development continuity
• Non-redundant storage connectivity limiting high-availability operations
• Suboptimal network topology constraining AI workload performance
Target Architecture:
• Redundant Arista Leaf-Spine Topology - Full L3 with eBGP routing for optimal performance
• Dual WAN Provider Integration - Eliminating connectivity single points of failure
• Pure Storage Redundancy - Implementing proper spanning-tree for storage high availability
• Linux Server Optimization - Enhanced networking configuration for AI workload efficiency
Project Phases & Compensation Structure
Phase 0: Assessment & Qualification
• Candidate interview and technical evaluation process
• Current environment analysis and scope validation
• Suitability determination and project approach alignment
Phase 1: Design & Planning
• Comprehensive new network design delivery
• Implementation strategy and approach outline
• Risk assessment and mitigation planning
Phase 2: Detailed Implementation Planning
• Step-by-step migration procedures with testing protocols
• Zero-downtime implementation methodology
• Comprehensive rollback and contingency planning
• Change management and approval processes
Phase 3: Migration Execution
• Coordinated implementation with data center remote hands
• Collaboration with internal server and storage teams
• Real-time monitoring and issue resolution
• Performance validation and optimization
Phase 4: Documentation & Handover
• Complete technical documentation delivery
• Team training and knowledge transfer
• Project wrap-up and lessons learned documentation
Required Technical Expertise
Network Infrastructure (Essential):
• Advanced Network Design - Enterprise-scale topology planning and implementation
• Arista Switch Platforms - Configuration, management, and troubleshooting expertise
• Mellanox/SONiC Experience - Understanding of the current environment and migration challenges
• BGP & Routing Protocols - eBGP implementation and advanced routing configuration
• High Availability Design - Redundancy planning and single-point-of-failure elimination
Specialized Skills (Highly Preferred):
• Data Center Operations - Experience coordinating with remote hands and facility teams
• Linux Network Administration - Advanced networking, teaming, and bond configuration
• Pure Storage Integration - Storage networking and high-availability configuration
• AI/HPC Networking - Understanding high-performance computing network requirements
Critical Success Factors
Project Management Excellence:
• Phased Delivery Expertise - Proven ability to deliver complex projects in structured phases
• Risk-Free Implementation - Zero-tolerance approach to production impact during migration
• Stakeholder Coordination - Managing multiple technical teams and external data center resources
Technical Leadership:
• Enterprise Network Authority - Confidence in making critical infrastructure decisions
• Documentation Excellence - Creating maintainable, comprehensive technical documentation
• Knowledge Transfer Ability - Effectively training internal teams on new infrastructure
Compensation & Engagement
• Phased Payment Structure - Each phase funded upon completion and approval
• Premium Project Rates - Reflecting specialized network engineering expertise and project complexity
• Performance Incentives - Additional compensation for ahead-of-schedule, zero-downtime delivery
• Future Engagement Potential - Opportunity for ongoing infrastructure optimization projects
Contract Type: Phased Project Engagement
Timezone: Pacific Time Preferred (US-Based)
Location: Remote (Bay Area Preferred)