Machine Learning Engineer (Audio & Video Models)
- Posted On: 2026-03-18 19:25:30
- Openings: 10
- Applicants: 0
Job Description
Key Responsibilities
Design, train, and optimize audio and video ML models, including classification, detection, segmentation, generative models, speech processing, and multimodal architectures.
Develop and maintain data pipelines for large-scale audio/video datasets, ensuring quality, labeling consistency, and efficient ingestion.
Implement model evaluation frameworks that measure robustness, latency, accuracy, and overall performance across real-world conditions.
Work with product teams to transform research prototypes into production-ready models with reliable inference performance.
Optimize models for scalability, low latency, and edge/cloud deployment, including quantization, pruning, and hardware-aware tuning.
Collaborate with cross-functional teams to define technical requirements and experiment roadmaps.
Monitor and troubleshoot production models, ensuring reliability and continuous improvement.
Stay current with trends in deep learning, computer vision, speech processing, and multimodal AI.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Electrical Engineering, Machine Learning, or a related field (PhD a plus).
Strong experience with deep learning frameworks such as PyTorch or TensorFlow.
Proven experience training and deploying audio or video models, such as:
- Speech recognition, speech enhancement, speaker identification
- Audio classification, event detection
- Video classification, action recognition, tracking
- Video-to-text, lip reading, multimodal fusion models
Solid understanding of neural network architectures (CNNs, RNNs, Transformers, diffusion models, etc.).
Proficiency in Python, along with ML tooling for experimentation and production (e.g., NumPy, OpenCV, FFmpeg, PyTorch Lightning).
Experience working with GPU/TPU environments, distributed training, and model optimization.
Ability to write clean, maintainable, production-quality code.
Preferred Qualifications
Experience with foundation models or multimodal transformers (e.g., audio-language, video-language).
Background in signal processing, feature extraction (MFCCs, spectrograms), or codec-level audio/video understanding.
Experience with MLOps tools (e.g., MLflow, Weights & Biases, Kubeflow, Airflow).
Knowledge of cloud platforms (AWS, GCP, Azure) and scalable model serving frameworks.
Experience with real-time audio/video processing for streaming applications.
Publications, open-source contributions, or competitive ML achievements are a plus.
Experience:
Min 2 years
