Senior PySpark Developer
- Posted On: 2026-01-16 19:41:21
- Openings: 10
- Applicants: 0
Job Description
Key Responsibilities:
- Design and develop scalable PySpark pipelines to ingest, parse, and process deeply nested, highly hierarchical XML datasets.
- Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures.
- Optimize Spark jobs through partitioning, caching, and parallel processing to handle terabytes of XML data efficiently.
- Transform raw hierarchical XML data into structured DataFrames for analytics, machine learning, and reporting use cases.
- Collaborate with data architects and analysts to define data models for nested XML schemas.
- Troubleshoot performance bottlenecks and ensure reliability in distributed environments (e.g., AWS, Databricks, Hadoop).
- Document parsing logic, data lineage, and optimization strategies for maintainability.
Qualifications:
- 5+ years of hands-on experience with PySpark and Spark XML libraries (e.g., `spark-xml`) in production environments.
- Proven track record of parsing XML data with 20+ levels of nesting using recursive methods and schema inference.
- Expertise in XPath, XQuery, and DataFrame transformations (e.g., `explode`, `struct`, `selectExpr`) for hierarchical data.
- Strong understanding of Spark optimization techniques: partitioning strategies, broadcast variables, and memory management.
- Experience with distributed computing frameworks (e.g., Hadoop, YARN) and cloud platforms (AWS, Azure, GCP).
- Familiarity with big data file formats (Parquet, Avro) and orchestration tools (Airflow, Luigi).
- Bachelor's degree in Computer Science, Data Engineering, or a related field.
Preferred Skills:
- Experience with schema evolution and versioning for nested XML/JSON datasets.
- Knowledge of Scala or Java for extending Spark XML libraries.
- Exposure to Databricks, Delta Lake, or similar platforms.
- Certifications in AWS/Azure big data technologies.
