Senior PySpark Developer

Victrix Systems And Labs
  • Posted On: 2026-01-16 19:41:21
  • Openings: 10
  • Applicants: 0
Job Description

Key Responsibilities:

- Design and develop scalable PySpark pipelines to ingest, parse, and process deeply nested, highly hierarchical XML datasets.

- Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures.

- Optimize Spark jobs through partitioning, caching, and parallel processing to handle terabytes of XML data efficiently.

- Transform raw hierarchical XML data into structured DataFrames for analytics, machine learning, and reporting use cases.

- Collaborate with data architects and analysts to define data models for nested XML schemas.

- Troubleshoot performance bottlenecks and ensure reliability in distributed environments (e.g., AWS, Databricks, Hadoop).

- Document parsing logic, data lineage, and optimization strategies for maintainability.
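
For illustration only, the recursive parsing mentioned in the responsibilities above could look something like the minimal sketch below. It uses Python's standard-library `xml.etree.ElementTree` on a single small document rather than Spark or `spark-xml`. The function name `flatten`, the path format, and the sample XML are hypothetical and are not part of any codebase referenced in this posting.

```python
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Recursively walk an XML element and yield (path, text) pairs for leaf nodes.

    Hypothetical helper for illustration; attributes and mixed content are ignored.
    """
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        # Leaf node: emit its slash-separated path and its trimmed text.
        yield path, (element.text or "").strip()
        return
    for child in children:
        # Descend one level, carrying the accumulated path along.
        yield from flatten(child, path)


# Made-up sample document with three levels of nesting.
SAMPLE = (
    "<order><id>7</id><customer><name>Ana</name>"
    "<address><city>Chennai</city></address></customer></order>"
)

rows = list(flatten(ET.fromstring(SAMPLE)))
for path, value in rows:
    print(path, value)
```

In a real pipeline the flat (path, value) rows would more likely be produced by a distributed reader and written to a DataFrame. The sketch only shows the recursion pattern on one in-memory document.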

Qualifications:

- 5+ years of hands-on experience with PySpark and Spark XML libraries (e.g., `spark-xml`) in production environments.

- Proven track record of parsing XML data with 20+ levels of nesting using recursive methods and schema inference.

- Expertise in XPath, XQuery, and DataFrame transformations (e.g., `explode`, `struct`, `selectExpr`) for hierarchical data.

- Strong understanding of Spark optimization techniques: partitioning strategies, broadcast variables, and memory management.

- Experience with distributed computing frameworks (e.g., Hadoop, YARN) and cloud platforms (AWS, Azure, GCP).

- Familiarity with big data file formats (Parquet, Avro) and orchestration tools (Airflow, Luigi).

- Bachelor's degree in Computer Science, Data Engineering, or a related field.

Preferred Skills:

- Experience with schema evolution and versioning for nested XML/JSON datasets.

- Knowledge of Scala or Java for extending Spark XML libraries.

- Exposure to Databricks, Delta Lake, or similar platforms.

- Certifications in AWS/Azure big data technologies.

More Info
Full Time
Not Disclosed
English
Not Disclosed
Education: Any Graduate
Not Disclosed
Required Skills
PySpark, Java, Data Pipeline, Scala, Hadoop, Cloud, Big Data, Spark

Contact Details
Victrix Systems And Labs
+91 987654567
info@victrixsystems.com
  • Experience: 5 years
  • Salary: Above 10 lakhs annually
  • Location for Hiring: Chennai