Data Engineer Senior #R104646
The PNC Financial Services Group, Inc. has an opening for a Data Engineer Senior in Columbus, OH. Within the Enterprise Data Management department, the position will be responsible for performing analytical tasks on vast amounts of structured and unstructured data to extract actionable business insights.
Specific duties include: (i) leading the development, support, and implementation of data solutions for multiple applications to meet business objectives and user requirements, and leveraging technical knowledge and industry experience to design, develop, and maintain technology solutions; (ii) leading the development of software applications for data requirement analysis and the development of data preparation processes for targeted data solutions; (iii) leading the development of software applications for monitoring big data analytics infrastructure, educating the business community, and encouraging the use of best practices; and (iv) implementing data governance and audit policies for big data ad hoc environments.
A Master's degree in Analytics, Economics, Engineering, Statistics, Mathematics, or Information Systems Management plus 3 years of experience with Hadoop development is required.
Experience must include: (i) extracting data from different sources, transforming it, and loading it into a central repository using the ETL tools Hive, Impala, Python, and Spark; (ii) implementing a wide range of functions, such as data processing, access, storage, governance, security, and operations, using Hive, Impala, Cloudera Manager, and Cloudera Navigator; (iii) identifying ways to improve data reliability, efficiency, and performance using Unravel; (iv) providing Hadoop onboarding sessions to LOB users that give an overview of the Hadoop environment and best practices on the Cloudera Hadoop platform; (v) using Sqoop to import and export data between HDFS and relational database systems (SQL Server, Oracle, Teradata); (vi) developing data pipelines using Flume to extract data from weblogs and store it in HDFS; (vii) writing shell scripts to monitor the health of Hadoop daemon services; (viii) optimizing existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and RDDs; and (ix) developing workflows using Oozie to automate loading data into HDFS and preprocessing it with PySpark.
40 hours/week, 8:00 am-5:00 pm.
PNC provides equal employment opportunity to qualified persons regardless of race, color, sex, religion, national origin, age, sexual orientation, gender identity, disability, veteran status, or other categories protected by law.