
URL: https://repost.aws/questions/QU1K-BpameRU-rZkf2veFAzg/not-able-to-import-lxml-in-aws-glue-job

Not able to import lxml in AWS Glue Job | AWS re:Post



Hi Team, I have a complex nested XML file that I want to read with AWS Glue and convert to Parquet format. I want to use the pandas read_xml function to read the file, but I get an error that lxml is not found. So I added these two keys to the AWS Glue job parameters: --python-modules-installer-option (value: --upgrade) and --additional-python-modules (value: lxml==4.9.2). In the code, I tried to import etree from lxml with: from lxml import etree

But I still get an error that the lxml module is not found. Please guide me. My aim is to use the pandas read_xml function to read an XML file in S3 from an AWS Glue job.
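
A quick way to narrow this kind of problem down is to check, at the top of the job script, whether the module is visible to the interpreter at all before pandas tries to use it. The following is only a minimal sketch using the Python standard library; the helper name module_status is hypothetical, and it assumes a Python 3.8+ runtime (the cp310 wheel in the install log suggests Python 3.10).

```python
import importlib.util
from importlib import metadata


def module_status(name: str) -> str:
    """Return a one-line description of whether `name` can be imported."""
    # find_spec returns None when no top-level module with this name is found.
    if importlib.util.find_spec(name) is None:
        return f"{name}: NOT importable"
    try:
        # Look up the installed distribution's version, if one is recorded.
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = "unknown version"
    return f"{name}: importable ({version})"


# Example use at the start of a job script:
# print(module_status("lxml"))
# print(module_status("pandas"))
```

If lxml is reported as not importable, the installation step itself failed, and the job's error log is the place to look for the reason.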


asked 3 years ago · 765 views

1 Answer
Accepted Answer

I cannot reproduce that. Is it possible that you are running in a VPC without internet access?
Check the error log. You should see a command like the one below and, hopefully, the reason the module was not installed (maybe some conflict):

pip3 install --upgrade --user lxml==4.9.2
Collecting lxml==4.9.2
  Downloading lxml-4.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.1 MB)
Checking pymodule installation result for List(lxml==4.9.2): 

answered 3 years ago

  • User-8937554
    3 years ago

    Yes, you are right. I am working on a client machine in a private subnet with no internet access. Thank you, sir!
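
    For anyone hitting the same symptom: the missing internet access can be confirmed from inside the job with a plain TCP connection test. This is a minimal sketch using only the standard library; the helper name can_reach is hypothetical, and pypi.org on port 443 is an assumption about where pip downloads from by default.

    ```python
    import socket


    def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            # create_connection resolves the host name and opens the socket.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            return False


    # Example use inside the job script (assumed default package index host):
    # print("pypi.org reachable:", can_reach("pypi.org"))
    ```

    If this prints False, the module install step has no route to the package index, which matches the private-subnet situation described above.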