[SOLVED] Parse XML on BeautifulSoup

jmorillo · October 16, 2024 - 3:38 PM

Hello,

I need to create a "Clari Copilot" package (easy, their installer.exe works correctly with /S).
However, I'm stuck on the update_package function in update_package.py: the binary is hosted on a CDN with no main HTML page, but I was able to find an XML page listing the releases.
In setupdevhelpers.py, there are the bs_find and bs_find_all functions (which call BeautifulSoup (bs4)) with features="html.parser" by default.
BeautifulSoup, as well as the bs_find* functions, accepts features="xml". However, for that BeautifulSoup needs the "lxml" Python library, which I believe isn't present by default in WAPT's Python virtual environment.
I could create a crude parser using a workaround, but it would be better to use bs_find* and BeautifulSoup natively with XML.
Do you have any suggestions? Is there a plan to integrate the lxml library into a future release? Or perhaps I've missed something?
Thank you very much in advance.
Sincerely,
Jordi

October 16, 2024 - 4:45 PM

Hi Jordi,
You can still parse the XML with the HTML parser (you'll get a warning). This package does exactly that: https://wapt.tranquil.it/store/fr/tis-0install
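
To illustrate why an HTML parser can cope with XML, here is a minimal sketch that uses only Python's standard html.parser module, which BeautifulSoup's "html.parser" feature builds on. It does not use bs_find_all or bs4. The sample document, the TagTextCollector class and the collect_tag_text helper are hypothetical names made up for this example, and the XML is not the real release feed.

```python
from html.parser import HTMLParser

# Hypothetical sample modelled on a CDN release listing; not the real feed.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult>
  <Contents><Key>releases/1.0.0/installer.exe</Key></Contents>
  <Contents><Key>releases/1.1.0/installer.exe</Key></Contents>
</ListBucketResult>"""


class TagTextCollector(HTMLParser):
    """Collect the text content of every element with a given tag name."""

    def __init__(self, wanted_tag):
        super().__init__()
        # html.parser reports tag names in lowercase, so normalise the
        # requested name the same way before comparing.
        self.wanted_tag = wanted_tag.lower()
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted_tag:
            self.inside = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == self.wanted_tag:
            self.inside = False

    def handle_data(self, data):
        # Only keep text that appears inside the wanted element.
        if self.inside:
            self.values[-1] += data


def collect_tag_text(markup, tag):
    """Return the text of each <tag> element found in markup."""
    parser = TagTextCollector(tag)
    parser.feed(markup)
    parser.close()
    return parser.values


if __name__ == "__main__":
    for key in collect_tag_text(SAMPLE_XML, "Key"):
        print(key)
```

Because the requested name is lowercased before comparison, this helper gives the same result for "Key" and "key".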

jmorillo · October 16, 2024 - 6:33 PM

Thank you so much, Bertrand!
Everything is working correctly!
Just one minor issue: the XML element is declared with a capital letter, like this: "<Key>"

I got no results with:

bs_find_all('https://contoso.com/test.xml', 'Key')

I had to write the tag name in lowercase (Key -> key), as in

bs_find_all('https://contoso.com/test.xml', 'key')

for a result to be displayed.
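
The lowercase requirement comes from the HTML parser, which normalises tag names. If matching the name exactly as written in the XML matters, one alternative that needs neither lxml nor bs4 is the standard library's xml.etree.ElementTree. This is a sketch of a different technique, not of what bs_find_all does internally; the sample document and the find_all_text helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on a CDN release listing; not the real feed.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult>
  <Contents><Key>releases/1.0.0/installer.exe</Key></Contents>
  <Contents><Key>releases/1.1.0/installer.exe</Key></Contents>
</ListBucketResult>"""


def find_all_text(xml_bytes, tag):
    """Return the text of every element named exactly `tag` (case-sensitive)."""
    root = ET.fromstring(xml_bytes)
    # Element.iter walks the whole tree and matches tag names as written.
    # Note: if the document declares a default namespace, the tag must be
    # given in qualified form, e.g. "{namespace-uri}Key".
    return [element.text for element in root.iter(tag)]


if __name__ == "__main__":
    print(find_all_text(SAMPLE_XML, "Key"))  # matches the two <Key> elements
    print(find_all_text(SAMPLE_XML, "key"))  # no match: XML names are case-sensitive
```

A real XML parser is stricter than the HTML parser, so a malformed feed raises an error here instead of being parsed leniently.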
In any case, I will be able to finish the update_package function.
Thank you so much again.

October 17, 2024 - 3:47 PM

Hi Jordi,
Thanks for your feedback.

I'm marking the topic as resolved.
Denis