OWASP 2022 Global AppSec San Francisco has ended
Global AppSec San Francisco returns November 14-18.

Designed for private and public sector infosec professionals, the two-day OWASP conferences equip developers, defenders, and advocates to build a more secure web. We are offering educational 1-day, 2-day, and 3-day training courses on November 14-16.

Join us for leading application security technologies, speakers, prospects, and the community, in a unique event that will build on everything you already know to expect from an OWASP Global Conference.
Friday, November 18 • 4:30pm - 5:30pm
Detecting Malicious PyPI Packages With Semgrep

Software packages are a juicy target for attackers. A compromised package gives malicious actors access to machines and production environments, where they can steal sensitive data or perform cryptojacking. In the last few months alone, multiple malicious Python packages were reported to steal credentials from their victims and were subsequently removed. In the worst case, these packages give advanced threat actors a way to reach victims, steal intellectual property, or carry out nation-state objectives, as seen in the Codecov and SUNBURST incidents.

What makes a “bad” package? How can we identify software packages that look malicious? In this talk, we start by showcasing some real-world malicious PyPI packages and the techniques they use to spread and execute code in victims’ environments. We then discuss how we use Semgrep, a static analysis tool designed for vulnerability detection, to scan the source code of PyPI packages and identify suspicious patterns characteristic of malware. Finally, we demonstrate the concept by dissecting malicious PyPI packages we found in the wild.

- Explanation of the SLSA threat model, with a focus on dependencies
- Short history of malicious PyPI packages and why they are a real problem: most existing tools look for previously detected malware and cannot identify never-before-seen malicious software
- Problem statement: How to identify malicious packages at scale?

Techniques used by PyPI malware (illustrated with real-world examples):
- Quick explanation of the data analyzed to find techniques: 30-40 packages that were removed from PyPI
- Explanation of the most common patterns found in malware:
  - Initial access: typosquatting, compromising the maintainer account, compromising the maintainer email domain
  - Execution: using a setup script, hooking a function, evaluating dynamic code
  - Exfiltration: using URL shorteners, stealing environment variables, using an unusual domain extension
  - Goal: cryptomining, stealing credentials
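
As an illustration of the typosquatting pattern listed above, here is a minimal Python sketch that flags a package name closely resembling a well-known one. It is not taken from the talk: the `POPULAR_PACKAGES` list, the `possible_typosquat_targets` helper, and the 0.85 threshold are all assumptions chosen for the example, and a real check would need a much larger reference list and tuning.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical, hand-picked reference list (an assumption for this sketch).
# A real check would load a far larger list of widely used projects.
POPULAR_PACKAGES = ["requests", "numpy", "pandas", "urllib3", "boto3"]


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two package names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def possible_typosquat_targets(name: str, threshold: float = 0.85) -> List[str]:
    """Return popular packages that `name` closely resembles without matching exactly."""
    normalized = name.lower()
    if normalized in POPULAR_PACKAGES:
        # An exact match is the legitimate package itself, not a lookalike.
        return []
    return [p for p in POPULAR_PACKAGES if similarity(normalized, p) >= threshold]
```

Calling `possible_typosquat_targets("reqeusts")` returns the popular names that the candidate resembles; an empty list means no close lookalike was found in the reference list. String similarity alone says nothing about intent, so a hit is only a reason to look closer.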

Writing Semgrep rules to catch malicious PyPI packages:
- Quick intro to Semgrep (30s)
- Semgrep taint analysis mode
- Explanation of detection heuristics created:
  - Execution of base64-encoded strings
  - Exfiltration over HTTP of sensitive information
  - Download and execution of an executable file
  - Executing commands in setup.py
- Putting it all together in a CLI
- Results overview: real-world malicious packages we caught and false positive rate
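
To make two of the heuristics above concrete (execution of base64-encoded strings and execution of shell commands), here is a rough Python sketch. It deliberately does not use Semgrep: it swaps in the standard-library `ast` module to show the kind of syntactic pattern such a rule would match. The function names, the finding labels, and the lists of flagged calls are assumptions made for this example, not the speakers' rules.

```python
import ast

# Call names treated as suspicious in this sketch (illustrative, not exhaustive).
DECODERS = {"b64decode", "urlsafe_b64decode"}
DYNAMIC_EXEC = {"exec", "eval"}
SHELL_CALLS = {
    ("os", "system"),
    ("subprocess", "run"),
    ("subprocess", "Popen"),
    ("subprocess", "call"),
    ("subprocess", "check_output"),
}


def call_name(node):
    """Return (module, attr) for `module.attr(...)` or (None, name) for `name(...)`."""
    func = node.func
    if isinstance(func, ast.Name):
        return (None, func.id)
    if isinstance(func, ast.Attribute):
        base = func.value.id if isinstance(func.value, ast.Name) else None
        return (base, func.attr)
    return (None, None)


def contains_base64_decode(node):
    """True if any call nested inside `node` looks like a base64 decode."""
    return any(
        isinstance(child, ast.Call) and call_name(child)[1] in DECODERS
        for child in ast.walk(node)
    )


def scan_source(source, filename="<unknown>"):
    """Parse (never execute) `source` and return sorted (line, label) findings."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        base, name = call_name(node)
        if base is None and name in DYNAMIC_EXEC and any(
            contains_base64_decode(arg) for arg in node.args
        ):
            findings.append((node.lineno, "exec-of-base64"))
        elif (base, name) in SHELL_CALLS:
            findings.append((node.lineno, "shell-command"))
    return sorted(findings)
```

Running `scan_source` on the text of a package's `setup.py` yields line numbers paired with a label for each match. This sketch only catches a decode call nested directly inside `exec` or `eval`; following a decoded value through intermediate variables is the kind of data-flow tracking that a taint analysis is meant to handle.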

- Brief summary
- Future work: Running it at scale and continuously in AWS Lambda

Andrew Krug

Ellen Wang

Security Research Intern, Datadog
Ellen Wang is a security research intern at Datadog and lives in Boston, MA. She is currently pursuing a master’s degree in computer science at MIT. When she's not working, she is probably playing board games or baking pastries.

Friday November 18, 2022 4:30pm - 5:30pm PST
Bayview A