More than 400 malicious packages were recently uploaded to PyPI (Python Package Index), the official code repository for the Python programming language, in the latest indication that the targeting of software developers using this form of attack isn’t a passing fad.
All 451 packages found recently by security firm Phylum contained almost identical malicious payloads and were uploaded in bursts that came in quick succession. Once installed, the packages create a malicious JavaScript extension that loads each time a browser is opened on the infected device, a trick that gives the malware persistence over reboots.
The JavaScript monitors the infected developer’s clipboard for any cryptocurrency addresses that may be copied to it. When an address is found, the malware replaces it with an address belonging to the attacker. The objective: intercept payments the developer intended to make to a different party.
In November, Phylum identified dozens of packages, downloaded hundreds of times, that used highly encoded JavaScript to surreptitiously do the same thing. Specifically, it:
- Created a textarea on the page
- Pasted any clipboard contents to it
- Used a series of regular expressions to search for common cryptocurrency address formats
- Replaced any identified addresses with the attacker-controlled addresses in the previously created textarea
- Copied the textarea to the clipboard
“If at any point a compromised developer copies a wallet address, the malicious package will replace the address with an attacker-controlled address,” Phylum Chief Technical Officer Louis Lang wrote in the November post. “This surreptitious find/replace will cause the end user to inadvertently send their funds to the attacker.”
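The regular-expression search in the steps above can be illustrated in Python, although the malware itself was written in JavaScript. This is a minimal sketch and not code from the packages. The two patterns are simplified approximations of legacy Bitcoin and Ethereum address formats, and find_addresses is a hypothetical helper name. The same kind of check can be used defensively, for example to compare an address before and after pasting it.

```python
import re

# Simplified, assumed patterns; real address formats have more variants.
ADDRESS_PATTERNS = {
    "bitcoin-legacy": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
    "ethereum": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
}


def find_addresses(text):
    """Return (kind, address) pairs for every pattern match in text."""
    found = []
    for kind, pattern in ADDRESS_PATTERNS.items():
        found.extend((kind, match) for match in pattern.findall(text))
    return found


# Example input: one legacy Bitcoin address and one dummy Ethereum-style address.
sample = "send to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa or 0x" + "ab" * 20
print(find_addresses(sample))
```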
New obfuscation method
Besides vastly increasing the number of malicious packages uploaded, the latest campaign also uses a significantly different way to cover its tracks. Whereas the packages disclosed in November used encoding to conceal the behavior of the JavaScript, the new packages write function and variable identifiers in what appear to be random 16-bit combinations of Chinese language ideographs found in the following table:
Unicode code point | Ideograph | Definition |
---|---|---|
0x4eba | 人 | man; people; mankind; someone else |
0x5200 | 刀 | knife; old coin; measure |
0x53e3 | 口 | mouth; open end; entrance, gate |
0x5973 | 女 | woman, girl; feminine |
0x5b50 | 子 | child; fruit, seed of |
0x5c71 | 山 | mountain, hill, peak |
0x65e5 | 日 | sun; day; daytime |
0x6708 | 月 | moon; month |
0x6728 | 木 | tree; wood, lumber; wooden |
0x6c34 | 水 | water, liquid, lotion, juice |
0x76ee | 目 | eye; look, see; division, topic |
0x99ac | 馬 | horse; surname |
0x9a6c | 马 | horse; surname |
0x9ce5 | 鳥 | bird |
0x9e1f | 鸟 | bird |
Using this table, the line of code
''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(((1 << 4) - 1) << 3) - 1, ((((3 << 2) + 1)) << 3) + 1, (7 << 4) - (1 << 1), ((((3 << 2) + 1)) << 2) - 1, (((3 << 3) + 1) << 1)]))
creates the built-in function chr and maps the function to the list of integers [119, 105, 110, 51, 50]. Then the line combines it into a string that ultimately creates 'win32'.
Phylum researchers explained:
We can see a series of these kinds of calls oct.__str__()[-3 << 0]. The [-3 << 0] evaluates to [-3] and oct.__str__() evaluates to the string '<built-in function oct>'. Using Python’s index operator [] on a string with a -3 will grab the 3rd character from the end of the string, in this case '<built-in function oct>'[-3] will evaluate to 'c'. Continuing with this on the other 2 here gives us 'c' + 'h' + 'r' and simply evaluating the complex bitwise arithmetic tacked on to the end leaves us with:
''.join(map(getattr(__builtins__, 'c' + 'h' + 'r'), [119, 105, 110, 51, 50]))
The getattr(__builtins__, 'c' + 'h' + 'r') just gives us the built-in function chr and then it maps chr to the list of ints [119, 105, 110, 51, 50] and then joins it all together into a string ultimately giving us 'win32'. This technique is continued throughout the entirety of the code.
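The walk-through above can be reproduced one piece at a time. The sketch below evaluates each obfuscated fragment separately so the intermediate values are visible. It imports the builtins module instead of relying on __builtins__, which is a plain dict inside imported modules, and it assumes a standard interpreter in which the site module has installed the copyright object.

```python
import builtins

# Each index is a small constant hidden behind a bit shift.
i_c, i_h, i_r = -3 << 0, -1 << 2, 4 << 0  # -3, -4, 4

# The string form of a built-in function looks like '<built-in function oct>',
# and the copyright text starts with 'Copyright', so index 4 is 'r'.
name = oct.__str__()[i_c] + hex.__str__()[i_h] + copyright.__str__()[i_r]

# The character codes are hidden behind shifts and additions in the same way.
codes = [
    (((1 << 4) - 1) << 3) - 1,
    ((((3 << 2) + 1)) << 3) + 1,
    (7 << 4) - (1 << 1),
    ((((3 << 2) + 1)) << 2) - 1,
    (((3 << 3) + 1) << 1),
]

# Look up the function by its reconstructed name and decode the string.
decoded = "".join(map(getattr(builtins, name), codes))
print(name, codes, decoded)
```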
While giving the appearance of highly obfuscated code, the technique is ultimately easy to defeat, the researchers said, simply by observing what the code does when it runs.
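Identifiers written in ideographs can also be spotted without running anything. The following sketch is an assumption of this article's editing rather than a method the researchers describe: it parses Python source with the standard ast module and reports every variable, function, class, or argument name that contains non-ASCII characters. The helper name non_ascii_identifiers is hypothetical. Non-ASCII identifiers are legal Python and appear in legitimate code, so a hit is only a prompt for closer inspection.

```python
import ast


def non_ascii_identifiers(source):
    """Return the set of identifiers in source that contain non-ASCII characters."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return {name for name in names if not name.isascii()}


# Harmless example source using ideographs from the table as identifiers.
example = "def 人口(山水):\n    日月 = 山水 + 1\n    return 日月\n"
print(sorted(non_ascii_identifiers(example)))
```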
The latest batch of malicious packages attempts to capitalize on typos developers make when downloading one of these legitimate packages:
- bitcoinlib
- ccxt
- cryptocompare
- cryptofeed
- freqtrade
- selenium
- solana
- vyper
- websockets
- yfinance
- pandas
- matplotlib
- aiohttp
- beautifulsoup
- tensorflow
- scrapy
- colorama
- scikit-learn
- pytorch
- pygame
- pyinstaller
Packages that target the legitimate vyper package, for instance, used 13 names that omitted or duplicated a single character or transposed two adjacent characters of the correct name:
- yper
- vper
- vyer
- vype
- vvyper
- vyyper
- vypper
- vypeer
- vyperr
- yvper
- vpyer
- vyepr
- vypre
“This technique is trivially easy to automate with a script (we leave this as an exercise for the reader), and as the length of the name of the legitimate package increases, so do the possible typosquats,” the researchers wrote. “For example, our system detected 38 typosquats of the cryptocompare package published nearly simultaneously by the user named pinigin.9494.”
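A short sketch shows how little code the three edits require and gives defenders a way to enumerate names worth watching. The rules here (omit one character, duplicate one character, swap two adjacent characters) are inferred from the vyper list above. The source does not confirm that the attacker used exactly these rules, and the function name typosquats is hypothetical.

```python
def typosquats(name):
    """Return names that differ from name by one omission, duplication, or adjacent swap."""
    candidates = set()
    for i in range(len(name)):
        candidates.add(name[:i] + name[i + 1:])        # omit character i
        candidates.add(name[:i] + name[i] + name[i:])  # duplicate character i
    for i in range(len(name) - 1):
        # swap characters i and i + 1
        candidates.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    candidates.discard(name)  # a swap of two equal characters changes nothing
    candidates.discard("")    # a one-character name minus its character
    return candidates


print(sorted(typosquats("vyper")))
print(len(typosquats("cryptocompare")))
```

For vyper, these rules yield 14 candidates: the 13 names listed above plus vypr. For cryptocompare they yield 38, which happens to equal the number of typosquats the researchers reported for that package.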
Malicious packages with names that closely resemble those of legitimate packages have appeared in legitimate code repositories since at least 2016, when a college student uploaded 214 booby-trapped packages with slightly modified names of legitimate packages to the PyPI, RubyGems, and NPM repositories. The result: The imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half the time it was given all-powerful administrative rights. So-called typosquatting attacks have flourished ever since.
The names of all 451 malicious packages the Phylum researchers found are included in the blog post. It’s not a bad idea for anyone who intended to download one of the legitimate packages targeted to double-check that they didn’t inadvertently obtain a malicious doppelganger.
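One way to run that double-check is to compare the packages installed in the current Python environment against a list of known-bad names. The sketch below is illustrative only. KNOWN_BAD_SAMPLE contains just the 13 vyper typosquats named in this article, not the full list of 451, and the helper names are hypothetical. Names are normalized the way PyPI does it: lowercased, with runs of hyphens, underscores, and periods collapsed to a single hyphen.

```python
import re
from importlib.metadata import distributions

# Sample only: the vyper typosquats named above, not the full list of 451.
KNOWN_BAD_SAMPLE = {
    "yper", "vper", "vyer", "vype", "vvyper", "vyyper", "vypper",
    "vypeer", "vyperr", "yvper", "vpyer", "vyepr", "vypre",
}


def normalize(name):
    """Normalize a project name: lowercase, runs of -, _ and . become one hyphen."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_names():
    """Return the normalized names of distributions in the current environment."""
    names = set()
    for dist in distributions():
        meta = dist.metadata
        raw = meta.get("Name") if meta is not None else None
        if raw:
            names.add(normalize(raw))
    return names


def flag_installed(installed, known_bad):
    """Return the sorted names present in both collections after normalization."""
    return sorted({normalize(n) for n in installed} & {normalize(n) for n in known_bad})


if __name__ == "__main__":
    print(flag_installed(installed_names(), KNOWN_BAD_SAMPLE))
```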