Source Code vs Binary Analysis for SBOMs
SBOM generation techniques
There are two primary techniques for generating a Software Bill of Materials (SBOM):
- Source code analysis
- Binary analysis
Each technique has its own strengths and weaknesses, and an ideal solution would combine the two. This article provides a brief overview of each technique and its pros and cons, then wraps up with a quick comparison of the results.
Source code analysis
As the name implies, source code analysis refers to analyzing the source code of the application. There are a variety of ways that developers include 3rd party SDKs but all approaches, in some way, list their direct dependencies in their source code. These lists, often persisted as manifest files, can be inspected statically to list the direct dependencies.
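As a rough illustration of inspecting a manifest statically, the sketch below reads the direct dependencies out of an npm package.json. The sample manifest and the helper name direct_dependencies are made up for this example; only the dependencies and devDependencies keys come from the npm manifest format.

```python
import json

# Hypothetical manifest contents; a real one would be read from package.json.
SAMPLE_MANIFEST = """
{
  "name": "example-app",
  "version": "1.0.0",
  "dependencies": {"react": "18.2.0", "react-native": "0.71.0"},
  "devDependencies": {"jest": "^29.0.0"}
}
"""


def direct_dependencies(manifest_text):
    """Return {name: version_spec} for the direct dependencies in a package.json."""
    manifest = json.loads(manifest_text)
    deps = {}
    # Both sections list packages the developer asked for directly.
    for section in ("dependencies", "devDependencies"):
        deps.update(manifest.get(section, {}))
    return deps


if __name__ == "__main__":
    for name, spec in sorted(direct_dependencies(SAMPLE_MANIFEST).items()):
        print(f"{name} {spec}")
```

Note that the values are version specifications (possibly ranges), not resolved versions, and nothing in this file reveals what those packages pull in themselves.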
At first pass, the story should end here. But there are real challenges to this approach, especially for mobile apps.
Challenge 1 - transitive dependencies
The first major challenge is that nearly all the 3rd party SDKs you include in your app also have 3rd party dependencies (yes, turtles all the way down). These indirect transitive dependencies most certainly impact the security and privacy of your app but are not always easy to resolve in source code analysis.
Modern applications have grown in complexity and there are now complex build tools and package managers for including 3rd party SDKs and their dependencies. Some examples for mobile apps include build tools like Gradle and Maven for Android and package managers for iOS including CocoaPods, Swift Package Manager and Carthage. For both Android and iOS, npm or yarn can be used to build cross-platform React Native apps. And this is just the start.
When a developer builds a mobile app, the various direct and transitive dependencies are downloaded and then bundled into the app. In some instances (e.g. npm), a nearly full list of direct and transitive dependencies is stored in manifest files that can be analyzed. However, additional dependencies are often still added and simply missed by source code analysis (see vulnerable OpenSSL example below).
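To make the npm case concrete, here is a minimal sketch that lists every resolved package in a package-lock.json and labels it as direct or transitive. It assumes the lockfile uses the "packages" layout found in lockfileVersion 2 and 3; the sample lockfile and the helper name resolved_packages are made up for illustration.

```python
import json

# Hypothetical, heavily trimmed lockfile; a real one has many more fields.
SAMPLE_LOCKFILE = """
{
  "name": "example-app",
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "example-app", "dependencies": {"react": "18.2.0"}},
    "node_modules/react": {"version": "18.2.0"},
    "node_modules/loose-envify": {"version": "1.4.0"},
    "node_modules/js-tokens": {"version": "4.0.0"}
  }
}
"""


def resolved_packages(lockfile_text):
    """Return sorted (name, version, kind) tuples for every locked package."""
    packages = json.loads(lockfile_text).get("packages", {})
    # The entry with the empty key describes the project itself.
    direct = set(packages.get("", {}).get("dependencies", {}))
    result = []
    for path, info in packages.items():
        if not path:
            continue
        # The package name is whatever follows the last "node_modules/".
        name = path.rsplit("node_modules/", 1)[-1]
        is_direct = name in direct and path == "node_modules/" + name
        kind = "direct" if is_direct else "transitive"
        result.append((name, info.get("version", ""), kind))
    return sorted(result)


if __name__ == "__main__":
    for name, version, kind in resolved_packages(SAMPLE_LOCKFILE):
        print(f"{name} {version} ({kind})")
```

Even a complete lockfile only covers what the package manager resolved; anything added later in the build (for example native libraries bundled by an SDK) will not appear here.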
Finally, one major advantage of static source code analysis is the static part: you are analyzing code that isn’t changing. However, by definition the build tools and package managers are in fact dynamic, which breaks the simplicity model and requires “static” SBOM analysis to dynamically run tooling.
Challenge 2 - source code access
While I often think about SBOM generation from the developer perspective, there’s an equally high need for security teams to assess their software supply chain to understand what code is running in their organization. Generally speaking, the customer will not have access to the source code of the systems they are running. Of course they can rely on the developer to provide the SBOM, but the industry isn’t ready for that yet and you have to trust that the software provider has generated a complete SBOM for their application.
The Log4j vulnerability in late 2021 is a sobering example of the risks involved. Within a day of the world learning about the vulnerability, it was being actively exploited in the wild (as reported by multiple CERTs). The vulnerability scored a “perfect 10” on the CVSS scale (CVE-2021-44228). That sent security teams scrambling to find every install of Log4j throughout their entire enterprise. This was an extremely difficult task for the software companies wrote themselves, but almost impossible for the software they use. Rob Joyce, NSA Director of Cybersecurity, tweeted:
“The Log4j vulnerability is a significant threat for exploitation due to the widespread inclusion in software frameworks, even NSA’s GHIDRA. This is a case study in why the software bill of material (SBOM) concepts are so important to understand exposure.” - Rob Joyce, NSA Director of Cybersecurity.
So without the source code of the apps they use, companies cannot assess and mitigate the vulnerability. This is where binary analysis can have a major impact.
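To show why having SBOMs on hand helps in a Log4j-style scramble, here is a minimal sketch that searches a set of CycloneDX JSON documents for a component by name. The app names, the sample SBOMs and the helper name apps_containing are all hypothetical; the sketch only assumes the top-level "components" array with "name" and "version" fields, and it ignores nested components.

```python
# Hypothetical SBOMs keyed by app name; real ones would be loaded from JSON files.
SAMPLE_SBOMS = {
    "hr-portal": {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "components": [
            {"type": "library", "name": "log4j-core", "version": "2.14.1"},
            {"type": "library", "name": "guava", "version": "31.0"},
        ],
    },
    "chat-client": {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "components": [
            {"type": "library", "name": "okhttp", "version": "4.9.0"},
        ],
    },
}


def apps_containing(sboms, needle):
    """Return sorted (app, component name, version) tuples matching needle."""
    needle = needle.lower()
    hits = []
    for app, sbom in sboms.items():
        for component in sbom.get("components", []):
            name = component.get("name", "")
            if needle in name.lower():
                # Binary-derived SBOMs may lack a version, so default it.
                hits.append((app, name, component.get("version", "unknown")))
    return sorted(hits)


if __name__ == "__main__":
    for app, name, version in apps_containing(SAMPLE_SBOMS, "log4j"):
        print(f"{app}: {name} {version}")
```

A name match only tells you where to look; deciding whether a given version is affected still means checking it against the advisory.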
Binary analysis
As the name implies, binary analysis refers to analyzing the application binary directly. Using various reverse engineering techniques, it’s possible to determine (with varying levels of fidelity) the 3rd party SDKs an app uses.
This is obviously a game changer as consumers of software can now analyze the actual software they run to understand the dependencies.
Furthermore, binary analysis is generally able to determine transitive dependencies, applying the same technique recursively for all 3rd party SDKs and their dependencies.
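For a very small taste of what looking inside the shipped artifact means, the sketch below lists the native libraries packaged in an Android APK. It relies on the fact that an APK is a ZIP archive with native code under lib/<ABI>/; the fake APK built in memory, its library names and the helper names are made up so the example runs without a real app. Real binary analysis goes far beyond file names.

```python
import io
import zipfile


def native_libraries(apk_bytes):
    """Return sorted (abi, library file name) pairs found under lib/ in an APK."""
    found = set()
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for entry in apk.namelist():
            parts = entry.split("/")
            # Native libraries are stored as lib/<abi>/<name>.so
            if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
                found.add((parts[1], parts[2]))
    return sorted(found)


def build_sample_apk():
    """Build a tiny fake APK in memory (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        apk.writestr("AndroidManifest.xml", b"")
        apk.writestr("classes.dex", b"")
        apk.writestr("lib/arm64-v8a/libssl.so", b"")
        apk.writestr("lib/arm64-v8a/libflipper.so", b"")
    return buffer.getvalue()


if __name__ == "__main__":
    for abi, library in native_libraries(build_sample_apk()):
        print(f"{abi}: {library}")
```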
And binary analysis has the added benefit of assessing the fully compiled software, so the list of dependencies is the de facto one (well, apps can dynamically include code but that’s for a different blog). A prime example is how Apple and Google have moved to compiling apps on the fly for each download. This allows them to provide a smaller download specific to the device/OS; however, it means that generating the SBOM from source code before it’s compiled and sent to the device can produce an inaccurate SBOM.
So again, the story should end here with applause. But unfortunately security isn’t easy and there are major challenges.
Challenge 1 - missing version details
Compilers often strip properties important for SBOM analysis out of the binary. The most common example is version information. While version information is present some of the time, quite often it is missing, which means assessing whether a vulnerability is present again requires source code.
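The following sketch illustrates the problem by searching raw bytes for an embedded OpenSSL version banner. The two byte strings are fabricated stand-ins for native libraries, and the regular expression is an assumption about how the version text appears; when the banner has been stripped, the function can only report that no version was found.

```python
import re

# Assumed banner shape, e.g. "OpenSSL 1.1.1k  25 Mar 2021".
OPENSSL_VERSION = re.compile(rb"OpenSSL (\d+\.\d+\.\d+[a-z]*)")


def find_openssl_version(blob):
    """Return the embedded OpenSSL version string, or None if it is absent."""
    match = OPENSSL_VERSION.search(blob)
    return match.group(1).decode("ascii") if match else None


# Fabricated byte strings standing in for two builds of a native library.
WITH_BANNER = b"\x7fELF\x00\x00OpenSSL 1.1.1k  25 Mar 2021\x00SSL_connect\x00"
STRIPPED = b"\x7fELF\x00\x00SSL_connect\x00SSL_read\x00"

if __name__ == "__main__":
    print(find_openssl_version(WITH_BANNER))
    print(find_openssl_version(STRIPPED))
```

In the stripped case the symbol names still hint that OpenSSL is present, but without a version there is no way to match it against a CVE from the binary alone.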
Challenge 2 - dependency tree
While binary analysis is often able to detect the presence of components, it becomes extremely difficult to build deeply nested dependency trees. So we might be able to determine that OpenSSL is present, but which code included it is not always available.
Combining source and binary analysis
Perhaps the ideal situation is, when possible, to combine source code and binary analysis for SBOM generation. Source code analysis can often provide high-fidelity, nested dependency information, and binary analysis can then catch the transitive dependencies that source code analysis missed.
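One way such a combination could look is sketched below: take the component lists from both techniques, prefer the version reported by source analysis, and record which technique saw each component. The component names, versions and the helper name merge_components are illustrative assumptions, not output from any real tool.

```python
def merge_components(source, binary):
    """Merge {name: version-or-None} maps from source and binary analysis."""
    merged = {}
    for name in sorted(set(source) | set(binary)):
        found_by = [
            label
            for label, components in (("source", source), ("binary", binary))
            if name in components
        ]
        # Source analysis usually knows the version; fall back to binary.
        version = source.get(name) or binary.get(name)
        merged[name] = {"version": version, "found_by": found_by}
    return merged


# Hypothetical results from each technique.
SOURCE_RESULT = {"okhttp": "4.9.0", "flipper": "0.125.0"}
BINARY_RESULT = {"okhttp": None, "flipper": None, "openssl": "1.1.1k"}

if __name__ == "__main__":
    for name, info in merge_components(SOURCE_RESULT, BINARY_RESULT).items():
        print(name, info["version"], "+".join(info["found_by"]))
```

Components found only by binary analysis are the interesting ones here, since they are exactly what a manifest-based SBOM would have missed.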
In practice, this could work well for the apps you build but security teams will likely need to embrace binary analysis for the apps they use.
Comparing source and binary SBOM results
To wrap things up, let’s take a look at the difference between the results. Instead of an in-depth analysis, we’ll just point out a few obvious items.
For this comparison, we will use the output from gradle dependencies versus the CycloneDX SBOM from NowSecure’s binary analysis. You can generate these SBOMs yourself following the steps I shared in my technical tutorial for generating an Android SBOM.
- Component versions: the list of dependencies from gradle contains full version info while binary analysis only has version info for some components
- Transitive dependencies: while gradle lists some transitive dependencies, it misses important ones like OpenSSL, which was caught with binary analysis. It was included in the debug build (not production!) as part of Flipper and it has 8 known vulnerabilities including CVE-2018-0732 (7.5) and CVE-2019-1543 (7.4)
- You can leverage binary analysis on all the software you use (or build) but you need the source code for source analysis
- I spent several days trying various tooling to output a list of Android and/or iOS dependencies. It was a frustrating and nearly unsuccessful effort that was incredibly costly in terms of my time!
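For readers who want to turn the gradle dependencies tree into a flat list they can compare against an SBOM, here is a minimal parsing sketch. The sample text is an illustrative fragment in the shape Gradle prints, not the actual output used for this comparison, and the regular expression only handles the common group:name:version form (with an optional "-> resolved" suffix); the helper name parse_gradle_tree is made up.

```python
import re

# Matches lines such as "+--- group:name:1.0 -> 1.1" or "\--- group:name:1.0 (*)".
TREE_LINE = re.compile(r"[+\\]--- ([^:\s]+):([^:\s]+):(\S+)(?: -> (\S+))?")

# Illustrative fragment in the shape printed by `gradle dependencies`.
SAMPLE_OUTPUT = r"""
releaseRuntimeClasspath - Runtime classpath of compilation 'release'.
+--- com.squareup.okhttp3:okhttp:4.9.0
|    +--- com.squareup.okio:okio:2.8.0
|    |    \--- org.jetbrains.kotlin:kotlin-stdlib:1.4.0 -> 1.4.10
|    \--- org.jetbrains.kotlin:kotlin-stdlib:1.4.10 (*)
\--- com.facebook.flipper:flipper:0.125.0
"""


def parse_gradle_tree(text):
    """Return {"group:name": {"version": ..., "depth": ...}} from tree output."""
    components = {}
    for line in text.splitlines():
        match = TREE_LINE.search(line)
        if not match:
            continue
        group, name, declared, resolved = match.groups()
        coordinate = f"{group}:{name}"
        # Each nesting level is indented by five characters; depth 0 is direct.
        depth = match.start() // 5
        previous = components.get(coordinate)
        # Keep the shallowest occurrence of each component.
        if previous is None or depth < previous["depth"]:
            components[coordinate] = {"version": resolved or declared, "depth": depth}
    return components


if __name__ == "__main__":
    for coordinate, info in sorted(parse_gradle_tree(SAMPLE_OUTPUT).items()):
        kind = "direct" if info["depth"] == 0 else "transitive"
        print(f"{coordinate}:{info['version']} ({kind})")
```

The resulting coordinates can then be checked against the component names in a binary-derived SBOM, keeping in mind that the two sources will not always name a component the same way.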
In the end, the best results probably come from combining source and binary analysis but, to date, I am not aware of a single tool that automatically provides both capabilities.
Next Steps
If you’re new to Software Bill of Materials, I hope you found this blog useful. I’m planning a number of follow-up parts to this series including:
- Technical introduction to Software Bill of Materials (SBOMs)
- How to generate a Node.JS SBOM in CycloneDX format
- Source code vs binary analysis for SBOMs
- How to generate an Android (React Native) SBOM in CycloneDX format
- Generating an Android SBOM on each build of your mobile app with GitHub Actions
- Generating an iOS SBOM on each build of your mobile app with GitHub Actions
- Leveraging Dependency-Track to continuously analyze your mobile SBOMs
If you have any suggestions for other topics or feedback in general, please connect with me and let me know!