Background
Back in 2020, when I was still new to programming, I stumbled upon a course on functional programming. The link is long gone, but I distinctly remember one assignment: implementing a concise JSON parser. Although I used JSON constantly in web development, I had never considered how a parser for it was actually built. Its syntax is recursive, and I couldn’t immediately wrap my head around that, which planted a seed of curiosity that stayed with me.
In 2021, I studied compiler theory and built a compiler for a C-like language. In 2022, I wrote an article introducing algorithms for parsing JSON syntax, accompanied by a rough implementation.
This year, I decided it was time to build a relatively complete JSON library, including serialization, deserialization, and all configuration parameters. I chose Python’s standard json library as my target. After finishing it, I realized the project didn’t actually require complex, abstract compiler knowledge. The real challenges lay in software engineering and the myriad details regarding string encoding.
I hope this article gives interested readers a taste of the fun and difficulties involved in implementing a robust JSON library. You can find the source code here. It will be easier to grasp if you follow along with both the article and the code, though you are certainly welcome to skip the source and try building it yourself.
Python’s json library actually consists of two implementations: a pure Python one and a C extension. The Python source is located at /Lib/json, while the C extension is at /Modules/_json.c. The library authors’ goal was to ensure speed while maintaining compatibility with pure Python environments (like PyPy). Since my current work focuses on Python, I implemented the features using only pure Python. When I started, the latest source code on GitHub was for Python 3.15, so I based my work on that version. (This decision, however, caused some obstacles later on, a lesson learned from my first attempt at reproducing an open-source library.)
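To make the two-implementation split concrete, here is a small sketch that inspects the standard library's own json package (not my reimplementation). It relies on module-level names as they appear in CPython's Lib/json/decoder.py, namely c_scanstring and py_scanstring; other interpreters may lay this out differently, so treat it as an illustration rather than a portable check. The helper scanstring_backend is a name made up for this example.

```python
import json.decoder


def scanstring_backend() -> str:
    """Report which string scanner the standard json module picked (hypothetical helper)."""
    # In CPython's decoder.py, c_scanstring is None when the _json
    # C extension could not be imported.
    if json.decoder.c_scanstring is not None:
        return "C extension"
    return "pure Python"


# The pure Python scanner is defined regardless of whether the C extension loaded.
text = '"caf\\u00e9"'
# The second argument is the index just past the opening quote.
value, end = json.decoder.py_scanstring(text, 1)

print(scanstring_backend())
print(value, end)
```

The scanner returns the decoded string together with the index just past the closing quote, which is how the decoder knows where to resume parsing.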
The project took about three weeks, and I passed almost all official test cases. The only exceptions were:
- C-extension-specific tests: I didn’t use C, so these do not apply.
- Command-line tool tests: As mentioned, I used the 3.15 source code (specifically for the tests). However, when I started coding, the latest available interpreter was 3.14. Consequently, some APIs used in the 3.15 tests were missing in my 3.14 interpreter, preventing those tests from running.
Additionally, I did not implement detailed error messaging for JSON parsing. Since this is a helper feature not covered by the official test cases, I skipped it.
The project consists of roughly 700 lines of application code and 800 lines of test code, so the ratio of application code to test code is approximately 1:1.
In fact, it only took me two or three days to implement the basic loads and dumps functionality. However, I ran into difficulties with the configuration parameters later on. The parts involving Unicode and escape characters were particularly obscure and required significant effort and research to understand.
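As a small taste of why the Unicode and escaping details take effort, the following sketch uses the standard json module (the behavior I was targeting, not my own code) to show how one configuration parameter, ensure_ascii, changes the output of dumps. The sample data is made up for illustration.

```python
import json

# Sample data chosen for illustration: one character inside the Basic
# Multilingual Plane (é) and one outside it (😀, U+1F600).
data = {"name": "café", "emoji": "😀"}

# ensure_ascii=True is the default: every non-ASCII character is escaped.
# Characters beyond U+FFFF are written as a UTF-16 surrogate pair.
ascii_text = json.dumps(data)

# With ensure_ascii=False the characters are emitted unchanged.
utf8_text = json.dumps(data, ensure_ascii=False)

print(ascii_text)  # {"name": "caf\u00e9", "emoji": "\ud83d\ude00"}
print(utf8_text)   # {"name": "café", "emoji": "😀"}

# Both forms decode back to the same Python object.
assert json.loads(ascii_text) == json.loads(utf8_text) == data
```

A reimplementation has to get both directions right: splitting a character into a surrogate pair when encoding, and joining the pair back into a single character when decoding.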