What is the core of the Python programming language?
Why ask this question?
It's no secret that I want a Python implementation for WebAssembly. It would not only get Python into the browser, but because both iOS and Android support running JavaScript as part of an app, it would also get Python onto mobile. That all excites me.
But when thinking about the daunting task of creating a new implementation of Python, my brain also began asking what exactly Python is. We have lived with CPython for so long that I suspect most of us simply think that "Python == CPython". PyPy tries to be so compatible that it even implements CPython's implementation details. Basically, most implementations of Python that I know of strive to pass CPython's test suite and to be as compatible with CPython as possible.
That's daunting. Python as implemented by CPython is very dynamic and exposes many things that only make sense if you implement Python with some kind of interpreter. For instance, PyPy has a baseline interpreter that its JIT builds on, but there are many things you can use in Python that force PyPy to turn off the JIT and stick with interpreting bytecode. The REPL alone makes things very dynamic, as everything you enter into it is parsed, compiled, and executed by the interpreter right then and there.
That has led me to contemplate what exactly Python is. What is the core of the language that makes it what it is? What baseline would all Python implementations need to cover in order to truly call themselves an implementation of Python that people would still recognize? Or, from my perspective, how much would one have to implement to compile Python directly to WebAssembly and still be considered a Python implementation?
Does Python need a REPL?
The thing that really made me start thinking about this was contemplating what it would take to compile Python down to WebAssembly. Not implementing another interpreter, but actually emitting static WebAssembly from Python source and still reasonably calling it "Python".
One thing I knew is that dynamic compilation via eval() or compile() might not be easily doable, as WebAssembly's security model validates modules at load time. That suggests it isn't structured to just run arbitrary code in other code's memory space, which might make implementing a REPL tricky.
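To make the problem concrete, here is a small example of the runtime compilation that eval() and compile() provide, and that a REPL performs on every line you type. The source string and namespace are made up for illustration; the point is that the code being compiled does not exist until the program is already running, so an ahead-of-time compiler would presumably have to bundle a compiler or interpreter of its own to support it.

```python
# Source text that only exists at runtime, e.g. a line typed into a REPL.
source = "total = sum(range(x))"

# compile() parses and compiles the text into a code object at runtime...
code_obj = compile(source, "<input>", "exec")

# ...and exec() runs that code object against a namespace dict.
namespace = {"x": 5}
exec(code_obj, namespace)
print(namespace["total"])  # 10

# eval() does the same for a single expression and returns its value.
print(eval("x * 2", namespace))  # 10
```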
But that got me thinking: does Python really need a REPL? Don't get me wrong, it's extremely handy, but if an implementation didn't have a REPL, would it no longer be Python? I would argue a REPL-less Python would still be Python, it would just be lacking a (potentially key) feature.
This led me down the road of thinking about which parts of Python are required for something to be considered "Python".
Could you live without locals()? It's a very dynamic thing to be able to arbitrarily gather all defined local variables and their values into a dictionary. If you're in an interpreter like CPython, you just get the locals by pulling together some things from the current execution frame. But in a compiled language this takes a lot more work, as you have to know when to gather all of this information since it isn't necessarily just lying around when one calls locals().
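Here is a small illustration of what locals() hands back in CPython. Both functions are hypothetical; note in the second one that the result only reflects the variables bound at the moment of the call, which is exactly the "when to gather" question a compiler would have to answer.

```python
def greet(name):
    greeting = "Hello"
    # locals() gathers every local variable bound in this frame into a dict.
    return locals()

def partial():
    a = 1
    # Only 'a' is bound at this point; 'snapshot' and 'b' are not yet assigned.
    snapshot = locals()
    b = 2
    return snapshot

print(greet("Wasm"))  # {'name': 'Wasm', 'greeting': 'Hello'}
print(partial())      # {'a': 1}
```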
Or how about people overriding locals() itself? Once again, this isn't a big deal in CPython, as the builtins module has a __dict__ whose entries you can replace, and the change will simply propagate to any future calls. But in a compiled language it takes far more effort to detect this sort of override, and the check ends up costing performance.
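For illustration, here is roughly what such an override looks like in CPython. The name locals is resolved when the call happens, falling back to the builtins module, so replacing the entry there changes what every later call sees. The replacement lambda is purely hypothetical, and the sketch restores the real built-in afterwards.

```python
import builtins

def show():
    # The name "locals" is looked up at call time: locals -> globals -> builtins.
    return locals()

original_locals = builtins.locals
try:
    # Replacing the entry in the builtins module (i.e. builtins.__dict__)
    # affects every later call that resolves the name "locals".
    builtins.locals = lambda: {"overridden": True}
    overridden_result = show()
    print(overridden_result)  # {'overridden': True}
finally:
    builtins.locals = original_locals  # restore the real built-in

print(show())  # {}
```

A compiler that wanted to turn a call to locals() into a direct, specialized operation would have to guard against exactly this kind of late rebinding.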
What about sys.settrace()? It triggers the callback on every call, every line executed, and every return (and even per bytecode if a frame asks for it), which doesn't quite work if the code is compiled. You can fake it by checking if a trace function is set after every line, but that seems a bit much when most of the time no such hook is set (it could potentially be a compiler flag to compile in such support, though).
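As a sketch of what a compiled implementation would have to emulate, the following trace function records the events CPython reports for a small hypothetical function named work(). Returning the tracer from the 'call' event is what turns on the per-line events for that frame.

```python
import sys

events = []

def tracer(frame, event, arg):
    # The global tracer sees 'call' events; returning a function installs it
    # as the local tracer for that frame, which then gets 'line'/'return'.
    if frame.f_code.co_name == "work":
        events.append((event, frame.f_lineno))
        return tracer
    return None

def work():
    a = 1
    b = a + 1
    return b

sys.settrace(tracer)
try:
    work()
finally:
    sys.settrace(None)  # always remove the hook

# Starts with 'call', ends with 'return', with 'line' events in between.
print([event for event, _ in events])
```

The recorded line numbers depend on where the function is defined, which is why the sketch only prints the event names.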
And how about sys._getframe()? Compiled languages do not necessarily have direct access to the execution frame, so do you bother simulating it? Since any function could request the execution frames, you would always need to be prepared to supply them on demand.
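A short example of why this is hard to avoid: in CPython, any function can reach up to its caller's frame and read the caller's name and local variables. The function and variable names here are hypothetical.

```python
import sys

def who_called_me():
    # sys._getframe(1) returns the caller's frame object. This is a CPython
    # implementation detail and is not guaranteed to exist elsewhere.
    caller = sys._getframe(1)
    return caller.f_code.co_name, caller.f_locals.get("secret")

def some_function():
    secret = 42
    return who_called_me()

print(some_function())  # ('some_function', 42)
```

Nothing in some_function() hints that its frame will be inspected, so a compiler cannot know ahead of time which functions need a frame available.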
As you can see, there is a lot of stuff in Python that makes compilation difficult (and thus more power to Nuitka for taking this challenge on). But I'm willing to bet you don't use the stuff I mentioned above 99.9% of the time, so if an implementation left it out, could it still be considered "Python"?
How much compatibility is necessary to be useful?
I don't have a good answer to this question. But its answer dictates how hard it is to implement Python and how compatible the implementation would be with preexisting software. I will say that I don't think Python on WebAssembly needs to support the vast amount of Python software out there to be useful. WebAssembly has access to other language ecosystems like Rust and JavaScript, so the chance that something you need is implemented in another language you can use instead is definitely above zero.
I have no answers
It might make sense to develop a compiler that translates Python code directly to WebAssembly and sacrifice some compatibility for performance. It might make sense to develop an interpreter that targets WebAssembly's design but maintains a lot of compatibility with preexisting code. It might make sense to simply support RustPython in their WebAssembly endeavours. Maybe Pyodide will get us there. I don't think any of these possibilities are inherently wrong and it will probably just come down to whichever one sparks people's interest enough to see it to the point of being useful to others.