Python best practices and common security issues

Gergő Turcsányi (Software Engineer, Avatao)

Python is a high-level, flexible programming language with some great features, and getting the most out of it takes knowing how to use it well. Like any widely used language, Python has its security pitfalls. This post aims to raise awareness by covering some of the most common security concerns and vulnerabilities, along with Python best practices for avoiding them.

    1. input() (Python 2)

    They say security starts with Python 3, and this one is a classic example of that. In Python 2, this function doesn’t simply read user input: it also evaluates it immediately (like `eval()`). It works as expected with numbers, but once you start entering strings, you’ll see that it tries to find variables with the submitted names and throws an error if it can’t. Fortunately, that behavior makes it hard for this function to end up in production code, but the key takeaway is the insecure default.
    In Python 2, you should use raw_input() instead to read user input as a string. In Python 3 this behavior has changed, so you can use input() for this purpose.
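
In Python 3, input() always returns a string, so any conversion has to be explicit. As a minimal sketch (the helper name parse_user_value is hypothetical and not part of the original post), ast.literal_eval() can turn typed-in literals into values without evaluating arbitrary expressions:

```python
import ast

def parse_user_value(text):
    """Convert a typed-in literal (number, string, list, ...) to a value.

    ast.literal_eval() only accepts Python literals; names, function calls
    and attribute lookups are rejected instead of being evaluated.
    """
    try:
        return ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError):
        # Hypothetical error policy: report every rejected input the same way.
        raise ValueError(f'not a plain literal: {text!r}') from None

# Typical use: value = parse_user_value(input('Enter a number: '))
```

Unlike Python 2’s input(), a string such as `__import__("os").system("id")` is rejected here rather than executed.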

      2. str.format()

      This function can be used – not surprisingly – to format strings. The trouble begins when the string contains user input before calling its `format()` function, because it can lead to vulnerabilities in special cases, like the one below:

      CONFIG = {'SECRET_KEY': '12345'}

      class User:
          def __init__(self, name):
              self.name = name

      user = User('Joe')
      print('Hello {user.name}'.format(user=user))
      # print('Already existing user input: {user.__class__.__init__.__globals__[CONFIG]}'.format(user=user))
      # print(f'Hello {user.name}')

      If the user input ends up in the string before formatting, then attackers can leak the contents of the sensitive config dictionary like this:

      print('Malicious user input here: {user.__class__.__init__.__globals__[CONFIG]}'.format(user=user))

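      Our recommended solution is to use f-strings instead, as they’re newer, simpler, and more secure, because the template is a literal in your source code rather than data an attacker can supply:

      print(f'Hello {user.name}')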
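
When the template text itself has to come from users, an f-string is not an option, since it must be written in the source code. One possible alternative, sketched here under that assumption, is string.Template from the standard library, whose placeholders cannot reach into attributes or indexes. The helper name render_greeting is hypothetical.

```python
from string import Template

def render_greeting(template_text, name):
    """Fill a user-editable template that uses $-placeholders only.

    Template substitutes plain identifiers such as $name. It has no syntax
    for attribute or index access, so '{user.__class__...}' stays literal.
    """
    return Template(template_text).safe_substitute(name=name)

print(render_greeting('Hello $name', 'Joe'))
```

With this approach, the malicious format string from the example above is returned unchanged instead of being resolved against the object graph.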

        3. yaml.load()

        I think this is one of the most known textbook examples of dangerous Python functions. Although it’s not a built-in function (like the others above), there’s a good chance you have to parse YAML files using Python at some point in your career, and usually the PyYAML module is the recommended choice for that:

        import yaml
        example = '''
        name: Joe
        age: 99
        '''
        print(yaml.load(example))

        Fortunately, PyYAML is helpful and honest, so it warns us about the unsafe and deprecated function call: “YAMLLoadWarning: calling yaml.load() without Loader=… is deprecated, as the default Loader is unsafe.” Passing an explicit safe loader, or calling yaml.safe_load(), avoids the problem. In case you’re wondering why the default is so dangerous, we have prepared a nice exploit that executes os.system("cat /etc/passwd") to list the user database:

        exploit = '''!!python/object/new:tuple [!!python/object/new:map [!!python/name:eval , [ '__import__("os").system("cat /etc/passwd")' ]]]'''

          4. Shell command injection

          If you’re running processes or OS commands with user-supplied values or parameters, then there’s a risk of attackers injecting malicious payloads to achieve Remote Code Execution on your server.

          Let’s say we want to echo user input for the sake of simplicity. Probably the most naive approach is something like os.system('echo ' + user_input). Hopefully, you’re familiar with SQL injections and already know that a string concatenation like this rarely ends well. A malicious actor can easily exploit your application with a payload like `hello; cat secret.txt` to read the contents of arbitrary files.
          Of course, nowadays there are more sophisticated tools and best practices for calling OS commands and running subprocesses. Not surprisingly, the name of the most popular module for this purpose is subprocess. It provides several ways to start new processes and even communicate with them, such as subprocess.run(). When the command is passed as a list of arguments, these functions are protected against command injection vulnerabilities by default.

          import subprocess
          subprocess.run(['echo', user_string])

          According to the documentation: “Unlike some other popen functions, this implementation will never implicitly call a system shell. This means that all characters, including shell metacharacters, can safely be passed to child processes.”
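
A minimal sketch of the list form in action. To stay runnable on any platform, it uses the current Python interpreter (sys.executable) as a stand-in for echo; that substitution and the helper name echo_safely are assumptions made for illustration.

```python
import subprocess
import sys

def echo_safely(user_string):
    """Echo user input through a child process without involving a shell."""
    # Each list element becomes exactly one argv entry of the child process,
    # so ';', '|' or quotes in user_string are never interpreted by a shell.
    result = subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.argv[1])', user_string],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip('\n')

print(echo_safely('hello; cat secret.txt'))
```

The payload comes back as plain text because no shell ever parses it.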
          The trouble begins when you want to use special shell characters (like pipes `|` or redirects `>`) because you need to invoke the shell explicitly in this case, as you can see below. Unfortunately, it comes with a downside, since `shell=True` makes our code unprotected against command injection vulnerabilities again.

          subprocess.run(f'echo "{user_string}" >> user_string.txt', shell=True)

          If you really have to call subprocesses with shell=True and user input using Python, there’s still a way to do it securely. According to the official documentation: “The shlex.quote() function can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands.”

          import shlex
          subprocess.run(f'echo {shlex.quote(user_string)} >> user_string.txt', shell=True)
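
To see what shlex.quote() does to a hostile value without running anything, the command string can be built and then tokenized again with shlex.split(), which follows POSIX shell quoting rules. This is only a sketch; the helper name build_append_command is hypothetical.

```python
import shlex

def build_append_command(user_string):
    """Build the shell command from the example above with quoted input."""
    return f'echo {shlex.quote(user_string)} >> user_string.txt'

payload = "hello'; cat 'secret.txt"
command = build_append_command(payload)

# Re-tokenize the command the way a POSIX shell would split it into words.
tokens = shlex.split(command)
print(tokens)
```

The whole payload ends up as a single argument to echo, so the embedded `; cat` never becomes a second command.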


          5. Working with archives

          Archives are a convenient way to package several files into one. Python can be used to extract compressed data, but as usual, you should validate untrusted input to prevent vulnerabilities. Let’s imagine we’re running an application that allows users to upload archives and automatically extracts them into a public directory called uploads.
          One of the most popular built-in libraries for this purpose is tarfile – especially in Linux environments. Check out the source code of this pretty basic example script:

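          import tarfile
          tf = tarfile.open('upload.tar.gz')
          tf.extractall('uploads')  # extraction into the uploads directory, as described above
          tf.close()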

          These simple lines contain (at least) 2 potential vulnerabilities:
          • Using symbolic links, attackers can access sensitive files from this directory (ln -s /etc/passwd passwd.txt).
          • Maliciously crafted archives can place (and even overwrite) files outside the target directory using path traversal.

          Usually, input validation is the key to security, and this example is no exception. Make sure you’re only extracting regular files without tricky paths.
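
One way to apply that advice is to filter the archive members before extraction. The sketch below keeps only regular files and directories whose resolved destination stays inside the target directory. The helper names safe_members and safe_extract are hypothetical. Recent Python versions also provide built-in extraction filters (for example extractall(..., filter='data')), which may be preferable where available.

```python
import os
import tarfile

def safe_members(tf, target_dir):
    """Yield only members that are safe to extract into target_dir."""
    base = os.path.realpath(target_dir)
    for member in tf.getmembers():
        # Skip symlinks, hard links, devices and other special entries.
        if not (member.isfile() or member.isdir()):
            continue
        # Resolve the final path and skip anything escaping the target
        # directory (absolute names or '../' path traversal).
        dest = os.path.realpath(os.path.join(base, member.name))
        if os.path.commonpath([base, dest]) != base:
            continue
        yield member

def safe_extract(archive_path, target_dir='uploads'):
    """Extract an uploaded archive, dropping the risky members."""
    with tarfile.open(archive_path) as tf:
        tf.extractall(target_dir, members=safe_members(tf, target_dir))
```

Silently skipping bad members is a design choice made for brevity; rejecting the whole upload may be the safer policy in practice.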
          Now let’s talk a little bit about the zipfile module. Don’t worry, it’s more secure, which means the vulnerabilities mentioned above are prevented by default. But we can still have our fun with ZIP bombs.

          import zipfile
          zf = zipfile.ZipFile('', 'r')

          Using a simple shell command anyone can create malicious ZIP archives that increase 1000x in size upon decompressing, which results in exhausting the resources of the computer trying to extract them. Checking the total size of uncompressed files could help prevent the issue, so it’s strongly recommended:

          def get_size(zf):
              res = 0
              for info in zf.filelist:
                  res += info.file_size
              return res
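
A size check like this is only useful if it runs before extraction and is compared against a limit. The sketch below assumes a hypothetical limit (MAX_TOTAL_SIZE) and helper name (check_declared_size). Note that file_size is the size declared in the archive headers, so treat this as a first line of defense rather than a guarantee.

```python
import io
import zipfile

MAX_TOTAL_SIZE = 100 * 1024 * 1024  # hypothetical limit: 100 MiB

def check_declared_size(zf, limit=MAX_TOTAL_SIZE):
    """Return the declared uncompressed size, or raise if it exceeds limit."""
    total = sum(info.file_size for info in zf.infolist())
    if total > limit:
        raise ValueError(f'archive expands to {total} bytes, limit is {limit}')
    return total

# Usage with an in-memory archive holding one 1,000,000-byte file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('zeros.bin', b'\0' * 1_000_000)
buffer.seek(0)

with zipfile.ZipFile(buffer, 'r') as zf:
    declared = check_declared_size(zf)  # passes: well under the limit
```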

          6. Working with XML

          If you ever had the opportunity to work with XML files, then you might know the attack called XML External Entity attack (XXE). It’s a common attack against a web application that parses XML as input. Sometimes it allows a malicious user to view files on the app server’s filesystem and to interact with systems that the application can access. The /etc/passwd file could be easily read using this payload through a vulnerable application:

          <!ENTITY age SYSTEM "file:///etc/passwd">
          Ethical hacker

          The fix is simply disabling XML external entity and DTD processing in all XML parsers in the application. Also, there’s an awesome package called defusedxml with secure default settings.
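
If adding a dependency is not possible, DTD processing can also be refused with the standard library alone. The sketch below, which is an illustration rather than a replacement for defusedxml, runs the input through an expat parser that raises as soon as a DOCTYPE declaration appears, and only then hands it to ElementTree. The helper name parse_without_dtd is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

def parse_without_dtd(data):
    """Parse XML bytes, refusing any document that declares a DOCTYPE."""
    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        raise ValueError('DOCTYPE declarations are not allowed')

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = forbid_doctype
    checker.Parse(data, True)  # raises as soon as a DOCTYPE is seen
    return ET.fromstring(data)

root = parse_without_dtd(b'<user><name>Joe</name></user>')
print(root.find('name').text)
```

Since entities can only be declared inside a DTD, rejecting the DOCTYPE also rules out external entity declarations like the one in the payload above.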

            7. Insecure deserialization

            Deserialization can be really dangerous, since you basically execute the serialized code. This means deserializing user-submitted objects equals Remote Code Execution.

            import pickle
            pickle.load(open('user_supplied_serialized_object', 'rb'))

            Unfortunately, these kinds of attacks can’t be prevented by using a secure function or a magic library. If possible, serialize plain data as JSON instead of pickling class instances as bytes. If you can’t avoid deserializing user-submitted objects, make sure you do it in a sandboxed environment with limited privileges, and don’t forget that you will be executing code from potential attackers. If the object was serialized by a trusted source, make sure it can’t be tampered with: use a secure channel and a digital signature or MAC (Message Authentication Code).
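
The JSON and MAC advice can be combined. The following is a minimal sketch, assuming both sides share a secret key: the payload is serialized as JSON, prefixed with an HMAC-SHA256 tag, and only parsed after the tag has been verified. The function names sign and verify and the tag-dot-body message layout are hypothetical choices for this example.

```python
import hashlib
import hmac
import json

def sign(payload, key):
    """Serialize payload as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True).encode('utf-8')
    tag = hmac.new(key, body, hashlib.sha256).hexdigest().encode('ascii')
    return tag + b'.' + body

def verify(message, key):
    """Check the tag first; parse the JSON body only if it matches."""
    tag, _, body = message.partition(b'.')
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode('ascii')
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError('signature mismatch')
    return json.loads(body)

key = b'example-key-for-illustration-only'
message = sign({'name': 'Joe', 'age': 99}, key)
print(verify(message, key))
```

Even if an attacker forges a message, the worst they can inject is JSON data, not executable objects.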

              8. Bandit

              If you’re reading this there’s a good chance you’re working (or will work in the future) on larger Python projects. Let us introduce you to the most popular security-oriented static analyzer for Python: Bandit. It can help you to find common security issues in a huge codebase.
              It’s just that – a static analyzer, so don’t expect it to make your code automagically secure, but it’s really useful for catching risky coding patterns like:

              • Hardcoded passwords
              • Unsafe functions
              • Weak cryptographic keys
              • Potential injection vulnerabilities

              It’s important to highlight that the results are based on coding patterns, and don’t always mean they can be actually exploited. If you have found a false positive, then let’s just add a `#nosec` comment after the line – this way the “issue” will be ignored while scanning.
              Bandit should be added to your CI/CD pipeline to catch bugs before deploying production code, but ideally it should also be used as a pre-commit Git hook (for example, to avoid accidentally committing hardcoded passwords).

                9. Safety

                Safety is a really useful offline dependency scanner that can be installed easily with pip3 install safety.
                It can also process requirements lists, which is great if you want to check the modules before installing them. Also, you can (and should) add a safety check to your CI/CD pipeline to avoid releasing your application with known security issues.
                Of course, it’s recommended to keep this package up to date and you should consider using the dependency scanner of Snyk as well to maximize security.

                  Closing words

                  It’s important to highlight that this is more like an appetizer of Python best practices than an exhaustive list about how to secure Python applications. There is much more to explore when it comes to Python security. In an upcoming post, we will dig a bit deeper into the security issues of Python, so stay tuned! If you are interested in a challenge, try our interactive tutorials based on the topics above, to get hands-on experience in hacking and fixing real applications.
