
Python best practices and common security issues

Gergő Turcsányi (Software Engineer, Avatao)


Python is a high-level, flexible programming language with some great features, but getting the most out of it takes knowledge of its sharp edges. Like any widely used language, Python has its security pitfalls. This post aims to raise awareness by covering some of the most common security concerns, vulnerabilities, and Python best practices.

    1. input() (Python 2)

    They say security starts with Python 3, and this one is a classic example of that. In Python 2, this function does not simply read user input; it also evaluates it immediately (like `eval()`). It works as expected with numbers, but once you start entering strings, you’ll see that it tries to find variables with the submitted names and throws an error if it can’t. Fortunately, this quirk makes it hard for the function to end up in production code, but the key takeaway is the insecure default.
    In Python 2, you should use `raw_input()` instead to read user input as a string. In Python 3 this behavior has changed, so you can safely use `input()` for this purpose.
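
To see why the Python 2 behavior is risky, here is a minimal Python 3 sketch. The helper names `python2_style_input` and `parse_literal` are made up for illustration: the first mimics the old `input()` by passing the typed text to `eval()`, while the second uses `ast.literal_eval()`, which only accepts literals.

```python
import ast


def python2_style_input(typed_text):
    """Hypothetical helper: mimics Python 2's input() by evaluating the text."""
    return eval(typed_text)  # dangerous on untrusted input


def parse_literal(typed_text):
    """Hypothetical helper: accepts only Python literals (numbers, strings, lists, ...)."""
    return ast.literal_eval(typed_text)


typed = "__import__('os').getcwd()"  # what an attacker might type

evaluated = python2_style_input(typed)  # runs the attacker's expression

try:
    parse_literal(typed)
    rejected = False
except ValueError:
    rejected = True  # literal_eval refuses function calls
```

If you only need a number, converting the string with `int()` or `float()` is simpler still.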

      2. str.format()

      This function can be used – not surprisingly – to format strings. The trouble begins when user input ends up in the format string itself before `format()` is called, because replacement fields can traverse attributes and indexes of the arguments. That can lead to vulnerabilities in special cases, like the one below:

      ```python
      CONFIG = {'SECRET_KEY': '12345'}

      class User:
          def __init__(self, name):
              self.name = name

      user = User('Joe')

      # Safe: the format string is a constant written by the developer.
      print('Hello {user.name}'.format(user=user))
      ```
      

      If the user input ends up in the string before formatting, then attackers can leak the contents of the sensitive config dictionary like this:

      ```python
      print('Malicious user input here: {user.__class__.__init__.__globals__[CONFIG]}'.format(user=user))
      ```
      

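      Our recommended solution is to use f-strings instead, as they’re newer, simpler, and more secure. An f-string is a literal in your source code, so the template cannot come from user input at runtime: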

      ```python
      print(f'Hello {user.name}')
      ```
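
The difference is easy to check. The sketch below repeats `CONFIG` and `User` from the example above so that it is self-contained; the `malicious` string and the names `leaked` and `safe` are illustrative. It assumes the code runs at module level, so that `CONFIG` is a global.

```python
CONFIG = {'SECRET_KEY': '12345'}


class User:
    def __init__(self, name):
        self.name = name


user = User('Joe')

# Attacker-controlled text that happens to contain a replacement field.
malicious = '{user.__class__.__init__.__globals__[CONFIG]}'

# Vulnerable: the user input becomes part of the format string.
leaked = ('Hello ' + malicious).format(user=user)

# Safer: the template is a constant and the input is only an argument,
# so its braces are inserted as plain text and never interpreted.
safe = 'Hello {}'.format(malicious)

print(leaked)
print(safe)
```

The rule of thumb is the same for `format()` and f-strings: user input may be a value, never the template.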

        3. yaml.load()

        This is one of the best-known textbook examples of dangerous Python functions. Although it’s not a built-in function (like the others above), there’s a good chance you’ll have to parse YAML files using Python at some point in your career, and the PyYAML module is usually the recommended choice for that:

        ```python
        import yaml

        example = '''
        person:
          name: Joe
          age: 99
        '''

        # Unsafe: no Loader is given. PyYAML 5.1+ warns here, and PyYAML 6 requires the Loader argument.
        print(yaml.load(example))
        ```

        Fortunately, PyYAML is helpful and honest, so it warns us about the unsafe and deprecated function call: *YAMLLoadWarning: calling yaml.load() without Loader=… is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.* In case you’re wondering why it is so dangerous, we have prepared a nice exploit that executes `os.system("cat /etc/passwd")` to list the user database:

        ```python
        exploit = '''!!python/object/new:tuple [!!python/object/new:map [!!python/name:eval , [ '__import__("os").system("cat /etc/passwd")' ]]]'''
        print(yaml.load(exploit))
        ```
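
A common remedy is PyYAML's `yaml.safe_load()`, which only constructs plain YAML types such as dictionaries, lists, strings, and numbers. Here is a minimal sketch, assuming PyYAML is installed; the variable names `parsed` and `blocked` are illustrative.

```python
import yaml

example = '''
person:
  name: Joe
  age: 99
'''

exploit = '''!!python/object/new:tuple [!!python/object/new:map [!!python/name:eval , [ '__import__("os").system("cat /etc/passwd")' ]]]'''

# Ordinary documents parse as usual.
parsed = yaml.safe_load(example)

# Python-specific tags are refused instead of being executed.
try:
    yaml.safe_load(exploit)
    blocked = False
except yaml.YAMLError:
    blocked = True

print(parsed, blocked)
```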

          4. Shell command injection

          If you’re running processes or OS commands with user-supplied values or parameters, then there’s a risk of attackers injecting malicious payloads to achieve Remote Code Execution on your server.

          Let’s say we want to echo user input for the sake of simplicity. Probably the most naive approach is something like `os.system('echo ' + user_input)`. Hopefully, you’re familiar with SQL injections and already know that a string concatenation like this rarely ends well. A malicious actor can easily exploit your application with a payload like `hello; cat secret.txt` to read the contents of arbitrary files.
          Of course, nowadays there are more sophisticated tools and best practices for calling OS commands and running subprocesses. Not surprisingly, the name of the most popular module for this purpose is `subprocess`. It provides several ways to start new processes and even communicate with them – one of them is `subprocess.call()`. When the command is passed as a list of arguments, these functions are protected against command injection by default.

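          ```python
          import subprocess

          user_string = 'hello; cat secret.txt'  # example of untrusted input

          # List form: no shell is involved, so the payload is echoed as plain text.
          subprocess.call(['echo', user_string])
          ```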

          According to the documentation: “Unlike some other popen functions, this implementation will never implicitly call a system shell. This means that all characters, including shell metacharacters, can safely be passed to child processes.”
          The trouble begins when you want to use special shell features (like pipes `|` or redirects `>`), because you need to invoke the shell explicitly in that case, as you can see below. Unfortunately, this comes with a downside: `shell=True` leaves our code exposed to command injection again.

          ```python
          subprocess.call(f'echo "{user_string}" >> user_string.txt', shell=True)
          ```

          If you really have to call subprocesses with `shell=True` and user input, there’s still a way to do it securely. According to the official documentation: *“The shlex.quote() function can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands.”*

          ```python
          import shlex
          import subprocess

          # shlex.quote() turns the input into a single, safely quoted shell token.
          subprocess.call(f'echo {shlex.quote(user_string)} >> user_string.txt', shell=True)
          ```
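
Often the shell can be avoided altogether by letting Python do the redirection. The following is a sketch under a few assumptions: a Unix-like system with an `echo` binary, and a hypothetical helper named `append_echo`. It also shows what `shlex.quote()` produces for the payload.

```python
import os
import shlex
import subprocess
import tempfile


def append_echo(user_string, path):
    """Hypothetical helper: append the output of `echo user_string` to path, without a shell."""
    with open(path, 'a') as log_file:
        # List form: user_string is one argument and is never parsed by a shell.
        subprocess.run(['echo', user_string], stdout=log_file, check=True)


payload = 'hello; cat secret.txt'

# What shlex.quote() would hand to the shell: one single-quoted token.
quoted = shlex.quote(payload)

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, 'user_string.txt')
    append_echo(payload, log_path)
    with open(log_path) as log_file:
        logged = log_file.read()

print(quoted)
print(logged)
```

Opening the file in Python replaces the `>>` redirect, so `shell=True` is no longer needed for this particular case.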

            5. Working with archives

            Archives are a convenient way to package several files into one. Python can extract compressed data, but as usual, you should validate untrusted input to prevent vulnerabilities. Let’s imagine we’re running an application that allows users to upload archives and automatically extracts them into a public directory called `uploads`.
            One of the most popular built-in libraries for this purpose is `tarfile` – especially in Linux environments. Check out the source code of this pretty basic example script:

            ```python
            import tarfile
            tf = tarfile.open('upload.tar.gz')
            tf.extractall('uploads')
            ```

            These simple lines contain (at least) 2 potential vulnerabilities:

            • Using symbolic links, attackers can access sensitive files from this directory (`ln -s /etc/passwd passwd.txt`).
            • Maliciously crafted archives can place (and even overwrite) files outside the target directory using path traversal.

            Usually, input validation is the key to security, and this example is no exception. Make sure you’re only extracting regular files without tricky paths.
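
One way to apply that advice is sketched below. The helper name `safe_extract` and the demo archive are assumptions made for illustration, not part of `tarfile`: the helper keeps only regular files whose resolved path stays inside the target directory, and the demo builds an archive containing a normal file, a path traversal entry, and a symbolic link.

```python
import io
import os
import tarfile
import tempfile


def safe_extract(archive_path, target_dir):
    """Hypothetical helper: extract only regular files that stay inside target_dir."""
    target_root = os.path.realpath(target_dir)
    os.makedirs(target_root, exist_ok=True)
    with tarfile.open(archive_path) as tf:
        safe_members = []
        for member in tf.getmembers():
            if not member.isfile():
                continue  # skip symlinks, hard links, devices and directories
            destination = os.path.realpath(os.path.join(target_root, member.name))
            if os.path.commonpath([target_root, destination]) != target_root:
                continue  # skip absolute paths and '../' traversal
            safe_members.append(member)
        tf.extractall(target_root, members=safe_members)
    return [member.name for member in safe_members]


with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, 'upload.tar.gz')
    with tarfile.open(archive_path, 'w:gz') as tf:
        for name in ('notes.txt', '../evil.txt'):
            data = b'hello'
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
        link = tarfile.TarInfo('passwd.txt')
        link.type = tarfile.SYMTYPE
        link.linkname = '/etc/passwd'
        tf.addfile(link)

    uploads = os.path.join(tmp_dir, 'uploads')
    extracted = safe_extract(archive_path, uploads)
    kept = os.path.exists(os.path.join(uploads, 'notes.txt'))
    escaped = os.path.exists(os.path.join(tmp_dir, 'evil.txt'))

print(extracted, kept, escaped)
```

Recent Python versions also accept a `filter` argument in `extractall()`, which is worth checking in the `tarfile` documentation for the version you run.
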
            Now let’s talk a little bit about the zipfile module. Don’t worry, it’s more secure, which means the vulnerabilities mentioned above are prevented by default. But we can still have our fun with ZIP bombs.

            ```python
            import zipfile
            zf = zipfile.ZipFile('upload.zip', 'r')
            zf.extractall('uploads')
            ```

            Using a simple shell command anyone can create malicious ZIP archives that increase 1000x in size upon decompressing, which results in exhausting the resources of the computer trying to extract them. Checking the total size of uncompressed files could help prevent the issue, so it’s strongly recommended:

            ```python
            def get_size(zf):
                # Sum the declared uncompressed size of every file in the archive.
                return sum(info.file_size for info in zf.infolist())
            ```
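
Such a check could be wired in before extraction as sketched below. The limit `MAX_UNCOMPRESSED_BYTES` and the helper `is_within_limit` are hypothetical, and the demo builds a small, highly compressible archive in memory rather than a real ZIP bomb.

```python
import io
import zipfile

MAX_UNCOMPRESSED_BYTES = 10 * 1024 * 1024  # hypothetical limit: 10 MiB


def is_within_limit(zf, limit=MAX_UNCOMPRESSED_BYTES):
    """Hypothetical helper: compare the declared uncompressed size with a limit."""
    total = sum(info.file_size for info in zf.infolist())
    return total <= limit


# 20 MiB of zero bytes compresses to a tiny archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('zeros.bin', b'\0' * (20 * 1024 * 1024))
compressed_size = len(buffer.getvalue())

with zipfile.ZipFile(buffer) as zf:
    allowed = is_within_limit(zf)  # extract only if this is True

print(compressed_size, allowed)
```

The sizes come from the archive's own metadata, so treat this as a first line of defense; capping the bytes actually written during extraction is a stricter option.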

            6. Working with XML

            If you’ve ever had the opportunity to work with XML files, you might know the XML External Entity (XXE) attack. It’s a common attack against web applications that parse XML input. Sometimes it allows a malicious user to view files on the app server’s filesystem and to interact with systems that the application can access. The /etc/passwd file could be easily read using this payload through a vulnerable application:

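            ```xml
            <!-- Element names are reconstructed for illustration -->
            <!DOCTYPE person [
            <!ENTITY age SYSTEM "file:///etc/passwd">
            ]>
            <person>
              <name>Hackerman</name>
              <age>&age;</age>
              <occupation>Ethical hacker</occupation>
            </person>
            ```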

            The fix is simply disabling XML external entity and DTD processing in all XML parsers in the application. Also, there’s an awesome package called defusedxml with secure default settings.

              7. Insecure deserialization

              Deserialization can be really dangerous, since formats like pickle can instruct the loader to call arbitrary functions. This means deserializing user-submitted objects effectively equals Remote Code Execution.

              ```python
              import pickle

              # Unsafe: unpickling attacker-controlled bytes can run arbitrary code.
              with open('user_supplied_serialized_object', 'rb') as f:
                  pickle.load(f)
              ```
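
To make the risk concrete, here is a harmless sketch. The class name `Exploit` is made up, and the payload only evaluates `6 * 7`, but an attacker could name any callable and arguments in the same way.

```python
import pickle


class Exploit:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        return (eval, ('6 * 7',))


malicious_bytes = pickle.dumps(Exploit())  # what an attacker would upload
result = pickle.loads(malicious_bytes)     # the victim only calls loads()

print(result)
```

The victim never needs the `Exploit` class: the pickled bytes reference only `eval` and its argument.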

              Unfortunately, these kinds of attacks can’t be prevented by using a secure function or a magic library. If possible, serialize plain data as JSON instead of pickling class instances as bytes. If you can’t avoid deserializing user-submitted objects, make sure you’re doing it in a sandbox environment with limited privileges. Don’t forget that you’ll be executing code from potential attackers. If the object was serialized by a trusted source, make sure it can’t be tampered with: use a secure channel and a digital signature or MAC (Message Authentication Code).
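
The JSON and MAC advice can be combined as sketched below using the standard `hmac` module. The key and the helper names `dumps_signed` and `loads_verified` are placeholders for illustration; a real key would come from a secret store.

```python
import hashlib
import hmac
import json

SECRET_KEY = b'replace-with-a-real-secret'  # placeholder key for illustration
MAC_LENGTH = hashlib.sha256().digest_size   # 32 bytes for SHA-256


def sign(payload):
    """Compute an HMAC-SHA256 tag over the payload bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def dumps_signed(data):
    """Hypothetical helper: serialize to JSON and prepend the MAC."""
    payload = json.dumps(data).encode('utf-8')
    return sign(payload) + payload


def loads_verified(blob):
    """Hypothetical helper: verify the MAC before parsing the JSON payload."""
    mac, payload = blob[:MAC_LENGTH], blob[MAC_LENGTH:]
    if not hmac.compare_digest(mac, sign(payload)):
        raise ValueError('signature mismatch')
    return json.loads(payload.decode('utf-8'))


blob = dumps_signed({'name': 'Joe', 'age': 99})
restored = loads_verified(blob)

# Change a single character of the payload and try again.
tampered = blob[:-2] + b'8}'
try:
    loads_verified(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(restored, tamper_detected)
```

`hmac.compare_digest()` is used instead of `==` so that the comparison does not leak timing information.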

                8. Bandit

                If you’re reading this there’s a good chance you’re working (or will work in the future) on larger Python projects. Let us introduce you to the most popular security-oriented static analyzer for Python: Bandit. It can help you to find common security issues in a huge codebase.
                It’s just that – a static analyzer, so don’t expect it to make your code automagically secure, but it’s really useful for catching risky coding patterns like:

                • Hardcoded passwords
                • Unsafe functions
                • Weak cryptographic keys
                • Potential injection vulnerabilities

                It’s important to highlight that the results are based on coding patterns, so a finding doesn’t always mean the code can actually be exploited. If you find a false positive, you can add a `# nosec` comment at the end of the line – this way the “issue” will be ignored while scanning.
                Bandit should be added to your CI/CD pipeline to catch bugs before deploying production code, but ideally, it should be used as a pre-commit Git hook as well (to avoid committing hardcoded passwords accidentally, for example).
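
As an illustration, the `shlex.quote()` pattern from the shell injection section is the kind of line a pattern-based scanner is likely to report, because it still uses `shell=True`. The sketch below assumes a Unix-like shell and uses a hypothetical helper named `append_with_shell`; the suppression comment belongs on the reported line, and only after the finding has been reviewed.

```python
import os
import shlex
import subprocess
import tempfile


def append_with_shell(user_string, path):
    """Hypothetical helper: append user input to a file via the shell."""
    command = f'echo {shlex.quote(user_string)} >> {shlex.quote(path)}'
    # Reviewed: every interpolated value is quoted with shlex.quote().
    return subprocess.call(command, shell=True)  # nosec


payload = "hello'; cat 'secret.txt"

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, 'user_string.txt')
    exit_code = append_with_shell(payload, log_path)
    with open(log_path) as log_file:
        content = log_file.read()

print(exit_code, content)
```

Keeping a short justification next to each suppression makes later audits easier.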

                  9. Safety

                  Safety is a really useful offline dependency scanner that can be installed easily with `pip3 install safety`.
                  It can also process requirements lists, which is great if you want to check modules before installing them. You can (and should) add a safety check to your CI/CD pipeline as well, to avoid releasing your application with known security issues.
                  Of course, it’s recommended to keep this package up to date, and you should consider using the dependency scanner of Snyk as well to maximize security.

                    Closing words

                    It’s important to highlight that this is more of an appetizer of Python best practices than an exhaustive list of ways to secure Python applications. There is much more to explore when it comes to Python security. In an upcoming post, we will dig a bit deeper into the security issues of Python, so stay tuned! If you are interested in a challenge, try our interactive tutorials based on the topics above to get hands-on experience in hacking and fixing real applications.
